train your dog

Learning from how dogs learn

Prof. Bruce Blumberg

The Media Lab, MIT

bruce@media.mit.edu

www.media.mit.edu/~bruce

About me…

Practical & compelling real-time learning

• Easy for interactive characters to learn what they ought to be able to learn

• Easy for a human trainer to guide learning process

• A compelling user experience

• Provide heuristics and practical design principles

My bias & focus

• Learning occurs within an innate structure that biases…

• Attention

• Motivation

• Innate frequency, form and organization of behavior

• When certain things are most easily learned

• What are the catalytic components of the scaffolding that make learning possible?

sheep|dog: trial by eire

See sheep|dog video on my website

Object persistence

See object persistence video on my website

Temporal representation

See temporal representation (aka Goatzilla) video on my website

Alpha Wolf

See alpha wolf video on my website

Rover@home

See rover@home video on my website or go to Scientific American Frontiers website

Dobie T. Coyote Goes to School

See Dobie video on my website

Why look at Dog Training?

• Interactive characters pose unique challenges:

• State, action and state-action spaces are often continuous and far too big to search exhaustively

• To be compelling, characters must

• Learn “obvious” contingencies between state, actions and consequences quickly

• Be easy to train without visibility into the internal state of the character

• Learning is only one thing they have to do.

• Dogs and their trainers seem to solve these problems easily

Invaluable resources

• Doing it, and talking to people who do it.

• Wilkes, Pryor, Ramirez

• Lindsay, Burch & Bailey, Mackintosh

• Lorenz, Leyhausen, Coppinger & Coppinger

The problem facing dogs (real and synthetic)

Set of all possible actions

Set of all motivational goals

Set of all possible stimuli

What do I do, when, in order to best satisfy my motivational goals?

The space of possible stimuli is wicked big

[Figure: the set of all possible stimuli drawn as a state space. Axes: Modality of Stimuli (smells, motion, sounds, dog sounds, speech, whistles) and Time of Occurrence.]

The space of possible actions is also very big

[Figure: the set of all possible actions drawn as an action space. Axes: Action (figure-8, low shake, high-5, left ear twitch) and Time of Performance.]

Who gets credit for good things happening?

[Figure: a reward (“Yumm..”) arrives. Candidates for credit lie along the Action axis (figure-8, low shake, high-5, left ear twitch) and the Modality of Stimuli axis (motion, sounds, dog sounds, speech, whistles).]

Who gets credit for good things happening?

[Figure: the predatory sequence orient, stalk, chase, grab-bite, kill-bite, ending in “Yumm..”]

Conventional idea: back propagation from goal

[Figure: the sequence orient, stalk, chase, grab-bite, kill-bite, ending in “Yumm..”, laid out along a Time axis.]

Credit flows backward
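One common way to picture credit flowing backward from the goal is a one-step backup. After each successful run, every element of the sequence moves toward the discounted value of its successor, and only the last element sees the reward directly. The sketch below is illustrative only and is not taken from the slides: `backup_once`, `episodes_until_credited`, the learning rate `alpha`, and the discount `gamma` are all assumptions.

```python
# Hypothetical names throughout; the five elements are taken from the slide's figure.
SEQUENCE = ["orient", "stalk", "chase", "grab-bite", "kill-bite"]


def backup_once(values, reward=1.0, alpha=0.5, gamma=0.9):
    """Apply one successful run: each element moves toward the discounted
    value of its successor; the last element moves toward the reward."""
    updated = dict(values)
    for i, element in enumerate(SEQUENCE):
        if i + 1 < len(SEQUENCE):
            # Uses the successor's value from before this run (synchronous update).
            target = gamma * values[SEQUENCE[i + 1]]
        else:
            target = reward
        updated[element] = values[element] + alpha * (target - values[element])
    return updated


def episodes_until_credited(element, max_episodes=100):
    """Count successful runs needed before `element` receives any credit."""
    values = {name: 0.0 for name in SEQUENCE}
    for episode in range(1, max_episodes + 1):
        values = backup_once(values)
        if values[element] > 0.0:
            return episode
    return None


for name in SEQUENCE:
    print(name, episodes_until_credited(name))
```

Under these assumptions, kill-bite is credited after the first successful run, while orient receives no credit until the fifth. Every one of those runs has to reach the goal first.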

The problem

• If each element in sequence has 3 variants, there are 729 possible combinations of which 1 may work (ignoring stimuli)

• If there are 12 possible stimuli, there are 1,586,874,322,944 possible combinations of stimuli-action pairs to explore.

• Don’t know if it is the right sequence until goal is reached

• What happens if “variant” needs to be learned?
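The counts above can be checked with a few lines of arithmetic. This is a minimal sketch with hypothetical names. It assumes six elements in the sequence, because 729 equals 3 to the 6th power. The second figure equals 108 to the 6th power, so the sketch uses 108 options per step. The slide does not say how 108 breaks down into variants and stimuli, so that value is read off the number itself, not derived from a stated formula.

```python
def count_sequences(options_per_step: int, steps: int) -> int:
    """Number of distinct sequences when every step has the same number of options."""
    return options_per_step ** steps


STEPS = 6      # assumption: 729 == 3**6 implies six elements in the sequence
VARIANTS = 3   # from the slide

# Actions only: three variants at each of six steps.
action_only = count_sequences(VARIANTS, STEPS)

# Stimulus-action pairs: 108 options per step reproduces the slide's figure.
# The breakdown of 108 is not given in the slide.
with_stimuli = count_sequences(108, STEPS)

print(action_only)
print(with_stimuli)
```

Even in this small example, exhaustive trial and error is out of reach, which is the point of the slide.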

Leyhausen’s suggestion…

[Figure: the sequence orient, stalk, chase, grab-bite, kill-bite along a Time axis, with each element labeled “motivation & reward”.]

Each element is innately self-motivating and has innate reward metric


Coppinger’s suggestion…

[Figure: the sequence orient, stalk, chase, grab-bite, kill-bite along a Time axis.]

Varying innate tendency to follow behavior with “next” in sequence

Functional goal plays incidental role

[Figure: the sequence orient, stalk, chase, grab-bite, kill-bite, ending in “Yumm..”, along a Time axis.]

Propagated value from functional goal plays incidental role

Big idea: innate biases make learning possible

• Biases include…

• Temporal Proximity implies causality

• Attend more readily to certain classes of stimuli than to others (motion vs. speech)

• Lazy discovery (pay attention once you have a reason to pay attention)

• Elements may be “innately” self-motivating and have local metric of “goodness”

Good trainers actively guide dog’s exploration

• Behavioral

• Train behavior, then cue

• Differential rewards encourage variability

• Motor

• Shaping

• Rewarding successive approximations

• Luring

• Pose, e.g. “down”

• Trajectory, e.g. “figure-8”
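Shaping by rewarding successive approximations can be sketched as a trainer who rewards only attempts that beat the best rewarded attempt so far, while the learner drifts toward whatever was rewarded. This is a toy illustration under stated assumptions, not the system described in this talk. The function `shape`, the one-dimensional "performance" value, the fixed cycle of offsets standing in for natural variability, and the learning rate are all hypothetical.

```python
def shape(target=1.0, start=0.0, trials=60, learning_rate=0.5):
    """Reward only attempts that beat the best rewarded attempt so far.

    Returns the learner's final typical performance and the number of rewards.
    """
    offsets = [-0.1, 0.0, 0.1]       # assumption: fixed cycle standing in for variability
    mean = start                     # the learner's typical performance
    criterion = abs(target - start)  # distance an attempt must beat to earn a reward
    rewards = 0
    for trial in range(trials):
        attempt = mean + offsets[trial % len(offsets)]
        distance = abs(target - attempt)
        if distance < criterion:
            rewards += 1
            # The learner shifts toward the rewarded attempt...
            mean += learning_rate * (attempt - mean)
            # ...and the trainer raises the bar to that attempt.
            criterion = distance
    return mean, rewards


final_mean, reward_count = shape()
print(final_mean, reward_count)
```

The learner ends close to a target that was never in its starting repertoire, because each reward moves the range of variation a little further toward it.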

Dogs constrain search for causal agents

Consequences Window: Trainer “clicks” signaling reward is coming.

When reward is actually received

Attention Window: Cue given immediately before or as dog is moving into desired pose

[Figure: timeline of Sit, Approach, Eat, with the two windows marked]

Dogs make the problem tractable by constraining search for causal agents to narrow temporal windows
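The two windows can be sketched as a filter over a timestamped event log. Only stimuli that fall in the attention window just before an action are kept, and only when a click follows that action within the consequences window. This is a minimal sketch, not the implementation behind the slides: the `Event` type, the function `candidate_causes`, and the window lengths in seconds are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: float  # seconds since the start of the session
    kind: str    # "stimulus", "action" or "click"
    name: str


def candidate_causes(events, attention_window=1.0, consequences_window=2.0):
    """Return (stimulus, action) pairs that are eligible for credit."""
    clicks = [e for e in events if e.kind == "click"]
    actions = [e for e in events if e.kind == "action"]
    stimuli = [e for e in events if e.kind == "stimulus"]
    pairs = []
    for action in actions:
        # Consequences window: a click must follow the action closely.
        rewarded = any(
            0.0 <= click.time - action.time <= consequences_window
            for click in clicks
        )
        if not rewarded:
            continue
        # Attention window: only stimuli just before the action are considered.
        for stimulus in stimuli:
            if 0.0 <= action.time - stimulus.time <= attention_window:
                pairs.append((stimulus.name, action.name))
    return pairs


log = [
    Event(0.0, "stimulus", "whistle"),
    Event(4.6, "stimulus", "sit-utterance"),
    Event(5.0, "action", "sit"),
    Event(5.5, "click", "click"),
    Event(9.0, "stimulus", "bark"),
    Event(10.0, "action", "down"),
]
print(candidate_causes(log))
```

In this log the whistle is too early to be considered, and the bark precedes an action that was never clicked, so the only surviving candidate is the sit-utterance paired with sit.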

Dogs use implicit feedback to guide perceptual learning

[Figure: timeline of “sit-utterance” perceived, dog decides to sit, “click” perceived, Approach, Eat. The rewarded sit leads to: build & update perceptual model of “sit-utterance”.]

Dogs use rewarded action to identify potentially promising state to explore and to guide formation of perceptual models

Dogs give credit where credit is due…

• Trainer repeatedly lures dog through a trajectory or into a pose

• Eventually, dog performs behavior spontaneously

• Implication

• Dog associates reward with resulting body configuration or trajectory and not just with “follow your nose”

Observation: dogs give credit where credit is due

[Figure: timeline of “sit-utterance” perceived, dog decides to sit, “click” perceived, Approach, Eat]

1. Credit sitting in presence of “sit-utterance”

2. Build & update perceptual model of “sit-utterance”

D.L.: Take Advantage of Predictable Regularities

• Constrain search for causal agents by taking advantage of temporal proximity & natural hierarchy of state spaces

• Use consequences to bias choice of action

• But vary performance and attend to differences

• Explore state and action spaces on “as-needed” basis

• Build models on demand

D.L.: Make Use of All Feedback: Explicit & Implicit

• Use rewarded action as context for identifying

• Promising state space and action space to explore

• Good examples from which to construct perceptual models, e.g.,

• A good example of a “sit-utterance” is one that occurs within the context of a rewarded Sit.
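The idea that a rewarded action selects the training examples for a perceptual model can be sketched with a running-mean prototype. An utterance updates the model only when it was heard in the context of a rewarded sit. This is an assumption-laden illustration, not the model used in the talk: `UtteranceModel`, `learn_from_trials`, and the two-number feature vectors are hypothetical stand-ins for whatever acoustic features a real system would use.

```python
class UtteranceModel:
    """Prototype (running mean) of the feature vectors it has been shown."""

    def __init__(self):
        self.prototype = None
        self.count = 0

    def update(self, features):
        """Fold one more example into the running mean."""
        self.count += 1
        if self.prototype is None:
            self.prototype = list(features)
        else:
            self.prototype = [
                p + (f - p) / self.count
                for p, f in zip(self.prototype, features)
            ]

    def distance(self, features):
        """Euclidean distance from the prototype; smaller means a better match."""
        return sum((p - f) ** 2 for p, f in zip(self.prototype, features)) ** 0.5


def learn_from_trials(trials):
    """Each trial is (utterance_features, action_performed, was_rewarded)."""
    model = UtteranceModel()
    for features, action, rewarded in trials:
        # Implicit feedback: only a rewarded sit marks the utterance as a good example.
        if action == "sit" and rewarded:
            model.update(features)
    return model


trials = [
    ([1.0, 0.0], "sit", True),
    ([0.8, 0.2], "sit", True),
    ([0.0, 1.0], "sit", False),   # sat, but no reward: not used
    ([0.1, 0.9], "down", True),   # rewarded, but a different action: not used
]
model = learn_from_trials(trials)
print(model.count, model.prototype)
```

No one labels the utterances for the learner. The reward that follows the action does the labeling.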

D.L.: Make Them Easy to Train

• Respond quickly to “obvious” contingencies

• Support Luring and Shaping

• Techniques to prompt infrequently expressed or novel motor actions

• “Trainer friendly” credit assignment

• Assign credit to candidate that matches trainer’s expectation

The System

Dobie T. Coyote…

See Dobie video on my website

Limitations and Future Work

• Important extensions

• Other kinds of learning (e.g., social or spatial)

• Generalization

• Sequences

• Expectation-based emotion system

• How will the system scale?

Useful Insights

• Use

• Temporal proximity to limit search.

• Hierarchical representations of state, action and state-action space & use implicit feedback to guide exploration

• “Trainer friendly” credit assignment

• Luring and shaping are essential

Acknowledgements

• Members of the Synthetic Characters Group, past, present & future

• Gary Wilkes

• Funded by the Digital Life Consortium
