Train Your Dog

Page 1: Train Your Dog

Learning from how dogs learn

Prof. Bruce Blumberg

The Media Lab, MIT

[email protected]@media.mit.edu

www.media.mit.edu/~bruce

Page 2: Train Your Dog

About me…

Page 3: Train Your Dog

About me…

Page 4: Train Your Dog

Practical & compelling real-time learning

• Easy for interactive characters to learn what they ought to be able to learn

• Easy for a human trainer to guide the learning process

• A compelling user experience

• Provide heuristics and practical design principles

Page 5: Train Your Dog

My bias & focus

• Learning occurs within an innate structure that biases…

• Attention

• Motivation

• Innate frequency, form and organization of behavior

• When certain things are most easily learned

• What are the catalytic components of the scaffolding that make learning possible?

Page 6: Train Your Dog

sheep|dog: Trial by Eire

See sheep|dog video on my website

Page 7: Train Your Dog

Object persistence

See object persistence video on my website

Page 8: Train Your Dog

Temporal representation

See temporal representation (aka Goatzilla) video on my website

Page 9: Train Your Dog

Alpha Wolf

See alpha wolf video on my website

Page 10: Train Your Dog

Rover@home

See rover@home video on my website or go to the Scientific American Frontiers website

Page 11: Train Your Dog

Dobie T. Coyote Goes to School

See Dobie video on my website

Page 12: Train Your Dog

Why look at Dog Training?

• Interactive characters pose unique challenges:

• State, action and state-action spaces are often continuous and far too big to search exhaustively

• To be compelling, characters must

• Learn “obvious” contingencies between state, actions and consequences quickly

• Be easy to train without visibility into the character’s internal state

• Learning is only one thing they have to do.

• Dogs and their trainers seem to solve these problems easily

Page 13: Train Your Dog

Invaluable resources

• Doing it, and talking to people who do it.

• Wilkes, Pryor, Ramirez

• Lindsay, Burch & Bailey, Mackintosh

• Lorenz, Leyhausen, Coppinger & Coppinger

Page 14: Train Your Dog

The problem facing dogs (real and synthetic)

Figure: the set of all possible actions, the set of all motivational goals and the set of all possible stimuli.

What do I do, when, in order to best satisfy my motivational goals?

Page 15: Train Your Dog

The space of possible stimuli is wicked big

Figure: the set of all possible stimuli (smells, motion, sounds, dog sounds, speech, whistles), laid out by modality of stimuli against time of occurrence, forming the state space.

Page 16: Train Your Dog

The space of possible actions is also very big

Figure: the set of all possible actions (figure-8, shake, low shake, high-5, beg, down, left ear twitch), laid out by action against time of performance, forming the action space.

Page 17: Train Your Dog

Who gets credit for good things happening?

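Figure: a reward (“Yumm..”) arrives; the candidate causes span the action space (figure-8, shake, low shake, high-5, beg, down, left ear twitch) and the stimulus space by modality of stimuli (motion, sounds, dog sounds, speech, whistles).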

Page 18: Train Your Dog

Who gets credit for good things happening?

Figure: the predatory sequence orient, eye, stalk, chase, grab-bite, kill-bite unfolding over time and ending in a reward (“Yumm..”).

Page 19: Train Your Dog

Conventional idea: back propagation from goal

Figure: the sequence orient, eye, stalk, chase, grab-bite, kill-bite, ending in “Yumm..”, with credit flowing backward in time from the goal.
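The conventional idea can be sketched as discounted credit assignment: the reward at the end of the sequence is propagated backward, and each earlier element receives a geometrically smaller share. This is a minimal sketch under stated assumptions; the discount factor `gamma` and the helper name `backpropagate_credit` are illustrative and do not come from the talk.

```python
from typing import Dict, List

# The six-element predatory sequence from the slide, in temporal order.
SEQUENCE: List[str] = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]


def backpropagate_credit(sequence: List[str], reward: float, gamma: float) -> Dict[str, float]:
    """Hypothetical helper: credit reaching each element when reward flows backward.

    The last element (closest to the goal) receives reward * gamma**0;
    each step further back in time is discounted by another factor of gamma.
    """
    n = len(sequence)
    return {step: reward * gamma ** (n - 1 - i) for i, step in enumerate(sequence)}


if __name__ == "__main__":
    for step, credit in backpropagate_credit(SEQUENCE, reward=1.0, gamma=0.5).items():
        print(f"{step:10s} {credit:.5f}")
```

With `gamma=0.5`, "orient" receives only 1/32 of the reward, which illustrates why early elements learn slowly when credit has to travel all the way back from a distant goal.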

Page 22: Train Your Dog

The problem

• If each of the six elements in the sequence has 3 variants, there are 3^6 = 729 possible combinations, of which only 1 may work (ignoring stimuli)

• If there are 12 possible stimuli, there are 1,586,874,322,944 possible combinations of stimulus-action pairs to explore.

• The dog doesn’t know whether it is the right sequence until the goal is reached

• What happens if the “variant” itself needs to be learned?
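The first figure is plain exponentiation: six elements with three variants each give 3^6 = 729 sequences. The sketch below checks that arithmetic. It also confirms that the slide's second figure equals 108^6; the slide does not spell out how the 12 stimuli enter that count, so any further decomposition of it would be an assumption.

```python
def sequence_combinations(variants_per_element: int, num_elements: int) -> int:
    """Number of distinct sequences when every element can be performed in any variant."""
    return variants_per_element ** num_elements


if __name__ == "__main__":
    # Six elements (orient, eye, stalk, chase, grab-bite, kill-bite), three variants each.
    print(sequence_combinations(3, 6))  # 729
    # Arithmetic check only: the slide's stimulus-action figure is exactly 108**6.
    print(108 ** 6 == 1_586_874_322_944)  # True
```

Either way, the count grows exponentially with sequence length, which is why exhaustive search is hopeless.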

Page 23: Train Your Dog

Leyhausen’s suggestion…

Figure: the sequence orient, eye, stalk, chase, grab-bite, kill-bite over time, with “motivation & reward” attached to every element: each element is innately self-motivating and has an innate reward metric.
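One way to see why Leyhausen's suggestion helps: if each element carries its own reward metric, each can be tuned independently, so search cost grows additively (6 × 3 = 18 trials) instead of multiplicatively (3^6 = 729). This is a sketch under assumptions; the variant names and local reward scores in the example are invented purely for illustration.

```python
from typing import Callable, Dict, List, Tuple

Variant = str
LocalMetric = Callable[[Variant], float]  # hypothetical innate "goodness" of one element


def tune_independently(
    elements: Dict[str, Tuple[List[Variant], LocalMetric]],
) -> Tuple[Dict[str, Variant], int]:
    """Pick the best variant of each element using only that element's local reward.

    Returns the chosen variant per element and the total number of trials used.
    """
    chosen: Dict[str, Variant] = {}
    trials = 0
    for name, (variants, metric) in elements.items():
        chosen[name] = max(variants, key=metric)  # each variant is tried once
        trials += len(variants)
    return chosen, trials


if __name__ == "__main__":
    # Invented local scores, shared by every element for simplicity.
    scores = {"slow": 0.2, "medium": 0.9, "fast": 0.5}
    variants = ["slow", "medium", "fast"]
    steps = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]
    elements = {s: (variants, scores.__getitem__) for s in steps}
    chosen, trials = tune_independently(elements)
    print(chosen["stalk"], trials)  # medium 18
```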

Page 25: Train Your Dog

Coppinger’s suggestion…

Figure: the sequence orient, eye, stalk, chase, grab-bite, kill-bite over time, with a varying innate tendency to follow each behavior with the “next” one in the sequence.
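Coppinger's idea can be sketched as a chain in which each element triggers the next with some innate probability. The per-link probabilities are hypothetical and the links are assumed independent; varying them changes how far the sequence tends to run before it stops.

```python
import math
from typing import List


def completion_probability(follow_probs: List[float]) -> float:
    """Probability that the whole chain runs to the end (links assumed independent)."""
    return math.prod(follow_probs)


def expected_reach(follow_probs: List[float]) -> List[float]:
    """Probability of reaching each element, given that the first element is performed.

    follow_probs[i] is the hypothetical innate tendency to follow element i with element i + 1.
    """
    reach = [1.0]
    for p in follow_probs:
        reach.append(reach[-1] * p)
    return reach


if __name__ == "__main__":
    # Hypothetical tendencies for orient->eye, eye->stalk, stalk->chase,
    # chase->grab-bite and grab-bite->kill-bite.
    links = [0.9, 0.9, 0.5, 0.5, 0.1]
    print(expected_reach(links))
    print(completion_probability(links))
```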

Page 26: Train Your Dog

Functional goal plays incidental role

Figure: the sequence orient, eye, stalk, chase, grab-bite, kill-bite over time, ending in “Yumm..”; propagated value from the functional goal plays only an incidental role.

Page 27: Train Your Dog

Big idea: innate biases make learning possible

• Biases include…

• Temporal proximity implies causality

• Attend more readily to certain classes of stimuli than to others (motion vs. speech)

• Lazy discovery (pay attention once you have a reason to pay attention)

• Elements may be “innately” self-motivating and have local metric of “goodness”

Page 28: Train Your Dog

Good trainers actively guide the dog’s exploration

• Behavioral

• Train behavior, then cue

• Differential rewards encourage variability

• Motor

• Shaping

• Rewarding successive approximations

• Luring

• Pose, e.g. “down”

• Trajectory, e.g. “figure-8”
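Shaping by rewarding successive approximations can be sketched as a criterion that tightens after each success. Everything here is an assumption for illustration: the one-dimensional "closeness to the target pose" score, the starting criterion and the step size.

```python
from typing import List, Tuple


def shape(attempts: List[float], start_criterion: float, step: float) -> Tuple[List[bool], float]:
    """Reward any attempt that meets the current criterion, then raise the bar.

    `attempts` are hypothetical scores in [0, 1], where 1.0 means the target
    pose or trajectory was performed exactly.
    """
    criterion = start_criterion
    rewarded: List[bool] = []
    for score in attempts:
        hit = score >= criterion
        rewarded.append(hit)
        if hit:
            # Ask for a slightly closer approximation next time.
            criterion = min(1.0, criterion + step)
    return rewarded, criterion


if __name__ == "__main__":
    print(shape([0.3, 0.4, 0.6, 0.7, 0.9], start_criterion=0.25, step=0.25))
    # ([True, False, True, False, True], 1.0)
```

Raising the criterion only after a success keeps the reward rate high enough that the learner keeps trying, while still pulling performance toward the target.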

Page 29: Train Your Dog

Dogs constrain search for causal agents

Figure: timeline of Sit, Approach and Eat, marking the attention window (cue given immediately before or as the dog is moving into the desired pose), the consequences window (trainer “clicks”, signaling that a reward is coming) and when the reward is actually received.

Dogs make the problem tractable by constraining their search for causal agents to narrow temporal windows.
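The temporal-window heuristic can be sketched as a filter over a time-stamped event log: only actions inside the consequences window before the click, paired with cues inside the attention window before each action, count as candidate causes. The `Event` type, the window widths and the event names are hypothetical; the slide gives no numbers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    time: float  # seconds (hypothetical units)
    kind: str    # "cue" or "action"
    label: str


def candidate_causes(
    events: List[Event], click_time: float, consequence_window: float, attention_window: float
) -> List[Tuple[str, str]]:
    """Return (cue, action) pairs that could plausibly have earned the click."""
    actions = [e for e in events
               if e.kind == "action" and click_time - consequence_window <= e.time <= click_time]
    pairs: List[Tuple[str, str]] = []
    for act in actions:
        # Cues given shortly before, or as, the action begins.
        cues = [e for e in events
                if e.kind == "cue" and act.time - attention_window <= e.time <= act.time]
        for cue in cues:
            pairs.append((cue.label, act.label))
        if not cues:
            pairs.append(("<no cue>", act.label))
    return pairs


if __name__ == "__main__":
    log = [Event(0.0, "cue", "whistle"), Event(3.0, "action", "beg"),
           Event(4.6, "cue", "sit-utterance"), Event(5.0, "action", "sit")]
    print(candidate_causes(log, click_time=5.5, consequence_window=1.0, attention_window=1.0))
    # [('sit-utterance', 'sit')]
```

Narrow windows discard the whistle and the beg entirely, so only one stimulus-action pair is left to evaluate instead of every combination in the log.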

Page 30: Train Your Dog

Dogs use implicit feedback to guide perceptual learning

Figure: timeline in which the “sit-utterance” is perceived, the dog decides to sit, the “click” is perceived, and the dog approaches and eats; the dog then builds and updates its perceptual model of the “sit-utterance”.

Dogs use the rewarded action to identify potentially promising states to explore and to guide the formation of perceptual models.
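Building and updating a perceptual model from implicitly labeled examples can be sketched as an incremental prototype: each utterance heard before a rewarded sit nudges a running-mean feature vector, and later utterances are matched by distance to that mean. The feature vectors, the distance threshold and the class name `PrototypeModel` are hypothetical; the talk does not say how its perceptual models are represented.

```python
import math
from typing import List, Optional


class PrototypeModel:
    """Hypothetical running-mean prototype of a percept such as the "sit-utterance"."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.mean: Optional[List[float]] = None
        self.count = 0

    def update(self, features: List[float]) -> None:
        """Fold one new example into the running mean (incremental average)."""
        self.count += 1
        if self.mean is None:
            self.mean = list(features)
            return
        self.mean = [m + (x - m) / self.count for m, x in zip(self.mean, features)]

    def matches(self, features: List[float]) -> bool:
        """True if the input lies within `threshold` (Euclidean distance) of the prototype."""
        if self.mean is None:
            return False
        return math.dist(self.mean, features) <= self.threshold


if __name__ == "__main__":
    model = PrototypeModel(threshold=1.0)
    model.update([1.0, 0.0])  # utterance heard before a rewarded sit
    model.update([3.0, 2.0])  # another one
    print(model.mean, model.matches([2.0, 1.5]), model.matches([5.0, 5.0]))
    # [2.0, 1.0] True False
```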

Page 31: Train Your Dog

Dogs give credit where credit is due…

• Trainer repeatedly lures dog through a trajectory or into a pose

• Eventually, dog performs behavior spontaneously

• Implication

• Dog associates reward with the resulting body configuration or trajectory, and not just with “follow-your-nose”

Page 32: Train Your Dog

Observation: dogs give credit where credit is due

Figure: timeline in which the “sit-utterance” is perceived, the dog decides to sit, the “click” is perceived, and the dog approaches and eats. The dog then:

1. Credits sitting in the presence of the “sit-utterance”

2. Builds and updates its perceptual model of the “sit-utterance”
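Step 1 on the slide, crediting sitting *in the presence of* the sit-utterance, can be sketched as a value table keyed by (percept, action) pairs rather than by actions alone, nudged toward each received reward. The learning rate and table layout are assumptions; this is a simple running-average update, not necessarily the one used in the system described in the talk.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

Key = Tuple[str, str]  # (percept, action)


class ContextualCredit:
    """Hypothetical table of how rewarding each action has been in each perceptual context."""

    def __init__(self, learning_rate: float) -> None:
        self.alpha = learning_rate
        self.value: DefaultDict[Key, float] = defaultdict(float)

    def credit(self, percept: str, action: str, reward: float) -> None:
        """Move the value of (percept, action) a fraction alpha toward the reward."""
        key = (percept, action)
        self.value[key] += self.alpha * (reward - self.value[key])

    def best_action(self, percept: str) -> str:
        """Highest-valued action seen in this context (raises ValueError if none seen)."""
        candidates = {a: v for (p, a), v in self.value.items() if p == percept}
        return max(candidates, key=candidates.get)


if __name__ == "__main__":
    table = ContextualCredit(learning_rate=0.5)
    table.credit("sit-utterance", "sit", 1.0)
    table.credit("sit-utterance", "sit", 1.0)
    table.credit("sit-utterance", "down", 0.0)
    print(table.best_action("sit-utterance"))  # sit
```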

Page 33: Train Your Dog

D.L.: Take Advantage of Predictable Regularities

• Constrain search for causal agents by taking advantage of temporal proximity & the natural hierarchy of state spaces

• Use consequences to bias choice of action

• But vary performance and attend to differences

• Explore state and action spaces on an “as-needed” basis

• Build models on demand
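Exploring on an as-needed basis can be sketched as a hierarchy of percepts that is refined lazily: a finer-grained percept node is created only when something rewarded happens while it is active, so unrewarded parts of the stimulus space are never modeled. The node names, the path representation and the `on_reward` hook are hypothetical.

```python
from typing import Dict, List


class PerceptNode:
    """Node in a hypothetical hierarchy of percepts, e.g. sound -> speech -> "sit-utterance"."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: Dict[str, "PerceptNode"] = {}
        self.reward_count = 0

    def refine(self, path: List[str]) -> "PerceptNode":
        """Create (only if missing) the child percepts along `path`; return the deepest one."""
        node = self
        for name in path:
            if name not in node.children:
                node.children[name] = PerceptNode(name)  # built on demand
            node = node.children[name]
        return node

    def size(self) -> int:
        """Total number of nodes built so far, including this one."""
        return 1 + sum(child.size() for child in self.children.values())


def on_reward(root: PerceptNode, active_path: List[str]) -> None:
    """Hypothetical hook: refine the hierarchy only when a rewarded action occurs."""
    root.refine(active_path).reward_count += 1


if __name__ == "__main__":
    root = PerceptNode("stimulus")
    on_reward(root, ["sound", "speech", "sit-utterance"])
    on_reward(root, ["sound", "speech", "sit-utterance"])
    print(root.size())  # 4
```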

Page 34: Train Your Dog

D.L.: Make Use of All Feedback: Explicit & Implicit

• Use rewarded action as context for identifying

• Promising state space and action space to explore

• Good examples from which to construct perceptual models, e.g.,

• A good example of a “sit-utterance” is one that occurs within the context of a rewarded Sit.
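Using the rewarded action as context for picking examples can be sketched as a filter over logged episodes: only utterances heard before a *rewarded* instance of the target action become training examples for that percept. The `Episode` record and its fields are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Episode(NamedTuple):
    utterance: Sequence[float]  # hypothetical features of what was heard just before acting
    action: str
    rewarded: bool


def training_examples(episodes: List[Episode], action: str) -> List[Sequence[float]]:
    """Keep only utterances heard in the context of a rewarded instance of `action`."""
    return [ep.utterance for ep in episodes if ep.action == action and ep.rewarded]


if __name__ == "__main__":
    log = [
        Episode([0.1, 0.2], "sit", True),    # good example of a "sit-utterance"
        Episode([0.9, 0.8], "sit", False),   # sit, but not rewarded: ignored
        Episode([0.4, 0.4], "down", True),   # rewarded, but a different action: ignored
    ]
    print(training_examples(log, "sit"))  # [[0.1, 0.2]]
```

The trainer never labels the utterance directly; the reward supplies the label implicitly.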

Page 35: Train Your Dog

D.L.: Make Them Easy to Train

• Respond quickly to “obvious” contingencies

• Support Luring and Shaping

• Techniques to prompt infrequently expressed or novel motor actions

• “Trainer friendly” credit assignment

• Assign credit to the candidate that matches the trainer’s expectation
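One possible reading of "trainer friendly" credit assignment, offered here as an assumption rather than the talk's rule: when several actions are candidates, credit the one in progress at the moment of the click, since that is what the trainer was marking, and otherwise the most recently finished one. The `Performance` record and the tie-breaking rule are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Performance(NamedTuple):
    action: str
    start: float  # seconds (hypothetical units)
    end: float


def trainer_friendly_credit(candidates: List[Performance], click_time: float) -> Optional[str]:
    """Credit the action in progress at the click; otherwise the most recently finished one."""
    in_progress = [p for p in candidates if p.start <= click_time <= p.end]
    if in_progress:
        # If several overlap, prefer the one that started most recently.
        return max(in_progress, key=lambda p: p.start).action
    finished = [p for p in candidates if p.end < click_time]
    if finished:
        return max(finished, key=lambda p: p.end).action
    return None


if __name__ == "__main__":
    history = [Performance("beg", 0.0, 2.0), Performance("sit", 2.5, 6.0)]
    print(trainer_friendly_credit(history, click_time=4.0))  # sit
```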

Page 36: Train Your Dog

The System

Page 37: Train Your Dog

Dobie T. Coyote…

See Dobie video on my website

Page 38: Train Your Dog

Limitations and Future Work

• Important extensions

• Other kinds of learning (e.g., social or spatial)

• Generalization

• Sequences

• Expectation-based emotion system

• How will the system scale?

Page 39: Train Your Dog

Useful Insights

• Use

• Temporal proximity to limit search

• Hierarchical representations of state, action and state-action space, and implicit feedback to guide exploration

• “Trainer friendly” credit assignment

• Luring and shaping are essential

Page 40: Train Your Dog

Acknowledgements

• Members of the Synthetic Characters Group, past, present & future

• Gary Wilkes

• Funded by the Digital Life Consortium

