train your dog
Post on 08-Jul-2015
Learning from how dogs learn
Prof. Bruce Blumberg
The Media Lab, MIT
bruce@media.mit.edu
www.media.mit.edu/~bruce
About me…
Practical & compelling real-time learning
• Easy for interactive characters to learn what they ought to be able to learn
• Easy for a human trainer to guide learning process
• A compelling user experience
• Provide heuristics and practical design principles
My bias & focus
• Learning occurs within an innate structure that biases…
• Attention
• Motivation
• Innate frequency, form and organization of behavior
• When certain things are most easily learned
• What are the catalytic components of the scaffolding that make learning possible?
sheep|dog:trial by eire
See sheep|dog video on my website
Object persistence
See object persistence video on my website
Temporal representation
See temporal representation (aka Goatzilla) video on my website
Alpha Wolf
See alpha wolf video on my website
Rover@home
See rover@home video on my website or go to Scientific American Frontiers website
Dobie T. Coyote Goes to School
See Dobie video on my website
Why look at Dog Training?
• Interactive characters pose unique challenges:
• State, action and state-action spaces are often continuous and far too big to search exhaustively
• To be compelling, characters must
• Learn “obvious” contingencies between state, actions and consequences quickly
• Be easy to train without visibility into internal state of character
• Learning is only one thing they have to do.
• Dogs and their trainers seem to solve these problems easily
Invaluable resources
• Doing it, and talking to people who do it.
• Wilkes, Pryor, Ramirez
• Lindsay, Burch & Bailey, Mackintosh
• Lorenz, Leyhausen, Coppinger & Coppinger
The problem facing dogs (real and synthetic)
Set of all possible actions
Set of all motivational goals
Set of all possible stimuli
What do I do, when, in order to best satisfy my motivational goals?
The space of possible stimuli is wicked big
Figure: state space. The set of all possible stimuli, laid out by modality of stimuli (smells, motion, sounds, dog sounds, speech, whistles) against time of occurrence.
The space of possible actions is also very big
Figure: action space. The set of all possible actions (figure-8, shake, low shake, high-5, beg, down, left ear twitch), laid out by action against time of performance.
Who gets credit for good things happening?
Figure: a reward arrives (“Yumm..”). Candidates for credit span the action space (figure-8, shake, low shake, high-5, beg, down, left ear twitch) and the modality of stimuli (motion, sounds, dog sounds, speech, whistles).
Who gets credit for good things happening?
Figure: predatory sequence over time: orient → eye → stalk → chase → grab-bite → kill-bite → “Yumm..”
Conventional idea: back propagation from goal
Figure: predatory sequence over time: orient → eye → stalk → chase → grab-bite → kill-bite → “Yumm..”. Credit flows backward from the goal.
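The "credit flows backward" idea can be sketched in a few lines. This is only an illustration: the discount factor, the reward value, and the function name `propagate_credit` are assumptions, and discounted propagation is one common way to pass credit backward, not necessarily the scheme the talk has in mind.

```python
# Minimal sketch of backward credit propagation along a fixed sequence.
# The reward value and discount factor are illustrative assumptions.
SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]


def propagate_credit(sequence, reward, discount):
    """Give the full reward to the last element, then pass a discounted
    share back to each earlier element in turn."""
    credit = {}
    value = reward
    for action in reversed(sequence):
        credit[action] = value
        value *= discount  # earlier elements receive less credit
    return credit


credit = propagate_credit(SEQUENCE, reward=1.0, discount=0.5)
for action in SEQUENCE:
    print(f"{action:10s} {credit[action]:.5f}")
```

Note that no element receives any credit until the goal at the end of the sequence has actually been reached.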
The problem
• If each element in sequence has 3 variants, there are 729 possible combinations of which 1 may work (ignoring stimuli)
• If there are 12 possible stimuli, there are 1,586,874,322,944 possible combinations of stimuli-action pairs to explore.
• Don’t know if it is the right sequence until goal is reached
• What happens if “variant” needs to be learned?
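The arithmetic behind these counts can be checked directly. The sketch below assumes the six-element sequence from the earlier figure (3^6 = 729 matches that). The slide does not say how its larger figure was derived; it equals 108^6, so the "36 options per element" reading shown alongside it is only one possible interpretation, included for comparison.

```python
# Counting candidate sequences. Variable names are illustrative.
ELEMENTS = 6   # orient, eye, stalk, chase, grab-bite, kill-bite
VARIANTS = 3   # variants per element
STIMULI = 12   # possible stimuli

# Action sequences only (ignoring stimuli).
action_sequences = VARIANTS ** ELEMENTS

# The figure quoted on the slide, which equals 108 ** 6.
slide_figure = 1_586_874_322_944

# One alternative reading (an assumption): each element pairs one of
# 3 variants with one of 12 stimuli, giving 36 options per element.
paired_sequences = (VARIANTS * STIMULI) ** ELEMENTS

print(action_sequences)
print(slide_figure == 108 ** ELEMENTS)
print(paired_sequences)
```

Under either reading the space is far too large to search exhaustively, which is the point of the slide.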
Leyhausen’s suggestion…
Figure: predatory sequence over time: orient → eye → stalk → chase → grab-bite → kill-bite, each element labeled “motivation & reward”.
Each element is innately self-motivating and has an innate reward metric
Coppinger’s suggestion…
Figure: predatory sequence over time: orient → eye → stalk → chase → grab-bite → kill-bite
Varying innate tendency to follow behavior with “next” in sequence
Functional goal plays incidental role
Figure: predatory sequence over time: orient → eye → stalk → chase → grab-bite → kill-bite → “Yumm..”
Propagated value from functional goal plays incidental role
Big idea: innate biases make learning possible
• Biases include…
• Temporal proximity implies causality
• Attend more readily to certain classes of stimuli than to others (motion vs. speech)
• Lazy discovery (pay attention once you have a reason to pay attention)
• Elements may be “innately” self-motivating and have local metric of “goodness”
Good trainers actively guide dog’s exploration
• Behavioral
• Train behavior, then cue
• Differential rewards encourage variability
• Motor
• Shaping
• Rewarding successive approximations
• Luring
• Pose, e.g. “down”
• Trajectory, e.g. “figure-8”
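Shaping by rewarding successive approximations can be sketched as a loop in which the learner varies its performance and the trainer rewards only attempts that beat the current criterion. Everything here is a simplifying assumption made for illustration: the behavior is reduced to a single number, and the function `shape` and its parameters are hypothetical, not taken from the system described in the talk.

```python
import random


def shape(target, start, trials, noise, seed=0):
    """Toy shaping loop over a one-dimensional behavior.

    habit      -- the learner's current typical performance
    best_error -- the trainer's current criterion for a reward
    """
    rng = random.Random(seed)
    habit = start
    best_error = abs(target - habit)
    for _ in range(trials):
        # The learner varies its performance around its current habit.
        attempt = habit + rng.uniform(-noise, noise)
        error = abs(target - attempt)
        # The trainer rewards only a closer approximation than before.
        if error < best_error:
            habit = attempt      # the rewarded variant becomes the habit
            best_error = error   # the criterion is raised
    return habit


final = shape(target=1.0, start=0.0, trials=200, noise=0.2)
print(abs(1.0 - final))
```

Because only closer attempts are rewarded, the distance to the target never increases from one trial to the next.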
Dogs constrain search for causal agents
Figure: timeline Sit → Approach → Eat
• Attention window: cue given immediately before or as dog is moving into desired pose
• Consequences window: trainer “clicks”, signaling reward is coming
• Reward is actually received later (Eat)
Dogs make the problem tractable by constraining search for causal agents to narrow temporal windows
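One way to picture the two temporal windows is as filters over a log of timestamped events. The sketch below is an assumption-laden illustration: the `Event` type, the function `credit_candidates`, and the window lengths are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: float   # seconds
    kind: str     # "stimulus" or "action"
    name: str


def credit_candidates(events, click_time, consequence_window, attention_window):
    """Return (action, cues): the most recent action inside the
    consequences window before the click, and the stimuli inside the
    attention window just before that action."""
    actions = [e for e in events
               if e.kind == "action"
               and 0 <= click_time - e.time <= consequence_window]
    if not actions:
        return None, []
    action = max(actions, key=lambda e: e.time)
    cues = [e.name for e in events
            if e.kind == "stimulus"
            and 0 <= action.time - e.time <= attention_window]
    return action.name, cues


log = [
    Event(0.0, "stimulus", "whistle"),
    Event(2.0, "action", "shake"),
    Event(4.6, "stimulus", "sit-utterance"),
    Event(5.0, "action", "sit"),
]
result = credit_candidates(log, click_time=5.5,
                           consequence_window=1.0, attention_window=0.5)
print(result)
```

With these example values, the earlier whistle and shake fall outside both windows, so only the sit and the utterance just before it remain as candidates.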
Dogs use implicit feedback to guide perceptual learning
Figure: timeline Sit → Approach → Eat, annotated with: dog decides to sit; “sit-utterance” perceived; “click” perceived; build & update perceptual model of “sit-utterance”.
Dogs use rewarded action to identify potentially promising state to explore and to guide formation of perceptual models
Dogs give credit where credit is due…
• Trainer repeatedly lures dog through a trajectory or into a pose
• Eventually, dog performs behavior spontaneously
• Implication
• Dog associates reward with resulting body configuration or trajectory and not just with “follow your nose”
Observation: dogs give credit where credit is due
Figure: timeline Sit → Approach → Eat, annotated with: dog decides to sit; “sit-utterance” perceived; “click” perceived.
1. Credit sitting in presence of “sit-utterance”
2. Build & update perceptual model of “sit-utterance”
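The two steps can be sketched as follows: a rewarded Sit both earns credit and supplies a training example for the perceptual model of the utterance heard with it. This is a deliberately simple stand-in. The running-mean "prototype" model, the class `UtteranceModel`, the function `learn_from_trials`, and the two-number feature vectors are all assumptions for illustration; the talk does not specify how its perceptual models are built.

```python
class UtteranceModel:
    """Toy perceptual model: a running mean of feature vectors."""

    def __init__(self):
        self.count = 0
        self.mean = None

    def update(self, features):
        self.count += 1
        if self.mean is None:
            self.mean = list(features)
        else:
            # Incremental mean update for each feature.
            self.mean = [m + (x - m) / self.count
                         for m, x in zip(self.mean, features)]

    def distance(self, features):
        # Euclidean distance from the learned prototype.
        return sum((m - x) ** 2 for m, x in zip(self.mean, features)) ** 0.5


def learn_from_trials(trials):
    """Each trial is (heard_features, action, rewarded)."""
    model = UtteranceModel()
    sit_credit = 0
    for heard, action, rewarded in trials:
        if action == "sit" and rewarded:
            sit_credit += 1      # 1. credit sitting in this context
            model.update(heard)  # 2. use the utterance as a good example
    return model, sit_credit


trials = [
    ([1.0, 0.0], "sit", True),
    ([3.0, 2.0], "sit", True),
    ([9.0, 9.0], "sit", False),   # unrewarded: not used as an example
    ([8.0, 8.0], "down", True),   # different action: not used either
]
model, sit_credit = learn_from_trials(trials)
print(model.mean, sit_credit)
```

Only utterances that occur in the context of a rewarded Sit shape the model, so unrewarded or unrelated sounds never need to be modeled at all.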
D.L.: Take Advantage of Predictable Regularities
• Constrain search for causal agents by taking advantage of temporal proximity & natural hierarchy of state spaces
• Use consequences to bias choice of action
• But vary performance and attend to differences
• Explore state and action spaces on “as-needed” basis
• Build models on demand
D.L.: Make Use of All Feedback: Explicit & Implicit
• Use rewarded action as context for identifying
• Promising state space and action space to explore
• Good examples from which to construct perceptual models, e.g.,
• A good example of a “sit-utterance” is one that occurs within the context of a rewarded Sit.
D.L.: Make Them Easy to Train
• Respond quickly to “obvious” contingencies
• Support Luring and Shaping
• Techniques to prompt infrequently expressed or novel motor actions
• “Trainer friendly” credit assignment
• Assign credit to candidate that matches trainer’s expectation
The System
Dobie T. Coyote…
See dobie video on my website
Limitations and Future Work
• Important extensions
• Other kinds of learning (e.g., social or spatial)
• Generalization
• Sequences
• Expectation-based emotion system
• How will the system scale?
Useful Insights
• Use
• Temporal proximity to limit search.
• Hierarchical representations of state, action and state-action space & use implicit feedback to guide exploration
• “trainer friendly” credit assignment
• Luring and shaping are essential
Acknowledgements
• Members of the Synthetic Characters Group, past, present & future
• Gary Wilkes
• Funded by the Digital Life Consortium