train your dog

Post on 08-Jul-2015

DESCRIPTION

Informational slide that shows how to train your dog properly and what is needed in order to succeed in your dog training.

TRANSCRIPT

  • Learning from how dogs learn
      • Prof. Bruce Blumberg
      • The Media Lab, MIT
      • bruce@media.mit.edu
      • www.media.mit.edu/~bruce

  • About me

  • Practical & compelling real-time learning
      • Easy for interactive characters to learn what they ought to be able to learn
      • Easy for a human trainer to guide learning process
      • A compelling user experience
      • Provide heuristics and practical design principles

  • My bias & focus
      • Learning occurs within an innate structure that biases:
          • Attention
          • Motivation
          • Innate frequency, form and organization of behavior
          • When certain things are most easily learned
      • What are the catalytic components of the scaffolding that make learning possible?

  • sheep|dog: trial by eire
      • See sheep|dog video on my website

  • Object persistence
      • See object persistence video on my website

  • Temporal representation
      • See temporal representation (aka Goatzilla) video on my website

  • Alpha Wolf
      • See alpha wolf video on my website

  • Rover@home
      • See rover@home video on my website or go to Scientific American Frontiers website

  • Dobie T. Coyote Goes to School
      • See Dobie video on my website

  • Why look at Dog Training?
      • Interactive characters pose unique challenges:
          • State, action and state-action spaces are often continuous and far too big to search exhaustively
          • To be compelling, characters must:
              • Learn obvious contingencies between state, actions and consequences quickly
              • Be easy to train without visibility into internal state of character
          • Learning is only one thing they have to do
      • Dogs and their trainers seem to solve these problems easily

  • Invaluable resources
      • Doing it, and talking to people who do it
      • Wilkes, Pryor, Ramirez
      • Lindsay, Burch & Bailey, Mackintosh
      • Lorenz, Leyhausen, Coppinger & Coppinger

  • The problem facing dogs (real and synthetic)
      • (Diagram labels: Set of all possible actions; Set of all motivational goals; Set of all possible stimuli)
      • What do I do, when, in order to best satisfy my motivational goals?

  • The space of possible stimuli is wicked big
      • (Diagram labels: Time of Occurrence; State Space)

  • The space of possible actions is also very big
      • (Diagram labels: Set of all possible actions; Action; Time of Performance; Action Space)

  • Who gets credit for good things happening?
      • (Diagram labels: Yumm..; Action: Figure-8, Shake, High-5, Beg, Down, Left ear twitch; Modality of Stimuli)

  • Who gets credit for good things happening?
      • (Diagram labels: Yumm..; Time)

  • Conventional idea: back propagation from goal
      • (Diagram labels, listed here in the usual predatory-sequence order: orient, eye, stalk, chase, grab-bite, kill-bite; Yumm..; Time)
      • Credit flows backward
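
The "credit flows backward" idea on this slide can be sketched as discounted credit assigned along a fixed behavior chain. This is only an illustration, not the author's system: the discount factor `gamma`, the reward value and the function name `backward_credit` are assumptions introduced here, and the element order follows the usual predatory sequence rather than anything stated on the slide.

```python
# Hypothetical sketch: credit from a terminal reward flows backward along a chain.
SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]


def backward_credit(sequence, reward=1.0, gamma=0.5):
    """Return {element: credit}; credit shrinks by `gamma` per step back from the goal."""
    credit = {}
    value = reward
    for element in reversed(sequence):
        credit[element] = value
        value *= gamma  # assumed discount; the slide gives no numbers
    return credit


credit = backward_credit(SEQUENCE)
for element in SEQUENCE:
    print(f"{element:10s} {credit[element]:.5f}")
```

With these assumed numbers the first element of the chain receives only a small fraction of the credit, which hints at why learning a long sequence purely from the final goal is slow.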

  • The problem
      • If each element in sequence has 3 variants, there are 729 possible combinations, of which 1 may work (ignoring stimuli)
      • If there are 12 possible stimuli, there are 1,586,874,322,944 possible combinations of stimuli-action pairs to explore
      • Don't know if it is the right sequence until goal is reached
      • What happens if a variant needs to be learned?
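
The counts on this slide can be checked with a few lines of arithmetic, assuming the sequence has the six elements shown on the neighboring slides. The slide does not say how the stimuli figure was derived. It happens to equal 108 to the sixth power, whereas a plain pairing of 3 variants with 12 stimuli per element would give 36 to the sixth power. Both are shown below as arithmetic checks, not as the author's derivation.

```python
ELEMENTS = 6   # assumed: orient, eye, stalk, chase, grab-bite, kill-bite
VARIANTS = 3   # variants per element, from the slide
STIMULI = 12   # possible stimuli, from the slide

# Action sequences only (ignoring stimuli).
action_only = VARIANTS ** ELEMENTS
print(action_only)

# The slide's stimuli-action figure, and the per-element count that reproduces it.
slide_total = 1_586_874_322_944
print(108 ** ELEMENTS == slide_total)

# For comparison: one (variant, stimulus) pair per element.
simple_pairing = (VARIANTS * STIMULI) ** ELEMENTS
print(simple_pairing)
```

Either way, the search space grows exponentially with the length of the sequence, which is the point the slide is making.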

  • Leyhausen's suggestion
      • (Diagram labels, listed here in the usual predatory-sequence order: orient, eye, stalk, chase, grab-bite, kill-bite, each tagged "motivation & reward"; Time)
      • Each element is innately self-motivating and has innate reward metric

  • Coppinger's suggestion
      • (Diagram labels, listed here in the usual predatory-sequence order: orient, eye, stalk, chase, grab-bite, kill-bite; Time)
      • Varying innate tendency to follow behavior with next in sequence

  • Functional goal plays incidental role
      • (Diagram labels, listed here in the usual predatory-sequence order: orient, eye, stalk, chase, grab-bite, kill-bite; Yumm..; Time)
      • Propagated value from functional goal plays incidental role
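
One way to read this slide is that each element's value is dominated by its own innate reward, with only a small contribution from value propagated back from the functional goal. The sketch below illustrates that reading. The weights, the discount and the names `element_value` and `goal_weight` are all hypothetical and do not come from the talk.

```python
SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]


def element_value(innate_reward, propagated_goal_value, goal_weight=0.1):
    """Local innate reward plus a small (assumed) share of propagated goal value."""
    return innate_reward + goal_weight * propagated_goal_value


# Assumed: every element is equally self-rewarding.
innate = {name: 1.0 for name in SEQUENCE}

# Assumed: goal value halves with each step away from the goal.
last = len(SEQUENCE) - 1
propagated = {name: 0.5 ** (last - i) for i, name in enumerate(SEQUENCE)}

values = {
    name: element_value(innate[name], propagated[name]) for name in SEQUENCE
}
for name in SEQUENCE:
    print(f"{name:10s} {values[name]:.4f}")
```

Under these assumptions even the earliest element keeps almost all of its value when the propagated goal value is close to zero, so the chain does not depend on credit arriving from the end.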

  • Big idea: innate biases make learning possible
      • Biases include:
          • Temporal proximity implies causality
          • Attend more readily to certain classes of stimuli than to others (motion vs. speech)
          • Lazy discovery (pay attention once you have a reason to pay attention)
          • Elements may be innately self-motivating and have local metric of goodness

  • Good trainers actively guide dog's exploration
      • Behavioral
          • Train behavior, then cue
          • Differential rewards encourage variability
      • Motor
          • Shaping
              • Rewarding successive approximations
          • Luring
              • Pose, e.g. down
              • Trajectory, e.g. figure-8
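
"Rewarding successive approximations" can be sketched as a reward criterion that tightens each time the learner meets it. This is a toy illustration under stated assumptions: each attempt is scored as a percentage of the target pose, and the function name `shape`, the starting criterion and the step size are hypothetical.

```python
def shape(attempts, target=100, start_criterion=20, step=20):
    """Reward attempts that meet the current criterion, then raise the criterion."""
    criterion = start_criterion
    rewarded = []
    for quality in attempts:
        if quality >= criterion:
            rewarded.append(quality)
            # Ask for a slightly better approximation next time.
            criterion = min(target, criterion + step)
    return rewarded, criterion


# Made-up attempt scores (percent of the way to the desired pose).
attempts = [10, 25, 30, 45, 50, 70, 90, 100]
rewarded, final_criterion = shape(attempts)
print(rewarded, final_criterion)
```

Attempts that would have earned a reward early on (for example 30 after the criterion has moved to 40) go unrewarded later, which is what pushes the behavior toward the target.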

  • Dogs constrain search for causal agents
      • (Timeline labels: "Sit"; Approach; Eat; Time)
      • Attention Window:
          • Cue given immediately before or as dog is moving into desired pose
      • Consequences Window:
          • Trainer clicks, signaling reward is coming
          • When reward is actually received
      • Dogs make the problem tractable by constraining search for causal agents to narrow temporal windows
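
The two temporal windows can be sketched as simple time filters over an event log. This is a minimal illustration, not the system described in the talk: the `Event` type, the window lengths and the function name `credit_candidates` are assumptions, and for simplicity the attention window here only covers cues that arrive before the action starts.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: float  # seconds since start of session
    kind: str    # "cue", "action", "click" or "reward"
    label: str


def credit_candidates(events, attention_window=1.0, consequences_window=2.0):
    """For each clicked action, list the cues heard shortly before it."""
    results = []
    for action in (e for e in events if e.kind == "action"):
        # Consequences window: was there a click soon after the action?
        clicked = any(
            e.kind == "click" and 0 <= e.time - action.time <= consequences_window
            for e in events
        )
        if not clicked:
            continue
        # Attention window: only cues just before the action are candidates.
        cues = [
            e.label
            for e in events
            if e.kind == "cue" and 0 <= action.time - e.time <= attention_window
        ]
        results.append((action.label, cues))
    return results


events = [
    Event(0.0, "cue", "doorbell"),       # too early to be a candidate
    Event(4.6, "cue", "sit-utterance"),
    Event(5.0, "action", "sit"),
    Event(5.8, "click", "click"),
    Event(7.5, "reward", "treat"),
]
candidates = credit_candidates(events)
print(candidates)
```

Narrowing both windows shrinks the set of stimuli that have to be considered as causes, at the cost of missing cues that arrive early or rewards that arrive late.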

  • Dogs use implicit feedback to guide perceptual learning
      • (Timeline labels: Sit; Approach; Eat; Time)
      • Timeline events: "sit-utterance" perceived; dog decides to sit; click perceived
      • Build & update perceptual model of sit-utterance
      • Dogs use rewarded action to identify potentially promising state to explore and to guide formation of perceptual models

  • Dogs give credit where credit is due
      • Trainer repeatedly lures dog through a trajectory or into a pose
      • Eventually, dog performs behavior spontaneously
      • Implication
          • Dog associates reward with resulting body configuration or trajectory and not just with "follow your nose"

  • Observation: dogs give credit where credit is due
      • (Timeline labels: Sit; Approach; Eat; Time)
      • Timeline events: "sit-utterance" perceived; dog decides to sit; click perceived
      • Credit sitting in presence of sit-utterance
      • Build & update perceptual model of sit-utterance

  • D.L.: Take Advantage of Predictable Regularities
      • Constrain search for causal agents by taking advantage of temporal proximity & natural hierarchy of state spaces
      • Use consequences to bias choice of action
          • But vary performance and attend to differences
      • Explore state and action spaces on as-needed basis
          • Build models on demand

  • D.L.: Make Use of All Feedback: Explicit & Implicit
      • Use rewarded action as context for identifying:
          • Promising state space and action space to explore
          • Good examples from which to construct perceptual models, e.g., a good example of a sit-utterance is one that occurs within the context of a rewarded Sit
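
The sit-utterance example can be sketched as follows: only utterances heard in the context of a rewarded Sit are used to update the perceptual model. This is an illustration under stated assumptions. The two-number feature vectors stand in for real acoustic features, the model is just a running mean, and the names `update_prototype` and `episodes` are hypothetical.

```python
def update_prototype(prototype, count, example):
    """Fold one example into a running mean; return the new (prototype, count)."""
    count += 1
    prototype = [p + (x - p) / count for p, x in zip(prototype, example)]
    return prototype, count


# Made-up data: (utterance features, action taken, was the action rewarded?)
episodes = [
    ([1.0, 0.0], "sit", True),
    ([0.0, 4.0], "sit", False),   # unrewarded: not treated as a good example
    ([3.0, 2.0], "sit", True),
    ([9.0, 9.0], "down", True),   # rewarded, but in the context of another action
]

prototype, count = [0.0, 0.0], 0
for features, action, rewarded in episodes:
    if action == "sit" and rewarded:
        prototype, count = update_prototype(prototype, count, features)

print(prototype, count)
```

The reward is never a label on the utterance itself. It acts as implicit feedback that selects which utterances are worth learning from.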

  • D.L.: Make Them Easy to Train
      • Respond quickly to obvious contingencies
      • Support Luring and Shaping
          • Techniques to prompt infrequently expressed or novel motor actions
      • Trainer-friendly credit assignment
          • Assign credit to candidate that matches trainer's expectation

  • The System

  • Dobie T. Coyote
      • See Dobie video on my website

  • Limitations and Future Work
      • Important extensions
          • Other kinds of learning (e.g., social or spatial)
          • Generalization
          • Sequences
          • Expectation-based emotion system
      • How will the system scale?

  • Useful Insights
      • Use:
          • Temporal proximity to limit search
          • Hierarchical representations of state, action and state-action space & use implicit feedback to guide exploration
          • Trainer-friendly credit assignment
      • Luring and shaping are essential

  • Acknowledgements
      • Members of the Synthetic Characters Group, past, present & future
      • Gary Wilkes
      • Funded by the Digital Life Consortium