Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Jacob Schrum and Risto Miikkulainen
University of Texas at Austin
Department of Computer Science


Post on 03-Jan-2016



TRANSCRIPT

Page 1: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Jacob Schrum and Risto Miikkulainen

University of Texas at Austin

Department of Computer Science

Page 2: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Typical Uses of MOEAs

Where have MOEAs proven themselves?
  Wireless sensor networks (Woehrle et al., 2010)
  Groundwater management (Siegfried et al., 2009)
  Hydrologic model calibration (Tang et al., 2006)
  Epoxy polymerization (Deb et al., 2004)
  Voltage-controlled oscillator design (Chu et al., 2004)
  Multi-spindle gear-box design (Deb & Jain, 2003)
  Foundry casting scheduling (Deb & Reddy, 2001)
  Multipoint airfoil design (Poloni & Pediroda, 1997)
  Design of aerodynamic compressor blades (Obayashi, 1997)
  Electromagnetic system design (Michielssen & Weile, 1995)
  Microprocessor design (Stanley & Mudge, 1995)
  Design of laminated ceramic composites (Belegundu et al., 1994)

Many engineering/design problems!

Page 3: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

New Domains for MOEAs

Simulated agents often face multiple objectives
Automatic discovery of intelligent behavior
  Video game opponents in Unreal Tournament (van Hoorn, 2009)
  Predator/prey scenarios (Schrum & Miikkulainen, 2009)
  Race car driving in TORCS (Agapitos et al., 2008)
Comparatively little so far
Direct application of MOEAs seldom successful
Success often depends on “shaping”

Page 4: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

What is Shaping?

Term from Behavioral Psychology
Identified by B. F. Skinner (1938)
Task-Based Example: Train rat to press lever
  First reward proximity
  Then any interaction with lever
  Then actual pressing of lever

Page 5: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Evolutionary Shaping

Environment changes, making task harder
Evolution shapes behavior across generations
Example: Migration given continental drift [1]
  Animals become accustomed to short migration
  Continental drift increases distance of migration
  Ability to travel increasing distances required
EC models with incremental evolution (e.g. [2])

[1] B. F. Skinner. The shaping of phylogenic behavior. Experimental Analysis of Behavior. 1975.
[2] Schrum and Miikkulainen. Constructing Complex NPC Behavior via Multiobjective Neuroevolution. 2008.

[Images: Arctic Tern; Atlantic Salmon]

Page 6: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Fitness-Based Shaping

Not extensively used
Little/no domain knowledge needed
Multiobjective approach a good fit
Selection criteria change
  Exploiting ignored objectives (TUG)
  Exploiting unfilled niches (BD)

[Figure, Objective Space: Dominated, but exploiting mostly ignored objective]
[Figure, Behavior Space: Crowded Niches; Uncrowded Niches]

Page 7: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Multiobjective Optimization

Pareto dominance: $\vec{v} \succ \vec{u}$ iff
  1. $\forall i \in \{1, \ldots, n\}: v_i \geq u_i$
  2. $\exists i \in \{1, \ldots, n\}: v_i > u_i$
Assumes maximization
Want nondominated points
NSGA-II used in this work

What to evolve?
  NNs as control policies

[Figure: nondominated points]
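The dominance test on this slide can be sketched in a few lines of Python. This is only an illustration of the definition under the slide's maximization assumption; the function names and the sample scores are made up here, and NSGA-II itself does considerably more (nondominated sorting into layers, crowding distance).

```python
def dominates(v, u):
    """True if score vector v Pareto-dominates u, assuming maximization."""
    at_least_as_good = all(vi >= ui for vi, ui in zip(v, u))
    strictly_better = any(vi > ui for vi, ui in zip(v, u))
    return at_least_as_good and strictly_better


def nondominated(points):
    """Keep the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective scores, for illustration only.
scores = [(3, 1), (2, 2), (1, 3), (1, 1), (2, 1)]
front = nondominated(scores)
```

Here `(1, 1)` is dominated by `(2, 2)` and `(2, 1)` by `(3, 1)`, so only the three trade-off points remain in `front`.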

Page 8: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Constructive Neuroevolution

Genetic Algorithms + Neural Networks
Build structure incrementally (complexification)
Good at generating control policies
Three basic mutations (no crossover used)
  Perturb Weight
  Add Connection
  Add Node
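A rough sketch of the three mutations on a toy genome follows. The `Genome` layout, the Gaussian perturbation, and the choice to add a node by splicing it into an existing connection are all assumptions made for illustration; the slides do not specify these details.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Genome:
    """Toy network encoding (hypothetical): nodes are 0 .. num_nodes-1."""
    num_nodes: int
    links: dict = field(default_factory=dict)  # (src, dst) -> weight


def perturb_weight(g, rng, scale=0.5):
    """Add Gaussian noise to one randomly chosen connection weight."""
    if g.links:
        key = rng.choice(sorted(g.links))
        g.links[key] += rng.gauss(0.0, scale)


def add_connection(g, rng):
    """Connect two randomly chosen nodes if they are not yet linked."""
    src, dst = rng.randrange(g.num_nodes), rng.randrange(g.num_nodes)
    if (src, dst) not in g.links:
        g.links[(src, dst)] = rng.gauss(0.0, 1.0)


def add_node(g, rng):
    """Splice a new node into a randomly chosen existing connection."""
    if g.links:
        src, dst = rng.choice(sorted(g.links))
        weight = g.links.pop((src, dst))
        new = g.num_nodes
        g.num_nodes += 1
        g.links[(src, new)] = 1.0      # assumed: incoming link gets weight 1
        g.links[(new, dst)] = weight   # assumed: outgoing link keeps old weight


rng = random.Random(0)
genome = Genome(num_nodes=3, links={(0, 2): 0.5})
add_node(genome, rng)
```

Starting from a small network and applying such mutations over generations is what lets structure grow incrementally.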

Page 9: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Targeting Unachieved Goals (TUG)

Main ideas:
  Temporarily deactivate “easy” objectives
  Focus on “hard” objectives
“Hard” and “easy” defined in terms of goal values
  Easy: average fitness “persists” above goal (achieved)
  Hard: goal not yet achieved
Objectives reactivated when no longer achieved
Increase goal values when all achieved

[Figure labels: Evolution; Hard Objectives]
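The activate/deactivate idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and "achieved" is taken to mean that the tracked average strictly exceeds the goal.

```python
def select_active_objectives(averages, goals):
    """Return indices of "hard" objectives and whether every goal is achieved.

    averages[i] is the tracked average fitness in objective i and
    goals[i] its current goal (both hypothetical inputs).
    """
    achieved = [avg > goal for avg, goal in zip(averages, goals)]
    active = [i for i, done in enumerate(achieved) if not done]
    return active, all(achieved)


def project(scores, active):
    """Restrict a score vector to the active objectives used for selection."""
    return tuple(scores[i] for i in active)


active, all_achieved = select_active_objectives([5.0, 1.0, 3.0], [4.0, 2.0, 3.0])
selection_scores = project((10, 20, 30), active)
```

In this example objective 0 is "easy" and is ignored for selection, while objectives 1 and 2 stay active. Recomputing `active` every generation is what reactivates an objective once its average falls back below the goal.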

Page 10: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

TUG Example

[Figure annotations: Goal achieved; Other goals also achieved → Goals increase; Reset recency-weighted average; Noisy evaluations]

Page 11: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Behavioral Diversity (BD)

Originally developed for single-objective tasks [3]
  Add behavioral diversity objective
  Encourage exploration of new behaviors
  Domain-specific behavior measure required
Extensions in this work:
  Multiobjective task
  Domain independent method
  Only requires policy mapping $\mathbb{R}^N$ to $\mathbb{R}^M$, e.g. NNs

[3] J.-B. Mouret and S. Doncieux. Using behavioral exploration objectives to solve deceptive problems in neuro-evolution. 2009.

[Figure: policy mapping N Senses to M Actions]

Page 12: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Behavioral Diversity Details

Behavior vector:
  Given input vectors, concatenate outputs
Behavioral diversity objective:
  AVG distance from other behavior vectors

[Figure: example vectors (0.1 2.3 4.3 5.2 3.2), (0.5 5.3 7.5 3.4 2.1), (1.3 4.2 5.6 4.5 7.7); behavior vector (2.4 4.3 0.7 4.2 2.1 3.5 …); annotation: High average distance from other points]
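The two steps above can be sketched in Python. Euclidean distance is an assumption here (the slide only says "AVG distance"), and the policies are stand-in functions rather than evolved networks.

```python
import math


def behavior_vector(policy, inputs):
    """Concatenate the policy's outputs for a fixed list of input vectors."""
    vec = []
    for x in inputs:
        vec.extend(policy(x))
    return vec


def diversity_scores(behaviors):
    """Average distance from each behavior vector to all the others."""
    n = len(behaviors)
    scores = []
    for i, b in enumerate(behaviors):
        total = sum(math.dist(b, other)
                    for j, other in enumerate(behaviors) if j != i)
        scores.append(total / (n - 1) if n > 1 else 0.0)
    return scores


# Stand-in policy (hypothetical): maps an input vector to two outputs.
example = behavior_vector(lambda x: [sum(x), -sum(x)], [[1, 2], [3, 4]])
scores = diversity_scores([[0, 0], [3, 4], [0, 0]])
```

The middle behavior vector sits far from the two identical ones, so it receives the highest diversity score. Because only inputs and outputs are used, the measure needs no domain-specific behavior description.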

Page 13: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Battle Domain

Evolved monsters (blue)
  Monsters can hurt fighter
Scripted fighter (green)
  Bat can hurt monsters
Three objectives
  Deal damage
  Avoid damage
  Stay alive
Previous work required incremental evolution to solve

Page 14: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Experimental Comparison

NN copied to 4 monsters
  Homogeneous teams
In paper
  Control: Plain NSGA-II
  TUG: NSGA-II with TUG using expert initial goals
  BD: NSGA-II with BD using random input vectors
Additional methods since publication
  TUG-Low: NSGA-II with TUG using minimal initial goals
  BD-Obs: NSGA-II with BD using inputs from evaluations
Each repeated 30 times

Page 15: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Attainment Surfaces [4]

Result attainment surface
  Shows space dominated by single Pareto front
Summary attainment surface s
  Union of space dominated in at least s out of n runs
  Surface s weakly dominates s+1, etc.

[Figure: Pareto Fronts (Approximation Sets); Result Attainment Surfaces (individual surfaces intersect); Summary Attainment Surfaces 1, 2 and 3]

[4] J. Knowles. A summary-attainment surface plotting method for visualizing the performance of stochastic multiobjective optimizers. 2005.
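As a small sketch of the summary attainment idea, the code below counts how many runs attain a given point and checks membership in the region bounded by summary surface s. It assumes maximization, treats a point as attained when some solution weakly dominates it, and uses made-up fronts; it is not the plotting method of [4].

```python
def weakly_dominates(v, u):
    """True if v is at least as good as u in every objective (maximization)."""
    return all(vi >= ui for vi, ui in zip(v, u))


def attained_by(front, point):
    """True if at least one solution in the front weakly dominates the point."""
    return any(weakly_dominates(sol, point) for sol in front)


def runs_attaining(fronts, point):
    """Number of runs whose final front attains the point."""
    return sum(1 for front in fronts if attained_by(front, point))


def in_summary_region(fronts, point, s):
    """True if the point is attained in at least s of the runs."""
    return runs_attaining(fronts, point) >= s


# Hypothetical final fronts from three runs.
fronts = [[(3, 1), (1, 3)], [(2, 2)], [(4, 4)]]
count = runs_attaining(fronts, (2, 2))
```

Any point attained in at least s+1 runs is also attained in at least s runs, which is why surface s weakly dominates surface s+1.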

Page 16: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Final Summary Attainment Surfaces

[Figure panels: Control; TUG; BD; TUG-Low; BD-Obs]

Animation: worst to best summary attainment surface

Page 17: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Hypervolume Metric [5]

Hypervolume of result attainment surface
  Simply “volume” for 3 domain objectives
WRT reference point
  Slightly less than minimum scores
Pareto-compliant metric
  $F_1 \succ F_2 \Rightarrow HV(F_1) > HV(F_2)$

[Figure: Hypervolume = A + B + C + D]

[5] E. Zitzler and L. Thiele. Multiobjective optimization using evolutionary algorithms – a comparative case study. 1998.
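For intuition, here is a minimal two-objective hypervolume sketch under maximization. The talk uses three domain objectives; two are used here only to keep the example short, and the fronts and reference point are invented.

```python
def hypervolume_2d(front, ref):
    """Area dominated by the front and bounded below by the reference point."""
    # Only points strictly better than the reference point contribute.
    pts = sorted((p for p in front if p[0] > ref[0] and p[1] > ref[1]),
                 key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:
            # New horizontal strip that this point adds above earlier points.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


# Hypothetical fronts; the first dominates the second as a set.
hv_a = hypervolume_2d([(3, 1), (2, 2), (1, 3)], ref=(0, 0))
hv_b = hypervolume_2d([(2, 2)], ref=(0, 0))
```

The first front contains `(2, 2)` plus two further trade-off points, so its hypervolume is larger, consistent with the Pareto-compliance property on the slide.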

Page 18: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Hypervolume

Page 19: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Successful Behaviors

[Panels: BD; BD-Obs; TUG; TUG-Low]

Page 20: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Discussion

Control: more extreme trade-offs
BD: more precise timing
BD-Obs and BD similar
  “Real” inputs give no advantage
TUG: more teamwork
  Particular initial objectives
TUG-Low more like BD than TUG
ALL are better than Control

Page 21: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Future Work

How to combine TUG and BD
  Naïve combination doesn’t work
Scaling up
  Many objectives
  More complex domains
  Current work in Unreal Tournament promising

Page 22: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Conclusion

BD and TUG improve MO evolution
Domain independence!
  Contrast to task-based shaping
Expand MOEAs to a new range of domains

Page 23: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Questions?

Email: [email protected]@cs.utexas.edu

See movies at: http://nn.cs.utexas.edu/?fitness-shaping

Page 24: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

TUG Details

Persistence:
  Recency-weighted average surpasses goal
  $r_t \leftarrow r_{t-1} + \alpha (x_t - r_{t-1})$
Goals:
  Initial values based on domain knowledge
  Or simply the minimal values for objectives
  Increase each goal when all are achieved
  $g_o \leftarrow g_o + \eta (o_{\max} - g_o)$
Objectives reactivated when no longer achieved

[Figure labels: $r_t$; $g_o$; Goal achieved]
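The two update rules can be sketched as below. The step sizes `alpha` and `eta`, their default values, and the value the averages are reset to after a goal increase are assumptions chosen for illustration; the slides do not state them.

```python
def update_recency_average(r_prev, x_t, alpha=0.15):
    """Recency-weighted average: move r a fraction alpha toward the new score."""
    return r_prev + alpha * (x_t - r_prev)


def increase_goal(goal, o_max, eta=0.15):
    """Move the goal a fraction eta toward the maximum value o_max."""
    return goal + eta * (o_max - goal)


def tug_step(averages, goals, scores, o_maxes,
             alpha=0.15, eta=0.15, reset_value=0.0):
    """One generation of bookkeeping for all objectives (illustrative only)."""
    averages = [update_recency_average(r, x, alpha)
                for r, x in zip(averages, scores)]
    achieved = [r > g for r, g in zip(averages, goals)]
    if all(achieved):
        # Every goal achieved: raise all goals and restart the averages.
        goals = [increase_goal(g, m, eta) for g, m in zip(goals, o_maxes)]
        averages = [reset_value] * len(averages)  # assumed reset value
        achieved = [False] * len(goals)
    return averages, goals, achieved


state = tug_step([0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [10.0, 10.0],
                 alpha=0.5, eta=0.5)
```

In this call both averages rise to 2.0, both goals of 1.0 are surpassed, and so the goals move halfway toward 10.0 while the averages are reset.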

Page 25: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping
Page 26: Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

TUG Cycles