Hybrid Behavior Co-evolution and Structure Learning in Behavior-based Systems Amir massoud Farahmand (a,b,c) (www.cs.ualberta.ca/~amir) Majid Nili Ahmadabadi (b,c) Caro Lucas (b,c) Babak N. Araabi (b,c) a) Department of Computing Science, University of Alberta b) Control and Intelligent Processing Center of Excellence, Department of Electrical and Computer Engineering, University of Tehran c) School of Cognitive Sciences, IPM

Post on 31-Dec-2015

Page 1

Hybrid Behavior Co-evolution and Structure Learning in Behavior-based Systems

Amir massoud Farahmand (a,b,c)

(www.cs.ualberta.ca/~amir)

Majid Nili Ahmadabadi (b,c)

Caro Lucas (b,c)

Babak N. Araabi (b,c)

a) Department of Computing Science, University of Alberta

b) Control and Intelligent Processing Center of Excellence, Department of Electrical and Computer Engineering, University of Tehran

c) School of Cognitive Sciences, IPM

Page 2

Motivation

Situated real-world agents face different uncertainties:
– Unknown environment/body
  • The [exact] model of the environment/body is not known
– Non-stationary environment/body
  • Changing environment (offices, houses, streets, and almost everywhere)
  • Aging
  • …

Designing a robust controller for such an agent is not easy.

Page 3

Research Specification

• Goal: Automatic design of intelligent agents
• Architecture: Hierarchical behavior-based architectures (a version of the Subsumption architecture)
  – Behavior-based systems:
    • A robust, successful approach for designing situated agents
    • Behavioral decomposition
    • Behaviors: Sensors ---> Actions
• Evaluation: An objective performance measure is available (reinforcement signal)
  – [Agent] Did I perform it correctly?!
  – [Tutor] Yes/No! (or 0.3)

[Figure: layered behavior-based architecture between sensors and actuators, with the behaviors "build maps", "explore", "avoid obstacles", "locomote", and "manipulate the world"]

Page 4

How should we DESIGN a behavior-based system?!

Page 5

Behavior-based System Design Methodologies

• Hand Design
  – Common almost everywhere.
  – Complicated: may even be infeasible in complex problems
  – Even if it is possible to find a working system, it is probably not the best solution.
• Evolution
  – Good solutions can be found (+)
  – Biologically plausible (+)
  – Time consuming (-)
  – Not fast in making new solutions (-)
• Learning
  – Biologically plausible (+)
  – Learning is essential for life-time survival of the agent. (+)
  – May get stuck in a local minimum (-)

Page 6

Taxonomy of Design Methods

Behavior-based System Design
– Learning
  • Structure (hierarchy) learning
  • Behavior learning
– Evolution
  • Co-evolution of behaviors
  • Evolution of Structure

Page 7

Taxonomy of Design Methods

Behavior-based System Design
– Learning
  • Structure (hierarchy) learning
  • Behavior learning
– Evolution
  • Co-evolution of behaviors
  • Evolution of Structure

Hybridization of Evolution and Learning

Page 8

Problem Formulation

Behaviors

$$B_i : S'_i \to A'_i, \qquad i = 1,\dots,n$$

$$A'_i = A_i \cup \{\text{No Action}\}$$

$$S'_i = \{\, s'_i \mid s'_i = M_i(s),\ \forall s \in S \,\}$$

$$A_i \subset A, \qquad S_i \subset S$$

$$M_i : S \to S'_i$$

Page 9

Problem Formulation

Purely Parallel Subsumption Architecture (PPSSA)

$$T = [\,B_{index(1)}\ \ B_{index(2)}\ \ \dots\ \ B_{index(m)}\,]^{\top}, \qquad m \le n$$

$index(j) = i$ indicates that $B_i$ is in the $j^{th}$ layer.

Example: $T = [\,\text{Wandering}\ \ \text{BallCollection}\ \ \text{ObstacleAvoidance}\,]$

• Different behaviors are excited in parallel.
• Higher behaviors can suppress lower ones.
• Controlling behavior
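The arbitration described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three behaviors, their sensor keys (`obstacle_distance`, `ball_visible`), and the action names are hypothetical, and index 0 of the structure is assumed to be the highest layer (the slide does not say which end of T is the top).

```python
from typing import Callable, Dict, List, Optional, Tuple

# A behavior maps sensor readings to an action, or to None ("No Action")
# when it is not excited. All names below are hypothetical examples.
Sensors = Dict[str, float]
Behavior = Callable[[Sensors], Optional[str]]


def avoid_obstacles(s: Sensors) -> Optional[str]:
    return "turn_away" if s["obstacle_distance"] < 0.5 else None


def collect_ball(s: Sensors) -> Optional[str]:
    return "go_to_ball" if s["ball_visible"] > 0 else None


def wander(s: Sensors) -> Optional[str]:
    return "random_walk"  # always excited


def controlling_behavior(
    structure: List[Tuple[str, Behavior]], sensors: Sensors
) -> Optional[Tuple[int, str, str]]:
    """Return (layer, name, action) of the highest excited behavior.

    Assumption: structure[0] is the highest layer. Every behavior sees the
    sensors in parallel; the first excited one suppresses all lower layers.
    """
    for layer, (name, behavior) in enumerate(structure):
        action = behavior(sensors)
        if action is not None:
            return layer, name, action
    return None  # no behavior is excited


structure = [
    ("ObstacleAvoidance", avoid_obstacles),
    ("BallCollection", collect_ball),
    ("Wandering", wander),
]
near_wall = controlling_behavior(structure, {"obstacle_distance": 0.2, "ball_visible": 1.0})
open_space = controlling_behavior(structure, {"obstacle_distance": 2.0, "ball_visible": 0.0})
```

With this ordering, the obstacle-avoidance layer takes control near a wall, and wandering takes over only when nothing above it is excited.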

Page 10

Problem Formulation: Reinforcement Signal and the Agent’s Value Function

$$R = \frac{1}{N}\sum_{i=1}^{N} r_i$$

$$V_T = E_\pi\!\left[\frac{1}{N}\sum_{t=1}^{N} r_t \,\middle|\, \text{agent with the structure } T \text{ and set of behaviors } \{B_i\}\ (i = 1,\dots,n)\right] = E_\pi\!\left[R \,\middle|\, \text{agent with the structure } T \text{ and set of behaviors } \{B_i\}\ (i = 1,\dots,n)\right]$$

• This function states the value of using a set of behaviors in a specific structure.
• We want to maximize the agent’s value function.

Page 11

Problem Formulation: Design as an Optimization

• Structure Learning: Finding the best structure given a set of behaviors using learning
  $$T^{*} = \arg\max_{T} V_T$$
• Behavior Learning: Finding the best behaviors given the structure using learning
  $$\{B_i\}^{*} = \arg\max_{\{B_i\}} V_T$$
• Concurrent Behavior and Structure Learning
  $$T^{*}, \{B_i\}^{*} = \arg\max_{T, \{B_i\}} V_T$$
• Behavior Evolution: Finding the best behaviors given the structure using evolution
  $$\{B_i\}^{*} = \arg\max_{\{B_i\}} V_T$$
• Behavior Evolution and Structure Learning
  $$T^{*}, \{B_i\}^{*} = \arg\max_{T, \{B_i\}} V_T$$
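To make the structure-learning objective concrete, here is a minimal sketch that solves argmax over T by brute force for a tiny behavior set. It is not the learning method of the talk: the value function `toy_value` and its score table are hypothetical stand-ins for V_T, and layer 0 is assumed to be the top layer.

```python
from itertools import permutations

# Hypothetical scores for placing a behavior in a layer (layer 0 = top).
# In the talk, V_T would come from interaction with the environment instead.
TOY_TABLE = {
    ("avoid", 0): 0.8,
    ("explore", 0): 0.7,
    ("avoid", 1): 0.6,
    ("explore", 1): 0.9,
    ("locomote", 2): 0.4,
}


def toy_value(structure):
    """Stand-in for V_T: sum of the table entries of each (behavior, layer)."""
    return sum(TOY_TABLE.get((b, layer), 0.0) for layer, b in enumerate(structure))


def best_structure(behaviors, value_of):
    """Enumerate every ordering of every non-empty subset (m <= n layers)."""
    candidates = (
        p for k in range(1, len(behaviors) + 1) for p in permutations(behaviors, k)
    )
    return max(candidates, key=value_of)


best = best_structure(["explore", "avoid", "locomote"], toy_value)
```

The number of candidate structures grows factorially with the number of behaviors, so exhaustive search like this is only practical for very small toolboxes.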

Page 12

Behavior-based System Design
– Learning
  • Structure (hierarchy) learning
  • Behavior learning
– Evolution
  • Co-evolution of behaviors
  • Evolution of Structure

Hybridization of Evolution and Learning

Page 13

Structure Learning

[Figure: Behavior Toolbox containing the behaviors "manipulate the world", "build maps", "explore", "locomote", and "avoid obstacles"]

The agent wants to learn how to arrange these behaviors in order to get maximum reward from its environment (or tutor).

Page 14

Structure Learning

[Figure: Behavior Toolbox containing the behaviors "manipulate the world", "build maps", "explore", "locomote", and "avoid obstacles"]

Page 15

Structure Learning

[Figure: Behavior Toolbox and the structure being built from the behaviors "manipulate the world", "build maps", "explore", "locomote", and "avoid obstacles"]

1. "explore" becomes the controlling behavior and suppresses "avoid obstacles".

2. The agent hits a wall!

Page 16

Structure Learning

[Figure: Behavior Toolbox and the structure being built from the behaviors "manipulate the world", "build maps", "explore", "locomote", and "avoid obstacles"]

The tutor (environment) punishes "explore" for being in that position of the structure.

Page 17

Structure Learning

[Figure: Behavior Toolbox and the structure being built from the behaviors "manipulate the world", "build maps", "explore", "locomote", and "avoid obstacles"]

“explore” is not a very good behavior for the highest position of the structure. So it is replaced by “avoid obstacles”.

Page 18

Structure Learning: Challenging Issues

• Representation: How should the agent represent knowledge gathered during learning?
  – Sufficient (Concept space should be covered by Hypothesis space)
  – Generalization Capability
  – Tractable (small Hypothesis space)
  – Well-defined credit assignment
• Hierarchical Credit Assignment: How should the agent assign credit to different behaviors and layers in its architecture?
  – If the agent receives a reward/punishment, how should we reward/punish the structure of the agent?
• Learning: How should the agent update its knowledge when it receives a reinforcement signal?

Page 19

Structure Learning: Overcoming Challenging Issues

• Our approach is to define a representation that allows the agent’s value function to be decomposed into simpler components.

• The structure can provide a lot of clues to us.

Page 20

Structure Learning: Zero Order Representation

[Figure: ZO Value Table in the agent’s mind, holding one value per behavior per layer, arranged from higher layer to lower layer. Entries shown: avoid obstacles (0.8), avoid obstacles (0.6), explore (0.7), explore (0.9), locomote (0.4), locomote (0.4)]

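Page 21

Structure Learning: Zero Order Representation - Value Function Decomposition

$$
\begin{aligned}
V_T &= E_\pi[R] = E_\pi\!\left[\frac{1}{N}\sum_{t=1}^{N} r_t\right] \\
&= E_\pi\!\left[\frac{1}{N}\sum_{t=1}^{N} r_t \wedge \big(\text{``}L_1 \text{ is controlling''} \vee \text{``}L_2 \text{ is controlling''} \vee \dots \vee \text{``}L_m \text{ is controlling''}\big)\right] \\
&= E_\pi\!\left[\frac{1}{N}\sum_{t=1}^{N} r_t \wedge \text{``}L_1 \text{ is controlling''}\right] + \dots + E_\pi\!\left[\frac{1}{N}\sum_{t=1}^{N} r_t \wedge \text{``}L_m \text{ is controlling''}\right] \\
&= E_\pi\!\left[\frac{1}{N}\sum_{t} r_t \,\middle|\, L_1 \text{ is controlling}\right] \cdot P(L_1 \text{ is controlling}) \\
&\quad + E_\pi\!\left[\frac{1}{N}\sum_{t} r_t \,\middle|\, L_2 \text{ is controlling}\right] \cdot P(L_2 \text{ is controlling}) \\
&\quad + \dots + E_\pi\!\left[\frac{1}{N}\sum_{t} r_t \,\middle|\, L_m \text{ is controlling}\right] \cdot P(L_m \text{ is controlling})
\end{aligned}
$$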

Page 22

Structure Learning: Zero Order Representation - Value Function Decomposition

$$
E_\pi\!\left[\frac{1}{N}\sum_{t} r_t \,\middle|\, L_i \text{ is controlling}\right]
= \sum_{j=1}^{n} E_\pi\!\left[\frac{1}{N}\sum_{t} r_t \,\middle|\, B_j \text{ is the controlling behavior in } L_i\right] P\{B_j \mid L_i\}
= \sum_{j=1}^{n} V_{ij}\, P\{B_j \mid L_i\}, \qquad i = 1,\dots,m
$$

$$
V_T = \sum_{i=1}^{m}\sum_{j=1}^{n} V_{ij}\, P\{B_j \mid L_i\}\, P(L_i \text{ is controlling})
$$

Here $V_T$ is the agent’s value function, the $V_{ij}$ are the ZO components, and $\sum_{j} V_{ij} P\{B_j \mid L_i\}$ is the layer’s value.
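As a worked illustration of the zero-order decomposition, the sketch below evaluates V_T as the double sum of V_ij · P{B_j | L_i} · P(L_i is controlling). The function name and all numbers in the example are hypothetical; tables are plain nested lists indexed as `[layer][behavior]`.

```python
def agent_value(V, p_behavior_given_layer, p_layer_controlling):
    """Evaluate V_T = sum_i sum_j V[i][j] * P(B_j | L_i) * P(L_i is controlling).

    V[i][j]: ZO component for behavior j in layer i.
    p_behavior_given_layer[i][j]: P{B_j | L_i}.
    p_layer_controlling[i]: P(L_i is controlling).
    """
    total = 0.0
    for i, p_layer in enumerate(p_layer_controlling):
        # Layer's value: expectation of the ZO components within layer i.
        layer_value = sum(
            V[i][j] * p_behavior_given_layer[i][j] for j in range(len(V[i]))
        )
        total += layer_value * p_layer
    return total


# Hypothetical two-layer, two-behavior example.
V = [[0.8, 0.7], [0.6, 0.9]]
P_B_GIVEN_L = [[1.0, 0.0], [0.0, 1.0]]  # behavior 0 sits in layer 0, behavior 1 in layer 1
P_L = [0.25, 0.75]
value = agent_value(V, P_B_GIVEN_L, P_L)  # 0.8 * 0.25 + 0.9 * 0.75
```

Because the sum is separable over (layer, behavior) pairs, each table entry can be estimated on its own, which is what makes this representation convenient for credit assignment.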

Page 23

Structure Learning: Zero Order Representation - Value Function Decomposition

$$
T^{*} = \arg\max_{T} V_T
\;\Rightarrow\;
T^{*} = \arg\max_{T} \sum_{i=1}^{m}\sum_{j=1}^{n} V_{ij}\, P\{B_j \mid L_i\}\, P(L_i \text{ is controlling})
$$

Page 24

Structure Learning: Zero Order Representation - Credit Assignment and Value Updating

• The controlling behavior is the only behavior responsible for the current reinforcement signal.

$$\tilde{V}_{ij} = V_{ij}\, P\{B_j \mid L_i\}\, P(L_i \text{ is controlling})$$

$$\tilde{V}_{ij,n+1} = (1 - \alpha_{ij,n})\, \tilde{V}_{ij,n} + \alpha_{ij,n} \left[\, \text{``}B_j \text{ is active at time step } n\text{''} \times \text{``}L_i \text{ is controlling at time step } n\text{''} \times r_n \right]$$
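A minimal sketch of this update, assuming a single constant learning rate `alpha` instead of the per-entry, per-step rate of the slide, and representing the two quoted events as one indicator that is 1 only for the (layer, behavior) pair currently in control. The function and argument names are hypothetical.

```python
def zo_update(v_tilde, alpha, controlling_layer, controlling_behavior, reward):
    """Apply one zero-order update step to every entry of the table, in place.

    v_tilde[i][j] tracks V_ij * P{B_j | L_i} * P(L_i is controlling).
    Only the controlling (layer, behavior) pair is credited with the reward;
    every other entry is pulled toward 0 for this step.
    Assumption: alpha is constant (the slide allows it to vary per entry and step).
    """
    for i, row in enumerate(v_tilde):
        for j in range(len(row)):
            is_responsible = i == controlling_layer and j == controlling_behavior
            indicator = 1.0 if is_responsible else 0.0
            row[j] = (1.0 - alpha) * row[j] + alpha * indicator * reward
    return v_tilde


table = [[0.0, 0.0], [0.0, 0.0]]
zo_update(table, alpha=0.5, controlling_layer=0, controlling_behavior=1, reward=1.0)
first_step = [row[:] for row in table]
zo_update(table, alpha=0.5, controlling_layer=1, controlling_behavior=0, reward=1.0)
```

Entries that are rarely in control decay toward zero, which matches the fact that the tracked quantity already includes the probability of being the controlling pair.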

Page 25

Behavior-based System Design
– Learning
  • Structure (hierarchy) learning
  • Behavior learning
– Evolution
  • Co-evolution of behaviors
  • Evolution of Structure

Hybridization of Evolution and Learning

Page 26

Behavior Co-evolution: Motivations

+
• Learning can get trapped in local maxima of the objective function
• Evolutionary methods have a better chance of finding the global maximum of the objective function
• Learning is sensitive (POMDP, non-Markov, …)
• The objective function may not be well-defined in robotics

-
• Evolutionary robotics methods are usually slow
  – Fast changes of the environment
• Non-modular controllers
  – Monolithic
  – No reusability

Page 27

Behavior Co-evolution: Ideas

• Use evolution to search the difficult and big part of the parameter space
  – The behaviors’ parameter space is usually the bigger one
• Use learning for fast responses
  – The structure’s parameter space is usually the smaller one
  – A change in the structure results in a different agent behavior
• Evolve behaviors separately
  – Modularity
  – Re-usability

Page 28

Behavior Co-evolution

[Figure: an agent and Behavior Pools 1, 2, …, n]

We have different behavior (genetic) pools.

Page 29

Behavior Co-evolution

[Figure: an agent and Behavior Pools 1, 2, …, n]

One behavior is selected randomly from each pool. We want to assess its fitness.

Page 30

Behavior Co-evolution

[Figure: an agent and Behavior Pools 1, 2, …, n]

The agent interacts with the environment using an architecture that is built from the selected behaviors

Page 31

Behavior Co-evolution

[Figure: an agent and Behavior Pools 1, 2, …, n]

… and tries to maximize its reward.

Page 32

Behavior Co-evolution

[Figure: an agent and Behavior Pools 1, 2, …, n, with a "Fitness" value attached to the agent]

Based on the performance of the agent, a fitness is assigned to it.
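The evaluation cycle walked through on the last few slides (pick one behavior per pool, build an agent, let it interact, assign a fitness) can be sketched as follows. This is an illustration only: `run_episode` is a hypothetical callback standing in for the agent's interaction with the environment, including any structure learning that happens during its lifetime.

```python
import random


def sample_team(pools, rng):
    """Pick one behavior index at random from each behavior pool."""
    return tuple(rng.randrange(len(pool)) for pool in pools)


def evaluate_population(pools, run_episode, n_trials, rng):
    """Build n_trials agents from random teams and record each agent's fitness.

    run_episode(behaviors) is assumed to return the agent's fitness, for
    example its average reward. Returns a list of (team, fitness) pairs,
    where team[i] is the index chosen from pool i.
    """
    evaluations = []
    for _ in range(n_trials):
        team = sample_team(pools, rng)
        behaviors = [pool[idx] for pool, idx in zip(pools, team)]
        evaluations.append((team, run_episode(behaviors)))
    return evaluations


# Toy usage: behaviors are plain numbers and the "episode" just adds them up.
toy_pools = [[1, 2], [10, 20, 30]]
toy_evaluations = evaluate_population(toy_pools, sum, n_trials=5, rng=random.Random(1))
```

Keeping the team indices next to each fitness value is what later allows the agent's fitness to be attributed back to the individual behaviors in their separate pools.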

Page 33

Behavior Co-evolution: Fitness Sharing

• We can evaluate the fitness of the agent after its interaction with the environment.

• How can we assess the fitness of each behavior based on the fitness of the agent? (remember that we have separate behavior pools)

• We approximate it!

$$
V_{\{B\}}^{\text{Last } K \text{ episodes}} = f_{\{B\}}^{\text{Last } K \text{ episodes}}
= E\!\left[\frac{1}{K}\sum_{t \,\in\, \text{Last } K \text{ episodes}} r_t \,\middle|\, \text{agent with the } \{B\}\right]
$$

$$
f\!\left(B_i^{j}\right) = \frac{1}{N}\sum_{\{B\}_i} V_{\{B\}_i}^{\text{Last } K \text{ episodes}} \qquad \text{(Fitness)}
$$
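One way to read the fitness formula is that a behavior's fitness is the average fitness of the N agents it took part in. The sketch below implements that reading; it is an assumption about the slide, not a confirmed implementation, and the data layout (a list of `(team, agent_fitness)` pairs, with `team[i]` the behavior index drawn from pool i) is hypothetical.

```python
from collections import defaultdict


def share_fitness(evaluations):
    """Average each agent's fitness over the behaviors that formed it.

    evaluations: list of (team, agent_fitness), team[i] = behavior index in pool i.
    Returns {(pool, behavior_index): mean fitness of agents using that behavior}.
    Every behavior in a team receives the same (uniform) share of credit.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for team, agent_fitness in evaluations:
        for pool, behavior_index in enumerate(team):
            totals[(pool, behavior_index)] += agent_fitness
            counts[(pool, behavior_index)] += 1
    return {key: totals[key] / counts[key] for key in totals}


shared = share_fitness([((0, 1), 1.0), ((0, 0), 0.5), ((1, 1), 0.0)])
```

Behaviors that were never sampled do not appear in the result, so a caller would need its own policy for them (for example, keeping their previous fitness).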

Page 34

Behavior Co-evolution

Each behavior’s genetic pool has conventional evolutionary operators/phenomena:
– Selection
– Genetic Operators
  • Crossover
  • Mutation
    – Hard: Replacement
    – Soft: Perturbation

$$B_i^{j,\text{new}} = B_i^{j,\text{old}}\, X + X'\, B_i^{k,\text{old}}$$
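The operators listed on this slide can be sketched for a behavior encoded as a flat list of real-valued parameters. That encoding is an assumption, as are the specific operator choices (uniform crossover, single-gene replacement, Gaussian perturbation) and all function names; the slide does not say how behaviors are represented.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def hard_mutation(genome, rng, low=-1.0, high=1.0):
    """Hard mutation (replacement): overwrite one gene with a fresh random value."""
    child = list(genome)
    child[rng.randrange(len(child))] = rng.uniform(low, high)
    return child


def soft_mutation(genome, rng, sigma=0.1):
    """Soft mutation (perturbation): add small Gaussian noise to every gene."""
    return [g + rng.gauss(0.0, sigma) for g in genome]


rng = random.Random(0)
parent_a, parent_b = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
child = crossover(parent_a, parent_b, rng)
replaced = hard_mutation(parent_a, rng)
perturbed = soft_mutation(parent_a, rng)
```

Because each pool holds only one kind of behavior, operators like these can be applied within a pool without any knowledge of the other pools.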

Page 35

Multi-Robot Object Lifting Problem

• Three robots want to lift an object using their own local sensors
  – No central control
  – No communication
  – Local sensors
• Objectives
  – Reaching prescribed height
  – Keeping tilt angle small

A group of robots lifts a bulky object.

Page 36

Multi-Robot Object Lifting Problem

[Figure: image not recoverable from the transcript]

Page 37

Multi-Robot Object Lifting Problem

[Figure: image not recoverable from the transcript]

Page 38

Conclusion

• Hybridization of evolution and learning

• Evolution and learning search different subspaces of the solution space

• Results competitive with human designs

Page 39

Important Questions

• Is it possible to benefit from information gathered during learning?
  – Each agent learns an approximately good arrangement of the structure. However, we do not use it at all!
• Is there any other way of sharing the fitness of the agent between behaviors?
  – Currently, we share the agent’s fitness uniformly among all behaviors.

It seems that the answer to these questions is positive!

Page 40

Future Research

• Can we decompose other problems (not just hierarchical behavior-based systems) similarly?!
  – Learning and evolution
  – Fast and Deep
  – Different subspaces of the solution space
• Other ways of fitness sharing
  – Low bias
  – Low variance

Page 41

Questions?!