TRANSCRIPT
Hybrid Behavior Co-evolution and Structure Learning in Behavior-based Systems
Amir massoud Farahmand (a,b,c)
(www.cs.ualberta.ca/~amir)
Majid Nili Ahmadabadi (b,c)
Caro Lucas (b,c)
Babak N. Araabi (b,c)
a) Department of Computing Science, University of Alberta
b) Control and Intelligent Processing Center of Excellence, Department of Electrical and Computer Engineering,
University of Tehran
c) School of Cognitive Sciences, IPM
Motivation
Situated real-world agents face various uncertainties:
– Unknown environment/body
• The exact model of the environment/body is not known
– Non-stationary environment/body
• Changing environment (offices, houses, streets, and almost everywhere)
• Aging
• …
Designing a robust controller for such an agent is not easy.
Research Specification
• Goal: Automatic design of intelligent agents
• Architecture: Hierarchical behavior-based architectures (a version of the Subsumption architecture)
– Behavior-based systems:
• A robust, successful approach for designing situated agents
• Behavioral decomposition
• Behaviors: Sensors → Actions
• Evaluation: An objective performance measure is available (the reinforcement signal)
– [Agent] Did I perform it correctly?!
– [Tutor] Yes/No! (or 0.3)
build maps
explore
avoid obstacles
locomote
manipulate the world
sensors → actuators

How should we DESIGN a behavior-based system?!
Behavior-based System Design Methodologies
• Hand Design
– Common almost everywhere
– Complicated: may even be infeasible in complex problems
– Even if it is possible to find a working system, it is probably not the best solution
• Evolution
– Good solutions can be found (+)
– Biologically plausible (+)
– Time consuming (−)
– Not fast at producing new solutions (−)
• Learning
– Biologically plausible (+)
– Learning is essential for the life-time survival of the agent (+)
– May get stuck in a local minimum (−)
Taxonomy of Design Methods
Behavior-based System Design
• Learning
– Structure (hierarchy) learning
– Behavior learning
• Evolution
– Co-evolution of behaviors
– Evolution of structure
Hybridization of Evolution and Learning
Problem Formulation
Behaviors
Each behavior B_i, i = 1, …, n, maps part of the state space to an extended action set:

B_i : S_i' → A_i'
S_i' = { s_i' | s_i' = M_i(s_i), ∀ s_i ∈ S },  S_i' ⊂ S,  M_i : S → S_i'
A_i' = A_i ∪ {No Action},  A_i ⊂ A
Problem Formulation
Purely Parallel Subsumption Architecture (PPSSA)
T = [ index(1) index(2) … index(m) ]^T,  m ≤ n
index(i) = j indicates that B_j is in the i-th layer.
Example: T = [ Wandering BallCollection ObstacleAvoidance ]
• Different behaviors can be excited at the same time.
• Higher behaviors can suppress lower ones.
• The highest excited behavior becomes the controlling behavior.
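A minimal sketch of this arbitration scheme (function and behavior names are hypothetical, not from the paper): every behavior proposes an action or "No Action", and the highest layer that proposes an action suppresses all lower ones.

```python
# Sketch of purely parallel subsumption arbitration (hypothetical names).
# Each layer's behavior maps the state to an action or to None ("No Action");
# the highest layer that proposes an action suppresses all lower ones.
NO_ACTION = None

def arbitrate(structure, state):
    """structure: list of behaviors ordered from lowest to highest layer."""
    controlling = NO_ACTION
    for behavior in structure:           # higher layers come later...
        action = behavior(state)
        if action is not NO_ACTION:      # ...and suppress lower ones
            controlling = action
    return controlling

# Toy behaviors for a corridor: wander always moves forward,
# avoid_obstacles overrides only when a wall is close.
def wander(state):
    return "forward"

def avoid_obstacles(state):
    return "turn" if state["wall_distance"] < 1.0 else NO_ACTION

structure = [wander, avoid_obstacles]    # avoid_obstacles is the higher layer
```

With this ordering, avoid_obstacles only takes control when it is excited; otherwise the lower-layer wander behavior drives the actuators.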
Problem Formulation
Reinforcement Signal and the Agent's Value Function

R = (1/N) Σ_{i=1}^{N} r_i

V_T = E^π[ (1/N) Σ_{t=1}^{N} r_t | agent with the structure T and set of behaviors {B_i}, i = 1, …, n ]
    = E^π[ R | agent with the structure T and set of behaviors {B_i}, i = 1, …, n ]
• This function states the value of using a set of behaviors in a specific structure.
• We want to maximize the agent's value function.
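The value function above can be estimated by simply averaging the reinforcement over one run; a minimal Monte-Carlo sketch (the function name is an illustrative assumption):

```python
# Monte-Carlo sketch of the value function above: the empirical value of a
# fixed (structure, behaviors) pair is the average reinforcement per step.
def empirical_value(rewards):
    """R = (1/N) * sum_i r_i for one run of N steps."""
    return sum(rewards) / len(rewards)
```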
Problem Formulation
Design as an Optimization

• Structure Learning: finding the best structure given a set of behaviors, using learning
  T* = argmax_T V_T
• Behavior Learning: finding the best behaviors given the structure, using learning
  {B_i}* = argmax_{B_i} V_T
• Concurrent Behavior and Structure Learning
  (T*, {B_i}*) = argmax_{T, {B_i}} V_T
• Behavior Evolution: finding the best behaviors given the structure, using evolution
  {B_i}* = argmax_{B_i} V_T
• Behavior Evolution and Structure Learning
  (T*, {B_i}*) = argmax_{T, {B_i}} V_T
Structure Learning
manipulate the world
build maps
explore
locomote
avoid obstacles
Behavior Toolbox
The agent wants to learn how to arrange these behaviors in order to get maximum reward from its environment (or tutor).
Structure Learning
1. explore becomes the controlling behavior and suppresses avoid obstacles.
2. The agent hits a wall!
Structure Learning
The tutor (environment) gives explore a punishment for being in that position of the structure.
Structure Learning
"explore" is not a very good behavior for the highest position of the structure, so it is replaced by "avoid obstacles".
Structure Learning
Challenging Issues

• Representation: How should the agent represent the knowledge gathered during learning?
– Sufficient (the concept space should be covered by the hypothesis space)
– Generalization capability
– Tractable (small hypothesis space)
– Well-defined credit assignment
• Hierarchical Credit Assignment: How should the agent assign credit to the different behaviors and layers in its architecture?
– If the agent receives a reward/punishment, how should we reward/punish the structure of the agent?
• Learning: How should the agent update its knowledge when it receives the reinforcement signal?
Structure Learning
Overcoming Challenging Issues

• Our approach is to define a representation that allows decomposing the agent's value function into simpler components.
• The structure itself can provide a lot of clues.
Structure Learning
Zero Order Representation

ZO Value Table in the agent's mind:
Higher layer: avoid obstacles (0.8), explore (0.7), locomote (0.4)
Lower layer: avoid obstacles (0.6), explore (0.9), locomote (0.4)
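A sketch of how such a ZO value table could be stored and used, with the values from the example above; the greedy per-layer argmax is an illustrative simplification (the original may impose extra constraints, e.g. not reusing a behavior across layers):

```python
# Sketch: a Zero Order (ZO) value table maps (layer, behavior) -> learned value.
# Values mirror the slide's example: "explore" looks best in the lower layer,
# "avoid obstacles" in the higher one.
zo_table = {
    ("higher", "avoid obstacles"): 0.8,
    ("higher", "explore"): 0.7,
    ("higher", "locomote"): 0.4,
    ("lower", "avoid obstacles"): 0.6,
    ("lower", "explore"): 0.9,
    ("lower", "locomote"): 0.4,
}

def best_structure(table):
    """Pick, for each layer, the behavior with the highest ZO value
    (a greedy sketch of argmax_T V_T under the decomposition)."""
    best = {}
    for (layer, behavior), value in table.items():
        if layer not in best or value > best[layer][1]:
            best[layer] = (behavior, value)
    return {layer: b for layer, (b, _) in best.items()}
```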
Structure Learning
Zero Order Representation - Value Function Decomposition

V_T = E^π[R] = E^π[ (1/N) Σ_{t=1}^{N} r_t ]
    = E^π[ (1/N) Σ_t r_t ∧ ("L_1 is controlling" ∨ "L_2 is controlling" ∨ … ∨ "L_m is controlling") ]
    = E^π[ (1/N) Σ_t r_t ∧ "L_1 is controlling" ] + … + E^π[ (1/N) Σ_t r_t ∧ "L_m is controlling" ]
    = E^π[ (1/N) Σ_t r_t | L_1 is controlling ] · P(L_1 is controlling)
    + E^π[ (1/N) Σ_t r_t | L_2 is controlling ] · P(L_2 is controlling)
    + … + E^π[ (1/N) Σ_t r_t | L_m is controlling ] · P(L_m is controlling)
Structure Learning
Zero Order Representation - Value Function Decomposition

E^π[ (1/N) Σ_t r_t | L_i is controlling ]
    = Σ_{j=1}^{n} P(B_ij | L_i) · E^π[ (1/N) Σ_t r_t | B_ij is the controlling behavior in L_i ]
    = Σ_{j=1}^{n} P(B_ij | L_i) · V_ij,   i = 1, …, m

so that

V_T = Σ_{i=1}^{m} Σ_{j=1}^{n} P(B_ij | L_i is controlling) · V_ij · P(L_i)

Here V_T is the agent's value function, the V_ij are the ZO components, and Σ_j P(B_ij | L_i) V_ij is the value of layer L_i.
Structure Learning
Zero Order Representation - Value Function Decomposition

T* = argmax_T V_T = argmax_T Σ_{i=1}^{m} Σ_{j=1}^{n} P(B_ij | L_i is controlling) · V_ij · P(L_i)
Structure Learning
Zero Order Representation - Credit Assignment and Value Updating

• The controlling behavior is the only behavior responsible for the current reinforcement signal.

Ṽ_ij = P(B_ij | L_i) · V_ij · P(L_i is controlling)

Ṽ_{ij,n+1} = (1 − α_n) Ṽ_{ij,n} + α_n · ["B_ij is active at time step n"] · ["L_i is controlling at time step n"] · r_n
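A minimal sketch of this update rule (names hypothetical): the two indicator factors mean that only the (layer, behavior) pair that was active and controlling moves toward the received reinforcement, so we can update just that table entry.

```python
# Sketch of the ZO value update: only the (layer, behavior) pair that was
# both active AND controlling at step n is updated; for that entry the two
# indicators are 1, giving V_new = (1 - alpha) * V_old + alpha * r.
def update_zo(v_table, controlling_key, reward, alpha=0.1):
    """v_table: dict (layer, behavior) -> value estimate.
    controlling_key: the (layer, behavior) that controlled the agent."""
    old = v_table[controlling_key]
    v_table[controlling_key] = (1 - alpha) * old + alpha * reward
    return v_table

# The agent hits a wall while "explore" controls the higher layer:
v = {("higher", "explore"): 0.9}
update_zo(v, ("higher", "explore"), reward=-1.0, alpha=0.5)
```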
Behavior Co-evolution
Motivations

(+)
• Learning can get trapped in local maxima of the objective function; evolutionary methods have a better chance of finding the global maximum.
• Learning is sensitive (POMDP, non-Markov, …).
• The objective function may not be well-defined in robotics.

(−)
• Evolutionary robotics methods are usually slow.
– Fast changes of the environment
• Non-modular controllers
– Monolithic
– No reusability
Behavior Co-evolution
Ideas

• Use evolution to search the difficult, big part of the parameter space.
– The behaviors' parameter space is usually the bigger one.
• Use learning for fast responses.
– The structure's parameter space is usually the smaller one.
– A change in the structure results in different agent behavior.
• Evolve behaviors separately.
– Modularity
– Reusability
Behavior Co-evolution
Agent
Behavior Pool 1
Behavior Pool 2
Behavior Pool n
We have different behavior (genetic) pools
Behavior Co-evolution
One behavior is selected randomly from each pool; we want to assess its fitness.
The agent interacts with the environment using an architecture built from the selected behaviors, and tries to maximize its reward.
Based on the agent's performance, a fitness is assigned to it.
Behavior Co-evolution
Fitness Sharing

• We can evaluate the fitness of the agent after its interaction with the environment.
• How can we assess the fitness of each behavior based on the fitness of the agent? (Remember that we have separate behavior pools.)
• We approximate it!

V_B = E[f_B] = E[ (1/K) Σ_{t ∈ last K episodes} r_t | agent with B ∈ {B}_{last K episodes} ]

f_{B_ij} = (1/N) Σ_{last K episodes in which B_ij participated} (agent's fitness)
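A sketch of this uniform fitness-sharing scheme (the helper names and the choice of data structures are assumptions): every behavior that participated in an episode receives the whole agent's fitness for that episode, and a behavior's fitness estimate is the average over the last K such episodes.

```python
from collections import defaultdict, deque

# Sketch of uniform fitness sharing: each participating behavior is credited
# with the full agent fitness; its own fitness is the mean over the last K
# episodes in which it took part.
def make_history(K):
    return defaultdict(lambda: deque(maxlen=K))  # behavior -> last K fitnesses

def record_episode(history, participating_behaviors, agent_fitness):
    for b in participating_behaviors:
        history[b].append(agent_fitness)

def behavior_fitness(history, b):
    h = history[b]
    return sum(h) / len(h)

history = make_history(K=3)
record_episode(history, ["wander", "avoid"], agent_fitness=1.0)
record_episode(history, ["wander"], agent_fitness=0.0)
```

The deque with maxlen=K implements the "last K episodes" window: older credits fall off automatically as new episodes are recorded.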
Behavior Co-evolution
Each behavior's genetic pool has the conventional evolutionary operators/phenomena:
– Selection
– Genetic operators
• Crossover
• Mutation
– Hard: replacement
– Soft: perturbation, B_ij^new = B_ij^old + X, with X a small random perturbation
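The two mutation flavors can be sketched as follows, assuming (as an illustration, not the paper's encoding) that a behavior's genome is a flat list of floats:

```python
import random

# Sketch of the two mutation flavors on a behavior's parameter genome
# (flat-list-of-floats encoding is an assumption):
# "hard" mutation replaces one gene outright, "soft" mutation perturbs all.
def hard_mutation(genome, rng):
    mutant = list(genome)
    i = rng.randrange(len(mutant))
    mutant[i] = rng.uniform(-1.0, 1.0)  # replacement with a fresh random value
    return mutant

def soft_mutation(genome, rng, sigma=0.05):
    # B_new = B_old + X, with X a small Gaussian perturbation per gene
    return [g + rng.gauss(0.0, sigma) for g in genome]
```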
Multi-Robot Object Lifting Problem
• Three robots want to lift an object using only their own local sensors
– No central control
– No communication
– Local sensors
• Objectives
– Reaching the prescribed height
– Keeping the tilt angle small
A group of robots lifts a bulky object.
Multi-Robot Object Lifting Problem
[Simulation videos of the object-lifting task]
Conclusion
• Hybridization of evolution and learning
• Evolution and learning search different subspaces of the solution space
• Results competitive with human designs
Important Questions
• Is it possible to benefit from the information gathered during learning?
– Each agent learns an approximately good structure arrangement, yet we do not use it at all!
• Is there any other way of sharing the agent's fitness among behaviors?
– Currently, we share it uniformly among all behaviors.
It seems that the answer to these questions is positive!
Future Research
• Can we decompose other problems (not just hierarchical behavior-based systems) similarly?!
– Learning and evolution
– Fast and deep
– Different subspaces of the solution space
• Other ways of fitness sharing
– Low bias
– Low variance
Questions?!