Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N. Araabi

Post on 20-Dec-2015


Learning and Evolution in Hierarchical Behavior-based Systems

Amir massoud Farahmand

Advisor:

Majid Nili Ahmadabadi

Co-advisors:

Caro Lucas – Babak N. Araabi


University of Tehran - Dept. of ECE 2

Motivation

Machines (e.g. robots) are moving from labs to homes, factories, ...

Machines face:
• Unknown environment/body: an [exact] model of the environment/body is not known
• Non-stationary environment/body: changing environments (offices, houses, streets, and almost everywhere) and aging
• The designer may not know how to benefit from every aspect of her agent/environment


Motivation

Difficulty of the design process:
• Machines see different things
• Machines interact differently
• The designer is not a machine!

I know what I want!

Our goal: Automatic design of intelligent machines


Research Specification

Goal: Automatic design of intelligent robots

Architecture: Hierarchical behavior-based architectures.

An objective performance measure is available (reinforcement signal):
[Agent] Did I perform it correctly?!
[Tutor] Yes/No! (or 0.3)


Behavior-based Approach to AI

The behavior-based approach is a successful alternative to the classical AI approach: no {Abstraction, Planning, Deduction, …}

Behavioral (activity) decomposition, as opposed to functional decomposition

Behavior: Sensor->Action (Direct link between perception and action)


Behavioral Decomposition

[Figure: behavior layers placed between sensors and actuators: manipulate the world, build maps, explore, avoid obstacles, locomote.]


Behavior-based Design

• Robust: not sensitive to the failure of a particular part of the system; no need for precise perception, as there is no modelling
• Reactive: fast response, as there is no long route from perception to action
• No explicit representation


How should we DESIGN a behavior-based system?!


Behavior-based System Design Methodologies

Hand Design
• Common almost everywhere
• Complicated: may even be infeasible for complex problems
• Even if a working system can be found, it is probably not optimal

Evolution
• Good solutions can be found
• Biologically feasible
• Time consuming
• Not fast at producing new solutions

Learning
• Biologically feasible
• Learning is essential for the lifetime survival of the agent


Taxonomy of Design Methods

Behavior-based System Design
• Learning
  • Structure (hierarchy) learning
  • Behavior learning
• Evolution
  • Co-evolution of behaviors
• Hybridization of Evolution and Learning
  • Memetic Algorithm


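Problem Formulation: Behaviors

\[
\begin{aligned}
& B_i : S_i \to A'_i, \qquad i = 1, \dots, n \\
& A'_i = A_i \cup \{\text{No Action}\} \\
& S_i = \{\, s_i \mid s_i = M_i(s);\ s \in S \,\} \\
& M_i : S \to S_i
\end{aligned}
\]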


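Problem Formulation: Purely Parallel Subsumption Architecture (PPSSA)

\[
T = [\, B_{index(1)} \;\; B_{index(2)} \;\; \cdots \;\; B_{index(m)} \,]^{\top}, \qquad m \le n
\]

\[
index(i) = j : \text{ indicates that } B_j \text{ is in the } i^{\text{th}} \text{ layer}
\]

Example: a structure T built from the behaviors ObstacleAvoidance, BallCollection, and Wandering.

• Different behaviors become excited
• Higher behaviors can suppress lower ones
• Controlling behavior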


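Problem Formulation: Reinforcement Signal and the Agent's Value Function

\[
R = \frac{1}{N} \sum_{i=1}^{N} r_i
\]

\[
\begin{aligned}
V_T &= E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, \text{agent with the structure } T \text{ and set of behaviors } B_i\ (i = 1, \dots, n) \right] \\
    &= E\left[ R \,\middle|\, \text{agent with the structure } T \text{ and set of behaviors } B_i\ (i = 1, \dots, n) \right]
\end{aligned}
\]

• This function states the value of using a set of behaviors in a specific structure.
• We want to maximize the agent's value function.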
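Since the agent's value function is an expectation of the average reward, it can be approximated by sampling. The sketch below averages the per-episode mean reward over several runs; `run_episode` is a hypothetical callback standing in for "run the agent with a fixed structure and behavior set and return its rewards", and is not part of the original slides.

```python
def average_reward(rewards):
    """R = (1/N) * sum of the rewards received during one episode."""
    return sum(rewards) / len(rewards)


def estimate_value(run_episode, n_episodes):
    """Sample-mean estimate of the agent's value: average R over episodes.

    run_episode is an assumed callback returning the list of rewards
    collected by an agent with a fixed structure and behavior set.
    """
    total = 0.0
    for _ in range(n_episodes):
        total += average_reward(run_episode())
    return total / n_episodes
```

With a deterministic `run_episode` the estimate equals that episode's average reward; with a stochastic one it only approaches the expectation as the number of episodes grows.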


Problem Formulation: Design as an Optimization

• Structure Learning: finding the best structure given a set of behaviors, using learning
\[ T^{*} = \arg\max_{T} V_T \]

• Behavior Learning: finding the best behaviors given the structure, using learning
\[ B_i^{*} = \arg\max_{B_i} V_T \]

• Concurrent Behavior and Structure Learning
\[ (T^{*}, B_i^{*}) = \arg\max_{T,\, B_i} V_T \]

• Behavior Evolution: finding the best behaviors given the structure, using evolution
\[ B_i^{*} = \arg\max_{B_i} V_T \]

• Behavior Evolution and Structure Learning
\[ (T^{*}, B_i^{*}) = \arg\max_{T,\, B_i} V_T \]
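One naive way to read the structure-selection objective is as a search over orderings of the available behaviors. The sketch below enumerates every ordering and keeps the one with the highest value. `value_of` is a hypothetical scoring callback (for example an empirical estimate of the agent's value), and exhaustive search is used only for illustration: it grows factorially with the number of behaviors, which is one reason to learn the structure instead.

```python
from itertools import permutations


def best_structure(behaviors, n_layers, value_of):
    """Return the ordering of n_layers behaviors that maximizes value_of.

    behaviors: names of the available behaviors.
    value_of: assumed callback mapping a tuple (highest layer first)
              to a number; higher is better.
    """
    return max(permutations(behaviors, n_layers), key=value_of)
```

For example, a scoring function that only rewards structures with obstacle avoidance on top would select an ordering beginning with that behavior.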



Learning in Behavior-based Systems

There are a few studies on behavior-based learning (Mataric, Mahadevan, Maes, and ...)

… but there is no deep investigation of it (especially no mathematical formulation)!

And most of them use flat architectures.


Learning in Behavior-based Systems

We design:
• Structure (hierarchy)
• Behavior

We learn:
• Structure Learning: organizing behaviors in the architecture using a behavior toolbox
• Behavior Learning: the correct mapping of each behavior



Structure Learning

Behavior Toolbox: manipulate the world, build maps, explore, locomote, avoid obstacles

The agent wants to learn how to arrange these behaviors in order to get the maximum reward from its environment (or tutor).



Structure Learning

1. explore becomes the controlling behavior and suppresses avoid obstacles.

2. The agent hits a wall!


Structure Learning

The tutor (environment) punishes explore for being in that place in the structure.


Structure Learning

“explore” is not a very good behavior for the highest position of the structure, so it is replaced by “avoid obstacles”.


Structure Learning: Challenging Issues

• Representation: how should the agent represent knowledge gathered during learning?
  • Sufficient (the concept space should be covered by the hypothesis space)
  • Generalization capability
  • Tractable (small hypothesis space)
  • Well-defined credit assignment
• Hierarchical Credit Assignment: how should the agent assign credit to different behaviors and layers in its architecture? If the agent receives a reward/punishment, how should we reward/punish the structure of the agent?
• Learning: how should the agent update its knowledge when it receives a reinforcement signal?


Structure Learning: Overcoming Challenging Issues

Our approach is to define a representation that allows the agent's value function to be decomposed into simpler components.

Decomposing the behavior of a multi-agent system into simpler components may improve our insight into the problem under investigation.

The structure can provide a lot of clues.


Structure Learning

• Zero Order Representation: the value of each behavior in each layer
• First Order Representation: the value of the order (higher/lower) of behaviors in the structure


Structure Learning: Zero Order Representation

[Figure: ZO value table in the agent's mind. Each layer (higher, lower) stores a value for each candidate behavior; the values shown are avoid obstacles (0.8), avoid obstacles (0.6), explore (0.7), explore (0.9), locomote (0.4), and locomote (0.4).]


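Structure Learning: Zero Order Representation, Value Function Decomposition

In the derivation below, each quoted event stands for its indicator (1 if the event holds, 0 otherwise).

\[
\begin{aligned}
V_T &= E[R] = E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \right] \\
&= E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \left( \text{“}L_1 \text{ is controlling”} + \text{“}L_2 \text{ is controlling”} + \cdots + \text{“}L_m \text{ is controlling”} \right) \right] \\
&= E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t\, \text{“}L_1 \text{ is controlling”} \right] + \cdots + E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t\, \text{“}L_m \text{ is controlling”} \right] \\
&= E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, L_1 \text{ is controlling} \right] P(L_1 \text{ is controlling}) \\
&\quad + E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, L_2 \text{ is controlling} \right] P(L_2 \text{ is controlling}) \\
&\quad + \cdots \\
&\quad + E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, L_m \text{ is controlling} \right] P(L_m \text{ is controlling})
\end{aligned}
\]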


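Structure Learning: Zero Order Representation, Value Function Decomposition

\[
\begin{aligned}
E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, L_i \text{ is controlling} \right]
&= \sum_{j=1}^{n} E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, B_j \text{ is the controlling behavior in } L_i \right] P(B_j \mid L_i) \\
&= \sum_{j=1}^{n} V_{ij}\, P(B_j \mid L_i), \qquad i = 1, \dots, m
\end{aligned}
\]

\[
V_T = \sum_{i=1}^{m} \sum_{j=1}^{n} V_{ij}\, P(B_j \mid L_i \text{ is controlling})\, P(L_i)
\]

Here V_T is the agent's value function, the V_ij are the ZO components, and the inner sum over j is the layer's value.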


Structure Learning: Zero Order Representation, Value Function Decomposition

\[
T^{*} = \arg\max_{T} V_T
      = \arg\max_{T} \sum_{i=1}^{m} \sum_{j=1}^{n} V_{ij}\, P(B_j \mid L_i \text{ is controlling})\, P(L_i)
\]


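Structure Learning: Zero Order Representation, Credit Assignment and Value Updating

The controlling behavior is the only behavior responsible for the current reinforcement signal.

\[
\tilde{V}_{ij} = V_{ij}\, P(B_j \mid L_i \text{ is controlling})\, P(L_i)
\]

\[
\tilde{V}_{ij,n+1} = (1 - \alpha_{ij,n})\, \tilde{V}_{ij,n} + \alpha_{ij,n}\, r_n
\quad \text{if “}B_j \text{ is active at time step } n\text{” and “}L_i \text{ is controlling at time step } n\text{”}
\]

where \alpha_{ij,n} is a step size.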
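A minimal sketch of how zero-order structure learning could be coded, assuming a running-average update with a fixed step size and a greedy read-out of the structure from the value table. The function names, the dictionary layout, and the greedy read-out are illustrative assumptions, not the authors' implementation.

```python
def zo_update(values, layer, behavior, reward, step_size=0.1):
    """Move the zero-order value of (layer, behavior) toward the reward.

    Only the controlling behavior, in the layer it occupied, is credited.
    """
    key = (layer, behavior)
    old = values.get(key, 0.0)
    values[key] = (1.0 - step_size) * old + step_size * reward
    return values[key]


def greedy_structure(values, behaviors, n_layers):
    """Assumed read-out: fill layers 0..n_layers-1 (0 = highest) with the
    best-valued behavior that has not been placed yet."""
    remaining = list(behaviors)
    structure = []
    for layer in range(n_layers):
        best = max(remaining, key=lambda b: values.get((layer, b), 0.0))
        structure.append(best)
        remaining.remove(best)
    return structure
```

In the wall-hitting example from the earlier slides, punishing "explore" in the top layer lowers its value there, so the greedy read-out places "avoid obstacles" on top instead.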


Structure Learning: First Order Representation


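Structure Learning: First Order Representation

\[
V_T = E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \right]
    = \sum_{i=1}^{m} E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, B_{index(i)} \text{ is controlling} \right] P(B_{index(i)} \text{ is controlling})
\]

\[
\begin{aligned}
E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, B_k \text{ is controlling} \right]
&= E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, B_k \text{ is controlling and nobody else is active} \right] \\
&\quad + \sum_{j;\ B_k >_T B_j} E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, B_k \text{ is controlling and } B_j \text{ is the next active behavior} \right] \\
&= V_0(k) + \sum_{j;\ B_k >_T B_j} V(k > j)
\end{aligned}
\]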


Structure Learning: First Order Representation

\[
V_T = \sum_{i=1}^{m} \left[ V_0(index(i)) + \sum_{j=1}^{i-1} V(index(i) > index(j)) \right] P(B_{index(i)} \text{ is controlling})
\]


Structure Learning: First Order Representation, Credit Assignment

If only one behavior becomes active, we should update V0(i). If two or more behaviors become active, we must update V(i>j), where ‘i’ is the index of the controlling behavior and ‘j’ is the index of the next active behavior.
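The credit-assignment rule can be sketched as a small function that decides which first-order entry to update at each step. The running-average form, the fixed step size, and the data layout (two dictionaries, with the active behaviors listed from highest to lowest layer) are assumptions made for illustration.

```python
def fo_update(v0, v_pair, active, controlling, reward, step_size=0.1):
    """Apply the first-order credit-assignment rule for one time step.

    active: behaviors excited at this step, highest layer first.
    controlling: the behavior that took control (the first entry of active).
    Returns a tag describing which entry was updated.
    """
    if len(active) == 1:
        # Only the controlling behavior is active: update V0(i).
        old = v0.get(controlling, 0.0)
        v0[controlling] = (1.0 - step_size) * old + step_size * reward
        return ("V0", controlling)
    # Two or more are active: update V(i>j), j being the next active one.
    next_active = active[active.index(controlling) + 1]
    key = (controlling, next_active)
    old = v_pair.get(key, 0.0)
    v_pair[key] = (1.0 - step_size) * old + step_size * reward
    return ("V>", key)
```

A punishment received while "explore" suppresses "avoid obstacles" therefore lowers the value of ordering explore above avoid obstacles, without touching the stand-alone value of either behavior.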


A Break!


Introduction to Experiments

• Abstract problem
• Multi-robot object lifting problem (I will only discuss this problem now.)

A group of robots lifts a bulky object.


Experiments: Structure Learning

[Figure: reward versus episode (episodes 0 to 50, reward axis from -50 to 150) for ZO, FO, the hand-designed structure, and a random structure.]

Comparison of the average gained reward of two structure learning methods (Zero Order (ZO) and First Order (FO)), a hand-designed structure, and a random structure for the object lifting problem.



Behavior Learning

No more behavior repertoire assumption. All we know:
• Sensor/actuator dimensions
• Reinforcement signal


Behavior Learning: Challenging Issues

How should behaviors cooperate with each other to maximize the performance of the agent?

How should we assign credit to behaviors of the architecture?

How should each behavior update its knowledge?


Behavior Learning

1. B2, B3, and B4 become excited

2. B4 takes control

3. Punishment!!!

?!


Behavior Learning

Augmenting the action space with a pseudo-action named NoAction (NA)

NA does nothing and lets lower behaviors take control

1. B2, B3, and B4 become excited

2. B4 proposes NA

3. B3 proposes an action and takes control

4. Reward!
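The arbitration with the NoAction pseudo-action can be sketched as a single pass down the hierarchy. The function name, the string constant for NA, and the dictionary of proposals are assumptions chosen for illustration.

```python
NO_ACTION = "NA"


def arbitrate(structure, proposals):
    """Return (controlling_behavior, action) for one time step.

    structure: behavior names ordered from highest to lowest layer.
    proposals: maps each excited behavior to the action it proposes
               (possibly NO_ACTION); behaviors that are not excited
               are simply absent.
    """
    for behavior in structure:
        action = proposals.get(behavior)
        if action is None or action == NO_ACTION:
            continue  # not excited, or defers to the lower layers
        return behavior, action
    return None, NO_ACTION  # nobody proposed a real action
```

With the structure `["B4", "B3", "B2", "B1"]` and proposals `{"B4": "NA", "B3": "turn_left", "B2": "forward"}`, B4 defers and B3 takes control, matching the four steps listed on this slide.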


Behavior Learning

• NA lets behaviors cooperate
• How should we force them to cooperate correctly?!
• Hierarchical Credit Assignment Problem

Boolean-like algebra for logically expressible multi-agent systems

\[
A_1 \wedge (A_2 \vee A_3) = (A_1 \wedge A_2) \vee (A_1 \wedge A_3)
\]


Behavior Learning

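[Figure: credit assignment over the structure T given the reinforcement R. The behaviors are grouped into upper behaviors B_u (which selected NA), the controlling behavior B^*, lower behaviors B_l, and behaviors that are not excited; several entries are marked "unknown".]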


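Behavior Learning: Optimality

\[
\begin{aligned}
r_i^{*} &= E\left[ R(s) \,\middle|\, \text{reward is achieved by the contribution of “}B_i\text{”} \right] \\
        &= E\left[ R(s) \,\middle|\, B_i,\ s \in S_i^{*} \right] \\
        &= \int_{s \in S_i^{*}} R(s)\, p(B_i \text{ is excited in } s)\, p(s \mid s \in S_i^{*})\, ds
\end{aligned}
\]

The internal states of different behaviors are excited in different regions.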


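Behavior Learning: Optimality

\[
Q_i(s_i, a_i) = \int_{s \in S_i} R(s)\, p(B_i \text{ is excited in } s,\ B_i \text{ selects } a_i)\, p(s \mid s \in S_i)\, ds
\]

\[
Q_i(s_i, NA) = \int_{s \in S_i} R(s)\, p(B_i \text{ is excited in } s,\ B_i \text{ selects } NA)\, p(s \mid s \in S_i)\, ds
\]

\[
Q_i(s_i, NA) \ge Q_i(s_i, a_i), \qquad \forall a_i \in A_i
\]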


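Behavior Learning: Value Updating

For the case of immediate reward:

\[
Q_{i,k+1}(s_i, a_i) = \left( 1 - \alpha_{k}(s_i, a_i) \right) Q_{i,k}(s_i, a_i) + \alpha_{k}(s_i, a_i)\, r(s_i)
\]
(B_i is the controlling behavior in s_i and selects a_i)

\[
Q_{j,k+1}(s_j, NA) = \left( 1 - \alpha_{k}(s_j, NA) \right) Q_{j,k}(s_j, NA) + \alpha_{k}(s_j, NA)\, r(s_i)
\]
(B_i is the controlling behavior, and the B_j's with B_j >_T B_i are excited in s_j and select NA)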
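For the immediate-reward case, one way to code the update is to walk down the hierarchy, credit every excited upper behavior that chose NA, credit the controlling behavior, and then stop. This is a sketch under assumptions: a fixed step size replaces the state-action-dependent one, and the table layout and names are hypothetical.

```python
from collections import defaultdict

NO_ACTION = "NA"


def update_q(q, structure, states, choices, reward, step_size=0.1):
    """Immediate-reward update for one time step.

    q: defaultdict(float) keyed by (behavior, state, action).
    structure: behaviors from highest to lowest layer.
    states: each behavior's own (internal) state at this step.
    choices: action selected by each excited behavior; absent = not excited.
    Returns the list of keys that were updated.
    """
    updated = []
    for behavior in structure:
        if behavior not in choices:
            continue  # not excited, receives no credit
        action = choices[behavior]
        key = (behavior, states[behavior], action)
        q[key] = (1.0 - step_size) * q[key] + step_size * reward
        updated.append(key)
        if action != NO_ACTION:
            break  # controlling behavior reached; lower ones are not credited
    return updated
```

The upper behaviors share the controlling behavior's reward because their choice of NA is what allowed it to take control; lower behaviors are left untouched.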


Behavior Learning: Value Updating

For the general return case, we should use Monte Carlo estimation.

Bootstrapping methods are not applicable.
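A minimal sketch of a Monte Carlo update for the general return case: after an episode ends, each credited entry is moved toward the return that followed it, with no bootstrapping from other value estimates. The every-visit form, the discount factor, and the fixed step size are assumptions for illustration.

```python
def monte_carlo_update(q, visited, rewards, discount=1.0, step_size=0.1):
    """Every-visit Monte Carlo update after one finished episode.

    visited[t]: the (behavior, state, action) key credited at step t.
    rewards[t]: the reward received right after step t.
    Each visited entry is moved toward the return observed from step t on.
    """
    g = 0.0
    for key, reward in zip(reversed(visited), reversed(rewards)):
        g = reward + discount * g  # return from this step onward
        q[key] = (1.0 - step_size) * q.get(key, 0.0) + step_size * g
    return q
```

Because the target is the full observed return, the update has to wait until the end of the episode.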


Concurrent Behavior and Structure Learning

Applying:
• Behavior Learning: state-action mappings
• Structure Learning: hierarchy


Experiments: Behavior Learning

[Figure: average gained reward (axis from 5 to 30) versus episodes (0 to 50) for Str. Learning, Beh. Learning, and Beh./Str. Learning.]

Reward comparison between structure learning, behavior learning, and concurrent behavior/structure learning methods for the object lifting task.


Experiments: Behavior Learning

[Figure: two panels, learning phase and testing phase, each plotting probability (0 to 1) against average gained reward for Random, Hand-designed, Str. Learning, Beh. Learning, and Beh./Str. Learning.]


Experiments: Behavior Learning

[Figure: two panels, learning phase and testing phase, each plotting average gained reward against the percentile of the superior results (0 to 1) for Hand-designed, Str. Learning, Beh. Learning, and Beh./Str. Learning.]


Experiments: Behavior Learning

A sample trajectory showing the position of the robot-object contact points, the tilt angle of the object during lifting, and the controlling behavior of each robot at each time step, after sufficient structure/behavior learning. The numbers in the lowest diagram correspond to behaviors as follows: 0 (No Behavior), 1 (Push More), 2 (Don’t Go Fast), 3 (Stop), 4 (Hurry up), 5 (Slow down).

[Figure: three stacked plots against time (0 to 1.5 sec) for robot 1, robot 2, and robot 3: height, tilt angle, and controlling behaviors.]



Behavior Co-evolution: Motivations

+
• Learning can get trapped in local maxima of the objective function
• Learning is sensitive (POMDP, non-Markov, …)
• Evolutionary methods have a better chance of finding the global maximum of the objective function
• The objective function may not be well-defined in robotics

-
• Evolutionary robotics methods are usually slow (fast changes of the environment)
• Non-modular controllers: monolithic, no reusability


Behavior Co-evolution: Motivations

• Use evolution to search the difficult and big part of the parameter space: the behaviors' parameter space is usually the bigger one
• Use learning for fast responses: the structure's parameter space is usually the smaller one, and a change in the structure results in a different agent behavior
• Evolve behaviors separately (modularity and reusability)


Behavior Co-evolution

Agent

Behavior Pool 1

Behavior Pool 2

Behavior Pool n

Evolve each kind of behavior in its own genetic pool


Behavior Co-evolution: Fitness Sharing

Fitness of the agent → fitness of each behavior?!

Fitness sharing:
• Uniform
• Value-based
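The slides name the two sharing schemes without defining them, so the sketch below is only one plausible reading: uniform sharing splits the agent's fitness equally among its behaviors, while value-based sharing splits it in proportion to a per-behavior value (for instance one learned during structure learning). Both function names and the proportional rule are assumptions.

```python
def uniform_sharing(agent_fitness, behaviors):
    """Assumed uniform scheme: every behavior gets an equal share."""
    share = agent_fitness / len(behaviors)
    return {b: share for b in behaviors}


def value_based_sharing(agent_fitness, behavior_values):
    """Assumed value-based scheme: shares proportional to each behavior's
    non-negative value; falls back to equal shares if all values are zero."""
    weights = {b: max(v, 0.0) for b, v in behavior_values.items()}
    total = sum(weights.values())
    if total == 0.0:
        return uniform_sharing(agent_fitness, list(weights))
    return {b: agent_fitness * w / total for b, w in weights.items()}
```

Under this reading, a behavior that rarely contributes to reward receives little credit even when the agent as a whole performs well.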


Behavior Co-evolution

Each behavior’s genetic pool:
• Selection
• Genetic operators
  • Crossover
  • Mutation: hard replacement, soft perturbation

\[
B_{ji}^{new} = (1 - X)\, B_{ji}^{old} + X\, B_{ki}^{old}
\]
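The operators are only named on the slide, so the following is a sketch under assumptions: a behavior is encoded as a flat list of real-valued genes, crossover picks each gene from one of two parents, hard replacement resamples a gene uniformly, and soft perturbation adds small Gaussian noise. The encoding, rates, and ranges are all hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from parent_a or parent_b."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def hard_replacement(genes, rate, low, high, rng):
    """Mutation that replaces a gene, with probability rate, by a fresh
    uniform sample from [low, high]."""
    return [rng.uniform(low, high) if rng.random() < rate else g for g in genes]


def soft_perturbation(genes, rate, sigma, rng):
    """Mutation that adds zero-mean Gaussian noise (std sigma) to a gene
    with probability rate."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g for g in genes]
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing design methods across generations.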



Memetic Algorithm

We waste learned knowledge after each agent’s lifetime

A meme is a unit of information that reproduces itself as people exchange ideas

Traditional memetic algorithms:
• Evolutionary method: meme exchange
• Local search: meme refinement

May also be called a hybrid evolutionary algorithm


Memetic Algorithm

Two different interpretations of meme:
• Current hybridization of behavior co-evolution and structure learning
  • Similar to traditional MA
  • Difference from traditional MA: different parameter spaces are being searched
• Meme as a cultural bias


Memetic Algorithm

Experienced individuals store their experiences in the culture in the form of memes.

Newborn individuals get a new meme from the culture.

Structure as a meme


Memetic Algorithm

Agent

Behavior Pool 1

Behavior Pool 2

Behavior Pool n

Meme Pool (Culture)


Memetic Algorithm

Each meme has its own value

Value of the meme is updated using the fitness of the agent

Valuable memes have more chance to be selected for newborn individuals

\[
M : (T_i^{*},\ f_{T_i})
\]

\[
f_{T_i}^{\,n+1} = (1 - \alpha)\, f_{T_i}^{\,n} + \alpha\, f_A, \qquad A : (B, T_i)
\]
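A meme pool along these lines could be sketched as a table from structures to values, updated from agent fitness and sampled with a bias toward valuable memes. The class name, the running-average update, and the shifted-weight sampling rule are assumptions, not the authors' implementation.

```python
import random


class MemePool:
    """Culture: stores structures (memes) together with a running value."""

    def __init__(self, step_size=0.2, rng=None):
        self.values = {}  # tuple of behavior names -> meme value
        self.step_size = step_size
        self.rng = rng or random.Random()

    def store(self, structure, agent_fitness):
        """Update the meme's value using the fitness of the agent that used it."""
        key = tuple(structure)
        old = self.values.get(key, agent_fitness)  # first visit: start at fitness
        self.values[key] = (1 - self.step_size) * old + self.step_size * agent_fitness

    def sample(self):
        """Pick a meme for a newborn; more valuable memes are more likely.

        Values are shifted so that the weights are positive even when
        fitness is negative (an illustrative choice).
        """
        keys = list(self.values)
        lowest = min(self.values.values())
        weights = [self.values[k] - lowest + 1e-6 for k in keys]
        return list(self.rng.choices(keys, weights=weights, k=1)[0])
```

A newborn agent would call `sample()` to obtain its initial structure and then refine it with structure learning during its lifetime, calling `store()` at the end.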


Experiments: Behavior Co-evolution – Structure Learning – Memetic Algorithm

(Object Lifting) Comparison of the fitness averaged over the last five episodes for different design methods: 1) evolution of behaviors (uniform fitness sharing) and learning structure (blue), 2) evolution of behaviors (value-based fitness sharing) and learning structure (black), 3) hand-designed behaviors with learning structure (green), and 4) hand-designed behaviors and structure (red). The dotted lines around the hand-designed cases (3 and 4) show the one-standard-deviation region around the mean performance.

[Figure: fitness (axis from -150 to 350) versus generations (0 to 50). Legend: Structure Learning - Value-based Fitness Sharing; Structure Learning - Uniform Fitness Sharing; Hand-designed Behaviors and Structure; Hand-designed Behavior/Learning Structure.]


Experiments: Behavior Co-evolution – Structure Learning – Memetic Algorithm

(Object Lifting) Comparison of the last-five-episodes average fitness and the lifetime fitness for the uniform fitness sharing co-evolutionary mechanism: 1) evolution of behaviors and learning structure (blue), 2) evolution of behaviors and learning structure benefiting from meme pool bias (black), 3) evolution of behaviors and hand-designed structure (magenta), 4) hand-designed behaviors and learning structure (green), and 5) hand-designed behaviors and structure (red). Solid lines indicate the fitness over the last five episodes of the agent’s lifetime, and dotted lines indicate the agent’s lifetime fitness. Although the final performance of all cases is roughly the same, the lifetime fitness of the memetic-based design is much higher.

[Figure: fitness and lifetime fitness (axis from -200 to 300) versus generations (0 to 50). Legend: Structure Learning - No Meme Pool; Structure Learning - with Meme Pool; Hand-designed Structure/Behavior Evolution; Hand-designed Behaviors/Structure Learning.]


Experiments: Behavior Co-evolution – Structure Learning – Memetic Algorithm

(Object Lifting) Comparison of fitness probability distributions for uniform fitness sharing (). The comparison is between agents that use the meme pool as the initial bias for their structure learning (black), agents that learn the structure from a random initial setting (blue), and agents with a hand-designed structure (magenta). Dotted lines show the distribution of lifetime fitness. A distribution lying further to the right indicates a higher chance of generating very good agents.

[Plot: Probability vs. Fitness, four panels: Generation 1, Generation 5, Generation 20, Generation 50. Curve labels: Meme (M), No Meme (N), Fixed Str. (F).]
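One way to read "further to the right" in the caption above is through an empirical cumulative distribution of agent fitness. The sketch below assumes that reading, which the slides do not confirm. The function names, the threshold, and the fitness samples are hypothetical illustrations, not experimental data.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function F where F(x) is the fraction of samples <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def chance_of_good_agent(samples, threshold):
    """Fraction of agents whose fitness is strictly above `threshold`."""
    return 1.0 - empirical_cdf(samples)(threshold)


# Made-up fitness values for two small populations.
with_meme = [150, 200, 250, 300]
no_meme = [50, 100, 200, 250]

print(chance_of_good_agent(with_meme, 200))
print(chance_of_good_agent(no_meme, 200))
```

A population whose distribution sits further to the right places more of its agents above any given fitness threshold, which is the sense in which it has a higher chance of producing very good agents.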


Experiments: Behavior Co-evolution – Structure Learning – Memetic Algorithm

[Plot: Fitness and Lifetime Fitness vs. Generations (0 to 50). Legend: Structure Learning - with Meme Pool; Structure Learning - No Meme Pool; Hand-designed Behaviors/Structure Learning; Hand-designed Structure/Behavior Evolution.]

(Object Lifting) Comparison of last-five-episodes fitness and lifetime fitness for the value-based fitness sharing co-evolutionary mechanism: 1) evolution of behaviors with structure learning (blue), 2) evolution of behaviors with structure learning biased by the meme pool (black), 3) evolution of behaviors with hand-designed structure (magenta), 4) hand-designed behaviors with structure learning (green), and 5) hand-designed behaviors and structure (red). Solid lines show fitness averaged over the last five episodes of the agent’s lifetime; dotted lines show the agent’s lifetime fitness. Although the final performance is roughly the same in all cases, the lifetime fitness of the memetic-based design is higher.


Experiments: Behavior Co-evolution – Structure Learning – Memetic Algorithm

Figure 13. (Object Lifting) Comparison of fitness probability distributions for value-based fitness sharing (). The comparison is between agents that use the meme pool as the initial bias for their structure learning (black), agents that learn the structure from a random initial setting (blue), and agents with a hand-designed structure (magenta). Dotted lines show the distribution of lifetime fitness. A distribution lying further to the right indicates a higher chance of generating very good agents.

[Plot: Probability vs. Fitness, four panels: Generation 1, Generation 5, Generation 20, Generation 50. Curve labels: Meme (M), No Meme (N), Fixed Str. (F).]


Other Topics

Probabilistic Analysis of PPSSA

Change in the excitation probability

Change in the controlling probability of each layer

Some estimate of learning time

The effect of reinforcement signal uncertainty on the value function and the policy of the agent


Conclusions

Behavior-based System Design

Learning: Structure (hierarchy) learning; Behavior learning

Evolution: Co-evolution of behaviors

Hybridization of Evolution and Learning: Memetic Algorithm


Contributions

Deep and mathematical investigation of behavior-based systems

Tackling the design process from different approaches: Learning, Evolution, Culture-based methods

Structure learning is quite new in hierarchical reinforcement learning


Suggestions for Future Work

Extending the proposed methods to more complex architectures

Automatic extraction of the behaviors’ state space: traditional clustering methods are not suitable

Convergence proof in learning

Automatic Abstraction of Knowledge

Simultaneous low-level and high-level decision making

Investigations on the reinforcement signal design


Thanks!