Post on 20-Dec-2015
Learning and Evolution in Hierarchical Behavior-based Systems
Amir massoud Farahmand
Advisor:
Majid Nili Ahmadabadi
Co-advisors:
Caro Lucas – Babak N. Araabi
University of Tehran - Dept. of ECE 2
Motivation
Machines (e.g. robots) are moving from labs to homes, factories, …
Machines face:
• Unknown environment/body: an [exact] model of the environment/body is not known
• Non-stationary environment/body: changing environments (offices, houses, streets, and almost everywhere) and aging
The designer may not know how to benefit from every aspect of her agent/environment
Motivation
Difficulty of the design process:
• Machines see different things
• Machines interact differently
• The designer is not a machine!
I know what I want!
Our goal: Automatic design of intelligent machines
Research Specification
Goal: Automatic design of intelligent robots
Architecture: Hierarchical behavior-based architectures.
Objective performance measure is available (reinforcement signal):
[Agent] Did I perform it correctly?!
[Tutor] Yes/No! (or 0.3)
Behavior-based Approach to AI
Behavior-based approach as a successful alternative to the classical AI approach: no {Abstraction, Planning, Deduction, …}
Behavioral (activity) decomposition as opposed to functional decomposition
Behavior: Sensor->Action (Direct link between perception and action)
Behavioral Decomposition
build maps
explore
avoid obstacles
locomote
manipulate the world
sensors actuators
Behavior-based Design
Robust:
• not sensitive to the failure of a particular part of the system
• no need for precise perception, as there is no modelling
Reactive: fast response, as there is no long route from perception to action
No explicit representation
How should we DESIGN a behavior-based system?!
Behavior-based System Design Methodologies
Hand Design
• Common almost everywhere
• Complicated: may even be infeasible in complex problems
• Even if it is possible to find a working system, it is probably not optimal
Evolution
• Good solutions can be found
• Biologically feasible
• Time consuming
• Not fast in making new solutions
Learning
• Biologically feasible
• Learning is essential for the life-time survival of the agent
Taxonomy of Design Methods
Behavior-based System Design
• Learning: structure (hierarchy) learning, behavior learning
• Evolution: co-evolution of behaviors
• Hybridization of Evolution and Learning: Memetic Algorithm
Problem Formulation: Behaviors
$$B_i : S_i \to A_i, \qquad i = 1, \dots, n$$
$$S_i = \{\, s_i \mid s_i = M_i(s);\ s \in S \,\}, \qquad M_i : S \to S_i$$
$$A_i \subseteq A \cup \{\text{No Action}\}$$
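Problem Formulation: Purely Parallel Subsumption Architecture (PPSSA)
$$T = [\, B_{index(1)} \;\; B_{index(2)} \;\; \cdots \;\; B_{index(m)} \,]^{T}$$
Here index(i) = j indicates that $B_j$ is in the i-th layer, m is the number of layers, and n is the number of behaviors.
Example: $$T = [\, \text{ObstacleAvoidance} \;\; \text{BallCollection} \;\; \text{Wandering} \,]^{T}$$
• Different behaviors can be excited
• Higher behaviors can suppress lower ones
• Controlling behavior: the excited behavior that takes control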
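The arbitration rule on this slide (several behaviors may be excited, higher ones suppress lower ones, one behavior ends up controlling) can be sketched as follows. This is a minimal illustration, not the author's implementation: the sensor keys, the action names, and the convention that the list is ordered from highest to lowest layer are all assumptions.

```python
from typing import Callable, List, Optional, Tuple

# A behavior maps sensor readings to an action, or to None when it is not excited.
Behavior = Callable[[dict], Optional[str]]

def avoid_obstacles(sensors: dict) -> Optional[str]:
    return "turn_away" if sensors.get("obstacle_near") else None

def collect_ball(sensors: dict) -> Optional[str]:
    return "approach_ball" if sensors.get("ball_visible") else None

def wander(sensors: dict) -> Optional[str]:
    return "random_walk"  # hypothetical behavior that is always excited

def arbitrate(structure: List[Tuple[str, Behavior]], sensors: dict):
    """Return (name, action) of the controlling behavior, or None if none is excited.

    Assumption: `structure` is ordered from the highest layer to the lowest.
    """
    for name, behavior in structure:
        action = behavior(sensors)
        if action is not None:  # the highest excited behavior suppresses the rest
            return name, action
    return None

T = [("ObstacleAvoidance", avoid_obstacles),
     ("BallCollection", collect_ball),
     ("Wandering", wander)]
```

Calling `arbitrate(T, {"ball_visible": True})` selects the ball-collection behavior, because the obstacle-avoidance behavior above it is not excited.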
Problem Formulation: Reinforcement Signal and the Agent’s Value Function
$$R = \frac{1}{N} \sum_{t=1}^{N} r_t$$
$$V_T = E\big[\, R \mid \text{agent with the structure } T \text{ and set of behaviors } B_i\ (i = 1, \dots, n) \,\big] = E\Big[\, \frac{1}{N} \sum_{t=1}^{N} r_t \,\Big|\, \text{agent with the structure } T \text{ and set of behaviors } B_i\ (i = 1, \dots, n) \,\Big]$$
• This function states the value of using a set of behaviors in a specific structure.
• We want to maximize the agent’s value function.
Problem Formulation: Design as an Optimization
• Structure Learning: finding the best structure given a set of behaviors, using learning: $T^* = \arg\max_{T} V_T$
• Behavior Learning: finding the best behaviors given the structure, using learning: $B_i^* = \arg\max_{B_i} V_T$
• Concurrent Behavior and Structure Learning: $(T^*, B_i^*) = \arg\max_{T, B_i} V_T$
• Behavior Evolution: finding the best behaviors given the structure, using evolution: $B_i^* = \arg\max_{B_i} V_T$
• Behavior Evolution and Structure Learning: $(T^*, B_i^*) = \arg\max_{T, B_i} V_T$
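To make the structure-learning objective $T^* = \arg\max_T V_T$ concrete, the sketch below evaluates every ordering of a small behavior set and returns the best one. The talk proposes learning rather than exhaustive search, so this is only a brute-force baseline; `toy_value` is a hypothetical stand-in for an estimate of $V_T$.

```python
from itertools import permutations

def best_structure(behaviors, estimate_value):
    """Evaluate every ordering T of `behaviors` and return the one maximizing estimate_value(T)."""
    return max(permutations(behaviors), key=estimate_value)

def toy_value(T):
    # Hypothetical V_T: structures that place "avoid" before "explore" score higher.
    return 1.0 if T.index("avoid") < T.index("explore") else 0.0

best = best_structure(["explore", "avoid", "locomote"], toy_value)
```

The number of orderings grows factorially with the number of behaviors, which is one reason a learned representation of the structure's value is attractive.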
Where?!
Behavior-based System Design
• Learning: structure (hierarchy) learning, behavior learning
• Evolution: co-evolution of behaviors
• Hybridization of Evolution and Learning: Memetic Algorithm
Learning in Behavior-based Systems
There is some research on learning in behavior-based systems (Mataric, Mahadevan, Maes, and ...)
… but there is no deep investigation of it (especially a mathematical formulation)!
And most of this work uses flat architectures.
Learning in Behavior-based Systems
We design:
• Structure (hierarchy)
• Behavior
We learn:
• Structure Learning: organizing behaviors in the architecture using a behavior toolbox
• Behavior Learning: the correct mapping of each behavior
Where?!
Behavior-based System Design
• Learning: structure (hierarchy) learning, behavior learning
• Evolution: co-evolution of behaviors
• Hybridization of Evolution and Learning: Memetic Algorithm
Structure Learning
Behavior Toolbox: manipulate the world, build maps, explore, locomote, avoid obstacles
The agent wants to learn how to arrange these behaviors in order to get maximum reward from its environment (or tutor).
Structure Learning
1. “explore” becomes the controlling behavior and suppresses “avoid obstacles”.
2. The agent hits a wall!
3. The tutor (environment) gives “explore” a punishment for being in that place of the structure.
4. “explore” is not a very good behavior for the highest position of the structure, so it is replaced by “avoid obstacles”.
Structure Learning: Challenging Issues
Representation: How should the agent represent knowledge gathered during learning?
• Sufficient (the concept space should be covered by the hypothesis space)
• Generalization capability
• Tractable (small hypothesis space)
• Well-defined credit assignment
Hierarchical Credit Assignment: How should the agent assign credit to different behaviors and layers in its architecture?
• If the agent receives a reward/punishment, how should we reward/punish the structure of the agent?
Learning: How should the agent update its knowledge when it receives a reinforcement signal?
Structure Learning: Overcoming Challenging Issues
Our approach is to define a representation that allows the agent’s value function to be decomposed into simpler components.
Decomposing the behavior of a multi-agent system into simpler components may improve our insight into the problem under investigation.
Structure can provide a lot of clues to us.
Structure Learning
• Zero Order Representation: the value of each behavior in each layer
• First Order Representation: the value of the order (higher/lower) of behaviors in the structure
Structure Learning: Zero Order Representation
ZO Value Table in the agent’s mind:
Higher layer: avoid obstacles (0.8), explore (0.7), locomote (0.4)
Lower layer: avoid obstacles (0.6), explore (0.9), locomote (0.4)
Structure Learning: Zero Order Representation - Value Function Decomposition
$$V_T = E[R] = E\Big[\frac{1}{N} \sum_{t=1}^{N} r_t\Big]$$
$$V_T = E\Big[\frac{1}{N} \sum_{t=1}^{N} r_t \,\Big|\, L_1 \text{ is controlling}\Big] P(L_1 \text{ is controlling}) + E\Big[\frac{1}{N} \sum_{t=1}^{N} r_t \,\Big|\, L_2 \text{ is controlling}\Big] P(L_2 \text{ is controlling}) + \dots + E\Big[\frac{1}{N} \sum_{t=1}^{N} r_t \,\Big|\, L_m \text{ is controlling}\Big] P(L_m \text{ is controlling})$$
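Structure Learning: Zero Order Representation - Value Function Decomposition
$$E\Big[\frac{1}{N} \sum_{t=1}^{N} r_t \,\Big|\, L_i \text{ is controlling}\Big] = \sum_{j=1}^{n} E\Big[\frac{1}{N} \sum_{t=1}^{N} r_t \,\Big|\, B_j \text{ is the controlling behavior in } L_i\Big] P(B_j \mid L_i) = \sum_{j=1}^{n} V_{ij}\, P(B_j \mid L_i), \qquad i = 1, \dots, m$$
$$V_T = \sum_{i=1}^{m} \sum_{j=1}^{n} V_{ij}\, P(B_j \mid L_i)\, P(L_i \text{ is controlling})$$
$V_T$ is the agent’s value function, the $V_{ij}$ are the ZO components, and the inner sum over j is the layer’s value.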
Structure Learning: Zero Order Representation - Value Function Decomposition
$$T^* = \arg\max_{T} V_T = \arg\max_{T} \sum_{i=1}^{m} \sum_{j=1}^{n} V_{ij}\, P(B_j \mid L_i)\, P(L_i \text{ is controlling})$$
Structure Learning: Zero Order Representation - Credit Assignment and Value Updating
The controlling behavior is the only behavior responsible for the current reinforcement signal.
$$\tilde{V}_{ij} = V_{ij}\, P(B_j \mid L_i)\, P(L_i \text{ is controlling})$$
$$\tilde{V}_{ij,n+1} = (1 - \alpha_{ij,n})\, \tilde{V}_{ij,n} + \alpha_{ij,n}\, r_n \qquad \text{if } B_j \text{ is active at time step } n \text{ and } L_i \text{ is controlling at time step } n$$
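The zero-order credit assignment above (only the controlling behavior, in its current layer, is credited with the current reinforcement signal) can be sketched as a small value table. This is an illustrative sketch under assumptions: the constant learning rate `alpha`, the class name, and the greedy way of reading a structure out of the table are not taken from the slides.

```python
from collections import defaultdict

class ZeroOrderTable:
    """Estimated value of placing a behavior in a layer, updated from the reinforcement signal."""

    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha               # assumed constant learning rate
        self.value = defaultdict(float)  # (layer, behavior) -> estimated value

    def update(self, layer: int, behavior: str, reward: float) -> None:
        # Only the controlling behavior (in its layer) is credited for this reward.
        key = (layer, behavior)
        self.value[key] = (1 - self.alpha) * self.value[key] + self.alpha * reward

    def greedy_structure(self, n_layers: int, behaviors: list) -> list:
        # Assumed heuristic: fill layers in order with the best still-unused behavior.
        # Requires n_layers <= len(behaviors).
        remaining, structure = list(behaviors), []
        for layer in range(n_layers):
            best = max(remaining, key=lambda b: self.value[(layer, b)])
            structure.append(best)
            remaining.remove(best)
        return structure

zo = ZeroOrderTable(alpha=0.5)
zo.update(0, "explore", -1.0)  # explore controlled from layer 0 and was punished
zo.update(0, "avoid", 1.0)     # avoid controlled from layer 0 and was rewarded
```

After these two updates, `zo.greedy_structure(2, ["explore", "avoid"])` places "avoid" in layer 0, mirroring the explore/avoid-obstacles example earlier in the talk.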
Structure Learning: First Order Representation
$$V_T = E\Big[\frac{1}{N} \sum_{t=1}^{N} r_t\Big] = \sum_{i=1}^{m} E\Big[\frac{1}{N} \sum_{t=1}^{N} r_t \,\Big|\, B_{index(i)} \text{ is controlling}\Big] P(B_{index(i)} \text{ is controlling})$$
$$E\Big[\frac{1}{N} \sum_{t=1}^{N} r_t \,\Big|\, B_k \text{ is controlling}\Big] = V_0(k) + \sum_{j} V(k > j)$$
Here $V_0(k)$ is the term for "$B_k$ is controlling and nobody else is active", and $V(k > j)$ is the term for "$B_k$ is controlling and $B_j$ is the next active behavior"; the sum runs over the behaviors $B_j$ that are below $B_k$ in $T$.
Structure Learning: First Order Representation
$$V_T = \sum_{i=1}^{m} \Big[ V_0(index(i)) + \sum_{j=1}^{i-1} V\big(index(i) > index(j)\big) \Big] P(B_{index(i)} \text{ is controlling})$$
Structure Learning: First Order Representation – Credit Assignment
If only one behavior becomes active, we should update V0(i). If two or more behaviors become active, we must update V(i>j), where ‘i’ is the index of the controlling behavior and ‘j’ is the index of the next active behavior.
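The first-order credit assignment rule can be sketched as below. The slide only says which entry to update; the running-average update with a constant learning rate `alpha` and the class and attribute names are assumptions made for this illustration.

```python
from collections import defaultdict

class FirstOrderTable:
    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha                # assumed constant learning rate
        self.v0 = defaultdict(float)      # V0(i): behavior i was the only active one
        self.v_over = defaultdict(float)  # V(i>j): i controls, j is the next active behavior

    def update(self, active_in_priority_order: list, reward: float) -> None:
        """The first element of `active_in_priority_order` is the controlling behavior."""
        if not active_in_priority_order:
            return
        i = active_in_priority_order[0]
        if len(active_in_priority_order) == 1:
            self.v0[i] += self.alpha * (reward - self.v0[i])
        else:
            j = active_in_priority_order[1]
            self.v_over[(i, j)] += self.alpha * (reward - self.v_over[(i, j)])

fo = FirstOrderTable(alpha=0.5)
fo.update(["avoid"], 1.0)              # avoid active alone: update V0(avoid)
fo.update(["explore", "avoid"], -1.0)  # explore suppressed avoid: update V(explore>avoid)
```

Only one table entry changes per reinforcement signal, so the punishment in the second call lowers the value of ordering "explore" above "avoid" without touching V0.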
A Break!
Introduction to Experiments
• Abstract problem
• Multi-robot object lifting problem (I will only discuss this problem now.)
A group of robots lifts a bulky object.
Experiments: Structure Learning
[Figure: reward versus episode (0 to 50) for ZO, FO, hand-designed structure, and random structure.]
Comparison of the average gained reward of two different structure learning methods (Zero Order (ZO) and First Order (FO)), the hand-designed structure, and a random structure for the object lifting problem.
Where?!
Behavior-based System Design
• Learning: structure (hierarchy) learning, behavior learning
• Evolution: co-evolution of behaviors
• Hybridization of Evolution and Learning: Memetic Algorithm
Behavior Learning
No more behavior repertoire assumption
All we know:
• Sensor/actuator dimensions
• Reinforcement signal
Behavior Learning: Challenging Issues
How should behaviors cooperate with each other to maximize the performance of the agent?
How should we assign credit to behaviors of the architecture?
How should each behavior update its knowledge?
Behavior Learning
1. B2, B3, and B4 become excited
2. B4 takes the control
3. Punishment!!!
?!
Behavior Learning
Augmenting the action space with a pseudo-action named NoAction (NA)
NA does nothing and lets lower behaviors take control
1. B2, B3, and B4 become excited
2. B4 proposes NA
3. B3 proposes an action and takes control
4. Reward!
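The NoAction mechanism in the steps above can be sketched as an arbitration loop in which an excited behavior may decline control. This is a minimal sketch: the behavior names follow the slide, while the action names and the convention that proposals are listed from the highest layer down are assumptions.

```python
NA = "NoAction"

def arbitrate_with_na(proposals):
    """Pick the controlling behavior among excited behaviors.

    `proposals` is a list of (behavior, proposed_action) pairs for the excited
    behaviors, ordered from the highest layer to the lowest (an assumption).
    Returns the first pair whose action is not NA, or None if all propose NA.
    """
    for behavior, action in proposals:
        if action != NA:
            return behavior, action
    return None

# B4 is highest but proposes NA, so B3 takes control; B2 is suppressed.
controlling = arbitrate_with_na([("B4", NA), ("B3", "push"), ("B2", "stop")])
```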
Behavior Learning
NA lets behaviors cooperate
How should we force them to cooperate correctly?!
• Hierarchical Credit Assignment Problem
• Boolean-like algebra for logically expressible multi-agent systems
$$A_1 \cdot (A_2 + A_3) = A_1 \cdot A_2 + A_1 \cdot A_3$$
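Behavior Learning
Notation: $B^{u}$ are the upper behaviors, $B^{*}$ is the controlling behavior, and $B^{l}$ are the lower behaviors.
[Table and equation garbled in the transcript. Recoverable content: given the reward $R$ and the structure $T$, the upper behaviors $B^{u_1}, \dots, B^{u_k}$ have all proposed NA; several entries, including those for the lower behaviors and the behaviors that are not excited, are marked "unknown".]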
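Behavior Learning: Optimality
[Equation garbled in the transcript. Recoverable content: the reward $r_i$ credited to behavior $B_i$ is the expectation of $R(s)$ given that the reward is achieved by the contribution of $B_i$, written as an integral over $s$ of $R(s)\, p(B_i \text{ is excited in } s)\, p(s \mid s \in S)$.]
Internal states of different behaviors are excited in different regions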
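Behavior Learning: Optimality
$$Q_i(s_i, a_i) = \int_{s \in S} R(s)\, p(B_i \text{ is excited in } s_i,\ B_i \text{ selects } a_i)\, p(s \mid s \in S)\, ds$$
$$Q_i(s_i, \mathrm{NA}) = \int_{s \in S} R(s)\, p(B_i \text{ is excited in } s_i,\ B_i \text{ selects NA})\, p(s \mid s \in S)\, ds$$
The slide closes by comparing $Q_i(s_i, \mathrm{NA})$ with $Q_i(s_i, a_i)$ for all $a_i \in A_i$; the relation symbol is lost in the transcript.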
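Behavior Learning: Value Updating
For the case of immediate reward:
$$Q_{i,k+1}(s_i, a_i) = \big(1 - \alpha_{i,k}(s_i, a_i)\big)\, Q_{i,k}(s_i, a_i) + \alpha_{i,k}(s_i, a_i)\, r(s)$$
($B_i$ is the controlling behavior in $s_i$ and selects $a_i$)
$$Q_{j,k+1}(s_j, \mathrm{NA}) = \big(1 - \alpha_{j,k}(s_j, \mathrm{NA})\big)\, Q_{j,k}(s_j, \mathrm{NA}) + \alpha_{j,k}(s_j, \mathrm{NA})\, r(s)$$
($B_i$ is the controlling behavior, and the $B_j$'s above $B_i$ in $T$ are excited in $s_j$ and select NA)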
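For the immediate-reward case, the update can be sketched as follows: the controlling behavior is credited for the action it selected, and every excited behavior above it is credited for having selected NA. This is a sketch under assumptions: a constant learning rate replaces the state-action-dependent one, and the class and function names are hypothetical.

```python
from collections import defaultdict

NA = "NoAction"

class BehaviorQ:
    """Per-behavior table Q(s, a) over the behavior's own states and actions (including NA)."""

    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha           # assumed constant learning rate
        self.q = defaultdict(float)  # (state, action) -> value

    def update(self, state, action, reward: float) -> None:
        key = (state, action)
        self.q[key] = (1 - self.alpha) * self.q[key] + self.alpha * reward

def assign_credit(excited, reward: float) -> None:
    """`excited` lists (behavior, state, action), highest layer first (an assumption)."""
    for behavior, state, action in excited:
        behavior.update(state, action, reward)
        if action != NA:
            break  # controlling behavior reached; lower behaviors receive no credit

b4, b3, b2 = BehaviorQ(0.5), BehaviorQ(0.5), BehaviorQ(0.5)
assign_credit([(b4, "s4", NA), (b3, "s3", "push"), (b2, "s2", "stop")], reward=1.0)
```

Here B4 is rewarded for deferring, B3 for the action that earned the reward, and B2 is left untouched because it was suppressed.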
Behavior Learning: Value Updating
For the general return case, we should use Monte Carlo estimation.
Bootstrapping methods are not applicable.
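A generic Monte Carlo update for the general return case might look like the sketch below: values are moved toward the return actually observed until the end of the episode, with no bootstrapping from other value estimates. The every-visit, constant-step-size form and the discount factor `gamma` are assumptions, not details given in the talk.

```python
def monte_carlo_update(q, visited, rewards, alpha=0.1, gamma=1.0):
    """Move q[(state, action)] toward the return observed after each visit.

    `visited[t]` is the (state, action) pair at step t and `rewards[t]` is the
    reward received after it. The dictionary `q` is updated in place and returned.
    """
    g = 0.0
    for key, r in zip(reversed(visited), reversed(rewards)):
        g = r + gamma * g  # return from step t to the end of the episode
        old = q.get(key, 0.0)
        q[key] = old + alpha * (g - old)
    return q

q = monte_carlo_update({}, [("s0", "a"), ("s1", "b")], [0.0, 1.0], alpha=0.5)
```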
Concurrent Behavior and Structure Learning
Applying:
• Behavior Learning: state-action mappings
• Structure Learning: hierarchy
Experiments: Behavior Learning
[Figure: average gained reward versus episodes (0 to 50) for Str. Learning, Beh. Learning, and Beh./Str. Learning.]
Reward comparison between structure learning, behavior learning, and concurrent behavior/structure learning methods for the object lifting task.
Experiments: Behavior Learning
[Figure: probability versus average gained reward for Random, Hand-designed, Str. Learning, Beh. Learning, and Beh./Str. Learning. Panels: Learning phase, Testing phase.]
Experiments: Behavior Learning
[Figure: average gained reward versus percentile of the superior results for Hand-designed, Str. Learning, Beh. Learning, and Beh./Str. Learning. Panels: Learning phase, Testing phase.]
Experiments: Behavior Learning
A sample trajectory showing the position of the robot-object contact points, the tilt angle of the object during object lifting, and the controlling behavior of the robots at each time step after sufficient structure/behavior learning. The numbers in the lowest diagram correspond to behaviors as follows: 0 (No Behavior), 1 (Push More), 2 (Don’t Go Fast), 3 (Stop), 4 (Hurry up), 5 (Slow down).
[Figure: three panels versus time (sec): Height, Tilt Angle, and Controlling Behaviors, for robot 1, robot 2, and robot 3.]
Where?!
Behavior-based System Design
• Learning: structure (hierarchy) learning, behavior learning
• Evolution: co-evolution of behaviors
• Hybridization of Evolution and Learning: Memetic Algorithm
Behavior Co-evolution: Motivations
+
• Learning can get trapped in local maxima of the objective function
• Learning is sensitive (POMDP, non-Markov, …)
• Evolutionary methods have more chance of finding the global maximum of the objective function
• The objective function may not be well-defined in robotics
-
• Evolutionary robotics methods are usually slow, which is a problem under fast changes of the environment
• Non-modular controllers: monolithic, no reusability
Behavior Co-evolution: Motivations
• Use evolution to search the difficult and big part of the parameter space: the behaviors’ parameter space is usually the bigger one
• Use learning for fast responses: the structure’s parameter space is usually the smaller one, and a change in the structure results in different behavior of the agent
• Evolve behaviors separately (modularity and re-usability)
Behavior Co-evolution
Agent
Behavior Pool 1
Behavior Pool 2
Behavior Pool n
Evolve each kind of behavior in its own genetic pool
Behavior Co-evolution: Fitness Sharing
Fitness of the agent → fitness of each behavior?!
Fitness sharing:
• Uniform
• Value-based
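The two fitness-sharing schemes named above are not specified further in the transcript, so the sketch below only illustrates one plausible reading. Both rules are assumptions: uniform sharing is taken to give every participating behavior the agent's full fitness, and value-based sharing is taken to split the fitness in proportion to a non-negative per-behavior value.

```python
def uniform_sharing(agent_fitness, behaviors):
    """Assumed rule: every behavior of the agent receives the agent's fitness unchanged."""
    return {b: agent_fitness for b in behaviors}

def value_based_sharing(agent_fitness, behavior_values):
    """Assumed rule: split the fitness in proportion to each behavior's non-negative value."""
    total = sum(behavior_values.values())
    if total == 0:
        # No information about the behaviors' values: fall back to uniform sharing.
        return uniform_sharing(agent_fitness, behavior_values)
    return {b: agent_fitness * v / total for b, v in behavior_values.items()}

shares = value_based_sharing(10.0, {"push_more": 3.0, "stop": 1.0})
```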
Behavior Co-evolution
Each behavior’s genetic pool:
• Selection
• Genetic operators: crossover, mutation, hard replacement, soft perturbation
[Operator equation garbled in the transcript; it defines a new behavior $B^{new}$ from the old behaviors $B^{old}$ of the pool.]
Where?!
Behavior-based System Design
• Learning: structure (hierarchy) learning, behavior learning
• Evolution: co-evolution of behaviors
• Hybridization of Evolution and Learning: Memetic Algorithm
Memetic Algorithm
We waste learned knowledge after each agent’s lifetime
Meme: a unit of information that reproduces itself as people exchange ideas
Traditional memetic algorithms:
• Evolutionary method: meme exchange
• Local search: meme refinement
May be called a Hybrid Evolutionary Algorithm
Memetic Algorithm
Two different interpretations of meme:
• Current hybridization of behavior co-evolution and structure learning: similar to traditional MA; the difference from traditional MA is that different parameter spaces are being searched
• Meme as a cultural bias
Memetic Algorithm
Experienced individuals store their experiences in the culture in the form of memes.
Newborn individuals get a new meme from the culture.
Structure as a meme
Memetic Algorithm
Agent
Behavior Pool 1
Behavior Pool 2
Behavior Pool n
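Meme Pool (Culture)
Memetic Algorithm
Each meme has its own value
Value of the meme is updated using the fitness of the agent
Valuable memes have more chance to be selected for newborn individuals
Meme pool: $\mathcal{M} : \{ (T_i^*, f_{T_i}) \}$
[Value-update equation garbled in the transcript; it updates the meme value $f_{T_i}$ from step $n$ to step $n+1$ using the fitness of an agent $A$ composed of behaviors $B$ and structure $T_i$.]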
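The meme pool described above (each structure stored as a meme with a value, values updated from agent fitness, valuable memes more likely to be handed to newborns) can be sketched as follows. The exponential-average update, the value-shifted sampling weights, and all names are assumptions; the exact update rule is not recoverable from the transcript.

```python
import random

class MemePool:
    """Culture: structures (memes) with values updated from the fitness of agents using them."""

    def __init__(self, alpha: float = 0.2, seed: int = 0):
        self.alpha = alpha              # assumed averaging rate
        self.values = {}                # structure (tuple of behavior names) -> value
        self.rng = random.Random(seed)  # seeded for reproducibility

    def store(self, structure, fitness: float) -> None:
        key = tuple(structure)
        old = self.values.get(key, fitness)  # a new meme starts at its first fitness
        self.values[key] = (1 - self.alpha) * old + self.alpha * fitness

    def sample(self):
        """Pick a meme for a newborn; more valuable memes are more likely. Pool must be non-empty."""
        memes = list(self.values)
        low = min(self.values.values())
        # Shift values so every weight is positive even when fitness is negative.
        weights = [self.values[m] - low + 1e-6 for m in memes]
        return self.rng.choices(memes, weights=weights, k=1)[0]

pool = MemePool(alpha=0.5)
pool.store(["avoid", "explore"], 100.0)
pool.store(["explore", "avoid"], -50.0)
pool.store(["avoid", "explore"], 200.0)
newborn_structure = pool.sample()
```

A newborn agent would start structure learning from `newborn_structure` instead of a random initial setting, which is the kind of bias the later experiments compare against learning without a meme pool.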
Experiments: Behavior Co-evolution – Structure Learning – Memetic Algorithm
(Object Lifting) Comparison of fitness averaged over the last five episodes for different design methods: 1) evolution of behaviors (uniform fitness sharing) and learning structure (blue), 2) evolution of behaviors (value-based fitness sharing) and learning structure (black), 3) hand-designed behaviors with learning structure (green), and 4) hand-designed behaviors and structure (red). The dotted lines around the hand-designed cases (3 and 4) show the one-standard-deviation region around the mean performance.
[Figure: fitness versus generations (0 to 50). Legend: Structure Learning - Value-based Fitness Sharing; Structure Learning - Uniform Fitness Sharing; Hand-designed Behaviors and Structure; Hand-designed Behavior/Learning Structure.]
Experiments: Behavior Co-evolution – Structure Learning – Memetic Algorithm
(Object Lifting) Comparison of last-five-episodes fitness and lifetime fitness for the uniform fitness sharing co-evolutionary mechanism: 1) evolution of behaviors and learning structure (blue), 2) evolution of behaviors and learning structure benefiting from meme pool bias (black), 3) evolution of behaviors and hand-designed structure (magenta), 4) hand-designed behaviors and learning structure (green), and 5) hand-designed behaviors and structure (red). Solid lines indicate the last five episodes of the agent’s lifetime and dotted lines indicate the agent’s lifetime fitness. Although the final performance of all cases is roughly the same, the lifetime fitness of the memetic-based design is much higher.
[Figure: fitness and lifetime fitness versus generations (0 to 50). Legend: Structure Learning - No Meme Pool; Structure Learning - with Meme Pool; Hand-designed Structure/Behavior Evolution; Hand-designed Behaviors/Structure Learning.]
Experiments: Behavior Co-evolution – Structure Learning – Memetic Algorithm
(Object Lifting) Probability distribution comparison for uniform fitness sharing (). The comparison is made between agents using the meme pool as the initial bias for their structure learning (black), agents that learn the structure from a random initial setting (blue), and agents with a hand-designed structure (magenta). Dotted lines show the distribution of lifetime fitness. A distribution further to the right indicates a higher chance of generating very good agents.
[Figure: probability versus fitness in four panels: Generation 1, Generation 5, Generation 20, Generation 50. Curve labels: Meme (M), No Meme (N), Fixed Str. (F).]
Experiments: Behavior Co-evolution – Structure Learning – Memetic Algorithm
[Figure: fitness and lifetime fitness versus generations (0 to 50). Legend: Structure Learning - with Meme Pool; Structure Learning - No Meme Pool; Hand-designed Behaviors/Structure Learning; Hand-designed Structure/Behavior Evolution.]
(Object Lifting) Comparison of last-five-episodes fitness and lifetime fitness for the value-based fitness sharing co-evolutionary mechanism: 1) evolution of behaviors and learning structure (blue), 2) evolution of behaviors and learning structure benefiting from meme pool bias (black), 3) evolution of behaviors and hand-designed structure (magenta), 4) hand-designed behaviors and learning structure (green), and 5) hand-designed behaviors and structure (red). Solid lines indicate the last five episodes of the agent’s lifetime and dotted lines indicate the agent’s lifetime fitness. Although the final performance of all cases is roughly the same, the lifetime fitness of the memetic-based design is higher.
Experiments: Behavior Co-evolution – Structure Learning – Memetic Algorithm
Figure 13. (Object Lifting) Probability distribution comparison for value-based fitness sharing (). The comparison is made between agents using the meme pool as the initial bias for their structure learning (black), agents that learn the structure from a random initial setting (blue), and agents with a hand-designed structure (magenta). Dotted lines show the distribution of lifetime fitness. A distribution further to the right indicates a higher chance of generating very good agents.
[Figure: probability versus fitness in four panels: Generation 1, Generation 5, Generation 20, Generation 50. Curve labels: Meme (M), No Meme (N), Fixed Str. (F).]
Other Topics
Probabilistic analysis of PPSSA:
• Change in the excitation probability
• Change in the controlling probability of each layer
Some estimate of the learning time
The effect of reinforcement signal uncertainty on:
• Value function
• Policy of the agent
Conclusions
Behavior-based System Design
• Learning: structure (hierarchy) learning, behavior learning
• Evolution: co-evolution of behaviors
• Hybridization of Evolution and Learning: Memetic Algorithm
Contributions
• Deep and mathematical investigation of behavior-based systems
• Tackling the design process from different approaches: learning, evolution, culture-based methods
• Structure learning is quite new in hierarchical reinforcement learning
Suggestions for the Future Work
• Extending the proposed methods to more complex architectures
• Automatic extraction of behaviors’ state spaces (traditional clustering methods are not suitable)
• Convergence proof in learning
• Automatic abstraction of knowledge
• Simultaneous low-level and high-level decision making
• Investigations on the reinforcement signal design
Thanks!