A TUTORIAL
Stefano Nolfi, Neural Systems & Artificial Life, National Research Council, Roma, [email protected]
Dario Floreano, Microengineering Dept., Swiss Federal Institute of Technology, Lausanne, [email protected]
TUTORIAL Stefano Nolfi & Dario Floreano, 2000
The method
fitness function: [formula not recoverable from the extracted text; surviving symbols: V, v, i]
[Figure: evolutionary cycle over Gen. 0, Gen. 1, ..., Gen. n; each generation is tested, then individuals are selected, reproduced, and mutated; a genotype-to-phenotype mapping is also labeled]
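The cycle on this slide (test, select, reproduce and mutate) can be sketched as a short generational loop. This is a minimal illustration rather than the authors' implementation: the real-valued genotype, truncation selection, Gaussian mutation, the unmutated copies of the parents (elitism), and all parameter values are assumptions, and the genotype-to-phenotype mapping is folded into the caller-supplied `fitness` function.

```python
import random


def evolve(fitness, genome_length=10, pop_size=20, n_parents=5,
           generations=30, mutation_std=0.1, seed=0):
    """Generational loop: test, select, reproduce and mutate (illustrative)."""
    rng = random.Random(seed)
    # Gen. 0: random genotypes (real-valued genes are an assumption).
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Test: rank every individual by the supplied fitness function.
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # Select: truncation selection keeps the best n_parents.
        parents = ranked[:n_parents]
        # Reproduce and mutate: offspring are mutated copies of the parents.
        offspring = [[gene + rng.gauss(0.0, mutation_std)
                      for gene in parents[i % n_parents]]
                     for i in range(pop_size - n_parents)]
        # Assumption: parents survive unmutated, so the best never gets worse.
        population = [list(p) for p in parents] + offspring
    return ranked[0], history
```

For a toy run, `evolve(lambda g: -sum(x * x for x in g))` returns the best genotype of the last tested generation and the best fitness recorded in each generation.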
Behavior-Based Robotics & ER
behavior-based robotics [Brooks, 1986]: sensors → locomote / avoid hitting things / explore / manipulate the world / build maps → actuators
evolutionary robotics: sensors → ? / ? / ? / ? / ? → actuators
Learning Robotics & ER
[Figure labels: sensors; motors; desired output or teaching signal]
[Kodjabachian & Meyer, 1999]
Artificial Life & ER
[Menczer and Belew, 1997] [Floreano and Mondada 1994]
How to Evolve Robots
evolution in the real world
[Floreano and Nolfi, 1998]
evolution in simulation + test on the real robot
[Nolfi, Floreano, Miglino, Mondada 1994]
Evolution in the Real World
mechanical robustness
energy supply
analysis
[Images © K-Team SA]
[Floreano and Mondada, 1994]
Evolution in Simulation
Different physical sensors and actuators may perform differently because of slight differences in their electronics or mechanics.
Physical sensors deliver uncertain values, and commands to actuators have uncertain effects.
The body of the robot and the environment should be accurately reproduced in the simulation.
[Figure: two 3D plots (axes x, y, z; vertical scale 0 to 1024), one for the 4th IF sensor and one for the 8th IF sensor]
[Nolfi, Floreano, Miglino and Mondada 1994; Miglino, Lund, Nolfi, 1995]
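One way to act on the points above is to drive the simulated sensors from readings sampled on the physical robot and to perturb each reading with noise. The sketch below is only an illustration of that idea: the table layout (distance index by angle index), the uniform noise model, the 5% default, and the 0 to 1023 range are assumptions, not details taken from the cited work.

```python
import random


def noisy_sensor(sampled, distance_index, angle_index,
                 noise=0.05, max_value=1023, rng=random):
    """Look up a sampled reading and perturb it with uniform noise.

    sampled: hypothetical table of readings recorded on the real robot,
             indexed as sampled[distance_index][angle_index].
    noise:   noise amplitude as a fraction of the sensor range (assumed).
    """
    value = sampled[distance_index][angle_index]
    value += rng.uniform(-noise, noise) * max_value
    # Clip to the assumed valid range of the sensor.
    return min(max_value, max(0, round(value)))
```

Keeping one table per physical sensor would let the simulation reflect the differences between individual sensors mentioned on the slide.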
Designing the Fitness Function
FEE functions that describe how the controller should work (functional), rate the system on the basis of several variables and constraints (explicit), and employ precise external measuring devices (external) are appropriate for optimizing a set of parameters for complex but well-defined control problems in a well-controlled environment.
BII functions that rate only the behavioral outcome of an evolutionary controller (behavioral), rely on few variables and constraints (implicit), and can be computed on-board (internal) are suitable for developing adaptive robots capable of autonomous operation in partially unknown and unpredictable environments without human intervention.
[Floreano et al., 2000]
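As an illustration of a function toward the internal end of this space, the sketch below follows the general form of the navigation fitness reported by Floreano and Mondada (1994), V(1 - sqrt(Δv))(1 - i), which uses only wheel speeds and proximity readings available on-board. It is not taken from this slide; the normalization of speeds to [-0.5, 0.5] and of sensor activation to [0, 1] is an assumption of this example.

```python
import math


def navigation_fitness(left_speed, right_speed, max_ir):
    """Per-step fitness from on-board variables only (illustrative).

    left_speed, right_speed: signed wheel speeds, assumed in [-0.5, 0.5].
    max_ir: activation of the most active proximity sensor, assumed in [0, 1].
    """
    v = abs(left_speed) + abs(right_speed)   # rewards moving fast
    dv = abs(left_speed - right_speed)       # penalizes turning
    return v * (1.0 - math.sqrt(dv)) * (1.0 - max_ir)  # penalizes nearby obstacles
```

Going straight at full speed far from obstacles scores 1, while spinning in place or touching a wall scores 0; nothing in the function says how the robot should achieve this.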
Genetic Encoding
Evolvability
Expressive power
Compactness
Simplicity
[Gruau, 1994; Nolfi and Floreano, 2000]
Adaptation is more Powerful than Decomposition and Integration
The main strategy followed to develop mobile robots has been that of Divide and Conquer:
1) divide the problem into a list of hopefully simpler sub-problems
2) build a set of modules or layers able to solve each sub-problem
3) integrate the modules so as to solve the whole problem
Unfortunately, it is not clear how a desired behavior should be broken down.
Proximal and Distal Descriptions of Behaviors
[Figure labels: motor space; sensory space; environment; discriminate; approach; avoid; explore; distal description; proximal description]
[Nolfi, 1997]
Discrimination Task (1)
[Figure labels: sensors → explore / avoid / approach / discriminate → actuators (decomposition and integration); walls and cylinders; small and large cylinders]
[Nolfi, 1996,1999]
Discrimination Task (2)
[Figure labels: genotype; phenotype; sensors → explore / avoid / approach / discriminate → actuators]
[Nolfi, 1996]
Discrimination Task (3)
[Scheier, Pfeifer, and Kuniyoshi, 1998]
Evolved robots act so as to select sensory patterns that are easy to discriminate
The Importance of Self-organization
Operating a decomposition at the level of the distal description of behavior does not necessarily simplify the challenge.
By allowing individuals to self-organize, artificial evolution tends to find simple solutions that exploit the interaction between the robot and the environment and between the different internal mechanisms of the control system.
[Nolfi, 1996,1997]
Modularity and Behaviors
Human Design: Garbage Collecting Behavior = Explore, Discriminate, Displace in front, Pick-up, Release, COORDINATE
Adaptation: Garbage Collecting Behavior = ?, ?, ?, ..., ?, ?
Is modularity useful in ER?
What is the relation between self-organized neural modules and behaviors?
[Nolfi, 1997]
The Garbage Collecting Task (1)
[Figure: network with IR-Sensors and LB-Sensor, Selector Neurons and Output Neurons; outputs: left m., right m., pick-up, release]
[Figure: four architectures, panels A, B, C, D, each with IR-sensors & BL-sensor as input; output labels: motors, motors (a), motors (b)]
[Nolfi, 1997]
The Garbage Collecting Task (2)
There is no correspondence between self-organized neural modules and sub-behaviors
Modular neural controllers able to self-organize outperform other architectures
[Figure: successful epochs (0 to 12) versus generations (0 to 1000)]
[Nolfi, 1997]
Evolving “complex” behaviors
Bootstrap problem: selecting individuals directly for their ability to solve a task only works for simple tasks
Incremental Evolution: starting with a simplified version of the task and then progressively increasing complexity
Also including in the selection criterion a reward for sub-components of the desired behavior
Start with a simplified version of the task and then progressively increase its complexity by modifying the selection criterion
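The incremental strategy above can be sketched as an evolutionary loop whose selection criterion is swapped for a harder one once the current one is met, while the population carries over. This is a hedged illustration only: the `stages` list of (fitness function, threshold) pairs, the truncation selection, and every parameter value are assumptions of this example.

```python
import random


def incremental_evolution(stages, genome_length=5, pop_size=20,
                          max_generations=200, seed=0):
    """Evolve against a sequence of increasingly demanding criteria.

    stages: hypothetical list of (fitness_function, threshold) pairs,
            ordered from the simplified task to the full task.
    Returns (best genotype, index of stage reached, generations used).
    """
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    stage = 0
    for generation in range(max_generations):
        fitness, threshold = stages[stage]
        ranked = sorted(population, key=fitness, reverse=True)
        if fitness(ranked[0]) >= threshold:
            if stage == len(stages) - 1:
                return ranked[0], stage, generation
            stage += 1  # harder criterion, same population
            continue
        parents = ranked[:pop_size // 4]
        # Parents survive unmutated; the rest are mutated copies.
        population = [list(p) for p in parents] + [
            [gene + rng.gauss(0.0, 0.1) for gene in rng.choice(parents)]
            for _ in range(pop_size - len(parents))]
    return ranked[0], stage, max_generations
```

Rewarding sub-components of the behavior, the other option listed on the slide, would instead keep a single stage and add extra terms to its fitness function.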
Visually-Guided Robots
[Cliff et al. 1993; Harvey et al. 1994]
Learning & Evolution: Interactions
• Different time scales, different mechanisms, similar effects
• Learning Advantages in Evolution [Nolfi & Floreano, 1999]:
– Adapt to changes that occur faster than a generation
– Extract information that might channel the course of evolution
– Help and guide evolution
– Reduce genetic complexity and increase population diversity
• Learning Costs in Evolution [Mayley, 1997]:
– Delay in the ability to achieve fit behaviors
– Increased unreliability (learning wrong things)
– Physical damages, energy waste, tutoring
• Baldwin effect [Baldwin, 1896; Morgan, 1896; Waddington, 1942]
Hinton & Nowlan model [1987]
• Learning samples the space surrounding the individual
• Fitness landscape is smoothed and evolution becomes faster
• Baldwin effect (assimilation of features normally "learnt")
• Model constraints:
– Learning task and evolutionary task are the same
– Learning is a random process
– Environment is static
– Genotype and Phenotype space are correlated
[Figure: example genotype 00?11???0111?0?1?0?1; other figure labels not recoverable]
Fitness = correct combination of weights
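The model can be sketched for a single individual as follows. The genotype alphabet ('0', '1', '?') and the idea that fitness rewards finding the one correct combination come from the slide; the 1000 learning trials and the scaling 1 + 19n/1000 (n = trials remaining after success) are recalled from the original Hinton & Nowlan paper and should be treated as assumptions here.

```python
import random


def lifetime_fitness(genotype, target, trials=1000, rng=random):
    """Fitness of one individual under random-guess learning (illustrative).

    genotype: string over '0', '1', '?' ('?' marks a connection set by learning).
    target:   string over '0', '1' (the single correct combination).
    """
    # A wrong fixed allele can never be corrected by learning.
    if any(g != "?" and g != t for g, t in zip(genotype, target)):
        return 1.0
    plastic = genotype.count("?")
    if plastic == 0:
        return 1.0 + 19.0  # correct from birth, so all trials remain
    for trial in range(trials):
        # One learning trial: guess every plastic connection at random.
        if all(rng.random() < 0.5 for _ in range(plastic)):
            remaining = trials - (trial + 1)
            return 1.0 + 19.0 * remaining / trials
    return 1.0
```

Genotypes with more correct fixed alleles and fewer '?' find the target sooner and score higher, which is how learning smooths the otherwise needle-in-a-haystack landscape.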
Different Tasks [Nolfi, Elman, Parisi, 1994]
- Evolving for food
- Learning predictions
- Learning mechanism = BP
- Increased speed & fitness
- Genetic assimilation
Perspectives on Landscape
Correlated landscapes [Parisi & Nolfi, 1996]
Relearning effects to compensate for mutations [Harvey, 1997]
(it may hold only in a few cases)
[Figure labels: A, C, B1, B2, P, Q]
A = weights evolved for food finding
C = weights trained for prediction
B1, B2 = new positions after mutation
Fitness = higher when closer to A
Evolutionary Reinforcement Learning
• Evolving both action and evaluation connection strengths [Ackley & Littman, 1991]
• Action module modifies weights during lifetime using CRBP
• ERL gives better performance than E alone or RL alone
• Baldwin effect
• Method validated on mobile robots [Meeden, 1996]
Evolutionary Auto-teaching
• All weights genetically encoded, but one half teaches the other half using the Delta rule [Nolfi & Parisi, 1991]
• Individuals can live in one of two environments, randomly determined at birth
• Learning individuals adapt strategy to environment and display higher fitness
[Figure legend: Learning; No learning]
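The auto-teaching idea can be sketched with a single-layer network in which genetically specified "teaching" weights produce the target that the "motor" weights are trained toward during lifetime. This is a simplified illustration under stated assumptions: the single layer, the sigmoid units, the plain Delta rule without a derivative term, and the learning rate are choices made for this example, not details of the cited model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def auto_teach_step(inputs, motor_w, teach_w, rate=0.2):
    """One lifetime step: teaching outputs serve as targets for motor outputs.

    motor_w, teach_w: lists of weight rows, one row per output unit.
    Only motor_w is modified; teach_w stays as inherited.
    """
    motor = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in motor_w]
    teach = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in teach_w]
    for row, m, t in zip(motor_w, motor, teach):
        for i, x in enumerate(inputs):
            # Delta rule: change proportional to (target - output) * input.
            row[i] += rate * (t - m) * x
    return motor, teach
```

Because the teaching weights are inherited rather than hand-designed, evolution is free to shape what the individual teaches itself in each environment.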
Evolution of Learning Mechanisms (1)
• Encoding learning rules, NOT learning weights [Floreano & Mondada, 1994]
• Weights always initialized to random values
• Different weights can use different rules within same network
• Adaptive method can be applied to node encoding (short genotypes)
[Figure: genetic encoding of 1 synapse]
Genetically-determined: synapse sign; synapse strength
Adaptive: synapse sign; learning rule (hebb, postsynaptic, presynaptic, covariance); learning rate
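A minimal sketch of the adaptive encoding: each synapse's gene carries a sign, a learning rule, and a learning rate, and the strength changes during lifetime. The slide only names the four rules; the formulas below follow the form usually reported for this line of work (strengths kept in [0, 1], activations in [0, 1]) and should be read as assumptions, as should the dictionary layout of the gene.

```python
import math


def delta_w(rule, w, pre, post):
    """Weight change for one synapse (assumed rule definitions)."""
    if rule == "hebb":
        return (1 - w) * pre * post
    if rule == "postsynaptic":
        return w * (pre - 1) * post + (1 - w) * pre * post
    if rule == "presynaptic":
        return w * pre * (post - 1) + (1 - w) * pre * post
    if rule == "covariance":
        f = math.tanh(4 * (1 - abs(pre - post)) - 2)
        return (1 - w) * f if f > 0 else w * f
    raise ValueError(f"unknown rule: {rule}")


def update_synapse(gene, w, pre, post):
    """Apply the genetically specified rule and rate; keep strength in [0, 1].

    gene: hypothetical dict with keys 'sign', 'rule', 'rate'.
    """
    w = w + gene["rate"] * delta_w(gene["rule"], w, pre, post)
    return min(1.0, max(0.0, w))


def signal(gene, w, pre):
    """Contribution of the synapse to the postsynaptic neuron."""
    return gene["sign"] * w * pre
```

Since only the rule and rate are inherited, two synapses of the same network can follow different rules, as stated in the bullets above.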
Sequential task & unpredictable change
• Faster and better results [Floreano & Urzelai, 2000]
• Automatic decomposition of sequential task
• Synapses continuously change
• Evolved robots adapt online to unpredictable change [Urzelai & Floreano, 2000]:
– Illumination
– From simulations to robots
– Environmental layout
– Different robotic platform
– Lesions to motor gears
[Eggenberger et al., 1999]
[Figure labels: Genetically-determined; Adaptive]
Summary
• Learning is very useful for robotic evolution:
– accelerates and boosts evolutionary performance
– can cope with fast changing environments
– can adapt to unpredictable sources of change
• Lamarckian evolution (inheriting learned properties) may provide short-term gains [Lund, 1999], but it does not display all the advantages listed above [Sasaki & Tokoro, 1997, 1999]
• Distinction between learning and adaptation [Floreano & Urzelai, 2000]:
– Adaptation does not necessarily develop and capitalize upon new skills and knowledge
– Learning is an incremental process whereby new skills and knowledge are gradually acquired and integrated
Competitive Co-evolution
• Fitness of each population depends on fitness of opponent population. Examples:
– Predator-prey
– Host-parasite
• It may increase adaptive power by producing an evolutionary arms race [Dawkins & Krebs, 1979]
• More complex solutions may incrementally emerge as each population tries to win over the opponent
• It may be a solution to the bootstrap problem
• Fitness function plays a less important role
• Continuously changing fitness landscape may help to prevent stagnation in local minima [Hillis, 1990]
Co-evolutionary Pitfalls
The same set of solutions may be discovered over and over again. This cycling behavior may end up in very simple solutions.
Solution: retain the best individuals of the last few generations (Hall-of-Fame: all generations).
Whereas in conventional evolution the fitness landscape is static and fitness is a monotonic function of progress, in competitive co-evolution the fitness landscape can be modified by the competitor and the fitness function is no longer an indicator of progress.
Solution: Master Fitness (after evolution, test each best against all best), CIAO graphs (test each best against all previous best).
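The three remedies named above can be sketched as small helpers. This is an illustration under assumptions: `play(a, b)` is a hypothetical function returning the score of individual `a` against opponent `b`, the best individual of each generation is assumed to be stored in a list, and the sample size `k` is arbitrary.

```python
import random


def hall_of_fame_opponents(best_by_generation, k=5, rng=random):
    """Sample up to k opponents from the best of all previous generations."""
    return rng.sample(best_by_generation, min(k, len(best_by_generation)))


def ciao_matrix(best_a, best_b, play):
    """Row g: best A of generation g tested against best B of generations 0..g."""
    return [[play(a, best_b[h]) for h in range(g + 1)]
            for g, a in enumerate(best_a)]


def master_fitness(best_a, best_b, play):
    """After evolution: each best A tested against every best B, averaged."""
    return [sum(play(a, b) for b in best_b) / len(best_b) for a in best_a]
```

A rising master-fitness curve suggests real progress, whereas a CIAO matrix in which recent individuals lose to old opponents points to cycling.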
Examples of Co-evolutionary Agents
Simulated predator-prey [Cliff & Miller, 1997]
– Distance-based fitness
– 100s of generations
– CIAO method et al.
– Evolution of sensors
Ball-catching agents [Sims, 1994]
– Distance-based fitness
– Rare good results
Co-evolutionary Robots
• Energetically autonomous
• Predator-prey scenario
• Time-based fitness
• Controllers downloaded to increase reaction speed
• Retain last best 5 controllers for testing individuals
• Predators = vision + proximity
• Prey = proximity + faster
• Predator genotype longer
• Prey has initial position advantage
[Floreano, Nolfi, & Mondada, 1998]
Co-evolutionary Results
[Figure: best predator fitness and best prey fitness (0 to 1) plotted over generations (0 to 100); remaining panel labels: d, t, progress, best, fun]
Predators do not attempt to minimize distance
Prey maximize distance
Increasing Environmental Complexity
[Figure labels: 36, θ; 240, θ]
…prevents premature cycling [Nolfi & Floreano, 1999]
Summary
• Competitive co-evolution is challenging because:
– Fitness landscape is continuously changing
– Hard to monitor progress online
– Cycling local minima
• When the environment is sufficiently complex, or the Hall-of-Fame method is used, the system develops increasingly complex solutions
• It can work and capitalize on very implicit, internal, and behavioral fitness functions by exploring a large range of behaviors triggered by opponents
• When co-evolving adaptive mechanisms, prey resort to random actions whereas predators adapt online to the prey strategy and report better performance [Floreano & Nolfi, 1997]
Evolvable Hardware
• Evolution of electronic circuits: http://www.cogs.susx.ac.uk/users/adrianth/EHW_groups.html
• Evolution of body morphologies (including sensors)
• Why evolve hardware?
– Hardware choice constrains environmental interactions and the course of evolution
– Evolved solutions can be more efficient than those designed by humans
– Develop new adaptive materials with self-configuration and self-repair abilities
Evolutionary Control Circuits
• Thompson’s unconstrained evolution
• Xilinx, family 6000, overwrite global synchronization
• Tone reproduction
• Robot control
• Fitness landscape studies (very rugged, neutral networks)
Evolvable Hardware Module for Khepera: http://www.aai.ca
Evolutionary Control Circuits
• Keymeulen: evolution of vision-based controllers
• Find ball while avoiding obstacles
• Constrained evolution, entirely on physical robot
• De Garis: CAM Brain, composed of tens of Xilinx FPGAs, 6000 family
• Growth of neural circuits using CA with evolved rules
• Aims to evolve a brain for a kitten robot. Pitfall: speed limited by sensory-motor loop.
Evolutionary Morphologies
• Evolution of Lego Structures [Funes et al., 1997]
• Bridges
• Cranes
• Extended to objects and robot bodies
• see www.demo.cs.brandeis.edu
• Example of evolved crane [Funes et al., 1997]
Co-evolutionary Morphologies
Karl Sims, 1994
Komosinski & Ulatowski, 1999: http://www.frams.poznan.pl
Effect of doubling sensor range on body/wheel size [Lund et al., 1997]
Suggestions for Further Research
• Encoding and mapping of control systems
• Exploration of alternative building blocks
• Integration of growth, learning, and maturation
• Incremental and open-ended evolution
• Morphology and sensory co-evolution
• Application to large-scale circuits
• User-directed evolution
• Comparison with other adaptive techniques
• Further readings:
– Nolfi, S. & Floreano, D. Evolutionary Robotics. The Biology, Technology, and Intelligence of Self-Organizing Machines. MIT Press, October 2000
– Husbands, P. & Meyer, J-A. (Eds.) Evolutionary Robotics. Proceedings of the 1st European Workshop, Springer Verlag, 1998
– Gomi, T. (Ed.) Evolutionary Robotics. Volume series: I (1997), II (1998), III (2000), AAI Books.
Evorobot Simulator
Sources, binaries, and documentation files freely available at: http://gral.ip.rm.cnr.it/evorobot/simulator.html [Nolfi, 2000]