TRANSCRIPT
EE682 Intelligent Control Theory, Prof. J.-H. Kim
Lecture 19
Genetic Algorithms - I
What are they?
How do they work?
Why do they work?
Robot Intelligence Technology Lab.
Genetic Algorithms
Evolutionary Computation: GA, EP, ES, GP, etc.
GAs: What are they? How do they work? Why do they work?
Derivative-Free Optimization
Characteristics: Derivative freeness: instead, repeated evaluations of the objective function
Intuitive guideline: evolution and thermodynamics.
Slowness: generally slower than derivative-based optimization
Flexibility: handles complex objective functions without sacrificing coding/computation time
Randomness: random number generator in deciding next search direction
Analytic opacity: because of randomness and problem-specific nature
Iterative nature: terminal conditions
Computation time
Optimization goal: (the best objective function value fk) < (a certain preset goal value)
Minimal improvement: (fk - fk-1) < (a preset value)
Minimal relative improvement: ((fk - fk-1) / fk-1) < (a preset value)
Genetic Algorithms
Derivative-freeness
Parallel-search procedure: can be implemented on parallel-processing machines for massive speedup
Applicable to both continuous and discrete (combinatorial) optimization
Stochastic: less likely to get trapped in local minima
Flexibility: both structure and parameter identification in complex models
Terminology
Chromosome: a binary bit string
Population (gene pool): a set of chromosomes
Population-based optimization
Generation
GAs: What are they?
What are they? Stochastic algorithms based on natural phenomena:
genetic inheritance and Darwinian strife for survival
C. Darwin’s principle of natural selection
G. Mendel’s basic principles of heredity
GAs: What are they?
Balancing between exploitation and exploration: Exploiting the best solution and exploring the search space
Aimed at complex problems: large-scale combinatorial optimization and
highly constrained engineering problems
Belong to the class of probabilistic algorithms: Directed and stochastic search
Maintain a population of potential solutions
Perform a multi-directional search
Cf. Hill-climbing method and Simulated annealing technique: process a single point in the search space
Crossover: information exchange between different potential solutions
Mutation: introduces some extra variability into the population
GAs: What are they?
Five components:
A genetic representation
A way to create an initial population
An evaluation function considered as an environment
Genetic operators
Values for various parameters
GAs: What are they?
Encoding scheme:
(11, 6, 9): 1011 0110 1001 - binary code
            1110 0101 1101 - Gray code
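As a quick illustration, the binary-to-Gray mapping above can be computed with a single XOR; this Python sketch (the function name is ours) reproduces the (11, 6, 9) example:

```python
def to_gray(n: int) -> int:
    # Standard binary-to-Gray conversion: XOR the value with itself shifted right.
    return n ^ (n >> 1)

# Reproduces the slide's example for the genes (11, 6, 9):
codes = [f"{to_gray(v):04b}" for v in (11, 6, 9)]  # -> ["1110", "0101", "1101"]
```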
Fitness evaluation: f(x) or ranking
Selection: determine which parents participate in producing offspring for the next generation
Selection probability of the i-th chromosome:
    p_i = f_i / Σ_{k=1}^{n} f_k
GAs: What are they?
Crossover operator
Crossover rate
Generates new chromosomes
One-point crossover:
    100|11110   100|10010
    101|10010   101|11110
Two-point crossover:
    1|0011|110   1|0110|110
    1|0110|010   1|0011|010
Mutation operator
Generates new chromosomes
Flip a bit with a probability (mutation rate):
    10011110 → 10011010 (mutated bit: position 6)
Elitism
Keep the best members
GAs: How do they work?
How do they work? Maximize a function of k variables
Each variable x_i is coded as a binary string of length m_i
Suppose six decimal places for the variables’ values is desirable
Construction of a roulette wheel: for the selection process
Calculate the fitness value eval(v_i) for each chromosome v_i (i = 1, …, pop_size)
GAs: How do they work?
Find the total fitness of the population:
    F = Σ_{i=1}^{pop_size} eval(v_i)
Calculate the probability of selection p_i for each chromosome v_i:
    p_i = eval(v_i) / F
Calculate a cumulative probability q_i for each chromosome v_i:
    q_i = Σ_{j=1}^{i} p_j
Selection: spinning the roulette wheel pop_size times
Generate a random (float) number r from the range [0, 1]
If r < q_1, select the first chromosome v_1; otherwise select the i-th chromosome v_i (2 ≤ i ≤ pop_size) such that q_{i-1} < r ≤ q_i.
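The selection step can be sketched in a few lines of Python (a minimal sketch; the function name and sample fitness values are illustrative, not from the lecture):

```python
import random

def roulette_select(fitness, rng=random.random):
    """Spin the roulette wheel once: return the index i such that
    q_{i-1} < r <= q_i, where q_i are cumulative selection probabilities."""
    total = sum(fitness)          # total fitness F of the population
    r = rng()                     # random float from [0, 1]
    q = 0.0
    for i, f in enumerate(fitness):
        q += f / total            # q_i = q_{i-1} + p_i, with p_i = f_i / F
        if r <= q:
            return i
    return len(fitness) - 1       # guard against floating-point round-off

# Spin pop_size times to pick parents (with replacement):
fitness = [5.0, 1.0, 3.0, 1.0]    # illustrative fitness values
parents = [roulette_select(fitness) for _ in range(len(fitness))]
```

Fitter chromosomes occupy larger slices of the wheel, so they are selected more often, but every chromosome retains a nonzero chance.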
GAs: How do they work?
Crossover: crossover probability p_c; expected number of chromosomes undergoing crossover: p_c · pop_size
Generate a random number r from the range [0, 1]
If r < p_c, select the given chromosome for crossover
Mate selected chromosomes randomly: for each pair of coupled chromosomes, generate a random integer number pos from the range [1, m-1], where m is the total length, i.e., the total number of bits in a chromosome
Two chromosomes (b_1 … b_pos b_{pos+1} … b_m) and (c_1 … c_pos c_{pos+1} … c_m) are replaced by a pair of their offspring
(b_1 … b_pos c_{pos+1} … c_m) and (c_1 … c_pos b_{pos+1} … b_m)
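A minimal Python sketch of the one-point operator (names ours; `crossover_at` is a deterministic helper added only to reproduce the earlier 100|11110 × 101|10010 example):

```python
import random

def one_point_crossover(parent1, parent2, rng=random):
    """Swap the tails of two equal-length bit strings at a random
    cut point pos drawn from [1, m-1]."""
    m = len(parent1)
    pos = rng.randint(1, m - 1)
    return parent1[:pos] + parent2[pos:], parent2[:pos] + parent1[pos:]

def crossover_at(parent1, parent2, pos):
    """Deterministic variant, used to reproduce the slide's example."""
    return parent1[:pos] + parent2[pos:], parent2[:pos] + parent1[pos:]

# The slide's one-point example, cut after bit 3:
c1, c2 = crossover_at("10011110", "10110010", 3)  # -> "10010010", "10111110"
```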
Mutation: mutation probability p_m; expected number of mutated bits: p_m · m · pop_size
Generate a random number r from the range [0, 1]
If r < p_m, mutate the bit
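Bit-flip mutation can be sketched likewise (a minimal illustration; `mutate` is our name):

```python
import random

def mutate(chromosome, p_m, rng=random.random):
    """Flip each bit independently with probability p_m (the mutation rate)."""
    return "".join(
        ("1" if bit == "0" else "0") if rng() < p_m else bit
        for bit in chromosome
    )
```

With p_m = 0 the chromosome is returned unchanged; with p_m = 1 every bit is flipped.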
GAs: How do they work?
Ex) The problem: maximize f(x1, x2) = 21.5 + x1·sin(4πx1) + x2·sin(20πx2),
where -3.0 ≤ x1 ≤ 12.1 and 4.1 ≤ x2 ≤ 5.8.
Assume the precision is 4 decimal places for each variable.
x1: the range [-3.0, 12.1], of length 15.1, is divided into 15.1 × 10^4 = 151,000 equal-size ranges. Since 2^17 < 151,000 ≤ 2^18, 18 bits are required to represent the variable x1.
x2: the range [4.1, 5.8], of length 1.7, is divided into 1.7 × 10^4 = 17,000 equal-size ranges. Since 2^14 < 17,000 ≤ 2^15, 15 bits are required to represent the variable x2.
String (000...0) corresponds to a
String (111...1) corresponds to b
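The bit-count calculation generalizes to any range and precision; a small Python check (the helper name is ours):

```python
import math

def bits_required(a: float, b: float, places: int) -> int:
    """Smallest m such that 2^m codes cover (b - a) * 10^places equal-size ranges."""
    ranges = (b - a) * 10 ** places
    return math.ceil(math.log2(ranges))

m1 = bits_required(-3.0, 12.1, 4)   # 151,000 ranges -> 18 bits
m2 = bits_required(4.1, 5.8, 4)     # 17,000 ranges  -> 15 bits
```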
GAs: How do they work?
String (b_{m-1} … b_1 b_0) of length m corresponds to:
    x = a + decimal(b_{m-1} … b_1 b_0) · (b - a) / (2^m - 1)
Let us consider a string of 18 + 15 = 33 bits: (010001001011010000111110010100010)
The first 18 bits, 010001001011010000, represent x1 = -3.0 + 70352 · 15.1/(2^18 - 1) = 1.052426
The next 15 bits, 111110010100010, represent x2 = 4.1 + 31906 · 1.7/(2^15 - 1) = 5.755330
So the chromosome (010001001011010000111110010100010) corresponds to (x1, x2) = (1.052426, 5.755330)
The fitness value for this chromosome is f(1.052426, 5.755330)
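The decoding rule can be verified with a short Python sketch (the function name is ours):

```python
def decode(bits: str, a: float, b: float) -> float:
    """Map a binary string onto [a, b]: (000...0) -> a and (111...1) -> b."""
    m = len(bits)
    return a + int(bits, 2) * (b - a) / (2 ** m - 1)

chromosome = "010001001011010000111110010100010"
x1 = decode(chromosome[:18], -3.0, 12.1)   # first 18 bits
x2 = decode(chromosome[18:], 4.1, 5.8)     # remaining 15 bits
```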
GAs: How do they work?
To optimize the function f using a genetic algorithm, we create a population of pop_size = 20 chromosomes.
All 33 bits in all chromosomes are initialized randomly.
GAs: How do they work?
Evaluation:
GAs: How do they work?
Construction of a roulette wheel: Total fitness: F = Σ_{i=1}^{20} eval(v_i) = 387.776822
Probability of selection p_i = eval(v_i)/F for each chromosome v_i
GAs: How do they work?
Roulette wheel: spun according to the cumulative probabilities q_i.
Generated random numbers:
GAs: How do they work?
New population:
GAs: How do they work?
Crossover: assume p_c = 0.25
For each chromosome in the new population, generate a random number r:
GAs: How do they work?
After crossover
GAs: How do they work?
Mutation: assume p_m = 0.01
Expected # of mutated bits per generation: p_m · m · pop_size = 0.01 × 33 × 20 = 6.6 bits. For every bit in the population, generate a random number r; if r < 0.01, mutate the bit.
GAs: How do they work?
After mutations:
GAs: How do they work?
Evaluation of new population:
GAs: How do they work?
Note that the total fitness of the new population F (just after one generation) is 447.049688, much higher than total fitness of the previous population, 387.776822.
Also, the best chromosome now has a better evaluation (33.351874) than the best chromosome from the previous population (30.060205).
GAs: How do they work?
After 1000 generations...
GAs: How do they work?
Evaluation
Remark: Most values are over 30; the population starts to converge. However, in generation 396 the best chromosome had a value of 38.827553. What happened?
GAs: How do they work?
Classical Genetic Algorithms
Fixed-length binary strings
Two operators: crossover and mutation
Though nicely theorized, failed in many areas
Neatness: inability to deal with nontrivial constraints
Constraints are hard to implement
Too domain-independent to be useful in many applications
GAs: How do they work?
Hybrid GA by Davis: GA + current algorithm
Use the current encoding
Hybridize where possible: Incorporate the positive features of the current algorithm
Adapt the genetic operators: Create crossover and mutation operators for the new type of encoding,
incorporate domain-based heuristics
Similar to Evolution Programs (Z. Michalewicz), except for the assumption that one or more current (traditional) algorithms are available on the problem domain
GAs: Why do they work?
A schema: A template allowing exploration of similarities among chromosomes
Built by introducing a don’t care symbol (★) into the alphabet of genes
A schema:
S = (★★ 1 1 1 ★★★★★)
The schema S matches 2^7 = 128 strings:
(0 0 1 1 1 0 0 0 0 0)
(0 0 1 1 1 0 0 0 0 1)
(0 0 1 1 1 0 0 0 1 0)
(0 0 1 1 1 0 0 0 1 1)
...
(1 1 1 1 1 1 1 1 1 0)
(1 1 1 1 1 1 1 1 1 1)
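Schema matching is easy to check programmatically; in this Python sketch '*' stands in for the don't-care symbol ★ (names ours):

```python
def matches(schema: str, string: str) -> bool:
    """A string matches a schema if they agree at every fixed position
    ('*' plays the role of the don't-care symbol)."""
    return all(s == "*" or s == c for s, c in zip(schema, string))

schema = "**111*****"
# Count how many of the 2^10 binary strings of length 10 the schema matches:
count = sum(matches(schema, format(n, "010b")) for n in range(2 ** 10))
```

With 7 don't-care positions, `count` comes out to 2^7 = 128, as stated above.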
GAs: Why do they work?
Strings and schemata: each string of length m is matched by 2^m schemata.
Ex) Let us consider a string (1001110001). This string is matched by the following 2^10 schemata:
Order of a schema
The order of the schema S (denoted by o(S)) is the number of 0 and 1 positions, i.e., fixed (non-don’t-care) positions, present in the schema.
Ex) The following schema of length 10,
S = (★★★ 0 0 1 ★ 1 1 0),
has the order o(S) = 6.
Length of a schema
The defining length of the schema S (denoted by δ(S)) is the distance between the first and the last fixed string positions. It defines the compactness of information contained in a schema.
In the previous Ex, δ(S) = 10 - 4 = 6.
Note that a schema with a single fixed position has a defining length of zero.
Ex) S1 = (★★★ 0 0 1 ★ 1 1 0)
S2 = (★★★★ 0 0 ★★ 0 ★)
S3 = ( 1 1 1 0 1 ★★ 0 0 1)
o(S1) = 6, o(S2) = 3, and o(S3) = 8;
δ(S1) = 10 - 4 = 6, δ(S2) = 9 - 5 = 4, and δ(S3) = 10 - 1 = 9.
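Both quantities can be computed mechanically; a Python sketch using '*' for ★ (function names ours):

```python
def order(schema: str) -> int:
    """o(S): the number of fixed (non-'*') positions."""
    return sum(c != "*" for c in schema)

def defining_length(schema: str) -> int:
    """delta(S): distance between the first and the last fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

# The three schemata from the example, with '*' for the don't-care symbol:
s1, s2, s3 = "***001*110", "****00**0*", "11101**001"
```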
Fitness of a schema
ξ(S, t): the number of strings in a population at time t matched by a schema S.
eval(S, t): fitness of a schema S at time t, defined as the average fitness of all strings in the population matched by the schema S.
Assume there are p strings {v_{i_1}, …, v_{i_p}} in the population matched by a schema S at the time t. Then
    eval(S, t) = (Σ_{j=1}^{p} eval(v_{i_j})) / p.
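A sketch of this definition in Python (names ours; the tiny population and fitness function are illustrative):

```python
def schema_fitness(schema, population, eval_fn):
    """eval(S, t): average fitness of the population strings matched by S."""
    matched = [v for v in population
               if all(s == "*" or s == c for s, c in zip(schema, v))]
    return sum(eval_fn(v) for v in matched) / len(matched) if matched else 0.0

# Illustrative: fitness of schema "11*" over a tiny population,
# with each string's fitness taken as its decimal value.
population = ["111", "110", "000"]
avg = schema_fitness("11*", population, lambda v: int(v, 2))  # (7 + 6) / 2 = 6.5
```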
Schema and selection
After the selection step, we expect to have ξ(S, t+1) strings matched by schema S. Since
(1) for an average string matched by a schema S, the probability of its selection (in a single string selection) is equal to eval(S, t)/F(t),
(2) the number of strings matched by a schema S is ξ(S, t), and
(3) the number of single string selections is pop_size,
it is clear that
    ξ(S, t+1) = ξ(S, t) · pop_size · eval(S, t)/F(t),
where F(t) is the total fitness of the whole population at time t.
Reproductive schema growth equation
Considering the average fitness of the population, F̄(t) = F(t)/pop_size, we can rewrite the formula above as
    ξ(S, t+1) = ξ(S, t) · eval(S, t)/F̄(t).
This means that an “above average” schema receives an increasing number of strings in the next generation, a “below average” schema receives a decreasing number of strings, and an average schema stays on the same level.
If we assume that a schema S remains above average by ε, i.e.,
    ε = (eval(S, t) - F̄(t))/F̄(t),
where ε > 0 for above-average schemata and ε < 0 for below-average schemata, then
    ξ(S, t) = ξ(S, 0) · (1 + ε)^t.
Example
Assume pop_size = 20, the length of a string (of a schema template) m = 33, and the population consists of the following strings:
Example
For a given schema
S0 = (★★★★ 1 1 1 ★★★★★★★★★★★★★★★★★★★★★★★★★★),
ξ(S0, t) = 3, that is, S0 matches {v13, v15, v16}.
The order of the schema: o(S0) = 3. The defining length: δ(S0) = 7 - 5 = 2.
eval(S0, t) = (27.316702 + 30.060205 + 23.867227)/3 = 27.081378
F̄(t) = Σ eval(v_i)/pop_size = 387.776822/20 = 19.388841
eval(S0, t)/F̄(t) = 27.081378/19.388841 = 1.396751
At (t+1), 3 × 1.396751 ≈ 4.19 strings are expected to be matched by S0; at (t+2), 3 × 1.396751² ≈ 5.85 such strings.
Example
New population:
Indeed, the schema S0 at time (t+1) matches 5 strings: v'7, v'11, v'18, v'19, and v'20.
Effect of crossover
A string (1110111100) is matched by 2^10 schemata; in particular, it is matched by
S1 = (★★★★ 1 1 1 ★★★) and
S2 = (1 1 ★★★★★★ 0 0).
After the crossover with (1010111101), none of the offspring matches S2.
In general, a crossover site is selected uniformly among m - 1 possible sites. This implies that the probability of destruction of a schema S is
    p_d(S) = δ(S)/(m - 1),
and consequently, the probability of schema survival is
    p_s(S) ≥ 1 - δ(S)/(m - 1).
The above inequality comes from the fact that even if a crossover site is selected between fixed positions in a schema, there is still a chance for the schema to survive.
Combined effect: selection and crossover
The combined effect of selection and crossover gives us a new form of the reproductive schema growth equation:
    ξ(S, t+1) ≥ ξ(S, t) · eval(S, t)/F̄(t) · [1 - p_c · δ(S)/(m - 1)].
Effect of mutation
Since the probability of the alteration of a single bit is p_m, the probability of a single bit’s survival is 1 - p_m. A single mutation is independent of other mutations, so the probability of a schema S surviving a mutation (i.e., a sequence of one-bit mutations) is
    p_s(S) = (1 - p_m)^{o(S)}.
Since p_m << 1, this probability can be approximated by
    p_s(S) ≈ 1 - o(S) · p_m.
Combined effect of selection, crossover, and mutation
The combined effect of selection, crossover, and mutation gives us a new form of the reproductive schema growth equation:
    ξ(S, t+1) ≥ ξ(S, t) · eval(S, t)/F̄(t) · [1 - p_c · δ(S)/(m - 1) - o(S) · p_m].
Schema Theorem: Short, low-order, above-average schemata receive exponentially increasing trials in subsequent generations of a genetic algorithm.
Building Block Hypothesis: A genetic algorithm seeks near-optimal performance through the juxtaposition of short, low-order, high-performance schemata, called the building blocks.
“Just as a child creates magnificent fortresses through the arrangement of simple blocks of wood, so does a GA seek near optimal performance through the juxtaposition of short, low-order, high performance schemata.”
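The lower bound in the growth equation can be evaluated numerically; this sketch (function name ours) plugs in numbers from the lecture's running example: schema S0 with ξ(S0, t) = 3, δ(S0) = 2, o(S0) = 3, m = 33, and the run's parameters p_c = 0.25, p_m = 0.01:

```python
def schema_growth_bound(xi, eval_s, avg_fitness, delta, order, m, p_c, p_m):
    """Schema-theorem lower bound on xi(S, t+1):
    xi(S,t) * eval(S,t)/F_bar(t) * [1 - p_c*delta(S)/(m-1) - o(S)*p_m]."""
    return xi * (eval_s / avg_fitness) * (1 - p_c * delta / (m - 1) - order * p_m)

bound = schema_growth_bound(
    xi=3, eval_s=27.081378, avg_fitness=19.388841,
    delta=2, order=3, m=33, p_c=0.25, p_m=0.01,
)
```

The bound (≈ 4.0) sits slightly below the selection-only expectation of 4.19, reflecting the possible disruption of S0 by crossover and mutation.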
Implicit Parallelism
Holland showed that at least pop_size³ schemata are processed usefully; he called this property implicit parallelism, as it is obtained without any extra memory/processing requirements. It is interesting to note that in a population of pop_size strings there are many more than pop_size schemata represented. This constitutes possibly the only known example of a combinatorial explosion working to our advantage instead of our disadvantage.
The building block hypothesis is just an article of faith, which for some problems is easily violated: deception.
The phenomenon of deception is strongly connected with the concept of epistasis, which means strong interaction among genes in a chromosome.
Deception
S1 = ( 1 1 1 ★★★★★★★★ )
S2 = (★★★★★★★★★ 1 1 )
Their combination, S3 = ( 1 1 1 ★★★★★★ 1 1 ), is much less fit than
S4 = ( 0 0 0 ★★★★★★ 0 0 ).
Optimal solution (S3 matches it): S0 = ( 1 1 1 1 1 1 1 1 1 1 1 )
Deception: difficulty in converging to S0, since the algorithm tends to converge to S4.
Three approaches were proposed to deal with deception:
1. Prior knowledge of the objective function, to code it in an appropriate way.
2. Use a third operator, inversion: it selects two points within a string and inverts the order of bits between the selected points, but remembering each bit’s ‘meaning.’
3. Use a messy genetic algorithm (mGA).