Evolutionary Computation
Lecture 11
Algorithm Configuration and Theoretical Analysis
Outline
• Algorithm Configuration
• Theoretical Analysis
Algorithm Configuration
• Question: If an EA toolbox is available (which is usually the case), which operators/parameters should I use?
• Aim: To find an appropriate EA instance (configuration of EA) for an optimization task.
[Figure: a toolbox offering Representation 1/2/3, Operator 1/2/3 and Scheme 1/2/3; an EA instance is assembled by choosing a representation, an operator with its parameters, and a scheme with its parameters.]
Algorithm Configuration
• Configuring an EA involves setting the values of
– categorical variables (types of operators/schemes)
– numerical variables (e.g., crossover rate, population size)
• We refer to both the categorical and the numerical variables as parameters of EAs.
• More formally, configuring an EA is itself a search problem, in which the best parameter vector is searched for to maximize performance.
Algorithm Configuration
• Performance measures may differ across contexts:
– the best solution quality achieved
– the time required to reach a threshold of solution quality
• Due to the stochastic nature of EAs, a performance measure can be viewed as a random variable.
• Multiple runs are usually required to get a reliable estimate of performance.
Algorithm Configuration
• The rough idea: Generate-and-Test
• Example 1-1: A naïve approach – Generate p candidate parameter vectors – Get p EA instances, each configured with a parameter vector – Run each EA instance for r times – In each run, n solution vectors is generated and tested – Compare the parameter vectors using performance measures
to get the best configuration.
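The naïve generate-and-test loop above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation; the names `naive_configure`, `make_ea` and `sample_params` are hypothetical, and `make_ea(params)` is assumed to return a zero-argument callable that performs one EA run and returns the best fitness found (higher is better).

```python
import random

def naive_configure(make_ea, sample_params, p=5, r=3):
    """Naive generate-and-test: run every candidate r times, keep the best.

    Note the total cost: each of the p candidates is run r times, and each
    run internally evaluates n solutions, i.e. p * r * n evaluations of f."""
    candidates = [sample_params() for _ in range(p)]
    best_params, best_score = None, float("-inf")
    for params in candidates:
        run = make_ea(params)
        score = sum(run() for _ in range(r)) / r  # average performance over r runs
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A toy usage: a fictitious "EA" whose performance peaks when a rate parameter is near 0.5, plus a little noise to mimic the stochasticity of real runs.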
Algorithm Configuration
• The naïve approach requires evaluating the objective function f(x) p·r·n times, which sounds too costly.
• A smarter approach is needed to reduce the search effort:
– Reduce p
– Reduce r
– Reduce n
Algorithm Configuration
• Racing procedures: An approach for reducing r.
• Basic idea: Stop running (testing) an EA instance if there is indication that it is not promising.
• By not promising, we mean an EA instance performs significantly worse than the best EA instance identified so far.
• EA instances are compared after each run, rather than only after all of them have completed r runs.
Algorithm Configuration
• Example 1-2:
• Many parameter vectors may be removed before all r runs are completed.
1. Generate p candidate parameter vectors
2. Set E as the set of the p EA instances
3. REPEAT:
   • Run each EA instance once and update its (average) performance
   • Identify the EA instance with the best performance
   • Remove from E the EA instances whose performance is significantly worse than the best EA instance
4. UNTIL size(E) = 1 or the number of runs reaches r
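The racing loop above can be sketched as follows. For brevity this sketch replaces the statistical test with a fixed performance gap, a loud simplification; a real implementation would use one of the tests listed on the next slide. The name `race` and the dict-of-callables interface are hypothetical.

```python
import random
import statistics

def race(instances, r=20, gap=0.05):
    """Racing: after each round of one run per surviving instance, drop any
    instance whose mean performance trails the current leader by more than
    `gap` (a crude stand-in for a significance test)."""
    results = {name: [] for name in instances}
    alive = set(instances)
    for _ in range(r):
        for name in list(alive):
            results[name].append(instances[name]())   # one more run each
        means = {name: statistics.mean(results[name]) for name in alive}
        best = max(means.values())
        alive = {name for name in alive if means[name] >= best - gap}
        if len(alive) == 1:
            break                                     # a single winner remains
    return alive, {name: len(results[name]) for name in results}
```

With two clearly separated instances, the weaker one is eliminated after very few runs instead of the full r, which is exactly the saving racing is after.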
Algorithm Configuration
• Several statistical tests can be used to indicate a significant difference in performance:
– Analysis of Variance (ANOVA)
– Kruskal–Wallis test (rank-based analysis of variance)
– Hoeffding's bound
– Unpaired Student's t-test
Algorithm Configuration
• A racing procedure alone may not be sufficient when:
– p is very large
– the set of candidate parameter vectors is infinite
• Basic idea: apply an iterative search procedure to the parameter space
– Meta-EA: using an EA to search for the best configuration
– Iterated local search
• Hopefully, the best configuration can be found with fewer trial parameter vectors.
Algorithm Configuration
• Example 1-3:
1. Generate p' candidate parameter vectors (p' < p)
2. Get p' EA instances
3. REPEAT:
   • Run each EA instance r times and assess its performance
   • Generate another p' parameter vectors
4. UNTIL the halting criterion is satisfied
Algorithm Configuration
• Iterative search can be combined with a racing procedure to reduce both p and r .
• Example 1-4:
1. Generate p' candidate parameter vectors (p' < p)
2. Get p' EA instances
3. REPEAT:
   • Apply the racing procedure to compare the performance of the EA instances
   • Generate another p' parameter vectors
4. UNTIL the halting criterion is satisfied
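A minimal sketch of combining iterative search with racing, in the spirit of Example 1-4: each round races a small batch and then samples the next batch around the survivor. All names (`iterated_racing`, `make_ea`, `perturb`) are hypothetical, and the fixed `gap` again stands in for a proper statistical test.

```python
import random
import statistics

def iterated_racing(make_ea, init_params, perturb, rounds=5, p_prime=4, r=10, gap=0.05):
    """Each round: race p' candidates (dropping those trailing the leader by
    more than `gap`), then centre the next batch on the survivor."""
    centre = init_params
    for _ in range(rounds):
        batch = [centre] + [perturb(centre) for _ in range(p_prime - 1)]
        results = [[] for _ in batch]
        alive = set(range(len(batch)))
        for _ in range(r):                              # racing over at most r runs
            for i in list(alive):
                results[i].append(make_ea(batch[i])())
            means = {i: statistics.mean(results[i]) for i in alive}
            best = max(means.values())
            alive = {i for i in alive if means[i] >= best - gap}
            if len(alive) == 1:
                break
        centre = batch[min(alive)]                      # survivor seeds the next round
    return centre
```

Racing reduces r (losers are dropped early) while the outer loop reduces p (only p' candidates exist at a time).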
Algorithm Configuration
• Further enhancement: modeling the landscape of the performance measure
– to identify promising parameter vectors
– to filter out unpromising parameter vectors
• The modeling task is a supervised learning problem:
– input data: parameter vectors
– target: performance
Algorithm Configuration
• Example 1-5 (Sequential Parameter Optimization):
1. Generate p' candidate parameter vectors (p' < p)
2. Get p' EA instances
3. Run each EA instance r times and assess its performance
4. Build a model M to approximate the performance landscape
5. REPEAT:
   • Generate another p' parameter vectors
   • Identify the most promising parameter vectors with M
   • Run only the promising parameter vectors for r times
   • Update M with the newly assessed parameter vectors
6. UNTIL the halting criterion is satisfied
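A toy sketch of the model-based loop, assuming a single parameter in [0, 1] and using a 1-nearest-neighbour predictor as the surrogate M (real SPO typically uses regression models such as Kriging; the 1-NN choice here is only to keep the sketch dependency-free). The name `spo_sketch` is hypothetical.

```python
import random

def spo_sketch(true_perf, rounds=5, p_prime=6, keep=2, rng=None):
    """Assess an initial batch fully; afterwards, use a 1-nearest-neighbour
    surrogate of the performance landscape to select only the most
    promising new candidates for (expensive) real assessment."""
    rng = rng or random.Random()
    assessed = {}                                  # parameter -> measured performance

    def surrogate(x):                              # M: predict by nearest assessed point
        nearest = min(assessed, key=lambda a: abs(a - x))
        return assessed[nearest]

    for x in (rng.random() for _ in range(p_prime)):
        assessed[x] = true_perf(x)                 # initial design: assess everything
    for _ in range(rounds):
        pool = sorted((rng.random() for _ in range(p_prime)),
                      key=surrogate, reverse=True)
        for x in pool[:keep]:                      # assess only the promising ones...
            assessed[x] = true_perf(x)             # ...and refine the model's data
    return max(assessed, key=assessed.get)
```

Only `keep` of the `p_prime` candidates per round pay the real assessment cost; the rest are filtered out by the surrogate.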
Algorithm Configuration
• Remark on reducing n
– The key question: will the "relative order" of two EAs remain unchanged during the evolution?
– The lesson learned: the answer is problem- and algorithm-dependent, so few systematic approaches have been investigated for this issue.
[Fig. 2. The self-adaptation curves of p, fp and CRm for f5, f9, fcec5, and fcec9. The vertical axes show their values (between 0 and 1); the horizontal axes show the number of generations.]
[Fig. 3. The evolution curves of SaNSDE, SaDE and NSDE for f5, f9, fcec5 and fcec9. The vertical axes show the distance to the optimum; the horizontal axes show the number of generations. Source: 2008 IEEE Congress on Evolutionary Computation (CEC 2008).]
Yes: Blue curve vs. Green Curve No: Blue Curve vs. Red Curve
Algorithm Configuration
• So far, we have mainly touched the case of configuring an EA for a single optimization task (problem instance).
• What if we want to find a more general configuration that can be applied to multiple problem instances?
• The previously introduced ideas can be adapted to this scenario.
• Instead of looking at the results of r runs, comparisons are made mainly based on the performance on multiple problem instances:
– F-Race
– ParamILS
Algorithm Configuration
• Additional remarks
– Operators/parameters can also be adjusted on the fly; this type of EA is usually called an adaptive/self-adaptive EA.
– With a unified "representation" of EA behaviors, there will be huge room for involving machine learning techniques.
– Algorithm configuration is also closely related to the emerging term "hyper-heuristic".
Outline
• Algorithm Configuration
• Theoretical Analysis
This part is based on the slides of Dr. Yang Yu (Nanjing University) from his IJCAI 2013 tutorial; we gratefully acknowledge this.
Theoretical Analysis
From the tutorial "An Introduction to Evolutionary Optimization: Recent Theoretical and Practical Advances" (outline: intro. to theory, Markov chain, problem dependency, RTA, analysis tools, on parameters, on comparison with classics, on real-world situations, summary)
Conventional algorithm analysis
Problem — Algorithm — Measurement:
• Sorting — Quick Sort — average time complexity: O(n log n)
• Shortest Path — Dijkstra's algorithm — average time complexity: O(|V|²)
• Linear Programming — Simplex — worst-case time complexity: exponential; smoothed complexity: polynomial
Theoretical Analysis
Time complexity
What about an algorithm that sorts (5, 4, 2, 8, 9) in 3 steps?
• Complexity is measured on a class of problem instances, e.g. all possible arrays of 5 numbers (average complexity, worst-case complexity).
• It measures the growth rate as the problem size increases, e.g. 2n² in asymptotic notation is O(n²).
Theoretical Analysis
But for EAs...
For EAs, the usual Problem–Algorithm–Measurement picture breaks down: EAs are inspired by natural phenomena and are not designed with knowledge of the problems — the problem is unknown to the algorithm. Hence theoretical understanding is even more important.
Theoretical Analysis
Markov chain modeling
A general EA procedure: initialization produces a population; reproduction produces offspring; evaluation & selection form the next population.
Expanded along time: population 0 → population 1 → population 2 → population 3 → ...
Modeled as a Markov chain: state 0 → state 1 → state 2 → state 3 → ...
Theoretical Analysis
Markov chain: state 0 → state 1 → state 2 → state 3 → ..., i.e. states ξ₀, ξ₁, ξ₂, ξ₃, ...
Markov property: P(ξ₃ | ξ₂, ξ₁, ξ₀) = P(ξ₃ | ξ₂)
Solution space S with optimal solutions S* ⊆ S; population space X = Sᵐ with optimal populations X* ⊆ X, where a population is optimal if it involves at least one optimal solution.
Theoretical Analysis
Convergence
Does an EA converge to the globally optimal solutions?
  lim_{t→+∞} P(ξ_t ∈ X*) = 1
Here X* is considered as closed (once reached, never left).

Theorem (discrete version derived from [He & Yu, 01]):
Let ξ be a Markov chain. Define
  α_t = ∑_{x∉X*} P(ξ_{t+1} ∈ X* | ξ_t = x) · P(ξ_t = x) − ∑_{x∈X*} P(ξ_{t+1} ∉ X* | ξ_t = x) · P(ξ_t = x),
i.e., the gain of optimality in one step minus the loss of optimality in one step. Then ξ converges to X* if and only if α satisfies:
  P(ξ₀ ∈ X*) + ∑_{t=0}^{+∞} α_t = 1.
Theoretical Analysis
Convergence
Does an EA converge to the globally optimal solutions? lim_{t→+∞} P(ξ_t ∈ X*) = 1
An EA that
1. uses global operators, and
2. preserves the best solution (X* considered as closed: gain of optimality > 0, loss of optimality = 0)
always converges to the optimal solutions.
But life is limited! How fast does it converge?
Theoretical Analysis
Examples in simple cases
Problem — Algorithm — Measurement:
• OneMax, LeadingOnes, Linear Pseudo-Boolean Functions, LongPath, ... — (1+1)-EA — Expected Running Time (ERT)
Theoretical Analysis
Running time analysis
Running time of an EA: the number of solutions evaluated until an optimal solution of the given problem is reached for the first time (evaluation is the most time-consuming step and may be met many times).
Running time analysis: running time with respect to the problem size (e.g. n)
• the expected running time (ERT), e.g. expected running time O(n²)
• ERT with high probability, e.g. expected running time O(n ln n) with probability at least 1 − 1/2ⁿ
This is the EA counterpart of computational complexity.
Theoretical Analysis
A simple EA: the (1+1)-EA

1: s ← a randomly drawn solution from X
2: for t = 1, 2, ... do
3:   s' ← mutate(s)
4:   if f(s') ≥ f(s) then
5:     s ← s'
6:   end if
7:   terminate if a stopping criterion is met (e.g., an optimal solution is found)
8: end for

Mutation operators:
• one-bit mutation: randomly choose one bit and change its value
• bitwise mutation: change every bit independently with some probability (e.g., 1/n)

For maximization, line 4 allows neutral changes. The (1+1)-EA is an extremely simplified EA, missing some features of real EAs: no population, no crossover.
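The pseudocode above translates almost line for line into Python. A minimal sketch with bitwise mutation; the function name, the `target` shortcut for stopping at a known optimum, and the evaluation budget are additions for illustration.

```python
import random

def one_plus_one_ea(f, n, target=None, max_evals=1_000_000, seed=None):
    """(1+1)-EA with bitwise mutation: flip each bit independently with
    probability 1/n; accept the offspring if it is at least as good
    (neutral moves allowed, maximization). Returns (solution, #evaluations)."""
    rng = random.Random(seed)
    s = [rng.randint(0, 1) for _ in range(n)]
    fs, evals = f(s), 1
    while evals < max_evals and fs != target:
        s2 = [1 - b if rng.random() < 1.0 / n else b for b in s]  # bitwise mutation
        fs2 = f(s2)
        evals += 1
        if fs2 >= fs:                  # elitist: never lose the best-so-far
            s, fs = s2, fs2
    return s, evals
```

On OneMax (f = number of 1-bits, passed simply as `sum`), a run with n = 20 reliably finds the all-ones string well within the budget.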
Theoretical Analysis
OneMax problem: count the number of 1-bits
  fitness: f(x) = ∑_{i=1}^n x_i, solve argmax_{x∈{0,1}ⁿ} ∑_{i=1}^n x_i
EAs do not have knowledge of the problems; they are only able to call f(x), and to the EA there is no difference from any other function f: {0,1}ⁿ → ℝ. An EA is thus not only optimizing the problem, but also guessing the problem.
Theoretical Analysis
OneMax: f(x) = ∑_{i=1}^n x_i
(1+1)-EA with bitwise mutation (flip each bit with probability 1/n):
the probability of flipping i particular bits (and no others) is
  (1/n)ⁱ · (1 − 1/n)^{n−i}.
[Plot: this probability against i — monotonically decreasing, but always positive.]
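The two properties read off the plot can be checked directly from the formula. A tiny sketch (the function name is ours):

```python
def flip_prob(i, n):
    """Probability that bitwise mutation with rate 1/n flips i particular
    bits and leaves the other n - i bits untouched: (1/n)^i (1-1/n)^(n-i)."""
    return (1.0 / n) ** i * (1.0 - 1.0 / n) ** (n - i)
```

The ratio of consecutive terms is (1/n)/(1 − 1/n) = 1/(n − 1) < 1, so the sequence decreases in i, yet every value stays strictly positive — which is what makes the mutation operator "global".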
Theoretical Analysis
OneMax: f(x) = ∑_{i=1}^n x_i, (1+1)-EA with bitwise mutation (flip each bit with probability 1/n).
Solutions with the same number of 1-bits share the same f value; this partitions the solution space into subspaces S₀, S₁, S₂, ..., S_n, where S_i contains the solutions with exactly i one-bits and S_n = S*. Many transitions are possible between the subspaces.
Theoretical Analysis
An upper bound can be obtained from a path that visits all subspaces in order of increasing fitness: S₀ → S₁ → S₂ → ... → S_n = S*.
Theoretical Analysis
From S₀ (the all-zeros solution), flipping exactly one bit moves to S₁ with probability
  p = n · (1/n) · (1 − 1/n)^{n−1}.
Theoretical Analysis
From S₁, flipping exactly one of the n − 1 zero-bits (and nothing else) gives
  p ≥ C(n−1, 1) · (1/n) · (1 − 1/n)^{n−1}.
Theoretical Analysis
In general, from S_i (solutions with i one-bits), flipping exactly one of the n − i zero-bits gives
  p ≥ C(n−i, 1) · (1/n) · (1 − 1/n)^{n−1}.
Theoretical Analysis
Since the transition from S_i happens with probability p ≥ C(n−i, 1) · (1/n) · (1 − 1/n)^{n−1}, the expected number of steps until it happens is at most
  1/p ≤ 1/(n−i) · n · (1 + 1/(n−1))^{n−1} ≤ 1/(n−i) · n · e.
Summed up over all levels:
  ∑_{i=0}^{n−1} e·n/(n−i) = e·n·H_n ~ e·n·ln n,
giving an ERT upper bound of O(n ln n).
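The level-by-level summation can be evaluated exactly and checked against the asymptotic e·n·ln n. A small sketch (the function name is ours):

```python
import math

def ert_upper_bound(n):
    """Exact value of sum_{i=0}^{n-1} 1/v_i for OneMax, where
    v_i = (n - i) * (1/n) * (1 - 1/n)^(n-1) is the probability of
    flipping exactly one 0-bit from a solution with i one-bits."""
    q = (1.0 / n) * (1.0 - 1.0 / n) ** (n - 1)   # one particular bit flips alone
    return sum(1.0 / ((n - i) * q) for i in range(n))
```

The sum factors as (1/q)·H_n with n ≤ 1/q ≤ e·n, so the exact value is squeezed between n·ln n and roughly e·n·H_n.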
Theoretical Analysis
General analysis tools
Running time analysis is commonly problem-specific: to derive the ERT of an EA on a problem, we need a guide telling us "where to look" and "what to calculate" to accomplish the analysis. Such general tools include:
– Fitness Level Method
– Drift Analysis
– Convergence-rate Based Method
Theoretical Analysis
Fitness Level Method [Wegener, 02]
Partition the solution space into subspaces S₁, S₂, ..., S_m with increasing fitness, where S_m = S*. A population is treated as the best solution in the population.
Then calculate:
1. the initialization probability π₀(S_i) of being in each subspace
Note that elitist EAs (i.e. EAs that never lose the best solution) select solutions with better fitness. The level sets intuitively form stairs: an upper bound on the running time can be derived by summing up the time taken for getting off every stair, and a lower bound is the minimum time of getting off a stair. This is formally described in Lemma 2.

Lemma 2 (Fitness Level Method [39])
For an elitist EA process ξ on a problem f, let S₁, ..., S_m be a <_f-partition, let v_i ≤ P(ξ_{t+1} ∈ ∪_{j=i+1}^m S_j | ξ_t = x) for all x ∈ S_i, and u_i ≥ P(ξ_{t+1} ∈ ∪_{j=i+1}^m S_j | ξ_t = x) for all x ∈ S_i. Then the DCFHT (expected first hitting time) of the EA process is at most
  ∑_{1≤i≤m−1} π₀(S_i) · ∑_{j=i}^{m−1} 1/v_j,
and is at least
  ∑_{1≤i≤m−1} π₀(S_i) · 1/u_i.

Later, a more elaborate fitness level method was discovered by Sudholt [36], which we call the refined fitness level method (Lemma 3). In the lemma, when the EA uses a population of solutions, the notation ξ ∈ S denotes that the best solution of the population is in the solution space S.

Lemma 3 (Refined Fitness Level Method [35, 36])
For an elitist EA process ξ with a fitness function f, let S₁, ..., S_m be a <_f-partition, let v_i ≤ min_{j>i} (1/γ_{i,j}) P(ξ_{t+1} ∈ S_j | ξ_t = x) and u_i ≥ max_{j>i} (1/γ_{i,j}) P(ξ_{t+1} ∈ S_j | ξ_t = x) for all x ∈ S_i, where ∑_{j=i+1}^m γ_{i,j} = 1, and let γ_u, γ_l ∈ [0, 1] be constants such that γ_u ≥ γ_{i,j}/∑_{k=j}^m γ_{i,k} ≥ γ_l for all j > i, γ_u ≥ 1 − v_{j+1}/v_j and γ_l ≤ 1 − u_{j+1}/u_j for all 1 ≤ j ≤ m−2. Then the DCFHT of the process is at most
  ∑_{i=1}^{m−1} π₀(S_i) · (1/v_i + γ_u ∑_{j=i+1}^{m−1} 1/v_j),
and is at least
  ∑_{i=1}^{m−1} π₀(S_i) · (1/u_i + γ_l ∑_{j=i+1}^{m−1} 1/u_j).

The refined fitness level method follows the general idea of the fitness level method, while introducing a variable γ that reflects the distribution of the probability that the EA jumps to better levels. When γ is small, the EA has a high probability of jumping across many levels and thus making large progress; when γ is large, the EA can only make small progress in every step. Obviously, γ can take 1 for upper bounds and 0 for lower bounds, which degrades the refined method to the original fitness level method; therefore the original fitness level method is a special case of the refined one.
2. bounds on the progress probability for x ∈ S_i
The ERT is then upper- and lower-bounded as given in Lemma 2.
Theoretical Analysis
Example in OneMax
Partition by the number of one-bits: S₀ (0 one-bits), S₁ (1 one-bit), S₂ (2 one-bits), ..., S_n = S* (n one-bits).
Progress probability for x ∈ S_i — a lower bound, obtained by flipping one 0-bit but no 1-bits:
  v_i ≥ C(n−i, 1) · (1/n) · (1 − 1/n)^{n−1}
Initialization distribution: π₀(S_i) = C(n, i) / 2ⁿ
ERT:
Note that elitist (i.e. never lose the best solution) EAs select solutions with better fitness. The level
sets, intuitively, form stairs, for which an upper bound can be derived by summing up the time
taken for getting off every stair, and a lower bound is the minimum time of getting off a stair. This is
formally described in Lemma 2.
Lemma 2 (Fitness Level Method [39])
For an elitist EA process ⇠ on a problem f , letS1
, ...,Sm be a<f -partition, let vi P (⇠t+1
2 [mj=i+1
Sj |
⇠t = x) for all x 2 Si, and ui � P (⇠t+1
2 [mj=i+1
Sj | ⇠t = x) for all x 2 Si. Then, the DCFHT of the EA
process is at most
X
1im�1
⇡0
(Si) ·m�1X
j=i
1
vj,
and is at least
X
1im�1
⇡0
(Si) ·1
ui.
Later on, more elaborated fitness level method was discovered by Sudholt [36], which we call as
the refined fitness level method in this paper as in Lemma 3. In the lemma, when the EA uses a
population of solutions, the notation ⇠ will denote that the best solution of the population is in the
solution space S.
Lemma 3 (Refined Fitness Level Method [35, 36])
For an elitist EA process ⇠ with a fitness function f , letS1
, ...,Sm be a<f -partition, let vi minj1
�i,j
P (⇠t+1
2
Sj | ⇠t = x) and ui � maxj1
�i,j
P (⇠t+1
2 Sj | ⇠t = x) for all x 2 Si wherePm
j=i+1
�i,j = 1, and let
�u,�l 2 [0, 1] be constants such that �u � �i,j/Pm
k=j �i,k � �l for all j > i, �u � 1 � vj+1
/vj and
�l � 1� uj+1
/uj for all 1 j m� 2. Then the DCFHT of the process is at most
m�1X
i=1
⇡0
(Si) ·�1
vi+ �u
m�1X
j=i+1
1
vj
�,
and the DCFHT of the process is at least
m�1X
i=1
⇡0
(Si) · (1
ui+ �l
m�1X
j=i+1
1
uj).
The refined fitness level method follows the general idea of the fitness level method, while introducing variables $\chi_u, \chi_l$ that reflect the distribution of the probability that the EA jumps to better levels. When $\chi$ is small, the EA has a high probability of jumping across many levels and thus makes large progress; when $\chi$ is large, the EA can make only small progress in every step. Obviously, $\chi_u$ can take 1 for upper bounds and $\chi_l$ can take 0 for lower bounds, which degrades the refined method to the original fitness level method. Therefore, the original fitness level method is a special case of the refined one.
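The degradation claim above is easy to check numerically. The following Python sketch (not from the lecture; the level probabilities v_i and the initialization distribution are made-up placeholders, purely for illustration) implements both upper-bound formulas and confirms that the refined bound with χ_u = 1 coincides with the original fitness-level bound:

```python
def fl_upper(pi0, v):
    # Original fitness level method (Lemma 2), upper bound:
    # sum_i pi0(S_i) * sum_{j=i}^{m-1} 1/v_j   (non-optimal levels indexed 0..m-2)
    return sum(pi0[i] * sum(1.0 / v[j] for j in range(i, len(v)))
               for i in range(len(v)))

def refined_fl_upper(pi0, v, chi_u):
    # Refined fitness level method (Lemma 3), upper bound:
    # sum_i pi0(S_i) * (1/v_i + chi_u * sum_{j=i+1}^{m-1} 1/v_j)
    return sum(pi0[i] * (1.0 / v[i]
                         + chi_u * sum(1.0 / v[j] for j in range(i + 1, len(v))))
               for i in range(len(v)))

# Hypothetical data: four non-optimal levels, uniform initialization.
pi0 = [0.25, 0.25, 0.25, 0.25]
v = [0.1, 0.2, 0.3, 0.4]

# chi_u = 1 recovers the original bound; smaller chi_u can only tighten it.
assert abs(refined_fl_upper(pi0, v, 1.0) - fl_upper(pi0, v)) < 1e-9
assert refined_fl_upper(pi0, v, 0.5) <= fl_upper(pi0, v)
```

A smaller valid χ_u discounts the tail sum ∑_{j>i} 1/v_j, which is why the refined method can give strictly tighter bounds.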
$$\pi_0(S_0)\sum_{j=1}^{m-1}\frac{1}{v_j} \in O(n\ln n)$$
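The O(n ln n) claim for the (1+1)-EA on OneMax can be sanity-checked by evaluating the fitness-level sum directly. A small Python sketch (a sketch under the slide's level bounds, not part of the lecture material) verifies that ∑_j 1/v_j stays below e·n·H_n:

```python
import math

def onemax_fl_upper(n):
    # Level S_i: solutions with i one-bits. Lower bound v_i on leaving S_i:
    # flip exactly one of the (n - i) zero-bits and keep every other bit.
    v = [(n - i) * (1.0 / n) * ((n - 1.0) / n) ** (n - 1) for i in range(n)]
    # Worst-case-start fitness level upper bound: sum_{j=0}^{n-1} 1/v_j
    return sum(1.0 / vi for vi in v)

n = 100
bound = onemax_fl_upper(n)
h_n = sum(1.0 / k for k in range(1, n + 1))  # harmonic number H_n ~ ln n

# bound = n * H_n / (1 - 1/n)^(n-1) <= e * n * H_n, i.e. O(n ln n)
assert bound <= math.e * n * h_n
```

The inequality uses (1 − 1/n)^{n−1} ≥ 1/e, the same estimate that appears in the LeadingOnes example later in this lecture.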
Theoretical Analysis
41
intro. to theory Markov chain problem dependency RTA analysis tools
An Introduction to Evolutionary Optimization: Recent Theoretical and Practical Advances
on parameters on comparison with classics on real-world situations summary
Drift Analysis [Hajek, 82][Sasaki & Hajek, 88][He & Yao, 01][He & Yao, 04]
A distance function V measures the "distance" of a solution x to the optimal solutions S*, with V(x*) = 0 for every optimal solution x*.
(figure: a solution x at distance V(x) from the optimal set S*)
Then calculate:
1. the initialization probability of solutions, $\pi_0(x)$;
2. bounds on the progress in distance at every step.
The ERT is then upper bounded by $\sum_{x\in X}\pi_0(x)V(x)/c_l$ and lower bounded by $\sum_{x\in X}\pi_0(x)V(x)/c_u$.
Drift analysis measures the progress toward the optimum in every step of an EA, from which the number of steps the EA takes to arrive at the optimum is directly derived. Note that in Lemma 8 the distance function can be arbitrary (possibly depending on the objective function); thus it covers variants including the original definition of drift analysis [17] and the multiplicative drift analysis [10].
Definition 12 (Distance Function)
For a space $X$ with the optimal subspace $X^*$, a function $V$ satisfying $V(x) = 0$ for all $x \in X^*$ and $V(x) > 0$ for all $x \in X - X^*$ is called a distance function.
Lemma 8 (Drift Analysis)
For an EA process $\xi \in X$, let $V$ be a distance function. If there exists a positive value $c_l$ such that
$$\forall t:\; c_l \le E[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t],$$
we have $E[\tau \mid \xi_0 \sim \pi_0] \le \sum_{x\in X}\pi_0(x)V(x)/c_l$;
and if there exists a non-negative value $c_u$ such that
$$\forall t:\; c_u \ge E[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t],$$
we have $E[\tau \mid \xi_0 \sim \pi_0] \ge \sum_{x\in X}\pi_0(x)V(x)/c_u$.
It should be noted that, when cl or cu is negative, the obtained running time bound is also negative
and thus meaningless. In this case, we will say that the drift is invalid and the analysis fails.
Characterization 3 (Drift Analysis)
For an EA process $\xi \in X$, the drift analysis $\mathcal{A}_{DA}$ is defined by its parameters, input and output:
Parameters: a distance function $V$.
Input:
$c_l > 0$ for upper bound analysis such that $c_l \le E[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t]$ for all $t \ge 0$;
$c_u > 0$ for lower bound analysis such that $c_u \ge E[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t]$ for all $t \ge 0$.
Output:
$\mathcal{A}^u_{DA} = \sum_{x\in X}\pi_0(x)V(x)/c_l$;
$\mathcal{A}^l_{DA} = \sum_{x\in X}\pi_0(x)V(x)/c_u$.
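Characterization 3 translates directly into code. Below is a minimal Python sketch of the two output formulas; the state space, distance function, and drift constants in the example are illustrative placeholders, not taken from the paper:

```python
def drift_upper_bound(pi0, V, c_l):
    # A^u_DA = sum_x pi0(x) * V(x) / c_l
    # Valid when c_l > 0 lower-bounds the one-step drift for all t.
    assert c_l > 0
    return sum(p * V(x) for x, p in pi0.items()) / c_l

def drift_lower_bound(pi0, V, c_u):
    # A^l_DA = sum_x pi0(x) * V(x) / c_u
    # Valid when c_u > 0 upper-bounds the one-step drift for all t.
    assert c_u > 0
    return sum(p * V(x) for x, p in pi0.items()) / c_u

# Toy example: states 0..3, V(x) = x, uniform initialization.
pi0 = {x: 0.25 for x in range(4)}
V = lambda x: float(x)

assert drift_upper_bound(pi0, V, 0.5) == 3.0   # (0+1+2+3)/4 / 0.5
assert drift_lower_bound(pi0, V, 2.0) == 0.75  # (0+1+2+3)/4 / 2.0
```

The hard part of drift analysis is not these formulas but establishing valid constants c_l and c_u, i.e., proving the drift condition for all t.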
7.2. The Power of Switch Analysis from Drift Analysis
Theorem 4
$\mathcal{A}_{DA}$ is reducible to $\mathcal{A}_{SA}$.
Lemma 9
$\mathcal{A}_{DA}$ is upper-bound reducible to $\mathcal{A}_{SA}$.
most simplified version:
$$c_l \le E[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t], \qquad c_u \ge E[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t]$$
Theoretical Analysis
42
Example in LeadingOnes
LeadingOnes Problem: count the number of leading 1-bits, e.g., f(11011111) = 2.
fitness: $f(x) = \sum_{i=1}^{n}\prod_{j=1}^{i} x_j$, to be maximized: $\arg\max_{x\in\{0,1\}^n}\sum_{i=1}^{n}\prod_{j=1}^{i} x_j$
Distance function: $V(x) = n - f(x)$, so the distance of optimal solutions is zero.
The drift decomposes as
$$E[V(\xi_t)-V(\xi_{t+1}) \mid \xi_t] = E[(V(\xi_t)-V(\xi_{t+1}))\,I(V(\xi_t)>V(\xi_{t+1})) \mid \xi_t] + E[(V(\xi_t)-V(\xi_{t+1}))\,I(V(\xi_t)<V(\xi_{t+1})) \mid \xi_t] + E[(V(\xi_t)-V(\xi_{t+1}))\,I(V(\xi_t)=V(\xi_{t+1})) \mid \xi_t],$$
where the last two terms are zero: the EA is elitist, so $V$ never increases, and the equal case contributes nothing. Only the expected positive progress needs care: keep the leading 1-bits (11...10...) and flip the first 0-bit.
probability of making progress $\ge$ probability of gaining at least one leading 1-bit:
$$E[V(\xi_t)-V(\xi_{t+1}) \mid \xi_t] \ge 1\cdot\frac{1}{n}\Big(1-\frac{1}{n}\Big)^{i} \ge \frac{1}{n}\Big(1-\frac{1}{n}\Big)^{n-1} \ge \frac{1}{en}$$
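The final chain of inequalities is easy to verify numerically. A quick Python check (for an illustrative n; this is just a numeric confirmation, not part of the proof):

```python
import math

n = 100
# (1/n) * (1 - 1/n)^i is smallest at i = n - 1, the worst non-optimal state.
worst = (1.0 / n) * (1.0 - 1.0 / n) ** (n - 1)

assert all((1.0 / n) * (1.0 - 1.0 / n) ** i >= worst for i in range(n))
assert worst >= 1.0 / (math.e * n)   # since (1 - 1/n)^(n-1) >= 1/e
```

So c_l = 1/(en) is a valid drift lower bound for every non-optimal state.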
Theoretical Analysis
43
Example in LeadingOnes
LeadingOnes Problem: count the number of leading 1-bits, e.g., f(11011111) = 2.
fitness: $f(x) = \sum_{i=1}^{n}\prod_{j=1}^{i} x_j$, to be maximized over $x\in\{0,1\}^n$
Distance function: $V(x) = n - f(x)$, so the distance of optimal solutions is zero.
$$E[V(\xi_t)-V(\xi_{t+1}) \mid \xi_t] \ge 1\cdot\frac{1}{n}\Big(1-\frac{1}{n}\Big)^{i} \ge \frac{1}{n}\Big(1-\frac{1}{n}\Big)^{n-1} \ge \frac{1}{en}$$
The ERT is then upper bounded as
$$\sum_{x\in X}\pi_0(x)V(x)\Big/\frac{1}{en} \le V((00\ldots0))\Big/\frac{1}{en} = en^2 \in O(n^2).$$
The exact expected running time is approximately $0.86n^2$ [Böttcher et al., 10].
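To relate the O(n²) bound to actual behavior, one can simulate the (1+1)-EA (standard bit-wise mutation with probability 1/n, elitist acceptance of no-worse offspring) on LeadingOnes. A seeded Python sketch; the problem size and number of runs are arbitrary illustration choices:

```python
import math
import random

def leading_ones(x):
    # Count the leading 1-bits of a bit list.
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def one_plus_one_ea(n, rng):
    # (1+1)-EA: flip each bit with probability 1/n, keep offspring if not worse.
    x = [rng.randint(0, 1) for _ in range(n)]
    fx, steps = leading_ones(x), 0
    while fx < n:
        y = [b ^ 1 if rng.random() < 1.0 / n else b for b in x]
        fy = leading_ones(y)
        if fy >= fx:
            x, fx = y, fy
        steps += 1
    return steps

rng = random.Random(0)  # fixed seed for reproducibility
n = 30
avg = sum(one_plus_one_ea(n, rng) for _ in range(20)) / 20.0

assert avg <= math.e * n * n   # the drift-analysis upper bound e * n^2
```

With n = 30, the averaged hitting time typically lands near the 0.86n² ≈ 774 figure cited above, comfortably below the en² ≈ 2446 bound.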
Theoretical Analysis
• Additional Remarks
– There are more analytical tools than those we have covered today.
– Different analytical tools may suit different algorithms/problems (at least from an intuitive perspective).
– Theoretical analysis of EAs has been extended to more complicated problems (e.g., NP-hard problems such as Traveling Salesman and Minimum Vertex Cover).
– Besides deepening our understanding, theoretical analysis may also help configure EAs for a specific problem.
44
End of Lecture 11
45