Evolutionary Computation
Lecture 11
Algorithm Configuration and Theoretical Analysis
Outline
• Algorithm Configuration
• Theoretical Analysis
Algorithm Configuration
• Question: If an EA toolbox is available (which is usually the case), which operators/parameters should I use?
• Aim: To find an appropriate EA instance (configuration of EA) for an optimization task.
[Figure: a toolbox offering Representation 1/2/3, Operator 1/2/3 and Scheme 1/2/3; an EA instance is assembled by choosing a representation, an operator with its parameters, and a scheme with its parameters.]
Algorithm Configuration
• Configuring an EA involves setting the values of
– categorical variables (types of operators/schemes)
– numerical variables (e.g., crossover rate, population size)
• We refer to both the categorical and the numerical variables as parameters of EAs.
• More formally, configuring an EA is itself a search problem, in which the best parameter vector is searched for to maximize performance.
Algorithm Configuration
• Performance measures may differ across contexts:
– the best solution quality achieved
– the time required to reach a threshold of solution quality
• Due to the stochastic nature of EAs, a performance measure can be viewed as a random variable.
• Multiple runs are usually required to get a reliable estimate of performance.
Algorithm Configuration
• The rough idea: Generate-and-Test
• Example 1-1: A naïve approach – Generate p candidate parameter vectors – Get p EA instances, each configured with a parameter vector – Run each EA instance for r times – In each run, n solution vectors is generated and tested – Compare the parameter vectors using performance measures
to get the best configuration.
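The naïve generate-and-test loop above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation; the names `naive_configure`, `make_ea` and `sample_params` are hypothetical, and `make_ea(params)` is assumed to return a zero-argument callable that performs one EA run and returns the best fitness found (higher is better).

```python
import random

def naive_configure(make_ea, sample_params, p=5, r=3):
    """Naive generate-and-test: run every candidate r times, keep the best.

    Note the total cost: each of the p candidates is run r times, and each
    run internally evaluates n solutions, i.e. p * r * n evaluations of f."""
    candidates = [sample_params() for _ in range(p)]
    best_params, best_score = None, float("-inf")
    for params in candidates:
        run = make_ea(params)
        score = sum(run() for _ in range(r)) / r  # average performance over r runs
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A toy usage: a fictitious "EA" whose performance peaks when a rate parameter is near 0.5, plus a little noise to mimic the stochasticity of real runs.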
Algorithm Configuration
• The naïve approach requires evaluating the objective function f(x) p·r·n times, which sounds too costly.
• A smarter approach is needed to reduce the search effort:
– Reduce p
– Reduce r
– Reduce n
Algorithm Configuration
• Racing procedures: An approach for reducing r.
• Basic idea: Stop running (testing) an EA instance if there is indication that it is not promising.
• By not promising, we mean an EA instance performs significantly worse than the best EA instance identified so far.
• EA instances are compared after each run, rather than only after all of them have completed r runs.
Algorithm Configuration
• Example 1-2:
• Many parameter vectors may be removed before all r runs are completed.
1. Generate p candidate parameter vectors
2. Set E as the set of the p EA instances
3. REPEAT:
   • Run each EA instance once and update its (average) performance
   • Identify the EA instance with the best performance
   • Remove from E the EA instances whose performance is significantly worse than the best EA instance
4. UNTIL size(E) = 1 or the number of runs reaches r
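The racing loop above can be sketched as follows. For brevity this sketch replaces the statistical test with a fixed performance gap, a loud simplification; a real implementation would use one of the tests listed on the next slide. The name `race` and the dict-of-callables interface are hypothetical.

```python
import random
import statistics

def race(instances, r=20, gap=0.05):
    """Racing: after each round of one run per surviving instance, drop any
    instance whose mean performance trails the current leader by more than
    `gap` (a crude stand-in for a significance test)."""
    results = {name: [] for name in instances}
    alive = set(instances)
    for _ in range(r):
        for name in list(alive):
            results[name].append(instances[name]())   # one more run each
        means = {name: statistics.mean(results[name]) for name in alive}
        best = max(means.values())
        alive = {name for name in alive if means[name] >= best - gap}
        if len(alive) == 1:
            break                                     # a single winner remains
    return alive, {name: len(results[name]) for name in results}
```

With two clearly separated instances, the weaker one is eliminated after very few runs instead of the full r, which is exactly the saving racing is after.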
Algorithm Configuration
• Several statistical tests can be used to indicate a significant difference in performance:
– Analysis of Variance (ANOVA)
– Kruskal–Wallis test (rank-based analysis of variance)
– Hoeffding's bound
– Unpaired Student's t-test
Algorithm Configuration
• A racing procedure alone may not be sufficient when:
– p is very large
– the set of candidate parameter vectors is infinite
• Basic idea: apply an iterative search procedure to the parameter space
– Meta-EA: using an EA to search for the best configuration
– Iterated local search
• Hopefully, the best configuration can be found with fewer trial parameter vectors.
Algorithm Configuration
• Example 1-3:
1. Generate p' candidate parameter vectors (p' < p)
2. Get p' EA instances
3. REPEAT:
   • Run each EA instance r times and assess its performance
   • Generate another p' parameter vectors
4. UNTIL the halting criterion is satisfied
Algorithm Configuration
• Iterative search can be combined with a racing procedure to reduce both p and r .
• Example 1-4:
1. Generate p' candidate parameter vectors (p' < p)
2. Get p' EA instances
3. REPEAT:
   • Apply the racing procedure to compare the performance of the EA instances
   • Generate another p' parameter vectors
4. UNTIL the halting criterion is satisfied
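A minimal sketch of combining iterative search with racing, in the spirit of Example 1-4: each round races a small batch and then samples the next batch around the survivor. All names (`iterated_racing`, `make_ea`, `perturb`) are hypothetical, and the fixed `gap` again stands in for a proper statistical test.

```python
import random
import statistics

def iterated_racing(make_ea, init_params, perturb, rounds=5, p_prime=4, r=10, gap=0.05):
    """Each round: race p' candidates (dropping those trailing the leader by
    more than `gap`), then centre the next batch on the survivor."""
    centre = init_params
    for _ in range(rounds):
        batch = [centre] + [perturb(centre) for _ in range(p_prime - 1)]
        results = [[] for _ in batch]
        alive = set(range(len(batch)))
        for _ in range(r):                              # racing over at most r runs
            for i in list(alive):
                results[i].append(make_ea(batch[i])())
            means = {i: statistics.mean(results[i]) for i in alive}
            best = max(means.values())
            alive = {i for i in alive if means[i] >= best - gap}
            if len(alive) == 1:
                break
        centre = batch[min(alive)]                      # survivor seeds the next round
    return centre
```

Racing reduces r (losers are dropped early) while the outer loop reduces p (only p' candidates exist at a time).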
Algorithm Configuration
• Further enhancement: modeling the landscape of the performance measure
– to identify promising parameter vectors
– to filter out unpromising parameter vectors
• The modeling task is a supervised learning problem:
– input data: parameter vectors
– target: performance
Algorithm Configuration
• Example 1-5 (Sequential Parameter Optimization):
1. Generate p' candidate parameter vectors (p' < p)
2. Get p' EA instances
3. Run each EA instance r times and assess its performance
4. Build a model M to approximate the performance landscape
5. REPEAT:
   • Generate another p' parameter vectors
   • Identify the most promising parameter vectors with M
   • Run only the promising parameter vectors for r times
   • Update M with the newly assessed parameter vectors
6. UNTIL the halting criterion is satisfied
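A toy sketch of the model-based loop, assuming a single parameter in [0, 1] and using a 1-nearest-neighbour predictor as the surrogate M (real SPO typically uses regression models such as Kriging; the 1-NN choice here is only to keep the sketch dependency-free). The name `spo_sketch` is hypothetical.

```python
import random

def spo_sketch(true_perf, rounds=5, p_prime=6, keep=2, rng=None):
    """Assess an initial batch fully; afterwards, use a 1-nearest-neighbour
    surrogate of the performance landscape to select only the most
    promising new candidates for (expensive) real assessment."""
    rng = rng or random.Random()
    assessed = {}                                  # parameter -> measured performance

    def surrogate(x):                              # M: predict by nearest assessed point
        nearest = min(assessed, key=lambda a: abs(a - x))
        return assessed[nearest]

    for x in (rng.random() for _ in range(p_prime)):
        assessed[x] = true_perf(x)                 # initial design: assess everything
    for _ in range(rounds):
        pool = sorted((rng.random() for _ in range(p_prime)),
                      key=surrogate, reverse=True)
        for x in pool[:keep]:                      # assess only the promising ones...
            assessed[x] = true_perf(x)             # ...and refine the model's data
    return max(assessed, key=assessed.get)
```

Only `keep` of the `p_prime` candidates per round pay the real assessment cost; the rest are filtered out by the surrogate.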
Algorithm Configuration
• Remark on reducing n
– The key question: will the "relative order" of two EAs remain unchanged during the evolution?
– The lesson learned: the answer is problem- and algorithm-dependent, so few systematic approaches have been investigated for this issue.
[Fig. 2. The self-adaptation curves of p, fp and CRm for f5, f9, fcec5, and fcec9. The vertical axes show their values (between 0 and 1); the horizontal axes show the number of generations.]
[Fig. 3. The evolution curves of SaNSDE, SaDE and NSDE for f5, f9, fcec5 and fcec9. The vertical axes show the distance to the optimum; the horizontal axes show the number of generations. Source: 2008 IEEE Congress on Evolutionary Computation (CEC 2008).]
Yes: Blue curve vs. Green Curve No: Blue Curve vs. Red Curve
Algorithm Configuration
• So far, we have mainly touched the case of configuring an EA for a single optimization task (problem instance).
• What if we want to find a more general configuration that can be applied to multiple problem instances?
• The previously introduced ideas can be adapted to this scenario.
• Instead of looking at the results of r runs, comparisons are made mainly based on the performance on multiple problem instances:
– F-Race
– ParamILS
Algorithm Configuration
• Additional remarks
– Operators/parameters can also be adjusted on the fly; this type of EA is usually called an adaptive/self-adaptive EA.
– With a unified "representation" of EA behaviors, there will be huge room for involving machine learning techniques.
– Algorithm configuration is also closely related to the emerging term "hyper-heuristic".
Outline
• Algorithm Configuration
• Theoretical Analysis
This part is based on the slides of Dr. Yang Yu (Nanjing University) from his IJCAI 2013 tutorial; we gratefully acknowledge this.
Theoretical Analysis
From the tutorial "An Introduction to Evolutionary Optimization: Recent Theoretical and Practical Advances" (outline: intro. to theory, Markov chain, problem dependency, RTA, analysis tools, on parameters, on comparison with classics, on real-world situations, summary)
Conventional algorithm analysis
Problem — Algorithm — Measurement:
• Sorting — Quick Sort — average time complexity: O(n log n)
• Shortest Path — Dijkstra's algorithm — average time complexity: O(|V|²)
• Linear Programming — Simplex — worst-case time complexity: exponential; smoothed complexity: polynomial
Theoretical Analysis
Time complexity
What about an algorithm that sorts (5, 4, 2, 8, 9) in 3 steps?
• Complexity is measured on a class of problem instances, e.g. all possible arrays of 5 numbers (average complexity, worst-case complexity).
• It measures the growth rate as the problem size increases, e.g. 2n² in asymptotic notation is O(n²).
Theoretical Analysis
But for EAs...
For EAs, the usual Problem–Algorithm–Measurement picture breaks down: EAs are inspired by natural phenomena and are not designed with knowledge of the problems — the problem is unknown to the algorithm. Hence theoretical understanding is even more important.
Theoretical Analysis
Markov chain modeling
A general EA procedure: initialization produces a population; reproduction produces offspring; evaluation & selection form the next population.
Expanded along time: population 0 → population 1 → population 2 → population 3 → ...
Modeled as a Markov chain: state 0 → state 1 → state 2 → state 3 → ...
Theoretical Analysis
Markov chain: state 0 → state 1 → state 2 → state 3 → ..., i.e. states ξ₀, ξ₁, ξ₂, ξ₃, ...
Markov property: P(ξ₃ | ξ₂, ξ₁, ξ₀) = P(ξ₃ | ξ₂)
Solution space S with optimal solutions S* ⊆ S; population space X = Sᵐ with optimal populations X* ⊆ X, where a population is optimal if it involves at least one optimal solution.
Theoretical Analysis
Convergence
Does an EA converge to the globally optimal solutions?
  lim_{t→+∞} P(ξ_t ∈ X*) = 1
Here X* is considered as closed (once reached, never left).

Theorem (discrete version derived from [He & Yu, 01]):
Let ξ be a Markov chain. Define
  α_t = ∑_{x∉X*} P(ξ_{t+1} ∈ X* | ξ_t = x) · P(ξ_t = x) − ∑_{x∈X*} P(ξ_{t+1} ∉ X* | ξ_t = x) · P(ξ_t = x),
i.e., the gain of optimality in one step minus the loss of optimality in one step. Then ξ converges to X* if and only if α satisfies:
  P(ξ₀ ∈ X*) + ∑_{t=0}^{+∞} α_t = 1.
Theoretical Analysis
Convergence
Does an EA converge to the globally optimal solutions? lim_{t→+∞} P(ξ_t ∈ X*) = 1
An EA that
1. uses global operators, and
2. preserves the best solution (X* considered as closed: gain of optimality > 0, loss of optimality = 0)
always converges to the optimal solutions.
But life is limited! How fast does it converge?
Theoretical Analysis
Examples in simple cases
Problem — Algorithm — Measurement:
• OneMax, LeadingOnes, Linear Pseudo-Boolean Functions, LongPath, ... — (1+1)-EA — Expected Running Time (ERT)
Theoretical Analysis
Running time analysis
Running time of an EA: the number of solutions evaluated until an optimal solution of the given problem is reached for the first time (evaluation is the most time-consuming step and may be met many times).
Running time analysis: running time with respect to the problem size (e.g. n)
• the expected running time (ERT), e.g. expected running time O(n²)
• ERT with high probability, e.g. expected running time O(n ln n) with probability at least 1 − 1/2ⁿ
This is the EA counterpart of computational complexity.
Theoretical Analysis
A simple EA: the (1+1)-EA

1: s ← a randomly drawn solution from X
2: for t = 1, 2, ... do
3:   s' ← mutate(s)
4:   if f(s') ≥ f(s) then
5:     s ← s'
6:   end if
7:   terminate if a stopping criterion is met (e.g., an optimal solution is found)
8: end for

Mutation operators:
• one-bit mutation: randomly choose one bit and change its value
• bitwise mutation: change every bit independently with some probability (e.g., 1/n)

For maximization, line 4 allows neutral changes. The (1+1)-EA is an extremely simplified EA, missing some features of real EAs: no population, no crossover.
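The pseudocode above translates almost line for line into Python. A minimal sketch with bitwise mutation; the function name, the `target` shortcut for stopping at a known optimum, and the evaluation budget are additions for illustration.

```python
import random

def one_plus_one_ea(f, n, target=None, max_evals=1_000_000, seed=None):
    """(1+1)-EA with bitwise mutation: flip each bit independently with
    probability 1/n; accept the offspring if it is at least as good
    (neutral moves allowed, maximization). Returns (solution, #evaluations)."""
    rng = random.Random(seed)
    s = [rng.randint(0, 1) for _ in range(n)]
    fs, evals = f(s), 1
    while evals < max_evals and fs != target:
        s2 = [1 - b if rng.random() < 1.0 / n else b for b in s]  # bitwise mutation
        fs2 = f(s2)
        evals += 1
        if fs2 >= fs:                  # elitist: never lose the best-so-far
            s, fs = s2, fs2
    return s, evals
```

On OneMax (f = number of 1-bits, passed simply as `sum`), a run with n = 20 reliably finds the all-ones string well within the budget.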
Theoretical Analysis
OneMax problem: count the number of 1-bits
  fitness: f(x) = ∑_{i=1}^n x_i, solve argmax_{x∈{0,1}ⁿ} ∑_{i=1}^n x_i
EAs do not have knowledge of the problems; they are only able to call f(x), and to the EA there is no difference from any other function f: {0,1}ⁿ → ℝ. An EA is thus not only optimizing the problem, but also guessing the problem.
Theoretical Analysis
OneMax: f(x) = ∑_{i=1}^n x_i
(1+1)-EA with bitwise mutation (flip each bit with probability 1/n):
the probability of flipping i particular bits (and no others) is
  (1/n)ⁱ · (1 − 1/n)^{n−i}.
[Plot: this probability against i — monotonically decreasing, but always positive.]
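The two properties read off the plot can be checked directly from the formula. A tiny sketch (the function name is ours):

```python
def flip_prob(i, n):
    """Probability that bitwise mutation with rate 1/n flips i particular
    bits and leaves the other n - i bits untouched: (1/n)^i (1-1/n)^(n-i)."""
    return (1.0 / n) ** i * (1.0 - 1.0 / n) ** (n - i)
```

The ratio of consecutive terms is (1/n)/(1 − 1/n) = 1/(n − 1) < 1, so the sequence decreases in i, yet every value stays strictly positive — which is what makes the mutation operator "global".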
Theoretical Analysis
OneMax: f(x) = ∑_{i=1}^n x_i, (1+1)-EA with bitwise mutation (flip each bit with probability 1/n).
Solutions with the same number of 1-bits share the same f value; this partitions the solution space into subspaces S₀, S₁, S₂, ..., S_n, where S_i contains the solutions with exactly i one-bits and S_n = S*. Many transitions are possible between the subspaces.
Theoretical Analysis
An upper bound can be obtained from a path that visits all subspaces in order of increasing fitness: S₀ → S₁ → S₂ → ... → S_n = S*.
Theoretical Analysis
From S₀ (the all-zeros solution), flipping exactly one bit moves to S₁ with probability
  p = n · (1/n) · (1 − 1/n)^{n−1}.
Theoretical Analysis
From S₁, flipping exactly one of the n − 1 zero-bits (and nothing else) gives
  p ≥ C(n−1, 1) · (1/n) · (1 − 1/n)^{n−1}.
Theoretical Analysis
In general, from S_i (solutions with i one-bits), flipping exactly one of the n − i zero-bits gives
  p ≥ C(n−i, 1) · (1/n) · (1 − 1/n)^{n−1}.
Theoretical Analysis
Since the transition from S_i happens with probability p ≥ C(n−i, 1) · (1/n) · (1 − 1/n)^{n−1}, the expected number of steps until it happens is at most
  1/p ≤ 1/(n−i) · n · (1 + 1/(n−1))^{n−1} ≤ 1/(n−i) · n · e.
Summed up over all levels:
  ∑_{i=0}^{n−1} e·n/(n−i) = e·n·H_n ~ e·n·ln n,
giving an ERT upper bound of O(n ln n).
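The level-by-level summation can be evaluated exactly and checked against the asymptotic e·n·ln n. A small sketch (the function name is ours):

```python
import math

def ert_upper_bound(n):
    """Exact value of sum_{i=0}^{n-1} 1/v_i for OneMax, where
    v_i = (n - i) * (1/n) * (1 - 1/n)^(n-1) is the probability of
    flipping exactly one 0-bit from a solution with i one-bits."""
    q = (1.0 / n) * (1.0 - 1.0 / n) ** (n - 1)   # one particular bit flips alone
    return sum(1.0 / ((n - i) * q) for i in range(n))
```

The sum factors as (1/q)·H_n with n ≤ 1/q ≤ e·n, so the exact value is squeezed between n·ln n and roughly e·n·H_n.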
Theoretical Analysis
General analysis tools
Running time analysis is commonly problem-specific: to derive the ERT of an EA on a problem, we need a guide telling us "where to look" and "what to calculate" to accomplish the analysis. Such general tools include:
– Fitness Level Method
– Drift Analysis
– Convergence-rate Based Method
Theoretical Analysis
Fitness Level Method [Wegener, 02]
Partition the solution space into subspaces S₁, S₂, ..., S_m with increasing fitness, where S_m = S*. A population is treated as the best solution in the population.
Then calculate:
1. the initialization probability π₀(S_i) of being in each subspace
Note that elitist EAs (i.e. EAs that never lose the best solution) select solutions with better fitness. The level sets intuitively form stairs: an upper bound on the running time can be derived by summing up the time taken for getting off every stair, and a lower bound is the minimum time of getting off a stair. This is formally described in Lemma 2.

Lemma 2 (Fitness Level Method [39])
For an elitist EA process ξ on a problem f, let S₁, ..., S_m be a <_f-partition, let v_i ≤ P(ξ_{t+1} ∈ ∪_{j=i+1}^m S_j | ξ_t = x) for all x ∈ S_i, and u_i ≥ P(ξ_{t+1} ∈ ∪_{j=i+1}^m S_j | ξ_t = x) for all x ∈ S_i. Then the DCFHT (expected first hitting time) of the EA process is at most
  ∑_{1≤i≤m−1} π₀(S_i) · ∑_{j=i}^{m−1} 1/v_j,
and is at least
  ∑_{1≤i≤m−1} π₀(S_i) · 1/u_i.

Later, a more elaborate fitness level method was discovered by Sudholt [36], which we call the refined fitness level method (Lemma 3). In the lemma, when the EA uses a population of solutions, the notation ξ ∈ S denotes that the best solution of the population is in the solution space S.

Lemma 3 (Refined Fitness Level Method [35, 36])
For an elitist EA process ξ with a fitness function f, let S₁, ..., S_m be a <_f-partition, let v_i ≤ min_{j>i} (1/γ_{i,j}) P(ξ_{t+1} ∈ S_j | ξ_t = x) and u_i ≥ max_{j>i} (1/γ_{i,j}) P(ξ_{t+1} ∈ S_j | ξ_t = x) for all x ∈ S_i, where ∑_{j=i+1}^m γ_{i,j} = 1, and let γ_u, γ_l ∈ [0, 1] be constants such that γ_u ≥ γ_{i,j}/∑_{k=j}^m γ_{i,k} ≥ γ_l for all j > i, γ_u ≥ 1 − v_{j+1}/v_j and γ_l ≤ 1 − u_{j+1}/u_j for all 1 ≤ j ≤ m−2. Then the DCFHT of the process is at most
  ∑_{i=1}^{m−1} π₀(S_i) · (1/v_i + γ_u ∑_{j=i+1}^{m−1} 1/v_j),
and is at least
  ∑_{i=1}^{m−1} π₀(S_i) · (1/u_i + γ_l ∑_{j=i+1}^{m−1} 1/u_j).

The refined fitness level method follows the general idea of the fitness level method, while introducing a variable γ that reflects the distribution of the probability that the EA jumps to better levels. When γ is small, the EA has a high probability of jumping across many levels and thus making large progress; when γ is large, the EA can only make small progress in every step. Obviously, γ can take 1 for upper bounds and 0 for lower bounds, which degrades the refined method to the original fitness level method; therefore the original fitness level method is a special case of the refined one.
2. bounds on the progress probability for x ∈ S_i
The ERT is then upper- and lower-bounded as given in Lemma 2.
Theoretical Analysis
Example in OneMax
Partition by the number of one-bits: S₀ (0 one-bits), S₁ (1 one-bit), S₂ (2 one-bits), ..., S_n = S* (n one-bits).
Progress probability for x ∈ S_i — a lower bound, obtained by flipping one 0-bit but no 1-bits:
  v_i ≥ C(n−i, 1) · (1/n) · (1 − 1/n)^{n−1}
Initialization distribution: π₀(S_i) = C(n, i) / 2ⁿ
ERT:
Note that elitist (i.e. never lose the best solution) EAs select solutions with better fitness. The level
sets, intuitively, form stairs, for which an upper bound can be derived by summing up the time
taken for getting off every stair, and a lower bound is the minimum time of getting off a stair. This is
formally described in Lemma 2.
Lemma 2 (Fitness Level Method [39])
For an elitist EA process ⇠ on a problem f , letS1
, ...,Sm be a<f -partition, let vi P (⇠t+1
2 [mj=i+1
Sj |
⇠t = x) for all x 2 Si, and ui � P (⇠t+1
2 [mj=i+1
Sj | ⇠t = x) for all x 2 Si. Then, the DCFHT of the EA
process is at most
X
1im�1
⇡0
(Si) ·m�1X
j=i
1
vj,
and is at least
X
1im�1
⇡0
(Si) ·1
ui.
Later on, more elaborated fitness level method was discovered by Sudholt [36], which we call as
the refined fitness level method in this paper as in Lemma 3. In the lemma, when the EA uses a
population of solutions, the notation ⇠ will denote that the best solution of the population is in the
solution space S.
Lemma 3 (Refined Fitness Level Method [35, 36])
For an elitist EA process ⇠ with a fitness function f , letS1
, ...,Sm be a<f -partition, let vi minj1
�i,j
P (⇠t+1
2
Sj | ⇠t = x) and ui � maxj1
�i,j
P (⇠t+1
2 Sj | ⇠t = x) for all x 2 Si wherePm
j=i+1
�i,j = 1, and let
�u,�l 2 [0, 1] be constants such that �u � �i,j/Pm
k=j �i,k � �l for all j > i, �u � 1 � vj+1
/vj and
�l � 1� uj+1
/uj for all 1 j m� 2. Then the DCFHT of the process is at most
m�1X
i=1
⇡0
(Si) ·�1
vi+ �u
m�1X
j=i+1
1
vj
�,
and the DCFHT of the process is at least
m�1X
i=1
⇡0
(Si) · (1
ui+ �l
m�1X
j=i+1
1
uj).
The refined fitness level method follows the general idea of the fitness level method, while introducing variables $\chi_u, \chi_l$ that reflect the distribution of the probability that the EA jumps to better levels. When $\chi$ is small, the EA has a high probability of jumping across many levels and thus makes large progress; when $\chi$ is large, the EA can make only small progress in every step. Obviously, $\chi_u$ can take 1 for upper bounds and $\chi_l$ can take 0 for lower bounds, which degrades the refined method to the original fitness level method. Therefore, the original fitness level method is a special case of the refined one.
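The degradation claim above is easy to check numerically. The following Python sketch (not from the lecture; the level probabilities v_i and the initialization distribution are made-up placeholders, purely for illustration) implements both upper-bound formulas and confirms that the refined bound with χ_u = 1 coincides with the original fitness-level bound:

```python
def fl_upper(pi0, v):
    # Original fitness level method (Lemma 2), upper bound:
    # sum_i pi0(S_i) * sum_{j=i}^{m-1} 1/v_j   (non-optimal levels indexed 0..m-2)
    return sum(pi0[i] * sum(1.0 / v[j] for j in range(i, len(v)))
               for i in range(len(v)))

def refined_fl_upper(pi0, v, chi_u):
    # Refined fitness level method (Lemma 3), upper bound:
    # sum_i pi0(S_i) * (1/v_i + chi_u * sum_{j=i+1}^{m-1} 1/v_j)
    return sum(pi0[i] * (1.0 / v[i]
                         + chi_u * sum(1.0 / v[j] for j in range(i + 1, len(v))))
               for i in range(len(v)))

# Hypothetical data: four non-optimal levels, uniform initialization.
pi0 = [0.25, 0.25, 0.25, 0.25]
v = [0.1, 0.2, 0.3, 0.4]

# chi_u = 1 recovers the original bound; smaller chi_u can only tighten it.
assert abs(refined_fl_upper(pi0, v, 1.0) - fl_upper(pi0, v)) < 1e-9
assert refined_fl_upper(pi0, v, 0.5) <= fl_upper(pi0, v)
```

A smaller valid χ_u discounts the tail sum ∑_{j>i} 1/v_j, which is why the refined method can give strictly tighter bounds.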
$$\pi_0(S_0)\sum_{j=1}^{m-1}\frac{1}{v_j} \in O(n\ln n)$$
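The O(n ln n) claim for the (1+1)-EA on OneMax can be sanity-checked by evaluating the fitness-level sum directly. A small Python sketch (a sketch under the slide's level bounds, not part of the lecture material) verifies that ∑_j 1/v_j stays below e·n·H_n:

```python
import math

def onemax_fl_upper(n):
    # Level S_i: solutions with i one-bits. Lower bound v_i on leaving S_i:
    # flip exactly one of the (n - i) zero-bits and keep every other bit.
    v = [(n - i) * (1.0 / n) * ((n - 1.0) / n) ** (n - 1) for i in range(n)]
    # Worst-case-start fitness level upper bound: sum_{j=0}^{n-1} 1/v_j
    return sum(1.0 / vi for vi in v)

n = 100
bound = onemax_fl_upper(n)
h_n = sum(1.0 / k for k in range(1, n + 1))  # harmonic number H_n ~ ln n

# bound = n * H_n / (1 - 1/n)^(n-1) <= e * n * H_n, i.e. O(n ln n)
assert bound <= math.e * n * h_n
```

The inequality uses (1 − 1/n)^{n−1} ≥ 1/e, the same estimate that appears in the LeadingOnes example later in this lecture.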
Theoretical Analysis
41
intro. to theory Markov chain problem dependency RTA analysis tools
An Introduction to Evolutionary Optimization: Recent Theoretical and Practical Advances
on parameters on comparison with classics on real-world situations summary
Drift Analysis [Hajek, 82][Sasaki & Hajek, 88][He & Yao, 01][He & Yao, 04]
A distance function V measures the "distance" of a solution x to the optimal solutions S*, with V(x*) = 0 for every optimal solution x*.
(figure: a solution x at distance V(x) from the optimal set S*)
Then calculate:
1. the initialization probability of solutions, $\pi_0(x)$;
2. bounds on the progress in distance at every step.
The ERT is then upper bounded by $\sum_{x\in X}\pi_0(x)V(x)/c_l$ and lower bounded by $\sum_{x\in X}\pi_0(x)V(x)/c_u$.
Drift analysis measures the progress toward the optimum in every step of an EA, from which the number of steps the EA takes to arrive at the optimum is directly derived. Note that in Lemma 8 the distance function can be arbitrary (possibly depending on the objective function); thus it covers variants including the original definition of drift analysis [17] and the multiplicative drift analysis [10].
Definition 12 (Distance Function)
For a space $X$ with the optimal subspace $X^*$, a function $V$ satisfying $V(x) = 0$ for all $x \in X^*$ and $V(x) > 0$ for all $x \in X - X^*$ is called a distance function.
Lemma 8 (Drift Analysis)
For an EA process $\xi \in X$, let $V$ be a distance function. If there exists a positive value $c_l$ such that
$$\forall t:\; c_l \le E[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t],$$
we have $E[\tau \mid \xi_0 \sim \pi_0] \le \sum_{x\in X}\pi_0(x)V(x)/c_l$;
and if there exists a non-negative value $c_u$ such that
$$\forall t:\; c_u \ge E[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t],$$
we have $E[\tau \mid \xi_0 \sim \pi_0] \ge \sum_{x\in X}\pi_0(x)V(x)/c_u$.
It should be noted that, when cl or cu is negative, the obtained running time bound is also negative
and thus meaningless. In this case, we will say that the drift is invalid and the analysis fails.
Characterization 3 (Drift Analysis)
For an EA process $\xi \in X$, the drift analysis $\mathcal{A}_{DA}$ is defined by its parameters, input and output:
Parameters: a distance function $V$.
Input:
$c_l > 0$ for upper bound analysis such that $c_l \le E[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t]$ for all $t \ge 0$;
$c_u > 0$ for lower bound analysis such that $c_u \ge E[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t]$ for all $t \ge 0$.
Output:
$\mathcal{A}^u_{DA} = \sum_{x\in X}\pi_0(x)V(x)/c_l$;
$\mathcal{A}^l_{DA} = \sum_{x\in X}\pi_0(x)V(x)/c_u$.
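Characterization 3 translates directly into code. Below is a minimal Python sketch of the two output formulas; the state space, distance function, and drift constants in the example are illustrative placeholders, not taken from the paper:

```python
def drift_upper_bound(pi0, V, c_l):
    # A^u_DA = sum_x pi0(x) * V(x) / c_l
    # Valid when c_l > 0 lower-bounds the one-step drift for all t.
    assert c_l > 0
    return sum(p * V(x) for x, p in pi0.items()) / c_l

def drift_lower_bound(pi0, V, c_u):
    # A^l_DA = sum_x pi0(x) * V(x) / c_u
    # Valid when c_u > 0 upper-bounds the one-step drift for all t.
    assert c_u > 0
    return sum(p * V(x) for x, p in pi0.items()) / c_u

# Toy example: states 0..3, V(x) = x, uniform initialization.
pi0 = {x: 0.25 for x in range(4)}
V = lambda x: float(x)

assert drift_upper_bound(pi0, V, 0.5) == 3.0   # (0+1+2+3)/4 / 0.5
assert drift_lower_bound(pi0, V, 2.0) == 0.75  # (0+1+2+3)/4 / 2.0
```

The hard part of drift analysis is not these formulas but establishing valid constants c_l and c_u, i.e., proving the drift condition for all t.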
7.2. The Power of Switch Analysis from Drift Analysis
Theorem 4
$\mathcal{A}_{DA}$ is reducible to $\mathcal{A}_{SA}$.
Lemma 9
$\mathcal{A}_{DA}$ is upper-bound reducible to $\mathcal{A}_{SA}$.
most simplified version:
$$c_l \le E[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t], \qquad c_u \ge E[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t]$$
Theoretical Analysis
42
Example in LeadingOnes
LeadingOnes Problem: count the number of leading 1-bits, e.g., f(11011111) = 2.
fitness: $f(x) = \sum_{i=1}^{n}\prod_{j=1}^{i} x_j$, to be maximized: $\arg\max_{x\in\{0,1\}^n}\sum_{i=1}^{n}\prod_{j=1}^{i} x_j$
Distance function: $V(x) = n - f(x)$, so the distance of optimal solutions is zero.
The drift decomposes as
$$E[V(\xi_t)-V(\xi_{t+1}) \mid \xi_t] = E[(V(\xi_t)-V(\xi_{t+1}))\,I(V(\xi_t)>V(\xi_{t+1})) \mid \xi_t] + E[(V(\xi_t)-V(\xi_{t+1}))\,I(V(\xi_t)<V(\xi_{t+1})) \mid \xi_t] + E[(V(\xi_t)-V(\xi_{t+1}))\,I(V(\xi_t)=V(\xi_{t+1})) \mid \xi_t],$$
where the last two terms are zero: the EA is elitist, so $V$ never increases, and the equal case contributes nothing. Only the expected positive progress needs care: keep the leading 1-bits (11...10...) and flip the first 0-bit.
probability of making progress $\ge$ probability of gaining at least one leading 1-bit:
$$E[V(\xi_t)-V(\xi_{t+1}) \mid \xi_t] \ge 1\cdot\frac{1}{n}\Big(1-\frac{1}{n}\Big)^{i} \ge \frac{1}{n}\Big(1-\frac{1}{n}\Big)^{n-1} \ge \frac{1}{en}$$
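The final chain of inequalities is easy to verify numerically. A quick Python check (for an illustrative n; this is just a numeric confirmation, not part of the proof):

```python
import math

n = 100
# (1/n) * (1 - 1/n)^i is smallest at i = n - 1, the worst non-optimal state.
worst = (1.0 / n) * (1.0 - 1.0 / n) ** (n - 1)

assert all((1.0 / n) * (1.0 - 1.0 / n) ** i >= worst for i in range(n))
assert worst >= 1.0 / (math.e * n)   # since (1 - 1/n)^(n-1) >= 1/e
```

So c_l = 1/(en) is a valid drift lower bound for every non-optimal state.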
Theoretical Analysis
43
Example in LeadingOnes
LeadingOnes Problem: count the number of leading 1-bits, e.g., f(11011111) = 2.
fitness: $f(x) = \sum_{i=1}^{n}\prod_{j=1}^{i} x_j$, to be maximized over $x\in\{0,1\}^n$
Distance function: $V(x) = n - f(x)$, so the distance of optimal solutions is zero.
$$E[V(\xi_t)-V(\xi_{t+1}) \mid \xi_t] \ge 1\cdot\frac{1}{n}\Big(1-\frac{1}{n}\Big)^{i} \ge \frac{1}{n}\Big(1-\frac{1}{n}\Big)^{n-1} \ge \frac{1}{en}$$
The ERT is then upper bounded as
$$\sum_{x\in X}\pi_0(x)V(x)\Big/\frac{1}{en} \le V((00\ldots0))\Big/\frac{1}{en} = en^2 \in O(n^2).$$
The exact expected running time is approximately $0.86n^2$ [Böttcher et al., 10].
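To relate the O(n²) bound to actual behavior, one can simulate the (1+1)-EA (standard bit-wise mutation with probability 1/n, elitist acceptance of no-worse offspring) on LeadingOnes. A seeded Python sketch; the problem size and number of runs are arbitrary illustration choices:

```python
import math
import random

def leading_ones(x):
    # Count the leading 1-bits of a bit list.
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def one_plus_one_ea(n, rng):
    # (1+1)-EA: flip each bit with probability 1/n, keep offspring if not worse.
    x = [rng.randint(0, 1) for _ in range(n)]
    fx, steps = leading_ones(x), 0
    while fx < n:
        y = [b ^ 1 if rng.random() < 1.0 / n else b for b in x]
        fy = leading_ones(y)
        if fy >= fx:
            x, fx = y, fy
        steps += 1
    return steps

rng = random.Random(0)  # fixed seed for reproducibility
n = 30
avg = sum(one_plus_one_ea(n, rng) for _ in range(20)) / 20.0

assert avg <= math.e * n * n   # the drift-analysis upper bound e * n^2
```

With n = 30, the averaged hitting time typically lands near the 0.86n² ≈ 774 figure cited above, comfortably below the en² ≈ 2446 bound.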
Theoretical Analysis
• Additional Remarks
– There are more analytical tools than those we have covered today.
– Different analytical tools may suit different algorithms/problems (at least from an intuitive perspective).
– Theoretical analysis of EAs has been extended to more complicated problems (e.g., NP-hard problems such as Traveling Salesman and Minimum Vertex Cover).
– Besides deepening our understanding, theoretical analysis may also help configure EAs for a specific problem.
44
End of Lecture 11
45