
FDA: A Scalable Evolutionary Algorithm for the Optimization of ADFs

By Hossein Momeni

Outline

• Factorization Theorem
• FDA
• Analysis of FDA for large populations
• Boltzmann and truncation selection
• Finite and critical population
• Numerical results
• LFDA

Factorized Distribution Algorithm

Iran University of Science and Technology, November 2006

Introduction
• In a deceptive function the global optimum x=(1,…,1) is isolated.
• The neighbors of the second-best point x=(0,…,0) have large fitness values.
• GAs are deceived by the fitness distribution.
• Most GAs will converge to x=(0,…,0).



Solutions
• Mathematical methods are suitable for optimizing deceptive functions.
• Consider additively decomposed functions (ADFs), $f(x) = \sum_j f_j(x_{s_j})$.
• The $s_j$ are non-overlapping substrings of x with k elements.
• This class of functions is of great theoretical and practical importance.
• Optimization of an arbitrary function in this space is NP-complete.



ADF Optimization Approaches
• Adaptive recombination
• Explicit detection of relations (Kargupta & Goldberg, 1997)
• Dependency trees (Baluja & Davies, 1997)
• Bivariate marginal distributions (Pelikan & Mühlenbein, 1998)
• Estimation of distributions (Mühlenbein et al., 1997)



ADF

• Definition: An additively decomposed function (ADF) is defined by

$$f(x) = \sum_{s_i \in S} f_i(x_{s_i}), \qquad S = \{s_1, s_2, \ldots, s_l\}, \quad s_i \subseteq X$$

• For the theoretical analysis, the Boltzmann distribution is used.
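To make the definition concrete, the following is a minimal sketch of evaluating an ADF as a sum of sub-functions over index sets. The `trap` sub-function, the helper name `adf`, and the two index sets are illustrative assumptions, not taken from the slides; the trap is chosen only because it is deceptive in the sense of the introduction.

```python
from typing import Callable, Sequence, Tuple

def trap(bits: Sequence[int]) -> float:
    """Hypothetical deceptive sub-function: all ones is best,
    otherwise fewer ones give a higher value."""
    k, ones = len(bits), sum(bits)
    return float(k) if ones == k else float(k - 1 - ones)

def adf(x: Sequence[int],
        sets: Sequence[Tuple[int, ...]],
        subfunctions: Sequence[Callable[[Sequence[int]], float]]) -> float:
    """Evaluate f(x) = sum_i f_i(x_{s_i}) for index sets s_i."""
    return sum(f(tuple(x[j] for j in s)) for s, f in zip(sets, subfunctions))

# Two non-overlapping sets of size k = 3 (assumed example).
sets = [(0, 1, 2), (3, 4, 5)]
subs = [trap, trap]

print(adf((1, 1, 1, 1, 1, 1), sets, subs))  # global optimum: 3 + 3
print(adf((0, 0, 0, 0, 0, 0), sets, subs))  # second-best point: 2 + 2
```

With this choice the all-zeros point is the second-best point and its neighbors still have comparatively high fitness, which is the deceptive situation described earlier.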



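Gibbs or Boltzmann distribution
• Definition: The Gibbs or Boltzmann distribution of a function f is defined for u ≥ 1 by

$$p(x) := \frac{\mathrm{Exp}_u f(x)}{F_u}, \qquad \mathrm{Exp}_u f(x) := u^{f(x)}$$

• $F_u = \sum_y \mathrm{Exp}_u f(y)$ is the partition function.
• The larger the function value f(x), the larger p(x).
• Such a search distribution is suitable for an optimization problem.
• Computing it directly takes exponential effort, because the partition function sums over all points.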
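As an illustration of why the direct computation is exponential, here is a small brute-force sketch that enumerates all 2^n binary points to obtain the partition function. The function names are hypothetical, and OneMax is used only as a convenient test function.

```python
from itertools import product

def boltzmann(f, n, u):
    """Brute-force Boltzmann distribution p(x) = u**f(x) / F_u over {0,1}^n.
    The enumeration visits all 2**n points, so it only works for small n."""
    points = list(product((0, 1), repeat=n))
    weights = [u ** f(x) for x in points]
    partition = sum(weights)  # F_u
    return {x: w / partition for x, w in zip(points, weights)}

def onemax(x):
    return sum(x)

p = boltzmann(onemax, 3, 2.0)
print(p[(1, 1, 1)])  # 8/27, since F_u = (1 + 2)**3 = 27
print(p[(0, 0, 0)])  # 1/27
```

Setting u = 1 makes every weight equal to 1, so the distribution becomes uniform.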



Reducing the Cost of Computing the Boltzmann Distribution (B.D.)

1) Approximate the Boltzmann distribution (simulated annealing).

2) Look for ADFs whose distribution can be computed in polynomial time:

• Factorize the distribution into a product of marginal and conditional probabilities (the approach used by FDA).

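Input sets for the Factorization Theorem

Definition: Let S = {s1, s2, …, sl}. For i = 1, 2, …, l define

$$d_i := \bigcup_{j=1}^{i} s_j, \qquad b_i := s_i \setminus d_{i-1}, \qquad c_i := s_i \cap d_{i-1}, \qquad d_0 := \emptyset$$

In the theory of decomposable graphs:

d_i are called histories

b_i are called residuals

c_i are called separators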
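The three families of sets can be computed in one pass over the ordered sets. The sketch below assumes the usual definitions (history = union so far, residual = new variables, separator = overlap with the previous history) and also checks the conditions normally required for an exact factorization: non-empty residuals, full coverage of the variables, and every separator contained in an earlier set. Function names and example sets are hypothetical.

```python
def factorization_sets(sets):
    """Return (history d_i, residual b_i, separator c_i) for each s_i."""
    history = set()
    triples = []
    for s in sets:
        current = set(s)
        residual = current - history    # b_i: variables seen for the first time
        separator = current & history   # c_i: overlap with earlier sets
        history = history | current     # d_i
        triples.append((set(history), residual, separator))
    return triples

def running_intersection_ok(sets, n):
    """Check b_i non-empty, d_l = {0..n-1}, and c_i inside some earlier s_j."""
    triples = factorization_sets(sets)
    if not triples or any(not b for _, b, _ in triples):
        return False
    if triples[-1][0] != set(range(n)):
        return False
    for i in range(1, len(sets)):
        separator = triples[i][2]
        if not any(separator <= set(sets[j]) for j in range(i)):
            return False
    return True

chain = [(0, 1, 2), (2, 3, 4), (4, 5, 6)]
for d, b, c in factorization_sets(chain):
    print(sorted(d), sorted(b), sorted(c))
print(running_intersection_ok(chain, 7))
```

A chain of overlapping sets passes the check, whereas a cycle such as (0,1), (1,2), (2,0) fails because the last residual is empty.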



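Factorization Theorem

Theorem 1: Let p(x) be a Boltzmann distribution on X.

If

$$b_i \neq \emptyset \;\; \forall i = 1, \ldots, l, \qquad d_l = X, \qquad \forall i \geq 2 \;\, \exists j < i : \; c_i \subseteq s_j$$

then

$$p(x) = \prod_{i=1}^{l} p(x_{b_i} \mid x_{c_i})$$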



FDA_r

S0: Set t = 0. Generate (1-r)*N >> 0 points randomly and r*N points according to Equation 16.

S1: Selection.

S2: Compute the conditional probabilities $p^s(x_{b_i} \mid x_{c_i}, t)$ using the selected points.

S3: Generate a new population according to

$$p(x, t+1) = \prod_{i=1}^{l} p^s(x_{b_i} \mid x_{c_i}, t)$$

S4: If the termination criterion is met, finish.

S5: Add the best point of the previous generation to the generated points (elitist).

S6: Set t = t+1. Go to S2.
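The loop can be sketched for the simplest special case, in which the sets do not overlap, so every separator is empty and each conditional probability reduces to a marginal over a block. This is not the full algorithm: the seeded r*N initial points are omitted, truncation selection with a hypothetical threshold `tau` stands in for S1, selection is repeated every generation, and the termination criterion is just a fixed generation count. All names and parameter values are assumptions.

```python
import random

def fda_blocks(f, blocks, n, N=200, tau=0.3, generations=30, seed=0):
    """FDA-style loop for non-overlapping blocks (all separators empty)."""
    rng = random.Random(seed)
    # S0: purely random initial population.
    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(N)]
    best = max(pop, key=f)
    for _ in range(generations):
        # S1: truncation selection keeps the best tau*N points.
        selected = sorted(pop, key=f, reverse=True)[: max(1, int(tau * N))]
        # S2: estimate the block marginals from the selected points.
        tables = []
        for block in blocks:
            counts = {}
            for x in selected:
                key = tuple(x[j] for j in block)
                counts[key] = counts.get(key, 0) + 1
            tables.append((list(counts), list(counts.values())))
        # S3: sample N-1 new points from the product of block marginals.
        pop = []
        for _ in range(N - 1):
            x = [0] * n
            for block, (keys, weights) in zip(blocks, tables):
                values = rng.choices(keys, weights=weights)[0]
                for j, value in zip(block, values):
                    x[j] = value
            pop.append(tuple(x))
        # S5: elitism, keep the best point found so far.
        pop.append(best)
        best = max(pop, key=f)
    return best

n = 10
best = fda_blocks(sum, [(i,) for i in range(n)], n)  # OneMax, one bit per block
print(best, sum(best))
```

Because the previous best point is always re-inserted, the best fitness in the population never decreases from one generation to the next.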



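Analysis of the Factorization Algorithm
• The computational complexity depends on the factorization and on the population size N.
• Number of function evaluations: FE = GEN_e * N, where GEN_e is the number of generations until convergence, p(x,t+1) = p(x,t).
• The computational complexity of computing N new search points is

$$\mathrm{compl}(N\,\mathrm{points}) = O(l \cdot N)$$

• The computational complexity of computing the probabilities is

$$\mathrm{compl}(p) = O\!\left(\Big(\sum_{i=1}^{l} 2^{s_i}\Big) \cdot M\right)$$

where $s_i$ in the exponent denotes the size of the i-th defining set.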
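A small arithmetic sketch of these cost terms, with constants ignored. The function name and the example numbers are assumptions chosen only to show how the three quantities scale; the sampling and probability terms are per generation.

```python
def fda_cost(sets, N, M, generations):
    """Leading-order cost terms of FDA (constant factors dropped)."""
    l = len(sets)
    return {
        # FE = GEN_e * N
        "function_evaluations": generations * N,
        # compl(N points) ~ l * N, per generation
        "sampling_steps": l * N,
        # compl(p) ~ (sum_i 2**|s_i|) * M, per generation
        "probability_work": sum(2 ** len(s) for s in sets) * M,
    }

# Assumed example: 10 sets of size 3, N = 1000, M = 300, 20 generations.
sets = [tuple(range(3 * i, 3 * i + 3)) for i in range(10)]
print(fda_cost(sets, N=1000, M=300, generations=20))
```

The exponential term 2**|s_i| is why the size of the defining sets matters far more than their number.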



Analysis of … (Contd)
• The computational cost of FDA depends on:

1) The number of sub-functions in the decomposition (l)
2) The size of the defining sets (s_i)
3) The number of selected points (M)

• An infinite population would be needed to compute the probabilities exactly.

• A numerically efficient FDA should use a minimal population size N*.

• Computing N* is a difficult problem for any search method that uses a population of points.



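FDA-FAC
• S0: Set i = 1; $\tilde{s}_i$ is the defining set of a non-linear sub-function.

• S1: Compute $\tilde{d}_i := \bigcup_{j=1}^{i} \tilde{s}_j$.

• S2: Select the set $s_k$ that has maximal overlap with $\tilde{d}_i$ and $s_k \not\subseteq \tilde{d}_i$.

• S3: If no set is found, go to S5.

• S4: Set $\tilde{s}_{i+1} := s_k$ and i := i+1. If i < l, go to S1.

• S5: Compute the factorization using Eq. 6 with the sets $\tilde{s}_i$.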
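The greedy ordering in these steps can be sketched as follows. The code cannot tell which sub-function is non-linear, so the starting set is passed in as a hypothetical `start` index; ties in overlap are broken by list order, which is also an assumption.

```python
def fda_fac(sets, start=0):
    """Greedily order the sets by maximal overlap with the history so far."""
    remaining = [set(s) for s in sets]
    ordered = [remaining.pop(start)]          # S0: chosen starting set
    history = set(ordered[0])                 # S1: union of ordered sets
    while remaining:
        # S2: candidates must bring at least one new variable.
        candidates = [s for s in remaining if not s <= history]
        if not candidates:
            break                             # S3: no set found
        chosen = max(candidates, key=lambda s: len(s & history))
        remaining.remove(chosen)
        ordered.append(chosen)                # S4: next set in the ordering
        history |= chosen                     # S1 for the next round
    return ordered

sets = [(0, 1, 2), (4, 5, 6), (2, 3, 4)]
print([sorted(s) for s in fda_fac(sets)])
```

In this example the set (2, 3, 4) is placed second because it overlaps the starting set, while (4, 5, 6) initially has no overlap and is added last.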



Generation of Initial Population

• Normally the initial population is generated randomly.

• For an ADF, initial points can be generated using the information in the sub-functions.

• Generate subsets with high local fitness values.
• The resulting distribution is an approximation of the Boltzmann distribution.
• The conditional probabilities are computed using the local fitness functions.



Generation of Initial Population….

• The larger u, the steeper the distribution.
• If u = 1, the distribution is uniform.
• For OneMax(n) = ∑ x_i, FDA computes span = 1 and u = 10.
• There will be 10 times more 1s than 0s among the points generated this way.
• Such an initial population might not give a B.D.
• Only half of the population is generated by this method.
• The other half is generated randomly.
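For OneMax each bit has its own local fitness (0 or 1), so the local Boltzmann probability of a 1 with basis u is u / (1 + u). The sketch below reproduces the 10-to-1 ratio for u = 10 and builds a half-seeded, half-random population. How FDA derives u from the span is not shown on the slides, so u is simply passed in; all names are hypothetical.

```python
import random

def local_boltzmann_bit(u, f0=0.0, f1=1.0):
    """Probability of drawing 1 when the local weights are u**f0 and u**f1."""
    w0, w1 = u ** f0, u ** f1
    return w1 / (w0 + w1)

def initial_population(n, N, u, rng):
    """Half of the points from the local Boltzmann distribution, half uniform."""
    p1 = local_boltzmann_bit(u)
    half = N // 2
    seeded = [tuple(int(rng.random() < p1) for _ in range(n)) for _ in range(half)]
    uniform = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(N - half)]
    return seeded + uniform

print(local_boltzmann_bit(10.0))  # 10/11: ten times more 1s than 0s
print(local_boltzmann_bit(1.0))   # 0.5: uniform
pop = initial_population(n=20, N=100, u=10.0, rng=random.Random(1))
print(len(pop))
```

Keeping a random half hedges against a seeded half that is too strongly concentrated.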



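Convergence of FDA
• If points are selected according to a Boltzmann distribution, convergence of FDA can be proved.
• With Boltzmann selection of basis v, the distribution p^s of the selected points is given by

$$p^s(x, t) = \frac{p(x, t)\, v^{f(x)}}{\sum_y p(y, t)\, v^{f(y)}}$$

• If p(x,t) is a B.D., then p^s(x,t) is a B.D.
• FDA computes new search points according to

$$p(x, t+1) = \prod_{i=1}^{l} p^s(x_{b_i} \mid x_{c_i}, t)$$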



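• Theorem 2: If the initial points are distributed according to a Boltzmann distribution with basis u ≥ 1, $p(x, 0) = u^{f(x)} / F_u$, then for FDA the distribution at generation t is given by

$$p(x, t) = \frac{w^{f(x)}}{F_w} \qquad \text{with} \qquad w = u \cdot v^t$$

Tip: Boltzmann selection with fixed basis v > 1 defines an annealing schedule

$$T(t) = \frac{1}{t \ln(v) + \ln(u)}$$

where t is the number of generations.

Theorem 3 remains valid for any annealing schedule with $T(t) \to 0$ as $t \to \infty$.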
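Theorem 2 can be checked numerically on a tiny example. The sketch assumes an infinite population and an exact factorization, so that the next distribution equals the selected distribution, and it applies Boltzmann selection as reweighting by v**f(x). Function names and parameter values are hypothetical.

```python
import math
from itertools import product

def boltzmann(f, n, basis):
    """Exact Boltzmann distribution with the given basis over {0,1}^n."""
    points = list(product((0, 1), repeat=n))
    weights = [basis ** f(x) for x in points]
    total = sum(weights)
    return {x: w / total for x, w in zip(points, weights)}

def boltzmann_selection(p, f, v):
    """One step of Boltzmann selection with basis v on an exact distribution."""
    weights = {x: px * v ** f(x) for x, px in p.items()}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

def temperature(u, v, t):
    """Annealing schedule T(t) = 1 / (t ln v + ln u); undefined for u = 1, t = 0."""
    return 1.0 / (t * math.log(v) + math.log(u))

f = sum  # OneMax
n, u, v, t_max = 4, 2.0, 1.5, 3
p = boltzmann(f, n, u)
for _ in range(t_max):
    p = boltzmann_selection(p, f, v)
expected = boltzmann(f, n, u * v ** t_max)  # basis w = u * v**t
max_error = max(abs(p[x] - expected[x]) for x in p)
print(max_error, temperature(u, v, t_max))
```

The temperature is simply 1 / ln(w), so a larger basis v cools the distribution faster.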



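• Theorem 3 (Convergence): Let $X_{opt} = \{x_{opt_1}, x_{opt_2}, \ldots\}$ be the set of optima. Then, based on Theorem 2,

$$\lim_{t \to \infty} p(x, t) = \begin{cases} 1/|X_{opt}| & x \in X_{opt} \\ 0 & \text{otherwise} \end{cases}$$

• FDA with Boltzmann selection is an exact simulated annealing algorithm.

• Simulated annealing is controlled by two parameters: N(T) and the annealing schedule.

• N can be called the population size.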



Truncation Selection vs. Boltzmann Selection

• Numerically, truncation selection is easier to implement.
• With truncation threshold τ, the best τ*N individuals are selected.
• The conditional probabilities of the selected points are $p^s(x_{b_i} \mid x_{c_i}, t)$.
• Based on the factorization theorem, new search points are generated according to

$$p(x, t+1) = \prod_{i=1}^{l} p^s(x_{b_i} \mid x_{c_i}, t)$$

• Problem: after truncation selection the distribution is not a B.D., therefore in general

$$p^s(x, t) \neq \prod_{i=1}^{l} p^s(x_{b_i} \mid x_{c_i}, t)$$

• This inequality makes a convergence proof difficult: in particular, $p(x_{opt}, t+1)$ need not equal $p^s(x_{opt}, t)$.



Theoretical Analysis for Infinite Populations

• For the analysis, two linear functions are investigated:

$$\mathrm{OneMax}_n(x) = \sum_{i=1}^{n} x_i, \qquad \mathrm{Int}_n(x) = \sum_{i=1}^{n} 2^{i-1} x_i$$

• OneMax has n+1 different fitness values, which are multinomially distributed.
• Int has 2^n different fitness values.
• For ADFs the multinomial distribution is typical.
• The distribution generated by Int is more special.
• Both functions are linear, so the following factorization can be used:

$$p(x, t+1) = \prod_{i=1}^{n} p^s(x_i, t)$$



• Theorem 4: For Boltzmann selection with basis v, the probability distribution for OneMax is given by

$$p(x, t) = \frac{v^{t f(x)}}{(1 + v^t)^n}$$

• The number of generations needed to generate the optimum is of order

$$GEN_e \approx \frac{\ln(n)}{\ln(v)}$$
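For OneMax the optimum has f(x) = n, so its probability under Theorem 4 is (v^t / (1 + v^t))^n. The sketch below finds the first generation at which this exceeds 1 - eps and compares it with the estimate ln(n/eps) / ln(v), which follows from the approximation (1 + v^-t)^-n ≈ 1 - n v^-t. The tolerance `eps` is an assumption introduced here for illustration; it does not appear on the slide.

```python
import math

def p_optimum(n, v, t):
    """Probability of the all-ones string under Theorem 4 at generation t."""
    return (v ** t / (1.0 + v ** t)) ** n

def generations_needed(n, v, eps, max_t=10_000):
    """Smallest t with p_optimum >= 1 - eps (requires v > 1)."""
    if v <= 1.0:
        raise ValueError("basis v must be greater than 1")
    for t in range(max_t + 1):
        if p_optimum(n, v, t) >= 1.0 - eps:
            return t
    raise RuntimeError("no convergence within max_t generations")

def estimate(n, v, eps):
    """Closed-form estimate ln(n / eps) / ln(v)."""
    return math.log(n / eps) / math.log(v)

n, v, eps = 32, 1.2, 0.01
print(generations_needed(n, v, eps), estimate(n, v, eps))
```

Both numbers grow only logarithmically in n, which is the scaling used in the comparison with truncation selection.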



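• Theorem 5: For truncation selection with threshold τ and selection intensity I_τ, the marginal probability p(t) for OneMax obeys

$$p(t+1) - p(t) = \frac{I_\tau}{n} \sqrt{n\, p(t)\,(1 - p(t))}$$

• The approximate solution of this equation is

$$p(t) = 0.5 \left(1 + \sin\!\left(\frac{I_\tau}{\sqrt{n}}\, t + \arcsin(2 p_0 - 1)\right)\right)$$

where $p_0 = p(0)$.

• The number of generations until convergence is given by

$$GEN_e = \left(\frac{\pi}{2} - \arcsin(2 p_0 - 1)\right) \frac{\sqrt{n}}{I_\tau}$$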
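The difference equation and its sine approximation can be compared directly. This sketch iterates the recurrence and evaluates the closed form; the values n = 100, I = 1.0 and p0 = 0.5 are illustrative assumptions, not values from the slides, and the function names are hypothetical.

```python
import math

def iterate(n, intensity, p0, steps):
    """Iterate p(t+1) = p(t) + (I/n) * sqrt(n p(t) (1 - p(t))), capped at 1."""
    p = p0
    history = [p]
    for _ in range(steps):
        p = min(1.0, p + (intensity / n) * math.sqrt(n * p * (1.0 - p)))
        history.append(p)
    return history

def approx(n, intensity, p0, t):
    """Sine approximation of p(t); equal to 1 once the phase reaches pi/2."""
    phase = (intensity / math.sqrt(n)) * t + math.asin(2.0 * p0 - 1.0)
    return 1.0 if phase >= math.pi / 2 else 0.5 * (1.0 + math.sin(phase))

def generations_till_convergence(n, intensity, p0):
    """GEN_e = (pi/2 - arcsin(2 p0 - 1)) * sqrt(n) / I."""
    return (math.pi / 2 - math.asin(2.0 * p0 - 1.0)) * math.sqrt(n) / intensity

n, intensity, p0 = 100, 1.0, 0.5
exact = iterate(n, intensity, p0, 5)
print(exact[5], approx(n, intensity, p0, 5))
print(generations_till_convergence(n, intensity, p0))  # 5 * pi for these values
```

The convergence time scales with the square root of n, in contrast to the logarithmic scaling under Boltzmann selection.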



Comparison of Truncation and Boltzmann Selection

• T.S. needs more generations to converge than B.S.

• GEN_e is of order $O(\ln n)$ for B.S. and of order $O(\sqrt{n})$ for T.S.

• If the basis v is small (e.g. v = 1.2), T.S. converges faster.



• B.S. with fixed v gives an annealing schedule of $T(t) = 1 / (t \ln(v) + \ln(u))$.

• FDA with truncation selection generates a B.D. whose annealing schedule depends on the average fitness and the variance of the population.





• For Int the B.D. is concentrated around the optimum.

• The selected population has small diversity.
• In a finite population this causes a problem: some genes get fixed to the wrong alleles.

Analysis of FDA for Finite Populations



In a finite population, convergence of FDA can only be probabilistic.

Analysis of FDA for Finite Populations



Figure: Cumulative fixation probability for Int(16), truncation selection vs. Boltzmann selection with v = 1.01.
