Efficiency Enhancement of Estimation of Distribution Algorithms


Page 1: Efficiency Enhancement of Estimation of Distribution Algorithms

Efficiency Enhancement of Estimation of Distribution Algorithms

Martin Pelikan, David E. Goldberg, Kumara Sastry

Missouri Estimation of Distribution Algorithms Laboratory (MEDAL), University of Missouri, St. Louis, MO

http://medal.cs.umsl.edu/

[email protected]


Page 2: Efficiency Enhancement of Estimation of Distribution Algorithms

Motivation

Estimation of distribution algorithms (EDAs)

I EDAs guide search by building and sampling an explicit probabilistic model of high-quality solutions.

I EDAs can solve broad classes of hard problems scalably, often in low-order polynomial time.

I But scalable performance is sometimes not enough:
I The problem is large (thousands, millions, or more variables).
I Evaluation is complex (minutes, hours, or more).

Efficiency enhancement techniques

I Enhance efficiency of an evolutionary algorithm or EDA.

I Examples: Parallelization and hybridization.

I Can lead to substantial speedups of even 10,000 or more.

I One of the most important directions in EDA research.


Page 3: Efficiency Enhancement of Estimation of Distribution Algorithms

Outline

1. Estimation of distribution algorithms.

2. Computational bottlenecks.

3. Efficiency enhancement techniques.

4. Summary and conclusions.


Page 4: Efficiency Enhancement of Estimation of Distribution Algorithms

Estimation of Distribution Algorithms

I EDAs replace crossover and mutation by
I Building a probabilistic model of selected solutions.
I Sampling the built model to generate new solutions.

I Main differences between different EDAs
I Models.
I Representations.

Probabilistic Model-Building GAs

[Figure: current population → selected population → probabilistic model → new population; crossover and mutation are replaced with learning and sampling a probabilistic model.]
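To make the build-and-sample loop above concrete, here is a minimal sketch using the simplest possible model, a univariate marginal distribution in the style of UMDA/PBIL. The function names, parameter values, and the OneMax example are illustrative only; EDAs such as hBOA use much richer models.

```python
import numpy as np

def umda(fitness, n_vars, pop_size=100, n_select=50, n_gens=100, seed=None):
    """Minimal univariate EDA: the model is just a vector of bit probabilities."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_vars))
    for _ in range(n_gens):
        fit = np.array([fitness(x) for x in pop])
        selected = pop[np.argsort(fit)[-n_select:]]              # selection
        p = selected.mean(axis=0)                                # model building
        pop = (rng.random((pop_size, n_vars)) < p).astype(int)   # model sampling
    return pop[np.argmax([fitness(x) for x in pop])]

# Example run on OneMax (maximize the number of ones).
best = umda(fitness=lambda x: int(x.sum()), n_vars=20)
```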


Page 5: Efficiency Enhancement of Estimation of Distribution Algorithms

Bottlenecks

Computational bottlenecks

I Fitness evaluation.

I Model building.

I Model sampling.

I Other operators typically not a problem.

Examples

I We want to solve a problem of 1,000,000 variables.

I We want to solve a problem where evaluation takes 1 hour.


Page 6: Efficiency Enhancement of Estimation of Distribution Algorithms

Efficiency Enhancement Techniques (EETs)

Categories of EETs

1. Parallelization.

2. Hybridization.

3. Time continuation.

4. Fitness evaluation relaxation.

5. Incremental and sporadic model building.

6. Prior knowledge utilization.

7. Learning from experience.


Page 7: Efficiency Enhancement of Estimation of Distribution Algorithms

1. Parallelization

Motivation

I Parallel computers and multicore computers common.

I Evolutionary algorithms can be parallelized relatively easily.

I Why not parallelize an EDA to address the bottlenecks?

Parallelization of EDAs

I Parallelize the components of an EDA.
I Can parallelize all bottlenecks and more:
I Distribute evaluations.
I Distribute model building.
I Distribute model sampling.

I If done well, we can achieve near linear speedups.


Page 8: Efficiency Enhancement of Estimation of Distribution Algorithms

Parallelization of Evaluations

Master-slave parallelization of evaluations (Cantu-Paz, 2000)

I One processing unit becomes master, the others act as slaves.

I Master unit does most of the work.
I Evaluations are distributed:
I Master distributes solutions for evaluation to slaves.
I Slaves assign fitness values and return results.
I Master collects the results and continues.
I Master can also do some evaluations.

[Figure: a master node connected to slave nodes Slave-1, Slave-2, …, Slave-P.]
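A minimal sketch of this master-slave scheme using Python's multiprocessing; the stand-in expensive_fitness function and the number of slaves are illustrative assumptions, not part of the original slides.

```python
from multiprocessing import Pool

def expensive_fitness(solution):
    # Stand-in for an evaluation that takes minutes or hours.
    return sum(solution)

def evaluate_population(population, n_slaves=4):
    """The master farms solutions out to slave processes and collects
    the fitness values in the original order."""
    with Pool(processes=n_slaves) as slaves:
        return slaves.map(expensive_fitness, population)

if __name__ == "__main__":
    population = [[0, 1, 1, 0, 1], [1, 1, 1, 1, 0]]
    print(evaluate_population(population))
```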


Page 9: Efficiency Enhancement of Estimation of Distribution Algorithms

Parallelization of Model Building

How to parallelize model building

I Learning model structure (usually more complex).

I Learning model parameters.

Learning parameters

I Can use master-slave architecture.

I Slaves compute subsets of parameters.
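A rough sketch of the parameter-learning case above, assuming a simple univariate model so each slave only computes marginal frequencies for its share of the variables; the partitioning and function names are illustrative, and real EDAs estimate conditional probability tables instead.

```python
from multiprocessing import Pool
import numpy as np

def estimate_marginals(columns):
    # One slave's share of the work: frequencies for its subset of variables.
    return columns.mean(axis=0)

def parallel_parameter_learning(selected, n_slaves=4):
    """Split the variables among slaves; each estimates the parameters
    for its subset, and the master concatenates the results."""
    chunks = np.array_split(selected, n_slaves, axis=1)
    with Pool(processes=n_slaves) as slaves:
        parts = slaves.map(estimate_marginals, chunks)
    return np.concatenate(parts)

if __name__ == "__main__":
    selected = np.random.default_rng(0).integers(0, 2, size=(1000, 32))
    print(parallel_parameter_learning(selected))
```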

Learning structure

I Not so straightforward due to model restrictions.

I But can do this quite well even for complex models.


Page 10: Efficiency Enhancement of Estimation of Distribution Algorithms

Parallelization: Results

Parallelization of model building in BOA (Ocenasek & Pelikan, 2004)


Table 8.1. The resulting values of coefficients c1, c2, c3, c4, c5

MBOA part           Coefficient   Estimated value   R2 value
Selection           c1            8.73E−09          0.978
Model building      c2            1.00E−07          0.979
Model sampling      c3            1.58E−07          0.934
Replacement (RTR)   c4 ∗ a        2.18E−10          0.989
Evaluation          c5            1.34E−07          0.918

[Figure: speedup vs. number of processors (1-51); measured and predicted speedup for 2D spin glasses 20x20 (N=4,000), 25x25 (N=6,000), and 30x30 (N=8,000).]

Fig. 8.2. The comparison of the speedup predicted from the numerical model and the speedup computed from the empirical data measured on sequential MBOA solving 2D Ising spin glass instances of size 20 × 20, 25 × 25, and 30 × 30. Population size was scaled approximately linearly with the problem size.

For each Ising spin glass size 100, 225, 400, 625, 900 and each population size N = 500, N = 1,000, N = 1,500, N = 2,000, N = 4,000, N = 6,000, and N = 8,000 we choose 10 random benchmark instances and average the duration of each MBOA part. The coefficients hold for one generation of MBOA performed on an Intel Pentium 4 at 2.4 GHz.

Figure 8.2 shows how the predicted speedup changes for increasing P and compares it with the speedup computed from the measured duration of each part of sequential MBOA. We considered three different sizes of spin glass instances, 20×20, 25×25, and 30×30, and we linearly increased the population size with problem size (N = 4,000, 6,000, 8,000). The predicted speedup fits the empirical speedup nicely, especially for large problems. Additionally, it can be seen that, in the idealized case without communication, it is possible to use a large number of processors (more than P = 50) without observing any significant speedup saturation.


Page 11: Efficiency Enhancement of Estimation of Distribution Algorithms

2. Hybridization

Motivation

I Local search can reach local optima quickly.

I Why not combine these with an EDA?

I Many applications of evolutionary algorithms use such hybrids.

Hybridization

I Combine EDA with another optimization technique.

I Typically a form of local search is used in a global-local hybrid.

I But we can go beyond local search (e.g. exact methods).


Page 12: Efficiency Enhancement of Estimation of Distribution Algorithms

2. Hybridization (cont’d)

Model-directed hybridization

I Models provide us with information about problem landscape.

I We can use this information to design efficient hybrids.

Sources of speedup

I Quickly find optimum when at the correct basin of attraction.

I Reduce variance and thus population size.

Comment

I Hybridization addresses all main bottlenecks.

I Good hybrids should work better than individual techniques.


Page 13: Efficiency Enhancement of Estimation of Distribution Algorithms

Hybridization: Results

Speedup due to variance reduction (Pelikan et al., 2003, 2006)

I hBOA with deterministic hill climber.

I hBOA with cluster exact approximation (exact method).

[Figure: (a) population size and (b) number of evaluations vs. problem size for hBOA, hBOA with a simple hill climber, and hBOA with cluster exact approximation.]

Figure 3: Here we show the growth of the population size and the number of evaluations for hBOA on the 2D Ising spin glass with periodic boundary conditions (Pelikan et al., 2004; Pelikan & Hartmann, 2006). The results are shown for (1) hBOA itself, (2) hBOA combined with a simple bit-flip hill climber, and (3) hBOA hybridized with the exact method called cluster exact approximation. The results indicate that the reduction in the number of evaluations in all hBOA-based hybrids comes primarily from the reduction in the population size.

the estimation of these parameters from the model building process because, unlike standard GAs, PMBGAs provide an explicit model of the landscape, which is updated as the search progresses.

4.4.5 Model-directed hybridization

The theory and practice of hybrids to this point has been ad hoc and generally ex post facto. The promise of the proposed work is that model building can integrate macro-level, micro-level effects, and coordinate the choice of local search neighborhoods as the search progresses online. Specifically, the use of model-building methods of PMBGAs leads to a number of steps that can be helpful to hybrid integration and coordination:

1. Calculate a probabilistic model of best points in the space.

2. Calculate a fitness surrogate using the probabilistic model.

3. Estimate basin hitting probabilities and time constants.

4. Estimate population sizing requirements.

5. Select appropriate L or adapt L neighborhood to local estimates.

Each of these steps is briefly discussed herein. The most important input of a model-directed hybrid is the probabilistic model that captures the structure and the properties of the problem landscape. Such a model can be built by a PMBGA, but one may use a different approach to construct the model (including prior problem-specific



Page 14: Efficiency Enhancement of Estimation of Distribution Algorithms

Model-Directed Hybridization: Example

Neighborhood operators based on Bayes nets of hBOA (Lima et al., 2006)
I Change correlated variables simultaneously:
I Include parents.
I Include children.
I Include both parents and children.

[Figure: (a) parental, (b) children, and (c) parental+children neighborhoods of variable X2 in a Bayesian network over X1-X6.]

Figure 1: Topology of the (a) parental, (b) children, and (c) parental+children neighborhoods of variable X2.

A somewhat related approach has been recently proposed by Handa (Handa, 2006), where the traditional bitwise mutation operator is employed in the estimation of Bayesian networks algorithm (EBNA) (Etxeberria & Larranaga, 1999) and consequently variables that depend on the mutated node are resampled according to the conditional probabilities for the new instance. Although this mutation operator takes into account the dependencies between variables, it is specifically designed to perturb solutions in order to maintain diversity in the population. Our approach is to interpret the structure of the Bayesian network as a set of linkage groups that are used to define neighborhoods to be explored by local search.

4 BOA with Substructural Hillclimbing

This section introduces a hillclimber that uses the parental neighborhood defined in the previous section to perform hillclimbing in the substructural space of an individual. This hillclimbing is performed for a proportion of the population in BOA to speed up convergence to good solutions, as in traditional hybrid GEAs. After the offspring population is sampled from the probabilistic model and evaluated, each individual is submitted to substructural hillclimbing with probability p_ls. The substructural hillclimber can be described as follows:

1. Consider the first variable X_i according to the ancestral reverse ordering of variables in the Bayesian network.

2. Choose the values (x_i, π_i) associated with the maximal substructural fitness f(X_i | Π_i).

3. Set variables (X_i, Π_i) of the considered individual to values (x_i, π_i) if the overall fitness of the individual is improved by doing so, otherwise leave the individual unchanged.

4. Repeat steps 2-3 for all remaining variables following the ancestral reverse order of variables.

Some details need further explanation. First, we use the reverse order of that used to sample the variables of new solutions, where each node is preceded by its parents. By doing so, higher-order
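As a rough illustration of steps 1-4 above, the following sketch walks the variables in reverse ancestral order and accepts a substructure change only if the overall fitness improves. The data structures (parental neighborhoods as explicit lists, a table-based substructural fitness) are simplifying assumptions; hBOA itself uses Bayesian networks with local decision-tree structures.

```python
import itertools

def substructural_hillclimb(individual, parents, subfitness, fitness, reverse_order):
    """parents[i]: list of parents of X_i in the Bayesian network.
    subfitness[i]: maps an assignment of (X_i, its parents) to f(X_i | Pi_i).
    reverse_order: the ancestral ordering of the variables, reversed."""
    current = list(individual)
    for i in reverse_order:
        group = [i] + list(parents[i])
        # Pick the substructure instance with maximal estimated fitness.
        best = max(itertools.product([0, 1], repeat=len(group)),
                   key=lambda values: subfitness[i][values])
        candidate = list(current)
        for var, val in zip(group, best):
            candidate[var] = val
        # Accept only if the overall fitness of the individual improves.
        if fitness(candidate) > fitness(current):
            current = candidate
    return current
```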



Page 15: Efficiency Enhancement of Estimation of Distribution Algorithms

Model-Directed Hybridization: Results

Results with parental neighborhoods for hBOA (Lima et al., 2006)

[Figure: (left) number of generations to reach the optimum and (right) speedup η_ls vs. problem size ℓ for p_ls = 0, 0.0005, and 0.001.]

Figure 4: Number of generations required to get the optimum and the speedup obtained by performing substructural local search on a number of concatenated 5-bit trap functions. The speedup scales as O(ℓ^0.45) for ℓ ≤ 80. For ℓ > 80 the speedup growth is more moderate for the optimal value of p_ls = 0.0005, while for higher proportions of local search the speedup starts to decrease due to diversity reduction in the population.

here. The reduction of the slope in the speedup curve for larger problem sizes is also related to the structure of the model learned by BOA. Analyzing the dependency groups captured by the Bayesian network with decision trees, it can be observed that the number and size of spurious linkages increases with problem size. By spurious linkage we mean additional variables that are considered together with a correct linkage group. Although the structure of the Bayesian network captures such spurious dependencies, the conditional probabilities nearly express independence between the spurious variables and the correct linkage, therefore not affecting the capability of sampling such variables as if they were independent. In fact, this capability of decision trees to detect more complex dependencies is one of the keys in hierarchical BOA (Pelikan, 2005) to solve more complex decomposable problems such as hierarchical problems.

6 Summary and Conclusions

In this paper, we have introduced the use of substructural neighborhoods to perform local search in BOA. Three different substructural neighborhoods, based on the structure of the learned Bayesian network, were proposed. A hillclimber that effectively searches in the subsolution search space was incorporated in BOA, using a surrogate fitness model to evaluate competing substructures. The results showed that incorporating substructural local search in BOA leads to a significant reduction in the number of generations necessary to solve the problem, while providing substantial speedups in terms of number of evaluations. More importantly, the relevance of designing and hybridizing competent operators that automatically identify the problem decomposition and important problem substructures has been empirically highlighted.

An important topic for future work is to perform some sort of pre-processing on the dependency groups to remove spurious linkages from the substructural neighborhoods. This could be obtained, for example, by not considering those pairwise dependencies that lead to an amount of decrease in entropy that lies below some threshold. Also, the continual improvement function of the score



Page 16: Efficiency Enhancement of Estimation of Distribution Algorithms

3. Time Continuation

Motivation

I Should we run an EDA with a large population for a small number of convergence epochs, or should we run an EDA with a small population for a large number of convergence epochs?

I Analogously, we can consider crossover vs. mutation and other tradeoffs involved in evolutionary algorithms and EDAs.

Time continuation

I How to best combine different modes of operation and operators of an EDA?

I Small population vs. large population.

I Crossover vs. mutation.

I Analogously for other operators and modes of operation.


Page 17: Efficiency Enhancement of Estimation of Distribution Algorithms

3. Time Continuation (cont’d)

Two potential goals

I Find best solution within given time.

I Find a good enough solution as fast as possible.

Comment

I Time continuation also addresses all important bottlenecks.


Page 18: Efficiency Enhancement of Estimation of Distribution Algorithms

Time Continuation: Results

Crossover versus mutation (Sastry & Goldberg, 2004)

I Standard ECGA works better as noise increases.

I Local search based on ECGA model is better for weak noise.

I Hybrid combines the benefits.


Speedup via Adaptive Time Continuation
Large pop., single epoch vs. small pop., multiple epochs.
Mutation vs. recombination (crossover).
As noise and complexity increase, switch from mutation to crossover. [Lima et al., 2005; Lima et al., 2006; Pelikan et al., 2006; Pelikan, 2008]

[US utility patent pending]

Page 19: Efficiency Enhancement of Estimation of Distribution Algorithms

4. Fitness Evaluation Relaxation

Motivation

I Fitness evaluation is sometimes a serious bottleneck.

I If we run out of other options, we need to speed up evaluation.

Fitness evaluation relaxation

I Create a model of the fitness function and evaluate some solutions with the model instead of the fitness function (a sketch follows below).

I Switch optimally between the accurate but slow fitness function and the inaccurate but fast one.

I Adapt the fitness function to always provide evaluations that are as accurate as necessary, as quickly as possible.
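A minimal sketch of the first idea: score only a small fraction of solutions with the slow, accurate fitness and the rest with a cheap surrogate. The fraction, function names, and switching rule are illustrative assumptions, not the method prescribed by the slides.

```python
import random

def evaluate_with_relaxation(population, true_fitness, surrogate_fitness,
                             expensive_fraction=0.05):
    """Score most solutions with the fast surrogate and only a few with the
    expensive fitness function."""
    fitnesses = []
    for solution in population:
        if random.random() < expensive_fraction:
            fitnesses.append(true_fitness(solution))       # accurate but slow
        else:
            fitnesses.append(surrogate_fitness(solution))  # fast but approximate
    return fitnesses
```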


Page 20: Efficiency Enhancement of Estimation of Distribution Algorithms

4. Fitness Evaluation Relaxation (cont’d)

Two common effects of evaluation relaxation

I Speedup due to the reduction of the number of expensive fitness evaluations.

I Slowdown due to decreased accuracy of evaluation.

Comment

I The speedup should overshadow the slowdown.

I Only useful when evaluation is expensive.

I The cheaper the evaluation function, the less significant the benefits.


Page 21: Efficiency Enhancement of Estimation of Distribution Algorithms

Example: Fitness Modeling in ECGA

Basic idea

I Extend probabilistic model to also approximate fitness.
I For each linkage group, store a table with
I One row for each instance of variables in this group.
I Each row stores a probability (from selected solutions).
I Each row also stores a fitness contribution of the instance.

Example

Model structure: [1 4] [2] [3]

Model parameters:

X1 X4   p     f        X2   p     f        X3   p     f
0  0    0.45  +0.8     0    0.25  +0.2     0    0.50  +0.45
0  1    0.25  -0.2     1    0.75  -0.1     1    0.50  -0.45
1  0    0.10  -0.7
1  1    0.20  +0.1
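A sketch of evaluating a solution with the extended model from the example above: the surrogate fitness is taken as a baseline plus the sum of the stored group contributions. The baseline term (e.g., the mean fitness of the selected population) and the 1-based indexing are assumptions made for illustration.

```python
# Linkage groups from the example; each maps an instance to (probability, fitness contribution).
model = {
    (1, 4): {(0, 0): (0.45, +0.8), (0, 1): (0.25, -0.2),
             (1, 0): (0.10, -0.7), (1, 1): (0.20, +0.1)},
    (2,):   {(0,): (0.25, +0.2), (1,): (0.75, -0.1)},
    (3,):   {(0,): (0.50, +0.45), (1,): (0.50, -0.45)},
}

def surrogate_fitness(solution, model, baseline=0.0):
    """Estimate fitness as baseline + sum of the fitness contributions of the
    substructures present in the solution."""
    total = baseline
    for group, table in model.items():
        instance = tuple(solution[i - 1] for i in group)
        _probability, contribution = table[instance]
        total += contribution
    return total

# X1=0, X2=0, X3=1, X4=1  ->  -0.2 + 0.2 - 0.45
print(surrogate_fitness([0, 0, 1, 1], model))
```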


Page 22: Efficiency Enhancement of Estimation of Distribution Algorithms

Evaluation Relaxation: Results

Fitness modeling in EDAs

I Marginal product models of ECGA (Sastry et al., 2004, 2006).

I Bayesian network models of BOA (Pelikan & Sastry, 2004).

Evaluation Relaxation on Noisy Fitness Functions

Only 1-3% individuals need expensive evaluation

Results similar to previous studies [Sastry et al, 2004]


Supermultiplicative Speedups: EDAs + EETs
Synergistic integration of probabilistic models built by EDAs and efficiency enhancement techniques

E.g., evaluation relaxation:
EDAs learn structural model

Induce surrogate form from structural model

Estimate coefficients using standard methods.

Only 1-15% individuals need evaluation

Speed-up: 30–53
Compare to 1.2 using standard methods.

[Pelikan & Sastry, 2004; Sastry, et al, 2004; Sastry, et al, 2006]


Page 23: Efficiency Enhancement of Estimation of Distribution Algorithms

5. Incremental and Sporadic Model Building

Motivation

I Model structure often takes a long time to learn.

I But model structure does not change fast.

I Can we exploit this fact?

Incremental and sporadic model building

I Reduce time spent in model building.
I Incremental model building (Etxeberria et al., 1999)
I Change structure incrementally using the structure from the previous iteration as a starting point.
I Sporadic model building (Pelikan et al., 2005); see the sketch below.
I Learn new structure only sometimes, otherwise use the old structure from the previous iteration.
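A sketch of the sporadic variant: run the expensive structure search only every t_sb generations and re-estimate only the parameters in between. The component functions are placeholders for the usual EDA operators; the period √n/2 follows the setting reported in the results below.

```python
import math

def eda_with_sporadic_model_building(population, n_vars, n_gens,
                                     select, learn_structure, learn_parameters,
                                     sample):
    """Placeholder EDA loop illustrating sporadic model building."""
    t_sb = max(1, int(math.sqrt(n_vars) / 2))    # structure-building period
    structure = None
    for generation in range(n_gens):
        selected = select(population)
        if structure is None or generation % t_sb == 0:
            structure = learn_structure(selected)             # expensive, sporadic
        parameters = learn_parameters(structure, selected)    # cheap, every generation
        population = sample(structure, parameters, len(population))
    return population
```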


Page 24: Efficiency Enhancement of Estimation of Distribution Algorithms

5. Incremental and Sporadic Model Building (cont’d)

Incremental EDAs

I Fully replace the population by a probabilistic model.

I This reduces memory complexity.

I Model is incrementally updated.

I Corresponds to steady-state genetic algorithms.
I Examples
I PBIL (Baluja, 1994).
I cGA (Harik et al., 1998).
I iBOA (Pelikan et al., 2008).
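For instance, the compact GA replaces the population with a single probability vector that is nudged toward the winner of pairwise tournaments; a minimal sketch follows (parameter values and the OneMax example are illustrative).

```python
import numpy as np

def compact_ga(fitness, n_vars, virtual_pop_size=100, n_steps=10000, seed=None):
    """Sketch of the compact GA (cGA): memory is just one probability per bit."""
    rng = np.random.default_rng(seed)
    p = np.full(n_vars, 0.5)
    for _ in range(n_steps):
        a = (rng.random(n_vars) < p).astype(int)
        b = (rng.random(n_vars) < p).astype(int)
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        # Where the two solutions disagree, shift the probability toward the winner.
        p = np.clip(p + (winner - loser) / virtual_pop_size, 0.0, 1.0)
    return (p > 0.5).astype(int)

best = compact_ga(lambda x: int(x.sum()), n_vars=20)
```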

The billion bit result

I Solution of a noisy problem of over a billion bits (Goldberg, Sastry, & Llora, 2007).

I Done with a well-implemented and parallelized compact GA.


Page 25: Efficiency Enhancement of Estimation of Distribution Algorithms

Incremental and Sporadic Model Building: Results

Sporadic model building for hBOA (Pelikan et al., 2005,2006)

[Figure: speedup of CPU time per run vs. problem size for hBOA with sporadic model building on (a) 2D and (b) 3D spin glasses.]

Figure 10: CPU-time speedup for hBOA with SMB on 2D and 3D Ising spin glasses with ±J couplings and periodic boundary conditions. The structure-building period is t_sb = √n/2.


Page 26: Efficiency Enhancement of Estimation of Distribution Algorithms

6. Prior Knowledge Utilization

Motivation

I Practitioner may often have information about the problem.

I Can we use this knowledge to speed up the search?

Prior knowledge utilization

I Incorporate prior problem knowledge to speed up the search.
I Two ways to incorporate prior knowledge:
I Bias population.
I Bias models.

I Examples:
I Inject known good solutions into the initial population.
I Restrict model structure according to background knowledge.
I Bias parameters of the model toward known promising regions.

I Can address all bottlenecks, but may also hurt if misleading.
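Two of the examples above, seeding the initial population and restricting model structure, could look roughly like this. The helper names, the problem-graph representation, and the seeding scheme are assumptions for illustration only.

```python
import numpy as np

def biased_initial_population(pop_size, n_vars, known_good_solutions, seed=None):
    """Inject known good solutions into the initial population and fill the
    remainder with random solutions."""
    rng = np.random.default_rng(seed)
    seeded = [np.asarray(s) for s in known_good_solutions][:pop_size]
    random_part = rng.integers(0, 2, size=(pop_size - len(seeded), n_vars))
    return np.vstack(seeded + [random_part])

def allowed_dependency(i, j, problem_graph):
    """Restrict model structure: only allow a dependency between variables
    that interact in the problem definition (e.g., share an edge or clause)."""
    return (i, j) in problem_graph or (j, i) in problem_graph
```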


Page 27: Efficiency Enhancement of Estimation of Distribution Algorithms

Prior Knowledge Utilization: Results

Using prior knowledge (Baluja, 2006)

I Graph coloring with 4 colors.

I Restrict tree models to only allow edges in the graph.

I Results shown for 2000 nodes.

I The table shows the number of satisfied constraints (higher is better):

Connectivity   No edges   Any edges   Restrict models
2              3,190      3,193       3,500
5              7,799      7,812       8,048
10             15,438     15,446      15,636
20             30,604     30,636      30,781


Page 28: Efficiency Enhancement of Estimation of Distribution Algorithms

7. Learning from Experience

Motivation

I Consider solving many instances of the same problem class.

I Can we learn to solve these instances faster?

Learning from experience

I Use results from past runs to speed up future runs:
I High-quality solutions.
I Models discovered and their relationship to problem definition.

I Examples:
I Constraint satisfaction problems (MAXSAT, spin glass).
I Quadratic assignment problem.
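One way to reuse the models from past runs: count how often each dependency appeared across previous models and only consider (or favor) frequent dependencies in future runs. A rough sketch, assuming past models are available as sets of edges; the representation and threshold are illustrative, not the specific bias used in the cited work.

```python
from collections import Counter

def dependency_frequencies(past_models):
    """past_models: list of edge sets, one per previous run."""
    counts = Counter()
    for edges in past_models:
        counts.update(edges)
    return {edge: count / len(past_models) for edge, count in counts.items()}

def frequently_used_dependencies(past_models, min_frequency=0.2):
    """Dependencies that occurred in at least a fraction min_frequency of past runs."""
    return {edge for edge, freq in dependency_frequencies(past_models).items()
            if freq >= min_frequency}

# Example: edges observed in three previous runs.
past = [{(0, 1), (2, 3)}, {(0, 1)}, {(0, 1), (4, 5)}]
print(frequently_used_dependencies(past, min_frequency=0.5))  # {(0, 1)}
```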


Page 29: Efficiency Enhancement of Estimation of Distribution Algorithms

Learning from Experience: Results

Learning from past models (Hauschild et al., 2008)

[Figure: execution-time speedup vs. the original ratio of total dependencies for 2D Ising spin glasses of size (a) 16×16, (b) 20×20, (c) 24×24, and (d) 28×28.]

Figure 5: Execution-time speedup with model restriction based on the maximum distance on the 2D Ising spin glass.

the speedups obtained with the distance-based model restriction are slightly better than with the PCM-based approach. We also see the same pattern of hBOA considering approximately the same percentage of total dependencies using each of the methods.

5.4 Experiments with Distance-Based Bias on MAXSAT

In section 5.3 we saw that restricting model structure by distance leads to significant speedups of hBOA on 2D Ising spin glasses. Can this approach be applied to other problems? In this section we will attempt to answer this question by looking at combined-graph coloring problems encoded as instances of the MAXSAT problem.

To restrict models by maximum distance on MAXSAT we must first define a distance metric. Before defining the metric, we create a graph corresponding to the underlying MAXSAT instance by creating a special node for each proposition and connecting all propositions that appear in the same clause with an edge of length 1. Then, the distance between two propositions is defined as the shortest path between these propositions in the underlying graph. The distances can be computed using an all-pairs shortest path algorithm. For example, consider the following MAXSAT instance:



Page 30: Efficiency Enhancement of Estimation of Distribution Algorithms

Combined Effects: Multiplicative and Supermultiplicative

Combined effects

I Additive.

I Multiplicative.

I Supermultiplicative.

Key question

I How to combine different efficiency enhancement techniques to maximize speedups?

I Can we exploit EDA components to make better combinations?


Page 31: Efficiency Enhancement of Estimation of Distribution Algorithms

Combined Effects: Multiplicative and Supermultiplicative

Multiplicative effects are significant

Technique                          Speedup
Sporadic model building            4
Parallelization on 50 processors   50
Learning from experience           2.5
Total                              4 × 50 × 2.5 = 500

Supermultiplicative effects are incredible

I The effects may be better than multiplicative.

I This happens for example when we utilize the model provided by an EDA to design more effective efficiency enhancements.

I Examples: Model-directed hybridization, evaluation relaxation.


Page 32: Efficiency Enhancement of Estimation of Distribution Algorithms

Summary and Conclusions

What has been done

I Motivation for efficiency enhancement techniques (EETs).

I Overview of EETs for EDAs.

Conclusions

I EETs represent one of the most promising and important directions in EDAs and EC at large.

I EETs can provide huge speedups and allow the solution of extremely large and complex problems.

I EDAs provide new opportunities for EETs compared to standard evolutionary algorithms.


Page 33: Efficiency Enhancement of Estimation of Distribution Algorithms

Acknowledgments

Acknowledgments

I NSF; NSF CAREER grant ECS-0547013.

I U.S. Air Force, AFOSR; FA9550-06-1-0096.

I University of Missouri; High Performance Computing Collaboratory sponsored by Information Technology Services; Research Award; Research Board.
