Performance of Evolutionary Algorithms on NK Landscapes with Nearest Neighbor Interactions
and Tunable Overlap
Martin Pelikan, Kumara Sastry, David E. Goldberg,Martin V. Butz, and Mark Hauschild
Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO
http://medal.cs.umsl.edu/
pelikan@cs.umsl.edu
Download MEDAL Report No. 2009002
http://medal.cs.umsl.edu/files/2009002.pdf
M. Pelikan, K. Sastry, D.E. Goldberg, M.V. Butz, M. Hauschild NK Landscapes with Nearest Neighbors and Tunable Overlap
Motivation
Testing evolutionary algorithms
I Adversarial problems on the boundary of the design envelope.
I Random instances of important classes of problems.
I Real-world problems.
This work bridges and extends two prior studies on random problems:
I Random additively decomposable problems (rADPs) (Pelikan et al., 2006).
I NK landscapes, a superset of rADPs (Pelikan et al., 2007).
This study
I Propose the class of polynomially solvable NK landscapes with nearest neighbor interactions and tunable overlap.
I Generate a large number of instances of the proposed problem class.
I Test evolutionary algorithms on the generated instances.
I Analyze the results.
Outline
1. Additively decomposable problems
I NK landscapes.
I Random additively decomposable problems (rADPs).
2. NK with nearest neighbors and tunable overlap.
3. Experiments.
4. Conclusions and future work.
Additively Decomposable Problems (ADPs)
Additively decomposable problem (ADP)
I Fitness defined as
f(X_1, X_2, ..., X_n) = ∑_{i=1}^{m} f_i(S_i),
I n is the number of bits (variables),
I m is the number of subproblems,
I S_i is the subset of variables in the i-th subproblem.
I ADPs play a crucial role in the design and analysis of GAs & EDAs.
I All problems in this work are ADPs.
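Since all problems in this work are ADPs, the definition above can be made concrete with a short sketch; the function and subproblem definitions below are illustrative, not taken from the original study:

```python
from typing import Callable, Sequence

def evaluate_adp(x: Sequence[int],
                 subsets: Sequence[Sequence[int]],
                 subfunctions: Sequence[Callable[[tuple], float]]) -> float:
    """Fitness of an additively decomposable problem:
    f(X_1, ..., X_n) = sum_i f_i(S_i), where S_i is a variable subset."""
    return sum(f(tuple(x[j] for j in S))
               for S, f in zip(subsets, subfunctions))

# Example: two subproblems over a 4-bit string (illustrative subfunctions).
subsets = [(0, 1), (2, 3)]
subfunctions = [lambda s: float(s[0] ^ s[1]),   # rewards differing bits
                lambda s: float(s[0] & s[1])]   # rewards both bits set
print(evaluate_adp([0, 1, 1, 1], subsets, subfunctions))  # → 2.0
```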
Two prior studies on ADPs serve as starting points
I Unrestricted NK landscapes.
I Restricted random ADPs (rADPs).
NK Landscape
NK landscape
I Proposed by Kauffman (1989).
I Model of rugged landscapes and a popular test function.
I An NK landscape is defined by
I Number of bits, n.
I Number of neighbors per bit, k.
I Set of k neighbors Π(X_i) for the i-th bit X_i.
I Subfunction f_i defining the contribution of X_i and Π(X_i).
I The objective function f_nk to maximize is then defined as

f_nk(X_0, X_1, ..., X_{n−1}) = ∑_{i=0}^{n−1} f_i(X_i, Π(X_i)).
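A minimal sketch of this definition, assuming the subfunctions are given as random look-up tables over the (k+1)-bit pattern (X_i, Π(X_i)); all names below are illustrative:

```python
import random

def random_nk_instance(n, k, seed=0):
    """Random NK landscape: each bit i gets k random neighbors and a
    look-up table with one entry per (k+1)-bit setting of (X_i, Pi(X_i))."""
    rng = random.Random(seed)
    neighbors = [rng.sample([j for j in range(n) if j != i], k)
                 for i in range(n)]
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]
    return neighbors, tables

def f_nk(x, neighbors, tables):
    """f_nk(X_0..X_{n-1}) = sum_i f_i(X_i, Pi(X_i)); each subfunction is a
    table lookup indexed by the bits of X_i followed by its neighbors."""
    total = 0.0
    for i, (nbrs, table) in enumerate(zip(neighbors, tables)):
        idx = x[i]
        for j in nbrs:
            idx = (idx << 1) | x[j]
        total += table[idx]
    return total

neighbors, tables = random_nk_instance(n=9, k=2, seed=42)
print(round(f_nk([1, 0, 1, 0, 1, 0, 1, 0, 1], neighbors, tables), 3))
```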
NK Landscape
Example for n = 9 and k = 2:
Restricted Random ADPs (rADPs) of Bounded Order
Order-k rADPs with and without overlap
I Each subproblem contains k bits.
I Separable problems contain non-overlapping subproblems (with tight linkage or shuffled).
I There may be overlap of o bits between neighboring subproblems (again with tight linkage or shuffled).
Properties of NK Landscapes and rADPs
Common properties
I Additive decomposability.
I Subproblems are complex (look-up tables).
I High multimodality, complex structure.
I Overlap further increases problem difficulty.
I A challenge for most genetic algorithms and local search.
NK landscapes
I NP-complete (the worst case cannot be solved in polynomial time unless P = NP).
rADPs
I Using prior knowledge of problem structure, we can exactly solve rADPs in polynomial time (dynamic programming) in O(2^k n) evaluations.
I Multivariate EDAs can solve shuffled rADPs polynomially fast.
NK Landscapes with Nearest Neighbors & Tunable Overlap
NK Landscapes with Nearest Neighbors and Tunable Overlap
I Neighbors of each bit are restricted to the k bits that follow it.
I For simplicity, the neighborhoods do not wrap around.
I Some subproblems may be excluded to provide a mechanism for tuning the size of overlap:
I Use a parameter step ∈ {1, 2, . . . , k + 1}.
I Only subproblems at positions i with i mod step = 0 contribute.
I Bit positions are shuffled randomly to eliminate tight linkage.
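The construction above can be sketched as follows; the helper name and parameters are illustrative:

```python
import random

def nearest_neighbor_subsets(n, k, step, shuffle=True, seed=0):
    """Subproblem index sets for the proposed class: bit i interacts with
    the following k bits (no wrap-around); only subproblems at positions i
    with i % step == 0 contribute; bit positions are optionally shuffled
    to eliminate tight linkage."""
    subsets = [tuple(range(i, i + k + 1))
               for i in range(n - k) if i % step == 0]
    if shuffle:
        perm = list(range(n))
        random.Random(seed).shuffle(perm)
        subsets = [tuple(perm[j] for j in S) for S in subsets]
    return subsets

# Sequential layouts for k = 2: step = 1 (high overlap) vs. step = 3 (separable).
print(nearest_neighbor_subsets(9, 2, step=1, shuffle=False))
print(nearest_neighbor_subsets(9, 2, step=3, shuffle=False))
# step = 3 gives [(0, 1, 2), (3, 4, 5), (6, 7, 8)]: independent subproblems
```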
NK Landscapes with Nearest Neighbors & Tunable Overlap
High overlap (k = 2, step = 1):
(Figure: sequential and shuffled layouts.)

Note: step = 1 maximizes the amount of overlap between subproblems.
NK Landscapes with Nearest Neighbors & Tunable Overlap
Low overlap (k = 2, step = 2):
(Figure: sequential and shuffled layouts.)

Note: the step parameter allows tuning of the size of overlap.
NK Landscapes with Nearest Neighbors & Tunable Overlap
No overlap (k = 2, step = 3):
(Figure: sequential and shuffled layouts.)

Note: step = k + 1 implies separability (subproblems are independent).
NK Landscapes with Nearest Neighbors & Tunable Overlap
Why?
I Nearest neighbors enable polynomial solvability:
I Deshuffle the string.
I Use dynamic programming.
I The parameter step enables tuning the overlap between subproblems:
I For standard NK landscapes, step = 1.
I With larger values of step, the amount of overlap between consecutive subproblems is reduced.
I For step = k + 1, the problem becomes separable (the subproblems are fully independent).
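The dynamic-programming idea can be sketched as below, assuming the string has already been deshuffled and each subfunction is a look-up table; this is an illustrative reconstruction, not the authors' implementation:

```python
import random
from itertools import product

def solve_nearest_neighbor_nk(n, k, step, subfn):
    """Exact maximum of sum_{i mod step = 0} f_i(x_i, ..., x_{i+k}) by
    dynamic programming over the deshuffled string. The state is the last
    k assigned bits, so there are O(2^k * n) table updates in total.
    `subfn(i, window)` returns the subproblem contribution for the
    (k+1)-bit window starting at position i."""
    best = {s: 0.0 for s in product((0, 1), repeat=k)}  # first k bits, no score yet
    for p in range(k, n):               # assign bit p; window p-k..p is complete
        nxt = {}
        for state, score in best.items():
            for b in (0, 1):
                i = p - k
                gain = subfn(i, state + (b,)) if i % step == 0 else 0.0
                new_state = state[1:] + (b,)
                if score + gain > nxt.get(new_state, float("-inf")):
                    nxt[new_state] = score + gain
        best = nxt
    return max(best.values())

# Demo on a small random instance with look-up-table subfunctions (illustrative).
rng = random.Random(7)
n, k, step = 10, 2, 2
tables = {i: [rng.random() for _ in range(2 ** (k + 1))]
          for i in range(n - k) if i % step == 0}
subfn = lambda i, w: tables[i][int("".join(map(str, w)), 2)]
print(round(solve_nearest_neighbor_nk(n, k, step, subfn), 4))
```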
Problem Instances
Parameters
I n = 20 to 120.
I k = 2 to 5.
I step = 1 to k + 1 for each k.
Variety of instances
I For each (n, k, step), generate 10,000 random instances.
I Overall 1,800,000 unique problem instances.
Compared Algorithms
Basic algorithms
I Hierarchical Bayesian optimization algorithm (hBOA).
I Genetic algorithm with uniform crossover (GAU).
I Genetic algorithm with two-point crossover (G2P).
Local search
I Single-bit-flip hill climbing (DHC) on each solution.
I Improves performance of all methods.
Niching
I Restricted tournament replacement (niching).
Results: Flips Until Optimum; hBOA; k = 2 and k = 5
(Figure 1: Average number of flips for hBOA; number of flips vs. problem size (20–100), log scale, with one panel per k = 2, 3, 4, 5 and one curve per step = 1, ..., k + 1.)
To visualize the effects of k on performance of all compared algorithms, figure 6 shows the growth of the number of DHC flips with k for hBOA and GA on problems of size n = 120; the results for UMDA are not included, because UMDA was incapable of solving many instances of this size in practical time. Two cases are considered: (1) step = 1, corresponding to standard NK landscapes, and (2) step = k + 1, corresponding to the separable problem with no interactions between the different subproblems. For both cases, the vertical axis is shown in log scale to support the hypothesis that the time complexity of selectorecombinative genetic algorithms should grow exponentially fast with the order of problem decomposition even when recombination is capable of identifying and processing the subproblems in an adequate problem decomposition. The results confirm this hypothesis: indeed, the number of flips for all algorithms appears to grow at least exponentially fast with k, regardless of the value of the step parameter.
5.5 Comparison of All Algorithms
How do the different algorithms compare in terms of performance? While it is difficult to compare exact running times due to the variety of computer hardware used and the accuracy of time measurements, we can easily compare other recorded statistics, such as the number of DHC flips or the number of evaluations until optimum. The main focus is again on the number of DHC flips, because at least one flip is typically performed for each fitness evaluation; the number of flips is therefore expected to be greater than or equal to both the number of evaluations and the product of the population size and the number of generations.
One of the most straightforward approaches to quantifying the relative performance of two algorithms is to compute the ratio of the number of DHC flips (or some other statistic) for each problem instance. The mean and other moments of the empirical distribution of these ratios can then be estimated for different problem sizes and problem types. The results can then be used to better
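This ratio-based comparison can be sketched as follows, using made-up flip counts purely for illustration (none of these numbers are measured data):

```python
from statistics import mean, stdev

def ratio_stats(flips_a, flips_b):
    """Per-instance ratio of DHC flips for two algorithms run on the same
    instances; the mean and stdev of these ratios summarize relative
    performance across the instance set."""
    ratios = [a / b for a, b in zip(flips_a, flips_b)]
    return mean(ratios), stdev(ratios)

# Illustrative counts: GA flips vs. hBOA flips on four hypothetical instances.
ga_flips   = [141_000, 150_000, 133_000, 162_000]
hboa_flips = [37_000, 40_000, 35_000, 41_000]
m, s = ratio_stats(ga_flips, hboa_flips)
print(f"mean ratio {m:.2f}, stdev {s:.2f}")
```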
I Growth appears to be polynomial w.r.t. problem size, n.
I Performance best with no overlap.
I Besides n, performance depends on both k and step.
Results: Comparison w.r.t. Flips
DHC steps (flips) until optimum:

  n    k   step    hBOA    GA (uniform)  GA (two-point)
 120   5     1    37,155      141,108        220,318
 120   5     2    40,151      212,635        353,748
 120   5     3    37,480      249,217        443,570
 120   5     4    27,411      195,673        310,894
 120   5     5    15,589      100,378        145,406
 120   5     6     9,607       35,101         47,576
Results: Comparison w.r.t. Evaluations
Number of evaluations until optimum:

  n    k   step    hBOA    GA (uniform)  GA (two-point)
 120   5     1     7,414       16,519         34,696
 120   5     2     9,011       25,032         56,059
 120   5     3     9,988       30,285         72,359
 120   5     4     8,606       24,016         51,521
 120   5     5     7,307       13,749         26,807
 120   5     6     7,328        6,004         10,949
Results: Flips Until Optimum; hBOA vs. GA; k = 5
(Figure 7: Ratio of the number of flips for GA with uniform crossover and hBOA; ratio vs. problem size (20–100), one panel per k = 2, 3, 4, 5, one curve per step = 1, ..., k + 1.)
Number of DHC flips until optimum:

  n    k   step    hBOA    GA (uniform)  GA (two-point)
 120   5     1    37,155      141,108        220,318
 120   5     2    40,151      212,635        353,748
 120   5     3    37,480      249,217        443,570
 120   5     4    27,411      195,673        310,894
 120   5     5    15,589      100,378        145,406
 120   5     6     9,607       35,101         47,576
Table 1: Comparison of the number of DHC flips until optimum for hBOA and GA. For all settings, the superiority of the results obtained by hBOA was verified with a paired t-test at 99% confidence.
The signal is the difference between the fitness contributions of the best and the second-best instances of a subproblem, and the noise models fitness contributions of other subproblems (Goldberg & Rudnick, 1991; Goldberg, Deb, & Clark, 1992). The smaller the signal-to-noise ratio, the larger the expected population size as well as the overall complexity of an algorithm. As was discussed above, the signal-to-noise ratio is influenced primarily by the value of n; however, the signal-to-noise ratio also depends on the subproblems themselves. The influence of the signal-to-noise ratio on algorithm performance should be strongest for separable problems with uniform scaling where all subproblems have approximately the same signal; for problems with overlap and nonuniform scaling, other factors contribute to instance difficulty as well. Another important factor influencing the difficulty of decomposable problems is the scaling of the signal coming from different subproblems (Thierens, Goldberg, & Pereira, 1998). Next we examine the influence of the signal-to-noise ratio and scaling on performance of the compared algorithms in more detail.
(Figure 8: Ratio of the number of flips for GA with two-point crossover and hBOA; ratio vs. problem size (20–100), one panel per k = 2, 3, 4, 5, one curve per step = 1, ..., k + 1.)

Figure 14 visualizes the effects of the signal-to-noise ratio on the number of flips until optimum for n = 120, k = 5, and step ∈ {1, 6}; since UMDA was not capable of solving many of these problem instances in practical time, the results for UMDA are not included. The figure shows the average number of DHC flips until optimum for different percentages of instances with the smallest signal-to-noise ratios. To make the visualization more effective, the number of flips is normalized by dividing the values by the mean number of flips over the entire set of instances. The results clearly show that for the separable problems (that is, step = 6), the smaller the signal-to-noise ratio, the greater the number of flips. However, for problem instances with strong overlap (that is, step = 1), problem difficulty does not appear to be directly related to the signal-to-noise ratio and the primary source of problem difficulty appears to be elsewhere.
Figure 15 visualizes the influence of scaling on the number of flips until optimum. The figure shows the average number of flips until optimum for different percentages of instances with the smallest signal variance. The larger the variance of the signal, the less uniformly the signal is distributed between the different subproblems. For the separable problem (that is, step = 6), the more uniformly scaled instances appear to be more difficult for all compared algorithms than the less uniformly scaled ones. For instances with strong overlap (that is, step = 1), the effects of scaling on algorithm performance are negligible; again, the source of problem difficulty appears to be elsewhere.
Two observations related to the signal-to-noise ratio and scaling are somewhat surprising: (1) Although scalability of selectorecombinative GAs gets worse with nonuniform scaling of subproblems, the results indicate that the actual performance is better on more nonuniformly scaled problems. (2) Performance of the compared algorithms on problems with strong overlap does not appear to be directly affected by the signal-to-noise ratio or the signal variance. How could these results be explained?
We believe that the primary reason why more uniformly scaled problems are more difficult for all tested algorithms is related to the effectiveness of recombination. More specifically, practically any recombination operator becomes more effective when the scaling is highly nonuniform; on the other
I hBOA outperforms both versions of GA.
I Differences grow faster than polynomially with n.
I Besides n, differences depend on both k and step.
Results: Correlations Between Algorithms
step = 1 (high overlap):
step = 6 (separable):
I The two GA versions are more similar to each other than either is to hBOA.
I Correlations are stronger for problems with more overlap/less structure.
Problem Difficulty: Signal-to-Noise and Signal Variance
Signal and noise
I Signal: the difference between the fitness of the best and the 2nd best solutions to a subproblem.
I Noise: models the contributions of other subproblems.
Signal-to-noise ratio
I Decision making done by a GA is stochastic.
I The larger the signal-to-noise ratio, the easier the decision making.
Signal variance
I Sequential vs. parallel convergence.
I How much do contributions of different subproblems differ?
I One way to model this is to look at the variance of the signal.
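Both quantities can be sketched in a few lines, assuming each subproblem is given as a look-up table over its bit settings; the tables below are illustrative, not from the study:

```python
from statistics import pvariance

def signal(table):
    """Signal of one subproblem: the fitness difference between its best
    and second-best entries (over all settings of its bits)."""
    top = sorted(table, reverse=True)
    return top[0] - top[1]

def signal_variance(tables):
    """Variance of the signal across subproblems: a proxy for how
    (non)uniformly the subproblems are scaled."""
    return pvariance([signal(t) for t in tables])

tables = [[0.9, 0.2, 0.1, 0.4],   # signal 0.5
          [0.8, 0.7, 0.3, 0.0],   # signal 0.1
          [1.0, 0.4, 0.4, 0.2]]   # signal 0.6
print([round(signal(t), 2) for t in tables], round(signal_variance(tables), 4))
```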
Results: Flips Until Optimum; hBOA vs. GA; k = 5
step = 1 (high overlap) step = 6 (separable)
(Figure 13: Influence of overlap for n = 120 (step varies with overlap); number of flips vs. percentage of overlap, log scale, curves for k = 2–5, panels for (a) hBOA, (b) GA (uniform), (c) GA (two-point).)
(Figure 14: Influence of the signal-to-noise ratio on the number of flips for n = 120 and k = 5; average number of flips (divided by the mean) vs. signal-to-noise percentile, panels for (a) step = 1 and (b) step = 6, curves for hBOA, GA (uniform), GA (two-point).)
Acknowledgments
This project was sponsored by the National Science Foundation under CAREER grant ECS-0547013, by the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under grant FA9550-06-1-0096, and by the University of Missouri in St. Louis through the High Performance Computing Collaboratory sponsored by Information Technology Services, and the Research Award and Research Board programs.
The U.S. Government is authorized to reproduce and distribute reprints for government purposes notwithstanding any copyright notation thereon. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation, the Air Force Office of Scientific Research, or the U.S. Government. Some experiments were done using the hBOA software developed by Martin Pelikan and David E. Goldberg at the University of Illinois at Urbana-Champaign and most experiments were performed on the Beowulf cluster maintained by ITS at the University of Missouri in St. Louis.
I For separable problems, noise clearly matters.
I For problems with overlap, noise appears insignificant.
Results: Flips Until Optimum; hBOA vs. GA; k = 5
step = 1 (high overlap) step = 6 (separable)
(Figure 15: Influence of signal variance on the number of flips for n = 120 and k = 5; average number of flips (divided by the mean) vs. signal-variance percentile, panels for (a) step = 1 and (b) step = 6, curves for hBOA, GA (uniform), GA (two-point).)
I For separable problems, signal variance clearly matters.
I For problems with overlap, signal variance appears insignificant.
Conclusions and Future Work
Summary and conclusions
I Considered a subset of NK landscapes as a class of random test problems with tunable subproblem size and overlap.
I All proposed instances are solvable in polynomial time.
I Generated a broad range of problem instances.
I Analyzed the results using hybrids of GEAs.
Future work
I Use the generated problems to test other algorithms.
I Relate performance to other measures of problem difficulty.
I Develop/test new tools for understanding problem difficulty.
I Wrap subproblems around.
I Use other distributions for generating the look-up tables.
Acknowledgments
Acknowledgments
I NSF; NSF CAREER grant ECS-0547013.
I U.S. Air Force, AFOSR; FA9550-06-1-0096.
I University of Missouri; High Performance Computing Collaboratory sponsored by Information Technology Services; Research Award; Research Board.