
6 Differential Evolution

Differential evolution (DE) is a vector-based metaheuristic algorithm that has good convergence properties. There are many DE variants, and they have been applied in a wide range of disciplines. This chapter provides a brief introduction to the basic differential evolution and its main implementation details and variants. Fundamental convergence properties in terms of population variance are also discussed.

6.1 Introduction

Differential evolution, or DE, was developed by R. Storn and K. Price in their seminal papers in 1996 and 1997 [7,8]. DE is a vector-based metaheuristic algorithm that has some similarity to pattern search and genetic algorithms due to its use of crossover and mutation. In fact, DE can be considered a further development of genetic algorithms with explicit updating equations, which makes some theoretical analysis possible. DE is a stochastic search algorithm with a self-organizing tendency that does not use derivative information. Thus, it is a population-based, derivative-free method. In addition, DE uses real numbers as solution strings, so no encoding and decoding is needed.

As in genetic algorithms, design parameters in a d-dimensional search space are represented as vectors, and various genetic operators are applied over their strings. However, unlike genetic algorithms, differential evolution carries out operations over each component (or each dimension) of the solution. Almost everything is done in terms of vectors. For example, in genetic algorithms, mutation is carried out at one site or multiple sites of a chromosome, whereas in differential evolution, a difference vector of two randomly chosen population vectors is used to perturb an existing vector. Such vectorized mutation can be viewed as more efficient from the implementation point of view. This kind of perturbation is carried out over every population vector and can thus be expected to be more efficient. Similarly, crossover is a vector-based, component-wise exchange of chromosomes or vector segments.

Apart from using mutation and crossover as differential operators, DE has explicit updating equations. This also makes it straightforward to implement and to design new variants.



6.2 Differential Evolution

For a d-dimensional optimization problem with d parameters, a population of n solution vectors is initially generated. We have x_i, where i = 1, 2, . . . , n. For each solution x_i at any generation t, we use the conventional notation

x_i^t = \big( x_{1,i}^t, \; x_{2,i}^t, \; \ldots, \; x_{d,i}^t \big),    (6.1)

which consists of d components in the d-dimensional space. This vector can be considered a chromosome or genome.

Differential evolution consists of three main steps: mutation, crossover, and selection. Mutation is carried out by a mutation scheme. For each vector x_i at any time or generation t, we first randomly choose three distinct vectors x_p, x_q, and x_r at t (see Figure 6.1), and then we generate a so-called donor vector by the mutation scheme

v_i^{t+1} = x_p^t + F \, (x_q^t - x_r^t),    (6.2)

where F ∈ [0, 2] is a parameter, often referred to as the differential weight. This requires that the minimum population size is n ≥ 4. In principle, F ∈ [0, 2], but in practice, a scheme with F ∈ [0, 1] is more efficient and stable. In fact, almost all studies in the literature use F ∈ (0, 1).
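To make the mutation step concrete, here is a minimal Matlab/Octave sketch of scheme (6.2); the function name and variables (pop as an n-by-d population matrix) are illustrative choices, not from the book's listing:

% Sketch of the mutation scheme (6.2). pop is an n-by-d matrix whose
% rows are the solution vectors; F is the differential weight.
function v = de_mutate(pop, i, F)
  n = size(pop, 1);
  idx = randperm(n);
  idx(idx == i) = [];              % keep indices distinct from i
  p = idx(1); q = idx(2); r = idx(3);
  v = pop(p, :) + F * (pop(q, :) - pop(r, :));   % donor vector
end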

From Figure 6.1, we can see that the perturbation δ = F(x_q − x_r) to the vector x_p is used to generate a donor vector v_i, and such a perturbation is directed.

Crossover is controlled by a crossover parameter Cr ∈ [0, 1], which sets the rate or probability of crossover. The actual crossover can be carried out in two ways: binomial and exponential. The binomial scheme performs crossover on each of the d components or variables/parameters. By generating a uniformly distributed random number r_i ∈ [0, 1], the jth component of v_i is manipulated as

u_{j,i}^{t+1} =
  \begin{cases}
    v_{j,i}^{t+1} & \text{if } r_i \le C_r, \\
    x_{j,i}^t     & \text{otherwise},
  \end{cases}
  \qquad j = 1, 2, \ldots, d.    (6.3)

In this way, whether to exchange each component with the donor vector is decided randomly.
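As a concrete illustration, binomial crossover (6.3) can be sketched in Matlab/Octave as follows. Note that this sketch draws a fresh uniform random number for every component, which is the usual reading of the scheme; the function name is illustrative:

% Sketch of binomial crossover (6.3): each component of the trial
% vector u comes from the donor v with probability Cr, else from x.
function u = de_crossover_bin(x, v, Cr)
  d = numel(x);
  mask = rand(1, d) <= Cr;         % components taken from the donor
  u = x;
  u(mask) = v(mask);
end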

In the exponential scheme, a segment of the donor vector is selected; this segment starts at a random integer k and has a random length L, so it can include many components.

Figure 6.1 Schematic representation of mutation vectors in differential evolution with movement δ = F(x_q − x_r).


Differential Evolution
  Initialize the population x with randomly generated solutions
  Set the weight F ∈ [0, 2] and crossover probability Cr ∈ [0, 1]
  while (stopping criterion)
    for i = 1 to n,
      For each x_i, randomly choose 3 distinct vectors x_p, x_q, and x_r
      Generate a new vector v by DE scheme (6.2)
      Generate a random index J_r ∈ {1, 2, ..., d} by permutation
      Generate a uniformly distributed number r_i ∈ [0, 1]
      for j = 1 to d,
        For each parameter v_{j,i} (the jth component of v_i), update
          u_{j,i}^{t+1} = v_{j,i}^{t+1} if r_i ≤ Cr or j = J_r,
          u_{j,i}^{t+1} = x_{j,i}^t     if r_i > Cr and j ≠ J_r
      end
      Select and update the solution by (6.5)
    end
  end
  Post-process and output the best solution found

Figure 6.2 Pseudo code of differential evolution.

Mathematically, this is to choose k ∈ [0, d − 1] and L ∈ [1, d] randomly, and we have

u_{j,i}^{t+1} =
  \begin{cases}
    v_{j,i}^{t+1} & \text{for } j = k, \ldots, k - L + 1 \in [1, d], \\
    x_{j,i}^t     & \text{otherwise}.
  \end{cases}    (6.4)

Because the binomial scheme is simpler to implement, we will use binomial crossover in our implementation.
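For completeness, here is a sketch of the exponential scheme (6.4) in Matlab/Octave, under the common convention (an assumption, not spelled out in the text) that the segment runs forward from k, wraps around modulo d, and grows in length with probability Cr:

% Sketch of exponential crossover (6.4): copy a contiguous (wrapped)
% segment of the donor v, starting at a random index k; the segment
% length L grows geometrically with probability Cr.
function u = de_crossover_exp(x, v, Cr)
  d = numel(x);
  u = x;
  k = randi(d);                    % random start index
  L = 1;
  while rand() <= Cr && L < d      % extend the segment with prob. Cr
    L = L + 1;
  end
  j = mod(k - 1 + (0:L-1), d) + 1; % wrapped segment indices
  u(j) = v(j);
end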

Selection is essentially the same as that used in genetic algorithms: we select the fittest, which for a minimization problem means the solution with the minimum objective value. Therefore, we have

x_i^{t+1} =
  \begin{cases}
    u_i^{t+1} & \text{if } f(u_i^{t+1}) \le f(x_i^t), \\
    x_i^t     & \text{otherwise}.
  \end{cases}    (6.5)
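The selection step (6.5) is a simple greedy comparison. A minimal Matlab/Octave sketch, with illustrative names and f a handle to the objective function:

% Sketch of selection (6.5) for minimization: keep the trial vector u
% only if it is no worse than the current solution x.
function xnew = de_select(f, x, u)
  if f(u) <= f(x)
    xnew = u;
  else
    xnew = x;
  end
end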

All three components can be seen in the pseudo code shown in Figure 6.2. It is worth pointing out here that the use of J_r is to ensure that u_i^{t+1} ≠ x_i^t, which may increase the evolutionary or exploratory efficiency. The overall search efficiency is controlled by two parameters: the differential weight F and the crossover probability Cr.

6.3 Variants

Most studies have focused on the choice of F, Cr, and n as well as modifications of (6.2). In fact, when generating mutation vectors, we can use many different ways of formulating (6.2), and this leads to various schemes with the naming convention DE/x/y/z, where x is the mutation scheme (rand or best), y is the number of difference vectors, and z is the crossover scheme (binomial or exponential). So, DE/Rand/1/∗ means the basic DE scheme using random mutation and one difference vector with either a binomial or exponential crossover scheme.

The basic DE/Rand/1/Bin scheme is given in (6.2). That is,

v_i^{t+1} = x_p^t + F \, (x_q^t - x_r^t).    (6.6)

If we replace x_p^t by the current best x_best^t found so far, we have the so-called DE/Best/1/Bin scheme,

v_i^{t+1} = x_{\text{best}}^t + F \, (x_q^t - x_r^t).    (6.7)

There is no reason that we should not use more than three distinct vectors. For example, if we use four different vectors plus the current best, we have the DE/Best/2/Bin scheme

v_i^{t+1} = x_{\text{best}}^t + F \, (x_{k_1}^t + x_{k_2}^t - x_{k_3}^t - x_{k_4}^t).    (6.8)

Furthermore, if we use five different vectors, we have the DE/Rand/2/Bin scheme

v_i^{t+1} = x_{k_1}^t + F_1 \, (x_{k_2}^t - x_{k_3}^t) + F_2 \, (x_{k_4}^t - x_{k_5}^t),    (6.9)

where F_1 and F_2 are differential weights in [0, 1]. Obviously, for simplicity, we can also take F_1 = F_2 = F.

Following a similar strategy, we can design various schemes. For example, these variants can be written in a generalized form

v_i^{t+1} = x_{k_1}^t + \sum_{s=1}^{m} F_s \, \big( x_{k_2(s)}^t - x_{k_3(s)}^t \big),    (6.10)

where m = 1, 2, 3, . . . and F_s (s = 1, . . . , m) are the scale factors. The number of vectors involved on the right-hand side is 2m + 1. In the preceding variants, m = 1 and m = 2 are used [9,14].
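A hedged Matlab/Octave sketch of the generalized scheme (6.10) follows; Fs is a vector of m scale factors (set all entries equal to obtain F_1 = F_2 = F), and the names are illustrative:

% Sketch of the generalized mutation scheme (6.10): a base vector plus
% m scaled difference vectors, built from 2m+1 distinct indices
% (requires n >= 2m + 2 so that enough distinct indices exist).
function v = de_mutate_general(pop, i, Fs)
  n = size(pop, 1);
  m = numel(Fs);
  idx = randperm(n);
  idx(idx == i) = [];              % indices distinct from i
  v = pop(idx(1), :);              % base vector x_{k1}
  for s = 1:m
    v = v + Fs(s) * (pop(idx(2*s), :) - pop(idx(2*s + 1), :));
  end
end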

On the other hand, there is another type of variant that uses an additional influence parameter λ ∈ (0, 1). For example, the DE/rand-to-best/1/∗ variant can be written as

v_i^{t+1} = \lambda \, x_{\text{best}}^t + (1 - \lambda) \, x_{k_1}^t + F \, (x_{k_2}^t - x_{k_3}^t),    (6.11)

which introduces an extra parameter λ. Again, this type of variant can be written in a generalized form:

v_i^{t+1} = \lambda \, x_{\text{best}}^t + (1 - \lambda) \, x_{k_1}^t + F \sum_{s=1}^{m} \big( x_{k_2(s)}^t - x_{k_3(s)}^t \big).    (6.12)

In fact, more than 10 different schemes have been formulated; for details, refer to Price et al. [9].

There are other good variants of DE, including self-adapting control parameters in differential evolution (jDE) by Brest et al. [2], self-adaptive DE (SaDE) [6], and DE with the eagle strategy [11].

Multiobjective variants also exist. For a detailed review, refer to Das and Suganthan [4] or Chakraborty [3].

Page 5: Nature-Inspired Optimization Algorithms || Differential Evolution

Differential Evolution 93

6.4 Choice of Parameters

The choice of parameters is important. Both empirical observations and parametric studies suggest that parameter values should be fine-tuned [4,5,9].

The scale factor F is the most sensitive one. Though F ∈ [0, 2] is acceptable in theory, F ∈ (0, 1) is more efficient in practice. In fact, F ∈ [0.4, 0.95] is a good range, with a good first choice being F ∈ [0.7, 0.9].

The crossover parameter Cr ∈ [0.1, 0.8] seems a good range, and the first choice can be Cr = 0.5.

It is suggested that the population size n should depend on the dimensionality d of the problem; that is, n = 5d to 10d. This may be a disadvantage for higher-dimensional problems because the population size can be large. However, there is no reason that a fixed value (say, n = 40 or 100) cannot be tried first.
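These rules of thumb translate directly into initialization code. The following Matlab/Octave lines simply encode the suggested defaults; the cap on n for large d is one possible compromise, not a prescription from the text:

% Typical DE parameter settings following the rules of thumb above.
d  = 10;                 % problem dimensionality
F  = 0.8;                % scale factor, within the suggested [0.7, 0.9]
Cr = 0.5;                % crossover probability, within [0.1, 0.8]
n  = min(10 * d, 100);   % 5d to 10d, capped for higher dimensions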

6.5 Convergence Analysis

There are relatively extensive studies on the convergence properties and the parameter sensitivity of differential evolution in the literature [4,5,10,12–14]. For example, it has been observed that the performance of DE is more sensitive to the value of the scale factor F than to Cr. As another example, Xue et al. [10] analyzed the general variant in the form of Eq. (6.12), concluding that

2 m F^2 + (1 - \lambda)^2 > 1, \qquad \lambda \in (0, 1).    (6.13)

They also pointed out that λ should be reasonably large so as to give better convergence.

The main results in the rest of this section are primarily based on the studies by Zaharie [13,14]. One of the conditions to avoid premature convergence of any population-based algorithm is to maintain a good degree of diversity in the population [1]. On a flat objective landscape (thus without selection pressure), the variance of any DE variant can be computed using the component-based expected variance E(var(P)) of the population P = {P_1, P_2, . . . , P_n}. Because all the variants mentioned earlier are linear, the expected variance of DE depends linearly on the variance of the current population X = (x_1, . . . , x_n). That is,

E(\text{var}(P)) = c \, \text{var}(X),    (6.14)

where c is a constant. For the basic DE/Rand/1/∗ schemes, Zaharie calculated the variance of the population without selection and obtained

E(\text{var}(P)) = \left( 1 + 2 p_m F^2 - \frac{p_m (2 - p_m)}{n} \right) \text{var}(X),    (6.15)

where pm is the mutation probability that is related to the crossover parameter Cr as

p_m =
  \begin{cases}
    C_r \left( 1 - \dfrac{1}{d} \right) + \dfrac{1}{d} & \text{(binomial crossover)}, \\[2mm]
    \dfrac{1 - C_r^d}{d (1 - C_r)} & \text{(exponential crossover)}.
  \end{cases}    (6.16)

Page 6: Nature-Inspired Optimization Algorithms || Differential Evolution

94 Nature-Inspired Optimization Algorithms

If the initial population (at t = 0) is X(0), the variance of the population after t generations/iterations becomes

\text{var}(P) = \left( 1 + 2 F^2 p_m - \frac{p_m (2 - p_m)}{n} \right)^{t} \text{var}(X(0)).    (6.17)

This defines a critical value of F as

1 + 2 F^2 p_m - \frac{p_m (2 - p_m)}{n} = 1,    (6.18)

which, on cancelling p_m and solving 2F_c^2 = (2 - p_m)/n for F, gives

F_c = \sqrt{ \frac{2 - p_m}{2n} }.    (6.19)

Therefore, the population variance decreases if F < F_c and increases if F > F_c. Obviously, the population variance remains constant when F = F_c.

For example, for Cr = 0.5 and d = 10 in the binomial crossover case, we have

p_m = C_r \left( 1 - \frac{1}{d} \right) + \frac{1}{d} = 0.5 \times \left( 1 - \frac{1}{10} \right) + \frac{1}{10} = 0.55.    (6.20)

Empirical observations suggest that F ∈ [0.4, 0.95], with a good first choice of F = 0.9 [4,5,9]. Alternatively, for a given pm, we can estimate the Cr value as

C_r = \frac{p_m - 1/d}{1 - 1/d}.    (6.21)

So, for a moderate pm = 0.5 and d = 10, we have

C_r = \frac{0.5 - 1/10}{1 - 1/10} \approx 0.44.    (6.22)
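The arithmetic in (6.19)–(6.22) is easy to verify numerically. In the short Matlab/Octave check below, the population size n = 50 is an assumed value chosen only for illustration:

% Numerical check of Eqs. (6.19)-(6.22).
d  = 10;  Cr = 0.5;  n = 50;
pm = Cr * (1 - 1/d) + 1/d         % binomial case (6.20): gives 0.55
Fc = sqrt((2 - pm) / (2 * n))     % critical weight (6.19)
Cr_est = (0.5 - 1/d) / (1 - 1/d)  % (6.22): pm = 0.5 gives about 0.44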

Interestingly, when d → ∞, Cr → pm.

Obviously, these results are valid for the basic DE/Rand/1/∗. For other variants, the formulas are similar, with some modifications involving the parameter λ.

These theoretical results mean that when var(P) is decreasing, the DE algorithm is converging. In other words, when var(P) → 0, the algorithm has converged. However, such convergence can be premature, because the converged state may not be the true global optimum of the problem.

From the preceding formula (6.17), we can see that the variance or population diversity also depends largely on the initial population, which is a significant drawback of DE. In addition, some difficulty may arise in dealing with nonseparable nonlinear functions. Furthermore, there is not sufficient evidence that DE can deal with combinatorial problems efficiently, due to the potential difficulty of discretizing the differential operators and defining effective neighborhoods [3,4,9].


6.6 Implementation

The implementation of differential evolution is relatively straightforward compared with genetic algorithms. If a vector/matrix-based software package such as Matlab is used, the implementation becomes even simpler.

The simple version of DE in Matlab/Octave given next is for unconstrained optimization, where Rosenbrock's function with three variables is solved by default. To solve constrained optimization problems, this program can easily be extended in combination with the penalty method discussed elsewhere in the book.
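The program below is a minimal sketch of such a DE/Rand/1/Bin solver following Eqs. (6.2), (6.3), and (6.5); it is not the book's original listing, and the search bounds, population size, and generation count are assumed values chosen for illustration:

% Minimal DE/Rand/1/Bin sketch for unconstrained minimization of
% Rosenbrock's function with three variables (illustrative settings).
function [best, fbest] = de_simple()
  d = 3; n = 30;                   % dimension and population size
  F = 0.8; Cr = 0.5;               % differential weight and crossover rate
  tmax = 1000;                     % number of generations
  Lb = -5 * ones(1, d); Ub = 5 * ones(1, d);   % assumed search bounds
  % Rosenbrock's function in d dimensions
  f = @(x) sum(100 * (x(2:end) - x(1:end-1).^2).^2 + (1 - x(1:end-1)).^2);
  % Random initialization within the bounds
  pop = repmat(Lb, n, 1) + rand(n, d) .* repmat(Ub - Lb, n, 1);
  fit = zeros(n, 1);
  for i = 1:n, fit(i) = f(pop(i, :)); end
  for t = 1:tmax
    for i = 1:n
      % Mutation (6.2): three distinct indices, all different from i
      idx = randperm(n); idx(idx == i) = [];
      v = pop(idx(1), :) + F * (pop(idx(2), :) - pop(idx(3), :));
      % Binomial crossover (6.3), with index Jr forcing one donor component
      Jr = randi(d);
      mask = rand(1, d) <= Cr; mask(Jr) = true;
      u = pop(i, :); u(mask) = v(mask);
      % Selection (6.5): greedy replacement for minimization
      fu = f(u);
      if fu <= fit(i)
        pop(i, :) = u; fit(i) = fu;
      end
    end
  end
  [fbest, ib] = min(fit);
  best = pop(ib, :);
end

With these settings, a call such as [best, fbest] = de_simple() should return a point close to the global minimum (1, 1, 1) of Rosenbrock's function, although the accuracy attained depends on the random seed and the number of generations.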


References

[1] Beyer HG. On the dynamics of EAs without selection. In: Banzhaf W, Reeves C, editors. Foundations of genetic algorithms. San Francisco, CA, USA: Morgan Kaufmann Publishers; 1999. p. 5–26.

[2] Brest J, Greiner S, Boskovic B, Mernik M, Zumer V. Self-adapting control parameters in differential evolution: a comparative study on numerical benchmark functions. IEEE Trans Evol Comput 2006;10(6):646–57.


[3] Chakraborty UK. Advances in differential evolution. Studies in computational intelligence, vol. 143. Heidelberg, Germany: Springer; 2008.

[4] Das S, Suganthan PN. Differential evolution: a survey of the state-of-the-art. IEEE Trans Evol Comput 2011;15(1):4–31.

[5] Jeyakumar G, Velayutham CS. Convergence analysis of differential evolution variants on unconstrained global optimization functions. Int J Art Intell Appl 2011;2(2):116–26.

[6] Qin AK, Huang VL, Suganthan PN. Differential evolution algorithm with strategy adaptation for global numerical optimization. IEEE Trans Evol Comput 2009;13(2):398–417.

[7] Storn R. On the usage of differential evolution for function optimization. In: Biennial conference of the North American fuzzy information processing society (NAFIPS); 1996. p. 519–23.

[8] Storn R, Price K. Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. J Global Optimization 1997;11:341–59.

[9] Price K, Storn R, Lampinen J. Differential evolution: a practical approach to global optimization. Germany: Springer; 2005.

[10] Xue F, Sanderson AC, Graves RJ. Multiobjective differential evolution: algorithm, convergence analysis and applications. In: Proceedings of congress on evolutionary computation (CEC 2005), vol. 1. Edinburgh, Scotland: IEEE Publication; 2005. p. 743–50.

[11] Yang XS, Deb S. Two-stage eagle strategy with differential evolution. Int J Bio-Inspired Comput 2012;4(1):1–5.

[12] Zaharie D. A comparative analysis of crossover variants in differential evolution. In: Ganzha M, Paprzycki M, Pilichowski TP, editors. Proceedings of the international multiconference on computer science and information technology, October 15–17, 2007, Wisla, Poland; 2007. p. 171–81. (Polskie Towarzystwo Informatyczne, PIPS, Katowice).

[13] Zaharie D. Influence of crossover on the behavior of the differential evolution algorithm. Appl Soft Comput 2009;9(3):1126–38.

[14] Zaharie D. Differential evolution: from theoretical results to practical insights. In: Matousek R, editor. Proceedings of the 18th international conference on soft computing, June 27–29, 2012, Brno. Czech Republic: Brno University of Technology Press; 2012. p. 126–31.