Injecting External Solutions Into CMA-ES

Nikolaus Hansen

Équipe-Projet TAO, INRIA Saclay – Île-de-France
Theme: Optimization, Learning and Statistical Methods

INRIA Research Report N° 7748 — October 2011
arXiv:1110.4181v1 [cs.LG] 19 Oct 2011 — ISSN 0249-6399, ISRN INRIA/RR--7748--FR+ENG



Abstract: This report considers how to inject external candidate solutions into the CMA-ES algorithm. The injected solutions might stem from a gradient or a Newton step, a surrogate model optimizer, or any other oracle or search mechanism. They can also be the result of a repair mechanism, for example to render infeasible solutions feasible. Only small modifications to CMA-ES are necessary to turn injection into a reliable and effective method: overly long steps need to be tightly renormalized. The main objective of this report is to reveal this simple mechanism.

Depending on the source of the injected solutions, interesting variants of CMA-ES arise. When the best-ever solution is always (re-)injected, an elitist variant of CMA-ES with weighted multi-recombination arises. When all solutions are injected from an external source, the resulting algorithm might be viewed as adaptive encoding with step-size control.

In first experiments, injected solutions of very good quality lead to a convergence speed on the (simple) sphere function twice as fast as without injection. This means that we observe an impressive speed-up on otherwise difficult-to-solve functions. Single bad injected solutions, on the other hand, do no significant harm.

Key-words: CMA-ES, external solutions, gradient, injection, repair



Contents

1 Introduction
2 Injection in the CMA-ES Algorithm
3 Discussion
4 Preliminary Experiments
5 Further Considerations
6 Summary and Conclusion

1 Introduction

The CMA-ES (Covariance Matrix Adaptation Evolution Strategy [4, 3, 2]) is a stochastic search algorithm for non-convex continuous optimization in a black-box setting, where we want to minimize the objective function (or fitness function)

f : R^n → R,  x ↦ f(x)

without exploiting any a priori specified structure of f. The CMA-ES algorithm entertains a multivariate normal sampling distribution for x and updates the distribution parameters with a comparatively sophisticated procedure, see Figure 1. While the algorithm is quite robust to large irregularities in the objective function f, even small changes of the update procedure can lead to a dramatic breakdown of its performance. This property has been perceived as a main weakness of the algorithm.

In this report we show how to make CMA-ES robust to (almost) arbitrary changes of the solutions used in the update procedure. In other words, we reveal the measures needed to properly inject external proposals, either candidate solution points or directions, into the CMA-ES algorithm by replacing some of the internal solutions originally sampled by CMA-ES, or, equivalently, by using solutions that are modified in any desired way (for example to make them feasible).

External or modified proposal solutions or directions can have a variety of sources:

• a gradient or Newton direction;

• an improved solution, for example the result of a local search started from a solution sampled by CMA-ES (Lamarckian learning), which allows CMA-ES to be used in the context of memetic algorithms;

• a repaired solution, for example from a previously infeasible solution;

• an optimal solution of a surrogate model built from already evaluated solutions;

• the best-ever solution seen so far;

• proposals from any algorithm running in parallel to CMA-ES (migration).


Because injecting a single bad solution essentially corresponds to decreasing the population size by one, no particular care needs to be taken that only (exceptionally) good solutions are introduced. Any promising source of solutions might be used. Within CMA-ES, solutions are sampled symmetrically and therefore also virtually never lead to a systematic improvement before selection.

When all originally sampled internal solutions are replaced, the resulting procedure resembles adaptive encoding [6]. The main differences to adaptive encoding are: (i) external solutions are represented in the original (phenotypic) space, (ii) step-size control remains in place, and (iii) the parameter setting is different. Using a different (genotypic) representation to generate new external solutions is the crucial idea of adaptive encoding and can also be employed here.

The modifications introduced in CMA-ES are small but will often be decisive. They are outlined in the next section.

Notations  Throughout this report, we use for E‖N(0, I)‖ = √2 Γ((n+1)/2) / Γ(n/2) the approximation √n (1 − 1/(4n) + 1/(21n²)). The notation a ∧ bc + d denotes the minimum of a and bc + d.
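For concreteness, a small Python check of this approximation against the exact chi-distribution mean (the function names are mine, not from the report):

```python
import math

def chi_mean_exact(n):
    """E||N(0, I)|| = sqrt(2) * Gamma((n+1)/2) / Gamma(n/2)."""
    return math.sqrt(2) * math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2))

def chi_mean_approx(n):
    """The approximation used throughout the report."""
    return math.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n * n))

# For n = 10: exact 3.0844..., approximation 3.0847...
```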

2 Injection in the CMA-ES Algorithm

The CMA-ES algorithm that tolerates injected solutions is displayed in Figure 1. New parts are highlighted with a shaded background. Injected solutions replace x_i in (1). The decisive function αclip(·, ·) used in (3) and (6) reads

αclip(c, x) = 1 ∧ c/x .    (12)

However, different choices for αclip(·, ·) are possible, or even desirable, and are discussed below. With the parameter setting c_y = c_my = Δ^max_σ = ∞, the original CMA-ES is recovered (in this case, Equations (3) and (6) are meaningless).
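A minimal sketch of (12) in Python (the function name is mine):

```python
def alpha_clip(c, x):
    """Clipping factor from (12): alpha = 1 ∧ c/x = min(1, c/x).
    Leaves steps of (Mahalanobis) length x <= c unchanged and rescales
    longer steps so that their length becomes exactly c."""
    return 1.0 if x <= c else c / x
```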

An injected direction, v, is used by setting

x_i = m_t + σ_t √n / ‖C^{−1/2} v‖ × v .    (13)

If v represents a gradient direction, using instead

x_i = m_t + σ_t √n / ‖C^{1/2} v‖ × C v    (14)

seems to suggest itself. Remark that internal perturbations in CMA-ES follow C^{1/2} N(0, I), where N(0, I) is isotropic and ‖N(0, I)‖ ≈ √n.¹
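A sketch of (13) and (14) in Python, assuming matrix square roots C_half = C^{1/2} and C_invhalf = C^{−1/2} are available (the function names are mine):

```python
import numpy as np

def inject_direction(m, sigma, C_invhalf, v):
    """Candidate from a direction v via (13): rescaled so that its
    Mahalanobis length equals sqrt(n), a typical sampled step length."""
    n = len(m)
    return m + sigma * np.sqrt(n) / np.linalg.norm(C_invhalf @ v) * v

def inject_gradient(m, sigma, C, C_half, g):
    """Variant (14) for a gradient direction g, premultiplied by C; note
    that ||C^{-1/2}(C g)|| = ||C^{1/2} g||, so the length is again sqrt(n)."""
    n = len(m)
    return m + sigma * np.sqrt(n) / np.linalg.norm(C_half @ g) * (C @ g)
```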

The decisive operation for injected solutions is given in Equation (3) of Figure 1. Their Mahalanobis distance to the distribution mean is clipped at c_y ≈ √n + 2, preventing artificially long steps from entering the adaptation procedure. Additionally, though in most cases rather irrelevant after the single steps are clipped, Δ^max_σ (see Table 1) keeps possible step-size changes below the factor exp(0.6) ≈ 1.82. Otherwise, the depicted algorithm is not further modified (unless c_my is set < ∞).

¹ The symmetric Cholesky factor C^{1/2} does not supply a rotated coordinate system as desired for adaptive encoding. In this case, we sample using B D N(0, I) ∼ C^{1/2} N(0, I), where B D : R^n → R^n is the linear decoding.


x_i ∼ m_t + σ_t × C_t^{1/2} N(0, I)   for i = 1, . . . , λ    (1)

y_i = (x_{i:λ} − m_t) / σ_t   where f(x_{1:λ}) ≤ · · · ≤ f(x_{µ:λ}) ≤ f(x_{µ+1:λ}) ≤ . . .    (2)

y_i ← αclip(c_y, ‖C_t^{−1/2} y_i‖) × y_i   if x_{i:λ} was injected    (3)

Δm = (x_m − m_t) / σ_t   if x_m was injected,   Δm = Σ_{i=1}^{µ} w_i y_i   otherwise    (4)

m_{t+1} = m_t + c_m σ_t Δm    (5)

Δm ← αclip(c_my, √µ_w ‖C_t^{−1/2} Δm‖) × Δm   if x_m was injected or . . .    (6)

p_σ^{t+1} = (1 − c_σ) p_σ^t + √(c_σ (2 − c_σ) µ_w) C_t^{−1/2} Δm    (7)

h_σ = 1 if ‖p_σ^{t+1}‖² < n (1 − (1 − c_σ)^{2(t+1)}) (2 + 4/(n + 1)), and h_σ = 0 otherwise    (8)

p_c^{t+1} = (1 − c_c) p_c^t + h_σ √(c_c (2 − c_c) µ_w) Δm    (9)

C_{t+1} = (1 − c′_1 − c_µ) C_t + c_1 p_c^{t+1} (p_c^{t+1})^T (rank-one update) + c_µ Σ_{i=1}^{µ} w_i y_i y_i^T    (10)

σ_{t+1} = σ_t × exp( Δ^max_σ ∧ (c_σ/d_σ) (‖p_σ^{t+1}‖ / E‖N(0, I)‖ − 1) )    (11)

Figure 1: Update equations for the state variables in the (µ/µ_w, λ)-CMA-ES with iteration index t = 0, 1, 2, . . . , where m_t ∈ R^n, σ_t ∈ R_+, C_t ∈ R^{n×n} is positive definite, p_σ^t, p_c^t ∈ R^n, and p_σ^{t=0} = p_c^{t=0} = 0, C_{t=0} = I; parameters are taken from Table 1. We have additionally c′_1 = c_1 (1 − (1 − h_σ²) c_c (2 − c_c)). The chosen ordering of equations allows the time index to be removed. The symbol x_{i:λ} is the i-th best of the solutions x_1, . . . , x_λ. The "optimal" ordering of (3), (4), (5) and (6) is an open issue.

We also use the original internal strategy parameters of CMA-ES, which seems particularly reasonable if only a smaller fraction of internal solutions is replaced in (1).
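For illustration, here is a single-iteration sketch of Figure 1 in Python (NumPy). It is a hedged reading of the update equations, not a reference implementation: `par` is a plain dict of the Table 1 constants (the key names are mine), `alpha_clip` is the helper sketched after (12), injected candidates simply replace the first samples in (1), and step (6) is skipped, i.e. c_my = ∞.

```python
import numpy as np

def injection_step(f, m, sigma, C, p_s, p_c, t, par, injected=(), rng=None):
    """One iteration of Figure 1 with the injection clipping (3)."""
    rng = rng or np.random.default_rng()
    n = len(m)
    d2, B = np.linalg.eigh(C)                      # C = B diag(d2) B^T
    d = np.sqrt(np.maximum(d2, 1e-30))
    C_half, C_invhalf = B @ np.diag(d) @ B.T, B @ np.diag(1 / d) @ B.T

    X = [m + sigma * (C_half @ rng.standard_normal(n)) for _ in range(par["lam"])]
    is_injected = [False] * par["lam"]
    for j, x in enumerate(injected):               # injections replace samples in (1)
        X[j], is_injected[j] = np.asarray(x, dtype=float), True

    order = np.argsort([f(x) for x in X])          # ranking used in (2)
    Y = []
    for i in order[:par["mu"]]:
        y = (X[i] - m) / sigma                     # (2)
        if is_injected[i]:                         # clipping (3)
            y = alpha_clip(par["cy"], np.linalg.norm(C_invhalf @ y)) * y
        Y.append(y)
    dm = sum(w * y for w, y in zip(par["weights"], Y))   # (4)
    m_new = m + par["cm"] * sigma * dm                   # (5)

    p_s = ((1 - par["cs"]) * p_s                         # (7)
           + np.sqrt(par["cs"] * (2 - par["cs"]) * par["mueff"]) * (C_invhalf @ dm))
    h = float(p_s @ p_s <                                # (8)
              n * (1 - (1 - par["cs"])**(2 * (t + 1))) * (2 + 4 / (n + 1)))
    p_c = ((1 - par["cc"]) * p_c                         # (9)
           + h * np.sqrt(par["cc"] * (2 - par["cc"]) * par["mueff"]) * dm)

    c1p = par["c1"] * (1 - (1 - h**2) * par["cc"] * (2 - par["cc"]))
    C = ((1 - c1p - par["cmu"]) * C                      # (10)
         + par["c1"] * np.outer(p_c, p_c)
         + par["cmu"] * sum(w * np.outer(y, y) for w, y in zip(par["weights"], Y)))

    sigma *= np.exp(min(par["dmax"],                     # (11)
                        par["cs"] / par["ds"]
                        * (np.linalg.norm(p_s) / par["chi_n"] - 1)))
    return m_new, sigma, C, p_s, p_c
```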

Strong injection: mean shift  If we want to make a strong impact with an injection, we can shift the mean. We compute

Δm = (x_m − m_t) / σ_t    (15)

from the injected solution x_m as in (4). When no further solutions x_i are used, the remaining update equations can be performed with c_µ = 0. With c_m = 1 (the default), m_{t+1} = x_m. In order to prevent an unrealistically large shift of m_t in (5), we might exchange the order of (5) and (6), thereby applying the length adjustment for Δm in (6) before the actual mean shift (5).
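A short sketch of this variant, applying (6) before (5) as suggested above (the function name is mine; `alpha_clip` is the helper from (12)):

```python
import numpy as np

def mean_shift_dm(x_m, m, sigma, C_invhalf, cmy, mueff):
    """Mean-shift direction (15) with the length adjustment (6) applied
    before the actual mean shift (5)."""
    dm = (x_m - m) / sigma                                            # (15)
    a = alpha_clip(cmy, np.sqrt(mueff) * np.linalg.norm(C_invhalf @ dm))  # (6)
    return a * dm

# The caller then sets m_new = m + c_m * sigma * dm; with c_m = 1 and
# no clipping taking effect, this gives m_new = x_m.
```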


Table 1: Default parameter values of the (µ/µ_w, λ)-CMA-ES taken from [5], where by definition Σ_{i=1}^{µ} |w_i| = 1 and µ_w^{−1} = Σ_{i=1}^{µ} w_i², and a − b ∧ c + d := min(a − b, c + d). Only the population size λ is possibly left to the user's choice (see also [1]).

λ = 4 + ⌊3 ln n⌋
µ = ⌊λ/2⌋
w_i = ( ln((λ+1)/2) − ln i ) / Σ_{j=1}^{µ} ( ln((λ+1)/2) − ln j )
c_y = √n + 2n/(n + 2)
c_my = √(2n) + 2n/(n + 2)
c_m = 1
c_σ = (µ_w + 2) / (n + µ_w + 3)
d_σ = 1 + c_σ + 2 max(0, √((µ_w − 1)/(n + 1)) − 1)
c_c = (4 + 0 × µ_w/n) / (n + 4 + 0 × 2µ_w/n)
c_1 = α_cov min(1, λ/6) / ((n + 1.3)² + µ_w)
c_µ = 1 − c_1 ∧ α_cov (µ_w − 2 + 1/µ_w) / ((n + 2)² + α_cov µ_w/2)
α_cov = 2, but could be chosen < 2, e.g. α_cov = 0.5 for noisy problems
Δ^max_σ = 1.0 or even 0.6
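The Table 1 defaults as plain Python, returning a dict with the same (self-chosen) key names as in the iteration sketch after Figure 1; the terms with factor 0 in c_c vanish:

```python
import math

def default_parameters(n, lam=None, alpha_cov=2.0):
    """Sketch of the Table 1 default parameter values."""
    lam = lam if lam is not None else 4 + int(3 * math.log(n))
    mu = lam // 2
    raw = [math.log((lam + 1) / 2) - math.log(i) for i in range(1, mu + 1)]
    weights = [w / sum(raw) for w in raw]          # sum |w_i| = 1
    mueff = 1.0 / sum(w * w for w in weights)      # mu_w
    cs = (mueff + 2) / (n + mueff + 3)
    c1 = alpha_cov * min(1, lam / 6) / ((n + 1.3)**2 + mueff)
    return {
        "lam": lam, "mu": mu, "weights": weights, "mueff": mueff,
        "cy": math.sqrt(n) + 2 * n / (n + 2),
        "cmy": math.sqrt(2 * n) + 2 * n / (n + 2),
        "cm": 1.0,
        "cs": cs,
        "ds": 1 + cs + 2 * max(0, math.sqrt((mueff - 1) / (n + 1)) - 1),
        "cc": 4 / (n + 4),                         # 0-weighted mu_w terms vanish
        "c1": c1,
        "cmu": min(1 - c1,
                   alpha_cov * (mueff - 2 + 1 / mueff)
                   / ((n + 2)**2 + alpha_cov * mueff / 2)),
        "chi_n": math.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n * n)),
        "dmax": 1.0,                               # Delta_sigma^max; or 0.6
    }
```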


Parameter setting  The setting c_y ≈ √n + 2 is motivated in Figure 2. The figure depicts the relative deviation of ‖C^{−1/2} y_i‖ from its expected value. Given its original distribution from CMA-ES, less than 10% of the y_i in (3) are actually clipped. For n > 10, the fraction is smaller than 1%.
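The clipping fraction under neutral selection can be checked with a short Monte Carlo sketch (function name and seed are mine):

```python
import numpy as np

def clipped_fraction(n, samples=100_000, seed=1):
    """Fraction of steps whose length ||N(0, I)|| exceeds
    c_y = sqrt(n) + 2n/(n + 2), i.e. what (3) would actually clip
    under neutral selection."""
    rng = np.random.default_rng(seed)
    cy = np.sqrt(n) + 2 * n / (n + 2)
    lengths = np.linalg.norm(rng.standard_normal((samples, n)), axis=1)
    return float(np.mean(lengths > cy))

# Consistent with the text: well below 10% in small dimensions,
# and below 1% for n > 10.
```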

The typical length of √µ_w Δm depends on √µ_w and is often essentially larger than √n. Therefore the setting c_my = √n + 2 leads to a visible impairment of the otherwise unmodified CMA-ES. This suggests that c_my ≈ √(2n) + 2 could be a reasonable choice; however, the setting of c_my still needs further empirical validation.

In principle, the order of Equations (3), (4), (5) and (6) can be changed under the constraint that the computation of Δm in (4) is done before Δm is used in (5) and (6). More specifically, four variants are available, implied by the exchange of (3) and (4), or (5) and (6), respectively (another variant that uses unclipped y_i for m_{t+1} but clipped ones for Δm in the further computations is possible, however not by a simple exchange of equations). The variants differ in whether Δm is computed from clipped y_i and whether Δm itself is clipped before or after computing m_{t+1}. All these variants seem feasible, because an unconstrained shift of m_{t+1} is per se not critical for the algorithm's behavior.


[Two-panel plot omitted; only the caption is recoverable from the source.]

Figure 2: Relative deviation of ‖C^{−1/2} y_i‖ (top) and ‖C^{−1/2} y_i‖² (bottom) from its expected value, plotted versus dimension. All values are normalized as x ↦ x/E‖N(0, I)‖ − 1 (top) and x² ↦ (x²/n − 1)/2 (bottom); compare also (11) for c_σ = d_σ = 1. Plotted are statistics of the random variable x (top) and x² (bottom), where x² follows a chi-squared distribution with n degrees of freedom, as ‖C^{−1/2} y_i‖² does without injections under neutral selection. Plotted against n are the modal value (x = √(n − 1) and x² = 0 ∨ n − 2, respectively, as dots), the approximation √n of the expected value and the expected value n, respectively (thin solid), the 1, 10, 50, 90, and 99 percentiles (dashed), and x = c_y = √n + 2n/(n + 2).

3 Discussion

All update equations starting from (4) are formulated relative to the original sample distribution. This means we are, in principle, free to change the distribution before each iteration step. Many reasonable adjustments are possible: a mean shift² (injecting x_m resembles an arbitrary mean shift with additional further updates based on this mean shift), changing the step-size σ, increasing small variances in C, and so on. The modification advised in this report is necessary if x_i is not in accordance with the distribution in (1).

With the introduced modification(s), the CMA-ES can also be used in the adaptive encoding context (however, using σ for the encoding-decoding might only turn out to be useful if the encoding is an affine linear transformation). In the original adaptive encoding [6], different normalization measures were taken for the cumulation in p_c and for the covariance matrix update, and step-size adaptation was entirely omitted. In this report, by default the tight normalization of the single steps is the only measure (unless x_m is injected). The new normalization replaces the multiplication of the single steps with α_i in [6] in the covariance matrix update. The new normalization is tighter: choosing c_y ≈ 2√n, instead of c_y ≈ √n + 2, would be comparable to [6]. The setting therefore allows step-size control to be applied reliably. However, the new setting is less tight for the mean step, as c_my = √n (without taking a minimum in (6)) would be comparable to [6], while we now use c_my = ∞ unless an explicit mean shift is performed. This setting might fail if all new points point into the same direction viewed from x_m (suggesting c_my ≈ 2√n as a compromise). The new setting seems to be slightly simpler and might turn out preferable also in the adaptive encoding setting, even when leaving step-size adaptation aside.

Due to the minor modifications we do not expect an adverse interference with negative updates of the covariance matrix as in active CMA-ES [7, 5]. On the contrary, limiting the length of steps that enter the negative update mitigates a principal design flaw of negative updates: long steps tend to be worse (and therefore enter the negative update with a higher probability) and tend to produce a stronger update effect, both just because they are long and not because they indicate an undesirable direction.

Finally, it is well possible to inject the same solution several times, for example based on its superior quality. One might, for example, consider unconditionally (re-)injecting the best-ever solution in every iteration. Then an "elitist algorithm with comma selection" arises, introducing an easy and appealing way to implement elitism in evolution strategies with weighted multi-recombination.

A generalized approach to normalize injected solutions compares the empirical CDF of the lengths l_i^t = ‖C_t^{−1/2} y_i‖, i = 1, . . . , λ, t = 1, 2, . . . , with a desired CDF, F, and reduces the length of y_i such that the observed relative frequency of lengths larger than or equal to l_i^t is below, say, 1.2 (1 − F(l_i^t)). In (12), the desired CDF is very crudely chosen to be F(x) = 1_{x < c_y}. The desired operation in theory is l_i^t ← F_desired^{−1}(F_true(l_i^t)). A practicable implementation could compare l_i^t = √2 × (‖C_t^{−1/2} y_i‖ − E‖N(0, I)‖) with the standard normal distribution F, in that a correction is applied if l_i^t > 1 and (1/t)(1/λ) Σ_{j,k} 1_{l_j^k ≥ l_i^t} > 1.2 (1 − F(l_i^t)).
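A sketch of this practicable implementation in Python (the class and method names are mine; the caller feeds in the normalized lengths l_i^t and shortens y_i when a correction is flagged):

```python
import math

class LengthHistory:
    """Collect the normalized lengths l = sqrt(2) * (||C^{-1/2} y|| - E||N(0,I)||)
    and flag a correction when long lengths occur more often than
    1.2 * (1 - F(l)) under the standard normal CDF F."""

    def __init__(self):
        self.lengths = []                    # all l_i^t observed so far

    @staticmethod
    def _std_normal_cdf(x):
        return 0.5 * (1 + math.erf(x / math.sqrt(2)))

    def needs_correction(self, l):
        self.lengths.append(l)
        if l <= 1:
            return False                     # only long steps are corrected
        freq = sum(lk >= l for lk in self.lengths) / len(self.lengths)
        return freq > 1.2 * (1 - self._std_normal_cdf(l))
```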

² However, a mean shift without further updates will impair the meaning of the evolution paths.


4 Preliminary Experiments

Preliminary empirical investigations have been conducted by injecting a single slightly perturbed optimal solution. This is virtually the best case scenario when the distribution mean m is far away from the optimum. It becomes the worst case scenario when m is closer to the optimum than the perturbation. Single runs on the sphere function and the Rosenbrock function are shown in Figures 3 and 4. The upper black graph depicts the worst iteration-wise function value and reveals the (sharp) transition between best and worst case scenario by showing convergence first and stagnation afterwards.

The improvement on the sphere function is limited to a factor of about two, namely due to the maximal iteration-wise step-size decrement. This limit can be exceeded by additionally decreasing σ when the injected solution is trustworthy, has good quality, and is close to m (in the norm defined by σ²C). The precise implementation (the question of what is close to m) might also depend on µ_w. As is to be expected, the effect of injecting single bad solutions (the worst case scenario in the later stage) is negligible.

The improvement on the Rosenbrock function exceeds our expectation: we see a speed-up by a factor of almost n, simply because the speed is similar to the one on the sphere function with injection. Again, this speed-up can be further enhanced by step-size decrements.

Experiments for an injected mean shift have not been conducted yet. Experiments always injecting the best-ever solution reveal a moderate performance impairment when searching multimodal landscapes.

5 Further Considerations

Another case of application is temporarily freezing some variables (coordinates) to the same value in all candidate solutions x_i. (This decreases the length of the step in the Euclidean norm, but due to correlations in the distribution it can lead to exceptionally long steps in Mahalanobis distance, even if the frozen value is borrowed from m.) In this case, it is also advisable to slightly modify the step-size equations (8) and (11). Given j variables are frozen, these variables are not taken into account for computing ‖p_σ^{t+1}‖; consequently, n − j is used instead of n in (8), and E‖N(0, I)‖ is computed for n − j dimensions in (11). After one iteration, the respective components of y_i will be zero (given c_m = 1), and also c_y should be set as for dimension n − j. In principle, all parameters from Table 1 can then be set as for dimension n − j. Additionally, in order to avoid numerical problems, the diagonal elements of the frozen coordinates of the covariance matrix should be kept at least in the order of the smallest eigenvalue.
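The dimension-dependent constants for this case in a short Python sketch (the function name is mine):

```python
import math

def frozen_dimension_constants(n, j):
    """Section 5 adjustments when j coordinates are frozen: the step-size
    rules (8) and (11) and the clipping threshold c_y are evaluated for
    the effective dimension n - j."""
    m = n - j
    chi_m = math.sqrt(m) * (1 - 1 / (4 * m) + 1 / (21 * m * m))  # E||N(0,I)|| in m dims
    cy_m = math.sqrt(m) + 2 * m / (m + 2)
    return chi_m, cy_m
```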

6 Summary and Conclusion

Using candidate proposals in CMA-ES that do not directly stem from the sample distribution of CMA-ES can often lead to a failure of the algorithm. The effective counter measures, however, turn out to be comparatively simple: only the appearance of large steps needs to be tightly controlled, where large is defined w.r.t. the original sample distribution. The possibility to inject any candidate solution is valuable in many situations. In case of bounds or constraints where a repair mechanism is available, this might serve as the basis for a new class of well-performing constraint handling mechanisms.


[Figure 3 shows six run plots (10-D left, 40-D right; three rows) with panels for function value, σ, and axis ratio; object variables (mean); scaling of all main axes; and standard deviations in all coordinates, plotted against function evaluations. Only the caption is recoverable from the source.]

Figure 3: Runs of CMA-ES on the sphere function; middle and lower row with a single injected solution distributed as 10^−4 × N(0, I) (which corresponds in the beginning to a slightly disturbed gradient direction). Black lines show the evolution of the median and worst solution. The evolution of the worst solution indicates that, with the default parameter setting on the sphere function, a speed-up by a factor of two can be achieved. The reason for the comparatively moderate speed-up is that the step-size decrease per iteration is limited. With reduced population size (lower row) the speed-up increases, because the number of iterations to reach function value 10^−6 in the best case scenario remains almost constant.


[Figure 4 shows six run plots (10-D left, 40-D right; three rows) with the same panel layout as Figure 3. Only the caption is recoverable from the source.]

Figure 4: Runs of CMA-ES on the Rosenbrock function; middle and lower row with a single injected solution distributed as 1 + 10^−4 × N(0, I), lower row without the clipping in (3). With injection, the convergence speed is again twice as fast as on the sphere function. Where the injected solutions are useful (for function values larger than about 10^−4), the algorithm is almost n times faster than without injection (600 vs 5000 and 2000 vs 70000 evaluations in 10- and 40-D). Without clipping, the run does not fail only because the step-size increment is limited to exp(Δ^max_σ) = 2.718 . . . per iteration.

Acknowledgment  This work was supported by the ANR-2010-COSI-002 grant (SIMINOLE) of the French National Research Agency.


References

[1] A. Auger and N. Hansen. A restart CMA evolution strategy with increasing population size. In Proc. IEEE Congress on Evolutionary Computation, pages 1769–1776, 2005.

[2] N. Hansen and S. Kern. Evaluating the CMA Evolution Strategy on multimodal test functions. In X. Yao et al., editors, Parallel Problem Solving from Nature (PPSN VIII), volume 3242 of LNCS, pages 282–291. Springer, 2004.

[3] N. Hansen, S. D. Müller, and P. Koumoutsakos. Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation. Evolutionary Computation, 11(1):1–18, 2003.

[4] N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001.

[5] N. Hansen and R. Ros. Benchmarking a weighted negative covariance matrix update on the BBOB-2010 noiseless testbed. In GECCO 2010: Proceedings of the 12th Annual Conference Companion on Genetic and Evolutionary Computation, pages 1673–1680, Portland, United States, 2010.

[6] N. Hansen. Adaptive encoding: How to render search coordinate system invariant. In G. Rudolph et al., editors, Parallel Problem Solving from Nature (PPSN X), LNCS, pages 205–214, 2008.

[7] G. A. Jastrebski and D. V. Arnold. Improving evolution strategies through active covariance matrix adaptation. In IEEE Congress on Evolutionary Computation (CEC 2006), pages 2814–2821, 2006.
