Black-box optimization
Support Vector Machines
Self-Adaptive Surrogate-Assisted CMA-ES
Self-Adaptive Surrogate-Assisted Covariance Matrix Adaptation Evolution Strategy
Ilya Loshchilov, Marc Schoenauer, Michèle Sebag
TAO
INRIA − CNRS − Univ. Paris-Sud
Ilya Loshchilov, Marc Schoenauer, Michèle Sebag Surrogate optimization: SVM for CMA 1/ 27
Motivations
Find Argmin {F : X → R}
Context: ill-posed continuous optimization problems
Function F (fitness function) on X ⊂ R^d
Gradient not available or not useful
F available as an oracle (black box)
Build {x1, x2, . . .} → Argmin(F)
Black-box approaches
+ Robust
− High computational cost: number of expensive function evaluations (e.g. CFD)
Surrogate-Assisted Optimization
Principle
Gather E = {(xi, F(xi))} (training set)
Build F̂ from E (learn surrogate model)
Use surrogate model F̂ for some time:
Optimization: use F̂ instead of the true F in a standard algorithm
Filtering: select promising xi based on F̂ in a population-based algorithm
Compute F(xi) for some xi
Update F̂; iterate
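The loop above can be sketched in a few lines of Python. Everything here is an illustrative stand-in: the true objective is a sphere function, the surrogate is a crude nearest-neighbour predictor (the talk uses a rank-SVM instead), and all constants are arbitrary. The point is the structure: learn F̂ from the archive, rank candidates with F̂, and spend true evaluations only on the promising ones.

```python
import random

def f_true(x):
    """Expensive black-box objective; a sphere function stands in here."""
    return sum(xi * xi for xi in x)

def fit_surrogate(archive):
    """Toy surrogate: predict F(x) as the value of the nearest archived point."""
    def f_hat(x):
        def d2(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return min(archive, key=lambda p: d2(p[0], x))[1]
    return f_hat

def surrogate_assisted_search(dim=2, lam=10, mu=3, iters=30, seed=1):
    rng = random.Random(seed)
    m = [rng.uniform(-5, 5) for _ in range(dim)]
    archive = [(m, f_true(m))]                    # training set E = {(xi, F(xi))}
    true_evals = 1
    for _ in range(iters):
        f_hat = fit_surrogate(archive)            # learn surrogate model from E
        cand = [[mi + rng.gauss(0, 0.5) for mi in m] for _ in range(lam)]
        cand.sort(key=f_hat)                      # filtering: rank candidates by F-hat
        for x in cand[:mu]:                       # true F only on promising points
            archive.append((x, f_true(x)))
            true_evals += 1
        m = min(archive, key=lambda p: p[1])[0]   # recentre on the best point so far
    best = min(archive, key=lambda p: p[1])
    return best, true_evals

(best_x, best_y), evals = surrogate_assisted_search()
print(best_y, evals)   # 91 true evaluations instead of 1 + 30*10 = 301
```

With mu = 3 of lam = 10 candidates evaluated per iteration, the loop spends 91 expensive evaluations where a plain population-based search would spend 301.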
Surrogate-Assisted Optimization (cont.)
Issues
Learning
Hypothesis space (polynomials, neural nets, Gaussian processes, ...)
Selection of training set (prune, update, ...)
What is the learning quality indicator?
Interaction of Learning & Optimization modules
Schedule (when to relearn)
∗ How to use F̂ to support optimization search
∗∗ How to use search results to support learning F̂
This talk
∗ Using Covariance-Matrix Estimation within Support Vector
Machines
∗∗ Self-adaptation of surrogate model hyper-parameters
Content
1 Black-box optimization
Covariance Matrix Adaptation Evolution Strategy
2 Support Vector Machines
Support Vector Machine (SVM)
Rank-based SVM
3 Self-Adaptive Surrogate-Assisted CMA-ES
Algorithm
Results
Covariance Matrix Adaptation Evolution Strategy
(µ, λ)-Covariance Matrix Adaptation Evolution Strategy
Rank-µ Update
Sampling of λ solutions:
yi ∼ N(0, C), with C = I initially
xi = m + σ yi, with σ = 1

Calculating C from the best µ out of λ:
Cµ = (1/µ) Σi yi:λ yi:λ^T
C ← (1 − cµ) C + cµ Cµ, with learning rate cµ

New distribution:
mnew ← m + σ (1/µ) Σi yi:λ

Ruling principles: (i) the adaptation increases the probability of successful steps to appear again; (ii) C ≈ H^−1, the inverse Hessian of F.
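The sampling and rank-µ update above can be sketched in plain Python for the 2-D case. The learning rate c_mu and the population constants below are illustrative choices, not the tuned CMA-ES defaults, and step-size adaptation is omitted entirely:

```python
import math, random

def cholesky2(C):
    """Cholesky factor L (C = L L^T) of a 2x2 symmetric positive-definite matrix."""
    a, b, c = C[0][0], C[0][1], C[1][1]
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def one_generation(f, m, sigma, C, lam=12, mu=3, c_mu=0.3, rng=random):
    """One (mu, lambda) generation with a rank-mu covariance update."""
    L = cholesky2(C)
    ys, xs = [], []
    for _ in range(lam):                          # sample lambda solutions y ~ N(0, C)
        z = [rng.gauss(0, 1), rng.gauss(0, 1)]
        y = [L[0][0] * z[0], L[1][0] * z[0] + L[1][1] * z[1]]
        ys.append(y)
        xs.append([m[0] + sigma * y[0], m[1] + sigma * y[1]])
    order = sorted(range(lam), key=lambda i: f(xs[i]))[:mu]   # best mu out of lambda
    # rank-mu estimate C_mu = (1/mu) * sum of y y^T over the selected steps
    Cmu = [[0.0, 0.0], [0.0, 0.0]]
    for i in order:
        for r in range(2):
            for s in range(2):
                Cmu[r][s] += ys[i][r] * ys[i][s] / mu
    C_new = [[(1 - c_mu) * C[r][s] + c_mu * Cmu[r][s] for s in range(2)]
             for r in range(2)]
    m_new = [m[d] + sigma * sum(ys[i][d] for i in order) / mu for d in range(2)]
    return m_new, C_new

rng = random.Random(0)
m, C = [3.0, 2.0], [[1.0, 0.0], [0.0, 1.0]]
sphere = lambda x: x[0] ** 2 + x[1] ** 2
for _ in range(40):
    m, C = one_generation(sphere, m, 0.3, C, rng=rng)
print(m, C)   # the mean drifts toward the optimum; C stays symmetric positive-definite
```

Because C_mu is a sum of outer products of the selected steps, the update keeps C symmetric and positive-definite while reshaping it toward the directions of recent success.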
Invariance: Guarantee for Generalization
Invariance properties of CMA-ES
Invariance to order-preserving transformations in function space (true for all comparison-based algorithms)
Translation and rotation invariance, thanks to C
CMA-ES is almost parameterless
Tuning on a small set of functions (Hansen & Ostermeier 2001)
Default values generalize to whole classes
Exception: population size for multi-modal functions
try restarts (see our PPSN’2012 paper)
State-of-the-art Results
BBOB – Black-Box Optimization Benchmarking
ACM-GECCO workshop, in 2009, 2010 and 2012
Set of 24 benchmark functions, dimensions 2 to 40
With known difficulties (ill-conditioning, non-separability, . . . )
Noisy and non-noisy versions
Competitors include
BFGS (Matlab version),
DFO (Derivative-Free Optimization, Powell 04)
Differential Evolution
Particle Swarm
Optimization
and many more
Support Vector Machine for Classification: Linear Classifier
[Figure: linear classifier with separating hyperplanes 〈w, x〉 − b = ±1, margin 2/‖w‖, and support vectors]
Main Idea
Training Data:
D = {(xi, yi) | xi ∈ R^d, yi ∈ {−1, +1}}, i = 1 . . . n
〈w, xi〉 − b ≥ +1 ⇒ yi = +1; 〈w, xi〉 − b ≤ −1 ⇒ yi = −1
Optimization Problem: Primal Form
Minimize over {w, ξ}: (1/2)‖w‖^2 + C Σi ξi
subject to: yi(〈w, xi〉 − b) ≥ 1 − ξi, ξi ≥ 0
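As a concrete (if naive) way to attack this primal problem, one can run subgradient descent on the equivalent hinge-loss objective (1/2)‖w‖^2 + C Σi max(0, 1 − yi(〈w, xi〉 − b)). The learning rate and epoch count below are arbitrary choices, and the talk's setting solves the dual with a QP solver instead; this is only a sketch of what the primal form optimizes:

```python
import random

def train_linear_svm(data, C=1.0, lr=0.01, epochs=200, seed=0):
    """Subgradient descent on (1/2)||w||^2 + C * sum_i max(0, 1 - y_i(<w,x_i> - b))."""
    rng = random.Random(seed)
    data = list(data)                 # avoid mutating the caller's list
    d = len(data[0][0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) - b)
            if margin < 1:            # hinge active: this point has slack xi_i > 0
                w = [wi - lr * (wi - C * y * xi) for wi, xi in zip(w, x)]
                b -= lr * C * y
            else:                     # only the regulariser (1/2)||w||^2 contributes
                w = [wi - lr * wi for wi in w]
    return w, b

# two well-separated clusters; the decision is sign(<w, x> - b)
data = [([2.0, 2.0], 1), ([3.0, 2.0], 1), ([2.0, 3.0], 1),
        ([-2.0, -2.0], -1), ([-3.0, -2.0], -1), ([-2.0, -3.0], -1)]
w, b = train_linear_svm(data)
print(w, b)
```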
Support Vector Machine for Classification: Linear Classifier
Optimization Problem: Dual Form
Quadratic in the Lagrange multipliers:
Maximize over {α}: Σi αi − (1/2) Σi,j αi αj yi yj 〈xi, xj〉
subject to: 0 ≤ αi ≤ C, Σi αi yi = 0
Properties
Decision Function: F̂(x) = sign(Σi αi yi 〈xi, x〉 − b)
The dual form may be solved using a standard quadratic programming solver.
Support Vector Machine for Classification: Non-Linear Classifier
[Figure: mapping Φ sends the data into a feature space where 〈w, Φ(x)〉 − b = ±1 defines a linear separator with margin 2/‖w‖; the decision boundary is non-linear in the input space]
Non-linear classification with the "Kernel trick"
Maximize over {α}: Σi αi − (1/2) Σi,j αi αj yi yj K(xi, xj)
subject to: 0 ≤ αi ≤ C, Σi αi yi = 0,
where K(x, x′) =def 〈Φ(x), Φ(x′)〉 is the kernel function (Φ must be chosen such that K is positive semi-definite)
Decision Function: F̂(x) = sign(Σi αi yi K(xi, x) − b)
Support Vector Machine for Classification: Non-Linear Classifier (Kernels)
Gaussian or Radial Basis Function (RBF): K(xi, xj) = exp(−‖xi − xj‖^2 / (2σ^2))
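A minimal implementation of the RBF kernel:

```python
import math

def rbf_kernel(xi, xj, sigma=1.0):
    """K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2)); the minus sign in the
    exponent makes K(x, x) = 1 the maximum, decaying with distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-d2 / (2.0 * sigma ** 2))

print(rbf_kernel([0, 0], [0, 0]))   # 1.0
print(rbf_kernel([0, 0], [3, 4]))   # exp(-12.5), about 3.7e-06
```

The bandwidth σ controls how quickly similarity decays; it is one of the problem-dependent hyper-parameters the talk later self-adapts.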
Regression To Value or Regression To Rank? Why Comparison-Based Surrogates?
Example
F(x) = x1^2 + (x1 + x2)^2.
An efficient Evolutionary Algorithm (EA) with surrogate models may be 4.3 times faster on F(x).
But the same EA is only 2.4 times faster on G(x) = F(x)^{1/4}! (CMA-ES with quadratic meta-model (lmm-CMA-ES) on fSchwefel 2-D)
Comparison-based surrogate models → invariance to rank-preserving transformations of F(x)!
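This invariance is easy to check directly: a comparison-based selection step returns exactly the same points under F and under a rank-preserving transformation such as G = F^{1/4}. A small Python check on the example function:

```python
def select_best(points, f, mu):
    """Comparison-based selection: keep the mu best points under f."""
    return sorted(points, key=f)[:mu]

F = lambda x: x[0] ** 2 + (x[0] + x[1]) ** 2
G = lambda x: F(x) ** 0.25            # rank-preserving transformation of F

pts = [(1.0, -2.0), (0.5, 0.5), (-1.0, 1.2), (0.1, -0.1), (2.0, 0.0)]
print(select_best(pts, F, 2) == select_best(pts, G, 2))   # True
```

A value-regression surrogate, by contrast, would have to fit two very differently shaped landscapes, which is why its speed-up degrades on G.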
Rank-based Support Vector Machine
Find F̂ (x) which preserves the ordering of training/test points
On training set E = {xi, i = 1 . . . n}, an expert gives preferences:
(xik ≻ xjk), k = 1 . . . K
(underconstrained regression)
Order constraints, primal form:
Minimize (1/2)‖w‖^2 + C Σk ξk
subject to: ∀k, 〈w, xik〉 − 〈w, xjk〉 ≥ 1 − ξk, ξk ≥ 0
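A naive sketch of this ranking problem, again by subgradient descent on the hinge form of the order constraints. The linear model, learning rate, and epoch count are illustrative assumptions (the actual rank-SVM is kernelized and solved differently); preferred points receive the larger score 〈w, x〉:

```python
def train_rank_svm(points, prefs, C=1.0, lr=0.01, epochs=300):
    """Minimise (1/2)||w||^2 + C * sum_k max(0, 1 - (<w,x_ik> - <w,x_jk>)),
    the hinge relaxation of the order constraints <w,x_ik> - <w,x_jk> >= 1 - xi_k."""
    d = len(points[0])
    w = [0.0] * d
    for _ in range(epochs):
        for i, j in prefs:                 # preference pair: point i preferred to j
            diff = [a - b for a, b in zip(points[i], points[j])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            if margin < 1:                 # violated pair: hinge is active
                w = [wk - lr * (wk - C * dk) for wk, dk in zip(w, diff)]
            else:                          # only the regulariser contributes
                w = [wk - lr * wk for wk in w]
    return w

points = [(3.0, 0.0), (2.0, 0.0), (1.0, 0.0)]
prefs = [(0, 1), (1, 2)]                   # point 0 ≻ point 1 ≻ point 2
w = train_rank_svm(points, prefs)
print([sum(wk * xk for wk, xk in zip(w, p)) for p in points])  # decreasing scores
```

Note that only pairwise order information enters the training, which is exactly what makes the resulting surrogate invariant to rank-preserving transformations of F.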
Surrogate Models for CMA-ES
Using Rank-SVM
Build a global model using Rank-SVM:
xi ≻ xj iff F(xi) < F(xj)
Kernel and parameters highly problem-dependent
ACM Algorithm
Use C from CMA-ES in the Gaussian kernel
I. Loshchilov, M. Schoenauer, M. Sebag (2010). "Comparison-based optimizers need comparison-based surrogates"
Model Learning: Non-Separable Ellipsoid Problem
K(xi, xj) = exp(−(xi − xj)^T (xi − xj) / (2σ^2));   KC(xi, xj) = exp(−(xi − xj)^T C^−1 (xi − xj) / (2σ^2))
Invariance to rotation of the search space thanks to C ≈ H−1 !
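The rotation invariance is easy to verify numerically in the 2-D case: if the points are rotated by R and C is replaced by R C R^T, the kernel value KC is unchanged, since (R d)^T (R C R^T)^−1 (R d) = d^T C^−1 d. A small check (the particular C, points, and angle are arbitrary):

```python
import math

def inv2(C):
    """Inverse of a 2x2 matrix."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    return [[C[1][1] / det, -C[0][1] / det],
            [-C[1][0] / det, C[0][0] / det]]

def kc(xi, xj, Cinv, sigma=1.0):
    """K_C(xi, xj) = exp(-(xi-xj)^T C^{-1} (xi-xj) / (2 sigma^2)), 2-D case."""
    d = [xi[0] - xj[0], xi[1] - xj[1]]
    q = (d[0] * (Cinv[0][0] * d[0] + Cinv[0][1] * d[1])
         + d[1] * (Cinv[1][0] * d[0] + Cinv[1][1] * d[1]))
    return math.exp(-q / (2.0 * sigma ** 2))

def rot(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_vec(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]

# rotate both the points and the covariance matrix: the kernel value is unchanged
C = [[2.0, 0.5], [0.5, 1.0]]
xi, xj = [1.0, 2.0], [-0.3, 0.7]
R = rot(0.7)
Rt = [[R[0][0], R[1][0]], [R[0][1], R[1][1]]]
Cr = mat_mul(mat_mul(R, C), Rt)
print(kc(xi, xj, inv2(C)), kc(mat_vec(R, xi), mat_vec(R, xj), inv2(Cr)))
```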
Using the Surrogate Model: Direct Optimization of the Surrogate Model
Simple optimization loop: optimize F for 1 generation, then optimize F̂ for n̂ generations.
[Figure: speedup vs number of surrogate generations n̂ (0 to 20), dimension 10, for F1 Sphere, F6 AttractSector, F8 Rosenbrock, F10 RotEllipsoid, F11 Discus, F12 Cigar, F13 SharpRidge, F14 SumOfPow]
How to choose n̂?
Adaptation of Surrogate’s Life-Length
Test set: λ recently evaluated points
Model error: fraction of incorrectly ordered points
The number of generations n̂ is inversely proportional to the model error:
[Figure: n̂ as a decreasing function of the model error over the range 0 to 1.0]
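A sketch of this adaptation rule. The slides state the test set (the λ most recently evaluated points), the error measure (fraction of incorrectly ordered points), and the inverse relationship; the linear schedule and the constants n_max and err_max below are assumptions for illustration only:

```python
def model_error(f_hat, points, f_values):
    """Fraction of incorrectly ordered pairs among recently evaluated points."""
    n = len(points)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    bad = sum(1 for i, j in pairs
              if (f_hat(points[i]) < f_hat(points[j])) != (f_values[i] < f_values[j]))
    return bad / len(pairs)

def life_length(err, n_max=20, err_max=0.5):
    """Surrogate life-length n_hat: n_max generations for a perfect model,
    0 once the error reaches err_max (back to pure CMA-ES from then on)."""
    return max(0, round(n_max * (1 - min(err, err_max) / err_max)))

print(life_length(0.0), life_length(0.25), life_length(0.5))   # 20 10 0
```

A perfectly ranking surrogate is trusted for n_max generations; a surrogate no better than chance is not used at all until it is relearned.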
Sensitivity to Model Hyper-Parameters
The speed-up of surrogate-assisted CMA-ES is very sensitive to the number of training points!
[Figure: speedup vs number of training points (50 to 400), dimension 10, for F1 Sphere, F5 LinearSlope, F6 AttractSector, F8 Rosenbrock, F10 RotEllipsoid, F11 Discus, F12 Cigar, F13 SharpRidge, F14 SumOfPow, and their average]
The Proposed s∗aACM Algorithm
[Figure: schematic of the algorithm, alternating between the true F and the surrogate F̂]
Surrogate-assisted CMA-ES with online adaptation of model hyper-parameters.
Online Adaptation of Model Hyper-Parameters
[Figure: fitness and model hyper-parameters (Ntraining, Cbase, Cpow, csigma) vs number of function evaluations on F8 Rosenbrock 20-D]
Results
Self-adaptation of model hyper-parameters is better than the best offline settings! (IPOP-s∗aACM vs IPOP-aACM)
Improvements of the original CMA lead to improvements of its surrogate-assisted version (IPOP-s∗aACM vs IPOP-s∗ACM)
[Figure: fitness vs number of function evaluations on Rotated Ellipsoid 10-D and Rosenbrock 10-D for IPOP-CMA-ES, IPOP-aCMA-ES, IPOP-ACM-ES, IPOP-aACM-ES, IPOP-s∗ACM-ES and IPOP-s∗aACM-ES]
Results on Black-Box Optimization Competition
BIPOP-s∗aACM and IPOP-s∗aACM (with restarts) on 24 noiseless 20-dimensional functions
[Figure: BBOB empirical cumulative distribution on f1–f24, proportion of functions solved vs log10(ERT / dimension), comparing some 40 algorithms (RANDOMSEARCH, BFGS, NEWUOA, DE variants, PSO variants, IPOP-CMA-ES, BIPOP-CMA-ES, IPOP-saACM-ES, BIPOP-saACM-ES, ..., and the best-2009 reference)]
Machine Learning for Optimization: Discussion
s∗aACM is from 2 to 4 times faster on uni-modal problems.
Invariant to rank-preserving and orthogonal transformations of the search space: yes.
The computational complexity (the cost of the speed-up) is O(d³).
The source code is available online: https://sites.google.com/site/acmesgecco/
Open Questions
How to improve the search on multi-modal functions?
What else from Machine Learning can improve the search?
Thank you for your attention!
Questions?
Related Publications:
Joint work with Marc Schoenauer and Michèle Sebag:
"Alternative Restart Strategies for CMA-ES". Parallel Problem Solving from Nature (PPSN XII), September 2012.
"Black-box Optimization Benchmarking of IPOP-saACM-ES and BIPOP-saACM-ES on the BBOB-2012 Noiseless Testbed". GECCO'2012 BBOB Workshop, July 2012.
"Dominance-Based Pareto-Surrogate for Multi-Objective Optimization". Simulated Evolution and Learning (SEAL 2010), December 2010.
"Comparison-Based Optimizers Need Comparison-Based Surrogates". Parallel Problem Solving from Nature (PPSN XI), September 2010.
"A Pareto-Compliant Surrogate Approach for Multiobjective Optimization". GECCO'2010, July 2010.
"A Mono Surrogate for Multiobjective Optimization". GECCO'2010 EMO Workshop, July 2010.