Bias and Variance in Continuous EDA: massively parallel continuous optimization


DESCRIPTION

Parallel continuous derivative-free optimization

TRANSCRIPT

Page 1: Bias and Variance in Continuous EDA: massively parallel continuous optimization

F. Teytaud, O. Teytaud

EA, Strasbourg 2009

TAO, Inria Saclay Île-de-France, LRI (Université Paris Sud, France), UMR CNRS 8623, I&A team, Digiteo, Pascal Network of Excellence

Page 2: Outline

Introduction

Main step-size adaptation rules

State of the art

Experimental results

Conclusions

Page 3: Evolutionary algorithms are parallel

Straightforward parallelization: if the population size is λ, then parallelization is linear up to λ processors.

But are there algorithms which really benefit from large λ?
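As an illustration of this straightforward parallelization (not taken from the slides; the objective, λ, and dimension below are placeholder choices), here is a minimal Python sketch in which the λ offspring of one generation are evaluated concurrently, so that with at least λ processors the wall-clock cost of a generation is roughly one fitness evaluation:

import numpy as np
from multiprocessing import Pool

def fitness(x):
    # Placeholder objective: the sphere function.
    return float(np.sum(x ** 2))

def evaluate_generation(offspring, n_workers):
    # Each offspring goes to its own worker; with n_workers >= len(offspring),
    # all fitness evaluations of the generation run in parallel.
    with Pool(n_workers) as pool:
        return pool.map(fitness, offspring)

if __name__ == "__main__":
    lam, dim = 8, 10
    rng = np.random.default_rng(0)
    offspring = [rng.normal(size=dim) for _ in range(lam)]
    print(evaluate_generation(offspring, n_workers=lam))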

Page 4: Goal of this paper

Restrict our attention to continuous domains;

Restrict our attention to unconstrained problems;

Restrict our attention to convergence rate (monomodal problems);

Restrict our attention to algorithms without covariance adaptation (but all algorithms can be generalized to covariance);

Analyze the speed-up as a function of λ, assuming at least λ processors.

Page 5: Outline

Introduction

Main step-size adaptation rules

State of the art

Experimental results

Conclusions

Page 6: The main rules for step-size adaptation

While I have time {
    Generate λ points x1, ..., xλ distributed as N(x, σ)
    Evaluate the fitness at x1, ..., xλ
    Update x, update σ
}

Main trouble: choosing σ

- Cumulative step-size adaptation (CSA)

- Mutative self-adaptation (SA)

- Estimation of Multivariate Normal Algorithm (EMNA)
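To make the loop above concrete, here is a hedged Python sketch (an assumption in the spirit of the slides, not the exact algorithm studied in the paper) using the simplest of the three rules, an EMNA-flavoured update: keep the µ best of the λ sampled points, move x to their mean, and re-estimate σ from their spread.

import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))

def emna_like(dim=10, lam=100, mu=25, iterations=200, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=dim)   # current search point (mean of the Gaussian)
    sigma = 1.0                # current isotropic step-size
    for _ in range(iterations):
        # Generate lambda points x1, ..., x_lambda distributed as N(x, sigma).
        points = x + sigma * rng.normal(size=(lam, dim))
        # Evaluate the fitness at x1, ..., x_lambda.
        fit = np.array([sphere(p) for p in points])
        # Keep the mu best points, update x, update sigma.
        best = points[np.argsort(fit)[:mu]]
        x = best.mean(axis=0)
        sigma = float(np.sqrt(np.mean((best - x) ** 2)))
    return x, sigma

print(emna_like())

Note that re-estimating σ from the spread of the selected points around their own new mean, as in this sketch, is the kind of update known to be prone to premature convergence; this is what the trick mentioned on the next slides addresses.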

Page 7: Main algorithms

Page 8: Main algorithms

Page 9: Main algorithms

Page 10: Main algorithms

We now have a simple and proven trick against premature convergence (GECCO paper).
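The slide does not spell the trick out; as a loudly hedged illustration only (an assumption about the general flavour of such corrections, not necessarily the correction proved in the GECCO paper), one classical way to fight premature convergence in EMNA-like algorithms is to measure the spread of the selected points around the previous mean rather than around their own mean, which removes part of the downward bias in the step-size estimate:

import numpy as np

def step_size_estimate(best, old_mean, new_mean, around_old_mean=True):
    # best: the mu selected points, shape (mu, dim).
    # Measuring the spread around the old mean (rather than the new one)
    # gives a larger, less downward-biased step-size estimate.
    reference = old_mean if around_old_mean else new_mean
    return float(np.sqrt(np.mean((best - reference) ** 2)))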

Page 11: Outline

Introduction

Main step-size adaptation rules

State of the art

Experimental results

Conclusions

Page 12: Results from Beyer and Sendhoff

Cumulative step-size adaptation <== not very good for large λ

Mutative self-adaptation <== much better (+ covariance possible)

Estimation of Multivariate Normal Algorithm <== ? ? ?

Page 13: Outline

Introduction

Main step-size adaptation rules

State of the art

Experimental results

Conclusions

Page 14: First, we confirm results from Beyer and Sendhoff

(sphere function; see Beyer and Sendhoff for more)

Page 15: EMNA or SSA on the sphere

Page 16: EMNA or SSA on the sphere

Plot annotations: twice, three times, 2.5 times, and 2.5 times faster than CSA.

Page 17: EMNA or SSA on the sphere

Plot annotations: 80%, 30%, 13%, and 13% faster than SA.

Page 18: Anisotropic version: one step-size per axis

Page 19: Anisotropic version: one step-size per axis

We recover, on the Cigar and Schwefel functions, results similar to those on the sphere.
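A minimal sketch (assumed details, not code from the paper) of the anisotropic variant discussed on these two slides: σ becomes a vector with one entry per axis, each entry re-estimated from the per-coordinate spread of the µ selected points, and sampling uses a diagonal covariance.

import numpy as np

def anisotropic_update(best):
    # best: the mu selected points, shape (mu, dim).
    new_x = best.mean(axis=0)                               # updated mean
    sigma = np.sqrt(np.mean((best - new_x) ** 2, axis=0))   # one step-size per axis
    return new_x, sigma

# Sampling then broadcasts the per-axis step-sizes (diagonal covariance):
#   points = x + sigma * rng.normal(size=(lam, dim))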

Page 20: Outline

Introduction

Main step-size adaptation rules

State of the art

Experimental results

Conclusions

Page 21: Conclusions, 1/2: EMNA is great for parallelism

Continuous spaces, unconstrained optimization, from the point of view of convergence rate.

Simple algorithm, similar to EMNA (trick against premature convergence); simpler than SA or CSA.

Parallel performance: EMNA > SA >> CSA.

Straightforward covariance-based version

Parameter-free (but: should µ/λ be smaller when λ is large?)

Straightforward fault-tolerance (important for grids/clouds!)
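To illustrate the fault-tolerance remark (an assumed sketch, not from the slides): selection only needs whichever fitness values actually come back, so offspring whose evaluation was lost on a failed or slow node can simply be dropped from the current generation.

import numpy as np

def select_survivors(points, fitnesses, mu):
    # fitnesses[i] is None when the evaluation of points[i] failed or timed out.
    returned = [i for i, f in enumerate(fitnesses) if f is not None]
    returned.sort(key=lambda i: fitnesses[i])   # best (smallest) fitness first
    return np.array([points[i] for i in returned[:mu]])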

Page 22: Conclusions, 2/2: fundamental issues

As known since Beyer, 2001, (1, λ) is far less parallel than (µ/µ, λ).

The higher the dimension, the better the speed-up (consistent with Fournier et al., PPSN08).

The number of usefully exploited processors can grow linearly with the dimension.