Bias and Variance in Continuous EDA: massively parallel continuous optimization


DESCRIPTION

Parallel continuous derivative-free optimization

TRANSCRIPT

Page 1: Bias and Variance in Continuous EDA: massively parallel continuous optimization

F. Teytaud, O. Teytaud

EA, Strasbourg 2009

TAO, Inria Saclay Île-de-France, LRI (Université Paris Sud, France), UMR CNRS 8623, I&A team, Digiteo, Pascal Network of Excellence

Page 2: Outline

Introduction

Main step-size adaptation rules

State of the art

Experimental results

Conclusions

Page 3: Evolutionary algorithms are parallel

Straightforward parallelization: if the population size is λ, then parallelization is linear up to λ processors.

But are there algorithms which really benefit from large λ?
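As an illustration of this straightforward parallelization (not taken from the slides; the objective, λ, and dimension below are placeholder choices), here is a minimal Python sketch in which the λ offspring of one generation are evaluated concurrently, so that with at least λ processors the wall-clock cost of a generation is roughly one fitness evaluation:

import numpy as np
from multiprocessing import Pool

def fitness(x):
    # Placeholder objective: the sphere function.
    return float(np.sum(x ** 2))

def evaluate_generation(offspring, n_workers):
    # Each offspring goes to its own worker; with n_workers >= len(offspring),
    # all fitness evaluations of the generation run in parallel.
    with Pool(n_workers) as pool:
        return pool.map(fitness, offspring)

if __name__ == "__main__":
    lam, dim = 8, 10
    rng = np.random.default_rng(0)
    offspring = [rng.normal(size=dim) for _ in range(lam)]
    print(evaluate_generation(offspring, n_workers=lam))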

Page 4: Goal of this paper

Restrict our attention to continuous domains;

Restrict our attention to unconstrained problems;

Restrict our attention to convergence rate (monomodal problems);

Restrict our attention to algorithms without covariance adaptation (but all algorithms can be generalized to covariance);

Analyze the speed-up as a function of λ, assuming at least λ processors.

Page 5: Outline

Introduction

Main step-size adaptation rules

State of the art

Experimental results

Conclusions

Page 6: The main rules for step-size adaptation

While I have time {
    Generate λ points x1, ..., xλ distributed as N(x, σ)
    Evaluate the fitness at x1, ..., xλ
    Update x, update σ
}

Main trouble: choosing σ

- Cumulative step-size adaptation (CSA)

- Mutative self-adaptation (SA)

- Estimation of Multivariate Normal Algorithm (EMNA)
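To make the loop above concrete, here is a hedged Python sketch (an assumption in the spirit of the slides, not the exact algorithm studied in the paper) using the simplest of the three rules, an EMNA-flavoured update: keep the µ best of the λ sampled points, move x to their mean, and re-estimate σ from their spread.

import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))

def emna_like(dim=10, lam=100, mu=25, iterations=200, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=dim)   # current search point (mean of the Gaussian)
    sigma = 1.0                # current isotropic step-size
    for _ in range(iterations):
        # Generate lambda points x1, ..., x_lambda distributed as N(x, sigma).
        points = x + sigma * rng.normal(size=(lam, dim))
        # Evaluate the fitness at x1, ..., x_lambda.
        fit = np.array([sphere(p) for p in points])
        # Keep the mu best points, update x, update sigma.
        best = points[np.argsort(fit)[:mu]]
        x = best.mean(axis=0)
        sigma = float(np.sqrt(np.mean((best - x) ** 2)))
    return x, sigma

print(emna_like())

Note that re-estimating σ from the spread of the selected points around their own new mean, as in this sketch, is the kind of update known to be prone to premature convergence; this is what the trick mentioned on the next slides addresses.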

Page 7: Main algorithms

Page 8: Main algorithms

Page 9: Main algorithms

Page 10: Main algorithms

We now have a simple and proven trick against premature convergence (GECCO paper).
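The slide does not spell the trick out; as a loudly hedged illustration only (an assumption about the general flavour of such corrections, not necessarily the correction proved in the GECCO paper), one classical way to fight premature convergence in EMNA-like algorithms is to measure the spread of the selected points around the previous mean rather than around their own mean, which removes part of the downward bias in the step-size estimate:

import numpy as np

def step_size_estimate(best, old_mean, new_mean, around_old_mean=True):
    # best: the mu selected points, shape (mu, dim).
    # Measuring the spread around the old mean (rather than the new one)
    # gives a larger, less downward-biased step-size estimate.
    reference = old_mean if around_old_mean else new_mean
    return float(np.sqrt(np.mean((best - reference) ** 2)))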

Page 11: Outline

Introduction

Main step-size adaptation rules

State of the art

Experimental results

Conclusions

Page 12: Results from Beyer and Sendhoff

Cumulative step-size adaptation <== not very good for large λ

Mutative self-adaptation <== much better (+ covariance possible)

Estimation of Multivariate Normal Algorithm <== ? ? ?

Page 13: Outline

Introduction

Main step-size adaptation rules

State of the art

Experimental results

Conclusions

Page 14: First, we confirm results from Beyer and Sendhoff

(sphere function; see Beyer and Sendhoff for more)

Page 15: EMNA or SSA on the sphere

Page 16: EMNA or SSA on the sphere

Plot annotations: twice, three times, 2.5 times, and 2.5 times faster than CSA.

Page 17: EMNA or SSA on the sphere

Plot annotations: 80%, 30%, 13%, and 13% faster than SA.

Page 18: Anisotropic version: one step-size per axis

Page 19: Anisotropic version: one step-size per axis

We recover, on the Cigar and Schwefel functions, results similar to those on the sphere.
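A minimal sketch (assumed details, not code from the paper) of the anisotropic variant discussed on these two slides: σ becomes a vector with one entry per axis, each entry re-estimated from the per-coordinate spread of the µ selected points, and sampling uses a diagonal covariance.

import numpy as np

def anisotropic_update(best):
    # best: the mu selected points, shape (mu, dim).
    new_x = best.mean(axis=0)                               # updated mean
    sigma = np.sqrt(np.mean((best - new_x) ** 2, axis=0))   # one step-size per axis
    return new_x, sigma

# Sampling then broadcasts the per-axis step-sizes (diagonal covariance):
#   points = x + sigma * rng.normal(size=(lam, dim))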

Page 20: Outline

Introduction

Main step-size adaptation rules

State of the art

Experimental results

Conclusions

Page 21: Conclusions, 1/2: EMNA is great for parallelism

Continuous spaces, unconstrained optimization, from the point of view of convergence rate.

Simple algorithm, similar to EMNA (trick against premature convergence); simpler than SA or CSA.

Parallel performance: EMNA > SA >> CSA.

Straightforward covariance-based version

Parameter-free (but: should µ/λ be smaller when λ is large?)

Straightforward fault-tolerance (important for grids/clouds!)
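To illustrate the fault-tolerance remark (an assumed sketch, not from the slides): selection only needs whichever fitness values actually come back, so offspring whose evaluation was lost on a failed or slow node can simply be dropped from the current generation.

import numpy as np

def select_survivors(points, fitnesses, mu):
    # fitnesses[i] is None when the evaluation of points[i] failed or timed out.
    returned = [i for i, f in enumerate(fitnesses) if f is not None]
    returned.sort(key=lambda i: fitnesses[i])   # best (smallest) fitness first
    return np.array([points[i] for i in returned[:mu]])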

Page 22: Conclusions, 2/2: fundamental issues

As known since Beyer, 2001, (1, λ) is far less parallel than (µ/µ, λ).

The higher the dimension, the better the speed-up (consistent with Fournier et al., PPSN08).

The number of usefully exploited processors can grow linearly with the dimension.