
Page 1

Machine learning and black-box expensive optimization

Sébastien Verel

Laboratoire d'Informatique, Signal et Image de la Côte d'Opale (LISIC)
Université du Littoral Côte d'Opale, Calais, France
http://www-lisic.univ-littoral.fr/~verel/

June 18th, 2018

Outline: Introduction, Learning for optimization, Optimization for learning

Page 2

AI: Machine Learning, Optimization, perception, etc.

Learning:
Minimize an error function.
M_θ: model to learn from data.
Search θ* = arg min_θ Error(M_θ, data).

Depending on the model dimension, the variables, the error function, etc., there is a huge number of optimization algorithms.

Optimization:
Learn to design an algorithm that finds good solutions.
A_θ: search algorithm for problems (X, f).
Learn A_θ such that A_θ(X, f) = arg min_{x∈X} f(x).

Depending on the class of algorithms, the search spaces, the functions, etc., there is a huge number of learning algorithms.

Artificial: from paper to computer!
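To make the duality concrete, here is a minimal Python sketch of the learning-as-optimization view: the toy linear model, the error function, and the random search over θ are illustrative stand-ins, not material from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=100)  # toy data

def error(theta):
    # Error(M_theta, data): mean squared error of the linear model M_theta
    return np.mean((X @ theta - y) ** 2)

# Plain random search over theta: one of the many optimization algorithms
# that can be plugged in, depending on dimension, error function, etc.
theta = rng.normal(size=2)
for _ in range(2000):
    candidate = theta + 0.1 * rng.normal(size=2)
    if error(candidate) < error(theta):
        theta = candidate
print("theta* ~", theta)  # close to (2, -1)
```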

Page 8

Black-box (expensive) optimization

x −→ [black box] −→ f(x)

No information on the definition of the objective function f.

Objective function:
can be irregular, non-continuous, non-differentiable, etc.
given by a computation or an (expensive) simulation.

A few examples from the team (a minimal sketch of the setting follows the list):
• Mobility simulation (Florian Lepretre)
• Plant biology, plant growth (Amaury Dubois)
• Logistics simulation (Brahim Aboutaib)
• Cellular automata
• Nuclear power plant (Valentin Drouet)
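A minimal sketch of the black-box protocol, assuming nothing about the team's actual simulators: the optimizer may only query x → f(x), and every call is counted as if it were an expensive simulation run.

```python
import numpy as np

rng = np.random.default_rng(1)
calls = 0

def f(x):
    # Black box: stand-in for an expensive simulation run.
    global calls
    calls += 1
    return np.sum((x - 0.3) ** 2)  # definition hidden from the optimizer

# (1+1) search: only queries f, never its gradient or definition.
x = rng.uniform(size=5)
fx = f(x)
for _ in range(200):
    x2 = x + 0.05 * rng.normal(size=5)
    fx2 = f(x2)
    if fx2 <= fx:
        x, fx = x2, fx2
print(f"best f = {fx:.4f} after {calls} expensive calls")
```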

Page 9

Real-world black-box expensive optimization
PhD theses of Mathieu Muniglia (2014-2017) and Valentin Drouet (2017-2020), CEA, Paris

x −→ [multi-physics simulator] −→ f(x)
(73, . . . , 8) −→ [multi-physics simulator] −→ ∆z_P

Expensive optimization: parallel computing and surrogate models.


Page 12

Adaptive distributed optimization algorithms
Christopher Jankee, Bilel Derbel, Cyril Fonlupt

[figure: compute nodes of the distributed system, each holding a local value]

Portfolio of algorithms: control of the algorithm during optimization.

How to select an algorithm? Design reinforcement learning methods for distributed computing (ε-greedy, adaptive pursuit, UCB, ...).

How to compute a reward? An aggregation function of the local rewards (mean, max, etc.) for a global selection (sketched below).

Methodology: use benchmark functions designed with controlled properties, and experimental analysis.
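A minimal sketch of this selection loop, under assumed names (run_on_nodes, the three operators) rather than the authors' implementation: ε-greedy chooses the next algorithm from the portfolio, and the mean of the local per-node rewards serves as the aggregated global reward.

```python
import random

portfolio = ["bit-flip", "1-point-crossover", "restart"]  # hypothetical operators
q = {a: 0.0 for a in portfolio}   # estimated reward per algorithm
n = {a: 0 for a in portfolio}
epsilon = 0.1

def run_on_nodes(algo, nodes=8):
    # Stand-in for running `algo` on the distributed nodes; returns the
    # local fitness improvements observed on each node.
    return [random.random() * (1.5 if algo == "bit-flip" else 1.0)
            for _ in range(nodes)]

for step in range(100):
    if random.random() < epsilon:          # explore
        algo = random.choice(portfolio)
    else:                                  # exploit
        algo = max(portfolio, key=q.get)
    local_rewards = run_on_nodes(algo)
    reward = sum(local_rewards) / len(local_rewards)  # aggregation: mean
    n[algo] += 1
    q[algo] += (reward - q[algo]) / n[algo]           # incremental mean update
print(q)
```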

Page 13

Features to learn: multi-objective fitness landscapes
K. Tanaka, H. Aguirre (Univ. Shinshu), A. Liefooghe, B. Derbel (Univ. Lille), 2010-2018

Fitness landscape: the triplet (X, f, N) of search space, objective function, and neighborhood relation.

[figure: three scatter plots in objective space (Objective 1 vs. Objective 2): conflicting objectives, independent objectives, correlated objectives]

[figure: heatmap of pairwise Kendall's tau correlations between landscape features (hv, nhv, hvd, #sup, #inf, #inc, #lnd, #lsupp, length, f_cor, rho, computed on random walks (rws) and adaptive walks (aws)) and the parameter k_n, with a histogram of the correlation values]

Performance prediction for GSEMO (cross-validation):

feature set      MAE       MSE       R2        rank
all              0.007781  0.000118  0.951609  1
enumeration      0.008411  0.000142  0.943046  2
sampling all     0.009113  0.000161  0.932975  3
sampling rws     0.009284  0.000167  0.930728  4
sampling aws     0.010241  0.000195  0.917563  5
{r, m, n, k/n}   0.010609  0.000215  0.911350  6
{r, m, n}        0.026974  0.001123  0.518505  7
{m, n}           0.032150  0.001545  0.340715  8
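As an illustration of how such features are sampled, here is a sketch of one random-walk feature in the spirit of f_cor_rws (the lag-1 autocorrelation of fitness along a random walk); the linear pseudo-Boolean landscape below is a toy stand-in, not one of the benchmark problems.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
w = rng.normal(size=n)

def f(x):
    return float(w @ x)  # toy pseudo-Boolean objective

def random_walk(length=1000):
    x = rng.integers(0, 2, size=n)
    fits = [f(x)]
    for _ in range(length):
        j = rng.integers(n)   # neighborhood N: one-bit flips
        x[j] ^= 1
        fits.append(f(x))
    return np.array(fits)

fits = random_walk()
f_cor = np.corrcoef(fits[:-1], fits[1:])[0, 1]  # lag-1 autocorrelation
print("random-walk fitness correlation ~", round(f_cor, 3))
```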

Page 14

Learning/tuning parameters according to features


Page 16

Surrogate model for pseudo-Boolean functions

Goal: replace/learn the (expensive) objective function with a (cheap) meta-model during the optimization process.

Continuous optimization: NN, Gaussian Process (kriging); EGO: sample the next solution with maximum expected improvement.

GP: a collection of random variables with a joint Gaussian distribution.
mean: m(y(x)) = µ
covariance: cov(y(x), y(x′)) = exp(−θ d(x, x′)^p)

From: Rasmussen and Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
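A minimal sketch of one EGO iteration under these definitions, using scikit-learn's GaussianProcessRegressor with its default kernel rather than the exact covariance above: fit the GP on the evaluated points, then pick the candidate maximizing expected improvement.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(3)
f = lambda x: np.sin(3 * x) + x**2           # stand-in for the expensive f
X = rng.uniform(-1, 1, size=(5, 1))          # initial design
y = f(X).ravel()

gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
cand = np.linspace(-1, 1, 200).reshape(-1, 1)
mu, sigma = gp.predict(cand, return_std=True)

best = y.min()                               # minimization
z = (best - mu) / np.maximum(sigma, 1e-12)
ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
x_next = cand[np.argmax(ei)]                 # next point to simulate
print("next expensive evaluation at x =", x_next)
```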

Page 17

Surrogate model for pseudo-Boolean functions (continued)

Proposition

Walsh function basis: for all x ∈ {0,1}^n,
    φ_k(x) = (−1)^(Σ_{j=0}^{n−1} k_j x_j)

Exact Walsh expansion:
    f(x) = Σ_{k=0}^{2^n − 1} w_k · φ_k(x),  with  w_k = (1/2^n) Σ_{x∈{0,1}^n} f(x) · φ_k(x)

Surrogate model, truncated at order d:
    f̂(x) = Σ_{k : o(φ_k) ≤ d} w_k · φ_k(x)

Estimate the coefficients w_k with LARS.

[figure: mean absolute error of fitness vs. sample size (0 to 500), comparing the kriging and Walsh surrogate methods]
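A minimal sketch of the proposition, with an assumed toy objective: enumerate the Walsh basis functions of order at most d, and estimate the sparse coefficients w_k with scikit-learn's LARS on a small sample.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import Lars

rng = np.random.default_rng(4)
n, d = 10, 2

def walsh_features(X):
    # phi_k(x) = (-1)^(sum_j k_j x_j) for every k with at most d ones.
    cols = [np.ones(len(X))]                        # order 0
    for order in range(1, d + 1):
        for idx in combinations(range(n), order):
            cols.append((-1.0) ** X[:, idx].sum(axis=1))
    return np.column_stack(cols)

f = lambda X: X.sum(axis=1) + 3 * X[:, 0] * X[:, 1]  # toy pseudo-Boolean f
X = rng.integers(0, 2, size=(150, n))
model = Lars(n_nonzero_coefs=25).fit(walsh_features(X), f(X))

X_test = rng.integers(0, 2, size=(1000, n))
mae = np.abs(model.predict(walsh_features(X_test)) - f(X_test)).mean()
print("surrogate MAE ~", round(mae, 3))
```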

Page 18

Energy surface of deep learning problems

To train a deep NN: search a high-dimensional parameter space, minimizing the error with variants of stochastic gradient descent.
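For concreteness, a minimal numpy sketch of that training loop: plain mini-batch SGD on a tiny two-layer network; the data, architecture, and step size are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(256, 10))
y = np.tanh(X @ rng.normal(size=10))             # toy targets

W1 = rng.normal(size=(10, 32)) * 0.1
W2 = rng.normal(size=32) * 0.1
lr = 0.1
for step in range(500):
    i = rng.integers(0, 256, size=32)            # mini-batch
    h = np.tanh(X[i] @ W1)
    pred = h @ W2
    err = pred - y[i]
    # backpropagation through the two layers
    gW2 = h.T @ err / len(i)
    gW1 = X[i].T @ ((err[:, None] * W2) * (1 - h**2)) / len(i)
    W1 -= lr * gW1                               # vanilla SGD update
    W2 -= lr * gW2
print("final mse ~", np.mean((np.tanh(X @ W1) @ W2 - y) ** 2))
```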

Why does it work? Depth improves expressiveness but complicates optimization.

What is the shape of the energy surface?

A. Choromanska, M. Henaff, M. Mathieu, G. Ben Arous, and Y. LeCun. "The loss surfaces of multilayer networks." In Artificial Intelligence and Statistics, pp. 192-204, 2015.
P. Chaudhari, A. Choromanska, S. Soatto, Y. LeCun, C. Baldassi, C. Borgs, J. Chayes, L. Sagun, and R. Zecchina. "Entropy-SGD: biasing gradient descent into wide valleys." ICLR, 2017.
S. Arora, N. Cohen, and E. Hazan. "On the optimization of deep networks: implicit acceleration by overparameterization." arXiv preprint arXiv:1802.06509, 2018.

Perspective

Study the geometry of the fitness landscape...

Any ideas and collaborations are welcome!
