Simple regret bandit algorithms for unstructured noisy optimization
TRANSCRIPT
Parameter tuning by bandit algorithms
Madeira, June 2010.
Outline
Parameter tuning
Bandits
Our testbed
Results
Parameter tuning is optimization
Optimization: I have a function f. I want its global minimizer x*, i.e. f(x*) <= f(x) for all x.

Expensive optimization
f is an expensive function: it takes hours of computation, or hours of cluster-based computation, or hours of human work. Or maybe f is not even a function.

Parameter tuning is great
f is the testing of a program under given coefficients. Parameter tuning is crucial in many applications ==> expensive optimization.
Outline
Parameter tuning
Bandits
Our testbed
Results
A ``bandit'' problem
p1, ..., pN: unknown probabilities in [0,1].
At each time step i in [1,n]: choose ui in {1,...,N} (as a function of the uj and rj, j < i); receive reward ri = 1 with probability p(ui), 0 otherwise.
==> a wide literature, centered on finite-time analysis.
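As an illustrative sketch (mine, not from the talk), this setting can be simulated as a Bernoulli bandit; the arm probabilities and the round-robin player below are made-up examples:

```python
import random

def play_bandit(p, n, choose, seed=0):
    """Simulate n steps of a Bernoulli bandit with arm probabilities p.

    choose(history) -> arm index; history is a list of (arm, reward) pairs.
    Returns the full history of (arm, reward) pairs.
    """
    rng = random.Random(seed)
    history = []
    for i in range(n):
        u = choose(history)
        r = 1 if rng.random() < p[u] else 0  # reward 1 with probability p[u]
        history.append((u, r))
    return history

# Uniform allocation: round-robin over the N arms.
def uniform_choose(history, N=3):
    return len(history) % N

hist = play_bandit([0.2, 0.5, 0.8], n=30, choose=uniform_choose)
```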
Simple regret
What is simple regret ?
Usual regret = cumulative regret. Simple regret = optimal expected value minus the expected value of the recommended arm.
==> optimization on average
==> no structure on the search space!
==> what are good algorithms for this criterion?
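A minimal sketch of the criterion (my own toy numbers, not the talk's): simple regret only scores the arm we finally recommend, not the rewards collected along the way.

```python
def simple_regret(p, recommended):
    """Simple regret = optimal expected reward minus the expected
    reward of the arm recommended after the budget is spent."""
    return max(p) - p[recommended]

p = [0.2, 0.5, 0.8]              # hypothetical arm means
sr_best = simple_regret(p, 2)    # recommending the true best arm
sr_mid = simple_regret(p, 1)     # recommending a suboptimal arm
```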
Bubeck et al: uniform is optimal!
Simple regret decreases exponentially as a function of n.
For a fixed simple regret, n is linear as a function of K.
Yet, non-asymptotically, UCB is much better (see also the ``successive rejects'' algorithm).
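For illustration, a sketch of the successive rejects idea (Audibert, Bubeck & Munos; this is my simplified rendering, with made-up Bernoulli arm means): split the budget into K-1 phases and drop the empirically worst arm after each phase.

```python
import math
import random

def successive_rejects(means, n, seed=0):
    """Sketch of successive rejects on a Bernoulli bandit with a
    total budget of n pulls; returns the recommended arm."""
    rng = random.Random(seed)
    K = len(means)
    logbar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    alive = list(range(K))
    pulls = [0] * K
    wins = [0] * K
    n_prev = 0
    for k in range(1, K):
        n_k = math.ceil((n - K) / (logbar * (K + 1 - k)))
        for arm in alive:
            for _ in range(n_k - n_prev):  # extra pulls this phase
                pulls[arm] += 1
                wins[arm] += rng.random() < means[arm]
        n_prev = n_k
        # reject the arm with the worst empirical mean
        alive.remove(min(alive, key=lambda a: wins[a] / max(pulls[a], 1)))
    return alive[0]

best = successive_rejects([0.1, 0.3, 0.9], n=300)
```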
Outline
Parameter tuning
Bandits
Our testbed
Results
Our testbed
Monte-Carlo Tree Search
Why should you care about Monte-Carlo Tree Search? It is a recent algorithm (2006).
Very impressive in computer Go.
Now invading discrete-time control, games, difficult planning (high-dimensional cases!).
A take-home message here: MCTS is not yet very well known, but it's a really great algorithm, in my humble opinion.
Go: from 29 to 6 stones
1998: loss against an amateur (6d), 19x19, H29
2008: win against a pro (8p), 19x19, H9 (MoGo)
2008: win against a pro (4p), 19x19, H8 (CrazyStone)
2008: win against a pro (4p), 19x19, H7 (CrazyStone)
2009: win against a pro (9p), 19x19, H7 (MoGo)
2009: win against a pro (1p), 19x19, H6 (MoGo)
2007: win against a pro (5p), 9x9 blitz (MoGo)
2008: win against a pro (5p), 9x9 as white (MoGo)
2009: win against a pro (5p), 9x9 as black (MoGo)
2009: win against a pro (9p), 9x9 as white (Fuego)
2009: win against a pro (9p), 9x9 as black (MoGoTW)
==> still 6 stones at least!
Go: from 29 to 6 stones
1998: loss against amateur (6d) 19x19 H29 2008: win against a pro (8p) 19x19, H9 MoGo2008: win against a pro (4p) 19x19, H8 CrazyStone2008: win against a pro (4p) 19x19, H7 CrazyStone2009: win against a pro (9p) 19x19, H7 MoGo2009: win against a pro (1p) 19x19, H6 MoGo
2007: win against a pro (5p) 9x9 (blitz) MoGo2008: win against a pro (5p) 9x9 white MoGo2009: win against a pro (5p) 9x9 black MoGo2009: win against a pro (9p) 9x9 white Fuego2009: win against a pro (9p) 9x9 black MoGoTW
==> still 6 stones at least!
All good results with MCTS
Coulom (06); Chaslot, Saito & Bouzy (06); Kocsis & Szepesvari (06): UCT (Upper Confidence Trees)
UCT
Kocsis & Szepesvari (06)
Exploitation ...
SCORE = 5/7 + k.sqrt( log(10)/7 )
... or exploration ?
SCORE = 0/2 + k.sqrt( log(10)/2 )
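Spelling out the score on the slides (a sketch; the exploration constant `k` is a tuning parameter, set arbitrarily to 1 here): the exploitation node has 5 wins in 7 visits, the exploration node 0 wins in 2 visits, and both have a parent with 10 visits.

```python
import math

def ucb_score(wins, visits, parent_visits, k):
    """UCB score of a child node: empirical mean plus an
    exploration bonus that grows when the child is rarely visited."""
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)

k = 1.0                             # exploration constant (tunable)
exploit = ucb_score(5, 7, 10, k)    # the 5/7 node from the slide
explore = ucb_score(0, 2, 10, k)    # the 0/2 node from the slide
```

With k = 1 the well-visited 5/7 node still wins; shrinking k favors exploitation, growing it favors exploration.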
Monte-Carlo Tree Search
- parallelization: multi-core, message passing, tentative: GPGPU
- frugality + consistency
- automatic parameter tuning (expensive optimization - DPS)
- combination: expert rules + supervised learning
- learning-based Monte-Carlo
- patterns (non-regression GP)
- applications far from games (unstructured problems): active learning + non-linear optimization, Spiral, starting: energy + robotics
Outline
Parameter tuning
Bandits
Our testbed
Results
A bit of theory: Bonferroni statistical confidence bounds

A statistical test is as follows:
- I run n tests
- I average the results
- I choose a risk level delta
- Statistical test: a little calculus (Hoeffding's bound) says that I have precision sqrt(log(2/delta)/(2n)).

If there are K tests, my probability of being wrong is multiplied by K!

A statistical test with Bonferroni correction is as follows:
- I run n tests for each of K cases
- I average the results
- I choose a risk level delta/K
- Statistical test: a little calculus says that I have precision sqrt(log(2K/delta)/(2n)).
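The bound can be written out directly (my reconstruction via Hoeffding's inequality for results in [0,1]; the slide's exact formula was not in the transcript). The Bonferroni correction just replaces the risk delta by delta/K, which only widens the bound logarithmically in K:

```python
import math

def precision(n, delta):
    """Hoeffding half-width: with probability >= 1 - delta, the average
    of n results in [0,1] is within this distance of its expectation."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def bonferroni_precision(n, delta, K):
    """Same bound, valid simultaneously for K tests: risk delta/K each."""
    return precision(n, delta / K)

eps1 = precision(1000, 0.05)                   # one test
epsK = bonferroni_precision(1000, 0.05, 100)   # 100 simultaneous tests
```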
Results

If the rule says yes, uniform sampling should be ok; otherwise try UCB (and be lucky).

We've been lucky: one of our test cases gives yes, the other gives no. We did not even have to cheat for having this.
First case: blitz games, nearly fast experiments. Second case: real games, very expensive optimization.

In the first case, uniform sampling is really convincing.
In the second case, uniform sampling is only moderately convincing. Yet, the empirical best arm was ok (a bit better with UCB than with uniform).
Conclusion
What do we propose?
- A simple mathematically derived rule for predicting whether we can trust uniform sampling.
  * If the answer is yes: easy parallelization, and statistical validation is naturally included (Bonferroni correction).
  * If the answer is no: uniform can't work. Maybe UCB or successive rejects. Be lucky.
- Experiments on MCTS (subliminal message: MCTS is great).
- Remark: take care of Bonferroni corrections. Much better for the regression testing of non-deterministic codes/data.