
PORTFOLIO METHODS IN UNCERTAIN CONTEXTS

Jialin LIU
Advised by: Olivier Teytaud & Marc Schoenauer
TAO, Inria, Univ. Paris-Saclay, UMR CNRS 8623, France
December 11, 2015

Motivation

Why noisy optimization (i.e. optimization in front of a stochastic model)?
    Not that many works on noisy optimization.
    Faults in networks: you cannot use an average over 50 years (many lines would be 100% guaranteed) ⇒ you need a (stochastic) model of faults.

Why adversarial (i.e. worst-case) problems?
    Critical problems with uncertainties (technological breakthroughs, CO2 penalization, ...).

Why portfolio (i.e. combining/selecting solvers)?
    Great in combinatorial optimization → let us generalize :)

Why MCTS?
    Great recent tool. Still many things to do.

All related?
    All applicable to games. All applicable to power systems. Nash ⇒ mixed strategy ≃ portfolio.

Outline

1 Motivation
2 Noisy Optimization
    Optimization criteria for black-box noisy optimization
    Optimization methods: resampling methods, pairing
3 Portfolio and noisy optimization
    Portfolio: state of the art
    Relationship between portfolio and noisy optimization
    Portfolio of noisy optimization methods
    Conclusion
4 Adversarial portfolio
    Adversarial bandit: adversarial framework, state of the art
    Contribution for computing Nash equilibrium: sparsity (sparse NE can be computed faster), parameter-free adversarial bandit for large-scale problems
    Application to robust optimization (power systems)
    Application to games
    Conclusion
5 Conclusion


Black-box Noisy Optimization Framework

f : x ↦ f(x, ω)

from a domain D ⊂ R^d (↪ continuous optimization) to R, with random variable ω.

Goal

    x* = argmin_{x ∈ R^d} E_ω f(x, ω)

i.e. access to independent evaluations of f.

Black-box case:
    ↪ do not use any internal property of f
    ↪ access to f(x) only, not ∇f(x)
    ↪ for a given x: randomly samples ω and returns f(x, ω)
    ↪ for its n-th request, returns f(x, ω_n)

x → [black box] → f(x, ω)
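To make this setting concrete, here is a minimal Python sketch of such a noisy black-box oracle; the sphere objective and Gaussian noise model are illustrative assumptions, not the exact test functions of the thesis.

```python
import numpy as np

def make_noisy_oracle(noise_std=1.0, seed=None):
    """Noisy black-box oracle for f(x, w) = ||x||^2 + w (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    def f(x):
        x = np.asarray(x, dtype=float)
        # Each call draws a fresh, independent noise realization w.
        return float(x @ x + noise_std * rng.normal())
    return f

f = make_noisy_oracle(seed=42)
print(f([1.0, 2.0]), f([1.0, 2.0]))  # two independent noisy evaluations of the same x
```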


Optimization criteria: State-of-the-art

Noise-free case: log-linear convergence [Auger, 2005, Rechenberg, 1973]

    log ||x_n − x*|| / n ∼ A' < 0    (1)

Noisy case: log-log convergence [Fabian, 1967]

    log ||x_n − x*|| / log(n) ∼ A'' < 0    (2)

Figure: y-axis: log ||x_n − x*||; x-axis: #eval for log-linear convergence in the noise-free case, or log #eval for log-log convergence in the noisy case.


Optimization criteria: Convergence rates

Slopes for Uniform Rate, Simple Regret¹ [Bubeck et al., 2011] and Cumulative Regret.
x*: the optimum of f; x_n: the n-th evaluated search point; x̃_n: the optimum estimated after the n-th evaluation.

Uniform Rate: UR_n = ||x_n − x*||    ↪ all search points matter
Simple Regret: SR_n = E_ω f(x̃_n, ω) − E_ω f(x*, ω)    ↪ final recommendation matters
Cumulative Regret: CR_n = Σ_{j≤n} (E_ω f(x_j, ω) − E_ω f(x*, ω))    ↪ all recommendations matter

Convergence rates:

    Slope(UR) = limsup_{n→∞} log(UR_n) / log(n)    (3)
    Slope(SR) = limsup_{n→∞} log(SR_n) / log(n)    (4)
    Slope(CR) = limsup_{n→∞} log(CR_n) / log(n)    (5)

¹ Simple Regret = difference between the expected payoff of the recommendation and that of the optimum.
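As an illustration, the empirical slope of the simple regret can be estimated by a least-squares fit of log(SR_n) against log(n); a minimal sketch on a synthetic regret sequence (the sequence itself is made up for the example):

```python
import numpy as np

def empirical_slope(regrets):
    """Fit log(regret_n) ~ slope * log(n) + const and return the slope,
    a finite-n proxy for Slope(SR) = limsup log(SR_n)/log(n)."""
    n = np.arange(1, len(regrets) + 1)
    slope, _ = np.polyfit(np.log(n), np.log(regrets), 1)
    return slope

# Synthetic example: SR_n ~ C / sqrt(n) should give a slope close to -1/2.
n = np.arange(1, 10001)
sr = 3.0 / np.sqrt(n)
print(empirical_slope(sr))  # ≈ -0.5
```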


Optimization methods

Tricks for handling noise:

Resampling: average multiple evaluations

Large population

Surrogate models

Specific methods (stochastic gradient descent with finite differences)

Here: focus on resampling

Resampling number: how many times do we resample the noise?


Resampling methods: Non-adaptive resampling methods

[Recall] log-log convergence: log ||x_n − x*|| / log(n) ∼ A'' < 0, where n is the evaluation number.

Non-adaptive rules:
    Exponential rules with ad hoc parameters ⇒ log-log convergence (mathematically proved by us)
    Other rules as a function of #iter: square-root, linear, polynomial rules
    Other rules as a function of #iter and dimension (see the sketch below)
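A sketch of what such non-adaptive resampling rules might look like; the constants are illustrative assumptions, not the tuned values of the thesis.

```python
import math

def resamplings_exponential(iteration, base=1.1):
    """Exponential rule: the resampling number grows geometrically with the iteration."""
    return math.ceil(base ** iteration)

def resamplings_polynomial(iteration, power=2.0, scale=1.0):
    """Polynomial rule, e.g. square root (power=0.5) or quadratic (power=2)."""
    return math.ceil(scale * iteration ** power)

def resamplings_dimension_aware(iteration, dim):
    """Rule depending on both iteration and dimension, in the spirit of
    N^scale = ceil(d^-2 exp(4n/(5d))) quoted later in the talk."""
    return math.ceil(dim ** -2 * math.exp(4 * iteration / (5 * dim)))

print(resamplings_exponential(10), resamplings_polynomial(10), resamplings_dimension_aware(10, 2))
```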


Resampling methods: Adaptive resampling methods

Adaptive rules: Bernstein bounds [Mnih et al., 2008, Heidrich-Meisner and Igel, 2009]

Here:

FOR each pair of search points x, x' to be compared DO
    WHILE computation time is not elapsed DO
        1000 resamplings for x and x'
        IF mean(difference) >> std THEN
            break
        ENDIF
    ENDWHILE
ENDFOR
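A minimal Python sketch of this adaptive comparison loop, assuming a noisy oracle f as sketched earlier; the batch size and the ">> std" test (here: mean exceeding a few standard errors) are illustrative choices, not the exact Bernstein bound of the cited papers.

```python
import numpy as np

def compare_adaptive(f, x, x_prime, batch=1000, max_batches=50, z=3.0):
    """Resample both points in batches until the mean difference is clearly
    larger than its standard error, then return the apparent winner."""
    diffs = []
    for _ in range(max_batches):
        diffs.extend(f(x) - f(x_prime) for _ in range(batch))
        d = np.asarray(diffs)
        stderr = d.std(ddof=1) / np.sqrt(len(d))
        if abs(d.mean()) > z * stderr:  # difference significantly non-zero
            break
    return x if np.mean(diffs) < 0 else x_prime  # minimization
```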


Resampling methods: Comparison

With Continuous Noisy Optimization (CNO)

With Evolution Strategies (ES)

With Differential Evolution (DE)



Comparison with CNO

Continuous Noisy Optimization: we propose the

Iterative Noisy Optimization Algorithm (INOA)

as a general framework for noisy optimization.

Key points:
    a Sampler, which chooses a sampling around the current approximation,
    an Opt, which updates the approximation of the optimum,
    resampling number r_n = B⌈n^β⌉ and sampling step-size σ_n = A/n^α.

Main application: finite-differences sampling + quadratic model
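A compressed sketch of an INOA-style loop under these schedules, assuming a noisy oracle f; for brevity the quadratic surrogate of the thesis is replaced here by a plain finite-difference gradient step, which is an assumption of this sketch, not the thesis algorithm.

```python
import numpy as np

def inoa_sketch(f, x0, iters=200, A=1.0, alpha=0.4, B=10, beta=0.5, lr=0.3):
    """INOA-style loop: sample around x with step-size sigma_n = A/n^alpha,
    average r_n = B*ceil(n^beta) resampled evaluations, update the estimate."""
    x = np.asarray(x0, dtype=float)
    d = len(x)
    for n in range(1, iters + 1):
        sigma = A / n ** alpha
        r = B * int(np.ceil(n ** beta))
        grad = np.zeros(d)
        for i in range(d):  # finite differences, each side averaged over r samples
            e = np.zeros(d); e[i] = sigma
            plus = np.mean([f(x + e) for _ in range(r)])
            minus = np.mean([f(x - e) for _ in range(r)])
            grad[i] = (plus - minus) / (2 * sigma)
        x = x - lr * grad  # assumed plain gradient step (thesis: quadratic model)
    return x
```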


Comparison with CNO: State-of-the-art and our results

3 types of noise: constant, linear or quadratic as a function of the SR:

    Var(f(x, ω)) = O([E_ω f(x, ω) − E_ω f(x*, ω)]^z)    (6)

with z ∈ {0, 1, 2}.

z                     | optimized for CR          | optimized for SR
                      | slope(SR) | slope(CR)     | slope(SR) | slope(CR)
0 (constant var)      | −1/2      | 1/2           | −2/3      | 2/3
                      | [Fabian, 1967] [Dupac, 1957] [Shamir, 2013]
0, ∞-differentiable   | −1 [Fabian, 1967]
0, "quadratic"        | −1 [Dupac, 1957]
1 (linear var)        | −1        | 0             | −1        | 0
                      | [Rolet and Teytaud, 2010]
2 (quadratic var)     | −∞        | 0             | −∞        | 0
                      | [Jebalia and Auger, 2008]

Table: state of the art: convergence rates. Blue: existing results that we also achieved. Red: new results by us.

Main application: finite-differences sampling + quadratic model
Various (new, proved) rates depending on the assumptions
Recovers existing rates (with one and the same algorithm) and beyond


Comparison with CNO: Results & Discussion

Our proposed algorithm (provably) reaches the same rate
    as the Kiefer-Wolfowitz algorithm when the noise has constant variance,
    as Bernstein-races optimization algorithms when the noise variance decreases linearly as a function of the simple regret,
    as Evolution Strategies when the noise variance decreases quadratically as a function of the simple regret.

⇒ no details here, focus on ES and DE.


What about evolutionary algorithms? Experiments with noise variance = constant (hard case)

Algorithms:
    ES + resampling
    DE + resampling

Results: slope(SR) = −1/2 in both cases (with e.g. rules depending on #iter and dimension)

Figure: modified function F4 of CEC 2005, dimension 2. x-axis: log(#eval); y-axis: log(SR). Curves: N^{1.01}_exp, N^{1.1}_exp, N^2_exp, N^scale.


Resampling methods: Partial conclusion

Conclusion:
    Adaptation of Newton's algorithm to noisy fitness (∇f and the Hessian H_f approximated by finite differences + resamplings) → leads to fast convergence rates + recovers many rates in one algorithm + generic framework (but no proved application besides the quadratic surrogate model).
    Non-adaptive methods lead to log-log convergence (math + experiments) in ES.
    N^scale = ⌈d^{−2} exp(4n/(5d))⌉ is ok (slope(SR) = −1/2) for both ES and DE (nb: −1 possible with large mutation + small inheritance).

In progress:
    Adaptive resampling methods might be merged with bounds on resampling numbers ⇒ in progress, unclear benefit for the moment.



Variance reduction techniques

Monte Carlo [Hammersley and Handscomb, 1964, Billingsley, 1986]

    (1/n) Σ_{i=1}^n f(x, ω_i) → E_ω f(x, ω).    (7)

Quasi Monte Carlo [Cranley and Patterson, 1976, Niederreiter, 1992, Wang and Hickernell, 2000, Mascagni and Chi, 2004]
    Use samples aimed at being as uniform as possible over the domain.


Variance reduction techniques: white-box

Antithetic variates
    Ensure some regularity of the sampling by using symmetries:
        E_ω f(x, ω) ≈ (1/n) Σ_{i=1}^{n/2} (f(x, ω_i) + f(x, −ω_i)).

Importance sampling
    Instead of sampling ω with density dP, we sample ω' with density dP':
        E_ω f(x, ω) ≈ (1/n) Σ_{i=1}^n [dP(ω_i)/dP'(ω_i)] f(x, ω_i).

Control variates
    Instead of estimating E_ω f(x, ω), we estimate E_ω(f(x, ω) − g(x, ω)), using
        E_ω f(x, ω) = E_ω g(x, ω) [term A, known analytically] + E_ω(f(x, ω) − g(x, ω)) [term B, estimated].
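A small sketch of the antithetic-variates trick on an assumed objective f(x, ω) = ||x + ω||² with Gaussian ω (an illustration, not one of the thesis testbeds), comparing the spread of the plain and the antithetic estimators at equal budget:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0])
f = lambda x, w: float((x + w) @ (x + w))  # illustrative objective

def mc_estimate(n):
    return np.mean([f(x, rng.normal(size=2)) for _ in range(n)])

def antithetic_estimate(n):
    # Pair each sample w with its mirror -w: same budget, the odd part of f
    # in w cancels exactly, which lowers the variance here.
    vals = []
    for _ in range(n // 2):
        w = rng.normal(size=2)
        vals += [f(x, w), f(x, -w)]
    return np.mean(vals)

print(np.std([mc_estimate(100) for _ in range(200)]),
      np.std([antithetic_estimate(100) for _ in range(200)]))
```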


Variance reduction techniques: grey-box

Common random numbers (CRN) or pairing

Use the same samples ω_1, . . . , ω_n for all the population x_{n,1}, . . . , x_{n,λ}.

Seed_n = {seed_{n,1}, . . . , seed_{n,m_n}}.

E_ω f(x_{n,k}, ω) is then approximated as

    (1/m_n) Σ_{i=1}^{m_n} f(x_{n,k}, seed_{n,i}).

Different forms of pairing:
    Seed_n is the same for all n;
    m_n increases and the sets Seed_n are nested, i.e. ∀n, i ≤ m_n: m_{n+1} ≥ m_n and seed_{n,i} = seed_{n+1,i};
    all individuals in an offspring use the same seeds, and seeds are 100% changed between offspring.
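A sketch of paired evaluation with common random numbers; the seeded oracle interface is an assumption made for the illustration:

```python
import numpy as np

def f_seeded(x, seed):
    """Illustrative noisy objective where the noise is a deterministic function of the seed."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    return float(x @ x + rng.normal())

def paired_means(population, seeds):
    """Evaluate every individual on the SAME seeds: differences between
    individuals are then not blurred by independent noise draws."""
    return [np.mean([f_seeded(x, s) for s in seeds]) for x in population]

population = [np.array([0.1, 0.2]), np.array([0.1, 0.25])]
print(paired_means(population, seeds=range(30)))
```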


Pairing: Partial conclusion

No details, just our conclusion:
    "almost" black-box
    easy to implement
    applicable to most applications
    On the realistic problem, pairing provided a great improvement.
    But there are counterexamples in which it is detrimental.


Portfolio and noisy optimization

Portfolio of optimization algorithms

Usually:

Portfolio→ Combinatorial Optimization (SAT Competition)

Recently:

Portfolio → Continuous Optimization [Baudiš and Pošík, 2014]

This work:

Portfolio→ Noisy Optimization

↪→ Portfolio = choosing, online, between several algorithms




Why portfolio in Noisy Optimization?

Stochastic problem

limited budget (time or total number of evaluations)

target: anytime convergence to the optimum

black-box

How to choose a suitable solver?

Algorithm Portfolios: select automatically the best in a finite set of solvers.




Portfolio of noisy optimization methods: proposal

A finite number of given noisy optimization solvers, “orthogonal”

Unfair distribution of budget

Information sharing (not very helpful here...)

→ Performs almost as well as the best solver



Portfolio of noisy optimization methods: NOPA

Algorithm 1 Noisy Optimization Portfolio Algorithm (NOPA).
 1: Input noisy optimization solvers Solver_1, Solver_2, . . . , Solver_M
 2: Input a lag function LAG : N+ → N+
 3: Input a non-decreasing integer sequence r_1, r_2, . . .    ▷ periodic comparisons
 4: Input a non-decreasing integer sequence s_1, s_2, . . .    ▷ number of resamplings
 5: n ← 1    ▷ number of selections
 6: m ← 1    ▷ NOPA's iteration number
 7: i* ← null    ▷ index of recommended solver
 8: x* ← null    ▷ recommendation
 9: while budget is not exhausted do
10:     if m ≥ r_n then
11:         i* ← argmin_{i ∈ {1,...,M}} Ê_{s_n}[f(x_{i,LAG(r_n)})]    ▷ algorithm selection
12:         n ← n + 1
13:     else
14:         for i ∈ {1, . . . , M} do
15:             apply one evaluation for Solver_i
16:         end for
17:         m ← m + 1
18:     end if
19:     x* ← x_{i*,m}    ▷ update recommendation
20: end while
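A condensed Python sketch of NOPA's control flow; the solver API (one_step() and recommendation(k)) is invented here for illustration, and the budget accounting is simplified.

```python
import numpy as np

def nopa(solvers, f, lag, r, s, budget):
    """Noisy Optimization Portfolio Algorithm, condensed sketch.

    solvers: objects with .one_step() and .recommendation(k) (assumed API);
    f: noisy objective; lag: lag function; r, s: non-decreasing sequences.
    """
    n, m, i_star, used = 0, 1, 0, 0
    while used < budget:
        if m >= r[n]:
            # Compare the LAGGED recommendations, averaged over s[n] resamplings.
            scores = [np.mean([f(sol.recommendation(lag(r[n]))) for _ in range(s[n])])
                      for sol in solvers]
            used += s[n] * len(solvers)
            i_star = int(np.argmin(scores))
            n += 1
        else:
            for sol in solvers:  # fair budget: one evaluation per solver
                sol.one_step()
            used += len(solvers)
            m += 1
    return solvers[i_star].recommendation(m)
```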


Portfolio of noisy optimization methods: compare solvers early

Lag function: LAG(n) ≤ n.
∀i ∈ {1, . . . , M}, x_{i,LAG(n)} may or may not equal x_{i,n}.

Why this lag?
    Algorithms' ranking is usually stable → no use comparing the very last recommendations.
    It is much cheaper to compare old points: comparing good (i.e. recent) points means comparing points with similar fitness, and comparing points with similar fitness is very expensive.



Portfolio of noisy optimization methods: Theorem with fair budget distribution

Theorem with fair budget distribution

Assume that
    each solver i ∈ {1, . . . , M} has simple regret SR_{i,n} = (1 + o(1)) C_i / n^{α_i} (as usual),
    and the noise variance is constant.

Then for some universal r_n, s_n, LAG_n, almost surely there exists n_0 such that, for n ≥ n_0:
    the portfolio always chooses an optimal solver (optimal α_i and C_i);
    the portfolio uses ≤ M · r_n (1 + o(1)) evaluations ⇒ M times more than the best solver.

Interpretation
    Negligible comparison budget (thanks to the lag).
    On classical log-log graphs, the portfolio should perform similarly to the best solver, within the log(M) shift (proved).


INOPA: introducing an unfair budget

NOPA: same budget for all solvers.

Remark:
    we compare old recommendations (LAG(n) << n);
    they were known long ago, before spending all this budget;
    therefore, except for the selected solver, most of the budget is wasted :(

⇒ Lazy evaluation paradigm: evaluate f(·) only when you need it for your output.
⇒ Improved NOPA (INOPA): unfair budget distribution.

Use only LAG(r_n) evaluations (negligible) on the sub-optimal solvers (INOPA).
log(M') shift with M' the number of optimal solvers (proved).


Experiments: Unimodal case

Noisy Optimization Algorithms (NOAs):
    SA-ES: Self-Adaptive Evolution Strategy
    Fabian's algorithm: a first-order method using gradients estimated by finite differences [Dvoretzky et al., 1956, Fabian, 1967]
    Noisy Newton's algorithm: a second-order method using a Hessian matrix approximated also by finite differences (our contribution in CNO)

Solvers     | z = 0 (constant var) | z = 1 (linear var) | z = 2 (quadratic var)
RSAES       | .114 ± .002          | .118 ± .003        | .113 ± .003
Fabian1     | −.838 ± .003         | −1.011 ± .003      | −1.016 ± .003
Fabian2     | .108 ± .003          | −1.339 ± .003      | −2.481 ± .003
Newton      | −.070 ± .003         | −.959 ± .092       | −2.503 ± .285
NOPA no lag | −.377 ± .048         | −.978 ± .013       | −2.106 ± .003
NOPA        | −.747 ± .003         | −.937 ± .005       | −2.515 ± .095
INOPA       | −.822 ± .003         | −1.359 ± .027      | −3.528 ± .144

Table: Slope(SR) for f(x) = ||x||² + ||x||^z N in dimension 15. Computation time = 40s.


Experiments: Stochastic unit commitment problem

Solver  | d = 45       | d = 63       | d = 105      | d = 125
RSAES   | .485 ± .071  | .870 ± .078  | .550 ± .097  | .274 ± .097
Fabian1 | 1.339 ± .043 | 1.895 ± .040 | 1.075 ± .047 | .769 ± .047
Fabian2 | .394 ± .058  | .521 ± .083  | .436 ± .097  | .307 ± .097
Newton  | .749 ± .101  | 1.138 ± .128 | .590 ± .147  | .312 ± .147
INOPA   | .394 ± .059  | .547 ± .080  | .242 ± .101  | .242 ± .101

Table: stochastic unit commitment problem (minimization). Computation time = 320s.

What's more: given the same budget, an INOPA of identical solvers can outperform its mono-solvers.



Portfolio and noisy optimization: Conclusion

Main conclusion:
    Portfolios are also great in noisy optimization (because in noisy optimization, with lag, the comparison cost is small).
    We show mathematically and empirically a log(M) shift when using M solvers, on a classical log-log scale.
    Bound improved to a log(M') shift, with M' = number of optimal solvers, with unfair distribution of budget (INOPA).

Take-home messages
    portfolio = little overhead
    unfair budget = no overhead if "orthogonal" portfolio (orthogonal → M' = 1)
    We mathematically confirmed the idea of orthogonality found in [Samulowitz and Memisevic, 2007].



Adversarial portfolio

Framework: Zero-sum matrix games

Game defined by matrix M

I choose (privately) i

Simultaneously, you choose (privately) j

I earn M_{i,j}

You earn −M_{i,j}

So this is zero-sum.

Figure: zero-sum matrix game.

         | rock | paper | scissors
rock     | 0.5  | 0     | 1
paper    | 1    | 0.5   | 0
scissors | 0    | 1     | 0.5

Table: example of a 1-sum matrix game: rock-paper-scissors.


Framework: Nash Equilibrium (NE)

Definition (Nash Equilibrium)

Zero-sum matrix game M:
    My strategy = probability distribution on rows = x.
    Your strategy = probability distribution on columns = y.
    Expected reward = x^T M y.

There exist x*, y* such that ∀x, y,

    x^T M y* ≤ x*^T M y* ≤ x*^T M y.    (8)

(x*, y*) is a Nash Equilibrium (not necessarily unique).

Definition (approximate ε-Nash equilibria)

(x*, y*) such that

    x^T M y* − ε ≤ x*^T M y* ≤ x*^T M y + ε.    (9)

Example: the NE of rock-paper-scissors is unique: (1/3, 1/3, 1/3).
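Since a Nash equilibrium of a zero-sum matrix game is the solution of a linear program (used on the following slides), here is a minimal sketch with scipy; the maximin LP encoding is the textbook formulation, stated here as an assumption consistent with, but not copied from, the thesis.

```python
import numpy as np
from scipy.optimize import linprog

def nash_row_strategy(M):
    """Row player's Nash-optimal mixed strategy for zero-sum game M, via the
    classical LP: maximize v subject to M^T x >= v, sum(x) = 1, x >= 0."""
    K = M.shape[0]
    c = np.concatenate([np.zeros(K), [-1.0]])             # variables [x, v]; minimize -v
    A_ub = np.hstack([-M.T, np.ones((M.shape[1], 1))])    # v - (M^T x)_j <= 0 for each column j
    b_ub = np.zeros(M.shape[1])
    A_eq = np.concatenate([np.ones(K), [0.0]]).reshape(1, -1)  # sum(x) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * K + [(None, None)])
    return res.x[:K], res.x[-1]  # mixed strategy, game value

M = np.array([[0.5, 0.0, 1.0], [1.0, 0.5, 0.0], [0.0, 1.0, 0.5]])
print(nash_row_strategy(M))  # rock-paper-scissors: ≈ (1/3, 1/3, 1/3), value 0.5
```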



Methods for computing Nash Equilibrium

Algorithm | Complexity | Exact solution? | Confidence | Time
LP [von Stengel, 2002] | O(K^α), α > 6 | yes | 1 | constant
[Grigoriadis and Khachiyan, 1995] | O(K log(K)/ε²) | no | 1 | random
[Grigoriadis and Khachiyan, 1995], with K/log(K) processors | O(log²(K)/ε²) | no | 1 | random
EXP3 [Auer et al., 1995] | O(K log(K)/ε²) | no | 1 − δ | constant
INF [Audibert and Bubeck, 2009] | O(K log(K)/ε²) | no | 1 − δ | constant
Our algorithm (if the NE is k-sparse) | O(k^{3k} K log K) | yes | 1 − δ | constant

Table: state of the art for computing a Nash equilibrium of a zero-sum matrix game M_{K×K}.


Adversarial bandit algorithm Exp3.P

Algorithm 2 Exp3.P: variant of Exp3. η and γ are two parameters.
 1: Input η ∈ R    ▷ how much the distribution becomes peaked
 2: Input γ ∈ (0, 1]    ▷ exploration rate
 3: Input a time horizon (computational budget) T ∈ N+ and the number of arms K ∈ N+
 4: Output a Nash-optimal policy p
 5: for i ← 1 to K do    ▷ initialization
 6:     ω_i ← exp((ηγ/3) √(T/K))
 7: end for
 8: for t ← 1 to T do
 9:     for i ← 1 to K do
10:         p_i ← (1 − γ) ω_i / Σ_{j=1}^K ω_j + γ/K
11:     end for
12:     Generate i_t according to (p_1, p_2, . . . , p_K)
13:     Compute reward R_{i_t,t}
14:     for i ← 1 to K do
15:         if i == i_t then R̂_i ← R_{i_t,t} / p_i else R̂_i ← 0 end if
16:         ω_i ← ω_i exp((γ/(3K)) (R̂_i + η/(p_i √(TK))))
17:     end for
18: end for
19: Return probability distribution (p_1, p_2, . . . , p_K)
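A direct Python transcription of this pseudocode; the Bernoulli reward function in the usage line is an arbitrary stand-in for the adversarial feedback, and the weight rescaling is a numerical safeguard added for the sketch.

```python
import numpy as np

def exp3p(reward, K, T, eta=0.1, gamma=0.1, seed=0):
    """Exp3.P sketch following Algorithm 2: exponential weights with
    exploration rate gamma and the optimistic eta/(p_i sqrt(TK)) bonus."""
    rng = np.random.default_rng(seed)
    w = np.full(K, np.exp((eta * gamma / 3) * np.sqrt(T / K)))
    p = np.full(K, 1.0 / K)
    for t in range(T):
        p = (1 - gamma) * w / w.sum() + gamma / K
        i_t = rng.choice(K, p=p)
        r = reward(i_t, t)            # observed reward in [0, 1] for the pulled arm
        r_hat = np.zeros(K)
        r_hat[i_t] = r / p[i_t]
        w *= np.exp((gamma / (3 * K)) * (r_hat + eta / (p * np.sqrt(T * K))))
        w /= w.max()                  # rescale to avoid overflow; p is scale-invariant
    return p

# Usage sketch: three Bernoulli arms, the third slightly better on average.
means = np.array([0.4, 0.5, 0.6])
rng = np.random.default_rng(1)
print(exp3p(lambda i, t: float(rng.random() < means[i]), K=3, T=5000))
```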


Contribution for computing Nash Equilibrium

Sparse Nash Equilibria (1/2)

Considering x* a Nash-optimal policy for the ZSMG M_{K×K}:
Let us assume that x* is unique and has at most k non-zero components (sparsity).

Let us show that x* is "discrete" (remark: Nash = solution of a linear programming problem):
    ⇒ x* is also a NE of a k × k submatrix M'_{k×k}
    ⇒ x* = solution of an LP in dimension k
    ⇒ x* = solution of k linear equations with coefficients in {−1, 0, 1}
    ⇒ x* = inverse matrix × vector
    ⇒ x* = obtained by "cofactors / determinant of the matrix"
    ⇒ x* has denominator at most k^{k/2},
by the Hadamard determinant bound [Hadamard, 1893], [Brenner and Cummings, 1972].


Sparse Nash Equilibria (2/2)

Computation of sparse Nash equilibria

Under the assumption that the Nash is sparse:
    x* is rational with a "small" denominator (previous slide!)
    So let us compute an ε-Nash (with ε small enough!) (sublinear time!)
    And let us compute its closest approximation with "small denominator" (Hadamard).

Two new algorithms for exact Nash:
    Rounding-EXP3: switch to the closest approximation.
    Truncation-EXP3: remove small components and work on the remaining submatrix (exact solving). (Requested precision ≃ k^{−3k/2} only ⇒ complexity k^{3k} K log K.)



Our proposal: Parameter-free adversarial bandit

No details here; in short:
    We compare various existing parametrizations of EXP3.
    We select the best.
    We add sparsity as follows: for a budget of T rounds of EXP3, truncate the components below

        threshold = max_{i ∈ {1,...,m}} (T x_i)^α / T

    (a minimal sketch follows below).
⇒ we get a parameter-free bandit for adversarial problems.
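A sketch of this truncation step, assuming x is the empirical EXP3 frequency vector after T rounds; renormalizing the kept components is an assumed detail of the sketch (the thesis resolves the kept submatrix exactly).

```python
import numpy as np

def truncate_sparse(x, T, alpha=0.7):
    """Zero out components below max_i (T x_i)^alpha / T, then renormalize.

    x: mixed strategy from EXP3 after T rounds; alpha: truncation parameter.
    """
    x = np.asarray(x, dtype=float)
    threshold = ((T * x) ** alpha).max() / T
    kept = np.where(x >= threshold, x, 0.0)
    return kept / kept.sum()

x = np.array([0.50, 0.30, 0.15, 0.04, 0.01])
print(truncate_sparse(x, T=10000, alpha=0.7))  # drops the smallest component
```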


Application to robust optimization (power systems)

Figure: scenarios and policies feed a simulator computing R(k, s); its outputs are assessed for average performance / average cost and robustness. Example scenario axes: technological breakthrough, CO2 penalization; example policy decisions: maintain a connection, create a new connection, ...

Examples of scenarios: CO2 penalization, gas curtailment in Eastern Europe, technological breakthrough.
Examples of policies: massive nuclear power plant building, massive renewable energies.


Nash-planning for scenario-based decision making

Decision tools

Method              | Extraction of policies | Extraction of critical scenarios | Computational cost | Interpretation
Wald                | one                    | one per policy                   | K × S              | Nature decides later, minimizing our reward
Savage              | one                    | one per policy                   | K × S              | Nature decides later, maximizing our regret
Scenarios           | handcrafted            | handcrafted                      | K′ × S′            | human expertise
Nash (our proposal) | Nash-optimal           | Nash-optimal                     | (K + S) × log(K + S) (*) | Nature decides privately, before us

Table: comparison between several tools for decision under uncertainty. K = |K| and S = |S|. ⇒ in this case sparsity performs very well. (*) improved if sparse, by our previous result!

Nash ⇒ fast selection of scenarios and options: sparsity both
    fastens the NE computation and
    makes the output more readable (smaller matrix).


Application to power investment problem: Testcase and parameterization

We consider (big toy problem):
    3^10 investment policies (k)
    3^9 scenarios (s)
    reward: (k, s) ↦ R(k, s)

We
    use Nash equilibria, for their principled nature (Nature decides first and privately! that's reasonable, right?) and low computational cost in large-scale settings;
    compute the equilibria thanks to EXP3 (tuned)...
    ... with sparsity, for
        improving the precision,
        reducing the number of pure strategies in our recommendation (unreadable matrix otherwise!).



Application to power investment problem: Sparse-Nash algorithm

Algorithm 3 The Sparse-Nash algorithm for solving decision-under-uncertainty problems.
Input: a family K of possible decisions k (investment policies).
Input: a family S of scenarios s.
Input: a mapping (k, s) ↦ R_{k,s}, providing the rewards.
Run truncated Exp3.P on R, and get
    a probability distribution on K (support = key options) and
    a probability distribution on S (support = critical scenarios).
Emphasize the policy with highest probability.


Application to power investment problem: Results

Average sparsity level over 3^10 = 59049 arms:

α    | T = K      | T = 10K     | T = 50K      | T = 100K    | T = 500K    | T = 1000K
0.1  | 13804 ± 52 | non-sparse  | non-sparse   | non-sparse  | non-sparse  | non-sparse
0.3  | 2810 ± 59  | non-sparse  | non-sparse   | non-sparse  | non-sparse  | non-sparse
0.5  | 396 ± 16   | non-sparse  | non-sparse   | 59049 ± 197 | 49819 ± 195 | non-sparse
0.7  | 43 ± 3     | 58925 ± 27  | 55383 ± 1507 | 46000 ± 278 | 9065 ± 160  | non-sparse
0.9  | 4 ± 0      | 993 ± 64    | 797 ± 42     | 504 ± 25    | 98 ± 5      | 52633 ± 523
0.99 | 1 ± 0      | 2 ± 0       | 3 ± 0        | 2 ± 0       | 2 ± 0       | 7 ± 1

Robust score (worst reward against pure strategies):

α    | T = K     | T = 10K   | T = 50K   | T = 100K  | T = 500K  | T = 1000K
NT   | 4.922e-01 | 4.928e-01 | 4.956e-01 | 4.991e-01 | 5.221e-01 | 4.938e-01
0.1  | 4.948e-01 | 4.928e-01 | 4.956e-01 | 4.991e-01 | 5.221e-01 | 4.938e-01
0.3  | 5.004e-01 | 4.928e-01 | 4.956e-01 | 4.991e-01 | 5.221e-01 | 4.938e-01
0.5  | 5.059e-01 | 4.928e-01 | 4.956e-01 | 4.991e-01 | 5.242e-01 | 4.938e-01
0.7  | 5.054e-01 | 4.928e-01 | 4.965e-01 | 5.031e-01 | 5.317e-01 | 4.938e-01
0.9  | 4.281e-01 | 5.137e-01 | 5.151e-01 | 5.140e-01 | 5.487e-01 | 4.960e-01
0.99 | 3.634e-01 | 4.357e-01 | 4.612e-01 | 4.683e-01 | 5.242e-01 | 5.390e-01
Pure | 3.505e-01 | 3.946e-01 | 4.287e-01 | 4.489e-01 | 5.143e-01 | 4.837e-01

Table: average sparsity level and robust score. α is the truncation parameter. T is the budget.


Application to power investment problem: summary

Define long-term scenarios (plenty!)
Build a simulator R(k, s)
Classical solution (Savage): min_{k∈K} max_{s∈S} regret(k, s)
Our proposal (Nash): automatically select a submatrix

Our proposed tool has the following advantages:
    Natural extraction of interesting policies and critical scenarios: α = .7 provides stable (and proved) results, but the extracted submatrix becomes easily readable (small enough) with larger values of α.
    Faster than the Wald or Savage methodologies.

Take-home messages
We get a fast criterion, faster than Wald's or Savage's criteria, with a natural interpretation, and more readable ⇒ but stochastic recommendation!



Application to games

Two parts:
    Seeds matter: **choose** your seeds!
    More tricky but worth the effort: position-specific seeds! (towards a better asymptotic behavior of MCTS?)


Optimizing random seeds: Correlations

Figure: success rate per seed (ranked) in 5x5 Domineering, with standard deviations on the y-axis: the seed has a significant impact.

Fact: the random seed matters!


Optimizing random seeds: State-of-the-art

Stochastic algorithms randomly select their pseudo-random seed.

We propose to choose the seed(s), and to combine them.

State-of-the-art for combining random seeds:

[Nagarajan et al., 2015] combines several AIs

[Gaudel et al., 2010] uses Nash methods for combining several opening books

[Saint-Pierre and Teytaud, 2014] constructs several AIs from a single stochastic one and combines them by the BestSeed and Nash approaches


Trick: present results with one white seed per column and one black seed per row.

Figure: a K × K matrix of outcomes M_{i,j}, with K random seeds for Black (one per row) and K random seeds for White (one per column); the row player gets M_{i,j} and the column player gets 1 − M_{i,j}.


Propositions: Nash & BestSeed

Nash
    Nash = combines rows (more robust; we will see later).

BestSeed
    BestSeed = just pick the best row / best column.


Better than square matrices: rectangle methods

Remark: for choosing a row, if #rows = #cols, then #rows is more critical than #cols; so, for a given budget, increase #rows and decrease #cols (same budget!).

Figure: left: square K × K matrix of a game; right: rectangles of a game, with K >> K_t.


Does it work? Experiments on Domineering

The opponent uses seeds which have never been used during the learning of the portfolio (cross-validation).

Figure: results for Domineering, with the BestSeed (left) and the Nash (right) approach, against the baseline (K′ = 1) and the exploiter (K′ > 1; an opponent who "learns" very well). K_t = 900 in all experiments.

BestSeed performs well against the original algorithm (K′ = 1), but poorly against the exploiter (K′ > 1).
Nash outperforms the original algorithm both w.r.t. K′ = 1 (all cases) and K′ > 1 (most cases).


Beyond cross-validation: experiments with transfer in the game of Go

Learning: BestSeed is applied to GnuGo, with MCTS and a budget of 400 simulations.
Test: against "classical" GnuGo, i.e. the non-MCTS version of GnuGo.

Opponent                 | Performance of BestSeed | Performance with randomized seed
GnuGo-classical level 1  | 1. (± 0)                | .995 (± 0)
GnuGo-classical level 2  | 1. (± 0)                | .995 (± 0)
GnuGo-classical level 3  | 1. (± 0)                | .99 (± 0)
GnuGo-classical level 4  | 1. (± 0)                | 1. (± 0)
GnuGo-classical level 5  | 1. (± 0)                | 1. (± 0)
GnuGo-classical level 6  | 1. (± 0)                | 1. (± 0)
GnuGo-classical level 7  | .73 (± .013)            | .061 (± .004)
GnuGo-classical level 8  | .73 (± .013)            | .106 (± .006)
GnuGo-classical level 9  | .73 (± .013)            | .095 (± .006)
GnuGo-classical level 10 | .73 (± .013)            | .07 (± .004)

Table: performance of "BestSeed" and "randomized seed" against "classical" GnuGo.

Previous slide: we win against the AI which we have trained against (but different seeds!).
This slide: we improve the winning rate against another AI.


Optimizing random seeds: Partial conclusion

Conclusion:
    Seed optimization (NOT position-specific) can be seen as a simple and effective tool for building an opening book with no development effort, no human expertise, no storage of databases.
    "Rectangles" provide significant improvements.
    The online computational overhead of the methods is negligible.
    The boosted AIs significantly outperform the baselines.
    BestSeed performs well, but can be overfitted ⇒ strength of Nash.

Further work: the use of online bandit algorithms for dynamically choosing K/K_t.

Note: the BestSeed and Nash algorithms are not new. The algorithm and analysis of rectangles is new. The analysis of the impact of seeds is new. The applications to Domineering, Atari-Go and Breakthrough are new.


Two parts:
    Seeds matter: **choose** your seeds!
    More tricky but worth the effort: position-specific seeds! (towards a better asymptotic behavior of MCTS?)


Optimizing position-based random seeds: Tsumego

Tsumego (by Yoji Ojima, Zen's author)
    Input: a Go position.
    Question: is this situation a win for White?
    Output: yes or no.

Why so important?
    At the heart of many game algorithms.
    In Go, EXPTIME-complete [Robson, 1983].



Classical algorithms

Monte Carlo (MC) [Bruegmann, 1993, Cazenave, 2006, Cazenave and Borsboom, 2007]

Monte Carlo Tree Search (MCTS) [Bouzy, 2004, Coulom, 2006]

Nested MC [Cazenave, 2009]

Voting scheme among MCTS [Gavin et al., ]

⇒ here: a weighted voting scheme among MCTS



Evaluation of the game value

Algorithm 4 Evaluation of the game value.
 1: Input current state s
 2: Input a policy π_B for Black, depending on a seed in N+
 3: Input a policy π_W for White, depending on a seed in N+
 4: for i ∈ {1, . . . , K} do
 5:     for j ∈ {1, . . . , K} do
 6:         M_{i,j} ← outcome of the game starting in s, with π_B playing as Black with seed b(i) and π_W playing as White with seed w(j)
 7:     end for
 8: end for
 9: Compute weights p for Black and q for White for the matrix M (either BestSeed, Nash, or other)
10: Return p^T M q    ▷ approximate value of the game M
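A sketch of this estimator in Python, assuming a play(s, black_seed, white_seed) game simulator (an invented name); the paired-MC average and the BestSeed intersection of best row / best column are shown as two example weightings, and Nash weights p, q could come from the LP sketch shown earlier.

```python
import numpy as np

def game_value_matrix(play, s, K):
    """M[i, j] = outcome of the game from state s with Black seed i, White seed j."""
    return np.array([[play(s, b, w) for w in range(K)] for b in range(K)])

def paired_mc_value(M):
    """Paired Monte Carlo estimate: uniform weights, i.e. the matrix average."""
    return M.mean()

def best_seed_value(M):
    """BestSeed estimate: intersection of the best row (for Black) and best column (for White)."""
    i = M.mean(axis=1).argmax()  # Black seed maximizing its average outcome
    j = M.mean(axis=0).argmin()  # White seed minimizing Black's average outcome
    return M[i, j]
```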


Classical case (MC/MCTS): unpaired Monte Carlo averaging

Figure: left: unpaired case (classical estimate by averaging), with K·K random seeds b(1), . . . , b(K·K) for Black and K·K random seeds w(1), . . . , w(K·K) for White; right: paired case: K seeds vs K seeds in a matrix M, where the row player gets M_{i,j} and the column player gets 1 − M_{i,j}.


Experiments: Applied methods and setting

Compared methods for approximating v(s):
    Three methods use K² independent batches of M MCTS simulations, using the matrix of seeds:
        Nash reweighting = Nash value,
        BestSeed reweighting = intersection of best row / best column,
        Paired MC estimate = average of the matrix.
    One unpaired method: the classical MC estimate (the average of K² random MCTS).
    Baseline: a single long MCTS (= state of the art!) → the only one which is not K²-parallel.

Parameter setting: GnuGo-MCTS [Bayer et al., 2008]
    setting A: 1 000 simulations per move
    setting B: 80 000 simulations per move



Experiments: Average results over 50 Tsumego problems

Figure: average over 50 Tsumego problems; x-axis: submatrix size (N²) / #simulations; y-axis: % correct answers. Curves: Nash, Paired, Best, Unpaired, MCTS(1), where MCTS(1) is one single MCTS run using all the budget. (a) setting A: 1 000 simulations per move; (b) setting B: 80 000 simulations per move.

Setting A (small budget): MCTS(1) outperforms the weighted average of 81 MCTS runs (but we are more parallel!).
Setting B (large budget): we outperform MCTS and all others by far ⇒ consistent with the limited scalability of MCTS for huge numbers of simulations.


Optimizing position-based random seeds: Partial conclusion

Main conclusion:
    A novel way of evaluating game values using a Nash equilibrium (theoretical validation & experiments on 50 Tsumego problems).
    The Nash or BestSeed predictor requires far fewer simulations for finding accurate results, and is sometimes consistent whereas the original MC is not!
    We outperformed
        the average of MCTS runs sharing the budget,
        a single MCTS using all the budget.
    → For M large enough, our weighted averaging of 81 single MCTS runs with M simulations each is better than one MCTS run with 81M simulations :)

Take-home messages
    We classify positions ("black wins" vs "white wins").
    We use a WEIGHTED average of K² MCTS runs of M simulations.
    Our approach outperforms:
        all tested voting schemes among K² MCTS estimates of M simulations,
        and a pure MCTS of K² × M simulations,
    when M is large and K² = 81.



Adversarial portfolio: Conclusion

A work on sparsity, at the core of ZSMG.

A parameter-free adversarial bandit, obtained by tuning (no details provided in this talk) + sparsity.

Applications of ZSMG:
    Nash + sparsity → faster + more readable robust decision making.
    Random seeds = new MCTS variants?
        validated as opening-book learning (Go, Atari-Go, Domineering, Breakthrough, Draughts, Phantom-Go, . . . );
        position-specific seeds validated on Tsumego.




Conclusion & Further work

Noisy opt:
An algorithm recovering most (but not all: Fabian's rate!) existing results, extended to other surrogate models.
ES/DE with resamplings have good rates for linear/quadratic variance and/or robust criteria (UR); for other cases, resamplings are not sufficient for optimal rates (“mutate large, inherit small” + huge population and/or surrogate models...). A sketch of the resampling idea follows this list.

Portfolio:
Application to noisy optimization; great benefits with several solvers of a given model.
Towards wider applications: a portfolio of models?

Adversarial portfolio: successful use of sparsity; parameter-free bandits?

MCTS and seeds: room for 5 PhDs ... if there is funding for it :-)

Most works here → ROBUSTNESS by COMBINATION (robust to solvers, to models, to parameters, to seeds ...)
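The sketch announced in the “Noisy opt” item above: a (1+1)-ES where each fitness comparison at iteration n averages r_n noisy evaluations. The quadratic rule r_n = n² and the step-size constants are illustrative assumptions, not the mathematically derived schedule from the thesis.

```python
import numpy as np

def noisy_sphere(x, rng):
    """Toy noisy fitness: sphere function plus additive Gaussian noise."""
    return float(x @ x) + rng.normal()

def one_plus_one_es(f, x0, budget, sigma=0.3, seed=0):
    """(1+1)-ES with resamplings: each comparison at iteration n
    averages r_n = n^2 noisy evaluations (an assumed polynomial rule)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    n, evals = 1, 0
    while evals < budget:
        r = n * n  # number of resamplings at iteration n
        y = x + sigma * rng.standard_normal(x.size)
        fx = np.mean([f(x, rng) for _ in range(r)])
        fy = np.mean([f(y, rng) for _ in range(r)])
        evals += 2 * r
        if fy <= fx:
            x, sigma = y, sigma * 1.5  # success: move and expand the step
        else:
            sigma *= 0.9               # failure: shrink the step
        n += 1
    return x

# Example: x = one_plus_one_es(noisy_sphere, np.ones(5), budget=100000)
```

Averaging over growing numbers of resamplings shrinks the noise of each comparison fast enough for convergence; the step-size rule here is a crude 1/5th-style adaptation.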


Thanks for your attention!
Thanks to all the collaborators from Artelys, INRIA, CNRS, Univ. Paris-Saclay, Univ. Paris-Dauphine, Univ. du Littoral, NDHU ...


References


Audibert, J.-Y. and Bubeck, S. (2009). Minimax policies for adversarial and stochastic bandits. In Proceedings of the Annual Conference on Learning Theory (COLT).

Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R. E. (1995). Gambling in a rigged casino: the adversarial multi-armed bandit problem. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pages 322–331. IEEE Computer Society Press, Los Alamitos, CA.

Auger, A. (2005). Convergence results for the (1, λ)-SA-ES using the theory of φ-irreducible Markov chains. Theoretical Computer Science, 334(1):35–69.

Baudiš, P. and Pošík, P. (2014). Online black-box algorithm portfolios for continuous optimization. In Parallel Problem Solving from Nature – PPSN XIII, pages 40–49. Springer.


Bayer, A., Bump, D., Daniel, E. B., Denholm, D., Dumonteil, J., Farnebäck, G., Pogonyshev, P., Traber, T., Urvoy, T., and Wallin, I. (2008). GNU Go 3.8 documentation. Technical report, Free Software Foundation.

Billingsley, P. (1986). Probability and Measure. John Wiley and Sons.

Bouzy, B. (2004). Associating shallow and selective global tree search with Monte Carlo for 9x9 Go. In 4th Computers and Games Conference, Ramat-Gan.

Brenner, J. and Cummings, L. (1972). The Hadamard maximum determinant problem. Amer. Math. Monthly, 79:626–630.

Bruegmann, B. (1993). Monte-Carlo Go. Unpublished draft, http://www.althofer.de/bruegmann-montecarlogo.pdf.


Bubeck, S., Munos, R., and Stoltz, G. (2011). Pure exploration in finitely-armed and continuous-armed bandits. Theoretical Computer Science, 412(19):1832–1852.

Cazenave, T. (2006). A Phantom-Go program. In van den Herik, H. J., Hsu, S.-C., Hsu, T.-S., and Donkers, H. H. L. M., editors, Proceedings of Advances in Computer Games, volume 4250 of Lecture Notes in Computer Science, pages 120–125. Springer.

Cazenave, T. (2009). Nested Monte-Carlo search. In Boutilier, C., editor, IJCAI, pages 456–461.

Cazenave, T. and Borsboom, J. (2007). Golois wins Phantom Go tournament. ICGA Journal, 30(3):165–166.


Coulom, R. (2006). Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In Ciancarini, P. and van den Herik, H. J., editors, Proceedings of the 5th International Conference on Computers and Games, Turin, Italy, pages 72–83.

Cranley, R. and Patterson, T. (1976). Randomization of number theoretic methods for multiple integration. SIAM J. Numer. Anal., 13(6):904–914.

Dupac, V. (1957). O Kiefer-Wolfowitzove aproximacni methode. Casopis pro pestovani matematiky, 82(1):47–75.

Dvoretzky, A., Kiefer, J., and Wolfowitz, J. (1956). Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Annals of Mathematical Statistics, 27:642–669.

Fabian, V. (1967). Stochastic Approximation of Minima with Improved Asymptotic Speed. Annals of Mathematical Statistics, 38:191–200.


Gaudel, R., Hoock, J.-B., Perez, J., Sokolovska, N., and Teytaud, O. (2010). A Principled Method for Exploiting Opening Books. In International Conference on Computers and Games, pages 136–144, Kanazawa, Japan.

Gavin, C., Stewart, S., and Drake, P. Result aggregation in root-parallelized computer Go.

Grigoriadis, M. D. and Khachiyan, L. G. (1995). A sublinear-time randomized approximation algorithm for matrix games. Operations Research Letters, 18(2):53–58.

Hadamard, J. (1893). Résolution d'une question relative aux déterminants. Bull. Sci. Math., 17:240–246.

Hammersley, J. and Handscomb, D. (1964). Monte Carlo Methods. Methuen & Co. Ltd., London, page 40.


Heidrich-Meisner, V. and Igel, C. (2009). Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search. In ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, pages 401–408, New York, NY, USA. ACM.

Jebalia, M. and Auger, A. (2008). On multiplicative noise models for stochastic search. In Rudolph, G. et al., editors, Conference on Parallel Problem Solving from Nature (PPSN X), volume 5199, pages 52–61, Berlin, Heidelberg. Springer Verlag.

Liu, J., Saint-Pierre, D. L., Teytaud, O., et al. (2014). A mathematically derived number of resamplings for noisy optimization. In Genetic and Evolutionary Computation Conference (GECCO 2014).

Mascagni, M. and Chi, H. (2004). On the scrambled Halton sequence. Monte Carlo Methods Appl., 10(3):435–442.


Mnih, V., Szepesvári, C., and Audibert, J.-Y. (2008). Empirical Bernstein stopping. In ICML '08: Proceedings of the 25th International Conference on Machine Learning, pages 672–679, New York, NY, USA. ACM.

Nagarajan, V., Marcolino, L. S., and Tambe, M. (2015). Every team deserves a second chance: Identifying when things go wrong (student abstract version). In 29th Conference on Artificial Intelligence (AAAI 2015), Texas, USA.

Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia.

Rechenberg, I. (1973). Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog Verlag, Stuttgart.

Robson, J. M. (1983). The complexity of Go. In IFIP Congress, pages 413–417.


Rolet, P. and Teytaud, O. (2010). Adaptive noisy optimization. In Di Chio, C., Cagnoni, S., Cotta, C., Ebner, M., Ekárt, A., Esparcia-Alcázar, A., Goh, C.-K., Merelo, J., Neri, F., Preuß, M., Togelius, J., and Yannakakis, G., editors, Applications of Evolutionary Computation, volume 6024 of Lecture Notes in Computer Science, pages 592–601. Springer Berlin Heidelberg.

Saint-Pierre, D. L. and Teytaud, O. (2014). Nash and the Bandit Approach for Adversarial Portfolios. In CIG 2014 - Computational Intelligence in Games, page 7, Dortmund, Germany. IEEE.

Samulowitz, H. and Memisevic, R. (2007). Learning to solve QBF. In Proceedings of the 22nd National Conference on Artificial Intelligence, pages 255–260. AAAI.

Shamir, O. (2013). On the complexity of bandit and derivative-free stochastic convex optimization. In COLT 2013 - The 26th Annual Conference on Learning Theory, June 12–14, 2013, Princeton University, NJ, USA, pages 3–24.


Storn, R. (1996). On the usage of differential evolution for function optimization. In Fuzzy Information Processing Society, 1996. NAFIPS. 1996 Biennial Conference of the North American, pages 519–523. IEEE.

von Stengel, B. (2002). Computing equilibria for two-person games. Handbook of Game Theory, 3:1723–1759.

Wang, X. and Hickernell, F. (2000). Randomized Halton sequences. Math. Comput. Modelling, 32:887–899.
