Overview and Recent Advances in Derivative Free Optimization
Katya Scheinberg
Joint work with A. Berahas, J. Blanchet, L. Cao, C. Cartis, A. R. Conn, M. Menickelly, C. Paquette, L. Vicente
School of Operations Research and Information Engineering
IPAM Workshop: From Passive to Active: Generative and Reinforcement Learning with Physics, Sept 23-27, 2019
Local and Global Optimization
From Roos, Terlaky and De Klerk, "Nonlinear Optimisation", 2002.
Katya Scheinberg (Cornell University) 2 / 37
Optimization and gradient descent
Black Box Optimization Problems

min_{x∈ℝⁿ} f(x)

x → [BLACK BOX] → f(x)

f is a nonlinear function; derivatives of f are not available.

Noisy functions, stochastic or deterministic:

min_{x∈ℝⁿ} f(x) = φ(x) + ε(x)   (additive noise)

min_{x∈ℝⁿ} f(x) = φ(x)(1 + ε(x))   (multiplicative noise)
Motivation
Machine Learning
Source(s): https://blog.statsbot.co/, https://campus.datacamp.com/
Deep Learning
Source(s): https://medium.com/
Reinforcement Learning
Source(s): http://people.csail.mit.edu/
Optimizing properties obtained from expensive simulations or experiments

Critical temperatures from molecular dynamics simulations (Dignon et al., ACS Cent. Sci., Article ASAP)

Reaction rate estimation from kinetic Monte Carlo simulations (ZACROS, http://zacros.org/tutorials)

Activation barriers from quantum mechanical nudged elastic band calculations (Andersen et al., Front. Chem., 2019)

Yield estimation from experimental organic synthesis reactor systems (Holmes et al., React. Chem. Eng., 2016, 1, 36)

• Many examples exist in the domain of molecular and materials science where calculating a property requires expensive computations or experiments
• In many of these cases, derivatives are not available
Derivative-free methods: direct and random search

Iterative algorithms that converge to a local optimum.

In each iteration:

1 Evaluate a set of sample points around the current iterate;

2 Choose the sample point with the best function value;

3 Make this point the next iterate.
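The three steps above can be sketched as a simple compass (coordinate) direct search. This is an illustrative toy implementation, not one of the methods benchmarked in the talk; the polling set (±coordinate directions) and the 0.5 shrink factor are arbitrary choices.

```python
import numpy as np

def direct_search(f, x0, step=1.0, tol=1e-6, max_evals=10000):
    """Compass direct search: poll x +/- step*e_i, move to the best
    improving point, otherwise shrink the step."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    evals = 0
    while step > tol and evals < max_evals:
        # 1. evaluate a set of sample points around the current iterate
        candidates = [x + s * step * e for e in np.eye(n) for s in (1.0, -1.0)]
        values = [f(c) for c in candidates]
        evals += len(candidates)
        # 2. choose the sample point with the best function value
        best = int(np.argmin(values))
        if values[best] < f(x):
            x = candidates[best]      # 3. make this point the next iterate
        else:
            step *= 0.5               # no improvement: refine the mesh
    return x, evals

x_best, n_evals = direct_search(lambda z: (z[0] - 1.0)**2 + (z[1] + 2.0)**2,
                                [0.0, 0.0])
```

Note how the evaluation count grows: every poll costs 2n evaluations, which is why direct search is expensive in terms of samples.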
Derivative-free methods: model-based
Iterative algorithms that converge to a local optimum.
In each iteration:
1 Evaluate a set of sample points around the current iterate;
2 Interpolate the sample points with a linear or quadratic model;
3 Use this model to find the next iterate;
Model-Based Trust Region Method (pioneered by M.J.D. Powell)
(a) starting point (b) initial sampling
Shrinking and expanding the trust-region radius and exploiting curvature make the method efficient in terms of samples.
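A minimal sketch of the trust-region loop above, assuming NumPy. For simplicity the model here is linear (built from n coordinate samples at the trust-region scale) and the trial step is the model minimizer on the trust-region boundary, so this omits the curvature exploitation of real implementations such as Powell's; the acceptance threshold and resize factors are arbitrary choices.

```python
import numpy as np

def dfo_trust_region(f, x0, delta0=1.0, tol=1e-6, max_iter=500):
    """Toy model-based trust-region DFO loop: build a linear model from
    samples, step to its minimizer within the region, accept/reject,
    and shrink or expand the radius."""
    x = np.asarray(x0, dtype=float)
    delta = delta0
    n = x.size
    for _ in range(max_iter):
        # build a linear model g from n coordinate samples at scale delta
        g = np.array([(f(x + delta * e) - f(x)) / delta for e in np.eye(n)])
        if np.linalg.norm(g) < tol:
            break
        # minimize the linear model over the trust region: step along -g
        step = -delta * g / np.linalg.norm(g)
        predicted = -g @ step            # model decrease (always positive)
        actual = f(x) - f(x + step)      # true decrease
        if actual / predicted > 0.1:     # good agreement: accept, expand
            x = x + step
            delta = min(2.0 * delta, 10.0)
        else:                            # poor agreement: reject, shrink
            delta *= 0.5
    return x

x_min = dfo_trust_region(lambda z: (z[0] - 1.0)**2 + 2.0 * (z[1] + 0.5)**2,
                         [3.0, 2.0])
```

The ratio of actual to predicted decrease drives the radius update: good agreement expands the region, poor agreement shrinks it, which is what makes the model trustworthy near the iterate.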
Direct Search
11307 function evaluations
Random Search
3705 function evaluations
Trust Region Method
69 function evaluations
Active learning, generative models and derivative-free optimization

What does model-based derivative-free optimization do?

Using some "labeled" data (x, f(x)), build a model m(x). What do we want from that model m(x)? Quality? Simplicity?

Optimize m(x), or a "related function", to obtain a new, potentially interesting data point. What do we optimize?

Modify the model (how?), repeat.

What do we need for convergence?
Assumptions on models for convergence

For trust region, first-order convergence:

‖∇f(x_k) − ∇m_k(x_k)‖ ≤ O(∆_k)

For trust region, second-order convergence:

‖∇²f(x_k) − ∇²m_k(x_k)‖ ≤ O(∆_k),   ‖∇f(x_k) − ∇m_k(x_k)‖ ≤ O(∆_k²)

For line search, first-order convergence:

‖∇f(x_k) − ∇m_k(x_k)‖ ≤ O(α_k ‖∇m_k‖)

When the models are built from random samples, each of these bounds is required to hold with probability at least 1 − δ.

Intuition

In other words, the model should have a Taylor expansion comparable to that of the true function with respect to the step size.
Building models via linear interpolation

m(y) = f(x) + g(x)ᵀ(y − x),  where m(y) = f(y) for all y ∈ Y.

Let Y = {x + σy₁, ..., x + σyₙ}, σ > 0, and define

F_Y = (f(x + σy₁) − f(x), ..., f(x + σyₙ) − f(x))ᵀ ∈ ℝⁿ,   M_Y = (y₁, ..., yₙ)ᵀ ∈ ℝⁿˣⁿ.

The model m(y) is constructed to satisfy the interpolation conditions:

σ M_Y g = F_Y

Theorem [Conn, Scheinberg & Vicente, 2008]

Let Y = {x, x + σy₁, ..., x + σyₙ} be a set of interpolation points such that maxᵢ ‖yᵢ‖ ≤ 1 and M_Y is nonsingular. Suppose that the function f has L-Lipschitz continuous gradients. Then

‖∇m(x) − ∇f(x)‖ ≤ ‖M_Y⁻¹‖ √n σ L / 2.

Cost: O(n³) (reduces to O(n²) if M_Y is orthonormal and O(n) if M_Y = I).
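The system σ M_Y g = F_Y can be solved directly. A sketch assuming NumPy (the helper name is illustrative); with the default directions M_Y = I, the estimate reduces to forward finite differences.

```python
import numpy as np

def linear_interp_gradient(f, x, sigma, directions=None):
    """Estimate the gradient of f at x by linear interpolation:
    solve sigma * M_Y g = F_Y for g."""
    n = x.size
    if directions is None:
        directions = np.eye(n)   # M_Y = I: forward finite differences
    M = np.asarray(directions, dtype=float)
    # F_Y: function differences at the sample points x + sigma * y_i
    F = np.array([f(x + sigma * y) - f(x) for y in M])
    return np.linalg.solve(sigma * M, F)

# Example on a smooth function with a known gradient 2x + 1
f = lambda z: z @ z + z.sum()
x = np.array([1.0, -2.0, 0.5])
g = linear_interp_gradient(f, x, sigma=1e-4)
```

The estimate differs from the true gradient by O(σ), consistent with the theorem's ‖M_Y⁻¹‖ √n σ L / 2 bound.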
Quadratic Interpolation Models

m(y) = f(x) + g(x)ᵀ(y − x) + ½ (y − x)ᵀ H(x) (y − x),  where m(y) = f(y) for all y ∈ Y.

Let Y = {x + σy₁, ..., x + σy_N}, σ > 0, and define

F_Y = (f(x + σy₁) − f(x), ..., f(x + σy_N) − f(x))ᵀ ∈ ℝᴺ,   M_Y = [rows (yᵢᵀ, vec(yᵢyᵢᵀ)ᵀ), i = 1, ..., N] ∈ ℝᴺˣᴺ.

The model m(y) is constructed to satisfy the interpolation conditions:

σ M_Y (g, vec(H)) = F_Y

Theorem [Conn, Scheinberg & Vicente, 2008]

Let Y = {x, x + σy₁, ..., x + σy_{n+n(n+1)/2}} be a set of interpolation points such that maxᵢ ‖yᵢ‖ ≤ 1 and M_Y is nonsingular. Suppose that the function f has L-Lipschitz continuous Hessians. Then

‖∇m(x) − ∇f(x)‖ ≤ O(‖M_Y⁻¹‖ n σ² L),

‖∇²m(x) − ∇²f(x)‖ ≤ O(‖M_Y⁻¹‖ n σ L).

Cost: O(n⁶)
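A sketch of fitting the quadratic model by interpolation, assuming NumPy; the random sample set, the 2× oversampling (solved by least squares rather than a square system), and the helper name are illustrative choices. The unknowns are ordered as g followed by the upper triangle of H.

```python
import numpy as np

def quadratic_interp_model(f, x, sigma, rng=None):
    """Fit m(y) = f(x) + g^T(y-x) + 0.5 (y-x)^T H (y-x) from
    N >= n + n(n+1)/2 samples of f around x."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = x.size
    N = 2 * (n + n * (n + 1) // 2)       # oversample for a well-posed fit
    Y = rng.standard_normal((N, n))
    rows, rhs = [], []
    for y in Y:
        s = sigma * y
        # columns for 0.5 s^T H s: 0.5*s_i^2 on the diagonal, s_i*s_j off it
        quad = [0.5 * s[i] * s[j] if i == j else s[i] * s[j]
                for i in range(n) for j in range(i, n)]
        rows.append(np.concatenate([s, quad]))
        rhs.append(f(x + s) - f(x))
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    g, tri = coef[:n], coef[n:]
    H = np.zeros((n, n))
    idx = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = tri[idx]
            idx += 1
    return g, H

# For an exactly quadratic f, the fit recovers g = grad f(x) and H exactly
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
fq = lambda z: 0.5 * z @ A @ z + b @ z
g, H = quadratic_interp_model(fq, np.zeros(2), sigma=0.1)
```

The O(n⁶) cost in the slide comes from solving this system at its natural size N = n + n(n+1)/2 = O(n²).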
Interpolation model quality
Model deterioration
Some conclusions so far
Interpolation models allow old points to be reused and hence are very economical in terms of samples.

Linear algebra is expensive and, more importantly, can be ill-conditioned.

Linear-algebra cost and conditioning can be improved by using pre-designed sample sets, but this is more expensive in terms of samples (e.g., finite differences need n samples per gradient estimate).
What alternatives are there?
Gaussian Smoothing

F(x) = E_{ε∼N(0,I)} f(x + σε) = ∫_{ℝⁿ} f(x + σε) π(ε | 0, I) dε

π(y | x, Σ) is the pdf of N(x, Σ) evaluated at y.

F(x) is a Gaussian-smoothed approximation to f(x), with

∇F(x) = (1/σ) E_{ε∼N(0,I)} f(x + σε) ε

Idea: Approximate ∇f(x) by a sample average approximation of ∇F(x):

g(x) = (1/(Nσ)) Σ_{i=1}^{N} f(x + σεᵢ) εᵢ

Issue: the variance of this estimator tends to ∞ as σ → 0. Subtracting the baseline f(x), which does not change the expectation, removes the problem:

∇F(x) = (1/σ) E_{ε∼N(0,I)} (f(x + σε) − f(x)) ε,   g(x) = (1/(Nσ)) Σ_{i=1}^{N} (f(x + σεᵢ) − f(x)) εᵢ
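The baseline-corrected estimator can be sketched as follows, assuming NumPy; the helper name is illustrative, and the default N = 4n mirrors the experimental setup later in the talk.

```python
import numpy as np

def gsg_gradient(f, x, sigma=1e-3, N=None, rng=None):
    """Gaussian smoothed gradient (GSG) estimate with the f(x) baseline:
    g = 1/(N*sigma) * sum_i (f(x + sigma*eps_i) - f(x)) * eps_i."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = x.size
    N = 4 * n if N is None else N          # sample size, N = O(n) suffices
    eps = rng.standard_normal((N, n))      # eps_i ~ N(0, I)
    fx = f(x)                              # baseline, evaluated once
    return sum((f(x + sigma * e) - fx) * e for e in eps) / (N * sigma)

# Example: f(z) = z^T z has gradient 2z
f = lambda z: z @ z
x = np.array([1.0, -1.0, 2.0])
g = gsg_gradient(f, x, N=20000)
```

With the baseline, the per-sample term is (f(x + σε) − f(x))ε/σ ≈ (∇f(x)ᵀε)ε, whose variance stays bounded as σ → 0; without it, the f(x)ε/σ term blows up.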
Gaussian Smoothing
N = 1, theoretical analysis of convergence rates for convex problems
used in reinforcement learning, no theory, N is large
uses interpolation on top of sample average approximation
uniform distribution on a ball for online learning
uniform distribution on a ball for model-free LQR
Analysis of Variance for Gaussian Smoothing

‖g(x) − ∇f(x)‖ ≤ ‖g(x) − ∇F(x)‖ [sample average error] + ‖∇F(x) − ∇f(x)‖ [smoothing error] ≤ r + √n σL

Theorem [Berahas, Cao, S., 2019]

Suppose that the function f(x) has L-Lipschitz continuous gradients, and let g(x) denote the GSG approximation to ∇f(x). If

N ≥ (1 / (δ r²)) · (3n ‖∇f(x)‖² + n(n² + 6n + 8) L² σ² / 4),

then ‖g(x) − ∇f(x)‖ ≤ r + √n σL with probability at least 1 − δ.

Essentially N ∼ 3n.
Gradient Approximation Accuracy

Numerical experiment setup and results:

f(x) = Σ_{i=1}^{n/2} [M sin(x_{2i−1}) + cos(x_{2i})] + ((L − M)/(2n)) xᵀ 1_{n×n} x,

which has ‖∇f(0)‖ = √(n/2) M. We use n = 20, M = 1, L = 2, σ = 0.01, and N = 4n for the smoothing methods.
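This setup is easy to reproduce. A sketch assuming NumPy (function names are illustrative); the 1_{n×n} quadratic term is computed as (Σᵢ xᵢ)², since xᵀ 1_{n×n} x = (Σᵢ xᵢ)².

```python
import numpy as np

def f_test(x, M=1.0, L=2.0):
    """f(x) = sum_{i=1}^{n/2} [M sin(x_{2i-1}) + cos(x_{2i})]
              + (L-M)/(2n) * x^T 1_{nxn} x."""
    n = x.size
    odd, even = x[0::2], x[1::2]             # x_1, x_3, ... and x_2, x_4, ...
    ones_quad = x.sum() ** 2                 # x^T 1_{nxn} x = (sum_i x_i)^2
    return M * np.sin(odd).sum() + np.cos(even).sum() \
        + (L - M) / (2 * n) * ones_quad

def grad_f_test(x, M=1.0, L=2.0):
    """Analytic gradient, for checking the approximation methods against."""
    n = x.size
    g = np.zeros(n)
    g[0::2] = M * np.cos(x[0::2])
    g[1::2] = -np.sin(x[1::2])
    return g + (L - M) / n * x.sum() * np.ones(n)

n = 20
g0 = grad_f_test(np.zeros(n))
# at x = 0 only the n/2 sine terms contribute, so ||grad f(0)|| = sqrt(n/2)*M
```

With the analytic gradient available, the error of each estimator (finite differences, interpolation, GSG) can be measured exactly.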
Algorithm Performance
Moré & Wild problem set (53 smooth problems).

[Figure: performance profiles (top row) and data profiles (bottom row) for the best variant of each method, at accuracy levels τ = 10⁻¹, 10⁻³, and 10⁻⁵. Methods compared: FFD (LBFGS), CFD (LBFGS), GSG (SD, n), BSG (LBFGS, 4n), DFOTR.]
Algorithm Performance
FD = forward finite differences
LIOD = linear interpolation of orthogonal directions
LS = (backtracking) line search
GSG = Gaussian smoothed gradient
[Figure: reward vs. iterations for FD, LIOD, LIOD (LS), and GSG on reinforcement learning tasks: Swimmer (left), HalfCheetah (center), Reacher (right).]
Conclusions
Model-based derivative-free methods are efficient and theoretically sound.

Select the type of model according to the application, but make sure the theory applies.

Use randomization only when necessary, as it can slow down convergence.
Andrew R. Conn, Katya Scheinberg, and Luís N. Vicente. Introduction to Derivative-Free Optimization. MPS-SIAM Series on Optimization. SIAM, Philadelphia, USA, 2008.

Albert S. Berahas, Liyuan Cao, Krzysztof Choromanski, and Katya Scheinberg. A theoretical and empirical comparison of gradient approximations in derivative-free optimization. arXiv preprint arXiv:1905.01332, 2019.

Jeffrey Larson, Matt Menickelly, and Stefan M. Wild. Derivative-free optimization methods. arXiv preprint arXiv:1904.11585, 2019.
Thank you!