Non-convex optimization
Issam Laradji
Strongly Convex

[Plot: a strongly convex objective function f(x) versus x]

Assumptions:
● Gradient Lipschitz continuous
● Strongly convex

Algorithm: randomized coordinate descent.
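The slide's formulas did not survive extraction; in standard form (a reconstruction, not the slide's exact notation), the two assumptions and the resulting rate for gradient descent with step size 1/L are:

```latex
% L-Lipschitz continuous gradient:
\|\nabla f(x) - \nabla f(y)\| \le L \,\|x - y\| \quad \text{for all } x, y
% \mu-strong convexity:
f(y) \ge f(x) + \nabla f(x)^\top (y - x) + \frac{\mu}{2}\|y - x\|^2
% Linear convergence of gradient descent under both assumptions:
f(x_k) - f^* \le \Bigl(1 - \frac{\mu}{L}\Bigr)^{k} \bigl(f(x_0) - f^*\bigr)
```

Randomized coordinate descent achieves an analogous linear rate, roughly with L replaced by a coordinate-wise Lipschitz constant.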
Non-strongly Convex optimization

Assumptions:
● Gradient Lipschitz continuous

Convergence rate: sublinear, compared to the linear rate obtained under strong convexity.
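The rates themselves were formulas on the slide and were lost in extraction; the standard statements, assuming gradient descent with step size 1/L, are:

```latex
% With only an L-Lipschitz gradient, the rate is sublinear:
f(x_k) - f^* \le \frac{L \,\|x_0 - x^*\|^2}{2k}
% versus the linear rate (1 - \mu/L)^k available under strong convexity.
```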
Non-Strongly Convex

[Plot: objective function]

Assumptions:
● Gradient Lipschitz continuous
● Restricted secant inequality

Algorithm: randomized coordinate descent.
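The restricted secant inequality appeared as a formula on the slide; its standard form (a reconstruction) requires, for some μ > 0,

```latex
\nabla f(x)^\top (x - x_p) \ge \mu \,\|x - x_p\|^2 \quad \text{for all } x,
```

where x_p denotes the projection of x onto the set of minimizers. It relaxes strong convexity while still forcing negative gradients to point toward the solution set.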
Invex functions (a generalization of convex functions)

Assumptions:
● Gradient Lipschitz continuous
● Polyak [1963] inequality

This inequality simply requires that the gradient grows faster than a linear function as we move away from the optimal function value. Randomized coordinate descent converges for invex functions where this inequality holds. An invex function has one global minimum: every stationary point is a global minimum.

[Venn diagram: the Polyak and Convex function classes, both contained within the Invex class]
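The Polyak [1963] inequality (now often called the Polyak–Łojasiewicz, or PL, inequality) is standardly written as:

```latex
\frac{1}{2}\,\|\nabla f(x)\|^2 \ge \mu \,\bigl(f(x) - f^*\bigr) \quad \text{for all } x
```

It yields the same linear (1 - μ/L)^k rate for gradient descent as strong convexity does, without requiring convexity.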
Non-convex functions

[Plot: a non-convex function with a global minimum, local minima, and local maxima]

Strategy 1: local optimization of the non-convex function
● Assumptions for local non-convex optimization: gradient Lipschitz continuous, locally convex
● All convex-function rates apply locally (e.g., local randomized coordinate descent)
● Issue: dealing with saddle points

Strategy 2: global optimization of the non-convex function
● Issue: exponential number of saddle points
Local non-convex optimization
● Gradient descent
○ Difficult to define a proper step size
● Newton method
○ Solves the slowness problem by rescaling the gradient in each direction by the inverse of the corresponding eigenvalue of the Hessian
○ Can result in moving in the wrong direction (negative eigenvalues)
● Saddle-free Newton method
○ Rescales the gradient by the inverse of the Hessian with its eigenvalues replaced by their absolute values, approximated via the Hessian's Lanczos vectors (see the sketch below)
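A minimal dense-matrix sketch of the saddle-free rescaling, assuming a full eigendecomposition is affordable (the actual method uses Lanczos vectors in high dimensions); the function name and damping constant are illustrative:

```python
import numpy as np

def saddle_free_newton_step(grad, hess, damping=1e-4):
    """One saddle-free Newton step: rescale the gradient by |H|^{-1},
    where |H| replaces each Hessian eigenvalue by its absolute value."""
    eigvals, eigvecs = np.linalg.eigh(hess)
    # Absolute eigenvalues turn negative-curvature directions into
    # descent directions; damping guards against near-zero curvature.
    inv_abs = 1.0 / (np.abs(eigvals) + damping)
    return -eigvecs @ (inv_abs * (eigvecs.T @ grad))

# Toy saddle: f(x, y) = x^2 - y^2 at the point (0.25, 0.25).
grad = np.array([0.5, -0.5])                  # [2x, -2y]
hess = np.array([[2.0, 0.0], [0.0, -2.0]])    # indefinite Hessian
print(saddle_free_newton_step(grad, hess))
# -> approximately [-0.25, 0.25]: descends in x and moves y away from
#    the saddle; the plain Newton step would move y back toward it.
```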
Local non-convex optimization
● Random stochastic gradient descent
○ Sample noise r uniformly from the unit sphere and add it to the update
○ Escapes saddle points, but the step size is difficult to determine
● Cubic regularization [Nesterov 2006]
○ Assumes the gradient is Lipschitz continuous and the Hessian is Lipschitz continuous
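The cubic-regularization update minimizes a third-order model at each iteration; in standard form (the slide's formulas were lost in extraction):

```latex
x_{k+1} = \arg\min_{y} \; f(x_k) + \nabla f(x_k)^\top (y - x_k)
  + \frac{1}{2} (y - x_k)^\top \nabla^2 f(x_k) \,(y - x_k)
  + \frac{M}{6} \|y - x_k\|^3
```

where M is (an upper bound on) the Lipschitz constant of the Hessian.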
● Momentum
○ Can help escape saddle points (the rolling-ball intuition); a sketch combining the noise and momentum heuristics follows
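A toy sketch of the two escape heuristics above on a saddle; the step size, momentum constant, and noise scale are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturbed_momentum_step(x, v, grad, lr=0.05, beta=0.9, noise=0.01):
    """One update combining an isotropic perturbation r drawn uniformly
    from the unit sphere with a momentum velocity that accumulates
    past gradients."""
    r = rng.normal(size=x.shape)
    r /= np.linalg.norm(r)            # uniform direction on the unit sphere
    v = beta * v - lr * (grad(x) + noise * r)
    return x + v, v

# Toy saddle f(x, y) = x^2 - y^2, starting exactly at the saddle point,
# where the gradient vanishes and plain gradient descent would stall.
grad = lambda z: np.array([2.0 * z[0], -2.0 * z[1]])
x, v = np.zeros(2), np.zeros(2)
for _ in range(50):
    x, v = perturbed_momentum_step(x, v, grad)
print(x)  # the y-coordinate has drifted away from 0: the saddle is escaped
```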
Global non-convex optimization
● Matrix completion problem [De Sa et al. 2015]
● Applications (usually for datasets with missing data)
○ Matrix completion
○ Image reconstruction
○ Recommendation systems
Global non-convex optimization
● Matrix completion problem [De Sa et al. 2015]
● Reformulate it as an unconstrained non-convex problem
● Apply gradient descent (see the sketch below)
● Using the Riemannian manifold, a global convergence guarantee can be derived
● This widely used algorithm converges globally using only random initialization
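The slide's equations were lost in extraction; a common low-rank reformulation, minimizing the squared error over observed entries of a factorization U Vᵀ with plain gradient descent, looks roughly like this (a generic sketch, not De Sa et al.'s exact formulation):

```python
import numpy as np

def complete_matrix(M, mask, rank=1, lr=0.02, steps=3000, seed=0):
    """Low-rank matrix completion sketch: factor M ~ U @ V.T and run
    gradient descent on the squared error over observed entries only."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.normal(scale=0.1, size=(m, rank))   # random initialization
    V = rng.normal(scale=0.1, size=(n, rank))
    for _ in range(steps):
        R = mask * (U @ V.T - M)                # residual on observed entries
        U, V = U - lr * R @ V, V - lr * R.T @ U
    return U @ V.T

# Toy example: a rank-1 matrix with roughly half the entries observed.
M = np.outer([1.0, 2.0, 3.0], [1.0, 0.5, 2.0])
mask = np.random.default_rng(1).random(M.shape) < 0.5
print(np.round(complete_matrix(M, mask), 2))
```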
Convex relaxation of non-convex optimization problems
● Convex Neural Networks [Bengio et al. 2006]
○ Single-hidden-layer network; the original neural-network training problem is non-convex
○ Use alternating minimization (a rough sketch follows)
○ Potential issues with the activation function
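A rough sketch of the alternating idea, assuming hidden units are added greedily and the output weights are refit by a convex least-squares step; the candidate-sampling scheme and all names are illustrative, not Bengio et al.'s exact algorithm:

```python
import numpy as np

def fit_output_weights(H, y):
    # Convex step: least squares over the output weights, hidden units fixed.
    return np.linalg.lstsq(H, y, rcond=None)[0]

def convex_nn_fit(X, y, units=10, candidates=500, seed=0):
    """Greedy alternating scheme: repeatedly add the candidate hidden unit
    most correlated with the current residual, then refit all output
    weights by solving a convex least-squares problem."""
    rng = np.random.default_rng(seed)
    H = np.ones((len(X), 1))                      # start with a bias column
    for _ in range(units):
        resid = y - H @ fit_output_weights(H, y)  # current residual
        W = rng.normal(size=(X.shape[1], candidates))
        b = rng.normal(size=candidates)
        acts = np.tanh(X @ W + b)                 # random candidate units
        best = np.argmax(np.abs(acts.T @ resid))  # most useful candidate
        H = np.column_stack([H, acts[:, best]])
    return H, fit_output_weights(H, y)

# Toy 1-D regression problem.
X = np.linspace(-2, 2, 100)[:, None]
y = np.sin(3 * X[:, 0])
H, w = convex_nn_fit(X, y)
print(np.mean((H @ w - y) ** 2))   # training mean-squared error
```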
Bayesian optimization (global non-convex optimization)
● Typically used for finding optimal hyperparameters
○ Determining the step size or the number of hidden layers for neural networks
○ The parameter values are bounded
● Other methods include sampling the parameter values uniformly at random
○ Grid search
Bayesian optimization (global non-convex optimization)
● Fit a Gaussian process to the observed data (purple shade)
○ Gives a probability distribution over the function values
● Acquisition function (green shade): a function of
■ the objective value (exploitation) under the Gaussian density function; and
■ the uncertainty in the prediction (exploration).

A sketch of the resulting loop follows.
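A minimal sketch of the loop, assuming an RBF-kernel Gaussian process and a lower-confidence-bound acquisition (the slides do not specify the kernel or acquisition function; the objective and all constants are illustrative):

```python
import numpy as np

def rbf(a, b, ls=0.3):
    # Squared-exponential kernel on scalar inputs.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xq, noise=1e-4):
    """GP posterior mean and standard deviation at query points Xq."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xq, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.sqrt(np.maximum(var, 0.0))

# Minimize a toy non-convex objective on the bounded interval [0, 1].
f = lambda x: np.sin(8 * x) + 0.5 * x           # hypothetical objective
grid = np.linspace(0, 1, 200)
X = np.array([0.1, 0.9])                        # two initial observations
y = f(X)
for _ in range(10):
    mu, sd = gp_posterior(X, y, grid)
    # Lower-confidence-bound acquisition: low predicted value
    # (exploitation) plus high uncertainty (exploration).
    x_next = grid[np.argmin(mu - 2.0 * sd)]
    X, y = np.append(X, x_next), np.append(y, f(x_next))
print(X[np.argmin(y)], y.min())                 # best point found
```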
Bayesian optimization
● Slower than grid search when the objective has a low level of smoothness
● Faster than grid search when the objective has a high level of smoothness

[Figure: grid search vs. Bayesian optimization, plotted against a measure of smoothness]
Summary

Non-convex optimization
● Strategy 1: Local non-convex optimization
○ Convex convergence rates apply locally
○ Escape saddle points using, for example, cubic regularization or the saddle-free Newton update
● Strategy 2: Relaxing the non-convex problem to a convex problem
○ Convex neural networks
● Strategy 3: Global non-convex optimization
○ Bayesian optimization
○ Matrix completion