Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Ryan Adams, Hugo Larochelle. NIPS 2012.

Page 1: Practical Bayesian Optimization of Machine Learning Algorithms (becs.aalto.fi/en/research/bayes/courses/4613/Vik_Kamath...)

Practical Bayesian Optimization of Machine Learning Algorithms

Jasper Snoek, Ryan Adams, Hugo Larochelle – NIPS 2012

Page 3:

“... (Gaussian Processes) are inadequate for doing speech and vision. I still think they're inadequate for doing speech and vision. But when you're in a domain where you have no prior knowledge and the only thing that you can expect is that similar inputs should have similar outputs, then Gaussian Processes are ideal.”

“... Gaussian processes are a way of using Machine Learning to simulate the graduate student.”

- Geoff Hinton

Pages 4-5:

Motivation

[Figure: diagram not recovered; surviving labels are 1, 2, 3, ..., N]

Deep Neural Networks Require Skill to Set Hyperparameters

Page 7:

Common Strategies

- Grid Search
- Random Search: sometimes better, because some hyperparameters have little or no effect
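
One way to see why random search can win when a hyperparameter has no effect: with the same budget, a grid tries only a few distinct values along each axis, while random search tries a new value on every trial. The sketch below is a toy illustration under that assumption; the helper names are hypothetical and not from the slides.

```python
import itertools
import random


def grid_points(values_per_dim, n_dims):
    """All combinations of evenly spaced values in [0, 1] along each axis."""
    axis = [i / (values_per_dim - 1) for i in range(values_per_dim)]
    return list(itertools.product(axis, repeat=n_dims))


def random_points(n_points, n_dims, seed=0):
    """Uniform random points in the unit hypercube (seeded for repeatability)."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(n_dims)) for _ in range(n_points)]


def distinct_values(points, dim):
    """How many different settings of one hyperparameter were actually tried."""
    return len({p[dim] for p in points})


# Same budget of 9 trials in a 2-D hyperparameter space.
grid = grid_points(values_per_dim=3, n_dims=2)
rand = random_points(n_points=9, n_dims=2)

print("grid:   distinct values of the first hyperparameter =", distinct_values(grid, 0))
print("random: distinct values of the first hyperparameter =", distinct_values(rand, 0))
```

If only the first hyperparameter matters, the grid has effectively spent 9 trials to test 3 settings of it, whereas random search has tested 9.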

Page 8:

Can we use Machine Learning instead?

- To predict regions of the hyperparameter space that might give better results.

- To predict how well a new combination of hyperparameters will do, and to model the uncertainty of that prediction.

Page 12:

Bayesian Optimization

- Frame Hyperparameter Search as an Optimization Problem

- Treat estimating the function that maps high-level parameters (hyperparameters) to the error metric as a regression problem

- Use a GP prior (“similar inputs have similar outputs”) to build a statistical model of the function. The prior is weak but general and effective.

- Use statistics to tell us:
  • Location of the expected minimum of the function
  • Expected improvement from trying other parameters

Page 13:

Bayesian Optimization (Mockus '78)

- Method for the global optimization of multi-modal, computationally expensive black box functions

- Assumes that the unknown function was sampled from a Gaussian Process (prior) and uses the observations (likelihood) to maintain a posterior

- Observations are the measure of generalization performance under different settings of the hyperparameters we wish to optimize.

- The next set of hyperparameters is selected from the maintained posterior, using a strategy determined by the acquisition function
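
The four points above can be sketched as a loop: fit a GP posterior to the observations so far, score candidates with an acquisition function, evaluate the best-scoring candidate, repeat. The code below is a minimal one-dimensional sketch, not the authors' implementation. It assumes a zero-mean GP with a fixed squared-exponential kernel, expected improvement as the acquisition function, a finite candidate grid, and a toy objective standing in for a validation error; all function names are hypothetical.

```python
import math


def kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def posterior(x, xs, ys, noise=1e-6):
    """GP predictive mean and standard deviation at x (zero prior mean)."""
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [kernel(x, a) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = kernel(x, x) - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))


def expected_improvement(mean, std, best):
    """Expected improvement over the best (lowest) value seen so far."""
    if std == 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return std * (z * cdf + pdf)


def bayes_opt(objective, candidates, initial_xs, n_iter):
    """Minimize `objective` over `candidates`, one evaluation per iteration."""
    xs = list(initial_xs)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break
        best = min(ys)
        x_next = max(
            remaining,
            key=lambda c: expected_improvement(*posterior(c, xs, ys), best),
        )
        xs.append(x_next)
        ys.append(objective(x_next))
    i = min(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i], xs


def toy_error(x):
    """Stand-in for an expensive validation error (assumption for the demo)."""
    return (x - 2.0) ** 2


grid = [i / 10.0 for i in range(51)]  # candidate hyperparameter values in [0, 5]
best_x, best_y, evaluated = bayes_opt(toy_error, grid, [0.0, 5.0], n_iter=10)
print("best x:", best_x, "best error:", best_y)
```

A real setup would also fit the kernel parameters rather than fixing them, and would optimize the acquisition function over a continuous space instead of a grid.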

Page 15:

Gaussian Processes

A Gaussian process specifies a distribution over functions such that the function values at any finite set of N points follow a multivariate Gaussian distribution.

The properties of the resulting distribution over functions are specified by a mean function and a positive definite covariance function.

Page 16:

The predictive mean and covariance given the observations are given by:
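
For reference, the standard GP predictive equations can be written as follows. The notation here is an assumption, not taken from the slide: observations $\{x_n, y_n\}_{n=1}^N$ with targets stacked in $\mathbf{y}$, a zero prior mean, covariance function $k$, Gram matrix $K_{nm} = k(x_n, x_m)$, cross-covariance vector $\mathbf{k}(x)_n = k(x, x_n)$, and observation noise variance $\nu$.

```latex
\begin{align}
  \mu(x) &= \mathbf{k}(x)^{\top} \left( K + \nu I \right)^{-1} \mathbf{y}, \\
  \sigma^{2}(x) &= k(x, x) - \mathbf{k}(x)^{\top} \left( K + \nu I \right)^{-1} \mathbf{k}(x).
\end{align}
```

The mean is a weighted combination of the observed targets, and the variance shrinks from its prior value $k(x, x)$ as $x$ moves closer to observed inputs.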

Page 17:

Intuition

• GPs are a prior for smooth functions

• Similar inputs (high covariance) should have similar outputs
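
A small numerical sketch can make this intuition concrete: near an observed input the GP predicts something close to the observed output with little uncertainty, and far from all observations it falls back to the prior. The code assumes a zero-mean GP with a squared-exponential kernel and a tiny noise term; the function names and toy data are hypothetical.

```python
import math


def sq_exp_kernel(a, b, length_scale=1.0):
    """Covariance is high when inputs are close, near zero when far apart."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def gp_posterior(x_new, xs, ys, noise=1e-6, length_scale=1.0):
    """Predictive mean and variance at x_new given observations (xs, ys)."""
    K = [[sq_exp_kernel(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [sq_exp_kernel(x_new, a, length_scale) for a in xs]
    alpha = solve(K, ys)        # (K + noise I)^-1 y
    v = solve(K, k_star)        # (K + noise I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = sq_exp_kernel(x_new, x_new, length_scale) - sum(
        k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)


xs = [0.0, 1.0, 2.0]   # toy inputs
ys = [1.0, 0.5, 2.0]   # toy outputs

mean_near, var_near = gp_posterior(1.0, xs, ys)   # at an observed input
mean_far, var_far = gp_posterior(10.0, xs, ys)    # far from every observation
print("near:", mean_near, var_near)
print("far: ", mean_far, var_far)
```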

Page 19:

Intuition

Exploration: Seek places with high variance.
Exploitation: Seek places near those where you're already doing well.

The acquisition function balances these to determine the point of next evaluation.

Page 20:

Acquisition Functions

The acquisition function tells us which experiment to run next and how good we expect it to be.

1. GP Upper Confidence Bound
   Idea: Minimize regret over the course of the optimization. Balance exploration and exploitation.

2. Expected Improvement
   Idea: How much can I expect to improve over the best I've seen so far by running an experiment with these parameters?
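
Both acquisition functions above have closed forms given the GP predictive mean and standard deviation at a candidate point. The sketch below assumes minimization of an error metric, so the confidence-bound criterion is written as a lower bound that one would minimize; the function names and the default `kappa` are illustrative assumptions.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_so_far):
    """Expected amount by which a point beats the best (lowest) value so far."""
    if std <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(best_so_far - mean, 0.0)
    gamma = (best_so_far - mean) / std
    return std * (gamma * normal_cdf(gamma) + normal_pdf(gamma))


def lower_confidence_bound(mean, std, kappa=2.0):
    """Optimistic estimate of the error; kappa trades off exploration."""
    return mean - kappa * std


# Two candidates with the same predicted error but different uncertainty.
print(expected_improvement(mean=0.0, std=1.0, best_so_far=0.0))
print(expected_improvement(mean=0.0, std=2.0, best_so_far=0.0))
print(lower_confidence_bound(mean=1.0, std=0.5))
```

With equal predicted means, the more uncertain candidate has the larger expected improvement, which is how the criterion rewards exploration.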

Pages 21-28:

Intuition

Page 29:

An Eggsperiment

Parameters:

- Boiling Time (1-12 min)
- Cooling Time (1-12 min)
- Salt (0-10 pinches)
- Pepper (0-10 pinches)

Optimal 'Soft Boiled Egg'
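
The four parameters above define a bounded search space. A common convention, assumed here rather than stated on the slide, is to rescale every parameter to the unit interval before modeling so that all dimensions start on a comparable scale. The dictionary keys and helper names below are hypothetical.

```python
# Bounds taken from the slide; key names are illustrative.
SEARCH_SPACE = {
    "boiling_time_min": (1.0, 12.0),
    "cooling_time_min": (1.0, 12.0),
    "salt_pinches": (0.0, 10.0),
    "pepper_pinches": (0.0, 10.0),
}


def to_unit_cube(config):
    """Map a parameter dictionary to a list of values in [0, 1]."""
    return [
        (config[name] - low) / (high - low)
        for name, (low, high) in SEARCH_SPACE.items()
    ]


def from_unit_cube(unit_values):
    """Map values in [0, 1] back to the original parameter ranges."""
    return {
        name: low + u * (high - low)
        for u, (name, (low, high)) in zip(unit_values, SEARCH_SPACE.items())
    }


example = {
    "boiling_time_min": 1.0,
    "cooling_time_min": 12.0,
    "salt_pinches": 5.0,
    "pepper_pinches": 0.0,
}
unit = to_unit_cube(example)
print(unit)
print(from_unit_cube(unit))
```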

Pages 30-39:

[Figures: results after 5, 10, 12, 14, 16, 20, and 25 iterations]

Page 40:

Practical Bayesian Optimization

• Integrate out all parameters of the GP model in Bayesian optimization
• Choose an appropriate covariance function
• Choice of acquisition function is important
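
The first point can be written as an integrated acquisition function. The notation is an assumption: $a(x; \{x_n, y_n\}, \theta)$ is the acquisition function under GP hyperparameters $\theta$ (length scales, amplitude, noise, mean), and the integral is taken over their posterior given the observations.

```latex
\begin{equation}
  \hat{a}\left(x; \{x_n, y_n\}\right)
  = \int a\left(x; \{x_n, y_n\}, \theta\right)\,
    p\left(\theta \mid \{x_n, y_n\}_{n=1}^{N}\right)\, d\theta
\end{equation}
```

In practice the integral is approximated by averaging the acquisition function over samples of $\theta$ drawn from this posterior, rather than committing to a single point estimate.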

Page 41:

Accounting for additional cost – Expected Improvement per Second

Incorporate a preference towards choosing points that are not only good, but likely to be evaluated quickly
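
A minimal sketch of this preference, assuming a second model that predicts the log of the evaluation time for each candidate: divide the expected improvement by the predicted duration and pick the candidate with the best ratio. The candidate names and numbers below are made up for illustration.

```python
import math


def expected_improvement_per_second(ei, predicted_log_seconds):
    """Scale expected improvement by the predicted evaluation time."""
    return ei / math.exp(predicted_log_seconds)


# Hypothetical candidates: (name, expected improvement, predicted log-duration).
candidates = [
    ("large_model", 0.50, math.log(100.0)),
    ("small_model", 0.30, math.log(10.0)),
]

best_by_ei = max(candidates, key=lambda c: c[1])[0]
best_by_ei_per_second = max(
    candidates,
    key=lambda c: expected_improvement_per_second(c[1], c[2]),
)[0]

print("highest EI:           ", best_by_ei)
print("highest EI per second:", best_by_ei_per_second)
```

The slower candidate promises more improvement per run, but the faster one promises more improvement per unit of wall-clock time, so the two criteria disagree.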

Page 43:

Parallelizing Bayesian Optimization

'N' completed evaluations
'J' pending evaluations

[Figure panels: posterior samples after 3 observations; expected improvement under individual samples; integrated expected improvement]
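
The integrated expected improvement in the last panel can be written as an expectation over the unknown outcomes of the pending evaluations. The notation is an assumption: $\{x_n, y_n\}_{n=1}^{N}$ are the completed evaluations, $\{x_j\}_{j=1}^{J}$ are the pending inputs with not-yet-observed outcomes $\{y_j\}$, and $\theta$ are the GP hyperparameters.

```latex
\begin{equation}
  \hat{a}\left(x; \{x_n, y_n\}, \theta, \{x_j\}\right)
  = \int_{\mathbb{R}^{J}}
    a\left(x; \{x_n, y_n\}, \theta, \{x_j, y_j\}\right)\,
    p\left(\{y_j\}_{j=1}^{J} \mid \{x_j\}_{j=1}^{J}, \{x_n, y_n\}_{n=1}^{N}\right)\,
    dy_1 \cdots dy_J
\end{equation}
```

A Monte Carlo estimate draws possible outcomes for the pending points from the GP posterior, computes the expected improvement under each draw, and averages the results, which matches the three panels listed above.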

Page 45:

Implications

Impossible to find by hand!!

CIFAR-10, 9 Hyperparameters

Page 49:

Benefits

For each input dimension, an appropriate scale for measuring similarity is learned: are 200 and 300 as similar as 2.0 and 3.0?

What is the sensitivity to each dimension? Which dimensions don't matter?

Reproducible research: level the playing field. It's a lot more honest than human beings.

If you have the resources to run a fairly large number of experiments, Bayesian optimization is better than a person at finding good combinations of hyperparameters.
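
The per-dimension similarity scale mentioned above is usually implemented as a kernel with one length scale per input dimension. The sketch below uses the Matérn 5/2 form, the covariance the paper recommends; the specific length-scale values are invented to illustrate the "200 vs 300" question and are not learned from data here.

```python
import math


def matern52_ard(x, x_prime, length_scales, amplitude=1.0):
    """Matern 5/2 covariance with a separate length scale per dimension."""
    r2 = sum(((a - b) / l) ** 2 for a, b, l in zip(x, x_prime, length_scales))
    s = math.sqrt(5.0 * r2)
    return amplitude * (1.0 + s + 5.0 * r2 / 3.0) * math.exp(-s)


# With a length scale of 1000, the values 200 and 300 are very similar.
similarity_hundreds = matern52_ard([200.0], [300.0], [1000.0])

# With a length scale of 1, the values 2.0 and 3.0 are much less similar.
similarity_units = matern52_ard([2.0], [3.0], [1.0])

# A very large length scale makes a dimension irrelevant to the similarity.
ignores_second_dim = matern52_ard([0.0, 0.0], [0.0, 5.0], [1.0, 1e6])

print(similarity_hundreds, similarity_units, ignores_second_dim)
```

Reading off the learned length scales after fitting answers the sensitivity question: a dimension with a very large length scale barely changes the covariance and so barely matters.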

Page 50:

References:

[Paper] Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Hugo Larochelle and Ryan P. Adams. Advances in Neural Information Processing Systems, 2012.

[Talk/Slides] Jasper Snoek: "Bayesian Optimization for Machine Learning and Science". https://www.youtube.com/watch?v=a79klpzaPgY

[Book] Machine Learning: a Probabilistic Perspective. Kevin Murphy. http://www.cs.ubc.ca/~murphyk/MLbook/index.html
