Ridge regression and Bayesian linear regression
Kenneth D. Harris
6/5/15
Multiple linear regression
• What are you predicting? Data type: continuous. Dimensionality: 1.
• What are you predicting it from? Data type: continuous. Dimensionality: $p$.
• How many data points do you have? Enough.
• What sort of prediction do you need? Single best guess.
• What sort of relationship can you assume? Linear.
Multiple linear regression
• What are you predicting? Data type: continuous. Dimensionality: 1.
• What are you predicting it from? Data type: continuous. Dimensionality: $p$.
• How many data points do you have? Not enough.
• What sort of prediction do you need? Single best guess.
• What sort of relationship can you assume? Linear.
Multiple predictors, one predicted variable
• Choose $\mathbf{w}$ to minimize the sum-squared error $E = \sum_i (y_i - \mathbf{w} \cdot \mathbf{x}_i)^2$.
• Optimal weight vector: $\hat{\mathbf{w}} = (X^\top X)^{-1} X^\top \mathbf{y}$ (in MATLAB: `w = X \ y`).
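A minimal sketch of this computation (the simulated data and variable names are illustrative, not from the slides):

```matlab
% Ordinary least squares on simulated data.
N = 100; p = 5;                      % data points, predictors
X = randn(N, p);                     % predictor matrix, one row per data point
w_true = randn(p, 1);                % weights used to simulate the target
y = X * w_true + 0.1 * randn(N, 1);  % target with a little noise

w_hat = X \ y;                       % least-squares solution, as on the slide
% Equivalent to inv(X' * X) * (X' * y) when X' * X is invertible.
```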
Too many predictors
• If $p \geq N$, you can fit the training data perfectly: $X\mathbf{w} = \mathbf{y}$ is $N$ equations in $p$ unknowns.
• If $p > N$, the solution is underconstrained ($X^\top X$ is not invertible).
• But even if $p < N$, you can have problems with too many predictors.
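A quick demo of the $p \geq N$ case (my own example, not from the slides): even a pure-noise target can be fit exactly.

```matlab
% With p >= N, any target can be fit exactly -- including pure noise.
N = 10; p = 15;
X = randn(N, p);
y = randn(N, 1);          % target is pure noise
w = pinv(X) * y;          % minimum-norm solution of N equations in p unknowns
max(abs(X * w - y))       % ~0: a "perfect" fit to noise
```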
[Figure: $N = 40$, $p = 30$, $y = x_1$.]
[Figure: $N = 40$, $p = 30$, $y = x_1 + \text{noise}$.]
Geometric interpretation
[Figure: the target vector, its signal and noise components, and the predictor vectors $\mathbf{x}_1$ and $\mathbf{x}_2$.]

The target can be fit exactly, by having a massive positive weight for $\mathbf{x}_1$ and a massive negative weight for $\mathbf{x}_2$. It would be better to just fit the signal.
Overfitting = large weight vectors
• Solution: penalize large weight vectors. Minimize $E = \sum_i (y_i - \mathbf{w} \cdot \mathbf{x}_i)^2 + \lambda |\mathbf{w}|^2$.
• Optimal weight vector: $\hat{\mathbf{w}} = (X^\top X + \lambda I)^{-1} X^\top \mathbf{y}$.
• The inverse can always be taken, even for $p > N$, since $X^\top X + \lambda I$ is positive definite for $\lambda > 0$.
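A minimal ridge regression sketch (the data and the value of $\lambda$ are illustrative choices, not from the slides):

```matlab
% Ridge regression: penalized least squares.
N = 40; p = 30; lambda = 3;
X = randn(N, p);
y = X(:, 1) + randn(N, 1);                       % y = x1 + noise, as in the example
w_ridge = (X' * X + lambda * eye(p)) \ (X' * y); % ridge solution
% X'*X + lambda*eye(p) is positive definite for lambda > 0,
% so this system is solvable even when p > N.
```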
Example
[Figure: fits with $\lambda = 0$ and $\lambda = 3$.]
Ridge regression introduces a bias
[Figure: fits with $\lambda = 0$ and $\lambda = 50$.]
A quick trick to do ridge regression
• Ordinary linear regression, $\hat{\mathbf{w}} = (X^\top X)^{-1} X^\top \mathbf{y}$, minimizes $\sum_i (y_i - \mathbf{w} \cdot \mathbf{x}_i)^2$.
• Define $\tilde{X} = \begin{pmatrix} X \\ \sqrt{\lambda} I_p \end{pmatrix}$ and $\tilde{\mathbf{y}} = \begin{pmatrix} \mathbf{y} \\ \mathbf{0} \end{pmatrix}$.
• Then the ordinary regression weight $\hat{\mathbf{w}} = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top \tilde{\mathbf{y}}$ is the solution to ridge regression. (Why? Because $\tilde{X}^\top \tilde{X} = X^\top X + \lambda I$ and $\tilde{X}^\top \tilde{\mathbf{y}} = X^\top \mathbf{y}$.)
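A sketch checking the trick numerically (simulated data, arbitrary $\lambda$):

```matlab
% Ridge regression via ordinary least squares on augmented data.
N = 40; p = 30; lambda = 3;
X = randn(N, p);
y = X(:, 1) + randn(N, 1);
X_aug = [X; sqrt(lambda) * eye(p)];   % append sqrt(lambda) * I as extra "data"
y_aug = [y; zeros(p, 1)];             % with target 0 for the extra rows
w_trick = X_aug \ y_aug;              % ordinary regression on augmented data
w_ridge = (X' * X + lambda * eye(p)) \ (X' * y);
max(abs(w_trick - w_ridge))           % ~0: the two solutions agree
```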
Regression as a probability model
• What are you predicting? Data type: continuous. Dimensionality: 1.
• What are you predicting it from? Data type: continuous. Dimensionality: $p$.
• How many data points do you have? Enough.
• What sort of prediction do you need? Probability distribution.
• What sort of relationship can you assume? Linear.
Regression as a probability model
• Assume $y$ is random, but $\mathbf{x}$ and $\mathbf{w}$ are just numbers: $y = \mathbf{w} \cdot \mathbf{x} + \epsilon$, with $\epsilon \sim N(0, \sigma^2)$.
• Then the likelihood is $p(\mathbf{y} \mid X, \mathbf{w}) = \prod_i \frac{1}{\sqrt{2\pi}\sigma} e^{-(y_i - \mathbf{w} \cdot \mathbf{x}_i)^2 / 2\sigma^2}$.
• Maximum likelihood is the same as the least-squares fit: maximizing the log likelihood means minimizing $\sum_i (y_i - \mathbf{w} \cdot \mathbf{x}_i)^2$.
Bayesian linear regression
• Now consider $\mathbf{w}$ to also be random, with prior distribution $\mathbf{w} \sim N(\mathbf{0}, \tau^2 I)$, i.e. $p(\mathbf{w}) \propto e^{-|\mathbf{w}|^2 / 2\tau^2}$.
• The posterior distribution is $p(\mathbf{w} \mid X, \mathbf{y}) \propto p(\mathbf{y} \mid X, \mathbf{w}) \, p(\mathbf{w}) \propto e^{-\sum_i (y_i - \mathbf{w} \cdot \mathbf{x}_i)^2 / 2\sigma^2} \, e^{-|\mathbf{w}|^2 / 2\tau^2}$.
Bayesian linear regression
The log posterior is quadratic in $\mathbf{w}$, so $\mathbf{w}$ is Gaussian distributed: $p(\mathbf{w} \mid X, \mathbf{y}) = N(\boldsymbol{\mu}, \Sigma)$ for some mean $\boldsymbol{\mu}$ and covariance $\Sigma$.
Bayesian linear regression
Completing the square gives $\Sigma = (X^\top X / \sigma^2 + I / \tau^2)^{-1}$ and $\boldsymbol{\mu} = \Sigma X^\top \mathbf{y} / \sigma^2 = (X^\top X + (\sigma^2/\tau^2) I)^{-1} X^\top \mathbf{y}$. The mean of $\mathbf{w}$ is exactly the same as in ridge regression, with $\lambda = \sigma^2 / \tau^2$. But we also get a covariance matrix for $\mathbf{w}$.
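A sketch computing the posterior (the noise variance `sigma2` and prior variance `tau2` are assumed, illustrative values):

```matlab
% Posterior mean and covariance of w in Bayesian linear regression.
N = 40; p = 30; sigma2 = 1; tau2 = 1;
X = randn(N, p);
y = X(:, 1) + sqrt(sigma2) * randn(N, 1);
Sigma = inv(X' * X / sigma2 + eye(p) / tau2);  % posterior covariance
mu = Sigma * (X' * y) / sigma2;                % posterior mean
% mu equals the ridge estimate with lambda = sigma2 / tau2:
w_ridge = (X' * X + (sigma2 / tau2) * eye(p)) \ (X' * y);
max(abs(mu - w_ridge))                         % ~0
```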
Bayesian predictions
• Given a training set $(X, \mathbf{y})$ and a new value $\mathbf{x}^*$. Assume $\mathbf{w}$ is random but $\sigma^2$ and $\tau^2$ are fixed.
• To make a prediction of $y^*$, integrate over all possible $\mathbf{w}$: $p(y^* \mid \mathbf{x}^*, X, \mathbf{y}) = \int p(y^* \mid \mathbf{x}^*, \mathbf{w}) \, p(\mathbf{w} \mid X, \mathbf{y}) \, d\mathbf{w}$.
• The mean is the same as in ridge regression, but we also get a variance: $\mathrm{var}(y^*) = \mathbf{x}^{*\top} \Sigma \mathbf{x}^* + \sigma^2$.
• The variance does not depend on the training targets $\mathbf{y}$. It is low when many of the training set values $\mathbf{x}_i$ are collinear with $\mathbf{x}^*$. (A numerical sketch follows below.)
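A sketch of the predictive computation (self-contained, with the same illustrative values as above; `x_star` is a hypothetical new point):

```matlab
% Predictive mean and variance at a new point x_star.
N = 40; p = 30; sigma2 = 1; tau2 = 1;
X = randn(N, p);
y = X(:, 1) + sqrt(sigma2) * randn(N, 1);
Sigma = inv(X' * X / sigma2 + eye(p) / tau2);   % posterior covariance of w
mu = Sigma * (X' * y) / sigma2;                 % posterior mean of w

x_star = randn(p, 1);                           % new predictor vector
y_mean = x_star' * mu;                          % same as the ridge prediction
y_var = x_star' * Sigma * x_star + sigma2;      % weight uncertainty + noise
% y_var depends on X (through Sigma) but not on the targets y.
```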