Ridge regression and Bayesian linear regression
Kenneth D. Harris
6/5/15
Multiple linear regression
• What are you predicting? Data type: continuous. Dimensionality: 1.
• What are you predicting it from? Data type: continuous. Dimensionality: $p$.
• How many data points do you have? Enough.
• What sort of prediction do you need? Single best guess.
• What sort of relationship can you assume? Linear.
Multiple linear regression
• What are you predicting? Data type: continuous. Dimensionality: 1.
• What are you predicting it from? Data type: continuous. Dimensionality: $p$.
• How many data points do you have? Not enough.
• What sort of prediction do you need? Single best guess.
• What sort of relationship can you assume? Linear.
Multiple predictors, one predicted variable
• Choose $\mathbf{w}$ to minimize the sum-squared error $E = \sum_i (y_i - \mathbf{w} \cdot \mathbf{x}_i)^2$.
• Optimal weight vector: $\hat{\mathbf{w}} = (X^\top X)^{-1} X^\top \mathbf{y}$ (in MATLAB: `w = X \ y`).
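A minimal sketch of this computation (the simulated data and variable names are illustrative, not from the slides):

```matlab
% Ordinary least squares on simulated data.
N = 100; p = 5;                      % data points, predictors
X = randn(N, p);                     % predictor matrix, one row per data point
w_true = randn(p, 1);                % weights used to simulate the target
y = X * w_true + 0.1 * randn(N, 1);  % target with a little noise

w_hat = X \ y;                       % least-squares solution, as on the slide
% Equivalent to inv(X' * X) * (X' * y) when X' * X is invertible.
```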
Too many predictors
• If $p \geq N$, you can fit the training data perfectly: $X\mathbf{w} = \mathbf{y}$ is $N$ equations in $p$ unknowns.
• If $p > N$, the solution is underconstrained ($X^\top X$ is not invertible).
• But even if $p < N$, you can have problems with too many predictors.
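A quick demo of the $p \geq N$ case (my own example, not from the slides): even a pure-noise target can be fit exactly.

```matlab
% With p >= N, any target can be fit exactly -- including pure noise.
N = 10; p = 15;
X = randn(N, p);
y = randn(N, 1);          % target is pure noise
w = pinv(X) * y;          % minimum-norm solution of N equations in p unknowns
max(abs(X * w - y))       % ~0: a "perfect" fit to noise
```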
[Figure: $N = 40$, $p = 30$, $y = x_1$.]
[Figure: $N = 40$, $p = 30$, $y = x_1 + \text{noise}$.]
Geometric interpretation
[Figure: the target vector, its signal and noise components, and the predictor vectors $\mathbf{x}_1$ and $\mathbf{x}_2$.]

The target can be fit exactly, by having a massive positive weight for $\mathbf{x}_1$ and a massive negative weight for $\mathbf{x}_2$. It would be better to just fit the signal.
Overfitting = large weight vectors
• Solution: penalize large weight vectors. Minimize $E = \sum_i (y_i - \mathbf{w} \cdot \mathbf{x}_i)^2 + \lambda |\mathbf{w}|^2$.
• Optimal weight vector: $\hat{\mathbf{w}} = (X^\top X + \lambda I)^{-1} X^\top \mathbf{y}$.
• The inverse can always be taken, even for $p > N$, since $X^\top X + \lambda I$ is positive definite for $\lambda > 0$.
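A minimal ridge regression sketch (the data and the value of $\lambda$ are illustrative choices, not from the slides):

```matlab
% Ridge regression: penalized least squares.
N = 40; p = 30; lambda = 3;
X = randn(N, p);
y = X(:, 1) + randn(N, 1);                       % y = x1 + noise, as in the example
w_ridge = (X' * X + lambda * eye(p)) \ (X' * y); % ridge solution
% X'*X + lambda*eye(p) is positive definite for lambda > 0,
% so this system is solvable even when p > N.
```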
Example
[Figure: fits with $\lambda = 0$ and $\lambda = 3$.]
Ridge regression introduces a bias
[Figure: fits with $\lambda = 0$ and $\lambda = 50$.]
A quick trick to do ridge regression
• Ordinary linear regression, $\hat{\mathbf{w}} = (X^\top X)^{-1} X^\top \mathbf{y}$, minimizes $\sum_i (y_i - \mathbf{w} \cdot \mathbf{x}_i)^2$.
• Define $\tilde{X} = \begin{pmatrix} X \\ \sqrt{\lambda} I_p \end{pmatrix}$ and $\tilde{\mathbf{y}} = \begin{pmatrix} \mathbf{y} \\ \mathbf{0} \end{pmatrix}$.
• Then the ordinary regression weight $\hat{\mathbf{w}} = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top \tilde{\mathbf{y}}$ is the solution to ridge regression. (Why? Because $\tilde{X}^\top \tilde{X} = X^\top X + \lambda I$ and $\tilde{X}^\top \tilde{\mathbf{y}} = X^\top \mathbf{y}$.)
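A sketch checking the trick numerically (simulated data, arbitrary $\lambda$):

```matlab
% Ridge regression via ordinary least squares on augmented data.
N = 40; p = 30; lambda = 3;
X = randn(N, p);
y = X(:, 1) + randn(N, 1);
X_aug = [X; sqrt(lambda) * eye(p)];   % append sqrt(lambda) * I as extra "data"
y_aug = [y; zeros(p, 1)];             % with target 0 for the extra rows
w_trick = X_aug \ y_aug;              % ordinary regression on augmented data
w_ridge = (X' * X + lambda * eye(p)) \ (X' * y);
max(abs(w_trick - w_ridge))           % ~0: the two solutions agree
```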
Regression as a probability model
• What are you predicting? Data type: continuous. Dimensionality: 1.
• What are you predicting it from? Data type: continuous. Dimensionality: $p$.
• How many data points do you have? Enough.
• What sort of prediction do you need? Probability distribution.
• What sort of relationship can you assume? Linear.
Regression as a probability model
• Assume $y$ is random, but $\mathbf{x}$ and $\mathbf{w}$ are just numbers: $y = \mathbf{w} \cdot \mathbf{x} + \epsilon$, with $\epsilon \sim N(0, \sigma^2)$.
• Then the likelihood is $p(\mathbf{y} \mid X, \mathbf{w}) = \prod_i \frac{1}{\sqrt{2\pi}\sigma} e^{-(y_i - \mathbf{w} \cdot \mathbf{x}_i)^2 / 2\sigma^2}$.
• Maximum likelihood is the same as the least-squares fit: maximizing the log likelihood means minimizing $\sum_i (y_i - \mathbf{w} \cdot \mathbf{x}_i)^2$.
Bayesian linear regression
• Now consider $\mathbf{w}$ to also be random, with prior distribution $\mathbf{w} \sim N(\mathbf{0}, \tau^2 I)$, i.e. $p(\mathbf{w}) \propto e^{-|\mathbf{w}|^2 / 2\tau^2}$.
• The posterior distribution is $p(\mathbf{w} \mid X, \mathbf{y}) \propto p(\mathbf{y} \mid X, \mathbf{w}) \, p(\mathbf{w}) \propto e^{-\sum_i (y_i - \mathbf{w} \cdot \mathbf{x}_i)^2 / 2\sigma^2} \, e^{-|\mathbf{w}|^2 / 2\tau^2}$.
Bayesian linear regression
The log posterior is quadratic in $\mathbf{w}$, so $\mathbf{w}$ is Gaussian distributed: $p(\mathbf{w} \mid X, \mathbf{y}) = N(\boldsymbol{\mu}, \Sigma)$ for some mean $\boldsymbol{\mu}$ and covariance $\Sigma$.
Bayesian linear regression
Completing the square gives $\Sigma = (X^\top X / \sigma^2 + I / \tau^2)^{-1}$ and $\boldsymbol{\mu} = \Sigma X^\top \mathbf{y} / \sigma^2 = (X^\top X + (\sigma^2/\tau^2) I)^{-1} X^\top \mathbf{y}$. The mean of $\mathbf{w}$ is exactly the same as in ridge regression, with $\lambda = \sigma^2 / \tau^2$. But we also get a covariance matrix for $\mathbf{w}$.
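A sketch computing the posterior (the noise variance `sigma2` and prior variance `tau2` are assumed, illustrative values):

```matlab
% Posterior mean and covariance of w in Bayesian linear regression.
N = 40; p = 30; sigma2 = 1; tau2 = 1;
X = randn(N, p);
y = X(:, 1) + sqrt(sigma2) * randn(N, 1);
Sigma = inv(X' * X / sigma2 + eye(p) / tau2);  % posterior covariance
mu = Sigma * (X' * y) / sigma2;                % posterior mean
% mu equals the ridge estimate with lambda = sigma2 / tau2:
w_ridge = (X' * X + (sigma2 / tau2) * eye(p)) \ (X' * y);
max(abs(mu - w_ridge))                         % ~0
```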
Bayesian predictions
• Given a training set $(X, \mathbf{y})$ and a new value $\mathbf{x}^*$. Assume $\mathbf{w}$ is random but $\sigma^2$ and $\tau^2$ are fixed.
• To make a prediction of $y^*$, integrate over all possible $\mathbf{w}$: $p(y^* \mid \mathbf{x}^*, X, \mathbf{y}) = \int p(y^* \mid \mathbf{x}^*, \mathbf{w}) \, p(\mathbf{w} \mid X, \mathbf{y}) \, d\mathbf{w}$.
• The mean is the same as in ridge regression, but we also get a variance: $\mathrm{var}(y^*) = \mathbf{x}^{*\top} \Sigma \mathbf{x}^* + \sigma^2$.
• The variance does not depend on the training targets $\mathbf{y}$. It is low when many of the training set values $\mathbf{x}_i$ are collinear with $\mathbf{x}^*$. (A numerical sketch follows below.)
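A sketch of the predictive computation (self-contained, with the same illustrative values as above; `x_star` is a hypothetical new point):

```matlab
% Predictive mean and variance at a new point x_star.
N = 40; p = 30; sigma2 = 1; tau2 = 1;
X = randn(N, p);
y = X(:, 1) + sqrt(sigma2) * randn(N, 1);
Sigma = inv(X' * X / sigma2 + eye(p) / tau2);   % posterior covariance of w
mu = Sigma * (X' * y) / sigma2;                 % posterior mean of w

x_star = randn(p, 1);                           % new predictor vector
y_mean = x_star' * mu;                          % same as the ridge prediction
y_var = x_star' * Sigma * x_star + sigma2;      % weight uncertainty + noise
% y_var depends on X (through Sigma) but not on the targets y.
```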