linear regression - data science africa · regression examples i predict a real value, y i given...

Linear Regression Neil D. Lawrence GPRS 6th August 2013


Page 1: Linear Regression - Data Science Africa · Regression Examples I Predict a real value, y i given some inputs x i. I Predict quality of meat given spectral measurements (Tecator data)

Linear Regression

Neil D. Lawrence

GPRS 6th August 2013

Outline

Introduction

Useful Texts

Regression

Useful Texts

Rogers and Girolami


Bishop


Regression Examples

- Predict a real value, $y_i$, given some inputs $x_i$.
- Predict quality of meat given spectral measurements (Tecator data).
- Radiocarbon dating, the C14 calibration curve: predict age given quantity of C14 isotope.
- Predict quality of different Go or Backgammon moves given expert-rated training data.

Olympic 100m Data

- Gold medal times for Olympic 100 m runners since 1896.

Image from Wikimedia Commons

http://bit.ly/191adDC

[Figure: Olympic 100 m data. Axes as labelled in the slide: x, year; y, pace min/km.]

Olympic Marathon Data

- Gold medal times for the Olympic Marathon since 1896.
- Marathons before 1924 didn't have a standardised distance.
- Present results using pace per km.
- In 1904 the Marathon was badly organised, leading to very slow times.

Image from Wikimedia Commons

http://bit.ly/16kMKHQ

[Figure: Olympic Marathon data. Axes: x, year; y, pace min/km.]

What is Machine Learning?

data + model = prediction

- data: observations, could be actively or passively acquired (meta-data).
- model: assumptions, based on previous experience (other data! transfer learning etc.), or beliefs about the regularities of the universe. Inductive bias.
- prediction: an action to be taken or a categorization or a quality score.

Regression: Linear Relationship

$$y = mx + c$$

- $y$: winning time/pace.
- $x$: year of Olympics.
- $m$: rate of improvement over time.
- $c$: winning time at year 0.

Two Simultaneous Equations

A system of two simultaneous equations with two unknowns.

$$y_1 = m x_1 + c$$
$$y_2 = m x_2 + c$$

Subtracting the second equation from the first eliminates $c$:

$$y_1 - y_2 = m(x_1 - x_2)$$

$$\frac{y_1 - y_2}{x_1 - x_2} = m$$

$$m = \frac{y_2 - y_1}{x_2 - x_1}, \qquad c = y_1 - m x_1$$

[Figure: axes year, x (1900 to 2020) and time in min/km, y (3 to 5), with the differences $y_1 - y_2$ and $x_2 - x_1$ marked.]
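The two-point solution above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `line_through` and the two (year, pace) points are hypothetical and not taken from the slides.

```python
def line_through(p1, p2):
    """Return gradient m and offset c of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points need distinct x values")
    m = (y2 - y1) / (x2 - x1)   # gradient from the difference of the two equations
    c = y1 - m * x1             # offset from substituting m back into the first equation
    return m, c


# Hypothetical (year, pace) points, chosen only for illustration.
m, c = line_through((1900.0, 5.0), (2000.0, 3.0))
print(m, c)
```

With two points the line passes through both exactly, so there is nothing left over to call noise.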

Two Simultaneous Equations

How do we deal with three simultaneous equations with only two unknowns?

$$y_1 = m x_1 + c$$
$$y_2 = m x_2 + c$$
$$y_3 = m x_3 + c$$

[Figure: axes year, x (1900 to 2020) and time in min/km, y (3 to 5), with the differences $y_1 - y_2$ and $x_2 - x_1$ marked.]

Overdetermined System

- With two unknowns and two observations:

$$y_1 = m x_1 + c$$
$$y_2 = m x_2 + c$$

- Additional observation leads to overdetermined system.

$$y_3 = m x_3 + c$$

- This problem is solved through a noise model $\epsilon \sim \mathcal{N}(0, \sigma^2)$:

$$y_1 = m x_1 + c + \epsilon_1$$
$$y_2 = m x_2 + c + \epsilon_2$$
$$y_3 = m x_3 + c + \epsilon_3$$

Noise Models

- We aren't modeling the entire system.
- Noise model gives mismatch between model and data.
- Gaussian model justified by appeal to central limit theorem.
- Other models also possible (Student-t for heavy tails).
- Maximum likelihood with Gaussian noise leads to least squares.

$$y = mx + c$$

[Figure sequence: the line $y = mx + c$ plotted on axes x and y (both 0 to 5), with the offset $c$ and gradient $m$ marked.]

$$y = mx + c$$

- point 1: $x = 1$, $y = 3$, so $3 = m + c$
- point 2: $x = 3$, $y = 1$, so $1 = 3m + c$
- point 3: $x = 2$, $y = 2.5$, so $2.5 = 2m + c$

$$y = mx + c + \epsilon$$

- point 1: $x = 1$, $y = 3$, so $3 = m + c + \epsilon_1$
- point 2: $x = 3$, $y = 1$, so $1 = 3m + c + \epsilon_2$
- point 3: $x = 2$, $y = 2.5$, so $2.5 = 2m + c + \epsilon_3$
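As an illustration of the three-point system with noise terms, the sketch below finds the $m$ and $c$ that make the sum of squared $\epsilon_i$ as small as possible. Using NumPy's `numpy.linalg.lstsq` is a choice made here, not something the slides prescribe; the data are the three points listed above.

```python
import numpy as np

# The three points from the slides: (x, y) = (1, 3), (3, 1), (2, 2.5).
x = np.array([1.0, 3.0, 2.0])
y = np.array([3.0, 1.0, 2.5])

# Each row of the design matrix is [x_i, 1], so that X @ [m, c] = m * x_i + c.
X = np.column_stack([x, np.ones_like(x)])

# Least squares picks m and c to minimize the sum of squared eps_i.
(m, c), *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - (m * x + c)   # the fitted eps_1, eps_2, eps_3
print(m, c, residuals)
```

No single line passes through all three points, so the residuals are non-zero; they play the role of $\epsilon_1, \epsilon_2, \epsilon_3$.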

The Gaussian Density

- Perhaps the most common probability density.

$$p(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right) \triangleq \mathcal{N}(y|\mu, \sigma^2)$$

- The Gaussian density.

Gaussian Density

[Figure: $p(h|\mu, \sigma^2)$ plotted against $h$, height/m.]

The Gaussian PDF with $\mu = 1.7$ and variance $\sigma^2 = 0.0225$. Mean shown as red line. It could represent the heights of a population of students.

Gaussian Density

$$\mathcal{N}(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right)$$

$\sigma^2$ is the variance of the density and $\mu$ is the mean.
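The density formula can be evaluated directly. The sketch below uses the height example's values ($\mu = 1.7$, $\sigma^2 = 0.0225$); the function name `gaussian_pdf` and the crude Riemann-sum check are illustrative choices, not part of the slides.

```python
import math


def gaussian_pdf(y, mu, sigma2):
    """Evaluate N(y | mu, sigma2), where sigma2 is the variance."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


mu, sigma2 = 1.7, 0.0225   # values from the height example in the slides

# Riemann-sum check that the density integrates to roughly one
# over the range mu +/- 8 standard deviations.
sd = math.sqrt(sigma2)
step = 1e-4
n_steps = int(16 * sd / step)
area = sum(gaussian_pdf(mu - 8 * sd + k * step, mu, sigma2) for k in range(n_steps)) * step
print(gaussian_pdf(mu, mu, sigma2), area)
```

Note that the value at the mean is larger than one: a density is not a probability, only its integral has to equal one.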

Two Important Gaussian Properties

Sum of Gaussians

- Sum of independent Gaussian variables is also Gaussian.

$$y_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$$

And the sum is distributed as

$$\sum_{i=1}^n y_i \sim \mathcal{N}\left(\sum_{i=1}^n \mu_i, \sum_{i=1}^n \sigma_i^2\right)$$

(Aside: as the number of terms increases, the sum of independent non-Gaussian, finite-variance variables also tends towards a Gaussian [central limit theorem].)

Two Important Gaussian Properties

Scaling a Gaussian

- Scaling a Gaussian leads to a Gaussian.

$$y \sim \mathcal{N}(\mu, \sigma^2)$$

And the scaled variable is distributed as

$$wy \sim \mathcal{N}(w\mu, w^2\sigma^2)$$
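Both properties can be checked empirically by sampling. The sketch below is a rough Monte Carlo check rather than a proof; the means, standard deviations, scale factor $w$ and sample size are arbitrary values picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000

# Sum of two independent Gaussians (parameters are arbitrary illustrative values).
# Note that rng.normal takes the standard deviation, not the variance.
y1 = rng.normal(loc=1.0, scale=0.5, size=n_samples)    # N(1, 0.25)
y2 = rng.normal(loc=-2.0, scale=1.5, size=n_samples)   # N(-2, 2.25)
total = y1 + y2
sum_mean, sum_var = total.mean(), total.var()          # expect about -1 and 2.5

# Scaling a Gaussian by w.
w = 3.0
scaled = w * y1
scaled_mean, scaled_var = scaled.mean(), scaled.var()  # expect about 3 and 2.25

print(sum_mean, sum_var, scaled_mean, scaled_var)
```

The sample statistics only approximate the theoretical values; the agreement tightens as `n_samples` grows.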

A Probabilistic Process

- Set the mean of Gaussian to be a function.

$$p(y_i|x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - f(x_i))^2}{2\sigma^2}\right).$$

- This gives us a 'noisy function'.
- This is known as a process.

y as a Function of x

- The standard Gaussian is parameterized by mean and variance.
- Make the mean a linear function of an input.
- This leads to a regression model.

$$y_i = f(x_i) + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2).$$
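To make the 'noisy function' concrete, the sketch below draws samples from $y_i = mx_i + c + \epsilon_i$. The helper name `sample_noisy_line` and the parameter values are hypothetical, chosen only to illustrate the model.

```python
import numpy as np


def sample_noisy_line(x, m, c, sigma2, rng):
    """Draw y_i = m * x_i + c + eps_i with eps_i ~ N(0, sigma2)."""
    eps = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=x.shape)
    return m * x + c + eps


rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 1000)
m_true, c_true, sigma2_true = -1.0, 4.0, 0.05   # hypothetical parameter values
y = sample_noisy_line(x, m_true, c_true, sigma2_true, rng)

noise = y - (m_true * x + c_true)
print(noise.mean(), noise.var())   # should be close to 0 and sigma2_true
```

Regression runs this process in reverse: given only `x` and `y`, recover plausible values of $m$, $c$ and $\sigma^2$.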

Linear Function

[Figure: data points and best fit line. Axes: x, y.]

A linear regression between x and y.

Data Point Likelihood

- Likelihood of an individual data point

$$p(y_i|x_i, m, c) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - mx_i - c)^2}{2\sigma^2}\right).$$

- Parameters are gradient, $m$, offset, $c$, of the function and noise variance $\sigma^2$.

Data Set Likelihood

- If the noise, $\epsilon_i$, is sampled independently for each data point,
- each data point is independent (given $m$ and $c$).
- For independent variables:

$$p(\mathbf{y}) = \prod_{i=1}^n p(y_i)$$

$$p(\mathbf{y}|\mathbf{x}, m, c) = \prod_{i=1}^n p(y_i|x_i, m, c)$$

$$p(\mathbf{y}|\mathbf{x}, m, c) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - mx_i - c)^2}{2\sigma^2}\right)$$

$$p(\mathbf{y}|\mathbf{x}, m, c) = \frac{1}{(2\pi\sigma^2)^{\frac{n}{2}}} \exp\left(-\frac{\sum_{i=1}^n (y_i - mx_i - c)^2}{2\sigma^2}\right).$$

Log Likelihood Function

- Normally work with the log likelihood:

$$L(m, c, \sigma^2) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log \sigma^2 - \sum_{i=1}^n \frac{(y_i - mx_i - c)^2}{2\sigma^2}.$$
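As a sanity check on the algebra, the closed-form log likelihood should equal the sum of the log densities of the individual points. The sketch below compares the two on the three points from the earlier slides; the function names and the parameter values ($m = -1$, $c = 4$, $\sigma^2 = 0.1$) are assumptions made for illustration.

```python
import numpy as np


def log_likelihood(x, y, m, c, sigma2):
    """Closed-form Gaussian log likelihood L(m, c, sigma2)."""
    n = len(y)
    sq_err = np.sum((y - m * x - c) ** 2)
    return -0.5 * n * np.log(2.0 * np.pi) - 0.5 * n * np.log(sigma2) - sq_err / (2.0 * sigma2)


def log_likelihood_pointwise(x, y, m, c, sigma2):
    """Same quantity, as the sum of log densities of the individual points."""
    dens = np.exp(-(y - m * x - c) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
    return np.sum(np.log(dens))


x = np.array([1.0, 3.0, 2.0])
y = np.array([3.0, 1.0, 2.5])          # the three points used earlier in the slides
ll_a = log_likelihood(x, y, m=-1.0, c=4.0, sigma2=0.1)            # hypothetical parameters
ll_b = log_likelihood_pointwise(x, y, m=-1.0, c=4.0, sigma2=0.1)
print(ll_a, ll_b)
```

Working in logs turns the product over data points into a sum, which is also numerically safer than multiplying many small densities.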

Consistency of Maximum Likelihood

- If data was really generated according to the probability we specified,
- correct parameters will be recovered in the limit as $n \to \infty$.
- This can be proven through sample-based approximations (law of large numbers) of "KL divergences".
- Mainstay of classical statistics.

Probabilistic Interpretation of the Error Function

- Probabilistic Interpretation for Error Function is Negative Log Likelihood.
- Minimizing error function is equivalent to maximizing log likelihood.
- Maximizing log likelihood is equivalent to maximizing the likelihood because log is monotonic.
- Probabilistic interpretation: Minimizing error function is equivalent to maximum likelihood with respect to parameters.

Error Function

- Taking the negative log likelihood, and dropping the constant $\frac{n}{2}\log 2\pi$, leads to the error function

$$E(m, c, \sigma^2) = \frac{n}{2}\log \sigma^2 + \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - mx_i - c)^2.$$

- Learning proceeds by minimizing this error function for the data set provided.

Connection: Sum of Squares Error

- Ignoring terms which don't depend on $m$ and $c$ gives

$$E(m, c) \propto \sum_{i=1}^n (y_i - f(x_i))^2$$

  where $f(x_i) = mx_i + c$.
- This is known as the sum of squares error function.
- Commonly used and is closely associated with the Gaussian likelihood.

Mathematical Interpretation

- What is the mathematical interpretation?
- There is a cost function.
- It expresses mismatch between your prediction and reality.

$$E(m, c) = \sum_{i=1}^n (y_i - mx_i - c)^2$$

- This is known as the sum of squares error.
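The cost function is short enough to write out. The sketch below evaluates it for two candidate lines on the three points from the earlier slides; the function name `sum_of_squares` and both candidate parameter settings are hypothetical.

```python
import numpy as np


def sum_of_squares(x, y, m, c):
    """E(m, c) = sum_i (y_i - m * x_i - c)^2."""
    return float(np.sum((y - m * x - c) ** 2))


x = np.array([1.0, 3.0, 2.0])
y = np.array([3.0, 1.0, 2.5])   # the three points from the earlier slides

# Two hypothetical candidate lines: the second one is the worse fit.
e_good = sum_of_squares(x, y, m=-1.0, c=4.0)
e_bad = sum_of_squares(x, y, m=0.0, c=2.0)
print(e_good, e_bad)
```

A smaller value means the line's predictions sit closer to the observations.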

Learning is Optimization

- Learning is minimization of the cost function.
- At the minima the gradient is zero.
- Coordinate descent: find the gradient in each coordinate and set it to zero.

For the gradient $m$:

$$\frac{\mathrm{d}E(m)}{\mathrm{d}m} = -2\sum_{i=1}^n x_i (y_i - mx_i - c)$$

$$0 = -2\sum_{i=1}^n x_i (y_i - mx_i - c)$$

$$0 = -2\sum_{i=1}^n x_i y_i + 2\sum_{i=1}^n m x_i^2 + 2\sum_{i=1}^n c x_i$$

$$m = \frac{\sum_{i=1}^n (y_i - c) x_i}{\sum_{i=1}^n x_i^2}$$

For the offset $c$:

$$\frac{\mathrm{d}E(c)}{\mathrm{d}c} = -2\sum_{i=1}^n (y_i - mx_i - c)$$

$$0 = -2\sum_{i=1}^n (y_i - mx_i - c)$$

$$0 = -2\sum_{i=1}^n y_i + 2\sum_{i=1}^n m x_i + 2nc$$

$$c = \frac{\sum_{i=1}^n (y_i - m x_i)}{n}$$
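One way to gain confidence in the two gradient expressions is to compare them with finite differences of the error. The sketch below does this at an arbitrary test point; the function names, the test point and the step size `h` are illustrative assumptions.

```python
import numpy as np


def error(x, y, m, c):
    """Sum of squares error E(m, c)."""
    return np.sum((y - m * x - c) ** 2)


def grad_m(x, y, m, c):
    """dE/dm = -2 * sum_i x_i * (y_i - m * x_i - c)."""
    return -2.0 * np.sum(x * (y - m * x - c))


def grad_c(x, y, m, c):
    """dE/dc = -2 * sum_i (y_i - m * x_i - c)."""
    return -2.0 * np.sum(y - m * x - c)


x = np.array([1.0, 3.0, 2.0])
y = np.array([3.0, 1.0, 2.5])   # the three points from the earlier slides
m0, c0, h = 0.3, 1.2, 1e-6      # arbitrary test point and finite-difference step

# Central differences approximate each partial derivative numerically.
fd_m = (error(x, y, m0 + h, c0) - error(x, y, m0 - h, c0)) / (2 * h)
fd_c = (error(x, y, m0, c0 + h) - error(x, y, m0, c0 - h)) / (2 * h)
print(grad_m(x, y, m0, c0), fd_m)
print(grad_c(x, y, m0, c0), fd_c)
```

If the analytic and numerical values disagree by more than rounding error, a sign or index has gone wrong in the derivation.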

Fixed Point Updates

Worked example.

$$c^* = \frac{\sum_{i=1}^n (y_i - m^* x_i)}{n},$$

$$m^* = \frac{\sum_{i=1}^n x_i (y_i - c^*)}{\sum_{i=1}^n x_i^2},$$

$$\left.\sigma^2\right.^* = \frac{\sum_{i=1}^n (y_i - m^* x_i - c^*)^2}{n}$$
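The fixed point updates can be turned into a loop that alternates between the two coordinates. This is a minimal sketch under stated assumptions: the starting values $m = 0$, $c = 0$ and the iteration count are choices made here, and the data are the three points from the earlier slides rather than the Olympic data.

```python
import numpy as np


def coordinate_descent(x, y, m=0.0, c=0.0, n_iters=500):
    """Alternate the fixed point updates for c and m, then set sigma2."""
    n = len(y)
    for _ in range(n_iters):
        c = np.sum(y - m * x) / n                  # update c with m held fixed
        m = np.sum(x * (y - c)) / np.sum(x ** 2)   # update m with c held fixed
    sigma2 = np.sum((y - m * x - c) ** 2) / n      # noise variance at the final m, c
    return m, c, sigma2


x = np.array([1.0, 3.0, 2.0])
y = np.array([3.0, 1.0, 2.5])   # the three points from the earlier slides
m_star, c_star, sigma2_star = coordinate_descent(x, y)
print(m_star, c_star, sigma2_star)
```

Each update solves one coordinate exactly while the other is held fixed, so the error never increases. When the inputs are far from zero mean, as with years, $m$ and $c$ are strongly coupled and many iterations may be needed.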

Coordinate Descent

-1

-0.5

0

0.5

1

1.5

2

-0.4 -0.2 0 0.2 0.4 0.6 0.8 1

c

m

E(m, c)

Page 74: Linear Regression - Data Science Africa · Regression Examples I Predict a real value, y i given some inputs x i. I Predict quality of meat given spectral measurements (Tecator data)

Coordinate Descent

-1

-0.5

0

0.5

1

1.5

2

-0.4 -0.2 0 0.2 0.4 0.6 0.8 1

c

m

Iteration 1

Page 75: Linear Regression - Data Science Africa · Regression Examples I Predict a real value, y i given some inputs x i. I Predict quality of meat given spectral measurements (Tecator data)

Coordinate Descent

-1

-0.5

0

0.5

1

1.5

2

-0.4 -0.2 0 0.2 0.4 0.6 0.8 1

c

m

Iteration 1

Page 76: Linear Regression - Data Science Africa · Regression Examples I Predict a real value, y i given some inputs x i. I Predict quality of meat given spectral measurements (Tecator data)

Coordinate Descent

-1

-0.5

0

0.5

1

1.5

2

-0.4 -0.2 0 0.2 0.4 0.6 0.8 1

c

m

Iteration 2

Page 77: Linear Regression - Data Science Africa · Regression Examples I Predict a real value, y i given some inputs x i. I Predict quality of meat given spectral measurements (Tecator data)

Coordinate Descent

-1

-0.5

0

0.5

1

1.5

2

-0.4 -0.2 0 0.2 0.4 0.6 0.8 1

c

m

Iteration 2

Page 78: Linear Regression - Data Science Africa · Regression Examples I Predict a real value, y i given some inputs x i. I Predict quality of meat given spectral measurements (Tecator data)

Coordinate Descent

-1

-0.5

0

0.5

1

1.5

2

-0.4 -0.2 0 0.2 0.4 0.6 0.8 1

c

m

Iteration 3

Page 79: Linear Regression - Data Science Africa · Regression Examples I Predict a real value, y i given some inputs x i. I Predict quality of meat given spectral measurements (Tecator data)

Coordinate Descent

-1

-0.5

0

0.5

1

1.5

2

-0.4 -0.2 0 0.2 0.4 0.6 0.8 1

c

m

Iteration 3

Page 80: Linear Regression - Data Science Africa · Regression Examples I Predict a real value, y i given some inputs x i. I Predict quality of meat given spectral measurements (Tecator data)

Coordinate Descent

0.9

1

1.1

1.2

1.3

1.4

1.5

0.2 0.25 0.3 0.35 0.4

c

m

Iteration 4

Page 81: Linear Regression - Data Science Africa · Regression Examples I Predict a real value, y i given some inputs x i. I Predict quality of meat given spectral measurements (Tecator data)

Coordinate Descent

0.9

1

1.1

1.2

1.3

1.4

1.5

0.2 0.25 0.3 0.35 0.4

c

m

Iteration 4

Page 82: Linear Regression - Data Science Africa · Regression Examples I Predict a real value, y i given some inputs x i. I Predict quality of meat given spectral measurements (Tecator data)

Coordinate Descent

0.9

1

1.1

1.2

1.3

1.4

1.5

0.2 0.25 0.3 0.35 0.4

c

m

Iteration 5

Page 83: Linear Regression - Data Science Africa · Regression Examples I Predict a real value, y i given some inputs x i. I Predict quality of meat given spectral measurements (Tecator data)

Coordinate Descent

0.9

1

1.1

1.2

1.3

1.4

1.5

0.2 0.25 0.3 0.35 0.4

c

m

Iteration 5

Page 84: Linear Regression - Data Science Africa · Regression Examples I Predict a real value, y i given some inputs x i. I Predict quality of meat given spectral measurements (Tecator data)

Coordinate Descent

0.9

1

1.1

1.2

1.3

1.4

1.5

0.2 0.25 0.3 0.35 0.4

c

m

Iteration 6

Page 85: Linear Regression - Data Science Africa · Regression Examples I Predict a real value, y i given some inputs x i. I Predict quality of meat given spectral measurements (Tecator data)

Coordinate Descent

0.9

1

1.1

1.2

1.3

1.4

1.5

0.2 0.25 0.3 0.35 0.4

c

m

Iteration 6

Coordinate Descent

[Figure: coordinate descent on the parameters c and m; vertical axis ticks 0.9 to 1.5, horizontal axis ticks 0.2 to 0.4.]

Iteration 7

Coordinate Descent

[Figure: coordinate descent on the parameters c and m; vertical axis ticks 0.9 to 1.5, horizontal axis ticks 0.2 to 0.4.]

Iteration 8

Coordinate Descent

[Figure: coordinate descent on the parameters c and m; vertical axis ticks 0.9 to 1.5, horizontal axis ticks 0.2 to 0.4.]

Iteration 9

Coordinate Descent

[Figure: coordinate descent on the parameters c and m; vertical axis ticks 0.9 to 1.5, horizontal axis ticks 0.2 to 0.4.]

Iteration 10

Coordinate Descent

[Figure: coordinate descent on the parameters c and m; vertical axis ticks 0.9 to 1.5, horizontal axis ticks 0.2 to 0.4.]

Iteration 20

Coordinate Descent

[Figure: coordinate descent on the parameters c and m; vertical axis ticks 0.9 to 1.5, horizontal axis ticks 0.2 to 0.4.]

Iteration 30
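The iterations pictured on these slides can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the figures: it assumes the model y = m x + c with a sum-of-squares error, and it alternates the two closed-form updates obtained by setting the derivative with respect to c (then m) to zero while the other parameter is held fixed. The function name `coordinate_descent` and the data are hypothetical.

```python
def coordinate_descent(x, y, m=0.0, c=0.0, iterations=30):
    """Minimise sum((y_i - m*x_i - c)**2) by updating c and m in turn."""
    n = len(x)
    sum_xx = sum(xi * xi for xi in x)
    for _ in range(iterations):
        # Hold m fixed and solve dE/dc = 0 for c.
        c = sum(yi - m * xi for xi, yi in zip(x, y)) / n
        # Hold c fixed and solve dE/dm = 0 for m.
        m = sum((yi - c) * xi for xi, yi in zip(x, y)) / sum_xx
    return m, c


# Hypothetical noise-free data on the line y = 1.4 x + 0.3 (not from the slides).
x = [-1.0, 0.0, 1.0, 2.0]
y = [1.4 * xi + 0.3 for xi in x]

m, c = coordinate_descent(x, y, iterations=200)
print(m, c)
```

Each update can only lower the error, but progress per iteration may be slow when m and c are strongly coupled, which is consistent with the many small steps shown in the figures.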

Important Concepts Not Covered

- Optimization methods.
  - Second order methods, conjugate gradient, quasi-Newton and Newton.
  - Effective heuristics such as momentum.
- Local vs global solutions.

Linear Function

[Figure: data points and best fit line; horizontal axis x, year (1900 to 2020); vertical axis y, pace min/km (3 to 5).]

Linear regression for Male Olympics Marathon Gold Medal times.
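A best fit line of this kind can also be computed directly, without iterating. The sketch below assumes the usual least-squares solution for y = m x + c; the function name `fit_line` is hypothetical, and the three data points are made up for illustration. They are not the Olympic marathon times.

```python
def fit_line(x, y):
    """Return the least-squares slope m and offset c for y = m*x + c."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: covariance of x and y divided by the variance of x.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    m = s_xy / s_xx
    # Offset: the fitted line passes through the point of means.
    c = y_mean - m * x_mean
    return m, c


# Made-up (year, pace) values for illustration only, NOT the real data.
years = [1900.0, 1950.0, 2000.0]
pace = [5.0, 4.0, 3.0]

m, c = fit_line(years, pace)
print(m, c)
```

For a two-parameter model the direct solution and coordinate descent reach the same minimum; the iterative view matters more when no closed form is available.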

Reading

- Section 1.2.5 of Bishop up to equation 1.65.
- Section 1.1-1.2 of Rogers and Girolami for fitting linear models.

References I

C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag, 2006.

P. S. Laplace. Memoire sur la probabilite des causes par les evenemens. In Memoires de mathematique et de physique, presentes a l'Academie Royale des Sciences, par divers savans, & lu dans ses assemblees 6, pages 621–656, 1774. Translated in Stigler (1986).

S. Rogers and M. Girolami. A First Course in Machine Learning. CRC Press, 2011.

S. M. Stigler. Laplace's 1774 memoir on inverse probability. Statistical Science, 1:359–378, 1986.