TRANSCRIPT
Linear Regression: Model and Algorithms
CSE 446: Machine Learning
Emily Fox, University of Washington
January 9, 2017
©2017 Emily Fox
Linear regression: The model
How much is my house worth?
I want to list my house for sale.
$$ ????
Data
(x1 = sq.ft., y1 = $)
(x2 = sq.ft., y2 = $)
(x3 = sq.ft., y3 = $)
(x4 = sq.ft., y4 = $)
(x5 = sq.ft., y5 = $)
…

Input vs. output:
• x = input (sq.ft.), y = output ($)
• y is the quantity of interest
• assume y can be predicted from x
Model – How we assume the world works

[Figure: scatterplot of price ($), y, vs. square feet (sq.ft.), x]

Regression model: yi = f(xi) + εi
“Essentially, all models are wrong, but some are useful.”
George Box, 1987.
Simple linear regression model
yi = w0 + w1 xi + εi
f(x) = w0 + w1 x
w0, w1: parameters (regression coefficients)
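As a quick illustrative sketch (not from the lecture), here is this simple model fit to made-up sales data with NumPy's least squares polynomial fit:

```python
import numpy as np

# Hypothetical data for illustration: house sizes (sq.ft.) and prices ($).
x = np.array([1000., 1500., 2000., 2500., 3000.])
y = np.array([250e3, 330e3, 410e3, 490e3, 560e3])

# np.polyfit returns coefficients highest degree first,
# so deg=1 gives (w1, w0).
w1, w0 = np.polyfit(x, y, deg=1)
print(f"f(x) = {w0:.0f} + {w1:.1f} x")
print("predicted price for 2200 sq.ft.:", w0 + w1 * 2200)
```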
What about a quadratic function?

f(x) = w0 + w1 x + w2 x^2
Even higher order polynomial
f(x) = w0 + w1 x + w2 x^2 + … + wp x^p
Polynomial regression
Model: yi = w0 + w1 xi + w2 xi^2 + … + wp xi^p + εi

Treat the powers of x as different features:
feature 1 = 1 (constant)   parameter 1 = w0
feature 2 = x              parameter 2 = w1
feature 3 = x^2            parameter 3 = w2
…                          …
feature p+1 = x^p          parameter p+1 = wp
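A minimal sketch of "treat as different features" in NumPy (illustrative; the helper `polynomial_features` is mine, not the course's):

```python
import numpy as np

def polynomial_features(x, p):
    """Map inputs x to the features [1, x, x^2, …, x^p], one row per xi."""
    # np.vander with increasing=True puts the constant feature first,
    # matching feature 1 = 1, feature 2 = x, …, feature p+1 = x^p.
    return np.vander(x, N=p + 1, increasing=True)

x = np.array([1.0, 2.0, 3.0])
print(polynomial_features(x, p=2))
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]
```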
Generic basis expansion

Model: yi = w0 h0(xi) + w1 h1(xi) + … + wD hD(xi) + εi
          = Σ_{j=0..D} wj hj(xi) + εi

wj = jth regression coefficient or weight; hj = jth feature

feature 1 = h0(x) … often 1 (constant)
feature 2 = h1(x) … e.g., x
feature 3 = h2(x) … e.g., x^2 or sin(2πx/12)
…
feature D+1 = hD(x) … e.g., x^p
Predictions just based on house size
[Figure: price ($), y, vs. square feet (sq.ft.), x, with fitted line]

Only 1 bathroom! Not the same as my 3 bathrooms!
Add more inputs
[Figure: price ($), y, vs. square feet, x[1], and # bathrooms, x[2]]

f(x) = w0 + w1 sq.ft. + w2 #bath
Many possible inputs
- Square feet
- # bathrooms
- # bedrooms
- Lot size
- Year built
- …
General notation
Output: y (scalar)
Inputs: x = (x[1], x[2], …, x[d]) (d-dim vector)

Notational conventions:
x[j] = jth input (scalar)
hj(x) = jth feature (scalar)
xi = input of ith data point (vector)
xi[j] = jth input of ith data point (scalar)
Generic linear regression model
Model: yi = w0 h0(xi) + w1 h1(xi) + … + wD hD(xi) + εi
          = Σ_{j=0..D} wj hj(xi) + εi

feature 1 = h0(x) … e.g., 1
feature 2 = h1(x) … e.g., x[1] = sq. ft.
feature 3 = h2(x) … e.g., x[2] = #bath, or log(x[7]) x[2] = log(#bed) x #bath
…
feature D+1 = hD(x) … some other function of x[1], …, x[d]
Fitting the linear regression model
Step 1: Rewrite the regression model
Rewrite in matrix notation
For observation i
yi = Σ_{j=0..D} wj hj(xi) + εi
   = h(xi)^T w + εi

where h(xi) = [h0(xi), h1(xi), …, hD(xi)]^T and w = [w0, w1, …, wD]^T.
For all observations together:

y = Hw + ε

where y = [y1, …, yN]^T, ε = [ε1, …, εN]^T, and H is the N × (D+1) matrix whose ith row is h(xi)^T.
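A small sketch of building H from a list of feature functions (the names `features` and `build_H` are mine, for illustration):

```python
import numpy as np

# Hypothetical feature functions h0, …, hD: each maps an input vector x
# to a scalar, as in the generic model above.
features = [
    lambda x: 1.0,    # h0: constant
    lambda x: x[0],   # h1: e.g., sq.ft.
    lambda x: x[1],   # h2: e.g., #bath
]

def build_H(X, features):
    """Stack h(xi)^T as rows, giving the N x (D+1) matrix H."""
    return np.array([[h(x) for h in features] for x in X])

X = np.array([[1500., 2.], [2500., 3.]])  # made-up inputs
H = build_H(X, features)
print(H.shape)  # (2, 3): one row h(xi)^T per observation
```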
Step 2: Compute the cost
“Cost” of using a given line
[Figure: price ($), y, vs. square feet (sq.ft.), x, with residuals from a fitted line]

Residual sum of squares (RSS):
RSS(w0, w1) = Σ_{i=1..N} (yi − [w0 + w1 xi])^2
RSS for multiple regression
RSS(w) = Σ_{i=1..N} (yi − h(xi)^T w)^2
       = (y − Hw)^T (y − Hw)
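As a sanity-check sketch, both forms of RSS computed in NumPy on made-up numbers; they should agree:

```python
import numpy as np

def rss(w, H, y):
    """RSS(w) = (y − Hw)^T (y − Hw)."""
    r = y - H @ w
    return r @ r

rng = np.random.default_rng(0)
H = rng.normal(size=(10, 3))   # made-up design matrix
y = rng.normal(size=10)
w = rng.normal(size=3)

# Elementwise form: sum of squared residuals over observations.
elementwise = sum((y[i] - H[i] @ w) ** 2 for i in range(len(y)))
assert np.isclose(rss(w, H, y), elementwise)
```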
Step 3: Take the gradient
Gradient of RSS
∇RSS(w) = ∇[(y − Hw)^T (y − Hw)]
        = −2H^T (y − Hw)

Why? By analogy to the 1D case: d/dw (y − hw)^2 = −2h(y − hw).
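Spelled out (a standard matrix-calculus derivation, not written on the slide), expanding RSS and using ∇w(w^T b) = b and ∇w(w^T A w) = 2Aw for symmetric A:

```latex
\begin{aligned}
\mathrm{RSS}(\mathbf{w})
  &= (\mathbf{y}-\mathbf{H}\mathbf{w})^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w})
   = \mathbf{y}^{\top}\mathbf{y}
     - 2\,\mathbf{w}^{\top}\mathbf{H}^{\top}\mathbf{y}
     + \mathbf{w}^{\top}\mathbf{H}^{\top}\mathbf{H}\,\mathbf{w} \\
\nabla\,\mathrm{RSS}(\mathbf{w})
  &= -2\,\mathbf{H}^{\top}\mathbf{y} + 2\,\mathbf{H}^{\top}\mathbf{H}\,\mathbf{w}
   = -2\,\mathbf{H}^{\top}(\mathbf{y}-\mathbf{H}\mathbf{w})
\end{aligned}
```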
Step 4, Approach 1: Set the gradient = 0
Closed-form solution
∇RSS(w) = −2H^T (y − Hw) = 0

Solve for w:
H^T Hw = H^T y
ŵ = (H^T H)^-1 H^T y
Invertible if: H^T H is full rank, i.e., the D+1 features are linearly independent (requires N ≥ D+1 observations).
Complexity of inverse: O(D^3)
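A minimal sketch of the closed-form fit in NumPy on synthetic data. It solves the normal equations with `np.linalg.solve` rather than forming the inverse explicitly, which is the numerically preferred route; it assumes H^T H is invertible:

```python
import numpy as np

def fit_closed_form(H, y):
    """Least squares: solve the normal equations H^T H w = H^T y."""
    return np.linalg.solve(H.T @ H, H.T @ y)

rng = np.random.default_rng(1)
H = rng.normal(size=(100, 4))                  # made-up design matrix
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = H @ w_true + 0.01 * rng.normal(size=100)   # synthetic noisy targets
print(fit_closed_form(H, y))                   # close to w_true
```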
Step 4, Approach 2: Gradient descent
Gradient descent
while not converged:
    w(t+1) ← w(t) − η ∇RSS(w(t))
           = w(t) + 2η H^T (y − Hw(t))
Interpreting elementwise
Update to jth feature weight:

wj(t+1) ← wj(t) + 2η Σ_{i=1..N} hj(xi)(yi − ŷi(w(t)))

where ŷi(w(t)) = h(xi)^T w(t) is the prediction for the ith observation under the current weights.
Summary of gradient descent for multiple regression

init w(1) = 0 (or randomly, or smartly), t = 1
while ||∇RSS(w(t))|| > ε:
    for j = 0, …, D:
        partial[j] = −2 Σ_{i=1..N} hj(xi)(yi − ŷi(w(t)))
        wj(t+1) ← wj(t) − η partial[j]
    t ← t + 1
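A minimal NumPy sketch of this loop on synthetic data; the step size η, tolerance ε, and data are illustrative choices, not the course's:

```python
import numpy as np

def fit_gradient_descent(H, y, eta=1e-3, eps=1e-6, max_iter=100_000):
    """Minimize RSS(w) by gradient descent, following the slide's loop."""
    w = np.zeros(H.shape[1])             # init w(1) = 0
    for _ in range(max_iter):
        grad = -2 * H.T @ (y - H @ w)    # gradient of RSS at w(t)
        if np.linalg.norm(grad) <= eps:  # converged: ||grad RSS|| <= eps
            break
        w = w - eta * grad               # w(t+1) <- w(t) - eta * grad
    return w

rng = np.random.default_rng(2)
H = rng.normal(size=(200, 3))
y = H @ np.array([2.0, -1.0, 0.5]) + 0.01 * rng.normal(size=200)
print(fit_gradient_descent(H, y))  # should match the closed-form solution
```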
Why min RSS?
Assuming Gaussian noise
[Figure: price ($), y, vs. square feet (sq.ft.), x]

Model for εi: εi ~ N(0, σ^2)

Implied distribution on yi: yi ~ N(w0 + w1 xi, σ^2)
Maximum likelihood estimate of params
Maximize the log-likelihood with respect to w. Under the Gaussian noise model, this is equivalent to minimizing RSS(w), so the least squares solution ŵ is also the maximum likelihood estimate.
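Filling in the algebra (standard, though not written out in the transcript): the Gaussian log-likelihood is, up to constants, a negative multiple of RSS, so maximizing it over w is exactly minimizing RSS:

```latex
\begin{aligned}
\log p(\mathbf{y}\mid \mathbf{w},\sigma^2)
  &= \sum_{i=1}^{N}\log\mathcal{N}\!\big(y_i \,\big|\, h(x_i)^{\top}\mathbf{w},\,\sigma^2\big) \\
  &= -\tfrac{N}{2}\log\!\big(2\pi\sigma^2\big)
     \;-\; \frac{1}{2\sigma^2}\sum_{i=1}^{N}\big(y_i - h(x_i)^{\top}\mathbf{w}\big)^2 \\
  &= \text{const} \;-\; \frac{1}{2\sigma^2}\,\mathrm{RSS}(\mathbf{w})
\end{aligned}
```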
Interpreting the fitted function
Interpreting the coefficients – Simple linear regression

ŷ = ŵ0 + ŵ1 x

[Figure: price ($), y, vs. square feet (sq.ft.), x, with fitted line]

ŵ1 = predicted change in price ($) per 1 sq. ft. increase
Interpreting the coefficients – Two linear features

ŷ = ŵ0 + ŵ1 x[1] + ŵ2 x[2]

[Figure: price ($), y, vs. square feet, x[1], and # bathrooms, x[2]]
[Figure: price ($), y, vs. # bathrooms, x[2], slice of the fit for fixed sq.ft.]

ŵ2 = predicted change in price ($) per 1 bathroom increase, for fixed # sq.ft.!
Interpreting the coefficients – Multiple linear features

ŷ = ŵ0 + ŵ1 x[1] + … + ŵj x[j] + … + ŵd x[d]

ŵj = predicted change in ŷ per unit change in x[j], holding all other features fixed
Interpreting the coefficients – Polynomial regression

ŷ = ŵ0 + ŵ1 x + … + ŵj x^j + … + ŵp x^p

Can't hold the other features fixed! They are all functions of the same x, so the individual coefficients have no simple interpretation.
Recap of concepts
What you can do now…
• Describe polynomial regression
• Write a regression model using multiple inputs or features thereof
• Cast both polynomial regression and regression with multiple inputs as regression with multiple features
• Calculate a goodness-of-fit metric (e.g., RSS)
• Estimate model parameters of a general multiple regression model to minimize RSS:
  - In closed form
  - Using an iterative gradient descent algorithm
• Interpret the coefficients of a non-featurized multiple regression fit
• Exploit the estimated model to form predictions