ece 5984: introduction to machine learnings15ece5984/slides/l9_regression_mo… · robust linear...

36
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: (Finish) Regression Model selection, Cross-validation Error decomposition Bias-Variance Tradeoff Readings: Barber 17.1, 17.2

Upload: others

Post on 17-Jun-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

ECE 5984: Introduction to Machine Learning

Dhruv Batra Virginia Tech

Topics: –  (Finish) Regression –  Model selection, Cross-validation –  Error decomposition –  Bias-Variance Tradeoff

Readings: Barber 17.1, 17.2

Page 2: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Administrativia •  HW1

–  Solutions available

•  Project Proposal –  Due: Tue 02/24, 11:55 pm –  <=2pages, NIPS format –  Show Igor’s proposal

•  HW2 –  Due: Friday 03/06, 11:55pm –  Implement linear regression, Naïve Bayes, Logistic

Regression

(C) Dhruv Batra 2

Page 3: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Recap of last time

(C) Dhruv Batra 3

Page 4: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Regression

(C) Dhruv Batra 4

Page 5: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

(C) Dhruv Batra 5 Slide Credit: Greg Shakhnarovich

Page 6: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

(C) Dhruv Batra 6 Slide Credit: Greg Shakhnarovich

Page 7: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

(C) Dhruv Batra 7 Slide Credit: Greg Shakhnarovich

Page 8: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

(C) Dhruv Batra 8 Slide Credit: Greg Shakhnarovich

Page 9: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

•  Why sum squared error??? •  Gaussians, Watson, Gaussians…

But, why?

9 (C) Dhruv Batra

Page 10: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

(C) Dhruv Batra 10 Slide Credit: Greg Shakhnarovich

Page 11: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Is OLS Robust? •  Demo

–  http://www.calpoly.edu/~srein/StatDemo/All.html

•  Bad things happen when the data does not come from your model!

•  How do we fix this?

(C) Dhruv Batra 11

Page 12: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Robust Linear Regression •  y ~ Lap(w’x, b)

•  On paper

(C) Dhruv Batra 12

0 0.2 0.4 0.6 0.8 1−6

−5

−4

−3

−2

−1

0

1

2

3

4Linear data with noise and outliers

least squareslaplace

−3 −2 −1 0 1 2 3−0.5

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

L2

L1

huber

Page 13: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Plan for Today •  (Finish) Regression

–  Bayesian Regression –  Different prior vs likelihood combination –  Polynomial Regression

•  Error Decomposition –  Bias-Variance –  Cross-validation

(C) Dhruv Batra 13

Page 14: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Robustify via Prior •  Ridge Regression

•  y ~ N(w’x, σ2) •  w ~ N(0, t2I)

•  P(w | x,y) =

(C) Dhruv Batra 14

Page 15: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Summary Likelihood Prior Name Gaussian Uniform Least Squares Gaussian Gaussian Ridge Regression Gaussian Laplace Lasso Laplace Uniform Robust Regression Student Uniform Robust Regression

(C) Dhruv Batra 15

Page 16: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

(C) Dhruv Batra 16 Slide Credit: Greg Shakhnarovich

Page 17: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

(C) Dhruv Batra 17 Slide Credit: Greg Shakhnarovich

Page 18: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

(C) Dhruv Batra 18 Slide Credit: Greg Shakhnarovich

Page 19: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

(C) Dhruv Batra 19 Slide Credit: Greg Shakhnarovich

Page 20: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Example •  Demo

–  http://www.princeton.edu/~rkatzwer/PolynomialRegression/

(C) Dhruv Batra 20

Page 21: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

What you need to know •  Linear Regression

–  Model –  Least Squares Objective –  Connections to Max Likelihood with Gaussian Conditional –  Robust regression with Laplacian Likelihood –  Ridge Regression with priors –  Polynomial and General Additive Regression

(C) Dhruv Batra 21

Page 22: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

New Topic: Model Selection and Error Decomposition

(C) Dhruv Batra 22

Page 23: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Example for Regression •  Demo

–  http://www.princeton.edu/~rkatzwer/PolynomialRegression/

•  How do we pick the hypothesis class?

(C) Dhruv Batra 23

Page 24: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Model Selection •  How do we pick the right model class?

•  Similar questions –  How do I pick magic hyper-parameters? –  How do I do feature selection?

(C) Dhruv Batra 24

Page 25: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Errors •  Expected Loss/Error

•  Training Loss/Error •  Validation Loss/Error •  Test Loss/Error

•  Reporting Training Error (instead of Test) is CHEATING

•  Optimizing parameters on Test Error is CHEATING

(C) Dhruv Batra 25

Page 26: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

(C) Dhruv Batra 26 Slide Credit: Greg Shakhnarovich

Page 27: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

(C) Dhruv Batra 27 Slide Credit: Greg Shakhnarovich

Page 28: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

(C) Dhruv Batra 28 Slide Credit: Greg Shakhnarovich

Page 29: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

(C) Dhruv Batra 29 Slide Credit: Greg Shakhnarovich

Page 30: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

(C) Dhruv Batra 30 Slide Credit: Greg Shakhnarovich

Page 31: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Typical Behavior

(C) Dhruv Batra 31

•  a

Page 32: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Overfitting •  Overfitting: a learning algorithm overfits the training

data if it outputs a solution w when there exists another solution w’ such that:

32 (C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 33: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Error Decomposition

(C) Dhruv Batra 33

Reality

model class

Page 34: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Error Decomposition

(C) Dhruv Batra 34

Reality

Page 35: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Error Decomposition

(C) Dhruv Batra 35

Reality model class

Higher-Order Potentials

Page 36: ECE 5984: Introduction to Machine Learnings15ece5984/slides/L9_regression_mo… · Robust Linear Regression • y ~ Lap(w’x, b) • On paper (C) Dhruv Batra 12 0 0.2 0.4 0.6 0.8

Error Decomposition •  Approximation/Modeling Error

–  You approximated reality with model

•  Estimation Error –  You tried to learn model with finite data

•  Optimization Error –  You were lazy and couldn’t/didn’t optimize to completion

•  (Next time) Bayes Error –  Reality just sucks

(C) Dhruv Batra 36