overview of supervised learning. 2015-10-23overview of supervised learning2 outline linear...

Overview of Supervised Learning

Upload: john-oneal

Post on 29-Jan-2016


Page 1: Overview of Supervised Learning. 2015-10-23Overview of Supervised Learning2 Outline Linear Regression and Nearest Neighbors method Statistical Decision

Overview of Supervised Learning


23/4/22 Overview of Supervised Learning 2

Outline

• Linear Regression and Nearest Neighbors method
• Statistical Decision Theory
• Local Methods in High Dimensions
• Statistical Models, Supervised Learning and Function Approximation
• Structured Regression Models
• Classes of Restricted Estimators
• Model Selection and the Bias-Variance Tradeoff


Notation

• X: inputs, feature vector, predictors, independent variables. Generally X will be a vector of p values. Qualitative features are coded in X.
  – Sample values of X are generally in lower case; x_i is the i-th of N sample values.

• Y: output, response, dependent variable.
  – Typically a scalar, but can be a vector, of real values. Again y_i is a realized value.

• G: a qualitative response, taking values in a discrete set G; e.g. G = {survived, died}. We often code G via a binary indicator response vector Y.


Problem

• 200 points generated in ℝ² from an unknown distribution; 100 in each of two classes G = {GREEN, RED}.

• Can we build a rule to predict the color of future points?


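Linear regression

• Code Y=1 if G=RED, else Y=0.

• We model Y as a linear function of X (with the constant 1 included in X):

  $\hat{Y} = \hat\beta_0 + \sum_{j=1}^{p} X_j \hat\beta_j = X^T \hat\beta$

• Obtain β by least squares, by minimizing the quadratic criterion:

  $RSS(\beta) = \sum_{i=1}^{N} (y_i - x_i^T \beta)^2$

• Given an N × p model matrix X and a response vector y,

  $\hat\beta = (X^T X)^{-1} X^T y$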
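The least squares fit above can be sketched in a few lines of NumPy. This is a minimal illustration, not the slides' own code: the two Gaussian clouds, the seed, and the helper names `fit_least_squares` and `predict_class` are assumptions made for the example. The 0.5 threshold follows the 0/1 coding of the classes.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return beta_hat for y ~ [1, X] by solving the normal equations."""
    X1 = np.column_stack([np.ones(len(X)), X])   # prepend the constant 1
    # Solve (X^T X) beta = X^T y rather than forming the inverse explicitly.
    return np.linalg.solve(X1.T @ X1, X1.T @ y)

def predict_class(X, beta_hat, threshold=0.5):
    """Classify as 1 (RED) where the fitted value exceeds the threshold."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return (X1 @ beta_hat > threshold).astype(int)

# Toy data (assumed): two Gaussian clouds in R^2, 100 points per class.
rng = np.random.default_rng(0)
X_green = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X_red = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100, 2))
X = np.vstack([X_green, X_red])
y = np.concatenate([np.zeros(100), np.ones(100)])   # GREEN=0, RED=1

beta_hat = fit_least_squares(X, y)
train_error = np.mean(predict_class(X, beta_hat) != y)
```

Solving the linear system is usually preferred over computing the inverse of X^T X directly, although both give the same coefficients when X^T X is well conditioned.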


Linear regression


Linear regression

• Figure 2.1: A classification example in two dimensions. The classes are coded as a binary variable (GREEN=0, RED=1) and then fit by linear regression. The line is the decision boundary defined by $X^T \hat\beta = 0.5$. The red shaded region denotes the part of input space classified as RED, while the green region is classified as GREEN.


Possible scenarios


K-Nearest Neighbors


K-Nearest Neighbors

• Figure 2.2: The same classification example in two dimensions as in Figure 2.1. The classes are coded as a binary variable (GREEN=0, RED=1) and then fit by 15-nearest-neighbor averaging.

• The predicted class is hence chosen by majority vote amongst the 15 nearest neighbors.
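The majority vote can be sketched as follows. This is an illustrative implementation under stated assumptions: Euclidean distance, a brute-force search, and a tiny hand-made data set; the function name `knn_predict` is hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=15):
    """Predict 0/1 labels by majority vote among the k nearest training points."""
    # Squared Euclidean distance from every query point to every training point.
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dist2 = np.sum(diffs ** 2, axis=2)
    # Indices of the k closest training points for each query point.
    nearest = np.argsort(dist2, axis=1)[:, :k]
    # Averaging the 0/1 labels and thresholding at 0.5 is a majority vote.
    # With an even k, an exact tie is assigned to class 0 here.
    y_hat = y_train[nearest].mean(axis=1)
    return (y_hat > 0.5).astype(int)

# Toy data (assumed): three points near the origin, three near (1, 1).
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.05, 0.05], [1.05, 1.05]])

pred = knn_predict(X_train, y_train, X_query, k=3)
```

With k=1 every training point is its own nearest neighbor, so the training error is zero, which is why training error alone is a poor guide for choosing k.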


K-Nearest Neighbors

• Figure 2.3: The same classification example in two dimensions as in Figure 2.1. The classes are coded as a binary variable (GREEN=0, RED=1), and then predicted by 1-nearest-neighbor classification.


Linear regression vs. k-NN


Linear regression vs. k-NN

• Figure 2.4: Misclassification curves for the simulation example above. A test sample of size 10,000 was used. The red curves are test error and the green curves are training error for k-NN classification. The results for linear regression are the bigger green and red dots at three degrees of freedom. The purple line is the optimal Bayes error rate.


Other Methods


Statistical decision theory


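Regression function

$EPE(f) = E[Y - f(X)]^2$

$= \int (y - f(x))^2 \, pr(dx, dy)$

$= \int\!\int (y - f(x))^2 \, pr(dy \mid x) \, pr(dx)$

$= E_X E_{Y|X}([Y - f(X)]^2 \mid X)$

Minimizing EPE pointwise gives:

$f(x) = \arg\min_c E_{Y|X}([Y - c]^2 \mid X = x)$

The minimizer is:

$f(x) = E(Y \mid X = x)$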
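A quick numerical check of the pointwise minimization: at a fixed x, the constant c that minimizes the average squared error over draws of Y is the mean of those draws. The normal distribution with mean 2, the sample size, and the candidate grid below are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed toy setup: at a fixed x, Y | X = x is normal with mean 2.
y = rng.normal(loc=2.0, scale=1.0, size=5000)

# Average squared error of each candidate constant c on a grid of step 0.01.
candidates = np.linspace(0.0, 4.0, 401)
avg_sq_err = np.array([np.mean((y - c) ** 2) for c in candidates])

best_c = candidates[np.argmin(avg_sq_err)]   # grid point with smallest error
sample_mean = y.mean()                       # empirical version of E(Y | X = x)
```

Up to the grid resolution, `best_c` and `sample_mean` agree, which is the empirical counterpart of f(x) = E(Y | X = x).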


Bayes Classifier


Bayes Classifier

• Figure 2.5: The optimal Bayes decision boundary for the simulation example above.

• Since the generating density is known for each class, this boundary can be calculated exactly.
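When the class densities are known, the Bayes rule assigns each point to the class with the larger posterior probability. The sketch below assumes a much simpler setup than the simulation in the figure: one Gaussian N(mean, I) per class and equal priors, with hypothetical means and function names.

```python
import numpy as np

def log_gaussian_density(x, mean):
    """Log density of N(mean, I) in two dimensions, for each row of x."""
    d = x - mean
    return -0.5 * np.sum(d ** 2, axis=1) - np.log(2.0 * np.pi)

def bayes_classify(x, mean_green, mean_red):
    """Return 1 (RED) where the RED density is larger, else 0 (GREEN).

    Equal class priors are assumed, so comparing densities is the same as
    comparing posterior probabilities.
    """
    return (log_gaussian_density(x, mean_red)
            > log_gaussian_density(x, mean_green)).astype(int)

# Assumed class means; the exact boundary is then the vertical line x1 = 1.
mean_green = np.array([0.0, 0.0])
mean_red = np.array([2.0, 0.0])
points = np.array([[0.5, 3.0], [1.5, -1.0]])

labels = bayes_classify(points, mean_green, mean_red)
```

With equal covariances the boundary is linear; with different covariances or mixture densities it is curved, but the same density comparison applies.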


Curse of dimensionality


$EPE(x_0) = E_T[f(x_0) - \hat{y}_0]^2$

$= E_T[f(x_0) - E_T(\hat{y}_0) + E_T(\hat{y}_0) - \hat{y}_0]^2$

$= E_T[\hat{y}_0 - E_T(\hat{y}_0)]^2 + E_T[E_T(\hat{y}_0) - f(x_0)]^2 + 2 E_T\{[\hat{y}_0 - E_T(\hat{y}_0)][E_T(\hat{y}_0) - f(x_0)]\}$

$= Var_T(\hat{y}_0) + Bias^2(\hat{y}_0)$
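The decomposition can be checked by simulation: draw many training sets T, record the prediction at x_0 for each, and compare the mean squared error with variance plus squared bias. The true function, the one-dimensional uniform inputs, the noise-free responses, and the 1-nearest-neighbor predictor below are all assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    """Assumed true function, chosen only for illustration."""
    return np.exp(-8.0 * x ** 2)

x0 = 0.0
n_train, n_repeats = 50, 2000
y0_hat = np.empty(n_repeats)
for r in range(n_repeats):
    # Draw a fresh training set T: inputs uniform on [-1, 1], no noise.
    x = rng.uniform(-1.0, 1.0, size=n_train)
    # 1-nearest-neighbor prediction at x0.
    y0_hat[r] = f(x[np.argmin(np.abs(x - x0))])

mse = np.mean((f(x0) - y0_hat) ** 2)        # E_T[f(x0) - y0_hat]^2
variance = np.var(y0_hat)                   # E_T[y0_hat - E_T(y0_hat)]^2
bias_sq = (np.mean(y0_hat) - f(x0)) ** 2    # [E_T(y0_hat) - f(x0)]^2
```

The cross term vanishes because the deviation of the prediction from its own mean averages to zero, so `mse` equals `variance + bias_sq` up to floating-point rounding.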


Linear Model

• Linear Model

  $Y = X^T \beta + \varepsilon$

• Linear Regression

  $\hat\beta = (X^T X)^{-1} X^T y$

• Test error

  $\hat{y}_0 = x_0^T \hat\beta = x_0^T \beta + \sum_{i=1}^{N} l_i(x_0) \varepsilon_i$

  where $l_i(x_0)$ is the i-th component of $X (X^T X)^{-1} x_0$.
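The identity for the prediction at x_0 can be verified numerically. In this sketch the dimensions, the true coefficients `beta`, the noise level, and the random design are assumptions; only the formulas come from the slide.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 30, 4
X = rng.normal(size=(N, p))              # model matrix, no intercept column
beta = np.array([1.0, -2.0, 0.5, 0.0])   # assumed true coefficients
eps = rng.normal(scale=0.1, size=N)      # noise terms epsilon_i
y = X @ beta + eps

x0 = rng.normal(size=p)                  # test point
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# l(x0) = X (X^T X)^{-1} x0, one weight per training point.
l = X @ np.linalg.solve(X.T @ X, x0)

y0_hat = x0 @ beta_hat                   # x0^T beta_hat
y0_from_weights = x0 @ beta + l @ eps    # x0^T beta + sum_i l_i(x0) eps_i
```

The same weights also give the prediction directly as a linear combination of the responses, since `l @ y` equals `y0_hat`.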


Curse of dimensionality


Statistical Models


Supervised Learning


Two Types of Supervised Learning


Learning Classification Models


Learning Regression Models


Function Approximation


Function Approximation

• Figure 2.10: Least squares fitting of a function of two inputs. The parameters of f_θ(x) are chosen so as to minimize the sum-of-squared vertical errors.


Function Approximation

• More generally, Maximum Likelihood Estimation provides a natural basis for estimation.

• E.g. multinomial

  $\Pr(G = k \mid X = x) = p_{k,\theta}(x)$

  $L(\theta) = \sum_{i=1}^{N} \log p_{g_i,\theta}(x_i)$
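The multinomial log-likelihood sums the log of the probability assigned to each observed class. The sketch below assumes a softmax form for the class probabilities; that parameterization and the function names are illustrative choices, not something the slide specifies.

```python
import numpy as np

def class_probabilities(X, theta):
    """p_{k,theta}(x) for each row x of X, using an assumed softmax model."""
    scores = X @ theta                                     # shape (N, K)
    scores = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    expd = np.exp(scores)
    return expd / expd.sum(axis=1, keepdims=True)

def log_likelihood(X, g, theta):
    """L(theta) = sum_i log p_{g_i,theta}(x_i), with g_i in {0, ..., K-1}."""
    probs = class_probabilities(X, theta)
    return np.sum(np.log(probs[np.arange(len(g)), g]))

# Toy data (assumed): three observations, two features, three classes.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
g = np.array([0, 1, 2])
theta_zero = np.zeros((2, 3))   # all classes equally likely under this theta

ll = log_likelihood(X, g, theta_zero)
```

Maximum likelihood estimation would then search for the theta that maximizes `log_likelihood`, for example by gradient ascent.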


Structured Regression Models


Classes of Restricted Estimators


Model Selection & the Bias-Variance Tradeoff


Model Selection & the Bias-Variance Tradeoff

• Test and training error as a function of model complexity.
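The typical shape of these curves can be reproduced with a small experiment. Here model complexity is taken to be polynomial degree, and the true function, noise level, and sample sizes are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(4)

def f(x):
    """Assumed true function, chosen only for illustration."""
    return np.sin(3.0 * x)

# Small noisy training set and a larger independent test set.
x_train = rng.uniform(-1.0, 1.0, size=30)
y_train = f(x_train) + rng.normal(scale=0.3, size=30)
x_test = rng.uniform(-1.0, 1.0, size=1000)
y_test = f(x_test) + rng.normal(scale=0.3, size=1000)

degrees = list(range(0, 9))      # complexity: polynomial degree 0..8
train_err, test_err = [], []
for d in degrees:
    coefs = np.polyfit(x_train, y_train, deg=d)   # least squares polynomial fit
    train_err.append(np.mean((np.polyval(coefs, x_train) - y_train) ** 2))
    test_err.append(np.mean((np.polyval(coefs, x_test) - y_test) ** 2))
```

Training error cannot increase with degree because each model contains the previous one. Test error usually falls at first and then rises once the fit starts following the noise, although where that turn happens depends on the data.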


• Page 27
• Ex 2.1; 2.2; 2.6