Overview of Supervised Learning
23/4/22 Overview of Supervised Learning 2
Outline
• Linear Regression and the Nearest-Neighbor Method
• Statistical Decision Theory
• Local Methods in High Dimensions
• Statistical Models, Supervised Learning and Function Approximation
• Structured Regression Models
• Classes of Restricted Estimators
• Model Selection and Bias
Notation
• X: inputs, feature vector, predictors, independent variables. Generally X is a vector of p values; qualitative features are coded in X.
  – Sample values of X are written in lower case; xi is the i-th of N sample values.
• Y: output, response, dependent variable. Typically a scalar (it can be a vector) of real values; yi is a realized value.
• G: a qualitative response, taking values in a discrete set G, e.g. G = {survived, died}. We often code G via a binary indicator response vector Y.
Problem
• 200 points are generated in R^2 from an unknown distribution, 100 in each of two classes G = {GREEN, RED}.
• Can we build a rule to predict the color of future points?
Linear regression
• Code Y = 1 if G = RED, else Y = 0.
• We model Y as a linear function of X:

$$\hat{Y} = \hat\beta_0 + \sum_{j=1}^{p} X_j \hat\beta_j = X^T \hat\beta$$

• Obtain $\hat\beta$ by least squares, minimizing the quadratic criterion:

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - x_i^T \beta)^2$$

• Given an N × p model matrix X and a response vector y,

$$\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
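The least-squares fit can be sketched numerically. This is a minimal example on synthetic data; N, p, beta_true, the noise level, and the seed are illustrative assumptions, not the slide's actual simulation:

```python
import numpy as np

# A minimal sketch of the least-squares fit on synthetic data.
rng = np.random.default_rng(0)
N, p = 100, 2
X = rng.normal(size=(N, p))
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + 0.1 * rng.normal(size=N)

# Prepend a column of ones so beta_hat includes the intercept beta_0.
Xd = np.hstack([np.ones((N, 1)), X])

# Normal equations: beta_hat = (X^T X)^{-1} X^T y
# (solve is preferred over forming the explicit inverse).
beta_hat = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)

# Quadratic criterion RSS(beta_hat) = sum_i (y_i - x_i^T beta_hat)^2.
rss = np.sum((y - Xd @ beta_hat) ** 2)
```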
Linear regression
Linear regression
• Figure 2.1: A classification example in two dimensions. The classes are coded as a binary variable (GREEN = 0, RED = 1) and then fit by linear regression. The line is the decision boundary defined by $x^T\hat\beta = 0.5$. The red shaded region denotes the part of input space classified as RED, while the green region is classified as GREEN.
Possible scenarios
K-Nearest Neighbors
K-Nearest Neighbors
• Figure 2.2: The same classification example in two dimensions as in Figure 2.1. The classes are coded as a binary variable (GREEN = 0, RED = 1) and then fit by 15-nearest-neighbor averaging.
• The predicted class is hence chosen by majority vote amongst the 15 nearest neighbors.
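The majority-vote rule can be sketched as follows; `knn_predict` is a hypothetical helper written for illustration, using plain Euclidean distance:

```python
import numpy as np

def knn_predict(X_train, y_train, x0, k=15):
    """Majority vote among the k nearest training points (Euclidean
    distance), with classes coded GREEN=0 / RED=1 as in Figure 2.2."""
    dists = np.linalg.norm(X_train - x0, axis=1)
    nearest = np.argsort(dists)[:k]
    # With 0/1 labels, the majority vote is a threshold on the
    # neighbors' average label (the k-NN regression estimate).
    return int(y_train[nearest].mean() > 0.5)
```

With k = 15 this is the rule described above; setting k = 1 gives the 1-nearest-neighbor classifier of Figure 2.3.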
K-Nearest Neighbors
• Figure 2.3: The same classification example; the classes are coded as a binary variable (GREEN = 0, RED = 1) and then predicted by 1-nearest-neighbor classification.
Linear regression vs. k-NN
Linear regression vs. k-NN
• Figure 2.4: Misclassification curves for the simulation example above. A test sample of size 10,000 was used. The red curves are test error and the green curves are training error for k-NN classification. The results for linear regression are the larger green and red dots at three degrees of freedom. The purple line is the optimal Bayes error rate.
Other Methods
Statistical decision theory
The regression function
• The expected (squared) prediction error of a function f:

$$\mathrm{EPE}(f) = E(Y - f(X))^2 = \int (y - f(x))^2\,\Pr(dx, dy) = \int\!\left[\int (y - f(x))^2\,\Pr(dy \mid x)\right]\Pr(dx) = E_X\, E_{Y|X}\!\left([Y - f(X)]^2 \mid X\right)$$

• Minimizing pointwise, for each x we obtain

$$f(x) = \operatorname*{arg\,min}_c\; E_{Y|X}\!\left([Y - c]^2 \mid X = x\right)$$

• whose minimizer is the conditional mean, the regression function:

$$f(x) = E(Y \mid X = x)$$
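That the conditional mean minimizes the squared-error criterion can be checked by a small Monte Carlo sketch. The distribution of Y given X = x (mean 2, unit variance) and the grid of candidate constants are illustrative assumptions:

```python
import numpy as np

# Among constants c, E[(Y - c)^2 | X = x] should be smallest near
# c = E[Y | X = x].
rng = np.random.default_rng(1)
y = 2.0 + rng.normal(size=100_000)        # draws of Y given X = x
cs = np.linspace(0.0, 4.0, 81)            # candidate constants c
losses = np.array([np.mean((y - c) ** 2) for c in cs])
best_c = cs[np.argmin(losses)]            # grid minimizer, near y.mean()
```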
Bayes Classifier
Bayes Classifier• Figure 2.5: The optimal
Bayes decision boundary for the simulation example above.
• Since the generating density is known for each class, this boundary can be calculated exactly.
Curse of dimensionality
Bias–variance decomposition of the expected prediction error at a test point $x_0$, with $\hat y_0$ the prediction there:

$$\begin{aligned}
\mathrm{EPE}(x_0) &= E\big[(f(x_0) - \hat y_0)^2\big] \\
&= E\big[(f(x_0) - E[\hat y_0] + E[\hat y_0] - \hat y_0)^2\big] \\
&= E\big[\hat y_0 - E[\hat y_0]\big]^2 + \big[E[\hat y_0] - f(x_0)\big]^2 + 2\,E\big\{[\hat y_0 - E[\hat y_0]]\,[E[\hat y_0] - f(x_0)]\big\} \\
&= \mathrm{Var}(\hat y_0) + \mathrm{Bias}^2(\hat y_0)
\end{aligned}$$

(The cross term vanishes because $E[\hat y_0 - E[\hat y_0]] = 0$.)
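The decomposition can be verified by simulation. A minimal sketch with an illustrative estimator (the average of k noisy observations of f(x0); f(x0), k, and the noise are assumptions, not the slide's setup):

```python
import numpy as np

# Monte Carlo check of EPE(x0) = Var(y_hat0) + Bias^2(y_hat0).
rng = np.random.default_rng(2)
f_x0, k, reps = 1.5, 5, 200_000
y = f_x0 + rng.normal(size=(reps, k))   # one training sample per row
y_hat0 = y.mean(axis=1)                 # the estimator on each sample

mse = np.mean((f_x0 - y_hat0) ** 2)     # E[(f(x0) - y_hat0)^2]
var = y_hat0.var()                      # Var(y_hat0)
bias2 = (y_hat0.mean() - f_x0) ** 2     # Bias^2(y_hat0)
```

The sample identity mse = var + bias2 holds exactly (the cross term vanishes); since this estimator is unbiased, mse is close to 1/k.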
Linear Model
• Linear Model
• Linear Regression
• Test error
$$Y = X^T\beta + \varepsilon$$

$$\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

$$\hat y_0 = x_0^T\hat\beta = \sum_{i=1}^{N} \ell_i(x_0)\, y_i, \qquad \ell_i(x_0) = \text{the } i\text{-th component of } \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}x_0$$
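The fact that linear regression is a linear smoother, with weights depending only on the training inputs, can be sketched numerically (the random data and seed are illustrative):

```python
import numpy as np

# The prediction at x0 can be written y_hat0 = sum_i l_i(x0) y_i,
# with weights l(x0) = X (X^T X)^{-1} x0.
rng = np.random.default_rng(3)
N, p = 50, 3
X = rng.normal(size=(N, p))
y = rng.normal(size=N)
x0 = rng.normal(size=p)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
l = X @ np.linalg.solve(X.T @ X, x0)    # l_i(x0) for i = 1..N

pred_direct = x0 @ beta_hat             # x0^T beta_hat
pred_weights = l @ y                    # sum_i l_i(x0) y_i (equal)
```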
Curse of dimensionality
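The standard hypercube example illustrates the phenomenon; this sketch (not taken from the slide text) uses the fact that for inputs uniform on the unit cube in p dimensions, a sub-cube capturing a fraction r of the data needs edge length r^(1/p):

```python
# Edge length of a sub-cube capturing a fraction r of data uniform on
# the unit cube in p dimensions: e_p(r) = r**(1/p), which approaches
# the full range of every coordinate as p grows.
def edge_length(r, p):
    return r ** (1.0 / p)

# Capturing 1% of the data: p = 2 needs edge 0.1, while p = 10
# already needs edge ~0.63 -- the neighborhood is no longer "local".
```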
Statistical Models
Supervised Learning
Two Types of Supervised Learning
Learning Classification Models
Learning Regression Models
Function Approximation
Function Approximation
• Figure 2.10: Least-squares fitting of a function of two inputs. The parameters of $f_\theta(x)$ are chosen so as to minimize the sum of squared vertical errors.
Function Approximation
• More generally, Maximum Likelihood Estimation provides a natural basis for estimation.
• E.g., for a multinomial (qualitative) response G with conditional class probabilities

$$\Pr(G = k \mid X = x) = p_{k,\theta}(x),$$

the log-likelihood is

$$L(\theta) = \sum_{i=1}^{N} \log p_{g_i,\theta}(x_i).$$
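The log-likelihood above can be sketched in code. The two-class model with p_{1,θ}(x) = sigmoid(θᵀx) is a hypothetical choice for illustration, not the slide's model:

```python
import numpy as np

# Multinomial log-likelihood L(theta) = sum_i log p_{g_i, theta}(x_i),
# illustrated with a two-class sigmoid model (an assumption).
def log_likelihood(theta, X, g):
    p1 = 1.0 / (1.0 + np.exp(-X @ theta))   # Pr(G = 1 | X = x_i)
    probs = np.where(g == 1, p1, 1.0 - p1)  # Pr(G = g_i | X = x_i)
    return np.sum(np.log(probs))
```

Maximum likelihood estimation then maximizes this sum over θ.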
Structured Regression Models
Classes of Restricted Estimators
Model Selection & the Bias-Variance Tradeoff
Model Selection & the Bias-Variance Tradeoff
• Test and training error as a function of model complexity.
• Exercises: Page 27, Ex. 2.1, 2.2, 2.6