Overview of Supervised Learning
23/4/22 Overview of Supervised Learning 2
Outline
• Linear Regression and the Nearest-Neighbor Method
• Statistical Decision Theory
• Local Methods in High Dimensions
• Statistical Models, Supervised Learning and Function Approximation
• Structured Regression Models
• Classes of Restricted Estimators
• Model Selection and Bias
Notation
• X: inputs, feature vector, predictors, independent variables. Generally X is a vector of p values; qualitative features are coded in X.
  – Sample values of X are written in lower case; xi is the i-th of N sample values.
• Y: output, response, dependent variable. Typically a scalar (it can be a vector) of real values; yi is a realized value.
• G: a qualitative response, taking values in a discrete set G, e.g. G = {survived, died}. We often code G via a binary indicator response vector Y.
Problem
• 200 points are generated in R^2 from an unknown distribution, 100 in each of two classes G = {GREEN, RED}.
• Can we build a rule to predict the color of future points?
Linear regression
• Code Y = 1 if G = RED, else Y = 0.
• We model Y as a linear function of X:

$$\hat{Y} = \hat\beta_0 + \sum_{j=1}^{p} X_j \hat\beta_j = X^T \hat\beta$$

• Obtain $\hat\beta$ by least squares, minimizing the quadratic criterion:

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - x_i^T \beta)^2$$

• Given an N × p model matrix X and a response vector y,

$$\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
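The least-squares fit can be sketched numerically. This is a minimal example on synthetic data; N, p, beta_true, the noise level, and the seed are illustrative assumptions, not the slide's actual simulation:

```python
import numpy as np

# A minimal sketch of the least-squares fit on synthetic data.
rng = np.random.default_rng(0)
N, p = 100, 2
X = rng.normal(size=(N, p))
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + 0.1 * rng.normal(size=N)

# Prepend a column of ones so beta_hat includes the intercept beta_0.
Xd = np.hstack([np.ones((N, 1)), X])

# Normal equations: beta_hat = (X^T X)^{-1} X^T y
# (solve is preferred over forming the explicit inverse).
beta_hat = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)

# Quadratic criterion RSS(beta_hat) = sum_i (y_i - x_i^T beta_hat)^2.
rss = np.sum((y - Xd @ beta_hat) ** 2)
```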
Linear regression
Linear regression
• Figure 2.1: A classification example in two dimensions. The classes are coded as a binary variable (GREEN = 0, RED = 1) and then fit by linear regression. The line is the decision boundary defined by $x^T\hat\beta = 0.5$. The red shaded region denotes the part of input space classified as RED, while the green region is classified as GREEN.
Possible scenarios
K-Nearest Neighbors
K-Nearest Neighbors
• Figure 2.2: The same classification example in two dimensions as in Figure 2.1. The classes are coded as a binary variable (GREEN = 0, RED = 1) and then fit by 15-nearest-neighbor averaging.
• The predicted class is hence chosen by majority vote amongst the 15 nearest neighbors.
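The majority-vote rule can be sketched as follows; `knn_predict` is a hypothetical helper written for illustration, using plain Euclidean distance:

```python
import numpy as np

def knn_predict(X_train, y_train, x0, k=15):
    """Majority vote among the k nearest training points (Euclidean
    distance), with classes coded GREEN=0 / RED=1 as in Figure 2.2."""
    dists = np.linalg.norm(X_train - x0, axis=1)
    nearest = np.argsort(dists)[:k]
    # With 0/1 labels, the majority vote is a threshold on the
    # neighbors' average label (the k-NN regression estimate).
    return int(y_train[nearest].mean() > 0.5)
```

With k = 15 this is the rule described above; setting k = 1 gives the 1-nearest-neighbor classifier of Figure 2.3.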
K-Nearest Neighbors
• Figure 2.3: The same classification example; the classes are coded as a binary variable (GREEN = 0, RED = 1) and then predicted by 1-nearest-neighbor classification.
Linear regression vs. k-NN
Linear regression vs. k-NN
• Figure 2.4: Misclassification curves for the simulation example above. A test sample of size 10,000 was used. The red curves are test error and the green curves are training error for k-NN classification. The results for linear regression are the larger green and red dots at three degrees of freedom. The purple line is the optimal Bayes error rate.
Other Methods
Statistical decision theory
The regression function
• The expected (squared) prediction error of a function f:

$$\mathrm{EPE}(f) = E(Y - f(X))^2 = \int (y - f(x))^2\,\Pr(dx, dy) = \int\!\left[\int (y - f(x))^2\,\Pr(dy \mid x)\right]\Pr(dx) = E_X\, E_{Y|X}\!\left([Y - f(X)]^2 \mid X\right)$$

• Minimizing pointwise, for each x we obtain

$$f(x) = \operatorname*{arg\,min}_c\; E_{Y|X}\!\left([Y - c]^2 \mid X = x\right)$$

• whose minimizer is the conditional mean, the regression function:

$$f(x) = E(Y \mid X = x)$$
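That the conditional mean minimizes the squared-error criterion can be checked by a small Monte Carlo sketch. The distribution of Y given X = x (mean 2, unit variance) and the grid of candidate constants are illustrative assumptions:

```python
import numpy as np

# Among constants c, E[(Y - c)^2 | X = x] should be smallest near
# c = E[Y | X = x].
rng = np.random.default_rng(1)
y = 2.0 + rng.normal(size=100_000)        # draws of Y given X = x
cs = np.linspace(0.0, 4.0, 81)            # candidate constants c
losses = np.array([np.mean((y - c) ** 2) for c in cs])
best_c = cs[np.argmin(losses)]            # grid minimizer, near y.mean()
```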
Bayes Classifier
Bayes Classifier• Figure 2.5: The optimal
Bayes decision boundary for the simulation example above.
• Since the generating density is known for each class, this boundary can be calculated exactly.
Curse of dimensionality
Bias–variance decomposition of the expected prediction error at a test point $x_0$, with $\hat y_0$ the prediction there:

$$\begin{aligned}
\mathrm{EPE}(x_0) &= E\big[(f(x_0) - \hat y_0)^2\big] \\
&= E\big[(f(x_0) - E[\hat y_0] + E[\hat y_0] - \hat y_0)^2\big] \\
&= E\big[\hat y_0 - E[\hat y_0]\big]^2 + \big[E[\hat y_0] - f(x_0)\big]^2 + 2\,E\big\{[\hat y_0 - E[\hat y_0]]\,[E[\hat y_0] - f(x_0)]\big\} \\
&= \mathrm{Var}(\hat y_0) + \mathrm{Bias}^2(\hat y_0)
\end{aligned}$$

(The cross term vanishes because $E[\hat y_0 - E[\hat y_0]] = 0$.)
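The decomposition can be verified by simulation. A minimal sketch with an illustrative estimator (the average of k noisy observations of f(x0); f(x0), k, and the noise are assumptions, not the slide's setup):

```python
import numpy as np

# Monte Carlo check of EPE(x0) = Var(y_hat0) + Bias^2(y_hat0).
rng = np.random.default_rng(2)
f_x0, k, reps = 1.5, 5, 200_000
y = f_x0 + rng.normal(size=(reps, k))   # one training sample per row
y_hat0 = y.mean(axis=1)                 # the estimator on each sample

mse = np.mean((f_x0 - y_hat0) ** 2)     # E[(f(x0) - y_hat0)^2]
var = y_hat0.var()                      # Var(y_hat0)
bias2 = (y_hat0.mean() - f_x0) ** 2     # Bias^2(y_hat0)
```

The sample identity mse = var + bias2 holds exactly (the cross term vanishes); since this estimator is unbiased, mse is close to 1/k.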
Linear Model
• Linear Model
• Linear Regression
• Test error
$$Y = X^T\beta + \varepsilon$$

$$\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

$$\hat y_0 = x_0^T\hat\beta = \sum_{i=1}^{N} \ell_i(x_0)\, y_i, \qquad \ell_i(x_0) = \text{the } i\text{-th component of } \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}x_0$$
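The fact that linear regression is a linear smoother, with weights depending only on the training inputs, can be sketched numerically (the random data and seed are illustrative):

```python
import numpy as np

# The prediction at x0 can be written y_hat0 = sum_i l_i(x0) y_i,
# with weights l(x0) = X (X^T X)^{-1} x0.
rng = np.random.default_rng(3)
N, p = 50, 3
X = rng.normal(size=(N, p))
y = rng.normal(size=N)
x0 = rng.normal(size=p)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
l = X @ np.linalg.solve(X.T @ X, x0)    # l_i(x0) for i = 1..N

pred_direct = x0 @ beta_hat             # x0^T beta_hat
pred_weights = l @ y                    # sum_i l_i(x0) y_i (equal)
```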
Curse of dimensionality
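The standard hypercube example illustrates the phenomenon; this sketch (not taken from the slide text) uses the fact that for inputs uniform on the unit cube in p dimensions, a sub-cube capturing a fraction r of the data needs edge length r^(1/p):

```python
# Edge length of a sub-cube capturing a fraction r of data uniform on
# the unit cube in p dimensions: e_p(r) = r**(1/p), which approaches
# the full range of every coordinate as p grows.
def edge_length(r, p):
    return r ** (1.0 / p)

# Capturing 1% of the data: p = 2 needs edge 0.1, while p = 10
# already needs edge ~0.63 -- the neighborhood is no longer "local".
```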
Statistical Models
Supervised Learning
Two Types of Supervised Learning
Learning Classification Models
Learning Regression Models
Function Approximation
Function Approximation
• Figure 2.10: Least-squares fitting of a function of two inputs. The parameters of $f_\theta(x)$ are chosen so as to minimize the sum of squared vertical errors.
Function Approximation
• More generally, Maximum Likelihood Estimation provides a natural basis for estimation.
• E.g., for a multinomial (qualitative) response G with conditional class probabilities

$$\Pr(G = k \mid X = x) = p_{k,\theta}(x),$$

the log-likelihood is

$$L(\theta) = \sum_{i=1}^{N} \log p_{g_i,\theta}(x_i).$$
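The log-likelihood above can be sketched in code. The two-class model with p_{1,θ}(x) = sigmoid(θᵀx) is a hypothetical choice for illustration, not the slide's model:

```python
import numpy as np

# Multinomial log-likelihood L(theta) = sum_i log p_{g_i, theta}(x_i),
# illustrated with a two-class sigmoid model (an assumption).
def log_likelihood(theta, X, g):
    p1 = 1.0 / (1.0 + np.exp(-X @ theta))   # Pr(G = 1 | X = x_i)
    probs = np.where(g == 1, p1, 1.0 - p1)  # Pr(G = g_i | X = x_i)
    return np.sum(np.log(probs))
```

Maximum likelihood estimation then maximizes this sum over θ.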
Structured Regression Models
Classes of Restricted Estimators
Model Selection & the Bias-Variance Tradeoff
Model Selection & the Bias-Variance Tradeoff
• Test and training error as a function of model complexity.
• Exercises: Page 27, Ex. 2.1, 2.2, 2.6