Evaluation — Information and Computer Science (cis519/fall2014/lectures/03_Evaluation.p…)
TRANSCRIPT
Evaluation
Stages of (Batch) Machine Learning

Given: labeled training data X, Y = {(x^(i), y^(i))}_{i=1}^n
• Assumes each instance x^(i) ~ D(X) with label y^(i) = f_target(x^(i))

Train the model:
  model ← classifier.train(X, Y)

Apply the model to new data:
• Given: a new unlabeled instance x ~ D(X)
  y_prediction ← model.predict(x)

[Diagram: X, Y → learner → model; x → model → y_prediction]
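The train/predict stages above can be sketched with a minimal hypothetical classifier — here a majority-class baseline stands in for a real learner, just to show the `classifier.train` / `model.predict` interface from the slide:

```python
class MajorityClassifier:
    """Toy stand-in for the slide's classifier.train / model.predict stages."""

    @staticmethod
    def train(X, Y):
        """Fit on labeled training data: remember the most common label."""
        model = MajorityClassifier()
        model.majority = max(set(Y), key=Y.count)
        return model

    def predict(self, x):
        """Predict a label for a new unlabeled instance x."""
        return self.majority


X = [(0.0,), (1.0,), (2.0,)]   # hypothetical training instances
Y = [1, 1, 0]                  # hypothetical labels

model = MajorityClassifier.train(X, Y)
y_prediction = model.predict((1.5,))
```

Any real learner follows the same two-stage pattern: fit on (X, Y), then predict on unseen x drawn from the same distribution D(X).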
Classification Metrics

accuracy = (# correct predictions) / (# test instances)

error = 1 − accuracy = (# incorrect predictions) / (# test instances)
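These two metrics are direct to compute; a minimal sketch on hypothetical label vectors:

```python
# Hypothetical true labels and a classifier's predictions on 6 test instances.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

correct = sum(t == p for t, p in zip(y_true, y_pred))  # 5 correct predictions
accuracy = correct / len(y_true)                        # 5/6
error = 1 - accuracy                                    # 1/6
```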
Confusion Matrix
• Given a dataset of P positive instances and N negative instances:
• Imagine using the classifier to identify positive cases (i.e., for information retrieval)

                    Predicted Class
                    Yes     No
  Actual     Yes    TP      FN
  Class      No     FP      TN

accuracy = (TP + TN) / (P + N)

precision = TP / (TP + FP)
  — probability that a randomly selected result is relevant

recall = TP / (TP + FN)
  — probability that a randomly selected relevant document is retrieved
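The four cells of the confusion matrix, and the metrics built from them, can be sketched as follows (the data is hypothetical):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for binary predictions against true labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


y_true = [1, 1, 1, 0, 0, 0, 0, 0]   # P = 3 positives, N = 5 negatives
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]   # one miss (FN), one false alarm (FP)

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy  = (tp + tn) / (tp + fp + fn + tn)   # (TP + TN) / (P + N)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
```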
Training Data and Test Data
• Training data: data used to build the model
• Test data: new data, not used in the training process
• Training performance is often a poor indicator of generalization performance
  – Generalization is what we really care about in ML
  – It is easy to overfit the training data
  – Performance on test data is a good indicator of generalization performance
  – i.e., test accuracy is more important than training accuracy
Simple Decision Boundary

[Figure: two-class data in a two-dimensional feature space (Feature 1 vs. Feature 2), separated by a simple decision boundary into Decision Region 1 and Decision Region 2]

Slide by Padhraic Smyth, UC Irvine
More Complex Decision Boundary

[Figure: the same two-class data in a two-dimensional feature space (Feature 1 vs. Feature 2), now separated by a more complex decision boundary into Decision Region 1 and Decision Region 2]

Slide by Padhraic Smyth, UC Irvine
Example: The Overfitting Phenomenon

[Figure: scatterplot of training data, Y versus X]

Slide by Padhraic Smyth, UC Irvine
A Complex Model

[Figure: the same data fit by Y = high-order polynomial in X]

Slide by Padhraic Smyth, UC Irvine
The True (Simpler) Model

[Figure: the same data with the true linear fit, Y = aX + b + noise]

Slide by Padhraic Smyth, UC Irvine
How Overfitting Affects Prediction

[Figure: predictive error versus model complexity, showing the curve for error on training data]

Slide by Padhraic Smyth, UC Irvine
How Overfitting Affects Prediction

[Figure: predictive error versus model complexity, now with curves for both error on training data and error on test data]

Slide by Padhraic Smyth, UC Irvine
How Overfitting Affects Prediction

[Figure: predictive error versus model complexity, with curves for error on training data and error on test data; the ideal range for model complexity lies between the underfitting and overfitting regimes]

Slide by Padhraic Smyth, UC Irvine
Comparing Classifiers

Say we have two classifiers, C1 and C2, and want to choose the best one to use for future predictions.

Can we use training accuracy to choose between them?
• No!
  – e.g., C1 = pruned decision tree, C2 = 1-NN:
    training_accuracy(1-NN) = 100%, but it may not be the best
• Instead, choose based on test accuracy...

Based on Slide by Padhraic Smyth, UC Irvine
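The 1-NN case is easy to see concretely: every training point is its own nearest neighbor, so 1-NN reproduces the training labels perfectly. A minimal sketch on a hypothetical toy data set:

```python
def nn1_predict(train_X, train_y, x):
    """1-nearest-neighbor: return the label of the closest training point
    (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(xi, x)) for xi in train_X]
    return train_y[dists.index(min(dists))]


# Hypothetical 1-D training set with distinct points.
train_X = [(0.0,), (1.0,), (2.0,), (3.0,)]
train_y = [0, 0, 1, 1]

# Each training point's nearest neighbor is itself, so training accuracy is 100%.
train_acc = sum(nn1_predict(train_X, train_y, x) == y
                for x, y in zip(train_X, train_y)) / len(train_y)
```

That 100% says nothing about how 1-NN will do on new points, which is why the comparison must use test accuracy.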
Training and Test Data

[Diagram: the full data set is split into training data and test data]

Idea: Train each model on the "training data"... ...and then test each model's accuracy on the test data.
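A minimal sketch of this split, assuming we shuffle before holding out a fraction of the data (the fraction and seed are illustrative choices, not from the slide):

```python
import random


def train_test_split(data, test_fraction=0.25, seed=0):
    """Shuffle the data, then hold out a fraction of it as the test set."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)


data = list(range(20))
train, test = train_test_split(data)
```

The two sets are disjoint and together cover the full data set; only `train` is ever seen by the learner.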
k-Fold Cross-Validation
• Why just choose one particular "split" of the data?
  – In principle, we should do this multiple times since performance may be different for each split
• k-Fold Cross-Validation (e.g., k = 10):
  – Randomly partition the full data set of n instances into k disjoint subsets (each roughly of size n/k)
  – Choose each fold in turn as the test set; train the model on the other folds and evaluate
  – Compute statistics over the k test performances, or choose the best of the k models
  – Can also do "leave-one-out CV," where k = n
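The procedure above can be sketched as follows; `train_fn` and `eval_fn` are hypothetical callables standing in for any learner and any metric:

```python
import random


def k_fold_indices(n, k, seed=0):
    """Randomly partition indices 0..n-1 into k roughly equal, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(data, labels, k, train_fn, eval_fn):
    """Use each fold in turn as the test set; train on the remaining folds."""
    scores = []
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        tr = [i for i in range(len(data)) if i not in held_out]
        model = train_fn([data[i] for i in tr], [labels[i] for i in tr])
        scores.append(eval_fn(model, [data[i] for i in fold],
                              [labels[i] for i in fold]))
    return scores   # one test performance per fold
```

Setting k = n gives leave-one-out CV: each fold is a single instance.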
Example: 3-Fold CV

[Diagram: the full data set is partitioned k ways; in each partition (1st, 2nd, ..., kth) a different fold serves as the test data and the rest as training data, yielding one test performance per partition; summary statistics are computed over the k test performances]
Optimizing Model Parameters

Can also use CV to choose the value of a model parameter P:
• Search over the space of parameter values p ∈ values(P)
  – Evaluate the model with P = p on the validation set
• Choose the value p' with the highest validation performance
• Learn the model on the full training set with P = p'

[Diagram: the test data is held out; the training data is partitioned k ways, each partition using a different fold as the validation set; partition i finds that the optimal P = p_i; choose the value of p of the model with the best validation performance]
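The selection step can be sketched as a simple grid search over candidate values; `validate_fn` is a hypothetical callable that scores one parameter value on one (train, validation) split:

```python
def select_parameter(candidates, splits, validate_fn):
    """Return the candidate p with the highest mean validation score
    across the given (train, validation) splits."""
    best_p, best_score = None, float("-inf")
    for p in candidates:
        score = sum(validate_fn(p, tr, val) for tr, val in splits) / len(splits)
        if score > best_score:
            best_p, best_score = p, score
    return best_p


# Toy stand-in: pretend validation performance peaks at p = 3,
# independent of the split (purely illustrative).
validate_fn = lambda p, tr, val: -(p - 3) ** 2
splits = [(None, None)] * 5
best = select_parameter([1, 2, 3, 4, 5], splits, validate_fn)
```

In practice, `validate_fn` would train with P = p on the training folds and evaluate on the held-out validation fold, and the winning p' is then used to retrain on the full training set.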
More on Cross-Validation
• Cross-validation generates an approximate estimate of how well the classifier will do on "unseen" data
  – As k → n, the model becomes more accurate (more training data)
  – ...but CV becomes more computationally expensive
  – Choosing k < n is a compromise
• Averaging over different partitions is more robust than just a single train/validate partition of the data
• It is an even better idea to do CV repeatedly!
Multiple Trials of k-Fold CV

1.) Loop for t trials:
    a.) Randomize the data set (shuffle the full data set)
    b.) Perform k-fold CV
2.) Compute statistics over the t × k test performances

[Diagram: each trial reshuffles the full data set, partitions it k ways, and records one test performance per fold]
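The two-step loop above can be sketched directly; `score_fn` is a hypothetical callable that evaluates one held-out fold:

```python
import random


def repeated_cv_scores(n, k, t, score_fn):
    """t trials: reshuffle the indices, partition into k folds,
    and score each fold, yielding t * k test performances."""
    scores = []
    for trial in range(t):
        idx = list(range(n))
        random.Random(trial).shuffle(idx)     # a.) randomize the data set
        folds = [idx[i::k] for i in range(k)]  # b.) k-fold partition
        for fold in folds:
            scores.append(score_fn(fold))
    return scores   # compute summary statistics over these t * k values
```

Because each trial uses a fresh shuffle, the folds differ across trials, which is what makes the averaged statistics more robust than a single partition.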
Comparing Multiple Classifiers

1.) Loop for t trials:
    a.) Randomize the data set (shuffle the full data set)
    b.) Perform k-fold CV
2.) Compute statistics over the t × k test performances

• Test each candidate learner on the same training/testing splits
• This allows paired summary statistics (e.g., the paired t-test)

[Diagram: each fold of each trial yields one test performance for C1 and one for C2, measured on the same split]
Learning Curve
• Shows performance versus the # training examples
  – Computed over a single training/testing split
  – Then, averaged across multiple trials of CV
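A single-split learning curve can be sketched by training on growing prefixes of the training data and recording test performance at each size; `train_fn` and `accuracy_fn` are hypothetical stand-ins for any learner and metric:

```python
def learning_curve(train_X, train_y, test_X, test_y,
                   sizes, train_fn, accuracy_fn):
    """Train on growing prefixes of the training data and record
    (training-set size, test accuracy) pairs."""
    curve = []
    for m in sizes:
        model = train_fn(train_X[:m], train_y[:m])
        curve.append((m, accuracy_fn(model, test_X, test_y)))
    return curve


# Toy demo with a majority-class "learner" (purely illustrative data).
train_fn = lambda X, y: max(set(y), key=y.count)
accuracy_fn = lambda m, X, y: sum(label == m for label in y) / len(y)

curve = learning_curve(list(range(8)), [0, 0, 0, 1, 1, 1, 1, 1],
                       list(range(4)), [1, 1, 1, 0],
                       [2, 8], train_fn, accuracy_fn)
```

Averaging such curves across the t × k splits of repeated CV gives the smoothed learning curve the next slide builds.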
Building Learning Curves

1.) Loop for t trials:
    a.) Randomize the data set (shuffle the full data set)
    b.) Perform k-fold CV
2.) Compute statistics over the t × k learning curves

• Compute a learning curve over each training/testing split (one curve per classifier, e.g., C1 and C2)

[Diagram: each fold of each trial yields one learning curve for C1 and one for C2, measured on the same split]