The Expected Performance Curve
Samy Bengio, Johnny Mariéthoz, Mikaela Keller
MI, 25 October 2007
Kresten Toftgaard Andersen
Introduction to the paper
By Samy Bengio, Johnny Mariéthoz and Mikaela Keller, 2005.
For the machine learning community, researchers, etc., who need to compare models.
Content of the paper:
• Introduces ROC curves very briefly.
• Points out some risks of using ROC curves to compare classification models.
• Argues, by showing some results, that ROC curves can be misleading.
• Contributes a so-called “Expected Performance Curve” (EPC) and argues why it is better for comparing models.
• Extends the EPC with confidence intervals and statistical difference tests.
• Concludes by summarizing the contribution and listing strengths and weaknesses of ROC and EPC.
• Acknowledgements and references.
Content
• Motivation
• Introduce terminology and notation, define the problem
• Introduce ROC curves
• Example: how to calculate a ROC curve
• Present arguments for why ROC curves should be used with great care
• Introduce EPC
• Continue the example, showing how to calculate an EPC
• Present arguments for why EPC might be better than ROC
• Confidence interval
• My opinion
• Discussion
Motivation
ROC analysis is an important way to compare binary classifier models.
It can be used to select optimal models and discard suboptimal ones.
Areas of use:
• Medicine (diagnostic testing, evaluating evidence-based medicine approaches)
• Epidemiology (factors affecting health, evaluating optimal treatment approaches)
• Radiology (radar signals, evaluating new radiology techniques)
• Psychology (signal detection, assessing human detection of weak signals)
• Machine learning (evaluation of machine learning techniques)
• …
Definition of 2-class classifiers
Definition of 2-class classification problems:
Apply the function and its associated threshold to a separate test data set (the true class must be known) and count the outcomes.
Confusion matrix
Given a 2-class classifier and an instance, there are four possible outcomes:
• TP: the instance is positive and is classified as positive
• FN: the instance is positive and is classified as negative
• TN: the instance is negative and is classified as negative
• FP: the instance is negative and is classified as positive
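The four outcomes can be counted with a few lines of code. The following is a minimal sketch, not taken from the slides: the function name `confusion_counts`, the toy scores and the convention "classify as positive when score >= threshold" are all assumptions made for illustration.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FN, TN and FP for a thresholded scoring function.

    labels use 1 for positive and 0 for negative; an instance is classified
    as positive when its score is >= threshold (an assumed convention).
    """
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            counts["TP" if predicted_positive else "FN"] += 1
        else:
            counts["FP" if predicted_positive else "TN"] += 1
    return counts


# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(confusion_counts(scores, labels, threshold=0.5))
```

Moving the threshold changes all four counts at once, which is why the later slides treat the threshold as the central parameter.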
Performance metrics
The selected measure is a pair, generically called V1 and V2.
V1 and V2 can be calculated in many ways depending on the situation. All are simple combinations of TP, TN, FP and FN.
The exact calculation of V1 and V2 is not important in this paper.
Performance metrics
A unique measure, generically called V, combines V1 and V2.
V can also be calculated in several ways depending on the situation, e.g. HTER (Half Total Error Rate).
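As a concrete illustration of (V1, V2) pairs and a combined V, here is a small sketch using the usual textbook definitions: false acceptance rate and false rejection rate combined into HTER, and precision and recall combined into F1. The function names are made up for this example, and the formulas on the original slide were not preserved, so this is not a reproduction of it. The code assumes every denominator is non-zero.

```python
def far_frr(tp, fn, tn, fp):
    """(V1, V2) for verification: false acceptance and false rejection rates."""
    far = fp / (fp + tn)  # fraction of negatives wrongly accepted
    frr = fn / (fn + tp)  # fraction of positives wrongly rejected
    return far, frr


def hter(tp, fn, tn, fp):
    """V for verification: Half Total Error Rate, the mean of FAR and FRR."""
    far, frr = far_frr(tp, fn, tn, fp)
    return (far + frr) / 2


def precision_recall(tp, fn, tn, fp):
    """(V1, V2) for retrieval: precision and recall (tn is unused)."""
    return tp / (tp + fp), tp / (tp + fn)


def f1(tp, fn, tn, fp):
    """V for retrieval: harmonic mean of precision and recall."""
    precision, recall = precision_recall(tp, fn, tn, fp)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
counts = dict(tp=40, fn=10, tn=45, fp=5)
print(far_frr(**counts), hter(**counts))
print(precision_recall(**counts), f1(**counts))
```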
What is a ROC curve?
ROC
• Abbreviation for “Receiver Operating Characteristic”.
• Technique for visualizing, organizing and selecting classifiers based on their performance.
• A ROC can be presented either as a graph or as a curve.
Classifiers
• Discrete classifiers (decision trees, rule sets, etc.)
• Probabilistic classifiers (Naive Bayes, neural networks, etc.)
• Varying the threshold of a probabilistic classifier traces a curve (the ROC curve).
The following example shows this.
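In code, tracing the curve amounts to sweeping the threshold over the observed scores and recording one point per threshold. The sketch below is an illustration under assumptions, not the slides' own example: it uses the common (false positive rate, true positive rate) axes, the name `roc_points` is made up, and the data are invented.

```python
def roc_points(scores, labels):
    """Return one (FPR, TPR) point per threshold, from strictest to loosest.

    labels use 1 for positive and 0 for negative; an instance is accepted
    when its score is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Start above every score (nothing accepted), then lower the threshold.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each printed line is one point of the curve; plotting them in order gives the familiar staircase from (0, 0) to (1, 1).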
Example
[Figures: step-by-step example of how a ROC curve is calculated by moving the threshold; images not preserved.]
ROC curves
• BEP = Break-Even Point
• The BEP corresponds to the threshold nearest to a solution such that V1 = V2.
• The selected threshold has a significant impact on the model's performance.
• The threshold represents a trade-off between giving importance to V1 or to V2.
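A break-even threshold can be found by a simple search over candidate thresholds. The sketch below assumes V1 = FAR and V2 = FRR, restricts candidates to the observed scores, and uses a made-up function name and invented data; it is one possible reading of "nearest to a solution such that V1 = V2", not the procedure from the paper.

```python
def break_even_threshold(scores, labels):
    """Return the observed score that, used as threshold, makes FAR and FRR closest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_threshold, best_gap = None, float("inf")
    for t in sorted(set(scores)):
        far = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0) / n_neg
        frr = sum(1 for s, y in zip(scores, labels) if s < t and y == 1) / n_pos
        gap = abs(far - frr)
        if gap < best_gap:  # keep the first threshold with the smallest gap
            best_threshold, best_gap = t, gap
    return best_threshold


# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(break_even_threshold(scores, labels))
```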
Potential risk of using ROC
• Each point on the curve corresponds to a particular setting of the threshold.
• But in “real applications” the threshold needs to be decided before seeing the test set.
• Normally the threshold is found by searching for the BEP using some equation.
• Possibility of mismatch because the training set is different from the test set: the optimal threshold found using the training set may not correspond to the optimal threshold on the test set.
• One parameter, the threshold, is tuned using the training set. It is risky to expect that the training error reflects the generalization error.
• “Real applications often suffer from an additional mismatch between training and test conditions”.
• Risk of a different trade-off (V1, V2) on the test set.
• ROC curves do not take the risk of a mismatch into account. This possibility should be reflected in the procedure used to calculate the performance curve.
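The mismatch can be made concrete with a toy experiment: pick the threshold on one data set, then compare its error on the test set with the error of the threshold that would have been best had the test set been visible. Everything below (function names, data, the choice of HTER as the measure) is an assumption made for illustration and does not come from the paper.

```python
def hter(scores, labels, theta):
    """Half Total Error Rate when accepting every score >= theta."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    far = sum(1 for s, y in zip(scores, labels) if s >= theta and y == 0) / n_neg
    frr = sum(1 for s, y in zip(scores, labels) if s < theta and y == 1) / n_pos
    return (far + frr) / 2


def best_threshold(scores, labels):
    """Observed score with the lowest HTER (first one in case of ties)."""
    return min(sorted(set(scores)), key=lambda t: hter(scores, labels, t))


# Toy data, invented for illustration: the test scores are shifted
# relative to the development scores.
dev_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
dev_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_scores = [0.8, 0.7, 0.6, 0.5, 0.55, 0.45, 0.3, 0.2]
test_labels = [1, 1, 1, 1, 0, 0, 0, 0]

theta_dev = best_threshold(dev_scores, dev_labels)     # chosen before seeing the test set
theta_test = best_threshold(test_scores, test_labels)  # only available in hindsight
a_priori = hter(test_scores, test_labels, theta_dev)
a_posteriori = hter(test_scores, test_labels, theta_test)
print(f"a priori HTER on test:     {a_priori:.3f} (theta={theta_dev})")
print(f"a posteriori HTER on test: {a_posteriori:.3f} (theta={theta_test})")
```

A ROC curve drawn on the test set implicitly reports the a posteriori figure, while a deployed system only achieves the a priori one.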
Potential risk of using ROC
ROC curves of two real models for a text-independent speaker verification task.
Looking at the curves only, model B seems to be better than model A.
Looking at the thresholds, A is actually the better model.
Expected performance curve
The EPC presents a range of possible expected performances on the test set.
The calculation takes into account the possible mismatch when estimating the desired threshold.
A parameter α (alpha) sets the desired trade-off between V1 and V2; the threshold is estimated separately for each α.
Framework:
Parametric performance measure: C(V1(θ, D), V2(θ, D); α)
Depends on: the parameter α, and V1 and V2 computed on some data D for the threshold θ.
Example:
C(V1(θ, D), V2(θ, D); α) = C(Precision(θ, D), Recall(θ, D); α) = -(α · Precision(θ, D) + (1 - α) · Recall(θ, D))
Procedure:
Vary α inside a reasonable range. For each α, estimate the θ that minimizes C(·, ·; α) on a development set, then use the obtained θ to compute V on the test set. Finally, plot V with respect to α.
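The procedure can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' algorithm listing, which was not preserved here: it takes V1 = FAR, V2 = FRR, the criterion C = α · FAR + (1 - α) · FRR, V = HTER, and candidate thresholds drawn from the development scores. The function names and the data are invented.

```python
def far_frr(scores, labels, theta):
    """False acceptance and false rejection rates when accepting score >= theta."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    far = sum(1 for s, y in zip(scores, labels) if s >= theta and y == 0) / n_neg
    frr = sum(1 for s, y in zip(scores, labels) if s < theta and y == 1) / n_pos
    return far, frr


def expected_performance_curve(dev, test, alphas):
    """Return (alpha, theta, HTER on test) triples, one per alpha.

    dev and test are (scores, labels) pairs. theta is selected on dev only.
    """
    dev_scores, dev_labels = dev
    test_scores, test_labels = test
    # Candidate thresholds: every development score, plus "reject everything".
    candidates = sorted(set(dev_scores)) + [float("inf")]
    curve = []
    for alpha in alphas:
        def criterion(theta):
            far, frr = far_frr(dev_scores, dev_labels, theta)
            return alpha * far + (1 - alpha) * frr

        theta = min(candidates, key=criterion)                # tuned on the development set
        far, frr = far_frr(test_scores, test_labels, theta)  # evaluated on the test set
        curve.append((alpha, theta, (far + frr) / 2))
    return curve


# Toy data, invented for illustration only.
dev = ([0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1], [1, 1, 1, 1, 0, 0, 0, 0])
test = ([0.8, 0.7, 0.6, 0.5, 0.55, 0.45, 0.3, 0.2], [1, 1, 1, 1, 0, 0, 0, 0])
for alpha, theta, value in expected_performance_curve(dev, test, [0.0, 0.5, 1.0]):
    print(f"alpha={alpha:.1f}  theta={theta}  HTER(test)={value:.3f}")
```

Plotting the third column against α gives the curve. The test set is never used to choose θ, so every point is an unbiased, a priori performance estimate.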
EPC Algorithm
[Figure: EPC algorithm listing; image not preserved.]
Example
[Figures: continuation of the example, showing step by step how an EPC is calculated; images not preserved.]
Example of a typical EPC
Alpha > 0.5: more importance is given to false acceptance errors
Alpha < 0.5: more importance is given to false rejection errors
EPC in real applications
Expected Performance Curves for person authentication, where one wants to trade off false acceptance rates against false rejection rates.
Expected Performance Curves for text categorization, where one wants to trade off precision against recall and plot the F1 measure.
Confidence Interval
Confidence intervals are used to indicate the reliability of an estimate.
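As an illustration of what such an interval can look like for one point of the curve, the sketch below computes a normal-approximation interval for HTER. It assumes FAR and FRR are independent binomial proportions estimated from `n_neg` negative and `n_pos` positive test examples. This is a standard textbook construction and not necessarily the interval used in the paper; the function name and the numbers are made up.

```python
import math


def hter_confidence_interval(far, frr, n_neg, n_pos, z=1.96):
    """Approximate confidence interval for HTER = (FAR + FRR) / 2.

    Var(HTER) = Var(FAR) / 4 + Var(FRR) / 4, with binomial variances
    p * (1 - p) / n. z = 1.96 gives roughly a 95% interval.
    """
    hter = (far + frr) / 2
    sigma = math.sqrt(far * (1 - far) / (4 * n_neg) + frr * (1 - frr) / (4 * n_pos))
    return hter - z * sigma, hter + z * sigma


# Hypothetical numbers, for illustration only.
low, high = hter_confidence_interval(far=0.1, frr=0.2, n_neg=1000, n_pos=500)
print(f"HTER = 0.150, approx. 95% interval = [{low:.3f}, {high:.3f}]")
```

Two models whose intervals overlap heavily at a given α cannot be confidently ranked at that operating point.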
My opinion
• The authors have a point, and the idea is good.
• Good for comparing models…
• …but it is hard to read much from an EPC; ROC is more informative.
• Cumbersome to compute the EPC.
• Useful… maybe? Apparently only used by the authors?
End of Line
Questions
Discussion