cost curves - engineeringstan/csi5387/costcurves.pdf · •cost curves enable easy visualization of...

22
What ROC Curves Can't Do (and Cost Curves Can) Robert Holte Computing Science Dept. University of Alberta Chris Drummond Research Officer NRC Adjunct Professor SITE Cost Curves Scalar Measures Don’t Even Come Close

Upload: others

Post on 16-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

What ROC Curves Can't Do

(and Cost Curves Can)

Robert Holte

Computing Science Dept.

University of Alberta

Chris Drummond

Research Officer NRC

Adjunct Professor SITE

Cost Curves

Scalar Measures Don’t Even Come Close

Page 2: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

How to Evaluate Performance

?• Scalar measure summarizing performance

– Accuracy

– Expected cost

• Performance Visualization Techniques

– ROC curve

– Area under the ROC curve

– Cost Curve

Page 3: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

The Lure of Scalar Measures

“…it is often preferable to employ a single value

measure which summarizes the performance

of a classifier, e.g. because there are several

classifiers to be compared and there is no

clear dominance of one ROC curve above the

others.

The most widely used single measure is the

Area Under the ROC Curve …”

– paraphrase from a workshop paper

because there are several

classifiers to be compared and there is no clear

dominance of one ROC curve above the others.

Page 4: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

What’s Genuinely Good About

Scalar Measures ?

• We know how to

– average them,

– compute confidence intervals,

– test for significance, etc

• and there is off-the-shelf software to do these

calculations for us.

• But they often hide important details

Page 5: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

Tru

e P

ositiv

e R

ate

False Positive Rate

2 Splitting Criteria for C4.5

Criterion-A

Criterion-D

The key question is

NOT

Is A better than D?

BUT

When is A better than D?

Appendicitis Dataset

Page 6: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

Tru

e P

ositiv

e R

ate

Is AUC=0.95 better than AUC=0.75 ?

False Positive Rate

When

positives >> negatives 25:1

AUC=0.95

> twice the error rate of

AUC=0.75

FP = 0.75, TP = 1.0

AUC = 0.75FP = 0, TP = 0.95

AUC = 0.95

Page 7: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

Cost Curves

Page 8: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

Error Curves

Err

or

Rat

e

Probability of Positive P(+)0.8 1.00.0 0.2 0.4 0.6

0.0

0.2

0.4

0.6

0.8

1.0Classifier 1

TP = 0.4

FP = 0.3

Classifier 2

TP = 0.7

FP = 0.5

Classifier 3

TP = 0.6

FP = 0.2

FP FN = 1-TP

Page 9: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

Operating Range

Err

or

Rat

e

Probability of Positive P(+)0.8 1.00.0 0.2 0.4 0.6

0.0

0.2

0.4

0.6

0.8

1.0

Operating Range

“always

negative”“always

positive”

Page 10: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

Lower Envelope

Err

or

Rat

e

Probability of Positive P(+)0.8 1.00.0 0.2 0.4 0.6

0.0

0.2

0.4

0.6

0.8

1.0

“always

negative”

“always

positive”

The lower envelope is a biased estimate of performance.

Fresh data is needed to get an unbiased estimate.

Page 11: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

X = p(+) =>

Y = error rate

X = p(+) •C(-|+)

Err

or

Rat

e

Probability of Positive P(+)

Taking Costs Into Account

Y = FN•X + FP •(1-X)

PC(+) - Probability Cost

No

rmal

ized

Ex

pec

ted C

ost

p(+)•C(-|+) + p(-)•C(+|-)

FP FN

[0,1]

Y = E[cost] norm to [0,1]

Page 12: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

Comparing Cost Curves

PC(+) - Probability Cost

No

rmal

ized

Ex

pec

ted C

ost

Page 13: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

ROC, Selection procedure

False Positive Rate

Tru

e P

osi

tiv

e R

ate Suppose this classifier was

produced by a training set

with a class ratio of 10:1,

and was used whenever the

deployment situation had a

10:1 class ratio.

Page 14: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

Cost Curves, Selection

ProcedureN

orm

aliz

ed E

xpec

ted C

ost

PC(+) - Probability Cost

Page 15: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

Averaging Cost Curves

PC(+)-Probability Cost

Norm

aliz

ed E

xp

ecte

d C

ost

Page 16: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

Averaging ROC Curves

False Positive Rate

Tru

e P

osi

tive

Rat

e

Page 17: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

Confidence Intervals

True

Predicted

pos neg

pos 78 22

neg 40 60

Original

TP = 0.78

FP = 0.4

Predicted

negposTrue

6238neg

1783pos

Resample #2

TP = 0.83

FP = 0.38

Resample confusion matrix 10000 times and take 95% envelope

Resample #1

TP = 0.75

FP = 0.45

Predicted

negposTrue

5545neg

2575pos

Page 18: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

Confidence Interval Example

Norm

aliz

ed E

xp

ecte

d C

ost

PC(+) - Probability Cost

Page 19: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

Paired Resampling to Test

Statistical Significance

Predicted by

Classifier1

Predicted by Classifier2

pos neg

pos 30 10

neg 0 60

For the 100 test examples in the negative class:

FP for classifier1: (30+10)/100 = 0.40

FP for classifier2: (30+0)/100 = 0.30

FP2 – FP1 = -0.10

Resample this matrix 10000 times to get (FP2-FP1) values.

Do the same for the matrix based on positive test examples.

Plot and take 95% envelope as before.

Page 20: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

PC(+) - Probability Cost

Norm

aliz

ed E

xpec

ted C

ost

Low correlation = Low

significance

classifier1

classifier2

FN2-FN1

FP2-FP10

Page 21: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

Comparing J48 and AdaBoost

Page 22: Cost Curves - Engineeringstan/csi5387/CostCurves.pdf · •Cost curves enable easy visualization of –average performance (expected cost) –operating range –confidence intervals

Conclusions

• Scalar performance measures, including

AUC, do not indicate when one classifier is

better than another.

• Cost/ROC curve software is available

• Cost curves enable easy visualization of

– average performance (expected cost)

– operating range

– confidence intervals on performance

– difference in performance and its significance.