![Page 1: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/1.jpg)
Model Evaluation
A. Townsend Peterson
University of Kansas
![Page 2: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/2.jpg)
Generalities
• Calibration data and evaluation data must be independent
• Important to establish whether the observed coincidence between model predictions and testing data is closer than random expectations
• Only once a model is tested (successfully) should the model be interpreted and explored
![Page 3: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/3.jpg)
Threshold-dependent or Not?
Thresholded
• PRO
– Simplicity of test
– Clear interpretation
– Computation is easy
• CON
– Assumptions required in thresholding
– Less well accepted by the community (who cares?)
Continuous
• PRO
– Avoid need for thresholding and assumptions
– Very well accepted by community
• CON
– Less clear in interpretation
– Problems (known) with ROC AUC
– Computational challenges
![Page 4: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/4.jpg)
Binomial Test
• Given a SINGLE threshold
• Proportional area predicted present determines expected numbers of points correctly predicted
• Binomial test assesses whether observed number of successes is greater than that expected by chance alone
![Page 5: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/5.jpg)
![Page 6: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/6.jpg)
Threshold-dependent Approach
If predicted suitable area covers 15% of the testing area, then 15% of evaluation points are expected to fall in the predicted suitable area by chance.
• p = proportion of area predicted suitable
• s = number of successes
• n = number of evaluation points
• =1-BINOMDIST(s,n,p,TRUE)
The cumulative binomial distribution gives the probability of obtaining up to s successes out of n trials when a proportion p of the testing area is predicted present; one minus that value is the probability of exceeding s successes by chance alone. If this probability is below 0.05, we interpret the model's predictions as significantly better than random.
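The binomial calculation can also be sketched outside a spreadsheet. The following is a minimal Python illustration, not the author's own workflow: the function name `binomial_upper_tail` and the numbers (20 evaluation points, 9 of them falling in the predicted suitable area) are hypothetical. It computes the probability of at least s successes, and also the strict upper tail (more than s successes), which is what the spreadsheet formula 1-BINOMDIST(s,n,p,TRUE) returns and which is slightly smaller.

```python
from math import comb


def binomial_upper_tail(s, n, p):
    """Return P(X >= s) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s, n + 1))


# Hypothetical example values (not taken from the slides' data).
p_area = 0.15      # proportion of testing area predicted suitable
n_points = 20      # number of independent evaluation points
successes = 9      # evaluation points falling in predicted suitable area

# Number of points expected to be predicted correctly by chance alone.
expected = p_area * n_points

# Probability of at least `successes` correct predictions by chance.
p_at_least = binomial_upper_tail(successes, n_points, p_area)

# Strict upper tail, equivalent to 1 - BINOMDIST(s, n, p, TRUE).
p_more_than = binomial_upper_tail(successes + 1, n_points, p_area)

print(f"expected by chance: {expected:.1f} of {n_points}")
print(f"P(X >= {successes}) = {p_at_least:.6f}")
print(f"P(X >  {successes}) = {p_more_than:.6f}")
```

With many evaluation points the two tails differ very little, but with few points the choice between "at least s" and "more than s" can matter near the 0.05 cutoff.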
![Page 7: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/7.jpg)
Threshold-independent Approaches
![Page 8: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/8.jpg)
![Page 9: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/9.jpg)
![Page 10: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/10.jpg)
![Page 11: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/11.jpg)
Correct prediction of presence information (= avoidance of omission error)
Correct prediction of absence information (= avoidance of commission error)
![Page 12: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/12.jpg)
![Page 13: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/13.jpg)
![Page 14: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/14.jpg)
![Page 15: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/15.jpg)
![Page 16: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/16.jpg)
ROC Problems
• Ignores predicted probability values … just a ranking of suitabilities
• Speaks to regions of ROC space (= predictions) that are not particularly relevant
• Weights omission and commission errors equally
• No information about spatial distribution of model errors
• Study area extent determines outcomes!
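The first point, that ROC AUC uses only the ranking of suitabilities, can be illustrated with a small Python sketch. The function `roc_auc` and the suitability scores are hypothetical, and the sketch assumes that scores are available for both presence points and absence (or background) points. AUC is computed here as the fraction of presence/absence pairs in which the presence point receives the higher score, with ties counted as half.

```python
def roc_auc(presence_scores, absence_scores):
    """AUC as the share of (presence, absence) pairs ranked correctly."""
    wins = 0.0
    for sp in presence_scores:
        for sa in absence_scores:
            if sp > sa:
                wins += 1.0
            elif sp == sa:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical suitability scores.
presence = [0.9, 0.8, 0.35, 0.6]
absence = [0.1, 0.4, 0.3, 0.7, 0.2]

auc_raw = roc_auc(presence, absence)

# Raising scores to the 4th power changes every value but keeps the ranking.
auc_transformed = roc_auc([s**4 for s in presence], [s**4 for s in absence])

print(auc_raw, auc_transformed)
```

Both calls give the same AUC even though the transformed scores are very different numbers, so two models with quite different predicted values can be indistinguishable by this measure.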
![Page 17: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/17.jpg)
![Page 18: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/18.jpg)
![Page 19: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/19.jpg)
![Page 20: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/20.jpg)
![Page 21: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/21.jpg)
![Page 22: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/22.jpg)
![Page 23: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/23.jpg)
Significance vs Performance
• Showing that predictions are significantly better than random is important, and is a sine qua non for model interpretation
• BUT, it is also important to ensure that the model performs sufficiently well for the intended uses of the output
• Performance measures include omission rate, correct classification rate, etc.
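As a rough illustration of these performance measures, the Python sketch below computes an omission rate and a correct classification rate for a thresholded prediction. The function names, the scores, and the threshold of 0.5 are all hypothetical, and the correct classification rate assumes that absence data are available.

```python
def omission_rate(presence_scores, threshold):
    """Fraction of known presences predicted absent (score below threshold)."""
    missed = sum(1 for s in presence_scores if s < threshold)
    return missed / len(presence_scores)


def correct_classification_rate(presence_scores, absence_scores, threshold):
    """Fraction of all evaluation points classified correctly."""
    correct = sum(1 for s in presence_scores if s >= threshold)
    correct += sum(1 for s in absence_scores if s < threshold)
    return correct / (len(presence_scores) + len(absence_scores))


# Hypothetical suitability scores at evaluation points.
presence = [0.9, 0.8, 0.35, 0.6]
absence = [0.1, 0.4, 0.3, 0.7, 0.2]
threshold = 0.5

om_rate = omission_rate(presence, threshold)
cc_rate = correct_classification_rate(presence, absence, threshold)

print(f"omission rate: {om_rate:.2f}")
print(f"correct classification rate: {cc_rate:.2f}")
```

A model can pass a significance test and still have an omission rate too high for a given application, which is why the two kinds of evaluation are kept separate.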
![Page 24: Model evaluation 201606](https://reader035.vdocuments.site/reader035/viewer/2022081517/58a542f61a28ab4f088b5c43/html5/thumbnails/24.jpg)