Testing Predictive Performance of Ecological Niche Models
A. Townsend Peterson, STOLEN FROM Richard Pearson
Niche Model Validation
• Diverse challenges …
– Not a single loss function or optimality criterion
– Different uses demand different criteria
– In particular, relative weights applied to omission and commission errors in evaluating models
• Nakamura: “which way is relevant to adopt is not a mathematical question, but rather a question for the user”
– Asymmetric loss functions
Where do I get testing data????
(after Araújo et al. 2005 Gl. Ch. Biol.)
Model calibration and evaluation strategies: resubstitution
[Figure: calibration / evaluation / projection diagram. Labels: 100%, all available data; same region, different region, different time, different resolution.]
(after Araújo et al. 2005 Gl. Ch. Biol.)
Model calibration and evaluation strategies: independent validation
[Figure: calibration / evaluation / projection diagram. Labels: 100%, all available data; same region, different region, different time, different resolution.]
(after Araújo et al. 2005 Gl. Ch. Biol.)
Model calibration and evaluation strategies: data splitting
[Figure: calibration / evaluation / projection diagram. Labels: calibration data, test data; 70%, 30%; same region, different region, different time, different resolution.]
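A minimal sketch of the data-splitting strategy, for illustration only. The slide gives 70% and 30% shares; treating the 70% share as the calibration set is an assumption here, as are the function name `split_occurrences`, the `seed` parameter, and the made-up coordinates.

```python
import random

def split_occurrences(records, calibration_fraction=0.7, seed=0):
    """Randomly partition occurrence records into calibration and test subsets.

    calibration_fraction=0.7 is an assumption (70% calibration, 30% test).
    """
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(records)           # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_cal = round(calibration_fraction * len(shuffled))
    return shuffled[:n_cal], shuffled[n_cal:]

# Hypothetical occurrence records (longitude, latitude), invented for the example.
records = [(-95.2 + 0.1 * i, 38.9 + 0.05 * i) for i in range(10)]
calibration, test = split_occurrences(records)
print(len(calibration), len(test))
```

A purely random split like this keeps test points in the same region as the calibration points, so it is not a substitute for independent validation data.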
Types of Error
The four types of results that are possible when testing a distribution model
(see Pearson NCEP module 2007)
Presence-absence confusion matrix
                    Recorded present      Recorded (or assumed) absent
Predicted present   a (true positive)     b (false positive)
Predicted absent    c (false negative)    d (true negative)
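The four cells can be tallied from test records once a decision threshold is applied. The sketch below assumes continuous model scores and a "score >= threshold means predicted present" rule; the function name `confusion_counts` and the example data are hypothetical.

```python
def confusion_counts(observed, scores, threshold):
    """Return (a, b, c, d): true pos., false pos., false neg., true neg."""
    a = b = c = d = 0
    for present, score in zip(observed, scores):
        predicted_present = score >= threshold   # assumed thresholding rule
        if predicted_present and present:
            a += 1        # true positive
        elif predicted_present and not present:
            b += 1        # false positive (commission error)
        elif present:
            c += 1        # false negative (omission error)
        else:
            d += 1        # true negative
    return a, b, c, d

# Invented test data: 1 = recorded present, 0 = recorded (or assumed) absent.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1, 0.7, 0.6, 0.2]
a, b, c, d = confusion_counts(observed, scores, threshold=0.5)
print(a, b, c, d)
```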
Thresholding
Selecting a decision threshold (p/a data)
(Liu et al. 2005 Ecography 28:385-393)
Selecting a decision threshold (p/a data)
[Figure: Kappa (0 to 0.7) plotted against threshold (0 to 1).]
Selecting a decision threshold (p/a data)
Omission (proportion of presences predicted absent): c/(a + c)
Commission (proportion of absences predicted present): b/(b + d)
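To see how the two error rates trade off as the threshold moves, they can be computed over a range of thresholds. This is a sketch under the assumption that scores at or above the threshold count as predicted present and that the test set contains both presences and absences; `error_rates` and the data are hypothetical.

```python
def error_rates(observed, scores, threshold):
    """Return (omission, commission) = (c/(a + c), b/(b + d)) at one threshold."""
    a = sum(1 for o, s in zip(observed, scores) if o and s >= threshold)
    c = sum(1 for o, s in zip(observed, scores) if o and s < threshold)
    b = sum(1 for o, s in zip(observed, scores) if not o and s >= threshold)
    d = sum(1 for o, s in zip(observed, scores) if not o and s < threshold)
    return c / (a + c), b / (b + d)

# Invented test data: 1 = recorded present, 0 = recorded (or assumed) absent.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1, 0.7, 0.6, 0.2]

for threshold in (0.0, 0.25, 0.5, 0.75, 1.0):
    omission, commission = error_rates(observed, scores, threshold)
    print(f"threshold={threshold:.2f} omission={omission:.2f} commission={commission:.2f}")
```

Raising the threshold can only increase omission and decrease commission, which is why the relative weight given to each error type drives the choice of threshold.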
Selecting a decision threshold (p-o data)
[Figure: omission rate (0 to 1) plotted against threshold (0 to 100), with LPT and T10 marked.]
Threshold-dependent Tests (= loss functions)
Presence-absence test statistics
Proportion (%) correctly predicted (or ‘accuracy’, or ‘correct classification rate’):
(a + d)/(a + b + c + d)
Cohen’s Kappa:
$$k = \frac{(a + d) - \dfrac{(a + c)(a + b) + (b + d)(c + d)}{n}}{n - \dfrac{(a + c)(a + b) + (b + d)(c + d)}{n}}, \qquad n = a + b + c + d$$
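A short sketch of both statistics computed from the four confusion-matrix cells. The function names and the example counts are hypothetical; the Kappa expression follows the standard form, with chance agreement estimated from the row and column totals.

```python
def accuracy(a, b, c, d):
    """Proportion correctly predicted: (a + d) / (a + b + c + d)."""
    return (a + d) / (a + b + c + d)

def cohens_kappa(a, b, c, d):
    """Cohen's Kappa: agreement beyond what is expected by chance."""
    n = a + b + c + d
    # Number of agreements expected by chance, from the marginal totals.
    expected = ((a + c) * (a + b) + (b + d) * (c + d)) / n
    return ((a + d) - expected) / (n - expected)

# Invented counts: 3 true positives, 1 false positive,
# 1 false negative, 3 true negatives.
print(accuracy(3, 1, 1, 3))      # 6 of 8 records correct
print(cohens_kappa(3, 1, 1, 3))
```

Kappa is undefined when chance agreement equals n (for example, when every record falls in a single cell), so a real implementation would need to guard that case.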
Presence-only test statistics
Proportion of observed presences correctly predicted (or ‘sensitivity’, or ‘true positive fraction’):
a/(a + c)
Proportion of observed presences incorrectly predicted (or ‘omission rate’, or ‘false negative fraction’):
c/(a + c)
Presence-only test statistics: testing for statistical significance
Leaf-tailed gecko (Uroplatus): U. sikorae
Success rate: 4 from 7; proportion predicted present: 0.231; binomial p = 0.0546
Success rate: 6 from 7; proportion predicted present: 0.339; binomial p = 0.008
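The p-values above can be approximated with a one-tailed binomial test. The sketch assumes that the chance probability of a correct prediction equals the proportion of the study area predicted present, and that the test asks for the probability of at least the observed number of successes; the function name `binomial_upper_tail` is hypothetical.

```python
from math import comb

def binomial_upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Values taken from the two U. sikorae results above.
p_first = binomial_upper_tail(4, 7, 0.231)
p_second = binomial_upper_tail(6, 7, 0.339)
print(round(p_first, 4), round(p_second, 4))
```

Under these assumptions the results come out at roughly 0.054 and 0.008, close to the reported values; small differences are expected because the proportions are rounded to three decimals.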
Absence-only test statistics
Proportion of observed (or assumed) absences correctly predicted (or ‘specificity’, or ‘true negative fraction’):
d/(b + d)
Proportion of observed (or assumed) absences incorrectly predicted (or ‘commission rate’, or ‘false positive fraction’):
b/(b + d)
AUC: a threshold-independent test statistic
sensitivity = a/(a + c) (1 – omission rate)
specificity = d/(b + d)
1 - specificity (fraction of absences predicted present)
Threshold-independent assessment: The Receiver Operating Characteristic (ROC) Curve
[Figure with panels A, B and C: frequency distributions of predicted probability of occurrence (0 to 1) for the set of ‘absences’ and the set of ‘presences’, and the ROC plot of sensitivity (0 to 1) against 1 - specificity (0 to 1).]
(check out: http://www.anaesthetist.com/mnm/stats/roc/Findex.htm)
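One way to compute AUC without tracing the ROC curve is to use its rank interpretation: the area equals the probability that a randomly chosen presence receives a higher predicted value than a randomly chosen absence, with ties counted as one half. The sketch below takes that route; the function name `auc` and the scores are hypothetical.

```python
def auc(presence_scores, absence_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in presence_scores:
        for q in absence_scores:
            if p > q:
                wins += 1.0     # presence ranked above absence
            elif p == q:
                wins += 0.5     # ties count as half
    return wins / (len(presence_scores) * len(absence_scores))

# Invented predicted probabilities of occurrence.
presence = [0.9, 0.8, 0.3, 0.6]
absence = [0.4, 0.1, 0.7, 0.2]
print(auc(presence, absence))
```

A value of 0.5 corresponds to a model that ranks presences no better than random, and 1.0 to perfect separation of the two sets. The pairwise loop is quadratic in the number of records, which is fine for small test sets.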