Use of Predictive Models in Aquatic Biological
Assessment:
theory and application to the Colorado REMAP/NAWQA
datasetCharles P. Hawkins
Aquatic, Watershed, and Earth ResourcesUtah State University
Basic Concepts• Predictive models base
assessments on the compositional similarity between observed and expected biota.
• Major issues:–Understanding the units of measure.
–Predicting the expected taxa.
Basic Concepts(The Expected Taxa)
SpeciesReplicate Sample Number Freq
(Pc)1 2 3 4 5 6 7 8 9 10
A * * * * * * * * * * 1.0
B * * * * * * * * 0.8
C * * * * * 0.5
D * * * * * 0.5
E * 0.1
Sp Count
3 3 3 2 4 3 2 2 4 3 2.9Species Richness is the Unit of Currency.E = ∑ Pc = number of species / sample = 2.9.
O/E as a Measure of Impairment
Expected Biota Observed Biota
Species Pc O1 O2 O3 O4
A 1.0 * * * *
B 0.8 * *
C 0.5 *
D 0.5 *
E 0.1
F 0 *
Expected Sp Count 2.9 3 2 2 1
O/E 1.03 0.69 0.69 0.34
Environmental Gradient
Pro
babili
ty o
f C
aptu
reThis is the Challenge:
Estimating the Probabilities of Capture of Many Different Taxa that Exhibit Individualistic
Distributions
The basic approach to modeling pc’s and estimating E was worked out by Moss et
al.*
River InVertebrate Prediction and Classification System
(RIVPACS)
*Moss, D., M. T. Furse, J. F. Wright, and P. D. Armitage. 1987. The prediction of the macro-invertebrate fauna of unpolluted running-water sites in Great Britain using environmental data. Freshwater Biology 17:41-52.
RIVPACS-type Models: 9 Steps1. Establish a network of reference sites.
2. Establish standard sampling protocols.
3. Classify sites based on biological similarity.
4. Calculate taxa frequencies of occurrence (Fi,g) within each class.
5. Derive a model for estimating probabilites of a site belonging to each group (Pg).
6. Estimate pc’s by weighting Fi,g by Pg.
For each assessed site:
7. Sum pc’s to estimate E.
8. Calculate O/E.
9. Determine if observed O/E is different from reference?
Creating RIVPACS Models1. Establish a
network of reference sites that span the range of environmental conditions in the region of interest.
Creating RIVPACS Models2. Use standard protocols to sample biota
and habitat features.
Creating RIVPACS Models3. Classify sites in terms of their
compositional similarity.
Dissimilarity0 0.2 0.4 0.6 0.8 1
Gro
up
A
D
B
C
Cluster analysis shows
4 ‘groups’ of sites
Creating RIVPACS Models4. Estimate frequencies of occurrence
of each taxon in each biotic group.
Group
Sp 1 Sp 2 Sp 3 Sp 4 Sp 5 Sp 6
A 0.33 0.89 0 0.25 1.00 0
B 0.80 0.99 0.21 0.36 0.87 0
C 0.60 0 0.16 0.28 0.98 0.05
D 0.10 0.54 0.09 0.29 1.00 0
Creating RIVPACS Models4. Estimate frequencies of occurrence of
each taxon in each biotic group.
Color Group Species 1 freq.
Red Group A
0.33
Green Group B
0.80
Yellow Group C
0.60
Blue Group D
0.10
Creating RIVPACS Models
New site
When we go to a new site, how do we know which biological group it should belong to?
5. Derive a model from environmental features (not biology) to predict the probabilities that a new site belongs to each of the biological groups.
Discriminant Function Model (for example):
Group is predicted by elevation, watershed area, geology
Creating RIVPACS Models
6. Develop site-specific ferquencies for each taxon to occur at the new site
New site
• Model predicts the probability that the new site is a member of each of the groups
• These probabilities are used to adjust site-specific frequencies of occurrence
Creating RIVPACS Models
7. Sum pc’s to estimate the number of taxa (E) that should be observed at the site based on standard sampling.
Species Pc
1 0.70
2 0.92
3 0.86
4 0.63
5 0.51
6 0.32
7 0.07
8 0.00
E 4.01
Creating RIVPACS Models
8. Calculate O/E by comparing the number of predicted taxa that were collected (O) with E.
Species Pc O
1 0.70 *
2 0.92 *
3 0.86
4 0.63
5 0.51 *
6 0.32
7 0.07
8 0.00
E 4.01 3
O/E = 3 / 4.01 = 0.75
Creating RIVPACS Models9. Determine if the O/E value is
significantly different from the reference condition by comparing against model predictions and error.
1E
O
O/E
Intermission
What’s confusing so far?
Applying RIVPACS to Colorado:
REMAP and NAWQA data 47 Reference Sites
37 REMAP sites10 NAWQA sites112 Taxa used in the model
123 Test Sites/Samples68 REMAP55 NAWQA
D
C
F
G
P
A
B
Spatial Distribution of Reference SitesColors indicate 7 different biologically defined ‘classes’
Circles = REMAP, Stars = NAWQA
Predictor Variablesin the Colorado Model
Variable F-valueDay Past 1 Jan 12.35Latitude 6.44Stream Length (log)
3.61
Substrate D50 (log) 2.83
% Ks Basin Lithology
2.61
0 10 20 30Expected # Taxa
0
10
20
30
Obs
erve
d #
Tax a
r2 = 0.66
0.0 0.5 1.0 1.50.0 0.5 1.0 1.5O/E
The Distribution of Estimated O/E
Values for Reference SitesMean = 1.00
SD = 0.1610th Percentile = 0.8490th Percentile = 1.16
D
C
F
G
P
A
B
Spatial Distribution of ‘Test’ SitesCircles = REMAP Sites, Stars = NAWQA Sites
Assessments could not be made on 16 of 118 samples (14% of total) because values of predictor variables were outside of the experience of the model.
0.0 0.5 1.0 1.50.0 0.5 1.0 1.5O/E
The Distribution of Estimated O/E Values for Test
Samples77% of test samples had O/E values outside the threshold values of 0.84 and 1.16.
Mean O/E ValuesREMAP: 0.67NAWQA: 0.69
D
C
F
G
P
A
B
Blue:>0.84 and <1.16Yell: 0.64–0.84, >1.16Red: <0.64Gray: No Assessment
Colorado Test SiteO/E values
Relationships between O/Eand
Potential Stressors
0 1 2 3 40.0
0.5
1.0
1.5
log Dissolved Copper (ug/L)
O/E
O/E Response to Stressors
Response to dissolved copper shows 2 thresholds:~ 2.5 and 30 ug/L.