Multivariate statistics for use in ecological studies
Kevin Wilcox, ECOL 600 – Community Ecology
Spring 2014
Useful web resources
• Vegan tutorial: http://cc.oulu.fi/~jarioksa/opetus/metodi/vegantutor.pdf
• The little book of R for multivariate analyses: http://little-book-of-r-for-multivariate-analysis.readthedocs.org/en/latest/src/multivariateanalysis.html#means-and-variances-per-group
• Ordination Methods by Michael Palmer: http://ordination.okstate.edu/overview.htm#Nonmetric_Multidimensional_Scaling
• Community analyses lectures by Jari Oksanen: http://cc.oulu.fi/~jarioksa/opetus/metodi/
Univariate statistics to measure community dynamics
• Richness (R or S; either local or regional)
• Shannon index (H’; Shannon & Weaver 1949)
  • Incorporates richness as well as relative abundances into a single metric
  • Emphasizes richness
• Simpson’s index (D or λ; Simpson 1949)
  • Emphasizes evenness
• Pielou’s evenness index (J’ = H’/ln S)
Univariate indices (cont.)
Species  low.light  mid.light  high.light
A        0.759876   0.383737   0.083528
B        0.620718   0.156324   0.152468
C        0.245997   0.524531   0.185889
D        0.33769    0.571457   0.52394
E        0.207586   0.281312   0.545748
F        0.143989   0.289351   0.561022
Metric    low.light  mid.light  high.light
Richness  6          6          6
H’        1.63       1.71       1.60
D         0.78       0.81       0.77
J’        0.91       0.96       0.89
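The metric table can be reproduced with a short Python sketch. It assumes natural logarithms for H’, the 1 − Σp² form of Simpson’s index (the form the D row matches, and the one vegan reports for `diversity(..., index = "simpson")`), and J’ = H’/ln S. The function name `diversity` here is illustrative only.

```python
import math

def diversity(abundances):
    """Richness, Shannon H', Simpson (1 - sum p^2) and Pielou J' for one community."""
    present = [a for a in abundances if a > 0]
    total = sum(present)
    p = [a / total for a in present]               # relative abundances
    richness = len(present)                        # S
    shannon = -sum(pi * math.log(pi) for pi in p)  # H', natural log
    simpson = 1 - sum(pi ** 2 for pi in p)         # Gini-Simpson form of D
    pielou = shannon / math.log(richness)          # J' = H' / ln(S); needs S > 1
    return {"Richness": richness, "H'": shannon, "D": simpson, "J'": pielou}

# Species A-F abundances copied from the species table
communities = {
    "low.light":  [0.759876, 0.620718, 0.245997, 0.33769, 0.207586, 0.143989],
    "mid.light":  [0.383737, 0.156324, 0.524531, 0.571457, 0.281312, 0.289351],
    "high.light": [0.083528, 0.152468, 0.185889, 0.52394, 0.545748, 0.561022],
}

for name, values in communities.items():
    metrics = diversity(values)
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```

Rounded to two decimals, the printed values agree with the Richness, H’, D, and J’ rows, which also shows how similar the three communities look through univariate indices alone.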
• Species A and B are dominant in low light
• All species do reasonably well in moderate light
• Species E and F are dominant in high light
These indices give no information about individual species responses.
You could look at each species individually, but:
• This lacks clarity with many species
• When looking solely at individual responses, you lose information about whole-community dynamics
[Figure: species abundances along the light gradient (x axis: Light; y axis: Abundance)]
OR…
You could use multivariate statistics
• 3 parts:
  1. Dissimilarity matrices
  2. Ordinations
  3. Statistical tests of differences between or among communities
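            low.light  mid.light  high.light
low.light   0          0.347573   0.491285
mid.light   0.347573   0          0.287918
high.light  0.491285   0.287918   0
[Figure: ordination of the low.light, mid.light, and high.light communities (axis label: MDS1)]
Because the matrix is symmetric with zeros on the diagonal, only the lower triangle is needed:
            low.light  mid.light
mid.light   0.347573
high.light  0.491285   0.287918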
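The light-treatment matrix appears to be Bray-Curtis dissimilarity computed on the species table: the Python sketch below reproduces 0.347573, 0.491285, and 0.287918 to about five decimal places (the small residual comes from the rounded abundances in the table). The dictionary layout is only an illustrative choice.

```python
def bray_curtis(x, y):
    """Sum of absolute differences divided by the summed abundance of both communities."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))

# Species A-F abundances in each light treatment (from the species table)
light = {
    "low.light":  [0.759876, 0.620718, 0.245997, 0.33769, 0.207586, 0.143989],
    "mid.light":  [0.383737, 0.156324, 0.524531, 0.571457, 0.281312, 0.289351],
    "high.light": [0.083528, 0.152468, 0.185889, 0.52394, 0.545748, 0.561022],
}

names = list(light)
# full symmetric matrix; a community compared with itself gives 0
matrix = {(a, b): bray_curtis(light[a], light[b]) for a in names for b in names}

# print only the lower triangle, as on the slide
for i, a in enumerate(names):
    row = [f"{matrix[(a, b)]:.6f}" for b in names[:i]]
    print(a.ljust(11), "  ".join(row))
```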
You could use multivariate statistics
• 3 parts:
  1. Dissimilarity metrics and matrices
  2. Ordination
  3. Statistical tests of differences between or among communities
• Software: R, SAS, SPSS, PRIMER with PERMANOVA+, PC-ORD
Dissimilarity metrics are the building blocks used in many multivariate statistics
• Visual representation (ordination)
• Statistical tests
• Think carefully about which type of matrix or dissimilarity metric you should use
Dissimilarity matrices… brace yourself
• A dissimilarity matrix is simply a table that compares every pair of local communities (plots). The higher the number, the more dissimilar the two communities are
Raw data (plots × species):
        sp1       sp2       sp3       sp4       sp5       sp6       sp7       sp8       sp9       sp10
plot1   0.662205  0.35045   0.778512  0.459916  0.552845  0.570177  0.458688  0.569126  0.536608  0.647962
plot2   0.563042  0.348089  0.478276  0.642296  0.494129  0.505471  0.380392  0.449799  0.658523  0.418476
plot3   0.718789  0.452508  0.558081  0.58513   0.58881   0.503074  0.685387  0.584184  0.337696  0.520809
plot4   0.635745  0.374085  0.501664  0.397816  0.543634  0.570857  0.430936  0.594428  0.492054  0.477197
plot5   0.407958  0.2844    0.619615  0.49797   0.268607  0.540357  0.439914  0.6602    0.465481  0.515861
plot6   0.442134  0.542096  0.546438  0.575878  0.557583  0.314215  0.363373  0.427713  0.61189   0.765725
plot7   0.401636  0.725872  0.316728  0.573688  0.329604  0.463     0.516889  0.499663  0.579506  0.530339
plot8   0.353048  0.653721  0.589779  0.481271  0.549743  0.445126  0.681987  0.598617  0.443214  0.386528
plot9   0.54211   0.46642   0.495843  0.335123  0.695228  0.315519  0.610575  0.516901  0.525599  0.377469
plot10  0.458872  0.573375  0.485013  0.452928  0.604478  0.636183  0.398031  0.508075  0.342392  0.440215
Dissimilarity matrix (plots × plots, lower triangle):
        plot1     plot2     plot3     plot4     plot5     plot6     plot7     plot8     plot9
plot2   0.119391
plot3   0.105672  0.129548
plot4   0.062924  0.090572  0.092943
plot5   0.111247  0.137485  0.140632  0.100948
plot6   0.135112  0.112754  0.150243  0.147331  0.168915
plot7   0.173912  0.137066  0.157281  0.159704  0.138085  0.131513
plot8   0.144694  0.158255  0.098638  0.12356   0.131775  0.14664   0.125334
plot9   0.145805  0.135068  0.121951  0.106257  0.184396  0.141295  0.167902  0.112891
plot10  0.130464  0.119984  0.113435  0.088728  0.139998  0.13052   0.142464  0.108512  0.122446
Types of dissimilarity metrics
• Euclidean distance
  • Operates in species space, meaning that each species (or dependent variable) gets its own orthogonal axis in multidimensional space
  • Because the differences are squared, single large differences carry a lot of weight when determining dissimilarities
  • Dissimilarities between pairs of plots with no shared species are not necessarily the same; they still depend on the abundances in each plot
  • This is why Euclidean distance is usually used for environmental data rather than abundance data
Types of dissimilarity metrics
• Manhattan-type distances
  • Bray-Curtis (abundance data)
  • Jaccard (presence-absence)
  • Use sums of absolute differences instead of squared terms, making them less sensitive to single large differences
  • Reach a maximum dissimilarity of 1 when two communities share no species
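A small Python sketch of the contrast between the two families, using three hypothetical plots (invented abundances) that share no species with one another. Euclidean distance changes with abundance (about 1.41 for plot1 vs plot2, about 10.05 for plot1 vs plot3), while Bray-Curtis and Jaccard both sit at their maximum of 1.

```python
import math

def euclidean(x, y):
    """Squared differences: one large difference dominates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def bray_curtis(x, y):
    """Sum of absolute differences / total abundance in both plots (0 to 1)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))

def jaccard(x, y):
    """Presence-absence: 1 - (shared species / species present in either plot)."""
    in_x = {i for i, a in enumerate(x) if a > 0}
    in_y = {i for i, b in enumerate(y) if b > 0}
    return 1 - len(in_x & in_y) / len(in_x | in_y)

# Hypothetical abundances of three species; no two plots share a species
plot1 = [1, 0, 0]
plot2 = [0, 1, 0]
plot3 = [0, 0, 10]

for label, other in [("plot1 vs plot2", plot2), ("plot1 vs plot3", plot3)]:
    print(label,
          f"Euclidean={euclidean(plot1, other):.2f}",
          f"Bray-Curtis={bray_curtis(plot1, other):.2f}",
          f"Jaccard={jaccard(plot1, other):.2f}")
```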
Dissimilarity metrics are used to look at differences between communities
• Euclidean distances are good for many types of environmental data but are not well suited to species abundances
Knapp et al. in prep
Ordinations
• Basically, ordinations plot the communities based on all response variables (e.g., species responses) and then squish this into 2 or 3 dimensions
• Example 1: 2 species, 2 axes
[Figure: Plots 1-3 positioned on a Species A axis and a Species B axis (labels Sp.A, Sp.B)]
Ordinations
• Plots the communities based on the response variables and then squishes this into 2 or 3 dimensions
• Example 2: 3 species, 3 axes
• Etc., up to n response variables
• We can’t visualize this well beyond 3 axes, but the analysis works the same way in n dimensions
[Figure: Plots 1-4 positioned on Species A, Species B, and Species C axes]
Ordinations
• Example 3: 3 species, 2 axes
[Figure: Plots 1-4 from the three-species space (Species A, B, C) shown on ordination Axis 1 and Axis 2, with Sp.A, Sp.B, and Sp.C labeled]
Ordinations
• Analyzes communities based on all response variables
  • 2 species, 2 axes
  • 3 species, 3 axes
  • Etc., up to n species, n axes
• Need to squash n dimensions into 2
• Ordination rotates the axes to minimize the distance of points from the primary axes and maximize the variance explained by those axes
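For two species the rotation can be written out directly. The Python sketch below is PCA on the covariance matrix of two hypothetical species: it finds the axis capturing the most variance and a perpendicular second axis. It only makes "rotating the axes" concrete; real analyses would use a library routine (e.g., `prcomp()` in R, or `rda()` without constraints in vegan).

```python
import math

def pca_2d(xs, ys):
    """Rotate two-species data onto principal axes; returns scores and variance explained."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cx = [x - mx for x in xs]   # center each species
    cy = [y - my for y in ys]
    # 2 x 2 covariance matrix [[a, b], [b, c]]
    a = sum(u * u for u in cx) / (n - 1)
    c = sum(v * v for v in cy) / (n - 1)
    b = sum(u * v for u, v in zip(cx, cy)) / (n - 1)
    # eigenvalues of a symmetric 2 x 2 matrix
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # unit eigenvector for the largest eigenvalue = direction of axis 1
    if b != 0:
        vx, vy = b, lam1 - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # axis 2 is perpendicular to axis 1, so this is a pure rotation
    axis1 = [u * vx + v * vy for u, v in zip(cx, cy)]
    axis2 = [-u * vy + v * vx for u, v in zip(cx, cy)]
    explained = lam1 / (lam1 + lam2)
    return axis1, axis2, explained

# Hypothetical abundances of two species in four plots, perfectly correlated
species_a = [1, 2, 3, 4]
species_b = [2, 4, 6, 8]
axis1, axis2, explained = pca_2d(species_a, species_b)
print(f"Axis 1 explains {explained:.0%} of the variance")
```

Because the two species move together, all of the variation lies along axis 1 and the plot scores on axis 2 are zero.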
Constrained vs unconstrained ordinations
• Constrained ordination makes the data fit measured variables
  • This is limiting because you can only relate species differences to things you measured
  • However, this is beneficial if you are interested in only a couple of environmental variables
• Unconstrained ordination tries to represent the variability of the data even if there are no variables to explain that variation
  • For example, if different temperatures in two areas altered the communities but temperature was not included in the model, you would still be able to detect differences in community structure
  • Better for exploratory analyses
Types of unconstrained ordinations
• Principal components analysis (PCA)
  • Uses Euclidean distances to map plots onto the 2 or 3 axes that explain the majority of the variation
  • Use with environmental data
  • Be sure to standardize response variables if they are on different scales
• Principal coordinates analysis (PCO; Gower 1966)
  • Acts like PCA but uses a dissimilarity matrix instead of working directly from the raw data
  • This is more like plotting a close-fitting trendline instead of the actual data
  • Fits the line by maximizing a linear correlation; this can be problematic
  • Is sometimes called metric multidimensional scaling (MDS)… not to be confused with NMDS
• Non-metric multidimensional scaling (NMDS); PRIMER calls this MDS!!! Ugh.
  • Very complicated… In the past, the drawback of this technique was the large amount of computing power necessary; this is no longer an issue
  • Preserves the rank order of relationships while plotting more similar local communities closer together in 2D or 3D space; this solves the linear problem
  • Axes aren’t constrained by distances (e.g., Euclidean), so this method is more flexible
Types of unconstrained ordinations
• Non-metric multidimensional scaling (NMDS)
  • Stress = mismatch between the rank order of distances in the data and in the ordination
    • Excellent: stress < 0.05
    • Good: stress < 0.1
    • Acceptable: 0.1 < stress < 0.2
    • On the edge: 0.2 < stress < 0.3
    • Unacceptable: stress > 0.3
  • To cope with high stress, increase the number of dimensions of your ordination (if possible)
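Stress is commonly computed as Kruskal’s stress-1: sort plot pairs by their observed dissimilarity, fit the ordination distances d with a non-decreasing (monotone) regression to get disparities d̂, and report sqrt(Σ(d − d̂)² / Σd²). The Python sketch below assumes the ordination distances are already available (it does not run the NMDS optimization itself) and ignores tied dissimilarities; `pava` and `kruskal_stress1` are hypothetical helper names.

```python
import math

def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to the sequence y."""
    blocks = []  # each block holds [sum, count]
    for v in y:
        blocks.append([v, 1])
        # merge while the previous block's mean exceeds the current block's mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def kruskal_stress1(dissim, ord_dist):
    """Stress-1 for observed dissimilarities and ordination distances (same pair order)."""
    # NMDS only uses the rank order of the observed dissimilarities
    order = sorted(range(len(dissim)), key=lambda k: dissim[k])
    d_sorted = [ord_dist[k] for k in order]
    dhat_sorted = pava(d_sorted)  # disparities: monotone fit of ordination distances
    num = sum((d - dh) ** 2 for d, dh in zip(d_sorted, dhat_sorted))
    den = sum(d ** 2 for d in d_sorted)
    return math.sqrt(num / den)

observed = [0.1, 0.2, 0.3, 0.4]                  # hypothetical dissimilarities
print(kruskal_stress1(observed, [1, 2, 3, 4]))  # 0.0: rank order fully preserved
print(kruskal_stress1(observed, [2, 1, 3, 4]))  # about 0.13: first two pairs swapped
```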
Types of constrained ordinations
• Constrained analysis of proximities (CAP)
  • You can plug any dissimilarity matrix into this
  • Performs linear mapping
• Redundancy analysis (RDA)
  • Constrained version of PCA
• Constrained correspondence analysis (CCA)
  • Based on chi-squared distances
  • Weighted linear mapping
Incorporating environmental data into ordination
• Can overlay vectors of environmental data on top of community data
  • Vectors supply information about the direction and strength of environmental variables
  • Easy to interpret the effects of many variables
  • However, this assumes all relationships are linear, which might not be the case…
Oksanen 2013
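Vector overlays are typically fit by regressing an environmental variable on the plot scores of the ordination axes: the fitted direction gives the arrow and R² its strength. The Python sketch below does this for two axes with ordinary least squares. It is in the spirit of vegan’s `envfit()` but is not a reimplementation of it; `fit_vector` and the toy scores are hypothetical, and like the overlay itself it assumes a linear relationship.

```python
import math

def fit_vector(axis1, axis2, env):
    """Regress env on two ordination axes; return the unit arrow direction and R^2."""
    n = len(env)
    m1, m2, me = sum(axis1) / n, sum(axis2) / n, sum(env) / n
    x1 = [a - m1 for a in axis1]
    x2 = [a - m2 for a in axis2]
    y = [e - me for e in env]
    # normal equations for y = b1*x1 + b2*x2 (intercept removed by centering)
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    # R^2: share of the variable's variance explained by the fitted plane
    fitted = [b1 * a + b2 * b for a, b in zip(x1, x2)]
    ss_tot = sum(c * c for c in y)
    ss_res = sum((c - f) ** 2 for c, f in zip(y, fitted))
    r2 = 1 - ss_res / ss_tot
    length = math.hypot(b1, b2)
    return (b1 / length, b2 / length), r2

# Hypothetical plot scores on two axes and a soil-moisture value per plot
axis1 = [0, 1, 2, 3]
axis2 = [1, 0, 1, 0]
moisture = [0, 2, 4, 6]   # increases exactly along axis 1
direction, r2 = fit_vector(axis1, axis2, moisture)
print(direction, r2)
```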
Incorporating environmental data into ordination
• Can overlay surfaces of environmental data on top of community data
  • Surfaces provide more detailed information about how communities are positioned along abiotic gradients
  • More difficult to interpret with more than a couple of variables
  • Using treatments is a special case of this
Oksanen 2013
Ordination by itself is not a robust statistical test
• Although ordination is great for visualizing your data, we need to back it up
  • One way is to calculate confidence ellipses around the centroid of each group
  • Another way is to use resemblance-based permutation methods, which give P values
For discussion of how to do this in R, see: http://stats.stackexchange.com/questions/34017/confidence-intervals-around-a-centroid-with-modified-gower-similarity
Resemblance-based permutation methods
• One benefit of these techniques is that they compare the full n-dimensional data instead of ordination data squished into 2 or 3 dimensions
• Many assumptions of regular MANOVA are violated by ecological community data (see Clarke 1993), which spurred the creation of new methods for analyzing multivariate data
• 3 widely used methods:
  • Permutational MANOVA (PERMANOVA)
  • Analysis of similarities (ANOSIM)
  • Mantel’s test
• One assumption of all three tests is equal variance among treatments…
  • This is a problem, but we’ll come back to this
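ANOSIM – Clarke 1993
• Ranks the dissimilarities among local communities from 1 to the number of comparisons made
• Then looks at the averages of ranked dissimilarities within and among groups
• Compares the observed statistic (R) with values of R from random permutations to get a p-value
(Originally from Clarke 1993 and reviewed in Anderson and Walsh 2013)
$$R = \frac{\bar{r}_B - \bar{r}_W}{M/2}, \qquad M = \frac{n(n-1)}{2}$$
where $\bar{r}_B$ is the mean dissimilarity rank of plot pairs between groups, $\bar{r}_W$ is the mean dissimilarity rank of plot pairs within a group, n is the number of plots, and M is the number of plot pairs. Each pair (i, j) is classified with an indicator that equals 1 if i and j are in the same group and 0 if they are in different groups. R is then used to calculate the P value.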
ANOSIM – Clarke 1993
Essentially, during each permutation, the plot labels in the dissimilarity matrix are shuffled and an R value is calculated. Over many permutations, this creates a null distribution of R against which the original R is compared; the p-value is the proportion of permuted R values that are greater than or equal to the original R.
[Figure: density of permuted R values, with the actual R marked to calculate the p value]
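A minimal Python sketch of ANOSIM: average-rank the pairwise dissimilarities (tied values share a rank), compute Clarke’s R = (r̄_B − r̄_W)/(M/2), and build the null distribution by shuffling group labels. It assumes a full square dissimilarity matrix; the permutation count, seed, function names, and toy matrix are arbitrary illustrative choices.

```python
import random

def rank_average(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def anosim_r(dist, groups):
    """Clarke's R from a full square dissimilarity matrix and group labels."""
    n = len(groups)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    ranks = rank_average([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    m = len(pairs)  # M = n(n-1)/2
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)

def anosim(dist, groups, permutations=999, seed=1):
    """Observed R and a permutation p-value (group labels shuffled each time)."""
    rng = random.Random(seed)
    r_obs = anosim_r(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if anosim_r(dist, labels) >= r_obs:
            hits += 1
    # the observed labelling counts as one of the permutations
    return r_obs, (hits + 1) / (permutations + 1)

# Hypothetical dissimilarities: plots 0-1 and plots 2-3 form tight groups
dist = [
    [0.0, 0.1, 0.5, 0.6],
    [0.1, 0.0, 0.7, 0.8],
    [0.5, 0.7, 0.0, 0.2],
    [0.6, 0.8, 0.2, 0.0],
]
groups = ["Z", "Z", "Y", "Y"]
r, p = anosim(dist, groups)
print(r, p)
```

Here every within-group dissimilarity ranks below every between-group one, so R = 1. With only four plots there are few distinct labellings, so the p-value cannot get small; real tests need more replicates.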
Mantel test
• Doesn’t use ranks
• To compare groups, it uses one dissimilarity matrix and one model matrix that designates the contrasts, and compares dissimilarities within and among groups
• The p value is calculated as the proportion of z(0,1) (within-group dissimilarities) that is lower than or equal to z(1,0) (between-group dissimilarities)
Dissimilarity matrix:
        plot1     plot2     plot3     plot4     plot5     plot6     plot7     plot8     plot9
plot2   0.119391
plot3   0.105672  0.129548
plot4   0.062924  0.090572  0.092943
plot5   0.111247  0.137485  0.140632  0.100948
plot6   0.135112  0.112754  0.150243  0.147331  0.168915
plot7   0.173912  0.137066  0.157281  0.159704  0.138085  0.131513
plot8   0.144694  0.158255  0.098638  0.12356   0.131775  0.14664   0.125334
plot9   0.145805  0.135068  0.121951  0.106257  0.184396  0.141295  0.167902  0.112891
plot10  0.130464  0.119984  0.113435  0.088728  0.139998  0.13052   0.142464  0.108512  0.122446
Model matrix: Group Z (plots 1-5) and Group Y (plots 6-10); 1 = same group, 0 = different groups
        plot1     plot2     plot3     plot4     plot5     plot6     plot7     plot8     plot9
plot2   1
plot3   1         1
plot4   1         1         1
plot5   1         1         1         1
plot6   0         0         0         0         0
plot7   0         0         0         0         0         1
plot8   0         0         0         0         0         1         1
plot9   0         0         0         0         0         1         1         1
plot10  0         0         0         0         0         1         1         1         1
(See Legendre & Legendre 2012 for more detail)
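A Python sketch of the group-contrast Mantel test, assuming the statistic is the Pearson correlation between the lower-triangle dissimilarities and the 0/1 model matrix, with the p-value obtained by permuting the plot labels of the dissimilarity matrix. Because the model codes same-group pairs as 1, a group effect (small within-group dissimilarities) shows up as a negative correlation, so this sketch counts permutations with r at or below the observed value. Function names and the toy matrices are hypothetical.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lower_triangle(mat, order):
    """Pairwise entries (i < j) after relabelling rows and columns by `order`."""
    n = len(order)
    return [mat[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def mantel(dist, model, permutations=999, seed=1):
    """Mantel r between two square matrices; one-sided p-value from label permutations."""
    rng = random.Random(seed)
    n = len(dist)
    identity = list(range(n))
    y = lower_triangle(model, identity)
    r_obs = pearson(lower_triangle(dist, identity), y)
    perm = list(identity)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(perm)
        # with a 1 = same-group model, a group effect gives a strongly negative r
        if pearson(lower_triangle(dist, perm), y) <= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)

# Hypothetical dissimilarities (plots 0-1 and 2-3 are similar) and model matrix
dist = [
    [0.0, 0.1, 0.5, 0.6],
    [0.1, 0.0, 0.7, 0.8],
    [0.5, 0.7, 0.0, 0.2],
    [0.6, 0.8, 0.2, 0.0],
]
model = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
r, p = mantel(dist, model)
print(r, p)
```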
PERMANOVA
• Calculates a pseudo-F statistic
  • Pseudo-F is identical to the usual F statistic if there is only one response variable (with Euclidean distance)
• This pseudo-F is calculated from the original data and compared with a distribution of pseudo-F statistics from many random permutations; this step is the same as in ANOSIM
(See Anderson 2001, 2005 for more detail)
[Figure: density of pseudo-F values from permutations, with the observed pseudo-F marked]
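A one-way PERMANOVA sketch in Python using Anderson’s (2001) partitioning of squared dissimilarities: SS_T = (1/N) Σ d² over all pairs, SS_W sums the same quantity within each group divided by that group’s size, SS_A = SS_T − SS_W, and pseudo-F = [SS_A/(a − 1)] / [SS_W/(N − a)]. The usage example checks the claim that pseudo-F equals the ordinary F for one response variable: for these hypothetical values the one-way ANOVA F is 37.5. Function names, seed, and data are illustrative.

```python
import random

def pseudo_f(dist, groups):
    """One-way PERMANOVA pseudo-F from a square dissimilarity matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # total sum of squares: squared dissimilarities over all pairs, divided by N
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # within-group sum of squares: same idea inside each group, divided by group size
    ss_within = 0.0
    for g in labels:
        members = [i for i in range(n) if groups[i] == g]
        ss = sum(dist[i][j] ** 2 for k, i in enumerate(members) for j in members[k + 1:])
        ss_within += ss / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, permutations=999, seed=1):
    """Observed pseudo-F and a permutation p-value (labels shuffled each time)."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if pseudo_f(dist, labels) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1)

# One hypothetical response variable; Euclidean distance is |a - b|
values = [1, 2, 3, 6, 7, 8]
groups = ["ctrl", "ctrl", "ctrl", "trt", "trt", "trt"]
dist = [[abs(a - b) for b in values] for a in values]
f, p = permanova(dist, groups)
print(f, p)
```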
Choosing a method
• A major assumption of all three methods is equal variance among groups
  • This is often violated in real-world communities
  • In fact, this change in variance (i.e., dispersion or convergence among replicates, or beta diversity) is often of interest to ecologists
• So… how do we deal with this?
Anderson and Walsh 2013
Choosing a method
Anderson and Walsh 2013
PERMDISP
• Permutational analysis of multivariate dispersions (Anderson 2004)
• Compares multivariate dispersion among groups
• Uses any distance or dissimilarity measure you feed into it
• 2 main reasons to use this:
  1. To look for violations of assumptions in tests of centroid location (although, as discussed above, this may not be as big a deal as once thought)
  2. Variance among local communities within a treatment may be of ecological interest (for more about using community dissimilarity methods to estimate beta diversity, see Legendre & Cáceres 2013)
Chase 2007
Anderson 2004
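For Euclidean data the core of PERMDISP is easy to sketch in Python: compute each plot’s distance to its own group centroid, then compare those distances among groups with an F statistic. This simplified version assumes raw Euclidean coordinates (PERMDISP itself handles any dissimilarity by first placing plots in principal coordinate space) and leaves out the permutation step for the p-value; the data and function names are hypothetical.

```python
import math

def centroid(points):
    k = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(k)]

def distances_to_centroid(data, groups):
    """Euclidean distance of every plot to its own group's centroid."""
    cents = {g: centroid([row for row, h in zip(data, groups) if h == g]) for g in set(groups)}
    return [math.dist(row, cents[g]) for row, g in zip(data, groups)]

def anova_f(values, groups):
    """Ordinary one-way ANOVA F, applied here to the distances to centroid."""
    n = len(values)
    labels = sorted(set(groups))
    grand = sum(values) / n
    ss_among = ss_within = 0.0
    for g in labels:
        v = [x for x, h in zip(values, groups) if h == g]
        m = sum(v) / len(v)
        ss_among += len(v) * (m - grand) ** 2
        ss_within += sum((x - m) ** 2 for x in v)
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))

# Hypothetical two-species coordinates: one tight group, one spread-out group
data = [[0, 0], [0, 1], [1, 0], [1, 2],
        [0, 0], [0, 4], [4, 0], [4, 8]]
groups = ["tight"] * 4 + ["spread"] * 4
d = distances_to_centroid(data, groups)
print(d, anova_f(d, groups))
```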
SIMPER
• Similarity percentages of component species or functional groups
• A Bray-Curtis dissimilarity matrix is implicit in a SIMPER analysis
  • You can force it to use a Euclidean distance matrix in PRIMER
  • I have not seen evidence for or against this practice…
• Use this to find out which variables are responsible for observed shifts in multivariate space
Knapp et al. in prep
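SIMPER’s core calculation can be sketched in Python: for every pair of plots drawn from two different groups, split the Bray-Curtis dissimilarity into one term per species, average those terms over all pairs, and express each as a percentage of the average dissimilarity. This sketch omits the standard deviations and ratios that PRIMER and vegan’s `simper()` also report; the names and numbers are hypothetical.

```python
def simper(group_a, group_b, species):
    """Average contribution (%) of each species to between-group Bray-Curtis dissimilarity."""
    totals = {s: 0.0 for s in species}
    n_pairs = 0
    for x in group_a:
        for y in group_b:
            denom = sum(x) + sum(y)
            for s, xi, yi in zip(species, x, y):
                # this species' term of Bray-Curtis: |x_i - y_i| / sum(x + y)
                totals[s] += abs(xi - yi) / denom
            n_pairs += 1
    avg = {s: t / n_pairs for s, t in totals.items()}
    overall = sum(avg.values())  # average between-group Bray-Curtis dissimilarity
    pct = {s: 100 * v / overall for s, v in avg.items()}
    return overall, sorted(pct.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical abundances of three species in two plots per group
group_a = [[10, 0, 5], [8, 1, 5]]
group_b = [[0, 10, 5], [1, 9, 6]]
overall, contributions = simper(group_a, group_b, ["sp1", "sp2", "sp3"])
print(round(overall, 3), contributions)
```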
References
• Anderson, Marti J., and Daniel C. I. Walsh. "PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: What null hypothesis are you testing?" Ecological Monographs 83.4 (2013): 557-574.
• Anderson, Marti J. "PERMDISP: a FORTRAN computer program for permutational analysis of multivariate dispersions (for any two-factor ANOVA design) using permutation tests." Department of Statistics, University of Auckland, New Zealand (2004).
• Anderson, Marti J. "Permutational multivariate analysis of variance." Department of Statistics, University of Auckland, Auckland (2005).
• Chase, Jonathan M. "Drought mediates the importance of stochastic community assembly." Proceedings of the National Academy of Sciences 104.44 (2007): 17430-17434.
• Clarke, K. R. "Non-parametric multivariate analyses of changes in community structure." Australian Journal of Ecology 18.1 (1993): 117-143.
• Gower, John C. "Some distance properties of latent root and vector methods used in multivariate analysis." Biometrika 53.3-4 (1966): 325-338.
• Legendre, Pierre, and Miquel Cáceres. "Beta diversity as the variance of community data: dissimilarity coefficients and partitioning." Ecology Letters 16.8 (2013): 951-963.
• Legendre, Pierre, and Louis Legendre. Numerical Ecology. Vol. 20. Elsevier, 2012.
• Oksanen, Jari. "Multivariate analysis of ecological communities in R: vegan tutorial." R package version 2.0 (2011).
• Shannon, Claude E., and Warren Weaver. The Mathematical Theory of Communication (1949).
• Simpson, Edward H. "Measurement of diversity." Nature (1949).
Interactions between climate and plant community structure alter ecosystem sensitivity and thus ecosystem function
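[Figure: conceptual diagram linking precipitation regimes, ecosystem sensitivity, and ecosystem function and services; later versions add soil moisture dynamics, climate regimes, species responses, and community composition]
1. Direct impacts of precipitation regimes are based on ecosystem sensitivity
   • Sensitivity = absolute change in productivity per unit change in precipitation
IPCC 2007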
2. Precipitation regimes may alter ecosystem sensitivity through changes in soil moisture dynamics
3. Individual species responses to long-term shifts in climate regimes are a potential mechanism that may structure communities
4. Community composition can directly affect ecosystem services through dominance or diversity effects, or indirectly by altering ecosystem sensitivity to precipitation regimes
Overarching question…
• Do interactions between precipitation drivers, plant community structure, and ecosystem sensitivity alter the effects of precipitation regimes on ecosystem function?
Shifts in ecosystem function across space and time
Huxman et al. 2004; Sala et al. 1988
A. Changes in overall soil moisture cause a change in the intercept of the productivity-precipitation relationship
B. Different drought sensitivities of component species within a community control the slope and intercept of the relationship by altering ecosystem responses in dry years
C. Growth limitations (e.g., maximum growth rates, co-limitation by other resources such as N) of component species in wet years determine the slope and intercept
[Figure: ecosystem function vs precipitation, spanning dry years to wet years, with regions A, B, and C marked]
Experimental designs
• ANPP (aboveground net primary productivity) data from 2 long-term data sets and linked precipitation data
  1. Irrigation transect: relieves water limitation throughout the growing season (1991-2011)
  2. Uplands vs lowlands: annually burned, ungrazed watershed (1984-2011)
• Looked at slopes between growing season rainfall and ANPP to assess sensitivity in control and manipulated plots
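Sensitivity as defined in the talk (absolute change in productivity per unit change in precipitation) is the slope of an ordinary least-squares fit of ANPP on growing-season precipitation. A minimal Python sketch, with hypothetical numbers standing in for the long-term records (they are not results from these experiments):

```python
def sensitivity_slope(precip_mm, anpp_g_m2):
    """OLS slope (g/m^2 of ANPP per mm of rain) and intercept."""
    n = len(precip_mm)
    mx = sum(precip_mm) / n
    my = sum(anpp_g_m2) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(precip_mm, anpp_g_m2))
    sxx = sum((x - mx) ** 2 for x in precip_mm)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical growing-season precipitation (mm) and ANPP (g/m^2)
precip = [300, 400, 500, 600]
control_anpp = [250, 300, 350, 400]
manipulated_anpp = [420, 440, 460, 480]

print("control:", sensitivity_slope(precip, control_anpp))
print("manipulated:", sensitivity_slope(precip, manipulated_anpp))
```

In this invented example the manipulated plots have a higher intercept but a shallower slope, i.e., lower sensitivity; testing whether two slopes really differ would need an interaction term in a regression model.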
I) Chronic reduction of soil water availability should cause increased sensitivity due to a reduction in the overall productivity of the system (A; i.e., lowered intercept), while the slope and intercept are altered by the resident plant community. The capacity for growth in wet years (C) should be similar due to unchanged growth potential and lack of limiting nutrients, but the negative response to drought should be increased (B) due to the reduction of soil water stores that buffer against drought.
II) Increased soil water availability should decrease sensitivity by increasing the overall productivity of the system (A; i.e., increased intercept), while limitations on cumulative growth rates of the extant plant community should reduce the productivity response in wet years (C), thus reducing the sensitivity of the system to precipitation inputs (i.e., slope).
[Figure: ecosystem function vs precipitation for I) reduced soil water capacity (regions A, B, C) and II) increased soil water availability (regions A, C)]
Ecosystem response to altered precipitation regimes / soil conditions
[Figure: ANPP (g/m²) vs growing season precipitation (mm) for the soil depth and irrigation comparisons, asking "Sensitivity shifts?"; panel annotations include *, **, and n.s.]
How is community structure modifying these relationships?
Smith 2009
Uplands vs lowlands
Species                   Contribution to divergence (%)
Panicum virgatum          21.68
Schizachyrium scoparium   16.76
Uplands vs lowlands
Predictions based on abiotic forcings
• Reduction in dry years because of limited soil water storage to buffer plants during periods of drought
Predictions when incorporating biotic forcings
• Water limitation is not an important factor in wet years
• Growth rate limitations of extant species limit production in wet years
• Decreased drought sensitivity of extant species limits production loss in dry years
Chronic irrigation and community shifts
• After 10 years, reordering of the community occurred
• Panicum virgatum replaced Andropogon gerardii as the dominant species in 2001
• We decided to look at ecosystem sensitivity before and after this community shift to test some of our predictions
Collins et al. 2012
Chronic irrigation
Predictions based on abiotic forcings
• Growth rate or other resources limit production in wet years
Predictions when incorporating biotic shifts
• As species that do not have growth rate limitations take over, productivity responses in wet years should increase
Community change over time and altered sensitivity
[Figure: ordination (Axis 1 vs Axis 2) of community composition from 1991 to 2011, with 2011 marked *]
Conclusions and implications
1. Sensitivity of ecosystems to climate drivers may be altered under future precipitation regimes
2. Additionally, community shifts driven by these altered precipitation regimes may cause a change in ecosystem sensitivity
3. Short-term experiments may not detect these community-driven sensitivity changes