Multivariate statistics for use in ecological studies

Kevin Wilcox
ECOL 600 – Community Ecology
Spring 2014

Useful web resources
• Vegan tutorial: http://cc.oulu.fi/~jarioksa/opetus/metodi/vegantutor.pdf
• The Little Book of R for Multivariate Analysis: http://little-book-of-r-for-multivariate-analysis.readthedocs.org/en/latest/src/multivariateanalysis.html#means-and-variances-per-group
• Ordination Methods by Michael Palmer: http://ordination.okstate.edu/overview.htm#Nonmetric_Multidimensional_Scaling
• Community analyses lectures by Jari Oksanen: http://cc.oulu.fi/~jarioksa/opetus/metodi/


Univariate statistics to measure community dynamics
• Richness (R or S; either local or regional)
• Shannon index (H’; Shannon & Weaver 1949)
  • Incorporates richness as well as relative abundances into a single metric
  • Emphasizes richness
• Simpson’s index (D or λ; Simpson 1949)
  • Emphasizes evenness
• Pielou’s evenness index (J’ = H’ / ln S)

Univariate indices (cont.)

Species   low.light   mid.light   high.light
A         0.759876    0.383737    0.083528
B         0.620718    0.156324    0.152468
C         0.245997    0.524531    0.185889
D         0.33769     0.571457    0.52394
E         0.207586    0.281312    0.545748
F         0.143989    0.289351    0.561022

Metric    low.light   mid.light   high.light
Richness  6           6           6
H’        1.63        1.71        1.60
D         0.78        0.81        0.77
J’        0.91        0.96        0.89
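The metrics in the table above can be recomputed from the abundance table with a few lines of code. The sketch below is one way to do it in plain Python, assuming natural logarithms for H’ and the 1 − Σp² form of Simpson’s index (the form that matches the D values in the table); the function and variable names are illustrative only.

```python
import math

# Abundances from the table above (species A-F at each light level).
abundances = {
    "low.light":  [0.759876, 0.620718, 0.245997, 0.33769, 0.207586, 0.143989],
    "mid.light":  [0.383737, 0.156324, 0.524531, 0.571457, 0.281312, 0.289351],
    "high.light": [0.083528, 0.152468, 0.185889, 0.52394, 0.545748, 0.561022],
}

def diversity_indices(counts):
    """Return richness S, Shannon H', Simpson 1 - sum(p^2), and Pielou J'."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    props = [c / total for c in present]          # relative abundances
    richness = len(present)
    shannon = -sum(p * math.log(p) for p in props)
    simpson = 1.0 - sum(p * p for p in props)
    pielou = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, simpson, pielou

results = {site: diversity_indices(vals) for site, vals in abundances.items()}
for site, (s, h, d, j) in results.items():
    print(f"{site}: S={s} H'={h:.2f} D={d:.2f} J'={j:.2f}")
```

All three light levels give nearly the same index values even though the dominant species differ, which is the limitation the following slides address.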

• Species A and B are dominant in low light
• All species do OK with moderate light
• Species E and F are dominant in high light

The indices give no information about individual species responses

You could look at each species individually
• Lacks clarity with many species
• When looking solely at individual responses, you lose information about entire community dynamics

[Figure: abundance of each species plotted against light]

OR…

You could use multivariate statistics
• 3 parts:
  1. Dissimilarity matrices
  2. Ordinations
  3. Statistical tests of differences between or among communities

             low.light   mid.light   high.light
low.light    0           0.347573    0.491285
mid.light    0.347573    0           0.287918
high.light   0.491285    0.287918    0
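The values in this matrix are consistent with Bray-Curtis dissimilarities calculated from the light-level abundance table shown earlier. A minimal sketch in plain Python, assuming the usual formula Σ|x − y| / Σ(x + y); the names used here are illustrative.

```python
# Abundances of species A-F at each light level (from the earlier table).
sites = {
    "low.light":  [0.759876, 0.620718, 0.245997, 0.33769, 0.207586, 0.143989],
    "mid.light":  [0.383737, 0.156324, 0.524531, 0.571457, 0.281312, 0.289351],
    "high.light": [0.083528, 0.152468, 0.185889, 0.52394, 0.545748, 0.561022],
}

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator

names = list(sites)
# Full symmetric matrix stored as a dictionary keyed by (row, column).
matrix = {(a, b): bray_curtis(sites[a], sites[b]) for a in names for b in names}

for a in names:
    print(a, " ".join(f"{matrix[(a, b)]:.6f}" for b in names))
```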

[Figure: ordination plot of the three light treatments; axes labelled MDS1]

             low.light   mid.light
mid.light    0.347573
high.light   0.491285    0.287918

You could use multivariate statistics
• 3 parts:
  1. Dissimilarity metrics and matrices
  2. Ordination
  3. Statistical tests of differences between or among communities
• Software:
  • R
  • SAS
  • SPSS
  • PRIMER with PERMANOVA+
  • PC-ORD

Dissimilarity metrics are the building blocks used in many multivariate statistics
• Visual representation (ordination)
• Statistical tests
• Think carefully about which type of matrix or dissimilarity metric you should use

P < 0.05

Dissimilarity matrices… brace yourself
• A dissimilarity matrix is simply a table that compares all local communities (plots) with each other. The higher the number, the more dissimilar the communities are.

Plot-by-species data:

sp1 sp2 sp3 sp4 sp5 sp6 sp7 sp8 sp9 sp10

plot1 0.662205 0.35045 0.778512 0.459916 0.552845 0.570177 0.458688 0.569126 0.536608 0.647962

plot2 0.563042 0.348089 0.478276 0.642296 0.494129 0.505471 0.380392 0.449799 0.658523 0.418476

plot3 0.718789 0.452508 0.558081 0.58513 0.58881 0.503074 0.685387 0.584184 0.337696 0.520809

plot4 0.635745 0.374085 0.501664 0.397816 0.543634 0.570857 0.430936 0.594428 0.492054 0.477197

plot5 0.407958 0.2844 0.619615 0.49797 0.268607 0.540357 0.439914 0.6602 0.465481 0.515861

plot6 0.442134 0.542096 0.546438 0.575878 0.557583 0.314215 0.363373 0.427713 0.61189 0.765725

plot7 0.401636 0.725872 0.316728 0.573688 0.329604 0.463 0.516889 0.499663 0.579506 0.530339

plot8 0.353048 0.653721 0.589779 0.481271 0.549743 0.445126 0.681987 0.598617 0.443214 0.386528

plot9 0.54211 0.46642 0.495843 0.335123 0.695228 0.315519 0.610575 0.516901 0.525599 0.377469

plot10 0.458872 0.573375 0.485013 0.452928 0.604478 0.636183 0.398031 0.508075 0.342392 0.440215

Dissimilarity matrix among plots (lower triangle):

plot1 plot2 plot3 plot4 plot5 plot6 plot7 plot8 plot9

plot2 0.119391

plot3 0.105672 0.129548

plot4 0.062924 0.090572 0.092943

plot5 0.111247 0.137485 0.140632 0.100948

plot6 0.135112 0.112754 0.150243 0.147331 0.168915

plot7 0.173912 0.137066 0.157281 0.159704 0.138085 0.131513

plot8 0.144694 0.158255 0.098638 0.12356 0.131775 0.14664 0.125334

plot9 0.145805 0.135068 0.121951 0.106257 0.184396 0.141295 0.167902 0.112891

plot10 0.130464 0.119984 0.113435 0.088728 0.139998 0.13052 0.142464 0.108512 0.122446

Types of dissimilarity metrics
• Euclidean distance (ED)
  • Operates in species space, meaning that each species (or dependent variable) gets its own orthogonal axis in multidimensional space
  • Because the differences are squared, single large differences become very important when determining dissimilarities
  • Dissimilarities between pairs of plots with no shared species are not necessarily the same
  • This is why ED is usually used for environmental data and not abundance data
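The point about plots with no shared species is easy to see with a toy example. The sketch below uses made-up abundances (the four plots are hypothetical, not data from this lecture): both pairs share no species, yet their Euclidean distances differ by a factor of ten.

```python
import math

# Hypothetical abundances of four species (columns) in four plots.
# Plots 1 and 2 share no species; neither do plots 3 and 4.
plot1 = [1.0, 0.0, 0.0, 0.0]
plot2 = [0.0, 1.0, 0.0, 0.0]
plot3 = [10.0, 0.0, 0.0, 0.0]
plot4 = [0.0, 10.0, 0.0, 0.0]

def euclidean(x, y):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

d_sparse = euclidean(plot1, plot2)   # low-abundance pair
d_dense = euclidean(plot3, plot4)    # high-abundance pair
print(f"plot1 vs plot2: {d_sparse:.3f}")
print(f"plot3 vs plot4: {d_dense:.3f}")
```

Both pairs are completely different in composition, but Euclidean distance ranks the high-abundance pair as far more dissimilar.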



Types of dissimilarity metrics
• Manhattan-type distances
  • Bray-Curtis (abundance data)
  • Jaccard (presence-absence data)
  • Use sums of absolute differences instead of squared terms, making them less sensitive to single large differences
  • Reach a maximum dissimilarity of 1 when there are no shared species between communities
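As a rough illustration of the maximum-of-1 property, the sketch below computes Jaccard and Bray-Curtis dissimilarities for hypothetical plots (made-up abundances, not lecture data). The Jaccard function converts abundances to presence-absence first; treating two empty plots as identical is an assumption of this sketch.

```python
def jaccard(x, y):
    """Jaccard dissimilarity on presence-absence: 1 - shared / total species."""
    present_x = {k for k, a in enumerate(x) if a > 0}
    present_y = {k for k, b in enumerate(y) if b > 0}
    union = present_x | present_y
    if not union:
        return 0.0  # assumption: two empty plots are treated as identical
    return 1.0 - len(present_x & present_y) / len(union)

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))

# Hypothetical plots: the first two pairs share no species, the third shares one.
no_overlap_small = ([1, 0, 0, 0], [0, 1, 0, 0])
no_overlap_large = ([10, 0, 0, 0], [0, 10, 0, 0])
partial_overlap = ([3, 2, 0, 0], [1, 0, 4, 0])

for label, (x, y) in [("no overlap, low abundance", no_overlap_small),
                      ("no overlap, high abundance", no_overlap_large),
                      ("one shared species", partial_overlap)]:
    print(f"{label}: Jaccard={jaccard(x, y):.3f} Bray-Curtis={bray_curtis(x, y):.3f}")
```

Unlike Euclidean distance, both metrics return exactly 1 for either no-overlap pair, regardless of total abundance.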



Dissimilarity metrics are used to look at differences between communities
• Euclidean distances are good for looking at many types of environmental data but are not great for species abundances.

Knapp et al. in prep

Ordinations

• Basically, ordinations plot the communities based on all response variables (e.g. species responses) and then squish this into 2 or 3 dimensions.
• Example 1: 2 species, 2 axes.

[Figure: Plots 1–3 positioned on two axes, Species A and Species B]

Ordinations

• Plot the communities based on the response variables and then squish this into 2 or 3 dimensions.
• Example 2: 3 species, 3 axes
• Etc. up to n response variables
• We can’t visualize this well after 3 axes, but it happens

[Figure: Plots 1–4 positioned on three axes, Species A, Species B, and Species C]

Ordinations

• Example 3: 3 species, 2 axes.

[Figure: Plots 1–4 and species vectors Sp.A, Sp.B, and Sp.C on ordination Axis 1 and Axis 2, alongside the same plots in three-dimensional species space (Species A, Species B, Species C)]

Ordinations

• Analyze communities based on all response variables and then reduce them to a few dimensions
  • 2 species, 2 axes
  • 3 species, 3 axes
  • Etc. up to n species, n axes
• Need to squash n dimensions into 2.
• Ordination rotates the axes to minimize the distance of the points from the primary axes and maximize the variance explained by those axes
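The axis rotation can be illustrated with the simplest possible case: two species and a PCA-style rotation (so this is a Euclidean-based sketch, not NMDS). The abundances below are hypothetical, and the closed-form rotation angle only applies to two variables; real analyses with many species use an eigen-decomposition instead.

```python
import math

# Hypothetical abundances of (Species A, Species B) in five plots.
plots = [(2.0, 1.0), (4.0, 3.5), (6.0, 4.0), (8.0, 7.5), (10.0, 9.0)]
n = len(plots)

# Center each species on its mean.
mean_a = sum(a for a, _ in plots) / n
mean_b = sum(b for _, b in plots) / n
centered = [(a - mean_a, b - mean_b) for a, b in plots]

# Sample variances and covariance in the original species space.
var_a = sum(a * a for a, _ in centered) / (n - 1)
var_b = sum(b * b for _, b in centered) / (n - 1)
cov_ab = sum(a * b for a, b in centered) / (n - 1)

# Rotation angle that maximizes the variance along the first new axis.
theta = 0.5 * math.atan2(2.0 * cov_ab, var_a - var_b)
cos_t, sin_t = math.cos(theta), math.sin(theta)

# Plot scores on the rotated axes.
scores = [(a * cos_t + b * sin_t, -a * sin_t + b * cos_t) for a, b in centered]

var_axis1 = sum(s1 * s1 for s1, _ in scores) / (n - 1)
var_axis2 = sum(s2 * s2 for _, s2 in scores) / (n - 1)
cov_axes = sum(s1 * s2 for s1, s2 in scores) / (n - 1)
explained = var_axis1 / (var_axis1 + var_axis2)

print(f"rotation angle: {math.degrees(theta):.1f} degrees")
print(f"variance explained by axis 1: {100 * explained:.1f}%")
```

After rotation the two axes are uncorrelated and the first axis carries most of the variation, so dropping the second axis loses little information.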

Constrained vs unconstrained ordinations
• Constrained ordination makes the data fit onto measured variables
  • This is limiting because you can only relate species differences to things you measured
  • However, this is beneficial if you are interested in only a couple of environmental variables
• Unconstrained ordination tries to represent the variability of the data even if there are no variables to explain the variation
  • For example, if different temperatures in two areas caused altered communities but temperature was not included in the model, you would still be able to detect differences in community structure
  • Better for exploratory analyses

Types of unconstrained ordinations

• Principal components analysis (PCA)
  • Uses Euclidean distances to map plots onto the 2 or 3 axes that explain the majority of variation
  • Use with environmental data
  • Be sure to standardize response variables if they are on different scales
• Principal coordinates analysis (PCO; Gower 1966)
  • Acts like PCA but uses a dissimilarity matrix instead of pulling straight from the data.
  • This is more like plotting a close-fitting trendline instead of the actual data.
  • Fits the line by maximizing a linear correlation – this can be problematic
  • Is sometimes called metric multidimensional scaling (MDS)… not to be confused with NMDS
• Non-metric multidimensional scaling (NMDS) - PRIMER calls this MDS!!! Ugh.
  • Very complicated. In the past, the drawback with this technique was the large amount of computing power necessary… this is no longer an issue.
  • Preserves rank order of relationships while plotting more similar local communities closer together in 2D or 3D space – this solves the linear problem
  • Axes aren’t constrained by distances (e.g. Euclidean) so this method is more flexible.

Types of unconstrained ordinations

• Non-metric multidimensional scaling (NMDS)
  • Stress = mismatch between rank orders of distances in the data and in the ordination
    • Excellent – stress < 0.05
    • Good – stress < 0.1
    • Acceptable – 0.1 < stress < 0.2
    • On the edge – 0.2 < stress < 0.3
    • Unacceptable – stress > 0.3
  • To cope with high stress… increase the number of dimensions of your ordination, if possible
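To make the idea of stress more concrete, here is a sketch of Kruskal's stress-1 formula for a given set of pairwise dissimilarities and the corresponding distances in an ordination. It assumes the common definition based on monotone (isotonic) regression; the input numbers are hypothetical, and how values falling exactly on a threshold are labelled is a choice made for this sketch.

```python
import math

def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to values."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge blocks while the previous block mean exceeds the current one.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def kruskal_stress(dissimilarities, ordination_distances):
    """Stress-1: mismatch between the rank order of the two sets of distances."""
    order = sorted(range(len(dissimilarities)), key=lambda k: dissimilarities[k])
    d = [ordination_distances[k] for k in order]
    d_hat = isotonic_fit(d)
    numerator = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    return math.sqrt(numerator / sum(a ** 2 for a in d))

def describe_stress(stress):
    """Rule-of-thumb labels from the slide (boundary handling is an assumption)."""
    if stress < 0.05:
        return "excellent"
    if stress < 0.1:
        return "good"
    if stress < 0.2:
        return "acceptable"
    if stress <= 0.3:
        return "on the edge"
    return "unacceptable"

# Hypothetical example: three plot pairs, one pair out of rank order.
stress = kruskal_stress([0.1, 0.2, 0.3], [1.0, 3.0, 2.0])
print(f"stress = {stress:.3f} ({describe_stress(stress)})")
```

Stress is zero whenever the ordination distances are in exactly the same rank order as the original dissimilarities, whatever their absolute values.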

Types of constrained ordinations

• Constrained analysis of proximities (CAP)
  • You can plug any dissimilarity matrix into this
  • Performs linear mapping
• Redundancy analysis (RDA)
  • Constrained version of PCA
• Constrained correspondence analysis (CCA)
  • Based on Chi-squared distances
  • Weighted linear mapping

Incorporating environmental data into ordination
• Can overlay vectors of environmental data on top of community data
• Vectors supply information about the direction and strength of environmental variables
• Easy to interpret the effects of many variables
• However, it assumes all relationships are linear. This might not be the case…

Oksanen 2013

Incorporating environmental data into ordination
• Can overlay surfaces of environmental data on top of community data
• Surfaces provide more detailed information about how communities are positioned along abiotic variables
• More difficult to interpret with more than a couple of variables
• Using treatments is a special case of this

Oksanen 2013

Ordination by itself is not a robust statistical test
• Although ordination is great for visualizing your data, we need to back it up.
• One way is to calculate confidence ellipses around the centroid
• Another way is to use resemblance-based permutation methods
  • They give P values…

For a discussion of how to calculate confidence ellipses in R, see: http://stats.stackexchange.com/questions/34017/confidence-intervals-around-a-centroid-with-modified-gower-similarity

Resemblance-based permutation methods
• One benefit of these techniques is that they compare n-dimensional data instead of ordination data squished into 2 or 3 dimensions
• Many assumptions of regular MANOVAs are violated with ecological community data (see Clarke 1993), which spurred the creation of new methods for analyzing multivariate data
• 3 commonly used methods:
  • Permutational MANOVA (or PERMANOVA)
  • Analysis of similarities (ANOSIM)
  • Mantel’s test
• One assumption of all three of these tests is equal variance among treatments…
  • This is a problem, but we’ll come back to this

ANOSIM – Clarke 1993

• Ranks dissimilarities among local communities from 1 to the number of comparisons made
• Then looks at averages of ranked dissimilarities within and between groups
• Compares the R statistic calculated from these averages with R values from random permutations to get a p-value

(Originally from Clarke 1993 and reviewed in Anderson and Walsh 2013)

[Equation: ANOSIM R statistic, R = (mean between-group rank − mean within-group rank) / (M/2), where M = n(n − 1)/2 is the number of plot pairs. Annotations:]
• Indicator: = 1 if i and j are in the same group and = 0 if they are in different groups
• Mean dissimilarity rank of plot pairs between groups
• Mean dissimilarity rank of plot pairs within a group
• R is used to calculate the P value

ANOSIM – Clarke 1993

Essentially, during each permutation, plot labels in the dissimilarity matrix are shuffled and an R value is calculated. Over many permutations, a null distribution for R is created, against which the original R can be compared. The p-value is obtained from where the original R falls on this distribution.

[Figure: density plot of permuted R values; the actual R is compared with this distribution to calculate the p-value]
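The permutation procedure can be sketched in a few lines. The example below is a simplified illustration rather than a replacement for established software: it assumes Clarke's R = (mean between-group rank − mean within-group rank) / (M/2) with M = n(n − 1)/2, gives tied dissimilarities their average rank, and uses hypothetical two-dimensional data with Euclidean distances purely to build a distance matrix.

```python
import math
import random
from itertools import combinations

def rank_values(values):
    """Ranks from 1 to n; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def anosim_r(dist, groups):
    """ANOSIM R from a square dissimilarity matrix and a list of group labels."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = rank_values([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_diff = sum(between) / len(between) - sum(within) / len(within)
    return mean_diff / (len(pairs) / 2)

def anosim(dist, groups, permutations=999, seed=1):
    """Observed R and a permutation p-value obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = anosim_r(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical data: two well-separated groups of four plots each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
groups = ["Z"] * 4 + ["Y"] * 4
dist = [[math.dist(p, q) for q in points] for p in points]

r_observed, p_value = anosim(dist, groups)
print(f"R = {r_observed:.3f}, p = {p_value:.3f}")
```

Because every between-group dissimilarity here is larger than every within-group dissimilarity, R reaches its maximum of 1.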

Mantel test

• Doesn’t use ranks
• To compare groups, it uses one dissimilarity matrix and one model matrix that designates contrasts, comparing within- and among-group dissimilarities
• The test statistic z is the sum of the products of corresponding elements of the two matrices
• The p value is calculated as the proportion of z values from random permutations that are as extreme as or more extreme than the observed z


plot2 0.119391

plot3 0.105672 0.129548

plot4 0.062924 0.090572 0.092943

plot5 0.111247 0.137485 0.140632 0.100948

plot6 0.135112 0.112754 0.150243 0.147331 0.168915

plot7 0.173912 0.137066 0.157281 0.159704 0.138085 0.131513

plot8 0.144694 0.158255 0.098638 0.12356 0.131775 0.14664 0.125334

plot9 0.145805 0.135068 0.121951 0.106257 0.184396 0.141295 0.167902 0.112891

plot10 0.130464 0.119984 0.113435 0.088728 0.139998 0.13052 0.142464 0.108512 0.122446

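Model matrix (1 = plot pair in the same group, 0 = plot pair in different groups; plots 1–5 are Group Z and plots 6–10 are Group Y):

plot1 plot2 plot3 plot4 plot5 plot6 plot7 plot8 plot9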

plot2 1

plot3 1 1

plot4 1 1 1

plot5 1 1 1 1

plot6 0 0 0 0 0

plot7 0 0 0 0 0 1

plot8 0 0 0 0 0 1 1

plot9 0 0 0 0 0 1 1 1

plot10 0 0 0 0 0 1 1 1 1

(See Legendre & Legendre 2012 for more detail)
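A simple Mantel test with a model matrix coded as above (1 = same group, 0 = different groups) might look like the following sketch. With this coding, z is the sum of within-group dissimilarities, so group differences show up as an unusually small z and the p-value counts permutations with z at or below the observed value; with the opposite coding the comparison would be reversed. The data are hypothetical and the function names are illustrative.

```python
import math
import random

def mantel_z(dist, model):
    """Mantel statistic: sum of element-wise products over the lower triangle."""
    n = len(dist)
    return sum(dist[i][j] * model[i][j] for i in range(n) for j in range(i))

def mantel_test(dist, model, permutations=999, seed=1):
    """Observed z and a one-sided permutation p-value (small z is extreme)."""
    rng = random.Random(seed)
    n = len(dist)
    observed = mantel_z(dist, model)
    idx = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(idx)
        # Permute rows and columns of the dissimilarity matrix together.
        permuted = [[dist[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if mantel_z(permuted, model) <= observed + 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical data: two well-separated groups of four plots each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
groups = ["Z"] * 4 + ["Y"] * 4
dist = [[math.dist(p, q) for q in points] for p in points]
model = [[1 if gi == gj else 0 for gj in groups] for gi in groups]

z_observed, p_value = mantel_test(dist, model)
print(f"z = {z_observed:.3f}, p = {p_value:.3f}")
```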

PERMANOVA

• Calculates a pseudo-F statistic
  • Pseudo-F is identical to a normal F statistic if there is only one response variable (and Euclidean distance is used)
• This pseudo-F is calculated using the original data and compared with a distribution of pseudo-F statistics from many random permutations. This step is the same as in ANOSIM.

(See Anderson 2001, 2005 for more detail)
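The equivalence between pseudo-F and the ordinary F statistic can be checked numerically. The sketch below assumes the one-factor distance-based partitioning described by Anderson (total and within-group sums of squares computed from squared pairwise distances) and uses a hypothetical single response variable with absolute differences as the distance; it is an illustration, not a full PERMANOVA implementation.

```python
import random
from itertools import combinations

def pseudo_f(dist, groups):
    """Pseudo-F from a square distance matrix for a one-factor design."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        members = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[i][j] ** 2
                         for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def classic_f(values, groups):
    """Ordinary one-way ANOVA F statistic for a single response variable."""
    n = len(values)
    labels = sorted(set(groups))
    grand_mean = sum(values) / n
    ss_among = ss_within = 0.0
    for g in labels:
        member = [v for v, lab in zip(values, groups) if lab == g]
        mean_g = sum(member) / len(member)
        ss_among += len(member) * (mean_g - grand_mean) ** 2
        ss_within += sum((v - mean_g) ** 2 for v in member)
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, permutations=999, seed=1):
    """Observed pseudo-F and a p-value from shuffling group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical single response variable measured in two groups of three plots.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
groups = ["Z", "Z", "Z", "Y", "Y", "Y"]
dist = [[abs(x - y) for y in values] for x in values]

f_pseudo, p_value = permanova(dist, groups)
f_classic = classic_f(values, groups)
print(f"pseudo-F = {f_pseudo:.3f}, classic F = {f_classic:.3f}, p = {p_value:.3f}")
```

With only six plots there are few distinct label arrangements, so the smallest achievable p-value is limited; this is a general property of permutation tests on small designs.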

[Figure: density plot of permuted pseudo-F values, with the observed pseudo-F marked]

Choosing a method

• A major assumption in all three methods is equal variance among groups
  • This is often violated in real-world communities
  • In fact, this change in variance (i.e. dispersion or convergence among replicates, or beta diversity) is often of interest to ecologists
• So… how do we deal with this?

Anderson and Walsh 2013


PERMDISP
• Permutational analysis of multivariate dispersions (Anderson 2004)
• Compares multivariate dispersion among groups
• Uses any distance or dissimilarity measure you feed into it
• 2 main reasons to use this:
  1. To look for violations of assumptions in tests of centroid location (although, as we discussed above, this may not be as big of a deal as once thought)
  2. Variance among local communities within a treatment may be of ecological interest (for more info about using community dissimilarity methods to estimate beta diversity, see Legendre & Caceres 2013)

Chase 2007

Anderson 2004
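The core idea of a dispersion test can be sketched as follows: measure each plot's distance to its group centroid, then compare those distances among groups. This is a deliberately simplified stand-in for PERMDISP, under loudly stated assumptions: it works in raw Euclidean space (PERMDISP accepts any dissimilarity measure), and it shuffles group labels on the distances rather than permuting residuals. The data are hypothetical.

```python
import math
import random

def distances_to_centroid(points, groups):
    """Euclidean distance from each plot to the centroid of its own group."""
    centroids = {}
    for g in set(groups):
        members = [p for p, lab in zip(points, groups) if lab == g]
        centroids[g] = [sum(coord) / len(members) for coord in zip(*members)]
    return [math.dist(p, centroids[lab]) for p, lab in zip(points, groups)]

def one_way_f(values, groups):
    """Ordinary one-way ANOVA F statistic."""
    n = len(values)
    labels = sorted(set(groups))
    grand_mean = sum(values) / n
    ss_among = ss_within = 0.0
    for g in labels:
        member = [v for v, lab in zip(values, groups) if lab == g]
        mean_g = sum(member) / len(member)
        ss_among += len(member) * (mean_g - grand_mean) ** 2
        ss_within += sum((v - mean_g) ** 2 for v in member)
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def dispersion_test(points, groups, permutations=999, seed=1):
    """F on distances to centroid, with a label-shuffling p-value (simplified)."""
    rng = random.Random(seed)
    dists = distances_to_centroid(points, groups)
    observed = one_way_f(dists, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if one_way_f(dists, shuffled) >= observed - 1e-12:
            hits += 1
    return dists, observed, (hits + 1) / (permutations + 1)

# Hypothetical data: a tightly clustered group and a widely dispersed group.
points = [(0, 0), (1, 0), (0, 1), (1, 2), (5, 5), (9, 5), (5, 9), (10, 11)]
groups = ["tight"] * 4 + ["dispersed"] * 4

dists, f_observed, p_value = dispersion_test(points, groups)
mean_tight = sum(dists[:4]) / 4
mean_dispersed = sum(dists[4:]) / 4
print(f"mean distance to centroid: tight={mean_tight:.2f}, dispersed={mean_dispersed:.2f}")
print(f"F = {f_observed:.2f}, p = {p_value:.3f}")
```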

SIMPER

• Similarity percentages of component species or functional groups
• Bray-Curtis dissimilarity matrix is implicit in a SIMPER analysis
  • Can force it to use a Euclidean distance matrix in PRIMER
  • I have not seen evidence for or against this practice….
• Use this to find out which variables are responsible for observed shifts in multivariate space

Knapp et al. in prep
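The logic behind SIMPER can be sketched by splitting the Bray-Curtis dissimilarity into per-species terms, |x − y| / Σ(x + y), and averaging them over all between-group plot pairs. The example below is a minimal illustration of that idea using the low-light and high-light abundances from the earlier table as two single-plot groups; it omits parts of a full SIMPER output (for example, the consistency ratios reported by PRIMER).

```python
def simper(group1, group2, species):
    """Percent contribution of each species to mean between-group Bray-Curtis."""
    totals = [0.0] * len(species)
    n_pairs = 0
    for x in group1:
        for y in group2:
            denominator = sum(x) + sum(y)
            for k in range(len(species)):
                totals[k] += abs(x[k] - y[k]) / denominator
            n_pairs += 1
    mean_contrib = [t / n_pairs for t in totals]
    overall = sum(mean_contrib)   # mean between-group Bray-Curtis dissimilarity
    table = [(sp, 100.0 * c / overall) for sp, c in zip(species, mean_contrib)]
    return sorted(table, key=lambda row: -row[1])

species = ["A", "B", "C", "D", "E", "F"]
# Each group is a list of plots; here each group holds one plot from the earlier table.
low_light = [[0.759876, 0.620718, 0.245997, 0.33769, 0.207586, 0.143989]]
high_light = [[0.083528, 0.152468, 0.185889, 0.52394, 0.545748, 0.561022]]

contributions = simper(low_light, high_light, species)
for name, percent in contributions:
    print(f"Species {name}: {percent:.2f}%")
```

Species are listed from the largest to the smallest contribution, so the top rows identify which species drive the difference between the two groups.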

References
• Anderson, Marti J., and Daniel C. I. Walsh. "PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: What null hypothesis are you testing?" Ecological Monographs 83.4 (2013): 557-574.
• Anderson, M. J. "PERMDISP: a FORTRAN computer program for permutational analysis of multivariate dispersions (for any two-factor ANOVA design) using permutation tests." Department of Statistics, University of Auckland, New Zealand (2004).
• Anderson, Marti J. "Permutational multivariate analysis of variance." Department of Statistics, University of Auckland, Auckland (2005).
• Chase, Jonathan M. "Drought mediates the importance of stochastic community assembly." Proceedings of the National Academy of Sciences 104.44 (2007): 17430-17434.
• Clarke, K. R. "Non-parametric multivariate analyses of changes in community structure." Australian Journal of Ecology 18.1 (1993): 117-143.
• Gower, John C. "Some distance properties of latent root and vector methods used in multivariate analysis." Biometrika 53.3-4 (1966): 325-338.
• Legendre, Pierre, and Miquel De Cáceres. "Beta diversity as the variance of community data: dissimilarity coefficients and partitioning." Ecology Letters 16.8 (2013): 951-963.
• Legendre, Pierre, and Louis Legendre. Numerical ecology. Vol. 20. Elsevier, 2012.
• Oksanen, Jari. "Multivariate analysis of ecological communities in R: vegan tutorial." R package version (2011): 2-0.
• Shannon, Claude E., and Warren Weaver. "The mathematical theory of communication." (1949).
• Simpson, Edward H. "Measurement of diversity." Nature (1949).

Interactions between climate and plant community structure alter ecosystem sensitivity and thus ecosystem function

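[Conceptual diagram, built up over several slides: precipitation (climate) regimes, soil moisture dynamics, species responses, community composition, ecosystem sensitivity, and ecosystem function and services, linked by numbered arrows 1–4]

1. Direct impacts of precipitation regimes are based on ecosystem sensitivity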

Sensitivity = absolute change in productivity per unit change in precipitation

IPCC 2007
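Under this definition, sensitivity can be estimated as the slope of a linear regression of productivity on precipitation. A minimal sketch with entirely hypothetical numbers (the precipitation and productivity values below are made up for illustration and are not data from this study):

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical growing-season precipitation (mm) and productivity (g/m2).
precipitation = [300.0, 400.0, 500.0, 600.0, 700.0]
productivity_control = [250.0, 310.0, 370.0, 430.0, 490.0]
productivity_irrigated = [400.0, 430.0, 460.0, 490.0, 520.0]

slope_control, intercept_control = ols_slope_intercept(precipitation, productivity_control)
slope_irrigated, intercept_irrigated = ols_slope_intercept(precipitation, productivity_irrigated)

print(f"control:   sensitivity = {slope_control:.2f} g/m2 per mm, intercept = {intercept_control:.0f}")
print(f"irrigated: sensitivity = {slope_irrigated:.2f} g/m2 per mm, intercept = {intercept_irrigated:.0f}")
```

In this made-up example the irrigated plots have a higher intercept but a lower slope, i.e. higher overall productivity and lower sensitivity to precipitation.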

2. Precipitation regimes may alter ecosystem sensitivity through changes in soil moisture dynamics

3. Individual species responses to long-term climate regime shifts are a potential mechanism that may structure communities

4. Community composition can directly affect ecosystem services through dominance or diversity effects, or indirectly by altering ecosystem sensitivity to precipitation regimes

Overarching question…
• Do interactions between precipitation drivers, plant community structure, and ecosystem sensitivity alter effects of precipitation regimes on ecosystem function?

Huxman et al. 2004

Sala et al. 1988

Shifts in ecosystem function across space and time

[Figure: ecosystem function plotted against precipitation, from dry years to wet years, with features A, B, and C marked]

A. Changes in overall soil moisture cause a change in the intercept of the Productivity – Precipitation relationship

B. Different drought sensitivities of component species within a community control slope and intercept of the relationship by altering ecosystem responses in dry years

C. Growth limitations (e.g. growth rate maximums, co-limitation by other resources such as N) of component species in wet years determine slope and intercept.

Experimental designs
• ANPP (aboveground net primary productivity) data from 2 long-term data sets and linked precipitation data
  1. Irrigation transect – relieves water limitation throughout the growing season
     • 1991-2011
  2. Uplands vs lowlands – annually burned, ungrazed watershed
     • 1984 – 2011
• Looked at slopes between growing season rainfall and ANPP to assess sensitivity in control and manipulated plots.

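I) Reduced soil water capacity

[Figure: ecosystem function plotted against precipitation for reduced soil water capacity, with features A, B, and C marked]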

I) Chronic reduction of soil water availability should cause increased sensitivity due to a reduction in the overall productivity of the system (A; i.e. lowered intercept), while the slope and intercept are altered by the resident plant community. The capacity for growth in wet years (C) should be similar due to unchanged growth potential and lack of limiting nutrients, but the negative response to drought should be increased (B) due to reduction of soil water stores to buffer against drought.

II) Increased soil water availability should decrease sensitivity by increasing overall productivity of the system (A; i.e. increased intercept), while limitations on cumulative growth rates of the extant plant community should reduce productivity response in wet years (C) thus reducing sensitivity of the system to precipitation inputs (i.e. slope).

[Figure: ecosystem function plotted against precipitation for II) increased soil water availability, with features A and C marked]

Ecosystem response to altered precip regimes / soil conditions

[Figure: ANPP (g/m2) plotted against growing season precipitation (mm) for the soil depth and irrigation comparisons, with a test of sensitivity shifts in each panel; significance markers shown: *, **, n.s.]

How is community structure modifying these relationships?

Smith 2009

Uplands vs lowlands

Species                    Contribution to divergence (%)
Panicum virgatum           21.68
Schizachyrium scoparium    16.76

Uplands vs lowlands

Predictions based on abiotic forcings
• Reduction in dry years because of limited soil water storage to buffer plants during periods of drought

Predictions when incorporating biotic forcings
• Water limitation is not an important factor in wet years
• Growth rate limitations of extant species limit production in wet years
• Decreased drought sensitivity of extant species limits production loss in dry years

Chronic irrigation and community shifts
• After 10 years, reordering of the community occurred
  • Panicum virgatum replaced Andropogon gerardii as the dominant species in 2001
• We decided to look at ecosystem sensitivity before and after this community shift to test some of our predictions

Collins et al. 2012

Chronic irrigation

Predictions based on abiotic forcings
• Growth rate or other resources limit production in wet years

Predictions when incorporating biotic shifts
• As species that do not have growth rate limitations take over, productivity responses in wet years should increase

Community change over time and altered sensitivity

[Figure: ordination (Axis 1 vs. Axis 2) showing community change over time, with points labelled 1991 and 2011*]

Conclusions and implications

1. Sensitivity of ecosystems to climate drivers may be altered under future precipitation regimes

2. Additionally, community shifts driven by these altered precipitation regimes may cause a change in ecosystem sensitivity

3. Short-term experiments may not pick up these community-driven sensitivity changes
