New Methods in Ecology: Complex statistical tests, and why we should be cautious!

Upload: chacha

Post on 24-Feb-2016


TRANSCRIPT

Page 1: New Methods in Ecology

New Methods in Ecology

Complex statistical tests, and why we should be cautious!

Page 2: New Methods in Ecology

Complex tests

• Logistic Regression

• Principal Components Analysis

• Cluster Analysis

Multivariate

• Multivariate tests mean you have a single explanatory variable, but multiple response variables.

Page 3: New Methods in Ecology

Logistic Regression

Page 4: New Methods in Ecology

Logistic Regression

Insects were exposed to a pesticide to determine the effectiveness of the treatment. The response is the number of dead individuals in each sample (batch).

Dose   Dead   Batch
1      2      100
3      10     90
10     49     98
30     96     100
100    98     100

Page 5: New Methods in Ecology

Linear regression on the proportion killed vs dose

[Figure: proportion killed against dose, with a straight-line fit]

At dose 0, the predicted proportion killed is less than 0 (negative deaths?), and above dose 4 the line predicts > 100% mortality!

P(kill) = ax + b
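The problem with the straight line can be reproduced from the table on the previous slide. This is a minimal sketch, not the original analysis: it assumes the horizontal axis is the natural log of dose (the slide only labels it "dose", but a log scale seems consistent with the "dose 0" and "dose 4" remarks), and the function name `predict_linear` is made up for illustration.

```python
import math

# Data from the slide: dose, number dead, batch size.
dose = [1, 3, 10, 30, 100]
dead = [2, 10, 49, 96, 98]
batch = [100, 90, 98, 100, 100]

# Assumption: the x-axis is the natural log of dose.
x = [math.log(d) for d in dose]
y = [k / n for k, n in zip(dead, batch)]  # proportion killed per batch

# Ordinary least squares for P(kill) = a*x + b.
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
a = sxy / sxx
b = mean_y - a * mean_x


def predict_linear(log_dose):
    """Straight-line prediction of the proportion killed."""
    return a * log_dose + b


# Nothing stops the line from leaving the 0..1 range.
for xi in (0.0, 2.0, 4.0, 5.0):
    print(f"log dose {xi:.1f}: predicted proportion killed = {predict_linear(xi):.3f}")
```

Under these assumptions the prediction at log dose 0 falls just below 0 and the prediction at log dose 5 is above 1, which is the impossible behaviour the slide complains about.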

Page 6: New Methods in Ecology

Need to ensure the model is bounded by 0 and 1, so build a new equation

No longer have impossible predictions, and the model fits better

[Figure: P(kill) against dose]

P(kill) = 1 - P(survived)

P(survived) = e^{ax+b} / (1 + e^{ax+b})
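One way to fit this bounded model to the pesticide table is sketched below. It is an illustration under stated assumptions, not the method used for the slides: dose is put on a natural-log scale (an assumption), the coefficients are found by hand-written Newton-Raphson steps on the binomial likelihood, and the names `fit_logistic`, `p_survived` and `p_kill` are hypothetical. In practice a statistics package would normally do the fitting.

```python
import math

# Data from the slide: dose, number dead, batch size.
dose = [1, 3, 10, 30, 100]
dead = [2, 10, 49, 96, 98]
batch = [100, 90, 98, 100, 100]

x = [math.log(d) for d in dose]  # assumption: log-dose scale
survived = [n - k for k, n in zip(dead, batch)]


def p_survived(xi, a, b):
    """e^(ax+b) / (1 + e^(ax+b)), written in the equivalent form 1 / (1 + e^-(ax+b))."""
    return 1.0 / (1.0 + math.exp(-(a * xi + b)))


def fit_logistic(xs, successes, trials, iterations=50, tol=1e-10):
    """Estimate a and b by Newton-Raphson on the binomial log-likelihood."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi, ni in zip(xs, successes, trials):
            p = p_survived(xi, a, b)
            resid = yi - ni * p        # observed minus expected survivors
            w = ni * p * (1.0 - p)     # binomial variance weight
            g_a += resid * xi
            g_b += resid
            h_aa += w * xi * xi
            h_ab += w * xi
            h_bb += w
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = h_aa * h_bb - h_ab * h_ab
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) < tol and abs(step_b) < tol:
            break
    return a, b


a, b = fit_logistic(x, survived, batch)


def p_kill(d):
    """P(kill) = 1 - P(survived) at dose d."""
    return 1.0 - p_survived(math.log(d), a, b)


for d in dose:
    print(f"dose {d:>3}: predicted P(kill) = {p_kill(d):.3f}")
```

Because the prediction passes through the logistic function, it stays strictly between 0 and 1 at every dose, whatever values a and b take.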

Page 7: New Methods in Ecology

Can now look at what proportion would be killed at a particular dosage

[Figure: P(kill) against dose]

P(kill) = 1 - P(survived)

P(survived) = e^{ax+b} / (1 + e^{ax+b})
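The same equation can be turned around to ask what dose gives a chosen kill proportion. The rearrangement below is a worked sketch: p is a hypothetical target value of P(kill), and x is on whatever scale the dose was fitted on.

```latex
\begin{aligned}
P(\text{kill}) = p \;&\Rightarrow\; P(\text{survived}) = 1 - p = \frac{e^{ax+b}}{1 + e^{ax+b}} \\
&\Rightarrow\; e^{ax+b} = \frac{1-p}{p} \\
&\Rightarrow\; x = \frac{1}{a}\left[\ln\!\left(\frac{1-p}{p}\right) - b\right]
\end{aligned}
```

For p = 0.5 the log term vanishes, so the dose expected to kill half the sample is x = -b/a.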

Page 8: New Methods in Ecology

Logistic regression issues…

• Implementing and coding the model can be difficult
• Can be tough to work through the equation
• Is it easier to design around the issue?

Dose   Dead   Batch
1      2      100
3      10     100
10     49     100
30     96     100
100    98     100

• Use the same number in each batch, use “number dead” as the response variable?

#Killed = ax + b

P(survived) = e^{ax+b} / (1 + e^{ax+b})

Page 9: New Methods in Ecology

Multivariate Statistics

• Single explanatory variable, multiple response variables
• Multivariate tests can be useful and insightful
• Can be deeply confusing
• Very often misused
• Difficult to explain the results
• Used to mask bad designs, confuse/impress stupid people.

Page 10: New Methods in Ecology

Parrots in Bonaire

www.parrotwatch.org

Sam Williams

Sam collected a load of data on different aspects of the birds’ biology

Page 11: New Methods in Ecology

Parrots in Bonaire

• What to do with all this?
• 1 descriptive variable (nest)
• Multiple response variables
• Principal component analysis…

Page 12: New Methods in Ecology

Principal Component Analysis

• Obtains values for as many principal components as there are response variables
• Each PC accounts for some more of the total variation
• Each nest has a PC value for each PC
• Each response variable has a rotation value for each PC
• What do these PC values and rotation values relate to?
• God knows
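The pieces listed above (one PC per response variable, a share of variation per PC, a PC value per nest, a rotation value per variable) can be seen in a small worked sketch. Everything here is an assumption made for illustration: the three variables and six nests are invented, the variables are standardised first, and the eigenvectors come from a hand-written Jacobi routine, where a library routine would normally be used.

```python
import math

# Hypothetical data: one row per nest, one column per response variable.
variables = ["clutch_size", "cavity_depth_cm", "chick_mass_g"]
nests = [
    [2.0, 35.0, 210.0],
    [3.0, 50.0, 250.0],
    [4.0, 62.0, 300.0],
    [2.0, 40.0, 205.0],
    [3.0, 45.0, 260.0],
    [4.0, 70.0, 310.0],
]


def standardise(rows):
    """Centre each column on 0 and scale it to unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardised columns."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def jacobi_eigen(sym, sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric matrix."""
    n = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q of a and v
                    a[k][p], a[k][q] = (c * a[k][p] - s * a[k][q],
                                        s * a[k][p] + c * a[k][q])
                    v[k][p], v[k][q] = (c * v[k][p] - s * v[k][q],
                                        s * v[k][p] + c * v[k][q])
                for k in range(n):  # update rows p and q of a
                    a[p][k], a[q][k] = (c * a[p][k] - s * a[q][k],
                                        s * a[p][k] + c * a[q][k])
    return [a[i][i] for i in range(n)], v


z = standardise(nests)
eigenvalues, vectors = jacobi_eigen(correlation_matrix(z))
order = sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)

# Share of the total variation carried by each PC, largest first.
explained = [eigenvalues[i] / sum(eigenvalues) for i in order]
# rotation[j][k]: weight of response variable j in principal component k.
rotation = [[vectors[j][i] for i in order] for j in range(len(variables))]
# scores[n][k]: value of nest n on principal component k.
scores = [[sum(row[j] * rotation[j][k] for j in range(len(variables)))
           for k in range(len(order))] for row in z]

for k, share in enumerate(explained, start=1):
    print(f"PC{k}: {share:.1%} of total variation")
```

Each PC value is just a weighted sum of the standardised measurements for that nest, with the rotation values as the weights. That does not make the components any easier to interpret biologically, which is the slide's point.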

Page 13: New Methods in Ecology

Principal Component Output

[Figure: scree plot; x-axis: principal component]

Scree plot: the first few principal components account for much of the variation

Page 14: New Methods in Ecology

Principal Component Output

Biplot of the first 2 principal components

Can be used to look for correlations

Some significance tests (redundancy analysis)

Lots of noise!

Page 15: New Methods in Ecology

Other use of PCA

• Each nest/individual/replicate has a value for each principal component
• Can use these values as a response variable, and subject them to other tests
• Called “Dimensionality Reduction”

Page 16: New Methods in Ecology

Salmon Genomics and Survival

• Gene expression data for ~16000 genes, from ~300 fish.

• Each fish is a replicate, each gene is a response variable

Page 17: New Methods in Ecology

Salmon Genomics and Survival

• 16000 genes is a lot of data, and a lot of variation.

• Do a PCA on the genes, use the PC values as a response variable

• Reduces the dimension of the data: rather than 16000 response variables, now have 1 (PC1, or PC2)

• Can then use this in other tests.
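A toy version of this workflow is sketched below: collapse many gene columns into a single PC1 value per fish, then carry that one number into another test. The data are simulated, not the salmon data. The fish count, gene count, the hidden "condition" variable and the survival times are all invented, and PC1 is found by simple power iteration rather than a library PCA routine.

```python
import math
import random

rng = random.Random(1)
n_fish, n_genes = 30, 100

# Simulated data: a hidden "condition" per fish drives expression and survival.
condition = [rng.gauss(0, 1) for _ in range(n_fish)]
sensitivity = [1.0 + (j % 3) for j in range(n_genes)]  # made-up gene weights
expression = [[w * c + rng.gauss(0, 0.5) for w in sensitivity] for c in condition]
survival_days = [20 + 5 * c + rng.gauss(0, 1) for c in condition]

# Centre each gene (column) on zero.
means = [sum(row[j] for row in expression) / n_fish for j in range(n_genes)]
centred = [[row[j] - means[j] for j in range(n_genes)] for row in expression]


def first_axis(x, iterations=50):
    """Leading principal axis by power iteration, without building the
    genes-by-genes covariance matrix."""
    p = len(x[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        xv = [sum(a * b for a, b in zip(row, v)) for row in x]            # X v
        v = [sum(xv[i] * x[i][j] for i in range(len(x))) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return v


def pearson(u, w):
    """Pearson correlation coefficient between two equal-length lists."""
    mu, mw = sum(u) / len(u), sum(w) / len(w)
    cov = sum((a - mu) * (b - mw) for a, b in zip(u, w))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mw) ** 2 for b in w))


axis = first_axis(centred)
# One PC1 value per fish: 100 response variables reduced to 1.
pc1 = [sum(a * b for a, b in zip(row, axis)) for row in centred]

print(f"correlation between PC1 and survival time: {pearson(pc1, survival_days):.2f}")
```

The sign of a principal component is arbitrary, so only the strength of the relationship is meaningful. The simulation also builds in a single dominant hidden factor, so PC1 is easy to interpret here; with real expression data there is no such guarantee.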

Page 18: New Methods in Ecology

Salmon Genomics and Survival

[Figure: x-axis: principal component]

• Related the value of PC1 to survival of the fish; this showed a correlation for one stock

Page 19: New Methods in Ecology

Salmon Genomics and Survival

[Figure: proportion surviving against days; panels: Scotch, Chilko, Adams]

• Condensed the gene expression data into something useable
• Method insanely complex and computer intensive
• Still don’t really know what PC1 is!

Page 20: New Methods in Ecology

Cluster Analysis

• Like PCA, a multivariate method
• Unlike PCA, looks for patterns within the data
• Produces a hierarchical cluster
• Groups similar individuals together
• Unsupervised
• Have to then decide where groups lie
• Try and relate the grouping to something else?
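The steps above can be sketched as a naive agglomerative clustering: start with every individual in its own group, repeatedly merge the two closest groups, then decide afterwards how many groups to keep. The eight individuals and their two measurements are invented, average linkage on Euclidean distance is one choice among several, and the helper names are hypothetical.

```python
import math

# Hypothetical measurements: two response variables per individual.
individuals = {
    "A": (1.0, 1.2), "B": (1.1, 0.9), "C": (0.8, 1.0),
    "D": (5.0, 5.1), "E": (5.2, 4.8), "F": (4.9, 5.3),
    "G": (9.0, 1.0), "H": (9.3, 1.2),
}


def average_linkage(c1, c2):
    """Mean Euclidean distance over all pairs drawn from two clusters."""
    total = sum(math.dist(individuals[a], individuals[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerate(names):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [(name,) for name in names]
    history = []
    while len(clusters) > 1:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: average_linkage(clusters[ij[0]],
                                                         clusters[ij[1]]))
        height = average_linkage(clusters[i], clusters[j])
        history.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history


def cut(names, history, n_groups):
    """Replay the merges and stop once only n_groups clusters remain."""
    clusters = {(name,) for name in names}
    for left, right, _height in history:
        if len(clusters) <= n_groups:
            break
        clusters -= {left, right}
        clusters.add(left + right)
    return sorted(sorted(c) for c in clusters)


names = list(individuals)
history = agglomerate(names)
for left, right, height in history:
    print(f"merge {left} + {right} at distance {height:.2f}")

# The analyst, not the algorithm, decides where the groups lie.
print(cut(names, history, 3))
```

The merge history is the hierarchy that a dendrogram would draw. Choosing `n_groups` is the "decide where groups lie" step, and nothing in the method says which choice is right.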

Page 21: New Methods in Ecology

Cluster Analysis

Page 22: New Methods in Ecology

Multivariate Summary

• Multivariate statistics are useful for data mining
• Often used when data collection was done improperly/you’ve been given data sets
• Can indicate how to proceed
• Can be very messy
• Totally opposite to the a priori “carry out an experiment to test a hypothesis” idea.

Page 23: New Methods in Ecology

Complex stats Summary

• Can be very useful and insightful if used properly
• More complex doesn’t necessarily mean better
• Can be difficult to interpret
• Remember the golden rule – know how to analyse the type of data you will collect, before you collect it!