Multivariate Analysis

Harry R. Erwin, PhD

School of Computing and Technology

University of Sunderland

Resources

• Everitt, B. S., and G. Dunn (2001) Applied Multivariate Data Analysis, London: Arnold.

• Everitt, B. S. (2005) An R and S-PLUS® Companion to Multivariate Analysis, London: Springer.

Roadmap

• PBL group assignments

• Multivariate data graphics tutorials

• Testing distributional assumptions

• Principal components analysis

• Cluster analysis

• Summary

PBL group assignments

• Two groups

Multivariate data graphics tutorials

• Available on the module website

• Covers both standard and lattice graphics

Testing distributional assumptions

• For these techniques to work, the data should follow a multivariate normal distribution.

• There are two ways of testing this:

– Examine each variable separately (normality of each variable on its own does not imply that the data follow a multivariate normal distribution).

– Convert each observation to a single number (a generalised distance) and plot the ordered distances against the quantiles of an appropriate chi-squared distribution.

Separate Examination

• X has two columns, and the combined data are bivariate normal:

par(mfrow=c(1,2))   # two plots side by side
qqnorm(X[,1], ylab="Ordered observations")
qqline(X[,1])
qqnorm(X[,2], ylab="Ordered observations")
qqline(X[,2])
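The slides do not show how X is built. To try the commands above, one option is to simulate bivariate normal data. The sample size, correlation and seed below are arbitrary assumptions, not values from the lecture.

```r
# Sketch: simulate a two-column matrix X of bivariate normal data.
# n, rho and the seed are illustrative assumptions.
set.seed(1)
n   <- 200
rho <- 0.6
Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)   # target covariance matrix

# Z holds independent standard normals; multiplying by the Cholesky
# factor of Sigma gives rows whose covariance is Sigma.
Z <- matrix(rnorm(2 * n), ncol = 2)
X <- Z %*% chol(Sigma)

# Normal Q-Q plot for each column, side by side
par(mfrow = c(1, 2))
for (j in 1:2) {
  qqnorm(X[, j], ylab = "Ordered observations")
  qqline(X[, j])
}
```

Points lying close to the reference line in both panels are consistent with each variable being normal, which is necessary but not sufficient for bivariate normality.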

Comparison to a chi-squared distribution

• Same data, using chisplot, available at http://biostatistics.iop.kcl.ac.uk/publications/everitt/

par(mfrow=c(1,1))   # back to a single plot
chisplot(X)
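chisplot comes from the Everitt companion code and its source is not reproduced in these slides. As a rough sketch of the idea, the generalised (Mahalanobis) distances can be computed with base R and plotted against chi-squared quantiles. The function name chisq_plot_sketch and the plotting positions (i - 0.5)/n are choices made for this sketch, and Everitt's function may differ in detail.

```r
# Sketch of a chi-squared Q-Q plot for multivariate normality.
# chisq_plot_sketch is a hypothetical name, not Everitt's chisplot.
chisq_plot_sketch <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  # Squared generalised distance of each row from the sample mean
  d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))
  # Chi-squared quantiles with p degrees of freedom
  q <- qchisq((seq_len(n) - 0.5) / n, df = p)
  plot(q, sort(d2),
       xlab = "Chi-squared quantile",
       ylab = "Ordered squared distance")
  abline(0, 1)  # points near this line are consistent with normality
  invisible(list(quantiles = q, distances = sort(d2)))
}

# Example on simulated data (an assumption; the slides do not define X)
set.seed(1)
X <- matrix(rnorm(400), ncol = 2)
res <- chisq_plot_sketch(X)
```

Marked curvature away from the line, or a few points far above it at the right-hand end, would suggest non-normality or outliers.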

Principal components analysis (PCA)

• Describe the variation of a set of multivariate data in terms of a set of uncorrelated variables, each a linear combination of the original variables.

• The goal is to reduce the number of meaningful variables to a small number that summarise the data set.

• Useful when the explanatory variables are highly correlated.

• Representative of projection pursuit methods.
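To make the description above concrete, here is a minimal sketch using prcomp from base R. The built-in USArrests data set is only a stand-in; it is not the data used in the lecture.

```r
# Sketch: PCA with base R on a built-in data set (an illustrative choice).
# scale. = TRUE standardises the variables, so the analysis is based on
# the correlation matrix rather than the covariance matrix.
pca <- prcomp(USArrests, scale. = TRUE)

summary(pca)        # proportion of variance explained by each component
pca$rotation        # loadings: each component is a linear combination
                    # of the original variables

# The component scores are uncorrelated with one another
round(cor(pca$x), 10)

# Scree plot: helps decide how many components to keep
screeplot(pca, type = "lines")
```

If the first one or two components account for most of the variance, they can stand in for the original variables in later plots or models.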

Cluster analysis

• A classification tool that sorts the samples into a small number of groups or clusters, usually non-overlapping.

• These clusters may not be unique.

– Predictive clustering

– Clustering based on causation

• Hence a cluster analysis is neither true nor false; it is simply useful or not.

Cluster analysis approaches

• Agglomerative hierarchical clustering (fusion from the bottom up)

• K-means type methods (partition from the top down)

• Classification maximum likelihood methods (assume a model for the shape of the clusters)

• Or you can simply use the tree library.

library(tree)
model <- tree(ozone ~ ., data=ozone.pollution)
plot(model)
text(model)
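The first two approaches listed above can be tried with base R alone. The sketch below is illustrative only: the USArrests data set, the complete-linkage method and the choice of four clusters are assumptions made here, not choices from the lecture.

```r
# Sketch: hierarchical and k-means clustering using base R only.
# USArrests and k = 4 are illustrative assumptions.
x <- scale(USArrests)              # put variables on a common scale

# Agglomerative hierarchical clustering (bottom-up fusion)
hc <- hclust(dist(x), method = "complete")
plot(hc, cex = 0.6)                # dendrogram
hc_groups <- cutree(hc, k = 4)     # cut the tree into 4 clusters

# K-means partitioning; nstart reruns from several random starts
set.seed(1)
km <- kmeans(x, centers = 4, nstart = 25)

# Cross-tabulate the two solutions to see how far they agree
table(hierarchical = hc_groups, kmeans = km$cluster)
```

Where the two methods assign samples differently, the cross-tabulation shows it directly, which is a reminder that a clustering is judged by its usefulness rather than by being right.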

Summary

• Multivariate statistics is usually done from the point of view that there are no laws of scientific inference: ‘anything goes’.

• First, you explore the data to come up with hypotheses (the models).

• Then you confirm the models on a second data set.

• If you have a single data set, split it into two parts, one for exploration and one for confirmation.

• Good data analysis is based on the skilful interpretation of evidence and the subsequent development of hunches.
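Splitting a single data set, as suggested above, might look like the following sketch. The 50/50 split, the seed and the built-in iris data are assumptions chosen for illustration.

```r
# Sketch: split one data set into exploration and confirmation halves.
# The 50/50 split, the seed and the iris data are illustrative assumptions.
set.seed(1)
n   <- nrow(iris)
idx <- sample(n, size = floor(n / 2))   # random half of the row indices

explore <- iris[idx, ]    # use for exploring and forming hypotheses
confirm <- iris[-idx, ]   # hold back for confirming the models
```

Sampling the rows at random, rather than taking the first half, avoids building any ordering in the file into the split.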
