Processing & Testing Phylogenetic Trees. Rooting


Upload: austen-cannon

Post on 23-Dec-2015


TRANSCRIPT

Page 1: Processing & Testing Phylogenetic Trees. Rooting

Processing & Testing Phylogenetic Trees

Page 2: Processing & Testing Phylogenetic Trees. Rooting

Rooting

Page 5: Processing & Testing Phylogenetic Trees. Rooting

Rooting

1. Outgroup rooting: based on external information.

2. Midpoint rooting: direct a posteriori use of the ultrametricity assumption.

3. Largest-genetic-variability-group rooting: indirect a posteriori use of the ultrametricity assumption.
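Midpoint rooting can be sketched in a few lines. The example below is a minimal illustration, not a reference implementation: it assumes the unrooted tree is stored as a dictionary of neighbours with branch lengths (a hypothetical representation chosen for this sketch), finds the two tips that are farthest apart, and places the root halfway along the path between them.

```python
def farthest(tree, start):
    """Return (distance, path) for the node farthest from `start`."""
    best = (0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[0]:
            best = (dist, path)
        for neighbour, length in tree[node].items():
            if neighbour != parent:
                stack.append((neighbour, node, dist + length, path + [neighbour]))
    return best


def midpoint_root(tree):
    """Locate the midpoint of the longest tip-to-tip path.

    Returns (u, v, offset): the root lies on branch u-v, `offset` away from u.
    """
    any_node = next(iter(tree))
    _, path = farthest(tree, any_node)      # first sweep: reach one end of the longest path
    total, path = farthest(tree, path[-1])  # second sweep: walk to the other end
    half = total / 2.0
    walked = 0.0
    for u, v in zip(path, path[1:]):
        length = tree[u][v]
        if walked + length >= half:
            return u, v, half - walked
        walked += length


# Hypothetical three-tip tree: tips A, B, C joined at internal node X.
example = {
    "A": {"X": 1.0},
    "B": {"X": 3.0},
    "C": {"X": 2.0},
    "X": {"A": 1.0, "B": 3.0, "C": 2.0},
}
print(midpoint_root(example))
```

In this example the longest path runs from B to C (length 5), so the root falls on the B-X branch, 2.5 units from B and therefore also 2.5 units from C.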

Page 6: Processing & Testing Phylogenetic Trees. Rooting

Rooting with outgroup

[Figure: unrooted tree of three plants, one fungus, and three animals]

Are fungi relatives of animals or plants?

Page 7: Processing & Testing Phylogenetic Trees. Rooting

Rooting with outgroup

[Figure: unrooted tree of three plants, one fungus, and three animals]

Add an outgroup, e.g., a bacterium.

Page 8: Processing & Testing Phylogenetic Trees. Rooting

Rooting with outgroup

[Figure: the unrooted tree with a bacterium added (three plants, one fungus, three animals, one bacterium), shown beside the resulting rooted tree. Labels on the rooted tree: root, bacterial outgroup, and two monophyletic groups.]

Page 9: Processing & Testing Phylogenetic Trees. Rooting

Midpoint rooting

Page 10: Processing & Testing Phylogenetic Trees. Rooting

Largest variation = Most ancient

Page 11: Processing & Testing Phylogenetic Trees. Rooting

Species Divergence Times

If we know $T_1$ and the rate of evolution, then we can infer $T_2$.

If we know $T_2$ and the rate of evolution, then we can infer $T_1$.

Page 12: Processing & Testing Phylogenetic Trees. Rooting

$$ r = \frac{K_{AC} + K_{BC}}{4T_1} $$

Page 13: Processing & Testing Phylogenetic Trees. Rooting

If $T_1$ is known:

$$ T_2 = \frac{K_{AB}}{2r} = \frac{2K_{AB}\,T_1}{K_{AC} + K_{BC}} $$

Page 14: Processing & Testing Phylogenetic Trees. Rooting

If $T_2$ is known:

$$ T_1 = \frac{(K_{AC} + K_{BC})\,T_2}{2K_{AB}} $$
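The divergence-time formulas can be checked numerically. The sketch below assumes the usual three-species setup, which the transcript does not show because the figure is missing: K_AB, K_AC and K_BC are the numbers of substitutions per site between the species pairs, T1 is the time since C split from the ancestor of A and B, and T2 is the time since A and B split. The function names and the numbers are illustrative only.

```python
import math


def substitution_rate(k_ac, k_bc, t1):
    """Rate r estimated from the two distances that span the older split (T1)."""
    return (k_ac + k_bc) / (4.0 * t1)


def infer_t2(k_ab, k_ac, k_bc, t1):
    """T2 = K_AB / (2r), with r calibrated on a known T1."""
    r = substitution_rate(k_ac, k_bc, t1)
    return k_ab / (2.0 * r)


def infer_t1(k_ab, k_ac, k_bc, t2):
    """T1 = (K_AC + K_BC) * T2 / (2 * K_AB), for a known T2."""
    return (k_ac + k_bc) * t2 / (2.0 * k_ab)


# Made-up distances (substitutions per site) and a made-up calibration of
# 80 time units for T1.
k_ab, k_ac, k_bc = 0.10, 0.20, 0.20
t2 = infer_t2(k_ab, k_ac, k_bc, t1=80.0)
print(t2)                                 # about 40 time units
print(infer_t1(k_ab, k_ac, k_bc, t2))     # recovers about 80 time units
```

Both directions rest on the same assumption of a constant rate r along all branches, so the two functions are inverses of each other.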

Page 15: Processing & Testing Phylogenetic Trees. Rooting

• Dating divergence events requires paleontological calibrations.

• This is a complicated problem.

Page 16: Processing & Testing Phylogenetic Trees. Rooting

Topological comparisons

• Topological comparisons entail measuring the similarity or dissimilarity among tree topologies.

• The need to compare topologies may arise when dealing with trees that have been inferred from analyses of different sets of data or from different types of analysis of the same data set.

• When two trees derived from different data sets or different methodologies are identical, they are said to be congruent.

• Congruence can sometimes be partial, i.e., limited to some parts of the trees, other parts being incongruent.

Page 17: Processing & Testing Phylogenetic Trees. Rooting

Penny and Hendy's topological distance (dT)

A commonly used measure of dissimilarity between two tree topologies. The measure is based on tree partitioning.

dT = 2c

c = the number of partitions resulting in different divisions of the OTUs in the two tree topologies under consideration.
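As a rough illustration of the partition-based measure, the sketch below computes dT for two small trees. It assumes both trees are fully bifurcating, share the same OTUs, and are written as nested tuples of taxon names (a hypothetical representation chosen for this example). Each internal branch splits the OTUs into two sets; c counts the splits of the first tree that do not occur in the second.

```python
def bipartitions(tree):
    """Return the non-trivial OTU splits induced by the internal branches."""
    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves(child) for child in node))

    all_otus = leaves(tree)
    reference = min(all_otus)   # each split is stored as the side lacking this OTU
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_otus - clade if reference in clade else clade
        if 1 < len(side) < len(all_otus) - 1:   # skip splits that isolate one OTU
            found.add(side)
        return clade

    visit(tree)
    return found


def topological_distance(tree_a, tree_b):
    """dT = 2c, where c is the number of splits of tree_a absent from tree_b."""
    c = len(bipartitions(tree_a) - bipartitions(tree_b))
    return 2 * c


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(topological_distance(t1, t2))
```

Here the two trees share the split that separates D and E from the rest but disagree on the other internal branch, so c = 1 and dT = 2. Because splits are compared rather than rooted clades, the position of the root in the nested tuples does not affect the result.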

Page 18: Processing & Testing Phylogenetic Trees. Rooting

Trees inferred from the analysis of a particular data set are called fundamental trees, i.e., they summarize the phylogenetic information in a data set.

Consensus trees are trees that summarize the phylogenetic information in a set of fundamental trees.

Page 19: Processing & Testing Phylogenetic Trees. Rooting

• In a strict consensus tree, all conflicting branching patterns are collapsed into multifurcations.

• In an X% majority-rule consensus tree, a branching pattern that occurs with a frequency of X% or more is adopted.

• When X = 100%, the majority-rule consensus tree is identical to the strict consensus tree.
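The consensus rules amount to counting how often each group appears among the fundamental trees. The sketch below assumes each fundamental tree has already been reduced to the set of groups (clades) it contains, stored as frozensets of taxon names; that input format and the function name are assumptions made for illustration.

```python
from collections import Counter


def consensus_groups(trees, threshold):
    """Keep the groups found in at least `threshold` (a fraction) of the trees."""
    counts = Counter(group for tree in trees for group in set(tree))
    n_trees = len(trees)
    return {group for group, seen in counts.items() if seen / n_trees >= threshold}


# Three hypothetical fundamental trees on the taxa A, B, C, D.
ab, ac, abc = frozenset("AB"), frozenset("AC"), frozenset("ABC")
fundamental = [{ab, abc}, {ab, abc}, {ac, abc}]

print(consensus_groups(fundamental, threshold=1.0))   # strict consensus groups
print(consensus_groups(fundamental, threshold=0.6))   # 60% majority-rule groups
```

With a threshold of 1.0 only the group {A, B, C} survives, so A, B and C form a multifurcation. With a threshold of 0.6 the group {A, B}, present in two of three trees, is kept as well. Thresholds above 50% guarantee that the retained groups cannot conflict with one another.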

Page 21: Processing & Testing Phylogenetic Trees. Rooting

A tree is an evolutionary hypothesis

Page 22: Processing & Testing Phylogenetic Trees. Rooting

Q: How can we ascertain that the methodology we have used yields reliable results?

A: We can test the methodology on a phylogeny that is known for certain to be true, and compare the inferred phylogeny with the true phylogeny.

Page 23: Processing & Testing Phylogenetic Trees. Rooting

Caminalcules are a group of artificial organisms (belonging to the genus Caminalculus) that were invented by Dr. Joseph H. Camin from the University of Kansas.

Interested in how taxonomists group species, he designed these creatures to show an evolutionary pattern of divergence and diversification in morphology. There are 29 recent “species” of Caminalculus and 48 fossil forms.

The Caminalcules first appeared in print in the journal Systematic Zoology (now Systematic Biology) in 1983, four years after Camin's death in 1979. The first four papers on Caminalcules were written by Robert R. Sokal.

Joseph H. Camin (1922–1979)

Page 24: Processing & Testing Phylogenetic Trees. Rooting

[Figure: Caminalcules, with extant and extinct forms labeled]

Page 25: Processing & Testing Phylogenetic Trees. Rooting

Assessing tree reliability

Phylogenetic reconstruction is a problem of statistical inference. One must assess the reliability of the inferred phylogeny and its component parts.

Questions:

(1) How reliable is the tree?

(2) Which parts of the tree are reliable?

(3) Is this tree significantly better than another one?

Page 26: Processing & Testing Phylogenetic Trees. Rooting

Bootstrapping

• A statistical technique that uses intensive random resampling of data to estimate a statistic whose underlying distribution is unknown.

Page 27: Processing & Testing Phylogenetic Trees. Rooting

Bootstrapping

• Characters are resampled with replacement to create many bootstrap replicate data sets (pseudosamples).

• Each bootstrap replicate data set is analyzed.

• Frequency of occurrence of a group (bootstrap proportions) is a measure of support for the group.
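The bootstrap steps can be sketched as follows. This is a toy illustration under stated assumptions: the alignment is a dictionary of equal-length sequences, and `closest_pair` is a deliberately trivial stand-in for a real tree-building method (it only reports the least-different pair of taxa as a group). Any function that maps an alignment to a set of groups could be passed in its place.

```python
import random
from collections import Counter
from itertools import combinations


def bootstrap_replicate(alignment, rng):
    """Resample the alignment columns (characters) with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Toy analysis step: group the two taxa with the fewest differences."""
    def differences(a, b):
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    pair = min(combinations(sorted(alignment), 2), key=lambda p: differences(*p))
    return {frozenset(pair)}


def bootstrap_proportions(alignment, infer_groups, n_reps=1000, seed=0):
    """Fraction of replicates in which each group is recovered."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_reps):
        counts.update(infer_groups(bootstrap_replicate(alignment, rng)))
    return {group: seen / n_reps for group, seen in counts.items()}


# Made-up alignment: A and B are identical, C differs at every site.
toy = {"A": "AAAAAAAA", "B": "AAAAAAAA", "C": "TTTTTTTT"}
print(bootstrap_proportions(toy, closest_pair, n_reps=100))
```

Because A and B are identical at every site, every pseudosample groups them together and the group {A, B} receives a bootstrap proportion of 1.0. With real data the proportions fall below 1.0 wherever the characters disagree.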

Page 29: Processing & Testing Phylogenetic Trees. Rooting

Bootstrapping - an example

Ciliate SSUrDNA - parsimony bootstrap

Partition Table

123456789    Freq
-----------------
.**......  100.00
...**....  100.00
.....**..  100.00
...****..  100.00
...******   95.50
.......**   23.33
...****.*   11.83
...*****.    3.83
.*******.    2.50
.**....*.    1.00
.**.....*    1.00

Taxa: (1) Ochromonas, (2) Symbiodinium, (3) Prorocentrum, (4) Loxodes, (5) Tracheloraphis, (6) Spirostomum, (7) Gruberia, (8) Euplotes, (9) Tetrahymena

[Figure: bootstrap tree of the nine taxa, with branch support values 100, 96, 23, 100, 100, 100]

Page 30: Processing & Testing Phylogenetic Trees. Rooting

Reduction of a phylogenetic tree by the collapsing of internal branches associated with bootstrap values that are lower than a critical value (C).

(a) Gene tree for -tubulin (b) C = 50% (c) C = 90%
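Collapsing weakly supported branches is a simple recursive operation. The sketch below assumes a hypothetical tree representation in which a tip is a string and an internal node is a pair `(support, children)`, with `None` as the support of the root. Any internal branch whose bootstrap value is lower than the critical value C is removed and its descendants are attached directly to the parent, producing a multifurcation.

```python
def collapse(node, critical):
    """Return a copy of the tree with branches of support < critical collapsed."""
    if isinstance(node, str):
        return node
    support, children = node
    kept = []
    for child in children:
        child = collapse(child, critical)
        weak = (not isinstance(child, str)
                and child[0] is not None
                and child[0] < critical)
        if weak:
            kept.extend(child[1])   # attach the grandchildren directly to this node
        else:
            kept.append(child)
    return (support, kept)


# Made-up tree with bootstrap values 96, 100 and 23 on its internal branches.
tree = (None, [(96, [(100, ["A", "B"]), (23, ["C", "D"])]), "E"])
print(collapse(tree, 50))
print(collapse(tree, 99))
```

At C = 50 only the branch with support 23 is collapsed. At C = 99 the branch with support 96 is collapsed as well, leaving only the grouping of A with B resolved.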

Page 31: Processing & Testing Phylogenetic Trees. Rooting

Tests for two competing trees

• All these tests use the null hypothesis that the differences between two trees (A and B) are no greater than expected by chance (from the sampling error).

Page 32: Processing & Testing Phylogenetic Trees. Rooting

Likelihood Ratio Test

• Likelihood of Hypothesis 1 = $L_1$

• Likelihood of Hypothesis 2 = $L_2$

• Test statistic = $2(\ln L_1 - \ln L_2)$

• Compare the test statistic to a $\chi^2$ distribution or to a simulated distribution.
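A small numerical sketch of the test follows. It assumes the simplest case, in which the two hypotheses are nested and differ by a single free parameter, so the statistic is compared to a chi-square distribution with one degree of freedom. The log-likelihood values are made up. When the hypotheses are not nested, as is typical for two competing topologies, a simulated distribution is the safer reference.

```python
import math


def lrt_statistic(ln_l1, ln_l2):
    """Twice the difference in log-likelihood between the two hypotheses."""
    return 2.0 * (ln_l1 - ln_l2)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Made-up log-likelihoods for hypothesis 1 (better fit) and hypothesis 2.
statistic = lrt_statistic(-1000.0, -1002.0)
p_value = chi2_sf_df1(statistic)
print(statistic, round(p_value, 4))
```

A log-likelihood difference of 2 units gives a statistic of 4, which corresponds to a p-value of roughly 0.05 under the one-degree-of-freedom assumption.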

Page 33: Processing & Testing Phylogenetic Trees. Rooting

Reliability of Phylogenetic Methods

• Phylogenetic methods can also be evaluated in terms of their general performance, particularly their:

consistency - approach the truth with more data

efficiency - how quickly can they handle how much data

robustness - how sensitive to violations of assumptions

Page 34: Processing & Testing Phylogenetic Trees. Rooting

Problems with long branches

With long branches, most methods may yield erroneous trees. For example, the maximum-parsimony method tends to cluster long branches together. This phenomenon is called long-branch attraction, and the range of conditions under which it occurs is known as the Felsenstein zone.
