processing & testing phylogenetic trees. rooting

Post on 23-Dec-2015


Processing & Testing Phylogenetic Trees

Rooting

1. Outgroup rooting: Based on external information.

2. Midpoint rooting: Direct a posteriori use of the ultrametricity assumption.

3. Largest-genetic-variability-group rooting: Indirect a posteriori use of the ultrametricity assumption.

Rooting with outgroup

[Figure: unrooted tree of three plants, one fungus, and three animals.]

Are fungi relatives of animals or plants?

Add an outgroup, e.g., a bacterium.

[Figure: the unrooted tree with a bacterium added, and the corresponding rooted tree. Labels: root, bacterial outgroup, and two monophyletic groups.]

Midpoint rooting
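As a rough illustration of midpoint rooting, the sketch below finds the two tips separated by the greatest path length and places the root halfway along that path. The edge-list representation, the function name midpoint_root, and the toy branch lengths are assumptions made for this example; they do not come from the slides.

```python
from collections import defaultdict


def midpoint_root(edges):
    """Locate the midpoint root of an unrooted tree.

    edges: list of (node_u, node_v, branch_length) tuples (hypothetical format).
    Returns (node_from, node_to, offset): the root lies on the branch
    node_from -> node_to, at distance `offset` from node_from.
    """
    adjacency = defaultdict(list)
    lengths = {}
    for u, v, length in edges:
        adjacency[u].append((v, length))
        adjacency[v].append((u, length))
        lengths[(u, v)] = length
        lengths[(v, u)] = length

    # Tips are nodes attached to exactly one branch.
    tips = [node for node, nbrs in adjacency.items() if len(nbrs) == 1]

    def paths_from(start):
        # Walk the tree, recording distance and path from `start` to every node.
        reached = {start: (0.0, [start])}
        stack = [start]
        while stack:
            node = stack.pop()
            dist, path = reached[node]
            for nbr, length in adjacency[node]:
                if nbr not in reached:
                    reached[nbr] = (dist + length, path + [nbr])
                    stack.append(nbr)
        return reached

    # Find the longest tip-to-tip path.
    best_dist, best_path = 0.0, None
    for tip in tips:
        reached = paths_from(tip)
        for other in tips:
            dist, path = reached[other]
            if dist > best_dist:
                best_dist, best_path = dist, path

    # Walk along that path until half of its length has been covered.
    half = best_dist / 2
    walked = 0.0
    for u, v in zip(best_path, best_path[1:]):
        if walked + lengths[(u, v)] >= half:
            return u, v, half - walked
        walked += lengths[(u, v)]


# Toy tree: tips A, B, C, D; internal nodes x and y.
toy_edges = [
    ("A", "x", 1.0),
    ("B", "x", 0.5),
    ("x", "y", 1.0),
    ("C", "y", 1.0),
    ("D", "y", 5.0),
]
root_position = midpoint_root(toy_edges)
```

In this toy tree the longest path runs from A to D (length 7), so the root falls on the branch from y to D, 1.5 units from y. The result is only meaningful to the extent that the ultrametricity assumption holds.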

Largest variation = Most ancient

Species Divergence Times

If we know T1 and the rate of evolution, then we can infer T2.

If we know T2 and the rate of evolution, then we can infer T1.

$$r = \frac{K_{AC} + K_{BC}}{4T_1}$$

If T1 is known:

$$T_2 = \frac{K_{AB}}{2r} = \frac{2K_{AB}T_1}{K_{AC} + K_{BC}}$$

If T2 is known:

$$T_1 = \frac{(K_{AC} + K_{BC})T_2}{2K_{AB}}$$

•Dating divergence events requires paleontological calibrations.

•This is a complicated problem.
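The divergence-time formulas can be checked with a small numerical sketch. It assumes the usual reading of the symbols: A and B are the two most closely related species, C is more distant, T1 is the time since C diverged from the A-B lineage, T2 is the time since A and B diverged, and each K is a number of substitutions per site under a constant rate r. The function names and the numbers are invented for illustration.

```python
def substitution_rate(k_ac, k_bc, t1):
    """r = (K_AC + K_BC) / (4 * T1), using T1 as the calibration point."""
    return (k_ac + k_bc) / (4.0 * t1)


def infer_t2(k_ab, k_ac, k_bc, t1):
    """T2 = K_AB / (2r), with r estimated from the known T1."""
    r = substitution_rate(k_ac, k_bc, t1)
    return k_ab / (2.0 * r)


def infer_t1(k_ab, k_ac, k_bc, t2):
    """T1 = (K_AC + K_BC) * T2 / (2 * K_AB), using T2 as the calibration point."""
    return (k_ac + k_bc) * t2 / (2.0 * k_ab)


# Hypothetical distances (substitutions per site) and a calibration of T1 = 100.
t2_estimate = infer_t2(k_ab=0.1, k_ac=0.2, k_bc=0.2, t1=100.0)
t1_estimate = infer_t1(k_ab=0.1, k_ac=0.2, k_bc=0.2, t2=t2_estimate)
```

With these made-up values the rate is 0.001 per unit time, T2 comes out at half of T1, and feeding that T2 back in recovers the original T1, which is a quick consistency check on the two formulas.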

Topological comparisons

• Topological comparisons entail measuring the similarity or dissimilarity among tree topologies.

• The need to compare topologies may arise when dealing with trees that have been inferred from analyses of different sets of data or from different types of analysis of the same data set.

• When two trees derived from different data sets or different methodologies are identical, they are said to be congruent.

• Congruence can sometimes be partial, i.e., limited to some parts of the trees, other parts being incongruent.

Penny and Hendy's topological distance (dT)

A commonly used measure of dissimilarity between two tree topologies. The measure is based on tree partitioning.

dT = 2c

c = the number of partitions resulting in different divisions of the OTUs in the two tree topologies under consideration.
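A minimal sketch of the partition-based distance follows. It assumes both trees are bifurcating, carry the same OTUs, and are written as nested tuples of OTU names; that representation and the function names are choices made for this example. Each internal branch is recorded as the division of OTUs it induces, c is counted as the divisions of the first tree that are absent from the second, and dT = 2c.

```python
def partitions(tree):
    """Return the set of OTU divisions induced by the internal branches of a tree.

    tree: nested tuples of OTU names, e.g. (("A", "B"), ("C", ("D", "E"))).
    Each division is stored as a frozenset holding the two sides of the split.
    """
    def otus(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(otus(child) for child in node))

    all_otus = otus(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_otus - below
        # Branches leading to a single OTU divide every tree the same way, so skip them.
        if len(below) > 1 and len(other) > 1:
            found.add(frozenset([below, other]))
        return below

    visit(tree)
    return found


def topological_distance(tree_a, tree_b):
    """dT = 2c, where c counts divisions of tree_a that tree_b lacks."""
    c = len(partitions(tree_a) - partitions(tree_b))
    return 2 * c


tree_1 = (("A", "B"), ("C", ("D", "E")))
tree_2 = (("A", "C"), ("B", ("D", "E")))
tree_3 = (("A", "D"), ("C", ("B", "E")))

d_12 = topological_distance(tree_1, tree_2)  # the trees share the D,E division
d_13 = topological_distance(tree_1, tree_3)  # the trees share no division
```

For two bifurcating trees on the same OTUs, 2c equals the number of divisions found in exactly one of the two trees, so the measure is symmetric.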

Trees inferred from the analysis of a particular data set are called fundamental trees, i.e., they summarize the phylogenetic information in a data set.

Consensus trees are trees that summarize the phylogenetic information in a set of fundamental trees.

•In a strict consensus tree, all conflicting branching patterns are collapsed into multifurcations.

•In an X% majority-rule consensus tree, a branching pattern that occurs with a frequency of X% or more is adopted.

•When X = 100%, the majority-rule consensus tree will be identical with the strict consensus tree.
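The two consensus rules can be sketched as a frequency filter over groups of OTUs. In this example each fundamental tree is reduced to the set of groups (clades) it contains; that representation, the function names, and the three toy trees are assumptions for illustration only.

```python
from collections import Counter


def majority_rule_groups(trees, threshold):
    """Keep every group whose frequency among the trees is at least `threshold`.

    trees: list of sets of groups, each group a frozenset of OTU names.
    threshold: fraction between 0 and 1 (X% expressed as X / 100).
    """
    counts = Counter(group for tree in trees for group in set(tree))
    return {group for group, count in counts.items() if count / len(trees) >= threshold}


def strict_consensus_groups(trees):
    """Strict consensus: the majority rule with X = 100%."""
    return majority_rule_groups(trees, threshold=1.0)


# Three hypothetical fundamental trees on OTUs A-E, written as their groups.
fundamental_trees = [
    {frozenset("AB"), frozenset("DE"), frozenset("ABC")},
    {frozenset("AB"), frozenset("DE"), frozenset("ABC")},
    {frozenset("AB"), frozenset("CD"), frozenset("CDE")},
]

majority = majority_rule_groups(fundamental_trees, threshold=0.6)
strict = strict_consensus_groups(fundamental_trees)
```

Here the group A,B occurs in all three trees and survives both rules, while D,E and A,B,C occur in two of three trees and survive only the 60% rule. Groups left out correspond to branches collapsed into multifurcations. Thresholds above 50% guarantee that the retained groups are mutually compatible.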

A tree is an evolutionary hypothesis

Q: How can we ascertain that the methodology we have used yields reliable results?

A: We can test the methodology on a phylogeny that is known for certain to be true, and compare the inferred phylogeny with the true phylogeny.

Caminalcules are a group of artificial organisms (belonging to the genus Caminalculus) that were invented by Dr. Joseph H. Camin from the University of Kansas.

Interested in how taxonomists group species, he designed these creatures to show an evolutionary pattern of divergence and diversification in morphology. There are 29 recent “species” of Caminalculus and 48 fossil forms.

The Caminalcules first appeared in print in the journal Systematic Zoology (now Systematic Biology) in 1983, four years after Camin's death in 1979. The first four papers on Caminalcules were written by Robert R. Sokal.

Joseph H. Camin (1922–1979)

[Figure: Caminalcules, extant and extinct forms.]

Assessing tree reliability

Phylogenetic reconstruction is a problem of statistical inference. One must assess the reliability of the inferred phylogeny and its component parts.

Questions:

(1) How reliable is the tree?

(2) Which parts of the tree are reliable?

(3) Is this tree significantly better than another one?

Bootstrapping

•A statistical technique that uses intensive random resampling of data to estimate a statistic whose underlying distribution is unknown.

•Characters are resampled with replacement to create many bootstrap replicate data sets (pseudosamples).

•Each bootstrap replicate data set is analyzed.

•Frequency of occurrence of a group (bootstrap proportions) is a measure of support for the group.
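The resampling procedure can be sketched in a few lines. Characters (alignment columns) are drawn with replacement, each pseudosample is analyzed, and the frequency of each resulting group is tallied. The analysis step here is a deliberately crude stand-in, not a real tree-building method: it simply reports the pair of taxa with the fewest differences. All names and the toy alignment are assumptions made for this example.

```python
import random
from collections import Counter
from itertools import combinations


def bootstrap_replicate(alignment, rng):
    """Resample the columns of an alignment with replacement.

    alignment: dict mapping taxon name to an aligned sequence string.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


def closest_pair(alignment):
    """Toy stand-in for tree inference: the pair of taxa with the fewest differences."""
    def differences(a, b):
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))

    pairs = combinations(sorted(alignment), 2)
    return frozenset(min(pairs, key=lambda pair: differences(*pair)))


def bootstrap_proportions(alignment, infer_group, n_replicates=200, seed=0):
    """Frequency with which each inferred group appears among the pseudosamples."""
    rng = random.Random(seed)
    counts = Counter(
        infer_group(bootstrap_replicate(alignment, rng)) for _ in range(n_replicates)
    )
    return {group: count / n_replicates for group, count in counts.items()}


# Toy alignment: A and B are identical, C differs from both at every site.
toy_alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAC",
    "C": "TGCATGCATG",
}
proportions = bootstrap_proportions(toy_alignment, closest_pair)
```

Because A and B are identical at every site in this toy alignment, every pseudosample groups them together and the bootstrap proportion of that group is 1.0. With real data, closest_pair would be replaced by the actual inference method and the proportions would be read off for every group in the tree.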

Bootstrapping - an example

Ciliate SSUrDNA - parsimony bootstrap

Partition Table

123456789    Freq
-----------------
.**......  100.00
...**....  100.00
.....**..  100.00
...****..  100.00
...******   95.50
.......**   23.33
...****.*   11.83
...*****.    3.83
.*******.    2.50
.**....*.    1.00
.**.....*    1.00

Taxa: Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7), Euplotes (8), Tetrahymena (9)

[Figure: bootstrap tree of the nine taxa with support values 100, 96, 23, 100, 100, and 100 on its internal branches.]

Reduction of a phylogenetic tree by the collapsing of internal branches associated with bootstrap values that are lower than a critical value (C).

(a) Gene tree for -tubulin (b) C = 50% (c) C = 90%
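Collapsing branches below a critical value amounts to filtering groups by their bootstrap support. The sketch below applies that filter to the six best-supported groups from the ciliate partition table shown earlier, using the table's own dot-and-asterisk patterns as keys. The function name is an assumption made for this example.

```python
def collapse_low_support(group_support, critical_value):
    """Keep only the groups whose bootstrap value is at least the critical value C.

    group_support: dict mapping a group (here a partition pattern) to its
    bootstrap frequency in percent. Groups left out correspond to internal
    branches that are collapsed into multifurcations.
    """
    return {
        group: support
        for group, support in group_support.items()
        if support >= critical_value
    }


# Six best-supported groups from the ciliate partition table (frequencies in %).
ciliate_support = {
    ".**......": 100.00,
    "...**....": 100.00,
    ".....**..": 100.00,
    "...****..": 100.00,
    "...******": 95.50,
    ".......**": 23.33,
}

kept_at_50 = collapse_low_support(ciliate_support, critical_value=50)
kept_at_96 = collapse_low_support(ciliate_support, critical_value=96)
```

At C = 50% only the group of taxa 8 and 9 (23.33%) is collapsed; raising C to 96% also collapses the group supported at 95.50%.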

Tests for two competing trees

•All these tests use the null hypothesis that the differences between two trees (A and B) are no greater than expected by chance (from the sampling error).

Likelihood Ratio Test

•Likelihood of Hypothesis 1 = L1

•Likelihood of Hypothesis 2 = L2

$$\Delta = 2(\ln L_1 - \ln L_2)$$

•Compare Δ to a χ² distribution or to a simulated distribution.
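A small sketch of the comparison step: it computes the statistic 2(ln L1 - ln L2) and converts it to a p-value under a χ² distribution. Two assumptions are made that the slide does not state: hypothesis 2 is taken to be nested within hypothesis 1, and the two differ by a single free parameter, so one degree of freedom is used. For one degree of freedom the upper-tail probability has the closed form erfc(sqrt(x / 2)), which avoids any third-party library. The log-likelihood values are invented.

```python
import math


def likelihood_ratio_statistic(ln_l1, ln_l2):
    """Test statistic 2 * (ln L1 - ln L2) from two log-likelihoods."""
    return 2.0 * (ln_l1 - ln_l2)


def chi2_p_value_df1(statistic):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom.

    Assumes the two hypotheses differ by exactly one free parameter.
    """
    if statistic < 0:
        raise ValueError("statistic must be non-negative; L1 should be the larger likelihood")
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical log-likelihoods for the two hypotheses.
delta = likelihood_ratio_statistic(ln_l1=-1000.0, ln_l2=-1003.0)
p_value = chi2_p_value_df1(delta)
reject_null = p_value < 0.05
```

With these made-up values the statistic is 6.0, which exceeds the familiar 5% critical value of about 3.84 for one degree of freedom. When the hypotheses are not nested, as is often the case for two competing topologies, the χ² approximation does not apply and the simulated distribution mentioned on the slide is the safer reference.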

Reliability of Phylogenetic Methods

• Phylogenetic methods can also be evaluated in terms of their general performance, particularly their:

consistency - approach the truth with more data

efficiency - how quickly can they handle how much data

robustness - how sensitive to violations of assumptions

Problems with long branches

With long branches, most methods may yield erroneous trees. For example, the maximum-parsimony method tends to cluster long branches together. This phenomenon is called long-branch attraction, and the range of branch-length conditions under which it occurs is known as the Felsenstein zone.
