gene tree discordance and multi-species coalescent models
![Page 1: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/1.jpg)
Gene tree discordance and multi-species coalescent models
Noah Rosenberg
December 21, 2007
James Degnan, Randa Tao, David Bryant, Mike DeGiorgio
![Page 2: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/2.jpg)
Gene trees and species trees
Different genes may produce different inferences about species relationships
![Page 3: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/3.jpg)
Coalescent model for evolution within species, conditional on the species tree
Hudson (1983, Evolution)
Tajima (1983, Genetics)
Nei (1987, Molecular Evolutionary Genetics book)
Pamilo & Nei (1988, Molecular Biology and Evolution)
Takahata (1989, Genetics)
Wu (1991, Genetics)
Hudson (1992, Genetics)
Maddison (1997, Systematic Biology)
[Figure: species tree with branch lengths T2 and T3]
![Page 4: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/4.jpg)
Assumptions of the multispecies coalescent model conditional on a species tree
1. Coalescences occur within species, with the same rate for each lineage pair.
2. The rate of coalescence is proportional to the number of pairs of lineages.
3. When species splits are encountered, lineages from all groups descended from the split are allowed to coalesce.
[Figure: species tree with branch lengths T2 and T3]
![Page 5: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/5.jpg)
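The probability that i lineages have j ancestors at T coalescent time units (T = t / N) in the past is
$$g_{ij}(T) = \sum_{k=j}^{i} e^{-k(k-1)T/2} \, \frac{(2k-1)(-1)^{k-j} \, j_{(k-1)} \, i_{[k]}}{j! \, (k-j)! \, i_{(k)}}$$
where $a_{[k]} = a(a-1)\cdots(a-k+1)$ and $a_{(k)} = a(a+1)\cdots(a+k-1)$, with $a_{(0)} = 1$.
Takahata and Nei (1985, Genetics)
Tavaré (1984, Theoretical Population Biology)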
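A minimal sketch of how this probability can be evaluated numerically. It assumes the standard closed form attributed to Tavaré (1984), written out in the docstring of `g`; the function names `falling`, `rising`, and `g` are illustrative and do not come from the slides.

```python
from math import exp, factorial

def falling(a, k):
    """a[k] = a(a-1)...(a-k+1); the empty product (k = 0) is 1."""
    out = 1
    for m in range(k):
        out *= a - m
    return out

def rising(a, k):
    """a(k) = a(a+1)...(a+k-1); the empty product (k = 0) is 1."""
    out = 1
    for m in range(k):
        out *= a + m
    return out

def g(i, j, T):
    """Probability that i lineages have j ancestors T coalescent units in the past.

    Assumed form: sum over k = j..i of
    exp(-k(k-1)T/2) (2k-1)(-1)^(k-j) j(k-1) i[k] / (j! (k-j)! i(k)).
    """
    total = 0.0
    for k in range(j, i + 1):
        sign = -1 if (k - j) % 2 else 1
        num = (2 * k - 1) * sign * rising(j, k - 1) * falling(i, k)
        den = factorial(j) * factorial(k - j) * rising(i, k)
        total += exp(-k * (k - 1) * T / 2) * num / den
    return total

# Two lineages coalesce within time T with probability 1 - exp(-T).
print(g(2, 1, 1.0), 1 - exp(-1.0))
# For fixed i, the probabilities over j = 1..i should sum to 1.
print(sum(g(3, j, 0.5) for j in (1, 2, 3)))
```

The two printed checks are sanity tests of the assumed formula: the two-lineage case should reduce to a single exponential waiting time, and the probabilities for three lineages should sum to one.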
![Page 6: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/6.jpg)
Probability of a concordant gene tree topology
[Figure: species tree on A B C with internal branch length T; a concordant gene tree and a discordant gene tree]
For 3 taxa, the probability of concordance is a sum of two terms:
1. The probability that the gene tree is determined in the 2-species phase, or $1 - e^{-T}$
2. 1/3 of the probability that the gene tree is determined in the ancestral phase, or $(1/3)e^{-T}$
Probability of concordance equals $1 - (2/3)e^{-T}$
Hudson (1983, Evolution)
Nei (1987, Molecular Evolutionary Genetics)
Tajima (1983, Genetics)
![Page 7: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/7.jpg)
Probability of the matching gene tree ((AB)C): $1 - (2/3)e^{-T}$
Probability of a particular discordant gene tree ((BC)A): $(1/3)e^{-T}$
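The three-taxon case can be sketched in a few lines. The function name `three_taxon_probabilities` is illustrative; the species tree is taken to be ((AB)C) with internal branch length T in coalescent units, as in the preceding slides.

```python
from math import exp

def three_taxon_probabilities(T):
    """Gene tree topology probabilities for species tree ((AB)C), internal branch T."""
    p_two_species = 1 - exp(-T)  # A and B coalesce before reaching the root
    p_ancestral = exp(-T)        # all three lineages enter the ancestral population
    return {
        "((AB)C)": p_two_species + p_ancestral / 3,  # matching gene tree
        "((AC)B)": p_ancestral / 3,                  # discordant
        "((BC)A)": p_ancestral / 3,                  # discordant
    }

for T in (0.0, 0.5, 2.0):
    print(T, three_taxon_probabilities(T))
```

At T = 0 the three topologies are equally likely, and the matching topology is never less probable than either discordant one, which is why three taxa produce no anomalous gene trees.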
![Page 8: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/8.jpg)
It would be desirable to have a general computation of the probability that a particular species tree topology with branch lengths gives rise to a particular gene tree topology
![Page 9: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/9.jpg)
Gene tree probabilities under the multispecies coalescent model
Consider a species tree S (topology and branch lengths)
Consider a gene tree G (topology only)
A coalescent history gives the list of species tree branches on which gene tree coalescences occur.
[Figure: two trees on A B C]
JH Degnan & LA Salter, Evolution 59: 24-37 (2005)
![Page 10: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/10.jpg)
The list of coalescent histories for an example with five taxa
[Figure: species tree on A B C D E and gene tree on A C B D E, with labels 4, 3, 2, 1]
Table columns: (A,C) | ((AC),B) | (D,E) | (((AC)B),(DE)) | Probability
$g_{ij}(T)$ is the probability that i lineages coalesce to j lineages during time T
![Page 11: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/11.jpg)
Computing the probabilities of gene trees
• What are the properties of the number of coalescent histories?
Using the probabilities of gene trees
• Is it possible for the most likely gene tree to disagree with the species tree?
• How do species tree inference algorithms behave when applied to multiple gene trees?
![Page 12: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/12.jpg)
The number of coalescent histories
![Page 13: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/13.jpg)
The number of coalescent histories for the matching gene tree
[Figure: species tree on A B C D E F with labels 1 to 8]
$A_{S,m}$ is the number of coalescent histories for the matching gene tree when we subdivide the species tree root into m pieces
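One way to compute this quantity is a recursion over the two subtrees of the root. The recursion below is an assumption, since it is not stated on the slide: if the root coalescence falls in piece k of the m root pieces, each subtree sees its own root branch extended to k + 1 pieces. Trees are encoded as nested 2-tuples with string leaves, and the names `histories` and `caterpillar` are illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def histories(tree, m=1):
    """Assumed recursion for A_{S,m}; tree is a leaf name or a pair of subtrees."""
    if isinstance(tree, str):  # a single leaf contributes one (empty) history
        return 1
    left, right = tree
    return sum(
        histories(left, k + 1) * histories(right, k + 1)
        for k in range(1, m + 1)
    )

def caterpillar(n):
    """Build the n-taxon caterpillar ((((t1,t2),t3),...),tn)."""
    tree = "t1"
    for idx in range(2, n + 1):
        tree = (tree, f"t{idx}")
    return tree

# Caterpillar trees with 2..9 taxa.
print([histories(caterpillar(n)) for n in range(2, 10)])
# Caterpillar-like tree built on a symmetric 4-taxon subtree.
print(histories(((("A", "B"), ("C", "D")), "E")))
```

Under this assumption the caterpillar counts agree with the Catalan values listed later in the slides, and the five-taxon tree (((AB)(CD))E) gives 13, which also appears there.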
![Page 14: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/14.jpg)
The number of coalescent histories for trees with at most 5 taxa
![Page 15: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/15.jpg)
Number of coalescent histories for special shapes with n taxa
Catalan number $C_{n-1}$ (Degnan 2005)
1, 2, 5, 14, 42, 132, 429, 1430…
Number of taxa in left subtree is l
-, -, -, 13, 42, 138, 462, 1573…
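The Catalan values in the first sequence can be reproduced directly. This sketch assumes the listed values start at n = 2 taxa; the helper name `catalan` is illustrative.

```python
from math import comb

def catalan(k):
    """k-th Catalan number, C_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)

# C_{n-1} for n = 2..9 taxa.
print([catalan(n - 1) for n in range(2, 10)])
```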
![Page 16: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/16.jpg)
The number of coalescent histories for up to 11 taxa
![Page 17: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/17.jpg)
Ratio of the largest and smallest number of coalescent histories for n taxa
![Page 18: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/18.jpg)
Which types of shapes have the most coalescent histories?
The number of coalescent histories for trees with 8 taxa
[Figure: tree shapes with 8 taxa, ranked from most to least coalescent histories]
![Page 19: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/19.jpg)
Caterpillar-like shapes with n taxa, based on 4- and 5-taxon subtrees
$C_{n-1}$
$\sim (5/4)\,C_{n-1} = 1.25\,C_{n-1}$
$\sim (23/16)\,C_{n-1} = 1.4375\,C_{n-1}$
![Page 20: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/20.jpg)
Largest values for caterpillar-like shapes based on 7- and 8-taxon subtrees
$\sim (1381/256)\,C_{n-1} = 5.39453125\,C_{n-1}$
$\sim (189/64)\,C_{n-1} = 2.953125\,C_{n-1}$
![Page 21: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/21.jpg)
Can a non-matching gene tree have more coalescent histories?
Caterpillar species tree
Matching gene tree: 1430 coalescent histories
A non-matching gene tree: 1441 coalescent histories
![Page 22: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/22.jpg)
Computing the probabilities of gene trees
• What are the properties of the number of coalescent histories?
Using the probabilities of gene trees
• Is it possible for the most likely gene tree to disagree with the species tree?
• How do species tree inference algorithms behave when applied to multiple gene trees?
![Page 23: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/23.jpg)
For n > 3 taxa, can species trees be discordant with the gene trees they are most likely to produce?
![Page 24: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/24.jpg)
The labeled history for a gene tree is its sequence of coalescence events.
The two labeled histories below produce the same labeled topology ((AB)(CD)).
[Figure: two gene trees on A B C D that differ in which of (AB) and (CD) coalesces first]
Randomly joining pairs of lineages leads to a uniform distribution over the set of possible labeled histories.
The number of labeled histories possible for four taxa is $\binom{4}{2}\binom{3}{2}\binom{2}{2} = 18$.
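The count of labeled histories follows from choosing one pair to join at each step. A small sketch, with the illustrative function name `labeled_histories`:

```python
from math import comb, factorial

def labeled_histories(n):
    """Number of ordered sequences of pairwise joins that reduce n lineages to 1."""
    count = 1
    for k in range(n, 1, -1):
        count *= comb(k, 2)  # choices of which pair coalesces among k lineages
    return count

for n in range(2, 7):
    # The product of binomials equals n! (n-1)! / 2^(n-1).
    print(n, labeled_histories(n), factorial(n) * factorial(n - 1) // 2 ** (n - 1))
```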
![Page 25: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/25.jpg)
If the branch lengths of the species tree are sufficiently short, coalescences will occur more anciently than the species tree root.
[Figure: species tree on A B C D with branch lengths T2 and T3, and three gene tree labeled histories on A B C D]
Combined probability 1/9
Probability 1/18
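The values 1/9 and 1/18 can be checked by enumerating every labeled history for four lineages that all reach the ancestral population, where pairs join uniformly at random. This is a sketch; the helper names `join` and `final_topologies` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def join(x, y):
    """Join two clades, writing the larger clade first (ties: smaller leaf first)."""
    def key(clade):
        leaves = [ch for ch in clade if ch.isalpha()]
        return (-len(leaves), min(leaves))
    first, second = sorted((x, y), key=key)
    return f"({first}{second})"

def final_topologies(lineages):
    """Yield the resulting labeled topology once for every labeled history."""
    if len(lineages) == 1:
        yield lineages[0]
        return
    for i, j in combinations(range(len(lineages)), 2):
        rest = [c for k, c in enumerate(lineages) if k not in (i, j)]
        yield from final_topologies(rest + [join(lineages[i], lineages[j])])

counts = Counter(final_topologies(["A", "B", "C", "D"]))
total = sum(counts.values())
probability = {topology: Fraction(n, total) for topology, n in counts.items()}

print(total, len(counts))
print(probability["((AB)(CD))"], probability["(((AB)C)D)"])
```

Each symmetric topology is reached by two labeled histories and each asymmetric topology by one, which is the source of the factor of two between the probabilities.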
![Page 26: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/26.jpg)
Gene tree frequency distribution
[Figure: species tree on A B C D; bar chart of gene tree frequencies on an axis up to 0.14, with the matching gene tree marked]
((AB)(CD)) 0.132
((AC)(BD)) 0.094
((AD)(BC)) 0.094
(((AB)C)D) 0.125 (matching gene tree)
(((AB)D)C) 0.100
(((AC)B)D) 0.070
(((AC)D)B) 0.062
(((AD)B)C) 0.032
(((AD)C)B) 0.032
(((BC)A)D) 0.070
(((BC)D)A) 0.062
(((BD)A)C) 0.032
(((BD)C)A) 0.032
(((CD)A)B) 0.032
(((CD)B)A) 0.032
![Page 27: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/27.jpg)
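[Figure: region of branch lengths T2 (units of N generations) and T3 for which the species tree is (((AB)C)D) and the most likely gene tree is not (((AB)C)D)]
[Figure: region of T2 and T3 for which the species tree is (((AB)C)D) but the most likely gene tree is ((AB)(CD))]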
A species tree topology produces anomalous gene trees if branch lengths can be chosen so that the most likely gene tree topology differs from the species tree topology.
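For the four-taxon species tree (((AB)C)D), the probabilities of the matching gene tree and of ((AB)(CD)) can be written out by conditioning on how many lineages leave each internal branch. The sketch below is derived under the model assumptions listed earlier and is not taken from the slides. The names `t_ab` (branch ancestral to A and B only) and `t_abc` (branch ancestral to A, B, and C) are illustrative, and no mapping onto the slide labels T2 and T3 is assumed.

```python
from math import exp

def asymmetric_four_taxon(t_ab, t_abc):
    """Probabilities of two gene trees for species tree (((AB)C)D).

    Branch lengths are in coalescent units.
    """
    no_ab = exp(-t_ab)                     # A and B fail to coalesce on their branch
    e1, e3 = exp(-t_abc), exp(-3 * t_abc)
    g21, g22 = 1 - e1, e1                  # 2 lineages -> 1 or 2 on the ABC branch
    g31 = 1 - 1.5 * e1 + 0.5 * e3          # 3 lineages -> 1
    g32 = 1.5 * e1 - 1.5 * e3              # 3 lineages -> 2
    g33 = e3                               # 3 lineages -> 3

    matching = (
        (1 - no_ab) * (g21 + g22 / 3)
        + no_ab * (g31 / 3 + g32 / 9 + g33 / 18)
    )
    symmetric = (
        (1 - no_ab) * g22 / 3
        + no_ab * (g32 / 9 + g33 / 9)
    )
    return {"(((AB)C)D)": matching, "((AB)(CD))": symmetric}

for lengths in ((0.0, 0.0), (0.1, 0.1), (1.0, 1.0)):
    print(lengths, asymmetric_four_taxon(*lengths))
```

With both branches at length zero the values reduce to 1/18 and 1/9. For short branches the symmetric gene tree stays ahead of the matching one, which is the anomalous situation described above; for long branches the matching gene tree dominates.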
![Page 28: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/28.jpg)
Does the 4-taxon symmetric species tree topology produce anomalous gene trees?
[Figure: species tree on A B C D with branch lengths T2 and T3, and three gene tree labeled histories on A B C D]
Combined probability 1/9
Probability 1/18
![Page 29: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/29.jpg)
• 3 species – no anomalous gene trees.
• 4 species – asymmetric but not symmetric species trees have AGTs.
• 5 or more species?
Probability of the concordant gene tree
Probability of a particular discordant gene tree
![Page 30: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/30.jpg)
With 5 or more species, any species tree topology produces at least one anomalous gene tree.
A labeled topology for n taxa is n-maximally probable if its probability under random branching is greater than or equal to that of any other labeled topology with n taxa.
[Figure: example tree topologies on 4 taxa (A B C D), 5 taxa (A B C D E), and 6 taxa (A B C D E F)]
Proof:
For n > 4, suppose a species tree topology is not n-maximally probable.
If its branches are short enough, it produces AGTs that are n-maximally probable.
![Page 31: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/31.jpg)
With 5 or more species, any species tree topology produces at least one anomalous gene tree.
Proof (continued):
Suppose a species tree topology is n-maximally probable.
For n=5, 6, 7, or 8 taxa it remains to show that the n-maximally probable species tree topologies produce AGTs.
For n > 8 an inductive argument reduces the problem to the case of n=5, 6, 7, or 8.
![Page 32: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/32.jpg)
With 5 or more species, any species tree topology produces at least one anomalous gene tree.
Proof (continued):
For n=5 the n-maximally probable species tree topology produces AGTs.
![Page 33: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/33.jpg)
With 5 or more species, any species tree topology produces at least one anomalous gene tree.
Proof (continued):
For n=5, 6, 7, or 8 the n-maximally probable species tree topologies produce AGTs.
![Page 34: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/34.jpg)
With 5 or more species, any species tree topology produces at least one anomalous gene tree.
Proof (continued):
An inductive argument for n > 8 reduces the problem to the case of n=5, 6, 7, or 8.
For n > 8 one of the two most basal subtrees has between 5 and n-1 taxa inclusive.
Choose branch lengths to produce an AGT for that subtree, and make them long for the other subtree.
[Figure: example species tree, with taxa including G H I J]
![Page 35: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/35.jpg)
With 5 or more species, any species tree topology produces at least one anomalous gene tree.
Proof (summary):
If the species tree topology is not n-maximally probable, it has maximally probable AGTs.
By example, n-maximally probable species tree topologies produce AGTs for n=5, 6, 7, or 8.
For n > 8, induction reduces the problem to the case of n=5, 6, 7, or 8.
This completes the proof.
![Page 36: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/36.jpg)
Some properties of anomalous gene trees
![Page 37: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/37.jpg)
Anomalous gene trees can have the same unlabeled shape as the species tree
[Figure: species tree with leaf order A B C D E; gene tree with leaf order D E C A B]
![Page 38: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/38.jpg)
There exist mutually anomalous sets of tree topologies (“wicked forests”).
![Page 39: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/39.jpg)
AGTs can occur if some but not all species tree branches are short
[Figure: species tree with branch lengths T2, T3, and T4]
![Page 40: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/40.jpg)
Does the severity of AGTs increase with more taxa?
[Figure: plot over branch lengths T2 (units of N generations) and T3]
Maximal value for shared branch length that still produces AGTs: 0.1568
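The quoted threshold can be approximated numerically for the four-taxon asymmetric species tree (((AB)C)D). The sketch below assumes both internal branches share a length T and uses closed-form probabilities derived under the model assumptions, not taken from the slides. The function `advantage` (an illustrative name) is the probability of ((AB)(CD)) minus that of the matching gene tree, after cancelling the terms common to both.

```python
from math import exp

def advantage(T):
    """P[((AB)(CD))] - P[(((AB)C)D)] when both internal branches have length T."""
    e1, e3 = exp(-T), exp(-3 * T)
    g31 = 1 - 1.5 * e1 + 0.5 * e3  # three lineages coalesce to one within time T
    # Surviving terms: the extra labeled history for the symmetric tree at the root,
    # minus the cases in which the matching tree is fixed below the root.
    return e1 * e3 / 18 - (1 - e1) ** 2 - e1 * g31 / 3

def shared_length_threshold(lo=0.01, hi=1.0, tol=1e-10):
    """Bisection for the T at which the advantage changes sign."""
    assert advantage(lo) > 0 > advantage(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if advantage(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(shared_length_threshold())
```

Under these assumptions the root lies close to the 0.1568 quoted on the slide; below it the symmetric gene tree is more probable than the matching one.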
![Page 41: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/41.jpg)
Does the severity of AGTs increase with more taxa?
![Page 42: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/42.jpg)
Number of AGTs for the 4-taxon asymmetric species tree
![Page 43: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/43.jpg)
Number of AGTs for 5-taxon species trees
![Page 44: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/44.jpg)
Does the number of AGTs increase with more taxa?
![Page 45: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/45.jpg)
What implications do gene tree probabilities have for phylogenetic inference algorithms?
![Page 46: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/46.jpg)
• Most commonly observed gene tree topology
Statistically inconsistent in estimating the species tree
[Figure: species tree on A B C D with branch lengths T2 and T3, and the estimated species tree on A B C D; plot over T2 (units of N generations) and T3]
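The inconsistency can be illustrated with the gene tree frequency distribution shown earlier in the slides for species tree (((AB)C)D). As the number of loci grows, observed frequencies approach these probabilities, so an estimator that picks the most common topology converges on the most probable gene tree. The variable names below are illustrative.

```python
# Gene tree topology probabilities copied from the frequency distribution slide.
freqs = {
    "((AB)(CD))": 0.132, "((AC)(BD))": 0.094, "((AD)(BC))": 0.094,
    "(((AB)C)D)": 0.125, "(((AB)D)C)": 0.100, "(((AC)B)D)": 0.070,
    "(((AC)D)B)": 0.062, "(((AD)B)C)": 0.032, "(((AD)C)B)": 0.032,
    "(((BC)A)D)": 0.070, "(((BC)D)A)": 0.062, "(((BD)A)C)": 0.032,
    "(((BD)C)A)": 0.032, "(((CD)A)B)": 0.032, "(((CD)B)A)": 0.032,
}
species_tree = "(((AB)C)D)"

# Large-sample limit of the "most commonly observed topology" estimator.
estimate = max(freqs, key=freqs.get)

print("estimate:", estimate)
print("matches species tree:", estimate == species_tree)
```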
![Page 47: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/47.jpg)
• Estimated gene tree of concatenated sequence
Statistically inconsistent in estimating the species tree
![Page 48: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/48.jpg)
• Maximum likelihood based on the frequency distribution of gene tree topologies
Statistically consistent even when anomalous gene trees exist
Gene tree frequency distribution
[Figure: species tree on A B C D; bar chart of gene tree frequencies on an axis up to 0.14, with the matching gene tree and the anomalous gene tree marked]
((AB)(CD)) 0.132 (anomalous gene tree)
((AC)(BD)) 0.094
((AD)(BC)) 0.094
(((AB)C)D) 0.125 (matching gene tree)
(((AB)D)C) 0.100
(((AC)B)D) 0.070
(((AC)D)B) 0.062
(((AD)B)C) 0.032
(((AD)C)B) 0.032
(((BC)A)D) 0.070
(((BC)D)A) 0.062
(((BD)A)C) 0.032
(((BD)C)A) 0.032
(((CD)A)B) 0.032
(((CD)B)A) 0.032
![Page 49: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/49.jpg)
• Consensus among gene tree topologies
- Majority rule consensus
- Greedy consensus
- Rooted triple consensus (R*)
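The first step of a rooted triple approach can be sketched on the gene tree frequency distribution shown earlier for species tree (((AB)C)D). For every set of three taxa, the code sums the probability of the gene trees that support each of the three possible rooted triples and keeps the best supported one. This is only an illustration: the step that assembles a tree from the winning triples is omitted, and the helper names `clades` and `rooted_triple` are illustrative.

```python
from collections import defaultdict
from itertools import combinations

freqs = {
    "((AB)(CD))": 0.132, "((AC)(BD))": 0.094, "((AD)(BC))": 0.094,
    "(((AB)C)D)": 0.125, "(((AB)D)C)": 0.100, "(((AC)B)D)": 0.070,
    "(((AC)D)B)": 0.062, "(((AD)B)C)": 0.032, "(((AD)C)B)": 0.032,
    "(((BC)A)D)": 0.070, "(((BC)D)A)": 0.062, "(((BD)A)C)": 0.032,
    "(((BD)C)A)": 0.032, "(((CD)A)B)": 0.032, "(((CD)B)A)": 0.032,
}

def clades(topology):
    """Leaf sets of all internal nodes, innermost first, e.g. for '(((AB)C)D)'."""
    stack, found = [], []
    for ch in topology:
        if ch == "(":
            stack.append(set())
        elif ch == ")":
            done = stack.pop()
            found.append(frozenset(done))
            if stack:
                stack[-1] |= done
        else:
            stack[-1].add(ch)
    return found

def rooted_triple(topology, triple):
    """Return the two taxa of the triple that are grouped together in the tree."""
    for clade in clades(topology):
        inside = clade & set(triple)
        if len(inside) == 2:
            return frozenset(inside)
    raise ValueError("topology is not binary on this triple")

support = defaultdict(float)
for topology, p in freqs.items():
    for triple in combinations("ABCD", 3):
        support[(triple, rooted_triple(topology, triple))] += p

winners = {}
for triple in combinations("ABCD", 3):
    options = {pair: s for (t, pair), s in support.items() if t == triple}
    winners[triple] = max(options, key=options.get)

for triple, pair in winners.items():
    print("".join(triple), "->", "".join(sorted(pair)), "grouped")
```

On this distribution every winning triple is the one displayed by (((AB)C)D), even though the single most frequent gene tree is ((AB)(CD)).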
![Page 50: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/50.jpg)
• Tree obtained by agglomeration using minimum pairwise coalescence times across a large number of loci (“Glass tree”)
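A minimal sketch of this kind of agglomeration, assuming that "minimum pairwise coalescence times" means the smallest value for each taxon pair over all loci and that clusters are merged by single linkage. The function name `glass_tree` and the two-locus toy data are hypothetical and serve only to make the example runnable.

```python
from itertools import combinations

def glass_tree(taxa, loci):
    """Agglomerate taxa using the minimum pairwise coalescence time across loci."""
    min_time = {
        frozenset(pair): min(locus[frozenset(pair)] for locus in loci)
        for pair in combinations(taxa, 2)
    }
    clusters = {t: frozenset([t]) for t in taxa}  # label -> set of leaves

    def linkage(pair):
        # Single linkage: smallest leaf-to-leaf value between two clusters.
        x, y = pair
        return min(min_time[frozenset((a, b))]
                   for a in clusters[x] for b in clusters[y])

    while len(clusters) > 1:
        x, y = min(combinations(sorted(clusters), 2), key=linkage)
        clusters[f"({x}{y})"] = clusters.pop(x) | clusters.pop(y)
    return next(iter(clusters))

# Hypothetical coalescence times (arbitrary units) at two loci; the second
# locus has a discordant gene tree in which A and C coalesce first.
raw_loci = [
    {"AB": 0.5, "AC": 1.2, "BC": 1.2, "AD": 2.0, "BD": 2.0, "CD": 2.0},
    {"AB": 1.5, "AC": 1.1, "BC": 1.5, "AD": 2.5, "BD": 2.5, "CD": 2.5},
]
loci = [{frozenset(k): v for k, v in locus.items()} for locus in raw_loci]

print(glass_tree(["A", "B", "C", "D"], loci))
```

In this toy example the discordant locus does not change the result, because the minimum over loci for the pair A, B is still the smallest value overall.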
![Page 51: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/51.jpg)
Summary
There exist algorithms for computing gene tree probabilities on species trees
The number of coalescent histories increases quickly - algorithmic improvements in gene tree probability computations are likely possible
A species tree can disagree with the gene tree that it is most likely to produce
This severe discordance only gets worse with more taxa
However, some algorithms can infer the correct species tree even when gene tree discordance is extreme
![Page 52: Gene tree discordance and multi-species coalescent models](https://reader036.vdocuments.site/reader036/viewer/2022062409/56814655550346895db36cb6/html5/thumbnails/52.jpg)
Acknowledgments
David Bryant, Mike DeGiorgio, James Degnan, Randa Tao
National Science Foundation DEB-0716904