21 December 2007

Coalescent Consequences for Consensus Cladograms

J. H. Degnan (1), M. Degiorgio (2), D. Bryant (3), and N. A. Rosenberg (1,2)

1 Dept. of Human Genetics, U. of Michigan
2 Bioinformatics Program, U. of Michigan
3 Dept. of Mathematics, U. of Auckland

Outline
- Species trees vs. gene trees
- Consensus tree background
- Asymptotic consensus trees
- Finite sample consensus trees
- Consistency results
- Conclusions

Gene trees vary across the genome

Why? Incomplete lineage sorting, horizontal gene transfer, sampling, etc.

Gene tree discordance

From a single true species tree, we expect different gene trees at different loci as a result of lineage sorting, independently of any estimation or sampling error.

Gene tree discordance depends especially on the branch lengths of the species tree, measured as the number of generations t scaled by the effective population size N, that is, t/N.
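For three species, a standard coalescent result (not stated on the slide) makes this dependence on branch length concrete: if the internal branch of the species tree ((A,B),C) has length T in coalescent units, the matching gene tree has probability 1 - (2/3)exp(-T) and each of the two discordant gene trees has probability (1/3)exp(-T). The sketch below assumes T is already in coalescent units (the slide's t/N; some conventions divide by 2N for diploids), and the function name is hypothetical.

```python
from math import exp


def three_taxon_gene_tree_probs(T: float) -> tuple:
    """Gene tree probabilities under the coalescent for species tree ((A,B),C).

    T is the internal branch length in coalescent units (generations scaled
    by population size). Returns probabilities of ((A,B),C), ((A,C),B), ((B,C),A).
    """
    if T < 0:
        raise ValueError("branch length must be non-negative")
    discordant = exp(-T) / 3.0           # each of the two mismatching topologies
    return (1.0 - 2.0 * discordant, discordant, discordant)


for T in (0.1, 0.5, 1.0, 2.0):
    match, other, _ = three_taxon_gene_tree_probs(T)
    print(f"T={T}: P(match)={match:.3f}, P(each discordant)={other:.3f}")
```

As T shrinks toward 0, all three gene trees approach probability 1/3, which is why short internal branches produce the most discordance.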

[Figure: probabilities of the 15 rooted four-taxon gene trees, from GT:(((A,B),C),D) through GT:((A,D),(B,C)), in two panels: x = 2, y = 1.2 (axis 0 to 0.8) and x = y = 0.1 (axis 0 to 0.14).]
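The figure ranges over all 15 rooted binary gene tree topologies for four taxa. As an illustration (function names hypothetical), they can be enumerated by attaching each new taxon to every edge of the trees built so far, including a new edge above the root; the count follows (2n - 3)!! = 1 x 3 x 5 = 15 for n = 4.

```python
from typing import List, Tuple, Union

# A rooted binary tree as nested tuples of taxon names.
Tree = Union[str, Tuple["Tree", "Tree"]]


def add_leaf(tree: Tree, leaf: str) -> List[Tree]:
    """All trees obtained by attaching `leaf` to one edge of `tree` (or above its root)."""
    results: List[Tree] = [(tree, leaf)]          # new root above the old root
    if isinstance(tree, tuple):
        left, right = tree
        results += [(t, right) for t in add_leaf(left, leaf)]
        results += [(left, t) for t in add_leaf(right, leaf)]
    return results


def rooted_topologies(taxa: List[str]) -> List[Tree]:
    """Every rooted binary topology on `taxa`, built by stepwise leaf addition."""
    trees: List[Tree] = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [t for tree in trees for t in add_leaf(tree, leaf)]
    return trees


def newick(tree: Tree) -> str:
    """Canonical Newick-style string: children sorted so equal topologies print alike."""
    if isinstance(tree, str):
        return tree
    left, right = sorted((newick(tree[0]), newick(tree[1])))
    return f"({left},{right})"


trees = rooted_topologies(["A", "B", "C", "D"])
print(len(trees))  # 15
for name in sorted(newick(t) for t in trees):
    print(name)
```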

Consensus (majority-rule)

Types of consensus trees

Strict: the consensus tree contains only clades that appear in every observed tree. In the coalescent model, all clades have probability > 0, so with many loci the strict consensus tree is almost surely completely unresolved.

Democratic vote: use the gene tree that occurs most frequently.

Majority rule: the consensus tree contains every clade observed in more than 50% of the trees.

Greedy: sort clades by their observed proportions. Going from the most to the least frequent, accept each clade that is compatible with the clades already accepted. Continue until the tree is fully resolved.

R*: for each set of 3 taxa, find the most commonly occurring triple, e.g., (AB)C, (AC)B, or (BC)A. Build the tree from these most commonly occurring triples.
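The clade-counting rules above (strict, democratic vote, majority rule) fit in a few lines of Python. This is a minimal sketch, not the authors' software: it assumes rooted binary gene trees written as nested tuples, treats a clade as the set of taxa below an internal node, and all function names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Optional, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]   # rooted binary tree as nested tuples
Clade = FrozenSet[str]


def clades(tree: Tree) -> Set[Clade]:
    """Non-trivial clades of a rooted tree: every internal node except the root."""
    found: Set[Clade] = set()

    def walk(node: Tree) -> Clade:
        if isinstance(node, str):
            return frozenset([node])
        below = walk(node[0]) | walk(node[1])
        found.add(below)
        return below

    found.discard(walk(tree))  # the root clade (all taxa) is trivial
    return found


def strict_consensus(trees: List[Tree]) -> Set[Clade]:
    """Clades present in every observed tree."""
    return set.intersection(*(clades(t) for t in trees))


def majority_rule_consensus(trees: List[Tree]) -> Set[Clade]:
    """Clades present in more than half of the observed trees."""
    counts = Counter(c for t in trees for c in clades(t))
    return {c for c, k in counts.items() if k > len(trees) / 2}


def democratic_vote(trees: List[Tree]) -> Optional[FrozenSet[Clade]]:
    """Most frequent topology, identified by its clade set; None if the top count is tied."""
    counts = Counter(frozenset(clades(t)) for t in trees).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


gene_trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "C"), "D"),
    (("A", "B"), ("C", "D")),
    ((("A", "C"), "B"), "D"),
    ((("B", "A"), "D"), "C"),
]
print(strict_consensus(gene_trees))         # set(): no clade is in all five trees
print(majority_rule_consensus(gene_trees))  # {A,B} (4 of 5) and {A,B,C} (3 of 5)
print(democratic_vote(gene_trees))          # clade set of (((A,B),C),D), seen twice
```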

Asymptotic consensus trees

Consensus trees are usually statistics: functions of the data, like the sample mean x̄.

We consider replacing observed (estimated) gene tree frequencies with their theoretical probabilities under the coalescent and determining the resulting consensus tree.

Motivation: with a large number of independent loci, observed gene tree and clade proportions should approximate their theoretical probabilities.
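The motivation can be checked numerically. The sketch below draws independent loci from fixed gene tree probabilities and reports the largest gap between observed proportions and those probabilities. The inputs are an assumption for illustration: rounded coalescent probabilities (0.596, 0.202, 0.202) for a three-taxon species tree whose internal branch is 0.5 coalescent units; the function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict


def observed_proportions(probs: Dict[str, float], n_loci: int, seed: int = 0) -> Dict[str, float]:
    """Draw n_loci independent gene trees from `probs` and return observed proportions."""
    rng = random.Random(seed)
    labels = list(probs)
    draws = rng.choices(labels, weights=[probs[g] for g in labels], k=n_loci)
    counts = Counter(draws)
    return {g: counts[g] / n_loci for g in labels}


def max_deviation(probs: Dict[str, float], props: Dict[str, float]) -> float:
    """Largest absolute gap between an observed proportion and its probability."""
    return max(abs(props[g] - probs[g]) for g in probs)


# Assumed inputs: rounded three-taxon probabilities for an internal branch of 0.5.
probs = {"((A,B),C)": 0.596, "((A,C),B)": 0.202, "((B,C),A)": 0.202}
for n in (10, 100, 1_000, 10_000, 100_000):
    props = observed_proportions(probs, n, seed=1)
    print(f"{n:>7} loci: max deviation {max_deviation(probs, props):.4f}")
```

The deviation typically shrinks roughly like 1/sqrt(n), which is what justifies substituting probabilities for proportions when many loci are available.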

Tree/Clade Probability Examples

Gene tree      Probability  x = y = 0.1  x = y = 0.05
((AB)(CD))     p1           0.128        0.121
((AC)(BD))     p2           0.099        0.105
((AD)(BC))     p3           0.099        0.105
(((AB)C)D)     p4           0.104        0.079
(((AB)D)C)     p5           0.091        0.075
(((AC)B)D)     p6           0.066        0.061
(((AC)D)B)     p7           0.062        0.060
(((AD)B)C)     p8           0.037        0.045
(((AD)C)B)     p9           0.037        0.045
(((BC)A)D)     p10          0.066        0.061
(((BC)D)A)     p11          0.062        0.060
(((BD)A)C)     p12          0.037        0.045
(((BD)C)A)     p13          0.037        0.045
(((CD)A)B)     p14          0.037        0.045
(((CD)B)A)     p15          0.037        0.045

Clade   Probability       x = y = 0.1 (rank)  x = y = 0.05 (rank)
{AB}    p1 + p4 + p5      0.332 (1)           0.275 (1)
{AC}    p2 + p6 + p7      0.227 (2)           0.226 (2)
{AD}    p3 + p8 + p9      0.173 (6)           0.189 (7)
{BC}    p3 + p10 + p11    0.226 (3)           0.226 (2)
{BD}    p2 + p12 + p13    0.173 (6)           0.195 (6)
{CD}    p1 + p14 + p15    0.202 (5)           0.211 (4)
{ABC}   p4 + p6 + p10     0.215 (4)           0.201 (5)
{ABD}   p5 + p8 + p12     0.165 (8)           0.165 (8)
{ACD}   p7 + p9 + p14     0.136 (9)           0.150 (9)
{BCD}   p11 + p13 + p15   0.136 (9)           0.150 (9)

Greedy tree: (((AB)C)D) when x = y = 0.1, and ((AB)(CD)) when x = y = 0.05.
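The asymptotic greedy tree can be reproduced from the gene tree probabilities: sum the probabilities of the gene trees containing each clade, then accept clades in decreasing order of probability whenever they are compatible (disjoint or nested) with those already accepted. The sketch below is illustrative only, with hypothetical names, and uses the rounded p1 to p15 values from the table. With these rounded inputs the recomputed sums (about 0.323 for {AB} and 0.236 for {ABC} at x = y = 0.1) do not match every printed clade entry, but the resulting greedy trees agree with the slide.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]
Clade = FrozenSet[str]

# Gene trees p1..p15 in table order, with the table's rounded probabilities.
TOPOLOGIES: List[Tree] = [
    (("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C")),
    ((("A", "B"), "C"), "D"), ((("A", "B"), "D"), "C"), ((("A", "C"), "B"), "D"),
    ((("A", "C"), "D"), "B"), ((("A", "D"), "B"), "C"), ((("A", "D"), "C"), "B"),
    ((("B", "C"), "A"), "D"), ((("B", "C"), "D"), "A"), ((("B", "D"), "A"), "C"),
    ((("B", "D"), "C"), "A"), ((("C", "D"), "A"), "B"), ((("C", "D"), "B"), "A"),
]
P_X01 = [0.128, 0.099, 0.099, 0.104, 0.091, 0.066, 0.062, 0.037,
         0.037, 0.066, 0.062, 0.037, 0.037, 0.037, 0.037]   # x = y = 0.1
P_X005 = [0.121, 0.105, 0.105, 0.079, 0.075, 0.061, 0.060, 0.045,
          0.045, 0.061, 0.060, 0.045, 0.045, 0.045, 0.045]  # x = y = 0.05


def clades(tree: Tree) -> Set[Clade]:
    """Non-trivial clades: every internal node except the root."""
    found: Set[Clade] = set()

    def walk(node: Tree) -> Clade:
        if isinstance(node, str):
            return frozenset([node])
        below = walk(node[0]) | walk(node[1])
        found.add(below)
        return below

    found.discard(walk(tree))
    return found


def clade_probabilities(trees: Iterable[Tree], probs: Iterable[float]) -> Dict[Clade, float]:
    """P(clade) = sum of the probabilities of the gene trees that contain it."""
    totals: Dict[Clade, float] = {}
    for tree, p in zip(trees, probs):
        for c in clades(tree):
            totals[c] = totals.get(c, 0.0) + p
    return totals


def compatible(a: Clade, b: Clade) -> bool:
    """Two clades can share one rooted tree if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def greedy_consensus(clade_probs: Dict[Clade, float], n_taxa: int) -> Set[Clade]:
    """Accept clades from most to least probable while they stay compatible."""
    accepted: List[Clade] = []
    for clade, _ in sorted(clade_probs.items(), key=lambda kv: kv[1], reverse=True):
        if all(compatible(clade, a) for a in accepted):
            accepted.append(clade)
        if len(accepted) == n_taxa - 2:  # a rooted binary tree has n - 2 such clades
            break
    return set(accepted)


for label, probs in (("x = y = 0.1", P_X01), ("x = y = 0.05", P_X005)):
    tree = greedy_consensus(clade_probabilities(TOPOLOGIES, probs), 4)
    print(label, sorted("".join(sorted(c)) for c in tree))
# prints ['AB', 'ABC'] then ['AB', 'CD'], i.e. (((AB)C)D) and ((AB)(CD))
```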

Tree/Triple Probability Examples


Triple   Probability                  x = y = 0.1  x = y = 0.05
(AB)C*   p1 + p4 + p5 + p8 + p12      0.397        0.365
(AC)B    p2 + p6 + p7 + p9 + p14      0.301        0.316
(AB)D*   p1 + p4 + p5 + p6 + p10      0.455        0.397
(AD)B    p3 + p7 + p8 + p9 + p14      0.272        0.391
(AC)D*   p2 + p4 + p6 + p7 + p10      0.397        0.366
(AD)C    p3 + p5 + p8 + p9 + p12      0.301        0.315
(BC)D*   p3 + p4 + p6 + p10 + p11     0.397        0.366
(BD)C    p2 + p5 + p8 + p12 + p13     0.301        0.315

* Most frequent triple for its set of three taxa.

R* tree: (((AB)C)D) for both x = y = 0.1 and x = y = 0.05.
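A minimal Python sketch of the R* steps, using the x = y = 0.1 gene tree probabilities from the table: sum tree probabilities into triple probabilities, pick the most probable triple for each set of three taxa, and keep every cluster X such that each triple (ab)c with a and b in X and c outside X is a winning triple. That cluster rule is one standard way of assembling the R* tree and is stated here as an assumption; the tie handling (no winner on a tie), the names, and the exhaustive subset search (fine for a few taxa, exponential in general) are choices made for this sketch.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]
Clade = FrozenSet[str]
TAXA = ["A", "B", "C", "D"]

# Gene trees p1..p15 in table order, with their rounded probabilities for x = y = 0.1.
TREE_PROBS: List[Tuple[Tree, float]] = [
    ((("A", "B"), ("C", "D")), 0.128), ((("A", "C"), ("B", "D")), 0.099),
    ((("A", "D"), ("B", "C")), 0.099), (((("A", "B"), "C"), "D"), 0.104),
    (((("A", "B"), "D"), "C"), 0.091), (((("A", "C"), "B"), "D"), 0.066),
    (((("A", "C"), "D"), "B"), 0.062), (((("A", "D"), "B"), "C"), 0.037),
    (((("A", "D"), "C"), "B"), 0.037), (((("B", "C"), "A"), "D"), 0.066),
    (((("B", "C"), "D"), "A"), 0.062), (((("B", "D"), "A"), "C"), 0.037),
    (((("B", "D"), "C"), "A"), 0.037), (((("C", "D"), "A"), "B"), 0.037),
    (((("C", "D"), "B"), "A"), 0.037),
]


def clades(tree: Tree) -> Set[Clade]:
    """Non-trivial clades: every internal node except the root."""
    found: Set[Clade] = set()

    def walk(node: Tree) -> Clade:
        if isinstance(node, str):
            return frozenset([node])
        below = walk(node[0]) | walk(node[1])
        found.add(below)
        return below

    found.discard(walk(tree))
    return found


def grouped_pair(tree_clades: Set[Clade], trio: Tuple[str, ...]) -> Optional[Clade]:
    """The pair {a, b} for which the tree displays the triple (ab)c on this trio."""
    for pair in combinations(trio, 2):
        (outsider,) = set(trio) - set(pair)
        if any(set(pair) <= c and outsider not in c for c in tree_clades):
            return frozenset(pair)
    return None


def winning_triples(tree_probs: List[Tuple[Tree, float]], taxa: List[str]) -> Dict[Clade, Clade]:
    """For each 3-taxon set, the grouped pair of its most probable triple."""
    totals: Dict[Tuple[Clade, Clade], float] = {}
    for tree, p in tree_probs:
        cl = clades(tree)
        for trio in combinations(taxa, 3):
            pair = grouped_pair(cl, trio)
            if pair is not None:
                key = (frozenset(trio), pair)
                totals[key] = totals.get(key, 0.0) + p
    winners: Dict[Clade, Clade] = {}
    for trio in map(frozenset, combinations(taxa, 3)):
        options = sorted(((p, pair) for (t, pair), p in totals.items() if t == trio),
                         key=lambda o: o[0], reverse=True)
        if options and (len(options) == 1 or options[0][0] > options[1][0]):
            winners[trio] = options[0][1]
    return winners


def r_star_clades(winners: Dict[Clade, Clade], taxa: List[str]) -> Set[Clade]:
    """Keep cluster X if every triple (ab)c with a, b in X and c outside X is a winner."""
    result: Set[Clade] = set()
    for size in range(2, len(taxa)):
        for members in combinations(taxa, size):
            outside = [t for t in taxa if t not in members]
            if all(winners.get(frozenset((a, b, c))) == frozenset((a, b))
                   for a, b in combinations(members, 2) for c in outside):
                result.add(frozenset(members))
    return result


winners = winning_triples(TREE_PROBS, TAXA)
print(sorted("".join(sorted(c)) for c in r_star_clades(winners, TAXA)))  # ['AB', 'ABC']
```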

Unresolved zone for majority rule and the too-greedy zone

What about finite samples?

If you sample 10 loci, you could have:
- all 10 match the species tree;
- 9 match the species tree and 1 disagrees;
- 8 match the species tree and 2 disagree; etc.

You can treat gene trees as categories and use multinomial probabilities to compute the probability of your sample.

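The probability that the consensus tree of a sample of n loci equals a tree T is then

$$\Pr[c(n_1,\ldots,n_k) = T] = \sum_{\text{samples}} \frac{n!}{n_1!\cdots n_k!}\, p_1^{n_1}\cdots p_k^{n_k}\; I\big(c(n_1,\ldots,n_k) = T\big),$$

where n_i is the number of loci with gene tree i, p_i is the probability of gene tree i, c(n_1, ..., n_k) is the consensus tree of the sample, I is the indicator function, and the sum runs over all samples (n_1, ..., n_k) with n_1 + ... + n_k = n.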
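The sum can be evaluated exactly by enumerating every sample (n_1, ..., n_k). The sketch below does this for the three-taxon case only, where k = 3 and enumeration is cheap; for four taxa (k = 15) the number of samples grows quickly. It assumes coalescent gene tree probabilities 1 - (2/3)exp(-T) and (1/3)exp(-T) for a hypothetical internal branch T, and uses majority rule as the consensus function c: with three taxa each gene tree has exactly one clade, so majority rule resolves a topology only when that gene tree occurs at more than half the loci. Function names are hypothetical.

```python
from math import exp, factorial, prod
from typing import Callable, Iterator, Optional, Sequence, Tuple


def multinomial_pmf(counts: Sequence[int], probs: Sequence[float]) -> float:
    """n! / (n_1! ... n_k!) * p_1^n_1 ... p_k^n_k."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)          # exact: partial multinomial coefficients are integers
    return coef * prod(p ** c for p, c in zip(probs, counts))


def compositions(n: int, k: int) -> Iterator[Tuple[int, ...]]:
    """All k-tuples of non-negative integers summing to n (every possible sample)."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def prob_consensus_is(target, consensus: Callable, n: int, probs: Sequence[float]) -> float:
    """Sum of multinomial probabilities over samples whose consensus equals target."""
    return sum(multinomial_pmf(counts, probs)
               for counts in compositions(n, len(probs))
               if consensus(counts) == target)


def majority_rule_3taxa(counts: Sequence[int]) -> Optional[int]:
    """Index of the gene tree seen at more than half the loci, else None (star tree)."""
    n = sum(counts)
    for i, c in enumerate(counts):
        if c > n / 2:
            return i
    return None


T = 0.5  # hypothetical internal branch length, coalescent units
probs = (1 - 2 * exp(-T) / 3, exp(-T) / 3, exp(-T) / 3)  # index 0 = species tree
for n in (1, 5, 10, 25):
    print(n, round(prob_consensus_is(0, majority_rule_3taxa, n, probs), 3))
```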

Are consensus trees inconsistent estimators of species trees?

Theorem 1. Majority-rule asymptotic consensus trees (MACTs) contain no clades that are absent from the species tree.

Theorem 2. Greedy asymptotic consensus trees (GACTs) can be misleading estimators of species trees for the 4-taxon asymmetric tree and for any species tree with n > 4 species.

Theorem 3. R* asymptotic consensus trees (RACTs) always match the species tree.

Conclusions

Coalescent gene tree probabilities are useful for understanding the asymptotic behavior of consensus trees constructed from independent gene trees.

Greedy consensus trees can be misleading, but outside the too-greedy zone they typically approach the species tree more quickly than majority-rule or R* consensus trees.

R* consensus trees are consistent and more resolved than majority-rule consensus trees.