Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson

Upload: henry-jackson

Post on 02-Jan-2016


Page 1: Intro. To Phylogenetic Analysis Slides modified by David Ardell From Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson

Intro. To Phylogenetic Analysis

Slides modified by David Ardell

From Caro-Beth Stewart, Paul Higgs,

Joe Felsenstein and Mikael Thollesson

Page 2:

C-B Stewart, NHGRI lecture, 12/5/00

What is phylogenetic analysis and why should we perform it?

Phylogenetic analysis has two major components:

1. Phylogeny inference or “tree building”: inferring the evolutionary relationships between genes or species

2. Character and rate analysis: mapping information onto trees

Page 3:

Common Phylogenetic Tree Terminology

[Figure: rooted tree with five terminal nodes A, B, C, D and E. Labels on the figure:]

• Ancestral Node or ROOT of the Tree
• Internal Nodes (represent hypothetical ancestors of the taxa)
• Branches or Lineages
• Terminal Nodes (represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny)
• CLADE

Page 4:

[Figure: trees of taxa A, B, C and D; the tip orders shown include A B C D, D C A B and B A C D]

X and Y are defined to be more closely related to each other than to Z if, and only if, they share a more recent common ancestor than they do with Z

Page 5:

All of these rearrangements show the same evolutionary relationships between the taxa

[Figure: rooted tree 1a of taxa A, B, C and D, redrawn several times with the tips in different orders]

Page 6:

Page 7:

Three types of trees

All show the same branching orders between taxa.

Cladogram: groupings only; the branch lengths have no meaning.

[Figure: cladogram of Taxon A, Taxon B, Taxon C and Taxon D]

Page 8:

Three types of trees

All show the same branching orders between taxa.

Cladogram: groupings only; the branch lengths have no meaning.
Phylogram: groupings + distance; the branch lengths (1, 1, 1, 6, 3 and 5 in the figure) are evolutionary distances.

[Figure: cladogram and phylogram of Taxon A, Taxon B, Taxon C and Taxon D]

Page 9:

Three types of trees

All show the same branching orders between taxa.

Cladogram: groupings only; the branch lengths have no meaning.
Phylogram: groupings + distance; the branch lengths (1, 1, 1, 6, 3 and 5 in the figure) are evolutionary distances.
Ultrametric tree: groupings + time; the axis of the tree is time.

[Figure: cladogram, phylogram and ultrametric tree of Taxon A, Taxon B, Taxon C and Taxon D]

Page 10:

Similarity vs. Evolutionary Relationship:

Since taxa evolve at different rates, your closest relative could be very different

[Figure: phylogram of Taxon A, Taxon B, Taxon C (think lamprey) and Taxon D, with branch lengths 1, 1, 1, 6, 3 and 5]

C is closer to A in distance, but more closely related to B

This is why the closest BLAST hit is not necessarily the closest relative, and why you need to make trees.
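The point of this slide can be checked with a few lines of Python. This is only a sketch: the figure itself is not recoverable from the transcript, so the way the branch lengths 1, 1, 1, 6, 3 and 5 are placed on the tree below is an assumption, as are the internal node names `root`, `n_ABC` and `n_BC`. The code sums branch lengths along the path between each pair of tips.

```python
from itertools import combinations

# Hypothetical placement of the slide's branch lengths (an assumption):
# B and C are sister taxa, A joins them next, D is the most distant tip.
BRANCHES = [
    ("root", "D", 5.0),
    ("root", "n_ABC", 3.0),
    ("n_ABC", "A", 1.0),
    ("n_ABC", "n_BC", 1.0),
    ("n_BC", "B", 6.0),
    ("n_BC", "C", 1.0),
]


def build_graph(branches):
    """Turn a branch list into an undirected adjacency dictionary."""
    graph = {}
    for u, v, length in branches:
        graph.setdefault(u, {})[v] = length
        graph.setdefault(v, {})[u] = length
    return graph


def patristic_distance(graph, start, goal):
    """Sum of branch lengths on the unique path between two nodes of a tree."""
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for neighbour, length in graph[node].items():
            if neighbour != parent:
                stack.append((neighbour, node, dist + length))
    raise ValueError("nodes are not connected")


graph = build_graph(BRANCHES)
distances = {
    pair: patristic_distance(graph, *pair) for pair in combinations("ABCD", 2)
}
for (x, y), d in distances.items():
    print(x, y, d)
```

Under this assumed placement, C is 3 units from A but 7 units from B, even though B is its sister taxon: the long branch leading to B makes the closest relative the less similar one.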

Page 11:

Types of Similarity

Observed similarity between two entities can be due to:

Evolutionary relationship:
• Shared ancestral characters (‘plesiomorphies’)
• Shared derived characters (‘synapomorphies’)

Homoplasy (independent evolution of the same character):
• Convergent events
• Parallel events
• Reversals

[Figure: small trees with nucleotide states (C, G, T) mapped onto the tips and nodes to illustrate each case]

Page 12:

A few examples of what can be inferred from phylogenetic trees built from DNA or protein sequence data:

• Which species are the closest living relatives of modern humans?

• Did the infamous Florida Dentist infect his patients with HIV?

• What were the origins of specific transposable elements?

Page 13:

Which species are the closest living relatives of modern humans?

[Figure: Classical view. Tree of Humans, Bonobos, Chimpanzees, Gorillas and Orangutans, with a time axis running from 15-30 MYA to 0]

Page 14:

Which species are the closest living relatives of modern humans?

[Figure: two trees of Humans, Bonobos, Chimpanzees, Gorillas and Orangutans. Classical view: time axis from 15-30 MYA to 0. Molecular view: time axis from 14 MYA to 0.]

Page 15:

Did the Florida Dentist infect his patients with HIV?

Phylogenetic tree of HIV sequences from the DENTIST, his Patients, & Local HIV-infected People:

[Figure: tree whose tips are labelled DENTIST (twice), Patient A (twice), Patients B, C, D, E, F and G, and Local controls 2, 3 (twice), 9 and 35. Annotations on the figure: “Yes”, “No” and “No”.]

Yes: The HIV sequences from these patients fall within the clade of HIV sequences found in the dentist.

From Ou et al. (1992) and Page & Holmes (1998)

Page 16:

Uses of character mapping:

• Dating adaptive evolutionary events

• Ancestral reconstruction

• Testing biological hypotheses of correlated function or change

Page 17:

Ex: Where geographically was the common ancestor of African apes and humans?

[Figure: two dispersal scenarios, each mapped onto a tree of living species and a tree of living + fossil species. Legend: Eurasia = Black, Africa = Red, marked branches = Dispersal. Living species: OW Monkeys, Gibbons, Orangutans, Gorillas, Chimpanzees, Humans. Fossil species: Proconsul, Kenyapithecus, Oreopithecus, Lufengpithecus, Dryopithecus, Ouranopithecus.]

Scenario A: Africa as species fountain
Scenario B: Eurasia as ancestral homeland

Scenario B requires four fewer dispersal events

Modified from: Stewart, C.-B. & Disotell, T.R. (1998) Current Biology 8: R582-588.

Page 18:

Building Trees

                          COMPUTATIONAL METHOD
DATA TYPE     Optimality criterion     Clustering algorithm
Characters    PARSIMONY
              MAXIMUM LIKELIHOOD
Distances     MINIMUM EVOLUTION        UPGMA
              LEAST SQUARES            NEIGHBOR-JOINING

Page 19:

Building Trees

Page 20:

Building Trees

Page 21:

Types of data:

Character-data:

Taxa        Characters
Species A   ATGGCTATTCTTATAGTACG
Species B   ATCGCTAGTCTTATATTACA
Species C   TTCACTAGACCTGTGGTCCA
Species D   TTGACCAGACCTGTGGTCCG
Species E   TTGACCAGTTCTCTAGTTCG

Distance-based data: pairwise distances (dissimilarities)

            A      B      C      D      E
Species A   ----   0.20   0.50   0.45   0.40
Species B   0.23   ----   0.40   0.55   0.50
Species C   0.87   0.59   ----   0.15   0.40
Species D   0.73   1.12   0.17   ----   0.25
Species E   0.59   0.89   0.61   0.31   ----

Above the diagonal: uncorrected “p” distance
Below the diagonal (Example 2): Kimura 2-parameter distance
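The two kinds of distance on this slide can be recomputed from the character data. The sketch below uses the five sequences as given; the function names are illustrative, and the Kimura 2-parameter formula used is the standard one, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], with P and Q the proportions of transitions and transversions (the slide itself does not spell the formula out).

```python
import math
from itertools import combinations

SEQS = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}
PURINES = {"A", "G"}


def count_differences(s1, s2):
    """Return (transitions, transversions) between two aligned sequences."""
    transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def p_distance(s1, s2):
    """Uncorrected proportion of differing sites."""
    ts, tv = count_differences(s1, s2)
    return (ts + tv) / len(s1)


def k2p_distance(s1, s2):
    """Kimura 2-parameter estimate of substitutions per site."""
    ts, tv = count_differences(s1, s2)
    P, Q = ts / len(s1), tv / len(s1)
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


for x, y in combinations(SEQS, 2):
    print(x, y, round(p_distance(SEQS[x], SEQS[y]), 2),
          round(k2p_distance(SEQS[x], SEQS[y]), 2))
```

For species A and B this gives p = 0.20 and a Kimura distance of about 0.23, as in the table. The corrected distance is always at least as large as p, and the gap widens as sequences diverge (0.50 versus about 0.87 for A and C). Not every below-diagonal entry is necessarily reproduced exactly by this formula.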

Page 22:

Page 23:

Page 24:

Building Trees

Page 25:

Parsimony

Given two trees, the one requiring fewer character changes to explain the observations is the better one

– The parsimony score for a tree is the minimum number of required changes

– This score is frequently referred to as the number of steps or the tree length

Page 26:

Parsimony – an example

acgtatgga
acgggtgca
aacggtgga
aactgtgca

[Figure: the three possible trees for the four sequences, with the states (c or a) of one character mapped onto the tips of each tree]

Total tree length: 7    Total tree length: 8    Total tree length: 8
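The three tree lengths can be reproduced by counting changes site by site. The sketch below uses Fitch's small-parsimony algorithm, which is one standard way to obtain the minimum number of changes on a fixed tree; the slide does not say which algorithm was used. The labels `s1` to `s4` for the four sequences (in the order listed) are hypothetical, and each unrooted tree is written with an arbitrary root, which does not affect the parsimony score.

```python
SEQS = {
    "s1": "acgtatgga",
    "s2": "acgggtgca",
    "s3": "aacggtgga",
    "s4": "aactgtgca",
}


def fitch(tree, column):
    """Return (possible ancestral states, changes) for one alignment column.

    `tree` is a tip name or a pair of subtrees; `column` maps tip -> state.
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_states, left_cost = fitch(tree[0], column)
    right_states, right_cost = fitch(tree[1], column)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No state in common: one change is needed at this node.
    return left_states | right_states, left_cost + right_cost + 1


def tree_length(tree, seqs):
    """Parsimony score: minimum number of changes summed over all sites."""
    n_sites = len(next(iter(seqs.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in seqs.items()}
        total += fitch(tree, column)[1]
    return total


TREES = {
    "((s1,s2),(s3,s4))": (("s1", "s2"), ("s3", "s4")),
    "((s1,s3),(s2,s4))": (("s1", "s3"), ("s2", "s4")),
    "((s1,s4),(s2,s3))": (("s1", "s4"), ("s2", "s3")),
}
lengths = {name: tree_length(tree, SEQS) for name, tree in TREES.items()}
for name, length in lengths.items():
    print(name, length)
```

The tree grouping the first two sequences together needs 7 changes and the other two need 8 each, so parsimony prefers the first tree.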

Page 27:

Building Trees

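Page 28:

Using models

Observed differences

Actual changes

[Figure: substitution diagram linking the four nucleotides A, G, C and T]

Example: Jukes-Cantor

$$
Q = \begin{bmatrix}
-3\alpha & \alpha & \alpha & \alpha \\
\alpha & -3\alpha & \alpha & \alpha \\
\alpha & \alpha & -3\alpha & \alpha \\
\alpha & \alpha & \alpha & -3\alpha
\end{bmatrix}
$$

(rows and columns in the order A, C, G, T)

$$
p_{ij}(t) = \begin{cases}
\dfrac{1}{4} + \dfrac{3}{4} e^{-4\alpha t}, & \text{if } i = j \\[6pt]
\dfrac{1}{4} - \dfrac{1}{4} e^{-4\alpha t}, & \text{if } i \neq j
\end{cases}
$$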
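A short sketch of how the Jukes-Cantor formulas connect observed differences to actual changes. The function names are illustrative and `alpha=1.0` is an arbitrary default. The correction function follows from the transition probabilities: the expected proportion of differing sites is p = (3/4)(1 - e^{-4αt}), and the expected number of changes per site is 3αt, so 3αt = -(3/4) ln(1 - 4p/3).

```python
import math

NUCLEOTIDES = "ACGT"


def jc_probability(i, j, t, alpha=1.0):
    """Jukes-Cantor probability that state i has become state j after time t."""
    decay = math.exp(-4.0 * alpha * t)
    if i == j:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def expected_difference(t, alpha=1.0):
    """Expected proportion of sites observed to differ after time t."""
    return sum(jc_probability("A", j, t, alpha) for j in "CGT")


def corrected_distance(p):
    """Estimated actual changes per site (3*alpha*t) from observed proportion p."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


t = 0.5
row_sum = sum(jc_probability("A", j, t) for j in NUCLEOTIDES)
observed = expected_difference(t)
actual = corrected_distance(observed)
print(row_sum, observed, actual)
```

With α = 1 and t = 0.5, the actual number of changes per site is 1.5, but only about 65% of sites are expected to differ, because multiple changes at the same site hide one another.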

Page 29:

Page 30:

Page 31:

Page 32:

Page 33:

Page 34:

Likelihood of a one-branch tree…

30 nucleotides from -globin genes of two primates on a one-edge tree

Gorilla    GAAGTCCTTGAGAAATAAACTGCACACTGG
Orangutan  GGACTCCTTGAGAAATAAACTGCACACTGG
            * *

There are two differences and 28 similarities

$$
L = \left[ \frac{1}{16}\left(1 - e^{-4\alpha t}\right) \right]^{2} \left[ \frac{1}{16}\left(1 + 3e^{-4\alpha t}\right) \right]^{28}
$$

[Figure: plot of ln L (vertical axis, -55.0 to -50.5) against t (horizontal axis, 0 to 0.1). The maximum is at t = 0.02327, ln L = -51.133956]
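The maximum on this slide can be recovered numerically. The sketch below counts the differences between the two sequences, evaluates the log of the likelihood formula, and picks the best t from a simple grid. It assumes α = 1, which is not stated on the slide but reproduces its numbers (only the product αt is identifiable); the grid search is used only for illustration.

```python
import math

GORILLA = "GAAGTCCTTGAGAAATAAACTGCACACTGG"
ORANGUTAN = "GGACTCCTTGAGAAATAAACTGCACACTGG"


def log_likelihood(t, n_diff, n_same, alpha=1.0):
    """ln L for two sequences on one branch under Jukes-Cantor."""
    decay = math.exp(-4.0 * alpha * t)
    site_diff = (1.0 - decay) / 16.0        # one differing site
    site_same = (1.0 + 3.0 * decay) / 16.0  # one identical site
    return n_diff * math.log(site_diff) + n_same * math.log(site_same)


n_diff = sum(a != b for a, b in zip(GORILLA, ORANGUTAN))
n_same = len(GORILLA) - n_diff

# Crude grid search over 0 < t <= 0.1 in steps of 0.00001.
grid = [k * 1e-5 for k in range(1, 10001)]
t_hat = max(grid, key=lambda t: log_likelihood(t, n_diff, n_same))
best = log_likelihood(t_hat, n_diff, n_same)
print(n_diff, n_same, round(t_hat, 5), round(best, 6))
```

The grid maximum agrees with the values quoted on the slide, t = 0.02327 and ln L = -51.133956. In practice a numerical optimizer would replace the grid.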

Page 35:

A recipe for phylogenetic inference

• Collect your data
• Select an optimality criterion (“which tree is better?”, tree score)
• Optional: do data transformation (“corrections”)
• Select a search strategy to find the best tree
• Find the best hypothesis according to that criterion
• Assess the variation in your data in some way

Page 36:

Finding the best tree

Number of (rooted) trees
– 3 taxa -> 3 trees
– 4 taxa -> 15 trees
– 10 taxa -> 34 459 425 trees
– 25 taxa -> 1.19·10^30 trees
– 52 taxa -> 2.75·10^80 trees

Finding the optimal tree is an NP-complete problem

Search strategies
– Exact
   • Exhaustive
   • Branch and bound
– Algorithmic
   • Greedy algorithms, a.k.a. hill-climbing (including Neighbor-joining)
– Heuristic
   • Systematic: branch-swapping (NNI, SPR, TBR)
   • Stochastic: Markov Chain Monte Carlo (MCMC), Genetic algorithms
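The tree counts listed on this slide follow the double-factorial formula for rooted bifurcating trees, (2N - 3)!!, and can be checked directly. The function names below are illustrative.

```python
def double_factorial(n):
    """n!! = n * (n - 2) * (n - 4) * ... down to 1 or 2 (1 for n <= 0)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def n_rooted_trees(n_taxa):
    """Number of rooted, bifurcating, labelled trees: (2N - 3)!!"""
    return double_factorial(2 * n_taxa - 3)


def n_unrooted_trees(n_taxa):
    """Number of unrooted, bifurcating, labelled trees: (2N - 5)!!"""
    return double_factorial(2 * n_taxa - 5)


for n in (3, 4, 10, 25, 52):
    print(n, f"{n_rooted_trees(n):.3g}")
```

Python integers are exact at any size, so the counts for 25 and 52 taxa are computed without overflow. The growth is what rules out exhaustive search for all but the smallest data sets.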

Page 37:

“Star-Decomposition”

[Figure: three trees of taxa A, B, C, D and E, from left to right:]

• Completely unresolved or "star" phylogeny (labelled: polytomy or multifurcation)
• Partially resolved phylogeny
• Fully resolved, bifurcating phylogeny (labelled: a bifurcation)

Page 38:

There are three possible unrooted trees on four taxa (A, B, C, D)

[Figure: the three unrooted trees]

Tree 1: A and B on one side, C and D on the other
Tree 2: A and C on one side, B and D on the other
Tree 3: A and D on one side, B and C on the other

Page 39:

The number of unrooted trees increases in a greater than exponential manner with number of taxa

(2N - 5)!! = # unrooted trees for N taxa

[Figure: example unrooted trees with three, four, five and six taxa (A to F)]

Page 40:

Page 41:

What is a “good” method?

• Efficiency: time to find a/the solution
• Power: rate of convergence/how much data are needed
• Consistency: convergence to “correct” solution as data are added
• Robustness: performance when assumptions are violated
• Falsifiability: rejection of the model when inadequate

Page 42:

Page 43:

Page 44:

Page 45:

Performance on simulated data

[Figure: two panels plotting frequency of correct inference (vertical axis, 0 to 1) against sequence length (horizontal axis, 10 to 100 000, logarithmic). One panel compares Lake's invariants; Parsimony, uniform; UPGMA, Kimura; NJ, Kimura; ML, Kimura; and Parsimony, weighted. The other compares UPGMA, Kimura; NJ, percentage; Parsimony, uniform; Parsimony, weighted; NJ, Kimura; and ML, Kimura. Panel annotations: "All 0.50" and "0.30 and 0.05 respectively".]

Page 46:

+ and – of the methods

Pair-wise, NJ, distance approach

+ Fast (efficiency)
+ Models can be used to make distances (can be consistent)
– Pairwise distances throw out information (loss of power)
– One will get a tree, but no score to compare with other trees or hypotheses

Parsimony and tree-search

+ Philosophically appealing (Occam’s razor)
– Can be inconsistent
– Can be computationally slow due to a huge number of possible trees

Maximum likelihood and tree-search

+ Model-based, can be consistent, powerful, gain biological info
– Model-based, bad when you have the wrong model
– Computationally veeeeery slow due to heavy calculations in determining the tree score and a huge number of possible trees

Page 47:

The quick and dirty, pretty good tree

• Calculate model-based pairwise distances
• Make a Neighbor-Joining tree
• Do a bootstrap

Page 48:

A recipe for phylogenetic inference

• Collect your data
• Select an optimality criterion (“which tree is better?”)
• Optional: do data transformation (“corrections”)
• Select a search strategy to find the best tree
• Find the best hypothesis according to that criterion
• Assess the variation in your data in some way

Page 49:

Assessing the variation

• Jackknife – resampling without replacement
• Bootstrap – resampling with replacement

Page 50:

Page 51:

Page 52:

Page 53:

Assessing the variation

• Jackknife – resampling without replacement
• Bootstrap – resampling with replacement

1. Resample columns from an alignment with replacement to make a simulated sample of the same size

2. Analyze this resampled dataset in the same way as you did the original sample

3. Repeat this 100+ times, making 100 or more bootstrap trees

4. Summarize, for example, as a majority-rule consensus tree

5. Clades found in more than 50% of the trees will be shown; a clade needs at least 70% to be called even “weakly supported”

Page 54:

Bootstrap

Original data set with n characters:

        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Aus     C  G  A  C  G  G  T  G  G  T  C  T  A  T  A  C  A  C  G  A
Beus    C  G  G  C  G  G  T  G  A  T  C  T  A  T  G  C  A  C  G  G
Ceus    T  G  G  C  G  G  C  G  T  C  T  C  A  T  A  C  A  A  T  A
Deus    T  A  A  C  G  A  T  G  A  C  C  C  G  A  C  T  A  T  T  G

Draw n characters randomly with replacement. Repeat m times.

m pseudo-replicates, each with n characters, for example:

        2  3 13  8  3 19 14  6 20 20  7  1  9 11 17 10  6 14  8 16
Aus     G  A  A  G  A  G  T  G  A  A  T  C  G  C  A  T  G  T  G  C
Beus    G  G  A  G  G  G  T  G  G  G  T  C  A  C  A  T  G  T  G  C
Ceus    G  G  A  G  G  T  T  G  A  A  C  T  T  T  A  C  G  T  G  C
Deus    A  A  G  G  A  T  A  A  G  G  T  T  A  C  A  C  A  A  G  T

Original analysis, e.g. MP, ML, NJ.

[Figure: tree of Aus, Beus, Ceus and Deus from the original data set]

Repeat original analysis on each of the pseudo-replicate data sets.

[Figure: one tree of Aus, Beus, Ceus and Deus per pseudo-replicate]

Evaluate the results from the m analyses.

[Figure: summary tree of Aus, Beus, Ceus and Deus with one clade labelled 75%]

NB! The consensus tree is not a phylogenetic hypothesis, but a way to summarize other trees, in this case bootstrapped trees
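The resampling step of the bootstrap is easy to sketch in code. The example below first rebuilds the pseudo-replicate shown on this slide from its listed column numbers, then draws 100 random pseudo-replicates. The function names and the fixed random seed are illustrative choices; the tree-building step that would follow each resampling is left out.

```python
import random

ALIGNMENT = {
    "Aus":  "CGACGGTGGTCTATACACGA",
    "Beus": "CGGCGGTGATCTATGCACGG",
    "Ceus": "TGGCGGCGTCTCATACAATA",
    "Deus": "TAACGATGACCCGACTATTG",
}


def take_columns(alignment, columns):
    """Build a new alignment from 1-based column numbers (repeats allowed)."""
    return {
        name: "".join(seq[c - 1] for c in columns)
        for name, seq in alignment.items()
    }


def draw_columns(n_sites, rng):
    """Draw n_sites column numbers at random, with replacement."""
    return [rng.randint(1, n_sites) for _ in range(n_sites)]


# The column draw shown on the slide.
SLIDE_COLUMNS = [2, 3, 13, 8, 3, 19, 14, 6, 20, 20,
                 7, 1, 9, 11, 17, 10, 6, 14, 8, 16]
slide_replicate = take_columns(ALIGNMENT, SLIDE_COLUMNS)

# m = 100 pseudo-replicates, each with n = 20 characters.
rng = random.Random(0)
n_sites = len(ALIGNMENT["Aus"])
replicates = [
    take_columns(ALIGNMENT, draw_columns(n_sites, rng)) for _ in range(100)
]
print(slide_replicate["Aus"])
```

Whole columns are resampled, so every taxon keeps the same set of sites within a pseudo-replicate; some columns appear more than once (3, 6, 8, 14 and 20 in the slide's draw) and others not at all.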

Page 55:

Rooting

To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root:

[Figure: an unrooted tree of A, B, C and D with the position of the root marked, and the resulting rooted tree with tips in the order A, B, C, D]

Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.

Page 56:

Now, try it again with the root at another position:

[Figure: the same unrooted tree of A, B, C and D with the root marked at a different position, and the resulting rooted tree]

Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.

Page 57:

An unrooted, four-taxon tree can be rooted in five different places

[Figure: the unrooted tree 1 of taxa A, B, C and D with its five branches numbered 1 to 5, and the five rooted trees obtained by rooting on each branch:]

• Rooted tree 1a: root on branch 1
• Rooted tree 1b: root on branch 2
• Rooted tree 1c: root on branch 3
• Rooted tree 1d: root on branch 4
• Rooted tree 1e: root on branch 5

Page 58:

There are two major ways to root trees:

Outgroup rooting: Uses taxa or sequences (the “outgroup”) known to fall outside all the others (the “ingroup”). Requires prior knowledge.

Midpoint rooting: Roots the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths. Assumes clock-like evolution.

[Figure: tree of taxa A, B, C and D with branch lengths 10, 2, 3, 5 and 2; one taxon is marked as the outgroup]

d (A,D) = 10 + 3 + 5 = 18
Midpoint = 18 / 2 = 9
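Midpoint rooting can be sketched as a search for the longest tip-to-tip path followed by a walk to its halfway point. The tree below is an assumption: the slide's calculation fixes the branch to A at 10, the internal branch at 3 and the branch to D at 5, but which of the two branches of length 2 leads to B and which to C, and the internal node names `x` and `y`, are chosen here for illustration.

```python
from itertools import combinations

# Assumed unrooted tree: A and B attach to node x, C and D to node y.
BRANCHES = [
    ("A", "x", 10.0),
    ("B", "x", 2.0),
    ("x", "y", 3.0),
    ("C", "y", 2.0),
    ("D", "y", 5.0),
]


def build_graph(branches):
    """Turn a branch list into an undirected adjacency dictionary."""
    graph = {}
    for u, v, length in branches:
        graph.setdefault(u, {})[v] = length
        graph.setdefault(v, {})[u] = length
    return graph


def find_path(graph, start, goal):
    """Return the branches (from, to, length) on the path start -> goal."""
    stack = [(start, None, [])]
    while stack:
        node, parent, path = stack.pop()
        if node == goal:
            return path
        for neighbour, length in graph[node].items():
            if neighbour != parent:
                stack.append((neighbour, node, path + [(node, neighbour, length)]))
    raise ValueError("nodes are not connected")


def midpoint_root(graph, tips):
    """Return (most distant pair, their distance, root position).

    The root position is (from_node, to_node, distance_from_from_node).
    """
    paths = {pair: find_path(graph, *pair) for pair in combinations(tips, 2)}
    pair, path = max(paths.items(), key=lambda kv: sum(b[2] for b in kv[1]))
    total = sum(b[2] for b in path)
    remaining = total / 2.0
    for u, v, length in path:
        if remaining <= length:
            return pair, total, (u, v, remaining)
        remaining -= length
    raise ValueError("midpoint not found")


pair, total, location = midpoint_root(build_graph(BRANCHES), "ABCD")
print(pair, total, location)
```

With these lengths the most distant pair is A and D at distance 18, and the midpoint lies 9 units from A, on the branch leading to A (1 unit short of the internal node).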

Page 59:

Each unrooted tree theoretically can be rooted anywhere along any of its branches

[Figure: example unrooted trees on four taxa (A to D), five taxa (A to E) and six taxa (A to F), with a calculation of the form (number of unrooted trees) x (number of branches) = (number of rooted trees)]

(2N - 3)!! = # rooted trees for N taxa
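As a worked check (a short derivation, not taken from the slides): an unrooted bifurcating tree on N taxa has 2N - 3 branches, and placing the root on any one of them gives a distinct rooted tree. Combined with the count of unrooted trees from Page 39, this gives the number of rooted trees:

```latex
\[
\underbrace{(2N-5)!!}_{\text{unrooted trees}} \times \underbrace{(2N-3)}_{\text{branches per tree}}
  = (2N-3)!! \quad \text{rooted trees}
\]
\[
N = 4:\quad 3 \times 5 = 15, \qquad
N = 5:\quad 15 \times 7 = 105, \qquad
N = 6:\quad 105 \times 9 = 945 .
\]
```

The value for four taxa matches both the three unrooted trees of Page 38 and the 15 rooted trees listed on Page 36.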

Page 60:

We have arrived at a tree: can we trust it as a good hypothesis of the phylogeny?

What can go wrong?

• Sampling error
   – Assessed by, for example, the bootstrap
• Too superficial tree search
   – Remember: finding the best tree is really hard
• Systematic error (inconsistent method)
   – Tests of the adequacy of models used
   – Premeditated use of different methods
• Reality
   – A tree may be a poor model of the real history
   – Information has been lost by subsequent evolutionary changes
• “Species” vs. “gene” trees

Page 61:

What is wrong with this tree?

[Figure: tree of Canis, Mus and Gadus with support values of 100 on two branches]

• Negligible (within sequence) sampling error
• Tree estimated by a consistent method

Page 62:

The expected tree…

[Figure: a “species” tree containing the “gene” trees that result from a gene duplication]

Page 63:

Two copies (paralogs) present in the genomes

[Figure: gene tree with two subtrees, one with tips Canis, Mus, Gadus and one with tips Gadus, Mus, Canis. Each subtree is labelled Orthologous; the relationship between the two subtrees is labelled Paralogous.]

Page 64:

What we have studied…

[Figure: tree of Canis, Gadus and Mus]

Page 65:

What we have studied…

[Figure: tree of Canis, Gadus and Mus]

Message: specific loss patterns of paralogs can disrupt species trees if we don’t know what is a paralog and what is an ortholog

Page 66:

To conclude

– Phylogenetic inference deals with historical events and information transfer through time
– Results from phylogenetic analyses are hypotheses for further testing; the true history will remain unknown
– Inference is mathematically intricate and computationally heavy, and as a result methods for phylogenetic inference are legion
– There are several pitfalls to avoid when doing the analyses and when interpreting them
– But… ignoring the shared histories can sometimes give completely bogus results in comparative studies

Page 67:

Phylogenetic trees diagram the evolutionary relationships between the taxa

((A,(B,C)),(D,E))

[Figure: tree with tips Taxon A, Taxon B, Taxon C, Taxon E and Taxon D]

No meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.
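The parenthetical notation on this slide is the Newick format, and the remark about tip order can be checked by comparing the sets of clades two strings define. The parser below is a minimal sketch that handles only names, commas and parentheses (no branch lengths, no whitespace); the reordered strings are made up for the comparison.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested tuples of tip names."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return parse_node()


def collect_clades(tree, found):
    """Add the tip set below every internal node to `found`; return this node's tips."""
    if isinstance(tree, str):
        return frozenset([tree])
    tips = frozenset().union(*(collect_clades(child, found) for child in tree))
    found.add(tips)
    return tips


def clade_set(newick):
    """Set of clades (as frozensets of tip names) defined by a Newick string."""
    found = set()
    collect_clades(parse_newick(newick), found)
    return found


original = clade_set("((A,(B,C)),(D,E))")
reordered = clade_set("((E,D),((C,B),A))")   # same tree, tips listed differently
different = clade_set("((B,(A,C)),(D,E))")   # a different grouping
print(original == reordered, original == different)
```

The first two strings list the taxa in different orders but define the same clades ({B, C}, {A, B, C}, {D, E} and the full set), so they describe the same tree; the third groups A with C instead and is a different tree.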