Models of sequence evolution (GTR, HKY, Jukes-Cantor, Felsenstein, K2P); tree building methods: some examples; assessing phylogenetic data; popular phylogenetic packages

Upload: ira-cummings

Post on 13-Jan-2016


Page 1: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

Models of sequence evolution

[Figure: nucleotide substitution diagrams, each drawn on the four bases A, C, G and T, for the GTR, HKY, Jukes-Cantor, Felsenstein and K2P models]

Tree building methods:some examples

Assessing phylogenetic data

Popular phylogenetic packages

Page 2: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

More models of sequence evolution …

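More than 60 models have been described to date

- plus variants with gamma-distributed rate variation among sites and a proportion of invariable sites

- the accuracy of all models decreases rapidly for highly divergent sequences

- problem: more complicated models have more parameters to estimate, so their estimates tend to be less accurate (and the analysis is slower)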
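One way to see why accuracy drops for divergent sequences is the Jukes-Cantor distance correction, which grows without bound as the observed proportion of differing sites approaches 3/4. The following is a minimal sketch, assuming aligned, gap-free sequences of equal length; the function name `jc69_distance` is hypothetical and not taken from any package named in these slides.

```python
import math

def jc69_distance(seq1, seq2):
    """Jukes-Cantor (JC69) distance in expected substitutions per site."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    # p is the observed proportion of sites that differ
    p = sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)
    if p >= 0.75:
        # The correction is undefined at saturation (p >= 3/4)
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

if __name__ == "__main__":
    # One difference in eight sites: corrected distance is slightly above 0.125
    print(jc69_distance("ACGTACGT", "ACGTACGA"))
```

The corrected distance is always at least as large as the observed proportion of differences, and small errors in that proportion are amplified as it nears 3/4.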

How to pick an appropriate model?

- use a likelihood ratio test (comparing the maximized likelihoods of nested models)

- implemented in Modeltest 3.06 (Posada & Crandall, 1998)


Page 3: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

More models of sequence evolution …

Example of a Modeltest output file

(A) Likelihood scores (-lnL) for each model

Model    -lnL
JC       3369.2803
F81      3342.5513
K80      3294.6611
HKY      3124.4182
TrNef    3114.5491
TrN      2993.6340
K81      2987.6548
K81uf    2973.5620
TIMef    2937.6196
TIM      2932.9878
TVMef    2930.3450
TVM      2922.1970
SYM      2921.3069
GTR      2921.1187

(B) Likelihood ratio test: equal base frequencies

Null model = JC            -lnL0 = 3369.2803
Alternative model = F81    -lnL1 = 3342.5513
2(lnL1-lnL0) = 53.4580     df = 3
P-value = <0.000001

(C) Model selected: TVM+G

-lnL = 2911.3660
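The JC versus F81 comparison in the Modeltest example can be reproduced by hand. The sketch below assumes the two models are nested and that the test statistic is compared against a chi-square distribution with the stated degrees of freedom; the helper names `chi2_sf` and `likelihood_ratio_test` are hypothetical, and the chi-square tail is computed with the standard closed form for integer degrees of freedom so that only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum of Poisson-like terms
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: erfc term plus a finite sum over half-integer gamma values
    terms = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms

def likelihood_ratio_test(neg_lnl_null, neg_lnl_alt, df):
    """Return (2(lnL1 - lnL0), P-value) from the two -lnL scores."""
    stat = 2.0 * (neg_lnl_null - neg_lnl_alt)
    return stat, chi2_sf(stat, df)

# Scores taken from the Modeltest example: JC (null) versus F81 (alternative)
stat, p = likelihood_ratio_test(3369.2803, 3342.5513, df=3)
print(f"2(lnL1-lnL0) = {stat:.4f}, df = 3, P = {p:.3g}")
```

The statistic matches the 53.4580 reported in the example, and the P-value falls well below the reported threshold of 0.000001, so the simpler model (equal base frequencies) is rejected.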


Page 4: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

More models of sequence evolution …

Amino acid sequences

- considerably more complicated to model than nucleotide sequences

- some amino acids can replace one another with relatively little effect on the structure and function of the final protein, while other replacements can be functionally devastating

- from the standpoint of the genetic code, some amino acid changes can be made by a single DNA mutation, while others require two or even three changes in the DNA sequence

- in practice, frequencies of all amino acid replacements are tabulated within families of related protein sequences in the databanks, e.g. the PAM and BLOSUM matrices
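The genetic-code point can be checked directly: for any two amino acids, compare every pair of their codons and take the smallest number of differing positions. This is a minimal sketch assuming the standard genetic code; the names `CODONS` and `min_nucleotide_changes` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each one-letter amino acid code to the list of codons that encode it
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(aa, []).append("".join(codon))

def min_nucleotide_changes(aa1, aa2):
    """Smallest number of base changes that turns a codon for aa1 into one for aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in CODONS[aa1]
        for c2 in CODONS[aa2]
    )

if __name__ == "__main__":
    for pair in ("DE", "MW", "MY"):
        print(pair[0], "->", pair[1], min_nucleotide_changes(*pair))
```

Under this table, aspartate to glutamate needs one change, methionine to tryptophan two, and methionine to tyrosine three. Empirical matrices such as PAM and BLOSUM capture more than this codon distance, since they also reflect which replacements selection tolerates.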


Page 5: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

Phylogenetic Inference II

Disclaimers

Before describing any theoretical or practical aspects of phylogenetics, it is necessary to give some disclaimers. This area of computational biology is an intellectual minefield!

Neither the theory nor the practical applications of any algorithms are universally accepted throughout the scientific community.

Applying different software packages to the same data set is very likely to give different answers; minor changes to a data set are also likely to change the result profoundly.


Page 6: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

CS 177 Phylogenetics II

Tree building methods: some examples

Assessing phylogenetic data

Popular phylogenetic software packages


Page 7: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

Phylogenetic Inference II

Are there correct trees?

[Figure with the labels "helix" and "sheet"]

Despite all of these problems, it is actually quite simple to use computer programs to calculate phylogenetic trees for data sets.

Provided the data are clean, outgroups are correctly specified, appropriate algorithms are chosen, no assumptions are violated, etc., can the true, correct tree be found and proven to be scientifically valid?

Unfortunately, it is impossible ever to state conclusively what the "true" tree is for a group of sequences (or a group of organisms); taxonomy is constantly under revision as new data are gathered.


Page 8: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

Phenetics versus cladistics

Phenetic methods construct trees (phenograms) by considering the current states of characters without regard to the evolutionary history that brought the species to their current phenotypes; phenograms are based on overall similarity.

Cladistic methods construct trees (cladograms) that rely on assumptions about ancestral relationships as well as on current data; cladograms are based on character evolution (e.g. shared derived characters).


Page 9: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

Tree building methods

Data type: genetic distance / character-state

Computational method: optimality criterion / clustering algorithm

                COMPUTATIONAL METHOD
DATA TYPE       Optimality criterion                                     Clustering algorithm
Characters      PARSIMONY                                                -
Distances       MINIMUM EVOLUTION; LEAST SQUARES (FITCH & MARGOLIASH)    UPGMA; NEIGHBOR-JOINING


Page 10: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

Tree building (distance based)

UPGMA

- The simplest of the distance methods is the UPGMA (Unweighted Pair Group Method using Arithmetic averages)

- Many multiple alignment programs such as PILEUP use a variant of UPGMA to create a dendrogram of DNA sequences which is then used to guide the multiple alignment algorithm


Page 11: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

UPGMA

      A    B    C    D    E    F    G
A     -
B    63    -
C    94   79    -
D   111   96   47    -
E    67   16   83  100    -
F    23   58   89  106   62    -
G   107   92   43   20   96  102    -
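For comparison, textbook UPGMA can be run on this seven-taxon matrix in a few lines. The sketch below is an assumption-laden illustration, not the procedure used to draw the slides: it always joins the closest pair of clusters and updates distances as size-weighted arithmetic averages. Under that rule B and E (distance 16) are joined first, so the merge order and intermediate distances need not match the reduced matrices shown on the following pages. The function name `upgma` and the nested-parenthesis cluster names are hypothetical.

```python
from itertools import combinations

LABELS = "ABCDEFG"
# Lower triangle of the distance matrix, one row per taxon
LOWER = [
    [],
    [63],
    [94, 79],
    [111, 96, 47],
    [67, 16, 83, 100],
    [23, 58, 89, 106, 62],
    [107, 92, 43, 20, 96, 102],
]

def upgma(labels, lower):
    """Return the list of (new cluster name, distance at which it was joined)."""
    dist = {}
    for i, row in enumerate(lower):
        for j, d in enumerate(row):
            dist[frozenset((labels[i], labels[j]))] = float(d)
    size = {name: 1 for name in labels}
    merges = []
    while len(size) > 1:
        # Join the two clusters with the smallest distance
        a, b = min(combinations(sorted(size), 2), key=lambda p: dist[frozenset(p)])
        d_ab = dist[frozenset((a, b))]
        new = f"({a},{b})"
        for other in size:
            if other not in (a, b):
                # Size-weighted mean equals the mean over all member pairs
                dist[frozenset((new, other))] = (
                    size[a] * dist[frozenset((a, other))]
                    + size[b] * dist[frozenset((b, other))]
                ) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
        merges.append((new, d_ab))
    return merges

merges = upgma(LABELS, LOWER)
for name, d in merges:
    print(f"{name}  joined at distance {d}, node height {d / 2}")
```

Each node is placed at half the joining distance, which is what makes a UPGMA tree ultrametric (all tips equally far from the root).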


Page 12: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

UPGMA

      A    B    C    D    E    F    G
A     -
B    63    -
C    94   79    -
D   111   96   47    -
E    67   16   83  100    -
F    23   58   89  106   62    -
G   107   92   43   20   96  102    -

[Figure: tree so far, with leaves G and D joined]


Page 13: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

UPGMA

      A    B    C    E    F   DG
A     -
B    63    -
C    94   79    -
E    67   16   83    -
F    23   58   89   62    -
DG   94   84   35   88   94    -

[Figure: tree so far, with leaves G, D and C]


Page 14: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

UPGMA

       A    B    E    F  CDG
A      -
B     63    -
E     67   16    -
F     23   58   62    -
CDG   61   64   61   74    -

[Figure: tree so far, with leaves G, D, C, A and F]


Page 15: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

UPGMA

      AF    B    E  CDG
AF     -
B     98    -
E    106   16    -
CDG  112   64   61    -

[Figure: tree so far, with leaves B, E, G, D, C, A and F]


Page 16: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

UPGMA

      AF   BE  CDG
AF     -
BE   188    -
CDG  112  108    -

[Figure: final tree with leaves B, E, G, D, C, A and F, drawn twice, once with the root marked]


Page 17: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

Maximum Parsimony (MP)

[Figure: example alignment of the sequences outgroup, a, b and c]

- Parsimony involves evaluating all possible trees for each vertical column of sequence characters (nucleotide position)

- only informative sites are considered, i.e. sites at which at least two different character states each occur in at least two sequences

- each tree is given a score based on the number of evolutionary changes needed to explain the observed data

- finally, the trees that require the smallest number of changes overall for all sequence positions (the shortest trees) are identified

[Figure: the three possible trees (I, II, III) for a single site with a = A, b = A, c = T and outgroup = T; internal nodes are labelled with the possible states A/T]
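The per-site scoring can be sketched with Fitch's algorithm, which counts the minimum number of changes a tree needs for one alignment column. The example uses the site from the figure (a = A, b = A, c = T, outgroup = T). Which topology the slides call I, II or III is an assumption here, as are the names `fitch`, `STATES` and `TREES`; trees are written as nested tuples with an arbitrary root, which does not affect the parsimony score.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one site."""
    if isinstance(tree, str):
        # A leaf: its state is observed, no change needed
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change at this node
        return common, left_changes + right_changes
    # Children disagree: keep the union and count one change
    return left_set | right_set, left_changes + right_changes + 1

STATES = {"a": "A", "b": "A", "c": "T", "outgroup": "T"}
TREES = {
    "I": (("a", "b"), ("c", "outgroup")),
    "II": (("a", "c"), ("b", "outgroup")),
    "III": (("a", "outgroup"), ("b", "c")),
}

scores = {name: fitch(tree, STATES)[1] for name, tree in TREES.items()}
print(scores)
```

The tree that groups a with b explains the site with a single change, while the other two need two changes each, so this site favours the first tree. Summing such scores over all informative sites gives the tree length used to rank trees.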

Page 18: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

Maximum Likelihood (ML)

[Figure: example alignment of the sequences outgroup, a, b and c]

- Maximum Likelihood uses probability calculations based on a specific model of sequence evolution to find the tree that best accounts for the variation in a set of sequences

- all possible trees are considered for each nucleotide position

- the fewer mutations needed to fit a tree to the data, the more likely the tree tends to be

- ML resembles MP in that the tree requiring the fewest changes will usually be the most likely

- however, ML evaluates trees using explicit evolutionary models

- thus, the method can be used to explore relationships among more diverse taxa

[Figure: the three possible trees (I, II, III) for a single site with a = A, b = A, c = T and outgroup = T; internal nodes are labelled with the possible states A/T]
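The likelihood of one site on a given tree can be sketched with Felsenstein's pruning algorithm. This is a simplified illustration under stated assumptions: the Jukes-Cantor model, equal base frequencies, and every branch fixed at the same length `t` (0.1 expected substitutions per site), whereas real ML programs optimize branch lengths and model parameters. The tree numbering and all function names are hypothetical.

```python
import math

BASES = "ACGT"

def jc69_prob(i, j, t):
    """Jukes-Cantor probability of base i becoming base j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial(tree, states, t):
    """For each base at the top of `tree`, the probability of the tips below it."""
    if isinstance(tree, str):
        return {x: 1.0 if x == states[tree] else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child in tree:
        below = partial(child, states, t)
        for x in BASES:
            # Sum over the unknown state at the child end of the branch
            result[x] *= sum(jc69_prob(x, y, t) * below[y] for y in BASES)
    return result

def site_likelihood(tree, states, t=0.1):
    """Likelihood of one alignment column, averaging over the root state."""
    cond = partial(tree, states, t)
    return sum(0.25 * cond[x] for x in BASES)

STATES = {"a": "A", "b": "A", "c": "T", "outgroup": "T"}
TREES = {
    "I": (("a", "b"), ("c", "outgroup")),
    "II": (("a", "c"), ("b", "outgroup")),
    "III": (("a", "outgroup"), ("b", "c")),
}
likelihoods = {name: site_likelihood(tree, STATES) for name, tree in TREES.items()}
print(likelihoods)
```

With these settings the tree that needs only one change at this site also has the highest likelihood, in line with the resemblance to parsimony. The full likelihood of a tree is the product of such per-site values (in practice, the sum of their logarithms) over all alignment columns.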

Page 19: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

Computational methods for finding optimal trees

Possible evolutionary trees

Number of unrooted trees for n taxa: $(2n-5)! / (2^{n-3}(n-3)!)$

Taxa (n)    Unrooted trees
2           1
3           1
4           3
5           15
6           105
7           945
8           10,395
9           135,135
10          2,027,025
...
30          8.69 x 10^36
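The growth in the number of unrooted trees can be reproduced with exact integer arithmetic. This is a minimal sketch of the formula (2n-5)!/(2^(n-3)(n-3)!); the function name `unrooted_tree_count` is hypothetical, and the case n = 2 is handled separately because the formula only applies from three taxa upwards.

```python
import math

def unrooted_tree_count(n):
    """Number of distinct unrooted bifurcating trees for n labelled taxa."""
    if n < 2:
        raise ValueError("need at least two taxa")
    if n == 2:
        return 1  # a single branch joining the two taxa
    return math.factorial(2 * n - 5) // (2 ** (n - 3) * math.factorial(n - 3))

if __name__ == "__main__":
    for n in list(range(2, 11)) + [30]:
        print(n, f"{unrooted_tree_count(n):,}")
```

Because Python integers have arbitrary precision, the count for large n is exact; it can be formatted with `f"{value:.2e}"` when only the order of magnitude matters.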


Page 20: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

Computational methods for finding optimal trees

Exact algorithms

- Guaranteed to find the optimal or “best” tree for the method of choice

- Two types used in tree building:

Exhaustive search: evaluates all possible unrooted trees, choosing the one with the best score for the method

Branch-and-bound search: eliminates parts of the search tree that can only contain suboptimal solutions

Heuristic algorithms

- Approximate or “quick-and-dirty” methods that attempt to find the optimal tree for the method of choice, but cannot guarantee to do so

- Often operate by “hill-climbing” methods
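The two exact strategies can be compared on a toy problem. The sketch below assumes parsimony as the optimality criterion and builds every tree by adding taxa one at a time to each possible branch. The bound relies on the fact that adding a taxon can never shorten a tree, so a partial tree that is already longer than the best complete tree found so far can be discarded. The alignment is invented for illustration and all names are hypothetical.

```python
# Hypothetical aligned sequences, invented for illustration (not from the slides)
ALIGNMENT = {
    "outgroup": "AACGTTGCA",
    "a":        "AACGTAGCT",
    "b":        "AACGTAGGT",
    "c":        "ATCGTTGGA",
    "d":        "ATCCTTGCA",
}

def fitch(tree, column):
    """Return (state set, changes) for one column on a tree of nested tuples."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], column), fitch(tree[1], column)
    if ls & rs:
        return ls & rs, lc + rc
    return ls | rs, lc + rc + 1

def tree_length(tree, alignment):
    """Parsimony length: minimum changes summed over all columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch(tree, {t: s[i] for t, s in alignment.items()})[1]
               for i in range(n_sites))

def insertions(tree, leaf):
    """Yield every tree made by attaching `leaf` to one branch of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)

def search(alignment, use_bound):
    taxa = list(alignment)
    root, rest = taxa[0], taxa[1:]
    best = {"length": float("inf"), "trees": [], "evaluated": 0, "complete": 0}

    def extend(subtree, remaining):
        tree = (root, subtree)
        length = tree_length(tree, alignment)
        best["evaluated"] += 1
        if use_bound and length > best["length"]:
            return  # bound: no completion of this partial tree can be optimal
        if remaining:
            for bigger in insertions(subtree, remaining[0]):
                extend(bigger, remaining[1:])
            return
        best["complete"] += 1
        if length < best["length"]:
            best["length"], best["trees"] = length, [tree]
        elif length == best["length"]:
            best["trees"].append(tree)

    extend((rest[0], rest[1]), rest[2:])
    return best

exhaustive = search(ALIGNMENT, use_bound=False)
bounded = search(ALIGNMENT, use_bound=True)
print(exhaustive["length"], exhaustive["complete"], bounded["evaluated"])
```

Both searches return the same shortest trees; the bounded search simply visits no more trees than the exhaustive one. With five taxa the saving is small, but it grows quickly as taxa are added.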


Page 21: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

Heuristic algorithms

[Figure: two search landscapes, one for a search for the global minimum and one for a search for the global maximum, each marking the global optimum and a local minimum or local maximum]

Heuristic search algorithms are input order dependent and can get stuck in local minima or maxima

Rerunning heuristic searches using different input orders of taxa can help find global minima or maxima

From NHGRI lecture, C.-B. Stewart
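The effect of the starting point on a hill-climbing search can be shown with a toy one-dimensional landscape. This is only an analogy under stated assumptions: the scores are hypothetical, positions stand in for trees, and moving to a neighbouring position stands in for a small rearrangement of the tree. The names `hill_climb` and `LANDSCAPE` are invented for the example.

```python
def hill_climb(scores, start):
    """Move to the better neighbour until neither neighbour improves the score."""
    pos = start
    while True:
        neighbours = [i for i in (pos - 1, pos + 1) if 0 <= i < len(scores)]
        if not neighbours:
            return pos
        best = max(neighbours, key=lambda i: scores[i])
        if scores[best] <= scores[pos]:
            return pos  # no neighbour is better: a local (maybe global) maximum
        pos = best

# Hypothetical scores with local maxima at positions 1 and 7, global maximum at 4
LANDSCAPE = [1, 3, 2, 5, 9, 4, 6, 7, 3]

# Restart the search from several starting points and keep the best result
peaks = {start: hill_climb(LANDSCAPE, start) for start in (0, 2, 8)}
best_overall = max(peaks.values(), key=lambda i: LANDSCAPE[i])
print(peaks, best_overall)
```

Starting at position 0 or 8 ends on a local maximum, and only the start at position 2 reaches the global one, which is why repeating a heuristic search from different starting points (for example, different input orders of taxa) is worthwhile.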


Page 22: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

Assessing Phylogenetic Data

Most data include potentially misleading evidence of relationships

One should not only construct phylogenetic hypotheses but also assess what ‘confidence’ can be placed in these hypotheses

Question:

How much support is there for a particular clade?


Page 23: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

Assessing Phylogenetic Data

How much support is there for a particular clade?

[Figure: two trees of Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Spirostomum, Tetrahymena, Euplotes, Tracheloraphis and Gruberia; the first is annotated with the support values 71, 26, 16, 59, 16 and 21, the second shows only the values 71 and 59]

Bootstrapping/Jack-knifing:

Many randomized data sets are produced by sampling the real data with replacement (or, in jackknifing, by removing some random proportion of the data); a tree is built from each data set, and the frequencies with which groups occur are a measure of support for those groups

Problems:

- Bootstrap proportions aren’t easily interpretable

- they give no indication of how good the data are, only of how well the tree fits the data
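The resampling step can be sketched for the simplest case of four taxa. This is a minimal illustration, assuming a hypothetical alignment and a deliberately simple tree-building rule: each informative column votes for one of the three possible groupings, and the grouping with the most votes wins the replicate (ties are left uncounted). Real analyses rebuild a full tree for each replicate with the chosen method; all names below are invented for the example.

```python
import random
from collections import Counter

# Hypothetical aligned sequences, invented for illustration (not from the slides)
SEQ_A   = "ACGTACAGTACG"
SEQ_B   = "ACGTACCTCACG"
SEQ_C   = "TGACGTAGCACG"
SEQ_OUT = "TGACGTCTTACG"
COLUMNS = list(zip(SEQ_A, SEQ_B, SEQ_C, SEQ_OUT))

def best_grouping(columns):
    """Return the grouping supported by the most columns, or None on a tie."""
    votes = Counter()
    for a, b, c, out in columns:
        if a == b and c == out and a != c:
            votes["(a,b)"] += 1
        elif a == c and b == out and a != b:
            votes["(a,c)"] += 1
        elif a == out and b == c and a != b:
            votes["(b,c)"] += 1
    ranked = votes.most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]

def bootstrap_support(columns, n_reps=200, seed=1):
    """Proportion of bootstrap replicates in which each grouping is recovered."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    counts = Counter()
    for _ in range(n_reps):
        # Sample as many columns as the original alignment, with replacement
        sample = [rng.choice(columns) for _ in columns]
        winner = best_grouping(sample)
        if winner is not None:
            counts[winner] += 1
    return {g: counts[g] / n_reps for g in ("(a,b)", "(a,c)", "(b,c)")}

support = bootstrap_support(COLUMNS)
print(support)
```

The proportions play the same role as the numbers on the branches of a bootstrap tree: they show how consistently the data recover a group, not whether the group is true.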


Page 24: Models of sequence evolution GTR HKY Jukes-Cantor Felsenstein K2P Tree building methods: some examples Assessing phylogenetic data Popular phylogenetic

Popular phylogenetic software packages

Review available at: http://evolution.genetics.washington.edu/phylip/software.html
