Source: evolution.gs.washington.edu/gs570/2012/week5.pdf


Week 5: Distance methods, DNA and protein models

Genome 570

February, 2012

Week 5: Distance methods, DNA and protein models – p.1/34


Kimura 2-parameter model

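[Figure: the four bases A and G (purines) and C and T (pyrimidines). The transitions A ↔ G and C ↔ T occur at rate α; each of the four transversions between a purine and a pyrimidine occurs at rate β.]

The simplest, most symmetrical model that has different rates for transitions than for transversions.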


Parameters of K2P in terms of transition/transversion ratio

If we let R be the expected ratio of transition changes to transversions, it turns out that

$$\alpha = \frac{R}{R+1}, \qquad \beta = \left(\frac{1}{2}\right)\frac{1}{R+1}$$
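As a small illustration of this parameterization, the sketch below builds the K2P rate matrix from R. The function names and the base order A, G, C, T are choices made here for illustration, not something the slides prescribe. Exact fractions are used so the entries can be compared by eye.

```python
from fractions import Fraction

PURINES = {"A", "G"}

def k2p_rates(R):
    """Return (alpha, beta) for transition/transversion ratio R."""
    R = Fraction(R)
    alpha = R / (R + 1)                  # rate of the single possible transition
    beta = Fraction(1, 2) / (R + 1)      # rate of each of the two transversions
    return alpha, beta

def k2p_rate_matrix(R, order="AGCT"):
    """Rate matrix with rows = 'from' base, columns = 'to' base."""
    alpha, beta = k2p_rates(R)
    matrix = []
    for i in order:
        row = []
        for j in order:
            if i == j:
                row.append(-(alpha + 2 * beta))   # minus the total rate of leaving i
            elif (i in PURINES) == (j in PURINES):
                row.append(alpha)                 # transition (same chemical class)
            else:
                row.append(beta)                  # transversion
        matrix.append(row)
    return matrix

if __name__ == "__main__":
    for row in k2p_rate_matrix(2):
        print([str(x) for x in row])
```

With these values α + 2β = 1, so each base changes at total rate 1, and α/(2β) = R.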


Transition [sic] probabilities for K2P

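Computing the net probabilities of change (“transition” in the stochastic process sense rather than as molecular biologists use it), it turns out that

$$\mathrm{Prob}(\text{transition} \mid t) = \frac{1}{4} - \frac{1}{2}\exp\left(-\frac{2R+1}{R+1}\,t\right) + \frac{1}{4}\exp\left(-\frac{2}{R+1}\,t\right)$$

$$\mathrm{Prob}(\text{transversion} \mid t) = \frac{1}{2} - \frac{1}{2}\exp\left(-\frac{2}{R+1}\,t\right).$$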
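A minimal sketch of these two formulas, assuming t is the branch length in expected changes per site. The function name `k2p_change_probs` is hypothetical.

```python
import math

def k2p_change_probs(t, R):
    """Return (P, Q): probabilities that a site differs by a transition
    or by a transversion after branch length t under K2P."""
    slow = math.exp(-2.0 / (R + 1) * t)              # exp(-2t/(R+1))
    fast = math.exp(-(2.0 * R + 1) / (R + 1) * t)    # exp(-(2R+1)t/(R+1))
    P = 0.25 - 0.5 * fast + 0.25 * slow
    Q = 0.5 - 0.5 * slow
    return P, Q

if __name__ == "__main__":
    for t in (0.0, 0.2, 1.0, 3.0):
        P, Q = k2p_change_probs(t, R=2)
        print(f"t={t:3.1f}  transitions={P:.4f}  transversions={Q:.4f}")
```

At t = 0 both probabilities are 0, and as t grows they approach 1/4 and 1/2, the values expected for two unrelated sequences with equal base frequencies.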


Transition and transversion when R = 10

[Figure: for R = 10, curves of total differences, transitions, and transversions plotted against time (branch length), from 0.0 to 3.0. Vertical axis: differences, from 0.00 to 0.60.]


Transition and transversion when R = 2

[Figure: for R = 2, curves of total differences, transitions, and transversions plotted against time (branch length), from 0.0 to 3.0. Vertical axis: differences, from 0.00 to 0.70.]


ML estimates for the K2P model

The sufficient statistics for comparison of DNA sequences under the K2P model are simply the observed fractions P and Q of transitions and transversions. Then the maximum likelihood estimates of the branch length t and of R are obtained by finding the values that predict exactly those fractions. That is done by solving the equations given three slides ago.

$$t = -\frac{1}{4}\ln\left[(1 - 2Q)\,(1 - 2P - Q)^2\right]$$

$$R = \frac{-\ln(1 - 2P - Q)}{-\ln(1 - 2Q)} - \frac{1}{2}$$
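The two closed-form estimates can be sketched as follows. The function name and the error handling are assumptions made here; the slides give only the formulas.

```python
import math

def k2p_ml_estimates(P, Q):
    """Estimate (t, R) from observed fractions of transition (P) and
    transversion (Q) differences under K2P."""
    a = 1.0 - 2.0 * P - Q
    b = 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("sequences too divergent: logarithm undefined")
    if Q == 0.0:
        raise ValueError("no transversions observed: R cannot be estimated")
    t = -0.25 * math.log(b * a * a)
    R = math.log(a) / math.log(b) - 0.5   # same as (-ln a)/(-ln b) - 1/2
    return t, R

if __name__ == "__main__":
    # 62 transition and 34 transversion differences out of 500 sites
    print(k2p_ml_estimates(62 / 500, 34 / 500))
```

The counts in the usage line are read off the simulated data table that appears later in these slides (off-diagonal A/G and C/T cells for transitions, the remaining off-diagonal cells for transversions). They give roughly t ≈ 0.23 and R ≈ 2.1, near the simulated values of 0.2 and 2.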


Likelihood for two species under the K2P model

$$L = \mathrm{Prob}(\text{data} \mid t, R) = \left(\frac{1}{4}\right)^{n} (1 - P - Q)^{\,n - n_1 - n_2}\; P^{\,n_1} \left(\frac{1}{2}Q\right)^{n_2}$$

where n1 is the number of sites differing by transitions, and n2 is the number of sites differing by transversions. P and Q are the expected fractions of transition and transversion differences, as given by the expressions four screens above.

This is a function of the two parameters t and R. The values that maximize it were given above, if we estimate both of them. But if R is given rather than estimated, there is no closed-form equation solving for t; it has to be inferred numerically by finding the value of t that maximizes L.
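One way to do that numerical search is a golden-section search on the log-likelihood. This is a sketch of one possible optimizer, not the method the slides use. It assumes the log-likelihood has a single maximum inside the bracket [lo, hi], and both function names are hypothetical.

```python
import math

def k2p_log_likelihood(t, R, n, n1, n2):
    """Log of L for n sites with n1 transition and n2 transversion differences."""
    slow = math.exp(-2.0 / (R + 1) * t)
    fast = math.exp(-(2.0 * R + 1) / (R + 1) * t)
    P = 0.25 - 0.5 * fast + 0.25 * slow
    Q = 0.5 - 0.5 * slow
    return (n * math.log(0.25)
            + (n - n1 - n2) * math.log(1.0 - P - Q)
            + n1 * math.log(P)
            + n2 * math.log(Q / 2.0))

def ml_branch_length(R, n, n1, n2, lo=1e-6, hi=10.0, tol=1e-9):
    """Golden-section search for the t that maximizes the log-likelihood.
    Assumes a single maximum in [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    f = lambda t: k2p_log_likelihood(t, R, n, n1, n2)
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                 # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                       # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2.0

if __name__ == "__main__":
    print(ml_branch_length(R=2.0, n=500, n1=62, n2=34))
```

Working on the log scale avoids underflow when n is large, since L itself is a product of many small probabilities.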


The Tamura/Nei model, F84, and HKY

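$$\begin{array}{c|cccc}
\text{From} \backslash \text{To} & A & G & C & T \\ \hline
A & - & \alpha_R \pi_G/\pi_R + \beta\pi_G & \beta\pi_C & \beta\pi_T \\
G & \alpha_R \pi_A/\pi_R + \beta\pi_A & - & \beta\pi_C & \beta\pi_T \\
C & \beta\pi_A & \beta\pi_G & - & \alpha_Y \pi_T/\pi_Y + \beta\pi_T \\
T & \beta\pi_A & \beta\pi_G & \alpha_Y \pi_C/\pi_Y + \beta\pi_C & -
\end{array}$$

For the F84 model, αR = αY

For the HKY model, αR/αY = πR/πY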
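The table can be turned into code fairly directly. The sketch below assumes πR = πA + πG and πY = πC + πT (the purine and pyrimidine totals) and stores the matrix as a dictionary of dictionaries. That layout and the function name are illustrative choices.

```python
PURINES = ("A", "G")

def tamura_nei_rate_matrix(pi, alpha_R, alpha_Y, beta):
    """Q[i][j] is the rate of change from base i to base j.
    pi maps each of 'A', 'G', 'C', 'T' to its equilibrium frequency."""
    pi_R = pi["A"] + pi["G"]          # total purine frequency (assumed definition)
    pi_Y = pi["C"] + pi["T"]          # total pyrimidine frequency (assumed definition)
    Q = {}
    for i in "AGCT":
        Q[i] = {}
        for j in "AGCT":
            if i == j:
                continue
            rate = beta * pi[j]                       # general (type II) component
            if i in PURINES and j in PURINES:
                rate += alpha_R * pi[j] / pi_R        # purine-to-purine transition
            elif i not in PURINES and j not in PURINES:
                rate += alpha_Y * pi[j] / pi_Y        # pyrimidine-to-pyrimidine transition
            Q[i][j] = rate
        Q[i][i] = -sum(Q[i].values())                 # rows sum to zero
    return Q

if __name__ == "__main__":
    freqs = {"A": 0.3, "G": 0.2, "C": 0.2, "T": 0.3}
    Q = tamura_nei_rate_matrix(freqs, alpha_R=1.2, alpha_Y=0.8, beta=0.5)
    for i in "AGCT":
        print(i, [round(Q[i][j], 4) for j in "AGCT"])
```

Setting alpha_R equal to alpha_Y gives the F84 special case, and choosing them so that alpha_R/alpha_Y = πR/πY gives HKY.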


Transition/transversion ratio for the Tamura-Nei model

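$$T_s = 2\,\alpha_R\,\pi_A\pi_G/\pi_R + 2\,\alpha_Y\,\pi_C\pi_T/\pi_Y + 2\,\beta\,(\pi_A\pi_G + \pi_C\pi_T)$$

$$T_v = 2\,\beta\,\pi_R\pi_Y$$

To get Ts/Tv = R and Ts + Tv = 1,

$$\beta = \frac{1}{2\,\pi_R\pi_Y\,(1 + R)}$$

$$\alpha_Y = \frac{\pi_R\pi_Y R - \pi_A\pi_G - \pi_C\pi_T}{2\,(1 + R)\,(\pi_Y\pi_A\pi_G\,\rho + \pi_R\pi_C\pi_T)}$$

$$\alpha_R = \rho\,\alpha_Y$$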
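A quick numerical check is a useful guard on these expressions. The sketch below computes β, αY and αR from R, ρ and the base frequencies, taking Ts and Tv to be the total equilibrium fluxes of transitions and transversions implied by the Tamura-Nei rate matrix, then confirms that Ts/Tv = R and Ts + Tv = 1. Function names are hypothetical, and πR, πY are assumed to be the purine and pyrimidine totals.

```python
def tamura_nei_parameters(pi, R, rho):
    """Return (alpha_R, alpha_Y, beta) giving Ts/Tv = R and Ts + Tv = 1."""
    pA, pG, pC, pT = pi["A"], pi["G"], pi["C"], pi["T"]
    pR, pY = pA + pG, pC + pT
    beta = 1.0 / (2.0 * pR * pY * (1.0 + R))
    alpha_Y = ((pR * pY * R - pA * pG - pC * pT)
               / (2.0 * (1.0 + R) * (pY * pA * pG * rho + pR * pC * pT)))
    alpha_R = rho * alpha_Y
    return alpha_R, alpha_Y, beta

def ts_tv(pi, alpha_R, alpha_Y, beta):
    """Expected rates of transitions (Ts) and transversions (Tv) at equilibrium."""
    pA, pG, pC, pT = pi["A"], pi["G"], pi["C"], pi["T"]
    pR, pY = pA + pG, pC + pT
    Ts = (2.0 * alpha_R * pA * pG / pR
          + 2.0 * alpha_Y * pC * pT / pY
          + 2.0 * beta * (pA * pG + pC * pT))
    Tv = 2.0 * beta * pR * pY
    return Ts, Tv

if __name__ == "__main__":
    freqs = {"A": 0.3, "G": 0.2, "C": 0.2, "T": 0.3}
    params = tamura_nei_parameters(freqs, R=2.0, rho=1.5)
    Ts, Tv = ts_tv(freqs, *params)
    print(params, Ts / Tv, Ts + Tv)
```

With equal base frequencies and ρ = 1, the A → G rate αR πG/πR + β πG reduces to R/(R+1), the K2P transition rate.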


Using fictional events to mimic the Tamura-Nei model

We imagine two types of events:

Type I:

  If the existing base is a purine, draw a replacement from a purine pool with bases in relative proportions πA : πG. This event has rate αR.

  If the existing base is a pyrimidine, draw a replacement from a pyrimidine pool with bases in relative proportions πC : πT. This event has rate αY.

Type II: No matter what the existing base is, replace it by a base drawn from a pool at the overall equilibrium frequencies: πA : πC : πG : πT. This event has rate β.


Transition [sic] probabilities with the Tamura-Nei model

If the branch starts with a purine:

  No events                   exp(−(αR + β) t)
  Some type I, no type II     exp(−β t) (1 − exp(−αR t))
  Some type II                1 − exp(−β t)

If the branch starts with a pyrimidine:

  No events                   exp(−(αY + β) t)
  Some type I, no type II     exp(−β t) (1 − exp(−αY t))
  Some type II                1 − exp(−β t)


A transition probability

So if we want to compute the probability of getting a G given that a branch starts with an A, we add up

The probability of no events, times 0 (as you can’t get a G from an A with no events)

The probability of “some type I, no type II” times πG/πR (as the last type I event puts in a G with probability equal to the fraction of G’s out of all purines).

The probability of “some type II” times πG (as if there is any type II event, we thereafter have a probability of G equal to its overall expected frequency, and further type I events don’t change that).


A transition probability

So that, for example

$$\mathrm{Prob}(G \mid A, t) = \exp(-\beta t)\,\bigl(1 - \exp(-\alpha_R t)\bigr)\,\frac{\pi_G}{\pi_R} + \bigl(1 - \exp(-\beta t)\bigr)\,\pi_G$$


A more compact expression

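More generally, we can use the Kronecker delta notation δij and the “Watson-Kronecker” notation εij to write

$$\mathrm{Prob}(j \mid i, t) = \exp(-(\alpha_i + \beta)t)\,\delta_{ij} + \exp(-\beta t)\,\bigl(1 - \exp(-\alpha_i t)\bigr)\left(\frac{\pi_j\,\varepsilon_{ij}}{\sum_k \varepsilon_{jk}\,\pi_k}\right) + \bigl(1 - \exp(-\beta t)\bigr)\,\pi_j$$

where δij is 1 if the two bases i and j are the same (0 otherwise), εij is 1 if the two bases i and j are both purines or both pyrimidines (0 otherwise), and αi is αR if base i is a purine and αY if it is a pyrimidine.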
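A sketch of the compact expression in code. It takes δ as the indicator that i and j are the same base, ε as the indicator that i and j are in the same class (both purines or both pyrimidines), and α_i as αR for a purine start and αY for a pyrimidine start. The function name is hypothetical.

```python
import math

PURINES = {"A", "G"}

def tn_transition_prob(i, j, t, pi, alpha_R, alpha_Y, beta):
    """Probability that base i has become base j after branch length t
    under the Tamura-Nei model."""
    delta = 1.0 if i == j else 0.0
    same_class = (i in PURINES) == (j in PURINES)
    eps = 1.0 if same_class else 0.0
    alpha_i = alpha_R if i in PURINES else alpha_Y
    # total frequency of the class (purines or pyrimidines) that j belongs to
    class_freq = sum(f for base, f in pi.items()
                     if (base in PURINES) == (j in PURINES))
    no_events = math.exp(-(alpha_i + beta) * t)
    type_one_only = math.exp(-beta * t) * (1.0 - math.exp(-alpha_i * t))
    some_type_two = 1.0 - math.exp(-beta * t)
    return (no_events * delta
            + type_one_only * pi[j] * eps / class_freq
            + some_type_two * pi[j])

if __name__ == "__main__":
    freqs = {"A": 0.3, "G": 0.2, "C": 0.2, "T": 0.3}
    for i in "AGCT":
        print(i, [round(tn_transition_prob(i, j, 0.5, freqs, 1.2, 0.8, 0.5), 4)
                  for j in "AGCT"])
```

Each printed row sums to 1, since the three event classes are exhaustive and each leaves the base distributed over a set of frequencies that sums to 1.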


Reversibility and the GTR model

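The condition of reversibility in a stochastic process is written in terms of the equilibrium frequencies of the states (πi) and the transition probabilities as

$$\pi_i\,\mathrm{Prob}(j \mid i, t) = \pi_j\,\mathrm{Prob}(i \mid j, t)$$

The general time-reversible model:

$$\begin{array}{c|cccc}
\text{From} \backslash \text{To} & A & G & C & T \\ \hline
A & - & \pi_G\,\alpha & \pi_C\,\beta & \pi_T\,\gamma \\
G & \pi_A\,\alpha & - & \pi_C\,\delta & \pi_T\,\varepsilon \\
C & \pi_A\,\beta & \pi_G\,\delta & - & \pi_T\,\eta \\
T & \pi_A\,\gamma & \pi_G\,\varepsilon & \pi_C\,\eta & -
\end{array}$$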


The GTR model

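[Figure: the four bases A, G, C, T joined by arrows in both directions. Each arrow is labeled with its rate: the coefficient for that pair of bases (α, β, γ, δ, ε or η) times the equilibrium frequency of the base being changed to, for example γπA for T → A and ηπC for T → C.]

It is the most general model possible that still has the changes satisfy the condition of reversibility, so that one cannot tell which direction evolution has gone by examining the sequences before and after.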


Standardizing the rates

$$2\,\pi_A\pi_G\,\alpha + 2\,\pi_A\pi_C\,\beta + 2\,\pi_A\pi_T\,\gamma + 2\,\pi_G\pi_C\,\delta + 2\,\pi_G\pi_T\,\varepsilon + 2\,\pi_C\pi_T\,\eta = 1$$

This is done so that the expected rate of change is 1 per unit time.
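A minimal sketch of building a standardized GTR rate matrix. The six coefficients are passed in a dictionary keyed by base pair ("AG" for α, "AC" for β, "AT" for γ, "GC" for δ, "GT" for ε, "CT" for η). That keying and the function name are assumptions made for this example. The coefficients are rescaled so that the sum above equals 1.

```python
def gtr_rate_matrix(pi, coeff):
    """Return Q[i][j] = pi[j] * coefficient(i, j), rescaled so that the
    expected rate of change is 1 per unit time."""
    # 2 * sum over unordered pairs of pi_i * pi_j * coefficient
    total = sum(2.0 * pi[pair[0]] * pi[pair[1]] * value
                for pair, value in coeff.items())
    Q = {i: {} for i in "AGCT"}
    for i in "AGCT":
        for j in "AGCT":
            if i != j:
                key = i + j if i + j in coeff else j + i   # symmetric coefficient
                Q[i][j] = pi[j] * coeff[key] / total
        Q[i][i] = -sum(Q[i].values())                      # rows sum to zero
    return Q

if __name__ == "__main__":
    equal = {"A": 0.25, "G": 0.25, "C": 0.25, "T": 0.25}
    # transitions weighted four times as heavily as transversions
    k2p_like = {"AG": 2.0, "CT": 2.0, "AC": 0.5, "AT": 0.5, "GC": 0.5, "GT": 0.5}
    Q = gtr_rate_matrix(equal, k2p_like)
    for i in "AGCT":
        print(i, [round(Q[i][j], 4) for j in "AGCT"])
```

With equal base frequencies and these coefficients, the off-diagonal entries come out as 2/3 for transitions and 1/6 for transversions, which is the K2P matrix for R = 2.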


General Time Reversible models – inference

A data example (simulated under a K2P model, true distance 0.2, transition/transversion ratio = 2)

           A      G      C      T   total

A         93     13      3      3     112
G         10    105      3      4     122
C          6      4    113     18     141
T          7      4     21     93     125

total    116    126    140    118     500

The K2P model is a special case of the GTR model (as are all the other models that have been mentioned here). So if all goes well we should infer parameters that come close to specifying a K2P model.


Averaging across the diagonal ...

           A      G      C      T   total

A         93   11.5    4.5      5     114
G       11.5    105    3.5      4     124
C        4.5    3.5    113   19.5   140.5
T          5      4   19.5     93   121.5

total    114    124  140.5  121.5     500


Dividing each column by its sum

(column, because Pij is to be the probability of change from j to i)

$$P = \begin{pmatrix}
0.815789 & 0.0927419 & 0.0320285 & 0.0411523 \\
0.100877 & 0.846774 & 0.024911 & 0.0329218 \\
0.0394737 & 0.0282258 & 0.80427 & 0.160494 \\
0.0438596 & 0.0322581 & 0.13879 & 0.765432
\end{pmatrix}$$
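The two preprocessing steps (averaging the count table across its diagonal, then dividing each column by its sum) can be sketched as follows. The counts are those of the data example. The function and variable names are illustrative.

```python
COUNTS = [            # rows and columns in the order A, G, C, T
    [93, 13, 3, 3],
    [10, 105, 3, 4],
    [6, 4, 113, 18],
    [7, 4, 21, 93],
]

def symmetrize(counts):
    """Average each cell with its mirror image across the diagonal."""
    n = len(counts)
    return [[(counts[i][j] + counts[j][i]) / 2.0 for j in range(n)]
            for i in range(n)]

def column_normalize(table):
    """Divide each column by its sum, so that every column sums to 1."""
    n = len(table)
    col_sums = [sum(table[i][j] for i in range(n)) for j in range(n)]
    return [[table[i][j] / col_sums[j] for j in range(n)] for i in range(n)]

def base_frequencies(table):
    """Observed base frequencies from the column totals of the table."""
    n = len(table)
    grand_total = sum(sum(row) for row in table)
    return [sum(table[i][j] for i in range(n)) / grand_total for j in range(n)]

if __name__ == "__main__":
    averaged = symmetrize(COUNTS)
    for row in column_normalize(averaged):
        print(["%.6f" % x for x in row])
    print(base_frequencies(averaged))
```

Averaging first makes the table symmetric, which is what a reversible model expects of the joint counts of base pairs.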


Rate matrix from the matrix logarithm

If the rate matrix is A,

$$P = e^{A t}$$

so that

$$A t = \log(P) = \begin{pmatrix}
-0.212413 & 0.110794 & 0.034160 & 0.046726 \\
0.120512 & -0.174005 & 0.025043 & 0.035554 \\
0.0421002 & 0.028375 & -0.236980 & 0.205579 \\
0.0498001 & 0.034837 & 0.177778 & -0.287859
\end{pmatrix}.$$
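The slides do not say how the matrix logarithm is computed. One simple possibility, sketched here in plain Python, is the power series log(P) = (P − I) − (P − I)²/2 + (P − I)³/3 − ..., which converges only when P is close enough to the identity (as it is in this example). A library routine would normally be used instead, and the function names below are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_log_series(P, terms=100):
    """Matrix logarithm via the series sum_k (-1)^(k+1) (P - I)^k / k.
    Assumes the series converges (P sufficiently close to the identity)."""
    n = len(P)
    X = [[P[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [[0.0] * n for _ in range(n)]
    for k in range(1, terms + 1):
        power = mat_mul(power, X)                 # (P - I)^k
        sign = 1.0 if k % 2 == 1 else -1.0
        for i in range(n):
            for j in range(n):
                result[i][j] += sign * power[i][j] / k
    return result

P_EXAMPLE = [
    [0.815789, 0.0927419, 0.0320285, 0.0411523],
    [0.100877, 0.846774, 0.024911, 0.0329218],
    [0.0394737, 0.0282258, 0.80427, 0.160494],
    [0.0438596, 0.0322581, 0.13879, 0.765432],
]

if __name__ == "__main__":
    for row in mat_log_series(P_EXAMPLE):
        print(["%.6f" % x for x in row])
```

Because each column of P sums to 1, each column of log(P) should sum to approximately 0, which is a cheap check on the result.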


Standardizing the rates

If we denote by D the diagonal matrix of observed base frequencies, and we require that the rate of (potentially-observable) substitution is 1:

$$-\mathrm{trace}(A D) = 1$$

We get:

$$t = -\mathrm{trace}(A t\, D) = -\mathrm{trace}\bigl(\log(P)\, D\bigr)$$

and that also gives us an estimate of the rate matrix:

$$A = \log(P) \,/\, \bigl(-\mathrm{trace}(\log(P)\, D)\bigr)$$
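As a worked instance of this standardization, the sketch below starts from the log(P) values of the data example. It assumes that the observed base frequencies in D are the column totals of the averaged table divided by 500 (0.228, 0.248, 0.281, 0.243). That choice is an assumption of this example, although it does reproduce the rate estimates shown on the next slide.

```python
LOG_P = [             # log(P) = A t for the data example, order A, G, C, T
    [-0.212413, 0.110794, 0.034160, 0.046726],
    [0.120512, -0.174005, 0.025043, 0.035554],
    [0.0421002, 0.028375, -0.236980, 0.205579],
    [0.0498001, 0.034837, 0.177778, -0.287859],
]
BASE_FREQS = [0.228, 0.248, 0.281, 0.243]   # assumed: averaged-table column totals / 500

def standardize(log_p, freqs):
    """Split log(P) = A t into the distance t and the rate matrix A,
    using -trace(A D) = 1 with D = diag(freqs)."""
    n = len(log_p)
    # trace(log(P) D) only involves the diagonal of log(P)
    t = -sum(log_p[i][i] * freqs[i] for i in range(n))
    A = [[value / t for value in row] for row in log_p]
    return t, A

if __name__ == "__main__":
    t, A = standardize(LOG_P, BASE_FREQS)
    print("t =", round(t, 6))
    for row in A:
        print(["%.6f" % x for x in row])
```

The estimated distance comes out near 0.228, somewhat above the true simulated distance of 0.2.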


The rate estimates

$$A = \begin{pmatrix}
-0.931124 & 0.485671 & 0.149741 & 0.204826 \\
0.528274 & -0.762764 & 0.109776 & 0.155852 \\
0.184549 & 0.124383 & -1.038820 & 0.901168 \\
0.218302 & 0.152710 & 0.779302 & -1.261850
\end{pmatrix}.$$

moderately close to the actual K2P rate matrix used in the simulation, which was:

$$A = \begin{pmatrix}
-1 & 2/3 & 1/6 & 1/6 \\
2/3 & -1 & 1/6 & 1/6 \\
1/6 & 1/6 & -1 & 2/3 \\
1/6 & 1/6 & 2/3 & -1
\end{pmatrix}$$

(but if any of the eigenvalues of P are negative, the logarithm cannot be taken, so this doesn’t work and the divergence time is estimated to be infinite).


The lattice of these models

[Figure: lattice of nested models, most general at the top, with the number of parameters in parentheses]

General 12-parameter model (12)
General time-reversible model (9)
Tamura-Nei (6)
HKY (5), F84 (5)
Kimura K2P (2)
Jukes-Cantor (1)

There are a great many other models, but these are the most commonly used. They allow for unequal expected frequencies of bases, and for inequalities of transitions and transversions.


Variation of rates of evolution across sites

The basic way all these models deal with rates of evolution is using the fact that a rate r times as fast is exactly equivalent to evolving along a branch that is r times as long, so that it is easy to calculate the transition probabilities with rate r:

$$P_{ij}(r, t) = P_{ij}(r\,t)$$

The likelihood for a pair of sequences averages over all rates at each site, using the density function f(r) for the distribution of rates:

$$L(t) = \prod_{i=1}^{\text{sites}} \left( \int_0^\infty f(r)\; \pi_{n_i}\, P_{m_i n_i}(r\,t)\; dr \right)$$


The Gamma distribution

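$$f(x) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\; x^{\alpha - 1}\, e^{-x/\beta}$$

$$E[x] = \alpha\,\beta$$

$$\mathrm{Var}[x] = \alpha\,\beta^2$$

To get a mean of 1, set β = 1/α so that

$$f(r) = \frac{\alpha^{\alpha}}{\Gamma(\alpha)}\; r^{\alpha - 1}\, e^{-\alpha r}$$

so that the squared coefficient of variation is 1/α.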
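A quick numerical check of the mean-1 parameterization, using a simple midpoint rule. The integration limits and step count are arbitrary choices for this sketch, and the midpoint rule is only reliable for α ≥ 1, where the density is finite at r = 0.

```python
import math

def gamma_rate_density(r, alpha):
    """Gamma density with shape alpha and mean 1 (scale beta = 1/alpha)."""
    return (alpha ** alpha / math.gamma(alpha)
            * r ** (alpha - 1.0) * math.exp(-alpha * r))

def rate_moments(alpha, upper=40.0, steps=100000):
    """Mean and variance of the rate, by midpoint-rule integration on [0, upper]."""
    h = upper / steps
    mean = 0.0
    second = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        w = gamma_rate_density(r, alpha) * h
        mean += r * w
        second += r * r * w
    return mean, second - mean * mean

if __name__ == "__main__":
    mean, var = rate_moments(4.0)
    print(mean, var)   # variance equals CV squared here, because the mean is 1
```

For α = 4 the variance should be close to 1/4, matching a coefficient of variation of 1/2.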


Gamma distributions

Here are density functions of Gamma distributions for three different values of α:

[Figure: Gamma density curves for α = 1/4 (CV = 2), α = 1 (CV = 1), and α = 4 (CV = 1/2). Horizontal axis: rate, from 0 to 3.]


Gamma rate variation in the Jukes-Cantor model

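For example, for the Jukes-Cantor distance, to get the fraction of sites different we do

$$D_S = \int_0^\infty f(r)\; \frac{3}{4}\left(1 - e^{-\frac{4}{3}\, r\, u t}\right) dr$$

leading to the formula for D (= ut) as a function of DS

$$D = -\frac{3}{4}\,\alpha \left[\, 1 - \left(1 - \frac{4}{3} D_S\right)^{-1/\alpha} \right]$$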
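The distance formula and its forward counterpart can be sketched as a pair of functions. The forward expression used below, D_S = (3/4)(1 − (1 + 4D/(3α))^(−α)), is what the integral evaluates to for the mean-1 Gamma density. The function names are hypothetical.

```python
def jc_gamma_fraction_different(D, alpha):
    """Expected fraction of differing sites for distance D under
    Jukes-Cantor with Gamma-distributed rates (shape alpha, mean 1)."""
    return 0.75 * (1.0 - (1.0 + 4.0 * D / (3.0 * alpha)) ** (-alpha))

def jc_gamma_distance(D_S, alpha):
    """Distance D estimated from the observed fraction D_S of differing sites."""
    if D_S >= 0.75:
        raise ValueError("fraction of differences must be below 3/4")
    return -0.75 * alpha * (1.0 - (1.0 - 4.0 * D_S / 3.0) ** (-1.0 / alpha))

if __name__ == "__main__":
    for alpha in (0.25, 1.0, 4.0):
        print(alpha, jc_gamma_distance(0.3, alpha))
```

For a fixed observed fraction, smaller α (more rate variation) gives a larger estimated distance. As α grows large, the result approaches the ordinary Jukes-Cantor distance −(3/4) ln(1 − (4/3) D_S).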


Gamma rate variation in other models

For many other distances such as the Tamura-Nei family, the transition probabilities are of the form

$$P_{ij}(t) = A_{ij} + B_{ij}\, e^{-b t} + C_{ij}\, e^{-c t}$$

and integrating termwise we can make use of the fact that

$$E_r\left[ e^{-b\, r\, t} \right] = \left(1 + \frac{1}{\alpha}\, b\, t\right)^{-\alpha}$$

and just use that to replace e^{−bt} in the formulas for the transition probabilities.
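As an illustration of the replacement rule, the sketch below applies it to the K2P probabilities of transition and transversion differences given earlier in these slides, and includes a brute-force numerical integral to compare against the closed form. The function names, integration limits and step count are choices made for this example.

```python
import math

def gamma_mean_exp(b, t, alpha):
    """Closed form for E_r[exp(-b r t)] when r is Gamma(shape alpha, mean 1)."""
    return (1.0 + b * t / alpha) ** (-alpha)

def gamma_mean_exp_numeric(b, t, alpha, upper=40.0, steps=100000):
    """The same expectation by midpoint-rule integration (intended for alpha >= 1)."""
    h = upper / steps
    norm = alpha ** alpha / math.gamma(alpha)
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        density = norm * r ** (alpha - 1.0) * math.exp(-alpha * r)
        total += density * math.exp(-b * r * t) * h
    return total

def k2p_change_probs_gamma(t, R, alpha):
    """K2P probabilities of transition (P) and transversion (Q) differences,
    with each exponential replaced by its Gamma average."""
    fast = gamma_mean_exp((2.0 * R + 1.0) / (R + 1.0), t, alpha)
    slow = gamma_mean_exp(2.0 / (R + 1.0), t, alpha)
    P = 0.25 - 0.5 * fast + 0.25 * slow
    Q = 0.5 - 0.5 * slow
    return P, Q

if __name__ == "__main__":
    print(gamma_mean_exp(1.5, 0.7, 2.0), gamma_mean_exp_numeric(1.5, 0.7, 2.0))
    print(k2p_change_probs_gamma(0.2, 2.0, 0.5))
```

When α is very large the Gamma average tends to the plain exponential, so the result falls back to the model without rate variation.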


Dayhoff’s PAM001 matrix

         A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W
       ala  arg  asn  asp  cys  gln  glu  gly  his  ile  leu  lys  met  phe  pro  ser  thr  trp

A ala 9867    2    9   10    3    8   17   21    2    6    4    2    6    2   22   35   32    0
R arg    1 9913    1    0    1   10    0    0   10    3    1   19    4    1    4    6    1    8
N asn    4    1 9822   36    0    4    6    6   21    3    1   13    0    1    2   20    9    1
D asp    6    0   42 9859    0    6   53    6    4    1    0    3    0    0    1    5    3    0
C cys    1    1    0    0 9973    0    0    0    1    1    0    0    0    0    1    5    1    0
Q gln    3    9    4    5    0 9876   27    1   23    1    3    6    4    0    6    2    2    0
E glu   10    0    7   56    0   35 9865    4    2    3    1    4    1    0    3    4    2    0
G gly   21    1   12   11    1    3    7 9935    1    0    1    2    1    1    3   21    3    0
H his    1    8   18    3    1   20    1    0 9912    0    1    1    0    2    3    1    1    1
I ile    2    2    3    1    2    1    2    0    0 9872    9    2   12    7    0    1    7    0
L leu    3    1    3    0    0    6    1    1    4   22 9947    2   45   13    3    1    3    4
K lys    2   37   25    6    0   12    7    2    2    4    1 9926   20    0    3    8   11    0
M met    1    1    0    0    0    2    0    0    0    5    8    4 9874    1    0    1    2    0
F phe    1    1    1    0    0    0    0    1    2    8    6    0    4 9946    0    2    1    3
P pro   13    5    2    1    1    8    3    2    5    1    2    2    1    1 9926   12    4    0
S ser   28   11   34    7   11    4    6   16    2    2    1    7    4    3   17 9840   38    5
T thr   22    2   13    4    1    3    2    2    1   11    2    8    6    1    5   32 9871    0
W trp    0    2    0    0    0    0    0    0    0    0    0    0    0    1    0    1    0 9976
Y tyr    1    0    3    0    3    0    1    0    4    1    1    0    0   21    0    1    1    2
V val   13    2    1    1    3    2    2    3    3   57   11    1   17    1    3    2   10    0


The codon model

[Figure: part of the genetic code table, indexed by the bases U, C, A, G at each codon position. Codons shown: UUU, UUC (phe); UUA, UUG, CUU, CUC, CUA, CUG (leu); AUU, AUC, AUA (ile); AUG (met); GUU, GUC, GUA, GUG (val); UCA (ser); UAA, UGA (stop).]

Probabilities of change vary depending on whether the amino acid is changing, and to what.

Goldman & Yang, MBE 1994; Muse and Gaut, MBE 1994


A codon-based model of protein evolution

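In each cell:

$$P_{ij}\; a_{ij}^{(v)}$$

where P_ij is the probability of codon change and a_ij^(v) is the probability that the change is accepted.

[Figure: a matrix of codon-to-codon changes, with rows and columns labeled by the codons AAA, AAC, AAG, AAT, ACA, ACC, ACG, ACT, AGA, AGC, AGG, ... and their amino acids (lys, asn, lys, asn, thr, thr, thr, thr, arg, ser, arg). A row labeled “observation:” carries the values 0 1 0 1 0 0 0 0 0 0 0 for these codons.]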


Considerations for a protein model

Making a model for protein evolution (a not-very-practical approach)

Use a good model of DNA evolution.

Use the appropriate genetic code.

When an amino acid changes, accept it with probability that declines as the amino acids become more different.

Fit this to empirical information on protein evolution.

Take into account variation of rate from site to site.

Take into account correlation of rates in adjacent sites.

How about protein structure? Secondary structure? 3D structure?

(the first four steps are the “codon model” of Goldman and Yang, 1994 and Muse and Gaut, 1994, both in Molecular Biology and Evolution. The next two are the rate variation machinery of Yang, 1995, 1996 and Felsenstein and Churchill, 1996).
