Dayhoff’s Markov Model of Evolution

Upload: taariq

Post on 06-Feb-2016

Page 1: Dayhoff’s Markov Model of Evolution

Dayhoff’s Markov Model of Evolution

Page 2: Dayhoff’s Markov Model of Evolution

Brands of Soup Revisited

Brand A Brand B

P(B|A) = 2/7

P(A|B) = 2/7

Page 7: Dayhoff’s Markov Model of Evolution

Brands of Soup Revisited

Transition Diagram

Brand A → Brand B: P(B|A) = p = 2/7

Brand B → Brand A: P(A|B) = p = 2/7

Conditional Probability Formulas

$$P(A_k) = P(A_{k-1})\,(1-p) + P(B_{k-1})\,p = \tfrac{5}{7}\,P(A_{k-1}) + \tfrac{2}{7}\,P(B_{k-1})$$

$$P(B_k) = P(A_{k-1})\,p + P(B_{k-1})\,(1-p) = \tfrac{2}{7}\,P(A_{k-1}) + \tfrac{5}{7}\,P(B_{k-1})$$

Matrix Representation
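
The matrix on this slide did not survive in the transcript. Assuming the row-vector convention implied by the property that each row of a Markov matrix sums to 1, the two conditional probability formulas presumably combine into a single matrix equation of this form:

$$
\begin{pmatrix} P(A_k) & P(B_k) \end{pmatrix}
=
\begin{pmatrix} P(A_{k-1}) & P(B_{k-1}) \end{pmatrix}
\begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}
=
\begin{pmatrix} P(A_{k-1}) & P(B_{k-1}) \end{pmatrix}
\begin{pmatrix} 5/7 & 2/7 \\ 2/7 & 5/7 \end{pmatrix}
$$

Multiplying out the first column reproduces the formula for P(A_k), and the second column reproduces the formula for P(B_k).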

Page 8: Dayhoff’s Markov Model of Evolution

Markov Processes Can Be Represented by Matrices

e.g., a 3-state process:

[Transition diagram not preserved; surviving edge labels: 1/2, 1/3, 1/4]

Can be represented with this matrix:

Page 9: Dayhoff’s Markov Model of Evolution

Each Step Involves an Inner Product
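
A minimal Python sketch of this idea, using the two-brand soup matrix with p = 2/7. The helper names `inner` and `step` and the starting distribution are illustrative assumptions, not something taken from the slides. Each entry of the new distribution is the inner product of the current distribution with one column of the matrix.

```python
from fractions import Fraction

p = Fraction(2, 7)

# Row-stochastic transition matrix: row = current brand, column = next brand.
M = [[1 - p, p],
     [p, 1 - p]]

def inner(u, v):
    """Inner (dot) product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def step(dist, matrix):
    """Advance one step: entry j is the inner product of dist with column j."""
    columns = list(zip(*matrix))
    return [inner(dist, col) for col in columns]

# Assumed starting point: every customer buys Brand A.
start = [Fraction(1), Fraction(0)]
after_one = step(start, M)      # [5/7, 2/7]
after_two = step(after_one, M)  # [29/49, 20/49]
print(after_one, after_two)
```

Exact fractions are used here so the results can be compared directly with the 5/7 and 2/7 coefficients in the conditional probability formulas.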

Page 11: Dayhoff’s Markov Model of Evolution

Markov Matrix Properties

• Sum of probabilities in a row must be 1
• No change = identity matrix (a diagonal matrix of 1s)
• If well-behaved*, multiplying the matrix by itself many times converges to a limit
– Within each column of the limit matrix, all elements are identical (every row is the same)
– The rows of the limit matrix are the “equilibrium probabilities” for the process

*(1) Every state can transition to every other state at least indirectly, and (2) the greatest common divisor of the cycle lengths in the transition diagram is 1
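
The convergence property can be checked numerically. The sketch below is plain Python with hypothetical helper names (`matmul`, `matpow`). It raises the two-brand soup matrix to a high power, then does the same for a two-state "always switch" matrix, which is an added counterexample that violates condition (2) and therefore never settles.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(matrix, power):
    """Raise a square matrix to a positive integer power by repeated multiplication."""
    result = matrix
    for _ in range(power - 1):
        result = matmul(result, matrix)
    return result

# Soup matrix with p = 2/7: well-behaved, so its powers converge.
soup = [[5 / 7, 2 / 7],
        [2 / 7, 5 / 7]]
limit = matpow(soup, 50)   # every entry is very close to 0.5

# Counterexample (an assumption for illustration): always switch brands.
# Every cycle has even length, so the powers alternate and never converge.
flip = [[0.0, 1.0],
        [1.0, 0.0]]
even_power = matpow(flip, 50)   # identity matrix
odd_power = matpow(flip, 51)    # the flip matrix again

print(limit)
print(even_power, odd_power)
```

For the soup matrix both rows of the limit are (1/2, 1/2), which are the equilibrium probabilities for the two brands.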

Page 12: Dayhoff’s Markov Model of Evolution

Ask Mathematica! Recall m =

Page 13: Dayhoff’s Markov Model of Evolution

Margaret Dayhoff

• Had a large (for 1978) database of related proteins
• Asked “what is the probability that two aligned sequences are related by evolution?”

Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model of evolutionary change in proteins. Pp. 345-352 in M. O. Dayhoff, ed. Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3. National Biomedical Research Foundation, Washington, D.C.

Page 14: Dayhoff’s Markov Model of Evolution

Dayhoff Model

• Amino acids change over time independently of their position in a protein (simplifying assumption).
• The probability of a substitution depends only on the amino acids involved and not on the prior history (Markov model).

Page 15: Dayhoff’s Markov Model of Evolution

A Sequence Alignment

>gi|1173266|sp|P44374|RS5_HAEIN 30S ribosomal protein S5 Length = 166

Score = 263 bits (672), Expect = 1e-70
Identities = 154/166 (92%), Positives = 159/166 (95%)

Query: 1   MAHIEKQAGELQEKLIAVNRVSKTVKGGRIFSFTALTVVGDGNGRVGFGYGKAREVPAAI 60
           M++IEKQ GELQEKLIAVNRVSKTVKGGRI SFTALTVVGDGNGRVGFGYGKAREVPAAI
Sbjct: 1   MSNIEKQVGELQEKLIAVNRVSKTVKGGRIMSFTALTVVGDGNGRVGFGYGKAREVPAAI 60

Query: 61  QKAMEKARRNMINVALNNGTLQHPVKGVHTGSRVFMQPASEGTGIIAGGAMRAVLEVAGV 120
           QKAMEKARRNMINVALN GTLQHPVKGVHTGSRVFMQPASEGTGIIAGGAMRAVLEVAGV
Sbjct: 61  QKAMEKARRNMINVALNEGTLQHPVKGVHTGSRVFMQPASEGTGIIAGGAMRAVLEVAGV 120

Query: 121 HNVLAKAYGSTNPINVVRATIDGLENMNSPEMVAAKRGKSVEEILG 166
            NVL+KAYGSTNPINVVRATID L NM SPEMVAAKRGK+V+EILG
Sbjct: 121 RNVLSKAYGSTNPINVVRATIDALANMKSPEMVAAKRGKTVDEILG 166

(Example alignment from a BLAST search)
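
As a small check on the "Identities" figure, the sketch below counts identical positions in the two aligned sequences. The variable names are illustrative. The "Positives" figure is not reproduced because it depends on the scoring matrix BLAST used, which the slide does not state.

```python
# The two aligned sequences from the BLAST output (no gaps in this alignment).
query = (
    "MAHIEKQAGELQEKLIAVNRVSKTVKGGRIFSFTALTVVGDGNGRVGFGYGKAREVPAAI"
    "QKAMEKARRNMINVALNNGTLQHPVKGVHTGSRVFMQPASEGTGIIAGGAMRAVLEVAGV"
    "HNVLAKAYGSTNPINVVRATIDGLENMNSPEMVAAKRGKSVEEILG"
)
sbjct = (
    "MSNIEKQVGELQEKLIAVNRVSKTVKGGRIMSFTALTVVGDGNGRVGFGYGKAREVPAAI"
    "QKAMEKARRNMINVALNEGTLQHPVKGVHTGSRVFMQPASEGTGIIAGGAMRAVLEVAGV"
    "RNVLSKAYGSTNPINVVRATIDALANMKSPEMVAAKRGKTVDEILG"
)

# Count positions where both sequences carry the same amino acid.
identities = sum(q == s for q, s in zip(query, sbjct))
percent_identity = 100 * identities / len(query)

print(f"{identities}/{len(query)} identical ({int(percent_identity)}%)")
```

Substitution counts like those in Dayhoff's table come from tallying the non-identical positions in many such alignments of closely related proteins.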

Page 16: Dayhoff’s Markov Model of Evolution

Observed Substitution Frequencies

A
R  30
N 109  17
D 154   0 532
C  33  10   0   0
Q  93 120  50  76   0
E 266   0  94 831   0 422
G 579  10 156 162  10  30 112
H  21 103 226  43  10 243  23  10
I  66  30  36  13  17   8  35   0   3
L  95  17  37   0   0  75  15  17  40 253
K  57 477 322  85   0 147 104  60  23  43  39
M  29  17   0   0   0  20   7   7   0  57 207  90
F  20   7   7   0   0   0   0  17  20  90 167   0  17
P 345  67  27  10  10  93  40  49  50   7  43  43   4   7
S 772 137 432  98 117  47  86 450  26  20  32 168  20  40 269
T 590  20 169  57  10  37  31  50  14 129  52 200  28  10  73 696
W   0  27   3   0   0   0   0   0   3   0  13   0   0  10   0  17   0
Y  20   3  36   0  30   0  10   0  40  13  23  10   0 260   0  22  23   6
V 365  20  13  17  33  27  37  97  30 661 303  17  77  10  50  43 186   0  17
    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y

Page 17: Dayhoff’s Markov Model of Evolution

Building a Markov Model

• From the observed substitution data, Dayhoff et al. were able to estimate the joint probabilities of two amino acids substituting for each other. This yields a big, symmetric matrix of probabilities in which the diagonal elements Maa dominate, because most aligned positions are unchanged.
• But the matrix of joint probabilities, P(b∩a), does not represent a Markov process. Recall that the elements of a Markov process’ matrix are conditional probabilities, P(b|a) = P(b∩a) / P(a). P(a) is just the probability (frequency) of an amino acid, so each row a of the matrix is divided by the frequency of the corresponding amino acid. The diagonal elements are then all close to 1.
• Dayhoff then adjusts the small non-diagonal elements by a common factor that makes the expected number of amino acid substitutions equal to 1 in 100. The diagonal elements are then adjusted to make each row add up to 1, as required by the law of total probability.
• This is the PAM1 Markov matrix (PAM = Point Accepted Mutation; 1 = 1% substitution frequency).
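
The steps on this slide can be sketched in plain Python. The counts are hypothetical values for a toy 3-letter alphabet, not Dayhoff's data, and the sketch assumes the row convention used earlier in the deck (row = original amino acid, each row sums to 1). It follows the slide's description rather than the exact procedure in the 1978 paper.

```python
# Hypothetical symmetric pair counts for a toy 3-letter alphabet.
counts = [[900.0, 30.0, 10.0],
          [30.0, 500.0, 20.0],
          [10.0, 20.0, 300.0]]
n = len(counts)
total = sum(sum(row) for row in counts)

# Joint probabilities P(a and b): symmetric, all entries sum to 1.
joint = [[c / total for c in row] for row in counts]

# Frequencies P(a) are the row sums of the joint matrix.
freq = [sum(row) for row in joint]

# Conditional probabilities P(b|a): divide row a by P(a), so each row sums to 1.
cond = [[joint[a][b] / freq[a] for b in range(n)] for a in range(n)]

# Expected fraction of positions that change in one step, before scaling.
rate = sum(freq[a] * (1.0 - cond[a][a]) for a in range(n))

# Scale the off-diagonal elements by a common factor so the rate becomes 1%.
scale = 0.01 / rate
pam1 = [[scale * cond[a][b] if a != b else 0.0 for b in range(n)]
        for a in range(n)]

# Set each diagonal element so that every row sums to 1 again.
for a in range(n):
    pam1[a][a] = 1.0 - sum(pam1[a])

expected_change = sum(freq[a] * (1.0 - pam1[a][a]) for a in range(n))
print(pam1)
print(expected_change)
```

After scaling, the frequency-weighted chance that a position changes in one step is 0.01, which is what the "1" in PAM1 refers to.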

Page 18: Dayhoff’s Markov Model of Evolution

Using the PAM Model

• The PAM1 Markov matrix can be multiplied by itself to yield the PAM2 Markov matrix, and again to yield the PAM3 matrix, etc. PAM1 is a “unit of evolutionary distance”.
• PAM250 is commonly used. Note that this does not mean 250% of the amino acids have been substituted; the fraction that differ is more like 80%.
• The PAM Markov matrices arrived at by matrix multiplication need to be converted into the scoring matrices that one would use for BLAST or CLUSTALW.
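
A minimal sketch of the repeated multiplication, using a hypothetical PAM1-like matrix for a toy 3-letter alphabet (the real PAM1 matrix is 20 by 20). The helper names are assumptions. It also shows why 250 PAM units do not correspond to 250% change: repeated substitutions at the same position, including changes back to the original letter, are not seen as additional differences.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(matrix, power):
    """Raise a square matrix to a positive integer power by repeated multiplication."""
    result = matrix
    for _ in range(power - 1):
        result = matmul(result, matrix)
    return result

# Hypothetical PAM1-like matrix: each letter has a 1% chance of changing per step.
pam1 = [[0.990, 0.005, 0.005],
        [0.005, 0.990, 0.005],
        [0.005, 0.005, 0.990]]

pam2 = matpow(pam1, 2)
pam250 = matpow(pam1, 250)

# With equal letter frequencies (true for this symmetric toy matrix), the
# fraction of positions that differ after 250 steps is 1 - mean(diagonal).
changed = 1.0 - sum(pam250[i][i] for i in range(3)) / 3
print(pam2[0][0], changed)
```

In this 3-letter toy the fraction that differ levels off near two thirds. With 20 amino acids and Dayhoff's frequencies the corresponding figure for PAM250 is the roughly 80% quoted on the slide.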

Page 19: Dayhoff’s Markov Model of Evolution

Probability of an Alignment

In a random model, the two proteins x and y are independent, and the probability of their alignment is the product of the probabilities { qa } for all the amino acids in both sequences.

(Note that the { qa } are not all the same value of 1/20.)

In a match model, the proteins have descended from a common ancestor protein and the amino acid sequences are no longer independent. In this model, the probability can be expressed in terms of a matrix of joint probabilities {{ pab }}.

(Note that pab = pba because neither protein is “first”.)

Dayhoff and coworkers could estimate these probabilities from the frequencies of amino acid substitutions they observed in their database of evolutionarily related proteins.
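
The formulas on this slide did not survive in the transcript. Given the two definitions above, they presumably take the standard form below for an ungapped alignment. R and M are labels assumed here for the random and match models, and x_i, y_i denote the amino acids aligned at position i.

$$P(x, y \mid R) = \prod_i q_{x_i} \prod_j q_{y_j}$$

$$P(x, y \mid M) = \prod_i p_{x_i y_i}$$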

Page 20: Dayhoff’s Markov Model of Evolution

A Log-Odds Score

We are interested in the ratio of the match model probability of alignment to the random model probability, where x_i and y_i are the amino acids aligned at position i:

$$\frac{\prod_i p_{x_i y_i}}{\prod_i q_{x_i}\, q_{y_i}} = \prod_i \frac{p_{x_i y_i}}{q_{x_i}\, q_{y_i}}$$

In practice, we usually take the log of these quantities for a substitution “scoring” matrix. This changes the multiplications into additions and reduces round-off error:

$$S(a,b) = \log \frac{p_{ab}}{q_a\, q_b}$$

S(a,b) defines the number you usually see in a substitution matrix. These numbers are usually rounded to integers to ease computation.
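
A short Python sketch of the log-odds idea, using hypothetical joint probabilities for a toy 2-letter alphabet. The function name `log_odds` is illustrative, and the 10 × log10 scaling before rounding is an assumed convention, since the slide does not state the base or scale factor.

```python
import math

# Hypothetical symmetric joint probabilities p_ab for a toy 2-letter alphabet.
p = {("A", "A"): 0.40, ("A", "B"): 0.05,
     ("B", "A"): 0.05, ("B", "B"): 0.50}

# Background frequencies q_a are the marginals of the joint distribution.
q = {"A": 0.45, "B": 0.55}

def log_odds(a, b):
    """Natural-log odds of seeing a aligned with b under match vs. random."""
    return math.log(p[(a, b)] / (q[a] * q[b]))

# Integer scoring matrix, assuming a 10 * log10 scaling before rounding.
score = {
    (a, b): round(10 * math.log10(p[(a, b)] / (q[a] * q[b])))
    for (a, b) in p
}

# Summing per-position log-odds equals the log of the product of ratios.
x, y = "AAB", "ABB"
total = sum(log_odds(a, b) for a, b in zip(x, y))
ratio = math.prod(p[(a, b)] / (q[a] * q[b]) for a, b in zip(x, y))

print(score)
print(total, math.log(ratio))
```

Pairs that occur more often in related sequences than chance predicts get positive scores, and pairs that occur less often get negative scores.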

Page 21: Dayhoff’s Markov Model of Evolution

Questions?

• I will post a Mathematica notebook.