Cold Spring Harb Protoc; doi: 10.1101/pdb.ip58. David W. Mount. A Test of the Markov Model of Evolution in Proteins.


A Test of the Markov Model of Evolution in Proteins

David W. Mount

Adapted from “Alignment of Pairs of Sequences,” Chapter 3, in Bioinformatics: Sequence and Genome Analysis, 2nd edition, by David W. Mount. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA, 2004.

INTRODUCTION

The percent accepted mutation (PAM) scoring matrix is based on the Dayhoff model of protein evolution, which is a Markov process. In the Markov model of amino acid change, the probability of mutation at each site is independent of the previous history of mutations. Use of this model makes it possible to extrapolate amino acid substitutions observed over a relatively short period of evolutionary time to longer periods of evolutionary time. One criticism of the PAM scoring matrix is that the frequency of amino acid changes that require two nucleotide changes is higher than would be expected by chance. This article describes a test of the Markov model of protein evolution, which shows that the model can be valid if certain changes are made in the way that PAM matrices are calculated.
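The extrapolation step can be illustrated with a toy example. The sketch below is not Dayhoff's 20 × 20 PAM1 matrix: it assumes a hypothetical three-state alphabet with invented one-step probabilities, and the helper names `mat_mul` and `mat_pow` are illustrative only. Under the Markov assumption, the n-step transition matrix is the one-step matrix raised to the nth power.

```python
# Toy illustration of Markov extrapolation (hypothetical 3-state alphabet,
# invented probabilities; not real PAM data).

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, n):
    """Raise square matrix m to the non-negative integer power n."""
    size = len(m)
    # Start from the identity matrix and multiply by m, n times.
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result

# Row i gives the probabilities that state i stays or changes in one step.
# Each row sums to 1; diagonal entries near 1 mimic the small amount of
# change in a single PAM1-like step.
STEP1 = [
    [0.990, 0.008, 0.002],
    [0.008, 0.990, 0.002],
    [0.002, 0.002, 0.996],
]

# Analogous to deriving a PAM250 matrix from PAM1.
STEP250 = mat_pow(STEP1, 250)

for row in STEP250:
    print(["%.3f" % p for p in row])
```

Because each step depends only on the current state, no record of earlier substitutions is needed; this is the property that the test described in this article examines.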

RELATED INFORMATION

Information on PAM scoring matrices and blocks amino acid substitution matrices (BLOSUM) is presented in Using PAM Matrices in Sequence Alignments (Mount 2008a), Using BLOSUM in Sequence Alignments (Mount 2008b), and Comparison of the PAM and BLOSUM Amino Acid Substitution Matrices (Mount 2008c). The appropriate choice for gap penalties to be used with various matrices is discussed in Using Gaps and Gap Penalties to Optimize Pairwise Sequence Alignments (Mount 2008d). BLOSUM and other scoring matrices are compared in combination with various alignment algorithms and gap penalties in Studies of Varying Alignment Algorithm, Amino Acid Scoring Matrix, and Gap Penalties (Mount 2008e).

TEST OF THE MARKOV MODEL OF PROTEIN EVOLUTION

In the Markov model of evolution in proteins, the probability of change of any amino acid a to amino acid b is the same, regardless of the previous changes at that site and also regardless of the position of amino acid a in a protein sequence. Wilbur (1985) addressed a major criticism of the PAM scoring matrix, namely, that the frequency of amino acid changes that require two nucleotide changes is higher than would be expected by chance. About 20% of the observed amino acid changes require more than a single mutation for the necessary codon changes. This fraction is far greater than would be expected by chance.
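Whether a given amino acid change needs one or more nucleotide changes can be read directly from the genetic code. The following sketch assumes the standard genetic code (codons listed in T, C, A, G order) and counts ordered pairs of distinct amino acids that are one nucleotide change apart; the function name `single_step_pairs` is illustrative and not taken from the source.

```python
# Count ordered amino acid pairs (a, b), a != b, that can be interconverted
# by a single nucleotide change, assuming the standard genetic code.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

def single_step_pairs(table):
    """Return the set of ordered amino acid pairs one nucleotide change apart."""
    pairs = set()
    for codon, aa in table.items():
        if aa == "*":                      # skip stop codons
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbor = codon[:pos] + base + codon[pos + 1:]
                other = table[neighbor]
                if other != "*" and other != aa:
                    pairs.add((aa, other))
    return pairs

SINGLE = single_step_pairs(CODON_TABLE)
TOTAL = 20 * 19                            # all ordered pairs of distinct amino acids

# Prints the number of single-step pairs and of multi-step pairs.
print(len(SINGLE), TOTAL - len(SINGLE))    # prints: 150 230
```

For example, `("F", "L")` is in `SINGLE` (TTT to TTA), whereas `("M", "W")` is not, because ATG and TGG differ at two positions.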

To correct for changes that require at least two mutations, Wilbur recalculated the PAM1 matrix using only amino acid substitution data from 150 amino acid pairs that can be accounted for by single mutations. To accomplish this calculation, he used a refined mathematical model that provided a more precise measure of the rate of substitution. He then estimated frequencies of the other 230 amino acid substitutions reachable only by at least two mutations, and compared these frequencies to the values calculated by Dayhoff, who had assumed these were single-step changes. If these numbers agreed, argued Wilbur, then the PAM model used to produce the Dayhoff matrix is a reliable one. In fact, the Dayhoff values exceeded the two-step model values by a factor of ~117. One source of discrepancy was the assumption that the two-step changes were a linear function of evolutionary time over short evolutionary periods of 1 PAM (average time of 1 PAM = 10,000,000 yr), whereas, because two mutations are required to make the change, a quadratic function is expected. With this correction made to the Dayhoff calculations for amino acid substitutions requiring two mutations, agreement with the two-step model improved about 10-fold, leaving another 11.7-fold unaccounted for.

Information Panel. © 2008 Cold Spring Harbor Laboratory Press. CSH Protocols, Vol. 3, Issue 6, June 2008. Please cite as: CSH Protocols; 2008; doi:10.1101/pdb.ip58. www.cshprotocols.org
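The linear-versus-quadratic argument can be made explicit with a short-time expansion. This is a generic sketch for a continuous-time Markov chain, not Wilbur's actual derivation; the rate matrix $Q$ with entries $q_{ab}$ and the time $t$ are notation introduced here for illustration.

```latex
P(t) = e^{Qt} = I + Qt + \tfrac{1}{2} Q^{2} t^{2} + O(t^{3}),
\qquad
P_{ab}(t) = q_{ab}\, t + \tfrac{1}{2} \sum_{c} q_{ac}\, q_{cb}\, t^{2} + O(t^{3})
\quad (a \neq b).
```

If the change from a to b cannot be produced by a single mutation, then $q_{ab} = 0$, the linear term vanishes, and the leading term is $\tfrac{1}{2} \sum_{c} q_{ac}\, q_{cb}\, t^{2}$, which is quadratic in $t$. Treating such a change as linear in time over a short interval therefore misestimates its frequency.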

Wilbur analyzed the remainder by the covarion hypothesis (Fitch and Markowitz 1970; Miyamoto and Fitch 1995), in which it is assumed that only a certain fraction of amino acid sites in a protein are variable and that one site influences another. Thus, a change in one site may influence the variability of others. This model seems to be reasonable from many biological perspectives. For example, amino acids at different regions of a protein sequence will interact to give a three-dimensional structure; thus, a change in one of these without some kind of compensating change in the other region could be detrimental to protein structure and function. The prediction of this hypothesis is that the frequency of two-step changes would be overestimated because we did not take into account the failure of many sites to be mutable. Using a reasonable estimate of 0.3 for the fraction of the sites that could change, the effect on the Dayhoff calculations for frequencies of two-step changes would be 3.3-fold. The remaining discrepancy in the 11.7-fold ratio between Dayhoff values and two-step values may be attributable to variations in mutation rates from site to site, or to the exclusion of certain amino acids at a particular site.
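The factors quoted above can be tallied in a few lines. This is only a bookkeeping sketch: it assumes that the corrections combine multiplicatively and that the covarion effect equals the reciprocal of the variable fraction (1/0.3 ≈ 3.3), which is consistent with the figures in the text but is not stated there explicitly. The variable names are illustrative.

```python
# Bookkeeping for the discrepancy factors quoted in the text.
# Assumption: the correction factors combine multiplicatively.
total_discrepancy = 117.0      # Dayhoff values / two-step model values
quadratic_correction = 10.0    # approximate gain from quadratic time dependence
variable_fraction = 0.3        # estimated fraction of sites able to change

after_quadratic = total_discrepancy / quadratic_correction   # 11.7-fold left
covarion_factor = 1.0 / variable_fraction                    # about 3.3-fold
unexplained = after_quadratic / covarion_factor              # about 3.5-fold

print(round(after_quadratic, 1),
      round(covarion_factor, 1),
      round(unexplained, 1))       # prints: 11.7 3.3 3.5
```

Under these assumptions, roughly a 3.5-fold difference would remain to be explained by site-to-site rate variation or by the exclusion of certain amino acids at particular sites.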

In conclusion, Wilbur (1985) has shown that the Dayhoff model for protein evolution appears to give predictable and consistent results, but that frequencies of change between amino acids that require two mutational steps must be calculated as a two-step process. Failure to do so generates errors due to variations in site-to-site mutability. George et al. (1990) have counter-argued that it has never been demonstrated that two independent mutations must occur, each becoming established in a population before the next appears. In fact, double mutations can occur following DNA damage by ultraviolet light. However, the issue has never been resolved.

REFERENCES

Fitch, W.M. and Markowitz, E. 1970. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem. Genet. 4: 579–593.

George, D.G., Barker, W.C., and Hunt, L.T. 1990. Mutation data matrix and its uses. Methods Enzymol. 183: 333–351.

Miyamoto, M.M. and Fitch, W.M. 1995. Testing the covarion hypothesis of evolution. Mol. Biol. Evol. 12: 503–513.

Mount, D.W. 2008a. Using PAM matrices in sequence alignments. CSH Protocols (this issue) doi: 10.1101/pdb.top38.

Mount, D.W. 2008b. Using BLOSUM in sequence alignments. CSH Protocols (this issue) doi: 10.1101/pdb.top39.

Mount, D.W. 2008c. Comparison of the PAM and BLOSUM amino acid substitution matrices. CSH Protocols (this issue) doi: 10.1101/pdb.ip59.

Mount, D.W. 2008d. Using gaps and gap penalties to optimize pairwise sequence alignments. CSH Protocols (this issue) doi: 10.1101/pdb.top40.

Mount, D.W. 2008e. Studies of varying alignment algorithm, amino acid scoring matrix, and gap penalties. CSH Protocols (this issue) doi: 10.1101/pdb.ip60.

Wilbur, W.J. 1985. On the PAM model of protein evolution. Mol. Biol. Evol. 2: 434–447.
