Contents lists available at ScienceDirect

Biochimie

journal homepage: www.elsevier.com/locate/biochi

Biochimie 119 (2015) 218–230

Review

Evolution, energy landscapes and the paradoxes of protein folding

Peter G. Wolynes
Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA

Article info

Article history:
Received 15 September 2014
Accepted 11 December 2014
Available online 18 December 2014

Keywords:
Folding landscape
Natural selection
Structure prediction

E-mail address: [email protected].

http://dx.doi.org/10.1016/j.biochi.2014.12.007
0300-9084/© 2014 Elsevier B.V. and Société Française de Biochimie et Biologie Moléculaire (SFBBM). All rights reserved.

Abstract

Protein folding has been viewed as a difficult problem of molecular self-organization. The search problem involved in folding, however, has been simplified through the evolution of folding energy landscapes that are funneled. The funnel hypothesis can be quantified using energy landscape theory based on the minimal frustration principle. Strong quantitative predictions that follow from energy landscape theory have been widely confirmed both through laboratory folding experiments and from detailed simulations. Energy landscape ideas have also allowed successful protein structure prediction algorithms to be developed.

The selection constraint of having funneled folding landscapes has left its imprint on the sequences of existing protein structural families. Quantitative analysis of co-evolution patterns allows us to infer the statistical characteristics of the folding landscape. These turn out to be consistent with what has been obtained from laboratory physicochemical folding experiments, signaling a beautiful confluence of genomics and chemical physics.

© 2014 Elsevier B.V. and Société Française de Biochimie et Biologie Moléculaire (SFBBM). All rights reserved.

Contents

1. Introduction
2. General evidence that folding landscapes are funneled
3. How rugged is the folding landscape: physicochemical approach
4. Reverse engineering the folding energy landscape
5. The evolutionary landscapes of proteins
6. Frustration and function
7. Folding paradoxes revisited
Conflict of interest
Acknowledgments
References

1. Introduction

Paradoxically, protein folding has turned out to be easy. How? Why? What's paradoxical? This review is aimed at answering these questions, beginning with the last one [1]. Ever since Anfinsen's groundbreaking experiments, understanding the spontaneous nature of protein folding has been widely viewed as a difficult problem [2]. Delbrück is quoted by Gunther Stent [3] as saying about protein folding “…the reduction in dimensionality from three dimensional continuous to one dimensional discrete in the genesis of proteins is a new law of physics and one nobody could have pulled out of quantum mechanics without first seeing it in operation”. According to Stent, after saying this, Delbrück immediately quoted Bohr as telling about a man who, upon seeing a magician saw a woman in half, shouts out “It's all a swindle”. In a similar spirit, I think it is fair to say that a majority of people even in the last decade have looked with skepticism at claims that protein folding was becoming understood [4].

Fig. 1. The funnel diagram. A schematic diagram of the energy landscape of a protein, here illustrated with the PDZ domain whose native structure is shown at the bottom of the funnel. The energy landscape exists in a very high dimensional space. The diagram can only give a sense of this through its representation of two dimensions. The radial coordinate measures the configurational entropy, which decreases as the protein takes on a more fully folded structure. The energy of individual configurations is represented by the vertical axis. The values of the energy indicated on this axis are strongly correlated with the fraction of native structure that has formed, which is often measured by the fraction of correct native-like contacts, called Q. Q also typically increases as the structures descend in the funnel. The energy and entropy oppose each other so that at high temperature the protein is found in an ensemble of states near the top of the funnel. Structures of denatured configurations thus are shown near the top of the funnel. At low temperature, in contrast, an ensemble clustered around the native structure becomes thermally occupied at the bottom of the funnel. The imperfect matching of entropy and energy leads typically to a free energy barrier that separates these two ensembles of states. Surmounting this barrier limits the folding rate. The small mini-funnels on the sides of the funnel represent trap states. These traps typically possess some native structure but they also contain energetically favorable alternative non-native contacts. Because the non-native contacts are not consistent with each other, rarely are such mini-funnels competitive in an energetic sense with the native basin. The stability of non-native interactions in any one of these traps is an unusual rare accident, while the interactions that are formed in the native structure have evolved in order for the individual natively folded structure to be especially stable.

Yet today, a variety of computer algorithms can indeed translate, for the simpler systems, one dimensional sequence data into three dimensional structure, albeit at moderate resolution [5–7]. The easiest to use algorithms rely ultimately on recoding existing biological information into the form of an energy landscape for computer simulations [5], and thus ultimately these algorithms rely on having “seen it in operation” rather than deriving their results directly from quantum mechanics. Delbrück's demand to pull folding out of quantum mechanics has, however, almost been satisfied by using very powerful computers to simulate fully atomistic models, the forces in which are only lightly parametrized by protein structural data [7]. So we see that folding proteins really is easier than many people (including me!) thought. Nevertheless, there is something at least a bit strange, perhaps even paradoxical, about this outcome, because theory has shown that protein folding indeed could have been much harder, almost as hard as Levinthal envisioned in his famous paradox, in which it was pointed out that it would take a cosmological time to fold a protein by searching through all its possible structures [8].
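To give a feeling for the size of the numbers behind Levinthal's paradox, here is a minimal back-of-the-envelope sketch. The three conformations per residue and the sampling rate of 10^13 conformations per second are illustrative assumptions chosen for this example, not values taken from this review, and the function name is hypothetical.

```python
import math

# Illustrative assumptions (not from the review):
#   - each residue can adopt 3 conformations independently
#   - the chain samples 1e13 conformations per second
AGE_OF_UNIVERSE_YEARS = 1.4e10
SECONDS_PER_YEAR = 3.156e7


def levinthal_log10_years(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Return log10 of the years needed to visit every conformation once.

    Working in log10 avoids overflowing floats for long chains.
    """
    log10_conformations = n_residues * math.log10(states_per_residue)
    return (log10_conformations
            - math.log10(samples_per_second)
            - math.log10(SECONDS_PER_YEAR))
```

Under these assumed numbers, `levinthal_log10_years(100)` is a little above 27, so an exhaustive search for a 100-residue chain would take of order 10^27 years, compared with an age of the universe of order 10^10 years.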

Anfinsen's refolding experiment itself suggests that folding can be thought of as the search for a minimum free energy arrangement in space of a heteropolymeric chain of residues. The solvent averaged interactions between amino acids depend on the residue types; some residues will interact more strongly with particular other ones than they interact with others. Thus folding is a kind of matching problem: we must find which amino acid will finally make contact with which other one in the native structure in order to reach the lowest energy three dimensional structure. The matching problem is in general NP (nondeterministic polynomial time) complete [9]. To say a problem is NP complete is a mathematician's way of saying there exist instances of such a problem (i.e. there should be certain sequences) for which no known algorithm could find the solution in a time scaling as a polynomial in the size of the problem. The exhaustive search through minima envisioned in Levinthal's paradox scales exponentially with protein length, but even an algorithm that takes a time scaling exponentially with a lower power of length (as happens in some theories based on capillarity ideas [10,11]) would also be consistent with strict NP completeness, so Levinthal's estimate is certainly an exaggeration. Of course the key words in the mathematical statement are “there exist instances”. Not all instances of matching problems are expected to be equally hard. Some instances of matching problems can in fact be quite easy. A familiar biological example where search difficulty varies with the task is provided by the matching of base pairs in nucleic acid double helices as they assemble. The minimum energy matching is easily found if the two strands are exactly complementary, as in most nuclear DNA, giving rise to the standard rules of replication. Finding a match is also easy if only a small fraction of the bases are not conjugate to their partners, just as happens when we try to fish out functionally related sequences using primers or use modified primers for site directed mutagenesis via PCR. Comparing two divergent sequences for homology is easy, the difficulty scaling only with a polynomial in the chain length [12]. Most functional RNA sequences have easy matches also, if no pseudo-knots are formed in forming their secondary structures, but in principle the problem of pairing the bases in an arbitrary nucleic acid strand with itself can be quite difficult. This difficulty is also reflected in the cell, where some messenger RNAs turn out actually to be metastable [13]: they first find an active conformation transiently but later rearrange to an inactive but more stable form after sufficient protein has been translated from the mRNA so that the messenger is no longer needed. In contrast, it appears from the Anfinsen experiment, along with its thousands of descendants, that monomeric protein folding is usually thermodynamically controlled, with functional metastability being the exception [14] rather than the rule.
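The remark that comparing two sequences for homology scales only polynomially can be illustrated with the textbook dynamic programming recursion for global alignment (Needleman-Wunsch). This is a generic sketch, not necessarily the method of reference [12]; the scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions made for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of strings a and b.

    Runs in O(len(a) * len(b)) time, i.e. polynomial in chain length,
    and keeps only two rows of the dynamic programming table.
    The scoring scheme is an illustrative assumption.
    """
    # Row 0: aligning an empty prefix of a against prefixes of b costs gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # prefix of a against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]
```

The two nested loops make the quadratic cost explicit; contrast this with the exponential growth of an exhaustive search over all chain conformations.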

Are the easy instances of matching an amino acid sequence to itself in order to fold into a three-dimensional structure exceptional, or are the tough cases the weird examples? This, of course, may depend on the details of the interactions, but the simplest arguments suggest that for a sufficiently long chain, among those chosen at random, the easy examples should be the rarities. Bryngelson and Wolynes suggested that the energy landscape of a typical random amino acid sequence would be rough, riddled with deep metastable minima of widely differing structures, and would resemble the rugged landscape of a glass [15,16]. They analyzed this situation with the random energy model [17]. The idea was that in a sufficiently long random sequence conflicts between different choices of the favorable interactions would inevitably arise, a phenomenon known as frustration [18]. The low energy states of such a system are all highly compromised, satisfying some interactions very stably, perhaps, but with other interactions remaining unsatisfied and in conflict, giving rise to many near degenerate configurations. The mathematical validity of this idea, that the energy landscape of a completely random heteropolymer would be rugged, has been buttressed both by more sophisticated statistical mechanics using the replica method [19,20] and by numerous computer simulation studies on highly simplified models that capture the essential features of self-matching chains [21–23].
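The random energy model mentioned above has a simple closed-form consequence that can be sketched in a few lines. Assuming, as in the standard form of that model, that the energies of the compact states are independent Gaussian variables with standard deviation ΔE and that the total configurational entropy is S0 (in units of kB), the entropy at temperature T is S0 − ΔE²/(2(kB T)²), which vanishes at a glass temperature kB Tg = ΔE/√(2 S0). The function and parameter names below are hypothetical, and temperatures are expressed in energy units (kB T).

```python
import math


def rem_glass_temperature(delta_e, s0):
    """k_B*Tg of the random energy model.

    delta_e: standard deviation of the state energies (ruggedness)
    s0:      total configurational entropy in units of k_B
    """
    return delta_e / math.sqrt(2.0 * s0)


def rem_entropy(kt, delta_e, s0):
    """Configurational entropy (units of k_B) at temperature kt = k_B*T.

    Above Tg the entropy is s0 - delta_e**2 / (2 kt**2); below Tg the
    system is frozen into a few low energy states and the entropy is zero.
    """
    return max(s0 - delta_e ** 2 / (2.0 * kt ** 2), 0.0)
```

In this picture a more rugged landscape (larger ΔE) raises Tg, while a larger number of available configurations (larger S0) lowers it.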

So proteins seem to be the rare, easy instances of energy minimization. What is it that makes folding protein-like sequences easy? What is it that allows proteins to pair residues properly through Brownian motion in order to find their lowest free energy states without the molecule getting trapped in metastable states of wildly different structure, as sometimes happens with RNA? The answer is that natural proteins correspond to those special sequences that have been selected to have more consistently stabilizing interactions throughout the natively structured molecule than does a typical sequence, which inevitably makes many compromises in its low energy structures. Proteins are not as “frustrated” as a usual heteropolymer sequence is; we say they are “minimally frustrated” [15].

Bryngelson and Wolynes showed that, in contrast to random sequences, such unusual, easy-to-fold minimally frustrated sequences have two potential structural thermodynamic phase transitions. One of the transitions is the folding transition to a well-organized highly stable structure with perhaps some defects. Nevertheless, if this correct folding were prevented from happening, a protein would still settle down to an ensemble of structurally disparate low energy states when cooled below a characteristic temperature Tg. We would find these misfolded states in the low energy trapped ensemble to have similar energies to each other, but to be structurally quite distinct from each other. Finding one given energy minimum by a simulated annealing protocol would yield a structural prediction for that sequence, but that prediction would be quite unreliable until all the low energy arrangements of the chain were found and examined individually. Which structure out of this wide ranging ensemble prevails for a particular molecule would depend on the history of the protein synthesis and annealing, i.e. the protein folding outcome would be kinetically controlled, not thermodynamically controlled. This situation is much like what happens for a liquid which becomes a glass when it is cooled if the liquid fails to crystallize [24]. The detailed atomic structure of the resulting glass varies from sample to sample but the macroscopically averaged properties of the glass, like its energy, are nearly the same from sample to sample.

For easy folding sequences, i.e. minimally frustrated sequences, the folding transition at TF resembles a crystallization transition, while the history dependent transition at Tg for poor folders is like the glass transition for a liquid. Protein folding in this picture resembles nucleation of a crystal from a small fluid drop. The kinetics of nucleation is impeded by the multiplicity of trapped states that become prominent near the glass transition temperature of the fluid, Tg. The driving force to the native structure is impeded in its effectiveness by the energetic ruggedness reflected in Tg [16,20,21].

Nucleation, folding and self assembly differ from the more familiar multi-step chemical transformations of organic synthesis or intermediary metabolism in that collective self assembly has a multiplicity of possible mechanisms: no specific sequence of steps is absolutely needed in order to achieve successful folding. Nevertheless a dominant small set of pathways can emerge as the most important ways a particular sequence assembles under some thermodynamic conditions [25,26]. The dominant routes, as well as the activation barrier determining the folding rate, depend on the way in which the loss of entropy upon ordering the chain is compensated by the additional stabilization energy that properly assembled parts of the chain achieve when the energy is compared with the weaker stabilization that occurs when the chain transiently samples improperly folded states [27]. The landscapes for protein folding or for crystallization can be described as rugged funnels [28]. See Fig. 1.

The rate of attempting to escape from misfolded traps decreases as the stability of these traps increases, and thus the kinetics of search depends on the glass transition temperature Tg. Easy-to-fold sequences can fold without getting trapped because they have a high TF to Tg ratio. This ratio, in turn, depends on the comparison of the energy of the fully folded state EF and the typical stabilization energy of a trap Eg, which monotonically increases as Tg increases [29].
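How the TF/Tg ratio can follow from the comparison of EF with Eg may be sketched with the simplest random energy model estimate. The assumptions here are mine for the sake of illustration and need not match the detailed treatment in [29]: misfolded states have independent Gaussian energies of standard deviation ΔE about a mean set to zero, the total configurational entropy is S0 (units of kB), and the native state has negligible entropy. Then the deepest typical trap lies at a depth ΔE√(2 S0) below the mean, and equating the free energy of the native state with that of the misfolded ensemble gives TF/Tg = x + √(x² − 1), where x is the ratio of the native depth to the trap depth. The names `native_depth` and `trap_depth` are hypothetical.

```python
import math


def tf_over_tg(native_depth, trap_depth):
    """Random energy model estimate of TF/Tg.

    native_depth: stabilization of the folded state below the mean energy
                  of the misfolded ensemble (positive number)
    trap_depth:   stabilization of the deepest typical trap below that
                  same mean (positive number)

    Assumes uncorrelated Gaussian trap energies and a native state with
    negligible entropy; returns x + sqrt(x**2 - 1) with x the depth ratio.
    """
    if native_depth <= 0 or trap_depth <= 0:
        raise ValueError("depths must be positive stabilization energies")
    x = native_depth / trap_depth
    if x < 1.0:
        raise ValueError("native state is not deeper than typical traps: "
                         "no folding transition above Tg in this model")
    return x + math.sqrt(x * x - 1.0)
```

For example, in this estimate a native state only 25% deeper than the typical traps already gives `tf_over_tg(1.25, 1.0) == 2.0`, so a modest energy gap translates into a folding temperature well above the glass temperature.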

The minimal frustration or “funnel” [21,28] scenario as the explanation for the ease of protein folding has been confirmed, in concept, by numerous computer studies that simulate stylized stripped down models of self-associating polymers. The most comprehensive such studies simulate heteropolymers on lattices [30,31], where the exact enumeration of possibilities ensures that a complete survey of the landscape can be made so that even for bad folders the ground state can be found and certified as being correct. Other models that realistically describe the protein backbone geometry, so that they give structures more easily recognized as being real proteins, concur in supporting the idea that it is the magnitude of the TF/Tg ratio that determines the ease of folding to a first approximation [32–34]. A discussion of these studies can be found in many early reviews [21,22,35]. Yet the funnel story has seemed too simple to many people [36–38]. They ask “Could that really be all there is to it?” Some of the skepticism is perhaps justified because the success of the funnel landscape picture raises other questions: How does the energy landscape theory apply to real proteins (in a quantitative sense)? Exactly how easy is it to fold the proteins of Nature? These questions form the impetus for much of the last two decades' work on protein folding theory.

In this paper I plan to first touch on the experimental evidence that proteins are indeed minimally frustrated polymers, i.e. that the energy landscape of naturally occurring proteins is actually a rugged funnel as hypothesized. The evidence for the funnel idea is actually quite overwhelming in volume. It is much too extensive to do justice to it here. I therefore refer the reader to an earlier comprehensive review on the experimental survey of protein folding landscapes [21] using the tools of protein engineering, fast kinetics, theory and simulation. That review documents the usefulness of the rugged funnel model in understanding the kinetics of a range of systems. Atomistic simulations done recently [7] also support the funneled energy landscape picture for a large number of examples, as I will describe below.

Having achieved minimal frustration is the “swindle” (to use Bohr's term) behind the ease of protein folding. Minimal frustration has apparently come about as the result of evolution and is encoded in existing protein structures and sequences. The mathematical instantiation of the minimal frustration principle via the TF/Tg ratio can therefore be used as a learning tool for inferring the nature of forces within and between protein molecules, that is, for “reverse engineering” the folding problem [29,39–42]. This strategy has led to predictive algorithms for determining protein tertiary structure from sequence alone. The topic of protein structure prediction algorithms based on energy landscape theory has also recently been reviewed by us [5], so I will just touch on this question here. These practical successes, in my view, also should buttress our confidence in the main ideas of folding energy landscape theory.

The bulk of this review will focus on understanding precisely how easily proteins fold, that is, what we have learned through trying to quantify the smoothness of the landscape by comparing the strength of the guiding forces to the ruggedness of the landscape. To a first approximation this quantification may be summarized in a single number, again, the TF/Tg ratio. Several estimates of TF/Tg have been made in different ways over the years, using laboratory data on the thermodynamics and kinetics of folding as well as experimental observations on the reconfigurational motions and residual structure in molten globules as input [43–45]. I will review those arguments.

As I have mentioned, the ease of protein folding must ultimately be the result of evolution. The “swindle” of easy protein folding was set up by natural history. Can we also check the idea of the evolutionary origin of minimal frustration in a quantitative sense? Is there a sign in the sequence data alone that landscapes have been funneled through natural selection? Recently it has become possible to determine the degree of funneling of entire protein families, as measured by their typical TF/Tg ratio, by carrying out information theoretic analyses of large families of protein sequences and comparing the detectable coevolutionary patterns to the sequence patterns that would be expected based on the physics contained in energy landscape derived transferable force fields that are able successfully to predict tertiary folds from individual protein sequences [46]. I will review the resulting confluence of the evolutionary genomic and the physics perspectives on the energy landscape. One finds a picture of the folding energy landscape from the genomic data alone that matches pretty well with what has been previously deduced from purely physico-chemical ideas and experiments. Finally I will return to the paradoxes of easy protein folding, commenting on my current view of them.

2. General evidence that folding landscapes are funneled

The experimental study of protein structure and folding mechanism has yielded many general sorts of observations about the ways proteins fold. These patterns suggest protein energy landscapes, in the main, are funneled and that proteins are minimally frustrated heteropolymers. In addition, the details of the folding mechanisms of many specific proteins have been reproduced using highly idealized models of protein energy landscapes that have no frustration at all: so called “structure based” [47,48] or Gō models [49]. These models ignore any possibility of stabilizing misfolding interactions that would lead to kinetic traps; nevertheless they give solid and surprisingly accurate predictions. The landscapes of such structure based models can be thought of as being “perfect funnels”. Like the perfect gas model in elementary chemistry, the perfect funnel model, while working reasonably well, does not always have quantitative perfection. Major qualitative deviations from perfect funnel predictions are, however, quite unusual. Apparent failures of the funnel model have sometimes led us to think outside the box and to our delight have turned out to confirm that funneling is still preserved for the greater part of the folding process [50]. Many seeming exceptions to perfect funnel prediction can be accounted for by using models that have small perturbations from perfectly funneled landscapes. These models take into account weak trapping owing to residual frustration [51] or energetic heterogeneity [52] or other sources of landscape degeneracy such as the symmetry of designed protein sequences [53] or the symmetry intrinsic to oligomerization through domain swapping [54]. The success of the general predictions about folding that have come from funneled, minimally frustrated landscapes, along with the multitude of detailed specific successes on many systems, should give us a lot of confidence that the minimal frustration principle has more than a core of truth. The fact that we can understand and quantify deviations from the model itself is also very encouraging evidence that we have begun to understand protein folding.
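The idea of a structure based ("perfect funnel") model can be made concrete with a minimal sketch in which only contacts present in the native structure lower the energy. This is a deliberately stripped down illustration, not the potential used in [47–49]: the one-bead-per-residue coordinates, the distance cutoff, the minimum sequence separation and the uniform contact energy `epsilon` are all assumptions made for the example. The same contact list also gives Q, the fraction of native contacts formed.

```python
import math
from itertools import combinations


def contacts(coords, cutoff=8.0, min_separation=3):
    """Pairs (i, j) closer than cutoff and at least min_separation apart in sequence."""
    return {(i, j) for i, j in combinations(range(len(coords)), 2)
            if j - i >= min_separation and math.dist(coords[i], coords[j]) < cutoff}


def formed_native_contacts(coords, native_contacts, cutoff=8.0):
    """Count how many native contacts are present in the given conformation."""
    return sum(1 for i, j in native_contacts
               if math.dist(coords[i], coords[j]) < cutoff)


def fraction_native(coords, native_contacts, cutoff=8.0):
    """Q: fraction of native contacts formed in the conformation."""
    if not native_contacts:
        raise ValueError("native contact list is empty")
    return formed_native_contacts(coords, native_contacts, cutoff) / len(native_contacts)


def go_energy(coords, native_contacts, epsilon=1.0, cutoff=8.0):
    """Unfrustrated energy: each formed native contact contributes -epsilon.

    Non-native contacts contribute nothing, so there are no competing traps.
    """
    return -epsilon * formed_native_contacts(coords, native_contacts, cutoff)
```

Because the energy decreases in step with Q, every native contact that forms moves the chain downhill, which is what makes the landscape of such a model a perfect funnel.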

What general observations support the minimal frustration hypothesis? The first observation is the well-known robustness of protein structure to changes in sequence. Single mutations, and even sometimes dramatic overall changes in sequence like those found in “twilight zone” homologues, still give remarkably similar folded native structures [55]. It is commonplace now for protein engineers to assume that making only a single mutation in a protein will leave its structure intact, once that protein has successfully folded, under the right solvent conditions. This robustness is not what would be expected for folding on a typical highly frustrated rugged landscape where an ensemble of structurally disparate competing low energy structures is the norm. Yes, a single mutation sometimes does cause a protein to unfold, but this unfolding reflects an “unfair” competition between one single folded structure and myriads of unfolded alternatives, not the competition between specific structured individual alternatives. Only recently have sequences been found that adopt fixed structures but that also change their structure drastically when individual mutations are made [56]. These examples typically also involve a change in the oligomeric state of the proteins, however. Such sequences might be “missing links” in protein evolution.

For a minimally frustrated protein, the overall stability typically does not come from just a few residues but rather is spread throughout the structure. Owing to this delocalized character of the guiding forces to the native state, proteins with widely different sequences not only have similar final structures but often follow the same basic folding mechanism and sometimes have even quantitatively similar rates when tuned to the same stability [57,58]. This generalization that “topology” controls the folding mechanism (as well as its exceptions) has been documented for several protein families, notably by Jane Clarke [59] but also by numerous others [60].

The most general and powerful consequence of proteins having minimally frustrated landscapes is that folding kinetics almost always changes smoothly and monotonically as protein stability is changed. Studying the effect of such changes on the kinetics is the basis of modern systematic mechanistic folding studies [26,61]. The stability can be changed by changing temperature (up or down!), changing solvent conditions or by making site mutations. The smooth variation of rate with stability is expected for landscapes that are dominated by successively forming the contacts finally found in the native structure [62]. In contrast, such a smooth variation would be quite unexpected if wildly different structures involving specific non-native structures were to play a significant role in the folding mechanism, as would be expected for typical random sequences. Such unexpected variations do sometimes appear. The ROP dimer system seemingly violates the funnel concept because mutations were found that changed both the folding and unfolding rates without changing the protein stability [63]. To accommodate this anomaly, energy landscape thinking led us to suggest that the structure of the dimer had, in fact, been changed upon mutation [50]. It was later confirmed by single molecule FRET experiments that the ROP dimer indeed has, owing to its high symmetry, two different but similarly stable four-helix bundle structures sharing a common set of interactions [64,65]. One form is dominant for the original protein but the other form dominates for the mutant. In the absence of the symmetry of the dimer system, only a few proteins show any evidence at all of degenerate trap states in which non-native contacts play a role in the key intermediates, as we would find for a random sequence. The most notorious of these strong exceptions to the perfect funnel model is the folding of Im7, a molecule that functions by binding a toxin [66]. It turns out to be “the exception that proves the rule”. Computational analysis of the Im7 landscape shows that this rather small protein possesses a large region that is frustrated in the monomer but that takes part in the binding function of the molecule: the additional interactions that form with its target are ultimately favorable, and the frustration is relieved upon binding [51]. This frustrated part of the molecule leads to a re-packing in an intermediate when Im7 is required to fold on its own without a partner present [67].

A more detailed test of the ideal funnel model comes from making quantitative predictions using perfectly funneled landscape calculations and comparing them with accurate experimental measurements of kinetic changes on a residue-by-residue basis [68–72]. The ideal funnel models allow us to predict the fraction of the native stability change that becomes translated into the rate change, the so-called Φ value analysis. The Φ value essentially gives the fraction of the time that a residue participates energetically in the crucial transition states for folding [62]. Very good agreement between predictions of perfect funnel models and experiment has been found in several cases for extensive sets of Φ-values measured for many systems. These include the Φ values for the folding of azurin, U1A, λ repressor and a very large lactose repressor module [73].

Fig. 2. The distribution of energies on a funneled folding landscape. A schematic spectrum showing the density of states of a minimally frustrated protein. Compact alternative or decoy states are distributed with a nearly Gaussian distribution of energies through the random addition of conflicting contributions. At a temperature T, the thermally occupied decoys will be diminished in number but they will still have a Gaussian distribution of energies that is shifted downwards. At the glass temperature Tg only a very small number of such trap states would be thermally occupied. The energy Eg at Tg can be estimated from the width of the unbiased decoy distribution ΔE. For a minimally frustrated protein an evolved sequence fits the target structure quite well so that at a folding temperature TF the Boltzmann weight for the target structure competes with the entire collection of states in the unfolded ensemble. For most random heteropolymers no significant gap in the spectrum exists. As the extra stability of the target δEF increases relative to the width of the decoy distribution ΔE, the folded structure can be more and more easily picked out from the alternatives. By maximizing δEF/ΔE over a set of sequences one finds more and more stable “well-designed” sequences. Conversely, if many sequence/structure pairs are known, the parameters in the energy function can be varied so as to maximize δEF/ΔE for the set. The resulting energy function summarizes the structural sequence correlations in the training set. In this way structural data allow us to learn transferable energy functions. Energy landscape theory provides us a theoretical “license to do bioinformatics”.

P.G. Wolynes / Biochimie 119 (2015) 218–230

Fully atomistic simulations also converge on confirming the validity of the funnel concept for the folding of natural proteins, at least when correctly tuned all-atom force fields are used. Many early simulation studies did find some evidence for non-native intermediates in natural proteins [74–76]. It was suggested that these non-native traps simply had been missed by experimenters. Certainly some misfolded intermediates may be real and probably do exist in the laboratory, but a poor force field also typically leads to transient misfolding (since the protein did not evolve with that man-made force field!) and this seems to be the problem in many of the early simulations of folding. In any event the importance of misfolded intermediates in the laboratory would be a quantitative question. Non-native traps may also be relatively more important in the smaller protein systems that first became accessible to simulation than they are in the more typical large proteins. In large, more slowly folding proteins, non-native traps would give bigger barriers and thus they would interfere much more dramatically with folding in vivo. This suggests that selection against misfolds should be stronger for longer proteins than for shorter ones. As the force fields used in the simulations have improved and computational capabilities have advanced so that bigger systems can be studied [7], it has now become clear that the finally formed native contacts do, in fact, generally provide the dominant interactions guiding the folding process [77,78]. Other non-native contacts may help collapse and also provide an overall frictional influence on the rate, as expected by landscape theory [16], but they do not greatly modify the sequence of events. Eaton et al. have analyzed the heroic simulations of the Shaw group on a large number of systems. Their analysis shows that only for the artificial designed protein α3D can one find evidence for specific non-native interactions playing any significant role in the dominant folding paths. Indeed, even for α3D the non-native interactions that form correspond to a simple sliding of helices upon each other. Such symmetrically related interactions would not be immediately apparent to the casual observer.

So proteins, in the main, are minimally frustrated. Quantitatively, how much frustration is there?

3. How rugged is the folding landscape – physicochemical approach

The first attempt to quantify the energy landscape of real proteins through the TF/Tg ratio was made by Onuchic et al. [43]. Their goal was to see whether the then existing toy model simulations of protein folding using simplified lattice models could be used to think about real proteins. The earlier lattice calculations already had dovetailed with the analytical energy landscape ideas of Bryngelson and Wolynes [15,16], which treated the folding process of a small protein as a random walk of an order parameter that measures the native-like character of a polymer configuration. Folding is a biased random walk because entropy favors the myriad of polymer configurations unconstrained by similarity to the native state while energy, on the other hand, favors more native-like configurations. Energy and entropy combine to give a free energy that depends on the order parameter. The gradient of the free energy then describes the bias of the walk towards folding or unfolding and depends on the temperature. The diffusion rate for the order parameter depends on the energetic ruggedness: slow rates of diffusion correspond to rugged glassy landscapes, fast rates to smoother landscapes. The resulting description via a Kramers-like diffusion equation was shown to describe the kinetics of lattice models quite well [16,79]. In recent years the biased diffusion model has also been shown to describe the dynamics of more realistic off-lattice models [80] and has been used directly to interpret experimental data [81–83].

As we see, the first key quantity needed to make the connection between landscape theory and real proteins is the entropy change of the polymer when it unfolds. Calorimetry is unfortunately only of modest help in getting at the part of the entropy change relevant to the polymer chain alone because the surrounding water is structured by hydrophobic side chains in the unfolded state. We cannot easily separate out the entropy change of the water environment from the overall calorimetric change. This solvent entropy change is a non-trivial phenomenon, ultimately being the origin of cold denaturation. The folding temperature itself is directly observable, so we know that at TF the energy loss must exactly compensate the configurational entropy change, so that TF ΔS = δE. While many folding processes occur directly from a random coil state, the collapsed but disordered state of the protein is not too far away. Apparently proteins are near a triple point between the native, compact globule and random coil phases. The entropy of the collapsed molten globule state is key because ruggedness can, in any event, only arise to a significant extent in the collapsed ensemble. Searching through collapsed states can be slow but searching through the expanded states with few stabilizing contacts is not. Collapsed state ruggedness is what could make reconfigurational search difficult.

Onuchic et al. [43] estimated the configurational entropy of the collapsed molten globule by noting that its helical content is quite high. Both collapse and helix formation lower the molten globule entropy. The helical entropy loss can be quantitatively estimated by considering the ease of nucleation in the uncollapsed phase as measured by the σ and s parameters of the helix-coil transition for uncollapsed peptides. Luthey-Schulten et al. showed how to couple the quasi-one-dimensional phase transition of helix formation with the three-dimensional collapse transition to relate both the entropy of the globule and its helix content to the hydrogen bond strength [84]. This theory, combined with experimental data on the structural content of molten globules, gives a configurational entropy of about 0.6kB per amino acid in the molten globule. This estimate is based on the observation that molten globules typically have about 65% α-helical structure. This is quite a bit less than the 2.3kB per monomer unit estimated for free chains and indeed is even smaller than the 1.0kB per unit of the toy model lattice polymers.

Saven et al. arrive at similar numbers when the propensity of particular sequences to form helices in peptides, or the thermodynamics of the signals that start or stop helices, is used as input to a theory of the collapsed globule [85].

According to this estimate, a 60 amino acid helical protein maps entropically pretty well onto the popular 27-bead cubic lattice model of proteins [43].

While the entropy of the collapsed ensemble follows from the study of residual structure in collapsed states, estimating the ruggedness of the collapsed ensemble requires knowing about the dynamics within molten globules. In 1995 Onuchic et al. [43] used the millisecond time scale value for the reconfiguration time τR that could be inferred from existing NMR measurements on molten globules [86] to estimate the ruggedness. Surprisingly, it is still not easy to come by better values for molten globule dynamics in the laboratory. According to the Bryngelson–Wolynes random energy model estimate, τR is related to the ruggedness, described by an energy variance ΔE², and a microscopic reconfiguration time τ0:

$$\tau_R = \tau_0 \, e^{\Delta E^2 / 2T^2} \qquad (1)$$

Onuchic et al. took a rather short time for τ0 ≈ 10⁻⁹ s, so that millisecond rates in the globule give ΔE²/2T² ≈ 15.

The ruggedness ΔE² also determines the temperature of the glass transition, which occurs when the temperature is low enough that only the deepest energy states out of the exp(Sc/kB) possible ones are thermally sampled. This condition on the entropy gives the relation

$$k_B T_g = \sqrt{\Delta E^2 / N} \Big/ \sqrt{2 S_c / N} \qquad (2)$$

where N is the chain length.

Onuchic et al. thus arrived at a value for the glass transition temperature which is much lower than the folding temperature, Tg ≈ 0.6 TF. This low value is consistent with a moderately strongly funneled landscape corresponding to a roughly three letter protein folding code.
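The arithmetic behind this estimate can be retraced with a short sketch. It assumes the random energy model relations quoted above, namely τR = τ0 exp(ΔE²/2T²) and Tg² = ΔE²/(2Sc) in units where kB = 1. It further assumes (this is an interpretation, not stated explicitly in the text) that the ruggedness value ΔE²/2T² ≈ 15 refers to the whole chain of a 60-residue protein evaluated near TF, with Sc taken as 0.6 kB per residue. The function names are illustrative only.

```python
import math

def ruggedness_from_times(tau_R, tau_0):
    """Return Delta E^2 / (2 T^2) implied by tau_R = tau_0 * exp(Delta E^2 / (2 T^2))."""
    return math.log(tau_R / tau_0)

def tg_over_t(ruggedness, entropy_per_residue, n_residues):
    """Return T_g / T from T_g^2 = Delta E^2 / (2 S_c), with k_B = 1.

    `ruggedness` is Delta E^2 / (2 T^2) for the whole chain (assumption);
    S_c is the total configurational entropy, entropy_per_residue * n_residues.
    """
    total_entropy = entropy_per_residue * n_residues
    # Delta E^2 = 2 T^2 * ruggedness, so (T_g / T)^2 = ruggedness / S_c
    return math.sqrt(ruggedness / total_entropy)

# Millisecond reconfiguration with a nanosecond microscopic time scale
rug = ruggedness_from_times(1e-3, 1e-9)
print(f"Delta E^2 / 2T^2 from the time scales: {rug:.1f}")

# Using the rounded value of 15 quoted in the text (assumed to hold near T_F)
ratio = tg_over_t(15.0, 0.6, 60)
print(f"T_g / T_F = {ratio:.2f}, T_F / T_g = {1 / ratio:.2f}")
```

Under these assumptions the logarithm of the time-scale ratio comes out slightly below 15, and the resulting Tg/TF lands close to the 0.6 quoted in the text.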

The random energy model also gives an estimate for the energy of the deepest traps that would be thermally sampled at Tg. This competitive trap energy is $E_g = \sqrt{\Delta E^2 S_c}$.

The minimum frustration principle can also be formulated in terms of the condition that the energies δEF are less than Eg. In this form we then see that minimal frustration is equivalent to there being a gap in the spectrum of folded energy states, as in Fig 2. In reality, of course, this gap is filled by partially folded configurations in which some of the native contacts have formed; these correspond to the structures on the paths for nucleation of properly folded protein structure. Important as they are, these partially folded configurations are sufficiently small in number that they would hardly show up on the plot.

The TF/Tg ratio of 1.6 inferred by Onuchic et al. clearly indicates the folding landscape is quite smooth. It is probably an underestimate of the smoothness. One reason the estimate may be on the low side is that the polymer chain motions in the globule are now thought to be intrinsically slower than the τ0 estimate they used, more on the time scale of a microsecond. At the same time, escape from traps may not require rearranging the whole chain. In this regard, Plotkin et al. showed that correlations in the landscape cause only a fraction of the ruggedness to be translated into configurational slowing [87,88]. This error would go in the opposite direction, but correlations in the landscape also change the corresponding Tg estimate, so that correlations in the folding landscape turn out essentially to preserve the estimated TF/Tg ratio at the value found for the Onuchic et al. model.

Chan has argued that the TF/Tg ratio is actually much larger than 1.6, perhaps reaching a ratio as large as TF/Tg ≈ 10. This very high value would correspond to a very highly funneled energy landscape. His reasoning is based on trying to match the observed high cooperativity of folding [44]. He shows that a heteropolymer model with TF/Tg = 1.6 would give denaturation curves that are much broader than typically seen, suggesting a smoother landscape.

Clementi and Plotkin also suggest that TF/Tg is greater than 1.6. They base their suggestion on the quantitative success that perfect funnel models have in predicting Φ-values [45]. By adding random non-native interactions to a heterogeneous structure based model, they suggest that the variance of non-native energies must be quite a bit less than what the 1.6 value for the TF/Tg ratio would indicate; otherwise specific non-native contacts in misfolded states would greatly modify the Φ values. They suggest that TF/Tg ratios as large as 2 or 3 would better fit the small deviations from a perfect funnel that are in fact experimentally observed.

4. Reverse engineering the folding energy landscape

The minimal frustration hypothesis provides a strategy for “reverse engineering” the folding problem: energy landscape theory thereby provides an organized way of learning the folding landscape by “having seen it in action”. The strategy, which is rooted in the minimal frustration hypothesis, is to tinker with the force field so as to ensure that a transferable form for the energy function, one that can be applied to any sequence, actually leads to minimally frustrated, funneled landscapes for a training set of proteins with natural sequences that we already know fold to specific known structures. If all folding landscapes are funneled and the parameters in the force field are universal and thus transferable, the predictive force field that results should work well for proteins not in the training set.

Testing the landscape for proper folding would be a hard reverse engineering task if it had to be done simply by trial and error. There are many parameters in even the simplest coarse-grained molecular force field. Testing even one set of parameters for the force field to see if it yields a funneled folding landscape takes quite a bit of computer time, so combinatorial search through parameter space would be prohibitively expensive. The good news is that the minimum frustration principle provides a quick zeroth order check on whether a protein sequence can be folded with a given force field. All we have to do is to check that the energy of the folded state is much lower than the typical energy of the lowest thermally accessible misfolded state. How do we find the energy of misfolded states? This typical trap energy, Eg, as we have seen, can be estimated directly from the variance of the energy over a set of candidate decoy structures, the quantity we call the ruggedness ΔE². This variance can easily be calculated from a fixed set of representative decoys, which makes it unnecessary to find specifically the absolutely lowest energy misfolded state for a given force field. Finding the actual lowest energy misfolded state would be an NP-hard task. Sampling to find ΔE², however, is much easier and more robust.
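The zeroth order check described above can be illustrated with a minimal sketch. It assumes only that the energy of the native structure and the energies of a fixed set of decoys have already been computed with some force field. The function name `gap_to_ruggedness` and the synthetic Gaussian decoy energies are hypothetical and stand in for real data.

```python
import numpy as np

def gap_to_ruggedness(e_native, decoy_energies):
    """Return the ratio of the stability gap to the decoy energy width.

    The gap is the mean decoy energy minus the native energy, and the
    ruggedness is the standard deviation of the decoy energies. A large
    ratio suggests a funneled landscape for this sequence and force field.
    """
    decoys = np.asarray(decoy_energies, dtype=float)
    gap = decoys.mean() - e_native   # stability gap
    width = decoys.std()             # square root of the energy variance
    return gap / width

# Synthetic illustration: Gaussian decoy energies (arbitrary units)
rng = np.random.default_rng(0)
decoys = rng.normal(loc=0.0, scale=1.0, size=1000)
print("well-gapped native:", gap_to_ruggedness(-8.0, decoys))
print("poorly gapped native:", gap_to_ruggedness(-1.0, decoys))
```

Only the first two moments of the decoy energy distribution are needed here, which is why sampling a representative decoy set suffices and no search for the single lowest misfolded state is required.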

Fig. 3. Predictions of globular protein tertiary structure. A gallery of globular protein structures predicted by the AWSEM energy landscape optimized force field is shown overlapped with the correct x-ray structures. The agreement is comparable to what can be found via homology modeling but no homology information was used in making these predictions. The examples are 3ICB: vitamin D-dependent calcium-binding protein; 2MHR: myohemerythrin; 1JWE: N-terminal domain of E. coli DNAB helicase; 1R69: amino-terminal domain of phage 434 repressor; 256bB: cytochrome B562; 1utg: uteroglobin; 1MBA: Aplysia limacina myoglobin; and 4CPV: carp parvalbumin. For details please see Ref 90.

Fig. 4. Predictions of membrane protein structures. A gallery of membrane protein structures predicted by the AWSEM-Membrane force field shown overlapped with the correct x-ray structures. The examples are 1IWG: subdomain of multidrug efflux transporter; 1J4N: subdomain of aquaporin water channel AQP1; 1PV6: subdomain of lactose permease transporter; 1PY6SD: subdomain of bacteriorhodopsin; 1OCC: subdomain of cytochrome C oxidase aa3; 2RH1: β2-adrenergic GPCR; 2BG9: subdomain of nicotinic acetylcholine receptor; and 2BL2: subdomain of V-type Na+-ATPase. For details please see Ref 91.

Taking this strategy one step further mathematically leads to an explicit optimization scheme: maximize the ratio δE/ΔE. This quantity is monotonically related to the TF/Tg ratio. If the parameters in the force field enter into the energy function linearly, then optimizing this ratio actually becomes a problem in linear algebra, as first noted by Goldstein et al. [29]. Even better funneled energy landscapes emerge when one employs still more elaborate nonlinear self-consistent optimization schemes in which the decoys are all iteratively re-computed by sampling misfolded structures for the already partially optimized force fields [89,90]. This strategy (illustrated in Fig 2) has over the years led to a series of ever improving force fields that now can successfully fold many globular and membrane proteins using their sequence information alone.
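To see why linear parameters reduce the optimization to linear algebra, consider a minimal sketch. It assumes an energy of the form E = γ·φ, where φ is a hypothetical feature vector for a structure (for example, counts of contacts of each type) and γ holds the force field parameters. The gap is then γ·A, with A the mean decoy feature vector minus the native one, and the squared ruggedness is γᵀBγ, with B the covariance of the decoy features. Maximizing γ·A/√(γᵀBγ) gives γ proportional to B⁻¹A. The function names and random data below are illustrative and are not taken from Ref. [29].

```python
import numpy as np

def optimal_linear_parameters(phi_native, phi_decoys):
    """Return unit-norm gamma maximizing gap/ruggedness for E = gamma . phi.

    Assumes the decoy feature covariance matrix B is nonsingular.
    """
    phi_decoys = np.asarray(phi_decoys, dtype=float)
    phi_native = np.asarray(phi_native, dtype=float)
    a = phi_decoys.mean(axis=0) - phi_native          # gap direction A
    b = np.atleast_2d(np.cov(phi_decoys, rowvar=False, bias=True))  # covariance B
    gamma = np.linalg.solve(b, a)                     # gamma proportional to B^-1 A
    return gamma / np.linalg.norm(gamma)

def gap_ratio(gamma, phi_native, phi_decoys):
    """Return (mean decoy energy - native energy) / std of decoy energies."""
    e_decoys = np.asarray(phi_decoys, dtype=float) @ np.asarray(gamma, dtype=float)
    e_native = np.asarray(phi_native, dtype=float) @ np.asarray(gamma, dtype=float)
    return (e_decoys.mean() - e_native) / e_decoys.std()

# Hypothetical data: 200 decoys described by 3 features each
rng = np.random.default_rng(0)
decoys = rng.normal(size=(200, 3))
native = np.array([-2.0, -1.0, 0.5])

gamma = optimal_linear_parameters(native, decoys)
print("optimized ratio:", gap_ratio(gamma, native, decoys))
print("arbitrary parameters:", gap_ratio(np.ones(3), native, decoys))
```

The self-consistent schemes mentioned in the text would wrap a step like this in a loop that regenerates the decoys with the current parameters; that outer loop is not sketched here.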

A historical survey of the progress made using this energy landscape theory based strategy, as well as the details of its underlying mathematics, has recently been presented by us [5]. Nowadays, the quality of predictions made without homology information from such force fields is almost as high as what can be achieved by tweaking the structures of known homologs. We can see the comparison of the structures predicted by the latest force field (called AWSEM, the Associative Memory Water Mediated Structure and Energy Model) to the X-ray structures for several globular proteins in Fig. 3. The same strategy has even proved a successful route to developing force fields for studying membrane proteins, where the funnel minimal frustration idea has been less tested. Indeed the very basis of the funnel hypothesis may be questioned for membrane proteins because of the kinetic control that may occur in vivo through the action of the translocon. Some examples of membrane protein structure prediction [91] using such a transferable force field based on energy landscape reverse engineering are shown in Fig 4. It appears that once the translocon has placed the protein in the membrane, spontaneous folding à la Anfinsen ensues.

It is probably not an accident that these force fields, which have been reverse engineered using the minimally frustrated landscape idea, yield results with a resolution only comparable to what can be obtained using homology models. They are not as precise as fully determined atomistic x-ray structures. This is reasonable because the minimal frustration principle must be a result of evolution. Evolution works at the residue level, not at the atomistic level: only a limited set of side chains has been tried over the course of natural history, and any new side chains beyond the standard 20 that would be needed to completely test the all-atom models haven't been tried by Nature. This evolutionary origin of the funneled landscape can be tested in another way, which I describe in the next section.

Fig. 5. The folding super landscape. The super-energy landscape of proteins is pictured here. Ideally this would be shown as a function of both sequence and conformation simultaneously. The large funnels are pictured as a function of sequence space with the radial sizes connoting sequence entropy. Energy is again the vertical axis. Natural proteins are not necessarily the lowest energy designs. These would be found at the bottom of the super funnel. For each target the configuration space landscape is funneled, but only to an energy EF. This structural energy landscape is, however, shown superimposed on the sequence space landscape. Disordered compact structures and sequence scrambled decoys have comparable energy statistics. They are shown near the top of the landscape. The funnels to other structures start from these same high energy states but again would finally reach energies near EF if they have sufficiently evolved under the minimal frustration selection constraint. The energy of the traps Eg can be estimated by scrambling sequences within the native structure. This diagram shows how the evolutionary and physical configurational landscapes are related to each other. Notice that sequence space is cosmologically bigger than structure space, as reflected by the large sequence entropy at EF. This excess coding space allows minimally frustrated landscapes to be found through the random processes of natural selection.

5. The evolutionary landscapes of proteins

Folding of proteins is important to their function and the proper function of proteins is needed for an organism's survival; therefore the folding landscape is a key part of natural selection. Evolution works, however, both by selection and by random experimentation through mutation and recombination. Functional proteins in widely divergent organisms therefore have wildly different sequences. There is much evidence that structure evolves more slowly than sequence does [92]. Slow structural evolution would be expected if the folding energy landscape is funneled. Can we quantitatively relate the diversity of protein sequences, all of which fold to a largely common structure, to the physical energy landscapes for each of these sequences? One way to address this question is first to assume that folding to the proper structure is actually the only selection constraint on molecular evolution. The funnel hypothesis would then summarize the folding constraint by saying the energy of the folded structure EF must be far below Eg, the typical energy of a structurally distinct trap. As we have seen, the latter depends on ΔE², which itself depends foremost on the protein sequence composition (but not primarily on the order of amino acids). This suggests we can picture the evolutionary energy landscape of proteins in a way much like the folding landscape of a single protein [93]. In Fig. 5 we show such a super landscape, which describes how energy varies both in sequence and structure. There will be numerous funnels in the landscape corresponding to every possible structure of a protein. We can concentrate on the landscape in sequence space for evolving to a given target structure and compare it with the physical energy landscape for folding to that structure. For the sequence space landscape, the energy coordinate represents the energy that a particular sequence has when it takes on the particular target structure. One would find the most “well-designed” sequence at the bottom of the sequence space funnel. (Unfortunately many early folding theory papers used the “designed” terminology, which can be easily misinterpreted as being an endorsement of creationism!)

Fig. 6. The distributions of energies in sequence and configuration space. The schematic spectrum in sequence space is shown superimposed on the configurational energy spectrum. Notice that there are many sequences that fold to the same target structure because the selection temperature Tsel is greater than the sequence space glass transition temperature. This temperature in turn is lower than the structural space glass transition at Tg.

Fig. 8. The correlation between the evolutionary and physical folding landscape. We evaluate the energies of structures using both a physical and a genomic energy function. The pairs shown in the figures correspond to partially folded protein structures generated via molecular simulation using the AWSEM energy function for the 1r69 family. Again the two landscapes turn out to be funneled and strongly correlated, as would be expected from the minimal frustration principle. The colors of the points correspond to the fraction of native contacts formed in the sampled structure. For details please see Ref. [46].

It is important to remember, however, that because evolution is a random process, there is no need for selection to have given the “best designed” or most stable sequence; it is only necessary for evolution to have found a sequence that folds to a functional structure sufficiently often. To find the folded structure quickly and reliably, the physical energy landscape must have a low energy EF below some threshold for at least two reasons. One of these reasons is to avoid kinetic trapping, as we see natural sequences indeed do, but it is also necessary for survival of the species to avoid being unstable against next generation mutations that would inevitably occur in a finite population of organisms [91]. The first selection constraint, based on kinetic trapping, would act on the organism possessing the protein itself. Kinetic trapping (if EF were close to Eg, the typical energy of a trap) will create long-lived misfolded proteins. Such long-lived species could lead to aggregation and problems with protein trafficking. It is often thought that this is an issue in giving rise to the numerous neurodegenerative diseases. The second selection constraint really acts at the population level, not at the level of the presently living individual organism. Some fraction of the offspring of a viable organism which has a protein with energy EF will, after a mutation has occurred, become unviable. Again, the closer EF is to Eg, the more this poor performance in the offspring will be a problem for the species as a whole. Whatever the actual origin of the selection constraint on EF, if this energy constraint is the only constraint, we can argue that after sufficient sequence variation has occurred, one would end up with an ensemble of proteins whose sequences are random apart from their satisfying a threshold constraint on their energy in the native structure EF. Since sequence space is cosmologically large, the selected sample would be equally well described by a Boltzmann distribution for the energy E(a1,…aN) of a particular sequence of amino acids (a1…aN) in the target structure [94–97]. This Boltzmann distribution would be characterized by an effective temperature of selection Tsel. The lower the selection temperature Tsel, the lower is EF and the more stable or “better designed” or, better said, minimally frustrated the sequence would be. Thus as Tsel goes down, TF gets larger relative to the glass transition temperature Tg. If one applies a random energy model ansatz to the sequence space, just as is done for the molten globule configuration space, one ends up with an elegant relation between the evolutionary funnel characterized by EF (and characteristic evolutionary temperature Tsel) and the physical folding funnel characterized by the same EF (but with different physical temperatures TF and Tg):

$$\frac{2}{T_F T_{sel}} = \frac{1}{T_g^2} + \frac{1}{T_F^2} \qquad (3)$$
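A short worked rearrangement may help in reading this relation. Multiplying 2/(TF Tsel) = 1/Tg² + 1/TF² through by TF² gives (TF/Tg)² = 2 TF/Tsel − 1, so the smoothness ratio TF/Tg follows directly from the ratio TF/Tsel. The sketch below simply evaluates that rearrangement; the function name and the sample inputs are illustrative and not taken from the original.

```python
import math

def tf_over_tg(tf_over_tsel):
    """Return T_F / T_g from T_F / T_sel using (T_F/T_g)^2 = 2 T_F/T_sel - 1."""
    squared = 2.0 * tf_over_tsel - 1.0
    if squared < 0.0:
        raise ValueError("relation has no real solution for T_F/T_sel < 0.5")
    return math.sqrt(squared)

# Lower selection temperature (larger T_F / T_sel) means a smoother funnel
for ratio in (1.0, 2.0, 3.625):
    print(f"T_F/T_sel = {ratio}: T_F/T_g = {tf_over_tg(ratio):.3f}")
```

In this form the limiting case is easy to see: when Tsel equals TF the relation gives Tg = TF, and any selection temperature below TF pushes TF/Tg above one.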

The key point is that the two “funnels”, one in physical space, the other in sequence space, have the same energies for native structures and also would have similar distributions of decoy energies. See Fig 6. An elementary derivation of this relation may be found in the supplementary material of Ref. [46]. This relation between evolutionary and physical scales was first stated in this way by Pande, Grosberg and Tanaka through more sophisticated replica arguments [97].

Fig. 7. The correlation between physical and evolutionary folding landscapes. We evaluate the energies, one using the physics based AWSEM energy function, the other using the direct contact approximation genomic based energy function, both for scrambled sequences and for natural sequences in the 1r69 repressor family. These pairs of energies are then plotted. We see that both the physical and evolutionary energy landscapes have sizable gaps, showing the minimally frustrated nature of the proteins. For details please see Ref 46.

The assumption of a Boltzmann distribution over sequences has been taken as the basis of several new information theoretic analyses of the genomic data for protein families, which maximize sequence entropy given known correlations in aligned sequences [98]. Such analyses have recently become quite useful because of the extensive genome sequencing data now available. Gene sequencing has made known to us, for many structural families, thousands of sequences, all of which presumably fold to essentially the same structure in order to retain their functions. Using the sequence data, by quantifying co-evolution at distinct sites in the protein, one can do “reverse statistical mechanics” to find the form of the energy function EF(a1,…,aN) that would give such a sample of sequences. Typically the information theoretic energy function is parametrized via site energies and pair interactions. One algorithm for doing reverse statistical mechanics is known as direct coupling analysis (DCA). This algorithm yields an energy function HDCA which describes the evolutionary constraints at a given site and the coevolution of different positions in the sequence. HDCA should be strongly correlated with the physical energy function. Indeed it has been shown that HDCA correlates well with the energy functions developed for structure prediction. Two plots showing the correlations are found in Figs. 7 and 8. One of these plots shows the correlation in sequence space, contrasting the energies of random sequences threaded on the native structure with the energies of natural sequences known to fold to the target. The other plot evaluates both the physical and the evolutionary information theoretic energy for partially folded structures generated by a molecular dynamics simulation using the structure prediction force field near the folding temperature. Computing the DCA energy for differing configurations involves the inclusion only of contact pair energy terms.

Fig. 9. The smoothness of the folding funnel quantified by coevolutionary information. The TF/Tg ratios for several protein families, inferred using genomics and landscape theory, are shown. The families are denoted by the PDB ID codes of the representative structures which are described in Ref. [46]. TF/Tg measures the smoothness of a folding landscape. Higher values correspond to more ideal funnels. The red circle is an alternate way of making an estimate by comparing changes in evolutionary energies and experimentally measured stability changes. The estimated TF/Tg ratios for all the natural protein families studied are larger than one so the folding landscape is confirmed to be a funnel. The estimates are clustered around the value of TF/Tg = 2.5 that was estimated by Clementi and Plotkin through a comparison of measured Φ values with simulated ones. For details please see Ref. [46].

You can see in both plots that both energy functions generate a gap between correct sequence–structure pairs and incorrect ones: the evolutionary funnel landscape and the physical funneled landscape are highly correlated.
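The parametrization of the information theoretic energy via site energies and pair interactions can be illustrated with a short sketch. Everything named here is hypothetical: the function `potts_energy`, the data layout for the site terms `h` and the couplings `J`, and the sign convention (more negative means more favorable) are assumptions made for illustration, not the interface of any DCA package. The optional `contacts` argument mimics keeping only contact pair terms when scoring a particular configuration; keeping the site terms in that case is also an assumption.

```python
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[int, int]


def potts_energy(
    seq: str,
    h: List[Dict[str, float]],
    J: Dict[Pair, Dict[Tuple[str, str], float]],
    contacts: Optional[Set[Pair]] = None,
) -> float:
    """Return H = -sum_i h_i(a_i) - sum_{i<j} J_ij(a_i, a_j).

    h[i] maps a residue letter to its site term at position i.
    J[(i, j)] (with i < j) maps a residue pair to its coupling.
    If `contacts` is given, only pairs listed in it contribute.
    Missing entries are treated as zero.
    """
    # Site terms: one contribution per sequence position.
    energy = -sum(h[i].get(seq[i], 0.0) for i in range(len(seq)))
    # Pair terms: optionally restricted to pairs in contact.
    for (i, j), table in J.items():
        if contacts is not None and (i, j) not in contacts:
            continue
        energy -= table.get((seq[i], seq[j]), 0.0)
    return energy


# Toy parameters for a two-residue "protein" (illustrative numbers only).
toy_h = [{"A": 0.5}, {"C": 0.25}]
toy_J = {(0, 1): {("A", "C"): 1.0}}
print(potts_energy("AC", toy_h, toy_J))                  # all pairs included
print(potts_energy("AC", toy_h, toy_J, contacts=set()))  # no pairs in contact
```

In a real analysis the parameters would come from fitting aligned sequences of a family; the sketch only shows how such parameters, once available, turn a sequence (and optionally a contact map) into a single energy.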

This analysis also allows us to access the ratio TF/Tsel. We can then use Equation (3) to find the corresponding TF/Tg ratios. These are shown for several families in Fig. 9.

It is heartening that evolutionary data give rise to estimates for TF/Tg in the same range as those predicted by physical arguments using data from laboratory experiments. The values of the TF/Tg ratio inferred in this way are larger than those first estimated by Onuchic et al., and are more in line with the values estimated by Clementi and Plotkin, discussed earlier.
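The conversion from TF/Tsel to TF/Tg described above follows from rearranging Equation (3): multiplying both sides by TF² gives 2(TF/Tsel) = (TF/Tg)² + 1. A minimal sketch of that arithmetic is given below; the function name `tf_over_tg` and the input value are illustrative assumptions, not numbers taken from Ref. [46].

```python
import math


def tf_over_tg(tf_over_tsel: float) -> float:
    """Solve Eq. (3) for T_F/T_g given T_F/T_sel.

    Eq. (3) times T_F**2 reads 2*(T_F/T_sel) = (T_F/T_g)**2 + 1,
    so T_F/T_g = sqrt(2*(T_F/T_sel) - 1).
    """
    x = 2.0 * tf_over_tsel - 1.0
    if x < 0.0:
        raise ValueError("Eq. (3) has no real solution for T_F/T_sel < 0.5")
    return math.sqrt(x)


# Illustrative input only: a ratio T_F/T_sel of 3.625 maps to T_F/T_g = 2.5.
print(tf_over_tg(3.625))
```

One consequence of this form is that TF/Tg exceeds one exactly when TF/Tsel exceeds one, so in this picture a funneled landscape corresponds to a selection temperature below the folding temperature.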

Fig. 10. Frustration serves a functional purpose. A diagram showing the minimally frustrated web of interactions in two structural forms of RhoA, an allosteric protein. This web is indicated in green. The frustrated interactions in the regions shown in red lead to alternate nearly energetically degenerate configurations that allow these regions to function as hinges. The lower panel shows the frustration levels at different sequence locations, the red line indicating the number of frustrated contacts, green the number of minimally frustrated contacts. The black line indicates the local overlap of the two interconnecting structures. Notice the regions that move (and have low Q) correspond to the most energetically frustrated regions. For details please see Ref. [103].

6. Frustration and function

Protein function requires specificity of interaction as well as sufficient stability to live in the cell and in subsequent generations of cells. In the protein world, standing there and looking pretty is not always enough, however. Proteins act as nonlinear elements in cellular networks in order to process environmental information. This nonlinearity necessitates motion and thus often an energy landscape with multiple distinct stable states. Because stability of a limited number of specific structures is so important to prevent promiscuous interactions, most of the individual interactions in proteins have evolved to give collectively strongly funneled landscapes, but some strategic parts of the sequences located at specific sites in the structure have been selected to be frustrated in order to allow both motion and interaction with partners. To quantify this phenomenon requires tools for localizing the sources of frustration [18,99,100]. Localization of frustration can be detected by examining the energy changes that would occur if only the local environment of a site were to be changed through sequence mutation or allowed to change by reconfigurational physical motion. Minimally frustrated native regions will have energies in the more stable part of the distribution of local energies; regions that are unstable in comparison with the bulk of the distribution of local energies are frustrated. Checking for local frustration is relatively easy once good structures and adequate energy functions are known [99], as is now the case. It has been shown that regions where highly frustrated interactions cluster often map onto sites of allosteric change [100] or can identify binding sites [99] which then have their frustration relieved once a binding partner is found and then docked to the site. In contrast to the clustered, frustrated, functional regions, minimally frustrated interactions typically form a connected web throughout the protein that keeps subunits of the protein relatively rigid. The combination of the protein having modules with rigidity along with specifically frustrated local regions that act as moveable elements justifies considering many large proteins as being nearly macroscopic “machines”. A key difference of these molecular machines from their fully macroscopic counterparts is that the minimally frustrated nature of most of the molecule allows the molecular protein to move by breaking or “cracking” locally and then re-assembling into a new configuration [101,102]. Local frustration analysis is easily automated and a Web server showing both the web of minimally frustrated interactions and the highly frustrated sites is available [103].

An example of the frustration patterns in an allosteric protein is shown for the RhoA oncogene in Fig. 10.
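The comparison of a native local energy with a distribution of alternative local energies, as described in this section, can be sketched as a simple Z-score. This is only an illustration under stated assumptions: the names `frustration_index` and `classify` are hypothetical, the decoy energies are assumed to have been generated elsewhere (by mutating the local sequence or perturbing the local structure), and the cutoff values are adjustable parameters chosen for the example rather than values given in the text.

```python
from statistics import mean, pstdev
from typing import Sequence


def frustration_index(native_energy: float, decoy_energies: Sequence[float]) -> float:
    """Z-score of the native energy against the decoy distribution.

    Positive values mean the native interaction is more stable than
    typical decoys; negative values mean it is less stable.
    """
    sigma = pstdev(decoy_energies)
    if sigma == 0.0:
        raise ValueError("decoy energies must not all be identical")
    return (mean(decoy_energies) - native_energy) / sigma


def classify(index: float, minimal_cutoff: float = 0.78, high_cutoff: float = -1.0) -> str:
    """Label an interaction from its index; both cutoffs are assumptions."""
    if index >= minimal_cutoff:
        return "minimally frustrated"
    if index <= high_cutoff:
        return "highly frustrated"
    return "neutral"


# Toy decoy set (illustrative numbers only): mean 1.0, spread 1.0.
decoys = [0.0, 2.0]
for e_native in (-1.0, 1.0, 3.0):
    idx = frustration_index(e_native, decoys)
    print(e_native, idx, classify(idx))
```

Counting how many contacts near each residue fall into each class gives per-site profiles of the kind plotted in the lower panel of Fig. 10.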

7. Folding paradoxes revisited

Protein folding has turned out to be easy, and that seems paradoxical. As we have seen, Levinthal's original paradox can be avoided in several ways, the most important turning out to be that evolution has led to funneled energy landscapes. It is clear that there are other ways of simplifying the search through configuration space by parceling out the configurational entropy piecemeal, such as capillarity effects, so that only part of the chain folds at any time [10,11], or by utilizing local structural signals that increase the probability of the chain bending in the right places [85]. Nevertheless the minimal frustration hypothesis has proved to be a most fruitful tool for visualizing the folding mechanism and addressing protein design and structure prediction. The quantitative form of the minimal frustration principle has been confirmed in several ways through detailed kinetic predictions. In addition we have seen that the minimal frustration principle is confirmed to be the result of natural selection and random variation through the comparison of the landscape characteristics inferred from sequence co-evolution with those inferred from folding physics. But doesn't this resolution of the Levinthal paradox just raise another paradox? How did the random process of evolution find the minimally frustrated sequences?

Searching sequence space would seem to be a daunting task in the same sense that searching structure space seemed to be in the Levinthal paradox. Fred Hoyle raised just such an issue in his science fiction work “The Black Cloud” [104]. Sequence space is indeed cosmologically larger than structure space, so how does evolution find minimally frustrated sequences? The answer seems to be that while minimally frustrated sequences are exponentially rare, they are dense enough in sequence space that connected paths of mutation can find minimally frustrated sequences if there is sufficient selection pressure. Indeed we can see that there is a cosmologically large number of sequences that fold even to a specific given structure. In Fig. 6 we show the density of both available structural states, measuring the configurational entropy, and sequences at a given energy, measuring the sequence entropy. The minimal frustration principle is quantitatively summarized by saying there is a gap between the folded energy and the glassy structural traps. Precisely because sequence entropy is so much larger than structural entropy, the energy of the deepest, most “well designed” sequence (the energy corresponding to a glass transition in sequence space) will be much below the typically selected folding energy EF. This excess of states signals the expected high connectivity of the sequence space.

As the diversity of amino acid type interactions is reduced, one will encounter in sequence space a glass transition that now occurs at a higher energy. Indeed it appears that such a coding crisis occurs when we are limited to 2 letter protein folding codes [105]. Finding foldable sequences for some structures becomes problematic in the computer even with 3 letter folding codes [53]. The idea that interaction diversity is necessary to resolve the Levinthal paradox has been confirmed in the laboratory, where 5 distinct amino acid types have been shown to be sufficient for design [106] but smaller numbers do not suffice.

We see that energy landscape analysis suggests that finding minimally frustrated sequences is not too hard for a random process like evolution. This has also been confirmed via simulation studies. When evolutionary searches are made in the computer using simplified models, it has proved straightforward to reach by trial and error sequences that fold into even quite complex structures [107–109]. Evolving for specific function at a specific locus turns out to entail evolving to fold globally as a prerequisite, according to the work of Sasai [108,109].

A rich problem, still unsolved, however, is how and when the minimally frustrated sequences came into being in our world. Completely resolving this puzzle of natural history may be hard because folding appears as a phenomenon even in the earliest days of life that we can now probe through evolutionary sequence analysis. Studies of the last universal common ancestor suggest this organism had a full complement of foldable proteins [110]. So presently there is a veil over the early steps of protein biogenesis. There is hope, however, that the process of evolving foldability is still going on today. Eukaryotes encode proteins with split genes. Gilbert has proposed that the exons of such genes might themselves be exchangeable units [111] and Gō made the case for this for hemoglobin [112]. Using energy landscape analysis Panchenko et al. showed that indeed many exons seemed to be minimally frustrated folding units. We called these units “foldons” [113,114] nearly twenty years ago. The case that foldons were exons was equivocal in 1995. The much larger amount of sequence data and better energy functions that are available today should lead to a re-examination of the foldon–exon correspondence.

There is also evidence that we can see minimal frustration evolving in real time today. Romesberg and co-workers have examined the dynamics of antibodies made against fluorescent dyes during the somatic evolution of the antibodies [115]. The motions of the dyes bound to the antibodies can then be probed spectroscopically. Their investigations suggest that conformational substates of the antibody–dye complex disappear as the antibody evolves and that the energy landscape gets smoother and smoother as successive rounds of selection occur, so that the antibody becomes a better binder. This is very much consistent with the mechanism envisioned by Sasai for evolving folding through selection pressure on specific binding at a given locus.

Thus while there is no new paradox to the trick of having evolved funneled landscapes, it seems there is much more that we should be able to learn about evolution and folding in the near future. The union of energy landscape theory and modern genomics should be very fruitful.

Conflict of interest

There is no conflict of interest.

Acknowledgments

The author gratefully acknowledges support from NIH Grants R01 GM044557 and P01 GM071862 from the National Institute of General Medical Sciences, and the Center for Theoretical Biological Physics sponsored by the NSF (PHY-0822283). Generous support has also been provided by the D.R. Bullard-Welch Chair at Rice University (Grant #C-0016). I am also happy to have the opportunity to thank all of my collaborators over the years who have contributed to the success of energy landscape theory described in this review. I give extra thanks to Nick Schafer, Bobby Kim and Diego Ferreiro who developed the figures in this manuscript.

References

[1] P.G. Wolynes, Three paradoxes of protein folding, in: H. Bohr, S. Brunak (Eds.), Proceedings on Symposium on Protein Folds: a Distance-based Approach, Symposium Distance-based Approaches to Protein Structure Determination II; Copenhagen, November 1994, CRC Press, Boca Raton, FL, 1995, pp. 3–17.

[2] C.B. Anfinsen, Studies on the Principles that Govern the Folding of Protein Chains, Nobel Lecture, National Institutes of Health, Bethesda, MD, December 11, 1972, http://www.nobelprize.org/nobel_prizes/chemistry/laureates/1972/anfinsen-lecture.pdf.

[3] G. Stent, That was the molecular biology that was, Science 160 (1968) 390–395.

[4] R.F. Service, Problem-solved* (*sort of), Science 321 (2008) 784–786.

[5] N.P. Schafer, B.L. Kim, W. Zheng, P.G. Wolynes, Learning to fold proteins using energy landscape theory, Isr. J. Chem. 53 (2013) 1–28.

[6] C.A. Rohl, C.E.M. Strauss, K.M.S. Misura, D. Baker, Protein structure prediction using Rosetta, Methods Enzymol. 383 (2004) 66–93.

[7] S. Piana, K. Lindorff-Larsen, D.E. Shaw, Protein folding kinetics and thermodynamics from atomistic simulation, Proc. Natl. Acad. Sci. U. S. A. 109 (2012) 17845–17850.

[8] C. Levinthal, Are there pathways for protein folding, J. Chim. Phys. 65 (1968) 44–45.

[9] R. Unger, J. Moult, Finding the lowest free energy conformation of a protein is an NP-hard problem: proof and implications, Bull. Math. Biol. 55 (1993) 1183–1198.

[10] A.V. Finkelstein, A.Y. Badretdinov, Rate of protein folding near the point of thermodynamic equilibrium between the coil and the most stable chain fold, Fold. Des. 2 (1997) 115–121.

[11] P.G. Wolynes, Folding funnels and energy landscapes of larger proteins within the capillarity approximation, Proc. Natl. Acad. Sci. U. S. A. 94 (1997) 6170–6174.

[12] T.F. Smith, M.S. Waterman, Identification of common molecular subsequences, J. Mol. Biol. 147 (1981) 195–197.

[13] D. Thirumalai, N. Lee, S. Woodson, D.K. Klimov, Early events in RNA folding, Ann. Rev. Phys. Chem. 52 (2001) 751–762.

[14] D. Baker, J.L. Sohl, D.A. Agard, A protein folding reaction under kinetic control, Nature 356 (1992) 263–265.

[15] J.D. Bryngelson, P.G. Wolynes, Spin glasses and the statistical mechanics of protein folding, Proc. Natl. Acad. Sci. U. S. A. 84 (1987) 7524–7528.

[16] J.D. Bryngelson, P.G. Wolynes, Intermediates and barrier crossing in a random energy model (with applications to protein folding), J. Phys. Chem. 93 (1989) 6902–6915.

[17] B. Derrida, Random-energy model: an exactly solvable model of disordered systems, Phys. Rev. B 24 (1981) 2613.

[18] D.U. Ferreiro, E.A. Komives, P.G. Wolynes, Frustration in biomolecules, Q. Rev. Biophys. 47 (2014) 285–363.

[19] E.I. Shakhnovich, A.M. Gutin, Formation of unique structure in polypeptide chains: theoretical investigation with the aid of a replica approach, Biophys. Chem. 34 (1989) 187–199.

[20] M. Sasai, P.G. Wolynes, Molecular theory of associative memory Hamiltonian models of protein folding, Phys. Rev. Lett. 65 (1990) 2740–2743.

[21] J. Bryngelson, J. Onuchic, N. Socci, P.G. Wolynes, Funnels, pathways, and the energy landscape of protein folding: a synthesis, Proteins Struct. Funct. Genet. 21 (1995) 167–195.

[22] E.I. Shakhnovich, Theoretical studies of protein folding thermodynamics and kinetics, Curr. Opin. Struct. Biol. 7 (1997) 29–40.

[23] K.A. Dill, H.S. Chan, From Levinthal to pathways to funnels, Nat. Struct. Biol. 4 (1997) 10–19.

[24] V. Lubchenko, P.G. Wolynes, Theory of structural glasses and supercooled liquids, Ann. Rev. Phys. Chem. 58 (2007) 235–266.

[25] J.N. Onuchic, P.G. Wolynes, Theory of protein folding, Curr. Opin. Struct. Biol. 14 (2004) 70–75.

[26] M. Oliveberg, P.G. Wolynes, The experimental survey of protein folding energy landscapes, Q. Rev. Biophys. 38 (2005) 245–288.

[27] B.A. Shoemaker, J. Wang, P.G. Wolynes, Structural correlations in protein folding funnels, Proc. Natl. Acad. Sci. U. S. A. 94 (1997) 777–782.

[28] P.E. Leopold, M. Montal, J.N. Onuchic, Protein folding funnels: a kinetic approach to the sequence–structure relationship, Proc. Natl. Acad. Sci. U. S. A. 89 (18) (1992) 8721–8725.

[29] R. Goldstein, Z. Luthey-Schulten, P.G. Wolynes, Optimal protein-folding codes from spin-glass theory, Proc. Natl. Acad. Sci. U. S. A. 89 (1992) 4918–4922.

[30] A.R. Dinner, V. Abkevich, E. Shakhnovich, M. Karplus, Factors that affect the folding ability of proteins, Proteins Struct. Funct. Genet. 35 (1999) 34–40.

[31] R. Mélin, H. Li, N.S. Wingreen, C. Tang, Designability, thermodynamic stability, and dynamics in protein folding: a lattice model study, J. Chem. Phys. 110 (1999) 1252.

[32] M.S. Friedrichs, P.G. Wolynes, Toward protein tertiary structure recognition by means of associative memory Hamiltonians, Science 246 (1989) 371–373.

[33] P.G. Wolynes, Spin glass ideas and the protein folding problems, in: D. Stein (Ed.), Spin Glasses and Biology, World Scientific Press, Singapore, 1992, pp. 225–259.

[34] N.D. Socci, J.N. Onuchic, P.G. Wolynes, Protein folding mechanisms and the multidimensional folding funnel, Proteins Struct. Funct. Genet. 32 (1998) 136–158.

[35] J.N. Onuchic, Z. Luthey-Schulten, P.G. Wolynes, Theory of protein folding: the energy landscape perspective, Ann. Rev. Phys. Chem. 48 (1997) 545–600.

[36] M. Karplus, Behind the folding funnel diagram, Nat. Chem. Biol. 7 (2011) 401–404.

[37] G.R. Bowman, V.S. Pande, Protein folded states are kinetic hubs, Proc. Natl. Acad. Sci. U. S. A. 107 (2010) 10890–10895.

[38] C.R. Schwantes, V.S. Pande, Improvements in Markov state model construction reveal many non-native interactions in the folding of NTL9, J. Chem. Theory Comput. 9 (2013) 2000–2009.

[39] R.A. Goldstein, Z.A. Luthey-Schulten, P.G. Wolynes, Protein tertiary structure recognition using optimized Hamiltonians with local interactions, Proc. Natl. Acad. Sci. U. S. A. 89 (1992) 9029–9033.

[40] C. Hardin, M. Eastwood, Z. Luthey-Schulten, P.G. Wolynes, Associative memory Hamiltonians for structure prediction without homology: alpha-helical proteins, Proc. Natl. Acad. Sci. U. S. A. 97 (2000) 14235–14240.

[41] G.A. Papoian, J. Ulander, M.P. Eastwood, Z. Luthey-Schulten, P.G. Wolynes, Water in protein structure prediction, Proc. Natl. Acad. Sci. U. S. A. 101 (2004) 3352–3357.

[42] G.A. Papoian, J. Ulander, P.G. Wolynes, The role of water mediated interactions in protein-protein recognition landscapes, J. Am. Chem. Soc. 125 (2003) 9170–9178.

[43] J.N. Onuchic, P.G. Wolynes, Z. Luthey-Schulten, N.D. Socci, Toward an outline of the topography of a realistic protein-folding funnel, Proc. Natl. Acad. Sci. U. S. A. 92 (1995) 3626–3630.

[44] H. Kaya, H.S. Chan, Polymer principles of protein calorimetric two-state cooperativity, Proteins 40 (2000) 637–661.

[45] C. Clementi, S.S. Plotkin, The effects of nonnative interactions on protein folding rates: theory and simulation, Prot. Sci. 13 (2004) 1750–1766.

[46] F. Morcos, N.P. Schafer, R.R. Cheng, J.N. Onuchic, P.G. Wolynes, Coevolutionary information, protein folding landscapes and the thermodynamics of natural selection, Proc. Natl. Acad. Sci. U. S. A. 111 (2014) 12408–12413.

[47] H. Kenzaki, N. Koga, N. Hori, R. Kanada, W.F. Li, K. Okazaki, X.Q. Yao, S. Takada, CafeMol: a coarse-grained biomolecular simulator for simulating proteins at work, J. Chem. Theory Comput. 7 (2011) 1979–1989.

[48] J.K. Noel, P.C. Whitford, K.Y. Sanbonmatsu, J.N. Onuchic, SMOG@ctbp: simplified deployment of structure-based models in GROMACS, Nucleic Acids Res. 38 (Web server issue) (2010) W657–W661, http://dx.doi.org/10.1093/nar/gkq498.

[49] N. Gō, Theoretical studies of protein folding, Ann. Rev. Biophys. Bioeng. 12 (1983) 183–210.

[50] Y. Levy, S.S. Cho, T. Shen, J.N. Onuchic, P.G. Wolynes, Symmetry and frustration in protein energy landscapes: a near-degeneracy resolves the ROP dimer folding mystery, Proc. Natl. Acad. Sci. U. S. A. 102 (2005) 2373–2378.

[51] L. Sutto, J. Lätzer, J. Hegler, D. Ferreiro, P.G. Wolynes, Consequences of localized frustration for the folding mechanism of the IM7 protein, Proc. Natl. Acad. Sci. U. S. A. 104 (2007) 19825–19830.

[52] S.S. Cho, Y. Levy, P.G. Wolynes, Quantitative criteria for native energetic heterogeneity influences in the prediction of protein folding kinetics, Proc. Natl. Acad. Sci. U. S. A. 106 (2009) 434–439.

[53] H.H. Truong, B.L. Kim, N.P. Schafer, P.G. Wolynes, Funneling and frustration in the energy landscapes of some designed and simplified proteins, J. Chem. Phys. 139 (2013) 121908.

[54] S.C. Yang, S.S. Cho, Y. Levy, M.S. Cheung, H. Levine, P.G. Wolynes, J.N. Onuchic, Domain swapping is a consequence of minimal frustration, Proc. Natl. Acad. Sci. U. S. A. 101 (2004) 13786–13791.

[55] J.U. Bowie, R. Luthy, D. Eisenberg, A method to identify protein sequences that fold into a known three-dimensional structure, Science 253 (1991) 164–170.

[56] P.A. Alexander, Y. He, Y. Chen, J. Orban, P.N. Bryan, A minimal sequence code for switching protein structure and function, Proc. Natl. Acad. Sci. U. S. A. 106 (2009) 21149–21154.

[57] J.R. Telford, P. Wittung-Stafshede, H.B. Gray, J.R. Winkler, Protein folding triggered by electron transfer, Acc. Chem. Res. 31 (1998) 755–763.

[58] K.W. Plaxco, K.T. Simons, D. Baker, Contact order, transition state placement and the refolding rates of single domain proteins, J. Mol. Biol. 277 (1998) 985–994.

[59] A.A. Nickson, B.G. Wensley, J. Clarke, Take home lessons from the studies of related proteins, Curr. Opin. Struct. Biol. 23 (2013) 66–74.

[60] C. Clementi, H. Nymeyer, J.N. Onuchic, Topological and energetic factors: what determines the structural details of the transition state ensemble and “en-route” intermediates for protein folding? An investigation for small globular proteins, J. Mol. Biol. 298 (2000) 937–953.

[61] A.R. Fersht, Structure and Mechanism in Protein Science: a Guide to Enzyme Catalysis and Protein Folding, W.H. Freeman, New York, 1999.

[62] J.N. Onuchic, N.D. Socci, Z. Luthey-Schulten, P.G. Wolynes, Protein folding funnels: the nature of the transition state ensemble, Fold. Des. 1 (1996) 441–450.


[63] M. Munson, R.L. Regan, R. O'Brien, J.M. Sturtevant, Redesigning the hydrophobic core of a four-helix-bundle protein, Prot. Sci. 3 (1994) 2015–2022.

[64] Y. Gambin, A. Schug, E.A. Lemke, J.J. Lavinder, T.J. Magliery, J.N. Onuchic, A.A. Deniz, Direct single-molecule observation of a protein living in two opposed native structures, Proc. Natl. Acad. Sci. U. S. A. 106 (2009) 10153–10158.

[65] A. Schug, P.C. Whitford, Y. Levy, J.N. Onuchic, Mutations as trapdoors to two competing native conformations of the Rop-dimer, Proc. Natl. Acad. Sci. U. S. A. 104 (2007) 17674–17679.

[66] A.P. Capaldi, C. Kleanthous, S.E. Radford, Im7 folding mechanism: misfolding on the path to the native state, Nat. Struct. Biol. 9 (2002) 209–216.

[67] A.M. Figueiredo, S.B.-M. Whitaker, S.E. Knowling, S.E. Radford, G.R. Moore, Conformational dynamics is more important than helical propensity for the folding of the all α-helical protein Im7, Prot. Sci. 22 (2013) 1722–1738.

[68] C. Zong, C.J. Wilson, T. Shen, P.G. Wolynes, P. Wittung-Stafshede, Φ-value analysis of apo-azurin folding: comparison between experiment and theory, Biochemistry 45 (2006) 6458–6466.

[69] C. Zong, C.J. Wilson, T. Shen, P. Wittung-Stafshede, S. Mayo, P.G. Wolynes, Establishing the entatic state in folding metallated Pseudomonas aeruginosa azurin, Proc. Natl. Acad. Sci. U. S. A. 104 (2007) 3159–3164.

[70] T. Shen, C.P. Hoffman, M. Oliveberg, P.G. Wolynes, Scanning malleable transition state ensembles: coupling theory and experiment for folding protein U1A, Biochemistry 44 (2005) 6433–6439.

[71] J. Portman, S. Takada, P.G. Wolynes, Microscopic theory of protein folding rates. I. Fine structure of the free energy profile and folding routes from a variational approach, J. Chem. Phys. 114 (2001) 5069–5081.

[72] S. Takada, Gō-ing for the prediction of folding mechanism, Proc. Natl. Acad. Sci. U. S. A. 96 (1999) 11698–11700.

[73] P. Das, C.J. Wilson, G. Fosatti, P. Wittung-Stafshede, K.S. Mathews, C. Clementi, Characterization of the folding landscape of monomeric lactose repressor: quantitative comparison of theory and experiment, Proc. Natl. Acad. Sci. U. S. A. 102 (2005) 14569–14574.

[74] V. Krivov, M. Karplus, Hidden complexity of free energy surfaces for peptide (protein) folding, Proc. Natl. Acad. Sci. U. S. A. 101 (2004) 14766–14770.

[75] D.L. Ensign, P.M. Klassen, V.S. Pande, Heterogeneity even at the speed limit of folding: large scale molecular dynamics study of fast folding variant of the Villin head piece, J. Mol. Biol. 374 (2007) 806–816.

[76] V.A. Voelz, G.R. Bowman, K. Beauchamp, V.S. Pande, Molecular simulation of ab initio protein folding for a millisecond folder NTL9(1-39), J. Am. Chem. Soc. 132 (2010) 1526.

[77] E.R. Henry, R.B. Best, W.A. Eaton, Comparing a simple theoretical model for protein folding with all-atom molecular dynamics simulations, Proc. Natl. Acad. Sci. U. S. A. 110 (2013) 17880–17885.

[78] R.B. Best, G. Hummer, W.A. Eaton, Native contacts determine protein folding mechanisms in atomistic simulations, Proc. Natl. Acad. Sci. U. S. A. 110 (2013) 17874–17879.

[79] N.D. Socci, J.N. Onuchic, P.G. Wolynes, Diffusive dynamics of the reaction coordinate for protein folding funnels, J. Chem. Phys. 104 (1996) 5860–5868.

[80] R.B. Best, G. Hummer, Coordinate-dependent diffusion in protein folding, Proc. Natl. Acad. Sci. U. S. A. 107 (2010) 1088–1093.

[81] M. Gruebele, The fast protein folding problem, Ann. Rev. Phys. Chem. 50 (1999) 485–516.

[82] B.G. Wensley, S. Batey, F.A.C. Bone, Z.M. Chan, N.R. Tumelty, Experimental evidence for a frustrated energy landscape in a three-helix bundle protein family, Nature 463 (2010) 685–688.

[83] H.S. Chung, W.A. Eaton, Single molecule fluorescence probes dynamics of barrier crossing, Nature 502 (2013) 685–688.

[84] Z. Luthey-Schulten, B.E. Ramirez, P.G. Wolynes, Helix-coil, liquid crystal and spin glass transitions of a collapsed heteropolymer, J. Phys. Chem. 99 (1995) 2177–2185.

[85] J. Saven, P.G. Wolynes, Local conformational signals and the statistical thermodynamics of collapsed helical proteins, J. Mol. Biol. 257 (1996) 199–216.

[86] J. Baum, C.M. Dobson, P.A. Evans, C. Hanley, Characterization of a partly folded protein by NMR methods: studies of the molten globule state of guinea-pig alpha lactalbumin, Biochemistry 28 (1989) 7–13.

[87] J. Wang, S. Plotkin, P.G. Wolynes, Configurational diffusion on a locally connected correlated energy landscape; application to finite, random heteropolymers, J. Phys. I 7 (1997) 395–421.

[88] S.S. Plotkin, J. Wang, P.G. Wolynes, Statistical mechanics of a correlated energy landscape model for protein folding funnels, J. Chem. Phys. 106 (1997) 2932–2948.

[89] K.K. Koretke, Z. Luthey-Schulten, P.G. Wolynes, Self-consistently optimized energy functions for protein structure prediction by molecular dynamics, Proc. Natl. Acad. Sci. U. S. A. 95 (1998) 2932–2937.

[90] A. Davtyan, N.P. Schafer, W. Zheng, C. Clementi, P.G. Wolynes, G.A. Papoian, AWSEM-MD: protein structure prediction using coarse-grained physical potentials and bioinformatically based local structure biasing, J. Phys. Chem. B 116 (2012) 8494–8503.

[91] B.L. Kim, N.P. Schafer, P.G. Wolynes, Predictive energy landscapes for folding α-helical transmembrane proteins, Proc. Natl. Acad. Sci. U. S. A. 111 (2014) 11031–11036.

[92] C. Chothia, A.M. Lesk, The relations between divergence of sequence and structure in proteins, EMBO J. 5 (1986) 823–826.

[93] E. Bornberg-Bauer, H.S. Chan, Modeling evolutionary landscapes: mutational stability, topology, and superfunnels in sequence space, Proc. Natl. Acad. Sci. U. S. A. 96 (1999) 10689–10694.

[94] N.V. Dokholyan, E.I. Shakhnovich, Understanding hierarchical protein evolution from first principles, J. Mol. Biol. 312 (2001) 289–307.

[95] S. Ramanathan, E. Shakhnovich, Statistical mechanics of protein with evolutionarily selected sequences, Phys. Rev. E 50 (1994) 1303–1312.

[96] J.G. Saven, P.G. Wolynes, Statistical mechanics of the combinatorial synthesis and analysis of folding macromolecules, J. Phys. Chem. B 101 (1997) 8375–8389.

[97] V.S. Pande, A.Y. Grosberg, T. Tanaka, Statistical mechanics of simple models, Biophys. J. 73 (1997) 3192–3210.

[98] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D.S. Marks, C. Sander, R. Zecchina, J.N. Onuchic, T. Hwa, M. Weigt, Direct coupling analysis of residue coevolution captures native contacts across many protein families, Proc. Natl. Acad. Sci. U. S. A. 108 (2011) E1293–E1301.

[99] D. Ferreiro, J. Hegler, E. Komives, P.G. Wolynes, Localizing frustration in native proteins and protein assemblies, Proc. Natl. Acad. Sci. U. S. A. 104 (2007) 19819–19824.

[100] D.U. Ferreiro, J.A. Hegler, E.A. Komives, P.G. Wolynes, On the role of frustration in the energy landscapes of allosteric proteins, Proc. Natl. Acad. Sci. U. S. A. 108 (2011) 3499–3505.

[101] O.M. Miyashita, J.N. Onuchic, P.G. Wolynes, Nonlinear elasticity, protein quakes and the energy landscapes of functional transitions in proteins, Proc. Natl. Acad. Sci. U. S. A. 100 (2003) 1679–1684.

[102] O. Miyashita, P.G. Wolynes, J.N. Onuchic, Simple energy landscape model for the kinetics of functional transitions in proteins, J. Phys. Chem. B 109 (2005) 1959–1969.

[103] M. Jenik, R. Gonzalo Parra, L.G. Radusky, A. Turjanski, P.G. Wolynes, D.U. Ferreiro, Protein frustratometer: a tool to localize energetic frustration in protein molecules, Nucleic Acids Res. 40 (2012) 348–351.

[104] F. Hoyle, The Black Cloud, William Heinemann Ltd, London, 1957.

[105] P.G. Wolynes, As simple as can be? Nat. Struct. Biol. 4 (1997) 871–874.

[106] D.S. Riddle, J.V. Santiago, S.T. Bray-Hall, N. Doshi, V.P. Grantcharova, Q. Yi, D. Baker, Functional rapidly folding proteins from simplified amino acid sequences, Nat. Struct. Biol. 4 (1997) 805–809.

[107] A.M. Gutin, V.I. Abkevich, E.I. Shakhnovich, Evolution-like selection of fast-folding model proteins, Proc. Natl. Acad. Sci. U. S. A. 92 (1995) 1282–1286.

[108] S. Saito, M. Sasai, T. Yomo, Evolution of folding ability of proteins through functional selection, Proc. Natl. Acad. Sci. U. S. A. 94 (1997) 11324–11328.

[109] C. Nagao, T.P. Terada, T. Yomo, M. Sasai, Correlation between evolutionary structural development and protein folding, Proc. Natl. Acad. Sci. U. S. A. 102 (2005) 18950–18955.

[110] C.G. Kurland, B. Canbäck, O.G. Berg, The origin of modern proteomes, Biochimie 89 (2007) 1454–1463.

[111] W. Gilbert, Why genes in pieces? Nature 271 (1978) 501.

[112] M. Gō, Correlation of DNA exonic regions with protein structural units in haemoglobin, Nature 291 (1981) 90–92.

[113] A.R. Panchenko, Z. Luthey-Schulten, P.G. Wolynes, Foldons, protein structural modules and exons, Proc. Natl. Acad. Sci. U. S. A. 93 (1996) 2008–2013.

[114] A.R. Panchenko, Z. Luthey-Schulten, R. Cole, P.G. Wolynes, The foldon universe: a survey of structural similarity and self-recognition of independently folding units, J. Mol. Biol. 272 (1997) 95–105.

[115] M.C. Thielges, J. Zimmermann, W. Yu, M. Oda, F.E. Romesberg, Exploring the energy landscapes of antibody antigen complexes: protein dynamics, flexibility and molecular recognition, Biochemistry 47 (2008) 7237–7247.