A critical comparison of coarse-grained structure-based approaches and atomic models of protein folding†

Jie Hu,a Tao Chen,b Moye Wang,a Hue Sun Chan*cd and Zhuqing Zhang*ac

Cite this: Phys. Chem. Chem. Phys., 2017, 19, 13629
This journal is © the Owner Societies 2017. Phys. Chem. Chem. Phys., 2017, 19, 13629–13639
Received 9th March 2017, Accepted 8th May 2017. Published on 08 May 2017.
DOI: 10.1039/c7cp01532a

a College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China. E-mail: [email protected]
b College of Chemistry and Materials Science, Northwest University, Xi’an, 710127, China
c Department of Biochemistry, University of Toronto, Toronto, Ontario, M5S 1A8, Canada. E-mail: [email protected]
d Department of Molecular Genetics, University of Toronto, Toronto, Ontario, M5S 1A8, Canada
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c7cp01532a

Structure-based coarse-grained Gō-like models have been used extensively in deciphering protein folding mechanisms because of their simplicity and tractability. Meanwhile, explicit-solvent molecular dynamics (MD) simulations with physics-based all-atom force fields have been applied successfully to simulate folding/unfolding transitions for several small, fast-folding proteins. To explore the degree to which coarse-grained Gō-like models and their extensions to incorporate nonnative interactions are capable of producing folding processes similar to those in all-atom MD simulations, here we systematically compare the computed unfolded states, transition states, and transition paths obtained using coarse-grained models and all-atom explicit-solvent MD simulations. The conformations in the unfolded state in common Gō models are more extended, and are thus more in line with experiment, than those from all-atom MD simulations. Nevertheless, the structural features of transition states obtained by the two types of models are largely similar. In contrast, the folding transition paths are significantly more sensitive to modeling details. In particular, when common Gō-like models are augmented with nonnative interactions, the predicted dimensions of the unfolded conformations become similar to those computed using all-atom MD. With this connection, the large deviations of all-atom MD from simple diffusion theory are likely caused in part by the presence of significant nonnative effects in folding processes modelled by current atomic force fields. The ramifications of our findings for the application of coarse-grained modeling to more complex biomolecular systems are discussed.

Introduction

Substantial progress has been made toward understanding protein folding by theory and computation [1–5]. Coarse-grained models, including structure-based models, have yielded significant insights into the basic principles of folding through modeling and rationalization of experimental thermodynamic and kinetic data [6]. The structure-based potentials in these models were based on the assumption that interactions favoring native contacts play a dominant role in determining the folding mechanism [7]. This assumption was motivated by the concept of “minimal frustration”, which posits that proteins, at least the small, single-domain proteins that have been investigated extensively in folding studies, were evolved to fold in relatively smooth, “funneled” energy landscapes such that they can largely avoid kinetic traps by reducing or eliminating nonnative interactions [3,8]. In this conceptual context, coarse-grained Gō-type constructs, where only residue–residue contacts present in the native structure are deemed to be energetically favorable, have become widely used structure-based models in protein folding [9]. Within this general framework, significant developments include the incorporation of cooperativity-enhancing desolvation barriers [10–12] and “hybrid” approaches that augment pure Gō-like models with sequence-specific hydrophobic and/or electrostatic nonnative interactions [13–16], with some models adopting amino-acid-dependent Miyazawa–Jernigan-type potentials instead of uniform native or nonnative interaction energy [17,18]. Recently, a hybrid methodology has been applied to explore more complex systems such as misfolding in multidomain proteins [19] and extended to afford an atomic account of evolution-like conformational switches [20].

Despite the insights they provided and their computational tractability, structure-based models are, obviously, fundamentally limited because they sidestep the fundamental question as
to the physical origins of the native-centric interactions, a question that has to be addressed ultimately by physics-based approaches with transferable, rather than structure-specific, interaction potentials [21]. Recent years have witnessed much progress in this regard. In particular, Shaw and coworkers have conducted long explicit-solvent simulations of atomistic folding trajectories that successfully folded more than ten small proteins [22–24]. This advance indicates that existing force fields can capture the essential physics of protein folding, such as some (albeit not all) two-state-like aspects of globular protein folding, at least with sufficient adequacy for small proteins. The simulated long trajectories are a valuable resource not only for comparison with experiments [25–27] but also for assessment of theoretical concepts [28,29]. For instance, based on statistical analyses of these trajectories, it has been concluded that only native contacts determine the protein folding mechanism [29], though the scope and implication of this assertion may need to be more carefully delimited [16]. In another application for the case of the villin subdomain, it has been shown that a non-explicit-chain native-centric Ising-like theoretical model is capable of predicting a distribution of folding transition paths similar to that obtained using all-atom explicit-solvent simulations [28]. These findings suggest that the more tractable structure-based approaches are useful complementary tools for physical modeling of protein folding.

In this light, it is instructive to explore more systematically the extent to which predictions from structure-based explicit-chain models are similar to those from all-atom explicit-solvent molecular dynamics (MD) simulations (referred to simply as MD or all-atom MD simulations below for brevity when the meaning is clear from the context). Here, for four proteins, we conduct such a comparison between the results obtained from structure-based coarse-grained Cα-chain models as well as from extensions of such models with augmented nonnative interactions on one hand, and trajectories from MD simulations of folding obtained by Shaw and coworkers on the other. Our study indicates that the contact patterns in the folding transition states for the coarse-grained structure-based models are similar to those for the explicit-solvent MD simulations. However, we observed more appreciable discrepancies between the purely native-centric models and MD simulations for the properties related to unfolded states and transition paths. Interestingly, augmenting the native-centric models with sequence-dependent nonnative interactions tends to decrease such discrepancies with respect to unfolded-state averages of the radius of gyration and the relative contact order, suggesting that the discrepancies originate in part from the presence of significant nonnative effects in MD simulations.

Models and methods

The present study focuses mainly on four fast-folding proteins, namely the WW domain (PDB ID 2F21), NTL9 (PDB ID 2HBA), the λ-repressor (PDB ID 1LMB) and protein G (PDB ID 1IM0). The WW domain is all-β, the λ-repressor is all-α, whereas NTL9 and protein G are α/β proteins. All four were determined experimentally to be essentially two-state folders, a property that has also been observed in all-atom MD simulations [21] and common Gō-model simulations (see below).

Several explicit-chain Cα protein chain models were used in this study. Here, the “common Gō model” refers to the construct with desolvation barriers in the native-centric energy terms as prescribed in ref. 30 and 31. We also considered extensions of this model that are augmented by sequence-dependent nonnative hydrophobic effects, with nonnative interaction strengths k = 1.0 and 1.2 as we have specified previously [13,32]. Here we refer to these augmented models, respectively, as “Gō + HP (k = 1.0)” and “Gō + HP (k = 1.2)”. In addition to these three coarse-grained models, we also considered two other Cα chain models with more complex interaction schemes: a “Gō + MJ” model that uses heterogeneous rather than homogeneous contact energies for native interactions, and a “Gō + mb” model that embodies physically plausible many-body local–nonlocal coupling effects [6,33]. Formulations for these models are provided in the ESI.† As before, native contacts were determined from PDB structures by using a 4.5 Å spatial distance cutoff between non-hydrogen atoms in different residues, and are defined only between residues separated by more than three amino acid positions along the protein chain sequence. Langevin dynamics simulations were conducted near the transition midpoint, $T_m$, of each protein. During simulations of these models, if the Cα–Cα distance between two residues that are identified to be in contact in the PDB structure, as defined above, is no larger than 1.2 times the PDB contact distance, the residue pair is recognized as forming a native contact. The definition of nonnative contact is identical to that reported in ref. 13. For each coarse-grained model protein, no fewer than 100 folding/unfolding transition paths were collected for statistical analysis.
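The two contact criteria stated above (a 4.5 Å heavy-atom cutoff with a sequence separation of more than three positions for the native contact list, and a 1.2 × native Cα–Cα distance threshold during simulation) can be illustrated with a minimal Python sketch. The function names, the array layout, and the use of a strict `<` comparison at the cutoff are assumptions made for illustration; this is not the authors' code.

```python
import numpy as np

def native_contacts(atom_xyz, atom_res, cutoff=4.5, min_sep=4):
    """List residue pairs (i, j) with j - i >= min_sep (i.e. more than three
    positions apart) that have at least one heavy-atom pair closer than
    `cutoff` angstroms. `atom_res` gives the 0-based residue index of each atom."""
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    atom_res = np.asarray(atom_res)
    n_res = int(atom_res.max()) + 1
    pairs = []
    for i in range(n_res):
        xi = atom_xyz[atom_res == i]
        for j in range(i + min_sep, n_res):
            xj = atom_xyz[atom_res == j]
            # all inter-residue heavy-atom distances for this residue pair
            d = np.linalg.norm(xi[:, None, :] - xj[None, :, :], axis=-1)
            if d.min() < cutoff:
                pairs.append((i, j))
    return pairs

def fraction_native(ca_xyz, pairs, native_ca_xyz, tol=1.2):
    """Fraction Q of native pairs whose C-alpha distance in `ca_xyz` is
    no larger than `tol` times the corresponding native C-alpha distance."""
    ca_xyz = np.asarray(ca_xyz, dtype=float)
    native = np.asarray(native_ca_xyz, dtype=float)
    i, j = np.asarray(pairs).T
    d = np.linalg.norm(ca_xyz[i] - ca_xyz[j], axis=1)
    d0 = np.linalg.norm(native[i] - native[j], axis=1)
    return float(np.mean(d <= tol * d0))
```

For an all-atom trajectory reduced to Cα coordinates, the same `pairs` list and native Cα distances would be reused for every frame, so that coarse-grained and all-atom data are scored by one definition of Q.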

We compared our coarse-grained simulations against the corresponding all-atom MD simulation trajectories from the study of Shaw and coworkers [24]. To compare the models on an equal footing, we used the extracted Cα trajectories from the all-atom MD simulations. For the present investigation, two all-atom explicit-solvent MD trajectories with a total simulated time of 1.13 ms and 13 folding transition paths (TPs) were available for the WW domain; four trajectories (total simulated time 2.95 ms) with 24 TPs for NTL9; four trajectories (total 0.65 ms) with 21 TPs for the λ-repressor; and three trajectories (total 1.16 ms) with 18 TPs for Protein G [24]. To enrich our analysis of the trend of variation of conformational dimensions (radius of gyration) and certain aspects of the diffusion picture of folding (see below), we have also considered all-atom trajectories of the proteins villin and α3D from Shaw and coworkers [24] as well as common Gō and Gō + HP model simulation data for three other two-state folders (PDB IDs 2CI2, 1SHF, and 1IMQ).

The fractional number of native contacts Q ($0 \le Q \le 1$) was used as the main progress variable in our folding analysis. The unfolded-state (low-Q) minimum $Q_{U,\min}$, peak value $Q_{\max}$, and the folded-state (large-Q) minimum $Q_{F,\min}$ were identified along simulated free energy profiles. Conformations with $Q < Q_{U,\min} + 0.05$ are considered to be in the unfolded state, those with $Q > Q_{F,\min} - 0.05$ to be in the folded state, and those satisfying $Q_{\max} - 0.025 < Q < Q_{\max} + 0.025$ to be in the transition state (TS). A transition path (TP) is defined as a trajectory that arrives at the folded state directly after leaving the unfolded state, i.e., a trajectory that never revisits the unfolded state after leaving it the first time before reaching the folded state. As in our recent study [17], φ-values [34] were computed using TS native contact probabilities.
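The state and transition-path definitions above translate directly into a scan over a Q time series. The following Python sketch is one possible reading of those definitions; the function names and the frame-index convention (a TP runs from the last unfolded frame to the first folded frame) are assumptions for illustration.

```python
import numpy as np

def transition_state_mask(q, q_max, half_width=0.025):
    """Frames with Q_max - half_width < Q < Q_max + half_width."""
    q = np.asarray(q, dtype=float)
    return np.abs(q - q_max) < half_width

def folding_transition_paths(q, q_u_min, q_f_min, margin=0.05):
    """Return (start, end) frame indices of folding transition paths.

    unfolded: Q < q_u_min + margin;  folded: Q > q_f_min - margin.
    A TP starts at the last unfolded frame and ends at the first folded
    frame reached without revisiting the unfolded state in between."""
    q = np.asarray(q, dtype=float)
    unfolded = q < q_u_min + margin
    folded = q > q_f_min - margin
    paths, last_unfolded = [], None
    for t in range(len(q)):
        if unfolded[t]:
            last_unfolded = t          # any revisit resets the candidate start
        elif folded[t] and last_unfolded is not None:
            paths.append((last_unfolded, t))
            last_unfolded = None       # a new TP needs a fresh unfolded visit
    return paths
```

Frames that precede a TP start and lie outside the folded state would, under the same assumptions, make up the unfolded-state portion of the trajectory from which the TP emerges.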

Explicit-chain simulation results were compared against predictions from one-dimensional diffusion theory of folding to ascertain the adequacy of using such a simple picture to characterize key aspects of folding. A Kawasaki Monte Carlo algorithm was used to simulate diffusion on Q-based free energy profiles deduced from all-atom MD simulations and coarse-grained model chain simulations. To focus on the issues at hand, we restrict our analysis to simple processes with a constant, i.e., Q-independent, diffusion coefficient. Details of this approach are available from ref. 31. In this regard, the equivalence between Kawasaki and Smoluchowski dynamics (the former is a discretized version of the latter), which has been demonstrated in the Supporting Information of ref. 31, is of particular relevance.
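As a rough illustration of diffusion on a discretized free energy profile, the Python sketch below uses one common form of Kawasaki hopping rates, proportional to $\exp[-\Delta F/(2k_BT)]$ between neighboring Q bins, with a single global rate scale standing in for a Q-independent diffusion coefficient. The exact scheme used in ref. 31 is not reproduced in this text, so the rate form, the normalization, and all names here are assumptions.

```python
import numpy as np

def kawasaki_matrix(free_energy, kT=1.0):
    """Nearest-neighbour hopping probabilities on a 1D profile F(Q_i).

    Hop i -> i+/-1 has weight exp(-(F_new - F_old) / (2 kT)); one global
    scale (a constant diffusion coefficient) makes every row sum to <= 1,
    and the remainder is the probability of staying put."""
    f = np.asarray(free_energy, dtype=float) / kT
    n = len(f)
    df = np.diff(f)                        # F[i+1] - F[i]
    p = np.zeros((n, n))
    idx = np.arange(n - 1)
    p[idx, idx + 1] = np.exp(-0.5 * df)    # uphill or downhill step to i+1
    p[idx + 1, idx] = np.exp(0.5 * df)     # reverse step back to i
    p /= p.sum(axis=1).max()               # global time-step scaling
    stay = np.clip(1.0 - p.sum(axis=1), 0.0, None)
    p[np.arange(n), np.arange(n)] = stay
    return p

def simulate(p, start, n_steps, rng):
    """Monte Carlo trajectory of bin indices generated from matrix p."""
    n = len(p)
    traj = np.empty(n_steps + 1, dtype=int)
    traj[0] = state = start
    for t in range(1, n_steps + 1):
        u = rng.random()
        if state + 1 < n and u < p[state, state + 1]:
            state += 1
        elif state > 0 and u >= 1.0 - p[state, state - 1]:
            state -= 1
        traj[t] = state
    return traj
```

These rates satisfy detailed balance with respect to $\exp(-F/k_BT)$, so long trajectories sample the input profile; the resulting bin-index series can then be analyzed for transition paths in the same way as the explicit-chain Q series.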

Results

Free energy profiles of the folding/unfolding process from different explicit-chain models are similar for some but not all proteins

As in recent studies [29,35], simulated free energy profiles were obtained as a function of fractional number of native contacts Q (Fig. 1). For each of the studied proteins, the common Gō-model profiles (red solid curves; note that for the WW domain, the red solid curve overlaps almost completely with the blue solid curve) and the all-atom MD profiles (black curves) exhibit similar Q-positions for the unfolded-state minimum and also, separately, for the transition-state (TS) peak, with the common-Gō TS positioned at only a slightly larger Q than the all-atom MD TS for NTL9 and Protein G. However, the TS barrier is higher for the common Gō model than for all-atom MD for NTL9, the λ-repressor, and Protein G, especially for the latter two. While this difference might be attributed to the high folding cooperativity of the present common Gō model with desolvation barriers [30], the situation is reversed for the WW domain, wherein the TS barrier is significantly lower for the common Gō model. Replacing the homogeneous native-centric contact energies by a heterogeneous statistical potential (Gō + MJ model; red dashed curves) lowers the TS barrier for three of the studied proteins. Again, the WW domain is the exception: the TS barrier heights in the common Gō and Gō + MJ models are essentially identical.
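A free energy profile of this kind is conventionally obtained from the normalized population P(Q) as $F(Q) = -k_BT \ln P(Q)$. The Python sketch below shows that step, together with a simple way of locating the two minima and the barrier for a two-state profile. The bin count and the barrier-finding heuristic are assumptions made for illustration, not the procedure used to produce Fig. 1.

```python
import numpy as np

def free_energy_profile(q_samples, n_bins=50):
    """Return bin centres and F(Q) = -ln P(Q) in units of kT, shifted so
    that the lowest populated bin is zero. Empty bins come out as +inf."""
    hist, edges = np.histogram(q_samples, bins=n_bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        f = -np.log(p)
    return centers, f - f[np.isfinite(f)].min()

def locate_two_state(centers, f):
    """Heuristic (Q_U_min, Q_max, Q_F_min) for a two-state profile: pick the
    bin whose free energy exceeds the higher of the lowest minima on either
    side by the largest amount, and report those two minima with it."""
    centers, f = np.asarray(centers), np.asarray(f)
    ok = np.isfinite(f)
    c, g = centers[ok], f[ok]
    best, out = -np.inf, None
    for b in range(1, len(g) - 1):
        u = int(np.argmin(g[:b]))
        v = b + 1 + int(np.argmin(g[b + 1:]))
        score = g[b] - max(g[u], g[v])
        if score > best:
            best, out = score, (float(c[u]), float(c[b]), float(c[v]))
    return out
```

The three Q values returned by `locate_two_state` are the quantities on which the unfolded-state, folded-state, and transition-state windows described in Models and methods are built.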

Augmenting the purely native-centric common Gō model with favorable nonnative interactions [13,15,36] (Fig. 1, solid blue and green curves) has almost no impact and minimal effect on the free energy profiles, respectively, of the WW domain and NTL9; but the augmentation leads to significant lowering of the TS barrier for the λ-repressor and Protein G. For the λ-repressor, somewhat curiously, a stronger nonnative interaction strength (k = 1.2) lowers the TS barrier but a slightly weaker nonnative interaction strength (k = 1.0) does not. Except for the WW domain, the unfolded-state minimum and TS of the other three proteins studied are both shifted to higher Q values with the introduction of favorable nonnative interactions. Implications of this feature will be discussed below. Incorporation of many-body local–nonlocal coupling is expected to lead to higher folding cooperativity and hence higher folding barriers [33]. However, this trend applies only to the WW domain and the λ-repressor, and most prominently to the all-α λ-repressor; but the trend does not apply to the other two proteins (Gō + mb model; blue dashed curves).

The more extended unfolded-state conformations in the Gō model agree better with experiment

Fig. 1 Folding cooperativity. The free energy profiles of four studied proteins (PDB structures shown as ribbon diagrams) near their respective transition midpoints were computed using common-Gō (red solid curves) and Gō models augmented by nonnative interactions (green and blue curves for k = 1.0 and 1.2 respectively) as well as from all-atom MD simulation trajectories (ref. 24) (black curves). P(Q) is normalized conformational population as a function of Q. Corresponding free energy profiles from the Gō + MJ (red dashed curves) and Gō + mb (blue dashed curves) models are also included for comparison.

We next compare the dimensions of the unfolded states predicted using different studied models. Fig. 2 shows the average radius of gyration, $R_g$, of the unfolded conformations as a function of the number of residues, N, in a protein. Lacking favorable nonnative contacts, the unfolded conformations of the common Gō model on average have larger $R_g$ values than all-atom MD. Seen against experimental data, the common Gō model is clearly more realistic than all-atom MD in this respect. Fitting the data in Fig. 2 to the scaling relation $R_g \sim N^{\nu}$ yielded $\nu = 0.513$ for the common Gō model, which is higher than the ideal random-coil value $\nu = 0.5$ [37] though lower than the experimental $\nu = 0.598$ [38], which agrees with theoretical predictions for polymers with full excluded volume in good solvents. In contrast, the explicit-solvent MD-simulated $R_g$ shows little variation with N. This shortcoming of most of the current force fields is now quite well recognized [39]. Our simulations using coarse-grained models with favorable nonnative interactions indicate that $R_g$ decreases with increasing nonnative interaction strength (cf. the k = 1.0 green inverted triangles and the k = 1.2 blue open squares in Fig. 2). The trend suggests that the all-atom MD-simulated $R_g$ (filled black squares in Fig. 2) should be similar to that of a Gō + HP model with k > 1.2. This comparison indicates that the smaller-than-experiment unfolded-state $R_g$ values observed in all-atom MD simulations likely originate from nonnative interactions that are overly favored by the force field used in the all-atom MD simulations.
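The scaling analysis described above amounts to a straight-line fit in log–log space, since $R_g = R_0 N^{\nu}$ implies $\ln R_g = \ln R_0 + \nu \ln N$. A minimal Python sketch follows; treating all Cα beads as equal-mass points and using an unweighted least-squares fit are assumptions made for illustration.

```python
import numpy as np

def radius_of_gyration(xyz):
    """Rg of one conformation, treating every bead as an equal-mass point."""
    xyz = np.asarray(xyz, dtype=float)
    centered = xyz - xyz.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))

def fit_scaling(n_residues, mean_rg):
    """Least-squares fit of ln Rg = ln R0 + nu * ln N; returns (R0, nu)."""
    nu, log_r0 = np.polyfit(np.log(n_residues), np.log(mean_rg), 1)
    return float(np.exp(log_r0)), float(nu)
```

In such an analysis, `mean_rg` would hold one unfolded-state average per protein, computed over the frames assigned to the unfolded state.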

The contact patterns of the unfolded states in the two models are shown in Fig. 3A. As expected from polymer conformational statistics, local contacts are more favored than nonlocal ones. Consistent with the observation in Fig. 2, contact probability maps for all-atom MD (bottom-right of each panel) show higher probabilities of nonlocal contacts (red and pink squares far from the main diagonal in Fig. 3A) than the corresponding maps for the common Gō model (top-left of each panel in Fig. 3A). This difference is also indicated by the off-diagonal red squares in the top-left parts of the four panels in Fig. 3B. Again consistent with Fig. 2, when favorable nonnative interactions were added to the common Gō model, the differences in unfolded-state contact probabilities with MD predictions are significantly reduced for three of the four proteins, with the WW domain as the exception (cf. the top-left and bottom-right triangles in Fig. 3B).
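Contact probability maps of the kind compared here can be assembled from per-frame records of which native pairs are formed. The Python sketch below builds a symmetric probability matrix for an ensemble and combines two such matrices into one split panel. The input layout (a boolean array of frames by native pairs) and the assignment of models to the upper and lower triangles are assumptions made for illustration.

```python
import numpy as np

def contact_map(formed, pairs, n_res):
    """Symmetric n_res x n_res matrix of native contact probabilities.

    `formed[t, k]` is True when native pair `pairs[k]` is formed in frame t
    of the ensemble (e.g. the unfolded-state or transition-state frames)."""
    prob = np.asarray(formed, dtype=float).mean(axis=0)
    m = np.zeros((n_res, n_res))
    i, j = np.asarray(pairs).T
    m[i, j] = m[j, i] = prob
    return m

def split_panel(map_a, map_b):
    """Upper triangle from map_a, lower triangle from map_b, for display
    of two models (or two difference maps) in a single square panel."""
    return np.triu(map_a, 1) + np.tril(map_b, -1)
```

A difference panel analogous to Fig. 3B would then be `split_panel(md - go, md - go_hp)`, where the three hypothetical arrays `md`, `go`, and `go_hp` are contact maps computed with the same `pairs` list.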

Transition-state contact patterns are generally similar across explicit-chain models but predicted φ-values are generally model-sensitive

Accurate characterization of the transition state (TS) is important for understanding the key kinetic events in the folding reaction. Following the definition provided in Models and methods of the TS as a collection (an ensemble) of conformations populating a narrow region of Q around the peak of a protein’s simulated free energy profile, Fig. 3C provides and compares the native contact probability maps of the TS for the common Gō model (top-left of each panel) and all-atom MD simulations (bottom-right of each panel). For all four proteins studied, the TS ensembles in the two models are largely similar. Nonetheless, certain local contacts are more populated in the common Gō model than in all-atom MD. These regions include the NTL9 helix, the second helix in the λ-repressor, the β-turn between strands 1 and 2 in NTL9 and that between strands 3 and 4 in Protein G. At the same time, nonlocal contacts tend to be slightly more populated in all-atom MD than in the common Gō model, as can be seen around the top-left and bottom-right corners of the panel representing contacts between the N- and C-terminal regions of the WW domain, NTL9 and Protein G.

Fig. 2 Dimensions of the simulated unfolded state. The dependence of mean radius of gyration ($R_g$) of unfolded conformations on the number of residues of proteins we studied is shown for the common Gō model (red open circles: proteins in Fig. 1, red open triangles: PDB IDs 2CI2, 1SHF, and 1IMQ), augmented Gō-like models (green inverse triangles and blue squares for k = 1.0 and 1.2 respectively), all-atom explicit-solvent MD simulations (black squares, including villin and α3D in addition to the four proteins in Fig. 1), the Gō + MJ (red filled circles) and the Gō + mb (blue filled triangles) models. The fitted red dashed curve $R_g = R_0 N^{\nu}$ is for the common Gō-model data points with best-fitted exponent $\nu = 0.513$. The black solid curve and the shaded band represent $R_g = R_0 N^{\nu}$ with $R_0 = 1.330$ and $\nu = 0.598 \pm 0.028$ obtained from fitting experimental data on an extensive set of proteins (ref. 38).

Fig. 3 Contact patterns in the unfolded and transition states. The predicted native contact probability maps of the unfolded (A) and transition (C) states computed by the common Gō model (top-left of each panel) and by all-atom MD simulations (bottom-right of each panel) are shown for the four studied proteins. The differences between unfolded-state contact probabilities predicted by different models are documented in (B), wherein the top-left of each panel denotes the results by all-atom MD minus those by the common Gō model, whereas the bottom-right denotes the results by all-atom MD minus those by the Gō + HP (k = 1.2) model. Nonnative contacts are not documented in this figure.

φ-Value analysis has been an influential methodology to infer structural information about the folding and unfolding transition state from relatively straightforward folding kinetics experiments [40–43]. The comparison of φ-values in the models indicates that the common Gō model predictions are similar to those estimated from all-atom MD simulation trajectories for the WW domain, the λ-repressor, and a majority of residues in Protein G. But the two sets of model φ-values differ substantially for most of the residues in NTL9 and a long stretch of C-terminal residues in Protein G. The latter discrepancy is significant in view of other studies suggesting that terminal contacts might have a strong influence on the transition-state folding barrier height [17,44,45]. Introduction of favorable nonnative interactions into the Gō model as prescribed by the two Gō + HP schemes led only to relatively small changes in predicted φ-values for the WW domain, NTL9, and Protein G, whereas the changes are larger for the λ-repressor. Among the four proteins studied in Fig. 4, we were only able to retrieve experimental φ-values for the WW domain from the literature [46]. The comparison of the magenta squares with the corresponding model-predicted φ-values indicates clearly that both the coarse-grained models and current atomic force fields are far from adequate in providing a robust detailed account of experimental φ-values.

Detailed conformational properties along transition and pre-transition paths are model dependent and strongly affected by unfolded-state conformational compactness

The transition path (TP), though constituting only a very small fraction of time within equilibrium folding trajectories, is the most critical part for deciphering the folding mechanism. Recent experiments on two small proteins indicate that the durations of the TP for the two proteins are on the order of microseconds, and they vary only very mildly with protein size and folding rate [47]. Here we compare, within the same model and across models, the averages of several relatively detailed conformational properties along TPs against the corresponding averages along pre-transition paths (pre-TPs). We define a pre-TP as the duration of a protein’s trajectory that lingers in the unfolded state prior to reaching the TP. Essentially a pre-TP is the “folding path” (FP) considered in our previous work [31] minus the TP. Because the TP generally represents only a very small fraction of the FP, we expect pre-TP-averaged conformational properties to be essentially identical to the corresponding FP averages.

In addition to Rg, we monitored three other conformationalproperties (Fig. 5): the relative contact order (RCO),48 local–nonlocal coupling (LNC) and nonnative hydrophobic contactnumber. LNC, which we have formulated recently,33 quantifiesthe degree to which two local chain segments, centered respec-tively around each of two residues in a contact pair, adopt localconformations consistent with those in the native structure.The number of nonnative hydrophobic interactions is definedas in our previous work13 that introduced for the first time thetype of analysis in Fig. 5. For all-atom MD, the data in thisfigure show that the average RCO of TPs is mostly higher thanthat of pre-TPs for three of the four proteins, the WW domainbeing the exception. This general trend is consistent with ourprevious observation from the common Go model31 in whichwe compared TP- and FP-averaged RCO for eight other proteins.Results shown in Fig. 5 here for the coarse-grained Go andaugmented models largely follow the same trend although thedifferences in average RCO between TPs and pre-TPs are less

Fig. 4 Computed f-values are model-sensitive. The f-values predictedby common G�o model (red squares), augmented G�o models (k = 1.0 and1.2 are denoted by green circles and blue asterisks respectively) and fromall-atom MD trajectories (black triangles) are plotted. Lines connectingplotted symbols are merely guides for the eye. Available experimentalf-values (ref. 46) for the WW domain (magenta squares) are included forcomparison.

Fig. 5 Comparing conformational properties along pre-transition pathsversus those along transition paths. The variations of average RCO (relativecontact order), average Rg (radius of gyration), average LNC (value of thelocal–nonlocal coupling parameter) and average nonnative N (number ofnonnative contacts) with fractional number of native contacts Q alongtransition paths (TP) and pre-transition paths (pre-TP) are shown for themodels. Average values along TPs are shown by the red (common G�omodel), magenta (G�o + HP (k = 1.2) model), and black (all-atom MD)curves, whereas corresponding values along pre-TPs are shown by theblue (common G�o model), green (G�o + HP (k = 1.2) model), and grey(all-atom MD) curves. In each panel, the vertical dot lines mark the Q positionsof the transition-state peaks along the free energy profiles for all-atom MD (atsmaller Q) and for the G�o + HP (k = 1.2) model (at larger Q).


significant than those documented for all-atom MD. Another parameter that exhibits a clear difference between TPs and pre-TPs is the nonnative contact number (a measure that does not apply to the purely native-centric common Gō model). By definition, TPs access the native state directly and are thus atypical within equilibrium trajectories; fewer nonnative contacts are therefore expected along TPs than along pre-TPs. Fig. 5 shows that this is almost always the case, the only exception here being the Gō + HP (k = 1.2) model for Protein G.
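As an illustration of the RCO measure used in this comparison, the following minimal sketch evaluates the standard relative contact order, i.e., the mean sequence separation of contacting residue pairs divided by chain length. The function name, the toy contact list, and the chain length are hypothetical; the contact criterion actually used to build the contact list is not specified here and is left to the caller.

```python
from typing import Iterable, Tuple


def relative_contact_order(contacts: Iterable[Tuple[int, int]], chain_length: int) -> float:
    """Standard RCO: (1 / (L * N)) * sum of |i - j| over the N contacts of an L-residue chain."""
    pairs = list(contacts)
    if not pairs or chain_length <= 0:
        return 0.0  # no contacts (or no chain): RCO taken as zero by convention here
    total_separation = sum(abs(i - j) for i, j in pairs)
    return total_separation / (chain_length * len(pairs))


# Hypothetical toy input: three contacts in a 20-residue chain.
example_contacts = [(1, 5), (2, 10), (3, 20)]
rco = relative_contact_order(example_contacts, chain_length=20)
```

Averaging this quantity over the conformations at a given Q, separately for TP and pre-TP segments, would give curves of the kind compared in Fig. 5.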

It is clear from Fig. 5 that, for both TPs and pre-TPs, RCO is always significantly higher whereas Rg is always lower for all-atom MD than for the common Gō and augmented models. This difference arises from the higher conformational compactness of the unfolded conformations in all-atom MD than in the coarse-grained models considered here, as noted above in the model comparison for chain-length dependence of average unfolded-state Rg (Fig. 2). This effect is attributable to the nonnative interactions in all-atom MD because the average RCO and Rg values shown in Fig. 5 for the Gō + HP (k = 1.2) model are closer to those of all-atom MD than are those of the strictly native-centric common Gō model. Interestingly, LNC shown in Fig. 5 is generally higher for the common Gō model than for all-atom MD. The LNC of the augmented Gō + HP (k = 1.2) model is also higher than that of all-atom MD, but is somewhat closer to it. This observation implies that the degree of local–nonlocal coupling, which is a proposed mechanism for cooperative folding as well as folding upon binding,33 is weaker in the explicit-solvent force field used to obtain the present MD data than in the coarse-grained models we considered, and the difference likely originates from the significant nonnative effects in all-atom MD. Nonetheless, for at least three of the proteins considered (all except the WW domain), all-atom MD is also seen to be more capable of avoiding nonnative contacts than the Gō + HP (k = 1.2) model once the protein leaves the unfolded-state minimum and approaches the native state (bottom panels of Fig. 5).
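The TP-based averages discussed above presuppose that folding transition paths have been extracted from a Q(t) trajectory. A minimal sketch of one common way to do this is given below: a folding TP is taken as the segment from the last frame in the unfolded basin to the first subsequent frame in the folded basin. The function name and the boundary values q_unfolded and q_folded are hypothetical choices for illustration, not the criteria used in this work.

```python
from typing import List, Sequence, Tuple


def folding_transition_paths(q_series: Sequence[float],
                             q_unfolded: float,
                             q_folded: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of unfolded-to-folded transition paths."""
    paths = []
    last_unfolded = None  # most recent frame with Q <= q_unfolded
    for frame, q in enumerate(q_series):
        if q <= q_unfolded:
            last_unfolded = frame
        elif q >= q_folded and last_unfolded is not None:
            paths.append((last_unfolded, frame))
            last_unfolded = None  # the chain must revisit the unfolded basin first
    return paths


# Hypothetical toy trajectory of the fractional number of native contacts.
toy_q = [0.1, 0.3, 0.1, 0.4, 0.6, 0.9, 0.8, 0.95, 0.5, 0.05, 0.5, 0.92]
toy_paths = folding_transition_paths(toy_q, q_unfolded=0.2, q_folded=0.9)
```

Frames that leave the unfolded basin but return to it before reaching the folded basin are excluded from every TP by construction, because only the last unfolded-basin frame before arrival is kept as the start.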

The distribution of folding pathways, i.e., the kinetic partitioning into different folding events, is model sensitive

The last observation regarding different nonnative contact numbers in the augmented Gō model versus all-atom MD (Fig. 5) suggests that the trajectories in the present coarse-grained (common Gō and Gō + HP) models and all-atom MD simulations might sample significantly different conformations during folding despite the similarity in their native transition-state contact pattern (Fig. 3). We now proceed to characterize and compare the "macroscopic" folding paths in the common Gō, Gō + HP (k = 1.2), and all-atom MD models (Table 1). Protein folding is a stochastic process that allows for multiple folding pathways from a conformation in the unfolded state to reach the folded structure. Here we focused on the distribution of the chronological order of the formation of selected secondary elements and the critical native contacts between them along folding transition paths. The regions of the native contact maps in our focus are marked in the leftmost column of Table 1. Note that some helices were not selected because they were found to be already formed in the unfolded state before the commencement of most of the simulated transition paths.

Table 1 The statistical distribution of computed folding pathways of the four studied proteins in different models

Protein        Folding pathway      Explicit-water MD simulations   Common Gō model   Gō + HP (k = 1.2) model
WW domain      β1 → β2              54%                             58%               74%
WW domain      β2 → β1              46%                             42%               26%
NTL9           β1 → α1 → β2         42%                             56%               29%
NTL9           α1 → β1 → β2         0%                              33%               36%
NTL9           β1 last formed       25%                             7%                30%
NTL9           α1 last formed       33%                             4%                5%
λ-repressor    r1 last formed       47%                             65%               4%
λ-repressor    r3 last formed       47%                             30%               87%
λ-repressor    r2 first formed      86%                             70%               33%
Protein G      β1 → β2 → β3         50%                             5%                4%
Protein G      β1 → β3 → β2         11%                             61%               61%
Protein G      β3 → β1 → β2         17%                             24%               29%


For the all-β WW domain, the considered regions include β1 (strand1 and strand2) and β2 (strand2 and strand3). In this case, the folding path distribution in all-atom MD is similar to that in the common Gō model, both displaying a major folding path of β1 → β2 (54% and 58%, respectively), i.e., β1 forming before β2. This folding path is even more dominant in Gō + HP (k = 1.2), where it accounts for 74%. For NTL9, we considered β-sheet regions including β1 (strand1 and strand2) and β2 (strand1 and strand3), as well as the α-helix region labelled as α1. In all-atom MD, the kinetic order β1 → α1 → β2 accounts for 42% of folding paths; additionally, β1 is the last region to be formed in 25% of the folding paths, and α1 is formed last in another 33% of the folding paths. However, the kinetic partitioning is significantly different in the common Gō model. Now β2 is almost always the last region to be formed; such pathways encompass 89% of TPs. Within this 89%, the specific kinetic orderings β1 → α1 → β2 and α1 → β1 → β2 account for 56% and 33% of all TPs, respectively. Augmenting this model to the Gō + HP (k = 1.2) model reduces the kinetic fraction of β1 → α1 → β2 to 29% while increasing the fraction of TPs in which β1 formed last to 30%. As almost all of the helices in the λ-repressor formed before commencement of TPs in the common Gō model and in all-atom MD (the only exception being α5, which is formed within 75% of TPs), we focused on contacts between helices. In Table 1, the label r1 represents the contacts between helix 1 and helix 5, r2 represents the contacts between helix 2 and helix 4, and r3 represents the contacts between helix 4 and helix 5. In all-atom MD, the percentages of TPs in which r1 formed last, r3 formed last, and r2 formed first are, respectively, 47%, 47% and 86%. The corresponding percentages are appreciably different in the common Gō model (65%, 30% and 70%) and in the Gō + HP (k = 1.2) model (4%, 87% and 33%). Finally, we considered the formation order of three β-sheets in Protein G: β1 (strand1 and strand2), β2 (strand1 and strand4), and β3 (strand2 and strand3). In all-atom MD, half (50%) of the TPs followed the kinetic order β1 → β2 → β3. Apart from this, 11% followed β1 → β3 → β2, and 17% followed β3 → β1 → β2. In common Gō model simulations, the corresponding percentages of TPs are significantly different, viz., 5%, 61% and 24%, respectively. These percentages, however, were only minimally affected by the incorporation of favorable nonnative interactions into the Gō + HP (k = 1.2) model, for which the corresponding percentages are 4%, 61% and 29%.

As summarized in Table 1, these comparisons show that a similarity in the kinetic partitioning of folding pathways across the three models is observed only for the WW domain but not for the other three proteins studied here. For the latter proteins, the differences among models are very substantial. The underlying physical mechanisms of these differences remain to be elucidated. As far as favorable nonnative interactions are concerned, their presence can lead to certain shifts in kinetic partitioning (NTL9 and λ-repressor) but the effect can be minimal (Protein G), depending on the protein. In any event, these shifts in kinetic partitioning do not appear to be in the direction of becoming more similar to all-atom MD.
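The kind of bookkeeping behind a pathway distribution such as that in Table 1 can be sketched as follows. For each transition path, regions are ordered by the first frame at which their native-contact fraction reaches a threshold, and the orderings are tallied as percentages. The function names, the toy data, and the threshold of 0.5 are assumptions made for illustration only; the formation criterion actually used for Table 1 is not restated in this section.

```python
from collections import Counter
from typing import Dict, List, Sequence


def formation_order(region_q: Dict[str, Sequence[float]], threshold: float = 0.5) -> str:
    """Order regions by the first frame at which their contact fraction reaches threshold."""
    first_frame = {}
    for name, series in region_q.items():
        formed = [frame for frame, q in enumerate(series) if q >= threshold]
        # A region that never reaches the threshold is placed last.
        first_frame[name] = formed[0] if formed else len(series)
    ordered = sorted(first_frame, key=lambda name: first_frame[name])
    return " -> ".join(ordered)


def pathway_distribution(paths: List[Dict[str, Sequence[float]]],
                         threshold: float = 0.5) -> Dict[str, float]:
    """Percentage of transition paths that follow each formation order."""
    counts = Counter(formation_order(path, threshold) for path in paths)
    total = sum(counts.values())
    return {order: 100.0 * n / total for order, n in counts.items()}


# Hypothetical toy transition paths with two regions each.
toy_tps = [
    {"b1": [0.1, 0.6, 0.9], "b2": [0.0, 0.2, 0.7]},
    {"b1": [0.1, 0.2, 0.8], "b2": [0.3, 0.7, 0.9]},
    {"b1": [0.6, 0.7, 0.9], "b2": [0.1, 0.2, 0.9]},
]
toy_distribution = pathway_distribution(toy_tps)
```

With fewer than thirty TPs per protein, a single path changes a percentage by several points, so differences of that size between models should be read with the corresponding statistical uncertainty in mind.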

Comparisons with a simple one-dimensional diffusion picture of folding further highlight the impact of nonnative effects in all-atom MD trajectories

Motivated by recent advances in single-molecule folding experiments, there has been persistent interest in comparing not only experimental but also simulated data with diffusion theory to assess the strengths and limitations of a simple one-dimensional diffusion picture of folding.31,47,49–52 Assuming that the Kramers diffusion formula is applicable to the protein folding reaction, the mean first-passage folding time may be expressed as

$$\tau_{\mathrm{MFP}} = 2\pi \left(\beta D^{*} \sqrt{k_w k_b}\right)^{-1} e^{\beta \Delta G^{\ddagger}} \qquad (1)$$

where β = 1/(kBT), kB is the Boltzmann constant, T is the absolute temperature, D* is the diffusion coefficient at the top of the folding free energy barrier, kw and kb are the curvatures of the free energy profile in the well and at the barrier top, respectively, and ΔG‡ is the free energy barrier height of folding. Within this framework, Szabo showed that the mean transition path time47 τTP is given approximately by

$$\tau_{\mathrm{TP}} \approx \left(\beta D^{*} k_b\right)^{-1} \ln\left(2 e^{\gamma} \beta \Delta G^{\ddagger}\right) \qquad (2)$$

where γ is Euler's constant. Hence

$$\frac{\tau_{\mathrm{TP}}}{\tau_{\mathrm{MFP}}} \approx \frac{1}{2\pi} \sqrt{k_w / k_b}\, \frac{\ln\left(2 e^{\gamma} \beta \Delta G^{\ddagger}\right)}{e^{\beta \Delta G^{\ddagger}}} \qquad (3)$$

If we now define

$$F(\Delta G^{\ddagger}) = \ln\left[\frac{\ln\left(2 e^{\gamma} \beta \Delta G^{\ddagger}\right)}{e^{\beta \Delta G^{\ddagger}}}\right] \qquad (4)$$

it follows that

$$\ln\left(\tau_{\mathrm{TP}} / \tau_{\mathrm{MFP}}\right) \approx F(\Delta G^{\ddagger}) + \mathrm{constant} \qquad (5)$$

where the constant here is given by ln[(kw/kb)/4π²]/2.

To evaluate the applicability of the above formulation and, by extension, a conceptual picture of folding as a one-dimensional process along the progress variable Q, we examined and compared the dependence of ln(τTP/τMFP) on F(ΔG‡) in both all-atom MD and the common Gō model (Fig. 6, data points plotted as black squares). Apparently, the diffusion picture applies quite well to the common Gō model (Fig. 6B) but not to all-atom MD (Fig. 6A), as underscored by Pearson correlation coefficients of r = 0.992 (the slope of linear regression = 0.86) and r = −0.141 (the slope of linear regression ≈ 0), respectively. In view of the above analysis of the conformational properties (Fig. 5), one possible origin of the differences observed in Fig. 6 between the common Gō model and all-atom MD may be the presence of significant nonnative effects in the latter. To explore this possibility, we examined the dependence of ln(τTP/τMFP) on F(ΔG‡) in the Gō + HP (k = 1.2) model (blue data points in Fig. 6B) and found that the correlation does deteriorate from that of the Gō model, with r = 0.83 and a regression slope of 0.64 for Gō + HP (k = 1.2). This observation suggests that nonnative effects are likely part of the reason for the poor ln(τTP/τMFP)–F(ΔG‡) correlation for all-atom MD in Fig. 6A.
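The relation among eqn (1), (2), (4) and (5) can be checked numerically with a short sketch such as the one below. All function names and the numerical inputs are hypothetical and chosen only for illustration; the code simply evaluates the formulas as written above and confirms that the logarithm of the ratio of eqn (2) to eqn (1) equals F(ΔG‡) plus the stated constant.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma


def tau_mfp(beta, d_star, k_w, k_b, dg):
    """Eqn (1): Kramers mean first-passage time."""
    return 2.0 * math.pi * math.exp(beta * dg) / (beta * d_star * math.sqrt(k_w * k_b))


def tau_tp(beta, d_star, k_b, dg):
    """Eqn (2): approximate mean transition path time."""
    return math.log(2.0 * math.exp(EULER_GAMMA) * beta * dg) / (beta * d_star * k_b)


def barrier_function(beta, dg):
    """Eqn (4): F = ln[ ln(2 e^gamma beta dG) / exp(beta dG) ]."""
    inner = math.log(2.0 * math.exp(EULER_GAMMA) * beta * dg)
    if inner <= 0.0:
        # The outer logarithm is undefined unless 2 e^gamma beta dG > 1.
        raise ValueError("barrier too low for eqn (4) to be defined")
    return math.log(inner) - beta * dg


def ratio_constant(k_w, k_b):
    """Constant in eqn (5): ln[(k_w / k_b) / (4 pi^2)] / 2."""
    return 0.5 * math.log((k_w / k_b) / (4.0 * math.pi ** 2))


# Hypothetical inputs in reduced units (beta = 1, so dg is in units of kB*T).
beta, d_star, k_w, k_b, dg = 1.0, 1.0, 2.0, 3.0, 5.0
lhs = math.log(tau_tp(beta, d_star, k_b, dg) / tau_mfp(beta, d_star, k_w, k_b, dg))
rhs = barrier_function(beta, dg) + ratio_constant(k_w, k_b)
```

Because D* cancels in the ratio, a plot of ln(τTP/τMFP) against F(ΔG‡) is expected to have unit slope whenever kw/kb is similar across proteins. Note also that F is undefined when 2e^γβΔG‡ ≤ 1, i.e., for βΔG‡ below roughly 0.28, which is one concrete sense in which eqn (2) cannot be used for very low barriers.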


The kinetic similarity between explicit-chain simulation and one-dimensional diffusion, or lack thereof, was further assessed by comparing the ln(τTP/τMFP) factor obtained from direct explicit-chain simulations with that obtained from one-dimensional diffusion processes on the free energy profiles computed from explicit-chain simulations. The methodology of our comparison relies on using Kawasaki dynamics to model diffusion, as detailed by Zhang and Chan in ref. 31. The results are provided by the data points plotted in Fig. 6 as red circles. These results also offer an evaluation of the self-consistency of the diffusion formulation by ascertaining the dependence of ln(τTP/τMFP) values computed from Kawasaki dynamics on F(ΔG‡). Again, the diffusion picture is apparently adequate for the common Gō model but not for all-atom MD. For the common Gō model, the ln(τTP/τMFP) ranges in Fig. 6B for explicit-chain simulations (≈8) and Kawasaki dynamics (≈5) are less dissimilar than the corresponding ranges in Fig. 6A (≈1.2 versus ≈0.4). As for the variation of Kawasaki ln(τTP/τMFP) with F(ΔG‡), although the correlation coefficient is high for both the common Gō model and all-atom MD (r = 0.997 and 0.920, respectively), only the common Gō model produced a regression slope close to unity (0.98 in Fig. 6B). In contrast, the regression slope for the red data points in Fig. 6A is 0.31.

The latter observation was puzzling. Even if the diffusion picture does not agree with explicit-chain dynamics, once a free energy profile is given, the formulation of one-dimensional diffusion is expected to apply mathematically irrespective of whether explicit-chain dynamics can be adequately described by one-dimensional diffusion. Thus, a likely origin of the discrepancy, underscored by a regression slope of 0.31 for the red data points in Fig. 6A, is the approximate nature of eqn (2) and the high sensitivity of the transition path time to detailed features of the free energy profile, as we have pointed out previously.31 To address this issue, we noted that the free energy barriers from all-atom MD for all four proteins considered are quite low. Since this situation likely exacerbates the inaccuracy of eqn (2), we conducted control Kawasaki dynamics diffusion simulations on modified free energy profiles that are uniformly scaled such that the barrier heights are doubled. Under these conditions, the correlation coefficient between ln(τTP/τMFP) and F(ΔG‡) indeed improves to 0.74, and the fitted slope of the linear regression becomes essentially unity (Fig. S1, ESI†). This analysis lends credence to our proposition that eqn (2) is reasonably accurate only when the barrier height is sufficiently high, i.e., higher, for the proteins considered here, than those predicted by all-atom MD.
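To make the one-dimensional diffusion control more concrete, the sketch below computes a mean first-passage time on a discretized free energy profile. It assumes nearest-neighbour jumps between Q bins with Kawasaki-type rates w(i → j) = w0 exp[−(Gj − Gi)/2] (G in units of kBT) and a reflecting boundary at the first bin, and it uses the closed-form first-passage expression for such a chain rather than a stochastic simulation. The function name, the attempt rate w0, and the toy profiles are hypothetical; the exact protocol of ref. 31 is not reproduced here.

```python
import math
from typing import Sequence


def kawasaki_mfpt(free_energy: Sequence[float], start: int, target: int,
                  attempt_rate: float = 1.0) -> float:
    """Mean first-passage time from bin `start` to bin `target` (start < target).

    free_energy[i] is G at the i-th Q bin in units of kB*T; bin 0 is reflecting.
    """
    if not 0 <= start < target < len(free_energy):
        raise ValueError("need 0 <= start < target < number of bins")
    weights = [math.exp(-g) for g in free_energy]  # unnormalised Boltzmann weights
    mfpt = 0.0
    cumulative = sum(weights[:start])  # total weight of the bins below `start`
    for j in range(start, target):
        cumulative += weights[j]  # now the sum of weights[0..j]
        forward = attempt_rate * math.exp(-(free_energy[j + 1] - free_energy[j]) / 2.0)
        mfpt += cumulative / (weights[j] * forward)
    return mfpt


# Hypothetical toy profiles: a flat one, a single barrier, and the same barrier doubled.
flat_time = kawasaki_mfpt([0.0] * 5, start=0, target=4)
barrier_profile = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
barrier_time = kawasaki_mfpt(barrier_profile, start=0, target=6)
doubled_time = kawasaki_mfpt([2.0 * g for g in barrier_profile], start=0, target=6)
```

Uniformly scaling the profile, as in the last line, mirrors the barrier-doubling control described above. Transition path times are not covered by this closed form and would require either explicit stochastic trajectories on the same rates or a separate analytical treatment.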

Discussion

As stated above, the goal of our analysis is to assess the similarities and differences between coarse-grained and all-atom explicit-solvent models of protein folding and, to a lesser extent, the strengths and weaknesses of these approaches as models of real protein behaviors. A practical limitation of the present comparison, however, is the relatively small number of explicit-solvent folding transitions that are available: fewer than thirty transition paths for each protein studied. This is limiting despite the fact that the all-atom MD simulation trajectories used in this work, which were based on one force field for small proteins of different topologies, already contain many more folding/unfolding transitions than most of the other published all-atom explicit-solvent simulation results. The situation entails non-negligible statistical uncertainties, especially for fine structural details. Nonetheless, even with the limited all-atom MD data, most trends observed in this work should be representative, and this investigation should contribute a framework for more thorough analysis when additional atomic simulation data become available.

A major difference between native-centric Gō-like models and transferable (non-structure-based) models based on current atomic force fields is in their predicted protein unfolded-state dimensions. An accurate description of the unfolded state is important for understanding the mechanism of protein folding.38,53–57 For instance, one simulation study indicated that unfolded proteins are usually more extended in urea than in water.54 While some experiments indicated little structural organization in unfolded proteins,55 others suggested significant populations of secondary structures in unfolded proteins even at high denaturant concentrations.56 Our analysis found local native structure in both the common Gō model and all-atom MD. Perhaps not surprisingly, local native structures are more populated in the native-centric Gō model. Experimentally, the average Rg of unfolded proteins under highly denaturing conditions scales as N^0.598 with chain length N (ref. 38). This scaling behavior is identical to that of polymers in a good solvent.37 For the proteins we considered, the common Gō model predicts a weaker scaling, Rg ~ N^0.513. Nonetheless, virtually all the

Fig. 6 Dependence of transition path time and mean folding first passage time on folding free energy barrier height. The variation of the τTP/τMFP ratio with the function F of free energy barrier height ΔG‡ [eqn (4)] (horizontal scale) was computed by explicit-chain simulations [black squares, ln(τTP/τMFP) given by the left vertical scale] and by one-dimensional diffusion dynamics using the free energy profile generated by explicit-chain dynamics [red circles, ln(τTP/τMFP) given by the right vertical scale] in all-atom model simulations (A) and in common Gō models (B). Results from explicit-chain simulations using the Gō + HP (k = 1.2) model (blue triangles) are included in (B) for additional comparison (right vertical and top horizontal scales). Pearson correlation coefficients for the black and red data points in (A) are r = −0.141 and 0.92, respectively, and those in (B) for the black, red, and blue data points are r = 0.993, 0.997, and 0.829, respectively.


Rg values simulated by the present Gō model are within the variation seen in experiments (shaded band in Fig. 2), indicating that, as it stands, the common Gō model offers a better mimicry of unfolded proteins than all-atom MD in this regard, since the latter approach predicted unfolded-state Rg values that hardly vary with N (see also ref. 39). Comparison with the results from Gō + HP models suggests that this inadequacy of current all-atom models is most likely caused by their overestimation of favorable nonnative interactions, resulting in the predicted unfolded conformations being overly compact. Common physics-based all-atom force fields tend to overestimate the conformational compactness of intrinsically disordered proteins58–61 as well (reviewed in ref. 16 and 39). Several recent studies have attempted to modify the existing protein/water force fields to allow for more extended unfolded/disordered protein conformations62–65 with some degree of success. For instance, the refinement of the CHARMM36 force field to the new force field CHARMM36m by MacKerell Jr and co-workers appears to have improved the structural accuracy in representing both globular and intrinsically disordered proteins.65 Efforts in this direction should be further pursued.
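The chain-length scaling exponents quoted in this discussion are of the form Rg = R0 N^ν, and an exponent of this kind can be estimated by least-squares regression of ln Rg on ln N. The sketch below illustrates the procedure on synthetic data; the function name, the chain lengths, and the prefactor of 2.0 are hypothetical and are not taken from this work or from ref. 38.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(lengths: Sequence[float], rg_values: Sequence[float]) -> Tuple[float, float]:
    """Fit Rg = R0 * N**nu by linear least squares in log-log space; return (R0, nu)."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(r) for r in rg_values]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    nu = sxy / sxx
    prefactor = math.exp(mean_y - nu * mean_x)
    return prefactor, nu


# Synthetic, noise-free data generated with a hypothetical prefactor and nu = 0.598.
synthetic_lengths = [20, 40, 80, 160]
synthetic_rg = [2.0 * n ** 0.598 for n in synthetic_lengths]
fitted_prefactor, fitted_nu = fit_power_law(synthetic_lengths, synthetic_rg)
```

With only four proteins spanning a narrow range of chain lengths, an exponent fitted in this way carries a large uncertainty, so a value such as 0.513 is better read as a trend than as a precise estimate.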

Among the many conformations sampled during folding, the transition state and the ensemble of transition paths are arguably most critical to the self-assembly process. Computational modeling is a very important tool for probing the properties of these critical conformational ensembles because it remains experimentally difficult to characterize these sparsely populated, highly transient features of the folding process in structural and temporal detail. As far as transition states are concerned, we found broad agreement between the common Gō model and all-atom MD at the level of the probabilities of individual native contacts in their predicted equilibrium transition states (Fig. 3C), and, to a lesser extent, in their predicted ϕ-values (Fig. 4). In this connection, it is also instructive to note that, for both the common Gō model and all-atom MD, the distribution of native contact probabilities along entire transition paths (Fig. S2, ESI†) is similar to that in the corresponding transition state inferred from equilibrium free energy profiles (Fig. 3C). However, the detailed folding pathways in the two approaches can be significantly different for the four proteins we studied (Fig. 5 and Table 1), although a recent study suggested that it is possible to construct a structure-based theory that produces folding pathways that agree with all-atom MD to a certain extent.66 Apparently, different modeling approaches tend to introduce significantly different kinetic biases. Moreover, it is as yet unclear whether either approach can currently provide accurate, structurally detailed predictions of folding kinetics (Fig. 4A). In this context, a promising recent investigation showing that theoretical ϕ-values obtained from TP ensembles are in good agreement with experiments43 certainly deserves further analysis.

We considered several variations of coarse-grained modeling to assess the impact of nonnative interactions (Gō + HP and Gō + mb), heterogeneous native interactions63 (Gō + MJ and Gō + mb), and many-body local–nonlocal coupling33 (Gō + mb) on computed protein behaviors. Some variations led to behaviors more similar to those of all-atom MD, and thus provided a clue to the pertinent properties. Most notable is the reduction of average unfolded-state Rg by nonnative interactions (Fig. 2 and 4). We observed that the local–nonlocal coupling embodied in the Gō + mb model causes a significant decrease in the number of native contacts in the unfolded state of the λ-repressor (Fig. 1); but for the other three proteins this cooperativity-enhancing effect is apparently counteracted by the substantial favorable nonnative effect also present in the same model, resulting in little change in the average Q values of their unfolded states. Heterogeneous native (Gō + MJ and Gō + mb) and nonnative (Gō + mb) interactions of the Miyazawa–Jernigan67 type lead to ϕ-values more similar to those of all-atom MD for some residues (e.g., residues 8–13 of NTL9 and residues 25–35 of the λ-repressor) but not for others such as the N-terminal residues of the λ-repressor (Fig. S3, ESI†). All in all, our results point to a high degree of model sensitivity for the predicted protein-folding properties that involve more structural details.

We have also examined the applicability of one-dimensional diffusion processes as conceptual pictures and quantitative summaries of protein folding data. Such pictures have been invoked in the interpretation of data from recent sm-FRET25 and high-resolution force spectroscopy26 experiments. Several studies have underscored the principle that derived one-dimensional diffusion behaviors for folding are generally sensitive to the choice of reaction coordinate.68–71 It has turned out, nonetheless, that the simple measure of the fractional number of native contacts Q is quite adequate as a reaction coordinate for this purpose. The apparent success was not limited to providing an intuitive, simple account of folding in explicit-chain structure-based models. The essential adequacy of Q as a diffusion coordinate has also been seen in the interpretation of certain folding dynamics simulated using physics-based all-atom models.23,29 In this work, within the framework of a Q-independent diffusion coefficient, we found that one-dimensional diffusion affords a reasonable picture of the salient features of Gō-model folding kinetics but is less adequate for all-atom MD (Fig. 6). Probable origins of the latter mismatch for the proteins studied here include non-negligible nonnative effects and the rather low free energy barriers to folding posited by all-atom MD. It might be possible to construct another reaction coordinate (different from Q) that would result in a better mathematical match between explicit-chain dynamics and one-dimensional diffusion in that coordinate. But such a construct would likely not share the simple, intuitive appeal of Q as a measure of progress in the folding process, thus defeating a major conceptual purpose of the theoretical exercise in the first place. Certain aspects of all-atom MD are well described by diffusion processes with Q-dependent diffusion coefficients.35 However, as we have argued,31 unless a folding process encounters deep kinetic traps in such a way that a diffusion coefficient extremely sensitive to variation in Q is demanded,18 it is instructive to examine the predictive and interpretative power of processes unified by a single protein-independent as well as Q-independent diffusion coefficient so as to compare different proteins on the same footing. This point is exemplified by the consideration in the paragraph below. After all, much of the intuitive appeal of the


diffusion picture of folding resides in its presumed ability to abstract complex dynamics into a simple process, rather than as a complicated data-fitting exercise involving multiple parameters in the form of the diffusion coefficient's Q-dependence and protein-dependence.

With this in mind, it is worth emphasizing that the constant diffusion coefficient defined in ref. 31 and applied in the above analysis (Fig. 6) amounts to postulating the same intrinsic rate of forming or breaking one native contact irrespective of the existing number of native contacts or the identity of the protein. Importantly, what this formulation does not postulate is that the intrinsic rate of diffusing across a constant Q increment, δQ, is the same for different proteins. Q is customarily normalized to take values from zero to unity only. It follows that the postulated elementary diffusion step in our formulation,31 which is equal to the reciprocal of the total number of native contacts Qn, may correspond to different δQ for different proteins with different Qn. For the data at hand, we again confirm, as in ref. 31, that a constant, Q-independent diffusion coefficient (i.e., δQ = 1/Qn) provides a reasonable account of both the average first passage folding time τMFP (Fig. S4, ESI†) and the transition path time τTP (Fig. S5, ESI†) obtained by all-atom MD (black data points in these ESI† figures). In contrast, the correlation of explicit-chain simulated τMFP and τTP with the corresponding quantities computed from one-dimensional diffusion becomes much worse if the intrinsic rate of forming a fixed fraction of the total number of native contacts (a single δQ for all proteins), instead of that of forming a single native contact (δQ = 1/Qn), is assumed to be constant across different proteins (red data points in Fig. S4 and S5, ESI†).

Conclusions

To recapitulate, recent advances in all-atom explicit-solvent MD simulations have been tremendous and very promising. At the same time, coarse-grained modeling remains attractive as a complementary approach because of its flexibility and computational tractability, especially for more complex biomolecular systems. In this context, it is useful to delineate the similarities and differences among modeling approaches. We have made such an attempt here, especially with regard to the impact of nonnative interactions and the applicability of a simple diffusion picture of folding. It is noteworthy that the prevalence of nonnative effects in current atomic force fields suggests that they are likely insufficient to reproduce the high degrees of thermodynamic and kinetic cooperativity of protein folding that have been observed for many small, single-domain proteins.6 Incorporating these generic features of folding as stringent tests of both coarse-grained and atomic force fields will prove helpful for their improvement.

Acknowledgements

We thank David E. Shaw, Rebecca Bish-Cornelissen, and Jodi Hezky of D. E. Shaw Research for providing access to their MD trajectories. This work was supported by the National Science Foundation of China (31200548), the President Foundation A of University of Chinese Academy of Sciences, and Canadian Institutes of Health Research grant MOP-84281. We are grateful for the computing resources generously provided by SciNet of Compute Canada.

References

1 M. Karplus and J. Kuriyan, Proc. Natl. Acad. Sci. U. S. A., 2005, 102, 6679–6685.
2 E. I. Shakhnovich, Chem. Rev., 2006, 106, 1559–1588.
3 P. G. Wolynes, Q. Rev. Biophys., 2005, 38, 405–410.
4 D. Thirumalai, E. P. O'Brien, G. Morrison and C. Hyeon, Annu. Rev. Biophys., 2010, 39, 159–183.
5 K. A. Dill and J. L. MacCallum, Science, 2012, 338, 1042–1046.
6 H. S. Chan, Z. Zhang, S. Wallin and Z. Liu, Annu. Rev. Phys. Chem., 2011, 62, 301–326.
7 R. D. Hills Jr. and C. L. Brooks III, Int. J. Mol. Sci., 2009, 10, 889–905.
8 K. A. Dill and H. S. Chan, Nat. Struct. Biol., 1997, 4, 10–19.
9 C. Clementi, H. Nymeyer and J. N. Onuchic, J. Mol. Biol., 2000, 298, 937–953.
10 M. S. Cheung, A. E. García and J. N. Onuchic, Proc. Natl. Acad. Sci. U. S. A., 2002, 99, 685–690.
11 Z. Liu and H. S. Chan, Phys. Biol., 2005, 2, S75–S85.
12 D. Rodriguez-Larrea, B. Ibarra-Molero and J. M. Sanchez-Ruiz, Biophys. J., 2006, 91, L48–L50.
13 Z. Zhang and H. S. Chan, Proc. Natl. Acad. Sci. U. S. A., 2010, 107, 2920–2925.
14 A. Zarrine-Afsar, Z. Zhang, K. L. Schweiker, G. I. Makhatadze, A. R. Davidson and H. S. Chan, Proteins, 2012, 80, 858–870.
15 A. Azia and Y. Levy, J. Mol. Biol., 2009, 393, 527–542.
16 T. Chen, J. Song and H. S. Chan, Curr. Opin. Struct. Biol., 2015, 30, 32–42.
17 Z. Zhang, Y. Ouyang and T. Chen, Phys. Chem. Chem. Phys., 2016, 18, 31304–31311.
18 T. Chen and H. S. Chan, PLoS Comput. Biol., 2015, 11, e1004260.
19 A. Borgia, K. R. Kemplen, M. B. Borgia, A. Soranno, S. Shammas, B. Wunderlich, D. Nettels, R. B. Best, J. Clarke and B. Schuler, Nat. Commun., 2015, 6, 8861.
20 T. Sikosek, H. Krobath and H. S. Chan, PLoS Comput. Biol., 2016, 12, e1004960.
21 C. D. Snow, H. Nguyen, V. S. Pande and M. Gruebele, Nature, 2002, 420, 102–106.
22 S. Piana, K. Lindorff-Larsen and D. E. Shaw, Proc. Natl. Acad. Sci. U. S. A., 2013, 110, 5915–5920.
23 K. Lindorff-Larsen, S. Piana, R. O. Dror and D. E. Shaw, Science, 2011, 334, 517–520.
24 D. E. Shaw, P. Maragakis, K. Lindorff-Larsen, S. Piana, R. O. Dror, M. P. Eastwood, J. A. Bank, J. M. Jumper, J. K. Salmon, Y. Shan and W. Wriggers, Science, 2010, 330, 341–346.
25 H. S. Chung, S. Piana-Agostinetti, D. E. Shaw and W. A. Eaton, Science, 2015, 349, 1504–1510.


26 K. Neupane, D. A. Foster, D. R. Dee, H. Yu, F. Wang andM. T. Woodside, Science, 2016, 352, 239–242.

27 L. Sborgi, A. Verma, S. Piana, K. Lindorff-Larsen, M. Cerminara,C. M. Santiveri, D. E. Shaw, E. de Alba and V. Munoz, J. Am.Chem. Soc., 2015, 137, 6506–6516.

28 E. R. Henry, R. B. Best and W. A. Eaton, Proc. Natl. Acad. Sci.U. S. A., 2013, 110, 17880–17885.

29 B. R. Best, G. Hummer and W. A. Eaton, Proc. Natl. Acad. Sci.U. S. A., 2013, 110, 17874–17879.

30 A. Ferguson, Z. Liu and H. S. Chan, J. Mol. Biol., 2009, 389,619–636.

31 Z. Zhang and H. S. Chan, Proc. Natl. Acad. Sci. U. S. A., 2012,109, 20919–20924.

32 Z. Zhang and H. S. Chan, Biophys. J., 2009, 96, L25–L27.

33 T. Chen and H. S. Chan, Phys. Chem. Chem. Phys., 2014, 16, 6460–6479.

34 A. R. Fersht, A. Matouschek and L. Serrano, J. Mol. Biol., 1992, 224, 771–782.

35 W. Zheng and R. B. Best, J. Phys. Chem. B, 2015, 119, 15247–15255.

36 A. Zarrine-Afsar, S. Wallin, A. M. Neculai, P. Neudecker, P. L. Howell, A. R. Davidson and H. S. Chan, Proc. Natl. Acad. Sci. U. S. A., 2008, 105, 9999–10004.

37 P. J. Flory, Principles of Polymer Chemistry, Cornell Univ. Press, Ithaca, NY, 1953.

38 J. E. Kohn, I. S. Millett, J. Jacob, B. Zagrovic, T. M. Dillon, N. Cingel, R. S. Dothager, S. Seifert, P. Thiyagarajan, T. R. Sosnick, M. Z. Hasan, V. S. Pande, I. Ruczinski, S. Doniach and K. W. Plaxco, Proc. Natl. Acad. Sci. U. S. A., 2004, 101, 12491–12496.

39 S. Piana, J. L. Klepeis and D. E. Shaw, Curr. Opin. Struct. Biol., 2014, 24, 98–105.

40 C. A. Royer, Arch. Biochem. Biophys., 2008, 469, 34–45.

41 A. N. Naganathan and V. Muñoz, Proc. Natl. Acad. Sci. U. S. A., 2010, 107, 8611–8616.

42 A. Zarrine-Afsar and A. R. Davidson, Methods, 2004, 34, 41–50.

43 R. B. Best and G. Hummer, Proc. Natl. Acad. Sci. U. S. A., 2016, 113, 3263–3268.

44 M. Lindberg, J. Tangrot and M. Oliveberg, Nat. Struct. Biol., 2002, 9, 818–822.

45 H. Krobath, A. Rey and P. F. Faísca, Phys. Chem. Chem. Phys., 2015, 17, 3512–3524.

46 M. Jäger, H. Nguyen, J. C. Crane, J. W. Kelly and M. Gruebele, J. Mol. Biol., 2001, 311, 373–393.

47 H. S. Chung, J. M. Louis and W. A. Eaton, Proc. Natl. Acad. Sci. U. S. A., 2009, 106, 11837–11844.

48 D. N. Ivankov, S. O. Garbuzynskiy, E. Alm, K. W. Plaxco, D. Baker and A. V. Finkelstein, Protein Sci., 2003, 12, 2057–2062.

49 R. J. Oliveira, P. C. Whitford, J. Chahine, V. B. Leite and J. Wang, Methods, 2010, 52, 91–98.

50 R. B. Best and G. Hummer, Phys. Chem. Chem. Phys., 2011, 13, 16902–16911.

51 R. B. Best and G. Hummer, Proc. Natl. Acad. Sci. U. S. A., 2010, 107, 1088–1093.

52 K. Truex, H. S. Chung, J. M. Louis and W. A. Eaton, Phys. Rev. Lett., 2015, 115, 018101.

53 J. H. Cho, W. Meng, S. Sato, E. Y. Kim, H. Schindelin and D. P. Raleigh, Proc. Natl. Acad. Sci. U. S. A., 2014, 111, 12079–12084.

54 M. Candotti, S. Esteban-Martín, X. Salvatella and M. Orozco, Proc. Natl. Acad. Sci. U. S. A., 2013, 110, 5933–5938.

55 W. Meng, N. Lyle, B. Luan, D. P. Raleigh and R. V. Pappu, Proc. Natl. Acad. Sci. U. S. A., 2013, 110, 2123–2128.

56 N. S. Bhavesh, J. Juneja, J. B. Udgaonkar and R. V. Hosur, Protein Sci., 2004, 13, 3085–3091.

57 M. Aznauryan, L. Delgado, A. Soranno, D. Nettels, J. R. Huang, A. M. Labhardt, S. Grzesiek and B. Schuler, Proc. Natl. Acad. Sci. U. S. A., 2016, 113, E5389–E5398.

58 A. K. Dunker, I. Silman, V. N. Uversky and J. L. Sussman, Curr. Opin. Struct. Biol., 2008, 18, 756–764.

59 J. Habchi, P. Tompa, S. Longhi and V. N. Uversky, Chem. Rev., 2014, 114, 6561–6588.

60 M. Li, T. Sun, F. Jin, D. Yu and Z. Liu, Mol. BioSyst., 2016, 12, 2932–2940.

61 V. Csizmok, A. V. Follis, R. W. Kriwacki and J. D. Forman-Kay, Chem. Rev., 2016, 116, 6424–6462.

62 S. Piana, A. G. Donchev, P. Robustelli and D. E. Shaw, J. Phys. Chem. B, 2015, 119, 5113–5123.

63 R. B. Best, Curr. Opin. Struct. Biol., 2017, 42, 147–154.

64 Z. A. Levine and J. E. Shea, Curr. Opin. Struct. Biol., 2017, 43, 95–103.

65 J. Huang, S. Rauscher, G. Nawrocki, T. Ran, M. Feig, B. L. de Groot, H. Grubmüller and A. D. MacKerell Jr, Nat. Methods, 2017, 14, 71–73.

66 W. M. Jacobs and E. I. Shakhnovich, Biophys. J., 2016, 111, 925–936.

67 S. Miyazawa and R. L. Jernigan, J. Mol. Biol., 1996, 256, 623–644.

68 R. B. Best and G. Hummer, Proc. Natl. Acad. Sci. U. S. A., 2005, 102, 6732–6737.

69 W. Xu, Z. Lai, R. J. Oliveira, V. B. Leite and J. Wang, J. Phys. Chem. B, 2012, 116, 5152–5159.

70 N. D. Socci, J. N. Onuchic and P. G. Wolynes, J. Chem. Phys., 1996, 104, 5860–5868.

71 S. S. Cho, Y. Levy and P. G. Wolynes, Proc. Natl. Acad. Sci. U. S. A., 2006, 103, 586–591.
