
Protein folding pathways and state transitions described by classical equations of motion of an elastic network model

Gareth Williams1* and Andrew J. Toon2

1Wolfson Centre for Age-Related Diseases, Kings College London, London Bridge, London SE1 1UL, United Kingdom

2School of Science and Technology, SIM University, Singapore

Received 10 June 2010; Revised 15 September 2010; Accepted 4 October 2010

DOI: 10.1002/pro.527

Published online 15 October 2010 proteinscience.org

Abstract: Protein topology defined by the matrix of residue contacts has proved to be a fruitful

basis for the study of protein dynamics. The widely implemented coarse-grained elastic network model of backbone fluctuations has been used to describe crystallographic temperature factors,

allosteric couplings, and some aspects of the folding pathway. In the present study, we develop a

model of protein dynamics based on the classical equations of motion of a damped network model (DNM) that describes the folding path from a completely unfolded state to the native conformation

through a single-well potential derived purely from the native conformation. The kinetic energy

gained through the collapse of the protein chain is dissipated through a friction term in the equations of motion that models the water bath. This approach is completely general and

sufficiently fast that it can be applied to large proteins. Folding pathways for various proteins of

different classes are described and shown to correlate with experimental observations and molecular dynamics and Monte Carlo simulations. Allosteric transitions between alternative protein

structures are also modeled within the DNM through an asymmetric double-well potential.

Keywords: damped harmonic oscillator; allosteric regulation; normal mode analysis; vibrational

dynamics

Introduction

It has long been the goal of biologists and physicists

to relate the protein sequence to the protein fold.

The mapping of sequence space onto fold space is

problematic because even though most homologous

proteins fold in a similar fashion, there are proteins

of similar shape that can have little sequence simi-

larity.1,2 Moreover, many proteins function through

adopting multiple conformations and are allosteri-

cally regulated by binding events.3–6 Also, protein

misfolding, which underlies many disease states, can

be triggered by relatively minor sequence changes.7

However, the structural transitions that a protein

goes through during folding, referred to as the fold-

ing path, appear to be largely defined by the final

structure8 even when two proteins with little

sequence similarity fold to a similar structure.9

Because the folding path is largely encoded by the

native state topology, it can be studied in a

sequence-independent manner.

Additional Supporting Information may be found in the online version of this article.

Abbreviations: AKE, adenylate kinase; DNM, damped network model; DWNM, double-well network model; ENM, elastic network model; MC, Monte Carlo; MD, molecular dynamics; MDGP, molecular distance geometry problem; MFEP, minimal free-energy paths; NMPbind, nucleoside monophosphate binding; NMR, nuclear magnetic resonance; PDB, protein data bank; PNM, plastic network model.

Grant sponsor: Wolfson CARD.

*Correspondence to: Gareth Williams, Wolfson Centre for Age-Related Diseases, Kings College London, London Bridge, London SE1 1UL, United Kingdom. E-mail: [email protected]

Published by Wiley-Blackwell. © 2010 The Protein Society. PROTEIN SCIENCE 2010 VOL 19:2451–2461

Native structure-based folding dynamics has been

modeled by molecular dynamics (MD) simulations with

interactions defined by Go potentials.10–12 In these

models, the potential is parameterized by the native

conformation which is defined at various levels of

coarse-grained reduction.13 Go models, in their simplest incarnation, model the Cα chain with non-neighboring

atoms moving under a Lennard-Jones-like potential

parameterized by the native conformation and a power

law potential for the more rigid covalent bond interac-

tions. These models have been extended to include sta-

tistical potentials modeling pseudo-torsion angles14,15

and have also included Cβ atoms.16,17 Taking the limit in which protein fold dynamics is encoded purely by the native residue contacts, and the potential is a simple quadratic function with an influence radius cut-off, results in the elastic network model (ENM).18–20 The

ENM can be viewed as a simple dynamical extension of

the protein fold topology encoded by the matrix of resi-

due contacts. Such coarse-grained reductions of the

protein chain greatly speed up dynamical simulations

in contrast to full atom MD simulations21 that are cur-

rently hampered by the relatively short time frames

that fall well below the folding times for large proteins,

and it is difficult to see how residue-independent fold-

ing characteristics can naturally emerge from such

treatments.

In the ENM, the protein is represented by a Cα backbone structure that undergoes coupled harmonic oscillation with other Cα atoms within a defined sphere of influence. The potential is approximated by a quadratic fluctuation term, which can be diagonalized to give eigenmodes that describe the linearly independent oscillations of the protein. Fluctuation correlations show excellent agreement with crystallographic B-factors that measure Cα positional uncertainty19,20,22 and have been successful in

describing allosteric couplings.23,24 Also, a perturba-

tive treatment of ENM has been applied to interac-

tion site prediction.25–28 In many cases, global struc-

tural transitions have conformational change vectors

correlating with the low-energy eigenvectors of the

ENM corresponding to one of the alternative confor-

mations.29–31 In addition to predicting allosteric cou-

plings, large-scale allosteric transition pathways

have been modeled by an iterative ENM normal

mode approach32–36 and within a plastic network

model (PNM), where the pathway is defined as the

minimum energy path interpolating between two

distinct ENM minima.37,38 Protein unfolding has

also been treated within the ENM through a pertur-

bative correlation analysis where, with each itera-

tion, the residue pair undergoing the largest fluctua-

tions has its contact broken.39 Unfolding pathways

are thought to mirror folding pathways,40 and the analysis of Su et al.39 shows the contact matrix evolution to be in broad agreement with experimental

and theoretical observations.

When modeling the folding pathway, it is no lon-

ger possible to apply the normal mode analysis of

ENM as the motions are no longer small oscillations

about the native state. As mentioned above, unfold-

ing can be investigated within an iterative ENM

model, but it is difficult to see how to apply this iter-

ative approach to folding. Alternatively, the full clas-

sical equations of motion of the ENM can be solved

numerically to simulate global folding. In particular,

we can start from a completely unstructured protein

represented by a random noncontacting Cα backbone

and use numerical iteration to follow the evolution

of the protein fold. However, this leads to an

unphysical rapidly escalating kinetic energy compo-

nent as new residues are continuously brought

within the influence radius. To resolve this problem,

we introduce a damped network model (DNM) with

a friction or damping term modeling energy dissipa-

tion to the water environment in which the protein

is folding. The resulting folding path, defined as a

series of structural transitions eventually converging

to the native configuration, has many features in

common with experimental observations and full

atom simulations. In contrast to the unfolding path-

way derived through sequential release of unstable

couplings,39 the present approach is not sequential

and many folding events occur throughout the pro-

tein at the same time, which is a more realistic

scenario.

Local minima along the folding pathway are a

common problem with folding simulation and there

are instances with the DNM where the protein stalls

at a non-native structure. Within this deterministic

model, the stalling is a consequence of the initial

random conformation, but the space of initial config-

urations leading to folding is sufficiently large for

this not to be a critical problem. The reason for this

is that we are not following a simple steepest

descent minimization strategy but following a

vibrating polymer that is sampling configurations

away from the local minima. This is similar to the

role played by the temperature in Monte Carlo simulations,41 but we are not restricted to generating local moves and the protein is folding all at once.

In many instances, proteins adopt multiple con-

formations and can dynamically switch between con-

formations. Conformational switches are intimately

related to protein function, where, for example, a

ligand binding event can result in a remote or allo-

steric conformational change in the protein that

leads to a downstream signaling event as the protein is switched from an inactive to an active conformation.42 Such multiple conformations have been

studied extensively and have led some researchers

to further separate sequence from structure and

question the discretization of structural domains.43

The transition dynamics from one structure to

another can also be modeled by a network model but


with two discrete minima. In particular, the struc-

tural transition path of adenylate kinase (AKE) has

been studied within the context of a double-well

PNM where the transition pathway is defined as a

minimal energy interpolation between the two min-

ima representing the open and closed configura-

tions.37,38 We argue that this transition can also be

modeled with the classical equations of motion of a

double-well DNM. In particular, starting with the

protein in one potential minimum, we introduce an

asymmetry in the double-well potential destabilising

the protein and eventually leading it to fall into the

lower energy conformation. The transition dynamics

are parameterized by the environmental viscosity,

the influence radius, and the potential asymmetry,

which together trigger and dictate the nature of the

transition events.

We first present the DNM folding pathways for two extensively studied proteins, barnase and chymotrypsin. The DNM method is not restricted to

small proteins, and we illustrate the folding path-

way for a larger protein, a serine lipase. The asym-

metric double-well version of DNM is then intro-

duced and shown to model the allosteric transition

in AKE. There follows a Discussion section. The

mathematical framework behind the DNM approach

is given in the Methods section at the end.

Results

There is a wealth of data on the intermediate states

that proteins adopt on the way to their native fold

or alternatively from the native fold to denatured

state.44 The most direct measurements come from

NMR experiments where one can follow the appear-

ance/disappearance of interactions with changing

temperature or environmental pH. This data can be

presented in the form of an evolving contact matrix

and we can make a direct comparison with our pathways. Computationally intensive full-atom MD simulations have shown good agreement with NMR

studies, at least for small proteins, and these can

also be compared with our approach. In what follows

we will look first at two small proteins that have

been extensively studied in terms of their folding

pathways and then we will describe the folding

pathways of a larger protein.

Barnase

Barnase is an αβ-protein that has been the subject of many folding simulations and NMR experiments.44–47 The crystal structure consists of 108 residues and the secondary structure consists of: α1(7–17), loop1(18–26), α2(27–33), loop2(34–41), α3(42–45), β1(49–56), loop3(57–68), β2(69–76), loop4(77–85), β3(86–91), β4(95–99), β5(105–108). Starting with a

random noninteracting fold and solving the DNM

equations of motion, we obtain our predicted folding

path. A representative folding path is illustrated in

Figure 1 giving the Cα backbone structure at various values of the native fold contact fraction, Q.

Native contacts were defined as between residues

that are more than three residues apart along the

chain and within 8 Å of each other. To get a full picture of the folding pathway, we performed 100 folding runs and calculated the average occupancy of

the contacts at various Q values. The average con-

tact matrices for various Q values are shown in Fig-

ure 2, and the evolution of the total native contact

fraction is shown in Figure 3. Fold evolution as a

function of Q is a description adopted in the MD

simulation study.47
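The contact definition and the native contact fraction Q used above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, coordinates are assumed to be an N x 3 array of Cα positions in Å, and "more than three residues apart" is read as |i − j| ≥ 4.

```python
import numpy as np

def contact_map(coords, cutoff=8.0, min_separation=4):
    """Boolean contact matrix: residues at least `min_separation` apart
    along the chain and closer than `cutoff` (in Angstrom) in space."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    idx = np.arange(len(coords))
    far_in_sequence = np.abs(idx[:, None] - idx[None, :]) >= min_separation
    return (dist < cutoff) & far_in_sequence

def native_contact_fraction(coords, native_contacts, cutoff=8.0):
    """Q: fraction of the native contacts that are present in `coords`."""
    formed = contact_map(coords, cutoff) & native_contacts
    return formed.sum() / native_contacts.sum()

# Toy example (assumed geometry): an ideal 20-residue helix as the "native" fold.
t = np.arange(20)
helix = np.column_stack([2.3 * np.cos(np.deg2rad(100 * t)),
                         2.3 * np.sin(np.deg2rad(100 * t)),
                         1.5 * t])
native = contact_map(helix)
extended = np.column_stack([3.8 * t, 0.0 * t, 0.0 * t])  # fully stretched chain
q_native = native_contact_fraction(helix, native)
q_extended = native_contact_fraction(extended, native)
```

Averaging the boolean matrices returned by `contact_map` over all successful runs at a fixed Q gives an average contact matrix of the kind described in the text.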

The folding nucleation sites have been localized

to the β34 and β45 sheets through NMR and MD simulations. It has also been shown that the α2, α3, and loop2 contacts are stable under multiple high-temperature MD unfolding simulations.47 It is interesting to compare these results with our methodology, especially as the unfolding of this protein has been successfully modeled by an iterative normal

Figure 1. The folding pathway for barnase. Cα stick structures are shown for various stages of the fold. Fold stages are measured by the fraction of native contacts, the Q fraction, and contacts are defined as non-neighboring residues within 8 Å of each other. The three N-terminal α-helices emerge at Q = 0.1. The first nonlocal secondary structure contacts constituting the fold nucleation sites are the C-terminal β34 and β45 sheets emerging at Q = 0.3. At Q = 0.5, we see the emergence of the α2:α3 contacts as the C-terminal residues begin to acquire a globular configuration. When the fold is 70% complete, there are two distinct globular structures comprising the N-terminal β-sheet quartet and the C-terminal α-helix bundle. Finally, these two clusters join up through the zipping up of the β12, then the loop4:α3, and subsequently the β23:α1 interactions.


mode analysis.39 The small-scale mobility of Barnase

has been the subject of a coarse-grained network

model analysis using rigid substructure decomposi-

tion that shows good agreement with NMR experi-

ments.48 Under the classical DNM oscillator equa-

tions of motion, the fold can be illustrated by the

contact map evolving with Q, Figure 2. The initial secondary structure contacts correspond to the three N-terminal helices, Q = 0.1. The fold nucleation contacts corresponding to the non-neighboring secondary structure interactions of the β34 and β45 sheets together with the second and third helices and the loop2 interactions emerge at Q = 0.3. This is in

agreement with NMR and MD simulations.47 The

early folding of the first helix is mirrored in its sta-

bility in unfolding MD simulations and this is in

contrast to the ENM iterative unfolding result.39

The folding rate profile is given in Figure 3 and this

shows a greater folding rate at the beginning and

end of the fold. Of the 100 random initial conforma-

tions, 48 resulted in full folding to the native fold

and the results presented are averages over complete folds; the average iteration count per fold was 7466. Reaching the native fold was defined as native

contact occupancy of greater than 95%. A run was

judged to stall in a local minimum away from the

native fold if the root-mean-squared distance matrix difference (RMSD) to the native fold oscillates above the cut-off (1 Å) for a sufficient number of iterations. One could equally well define fold termination as the fold to native RMSD falling below 1 Å with essentially the same results. When the influence radius was lowered, the folding success rate diminished, to 8% for a 13 Å influence radius (19,783 steps) and 26% for a 14 Å influence radius (13,854 steps). A representative folding animation (100 steps per frame) is given as Supporting Information Movie S1.
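The two fold-termination criteria described above (native contact occupancy above 95%, or distance-matrix RMSD below 1 Å) can be sketched as follows. The normalization of the distance-matrix RMSD is not spelled out in the text, so averaging over all residue pairs i < j is an assumption here, as are the function names.

```python
import numpy as np

def distance_matrix(coords):
    """All pairwise distances between residue positions (N x N)."""
    coords = np.asarray(coords, dtype=float)
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

def distance_matrix_rmsd(coords, native_coords):
    """Root-mean-squared difference between two distance matrices,
    averaged over residue pairs i < j (assumed normalization)."""
    diff = distance_matrix(coords) - distance_matrix(native_coords)
    upper = np.triu_indices(len(diff), k=1)
    return np.sqrt(np.mean(diff[upper] ** 2))

def is_folded(q, rmsd, q_min=0.95, rmsd_max=1.0, use_rmsd=False):
    """Fold termination: Q above 95%, or alternatively RMSD below 1 Angstrom."""
    return rmsd < rmsd_max if use_rmsd else q > q_min
```

Because it compares internal distances only, this RMSD is unchanged by rigid rotations and translations of either structure, so no superposition step is needed before testing for convergence.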

Chymotrypsin

Another well-studied folding pathway is that of the

chymotrypsin inhibitor 2, protein data bank (PDB)

accession 2CI2.39,46,49–51 This is a relatively small

molecule with 64 residues and a secondary structure progression: α1(12–24), β1(28–34), loop1(35–34), β2(45–52), β3(61–64). The DNM folding path contact

maps are shown in Figure 4. Of 100 random initial

noninteracting folds, 88 complete folding paths

resulted. NMR and MD simulations have established the α-helix and a central hydrophobic cluster (32–37) as the first contacts established in the folding

Figure 2. The average contact matrices for the barnase fold. Multiple independent folding runs starting from noncontacting random folds were performed and 48 of 100 reached the native conformation. The average contact matrices are shown at various Q values and the native contact matrix is shown at the bottom right. The N-terminal α-helices are the first secondary structures to appear. The fold nucleates about the β34 and β45 sheets, the first nonlocal secondary structure interactions. These plots show striking similarity to the MC simulations.47

Figure 3. The folding rate of barnase under DNM. Shown is a linear plot of the native contact fraction against the folding time. The data come from the 100 folding runs illustrated in Figure 2 and the error bars represent the standard deviation. As with all the folding pathways we have simulated, the folding rate is greater at the beginning and end of the fold.


path.49 The DNM simulations are consistent with

this scenario with the α1 and Leu32-Thr37 contacts emerging in the Q = 0.1 plot in Figure 4. One recurring feature of the simulated unfolding path of 2CI2 is the stability of the β12 sheet relative to the β23 sheet.39,49–51 Within our simulation, this contact cluster appears at Q = 0.3 but only emerges strongly at Q = 0.5 and after the β23 contacts are established. This may reflect an incomplete mirroring of the folding versus unfolding pathways. However, when the DNM folding simulation is run with the smaller influence radius of 13 Å, the β12 contacts emerge at Q = 0.1 and before the β23 cluster, see Figure 5. In this case, only 12% of the folding paths are complete and for folding to be successful, an early contact cluster that does not nucleate at residues close on the chain, the β12 cluster, needs to be established. Thus the influence radius can have an effect on the fold progression and we will see this later on for models of allosteric switching. A representative folding animation (100 steps per frame, Rc = 15 Å) is given as Supporting Information Movie S2.

Hydrolase

To illustrate the generality of this approach, we

folded the relatively large globular serine lipase

belonging to the hydrolase fold class, PDB accession

1TIB. This protein is an αβ-protein consisting of 269 residues. The folding pathway is illustrated in Figure 6. Multiple folding runs were carried out, and of 100 random starts, there were 26 complete folds, taking on average 28,435 iterations of the DNM equations of motion. The corresponding movie is given as Supporting Information. The striking feature here is the early emergence of the α-helices. The fold nucleation sites are predicted to be the β34 and β56 sheet contacts. The fold proceeds by the formation of two

Figure 4. The folding path of chymotrypsin. Multiple independent folding runs starting from noncontacting random folds were performed and 88 of 100 reached the native conformation. The average contact matrices are shown at various Q values and the native contact matrix is shown at the bottom right. The α-helix and the central hydrophobic cluster (HC) are the first secondary structure elements to emerge followed by the β23 sheet. The well-established stability of the β12 contacts during unfolding is only partially mirrored in the folding simulation as this contact cluster emerges after the antiparallel β23 contacts.

Figure 5. Early emergence of β12 in the folding of chymotrypsin. When the influence radius is reduced to 13 Å, the β12 contacts emerge earlier than for Rc = 15 Å and they can be seen at Q = 0.1 in B and not in A for the larger influence radius. The folding rates for the two influence radii are shown in C. The two folding paths differ in the relative amounts of the fold achieved in the early and late periods of the fold. The error bars are the standard deviations from 88 folding runs for Rc = 15 Å and 12 folding runs for Rc = 13 Å.


distinct globular domains, with the C-terminal domain nucleating around the β10-11 sheet. A representative folding animation (100 steps per frame) is given as Supporting Information Movie S3.

Conformation Switch

AKE is an enzyme catalyzing the interconversion of

adenine nucleotides. On binding either AMP or ATP,

it undergoes a large conformational transition where

the nucleoside monophosphate binding (NMPbind)

domain and the LID domain fold into the large

CORE domain,52–55 see Figure 7(A). This transition has been studied experimentally by fluorescence resonance energy transfer and there is structural data

available on homologous proteins that have struc-

tures lying between the opened and closed conforma-

tions. There have been many coarse-grained model

studies of the AKE structural transition pathway.

Some groups have developed iterative normal mode

interpolations, based on Cα and full-atom rigid substructure models.32–36 Seelinger et al.56 generate an

ensemble of possible deformations based on distance

constraints and show that the AKE transition over-

laps with a subset of deformations. Transition path-

ways have also been defined as saddle point solu-

tions of double-well potentials connecting the two

ENMs corresponding to the alternative conforma-

tions.57–59 The treatment closest to our approach is

the double-well PNM where the transition pathway

is defined as a minimal energy interpolation

Figure 6. The folding pathways for a larger protein. The 269-residue lipase serves as a good illustration of the DNM methodology. Of 100 folding runs, 26 resulted in native conformations and the average contact matrices at various Q fractions are given together with the native contact matrix center right. From this, it can be hypothesized that the β34 and β56 sheets constitute the folding nucleus as they are the first nonlocal (nonhelical) secondary structure to emerge. It is also apparent that the fold proceeds via the formation of N-terminal and C-terminal (nucleating at β10-11) globules, which eventually merge through the zipping up of the β47 and then the folding of the C-terminus onto the first helix. The fold nucleating β-sheets are highlighted in gray in the ribbon diagram below.

Figure 7. The allosteric switch in AKE. The allosteric switch in adenylate kinase can be modeled with an asymmetric double-well potential. The opened and closed conformations corresponding to PDB accessions 4AKE(A) and 1AKE(A) are shown in A, with the NMPbind and LID domains highlighted in gray. The potential function is plotted at the bottom for various asymmetry parameters. The protein starts in the top minimum and rolls into the bottom minimum. The RMSDs to the opened and closed conformations are plotted against transition time in B. The transition time increases with decreasing double-well barrier asymmetry, a, and eventually there is no transition. The transition path switches between the NMPbind closing first, C, at Rc ≥ 10.5 Å and the LID closing first, D, for Rc < 10.5 Å.


between the two minima representing the open and

closed configurations.37,38 Both treatments with the

simple double-well potential find the LID region

closing before the NMPbind region. Maragakis and

Karplus37 show that AKE approaches intermediate

structures along the transition path. A more com-

plex potential used in the double-well network model

(DWNM) of Chu and Voth38 results in two minimal

free-energy paths depending on the minimization

procedure and one of these has the NMPbind region

closing before the LID region. This latter pathway

has also been shown in recent simulation studies to

characterize the unbound as opposed to bound AKE

closure.60,61

Structural transitions can be modeled within the

DNM with an asymmetric double well. The opened

and closed conformations of AKE correspond to PDB

accessions 4AKE (chain A) and 1AKE (chain A),

respectively. The double-well potential is the same as

described above37,38 with an additional asymmetry

factor such that a protein initially in the high-energy

minimum will over time slip out of this state and

migrate to the lower minimum, see Methods Eqs. (13)

and (14). As expected, the transition is faster for

larger influence radii and requires a minimal asym-

metry parameter. The transition path in terms of the

RMSD between the intermediate fold and the terminal conformations is given in Figure 7(B). The transition path time increases with decreasing asymmetry parameter a; below a = 0.66, the transition no longer occurs. To investigate the LID and NMPbind closure, we divide the residues of AKE into three regions: NMPbind (48–55), CORE (170–214), and LID (121–160). Following Chu and Voth,38 we follow the closure with the parameter $(d_o - d)/(d_o - d_c)$, where $d_{o/c}$ is the distance between the LID/NMPbind centroid and the CORE centroid in the opened/closed configuration, see Figure 7(C,D). Interestingly, for an influence radius of 10.5 Å and above, the NMPbind domain closes before the LID domain and this pathway is remarkably similar to that observed in the DWNM treatment.38 However, for influence radii below 10.5 Å, the LID domain closes before the NMPbind domain in agreement with the results of

the PNM treatment.37,38 This order swap does not

occur with varying the asymmetry parameter, Fig-

ure 7(C). As this pathway depends critically on the

influence radius, we looked to the crystal data to see

if the B-factors can be used to fix the influence radius through Eqs. (7) and (15). We find that for the closed structure, the correlation of fluctuations with B is maximal at Rc = 10 Å whereas for the opened conformation, this correlation is relatively unchanged over the range 9–20 Å. This does not allow us to unambiguously fix Rc and so we have to fall back on the conclusion in Chu and Voth38 and posit the coexistence of the two pathways. The folding animation for Rc = 15 Å (20 steps per frame) is given as Supporting Information Movie S4 and the animation for Rc = 10 Å (100 steps per frame) is given as Supporting Information Movie S5.
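The domain closure parameter used above, (d_o − d)/(d_o − d_c), can be computed from centroid distances as in the sketch below. It assumes that coordinates are stored as an N x 3 array indexed by residue number starting at 1 with no gaps; the function names are hypothetical, and the residue ranges are those quoted in the text.

```python
import numpy as np

def centroid(coords, first, last):
    """Centroid of residues first..last (1-based, inclusive)."""
    return np.asarray(coords, dtype=float)[first - 1:last].mean(axis=0)

def closure(coords, open_coords, closed_coords, domain, core=(170, 214)):
    """Closure parameter (d_o - d) / (d_o - d_c): 0 in the opened
    conformation and 1 in the closed conformation."""
    def domain_to_core(c):
        return np.linalg.norm(centroid(c, *domain) - centroid(c, *core))
    d_o = domain_to_core(open_coords)
    d_c = domain_to_core(closed_coords)
    return (d_o - domain_to_core(coords)) / (d_o - d_c)

# Residue ranges as quoted in the text.
LID = (121, 160)
NMPBIND = (48, 55)
```

Evaluating `closure` for both `LID` and `NMPBIND` at each step of a transition run shows which domain reaches a value of 1 first, which is the ordering discussed in the text.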

Discussion

We have presented a DNM description of protein dynamics and applied it to protein folding and large-scale structural transitions. The DNM can be viewed

as a reduction of Go model dynamics or a simple

extension of the ENM. The model presented here dif-

fers from Go models in that the bonded and non-

bonded interactions are both modeled with a quad-

ratic potential that extends over a finite radius of

influence with the oscillations being damped by a fric-

tion term. Within ENM, the structure oscillates about

the native conformation and the results are in good

agreement with dynamical data encoded in the crystallographic B-factors. The ENM has also served as a powerful tool for pinning down allosteric couplings between protein residues and predicting possible ligand binding sites. Large-scale protein motions

associated with allosteric switches have been modeled

with an iterative ENM and a plastic extension of the

ENM, the PNM. Directly relevant to the present

study, protein unfolding has been modeled with an

iterative ENM strategy. Our coarse-grained DNM

model can be viewed as a further extension of these

approaches in that the protein evolves according to

the classical equations of motion. This is not a

straightforward energy minimization strategy that

would be beset by local energy minima but a model

where the protein has a kinetic component that is

continuously being dissipated through a friction term.

Consequently, the local minima stalling frequency is

relatively low as the protein is always sampling

nearby conformations away from the local minima.

The model has relatively few parameters and is easy

to implement. It appears that there is good agreement

with folding pathways that have been reported in the

literature, with striking similarity to the reported

folding paths of barnase and chymotrypsin. The

method can equally well be applied to proteins of arbi-

trary length, and we present the folding path of the

269 residue lipase as an example. Our methodology

for modeling structural transitions is an extension of

the PNM potential to include an asymmetry parame-

ter in the double-well potential so that under the clas-

sical equations of motion, the protein slips out of the

higher energy minimum and ends up in the lower

energy minimum. Again we illustrate and validate

the methodology with a well-studied system, the allo-

steric switch in AKE. In the structural transition of

AKE, there are two alternative pathway topologies

depending on which of the LID or NMPbind domains

closes first. We have shown that there is a switch

between the two pathways as the influence radius

passes through 10.5 Å and also defined the minimal

asymmetry parameter for the transition to occur.


All the results presented in this study are

sequence independent and this has proved to be a suc-

cessful simplification of the protein dynamics within

coarse-grained network models. However, the contri-

bution of each residue along the chain can be exam-

ined by varying the spring constant coupling and this

has been done for the allosteric switch in GroEL.62 In

our case, it will be interesting to see whether residue-

specific spring constant variation leads to different

folding pathways and/or success rates, and whether

the DNM methodology can then be extended to look

at mutational instability and protein misfolding. This

is the subject of a work in preparation (AJT and GW).

We conclude on a technical note. The ENM poten-

tial is closely related to a global minimization function

which is commonly exploited in solving the molecular

distance geometry problem (MDGP),63 which deter-

mines the native structure of a protein based only on

distance data between all atoms or a subset of atoms.

The approach described in this article can be exploited

to solve the general MDGP and related problems in a

global nonperturbative fashion that competes well with

other MDGP methods (AJT and GW, in preparation).

Methods

The ENM potential between two residues i and j is

given by

$$V_{ij} = \frac{1}{2} k_{ij} \Delta_{ij}^2, \tag{1}$$

where

$$\Delta_{ij} = d_{ij} - d_{ij}^0, \tag{2}$$

and

$$d_{ij} = \left| \vec{r}_i - \vec{r}_j \right|, \tag{3}$$

where $\vec{r}_i$ is the position vector of the ith Cα along the protein chain. The zero superscript refers to the stable native configuration about which the protein oscillates and $k_{ij}$ is the set of harmonic spring constants.

In the simplest incarnation of ENM, the spring

constants are taken to be residue independent.19,20 In

practice, the nearest neighbor separation must fluctuate over a narrow range restricted by the covalent bond architecture linking neighboring Cα atoms, so that we define

$$k_{i,i\pm 1} = 100k, \qquad k_{ij,\,|i-j|>1} = k. \tag{4}$$

In the absence of this relative rigidity in the

neighboring atom fluctuations, the intermediate fold

conformations have unphysically large nearest

neighbor separations. Interestingly, the folding effi-

ciency is also greatly enhanced when nearest neigh-

bor atoms are forced to stay within physical bounds.

Only residues within a given sphere of influence, $R_c$, can be considered to physically interact with each other, so that the ENM spring constant vanishes for well-separated residues:

$$k_{ij}\left(d_{ij} > R_c\right) = 0. \tag{5}$$
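As a minimal sketch of how Eqs. (1)–(5) combine, the Python function below evaluates the network potential for a set of Cα coordinates. The function name `enm_potential`, the sum over pairs with i < j, and the choice to apply the cutoff on the current distance to every pair (chain neighbors included) are assumptions made for illustration and are not taken from the authors' C programs.

```python
import math

def enm_potential(coords, native, k=1.0, rc=15.0):
    """Sum of pair terms 0.5 * k_ij * (d_ij - d0_ij)**2 over pairs i < j.

    coords, native: sequences of (x, y, z) tuples for the current and
    native C-alpha positions. k and rc are illustrative defaults.
    """
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if d > rc:
                continue  # Eq. (5): no spring beyond the influence radius
            d0 = math.dist(native[i], native[j])
            k_ij = 100.0 * k if j - i == 1 else k  # Eq. (4): stiff chain bonds
            total += 0.5 * k_ij * (d - d0) ** 2     # Eqs. (1)-(2)
    return total
```

By construction the potential is zero at the native conformation, and a pair contributes nothing once its current separation exceeds `rc`.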

The dynamics is partly driven by a repulsive term when $d_{ij} \le R_c$ and $d_{ij}^{0} > R_c$. For large $R_c$, it could be argued that there is an unphysically large repulsive term. However, when the repulsive contribution is rendered constant by scaling $k$ by $\left(R_c / d_{ij}^{0}\right)^{2}$ for $d_{ij} \le R_c$ and $d_{ij}^{0} > R_c$, we obtain essentially the same folding and transition pathways.

Provided fluctuations are small about the equilibrium conformation, we can approximate the potential to quadratic order. Here, $V \approx V_0 + \tfrac{1}{2} \sum_{ij,lm} \delta r_i^{l} H_{ij}^{lm} \delta r_j^{m}$, where $H$ is the Hessian matrix and $\delta r_i^{l}$ are the fluctuations about the native conformation.

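This now enables us to exactly solve for the thermodynamic fluctuations

$$C_{ij}^{lm} = \left\langle \delta r_i^{l}\, \delta r_j^{m} \right\rangle = k_B T \left(H^{-1}\right)_{ij}^{lm}, \tag{6}$$

where $k_B$ is Boltzmann's constant and $T$ is temperature, and thus relate the parameters of the theory to the crystallographic temperature factors, defined as $B_i = \tfrac{8\pi^{2}}{3} \left\langle \delta r_i^{2} \right\rangle$.64 Specifically, the spring constant is then given by

$$k = \frac{8\pi^{2} k_B T \left\langle \sum_{l} \left(H^{-1}\right)_{ii}^{ll}(k=1) \right\rangle}{3 \left\langle B_i \right\rangle}, \tag{7}$$

where brackets refer to averaging over the protein chain.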
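The step from Eq. (6) to Eq. (7) can be spelled out as follows. This is a sketch of the intermediate algebra, not taken from the original text; it assumes that every spring constant is proportional to the single scale $k$, as in Eq. (4), so that the Hessian scales linearly with $k$, and that $H^{-1}$ is understood as the inverse on the non-rigid-body modes (the usual pseudoinverse convention in elastic network models).

$$
\begin{aligned}
\left\langle \delta r_i^{2} \right\rangle &= \sum_{l} C_{ii}^{ll} = k_B T \sum_{l} \left(H^{-1}\right)_{ii}^{ll}, \\
H(k) = k\,H(k=1) \;&\Rightarrow\; \left(H^{-1}\right)_{ii}^{ll}(k) = \frac{1}{k} \left(H^{-1}\right)_{ii}^{ll}(k=1), \\
B_i = \frac{8\pi^{2}}{3} \left\langle \delta r_i^{2} \right\rangle &= \frac{8\pi^{2} k_B T}{3k} \sum_{l} \left(H^{-1}\right)_{ii}^{ll}(k=1).
\end{aligned}
$$

Averaging both sides over the chain and solving for $k$ reproduces Eq. (7).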

Now we will consider the coarse-grained protein model as a deterministic classical system governed by the equations of motion, $m \ddot{\vec{r}} = -\partial V / \partial \vec{r}$. Explicitly:

$$\ddot{\vec{r}}_i = -\sum_{j \neq i} \frac{2}{m} k_{ij}\, \hat{d}_{ij}\, \Delta_{ij}, \tag{8}$$

where the "hat" refers to the unit vector. To dissipate the kinetic energy component, we introduce a friction or damping term, $\lambda$, into the equations of motion:

$$\ddot{\vec{r}}_i = -\sum_{j \neq i} \frac{2}{m} k_{ij}\, \hat{d}_{ij}\, \Delta_{ij} - \lambda \dot{\vec{r}}_i. \tag{9}$$

These are the DNM equations of motion and, apart from the native target structure, the solutions will depend on the spring constant per unit mass, the damping term, and the influence radius. Their values are discussed below.

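In practice, the differential equations, Eq. (9), are solved by the symplectic Euler method65

$$
\begin{aligned}
\vec{v}_{n+1} &= \vec{v}_n - \delta \left( \vec{\partial} V(\vec{r}_n) + \lambda \vec{v}_n \right), \\
\vec{r}_{n+1} &= \vec{r}_n + \delta\, \vec{v}_{n+1},
\end{aligned} \tag{10}
$$

where $n$ is the iteration number and $\delta$ is the time increment.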
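The update rule of Eq. (10) can be illustrated with a short Python sketch. It is not the authors' C implementation: the function names, the three-node toy chain in the usage note, and the convention of looping over pairs i < j and applying equal and opposite accelerations are assumptions for illustration. The defaults follow the parameter values quoted in this section (spring constant per unit mass 1/2, time increment 0.01, damping term 1, influence radius 15 Å).

```python
import math

def dnm_accelerations(pos, native, k_over_m=0.5, rc=15.0):
    """Spring accelerations of Eq. (8): -(2 k_ij / m) * unit(r_i - r_j) * Delta_ij."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(pos[i], pos[j])
            if d > rc or d == 0.0:
                continue  # Eq. (5) cutoff; also skip coincident nodes
            d0 = math.dist(native[i], native[j])
            k_ij = (100.0 if j - i == 1 else 1.0) * k_over_m  # Eq. (4)
            scale = 2.0 * k_ij * (d - d0) / d
            for c in range(3):
                f = scale * (pos[i][c] - pos[j][c])
                acc[i][c] -= f  # pulls i towards j when the pair is stretched
                acc[j][c] += f  # equal and opposite on j
    return acc

def symplectic_euler(start, native, steps, dt=0.01, damping=1.0):
    """Iterate Eq. (10): update velocities first, then positions with the new velocities."""
    pos = [list(p) for p in start]
    vel = [[0.0, 0.0, 0.0] for _ in pos]
    for _ in range(steps):
        acc = dnm_accelerations(pos, native)
        for i in range(len(pos)):
            for c in range(3):
                vel[i][c] += dt * (acc[i][c] - damping * vel[i][c])
                pos[i][c] += dt * vel[i][c]
    return pos
```

For example, calling `symplectic_euler` on a bent three-node chain with native coordinates `[(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0)]` relaxes the pair distances towards their native values while the friction term removes the kinetic energy.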

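When the protein can adopt two native conformations "a" and "b," the potential will be either $\sum_{ij} \tfrac{1}{2} k_{ij} \left(\Delta_{ij}^{a}\right)^{2}$ or $\sum_{ij} \tfrac{1}{2} k_{ij} \left(\Delta_{ij}^{b}\right)^{2}$ depending on which is minimal, where $\Delta_{ij}^{a,b} = \left| d_{ij} - d_{ij}^{a,b} \right|$ and $d_{ij}^{a,b}$ correspond to the Cα distances for the two conformations. That is,

$$V = \sum_{ij} \tfrac{1}{4} k_{ij} \left( \left(\Delta_{ij}^{a}\right)^{2} + \left(\Delta_{ij}^{b}\right)^{2} - \left| \left(\Delta_{ij}^{a}\right)^{2} - \left(\Delta_{ij}^{b}\right)^{2} \right| \right). \tag{11}$$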

To ensure smooth interpolation between the two minima, we introduce a small parameter $\varepsilon$:37,38

$$V = \sum_{ij} \tfrac{1}{4} k_{ij} \left( \left(\Delta_{ij}^{a}\right)^{2} + \left(\Delta_{ij}^{b}\right)^{2} - \sqrt{ \left( \left(\Delta_{ij}^{a}\right)^{2} - \left(\Delta_{ij}^{b}\right)^{2} \right)^{2} + \varepsilon^{2} } \right). \tag{12}$$
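Equations (11) and (12) rest on the identity min(x, y) = (x + y − |x − y|)/2, with the absolute value replaced by a square root that stays smooth where the two wells cross. A minimal Python sketch of a single pair term is given below; the names `smooth_min` and `double_well_pair` are hypothetical, and the default smoothing value 0.1 is the one quoted in the parameterization.

```python
import math

def smooth_min(x, y, eps=0.1):
    """Smoothed minimum: equals min(x, y) when eps == 0."""
    return 0.5 * (x + y - math.sqrt((x - y) ** 2 + eps ** 2))

def double_well_pair(d, d_a, d_b, k=1.0, eps=0.1):
    """One pair term of Eq. (12) for current distance d and native distances d_a, d_b."""
    x = (d - d_a) ** 2  # squared deviation from conformation a
    y = (d - d_b) ** 2  # squared deviation from conformation b
    return 0.5 * k * smooth_min(x, y, eps)
```

Far from the crossing point the pair term follows whichever single-well potential is lower; at the crossing (x = y) it lies below the common value by k·eps/4.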

For a transition to occur within the context of classical trajectories, we introduce a destabilising term that favors one minimum over the other. The simplest way to do this is via an asymmetry parameter $a$. Specifically,

$$V = \sum_{ij} \tfrac{1}{4} k_{ij} \left( \left(\Delta_{ij}^{a}\right)^{2} + \left(\Delta_{ij}^{b}\right)^{2} + a - \sqrt{ \left( \left(\Delta_{ij}^{a}\right)^{2} - \left(\Delta_{ij}^{b}\right)^{2} + a \right)^{2} + \varepsilon^{2} } \right), \tag{13}$$

with the double-well classical equations of motion given by

$$\ddot{\vec{r}}_i = -\sum_{j \neq i} \frac{1}{m} k_{ij}\, \hat{d}_{ij} \left( \Delta_{ij}^{a} + \Delta_{ij}^{b} - \frac{ \left( \Delta_{ij}^{a} - \Delta_{ij}^{b} \right) \left( \left(\Delta_{ij}^{a}\right)^{2} - \left(\Delta_{ij}^{b}\right)^{2} + a \right) }{ \sqrt{ \left( \left(\Delta_{ij}^{a}\right)^{2} - \left(\Delta_{ij}^{b}\right)^{2} + a \right)^{2} + \varepsilon^{2} } } \right) - \lambda \dot{\vec{r}}_i. \tag{14}$$
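The bracket in Eq. (14) is the derivative of one pair term of Eq. (13) with respect to the pair distance, which can be checked numerically. The Python sketch below does this for a single pair; the function names are hypothetical, and it assumes the signed separations d − d_a and d − d_b (as in Eq. (2)), since only their squares enter the potential while the force needs their sign.

```python
import math

def pair_potential(d, d_a, d_b, a=0.0, eps=0.1, k=1.0):
    """One pair term of Eq. (13) as a function of the current distance d."""
    da, db = d - d_a, d - d_b
    s = da * da - db * db + a
    return 0.25 * k * (da * da + db * db + a - math.sqrt(s * s + eps * eps))

def pair_dV_dd(d, d_a, d_b, a=0.0, eps=0.1, k=1.0):
    """Analytic derivative of pair_potential with respect to d (bracket of Eq. 14 times k/2)."""
    da, db = d - d_a, d - d_b
    s = da * da - db * db + a
    return 0.5 * k * (da + db - (da - db) * s / math.sqrt(s * s + eps * eps))

def finite_difference(d, d_a, d_b, a=0.0, eps=0.1, k=1.0, h=1e-6):
    """Central-difference estimate of the same derivative, for comparison."""
    return (pair_potential(d + h, d_a, d_b, a, eps, k)
            - pair_potential(d - h, d_a, d_b, a, eps, k)) / (2.0 * h)
```

With this sign convention, in the limit of vanishing smoothing the pair term reduces to (k/2)·min((d − d_a)² + a, (d − d_b)²), so a positive `a` raises the "a" well relative to the "b" well.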

Of course, an isolated residue sitting in the high-energy minimum will not end up in the lower minimum, as there is an energy barrier to overcome. However, for an allosteric transition, there is in general a set of residues within the protein for which the transition path is short enough for the barrier to vanish, and it is these residues that drive the transition. It will be shown that transitions only occur for a large enough asymmetry.

We now deal with the parameterization of the model. Folding within the DNM is achieved by initiating the protein chain in a random, completely noninteracting configuration and then following the trajectories of the Cα nodes under the equations of motion. At each stage, only those nodes within the influence radius are visible to the individual residues, Eq. (5), and as the protein collapses, more residues come within the influence radius and there is a build-up of kinetic energy that is dissipated via a friction term that models the water environment, Eq. (9).

Based on protein interaction statistical potential models (see, for example, Refs. 66,67), we take the influence radius to be 15 Å. For larger radii, the protein will fold faster, but such long-range interactions are unphysical. Within ENM, the radius of influence is chosen to maximize the correlation of the normal mode fluctuation magnitudes with the crystallographic temperature B-factors.19,20 However, the correlation varies with structure, and we find maximal correlations for chymotrypsin inhibitor 2 (2CI2) at 15 Å, barnase (1A2P) at 7 Å, and hydrolase (1TIB) at 12–20 Å; for AKE, the correlation is maximal at 10 Å for the closed conformation (1AKE) and at 9–20 Å for the open conformation (4AKE). Without crystallographic data on the intermediate fold conformations, it is difficult to derive $R_c$ from ENM analysis and we are forced to rely on potential model parameters. We find that if the influence radius falls below 15 Å, then the folding probability is substantially reduced and the folding time increased. However, as will be shown below, when considering DNM models of allosteric transition, the transition path can undergo a dramatic change with a lower $R_c$. We take the spring constant per unit mass to be 1/2, which determines the time scale of the problem. Statistical thermodynamics allows us to fix the parameters of the theory based on crystallographic B-factors, Eq. (7). So, with our choice of spring constant per unit mass, we have time units of $t = \sqrt{ \dfrac{3 m \langle B \rangle}{16 \pi^{2} k_B T \left\langle H^{-1}(k=1) \right\rangle} }$, where $m$ is the mass of the resonating node, in our case, that of the amino acid. In the numerical iterations, Eq. (10), we set the time increment to be 0.01. The damping term is set to unity. For a small damping term, the iteration becomes unstable, and for a large damping term, the fold is slow. For the double-well potential, we set the smoothing factor, Eq. (12), to be 0.1 and perform simulations with various choices of asymmetry parameter, $a$. All programs were written in C and run on a PC. Movies were generated in gif format by importing folding coordinate frames into Maple (Waterloo Maple, Maplesoft). Structure images were generated by DS ViewerPro software (Accelrys).
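The quoted time unit follows from the choice of spring constant per unit mass together with Eq. (7). The short derivation below is a sketch added for clarity; it reads $\langle H^{-1}(k=1) \rangle$ as shorthand for the chain average $\left\langle \sum_{l} \left(H^{-1}\right)_{ii}^{ll}(k=1) \right\rangle$ that appears in Eq. (7), which is an assumption about the notation.

$$
\begin{aligned}
\frac{k\,t^{2}}{m} = \frac{1}{2} \;&\Rightarrow\; t = \sqrt{\frac{m}{2k}}, \\
k = \frac{8\pi^{2} k_B T \left\langle H^{-1}(k=1) \right\rangle}{3 \langle B \rangle} \;&\Rightarrow\; t = \sqrt{ \frac{3 m \langle B \rangle}{16 \pi^{2} k_B T \left\langle H^{-1}(k=1) \right\rangle} }.
\end{aligned}
$$

In these units the time increment of 0.01 and the unit damping term are dimensionless multiples of $t$ and $1/t$, respectively.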

References

1. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM (1997) CATH – a hierarchic classification of protein domain structures. Structure 5:1093–1108.

2. Lo Conte L, Ailey B, Hubbard TJ, Brenner SE, Murzin AG, Chothia C (2000) SCOP: a structural classification of proteins database. Nucleic Acids Res 28:257–259.

3. Monod J, Wyman J, Changeux JP (1965) On the nature of allosteric transitions: a plausible model. J Mol Biol 12:88–118.

4. Koshland DE, Jr, Nemethy G, Filmer D (1966) Comparison of experimental binding data and theoretical models in proteins containing subunits. Biochemistry 5:365–385.

5. del Sol A, Tsai CJ, Ma B, Nussinov R (2009) The origin of allosteric functional modulation: multiple pre-existing pathways. Structure 17:1042–1050.

6. Tsai CJ, Del Sol A, Nussinov R (2009) Protein allostery, signal transmission and dynamics: a classification scheme of allosteric mechanisms. Mol Biosyst 5:207–216.

7. Sanchez-Ruiz JM (2010) Protein kinetic stability. Biophys Chem 148:1–15.

8. Alm E, Baker D (1999) Matching theory and experiment in protein folding. Curr Opin Struct Biol 9:189–196.

9. Perl D, Welker C, Schindler T, Schroder K, Marahiel MA, Jaenicke R, Schmid FX (1998) Conservation of rapid two-state folding in mesophilic, thermophilic and hyperthermophilic cold shock proteins. Nat Struct Biol 5:229–235.

10. Taketomi H, Ueda Y, Go N (1975) Studies on protein folding, unfolding and fluctuations by computer simulation. I. The effect of specific amino acid sequence represented by specific inter-unit interactions. Int J Pept Protein Res 7:445–459.

11. Abe H, Go N (1981) Noninteracting local-structure model of folding and unfolding transition in globular proteins. II. Application to two-dimensional lattice proteins. Biopolymers 20:1013–1031.

12. Go N, Abe H (1981) Noninteracting local-structure model of folding and unfolding transition in globular proteins. I. Formulation. Biopolymers 20:991–1011.

13. Hills RD, Jr, Lu L, Voth GA (YEAR) Multiscale coarse-graining of the protein energy landscape. PLoS Comput Biol 6:e1000827.

14. Karanicolas J, Brooks CL, III (2002) The origins of asymmetry in the folding transition states of protein L and protein G. Protein Sci 11:2351–2361.

15. Best RB, Hummer G (YEAR) Coordinate-dependent diffusion in protein folding. Proc Natl Acad Sci USA 107:1088–1093.

16. Sulkowska JI, Cieplak M (2008) Selection of optimal variants of Go-like models of proteins through studies of stretching. Biophys J 95:3174–3191.

17. Bereau T, Deserno M (2009) Generic coarse-grained model for protein folding and aggregation. J Chem Phys 130:235106.

18. Tirion MM (1996) Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Phys Rev Lett 77:1905–1908.

19. Bahar I, Atilgan AR, Erman B (1997) Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Fold Des 2:173–181.

20. Haliloglu T, Bahar I, Erman B (1997) Gaussian dynamics of folded proteins. Phys Rev Lett 79:3090–3093.

21. Klepeis JL, Lindorff-Larsen K, Dror RO, Shaw DE (2009) Long-timescale molecular dynamics simulations of protein structure and function. Curr Opin Struct Biol 19:120–127.

22. Bahar I, Jernigan RL (1997) Inter-residue potentials in globular proteins and the dominance of highly specific hydrophilic interactions at close separation. J Mol Biol 266:195–214.

23. Balabin IA, Yang W, Beratan DN (2009) Coarse-grained modeling of allosteric regulation in protein receptors. Proc Natl Acad Sci USA 106:14253–14258.

24. Williams G (2010) Elastic network model of allosteric regulation in protein kinase PDK1. BMC Struct Biol 10:11.

25. Ming D, Wall ME (2005) Allostery in a coarse-grained model of protein dynamics. Phys Rev Lett 95:198103.

26. Ming D, Cohn JD, Wall ME (2008) Fast dynamics perturbation analysis for prediction of protein functional sites. BMC Struct Biol 8:5.

27. Ming D, Wall ME (2005) Quantifying allosteric effects in proteins. Proteins 59:697–707.

28. Ming D, Wall ME (2006) Interactions in native binding sites cause a large change in protein dynamics. J Mol Biol 358:213–223.

29. Yang L, Song G, Carriquiry A, Jernigan RL (2008) Close correspondence between the motions from principal component analysis of multiple HIV-1 protease structures and elastic network modes. Structure 16:321–330.

30. Yang L, Song G, Jernigan RL (2007) How well can we understand large-scale protein motions using normal modes of elastic network models? Biophys J 93:920–929.

31. Xu C, Tobi D, Bahar I (2003) Allosteric changes in protein structure computed by a simple mechanical model: hemoglobin T<-->R2 transition. J Mol Biol 333:153–168.

32. Schuyler AD, Jernigan RL, Qasba PK, Ramakrishnan B, Chirikjian GS (2009) Iterative cluster-NMA: a tool for generating conformational transitions in proteins. Proteins 74:760–776.

33. Kirillova S, Cortes J, Stefaniu A, Simeon T (2008) An NMA-guided path planning approach for computing large-amplitude conformational changes in proteins. Proteins 70:131–143.

34. Korkut A, Hendrickson WA (2009) Computation of conformational transitions in proteins by virtual atom molecular mechanics as validated in application to adenylate kinase. Proc Natl Acad Sci USA 106:15673–15678.

35. Miyashita O, Onuchic JN, Wolynes PG (2003) Nonlinear elasticity, proteinquakes, and the energy landscapes of functional transitions in proteins. Proc Natl Acad Sci USA 100:12570–12575.

36. Feng Y, Yang L, Kloczkowski A, Jernigan RL (2009) The energy profiles of atomic conformational transition intermediates of adenylate kinase. Proteins 77:551–558.

37. Maragakis P, Karplus M (2005) Large amplitude conformational change in proteins explored with a plastic network model: adenylate kinase. J Mol Biol 352:807–822.

38. Chu JW, Voth GA (2007) Coarse-grained free energy functions for studying protein conformational changes: a double-well network model. Biophys J 93:3860–3871.

39. Su JG, Li CH, Hao R, Chen WA, Wang CX (2008) Protein unfolding behavior studied by elastic network model. Biophys J 94:4586–4596.

40. Daura X, Jaun B, Seebach D, van Gunsteren WF, Mark AE (1998) Reversible peptide folding in solution by molecular dynamics simulation. J Mol Biol 280:925–932.

41. Elofsson A, Le Grand SM, Eisenberg D (1995) Local moves: an efficient algorithm for simulation of protein folding. Proteins 23:73–82.

42. Huse M, Kuriyan J (2002) The conformational plasticity of protein kinases. Cell 109:275–282.


43. Burra PV, Zhang Y, Godzik A, Stec B (2009) Global distribution of conformational states derived from redundant models in the PDB points to non-uniqueness of the protein structure. Proc Natl Acad Sci USA 106:10505–10510.

44. Fersht AR (1993) The sixth Datta Lecture. Protein folding and stability: the pathway of folding of barnase. FEBS Lett 325:5–16.

45. Matthews JM, Fersht AR (1995) Exploring the energy surface of protein folding by structure-reactivity relationships and engineered proteins: observation of Hammond behavior for the gross structure of the transition state and anti-Hammond behavior for structural elements for unfolding/folding of barnase. Biochemistry 34:6805–6814.

46. Li A, Daggett V (1998) Molecular dynamics simulation of the unfolding of barnase: characterization of the major intermediate. J Mol Biol 275:677–694.

47. Shinoda K, Takahashi K (2007) Retention of local conformational compactness in unfolding of barnase; contribution of end-to-end interactions within quasi-modules. Biophysics 3:1–12.

48. Gohlke H, Thorpe MF (2006) A natural coarse graining for simulating large biomolecular motion. Biophys J 91:2115–2120.

49. Kazmirski SL, Wong KB, Freund SM, Tan YJ, Fersht AR, Daggett V (2001) Protein folding from a highly disordered denatured state: the folding pathway of chymotrypsin inhibitor 2 at atomic resolution. Proc Natl Acad Sci USA 98:4349–4354.

50. Ozkan SB, Dalgyn GS, Haliloglu T (2004) Unfolding events of Chymotrypsin Inhibitor 2 (CI2) revealed by Monte Carlo (MC) simulations and their consistency from structure-based analysis of conformations. Polymer 45:581–595.

51. Li L, Shakhnovich EI (2001) Constructing, verifying, and dissecting the folding transition state of chymotrypsin inhibitor 2 with all-atom simulations. Proc Natl Acad Sci USA 98:13014–13018.

52. Dzeja PP, Zeleznikar RJ, Goldberg ND (1998) Adenylate kinase: kinetic behavior in intact cells indicates it is integral to multiple cellular processes. Mol Cell Biochem 184:169–182.

53. Muller CW, Schlauderer GJ, Reinstein J, Schulz GE (1996) Adenylate kinase motions during catalysis: an energetic counterweight balancing substrate binding. Structure 4:147–156.

54. Muller CW, Schulz GE (1992) Structure of the complex between adenylate kinase from Escherichia coli and the inhibitor Ap5A refined at 1.9 Å resolution. A model for a catalytic transition state. J Mol Biol 224:159–177.

55. Whitford PC, Onuchic JN, Wolynes PG (2008) Energy landscape along an enzymatic reaction trajectory: hinges or cracks? HFSP J 2:61–64.

56. Seelinger D, Haas J, de Groot BL (2007) Geometry-based sampling of conformational transitions in proteins. Structure 15:1482–1492.

57. Zheng W, Brooks BR, Hummer G (2007) Protein conformational transitions explored by mixed elastic network models. Proteins 69:43–57.

58. Tekpinar M, Zheng W (YEAR) Predicting order of conformational changes during protein conformational transitions using an interpolated elastic network model. Proteins 78:2469–2481.

59. Franklin J, Koehl P, Doniach S, Delarue M (2007) MinActionPath: maximum likelihood trajectory for large-scale structural transitions in a coarse-grained locally harmonic energy landscape. Nucleic Acids Res 35:W477–W482.

60. Beckstein O, Denning EJ, Perilla JR, Woolf TB (2009) Zipping and unzipping of adenylate kinase: atomistic insights into the ensemble of open<-->closed transitions. J Mol Biol 394:160–176.

61. Daily MD, Phillips GN, Jr, Cui Q (YEAR) Many local motions cooperate to produce the adenylate kinase conformational transition. J Mol Biol 400:618–631.

62. Zheng W, Brooks BR, Thirumalai D (2007) Allosteric transitions in the chaperonin GroEL are captured by a dominant normal mode that is most robust to sequence variations. Biophys J 93:2289–2299.

63. Grosso A, Locatelli M, Schoen F (2009) Solving molecular distance geometry problems by global optimization algorithms. Comput Optim Appl 43:23–37.

64. Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Keskin O, Bahar I (2001) Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys J 80:505–515.

65. Press WH, Teukolsky SA, Vetterling W, Flannery BP (2007) Numerical recipes: The art of scientific computing, Third Edition. New York, NY: Cambridge University Press.

66. Zhou H, Zhou Y (2002) Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci 11:2714–2726.

67. Devane R, Shinoda W, Moore PB, Klein ML (2009) A transferable coarse grain non-bonded interaction model for amino acids. J Chem Theory Comput 5:2115–2124.

