Protein folding pathways and state transitions described by classical equations of motion of an elastic network model
Gareth Williams1* and Andrew J. Toon2
1Wolfson Centre for Age-Related Diseases, Kings College London, London Bridge, London SE1 1UL, United Kingdom
2School of Science and Technology, SIM University, Singapore
Received 10 June 2010; Revised 15 September 2010; Accepted 4 October 2010
DOI: 10.1002/pro.527
Published online 15 October 2010 proteinscience.org
Abstract: Protein topology defined by the matrix of residue contacts has proved to be a fruitful
basis for the study of protein dynamics. The widely implemented coarse-grained elastic network model of backbone fluctuations has been used to describe crystallographic temperature factors,
allosteric couplings, and some aspects of the folding pathway. In the present study, we develop a
model of protein dynamics based on the classical equations of motion of a damped network model (DNM) that describes the folding path from a completely unfolded state to the native conformation
through a single-well potential derived purely from the native conformation. The kinetic energy
gained through the collapse of the protein chain is dissipated through a friction term in the equations of motion that models the water bath. This approach is completely general and
sufficiently fast that it can be applied to large proteins. Folding pathways for various proteins of
different classes are described and shown to correlate with experimental observations and molecular dynamics and Monte Carlo simulations. Allosteric transitions between alternative protein
structures are also modeled within the DNM through an asymmetric double-well potential.
Keywords: damped harmonic oscillator; allosteric regulation; normal mode analysis; vibrational
dynamics
Introduction
It has long been the goal of biologists and physicists
to relate the protein sequence to the protein fold.
The mapping of sequence space onto fold space is
problematic because even though most homologous
proteins fold in a similar fashion, there are proteins
of similar shape that can have little sequence simi-
larity.1,2 Moreover, many proteins function through
adopting multiple conformations and are allosteri-
cally regulated by binding events.3–6 Also, protein
misfolding, which underlies many disease states, can
be triggered by relatively minor sequence changes.7
However, the structural transitions that a protein
goes through during folding, referred to as the fold-
ing path, appear to be largely defined by the final
structure8 even when two proteins with little
sequence similarity fold to a similar structure.9
Because the folding path is largely encoded by the
native state topology, it can be studied in a
sequence-independent manner.
Additional Supporting Information may be found in the online version of this article.
Abbreviations: AKE, adenylate kinase; DNM, damped network model; DWNM, double-well network model; ENM, elastic network model; MC, Monte Carlo; MD, molecular dynamics; MDGP, molecular distance geometry problem; MFEP, minimal free-energy paths; NMPbind, nucleoside monophosphate binding; NMR, nuclear magnetic resonance; PDB, protein data bank; PNM, plastic network model.
Grant sponsor: Wolfson CARD.
*Correspondence to: Gareth Williams, Wolfson Centre for Age-Related Diseases, Kings College London, London Bridge, London SE1 1UL, United Kingdom. E-mail: [email protected]
Published by Wiley-Blackwell. © 2010 The Protein Society. PROTEIN SCIENCE 2010 VOL 19:2451–2461
Native structure-based folding dynamics has been
modeled by molecular dynamics (MD) simulations with
interactions defined by Go potentials.10–12 In these
models, the potential is parameterized by the native
conformation which is defined at various levels of
coarse-grained reduction.13 Go models, in their simplest
incarnation, model the Cα chain with non-neighboring
atoms moving under a Lennard-Jones-like potential
parameterized by the native conformation and a power
law potential for the more rigid covalent bond interactions.
These models have been extended to include statistical
potentials modeling pseudo-torsion angles14,15
and have also included Cβ atoms.16,17 In the limit where
the fold dynamics are encoded purely by the native
residue contacts and the potential is a simple
quadratic function with an influence-radius cut-off,
one obtains the elastic network model (ENM).18–20 The
ENM can be viewed as a simple dynamical extension of
the protein fold topology encoded by the matrix of residue
contacts. Such coarse-grained reductions of the
protein chain greatly speed up dynamical simulations.
In contrast, full-atom MD simulations21 are currently
limited to time frames that fall well below the folding
times for large proteins, and it is difficult to see how
residue-independent folding characteristics can
naturally emerge from such treatments.
In the ENM, the protein is represented by a Cα backbone structure that undergoes coupled harmonic
oscillation with other Cα atoms within a defined
sphere of influence. The potential is approximated
by a quadratic fluctuation term, which can be diagonalized
to give eigenmodes that describe the linearly
independent oscillations of the protein. Fluctuation
correlations show excellent agreement with
crystallographic B-factors that measure Cα positional
uncertainty19,20,22 and have been successful in
describing allosteric couplings.23,24 Also, a perturba-
tive treatment of ENM has been applied to interac-
tion site prediction.25–28 In many cases, global struc-
tural transitions have conformational change vectors
correlating with the low-energy eigenvectors of the
ENM corresponding to one of the alternative confor-
mations.29–31 In addition to predicting allosteric cou-
plings, large-scale allosteric transition pathways
have been modeled by an iterative ENM normal
mode approach32–36 and within a plastic network
model (PNM), where the pathway is defined as the
minimum energy path interpolating between two
distinct ENM minima.37,38 Protein unfolding has
also been treated within the ENM through a pertur-
bative correlation analysis where, with each itera-
tion, the residue pair undergoing the largest fluctua-
tions has its contact broken.39 Unfolding pathways
are thought to mirror folding pathways,40 and the
analysis of Su et al.39 shows the contact matrix evolution
to be in broad agreement with experimental
and theoretical observations.
When modeling the folding pathway, it is no lon-
ger possible to apply the normal mode analysis of
ENM as the motions are no longer small oscillations
about the native state. As mentioned above, unfold-
ing can be investigated within an iterative ENM
model, but it is difficult to see how to apply this iter-
ative approach to folding. Alternatively, the full clas-
sical equations of motion of the ENM can be solved
numerically to simulate global folding. In particular,
we can start from a completely unstructured protein
represented by a random noncontacting Cα backbone
and use numerical iteration to follow the evolution
of the protein fold. However, this leads to an
unphysical rapidly escalating kinetic energy compo-
nent as new residues are continuously brought
within the influence radius. To resolve this problem,
we introduce a damped network model (DNM) with
a friction or damping term modeling energy dissipa-
tion to the water environment in which the protein
is folding. The resulting folding path, defined as a
series of structural transitions eventually converging
to the native configuration, has many features in
common with experimental observations and full
atom simulations. In contrast to the unfolding path-
way derived through sequential release of unstable
couplings,39 the present approach is not sequential
and many folding events occur throughout the pro-
tein at the same time, which is a more realistic
scenario.
Local minima along the folding pathway are a
common problem with folding simulations, and there
are instances with the DNM where the protein stalls
at a non-native structure. Within this deterministic
model, the stalling is a consequence of the initial
random conformation, but the space of initial config-
urations leading to folding is sufficiently large for
this not to be a critical problem. The reason for this
is that we are not following a simple steepest
descent minimization strategy but following a
vibrating polymer that is sampling configurations
away from the local minima. This is similar to the
role played by the temperature in Monte Carlo simulations,41
but we are not restricted to generating local
moves, and the protein folds all at once.
In many instances, proteins adopt multiple con-
formations and can dynamically switch between con-
formations. Conformational switches are intimately
related to protein function, where, for example, a
ligand binding event can result in a remote or allo-
steric conformational change in the protein that
leads to a downstream signaling event as the protein
is switched from an inactive to an active conformation.42
Such multiple conformations have been
studied extensively and have led some researchers
to further separate sequence from structure and
question the discretization of structural domains.43
The transition dynamics from one structure to
another can also be modeled by a network model but
with two discrete minima. In particular, the struc-
tural transition path of adenylate kinase (AKE) has
been studied within the context of a double-well
PNM where the transition pathway is defined as a
minimal energy interpolation between the two min-
ima representing the open and closed configura-
tions.37,38 We argue that this transition can also be
modeled with the classical equations of motion of a
double-well DNM. In particular, starting with the
protein in one potential minimum, we introduce an
asymmetry in the double-well potential, destabilizing
the protein and eventually leading it to fall into the
lower energy conformation. The transition dynamics
are parameterized by the environmental viscosity,
the influence radius, and the potential asymmetry,
which together trigger and dictate the nature of the
transition events.
We first present the DNM folding pathways for
two extensively studied proteins, barnase and chymotrypsin
inhibitor 2. The DNM method is not restricted to
small proteins, and we illustrate the folding path-
way for a larger protein, a serine lipase. The asym-
metric double-well version of DNM is then intro-
duced and shown to model the allosteric transition
in AKE. There follows a Discussion section. The
mathematical framework behind the DNM approach
is given in the Methods section at the end.
Results
There is a wealth of data on the intermediate states
that proteins adopt on the way to their native fold
or alternatively from the native fold to denatured
state.44 The most direct measurements come from
NMR experiments where one can follow the appear-
ance/disappearance of interactions with changing
temperature or environmental pH. This data can be
presented in the form of an evolving contact matrix
and we can make a direct comparison with our pathways.
Computationally intensive full-atom MD simulations
have shown good agreement with NMR
studies, at least for small proteins, and these can
also be compared with our approach. In what follows
we will look first at two small proteins that have
been extensively studied in terms of their folding
pathways and then we will describe the folding
pathways of a larger protein.
Barnase
Barnase is an αβ-protein that has been the subject
of many folding simulations and NMR experiments.44–47
The crystal structure consists of 108 residues and
the secondary structure consists of: α1(7–17), loop1(18–26), α2(27–33), loop2(34–41), α3(42–45), β1(49–56), loop3(57–68), β2(69–76), loop4(77–85),
β3(86–91), β4(95–99), β5(105–108). Starting with a
random noninteracting fold and solving the DNM
equations of motion, we obtain our predicted folding
path. A representative folding path is illustrated in
Figure 1 giving the Cα backbone structure at various
values of the native fold contact fraction, Q.
Native contacts were defined as between residues
that are more than three residues apart along the
chain and within 8 Å of each other. To get a full picture
of the folding pathway, we performed 100 folding
runs and calculated the average occupancy of
the contacts at various Q values. The average con-
tact matrices for various Q values are shown in Fig-
ure 2, and the evolution of the total native contact
fraction is shown in Figure 3. Fold evolution as a
function of Q is a description adopted in the MD
simulation study.47
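The contact definition and the Q fraction used above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names `contact_matrix` and `native_contact_fraction` are hypothetical, coordinates are assumed to be Cα positions in Å stored as an (N, 3) array, and "within 8 Å" is read as a distance of at most 8 Å.

```python
import numpy as np

def contact_matrix(coords, cutoff=8.0, min_separation=4):
    """Boolean contact matrix for C-alpha coordinates of shape (N, 3).

    Residues i and j are in contact when they are more than three
    residues apart along the chain (|i - j| >= 4) and at most
    `cutoff` angstroms from each other.
    """
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    idx = np.arange(len(coords))
    separated = np.abs(idx[:, None] - idx[None, :]) >= min_separation
    return (dist <= cutoff) & separated

def native_contact_fraction(coords, native_coords, cutoff=8.0, min_separation=4):
    """Fraction Q of native contacts that are present in `coords`."""
    native = contact_matrix(native_coords, cutoff, min_separation)
    current = contact_matrix(coords, cutoff, min_separation)
    n_native = native.sum()
    if n_native == 0:
        return 0.0  # no native contacts defined (degenerate input)
    return float((native & current).sum() / n_native)
```

Averaging `contact_matrix` outputs over the runs that reach a given Q value would give occupancy maps of the kind described in the text; how the runs are binned by Q is not specified in the source and would be a further assumption.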
Figure 1. The folding pathway for barnase. Cα stick
structures are shown for various stages of the fold. Fold
stages are measured by the fraction of native contacts, the
Q fraction, and contacts are defined as non-neighboring
residues within 8 Å of each other. The three N-terminal
α-helices emerge at Q = 0.1. The first nonlocal secondary
structure contacts constituting the fold nucleation sites are
the C-terminal β34 and β45 sheets emerging at Q = 0.3. At
Q = 0.5, we see the emergence of the α2:α3 contacts as
the C-terminal residues begin to acquire a globular
configuration. When the fold is 70% complete, there are
two distinct globular structures comprising the N-terminal
β-sheet quartet and the C-terminal α-helix bundle. Finally,
these two clusters join up through the zipping up of the
β12, then the loop4:α3, and subsequently the β23:α1 interactions.
The folding nucleation sites have been localized
to the β34 and β45 sheets through NMR and MD
simulations. It has also been shown that the α2, α3, and loop2 contacts are stable under multiple
high-temperature MD unfolding simulations.47 It is
interesting to compare these results with our methodology,
especially as the unfolding of this protein has
been successfully modeled by an iterative normal
mode analysis.39 The small-scale mobility of barnase
has been the subject of a coarse-grained network
model analysis using rigid substructure decomposition
that shows good agreement with NMR experiments.48
Under the classical DNM oscillator equations
of motion, the fold can be illustrated by the
contact map evolving with Q (Figure 2). The initial
secondary structure contacts, at Q = 0.1, correspond to the three
N-terminal helices. The fold nucleation contacts,
corresponding to the non-neighboring secondary
structure interactions of the β34 and β45 sheets
together with the second and third helices and the
loop2 interactions, emerge at Q = 0.3. This is in
agreement with NMR and MD simulations.47 The
early folding of the first helix is mirrored in its stability
in unfolding MD simulations, in
contrast to the ENM iterative unfolding result.39
The folding rate profile is given in Figure 3 and
shows a greater folding rate at the beginning and
end of the fold. Of the 100 random initial conformations,
48 resulted in full folding to the native fold.
The results presented are averages over complete
folds; the average iteration count per fold was
7466. Reaching the native fold was defined as a native
contact occupancy of greater than 95%. A run was
judged to have stalled in a local minimum away from the
native fold if the root-mean-squared distance matrix
difference (RMSD) to the native fold oscillated above the
cut-off (1 Å) for a sufficient number of iterations.
One could equally well define fold termination as
the fold-to-native RMSD falling below 1 Å, with
essentially the same results. When the influence radius
was lowered, the folding success rate diminished,
to 8% for a 13 Å influence radius (19,783
steps) and 26% for a 14 Å influence radius (13,854
steps). A representative folding animation (100 steps
per frame) is given as Supporting Information
Movie S1.
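The termination criteria described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how "a sufficient number of iterations" or "oscillates above the cut-off" were tested, so the `patience` window and the no-improvement rule in the hypothetical `classify_run` function are placeholders rather than the authors' procedure. The distance-matrix RMSD is computed over all residue pairs i < j.

```python
import numpy as np

def distance_matrix(coords):
    """All pairwise distances for coordinates of shape (N, 3)."""
    coords = np.asarray(coords, dtype=float)
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

def distance_matrix_rmsd(coords, native_coords):
    """Root-mean-squared difference between two distance matrices (pairs i < j)."""
    d = distance_matrix(coords)
    d0 = distance_matrix(native_coords)
    upper = np.triu_indices(d.shape[0], k=1)
    return float(np.sqrt(np.mean((d[upper] - d0[upper]) ** 2)))

def classify_run(rmsd_history, rmsd_cutoff=1.0, patience=2000):
    """Label a run as 'folded', 'stalled', or 'running'.

    Assumed rule: folded once the latest RMSD is below the cut-off;
    stalled when the best RMSD of the last `patience` iterations is
    no better than the best RMSD seen before that window.
    """
    history = np.asarray(rmsd_history, dtype=float)
    if history[-1] < rmsd_cutoff:
        return "folded"
    if len(history) > patience:
        if history[-patience:].min() >= history[:-patience].min():
            return "stalled"
    return "running"
```

Because the distance matrix is invariant under rotation and translation, this RMSD needs no structural superposition, which is presumably why it is convenient for a chain that tumbles freely during folding.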
Figure 2. The average contact matrices for the barnase
fold. Multiple independent folding runs starting from
noncontacting random folds were performed and 48 of 100
reached the native conformation. The average contact
matrices are shown at various Q values and the native
contact matrix is shown at the bottom right. The N-terminal
α-helices are the first secondary structures to appear. The
fold nucleates about the β34 and β45 sheets, the first
nonlocal secondary structure interactions. These plots show
striking similarity to the MC simulations.47
Figure 3. The folding rate of barnase under DNM. Shown is
a linear plot of the native contact fraction against the
folding time. The data comes from the 100 folding runs
illustrated in Figure 2 and the error bars represent the
standard deviation. As with all the folding pathways we
have simulated, the folding rate is greater at the beginning
and end of the fold.
Chymotrypsin
Another well-studied folding pathway is that of the
chymotrypsin inhibitor 2, protein data bank (PDB)
accession 2CI2.39,46,49–51 This is a relatively small
molecule with 64 residues and a secondary structure
progression: α1(12–24), β1(28–34), loop1(35–44),
β2(45–52), β3(61–64). The DNM folding path contact
maps are shown in Figure 4. Of 100 random initial
noninteracting folds, 88 complete folding paths
resulted. NMR and MD simulations have established
the α-helix and a central hydrophobic cluster (32–37)
as the first contacts established in the folding
path.49 The DNM simulations are consistent with
this scenario, with the α1 and Leu32-Thr37 contacts
emerging in the Q = 0.1 plot in Figure 4. One recurring
feature of the simulated unfolding path of 2CI2 is
the stability of the β12 sheet relative to the β23 sheet.39,49–51 Within our simulation, this contact
cluster appears at Q = 0.3 but only emerges strongly
at Q = 0.5, after the β23 contacts are established.
This may reflect an incomplete mirroring of
the folding versus unfolding pathways. However,
running the DNM folding simulation with the
smaller influence radius of 13 Å, the β12 contacts
emerge at Q = 0.1 and before the β23 cluster (see
Figure 5). In this case, only 12% of the folding paths
are complete, and for folding to be successful, an
early contact cluster that does not nucleate at residues
close on the chain, the β12 cluster, needs to be
established. Thus, the influence radius can affect
the fold progression, and we will see this
later for models of allosteric switching. A representative
folding animation (100 steps per frame,
Rc = 15 Å) is given as Supporting Information
Movie S2.
Figure 4. The folding path of chymotrypsin. Multiple
independent folding runs starting from noncontacting
random folds were performed and 88 of 100 reached the
native conformation. The average contact matrices are
shown at various Q values and the native contact matrix is
shown at the bottom right. The α-helix and the central
hydrophobic cluster (HC) are the first secondary structure
elements to emerge, followed by the β23 sheet. The
well-established stability of the β12 contacts during unfolding is
only partially mirrored in the folding simulation as this
contact cluster emerges after the antiparallel β23 contacts.
Figure 5. Early emergence of β12 in the folding of
chymotrypsin. When the influence radius is reduced to 13 Å,
the β12 contacts emerge earlier than for Rc = 15 Å and they
can be seen at Q = 0.1 in B and not in A for the larger
influence radius. The folding rates for the two influence radii
are shown in C. The two folding paths differ in the relative
amounts of the fold achieved in the early and late periods of
the fold. The error bars are the standard deviations from 88
folding runs for Rc = 15 Å and 12 folding runs for Rc = 13 Å.
Hydrolase
To illustrate the generality of this approach, we
folded the relatively large globular serine lipase
belonging to the hydrolase fold class, PDB accession
1TIB. This protein is an αβ-protein consisting of 269
residues. Multiple folding runs were carried out, and of
100 random starts, there were 26 complete folds,
taking on average 28,435 iterations of the DNM
equations of motion. The folding path is illustrated
in Figure 6. The striking feature here is
the early emergence of the α-helices. The fold nucleation
sites are predicted to be the β34 and β56 sheet
contacts. The fold proceeds by the formation of two
distinct globular domains, with the C-terminal domain
nucleating around the β10-11 sheet. A representative
folding animation (100 steps per frame) is
given as Supporting Information Movie S3.
Figure 6. The folding pathways for a larger protein. The
269-residue lipase serves as a good illustration of the DNM
methodology. For 100 folding runs, 26 resulted in native
conformations and the average contact matrices at various
Q fractions are given together with the native contact
matrix (center right). From this, it can be hypothesized that
the β34 and β56 sheets constitute the folding nucleus as
they are the first nonlocal (nonhelical) secondary structure
to emerge. It is also apparent that the fold proceeds via the
formation of N-terminal and C-terminal (nucleating at β10-11) globules, which eventually merge through the zipping
up of the β47 and then the folding of the C-terminus onto
the first helix. The fold nucleating β-sheets are highlighted
in gray in the ribbon diagram below.
Conformation Switch
AKE is an enzyme catalyzing the interconversion of
adenine nucleotides. On binding either AMP or ATP,
it undergoes a large conformational transition where
the nucleoside monophosphate binding (NMPbind)
domain and the LID domain fold into the large
CORE domain,52–55 see Figure 7(A). This transition
has been studied experimentally by fluorescence resonance
energy transfer and there is structural data
available on homologous proteins that have structures
lying between the opened and closed conformations.
There have been many coarse-grained model
studies of the AKE structural transition pathway.
Some groups have developed iterative normal mode
interpolations, based on Cα and full-atom rigid
substructure models.32–36 Seelinger et al.56 generate an
ensemble of possible deformations based on distance
constraints and show that the AKE transition overlaps
with a subset of deformations. Transition pathways
have also been defined as saddle point solutions
of double-well potentials connecting the two
ENMs corresponding to the alternative conformations.57–59
The treatment closest to our approach is
the double-well PNM where the transition pathway
is defined as a minimal energy interpolation
between the two minima representing the open and
closed configurations.37,38 Both treatments with the
simple double-well potential find the LID region
closing before the NMPbind region. Maragakis and
Karplus37 show that AKE approaches intermediate
structures along the transition path. A more complex
potential used in the double-well network model
(DWNM) of Chu and Voth38 results in two minimal
free-energy paths depending on the minimization
procedure, and one of these has the NMPbind region
closing before the LID region. This latter pathway
has also been shown in recent simulation studies to
characterize the unbound as opposed to bound AKE
closure.60,61
Figure 7. The allosteric switch in AKE. The allosteric switch
in adenylate kinase can be modeled with an asymmetric
double-well potential. The opened and closed conformations
corresponding to PDB accessions 4AKE(A) and 1AKE(A) are
shown in A, with the NMPbind and LID domains highlighted
in gray. The potential function is plotted at the bottom for
various asymmetry parameters. The protein starts in the top
minimum and rolls into the bottom minimum. The RMSD to
the opened and closed conformations are plotted against
transition time in B. The transition time increases with
decreasing double-well barrier asymmetry, a, and eventually
there is no transition. The transition path switches between
the NMPbind closing first, C, at Rc ≥ 10.5 Å and the LID
closing first, D, for Rc < 10.5 Å.
Structural transitions can be modeled within the
DNM with an asymmetric double well. The opened
and closed conformations of AKE correspond to PDB
accessions 4AKE (chain A) and 1AKE (chain A),
respectively. The double-well potential is the same as
described above37,38 with an additional asymmetry
factor such that a protein initially in the high-energy
minimum will over time slip out of this state and
migrate to the lower minimum; see Methods Eqs. (13)
and (14). As expected, the transition is faster for
larger influence radii and requires a minimal asymmetry
parameter. The transition path in terms of the
RMSD between the intermediate fold and the terminal
conformations is given in Figure 7(B). The transition
path time increases with decreasing asymmetry
parameter a; below a = 0.66, the transition no longer
occurs. To investigate the LID and NMPbind closure,
we divide the residues of AKE into three regions:
NMPbind (48–55), CORE (170–214), and LID (121–160).
Following Chu and Voth,38 we follow the closure
with the parameter $(d_o - d)/(d_o - d_c)$, where $d_{o/c}$ is
the distance between the LID/NMPbind centroid and
the CORE centroid in the opened/closed configuration
and $d$ is the same distance in the current conformation;
see Figure 7(C,D). Interestingly, for an influence
radius of 10.5 Å and above, the NMPbind domain
closes before the LID domain and this
pathway is remarkably similar to that observed in
the DWNM treatment.38 However, for influence radii
below 10.5 Å, the LID domain closes before the
NMPbind domain, in agreement with the results of
the PNM treatment.37,38 This order swap does not
occur on varying the asymmetry parameter,
Figure 7(C). As this pathway depends critically on the
influence radius, we looked to the crystal data to see
if the B-factors can be used to fix the influence radius
through Eqs. (7) and (15). We find that for the
closed structure, the correlation of fluctuations with
B is maximal at Rc = 10 Å, whereas for the opened
conformation, this correlation is relatively
unchanged over the range 9–20 Å. This does not
allow us to unambiguously fix Rc, and so we have to
fall back on the conclusion in Chu and Voth38 and
posit the coexistence of the two pathways. The folding
animation for Rc = 15 Å (20 steps per frame) is
given as Supporting Information Movie S4 and the
animation for Rc = 10 Å (100 steps per frame) is
given as Supporting Information Movie S5.
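The closure parameter (d_o - d)/(d_o - d_c) can be computed directly from Cα coordinates. The sketch below is illustrative only: the function names are hypothetical, and it assumes that row i of each coordinate array holds residue i + 1, so that the residue ranges quoted above (NMPbind 48–55, CORE 170–214, LID 121–160) can be used as 1-based inclusive ranges.

```python
import numpy as np

def centroid(coords, first, last):
    """Centroid of residues first..last (1-based, inclusive)."""
    return np.asarray(coords, dtype=float)[first - 1:last].mean(axis=0)

def closure_fraction(coords, open_coords, closed_coords, domain, core=(170, 214)):
    """Closure parameter (d_o - d) / (d_o - d_c) for one domain.

    d, d_o and d_c are the domain-to-CORE centroid distances in the
    current, opened and closed structures. The value is 0 in the
    opened state and 1 in the closed state.
    """
    def centroid_distance(structure):
        return np.linalg.norm(centroid(structure, *domain) - centroid(structure, *core))

    d = centroid_distance(coords)
    d_o = centroid_distance(open_coords)
    d_c = centroid_distance(closed_coords)
    return float((d_o - d) / (d_o - d_c))

# Residue ranges as quoted in the text.
LID = (121, 160)
NMP_BIND = (48, 55)
```

Evaluating `closure_fraction` for `LID` and `NMP_BIND` along a trajectory and comparing which curve rises first is one way to read off the closing order discussed in the text.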
Discussion
We have presented a DNM description of protein dynamics
and applied it to protein folding and large-scale
structural transitions. The DNM can be viewed
as a reduction of Go model dynamics or a simple
extension of the ENM. The model presented here dif-
fers from Go models in that the bonded and non-
bonded interactions are both modeled with a quad-
ratic potential that extends over a finite radius of
influence with the oscillations being damped by a fric-
tion term. Within ENM, the structure oscillates about
the native conformation and the results are in good
agreement with dynamical data encoded in the crys-
tallographic B-factors. The ENM has also served as a
powerful tool for pinning down allosteric couplings
between protein residues and predicting possible
ligand binding sites. Large-scale protein motions
associated with allosteric switches have been modeled
with an iterative ENM and a plastic extension of the
ENM, the PNM. Directly relevant to the present
study, protein unfolding has been modeled with an
iterative ENM strategy. Our coarse-grained DNM
model can be viewed as a further extension of these
approaches in that the protein evolves according to
the classical equations of motion. This is not a
straightforward energy minimization strategy that
would be beset by local energy minima but a model
where the protein has a kinetic component that is
continuously being dissipated through a friction term.
Consequently, the local minima stalling frequency is
relatively low as the protein is always sampling
nearby conformations away from the local minima.
The model has relatively few parameters and is easy
to implement. It appears that there is good agreement
with folding pathways that have been reported in the
literature, with striking similarity to the reported
folding paths of barnase and chymotrypsin. The
method can equally well be applied to proteins of arbi-
trary length, and we present the folding path of the
269 residue lipase as an example. Our methodology
for modeling structural transitions is an extension of
the PNM potential to include an asymmetry parame-
ter in the double-well potential so that under the clas-
sical equations of motion, the protein slips out of the
higher energy minimum and ends up in the lower
energy minimum. Again we illustrate and validate
the methodology with a well-studied system, the allo-
steric switch in AKE. In the structural transition of
AKE, there are two alternative pathway topologies
depending on which of the LID or NMPbind domains
closes first. We have shown that there is a switch
between the two pathways as the influence radius
passes through 10.5 A and also defined the minimal
asymmetry parameter for the transition to occur.
All the results presented in this study are
sequence independent and this has proved to be a suc-
cessful simplification of the protein dynamics within
coarse-grained network models. However, the contri-
bution of each residue along the chain can be exam-
ined by varying the spring constant coupling and this
has been done for the allosteric switch in GroEL.62 In
our case, it will be interesting to see whether residue-
specific spring constant variation leads to different
folding pathways and/or success rates, and whether
the DNM methodology can then be extended to look
at mutational instability and protein misfolding. This
is the subject of a work in preparation (AJT and GW).
We conclude on a technical note. The ENM poten-
tial is closely related to a global minimization function
which is commonly exploited in solving the molecular
distance geometry problem (MDGP),63 which deter-
mines the native structure of a protein based only on
distance data between all atoms or a subset of atoms.
The approach described in this article can be exploited
to solve the general MDGP and related problems in a
global nonperturbative fashion that competes well with
other MDGP methods (AJT and GW, in preparation).
Methods
The ENM potential between two residues i and j is
given by
$$V_{ij} = \frac{1}{2} k_{ij} \Delta_{ij}^2, \tag{1}$$
where
$$\Delta_{ij} = d_{ij} - d^0_{ij}, \tag{2}$$
and
$$d_{ij} = \left|\vec{r}_i - \vec{r}_j\right|, \tag{3}$$
where $\vec{r}_i$ is the position vector of the ith Cα along the
protein chain. The zero superscript refers to the stable
native configuration about which the protein oscillates
and $k_{ij}$ is the set of harmonic spring constants.
In the simplest incarnation of ENM, the spring
constants are taken to be residue independent.19,20 In
practice, the nearest neighbor separation must fluctuate
over a narrow range restricted by the covalent bond
architecture linking neighboring Cα atoms, so that we
define
$$k_{i,i\pm1} = 100k, \qquad k_{ij,\,|i-j|>1} = k. \tag{4}$$
In the absence of this relative rigidity in the
neighboring atom fluctuations, the intermediate fold
conformations have unphysically large nearest
neighbor separations. Interestingly, the folding effi-
ciency is also greatly enhanced when nearest neigh-
bor atoms are forced to stay within physical bounds.
Only residues within a given sphere of influence,
$R_c$, can be considered to physically interact
with each other so that the ENM spring constant
vanishes for well-separated residues:
$$k_{ij}\left(d_{ij} > R_c\right) = 0. \tag{5}$$
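As a minimal sketch of Eqs. (1)–(5), the pair potential with the cutoff and the stiffer backbone springs could be coded in C roughly as follows. The function and parameter names (`enm_spring_constant`, `enm_pair_potential`, `k`, `r_c`) are illustrative assumptions and not the authors' code. The sketch also assumes that the cutoff of Eq. (5) is applied to every pair, chain neighbors included.

```c
#include <assert.h>
#include <stdlib.h>

/* Spring constant k_ij for residues i and j at current separation d_ij.
 * Assumed rule: Eq. (5) cutoff first, then the 100k backbone spring of
 * Eq. (4) for chain neighbors, and the uniform k otherwise. */
double enm_spring_constant(int i, int j, double d_ij, double k, double r_c)
{
    assert(i != j);
    if (d_ij > r_c) {
        return 0.0;            /* Eq. (5): outside the influence radius */
    }
    if (abs(i - j) == 1) {
        return 100.0 * k;      /* Eq. (4): covalently linked neighbors */
    }
    return k;                  /* Eq. (4): all other pairs */
}

/* Pair potential V_ij = (1/2) k_ij (d_ij - d0_ij)^2, Eqs. (1) and (2). */
double enm_pair_potential(int i, int j, double d_ij, double d0_ij,
                          double k, double r_c)
{
    double k_ij = enm_spring_constant(i, j, d_ij, k, r_c);
    double delta = d_ij - d0_ij;   /* Eq. (2) */
    return 0.5 * k_ij * delta * delta;
}
```

Because the cutoff is tested on the current distance rather than the native one, a pair contributes only while it sits inside the influence radius, which is what allows new springs to switch on as the chain collapses.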
The dynamics is partly driven by a repulsive
term when $d_{ij} \le R_c$ and $d^{0}_{ij} > R_c$. For large $R_c$, it
could be argued that there is an unphysically large
repulsive term. However, when the repulsive contribution
is rendered constant by scaling $k \propto \left(R_c / d^{0}_{ij}\right)^{2}$ for
$d_{ij} \le R_c$ and $d^{0}_{ij} > R_c$, we obtain essentially the same
folding and transition pathways.
Provided fluctuations are small about the equilibrium
conformation, we can approximate the potential
to quadratic order. Here,
$V \approx V_0 + \frac{1}{2} \sum_{ij,\mu\nu} \delta r^{\mu}_i H^{\mu\nu}_{ij} \delta r^{\nu}_j$,
where H is the Hessian matrix and $\delta r^{\mu}_i$ are the
fluctuations about the native conformation.
This now enables us to solve exactly for the
thermodynamic fluctuations
$$C^{\mu\nu}_{ij} = \left\langle \delta r^{\mu}_i \, \delta r^{\nu}_j \right\rangle = k_B T \left(H^{-1}\right)^{\mu\nu}_{ij}, \tag{6}$$
where $k_B$ is Boltzmann's constant and T is temperature,
and thus relate the parameters of the theory to
the crystallographic temperature factors, defined as
$B_i = \frac{8\pi^2}{3} \left\langle \delta r_i^2 \right\rangle$.64 Specifically, the spring constant is
then given by
$$k = \frac{8\pi^2 k_B T \left\langle \sum_{\mu} \left(H^{-1}\right)^{\mu\mu}_{ii}(k=1) \right\rangle}{3 \left\langle B_i \right\rangle}, \tag{7}$$
where brackets refer to averaging over the protein
chain.
Now we will consider the coarse-grained protein
model as a deterministic classical system governed
by the equations of motion, $m \ddot{\vec{r}} = -\partial V / \partial \vec{r}$. Explicitly:
$$\ddot{\vec{r}}_i = -\sum_{j \neq i} \frac{2}{m} k_{ij} \hat{d}_{ij} \Delta_{ij}, \tag{8}$$
where the "hat" refers to the unit vector. To dissipate
the kinetic energy component, we introduce a friction
or damping term, $\mu$, into the equations of motion:
$$\ddot{\vec{r}}_i = -\sum_{j \neq i} \frac{2}{m} k_{ij} \hat{d}_{ij} \Delta_{ij} - \mu \dot{\vec{r}}_i. \tag{9}$$
These are the DNM equations of motion and
apart from the native target structure, the solutions
will depend on the spring constant per unit mass,
the damping term, and the influence radius. Their
values will be discussed in the next section.
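In practice, the differential equations, Eq. (9), are
solved by the symplectic Euler method65
$$\begin{aligned}
\vec{v}_{n+1} &= \vec{v}_n - \delta \left( \vec{\partial} V(r_n) + \mu \vec{v}_n \right), \\
\vec{r}_{n+1} &= \vec{r}_n + \delta \, \vec{v}_{n+1},
\end{aligned} \tag{10}$$
where n is the iteration number and $\delta$ is the time
increment.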
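The update of Eq. (10) can be sketched generically in C as below. The names `symplectic_euler_step`, `gradient_fn`, `toy_gradient`, and `relax_toy` are hypothetical, and the toy potential $V = \frac{1}{2}(r-1)^2$ stands in for the network potential purely for illustration. The key feature is that the position update uses the already updated velocity.

```c
#include <assert.h>
#include <stddef.h>

/* Callback that writes dV/dr at r into grad (an assumed interface). */
typedef void (*gradient_fn)(size_t n, const double *r, double *grad);

/* One step of Eq. (10) on n coordinates with unit mass.
 * grad is caller-provided scratch space of length n. */
void symplectic_euler_step(size_t n, double *r, double *v, double *grad,
                           gradient_fn grad_v, double mu, double dt)
{
    assert(n > 0 && dt > 0.0);
    grad_v(n, r, grad);                       /* gradient at r_n */
    for (size_t c = 0; c < n; ++c) {
        v[c] -= dt * (grad[c] + mu * v[c]);   /* v_{n+1} */
        r[c] += dt * v[c];                    /* r_{n+1} uses v_{n+1} */
    }
}

/* Toy single-well potential V = (1/2)(r - 1)^2 per coordinate. */
static void toy_gradient(size_t n, const double *r, double *grad)
{
    for (size_t c = 0; c < n; ++c) {
        grad[c] = r[c] - 1.0;
    }
}

/* Relax one coordinate from r = 3 at rest; returns the final position. */
double relax_toy(int steps)
{
    double r[1] = {3.0}, v[1] = {0.0}, g[1];
    for (int s = 0; s < steps; ++s) {
        symplectic_euler_step(1, r, v, g, toy_gradient, 1.0, 0.01);
    }
    return r[0];
}
```

With a damping term of unity and a time increment of 0.01, the toy coordinate settles at the minimum r = 1, mirroring on a small scale how friction removes the kinetic energy gained during collapse.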
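When the protein can adopt two native conformations
"a" and "b," the potential will be either
$\sum_{ij} \frac{1}{2} k_{ij} \left(\Delta^{a}_{ij}\right)^{2}$ or $\sum_{ij} \frac{1}{2} k_{ij} \left(\Delta^{b}_{ij}\right)^{2}$, depending on which is
minimal, where $\Delta^{a,b}_{ij} = \left| d_{ij} - d^{a,b}_{ij} \right|$ and $d^{a,b}_{ij}$ correspond to the Cα
distances for the two conformations. That is
$$V = \sum_{ij} \frac{1}{4} k_{ij} \left( \left(\Delta^{a}_{ij}\right)^{2} + \left(\Delta^{b}_{ij}\right)^{2} - \left| \left(\Delta^{a}_{ij}\right)^{2} - \left(\Delta^{b}_{ij}\right)^{2} \right| \right). \tag{11}$$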
To ensure smooth interpolation between the two
minima, we introduce a small parameter $\varepsilon$:37,38
$$V = \sum_{ij} \frac{1}{4} k_{ij} \left( \left(\Delta^{a}_{ij}\right)^{2} + \left(\Delta^{b}_{ij}\right)^{2} - \sqrt{\left( \left(\Delta^{a}_{ij}\right)^{2} - \left(\Delta^{b}_{ij}\right)^{2} \right)^{2} + \varepsilon^{2}} \right). \tag{12}$$
For a transition to occur within the context of classical
trajectories, we introduce a destabilising term that
favors one minimum over the other. The simplest way
to do this is via an asymmetry parameter $\alpha$. Specifically,
$$V = \sum_{ij} \frac{1}{4} k_{ij} \left( \left(\Delta^{a}_{ij}\right)^{2} + \left(\Delta^{b}_{ij}\right)^{2} + \alpha - \sqrt{\left( \left(\Delta^{a}_{ij}\right)^{2} - \left(\Delta^{b}_{ij}\right)^{2} + \alpha \right)^{2} + \varepsilon^{2}} \right), \tag{13}$$
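with the double-well classical equations of motion
given by
$$\ddot{\vec{r}}_i = -\sum_{j \neq i} \frac{1}{m} k_{ij} \hat{d}_{ij} \left( \Delta^{a}_{ij} + \Delta^{b}_{ij} - \frac{\left( \Delta^{a}_{ij} - \Delta^{b}_{ij} \right) \left( \left(\Delta^{a}_{ij}\right)^{2} - \left(\Delta^{b}_{ij}\right)^{2} + \alpha \right)}{\sqrt{\left( \left(\Delta^{a}_{ij}\right)^{2} - \left(\Delta^{b}_{ij}\right)^{2} + \alpha \right)^{2} + \varepsilon^{2}}} \right) - \mu \dot{\vec{r}}_i. \tag{14}$$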
Of course, an isolated residue sitting in the
high-energy minimum will not end up in the lower
minimum as there is an energy barrier to overcome.
However, for an allosteric transition, there is in gen-
eral a set of residues within the protein for which
the transition path is short enough for the barrier to
vanish and it is these residues that drive the transi-
tion. It will be shown that transitions only occur for
a large enough asymmetry.
We now deal with the parameterization of the
model. Folding within the DNM is achieved by initiating
the protein chain in a random completely noninteracting
configuration and then following the trajectories
of the Cα nodes under the equations of motion.
At each stage, only those nodes within the influence
radius are visible to the individual residues, Eq. (5),
and as the protein collapses, more residues come
within the influence radius and there is a build-up of
kinetic energy that is dissipated via a friction term
that models the water environment, Eq. (9).
Based on statistical potential models of protein
interactions (see, for example, refs. 66 and 67), we take the
influence radius to be 15 Å. For larger radii, the protein will fold
faster but such long-range interactions are unphysical.
Within ENM, the radius of influence is chosen to
maximize the correlation of the normal mode fluctuation
magnitudes with the crystallographic temperature
B-factors.19,20 However, the correlation varies
with structure and we find maximal correlations for
chymotrypsin inhibitor 2 (2CI2) at 15 Å, barnase (1A2P) at 7 Å,
hydrolase (1TIB) at 12–20 Å, and for AKE, the correlation is
maximal at 10 Å for the closed conformation (1AKE)
and 9–20 Å for the open conformation (4AKE). Without
crystallographic data on the intermediate fold conformations,
it is difficult to derive $R_c$ from ENM analysis
and we are forced to rely on potential model
parameters. We find that if the influence radius falls
below 15 Å, then the folding probability is substantially
reduced and the folding time increased. However, as
will be shown below, when considering DNM models of
allosteric transition, the transition path can undergo a
dramatic change with a lower $R_c$. We take the spring
constant per unit mass to be 1/2, which determines the
time scale of the problem. Statistical thermodynamics
allows us to fix the parameters of the theory based on
crystallographic B-factors, Eq. (7). So, with our choice
of spring constant per unit mass, we have time units of
$t = \sqrt{\frac{3 m \langle B \rangle}{16 \pi^2 k_B T \left\langle H^{-1}(k=1) \right\rangle}}$,
where m is the mass of the resonating node,
in our case, that of the amino acid. In
the numerical iterations, Eq. (10), we set the time increment
to be 0.01. The damping term is set to unity.
For a small damping term, the iteration becomes
unstable and for a large damping term, folding is
slow. For the double-well potential, we set the
smoothing factor, Eq. (12), to be 0.1 and perform simulations
with various choices of asymmetry parameter,
$\alpha$. All programs were written in C and run on a
PC. Movies were generated in gif format by importing
folding coordinate frames into Maple (Waterloo
Maple, Maplesoft). Structure images were generated
by DS ViewerPro software (Accelrys).
References
1. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM (1997) CATH—a hierarchic classification of protein domain structures. Structure 5:1093–1108.
2. Lo Conte L, Ailey B, Hubbard TJ, Brenner SE, Murzin AG, Chothia C (2000) SCOP: a structural classification of proteins database. Nucleic Acids Res 28:257–259.
3. Monod J, Wyman J, Changeux JP (1965) On the nature of allosteric transitions: a plausible model. J Mol Biol 12:88–118.
4. Koshland DE, Jr, Nemethy G, Filmer D (1966) Comparison of experimental binding data and theoretical models in proteins containing subunits. Biochemistry 5:365–385.
5. del Sol A, Tsai CJ, Ma B, Nussinov R (2009) The origin of allosteric functional modulation: multiple pre-existing pathways. Structure 17:1042–1050.
6. Tsai CJ, Del Sol A, Nussinov R (2009) Protein allostery, signal transmission and dynamics: a classification scheme of allosteric mechanisms. Mol Biosyst 5:207–216.
7. Sanchez-Ruiz JM (2010) Protein kinetic stability. Biophys Chem 148:1–15.
8. Alm E, Baker D (1999) Matching theory and experiment in protein folding. Curr Opin Struct Biol 9:189–196.
9. Perl D, Welker C, Schindler T, Schroder K, Marahiel MA, Jaenicke R, Schmid FX (1998) Conservation of rapid two-state folding in mesophilic, thermophilic and hyperthermophilic cold shock proteins. Nat Struct Biol 5:229–235.
10. Taketomi H, Ueda Y, Go N (1975) Studies on protein folding, unfolding and fluctuations by computer simulation. I. The effect of specific amino acid sequence represented by specific inter-unit interactions. Int J Pept Protein Res 7:445–459.
11. Abe H, Go N (1981) Noninteracting local-structure model of folding and unfolding transition in globular proteins. II. Application to two-dimensional lattice proteins. Biopolymers 20:1013–1031.
12. Go N, Abe H (1981) Noninteracting local-structure model of folding and unfolding transition in globular proteins. I. Formulation. Biopolymers 20:991–1011.
13. Hills RD, Jr, Lu L, Voth GA (YEAR) Multiscale coarse-graining of the protein energy landscape. PLoS Comput Biol 6:e1000827.
14. Karanicolas J, Brooks CL, III (2002) The origins of asymmetry in the folding transition states of protein L and protein G. Protein Sci 11:2351–2361.
15. Best RB, Hummer G (YEAR) Coordinate-dependent diffusion in protein folding. Proc Natl Acad Sci USA 107:1088–1093.
16. Sulkowska JI, Cieplak M (2008) Selection of optimal variants of Go-like models of proteins through studies of stretching. Biophys J 95:3174–3191.
17. Bereau T, Deserno M (2009) Generic coarse-grained model for protein folding and aggregation. J Chem Phys 130:235106.
18. Tirion MM (1996) Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Phys Rev Lett 77:1905–1908.
19. Bahar I, Atilgan AR, Erman B (1997) Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Fold Des 2:173–181.
20. Haliloglu T, Bahar I, Erman B (1997) Gaussian dynamics of folded proteins. Phys Rev Lett 79:3090–3093.
21. Klepeis JL, Lindorff-Larsen K, Dror RO, Shaw DE (2009) Long-timescale molecular dynamics simulations of protein structure and function. Curr Opin Struct Biol 19:120–127.
22. Bahar I, Jernigan RL (1997) Inter-residue potentials in globular proteins and the dominance of highly specific hydrophilic interactions at close separation. J Mol Biol 266:195–214.
23. Balabin IA, Yang W, Beratan DN (2009) Coarse-grained modeling of allosteric regulation in protein receptors. Proc Natl Acad Sci USA 106:14253–14258.
24. Williams G (2010) Elastic network model of allosteric regulation in protein kinase PDK1. BMC Struct Biol 10:11.
25. Ming D, Wall ME (2005) Allostery in a coarse-grained model of protein dynamics. Phys Rev Lett 95:198103.
26. Ming D, Cohn JD, Wall ME (2008) Fast dynamics perturbation analysis for prediction of protein functional sites. BMC Struct Biol 8:5.
27. Ming D, Wall ME (2005) Quantifying allosteric effects in proteins. Proteins 59:697–707.
28. Ming D, Wall ME (2006) Interactions in native binding sites cause a large change in protein dynamics. J Mol Biol 358:213–223.
29. Yang L, Song G, Carriquiry A, Jernigan RL (2008) Close correspondence between the motions from principal component analysis of multiple HIV-1 protease structures and elastic network modes. Structure 16:321–330.
30. Yang L, Song G, Jernigan RL (2007) How well can we understand large-scale protein motions using normal modes of elastic network models? Biophys J 93:920–929.
31. Xu C, Tobi D, Bahar I (2003) Allosteric changes in protein structure computed by a simple mechanical model: hemoglobin T<-->R2 transition. J Mol Biol 333:153–168.
32. Schuyler AD, Jernigan RL, Qasba PK, Ramakrishnan B, Chirikjian GS (2009) Iterative cluster-NMA: a tool for generating conformational transitions in proteins. Proteins 74:760–776.
33. Kirillova S, Cortes J, Stefaniu A, Simeon T (2008) An NMA-guided path planning approach for computing large-amplitude conformational changes in proteins. Proteins 70:131–143.
34. Korkut A, Hendrickson WA (2009) Computation of conformational transitions in proteins by virtual atom molecular mechanics as validated in application to adenylate kinase. Proc Natl Acad Sci USA 106:15673–15678.
35. Miyashita O, Onuchic JN, Wolynes PG (2003) Nonlinear elasticity, proteinquakes, and the energy landscapes of functional transitions in proteins. Proc Natl Acad Sci USA 100:12570–12575.
36. Feng Y, Yang L, Kloczkowski A, Jernigan RL (2009) The energy profiles of atomic conformational transition intermediates of adenylate kinase. Proteins 77:551–558.
37. Maragakis P, Karplus M (2005) Large amplitude conformational change in proteins explored with a plastic network model: adenylate kinase. J Mol Biol 352:807–822.
38. Chu JW, Voth GA (2007) Coarse-grained free energy functions for studying protein conformational changes: a double-well network model. Biophys J 93:3860–3871.
39. Su JG, Li CH, Hao R, Chen WA, Wang CX (2008) Protein unfolding behavior studied by elastic network model. Biophys J 94:4586–4596.
40. Daura X, Jaun B, Seebach D, van Gunsteren WF, Mark AE (1998) Reversible peptide folding in solution by molecular dynamics simulation. J Mol Biol 280:925–932.
41. Elofsson A, Le Grand SM, Eisenberg D (1995) Local moves: an efficient algorithm for simulation of protein folding. Proteins 23:73–82.
42. Huse M, Kuriyan J (2002) The conformational plasticity of protein kinases. Cell 109:275–282.
43. Burra PV, Zhang Y, Godzik A, Stec B (2009) Global distribution of conformational states derived from redundant models in the PDB points to non-uniqueness of the protein structure. Proc Natl Acad Sci USA 106:10505–10510.
44. Fersht AR (1993) The sixth Datta Lecture. Protein folding and stability: the pathway of folding of barnase. FEBS Lett 325:5–16.
45. Matthews JM, Fersht AR (1995) Exploring the energy surface of protein folding by structure-reactivity relationships and engineered proteins: observation of Hammond behavior for the gross structure of the transition state and anti-Hammond behavior for structural elements for unfolding/folding of barnase. Biochemistry 34:6805–6814.
46. Li A, Daggett V (1998) Molecular dynamics simulation of the unfolding of barnase: characterization of the major intermediate. J Mol Biol 275:677–694.
47. Shinoda K, Takahashi K (2007) Retention of local conformational compactness in unfolding of barnase; contribution of end-to-end interactions within quasi-modules. Biophysics 3:1–12.
48. Gohlke H, Thorpe MF (2006) A natural coarse graining for simulating large biomolecular motion. Biophys J 91:2115–2120.
49. Kazmirski SL, Wong KB, Freund SM, Tan YJ, Fersht AR, Daggett V (2001) Protein folding from a highly disordered denatured state: the folding pathway of chymotrypsin inhibitor 2 at atomic resolution. Proc Natl Acad Sci USA 98:4349–4354.
50. Ozkan SB, Dalgyn GS, Haliloglu T (2004) Unfolding events of Chymotrypsin Inhibitor 2 (CI2) revealed by Monte Carlo (MC) simulations and their consistency from structure-based analysis of conformations. Polymer 45:581–595.
51. Li L, Shakhnovich EI (2001) Constructing, verifying, and dissecting the folding transition state of chymotrypsin inhibitor 2 with all-atom simulations. Proc Natl Acad Sci USA 98:13014–13018.
52. Dzeja PP, Zeleznikar RJ, Goldberg ND (1998) Adenylate kinase: kinetic behavior in intact cells indicates it is integral to multiple cellular processes. Mol Cell Biochem 184:169–182.
53. Muller CW, Schlauderer GJ, Reinstein J, Schulz GE (1996) Adenylate kinase motions during catalysis: an energetic counterweight balancing substrate binding. Structure 4:147–156.
54. Muller CW, Schulz GE (1992) Structure of the complex between adenylate kinase from Escherichia coli and the inhibitor Ap5A refined at 1.9 Å resolution. A model for a catalytic transition state. J Mol Biol 224:159–177.
55. Whitford PC, Onuchic JN, Wolynes PG (2008) Energy landscape along an enzymatic reaction trajectory: hinges or cracks? HFSP J 2:61–64.
56. Seeliger D, Haas J, de Groot BL (2007) Geometry-based sampling of conformational transitions in proteins. Structure 15:1482–1492.
57. Zheng W, Brooks BR, Hummer G (2007) Protein conformational transitions explored by mixed elastic network models. Proteins 69:43–57.
58. Tekpinar M, Zheng W (YEAR) Predicting order of conformational changes during protein conformational transitions using an interpolated elastic network model. Proteins 78:2469–2481.
59. Franklin J, Koehl P, Doniach S, Delarue M (2007) MinActionPath: maximum likelihood trajectory for large-scale structural transitions in a coarse-grained locally harmonic energy landscape. Nucleic Acids Res 35:W477–W482.
60. Beckstein O, Denning EJ, Perilla JR, Woolf TB (2009) Zipping and unzipping of adenylate kinase: atomistic insights into the ensemble of open<-->closed transitions. J Mol Biol 394:160–176.
61. Daily MD, Phillips GN, Jr, Cui Q (YEAR) Many local motions cooperate to produce the adenylate kinase conformational transition. J Mol Biol 400:618–631.
62. Zheng W, Brooks BR, Thirumalai D (2007) Allosteric transitions in the chaperonin GroEL are captured by a dominant normal mode that is most robust to sequence variations. Biophys J 93:2289–2299.
63. Grosso A, Locatelli M, Schoen F (2009) Solving molecular distance geometry problems by global optimization algorithms. Comput Optim Appl 43:23–37.
64. Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Keskin O, Bahar I (2001) Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys J 80:505–515.
65. Press WH, Teukolsky SA, Vetterling W, Flannery BP (2007) Numerical recipes: The art of scientific computing, Third Edition. New York, NY: Cambridge University Press.
66. Zhou H, Zhou Y (2002) Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci 11:2714–2726.
67. Devane R, Shinoda W, Moore PB, Klein ML (2009) A Transferable coarse grain non-bonded interaction model for amino acids. J Chem Theory Comput 5:2115–2124.
Williams and Toon PROTEIN SCIENCE VOL 19:2451—2461 2461