Free Energy Landscape of Protein-Like Chains Interacting
Under Discontinuous Potentials
by
Hanif Bayat Movahed
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Chemistry
University of Toronto
Copyright © 2011 by Hanif Bayat Movahed
Abstract
Free Energy Landscape of Protein-Like Chains Interacting Under Discontinuous
Potentials
Hanif Bayat Movahed
Doctor of Philosophy
Graduate Department of Chemistry
University of Toronto
2011
The free energy landscape of a protein-like chain is constructed from exhaustive simulation
studies using a combination of discontinuous molecular dynamics and parallel tempering
methods. The protein model is a repeating sequence of four kinds of monomers,
in which hydrogen bond attraction, electrostatic repulsion, and covalent bond vibrations
are modeled by step, shoulder and square-well potentials, respectively. These protein-
like chains exhibit a helical structure in their folded states. The model allows a natural
definition of a configuration by considering which beads are bonded. In the absence of a
solvent, the relative free energy of dominant structures is determined from the relative
populations, and the probabilities predicted from the calculated free energies are found
to be in excellent agreement with the observed probabilities at different temperatures.
The free energy landscape of the protein-like chain is analyzed and found to have
funnel-like characteristics, as shown by the fact that the probability of observing the
most common configuration approaches unity at low enough temperatures for chains
with fewer than 30 beads. The effect on the free energy landscape of an explicit square-well
solvent, in which the beads that can form intra-chain bonds can also form (weaker)
bonds with solvent molecules while the other beads are insoluble, is also examined.
Simulations for chains of 15, 20 and 25 beads show that at low temperatures, the most likely
structures are collapsed helical structures. The temperature at which collapsed helical
structures become dominant is higher than in the absence of a solvent. Finally, the dynamics
of the protein-like chain immersed in an implicit hard-sphere solvent is studied
using a simple model in which the implicit solvent interacts on a fast time scale with the
chain beads and provides sufficient friction so that the motion of monomers is governed
by the Smoluchowski equation. Using a Markovian model of the kinetics of transitions
between conformations, the equilibration process from an ensemble of initially extended
configurations to mainly folded configurations is investigated at low effective temperatures
for a number of different chain lengths. The folding profiles appear to be single
exponentials and independent of temperature at low temperatures.
To the memory of my Grandfathers,
Fereydoun Bayat Movahed, my role model in life for his vision, wisdom, charisma and morality
and
Mohammad Reza Roghani Zanjani for his kindness, productivity and hardworking character.
Acknowledgements
Studying and working at the University of Toronto has been a wonderful life experience
for me. At UofT, besides gaining experience through research, teaching and
coursework, I obtained valuable experience by becoming involved in policy development
on the Academic Board of the Governing Council and on the Graduate Education
Council. Many people have had a hand in my success, and they deserve to be named in
this acknowledgement, but naming all of them would require adding another chapter to
this thesis. However, I would like to express my sincere appreciation to the following
individuals and organizations.
First, I should thank my supervisor, Prof. Jeremy Schofield, for giving me the chance
to work in his group and for all of his support, patience, guidance and friendship during
my PhD studies.
Thanks also go to Prof. Stuart Whittington and Prof. Gilbert Walker, the other
members of my advisory committee, for all their advice and generous support during my
PhD program, and to my M.Sc. supervisor, Prof. Donald Sullivan, for his continuous
support.
I would like to extend very sincere thanks to Dr. Ramses van Zon whose help and
advice were essential for many challenging parts of my PhD project. Both when he was
part of the group and then when he joined SciNet, he always had time for my numerous
questions. He shared his knowledge with me in the most effective way and suggested very
smart and practical solutions. I should also thank my friend and colleague in CPTG, Dr.
Ali Nassimi, for his useful advice and comments regarding my PhD project.
I should express my deepest gratitude to the Department of Chemistry at the University
of Toronto, the Ontario Ministry of Training, Colleges and Universities, and the Natural
Sciences and Engineering Research Council of Canada for financial assistance, and to the
SciNet HPC Consortium for providing outstanding computational facilities.
I owe much to my parents for their tremendous support during all my life. Their
continuous belief in me and their encouragement have helped me to move passionately
towards my goals. I will always remember their joy and happiness when they heard I
defended my thesis successfully. Besides this, great thanks go to my wonderful brothers
Saeed and Saber for being supportive and wise friends to me throughout our lives.
Finally, I should thank Fatemeh Jafargholi, my wife, for all she has done for me,
especially in the last year, when her life was totally affected by my continuous struggle
with the project. I should thank her for understanding me and supporting me in all
aspects of my life during these years. Her love is the main asset of my life, and her
support, encouragement, comfort and suggestions are priceless.
Thank you.
Contents
1 Introduction
1.1 Protein Folding Problem
1.2 Main Viewpoints
1.3 Energy Landscape
1.4 Theoretical Studies of Protein Folding
1.4.1 Using DMD to Study Protein Folding
1.5 Role of Solvent
1.6 Thesis Outline
2 Simulation Techniques
2.1 Different Simulation Techniques
2.1.1 Monte Carlo Method
2.1.2 Molecular Dynamics (MD)
2.1.3 Hybrid Monte Carlo
2.2 Discontinuous Molecular Dynamics (DMD)
2.2.1 Event Tree
2.2.2 Cell Crossing Event
2.2.3 Collision Event
2.2.4 Measurement Events
2.3 Parallel Tempering Method
2.3.1 Parallel tempering
2.3.2 Efficient Parallel Tempering Dynamics
2.4 Simulation Structure of the Project
2.4.1 Parallel Programming
3 Protein-like Chain Without a Solvent
3.1 Model
3.1.1 Definition of configurations
3.1.2 Temperature independence of relative configurational entropies
3.2 Results
3.2.1 Parallel tempering efficiency
3.2.2 Observed structures
3.2.3 Free energy landscape
3.2.4 Entropy and free energy calculation for the model B 25-bead chain
3.2.5 Entropy and free energy calculation for 35 beads protein-like chain
3.2.6 Effects of the protein-like chain length
4 Protein-like Chain Inside a Solvent
4.1 The System
4.1.1 The solvent model
4.1.2 Definition of Configuration
4.1.3 Simulation Structure
4.2 Results
4.2.1 Parallel tempering efficiency
4.2.2 Phase of the solvent
4.2.3 Observed structures and Free energy landscape
4.2.4 Relative configurational entropy
5 Simple Dynamics Using Smoluchowski Equation
5.1 Model
5.2 Smoluchowski dynamics
5.2.1 First passage time approach to rate constants
5.2.2 Numerical test of microscopic rate expressions
5.3 Markov model of configurational dynamics
6 Conclusions, Summary and Future Work
6.1 Free Energy Landscape in the Absence of a solvent
6.2 Free Energy Landscape for a Chain Solvated by a Square-Well Fluid
6.3 Simple Dynamics Using Smoluchowski Equation
6.4 Future work
Appendices
A Heat Capacity and Compressibility
A.1 Heat Capacity
A.2 Compressibility
B Temperature sets in PT
B.1 In the absence of solvent
Bibliography
Chapter 1
Introduction
1.1 Protein Folding Problem
Proteins adopt conformations that range from a wide variety of extended configurations
at high temperatures to unique three-dimensional “folded” structures under physiological
conditions, in which they play functional roles in an organism. Understanding how
proteins fold has been a very challenging problem in physics, chemistry and biology[1],
and was considered by Science as one of the 100 biggest unsolved problems that span
the sciences[2]. There are two major challenges in understanding the physics of protein
folding. The first concerns the prediction of the three-dimensional configuration from the
one-dimensional sequence of amino acids, and the second concerns understanding the
mechanism by which an unfolded structure ends up in a folded state[3, 4]. These two
questions are clearly connected, and finding a complete solution to the second problem
would likely provide great insight into the connection between sequence and structure.
However, the first problem will likely be solved by other methods that are mainly based
on the understanding of known structures[4]. Characterizing and understanding how the
free energy of a protein depends on its configuration (the free energy landscape) would
significantly help in answering the second question and, consequently, the first one.
One of the main characteristics of proteins is their complex phase behavior, in which
the main phases are the denatured coil state with a random structure, a relatively
structured compact globule state, and the native state[5]. The compact globule state
plays the role of an intermediate between the completely unfolded denatured state and
the native folded structure[6].
1.2 Main Viewpoints
Historically, there are two opposing views of the folding mechanism. The first one suggests
that for a specific sequence of amino acids the native structure is the most stable state,
which means that the free energy associated with the specific three-dimensional structure
of the protein (protein configuration) is the lowest among all possible configurations.
However, the other viewpoint maintains that the native structure is the structure that is
kinetically most accessible[3].
Nobel laureate Christian Anfinsen, a pioneer of the first viewpoint, proposed
a postulate that has been called the “second translation of the primary structure”[7]:
all the information required for folding into a functional three-dimensional
structure exists in the sequence of amino acids[8]. It was shown by Anfinsen
et al. that the unique secondary and tertiary structure of fully reduced ribonuclease
is thermodynamically the most stable configuration[9]. Later, based on the results of
his numerous denaturation-renaturation experiments, Anfinsen proposed one of the most
famous postulates in molecular biology, known as the “thermodynamic hypothesis” (or
Anfinsen’s dogma): The native configuration of a protein is associated with the global
minimum of the Gibbs free energy[10, 11], where the native configuration is unique,
stable and kinetically accessible. For some proteins, independent of the folding pathway,
native activity is thermodynamically reversible; thus, the native states are stable like
crystals and do not exhibit the long-lived metastable states characteristic of glasses, in which
the structure depends on the preparation history[12]. A recent study of 1018 known
protein folds provided strong evidence in support of Anfinsen’s idea that the native
folded structure of protein domains is dictated by the information encoded in the amino
acid sequence[13].
In contrast to Anfinsen, Levinthal argued that exploring all conformations of an average
protein would take thousands of years and, therefore, that there is not sufficient time
for the protein to find the global free energy minimum. Hence, readily accessible kinetic
pathways must determine the conformation of a protein[14, 15, 16]. In this viewpoint,
the native structure is kinetically trapped in a low-energy, long-lived configuration. In
his words, “if the final folded state turned out to be the one of lowest configurational
energy, it would be a consequence of biological evolution and not of physical
chemistry”[15], and he proposed that folding follows specific pathways[15]. The main
argument of Levinthal, known as “Levinthal’s paradox”, concerns the length of time
required for a protein to find its global free energy minimum among a very large number
of energetically accessible conformations. For example, if we assume that there are only
two possible conformations for each amino acid, a protein with only 100 amino acids has
on the order of 2^100 ≈ 10^30 different possible conformations[14]. The folding process
is therefore likened to searching for a needle in a haystack, or to hitting the hole on a
large, flat golf course by rolling a ball randomly[12].
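The conformation count in Levinthal's argument is easy to check. The Python sketch below counts the conformations of a chain whose residues each take one of `states_per_residue` states and converts that count into an exhaustive-search time. The function names and the sampling time of 10^-13 s per conformation are illustrative assumptions, not values used in this thesis.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformation_count(n_residues: int, states_per_residue: int = 2) -> int:
    """Number of chain conformations if every residue independently takes one of a few states."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_conformations: int, seconds_per_conformation: float) -> float:
    """Years needed to visit every conformation once, one after another."""
    return n_conformations * seconds_per_conformation / SECONDS_PER_YEAR


if __name__ == "__main__":
    n_conf = conformation_count(100)
    print(f"2^100 = {n_conf:.3e}  (log10 = {math.log10(n_conf):.2f})")
    # Hypothetical sampling time of 0.1 ps per conformation (assumption, not from the thesis).
    print(f"exhaustive search: {exhaustive_search_years(n_conf, 1e-13):.1e} years")
```

With the assumed sampling time, the exhaustive search takes roughly 4 × 10^9 years. Because the time scales linearly with the assumed rate, the exact figure matters less than the exponential growth of the count with chain length.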
Because of the folding time problem introduced by Levinthal, the second major question
of protein folding introduced in section 1.1 has been divided by Dill et al. into two
questions: the folding code and the folding speed[17]. The first deals with
the thermodynamic aspect of how the interatomic forces acting on the protein amino
acids form a native structure, and the second one concerns the kinetic aspect of how a
protein evolves quickly into a functional structure.
Some researchers have tried to connect these two viewpoints by identifying the features
required for rapid folding to the global free energy minimum. Using a lattice Monte Carlo
model, Shakhnovich and co-workers suggested that a sufficiently large energy difference
between the native configuration and other structures is a necessary and sufficient
condition for fast folding: if the energy of the native state is much lower than that of all
other metastable configurations, the protein folds rapidly and avoids becoming kinetically
trapped in locally stable configurations for long periods of time[18]. However, the
use of lattice models to study qualitative aspects of protein folding has been criticized on
the grounds that the configurational space in lattice models is relatively small compared
to real off-lattice proteins. The restricted number of configurations makes them a poor
model to resolve “Levinthal’s paradox”, which concerns the difficulty of finding a folded
state from a large number of conformations[16].
While some experiments suggest that small proteins follow the thermodynamic hypothesis
[16, 19], there are proteins for which the native structure is not the thermodynamically
most stable configuration[16]. It has been predicted that naturally occurring proteins
either fold to a configuration that is not a global minimum of the free energy or belong
to a subset of amino acid sequences that can fold to the global free energy minimum in a
reasonable time[14]. For example, the protein misfolding that occurs in many diseases,
such as Alzheimer's and Creutzfeldt-Jakob disease, has been attributed to folding into an
alternative lower-energy state that acts as a kinetic trap[8, 16].
Several experimental results indicate the existence of specific pathways along which a
given protein folds[20], supporting Levinthal’s claim[15]. According to these observations,
there are intermediates even for single-domain proteins that accumulate during the first
few milliseconds of folding, and even for proteins that seem to fold in a two-state reaction,
specific pathways have been characterized by protein intermediates[20]. However, there
is no comprehensive theoretical explanation for this behavior[3].
1.3 Energy Landscape
Generally, the term energy landscape refers to the shape and topology of the free energy
as a function of protein conformation, which is used to explain some aspects of folding
behavior[14]. Stable or metastable configurations of the system can be unambiguously
identified either through structural features or through basins in the potential energy.
Typically, the landscape of a complex system is covered with many local minima that can
lead to complicated thermodynamic and dynamical behavior[14, 21]. The configurational
space of a protein (the set of values of all degrees of freedom of the molecular system)
is highly multi-dimensional: even for small proteins, its dimensionality is on the order
of a thousand[22]. Within this high-dimensional space, the energy landscape (the free
energy as a function of protein configuration) features many local minima and entropic
barriers. In addition, the landscape can contain flat regions of nearly equal free energy,
which can trap the system for long periods under its normal dynamics and greatly delay
its arrival at the native structure. The free energy landscape of many proteins is
extremely rugged around the global minimum due to the relatively close packing of the
amino acids and their atoms in the native configuration[10].
Statistical mechanical modeling has helped significantly in answering Levinthal’s paradox
by postulating that folding happens in funnel-shaped energy landscapes that allow
multiple efficient folding pathways rather than involving a single microscopic pathway[17].
Onuchic, Dill, Wolynes and co-workers proposed that a “folding funnel” is the special
characteristic of foldable proteins that directs the folding protein into the native state
without the need for a definite pathway[12, 14, 23, 24, 25, 26, 27, 28]. Based on this idea,
the energy or free energy landscape of a protein, expressed as a function of its
conformation, is shaped like a funnel. Protein folding is viewed as a process in which
the protein glides down the funnel-shaped free energy landscape along several different
paths towards its native structure[25, 26, 28, 29]. Thus, the low free energy structures
lie at the bottom of a broad valley, and a protein molecule in a conformation within one
of the valleys can dynamically funnel to the lowest free energy state. A rough picture of
this funnel, made by Ken Dill et al., is presented in Fig. 1.1.
Figure 1.1: A rugged funnel-shaped energy landscape[27]
As mentioned in section 1.2, the Levinthal paradox can be presented as hitting the
hole on a large, flat golf course by rolling a ball randomly. However, if a golf course
has a funnel-shaped landscape, downhill everywhere towards the hole, a hole-in-one can
happen every time, in a reasonable time[12]. While Levinthal postulated that there should
be one specific pathway for folding[15], there are typically many possible pathways down
the funnel in a single-funnel landscape[30]. However, experimental results for fast-folding
proteins “reconciled” these two ideas by showing that there is a statistically
predominant pathway[31].
There are several proteins that do not have a single stable structure[30]. Examples
include mutants of E. coli, which Levinthal showed to have two stable forms[15], and
prions, which can occasionally misfold even in the absence of any mutation and
consequently cause neurodegeneration[32, 33]. Having more than one stable structure
under specific thermodynamic conditions suggests that the free energy landscape consists
not of a single funnel but of multiple funnels[30]. Based on this picture, it has been
suggested that folding first involves a kinetic step that selects a specific funnel, after
which the protein glides inside the selected funnel to reach its minimum free energy
point[3].
Folding happens as a diffusion-like phenomenon on this landscape of hills and valleys[34].
As a result, folding times are too long to be reached by almost all realistic computational
approaches[35, 22], and therefore most theoretical studies have been carried out using
relatively simple models such as lattice models.
1.4 Theoretical Studies of Protein Folding
The most common approaches to studying protein folding are Molecular Dynamics (MD)
and Monte Carlo simulations of lattice models of biomolecular systems. As discussed
in the next chapter, standard molecular dynamics simulations integrate Newton's
equations of motion by sequentially updating coordinates over brief time intervals,
typically on the order of a femtosecond.
Discontinuous Molecular Dynamics (DMD) is another common approach to study protein
folding. In the DMD approach, potentials are modeled as a series of discontinuous steps
between constant values of the interaction energy. Except for specific points where the
energy is discontinuous, the system is force free and the evolution equations can be solved
exactly. The effect of the discontinuities is to exert impulses that lead to discrete jumps in
the momenta of the system. Under many conditions, the system can be simulated quite
efficiently and propagated to longer times at a given computational load[36, 37, 35, 38].
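The force-free propagation between discontinuities can be sketched for a single pair of beads. Between events the relative position evolves ballistically, r(t) = r + v t, so the next event at a discontinuity distance σ occurs at the appropriate positive root of |r + v t|^2 = σ^2. At a hard-core contact, an impulse along the line of centers reverses the normal component of the relative velocity. This minimal Python sketch assumes equal masses and elastic hard-core collisions; the function names are hypothetical, and the energy-changing events at the edge of a well or shoulder (capture, escape or reflection) are not shown.

```python
import math
from typing import Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def time_to_distance(r: Vec, v: Vec, sigma: float, approaching: bool) -> Optional[float]:
    """Smallest t > 0 with |r + v t| = sigma for ballistic relative motion.

    approaching=True: the pair starts outside sigma and moves inward (e.g. core contact).
    approaching=False: the pair starts inside sigma and moves outward (e.g. leaving a well).
    Returns None if that distance is never reached.
    """
    b = dot(r, v)                  # r . v (negative when the beads approach)
    vv = dot(v, v)                 # |v|^2
    c = dot(r, r) - sigma ** 2     # > 0 outside sigma, < 0 inside
    if vv == 0.0:
        return None
    disc = b * b - vv * c
    if approaching:
        if c < 0.0 or b >= 0.0 or disc < 0.0:   # already inside, receding, or missing
            return None
        return (-b - math.sqrt(disc)) / vv
    if c > 0.0:                                 # not inside, so it cannot leave
        return None
    return (-b + math.sqrt(disc)) / vv          # disc >= 0 whenever c <= 0


def hard_core_bounce(r: Vec, v1: Vec, v2: Vec) -> Tuple[Vec, Vec]:
    """Elastic collision of two equal-mass beads at contact, with r = r1 - r2.

    The impulse acts along r and reverses the normal component of the relative
    velocity; total momentum and kinetic energy are conserved.
    """
    v = tuple(a - b for a, b in zip(v1, v2))
    factor = dot(r, v) / dot(r, r)
    v1_new = tuple(a - factor * x for a, x in zip(v1, r))
    v2_new = tuple(a + factor * x for a, x in zip(v2, r))
    return v1_new, v2_new
```

In a full event-driven code, such event times would be computed for every relevant pair (and for cell crossings), the earliest event executed, and only the affected predictions recomputed.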
In typical lattice models of biopolymers, amino acids are represented as single beads
that can only occupy sites on the underlying lattice. Typically this lattice is cubic, with
several different types of beads. The interaction potential energy is defined between
neighboring beads occupying adjacent lattice sites and depends on the bead types. One
of the most popular lattice models is the HP model, in which only two kinds of beads
are introduced to model hydrophobic (H) and polar (P) amino acids. Monte Carlo
simulation of lattice models is typically even simpler than simulation of discontinuous
potential models, since restricting beads to lattice sites gives a relatively small
configurational space that can be efficiently explored at low computational cost[35].
However, because of the limited set of possible bond angles, lattice models fail
to adequately describe the geometrical properties of proteins[35], and the entropy range
of configurations is greatly reduced. Lattice models also cannot adequately address
“Levinthal’s paradox” since, unlike in a real protein, almost all conformations are
kinetically accessible in lattice models[16]. To overcome these obstacles, models amenable
to DMD simulation, which can reach realistic time scales and mimic the basic
thermodynamic properties of proteins, have become popular[5, 35].
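To make the HP lattice model concrete, the sketch below evaluates its energy for a given conformation. It assumes a two-dimensional square lattice (rather than the cubic lattice mentioned above, purely for brevity) and the common convention that each non-bonded H-H pair on adjacent sites contributes -ε; the function name and ε = 1 are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Site = Tuple[int, int]


def hp_energy(sequence: str, coords: List[Site], epsilon: float = 1.0) -> float:
    """Energy of an HP chain on a 2D square lattice.

    sequence: string of 'H' (hydrophobic) and 'P' (polar) beads.
    coords:   lattice site of each bead; consecutive beads must be nearest neighbors.
    Each H-H pair on adjacent sites that is not bonded along the chain adds -epsilon.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("bonded beads must occupy adjacent lattice sites")

    index: Dict[Site, int] = {site: i for i, site in enumerate(coords)}
    energy = 0.0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Check only the +x and +y neighbors so that each contact is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index.get(neighbor)
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= epsilon
    return energy
```

For example, the square conformation [(0, 0), (1, 0), (1, 1), (0, 1)] of the sequence "HPPH" has one H-H contact between the chain ends and therefore an energy of -ε. Because every conformation is a finite set of lattice sites, such energies are cheap to evaluate, which is the low computational cost referred to above.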
1.4.1 Using DMD to Study Protein Folding
DMD of biomolecular systems is based on simple models in which detailed, smoothly
varying interaction potentials between constituents of the system are replaced with discontinuous
potentials of stepped form. Such models can be designed to capture the
qualitative behavior of proteins at low computational cost. The potentials must
consist of flat regions with different potential values. For instance, attraction and
repulsion can be defined as step and shoulder potentials, respectively.
These stepped forms also make it possible to use the potentials as natural index
functions for use in classifying and comparing different protein configurations. As we
will show, such a classification scheme has the added benefit of temperature-independent
relative configurational entropies in the absence of a solvent. This is quite different
from models for which MD may be used. There, the classification of structures is less
natural, since structures are identified based on arbitrary critical distances that are not
singled out by the interaction potentials. In addition, the relative configurational
entropies are (at least somewhat) temperature dependent.
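The idea of using stepped potentials as index functions can be sketched as follows. A square-well pair potential is flat inside each step, so the only structural information it carries is which pairs lie inside the well; a conformation can therefore be labeled by that set of "bonded" pairs. The well parameters, the rule that only pairs at least three beads apart along the chain count, and all function names are hypothetical choices for illustration, not the model defined in Chapter 3.

```python
import itertools
import math
from typing import FrozenSet, List, Tuple

Point = Tuple[float, float, float]


def square_well(r: float, sigma_core: float = 1.0, sigma_well: float = 1.5,
                depth: float = 1.0) -> float:
    """Stepped pair potential: hard core, flat attractive well of -depth, zero outside."""
    if r < sigma_core:
        return math.inf
    if r < sigma_well:
        return -depth
    return 0.0


def configuration_key(positions: List[Point], sigma_well: float = 1.5,
                      min_separation: int = 3) -> FrozenSet[Tuple[int, int]]:
    """Label a conformation by the set of bead pairs that sit inside the well ("bonds").

    Pairs fewer than `min_separation` apart along the chain are skipped, a
    hypothetical rule meant to keep only non-local contacts.
    """
    bonds = set()
    for i, j in itertools.combinations(range(len(positions)), 2):
        if j - i >= min_separation and math.dist(positions[i], positions[j]) < sigma_well:
            bonds.add((i, j))
    return frozenset(bonds)


def energy_from_key(key: FrozenSet[Tuple[int, int]], depth: float = 1.0) -> float:
    """Energy of the keyed contacts only.

    Because the well is flat, it depends on the bond set, not on where inside
    the well each pair happens to sit.
    """
    return -depth * len(key)
```

Using a `frozenset` makes the key hashable, so configurations can be counted directly in a dictionary during a simulation; no arbitrary cutoff beyond the potential's own step distance is needed.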
The collapsed phase of a protein-like chain can be studied using simple DMD dynamics,
starting from a random unfolded structure at a very high temperature and decreasing
the temperature in several steps during the simulation until a specific temperature
below room temperature is reached. While the configuration obtained at the end of this
annealing process is in a collapsed phase, it is not guaranteed that this is the native
structure of the protein-like chain, since it can be a metastable local energy minimum.
If the free energy landscape is rugged, the latter would actually be more likely. In order
to distinguish between compact configurations and the “folded” native structure at the
global minimum of the free energy, it is therefore necessary to study the free energy
landscape.
As will be discussed in Chapter 2, the free energy landscape can be investigated
using a Hybrid Monte Carlo (HMC) method. Usually, HMC is implemented
as a combination of the Monte Carlo and MD methods. Here, we combine a Monte Carlo
procedure with a dynamical updating scheme based on DMD.
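The HMC cycle can be sketched for a single degree of freedom. The sketch below is the standard continuous-potential version, not the DMD-based scheme used in this work: it assumes a harmonic potential, unit mass and a leapfrog integrator, and all names are illustrative. Each update draws a fresh momentum from the Maxwell-Boltzmann distribution, runs a short trajectory, and accepts the end point with probability min(1, exp(-βΔH)), where ΔH is the integration error in the total energy.

```python
import math
import random


def potential(x: float) -> float:
    """Illustrative harmonic well U(x) = x^2 / 2 (unit mass and spring constant)."""
    return 0.5 * x * x


def force(x: float) -> float:
    return -x


def hmc_step(x: float, beta: float, n_steps: int, dt: float, rng: random.Random) -> float:
    """One HMC update: fresh momentum, leapfrog trajectory, Metropolis test on H."""
    p = rng.gauss(0.0, 1.0 / math.sqrt(beta))      # Maxwell-Boltzmann momentum
    h_old = potential(x) + 0.5 * p * p
    x_new, p_new = x, p
    p_new += 0.5 * dt * force(x_new)               # leapfrog: initial half kick
    for step in range(n_steps):
        x_new += dt * p_new                        # drift
        if step < n_steps - 1:
            p_new += dt * force(x_new)             # full kick between drifts
    p_new += 0.5 * dt * force(x_new)               # final half kick
    h_new = potential(x_new) + 0.5 * p_new * p_new
    if rng.random() < math.exp(min(0.0, -beta * (h_new - h_old))):
        return x_new                               # accept the trajectory end point
    return x                                       # reject: keep the old configuration


def sample_mean_x2(beta: float, n_samples: int, seed: int = 0) -> float:
    """Average of x^2 along an HMC chain; for this potential it should approach 1/beta."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(n_samples):
        x = hmc_step(x, beta, n_steps=10, dt=0.2, rng=rng)
        total += x * x
    return total / n_samples
```

The leapfrog integrator is used because it is time-reversible and volume-preserving, which is what makes this acceptance rule satisfy detailed balance. An integrator that conserved H exactly would give ΔH = 0, so every trajectory would be accepted.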
1.5 Role of Solvent
There is no consensus on how significant the role of the solvent is for protein folding,
or what interaction or set of interactions play the main role[39]. Some researchers
have proposed that folding is governed by a balance between entropy-dominated and
enthalpy-dominated hydration[40, 41], while experiments have shown that a protein can
fold into its native configuration with apparently negligible solvent ordering effects[39].
However, in nature and therefore in almost all experimental studies, folding occurs
in the presence of a fluid environment (in vitro or in vivo). Some experimental studies
suggest that “a significant portion of the fold-dictating information is encoded by the
atomic interaction network in the solvent-unexposed core of protein domains”[13].
To obtain a useful description of the energy landscape of a folding protein, the free
energy should be averaged over the solvent coordinates, so that the energy landscape
becomes a function of the protein atom coordinates only[28, 14].
1.6 Thesis Outline
In Chapter 2, some of the simulation techniques used to explore the free energy landscape
of protein-like systems, such as DMD (Sec. 2.2) and the Parallel Tempering (PT) method
(Sec. 2.3), will be introduced. It will be shown that object-oriented programming as well
as parallel programming can easily be implemented for the PT method.
In Chapter 3, studies of the energy landscape of a protein-like chain in the absence
of any fluid will be presented. To capture the basic behavior of proteins in a reasonable
computational time, the interactions in these models are represented by discontinuous
potentials, where attraction and repulsion are defined as step and shoulder potentials,
respectively. Using a family of such simple protein models, each consisting of a periodic
sequence of four different kinds of bead, it will be shown that these protein-like chains
exhibit an alpha-helical secondary structure in their folded states. It will also be shown
that in these cases the relative configurational entropies of the protein-like chains are
independent of temperature, which makes it possible to compute the relative configurational entropies
and the free energies of the configurations very accurately. Relative configurational free
energies at different temperatures can be determined from relative populations at those
temperatures. The free energy results can be interpreted in terms of the free energy
landscape picture. For example, if at a specific temperature the population of the most
common structure is around 99% of the total population, this is a sign of a deep free
energy valley in the landscape belonging to that particular structure at that temperature.
Such understanding of the free energy landscape is the main objective of this work.
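The link between populations and free energies described above can be sketched in a few lines. If configuration c is observed with probability P_c at temperature T, then F_c - F_ref = -k_B T ln(P_c/P_ref). If, in addition, each configuration is assumed to have a single potential energy E_c, the relative entropy follows as S_c - S_ref = [(E_c - E_ref) - (F_c - F_ref)]/T. The sketch assumes reduced units with k_B = 1, and the two configurations in the usage block use made-up energies and entropies; it only illustrates that populations generated from temperature-independent entropies return the same relative entropy at every temperature.

```python
import math
from typing import Dict

K_B = 1.0  # reduced units (assumption): energies measured in units where k_B = 1


def relative_free_energies(populations: Dict[str, float], temperature: float,
                           reference: str) -> Dict[str, float]:
    """F_c - F_ref = -k_B T ln(P_c / P_ref) from observed populations at one temperature."""
    p_ref = populations[reference]
    return {c: -K_B * temperature * math.log(p / p_ref) for c, p in populations.items()}


def relative_entropies(populations: Dict[str, float], energies: Dict[str, float],
                       temperature: float, reference: str) -> Dict[str, float]:
    """S_c - S_ref = [(E_c - E_ref) - (F_c - F_ref)] / T, assuming one energy per configuration."""
    delta_f = relative_free_energies(populations, temperature, reference)
    return {c: ((energies[c] - energies[reference]) - delta_f[c]) / temperature
            for c in populations}


def boltzmann_populations(energies: Dict[str, float], entropies: Dict[str, float],
                          temperature: float) -> Dict[str, float]:
    """Populations implied by fixed (E_c, S_c): P_c proportional to exp(S_c/k_B - E_c/(k_B T))."""
    weights = {c: math.exp(entropies[c] / K_B - energies[c] / (K_B * temperature))
               for c in energies}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


if __name__ == "__main__":
    # Made-up energies and entropies for two hypothetical configurations.
    E = {"helix": -10.0, "coil": 0.0}
    S = {"helix": 0.0, "coil": 5.0}
    for T in (1.0, 2.0):
        P = boltzmann_populations(E, S, T)
        print(T, relative_entropies(P, E, T, reference="helix")["coil"])
```

In the same units, a structure holding 99% of the population lies k_B T ln 99 ≈ 4.6 k_B T below the combined remaining 1%.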
In Chapter 4, the free energy landscape of a protein-like chain in the presence of a
square-well fluid is computed and contrasted with that of the unsolvated system. All interactions
in the system are defined in terms of discontinuous potentials. Similar to the previous
chapter, the investigation of the free energy landscape is done using a Hybrid Monte Carlo
(HMC) method, where HMC is implemented as a combination of the Monte Carlo and the
DMD method. The Parallel Tempering (PT) method [42, 43, 44] is used for the Monte
Carlo part to avoid getting trapped in local free energy minima and to increase the speed
of phase space exploration[45]. The parallel tempering method allows configurations to
be generated with weights given by the canonical ensemble over a range of temperatures.
It will be discussed how the relative configurational entropies in the presence of the fluid
particles can become temperature dependent. The phase of the solvent used will be studied
and compared with previous studies. It will be shown that the existence of a phase transition
can have a large impact on the efficiency and usefulness of the PT sampling approach.
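The temperature-swap move at the heart of parallel tempering can be sketched as follows. Replicas run at inverse temperatures β_i, and an exchange of configurations between replicas i and j is accepted with the standard probability min{1, exp[(β_i - β_j)(E_i - E_j)]}, which preserves the canonical distribution at every temperature. Only neighboring temperatures are paired here; the function names and the random choice between even and odd pairs are illustrative assumptions rather than the scheme described in Chapter 2.

```python
import math
import random
from typing import List


def swap_acceptance(beta_i: float, beta_j: float, energy_i: float, energy_j: float) -> float:
    """Metropolis probability for exchanging configurations between temperatures i and j."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_neighbor_swaps(betas: List[float], energies: List[float],
                           configs: List[object], rng: random.Random) -> int:
    """Try swaps between adjacent temperatures; configs and energies are permuted in place.

    Returns the number of accepted swaps.
    """
    accepted = 0
    start = rng.randrange(2)          # randomly pick the even or the odd set of pairs
    for i in range(start, len(betas) - 1, 2):
        j = i + 1
        if rng.random() < swap_acceptance(betas[i], betas[j], energies[i], energies[j]):
            configs[i], configs[j] = configs[j], configs[i]
            energies[i], energies[j] = energies[j], energies[i]
            accepted += 1
    return accepted
```

A swap that moves the lower-energy configuration to the colder replica is always accepted; the reverse move is accepted with an exponentially small probability. The acceptance rate therefore depends on how much the energy distributions at neighboring temperatures overlap, which a phase transition can sharply reduce.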
In Chapter 5, we examine a simple model of the dynamics of the protein-like chain studied
in the previous chapters (Model B) in the presence of an implicit solvent. The model
assumes that the implicit solvent interacts on a fast time scale with the chain beads
compared to the time scale for structural rearrangements of the chain, and the implicit
solvent provides sufficient friction so that the motion of all beads in the protein-like
chain is governed by the Smoluchowski equation. It will be shown through simulation
of a stochastic model of the evolution of the system that the dynamics of transitions
between microstates of the chain is well described by the first-passage time solution of
the Smoluchowski equation. The individual rates between microstates are incorporated
into a Markovian model of the relaxation of the chain. Using this model, the equili-
bration process from an ensemble of initially extended configurations to mainly folded
configurations is investigated at low effective temperatures for a number of different chain
lengths.
Finally, in Chapter 6, the conclusion will be given based on the results of the previous
chapters.
Chapter 2
Simulation Techniques
2.1 Different Simulation Techniques
The content of this section is a brief introduction to different simulation techniques
that are used in protein folding studies. These methods are discussed extensively in
refs. [36, 46, 47, 48].
Monte Carlo (MC) and Molecular Dynamics (MD) are the two most common approaches
for numerical studies of many-particle systems. Both methods simulate a model of the
system under controlled approximations in order to compute properties that are infeasible
to calculate analytically.
2.1.1 Monte Carlo Method
Monte Carlo methods are a class of computational algorithms that estimate a quantity
by repeated random sampling. For example, Monte Carlo sampling can be used to compute
a simple integral:
$$ I = \int_a^b f(x)\,dx = |b-a|\,\langle f(x)\rangle \approx |b-a|\,\frac{1}{N}\sum_{i=1}^{N} f(x_i), \tag{2.1} $$
Chapter 2. Simulation Techniques 14
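where $N$ is the number of random samples and the $x_i$ are drawn uniformly from the
interval $[a, b]$. As $N \to \infty$, the estimate on the right-hand side of Eq. 2.1 converges
to $I$, with a statistical error that decreases as $N^{-1/2}$.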
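As a minimal sketch of Eq. 2.1 (written in Python for illustration; the thesis itself gives no code), the integral of $\sin x$ over $[0, \pi]$, whose exact value is 2, can be estimated from uniform random samples. The function name `mc_integral` and the sample size are hypothetical choices.

```python
import math
import random


def mc_integral(f, a, b, n, rng):
    """Estimate the integral of f over [a, b] from n uniform samples (Eq. 2.1)."""
    total = 0.0
    for _ in range(n):
        total += f(rng.uniform(a, b))
    # |b - a| times the sample mean of f approximates the integral
    return abs(b - a) * total / n


rng = random.Random(1)
estimate = mc_integral(math.sin, 0.0, math.pi, 200_000, rng)  # close to 2
```

Quadrupling the number of samples roughly halves the statistical error, in line with the $N^{-1/2}$ convergence.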
In many problems of interest in statistical physics, where systems have many degrees of
freedom, Monte Carlo methods can be applied to compute the high-dimensional integrals
that correspond to ensemble averages. For the canonical ensemble, where the number of
particles, the volume, and the temperature are fixed, the equilibrium average of an
observable $F$ can be expressed in terms of configuration-space integrals as:
$$ \langle F\rangle_T = \frac{1}{Z}\int F(\mathbf{r}^N)\,\exp[-U(\mathbf{r}^N)/k_BT]\,d\mathbf{r}^N, \tag{2.2} $$
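where $\mathbf{r}^N = (\mathbf{r}_1, \ldots, \mathbf{r}_N)$ denotes the coordinates of the $N$ particles,
$U(\mathbf{r}^N)$ is the potential energy, $T$ is the temperature, $k_B$ is Boltzmann's constant,
and $Z$ is the configurational integral: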
$$ Z = \int \exp[-U(\mathbf{r}^N)/k_BT]\,d\mathbf{r}^N. \tag{2.3} $$
Monte Carlo sampling is most efficient if each sample is chosen according to the
probability weight
$$ w(\mathbf{r}^N) = \exp[-U(\mathbf{r}^N)/k_BT], \tag{2.4} $$
which is the Boltzmann factor. Eq. 2.2 can then be re-expressed as:
$$ \langle F\rangle_T = \frac{1}{Z}\int F(\mathbf{r}^N)\,\frac{\exp[-U(\mathbf{r}^N)/k_BT]}{w(\mathbf{r}^N)}\,w(\mathbf{r}^N)\,d\mathbf{r}^N, \tag{2.5} $$
and consequently:
$$ \langle F\rangle_T = \frac{1}{Z}\int F(\mathbf{r}^N)\,w(\mathbf{r}^N)\,d\mathbf{r}^N. \tag{2.6} $$
If $m$ points are generated randomly in configuration space with probability density
$w(\mathbf{r}^N)/Z$, that is, according to the weight function of Eq. 2.4, Eq. 2.6 can be written as:
$$ \langle F\rangle_T \approx \frac{1}{m}\sum_{i=1}^{m} F(\mathbf{r}^N_i), \tag{2.7} $$
where the right-hand side of Eq. 2.7 becomes a better approximation of $\langle F\rangle_T$ as $m$ increases.
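The sketch below illustrates Eq. 2.7 for a one-dimensional harmonic potential $U(x) = kx^2/2$, chosen here only because its normalized Boltzmann weight is a Gaussian that can be sampled directly; for this potential $\langle x^2\rangle_T = k_BT/k$. A reference value is obtained from Eqs. 2.2 and 2.3 by simple quadrature on a grid. All parameter values are illustrative assumptions.

```python
import math
import random

k_spring = 2.0  # force constant of the illustrative potential U(x) = k x^2 / 2
kT = 0.5        # k_B T in the same energy units


def U(x):
    return 0.5 * k_spring * x * x


def F(x):
    return x * x  # observable whose canonical average is wanted


# Reference value from Eqs. 2.2-2.3: ratio of two integrals on a fine grid
grid = [-10.0 + 0.001 * i for i in range(20001)]
weights = [math.exp(-U(x) / kT) for x in grid]
reference = sum(F(x) * w for x, w in zip(grid, weights)) / sum(weights)

# Eq. 2.7: for this quadratic U the normalized weight w/Z is a Gaussian with
# variance kT / k_spring, so configurations can be drawn from it directly.
rng = random.Random(0)
sigma = math.sqrt(kT / k_spring)
m = 100_000
estimate = sum(F(rng.gauss(0.0, sigma)) for _ in range(m)) / m
```

Both numbers approach $k_BT/k = 0.25$. For realistic many-particle potentials the weight cannot be sampled directly, which is what motivates the Markov chain approach described next.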
Markov chain Monte Carlo (MCMC)
Assuming that the Hamiltonian of the system can be written as $H = \sum_{i=1}^{N} p_i^2/2m_i + U(\mathbf{r}^N)$,
the kinetic energy separates from the potential energy, so the probability density factors
into a density for the spatial degrees of freedom and a Maxwell-Boltzmann density for the
momenta. The ensemble average of a property $F(\mathbf{r}^N)$ that depends only on the spatial
degrees of freedom is therefore given by Eq. 2.2.
Considering Eqs. 2.6 and 2.7, a Monte Carlo procedure can be implemented as a random
walk in which a particular point $\mathbf{r}^N$ is visited with a probability density proportional to
the Boltzmann factor, $\exp[-\beta U(\mathbf{r}^N)]$, where $\beta = 1/k_BT$. The main task is therefore to
generate a sequence of configurations $\mathbf{r}^N_1, \mathbf{r}^N_2, \ldots, \mathbf{r}^N_m$ in which, as $m \to \infty$, the
probability of finding configuration $\mathbf{r}^N_i$ is proportional to $\exp[-U(\mathbf{r}^N_i)/k_BT]\,d\mathbf{r}^N_i$. Such a
sequence can be generated stochastically in many different ways. The following three-step
scheme was introduced by Metropolis et al. [49]:
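1. Select an initial configuration $\mathbf{r}^N$ and compute its energy, $U(\mathbf{r}^N)$.
2. Displace the system slightly in configuration space from $\mathbf{r}^N$ to a trial configuration
$\mathbf{r}'^N$, using a symmetric trial move, and calculate the new energy, $U(\mathbf{r}'^N)$.
3. Accept the move $\mathbf{r}^N \to \mathbf{r}'^N$ with probability $\min(1, \exp(-\beta[U(\mathbf{r}'^N) - U(\mathbf{r}^N)]))$. If the
move is rejected, the old configuration is kept and counted again in the sequence. Steps 2
and 3 are then repeated.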
For suitable trial moves, every point in configuration space can be reached from any other
in a finite number of MC steps [47], which is required for the sampled distribution to converge
to the Boltzmann distribution. Later in this chapter, the Parallel Tempering method, one of
the common MCMC methods, will be introduced.
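The three steps above can be sketched for a single coordinate in the illustrative potential $U(x) = x^2/2$ at $\beta = 2$, for which the exact result is $\langle x^2\rangle = 1/\beta = 0.5$. The uniform trial displacement and its width are assumptions; any symmetric move would do.

```python
import math
import random


def metropolis(U, x0, beta, step, n_steps, rng):
    """Metropolis sampling of a single coordinate x with weight exp(-beta U(x))."""
    x, u = x0, U(x0)  # step 1: initial configuration and its energy
    samples = []
    for _ in range(n_steps):
        x_trial = x + rng.uniform(-step, step)  # step 2: symmetric trial move
        u_trial = U(x_trial)
        du = u_trial - u
        # step 3: accept with probability min(1, exp(-beta * dU))
        if du <= 0.0 or rng.random() < math.exp(-beta * du):
            x, u = x_trial, u_trial
        samples.append(x)  # after a rejection the old state is counted again
    return samples


beta = 2.0
samples = metropolis(lambda x: 0.5 * x * x, 0.0, beta, 1.0, 200_000, random.Random(42))
mean_x2 = sum(x * x for x in samples) / len(samples)  # exact value: 1 / beta = 0.5
```

Consecutive samples are correlated, so many more steps are needed than for the independent samples of Eq. 2.7 to reach the same accuracy.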
2.1.2 Molecular Dynamics (MD)
Molecular dynamics is a very common approach for studying the properties of classical
many-body systems. In this approach, the equations of motion of all particles are integrated
numerically using short time steps, which yields approximate trajectories. For most classical
systems, the particle positions are updated according to Newton's equations of motion:
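$$ \dot{\mathbf{r}}_i = \frac{\mathbf{p}_i}{m}, \qquad \dot{\mathbf{p}}_i = \mathbf{F}_i = -\nabla_{\mathbf{r}_i}\sum_{j\neq i} U(r_{ij}). \tag{2.8} $$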
This means that the position of a particle at time $t + \Delta t$ can be estimated using the
Taylor expansion:
$$ x(t+\Delta t) = x(t) + \dot{x}(t)\,\Delta t + \tfrac{1}{2}\ddot{x}(t)\,\Delta t^2 + \cdots = x(t) + v_x(t)\,\Delta t + O(\Delta t^2), \tag{2.9} $$
where $x$ is one of the three Cartesian components of the position vector $\mathbf{r}$, and $O(\Delta t^2)$ is
the local truncation error. Therefore:
$$ x(t+\Delta t) \approx x(t) + v_x(t)\,\Delta t, \tag{2.10} $$
which is known as the “Euler scheme”. To update $v_x(t)$ in Eq. 2.10, the same technique
can be applied, giving:
$$ v_x(t+\Delta t) \approx v_x(t) - \frac{1}{m}\frac{\partial U(t)}{\partial x(t)}\,\Delta t. \tag{2.11} $$
The simulation proceeds iteratively: at each step the forces, and hence the accelerations,
are computed, and the velocities and positions are updated according to Eqs. 2.11 and 2.10.
This process is repeated until the target time is reached. Other integration schemes, such
as the leapfrog and Verlet algorithms, can be applied instead of Eqs. 2.10 and 2.11; they
have a smaller global error over the whole run.
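To make the drawback of Eqs. 2.10 and 2.11 concrete, the sketch below applies them to a unit-mass harmonic oscillator with angular frequency $\omega$ (an illustrative system, not the protein model). For this system one Euler step multiplies the energy by exactly $1 + \omega^2\Delta t^2$, so the energy grows without bound.

```python
def euler_energies(x, v, omega, dt, n_steps):
    """Integrate a unit-mass harmonic oscillator with Eqs. 2.10-2.11; return the energy at each step."""
    energies = [0.5 * (v * v + (omega * x) ** 2)]
    for _ in range(n_steps):
        a = -omega**2 * x  # acceleration -(1/m) dU/dx evaluated at time t
        x, v = x + v * dt, v + a * dt  # both updates use values at time t
        energies.append(0.5 * (v * v + (omega * x) ** 2))
    return energies


dt = 0.05
E = euler_energies(1.0, 0.0, 1.0, dt, 1000)
growth = E[-1] / E[0]  # equals (1 + dt**2) ** 1000 up to round-off, about 12
```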
To derive the Verlet method, the first four terms of the Taylor expansion in Eq. 2.9 are
kept, so that $x(t + \Delta t)$ is:
$$ x(t+\Delta t) = x(t) + v_x(t)\,\Delta t + \tfrac{1}{2}a_x(t)\,\Delta t^2 + \tfrac{1}{6}b_x(t)\,\Delta t^3 + O(\Delta t^4), \tag{2.12} $$
where $a_x(t)$ and $b_x(t)$ are the acceleration and the jerk (the third derivative of $x$ with
respect to $t$) at time $t$. Expanding backward in time in the same way gives $x(t - \Delta t)$ as:
$$ x(t-\Delta t) = x(t) - v_x(t)\,\Delta t + \tfrac{1}{2}a_x(t)\,\Delta t^2 - \tfrac{1}{6}b_x(t)\,\Delta t^3 + O(\Delta t^4). \tag{2.13} $$
Adding these two expansions (Eqs. 2.12 and 2.13) gives the Verlet formula:
$$ x(t+\Delta t) = 2x(t) - x(t-\Delta t) + a_x(t)\,\Delta t^2 + O(\Delta t^4). \tag{2.14} $$
Because the first- and third-order terms of the Taylor expansion cancel, the Verlet integrator
is more accurate than the Euler scheme of Eq. 2.10.
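Applied to a unit-mass harmonic oscillator (again an illustrative assumption), Eq. 2.14 keeps the energy bounded over many steps. Since Eq. 2.14 does not propagate velocities, this sketch estimates them by the central difference $[x(t+\Delta t) - x(t-\Delta t)]/2\Delta t$, and it bootstraps $x(\Delta t)$ from Eq. 2.12 truncated at second order.

```python
def verlet_positions(x0, v0, omega, dt, n_steps):
    """Position Verlet (Eq. 2.14) for a unit-mass harmonic oscillator."""
    def accel(x):
        return -omega**2 * x

    # Bootstrap x(dt) from the Taylor expansion truncated at second order
    xs = [x0, x0 + v0 * dt + 0.5 * accel(x0) * dt**2]
    for _ in range(n_steps - 1):
        xs.append(2 * xs[-1] - xs[-2] + accel(xs[-1]) * dt**2)
    return xs


def energies(xs, omega, dt):
    """Energy at interior points, using the central-difference velocity."""
    out = []
    for n in range(1, len(xs) - 1):
        v = (xs[n + 1] - xs[n - 1]) / (2 * dt)
        out.append(0.5 * (v * v + (omega * xs[n]) ** 2))
    return out


dt = 0.05
xs = verlet_positions(1.0, 0.0, 1.0, dt, 20_000)
E = energies(xs, 1.0, dt)
E0 = 0.5  # exact initial energy for x0 = 1, v0 = 0
max_rel_dev = max(abs(e - E0) / E0 for e in E)  # stays below 1e-3, with no drift
```

With the same time step, 20,000 Euler steps would inflate the energy by many orders of magnitude.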
According to the ergodic hypothesis, the time a dynamical trajectory spends in any region
of a constant-energy surface of phase space is proportional to the phase-space volume of
that region. Consequently, the time average of a dynamical variable along a trajectory
converges to the uniform average of that variable over the constant-energy surface as the
length of the trajectory tends to infinity. The average over the constant-energy hypersurface
of phase space is known as the microcanonical ensemble average. While the ergodic hypothesis
applies to microcanonical ensemble averages, averages in the canonical ensemble can be shown
to be equivalent to microcanonical averages over an extended phase space in which auxiliary
variables act as a thermostat that fixes the temperature. Furthermore, the averages of most
dynamical variables in the canonical ensemble differ from their microcanonical counterparts
by terms of at most order $N^{-1}$, which become insignificant for large systems.
Because the Hamiltonian does not depend explicitly on time, the exact dynamics conserves
the total energy of the system, $H = \sum_i p_i^2/2m + U(\mathbf{r}^N)$. In typical MD algorithms, however,
the energy tends to drift because of the finite time step and numerical round-off error.
When MD is applied over long time scales or used to sample phase space, the run should
therefore be monitored to ensure that the energy of the system is stable and the trajectory
is reasonable. For the Verlet algorithm (Eq. 2.14), it can be shown that, although the energy
fluctuates from step to step, there is no systematic energy drift: the algorithm closely
conserves a shadow Hamiltonian that is slightly different from the real Hamiltonian, so the
true energy oscillates around a constant value [50, 51]. However, as with other MD methods,
the Verlet algorithm does not generate accurate trajectories for large time increments and
can become unstable.
Although the computational cost increases significantly as the time step is reduced, a very
small time step (on the order of a femtosecond) is necessary for sufficient accuracy. The most
efficient MD algorithm is therefore the one that allows the largest possible time step for a
given level of accuracy while maintaining stability and preserving conservation laws.
The interaction potentials between particles can be either detailed models, such as the
Lennard-Jones potential, or coarse-grained models, such as square-well potentials. More
detailed and accurate potentials make the dynamics more realistic, but they increase the
cost of the simulation significantly. When the potentials are discontinuous, an alternative
approach known as discontinuous molecular dynamics (DMD) can be applied, which produces
a trajectory at a lower computational cost than standard MD methods would for a trajectory
of the same length. In DMD the dynamics is essentially exact, and energy conservation does
not depend on the choice of a time step.
2.1.3 Hybrid Monte Carlo
Traditional MC simulation methods may suffer from strong correlation between the states
generated in two consecutive MC steps, which differ only slightly [47]. Consequently, MC
alone can lead to slow convergence of estimated ensemble averages, particularly for systems
whose underlying energy landscape is rough and pitted with many local minima and saddle
points. Simulation methods that combine Monte Carlo sampling with molecular dynamics
are called Hybrid Monte Carlo (HMC) methods; in HMC, a dynamical procedure is used to
change a set of coordinates, which then serves as the trial state of an MC step [52].
The HMC algorithm consists of selecting a momentum conjugate to each spatial coordinate
from the Maxwell-Boltzmann density, propagating the dynamical system according to some
effective Hamiltonian, and then applying an acceptance test to determine whether the trial
state obtained at the end of the propagation is accepted. The dynamics must be time-reversible
and must preserve the volume of the relevant phase space. A trajectory is accepted with
probability $\min(1, \exp(-\beta\Delta H))$, where $\Delta H = H_{\text{final}} - H_{\text{initial}}$ is the change in the
total Hamiltonian between the initial and final states. If the energy of the system is
conserved exactly, as in the DMD trajectories discussed in the next section, all trajectories
are accepted.
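The HMC steps above can be sketched for one coordinate in the illustrative potential $U(x) = x^2/2$. Note that this sketch swaps in the standard leapfrog integrator for the trial trajectory rather than DMD, so $\Delta H$ is small but nonzero and the acceptance test matters; the step size, trajectory length, and function names are assumptions.

```python
import math
import random


def hmc(U, grad_U, x0, beta, mass, dt, n_leap, n_iter, rng):
    """Hybrid Monte Carlo for one coordinate with a leapfrog trial trajectory."""
    x = x0
    samples = []
    for _ in range(n_iter):
        # Momentum drawn from the Maxwell-Boltzmann density at inverse temperature beta
        p = rng.gauss(0.0, math.sqrt(mass / beta))
        h_old = U(x) + p * p / (2.0 * mass)
        # Leapfrog: half kick, alternating drifts and kicks, final half kick
        x_new = x
        p_new = p - 0.5 * dt * grad_U(x_new)
        for step in range(n_leap):
            x_new += dt * p_new / mass
            if step < n_leap - 1:
                p_new -= dt * grad_U(x_new)
        p_new -= 0.5 * dt * grad_U(x_new)
        h_new = U(x_new) + p_new * p_new / (2.0 * mass)
        # Accept the end point with probability min(1, exp(-beta * dH))
        dh = h_new - h_old
        if dh <= 0.0 or rng.random() < math.exp(-beta * dh):
            x = x_new
        samples.append(x)
    return samples


samples = hmc(lambda x: 0.5 * x * x, lambda x: x, 0.0, 1.0, 1.0, 0.2, 10, 20_000, random.Random(7))
mean_x2 = sum(x * x for x in samples) / len(samples)  # exact value: 1 / beta = 1
```

If the integrator conserved $H$ exactly, as DMD does, `dh` would be zero and every trajectory would be accepted.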
In this project, HMC is applied to sample the energy landscape of a protein-like chain:
the Monte Carlo sampling is done using Parallel Tempering (PT), and trial configurations
are generated by Discontinuous Molecular Dynamics (DMD). Both DMD and PT are
discussed in detail in the following sections.
2.2 Discontinuous Molecular Dynamics (DMD)
The content of this section is a brief introduction to the DMD simulation method based
on the extensive discussions provided in ref. [36].
DMD is a version of MD in which the potentials are discontinuous and the system
evolves event by event rather than by iteration of a fixed time step. Because the potentials
are piecewise constant, no forces act between discontinuities: particles move in straight
lines, and Newton's equations can be solved exactly. Since the system evolves from one
event to the next instead of being propagated over discrete time intervals, the method is
also called an “event-driven” method. DMD has several advantages over MD, especially
for relatively long trajectories: it is computationally faster, and it is stable, since the total
energy is conserved at every event. In its simplest form, a DMD simulation iterates two
steps: predicting the earliest collision event and advancing the system up to the time of
that collision.
Since one of the main goals of using DMD is to decrease the simulation cost, a
number of techniques are associated with this method to obtain an optimal computational
performance.
2.2.1 Event Tree
In event driven dynamics, collisions between particles and other events must be executed
in their proper chronological order. An important component of such a simulation is the
storage and ordering of events to simplify the search for the next event to be executed.
A number of algorithms developed for database systems, such as binary trees, are useful
to optimize performance in event driven simulations.
A binary tree is used to store and sort events such as collisions and measurements,
with the event time as the ordering “key”. A few functions are responsible for inserting
newly predicted events into the tree and for finding the earliest event. Inserting a new
event scales as $\log N$ with a small prefactor, where $N$ is the number of events in the tree
[36]. Each node of the tree contains essential information such as the event participants,
the event time, and the event type. Some of this information is used to determine whether
an event in the tree has been invalidated by the occurrence of an earlier event. Because of
the tree structure, events are easy to add or delete. Since some previously predicted events
become invalid after each collision, several future events are predicted and stored in the
tree for each particle. After each event is executed, new events are scheduled for its
participant(s). Unlike in molecular dynamics, a particle's position is updated only when an
event involving that particle is executed. Consequently, except at measurement events,
when the whole system is updated, each particle has a local clock that records the time of
its last collision or cell crossing.
Figure 2.1: Cell partitioning for a chain system
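A minimal sketch of such an event store follows. It swaps in Python's `heapq` binary heap for the binary tree described above, which keeps $O(\log N)$ insertion and cheap access to the earliest event. Invalidation is handled here with per-particle event counters stamped on each prediction, a common technique assumed for illustration rather than taken from this work; the class and method names are hypothetical.

```python
import heapq
import itertools


class EventQueue:
    """Time-ordered event store; a binary heap stands in for the binary tree."""

    def __init__(self, n_particles):
        self._heap = []
        self._order = itertools.count()  # tie-breaker so events never compare directly
        self.n_events = [0] * n_particles  # executed events per particle

    def schedule(self, time, kind, participants):
        """Insert a predicted event in O(log N), stamped with the participants' counters."""
        stamp = tuple(self.n_events[p] for p in participants)
        heapq.heappush(self._heap, (time, next(self._order), kind, participants, stamp))

    def pop_next(self):
        """Return the earliest event that is still valid, or None if none is left."""
        while self._heap:
            time, _, kind, participants, stamp = heapq.heappop(self._heap)
            if stamp != tuple(self.n_events[p] for p in participants):
                continue  # a participant took part in an event executed since this prediction
            for p in participants:
                self.n_events[p] += 1  # older predictions involving p become stale
            return time, kind, participants
        return None


q = EventQueue(3)
q.schedule(2.0, "collision", (0, 1))
q.schedule(1.0, "collision", (1, 2))
q.schedule(5.0, "measurement", ())
first = q.pop_next()   # (1.0, "collision", (1, 2))
second = q.pop_next()  # (5.0, "measurement", ()): the (0, 1) event is stale
```

Stale events are discarded lazily when they reach the top of the heap, which avoids searching the structure for them after every collision.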
2.2.2 Cell Crossing Event
Another technique used to increase the efficiency of the simulation is to divide the
simulation box into cells. When cubic cells are used to partition the system, the search for
future collision partners of a given particle is restricted to the particles in the same and
neighboring cells instead of all particles in the system. Whenever a particle moves out of a
cell, the new neighboring cells must be checked for possible collision events. Therefore, in
addition to collision events, the cell-crossing time of every particle must be computed and
stored in the event tree. The cell size should be larger than the largest critical interaction
distance, so that checking a particle's neighboring cells is sufficient to predict all of its next
events. A basic scheme of cell partitioning for a chain consisting of several beads is
presented in Fig. 2.1.
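Because particles move in straight lines between events, the next cell-crossing time follows directly from the distance to the nearest cell face along each axis. The sketch below assumes cubic cells of side `cell_size` in a non-periodic box (periodic wrapping is omitted); the function name is hypothetical.

```python
import math


def cell_crossing_time(pos, vel, cell_size):
    """Time until a particle leaves its cubic cell, and the axis it crosses (None if at rest)."""
    best_t, best_axis = math.inf, None
    for axis, (x, v) in enumerate(zip(pos, vel)):
        i = math.floor(x / cell_size)  # cell index along this axis
        if v > 0:
            t = ((i + 1) * cell_size - x) / v  # reaches the upper face
        elif v < 0:
            t = (i * cell_size - x) / v  # reaches the lower face
        else:
            continue  # no motion along this axis
        if t < best_t:
            best_t, best_axis = t, axis
    return best_t, best_axis


t, axis = cell_crossing_time((0.5, 0.2, 0.9), (1.0, -0.1, 0.0), 1.0)  # (0.5, 0)
```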
2.2.3 Collision Event
Predicting collisions between particles plays an important role in the DMD simulation.
New events are predicted by checking the distances and relative velocities of all pairs of
particles in adjacent cells, and the next interaction time for each pair is stored in the binary
tree. Interaction times are found by solving for the time at which the distance between the
pair reaches a critical value $\lambda$, at which the potential has a discontinuity. At the collision,
$|\mathbf{r} + \mathbf{v}\tau| = \lambda$, where $\mathbf{r} = \mathbf{r}_i - \mathbf{r}_j$ is the initial relative position vector of particles $i$ and $j$,
and $\mathbf{v} = \mathbf{v}_i - \mathbf{v}_j$ is their initial relative velocity. Solving this quadratic equation for $\tau$ gives:
$$ \tau = \frac{-b + \alpha\sqrt{b^2 - v^2(r^2 - \lambda^2)}}{v^2}, \tag{2.15} $$
where $b = \mathbf{r}\cdot\mathbf{v}$ and $\alpha$ is either $1$ or $-1$.
Collision events fall into two main categories. The first consists of hard-core collisions,
in which the potential energy does not change; the second consists of events in which the
potential energy of the system, and hence the total kinetic energy of the colliding particles,
changes. For a hard-core collision, $\alpha = -1$, since the smaller positive root of Eq. 2.15 gives
the first time at which the separation between the two particles equals $\lambda$. For two particles
to collide, Eq. 2.15 must have a real positive solution, which requires $b < 0$ and
$b^2 - v^2(r^2 - \lambda^2) \geq 0$. Applying conservation of energy and momentum, the velocities of the
interacting particles $i$ and $j$ change after the collision by:
$$ \Delta\mathbf{v}_i = -\frac{2m_j\,b}{(m_i + m_j)\,\lambda^2}\,\mathbf{r}, \tag{2.16} $$
$$ \Delta\mathbf{v}_j = \frac{2m_i\,b}{(m_i + m_j)\,\lambda^2}\,\mathbf{r}, \tag{2.17} $$
where $m_i$ and $m_j$ are the masses of particles $i$ and $j$, and $\mathbf{r}$ and $b = \mathbf{r}\cdot\mathbf{v}$ are evaluated at
the moment of contact, when $|\mathbf{r}| = \lambda$. When $m_i = m_j$, $\Delta\mathbf{v}_i = -\Delta\mathbf{v}_j = -(b/\lambda^2)\,\mathbf{r}$.
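Eqs. 2.15 to 2.17 can be checked with a short sketch for a head-on collision between a bead of mass 1 and a resting bead of mass 3 with hard-core diameter $\lambda = 1$; the masses, positions, and function names are illustrative assumptions. Momentum and kinetic energy are both conserved by the update.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def collision_time(r, v, lam, alpha):
    """Root tau of |r + v tau| = lam from Eq. 2.15, or None if it is not a future time."""
    b, v2 = dot(r, v), dot(v, v)
    if v2 == 0.0:
        return None
    disc = b * b - v2 * (dot(r, r) - lam * lam)
    if disc < 0.0:
        return None  # the pair never reaches separation lam
    tau = (-b + alpha * math.sqrt(disc)) / v2
    return tau if tau > 0.0 else None


def hard_core_update(r, v, m_i, m_j, lam):
    """Velocity changes of Eqs. 2.16-2.17; r and v are taken at contact, |r| = lam."""
    f = 2.0 * dot(r, v) / ((m_i + m_j) * lam * lam)
    dv_i = [-f * m_j * x for x in r]
    dv_j = [f * m_i * x for x in r]
    return dv_i, dv_j


# Bead i at the origin moves toward bead j at (3, 0, 0); hard-core diameter 1.
v_i, v_j = [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]
r0 = [-3.0, 0.0, 0.0]  # r_i - r_j
v_rel = [a - b for a, b in zip(v_i, v_j)]
tau = collision_time(r0, v_rel, 1.0, -1)  # 2.0
r_contact = [x + u * tau for x, u in zip(r0, v_rel)]
dv_i, dv_j = hard_core_update(r_contact, v_rel, 1.0, 3.0, 1.0)
v_i_after = [a + d for a, d in zip(v_i, dv_i)]
v_j_after = [a + d for a, d in zip(v_j, dv_j)]
```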
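For entering and exiting a potential well or shoulder, $\alpha$ in Eq. 2.15 should be $-1$ and $1$,
respectively. However, whether the particles can exit a well (or enter a shoulder) depends
on their kinetic energy. Because the particles gain or lose kinetic energy, the velocity
changes on entering or exiting a well (or shoulder) are:
$$ \Delta\mathbf{v}_i = \frac{-b\mu + \alpha\sqrt{(b\mu)^2 - 2r^2\mu\,\Delta u}}{r^2 m_i}\,\mathbf{r}, \qquad \Delta\mathbf{v}_j = -\frac{m_i}{m_j}\,\Delta\mathbf{v}_i, \tag{2.18} $$
where $\Delta u$ is the change in potential energy across the discontinuity (for example, $-\epsilon$ on
entering a well of depth $\epsilon$ and $+\epsilon$ on exiting it), $\mu = m_i m_j/(m_i + m_j)$ is the reduced mass, and
$\alpha$ is $-1$ for entering and $1$ for exiting. If the argument of the square root is negative, the
pair does not have enough kinetic energy to cross the discontinuity and instead undergoes a
hard-core reflection described by Eqs. 2.16 and 2.17. When $m_i = m_j = m$, Eq. 2.18 simplifies to:
$$ \Delta\mathbf{v}_i = -\Delta\mathbf{v}_j = \frac{-b + \alpha\sqrt{b^2 - 4r^2\,\Delta u/m}}{2r^2}\,\mathbf{r}. \tag{2.19} $$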
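The sketch below implements Eq. 2.18 together with the reflection fallback, taking $\Delta u$ as the change in potential energy and choosing $\alpha$ from the sign of $b$ (negative when the pair approaches, positive when it separates), which matches the entering/exiting convention above. The masses, well depth, and function names are illustrative assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def step_crossing_update(r, v, m_i, m_j, du):
    """Velocity changes when a pair reaches a discontinuity where U changes by du (Eq. 2.18).

    r and v are the relative position and velocity (i minus j) at the discontinuity.
    If the kinetic energy along r is too small to pay du, the pair is reflected
    elastically instead (Eqs. 2.16-2.17) and crossed is False.
    """
    mu = m_i * m_j / (m_i + m_j)
    b, r2 = dot(r, v), dot(r, r)
    disc = (b * mu) ** 2 - 2.0 * r2 * mu * du
    if disc >= 0.0:
        alpha = 1.0 if b > 0.0 else -1.0  # exiting (moving apart) or entering
        s = (-b * mu + alpha * math.sqrt(disc)) / r2
        crossed = True
    else:
        s = -2.0 * b * mu / r2  # reverse the radial relative velocity
        crossed = False
    dv_i = [s / m_i * x for x in r]
    dv_j = [-s / m_j * x for x in r]  # equal and opposite momentum change
    return dv_i, dv_j, crossed


def kinetic(m_i, m_j, v_i, v_j):
    return 0.5 * m_i * dot(v_i, v_i) + 0.5 * m_j * dot(v_j, v_j)


# Entering a well of depth 1 (du = -1): the kinetic energy rises by exactly 1.
v_i, v_j = [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]
dv_i, dv_j, crossed = step_crossing_update([-1.0, 0.0, 0.0], [1.0, 0.0, 0.0], 1.0, 3.0, -1.0)
v_i2 = [a + d for a, d in zip(v_i, dv_i)]
v_j2 = [a + d for a, d in zip(v_j, dv_j)]
gain = kinetic(1.0, 3.0, v_i2, v_j2) - kinetic(1.0, 3.0, v_i, v_j)
```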
2.2.4 Measurement Events
Besides cell crossings and collisions, there are a few measurement events that occur at
specific times or after a specific number of events. As mentioned earlier, unlike in MD, each
particle in a DMD simulation has its own clock recording the last time its coordinates were
updated. To record an instantaneous configuration of the system, the coordinates of all
particles must first be brought up to the current time. This procedure is signalled in the
event tree by a measurement event.
Observables
To track changes in the protein-like chain, properties such as the end-to-end vector ($R_e$),
the radius of gyration, the potential and kinetic energies, and the principal moments of
inertia are measured. The most important quantity recorded during the simulation is the
chain configuration, which is represented by its set of bonds.
The radius of gyration, $R_g$, the root-mean-square distance of the beads of the protein-like
chain from the chain center of mass, is defined as
$$ R_g = \sqrt{\frac{1}{N}\sum_{k=1}^{N} (\mathbf{r}_k - \bar{\mathbf{r}})^2}, \tag{2.20} $$
where $N$ is the number of beads in the protein-like chain and $\bar{\mathbf{r}}$ is the chain center of
mass. In addition, since the main objective of this study is to investigate the populations
of different structures, the matrix representing the configuration structure is stored at each
measurement event. Although end-to-end distances can be used to distinguish compact
configurations from stretched or partially collapsed conformations, the distance between a
single pair of beads is not specific enough to identify configurations unambiguously. The
radius of gyration is a better indicator for distinguishing the protein-like phases, and it
characterizes the folded structure more precisely than the end-to-end distance.
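A short sketch of Eq. 2.20 and the end-to-end distance follows. It assumes equal bead masses, so the center of mass is simply the mean bead position; the function names and the test configuration are hypothetical.

```python
import math


def radius_of_gyration(positions):
    """Eq. 2.20, assuming equal bead masses so the center of mass is the mean position."""
    n = len(positions)
    center = [sum(p[d] for p in positions) / n for d in range(3)]
    msd = sum(sum((p[d] - center[d]) ** 2 for d in range(3)) for p in positions) / n
    return math.sqrt(msd)


def end_to_end_distance(positions):
    return math.dist(positions[0], positions[-1])


straight = [(float(k), 0.0, 0.0) for k in range(5)]  # fully extended, unit spacing
rg = radius_of_gyration(straight)     # sqrt(2)
ree = end_to_end_distance(straight)   # 4.0
```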
Potential and kinetic energies are also measured frequently during the simulation.
Checking energy conservation is a useful way to confirm that no collision has been missed
and that no event has been executed out of order. This ability to verify the accuracy of a
trajectory is one of the main advantages of DMD over MD. The potential energy of the
system can be measured either by checking the separation distances of all pairs of particles
or by adding or subtracting the well depth whenever a pair of particles enters or exits a
well. Since the first approach is computationally much more demanding, it is used only
occasionally, to ensure that no collision has been missed. The potential energy is the main
parameter in calculating the exchange probability in the PT method, and it is one of the
main tools for understanding the shape of the energy landscape. In Chapter 4, the heat
capacity ($C_v$) and the isothermal compressibility ($\kappa_T$) are calculated from the fluctuations in
the energy and in the number of particles, respectively, to examine the thermodynamic
properties of the system and to look for phase changes in solvated systems. The relations
between $C_v$ and the energy fluctuations, and between $\kappa_T$ and the particle-number
fluctuations, are discussed in Appendix A.
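For orientation, the sketch below uses the standard fluctuation formulas $C_v = (\langle E^2\rangle - \langle E\rangle^2)/k_BT^2$ and, for particle-number fluctuations in an open subvolume $V$, $\kappa_T = V(\langle N^2\rangle - \langle N\rangle^2)/(k_BT\langle N\rangle^2)$. These textbook forms are assumptions here; the exact relations used in this work are those of Appendix A. Reduced units with $k_B = 1$ are assumed by default.

```python
def variance(values):
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def heat_capacity(energies, kT, k_B=1.0):
    """C_v = (<E^2> - <E>^2) / (k_B T^2), written in terms of kT = k_B T."""
    return k_B * variance(energies) / kT**2


def isothermal_compressibility(counts, volume, kT):
    """kappa_T = V (<N^2> - <N>^2) / (k_B T <N>^2) for an open subvolume of volume V."""
    mean_n = sum(counts) / len(counts)
    return volume * variance(counts) / (kT * mean_n**2)
```

A peak in $C_v$ or $\kappa_T$ as a function of temperature is the usual signature of a phase change in a finite system.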
Finally, the principal moments of inertia are measured (Chapter 3) to study qualita-
tively the shape of structures.
2.3 Parallel Tempering Method
2.3.1 Parallel tempering
In Markov Chain Monte Carlo (MCMC), a collection of configurations, $X = (S_1, S_2, \ldots, S_n)$,
is gathered during sampling, where each configuration appears in the chain of states with a
known weight. States in the Markov chain can be strongly correlated, leading to slow
convergence of some estimates to their true asymptotic averages. Unlike conventional
MCMC, which uses a single Markov chain, the parallel tempering (PT) method can be viewed
as using multiple coupled chains to sample and study ensemble averages. Using several
Markov chains helps to collect a set of configurations in which consecutive configurations
are not strongly correlated. This method was initially introduced by Swendsen and Wang
[42] and formulated by Geyer [43], and it was subsequently implemented, developed, and
applied in physics as the PT method by Tesi, van Rensburg, Orlandini and Whittington [44].
This Monte Carlo sampling method improves the exploration of phase space, particularly at
low temperatures, and converges to the desired distribution more rapidly, especially for
systems with rugged potential energy landscapes. For this reason, it has been applied to
study protein folding [10].
Defining $\beta_i = 1/k_BT_i$, where $k_B$ is the Boltzmann constant and $T_i$ is the temperature, the
main idea of PT is to select a set of values $\beta_0 < \beta_1 < \beta_2 < \cdots < \beta_n$ within a chosen range
of inverse temperatures $[\beta_0, \beta_n]$, such that the distributions at adjacent $\beta$ values overlap
significantly [44]. Note that $T_n = 1/(k_B\beta_n)$ is the lowest temperature, while $T_{\max} = T_0 = 1/(k_B\beta_0)$
is the highest, and all replicas of the system run at different temperatures within this range.
In the canonical ensemble, the probability of state $j$ ($S_j$) at inverse temperature $\beta_i$ is
$$ P(S_j, \beta_i) = \frac{\exp(-\beta_i U_j)}{Z_i}, \tag{2.21} $$
where $U_j$ is the potential energy of state $j$ and $Z_i$ is the normalization factor (partition
function) at $\beta_i$. To generate a canonical distribution, replicas at adjacent temperatures
exchange their configurations (or, equivalently, their temperatures) at specified times or
numbers of events with probability
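$$ p = \min\left(1,\ \frac{P(S_{i+1},\beta_i)\,P(S_i,\beta_{i+1})}{P(S_i,\beta_i)\,P(S_{i+1},\beta_{i+1})}\right) \tag{2.22} $$
$$ = \min\left(1,\ \exp[-(\beta_{i+1}-\beta_i)(U_i - U_{i+1})]\right) = \min\left(1,\ \exp(\Delta\beta\,\Delta U)\right), \tag{2.23} $$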
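where $U_i$ is the energy of the configuration currently at $\beta_i$, $\Delta U = U_{i+1} - U_i$, and
$\Delta\beta = \beta_{i+1} - \beta_i$. The generated structures obey the canonical ensemble probability.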
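Eq. 2.23 translates into a few lines of code. In this sketch, `order[k]` records which replica currently sits at `betas[k]`; attempting swaps between neighbours in a single ordered sweep is an assumption (alternating even and odd pairs is another common choice), and the function names are hypothetical.

```python
import math
import random


def swap_probability(beta_i, beta_next, u_i, u_next):
    """Eq. 2.23: acceptance for swapping the configurations at adjacent inverse temperatures."""
    x = (beta_next - beta_i) * (u_next - u_i)  # Delta beta * Delta U
    return 1.0 if x >= 0.0 else math.exp(x)


def exchange_sweep(betas, energies, rng):
    """Attempt swaps between neighbouring temperatures in order.

    betas are increasing; energies[r] is the potential energy of replica r.
    Returns order, where order[k] is the replica now at betas[k].
    """
    order = list(range(len(betas)))
    for k in range(len(betas) - 1):
        a, b = order[k], order[k + 1]
        if rng.random() < swap_probability(betas[k], betas[k + 1], energies[a], energies[b]):
            order[k], order[k + 1] = b, a
    return order


# The hot replica (index 0) has found a lower energy than the cold one, so the swap is certain.
order = exchange_sweep([0.5, 1.0], [-10.0, 0.0], random.Random(0))  # [1, 0]
```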
With PT sampling, the high-temperature replicas explore large volumes of phase space,
while the low-temperature replicas explore local low-energy regions. Note that although
running $n$ replicas increases the computational effort by a factor of order $n$, it can make
MCMC convergence more than $n$ times faster [45].
2.3.2 Efficient Parallel Tempering Dynamics
To achieve efficient sampling, each replica should visit all temperatures within a
reasonable amount of simulated time. To this end, each replica should travel easily between
any pair of temperatures and spend a comparable amount of time at each temperature.
Thus, swaps should occur frequently, and the probability distributions at $\beta_i$ and $\beta_{i+1}$ should
overlap reasonably [44], which requires choosing a suitable $\Delta\beta = \beta_{i+1} - \beta_i$ for each part of
the temperature range. $\Delta\beta$ must be sufficiently small, so depending on the range of
temperatures studied, the number of replicas can be rather large [44]. However, since
increasing the number of replicas increases the computational cost, an optimum value of
$\Delta\beta$ should be found to avoid using too many replicas.
In addition to choosing a suitable $\Delta\beta$, the range $\beta_n - \beta_0$ should be chosen carefully. For
example, in this project the very rough landscape of the protein-like chain has deep
metastable minima, and the range of temperatures must be large enough to enable escape
from these minima [53]. Consequently, the highest temperature must be high enough to
prevent replicas from being trapped in local energy minima [45]. The most common structure
at the highest temperature, $T_0 = 1/(k_B\beta_0)$, is expected to be a non-bonded structure, while
in most cases the most common structure at the lowest temperature, $T_n = 1/(k_B\beta_n)$, is
expected to be the lowest-energy configuration of the system.
One of the challenges of using the PT method is to estimate the right number of
replicas, and the temperature difference between adjacent replicas, where ∆β can depend
on the temperature. In most of the cases, ∆β varies in the temperature spectrum, where
at lower temperatures typically ∆β should be smaller.
To study the efficiency of the PT method in a specific run, a period is defined as the
time needed for a replica to travel back to its initial temperature after several parallel
tempering exchange events. When a PT simulation contains a large number of replicas, it is
typically harder to achieve good dynamics, and some replicas may not move well between
all temperatures during one period (e.g., because of entropic barriers). This can make the
PT dynamics prohibitively inefficient. Sampling is efficient if the period is short and if many
(if not all) temperatures are visited within one period.
As will be discussed in Chapter 4, another consideration is the difficulty of applying the
PT method to systems that undergo a phase transition within the studied temperature
range. Although finite-size effects limit the scale of fluctuations near phase boundaries in
small systems, phase transitions can still be detected by monitoring derivatives of the free
energy of the system, such as the heat capacity and the compressibility.
The optimal time interval between two consecutive PT exchange events varies between
systems. This interval should be long enough for the system to equilibrate locally, yet short
enough to avoid unnecessary computation. For a fixed computational budget (a fixed number
of CPU hours), finding this optimum interval for the dynamics step is one of the most
challenging tasks in optimizing the PT algorithm. For a specific system, this value depends
on characteristics such as the number of particles and the range of temperatures under
investigation.
2.4 Simulation Structure of the Project
Unlike typical HMC, in which the dynamics between Monte Carlo steps is carried out with a
reversible, symplectic integration scheme, here the dynamics is carried out using the DMD
method, and the Monte Carlo sampling is done using the PT method, which helps avoid
trapping in local free energy minima and increases the speed at which the phase space of
the system is explored. As mentioned in Chapter 1, all interaction potentials in this work
are discontinuous. The simulated system consists of several replicas of the protein-like
chain, each exploring configuration space independently by DMD. This exploration can occur
in the absence or presence of a solvent, as discussed in Chapter 3 and Chapter 4, respectively.
The initial velocities are drawn from the Maxwell-Boltzmann distribution at the assigned
temperatures. Each system containing the protein-like chain and its surrounding environment
is called a replica, and the process of exchanging temperatures is called replica exchange.
After the replicas have propagated using DMD for a fixed amount of time, some of them
exchange their temperatures with the probability given by the Parallel Tempering (PT)
method. New velocities are then drawn for all replicas from the Maxwell-Boltzmann
distribution at each replica's current temperature. The parallel tempering method allows
configurations to be generated according to the canonical ensemble. Since the velocities of
all replicas are periodically redrawn from the Maxwell-Boltzmann distribution, DMD (like
exact Hamiltonian dynamics) is time reversible and preserves phase-space volume, and the
PT sampling follows the canonical ensemble probability, all necessary conditions for
generating states with the canonical density are satisfied [52].
The potential-based classification of structures can be used to find the population (i.e.,
the frequency of occurrence in the simulation) of each structure at a specific temperature.
Since each structure is generated with a probability proportional to $e^{-\beta F}$, where $\beta = 1/k_BT$
and $F$ is the Helmholtz free energy of that configuration, comparing the populations of
different structures gives the free energy difference of any pair of configurations and,
together with the energy of each structure, their entropy difference.
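Since populations are proportional to $e^{-\beta F}$, the ratio of two populations gives $F_a - F_b = -k_BT\ln(N_a/N_b)$, and with $F = U - TS$ the entropy difference follows once the structure energies are known. The counts and energies below are hypothetical values chosen only for illustration, in reduced units with $k_B = 1$.

```python
import math


def free_energy_difference(count_a, count_b, kT):
    """F_a - F_b from populations, using P(structure) proportional to exp(-F / k_B T)."""
    return -kT * math.log(count_a / count_b)


def entropy_difference(count_a, count_b, u_a, u_b, kT, k_B=1.0):
    """S_a - S_b from F = U - T S, given the energies u_a and u_b of the two structures."""
    d_f = free_energy_difference(count_a, count_b, kT)
    return k_B * (u_a - u_b - d_f) / kT


# Hypothetical populations: structure a observed 900 times, structure b 100 times.
d_f = free_energy_difference(900, 100, kT=0.5)  # -0.5 ln 9
d_s = entropy_difference(900, 100, -3.0, -2.0, kT=0.5)
```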
2.4.1 Parallel Programming
Object-oriented programming made it easy to set up a code that simulates systems with many replicas. In chapter 4, parallel programming based on MPI (Message-Passing Interface) [54] is used to keep the run time manageable when studying the energy landscape of a protein-like chain in the presence of thousands of fluid beads. Implementing the PT method with object-oriented programming makes the parallel implementation straightforward. In the parallel version of the code, each replica runs as an object on its own processor. At a replica exchange event, the energies of the replicas are sent to the main node, and each replica then receives its updated temperature, which can be the same as its earlier temperature. Measurement events take place after every several parallel tempering steps. The parameters measured on each replica, including the matrix representing its structure, are sent from that processor to the main node and stored in the main node's RAM. At the end of the simulation, all the necessary statistics are accumulated and computed on the main node.
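The bookkeeping on the main node can be sketched serially as follows. In the MPI version the records would arrive as messages from the replica processes; here they are passed in directly, and the class and method names are hypothetical. Each record is assumed to be a temperature index together with a hashable encoding of the interaction matrix.

```python
from collections import Counter, defaultdict

class MainNode:
    """Accumulates measurement records sent by the replicas (serial sketch)."""

    def __init__(self):
        # one Counter of structure keys per temperature index
        self.counts = defaultdict(Counter)

    def record(self, temperature_index, structure_key):
        """Store one measurement: which structure was seen at which temperature."""
        self.counts[temperature_index][structure_key] += 1

    def most_common(self, temperature_index, k=2):
        """Return the k most common structures with their observed fractions."""
        counter = self.counts[temperature_index]
        total = sum(counter.values())
        return [(key, n / total) for key, n in counter.most_common(k)]

node = MainNode()
for key in ["BF", "BF", "NR", "BF"]:      # hypothetical records at temperature 0
    node.record(0, key)
top = node.most_common(0, k=2)            # [("BF", 0.75), ("NR", 0.25)]
```

Keeping only counts per structure, rather than every recorded matrix, keeps the memory use on the main node proportional to the number of distinct structures.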
Chapter 3
Protein-like Chain Without a Solvent
3.1 Model
In this chapter, a protein-like chain is studied in the absence of any solvent. Since the main objective is to study the basic behavior of proteins with very little computational time, the focus is on constructing simple models of a protein-like chain that can fold into an alpha helix at sufficiently low temperatures. The model is designed to have a free energy landscape similar to that of simple protein systems described by much more detailed potentials.
The protein model used here is a beads-on-a-string model in which each bead represents
an amino acid or residue. In this model, the protein-like chain consists of a repeated
sequence of four different kinds of beads. While having four different types of beads is
not enough to represent the twenty different types of amino acids, it preserves at least
some of the differences between amino acids. The interactions between these beads are
designed to mimic the interactions that lead to the formation of common motifs in protein
structure, such as the alpha helix. Previous studies indicated that short chains containing
6, 8 or 12 monomers are too short to fold into compact states at low temperatures, while
somewhat longer chains with 25 monomers can capture folded helical states[40]. Here,
chains of moderate lengths of 25 to 35 beads have been used to facilitate the exploration
of the free energy landscape.
In an alpha helix, one of the most common secondary structures, each turn has 3.6 amino acids, and there is a hydrogen bond between beads i and i + 4. To capture this feature of helices, the models analyzed here allow attractive interactions, intended to mimic hydrogen bonds between non-adjacent residues, between beads i and i + 4n, where n ≥ 1, with additional restrictions on the possible hydrogen bonds that are specified below. Several models of protein-like chains have been considered, but only the results for two of them are presented here. These two were chosen based on their similarity to a real protein and the feasibility of studying them with the parallel tempering (PT) method.
To make contact with real proteins, and because there are too many parameters to
form unique reduced units, physical units are used in the definition of the model, although
these should not be taken too literally: we only aim to set these to the right order of
magnitude to mimic real proteins. In particular, lengths will be expressed in Angstroms,
energies in kJ/mol and masses in atomic mass units.
The two presented models differ in their hydrogen-bond potentials, while all other intra-chain interactions are the same. In total, four different intra-chain potentials are used in these models. The first kind of potential acts between nearest and next-nearest neighbors and restricts the distance between the beads to specific ranges by applying an infinite square-well potential similar to Bellemans' bonds model [55]. Fig. 3.1(a) shows the shape of this kind of potential. To mimic a covalent bond between two consecutive amino acids in the protein, the distance between two neighboring beads is restricted to the range 3.84 Å to 4.48 Å. This potential allows these distances to “vibrate” around values close to the distance between stereocenters used in Ref. [5]. The next-nearest neighbors' infinite square-well potentials represent an angle vibration: restricting their distance to a range from 5.44 Å to 6.40 Å generates a vibration angle between 75° and 112°. For simplicity, dihedral angles are not considered in our models, but, as discussed later, some restrictions on hydrogen bonds are employed to create rigidity in the backbone of the protein-like chain, similar to the dihedral angle interactions in more detailed potentials.

Figure 3.1: Model potentials: the (a) infinite square-well potential, (b) attractive step potential, (c) repulsive shoulder potentials, and (d) hard core repulsion.
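The quoted angle range is consistent with the law of cosines applied to the two distance windows. The sketch below assumes that interpretation (it is not stated explicitly in the text) and scans both bond lengths and the next-nearest-neighbor distance over their allowed ranges; the function names are hypothetical.

```python
import math

def bond_angle_deg(a: float, b: float, d: float) -> float:
    """Angle (degrees) at the middle bead for bond lengths a, b and
    next-nearest-neighbour distance d, from the law of cosines."""
    cos_theta = (a * a + b * b - d * d) / (2.0 * a * b)
    return math.degrees(math.acos(cos_theta))

def angle_range(bond=(3.84, 4.48), nnn=(5.44, 6.40), steps=41):
    """Min and max bond angle over a grid (endpoints included) of the
    covalent-bond window `bond` and next-nearest window `nnn`, in Angstrom."""
    def grid(lo, hi):
        return [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    angles = [bond_angle_deg(a, b, d)
              for a in grid(*bond) for b in grid(*bond) for d in grid(*nnn)]
    return min(angles), max(angles)

lo, hi = angle_range()   # roughly (74.8, 112.9) degrees
```

The extremes occur at the corners of the windows: the smallest angle for the longest bonds and shortest next-nearest distance, and the largest angle for the opposite combination.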
Hydrogen bonds are modeled by an attractive square-well potential, depicted in Fig. 3.1(b). In all the studied models, including the two presented here, the attractive interactions are defined between beads i and i + 4n to resemble the hydrogen bonds in alpha helix structures. However, the two main models differ in which of these attractive bonds are allowed, i.e., in the allowed values of i and n.
In the first model, model A, attractive interactions act only within two of the four bead types: a bond can form between two beads whose indices are both of the form i = 4k + 1, or both of the form i = 4k + 3, where k is an integer, and n can be any positive integer such that i + 4n lies on the chain.
In the second model, model B, only beads with index i = 4k + 2 can form bonds with each other, and n cannot be 2 or 3. This means that there is no attractive bond between beads i and i + 8 or between beads i and i + 12. These bonds are disallowed to make turns in the protein-like chain more difficult and the chain more rigid. This restriction plays a role similar to that of dihedral angle interactions and side chains in real proteins, which prevent a protein from bending over easily. The differences between the two models are summarized in Table 3.1. Fig. 3.2 shows the possible attractive bonds of the two models for the 25-bead chain, with consecutive beads labeled A through Y. Because the two models allow different numbers of possible hydrogen bonds, their properties can be expected to differ considerably.
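The bonding rules of the two models can be enumerated directly. This is a sketch with hypothetical function names, using 1-based bead indices and the letter labels A to Y of Fig. 3.2. For the 25-bead chain it gives 36 possible attractive bonds in model A and 8 in model B, and the model B list coincides with the bonds of the most common low-temperature structure in Table 3.3.

```python
def attractive_pairs(n_beads: int, model: str):
    """Allowed attractive (hydrogen-bond) pairs (i, i + 4n), 1-based indices."""
    pairs = []
    for i in range(1, n_beads + 1):
        for n in range(1, (n_beads - i) // 4 + 1):      # keeps i + 4n on the chain
            j = i + 4 * n
            if model == "A":
                # j = i + 4n has the same residue mod 4 as i, so only the
                # 4k + 1 and 4k + 3 classes need to be checked
                allowed = i % 4 in (1, 3)
            elif model == "B":
                allowed = i % 4 == 2 and n not in (2, 3)
            else:
                raise ValueError(f"unknown model {model!r}")
            if allowed:
                pairs.append((i, j))
    return pairs

def label(i: int) -> str:
    """Letter label of bead i (valid for chains of up to 26 beads)."""
    return chr(ord("A") + i - 1)

model_b = " ".join(label(i) + label(j) for i, j in attractive_pairs(25, "B"))
# model_b == "BF BR BV FJ FV JN NR RV"
```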
In an alpha helix there are 3.6 amino acids in each turn, and the distance between two consecutive amino acids is 1.5 Å along the helical axis [56]. This means a translation of 3.6 × 1.5 Å = 5.4 Å along the helix axis in each turn. For both models A and B, the parameters of the attractive square-well potential, σ₁ and σ₂, are chosen to be 4.64 Å and 5.76 Å, with a midpoint of 5.2 Å, which is close to the 5.4 Å translation per turn. Compared to covalent bonds, these attractive interactions act across longer distances. The unit of energy, ε, is chosen as the depth of the potential well of the attractive interactions; ε is around 20 kJ/mol. The mass of each bead is set to 2 × 10⁻²⁵ kg, which is close to 120 amu (atomic mass units).
To represent electrostatic interactions of the atoms, repulsive interactions act between beads 1 + 4k and 4k′, where k and k′ are integers and k ≠ k′. The repulsive interaction takes the form of a shoulder potential, shown in Fig. 3.1(c). The shoulder extends from 4.64 Å to 7.36 Å, and its height is 0.9ε. The effect of changing the number of shoulder repulsions was evaluated for a few models in terms of the resulting free energies. Changing the number of repulsions turned out to have little impact on the shape of the free energy landscape around the native structure. Since repulsion between beads increases the potential energy, the most common structures at low temperatures do not have any repulsive interactions. Therefore, the two discussed models differ only in their attractive potentials, while their repulsive interactions are the same.
Finally, all other bead pairs for which no covalent bonds, hydrogen bonds or shoulder repulsive interactions are defined feel a hard sphere repulsion, depicted in Fig. 3.1(d), to account for excluded volume interactions at short distances. The hard sphere diameter is set to 4.64 Å, which is slightly different from the value of 4.27 Å used by Zhou et al. [5].
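The four discontinuous pair potentials can be written down compactly. In this sketch the distances are in Å and energies in units of ε. Two points are assumptions not fixed by the text: the attractive well and the shoulder are given a hard core inside their inner edge σ₁ = 4.64 Å, and the treatment of the exact boundary points is arbitrary. The function names are hypothetical.

```python
import math

INF = math.inf
EPS = 1.0  # energy unit: depth of the attractive well (about 20 kJ/mol)

def covalent(r):
    """Nearest neighbours: infinite square well between 3.84 and 4.48 A."""
    return 0.0 if 3.84 <= r <= 4.48 else INF

def angle_bond(r):
    """Next-nearest neighbours: infinite square well between 5.44 and 6.40 A."""
    return 0.0 if 5.44 <= r <= 6.40 else INF

def hydrogen_bond(r, sigma1=4.64, sigma2=5.76):
    """Attractive square well of depth EPS; hard core inside sigma1 (assumed)."""
    if r < sigma1:
        return INF
    return -EPS if r < sigma2 else 0.0

def shoulder(r, sigma1=4.64, sigma2=7.36, height=0.9 * EPS):
    """Repulsive shoulder of height 0.9 EPS; hard core inside sigma1 (assumed)."""
    if r < sigma1:
        return INF
    return height if r < sigma2 else 0.0

def hard_sphere(r, sigma=4.64):
    """Excluded-volume repulsion for all remaining bead pairs."""
    return INF if r < sigma else 0.0
```

Because each potential is piecewise constant, DMD only needs the distances at which the value jumps; between those distances the beads move ballistically.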
The reduced temperature is defined as $T^* = k_B T/\varepsilon$, where ε is the depth of the square-well attractive interactions, and β∗ is the inverse of the reduced temperature, $\beta^* = 1/T^*$. With ε ≈ 20 kJ/mol, T∗ = 1.0 corresponds to about 2400 K. This means that β∗ = 8 (T∗ = 1/8) corresponds roughly to room temperature, 300 K.
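The unit conversions used in this section can be checked with a few lines of arithmetic. Since ε is given per mole, $k_B$ is replaced by the molar gas constant R. The sketch takes ε = 20 kJ/mol exactly, although the text only says "around" that value; the names are illustrative.

```python
R_GAS = 8.314462618e-3      # molar gas constant, kJ/(mol K)
AMU_KG = 1.66053906660e-27  # atomic mass unit, kg
EPSILON = 20.0              # well depth, kJ/mol (approximate value from the text)

def kelvin_from_reduced(t_star, epsilon=EPSILON):
    """T = T* epsilon / R for an energy unit given per mole."""
    return t_star * epsilon / R_GAS

t_unit = kelvin_from_reduced(1.0)        # about 2405 K
room = kelvin_from_reduced(1.0 / 8.0)    # beta* = 8 -> about 301 K
bead_mass_amu = 2e-25 / AMU_KG           # about 120.4 amu
helix_rise = 3.6 * 1.5                   # Angstrom per helical turn, 5.4
well_midpoint = (4.64 + 5.76) / 2        # Angstrom, 5.2
```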
It is worth pointing out that the attractive and repulsive potentials used here are qualitatively different from those of the popular Go model [57, 58], in which attractive interactions are only defined between beads that are in contact in the native structure.

Model   Attracting beads i     n
A       4k + 1 and 4k + 3      any n ≥ 1
B       4k + 2                 n ≠ 2 and n ≠ 3

Table 3.1: Allowed values of attracting bead pairs (i, i + 4n).

Figure 3.2: Possible attractive bonds of (a) model A, and (b) model B for a chain of 25 beads, labeled A through Y.
3.1.1 Definition of configurations
One of the advantages of using discontinuous potentials is the ease of comparing configurations. Bonds are defined by the specific range of bead separations $r_{ij}$ in which the pair potential $U(r_{ij})$, which enters the total potential energy $V = \frac{1}{2}\sum_{i \neq j} U(r_{ij})$, is equal to a specific non-zero value. Since only one bond can exist between each bead pair (i, j) in the current models, each configuration or structure can be represented by a matrix of interactions in which the entry at row i and column j is 1 if i and j are bonded and 0 otherwise. Because bonded interactions largely determine the form of the protein, we identify this matrix with the configuration of the protein-like chain. Thus, identical structures can easily be found by comparing the matrices.
To present these matrices, however, a more user-friendly alphabetical notation is applied. Each bead is represented by a letter, and each bonded interaction is shown by a pair of letters, so the two-dimensional matrix can be represented by a string of letter pairs. Since most of the studied cases involve 25-bead chains, the letters A to Y are used to label the beads. For chains longer than 26 beads, both capital and lowercase letters are used.
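The notation can be converted to and from bonded pairs with a small helper. The exact ordering of the lowercase letters for beads beyond 26 is an assumption here (bead 27 is taken to be "a"), and the function names are hypothetical. Sorting the pairs reproduces the ordering used in Tables 3.2 and 3.3.

```python
def bead_label(i: int) -> str:
    """Letter for bead i (1-based): A-Z, then a-z (assumed ordering)."""
    if 1 <= i <= 26:
        return chr(ord("A") + i - 1)
    if 27 <= i <= 52:
        return chr(ord("a") + i - 27)
    raise ValueError("this sketch only labels up to 52 beads")

def to_notation(bonds) -> str:
    """String of letter pairs for an iterable of bonded pairs (i, j)."""
    pairs = sorted((min(i, j), max(i, j)) for i, j in bonds)
    return " ".join(bead_label(i) + bead_label(j) for i, j in pairs)

def from_notation(text: str):
    """Inverse of to_notation: set of bonded pairs (i, j) with i < j."""
    def index(c):
        return ord(c) - ord("A") + 1 if c.isupper() else ord(c) - ord("a") + 27
    return {(index(p[0]), index(p[1])) for p in text.split()}

native_b = to_notation({(6, 2), (18, 2), (22, 2), (10, 6),
                        (22, 6), (14, 10), (18, 14), (22, 18)})
# native_b == "BF BR BV FJ FV JN NR RV"
```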
The simulations produce a large series of structures. To extract the most common ones (together with their frequency of occurrence $f_{\mathrm{obs}}$), their interaction matrices need to be compared. To make this comparison faster, two indices were introduced that are not necessarily unique for each interaction matrix but must be equal for two matrices to be identical. Instead of comparing the matrices directly, these two indices are compared first, and the matrices themselves are compared only when the indices are equal. This last comparison is further optimized by storing each interaction matrix as an array of integers, one for each row, with each bit representing one matrix entry. Chains longer than the number of bits per integer (32 bits) require multiple integers per row.
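The bit-packed storage and the two-stage comparison can be sketched as follows. Bead indices are 0-based here, and the index pair is treated as an opaque tuple (the two indices are defined in the next paragraph). The function names are hypothetical; in Python one could also simply key a dictionary on the packed rows, but the explicit check mirrors the procedure described above.

```python
WORD_BITS = 32

def pack_rows(n_beads, bonds):
    """Pack the symmetric 0/1 interaction matrix into 32-bit words, row by row.

    bonds: iterable of (i, j) with 0-based bead indices.
    Returns a list of rows, each a list of integers holding WORD_BITS entries.
    """
    words_per_row = (n_beads + WORD_BITS - 1) // WORD_BITS
    rows = [[0] * words_per_row for _ in range(n_beads)]
    for i, j in bonds:
        for a, b in ((i, j), (j, i)):          # set both symmetric entries
            rows[a][b // WORD_BITS] |= 1 << (b % WORD_BITS)
    return rows

def same_structure(idx_a, rows_a, idx_b, rows_b):
    """Cheap index comparison first; full matrix comparison only if indices agree."""
    if idx_a != idx_b:
        return False
    return rows_a == rows_b

rows = pack_rows(40, [(0, 35)])   # a 40-bead chain needs two words per row
```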
The two indices are defined as follows. The protein-like chain consists of periodically repeated regions of four different kinds of beads, and for the purpose of constructing the indices, these regions are numbered. For example, beads 1 to 4 form the first region and beads 21 to 24 form region 6. The interaction index is defined such that its nth digit (from the right) gives the number of attractive or repulsive interactions between region n and the beads of the previous regions, 1 to n − 1. For example, the interaction index 321110 for a 25-bead chain means that beads 21-24 have three bonds with beads 1 to 20, beads 17-20 have two bonds with beads 1 to 16, beads 13-16 have one bond with beads 1 to 12, beads 9-12 have one bond with beads 1 to 8, and beads 5-8 have one bond with beads 1 to 4.
The second index, the attraction index, is defined such that the nth digit (from the right) of the index gives the number of attractive bonds between beads whose regions are separated from each other by n − 1 regions. For example, an attraction index of 12005 means that there are five bonds between neighboring regions (for example, beads 1 to 4 and beads 5 to 8 are in neighboring regions), two bonds between beads that are separated by three regions (i.e., there are at least 12 beads between these beads), and only one bond between beads that are separated by four regions (at least 16 beads in between).
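Both indices can be computed from a list of bonded pairs with a short sketch (1-based bead indices; function names hypothetical). Applied to the eight bonds of the lowest-energy model B structure in Table 3.3, the two definitions reproduce exactly the example values 321110 and 12005 used above. A digit count above 9 would carry into the next digit; since the indices only serve as a pre-filter before the full matrix comparison, this would not cause wrong matches.

```python
def region(i: int) -> int:
    """Region number of bead i (1-based): beads 1-4 -> 1, 5-8 -> 2, ..."""
    return (i - 1) // 4 + 1

def interaction_index(bonds) -> int:
    """n-th digit (from the right) = bonds between region n and regions 1..n-1."""
    counts = {}
    for i, j in bonds:
        lo, hi = sorted((region(i), region(j)))
        if lo < hi:                      # bonds inside one region are not counted
            counts[hi] = counts.get(hi, 0) + 1
    return sum(c * 10 ** (n - 1) for n, c in counts.items())

def attraction_index(attractive_bonds) -> int:
    """n-th digit = attractive bonds whose regions are n - 1 regions apart."""
    counts = {}
    for i, j in attractive_bonds:
        n = abs(region(j) - region(i))   # neighbouring regions give n = 1
        if n > 0:
            counts[n] = counts.get(n, 0) + 1
    return sum(c * 10 ** (n - 1) for n, c in counts.items())

# BF BR BV FJ FV JN NR RV: lowest-energy model B structure
native_b = [(2, 6), (2, 18), (2, 22), (6, 10), (6, 22), (10, 14), (14, 18), (18, 22)]
```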
3.1.2 Temperature independence of relative configurational entropies
The definition of configurations presented above is based on the presence of bonds. Within the model, having a certain set of bonds (and no others) implies a specific potential energy $U_c$ for each configuration c. As shown below, this makes the relative configurational entropies temperature independent.
Here, the configurational entropy of a particular configuration c is the entropy of a sub-ensemble in which the phase points are restricted to those of configuration c; that is, the full phase space of the protein-like chain is subdivided into regions corresponding to specific configurations. Denoting the collection of spatial degrees of freedom by R, one defines for each configuration c an index function

$$\chi_c(R) = \begin{cases} 1 & \text{if (only) the bonds in } c \text{ are present,} \\ 0 & \text{otherwise.} \end{cases} \tag{3.1}$$
In the canonical ensemble, the probability $f_{\mathrm{obs}}(c, T)$ of observing a configuration c at temperature T is

$$f_{\mathrm{obs}}(c, T) = e^{-\beta (F_c - F)}, \tag{3.2}$$

where $F_c$ is the free energy of configuration c, and F is the full free energy of the system.
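By definition, one has

$$e^{-\beta F_c} = \frac{1}{h^{3N}} \int dR\, dP\, \chi_c(R)\, \exp\!\left[-\beta\left(\sum_{i=1}^{N} \frac{|\mathbf{p}_i|^2}{2m} + V(R)\right)\right], \tag{3.3}$$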
where N is the number of beads, m is their mass, and V is the potential energy function.
The configurational entropy $S_c$ is related to $F_c$ via

$$F_c = E_c - T S_c, \tag{3.4}$$

where $E_c$ is the average energy of configuration c at temperature T. Since its potential energy V is always equal to $U_c$ when $\chi_c = 1$, one has

$$E_c = U_c + \tfrac{3}{2} N k_B T. \tag{3.5}$$
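Combining Eqs. 3.3-3.5, one finds

$$S_c = \tfrac{3}{2} N k_B \ln\!\left(\frac{2\pi m e}{\beta h^2}\right) + k_B \ln \int dR\, \chi_c(R), \tag{3.6}$$

so the relative entropy of two configurations $c_1$ and $c_2$ at a specific temperature is

$$\Delta S_{c_1 c_2} = S_{c_1} - S_{c_2} = k_B \ln \frac{\int dR\, \chi_{c_1}(R)}{\int dR\, \chi_{c_2}(R)}, \tag{3.7}$$

which does not depend on temperature.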
From Eqs. 3.4-3.6 it can be concluded that the free energy of a configuration is

$$F_c = U_c - \tfrac{3}{2} N k_B T \ln\!\left(\frac{2\pi m}{\beta h^2}\right) - k_B T \ln \int dR\, \chi_c(R), \tag{3.8}$$

where the second term, $\tfrac{3}{2} N k_B T \ln\left(2\pi m/(\beta h^2)\right)$, is the same for all configurations at temperature T.
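Because relative configurational entropies do not depend on temperature, they can be determined from a single run at a temperature T using Eq. 3.2 for $\Delta F_{c_1 c_2}$ and Eq. 3.5, according to which the kinetic contribution to $E_c$ is the same for all configurations, so that $\Delta E_{c_1 c_2} = \Delta U_{c_1 c_2}$:

\begin{align}
\Delta S_{c_1 c_2} &= \frac{\Delta E_{c_1 c_2} - \Delta F_{c_1 c_2}}{T} = \frac{\Delta E_{c_1 c_2}}{T} + k_B \ln \frac{f_{\mathrm{obs}}(c_1, T)}{f_{\mathrm{obs}}(c_2, T)} \tag{3.9}\\
&= \frac{\Delta U_{c_1 c_2}}{T} + k_B \ln \frac{f_{\mathrm{obs}}(c_1, T)}{f_{\mathrm{obs}}(c_2, T)}. \tag{3.10}
\end{align}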
Therefore, no approximation is necessary to calculate the relative configurational
entropies in contrast to molecular dynamics (MD) studies (see e.g. Ref. [59]).
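Eq. 3.10 can be checked numerically with a short sketch in reduced units (ΔU in units of ε, entropies in units of $k_B$); the function names are hypothetical. The frequencies below are synthetic, generated from Eqs. 3.2 and 3.8 for an assumed ΔU and ΔS, and the recovered entropy difference is the same at every temperature, as the derivation above requires.

```python
import math

def relative_entropy(delta_u, f1, f2, t_star):
    """Delta S_{c1 c2} / k_B from Eq. 3.10 in reduced units.

    delta_u -- U_c1 - U_c2 in units of epsilon
    f1, f2  -- observed frequencies (or counts) of c1 and c2 at T* = t_star
    """
    return delta_u / t_star + math.log(f1 / f2)

def synthetic_ratio(delta_u, delta_s, t_star):
    """f1/f2 implied by Eqs. 3.2 and 3.8 for an assumed Delta U and Delta S/k_B."""
    return math.exp(delta_s - delta_u / t_star)

# Assumed values: c1 lies 3 epsilon lower in energy and 2 k_B lower in entropy.
recovered = [relative_entropy(-3.0, synthetic_ratio(-3.0, -2.0, t), 1.0, t)
             for t in (0.1, 0.2, 0.5)]
# every entry of `recovered` equals -2.0 up to rounding
```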
3.2 Results
3.2.1 Parallel tempering efficiency
In the current context, we call a simulation efficient if it generates many independent configurations in a given simulated time period. Since PT simulations can be seen as replicas moving from temperature to temperature while they change their configurations, a replica that gets stuck in a certain range of temperatures would likely lead to poor estimates of ensemble averages. All the presented results belong to simulation runs in which the highest temperature, $T^*_0$, is 2/3, while the lowest temperature, $T^*_n$, varies mainly depending on the model. The reason for choosing $T^*_0 = 2/3$ is that at this temperature the most common structure is the structure with no bond, which ensures that the temperature is sufficiently high that the chain does not become trapped in any potential minimum.
At the start of the simulation, $T^*_0$ is assigned to replica 0 and $T^*_n$ to replica n; during the replica exchange events, however, these temperatures can be assigned to other replicas as well. The temperatures are usually referred to by their inverse values, $\beta^*_0$ to $\beta^*_n$. Replica exchanges between adjacent temperatures are attempted every two picoseconds of simulated time, during which approximately 20 events take place at a fixed temperature. In each replica exchange event, half of the replicas are chosen at random, and these replicas get a chance to exchange their temperatures (or, equivalently, configurations) with the replicas at adjacent temperatures. Most runs consist of more than half a million replica exchange events.
As explained in chapter 2, the parameters chosen for the PT method have a strong effect on the PT efficiency. Therefore, before determining free energies and other properties, the efficiency of the simulations should be assessed. To evaluate the PT efficiency of a simulation, a PT period is defined as the time it takes a replica to travel through all the temperatures and return to its initial temperature. This is equivalent to twice the traversal time between the maximum and minimum temperatures. The traveling process that happens in one PT period is called a cycle. The key quantity for checking the efficiency of a PT simulation is the number of PT cycles in one run.

Figure 3.3: Example of less efficient dynamics in inverse temperature space for one replica (the one that started at β∗ = 38.4) for the system of 170 replicas of model A. [Plot of β∗ index versus PT replica exchange event (in units of 10⁴ events).]

Figure 3.4: Example of efficient dynamics in temperature space for one replica (which starts at β∗ = 10.5) for a system of 90 replicas of model B. [Plot of β∗ index versus PT replica exchange event (in units of 10⁴ events).]

Figure 3.5: Example of efficient dynamics in replica space at one temperature (β∗ = 7.5) of the system with 90 replicas for model B. [Plot of replica index versus PT replica exchange event (in units of 10⁴ events).]
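The number of PT cycles defined above can be counted from the sequence of temperature indices visited by one replica. This sketch assumes index 0 is the highest temperature $T^*_0$ and index n − 1 the lowest, and counts one cycle for every two traversals between these extremes; the function name is hypothetical.

```python
def count_pt_cycles(temperature_indices, n_temperatures):
    """Completed PT cycles (round trips between the extreme temperatures)
    in one replica's sequence of temperature indices."""
    hottest, coldest = 0, n_temperatures - 1
    last_extreme = None
    traversals = 0
    for t in temperature_indices:
        if t in (hottest, coldest):
            # a traversal ends whenever the opposite extreme is reached
            if last_extreme is not None and t != last_extreme:
                traversals += 1
            last_extreme = t
    return traversals // 2

trajectory = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]   # hypothetical walk over 4 temperatures
cycles = count_pt_cycles(trajectory, 4)        # three traversals -> 1 full cycle
```

A replica that stays within part of the temperature range, like the one in Fig. 3.3, contributes no traversals during that time even if it returns to its starting temperature.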
When the PT system contains a large number of replicas, it is harder to obtain good dynamics, and some of the replicas may not move well through all the temperatures during one PT cycle because of barriers. For example, in Fig. 3.3, for a system of 170 replicas of model A, replica 40 does not visit all the existing temperatures in a short time and instead spends a long period moving among one third of the temperatures during one PT period. In this case, although replica 40 travels back to its initial temperature in a reasonable amount of time (less than 25,000 PT replica exchange events), it does not visit all the existing temperatures.
As mentioned in section 2.3.2, Δβ∗ can depend on $\beta^*_i$. However, depending on the potential landscape, it may be possible to observe efficient dynamics even with a large number of replicas without requiring a temperature-dependent Δβ∗. For example, Fig. 3.4 shows, for a study of model B with 90 replicas and 5 × 10⁵ exchange intervals, that replica 60 travels between any two temperatures in a reasonable amount of time.

Figure 3.6: Example of less efficient dynamics in replica space at a relatively low temperature (β∗ = 14.7) of the system with 90 replicas for model B. [Plot of replica index versus PT replica exchange event (in units of 10⁴ events).]
The main criterion for assessing the efficiency of PT dynamics is the sequence of replicas that hold a specific temperature as the PT simulation progresses. Ideally, in a long run, each temperature should be assigned to all replicas with the same probability, so that the dynamics is smooth in replica space at a fixed temperature. This can be observed in Fig. 3.5, where all replicas visit temperature β∗ = 7.5 with almost the same probability. Even within the same run, the dynamics typically become less efficient as the temperature is lowered. As can be seen in Fig. 3.6, at β∗ = 14.7, a relatively low temperature in this system, the replica index does not vary smoothly and uniformly as the PT simulation progresses but spends more time at specific replicas. To obtain properly smooth dynamics between the replicas, model A requires Δβ∗ to decrease at high $\beta^*_i$. Details of the temperature set used in PT are provided in Appendix B.1.
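The uniformity criterion can be turned into a number with a small sketch. Given the sequence of replica indices recorded at one fixed temperature, it computes how often each replica held that temperature and summarizes the spread with a min/max ratio. This ratio is a hypothetical diagnostic chosen for illustration, not a measure used in the thesis.

```python
from collections import Counter

def replica_visit_fractions(replica_at_temperature, n_replicas):
    """Fraction of recorded exchange events in which each replica held the temperature."""
    counts = Counter(replica_at_temperature)
    total = len(replica_at_temperature)
    return [counts.get(r, 0) / total for r in range(n_replicas)]

def flatness(fractions):
    """min/max ratio of the visit fractions: 1 for perfectly uniform visits,
    0 if some replica never visited this temperature."""
    return min(fractions) / max(fractions)

uniform = replica_visit_fractions([0, 1, 2, 0, 1, 2], 3)   # [1/3, 1/3, 1/3]
stuck = replica_visit_fractions([0, 0, 0, 1], 3)           # [0.75, 0.25, 0.0]
```

A run like the one in Fig. 3.5 would give a flatness close to 1 at β∗ = 7.5, while the behavior in Fig. 3.6 would give a noticeably smaller value.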
Figure 3.7: Panels (a) to (d) show snapshots from four different stages of the dynamics, starting from an unfolded state and ending in the collapsed structure.
3.2.2 Observed structures
At very low temperatures, the most common structures have only attractive bonds and no repulsive interactions. Therefore, unless otherwise specified, the term “bond” here always refers to an attractive bond (or hydrogen bond) and not to repulsions or covalent bonds. Before studying the free energy landscape using PT, the discontinuous molecular dynamics (DMD) method was used to study the dynamics of the protein-like chain. Starting the dynamics from an unfolded state and decreasing the temperature in several steps, collapsed structures were observed. In this process, for the 25-bead chain of model A, the radius of gyration and the end-to-end distance dropped to 58%–75% and 50%–78% of their initial high-temperature values, respectively. Four snapshots illustrating the change in the typical conformation of the protein-like chain from high to low temperatures are presented in Fig. 3.7. However, the collapsed structures observed in this process may be local minima rather than the native structure. This was one of the initial motivations to study the free energy landscape and its variation with temperature.

Figure 3.8: Snapshot of the lowest potential energy configuration of model B for the 25-bead chain.
Using the PT method, the most common structures of the protein-like chain at different temperatures can be determined. For model B, helical structures are the most common structures at all but high temperatures, as can be seen clearly in the snapshot of one of these structures in Fig. 3.8. In contrast, model A lacks sufficient rigidity to resist turning, because beads that are separated by 8 or 12 positions can form bonds; this leads to structures that are unrealistically compact. Consequently, no alpha helical structure was observed among the common structures of this model.

Figure 3.9: Variation of the radius of gyration at high temperature (green crosses, T∗ = 0.333) and low temperature (red pluses, T∗ = 0.073) for (a) model A and (b) model B. [Plots of Rg (Å) versus PT replica exchange event (in units of 10³ events).]

Figure 3.10: Variation of principal moments of inertia of the shapes that are represented by one interaction matrix (i.e., one configuration), which is the most common structure of model A at β∗ = 9 (AE AI AM CG CK CO CS CW EI GK GS GW KO KS OS QU QY SW UY). One unit corresponds to 2.048 × 10⁻⁴⁴ kg m². [Three-dimensional plot of Ix, Iy and Iz.]
Fig. 3.9 shows the variation of the radius of gyration of the chain during the PT simulation for configurations of both models A and B at two temperatures. The green symbols belong to the temperature T∗ = 0.333 and the red symbols to T∗ = 0.073. At the low temperature, most of the configurations are in the collapsed phase (compact globule or native structure). At the high temperature, by contrast, most of the configurations are not restricted by any bonds and are completely unfolded, with a wide variety of shapes. For the configurations at the low temperature, the radius of gyration is around 16 Å, indicating a relatively compact conformation consistent with both native and intermediate phases, whereas at the higher temperature it varies between 16 Å and 80 Å. According to these graphs, both the collapsed and the unfolded structures of model A are denser than those of model B, which is the result of the larger number of attractive interactions in model A.

Figure 3.11: Variation of principal moments of inertia of the shapes of the most (red) and the second most (green) common configurations of model A at β∗ = 9. One unit corresponds to 2.048 × 10⁻⁴⁴ kg m². [Three-dimensional plot of Ix, Iy and Iz.]
In this thesis, configuration and structure refer to the same concept, which is defined by the matrix of interactions. To confirm that this definition of a configuration is appropriate, it should be shown that the shapes represented by one matrix are very similar. For this purpose, the principal moments of inertia of the different shapes represented by one matrix are calculated and plotted. To calculate the principal moments of inertia, all the elements of the moment of inertia tensor, I, are calculated according to

$$I \equiv \sum_{i=1}^{N} m \begin{pmatrix} y_i^2 + z_i^2 & -x_i y_i & -x_i z_i \\ -y_i x_i & x_i^2 + z_i^2 & -y_i z_i \\ -z_i x_i & -z_i y_i & x_i^2 + y_i^2 \end{pmatrix}, \tag{3.11}$$
where i is the bead index and N is the number of beads in the chain. The principal moments of inertia are the eigenvalues of this matrix, ordered as $I_x \le I_y \le I_z$. In Fig. 3.10, the variations of the principal moments of inertia are presented for the most common structure of model A at β∗ = 9, which is a relatively low temperature. The principal moments of inertia lie in a reasonably small range; for example, the Ix and Iy ranges do not overlap.
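The principal moments can be obtained with NumPy as in the following sketch. It assumes, as is usual but not stated in the text, that bead coordinates are measured relative to the center of mass, which for equal bead masses is the centroid. The tensor is built in the equivalent form $\sum_i m(|\mathbf{r}_i|^2 \mathbf{1} - \mathbf{r}_i \mathbf{r}_i^{T})$, whose entries are exactly those of Eq. 3.11. The function name is hypothetical.

```python
import numpy as np

def principal_moments(positions, mass=1.0):
    """Principal moments I_x <= I_y <= I_z of the tensor in Eq. 3.11.

    positions: (N, 3) array of bead coordinates; all beads have mass `mass`.
    Assumption: coordinates are shifted to the centre of mass first.
    """
    r = np.asarray(positions, dtype=float)
    r = r - r.mean(axis=0)                     # equal masses: centroid = centre of mass
    sq = np.einsum("ij,ij->i", r, r)           # |r_i|^2 for each bead
    tensor = mass * (sq.sum() * np.eye(3) - r.T @ r)
    return np.linalg.eigvalsh(tensor)          # ascending for a symmetric matrix

rod = principal_moments([[-1, 0, 0], [0, 0, 0], [1, 0, 0]])   # [0, 2, 2]
```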
This can also be verified visually by comparing all the shapes represented by a matrix with one of them chosen as a reference shape. For this comparison, each shape other than the reference shape is rotated randomly 1000 times, and for each rotation the sum of the squared distances between each chain bead of the rotated shape and the same bead in the reference shape is calculated according to

$$D = \sum_{i=1}^{N} |\vec{r}_i|^2, \tag{3.12}$$

where $\vec{r}_i$ is the displacement between the beads with index i in the reference shape and in the rotated shape, and N is the number of beads in the chain. For each shape represented by the matrix, the one of the 1000 rotated versions with the minimum value of D is chosen for a movie.
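The random-rotation search can be sketched with NumPy. Two details are assumptions: both shapes are centered on their centroids before rotating (the text does not say how translations are handled), and the unrotated shape is kept as the starting candidate so the result can never be worse than no rotation. The uniform random rotations come from the QR decomposition of a Gaussian matrix with a sign correction. For reference, the Kabsch algorithm finds the optimal rotation directly; it is a swapped-in alternative and is not implemented here. Function names are hypothetical.

```python
import numpy as np

def random_rotation(rng):
    """Uniformly distributed 3x3 rotation matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))        # column sign fix: uniform on O(3)
    if np.linalg.det(q) < 0:           # turn reflections into proper rotations
        q[:, 0] = -q[:, 0]
    return q

def best_random_alignment(shape, reference, n_trials=1000, seed=0):
    """Minimum D (Eq. 3.12) over random rotations of `shape` about its centroid."""
    rng = np.random.default_rng(seed)
    a = np.asarray(shape, dtype=float)
    a = a - a.mean(axis=0)
    b = np.asarray(reference, dtype=float)
    b = b - b.mean(axis=0)
    best_d, best = np.sum((a - b) ** 2), a        # unrotated shape as first candidate
    for _ in range(n_trials):
        rotated = a @ random_rotation(rng).T
        d = np.sum((rotated - b) ** 2)
        if d < best_d:
            best_d, best = d, rotated
    return best_d, best
```

The shape returned for each configuration would then be one frame of the movie.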
The movie made by these selected shapes demonstrates that the shapes are similar
to each other for the matrix that represents a low energy configuration (having several
attractive bonds). However, for high energy configurations, where there is no attractive
bond or there is only one bond, one matrix can represent many different shapes. This
is expected since having no bond or only one bond means that there are only a few
constraints in the configuration to form its shape.
The spreads in the principal moments of inertia for shapes corresponding to the most common structure (AE AI AM CG CK CO CS CW EI GK GS GW KO KS OS QU QY SW UY) and the third most common structure (AE AI AM CG CK CO CS EI GK GS KO KS KW OS OW QU QY SW UY) of model A at β∗ = 9 are compared in Fig. 3.11. These two configurations, which differ only in two hydrogen bonds (CW and GW in the first are replaced by KW and OW in the second), fill a large common area in the principal moments of inertia space.
Although shapes with exactly the same bonds have similar principal moments of inertia, as shown in Fig. 3.10, the principal moments of inertia are not a good indicator for distinguishing configurations: as Fig. 3.11 shows, two different structures may have similar principal moments of inertia. Therefore, structures can only be distinguished by their interaction matrices, explained in Sec. 3.1.1.
Since the structures are saved as interaction matrices, it is relatively easy to count the
number of occurrences of the different structures and to find the most common structures.
The most common configuration at each temperature is the configuration with the lowest
free energy of the system. Considering Eqs. 3.2 and 3.4, at low temperatures, the energy
dominates the entropy effects, and therefore, the structure with the lowest energy has
the lowest or one of the lowest free energies as well. Consequently, it is expected that at
very low temperatures, the lowest energy configuration is the most common structure.
The most common configurations at different temperatures for model A and model B are presented in Table 3.2 and Table 3.3, respectively. According to Fig. 3.2(b), the 25-bead chain of model B has at most 8 attractive bonds. As expected, the most common structure for model B at low temperatures, β∗ ≥ 4.5, has all 8 attractive bonds and therefore has the lowest potential energy for this model. According to Table 3.2, the lowest potential energy configuration observed in model A for the 25-bead chain has 21 attractive bonds, although, according to Fig. 3.2(a), 36 possible attractive bonds are available for the 25-bead chain in model A. This means that either the configurations with lower energies, which have more than 21 attractive bonds, are not geometrically accessible (due to constraints in the model), or their configurational entropies are too low for them to be observed at these temperatures. It will be shown later (Sec. 3.2.6) that the first scenario is the case; if the second scenario were true, the lower energy configurations would become dominant at sufficiently low temperatures.
β∗ Most common structure $f_{\mathrm{obs}}$ (%)
1.5 No bond 14.2±0.6
4.5 AU AY CG CS CW EQ GK GO GS GW IM KO KS KW OS SW UY 1.3±0.2
9.0 AE AI AM CG CK CO CS CW EI GK GS GW KO KS OS QU QY SW UY 5.6±0.4
14.0 AE AI AY CG CK CS CW EI GK GO GS IY KO KS KW MQ MU OS QU SW 9.7±0.6
24.0 AE AI AY CG CK CS CW EI GK GO GS IY KO KS KW MQ MU OS QU SW 10.6±0.6
38.4 AQ AU AY CG CO CS CW EI EM GK GO GS GW IM KO KS OS QU SW UY 8.5±0.6
57.5 AQ AU AY CG CO CS CW EI EM GK GO GS GW IM KO KS OS QU SW UY 7.7±0.6
72.5 AE AI AM CG CK CO CS EI GK GS GW KO KS KW OS OW QU QY SW UY 8.1±0.6
87.5 AE AI AM AQ AU AY CG CK EI EY GK IM IQ IY MQ MU OS QU QY SW UY 8.2±0.6
β∗ Second most common structure $f_{\mathrm{obs}}$ (%)
1.5 SW 2.1±0.2
4.5 AE AI AM EI EM GK GO GW IM KO KS KW OS OW QU QY SW UY 0.7±0.2
9.0 AE AI AM CG CK CO CS CW EI GK GW KO KW OS OW QU QY SW UY 5.0±0.4
14.0 AE AI AY CG CK CO CS CW EI GK GO IY KO KW MQ MU OS OW QU SW 8.7±0.6
24.0 AE AI AY CG CK CO CS CW EI GK GO IY KO KW MQ MU OS OW QU SW 9.6±0.6
38.4 AE AI AY CG CK CO CS CW EI GK GO IY KO KW MQ MU OS OW QU SW 6.6±0.4
57.5 AE AI AM CG CK CO CS EI GK GS GW KO KS KW OS OW QU QY SW UY 4.5±0.4
72.5 AQ AU AY CG CO CS CW EI EM GK GO GS GW IM KO KS OS QU SW UY 7.5±0.4
87.5 AE AU AY CG CS CW EY GK GS GW IM IQ KO KS KW MQ OS OW SW UY 6.6±0.6
Table 3.2: Most common configurations of the model A 25-bead chain, for the system
with 170 replicas.
β∗ Most common structure $f_{\mathrm{obs}}$ (%)
1.5 No bond 22.4 ± 1.2
3.0 No bond 6.7 ± 1.0
3.5 BF JN 4.0 ± 0.6
3.8 BF JN RV 4.2 ± 0.6
4.2 BF FJ NR RV 6.5 ± 0.8
4.5 BF BR BV FJ FV JN NR RV 7.5 ± 1.0
5.3 BF BR BV FJ FV JN NR RV 46.4 ± 1.6
6.0 BF BR BV FJ FV JN NR RV 76.0 ± 1.2
7.5 BF BR BV FJ FV JN NR RV 94.1 ± 0.8
9.0 BF BR BV FJ FV JN NR RV 98.0 ± 0.4
13.5 BF BR BV FJ FV JN NR RV 99.9 ± 0.0
β∗ the second most common structure fobs(%)
1.5 BF 3.5 ± 0.6
3.0 BF 5.6 ± 0.8
3.5 BF NR 4.0 ± 0.6
3.8 BF FJ NR RV 3.9 ± 0.6
4.2 BF FJ JN RV 4.9 ± 0.8
4.5 BF FJ JN NR RV 6.4 ± 0.8
5.3 BF BR BV FJ JN NR RV 10.1 ± 0.8
6.0 BF BR BV FJ JN NR RV 6.8 ± 0.8
7.5 BF BR BV FJ JN NR RV 1.9 ± 0.6
9.0 BF BR BV FJ JN NR RV 0.5 ± 0.2
13.5 N/A N/A
Table 3.3: Most common configurations of the model B 25-bead chain, for the system
with 90 replicas.
3.2.3 Free energy landscape
As mentioned in chapter 1, the term energy landscape refers to the free energy as a
function of protein conformation, specified here by the configuration matrix. Therefore,
to study the energy landscape at a specific temperature, the most common structures
should be identified and the free energies of the different structures calculated at that
temperature. Two structures are close in the landscape if they have similar configurations,
that is, if they have a large number of bonds in common. For model A, these dominant
structures are shown in Table 3.2, while those for model B are given in Table 3.3. The
most common structures at any temperature are those with the lowest Helmholtz free
energy at that temperature. Therefore, at low enough temperatures, when the entropic
contribution to the free energy is small, the most common structure is the one with the
lowest possible potential energy. The term funnel refers to a relatively steep valley whose
deepest point, corresponding to the configuration with the lowest free energy, is easily
accessible from almost anywhere inside the valley. This means that the barriers between
the local minima inside the funnel and the deepest point of the valley should be small.
If the barriers are relatively small, folding can be pictured as the chain gliding down the
funnel-shaped free-energy landscape along several different paths towards its lowest free
energy structure.
As can be seen in Table 3.2, as the temperature of model A is lowered some structures
become dominant, but on further cooling their fractions of the total population start to
decrease and new structures become dominant. It can be concluded that in this model the
shape of the landscape changes significantly with temperature: at high temperatures the
landscape is riddled with many local minima and one very deep but wide minimum (the
structure with no bonds), and at low temperatures there are a few narrow, deep minima.
For model A, either there are deep local minima inside a funnel-shaped valley or there
are only a few deep local minima beside each other. At the studied temperatures, there
is no structure with a
Rank most common structure fobs(%)
1 BF BR BV FJ FV JN NR RV 76.0 ± 1.2
2 BF BR BV FJ JN NR RV 6.8 ± 0.8
3 BF BV FJ FV JN NR RV 3.8 ± 0.6
4 BF BR BV FV JN NR RV 1.9 ± 0.4
5 BF BR BV FJ FV JN RV 1.3 ± 0.4
6 BF BR BV FJ FV NR RV 1.0 ± 0.3
7 BF BR BV FJ FV JN NR 1.0 ± 0.3
Table 3.4: Most common configurations of the model B 25-bead chain at β∗ = 6.
very large population, which indicates that there is no very deep point in the free energy
landscape. Since the most common structures at each temperature differ from each other
in only a few bonds, these deep minima are located close to each other in the landscape
but not necessarily inside a funnel. For example, as can be seen in Table 3.2, the two
most common structures at β∗ = 57.5 differ in seven bonds: each contains seven bonds
that the other lacks. Hence, there are substantial barriers to moving from one of these
minima to the other, because seven bonds must be broken and seven new bonds must be
formed. On the other hand, these two structures share thirteen bonds (65% of their 20
bonds), which indicates that they are similar and therefore that their locations in the
landscape are relatively close to each other.
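The bond bookkeeping behind this comparison is easy to make explicit by treating each configuration as a set of bond labels. The following is a minimal Python sketch (the function name and dictionary keys are illustrative, not taken from the simulation code); applied to the two most common model A structures at β∗ = 57.5 from Table 3.2, it should give 13 shared bonds, 7 bonds unique to each structure and a shared fraction of 0.65.

```python
def compare_configurations(first: str, second: str) -> dict:
    """Compare two configurations given as space-separated bond labels."""
    a, b = set(first.split()), set(second.split())
    shared = a & b
    return {
        "shared": len(shared),
        "only_first": len(a - b),   # bonds that must break to go from first to second
        "only_second": len(b - a),  # bonds that must form to go from first to second
        "shared_fraction": len(shared) / len(a),
    }


# Two most common model A structures at beta* = 57.5 (Table 3.2).
most_common = "AQ AU AY CG CO CS CW EI EM GK GO GS GW IM KO KS OS QU SW UY"
second_most_common = "AE AI AM CG CK CO CS EI GK GS GW KO KS KW OS OW QU QY SW UY"

result = compare_configurations(most_common, second_most_common)
print(result)
```

Counting the bonds to break and to form separately, rather than only the size of the symmetric difference, mirrors the way the barrier between the two minima is described in the text.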
As the temperature is lowered, the chain is repeatedly trapped in and escapes from these
local minima. It was not possible to lower the temperature further, because doing so
would increase the number of replicas to the point that good PT dynamics could no longer
be maintained.
Unlike the behavior observed in model A, a single dominant structure emerges in model B
as the temperature is lowered, and the probability of the most common structure reaches
nearly one at low temperatures (see Table 3.3). For β∗ ≥ 5.3, the free energy landscape
consists of a very deep funnel in which there are several deep
minima. The most common structures for β∗ = 6 are presented in Table 3.4. Since a
repulsive bond requires one bead with alphabetic index 4k + 1 and the other with index
4k′, none of the seven most common structures has a repulsive bond; all of their bonds
join beads B, F, J, N, R and V, whose indices are of the form 4k + 2. This is not
surprising, since the formation of a repulsive bond both limits the number of accessible
conformations and is energetically unfavorable. The most common structure, BF BR BV FJ
FV JN NR RV, is the deepest point in the funnel, and each of the six other most common
structures (2nd to 7th) lacks exactly one of its bonds. This means that the landscape is a
funnel-shaped valley containing one very deep minimum, with a few local minima beside
this deepest point. According to Table 3.3, as the temperature is lowered the deepest
point of the funnel becomes deeper while the other minima become shallower, since the
population of the most common structure reaches 99.9%. Thus, on lowering the
temperature the funnel becomes smoother and steeper, and the lowest free energy
configuration becomes more accessible.
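The two observations above, the absence of repulsive bonds and the one-bond differences from the most common structure, can be checked mechanically. The Python sketch below assumes that beads are labelled A, B, C, ... in chain order with A = 1 (so that the repulsive pair AD in Table 3.5 corresponds to indices 1 and 4), with lowercase labels continuing after Z; the helper names are hypothetical.

```python
import string

# Assumption: beads are labelled A, B, C, ... in chain order with A = 1;
# lowercase letters (used for the 35-bead chain) continue after Z = 26.
LABELS = string.ascii_uppercase + string.ascii_lowercase


def bead_index(label: str) -> int:
    return LABELS.index(label) + 1


def is_repulsive(bond: str) -> bool:
    """A bond is repulsive if one bead index is 4k + 1 and the other is 4k'."""
    residues = {bead_index(bond[0]) % 4, bead_index(bond[1]) % 4}
    return residues == {0, 1}


# Seven most common model B 25-bead structures at beta* = 6 (Table 3.4), rank order.
table_3_4 = [
    "BF BR BV FJ FV JN NR RV",
    "BF BR BV FJ JN NR RV",
    "BF BV FJ FV JN NR RV",
    "BF BR BV FV JN NR RV",
    "BF BR BV FJ FV JN RV",
    "BF BR BV FJ FV NR RV",
    "BF BR BV FJ FV JN NR",
]
configs = [set(c.split()) for c in table_3_4]

no_repulsion = all(not is_repulsive(b) for c in configs for b in c)
# Bonds of rank 1 missing from each of ranks 2-7, and any bonds they add.
missing = [sorted(configs[0] - c) for c in configs[1:]]
extra = [sorted(c - configs[0]) for c in configs[1:]]
print(no_repulsion, missing, extra)
```

Each of ranks 2 to 7 should come out missing a single, different bond of the rank 1 structure and adding none.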
The trend of the probability of the most common structure for the two models can be
seen in Fig. 3.12: for model A this probability fluctuates at low temperatures around
0.08, far from one, whereas for the model B 25-bead chain it nearly reaches one.
For model B chains longer than 29 beads (such as the 35-bead chain), several of the
most common structures at very low temperatures have the same energy and similar
entropies. Therefore, unlike the 25-bead chain, their landscape has no single deep point;
instead, it consists of several minima beside each other inside a wide funnel. As will be
discussed in detail in Sec. 3.2.6, this happens for chains longer than 29 beads because
the configuration with the theoretical maximum number of bonds is geometrically
prohibited.
[Plot: most common structure probability (0 to 1) versus β∗ (0 to 50) for Model A - 25 beads and Model B - 25 beads]
Figure 3.12: Variation of the probability of the most common structure versus β∗.
3.2.4 Entropy and free energy calculation for the model B 25-bead chain
As shown in Sec. 4.2.4, one can obtain the relative configurational entropies, and
consequently the free energies, from the ratio of configuration probabilities at a specific
temperature using Eq. 3.10. Since there are fewer possible structures in model B than in
model A, the uncertainty associated with the population of each structure is smaller for
model B, and consequently the calculated entropy and free energy of each configuration
have a smaller uncertainty.
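Eq. 3.10 is not reproduced in this section. Assuming it has the standard canonical form, in which two configurations observed at the same temperature satisfy P_i/P_j = exp[−β(F_i − F_j)] with F = E − TS, the entropy difference in units of kB is ln(P_i/P_j) + β∗(E_i − E_j)/ε. A minimal Python sketch of this relation, applied to the β∗ = 6 populations of Table 3.4 with the energies of Table 3.5 (the function name is illustrative):

```python
import math


def entropy_difference(p_i: float, p_j: float, e_i: float, e_j: float, beta_star: float) -> float:
    """Return (S_i - S_j)/k_B from populations observed at the same temperature.

    Assumes P_i/P_j = exp[-beta (F_i - F_j)] with F = E - T S, energies in units
    of epsilon and beta_star = epsilon/(k_B T). Populations may be given in percent,
    since only their ratio enters.
    """
    return math.log(p_i / p_j) + beta_star * (e_i - e_j)


# Model B, 25 beads, beta* = 6 (Table 3.4); energies from Table 3.5.
ds_10 = entropy_difference(6.8, 76.0, -7.0, -8.0, 6.0)  # BF BR BV FJ JN NR RV vs rank 1
ds_11 = entropy_difference(3.8, 76.0, -7.0, -8.0, 6.0)  # BF BV FJ FV JN NR RV vs rank 1
print(round(ds_10, 2), round(ds_11, 2))
```

The resulting values, about 3.6 and 3.0, lie within the quoted uncertainties of the entropies 3.7 ± 0.8 and 2.9 ± 0.6 listed for configurations 10 and 11 in Table 3.5 (relative to configuration 12), which suggests the assumed form is consistent with the tabulated data.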
The main factor that can be used to categorize the range of entropy values is the
number of bonds. As can be seen in Table 3.5, the entropies of configurations with the
same number of bonds (configurations 4 and 5, 7 and 8, and 10 and 11) are different but
lie in the same range, and the entropy typically decreases as bonds are added, because
each bond places further restrictions on the shape of the configuration. However, as
shown in Table 3.4, configurations with the same energy of −7ε have different
populations; therefore, their entropies are different. While the free energy of a
configuration determines how deep the landscape is at that configuration, its entropy
determines how much area the configuration occupies in the landscape.
configuration Ec/ε Sc/kB
1 AD 0.9 31.3 ±0.8
2 No Bond 0.00 31.8 ±0.6
3 BF -1 28.6 ± 0.6
4 BF JN -2 25.1 ± 0.6
5 BF NR -2 25.2 ± 0.6
6 BF JN RV -3 21.7 ± 0.4
7 BF FJ NR RV -4 17.8 ± 0.6
8 BF FJ JN RV -4 17.6 ± 0.6
9 BF FJ JN NR RV -5 13.2 ± 0.6
10 BF BR BV FJ JN NR RV -7 3.7 ± 0.8
11 BF BV FJ FV JN NR RV -7 2.9 ± 0.6
12 BF BR BV FJ FV JN NR RV -8 0
Table 3.5: Potential energy (in units of ε) and relative entropy (in units of kB) of the
most common structures of the model B 25-bead chain.
β∗ ∆F1,2 ∆F1,3 ∆F1,4 ∆F1,5 ∆F1,6 ∆F1,7 ∆F1,8 ∆F1,9 ∆F1,10 ∆F1,11 ∆F1,12
1.5 -1.21 -0.013 1.27 1.24 2.53 4.05 4.27 6.25 10.54 11.05 11.93
2.4 -1.09 -0.72 -0.3 -0.31 0.12 0.69 0.83 1.69 3.63 3.95 4.12
3.3 -1.04 -1.04 -1.01 -1.02 -0.97 -0.83 -0.73 -0.38 0.48 0.71 0.57
4.2 -1.01 -1.23 -1.41 -1.42 -1.60 -1.71 -1.63 -1.56 -1.31 -1.13 -1.46
5.1 -0.99 -1.35 -1.67 -1.68 -2.01 -2.27 -2.21 -2.33 -2.47 -2.33 -2.77
6 -0.98 -1.43 -1.86 -1.87 -2.29 -2.66 -2.61 -2.86 -3.29 -3.16 -3.69
9 -0.95 -1.59 -2.21 -2.21 -2.83 -3.41 -3.37 -3.87 -4.83 -4.74 -5.43
12 -0.94 -1.67 -2.38 -2.38 -3.09 -3.78 -3.75 -4.38 -5.59 -5.53 -6.29
Table 3.6: Helmholtz free energy differences ∆F1,j = Fj − F1, in units of ε, for the
configurations in Table 3.5 (25-bead model B).
The entropy difference (∆S) between any two common configurations can be calculated
from the ratio of their populations (cf. Eq. 3.10). Using the calculated entropies, the
relative Helmholtz free energy between any pair of configurations can be computed at
any temperature. This allows one to predict the population of any structure at any
temperature, and the temperature at which the populations of two specific configurations
become equal. In Table 3.6, the calculated free energies of the 12 configurations of
Table 3.5 relative to the free energy of the first configuration are presented as a function
of temperature.
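The entries of Table 3.6 follow directly from the energies and entropies of Table 3.5. In reduced units, F/ε = E/ε − S/(kB β∗), so each entry is a linear function of 1/β∗. The Python sketch below (table and function names are illustrative) recomputes one row; the tabulated numbers appear to correspond to F_c − F_1 for configuration c, measured relative to configuration 1.

```python
# (E_c / epsilon, S_c / k_B) for configurations 1-12 of Table 3.5.
TABLE_3_5 = [
    (0.9, 31.3), (0.0, 31.8), (-1.0, 28.6), (-2.0, 25.1), (-2.0, 25.2),
    (-3.0, 21.7), (-4.0, 17.8), (-4.0, 17.6), (-5.0, 13.2), (-7.0, 3.7),
    (-7.0, 2.9), (-8.0, 0.0),
]


def free_energy(energy: float, entropy: float, beta_star: float) -> float:
    """F/epsilon = E/epsilon - T* S/k_B, with T* = 1/beta*."""
    return energy - entropy / beta_star


def free_energy_row(beta_star: float) -> list[float]:
    """Differences F_c - F_1 for c = 2..12 at one reduced inverse temperature."""
    f1 = free_energy(*TABLE_3_5[0], beta_star)
    return [free_energy(e, s, beta_star) - f1 for e, s in TABLE_3_5[1:]]


row = free_energy_row(12.0)
print([round(x, 2) for x in row])
```

For β∗ = 12 this gives about −0.94 for configuration 2 and −6.29 for configuration 12, matching the last row of Table 3.6 to within rounding.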
The relative frequency of two structures at a specific temperature can be used to
calculate their entropy difference. However, there is often no reasonable overlap between
the population distribution of the most common structure at a very low temperature and
that of the most common structure at a very high temperature (e.g., configurations 2 and
12 of Table 3.5). Hence, one or two intermediate configurations must be employed to
find ∆S for these two configurations. For example, if A and D are the most common
configurations at very high and very low temperatures, respectively, and B and C are the
most common structures at intermediate temperatures, then once ∆SA,B, ∆SB,C and
∆SC,D have been calculated, the entropy difference between A and D follows as
∆SA,D = ∆SA,B + ∆SB,C + ∆SC,D. By implementing this technique, the relative entropy
of any pair of configurations can be found. The potential energies and relative entropies
of some of the most common structures of model B for the 25-bead chain are shown in
Table 3.5.
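The chaining procedure can be illustrated with data already in this section. The Python sketch below, assuming the same canonical form of Eq. 3.10 as in the population-ratio relation above, links configuration 12 to configuration 10 using the β∗ = 5.3 populations of Table 3.3, links configuration 10 to configuration 11 using the β∗ = 6 populations of Table 3.4, and adds the two differences. It only illustrates the telescoping sum; it is not the specific chain of intermediates used to build Table 3.5.

```python
import math


def entropy_difference(p_i, p_j, e_i, e_j, beta_star):
    """(S_i - S_j)/k_B from two populations at the same beta* (canonical form assumed)."""
    return math.log(p_i / p_j) + beta_star * (e_i - e_j)


# Link 1: configurations 10 and 12 of Table 3.5, populations at beta* = 5.3 (Table 3.3).
ds_10_12 = entropy_difference(10.1, 46.4, -7.0, -8.0, 5.3)
# Link 2: configurations 11 and 10, populations at beta* = 6 (Table 3.4).
ds_11_10 = entropy_difference(3.8, 6.8, -7.0, -7.0, 6.0)

# The differences telescope: S_11 - S_12 = (S_10 - S_12) + (S_11 - S_10).
ds_11_12 = ds_10_12 + ds_11_10
print(round(ds_10_12, 2), round(ds_11_10, 2), round(ds_11_12, 2))
```

The chained estimate for S11 − S12, about 3.2 kB, lies within the ±0.6 uncertainty of the value 2.9 listed for configuration 11 in Table 3.5.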
According to Fig. 3.2(b), the maximum number of attractive bonds for the model B
25-bead chain is 8. Therefore, BF BR BV FJ FV JN NR RV, which is the most common
structure at low temperatures (see Table 3.3), is the lowest energy configuration, and no
other configuration can become the most common structure as the temperature is lowered
further.
The trend of β∗∆F versus the configuration index of Table 3.5 is shown in Fig. 3.13.
∆F1c is based on the calculated Helmholtz free energy of Table 3.5. Since both the en-
β∗ configuration ppred fobs ∆(%)
1.5 No Bond 0.206 0.165 25
1.5 BF 0.068 0.059 15
1.5 RV 0.059 0.065 9
1.5 FJ 0.053 0.067 21
1.5 JN 0.052 0.064 19
3.0 BF BR BV FJ FV JN NR RV 0.096 0.075 28
3.0 BF FJ NR RV 0.076 0.064 19
3.0 BF FJ JN NR RV 0.063 0.064 2
3.0 BF FJ JN RV 0.064 0.059 8
4.0 BF BR BV FJ FV JN NR RV 0.785 0.760 3
4.0 BF BR BV FJ JN NR RV 0.076 0.068 12
4.0 BF BV FJ FV JN NR RV 0.036 0.038 5
4.0 BF BR BV FV JN NR RV 0.018 0.019 5
5.0 BF BR BV FJ FV JN NR RV 0.949 0.941 0.8
5.0 BF BR BV FJ JN NR RV 0.020 0.019 5
5.0 BF BV FJ FV JN NR RV 0.010 0.012 17
6.0 BF BR BV FJ FV JN NR RV 0.988 0.980 0.8
6.0 BF BR BV FJ JN NR RV 0.005 0.005 0
9.0 BF BR BV FJ FV JN NR RV 0.999 0.999 0
Table 3.7: Comparison of the predicted probability (ppred) and the simulated frequency
(fobs), with their relative difference (∆), for the most common structures of the model B
25-bead chain.
[Plot: β∗∆F versus configuration index (0 to 12) for β∗ = 1.5, 3.9, 6 and 12]
Figure 3.13: Variation of β∗∆F versus the configuration index of Table 3.5, where
β∗ = 1/T∗ = ε/(kBT) and ∆F is the Helmholtz free energy difference with respect to
configuration 1, in units of ε.
tropy and energy of the configurations are decreasing from configuration 1 to 12, the trend
of β∗∆F is very different for high and low β∗ values. At high temperatures (β∗ ≤ 3),
configuration 2 of Table 3.5, which has no bonds and the maximum entropy, is the lowest
free energy structure. This can be seen for β∗ = 1.5 in Fig. 3.13, where the free energies
of the structures with more bonds (higher configuration index) are larger than those of
the configurations with fewer bonds (lower configuration index). As the temperature is
lowered to β∗ ≥ 4.5, however, the last structure of Table 3.5 (configuration 12), which
has the lowest potential energy, becomes the lowest free energy structure. This behavior
can be seen clearly in Fig. 3.13, where for β∗ = 6 and β∗ = 12 configuration 12 has the
lowest free energy. This confirms that the entropic contribution to the free energy is
small at low temperatures.
Using the calculated free energies, one predicts that for β∗ ≥ 4.5 the last structure of
Table 3.5, configuration 12, becomes the most common structure, since for all the temperatures in that
[Plot: free energy difference (−14 to 6) versus β∗ (0 to 16)]
Figure 3.14: Free energy difference of configurations 2 and 12 of Table 3.5 in units of ε
versus β∗.
range this configuration has the lowest free energy. The populations observed in the
simulations confirm this prediction: configuration 12 ranks 30th, 13th and 5th among all
configurations at β∗ = 4.05, 4.2 and 4.35, respectively, and for β∗ ≥ 4.5 it is the most
common configuration.
The relative free energy of configurations 2 and 12 is plotted versus β∗ in Fig. 3.14.
Their free energies are equal at β∗ ≈ 4, which implies that their populations are the
same there. Indeed, the simulations show that the populations of configurations 2 and 12
are 1.0% and 0.5%, respectively, at β∗ = 3.9, and 0.6% and 1.2%, respectively, at
β∗ = 4.05, which confirms that their populations become equal in the range
3.9 ≤ β∗ ≤ 4.05.
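The crossing point follows from the energies and entropies of Table 3.5 alone. Setting F2 = F12 with F/ε = E/ε − S/(kB β∗) gives β∗ = (S2 − S12)/(E2 − E12) in reduced units. A small Python sketch (names are illustrative) that computes this crossing and the predicted population ratio P12/P2 at the two temperatures quoted above:

```python
import math

# (E/epsilon, S/k_B) from Table 3.5.
E2, S2 = 0.0, 31.8    # configuration 2, no bonds
E12, S12 = -8.0, 0.0  # configuration 12, eight bonds


def crossing_beta_star(e_a, s_a, e_b, s_b):
    """beta* at which E_a - S_a/beta* equals E_b - S_b/beta*."""
    return (s_a - s_b) / (e_a - e_b)


def population_ratio(beta_star):
    """Predicted P_12 / P_2 = exp[-beta (F_12 - F_2)] with F = E - T S."""
    beta_df = beta_star * (E12 - E2) - (S12 - S2)
    return math.exp(-beta_df)


beta_x = crossing_beta_star(E2, S2, E12, S12)
print(round(beta_x, 3), round(population_ratio(3.9), 2), round(population_ratio(4.05), 2))
```

This gives β∗ ≈ 3.98 and predicted ratios of about 0.55 at β∗ = 3.9 and 1.8 at β∗ = 4.05, to be compared with the observed ratios 0.5/1.0 and 1.2/0.6.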
Probability calculation
One of the objectives of calculating the free energy is to predict the population of a specific
structure at any temperature using the probability of a configuration in the canonical
ensemble, $\Pr_i = \exp(-\beta F_i)/Z$, where $Z = \sum_{i=1}^{N} \exp(-\beta F_i)$, $i$ is the
configuration index and $N$ is the total number of configurations. In principle, the free
energies of all possible configurations are required to compute the configurational
partition function Z. Here, however, structures whose population is below 0.5% of the
total at all the studied temperatures are excluded from the calculations. Neglecting these
rarely occurring structures clearly underestimates the partition function Z, so the
predicted probabilities of the retained configurations are overestimated. The reason for
excluding configurations with populations below 0.5% at all the studied temperatures is
that, because of their small populations, the statistical uncertainties of their computed
entropies are too large to be reliable. Moreover, their small populations imply that their
free energies are high at all the studied temperatures, so their Boltzmann factors are
small and neglecting them does not significantly affect the computed value of Z or,
consequently, the calculated probabilities.
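The calculation, and the effect of truncating the configuration list, can be sketched as follows. This Python example is illustrative only: it uses just the twelve configurations of Table 3.5 rather than the 78 retained in the thesis, writes βF_i = β∗E_i/ε − S_i/kB, and applies a 0.5% cutoff at a single temperature instead of at all studied temperatures. It shows that renormalizing over a reduced set raises the probabilities of the retained configurations.

```python
import math

# (E/epsilon, S/k_B) for configurations 1-12 of Table 3.5; a small subset only.
TABLE_3_5 = [
    (0.9, 31.3), (0.0, 31.8), (-1.0, 28.6), (-2.0, 25.1), (-2.0, 25.2),
    (-3.0, 21.7), (-4.0, 17.8), (-4.0, 17.6), (-5.0, 13.2), (-7.0, 3.7),
    (-7.0, 2.9), (-8.0, 0.0),
]


def probabilities(configs, beta_star):
    """Pr_i = exp(-beta F_i) / Z over the supplied configurations only.

    beta F_i = beta* E_i/epsilon - S_i/k_B. The largest exponent is subtracted
    before exponentiating so that exp() cannot overflow.
    """
    exponents = [-(beta_star * e - s) for e, s in configs]
    shift = max(exponents)
    weights = [math.exp(x - shift) for x in exponents]
    z = sum(weights)
    return [w / z for w in weights]


beta_star = 6.0
full = probabilities(TABLE_3_5, beta_star)

# Illustrative truncation: drop configurations below 0.5% at this beta* and renormalize.
kept = [c for c, p in zip(TABLE_3_5, full) if p >= 0.005]
truncated = probabilities(kept, beta_star)

# Configuration 12 is last in both lists; its probability rises after truncation.
print(round(full[-1], 3), round(truncated[-1], 3))
```

With this reduced set, configuration 12 receives a probability of about 0.86 at β∗ = 6, rising slightly once the configurations below the cutoff are dropped; these numbers are not expected to reproduce Table 3.7.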
For the 25-bead chain, 78 configurations were retained in the probability calculations.
Although omitting the remaining configurations introduces a systematic error, the
predicted probabilities are very close to those observed in the simulation runs, as can be
seen in Table 3.7. According to this table, the predictions agree better with the
simulation results at lower temperatures. This is because the neglected low-population
configurations occur more frequently at high temperatures, so neglecting their
contribution leads to a larger error at high temperatures.
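The ∆ column of Table 3.7 appears to be the difference between predicted and observed values relative to the observed frequency; that interpretation is an assumption, but it reproduces the tabulated percentages to within rounding. A short Python sketch:

```python
def relative_difference(p_pred: float, f_obs: float) -> float:
    """Relative difference |p_pred - f_obs| / f_obs, in percent."""
    return 100.0 * abs(p_pred - f_obs) / f_obs


# A few (p_pred, f_obs) pairs from Table 3.7.
rows = {
    "No Bond, beta* = 1.5": (0.206, 0.165),
    "BF FJ JN NR RV, beta* = 3.0": (0.063, 0.064),
    "BF BR BV FJ FV JN NR RV, beta* = 5.0": (0.949, 0.941),
}
for name, (p, f) in rows.items():
    print(f"{name}: {relative_difference(p, f):.1f}%")
```

For example, the no-bond row at β∗ = 1.5 gives about 24.8%, listed as 25 in Table 3.7.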
3.2.5 Entropy and free energy calculation for the 35-bead protein-like chain
The entropies and free energies of the model B 35-bead configurations are calculated in
the same way as in the 25-bead case. Adding only 10 beads to the chain increases the
number of possible attractive bonds from 8 in the 25-bead chain to 23 in the 35-bead
chain (cf. Fig. 3.2), which results in a much more complex energy landscape. This
dramatic change can be seen in Table 3.8 and Fig. 3.16: unlike for the 25-bead chain, the
probability of the most common structure does not approach one even at very low
temperatures.
As can be seen in Table 3.8, as β∗ increases (temperature decreases), a few structures
become dominant at different temperatures. Except for the lowest energy configuration
with 23 attractive bonds, every energy level is degenerate, with multiple configurations
possessing the same number of bonds. It will be shown in Sec. 3.2.6 that a structure with
23 attractive bonds is geometrically prohibited. Configurations with 21 or 22 attractive
bonds were not observed in any of the runs. Even if such configurations are possible,
lowering the temperature would not produce a single dominant structure, because there
would be several configurations with 21 or 22 attractive bonds. Therefore, the trend
observed for the 25-bead chain is not expected for the 35-bead chain, even at very low
temperatures.
The landscape of the 35-bead chain differs from that of the 25-bead chain because of the
large entropic barriers between configurations with different energies. All the most
common structures in Table 3.8 at high β∗ have an energy of −19ε. Besides the two main
configurations with energy −19ε presented in Table 3.8, there are at least 18 other
configurations with the same potential energy but lower entropies (cf. Table 3.9). As can
be seen in this table, three structures with an energy
β∗ the most common structure fobs(%)
1.5 No Bond 11.7 ± 1.3
3.75 BF NR VZ dh 0.7 ± 0.3
4.5 BF Bd Bh FJ FZ Fd Fh JZ Jd Jh NR RV Zd dh 2.7 ± 0.6
5.25 BF BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd RV VZ Zd dh 7.2 ± 0.9
9.0 BF BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd RV VZ Zd dh 18.8 ± 1.5
16.5 BF BR BV BZ Bh FJ FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh 25.8 ± 1.8
31.5 BF BR BV BZ Bh FJ FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh 24.7 ± 1.6
53.63 BF BR BV BZ Bh FJ FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh 23.6 ± 1.6
β∗ the second most common structure fobs(%)
1.5 dh 2.3 ± 0.6
3.75 BF FJ JN RV Zd dh 0.6 ± 0.3
4.5 BF BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd RV VZ Zd dh 2.0 ± 0.5
5.25 BF BR BV BZ Bd Bh FJ Fd Fh JN Jh NR Nh RV Rh VZ Zd dh 5.3 ± 0.9
9.0 BF BR BV BZ Bh FJ FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh 15.4 ± 1.5
16.5 BF BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd Nh RV VZ Zd dh 14.4 ± 1.4
31.5 BF BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd Nh RV VZ Zd dh 16.7 ± 1.4
53.63 BF BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd Nh RV VZ Zd dh 15.2 ± 1.3
Table 3.8: Most common configurations of the model B 35-bead chain, for the system
with 110 replicas.
of −20ε and relatively low entropies have been observed in the runs. Most runs found
only one configuration with a potential energy of −20ε, but one run revealed two others,
configurations 22 and 23. Based on their populations, configurations 22 and 23 should
have very low entropies. According to Table 3.8, even at very low temperatures
configuration 21 is not among the two most common structures. Configuration 21, which
has the lowest observed energy, differs in five bonds from the first configuration of
Table 3.8, the most common structure at the lowest studied temperatures. Hence, these
two configurations cannot be located inside one steep funnel, and there is a large
entropic barrier between configuration 21 and higher-energy structures such as the first
configuration, which can only be overcome by lowering the temperature much further.
Based on the calculated entropies and energies of configurations 1 and 21 in Table 3.9,
configuration 21, with 20 bonds, is predicted to become the most common structure for
β∗ ≥ 63. At the lowest studied temperature, β∗ = 64.5, however, configurations 21 and 1
account for 15% and 20% of the total population, respectively. This does not support the
prediction, but it suggests that configuration 21 should become the most common
structure at slightly lower temperatures. Even then, since there are other structures with
the same energy (cf. Table 3.9), its probability will not approach one.
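The two numbers used in this argument, the five-bond difference between configurations 1 and 21 and the crossing near β∗ = 63, can be recomputed from Table 3.9. The Python sketch below assumes the same reduced free energy F/ε = E/ε − S/(kB β∗) as for the 25-bead chain and uses the tabulated entropies as given; names are illustrative.

```python
import math

# Configurations 1 and 21 of Table 3.9 (model B, 35 beads).
CONF_1 = "BF BR BV BZ Bh FJ FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh"
CONF_21 = "BF BR BV BZ Bh FJ FV FZ JN JZ Jd Jh NR Nd Nh RV Rh VZ Zd dh"
E1, S1 = -19.0, 62.9   # energy in units of epsilon, relative entropy in units of k_B
E21, S21 = -20.0, 0.0

bonds_1, bonds_21 = set(CONF_1.split()), set(CONF_21.split())
n_different = len(bonds_1 ^ bonds_21)  # bonds present in exactly one configuration

# beta* at which E1 - S1/beta* equals E21 - S21/beta*.
beta_cross = (S1 - S21) / (E1 - E21)


def predicted_ratio(beta_star):
    """Predicted P_21 / P_1 = exp[-beta (F_21 - F_1)]."""
    beta_df = beta_star * (E21 - E1) - (S21 - S1)
    return math.exp(-beta_df)


print(n_different, round(beta_cross, 1), round(predicted_ratio(64.5), 2))
```

It gives a crossing at β∗ = 62.9 and a predicted ratio P21/P1 of about 5 at β∗ = 64.5, whereas the observed ratio is 15%/20% = 0.75, which is the discrepancy noted above.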
The landscape of the 35-bead chain at low temperatures is very different from the deep
and steep funnel observed for the 25-bead chain. At low temperatures, the most common
structures of the 35-bead chain are a few configurations with the same energy. For
example, as can be seen in Table 3.8, the two most common structures for
16.5 ≤ β∗ ≤ 53 have 19 attractive bonds. Although these two structures have comparable
populations, they differ structurally by more than one bond, quite unlike the seven most
common structures of the 25-bead chain at β∗ = 6 (cf. Table 3.4), which differ from the
most common structure by only one bond. Since the most common structures of the
35-bead chain at low temperatures share most of their bonds, their points in the
landscape should be
configuration Ec/ε Sc/kB
1 BF BR BV BZ Bh FJ FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh -19 62.9
2 BF BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd Nh RV VZ Zd dh -19 62.3±0.4
3 BF BR BV BZ Bd Bh FJ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh -19 62.3±0.5
4 BF BR BV BZ Bd Bh FJ FZ Fd Fh JN Jh NR Nh RV Rh VZ Zd dh -19 61.5±0.6
5 BF BR BV BZ Bd Bh FJ FV Fh JN Jh NR Nd Nh RV Rh VZ Zd dh -19 61.4±1.3
6 BF BR BV BZ FJ FV FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh -19 61.3±1.5
7 BF BR BV BZ Bd Bh FJ FZ Fd JN Jd Jh NR Nh RV Rh VZ Zd dh -19 61.3±0.5
8 BF BR BV Bh FJ FV FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh -19 61.2±0.3
9 BF BR BV BZ Bd Bh FJ FZ Fd JN Jd NR Nd Nh RV Rh VZ Zd dh -19 60.6±1.5
10 BF BV BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd RV VZ Zd dh -19 60.4±0.5
11 BF BZ Bd Bh FJ FZ Fd Fh JN JZ Jd Jh NR Nd Nh RV VZ Zd dh -19 60.1±0.9
12 BF BV BZ Bd FJ FV FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh -19 60.0±0.7
13 BF BR BV FJ FV FZ Fh JN JZ Jd Jh NR Nd Nh RV Rh VZ Zd dh -19 59.5±2.5
14 BF BZ Bd Bh FJ FV FZ Fd Fh JN JZ Jd Jh NR Nd RV VZ Zd dh -19 59.4±1.2
15 BF BR BV BZ FJ FV FZ Fd JN JZ Jd NR Nd Nh RV Rh VZ Zd dh -19 59.4±1.0
16 BF BV BZ Bd Bh FJ FV FZ Fd Fh JN JZ Jd Jh NR RV VZ Zd dh -19 59.3±1.0
17 BF BR BV BZ Bd FJ FZ Fd Fh JN Jd Jh NR Nd Nh RV VZ Zd dh -19 59.0±0.7
18 BF BR BV BZ Bd FJ FZ Fd JN Jd Jh NR Nd Nh RV Rh VZ Zd dh -19 58.9±1.0
19 BF Bd Bh FJ FZ Fd Fh JN JZ Jd Jh NR Nd Nh RV Rh VZ Zd dh -19 58.9±0.8
20 BF BR BV BZ Bh FJ FZ Fd JN JZ Jd Jh NR Nh RV Rh VZ Zd dh -19 58.7±0.7
21 BF BR BV BZ Bh FJ FV FZ JN JZ Jd Jh NR Nd Nh RV Rh VZ Zd dh -20 0.0±1.0
22 BF BR BV Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd Nh RV Rh VZ Zd dh -20 N/A
23 BF BR BV Bh FJ FV FZ Fh JN JZ Jd Jh NR Nd Nh RV Rh VZ Zd dh -20 N/A
Table 3.9: The lowest potential energy configurations of the model B 35-bead chain.
Configuration 21 serves as the reference point for the relative entropies, so its entropy is
zero. The first twenty configurations are presented in order of decreasing entropy.
close to each other. However, since the most common structure, the deepest point of the
landscape, differs from the other common structures by more than one bond, their points
in the landscape are not necessarily inside one steep valley. Therefore, the low-temperature
landscape of the 35-bead chain consists of several minima that are close to each other
but not necessarily inside one steep funnel, and, unlike the 25-bead case, there is no
single very deep point.
Obstacles
While the ranges of energy and entropy of the possible configurations are 8ε and 32kB,
respectively, for the 25-bead chain, these ranges increase to 20ε and 140kB, respectively,
for the 35-bead chain, which supports the view that the landscape of the 35-bead chain is
much wider than that of the 25-bead chain. It also shows that a much wider range of
temperatures and more replicas are required to study this landscape.
Predicting the probabilities of configurations is numerically much more difficult for the
35-bead chain than for the 25-bead chain. To predict the probability of each
configuration, its free energy as well as the free energies of almost all other
configurations must be estimated. The need for a vast number of free energy estimates
makes the probability calculation for the 35-bead chain much harder, and the errors
associated with these calculations are consequently larger. Table 3.10 shows that the
predicted and observed probabilities at β∗ = 9.8 differ considerably more than in the
smaller system (cf. Table 3.7).
3.2.6 Effects of the protein-like chain length
As was seen for the 25-bead chain, the probability of the most common structure
approaches one at relatively low temperatures. In contrast, the probability of the most
common structure of the 35-bead chain does not approach one at the low temperatures
studied here. There are two possible reasons for this behavior. First, the studied range
Configuration structure ppred fobs ∆ (%)
BF BR BV BZ Bh FJ FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh 0.21 0.21 0
BF BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd RV VZ Zd dh 0.18 0.13 38
BF BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd Nh RV VZ Zd dh 0.09 0.08 13
BF BR BV BZ Bd Bh FJ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh 0.06 0.07 6
BF Bd Bh FJ FZ Fd Fh JN JZ Jd Jh NR Nd Nh RV VZ Zd dh 0.14 0.06 130
Table 3.10: Comparison of the predicted probabilities and the simulation results for the
most common structures of the model B 35-bead chain at β∗ = 9.8.
[Plot: most common structure probability (0 to 1) versus β∗ (1.5 to 16.5) for model B chains with 15, 20, 25 and 29 beads]
Figure 3.15: Variation of the probability of the most common structure versus β∗ for
chains with 15, 20, 25 and 29 beads.
[Plot: most common structure probability (0 to 1) versus β∗ (0 to 50) for model B chains with 29, 30 and 35 beads]
Figure 3.16: Variation of the probability of the most common structure versus β∗ for
chains with 29, 30 and 35 beads. The result for the 29-bead chain from Fig. 3.15 is
presented here as a reference.
of temperatures was not sufficiently large, and therefore the lowest energy configuration
(the theoretical one, according to Fig. 3.2) has not been observed in the simulation runs.
Second, the lowest possible energy may not be geometrically accessible under the criteria
of model B (Table 3.1), so that as β∗ increases (temperature decreases) several structures
with the same energy compete for the highest probability. Although the configurational
entropies of these configurations are different, no configuration has a much higher
entropy than all the other structures with the same energy, and hence none of their
probabilities can approach one. Several pieces of evidence, explained below, support the
second scenario.
According to Table 3.1 and Fig. 3.2, the maximum possible numbers of attractive bonds
in model B for chains of 15, 20, 25, 29, 30 and 35 beads are 3, 5, 8, 12, 17 and 23,
respectively. As can be seen in Fig. 3.15, for the 15-, 20-, 25- and 29-bead chains the
probability of the most common structure approaches one at low temperatures. This
happens because the configuration with the lowest possible energy, which has the
maximum number of bonds, becomes dominant at low temperatures. As was seen in
Table 3.9 for the 35-bead chain, for longer chains the entropy difference between low
energy configurations that differ in only one bond becomes very large.
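The maximum bond counts quoted above can be reproduced by a simple counting rule, although Table 3.1 is not reproduced here. The rule in the Python sketch below is a hypothetical one, inferred only from the bond lists in Tables 3.3 to 3.9: attractive bonds join beads 2, 6, 10, ... (labels B, F, J, ...), and two such beads may bond when their separation along the chain is exactly 4 or at least 16. It counts candidate bonds only and says nothing about whether all of them can be satisfied simultaneously, which is the geometric question addressed in this section.

```python
def max_attractive_bonds(n_beads: int) -> int:
    """Count candidate attractive bonds under a HYPOTHETICAL rule.

    Assumed rule (inferred from the bond lists in Tables 3.3-3.9, not taken from
    Table 3.1): only beads 2, 6, 10, ... (labels B, F, J, ...) bond, and two such
    beads may bond if their separation along the chain is 4 or at least 16.
    """
    sites = range(2, n_beads + 1, 4)
    return sum(
        1
        for i in sites
        for j in sites
        if j > i and (j - i == 4 or j - i >= 16)
    )


for n in (15, 20, 25, 29, 30, 35):
    print(n, max_attractive_bonds(n))
```

Under this assumed rule the counts come out as 3, 5, 8, 12, 17 and 23 for 15, 20, 25, 29, 30 and 35 beads, matching the values stated above.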
For the 29-bead chain there is a peak in the probability of the most common structure
at β∗ = 7.5, which arises from the very large ∆S between the most common structure
with 11 bonds and the most common structure with 12 bonds. For some β∗ values
(4.35 ≤ β∗ ≤ 9), a configuration with 11 bonds is therefore the most common structure,
because of its higher entropy compared with the 12-bond configuration as well as other
11-bond configurations. The probability of this 11-bond structure increases up to
β∗ = 7.5; beyond this point the probability of the 12-bond configuration increases,
because the entropic contribution to the free energy becomes smaller for β∗ ≥ 7.5. At
β∗ = 9 their probabilities become equal, and for β∗ > 9 the structure with 12 bonds is
the most common structure.
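The size of this entropy gap can be estimated from the crossing point alone. Assuming the canonical relation used earlier in this chapter, F = E − TS with β∗ = ε/(kBT), and that the two structures differ in energy by exactly ε (one attractive bond, as in Table 3.5), equal populations at β∗ = 9 imply:

```latex
\frac{P_{11}}{P_{12}} = \exp\!\left[-\beta\,(F_{11}-F_{12})\right] = 1
\;\Longrightarrow\; E_{11}-E_{12} = T\,(S_{11}-S_{12})
\;\Longrightarrow\; \frac{S_{11}-S_{12}}{k_B} = \beta^{*}\,\frac{E_{11}-E_{12}}{\varepsilon} = 9 \times 1 = 9
```

On this estimate the most common 11-bond structure has roughly 9kB more entropy than the most common 12-bond structure, consistent with the description of ∆S as very large.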
The configuration with the theoretical maximum number of bonds seems to be not
geometrically possible for chains longer than 29 beads. For the 30-bead chain, the maxi-
mum possible number of bonds is 17. No structure with 17 bonds was observed in several
runs with different numbers of replicas, different PT temperature sets and different ranges
of temperatures, which implies that satisfying all possible bonds for the 30-bead chain is
geometrically impossible. If this is the case, satisfying all possible bonds for any longer
chains should be impossible as well. Since for 30 and 35 beads chains the range of stud-
ied temperatures is larger, it is much harder to observe a very good PT dynamics in
comparison to 15, 20, 25 and 29 beads chains cases. The results presented in Fig. 3.16
for 30 and 35 beads chains, examine a relatively wide temperature range in which the
PT dynamics is relatively good. As can be seen in Fig. 3.16, when 4.5 ≤ β∗ ≤ 7.5, the
Chapter 3. Protein-like Chain Without a Solvent 72
probability of the most common structure increases for the 30-bead chain (similar to the
behavior observed for the 15-, 20-, 25- and 29-bead chains). The probability of the most
common structure then remains roughly constant at about 0.70 as β∗ increases, up to a β∗
value that varies between 13.5 and 21 depending on the PT setup (e.g., the initial
temperature set and the number of replicas in the run). After this plateau, the probability
of the most common structure decreases until the β∗ value at which the population of the
highest-entropy 16-bond structure equals the population of the highest-entropy 15-bond
structure. There the probabilities of the two most common structures are equal, which
appears as a local minimum in the 30-bead curve of Fig. 3.16. Beyond this local minimum,
the structure with 16 bonds becomes the most common structure. However, because there are
at least six structures with 16 bonds, the probability of the most common structure does
not approach one even at very low temperatures. Since the structure with 17 bonds
(theoretically the lowest free energy structure) does not seem to be geometrically possible,
several 16-bond configurations become very common at very low temperatures, and their
populations depend mainly on their configurational entropies (configurations with higher
energies have much higher free energies). Therefore, as can be seen in Fig. 3.16, the
probability of the most common structure converges to a value lower than one at very low
temperatures.
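The low-temperature limit can be illustrated with a short sketch. It assumes fixed configuration energies E_i (in units of ε) and configurational entropies S_i (in units of k_B), so that the population of configuration i at reduced inverse temperature β∗ is proportional to exp(−β∗E_i + S_i). The entropy values are hypothetical, chosen only to mimic six degenerate 16-bond structures and one 15-bond structure; they are not the thesis data, and the function name is illustrative.

```python
import math

def populations(energies, entropies, beta):
    """Boltzmann populations p_i proportional to exp(-beta*E_i + S_i), reduced units
    (E in units of epsilon, S in units of k_B, beta = beta*)."""
    log_w = [-beta * e + s for e, s in zip(energies, entropies)]
    top = max(log_w)                      # shift to avoid overflow at large beta
    w = [math.exp(x - top) for x in log_w]
    z = sum(w)
    return [x / z for x in w]

# Hypothetical values: six 16-bond structures (E = -16) with different
# entropies, plus one 15-bond structure (E = -15) with a larger entropy.
E = [-16.0] * 6 + [-15.0]
S = [5.0, 4.5, 4.0, 4.0, 3.5, 3.0, 9.0]

for beta in (2.0, 10.0, 50.0):
    p = populations(E, S, beta)
    print(beta, p.index(max(p)), round(max(p), 3))

# Low-temperature limit: only the degenerate 16-bond states survive, and the
# largest probability tends to exp(S_max) / sum of exp(S_j) over those states.
limit = math.exp(S[0]) / sum(math.exp(s) for s in S[:6])
print(round(limit, 3))
```

Because several lowest-energy structures share the same energy, their relative populations at low temperature are set by entropy alone, so the maximum probability saturates below one.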
Attractive bonds can form when 4.6 Å ≤ r_ij ≤ 5.8 Å (σ1 = 4.6 Å and σ2 = 5.8 Å), where
r_ij is the distance between beads i and j. Because of these geometrical criteria, the
17-bond structure of the 30-bead chain does not seem to be accessible. If the 17-bond
structure is indeed inaccessible for geometrical reasons, it should become observable when
the range of the attractive bond is increased. Since the entropic barriers become smaller as
the attractive bond range increases, the shape of the curve of the most common structure
probability should change as well. To change the attractive bond range, σ1 was kept
constant and only σ2 was increased.
[Figure 3.17 plot: Most Common Structure Probability (0 to 1) vs. β∗ (0 to 30) for σ2 = 5.8, 6.4, 6.7 and 6.9 Å]
Figure 3.17: Variation of the probability of the most common structure with β∗ for the
30-bead chain at different attractive bond interaction distances (σ2 increased from the
initial 5.8 Å to 6.4 Å, 6.7 Å and 6.9 Å).
At σ2 = 6.2 Å, a configuration with 17 attractive bonds was observed at low temperatures,
while this structure was not observed in runs with σ2 ≤ 6.1 Å. In Fig. 3.17, for
σ2 = 6.4 Å the probability of the 17-bond structure, which is then the most common
structure, approaches one around β∗ = 27; as σ2 increases, this happens at lower β∗,
since the entropic barriers between the low-energy configurations, such as those with 15
and 16 bonds, become smaller. The first bump in the curves of Fig. 3.17 marks a
temperature region where the 15-bond structure is the most probable configuration, and the
second bump, at higher β∗, marks the region where the 16-bond configuration is the most
probable structure. Since the entropy difference between configurations with different
energies becomes smaller for larger σ2, the β∗ range in which the 15-bond structure is the
most common structure shrinks as σ2 increases, as can be seen for σ2 = 6.7 Å and
σ2 = 6.9 Å in Fig. 3.17.
Therefore, for chains shorter than 30 beads, the low-temperature landscape consists of one
deep funnel that contains several minima. The funnel becomes steeper as the temperature
decreases, since the effect of entropy decreases and, consequently, the free energy
difference between the structure with the maximum number of bonds and structures with
fewer bonds increases. At very low temperatures the landscape consists of a funnel with a
very deep point representing the configuration with the maximum number of bonds. For chains
longer than 29 beads, however, the low-temperature landscape consists of several minima that
move farther apart as the chain length increases. Even at low temperatures, the landscape of
the longer chains does not consist of one deep funnel but of several minima or funnels, and
it becomes much more complex as the chain length increases. For example, the ranges of
energy and entropy of the 35-bead chain are 2.5 and 4.4 times larger, respectively, than
those of the 25-bead chain.
The observed landscape behavior may give insight into the native structure of proteins.
While for small proteins the native structure might be the lowest free energy structure,
there is a distinct possibility that for longer proteins the native structure is not the
lowest free energy structure, but rather one of the lowest free energy structures that can
be easily accessed during the folding dynamics. The effects of temperature on the free
energy landscape also seem to be larger for longer chains. For short chains at low
temperatures, the lowest free energy structure does not change as the temperature
decreases, whereas for longer chains in the same temperature range, the deepest point of the
landscape may shift to another structure with slightly different bonds. Thus, the native
structure of some long proteins may be more sensitive to temperature fluctuations.
Chapter 4
Protein-like Chain Inside a Solvent
While some experiments have shown that a protein can fold into its native configuration
with apparently negligible solvent ordering effects [39], in nature, and consequently in
most experimental studies, folding occurs in the presence of a fluid environment (in vitro
or in vivo), where it has been suggested that “a significant portion of the fold-dictating
information is encoded by the atomic interaction network in the solvent-unexposed core
of protein domains” [13]. In this chapter, the thermodynamics of a protein-like chain
interacting via discontinuous potentials is examined in the presence of a square-well fluid
capable of forming bonds with selected parts of the chain.
4.1 The System
The system consists of a protein-like chain in an environment of thousands of solvent
particles. The protein-like chain is the same beads-on-a-string model described in the
previous chapter (Chapter 3), in which each bead represents one amino acid, or residue.
All intra-chain interactions of the protein-like chain are the same as in the main model of
the previous chapter, model B (cf. Table 3.1 and Fig. 3.2). As in the previous chapter,
besides the reduced units for energy and temperature, physical units are also introduced in
the definition of the model to make contact with real proteins.
In particular, lengths will be expressed in Angstroms, energies in kJ/mol and masses in
atomic mass units.
4.1.1 The solvent model
The solvent consists of N molecules in a fixed volume V that interact via a square-well
potential. The square-well fluid has been studied extensively [60, 61, 62, 63, 64, 65, 66,
67, 68]. The interaction between any pair of solvent particles is the square-well potential
depicted in Fig. 3.1(b). To allow comparison with previous studies, a popular set of
parameters has been used, in which σ and σ′, the inner and outer points of discontinuity of
the potential well, satisfy σ′/σ = 1.5. σ and σ′ are chosen to be 4.16 Å and 6.24 Å,
respectively, and the depth of the square well between fluid particles, ε_l, is defined as
ε_l = (0.35/1.5)ε ≈ 0.23ε. Therefore, the energy of each hydrogen bond between two solvent
particles is ε_l = 4.7 kJ/mol, which is relatively weak compared with the intra-chain
hydrogen bond energy of 19.9 kJ/mol.
The mass of each fluid particle is chosen as m_l = 0.15 m_p, where m_l and m_p are the
masses of a fluid particle and a chain bead, respectively. This choice makes the fluid
particles much lighter than the chain beads. In physical units, the solvent particle mass
is very close to that of a water molecule, i.e., 18 amu, and the mass of each bead is very
close to the average mass of an amino acid, i.e., 120 amu. Choosing relatively light solvent
particles affects the sampling efficiency of the simulations, and consequently the cost of
the simulation runs.
The solvent and the chain interact as follows. The solvent particles can form bonds, with
potential depth ε_l, with the chain beads i = 4k + 2, where k is a non-negative integer
(beads 2, 6, 10, 14, and so on). The interaction range is the same as for the hydrogen
bonds between chain beads, with the square-well parameters σ1 and σ2 chosen to be 4.64 Å
and 5.76 Å, respectively. Hence, the same beads that form attractive bonds within the
protein-like chain also form hydrogen bonds with the solvent particles.
Other chain beads have a hard-sphere repulsive interaction with the solvent particles.
The hard-sphere interaction range is set to a relatively large value of 6.4 Å (1.54σ) to
mimic the hydrophobicity of most amino acids. The main reason for choosing this value is
related to the possible number of bonds between a chain bead and the solvent particles.
With the range of hydrogen bonds between chain beads and solvent particles held fixed, the
range of the solvent-solvent hydrogen bonds and the hard-core repulsion distance were
varied. It was found that with this set of parameters, and especially with this large
hard-sphere repulsion distance, a chain bead can form at most four bonds with solvent
particles, and this maximum is reached only at very low temperatures.
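The bead labeling used in this chapter can be made concrete with a small helper that lists the solvent-binding beads i = 4k + 2 of a chain; all other beads interact with the solvent only through the 6.4 Å hard-sphere repulsion. This is a sketch assuming that beads are numbered from 1 and labeled A, B, C, and so on in order (so that B is bead 2, consistent with BF denoting a bond between beads 2 and 6); the function name is hypothetical, and single-letter labels only cover chains of up to 26 beads.

```python
import string

def solvent_binding_beads(chain_length):
    """Return (index, letter) for beads i = 4k + 2 (k = 0, 1, 2, ...)
    in a chain of `chain_length` beads numbered from 1 and labeled A, B, C, ..."""
    if chain_length > len(string.ascii_uppercase):
        raise ValueError("single-letter labels only cover chains of up to 26 beads")
    return [(i, string.ascii_uppercase[i - 1])
            for i in range(2, chain_length + 1, 4)]

print(solvent_binding_beads(15))
print(solvent_binding_beads(25))
```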
To simulate a system at a given density, the simulation is carried out in a cubic box of size
L × L × L that contains N solvent particles and one protein-like chain. To minimize
finite-size effects, periodic boundary conditions are used. To avoid artifacts due to the
periodic boundaries, L should be large enough to allow the protein-like chain to stretch
without its two end beads affecting each other, either directly or through solvent-induced
interactions. The maximum end-to-end distance observed in the previous study was used as
the worst-case scenario, and L was chosen to be comfortably larger. Because of the
next-nearest-neighbor distance restriction, the maximum end-to-end distance can also be
determined analytically from the model's definition. The values used for L are roughly
10 Å larger than this theoretical maximum, which is itself substantially larger than the
end-to-end distance observed in the absence of a fluid. For example, for the 25-bead chain
the maximum observed end-to-end distance is 64 Å and the theoretical maximum is 76.8 Å,
while L = 88.0 Å, which is 24 Å larger than the maximum observed value and 11.2 Å larger
than the theoretical maximum. Following similar reasoning, for the ℓ = 15, 20 and 25 bead
chains, L is set to 54.4 Å (13.08σ), 72.0 Å (17.31σ) and 88.0 Å (21.15σ), respectively.
The reduced temperature is defined as T∗ = k_BT/ε; however, another reduced temperature,
T∗_l, is defined using the depth of the square-well interaction between fluid particles, to
make comparison with earlier studies of the phase diagram of this type of fluid easier.
Hence, T∗_l = k_BT/ε_l, so that T∗_l = (ε/ε_l)T∗ = (1.5/0.35)T∗ ≈ 4.29T∗. β∗ and β∗_l are
defined as the reciprocals of T∗ and T∗_l, respectively. Note that T∗ = 1.0 corresponds to
2400 K, while T∗_l = 1.0 corresponds to 560 K and T∗_l ≈ 0.5 is roughly room temperature.
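The conversions between the two reduced temperature scales and kelvin can be sketched as follows. The energy scale ε/k_B = 2400 K is read off from the statement that T∗ = 1.0 corresponds to 2400 K, and ε_l/ε = 0.35/1.5 is the ratio defined above; the function names are illustrative only.

```python
EPS_OVER_KB = 2400.0          # epsilon / k_B in kelvin (T* = 1.0 <-> 2400 K)
EPS_L_OVER_EPS = 0.35 / 1.5   # epsilon_l / epsilon for the solvent square well

def tl_to_tstar(t_l):
    """Solvent reduced temperature T*_l -> chain reduced temperature T*."""
    return t_l * EPS_L_OVER_EPS

def tstar_to_kelvin(t_star):
    return t_star * EPS_OVER_KB

def tl_to_kelvin(t_l):
    return tstar_to_kelvin(tl_to_tstar(t_l))

# T*_l = 1.0 is 560 K and T*_l = 0.5 is 280 K (close to room temperature);
# the last column is beta* = 1 / T*.
for t_l in (1.0, 0.5, 2.5, 0.76):
    print(t_l, round(tl_to_kelvin(t_l)), round(1.0 / tl_to_tstar(t_l), 2))
```

With these definitions, the studied range T∗_l = [0.76, 2.5] maps to β∗ ≈ [1.71, 5.64].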
Once the total volume of the simulation box has been determined, the number of particles N
can be set such that the solvent has the required density. The density of the system is
defined as ρ∗ = ρσ³, where ρ = N/V_l and V_l, the effective free volume that fluid particles
can occupy, is calculated as V_l = L³ − V_excl, with V_excl the approximate excluded volume
of the chain. To estimate V_excl, the protein-like chain is assumed to lie completely
straight with a distance of 4.16 Å between neighboring beads, which is the midpoint of the
vibration range of bonded beads. The volume of the cylinder around this chain, in which no
other particle can exist, is then taken as the excluded volume. ρ∗ was chosen to be 0.5, and
consequently N = 1066, 2522 and 4644 for the ℓ = 15, 20 and 25 bead chains, respectively.
As the number of solvent particles required to avoid periodic boundary effects scales with
the cube of the number of beads in the chain, exploring the energy landscape becomes more
challenging as the chain length increases. Increasing the number of beads makes the
simulation runs more costly and makes possible phase transition effects in the studied
system more apparent. Thus, the application of parallel tempering (PT) becomes more
challenging.
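The particle counts follow from ρ∗ = Nσ³/V_l with V_l = L³ − V_excl. The text does not give the dimensions of the excluded-volume cylinder, so the sketch below assumes, as a hypothesis, a cylinder of length (ℓ − 1) × 4.16 Å without end caps and a radius of 6.4 Å (the hard-sphere exclusion distance between hydrophobic beads and solvent). With these assumed dimensions the rounded counts agree with the N values quoted above, and setting V_excl = 0 gives the 1118 and 2592 particles of the pure-solvent boxes compared in Fig. 4.6. The function name is illustrative.

```python
import math

SIGMA = 4.16          # solvent hard-core diameter sigma, in Angstrom
BEAD_SPACING = 4.16   # spacing of the straightened chain, in Angstrom
EXCL_RADIUS = 6.4     # ASSUMED cylinder radius, in Angstrom (not given in the text)

def solvent_count(box_length, chain_length, rho_star=0.5, radius=EXCL_RADIUS):
    """N such that rho* = N * sigma**3 / V_l, with V_l = L**3 - V_excl and
    V_excl an (assumed) cylinder around a fully straightened chain."""
    n_links = max(chain_length - 1, 0)          # chain_length = 0: pure solvent
    v_excl = math.pi * radius ** 2 * n_links * BEAD_SPACING
    v_free = box_length ** 3 - v_excl
    return round(rho_star * v_free / SIGMA ** 3)

for ell, box in ((15, 54.4), (20, 72.0), (25, 88.0)):
    print(ell, box, solvent_count(box, ell))
print(solvent_count(54.4, 0), solvent_count(72.0, 0))   # pure-solvent boxes
```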
4.1.2 Definition of Configuration
Understanding how the configurations are defined is a necessary step to determine the
free energies of the configurations. In this study only intra-chain interactions are counted
to identify a configuration. Since there are additional interactions (solvent-chain and
solvent-solvent), a configuration does not have a unique energy within this model, in
contrast to the previous chapter, where, in the absence of a solvent, the energy of a
configuration was constant (Sec. 4.2.4). As in the previous chapter, a configuration is
represented by a string of alphabetical pairs. For example, BF represents a bond between
beads 2 and 6, and BF FJ JN represents the configuration with three bonds, between beads
2 and 6, 6 and 10, and 10 and 14.
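The alphabetical representation maps directly onto bead indices. A minimal parsing sketch, assuming single-letter labels with A = bead 1 (and therefore chains of at most 26 beads); the function name is hypothetical.

```python
import string

def parse_configuration(config):
    """Convert a configuration string such as 'BF FJ JN' into a sorted list of
    (i, j) bead-index pairs, with A = 1, B = 2, ..., Z = 26."""
    bonds = []
    for pair in config.split():
        if len(pair) != 2 or any(c not in string.ascii_uppercase for c in pair):
            raise ValueError(f"bad bond label: {pair!r}")
        i, j = (string.ascii_uppercase.index(c) + 1 for c in pair)
        bonds.append((min(i, j), max(i, j)))
    return sorted(bonds)

print(parse_configuration("BF FJ JN"))   # three bonds
print(parse_configuration(""))           # the unbonded chain has no bonds
```

Counting the returned pairs gives the number of intra-chain bonds, which in the solvent-free model fixes the configuration energy.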
By identifying configurations without considering their bonds with solvent particles,
the free energies that will be found are averaged over the bond(s) with solvent particles,
in line with the ideas of Refs. [14] and [21]. The free energy values are further coarse-
grained in the sense that they are not a function of all the positions of the atoms in the
chain, but they are a function of the absence or presence of bonds.
4.1.3 Simulation Structure
As discussed in Chapter 2, the simulation is a combination of Discontinuous Molecular
Dynamics (DMD) and the Parallel Tempering (PT) method. The simulated system consists of a
number of replicas, each a protein-like chain in solvent, that explore configurational space
independently by the DMD method [36, 37]. All replicas evolve by DMD for a fixed amount of
time, after which some of the replicas exchange their temperatures according to the PT
criterion given in Eq. 2.23. The simulation structure is very similar to that used in the
absence of an explicit solvent, discussed in the previous chapter (Chapter 3). Here,
however, the potential energy of the system depends not only on the intra-chain bonds but
also on the bonds between the chain and solvent particles and on the bonds between solvent
particles. The velocities of all solvent particles and chain
beads of all replicas are drawn from the Maxwell-Boltzmann distribution both initially
and at the end of any replica exchange event. Since the velocities of all replicas are being
updated periodically using the Maxwell-Boltzmann distribution and the DMD dynamics
is reversible and preserves phase space volume, all necessary conditions for generating a
state with canonical distribution are satisfied [52].
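Drawing velocities from the Maxwell-Boltzmann distribution amounts to sampling each Cartesian velocity component from a Gaussian with zero mean and variance k_BT/m. The sketch below works in reduced units (k_B = 1, masses in units of the bead mass m_p, so that a solvent particle has mass 0.15); the function name is hypothetical, and the equipartition check at the end is only a sanity test, not part of the method.

```python
import math
import random

def draw_velocities(masses, t_star, rng):
    """Return one (vx, vy, vz) per particle; each component is drawn from a
    Gaussian with mean 0 and standard deviation sqrt(T*/m) (k_B = 1)."""
    velocities = []
    for m in masses:
        s = math.sqrt(t_star / m)
        velocities.append(tuple(rng.gauss(0.0, s) for _ in range(3)))
    return velocities

rng = random.Random(0)
masses = [1.0] * 15 + [0.15] * 1066      # 15 beads plus 1066 solvent particles
v = draw_velocities(masses, t_star=0.2, rng=rng)
kinetic = sum(0.5 * m * (vx**2 + vy**2 + vz**2) for m, (vx, vy, vz) in zip(masses, v))
print(kinetic / (1.5 * len(masses)))     # equipartition estimate of T*
```

Lighter solvent particles receive proportionally larger velocities at the same temperature, which is one reason the choice of solvent mass affects the cost of the runs.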
Since studying the energy landscape of a protein-like chain in the presence of thousands of
particles is computationally demanding, we developed a parallel program using MPI (the
Message Passing Interface) [54]. The object-oriented design of the serial code used with
the PT method in the previous chapter significantly facilitated the implementation of the
parallel version. In the parallel program, the master processor is responsible for the
measurement events as well as for calculating the replica exchange probabilities. The
master node serves as a hub with which all the nodes
communicate. Each replica runs on one processor and the energy values of the replicas
are sent to the master processor at the replica exchange event, which determines whether
a temperature exchange should take place. The master node then sends each replica its
updated temperature, which can be the same as its temperature prior to the exchanging
attempt. Then, the velocities are drawn at each node (replica) using the updated
temperature, and each replica restarts its DMD run. The cycle of drawing velocities, DMD
dynamics and PT exchange moves is repeated until enough independent statistics are gathered
on the frequency at which different configurations are seen (f_obs, or “population”). At
specific times, all replicas also send their configuration matrices and some other
parameters to the master processor for storage or calculation.
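The master node's decision step can be sketched as below. This assumes that Eq. 2.23 is the standard parallel-tempering acceptance rule, min{1, exp[(β_i − β_j)(E_i − E_j)]}, for swapping the temperatures of replicas i and j, with E the total potential energy of each replica (chain plus solvent). The alternating even/odd neighbor scheme and the function names are illustrative choices, not necessarily those of the original code, and the MPI communication is omitted.

```python
import math
import random

def swap_probability(beta_i, beta_j, e_i, e_j):
    """Metropolis acceptance for exchanging the temperatures of two replicas."""
    x = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if x >= 0 else math.exp(x)

def exchange_step(betas, energies, replica_at, parity, rng):
    """One round of neighbor swaps on the temperature ladder.

    betas[k]      -- k-th inverse temperature (index 0 = hottest)
    energies[r]   -- current potential energy of replica r
    replica_at[k] -- replica currently holding temperature k (updated in place)
    parity        -- 0 or 1: which neighbor pairs (k, k+1) are attempted
    Returns the new temperature index of each replica, which the master
    would send back to the worker nodes.
    """
    for k in range(parity, len(betas) - 1, 2):
        r, s = replica_at[k], replica_at[k + 1]
        p = swap_probability(betas[k], betas[k + 1], energies[r], energies[s])
        if rng.random() < p:
            replica_at[k], replica_at[k + 1] = s, r
    temp_of = [0] * len(replica_at)
    for k, r in enumerate(replica_at):
        temp_of[r] = k
    return temp_of

# Usage: the hottest replica happens to have the lowest energy, so the swap
# with its colder neighbor is always accepted.
rng = random.Random(1)
print(exchange_step([1.0, 1.2, 1.4], [-10.0, -5.0, -1.0], [0, 1, 2], parity=0, rng=rng))
```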
4.2 Results
4.2.1 Parallel tempering efficiency
Efficient sampling of configurations at each temperature depends mainly on the number of
replicas, the spacing ∆β between successive inverse temperatures, and the time between two
consecutive replica exchange events during which each replica follows DMD dynamics (the
PT update period). The choice of these parameters has a strong effect on the efficiency of
the dynamics, and choosing the most efficient PT update period plays a significant role in
optimizing the computational cost. Since decreasing the PT update period lets each replica
explore a smaller part of configurational space between exchanges, there is an optimum value
for the time between PT exchanges at a fixed computational cost (fixed CPU hours), which has
to be found by trial and error. A key concept for assessing the efficiency of a PT simulation
is the PT period (or cycle), which is the time it takes a replica to travel from the maximum
to the minimum temperature and back [69]. For efficient sampling, several cycles should be
observed in one run. For a fixed computational cost (i.e., run time), it was found that there
is a specific value of the PT interval time that results in the maximum number of cycles.
This value is quite different for different chain lengths ℓ. However, the number of
interaction events that happen during each PT update period is similar for the 15-, 20- and
25-bead chains. This provides a good guess for the optimum PT update period of larger
systems based on the results for smaller systems, and facilitates the trial-and-error
process.
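The number of PT cycles can be counted from the sequence of temperature indices visited by one replica, the quantity plotted in Figs. 4.1 and 4.2. A minimal sketch, assuming index 0 is the hottest temperature and n − 1 the coldest, and counting one cycle each time the replica returns to an extreme after having reached the opposite one; the function name is hypothetical.

```python
def count_round_trips(indices, n_temps):
    """Count round trips hottest -> coldest -> hottest (or the reverse)
    in a replica's sequence of temperature indices (0 = hottest)."""
    last_end = None      # last extreme of the ladder that was visited
    half_trips = 0
    for k in indices:
        if k == 0 or k == n_temps - 1:
            if last_end is not None and k != last_end:
                half_trips += 1
            last_end = k
    return half_trips // 2

trace = [0, 1, 2, 3, 4, 3, 2, 1, 0, 1, 2, 3, 4]
print(count_round_trips(trace, n_temps=5))
```

A replica that stays on one side of a barrier, as in Fig. 4.2, accumulates few or no round trips even in a long run.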
In principle, increasing the number of replicas makes it possible to study any temperature
range. However, it was found that when the PT system contains a large number of replicas,
some of the replicas may not travel across the full range of temperatures during one PT
cycle, which can make the PT dynamics prohibitively inefficient. For example, good dynamics
was rarely observed in systems containing more than 200 replicas. In addition, the presence
of a phase transition in the solvent was found to reduce the range of temperatures that can
be studied.
As discussed in Sec. 2.3.2, proper PT dynamics in most cases requires ∆β to vary with β. As
an example, Fig. 4.1 shows proper dynamics for the 15-bead chain, in which the temperature
range between T∗_l = 0.76 and T∗_l = 2.5 is investigated with 95 replicas. In this case, the
inverse temperature spacing ∆β∗_l is 0.012 for the 10 replicas with the highest
temperatures, decreases linearly to 0.008 over the next 60 replicas, and then remains
constant. This means that ∆β∗ is larger at higher temperatures and decreases as the
temperature decreases. Plots like the one in Fig. 4.1 are a helpful tool for detecting poor
sampling. Fig. 4.2 shows such a plot for a poorly behaving PT simulation, in which the
temperature range between T∗_l = 0.82 and T∗_l = 2.5 is investigated with 79 replicas. In
this case, the PT update period is 2 ps, which is 2.5 times longer than in the case of
Fig. 4.1; ∆β∗_l is 0.012 for the highest 30 temperatures, decreases linearly to 0.008 over
the next 40 temperatures, and then remains constant for the rest of the temperatures.
[Figure 4.1 plot: β Index (0 to 100) vs. PT Replica Exchange Event (/1000), 0 to 350]
Figure 4.1: Proper temperature dynamics for one of 95 replicas for ℓ = 15.
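The piecewise ∆β∗_l schedules described above can be generated as follows. This is a sketch under one reading of the description: n_high gaps of 0.012, then n_ramp gaps decreasing linearly so that the last ramp gap is 0.008, then gaps of 0.008 until the requested number of replicas is reached, starting from T∗_l = 2.5. Under this reading, the 95-replica ladder ends near T∗_l ≈ 0.763 and the 79-replica ladder near T∗_l ≈ 0.818, consistent with the ranges quoted for Figs. 4.1 and 4.2. The function name and parameter names are illustrative.

```python
def beta_ladder(n_replicas, n_high, n_ramp, t_max=2.5, d_high=0.012, d_low=0.008):
    """Inverse temperatures beta*_l, hottest first, with piecewise spacing:
    n_high gaps of d_high, n_ramp gaps decreasing linearly to d_low,
    then gaps of d_low for the remaining replicas."""
    n_gaps = n_replicas - 1
    if n_high + n_ramp > n_gaps:
        raise ValueError("schedule needs more gaps than there are replicas")
    gaps = [d_high] * n_high
    gaps += [d_high - (d_high - d_low) * j / n_ramp for j in range(1, n_ramp + 1)]
    gaps += [d_low] * (n_gaps - n_high - n_ramp)
    betas = [1.0 / t_max]
    for g in gaps:
        betas.append(betas[-1] + g)
    return betas

for n, hi, ramp in ((95, 10, 60), (79, 30, 40)):
    b = beta_ladder(n, hi, ramp)
    print(n, len(b), round(1.0 / b[-1], 3))   # number of replicas, lowest T*_l
```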
4.2.2 Phase of the solvent
One of the important aspects of this study is related to the phase of the solvent, since the
whole study is based on the presence of a fluid around the protein-like chain. Figure 4.2
shows an apparent barrier in the PT dynamics at a specific temperature: replicas have a
strong tendency to stay above, or below, that temperature and rarely cross it, especially for
larger systems. With a proper choice of ∆β∗_l and an efficient PT update period, the effect
of this barrier on the PT dynamics becomes very small (compare Fig. 4.1 with Fig. 4.2), but
for the larger systems (ℓ = 20 and ℓ = 25) the barrier is very apparent for almost any
choice of ∆β∗_l. It turns out that the apparent barrier is related to the phase of the
solvent.
[Figure 4.2 plot: β Index (0 to 80) vs. PT Replica Exchange Event (/1000), 0 to 140]
Figure 4.2: The effect of a phase transition, which occurs around β index 50, on the PT
dynamics for one of 79 replicas for ℓ = 15.
The highest temperature in the simulations was T∗_l = 2.5 for all chain lengths, while the
lowest temperatures T∗_l for the different chains were 0.76, 1.05 and 1.22 for the 15-bead,
20-bead and 25-bead systems, respectively. The square-well model for the solvent has been
studied extensively [60, 61, 62, 63, 64, 65, 66], and for the model used here, with ρ∗ = 0.5
and λ = σ′/σ = 1.5, the critical reduced temperature T∗_c of the solvent is predicted to be
1.2172 [60], 1.210 (in the Ornstein-Zernike approximation) [62], 1.3603 (using an
analytical equation of state based on perturbation theory) [62], 1.226 [63], 1.2180 [64],
1.27 [65] and 1.218 [66]. Most of the previous studies [60, 61, 62, 67] predict that a
vapor-liquid coexistence line is crossed somewhere between T∗_l = 1.0 and T∗_l = 1.2 for
ρ∗ = 0.5 and λ = 1.5. Since the model studied here contains a few thousand particles, far
from a real thermodynamic system of order 10²³ molecules, finite-size effects may shift the
apparent critical temperature.
As a first check of the fluid-like character of the solvent model, the radial distribution
function (RDF) of the solvent was computed at four different temperatures. The RDFs, plotted
in Figs. 4.3 and 4.4, show fluid behavior with no sign of a phase transition. Because of the
two discontinuities in the solvent interaction potential, at σ and σ′, the RDF is relatively
high between these two distances. For T∗_l = 2.0, the RDF in Fig. 4.3(a) is very similar to
that found for this model in earlier studies (third graph of Fig. 2 in Ref. [63]). At this
temperature, fluid-like long-range correlations can also be seen, with smaller peaks than at
the lower temperatures. The RDFs for T∗_l = 1.25, Fig. 4.3(b), and for T∗_l = 0.83,
Fig. 4.4(a), look like those of a typical fluid, with more distinct peaks than the
high-temperature RDF. At relatively low temperatures, as in Fig. 4.4(b), the onset of
short-range structure may be showing itself in the first two peaks, while the other peaks
still show fluid-like behavior; there is no clear sign of a phase transition.
RDFs are, however, not a very good indicator of a phase transition, especially for a
second order phase transition, such as between two fluid phases. Better indicators are
the heat capacity Cv and the compressibility κ, which are second derivatives of the free
energy. Cv can be measured from the fluctuations in energy, while κ can be estimated from
fluctuations in local density. For calculating the compressibility, the system is divided
into several boxes and the densities in each box and the standard deviation of the local
density are calculated. Numerical estimates for the heat capacity and compressibility are
plotted in Figs. 4.5 and 4.7. The range of studied temperatures is clearly sufficient to
observe the effects of a phase transition for the smaller systems. This phase transition
occurs at a temperature very close to the temperatures at which other studies predict the
liquid-vapor coexistence line for this density.
[Figure 4.3 plot: g(r) vs. r/σ (0 to 6), panels (a) and (b)]
Figure 4.3: Radial distribution function at (a) T∗_l = 2.0 and (b) T∗_l = 1.25. Lines are
drawn to guide the eye.
[Figure 4.4 plot: g(r) vs. r/σ (0 to 6), panels (a) and (b)]
Figure 4.4: Radial distribution function at (a) T∗_l = 0.83 and (b) T∗_l = 0.31. Lines are
drawn to guide the eye.
[Figure 4.5 plot: C_V/N vs. T∗_l (0.6 to 2.6) for 500, 1118, 1500 and 2592 particles]
Figure 4.5: Heat capacity per particle vs. the liquid reduced temperature for N = 500,
1118, 1500 and 2592 in the absence of the protein-like chain.
Even when the heat capacity is divided by the number of particles in the system, as in
Fig. 4.5, the average heat capacity per solvent particle still increases with system size at
the phase transition point. This suggests that for infinitely large systems the heat
capacity diverges at the phase transition. Determining the order of the phase transition
would require a further study, which lies outside the scope of this project.
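In the canonical ensemble, the heat capacity follows from the energy fluctuations, C_V = (⟨E²⟩ − ⟨E⟩²)/(k_BT²). The sketch below works in reduced units (k_B = 1) and assumes a list of energy samples recorded at one temperature; with PT, those samples would come from whichever replica held that temperature at each measurement. If only potential energies are used, the formula gives the configurational part of C_V, and the kinetic term 3Nk_B/2 would be added separately. The function name and the toy energies are hypothetical.

```python
import statistics

def heat_capacity(energies, t_star):
    """Fluctuation estimate C_V = var(E) / T*^2 in reduced units (k_B = 1),
    using the population variance of energy samples taken at one temperature."""
    return statistics.pvariance(energies) / t_star ** 2

samples = [-100.0, -98.0, -102.0, -100.0]   # toy energies, not simulation data
print(heat_capacity(samples, t_star=0.5))
```

Dividing the result by N gives the per-particle quantity plotted in Fig. 4.5.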
While these results are for a pure solvent system, our studies revealed that there is no
major difference in the behavior of the heat capacity and the compressibility for systems
containing a protein-like chain. In Fig. 4.6, heat capacities of the systems with the same
size and densities with and without a chain are presented. As can be seen in this figure,
the heat capacity of a system containing the 15-bead chain (or the 20-bead chain) behaves
very similarly to the heat capacity of a system with the same size containing only solvent
particles.
[Figure 4.6 plot: C_V vs. T∗_l (0.8 to 2.6) for the 15-bead chain system, 1118 particles, the 20-bead chain system and 2592 particles]
Figure 4.6: Comparison between systems with the same density (ρ∗ = 0.5). The 15-bead
chain system, containing 1066 solvent particles, and the pure solvent system with 1118
particles have the same box size (L = 54.4 Å). The 20-bead chain system, containing 2522
solvent particles, and the pure solvent system with 2592 particles have the same box size
(L = 72.0 Å).
Studying the variation of the compressibility with temperature is another common approach
to investigating phase transitions. To calculate the compressibility, the simulation box is
divided into smaller boxes of size (L/6) × (L/6) × (L/6), and the density in each box and
the standard deviation of this density are calculated. Since each box can exchange particles
with its neighboring boxes, each box can be treated as a grand canonical subsystem, and the
compressibility of the system can be calculated from the fluctuations in the number of
particles. The calculation details are presented in Appendix A. In Fig. 4.7, the variation
of the compressibility with temperature shows behavior similar to that of the heat capacity:
as the system size increases, the compressibility seems to diverge around the same point
where the heat capacity diverges. This confirms that there is a phase transition at this
point.
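Appendix A gives the calculation details, which are not reproduced here. As a sketch of the general idea only: for an open subvolume in equilibrium, particle-number fluctuations satisfy ⟨δN²⟩/⟨N⟩ = ρk_BTκ_T, so binning particle positions into (L/6)³ sub-boxes and comparing the variance and mean of the counts gives an estimate of the isothermal compressibility κ_T. The function names, the single-snapshot estimate, and the omission of finite sub-box corrections are assumptions of this sketch. The usage example checks it against an ideal gas, for which κ_T = 1/(ρk_BT).

```python
import random
from collections import Counter
from statistics import fmean, pvariance

def subbox_counts(positions, box_length, n_sub=6):
    """Particle counts in each of the n_sub**3 cubic sub-boxes of side L/n_sub;
    positions are wrapped into the periodic box first."""
    side = box_length / n_sub
    counts = Counter()
    for x, y, z in positions:
        key = tuple(min(int((c % box_length) // side), n_sub - 1) for c in (x, y, z))
        counts[key] += 1
    return [counts[(i, j, k)]
            for i in range(n_sub) for j in range(n_sub) for k in range(n_sub)]

def compressibility_estimate(counts, t_star, sub_volume):
    """kappa_T ~ var(N) / (<N> * rho * T*) in reduced units (k_B = 1)."""
    mean_n = fmean(counts)
    rho = mean_n / sub_volume
    return pvariance(counts) / (mean_n * rho * t_star)

# Ideal-gas check: uniform random positions, so kappa_T * rho * T* should be near 1.
rng = random.Random(0)
L = 10.0
pos = [(rng.uniform(0, L), rng.uniform(0, L), rng.uniform(0, L)) for _ in range(20000)]
c = subbox_counts(pos, L)
k = compressibility_estimate(c, t_star=1.0, sub_volume=(L / 6) ** 3)
print(k * (20000 / L**3) * 1.0)
```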
[Figure 4.7 plot: Compressibility vs. T∗_l (0.6 to 2.6) for 1118, 1500 and 2592 particles]
Figure 4.7: Compressibility vs. the liquid reduced temperature for N = 1118, 1500 and
2592.
4.2.3 Observed structures and Free energy landscape
Simulations using PT and DMD were performed for three chain lengths, ℓ = 15, 20 and 25,
all in a liquid at density ρ∗ = 0.5. The temperature ranges and numbers of replicas
differed between these cases. The quantities of interest were the frequencies of occurrence
(f_obs) of each configuration (denoted in the alphabetic representation explained in
Sec. 3.1.1). The most frequently occurring configuration at a given temperature will be
called the dominant configuration if its population is clearly higher than that of the
second most common structure.
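Relative free energies follow from the observed populations at a given temperature: for configurations a and b, β(F_a − F_b) = −ln(f_obs,a / f_obs,b). A sketch in units of k_BT; the configuration labels follow the document's notation, but the counts and the function name are hypothetical and are not the values of Table 4.2.

```python
import math

def relative_free_energies(counts, reference):
    """beta * (F_c - F_ref) = -ln(count_c / count_ref) for every observed c."""
    n_ref = counts[reference]
    return {c: -math.log(n / n_ref) for c, n in counts.items() if n > 0}

counts = {"BF FJ JN": 6000, "BF FJ": 1500, "FJ JN": 1500}   # hypothetical counts
for conf, df in relative_free_energies(counts, "BF FJ JN").items():
    print(conf, round(df, 3))
```

A configuration observed four times less often than the reference thus lies ln 4 ≈ 1.39 k_BT higher in free energy.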
All errors reported below are 95% confidence intervals, which for normally distributed
errors equal 1.96 times the standard deviation. The term “energy” here refers only to the
potential energy. The term “system energy” refers to the total potential energy (all
hydrogen bonds) of the solvent particles and chain beads in a box containing a specific
configuration, while the configuration energy refers only to the energy of the intra-chain
bonds. The term “bond” refers to a hydrogen bond and not to the repulsive interactions,
unless otherwise specified.
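A common way to obtain such intervals from correlated simulation data is block averaging. The sketch below assumes that the time series is split into equal blocks long enough for the block means to be approximately independent and normally distributed; the block count and the function name are illustrative choices, not taken from the thesis. The half-width is 1.96 times the standard error of the mean estimated from the block means.

```python
import statistics

def mean_with_ci95(samples, n_blocks=10):
    """Mean and 95% half-width from block averages: 1.96 * s / sqrt(n_blocks).
    Samples left over after the last full block are ignored."""
    size = len(samples) // n_blocks
    if size == 0 or n_blocks < 2:
        raise ValueError("need at least 2 blocks with one sample each")
    blocks = [statistics.fmean(samples[i * size:(i + 1) * size])
              for i in range(n_blocks)]
    mean = statistics.fmean(blocks)
    half_width = 1.96 * statistics.stdev(blocks) / n_blocks ** 0.5
    return mean, half_width

print(mean_with_ci95(list(range(20)), n_blocks=2))
```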
The 15-bead chain
For ℓ = 15, 95 temperature values were selected for the replicas, such that ∆β∗_l = 0.012
for the highest 10 temperatures; over the next 60 temperatures ∆β∗_l decreases linearly to
0.008 at the 70th highest temperature, and it then remains constant for the rest of the
temperatures. The range of studied temperatures is T∗_l = [0.76, 2.5] (β∗ = [1.7, 5.64]).
The most efficient PT update period was found to be 0.8 ps, while 1.2 ps also generated
very efficient PT dynamics. These values are smaller than the PT update period used in the
absence of a solvent, which was 2 ps. As mentioned above, the number of solvent particles
that gives a density of 0.5 in this case is N = 1066, while the sides of the periodic box
are L = 54.4 Å.
Table 4.1 presents the dominant configurations at different temperatures for the system in
the presence and in the absence (Chapter 3) of a solvent. At low enough temperatures, one
structure (BF FJ JN) becomes clearly dominant, with a probability exceeding 60%.
Table 4.2 provides the populations and the average system energies of the most common structures at T*_l = 0.816 (β* = 5.25), a relatively low temperature. The configuration with the lowest configurational energy is also the one with the lowest total system energy. The next three most common structures, configurations 2, 3 and 4 in Table 4.2, are very close both in their frequencies of occurrence, f_obs, and in their average system energies. Here the energy landscape (the free energy as a function of chain conformation) consists of a relatively deep point at BF FJ JN and three minima next to this point, all located inside a wide funnel. The depth in the landscape associated with each configuration is proportional to its free energy, and the distance between any pair of configurations reflects how dissimilar the two configurations are. Therefore, the next three deepest minima (configurations 2, 3 and 4) are very close to the deepest point, since each differs by only one bond from the first configuration, and the free energy barriers between these configurations appear to be small. Since the last three configurations in Table 4.2 differ by two bonds from the first configuration, they should lie further from the deepest point, with configurations 2, 3 and 4 located between the deepest point and these configurations. A rough picture of this landscape is presented in Fig. 4.8, in which the distances between the structures are based on their similarities and the area differences are related to differences in their computed entropy (calculated in the absence of the solvent). As can be seen, the lowest free energy structure at low temperatures, BF FJ JN, is located in the middle, and the other structures are arranged around this point according to their similarity to BF FJ JN. For example, BF FJ is located between the deepest point (BF FJ JN) and the structures BF and FJ. This diagram gives some idea of the folding pathways. For example, to reach the lowest energy structure, with three bonds, from the structure with no bonds, the chain must first form one bond and then a second one before the third bond is made.
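The notion of distance used in the landscape picture can be illustrated by treating each configuration as a set of bonds and counting the bonds by which two configurations differ. This is a minimal sketch assuming that metric (the size of the symmetric difference of the bond sets); the text describes distances only qualitatively, and the helper names are hypothetical.

```python
def bonds(config):
    """Bond set of a configuration label such as 'BF FJ JN'; 'No bond' is empty."""
    return frozenset() if config.strip().lower() == "no bond" else frozenset(config.split())


def bond_distance(c1, c2):
    """Number of bonds by which two configurations differ (size of symmetric difference)."""
    return len(bonds(c1) ^ bonds(c2))


# Configurations of Table 4.2, in order of decreasing population
table_4_2 = ["BF FJ JN", "BF JN", "FJ JN", "BF FJ", "JN", "FJ", "BF"]
reference = table_4_2[0]
for config in table_4_2:
    print(f"{config:10s} {bond_distance(reference, config)}")

# BF FJ sits between the deepest point and the single-bond structures BF and FJ
print(bond_distance("BF FJ", "BF"), bond_distance("BF FJ", "FJ"), bond_distance("No bond", reference))
```

Configurations 2 to 4 come out at distance 1 from BF FJ JN and configurations 5 to 7 at distance 2, and a folding path from "No bond" to BF FJ JN that adds one bond at a time passes through structures at each intermediate distance.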
According to Table 4.2, the lowest energy configuration, BF FJ JN, is also associated with the lowest system energy. However, since the uncertainty in the computed system energy of the 7th structure in Table 4.2 is relatively large, this cannot be fully verified from the values in the table. Nevertheless, there are good arguments for why the first configuration should have the lowest total energy. The first configuration is the most populated one at all 66 temperatures that lie in T*_l = [0.76, 2.0], so it is the lowest free energy system at these temperatures. As discussed in the previous chapter, adding more bonds, and consequently more geometrical restrictions, decreases the configurational entropy; therefore, BF FJ JN has the lowest configurational entropy among the 15-bead configurations. Under the assumption that the average entropy contributions from the solvent particles are very similar for different configurations, and since F = U − TS, a system with both the lowest free energy and the lowest entropy must also have the lowest energy; hence, the BF FJ JN system should have the lowest energy for T*_l = [0.76, 2.0]. For the short chains, which do not collapse, the energy difference between two systems is expected to depend mainly on their configurational energy difference, with the average energy contribution from the solvent particles being essentially the same for the different systems. For example, at β* = 5.25 (T*_l = 0.816), BF FJ JN and BF JN, the two most common configurations of Table 4.2, have on average 0.1 ± 0.02 and 0.19 ± 0.3 bonds with solvent particles, respectively. Hence, the contribution of the bead-solvent bonds to the energy difference between their systems is around 0.02 ε, while their configurational energy difference is 1 ε. At β* = 5.25 (T*_l = 0.816), each solvent particle makes on average around 5.1 bonds with other solvent particles.

Figure 4.8: A rough picture of the 15-bead chain landscape.
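The 0.02 ε estimate is simple arithmetic and can be checked directly. This sketch assumes, as stated in the 20-bead discussion of this chapter, that one intra-chain bond is worth 4.29 bead-solvent bonds, so each bead-solvent bond contributes ε/4.29; energies are in units of ε and the helper names are illustrative.

```python
EPS_BEAD_SOLVENT = 1.0 / 4.29  # bead-solvent bond energy in units of eps (4.29 per chain bond)


def bead_solvent_energy(mean_solvent_bonds):
    """Average energy (units of eps) from bead-solvent bonds; bonds lower the energy."""
    return -EPS_BEAD_SOLVENT * mean_solvent_bonds


def chain_energy(n_intra_bonds):
    """Configurational (intra-chain) energy in units of eps."""
    return -1.0 * n_intra_bonds


# Averages at beta* = 5.25 quoted above: BF JN (2 bonds) minus BF FJ JN (3 bonds)
d_solvent = bead_solvent_energy(0.19) - bead_solvent_energy(0.10)
d_chain = chain_energy(2) - chain_energy(3)
print(round(abs(d_solvent), 3), d_chain)  # prints: 0.021 1.0
```

The solvent term is about 2% of the configurational term, which is why the configurational energy difference dominates for these short chains.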
If, as is plausible, BF FJ JN has the lowest system energy, one may expect the population of this configuration to approach 100% at lower temperatures, where the free energy depends mainly on the energy of the system and only weakly on the system entropy. This expected trend is very similar to the results of the previous chapter for chains shorter than 30 beads, where one structure becomes completely dominant at low temperatures. The population trends of the dominant structure of the 15-bead chain inside and in the absence of the solvent are compared in Fig. 4.9; in both cases the probability of the dominant structure reaches a high value. This happens at higher temperatures inside the solvent, which suggests that the hydrophobic effects of the 75% of the chain beads that are insoluble assist the folding process and make the helical structures more favorable. At all the studied temperatures, the average radius of gyration of the 15-bead chain inside the solvent is smaller than in the absence of the solvent, which indicates that the solvent is a poor solvent.

β* | Inside solvent | f_obs (%) | Without solvent | f_obs (%)
1.8 | No bond | 18.2 ± 0.8 | No bond | 41.3 ± 1.5
2.4 | No bond | 18.9 ± 0.6 | No bond | 30.15 ± 1.6
3.0 | No bond | 11.3 ± 0.8 | No bond | 19.7 ± 1.3
3.6 | BF FJ JN | 24.0 ± 0.9 | BF JN | 17.1 ± 1.2
4.2 | BF FJ JN | 37.7 ± 0.8 | BF FJ JN | 26.7 ± 1.5
4.5 | BF FJ JN | 43.2 ± 0.9 | BF FJ JN | 35.0 ± 1.5
4.8 | BF FJ JN | 53.3 ± 0.8 | BF FJ JN | 44.1 ± 1.6
5.1 | BF FJ JN | 60.5 ± 1.2 | BF FJ JN | 55.1 ± 1.8
5.4 | BF FJ JN | 67.3 ± 0.9 | BF FJ JN | 61.5 ± 1.6
9 | N/A | N/A | BF FJ JN | 98.6 ± 0.4
Table 4.1: Most common configurations of the 15-bead chain for different temperatures, with and without the solvent.

Rank | Configuration | f_obs (%) | Average system energy | Chain energy
1 | BF FJ JN | 64.9 ± 0.9 | -1273.3 ± 0.3 | -3
2 | BF JN | 10.9 ± 0.5 | -1271.7 ± 0.7 | -2
3 | FJ JN | 9.6 ± 0.5 | -1272.0 ± 0.7 | -2
4 | BF FJ | 8.9 ± 0.5 | -1271.8 ± 0.7 | -2
5 | JN | 1.5 ± 0.1 | -1271.9 ± 1.8 | -1
6 | FJ | 1.5 ± 0.1 | -1272.1 ± 1.9 | -1
7 | BF | 1.3 ± 0.1 | -1272.6 ± 1.8 | -1
Table 4.2: Most common configurations of the 15-bead chain in the solvent environment at T*_l = 0.816 (β* = 5.25).

Figure 4.9: Variation of the probability of the most common structure versus β* for the 15-bead chain with and without the solvent (curves: s15 explicit solvent, s15 without solvent).
The 20-bead chain
Generally, it is harder to study a wide range of temperatures for large systems, because smaller values of Δβ and consequently a larger number of replicas are needed. Moreover, as discussed in section 4.2.1, in this case the study of a wide range of temperatures becomes extremely hard because of the effect of the phase transition on the PT dynamics. Consequently, for ℓ = 20 the range of studied temperatures, from T*_l = 1.05 to 2.5 (β* = [1.7, 4.1]), is narrower than for ℓ = 15. To study the energy landscape, 79 temperature values were selected for the replicas such that Δβ*_l = 0.008 for the 40 highest temperatures; Δβ*_l then decreases to 0.006 and remains constant for the rest of the temperatures. For ℓ = 20, the number of solvent particles needed to generate a density of 0.5 in a periodic box of side L = 72 Å is N = 2522. The most efficient PT update period was found to be 320 femtoseconds, which is smaller than the value for the 15-bead case, although periods in the range 320-360 femtoseconds yield similar efficiencies. As mentioned earlier, around T*_l = 1.1 the heat capacity and compressibility of the fluid show signs of a phase transition, in line with other studies that predict the location of the vapor-liquid coexistence line. Because of this phase transition, sampling a wider range of temperatures resulted in poor dynamics even when small values of Δβ* were used.
The dominant configurations at different temperatures in the presence and in the absence of the solvent are presented in Table 4.3. Since very low temperatures could not be reached, it was not possible to determine whether the probability of the most common structure approaches one at lower temperatures. Table 4.4 shows that at β* = 3.9, BF FJ JN NR is the most common structure, while BF BR FJ JN NR, which has the lowest configurational energy, has a smaller population. BF FJ JN NR is a complete helical structure that has all the necessary helical bonds between every two consecutive turns of the protein-like chain, while BF BR FJ JN NR, the lowest energy configuration, is a collapsed helical structure with a bond, BR, that connects the two ends of the chain. Hence BF FJ JN NR, an unfolded but complete helical structure, has a much higher entropy than the lowest energy configuration, while their energies are close since they differ by only one internal bond. Furthermore, the complete helical structure can make more bonds with the solvent particles because of its non-collapsed shape: at β* = 3.9 (T*_l = 1.1), 17% of the population of the complete helical structure makes bonds with the solvent particles, compared with only 4% of the population of the lowest energy configuration. These small numbers of bonds with the solvent particles also show that the structures are not soluble in this model, and that the contribution of the bonds between the chain beads and the solvent particles to the system energy is relatively small.
According to Table 4.4, the energies of the most common structures are very close to each other. It is therefore hard to predict whether there is a dominant structure at lower temperatures, as was seen in the previous chapter for the 20-bead chain in the absence of a solvent. Although the complete helical structure can make more bonds with solvent particles than the lowest energy configuration, its average number of bonds with the solvent particles is still less than one. The average energy contribution from the bonds between solvent particles is expected to be very similar for different configurations; however, checking this prediction would require much better sampling statistics than were obtained. Since the energy of each intra-chain bond is equivalent to that of 4.29 bead-solvent bonds, the system containing the lowest energy configuration is expected to be the lowest energy system. This means that the lowest energy configuration should become the most common structure at lower temperatures. The lowest energy configuration does not become dominant at the studied temperatures because the BR bond greatly restricts the configurational freedom; therefore, the non-collapsed helical structure, which has one bond fewer but a much larger entropy, becomes the most common structure.
The population of the lowest energy configuration, which has the largest number of bonds, becomes almost equal to that of the structure with no potential energy (no bond) at β* = 3.27, while in the absence of a solvent this happens at a lower temperature, β* = 4.05. At β* = 3.27 in the solvent environment, only 15% of the lowest energy configurations make bonds with the solvent particles, while 89% of the “no bond” structures do. Assuming that the energy contribution from bonds between solvent particles is almost the same for these two systems, the average system energy difference in this case is likely less than the 5ε configurational energy difference, since the no bond structure gains some energy from its bonds with the solvent. The populations of two structures become equal when their free energy difference is approximately zero, so that the entropy difference satisfies ΔS/k_B = β*ΔU/ε, where ΔU is the energy difference between the two systems. This allows the entropy difference ΔS between the no bond structure and the lowest energy configuration, which approximately represents the maximum configurational entropy difference, to be calculated. According to this calculation, ΔS ≃ 20.25 k_B in the absence of the solvent and ΔS ≤ 16.35 k_B in the solvent environment. Hence, having hydrophobic chains in this model results in a smaller entropy range, which indicates that, in comparison with the absence of a solvent, the probability of the dominant configuration may approach one at higher temperatures. This behavior can be seen clearly for the 15-bead chain in Fig. 4.9.
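The two entropy values follow from setting the populations equal in the relative-entropy relation ΔS/k_B = ln(f1/f2) + β*(U1 − U2). A minimal sketch of that arithmetic, using the β* values quoted above and taking the 5ε configurational energy difference as the energy difference (an upper bound in the solvent); the function name is illustrative.

```python
def entropy_difference_at_equal_population(beta_star, delta_u):
    """Delta S / k_B between two equally populated structures.

    From Delta S / k_B = ln(f1 / f2) + beta* (U1 - U2) with f1 = f2, energies in units of eps.
    """
    return beta_star * delta_u


# No bond structure minus lowest energy configuration: Delta U = +5 eps (5 intra-chain bonds)
no_solvent = entropy_difference_at_equal_population(4.05, 5.0)
in_solvent_bound = entropy_difference_at_equal_population(3.27, 5.0)  # Delta U <= 5 eps here
print(round(no_solvent, 2), round(in_solvent_bound, 2))  # prints: 20.25 16.35
```

Because the equal-population point moves to a smaller β* in the solvent while ΔU can only shrink, the inferred entropy range is necessarily smaller there.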
The 25-bead chain
The 25-bead system includes 4644 solvent particles, nearly twice the number in the 20-bead system, in a box of side L = 88 Å. According to Fig. 4.5, the temperature at which the phase transition behavior is observed increases slightly with increasing N. Therefore, the range of temperatures that could be investigated for the 25-bead chain system is even narrower than in the 20-bead case. A set of 95 replicas was chosen such that Δβ*_l = 0.006 for the 20 highest temperatures and Δβ*_l = 0.004 for the rest of the temperatures, while the range of studied temperatures is T*_l = [1.22, 2.5] (β* = [1.7, 3.5]). The most efficient PT update period is 120 femtoseconds, even smaller than in the 20-bead case. This supports the observation that the most efficient PT update period decreases as the system size and the number of particles increase.

β* | Inside solvent | f_obs (%) | Absence of solvent | f_obs (%)
1.8 | No bond | 7.2 ± 0.5 | No bond | 28.3 ± 1.6
2.4 | No bond | 7.8 ± 0.5 | No bond | 21.5 ± 1.3
3.0 | No bond | 4.6 ± 0.4 | No bond | 11.6 ± 1.1
3.3 | BF FJ JN NR | 6.5 ± 0.5 | BF | 7.3 ± 0.9
3.6 | BF FJ JN NR | 10.3 ± 0.6 | BF NR | 7.6 ± 0.8
3.9 | BF FJ JN NR | 12.3 ± 0.6 | BF JN NR | 9.8 ± 0.8
4.5 | N/A | N/A | BF FJ JN NR | 16.3 ± 1.3
6.0 | N/A | N/A | BF BR FJ JN NR | 47.7 ± 1.6
10.5 | N/A | N/A | BF BR FJ JN NR | 99.1 ± 0.3
Table 4.3: Most common configurations of the 20-bead chain inside and in the absence of the solvent.

Rank | Configuration | f_obs (%) | Average total energy | Chain energy
1 | BF FJ JN NR | 12.3 ± 0.6 | -2482.6 ± 1.4 | -4
2 | BF FJ NR | 8.5 ± 0.4 | -2484.4 ± 1.6 | -3
3 | BF JN NR | 8.2 ± 0.4 | -2483.1 ± 1.6 | -3
4 | BF FJ JN | 7.3 ± 0.5 | -2482.6 ± 1.8 | -3
5 | FJ JN NR | 7.3 ± 0.5 | -2482.7 ± 1.8 | -3
6 | BF BR FJ JN NR | 5.3 ± 0.4 | -2483.6 ± 2.2 | -5
7 | BF JN | 4.6 ± 0.4 | -2484.1 ± 2.2 | -2
Table 4.4: Most common configurations of the 20-bead chain inside the solvent at β* = 3.9 (T*_l = 1.1).

β* | Inside solvent | f_obs (%) | Without solvent | f_obs (%)
1.8 | No bond | 2.7 ± 0.3 | No bond | 21.7 ± 1.3
2.4 | No bond | 3.3 ± 0.3 | No bond | 15.5 ± 1.0
3.0 | NR | 1.3 ± 0.2 | No bond | 6.7 ± 0.9
3.3 | BF BR BV FJ FV JN NR RV | 2.6 ± 0.2 | No bond | 4.3 ± 0.6
4.5 | N/A | N/A | BF BR BV FJ FV JN NR RV | 7.5 ± 1.0
9.0 | N/A | N/A | BF BR BV FJ FV JN NR RV | 98.0 ± 0.4
Table 4.5: Most common configurations of the 25-bead chain inside and in the absence of the solvent.
According to Table 4.5, the structure with the lowest configurational energy becomes the most common structure at higher temperatures than in the absence of a solvent (previous chapter). However, the range of studied temperatures is not sufficient to observe the very deep funnel in the free energy landscape that was observed at low temperatures in the absence of a solvent. The most common structures of the 25-bead chain inside a solvent at β* = 3.3 are presented in Table 4.6. While the populations of configurations 3-10 are equal within statistical error, the population of the first configuration (with 8 bonds) is clearly higher than that of the other configurations. Our study reveals that this first configuration, with the largest number of bonds, is clearly the most populated configuration for T*_l ≤ 1.32 (β* ≥ 3.24). This means that for all the temperatures in the range 1.22 ≤ T*_l ≤ 1.32, the structure with the lowest configurational energy is the most common structure. Since the configurational entropy decreases as the number of bonds increases (because of the added restrictions), the first configuration should have the lowest configurational entropy. Since the first configuration is also the most common structure at the lowest studied temperatures, its system should have the lowest system energy at these temperatures. The order of system energies is not expected to change dramatically as the temperature decreases; the first configuration system therefore likely remains the one with the lowest energy, and the population of this structure should approach one at low temperatures, similar to the results of the previous chapter.

Rank | Configuration | f_obs (%) | Average total energy | Chain energy
1 | BF BR BV FJ FV JN NR RV | 3.4 ± 0.3 | -4219.9 ± 1.9 | -8
2 | BF FJ JN NR RV | 2.0 ± 0.3 | -4217.7 ± 2.5 | -5
3 | FJ JN NR RV | 1.5 ± 0.2 | -4216.0 ± 2.7 | -4
4 | BF FJ JN NR | 1.5 ± 0.3 | -4219.4 ± 2.9 | -4
5 | BF FJ JN RV | 1.5 ± 0.3 | -4215.8 ± 3.0 | -4
6 | BF BR BV FJ JN NR RV | 1.3 ± 0.2 | -4218.4 ± 2.9 | -7
7 | BF FJ NR RV | 1.3 ± 0.2 | -4215.2 ± 3.0 | -4
8 | BF JN NR | 1.3 ± 0.1 | -4213.0 ± 3.7 | -3
9 | JN NR RV | 1.2 ± 0.1 | -4216.3 ± 2.9 | -3
10 | FJ JN RV | 1.2 ± 0.1 | -4213.1 ± 3.1 | -3
Table 4.6: Most common configurations of the 25-bead chain inside the solvent at β* = 3.3 (T*_l = 1.30).
Reasoning similar to that used for the 20-bead chain predicts a large configurational entropy difference between the lowest energy configuration, which has a completely collapsed shape (first configuration of Table 4.6), and the complete helical structure (second configuration of Table 4.6). One might therefore anticipate that the non-collapsed helical structure has the highest population over the limited range of temperatures studied in the simulations. However, because of their energy difference of 3ε, the first configuration becomes dominant even at moderately low temperatures. This is unlike the 20-bead chain, for which the complete helical structure (1st configuration of Table 4.4) is dominant at similar temperatures. Consequently, the probability of the collapsed helical structure of the 25-bead chain approaches one at higher temperatures, as it does in the absence of a solvent.

Figure 4.10: The system entropy difference for two 15-bead chains, ΔS = S_{BF FJ} − S_{BF JN}, versus the liquid reduced temperature T*_l in the solvent environment.
4.2.4 Relative configurational entropy
Unlike the previous study of a protein-like chain with no solvent, where the relative configurational entropy did not depend on temperature, the relative configurational entropy of the solvated polymer system depends on temperature, because the configuration is defined here only by the positions of the chain beads and not by those of the solvent particles. Even if the relative configurational entropy is assumed not to depend strongly on temperature, there are several obstacles to calculating the configurational entropy of each structure. In the absence of a solvent, the configurational entropy difference of two structures at a specific temperature, in units of k_B, was calculated using ln(f_1/f_2) + β(U_1 − U_2), where f_1 and f_2 are the populations and U_1 and U_2 are the average system energies of the two structures at the inverse temperature β. With no solvent present, U_1 and U_2 depend only on the number of intra-chain bonds and hence are constant for a fixed pair of configurations. In the presence of solvent particles, there are relatively large fluctuations in the system energy because of the large number of particles and the limited number of samples. The standard deviation of the average energy of the solvent particles is calculated as σ_mean = σ/√N, where σ is the standard deviation of the distribution of the energy of the solvent particles and N is the number of samples. Increasing N therefore decreases the standard deviation of the average energy, with the statistical uncertainty scaling as the inverse square root of the number of samples. Even if the statistical uncertainty in the estimate of the energy were reduced by better sampling, the calculated value would not be the relative configurational entropy of the two structures, but rather the average relative entropy of the two systems, including the solvent particles. As with the average energy, the entropy of the system has large fluctuations, which scale with the total number of particles in the system and are particularly significant around the phase transition temperature of the solvent. Therefore, inside the solvent it is extremely difficult to calculate the configurational entropies of the protein-like chains accurately from their populations. For example, in the absence of a solvent the relative entropy of BF FJ and BF JN is 0.37 ± 0.21 k_B, with BF JN having the higher configurational entropy, whereas in the solvent the statistical uncertainty in the computed relative system entropies is too large, as can be seen in Fig. 4.10.
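The size of the problem can be illustrated by propagating uncertainties through ΔS*/k_B = ln(f_1/f_2) + β(U_1 − U_2). This sketch assumes independent errors and first-order propagation. As an illustration it uses the Table 4.2 entries for BF FJ and BF JN at β* = 5.25, under the additional assumption (not stated in the table) that the tabulated system energies are in units of ε; the function names are hypothetical.

```python
import math


def relative_entropy(f1, f2, beta, u1, u2):
    """Delta S* / k_B = ln(f1 / f2) + beta (U1 - U2), the relation used without solvent."""
    return math.log(f1 / f2) + beta * (u1 - u2)


def relative_entropy_error(f1, df1, f2, df2, beta, du1, du2):
    """First-order error propagation, assuming independent uncertainties."""
    return math.sqrt((df1 / f1) ** 2 + (df2 / f2) ** 2 + beta**2 * (du1**2 + du2**2))


def standard_error(sigma, n_samples):
    """sigma_mean = sigma / sqrt(N): the error shrinks only as N ** -0.5."""
    return sigma / math.sqrt(n_samples)


# Table 4.2, beta* = 5.25: BF FJ (8.9 +/- 0.5 %, -1271.8 +/- 0.7) and BF JN (10.9 +/- 0.5 %, -1271.7 +/- 0.7)
ds = relative_entropy(8.9, 10.9, 5.25, -1271.8, -1271.7)
err = relative_entropy_error(8.9, 0.5, 10.9, 0.5, 5.25, 0.7, 0.7)
print(round(ds, 2), round(err, 1))
print(standard_error(1.0, 400), standard_error(1.0, 1600))  # 4x the samples, half the error
```

Under these assumptions the uncertainty is about 5.2 k_B, more than an order of magnitude larger than the 0.37 ± 0.21 k_B difference obtained without solvent, and almost all of it comes from the β(U_1 − U_2) term.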
Chapter 5
Simple Dynamics Using Smoluchowski Equation
In this chapter a simple model of the dynamics of a protein-like chain is introduced, and the equilibrium folding dynamics is analyzed as a function of temperature. The model consists of a single protein-like chain, whose monomers interact via the discontinuous potentials introduced in Chapter 3 (model B), immersed in a solvent of particles that interact with the monomers via hard-core collisions at short distances. The solvent particles are assumed to interact on a time scale that is fast compared to the time scale of structural rearrangements between conformations. In this limit, the motion of the monomers is governed by the Smoluchowski equation with a configuration-independent diffusion coefficient. We demonstrate that there are important qualitative differences in the folding dynamics as the length of the chain increases.
5.1 Model
The discrete nature of the interactions allows configurational space to be partitioned into microstates by defining an index function for a configuration $c$ that depends on the set of spatial coordinates of the chain, $R$:
$$\chi_c(R) = \begin{cases} 1 & \text{if only the bonds in } c \text{ are present,} \\ 0 & \text{otherwise.} \end{cases}$$
The partitioning of configurational space arises naturally by expanding the product in the identity
$$1 = \prod_{i=1}^{n_b}\left(1 - H(x_i - x_c) + H(x_i - x_c)\right) = \prod_{i=1}^{n_b}\left(H_b(x_i - x_c) + H(x_i - x_c)\right) = \sum_{k=1}^{n_s} \chi_{c_k}(R), \tag{5.1}$$
where $n_b$ is the number of attractive bonds in the model, $n_s = 2^{n_b}$ is the number of microstates, $H_b(x) = 1 - H(x) = H(-x)$, and $H(x)$ is the Heaviside function
$$H(x) = \begin{cases} 1 & x \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$
In Eq. 5.1, $x_i$ is the distance between the monomers in the $i$th bond, and $x_c$ (denoted $\sigma_2 = 5.76$ Å in the previous chapters) is the critical distance at which a bond is formed. For notational simplicity, we order the configurations by their number of bonds, starting with the configuration with no bonds, $\chi_1(r^N) = \prod_{i=1}^{n_b} H(x_i - x_c)$, and ending with the configuration with the maximum number of bonds, $\chi_{n_s}(r^N) = \prod_{i=1}^{n_b} H(x_c - x_i)$.
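The partition of unity in Eq. 5.1 can be checked numerically: for any set of bond distances, exactly one of the $2^{n_b}$ index functions equals one. This is a minimal sketch for the three bonds of the 15-bead chain (BF, FJ, JN); the distances are hypothetical values in the reduced units of Fig. 5.1, where $x_c = 1.8$, and the helper names are illustrative.

```python
from itertools import combinations


def heaviside(x):
    """Heaviside step with H(0) = 1, matching the convention of the text."""
    return 1 if x >= 0 else 0


def ordered_microstates(n_bonds):
    """Return all 2**n_bonds bond sets, ordered by number of bonds (no bonds first)."""
    states = []
    for k in range(n_bonds + 1):
        states.extend(frozenset(c) for c in combinations(range(n_bonds), k))
    return states


def chi(state, distances, x_c):
    """Index function chi_c(R): 1 if exactly the bonds in `state` are formed."""
    value = 1
    for i, x in enumerate(distances):
        if i in state:
            value *= 1 - heaviside(x - x_c)  # H_b(x_i - x_c): bond i formed, x_i < x_c
        else:
            value *= heaviside(x - x_c)  # H(x_i - x_c): bond i absent, x_i >= x_c
    return value


bond_names = ["BF", "FJ", "JN"]  # bonds of the 15-bead chain
distances = [1.2, 2.5, 1.8]  # hypothetical bond distances (units of 3.2 Å)
x_c = 1.8  # critical bond distance (5.76 Å)

states = ordered_microstates(len(bond_names))
occupied = [s for s in states if chi(s, distances, x_c) == 1]
print(len(states), [" ".join(bond_names[i] for i in sorted(s)) for s in occupied])  # 8 ['BF']
```

Using $H_b = 1 - H$ rather than $H(-x)$ keeps the factors complementary even at $x_i = x_c$, so a distance exactly at the threshold is counted as unbonded and the sum over microstates stays exactly one.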
As was seen in Chapter 3, the configurational space can be unambiguously partitioned into microstates whose equilibrium populations can be estimated. One can therefore also estimate the cumulative distribution functions, probability densities, and potentials of mean force associated with the formation of a bond. For example, for the 25-bead chain, the configuration $c_2$ = BF BR can be formed from the configuration $c_1$ = BF by the formation of the BR bond, which occurs when the distance $x_{BR} = |r_B - r_R|$ is less than the critical bond formation distance $x_c$. One can define a probability density $\rho_{c_1}(x)$ and the cumulative distribution $C_{c_1}(x) = \int_0^x dy\, \rho_{c_1}(y)$ in terms of canonical ensemble averages restricted to microstates as
$$\rho_{c_1}(x) = \langle \delta(x - x_{BR}) \rangle_{c_1}, \tag{5.2}$$
where the notation $\langle B(R) \rangle_{c_1}$ denotes the normalized uniform average
$$\langle B(R) \rangle_{c_1} = \frac{\int dR\, \chi_{c_1}(R)\, B(R)}{\int dR\, \chi_{c_1}(R)}.$$
One simple way to estimate $\rho_{c_1}(x)$ is to construct histograms of the distance $x_{BR}$ from Monte Carlo simulations. Since the potential energy is constant within a microstate, the probability density and the cumulative distribution function are independent of temperature, so the distance $x_{BR}$ from any instantaneous configuration that satisfies the bonding criteria for configuration $c_1$ can be used. A more appealing way of constructing analytical fits to the densities and cumulative distribution functions is to use a procedure that builds these quantities from sampled data using statistical fitting criteria [70]. The temperature-dependent potential of mean force $\phi_{c_1c_2}(x)$ connecting states $c_1$ and $c_2$ can be computed from the probability densities $\rho_{c_1}(x)$ and $\rho_{c_2}(x)$ by first considering the cumulative distribution function connecting the two states,
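$$C_{1\to 2}(x) = \begin{cases} \dfrac{e^{\Delta S^*} e^{\beta\varepsilon}}{1 + e^{\Delta S^*} e^{\beta\varepsilon}}\, C_{c_2}(x) & x < x_c, \\[2ex] \dfrac{1}{1 + e^{\Delta S^*} e^{\beta\varepsilon}}\, C_{c_1}(x) + \dfrac{e^{\Delta S^*} e^{\beta\varepsilon}}{1 + e^{\Delta S^*} e^{\beta\varepsilon}} & x \ge x_c, \end{cases} \tag{5.3}$$
where $\Delta U_{c_1c_2} = -\varepsilon$ and $\Delta S^*$ is the relative entropy difference of the configurations divided by the Boltzmann constant, $k_B$. Noting that $\rho_c(x) = dC_c(x)/dx$, we find that
$$\rho_{1\to 2}(x) = \begin{cases} \dfrac{e^{\Delta S^*} e^{\beta\varepsilon}}{1 + e^{\Delta S^*} e^{\beta\varepsilon}}\, \rho_{c_2}(x) & x < x_c, \\[2ex] \dfrac{1}{1 + e^{\Delta S^*} e^{\beta\varepsilon}}\, \rho_{c_1}(x) & x \ge x_c, \end{cases} \tag{5.4}$$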
which is discontinuous at $x = x_c$ because of the discontinuous nature of the potential. The potential of mean force, $\phi_{1\to 2}(x) = -\ln \rho_{1\to 2}(x)$, describes the reversible work associated with pulling the system from configuration $c_1$ to $c_2$ by reducing the distance $x = x_{BR}$ between monomers B and R. Owing to the simplicity of the interaction potential, the potential of mean force can be computed at any inverse temperature $\beta$ from the temperature-independent densities $\rho_{c_1}$ and $\rho_{c_2}$. Note that the potential of mean force $\phi_{1\to 2}(x)$ includes an effective volume factor of the form $-(2/\beta)\ln x$ that arises from the conversion of Cartesian coordinates into spherical polar coordinates. This volume factor leads to a temperature-independent entropic barrier to the formation of a bond. In Fig. 5.1, the potential of mean force is plotted as a function of the NR distance for the configurations $c_1$ = BF BR BV FJ JN RV and $c_2$ = BF BR BV FJ JN NR RV at several inverse temperatures. Note the entropic barrier between the non-bonded state with $x > x_c = 1.8$ (5.76 Å) and the bonded states with $x < x_c$. We shall see in the next section that the densities $\rho_i$ and cumulative distributions $C_i$ play an important role in determining the rate constants governing the population transfer between microstates.

Figure 5.1: The potential of mean force in dimensionless units as a function of the NR bond distance for β* = 1, 2 and 3, where each distance unit is equivalent to 3.2 Å.
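The histogram route and the reweighting of Eq. 5.4 can be sketched as follows. The "simulation data" here are hypothetical stand-ins: distances are drawn with density proportional to $x^2$ (the volume factor mentioned above) in a bonded shell $[1.0, x_c)$ for $c_2$ and in $[x_c, 4.0)$ for $c_1$; the range limits, $\Delta S^* = -3$ and the values of $\beta\varepsilon$ are illustrative assumptions. The histograms are built once and reweighted at several temperatures, reflecting the temperature independence of $\rho_{c_1}$ and $\rho_{c_2}$.

```python
import math
import random

random.seed(1)

X_MIN, X_C, X_MAX = 1.0, 1.8, 4.0  # hypothetical range; x_c = 1.8 (5.76 Å)
N_BINS = 30
WIDTH = (X_MAX - X_MIN) / N_BINS


def shell_samples(a, b, n):
    """Hypothetical stand-in for simulation data: distances with density ~ x**2 on [a, b)."""
    return [(a**3 + random.random() * (b**3 - a**3)) ** (1.0 / 3.0) for _ in range(n)]


def histogram_density(samples):
    """Normalized histogram estimate of rho_c(x) on [X_MIN, X_MAX)."""
    counts = [0] * N_BINS
    for x in samples:
        k = min(int((x - X_MIN) / WIDTH), N_BINS - 1)
        counts[k] += 1
    return [c / (len(samples) * WIDTH) for c in counts]


def mixture_density(rho_c1, rho_c2, delta_s_star, beta_eps):
    """Eq. 5.4: weight the bonded (c2) and non-bonded (c1) densities."""
    w2 = 1.0 / (1.0 + math.exp(-(delta_s_star + beta_eps)))  # e^{dS*} e^{beta eps} / (1 + ...)
    w1 = 1.0 - w2
    # rho_c2 vanishes for x >= x_c and rho_c1 for x < x_c, so the sum is piecewise.
    return [w2 * r2 + w1 * r1 for r1, r2 in zip(rho_c1, rho_c2)]


rho_c2 = histogram_density(shell_samples(X_MIN, X_C, 20000))  # bonded state c2
rho_c1 = histogram_density(shell_samples(X_C, X_MAX, 20000))  # non-bonded state c1

for beta_eps in (1.0, 2.0, 3.0):
    rho = mixture_density(rho_c1, rho_c2, delta_s_star=-3.0, beta_eps=beta_eps)
    pmf = [-math.log(r) if r > 0 else float("inf") for r in rho]  # phi = -ln rho
    print(beta_eps, round(sum(rho) * WIDTH, 6), round(min(pmf), 3))
```

Each reweighted density integrates to one, and raising $\beta\varepsilon$ shifts weight into the bonded region $x < x_c$, lowering the potential of mean force there.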
5.2 Smoluchowski dynamics
We assume that the monomer-solvent interactions lead to a fast decay of correlations of the monomer positions and momenta, so that the dynamics of dynamical variables $B(r^N, t)$ is governed by the Smoluchowski equation
$$\dot{B}(r^N, t) = \mathcal{L}^\dagger B(r^N, t), \tag{5.5}$$
with formal solution $B(r^N, t) = \exp\{\mathcal{L}^\dagger t\} B(r^N, 0)$, where the Smoluchowski operator is given by
$$\mathcal{L}^\dagger = D\left(\nabla_{r^N} - \beta \nabla_{r^N} U\right) \cdot \nabla_{r^N}, \tag{5.6}$$
and $U$ is the potential of mean force describing monomer-monomer interactions averaged over the solvent bath. In Eq. 5.6, we have defined an inner product over monomer vector positions as $r^N \cdot r^N = \sum_{i=1}^{N} r_i \cdot r_i$, where $r_i$ is the position vector of monomer $i$.
The dynamics of the monomers can be justified from first principles by considering the full dynamics of the system and applying projection operator methods [71]. To arrive at Eq. 5.6, we have assumed a clear separation between the time scale for the decay of correlations involving functions of the bath and that of correlation functions of the monomer positions. We also restrict ourselves to long-time dynamics in the “diffusive” regime $t \gg m/(\beta\Gamma)$, where $\Gamma$ is the effective friction a monomer feels due to the solvent bath. Implicitly, it has been assumed that the generalized friction coefficient matrix $\Gamma_{ij}(r^N)$, describing the effective friction on monomer $i$ due to hydrodynamic interactions arising from monomer $j$, is diagonal, so that $\Gamma_{ij}(r^N) \sim \delta_{ij}\mathbf{I}\,\Gamma(r^N)$. This approximation amounts to assuming that the hydrodynamic interactions are instantaneous [72], and it leads to a simple relation between the friction and the diffusion coefficient appearing in Eq. 5.6,
$$D(r^N) = \frac{(k_B T)^2}{\Gamma(r^N)}. \tag{5.7}$$
We furthermore simplify this analysis by assuming the diffusion coefficient for each
monomer is a constant independent of the configuration of the polymer. This assumption
is particularly drastic, since the interactions of the monomers with the solvent bath will
depend strongly on the configuration of the polymer in the vicinity of a given monomer
and thereby influence the friction on the monomer. It is expected that such effects are
minimized in an idealized solvent in which the bath particles interact with the monomers
on very short length scales so that local monomer shielding is negligible.
If the transitions between microstates are slow compared to the diffusive time scale of the motion of the monomers, the dynamics of the populations $\mathbf{c}(t) = \{c_1(t), \ldots, c_{n_s}(t)\}$ of the microstates is well represented by a simple Markov model:
$$\dot{\mathbf{c}}(t) = \mathbf{K} \cdot \mathbf{c}(t), \tag{5.8}$$
where $\mathbf{K}$ is an $n_s \times n_s$ matrix of transition rates connecting the $n_s$ microstates. In the next section, a simple means of computing the rate constants composing the matrix $\mathbf{K}$ is outlined.
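The Markov model of Eq. 5.8 can be illustrated on the $2^3 = 8$ microstates of a three-bond chain. This is a minimal sketch under stated assumptions: transitions only connect microstates that differ by one bond, the rates are chosen as $k_0\sqrt{w_j/w_i}$ so that detailed balance holds, and the equilibrium weights $w = \exp[n_{\text{bonds}}(\beta\varepsilon + \Delta S^*)]$ with $\beta\varepsilon = 2$ and $\Delta S^* = -1.5$ per bond are hypothetical. The actual rate constants are the subject of the next section.

```python
import math
from itertools import combinations


def microstates(n_bonds):
    """Bond sets ordered by number of bonds, as in the ordering convention of Eq. 5.1."""
    return [frozenset(c) for k in range(n_bonds + 1) for c in combinations(range(n_bonds), k)]


def rate_matrix(states, weights, k0=1.0):
    """K[j][i] is the rate i -> j; only single-bond changes are allowed.

    Rates k0 * sqrt(w_j / w_i) satisfy detailed balance K[j][i] w_i = K[i][j] w_j.
    """
    n = len(states)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and len(states[i] ^ states[j]) == 1:
                K[j][i] = k0 * math.sqrt(weights[j] / weights[i])
    for i in range(n):
        K[i][i] = -sum(K[j][i] for j in range(n) if j != i)  # columns sum to zero
    return K


def rhs(K, c):
    """dc/dt = K . c (Eq. 5.8)."""
    return [sum(K[j][i] * c[i] for i in range(len(c))) for j in range(len(c))]


def propagate(K, c, dt, n_steps):
    """Fourth-order Runge-Kutta integration of the master equation."""
    for _ in range(n_steps):
        k1 = rhs(K, c)
        k2 = rhs(K, [x + 0.5 * dt * y for x, y in zip(c, k1)])
        k3 = rhs(K, [x + 0.5 * dt * y for x, y in zip(c, k2)])
        k4 = rhs(K, [x + dt * y for x, y in zip(c, k3)])
        c = [x + dt / 6.0 * (a + 2 * b + 2 * p + q) for x, a, b, p, q in zip(c, k1, k2, k3, k4)]
    return c


# Hypothetical parameters: three bonds (e.g. BF, FJ, JN), each lowering the energy by eps
# and costing an entropy ds_per_bond (in k_B); weights are exp(n_bonds * (beta*eps + dS)).
beta_eps, ds_per_bond = 2.0, -1.5
states = microstates(3)
weights = [math.exp(len(s) * (beta_eps + ds_per_bond)) for s in states]
K = rate_matrix(states, weights)

c0 = [1.0] + [0.0] * (len(states) - 1)  # start in the no-bond microstate
c_final = propagate(K, c0, dt=0.01, n_steps=2000)
equilibrium = [w / sum(weights) for w in weights]
print([round(x, 4) for x in c_final])
print([round(x, 4) for x in equilibrium])
```

Because the columns of $\mathbf{K}$ sum to zero, total population is conserved, and detailed balance guarantees relaxation to the Boltzmann-weighted populations.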
5.2.1 First passage time approach to rate constants
The general problem of diffusive barrier crossing in asymmetric double-well potentials can be addressed by treating the barrier crossing as a two-step reaction [73, 74]
$$A \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} C \underset{k_{-2}}{\overset{k_2}{\rightleftharpoons}} B \tag{5.9}$$
where state C is defined to be the region near $x = x_c$. Writing first-order kinetic equations for the two-step reaction and using the steady-state approximation $dC/dt = 0$ to eliminate two of the rate constants, we find the effective rate equations
$$\frac{dA}{dt} = -k_f A + k_r B, \tag{5.10}$$
$$\frac{dB}{dt} = -k_r B + k_f A, \tag{5.11}$$
where
$$k_f^{-1} = k_1^{-1} + k_{-2}^{-1}\frac{Z_A}{Z_B}, \qquad k_r^{-1} = k_{-2}^{-1} + k_1^{-1}\frac{Z_B}{Z_A}, \tag{5.12}$$
where $Z_A$ and $Z_B$ are the equilibrium populations of A and B. To obtain Eq. 5.12, we have used the detailed balance condition, $k_f/k_r = Z_B/Z_A = (k_1 k_2)/(k_{-1}k_{-2})$. In equilibrium, the relative populations of A and B are $Z_A/(Z_A + Z_B)$ and $Z_B/(Z_A + Z_B)$, respectively, and the relaxation of a system initially in state B obeys
$$N_B(t) = \frac{Z_B}{Z_A + Z_B} + \left(1 - \frac{Z_B}{Z_A + Z_B}\right) e^{-(k_f + k_r)t}, \tag{5.13}$$
with characteristic relaxation time $(k_f + k_r)^{-1}$.
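Eqs. 5.12 and 5.13 can be checked against a direct integration of the effective rate equations 5.10 and 5.11. This is a minimal sketch with hypothetical values $k_1 = 2$, $k_{-2} = 0.5$, $Z_A = 1$, $Z_B = 3$ (arbitrary units), chosen only for illustration; forward Euler is used for simplicity.

```python
import math


def effective_rates(k1, k_m2, z_a, z_b):
    """Eq. 5.12: effective forward (A -> B) and reverse (B -> A) rate constants."""
    k_f = 1.0 / (1.0 / k1 + (1.0 / k_m2) * z_a / z_b)
    k_r = 1.0 / (1.0 / k_m2 + (1.0 / k1) * z_b / z_a)
    return k_f, k_r


def n_b_analytic(t, k_f, k_r, z_a, z_b):
    """Eq. 5.13: population of B at time t, starting entirely in B."""
    p_b = z_b / (z_a + z_b)
    return p_b + (1.0 - p_b) * math.exp(-(k_f + k_r) * t)


def n_b_numeric(t_end, k_f, k_r, dt=1e-4):
    """Forward-Euler integration of Eqs. 5.10 and 5.11 from A = 0, B = 1."""
    a, b = 0.0, 1.0
    for _ in range(int(round(t_end / dt))):
        flux = -k_f * a + k_r * b  # dA/dt; dB/dt is its negative
        a, b = a + dt * flux, b - dt * flux
    return b


# Hypothetical input values, chosen only for illustration
k1, k_m2, z_a, z_b = 2.0, 0.5, 1.0, 3.0
k_f, k_r = effective_rates(k1, k_m2, z_a, z_b)
print(k_f / k_r, z_b / z_a)  # detailed balance: the ratios agree up to rounding
print(n_b_analytic(1.0, k_f, k_r, z_a, z_b), n_b_numeric(1.0, k_f, k_r))
```

For these inputs $k_f = 6/7$ and $k_r = 2/7$, so $k_f/k_r = Z_B/Z_A = 3$ and the population of B relaxes toward $3/4$ with rate $k_f + k_r = 8/7$.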
The rate constants $k_1$ and $k_{-2}$ can be approximated by computing the first passage time out of the stable wells to an absorbing state at $x = x_c$. To compute $k_1$, we assume that the particle starts at some position $x = a$ in the A well, and that the probability $P(x, t|a)$ of finding the particle at position $x$ at time $t$, given that there is an absorbing trap at $x = x_c$, is governed by
$$\frac{\partial P(x, t|a)}{\partial t} = D \frac{\partial}{\partial x}\left(\frac{\partial}{\partial x} + \beta\phi'(x)\right) P(x, t|a). \tag{5.14}$$
The absorbing boundary condition requires $P(x_c, t|a) = 0$. From the definitions above, the probability that the particle has already been absorbed at $x_c$ by time $t$ (one minus its survival probability) is $P_s(t|a) = 1 - \int_0^{x_c} dx\, P(x, t|a)$, so the absorption rate at $x_c$ is
$$f(t|a) = \frac{dP_s(t|a)}{dt} = -\int_0^{x_c} dx\, \frac{\partial P(x, t|a)}{\partial t} = \int_0^{x_c} dx\, \frac{\partial J(x, t|a)}{\partial x} = J(x_c, t|a), \tag{5.15}$$
where $J(x, t|a) = -D\,\partial P(x, t|a)/\partial x - \beta D \phi'(x) P(x, t|a)$ is the flux, so that Eq. 5.14 reads $\partial P/\partial t = -\partial J/\partial x$, and $J(0, t|a) = 0$ for a reflecting boundary at $x = 0$.
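We assume that $k_1^{-1} = \langle \tau_{fp}(a) \rangle$, where $\tau_{fp}(a)$ is the mean first passage time, i.e., the first passage time averaged over the density $f(t|a)$ for a particle that started at $x = a$, and the average over equilibrium starting points in well A is
$$\langle \tau_{fp}(a) \rangle = Z_A^{-1}\int_0^{x_c} dx\, e^{-\beta\phi(x)}\, \tau_{fp}(x) = Z_A^{-1}\int_0^{\infty} dt \int_0^{x_c} dx\, e^{-\beta\phi(x)} f(t|x)\, t. \tag{5.16}$$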
Integrating by parts and using the fact that $t\,P(x, t|a)$ vanishes at $t = 0$ and $t = \infty$, we find that
$$\tau_{fp}(a) = \int_0^{x_c} dx \int_0^{\infty} dt\, P(x, t|a) = \int_0^{x_c} dx\, P(x|a), \tag{5.17}$$
where $P(x|a)$ is the time integral of $P(x, t|a)$. Using Eq. 5.14, we find that $P(x|a)$ obeys the equation
$$\int_0^{\infty} dt\, \frac{\partial P(x, t|a)}{\partial t} = -P(x, 0|a) = -\delta(x - a) = D \frac{\partial}{\partial x}\left(\frac{\partial}{\partial x} + \beta\phi'(x)\right) P(x|a). \tag{5.18}$$
Integrating this equation from 0 to $y$ yields
$$-H(y-a) = D\,e^{-\beta\phi(y)}\,\frac{d}{dy}\left(e^{\beta\phi(y)}P(y|a)\right), \tag{5.19}$$
where $H$ is the Heaviside step function and we have used the fact that the flux is zero at the origin because of the reflecting boundary. Multiplying both sides of the equation above by $e^{\beta\phi(y)}$ and integrating from $x$ to $x_c$ yields the solution
$$P(x|a) = D^{-1}e^{-\beta\phi(x)}\int_x^{x_c}dy\,H(y-a)\,e^{\beta\phi(y)}, \tag{5.20}$$
where we have used the fact that $P(x_c|a) = 0$. Inserting this expression into Eq. 5.17, we find
$$\tau_{fp}(a) = D^{-1}\int_0^{x_c}dy\int_0^{y}dx\,H(y-a)\,e^{\beta\phi(y)}e^{-\beta\phi(x)} = D^{-1}\int_a^{x_c}dy\,e^{\beta\phi(y)}\int_0^{y}dx\,e^{-\beta\phi(x)} = D^{-1}\int_a^{x_c}dy\,\frac{c_A(y)}{\rho_A(y)}, \tag{5.21}$$
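so that
$$k_1^{-1} = \langle\tau_{fp}(a)\rangle = D^{-1}\int_0^{x_c}dx\,\rho_A(x)\int_x^{x_c}dy\,\frac{c_A(y)}{\rho_A(y)} = D^{-1}\int_0^{x_c}dy\,\frac{c_A(y)}{\rho_A(y)}\int_0^{y}dx\,\rho_A(x) = D^{-1}\int_0^{x_c}dy\,\frac{c_A(y)^2}{\rho_A(y)}. \tag{5.22}$$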
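The equivalence between averaging Eq. 5.21 over starting points and the single-integral form of Eq. 5.22 can be checked numerically. The sketch below assumes a hypothetical harmonic potential of mean force $\beta\phi(x) = 8(x-0.5)^2$ on $[0, x_c]$ with $x_c = 1$ and $D = 1$ (illustrative choices, not the potential of Fig. 5.1), builds $\rho_A$ and $c_A$ by trapezoidal quadrature, and compares both routes. For a flat potential the result reduces to the familiar value $x_c^2/(3D)$.

```python
import numpy as np


def cumulative_trapezoid(f, x):
    """Running trapezoid integral of f from x[0] up to each x[i]."""
    out = np.zeros_like(f)
    out[1:] = np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))
    return out


def well_functions(beta_phi, x):
    """Normalized density rho_A(x) and cumulative distribution c_A(x) on [0, x_c]."""
    w = np.exp(-beta_phi)                  # Boltzmann weight e^{-beta phi}
    cum_w = cumulative_trapezoid(w, x)
    ZA = cum_w[-1]
    return w / ZA, cum_w / ZA


def tau_fp(beta_phi, x, D=1.0):
    """Eq. 5.21: mean first passage time to x_c for each start a = x[i]."""
    rho, c = well_functions(beta_phi, x)
    G = cumulative_trapezoid(c / rho, x)
    return (G[-1] - G) / D                 # integral from a to x_c


def k1_inverse(beta_phi, x, D=1.0):
    """Eq. 5.22: k_1^{-1} = D^{-1} * integral of c_A^2 / rho_A over [0, x_c]."""
    rho, c = well_functions(beta_phi, x)
    return cumulative_trapezoid(c**2 / rho, x)[-1] / D


x = np.linspace(0.0, 1.0, 2001)            # grid on [0, x_c] with x_c = 1 (hypothetical)
beta_phi = 8.0 * (x - 0.5) ** 2            # hypothetical harmonic well
rho, _ = well_functions(beta_phi, x)
averaged = cumulative_trapezoid(rho * tau_fp(beta_phi, x), x)[-1]
print(averaged, k1_inverse(beta_phi, x))   # the two routes agree to quadrature accuracy
```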
Following a similar procedure for $k_{-2}^{-1}$, we obtain
$$k_{-2}^{-1} = D^{-1}\int_{x_c}^{\infty}dy\,\frac{(1-c_B(y))^2}{\rho_B(y)}, \tag{5.23}$$
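leading to the following expressions for the forward and reverse rate constants:
$$\begin{aligned}
k_f^{-1} &= D^{-1}\int_0^{x_c}dy\,\frac{c_A(y)^2}{\rho_A(y)} + D^{-1}\int_{x_c}^{\infty}dy\,\frac{(1-c_B(y))^2}{\rho_B(y)}\,\frac{Z_A}{Z_B},\\
k_r^{-1} &= D^{-1}\int_{x_c}^{\infty}dy\,\frac{(1-c_B(y))^2}{\rho_B(y)} + \frac{Z_B}{Z_A}\,D^{-1}\int_0^{x_c}dy\,\frac{c_A(y)^2}{\rho_A(y)}.
\end{aligned} \tag{5.24}$$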
It can be shown that these expressions coincide with those obtained from the spectral decomposition of a projected time-evolution operator governing the time dependence of a correlation function whose long-time limit yields the rate constants. However, the equations above were obtained from much simpler arguments.
5.2.2 Numerical test of microscopic rate expressions
A direct and simple way to verify that there is an adequate separation between the time scale of relaxation within a well and the time scale of structural rearrangements is to simulate the Smoluchowski dynamics of the populations under the appropriate potential of mean force and verify that the population dynamics show exponential decay with rate $k_f + k_r$ (i.e., relaxation time $(k_f+k_r)^{-1}$), as predicted by Eq. 5.13.
To this end, we simulated an initially non-equilibrium ensemble in which all members start from a state of conditional equilibrium in state B and evolve under the effective potential in Fig. 5.1. The simulation used a Monte Carlo procedure in which steps of magnitude $\pm\Delta x$ were attempted with equal probability and accepted with probability $\min(1, \rho(x_t)/\rho(x))$, where $x$ is the current state of the system, $x_t = x \pm \Delta x$ is the trial configuration, and $\rho(x)$ is the probability density in Eq. 5.4. The diffusion coefficient was set to $D = (\Delta x)^2/\Delta t = 1$, so that each Monte Carlo step advanced the time by $\Delta t = (\Delta x)^2$; that is, one unit of time corresponds to $1/(\Delta x)^2$ steps. In the simulation results shown in Fig. 5.2, $\Delta x = 0.01$. As Fig. 5.2 shows, the decay of the population of the unbound state is exponential and well described by the theoretically predicted rate over the range of temperatures examined. The integrals in Eqs. (5.22) and (5.23) for the intermediate rate constants $k_1$ and $k_{-2}$ were evaluated numerically using analytical fits to the cumulative distributions and densities, giving $k_1^{-1} = 0.0287$ and $k_{-2}^{-1} = 1.532$.
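The Monte Carlo procedure described above can be sketched as follows. This is a minimal illustration, not the code used for Fig. 5.2: it assumes a user-supplied vectorized function `beta_phi` returning $\beta\phi(x)$ (hypothetical here), implements the reflecting wall at $x = 0$ by rejecting moves below it, and propagates all walkers at once with NumPy. Following the convention above, each step corresponds to a time increment $\Delta t = (\Delta x)^2/D$.

```python
import numpy as np


def metropolis_walk(beta_phi, x0, dx, n_steps, rng, x_min=0.0):
    """Propagate an ensemble of walkers with Metropolis moves of size +/- dx.

    beta_phi -- callable returning beta*phi(x); rho(x) is proportional to exp(-beta*phi)
    x0       -- array of initial positions
    Moves below x_min are rejected (reflecting wall at the origin).
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        trial = x + dx * rng.choice([-1.0, 1.0], size=x.shape)
        # Accept with probability min(1, rho(trial) / rho(x)).
        log_ratio = beta_phi(x) - beta_phi(trial)
        p_accept = np.exp(np.minimum(log_ratio, 0.0))
        accept = (trial >= x_min) & (rng.random(x.shape) < p_accept)
        x = np.where(accept, trial, x)
    return x


def bound_fraction(x, x_c):
    """Fraction of walkers on the bound side (x < x_c) of the dividing point."""
    return np.mean(x < x_c)


# Sanity check with a hypothetical flat potential: free diffusion far from the wall.
rng = np.random.default_rng(0)
flat = lambda x: np.zeros_like(x)
x = metropolis_walk(flat, np.full(20000, 10.0), dx=0.05, n_steps=400, rng=rng)
print(np.var(x))   # close to n_steps * dx**2 = 1.0
```

Recording `bound_fraction` every $1/(\Delta x)^2$ steps for a double-well `beta_phi` would give a population trace of the kind compared with Eq. 5.13.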
5.3 Markov model of configurational dynamics
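Based on the considerations of the previous section, it is clear that under conditions of reasonably low temperature, $\beta^* \geq 1$, the dynamics of the fractional populations $\mathbf{c}(t) = \{c_0(t), \ldots, c_{n_s-1}(t)\}$ of the microstates is well represented by a simple Markov model:
$$\dot{\mathbf{c}}(t) = \mathbf{K}\cdot\mathbf{c}(t), \tag{5.25}$$
with formal solution $\mathbf{c}(t) = e^{\mathbf{K}t}\mathbf{c}(0)$, where $\mathbf{K}$ is an $n_s \times n_s$ matrix of transition rates connecting the $n_s$ microstates, with $K_{\alpha\beta}$ the rate of transitions from microstate $\beta$ to microstate $\alpha$. Since the fractional populations sum to one, $\sum_\alpha c_\alpha(t) = 1$, each column of $\mathbf{K}$ must sum to zero, so the diagonal elements of $\mathbf{K}$ are given by
$$K_{\alpha\alpha} = -\sum_{\beta\neq\alpha} K_{\beta\alpha}, \tag{5.26}$$
whereas the off-diagonal rates are
$$K_{\alpha\beta} = \begin{cases} k^f_{\alpha\beta} & \alpha > \beta,\\ k^r_{\alpha\beta} & \beta > \alpha, \end{cases} \tag{5.27}$$
where the forward and backward rates between states $\alpha$ and $\beta$ are given by Eq. 5.24.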
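A minimal sketch of how a rate matrix with the structure of Eqs. 5.25 and 5.26 can be assembled is given below. Each microstate is encoded as a bit mask of formed bonds (state 0 is fully unbonded, state $2^{n_b}-1$ has all bonds), only states differing by a single bond are connected, and the diagonal is fixed so that every column sums to zero. As a simplifying assumption not made in the text, every bond forms with the same rate `k_form` and breaks with the same rate `k_break`, independently of the other bonds; in the two-state language of Eq. 5.12 these play the roles of $k_r$ and $k_f$, respectively. The function name and rates are hypothetical.

```python
import numpy as np


def rate_matrix(n_bonds, k_form, k_break):
    """Rate matrix K with K[a, b] = rate of the transition b -> a.

    Microstate index = bit mask of formed bonds, so state 0 is fully unbonded and
    state 2**n_bonds - 1 is the "folded" state. Simplifying assumption: each bond
    forms (rate k_form) and breaks (rate k_break) independently of the others.
    """
    n_states = 2 ** n_bonds
    K = np.zeros((n_states, n_states))
    for b in range(n_states):
        for j in range(n_bonds):
            a = b ^ (1 << j)                  # flip bond j: a and b differ by one bond
            K[a, b] = k_form if a > b else k_break
    K -= np.diag(K.sum(axis=0))               # columns sum to zero (Eq. 5.26)
    return K


K = rate_matrix(3, k_form=0.15, k_break=0.05)     # hypothetical rates
print(np.allclose(K.sum(axis=0), 0.0))            # population is conserved
print(np.sort(-np.linalg.eigvals(K).real))        # relaxation rates lambda_n
```

In this idealized independent-bond limit the relaxation rates are 0, $s$ (three-fold), $2s$ (three-fold) and $3s$, with $s$ = `k_form + k_break`; the matrix is also sparse, with only $n_b$ off-diagonal entries per column.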
[Figure 5.2 plot: “Population Dynamics: Simulation versus Kramers Solution”; population of the unbound state versus time for β∗ = 1, 2, 3.]
Figure 5.2: The equilibration dynamics of the population of the unbound state for an ensemble of 5000 random walkers. The solid lines correspond to simulation results, and the dashed lines are analytical fits based on the Kramers first-passage time solution for the potential of mean force in Fig. 5.1. The different curves correspond to different effective temperatures $\beta^* = \beta\epsilon$. Note that at the lowest effective temperature, $\beta^* = 3$ (the bottom curve), the equilibrium bound population is significantly larger than the unbound population.
Since it is assumed that there is a clear separation between the time scale of reactive events and the transient time scale of the evolution within a given configuration, the matrix element $K_{\alpha\beta} = 0$ for any two states $\alpha$ and $\beta$ that differ by more than a single bond. This property makes the $n_s \times n_s$ matrix sparse, particularly for systems with a large number of possible bonds. For larger systems, the equilibrium population of certain states with small entropies compared to other configurations with the same number of bonds is effectively zero over the entire temperature range, and such states are not considered. From the form of the rate constants, the rates of transitions from other states into these rare states are expected to be relatively small, which justifies their neglect.
A dynamical picture of the equilibration of an initial non-equilibrium ensemble of states can be obtained by diagonalizing the matrix $\mathbf{K}$ to write
$$c_\alpha(t) = Q_{\alpha\beta}\,e^{-\lambda_\beta t}\,Q^{-1}_{\beta\gamma}\,c_\gamma(0), \tag{5.28}$$
where summation over repeated Greek indices is implied. In Eq. 5.28, the columns of the transformation matrix $\mathbf{Q}$ contain the $n_s$ eigenvectors of $\mathbf{K}$, which have eigenvalues $\{-\lambda\}$. Conservation of overall population guarantees that one of the eigenvalues is $\lambda_0 = 0$, with the corresponding eigenvector $\mathbf{c}^{\rm eq}$ being the equilibrium populations. All other eigenvalues are non-zero and negative (i.e., $\lambda_n > 0$ for $n > 0$). Thus, we can write
$$c_\alpha(t) - c^{\rm eq}_\alpha = \sum_{n=1}^{n_s-1} Q_{\alpha n}\,e^{-\lambda_n t}\,Q^{-1}_{n\gamma}\,c_\gamma(0). \tag{5.29}$$
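Equations 5.28 and 5.29 translate directly into a few lines of NumPy. The sketch below (function names are hypothetical) diagonalizes $\mathbf{K}$, propagates an initial population vector, and extracts the equilibrium vector from the zero mode. It is checked against the two-state result of Eq. 5.13 using illustrative rates, with the states ordered (A, B) and the system started in B.

```python
import numpy as np


def propagate(K, c0, times):
    """c(t) = Q exp(-Lambda t) Q^{-1} c(0) (Eq. 5.28) for each t in times."""
    evals, Q = np.linalg.eig(K)               # evals = -lambda_n
    coeffs = np.linalg.solve(Q, c0)           # Q^{-1} c(0)
    # Column n of Q is scaled by exp(-lambda_n t) before summing over modes.
    return np.array([(Q * np.exp(evals * t)) @ coeffs for t in times]).real


def equilibrium(K):
    """Equilibrium populations: eigenvector of the zero eigenvalue, normalized."""
    evals, Q = np.linalg.eig(K)
    v = Q[:, np.argmin(np.abs(evals))].real
    return v / v.sum()


# Two-state check with hypothetical rates: states ordered (A, B), start in B.
kf, kr = 0.3, 0.9
K = np.array([[-kf, kr], [kf, -kr]])          # dA/dt = -kf A + kr B, dB/dt = kf A - kr B
times = np.linspace(0.0, 5.0, 6)
c = propagate(K, np.array([0.0, 1.0]), times)
deviation = c - equilibrium(K)                # left-hand side of Eq. 5.29
```

Applied to a chain's full rate matrix with $\mathbf{c}(0) = (1, 0, \ldots, 0)$, the last column of `deviation` would give the folded-state profile defined in Eq. 5.30.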
When the states are ordered so that $\alpha = 0$ corresponds to the completely unbonded state and $\alpha = 2^{n_b}-1 = n_s-1$ to the state with the maximum number of bonds (hereafter referred to as the “folded state”), we can monitor the equilibration of the population of the folded state, $c_f(t) = c_{n_s-1}(t)$, starting from an initial population entirely in the completely unbonded state, $\mathbf{c}(0) = (1, 0, \ldots, 0)$, by tracking
$$\Delta c_f(t) = c_f(t) - c_{f,\rm eq} = \sum_{n=1}^{n_s-1} Q_{n_s-1,n}\,e^{-\lambda_n t}\,Q^{-1}_{n0}, \tag{5.30}$$
which has initial value $\Delta c_f(0) = -c_{f,\rm eq}$. The relaxation profile $\Delta c_f(t)$ of a 15-monomer chain
[Figure 5.3 plot: “Folding Equilibration, 3 bond model”; c_f(t) versus time for β∗ from 2 to 8.]
Figure 5.3: The equilibration dynamics of the folded-state population $c_f(t)$ versus time for a chain of 15 monomers for values of inverse effective temperature $\beta^*$ ranging from $\beta^* = 2$ (top curve, dotted lines) to $\beta^* = 8$ (solid black line, bottom curve).
with a maximum of $n_b = 3$ attractive bonds is shown in Fig. 5.3. For this system, there are only $n_s = 2^3 = 8$ possible microstates, and hence $c_f(t)$ may be written as a superposition of 7 exponentials. At low temperatures (large $\beta^*$), the dynamics becomes essentially independent of temperature, because the forward rate constants $k_f$ become small while $k_r \approx k_{-2}$ (since $Z_B/Z_A \to 0$ in Eq. 5.12) is independent of temperature (assuming a constant diffusion coefficient $D$). At low temperatures, the three smallest non-zero eigenvalues are nearly degenerate, with $\lambda_{1,2,3} \approx 0.2$, while the next set of three eigenvalues is roughly twice as large. Thus, the dynamics is effectively single exponential, reflecting the equivalence of three relaxation channels in which the fully folded state BF FJ JN can be reached from any of the three precursor states with two bonds. At low temperatures, the dynamics roughly consists of the system falling down a series of steps from the fully unbonded state to the fully folded (bonded) state with little back-reaction (i.e., climbing back up the steps), and there are multiple stairways leading to the folded state.
[Figure 5.4 plot: “Folding Dynamics, 8 bond model”; c_f(t) versus time.]
Figure 5.4: The equilibration dynamics of the folded-state population $c_f(t)$ versus time for a chain of 25 monomers for values of inverse effective temperature $\beta^*$ ranging from $\beta^* = 2$ (top curve, dotted lines) to $\beta^* = 8$ (solid black line, bottom curve).
The dynamics of longer chains is much more interesting because bonds can form between distant residues, which mimics the folding of secondary structural elements into tertiary structures. In Fig. 5.4, the relaxation profile of a 25-monomer chain is plotted versus time for a range of inverse temperatures. The dynamics of this larger system, with 25 monomers, $n_b = 8$ attractive bonds and $n_s = 2^8 = 256$ microstates, is much more complex than that of the smaller system, since $c_f(t)$ is now a combination of 255 exponentials and can acquire the characteristics of a stretched exponential frequently observed in glassy systems with frustration. Note that the folding time of the 25-monomer system is typically longer than that of the shorter chain, even at high temperatures. At high temperatures, the smallest nonzero eigenvalue is doubly degenerate, with $\lambda^{(25)}_{1,2} \approx 0.01$, roughly twenty times smaller than the value of about 0.2 observed for the 15-monomer chain. As the temperature decreases, the relaxation profile becomes more complex as more eigenmodes contribute to the relaxation at longer time scales, leading to a stretched-exponential appearance. The folding time clearly increases as the temperature is lowered at intermediate temperatures, $1 \leq \beta^* \leq 4$. However, below this temperature regime the equilibration profile simplifies to a characteristic single-exponential form with a shorter overall folding time. Once again, at low effective temperatures, $\beta^* \geq 6$, the relaxation profile becomes independent of temperature and roughly single exponential.
The behavior of the profile $c_f(t)$ as the temperature is lowered can be understood in terms of the number of relaxation modes or “folding pathways” that contribute to the evolution of the microstate populations. At intermediate temperatures, many modes are connected to one another, since the forward rate constants $k_f$ describing the rate of escape from a bonding well are large enough to allow rapid formation and loss of bonds as the system equilibrates. However, as the temperature is lowered, the forward rate constants become small and the relaxation proceeds as a sequence of “falls” down the steps of the free energy landscape. Once again, at low temperatures we find that $k_f \approx 0$ and $k_r \approx k_{-2}$, leading to temperature-independent dynamics.
In summary, we observe that even though the relaxation profile of longer chains in model B is much more complicated than that of shorter chains (such as the 15-monomer system), both folding profiles appear to be single exponential and independent of temperature in the low-temperature, high-$\beta^*$ regime, $\beta^* \geq 6$. It should be noted that the overall shape of the free energy landscape of the model B protein-like chain system is quite funnel-like, especially for shorter chains. It is likely the smoothness of the funnel that leads to relatively simple folding dynamics. It would be quite interesting to study chains longer than 29 beads or to modify the model to examine the effect of long-lived metastable misfolded states on the qualitative nature of the dynamics.
Chapter 6
Conclusions, Summary and Future Work
Simple models of a protein-like chain were constructed to investigate the free energy landscape of a system possessing features of biomolecular systems. Using a combination of Parallel Tempering (PT) and Discontinuous Molecular Dynamics (DMD), the free energy landscapes of these models of protein-like chains with and without solvent were investigated.
6.1 Free Energy Landscape in the Absence of a Solvent
Simple models of a protein-like chain were used not to capture very detailed behavior
of proteins, which is not possible because of the simplicity of the models, but rather to
capture the basic behavior of proteins and to observe different phases of proteins in a
relatively short computational time. Finding simple models that can be applied to study
the qualitative behavior of proteins without becoming computationally prohibitive is an
important step in understanding the dynamics of protein folding.
Chapter 6. Conclusions, Summary and Future Work 119
Two models were presented, called model A and model B. Model A and model B
differ primarily in the number of attractive interactions between beads (illustrated in
Fig. 3.2). Fewer bonding interactions are present in model B, leading to a system with
less frustration and a free energy landscape that possesses fewer local minima and a less
compact folded structure at low temperatures. The secondary structure of an alpha helix can be observed clearly in model B. In model B, for chains longer than 17 beads, the most common structure at low temperatures is a collapsed structure in which there are bond(s) between the two ends of the chain as well as bond(s) between different layers of the helix. For chains shorter than 18 beads, the alpha-helical secondary structure can be observed without any tertiary structure.
It was shown that for model B, the free energy landscape of the 25-bead chain has a
smooth funnel that has important effects on both the dynamics and the thermodynamics
of the system. In this model, the free energy landscape at low temperatures contains a
deep point with several minima around it located inside one basin. As the temperature
decreases, the deepest point of the funnel becomes deeper, while the minima around the
deepest point become shallower. This trend continues until a temperature is reached in
which all local minima in the free energy landscape have vanished and only a single global
minimum exists. In contrast to model B, model A not only takes more time to simulate but also does not exhibit a preference for a specific native structure at low temperatures.
This may be attributed to several factors such as the lack of rigidity of the chain in this
model, several large entropic barriers, and the possibility of having many structures with
the same energy.
It should be mentioned that before settling on models A and B, more than ten similar
models were considered and their free energy landscapes were investigated. For instance,
in one of the studied models attractions were allowed between any bead with index i and the bead with index i + 4n, where n cannot be 2 or 3. This means that similar to model B, beads
that are separated along the chain by eight or twelve beads do not attract one another,
and similar to model A, all kinds of beads can be involved in an attractive bond. Since
the results for this model were similar to those for model A, this model was not presented
here.
In order to elucidate the free energy landscape, the PT method was applied and found to be reasonably effective. However, the suitability of the PT algorithm for studying the free energy landscape over a wide range of temperatures is questionable because of its slow convergence. Using too many replicas in the PT method usually makes the PT dynamics inefficient. Therefore, studying the landscape at low temperatures is challenging for some of the models that have very complex free energy landscapes.
In combination with PT, DMD was used to sample the configurational space available
to the system. Using DMD increased the complexity of the algorithm since all events
must be processed sequentially and efficiently. However, the difficulty of implementing
the dynamics was well worth the effort, since by using DMD it was possible to run systems
with more than 200 replicas for nearly 8 microseconds, processing almost $10^{10}$ collision events in less than 48 CPU hours.
In the absence of solvent, it was shown that the relative configurational entropy is
temperature independent. Hence, using the populations of the configurations at different
temperatures, the relative free energy and entropy of any pair of configurations can
be calculated. From the Helmholtz free energies of different structures at the studied temperatures, the populations of all configurations at any temperature were predicted and verified against simulation results. These predictions agree reasonably well with the simulations, which illustrates one of the great advantages of using discontinuous potentials in studying the free energy landscape.
For model B, short chains with 15 and 20 beads and longer chains with 29, 30 and 35 beads were investigated. For chains shorter than 30 beads, the probability of the most common structure at low temperatures approaches one, as for the 25-bead chain, which suggests that the free energy landscape has a deep global minimum inside a funnel at low temperatures. However, for chains longer than 29 beads, the structure satisfying all possible attractive bonds is geometrically impossible, and the entropic barriers between configurations with different energies become larger. In summary, at low temperatures the energy landscape for chains shorter than 30 beads consists mainly of a funnel in which the lowest energy configuration corresponds to the deepest point, surrounded by several local minima with direct access to the lowest energy structure. As the temperature decreases, these local minima become shallower and, consequently, the funnel becomes steeper. In contrast, for chains longer than 29 beads the landscape at low temperatures consists of a few funnels relatively close to each other, which are shallower than the funnel observed for chains shorter than 30 beads.
The observed landscape can provide insight into the shape of the landscape of actual
proteins. While for small chains the native structure seems to be the lowest free energy
structure, the existence of several distinct funnels in the landscape of long chains suggests
the possibility that the native structure of real proteins is not necessarily the lowest free
energy structure but may correspond to a configurational basin that can be accessed
easily during the folding dynamics. Another factor that should be considered for long
proteins is the important effect of temperature on the morphology of the landscape. In
our study, the basin containing the global minimum becomes steeper as the temperature
decreases for short chains. However, for longer chains, the basin becomes steeper while the
deepest point of the landscape can shift from one configuration to another configuration
with slightly different bonds over the same temperature range. Thus, for some of the
long proteins, the structure may be more sensitive to temperature fluctuations and by
slightly changing the temperature the thermodynamically stable configuration can shift
to a configuration that differs substantially.
6.2 Free Energy Landscape for a Chain Solvated by a Square-Well Fluid
The free energies of different configurations (i.e., the free energy landscape) of a protein-
like chain in a solvent at different temperatures were also investigated. Qualitatively,
the behavior of a protein-like chain inside a square-well solvent is similar to the behavior
in the absence of a solvent. For the 15-bead chain, the lowest free energy configuration
was found to be an alpha helix that becomes dominant at low temperatures as in the
absence of solvent. The free energy landscape of the 15-bead chain at low temperatures
consists of a funnel with a very deep global minimum and few local minima around it. By
lowering the temperature, the global minimum becomes deeper while the others become
shallower and consequently, the funnel becomes steeper.
For larger chain lengths, in particular for ℓ = 20 and ℓ = 25, a phase transition of the
square-well solvent effectively puts a lower bound on the temperature range accessible in
the simulations. The observed phase transition temperature coincides roughly with the
temperature at which previous studies observed a liquid-vapor coexistence line. Inves-
tigating the free energy landscape of a solvated system over a phase transition point of
the solvent can be very challenging using the PT method, especially for larger systems.
For 20-bead and 25-bead chains the effects of the phase transition become more apparent
because of the larger number of particles in comparison with the 15-bead chain. Conse-
quently, the temperature range studied here could not be extended below the (effective)
phase transition temperature for 20-bead and 25-bead chain systems. This difficulty is not easy to overcome, since it is related to the efficiency of the PT algorithm itself near the phase transition point. Substantial computational resources (over a million CPU hours) were used to obtain the results presented here, mostly to find the best set of parameters for the PT runs. Because of the considerable computational demand of computing the free energy of the solvated system below the phase transition point, a direct comparison with the results in the absence of a solvent could not be made over the whole range of temperatures for the 20-bead and 25-bead chain systems. However, for both the 20-bead and 25-bead chains, the configuration with the lowest configurational energy is expected to become dominant at lower temperatures, since its system energy appears to be the lowest at very low temperatures, mainly because of its low configurational energy and the hydrophobicity of 75% of the protein-like chain beads (which have only hard-core repulsive interactions with solvent particles).
The relative configurational entropies could not be calculated here because of the temperature-dependent averages over solvent degrees of freedom; moreover, a large number of sampled configurations is needed to reduce the statistical error of the computed system energy and entropy. For example, even to decrease the statistical error in the system entropy of the 15-bead chain by a factor of 10, which may not be sufficient, 100 times more samples are needed, which here means around half a million CPU hours. For larger systems, the statistical errors are even larger, since they scale with the size of the system, and the calculations become much more expensive.
While for the 15-bead chain the lowest energy configuration is an unfolded alpha-
helix without any specific tertiary structure, for longer chains because of the bonds
between different layers of the helix, such as the bond between two ends of the chain,
the lowest energy structure is a folded structure. Our study showed that for longer
chains the entropic barrier for making bonds between the two ends of the chain is larger
than the change in entropy associated with the bonds necessary for forming a helix
structure. As a consequence, the unfolded helix structure is dominant for a relatively
wide range of temperature until the low temperature regime where the folded helix is
favored. Therefore, similar to the absence of a solvent, the effect of temperature on the
morphology of the landscape is more apparent for the longer chains.
As in the absence of a solvent, it was confirmed that model B has properties suitable for studying real proteins. The number of bonds is sufficient to generate the common secondary structure of an alpha helix and the tertiary structure of the folded state. Models that have more possible bonds, such as model A introduced in Chapter 3, are much more expensive than model B while not necessarily being better at representing real proteins. The lack of attractive bonds between beads that are separated by eight or twelve beads makes the chain more rigid, and this restriction was successful in reproducing some of the effects of dihedral-angle interactions and side chains in real proteins.
One of the major differences between a protein model in a solvent and one without a solvent is the effect on the folding process of the mainly repulsive interactions between the protein-like chain beads and the solvent particles. Only 25% of the beads can make attractive bonds with the solvent, while the rest of the beads have only repulsive interactions with the solvent, which makes most of the beads hydrophobic. Because of the restrictions imposed by the repulsive interactions, the entropy range (the difference in system entropy between the configurations with the maximum and minimum numbers of bonds) is smaller than in the absence of a solvent. Because of this smaller entropy range, the landscape shows funnel behavior at higher temperatures than in the absence of a solvent.
6.3 Simple Dynamics Using Smoluchowski Equation
In Chapter 5, we explored one of the possible avenues for future research: investigating the dynamics of the folding transition, instead of only studying the free energies. Building on earlier studies that provide simple connections between energy landscapes and protein folding kinetics [75, 76], we applied the Smoluchowski and Kramers equations to study the dynamics. By taking the distance between any two beads that can form a bond as a reaction coordinate and using the populations of the configurations in each distance segment, the potential of mean force (Helmholtz free energy) was calculated as a function of this reaction coordinate.
It was shown through simulation of a stochastic model of the evolution of the system
that the dynamics of transitions between microstates of the chain is well described by the
first-passage time solution of the Smoluchowski equation. Applying this equation for all
the possible bonds of the 15-bead and 25-bead chains, we calculated the relaxation matrix
K for both chains, where Kij is the rate constant for the transition from microstate
j to microstate i. We investigated the equilibration process from an ensemble of initially
extended configurations to mainly folded configurations at low effective temperatures. We
observed that while the relaxation profile of the 25-bead chain in model B is much more
complicated than that of the 15-bead chain, both folding profiles appear to be single
exponential and independent of temperature in the low-temperature, high-β∗ regime,
β∗ ≥ 6. It should be noted that the overall shape of the free energy landscape of the
model B protein-like chain system is quite funnel-like especially for shorter chains. For
the chains shorter than 30 beads in model B, the funnels are smooth and regular and free
of “mis-folded states”, corresponding to alternate funnels well-removed from the funnel
leading to the native state. It is likely the smoothness of the funnel that leads to
relatively simple folding dynamics.
6.4 Future work
The main problem in studying the protein-like chain inside a solvent is the slow conver-
gence of estimates of the free energy of configurations using the PT method. For example,
the presence of phase transition in the square-well fluids leads to large statistical errors
in the PT method. To overcome the effects of the phase transition on sampling, the
PT method should be enhanced by incorporating other techniques, such as umbrella sampling [77]. Another solution for this problem is to use different parameters for the
square-well liquid such that the phase transition temperature lies outside the tempera-
ture range of interest. Since in this work, the phase transition was observed at the same
temperature at which previous studies predict the liquid-vapor coexistence line for the
density of ρ∗ = 0.5, another set of parameters could be used for which, according to the
previous studies, no phase transition occurs inside the studied range of temperatures.
According to Ref. [61], by increasing the ratio λ = σ′/σ, the liquid-vapor coexistence line
shifts to higher temperatures for the density of ρ∗ = 0.5. For example, for λ = 2.0 the
liquid-vapor coexistence line is crossed at a temperature around $T^*_l = 2.4$ for $\rho^* = 0.5$ [61, 60, 78], which is very close to the highest studied temperature ($T^*_l = 2.5$).
While the models used here are too simple to represent a specific protein, they can still describe the general behavior of an alpha helix. This project can be extended to more complex models in several different ways. One possible
extension is to define 20 different beads, representing 20 different amino acids, instead of
the current four kinds and try to define their interactions based on the characteristics of
real amino acids.
Another possible extension is to use a model in which the bonds are defined using
experimental results for the folded structure of a specific protein, where a possible bond
is defined only if there is a hydrogen bond between the two residues of a specific protein
in its natural folded state.
The model can also be extended by using some of the studies on applying network
motifs in understanding hydrogen-bond patterns in proteins [79, 80, 81]. According to these studies, there are some common hydrogen-bond patterns in proteins that can be represented by graphs with a relatively small number of nodes, which means that “surprisingly, very
few parameters are needed to define the hydrogen-bond motifs”[81]. It is possible to use
these patterns to define the attractive interactions between chain beads.
Another possible extension is to use several beads for modeling amino acids. This
means that each atom (or a group of atoms) in an amino acid can be represented by a
bead. However, because of the significant increase in the cost of simulation, applying
this extension may not be worthwhile.
It would be quite interesting to apply the Smoluchowski equation to study the dynamics of chains longer than 29 beads or to modify model B to examine the effect of long-lived metastable misfolded states on the qualitative nature of the dynamics.
Appendices
Appendix A
Heat Capacity and Compressibility
A.1 Heat Capacity
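The heat capacity is related to the fluctuation of energy in the canonical ensemble according to
$$C_v = \frac{\partial \bar U}{\partial T} = -\frac{1}{k_B T^2}\frac{\partial \bar U}{\partial \beta} = -\frac{1}{k_B T^2}\frac{\partial}{\partial\beta}\left(-\frac{\partial \ln Z}{\partial \beta}\right) = \frac{1}{k_B T^2}\frac{\partial^2 \ln Z}{\partial \beta^2} = \frac{1}{k_B T^2}\left\langle (U - \bar U)^2\right\rangle, \tag{A.1}$$
where $C_v$, $U$, $\bar U = \langle U\rangle$, $T$, $k_B$, $\beta = 1/k_BT$ and $Z$ are the heat capacity at constant volume, the instantaneous energy, the internal energy, the temperature, Boltzmann's constant, the inverse temperature, and the canonical partition function, respectively.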
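Equation A.1 can be checked on the simplest possible system. The sketch below uses a hypothetical two-level system with energies 0 and $\epsilon$, in units with $k_B = 1$, and compares the fluctuation formula with a central finite-difference derivative of the internal energy. The function names are illustrative only.

```python
import math


def two_level_moments(beta, eps=1.0):
    """Canonical mean energy and energy variance for levels 0 and eps."""
    w = math.exp(-beta * eps)         # Boltzmann factor of the excited level
    Z = 1.0 + w                       # canonical partition function
    mean = eps * w / Z
    mean_sq = eps**2 * w / Z
    return mean, mean_sq - mean**2


def heat_capacity_fluctuation(T, eps=1.0):
    """C_v = <(U - Ubar)^2> / (k_B T^2), Eq. A.1 with k_B = 1."""
    _, var = two_level_moments(1.0 / T, eps)
    return var / T**2


def heat_capacity_derivative(T, eps=1.0, h=1e-5):
    """C_v = dUbar/dT estimated by a central finite difference."""
    up, _ = two_level_moments(1.0 / (T + h), eps)
    down, _ = two_level_moments(1.0 / (T - h), eps)
    return (up - down) / (2.0 * h)


for T in (0.3, 1.0, 2.5):
    print(T, heat_capacity_fluctuation(T), heat_capacity_derivative(T))
```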
A.2 Compressibility
Since the number of particles is fixed in both the canonical and microcanonical ensembles, the compressibility can be related to number fluctuations only in the grand canonical ensemble. The average number of particles can be written as
$$\bar N = \langle N\rangle = \frac{1}{\mathcal{Z}\beta}\frac{\partial \mathcal{Z}}{\partial \mu} = \frac{1}{\beta}\frac{\partial}{\partial\mu}\ln\mathcal{Z}, \tag{A.2}$$
where $\bar N$, $\mu$ and $\mathcal{Z}$ are the average number of particles, the chemical potential and the grand canonical partition function, respectively. Then the fluctuation in the number of particles in volume $V$ can be
Appendix A. Heat Capacity and Compressibility 130
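written as
$$\langle N^2\rangle - \langle N\rangle^2 = \frac{1}{\mathcal{Z}\beta^2}\frac{\partial^2 \mathcal{Z}}{\partial \mu^2} - \left[\frac{1}{\mathcal{Z}\beta}\frac{\partial \mathcal{Z}}{\partial \mu}\right]^2 = \frac{1}{\beta^2}\frac{\partial^2 \ln\mathcal{Z}}{\partial \mu^2} = \frac{1}{\beta^2}\frac{\partial(\beta\langle N\rangle)}{\partial\mu} = k_BT\left(\frac{\partial \bar N}{\partial \mu}\right)_{T,V}. \tag{A.3}$$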
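Using the Gibbs-Duhem equation, $\bar{N}\,d\mu = V\,dp - S\,dT$, one has $d\mu = (V/\bar{N})\,dp$ at constant $T$. Since $\mu$ is intensive and therefore depends on $\bar{N}$ and $V$ only through the density $\bar{N}/V$, it follows that:
$$-\frac{\bar{N}^2}{V}\left(\frac{\partial \mu}{\partial \bar{N}}\right)_{T,V} = V\left(\frac{\partial p}{\partial V}\right)_{T,N} \;\Rightarrow\; \left(\frac{\partial \bar{N}}{\partial \mu}\right)_{T,V} = \frac{\bar{N}^2}{V}\,\kappa_T, \tag{A.4}$$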
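where $\kappa_T = -\frac{1}{V}\left(\frac{\partial V}{\partial p}\right)_{T,N}$ is the isothermal compressibility. Therefore,
$$\langle N^2\rangle - \langle N\rangle^2 = \frac{\bar{N}^2}{V}\,k_B T\,\kappa_T. \tag{A.5}$$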
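Eq. (A.5) can be inverted to estimate $\kappa_T$ from a particle-number distribution. The sketch below uses a classical ideal gas, for which the grand canonical number distribution is Poisson and $\kappa_T = 1/(\rho k_B T)$ with $\rho = \bar{N}/V$. The ideal-gas test case, the volume and the mean particle number are illustrative assumptions and are not taken from the simulations in this work. Reduced units with $k_B T = 1$ are used.

```python
import math

def poisson_pmf(mean, n_max):
    """Poisson probabilities P(N) for N = 0..n_max, built by the recursion P(N) = P(N-1) * mean / N."""
    probs = [math.exp(-mean)]
    for n in range(1, n_max + 1):
        probs.append(probs[-1] * mean / n)
    return probs

def kappa_T_from_fluctuations(probs, volume, kT):
    """Invert Eq. (A.5): kappa_T = V (<N^2> - <N>^2) / (<N>^2 k_B T)."""
    norm = sum(probs)
    mean = sum(n * p for n, p in enumerate(probs)) / norm
    mean_sq = sum(n * n * p for n, p in enumerate(probs)) / norm
    variance = mean_sq - mean * mean
    return volume * variance / (mean * mean * kT)

volume, kT, mean_N = 1000.0, 1.0, 50.0  # hypothetical ideal-gas parameters
probs = poisson_pmf(mean_N, n_max=400)
kappa_fluct = kappa_T_from_fluctuations(probs, volume, kT)
kappa_ideal = volume / (mean_N * kT)  # 1/(rho k_B T) for an ideal gas
print(kappa_fluct, kappa_ideal)
```

For a Poisson distribution the variance equals the mean, so Eq. (A.5) reduces to $\kappa_T = V/(\bar{N} k_B T)$. This is exactly the ideal-gas result.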
Appendix B
Temperature sets in PT
B.1 In the absence of solvent
For model A, $\Delta\beta^*$ varies with $\beta^*$ from $\Delta\beta^* = 1.5$ at the highest temperatures to $\Delta\beta^* = 0.375$ at the lowest temperatures. More specifically:
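$$\beta^*_i = \begin{cases} \frac{3}{2}(i+1) & \text{if } i \le 5,\\ i+4 & \text{if } 5 \le i \le 20,\\ \frac{3}{4}(i+12) & \text{if } 20 \le i \le 36,\\ \frac{3}{5}(i+24) & \text{if } 36 \le i \le 51,\\ \frac{1}{2}(i+39) & \text{if } 51 \le i \le 63,\\ \frac{3}{8}(i+73) & \text{if } 63 \le i \le n. \end{cases} \tag{B.1}$$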
The inverse temperature sets were established by trial and error and are not unique. The most important property of this set is that the spacing $\Delta\beta^*$ between successive inverse temperatures varies smoothly with temperature. The larger $\Delta\beta^*$ at high temperatures allows one to reach lower temperatures $T^*_n$ without having to add too many replicas.
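The model-A set in Eq. (B.1) can be generated and checked with a short script. The sketch assumes that replicas are indexed from $i = 0$ and takes $n = 70$ purely for illustration, since the value of $n$ is not fixed here. Exact fractions are used so that the continuity of the pieces at their shared boundaries and the spacings $\Delta\beta^*$ can be compared exactly.

```python
from fractions import Fraction

# Pieces of Eq. (B.1) as (last index i of the piece, slope, offset), so that
# beta*_i = slope * (i + offset); within a piece the slope is the spacing Delta beta*.
PIECES = [
    (5, Fraction(3, 2), 1),
    (20, Fraction(1, 1), 4),
    (36, Fraction(3, 4), 12),
    (51, Fraction(3, 5), 24),
    (63, Fraction(1, 2), 39),
]
FINAL_SLOPE, FINAL_OFFSET = Fraction(3, 8), 73  # last piece, 63 <= i <= n

def beta_star(i):
    """Inverse temperature of replica i for model A, following Eq. (B.1)."""
    for last_i, slope, offset in PIECES:
        if i <= last_i:
            return slope * (i + offset)
    return FINAL_SLOPE * (i + FINAL_OFFSET)

def model_a_betas(n):
    """Replicas i = 0..n (0-based indexing is an assumption)."""
    return [beta_star(i) for i in range(n + 1)]

betas = model_a_betas(70)  # n = 70 is an arbitrary illustrative choice
spacings = [b2 - b1 for b1, b2 in zip(betas, betas[1:])]
print(float(betas[0]), float(betas[-1]))
print(sorted({float(s) for s in spacings}, reverse=True))
```

Because adjacent pieces agree at their shared boundary indices, the spacing only ever steps down from one constant value to the next. This is the smooth variation of $\Delta\beta^*$ described above.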
For model B, the temperature set is simpler, since a wide range of temperatures is not required to study the landscape. Since the spacing $\Delta\beta^*$ used is very small, it can be taken
to be uniform for this model, with
$$\beta^*_i = \frac{3}{2}\left(\frac{i}{10} + 1\right). \tag{B.2}$$
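Eq. (B.2) fixes the model-B spacing at $\Delta\beta^* = 3/20 = 0.15$. The sketch below generates the set under the same assumptions as before: 0-based replica indexing and an arbitrary illustrative $n = 20$. It also assumes reduced temperatures $T^*_i = 1/\beta^*_i$.

```python
from fractions import Fraction

def model_b_betas(n):
    """Uniform model-B set of Eq. (B.2): beta*_i = (3/2)(i/10 + 1), for i = 0..n (0-based indexing assumed)."""
    return [Fraction(3, 2) * (Fraction(i, 10) + 1) for i in range(n + 1)]

betas_b = model_b_betas(20)  # n = 20 is an arbitrary illustrative choice
spacing = {b2 - b1 for b1, b2 in zip(betas_b, betas_b[1:])}
temperatures = [1 / b for b in betas_b]  # assumed reduced temperatures T*_i = 1/beta*_i
print(spacing, float(temperatures[0]), float(temperatures[-1]))
```

Both sets start at $\beta^*_0 = 3/2$. The uniform model-B spacing is one tenth of the largest model-A spacing.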
Bibliography
[1] E. Shakhnovich. Protein folding thermodynamics and dynamics: where physics, chemistry, and biology meet. Chem. Rev., 106(5):1559–1588, 2006.
[2] Editorial. So much more to know. Science, 309(5731):78–102, 2005.
[3] Leonor Cruzeiro-Hansson and Paulo A. S. Silva. Protein folding: thermodynamic versus kinetic control. Journal of Biological Physics, 27:S6–S8, 2001.
[4] Martin Karplus. The Levinthal paradox: yesterday and today. Fold. Des., 2:69–75, 1997.
[5] Yaoqi Zhou and Martin Karplus. Folding thermodynamics of a model three-helix-bundle protein. Proc. Natl. Acad. Sci. USA, 94(26):14429–14432, 1997.
[6] Oleg B. Ptitsyn. How the molten globule became. Trends Biochem. Sci., 20(9):376–379, 1995.
[7] Michel E. Goldberg. The second translation of the genetic message: protein folding and assembly. Trends Biochem. Sci., 10(10):388–391, 1985.
[8] P. J. Thomas, B. H. Qu, and P. L. Pederson. Defective protein folding as a basis of human disease. Trends Biochem. Sci., 20(11):456–459, 1995.
[9] E. Haber and C. B. Anfinsen. Side-chain interactions governing the pairing of half-cystine residues in ribonuclease. The Journal of Biological Chemistry, 237:1839–1844, 1962.
[10] Alexander Schug, Thomas Herges, Abhinav Verma, and Wolfgang Wenzel. Investigation of the parallel tempering method for protein folding. J. Phys.: Condens. Matter, 17:S1641–S1650, 2005.
[11] Christian Anfinsen. Principles that govern the folding of protein chains. Science, 181(4096):223–230, 1973.
[12] Ken A. Dill. Folding proteins: finding a needle in a haystack. Curr. Opin. Struct. Biol., 3(1):99–103, 1993.
[13] Venkataramanan Soundararajan, Rahul Raman, S. Raguram, V. Sasisekharan, and Ram Sasisekharan. Atomic interaction networks in the core of protein domains and their native folds. PLoS ONE, 5(2):e9391, 2010.
[14] J. D. Bryngelson, J. N. Onuchic, N. D. Socci, and P. G. Wolynes. Funnels, pathways, and the energy landscape of protein folding: A synthesis. Proteins: Struct. Funct. Genet., 21(3):167–195, 1995.
[15] C. Levinthal. Are there pathways for protein folding? J. Chim. Phys., 65:44–45, 1968.
[16] Sridhar Govindarajan and Richard A. Goldstein. On the thermodynamic hypothesis of protein folding. Proc. Natl. Acad. Sci. USA, 95(10):5545–5549, 1998.
[17] Ken A. Dill, S. Banu Ozkan, Thomas R. Weikl, John D. Chodera, and Vincent A. Voelz. The protein folding problem: when will it be solved? Current Opinion in Structural Biology, 17(3):342–346, 2007.
[18] A. Sali, E. Shakhnovich, and M. Karplus. How does a protein fold? Nature, 369(6477):248–251, 1994.
[19] P. S. Kim and R. L. Baldwin. Intermediates in the folding reactions of small proteins. Annual Review of Biochemistry, 59:631–660, 1990.
[20] H. Roder and W. Colón. Kinetic role of early intermediates in protein folding. Current Opinion in Structural Biology, 7(1):15–28, 1997.
[21] H. Frauenfelder, F. Parak, and R. D. Young. Conformational substates in proteins. Annu. Rev. Biophys. Biophys. Chem., 17:451–479, 1988.
[22] T. C. B. McLeish. Protein folding in high-dimensional spaces: Hypergutters and the role of nonnative interactions. Biophys. J., 88(1):172–183, 2005.
[23] P. E. Leopold, M. Montal, and J. N. Onuchic. Protein folding funnels: A kinetic approach to the sequence-structure relationship. Proc. Natl. Acad. Sci. USA, 89:8721–8725, 1992.
[24] P. G. Wolynes, J. N. Onuchic, and D. Thirumalai. Navigating the folding routes. Science, 267:1619–1620, 1995.
[25] J. N. Onuchic, P. G. Wolynes, Z. Luthey-Schulten, and N. D. Socci. Toward an outline of the topography of a realistic protein-folding funnel. Proc. Natl. Acad. Sci. USA, 92(8):3626–3630, 1995.
[26] Jose Nelson Onuchic, Nicholas D. Socci, Zaida Luthey-Schulten, and Peter G. Wolynes. Protein folding funnels: the nature of the transition state ensemble. Fold. Des., 1(6):441–450, 1996.
[27] Ken A. Dill and Hue Sun Chan. From Levinthal to pathways to funnels. Nature Structural Biology, 4:10–19, 1997.
[28] Nicholas D. Socci, Jose Nelson Onuchic, and Peter G. Wolynes. Protein folding mechanisms and the multidimensional folding funnel. Proteins, 32(2):136–158, 1998.
[29] Brian C. Gin, Juan P. Garrahan, and Phillip L. Geissler. The limited role of nonnative contacts in the folding pathways of a lattice protein. J. Mol. Biol., 392(5):1303–1314, 2009.
[30] Michael Springborg. Chemical Modelling. Royal Society of Chemistry, 2010.
[31] Themis Lazaridis and Martin Karplus. “New view” of protein folding reconciled with the old through multiple unfolding simulations. Science, 278:1928–1931, 1997.
[32] S. B. Prusiner. Novel proteinaceous infectious particles cause scrapie. Science, 216(4542):136–144, 1982.
[33] S. B. Prusiner. Molecular biology of prion diseases. Science, 252(5012):1515–1522, 1991.
[34] Reinat Nevo, Vlad Brumfeld, Ruti Kapon, Peter Hinterdorfer, and Ziv Reich. Direct measurement of protein energy landscape roughness. EMBO Rep., 6(5):482–486, 2005.
[35] Nikolay V. Dokholyan, Sergey V. Buldyrev, H. Eugene Stanley, and Eugene I. Shakhnovich. Discrete molecular dynamics studies of the folding of a protein-like model. Fold. Des., 3(6):577–587, 1998.
[36] D. C. Rapaport. The Art of Molecular Dynamics Simulation. Cambridge University Press, Cambridge, 2nd edition, 2004.
[37] Lisandro Hernandez de la Pena, Ramses van Zon, Jeremy Schofield, and Sheldon B. Opps. Discontinuous molecular dynamics for semi-flexible and rigid bodies. J. Chem. Phys., 126(7):074105, 2007.
[38] Y. Zhou, M. Karplus, J. M. Wichert, and C. K. Hall. Equilibrium thermodynamics of homopolymers and clusters: molecular dynamics and Monte Carlo simulations of systems with square-well interactions. J. Chem. Phys., 107(24):10691–10708, 1997.
[39] Derek N. Woolfson, Alan Cooper, Margaret M. Harding, Dudley H. Williams, and Philip A. Evans. Protein folding in the absence of the solvent ordering contribution to the hydrophobic interaction. J. Mol. Biol., 229(2):502–511, 1993.
[40] Manoj V. Athawale, Gaurav Goel, Tuhin Ghosh, Thomas M. Truskett, and Shekhar Garde. Effects of lengthscales and attractions on the collapse of hydrophobic polymers in water. Proc. Natl. Acad. Sci. USA, 104(3):733–738, 2007.
[41] Sowmianarayanan Rajamani, Thomas M. Truskett, and Shekhar Garde. Hydrophobic hydration from small to large lengthscales: Understanding and manipulating the crossover. Proc. Natl. Acad. Sci. USA, 102(27):9475–9480, 2005.
[42] Robert H. Swendsen and Jian-Sheng Wang. Replica Monte Carlo simulation of spin glasses. Phys. Rev. Lett., 57(21):2607–2609, 1986.
[43] C. J. Geyer. Markov chain Monte Carlo maximum likelihood. In Proceedings of the 23rd Symposium on the Interface: Computing Science and Statistics, pages 156–163, 1991.
[44] M. C. Tesi, E. J. Janse van Rensburg, E. Orlandini, and S. G. Whittington. Monte Carlo study of the interacting self-avoiding walk model in three dimensions. J. Statist. Phys., 82(1-2):155–181, 1996.
[45] David J. Earl and Michael W. Deem. Parallel tempering: Theory, applications, and new perspectives. Phys. Chem. Chem. Phys., 7:3910–3916, 2005.
[46] Kurt Binder and Dieter W. Heermann. Monte Carlo Simulation in Statistical Physics: An Introduction. Springer, 2010.
[47] Jeremy Schofield and Ramses van Zon. Class notes CHM1464H: Foundations of molecular simulation. http://www.chem.toronto.edu/staff/JMS/simulation/notes.html, 2008.
[48] Daan Frenkel and Berend Smit. Understanding Molecular Simulation, Second Edition: From Algorithms to Applications. Academic Press, 2002.
[49] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21(6):1087–1092, 1953.
[50] S. Toxvaerd. Energy conservation in molecular dynamics. Journal of Computational Physics, 52:214–216, 1983.
[51] Loup Verlet. Computer “experiments” on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules. Phys. Rev., 159(1):98–103, 1967.
[52] Simon Duane, A. D. Kennedy, Brian J. Pendleton, and Duncan Roweth. Hybrid Monte Carlo. Phys. Lett. B, 195(2):216–222, 1987.
[53] S. B. Opps and J. Schofield. Extended state-space Monte Carlo methods. Phys. Rev. E, 63(5 Pt 2):056701, 2001.
[54] William Gropp, Ewing Lusk, and Anthony Skjellum. Using MPI: Portable Parallel Programming with the Message-Passing Interface. MIT Press, Cambridge, MA, 1994.
[55] A. Bellemans, J. Orban, and D. van Belle. Molecular dynamics of rigid and non-rigid necklaces of hard discs. Mol. Phys., 39(3):781–782, 1980.
[56] Michel Daune. Molecular Biophysics: Structures in Motion. Oxford University Press, Oxford; New York, 1999.
[57] H. Taketomi, Y. Ueda, and N. Go. Studies on protein folding, unfolding and fluctuations by computer simulation. I. The effect of specific amino acid sequence represented by specific inter-unit interactions. Int. J. Pept. Protein Res., 7(6):445–459, 1975.
[58] N. Go and H. Abe. Noninteracting local-structure model of folding and unfolding transition in globular proteins. I. Formulation. Biopolymers, 20(5):991–1011, 1981.
[59] Da-Wei Li and Rafael Bruschweiler. In silico relationship between configurational entropy and soft degrees of freedom in proteins and peptides. Phys. Rev. Lett., 102(11):118108, 2009.
[60] Jayant K. Singh, David A. Kofke, and Jeffrey R. Errington. Surface tension and vapor–liquid phase coexistence of the square-well fluid. J. Chem. Phys., 119(6):3405–3412, 2003.
[61] P. Orea, Y. Duda, V. C. Weiss, W. Schröer, and J. Alejandre. Liquid–vapor interface of square-well fluids of variable interaction range. J. Chem. Phys., 120(24):11754–11764, 2004.
[62] E. Schöll-Paschinger, A. L. Benavides, and R. Castañeda-Priego. Vapor–liquid equilibrium and critical behavior of the square-well fluid of variable range: A theoretical study. J. Chem. Phys., 123(23):234513, 2005.
[63] I. Guillen-Escamilla, M. Chavez-Paez, and R. Castaneda-Priego. Structure and thermodynamics of discrete potential fluids in the OZ-HMSA formalism. J. Phys.: Condens. Matter, 19(8):086224, 2007.
[64] G. Orkoulas and A. Z. Panagiotopoulos. Phase behavior of the restricted primitive model and square-well fluids from Monte Carlo simulations in the grand canonical ensemble. J. Chem. Phys., 110:1581–1590, 1999.
[65] J. Richard Elliott and Liegi Hu. Vapor–liquid equilibria of square-well spheres. J. Chem. Phys., 110(6):3043–3048, 1999.
[66] F. Del Rio, E. Avalos, R. Espindola, L. F. Rull, G. Jackson, and S. Lago. Vapour–liquid equilibrium of the square-well fluid of variable range via a hybrid simulation approach. Mol. Phys., 100(15):2531–2546, 2002.
[67] L. Vega, E. de Miguel, L. F. Rull, G. Jackson, and I. A. McLure. Phase equilibria and critical behavior of square-well fluids of variable width by Gibbs ensemble Monte Carlo simulation. J. Chem. Phys., 96:2296–2305, 1992.
[68] A. Lang, G. Kahl, C. N. Likos, H. Lowen, and M. Watzlawek. Structure and thermodynamics of square-well and square-shoulder fluids. J. Phys.: Condens. Matter, 11(50):10143–10161, 1999.
[69] Sheldon B. Opps and Jeremy Schofield. Extended state-space Monte Carlo methods. Physical Review E, 63(5):056701, 2001.
[70] R. van Zon and J. Schofield. Constructing smooth potentials of mean force, radial distribution functions and probability densities from sampled data. J. Chem. Phys., 132:154110, 2010.
[71] J. Schofield and I. Oppenheim. The hydrodynamics of inelastic granular systems. Physica A, 196:209–240, 1993.
[72] J. Schofield, A. H. Marcus, and S. A. Rice. The dynamics of quasi two dimensional colloidal suspensions. J. Phys. Chem., 100:18950–18961, 1996.
[73] A. Szabo, K. Schulten, and Z. Schulten. First passage time approach to diffusion controlled reactions. J. Chem. Phys., 72:4350–4357, 1980.
[74] K. Schulten, Z. Schulten, and A. Szabo. Dynamics of reactions involving diffusive barrier crossing. J. Chem. Phys., 74:4426, 1981.
[75] D. J. Bicout and A. Szabo. Entropic barriers, transition states, funnels, and exponential protein folding kinetics: a simple model. Protein Sci., 9(3):452–465, 2000.
[76] Peter Hamm, Jan Helbing, and Jens Bredenbeck. Stretched versus compressed exponential kinetics in α-helix folding. Nonequilibrium Dynamics in Biomolecules, 323(1):54–65, 2006.
[77] G. M. Torrie and J. P. Valleau. Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling. Journal of Computational Physics, 23(2):187–199, 1977.
[78] Enrique de Miguel. Critical behavior of the square-well fluid with λ=2: A finite-size-scaling study. Phys. Rev. E, 55(2):1347–1354, 1997.
[79] Ofer Rahat, Uri Alon, Yaakov Levy, and Gideon Schreiber. Understanding hydrogen-bond patterns in proteins using network motifs. Bioinformatics, 25(22):2921–2928, 2009.
[80] T. Prasad, T. Subramanian, S. Hariharaputran, H. S. Chaitra, and N. Chandra. Extracting hydrogen-bond signature patterns from protein structure data. Appl. Bioinformatics, 3(2-3):125–135, 2004.
[81] M. C. Etter, J. C. MacDonald, and J. Bernstein. Graph-set analysis of hydrogen-bond patterns in organic crystals. Acta Crystallogr. B, 46(2):256–262, 1990.