
Free Energy Landscape of Protein-Like Chains Interacting

Under Discontinuous Potentials

by

Hanif Bayat Movahed

A thesis submitted in conformity with the requirements

for the degree of Doctor of Philosophy

Graduate Department of Chemistry

University of Toronto

Copyright © 2011 by Hanif Bayat Movahed

Abstract

Free Energy Landscape of Protein-Like Chains Interacting Under Discontinuous

Potentials

Hanif Bayat Movahed

Doctor of Philosophy

Graduate Department of Chemistry

University of Toronto

2011

The free energy landscape of a protein-like chain is constructed from exhaustive simu-

lation studies using a combination of discontinuous molecular dynamics and parallel tem-

pering methods. The protein model is a repeating sequence of four kinds of monomers,

in which hydrogen bond attraction, electrostatic repulsion, and covalent bond vibrations

are modeled by step, shoulder and square-well potentials, respectively. These protein-

like chains exhibit a helical structure in their folded states. The model allows a natural

definition of a configuration by considering which beads are bonded. In the absence of a

solvent, the relative free energy of dominant structures is determined from the relative

populations, and the probabilities predicted from the calculated free energies are found

to be in excellent agreement with the observed probabilities at different temperatures.

The free energy landscape of the protein-like chain is analyzed and found to have

funnel-like characteristics, as shown by the fact that the probability of observing the

most common configuration approaches unity at low enough temperatures for chains

with fewer than 30 beads. The effect on the free energy landscape of an explicit square-

well solvent, where the beads that can form intra-chain bonds can also form (weaker)

bonds with solvent molecules while other beads are insoluble, is also examined. Simula-

tions for chains of 15, 20 and 25 beads show that at low temperatures, the most likely

structures are collapsed helical structures. The temperature at which collapsed helical


structures become dominant is higher than in the absence of a solvent. Finally, the dy-

namics of the protein-like chain immersed in an implicit hard sphere solvent is studied

using a simple model in which the implicit solvent interacts on a fast time scale with the

chain beads and provides sufficient friction so that the motion of monomers is governed

by the Smoluchowski equation. Using a Markovian model of the kinetics of transitions

between conformations, the equilibration process from an ensemble of initially extended

configurations to mainly folded configurations is investigated at low effective tempera-

tures for a number of different chain lengths. It was observed that folding profiles appear

to be single exponentials and independent of temperature at low temperatures.


To the memory of my Grandfathers,

Fereydoun Bayat Movahed, my role model in life for his vision, wisdom, charisma and morality

and

Mohammad Reza Roghani Zanjani for his kindness, productivity and hardworking character.


Acknowledgements

Studying and working at the University of Toronto has been a wonderful life experience

for me. At UofT, besides earning experience through conducting research, teaching and

passing courses, I obtained valuable experience by becoming involved in policy devel-

opment in the academic board of the Governing Council and the Graduate Education

Council. Many people have a hand in my success, and they deserve to be named in

this acknowledgement, but naming all of them would require adding another chapter to

this thesis. However, I would like to express my sincere appreciation to the following

individuals and organizations.

First, I should thank my supervisor, Prof. Jeremy Schofield, for giving me the chance

to work in his group and for all of his support, patience, guidance and friendship during

my PhD studies.

Thanks also go to Prof. Stuart Whittington and Prof. Gilbert Walker, the other

members of my advisory committee for all their advice and generous support during my

PhD program, and to my M.Sc. supervisor, Prof. Donald Sullivan, for his continuous

support.

I would like to extend very sincere thanks to Dr. Ramses van Zon whose help and

advice were essential for many challenging parts of my PhD project. Both when he was

part of the group and then when he joined SciNet, he always had time for my numerous

questions. He shared his knowledge with me in the most effective way and suggested very

smart and practical solutions. I should also thank my friend and colleague in CPTG, Dr.

Ali Nassimi, for his useful advice and comments regarding my PhD project.

I should express my deepest gratitude to the Department of Chemistry at the Univer-

sity of Toronto, Ontario Ministry of Training, Colleges and Universities, and the Natural

Sciences and Engineering Research Council of Canada for financial assistance; and also

SciNet HPC Consortium for providing outstanding computational facilities.

I owe much to my parents for their tremendous support during all my life. Their


continuous belief in me and their encouragement have helped me to move passionately

towards my goals. I will always remember their joy and happiness when they heard I

defended my thesis successfully. Besides this, great thanks go to my wonderful brothers

Saeed and Saber for being supportive and wise friends to me throughout our lives.

Finally, I should thank Fatemeh Jafargholi, my wife, for all she has done for me;

especially in the last year when her life was totally affected by my continuous struggle

with the project. I should thank her for understanding me and supporting me in all

aspects of my life during these years. Her love is the main asset of my life, and her

support, encouragement, comfort and suggestions are priceless.

Thank you.


Contents

1 Introduction

1.1 Protein Folding Problem

1.2 Main Viewpoints

1.3 Energy Landscape

1.4 Theoretical Studies of Protein Folding

1.4.1 Using DMD to Study Protein Folding

1.5 Role of Solvent

1.6 Thesis Outline

2 Simulation Techniques

2.1 Different Simulation Techniques

2.1.1 Monte Carlo Method

2.1.2 Molecular Dynamics (MD)

2.1.3 Hybrid Monte Carlo

2.2 Discontinuous Molecular Dynamics (DMD)

2.2.1 Event Tree

2.2.2 Cell Crossing Event

2.2.3 Collision Event

2.2.4 Measurement Events

2.3 Parallel Tempering Method

2.3.1 Parallel tempering

2.3.2 Efficient Parallel Tempering Dynamics

2.4 Simulation Structure of the Project

2.4.1 Parallel Programming

3 Protein-like Chain Without a Solvent

3.1 Model

3.1.1 Definition of configurations

3.1.2 Temperature independence of relative configurational entropies

3.2 Results

3.2.1 Parallel tempering efficiency

3.2.2 Observed structures

3.2.3 Free energy landscape

3.2.4 Entropy and free energy calculation for the model B 25-bead chain

3.2.5 Entropy and free energy calculation for 35 beads protein-like chain

3.2.6 Effects of the protein-like chain length

4 Protein-like Chain Inside a Solvent

4.1 The System

4.1.1 The solvent model

4.1.2 Definition of Configuration

4.1.3 Simulation Structure

4.2 Results

4.2.1 Parallel tempering efficiency

4.2.2 Phase of the solvent

4.2.3 Observed structures and Free energy landscape

4.2.4 Relative configurational entropy

5 Simple Dynamics Using Smoluchowski Equation

5.1 Model

5.2 Smoluchowski dynamics

5.2.1 First passage time approach to rate constants

5.2.2 Numerical test of microscopic rate expressions

5.3 Markov model of configurational dynamics

6 Conclusions, Summary and Future Work

6.1 Free Energy Landscape in the Absence of a solvent

6.2 Free Energy Landscape for a Chain Solvated by a Square-Well Fluid

6.3 Simple Dynamics Using Smoluchowski Equation

6.4 Future work

Appendices

A Heat Capacity and Compressibility

A.1 Heat Capacity

A.2 Compressibility

B Temperature sets in PT

B.1 In the absence of solvent

Bibliography


Chapter 1

Introduction

1.1 Protein Folding Problem

Proteins naturally tend to assume conformations that range from a wide variety

of extended configurations at high temperatures to unique three-dimensional “folded”

structures at physiological conditions that play functional roles in an organism. Under-

standing how proteins fold has been a very challenging problem in physics, chemistry and

biology[1], and was considered by Science as one of the 100 biggest unsolved problems

that span the sciences[2]. There are two major challenges in understanding the physics

of protein folding. The first one is concerned with the prediction of three dimensional

configuration from a one dimensional sequence of amino acids, and the second one con-

cerns understanding the mechanism by which an unfolded structure ends up in a folded

state[3, 4]. Obviously, these two questions are connected and finding a complete solution

for the second problem would likely provide great insight into the connection between

the sequence and the structure. However, the first problem will likely be solved by other

methods that are mainly based on the understanding of known structures[4]. Character-

izing and understanding how the free energy of a protein depends on its configuration

(the free energy landscape) would significantly help in answering the second question and


consequently, the first one.

One of the main characteristics of proteins is their complex phase behavior, where

the main phases are: the denatured coil state with a random structure, a relatively

structured compact globule state, and the native state[5]. The compact globule state

plays the role of an intermediate between the completely unfolded denatured state and

the native folded structure[6].

1.2 Main Viewpoints

Historically, there are two opposing views of the folding mechanism. The first one suggests

that for a specific sequence of amino acids the native structure is the most stable state,

which means that the free energy associated with the specific three dimensional structure

of the protein (protein configuration) is the lowest one among all possible configurations.

However, the other viewpoint maintains that the native structure is the structure that is

kinetically most accessible[3].

Nobel Prize Laureate Christian Anfinsen, a pioneer of the first viewpoint, proposed

a postulate that has been called the “second translation of the primary structure”[7].

He proposed that all the required information for folding into a functional and three

dimensional structure exists in the sequence of amino acids[8]. It was shown by Anfinsen

et al. that the unique secondary and tertiary structure of fully reduced ribonuclease

is thermodynamically the most stable configuration[9]. Later, based on the results of

his numerous denaturation-renaturation experiments, Anfinsen proposed one of the most

famous postulates in molecular biology, known as the “thermodynamic hypothesis” (or

Anfinsen’s dogma): The native configuration of a protein is associated with the global

minimum of the Gibbs free energy[10, 11], where the native configuration is unique,

stable and kinetically accessible. For some proteins, independent of the folding pathway,

native activity is thermodynamically reversible; thus, the native states are stable like


crystals and do not exhibit the characteristic long-lived metastable states of glasses where

the structure depends on the preparation history[12]. A recent study on 1018 known

protein folds provided strong evidence in support of Anfinsen’s idea that the native

folded structure of protein domains is dictated by the information encoded in the amino

acid sequence[13].

In contrast to Anfinsen, Levinthal argued that exploring all conformations

of an average protein would take thousands of years and that, therefore, there is not

sufficient time for the protein to find the global free energy minimum. Hence, kinetic path-

ways that are readily accessible must determine the conformation of a protein[14, 15, 16].

In this viewpoint the native structure is kinetically trapped in a low energy, long-lived

configuration. In his words, this idea is explained as “if the final folded state turned out

to be the one of lowest configurational energy, it would be a consequence of biological

evolution and not of physical chemistry”[15], and he proposed that the folding happens

following specific pathways[15]. The main argument of Levinthal, known as “Levinthal’s

paradox”, is about the length of time required for a protein to find its global free energy

minimum from a very large number of energetically accessible conformations. For ex-

ample, if we assume that there are only two possible conformations for each amino acid,

a protein with only 100 amino acids has on the order of $2^{100} \approx 10^{30}$ different possible

conformations[14]. Therefore, the folding process is likened to searching for a needle in

a haystack or hitting the hole in a large flat golf course by rolling a ball randomly [12].

Because of the folding time problem introduced by Levinthal, the second major question

of protein folding introduced in section 1.1 has been divided by Dill et al. into two

major questions of the folding code and the folding speed[17]. The first one deals with

the thermodynamic aspect of how the interatomic forces acting on the protein amino

acids form a native structure, and the second one concerns the kinetic aspect of how a

protein evolves quickly into a functional structure.
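The counting argument behind Levinthal's paradox can be reproduced in a few lines. The sketch below is illustrative only: the function name and the sampling rate of 10^13 conformations per second are assumptions that do not come from the text, and the resulting search time scales directly with whatever rate is assumed.

```python
def levinthal_estimate(n_residues, states_per_residue=2, samples_per_second=1e13):
    """Return (number of conformations, exhaustive search time in years).

    samples_per_second is an assumed, purely illustrative sampling rate.
    """
    n_conformations = states_per_residue ** n_residues
    seconds = n_conformations / samples_per_second
    years = seconds / (365.25 * 24 * 3600)
    return n_conformations, years


# Two conformations per amino acid, 100 amino acids, as in the example above.
n_conf, years = levinthal_estimate(100)
print(f"conformations: {n_conf:.3e}")
print(f"exhaustive search time: {years:.3e} years")
```

Even with only two states per residue, an exhaustive random search is far too slow to explain observed folding times, which is the point of the needle-in-a-haystack picture.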

Some researchers have tried to connect these two viewpoints by identifying the fea-


tures for rapid folding to the global free energy minimum. Using a lattice Monte Carlo

model, Shakhnovich and co-workers suggested that a sufficiently large energy difference

between the native configuration and other structures is a necessary and sufficient

condition for fast folding. If the energy of the native state is much lower than that of all

other metastable configurations, the protein folds rapidly and avoids becoming

kinetically trapped in locally stable configurations for long periods of time[18]. However, the

use of lattice models to study qualitative aspects of protein folding has been criticized on

the grounds that the configurational space in lattice models is relatively small compared

to real off-lattice proteins. The restricted number of configurations makes them a poor

model to resolve “Levinthal’s paradox”, which concerns the difficulty of finding a folded

state from a large number of conformations[16].

While some experiments suggest that small proteins follow the thermodynamic hy-

pothesis [16, 19], there are proteins for which the native structure is not the thermody-

namically most stable configuration[16]. For some amino acid sequences it is predicted

that naturally-occurring proteins either fold to a configuration that is not a global mini-

mum of the free energy or are part of a subset of amino acid sequences that can fold to

the global free energy minimum in a reasonable time[14]. For example, the protein

misfolding that occurs in many diseases, such as Alzheimer’s and Creutzfeldt-Jakob disease, has been

attributed to folding to an alternate lower energy state that acts as a kinetic trap[8, 16].

Several experimental results indicate the existence of specific pathways in which a

given protein folds[20], supporting Levinthal’s claim[15]. Based on these observations,

there are intermediates even for single-domain proteins that accumulate during the first

few milliseconds of folding, and even for proteins that seem to fold in a two-state reaction,

specific pathways have been characterized by protein intermediates[20]. However, there

is no comprehensive theoretical explanation for this behavior[3].


1.3 Energy Landscape

Generally, the term energy landscape refers to the characteristic shape and topology or

form of the free energy as a function of protein conformation to explain some aspects of

folding behaviors[14]. Stable or metastable configurations of the system can be unam-

biguously identified either through structural features or through basins in the potential

energy. Typically, the landscape of complex systems is covered with many local min-

ima that can lead to complicated thermodynamic and dynamical behavior[14, 21]. The

configurational space of a protein (composed of the set of values of all degrees of

freedom of the molecular system) is a highly multi-dimensional space. Even for small

proteins, the dimensionality of this space is on the order of a thousand[22]. Within the

high dimensional space, the energy landscape (free energy as a function of protein con-

figurational space) features many local minima and entropic barriers. In addition, there

can be flat areas of nearly equal free energy in the landscape, which effectively trap the

system under its normal dynamics and greatly lengthen the time needed for it

to reach the native structure. The free energy landscape

of many proteins is extremely rugged around the global minimum due to the relatively

close packing of the amino acids and their atoms in the native configuration[10].

Statistical mechanical modeling has helped significantly in answering Levinthal’s para-

dox by postulating that folding happens in funnel-shaped energy landscapes that allow

multiple efficient folding pathways rather than involving a single microscopic pathway[17].

Onuchic, Dill, Wolynes and co-workers proposed that a “folding funnel” is the special

characteristic of foldable proteins that directs the folding protein into the native state

without the need for a definite pathway[12, 14, 23, 24, 25, 26, 27, 28]. Based on this idea,

the landscape that pertains to the form of either energy or the free energy as a function

of protein conformation is shaped as a funnel. Protein folding is viewed as a process

in which the protein glides down the funnel shaped free-energy landscape along several

different paths towards its native structure[25, 26, 28, 29]. Thus, the low free energy


Figure 1.1: A rugged funnel shaped energy landscape[27]

structures of the free energy landscape are at the bottom of a broad valley and therefore,

a protein molecule in a conformation within one of the valleys can dynamically funnel to

the lowest free energy state. A rough picture of this funnel, made by Ken Dill et al., is

presented in Fig. 1.1.

As mentioned in section 1.2, the Levinthal paradox can be presented as hitting the

hole in a large flat golf course by rolling a ball randomly. However, if a golf course

has a funnel-shaped landscape, downhill everywhere towards the hole, a hole-in-one can

happen on every attempt within a reasonable time[12]. While Levinthal postulated there should be

one specific pathway for folding[15], typically there are many possible pathways down

the funnel in the single funnel landscape[30]. However, experimental results for the fast-

folding proteins “reconciled” these two ideas by illustrating that there is a statistically

predominant pathway[31].

There are several proteins that do not have a single stable structure[30]. For example,

Levinthal showed that the mutants of E. coli have two stable forms[15], and prions

can occasionally misfold even in the absence of any mutation and consequently cause

neurodegeneration[32, 33]. Having more than one stable structure under specific

thermodynamic conditions suggests that the landscape is not a single funnel but rather

a multi-funnel free energy landscape[30]. Based on this picture, it has been suggested

that folding happens first by a kinetic step to select a specific funnel and then the protein

glides inside the selected funnel to reach its minimum free energy point[3].

Folding happens as a diffusion-like phenomenon on this landscape of hills and valleys[34].

As a result, the folding time is too long to be studied by almost all realistic computational

approaches[35, 22], and therefore, most of the theoretical studies have been carried out using

relatively simple models such as lattice models.

1.4 Theoretical Studies of Protein Folding

The most common approaches to studying protein folding are Molecular Dynamics (MD)

and Monte Carlo simulations of lattice models of biomolecular systems. As discussed

in the next chapter, standard molecular dynamics simulations are based on integration

schemes of Newtonian equations of motion that use the sequential application of coordi-

nate updates over brief time intervals that are typically on the order of a femtosecond.

Discontinuous Molecular Dynamics (DMD) is another common approach to study protein

folding. In the DMD approach, potentials are modeled as a series of discontinuous steps

between constant values of the interaction energy. Except for specific points where the

energy is discontinuous, the system is force free and the evolution equations can be solved

exactly. The effect of the discontinuities is to exert impulses that lead to discrete jumps in

the momenta of the system. Under many conditions, the system can be simulated quite

efficiently and propagated to longer times at a given computational load[36, 37, 35, 38].
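Because the particles move freely between discontinuities, the time of the next event for a pair can be obtained in closed form by solving a quadratic equation for the moment their separation equals a discontinuity distance. The following is a minimal sketch of that calculation, not the implementation used in this work; the function name and argument conventions are hypothetical.

```python
import math


def time_to_distance(rel_pos, rel_vel, sigma):
    """Earliest time t > 0 at which |rel_pos + rel_vel * t| equals sigma.

    rel_pos and rel_vel are the relative position and velocity of a pair
    in free flight. Returns None if the pair never reaches separation sigma.
    """
    rr = sum(x * x for x in rel_pos)
    vv = sum(v * v for v in rel_vel)
    b = sum(x * v for x, v in zip(rel_pos, rel_vel))
    if vv == 0.0:
        return None  # no relative motion, so no event
    c = rr - sigma ** 2
    disc = b * b - vv * c
    if disc < 0.0:
        return None  # the trajectories miss each other
    root = math.sqrt(disc)
    if c > 0.0:
        # Pair is outside sigma: an event occurs only if it is approaching.
        if b >= 0.0:
            return None
        return (-b - root) / vv
    # Pair is inside sigma: it reaches the boundary at the larger root.
    return (-b + root) / vv


# Head-on approach from separation 2 towards a discontinuity at sigma = 1.
t_in = time_to_distance((2.0, 0.0, 0.0), (-1.0, 0.0, 0.0), 1.0)
# Pair inside the discontinuity moving apart.
t_out = time_to_distance((0.5, 0.0, 0.0), (1.0, 0.0, 0.0), 1.0)
```

In an event-driven scheme, the smallest such time over all pairs and discontinuities determines the next event, after which the momenta of the pair involved are updated by an impulse.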

In typical lattice models of biopolymers, amino acids are represented as single beads

that can only occupy sites on the underlying lattice. Typically this lattice is cubic with


several different types of beads. The interaction potential energy is defined between

neighboring beads occupying adjacent lattice sites, depending on the bead types. One

of the most popular lattice models is the HP model in which only two kinds of beads

are introduced to model hydrophobic (H) and polar (P) amino acids. Monte Carlo

simulation on lattice models typically is even simpler than discontinuous potential models

since the restriction of positioning beads on lattice sites implies they have a relatively

small configurational space that can be efficiently explored at low computational cost[35].

However, because of the limited possibility of angles between bonds, lattice models fail

to adequately describe the geometrical properties of proteins[35] and the entropy range

of configurations is greatly reduced. Lattice models cannot adequately address

“Levinthal’s paradox” since, unlike in a real protein, almost all conformations are kinetically

accessible in lattice models[16]. To overcome these obstacles, models amenable to DMD

simulation in which simulations can reach realistic time scales and which mimic the basic

thermodynamic properties of proteins, have become popular[5, 35].
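To make the lattice picture concrete, the sketch below evaluates the energy of an HP chain on a cubic lattice. It assumes the common convention that each pair of H beads on adjacent sites that are not consecutive along the chain contributes an energy of -epsilon; the function name, the value of epsilon, and the example sequence are hypothetical.

```python
def hp_energy(sequence, positions, epsilon=1.0):
    """Energy of an HP chain on a cubic lattice.

    sequence  : string of 'H' and 'P' characters, one per bead
    positions : list of integer (x, y, z) lattice sites, one per bead
    Each non-bonded H-H pair on adjacent sites contributes -epsilon.
    """
    if len(sequence) != len(positions):
        raise ValueError("sequence and positions must have the same length")
    if len(set(positions)) != len(positions):
        raise ValueError("two beads occupy the same lattice site")
    energy = 0.0
    n = len(sequence)
    for i in range(n):
        for j in range(i + 2, n):  # skip beads bonded along the chain
            if sequence[i] == "H" and sequence[j] == "H":
                # Manhattan distance 1 means adjacent lattice sites.
                dist = sum(abs(a - b) for a, b in zip(positions[i], positions[j]))
                if dist == 1:
                    energy -= epsilon
    return energy


# A four-bead chain: extended versus folded into a square.
extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
folded = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
e_extended = hp_energy("HPPH", extended)
e_folded = hp_energy("HPPH", folded)
```

The small, discrete set of allowed positions is what makes such models cheap to sample, and it is also the source of the geometric limitations noted above.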

1.4.1 Using DMD to Study Protein Folding

DMD of biomolecular systems is based on simple models in which detailed, smoothly

varying interaction potentials between constituents of the system are replaced with dis-

continuous potentials of stepped form. Such models can be designed to capture the

qualitative behavior of proteins at low computational cost. The potentials typically

must consist of flat areas with different potential values. For instance, attraction and

repulsion can be defined as step and shoulder potentials respectively.

These stepped forms also make it possible to use the potentials as natural index

functions for use in classifying and comparing different protein configurations. As we

will show, such a classification scheme has the added benefit of temperature-independent

relative configurational entropies in the absence of a solvent. This is quite different

from models for which MD may be used. There, the classification of structures is less


natural since structures are identified based on arbitrary critical distances which are not

clearly distinguished in the interaction potentials. Besides this, the relative configurational

entropies are (at least somewhat) temperature dependent.

The collapsed phase of a protein-like chain can be studied using simple DMD dynam-

ics starting from a random unfolded structure at a very high temperature, and decreasing

the temperature in several steps during the simulation, until reaching a specific

temperature below room temperature. While the configuration obtained at the end of this

annealing process is in a collapsed phase, it is not guaranteed that this is the native

structure of the protein-like chain, since it can be a metastable local energy minimum.

If the free energy landscape is rugged, the latter would actually be more likely. In order

to distinguish between compact configurations and the “folded” native structure at the

global minimum of the free energy, it is therefore necessary to study the free energy

landscape.

As will be discussed in Chapter 2, the investigation of the free energy landscape

can be done using a Hybrid Monte Carlo (HMC) method. Usually, HMC is implemented

as a combination of the Monte Carlo and MD methods. Here, we combine a Monte Carlo

procedure with a dynamical updating scheme based on DMD.

1.5 Role of Solvent

There is no consensus on how significant the role of the solvent is for protein folding,

or what interaction or set of interactions play the main role[39]. Some researchers

have proposed that folding is a balance between entropy-dominated and enthalpy-dominated

hydration[40, 41], while there have been experiments that have shown that a protein can

fold into its native configuration with apparently negligible solvent ordering effects[39].

However, in nature and therefore in almost all experimental studies, folding occurs

in the presence of a fluid environment (in vitro or in vivo). Some experimental studies


suggest that “a significant portion of the fold-dictating information is encoded by the

atomic interaction network in the solvent-unexposed core of protein domains”[13].

To obtain a useful description of the energy landscape of a folding protein, the free

energy should be averaged over solvent coordinates, so that the energy landscape becomes

a function of the coordinates of the protein atoms only[28, 14].

1.6 Thesis Outline

In Chapter 2, some of the simulation techniques used to explore the free energy landscape

of protein-like systems, such as DMD (Sec. 2.2) and the Parallel Tempering (PT) method

(Sec. 2.3), will be introduced. It will be shown that object-oriented programming as well

as parallel programming can be implemented easily for the PT method.

In Chapter 3, studies of the energy landscape of a protein-like chain in the absence

of any fluid will be presented. To capture the basic behavior of proteins in a reasonable

computational time, discontinuous potentials are used for the interactions in these models,

where attraction and repulsion are defined as step and shoulder potentials respectively.

It will be shown that, for a family of such simple protein models, each consisting of a

periodic sequence of four different kinds of bead, the protein-like chains

exhibit an alpha-helical secondary structure in their folded states. It will be shown that in

these cases the relative configurational entropies of the protein-like chains are independent

of temperature, which makes it possible to compute the relative configurational entropies

and the free energies of the configurations very accurately. Relative configurational free

energies at different temperatures can be determined from relative populations at those

temperatures. The free energy results can be interpreted in terms of the free energy

landscape picture. For example, if at a specific temperature the population of the most

common structure is around 99% of the total population, this is a sign of a deep free

energy valley in the landscape belonging to that particular structure at that temperature.


Such understanding of the free energy landscape is the main objective of this work.
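The link between populations and free energies used here is the standard canonical relation F_a - F_b = -k_B T ln(P_a / P_b). The sketch below applies it in reduced units with k_B = 1; the function names and the two-configuration example are hypothetical and serve only to illustrate the 99% population case mentioned above.

```python
import math


def relative_free_energy(p_a, p_b, temperature, k_b=1.0):
    """Free energy of configuration a relative to b from their populations."""
    return -k_b * temperature * math.log(p_a / p_b)


def probabilities_from_free_energies(free_energies, temperature, k_b=1.0):
    """Normalized Boltzmann probabilities predicted from free energies."""
    f_min = min(free_energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(f - f_min) / (k_b * temperature)) for f in free_energies]
    total = sum(weights)
    return [w / total for w in weights]


# Most common structure holds 99% of the population at reduced temperature 1.
delta_f = relative_free_energy(0.99, 0.01, temperature=1.0)
# Feeding the free energies back in recovers the observed populations.
predicted = probabilities_from_free_energies([delta_f, 0.0], temperature=1.0)
```

A population ratio of 99 to 1 corresponds to a free energy gap of ln(99), roughly 4.6 k_B T, which is the sense in which a dominant structure indicates a deep valley in the landscape.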

In Chapter 4, the free energy landscape of a protein-like chain in the presence of a

square-well fluid is computed and contrasted to the unsolvated system. All interactions

in the system are defined in terms of discontinuous potentials. Similar to the previous

chapter, the investigation of the free energy landscape is done using a Hybrid Monte Carlo

(HMC) method, where HMC is implemented as a combination of the Monte Carlo and the

DMD method. The Parallel Tempering (PT) method [42, 43, 44] is used for the Monte

Carlo part to avoid getting trapped in local free energy minima and to increase the speed

of phase space exploration[45]. The parallel tempering method allows configurations to

be generated with weights given by the canonical ensemble over a range of temperatures.

It will be argued that the relative configurational entropies in the presence of the fluid

particles can be temperature dependent. The phase of the solvent used will be studied

and compared to previous studies. It will be shown that the existence of a phase transition

can have a large impact on the efficiency and usefulness of the PT sampling approach.
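In the usual formulation of parallel tempering, two replicas at inverse temperatures beta_i and beta_j exchange configurations with probability min(1, exp[(beta_i - beta_j)(E_i - E_j)]), which preserves the canonical distribution at each temperature. The sketch below implements this textbook rule; it is not necessarily the exchange scheme used in this work, and the function names and list-based replica storage are assumptions.

```python
import math
import random


def swap_acceptance(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(betas, configs, energies, i, j, rng=random):
    """Try to exchange the configurations held at temperatures i and j.

    betas stay fixed; only the configurations and their energies move.
    Returns True if the exchange was accepted.
    """
    p_accept = swap_acceptance(betas[i], betas[j], energies[i], energies[j])
    if rng.random() < p_accept:
        configs[i], configs[j] = configs[j], configs[i]
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False


# The colder replica (larger beta) holds the higher energy, so the swap
# is always accepted.
betas = [2.0, 1.0]
configs = ["config_a", "config_b"]
energies = [-1.0, -3.0]
accepted = attempt_swap(betas, configs, energies, 0, 1)
```

Exchanges are accepted often only when the energy distributions at neighboring temperatures overlap, which is one reason a phase transition in the solvent can degrade sampling efficiency.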

In Chapter 5, we review a simple model of the dynamics of a protein-like chain stud-

ied in the previous chapters, Model B, in the presence of an implicit solvent. The model

assumes that the implicit solvent interacts on a fast time scale with the chain beads

compared to the time scale for structural rearrangements of the chain, and the implicit

solvent provides sufficient friction so that the motion of all beads in the protein-like

chain is governed by the Smoluchowski equation. It will be shown through simulation

of a stochastic model of the evolution of the system that the dynamics of transitions

between microstates of the chain is well described by the first-passage time solution of

the Smoluchowski equation. The individual rates between microstates are incorporated

into a Markovian model of the relaxation of the chain. Using this model, the equili-

bration process from an ensemble of initially extended configurations to mainly folded

configurations is investigated at low effective temperatures for a number of different chain

lengths.


Finally, Chapter 6 presents the conclusions drawn from the results of the previous chapters.

Chapter 2

Simulation Techniques

2.1 Different Simulation Techniques

The content of this section is a brief introduction to different simulation techniques

that are used in protein folding studies. These methods are discussed extensively in

refs. [36, 46, 47, 48].

Monte Carlo (MC) and Molecular Dynamics (MD) are the two most common approaches for numerical studies of many-particle systems. The objective of these numerical methods is to simulate a system under some approximations and to measure properties from which quantities that are infeasible to calculate analytically can be computed.

2.1.1 Monte Carlo Method

Monte Carlo methods are a class of computational algorithms that use repeated random sampling to compute a specific property. For example, Monte Carlo sampling can be used to compute a simple integral:

$$I = \int_a^b f(x)\,dx = |b-a|\,\langle f(x)\rangle \approx |b-a|\,\frac{1}{N}\sum_{i=1}^{N} f(x_i), \tag{2.1}$$


where N is the number of random samples and the $x_i$ are values drawn uniformly from the interval [a, b]. As N → ∞ the two sides of Eq. 2.1 become equal.
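As an illustration of Eq. 2.1, the following minimal Python sketch estimates the integral of sin(x) over [0, π], whose exact value is 2. The function name `mc_integrate`, the integrand, the seed, and the sample count are illustrative choices and are not taken from the thesis code.

```python
import math
import random

def mc_integrate(f, a, b, n, rng):
    """Estimate the integral of f over [a, b] using n uniform samples (Eq. 2.1)."""
    total = 0.0
    for _ in range(n):
        total += f(rng.uniform(a, b))   # x_i drawn uniformly from [a, b]
    return abs(b - a) * total / n

rng = random.Random(12345)              # fixed seed, chosen arbitrarily
estimate = mc_integrate(math.sin, 0.0, math.pi, 200_000, rng)
print(estimate)                         # close to the exact value, 2
```

The statistical error of such an estimate decreases as 1/√N, which is why uniform sampling becomes inefficient when the integrand is concentrated in a small part of the domain.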

In many problems of interest in statistical physics, where the systems have many

degrees of freedom, Monte Carlo methods can be applied to compute high dimensional

integrals that correspond to ensemble averages. For the canonical ensemble, where the

number of particles and temperature are fixed, the equilibrium average of an observable

F can be expressed in terms of configuration space integrals as:

$$\langle F \rangle_T = \frac{1}{Z}\int F(r^N)\exp[-U(r^N)/k_bT]\,dr^N, \tag{2.2}$$

where U(rN), r, N and T are the potential energy, coordinate, number of particles and

temperature respectively, and Z is

$$Z = \int \exp[-U(r^N)/k_bT]\,dr^N. \tag{2.3}$$

Here, the most efficient Monte Carlo sampling can be applied if each sample is chosen

according to the probability weight of

$$w(r^N) = \exp[-U(r^N)/k_bT], \tag{2.4}$$

which is the Boltzmann factor. Eq. 2.2 can be re-expressed as:

$$\langle F \rangle_T = \frac{1}{Z}\int \frac{F(r^N)\exp[-U(r^N)/k_bT]}{w(r^N)}\,w(r^N)\,dr^N, \tag{2.5}$$

and consequently:

$$\langle F \rangle_T = \frac{1}{Z}\int F(r^N)\,w(r^N)\,dr^N. \tag{2.6}$$

If a number of points, m, are randomly generated in configuration space according to

weight function Eq. 2.4, Eq. 2.6 can be written as:

$$\langle F \rangle_T \approx \frac{1}{m}\sum_{i=1}^{m} F(r^N_i), \tag{2.7}$$

where by increasing m, the right side of Eq. 2.7 becomes a better approximation of 〈F 〉T .


Markov chain Monte Carlo (MCMC)

Assuming that the Hamiltonian of the system can be written as $H = \sum_{i=1}^{N} p_i^2/2m_i + U(r^N)$, the kinetic energy separates from the potential energy, and the probability density

can be factored into a density for the spatial degrees of freedom and a Maxwell-Boltzmann

density for the momenta. The ensemble average of a property F (rN) that depends only

on the spatial degrees of freedom can be described by Eq. 2.2.

Considering Eqs. 2.6 and 2.7, a Monte Carlo procedure can be implemented using a

random walk in such a way that the visiting of a particular point rN happens with the

probability density proportional to the Boltzmann factor (exp[−βU(rN)]). Therefore, the

main task is to generate a sequence of configurations $r^N_1, r^N_2, \ldots, r^N_m$ in which the probability of finding a configuration $r^N_i$ is proportional to $\exp[-U(r^N_i)/k_bT]\,dr^N_i$ when $m \to \infty$. This sequence can be generated using stochastic methods in many different ways. The following three-step scheme was introduced by Metropolis et al.[49]:

1. Select a random configuration $r^N$ and compute its energy, $U(r^N)$.

2. Move the system slightly in configuration space from $r^N$ to $r'^N$, and calculate the new energy, $U(r'^N)$.

3. Accept the move ($r^N \to r'^N$) with probability $\min(1, \exp(-\beta[U(r'^N) - U(r^N)]))$.
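The three-step Metropolis scheme can be sketched for a single coordinate as follows. This is a minimal illustration, assuming a one-dimensional harmonic potential U(x) = x²/2 and a uniform trial displacement; the names `metropolis`, `harmonic`, and `max_step` are hypothetical and not part of the thesis code.

```python
import math
import random

def metropolis(U, x0, beta, max_step, n_steps, rng):
    """Generate a Markov chain of 1D configurations with weight exp(-beta*U)."""
    x, u = x0, U(x0)                                   # step 1: starting configuration
    chain = []
    for _ in range(n_steps):
        x_trial = x + rng.uniform(-max_step, max_step) # step 2: small trial move
        u_trial = U(x_trial)
        # step 3: accept with probability min(1, exp(-beta * (U' - U)))
        if rng.random() < math.exp(min(0.0, -beta * (u_trial - u))):
            x, u = x_trial, u_trial
        chain.append(x)                                # a rejected move repeats the old state
    return chain

def harmonic(x):
    return 0.5 * x * x                                 # illustrative potential

rng = random.Random(1)
chain = metropolis(harmonic, 0.0, beta=1.0, max_step=1.5, n_steps=200_000, rng=rng)
mean_x2 = sum(x * x for x in chain) / len(chain)       # estimate of <x^2> via Eq. 2.7
```

For this potential the canonical average of x² is 1/β, so `mean_x2` gives a simple check that the chain samples the Boltzmann weight.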

It can be shown that every point in configuration space can be reached from any other

state in a finite number of MC steps[47]. Later in this chapter, the Parallel Tempering

Method, which is one of the common MCMC methods, will be introduced.

2.1.2 Molecular Dynamics (MD)

Molecular dynamics is a very common approach for studying the properties of classi-

cal many body systems. In this approach, the equations of motions for each particle are


solved numerically using short time steps to construct approximate solutions of the equa-

tions of motion. For most classical systems, the particle positions are updated according

to the Newtonian equations of motion:

$$\dot{\vec r}_i = \frac{\vec P_i}{m}, \qquad \dot{\vec P}_i = \vec F_i = -\nabla_{r_i}\sum_{j \neq i} U(r_{ij}). \tag{2.8}$$

This means that the position of a specific particle at time t + ∆t can be understood

using the Taylor expansion:

$$x(t+\Delta t) = x(t) + \dot x(t)\,\Delta t + \frac{1}{2}\ddot x(t)\,\Delta t^2 + \cdots = x(t) + v_x(t)\,\Delta t + O(\Delta t^2), \tag{2.9}$$

where x is one of the three components of the position vector ($\vec r$), and $O(\Delta t^2)$ is the local

truncation error. Therefore:

$$x(t+\Delta t) \approx x(t) + v_x(t)\,\Delta t, \tag{2.10}$$

which is known as the “Euler scheme”. To calculate vx(t) in Eq. 2.10, the same technique

can be applied in which

$$v_x(t+\Delta t) \approx v_x(t) - \frac{1}{m}\frac{\partial U(t)}{\partial x(t)}\,\Delta t. \tag{2.11}$$

The simulation proceeds iteratively by calculating the accelerations, updating the velocities according to Eq. 2.11, and displacing the particles according to Eq. 2.10, until the target time is reached. Other integration schemes, such as the leapfrog and Verlet algorithms, can be applied instead of Eqs. 2.10 and 2.11 and have a smaller global error over the whole run.

To derive the Verlet method, four terms of the Taylor expansion of Eq. 2.9 should be kept, so that x(t + ∆t) becomes:

$$x(t+\Delta t) = x(t) + v_x(t)\,\Delta t + \frac{1}{2}a_x(t)\,\Delta t^2 + \frac{1}{6}b_x(t)\,\Delta t^3 + O(\Delta t^4), \tag{2.12}$$


where $a_x(t)$ and $b_x(t)$ are the acceleration and jerk (the third derivative of x with respect to t) at time t. Applying the Taylor expansion backward in time, x(t − ∆t) can be derived as:

$$x(t-\Delta t) = x(t) - v_x(t)\,\Delta t + \frac{1}{2}a_x(t)\,\Delta t^2 - \frac{1}{6}b_x(t)\,\Delta t^3 + O(\Delta t^4). \tag{2.13}$$

By adding these two expansions (Eqs. 2.12 and 2.13) the Verlet formula is derived as:

$$x(t+\Delta t) = 2x(t) - x(t-\Delta t) + a_x(t)\,\Delta t^2 + O(\Delta t^4). \tag{2.14}$$

Because of the cancellation of the first- and third-order terms of the Taylor expansion, the Verlet integrator is more accurate than the simple truncated Taylor expansion.
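The difference between the Euler scheme (Eqs. 2.10 and 2.11) and the Verlet formula (Eq. 2.14) can be seen in a small numerical experiment. The sketch below assumes a one-dimensional harmonic oscillator with unit mass and unit force constant, which is an illustrative test system and not one studied in this work; the central-difference velocity used for the Verlet energy is a standard estimate that is likewise not specified in the text.

```python
def accel(x):
    return -x            # harmonic oscillator with m = k = 1 (illustrative choice)

def energy(x, v):
    return 0.5 * v * v + 0.5 * x * x

def run_euler(x, v, dt, n_steps):
    for _ in range(n_steps):
        # Eqs. 2.10 and 2.11: both updates use the state at time t
        x, v = x + v * dt, v + accel(x) * dt
    return x, v

def run_verlet(x, v, dt, n_steps):
    x_prev = x - v * dt + 0.5 * accel(x) * dt * dt            # x(t - dt), from Eq. 2.13
    for _ in range(n_steps):
        x_prev, x = x, 2.0 * x - x_prev + accel(x) * dt * dt  # Eq. 2.14
    x_next = 2.0 * x - x_prev + accel(x) * dt * dt
    v = (x_next - x_prev) / (2.0 * dt)                        # central-difference velocity
    return x, v

e_start = energy(1.0, 0.0)
e_euler = energy(*run_euler(1.0, 0.0, 0.01, 10_000))
e_verlet = energy(*run_verlet(1.0, 0.0, 0.01, 10_000))
print(e_start, e_euler, e_verlet)
```

With these settings the Euler energy grows steadily over the run, while the Verlet energy stays close to its initial value.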

According to the ergodic theorem, the volume of the phase space covered by a dynam-

ical trajectory is proportional to the time of its evolution. Consequently, the time average

of a dynamical variable along a long trajectory converges to the uniform average of the

dynamical variable over a constant energy surface of phase space as the length of the

trajectory tends to infinity. The average over the constant energy hypersurface of phase

space is known as the microcanonical ensemble average. While the ergodic hypothesis is

applicable for microcanonical ensemble averages, averages in the canonical ensemble can

be shown to be equivalent to microcanonical ensemble averages over an extended phase

space in which auxiliary variables are introduced that act as a thermostat to fix the temperature. Furthermore, it can be shown that the averages of most dynamical variables in the canonical ensemble differ from their microcanonical ensemble counterparts by terms of at most order $N^{-1}$, which become insignificant for large systems.

One consequence of the typical form of the Hamiltonian is that the dynamics conserves

specific variables, such as the total energy of the system, $H = \sum_i p_i^2/2m + U(R_i)$. Energy

drift tends to happen in typical MD algorithms due to the use of a finite time step

and numerical round-off error. This means that when MD is applied for large time

scales or when MD is used for sampling phase space, the stability of the run should

be monitored to ensure that the energy of the system is stable and the trajectory is


reasonable. However, for the Verlet algorithm (Eq. 2.14), it can be shown that while the

energy fluctuates between steps, there is no energy drift, and the energy oscillates around

a constant value which is the solution of a shadow Hamiltonian (slightly different from

the real Hamiltonian)[50, 51]. However, similar to other MD methods, for large time

increments, the Verlet algorithm does not generate accurate trajectories and can become

unstable.

While the computational cost increases significantly as the time step is reduced, it

is necessary to choose a very small time step (on the order of a femtosecond) to have a

sufficiently accurate algorithm. The most efficient MD algorithm is the one that allows

the largest possible time step for a specific level of accuracy while maintaining stability

and preserving conservation laws.

The interaction potentials between particles in the system can be either detailed

models such as Lennard-Jones or coarse-grained models such as square-well potentials.

It is clear that while using detailed and accurate potentials makes the dynamics more

realistic, it increases the cost of simulation significantly. When the potentials are dis-

continuous, another approach, known as discontinuous molecular dynamics (DMD), can

be applied which has a lower computational cost than a trajectory of equivalent length

carried out using standard MD methods. In DMD the dynamics is essentially exact and

the conservation of energy does not depend on a chosen time step.

2.1.3 Hybrid Monte Carlo

Traditional MC simulation methods may suffer from strong correlation between states

that are generated in two consecutive MC steps, which means that there are only small

differences between the two states[47]. Consequently, applying MC alone can lead to a

slow rate of convergence of estimated ensemble averages, particularly for systems with

an underlying energy landscape that is rough and pitted with many local minima and

saddle points. The simulation methods that are a combination of both Monte Carlo


sampling and molecular dynamics are called Hybrid Monte Carlo (HMC), where a dy-

namical procedure is used to change a set of coordinates for use as a trial state in a MC

procedure[52].

The HMC algorithm consists of selecting a set of momenta conjugate to each spatial

coordinate based on the Maxwell-Boltzmann density, propagating the dynamical system

according to some effective Hamiltonian, and then applying an acceptance test to deter-

mine if the trial state obtained at the end of the propagation is acceptable or not. The

dynamics is time-reversible and preserves the volume of the phase space relevant for the

dynamics. A trajectory is accepted with probability min(1, exp(−β∆H)), where ∆H = Hfinal − Hinitial is the change in the total Hamiltonian between the initial

and final states. If the energy of the system is conserved exactly, as in the case of DMD

trajectories discussed in the next section, all trajectories are accepted.
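The structure of one HMC step (momentum draw, propagation, acceptance test) can be sketched as follows. This is only an illustration under stated assumptions: it uses a leapfrog integrator on a one-dimensional harmonic potential, whereas this work propagates with DMD, and the names `leapfrog` and `hmc_sample` as well as all parameter values are hypothetical.

```python
import math
import random

def leapfrog(x, p, grad_U, mass, dt, n_leap):
    """Time-reversible, volume-preserving propagation of (x, p)."""
    p -= 0.5 * dt * grad_U(x)
    for step in range(n_leap):
        x += dt * p / mass
        if step < n_leap - 1:
            p -= dt * grad_U(x)
    p -= 0.5 * dt * grad_U(x)
    return x, p

def hmc_sample(U, grad_U, x, beta, mass, dt, n_leap, n_steps, rng):
    samples, accepted = [], 0
    for _ in range(n_steps):
        p = rng.gauss(0.0, math.sqrt(mass / beta))      # Maxwell-Boltzmann momentum
        h_initial = U(x) + p * p / (2.0 * mass)
        x_trial, p_trial = leapfrog(x, p, grad_U, mass, dt, n_leap)
        h_final = U(x_trial) + p_trial * p_trial / (2.0 * mass)
        # accept with probability min(1, exp(-beta * (H_final - H_initial)))
        if rng.random() < math.exp(min(0.0, -beta * (h_final - h_initial))):
            x = x_trial
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps

rng = random.Random(7)
samples, acceptance = hmc_sample(lambda x: 0.5 * x * x, lambda x: x, 0.0,
                                 beta=1.0, mass=1.0, dt=0.3, n_leap=5,
                                 n_steps=20_000, rng=rng)
mean_x2 = sum(x * x for x in samples) / len(samples)    # should approach 1/beta
```

If the propagation conserved the Hamiltonian exactly, `h_final` would equal `h_initial` and every trial state would be accepted, which is the situation described above for DMD trajectories.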

In this project HMC is applied for the sampling of the energy landscape of protein-like

chain in which the Monte Carlo sampling is done using Parallel Tempering (PT) and the

generation of trial configurations is carried out by Discontinuous Molecular Dynamics

(DMD). Both the DMD and the PT method will be discussed extensively in the next

sections.

2.2 Discontinuous Molecular Dynamics (DMD)

The content of this section is a brief introduction to the DMD simulation method based

on the extensive discussions provided in ref. [36].

DMD is a version of MD in which the potentials are discontinuous and the system

evolves event by event rather than by iteration of a fixed time step. The discontinuous

potentials make the dynamics force free and consequently Newton’s equations are exactly

solvable. Since in the DMD method the studied system evolves event by event instead of

by sequential propagation of the system over discrete time intervals, the method is also


called an “event driven” method. There are various advantages to using DMD over MD,

especially for runs that simulate relatively long trajectories. DMD not only offers faster

computational speed, but stability as well, since the total energy in the simulation is

always conserved. A very simple scheme of the DMD simulation consists of the iteration

of predicting the first collision event and evolving the system up to the collision time.

Since one of the main goals of using DMD is to decrease the simulation cost, a

number of techniques are associated with this method to obtain an optimal computational

performance.

2.2.1 Event Tree

In event driven dynamics, collisions between particles and other events must be executed

in their proper chronological order. An important component of such a simulation is the

storage and ordering of events to simplify the search for the next event to be executed.

A number of algorithms developed for database systems, such as binary trees, are useful

to optimize performance in event driven simulations.

A binary tree is used to store and sort the different events such as collisions and

measurements, where the event time is an ordering “key”. A few functions are responsible

to insert new predicted events in the tree and search for the earliest event. The insertion

of a new event scales as log N with a small prefactor, where N is the number of events in

the tree [36]. Each node of the tree contains a number of essential pieces of information

such as the event participants, the event time and the type of event. Some of this

information can be used to understand whether an event in the tree has been invalidated

by the occurrence of an earlier event. It is easy to add or delete events because of the

tree structure. Since some of the previously predicted events become irrelevant after each

collision, for each particle several other future collisions are predicted and stored in the

tree. After executing each event, the new events are scheduled for the participant(s) of

the event. Unlike in molecular dynamics, a particle position is only updated when an event associated with that particle is executed. This means that except for the measurement events where a full update of the system occurs, each particle has a local clock that records the time when the last collision or cell crossing of that particle happened.

Figure 2.1: Cell partitioning for a chain system
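The bookkeeping described in this subsection can be sketched with a time-ordered queue. Note the assumptions: the sketch uses Python's `heapq` binary heap rather than the binary tree of ref. [36], and it detects invalidated events by comparing per-particle collision counters, which is one common approach and not necessarily the one used in this work. The class and method names are hypothetical.

```python
import heapq
import itertools

class EventQueue:
    def __init__(self, n_particles):
        self._heap = []
        self._tie = itertools.count()        # breaks ties between equal event times
        self.collisions = [0] * n_particles  # collisions executed so far, per particle

    def _stamp(self, i, j):
        return (self.collisions[i], None if j is None else self.collisions[j])

    def schedule(self, time, kind, i, j=None):
        """Insert a predicted event; the event time is the ordering key."""
        entry = (time, next(self._tie), kind, i, j, self._stamp(i, j))
        heapq.heappush(self._heap, entry)

    def pop_valid(self):
        """Return the earliest event not invalidated by an earlier collision."""
        while self._heap:
            time, _, kind, i, j, stamp = heapq.heappop(self._heap)
            if stamp != self._stamp(i, j):
                continue                     # a participant collided since prediction
            if kind == "collision":
                self.collisions[i] += 1
                self.collisions[j] += 1
            return time, kind, i, j
        return None

queue = EventQueue(3)
queue.schedule(2.0, "collision", 0, 1)
queue.schedule(1.0, "collision", 1, 2)
queue.schedule(3.0, "cell_crossing", 0)
first = queue.pop_valid()    # the collision of particles 1 and 2 at time 1.0
second = queue.pop_valid()   # the (0, 1) collision is stale, so the cell crossing is next
```

Discarding stale events lazily when they reach the top of the queue avoids having to search the structure for every event that a collision invalidates.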

2.2.2 Cell Crossing Event

Another technique used to increase the efficiency of the simulation is to divide the sim-

ulation box into cells. By using cubic cells to partition the system, the search for future

collision partners for a given particle is restricted to the particles occupying the same and

neighboring cells instead of all the particles in the system. Whenever a particle moves

out of a cell, the new neighboring cells must be checked for possible collision events.

This means that in addition to the collision prediction event, the time of cell crossing

of every particle should be computed and stored in the event tree. The chosen cell size should be larger than any critical interaction distance, so that checking only the neighboring cells of a particle is sufficient to predict its next events. A basic scheme of cell partitioning for a chain consisting of several beads is presented in Fig. 2.1.
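A minimal sketch of the cell partitioning idea is given below. It assumes cubic cells on an unbounded grid without periodic boundaries, which is a simplification made for illustration; the function names and the example coordinates are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

def cell_index(pos, cell_size):
    """Integer (ix, iy, iz) index of the cubic cell containing pos."""
    return tuple(math.floor(c / cell_size) for c in pos)

def build_cells(positions, cell_size):
    """Map each occupied cell to the list of particle indices inside it."""
    cells = defaultdict(list)
    for k, pos in enumerate(positions):
        cells[cell_index(pos, cell_size)].append(k)
    return cells

def candidate_partners(k, positions, cells, cell_size):
    """Particles in the same cell as k or in one of its 26 neighboring cells."""
    cx, cy, cz = cell_index(positions[k], cell_size)
    found = []
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        for m in cells.get((cx + dx, cy + dy, cz + dz), ()):
            if m != k:
                found.append(m)
    return found

positions = [(0.1, 0.1, 0.1), (0.9, 0.2, 0.1), (1.1, 0.1, 0.1), (3.5, 3.5, 3.5)]
cells = build_cells(positions, cell_size=1.0)
near_0 = sorted(candidate_partners(0, positions, cells, 1.0))
near_3 = candidate_partners(3, positions, cells, 1.0)
```

Because the cell size is at least as large as the largest interaction distance, any pair that can interact before the next cell crossing is guaranteed to appear in the candidate list.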

2.2.3 Collision Event

Predicting the collisions between particles plays an important role in the DMD simulation.

The prediction of new events is done by checking the distances and the relative velocities

of all pairs of particles in adjacent cells. The next interaction time for each pair of particles


must be determined for storage in the binary tree. The interaction times are determined

by solving for the time at which the distance between the pair reaches a critical value.

At the collision $|\vec r + \vec v\tau| = \lambda$, where $\vec r = \vec r_i - \vec r_j$ is the initial relative position of particles i and j, $\vec v = \vec v_i - \vec v_j$ is their initial relative velocity, and λ is the critical distance between the pair, where the potential has a discontinuity. Based on this:

$$\tau = \frac{-b + \alpha\sqrt{b^2 - v^2(r^2 - \lambda^2)}}{v^2}, \tag{2.15}$$

where $b = \vec r \cdot \vec v$ and α can be either 1 or −1.

The collisions can be identified by categorizing the collision events into two main

categories. The first category consists of hard sphere collisions in which the potential

energy does not change, while the second category consists of events in which the potential

energy of the system changes and the total kinetic energy of the colliding particles will

change. For a hard-core collision, α = −1, since the smaller positive solution of Eq. 2.15 represents the first time at which the separation between the two particles becomes equal to λ. For two particles to collide, Eq. 2.15 must have a real positive solution, which requires b < 0 and $b^2 - v^2(r^2 - \lambda^2) \geq 0$. Applying the laws of conservation of

energy and momentum, the velocities of interacting particles i and j should change after

a collision by:

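$$\Delta\vec v_i = -\frac{2m_j\,b}{(m_i + m_j)\,\lambda^2}\,\vec r \tag{2.16}$$

$$\Delta\vec v_j = \frac{2m_i\,b}{(m_i + m_j)\,\lambda^2}\,\vec r \tag{2.17}$$

where $m_i$ and $m_j$ are the masses of particles i and j respectively. It is clear that when $m_i = m_j$, $\Delta\vec v_i = -\Delta\vec v_j = -(b/\lambda^2)\,\vec r$.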
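Eqs. 2.15 to 2.17 can be checked numerically with a short sketch. The example below assumes a head-on encounter of two hard spheres with hypothetical masses, positions and velocities; in the velocity update, the relative position and b are evaluated at the moment of contact. Function names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def collision_time(r, v, lam, alpha=-1):
    """Eq. 2.15; returns None when there is no real positive solution."""
    b, v2 = dot(r, v), dot(v, v)
    disc = b * b - v2 * (dot(r, r) - lam * lam)
    if v2 == 0.0 or disc < 0.0:
        return None
    tau = (-b + alpha * math.sqrt(disc)) / v2
    return tau if tau > 0.0 else None

def hard_sphere_kick(r, v, mi, mj, lam):
    """Velocity changes of Eqs. 2.16 and 2.17, with r and v taken at contact."""
    c = 2.0 * dot(r, v) / ((mi + mj) * lam * lam)
    dvi = tuple(-mj * c * x for x in r)
    dvj = tuple(mi * c * x for x in r)
    return dvi, dvj

# Hypothetical example: two particles approaching along the x axis
ri, rj = (0.0, 0.0, 0.0), (3.0, 0.0, 0.0)
vi, vj = (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)
mi, mj, lam = 1.0, 2.0, 1.0

r0 = tuple(a - b for a, b in zip(ri, rj))
v = tuple(a - b for a, b in zip(vi, vj))
tau = collision_time(r0, v, lam)                       # time of first contact
r_contact = tuple(x + tau * y for x, y in zip(r0, v))  # relative position at contact
dvi, dvj = hard_sphere_kick(r_contact, v, mi, mj, lam)
vi_new = tuple(a + b for a, b in zip(vi, dvi))
vj_new = tuple(a + b for a, b in zip(vj, dvj))
```

Comparing the total momentum and kinetic energy before and after the update confirms that both are conserved, as required for a hard-core collision.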

For entering and exiting a potential well or shoulder, α in Eq. 2.15 should be −1 and

1 respectively. However, the feasibility of exiting a well (or entering a shoulder potential)


depends on the kinetic energy of the particles. Since energy is gained or lost at the discontinuity, the velocity change at the collision in the case of entering and exiting a well (or shoulder) is:

$$\Delta\vec v_i = \frac{-(b\mu) + \alpha\sqrt{(b\mu)^2 - 2r^2\mu\,\Delta u}}{r^2 m_i}\,\vec r, \tag{2.18}$$

where ∆u is the potential depth of the well (or potential height of the shoulder), µ is the reduced mass, $m_i m_j/(m_i + m_j)$, and the value of α for entering and exiting the well is −1 and 1 respectively. In the case that $m_i = m_j = m$, Eq. 2.18 simplifies to:

$$\Delta\vec v_i = -\Delta\vec v_j = \frac{-b + \alpha\sqrt{b^2 - 4r^2(\Delta u/m)}}{2r^2}\,\vec r. \tag{2.19}$$
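A short numerical check of Eq. 2.18 is sketched below. The sign convention is an assumption made for this example: `du` is taken as the signed change in potential energy at the discontinuity (negative when entering an attractive well, positive when leaving it), and the velocity change of particle j follows from momentum conservation. A negative discriminant signals that the pair lacks the kinetic energy to cross; that case must be handled separately and is not shown. Names and values are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def well_kick(r, v, mi, mj, du, alpha):
    """Velocity changes from Eq. 2.18; None if the step cannot be crossed."""
    mu = mi * mj / (mi + mj)                  # reduced mass
    b, r2 = dot(r, v), dot(r, r)
    disc = (b * mu) ** 2 - 2.0 * r2 * mu * du
    if disc < 0.0:
        return None                           # insufficient kinetic energy
    s = (-(b * mu) + alpha * math.sqrt(disc)) / r2
    dvi = tuple(s / mi * x for x in r)
    dvj = tuple(-s / mj * x for x in r)       # momentum conservation
    return dvi, dvj

def kinetic(vi, vj, mi, mj):
    return 0.5 * mi * dot(vi, vi) + 0.5 * mj * dot(vj, vj)

# Entering a well of depth 1 at separation 1.5 (equal unit masses)
r = (-1.5, 0.0, 0.0)                          # r_i - r_j at the discontinuity
vi, vj = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)
v = tuple(a - b for a, b in zip(vi, vj))
dvi, dvj = well_kick(r, v, 1.0, 1.0, du=-1.0, alpha=-1)
vi_new = tuple(a + b for a, b in zip(vi, dvi))
vj_new = tuple(a + b for a, b in zip(vj, dvj))
gain = kinetic(vi_new, vj_new, 1.0, 1.0) - kinetic(vi, vj, 1.0, 1.0)

# A slow pair trying to leave the same well cannot cross the step
stuck = well_kick(r, (-0.1, 0.0, 0.0), 1.0, 1.0, du=1.0, alpha=1)
```

In this example the kinetic energy gained on entering equals the well depth, so the total energy is unchanged.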

2.2.4 Measurement Events

Besides cell crossings and collisions, there are a few measurement events that occur at specific times or after a specific number of events. As mentioned earlier, unlike MD,

during the DMD simulation, each particle has its own clock recording the last time its

coordinates were updated. To record an instantaneous configuration of the system, the

coordinates of all particles in the system must be updated. Such a procedure is signalled

in the event tree by a measurement event.

Observables

To track changes in the protein-like chain, properties such as the end-to-end vector (Re),

radius of gyration, potential and kinetic energies and principal moments of inertia are

measured. The most important thing that must be recorded during the simulation is the

chain configuration, which is represented by its bonds.

The radius of gyration, Rg, the root mean square distance of the beads of the protein-

like chain from the chain center of mass, is defined as


$$R_g = \sqrt{\frac{1}{N}\sum_{k=1}^{N} (\mathbf{r}_k - \mathbf{r})^2}, \tag{2.20}$$

where N is the number of beads in the protein-like chain and r is the chain center of

mass. Besides this, since the main objective of this study is to investigate the populations

of different structures, the matrix representing the configuration structure is stored in a

measurement event. Although end-to-end distances can be used to distinguish compact

configurations from stretched or partially collapsed conformations, distances between

only a pair of beads are not specific enough indicators to identify configurations un-

ambiguously. The radius of gyration, Rg, is a better indicator for distinguishing the

protein-like phases and it can indicate the folded structure in a more precise way than

the end-to-end distance.
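Both observables are simple to compute from the bead coordinates. The sketch below assumes beads of equal mass, so that the center of mass is the plain average of the positions as implied by the 1/N weighting in Eq. 2.20; the function names and the four-bead test geometry are illustrative.

```python
import math

def center_of_mass(positions):
    n = len(positions)
    return tuple(sum(p[d] for p in positions) / n for d in range(3))

def radius_of_gyration(positions):
    """Root mean square distance of the beads from the chain center (Eq. 2.20)."""
    com = center_of_mass(positions)
    total = sum(sum((p[d] - com[d]) ** 2 for d in range(3)) for p in positions)
    return math.sqrt(total / len(positions))

def end_to_end_distance(positions):
    """Distance between the first and last beads of the chain."""
    first, last = positions[0], positions[-1]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(first, last)))

# Four beads on the corners of a square of side 2, in chain order
beads = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]
rg = radius_of_gyration(beads)      # every bead lies sqrt(2) from the center
re = end_to_end_distance(beads)     # first and last beads are 2 apart
```

The example illustrates why the two measures carry different information: the end-to-end distance depends on only two beads, while the radius of gyration reflects the positions of all of them.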

Potential and kinetic energies are also measured frequently during the simulation.

Checking the conservation of energy is a useful tool to confirm that no collision has been missed and that no event has been executed out of order. This capability of verifying the

accuracy of a trajectory is one of the main advantages of DMD over MD. The potential

energy of the system can be measured either by checking the separation distances of all the

particles or by adding or subtracting the potential depth of a well whenever any particle

enters or exits. Since the first approach is computationally much more demanding, it is

used less frequently just to ensure that no collision is missed. The potential energy is the

main parameter for calculating the probability of exchange in the PT method and it is one

of the main tools in understanding the shape of the energy landscape. In chapter 4, the

heat capacity ($C_v$) and the isothermal compressibility ($\kappa_T$) are calculated from the fluctuations in energy and in the number of particles, respectively, to examine the thermodynamic

properties of the system and look for changes in phase of solvated systems. The details

of the relations between Cv and the energy variation and κT and the number of particles

variation will be discussed in Appendix A.


Finally, the principal moments of inertia are measured (Chapter 3) to study qualita-

tively the shape of structures.

2.3 Parallel Tempering Method

2.3.1 Parallel tempering

In Markov Chain Monte Carlo (MCMC), a collection of configurations is gathered during

the sampling, i.e., X=(S1, S2, ..., Sn), where each configuration appears in the chain

of states with a known weight. States in the Markov chain can be strongly correlated,

leading to a slow rate of convergence of some estimates to their true asymptotic averages.

Unlike a conventional MCMC, which uses one Markov chain for sampling, the parallel

tempering (PT) method can be viewed as using multiple coupled chains for sampling

and studying ensemble averages. Applying several Markov chains can help in collecting a

set of configurations in which two consecutive configurations are not strongly correlated.

This method was initially introduced by Swendsen and Wang [42] and formulated by

Geyer [43], and then it was subsequently implemented, developed and applied in physics

as the PT method by Tesi, van Rensburg, Orlandini and Whittington [44]. This Monte

Carlo sampling method can increase the mobility of exploring phase space, particularly

at low temperatures, and it converges to the desired distribution more rapidly, especially

for systems with rugged potential energy landscapes. Therefore, this method has been

applied to study protein folding [10].

Considering βi = 1/kbTi, where kb and Ti are the Boltzmann constant and the system

temperature respectively, the main idea of PT is to select a set of values β0 < β1 <

β2 < · · · < βn within a chosen range of temperatures [β0, βn], such that there is a

significant amount of overlap in the distributions of two adjacent β values [44]. Note

that Tn = 1/(kbβn) is the lowest temperature, while Tmax = T0 = 1/(kbβ0) is the highest

one and all the replicas of the system are running at different temperatures inside this


range. In the canonical ensemble, the probability of state j (Sj) at inverse temperature

βi is

$$P(S_j, \beta_i) = \frac{\exp(-\beta_i U_j)}{Z_i}, \tag{2.21}$$

where $U_j$ is the potential energy of state j and $Z_i$ is the normalization factor. To generate a canonical distribution, replicas at

adjacent temperatures exchange their configurations (or their temperatures) at specific

times or number of events with the probability of

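$$p = \min\left(1, \frac{P(S_{i+1},\beta_i)\,P(S_i,\beta_{i+1})}{P(S_i,\beta_i)\,P(S_{i+1},\beta_{i+1})}\right) \tag{2.22}$$

$$= \min\left(1, \exp(-(\beta_{i+1} - \beta_i)(U_i - U_{i+1}))\right) = \min\left(1, \exp(\Delta\beta\,\Delta U)\right), \tag{2.23}$$

where $\Delta U = U_{i+1} - U_i$ and $\Delta\beta = \beta_{i+1} - \beta_i$. The generated structures obey the canonical ensemble probability.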
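The exchange rule of Eqs. 2.22 and 2.23 can be sketched as follows. The pairing of adjacent temperatures in alternating sweeps (controlled here by `first`) is a common convention assumed for this example and is not stated in the text; the function names and the `owner` bookkeeping list are hypothetical.

```python
import math
import random

def swap_probability(beta_i, beta_next, u_i, u_next):
    """Eq. 2.23: min(1, exp(delta_beta * delta_U)) for adjacent temperatures."""
    arg = (beta_next - beta_i) * (u_next - u_i)
    return 1.0 if arg >= 0.0 else math.exp(arg)

def exchange_sweep(betas, energies, owner, rng, first=0):
    """Attempt swaps for the temperature pairs (first, first+1), (first+2, first+3), ...

    owner[k] is the index of the replica currently assigned to betas[k];
    energies[r] is the potential energy of replica r.
    """
    for k in range(first, len(betas) - 1, 2):
        a, b = owner[k], owner[k + 1]
        p = swap_probability(betas[k], betas[k + 1], energies[a], energies[b])
        if rng.random() < p:
            owner[k], owner[k + 1] = b, a      # the replicas exchange temperatures
    return owner

betas = [0.5, 1.0, 2.0]            # beta_0 < beta_1 < beta_2, i.e. hottest first
energies = [-3.0, -1.0, -2.0]      # potential energy of replicas 0, 1, 2
owner = exchange_sweep(betas, energies, [0, 1, 2], random.Random(3), first=0)
p_unfavorable = swap_probability(1.0, 2.0, -1.0, -3.0)
```

In the sweep above, the replica at the higher temperature has the lower energy, so ∆β∆U is positive and the swap is certain; in the second case the colder replica already holds the lower energy and the swap is accepted only with probability exp(−2).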

By using the PT sampling, the high temperature replicas explore large volumes of

the phase space, while the low temperature ones explore local low-energy regions of the

phase space. Note that although running n replicas means an order of n times increase

in computational effort, it makes the MCMC convergence more than n times faster[45].

2.3.2 Efficient Parallel Tempering Dynamics

To achieve an efficient sampling, all the temperatures should be assigned to each replica

in a reasonable amount of simulated time. To achieve this purpose, each replica should

travel easily between any pair of temperatures and spend a comparable amount of time

at different temperatures. Thus, swapping should occur frequently and the probability

distributions of βi and βi+1 should have a reasonable overlap [44], which requires choosing

a proper ∆β = βi+1 − βi for different parts of the temperature spectrum. It is clear

that ∆β should be sufficiently small and therefore, depending on the range of studied


temperatures, the number of replicas can be rather large [44]. However, since increasing

the number of replicas increases the computation cost, an optimum value for ∆β should

be found to avoid using too many replicas.

In addition to choosing a proper ∆β, the temperature range βn−β0 should be chosen

carefully. For example, in this project, the very rough landscape of the protein-like chain

has deep metastable minima. The range of temperatures must be sufficiently large to

enable an escape from these minima [53]. Consequently, the highest temperature must be

high enough to avoid the trapping of replicas in local energy minima [45]. It is expected

that the most common structure at the highest temperature T0 = 1/(kBβ0) be a non-

bonded structure. It is predicted that in most of the cases the most common structure

at the lowest temperature Tn = 1/(kBβn) is the configuration with the lowest energy in

the system.

One of the challenges of using the PT method is to estimate the right number of

replicas, and the temperature difference between adjacent replicas, where ∆β can depend

on the temperature. In most of the cases, ∆β varies in the temperature spectrum, where

at lower temperatures typically ∆β should be smaller.

To study the efficiency of the PT method in a specific run, a period is defined as an

amount of time that is needed for one replica to travel back to its initial temperature after

several parallel tempering exchange events. Typically when the PT system contains a

large number of replicas, it is harder to have good dynamics, and typically some of

the replicas may not move very well between all the temperatures during one period

(e.g. due to possible entropic barriers). This can lead to prohibitively inefficient PT

dynamics. Sampling is efficient if the period is small and if in one period many (if not

all) temperatures are visited.

As will be discussed in chapter 4, another thing that should be considered is the

difficulty of applying the PT method to the systems that undergo a phase transition in

the range of studied temperatures. While finite size effects limit the scale of fluctuations


near phase boundaries in small systems, it is still possible to monitor for phase transitions

by looking at derivatives of the free energy of the system, such as the heat capacity and

the compressibility.

The time interval between two consecutive PT exchange events can vary for different

systems. This means that an optimum value for the time step should be found, where

this time should be large enough that the system can locally equilibrate yet small enough

to avoid any unnecessary computation. Considering a fixed computational cost (fixed

number of CPU hours), finding this optimum time value for the dynamics step is one of

the most challenging tasks to optimize the PT algorithm. It is clear that for a specific

system, this value depends on the characteristics of the system, such as the number of

particles as well as the range of temperatures under investigation.

2.4 Simulation Structure of the Project

Unlike typical HMC where the dynamics between Monte Carlo steps is conducted using a

reversible, symplectic integration scheme, here the dynamics is conducted using the DMD

method and the Monte Carlo sampling is done using the PT method. For the Monte Carlo

part, the Parallel Tempering (PT) method can be employed to avoid getting trapped in

local free energy minima and to increase the speed at which the phase space of the

system can be explored. As was mentioned in Chapter 1, in this work all the interaction

potentials are discontinuous. The simulated system consists of several protein-like chains

exploring the configurational space individually by the DMD method. This process of

exploration can occur in the absence or presence of a solvent environment, which will be

discussed in Chapter 3 and Chapter 4 respectively.

The initial velocities are drawn from the Maxwell-Boltzmann distribution based on

the assigned temperatures. At specific times, the protein-like chain systems exchange

their temperatures according to the probability that derives from the Parallel Tempering


(PT) method. Each system containing the protein-like chain and its surrounding envi-

ronment is called a replica and the process of exchanging the temperatures is called the

replica exchange. After allowing replicas to propagate using DMD for a fixed amount of

time, some of the replicas exchange their temperatures. Then the velocities are drawn for

all the replicas from the Maxwell-Boltzmann distribution based on the current tempera-

ture for the replica. The parallel tempering method allows configurations to be generated

according to the canonical ensemble. All necessary conditions for generating states with canonical density are satisfied [52]: the velocities of all replicas are periodically redrawn from the Maxwell-Boltzmann distribution; DMD, like MD, is symplectic, meaning that the mapping is volume-preserving and the dynamics is time reversible; and the PT sampling is done according to the canonical ensemble probability.

The potential-based classification of structures can be used to find the population (i.e.

frequency of occurrence in the simulation) of each structure at a specific temperature.

Since each structure is generated with a probability proportional to $e^{-\beta F}$, where $\beta = 1/(k_BT)$

and F is the Helmholtz free energy of that configuration, by comparing the popula-

tions of different structures, it becomes possible to calculate the entropy and free energy

difference of any pair of configurations.
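As a sketch of this calculation, the relative free energy of two configurations follows from the ratio of their populations, and the relative entropy then follows from F = U − TS. The example assumes that each configuration has a single well-defined potential energy, as is the case when configurations are classified by their bonds, and works in units where k_B = 1; the counts and energies are invented for illustration.

```python
import math

def free_energy_difference(count_a, count_b, kT):
    """F_a - F_b from populations, using P proportional to exp(-F / kT)."""
    return -kT * math.log(count_a / count_b)

def entropy_difference(delta_F, delta_U, T):
    """S_a - S_b from F = U - T*S, given F_a - F_b and U_a - U_b."""
    return (delta_U - delta_F) / T

# Hypothetical populations of two configurations at T = 1 (k_B = 1)
count_a, count_b = 900, 100
delta_U = -3.0                      # configuration a has three more unit-depth bonds
delta_F = free_energy_difference(count_a, count_b, kT=1.0)
delta_S = entropy_difference(delta_F, delta_U, T=1.0)
```

Here configuration a is more populated, so its free energy is lower, while its entropy is lower as well because the extra bonds restrict the chain.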

2.4.1 Parallel Programming

An object-oriented design facilitated setting up a code to simulate systems with many replicas. In chapter 4, parallel programming with MPI (Message-Passing Interface)[54] is used to make it feasible, in terms of run time, to study the energy landscape of a protein-like chain in the presence of thousands of fluid beads. Implementing the PT method in an object-oriented way makes the parallelization straightforward. In the parallel version of the code, each replica is an object that runs on its own processor. At a replica exchange event, the energy values of the replicas are sent to the main node, and each replica then receives its updated temperature, which can be the same as its earlier temperature. The measurement events happen after several parallel tempering steps. The parameters measured and recorded on each replica, including the matrix representing the structure, are sent from that processor to the main node, where they are stored in the RAM of the main node. At the end of the simulation, all the necessary statistics are accumulated and computed on the main node.
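The bookkeeping performed by the main node at a replica exchange event can be sketched as follows. This is a minimal single-process illustration, not the thesis code: the function name `exchange_on_main_node`, the alternating choice of neighboring temperature pairs, and the use of the standard Metropolis acceptance probability min(1, exp[(β_a − β_b)(E_a − E_b)]) are assumptions made for the example, and the gather and scatter steps that MPI would perform are replaced by plain lists.

```python
import math
import random


def exchange_on_main_node(energies, beta_index, betas, rng, offset=0):
    """One replica exchange event as seen by the main node.

    energies[r]   : potential energy reported by replica r (gathered)
    beta_index[r] : index into `betas` currently held by replica r
    betas         : list of inverse temperatures, in increasing order
    offset        : 0 or 1, selects which neighboring pairs are tried
    Returns the updated beta_index list (to be sent back to the replicas).
    """
    new_index = list(beta_index)
    # Map each temperature slot to the replica that currently holds it.
    holder = {b: r for r, b in enumerate(new_index)}
    for lo in range(offset, len(betas) - 1, 2):
        ra, rb = holder[lo], holder[lo + 1]
        # Standard parallel tempering Metropolis criterion (assumed here).
        delta = (betas[lo] - betas[lo + 1]) * (energies[ra] - energies[rb])
        if delta >= 0 or rng.random() < math.exp(delta):
            new_index[ra], new_index[rb] = lo + 1, lo
            holder[lo], holder[lo + 1] = rb, ra
    return new_index


rng = random.Random(0)
# Replica 0 holds the higher temperature but has the lower energy.
print(exchange_on_main_node([-8.0, 0.0], [0, 1], [1.0, 2.0], rng))
```

Only energies and temperature indices travel between the replicas and the main node in this scheme, which is why a replica may receive the same temperature it already had.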

Chapter 3

Protein-like Chain Without a Solvent

3.1 Model

In this chapter, a protein-like chain is studied in the absence of any solvent. Since the

main objective is to study the basic behavior of proteins in a very short computational

time, the focus is on constructing simple models of a protein-like chain which have the

ability to be folded into an alpha helix structure at sufficiently low temperature. The

model is designed to have a free energy landscape similar to that of simple protein systems

that make use of much more detailed potentials.

The protein model used here is a beads-on-a-string model in which each bead represents

an amino acid or residue. In this model the protein-like chain consists of a repeated

sequence of four different kinds of beads. While having four different types of beads is

not enough to represent the twenty different types of amino acids, it preserves at least

some of the differences between amino acids. The interactions between these beads are

designed to mimic the interactions that lead to the formation of common motifs in protein

structure, such as the alpha helix. Previous studies indicated that short chains containing


6, 8 or 12 monomers are too short to fold into compact states at low temperatures, while

somewhat longer chains with 25 monomers can capture folded helical states[40]. Here,

chains of moderate lengths of 25 to 35 beads have been used to facilitate the exploration

of the free energy landscape.

In an alpha helix, one of the most common secondary structures, each turn has 3.6

amino acids, and there is a hydrogen bond between beads i and i + 4. To capture this

feature of helices, the models analyzed here allow for attractive interactions, intended

to mimic hydrogen bonds between non-adjacent residues, between beads separated from

each other by 4n beads, where n ≥ 1, and with additional restrictions on the possible

hydrogen bonds to be specified below. Several models of protein-like chains have been

considered, but only the results for two of them are presented here. The choice was made

based on the models’ similarity to a real protein and the feasibility of being studied using

the parallel tempering (PT) method.

To make contact with real proteins, and because there are too many parameters to

form unique reduced units, physical units are used in the definition of the model, although

these should not be taken too literally: we only aim to set these to the right order of

magnitude to mimic real proteins. In particular, lengths will be expressed in Angstroms,

energies in kJ/mol and masses in atomic mass units.

The two presented models differ in the hydrogen-bond potentials, while the other intra-chain interactions are the same. In total, four different intra-chain potentials are used in these models. The first kind of potential acts between the nearest and the next nearest

neighbors and restricts the distance between the beads to specific ranges by applying an

infinite square-well potential similar to Bellemans’ bonds model[55]. Fig. 3.1(a) shows

the shape of this kind of potential. To mimic a covalent bond between two consecutive amino acids in the protein, the distance between two neighboring beads is restricted to the range 3.84 Å to 4.48 Å. This potential allows these distances to “vibrate” around values close to the distance between stereocenters used in Ref. [5]. The next-nearest neighbors’ infinite square-well potentials represent an angle vibration. Restricting their distance to a range from 5.44 Å to 6.40 Å generates a vibration angle between 75° and 112°. For simplicity, dihedral angles are not considered in our models, but as discussed later, some restrictions on hydrogen bonds are employed to create rigidity in the backbone of the protein-like chain similar to the dihedral angle interactions in more detailed potentials.

Figure 3.1: Model potentials: the (a) infinite square-well potential, (b) attractive step potential, (c) repulsive shoulder potentials, and (d) hard core repulsion.
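The quoted angle range can be checked with the law of cosines. The sketch below is only an illustration; the helper name `bond_angle_deg` is hypothetical. It evaluates the angle at the middle bead for every combination of the extreme nearest-neighbor and next-nearest-neighbor distances, which is sufficient because the angle is monotonic in each distance over these ranges. It gives roughly 74.8° and 112.9°, in line with the range stated above.

```python
import math


def bond_angle_deg(b1, b2, d):
    """Angle at the middle bead (law of cosines), in degrees."""
    cos_theta = (b1 * b1 + b2 * b2 - d * d) / (2.0 * b1 * b2)
    return math.degrees(math.acos(cos_theta))


BOND = (3.84, 4.48)          # nearest-neighbor distance range (angstrom)
NEXT_NEAREST = (5.44, 6.40)  # next-nearest-neighbor distance range (angstrom)

# Evaluate the angle at every corner of the allowed distance ranges.
angles = [bond_angle_deg(b1, b2, d)
          for b1 in BOND for b2 in BOND for d in NEXT_NEAREST]
angle_min, angle_max = min(angles), max(angles)
print(f"{angle_min:.1f} to {angle_max:.1f} degrees")
```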

Hydrogen bonds are modeled by an attractive square-well potential, depicted in

Fig. 3.1(b). In all the studied models, including the two presented models, the attractive


interactions are defined between beads i and i + 4n to resemble the hydrogen bonds in

alpha helix structures. However, the two main models differ in the possibility of these

attractive bonds and the values of i and n.

In the first model, named model A, the attractive interactions act between beads of two of the four types: a bond can form between two beads that both have an index of the form i = 4k + 1, or that both have an index of the form i = 4k + 3, where k is an integer, and n can be any positive integer such that i + 4n lies on the chain.

In the second model, model B, only the beads with index i = 4k + 2 can make bonds

with each other, and n cannot be 2 or 3. This means that there is no attractive bond

between beads separated along the chain by eight or twelve beads. Bonds between beads

i and i + 8 as well as i and i + 12 are disallowed to make the occurrence of turns more

difficult in the protein-like chain and make it more rigid. This restriction has a similar

function as dihedral angles interactions and side chains in real proteins where they prevent

a protein from bending over easily. The differences between these two models are summarized in Table 3.1. Fig. 3.2 shows the possible attractive bonds of the two models for the 25-bead chain, with consecutive beads labeled A through Y. Because the two models allow different numbers of possible hydrogen bonds, their properties are expected to be very different.
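The bonding rules of the two models can be enumerated directly. The following sketch is an illustration only: the function name `allowed_bonds` is hypothetical, and it assumes that beads are numbered from 1 with bead 1 labeled A, which is consistent with the bond labels used in the tables of this chapter. For the 25-bead chain it yields 36 possible bonds for model A and 8 for model B, the same counts quoted in Sec. 3.2.2.

```python
import string

LABELS = string.ascii_uppercase  # bead 1 -> 'A', ..., bead 25 -> 'Y'


def allowed_bonds(model, n_beads=25):
    """Return the allowed attractive pairs (i, i + 4n) as letter pairs."""
    if model == "A":
        def ok(i, n):
            return i % 4 in (1, 3)                 # i = 4k + 1 or 4k + 3
    elif model == "B":
        def ok(i, n):
            return i % 4 == 2 and n not in (2, 3)  # i = 4k + 2, n != 2, 3
    else:
        raise ValueError("model must be 'A' or 'B'")
    pairs = []
    for i in range(1, n_beads + 1):
        n = 1
        while i + 4 * n <= n_beads:
            if ok(i, n):
                pairs.append(LABELS[i - 1] + LABELS[i + 4 * n - 1])
            n += 1
    return pairs


bonds_a = allowed_bonds("A")
bonds_b = allowed_bonds("B")
print(len(bonds_a), len(bonds_b))
print(" ".join(sorted(bonds_b)))
```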

In an alpha helix there are 3.6 amino acids in each turn, and the distance between two consecutive amino acids is 1.5 Å along the helical axis[56]. This means a translation of 5.4 Å along the helix axis in each turn. For both models A and B, the parameters for the attractive square-well potential, σ1 and σ2, are chosen to be 4.64 Å and 5.76 Å with a midpoint of 5.2 Å, which is close to the translation of 5.4 Å along the helix. Compared to covalent bonds, these attractive interactions act across longer distances. The unit of energy, ε, is chosen as the depth of the potential well of the attractive interactions. The value of ε is around 20 kJ/mol and the mass of each bead is set to $2 \times 10^{-25}$ kg, which is close to 120 amu (atomic mass units).


To represent electrostatic interactions of the atoms, repulsive interactions act between beads 1 + 4k and 4k′, where k and k′ are integers and k ≠ k′. The repulsive interaction takes the form of a shoulder potential, shown in Fig. 3.1(c). The range of the shoulder is set to be from 4.64 Å to 7.36 Å, while the height is 0.9ε. The effect of changing the

number of step repulsions in a few models was evaluated in terms of minimizing the

free energy. It turned out that changing the number of repulsions does not have a huge

impact on the shape of free energy landscape around the native structure point. Since the

repulsion between the beads increases the potential energy, the most common structures

at low temperatures do not have any repulsive interactions. Therefore, the two discussed

models differ only in their attractive potentials, while their repulsive interactions are the

same.

Finally, all other bead pairs for which no covalent bonds, hydrogen bonds or shoulder repulsive interactions are defined feel a hard sphere repulsion to account for excluded volume interactions at short distances, depicted in Fig. 3.1(d). The hard sphere diameter is set to be 4.64 Å, which is slightly different from the value of 4.27 Å used by Zhou et al.[5].

The reduced temperature is defined as $T^* = k_B T/\varepsilon$, where ε is the potential depth of the square-well attractive interactions, and $\beta^*$ is the inverse of the reduced temperature, $\beta^* = 1/T^*$. Based on the units that are chosen for mass, length and time, $T^* = 1.0$ corresponds to 2400 K. This means that $\beta^* = 8$ ($T^* = 1/8$) should be around standard room temperature, 300 K.
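The conversion between reduced and physical units can be reproduced in a few lines. This is a sketch under the stated value ε ≈ 20 kJ/mol; because ε is given per mole, the molar gas constant is used in place of k_B. The constant names and the helper `kelvin_from_reduced` are chosen here for illustration. The result is roughly 2405 K for T* = 1, about 301 K for β* = 8, and a bead mass of about 120.4 amu.

```python
R_GAS = 8.314462618      # molar gas constant, J/(mol K)
AMU = 1.66053906660e-27  # atomic mass unit, kg

EPSILON = 20.0e3         # well depth, J/mol (about 20 kJ/mol in the text)
BEAD_MASS = 2.0e-25      # bead mass, kg


def kelvin_from_reduced(t_star, epsilon=EPSILON):
    """Convert a reduced temperature T* = k_B T / epsilon to kelvin."""
    return t_star * epsilon / R_GAS


t_unit = kelvin_from_reduced(1.0)        # T* = 1
t_room = kelvin_from_reduced(1.0 / 8.0)  # beta* = 8
mass_amu = BEAD_MASS / AMU               # bead mass in amu
print(round(t_unit), round(t_room), round(mass_amu, 1))
```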

It is worth pointing out that the attractive and repulsive potentials used here are qualitatively different from those of the popular Go model[57, 58] in which attractive interactions are only defined between beads that are in contact in the native structure.

Model   Attracting beads i     n
A       4k + 1 and 4k + 3      any number
B       4k + 2                 n ≠ 2 and n ≠ 3

Table 3.1: Allowed values of attracting bead pairs (i, i + 4n).

Figure 3.2: Possible attractive bonds of (a) model A, and (b) model B for a chain of 25 beads.

3.1.1 Definition of configurations

One of the advantages of using discontinuous potentials is the ease of comparing configurations. The bonds are defined using the specific range of bead separations $r_{ij}$ in which the potential energy $V = \frac{1}{2}\sum_{ij} U(r_{ij})$ is equal to a specific, non-zero value. Since only

one bond can exist between each bead pair (i, j) in the current models, each configuration

or structure can be represented by a matrix of interactions in which the entry at row i

and column j is 1 if i and j are bonded and 0 otherwise. Because bonded interactions

largely determine the form of the protein, we will identify this matrix with the config-

uration of the protein-like chain. Thus, by comparing the matrices, identical structures

can be easily found.

However, to represent these matrices, a more user-friendly alphabetical notation is


applied. Each bead is represented by an alphabetical letter and each bonded interaction

is shown by a pair of letters. The two dimensional matrix can thus be represented by a

string of alphabetical pairs. Since most of the studied cases involve 25-bead chains, A to

Y have been used to label different beads. For chains longer than 26 beads, both capital

and small letters are used.
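The correspondence between the interaction matrix and the letter-pair notation can be sketched as follows. The function names are hypothetical, and the ordering of labels for long chains (capital letters first, then small letters) is an assumption, since the text only states that both are used.

```python
import string

# Assumed label alphabet: capitals first, then lower-case letters.
LABELS = string.ascii_uppercase + string.ascii_lowercase


def matrix_from_pairs(pairs, n_beads=25):
    """Build the symmetric 0/1 interaction matrix from e.g. 'BF FJ JN'."""
    matrix = [[0] * n_beads for _ in range(n_beads)]
    for pair in pairs.split():
        i, j = LABELS.index(pair[0]), LABELS.index(pair[1])
        matrix[i][j] = matrix[j][i] = 1
    return matrix


def pairs_from_matrix(matrix):
    """Return the canonical (sorted) letter-pair string of a matrix."""
    n_beads = len(matrix)
    found = [LABELS[i] + LABELS[j]
             for i in range(n_beads)
             for j in range(i + 1, n_beads)
             if matrix[i][j]]
    return " ".join(found)


m = matrix_from_pairs("RV BF NR FJ")
print(pairs_from_matrix(m))
```

Because the pairs are always emitted in the same order, two structures with the same bonds give the same string regardless of the order in which the bonds were recorded.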

The simulations produce a large series of structures. To extract the most common ones

(with their frequency of occurrence fobs), their interaction matrices need to be compared.

To make the comparison between the matrices faster, two indices were introduced that

are not necessarily unique for each interaction matrix but must be equal for matrices to be

identical. Instead of comparing the matrices directly, first these two indices are compared,

and the matrices are compared only when the indices are equal. This last comparison is

further optimized by storing each interaction matrix as an array of integers, one for each

row, with each bit representing one matrix entry. Chains longer than the number of bits

per integer (32 bits) require multiple integers per row.
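The bit-packed storage described above can be illustrated with a short sketch. The names `pack_rows` and `count_structures` are hypothetical. Python integers have arbitrary precision, so the 32-bit limit mentioned in the text does not arise here, and hashing the packed tuple plays a role similar to the quick pre-comparison by indices (this substitution is a design choice of the sketch, not the thesis implementation).

```python
from collections import Counter


def pack_rows(matrix):
    """Pack each 0/1 row into one integer; bit j holds entry (i, j)."""
    return tuple(sum(bit << j for j, bit in enumerate(row)) for row in matrix)


def count_structures(matrices):
    """Count how often each distinct interaction matrix occurs."""
    return Counter(pack_rows(m) for m in matrices)


a = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # beads 0 and 1 bonded
b = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # beads 0 and 2 bonded
counts = count_structures([a, b, a])
print(counts.most_common(1))
```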

The two indices were defined as follows. The protein-like chain consists of periodically

repeated regions of four different kinds of beads, and for the purpose of constructing the

indices, these regions are numbered. For example, the beads 1 to 4 are considered the first

region and the beads 21 to 24 are considered region 6. The interaction index is defined

such that the nth digit (from the right) of the interaction index defines the number of

attractive or repulsive interactions between the region n and the beads of the previous

regions, 1 to n− 1. For example, from the interaction index 321110 for a 25-bead chain,

one can understand that the beads 21-24 have three bonds with the beads 1 to 20, beads

17-20 have two bonds with the beads 1 to 16, beads 13-16 have one bond with beads 1

to 12, beads 9-12 have one bond with beads 1 to 8 and beads 5-8 have one bond with

the beads 1 to 4.

The second index, the attraction index, is defined such that the nth digit (from the right) of the index represents the number of attractive bonds between beads whose regions are separated from each other by n − 1 intervening regions. For example, an attraction index of 12005 means that there are five bonds between neighboring regions (for example, beads 1 to 4 and beads 5 to 8 are in neighboring regions), two bonds between beads that are separated by three regions (i.e., there are at least 12 beads between the bonded beads), and only one bond between beads that are separated from each other by four regions (at least 16 beads between them).
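Both indices can be computed from the list of bonds, as in the following sketch. The function names are hypothetical, and the sketch takes only attractive bonds as input, whereas the interaction index defined above also counts repulsive interactions. Under this reading, the eight-bond model B structure BF BR BV FJ FV JN NR RV listed in Table 3.3 gives the two example values used in the text, 321110 and 12005.

```python
import string

LABELS = string.ascii_uppercase


def region(bead):
    """Region number (1-based) of a 1-based bead index: beads 1-4 -> 1."""
    return (bead - 1) // 4 + 1


def digits_to_index(counts):
    """Place counts[n] at the n-th decimal digit from the right.

    This assumes every count is below 10, as in the examples of the text.
    """
    return sum(c * 10 ** (n - 1) for n, c in counts.items())


def interaction_index(pairs):
    """Digit n: bonds between region n and all earlier regions."""
    counts = {}
    for pair in pairs.split():
        beads = [LABELS.index(ch) + 1 for ch in pair]
        later = region(max(beads))
        counts[later] = counts.get(later, 0) + 1
    return digits_to_index(counts)


def attraction_index(pairs):
    """Digit n: attractive bonds whose region numbers differ by n."""
    counts = {}
    for pair in pairs.split():
        a, b = (LABELS.index(ch) + 1 for ch in pair)
        gap = abs(region(a) - region(b))  # never 0 for pairs (i, i + 4n)
        counts[gap] = counts.get(gap, 0) + 1
    return digits_to_index(counts)


native_b = "BF BR BV FJ FV JN NR RV"
print(interaction_index(native_b), attraction_index(native_b))
```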

3.1.2 Temperature independence of relative configurational entropies

The definition of configurations presented above was based on the presence of bonds.

Within the model, having a certain set of bonds (and no others) leads to a specific

potential energy Uc for each configuration c. As shown below, this leads to a temperature

independent relative configurational entropy.

Here, the configurational entropy of any particular configuration c is the entropy of

a sub-ensemble in which the phase points are restricted to those of configuration c. I.e.,

the full phase space of the protein-like chain can be subdivided in regions corresponding

to specific configurations. Denoting the collection of spatial degrees of freedom by R, for

each configuration c, one defines an index function

\[
\chi_c(R) =
\begin{cases}
1 & \text{if (only) the bonds in } c \text{ are present,} \\
0 & \text{otherwise.}
\end{cases}
\tag{3.1}
\]

In the canonical ensemble, the probability $f_{\mathrm{obs}}(c, T)$ of observing a configuration $c$ at temperature $T$ is
\[
f_{\mathrm{obs}}(c, T) = e^{-\beta (F_c - F)}, \tag{3.2}
\]
where $F_c$ is the free energy of configuration $c$, and $F$ is the full free energy of the system.

By definition, one has

\[
e^{-\beta F_c} = \frac{1}{h^{3N}} \int dR \, dP \, \chi_c(R) \,
e^{-\beta \left[ \sum_{i=1}^{N} \frac{|p_i|^2}{2m} + V(R) \right]}, \tag{3.3}
\]


where N is the number of beads, m is their mass, and V is the potential energy function.

The configurational entropy is related to $F_c$ via
\[
F_c = E_c - T S_c, \tag{3.4}
\]
where $E_c$ is the average energy of configuration $c$ at temperature $T$. Since its potential energy $V$ is always equal to $U_c$ when $\chi_c = 1$, one has
\[
E_c = U_c + \frac{3}{2} N k_B T. \tag{3.5}
\]

Combining Eqs. 3.3-3.5, one finds

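\[
S_c = \frac{3}{2} N k_B \ln\left( \frac{2\pi m e}{\beta h^2} \right)
      + k_B \ln \int dR \, \chi_c(R), \tag{3.6}
\]
so the relative entropy of two configurations $c_1$ and $c_2$ at a specific temperature is
\[
\Delta S_{c_1 c_2} = S_{c_1} - S_{c_2}
  = k_B \ln \frac{\int dR \, \chi_{c_1}(R)}{\int dR \, \chi_{c_2}(R)}, \tag{3.7}
\]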

which does not depend on temperature.
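For clarity, the intermediate steps between Eq. 3.3 and Eq. 3.6 can be spelled out as follows. This is a sketch of the standard Gaussian momentum integration, using only the symbols defined above and the fact that V(R) = U_c wherever χ_c(R) = 1.

```latex
\begin{align*}
e^{-\beta F_c}
  &= \frac{1}{h^{3N}} \left( \int d^3p \, e^{-\beta |p|^2 / 2m} \right)^{N}
     e^{-\beta U_c} \int dR \, \chi_c(R) \\
  &= \left( \frac{2\pi m}{\beta h^2} \right)^{3N/2}
     e^{-\beta U_c} \int dR \, \chi_c(R), \\
F_c &= U_c - \frac{3}{2} N k_B T \ln\left( \frac{2\pi m}{\beta h^2} \right)
       - k_B T \ln \int dR \, \chi_c(R), \\
S_c &= \frac{E_c - F_c}{T}
     = \frac{3}{2} N k_B
       + \frac{3}{2} N k_B \ln\left( \frac{2\pi m}{\beta h^2} \right)
       + k_B \ln \int dR \, \chi_c(R).
\end{align*}
```

Writing the constant term of the last line as (3/2) N k_B ln e and absorbing it into the logarithm gives the factor of e that appears in Eq. 3.6.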

From Eqs. 3.5 and 3.6 it can be concluded that the free energy of a configuration is

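\[
F_c = U_c - \frac{3}{2} N k_B T \ln\left( \frac{2\pi m}{\beta h^2} \right)
      - k_B T \ln \int dR \, \chi_c(R), \tag{3.8}
\]
where the second term, $\frac{3}{2} N k_B T \ln\left( \frac{2\pi m}{\beta h^2} \right)$, is the same for all the configurations at temperature $T$.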

Because relative configurational entropies do not depend on temperature, relative

entropies can be determined from a single run at a temperature, T, using

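\begin{align}
\Delta S_{c_1 c_2}
  &= \frac{\Delta E_{c_1 c_2} - \Delta F_{c_1 c_2}}{T}
   = \frac{\Delta E_{c_1 c_2}}{T}
     + k_B \ln \frac{f_{\mathrm{obs}}(c_1, T)}{f_{\mathrm{obs}}(c_2, T)} \tag{3.9} \\
  &= \frac{\Delta U_{c_1 c_2}}{T}
     + k_B \ln \frac{f_{\mathrm{obs}}(c_1, T)}{f_{\mathrm{obs}}(c_2, T)}. \tag{3.10}
\end{align}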

Therefore, no approximation is necessary to calculate the relative configurational

entropies in contrast to molecular dynamics (MD) studies (see e.g. Ref. [59]).
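As a worked example of Eq. 3.10, the sketch below estimates the relative entropy of two model B structures from the observed frequencies quoted in Table 3.4 at β* = 6. It assumes that the potential energy of a structure is −ε times its number of attractive bonds (no shoulder repulsions active) and ignores the statistical uncertainties of the frequencies; the function name `relative_entropy` is hypothetical. Under these assumptions the eight-bond structure has an entropy about 3.6 k_B lower than the most common seven-bond structure.

```python
import math


def relative_entropy(beta_star, delta_u, f1, f2):
    """Relative entropy (S_c1 - S_c2) / k_B from Eq. 3.10.

    beta_star : reduced inverse temperature, epsilon / (k_B T)
    delta_u   : U_c1 - U_c2 in units of epsilon
    f1, f2    : observed frequencies of c1 and c2 at this temperature
    """
    return beta_star * delta_u + math.log(f1 / f2)


# Values quoted in Table 3.4 (model B, beta* = 6): the eight-bond
# structure (76.0 %) and the most common seven-bond structure (6.8 %).
# Assumption: U_c = -(number of attractive bonds) * epsilon.
ds = relative_entropy(beta_star=6.0, delta_u=-8.0 - (-7.0), f1=76.0, f2=6.8)
print(f"(S_1 - S_2) / k_B = {ds:.2f}")
```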


3.2 Results

3.2.1 Parallel tempering efficiency

In the current context, we will call the simulation efficient if it generates many inde-

pendent configurations in a given simulated time period. For instance, since the PT

simulations can be seen as replicas moving from temperature to temperature while they

change their configurations, if a certain replica gets stuck in a certain range of temper-

atures, the sampling would likely provide poor estimates for ensemble averages. All the

presented results belong to simulation runs in which the highest temperature used, $T^*_0$, is $2/3$, while the lowest temperature $T^*_n$ varies, mainly depending on the model. The reason for choosing $T^*_0 = 2/3$ is that at this temperature the most common structure is the structure with no bond, which ensures that the temperature is sufficiently high so that the chain does not become trapped in any potential minima.

At the start of the simulation, $T^*_0$ is assigned to replica 0 and $T^*_n$ is assigned to replica n; however, during the replica exchange events, these temperatures can be assigned to other replicas as well. To represent the existing temperatures, typically $\beta^*_0$ to $\beta^*_n$ are used. The replica exchange between adjacent replicas happens every two picoseconds of simulated time, in which approximately 20 events happen at a single temperature. In each replica exchange event, half of the replicas are chosen at random and these replicas get a chance to exchange their temperatures (or configurations) with replicas at adjacent temperatures. Most runs consist of more than half a million replica exchange events.

Figure 3.3: Example of less efficient dynamics in inverse temperature space for one replica (the one that started at β∗ = 38.4) for the system of 170 replicas of model A. Axes: β∗ index versus PT replica exchange event (x 0.0001).

Figure 3.4: Example of efficient dynamics in temperature space for one replica (which starts at β∗ = 10.5) for a system of 90 replicas of model B. Axis label: β∗ index.

Figure 3.5: Example of efficient dynamics in replica space at one temperature (β∗ = 7.5) of the system with 90 replicas for model B. Axis label: replica index.

As explained in chapter 2, the parameters chosen for the PT method have a strong effect on the PT efficiency. Therefore, before determining free energies and other properties, the efficiency of the simulations should be assessed. To evaluate the PT efficiency in a simulation, a PT period is defined as the time for a replica to travel between all the temperatures and come back again to its initial temperature. This is equivalent to twice the traversal time between the maximum and minimum temperatures. The traveling process that happens in one PT period is called a cycle. The key quantity for checking the efficiency of a PT simulation is the number of PT cycles in one run.
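The number of PT cycles of one replica can be counted from the sequence of temperature indices it holds after each exchange event. The sketch below is one possible reading of the definition above, in which a cycle is a round trip between the two extreme temperatures; the function name `count_pt_cycles` and the input format are assumptions made for illustration.

```python
def count_pt_cycles(beta_indices, n_temperatures):
    """Count round trips of one replica between the extreme temperatures.

    beta_indices   : temperature index held by the replica after each
                     replica exchange event
    n_temperatures : number of temperatures (indices 0 .. n - 1)
    A cycle is counted here each time the replica has gone from one
    extreme index to the other and back again.
    """
    low, high = 0, n_temperatures - 1
    start = None           # extreme index that was visited first
    reached_other = False  # True once the opposite extreme was visited
    cycles = 0
    for idx in beta_indices:
        if idx not in (low, high):
            continue
        if start is None:
            start = idx
        elif idx != start:
            reached_other = True
        elif reached_other:
            cycles += 1
            reached_other = False
    return cycles


trajectory = [0, 1, 2, 1, 2, 1, 0, 1, 2, 2, 1, 0]
print(count_pt_cycles(trajectory, n_temperatures=3))
```

A replica that wanders within a subset of the temperatures, as in the less efficient case discussed next, contributes no cycles under this count.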

When the PT system contains a large number of replicas, it is harder to achieve good dynamics, and some of the replicas may not move well through all the temperatures during one PT cycle due to barriers. For example, in Fig. 3.3, in a system of 170 replicas for model A, replica 40 does not visit all the existing temperatures in a short time and spends a long period of time moving among one third of the temperatures during one PT period. In this case, although replica 40 travels back to its initial temperature in a reasonable amount of time (less than 25,000 PT replica exchange events), it does not visit all the existing temperatures.

As mentioned in section 2.3.2, $\Delta\beta^*$ can depend on $\beta^*_i$. However, depending on the potential landscape, it may be possible to observe efficient dynamics even with a large number of replicas, without requiring a temperature dependent $\Delta\beta^*$. For example in Fig. 3.4, in a study of model B with 90 replicas and $5 \times 10^5$ exchange intervals, one sees that replica index 60 travels between any two temperatures in a reasonable amount of time.

Figure 3.6: Example of less efficient dynamics in replica space at a relatively low temperature (β∗ = 14.7) of the system with 90 replicas for model B. Axis label: replica index.

The main criterion for assessing the efficiency of the PT dynamics is the sequence of replicas that have a specific temperature as the PT simulation progresses. Ideally, in a long run, each temperature should be assigned to different replicas with the same probability, so that the dynamics is smooth in replica space at a fixed temperature. This can be observed in Fig. 3.5, where all replicas visit temperature β∗ = 7.5 with almost the same probability. Even within the same run, the dynamics typically become less efficient as temperature is lowered. As can be seen in Fig. 3.6, for β∗ = 14.7, which is a relatively low temperature in the system, the replica index does not vary smoothly and uniformly as the PT simulation progresses, but spends more time at specific replicas. To have properly smooth dynamics between the replicas, model A requires that $\Delta\beta^*$ decreases at high $\beta^*_i$. Details of the temperature set used in PT are provided in Appendix B.1.


Figure 3.7: Panels (a) to (d) show snapshots from four different stages of dynamics that start from an unfolded state and end in the collapsed structure.

3.2.2 Observed structures

It is clear that at very low temperatures, the most common structures only have attractive bonds and no repulsive interactions. Therefore, unless otherwise specified, here the term “bond” always refers only to an attractive bond (or hydrogen bond) and not to repulsion or covalent bonds. Before studying the free energy landscape using PT, the discontinuous molecular dynamics (DMD) method was used to study the dynamics of the protein-like chain. Starting the dynamics from an unfolded state and decreasing the temperature in several steps, collapsed structures were observed. In this process, for the 25-bead chain of model A, the radius of gyration and the end-to-end vector dropped to 58%–75% and 50%–78% of their initial high-temperature values, respectively. Four snapshots illustrating the change in the typical conformation of the protein-like chain from high to low temperatures are presented in Fig. 3.7. However, the collapsed structures observed in this process can be local minima and not necessarily the native structures. This was one of the initial motivations to study the free energy landscape and its variation with temperature.

Figure 3.8: Snapshot of the lowest potential energy configuration of model B for the 25-bead chain.

Using the PT method, the most common structures of the protein-like chain at different temperatures can be determined. For model B, except at high temperatures, the helical structures are the most common structures, which can be seen clearly in the snapshot of one of these structures in Fig. 3.8. In contrast, model A lacks sufficient rigidity against turning, since beads that are separated by 8 or 12 beads can make bonds, leading to structures that are unrealistically compact. Consequently, no alpha helical structure was observed in the common structures of this model.

Figure 3.9: Variation of the radius of gyration at high temperature (green crosses, T∗ = 0.333) and low temperature (red pluses, T∗ = 0.073) for (a) model A and (b) model B. Axis label: Rg (Å).

Figure 3.10: Variation of principal moments of inertia of the shapes that are represented by one interaction matrix (i.e., one configuration), which belongs to the most common structures of model A at β∗ = 9 (AE AI AM CG CK CO CS CW EI GK GS GW KO KS OS QU QY SW UY). One unit corresponds to $2.048 \times 10^{-44}$ kg m². Axes: Ix, Iy, Iz.

In Fig. 3.9 the variation of the radius of gyration of the chain during the PT simulation is shown for configurations of both model A and B at two temperatures. The green dots belong to the temperature T∗ = 0.333 and the red dots belong to the temperature T∗ = 0.073. One sees that at low temperature, most of the configurations are in the collapsed phase (compact globule or native structure). However, at high temperature, most of the configurations are not restricted by any bonds and are completely unfolded, with a wide variety of shapes. For the configurations at low temperatures, the radius of gyration is around 16 Å, indicating a relatively compact conformation consistent with both native and intermediate phases. In contrast, the value of the radius of gyration at the higher temperature varies between 16 Å and 80 Å. According to these graphs, both the collapsed and unfolded structures in model A are denser than those in model B, which is the result of more attractive interactions in model A.

Figure 3.11: Variation of principal moments of inertia of the shapes of the most (red) and the second most (green) common configurations of model A at β∗ = 9. One unit corresponds to $2.048 \times 10^{-44}$ kg m². Axes: Ix, Iy, Iz.

In this thesis configuration and structure refer to the same concept, which is defined

by the matrix of interactions. To confirm that this definition of a configuration is proper,

it should be shown that the configurations that are represented by one matrix are very

similar in their shapes. For this purpose the principal moments of inertia of different

shapes that are represented by one matrix are calculated and plotted. To calculate the

principal moments of inertia, all the elements of the moment of inertia tensor, I, are

calculated according to

\[
I \equiv \sum_{i=1}^{N} m
\begin{pmatrix}
y_i^2 + z_i^2 & -x_i y_i & -x_i z_i \\
-y_i x_i & x_i^2 + z_i^2 & -y_i z_i \\
-z_i x_i & -z_i y_i & x_i^2 + y_i^2
\end{pmatrix}, \tag{3.11}
\]

where “i” is the bead index and “N” is the number of beads in the chain. The principal


moments of inertia are determined by finding the eigenvalues of this matrix, where Ix ≤Iy ≤ Iz. In Fig. 3.10, the variations of the principal moments of inertia are presented for

the most common structure of Model A at β∗ = 9, which is a relatively low temperature.

It can be observed in this figure that the principal moments of inertia lie in a reasonably

small range, and for example, Ix and Iy ranges do not have any overlap.
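The calculation of the principal moments of inertia can be sketched without external libraries. The code below builds the tensor of Eq. 3.11 and obtains its eigenvalues with the closed-form trigonometric solution for symmetric 3×3 matrices. Taking the coordinates relative to the center of mass is an assumption of this sketch, as are the function names; a linear algebra library would normally be used for the eigenvalues.

```python
import math


def inertia_tensor(coords, mass=1.0):
    """Moment of inertia tensor of equal-mass beads (Eq. 3.11).

    Coordinates are taken relative to the center of mass, which is an
    assumption of this sketch.
    """
    n = len(coords)
    com = [sum(r[k] for r in coords) / n for k in range(3)]
    t = [[0.0] * 3 for _ in range(3)]
    for r in coords:
        x, y, z = (r[k] - com[k] for k in range(3))
        t[0][0] += mass * (y * y + z * z)
        t[1][1] += mass * (x * x + z * z)
        t[2][2] += mass * (x * x + y * y)
        t[0][1] -= mass * x * y
        t[0][2] -= mass * x * z
        t[1][2] -= mass * y * z
    t[1][0], t[2][0], t[2][1] = t[0][1], t[0][2], t[1][2]
    return t


def principal_moments(t):
    """Eigenvalues of a symmetric 3x3 matrix, sorted so Ix <= Iy <= Iz."""
    p1 = t[0][1] ** 2 + t[0][2] ** 2 + t[1][2] ** 2
    if p1 == 0.0:                       # already diagonal
        return sorted(t[k][k] for k in range(3))
    q = (t[0][0] + t[1][1] + t[2][2]) / 3.0
    p2 = sum((t[k][k] - q) ** 2 for k in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(t[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    # Clamp against rounding before taking the arc cosine.
    phi = math.acos(max(-1.0, min(1.0, det / 2.0))) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [smallest, 3.0 * q - largest - smallest, largest]


# Two beads on the diagonal of the xy-plane: one vanishing moment.
moments = principal_moments(inertia_tensor([(1, 1, 0), (-1, -1, 0)]))
print([round(v, 6) for v in moments])
```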

This matter can also be verified visually, where all the shapes that are represented

by a matrix are compared with one of the shapes chosen as a reference shape. For doing

this comparison, each shape, except the reference shape, is rotated 1000 times randomly,

and then the sum of the square distances between each chain bead of the rotated shape

and the same chain bead in the reference shape is calculated according to

\[
D = \sum_{i=1}^{\mathrm{Size}} \vec{r}_i^{\,2}, \tag{3.12}
\]
where $\vec{r}_i$ refers to the separation between the two beads of the same index in the reference shape and the rotated shape. For each shape that is represented by the matrix, the one of the 1000 rotated shapes that has the minimum sum of square distances, D, is chosen for a movie.

The movie made by these selected shapes demonstrates that the shapes are similar

to each other for the matrix that represents a low energy configuration (having several

attractive bonds). However, for high energy configurations, where there is no attractive

bond or there is only one bond, one matrix can represent many different shapes. This

is expected since having no bond or only one bond means that there are only a few

constraints in the configuration to form its shape.

The spreads in the principal moments of inertia for shapes corresponding to the most

common structure (AE AI AM CG CK CO CS CW EI GK GS GW KO KS OS QU QY

SW UY) and the third most common structure (AE AI AM CG CK CO CS EI GK GS

KO KS KW OS OW QU QY SW UY) of model A at β∗ = 9 are compared in Fig. 3.11.

It can be seen that these two configurations, which only differ in two hydrogen bonds,


fill a large common area in the principal moments of inertia space.

Although, as was presented in Fig. 3.10, identical configurations (with exactly the same bonds) have similar principal moments of inertia, the principal moments of inertia are not a good indicator for distinguishing configurations: as can be seen in Fig. 3.11, two different structures may have similar principal moments of inertia. Therefore, the structures can only be distinguished by their matrix of interactions, explained in Sec. 3.1.1.

Since the structures are saved as interaction matrices, it is relatively easy to count the

number of occurrences of the different structures and to find the most common structures.

The most common configuration at each temperature is the configuration with the lowest

free energy of the system. Considering Eqs. 3.2 and 3.4, at low temperatures, the energy

dominates the entropy effects, and therefore, the structure with the lowest energy has

the lowest or one of the lowest free energies as well. Consequently, it is expected that at

very low temperatures, the lowest energy configuration is the most common structure.

The most common configurations at different temperatures for model A and model B are presented in Table 3.2 and Table 3.3, respectively. According to Fig. 3.2, for the model B 25-bead chain, the maximum number of attractive bonds is 8. As expected, the most common structure for model B at low temperatures, β∗ ≥ 4.5, has 8 attractive bonds and therefore has the lowest potential energy for this model. According to Table 3.2, the

lowest potential energy configuration in model A for the 25-bead chain has 21 attractive

bonds. However, according to Fig. 3.2(a), 36 possible attractive bonds are available for

the 25-bead chain in model A. This means that either the configurations with lower

energies that have more than 21 attractive bonds are not geometrically accessible (due

to constraints in the model) or their configurational entropies are too low to be observed

at these temperatures. It will be shown later (Sec. 3.2.6) that the first scenario is the

case. However, if the second scenario were true, the lower energy configurations would

become dominant by reaching lower temperatures.


β∗     the most common structure                                          fobs(%)
1.5    No bond                                                            14.2±0.6
4.5    AU AY CG CS CW EQ GK GO GS GW IM KO KS KW OS SW UY                 1.3±0.2
9.0    AE AI AM CG CK CO CS CW EI GK GS GW KO KS OS QU QY SW UY           5.6±0.4
14.0   AE AI AY CG CK CS CW EI GK GO GS IY KO KS KW MQ MU OS QU SW        9.7±0.6
24.0   AE AI AY CG CK CS CW EI GK GO GS IY KO KS KW MQ MU OS QU SW        10.6±0.6
38.4   AQ AU AY CG CO CS CW EI EM GK GO GS GW IM KO KS OS QU SW UY        8.5±0.6
72.5   AE AI AM CG CK CO CS EI GK GS GW KO KS KW OS OW QU QY SW UY        8.1±0.6
87.5   AE AI AM AQ AU AY CG CK EI EY GK IM IQ IY MQ MU OS QU QY SW UY     8.2±0.6

β∗     the second most common structure                                   fobs(%)
1.5    SW                                                                 2.1±0.2
4.5    AE AI AM EI EM GK GO GW IM KO KS KW OS OW QU QY SW UY              0.7±0.2
9.0    AE AI AM CG CK CO CS CW EI GK GW KO KW OS OW QU QY SW UY           5.0±0.4
14.0   AE AI AY CG CK CO CS CW EI GK GO IY KO KW MQ MU OS OW QU SW        8.7±0.6
57.5   AE AI AM CG CK CO CS EI GK GS GW KO KS KW OS OW QU QY SW UY        4.5±0.4
87.5   AE AU AY CG CS CW EY GK GS GW IM IQ KO KS KW MQ OS OW SW UY        6.6±0.6

Table 3.2: Most common configurations of the model A 25-bead chain, for the system with 170 replicas.



β∗     the most common structure    fobs(%)
1.5    No bond                      22.4 ± 1.2
3.0    No bond                      6.7 ± 1.0
3.5    BF JN                        4.0 ± 0.6
3.8    BF JN RV                     4.2 ± 0.6
4.2    BF FJ NR RV                  6.5 ± 0.8
4.5    BF BR BV FJ FV JN NR RV      7.5 ± 1.0

β∗     the second most common       fobs(%)
1.5    BF                           3.5 ± 0.6
3.0    BF                           5.6 ± 0.8
3.5    BF NR                        4.0 ± 0.6
3.8    BF FJ NR RV                  3.9 ± 0.6
4.2    BF FJ JN RV                  4.9 ± 0.8
4.5    BF FJ JN NR RV               6.4 ± 0.8
5.3    BF BR BV FJ JN NR RV         10.1 ± 0.8
13.5   N/A                          N/A

Table 3.3: Most common configurations of the model B 25-bead chain, for the system with 90 replicas.


3.2.3 Free energy landscape

As mentioned in chapter 1, the term energy landscape refers to the free energy as a

function of protein conformation specified here by the configuration matrix. Therefore,

to study the energy landscape at a specific temperature, the most common structures should be identified and the free energies of the different structures should be calculated at that temperature. Two structures are close in the landscape if they have similar configurations,

which means that they should have a large number of bonds in common. For model A,

these dominant structures are shown in Table 3.2, while those for model B are given in

Table 3.3. The most common structures at any temperature are those with the lowest

Helmholtz free energy at that temperature. Therefore, at low enough temperatures,

when the effect of entropy is small, the most common structure is the one with the

lowest possible potential energy. The term funnel refers to the relatively steep valley in

which the deepest point corresponding to the configuration with the lowest free energy

is easily accessible from almost anywhere inside the valley. This means that the barriers

between local minima located inside the funnel and the deepest point of the valley should

be small. If the barriers are relatively small it can be assumed that the chain folding

happens as a chain gliding down in the funnel shaped free-energy landscape along several

different paths towards its lowest free energy structure.

As can be seen in Table 3.2, as the temperature is decreased for model A, some dominant

structures are observed, but on decreasing the temperature further, the ratios of

their populations to the total population start to decrease and new structures become

dominant. It can be concluded that in this model, the shape of the landscape changes

significantly by varying the temperature, where at high temperatures the landscape is

riddled with many local minima and one very deep but wide minimum (no bonded struc-

ture), and at low temperatures there are a few narrow deep minima. For model A, either

there are deep local minima inside a funnel shaped valley or there are only a few deep

local minima beside each other. At the studied temperatures, there is no structure with a


Rank most common structure fobs(%)

1 BF BR BV FJ FV JN NR RV 76.0 ± 1.2

2 BF BR BV FJ JN NR RV 6.8 ± 0.8

3 BF BV FJ FV JN NR RV 3.8 ± 0.6

4 BF BR BV FV JN NR RV 1.9 ± 0.4

5 BF BR BV FJ FV JN RV 1.3 ± 0.4

6 BF BR BV FJ FV NR RV 1.0 ± 0.3

7 BF BR BV FJ FV JN NR 1.0 ± 0.3

Table 3.4: Most common configurations of the model B 25-bead chain at β∗ = 6.

very large population, which confirms that there is no very deep point in the free energy

landscape. Since the most common structures at each temperature differ from each other

in a few bonds, these deep minima are located close to each other in the landscape but

not necessarily inside a funnel. For example, as can be seen in Table 3.2, the first two most

common structures at β∗ = 57.5 differ in seven bonds. Hence, there are many barriers

to access one of these points from other ones, because seven bonds must be broken and

seven new bonds must be formed. On the other hand, these two structures share thirteen

bonds (65% of their total bonds), which indicates that they are similar and therefore,

their locations in the landscape are relatively close to each other.

By decreasing the temperature there is a process of trapping in and escaping from

these local minima. It was not possible to decrease the temperature further because this

would increase the number of replicas to the point that it would be impossible to have

very good PT dynamics.

Unlike the behavior observed in model A, by decreasing the temperature a single

dominant structure is identified in model B, where the probability of the most common

structure attains a value of nearly one at low temperatures (see Table 3.3). For β∗ ≥ 5.3,

the free energy landscape consists of a very deep funnel in which there are several deep


minima. The most common structures for β∗ = 6 are presented in Table 3.4. Since

a repulsive interaction requires the alphabetic indices of the two beads to be 4k + 1 and 4k′, none of the

seven most common structures has a repulsive bond. This is not surprising, since the

formation of a repulsive bond both limits the number of accessible conformations and is

energetically unfavorable. The most common structure, BF BR BV FJ FV JN NR RV,

is the deepest point in the funnel, and the six other most common structures (2nd-7th)

differ only in one bond from this structure. This means that there is a funnel shaped

valley where there is a very deep minimum inside and there are a few local minima beside

this deepest point of the landscape. According to Table 3.3, by lowering the temperature

the deepest point of the funnel becomes deeper while the other minima become shallower,

since the population of the most common structure reaches a value higher than 99.9%.

This means that by lowering the temperature the funnel becomes smoother and steeper,

and the lowest free energy configuration becomes more accessible.
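The closeness of two configurations in the landscape can be made concrete by treating each configuration as a set of bonds. The sketch below is only an illustration: the helper names (`parse_bonds`, `compare`, `bead_index`, `is_repulsive_pair`) are hypothetical, and the bead labelling A..Z = 1..26 followed by a..z = 27..52 is an assumption inferred from the tables. It checks that ranks 2 to 7 of Table 3.4 each lack exactly one bond of the rank 1 structure and that none of the listed bonds satisfies the repulsion criterion quoted above.

```python
def parse_bonds(structure):
    """Turn a string such as 'BF BR BV' into a set of bonded bead pairs."""
    return frozenset(structure.split())


def compare(a, b):
    """Return (shared, only_in_a, only_in_b) bond sets for two structures."""
    bonds_a, bonds_b = parse_bonds(a), parse_bonds(b)
    return bonds_a & bonds_b, bonds_a - bonds_b, bonds_b - bonds_a


def bead_index(label):
    """Assumed labelling: A..Z -> 1..26, then a..z -> 27..52."""
    if label.isupper():
        return ord(label) - ord("A") + 1
    return ord(label) - ord("a") + 27


def is_repulsive_pair(pair):
    """True if one bead index is 4k + 1 and the other is 4k' (order ignored)."""
    residues = sorted(bead_index(c) % 4 for c in pair)
    return residues == [0, 1]


# Structures copied from Table 3.4 (model B, 25 beads), rank 1 first.
TABLE_3_4 = [
    "BF BR BV FJ FV JN NR RV",
    "BF BR BV FJ JN NR RV",
    "BF BV FJ FV JN NR RV",
    "BF BR BV FV JN NR RV",
    "BF BR BV FJ FV JN RV",
    "BF BR BV FJ FV NR RV",
    "BF BR BV FJ FV JN NR",
]

top = TABLE_3_4[0]
for rank, other in enumerate(TABLE_3_4[1:], start=2):
    shared, only_top, only_other = compare(top, other)
    print(rank, len(shared), sorted(only_top), sorted(only_other))
```

Counting the bonds present in only one of the two structures gives a simple distance between configurations. For the structures above this distance from the rank 1 structure is one in every case.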

The trend of the probability of the most common structure for the two models can

be seen in Fig. 3.12: for model A the probability of the most common structure

at low temperatures fluctuates around a value far from one (0.08), whereas in model B for

the 25-bead chain the probability of the most common structure nearly reaches one.

Several of the most common structures of model B for chains longer than 29 beads

(such as the 35-bead chain) at very low temperatures have the same energy and similar

entropy values. Therefore, there is no single deep point in their energy landscape unlike

the case of the 25-bead chain and their landscape consists of several minima beside each

other inside a wide funnel. As will be discussed extensively later in section 3.2.6, this

happens for chains longer than 29 beads since the configuration with the theoretical

maximum number of bonds is geometrically prohibited.


[Figure 3.12 plot. Vertical axis: most common structure probability (0 to 1); horizontal axis: β∗ (0 to 50); curves: Model A, 25 beads, and Model B, 25 beads.]

Figure 3.12: Variation of the probability of the most common structure versus β∗

3.2.4 Entropy and free energy calculation for the model B 25-bead chain

As shown in Sec. 4.2.4, one can obtain the relative configurational entropies and con-

sequently their free energies from the probability ratio of configurations at a specific

temperature using Eq. 3.10. Since there are fewer possible structures in model B than

model A, the uncertainty associated with the population of each structure is smaller for

model B and consequently, the calculated entropy and free energy of each configuration

have a smaller uncertainty.

The main factor that can be used to categorize the range of entropy values is the

number of bonds. As can be seen in Table 3.5 the entropies of configurations with the

same number of bonds (4, 5 and 7, 8 and 10, 11) are different but they are in the

same range, and typically the entropy decreases by increasing the number of bonds and

adding more restrictions to the configuration shape. However, as shown in Table 3.4,

configurations with the same energy of −7ε have different populations; therefore, their


entropies are different. While the free energy value of a configuration clearly defines how

deep the landscape is for the configuration, the entropy of a configuration defines how

much area the configuration occupies in the landscape.


configuration Ec/ε Sc/kB

1 AD 0.9 31.3 ±0.8

2 No Bond 0.00 31.8 ±0.6

3 BF -1 28.6 ± 0.6

4 BF JN -2 25.1 ± 0.6

5 BF NR -2 25.2 ± 0.6

6 BF JN RV -3 21.7 ± 0.4

7 BF FJ NR RV -4 17.8 ± 0.6

8 BF FJ JN RV -4 17.6 ± 0.6

9 BF FJ JN NR RV -5 13.2 ± 0.6

10 BF BR BV FJ JN NR RV -7 3.7 ± 0.8

11 BF BV FJ FV JN NR RV -7 2.9 ± 0.6

12 BF BR BV FJ FV JN NR RV -8 0

Table 3.5: Potential energy in units of ε and relative entropy of the most common

structures of the model B 25-bead chain.

β∗ ∆F1,2 ∆F1,3 ∆F1,4 ∆F1,5 ∆F1,6 ∆F1,7 ∆F1,8 ∆F1,9 ∆F1,10 ∆F1,11 ∆F1,12

1.5 -1.21 -0.013 1.27 1.24 2.53 4.05 4.27 6.25 10.54 11.05 11.93

2.4 -1.09 -0.72 -0.3 -0.31 0.12 0.69 0.83 1.69 3.63 3.95 4.12

3.3 -1.04 -1.04 -1.01 -1.02 -0.97 -0.83 -0.73 -0.38 0.48 0.71 0.57

4.2 -1.01 -1.23 -1.41 -1.42 -1.60 -1.71 -1.63 -1.56 -1.31 -1.13 -1.46

5.1 -0.99 -1.35 -1.67 -1.68 -2.01 -2.27 -2.21 -2.33 -2.47 -2.33 -2.77

6 -0.98 -1.43 -1.86 -1.87 -2.29 -2.66 -2.61 -2.86 -3.29 -3.16 -3.69

9 -0.95 -1.59 -2.21 -2.21 -2.83 -3.41 -3.37 -3.87 -4.83 -4.74 -5.43

12 -0.94 -1.67 -2.38 -2.38 -3.09 -3.78 -3.75 -4.38 -5.59 -5.53 -6.29

Table 3.6: Helmholtz free energy differences relative to configuration 1 (∆F1,j = Fj − F1), in units of ε, for the configurations in

Table 3.5 (25-bead model B).


The entropy difference between any two common configurations (∆S) can be cal-

culated based on the ratio of their populations (cf. Eq. 3.10). By using the calculated

entropies, the calculation of the relative Helmholtz free energy between any pair of

configurations at any temperature becomes possible. This allows one to predict the population

of any structure at any temperature and to predict the temperature at which the populations

of two specific configurations become equal. In Table 3.6, the calculated free energies of

the 12 configurations of Table 3.5, relative to the free energy of the first configuration,

are presented as a function of temperature.
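The two steps described above can be sketched numerically. The sketch assumes that the population ratio of two configurations follows the canonical form f_i/f_j = exp(−β(F_i − F_j)) with F = E − TS, which in reduced units gives (S_i − S_j)/kB = ln(f_i/f_j) + β∗(E_i − E_j)/ε; whether this is written exactly as Eq. 3.10 is an assumption. The function names are hypothetical, and the energies and entropies are copied from Table 3.5.

```python
import math

# (E_c / eps, S_c / k_B) for configurations 1..12, copied from Table 3.5.
TABLE_3_5 = {
    1: (0.9, 31.3), 2: (0.0, 31.8), 3: (-1.0, 28.6), 4: (-2.0, 25.1),
    5: (-2.0, 25.2), 6: (-3.0, 21.7), 7: (-4.0, 17.8), 8: (-4.0, 17.6),
    9: (-5.0, 13.2), 10: (-7.0, 3.7), 11: (-7.0, 2.9), 12: (-8.0, 0.0),
}


def entropy_difference(f_i, f_j, e_i, e_j, beta):
    """(S_i - S_j) / k_B from populations f_i, f_j observed at reduced beta."""
    return math.log(f_i / f_j) + beta * (e_i - e_j)


def free_energy(config, beta):
    """Reduced Helmholtz free energy F / eps = E / eps - (S / k_B) / beta."""
    energy, entropy = TABLE_3_5[config]
    return energy - entropy / beta


def delta_f(config, beta, reference=1):
    """Free energy of `config` relative to the reference configuration."""
    return free_energy(config, beta) - free_energy(reference, beta)


# Ranks 2 and 1 of Table 3.4 (6.8% and 76.0% at beta* = 6) are configurations
# 10 and 12 of Table 3.5, with energies -7 and -8.
ds_10_12 = entropy_difference(6.8, 76.0, -7.0, -8.0, beta=6.0)
print(f"(S_10 - S_12)/k_B from populations: {ds_10_12:.2f}")

for beta in (1.5, 6.0, 12.0):
    print(beta, [round(delta_f(c, beta), 2) for c in (2, 12)])
```

With the sign convention F_c − F_1, these values can be compared directly with the ∆F1,2 and ∆F1,12 columns of Table 3.6, and the entropy difference obtained from the two populations with the tabulated value 3.7 ± 0.8 for configuration 10.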

The frequency of occurrence of two structures at a specific temperature can be used to

calculate their entropy difference. However, often there is no reasonable overlap between

the population distributions of the most common structure at a very low temperature

and the most common structure at a very high temperature (e.g.: configurations 2 and

12 of Table 3.5). Hence, one or two intermediate configurations should be employed to

find ∆S for these two configurations. For example, if A and D are the most common

configurations at very high and very low temperatures, respectively, and B and C are the

most common structures at intermediate temperatures, then once ∆SA,B, ∆SB,C and ∆SC,D have

been calculated, the entropy difference between A and D follows as their sum. By implementing

this technique, the relative entropy of any pair of configurations can be found. The potential

energy and relative entropy of some of the most common structures of model B for the 25-bead

chain are shown in Table 3.5.
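Written out, the chaining through the intermediate configurations B and C is a telescoping sum. The notation ∆S_{X,Y} = S_X − S_Y is assumed here, as is each pair being evaluated at a temperature where both members are well populated.

```latex
\begin{align}
\Delta S_{A,D} &= S_A - S_D \\
               &= (S_A - S_B) + (S_B - S_C) + (S_C - S_D) \\
               &= \Delta S_{A,B} + \Delta S_{B,C} + \Delta S_{C,D}
\end{align}
```

If the three steps are estimated from statistically independent data, their uncertainties would be expected to combine in quadrature, so each additional intermediate configuration widens the error bar on the final difference. This remark is a general statistical observation and not a statement about how the uncertainties in Table 3.5 were obtained.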

According to Fig. 3.2(b), the maximum number of attractive bonds for the Model B

25-bead chain is 8 bonds. Therefore, BF BR BV FJ FV JN NR RV, which is the

most common structure at low temperatures (refer to Table 3.3), is the lowest energy

configuration, and by lowering the temperature it is not possible to observe any other

configuration as the most common structure.

The trend of β∗∆F versus the configuration index of Table 3.5 is shown in Fig. 3.13.

∆F1c is based on the calculated Helmholtz free energy of Table 3.5. Since both the en-


β∗ configuration ppred fobs ∆(%)

1.5 No Bond 0.206 0.165 25

1.5 BF 0.068 0.059 15

1.5 RV 0.059 0.065 9

1.5 FJ 0.053 0.067 21

1.5 JN 0.052 0.064 19

3.0 BF BR BV FJ FV JN NR RV 0.096 0.075 28

3.0 BF FJ NR RV 0.076 0.064 19

3.0 BF FJ JN NR RV 0.063 0.064 2

3.0 BF FJ JN RV 0.064 0.059 8


4.0 BF BR BV FJ JN NR RV 0.076 0.068 12

4.0 BF BV FJ FV JN NR RV 0.036 0.038 5

4.0 BF BR BV FV JN NR RV 0.018 0.019 5

5.0 BF BR BV FJ FV JN NR RV 0.949 0.941 0.8

5.0 BF BR BV FJ JN NR RV 0.020 0.019 5

5.0 BF BV FJ FV JN NR RV 0.010 0.012 17

6.0 BF BR BV FJ FV JN NR RV 0.988 0.980 0.8

6.0 BF BR BV FJ JN NR RV 0.005 0.005 0


Table 3.7: Comparison of the predicted probability (ppred) and the simulation results for

the frequency (fobs), and their relative difference (∆), for the most common structures

of the model B 25-bead.


[Figure 3.13 plot. Vertical axis: β∗∆F (−80 to 20); horizontal axis: configuration index (0 to 12); curves: β∗ = 1.5, 3.9, 6 and 12.]

Figure 3.13: Variation of β∗∆F versus the configuration index of Table 3.5, where $\beta^* = 1/T^* = \varepsilon/(k_B T)$ and ∆F is the Helmholtz free energy difference with configuration 1 in units of ε.

tropy and energy of the configurations are decreasing from configuration 1 to 12, the trend

of β∗∆F is very different for high and low β∗ values. At high temperatures (β∗ ≤ 3),

the second configuration of Table 3.5 (configuration 2) with no bonds, which has the

maximum entropy, is the lowest free energy structure. This can be seen for β∗ = 1.5 in

Fig. 3.13, where the free energies of the structures with more bonds (higher

configuration index) are larger than those of the configurations with fewer bonds (lower

configuration index). On decreasing the temperature, however, when β∗ ≥ 4.5, the last structure of

Table 3.5 (configuration 12), which has the lowest potential energy, becomes the lowest

free energy structure. This behavior can be seen clearly in Fig. 3.13, where for β∗ = 6

and β∗ = 12 configuration 12 has the lowest free energy. This confirms that the effect

of entropy on the free energy at low temperatures is very small.

Using the calculated free energies, it is predicted that if 4.5 ≤ β∗, the last structure

of Table 3.5, configuration 12, becomes dominant, since for all the temperatures in that


[Figure 3.14 plot. Vertical axis: free energy difference (−14 to 6); horizontal axis: β∗ (0 to 16).]

Figure 3.14: Free energy difference of configurations 2 and 12 of Table 3.5 in units of ε versus β∗.

range (4.5 ≤ β∗) this configuration has the lowest free energy. The results representing the

population of each configuration confirm this prediction. The population of configuration

12 has the rank of 30th, 13th and 5th among all configurations for the β∗ values of 4.05,

4.2 and 4.35 respectively, and for 4.5 ≤ β∗, it is the most common configuration.

The relative free energy of configurations 2 and 12 is plotted versus β∗ in Fig. 3.14.

It can be seen that at β∗ ≈ 4 their free energies are equal, which implies that their

populations are the same. Indeed, the results show that the percentage populations of

configuration 2 and configuration 12 at β∗ = 3.9 are 1.0% and 0.5%, respectively and at

β∗ = 4.05 are 0.6% and 1.2%, respectively, which confirms that their populations should

become equal in the range 3.9 ≤ β∗ ≤ 4.05.
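The crossing point can be estimated directly from the entries of Table 3.5. Assuming F = E − TS with temperature-independent E and S, the free energies of two configurations are equal when β∗ = (S_a − S_b)/(E_a − E_b) in reduced units. The function name below is hypothetical; the numbers are those of configurations 2 and 12.

```python
def crossover_beta(e_a, s_a, e_b, s_b):
    """Reduced inverse temperature at which F_a = F_b, with F = E - S / beta."""
    if e_a == e_b:
        raise ValueError("equal energies: the free energy curves do not cross")
    return (s_a - s_b) / (e_a - e_b)


# Configuration 2 (no bonds):  E = 0,  S = 31.8  (Table 3.5)
# Configuration 12 (8 bonds):  E = -8, S = 0     (Table 3.5)
beta_cross = crossover_beta(0.0, 31.8, -8.0, 0.0)
print(f"populations of configurations 2 and 12 cross near beta* = {beta_cross:.3f}")
```

The estimate falls inside the interval 3.9 ≤ β∗ ≤ 4.05 bracketed by the observed populations.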

Probability calculation

One of the objectives of calculating the free energy is to predict the population of a specific

structure at any temperature using the probability of a configuration in the canonical


ensemble, $\mathrm{Pr}_i = \exp(-\beta F_i)/Z$, where $Z = \sum_{i=1}^{N} \exp(-\beta F_i)$, $i$ is the configuration

index and $N$ is the total number of configurations. In principle, the free energy values

of all possible configurations are required to compute the value of the configurational

partition function Z. However, here the structures that have a population of less than

0.5% of the total population at all the studied temperatures are eliminated from the

calculations. By not considering these rarely occurring structures,

the value of the partition function Z is underestimated,

which implies that the predicted probabilities of the retained configurations are

overestimated. The reason for eliminating configurations with populations of less than

0.5% at all the studied temperatures is that, because of their small populations,

the statistical uncertainties of their computed entropies are too

large to be reliable. In addition, their small populations imply that their free energies are

high, and hence their Boltzmann weights small, at all the studied temperatures, so that neglecting them does not have a significant

effect on the computed value of Z and consequently on the calculated probabilities.
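The effect of truncating the partition function can be illustrated with a small sketch. It uses only the twelve configurations of Table 3.5, which are a subset of the configurations retained in the actual calculation, so the numbers it produces are not the predicted probabilities of this work. The function name and the choice of which entries to drop are hypothetical and serve only to show the direction of the bias.

```python
import math

# (E_c / eps, S_c / k_B) for configurations 1..12, copied from Table 3.5.
TABLE_3_5 = {
    1: (0.9, 31.3), 2: (0.0, 31.8), 3: (-1.0, 28.6), 4: (-2.0, 25.1),
    5: (-2.0, 25.2), 6: (-3.0, 21.7), 7: (-4.0, 17.8), 8: (-4.0, 17.6),
    9: (-5.0, 13.2), 10: (-7.0, 3.7), 11: (-7.0, 2.9), 12: (-8.0, 0.0),
}


def predicted_probabilities(configs, beta):
    """Pr_i = exp(-beta F_i) / Z over the supplied configurations only."""
    # In reduced units, -beta * F = -beta * E + S / k_B.
    log_w = {name: -beta * e + s for name, (e, s) in configs.items()}
    shift = max(log_w.values())  # subtract the maximum to avoid overflow
    weights = {name: math.exp(v - shift) for name, v in log_w.items()}
    z = sum(weights.values())
    return {name: w / z for name, w in weights.items()}


full = predicted_probabilities(TABLE_3_5, beta=6.0)

# Drop two configurations purely for illustration: Z shrinks, so every
# remaining probability grows.
kept = {k: v for k, v in TABLE_3_5.items() if k not in (10, 11)}
truncated = predicted_probabilities(kept, beta=6.0)

print(f"configuration 12, all twelve kept: {full[12]:.3f}")
print(f"configuration 12, two dropped:     {truncated[12]:.3f}")
```

Because every omitted configuration removes a positive term from Z, the probabilities of all retained configurations can only increase, which is the overestimation referred to above.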

For calculating the probabilities of the configurations with 25 beads, 78 configurations

were chosen and while there is a systematic error because of not considering all the

possible configurations, the predicted probabilities are very close to the observed ones

from the simulation runs, as can be seen in Table 3.7. According to this table, the

predicted values agree better with the simulation results at lower temperatures. The

reason for this behavior is related to the fact that some configurations with very low

populations have not been considered in the probability calculations and since these

configurations occur more frequently at high temperatures, neglecting their contribution

leads to a larger error at high temperatures.
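The relative difference ∆ in Table 3.7 is not defined explicitly in the text. The sketch below assumes ∆ = |ppred − fobs|/fobs × 100, a definition inferred only because it reproduces the tabulated values for the rows tried here. The function name is hypothetical.

```python
def relative_difference_percent(p_pred, f_obs):
    """Assumed definition: |p_pred - f_obs| / f_obs, expressed in percent."""
    return abs(p_pred - f_obs) / f_obs * 100.0


# (p_pred, f_obs, tabulated difference in %) copied from rows of Table 3.7.
rows = [
    (0.206, 0.165, 25),
    (0.068, 0.059, 15),
    (0.059, 0.065, 9),
    (0.949, 0.941, 0.8),
]

for p_pred, f_obs, tabulated in rows:
    computed = relative_difference_percent(p_pred, f_obs)
    print(f"p_pred={p_pred:.3f} f_obs={f_obs:.3f} "
          f"computed={computed:.1f}% tabulated={tabulated}%")
```

Rows whose probabilities are quoted to only one or two significant figures may not reproduce the tabulated ∆ exactly, since the tabulated value was presumably computed before rounding.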


3.2.5 Entropy and free energy calculation for the 35-bead protein-like chain

The entropies and free energies of Model B 35-bead configurations are calculated in a

similar way to the 25-bead case. Adding only 10 beads to the chain changes the number

of possible attractive bonds from 8 in the 25-bead chain to 23 in the 35-bead chain

(cf. Fig. 3.2), which results in a much more complex energy landscape. This dramatic change

in landscape can be seen in Table 3.8 and Fig. 3.16, where we see that unlike the 25-bead

chain, the probability of the most common structure at even very low temperatures does

not become close to one.

As can be seen in Table 3.8, by increasing β∗ (decreasing temperature) a few structures

become dominant at different temperatures. Except for the lowest energy configuration

with 23 attractive bonds, other energies are degenerate with multiple configurations

possessing the same number of bonds. It will be shown in Sec. 3.2.6 that a

structure with 23 attractive bonds is geometrically prohibited. The configurations with

21 or 22 attractive bonds have not been observed in any of the runs. However, if it

is assumed that configurations with 21 or 22 bonds are possible, even by lowering the

temperature it is not possible to observe one dominant structure because there should

be several configurations with 21 or 22 attractive bonds. Therefore, the trend that has

been observed for the 25-bead chain is not expected for the 35-bead chain even at very

low temperatures.

The landscape of the 35-bead chain is different from the 25-bead chain landscape

because of the large entropic barriers between configurations with different energies. All

the most common structures in Table 3.8 at high β∗ have an energy of −19ε. Besides

the two main configurations with the energy of −19ε, which are presented in Table 3.8,

there are at least 18 other configurations with the same potential energy but with lower

entropies (cf. Table 3.9). As can be seen in this table, three structures with an energy



1.5 No Bond 11.7 ± 1.3

3.75 BF NR VZ dh 0.7 ± 0.3

4.5 BF Bd Bh FJ FZ Fd Fh JZ Jd Jh NR RV Zd dh 2.7 ± 0.6

5.25 BF BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd RV VZ Zd dh 7.2 ± 0.9


16.5 BF BR BV BZ Bh FJ FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh 25.8 ± 1.8



β∗ the second most common structure fobs(%)

1.5 dh 2.3 ± 0.6

3.75 BF FJ JN RV Zd dh 0.6 ± 0.3


5.25 BF BR BV BZ Bd Bh FJ Fd Fh JN Jh NR Nh RV Rh VZ Zd dh 5.3 ± 0.9


16.5 BF BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd Nh RV VZ Zd dh 14.4 ± 1.4



Table 3.8: Most common configurations of the model B 35-bead chain, for the system

with 110 replicas.


of −20ε and with relatively low entropies have been observed in the runs. The results of

many runs suggest that there is only one configuration with a potential energy of −20ε,

but the result of one of the runs revealed the other two structures, configurations 22 and

23. Based on their populations, configurations 22 and 23 should have very low entropies.

According to Table 3.8, even at very low temperatures, configuration 21 is not among the

first two most common structures. Configuration 21 with the lowest observed energy is

different in five bonds from the first configuration of Table 3.8, which has been the most

common structure at the lowest studied temperatures. Hence, these two configurations

cannot be located inside one steep funnel, and there is a huge entropic barrier between

configuration 21 and other structures with higher energies, such as the first configuration,

that can be overcome by lowering the temperature much further.

Based on the calculated entropies and energies of configurations 1 and 21 in Table 3.9,

it is predicted that at β∗ ≥ 63 configuration 21, with 20 bonds, becomes the most common

structure. However, at the lowest studied temperature, β∗ = 64.5, configurations 21 and

1 have 15% and 20% of the total population respectively, which does not support the

prediction, but it shows that by slightly lowering the temperature, configuration 21 should

become the most common structure. However, since there are other structures with the

same energy (cf. Table 3.9), its probability will not approach one.
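The prediction for configurations 1 and 21 can be retraced from the entries of Table 3.9. The sketch assumes ln(P_a/P_b) = −β∗(E_a − E_b)/ε + (S_a − S_b)/kB, as follows from the canonical probabilities with F = E − TS. The function name is hypothetical, and the two observed populations are those quoted above for β∗ = 64.5.

```python
import math


def log_population_ratio(e_a, s_a, e_b, s_b, beta):
    """ln(P_a / P_b) in reduced units for two configurations a and b."""
    return -beta * (e_a - e_b) + (s_a - s_b)


E1, S1 = -19.0, 62.9    # configuration 1, Table 3.9
E21, S21 = -20.0, 0.0   # configuration 21, Table 3.9 (entropy 0.0 +/- 1.0)

# Populations are equal where the log-ratio vanishes.
beta_equal = (S1 - S21) / (E1 - E21)

predicted = log_population_ratio(E21, S21, E1, S1, beta=64.5)
observed = math.log(15.0 / 20.0)  # 15% and 20% of the total population

print(f"predicted crossover at beta* = {beta_equal:.1f}")
print(f"ln(P21/P1) at beta* = 64.5: predicted {predicted:.2f}, observed {observed:.2f}")
```

The two log-ratios differ by somewhat less than two units of kB, which can be set against the ±1.0 quoted for the entropy of configuration 21. Because the crossover temperature depends linearly on this entropy difference, an error of that size shifts the predicted crossover by a comparable amount in β∗.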

The landscape for 35-bead chains at low temperatures is very different from the deep

and steep funnel that was observed for the 25-bead chain landscape. At low temperatures

the most common structures of the 35-bead chain are a few configurations with the

same energy. For example, as can be seen in Table 3.8, the two most common structures for

16.5 ≤ β∗ ≤ 53 have 19 attractive bonds. While these two structures differ slightly in

their populations, structurally they differ by more than one bond, quite unlike the seven

most common structures of the 25-bead chain at β∗ = 6 (cf. Table 3.4), which only differ

from each other by one bond. Since the most common structures of the 35-bead chain

at low temperatures share most of their bonds, their points in the landscape should be


configuration Ec/ε Sc/kB

1 BF BR BV BZ Bh FJ FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh -19 62.9

2 BF BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd Nh RV VZ Zd dh -19 62.3±0.4

3 BF BR BV BZ Bd Bh FJ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh -19 62.3±0.5

4 BF BR BV BZ Bd Bh FJ FZ Fd Fh JN Jh NR Nh RV Rh VZ Zd dh -19 61.5±0.6

5 BF BR BV BZ Bd Bh FJ FV Fh JN Jh NR Nd Nh RV Rh VZ Zd dh -19 61.4±1.3

6 BF BR BV BZ FJ FV FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh -19 61.3±1.5

7 BF BR BV BZ Bd Bh FJ FZ Fd JN Jd Jh NR Nh RV Rh VZ Zd dh -19 61.3±0.5

8 BF BR BV Bh FJ FV FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh -19 61.2±0.3

9 BF BR BV BZ Bd Bh FJ FZ Fd JN Jd NR Nd Nh RV Rh VZ Zd dh -19 60.6±1.5

10 BF BV BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd RV VZ Zd dh -19 60.4±0.5

11 BF BZ Bd Bh FJ FZ Fd Fh JN JZ Jd Jh NR Nd Nh RV VZ Zd dh -19 60.1±0.9

12 BF BV BZ Bd FJ FV FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh -19 60.0±0.7

13 BF BR BV FJ FV FZ Fh JN JZ Jd Jh NR Nd Nh RV Rh VZ Zd dh -19 59.5±2.5

14 BF BZ Bd Bh FJ FV FZ Fd Fh JN JZ Jd Jh NR Nd RV VZ Zd dh -19 59.4±1.2

15 BF BR BV BZ FJ FV FZ Fd JN JZ Jd NR Nd Nh RV Rh VZ Zd dh -19 59.4±1.0

16 BF BV BZ Bd Bh FJ FV FZ Fd Fh JN JZ Jd Jh NR RV VZ Zd dh -19 59.3±1.0

17 BF BR BV BZ Bd FJ FZ Fd Fh JN Jd Jh NR Nd Nh RV VZ Zd dh -19 59.0±0.7

18 BF BR BV BZ Bd FJ FZ Fd JN Jd Jh NR Nd Nh RV Rh VZ Zd dh -19 58.9±1.0

19 BF Bd Bh FJ FZ Fd Fh JN JZ Jd Jh NR Nd Nh RV Rh VZ Zd dh -19 58.9±0.8

20 BF BR BV BZ Bh FJ FZ Fd JN JZ Jd Jh NR Nh RV Rh VZ Zd dh -19 58.7±0.7

21 BF BR BV BZ Bh FJ FV FZ JN JZ Jd Jh NR Nd Nh RV Rh VZ Zd dh -20 0.0±1.0

22 BF BR BV Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd Nh RV Rh VZ Zd dh -20 N/A

23 BF BR BV Bh FJ FV FZ Fh JN JZ Jd Jh NR Nd Nh RV Rh VZ Zd dh -20 N/A

Table 3.9: The lowest potential energy configurations of the model B 35-bead chain.

Since the reference point for calculating the entropies is the configuration 1, its entropy

variation is zero. The first twenty configurations are presented in order of their entropies.


close to each other. However, since the most common structure, as the deepest point of

the landscape, differs from other common structures by more than one bond, their points

in the landscape are not necessarily inside one steep valley. Therefore, the landscape at

low temperatures for the 35-bead chain consists of several minima that are close but not

necessarily inside one steep funnel, and there is no very deep point in the landscape similar

to the 25-bead case.
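Why no single structure can dominate among degenerate configurations can be seen from a short estimate. If only configurations of equal energy compete, the Boltzmann factor of the energy cancels and the probabilities reduce to p_i = exp(S_i/kB)/Σ_j exp(S_j/kB). The sketch below applies this to the twenty −19ε entries of Table 3.9. It assumes that configurations of other energies, as well as unlisted −19ε configurations, can be neglected, which is at best reasonable in an intermediate range of β∗. The function name is hypothetical.

```python
import math

# S_c / k_B of configurations 1..20 (all with E = -19 eps), from Table 3.9.
S_19 = [62.9, 62.3, 62.3, 61.5, 61.4, 61.3, 61.3, 61.2, 60.6, 60.4,
        60.1, 60.0, 59.5, 59.4, 59.4, 59.3, 59.0, 58.9, 58.9, 58.7]


def degenerate_probabilities(entropies):
    """Probabilities within a set of equal-energy configurations."""
    s_max = max(entropies)  # shift by the maximum to avoid overflow
    weights = [math.exp(s - s_max) for s in entropies]
    total = sum(weights)
    return [w / total for w in weights]


probs = degenerate_probabilities(S_19)
print(f"configuration 1: {probs[0]:.3f}")
print(f"configuration 2: {probs[1]:.3f}")
```

Under these assumptions the largest probability stays well below one half, and the first two values can be compared with the populations observed at β∗ = 16.5 in Table 3.8 (25.8 ± 1.8% and 14.4 ± 1.4%).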

Obstacles

While the range of energies and entropies of the possible configurations are 8ε and 32kB

respectively for 25-bead chains, these ranges are increased to 20ε and 140kB respectively

for 35-bead chains, which confirms the view that the landscape of the 35-bead chain is

much wider than the 25-bead chain landscape. This also shows that for studying the

landscape a much wider range of temperatures and more replicas are required.

Predicting the probabilities of configurations for 35-bead chains is much more difficult

numerically than for 25-bead chains. To predict the probability of each configuration its

free energy as well as the free energies of almost all other configurations need to be

estimated. The need for a vast number of free energy estimates makes the probability

calculation for the 35-bead chain much harder, and consequently, the errors associated

with these calculations are clearly larger. In Table 3.10, it is evident that the predicted

and observed probabilities for β∗ = 9.8 have larger statistical uncertainties than those

observed in smaller systems (cf. Table 3.7).

3.2.6 Effects of the protein-like chain length

As was seen for 25-bead chains, the probability of the most common structure approaches

one at relatively low temperatures. In contrast, the probability of the most common

structure of the 35-bead chain does not become close to one at the low temperatures

studied here. There are two possible reasons for this behavior. First, the studied range


Configuration structure ppred fobs ∆ (%)

BF BR BV BZ Bh FJ FZ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh 0.21 0.21 0

BF BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd RV VZ Zd dh 0.18 0.13 38

BF BZ Bd Bh FJ FV FZ Fd Fh JN Jd Jh NR Nd Nh RV VZ Zd dh 0.09 0.08 13

BF BR BV BZ Bd Bh FJ Fd Fh JN Jd Jh NR Nh RV Rh VZ Zd dh 0.06 0.07 6

BF Bd Bh FJ FZ Fd Fh JN JZ Jd Jh NR Nd Nh RV VZ Zd dh 0.14 0.06 130

Table 3.10: Comparison of the predicted probability and the simulation results values

for the most common structures of the model B 35-bead chain.

[Figure 3.15: plot of the most common structure probability (vertical axis, 0 to 1) versus β∗ (horizontal axis, 1.5 to 16.5) for Model B chains with 15, 20, 25 and 29 beads.]


[Figure 3.16: plot of the most common structure probability (vertical axis, 0 to 1) versus β∗ (horizontal axis, 0 to 50) for the Model B chains with 29, 30 and 35 beads. The result of the 29-bead chain from Figure 3.15 is presented here as a reference.]

of temperature was not sufficiently large and therefore, the lowest energy configuration

(theoretically according to Fig. 3.2) has not been observed in the simulation runs. The

second possible reason for this behavior can be that the lowest possible energy is not

geometrically accessible considering the criteria of model B (Table 3.1) and by increasing

β∗ (decreasing the temperature), several structures with the same energy are competing

for the highest probability. While the configurational entropies of these configurations

are different, there is no configuration with a much higher entropy than all the other

structures with the same energy, and hence the probability of none of them can approach

one. Several pieces of evidence support the second scenario, as

explained here.

According to Table 3.1 and Fig. 3.2, the maximum possible number of attractive

bonds for the chains with length of 15, 20, 25, 29, 30 and 35 beads are 3, 5, 8, 12, 17

and 23 bonds respectively in model B. As can be seen in Fig. 3.15, for the 15-, 20-, 25- and

29-bead chains the probability of the most common structure approaches one at low

temperatures. This happens because the configuration with the lowest possible energy,

which has the maximum number of bonds, becomes dominant at low temperatures. As

was seen in Table 3.9 for the 35-bead chain, for the longer chains the entropy difference

between configurations that differ in only one bond becomes very large for low

energy structures.

For the 29-bead chain there is a peak in the most common structure probability

at β∗ = 7.5, which happens because of the very large ∆S between the most common

structure with 11 bonds and the most common structure with 12 bonds. Therefore, for

some β∗ values (4.35 ≤ β∗ ≤ 9), a configuration with 11 bonds becomes the most common

structure because of its higher entropy in comparison to the 12-bond configuration as well

as other 11-bond configurations. The probability of the 11-bond structure increases until

β∗ = 7.5; beyond this point the probability of the configuration with 12 bonds

increases because the effect of entropy weakens at lower temperatures (β∗ ≥ 7.5). At β∗ = 9

their probabilities become equal, and for β∗ > 9 the structure with 12 bonds becomes

the most common structure.
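The crossing at β∗ = 9 gives a rough estimate of the entropy gap between the two structures. The estimate below considers only the dominant 11-bond structure and the 12-bond structure, and assumes that they differ in energy by a single attractive bond of strength ε, as in Table 3.5. The subscripts 11 and 12 denote the number of bonds and are introduced here only for illustration.

```latex
\frac{P_{11}}{P_{12}}
  = \exp\!\left[\frac{S_{11}-S_{12}}{k_B}
      - \beta^{*}\,\frac{E_{11}-E_{12}}{\varepsilon}\right] = 1
\quad\Longrightarrow\quad
\frac{S_{11}-S_{12}}{k_B}
  = \beta^{*}\,\frac{E_{11}-E_{12}}{\varepsilon}
  \approx 9 \times 1 = 9 .
```

Under these assumptions, an entropy difference of about 9 kB for a single bond is much larger than the differences of roughly 3 to 4 kB per bond between the sparsely bonded 25-bead configurations in Table 3.5 (for example, configurations 2 and 3), consistent with the very large ∆S invoked above.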

The configuration with the theoretical maximum number of bonds appears to be

geometrically impossible for chains longer than 29 beads. For the 30-bead chain, the maximum

possible number of bonds is 17. No structure with 17 bonds was observed in several

runs with different numbers of replicas, different PT temperature sets and different ranges

of temperatures, which suggests that satisfying all possible bonds for the 30-bead chain is

geometrically impossible. If this is the case, satisfying all possible bonds for any longer

chain should be impossible as well. Since the range of studied temperatures is larger for the

30- and 35-bead chains, it is much harder to achieve very good PT dynamics than

for the 15-, 20-, 25- and 29-bead chains. The results presented in Fig. 3.16

for the 30- and 35-bead chains cover a relatively wide temperature range in which the

PT dynamics is relatively good. As can be seen in Fig. 3.16, when 4.5 ≤ β∗ ≤ 7.5, the


probability of the most common structure increases for the 30-bead chain (similar to the

behavior observed for the 15-, 20-, 25- and 29-bead chains). Then, the probability of

the most common structure remains roughly constant at about 0.70 as β∗ increases, up to a β∗

value that can vary between β∗ = 13.5 and β∗ = 21, depending on the PT dynamics

setup (e.g., the initial temperature set and the number of replicas in the run). After this

plateau in the graph, the most common structure probability decreases until reaching

a β∗ value at which the population of the structure with the highest entropy among the

16-bond structures becomes equal to the population of the structure with the highest

entropy among the 15-bond structures. This is where the probabilities of the two most common

structures become equal, which can be seen as a local minimum in the plot for the

30-bead chain in Fig. 3.16. After passing this local minimum, a structure with 16 bonds

becomes the most common structure. However, because there are at least six structures

with 16 bonds, the probability of the most common structure does not become close

to one even at very low temperatures. Since the structure with 17 bonds (theoretically

the lowest free energy structure) appears to be geometrically impossible, several configurations with 16

bonds become very common at very low temperatures, where their populations mainly

depend on their configurational entropies (configurations with higher energies have much

higher free energies). Therefore, as can be seen in Fig. 3.16, the most common structure

probability converges to a value that is lower than one at very low temperatures.

Attractive bonds can form in the range 4.6 Å ≤ rij ≤ 5.8 Å (σ1 = 4.6 Å and

σ2 = 5.8 Å), where rij is the distance between beads i and j. Because of these geometrical

criteria, the 30-bead chain structure with 17 bonds appears to be inaccessible. If this

is indeed the reason, then increasing the range of the

attractive bond should make it possible to observe a structure with 17 bonds. Since the

entropic barriers become smaller as the attractive bond range increases, the shape of the

graph representing the probability of the most common structure should change as well.

To change the attractive bond range, σ1 was kept constant and only σ2 was increased.
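The distance criterion can be written as a short check. This is only a sketch of the geometric test: the function names are hypothetical, the coordinates are made up for illustration, and the rules for which pairs of beads are eligible to bond (Table 3.1) are not encoded.

```python
import math

SIGMA_1 = 4.6  # inner edge of the attractive range, in angstroms (from the text)
SIGMA_2 = 5.8  # outer edge of the attractive range, in angstroms (from the text)


def bead_distance(p, q):
    """Euclidean distance between two bead positions given as (x, y, z)."""
    return math.dist(p, q)


def is_attractive_bond(r_ij, sigma_1=SIGMA_1, sigma_2=SIGMA_2):
    """True if the separation r_ij lies inside the attractive range."""
    return sigma_1 <= r_ij <= sigma_2


# Hypothetical pair of eligible beads separated by 6.0 angstroms.
r = bead_distance((0.0, 0.0, 0.0), (6.0, 0.0, 0.0))
print(is_attractive_bond(r))               # original range
print(is_attractive_bond(r, sigma_2=6.2))  # widened outer edge
```

Widening only the outer edge σ2 admits pairs that were previously just out of range while leaving the inner edge of the bond unchanged.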


[Figure 3.17: plot of the most common structure probability (vertical axis, 0 to 1) versus β∗ (horizontal axis, 0 to 30) for the 30-bead chain for different attractive bond interaction distances (increasing σ2 from the initial 5.8 Å to 6.4 Å, 6.7 Å and 6.9 Å).]

At σ2 = 6.2 Å, it was possible to observe a configuration with 17 attractive bonds at low

temperatures, while this structure was not observed for the runs with σ2 ≤ 6.1 Å. In

Fig. 3.17, for σ2 = 6.4 Å the probability of the 17-bond structure as the most common

structure approaches one around β∗ = 27, and on increasing the value of σ2 this occurs

at lower β∗, since the entropic barriers between the low energy configurations, such

as the configurations with 15 bonds and 16 bonds, become smaller. The first bump

in the graphs of Fig. 3.17 represents a temperature region where the structure with 15

bonds becomes the most probable configuration, and the second bump occurs at higher

β∗ values, where the 16-bond configuration becomes the most probable structure. Since

the entropy difference between the configurations with different energies becomes smaller

for larger σ2, the range of β∗ in which the structure with 15 bonds is the most

common structure becomes smaller for larger σ2 values, as can be seen for σ2 = 6.7 Å and

σ2 = 6.9 Å in Fig. 3.17.

Therefore, for chains smaller than 30 beads at low temperatures the landscape consists
of one deep funnel that contains several minima. The funnel becomes steeper as
the temperature decreases, since the effect of entropy decreases
and, consequently, the free energy difference between the structure with the maximum
number of bonds and the structures with fewer bonds increases. At very
low temperatures the landscape consists of a funnel with a very deep point representing
the configuration with the maximum number of bonds. But for chains longer than 29
beads, at low temperatures the landscape consists of several minima that become further
from each other as the chain length increases. Even at low temperatures, the landscape
of the longer chains does not consist of one deep funnel but of several minima
or funnels. The landscape becomes much more complex as the length of the
chain increases. For example, the ranges of energy and entropy of the 35-bead chain are 2.5 and
4.4 times larger, respectively, than those of the 25-bead chain.

The observed landscape behavior can give an insight into the native structure of

proteins. While for small proteins the native structure might be the lowest free energy

structure, there is the distinct possibility that for longer proteins the native structure is

not necessarily the lowest free energy structure. The native structure for longer proteins

might be one of the lowest free energy structures that can be accessed easily during

the folding dynamics. The effects of temperature on the free energy landscape seem to

be larger for longer chains. While for small chains at lower temperatures, the lowest

free energy structure does not change by decreasing the temperature, for longer chains

at the same range of temperatures, the deepest point of the landscape may change to

another structure with slightly different bonds by decreasing the temperatures. Thus,

the native structure may be more sensitive to fluctuations in temperature for some of the

long proteins.

Chapter 4

Protein-like Chain Inside a Solvent

While some experiments have shown that a protein can fold into its native configuration

with apparently negligible solvent ordering effects[39], in nature, and consequently in

most experimental studies, folding occurs in the presence of a fluid environment (in vitro

or in vivo), where it has been suggested that “a significant portion of the fold-dictating

information is encoded by the atomic interaction network in the solvent-unexposed core

of protein domains”[13]. In this chapter, the thermodynamics of a protein-like chain

interacting via discontinuous potentials is examined in the presence of a square-well fluid

capable of forming bonds with selected parts of the chain.

4.1 The System

The system consists of a protein-like chain inside an environment with thousands of

solvent particles. The protein-like chain is the same beads on a string model described in

the previous chapter (Chapter 3) in which each bead represents one amino acid or residue.

All intra-chain interactions of the protein-like chain are the same as the main model of

the previous chapter, model B (cf. Table 3.1 and Fig. 3.2). Similar to the previous

chapter, beside introducing the reduced units for energy and temperature, physical units

in the definition of the model are also introduced to make contact with real proteins.


In particular, lengths will be expressed in Angstroms, energies in kJ/mol and masses in

atomic mass units.

4.1.1 The solvent model

The solvent consists of N molecules in a fixed volume V which interact via a square-well
potential. The square-well fluid has been studied extensively [60, 61, 62, 63, 64, 65, 66,
67, 68]. The interaction between any pair of solvent particles is a square-well potential,
depicted in Fig. 3.1(b). To be able to compare with the previous studies, a popular set
of parameters has been used, where σ and σ′, representing the inner and outer points of
discontinuity of the potential well, satisfy σ′/σ = 1.5. σ and σ′ are chosen to be 4.16 Å and
6.24 Å, respectively, and the potential depth for the square-well interaction between the
fluid beads, εl, is defined as (0.35/1.5)ε ≃ 0.23ε. Therefore, the energy of each hydrogen
bond between two solvent particles is εl = 4.7 kJ/mol, which is a relatively weak hydrogen
bond in comparison to the intra-chain hydrogen bonds with an energy of 19.9 kJ/mol.

The mass of each fluid particle is chosen as ml = 0.15mp, where ml and mp are the

masses of a fluid particle and a chain bead respectively. This choice makes the fluid

particles much lighter than the chain beads. In physical units, the solvent particle mass

is very close to that of a water molecule, i.e., 18 amu, and the mass of each bead is very

close to an average mass of amino acid, i.e., 120 amu. Choosing relatively light solvent

particles influences the sampling efficiency of the simulations, and consequently the cost

of simulation runs.

The solvent and the chain interact as follows. The solvent particles can make bonds
with the chain beads i = 4k + 2, where k is a non-negative integer, with a potential
depth of εl. The interaction range is the same as that of the hydrogen bonds between the chain
beads, for which the square-well parameters σ1 and σ2 are chosen to be 4.64 Å and 5.76 Å,
respectively. Hence, the same beads that are involved in making attractive bonds inside
the protein-like chain are involved in making hydrogen bonds with the solvent particles.
Other chain beads have a hard sphere repulsive interaction with the solvent particles.
The hard sphere interaction range is set to a relatively large value of 6.4 Å (1.54 σ) to
mimic the hydrophobicity of most amino acids. The main reason for choosing this value
is related to the possible number of bonds between a chain bead and the solvent particles.
With the range of hydrogen bonds between chain beads and the solvent particles fixed, the
range of hydrogen bonds between solvent particles and the hard core repulsion distances
were varied. It was found that with this set of parameters, and especially with this large
hard sphere repulsion distance, the number of bonds between a chain bead and the solvent
particles is limited to four, which only happens at very low temperatures.
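The pair interactions described above can be summarized in a short sketch. Energies are in units of ε and distances in Å, using the values quoted in the text. The function names are hypothetical, and treating the inner distance σ1 of the solvent-bead well as a hard core is an assumption made here by analogy with the solvent-solvent square well; it is not stated explicitly in this section.

```python
import math

EPS_L = 0.35 / 1.5  # solvent well depth in units of eps (value from the text)


def square_well(r, inner, outer, depth):
    """Hard core below `inner`, constant well of `depth` up to `outer`, zero beyond."""
    if r < inner:
        return math.inf
    if r <= outer:
        return -depth
    return 0.0


def hard_sphere(r, diameter):
    """Purely repulsive interaction: infinite inside `diameter`, zero outside."""
    return math.inf if r < diameter else 0.0


def is_bonding_bead(i):
    """Beads i = 4k + 2 (k = 0, 1, 2, ...) with 1-based bead numbering."""
    return i >= 2 and (i - 2) % 4 == 0


def solvent_solvent(r):
    """Solvent-solvent square well with sigma = 4.16 and sigma' = 6.24."""
    return square_well(r, 4.16, 6.24, EPS_L)


def solvent_bead(r, i):
    """Solvent-chain interaction; the inner hard core at 4.64 is an assumption."""
    if is_bonding_bead(i):
        return square_well(r, 4.64, 5.76, EPS_L)
    return hard_sphere(r, 6.4)
```

For instance, `solvent_bead(5.0, 6)` lies inside the attractive well of a bonding bead, whereas `solvent_bead(5.0, 3)` falls inside the hard sphere of a non-bonding bead.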

To simulate a system at a given density, the simulation occurs in a cubic box of size
L × L × L that contains N solvent particles and one protein-like chain. To minimize
finite-size effects, periodic boundary conditions are used. To avoid artifacts due to the
periodic boundaries, L should be chosen large enough to allow the protein-like chain
to be stretched without the two end beads of the chain affecting each other either
directly or through solvent-induced interactions. The maximum observed value of the
end-to-end distance in the previous study was used as the worst-case scenario, and the
value for L was chosen to be comfortably larger. Because of the next-nearest neighbor
distance restriction, the maximum end-to-end distance can be determined analytically
from the model's definition. The values used for L are roughly 10 Å larger than the
theoretical maximum end-to-end distance, which is itself substantially larger than the
observed end-to-end distance in the absence of a fluid. For example, for the 25-bead
chain the maximum observed end-to-end distance is 64 Å and the theoretical
calculation shows that the maximum possible end-to-end distance is 76.8 Å, while
the value for L is 88.0 Å, which is 24 Å larger than the maximum observed value in
the simulation runs and 11.2 Å larger than the theoretical maximum value. Following
similar reasoning, for the ℓ = 15, 20 and 25 bead chains, the values of L are set to 54.4 Å
(13.08 σ), 72.0 Å (17.31 σ) and 88.0 Å (21.15 σ), respectively.


The reduced temperature is defined as T∗ = kbT/ε; however, another reduced temperature,
T∗l, is defined using the potential depth of the fluid particles' square-well interactions
to make the comparison easier with earlier studies of the phase diagram of this type of
fluid. Hence, T∗l is chosen to be T∗l = kbT/εl, where T∗l = (ε/εl)T∗ = (1.5/0.35)T∗ ≃ 4.29T∗.
β∗ and β∗l are defined as the inverses of T∗ and T∗l, respectively. Note
that T∗ = 1.0 corresponds to 2400 K, while T∗l = 1.0 corresponds to 560 K and T∗l ≃ 0.5
is roughly room temperature.
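As a quick check of these conversions, the following minimal sketch converts between the two reduced temperature scales and Kelvin. It uses only the ratio ε/εl = 1.5/0.35 and the correspondence T∗ = 1.0 ↔ 2400 K quoted above; the function names are hypothetical.

```python
RATIO = 1.5 / 0.35          # eps / eps_l, from the definition of eps_l
KELVIN_PER_T_STAR = 2400.0  # T* = 1.0 corresponds to 2400 K (value from the text)


def t_liquid_from_t_star(t_star):
    """T*_l = (eps / eps_l) T*."""
    return RATIO * t_star


def beta_star_from_t_liquid(t_liquid):
    """beta* = 1 / T* = (eps / eps_l) / T*_l."""
    return RATIO / t_liquid


def kelvin_from_t_liquid(t_liquid):
    """Physical temperature in Kelvin for a given T*_l."""
    return KELVIN_PER_T_STAR * t_liquid / RATIO


if __name__ == "__main__":
    for t_l in (0.5, 0.76, 0.816, 1.0, 2.5):
        print(t_l, round(beta_star_from_t_liquid(t_l), 2), round(kelvin_from_t_liquid(t_l)))
```

With these definitions, T∗l = 0.76 and T∗l = 2.5 map to β∗ ≈ 5.64 and β∗ ≈ 1.71, which is the β∗ range quoted for the 15-bead chain.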

Once the total volume of the simulation box has been determined, one can set the
number of particles N such that the solvent has the required density. The density of
the system is defined as ρ∗ = ρσ³, where ρ = N/Vl, and Vl, the effective free volume that
fluid particles can occupy, is calculated as Vl = L³ − Vexcl, where Vexcl is the approximate
excluded volume of the chain. To calculate the approximate excluded volume of the chain,
it is assumed that the protein-like chain lies completely straight and that the distance between
two neighboring beads is 4.16 Å, which is the midpoint of the vibrating distance of the
protein-like beads. Then the volume of the cylinder around this chain, in which no other particle
can exist, is considered as the excluded volume. ρ∗ was chosen to be 0.5 and, consequently,
N is 1066, 2522 and 4644 for the ℓ = 15, 20 and 25 bead chains, respectively. As the
number of solvent particles required to avoid periodic boundary effects scales with the
third power of the number of beads in the chain, exploring the energy landscape becomes
more challenging as the number of beads in the chain increases. Increasing the number
of beads makes the simulation runs more costly and the possible phase transition effects
in the studied system more apparent. Thus, the application of parallel tempering (PT)
becomes more challenging.
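The particle numbers can be reproduced with a short calculation. The sketch below makes two assumptions that are not stated explicitly in the text: the excluded cylinder has a length of (ℓ − 1) × 4.16 Å, and its radius equals the 6.4 Å hard sphere range between solvent particles and non-bonding beads. With these choices the rounded result agrees with the quoted values of N, but the function and constant names are illustrative only.

```python
import math

SIGMA = 4.16             # inner square-well distance of the solvent, in Angstrom
BEAD_SPACING = 4.16      # assumed distance between neighboring beads, in Angstrom
EXCLUSION_RADIUS = 6.4   # ASSUMPTION: cylinder radius equals the hard sphere range


def solvent_count(n_beads, box_length, rho_star=0.5):
    """Number of solvent particles giving reduced density rho* in the free volume."""
    # ASSUMPTION: a straight chain of n_beads spans (n_beads - 1) bond lengths.
    cylinder_length = (n_beads - 1) * BEAD_SPACING
    v_excl = math.pi * EXCLUSION_RADIUS ** 2 * cylinder_length
    v_free = box_length ** 3 - v_excl
    return round(rho_star * v_free / SIGMA ** 3)


if __name__ == "__main__":
    for n_beads, box_length in ((15, 54.4), (20, 72.0), (25, 88.0)):
        print(n_beads, box_length, solvent_count(n_beads, box_length))
```

Setting the excluded volume to zero in the same formula gives 1118 and 2592 particles for L = 54.4 Å and L = 72.0 Å, the sizes of the pure solvent systems used for comparison later in this chapter.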

4.1.2 Definition of Configuration

Understanding how the configurations are defined is a necessary step to determine the

free energies of the configurations. In this study only intra-chain interactions are counted


to identify a configuration. Since there are additional interactions (solvent-chain and

solvent-solvent), a configuration does not have a unique energy within this model, in

contrast to the previous chapter in the absence of a solvent in which the energy of a

configuration was constant (Sec. 4.2.4). As was done in the previous chapter, a configuration
is represented by a string of alphabetical pairs. For example, BF represents a bond
between beads 2 and 6, and BF FJ JN represents the configuration with three bonds,
between beads 2 and 6, 6 and 10, and 10 and 14.
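The alphabetical representation can be illustrated with a small sketch that maps bead number i to the i-th letter of the alphabet, which matches the examples above (2 ↔ B, 6 ↔ F, 10 ↔ J, 14 ↔ N). The helper names are hypothetical, and the mapping as written only covers chains of up to 26 beads.

```python
import string

LETTERS = string.ascii_uppercase


def bond_label(i, j):
    """Two-letter label for a bond between beads i and j (1-based numbering)."""
    return LETTERS[i - 1] + LETTERS[j - 1]


def configuration_label(bonds):
    """Space-separated bond labels, sorted by bead number; 'No bond' if empty."""
    if not bonds:
        return "No bond"
    return " ".join(bond_label(i, j) for i, j in sorted(bonds))


def parse_configuration(label):
    """Inverse mapping: recover the list of (i, j) bead pairs from a label."""
    if label.strip().lower() == "no bond":
        return []
    return [(LETTERS.index(p[0]) + 1, LETTERS.index(p[1]) + 1) for p in label.split()]
```

For example, `configuration_label([(2, 6), (6, 10), (10, 14)])` gives `"BF FJ JN"`.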

By identifying configurations without considering their bonds with solvent particles,

the free energies that will be found are averaged over the bond(s) with solvent particles,

in line with the ideas of Refs. [14] and [21]. The free energy values are further coarse-

grained in the sense that they are not a function of all the positions of the atoms in the

chain, but they are a function of the absence or presence of bonds.

4.1.3 Simulation Structure

As discussed in Chapter 2, the simulation is a combination of Discontinuous Molecu-

lar Dynamics (DMD) and the Parallel Tempering (PT) Method. The simulated system

consists of a number of replicated protein-like chains inside a solvent exploring the config-

urational space individually by the DMD method [36, 37]. All replicas evolve using DMD

for a fixed amount of time and then some of the replicas exchange their temperatures

according to the PT method, provided in Eq. 2.23. The simulation structure is very

similar to that used in the absence of an explicit solvent case discussed in the previous

chapter (Chapter 3). However, here the potential energy of the system depends not only

on the intra chain bonds but on the bonds between the chain and solvent particles as well

as the bonds between solvent particles. The velocities of all solvent particles and chain

beads of all replicas are drawn from the Maxwell-Boltzmann distribution both initially

and at the end of any replica exchange event. Since the velocities of all replicas are being

updated periodically using the Maxwell-Boltzmann distribution and the DMD dynamics


is reversible and preserves phase space volume, all necessary conditions for generating a

state with canonical distribution are satisfied [52].
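For reference, the replica exchange step can be sketched as follows. Eq. 2.23 is not reproduced in this chapter, so the code uses the standard Metropolis criterion for parallel tempering, min{1, exp[(βi − βj)(Ei − Ej)]}, on the assumption that this is the form intended. The function names are hypothetical, and energies and inverse temperatures must be expressed in matching reduced units.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Standard parallel tempering acceptance probability for exchanging
    the temperatures of two replicas with potential energies E_i and E_j."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(beta_i, beta_j, energy_i, energy_j, rng=random):
    """Return True if the exchange is accepted for one random draw."""
    return rng.random() < swap_probability(beta_i, beta_j, energy_i, energy_j)
```

An exchange is always accepted when the colder replica (larger β) currently has the higher energy, and is accepted with an exponentially small probability otherwise. After an accepted or rejected exchange, the velocities would be redrawn from the Maxwell-Boltzmann distribution at each replica's current temperature, as described above.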

Since studying the energy landscape of a protein-like chain in the presence of thou-

sands of particles is computationally demanding, we developed a parallel program using

the MPI (Message Passing Interface) technique [54]. The object-oriented setup of the
serial version of the code, used in the previous chapter to apply the PT method,
significantly facilitates the implementation of the parallel version. In the parallel program,
the master processor is responsible for the measurement events as well as calculating the

replica exchange probabilities. The master node serves as a hub with which all the nodes

communicate. Each replica runs on one processor and the energy values of the replicas

are sent to the master processor at the replica exchange event, which determines whether

a temperature exchange should take place. The master node then sends each replica its

updated temperature, which can be the same as its temperature prior to the exchanging

attempt. Then, the velocities are drawn at each node (replica) using the updated tem-

perature, and each replica starts its DMD run again. The process of drawing velocities,

DMD dynamics, and PT exchange moves is repeated until enough independent statistics

on the frequency at which different configurations are seen (fobs, or, “population”) is

gathered. At specific times, all replicas also send their configuration matrices as well as

some other parameters to the master processor for storing or calculation.

4.2 Results

4.2.1 Parallel tempering efficiency

Efficient sampling of configurations at each temperature mainly depends on
the number of replicas used, the difference between successive inverse temperatures,
∆β, and the time between two consecutive replica exchange events during which each replica
follows DMD dynamics (the PT update period). The choice of parameters for the PT
simulations has a strong effect on the efficiency of the dynamics. Choosing the most efficient
PT update period plays a significant role in optimizing the computational cost.
Since decreasing the PT update period may cause the replicas to explore a smaller part of
the configurational space between exchanges, there is an optimum value for the time between PT exchanges
for a fixed computational cost (fixed CPU hours), which has to be found by trial and error.

A key concept to assess the efficiency of a PT simulation is a PT period (or cycle), which

is the time for the replica to travel between the maximum and minimum temperatures

and back[69]. For efficient sampling, several cycles should be observed in one run. For a

fixed computational cost (i.e. run-time), it was found that there is a specific value of PT

interval time, that results in the maximum number of cycles. This value is quite different

for various lengths, `, of the chain under consideration. However, the number of inter-

action events that happen during each PT update period for 15, 20 and 25 beads chains

are similar. This provides us with a good guess for the optimum value of the PT update

period of the larger systems based on the results of smaller systems, and facilitates the

trial and error process.
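Counting PT cycles from the recorded temperature index of one replica can be sketched as below. The code assumes that the trajectory is stored as a list of β indices, one per exchange event, and counts a cycle each time the replica visits one extreme index, then the other, and then returns to the first. The function name and this bookkeeping convention are illustrative assumptions.

```python
def count_round_trips(beta_indices, lowest, highest):
    """Count complete trips extreme -> opposite extreme -> back for one replica."""
    trips = 0
    start = None            # extreme index at which the current trip began
    reached_other = False   # whether the opposite extreme has been visited
    for index in beta_indices:
        if index != lowest and index != highest:
            continue        # intermediate temperatures do not change the state
        if start is None:
            start = index
        elif index != start:
            reached_other = True
        elif reached_other:
            trips += 1
            reached_other = False
    return trips
```

A replica that stays trapped on one side of a barrier in temperature space yields zero cycles, which is one way to quantify poor PT dynamics for a fixed run time.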

In principle, increasing the number of replicas makes it possible to study any range of

temperature. However, it was found that when the PT system contains a large number of

replicas, some of the replicas may not move very well among the full range of temperatures

during one PT cycle. This can lead to a prohibitively inefficient PT dynamics. For

example, good dynamics was rarely observed in the system containing more than 200

replicas. In addition, it was found that the presence of a phase transition in the solvent

reduces the range of temperatures that can be studied.

As discussed in section 2.3.2, to have proper PT dynamics, in most cases ∆β
should vary with β. As an example, Fig. 4.1 shows proper dynamics for the 15-bead chain,
in which a range of temperatures between T∗l = 0.76 and T∗l = 2.5 is
investigated by 95 replicas. For this case the inverse temperature difference ∆β∗l for
the 10 replicas with the highest temperatures is 0.012, in the next 60 replicas
∆β∗l decreases linearly to 0.008, and then it remains constant. This means that ∆β∗l is
larger at higher temperatures, and it decreases when the temperature decreases. Plots
like the one in Fig. 4.1 are a helpful tool in checking for poor sampling. The example in
Fig. 4.2 shows what such a plot looks like for a poorly behaving PT simulation in which
a range of temperatures between T∗l = 0.82 and T∗l = 2.5 is investigated by 79 replicas.
For this case, the PT update period is 2 ps, which is 2.5 times larger than in the previous
case in Fig. 4.1. ∆β∗l for the highest 30 temperatures is 0.012, then for the next 40
temperatures ∆β∗l decreases linearly to 0.008, and then it remains constant for the
rest of the temperatures.

Figure 4.1: Proper temperature dynamics for one of 95 replicas for ℓ = 15. (Axes: β index vs. PT replica exchange event (/1000).)
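The two temperature ladders described above can be generated with a short sketch. The exact interpolation used for the linear decrease is not specified in the text, so the code assumes that the k-th step of the ramp is reduced by a fraction k/n of the total change, starting from β∗l = 1/2.5. Under this assumption the lowest temperatures come out close to the quoted T∗l = 0.76 and T∗l = 0.82. The function and parameter names are hypothetical.

```python
def beta_ladder(n_replicas, n_flat, n_ramp, t_max=2.5, d_high=0.012, d_low=0.008):
    """Inverse temperatures beta*_l, from the highest temperature downwards.

    The first n_flat spacings equal d_high, the next n_ramp spacings decrease
    linearly to d_low, and all remaining spacings equal d_low.
    """
    betas = [1.0 / t_max]
    for k in range(1, n_replicas):
        if k <= n_flat:
            step = d_high
        elif k <= n_flat + n_ramp:
            # ASSUMPTION: linear interpolation in the step number.
            step = d_high - (d_high - d_low) * (k - n_flat) / n_ramp
        else:
            step = d_low
        betas.append(betas[-1] + step)
    return betas


if __name__ == "__main__":
    good = beta_ladder(95, 10, 60)   # ladder of Fig. 4.1
    poor = beta_ladder(79, 30, 40)   # ladder of Fig. 4.2
    print(len(good), round(1.0 / good[-1], 3))
    print(len(poor), round(1.0 / poor[-1], 3))
```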

4.2.2 Phase of the solvent

One of the important aspects of this study is related to the phase of the solvent, since the
whole study is based on the presence of a fluid around the protein-like chain. Figure 4.2
shows an apparent barrier in the PT dynamics at a specific temperature: replicas
have a strong tendency to stay above, or below, that temperature, and rarely
cross it, especially for larger systems. While the effect of this barrier on the PT dynamics
becomes very small when a proper ∆β∗l and an efficient PT update time are chosen (Fig. 4.1
compared to Fig. 4.2), the barrier effect becomes very apparent with almost any set of
∆β∗l for the larger systems (i.e. for ℓ = 20 and ℓ = 25). It turns out that the apparent
barrier is related to the phase of the solvent.

Figure 4.2: The effect of phase transition, which happens around β index 50, on the PT dynamics for one of 79 replicas for ℓ = 15. (Axes: β index vs. PT replica exchange event (/1000).)

The highest temperature in the simulations was T ∗l = 2.5 for all chain lengths, while

the lowest temperatures (T ∗l ) for different chains were 0.76, 1.05 and 1.22 for the 15-

bead, 20-bead and 25-bead system respectively. The square-well model for the solvent

has been studied extensively [60, 61, 62, 63, 64, 65, 66] and for the model used here

with ρ∗ = 0.5 and λ = σ′/σ = 1.5, the critical reduced temperature, T ∗c , for the solvent

is predicted to be 1.2172 [60], 1.210 (in Ornstein-Zernike approximation) [62], 1.3603

(using an analytical equation of state based on a perturbation theory) [62], 1.226 [63],

1.2180 [64], 1.27 [65] and 1.218 [66]. Most of the previous studies [60, 61, 62, 67] predict
a vapor-liquid coexistence line to be crossed somewhere between T∗l = 1.0 and T∗l = 1.2
for ρ∗ = 0.5 and λ = 1.5. Since the model studied here contains a few thousand particles,
which is far from a real thermodynamic system with on the order of 10^23 molecules, finite
size effects may shift the apparent critical temperature.

As a first check to confirm the fluid-like character of the solvent model, the radial

distribution function (RDF) of the solvent was studied for four different temperatures.

These are plotted in Fig. 4.3 and Fig. 4.4, and show fluid behavior with no sign of any

phase transition. Due to the two discontinuities in the solvent interaction potential at σ

and σ′, respectively, the radial distribution function is relatively high between these two

points. For T ∗l = 2.0, the RDF graph 4.3(a) is very similar to what was found for this

model in the earlier studies (3rd graph in Fig. 2 in Ref. [63]). Also at this temperature,

fluid-like long range correlation can be seen in which the peaks are smaller than the

peaks at the lower temperatures. The RDF for T ∗l = 1.25, Fig. 4.3(b), and that for

T ∗l = 0.83, Fig. 4.4(a), look like those of a typical fluid with more distinct peaks than

the high temperature RDF. At relatively low temperatures, as in Fig. 4.4(b), the onset

of short range structural peaks may be showing itself in the first two peaks, while still

other peaks show a fluid-like behavior, but there is no clear sign of a phase transition.

RDFs are, however, not a very good indicator of a phase transition, especially for a
second order phase transition, such as between two fluid phases. Better indicators are
the heat capacity Cv and the compressibility κ, which are second derivatives of the free
energy. Cv can be measured from the fluctuations in energy, while κ can be estimated from
fluctuations in local density. For calculating the compressibility, the system is divided
into several boxes and the density in each box and the standard deviation of the local
density are calculated. Numerical estimates for the heat capacity and compressibility are
plotted in Figs. 4.5 and 4.7. The range of studied temperatures is clearly sufficient to
observe the effects of a phase transition for smaller systems. This phase transition occurs
at a temperature that is very close to the temperatures at which other studies predict
the liquid-vapor coexistence line for this density.

Figure 4.3: Radial distribution function g(r) vs. r/σ at (a) T∗l = 2.0 and (b) T∗l = 1.25. Lines are drawn to guide the eye.

Figure 4.4: Radial distribution function g(r) vs. r/σ at (a) T∗l = 0.83 and (b) T∗l = 0.31. Lines are drawn to guide the eye.

Figure 4.5: Heat capacity per particle (CV/N) vs. the liquid reduced temperature T∗l for N = 500, 1118, 1500 and 2592 in the absence of the protein-like chain.
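Measuring the heat capacity from energy fluctuations can be sketched as follows, using the standard canonical relation Cv/kb = β²(⟨E²⟩ − ⟨E⟩²). The sketch assumes a list of sampled potential energies at a single temperature, with β and the energies in matching reduced units; the function name is hypothetical, and only the potential-energy contribution is computed.

```python
def heat_capacity(energies, beta):
    """Cv / kb from the variance of sampled energies at inverse temperature beta."""
    n = len(energies)
    mean = sum(energies) / n
    variance = sum((e - mean) ** 2 for e in energies) / n
    return beta ** 2 * variance


def heat_capacity_per_particle(energies, beta, n_particles):
    """Cv / (N kb), the quantity plotted against temperature for different N."""
    return heat_capacity(energies, beta) / n_particles
```

Evaluating this estimator at every temperature of the PT ladder gives a curve whose peak can be compared between system sizes.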

Even when dividing the heat capacity by the number of particles in the system, as

in Fig. 4.5, the average heat capacity per solvent particle still increases with increasing

system size at the phase transition point. This suggests that for infinitely large systems,

the heat capacity diverges to infinity at the phase transition. To understand the order of

phase transition, a further study would be required which lies outside the scope of this

project.

While these results are for a pure solvent system, our studies revealed that there is no

major difference in the behavior of the heat capacity and the compressibility for systems

containing a protein-like chain. In Fig. 4.6, heat capacities of the systems with the same

size and densities with and without a chain are presented. As can be seen in this figure,

the heat capacity of a system containing the 15-bead chain (or the 20-bead chain) behaves

very similarly to the heat capacity of a system with the same size containing only solvent

particles.


Figure 4.6: Comparison between systems with the same density (ρ∗ = 0.5). The 15-bead chain system, which contains 1066 solvent particles, and the pure solvent system with 1118 particles have the same box size (L = 54.4 Å). The 20-bead chain system, which contains 2522 solvent particles, and the pure solvent system with 2592 particles have the same box size (L = 72.0 Å). (Axes: CV vs. T∗l.)

Studying the variation of the compressibility versus volume is another common ap-

proach to investigate phase transitions. To calculate the compressibility, the simulation

box is divided into smaller boxes, where each box has a size of (L/6) × (L/6) × (L/6).

Then the densities in each box and the standard deviation of the density in the box are

calculated. Since each box can exchange particles with its neighboring boxes, each box

is in the grand canonical state and the compressibility of the system can be calculated

using the variation in the number of particles. The calculation details are presented in

appendix A. In Fig. 4.7, the variation of the compressibility vs. temperature shows a

similar behavior to that observed for the heat capacity. By increasing the system size,

the compressibility seems to diverge to infinity around the same point where the heat

capacity diverges. This confirms that there is a phase transition at this point.
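The sub-box procedure can be sketched as follows. The details of appendix A are not reproduced here, so the code uses the standard grand canonical fluctuation relation ρ kb T κ = (⟨N²⟩ − ⟨N⟩²)/⟨N⟩ as an assumption about the intended estimator. Particle positions are assumed to be (x, y, z) tuples inside a periodic cubic box of side L, and the function names are hypothetical.

```python
import math


def cell_counts(positions, box_length, n_cells=6):
    """Number of particles in each of the n_cells^3 sub-boxes of a cubic box."""
    counts = [0] * (n_cells ** 3)
    for x, y, z in positions:
        # Map each coordinate to a cell index, wrapping periodically.
        ix = math.floor(x / box_length * n_cells) % n_cells
        iy = math.floor(y / box_length * n_cells) % n_cells
        iz = math.floor(z / box_length * n_cells) % n_cells
        counts[ix + n_cells * (iy + n_cells * iz)] += 1
    return counts


def reduced_compressibility(counts):
    """rho * kb * T * kappa estimated as Var(N) / <N> over sub-box counts."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / n
    return variance / mean
```

In practice the counts from many stored snapshots at the same temperature would be pooled before calling `reduced_compressibility`, since a single snapshot provides only 216 samples.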


Figure 4.7: Compressibility vs. the liquid reduced temperature T∗l for N = 1118, 1500 and 2592.

4.2.3 Observed structures and Free energy landscape

Simulations using PT and DMD were performed for three different chain lengths: ℓ = 15,
20 and 25, all in a liquid at density ρ∗ = 0.5. The ranges of temperature and the numbers of
replicas differed in all these cases. The quantities of interest were the frequencies of
occurrence (fobs) of each configuration (denoted in the alphabetic representation explained
in Sec. 3.1.1). The most frequently occurring configuration at any given temperature will
be called the dominant configuration if its population is clearly higher than that of the second
most common structure.

All errors reported below indicate 95% confidence intervals, which are equal to
1.96 times the standard deviation for normally distributed errors. The term “energy”
here only refers to the potential energy. The term “system energy” refers to the
potential energy of all the solvent particles and beads in a box containing a
specific configuration (all possible hydrogen bonds), while the “configuration energy” only
refers to the energy of the intra-chain bonds. The term “bond” refers to a hydrogen bond
and not the repulsive interactions, unless otherwise specified.

The 15-bead chain

For ℓ = 15, 95 temperature values were selected for the replicas, such that ∆β∗l = 0.012
for the highest 10 temperatures; for the next 60 temperatures, ∆β∗l decreases linearly
to become 0.008 at the 70th highest temperature, and then ∆β∗l remains
constant for the rest of the temperatures. The range of studied temperatures is
T∗l = [0.76, 2.5] (β∗ = [1.7, 5.64]). The most efficient PT update period was found to be
0.8 picoseconds, while 1.2 picoseconds also generated very efficient PT dynamics. These
values are smaller than the PT update period used in the absence of a solvent, which
was two picoseconds. As mentioned above, the number of solvent particles appropriate
for this case to generate the density of 0.5 is N = 1066, while the length of the sides of
the periodic box is set at L = 54.4 Å.

In Table 4.1 the results for the dominant configurations at different temperatures are
presented for the system in the presence and in the absence of any solvent (Chapter 3).
One sees that at low enough temperature one structure (BF FJ JN) becomes clearly
dominant as its probability exceeds 60%.

In Table 4.2, the populations and the average system energies of the most common
structures are provided for T∗l = 0.816 (β∗ = 5.25), which is a relatively low temperature.
The configuration with the lowest configurational energy is observed to be also the one
with the lowest total system energy. One also sees that the next three common structures,
configurations 2, 3 and 4 in Table 4.2, are all very close in their frequencies of occurrence,
fobs, as well as in their average system energy. Here the energy landscape (the free energy
as a function of chain conformation) consists of a relatively deep point at BF FJ JN, and
three minima beside this point located inside a wide funnel. The depth in the landscape
associated with each configuration is proportional to its free energy and the relative
distance between any pair of configurations reflects how dissimilar the two
configurations are. Therefore, the next three deepest minima (configurations 2, 3 and 4)
are very close to the deepest point, since these configurations differ by only one bond from
the first configuration, and the free energy barriers between these configurations seem to
be small. Since the last three configurations in Table 4.2 differ by two bonds
from the first configuration, their locations in the landscape should be further from the
deepest point, in such a way that configurations 2, 3 and 4 are located between the
deepest point and these configurations. A rough picture of this landscape is presented
in Fig. 4.8, in which the distances between the structures are based on their similarities
and the area differences are related to differences in their computed entropy (calculated
in the absence of the solvent). As can be seen, the lowest free energy structure at low
temperatures, BF FJ JN, is located in the middle, and the other structures are located
around this point according to their similarity to BF FJ JN. For example, BF FJ
is located between the deepest point (BF FJ JN) and BF and FJ. This diagram gives
some idea about the folding pathways. For example, to reach the lowest energy structure
with three bonds from the structure with no bonds, the first bond and then the
second bond should be made initially.
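The two ingredients of this picture, relative depth and relative distance, can be estimated with a short sketch. Relative free energies follow from the ratio of observed populations, Fi − Fref = −(1/β∗) ln(fi/fref), in units of ε, and the distance between two configurations is taken here as the number of bonds present in one but not the other. The latter measure and the function names are illustrative assumptions and not necessarily the construction used for Fig. 4.8.

```python
import math


def free_energy_difference(f_obs, f_ref, beta_star):
    """F - F_ref in units of eps, from observed populations at inverse temperature beta*."""
    return -math.log(f_obs / f_ref) / beta_star


def bond_set(label):
    """Set of two-letter bond labels in a configuration string."""
    if label.strip().lower() == "no bond":
        return set()
    return set(label.split())


def bond_distance(label_a, label_b):
    """Number of bonds that are present in exactly one of the two configurations."""
    return len(bond_set(label_a) ^ bond_set(label_b))


if __name__ == "__main__":
    # Populations of the two most common structures at beta* = 5.25 (Table 4.2).
    print(round(free_energy_difference(10.9, 64.9, 5.25), 2))
    print(bond_distance("BF FJ JN", "BF JN"))
```

With the populations of BF FJ JN (64.9%) and BF JN (10.9%) at β∗ = 5.25, the estimated free energy difference is about 0.34 ε, and the two configurations are one bond apart.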

According to Table 4.2, the lowest energy configuration, BF FJ JN, is associated

with the lowest system energy as well. However, since the uncertainty in the computed

energy of the system of the 7th structure in Table 4.2 is relatively large, this can not

be completely verified by the values provided in the table. However, there are good

arguments for why the first configuration should have the lowest total energy. The

first configuration is the most populated one for all the 66 temperatures that lie in

T ∗l = [0.76 , 2.0], so it is the lowest free energy system at these temperatures. As discussed

in the previous chapter, by adding more bonds and consequently adding more geometrical

restrictions, the configurational entropy decreases and therefore, BF FJ JN has the lowest

configurational entropy among 15-bead configurations. Under the assumption that the

average entropy contributions from the solvent particles for different configurations are


[Figure 4.8 diagram labels: BF FJ JN; BF FJ; FJ JN; BF JN; BF; FJ; JN; No Bond.]

Figure 4.8: A rough picture of the 15-bead chain landscape.

very similar, it can be concluded that the BF FJ JN system should have the lowest energy for
T ∗l = [0.76 , 2.0]. For short chains, which do not collapse, the energy difference between
two systems is expected to depend mainly on their configurational energy difference, with
the average energy contribution from the solvent particles being nearly the same for the
different systems. For example, at β∗ = 5.25 (T ∗l = 0.816) BF FJ JN and

BF JN, the two most common configurations of Table 4.2, have on average 0.1 ± 0.02

and 0.19 ± 0.3 bonds with solvent particles, respectively. Hence, the contribution to

the energy difference of their systems from the bonds between solvent particles and the

chain beads is around 0.02 ε, while their configurational energy difference is 1ε. The

average number of bonds that each solvent particle makes with other solvent particles at

β∗ = 5.25 (T ∗l = 0.816) is around 5.1.

If, as is plausible, “BF FJ JN” has the lowest system energy, one may expect that

the population of this configuration will approach 100% at lower temperatures where the


β∗     Inside Solvent   fobs(%)       Without Solvent   fobs(%)
1.8    No bond          18.2 ± 0.8    No bond           41.3 ± 1.5
2.4    No bond          18.9 ± 0.6    No bond           30.15 ± 1.6
3.0    No bond          11.3 ± 0.8    No bond           19.7 ± 1.3
3.6    BF FJ JN         24.0 ± 0.9    BF JN             17.1 ± 1.2
4.2    BF FJ JN         37.7 ± 0.8    BF FJ JN          26.7 ± 1.5
4.5    BF FJ JN         43.2 ± 0.9    BF FJ JN          35.0 ± 1.5
4.8    BF FJ JN         53.3 ± 0.8    BF FJ JN          44.1 ± 1.6
5.1    BF FJ JN         60.5 ± 1.2    BF FJ JN          55.1 ± 1.8
5.4    BF FJ JN         67.3 ± 0.9    BF FJ JN          61.5 ± 1.6
9      N/A              N/A           BF FJ JN          98.6 ± 0.4

Table 4.1: Most common configurations of the 15-bead chain for different temperatures,
with and without the solvent.

Rank   Configuration   fobs(%)       Average System Energy   Chain Energy
1      BF FJ JN        64.9 ± 0.9    -1273.3 ± 0.3           -3
2      BF JN           10.9 ± 0.5    -1271.7 ± 0.7           -2
3      FJ JN           9.6 ± 0.5     -1272.0 ± 0.7           -2
4      BF FJ           8.9 ± 0.5     -1271.8 ± 0.7           -2
5      JN              1.5 ± 0.1     -1271.9 ± 1.8           -1
6      FJ              1.5 ± 0.1     -1272.1 ± 1.9           -1
7      BF              1.3 ± 0.1     -1272.6 ± 1.8           -1

Table 4.2: Most common configurations of the 15-bead chain with the solvent
environment at T ∗l = 0.816 (β∗ = 5.25).


[Figure 4.9 plot: most common structure probability (vertical axis, 0.1 to 1) versus β∗ (horizontal axis, 0 to 16); series: s15 explicit solvent, s15 without solvent.]

Figure 4.9: Variation of the probabilities of the most common structure versus β∗ for
the 15-bead chain with and without the solvent.

free energy mainly depends on the energy of the system and little on the system entropy.

This expected trend of the population is very similar to the previous chapter results for

the chains smaller than 30 beads, where one structure becomes completely dominant at

low temperatures. The population trends of the 15-bead chain dominant structure both

inside and in the absence of the solvent are compared in Fig. 4.9, where in both cases

(with and without a solvent) the probability of the dominant structure reaches a high

value. This behavior happens at higher temperatures inside the solvent, which suggests

that the hydrophobic effects of 75% of the chain beads assist the folding process and

make the helical structures more favorable. At all the studied temperatures, the average
radius of gyration of the 15-bead chain is smaller inside the solvent than in its absence,
which shows that the solvent is a poor solvent.


The 20-bead chain

Generally, it is harder to study a wide range of temperatures for large systems due to

the need for using smaller ∆β and consequently, a higher number of replicas. However,

as was discussed in section 4.2.1, for this case, the study of a wide range of temperatures

becomes extremely hard due to the effect of the phase transition on the PT dynamics.

Consequently, for ℓ = 20, the range of studied temperatures, from T ∗l = 1.05 to 2.5 (β∗ =
[1.7 , 4.1]), is smaller than for ℓ = 15. To study the energy landscape, 79 temperature values
were selected for the replicas such that ∆β∗l = 0.008 for the 40 highest temperatures, after
which ∆β∗l decreases to 0.006 and remains constant for the rest of the temperatures. For
ℓ = 20, the number of solvent particles needed to generate a density of 0.5 in the
periodic box of L = 72 Å is N = 2522. The most efficient PT update
period was found to be 320 femtoseconds, which is smaller than the value for the 15-bead
case, although update periods in the range 320-360 femtoseconds yield similar
efficiencies. As mentioned, around T ∗l = 1.1 the heat

capacity and compressibility of the fluid show signs of a phase transition, in line with

other studies that predict the location of the vapor-liquid coexistence line. Because of

the phase transition, sampling a wider range of temperatures resulted in poor dynamics

even when using small values for ∆β∗.

The dominant configurations at different temperatures in the presence and in the

absence of the solvent are presented in Table 4.3. Since very low temperatures could not

be reached, it was not possible to study whether the probability of the most common

structure approaches one at lower temperatures. Table 4.4 shows that at β∗ = 3.9, BF

FJ JN NR is the most common structure while BF BR FJ JN NR, which has the lowest

configurational energy, has a smaller population. BF FJ JN NR is a complete helical
structure that has all the necessary helical bonds between every two consecutive turns
of the protein-like chain, while BF BR FJ JN NR, the lowest energy configuration, is


a collapsed helical structure with a bond BR that connects the two ends of the chain.

Hence, BF FJ JN NR, an unfolded but complete helical structure, has a much higher

entropy than the lowest energy configuration, while their energies are close since they

only differ by one internal bond. Furthermore, the complete helical structure can make

more bonds with the solvent particles because of its non-collapsed shape: 17% of the

population of the complete helical structure make bonds with the solvent particles at

β∗ = 3.9 (T ∗l = 1.1), while only 4% of the lowest energy configuration population make

bonds with the solvent particles. The number of bonds with the solvent particles also

shows that in this model the structures are not soluble, and the energy contribution from

bonds between the chain beads and the solvent particles to the system energy is relatively

small.

According to Table 4.4, the energies of the most common structures are very close to

each other. It is therefore hard to predict whether there is a dominant structure at lower

temperatures as seen in the previous Chapter for the 20-bead chain in the absence of a

solvent. While a complete helical structure can make more bonds with solvent particles

because of its structure in comparison with the lowest energy configuration, the average

number of bonds with the solvent particles is still less than one. It is expected that the

average energy contribution from the bonds between solvent particles becomes very simi-

lar for different configurations. However, it would require much better sampling statistics

than what was obtained to check this prediction. Since the energy of each intra-chain

bond is equivalent to 4.29 bead-solvent bonds, it is expected that the system containing

the lowest energy configuration should be the lowest energy system. This means that the

lowest energy configuration should become the most common structure at lower tempera-

tures. The reason that the lowest energy configuration does not become dominant at the

studied temperatures is that the BR bond greatly restricts the configurational freedom

and therefore, the non-collapsed helical structure, having one less bond but with much

larger entropy, becomes the most common structure.


The population of the lowest energy configuration with the largest number of bonds

becomes almost equal to that of the structure with no potential energy (no bond) at

β∗ = 3.27, while this happens at lower temperature, β∗ = 4.05, in the absence of a solvent.

At β∗ = 3.27 in the solvent environment, only 15% of the lowest energy configurations

make bonds with the solvent particles, while 89% of the “no bond” structures make bonds

with the solvent particles. By assuming that the energy contribution from bonds between

solvent particles is almost the same for these two systems, it can be concluded that the

average system energy difference in this case is likely less than 5ε. The populations of two
structures become equal when their free energy difference is approximately zero, so that
∆S = ∆U/T. The entropy difference ∆S between the no bond structure and the lowest energy
configuration can therefore be calculated, and it approximately represents the maximum
configurational entropy difference.
According to this calculation, in the absence of the solvent ∆S ≃ 20.25 kb and in the
solvent environment ∆S ≤ 16.35 kb. Hence, having hydrophobic chains in this model

results in a smaller entropy range, which indicates that in comparison to the absence

of a solvent, the probability of the dominant configuration may approach one at higher

temperatures. This behavior can be seen clearly for the 15-bead chain in Fig. 4.9.
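
The two entropy values quoted above can be checked with a line of arithmetic: when the populations are equal the free energy difference vanishes, so ∆S/kb = β∗ ∆U/ε. The following is a minimal sketch of that check. It assumes that the energy gap between the no bond structure and BF BR FJ JN NR is the five intra-chain bonds (5ε) without the solvent and at most 5ε inside it, as the text suggests; the function name is illustrative only.

```python
def entropy_gap_at_equal_populations(beta_star, delta_u_in_eps):
    """Return dS / k_b for two states whose populations are equal.

    Equal populations imply dF = dU - T dS = 0, hence dS / k_b = beta* x dU / eps.
    """
    return beta_star * delta_u_in_eps


# beta* values at which the populations cross, taken from the text.
ds_without_solvent = entropy_gap_at_equal_populations(4.05, 5.0)
ds_bound_in_solvent = entropy_gap_at_equal_populations(3.27, 5.0)  # upper bound

print(f"no solvent: dS ~ {ds_without_solvent:.2f} k_b")
print(f"in solvent: dS <= {ds_bound_in_solvent:.2f} k_b")
```

The solvent value is only an upper bound because the system energy difference is stated to be less than 5ε.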

The 25-bead chain

The 25-bead system includes 4644 solvent particles, which is nearly twice the number of

solvent particles in the 20-bead system, in a box of L = 88 Å. According to Fig. 4.5, the

temperature at which the phase transition behavior is observed increases slightly with

increasing N . Therefore, the range of temperatures that could be investigated for the

25-bead chain system is even smaller than the 20-bead case. A set of temperatures with

95 replicas was chosen, such that for the 20 highest temperatures ∆β = 0.006, and for

the rest of temperatures ∆β = 0.004, while the range of studied temperatures is T ∗l =

[1.22 , 2.5] (β∗ = [1.7 , 3.5]). The most efficient PT update period is 120 femtoseconds,

which is even smaller than the 20-bead case. This confirms that by increasing the size


β∗     Inside Solvent   fobs(%)       Absence of Solvent   fobs(%)
1.8    No bond          7.2 ± 0.5     No bond              28.3 ± 1.6
2.4    No bond          7.8 ± 0.5     No bond              21.5 ± 1.3
3.0    No bond          4.6 ± 0.4     No bond              11.6 ± 1.1
3.3    BF FJ JN NR      6.5 ± 0.5     BF                   7.3 ± 0.9
3.6    BF FJ JN NR      10.3 ± 0.6    BF NR                7.6 ± 0.8
3.9    BF FJ JN NR      12.3 ± 0.6    BF JN NR             9.8 ± 0.8
4.5    N/A              N/A           BF FJ JN NR          16.3 ± 1.3
6.0    N/A              N/A           BF BR FJ JN NR       47.7 ± 1.6
10.5   N/A              N/A           BF BR FJ JN NR       99.1 ± 0.3

Table 4.3: Most common configurations of the 20-bead chain inside and in the absence
of the solvent.

Rank   Configuration    fobs(%)       Average Total Energy   Chain Energy
1      BF FJ JN NR      12.3 ± 0.6    -2482.6 ± 1.4          -4
2      BF FJ NR         8.5 ± 0.4     -2484.4 ± 1.6          -3
3      BF JN NR         8.2 ± 0.4     -2483.1 ± 1.6          -3
4      BF FJ JN         7.3 ± 0.5     -2482.6 ± 1.8          -3
5      FJ JN NR         7.3 ± 0.5     -2482.7 ± 1.8          -3
6      BF BR FJ JN NR   5.3 ± 0.4     -2483.6 ± 2.2          -5
7      BF JN            4.6 ± 0.4     -2484.1 ± 2.2          -2

Table 4.4: Most common configurations of the 20-bead chain inside the solvent at β∗ = 3.9
(T ∗l = 1.1).


β∗     Inside Solvent            fobs(%)      Without Solvent           fobs(%)
1.8    No bond                   2.7 ± 0.3    No bond                   21.7 ± 1.3
2.4    No bond                   3.3 ± 0.3    No bond                   15.5 ± 1.0
3.0    NR                        1.3 ± 0.2    No bond                   6.7 ± 0.9
3.3    BF BR BV FJ FV JN NR RV   2.6 ± 0.2    No bond                   4.3 ± 0.6
4.5    N/A                       N/A          BF BR BV FJ FV JN NR RV   7.5 ± 1.0
9.0    N/A                       N/A          BF BR BV FJ FV JN NR RV   98.0 ± 0.4

Table 4.5: Most common configurations of the 25-bead chain inside and in the absence
of the solvent.

of system and increasing the number of particles, the most efficient PT update period

decreases.

According to Table 4.5, the structure with the lowest configurational energy becomes

dominant at higher temperatures in comparison to the previous chapter study in the ab-

sence of a solvent. However, the range of studied temperature is not sufficient to observe

a very deep funnel in the free energy landscape at low temperatures that was observed

in the absence of a solvent. The most common structures of the 25-bead chain inside a

solvent at β∗ = 3.3 are presented in Table 4.6. While the populations of configurations

3-10 are equal within statistical error, the population of the first configuration (with 8

bonds) is clearly higher than that of the other configurations. Our study reveals that
the first configuration, which has the largest number of bonds, is clearly the most populated
configuration for T ∗l ≤ 1.32 (β∗ ≥ 3.24). This means that for all the temperatures in the

range 1.22 ≤ T ∗l ≤ 1.32, the structure with the lowest configurational energy is the most

common structure. Since the configurational entropy decreases by increasing the number

of bonds (because of adding more restrictions), the first configuration should have the

lowest configurational entropy. Since the first configuration is the most common
structure at the lowest studied temperature, the first configuration system should have


Rank   Configuration             fobs(%)      Average Total Energy   Chain Energy
1      BF BR BV FJ FV JN NR RV   3.4 ± 0.3    -4219.9 ± 1.9          -8
2      BF FJ JN NR RV            2.0 ± 0.3    -4217.7 ± 2.5          -5
3      FJ JN NR RV               1.5 ± 0.2    -4216.0 ± 2.7          -4
4      BF FJ JN NR               1.5 ± 0.3    -4219.4 ± 2.9          -4
5      BF FJ JN RV               1.5 ± 0.3    -4215.8 ± 3.0          -4
6      BF BR BV FJ JN NR RV      1.3 ± 0.2    -4218.4 ± 2.9          -7
7      BF FJ NR RV               1.3 ± 0.2    -4215.2 ± 3.0          -4
8      BF JN NR                  1.3 ± 0.1    -4213.0 ± 3.7          -3
9      JN NR RV                  1.2 ± 0.1    -4216.3 ± 2.9          -3
10     FJ JN RV                  1.2 ± 0.1    -4213.1 ± 3.1          -3

Table 4.6: Most common configurations of the 25-bead chain inside the solvent at β∗ = 3.3
(T ∗l = 1.30).

the lowest system energy at these temperatures. The order of the system energies is not
expected to change dramatically as the temperature decreases, so the first configuration
system likely remains the one with the lowest energy, and the population of this structure
should approach one at low temperatures, similar to the results of the previous chapter.

Reasoning similar to that used for the 20-bead chain leads to the prediction of a large
configurational entropy difference between the lowest energy configuration with a completely

collapsed shape (first configuration of Table 4.6) and the complete helical structure (sec-

ond configuration of Table 4.6). Therefore, one anticipates that the non-collapsed helical

structure has the highest population for the limited range of temperatures that was

studied in the simulations. However, because of the energy difference of 3ε, the first

configuration becomes dominant, even at not very low temperatures. This is unlike the

20-bead chain, for which the complete helical structure (1st configuration of Table 4.4) is


[Figure 4.10 plot: entropy difference (vertical axis, -4 to 8) versus T ∗l (horizontal axis, 0.7 to 1.6).]

Figure 4.10: The system entropy difference for two 15-bead chains, $\Delta S = S_{\text{BF FJ}} - S_{\text{BF JN}}$, versus the liquid reduced temperature in the solvent environment.

dominant at similar temperatures. Consequently, the probability of the collapsed helical

structure of the 25-bead chain approaches one at higher temperatures as it does in the

absence of a solvent.

4.2.4 Relative configurational entropy

Unlike the previous study of a protein-like chain with no solvent, where the relative
configurational entropy did not depend on temperature, the relative configurational
entropy of the solvated polymer system depends on temperature, because the configuration
is defined here only by the positions of the chain beads and not by those of
the solvent particles. Even if it is assumed that the relative configurational entropy
does not depend strongly on temperature, there are several obstacles to calculating the
configurational entropy of each structure. In the absence of a solvent, the configurational
entropy difference of two structures at a specific temperature, in units of kb, was
calculated using $\ln(f_1/f_2) + \beta(U_1 - U_2)$, where f1 and f2 are the populations and U1 and
U2 are the average system energies of the two structures at the inverse temperature β.

With no solvent present, U1 and U2 depend only on the number of intra-chain bonds, and

hence are constant for a fixed pair of configurations. In the presence of solvent particles,

there are relatively large energy fluctuations in the system energy because of the large

number of particles and the limited number of samples. The standard deviation of the
average energy of the solvent particles is calculated according to $\sigma_{\text{mean}} = \sigma/\sqrt{N}$, where
σ is the standard deviation of the distribution of energy of the solvent particles and N
is the number of samples. Increasing N therefore decreases the standard deviation of the
average energy, with the statistical uncertainty scaling as the inverse square root of the
number of samples. Even if the statistical uncertainties in the estimate of the energy

were improved by better sampling, the calculated value would not be the relative config-

urational entropy of the two structures, but rather, the average relative entropy of the

two systems, including the solvent particles. As with the average energy, there are large

fluctuations of the entropy of the system which scale with the total number of particles

in the system, and are particularly significant around the phase transition temperature

of the solvent. Therefore, inside the solvent, it is extremely difficult to calculate the con-

figurational entropies of the protein-like chains accurately using their populations. For

example, while in the absence of a solvent the relative entropy for “BF FJ” and “BF

JN” is 0.37± 0.21kb, where “BF JN” has a higher configurational entropy, the statistical

uncertainty in computed relative system entropies in the solvent is too large, as can be

seen in Fig 4.10.
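
To illustrate why the solvated case is so much harder, the expression ln(f1/f2) + β(U1 − U2) can be evaluated with the Table 4.2 entries for BF FJ and BF JN at β∗ = 5.25. This is only a sketch: it assumes the tabulated system energies are in units of ε, and that the four quoted uncertainties are independent and can be combined in quadrature. Neither assumption is stated in the text, and the function name is hypothetical.

```python
import math


def relative_entropy(f1, df1, f2, df2, u1, du1, u2, du2, beta):
    """Return (dS, error) in units of k_b for dS = ln(f1/f2) + beta*(u1 - u2).

    The error combines the relative population errors and the energy
    errors in quadrature (independence is an assumption of this sketch).
    """
    ds = math.log(f1 / f2) + beta * (u1 - u2)
    variance = (df1 / f1) ** 2 + (df2 / f2) ** 2 + beta ** 2 * (du1 ** 2 + du2 ** 2)
    return ds, math.sqrt(variance)


# Table 4.2 entries: structure 1 = BF FJ (rank 4), structure 2 = BF JN (rank 2).
ds, err = relative_entropy(
    f1=8.9, df1=0.5, f2=10.9, df2=0.5,
    u1=-1271.8, du1=0.7, u2=-1271.7, du2=0.7,
    beta=5.25,
)
print(f"dS = {ds:.2f} +/- {err:.2f} k_b")
```

Under these assumptions the propagated uncertainty is roughly 5 kb, dominated by the energy terms multiplied by β∗, which is far larger than the 0.21 kb uncertainty obtained without the solvent.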

Chapter 5

Simple Dynamics Using

Smoluchowski Equation

In this Chapter a simple model of the dynamics of a protein-like chain is introduced

and the equilibrium folding dynamics is analyzed as a function of temperature. The

model consists of a single protein-like chain in which the monomers interact via the

discontinuous potentials introduced in Chapter 3 (model B) immersed in a solvent of

particles that interact with the monomers via hard core collisions at short distances. It is

assumed that the solvent particles interact on a time scale that is fast compared to the

time scale of structural rearrangements between conformations. In this limit, the motion

of the monomers is governed by the Smoluchowski equation with a configurationally

independent diffusion coefficient. We demonstrate that there are important qualitative

differences in the folding dynamics as the length of the chain increases.

5.1 Model

The discrete nature of the interactions allows configurational space to be partitioned into

microstates by defining an index function for a configuration c that depends on the set


of spatial coordinates of the chain R

\[
\chi_c(R) =
\begin{cases}
1 & \text{if only bonds in } c \text{ are present,} \\
0 & \text{otherwise.}
\end{cases}
\]

The partitioning of configurational space arises naturally by expanding the product in

the identity

\[
1 = \prod_{i=1}^{n_b} \bigl( 1 - H(x_i - x_c) + H(x_i - x_c) \bigr)
  = \prod_{i=1}^{n_b} \bigl( H_b(x_i - x_c) + H(x_i - x_c) \bigr)
  = \sum_{k=1}^{n_s} \chi_{c_k}(R), \tag{5.1}
\]

where $n_b$ is the number of attractive bonds in the model, $n_s = 2^{n_b}$ is the number of
microstates, $H_b(x) = 1 - H(x) = H(-x)$, and $H(x)$ is the Heaviside function
\[
H(x) =
\begin{cases}
1 & x \ge 0 \\
0 & \text{otherwise.}
\end{cases}
\]

In Eq. 5.1, $x_i$ is the distance between monomers in the ith bond, and $x_c$ (known as σ2 =
5.76 Å in the previous chapters) is the critical distance at which a bond is formed. For
notational simplicity, we order the index of configurations based on the number of bonds
starting with the configuration with no bonds, $\chi_1(r^N) = \prod_{i=1}^{n_b} H(x_i - x_c)$, and ending
with the configuration with the maximum number of bonds, $\chi_{n_s}(r^N) = \prod_{i=1}^{n_b} H(x_c - x_i)$.
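
The partition in Eq. 5.1 can be made concrete with a short sketch that evaluates the indicator function of every bond pattern for one set of bond distances and confirms that exactly one of the 2^nb indicators is non-zero. The three distances and the helper names are hypothetical, and the critical distance is taken as the 5.76 Å quoted above.

```python
from itertools import product


def heaviside(x):
    """H(x) = 1 for x >= 0 and 0 otherwise, as defined in the text."""
    return 1 if x >= 0 else 0


def indicator(pattern, distances, x_c):
    """chi_c(R): 1 if exactly the bonds flagged True in `pattern` are formed."""
    value = 1
    for bonded, x in zip(pattern, distances):
        h = heaviside(x - x_c)               # 1 when this pair is NOT bonded
        value *= (1 - h) if bonded else h    # H_b factor or H factor
    return value


X_C = 5.76                       # critical bond distance in angstrom
distances = [4.9, 6.3, 5.2]      # hypothetical distances for three possible bonds

patterns = list(product([False, True], repeat=len(distances)))
values = {p: indicator(p, distances, X_C) for p in patterns}
occupied = [p for p, v in values.items() if v == 1]

print(len(patterns), "microstates; occupied:", occupied)
```

Each term of the expanded product in Eq. 5.1 corresponds to one entry of `patterns`, so the sum of the indicator values is one for any set of distances.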

As was seen in Chapter 3, the configurational space can be unambiguously partitioned

into microstates whose equilibrium populations can be estimated. Therefore, one can

also estimate the cumulative distribution functions, probability densities, and potential

of mean force associated with the formation of a bond. For example, for the 25-bead

chain, the configuration c2 = BF BR can be formed from the configuration c1 = BF by

the formation of the BR bond, which occurs when the distance xBR = |rB − rR| is less

than the critical bond formation distance xc. One can define a probability density ρc1(x)


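and the cumulative distribution $C_{c_1}(x) = \int_0^x dy\, \rho_{c_1}(y)$ in terms of canonical ensemble
averages restricted over microstates as
\[
\rho_{c_1}(x) = \langle \delta(x - x_{BR}) \rangle_{c_1}, \tag{5.2}
\]
where the notation $\langle B(R) \rangle_{c_1}$ denotes the normalized uniform average
\[
\langle B(R) \rangle_{c_1} = \frac{\int dR\, \chi_{c_1}(R)\, B(R)}{\int dR\, \chi_{c_1}(R)}.
\]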

One simple way to estimate ρc1(x) is to construct histograms of the distance xBR from

Monte Carlo simulations. Since the probability density and the cumulative distribution

function are independent of temperature, the distance xBR from any instantaneous con-

figuration that satisfies the bonding criteria for configuration c1 can be used. A more

appealing way of constructing analytical fits to the densities and cumulative distribu-

tion functions is to use a procedure that constructs these quantities from sampled data

using statistical fitting criteria[70]. The temperature-dependent potential of mean force

φc1c2(x) connecting states c1 and c2 can be computed from the probability densities ρc1(x)

and ρc2(x) by first considering the cumulative distribution function connecting the two

states,

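\[
C_{1\to 2}(x) =
\begin{cases}
\dfrac{e^{\Delta S^*} e^{\beta\varepsilon}}{1 + e^{\Delta S^*} e^{\beta\varepsilon}}\, C_{c_2}(x) & x < x_c \\[2ex]
\dfrac{1}{1 + e^{\Delta S^*} e^{\beta\varepsilon}}\, C_{c_1}(x) + \dfrac{e^{\Delta S^*} e^{\beta\varepsilon}}{1 + e^{\Delta S^*} e^{\beta\varepsilon}} & x \ge x_c
\end{cases}, \tag{5.3}
\]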

where ∆Uc1c2 = −ε and ∆S∗ is the relative entropy difference of the configurations

divided by the Boltzmann constant, kb. Noting that ρc(x) = dCc(x)/dx, we find that

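\[
\rho_{1\to 2}(x) =
\begin{cases}
\dfrac{e^{\Delta S^*} e^{\beta\varepsilon}}{1 + e^{\Delta S^*} e^{\beta\varepsilon}}\, \rho_{c_2}(x) & x < x_c \\[2ex]
\dfrac{1}{1 + e^{\Delta S^*} e^{\beta\varepsilon}}\, \rho_{c_1}(x) & x \ge x_c
\end{cases}, \tag{5.4}
\]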

which is discontinuous due to the nature of the potential at x = xc. The potential of mean

force, φ1→2(x) = − ln ρ1→2(x), describes the reversible work associated with pulling the

system from configuration c1 to c2 by reducing the distance x = xBR between monomers B


[Figure 5.1 plot: potential of mean force (dimensionless; vertical axis, -2 to 4) versus NR bond distance (horizontal axis, 1.25 to 4); curves for β∗ = 1, β∗ = 2 and β∗ = 3.]

Figure 5.1: The potential of mean force in dimensionless units as a function of the critical
bond distance, where each distance unit is equivalent to 3.2 Å.

and R. Due to the simplicity of the interaction potential, the potential of mean force can

be computed at any inverse temperature β from the temperature independent densities

ρc1 and ρc2 . Note that the potential of mean force φ1→2(x) includes an effective volume

factor of the form −2/β ln x that arises from the conversion of Cartesian coordinates

into spherical polar coordinates. This volume factor leads to a temperature independent

entropic barrier for the formation of a bond. In Fig. 5.1, the potential of mean force is

plotted as a function of the NR distance for configurations c1 = BF BR BV FJ JN RV

and c2 = BF BR BV FJ JN NR RV for a range of different inverse temperatures. Note

the entropic barrier between the non-bonded state with x > xc = 1.8 (5.76 Å) and bonded

states with x < xc. We shall see in the next section that the densities ρi and cumulative

distributions Ci play an important role in determining the rate constants relating the

rate of population transfer between microstates.
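
The construction of Eq. 5.4 and of the potential of mean force φ = −ln ρ can be sketched in a few lines once the two temperature-independent densities are available. The densities below are toy stand-ins that contain only the x² volume factor, the outer cutoff and the value of ∆S∗ are hypothetical, and all function names are illustrative; the fitted densities used for Fig. 5.1 are not reproduced here.

```python
import math

X_C = 1.8     # critical bond distance in reduced units (from the text)
X_MAX = 4.0   # hypothetical outer cutoff for the toy unbound density


def rho_bound(x):
    """Toy normalized density of the bonded state c2 on [0, X_C)."""
    return 3.0 * x ** 2 / X_C ** 3


def rho_unbound(x):
    """Toy normalized density of the non-bonded state c1 on [X_C, X_MAX]."""
    return 3.0 * x ** 2 / (X_MAX ** 3 - X_C ** 3)


def bound_weight(delta_s_star, beta_eps):
    """Prefactor of the bonded branch in Eqs. 5.3 and 5.4."""
    w = math.exp(delta_s_star + beta_eps)
    return w / (1.0 + w)


def rho_1_to_2(x, delta_s_star, beta_eps):
    """Piecewise density of Eq. 5.4, discontinuous at x = X_C."""
    p2 = bound_weight(delta_s_star, beta_eps)
    if x < X_C:
        return p2 * rho_bound(x)
    return (1.0 - p2) * rho_unbound(x)


def pmf(x, delta_s_star, beta_eps):
    """Dimensionless potential of mean force, phi = -ln rho."""
    return -math.log(rho_1_to_2(x, delta_s_star, beta_eps))


for beta_eps in (1.0, 2.0, 3.0):
    print(beta_eps, round(pmf(1.0, -2.0, beta_eps), 3), round(pmf(2.5, -2.0, beta_eps), 3))
```

Because only the prefactors depend on temperature, changing β∗ shifts the bonded and non-bonded branches of the potential rigidly relative to each other without changing their shapes.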


5.2 Smoluchowski dynamics

We assume that the monomer-solvent interactions lead to a fast decay of correlations

of the monomer position and momentum, so that the dynamics of dynamical variables

$B(r^N, t)$ is governed by the Smoluchowski equation
\[
\dot{B}(r^N, t) = L^{\dagger} B(r^N, t), \tag{5.5}
\]
with formal solution $B(r^N, t) = \exp\{L^{\dagger} t\}\, B(r^N, 0)$, where the Smoluchowski operator is
given by
\[
L^{\dagger} = D \left( \nabla_{r^N} - \beta \nabla_{r^N} U \right) \cdot \nabla_{r^N}, \tag{5.6}
\]
where U is the potential of mean force describing monomer-monomer interactions averaged
over the solvent bath. In Eq. 5.6, we have defined an inner product over monomer
vector positions as $r^N \cdot r^N = \sum_{i=1}^{N} r_i \cdot r_i$, where $r_i$ is the position vector of monomer i.

The dynamics of the monomers can be justified from first principles by considering

the full dynamics of the system and applying projection operator methods[71]. To arrive

at Eq. 5.6, we have assumed that there is a clear separation of time scale between the time

scale for decay of correlations involving functions of the bath and correlation functions

of the monomer positions. We will also concern ourselves with long time dynamics in

the “diffusive” regime $t \gg m/(\beta\Gamma)$, where Γ is the effective friction a monomer feels due
to the solvent bath. Implicitly, it has been assumed that the generalized friction coefficient
matrix $\Gamma_{ij}(r^N)$ describing the effective friction on monomer i due to hydrodynamic
interactions arising from monomer j is diagonal so that $\Gamma_{ij}(r^N) \sim \delta_{ij}\, I\, \Gamma(r^N)$. This
approximation amounts to assuming the hydrodynamic interactions are instantaneous[72],
and leads to a simple relation between the friction and the diffusion coefficient appearing
in Eq. 5.6
\[
D(r^N) = \frac{(k_B T)^2}{\Gamma(r^N)}. \tag{5.7}
\]


We furthermore simplify this analysis by assuming the diffusion coefficient for each

monomer is a constant independent of the configuration of the polymer. This assumption

is particularly drastic, since the interactions of the monomers with the solvent bath will

depend strongly on the configuration of the polymer in the vicinity of a given monomer

and thereby influence the friction on the monomer. It is expected that such effects are

minimized in an idealized solvent in which the bath particles interact with the monomers

on very short length scales so that local monomer shielding is negligible.

If the transitions between microstates are slow compared to the diffusive time scale

of the motion of the monomers, the dynamics of populations $c(t) = \{c_1(t), \ldots, c_n(t)\}$ of
the microstates is well-represented by a simple Markov model:
\[
\dot{c}(t) = K \cdot c(t), \tag{5.8}
\]
where K is an $n_s \times n_s$ matrix of transition rates connecting the $n_s$ microstates. In the
next section, a simple means of computing the rate constants composing the matrix K is
outlined.

5.2.1 First passage time approach to rate constants

The general problem of diffusive barrier crossing in asymmetric double well potentials

can be addressed by considering the barrier crossing as a two step reaction[73, 74]

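\[
A \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} C \underset{k_{-2}}{\overset{k_2}{\rightleftharpoons}} B \tag{5.9}
\]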

where state C is defined to be the region near x = xc. Writing first-order kinetic equations

for the two step reaction and assuming the steady state approximation dC/dt = 0 to


eliminate two of the rate constants, we find the effective rate equations

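\[
\frac{dA}{dt} = -k_f A + k_r B \tag{5.10}
\]
\[
\frac{dB}{dt} = -k_r B + k_f A, \tag{5.11}
\]
where
\[
k_f^{-1} = k_1^{-1} + k_{-2}^{-1} \frac{Z_A}{Z_B}, \qquad
k_r^{-1} = k_{-2}^{-1} + k_1^{-1} \frac{Z_B}{Z_A}, \tag{5.12}
\]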

where ZA and ZB are the equilibrium populations of A and B. To obtain Eq. 5.12, we have

used the detailed balance condition, kf/kr = ZB/ZA = (k1k2)/(k−1k−2). In equilibrium,

the relative populations of A and B are ZA/(ZA + ZB) and ZB/(ZA + ZB), respectively

and the relaxation for a system initially in state B obeys

\[
N_B(t) = \frac{Z_B}{Z_A + Z_B} + \left( 1 - \frac{Z_B}{Z_A + Z_B} \right) e^{-(k_f + k_r) t}, \tag{5.13}
\]

with characteristic relaxation time (kf + kr)−1.
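
As a numerical check of Eqs. 5.12 and 5.13, the sketch below applies the standard steady-state reduction of the two-step scheme, kf = k1 k2/(k−1 + k2) and kr = k−1 k−2/(k−1 + k2), and verifies that it reproduces the inverse-rate form of Eq. 5.12 once detailed balance is used. The four rate values are arbitrary illustrative numbers, not results from the thesis, and the function names are hypothetical.

```python
import math


def effective_rates(k1, km1, k2, km2):
    """Steady-state (dC/dt = 0) reduction of A <-> C <-> B to A <-> B."""
    kf = k1 * k2 / (km1 + k2)
    kr = km1 * km2 / (km1 + k2)
    return kf, kr


def population_b(t, kf, kr):
    """Eq. 5.13 for an ensemble that starts entirely in state B."""
    eq_b = kf / (kf + kr)          # Z_B / (Z_A + Z_B), using kf / kr = Z_B / Z_A
    return eq_b + (1.0 - eq_b) * math.exp(-(kf + kr) * t)


k1, km1, k2, km2 = 2.0, 50.0, 80.0, 0.5      # arbitrary illustrative rates
kf, kr = effective_rates(k1, km1, k2, km2)
zb_over_za = (k1 * k2) / (km1 * km2)         # detailed balance ratio Z_B / Z_A

# Inverse rates written as in Eq. 5.12.
inv_kf = 1.0 / k1 + (1.0 / km2) / zb_over_za
inv_kr = 1.0 / km2 + (1.0 / k1) * zb_over_za

print(kf, 1.0 / inv_kf)
print(kr, 1.0 / inv_kr)
print(population_b(0.0, kf, kr), population_b(50.0, kf, kr))
```

The population starts at one and relaxes to ZB/(ZA + ZB) on the time scale (kf + kr)^-1, as stated above.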

The rate constants k1 and k−2 can be approximated by computing the first passage

time out of the stable wells to an absorbing state at x = xc. To compute k1, we assume

the particle starts at some position x = a in the A well and the probability P (x, t|a) of

finding the particle at position x at time t given that there is an absorbing trap at x = xc

is governed by

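\[
\frac{\partial P(x, t|a)}{\partial t} = D \frac{\partial}{\partial x} \left( \frac{\partial}{\partial x} + \beta \phi'(x) \right) P(x, t|a). \tag{5.14}
\]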

The absorbing boundary condition requires P (xc, t|a) = 0. From the definitions above,

the survival probability of the particle is given by $P_s(t|a) = 1 - \int_0^{x_c} dx\, P(x, t|a)$, so the
absorption rate at $x_c$ is
\[
f(t|a) = \frac{dP_s(t|a)}{dt} = -\int_0^{x_c} dx\, \frac{\partial P(x, t|a)}{\partial t}
       = \int_0^{x_c} dx\, \frac{\partial J(x, t|a)}{\partial x} = J(x_c, t|a), \tag{5.15}
\]
where $J(x, t|a)$ is the flux $J(x, t|a) = D\, dP(x, t|a)/dx + \beta D \phi'(x) P(x, t|a)$, which satisfies
$J(0, t|a) = 0$ for a reflecting boundary at x = 0.

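We assume that $k_1^{-1} = \langle \tau_{fp}(a) \rangle$, where $\tau_{fp}(a)$ is the first passage time averaged over
the density f(t) given that the particle started at x = a, and $\langle \tau_{fp}(a) \rangle$ is given by
\[
\langle \tau_{fp}(a) \rangle = Z_A^{-1} \int_0^{x_c} dx\, e^{-\beta\phi(x)}\, \tau_{fp}(x)
 = Z_A^{-1} \int_0^{\infty} dt \int_0^{x_c} dx\, e^{-\beta\phi(x)} f(t|x)\, t. \tag{5.16}
\]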

Integrating by parts and using the fact that tP (x, t|a) vanishes at t = 0 and t = ∞, we

find that

\[
\tau_{fp}(a) = \int_0^{x_c} dx \int_0^{\infty} dt\, P(x, t|a) = \int_0^{x_c} dx\, P(x|a), \tag{5.17}
\]

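where $P(x|a)$ is the time integral of $P(x, t|a)$. Using Eq. 5.14, we find that $P(x|a)$ obeys
the equation
\[
\int_0^{\infty} dt\, \frac{\partial P(x, t|a)}{\partial t} = -P(x, 0|a) = -\delta(x - a)
 = D \frac{\partial}{\partial x} \left( \frac{\partial}{\partial x} + \beta \phi'(x) \right) P(x|a). \tag{5.18}
\]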

Integrating this equation from 0 to y yields

\[
-H(y - a) = D e^{-\beta\phi(y)} \frac{d}{dy} \left( e^{\beta\phi(y)} P(y|a) \right), \tag{5.19}
\]

where we have used the fact that the flux is zero at the origin due to the reflecting

boundary. Multiplying both sides of the equation above by exp{βφ(y)} and integrating

from x to xc yields the solution

\[
P(x|a) = D^{-1} e^{-\beta\phi(x)} \int_x^{x_c} dy\, H(y - a)\, e^{\beta\phi(y)}, \tag{5.20}
\]

where we have used the fact that P (xc|a) = 0. Inserting this expression into Eq. 5.17,

we find

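\[
\tau(a) = D^{-1} \int_0^{x_c} dy \int_0^{y} dx\, H(y - a)\, e^{\beta\phi(y)} e^{-\beta\phi(x)}
        = D^{-1} \int_a^{x_c} dy\, e^{\beta\phi(y)} \int_0^{y} dx\, e^{-\beta\phi(x)}
        = D^{-1} \int_a^{x_c} dy\, \frac{c_A(y)}{\rho_A(y)}, \tag{5.21}
\]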


so that

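\[
k_1^{-1} = \langle \tau(a) \rangle = D^{-1} \int_0^{x_c} dx\, \rho_A(x) \int_x^{x_c} dy\, \frac{c_A(y)}{\rho_A(y)}
 = D^{-1} \int_0^{x_c} dy\, \frac{c_A(y)}{\rho_A(y)} \int_0^{y} dx\, \rho_A(x)
 = D^{-1} \int_0^{x_c} dy\, \frac{c_A(y)^2}{\rho_A(y)}. \tag{5.22}
\]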

Following a similar procedure for $k_{-2}^{-1}$, we obtain
\[
k_{-2}^{-1} = D^{-1} \int_{x_c}^{\infty} dy\, \frac{(1 - c_B(y))^2}{\rho_B(y)}, \tag{5.23}
\]

leading to the following expressions for the forward and reverse rate constants:

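\[
k_f^{-1} = D^{-1} \int_0^{x_c} dy\, \frac{c_A(y)^2}{\rho_A(y)}
         + D^{-1} \int_{x_c}^{\infty} dy\, \frac{(1 - c_B(y))^2}{\rho_B(y)}\, \frac{Z_A}{Z_B}
\]
\[
k_r^{-1} = D^{-1} \int_{x_c}^{\infty} dy\, \frac{(1 - c_B(y))^2}{\rho_B(y)}
         + \frac{Z_B}{Z_A}\, D^{-1} \int_0^{x_c} dy\, \frac{c_A(y)^2}{\rho_A(y)}. \tag{5.24}
\]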

It can be shown that these expressions coincide with the equations that can be found

through the spectral decomposition of a projected time evolution operator governing

the time dependence of a correlation function whose long-time limit yields expressions

for the rate constants. However, the above equations were obtained from much simpler

arguments.
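
The integral in Eq. 5.22 is straightforward to evaluate numerically. The sketch below uses a midpoint rule and tests it on a flat potential, for which the density in the well is uniform and the integral reduces to xc²/(3D), the familiar mean first passage time for a uniformly distributed start between a reflecting wall at 0 and an absorbing point at xc. The flat density is a test case chosen here, not one of the fitted densities of the thesis, and the function name is hypothetical.

```python
def inverse_rate(rho, x_c, diffusion=1.0, n=20000):
    """Midpoint-rule estimate of (1/D) * int_0^xc c(y)^2 / rho(y) dy.

    `rho` must be positive and normalized on [0, x_c]; c(y) is its running
    integral (the cumulative distribution), accumulated on the fly.
    """
    h = x_c / n
    cumulative = 0.0
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        r = rho(y)
        c_mid = cumulative + 0.5 * r * h   # cumulative distribution at the midpoint
        total += c_mid ** 2 / r * h
        cumulative += r * h
    return total / diffusion


X_C = 1.8


def flat_density(y):
    """Uniform density on [0, X_C], i.e. a flat potential inside the well."""
    return 1.0 / X_C


estimate = inverse_rate(flat_density, X_C)
exact = X_C ** 2 / 3.0
print(estimate, exact)
```

The same routine applies to Eq. 5.23 after replacing the cumulative distribution with 1 − cB(y) and integrating over the outer well up to a finite cutoff.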

5.2.2 Numerical test of microscopic rate expressions

A direct and simple way to verify that an adequate separation of time scale holds between

the time scale of relaxation within a well and the time scale of structural rearrangements

is to simulate the Smoluchowski dynamics of the populations under the appropriate

potential of mean force and verify that the population dynamics show exponential decay
with the relaxation time $(k_f + k_r)^{-1}$ suggested in Eq. 5.13.

To this end, a non-equilibrium ensemble was simulated in which all members evolve from
an initial state of conditional equilibrium in state B according to the effective
potential in Fig. 5.1. The simulation was done using a Monte Carlo procedure in which
steps of magnitude ±∆x were attempted with equal probability and accepted with
probability min(1, ρ(x_t)/ρ(x)), where x is the current state of the system,
x_t = x ± ∆x is the trial configuration, and ρ(x) is the probability density in Eq. 5.4.
The simulation was done with diffusion coefficient D = (∆x)^2/∆t = 1, so that each step
of the discrete system advanced the time by ∆t = (∆x)^2. In the simulation results
shown in Fig. 5.2, ∆x = 0.01. As is clear from these results, the decay of the
population of the unbound state is exponential and well described by the theoretically
predicted rate for the range of temperatures examined. The integrals in Eqs. (5.22)
and (5.23) for the intermediate rate constants k_1 and k_{-2} were carried out numerically
using analytical fits to the cumulative distributions and densities, and it was found that
k_1^{-1} = 0.0287 and k_{-2}^{-1} = 1.532.
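The Monte Carlo procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation used for Fig. 5.2: the density `log_density` is a hypothetical square well standing in for the density of Eq. 5.4, and the names `x_c`, `box`, `n_walkers` and the parameter values are chosen only for the example. The time increment follows the convention D = (∆x)^2/∆t = 1.

```python
import numpy as np

def log_density(x, beta=2.0, x_c=1.0, box=2.0):
    """Hypothetical log rho(x): a well of depth beta for 0 < x < x_c (bound state),
    flat for x_c <= x < box (unbound state), zero density outside (0, box)."""
    x = np.asarray(x, dtype=float)
    inside = (x > 0.0) & (x < box)
    return np.where(inside, np.where(x < x_c, beta, 0.0), -np.inf)

def unbound_population(n_walkers=1000, n_steps=2000, dx=0.05,
                       beta=2.0, x_c=1.0, box=2.0, seed=0):
    """Fraction of walkers in the unbound state (x >= x_c) after each step."""
    rng = np.random.default_rng(seed)
    # Conditional equilibrium in the unbound state: uniform on [x_c, box).
    x = rng.uniform(x_c, box, size=n_walkers)
    fraction = np.empty(n_steps + 1)
    fraction[0] = np.mean(x >= x_c)
    for step in range(1, n_steps + 1):
        # Attempt a step of +dx or -dx with equal probability.
        trial = x + dx * rng.choice([-1.0, 1.0], size=n_walkers)
        # Accept with probability min(1, rho(trial) / rho(x)).
        log_ratio = (log_density(trial, beta, x_c, box)
                     - log_density(x, beta, x_c, box))
        accept = np.log(rng.random(n_walkers)) < log_ratio
        x = np.where(accept, trial, x)
        fraction[step] = np.mean(x >= x_c)
    # D = dx^2 / dt = 1, so each step advances the time by dt = dx^2.
    times = dx**2 * np.arange(n_steps + 1)
    return times, fraction

if __name__ == "__main__":
    times, fraction = unbound_population()
    print(times[-1], fraction[-1])
```

Fitting a single exponential to the returned `fraction` and comparing its relaxation time with (k_f + k_r)^{-1} would reproduce the kind of check described in the text, given the actual density.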

5.3 Markov model of configurational dynamics

Based on the considerations of the previous section, it is clear that under conditions of

reasonably low temperature β∗ ≥ 1, the dynamics of the fractional populations c(t) =
{c_1(t), . . . , c_n(t)} of the microstates is well-represented by a simple Markov model:

$$\dot{c}(t) = K \cdot c(t), \tag{5.25}$$

with formal solution c(t) = e^{Kt} c(0), where K is an n_s × n_s matrix of transition rates
connecting the n_s microstates. We recall that since the sum of the fractional populations
is one, ∑_α c_α(t) = 1, the diagonal elements of K are given by

$$K_{\alpha\alpha} = -\sum_{\beta \neq \alpha} K_{\alpha\beta}, \tag{5.26}$$

whereas the off-diagonal rates are

$$K_{\alpha\beta} = \begin{cases} k^{f}_{\alpha\beta} & \alpha > \beta \\ k^{r}_{\alpha\beta} & \beta > \alpha, \end{cases} \tag{5.27}$$

where the forward and backward rates between states α and β are given by Eq. 5.24.


[Figure 5.2 plot: population of the unbound state versus time for β∗ = 1, 2 and 3; panel title "Population Dynamics: Simulation versus Kramers Solution".]

Figure 5.2: The equilibration dynamics of the population of the unbound state based

on an ensemble size of 5000 random walkers. The solid lines correspond to simulation

results, and the dashed lines are analytical fits based on the Kramers first-passage time

solution for the potential of mean force in Fig. 5.1. The different curves correspond to

different effective temperatures β∗ = βε. Note that at the lowest effective temperature

β∗ = 3 (the bottom curve), the equilibrium bound population is significantly larger than

the unbound population.


Since it is assumed that there is a clear separation between the time scale of reactive
events and the transient time scale of the evolution within a given configuration, the
matrix element K_αβ = 0 for any two states α and β that differ by more than a single bond.
This property means the n_s × n_s matrix is sparse, particularly for systems with a large
number of possible bonds. For larger systems, the equilibrium population of certain states
with small entropies compared to other configurations with the same number of bonds
is effectively zero over the entire temperature range, and such states are not considered.
From the form of the rate constants, it is expected that the rate of transitions from other
states to these rare states is relatively small, which justifies their neglect.
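The sparsity of K can be illustrated with a short sketch in which each microstate is encoded as a bitmask of formed bonds, so that two states are connected only if their bitmasks differ in exactly one bit. This is an illustration under simplifying assumptions: the function name `build_rate_matrix` and the per-bond rates `k_form` and `k_break` are hypothetical, the rates are taken to be independent of which other bonds are present, and the diagonal is chosen so that each column sums to zero, which conserves total population when K[i, j] is read as the rate from state j to state i.

```python
import numpy as np

def build_rate_matrix(n_bonds, k_form, k_break):
    """Rate matrix over 2**n_bonds bitmask states; K[i, j] is the rate j -> i.

    Bit b of a state index is 1 when bond b is formed, so index 0 is the fully
    unbonded state and index 2**n_bonds - 1 is the fully bonded state.
    """
    n_states = 2 ** n_bonds
    K = np.zeros((n_states, n_states))
    for j in range(n_states):
        for b in range(n_bonds):
            i = j ^ (1 << b)  # neighbour differing from j by exactly one bond
            K[i, j] = k_break[b] if j & (1 << b) else k_form[b]
    # Diagonal from population conservation: every column sums to zero.
    K[np.diag_indices(n_states)] = -K.sum(axis=0)
    return K

if __name__ == "__main__":
    K = build_rate_matrix(3, k_form=[1.0, 1.0, 1.0], k_break=[0.1, 0.1, 0.1])
    off_diagonal = np.count_nonzero(K) - K.shape[0]
    print(K.shape, off_diagonal)
```

Each state has exactly n_b neighbours, so only n_s · n_b of the n_s^2 off-diagonal entries can be non-zero.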

A dynamical picture of the equilibration of an initial non-equilibrium ensemble of
states can be obtained by diagonalization of the matrix K to write

$$c_\alpha(t) = Q_{\alpha\beta}\, e^{-\lambda_\beta t}\, Q^{-1}_{\beta\gamma}\, c_\gamma(0), \tag{5.28}$$

where summation over repeated Greek indices is implied. In Eq. 5.28, the columns of
the transformation matrix Q contain the n_s eigenvectors, which have eigenvalues {−λ}.
Conservation of overall population guarantees that one of the eigenvalues is zero,
λ_0 = 0, with its corresponding eigenvector c_eq being the equilibrium populations. All
other eigenvalues of K are non-zero and negative (i.e. λ_n > 0 for n > 0). Thus, we can write

$$c_\alpha(t) - c_{\alpha,\mathrm{eq}} = \sum_{n=1}^{n_s-1} Q_{\alpha n}\, e^{-\lambda_n t}\, Q^{-1}_{n\gamma}\, c_\gamma(0). \tag{5.29}$$

When the states are ordered so that α = 0 corresponds to the completely unbonded
state and α = 2^{n_b} − 1 = n_s − 1 to the state with the maximum number of bonds (hereafter
referred to as the “folded state”), we can monitor the equilibration of the population
of the folded state c_f(t) = c_{n_s−1}(t), starting from an initial population in the
completely unbonded state, c(0) = (1, 0, . . . , 0), by tracking

$$\bar{c}_f(t) = c_f(t) - c_{f,\mathrm{eq}} = \sum_{n=1}^{n_s-1} Q_{n_s-1,n}\, e^{-\lambda_n t}\, Q^{-1}_{n0}, \tag{5.30}$$

which has initial value $\bar{c}_f(0) = -c_{f,\mathrm{eq}}$. The relaxation profile $\bar{c}_f(t)$ of a 15-monomer chain


[Figure 5.3 plot: c_f(t) versus time for β∗ = 2 and β∗ = 8; panel title "Folding Equilibration, 3 bond model".]

Figure 5.3: The equilibration dynamics of the folding state population cf (t) versus time

for a chain of 15 monomers for values of inverse effective temperature β∗ ranging from

β∗ = 2 (top curve, dotted lines) to β∗ = 8 (solid black line, bottom curve).

with a maximum of n_b = 3 attractive bonds is shown in Fig. 5.3. For this system, there
are only n_s = 8 possible microstates, and hence c_f(t) may be written as a superposition
of 7 exponentials. At low temperatures (large β∗), the dynamics becomes essentially
independent of temperature, since the forward rate constants k_f become small and the
reverse rate constants approach k_r ≈ k_{-2}, which is independent of temperature
(assuming a constant diffusion coefficient D). At low temperatures, the first three
non-zero eigenvalues are nearly degenerate with a value of λ_{1,2,3} ≈ 0.2, while the next
set of three eigenvalues are roughly twice as large. Thus, the dynamics is effectively
single exponential, reflecting the equivalence of three relaxation channels in which the
fully folded state BF FJ JN can be reached from any of the three precursor states with
two bonds. At low temperatures, the dynamics roughly consists of the system falling down
a series of steps from the fully unbonded state to the fully folded (bonded) state with
little back-reaction (i.e. climbing back up the steps), and there are multiple stairways
leading to the folded state.
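The spectral solution of Eq. 5.28 can be evaluated numerically along the following lines. This is a sketch rather than the code used for Figs. 5.3 and 5.4: the function name `relax_populations` is hypothetical, K[i, j] is assumed to be the rate from state j to state i, and the two-state rates in the usage example are arbitrary illustrative values. For two states the result can be checked against the closed form, in which the population relaxes at the rate given by the sum of the two rate constants.

```python
import numpy as np

def relax_populations(K, c0, times):
    """Return c(t) for each requested time via c(t) = Q exp(Lambda t) Q^{-1} c(0).

    K[i, j] is the rate from state j to state i, so the columns of K sum to zero.
    Row t of the result holds the populations of all states at times[t].
    """
    eigenvalues, Q = np.linalg.eig(np.asarray(K, dtype=float))
    # Expansion coefficients of the initial condition in the eigenvector basis.
    coefficients = np.linalg.solve(Q, np.asarray(c0, dtype=float))
    decay = np.exp(np.outer(times, eigenvalues))  # shape (n_times, n_states)
    return np.real((decay * coefficients) @ Q.T)

if __name__ == "__main__":
    k_on, k_off = 2.0, 0.5  # illustrative rates: state 0 -> 1 and 1 -> 0
    K = np.array([[-k_on, k_off],
                  [k_on, -k_off]])
    times = np.linspace(0.0, 3.0, 7)
    c = relax_populations(K, [1.0, 0.0], times)
    exact = k_on / (k_on + k_off) * (1.0 - np.exp(-(k_on + k_off) * times))
    print(np.max(np.abs(c[:, 1] - exact)))
```

The same function applies unchanged to the 8-state and 256-state matrices discussed in the text, provided K is diagonalizable.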


[Figure 5.4 plot: c_f(t) versus time; panel title "Folding Dynamics, 8 bond model".]

Figure 5.4: The equilibration dynamics of the folding state population cf (t) versus time

for a chain of 25 monomers for values of inverse effective temperature β∗ ranging from

β∗ = 2 (top curve, dotted lines) to β∗ = 8 (solid black line, bottom curve).

The dynamics of longer chains is much more interesting due to the possibility of forming
longer bonds between distant residues, which mimics the folding of secondary structural
elements into tertiary structures. In Fig. 5.4, the relaxation profile of a 25-monomer chain
is plotted versus time for a range of inverse temperatures. The dynamics of the larger
system, with 25 monomers, n_b = 8 attractive bonds and n_s = 256 microstates, is much
more complex than that of the smaller system, since c_f(t) is now a combination of 255
exponentials and can acquire characteristics of a stretched exponential frequently observed
in glassy systems with frustration. Note that the folding time of the 25-monomer system is
typically longer than that of the shorter chain, even at high temperatures. At high
temperatures, the smallest non-zero eigenvalue is doubly degenerate, with
λ^{(25)}_{1,2} ≈ 0.01, more than an order of magnitude smaller than the value of roughly
0.2 observed in the 15-monomer chain.

As the temperature decreases, the relaxation profile becomes more complex as more
eigenmodes contribute to the relaxation at longer time scales, leading to a
stretched-exponential appearance. The folding time clearly increases as the temperature
is lowered at intermediate values of the temperature 1 ≤ β∗ ≤ 4. However, below this
temperature regime the equilibration profile simplifies to a characteristic single
exponential form with a shorter overall folding time. Once again, at low effective
temperatures β∗ ≥ 6, the relaxation profile becomes independent of temperature and
roughly single exponential.

The behavior of the profile c_f(t) as the temperature is lowered can be understood in
terms of the number of relaxation modes or “folding pathways” that contribute to the
evolution of the microstate populations. At intermediate temperatures, many modes are
connected to one another since the forward rate constants k_f describing the rate of escape
from a bonding well are large enough to allow rapid formation and loss of bonds as the
system equilibrates. However, as the temperature is lowered, the forward rate constants
become small and the relaxation proceeds as a sequence of “falls” down the steps in the
free energy landscape. Once again, at low temperatures we find that k_f ≈ 0 and
k_r ≈ k_{-2}, leading to temperature-independent dynamics.

In summary, we observe that even though the relaxation profile of longer chains in
model B is much more complicated than that of shorter chains (such as the 15-monomer
system), both folding profiles appear to be single exponential and independent of
temperature in the low-temperature, high-β∗ regime, β∗ ≥ 6. It should be noted that the
overall shape of the free energy landscape of the model B protein-like chain system is
quite funnel-like, especially for shorter chains. It is likely that the smoothness of the
funnel leads to the relatively simple folding dynamics. It would be quite interesting to
study chains longer than 29 beads or to modify the model to examine the effect of long-lived
metastable misfolded states on the qualitative nature of the dynamics.

Chapter 6

Conclusions, Summary and Future Work

Simple models of a protein-like chain were constructed to investigate the free energy
landscape of a system possessing features of biomolecular systems. Using a combination
of Parallel Tempering (PT) and Discontinuous Molecular Dynamics (DMD), the free
energy landscape of these models of protein-like chains, with and without solvent, was
investigated.

6.1 Free Energy Landscape in the Absence of a Solvent

Simple models of a protein-like chain were used not to capture very detailed behavior

of proteins, which is not possible because of the simplicity of the models, but rather to

capture the basic behavior of proteins and to observe different phases of proteins in a

relatively short computational time. Finding simple models that can be applied to study

the qualitative behavior of proteins without becoming computationally prohibitive is an

important step in understanding the dynamics of protein folding.


Two models were presented, called model A and model B. Model A and model B

differ primarily in the number of attractive interactions between beads (illustrated in

Fig. 3.2). Fewer bonding interactions are present in model B, leading to a system with

less frustration and a free energy landscape that possesses fewer local minima and a less

compact folded structure at low temperatures. The secondary structure of an alpha helix
can be observed clearly in model B. In model B, for chains longer than 17 beads, the
most common structure at low temperatures is a collapsed structure in which there are
bond(s) between the two ends of the chain as well as bond(s) between different layers
of the helix. For chains shorter than 18 beads, the secondary structure of an alpha helix
can be observed without any tertiary structure.

It was shown that for model B, the free energy landscape of the 25-bead chain has a

smooth funnel that has important effects on both the dynamics and the thermodynamics

of the system. In this model, the free energy landscape at low temperatures contains a

deep point with several minima around it located inside one basin. As the temperature

decreases, the deepest point of the funnel becomes deeper, while the minima around the

deepest point become shallower. This trend continues until a temperature is reached at

which all local minima in the free energy landscape have vanished and only a single global

minimum exists. In contrast to Model B, Model A not only takes more time to simulate,

but does not exhibit a preference for a specific native structure at low temperatures.

This may be attributed to several factors such as the lack of rigidity of the chain in this

model, several large entropic barriers, and the possibility of having many structures with

the same energy.

It should be mentioned that before settling on models A and B, more than ten similar

models were considered and their free energy landscapes were investigated. For instance,

in one of the studied models attractions were allowed between any bead with index i and

any bead with index i + 4n, where n cannot be 2 or 3. This means that, similar to model B, beads

that are separated along the chain by eight or twelve beads do not attract one another,


and similar to model A, all kinds of beads can be involved in an attractive bond. Since

the results for this model were similar to those for model A, this model was not presented

here.

In order to elucidate the free energy landscape, the PT method was applied and found
to be reasonably effective. However, the suitability of the PT algorithm for studying the
free energy landscape over a wide range of temperatures is questionable due to its slow
convergence. Using too many replicas in the PT method usually makes it difficult to
achieve efficient PT dynamics. Therefore, studying the landscape at low temperatures is
challenging for some of the models that have very complex free energy landscapes.

In combination with PT, DMD was used to sample the configurational space available

to the system. Using DMD increased the complexity of the algorithm since all events

must be processed sequentially and efficiently. However, the difficulty of implementing

the dynamics was well worth the effort, since by using DMD it was possible to run systems

with more than 200 replicas for nearly 8 microseconds, processing almost 10^10 collision

events in less than 48 CPU hours.

In the absence of solvent, it was shown that the relative configurational entropy is

temperature independent. Hence, using the populations of the configurations at different

temperatures, the relative free energy and entropy of any pair of configurations can

be calculated. From the Helmholtz free energies of different structures at the studied

temperatures, the populations of all configurations at any temperature were predicted
and verified against simulation results. The predictions agree reasonably well with the
simulation results, which shows one of the great advantages of using discontinuous
potentials in studying the free energy landscape.
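The procedure summarized above can be sketched as follows, under the assumption that the population of configuration i is proportional to exp(S_i/k_B − βE_i) with temperature-independent S_i, so that ln(P_i/P_j) is linear in β with intercept ∆S/k_B and slope −∆E. The function names and the numerical values in the usage example are hypothetical and chosen only for illustration.

```python
import numpy as np

def relative_entropy_and_energy(betas, log_population_ratios):
    """Fit ln(P_i / P_j) = dS / k_B - beta * dE; return (dS / k_B, dE)."""
    slope, intercept = np.polyfit(betas, log_population_ratios, 1)
    return intercept, -slope

def predict_populations(beta, energies, entropies_over_kB):
    """Populations P_i proportional to exp(S_i / k_B - beta * E_i)."""
    log_weights = (np.asarray(entropies_over_kB, dtype=float)
                   - beta * np.asarray(energies, dtype=float))
    log_weights -= log_weights.max()  # guard against overflow
    weights = np.exp(log_weights)
    return weights / weights.sum()

if __name__ == "__main__":
    # Illustrative values: energies in units of the bond energy, entropies in k_B.
    energies = [0.0, -1.0, -2.0]
    entropies = [0.0, -3.0, -7.0]
    for beta in (1.0, 3.0, 5.0):
        print(beta, predict_populations(beta, energies, entropies))
```

Because the energies of a discontinuous-potential model take discrete, exactly known values, only the relative entropies have to be estimated from the observed populations.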

For model B, short chains with 15 and 20 beads and longer chains with 29, 30 and 35

beads were investigated. For chains shorter than 30 beads, one finds that the probability of the

most common structure at low temperatures approaches one, as for the 25-bead chain,


which suggests that the free energy landscape has a deep global minimum inside a funnel

at low temperatures. However, for chains longer than 29 beads the structure satisfying all

possible attractive bonds is geometrically impossible and the entropic barriers between

the configurations with different energies become larger. Hence, at low temperatures

the energy landscape for chains smaller than 30 beads consists mainly of a funnel in

which the lowest energy configuration corresponds to the deepest point, and there are

several local minima with direct access to the lowest energy structure located around the

deepest point. By decreasing the temperature these local minima become shallower and

consequently, the funnel becomes steeper. However, for chains longer than 29 beads the

landscape at low temperatures consists of a few funnels relatively close to each other,

which are shallower than the funnel observed for chains smaller than 30 beads.

The observed landscape can provide insight into the shape of the landscape of actual

proteins. While for small chains the native structure seems to be the lowest free energy

structure, the existence of several distinct funnels in the landscape of long chains suggests

the possibility that the native structure of real proteins is not necessarily the lowest free

energy structure but may correspond to a configurational basin that can be accessed

easily during the folding dynamics. Another factor that should be considered for long

proteins is the important effect of temperature on the morphology of the landscape. In

our study, the basin containing the global minimum becomes steeper as the temperature

decreases for short chains. However, for longer chains, the basin becomes steeper while the

deepest point of the landscape can shift from one configuration to another configuration

with slightly different bonds over the same temperature range. Thus, for some of the

long proteins, the structure may be more sensitive to temperature fluctuations and by

slightly changing the temperature the thermodynamically stable configuration can shift

to a configuration that differs substantially.


6.2 Free Energy Landscape for a Chain Solvated by a Square-Well Fluid

The free energies of different configurations (i.e., the free energy landscape) of a protein-

like chain in a solvent at different temperatures were also investigated. Qualitatively,

the behavior of a protein-like chain inside a square-well solvent is similar to the behavior

in the absence of a solvent. For the 15-bead chain, the lowest free energy configuration

was found to be an alpha helix that becomes dominant at low temperatures as in the

absence of solvent. The free energy landscape of the 15-bead chain at low temperatures

consists of a funnel with a very deep global minimum and few local minima around it. By

lowering the temperature, the global minimum becomes deeper while the others become

shallower and consequently, the funnel becomes steeper.

For larger chain lengths, in particular for ℓ = 20 and ℓ = 25, a phase transition of the

square-well solvent effectively puts a lower bound on the temperature range accessible in

the simulations. The observed phase transition temperature coincides roughly with the

temperature at which previous studies observed a liquid-vapor coexistence line. Inves-

tigating the free energy landscape of a solvated system over a phase transition point of

the solvent can be very challenging using the PT method, especially for larger systems.

For 20-bead and 25-bead chains the effects of the phase transition become more apparent

because of the larger number of particles in comparison with the 15-bead chain. Conse-

quently, the temperature range studied here could not be extended below the (effective)

phase transition temperature for 20-bead and 25-bead chain systems. This difficulty is

not easy to overcome, since it is related to the efficiency of the PT algorithm itself near

the phase transition point. Substantial computational resources, over a million CPU hours,
were used to obtain the results presented here, most of which were consumed in finding
the best set of parameters for the PT runs. As a result of the considerable computational
demand of computing the free energy of the solvated system below the phase transition
point, a direct comparison with the results in the absence of a solvent could not be done
for the whole range of temperatures for the 20-bead and 25-bead chain systems. However,
it is expected that for both the 20-bead and 25-bead chain systems the configurations with
the lowest configurational energy become dominant at lower temperatures, since their
system energies appear to be the lowest at very low temperatures, which is mainly
due to their lowest configurational energy and the hydrophobicity of 75% of the
protein-like chain beads (which have only hard-core repulsive interactions with solvent
particles).

The relative configurational entropies could not be calculated here due to temperature

dependent averages over solvent degrees of freedom, and a large number of sampled

configurations is needed to reduce the statistical error of computed values of the system

energy and entropy. For example, even to decrease the statistical error in the system
entropy of the 15-bead chain by a factor of 10, which may not be sufficient, 100 times
more samples are needed, which here means around half a million CPU hours. For larger
systems, the statistical errors are even larger as they scale with the size of the system
and the calculations become much more expensive.
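The factor of 100 quoted above follows from the usual scaling of the standard error of a sample mean with the number of samples. The relation below assumes N statistically independent samples with standard deviation σ; correlated samples would require proportionally more.

```latex
\sigma_{\bar{x}}(N) = \frac{\sigma}{\sqrt{N}}
\quad\Longrightarrow\quad
\frac{\sigma_{\bar{x}}(100N)}{\sigma_{\bar{x}}(N)} = \frac{1}{\sqrt{100}} = \frac{1}{10}.
```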

While for the 15-bead chain the lowest energy configuration is an unfolded alpha-

helix without any specific tertiary structure, for longer chains because of the bonds

between different layers of the helix, such as the bond between two ends of the chain,

the lowest energy structure is a folded structure. Our study showed that for longer

chains the entropic barrier for making bonds between the two ends of the chain is larger

than the change in entropy associated with the bonds necessary for forming a helix

structure. As a consequence, the unfolded helix structure is dominant for a relatively

wide range of temperature until the low temperature regime where the folded helix is

favored. Therefore, similar to the absence of a solvent, the effect of temperature on the

morphology of the landscape is more apparent for the longer chains.

As in the absence of the solvent, it was confirmed that model B satisfies appropriate
criteria for studying real proteins. The number of bonds is sufficient to generate the
common secondary structure of an alpha helix and the tertiary structure of the folded
structure. Models that have more possible bonds, such as model A introduced in Chapter 3,
are much more expensive than model B while not necessarily being better at representing
real proteins. The lack of attractive bonds between beads that are separated by eight or
twelve beads makes the chain more rigid, and this restriction was successful in reproducing
some of the effects of dihedral-angle interactions and side chains in real proteins.

One of the major differences between a protein model in a solvent and without a
solvent is the effect of the mainly repulsive interactions between the protein-like chain
beads and the solvent particles on the folding process. Only 25% of the beads can make
attractive bonds with the solvent, while the rest of the beads have only repulsive
interactions with the solvent, which makes most of the beads hydrophobic. Because of the
restricting effect of the repulsive interactions, the entropy range (the difference in
system entropy between the configurations with the maximum and minimum number of bonds)
is smaller than in the absence of a solvent. Because of the smaller entropy range, the
landscape shows funnel behavior at higher temperatures than in the absence of a solvent.

6.3 Simple Dynamics Using Smoluchowski Equation

In Chapter 5, it was shown that one possible avenue for future research is to investigate
the dynamics of the folding transition, instead of only studying the free energies.
Building on earlier studies that provide simple connections between energy landscapes
and protein folding kinetics [75, 76], we applied the Smoluchowski and Kramers
equations to study the dynamics. By considering the distance between any two beads
that can make a bond as a reaction coordinate and by using the populations of the
configurations in each distance segment, the potential of mean force (Helmholtz free
energy) versus the reaction coordinate (the distance between the two beads that can make
a bond) was calculated.
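Estimating a potential of mean force from the populations in each distance segment can be sketched as follows. This is an illustration, not the procedure of Chapter 5 in detail: the function name `potential_of_mean_force` and the sample distances are hypothetical, the histogram is normalized to a density, and no geometric (Jacobian) correction is applied to the distance coordinate.

```python
import numpy as np

def potential_of_mean_force(distances, bin_edges, beta=1.0):
    """Return bin centres and F(x) = -ln(p(x)) / beta, shifted so min(F) = 0.

    Bins with no samples are assigned an infinite free energy.
    """
    counts, edges = np.histogram(distances, bins=bin_edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    density = counts / (counts.sum() * np.diff(edges))
    with np.errstate(divide="ignore"):
        free_energy = -np.log(density) / beta
    free_energy -= free_energy[np.isfinite(free_energy)].min()
    return centers, free_energy

if __name__ == "__main__":
    # Hypothetical bead-bead distances: three samples in the well, one outside.
    centers, free_energy = potential_of_mean_force(
        [0.5, 0.5, 0.5, 1.5], bin_edges=[0.0, 1.0, 2.0])
    print(centers, free_energy)
```

With a fine enough grid, the resulting profile plays the role of the effective potential entering the Smoluchowski equation.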


It was shown through simulation of a stochastic model of the evolution of the system
that the dynamics of transitions between microstates of the chain is well described by the
first-passage time solution of the Smoluchowski equation. Applying this equation to all
the possible bonds of the 15-bead and 25-bead chains, we calculated the relaxation matrix
K for both of the chains, where K_ij is the rate constant for the transition from
microstate j to microstate i. We investigated the equilibration process from an ensemble
of initially extended configurations to mainly folded configurations at low effective
temperatures. We observed that while the relaxation profile of the 25-bead chain in
model B is much more complicated than that of the 15-bead chain, both folding profiles
appear to be single exponential and independent of temperature in the low-temperature,
high-β∗ regime, β∗ ≥ 6. It should be noted that the overall shape of the free energy
landscape of the model B protein-like chain system is quite funnel-like, especially for
shorter chains. For the chains shorter than 30 beads in model B, the funnels are smooth
and regular and free of “mis-folded states”, corresponding to alternate funnels
well-removed from the funnel leading to the native state. It is likely that the smoothness
of the funnel leads to the relatively simple folding dynamics.

6.4 Future work

The main problem in studying the protein-like chain inside a solvent is the slow
convergence of estimates of the free energy of configurations using the PT method. For
example, the presence of a phase transition in the square-well fluid leads to large
statistical errors in the PT method. To overcome the effects of the phase transition on
sampling, the PT method should be enhanced by incorporating other techniques, such as
umbrella sampling [77]. Another solution to this problem is to use different parameters
for the square-well liquid such that the phase transition temperature lies outside the
temperature range of interest. Since in this work the phase transition was observed at the
same temperature at which previous studies predict the liquid-vapor coexistence line for
the density ρ∗ = 0.5, another set of parameters could be used for which, according to the
previous studies, no phase transition occurs inside the studied range of temperatures.
According to Ref. [61], increasing the ratio λ = σ′/σ shifts the liquid-vapor coexistence
line to higher temperatures for the density ρ∗ = 0.5. For example, for λ = 2.0 the
liquid-vapor coexistence line is crossed at a temperature around T∗_l = 2.4 for ρ∗ = 0.5
[61, 60, 78], which is very close to the highest studied temperature (T∗_l = 2.5).

While the models used here are too simple to represent a specific protein, they can
still describe the general behavior of an alpha helix. It is possible to extend this project
by using more complex models, which can be done in several different ways. One possible
extension is to define 20 different beads, representing the 20 different amino acids,
instead of the current four kinds and to define their interactions based on the
characteristics of real amino acids.

Another possible extension is to use a model in which the bonds are defined using

experimental results for the folded structure of a specific protein, where a possible bond

is defined only if there is a hydrogen bond between the two residues of a specific protein

in its natural folded state.

The model can also be extended by drawing on studies that apply network motifs to
understand hydrogen-bond patterns in proteins [79, 80, 81]. Based on these studies, there
are some common hydrogen-bond patterns in proteins that can be represented by graphs
with a relatively small number of nodes, which means that “surprisingly, very few
parameters are needed to define the hydrogen-bond motifs” [81]. It is possible to use
these patterns to define the attractive interactions between chain beads.

Another possible extension is to use several beads for modeling amino acids. This

means that each atom (or a group of atoms) in an amino acid can be represented by a

bead. However, because of the significant increase in the cost of simulation, applying

this extension may not be worthwhile.


It would be quite interesting to apply the Smoluchowski equation to study the dynamics
of chains longer than 29 beads or to modify model B to examine the effect of long-lived
metastable misfolded states on the qualitative nature of the dynamics.

Appendices


Appendix A

Heat Capacity and Compressibility

A.1 Heat Capacity

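The heat capacity is related to the fluctuation of energy in the canonical ensemble
according to:

$$C_v = \frac{\partial U}{\partial T} = -\frac{1}{k_B T^2}\frac{\partial U}{\partial \beta} = -\frac{1}{k_B T^2}\frac{\partial}{\partial \beta}\left(-\frac{\partial \ln Z}{\partial \beta}\right) = \frac{1}{k_B T^2}\frac{\partial^2 \ln Z}{\partial \beta^2} = \frac{1}{k_B T^2}\langle (U - \langle U \rangle)^2 \rangle, \tag{A.1}$$

where C_v, U, T, k_B, β and Z are the heat capacity at constant volume, internal
energy, temperature, Boltzmann’s constant, 1/k_B T, and the canonical partition function,
respectively.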
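In a simulation, Eq. A.1 is typically applied to a series of sampled energies. A minimal sketch, assuming canonical samples at a single temperature and reduced units with k_B = 1 by default (the function name `heat_capacity` is hypothetical):

```python
import numpy as np

def heat_capacity(energy_samples, temperature, k_B=1.0):
    """C_v = <(U - <U>)^2> / (k_B T^2) from canonical energy samples."""
    energies = np.asarray(energy_samples, dtype=float)
    return energies.var() / (k_B * temperature**2)

if __name__ == "__main__":
    # Illustrative two-level samples with equal populations.
    print(heat_capacity([0.0, 1.0, 0.0, 1.0], temperature=0.5))
```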

A.2 Compressibility

Since the number of particles is fixed in both the canonical and microcanonical ensembles,
the compressibility can be related to the number fluctuations only in the grand canonical
ensemble. The average number of particles can be written as:

$$N \equiv \langle N \rangle = \frac{1}{Z\beta}\frac{\partial Z}{\partial \mu} = \frac{1}{\beta}\frac{\partial}{\partial \mu}\ln Z, \tag{A.2}$$

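where N, µ and Z are the average number of particles, the chemical potential and the
grand canonical partition function, respectively. Then the fluctuation in the number of
particles in volume V can be written as:

$$\langle N^2 \rangle - \langle N \rangle^2 = \frac{1}{Z\beta^2}\frac{\partial^2 Z}{\partial \mu^2} - \left[\frac{1}{Z\beta}\frac{\partial Z}{\partial \mu}\right]^2 = \frac{1}{\beta^2}\frac{\partial^2 \ln Z}{\partial \mu^2} = \frac{1}{\beta^2}\frac{\partial (\beta \langle N \rangle)}{\partial \mu} = k_B T \left(\frac{\partial N}{\partial \mu}\right)_{T,V}. \tag{A.3}$$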

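Using the Gibbs-Duhem equation, N dµ = V dp − S dT, it can be shown that:

$$-\frac{N^2}{V}\left(\frac{\partial \mu}{\partial N}\right)_{T,V} = V\left(\frac{\partial p}{\partial V}\right)_{T,N} \;\Rightarrow\; \left(\frac{\partial N}{\partial \mu}\right)_{T,V} = \frac{N^2}{V}\kappa_T, \tag{A.4}$$

where $\kappa_T = -\frac{1}{V}\left(\frac{\partial V}{\partial p}\right)_{T,N}$. Therefore,

$$\langle N^2 \rangle - \langle N \rangle^2 = \frac{N^2}{V} k_B T \kappa_T. \tag{A.5}$$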
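Eq. A.5 can be inverted to estimate the isothermal compressibility from particle-number fluctuations in a fixed volume. A minimal sketch, assuming grand canonical samples of N in a volume V and reduced units with k_B = 1 by default (the function name `isothermal_compressibility` is hypothetical):

```python
import numpy as np

def isothermal_compressibility(number_samples, volume, temperature, k_B=1.0):
    """kappa_T = V (<N^2> - <N>^2) / (<N>^2 k_B T), obtained by solving Eq. A.5."""
    numbers = np.asarray(number_samples, dtype=float)
    return volume * numbers.var() / (numbers.mean() ** 2 * k_B * temperature)

if __name__ == "__main__":
    # Illustrative particle counts observed in a sub-volume.
    print(isothermal_compressibility([9, 11, 10, 10], volume=5.0, temperature=2.0))
```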

Appendix B

Temperature sets in PT

B.1 In the absence of solvent

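For model A, ∆β∗ varies with β∗ from ∆β∗ = 1.5 for the highest temperatures, to
∆β∗ = 0.375 for the lowest temperatures. More specifically:

$$\beta^*_i = \begin{cases} \frac{3}{2}(i+1) & \text{if } i \le 5 \\ i + 4 & \text{if } 5 \le i \le 20 \\ \frac{3}{4}(i+12) & \text{if } 20 \le i \le 36 \\ \frac{3}{5}(i+24) & \text{if } 36 \le i \le 51 \\ \frac{1}{2}(i+39) & \text{if } 51 \le i \le 63 \\ \frac{3}{8}(i+73) & \text{if } 63 \le i \le n \end{cases} \tag{B.1}$$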

The inverse temperature sets were established by trial and error, and are not unique.

The most important property of this set is that subsequent temperature differences vary

smoothly with temperature. The larger ∆β∗ at high temperature allows one to reach

lower temperatures T ∗n without having to add too many replicas.

For model B, the temperature set is simpler, since a wide range of temperatures is
not required to study the landscape. Since the ∆β∗ used is very small, ∆β∗ can be taken
to be uniform for this model, with

$$\beta^*_i = \frac{3}{2}\left(\frac{i}{10} + 1\right). \tag{B.2}$$
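The two inverse-temperature sets can be generated with a few lines of code. This sketch assumes the piecewise coefficients of Eq. B.1 are the fractions 3/2, 1, 3/4, 3/5, 1/2 and 3/8 (which make the pieces agree at the shared breakpoints and reproduce the quoted spacings of 1.5 and 0.375), and that the replica index i starts at 0; the function names and the upper index are illustrative, since n is not specified here.

```python
def beta_model_a(i):
    """Inverse temperature beta*_i for model A, following the pieces of Eq. B.1."""
    if i <= 5:
        return 3 * (i + 1) / 2
    if i <= 20:
        return float(i + 4)
    if i <= 36:
        return 3 * (i + 12) / 4
    if i <= 51:
        return 3 * (i + 24) / 5
    if i <= 63:
        return (i + 39) / 2
    return 3 * (i + 73) / 8

def beta_model_b(i):
    """Uniformly spaced inverse temperature beta*_i for model B (Eq. B.2)."""
    return 1.5 * (i / 10 + 1)

if __name__ == "__main__":
    ladder = [beta_model_a(i) for i in range(70)]  # 70 replicas, illustrative only
    spacings = [b - a for a, b in zip(ladder, ladder[1:])]
    print(max(spacings), min(spacings))
```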

Bibliography

[1] E. Shakhnovich. Protein folding thermodynamics and dynamics: where physics,

chemistry, and biology meet. Chem. Rev., 106(5):1559–1588, 2006.

[2] Editorial. So much more to know. Science, 309(5731):78–102, 2005.

[3] Leonor Cruzeiro-Hansson and Paulo A.S. Silva. Protein folding: thermodynamic

versus kinetic control. Journal of Biological Physics, 27:S6–S8, 2001.

[4] Martin Karplus. The levinthal paradox: yesterday and today. Fold. Des., 2:69–75,

1997.

[5] Yaoqi Zhou and Martin Karplus. Folding thermodynamics of a model three-helix-

bundle protein. Proc. Natl. Acad. Sci. USA, 94(26):14429–14432, 1997.

[6] Oleg B. Ptitsyn. How the molten globule became. Trends in Biochem. Sci.,

20(9):376–379, 1995.

[7] Michel E. Goldberg. The second translation of the genetic message: protein folding

and assembly. Trends in Biochemical Sciences, 10(10):388–391, 1985.

[8] P. J. Thomas, B. H. Qu, and P. L. Pedersen. Defective protein folding as a basis

of human disease. Trends Biochem. Sci., 20(11):456–459, 1995.

[9] E. Haber and C. B. Anfinsen. Side-chain interactions governing the pairing of half-

cystine residues in ribonuclease. The Journal of Biological Chemistry, 237:1839–

1844, 1962.


[10] Alexander Schug, Thomas Herges, Abhinav Verma, and Wolfgang Wenzel. Inves-

tigation of the parallel tempering method for protein folding. J. Phys.: Condens.

Matter, 17:S1641–S1650, 2005.

[11] Christian Anfinsen. Principles that govern the folding of protein chains. Science,

181(4096):223–230, 1973.

[12] Ken A. Dill. Folding proteins: finding a needle in a haystack. Curr. Opinion Struct.

Biol., 3(1):99–103, 1993.

[13] Venkataramanan Soundararajan, Rahul Raman, S. Raguram, V. Sasisekharan, and

Ram Sasisekharan. Atomic interaction networks in the core of protein domains and

their native folds. PLoS ONE, 5(2):e9391, 2010.

[14] J. D. Bryngelson, J. N. Onuchic, N. D. Socci, and P. G. Wolynes. Funnels, pathways,

and the energy landscape of protein folding: A synthesis. Proteins:Struct. Funct.

Genet., 21(3):167195, 1995.

[15] C. Levinthal. Are there pathways for protein folding. J. Chim. Phys., 65:4445, 1968.

[16] Sridhar Govindarajan and Richard A. Goldstein. On the thermodynamic hypothesis

of protein folding. Proc. Natl. Acad. Sci. USA, 95(10):5545–5549, 1998.

[17] Ken A. Dill, S. Banu Ozkan, Thomas R. Weikl, John D. Chodera, and Vincent A. Voelz.

The protein folding problem: when will it be solved? Current Opinion in Structural

Biology, 17(3):342–346, 2007.

[18] A. Sali, E. Shakhnovich, and M. Karplus. How does a protein fold? Nature,

369(6477):248–251, 1994.

[19] P. S. Kim and R. L. Baldwin. Intermediates in the folding reactions of small proteins.

Annual Review of Biochemistry, 59:631–660, 1990.

[20] H. Roder and W. Colón. Kinetic role of early intermediates in protein folding.

Current Opinion in Structural Biology, 7(1):15–28, 1997.

[21] H. Frauenfelder, F. Parak, and R. D. Young. Conformational substates in proteins.

Annu. Rev. Biophys. Biophys. Chem., 17:451–479, 1988.

[22] T.C.B. McLeish. Protein folding in high-dimensional spaces: Hypergutters and the

role of nonnative interactions. Biophys. J., 88(1):172–183, 2005.

[23] P. E. Leopold, M. Montal, and J. N. Onuchic. Protein folding funnels: A ki-

netic approach to the sequence-structure relationship. Proc. Natl. Acad. Sci. USA,

89:8721–8725, 1992.

[24] P. G. Wolynes, J. N. Onuchic, and D. Thirumalai. Navigating the folding routes.

Science, 267:1619–1620, 1995.

[25] J. N. Onuchic, P. G. Wolynes, Z. Luthey-Schulten, and N. D. Socci. Toward an

outline of the topography of a realistic protein-folding funnel. Proc. Natl. Acad. Sci.

USA, 92(8):3626–3630, 1995.

[26] Jose Nelson Onuchic, Nicholas D. Socci, Zaida Luthey-Schulten, and Peter G.

Wolynes. Protein folding funnels: the nature of the transition state ensemble. Fold

Des., 1(6):441–450, 1996.

[27] Ken A. Dill and Hue Sun Chan. From Levinthal to pathways to funnels. Nature

Structural Biology, 4:10–19, 1997.

[28] Nicholas D. Socci, Jose Nelson Onuchic, and Peter G. Wolynes. Protein folding

mechanisms and the multidimensional folding funnel. Proteins, 32(2):136–158, 1998.

[29] Brian C. Gin, Juan P. Garrahan, and Phillip L. Geissler. The limited role of nonna-

tive contacts in the folding pathways of a lattice protein. J. Mol. Biol., 392(5):1303–

1314, 2009.

[30] Michael Springborg. Chemical Modelling. Royal Society of Chemistry, 2010.

[31] Themis Lazaridis and Martin Karplus. “New view” of protein folding reconciled with

the old through multiple unfolding simulations. Science, 278:1928–1931, 1997.

[32] S. B. Prusiner. Novel proteinaceous infectious particles cause scrapie. Science,

216(4542):136–144, 1982.

[33] S. B. Prusiner. Molecular biology of prion diseases. Science, 252(5012):1515–1522,

1991.

[34] Reinat Nevo, Vlad Brumfeld, Ruti Kapon, Peter Hinterdorfer, and Ziv Reich. Direct

measurement of protein energy landscape roughness. EMBO Rep., 6(5):482–486,

2005.

[35] Nikolay V. Dokholyan, Sergey V. Buldyrev, H. Eugene Stanley, and Eugene I.

Shakhnovich. Discrete molecular dynamics studies of the folding of a protein-like

model. Fold Des., 3(6):577–587, 1998.

[36] D. C. Rapaport. The Art of Molecular Dynamics Simulation. Cambridge University

Press, Cambridge, 2nd edition, 2004.

[37] Lisandro Hernández de la Peña, Ramses van Zon, Jeremy Schofield, and Sheldon B.

Opps. Discontinuous molecular dynamics for semi-flexible and rigid bodies. J. Chem.

Phys., 126(7):074105, 2007.

[38] Y. Zhou, M. Karplus, J.M. Wichert, and C.K. Hall. Equilibrium thermodynamics

of homopolymers and clusters: molecular dynamics and Monte Carlo simulations of

systems with square-well interactions. J. Chem. Phys., 107(24):10691–10708, 1997.

[39] Derek N. Woolfson, Alan Cooper, Margaret M. Harding, Dudley H. Williams, and

Philip A. Evans. Protein folding in the absence of the solvent ordering contribution

to the hydrophobic interaction. J. Mol. Biol., 229(2):502–511, 1993.

[40] Manoj V. Athawale, Gaurav Goel, Tuhin Ghosh, Thomas M. Truskett, and Shekhar

Garde. Effects of lengthscales and attractions on the collapse of hydrophobic poly-

mers in water. Proc. Natl. Acad. Sci. USA, 104(3):733–738, 2007.

[41] Sowmianarayanan Rajamani, Thomas M. Truskett, and Shekhar Garde. Hydropho-

bic hydration from small to large lengthscales: Understanding and manipulating the

crossover. Proc. Natl. Acad. Sci. USA, 102(27):9475–9480, 2005.

[42] Robert H. Swendsen and Jian-Sheng Wang. Replica Monte Carlo simulation of spin

glasses. Phys. Rev. Lett., 57(21):2607–2609, 1986.

[43] C. J. Geyer. Markov chain Monte Carlo maximum likelihood. In Proceedings of the

23rd Symposium on the Interface: Computing Science and Statistics, pages 156–163,

1991.

[44] M. C. Tesi, E. J. Janse van Rensburg, E. Orlandini, and S. G. Whittington. Monte

Carlo study of the interacting self-avoiding walk model in three dimensions. J. Statist.

Phys., 82(1-2):155–181, 1996.

[45] David J. Earl and Michael W. Deem. Parallel tempering: Theory, applications, and

new perspectives. Phys. Chem. Chem. Phys., 7:3910–3916, 2005.

[46] Kurt Binder and Dieter W. Heermann. Monte Carlo Simulation in Statistical

Physics: An Introduction. Springer, 2010.

[47] Jeremy Schofield and Ramses van Zon. Class notes chm1464h: Foundations of mole-

cular simulation. http://www.chem.toronto.edu/staff/JMS/simulation/notes.html,

2008.

[48] Daan Frenkel and Berend Smit. Understanding Molecular Simulation, Second Edi-

tion: From Algorithms to Applications. Academic Press, 2002.

[49] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller.

Equation of state calculations by fast computing machines. Journal of Chemical

Physics, 21(6):1087–1092, 1953.

[50] S. Toxvaerd. Energy conservation in molecular dynamics. Journal of Computational

Physics, 52:214–216, 1983.

[51] Loup Verlet. Computer “experiments” on classical fluids. I. Thermodynamical

properties of Lennard-Jones molecules. Phys. Rev., 159(1):98–103, 1967.

[52] Simon Duane, A. D. Kennedy, Brian J. Pendleton, and Duncan Roweth. Hybrid

Monte Carlo. Phys. Lett. B, 195(2):216–222, 1987.

[53] S. B. Opps and J. Schofield. Extended state-space Monte Carlo methods. Phys. Rev.

E, 63(5 Pt 2):056701, 2001.

[54] William Gropp, Ewing Lusk, and Anthony Skjellum. Using MPI: Portable Parallel

Programming with the Message-Passing Interface. MIT Press, Cambridge, MA,

1994.

[55] A. Bellemans, J. Orban, and D. van Belle. Molecular dynamics of rigid and non-rigid

necklaces of hard discs. Mol. Phys., 39(3):781–782, 1980.

[56] Michel Daune. Molecular Biophysics: Structures in Motion. Oxford University Press,

Oxford; New York, 1999.

[57] H. Taketomi, Y. Ueda, and N. Go. Studies on protein folding, unfolding and

fluctuations by computer simulation. I. The effect of specific amino acid sequence represented

by specific inter-unit interactions. Int. J. Pept. Protein Res., 7(6):445–459, 1975.

[58] N. Go and H. Abe. Noninteracting local-structure model of folding and unfolding

transition in globular proteins. I. Formulation. Biopolymers, 20(5):991–1011, 1981.

[59] Da-Wei Li and Rafael Brüschweiler. In silico relationship between configurational

entropy and soft degrees of freedom in proteins and peptides. Phys. Rev. Lett.,

102(11):118108, 2009.

[60] Jayant K. Singh, David A. Kofke, and Jeffrey R. Errington. Surface tension and

vapor–liquid phase coexistence of the square-well fluid. J. Chem. Phys., 119(6):3405–

3412, 2003.

[61] P. Orea, Y. Duda, V. C. Weiss, W. Schröer, and J. Alejandre. Liquid–vapor interface

of square-well fluids of variable interaction range. J. Chem. Phys., 120(24):11754–

11764, 2004.

[62] E. Schöll-Paschinger, A. L. Benavides, and R. Castañeda-Priego. Vapor-liquid

equilibrium and critical behavior of the square-well fluid of variable range: A theoretical

study. J. Chem. Phys., 123(23):234513, 2005.

[63] I. Guillén-Escamilla, M. Chávez-Páez, and R. Castañeda-Priego. Structure and

thermodynamics of discrete potential fluids in the OZ-HMSA formalism. J. Phys.:

Condens. Matter, 19(8):086224, 2007.

[64] G. Orkoulas and A. Z. Panagiotopoulos. Phase behavior of the restricted primitive

model and square-well fluids from Monte Carlo simulations in the grand canonical

ensemble. J. Chem. Phys., 110:1581–1590, 1999.

[65] J. Richard Elliott and Liegi Hu. Vapor-liquid equilibria of square-well spheres. J.

Chem. Phys., 110(6):3043–3048, 1999.

[66] F. del Río, E. Avalos, R. Espíndola, L. F. Rull, G. Jackson, and S. Lago.

Vapour–liquid equilibrium of the square-well fluid of variable range via a hybrid simulation

approach. Mol. Phys., 100(15):2531–2546, 2002.

[67] L. Vega, E. de Miguel, L. F. Rull, G. Jackson, and I. A. McLure. Phase equilibria

and critical behavior of square-well fluids of variable width by Gibbs ensemble Monte

Carlo simulation. J. Chem. Phys., 96:2296–2305, 1992.

[68] A. Lang, G. Kahl, C. N. Likos, H. Löwen, and M. Watzlawek. Structure and

thermodynamics of square-well and square-shoulder fluids. J. Phys.: Condens. Matter,

11(50):10143–10161, 1999.

[69] Sheldon B. Opps and Jeremy Schofield. Extended state-space Monte Carlo methods.

Physical Review E, 63(5):056701, 2001.

[70] R. van Zon and J. Schofield. Constructing smooth potentials of mean force, radial

distribution functions and probability densities from sampled data. J. Chem. Phys.,

132:154110, 2010.

[71] J. Schofield and I. Oppenheim. The hydrodynamics of inelastic granular systems.

Physica A, 196:209–240, 1993.

[72] J. Schofield, A. H. Marcus, and S. A. Rice. The dynamics of quasi two dimensional

colloidal suspensions. J. Phys. Chem., 100:18950–18961, 1996.

[73] A. Szabo, K. Schulten, and Z. Schulten. First passage time approach to diffusion

controlled reactions. J. Chem. Phys., 72:4350–4357, 1980.

[74] K. Schulten, Z. Schulten, and A. Szabo. Dynamics of reactions involving diffusive

barrier crossing. J. Chem. Phys., 74:4426, 1981.

[75] D. J. Bicout and A. Szabo. Entropic barriers, transition states, funnels, and expo-

nential protein folding kinetics: a simple model. Protein Sci., 9(3):452–465, 2000.

[76] Peter Hamm, Jan Helbing, and Jens Bredenbeck. Stretched versus compressed

exponential kinetics in α-helix folding. Nonequilibrium Dynamics in Biomolecules,

323(1):54–65, 2006.

[77] G. M. Torrie and J. P. Valleau. Nonphysical sampling distributions in Monte Carlo

free-energy estimation: Umbrella sampling. Journal of Computational Physics,

23(2):187–199, 1977.

[78] Enrique de Miguel. Critical behavior of the square-well fluid with λ=2: A finite-

size-scaling study. Phys. Rev. E, 55(2):1347–1354, 1997.

[79] Ofer Rahat, Uri Alon, Yaakov Levy, and Gideon Schreiber. Understanding

hydrogen-bond patterns in proteins using network motifs. Bioinformatics,

25(22):2921–2928, 2009.

[80] T. Prasad, T. Subramanian, S. Hariharaputran, H.S. Chaitra, and N. Chandra.

Extracting hydrogen-bond signature patterns from protein structure data. Appl.

Bioinformatics, 3(2–3):125–135, 2004.

[81] M. C. Etter, J. C. MacDonald, and J. Bernstein. Graph-set analysis of

hydrogen-bond patterns in organic crystals. Acta Crystallographica Section B, 46(2):256–

262, 1990.
