
University of Geneva, Department of Theoretical Physics

The Biological Physics of Protein Folding:

the Random Energy Model and Beyond

Péter HANTZ

Outline of the lecture:

1. Modeling disordered systems
 • Spin glasses, frustration, Random Energy Model

2. Proteins: Building Elements and Structure
 • Primary, secondary and tertiary structure, classification

3. The Problem of Protein Folding
 • Anfinsen experiment, Levinthal paradox
 • A microscopic model: random heteropolymer (RHP)
 • Phenomenological models: Gō, REM
 • Sequence design, minimal frustration
 • Kinetics: funnel hypothesis, nucleation, reaction coordinate

What is a spin glass?
• an interacting system of spins
• at low temperature: frozen in random orientations

What is necessary for this?
• (at least partially) random interactions
• competing interactions

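Simple Model Hamiltonians:

Sherrington-Kirkpatrick model
$$H_{SK}(\{S\}) = -\sum_{\text{all pairs } (i,j)} J_{ij}\,S_iS_j$$

P-spin model
$$H_p(\{S\}) = -\sum_{i_1<i_2<\dots<i_p} J_{i_1i_2\dots i_p}\,S_{i_1}S_{i_2}\cdots S_{i_p}$$

Distribution of coupling constants:
$$P(J_{i_1\dots i_p}) = \sqrt{\frac{N^{p-1}}{\pi J^2\,p!}}\;\exp\!\left[-\frac{N^{p-1}}{J^2\,p!}\left(J_{i_1\dots i_p}-\frac{J_0\,p!}{2N^{p-1}}\right)^2\right]$$

(for p = 2 this gives the SK couplings, with mean $J_0/N$ and variance $J^2/N$)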

Spin Glasses

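Frustration…

• no configuration is uniquely favoured by all of the interactions

• “fully frustrated” systems: hypercube/hypertetrahedron where $J_{ij}=\pm1$ and, for all plaquettes,
$$\prod_{(ij)\in\text{plaquette}} J_{ij} = -1$$

[Figure: frustrated triangle with $J_{12}=1$, $J_{13}=1$, $J_{23}=-1$; the third spin (“?”) cannot satisfy both of its bonds]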
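The triangle above can be checked by brute force. This is a minimal sketch, assuming $H = -\sum J_{ij}S_iS_j$ as in the SK Hamiltonian and indexing the spins 0, 1, 2 instead of 1, 2, 3: enumerating all $2^3$ configurations shows that at least one bond is always unsatisfied and that the ground state is degenerate.

```python
from itertools import product

# Couplings of the frustrated triangle (spins 1, 2, 3 -> indices 0, 1, 2)
J = {(0, 1): 1, (0, 2): 1, (1, 2): -1}

def energy(spins):
    """Ising energy H = -sum J_ij S_i S_j of one configuration."""
    return -sum(Jij * spins[i] * spins[j] for (i, j), Jij in J.items())

def unsatisfied_bonds(spins):
    """Number of bonds with J_ij S_i S_j < 0, i.e. bonds raising the energy."""
    return sum(1 for (i, j), Jij in J.items() if Jij * spins[i] * spins[j] < 0)

configs = list(product((-1, 1), repeat=3))
energies = {s: energy(s) for s in configs}
e_min = min(energies.values())
ground_states = [s for s, e in energies.items() if e == e_min]

print("ground-state energy:", e_min)
print("ground-state degeneracy:", len(ground_states))
print("fewest unsatisfied bonds:", min(unsatisfied_bonds(s) for s in configs))
```

Because the product of the three couplings is $-1$, the number of unsatisfied bonds is always odd, so the best any configuration can do is one unsatisfied bond; six configurations achieve it.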

And its consequences…

• rugged energy landscape: “barrier tree” of a p-spin model, p = 3, N = 7 (Fontanari, 2001)

 (F = E - TS: to calculate the entropy, one restricts to valleys)

• high degree of ground-state degeneracy (Plischke, 1994)

 [Figure: three very different configurations with the same ground-state energy]

 in several models the ground-state entropy is extensive:
$$\lim_{N\to\infty}\frac{S_0}{N} \neq 0$$

• great relevance of broken ergodicity (Palmer, 1983)

 - pure systems, e.g. mean-field theory of ferromagnets: time average ≠ Gibbs average,
 $\langle S_i\rangle_t = \pm m$, while $\langle S_i\rangle_G = 0$

 - spin glasses: in the limit of large N, the state space becomes partitioned into mutually inaccessible “valleys” (Fischer, 1993)

Averages in disordered systems

• quenched average (of the free energy)
 - “over the realizations of the disorder”
 - the randomness of the system, $J_{ij}$, is fixed (time-scale problem)

Note: averaging the logarithm is difficult.

• annealed average (of the free energy)
 - both the spins and the randomness $J_{ij}$ are thermodynamic variables

Essential difference (case of a protein sequence):
 • quenched: sum (average) of the free energies, i.e. of $\ln Z$, of the various sequences
 • annealed: sum (average) of the Boltzmann factors (exponentials) over the sequences

$$[A]_q = \int DJ\,P(J)\,A(J), \qquad Z(\{J\},T) = \sum_{\{S\}} e^{-H(\{S\},\{J\})/kT}$$
$$F_q(T) = -kT\,[\ln Z]_q = -kT\int\prod_{ij}dJ_{ij}\,P(J_{ij})\,\ln Z(\{J\},T)$$
$$F_{ad}(T) = -kT\ln [Z]_q = -kT\ln\int\prod_{ij}dJ_{ij}\,P(J_{ij})\sum_{\{S\}}e^{-H(\{S\},\{J\})/kT}$$
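For a handful of spins both averages can be computed exactly. The sketch below assumes k = 1, SK couplings with $J_0 = 0$ and variance $1/N$, N = 8 spins and 200 disorder samples; the helper names are hypothetical. By Jensen's inequality $\ln[Z]_q \ge [\ln Z]_q$, so the annealed free energy can never lie above the quenched one.

```python
import numpy as np

def all_configs(N):
    """All 2^N Ising configurations as rows of +1/-1."""
    codes = np.arange(2 ** N)[:, None]
    return np.where((codes >> np.arange(N)) & 1, 1, -1)

def sk_couplings(N, rng):
    """Symmetric Gaussian J_ij with mean 0 and variance 1/N (zero diagonal)."""
    upper = np.triu(rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N)), k=1)
    return upper + upper.T

def partition_function(J, S, kT):
    """Z({J}, T) = sum_S exp(-H/kT) with H = -sum_{i<j} J_ij S_i S_j."""
    H = -0.5 * np.einsum("ki,ij,kj->k", S, J, S)   # 0.5 undoes double counting
    return np.exp(-H / kT).sum()

N, kT, R = 8, 0.5, 200
rng = np.random.default_rng(0)
S = all_configs(N)
Z = np.array([partition_function(sk_couplings(N, rng), S, kT) for _ in range(R)])

F_quenched = -kT * np.mean(np.log(Z))   # -kT [ln Z]_q
F_annealed = -kT * np.log(np.mean(Z))   # -kT ln [Z]_q
print("quenched f =", F_quenched / N, " annealed f =", F_annealed / N)
```

The gap between the two grows as the temperature is lowered, because rare low-energy samples increasingly dominate $[Z]_q$.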

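Averages in disordered systems

• self-averaging quantities

 - extensive quantities: a macroscopic system can be divided into macroscopic subsystems r, each with its own realization $J_r$ of the disorder:
$$A_{sys}(J) = \sum_r A_{sub}(J_r) \;\approx\; \sum_r [A_{sub}]_q = [A_{sys}]_q$$
 so a single large sample gives the quenched average, e.g. for the free energy per spin $f = F/N$:
$$\lim_{N\to\infty}\frac{A_J}{N} = \lim_{N\to\infty}\frac{[A]_q}{N}$$

• Z is not self-averaging

 (e.g. one sample with low free energy could dominate the sum):
$$[Z]_q \approx \frac{1}{R}\sum_{r=1}^{R} Z_r = \frac{1}{R}\sum_{r=1}^{R} e^{-N f_r/kT}$$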
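A toy illustration with made-up numbers (hypothetical free energies per spin $f_r$ of R = 11 samples, N = 1000, kT = 1). Because $Z_r = e^{-Nf_r/kT}$, the sample average of Z is carried almost entirely by the lowest-$f$ sample, so $-kT\ln[Z]_q/N$ reproduces that rare sample rather than the typical one.

```python
import math

def log_mean_exp(xs):
    """Numerically stable ln((1/R) * sum_r exp(x_r))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))

N, kT = 1000, 1.0
f = [-1.00 + 0.01 * r for r in range(11)]   # hypothetical f_r = -1.00, -0.99, ..., -0.90
log_Z = [-N * fr / kT for fr in f]          # ln Z_r = -N f_r / kT

# share of the sum over samples carried by the single lowest-f sample
share = 1.0 / sum(math.exp(x - max(log_Z)) for x in log_Z)
f_from_mean_Z = -kT * log_mean_exp(log_Z) / N   # free energy computed from [Z]
f_typical = sorted(f)[len(f) // 2]              # median sample

print(f"share of lowest-f sample: {share:.6f}")
print(f"f from [Z]: {f_from_mean_Z:.4f}   typical f: {f_typical:.4f}")
```

The logarithm, by contrast, averages the $f_r$ linearly, which is why $F$ (and not Z) is the natural self-averaging quantity.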

The Random Energy Model (REM)

• the total energy E of a system is a sum of independent contributions,
$$E(\{J\}) = \sum_l J_l, \qquad \overline{E(\{J\})} = 0, \quad \overline{E(\{J\})^2} = N\sigma^2$$

• central limit theorem ⇒
$$P(E) = \frac{1}{\sqrt{2\pi N\sigma^2}}\,e^{-E^2/(2N\sigma^2)}$$

A particular set $\{E_1(\{J\}), E_2(\{J\}), \dots, E_\Omega(\{J\})\}$ represents the energy levels of one particular realization {J} of the modeled system.

• the energies $E_i(\{J\})$ of different microstates of a realization are statistically independent

• number of microstates (e.g. in the case of N Ising spins): $\Omega = 2^N$

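Properties of the REM

• average density of states (average over the realizations of the disorder)
$$n_{\{J\}}(E) = \sum_{i=1}^{2^N}\delta\big(E - E_i(\{J\})\big), \qquad \overline{n(E)} = \sum_{i=1}^{2^N}\int dE_i\,P(E_i)\,\delta(E-E_i) = 2^N P(E)$$

[Figure: spectra of two realizations (e.g. two {J}, i.e. two polymer chains), (1) and (2) (Pande et al., 1997)]

• below an average threshold energy $E_C$ the average number of levels drops below one:
$$\overline{n(E)} = 2^N P(E) \simeq \exp\!\left[N\ln 2 - \frac{E^2}{2N\sigma^2}\right] < 1 \quad\text{for}\quad E < E_C = -N\sigma\sqrt{2\ln 2}$$

• since the number of levels in an energy window fluctuates like a Poisson variable,
$$\overline{n(E)^2} - \overline{n(E)}^2 \simeq \overline{n(E)}, \qquad \frac{\sqrt{\overline{n(E)^2}-\overline{n(E)}^2}}{\overline{n(E)}} \simeq \frac{1}{\sqrt{\overline{n(E)}}}$$

the density n(E) is self-averaging only in the middle region of P(E), where $\overline{n(E)} \gg 1$.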
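One realization of the REM is just $2^N$ independent Gaussian numbers, so the statements above can be checked directly. A minimal sketch, assuming σ = 1 and N = 20 (the helper name is hypothetical): in the middle of the spectrum the level count matches $2^N P(E)$ to a fraction of a percent, while at the bottom the average number of levels below $E_C$ is less than one.

```python
import math
import numpy as np

N, sigma = 20, 1.0
rng = np.random.default_rng(1)
E = rng.normal(0.0, sigma * math.sqrt(N), size=2 ** N)   # 2^N independent levels

E_C = -N * sigma * math.sqrt(2 * math.log(2))            # threshold energy

def expected_count(a, b):
    """Average number of levels 2^N * integral_a^b P(E) dE for the Gaussian P(E)."""
    s = sigma * math.sqrt(2 * N)
    return 2 ** N * 0.5 * (math.erf(b / s) - math.erf(a / s))

# middle of the spectrum: large counts, small relative fluctuations
a, b = -0.5, 0.5
middle = np.count_nonzero((E > a) & (E < b))
print("levels in middle window:", middle, " expected:", round(expected_count(a, b)))

# bottom of the spectrum: the lowest level lies near E_C (slightly above it at finite N)
print("E_min/N =", E.min() / N, "  E_C/N =", E_C / N)
print("average number of levels below E_C:", expected_count(-math.inf, E_C))
```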

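Properties of the REM

• entropy (thermodynamic limit)
$$S(E) = k\ln\overline{n(E)} = k\ln\!\left[2^N e^{-E^2/(2N\sigma^2)}\right] = kN\ln 2 - \frac{kE^2}{2N\sigma^2}$$

The entropy cannot be negative. If $E < E_C$, S(E) = 0: the system is “frozen”.

• critical temperature
$$\frac{1}{T} = \frac{\partial S}{\partial E} = -\frac{kE}{N\sigma^2} \;\Rightarrow\; E(T) = -\frac{N\sigma^2}{kT}$$

and
$$S(T) = kN\left[\ln 2 - \frac{1}{2}\left(\frac{\sigma}{kT}\right)^2\right]$$

At the critical temperature the extensive part of the entropy vanishes ($s = S/N \to 0$, although sub-extensive corrections to S need not vanish), and we have
$$\ln 2 - \frac{1}{2}\left(\frac{\sigma}{kT_C}\right)^2 = 0 \;\Rightarrow\; kT_C = \frac{\sigma}{\sqrt{2\ln 2}}$$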
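The entropy and the freezing temperature can be written as short functions of the energy per spin $e = E/N$. This is a sketch in units where k = 1; the function names are hypothetical. It checks that $E(T_C)$ coincides with the threshold $E_C$ and that the entropy vanishes there.

```python
import math

LN2 = math.log(2.0)

def rem_entropy_per_spin(e, sigma=1.0):
    """s(e) = ln 2 - e^2 / (2 sigma^2) for e = E/N, clipped at 0 (frozen below E_C)."""
    return max(LN2 - e ** 2 / (2.0 * sigma ** 2), 0.0)

def rem_energy_per_spin(kT, sigma=1.0):
    """e(T) = -sigma^2 / kT, from 1/T = dS/dE (valid above T_C)."""
    return -sigma ** 2 / kT

def rem_critical_temperature(sigma=1.0):
    """k T_C = sigma / sqrt(2 ln 2)."""
    return sigma / math.sqrt(2.0 * LN2)

sigma = 1.0
kTc = rem_critical_temperature(sigma)
e_c = -sigma * math.sqrt(2.0 * LN2)      # E_C / N
print("k T_C  =", kTc)
print("e(T_C) =", rem_energy_per_spin(kTc, sigma), "  e_C =", e_c)
print("s(e_C) =", rem_entropy_per_spin(e_c, sigma))
```

For σ = 1 this gives $kT_C \approx 0.849$ and $E_C/N \approx -1.177$.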

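Properties of the REM

• free energy

If $T > T_C$,
$$F(T) = E(T) - TS(T) = -\frac{N\sigma^2}{kT} - NkT\left[\ln 2 - \frac{\sigma^2}{2(kT)^2}\right] = -\frac{N\sigma^2}{2kT} - NkT\ln 2$$

However, if $T < T_C$, S = 0, and
$$F(T) = E_C = -N\sigma\sqrt{2\ln 2}$$

• partition function
$$Z_{\{J\}}(T) = \sum_i e^{-E_i(\{J\})/kT} = \int dE\, n_{\{J\}}(E)\,e^{-E/kT}$$

If n(E) is self-averaging, Z does not depend on the disorder, and
$$Z(T) = \int dE\,\frac{2^N}{\sqrt{2\pi N\sigma^2}}\,e^{-E^2/(2N\sigma^2)}\,e^{-E/kT} = 2^N e^{N\sigma^2/(2(kT)^2)}$$
$$F(T) = -kT\ln Z = -NkT\ln 2 - \frac{N\sigma^2}{2kT}$$

which reproduces the result for $T > T_C$.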
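The two branches of the free energy can be compared with $-kT\ln Z/N$ computed for one sampled realization. A sketch, assuming σ = 1, k = 1 and N = 20 (hypothetical function names): well above $T_C$ the sample agrees with the formula, while below $T_C$ finite-size effects are visible because $E_G/N$ has not yet reached $E_C/N$.

```python
import math
import numpy as np

LN2 = math.log(2.0)

def rem_free_energy_per_spin(kT, sigma=1.0):
    """f = F/N of the REM for N -> infinity, with freezing below T_C."""
    kTc = sigma / math.sqrt(2.0 * LN2)
    if kT >= kTc:
        return -sigma ** 2 / (2.0 * kT) - kT * LN2   # T > T_C
    return -sigma * math.sqrt(2.0 * LN2)             # T < T_C: F = E_C

def sampled_free_energy_per_spin(E, kT, N):
    """-kT ln Z / N for one realization, using a stable log-sum-exp."""
    x = -E / kT
    m = x.max()
    return -kT * (m + math.log(np.exp(x - m).sum())) / N

N, sigma = 20, 1.0
rng = np.random.default_rng(2)
E = rng.normal(0.0, sigma * math.sqrt(N), size=2 ** N)

for kT in (2.0, 1.0, 0.5):
    print(kT, rem_free_energy_per_spin(kT, sigma), sampled_free_energy_per_spin(E, kT, N))
```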

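Digression: Order parameters

• distinguishing between the high-temperature paramagnetic and the low-temperature frozen states (Edwards and Anderson, 1975):
$$q_{EA} = \sum_{a\in\text{all valleys}} P_a\,\frac{1}{N}\sum_{i=1}^{N}\langle S_i\rangle_a^2, \qquad P_a = \frac{Z_a}{Z} = \frac{\sum_{\{S\}\in a}e^{-H(\{S\})/kT}}{\sum_{\{S\}}e^{-H(\{S\})/kT}}$$

• some other important quantities

Stat. mech. order parameter:
$$q = \frac{1}{N}\sum_{i=1}^{N}\langle S_i\rangle_G^2$$

Degree of broken ergodicity:
$$q_{EA} - q$$

• “similarity” between states (e.g. phases) A and B of the system:
$$q_{AB} = \frac{1}{N}\sum_{i=1}^{N}\langle S_i\rangle_A\,\langle S_i\rangle_B$$

+1: full similarity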
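These quantities can be evaluated for a toy two-valley system. The example is hypothetical (not from the slides): a mean-field ferromagnet below its transition, with valleys $\langle S_i\rangle = +m$ and $-m$ of equal weight. It reproduces the ferromagnet case of broken ergodicity: $q_{EA} = m^2$, while the Gibbs average vanishes, so q = 0 and $q_{EA} - q = m^2$.

```python
import numpy as np

def q_ea(weights, valley_means):
    """Edwards-Anderson parameter: sum_a P_a (1/N) sum_i <S_i>_a^2."""
    m = np.asarray(valley_means, dtype=float)            # shape (valleys, N)
    return float(np.dot(weights, (m ** 2).mean(axis=1)))

def q_gibbs(weights, valley_means):
    """(1/N) sum_i <S_i>_G^2 with the Gibbs average <S_i>_G = sum_a P_a <S_i>_a."""
    m_gibbs = np.dot(weights, np.asarray(valley_means, dtype=float))
    return float((m_gibbs ** 2).mean())

def overlap(m_a, m_b):
    """Similarity q_AB = (1/N) sum_i <S_i>_A <S_i>_B between two states."""
    return float(np.mean(np.asarray(m_a, dtype=float) * np.asarray(m_b, dtype=float)))

N, m = 100, 0.8
valleys = [np.full(N, m), np.full(N, -m)]   # hypothetical valleys <S_i> = +m, -m
P = [0.5, 0.5]                              # equal valley weights P_a

print("q_EA =", q_ea(P, valleys))
print("q    =", q_gibbs(P, valleys))
print("q_EA - q =", q_ea(P, valleys) - q_gibbs(P, valleys))
print("q_AB between the two valleys =", overlap(*valleys))
```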

Digression: Phase diagram of the SK model (T, J0, H)

• Replica trick to perform the quenched averaging of F:
$$F_q(T) = -kT\,[\ln Z(\{J\},T)]_q, \qquad [A]_q = \int DJ\,P(J)\,A(J)$$
$$\ln x = \lim_{n\to 0}\frac{x^n-1}{n} = \lim_{n\to 0}\frac{d}{dn}e^{n\ln x} \;\Rightarrow\; [\ln Z]_q = \lim_{n\to 0}\frac{[Z^n]_q - 1}{n}$$
$$[Z^n]_q = \int\prod_{ij}dJ_{ij}\,P(J_{ij})\;\mathrm{Tr}_{\{S^1\},\dots,\{S^n\}}\exp\!\left(-\frac{1}{kT}\sum_{r=1}^{n}H(\{S^r\},\{J\})\right)$$

By simplifying this expression, introducing as new variables the overlaps between replicas, $q_{rs} = \frac{1}{N}\sum_i S_i^rS_i^s$, and performing a saddle-point analysis, we arrive at:

[Figure: phase diagram of the SK model]

Spin glass phase: $q \neq 0$, $\langle S_i\rangle = 0$ (Binder, 1986)

(Sherrington and Kirkpatrick, 1978; H-T plane: de Almeida and Thouless, 1978)
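The identity behind the replica trick can be checked numerically on a set of positive numbers standing in for the partition functions of different disorder samples (hypothetical log-normal values, not from the slides). In the actual calculation $[Z^n]_q$ is evaluated for integer n and then continued to $n \to 0$, which is the delicate step; this sketch only illustrates the limit itself.

```python
import numpy as np

def replica_estimate(Z_samples, n):
    """([Z^n]_q - 1) / n, which tends to [ln Z]_q as n -> 0."""
    Z = np.asarray(Z_samples, dtype=float)
    return (np.mean(Z ** n) - 1.0) / n

rng = np.random.default_rng(3)
Z_samples = np.exp(rng.normal(5.0, 1.0, size=1000))   # hypothetical Z of 1000 samples

exact = np.mean(np.log(Z_samples))                     # [ln Z]_q computed directly
for n in (1.0, 0.1, 0.01, 1e-4):
    print(f"n = {n:g}: replica estimate = {replica_estimate(Z_samples, n):.5f}, [ln Z] = {exact:.5f}")
```

The error of the estimate is of order $n\,[(\ln Z)^2]_q/2$, so it shrinks linearly with n.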

Protein Synthesis

Transcription: DNA (A, G, C, T) → pre-mRNA → splicing → mRNA (A, G, C, U)

Translation: ribosomes, tRNA; genetic code (degenerate!); initiation: usually Met (AUG); stop: UAA, UAG, UGA

Folding: with or without chaperones; covalent modifications: disulfide bonds, proteolytic modifications, glycosylation…

A chaperone

Protein Structure

Primary structure: the amino acid sequence

[Figure: Ramachandran plot (labels: L, N, C)]

Protein Structure

Secondary structure: common regular local structures

α-helix β-sheet

Right-handed (RH) helices are more common than left-handed (LH) ones

Protein Structure and Classification

Tertiary structure: the overall three-dimensional structure of a protein molecule
 • motifs = common “blocks”
 • domains = independently folding regions

Classification: globular proteins, fibrous proteins

[Figures: lysozyme, heat shock protein, collagen]

Natively unfolded proteins
 - substantial regions of disordered structure
 - usually have a target ligand
 - disorder-order transition when binding

Protein Structure

Quaternary structure: arrangement of several polymer molecules (polypeptide chains) in one structure

Protein Folding

Interactions stabilizing the proteins:
• hydrophobic effect - entropic origin
• hydrogen bonds - polar molecules
• van der Waals interactions - induced dipoles
• Coulomb interactions
• in some proteins, disulfide bonds

$kT \approx 4\times10^{-21}$ J ≈ 0.026 eV (room temperature, T ≈ 300 K)
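The thermal energy scale is a one-line conversion. A sketch assuming T = 300 K and the exact SI values of the Boltzmann constant and the elementary charge; it gives about $4.1\times10^{-21}$ J, i.e. about 0.026 eV.

```python
K_B = 1.380649e-23          # Boltzmann constant in J/K (exact SI value)
E_CHARGE = 1.602176634e-19  # elementary charge in C (exact), so 1 eV = E_CHARGE J

def thermal_energy(T_kelvin):
    """Return kT in joules and in electron-volts."""
    kT_joule = K_B * T_kelvin
    return kT_joule, kT_joule / E_CHARGE

kT_J, kT_eV = thermal_energy(300.0)   # assumption: room temperature
print(f"kT = {kT_J:.2e} J = {kT_eV:.4f} eV")
```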

Anfinsen’s experiment: denaturation of the enzyme ribonuclease; after restoring the original conditions, the enzyme STARTED TO WORK AGAIN

• gentle heating / chemical treatment (urea, mercaptoethanol) → denaturation

• restoring the original conditions → spontaneous refolding (time scale: seconds)

=> Building of the 3D structure is SPONTANEOUS (in many cases)

Levinthal’s paradox

• Anfinsen: there is a native state (F = minimum)
• small protein, N = 100 amino acids
• assume 3 rotamers/monomer

Total number of structures:
$$3\cdot3\cdots3 = 3^{100} \approx 10^{47}$$

• one microstate visited in $10^{-13}$ s

Time necessary for finding the native state:
$$10^{47}\times 10^{-13}\,\text{s} = 10^{34}\,\text{s} \approx 10^{27}\,\text{years}$$

Thermodynamic + kinetic problem
Solution: biasing towards the native state is necessary
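The estimate can be reproduced with exact integer arithmetic for $3^{100}$. The inputs are the slide's assumptions; the length of a year (365.25 days) is an added convention.

```python
import math

N_RESIDUES = 100                      # small protein (slide value)
ROTAMERS = 3                          # conformations per monomer (slide assumption)
T_VISIT = 1e-13                       # seconds per visited microstate (slide assumption)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

n_structures = ROTAMERS ** N_RESIDUES     # exact integer 3^100
t_seconds = n_structures * T_VISIT
t_years = t_seconds / SECONDS_PER_YEAR

print(f"number of structures: 10^{math.log10(n_structures):.1f}")
print(f"exhaustive search: {t_seconds:.1e} s = {t_years:.1e} years")
```

The more precise values ($3^{100}\approx 5\times10^{47}$, about $2\times10^{27}$ years) confirm the orders of magnitude.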

Microscopic Models

A typically used Hamiltonian
$$H = \sum_{I<J}^{N}\varepsilon(a_I,a_J)\,\Delta(\mathbf r_I-\mathbf r_J)$$

• $a_I$: monomer species, 1…20 (I: index along the chain)
• N: number of monomers
• $\mathbf r_I$: position of monomer I
• Δ: interaction range function (usual lattice models: 1 for nearest neighbours, 0 otherwise)
• $\varepsilon(a_I,a_J)$: interaction between amino acids I and J (N×N)
• $\varepsilon_{ij}$: amino acid interaction matrix (20×20), with
$$\varepsilon(a_I,a_J) = \sum_{i,j=1}^{20}\varepsilon_{ij}\,\delta(a_I,i)\,\delta(a_J,j)$$

Including hydrophobicity: the 21st species is the water, in the “empty” sites
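The Hamiltonian can be evaluated for a lattice conformation in a few lines. This is a sketch under stated assumptions: a square lattice, Δ = 1 for nearest neighbours, chain neighbours (J = I + 1) excluded because they are always adjacent, and a hypothetical two-letter alphabet (0 = hydrophobic, 1 = polar) in place of the 20×20 matrix.

```python
import numpy as np

def contact_energy(sequence, coords, eps):
    """H = sum_{I<J} eps[a_I, a_J] * Delta(r_I - r_J) on a square lattice.

    Delta = 1 for lattice nearest neighbours, 0 otherwise; pairs J = I + 1
    along the chain are skipped (assumption).
    """
    r = np.asarray(coords)
    E = 0.0
    for I in range(len(sequence)):
        for J in range(I + 2, len(sequence)):
            if np.abs(r[I] - r[J]).sum() == 1:        # nearest neighbours
                E += eps[sequence[I], sequence[J]]
    return E

# hypothetical 2x2 interaction matrix: only hydrophobic-hydrophobic pairs attract
eps = np.array([[-1.0, 0.0],
                [0.0, 0.0]])
square = [(0, 0), (1, 0), (1, 1), (0, 1)]       # 4 monomers folded into a square
print(contact_energy([0, 1, 1, 0], square, eps))  # one hydrophobic contact: -1.0
```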

Digression: the Gō model

• assumption: we know the folded, native conformation

• this conformation is energetically very well optimized • energy: function of the native contacts

$\varepsilon_{IJ} = -w$ if I and J are first neighbours in the native state

$\varepsilon_{IJ} = 0$ otherwise

η: the number of native contacts, so that
$$E(\eta) = -w\,\eta$$

“uses the answer to answer the question”?

This model does not help structure prediction, but it is helpful if we are studying how the protein reaches its native state.
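A Gō energy only needs the list of native contacts. The sketch below uses a hypothetical compact 6-mer on the square lattice as the native state and counts as contacts the non-consecutive nearest-neighbour pairs (an assumption about the contact definition); the energy is then $-w\eta$.

```python
def lattice_contacts(coords):
    """Pairs (I, J) with J > I + 1 that are nearest neighbours on the square lattice."""
    contacts = set()
    for I in range(len(coords)):
        for J in range(I + 2, len(coords)):
            dx = abs(coords[I][0] - coords[J][0])
            dy = abs(coords[I][1] - coords[J][1])
            if dx + dy == 1:
                contacts.add((I, J))
    return contacts

def go_energy(coords, native_contacts, w=1.0):
    """Gō energy E = -w * eta, eta = number of native contacts present in coords."""
    eta = len(lattice_contacts(coords) & native_contacts)
    return -w * eta

native = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]   # hypothetical native state
native_contacts = lattice_contacts(native)                 # the pairs (0, 5) and (1, 4)
straight = [(i, 0) for i in range(6)]                      # fully extended chain

print("E(native)   =", go_energy(native, native_contacts))
print("E(extended) =", go_energy(straight, native_contacts))
```

Non-native contacts cost nothing here, which is exactly what makes the landscape of the Gō model unfrustrated.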

Energy spectrum of random heteropolymers

The energy spectrum (400 lowest states) looks like that of the REM (Sali et al., 1994)

Indeed, with
$$H = \sum_{I<J}^{N}\varepsilon(a_I,a_J)\,\Delta(\mathbf r_I-\mathbf r_J), \qquad E = \sum_{I,J\ \text{first neighbours}}\varepsilon_{IJ}$$

O(N) approximately independent terms ⇒ central limit theorem ⇒ Gaussian distribution

Only some sequences would fold repeatedly to the same state.
KEY: a single low-lying ground-state energy

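Essential: the ground state

Threshold energy of the REM:
$$E_C = -N\sigma\sqrt{2\ln 2}$$

Extreme value statistics (Gumbel distribution); it can be shown that the quenched average of the ground-state energy is
$$\overline{E_G}^{\,q} \simeq -N\sigma\sqrt{2\ln 2}$$

width of the energy gap (spread of $E_G$), of order $kT_C = \sigma/\sqrt{2\ln 2}$:
$$\delta E_G = O(1)$$

[Figure: distribution W(E_g) of the ground-state energy E_g, close to E_c]

Problems with the REM (thermodynamics):

• no flexibility against changing conditions

• no mutation stability: if the matrix elements are changed by ±b, the energy levels change by $\sim b\sqrt{N}$, much more than the O(1) gap (not a large enough ΔE for a unique native state /freezing, escape/)

• there must be some correlation between the energy levels…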
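The extreme-value statements can be explored by sampling many REM realizations. A sketch assuming σ = 1 and small N (10 and 14) for speed; names are hypothetical. The mean ground-state energy grows with N and stays above $E_C$ at finite N, while the width of its distribution remains of order one; convergence to the Gumbel limit is slow, so the numbers only roughly approach the asymptotic values.

```python
import math
import numpy as np

def ground_state_energies(N, sigma, n_realizations, rng):
    """E_G = lowest of 2^N independent Gaussian levels (variance N sigma^2), per realization."""
    return np.array([
        rng.normal(0.0, sigma * math.sqrt(N), size=2 ** N).min()
        for _ in range(n_realizations)
    ])

sigma = 1.0
rng = np.random.default_rng(4)
results = {}
for N in (10, 14):
    EG = ground_state_energies(N, sigma, 300, rng)
    E_C = -N * sigma * math.sqrt(2 * math.log(2))
    results[N] = (EG.mean(), EG.std(), E_C)
    print(f"N={N}: mean E_G = {EG.mean():.2f}, E_C = {E_C:.2f}, width of E_G = {EG.std():.2f}")
```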

A Way Out: Sequence Design

“Pulling down” the energy of a target conformation
Canonical design

• given a 3D conformation C*
• search for the sequence of amino acids that minimizes E for the given C*:
$$H_{des}(C^*) = \sum_{I<J}^{N}\varepsilon(a_I,a_J)\,\Delta(\mathbf r_I^*-\mathbf r_J^*)$$

Algorithm: the sequence is annealed

Movement in the sequence space: Metropolis MC method
$$P(S\to S') = \begin{cases}1, & E(S')\le E(S)\\[4pt] \exp\!\left[-\dfrac{E(S')-E(S)}{kT_{des}}\right], & E(S') > E(S)\end{cases}$$

What about $T_{des}$? Too high: random walk; too low: can be useless
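A minimal sketch of design by Metropolis MC in sequence space, under assumptions that go beyond the slides: a hypothetical 6-mer target on the square lattice, a two-letter alphabet where only H-H contacts attract, moves that swap two residues (so the composition stays fixed; in this toy an unconstrained design would simply make every residue hydrophobic), and $kT_{des} = 0.3$.

```python
import math
import random

TARGET = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]   # hypothetical C*
EPS = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}

def contacts(coords):
    """Non-consecutive pairs (I, J) that are nearest neighbours on the lattice."""
    n = len(coords)
    return [(I, J) for I in range(n) for J in range(I + 2, n)
            if abs(coords[I][0] - coords[J][0]) + abs(coords[I][1] - coords[J][1]) == 1]

CONTACTS = contacts(TARGET)

def h_des(seq):
    """H_des(C*): energy of the sequence threaded onto the fixed target conformation."""
    return sum(EPS[seq[I], seq[J]] for I, J in CONTACTS)

def design(seq, kT_des, steps, rng):
    """Metropolis MC in sequence space; a move swaps two residues."""
    seq = list(seq)
    E = h_des(seq)
    best_seq, best_E = seq[:], E
    for _ in range(steps):
        i, j = rng.sample(range(len(seq)), 2)
        trial = seq[:]
        trial[i], trial[j] = trial[j], trial[i]
        dE = h_des(trial) - E
        # accept downhill moves always, uphill moves with probability exp(-dE / kT_des)
        if dE <= 0 or rng.random() < math.exp(-dE / kT_des):
            seq, E = trial, E + dE
            if E < best_E:
                best_seq, best_E = seq[:], E
    return "".join(best_seq), best_E

start = "PPHHHH"
best_seq, best_E = design(start, kT_des=0.3, steps=2000, rng=random.Random(5))
print(start, h_des(start), "->", best_seq, best_E)
```

Raising kT_des makes the walk in sequence space nearly random; lowering it too much makes uphill moves so rare that the search can stall, mirroring the trade-off stated above.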

Phase Diagram of Designed Proteins

(Pande et al., 2000) “Folded globule”:

•proteins with a stable target conformation

•they are “minimally frustrated”

Digression: Interpretation of a Chaperone Function

avoiding aggregation

e.g. contacts between hydrophobic residues (hydrophobic-hydrophobic)

(Clark, 2004)

Prion Proteins

•diseases transmitted by proteins

• $\mathrm{PrP^{Sc}}$ can induce the $\mathrm{PrP^{C}}\to\mathrm{PrP^{Sc}}$ transition

• $\mathrm{PrP^{C}}$ might be an “off-path” state

Kinetics: The Funnel Hypothesis

How do we solve Levinthal’s Paradox?

Native state with a significantly low energy: partially native structures will also have lower energies than others

Bumps: due to competitive interactions

=>FUNNEL

Kinetics: Free Energy Barriers and Nucleation

Barriers of $F = E - TS$: energetic and entropic

Nucleation:
• liquid-gas transition: homogeneous shrinking has both ΔE and ΔS disadvantages; solution: states with non-uniform density
• protein folding: folding seems to be (approximately) a first-order transition; the nucleus is a small native secondary structure, e.g. an α-helix; subsequent structure formation is sped up

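Digression

Super-Arrhenius behaviour

Most probable energy in the REM (energies distributed around $E_0$ with width σ):
$$\frac{d}{dE}\left[e^{-\frac{(E-E_0)^2}{2\sigma^2}}\,e^{-\frac{E}{kT}}\right] = 0 \;\Rightarrow\; E_{mp} = E_0 - \frac{\sigma^2}{kT}$$

Assumption: these most probable conformations are surrounded by conformations of energy $E_0$.

Transition-state theory:
$$t_{esc} \simeq t_0\,e^{(E_0-E_{mp})/kT} = t_0\,e^{\sigma^2/(kT)^2}$$

The argument is quadratic rather than linear in 1/T (“Ferry law”)

⇒ roughness (σ) slows down folding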
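The difference between the two laws shows up in $\ln t$ as a function of 1/kT. A sketch with σ = 1 and $t_0 = 1$, comparing the Ferry law with a hypothetical Arrhenius law whose fixed barrier $\sigma^2$ makes both equal at kT = σ: the Ferry escape time is shorter at high temperature and grows much faster on cooling.

```python
import math

def t_arrhenius(kT, barrier, t0=1.0):
    """Arrhenius law: ln(t/t0) is linear in 1/kT."""
    return t0 * math.exp(barrier / kT)

def t_ferry(kT, sigma, t0=1.0):
    """Escape over the barrier E_0 - E_mp = sigma^2/kT: ln(t/t0) = (sigma/kT)^2."""
    return t0 * math.exp((sigma / kT) ** 2)

sigma = 1.0
for kT in (2.0, 1.0, 0.5, 0.25):
    print(f"kT = {kT}: ln t_Arrhenius = {math.log(t_arrhenius(kT, sigma ** 2)):.2f}, "
          f"ln t_Ferry = {math.log(t_ferry(kT, sigma)):.2f}")
```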

Reaction Coordinate

Simple (bimolecular) chemical reactions: A + BC → AB + C

potential energy surface PES($r_{AB}$, $r_{BC}$)

reaction coordinate: the minimum energy path via a saddle point

Protein folding: the choice is difficult, there is no general solution

• similarity to the native state:
$$Q = \frac{\text{number of native contacts formed}}{\text{total number of native contacts}}$$

• an alternative choice: $P_{fold}$, or “commitment”
 $P_{fold}$: the probability of folding before even touching an unfolded state
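$P_{fold}$ can be estimated by brute force: launch many trajectories from a given conformation and count those that fold first. A minimal sketch on a hypothetical one-dimensional landscape rather than a lattice protein (site 0 = unfolded, site L = folded, Metropolis moves to neighbouring sites, U in units of kT). For a flat landscape the answer is the gambler's-ruin result $x_0/L$.

```python
import math
import random

def pfold_mc(U, x0, n_traj, rng):
    """Fraction of walks from x0 that reach L (folded) before 0 (unfolded).

    U: energies of sites 0..L in units of kT. Each step proposes x -> x +/- 1
    with probability 1/2 and accepts it with probability min(1, exp(U[x] - U[y])).
    """
    L = len(U) - 1
    folded = 0
    for _ in range(n_traj):
        x = x0
        while 0 < x < L:
            y = x + rng.choice((-1, 1))
            if U[y] <= U[x] or rng.random() < math.exp(U[x] - U[y]):
                x = y
        folded += (x == L)
    return folded / n_traj

flat = [0.0] * 11                              # hypothetical flat landscape, L = 10
est = pfold_mc(flat, 3, 4000, random.Random(6))
print("P_fold estimate from x0 = 3:", est)     # gambler's ruin predicts 3/10
```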

Digression: Alternative Reaction Coordinate
“Development” on the graph

• lattice model
• {C} conformation space ↔ graph
• single “elementary step” difference ↔ nodes $C_1$ and $C_2$ connected
• $n_C$: occupation number (e.g. number of independent simulations in C)
• $m_C$: degree of the node
• “potential” on the graph nodes:
$$\Phi_C = n_C\,e^{U(C)}, \qquad U(C) = E(C)/kT$$
• “development”: MMC dynamics
$$I_{C\to C'} = \frac{n_C}{m_C}\min\!\left\{1;\ \frac{m_C}{m_{C'}}\,e^{U(C)-U(C')}\right\}, \qquad I_{C'\to C} = \frac{n_{C'}}{m_{C'}}\min\!\left\{1;\ \frac{m_{C'}}{m_C}\,e^{U(C')-U(C)}\right\}$$
• define:
$$R_{CC'} = \max\{m_C\,e^{U(C)};\ m_{C'}\,e^{U(C')}\}$$
⇒
$$I = I_{C\to C'} - I_{C'\to C} = \frac{\Phi_C - \Phi_{C'}}{R_{CC'}}$$

Ohm’s law!
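The step to Ohm's law is an exact identity, which can be checked numerically on random edges. A sketch with hypothetical random occupation numbers, degrees and energies (U in units of kT), taking the node potential as $\Phi_C = n_C e^{U(C)}$: whichever of $m_Ce^{U(C)}$ and $m_{C'}e^{U(C')}$ is larger sets the resistance, and the net Metropolis flux equals the potential difference divided by it.

```python
import math
import random

def flux(n_c, m_c, U_c, m_d, U_d):
    """Metropolis flux I_{C->C'} = (n_C/m_C) * min(1, (m_C/m_C') * exp(U(C) - U(C')))."""
    return (n_c / m_c) * min(1.0, (m_c / m_d) * math.exp(U_c - U_d))

def ohm_current(n_c, m_c, U_c, n_d, m_d, U_d):
    """(Phi_C - Phi_C') / R_CC' with Phi = n e^U and R = max(m_C e^U(C), m_C' e^U(C'))."""
    R = max(m_c * math.exp(U_c), m_d * math.exp(U_d))
    return (n_c * math.exp(U_c) - n_d * math.exp(U_d)) / R

rng = random.Random(7)
worst = 0.0
for _ in range(10_000):
    n_c, n_d = rng.randint(0, 100), rng.randint(0, 100)   # occupation numbers
    m_c, m_d = rng.randint(1, 6), rng.randint(1, 6)       # node degrees
    U_c, U_d = rng.uniform(-3, 3), rng.uniform(-3, 3)     # energies in units of kT
    net = flux(n_c, m_c, U_c, m_d, U_d) - flux(n_d, m_d, U_d, m_c, U_c)
    worst = max(worst, abs(net - ohm_current(n_c, m_c, U_c, n_d, m_d, U_d)))
print("largest deviation from Ohm's law:", worst)
```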

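Digression: Alternative Reaction Coordinate
“First return” (casino) problem

“particle” (money) at $x_0$

[Figure: money axis, starting point $x_0$, ruin at 0]

I will end up with 0 money ↔ all the flux is going to 0 (electric circuit analogy)

$P_{fold}$: probability to arrive at the folded state FOR THE FIRST TIME, i.e. before touching the unfolded state (Grosberg, 2003)

$$R_{CU} = \int_{x_U}^{x_C} e^{U(x)}\,dx, \qquad R_{CF} = \int_{x_C}^{x_F} e^{U(x)}\,dx$$
$$p_{fold} = \frac{R_{CU}}{R_{CU}+R_{CF}} = \frac{\int_{x_U}^{x_C} e^{U(x)}\,dx}{\int_{x_U}^{x_F} e^{U(x)}\,dx}, \qquad p_{unfold} = \frac{R_{CF}}{R_{CU}+R_{CF}}$$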
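On a discrete one-dimensional chain the integrals become sums of edge resistances. A sketch under stated assumptions: sites 0 (U) to L (F), Metropolis moves to neighbouring sites with probability 1/2, U in units of kT, and edge resistance $e^{\max(U_y,U_{y+1})}$, the 1D form of $R_{CC'}$ with all interior degrees equal to 2 (the constant factor cancels in $p_{fold}$). As a cross-check, the same committor is obtained by solving $p(x) = \sum_y P(x\to y)\,p(y)$ with $p(0)=0$, $p(L)=1$.

```python
import numpy as np

def pfold_resistance(U, x0):
    """p_fold = R_CU / (R_CU + R_CF) from edge resistances exp(max(U[y], U[y+1]))."""
    U = np.asarray(U, dtype=float)
    R_edges = np.exp(np.maximum(U[:-1], U[1:]))
    R_CU = R_edges[:x0].sum()       # resistance between C = x0 and U = 0
    R_CF = R_edges[x0:].sum()       # resistance between C = x0 and F = L
    return R_CU / (R_CU + R_CF)

def pfold_linear(U, x0):
    """Committor from the linear equations of the same Metropolis dynamics."""
    U = np.asarray(U, dtype=float)
    L = len(U) - 1
    A = np.zeros((L + 1, L + 1))
    b = np.zeros(L + 1)
    A[0, 0] = A[L, L] = 1.0         # boundary conditions p(0) = 0, p(L) = 1
    b[L] = 1.0
    for x in range(1, L):
        for y in (x - 1, x + 1):
            p = 0.5 * min(1.0, np.exp(U[x] - U[y]))
            A[x, y] -= p
            A[x, x] += p
    return np.linalg.solve(A, b)[x0]

barrier = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0, -4.0]   # hypothetical
for x0 in (2, 4, 6):
    print(x0, pfold_resistance(barrier, x0), pfold_linear(barrier, x0))
```

Conformations on the unfolded side of the barrier top have a small $p_{fold}$ because almost all of the resistance lies between them and F.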

Conclusion

• protein folding: self-assembly

• low-energy ground state

• biased walk – correlations, funnel hypothesis

• “nucleation”

• sequence design

Acknowledgements

I’m indebted to

Michel DROZ, Alexander GROSBERG, Géza GYÖRGYI, Gabriella NETTING, Zoltán RÁCZ, Zoltán SZABÓ, László SZILÁGYI

and many others…

