University of Geneva, Department of Theoretical Physics
The Biological Physics of Protein Folding:
the Random Energy Model and Beyond
Péter HANTZ
Outline of the lecture:
1. Modeling disordered systems
• Spin glasses, frustration, Random Energy Model
2. Proteins: Building Elements and Structure
• Primary, Secondary and Tertiary Structure, Classification
3. The Problem of Protein Folding
• Anfinsen Experiment, Levinthal Paradox
• A Microscopic model: RHP
• Phenomenological models: Gō, REM
• Sequence Design, Minimal Frustration
• Kinetics: Funnel Hypothesis, Nucleation, Reaction Coordinate
What is a spin glass?
• interacting system of spins
• low-temperature: frozen in random orientations
What is necessary for this?
• (at least partially) random interactions
• competing interactions
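Simple Model Hamiltonians:
Sherrington-Kirkpatrick model: $H_{SK}(\{S\}) = -\sum_{\text{all pairs } (i,j)} J_{ij} S_i S_j$
P-spin model: $H_p(\{S\}) = -\sum_{i_1 < i_2 < \dots < i_p} J_{i_1 \dots i_p} S_{i_1} \cdots S_{i_p}$
Distribution of coupling constants: Gaussian, e.g. for the SK model (p = 2) $P(J_{ij}) = \sqrt{\frac{N}{2\pi J^2}} \exp\left[-\frac{N (J_{ij} - J_0/N)^2}{2 J^2}\right]$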
Spin Glasses
Frustration…
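• no configuration is uniquely favoured by all of the interactions
• “fully frustrated” systems: hypercube/hypertetrahedron where $J_{ij} = \pm 1$ and, for all plaquettes, $\prod_{\text{plaquette}} J_{ij} = -1$
[Figure: triangle of three spins with J12 = 1, J13 = 1, J23 = -1; the orientation of the third spin (“?”) cannot satisfy all three bonds]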
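The frustrated triangle can be checked by brute force. The sketch below is only an illustration: it assumes the sign convention $H = -\sum_{i<j} J_{ij} S_i S_j$ and Ising spins $S_i = \pm 1$, and the names `J`, `energy` and `ground_states` are made up for this example.

```python
from itertools import product

# Couplings of the triangle from the slide: J12 = 1, J13 = 1, J23 = -1
# (spins are indexed 0, 1, 2 here).
J = {(0, 1): 1, (0, 2): 1, (1, 2): -1}

def energy(spins):
    """H = -sum_{i<j} J_ij S_i S_j for one spin configuration (assumed convention)."""
    return -sum(Jij * spins[i] * spins[j] for (i, j), Jij in J.items())

configs = list(product((-1, 1), repeat=3))
energies = [energy(s) for s in configs]
e_min = min(energies)
ground_states = [s for s, e in zip(configs, energies) if e == e_min]

# Energy the triangle would reach if all three bonds could be satisfied at once.
e_unfrustrated = -sum(abs(Jij) for Jij in J.values())

print("lowest energy:", e_min, "unfrustrated bound:", e_unfrustrated)
print("number of ground states:", len(ground_states))
```

No configuration reaches the unfrustrated bound: at least one bond is always violated, and several configurations share the lowest energy.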
And its consequences…
• rugged energy landscape
[Figure: “barrier tree” of a p-spin model, p=3, N=7 (Fontanari, 2001)]
(F = E − TS => calculating the entropy: restrict to valleys)
[Equation: $\lim_{N \to \infty} S/N$ compared with 0; the relation sign is not recoverable]
And its consequences…
• high degree of ground-state degeneracy (Plischke, 1994)
three very different configurations have the same ground state energy
in several models:
And its consequences…
• Great relevance of broken ergodicity (Palmer, 1983)
-pure systems: mean-field theory of ferromagnets
time average ≠ Gibbs average: $\langle S_i \rangle_t = \pm m$, $\langle S_i \rangle_G = 0$
-spin glasses: in the limit of large N, the state space becomes partitioned into mutually inaccessible “valleys” (Fischer, 1993)
Averages in disordered systems
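• quenched average (of the free energy)
-“over the realizations of the disorder”
-the randomness of a system, Jij, is fixed (time-scale problem)
$\langle A \rangle_q \equiv \langle A \rangle_J = \int DJ\, P[J]\, A(\{J\})$
$F_q(T) = -kT \langle \ln Z \rangle_q = -kT \int \dots dJ_{ij}\, P[(J_{ij})]\, \ln Z(\{J_{ij}\}, T)$
Note: doing the average of the logarithm is difficult.
• annealed average (of the free energy)
-both the spins and the randomness Jij are thermodynamic variables
$F_{ad}(T) = -kT \ln Z_{ad}(T) = -kT \ln \int \dots dJ_{ij}\, P[(J_{ij})] \sum_{\{S\}}^{2^N} e^{-H(\{J_{ij}\},\{S\})/kT}$
Essential difference (case of a protein sequence):
• quenched: SUM (weighted by P) of the free energies, i.e. of ln Z, of the various sequences
• annealed: SUM (weighted by P) of the EXPs, i.e. of Z, of the sequences; the logarithm is taken afterwards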
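The difference between the two averages can be seen numerically. The sketch below is a toy illustration, not the lecture's calculation: it assumes independent Gaussian energy levels with variance $N\sigma^2$ for each disorder sample, reduced units, and arbitrarily chosen parameter values. The function names are hypothetical.

```python
import math
import random

def partition_function(energies, kT):
    """Z = sum over microstates of exp(-E/kT) for one disorder sample."""
    return sum(math.exp(-e / kT) for e in energies)

def quenched_and_annealed(n_spins=8, sigma=1.0, kT=0.5, n_samples=200, seed=0):
    """Return (F_q, F_ad) estimated from n_samples disorder realizations."""
    rng = random.Random(seed)
    n_states = 2 ** n_spins
    std = sigma * math.sqrt(n_spins)  # assumed: level variance N * sigma^2
    zs = []
    for _ in range(n_samples):
        energies = [rng.gauss(0.0, std) for _ in range(n_states)]
        zs.append(partition_function(energies, kT))
    f_quenched = -kT * sum(math.log(z) for z in zs) / n_samples  # -kT <ln Z>
    f_annealed = -kT * math.log(sum(zs) / n_samples)             # -kT ln <Z>
    return f_quenched, f_annealed

f_q, f_ad = quenched_and_annealed()
print("F_quenched =", f_q)
print("F_annealed =", f_ad)
```

Because the logarithm is concave, the annealed value can never lie above the quenched one; the gap is produced by rare samples with unusually large Z.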
Averages in disordered systems
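• self-averaging quantities: $A_J \to \langle A \rangle_q$ for $N \to \infty$
-extensive quantities: macroscopic system and subsystems
$A_{sys}(J) = \sum_{r=1}^{R} A_{subs}(J_r) \approx R \langle A_{subs} \rangle_q = \langle A_{sys} \rangle_q$, e.g. $f = F/N$
• Z is not self-averaging
$Z = \prod_{r=1}^{R} Z_r = \prod_{r=1}^{R} e^{-N_r f_r / kT} = e^{-\sum_{r=1}^{R} N_r f_r / kT}$
(e.g. one sample with low free energy could dominate the sum)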
The Random Energy Model (REM)
• the total energy E of a system = sum of independent contributions
• central limit theorem => Gaussian distribution of the energies (measured from their mean): $P(E) = \frac{1}{\sqrt{2\pi N \sigma^2}}\, e^{-E^2/(2 N \sigma^2)}$
A particular set $\{E_1(\{J\}), E_2(\{J\}), \dots, E_\Omega(\{J\})\}$ represents the energy levels of one particular realization {J} of the modeled system
• the energies $E_i(\{J\})$ of different microstates of a realization are statistically independent
• number of microstates (e.g. in the case of N Ising spins): $\Omega = 2^N$
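Properties of the REM
• average density of states (average over the realizations of the disorder)
$n_J(E) = \sum_{i=1}^{2^N} \delta(E - E_i(\{J\}))$
$\langle n(E) \rangle = \langle n_J(E) \rangle_J = \sum_{i=1}^{2^N} \int dE_i\, P(E_i)\, \delta(E - E_i) = 2^N P(E)$
[Figure: spectra of two realizations, panels (1) and (2) (e.g. {J}, polymer chains) (Pande et al., 1997)]
• below an average threshold energy $E_C$, less than one level is expected: with P(E) Gaussian of variance $N\sigma^2$,
$\langle n(E) \rangle = 2^N P(E) = \frac{1}{\sqrt{2\pi N \sigma^2}}\, e^{N \ln 2 - E^2/(2 N \sigma^2)}$, so in the thermodynamic limit $\langle n(E_C) \rangle = 1$ gives $E_C = -N\sigma\sqrt{2 \ln 2}$, and $\langle n(E) \rangle \ll 1$ for $E < E_C$
• since the relative fluctuation is $\sqrt{\langle n^2(E) \rangle - \langle n(E) \rangle^2}\,/\,\langle n(E) \rangle \approx 1/\sqrt{\langle n(E) \rangle}$,
the density n(E) is self-averaging only in the middle region of P(E).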
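One realization of the REM can be sampled directly to see where its lowest level falls relative to the threshold energy. This is a minimal sketch under stated assumptions: independent Gaussian levels with variance $N\sigma^2$ and zero mean, $E_C = -N\sigma\sqrt{2\ln 2}$, and a small system (N = 16) where finite-size corrections are still visible. The helper names are hypothetical.

```python
import math
import random

def rem_realization(n_spins, sigma, rng):
    """Draw the 2^N independent Gaussian energy levels of one realization."""
    std = sigma * math.sqrt(n_spins)  # assumed: variance N * sigma^2
    return [rng.gauss(0.0, std) for _ in range(2 ** n_spins)]

def threshold_energy(n_spins, sigma):
    """E_C = -N sigma sqrt(2 ln 2), where the average density of states drops to 1."""
    return -n_spins * sigma * math.sqrt(2.0 * math.log(2.0))

n_spins, sigma = 16, 1.0
rng = random.Random(1)
levels = rem_realization(n_spins, sigma, rng)

e_c = threshold_energy(n_spins, sigma)
e_ground = min(levels)
n_below = sum(1 for e in levels if e < e_c)

print("threshold E_C      :", e_c)
print("lowest sampled E   :", e_ground)
print("levels below E_C   :", n_below)
```

For such a small N the ground state typically sits somewhat above $E_C$; the two are expected to agree only to leading order in N.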
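Properties of the REM
• entropy
$S(E) = k \ln n(E) = k \ln\left[\frac{2^N}{\sqrt{2\pi N \sigma^2}}\, e^{-E^2/(2 N \sigma^2)}\right] \to kN \ln 2 - \frac{k E^2}{2 N \sigma^2}$ (thermodynamic limit)
The entropy cannot be negative. If $E < E_C$, S(E)=0: the system is “frozen”.
• critical temperature
$\frac{1}{T} = \frac{\partial S}{\partial E} = -\frac{kE}{N\sigma^2}$, so $E(T) = -\frac{N\sigma^2}{kT}$
and
$S(T) = kN\left[\ln 2 - \frac{1}{2}\left(\frac{\sigma}{kT}\right)^2\right]$
for the critical temperature (where the entropy per site s=S/N vanishes, while S itself is not necessarily 0) we have
$\ln 2 - \frac{1}{2}\left(\frac{\sigma}{kT_C}\right)^2 = 0$, i.e. $kT_C = \frac{\sigma}{\sqrt{2 \ln 2}}$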
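Properties of the REM
• free energy
If $T > T_C$,
$F(T) = E(T) - T S(T) = -NkT \ln 2 - \frac{N\sigma^2}{2kT}$
However, if $T < T_C$, S=0, and
$F(T) = E(T_C) = -N\sigma\sqrt{2 \ln 2}$
• partition function
$Z(\{J\}) = \sum_{i=1}^{2^N} e^{-E_i(\{J\})/kT} = \int n_J(E)\, e^{-E/kT}\, dE$
If n(E) is self-averaging, Z does not depend on the disorder, and
$Z(T) = 2^N \int \frac{1}{\sqrt{2\pi N \sigma^2}}\, e^{-E^2/(2 N \sigma^2)}\, e^{-E/kT}\, dE = 2^N e^{N\sigma^2/(2 (kT)^2)} = e^{-F(T)/kT}$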
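The thermodynamic functions of the REM are simple enough to tabulate. The following sketch assumes the conventions used for these slides' Gaussian model (level variance $N\sigma^2$, $kT_C = \sigma/\sqrt{2\ln 2}$) and reduced units with k = 1; the function names are illustrative only.

```python
import math

K_B = 1.0  # Boltzmann constant in reduced units (assumption)

def critical_temperature(sigma):
    """k T_C = sigma / sqrt(2 ln 2)."""
    return sigma / (K_B * math.sqrt(2.0 * math.log(2.0)))

def entropy(T, n_spins, sigma):
    """S(T) = k N [ln 2 - (sigma/kT)^2 / 2] above T_C, and 0 in the frozen phase."""
    if T <= critical_temperature(sigma):
        return 0.0
    return K_B * n_spins * (math.log(2.0) - 0.5 * (sigma / (K_B * T)) ** 2)

def free_energy(T, n_spins, sigma):
    """F(T) = -N k T ln 2 - N sigma^2 / (2 k T) above T_C, frozen at E_C below."""
    if T <= critical_temperature(sigma):
        return -n_spins * sigma * math.sqrt(2.0 * math.log(2.0))
    return (-n_spins * K_B * T * math.log(2.0)
            - n_spins * sigma ** 2 / (2.0 * K_B * T))

t_c = critical_temperature(1.0)
for T in (0.5 * t_c, t_c, 1.5 * t_c, 2.0 * t_c):
    print(f"T/T_C = {T / t_c:.1f}  S = {entropy(T, 100, 1.0):8.3f}  "
          f"F = {free_energy(T, 100, 1.0):9.3f}")
```

The two branches of F join continuously at $T_C$, and the entropy goes to zero there, which is the freezing transition described above.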
Digression: Order parameters
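• distinguishing between HT paramagnetic and LT frozen states (Edwards and Anderson, 1975)
$q_{EA} = \frac{1}{N} \sum_{i=1}^{N} \sum_{\text{all } a} P_a \left(\langle S_i \rangle_G^{a}\right)^2$, with the weight of valley a: $P_a = \frac{Z_a}{Z} = \frac{\sum_{\{S\} \in a} e^{-H(\{S\})/kT}}{\sum_{\{S\}} e^{-H(\{S\})/kT}}$
• some other important quantities
Stat. mech. order parameter: $q = \frac{1}{N} \sum_{i=1}^{N} \langle S_i \rangle_G^2$
Degree of broken ergodicity: $\Delta = q_{EA} - q$
• “similarity” between states (e.g. phases) of the system: $q_{AB} = \frac{1}{N} \sum_{i=1}^{N} \langle S_i \rangle_G^{(A)} \langle S_i \rangle_G^{(B)}$
$q_{AB} = +1$: full similarity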
Digression: Phase diagram of the SK model (T, J0, H)
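• Replica Trick to perform the quenched averaging of F
$F_q(T) = -kT \langle \ln Z(\{J_{ij}\}, T) \rangle_q$, with $\langle A \rangle_q = \int DJ\, P[J]\, A(\{J\})$
$\ln x = \lim_{n \to 0} \frac{x^n - 1}{n} = \lim_{n \to 0} \frac{e^{n \ln x} - 1}{n} = \lim_{n \to 0} \frac{d x^n}{dn}$
$\langle Z^n(\{J_{ij}\}, T) \rangle_q = \int \dots dJ_{ij}\, P[(J_{ij})] \sum_{\{S^1\}}^{2^N} \cdots \sum_{\{S^n\}}^{2^N} e^{-\sum_{r=1}^{n} H(\{J_{ij}\}, \{S^r\})/kT} = \int DJ\, P[J]\, \mathrm{Tr}_{\{S\}}\, e^{\sum_{r=1}^{n} \sum_{ij} J_{ij} S_i^r S_j^r / kT}$
By simplifying this expression, introducing as new variables $q_{rs}$,
and performing a saddle-point analysis, we arrive at the phase diagram:
Spin glass phase: q≠0, <Si>=0 (Binder, 1986)
(Sherrington and Kirkpatrick, 1978; H-T plane: de Almeida and Thouless, 1978)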
Protein Synthesis
Transcription: DNA (A, G, C, T) → pre-mRNA → splicing → mRNA (A, G, C, U)
Translation: ribosomes, tRNA
Genetic code (degenerate!)
Initiation: usually Met (AUG)
Stop: UAA, UAG, UGA
Folding: with or without chaperones
Covalent modifications: disulfide bonds, proteolytic modifications, glycosylation…
[Figure: a chaperone]
Protein Structure
Secondary structure: common regular local structures
α-helix β-sheet
Right-handed (RH) helices are more common than left-handed (LH) ones
Protein Structure and Classification
Tertiary structure: overall three-dimensional structure of a protein molecule
motifs = common “blocks”, domains = independently folding regions
Classification:
Globular proteins
Fibrous proteins
[Figure panels: Lysozyme, Heat Shock Protein, Collagen]
Natively unfolded proteins
-substantial regions of disordered structure
-usually have a target ligand
-disorder-order transition when binding
Protein Folding
Interactions stabilizing the proteins
• hydrophobic effect - entropic origin
• hydrogen bonds - polar molecules
• van der Waals interactions - induced dipoles
• Coulomb interactions
• in some proteins, disulfide bonds
kT = 4 × 10^-21 J ≈ 0.03 eV
Anfinsen’s experiment
Denaturation of the ribonuclease enzyme; after restoring the original conditions the enzyme STARTED TO WORK AGAIN
• gentle heating / chemical treatment (urea, mercaptoethanol) → denaturation
• restoring the original conditions → spontaneous refolding (time scale: seconds)
=> Building of the 3D structure is SPONTANEOUS (in many cases)
Levinthal’s paradox
• Anfinsen: there is a native state (F=minimum)
• small protein, N=100 amino acids
• assume 3 rotamers/monomer
Total number of structures: $3 \times 3 \times \dots \times 3 = 3^{100} \approx 10^{47}$
• one microstate visited in $10^{-13}$ s
Time necessary for finding the native state: $10^{47} \times 10^{-13}\,\mathrm{s} = 10^{34}\,\mathrm{s} \approx 10^{27}$ years
Thermodynamic + Kinetic problem
Solution: Biasing towards the native state is necessary
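The order-of-magnitude estimate can be reproduced in a few lines. The numbers are those of the slide (100 residues, 3 rotamers each, $10^{-13}$ s per microstate); the length of a year in seconds is an approximate value added for the conversion.

```python
import math

n_residues = 100
rotamers_per_residue = 3
time_per_state_s = 1e-13      # one microstate visited in 1e-13 s
seconds_per_year = 3.156e7    # approximate conversion factor (assumption)

n_conformations = rotamers_per_residue ** n_residues
search_time_s = n_conformations * time_per_state_s
search_time_years = search_time_s / seconds_per_year

print(f"conformations: about 10^{math.log10(n_conformations):.1f}")
print(f"search time  : about 10^{math.log10(search_time_s):.1f} s "
      f"= 10^{math.log10(search_time_years):.1f} years")
```

An exhaustive random search would therefore take vastly longer than the seconds observed for refolding, which is the paradox.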
Microscopic Models
A typically used Hamiltonian
$H = \sum_{I<J}^{N} \epsilon(a_I, a_J)\, \Delta(r_I - r_J)$
$a_I$: monomer species, 1...20 (I: index along the chain)
N: number of monomers
$r_I$: position of the monomer I
Δ: interaction range function (usual lattice models: 1 for nearest neighbours, 0 otherwise)
$\epsilon(a_I, a_J)$: interactions between amino acids I and J (N×N)
$\epsilon_{ij}$: amino acid interaction matrix (20×20)
$\epsilon(a_I, a_J) = \sum_{i,j}^{20} \epsilon_{ij}\, \delta(a_I, i)\, \delta(a_J, j)$
Including hydrophobicity:
-the 21st species is the water
-in the “empty” sites
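A contact Hamiltonian of this type is easy to evaluate on a lattice. The sketch below is a toy version under several assumptions that are not in the slides: a 2D square lattice, only two monomer species instead of 20, made-up matrix values, and chain-bonded neighbours excluded from the sum (they would only add a conformation-independent constant).

```python
def contact_energy(sequence, positions, eps):
    """H = sum_{I<J} eps[a_I][a_J] * Delta(r_I - r_J) on a square lattice."""
    energy = 0.0
    n = len(sequence)
    for i in range(n):
        for j in range(i + 2, n):  # skip chain-bonded pairs (assumption)
            dx = abs(positions[i][0] - positions[j][0])
            dy = abs(positions[i][1] - positions[j][1])
            if dx + dy == 1:       # Delta = 1 for nearest neighbours, 0 otherwise
                energy += eps[sequence[i]][sequence[j]]
    return energy

# Toy two-species interaction matrix (hypothetical values):
# species 0 attracts species 0, all other pairs are neutral.
EPS = [[-1.0, 0.0],
       [0.0, 0.0]]

sequence = [0, 1, 1, 0]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]    # compact conformation
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]  # extended conformation

print("compact :", contact_energy(sequence, square, EPS))
print("extended:", contact_energy(sequence, straight, EPS))
```

In the compact conformation the two ends of the chain touch and gain one 0-0 contact; the extended chain has no contacts at all.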
Digression: the Gō model
• assumption: we know the folded, native conformation
• this conformation is energetically very well optimized • energy: function of the native contacts
εIJ= -w if I and J are first neighbors in the native state
εIJ=0 otherwise
η: the number of native contacts
“uses the answer to answer the question” ?
This model does not help the structure prediction, but it is helpful if we are studying how the protein reaches its native state.
Energy as a function of the number of native contacts: $E(\eta) = -\frac{1}{2} w \eta$
Energy spectrum of random heteropolymers
The energy spectrum (400 lowest states) resembles that of the REM (Sali et al., 1994)
Indeed, $E = \sum_{\text{first neighbours}} \epsilon_{IJ}$, i.e. $H = \sum_{I<J}^{N} \epsilon(a_I, a_J)\, \Delta(r_I - r_J)$:
O(N) ≈ independent terms => Central Limit Theorem => Gaussian distribution
only some sequences would fold repeatedly to the same state
KEY: single low-lying ground-state energy
Essential: the ground state
Threshold energy of the REM: $E_C = -N\sigma\sqrt{2 \ln 2}$
Extreme value statistics: Gumbel distribution
it can be shown: $\langle E_G \rangle_q \approx -N\sigma\sqrt{2 \ln 2}$
[Figure: distribution W(E_g) of the ground-state energy E_g, with E_c marked on the axis]
width of the energy gap: $\Delta E_{G,G+1} \sim O(1)$
Problems with the REM (thermodynamics):
• no flexibility against changing conditions
• no mutation stability: if the matrix elements are changed by ±b, the energy levels change by $\sim b\sqrt{N}$ (ΔE is not large enough for a unique native state /freezing, escape/)
• there must be some correlation between the energy levels…
A Way Out: Sequence Design
“Pulling down” the energy of a target conformation
Canonical design
• Given a 3D conformation C*
• Searching for the sequence of amino acids that minimizes E for the given C*
$H_{des}(C^*) = \sum_{I<J}^{N} \epsilon(a_I, a_J)\, \Delta(r_I^{C^*} - r_J^{C^*})$
Algorithm: the sequence is annealed
Movement in the sequence space: Metropolis MC method
$P(S \to S') = 1$ if $E(S') \le E(S)$; $P(S \to S') = \exp\left[-\frac{E(S') - E(S)}{kT_{des}}\right]$ if $E(S') > E(S)$
What about $T_{des}$?
too high: random walk
too low: can be useless
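The design step can be sketched as a Metropolis walk in sequence space for a fixed target conformation. Everything concrete below is an assumption made for illustration: a 6-mer on a 2×3 lattice whose non-bonded contacts are listed by hand, a two-letter alphabet with made-up interaction values, and swap moves that keep the composition fixed. The slides only specify the Metropolis acceptance rule.

```python
import math
import random

def design_energy(seq, contacts, eps):
    """H_des: sum of eps over the contacts of the fixed target conformation C*."""
    return sum(eps[seq[i]][seq[j]] for i, j in contacts)

def anneal_sequence(seq, contacts, eps, kT_des, n_steps, rng):
    """Metropolis walk in sequence space; returns the best sequence seen."""
    seq = list(seq)
    e = design_energy(seq, contacts, eps)
    best_seq, best_e = list(seq), e
    for _ in range(n_steps):
        i, j = rng.sample(range(len(seq)), 2)
        seq[i], seq[j] = seq[j], seq[i]      # trial move: swap two monomers
        e_new = design_energy(seq, contacts, eps)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT_des):
            e = e_new                        # accept
            if e < best_e:
                best_seq, best_e = list(seq), e
        else:
            seq[i], seq[j] = seq[j], seq[i]  # reject: undo the swap
    return best_seq, best_e

# Target C*: chain (0,0),(1,0),(2,0),(2,1),(1,1),(0,1); its non-bonded contacts:
contacts = [(0, 5), (1, 4)]
EPS = [[-1.0, 0.0], [0.0, 0.0]]              # hypothetical two-species matrix
start = [0, 1, 0, 1, 0, 1]

e_start = design_energy(start, contacts, EPS)
best_seq, best_e = anneal_sequence(start, contacts, EPS, kT_des=0.5,
                                   n_steps=500, rng=random.Random(0))
print("start   :", start, e_start)
print("designed:", best_seq, best_e)
```

With a very high `kT_des` almost every swap is accepted and the walk is random; with a very low one the walk can get stuck, which is the trade-off noted above.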
Phase Diagram of Designed Proteins
(Pande et al., 2000)
“Folded globule”:
•proteins with a stable target conformation
•they are “minimally frustrated”
Digression
Interpretation of a Chaperone Function
avoiding aggregation
e.g. of hydrophobic-hydrophobic residues
(Clark, 2004)
Prion Proteins
•diseases transmitted by proteins
• PrP^Sc can induce the PrP^C → PrP^Sc transition
• PrP^C might be an “off-path”
Kinetics
The Funnel Hypothesis
How do we solve Levinthal’s Paradox?
Significantly low-energy native state: partially native structures also will have lower energies than others
Bumps: due to competitive interactions
=>FUNNEL
Kinetics
Free Energy Barriers and Nucleation
Barriers of F: energetic and entropic
Nucleation:
• liquid-gas transition:
homogeneous shrinking: ΔE and ΔS disadvantages
solution: states with non-uniform density
• protein folding:
folding seems to be a first-order transition
nucleus: small, native secondary structure, e.g. α-helix
subsequent structure formation is speeded up
$\Delta F = \Delta E - T \Delta S$ (energetic barrier: $\Delta E > 0$; entropic barrier: $\Delta S < 0$)
Digression
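Super-Arrhenius behaviour
Most probable energy in the REM:
$\frac{d}{dE}\left[e^{-E^2/(2\sigma^2)}\, e^{-E/kT}\right] = 0 \Rightarrow E_{mp} = -\frac{\sigma^2}{kT}$ (σ² here: variance of the energy distribution)
Assumption: these probable conformations are surrounded by ones with $E \approx 0$.
transition-state theory:
$t_{esc} = t_0\, e^{(0 - E_{mp})/kT} = t_0\, e^{(\sigma/kT)^2}$
the exponent is quadratic rather than linear in 1/kT (“Ferry law”)
=> roughness (σ) slows down folding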
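The most probable energy and the resulting escape time can be checked numerically. This sketch assumes a Gaussian density of states with total variance $\sigma^2$, barrier tops at $E \approx 0$, and reduced units; the grid search is just a crude way to locate the maximum of the Boltzmann-weighted density, and all function names are hypothetical.

```python
import math

def most_probable_energy(sigma, kT):
    """Analytic maximum of exp(-E^2 / (2 sigma^2)) * exp(-E / kT)."""
    return -sigma ** 2 / kT

def escape_time(sigma, kT, t0=1.0):
    """t_esc = t0 * exp((0 - E_mp) / kT) = t0 * exp((sigma / kT)^2)."""
    return t0 * math.exp((sigma / kT) ** 2)

def numerical_most_probable_energy(sigma, kT, n_grid=40001, span=20.0):
    """Grid search for the maximum of the log of the weighted density."""
    best_e, best_logw = None, -math.inf
    for k in range(n_grid):
        e = -span * sigma + 2.0 * span * sigma * k / (n_grid - 1)
        logw = -e ** 2 / (2.0 * sigma ** 2) - e / kT
        if logw > best_logw:
            best_e, best_logw = e, logw
    return best_e

sigma = 1.0
for kT in (1.0, 0.5):
    print(f"kT = {kT}: E_mp = {most_probable_energy(sigma, kT):.3f} "
          f"(grid: {numerical_most_probable_energy(sigma, kT):.3f}), "
          f"t_esc/t0 = {escape_time(sigma, kT):.2f}")
```

Halving the temperature multiplies the exponent by four rather than two, which is the super-Arrhenius slowdown.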
Reaction Coordinate
Simple (bimolecular) chemical reactions
A+BC→AB+C
PES(rAB, rBC)
reaction coordinate: the minimum energy path via a saddle-point
Protein Folding: the choice is difficult, no general solution
• similarity to the native state, $Q = \eta_{native} / \eta_{total}$ (fraction of native contacts)
• an alternative choice: Pfold, or “commitment”
Pfold: the probability of folding before ever reaching an unfolded state
Digression: Alternative Reaction Coordinate
“Development” on the graph
• Lattice model
• {C} conformation space ↔ graph
• single “elementary step” difference ↔ nodes C1 and C2 connected
• $n_C$: occupation number (e.g. # of independent simulations)
• $m_C$: degree of the node
• “development”: Metropolis MC (MMC) dynamics, with $U(C) = E(C)/kT$:
$I_{C \to C'} = \frac{n_C}{m_C} \min\left\{1;\ \frac{m_C}{m_{C'}}\, e^{U(C) - U(C')}\right\}$, $I_{C' \to C} = \frac{n_{C'}}{m_{C'}} \min\left\{1;\ \frac{m_{C'}}{m_C}\, e^{U(C') - U(C)}\right\}$
• “Potential” on the graph nodes: $\Phi_C = n_C\, e^{E(C)/kT}$
• define: $R_{CC'} = \max\left\{m_C e^{U(C)};\ m_{C'} e^{U(C')}\right\}$
=> $I = I_{C \to C'} - I_{C' \to C} = \frac{\Phi_C - \Phi_{C'}}{R_{CC'}}$
Ohm’s law!
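The Ohm's-law form of the net current can be verified numerically for arbitrary node parameters. The sketch assumes the potential $\Phi_C = n_C e^{U(C)}$ with $U(C) = E(C)/kT$, which is what the flux and resistance formulas imply; the function names and the random test values are purely illustrative.

```python
import math
import random

def metropolis_flux(n_from, m_from, m_to, u_from, u_to):
    """One-way flux I_{C->C'} = (n_C/m_C) min{1, (m_C/m_C') exp(U(C) - U(C'))}."""
    return (n_from / m_from) * min(1.0, (m_from / m_to) * math.exp(u_from - u_to))

def ohm_current(n_c, m_c, u_c, n_d, m_d, u_d):
    """Net current written as (Phi_C - Phi_C') / R_CC'."""
    phi_c = n_c * math.exp(u_c)   # assumed form of the node "potential"
    phi_d = n_d * math.exp(u_d)
    resistance = max(m_c * math.exp(u_c), m_d * math.exp(u_d))
    return (phi_c - phi_d) / resistance

rng = random.Random(0)
max_mismatch = 0.0
for _ in range(1000):
    n_c, n_d = rng.uniform(0, 10), rng.uniform(0, 10)   # occupation numbers
    m_c, m_d = rng.randint(1, 8), rng.randint(1, 8)     # node degrees
    u_c, u_d = rng.uniform(-3, 3), rng.uniform(-3, 3)   # energies in units of kT
    net = (metropolis_flux(n_c, m_c, m_d, u_c, u_d)
           - metropolis_flux(n_d, m_d, m_c, u_d, u_c))
    max_mismatch = max(max_mismatch, abs(net - ohm_current(n_c, m_c, u_c, n_d, m_d, u_d)))

print("largest mismatch between the two expressions:", max_mismatch)
```

The two expressions agree to rounding error, and the current vanishes when the occupations are Boltzmann distributed, since then all potentials are equal.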
Digression: Alternative Reaction Coordinate
“First return” (casino) problem
“particle” (money) at x0
[Figure: money axis starting at 0, with the starting point x0 marked]
I will end up with 0 money ↔ all the flux is going to 0
electric circuit analogy
Pfold: probability to arrive at the folded state FOR THE FIRST TIME
(Grosberg, 2003)
$p_{fold} = \frac{R_{CU}}{R_{CU} + R_{CF}}$, $p_{unfold} = \frac{R_{CF}}{R_{CU} + R_{CF}}$
$R_{CU} = \int_U^C e^{U(x)}\, dx$; $p_{fold} = \frac{R_{CU}}{R_{FU}} = \frac{\int_U^C e^{U(x)}\, dx}{\int_U^F e^{U(x)}\, dx}$
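The resistor formula for Pfold can be evaluated for a one-dimensional coordinate. The sketch below assumes that U(x) is given in units of kT, places the unfolded state at x = 0 and the folded state at x = 1, and uses a simple trapezoidal rule; the barrier shape is a made-up example.

```python
import math

def resistance(potential, a, b, n=10000):
    """Trapezoidal estimate of R = integral from a to b of exp(U(x)) dx."""
    h = (b - a) / n
    total = 0.5 * (math.exp(potential(a)) + math.exp(potential(b)))
    total += sum(math.exp(potential(a + k * h)) for k in range(1, n))
    return total * h

def p_fold(potential, x_unfolded, x_current, x_folded):
    """p_fold = R_CU / (R_CU + R_CF), the voltage-divider formula."""
    r_cu = resistance(potential, x_unfolded, x_current)
    r_cf = resistance(potential, x_current, x_folded)
    return r_cu / (r_cu + r_cf)

def flat(x):
    """No potential: the casino (gambler's ruin) problem."""
    return 0.0

def barrier(x):
    """Hypothetical barrier of height 5 kT between the start and the folded state."""
    return 5.0 * math.exp(-((x - 0.7) / 0.1) ** 2)

x0 = 0.3
print("flat potential :", p_fold(flat, 0.0, x0, 1.0))
print("with a barrier :", p_fold(barrier, 0.0, x0, 1.0))
```

For a flat potential the result is simply the fractional distance from the unfolded state, the classic gambler's ruin answer; a barrier on the way to the folded state raises $R_{CF}$ and lowers Pfold.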
Conclusion
• protein folding: self-assembly
• low-energy ground state
• biased walk – correlations, funnel hypothesis
• “nucleation”
• sequence design