CONFORMATIONAL OPTIMIZATION AND SAMPLING ALONG NATURAL COORDINATES
Peter Minary
Computational Structural Biology Group & Bio-X Center
Stanford University, Stanford, CA 94305
TALK OUTLINE
– Obstacles for Deciphering the Central Dogma of MB
– Challenges for Optimization & Sampling Algorithms
– Natural Coordinates for Biological Macromolecules
– Chain Closure Algorithms, Obstacles & Solutions
– An Atomic Level Insight into the Central Dogma
• Nucleosome Positioning/Large Scale Optimization
• Structure Space of RNA Junctions and Fractals
• Interpretation & Refinement of Experimental Data
CENTRAL DOGMA OF MOLECULAR BIOLOGY
F. H. Crick(1)
[Figure: central dogma diagram with labels Transcriptional Regulation, Post Transcriptional Regulation, Translation, Folding, Motion, FUNCTION]
“If you want to understand function, study structure.” F. H. C. Crick
(1) F. H. C. Crick, Nature 227 561-563 (1970).
TRANSCRIPTIONAL REGULATION
[Figure: panels labeled TF, DNA, Scan DNA, DNA in Chromatin, Nucleosome Structure, Nucleosome Positioning, 3D Structure, E(Xi); example sequences ...GTCCAGTTACGAATTGCGCGC… and …..GTGAATGCCCAG…..]
– Grand Challenges for CSB
• Structure Based Prediction of Nucleosome Positions
• Structure Based Prediction of Transcription Factor (TF) Binding Sites
• Requires All Atom Representation & Rapid Optimization
• Simultaneously Explore Sequence and Structure Space
• Need Conceptually Novel Optimization/Sampling Tools
POST TRANSCRIPTIONAL REGULATION
EXAMPLE: mRNA TRANSPORT IN NEURONS
– Grand Challenges for CSB
• Prediction of RNA Tertiary Structure & Transport Protein Binding Sites
• Need a Novel Optimization/Sampling (O/S) Approach
PROTEIN MOTION
[Figure: EM images of a molecular complex, Fatty Acid Synthase (FAS)]
– Current Trend: Experimentally Measured Structures Are Getting
• Larger in Size
• Higher in Flexibility
• Lower in Resolution
– In Current Refinement Methods Atomic Motions Are Modeled As
• Independent
• Isotropic
• Harmonic
– To Follow the Trend, Atomic Motion in Refinement Methods Should Be
• Collective
• Anisotropic
• Anharmonic
– Demand for Novel Optimization Methods for Structure Refinement
CHALLENGES FOR OPTIMIZATION & SAMPLING ALGORITHMS
– Roughness of the objective function, E(X)
• Leads to rare events in Markov Chain MC(1)
• Solutions
– Multiple Markov Chains in Temperature(2)/Energy Domain(3, 4)
– Transformation of Variables(5) and/or using Extra Dimensions(6)
– Large number of degrees of freedom, Nd
• Number of energy basins is non polynomial in Nd
• Solutions
– Local or Global Torsional Degrees of Freedom(4,7)
– Arbitrary/Most Relevant/Natural Degrees of Freedom(9)
(1) Metropolis, et al. J. Chem. Phys. 21, 1087-1091 (1953).
(2) Geyer, et al. Proceedings of the 23rd Symposium on the Interface, 156-163 (1991).
(3) Kou, et al. Annals of Statistics 34 1581-1619 (2006).
(4) Minary et al. Annals of Statistics 34 1638-1642 (2006).
(5) Minary et al. SIAM Journal of Scientific Computing 30 2055-2083 (2008).
(6) Minary et al. J. Chem. Phys. 118 2510-2525 (2003).
(7) Minary et al. J. Mol. Biol. 25 920-933 (2008).
(8) Dodd et al. Mol. Phys. 78 961-996 (1993).
(9) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
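The "multiple Markov chains in temperature" idea on this slide can be sketched as a small parallel tempering run. This is a minimal illustration, assuming a hypothetical rough 1D energy; the function names, step size, and β ladder are illustrative choices and are not taken from the cited works.

```python
import math
import random

def energy(x):
    # Hypothetical rough 1D energy: a shallow quadratic well with ripples.
    return 0.1 * x * x + math.sin(5.0 * x)

def metropolis_step(x, beta, step, rng):
    # Symmetric random displacement, accepted with min(1, exp(-beta * dE)).
    x_new = x + rng.uniform(-step, step)
    d_e = energy(x_new) - energy(x)
    if d_e <= 0.0 or rng.random() < math.exp(-beta * d_e):
        return x_new
    return x

def swap_accepted(x_i, x_j, beta_i, beta_j, rng):
    # Replica exchange rule: min(1, exp((beta_i - beta_j) * (E_i - E_j))).
    delta = (beta_i - beta_j) * (energy(x_i) - energy(x_j))
    return delta >= 0.0 or rng.random() < math.exp(delta)

def parallel_tempering(betas, n_sweeps, step=0.5, seed=0):
    rng = random.Random(seed)
    xs = [4.0 for _ in betas]  # every replica starts far from the lowest basin
    best = min(xs, key=energy)
    for _ in range(n_sweeps):
        # One Metropolis move per replica at its own inverse temperature.
        xs = [metropolis_step(x, b, step, rng) for x, b in zip(xs, betas)]
        # Attempt one swap between a random pair of neighboring temperatures.
        k = rng.randrange(len(betas) - 1)
        if swap_accepted(xs[k], xs[k + 1], betas[k], betas[k + 1], rng):
            xs[k], xs[k + 1] = xs[k + 1], xs[k]
        best = min(xs + [best], key=energy)
    return xs, best

replicas, best_x = parallel_tempering(betas=[4.0, 2.0, 1.0, 0.5], n_sweeps=5000)
print("lowest energy visited:", energy(best_x))
```

The high temperature replicas cross barriers easily and hand good configurations down the ladder through swaps, which is how the approach mitigates rare events on a rough E(X).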
NATURAL DEGREES of FREEDOM for NUCLEIC ACIDS
Dx Shift, Dy Slide, Dz Rise
τ Tilt, ρ Roll, ω Twist
Sx Shear, Sy Stretch, Sz Stagger
κ Buckle, π Propeller, σ Opening
[Figure: one panel per parameter (Sx, Sy, Sz, κ, π, σ, Dx, Dy, Dz, τ, ρ, ω), each drawn in a local x, y, z frame]
dof: 10 (4 + 12 × ½)
[Figure: backbone fragment with labels N, O3′, O3′, RC, C5′, O5′, P, C4′, O1′, τ12, τ23, θ1, θ2]
Moves break the chain!
NATURAL DEGREES of FREEDOM for PROTEINS
β-SHEET & α-HELIX
Sx Shear, Sy Stretch, Sz Stagger
κ Buckle, π Propeller, σ Opening
[Figure: Sx move drawn in a local x, y, z frame]
Moves break the chain!
CHAIN CLOSURE ALGORITHMS
– Analytical multi atom closure algorithms(1)
• Ncd non-linear equations in Ncd unknowns, where Ncd is the number of closure dof
• Ncd = 6 is the practical limit, given that the complexity is O(fNP(Ncd))
– Single atom Deterministic Full Closure (DFC)(2)
• Cost efficient
• Two solutions or no solution
– Single atom Stochastic Partial Closure (SPC)(3)
• Cost efficient
• A solution always exists, for any size of the chain break
(1) Dodd et al. Mol. Phys. 78 961-996 (1993).
(2) Sklenar et al. J. Comp Chem. 27 309-315 (2005).
(3) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
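The "two solutions or no solution" behavior of a single atom closure can be illustrated geometrically. The sketch below is not the DFC algorithm of the cited work; it assumes, purely for illustration, that the closure atom must sit at fixed distances from three already placed, non-collinear atoms (trilateration). All function names are hypothetical.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def trilaterate(p1, p2, p3, r1, r2, r3):
    """Points at distances r1, r2, r3 from p1, p2, p3: two mirror images or none.

    Assumes p1, p2, p3 are not collinear.
    """
    d = norm(sub(p2, p1))
    ex = scale(sub(p2, p1), 1.0 / d)            # local x axis along p1 -> p2
    i = dot(ex, sub(p3, p1))
    ey_raw = sub(sub(p3, p1), scale(ex, i))
    ey = scale(ey_raw, 1.0 / norm(ey_raw))      # local y axis in the p1, p2, p3 plane
    ez = cross(ex, ey)                          # normal to that plane
    j = dot(ey, sub(p3, p1))
    x = (r1 * r1 - r2 * r2 + d * d) / (2.0 * d)
    y = (r1 * r1 - r3 * r3 + i * i + j * j) / (2.0 * j) - (i / j) * x
    z2 = r1 * r1 - x * x - y * y
    if z2 < 0.0:
        return []                               # the three spheres do not intersect
    z = math.sqrt(z2)                           # z == 0 gives two coincident points
    base = add(p1, add(scale(ex, x), scale(ey, y)))
    return [add(base, scale(ez, z)), sub(base, scale(ez, z))]

anchors = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
solutions = trilaterate(*anchors, math.sqrt(0.5), math.sqrt(0.9), math.sqrt(0.7))
no_solution = trilaterate(*anchors, 0.1, 0.1, 0.1)
print(solutions, no_solution)
```

The two returned points are mirror images across the plane of the three anchors. When the required distances are incompatible the list is empty, which is the situation a deterministic closure cannot recover from.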
RECURSIVE STOCHASTIC CLOSURE
1 cycle of RSC = DFC[ SPC[ SPC[ SPC[…] ] ] ]
[Figure: molten zone along the chain, shown for the 1st cycle and for m cycles, with the final DFC step]
• One SPC step
– Restores 4-5, breaks 3-4
• Multiple SPC steps
– Propagate the chain break
– Narrow the closure gap
• AC = O(Ncd) << O(fNP(Ncd))
– Ncd = 2 Nm + 5
Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
MONTE CARLO RECURSIVE STOCHASTIC CLOSURE-I
Molten zone (C4’….O3’)
Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
MONTE CARLO RECURSIVE STOCHASTIC CLOSURE-II
• Monte Carlo Minimization(1) (MCM) is Monte Carlo on $\tilde{E}(X) = \min_{X} E(X)$
• MCRSC(2) is Monte Carlo on $\tilde{E}(X_i) = \min_{X_d} E(X_i, X_d)$
Method   minimization      invariant DOF   DOF X       E evaluations
MCM      BFGS, CG          none            cart/tors   ~10-1000
MCRSC    N cycles of RSC   Xi              arbitrary   1
(1) Wales, D. J., Scheraga, H. A. Science 285 1368-1372 (1999).
(2) Minary, P., Levitt, M. J. Comp. Biol. 17(8) 993-1010 (2010).
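Monte Carlo Minimization, the reference method on this slide, can be sketched in a few lines: every trial move is followed by a local minimization, and the Metropolis test is applied to the minimized energies. This is a minimal sketch on a hypothetical 1D energy, with plain gradient descent standing in for BFGS/CG. It does not implement RSC; all names and parameter values are assumptions.

```python
import math
import random

def energy(x):
    # Hypothetical rough 1D energy used only for illustration.
    return 0.1 * x * x + math.sin(5.0 * x)

def gradient(x):
    return 0.2 * x + 5.0 * math.cos(5.0 * x)

def local_minimize(x, rate=0.01, n_steps=500):
    # Gradient descent stand-in for BFGS/CG; each call costs many evaluations,
    # which is the "~10-1000" column of the table above.
    for _ in range(n_steps):
        x -= rate * gradient(x)
    return x

def monte_carlo_minimization(x0, beta, step, n_moves, seed=0):
    rng = random.Random(seed)
    x = local_minimize(x0)
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(n_moves):
        # Perturb, then quench: the chain only ever visits local minima.
        x_trial = local_minimize(x + rng.uniform(-step, step))
        e_trial = energy(x_trial)
        if e_trial <= e or rng.random() < math.exp(-beta * (e_trial - e)):
            x, e = x_trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

start_e = energy(local_minimize(4.0))
best_x, best_e = monte_carlo_minimization(4.0, beta=2.0, step=1.5, n_moves=200)
print("start basin:", start_e, "best basin:", best_e)
```

In the MCRSC row of the table, the quench is replaced by N cycles of RSC acting only on the closure degrees of freedom, so a move costs a single energy evaluation.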
RECURSIVE STOCHASTIC vs DETERMINISTIC FULL CLOSURE in MONTE CARLO: a B-DNA
• RSC works with an order of magnitude larger move sizes than DFC
• RSC is like a wire: you pull, and the system deforms to follow the change
[Figure: one panel per sampled parameter (Dx, Dy, Dz, Sx, Sy, Sz), each drawn in a local x, y, z frame]
dof: 6
E2 binding DNA: 5’-ACCGAATTCGGT-3’ Force Field: amber99-bs0
Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
RECURSIVE STOCHASTIC CLOSURE vs LOOP TORSIONAL SAMPLING in MONTE CARLO: an α+β PROTEIN
SCOP id: d1div_2, 55 residue domain
[Figure: panels labeled (1) and (2); Ncd = 19]
(1) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
(2) Minary & Levitt J. Mol. Biol. 25 920-933 (2008).
APPLICATIONS
IN SILICO NUCLEOSOME POSITIONING
THE METHOD: GENERAL PIPELINE
APPLICATION TO CHROMOSOME 14
• Yeast Chromosome 14
– 187k-189k from SGD(1)
– Experimental Data(2)
• Nucleosome template
– 1.9 Å resolution
– pdb code (1kx3)(3)
• Slide nucleosome along DNA
– Slide a 147 bp window
– Design template
• Run MCRSC on all structures
– Force field: AMBER99-bs0(5)
– Software: MOSAICS(6)
• Get probability profile
– P(i) ~ exp(-β <E(i)>)
[Figure: ab initio and in vitro P(i) versus position i, axis from 187k to 207k; Minary & Levitt]
(1) Cherry, J. M. et al., Nucleic Acids Res. 26, 73-79 (1998).
(2) Kaplan, N. et al., Nature 458, 362-366 (2009).
(3) Davey, C. A. et al., J. Mol. Biol. 319 1097-1113 (2002).
(4) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
(5) Perez et al., Biophys. J. 92 3817-3827 (2007).
(6) Minary (2010).
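The last two steps of the pipeline (slide a 147 bp window, then turn mean energies into a profile) can be sketched as follows. This is only an illustration of the formula P(i) ~ exp(-β <E(i)>): the energies below are made up, the function names are hypothetical, and the MCRSC/MOSAICS step that would produce <E(i)> is not shown.

```python
import math

def sliding_windows(sequence, width=147):
    # One window per start position i, as in "slide a 147 bp window".
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]

def occupancy_profile(mean_energies, beta):
    """Normalized P(i) proportional to exp(-beta * <E(i)>)."""
    e_min = min(mean_energies)  # shift by the minimum to avoid overflow in exp
    weights = [math.exp(-beta * (e - e_min)) for e in mean_energies]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical mean energies for three window positions (arbitrary units).
profile = occupancy_profile([-3.0, -5.0, -4.0], beta=1.0)
windows = sliding_windows("A" * 150)
print(profile, len(windows))
```

Subtracting the minimum energy before exponentiating leaves the normalized profile unchanged and keeps the weights in a safe numeric range when energies are large.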
NUCLEOSOME OCCUPANCY
Yeast Chromosome 14
[Figure: in vivo, in vitro, and ab initio P(i) versus position i; full range 187000 to 207000 and a panel over 191000 to 199000; Minary & Levitt]
EXPLORING RNA STRUCTURE SPACE
HIERARCHICAL NATURAL DOFs/MOVES (HNM)
[Figure: hierarchy levels L1, L2, L3, L4]
RNA 4 WAY JUNCTION: SAMPLING METHODS
[Figure: move sets(1,2,3) L1, L2, L3, L4 mapped to sampling methods: L1 gives NM-MC(1,3); L1 – L2, L1 – L3, and L1 – L4 give HNM-MC(1,2,3); MCRSC(1) + user defined move sets (Medicine/Physics) (Chemistry/Biology)]
(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8) 993-1010 (2010).
(2) Sim, A., Levitt, M., Minary, P. To be submitted.
(3) Minary, P., MOSAICS: http://csb.stanford.edu/minary/MOSAICS
RNA 4 WAY JUNCTION
[Figure: (a) NM-MC(1,5) with L1, (b) FA-MC-Sym(2), (c) FA-Rosetta(3), (d) HNM-MC(1,4,5) with L1-L4]
• Necessary condition for unbiased sampling
– Symmetric RNA -> distributions coincide
• Easy to improve by field specific move set
– RNA: relative arrangement of stem loops
• Comparing to Fragment Assembly
– Biased and non continuous sampling
– Dependence on fragment libraries
(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8) 993-1010 (2010).
(2) Parisien and Major, Nature, 452, 51 (2008).
(3) R. Das, J. Karanicolas, and D. Baker, Nat. Methods 7 (4), 291 (2010).
(4) Sim, A., Levitt, M., Minary, P., To be submitted.
(5) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS
FRACTAL RNA: BEYOND CURRENT METHODS
• Necessary condition for unbiased sampling
– Symmetric RNA -> arm end distributions coincide
• Further improvement by L5, L6, L7
– No limitation on improvement
• Benchmark with different move sets
– Accuracy converges by L7(1,2,3)
[Figure: HNM-MC(1,2,3), error(i) versus i × 10^4 for move sets L1 – L4 and L1 – L7]
(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8) 993-1010 (2010).
(2) Sim, A., Levitt, M., Minary, P., To be submitted.
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS
FRACTAL RNA: WHY/HOW DOES IT WORK?
• Use embedded subspaces: $\Omega_3 \subset \Omega_2 \subset \Omega_1 \equiv \Omega$
• In particular
– $\Omega_3$: 6 DOFs / main arms(2)
– $\Omega_2$: 6 DOFs / arms of arms(2)
– $\Omega_1$: 10 DOFs / nucleotides(1)
• Low cost method to approximate $\langle \alpha \rangle = \int_{L \in \Omega} dL \, \alpha(L) f(L)$, with $\alpha, f : \Omega \to \mathbb{R}$
• Multi scale integration(3) along
– $L_3 \in \Omega_3$
– $L_2 \in \Omega_2$ around all $L_3$
– $L_1 \in \Omega_1$ around all $L_2$
(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8) 993-1010 (2010).
(2) Sim, A., Levitt, M., Minary, P., To be submitted.
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS
CRYO-EM REFINEMENT
OBJECTIVE
[Figure: EM images of a molecular complex, Fatty Acid Synthase (FAS); panels labeled initial model, refined model, EM image]
VALIDATION I
[Figure: initial structure, target structure, refined structure, and target projection, with labels 2 Å rmsd and 18 Å rmsd; optimization(1)-(3) along natural dof]
(1) Zhang, Minary, Levitt In preparation.
(2) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS
VALIDATION II: CROSS CORRELATION OF MAPS
[Figure: Lysozyme, cross correlation (cc) versus projection angle]
THE PROTOCOL
Etotal = Weight * EEM + Emolecule
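The combined objective Etotal = Weight * EEM + Emolecule can be sketched as below. The slide does not define EEM; this sketch assumes, for illustration only, that EEM = 1 - cc, where cc is the cross correlation between a map computed from the model and the EM map (the quantity plotted on the validation slide). Maps are flat lists of densities, and all names are hypothetical.

```python
import math

def cross_correlation(map_a, map_b):
    """Pearson cross correlation of two density maps given as flat lists."""
    n = len(map_a)
    if n == 0 or n != len(map_b):
        raise ValueError("maps must be non-empty and of equal length")
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("a constant map has no defined correlation")
    return cov / math.sqrt(var_a * var_b)

def total_energy(e_molecule, model_map, em_map, weight):
    # Assumed form of the EM term: zero for a perfect match, 2 for anticorrelation.
    e_em = 1.0 - cross_correlation(model_map, em_map)
    return weight * e_em + e_molecule

em_map = [0.0, 1.0, 2.0, 1.0, 0.0]
matched = total_energy(-10.0, em_map, em_map, weight=5.0)
inverted = total_energy(-10.0, [-v for v in em_map], em_map, weight=5.0)
print(matched, inverted)
```

The weight sets the balance between fitting the image and keeping the molecule physically reasonable: a model that matches the map exactly pays no EM penalty, so its total energy reduces to Emolecule.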
REFINEMENT
[Figure: Lysozyme]
DOMAIN FLEXIBILITY
[Figure: panels credited to (1)-(3) and (4)]
(1) Zhang, Minary, Levitt In preparation.
(2) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS
(4) Courtesy of Steve Ludtke, Baylor College, Texas.
CONCLUSION
• CSB has Limited Impact due to Inefficient Conformational Sampling
• Novel Algorithms Supporting Natural DOF May Offer The Solution
• Our Novel Approach May Open New Avenues
– In The Refinement and Interpretation of Experimental Data
– In The Use of Structural Information in Molecular Biology
• Atomic Level Understanding of the Central Dogma of Molecular Biology (CDMB) May Become a Reality with Natural Coordinates (NC)
“If the code does indeed have some logical foundation then it is legitimate to consider all the evidence, both good and bad, in any attempt to deduce it.” F. H. C. Crick
ACKNOWLEDGEMENTS
– Michael Levitt Computer Sci. & Structural Biology, Stanford, US
– Jernej Ule Molecular Biology/MRC, Cambridge, UK
– Peter Lukavsky Molecular Biology/MRC, Cambridge, UK
– Sebastian Doniach Physics, Stanford, US
– Zev Bryant Bioengineering, Stanford, US
– Wing H Wong Statistics, Stanford, US
– Wah Chiu Baylor College, Texas, US
– Adelene Sim Physics, Stanford, US (graduate student)
– Gaurav Chopra Mathematics, Stanford, US (graduate student)
– Junjie Zhang Baylor College and Stanford, US (postdoc)
– Anatole von Lilienfeld and the Workshop Organizing Committee