Steps towards an Ensemble-Based Force Field Fitting Procedure…
Dragos Horvath, Benjamin Parent, Guy Lippens
UMR 8525 CNRS, Lille
Goal…
• To calibrate an empirical molecular force field for use in conformational sampling and docking:
– Generally applicable to proteins, sugars, organic ligands
• Full atom simulations, no large protein folding
– Tailor-made for use with torsional degrees of freedom only!
• Continuum model for solvent effects!
– Consistent, in the sense that docking affinities & folding propensities should be directly linked to the computed force field energies of sampled ensembles:
• No a posteriori rescoring of docking poses! Docking is just simultaneous conformational sampling of several molecules!
The Prerequisite: an Exhaustive Conformational Sampling Tool!
• Based on a Genetic Algorithm, coding conformers as "chromosomes" in which each locus stands for a torsional angle value.
• The in silico Darwinian evolution, leading to fitter and fitter (lower energy) conformers, was enhanced by:
– Hybridization with various optimization heuristics
– Fine-tuning of the parameters controlling the evolutionary strategy
[Figure: Generation of new offspring. Mutation: a wild-type chromosome (loci 1 … i, i+1 … n) gives a mutant in which one locus carries a new torsion value. Crossover: parent1 and parent2 exchange chromosome segments to give child1 and child2. Population cycle: a random initial population is sorted by energy, offspring are added to form an intermediate population, and sorting yields the next generation.]
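The offspring generation described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 30° torsion grid, the one-point crossover and the truncation step are all assumptions, and `toy_energy` is a made-up stand-in for a force field.

```python
import math
import random

def toy_energy(chromosome):
    """Hypothetical stand-in for a force field: each torsion prefers 180 degrees."""
    return sum(1.0 - math.cos(math.radians(a - 180.0)) for a in chromosome)

def mutate(chromosome, allowed_angles, rng):
    """Return a copy in which one randomly chosen locus gets a new torsion value."""
    child = list(chromosome)
    locus = rng.randrange(len(child))
    choices = [a for a in allowed_angles if a != child[locus]]
    child[locus] = rng.choice(choices)
    return child

def crossover(parent1, parent2, rng):
    """One-point crossover: the two parents swap tails after a random cut."""
    cut = rng.randrange(1, len(parent1))
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2

def next_generation(population, energy, allowed_angles, rng, n_pairs):
    """Add offspring to an intermediate population, sort by energy, truncate."""
    size = len(population)
    intermediate = list(population)
    for _ in range(n_pairs):
        p1, p2 = rng.sample(population, 2)
        c1, c2 = crossover(p1, p2, rng)
        intermediate.append(mutate(c1, allowed_angles, rng))
        intermediate.append(c2)
    intermediate.sort(key=energy)  # lower energy means fitter
    return intermediate[:size]
```

Because the parents stay in the intermediate population before truncation, the best energy in this sketch can never get worse from one generation to the next.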
Hybrid Heuristics: (1)-Targeted torsion angle choice!
Knowledge-based bias: favoring locally stable torsions…
[Figure: two panels, "polycycle: torsion nr. 1" and "polycycle: torsion nr. 3". x-axis: angle (0 to 300); y-axis: probabilities (0 to 0.4). Curves: biased torsion probabilities thanks to learning; biased torsion probabilities wrt local Hamiltonian.]
"Traditionalism": favoring torsion values seen in previously visited samples
• Biasing the probabilities to draw a given value for a given angle (according to a temperature parameter):
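One way to picture such a temperature-controlled bias is a Boltzmann weighting of the local torsion energies, next to a "traditionalist" weighting by visit counts. The sketch below is only an assumption about the functional form: the slide does not give it, and `boltzmann_weights`, `tradition_weights`, `draw_angle` and the pseudo-count are hypothetical names and choices.

```python
import math
import random

def boltzmann_weights(local_energies, kT):
    """p(value) proportional to exp(-E_local(value) / kT); kT sets how sharp the bias is."""
    e_min = min(local_energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in local_energies]
    total = sum(weights)
    return [w / total for w in weights]

def tradition_weights(visit_counts, pseudo_count=1.0):
    """Favor torsion values seen in previously visited samples.

    The pseudo-count (an assumption) keeps never-visited values drawable.
    """
    total = sum(visit_counts) + pseudo_count * len(visit_counts)
    return [(c + pseudo_count) / total for c in visit_counts]

def draw_angle(angles, probabilities, rng):
    """Draw one torsion value according to the biased probabilities."""
    return rng.choices(angles, weights=probabilities, k=1)[0]
```

A very large `kT` flattens the distribution towards uniform drawing, while a small `kT` concentrates almost all probability on the locally most stable value.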
Hybrid Heuristics: (2)-Directed Mutants or Explorers
• Torsional angle driving
– Evolution stuck in local minima, no mutation would help
– Adding a constraint term, gradient optimization in this new landscape
– Final relaxation towards local minimum
– "Explorer" launched in parallel in order not to halt the evolution process
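The driving steps above can be illustrated on a one-torsion toy landscape. Everything in this sketch is an assumption made for illustration: the analytic `energy` function, the harmonic form and strength `k` of the constraint, the plain fixed-step gradient descent, and the fact that the constraint ignores angular periodicity.

```python
import math

def energy(theta):
    """Toy torsional landscape (radians): shallow wells near pi/3 and 5*pi/3, deepest well at pi."""
    return 1.0 + math.cos(3.0 * theta) + 0.5 * math.cos(theta)

def gradient(theta):
    """Analytic derivative of the toy landscape."""
    return -3.0 * math.sin(3.0 * theta) - 0.5 * math.sin(theta)

def descend(theta, grad_fn, step=0.01, n_steps=5000):
    """Plain fixed-step gradient descent."""
    for _ in range(n_steps):
        theta -= step * grad_fn(theta)
    return theta

def explorer(theta_start, theta_target, k=5.0):
    """Drive a torsion towards a target value, then relax without the constraint."""
    def constrained_gradient(theta):
        # Gradient of energy(theta) + k * (theta - theta_target)**2
        return gradient(theta) + 2.0 * k * (theta - theta_target)
    driven = descend(theta_start, constrained_gradient)  # optimization in the new landscape
    return descend(driven, gradient)                     # final relaxation

stuck = descend(1.0, gradient)        # ends in the shallow well near pi/3
escaped = explorer(stuck, math.pi)    # the constraint carries it over the barrier
```

In a real sampler the explorer would run as a separate process, so the main evolution loop keeps going while the driven conformer is being optimized.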
Other Hybrid Heuristics - Automated Fragment Presampling, Taboo Search
• Taboo Search & Intrapopulation diversity control:
– Discarding chromosomes that are too similar to fitter conformers or to previously visited geometries
• Fragmentation: Sampling of energetically permitted geometries of fragments in presence of a buffer zone
– Allows the automated definition of "rotamer libraries" out of which to pick geometries during global sampling!
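The taboo and diversity rule can be sketched as a simple filter. This is a hedged illustration: the similarity measure (largest per-torsion angular difference), the threshold semantics and the names `torsion_distance` and `diversity_filter` are assumptions, not the criterion used by the authors.

```python
def torsion_distance(a, b):
    """Largest per-locus angular difference in degrees, respecting 360-degree periodicity."""
    diffs = []
    for x, y in zip(a, b):
        d = abs(x - y) % 360.0
        diffs.append(min(d, 360.0 - d))
    return max(diffs)

def diversity_filter(population, energy, taboo_list, dissimilarity_limit):
    """Keep a chromosome only if it is dissimilar to all fitter survivors and all taboos."""
    survivors = []
    for chromosome in sorted(population, key=energy):  # fittest first
        references = survivors + taboo_list
        too_close = any(
            torsion_distance(chromosome, other) < dissimilarity_limit
            for other in references
        )
        if not too_close:
            survivors.append(chromosome)
    return survivors
```

Sorting by energy first means that, within a cluster of near-identical conformers, only the fittest representative survives.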
[Figure: Enhancement to Find Native Fragment Geometry in Initial Population. Axis: % Fragments (0 to 0.35).]
Search for Optimal Sampling Setups in the Strategy Parameter Space…
Strategy parameter "chromosome": p1 p2 p3 p4 p5 p6 … p14 p15
• Population management
– Population size
– Number of parallel processes
– Migration rate between ‘islands’
• Evolution management
– Crossover rate
– Mutation rate
– One/two point crossover rate
– Selection pressure
– Dissimilarity limit
– Maximal age
• Convergence management
– Apocalypse (population reset) frequency
– Elitism
– Global stop condition
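As a data-structure illustration, the strategy parameters listed above could be grouped into one record that a meta-optimizer draws and perturbs. The field names follow the slide, but every numeric range, the `max_generations` stand-in for the global stop condition, and the helper names `random_setup` and `perturb` are hypothetical.

```python
import random
from dataclasses import dataclass, fields, replace

@dataclass(frozen=True)
class StrategySetup:
    # Population management
    population_size: int
    n_parallel_processes: int
    migration_rate: float
    # Evolution management
    crossover_rate: float
    mutation_rate: float
    two_point_crossover_rate: float
    selection_pressure: float
    dissimilarity_limit: float
    maximal_age: int
    # Convergence management
    apocalypse_frequency: int
    elitism: bool
    max_generations: int  # assumed form of the global stop condition

def random_setup(rng):
    """Draw one setup; all ranges below are illustrative assumptions."""
    return StrategySetup(
        population_size=rng.randint(20, 500),
        n_parallel_processes=rng.randint(1, 16),
        migration_rate=rng.random(),
        crossover_rate=rng.random(),
        mutation_rate=rng.random(),
        two_point_crossover_rate=rng.random(),
        selection_pressure=rng.uniform(1.0, 5.0),
        dissimilarity_limit=rng.uniform(5.0, 60.0),
        maximal_age=rng.randint(1, 50),
        apocalypse_frequency=rng.randint(10, 1000),
        elitism=rng.random() < 0.5,
        max_generations=rng.randint(100, 10000),
    )

def perturb(setup, rng):
    """Meta-level mutation: redraw a single, randomly chosen strategy parameter."""
    name = rng.choice([f.name for f in fields(setup)])
    fresh = random_setup(rng)
    return replace(setup, **{name: getattr(fresh, name)})
```

A frozen dataclass keeps each evaluated setup immutable, which makes it safe to store alongside the fitness it obtained.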
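[Equation, garbled in extraction: the µ-Fitness of a parameter setup combines a Boltzmann-weighted log-sum over the energies of the minima found, $k_b T \ln \sum_{i \in \text{minima found}} \exp(-E_i / k_b T)$, with a term proportional to the CPU time; the sign convention and the CPU-time coefficient are not recoverable.]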
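A hedged sketch of how a setup-level score of this kind might be computed: the free energy of the sampled population as a Boltzmann log-sum over the minima found, combined with a CPU-time penalty. The sign convention, the linear penalty and its coefficient `cpu_penalty` are assumptions, as are the function names; only the ingredients (energies of minima found, kT, CPU time) come from the slide.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)

def population_free_energy(energies, temperature=300.0):
    """F = -kT * ln(sum_i exp(-E_i / kT)) over the minima found (lower is better)."""
    kT = K_B * temperature
    e_min = min(energies)  # factor out the deepest minimum to avoid overflow
    log_sum = math.log(sum(math.exp(-(e - e_min) / kT) for e in energies))
    return e_min - kT * log_sum

def micro_fitness(energies, cpu_time, cpu_penalty, temperature=300.0):
    """Assumed form: reward a low population free energy, penalize CPU time linearly."""
    return -population_free_energy(energies, temperature) - cpu_penalty * cpu_time
```

With this convention, finding an extra low-energy minimum always raises the score, and spending more CPU time for the same set of minima always lowers it.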
[Flowchart: Sampling Engine Overview. Elements shown: the meta-algorithm (Meta-GA) defines a parameter setup and picks the next set of configurations; each setup is sampled in a 3-fold repeat (Run 1, Run 2, … Run n); postprocessing builds the base of diverse conformers sampled at the current setup, from which the µ-Fitness is derived, and feeds the Global Base of Diverse Conformers; the global base supplies "Taboos" and "Tradition"; an Explorer box; a "News?" test with "yes" and "no" branches, one of which ends in "GAME OVER".]
Conformational sampling with an optimally tuned GA is (reproducibly!) more efficient than a randomly parameterized simulation
[Figure: linear peptide. x-axis: Nr. of the parameter setting; y-axis: Free energy of the population (ticks 190 to 225). Series: Optimized parameters; Random parameters.]
Impact of the hybrid heuristics on the sampling of cyclodextrin…
[Figure: axes labeled "Deepest Energy well (kcal)" and "Nr. of diverse conformers within +20 kcal. from best minimum" (tick sets 0 to 35 and 1, 10, 100). Series: Default; No Exploring; No Taboos; Flat distribution; Preference for locally stable torsions.]
Wanted: *structured* compounds with ~100 torsional degrees of freedom!
• Unfortunately, small molecules showing significant structuring in water (due to weak non-covalent interactions) are rare…
– The "Trp cage" peptide 1L2Y (helix & turn, 20 AA)
– The "Trp zipper" peptide 1LE1 (β-sheet, 13 AA)
– Designed minimalist β-sheet peptide 1UAO (10 AA)
– The WW domain of PIN 1 (34 AA, mostly β-sheet)
– Conformationally Restrained Helical peptide (CRH) with a chemically engineered helix inducer group (21 AA)
– Cyclodextrin (with "opened" rings!)
– Protein-ligand complexes, to be used as soon as the docking module is developed!
Force Fields: What's wrong with existing ones?
• Heisenberg's Frustration Principle applies:
– (FF Inaccuracy) × (Chance of "Missing Parameters Error") >> ħ
• Most were fitted with respect to few key points of the energy-geometry landscape, around which molecular dynamics simulations were supposed to gravitate…
– … but sampling methods that facilitate barrier crossings may discover deeper artefactual minima elsewhere!
• Ignoring valence angle flexibility requires some additional "fuzziness" of force field terms, to "accommodate" imprecise interatomic distances…
Considered Force Field terms
• Customized CVFF force field, employing:
– a 10 Å cutoff (with a termination function)
– a smoothing procedure of interatomic clash contributions
– a continuum solvent model
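To make the first two ingredients concrete, here is a minimal sketch of a pairwise term with a smooth termination function and a softened clash. The functional forms are assumptions chosen for illustration (cubic switching between 8 and 10 Å, a soft-core distance, a 12-6 van der Waals term and a Coulomb term with a distance-dependent dielectric); they are not the customized CVFF expressions, and the continuum solvent term is left out.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal * Angstrom / (mol * e^2)

def termination(d, d_on=8.0, d_off=10.0):
    """Switch smoothly from 1 (d <= d_on) to 0 at the 10 Angstrom cutoff; d_on is assumed."""
    if d <= d_on:
        return 1.0
    if d >= d_off:
        return 0.0
    x = (d - d_on) / (d_off - d_on)
    return 1.0 - x * x * (3.0 - 2.0 * x)  # cubic smoothstep

def smoothed_distance(d, d_smooth=1.0):
    """Assumed soft-core form: never smaller than d_smooth, so clashes stay finite."""
    return math.sqrt(d * d + d_smooth * d_smooth)

def pair_energy(d, a_rep, b_att, q_i, q_j, eps_slope=4.0):
    """Van der Waals plus Coulomb with dielectric eps = eps_slope * distance."""
    ds = smoothed_distance(d)
    vdw = a_rep / ds ** 12 - b_att / ds ** 6
    coulomb = COULOMB_CONSTANT * q_i * q_j / (eps_slope * ds * ds)
    return termination(d) * (vdw + coulomb)
```

Softening the distance keeps the energy finite even for overlapping atoms, which matters when valence angles are frozen and interatomic distances are therefore imprecise.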
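[Figure: ‘smoothing’ distance plotted against the effective interatomic distance d0ij.]
[Equations, garbled in extraction: a Coulomb term E_coulomb written in terms of the smoothed distance, and a solvation term E_solv involving weighting factors k_solv and k_hphob, charges Q, volumes V and the fourth power of the interatomic distance; the exact expressions are not recoverable.]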
The Force Field Fitting Procedure…
[Flowchart]
• Install a NEW FF parameter configuration
• For each training molecule:
– Locally explore the neighborhood of the experimental geometry
– Run the GA-driven Exhaustive Sampler
– Add all sampled conformers to the Data Base & calculate the RMS deviation from the "native" geometry
• Recalculate the energies of stored conformers according to the current FF setup
• Calculate the Folding ΔG according to the chosen RMS radius
• All ΔG < 0?
– Outcomes shown: "Yes, for the first time!", "Yes, reconfirmed!", "NO!", and a terminal "OK!"
Parameters of the FF configuration:
• Distance-dependent dielectric constant
• Weighting factor of the desolvation penalty
• Weighting factor of the hydrophobic contacts
• Weighting factor of repulsive van der Waals
• Attractive & repulsive van der Waals coefficients of the following types: 'co' (carbonyl C), 'o' (ether-type O), 'h' (aliphatic H), 'cp' (aromatic C), 'oc' (carbonyl O)
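Folding free energy (with β = 1/k_bT):

\[
\Delta G = -\frac{1}{\beta} \ln \frac{\sum_{i \,\in\, \text{well-folded states}} \exp(-\beta E_i)}{\sum_{j \,\in\, \text{misfolded states}} \exp(-\beta E_j)}
\]

[Figure axis label: RMS deviation from native]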
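The folding ΔG test of the fitting procedure can be sketched as below, assuming that conformers within the chosen RMS radius of the native geometry count as well-folded states and all others as misfolded. The function names, the (energy, RMS) tuple layout and the temperature are illustrative assumptions.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)

def log_partition(energies, kT):
    """ln(sum_i exp(-E_i / kT)), computed with the lowest energy factored out."""
    e_min = min(energies)
    return -e_min / kT + math.log(sum(math.exp(-(e - e_min) / kT) for e in energies))

def folding_free_energy(conformers, rms_radius, temperature=300.0):
    """conformers: list of (energy, rms_deviation_from_native) pairs.

    Returns dG = -kT * ln(Z_folded / Z_misfolded); dG < 0 means the
    folded ensemble is favored by the current force field.
    """
    kT = K_B * temperature
    folded = [e for e, rms in conformers if rms <= rms_radius]
    misfolded = [e for e, rms in conformers if rms > rms_radius]
    if not folded or not misfolded:
        raise ValueError("need at least one folded and one misfolded conformer")
    return -kT * (log_partition(folded, kT) - log_partition(misfolded, kT))

def all_fold(training_set, rms_radius):
    """Acceptance test of a force field setup: every training molecule has dG < 0."""
    return all(folding_free_energy(c, rms_radius) < 0.0 for c in training_set)
```

Because stored conformers keep their geometries, only the energies need to be recomputed when a new parameter configuration is installed, after which this test can be re-evaluated directly.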
Status Quo – after eight iterations in force field parameter space…
• Compounds for which correctly folded conformers were sampled, but misfolded conformers of lower energy were also found!
• Molecules for which the correctly folded conformers were never sampled:
– the WW domain of PIN-1 (34 residues)
– the 1LE1 ‘Tryptophan Zipper’ mini-protein (13 residues)
• Compounds for which experimental conformations are being sampled and ranked among the energetically most stable:
– Cyclodextrin (open rings)
– Conformationally restrained helical peptide
– Tryptophan cage (1L2Y)
Conclusions…
• This is a coherent approach to simultaneously evolve a conformational sampling and docking engine, together with its underlying force field
– Both the ability to find the minima and the quality of the energy landscape are paramount in ensuring that the herein defined measures of free energy will be physically relevant…
– Will the resulting molecular force field be more "sampling-friendly" (with funnel-like landscapes?)
• At this point, it is unclear how quickly, if ever, it will converge, but it is well suited for GRID computations (deployment in progress).
• A genetic algorithm reproducibly finding a significant low-energy representative for each populated energy minimum cannot be envisaged without help from other minimum search heuristics…
THANKS