Docking: Modeling of binding of macromolecules between themselves or with small-molecule ligands.
Factors that determine specific binding:
Phenomenologically:
- shape complementarity (lock and key). Basis for early geometry-based docking algorithms
- property complementarity: hydrophobic atoms to hydrophobic atoms, hydrogen-bond donor to hydrogen-bond acceptor, positively charged to negatively charged
Physically:
- for a stable complex, the bound conformation is the global free energy minimum: maximizing favorable and minimizing unfavorable interactions
Docking
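[Figure: schematic of complementary contacts: hydrophobic groups (-CH2-CH2-) facing each other, opposite charges (- and +), and a hydrogen bond (O ... H)]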
Energy function
Fast approximation, but accurate enough - global minimum should correspond to the (near) native conformation.
Potentials derived from:
- Molecular mechanics force-fields: physical terms, parameters based on QM and/or experimental physical properties (ECEPP, MMFF, etc.)
- Statistical/phenomenological:
1. ad hoc
2. Mean-force: observed statistics of inter-atomic distances + Boltzmann relation
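The mean-force idea above (observed distance statistics converted to energies through the Boltzmann relation) can be sketched as follows. This is a minimal illustration, not the potential of any particular program: the function name, the bin counts and the choice of reference distribution are all assumptions made for the example.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872  # gas constant in kcal/(mol*K)

def mean_force_potential(observed, reference, temperature=300.0):
    """Boltzmann inversion per distance bin: E = -RT * ln(p_obs / p_ref).

    observed / reference are raw counts per distance bin (hypothetical data).
    Bins without statistics get None instead of an energy.
    """
    rt = R_KCAL_PER_MOL_K * temperature
    n_obs = sum(observed)
    n_ref = sum(reference)
    energies = []
    for obs, ref in zip(observed, reference):
        if obs == 0 or ref == 0:
            energies.append(None)  # no statistics in this bin
            continue
        ratio = (obs / n_obs) / (ref / n_ref)
        energies.append(-rt * math.log(ratio))
    return energies

# Hypothetical counts for one atom-type pair in three distance bins
observed = [5, 60, 35]
reference = [30, 40, 30]
print(mean_force_potential(observed, reference))
```

Bins that are over-represented relative to the reference come out with negative (favorable) energies, under-represented bins with positive ones.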
Docking as a global energy optimization problem
Search algorithm
Quickly locates the global minimum on a (typically) extremely rugged energy landscape
- geometry-based: rigid-body, possibly followed by local minimization
- incremental construction: split into rigid fragments, dock, rebuild from ‘anchors’
- genetic algorithm: ‘chromosomes’ of variables, recombination/mutations, Darwinian evolution
- Monte-Carlo (+local minimization)
MCM global optimization procedure
Monte-Carlo minimization:
1. Random step: perturb one of the torsions or the position/orientation of the ligand
2. Local gradient minimization
3. Compare the new energy to the previous value: if improved, accept the new conformation; otherwise apply the Metropolis criterion: accept with probability exp(-∆E/kT), reject otherwise
4. Go back to step 1
Termination: Adaptive heuristics for optimal MC run length based on ligand size and flexibility.
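The four steps above can be sketched on a toy one-dimensional energy function. This is only an illustration of the Monte-Carlo minimization loop, not ICM code: the `rugged` test function, the crude gradient-descent minimizer, the fixed step count (instead of the adaptive termination) and all parameter values are assumptions.

```python
import math
import random

def metropolis_accept(e_old, e_new, kt, rng):
    """Always accept improvements; otherwise accept with probability exp(-dE/kT)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kt)

def local_minimize(f, x, h=1e-3, rate=0.01, iters=200):
    """Crude gradient descent with a central-difference derivative (toy stand-in)."""
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2.0 * h)
        x -= rate * grad
    return x

def mcm(f, x0, n_steps=200, kt=1.0, max_jump=2.0, seed=0):
    """Monte-Carlo minimization: random step, local minimization, Metropolis test."""
    rng = random.Random(seed)
    x = local_minimize(f, x0)
    e = f(x)
    best_x, best_e = x, e
    for _ in range(n_steps):
        # Step 1: random perturbation; step 2: local minimization
        trial = local_minimize(f, x + rng.uniform(-max_jump, max_jump))
        e_trial = f(trial)
        # Step 3: accept or reject by the Metropolis criterion
        if metropolis_accept(e, e_trial, kt, rng):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
        # Step 4: loop back to step 1
    return best_x, best_e

def rugged(x):
    """Toy rugged landscape: a shallow parabola with many local minima."""
    return 0.1 * x * x + math.sin(5.0 * x)

if __name__ == "__main__":
    print(mcm(rugged, 3.0))
```

Because each trial is minimized before the comparison, the Metropolis test is applied between local minima rather than between raw random points, which is what lets the walk cross a rugged landscape.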
Fast Grid Protein/Flexible Ligand Docking in ICM
Global energy optimization:
- ligand position and internal torsions optimized by stochastic Monte-Carlo search in the framework of Internal Coordinate Mechanics (ICM)
- local gradient minimization after each random move
- ligand is continuously flexible
- receptor represented by pre-calculated grid potentials
- energy terms include ligand internal force-field energy and grid receptor interaction potentials
Grid potentials
Continuously differentiable grid potential using spline interpolation for efficient gradient minimization
Terms:
• Van der Waals - steric repulsion and dispersion attraction
• Electrostatics
• Directional (anisotropic) hydrogen bonding
• Hydrophobic interaction
Acceleration: ~100-fold faster than explicit receptor.
Implicit minor receptor flexibility: smoothing grid potentials, truncating VW repulsion to limit the adverse effect of minor steric clashes (‘soft’ docking). Soft potentials also make minimization more efficient.
[Figure: truncated van der Waals potential, energy E versus distance d, with repulsion capped at EmaxVW]
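A soft van der Waals term of the kind described above can be sketched by capping the repulsive part of a standard 12-6 potential. The capping formula used here is one common smooth choice and is an assumption, as are the parameter values; the slide only states that the repulsion is truncated at a maximum value.

```python
def lennard_jones(d, epsilon=0.2, r_min=3.8):
    """Standard 12-6 potential with well depth epsilon at distance r_min."""
    ratio = r_min / d
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)

def soften(energy, e_max=4.0):
    """Cap repulsion smoothly (assumed form).

    Attractive values are left unchanged; positive values approach e_max
    asymptotically, and the slope is continuous at energy = 0.
    """
    if energy <= 0.0:
        return energy
    return energy * e_max / (energy + e_max)

for d in (2.0, 3.0, 3.8, 5.0):
    hard = lennard_jones(d)
    print(d, round(hard, 3), round(soften(hard), 3))
```

With this form a severe clash costs at most `e_max`, so a ligand pose with one minor overlap is penalized but not eliminated, and the gradient stays finite for the minimizer.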
• Internal coordinates
• Large radius of convergence
• Efficient global energy optimization algorithm
Applications: folding, protein modeling, docking, virtual screening
ICM References:
• Abagyan, Mazur (1989)
• Abagyan et al. (1994) “ICM - a new method for protein modeling..” J. Comp. Chem. 15, 488-506
• Abagyan and Totrov (1994) “Biased Probability Monte Carlo searches …” J. Mol. Biol. 235, 983-1002.
Method: Internal Coordinate Mechanics (ICM)
Global optimization procedure (2)
Tricks to improve search efficiency
Conformational stack: low-energy conformations accumulated, trajectory monitored by comparison with previously found minima.
Multiple start: if the simulation is ‘trapped’ in the vicinity of a certain conformation, or if the energy remains higher than the energy of a number of already found conformations, restart from another initial conformation.
‘Grid annealing’: first dock into smoothed grid potentials, then into ‘hard’ exact grids.
‘Reverse’ torsion steps, symmetry, Cartesian relaxation, etc.
Accuracy/Speed of Flexible Ligand Docking
From: Schapira M, Abagyan R, Totrov M. Nuclear hormone receptor targeted virtual screening. J Med Chem. 2003 Jul 3;46(14):3045-59.
PDB RMSD (Å)
1a28 0.032
1bsx 0.38
1db1 0.74
1e3g 0.22
1e3k 0.28
1fby 0.37
1fcz 0.84
1fm6 1.39
1fm6 0.78
1fm9 1.72
1i37 0.21
1ilh 2.24
1l2i 0.29
1qkm 0.79
3erd 1.61
3ert 1.41
Large benchmarks 100-300 complexes:
• For a ‘clean’ benchmark (high resolution, good X-ray density for both ligand and binding site, no obvious crystallography errors) typically ~80% of ligand/receptor complexes are reproduced within 2Å RMSD
• For broader benchmarks: ~70% within 2Å, >75% within 3Å, ~60% within 1.5Å, and only ~45% within 1Å RMSD. For >85% of complexes, a pose within 2Å is found among the top 10 solutions.
Quality of the X-ray structure:
- low resolution or NMR
- missing residues
- missing side-chain atoms
- high B-factors
- clashes
Special features:
- covalent binding
- rare residue charge state: protonated Asp or Glu (HIV protease), deprotonated Cys or Tyr
- coupled ligand/ion binding (kinases - ATP/Mg)
- tightly bound water molecules (coordinated by 2-3 h-bonds)
‘Druggability’:
• binding pocket identification: PocketFinder
Receptor analysis/issues affecting docking
• Conversion of PDB structure to ICM:
- hydrogens and missing heavy atoms added
- polar hydrogens optimized (possibly including water)
- atom types and partial charges assigned
Specific cases may involve:
• regularization/refinement in cases of poor structure quality: idealized amino acid covalent geometry imposed, energy annealing, possibly in the presence of a ligand (site ‘molding’)
• sampling of alternative side-chain conformations, loops
• homology modelling
Receptor preparation
Difficult cases:
• Highly flexible ligands (more than 10-15 torsions)
• Shallow pockets
• Water-mediated binding
• Covalent binding
• Poor quality of the receptor structure (low resolution X-ray, NMR, homology models)
• Receptor flexibility
Remedies:
- longer simulations
- include water
- constrained docking
- explicit receptor docking
- multiple receptor structures
Potential pitfalls
Approaches:
• Explicit continuously flexible receptor:
- in principle, more comprehensive
- slow even for side-chains, very slow for backbone movements
- prone to artefacts: dozens of new variables, new local minima
- large backbone movements are still generally beyond reach
• Ensemble of pre-defined receptor conformations - either from multiple experimental structures or from simulations such as side-chain or loop sampling, homology modelling:
- can be fast
- any type of movement can be handled
- success mostly defined by the quality of the ensemble
Receptor flexibility
‘SCARE’ (Bottegoni et al, JCAMD 2008 A new method for ligand docking to flexible receptors by dual alanine scanning and refinement.)
Observation: typically, steric clashes resolved by induced fit involve 1-2 sidechains
• Grid docking is performed to multiple versions of the binding site generated by systematic replacement of various pairs of sidechains by alanines
• Top-scoring ligand pose from each grid simulation is refined with explicit flexible receptor. Best scoring refined conformation is selected as the final answer
On a benchmark of 30 cross-docking pairs, a top-ranking near-native solution was found in 80% of cases. Protocol takes ~2 h CPU time
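The number of binding-site variants generated by pairwise alanine replacement grows quadratically with the number of pocket side chains. The sketch below enumerates every pair, which is an assumption: the slide says only that "various pairs" are replaced, and the published protocol may restrict the pairs it considers. The residue labels are hypothetical.

```python
from itertools import combinations

def alanine_pair_variants(pocket_residues):
    """Return every pair of pocket side chains to be replaced by alanine."""
    return list(combinations(pocket_residues, 2))

# Hypothetical residue labels for a small pocket
pocket = ["LEU83", "PHE80", "LYS33", "ASP145", "GLN131"]
variants = alanine_pair_variants(pocket)
print(len(variants))  # n*(n-1)/2 pairs for n side chains
for pair in variants[:3]:
    print(pair)
```

Each variant would get its own set of grid potentials and its own grid docking run, which is why the protocol is costlier than a single rigid-receptor run.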
Hybrid grid/explicit protocol: SCARE
Direct incorporation of discrete receptor flexibility in grid-based simulation:
• Receptor conformations provided by user in a conformational stack
• Displaceable bound water molecules can be included
• Potentials are pre-calculated for each receptor conformation
• Stored as ‘4D’ grids - 4th dimension is the receptor conformation
• During MC simulation, an additional type of stochastic step is included: the grid 4D layer switch
Benchmarking recently published: Bottegoni et al J Med Chem 2009 Four-dimensional docking: a fast and accurate account of discrete receptor flexibility in ligand docking. 99 therapeutically relevant proteins and 300 diverse ligands. 77% complexes correctly reproduced. On average 4-fold faster than independent grid docking into all available receptor conformations.
Receptor ensemble docking in ICM: 4D grids
• Virtual ligand screening (VLS) algorithms make it possible to identify potential novel ligands in databases in silico.
Search large databases (100K-1M or more compounds) and select subsets enriched with hits using:
- 2D pharmacophore
- similarity measures (fingerprints, Tanimoto)
- 3D pharmacophore
- receptor structure: no prior ligand knowledge necessary; search is not biased to known chemistry
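For the fingerprint-based similarity measure in the list above, the Tanimoto coefficient is the number of shared "on" bits divided by the number of bits set in either fingerprint. A minimal sketch, assuming fingerprints are represented as sets of bit indices (the example fingerprints are made up):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union

query = {1, 2, 3}          # hypothetical fingerprint of a known ligand
candidate = {2, 3, 4}      # hypothetical fingerprint of a database compound
print(tanimoto(query, candidate))
```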
Receptor structure based VLS:
- dock each ligand in the DB to the receptor structure
- evaluate the quality of fit in the docked structures to select potential binders.
Results of ICM docking of a virtual library of 200,000 compounds into an FGFR pocket
Virtual Ligand Screening
Measure of VLS efficiency: enrichment factors
Full screening collection Ntotal, containing Atotal active compounds
VLS is used to select Nsel, containing Asel actives
Typically Nsel << Ntotal, but in real life also Asel < Nsel (false positives) and Asel < Atotal (false negatives)
Enrichment factor: (Asel/Nsel)/(Atotal/Ntotal)
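The enrichment factor defined above can be computed directly from the four counts. The function name and the example numbers below are illustrative assumptions; the formula is the one on the slide, rearranged so that only one division is needed.

```python
def enrichment_factor(a_sel, n_sel, a_total, n_total):
    """(Asel/Nsel) / (Atotal/Ntotal), written as one ratio of products."""
    if n_sel == 0 or a_total == 0:
        raise ValueError("n_sel and a_total must be non-zero")
    return (a_sel * n_total) / (n_sel * a_total)

# Hypothetical screen: 10000 compounds with 50 actives;
# the top 1% (100 compounds) contains 10 of them.
print(enrichment_factor(a_sel=10, n_sel=100, a_total=50, n_total=10000))  # 20.0
```

An enrichment factor of 1 means the selection is no better than random picking; the maximum possible value is Ntotal/Atotal when every selected compound is active.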
Choice of threshold cutoff:
[Figure: plot with axes labelled Nsel and Asel, scale up to 100%]
Receptor structure based VLS: Docking+Scoring
Docking:
Find a putative docked conformation for each compound, native-like for the binding ligands.
* Efficient conformational search routine
* Docking potential:
- must be fast
- has to rank the native-like conformation at the top among many different docking conformations of the same ligand
Scoring:
* One or a few conformations per compound are evaluated
* Potential (screening score)
- must rank the binding ligands above a large number of chemically diverse inactive compounds
Binding energy?
Binding energy
Ligand/receptor binding energy in solution:
• Van der Waals - favorable, but partially compensated by solvent
• Electrostatics - mostly compensated by solvation, only becomes favorable for charged ligands (salt bridges).
• Hydrogen bonds - mostly compensated by solvation, but major determinant of specificity
• Hydrophobic - often provide most of the affinity
• Strain - unfavorable
• Entropy loss - unfavorable, limits affinity for highly flexible ligands.
Fucose binding protein/fucose (1abf): Kd = 4 µM, 8 HB, ΔASAhp = 128 Å²
Antibody/progesterone (1dbb): Kd = 1 nM, 2 HB, ΔASAhp = 390 Å²
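To put the two dissociation constants above on an energy scale, the standard relation ΔG = RT ln(Kd) can be applied. This is a textbook conversion added for illustration, assuming a 1 M standard state and room temperature; it is not taken from the slides.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature=298.15):
    """Standard binding free energy in kcal/mol from Kd in mol/L (1 M standard state)."""
    return R_KCAL_PER_MOL_K * temperature * math.log(kd_molar)

for label, kd in (("1abf, Kd = 4 uM", 4e-6), ("1dbb, Kd = 1 nM", 1e-9)):
    print(label, round(binding_free_energy(kd), 1), "kcal/mol")
```

The complex with fewer hydrogen bonds but more buried hydrophobic surface is the tighter binder by roughly 5 kcal/mol, consistent with the point that hydrophobic contacts often provide most of the affinity.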
Binding energy calculations
Ligand/receptor binding energy in solution - highly complex concept:
E(van der Waals attraction) + E(steric repulsion) + E(electrostatic interaction) + E(electrostatic desolvation) + E(h-bond) + E(acceptor/donor desolvation) + E(hydrophobic) + E(strain) + E(entropy loss)
-Multiple opposing (favorable and unfavorable) contributions that largely compensate each other. Accumulation of errors: (-100±5) + (90±5) = -10±7.
-Solvent (water) effects: hydrophobicity, electrostatic desolvation, solute-solvent hydrogen bonds. Explicit water too computationally expensive and not always more accurate than implicit methods (continuum dielectric, surface tension).
-Some contributions extremely sensitive to the accuracy of geometry
Binding energy predictions remain, in general, only qualitatively accurate. Special model fitting for a specific receptor and/or chemotype of ligands is necessary to achieve quantitative agreement with experiment.
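The error-accumulation example above, (-100±5) + (90±5) = -10±7, follows from adding independent uncertainties in quadrature. A minimal sketch, assuming the errors of the individual terms are uncorrelated:

```python
import math

def sum_with_error(terms):
    """Sum (value, error) pairs; independent errors add in quadrature."""
    total = sum(value for value, _ in terms)
    error = math.sqrt(sum(err * err for _, err in terms))
    return total, error

total, error = sum_with_error([(-100.0, 5.0), (90.0, 5.0)])
print(total, round(error, 1))  # -10.0 7.1
```

The individual terms are known to about 5%, but their small difference carries an uncertainty of about 70%, which is why compensating contributions make absolute binding energies hard to predict.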
In practice, Edocking, Escoring and EbindingEnergy are different approximations:
EbindingEnergy estimates the binding energy: typically slow, and often does not work unless tuned for a specific system
Edocking and Escoring are not accurate estimates of binding energy, but:
Edocking discriminates the correct bound pose very fast
Escoring discriminates active ligands reasonably fast
Traditional approach: fitting of experimental binding energy values,
no non-binders in the training set.
Goal of VLS : discrimination between binders and non-binders.
ICM Score: Optimize Escoring performance for active ligand discrimination
Energy function optimization
ICM Scoring function
Components are physical terms:
1. Internal force-field energy of the ligand
2. Conformational entropy loss of the ligand
3. Receptor-ligand hydrogen-bond interaction
4. Solvation electrostatic energy change
5. Hydrogen-bond donor/acceptor desolvation
6. Hydrophobic energy
Due to imperfect term evaluation (errors in geometry, charges etc.), to obtain best performance the components have to be adjusted/weighted.
• Evaluation of discrimination potential performance on a benchmark (‘score the score’)
• Five weighting factors optimized
Training VLS scoring function
Benchmark set generation:
* Diverse set of ligands and receptors
* Structures generated by the docking procedure (not X-ray)
* Artificial non-binding complexes are included
* Structures of 25 receptors and 75 ligands extracted from high-resolution (<2Å) PDB structures of complexes.
* 10000 random ligands from ACD added
* Exhaustive cross-docking: all ligands to all receptors.
The score components pre-calculated for all 25 × 10075 = 251875 putative complexes
• Multiple runs of “Amoeba” simplex minimization to ensure convergence
Result: recognition significantly improved
Discrimination of active ligands
Virtual Database Screening Efficiency
Schapira M, Abagyan R, Totrov M. Nuclear hormone receptor targeted virtual screening. J Med Chem. 2003 Jul 3;46(14):3045-59.
•19 structures for 10 nuclear hormone receptors
•One structure used (glucocorticoid receptor) was a homology model
•5000 random molecules from CDL screening collection
•A library of 78 known NR ligands, 3 to 8 per receptor
Receptor Enrichment for top 1%
AR[+](1E3G) 33 33
AR[+](1I37) 67 50
ERa[+](1L2I) 71 0
ERa[+](3ERD) 71 14
ERb[+](1QKM) 57 14
ERa[-](3ERT) 87 87
GR[+](model) 100 100
PXR[+](1ILH) 0 14
PR[+](1A28) 83 20
PR[+](1E3K) 50 60
PPARg[+](1FM6) 80 10
PPARg[+](1FM9) 40 30
PPARa[+](1K7L) 70 20
PPARd[+](1GWX) 60 20
RXRa[+](1FBY) 100 71
RXRa[+](1FM6) 100 29
RARg[+](1FCZ) 88 89
TRb[+](1BSX) 33 0
VDR[+](1DB1) 100 71
average 68 39
Virtual Database Screening Efficiency
Chen H, Lyne PD, Giordanetto F, Lovell T, Li J. On Evaluating Molecular-Docking Methods for Pose Prediction and Enrichment Factors. J. Chem. Inf. Model. (2005)
12 protein targets of therapeutic interest, 17 to 622 active ligands per target, 20000 random compounds
Enrichment factors at 1% of database subsetting
• 2D to 3D conversion, type and charge assignment:
- using MMFF force field
• Pre-selection of ligands in the database for drug-likeness according to Lipinski-like criteria:
- size/weight
- number of h-bond donors and acceptors
- number of flexible torsions (Veber)
• Protonation/charge states for ligands: charge carboxyls, amino groups, possibly generate tautomers. Important, especially for correct scoring. New: pKa prediction will allow automatic protonation of non-trivial chargeable groups.
Ligand pre-processing
Top scoring ~1% - still 1000-5000 compounds. Further tightening of the score cutoff typically does not improve hit rates.
Optimal way to select 100-500 for experimental validation?
Rationale for further filtering:
- improve hit rate
- diversify hits
- ensure desired effect (e.g. inhibition)
- achieve specificity with respect to homologous receptors
Consensus scoring: while tightening the primary score cutoff beyond 1% typically does not improve enrichment, using a secondary scoring function may further enhance enrichment.
Selection according to additional criteria, such as:
- formation of specific h-bonds
- contact with certain parts of receptor
Post-processing VLS hit lists
Chemical clustering for improved diversity and hit rates:
- top scoring list often dominated by one/few compound families
- to diversify final selection, cluster compounds by chemical similarity, select best scoring compounds from each cluster
Once activity is confirmed for a chemotype, other compounds from the same cluster can be tested.
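Selecting the best-scoring compounds from each cluster, as described above, can be sketched as follows. The sketch assumes the clustering has already been done (cluster ids are given) and that a lower score is better; the compound names, cluster ids and scores are made up for illustration.

```python
def best_per_cluster(compounds, n_per_cluster=1):
    """Pick the n best-scoring compounds from each chemical cluster.

    compounds: iterable of (name, cluster_id, score); lower score is
    assumed to be better in this sketch.
    """
    clusters = {}
    for name, cluster_id, score in compounds:
        clusters.setdefault(cluster_id, []).append((score, name))
    selection = []
    for cluster_id in sorted(clusters):
        for _, name in sorted(clusters[cluster_id])[:n_per_cluster]:
            selection.append(name)
    return selection

# Hypothetical hit list: cluster 0 dominates the top of the ranking
hits = [("cpd_a", 0, -35.0), ("cpd_b", 0, -40.0), ("cpd_c", 0, -38.0), ("cpd_d", 1, -30.0)]
print(best_per_cluster(hits))  # ['cpd_b', 'cpd_d']
```

A plain score cutoff at -35 would return three members of the same family; the per-cluster pick keeps one representative of each chemotype.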
Post-processing VLS hit lists: chem. clustering
Starting point - initial lead compound
Identify scaffold and variable substituents (R1, R2, etc.)
Create substituent lists for each Ri position
Assemble a Markush virtual combinatorial library
VLS fully enumerated Markush combinatorial library
Alternative - two-step procedure:
1. VLS each Ri with constant small (H?) other positions, select best subset for each Ri.
2. Assemble Markush; VLS full enumeration of this smaller combinatorial library.
For large Ri lists (>~1000 compounds), dramatically larger virtual chemical space can be explored by the two-step procedure.
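The gain from the two-step procedure above can be estimated by counting docking runs. The sketch compares full enumeration with screening each Ri list separately and then enumerating only the retained substituents; the list sizes and the number kept per position are illustrative assumptions.

```python
import math

def full_enumeration_size(list_sizes):
    """Number of compounds in the fully enumerated Markush library."""
    return math.prod(list_sizes)

def two_step_cost(list_sizes, kept_per_position):
    """Docking runs for the two-step procedure.

    Step 1 screens each substituent list once (other positions held
    constant); step 2 enumerates only the retained substituents.
    """
    step1 = sum(list_sizes)
    step2 = math.prod(min(kept_per_position, size) for size in list_sizes)
    return step1 + step2

sizes = [1000, 1000, 1000]  # hypothetical R1, R2, R3 list sizes
print(full_enumeration_size(sizes))              # 1000000000
print(two_step_cost(sizes, kept_per_position=20))  # 11000
```

The two-step count grows with the sum of the list sizes plus a small product, rather than with the full product, which is what makes much larger substituent lists tractable.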
Lead optimization: screening focused combinatorial library
R1 = H, CH3, Ph, …
R2 = H, CH3, Ph, …
R3 = H, CH3, Ph, …
Protein - Protein Docking
Protein-Protein Docking in ICM
First demonstration
Global stochastic free-energy optimization with pseudo-Brownian moves and Biased Probability Monte Carlo (JCC, 1994).
Explicit all-atom docking and flexible side-chain refinement
Lysozyme-antibody (Nature SB, 1994)
Beta-lactamase/inhibitor docking challenge (1995, 96)
Grid docking and refinement
24 known protein-protein complexes (Protein Sci. 2002)
Global Grid Docking and refinement
CAPRI docking competition (on-going, 2003 Proteins)
A faster model: Atoms to Grids
Atoms-to-Grids docking
• One molecule is static (receptor) and is represented by grid potentials
• Pros: energy calculation time does not depend on the receptor size; induced fit can be partially approximated by soft grids
• Cons: non-symmetrical, some energy terms have to be adapted/simplified for grid representation.
Atoms-to-Atoms docking:
• Pros: symmetrical, explicit flexibility can be introduced for both molecules
• Cons: extremely time-consuming, scales poorly with the size
Multiple start MC global optimization
Stochastic search is good for local sampling, but diffusion gets slow on a larger scale (d ~ √t)
• Pre-generate starting points spread evenly around receptor and ligand (Fig a)
• Match each starting point on receptor (Nr) with each starting point on ligand (Nl) (Fig b)
• Six rotations around the match axis, for a total of 6 Nr Nl starting configurations
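The enumeration of starting configurations described above is a Cartesian product of receptor points, ligand points and rotations. In this sketch the points are plain labels and the six rotations are assumed to be evenly spaced (60° apart); the slide gives only the count.

```python
from itertools import product

def starting_configurations(receptor_points, ligand_points, n_rotations=6):
    """All (receptor point, ligand point, rotation angle in degrees) combinations."""
    step = 360.0 / n_rotations  # evenly spaced rotations: an assumption
    return [
        (r, l, k * step)
        for r, l, k in product(receptor_points, ligand_points, range(n_rotations))
    ]

# Hypothetical surface points: Nr = 4 on the receptor, Nl = 3 on the ligand
configs = starting_configurations(range(4), range(3))
print(len(configs))  # 6 * Nr * Nl = 72
```

Each configuration would seed its own short Monte-Carlo run, so the slow large-scale diffusion of a single trajectory is replaced by many local searches.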
Optimized scoring of docked solutions
• Scoring function including the grid terms and three ASA-based solvation components - polar, aliphatic and aromatic
• Term contributions weighted:
E = Evw + Eel + Ehb + Epol + Ear + Eal
• For each of the 24 complexes in the benchmark, 6000-12000 docked conformations
• Factors - optimized for best ranking of near-native solutions
ICM docking in CAPRI
Best result in the worldwide Critical Assessment of PRedicted Interactions (CAPRI).
Proteins July 2003
Top prediction used Molsoft’s ICM Protein-Protein Docking procedure
Best Results for 3 targets
A: Target 3, hemagglutinin / Fab
B: Target 6, α-amylase / VHH
C: Target 7, TCR-β / SpeA
Improvement of best rigid body docking solution for Target 6 (in gray) after refinement (in red)
[Figure legend: X-ray structure; predicted ligand]
CAPRI Round 2 and 3 results
• Good models for 8 out of 9 targets
• One failure: T9, large hinge-bending movements
Successfully used new scoring function for T14, T18 & T19
• 64-71% of native contacts
• 0.4-1Å interface RMSD
• For T14, RMSD 0.6Å, rank 1 by energy
• T19: antibody - prion. Used no CDR bias + NMR model for prion.