Automatic Compound Design by Matched Molecular Pairs
DESCRIPTION
Matched Molecular Pairs (MMPs) are pairs of molecules that differ by a single structural transformation. The transformation can correspond to a chemical reaction, but more often it swaps one chemical group for another in a way that is not feasible in a single synthetic step. It is implicitly understood that the static (common) part of an MMP is significantly larger than the variable parts. MMPs are popular among medicinal chemists because the concept closely matches how chemists think about a series of molecules: a series is typically defined as a static core with variable substituents, each of which contributes to overall properties such as potency, solubility and selectivity. MMPs have been used to mine large sets of biological screening results to answer questions like “how much potency is gained by adding a chlorine atom in the para position?”. This analysis can be done at multiple levels, for instance over all occurrences of the transformation, occurrences against a particular target, or occurrences against a target family. For each transformation the average change in potency is recorded, and this can be used to make quantitative predictions. Suppose the potency of only one molecule in a pair is known, and the other molecule is related to the first by a single transformation. The predicted potency of the second molecule is then the potency of the first plus the average potency change associated with the transformation. Herein lies the power of MMPs compared to classic QSAR regression methods: not only can the potency of novel molecules be predicted, but the transformations can also be applied as an idea generator to suggest what the novel molecule(s) should be. The presentation shows how this can be done using the new MMP algorithm in Pipeline Pilot 8.5 and publicly available datasets from ChEMBL.
(Accelrys European Science Symposium, Brussels, June 2012)
TRANSCRIPT
Automatic Compound Design by Matched Molecular Pairs
Willem van Hoorn
Senior Solutions Consultant
Professional Services
Contents
• Matched Molecular Pairs (MMPs)
• Implementation in PP
• Reaction Fingerprints
• Using MMPs as automatic learning machine
Ceci n’est pas une MMP
Sildenafil Vardenafil
Similarity = 0.55 / 0.98 (ECFP_4 / MDL public keys)
MMP:
- Single change
- Typically: 1 or 2 bond cleavage; replace R-group or template
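The "single change" definition can be sketched as a fragment-and-index search: each molecule is split into a constant part and a variable part, and any two molecules that share a constant part but differ in the variable part form an MMP. The sketch below assumes the fragmentation has already been done; the `fragments` list, the molecule ids and the SMILES-like strings are hypothetical, and nothing here is claimed to match the Pipeline Pilot implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical pre-fragmented input: (molecule id, constant part, variable part).
# A cheminformatics toolkit would normally produce these by cutting 1 or 2 bonds.
fragments = [
    ("mol1", "c1ccc(cc1)C(=O)N[*]", "[*]C"),
    ("mol2", "c1ccc(cc1)C(=O)N[*]", "[*]CC"),
    ("mol3", "c1ccc(cc1)C(=O)N[*]", "[*]C1CC1"),
    ("mol4", "c1ccncc1[*]", "[*]Cl"),
]


def find_mmps(fragments):
    """Group molecules by constant part; pairs within a group are MMPs."""
    index = defaultdict(list)
    for mol_id, constant, variable in fragments:
        index[constant].append((mol_id, variable))

    pairs = []
    for members in index.values():
        for (id_a, var_a), (id_b, var_b) in combinations(members, 2):
            if var_a != var_b:  # identical variable parts are not a change
                pairs.append((id_a, id_b, f"{var_a}>>{var_b}"))
    return pairs


mmps = find_mmps(fragments)
for id_a, id_b, transformation in mmps:
    print(id_a, id_b, transformation)
```

Indexing by the constant part avoids comparing every molecule with every other molecule, which is one plausible reason such searches scale to large data sets.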
Recent AZ review
http://pubs.acs.org/doi/abs/10.1021/jm200452d
MMP as predictor of activity
Classic QSAR with full molecule descriptors
QSAR using MMP
DpIC50(m-Br to m-Cl-p-F) = -0.19
Classic QSAR / regression
• More generic, can predict >1 change
• Interpretability varies
MMPs
• Can only predict “one step away from known”
• Very interpretable
• Can answer “what to make next” challenge
What have the MMPs done for us?
“Learning Machine” using MMPs
Example of MMP learning machine
1 → 2 transformation applied to compound 3 should yield more attractive compound 4
MMP in Pipeline Pilot
Components
Protocols
PP 8.5 CU1
PP MMP algorithm based on GSK publication
Test set: EGFR from ChEMBL
Ed Griffen et al., J Med Chem. 2011, 54, 7739-50
- ChEMBL version 11
- 4609 IC50 values
- 3581 compounds
Generate MMPs and transformations
>90k MMPs in <1 minute
Slow!
MMP output
MMP transformation
Full transformation
DpIC50 distribution of transformations
90,343 MMPs yield 180,684 transformations (AB / BA)
10fold 100fold 1000fold etc
bioisosteres
activity cliffs
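The A→B / B→A doubling and the labels on the distribution can be illustrated with a short sketch. Each undirected pair yields two directed transformations with opposite-sign DpIC50, and the size of the change decides whether a transformation looks bioisostere-like or like an activity cliff. The example pairs and both cutoffs (`bioisostere_max`, `cliff_min`) are assumptions for illustration and are not taken from the slides.

```python
# Hypothetical pairs: (variable part A, variable part B, pIC50 of A, pIC50 of B).
pairs = [
    ("[*]Br", "[*]Cl", 7.0, 6.9),
    ("[*]H", "[*]OC", 5.0, 7.5),
]


def directed(pairs):
    """Expand each pair into A>>B and B>>A with opposite-sign DpIC50."""
    out = []
    for frag_a, frag_b, pic50_a, pic50_b in pairs:
        delta = pic50_b - pic50_a
        out.append((f"{frag_a}>>{frag_b}", delta))
        out.append((f"{frag_b}>>{frag_a}", -delta))
    return out


def classify(delta, bioisostere_max=0.3, cliff_min=2.0):
    """Label a transformation by the size of its DpIC50 (assumed cutoffs)."""
    size = abs(delta)
    if size <= bioisostere_max:
        return "bioisostere-like"
    if size >= cliff_min:
        return "activity cliff"
    return "intermediate"


transformations = directed(pairs)
for name, delta in transformations:
    print(f"{name:15s} DpIC50={delta:+.2f}  {classify(delta)}")
```

Because DpIC50 is on a log scale, a value of 1 corresponds to a 10fold change and a value of 2 to a 100fold change.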
MMP transformations vs. full reactions
Not specific enough, seen >>1 in data set but large stddev(DpIC50)
Too specific, seen once in dataset, DpIC50 statistics n=1
Would like to have something that describes “reaction centre + nearby environment”
Would like to increase confidence by looking at similar MMP transformations (with similar DpIC50)
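The specificity trade-off can be made concrete by counting observations and measuring the spread of DpIC50 per transformation. The sketch below is a minimal illustration with invented observations; the `max_sd` threshold and the verdict strings are hypothetical choices.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical observations: (transformation label, DpIC50).
observations = [
    ("[*]H>>[*]Cl", 0.2), ("[*]H>>[*]Cl", 1.5),
    ("[*]H>>[*]Cl", -1.0), ("[*]H>>[*]Cl", 0.9),
    ("[*]OC>>[*]OCC", 0.4), ("[*]OC>>[*]OCC", 0.5), ("[*]OC>>[*]OCC", 0.6),
    ("full_reaction_17", 1.9),
]


def summarise(observations, max_sd=0.5):
    """Per-transformation n, mean and stddev, with a rough usability verdict."""
    grouped = defaultdict(list)
    for transformation, delta in observations:
        grouped[transformation].append(delta)

    summary = {}
    for transformation, deltas in grouped.items():
        n = len(deltas)
        sd = stdev(deltas) if n > 1 else None  # stddev undefined for n=1
        if n == 1:
            verdict = "too specific (n=1)"
        elif sd > max_sd:
            verdict = "not specific enough (large stddev)"
        else:
            verdict = "usable"
        summary[transformation] = {
            "n": n, "mean": mean(deltas), "stddev": sd, "verdict": verdict,
        }
    return summary


summary = summarise(observations)
for name, stats in summary.items():
    print(name, stats)
```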
PP reaction fingerprints: RCFP
• RCFP are similar to ECFP, atoms described by:
– Charge
– Hybridization
– Whether the atom is Reactant or Product
– Whether or not the atom is in the “Reaction Site”
• Need mapped reactions
PP 8.5
Reaction mapping is necessary
Only features describing reaction site
Mapped
All features, no information whether atom is in product or reactant
Unmapped
Reaction direction matters
Reaction fingerprints are not identical: A → B ≠ B → A
MMP transformation as rules
“Rule” = MMP transformation Effect = DpIC50
Context of MMP transformation
Tanimoto search of MMP transformations
DpIC50 = 1.9
A single observation…
DpIC50 = 1.8
DpIC50 = 1.5
DpIC50 = 1.3
… becomes more believable when looking at similars
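A similarity search of this kind can be sketched with the Tanimoto coefficient on sets of fingerprint features. The feature ids below are invented integers standing in for reaction fingerprint bits. Only the three neighbour DpIC50 values (1.8, 1.5, 1.3) come from the slide, while the fourth library entry and the 0.6 threshold are assumptions.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbours(query, library, threshold=0.6):
    """Return (similarity, DpIC50) for library entries above the threshold."""
    hits = [(tanimoto(query, features), delta) for features, delta in library]
    return sorted((h for h in hits if h[0] >= threshold), reverse=True)


# Hypothetical fingerprints; the query is the single observation (DpIC50 = 1.9).
query = {1, 2, 3, 4, 5}
library = [
    ({1, 2, 3, 4, 6}, 1.8),
    ({1, 2, 3, 4, 5, 7}, 1.5),
    ({1, 2, 3, 5, 8}, 1.3),
    ({20, 21, 22}, -0.4),  # unrelated transformation, should be filtered out
]

hits = neighbours(query, library)
support = sum(delta for _, delta in hits) / len(hits)
print(f"{len(hits)} similar transformations, mean DpIC50 = {support:.2f}")
```

If the similar transformations show a comparable DpIC50, the single observation gains support; if they scatter widely, it should be treated with more caution.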
Express significance as Bayesian probability
Bayesian model “Good” molecules: DpIC50 ≥ 1
Rank test set by likelihood that a transformation will yield a ≥10fold increase in potency
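One way to turn fingerprint features into such a ranking is a Laplacian-corrected naive Bayes score, in which each feature contributes log((A + 1) / (T·P + 1)), with A the number of "good" samples containing the feature, T the total number of samples containing it, and P the overall fraction of good samples. Whether this matches the Bayesian learner used in the talk is an assumption; the feature ids and training samples are invented.

```python
import math
from collections import Counter


def train(samples):
    """samples: list of (feature set, is_good). Returns per-feature weights."""
    base_rate = sum(1 for _, good in samples if good) / len(samples)
    seen, seen_good = Counter(), Counter()
    for features, good in samples:
        for feature in set(features):
            seen[feature] += 1
            if good:
                seen_good[feature] += 1
    # Laplacian correction: rare features are pulled towards a weight of zero.
    return {
        f: math.log((seen_good[f] + 1) / (seen[f] * base_rate + 1))
        for f in seen
    }


def score(weights, features):
    """Sum of feature weights; features never seen in training contribute 0."""
    return sum(weights.get(f, 0.0) for f in set(features))


# Hypothetical training set: "good" means DpIC50 >= 1.
training = [
    ({1, 2}, True), ({1, 3}, True), ({1, 6}, True),
    ({4, 5}, False), ({4, 6}, False), ({2, 5}, False),
]
weights = train(training)

test_set = {"t1": {1, 3}, "t2": {4, 5}, "t3": {99}}
ranking = sorted(test_set, key=lambda k: score(weights, test_set[k]), reverse=True)
print(ranking)
```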
Bayes can predict MMP 10 fold increase
• RCFP_6 > RCFP_4
• RCFP_4 >> RCFP_2
Enrichment plots of test set
[Figure: % of Samples (x-axis, 0% to 100%) vs. % Actives Captured (y-axis, 0% to 100%). Curves: Random Model, Perfect Model, dActivity_class_increase_RCFP_2 Model, dActivity_class_increase_RCFP_4 Model, dActivity_class_increase_RCFP_6 Model]
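An enrichment curve of this kind can be computed by ranking the test set by model score and tracking the cumulative share of actives found. The sketch below is generic and uses invented scores and labels; it is not the plotting routine from the talk.

```python
def enrichment_curve(scores, is_active):
    """Return (% of samples, % of actives captured) after each ranked sample."""
    total_active = sum(1 for a in is_active if a)
    if total_active == 0:
        raise ValueError("at least one active is needed")

    # Highest score first, as a model-driven selection would proceed.
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    curve, found = [], 0
    for i, (_, active) in enumerate(ranked, start=1):
        if active:
            found += 1
        curve.append((100.0 * i / len(ranked), 100.0 * found / total_active))
    return curve


# Hypothetical scores and activity labels for four test transformations.
curve = enrichment_curve([0.9, 0.8, 0.1, 0.4], [True, False, False, True])
for pct_samples, pct_actives in curve:
    print(f"{pct_samples:5.1f}% of samples -> {pct_actives:5.1f}% of actives")
```

A random model tends towards the diagonal, while a perfect model captures all actives before any inactive.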
Confidence vs. DpIC50
Bayesian score = confidence
DpIC50
Semi-quantitative Bayesian predictions
• Multi-category Bayesian
• Class = DpIC50 bin
• RCFP_6
Compare:
• Normalised Probability (default)
• #Enrichment
• #EstPGood
• Prediction
#EstPGood score smallest prediction error
22.5% 22.5%
30.0% 19%
MMP vs. Full molecule transformations
vs.
Modelling with mapped reactions works better (it should)
22.5% 30.0%
MMP Idea Generator: Training
• 80% training set
– Generate MMP transformations
– Learn classic regression model (PLS)
– Learn Bayesian model from reaction fingerprints
MMP Idea Generator: Test
• ~5.6 predictions per test set molecule
• MMP pIC50 := mean(pIC50(reactant) + DpIC50(transformation))
• RCFP pIC50 := mean(pIC50(reactant) + DpIC50(predicted by Bayes))
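The averaging step can be sketched as follows: a test molecule may be reachable from several training molecules through different transformations, and each route gives one prediction (reactant pIC50 plus the DpIC50 of the transformation). The predictions are then averaged per product. The product ids and numbers below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical design ideas:
# (product id, pIC50 of the reactant, DpIC50 associated with the transformation).
ideas = [
    ("P1", 6.0, 0.5),
    ("P1", 7.0, -0.3),
    ("P2", 5.5, 1.0),
]


def predict(ideas):
    """Average pIC50(reactant) + DpIC50(transformation) over all routes."""
    by_product = defaultdict(list)
    for product, reactant_pic50, delta in ideas:
        by_product[product].append(reactant_pic50 + delta)
    return {product: mean(values) for product, values in by_product.items()}


predictions = predict(ideas)
for product, pic50 in predictions.items():
    print(f"{product}: predicted pIC50 = {pic50:.2f}")
```

The same function covers the RCFP variant if the third field holds the DpIC50 predicted by the Bayesian model instead of the observed mean for the transformation.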
Runtime ~ 30 min
~34k transformations >6.5M design ideas
Test set
QSAR by MMP
QSAR by Bayes / RCFP_6
SAR by MMP vs. SAR by PLS
ECFP_6 / phys property descriptors
MMP PLS
• MMP predictions nearly as good as PLS predictions
• Not 100% like with like comparison: fewer predictions for MMP
Consensus MMP & PLS predictions
Consensus: 26 / 62
Found by PLS: 10 / 56
Found by MMP: 11 / 56
Red: top 5% by pIC50 (59)
Solid: top 10% (118) by MMP or PLS. Total = 174
12 / 1006
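The counts on this slide can be turned into hit rates with a few lines of arithmetic. The sketch assumes that each "x / y" reads as "x top-5% compounds among y selected compounds" and that "12 / 1006" is the remainder selected by neither method. These readings are assumptions, although the totals (59 hits, 174 selected) agree with the figures on the slide.

```python
# Counts from the slide: (top-5% compounds found, compounds in the group).
groups = {
    "consensus (MMP and PLS)": (26, 62),
    "PLS only": (10, 56),
    "MMP only": (11, 56),
    "neither (assumed remainder)": (12, 1006),
}

total_hits = sum(hits for hits, _ in groups.values())
total_compounds = sum(size for _, size in groups.values())
baseline = total_hits / total_compounds  # hit rate of a random selection

rates = {name: hits / size for name, (hits, size) in groups.items()}
enrichment = {name: rate / baseline for name, rate in rates.items()}

for name in groups:
    print(f"{name:28s} hit rate {rates[name]:6.1%}  enrichment {enrichment[name]:4.1f}x")
```

Under these assumptions the consensus selection has roughly twice the hit rate of either single-method selection, which is consistent with the conclusion that consensus predictions work better than individual ones.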
Conclusions
• For one dataset it has been shown that
– MMP transformations can form the basis of an automatic “Learning Machine”
– Can select “significant rules”
– Consensus MMP/regression activity prediction works better than individual predictions
Spares
MMP vs. Bayes/RCFP predictions