Practical Applications of Matched Molecular Pairs at Vernalis
Steve Roughley, Richard Sherhod
• What are they?
• How do we find them?
• How to deploy for users?
16 March 2017
About Vernalis
• Expertise
  • Fragments and structure‐based drug discovery (Protein Science, Structural Biology, Chemistry)
• Therapeutic areas
  • Oncology, CNS, infectious diseases
• Location
  • Based in Granta Park, outside Cambridge, UK
Trusted community contributor since 2013 (2 KNIME‐trained developers)
Matched Molecular Pairs (MMPs)
“MMP can be defined as a pair of molecules that differ in only a minor single point change”
(Wikipedia)
• Multiple open‐source implementations in various forms
• At least 2 in KNIME
  • Vernalis
  • Erlwood
• Recently reviewed
  • Christian Tyrchan and Emma Evertsson, Comput. & Struct. Biotech. J., 2017, 15, 86‐90
Definition
[Figure: example matched pair, CHEMBL2263252 and CHEMBL60592]
Anatomy of a Matched Molecular Pair
• Hussain‐Rea algorithm (ref. 3)
• Identify bonds that can be broken
  • E.g. acyclic bonds
• Break molecule along each matching bond in turn
• Match identical ‘Keys’
• ‘Values’ form pair transforms
[Figure: Molecule A and Molecule B are each cut at one bond; the identical keys are matched and the values form the pair transform]
3. Jameed Hussain, Ceara Rea, J. Chem. Inf. Model., 2010, 50, 339–348
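The key-matching step above can be sketched in a few lines of plain Python. This is only an illustration, not the Vernalis node code: the `fragmentations` table is hypothetical, it assumes each single cut has already been written out as a (key, value) pair of canonical SMILES with `[*:1]` marking the broken bond, and it ignores the size limits a real implementation would place on the value.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical pre-computed single-cut fragmentations:
# molecule ID -> list of (key, value) SMILES, "[*:1]" marks the broken bond.
fragmentations = {
    "mol_A": [("[*:1]c1ccccc1", "[*:1]Cl"), ("[*:1]Cl", "[*:1]c1ccccc1")],
    "mol_B": [("[*:1]c1ccccc1", "[*:1]Br"), ("[*:1]Br", "[*:1]c1ccccc1")],
    "mol_C": [("[*:1]c1ccncc1", "[*:1]Cl"), ("[*:1]Cl", "[*:1]c1ccncc1")],
}

def find_pairs(frags):
    """Index every fragmentation by its key, then pair up differing values."""
    by_key = defaultdict(list)
    for mol_id, key_values in frags.items():
        for key, value in key_values:
            by_key[key].append((mol_id, value))

    pairs = []
    for key, members in by_key.items():
        for (id_a, val_a), (id_b, val_b) in combinations(members, 2):
            if val_a != val_b:  # identical values would be the same molecule
                pairs.append((id_a, id_b, key, f"{val_a}>>{val_b}"))
    return pairs

for pair in find_pairs(fragmentations):
    print(pair)
```

Indexing by key means molecules are only compared when they share a key, rather than all against all.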
[Figure: the same single‐cut example shown for CHEMBL2263252 and CHEMBL60592]
Multi‐cut pairs
• Pairs can also be formed by cutting 2 or more bonds simultaneously
• Allows scaffold replacement transforms
• Need to track which breaking bond is which
[Figure: a double cut with numbered attachment points (1*, 2*); identical keys are matched and the values form the pair transform]
[Figure: double‐cut example, CHEMBL373838 and CHEMBL1368873, with the identical keys highlighted]
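One way to keep track of "which breaking bond is which" is to renumber the attachment points into a canonical order before comparing keys. The sketch below is an assumption about how this could be done, not a description of the Vernalis nodes: it uses `[*:n]` labels (the slides draw them as 1*, 2*), and `canonicalise` is a hypothetical helper name.

```python
import re

LABEL = re.compile(r"\[\*:(\d+)\]")

def canonicalise(key, value):
    """Renumber attachment points so that equal keys compare equal as strings.

    The key components are sorted with their labels masked, labels are then
    reassigned 1, 2, ... in that order, and the same mapping is applied to
    the value so the bond correspondence is preserved.
    """
    parts = sorted(key.split("."), key=lambda part: LABEL.sub("[*]", part))
    mapping = {}
    for part in parts:
        for old in LABEL.findall(part):
            mapping.setdefault(old, str(len(mapping) + 1))

    def relabel(match):
        return f"[*:{mapping[match.group(1)]}]"

    new_key = ".".join(LABEL.sub(relabel, part) for part in parts)
    return new_key, LABEL.sub(relabel, value)

# Two molecules whose cuts happened to be numbered in opposite order.
key_a, value_a = canonicalise("[*:1]C.[*:2]c1ccccc1", "[*:1]C(=O)N[*:2]")
key_b, value_b = canonicalise("[*:2]C.[*:1]c1ccccc1", "[*:2]S(=O)(=O)N[*:1]")
print(key_a == key_b, f"{value_a}>>{value_b}")
```

Key components that are identical once the labels are masked remain ambiguous in this sketch; symmetric cases would need a tie-break based on the value.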
Transforms application
Original molecules interconvert when transform is applied
[Figure: transform A>>B converts Molecule A (CHEMBL2263252) into Molecule B (CHEMBL60592)]
Other molecules generate new ideas
[Figure: A>>B applied to CHEMBL1350874; the product is not found in ChEMBL]
Transform takes no account of relevance or context
DESTRUCTION TESTING
Can we fragment all of ChEMBL?
A ‘reasonable’ ‘representative’ test set
Pre‐processing – “Speedy SMILES”
• Fast string‐based SMILES manipulation
  • No chemical toolkit conversion
  • e.g. c1cc[nH]c1C(=O)OCCN(C)C
• Streamable
• Example application: pre‐processing ChEMBL
  • De‐salt
  • Remove large (heavy atom count, HAC > 40) or small (HAC < 8) molecules
  • and those with a net charge or large number of charges
Vernalis Community Nodes
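To show what toolkit-free SMILES handling can look like, here is a minimal sketch of the de-salt and filter steps using only regular expressions. It is not the SpeedySMILES implementation: the function names are hypothetical, the tokeniser covers only bracket atoms and the organic subset, and "number of charges" is assumed to mean the number of charged atoms. The slides quote both 40 and 50 as the upper heavy-atom limit, so the cut-offs are left as parameters.

```python
import re

# Bracket atoms first, then two-letter halogens, then one-letter organic-subset atoms.
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
EXPLICIT_H = re.compile(r"\[\d*H[+-]*\d*\]")  # [H], [2H], [H+] are not heavy atoms
CHARGE = re.compile(r"(\++|-+)(\d*)$")

def atoms(smiles):
    return ATOM.findall(smiles)

def heavy_atom_count(smiles):
    return sum(1 for atom in atoms(smiles) if not EXPLICIT_H.fullmatch(atom))

def atom_charge(atom):
    """Formal charge of one atom token; only bracket atoms can carry a charge."""
    if not atom.startswith("["):
        return 0
    body = atom[1:-1].split(":")[0]  # drop any atom-class suffix
    match = CHARGE.search(body)
    if match is None:
        return 0
    signs, digits = match.groups()
    size = int(digits) if digits else len(signs)
    return size if signs[0] == "+" else -size

def desalt(smiles):
    """Keep the dot-separated component with the most heavy atoms."""
    return max(smiles.split("."), key=heavy_atom_count)

def preprocess(smiles, min_hac=8, max_hac=40, max_charged_atoms=4):
    """Return the de-salted SMILES, or None if the molecule is filtered out."""
    parent = desalt(smiles)
    charges = [atom_charge(atom) for atom in atoms(parent)]
    if not min_hac <= heavy_atom_count(parent) <= max_hac:
        return None
    if sum(charges) != 0 or sum(1 for c in charges if c) > max_charged_atoms:
        return None
    return parent

def stream(smiles_iter, **limits):
    """Lazily yield surviving molecules one row at a time."""
    for smiles in smiles_iter:
        kept = preprocess(smiles, **limits)
        if kept is not None:
            yield kept

print(list(stream(["c1cc[nH]c1C(=O)OCCN(C)C.Cl", "CCO", "C[N+](C)(C)CCCCCC"])))
```

Because every step works on the string alone, each row can be processed independently, which is what makes the approach streamable.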
SpeedySMILES pre‐processing ChEMBL
• 1,581,653 molecules in
• Processed in 76 seconds
• 1,486,077 molecules out
Failure Category      Count
HAC < 8 or > 50       63,920
Non‐neutral           30,706
Total Charges > 4     949
Broken Bonds          1
ChEMBL fragmentation
The numbers…
• 1,420,462 molecules fragmented
  • 1‐10 cuts
  • Non‐functional group single bonds
  • Maximum 10,000 fragmentations / molecule
• 134,020,007 fragments
• 10h30min (Intel® Core™ i7‐4770 @ 3.4GHz; W10)
  • 10 threads; 500 rows buffer
  • ‐Xmx16329m
• 139,679 failed rows:
Failure category          Count
Complexity limit          139,550
Too few matching bonds    79
Valence error             25
Kekulisation error        25
This version will be released to the community ‘imminently’
Example failure rows
• CHEMBL2297882: Molecule failed complexity limit (15965 possible fragmentations)
• CHEMBL1698868: No matching bonds found, or too few to cut
• CHEMBL178180: Error parsing … Explicit valence for atom # 8 Te, 4, is greater than permitted
• CHEMBL3188982: Error parsing … OC(=O)C(=O)Nc1cccc(c1)c2nnnn2 … Unkekulized atoms
• CHEMBL2006679: Error parsing … Unkekulized atoms 4
[Figure: structures of the failing molecules]
New Vernalis Matching Bonds Renderer Node
Matched Molecular Pairs (MMPs)
• MMP concept has been extended to improve utility
• Data analysis
  • What effect does a transform have on data, e.g. Metabolism/Stability, hERG binding, target binding?
• Matched Molecular Series
  • When my series is seen in activity order, what other new members are commonly ‘better’?
• Fingerprint similarity
  • How closely related is the surrounding chemical matter to my input molecule?
• 3D Matched Pairs
  • Molecular shape/pharmacophore presentation
• In all cases, provides additional ‘context’ to the pairs
• Can be used for substituent analysis/replacement or scaffold replacement
MMAnalyser: Applied MMP/S Analysis
Richard Sherhod
MMAnalyser
• KNIME Web Portal application
  • Composed of multiple interactive KNIME workflows
• Allows chemists to do matched‐molecular pair/series analysis
• Guides users through the analysis process
MMAnalyser: Application
• Two interactive workflows for chemists
  • MMPair Analyser: matched‐molecular pair analysis
  • MMSeries Analyser: matched‐molecular series analysis
• Interactive admin workflows for database maintenance
  • Rebuilding MMPair and MMSeries databases from pre‐defined sources
Kenny, P.W. & Sadowski, J., 2005. Structure Modification in Chemical Databases. In Wiley‐VCH Verlag GmbH & Co. KGaA, pp. 271–285. Available at: http://doi.wiley.com/10.1002/3527603743.ch11 [Accessed March 6, 2017].
Wawer, M. & Bajorath, J., 2011. Local Structural Changes, Global Data Views: Graphical Substructure−Activity Relationship Trailing. Journal of Medicinal Chemistry, 54(8), pp.2944–2951. Available at: http://pubs.acs.org/doi/abs/10.1021/jm200026b [Accessed March 6, 2017].
MMPair Analyser: Application
Pre‐generated transformations with observation data
Input structure
MMP Analysis
• All transforms applied to the input structure
• Results filtered and sorted by:
  • Observation count
  • Enrichment of positive observations
MMPair Analyser: Database
ChEMBL data
• Molecule dictionary
• Compound structures
• Compound properties (QED)
• Compound records
• Activities
• Assays
• Documents
Observations
Filtered structures
Generate MMPs
1. Fragment structures
   • Fragments (values) and scaffolds (keys)
2. Add hydrogens to fragments and scaffolds
3. Generate transformations from fragments
   • Record ID of left and right structures
Gather evidence
1. Filter transformations by observation count
2. Get observed changes in property for each transformation
3. Calculate enrichment of positive observations
4. Perform one‐tailed binomial test
   • Keep transforms with p >= 0.05
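The "gather evidence" steps can be illustrated with a small standard-library sketch. Several details here are assumptions rather than anything stated on the slide: enrichment is taken to be the fraction of observations with a positive property change, the null hypothesis is an even chance of a positive change (`p0 = 0.5`), zero changes count as non-positive, and `min_count` is a hypothetical observation-count threshold. The sketch only reports the p-value; the retention rule is left to the caller, as given on the slide.

```python
from math import comb

def binomial_p_one_tailed(k, n, p0=0.5):
    """P(X >= k) for X ~ Binomial(n, p0): chance of at least k positives under the null."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

def summarise_transform(deltas, min_count=5):
    """Summarise the observed property changes recorded for one transformation."""
    n = len(deltas)
    if n < min_count:  # step 1: filter by observation count
        return None
    positives = sum(1 for delta in deltas if delta > 0)
    return {
        "count": n,
        "enrichment": positives / n,                       # step 3
        "p_value": binomial_p_one_tailed(positives, n),    # step 4
    }

# Hypothetical changes in a property for ten pairs sharing one transform.
print(summarise_transform([0.4, 0.1, 0.7, 0.2, -0.3, 0.5, 0.9, 0.3, 0.6, 0.2]))
```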
MMPair Analyser: Demo
MMSeries Analyser
• Extension of MMPs to three or more R‐groups (values)
  • Originally proposed by Wawer & Bajorath (2011)
  • Several implementations, e.g. MATSY (O’Boyle et al. 2014)
• R‐groups ordered by the properties of their parent
[Figure: two scaffolds carrying the same set of R‐groups, with measured pIC50 values]
pIC50 (scaffold 1)    R     pIC50 (scaffold 2)
7.00                  H     8.30
7.68                  F     8.00
8.51                  Cl    7.77
8.77                  Br    8.89
Order on scaffold 1: Br > Cl > F > H
Order on scaffold 2: Br > H > F > Cl
O’Boyle, N.M. et al., 2014. Using Matched Molecular Series as a Predictive Tool To Optimize Biological Activity. Journal of Medicinal Chemistry, 57(6), pp.2704–2713. Available at: http://pubs.acs.org/doi/abs/10.1021/jm500022q [Accessed March 3, 2017].
Wawer, M. & Bajorath, J., 2011. Local Structural Changes, Global Data Views: Graphical Substructure−Activity Relationship Trailing. Journal of Medicinal Chemistry, 54(8), pp.2944–2951. Available at: http://pubs.acs.org/doi/abs/10.1021/jm200026b [Accessed March 6, 2017].
Matched‐molecular Series Analysis
MMSeries Analyser: Application
Input structures with unique IDs and numeric data
Pre‐generated sets of scaffolds and R‐groups with data
MMS Analysis
• Query structures are fragmented into scaffolds and R‐groups
• Sets of R‐groups, their scaffold and data are arranged into series
• Query series are compared to pre‐generated series
• Common R‐groups are recorded
• Query and database series ordered by data
• Spearman's rank correlation between matching series calculated
• Matching series sorted by rank correlation
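The series comparison can be sketched with a small Spearman implementation, computed as the Pearson correlation of average ranks so that ties are handled. The two series below reuse the H/F/Cl/Br pIC50 values from the MMSeries Analyser slide; treating one as the "query" and the other as the "database" series is purely illustrative, and the sketch assumes neither series is constant.

```python
def ranks(values):
    """1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)  # assumes neither series is constant

query = {"H": 7.00, "F": 7.68, "Cl": 8.51, "Br": 8.77}
database = {"H": 8.30, "F": 8.00, "Cl": 7.77, "Br": 8.89}

common = sorted(set(query) & set(database))  # record the common R-groups
rho = spearman([query[r] for r in common], [database[r] for r in common])
print(common, round(rho, 3))
```

For these two series the rank differences are 2, 0, 2 and 0, so the classical formula gives 1 − 6·8/(4·15) = 0.2: both scaffolds rank Br best, but the other three R‑groups are ordered differently.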
MMSeries Analyser: Application
[Figure: MMSeries Analyser results view listing matching scaffolds with their rank correlation]
MMSeries Analyser: Database
ChEMBL data
• Molecule dictionary
• Compound structures
• Compound properties (MW)
• Compound records
• Activities
• Assays
• Documents
Observations
Filtered structures
Generate MMPs
1. Fragment structures into R‐groups and scaffolds
   • 4 methods including Hussain/Rea rules
2. Group R‐groups into series by common parent scaffolds
3. Keep series of 3 or more R‐groups
4. Record IDs of parent structures
Gather evidence
• Associate parent structures with observation data
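Steps 2 to 4 amount to a group-by on the scaffold. A minimal sketch, assuming the fragmentation step has already produced hypothetical `(parent_id, scaffold, r_group)` rows with canonical SMILES strings (none of these names come from the actual workflow):

```python
from collections import defaultdict

# Hypothetical fragmentation output: (parent ID, scaffold, R-group).
rows = [
    ("cpd_1", "[*:1]c1ccccc1", "[*:1][H]"),
    ("cpd_2", "[*:1]c1ccccc1", "[*:1]F"),
    ("cpd_3", "[*:1]c1ccccc1", "[*:1]Cl"),
    ("cpd_4", "[*:1]c1ccccc1", "[*:1]Br"),
    ("cpd_5", "[*:1]c1ccncc1", "[*:1]F"),
    ("cpd_6", "[*:1]c1ccncc1", "[*:1]Cl"),
]

def build_series(rows, min_size=3):
    """Group R-groups by common scaffold and keep series with min_size or more members."""
    by_scaffold = defaultdict(dict)
    for parent_id, scaffold, r_group in rows:
        # Record the parent ID so observation data can be joined on later.
        # If the same R-group occurs twice on one scaffold, the last row wins.
        by_scaffold[scaffold][r_group] = parent_id
    return {s: members for s, members in by_scaffold.items() if len(members) >= min_size}

for scaffold, members in build_series(rows).items():
    print(scaffold, members)
```

Keeping the parent IDs rather than the data itself mirrors the separate "gather evidence" step, where observations are attached afterwards.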
MMSeries Analyser: Demo
MMAnalyser: Possible Improvements
• Present observation data to the user
  • Direct the user to relevant source material
• More datasets
  • Better (more robust) processing
• Incorporate transformation site similarity (MMPs)
  • Associate transformations with fingerprints from their parent scaffold(s)
• Incorporate scaffold similarity (MMSs)
  • Filter/order series by similarity to query scaffold
Acknowledgments
• Vernalis colleagues
• Greg Landrum (RDKit)
• ChEMBL
• KNIME
Matched‐molecular series implementation:
Hunt, P. et al., 2017. Practical applications of matched series analysis: SAR transfer, binding mode suggestion and data point validation. Future Medicinal Chemistry, 9(2), pp.153–168. Available at: http://www.future-science.com/doi/10.4155/fmc-2016-0203 [Accessed March 3, 2017].
Thank you!
BACKUP SLIDES
Stereochemistry
• Fragmentation can create new chiral centres / double bond geometries
• Existing absolute/unknown must be preserved
[Figure: 2×2 grid of "Known stereocentre?" (Yes/No) against "Unknown/racemic stereocentre?" (Yes/No), built up over successive slides]
• “KNOWN KNOWNS”: We know we know about stereochemistry
• “KNOWN UNKNOWN”: We know unknown or racemic
• “UNKNOWN UNKNOWN”: We have no idea…
Sneak Preview – Upcoming revised release
• Memory leak fixed
• Survives ‘destruction testing’
• New fragmentation type added
  • More flexibility for custom types
• New parallelised pair generation nodes
  • Transform filtering options
  • Reference table version
    • Only generates pairs between rows from the two tables
• Improved Rendering/Filtering nodes
Attachment point fingerprints
• RDKit Morgan ECFP‐like fingerprint
• Rooted at the attachment point atom for each ‘key’ component
• Default radius 4, size 2048 bit
• Calculated during fragmentation in Vernalis Nodes
Example 32‐bit AP fingerprints
1: 10000000000100000000010100000100
1: 00001000000010000000100000000101
1: 00000000000010000000000100000100
2: 10000000000001000000001100000100
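Fingerprints like these are typically compared with a similarity coefficient. The slides do not say which one is used, so the sketch below assumes Tanimoto similarity over the bit strings exactly as they are printed above; `tanimoto` is a hypothetical helper, not a Vernalis node function.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two equal-length bit strings such as '0101'."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a, b = int(fp_a, 2), int(fp_b, 2)
    union = bin(a | b).count("1")
    if union == 0:  # two empty fingerprints: define the similarity as 0
        return 0.0
    return bin(a & b).count("1") / union

# Two of the 32-bit attachment point fingerprints shown on the slide.
ap_1 = "10000000000100000000010100000100"
ap_2 = "00001000000010000000100000000101"
print(tanimoto(ap_1, ap_2))
```

A score of this kind could then serve as the "context" measure for ranking or filtering transforms by how similar their attachment environments are.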