Data-driven approach to medicinal chemistry
TRANSCRIPT
Data-Driven Approaches to Medicinal Chemistry How Large-Scale Normalized Data Empowers Drug Discovery
RICT 2015
Drug Discovery and Selection
Barberan Olivier
Senior Product Manager Reaxys Medicinal Chemistry
July 1
The Lead Optimization Challenge: optimization of early substances into potential drugs
• Potency & selectivity
• DMPK properties
• Physical properties
• Safety pharmacology
Opportunity: knowledge-driven drug design using a structure-activity relationship knowledge base
• General descriptor-property relationships
• Sub-structural alerts
• QSAR
• Matched Molecular Pair analyses
• Predictive pharmacology
Etc…
Cumming, J.G., Davis, A.M. et al. Nat. Rev. Drug Disc. (2013) 12, 948–962
Data → Knowledge → Predictions
• Data normalization
• Taxonomies
• Quality control
Etc…
“Those who cannot remember the past are condemned to repeat it.” George Santayana: Life of Reason, Reason in Common Sense, Scribner's, 1905, page 284
• Integration of high-value data sources supporting lead finding and lead optimization
Elsevier Solution for Lead Optimization: Reaxys Medicinal Chemistry
Extract, Transform, Load
• Substances : 1M
• Biological results : 3.5 M
• Substances : 3.5 M
• Biological results : 8 M
• Substances : 4.2 M
• Biological results : 22 M
• Substances : 6 M
• Biological results : 29 M
Data normalization (parameters, units, etc.)
Structure normalization
Taxonomies (targets, species, cell lines, tissues/organs, bioassays)
Reaxys Medicinal Chemistry Coverage
Substances
Chemical structure, name, code, synonyms of the compound, calculated physicochemical properties (log P, HBA, HBD, PSA, RotB), Lipinski rules
Druggable target
Explore Target affinity patterns of chemical compounds
In vitro and Cell Based assays
In vitro assays (binding, second messenger, etc.) and cell-based assays, for example: aggregation, angiogenesis, apoptosis, cell differentiation, cell cycle
Animal models of disease
Zucker rats as an obesity model, ovariectomized rats for osteoporosis, treatment of glaucoma, tumor-xenografted animals to test antineoplastic drugs
Pharmacokinetic and ADME Properties
Metabolic stability, intrinsic clearance, elimination half-life, bioavailability, in vivo clearance
Toxicity
Cytotoxicity, cardiotoxicity, chronic toxicity
Reaxys Medicinal Chemistry : Journals coverage
• 345,000 articles are included in Reaxys Medicinal Chemistry
• These correspond to >5000 journals from 1980 to the present
• Some articles stored in Reaxys Medicinal Chemistry are older than 1980
• Elsevier and other publishers are covered
• Medicinal chemistry journals are the cornerstone of Reaxys Medicinal Chemistry, but pharmacology, biology and chemistry journals are also included
Chemical diversity per target: JAK3 and NPY5

JAK3
Source                 Substances   Diversity (B&M scaffolds)
RMC (patents only)     43365        16045
RMC (articles only)    3283         2199
RMC                    45715        17828
ChEMBL                 2443         1490

NPY5
Source                 Substances   Diversity (B&M scaffolds)
RMC (patents only)     12698        5700
RMC (articles only)    2537         1014
RMC                    14544        5963
ChEMBL                 1483         652

Chart annotations: 95%, +914%, +1196%, 90%
- Patents increase the chemical diversity by around 1000% versus articles only
- Patents represent around 90% of the overall chemical diversity
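Some of these percentages can be checked directly from the scaffold counts. The sketch below is only an illustration: the counts are copied from the slide, but which ratio each chart annotation refers to is not recoverable from the transcript, so the two ratios computed here (patent share of all RMC scaffolds, and RMC scaffolds relative to ChEMBL) are assumptions about what is worth comparing.

```python
# B&M scaffold counts copied from the slide's table.
scaffolds = {
    "JAK3": {"patents": 16045, "articles": 2199, "rmc": 17828, "chembl": 1490},
    "NPY5": {"patents": 5700, "articles": 1014, "rmc": 5963, "chembl": 652},
}


def patent_share(counts):
    """Scaffolds found in patents, as a percentage of all RMC scaffolds."""
    return 100.0 * counts["patents"] / counts["rmc"]


def ratio_to_chembl(counts):
    """RMC scaffold count expressed as a percentage of the ChEMBL count."""
    return 100.0 * counts["rmc"] / counts["chembl"]


for target, counts in scaffolds.items():
    print(
        f"{target}: patents cover {patent_share(counts):.1f}% of RMC scaffolds; "
        f"RMC holds {ratio_to_chembl(counts):.0f}% of the ChEMBL scaffold count"
    )
```

For both targets the patent share comes out at roughly 90% or more, which is in line with the slide's second conclusion.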
Putting Data to Work | 10
Ligand-Based Virtual Screening Using Reaxys Medicinal Chemistry
Objective
• Describe an In Silico Screening approach using Reaxys Medicinal Chemistry
Case Study on T-Type calcium channels
Ligand-Based In Silico Screening
Simple Target name search returns all results
Filter on active compound pX>7
Result: 130 compounds and 1200 experimental data points
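A minimal sketch of the pX > 7 filter, assuming pX is the negative decimal logarithm of the activity value in mol/L (so pX > 7 means better than 100 nM). The record layout and compound names are hypothetical and are not the Reaxys export format.

```python
import math


def to_px(value_nm):
    """Convert an activity value in nM (IC50, Ki, ...) to pX = -log10(mol/L)."""
    return 9.0 - math.log10(value_nm)


# Hypothetical records: (compound name, activity in nM).
records = [("cpd-A", 12.0), ("cpd-B", 850.0), ("cpd-C", 3.5)]

# Keep only compounds with pX > 7, i.e. activity below 100 nM.
actives = [(name, round(to_px(nm), 2)) for name, nm in records if to_px(nm) > 7]
print(actives)
```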
Ligand-Based In Silico Screening
130 Query structures
Flat file
Representation & Chemical Space Molecular descriptors & Fingerprints
Virtual Screening Pharmacophoric Similarity
314 Hits
"Drug-like" Filtering
1. Molecular diversity and chemical originality 2. Compounds availability
39 compounds ordered for testing
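The screening step can be illustrated with a simple similarity search. The slide names pharmacophoric similarity; the sketch below swaps in the more basic Tanimoto coefficient on fingerprint bit sets, and the threshold, bit sets and compound names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def screen(queries, library, threshold=0.7):
    """Return library compounds whose best similarity to any query meets the threshold.

    queries: list of fingerprint sets (e.g. the 130 known actives)
    library: dict mapping compound name -> fingerprint set
    """
    hits = {}
    for name, fingerprint in library.items():
        best = max(tanimoto(fingerprint, query) for query in queries)
        if best >= threshold:
            hits[name] = best
    # Most similar compounds first.
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data.
queries = [{1, 2, 3, 4}]
library = {"lib-1": {1, 2, 3, 4}, "lib-2": {1, 2, 3, 9}, "lib-3": {7, 8}}
print(screen(queries, library, threshold=0.5))
```

In a real workflow the hit list would then go through the drug-likeness, diversity and availability filters listed on the slide.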
FEATURES
The Reaxys Medicinal Chemistry Flatfile
• Substance information (~26 million substances)
  – The substances are delivered as a series of SD files containing all structures from Reaxys in Molfile format, together with their identification data and a list of available facts and reactions for each compound
  – Substances without a structure are included as empty Molfiles
• Bioactivity data (> 29 million bioactivity data points)
  – The bioactivity data are delivered as a series of linked data files in XML format, using the Resource Description Framework (RDF), compliant with the OpenPHACTS guidelines
  – The XML files contain information on bioassays, citations, bioactivity data points, substance facts and bioactivity targets
  – This includes pharmacokinetic and ADME property data and toxicity data
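As an illustration of how such SD files could be read, here is a minimal pure-Python sketch that splits SD-file text into records, reads the data fields, and flags empty Molfiles by their atom count. It assumes V2000 Molfiles; the field name `SUBSTANCE_ID` and the sample records are hypothetical and do not describe the actual Reaxys flat-file layout.

```python
def parse_record(lines):
    """Parse one SD record into its atom count and data fields."""
    end = next((i for i, l in enumerate(lines) if l.startswith("M  END")), len(lines) - 1)
    mol = lines[: end + 1]
    fields, name = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            # Data header looks like "> <FIELD_NAME>".
            name = line[line.find("<") + 1: line.rfind(">")]
            fields[name] = []
        elif line.strip() and name is not None:
            fields[name].append(line.strip())
        else:
            name = None  # a blank line ends the current data item
    # V2000 counts line is the 4th Molfile line; columns 1-3 hold the atom count.
    n_atoms = int(mol[3][:3]) if len(mol) > 3 and mol[3][:3].strip() else 0
    return {"n_atoms": n_atoms, "fields": {k: "\n".join(v) for k, v in fields.items()}}


def parse_sdf(text):
    """Split SD-file text on the '$$$$' record terminator."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(parse_record(current))
            current = []
        else:
            current.append(line)
    return records


# Hypothetical two-record sample: an empty Molfile and a single carbon atom.
sample = "\n".join([
    "unstructured substance", "  hypothetical-writer", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END", "> <SUBSTANCE_ID>", "12345", "", "$$$$",
    "methane", "  hypothetical-writer", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END", "> <SUBSTANCE_ID>", "67890", "", "$$$$",
])

for record in parse_sdf(sample):
    status = "no structure" if record["n_atoms"] == 0 else f"{record['n_atoms']} atom(s)"
    print(record["fields"]["SUBSTANCE_ID"], status)
```

For production use a cheminformatics toolkit would be the more robust choice; the point here is only that the substance files are plain text with a simple record structure.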
Substance information
Biological activity
Electrophysiology experiments: Screening @10 µM on Cav3.2 T-Type channels
[Figure: bar chart of peak current inhibition (%, 0 to 100) versus compound number (1 to 39), each compound tested at 10 µM]
• 9 compounds with > 75% inhibition
• 15 compounds with > 50% inhibition
Prediction of ADMET properties: influencing medicinal chemistry design
Step 1
• logD7.4
• Protein binding
• Solubility
• Metabolic stability
• hERG
• Etc.
Step 2
• Rat PPB
• Hu heps
• CYP inhib
• Caco2
• NaV1.5
• Etc.
Step 0
• logD7.4
• Solubility
• Protein binding
• hERG
• Rat PPB
• Metabolic stability
• CYP inhib
• Caco2
• Etc.
AstraZeneca’s global hERG QSAR model has contributed to the reduction in the synthesis of ‘red flag’ compounds (compounds that are measured to have a hERG potency of <1 μM) from 25.8% of all compounds tested in 2003 to only 6% in 2010.
Cumming, J.G., Davis, A.M. et al. Nat. Rev. Drug Disc. (2013) 12, 948–962
Case Study: Solubility Modeling
Overview
• Reaxys holds a large amount of literature-reported data on compounds
• The Reaxys API gives access to these data for building predictive models
• This example uses aqueous solubility as reported in the literature. This is not the intrinsic solubility of the neutral molecule but whatever the authors reported, generally at neutral pH
• Every value carries a reference that can be read to verify where the value came from
Model Making Process
• Extracted 6893 aqueous solubilities reported in Reaxys in g/L
  – Converted reported values to molarity using molecular weight and computed logS
  – Averaged values when a compound had multiple reports
• Created a KNIME workflow to do the analysis
• Used CDK molecular descriptors and R to build a simple solubility model
  – Simple multiple regression model ("lm" in R)
• Also created a model that reports the solubility of the most similar compound
  – This has been reported to be surprisingly effective
  – This works well for Reaxys because of the large number of compounds with solubilities
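The data preparation and the "most similar compound" model can be sketched in a few lines. This is an illustration only, not the KNIME workflow from the talk: averaging is done on the log scale (the slide does not say which scale was used), similarity is a Tanimoto coefficient on hypothetical fingerprint bit sets, and all compound data are made up.

```python
import math
from collections import defaultdict


def log_s(solubility_g_per_l, mol_weight):
    """log10 of molar solubility: (g/L) / (g/mol) = mol/L."""
    return math.log10(solubility_g_per_l / mol_weight)


def average_log_s(reports):
    """Average logS per compound; reports are (compound_id, g/L, molecular weight)."""
    by_compound = defaultdict(list)
    for compound_id, g_per_l, mol_weight in reports:
        by_compound[compound_id].append(log_s(g_per_l, mol_weight))
    return {cid: sum(vals) / len(vals) for cid, vals in by_compound.items()}


def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def nearest_neighbour_log_s(query_fp, training):
    """Predict logS as the value of the most similar training compound.

    training: list of (fingerprint set, logS) pairs.
    """
    best_fp, best_log_s = max(training, key=lambda item: tanimoto(query_fp, item[0]))
    return best_log_s


# Hypothetical reports: compound "x" was measured twice.
reports = [("x", 1.0, 100.0), ("x", 10.0, 100.0), ("y", 0.2, 200.0)]
averaged = average_log_s(reports)
training = [({1, 2, 3}, averaged["x"]), ({7, 8, 9}, averaged["y"])]
print(averaged, nearest_neighbour_log_s({1, 2, 4}, training))
```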
Relevant data where and when they are needed
ELSS content integrated into the existing environment of tools and processes
Input
– Set of compound structures
– List of target names
– Patent numbers
Search, retrieve & process element
– Script, PipelinePilot or KNIME node
Output
– Bioactivity values
– Compound structures
– Chemical properties
Further processing
– Visualisation, Spotfire input
– Reporting, dashboard production
– Excel tables
– QSAR/QSPR modeling
– Hit-to-lead optimization
– Reaction modeling
– Text mining
KNIME workflow for Solubility Modeling
KNIME Workflow
• The same workflow can be created in PipelinePilot
• The workflow can auto-update with new data in Reaxys, since it pulls directly from the server
Solubility modeling : Predicted-vs-Actual
• Predicted vs actual log(S [M])
• Could filter for a “better” subset of compounds
• More scatter than in recent work; within this framework, various descriptors can be tried to find those that work best
Residual standard error: 1.253 on 3437 degrees of freedom
Multiple R-squared: 0.5728, Adjusted R-squared: 0.5636
F-statistic: 62.29 on 74 and 3437 DF, p-value: < 2.2e-16
• Recent work of Yalkowsky, 1642 selected compounds.
• Used group-contribution methods
• Std error 0.8 log units
• Int J Pharm. 2008 Aug 6;360(1-2):122-47. doi: 10.1016/j.ijpharm.2008.04.028
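The figures in the R summary on this slide are internally consistent, which can be verified with the standard formulas for adjusted R-squared and the F-statistic. The sketch below uses only the numbers printed on the slide; the number of observations is inferred as residual degrees of freedom + predictors + 1.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared for a linear model with an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def f_statistic(r2, n_predictors, resid_df):
    """Overall F-statistic computed from R-squared."""
    return (r2 / n_predictors) / ((1.0 - r2) / resid_df)


r2, n_predictors, resid_df = 0.5728, 74, 3437
n_obs = resid_df + n_predictors + 1  # 3512 compounds entered the fit

print(f"adjusted R2 = {adjusted_r2(r2, n_obs, n_predictors):.4f}")
print(f"F = {f_statistic(r2, n_predictors, resid_df):.2f}")
```

The adjusted R-squared reproduces the reported 0.5636; the F value lands within a few hundredths of the reported 62.29, the small gap coming from R-squared being rounded to four decimals.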
Conclusions
• One can use the valuable properties reported in Reaxys for creating models
• Much more biological information available in Reaxys Medicinal Chemistry!
• All sources are referenced
• The API allows easy access to the data outside of the web user interface for models
• One can make several kinds of models and show all results, or make a consensus determination
Case Study: Which 5-HT2A antagonists have low affinity for the hERG channel?
• 5-HT2A receptor antagonism contributes to the therapeutic effect of several clinically effective and potential atypical antipsychotics, as well as several antidepressants.
• The ability of selective 5-HT2A receptor antagonists to interfere with the heightened state of dopamine activity without altering basal tone suggests that these drugs possess antipsychotic activity and may provide the basis for new therapies for psychosis and drug dependence.
Search for 5-HT2A antagonists
Search for compounds tested on hERG
Click on the heatmap overlay to retrieve 5-HT2 antagonists tested on hERG
Combine hitsets
Which 5-HT2A antagonists have low affinity for the hERG channel?
The following heatmap displays 99 5-HT2A antagonists that were also tested on the hERG channel
Which 5-HT2A antagonists have low affinity for the hERG channel?
Most active 5-HT2A antagonists (~10 nM) with low affinity for hERG
How to avoid hERG inhibition
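Outside the web interface, combining the two hitsets amounts to intersecting two result sets and filtering on both activities. A minimal sketch, assuming each hitset is a mapping from compound identifier to an affinity in nM; the identifiers, values and the 10 µM hERG cutoff are hypothetical, while the ~10 nM 5-HT2A cutoff follows the slide.

```python
def low_herg_antagonists(ht2a, herg, max_ht2a_nm=10.0, min_herg_nm=10000.0):
    """Compounds potent on 5-HT2A and weak on hERG.

    ht2a, herg: dicts mapping compound id -> affinity in nM (lower = more potent).
    """
    # "Combine hitsets": keep only compounds measured on both targets.
    common = ht2a.keys() & herg.keys()
    selected = [
        (cid, ht2a[cid], herg[cid])
        for cid in common
        if ht2a[cid] <= max_ht2a_nm and herg[cid] >= min_herg_nm
    ]
    # Most potent 5-HT2A antagonists first.
    return sorted(selected, key=lambda row: row[1])


# Hypothetical hitsets.
ht2a_hits = {"c1": 8.0, "c2": 3.0, "c3": 500.0}
herg_hits = {"c1": 25000.0, "c2": 400.0, "c4": 30000.0}
print(low_herg_antagonists(ht2a_hits, herg_hits))
```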
Introduction
QT interval prolongation can lead to serious arrhythmias, which can be fatal.
A large number of cases of QT prolongation induced by non-cardiac drugs have been reported, and the number continues to grow, leading to the withdrawal of some blockbuster medicines.
hERG seems to be the main target responsible for this adverse effect.
In silico models could be rapid and powerful tools to screen out potential hERG blockers as early as possible during the discovery process.
Extraction & Methodology
• hERG data set: 640 molecules tested on hERG (Kv11.1)
• Subsets of molecules defined according to detailed biological protocols
• 2D molecular descriptor sets
• Predictive models: Recursive Partitioning (RP), MOE QuaSAR Classify
• Validation: cross-validation and external validation
[Figure: representation of hERG ligands within the chemical space of the NCI database along the first two PCA axes; legend: NCI database, hERG data set]
Model 3
Classes: HIGH (1 µM), 50 training / 46 test molecules; WEAK (10 µM), 50 training / 9 test molecules

              Training                Test
Descriptors   Relevant    P_VSA       Relevant    P_VSA
High          48/50       47/50       45/46       42/46
Weak          49/50       49/50       8/9         9/9
All           97%         96%         96%         93%

Correct classifications determined by 5-fold cross-validation (Training) and by external validation (Test) for each descriptor set.
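The "All" row follows from the per-class counts: overall accuracy is the number of correctly classified molecules divided by the total, pooled over the High and Weak classes. A short check using only the numbers in the table:

```python
# (correct, total) per class, copied from the table for each validation set
# and descriptor set.
results = {
    ("Training", "Relevant"): {"High": (48, 50), "Weak": (49, 50)},
    ("Training", "P_VSA"): {"High": (47, 50), "Weak": (49, 50)},
    ("Test", "Relevant"): {"High": (45, 46), "Weak": (8, 9)},
    ("Test", "P_VSA"): {"High": (42, 46), "Weak": (9, 9)},
}


def overall_accuracy(per_class):
    """Percentage of correctly classified molecules, pooled over classes."""
    correct = sum(c for c, _ in per_class.values())
    total = sum(t for _, t in per_class.values())
    return 100.0 * correct / total


for (dataset, descriptors), per_class in results.items():
    print(f"{dataset:8s} {descriptors:8s} {overall_accuracy(per_class):.0f}%")
```

Note that the test set is unbalanced (46 High versus 9 Weak), so the pooled figure is dominated by the High class.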
[Figure: recursive partitioning decision trees for the "Relevant" and "P_VSA" descriptor sets. Node descriptors include SlogP_VSA7, SlogP_VSA2, SMR_VSA6, PEOE_VSA+1, SMR_VSA5, SMR_VSA4, PEOE_VSA+0, SlogP, PEOE_VSA_FHYD, SMR_VSA1 and vsa_pol; leaves are labeled + or -.]
Conclusion
Reaxys Medicinal Chemistry made it possible to retrieve a high-quality data set with chemical diversity and homogeneous biological activities.
Pertinent predictive models of hERG activity were built using recursive partitioning analysis with 2D molecular descriptors.
Based on Reaxys Medicinal Chemistry data, a fast virtual screening approach could be used as an early tool in the drug discovery process to avoid cardiotoxic side effects related to hERG blockade.
AstraZeneca’s global hERG QSAR model has contributed to the reduction in the synthesis of ‘red flag’ compounds (compounds that are measured to have a hERG potency of <1 μM) from 25.8% of all compounds tested in 2003 to only 6% in 2010.
Cumming, J.G., Davis, A.M. et al. Nat. Rev. Drug Disc. (2013) 12, 948–962
Even when models perform well, designers need interpretable models
How to search Metabolism of a certain Phenotype
Bioassay Category Parameter
Broad search (All parameters)
Precise search (by parameters)
Pyrrolidine versus Azetidine metabolic stability
How to access metabolism details
enzyme Tissue/ Organ
Cell Fraction
Enzyme Substrate
Show Case 1: Pyrrolidine metabolic stability
Pyrrolidines are known to be metabolically unstable. Are there pyrrolidines with an intrinsic clearance in microsomes of <20 µL/min/mg of protein? What do the complete structures of these compounds look like?
The overall search on intrinsic clearance (mL/min/g or µL/min/mg of protein) of pyrrolidines in Reaxys Medicinal Chemistry returns 1031 substances and 1777 clearance results, extracted from 138 citations.
More stable pyrrolidine compounds
• The top 10 pyrrolidine compounds having the lowest intrinsic clearance are displayed
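Once clearance results are exported, selecting the most stable compounds is a filter followed by a sort. A minimal sketch with hypothetical records; the only non-obvious step is unit handling, where 1 mL/min/g equals 1 µL/min/mg numerically, so both units named in the search can be pooled without rescaling.

```python
# Conversion factors to µL/min/mg of protein (1 mL/g = 1 µL/mg).
UNIT_FACTORS = {"uL/min/mg": 1.0, "mL/min/g": 1.0}


def most_stable(results, limit=20.0, top_n=10):
    """Return the top_n compounds with the lowest intrinsic clearance below limit.

    results: iterable of (compound name, value, unit) tuples.
    """
    normalised = [(name, value * UNIT_FACTORS[unit]) for name, value, unit in results]
    stable = [row for row in normalised if row[1] < limit]
    return sorted(stable, key=lambda row: row[1])[:top_n]


# Hypothetical clearance results.
results = [
    ("pyr-1", 5.0, "uL/min/mg"),
    ("pyr-2", 50.0, "uL/min/mg"),
    ("pyr-3", 12.0, "mL/min/g"),
]
print(most_stable(results))
```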
Show Case 2 : Azetidine Metabolic stability
• Different cyclic amines might be tolerated. What is known about the metabolic stability of azetidines? Are they more stable than pyrrolidines? What modifications of the azetidine are known?
Among the stable azetidines (Clint <20 µL/min/mg of protein), the following scaffolds were found.
What modifications of the azetidine are known?
Are azetidines more stable than pyrrolidines?
Based on the graph below, which displays the azetidine and pyrrolidine clearance results found in RMC, azetidines appear in general to be more stable than pyrrolidines.
• 40% of all azetidine clearance results are below 20 µL/min/mg of protein, but only 30% for pyrrolidines.
• This is even more obvious in the 20 to 100 µL/min/mg of protein clearance range: 42% of azetidine results fall into this category, but only 20% of pyrrolidine results.
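The comparison rests on the share of clearance results falling into each range. A small sketch of that binning, with hypothetical clearance values; the bin edges of 20 and 100 µL/min/mg follow the slide.

```python
def clearance_profile(values, edges=(20.0, 100.0)):
    """Percentage of clearance results in each range (µL/min/mg of protein)."""
    low, high = edges
    n = len(values)
    counts = {
        f"< {low:g}": sum(v < low for v in values),
        f"{low:g} to {high:g}": sum(low <= v < high for v in values),
        f">= {high:g}": sum(v >= high for v in values),
    }
    return {label: 100.0 * count / n for label, count in counts.items()}


# Hypothetical clearance results for two ring systems.
azetidines = [5.0, 12.0, 30.0, 60.0, 150.0]
pyrrolidines = [8.0, 45.0, 120.0, 300.0, 410.0]

print("azetidines  ", clearance_profile(azetidines))
print("pyrrolidines", clearance_profile(pyrrolidines))
```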
Lead optimization
exploration of structural features of a lead series of Compounds
Exploration of structural features of a lead series of compounds
• Within NK3, the 3,4-dichlorophenyl group appears to be an important structural feature
• Are there more target classes in which the 3,4-dichlorophenyl group plays an important role?
• Does the 3,4-dichlorophenyl group cause a certain activity profile?
• Are other 3,4-diX-phenyl structures known, and what is their pharmacological profile?
• Are there other di-substitution patterns (other than 3,4) with a strong pharmacological response?
Does the 3,4-dichlorophenyl group cause a certain activity profile?
Substructure search for the 3,4-dichlorophenyl fragment
30% of the substances containing the 3,4-dichlorophenyl fragment have a bioactivity below 0.1 µM
Are there more target classes in which the 3,4-dichlorophenyl group plays an important role?
Select bioactivities below 0.1 µM
• Target profile of 3,4-dichlorophenyl
• Targets are ranked by their count of bioactivities. Yellow bars indicate the count of bioactivities below 0.1 µM
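The target profile described here is a per-target count of all bioactivities alongside the count below 0.1 µM. A minimal sketch of that tally; the target names and values are hypothetical stand-ins for exported results.

```python
from collections import Counter


def target_profile(bioactivities, potent_um=0.1):
    """Rank targets by bioactivity count, with the number of potent results.

    bioactivities: iterable of (target name, activity in µM) pairs.
    Returns a list of (target, total count, count below potent_um).
    """
    rows = list(bioactivities)
    total = Counter(target for target, _ in rows)
    potent = Counter(target for target, value in rows if value < potent_um)
    return [(target, count, potent[target]) for target, count in total.most_common()]


# Hypothetical bioactivity results for substances sharing one fragment.
data = [("NK3", 0.05), ("NK3", 0.5), ("NK3", 0.02), ("H1", 0.08), ("H1", 2.0)]
for target, n_total, n_potent in target_profile(data):
    print(f"{target}: {n_total} bioactivities, {n_potent} below 0.1 µM")
```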
Are there more target classes in which the 3,4-dichlorophenyl group plays an important role?
Off-targets/CNS adverse effect: addiction/psychostimulant
Off-targets/CNS adverse effect: attention/perception
Off-targets/CNS adverse effect: learning/memory
3,4-Dichlorophenyl groups are also involved in off-target activity, mainly CNS-related
Target profile of 3,4-dichlorophenyl, with an affinity below 0.1 µM and at least 100 bioactivities
Substances tested on NK3 were not tested on other targets, except for one substance tested on histamine 1
Are other 3,4-diX Phe structures known and what is their pharmacological profile?
3,4-DiFluoro Phenyl 3,4-Dibromo Phenyl 3,4-Dibromo Phenyl
Target profile of 3,4-diX-phenyl, with an affinity below 0.1 µM
Are there other di-substitution patterns (other than 3,4) with a strong pharmacological response?
2,4-Dichlorophenyl, 2,5-dichlorophenyl, 2,3-dichlorophenyl
• Cannabinoid receptors (1 and 2)
• Melanocortin 4
• 5-HT2C
• Amyloid precursor protein (APP)
• Dopamine receptors (2 and 3)
• p38α
• Cytochrome P450 3A4: potential drug-drug interactions
Target profile of X,Y-dichlorophenyl, with an affinity below 0.1 µM
Reaxys Medicinal Chemistry accelerates drug discovery through knowledge-based design
• Mine large datasets to find hits (virtual screening)
• Mine large datasets to accelerate understanding and derive useful medicinal chemistry knowledge
• Apply this knowledge to propose and evaluate new, better molecules to fulfil the multi-objective design needs of lead optimisation
• Apply this to develop clinical candidates faster
• Apply this knowledge base to repurpose drugs