Combining Cheminformatics, Diverse Databases and Logic Based Pathway Analysis for TB
Sean Ekins
Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA.
NIAID Systems Biology Approaches for Tuberculosis Workshop
Partial Map of North America, 1765, T. Kitchin
The cellular overview diagram for M. tuberculosis H37Rv, from the TBCyc database (http://tbcyc.tbdb.org/index.shtml)
Maps as a starting point for further exploration
Pathways tell us little about chemistry
Where are the best targets for drugs?
How can we overcome resistance?
What can we add to improve maps?
When I think of systems biology, I think of an integrated map of our knowledge on a topic
Limited by what we know
~25 public datasets for TB, including GSK and Novartis data on TB hits
>300,000 compounds
Patents and papers annotated by CDD
Open to browse: http://www.collaborativedrug.com/register
Molecules with activity against Mtb
Public molecule datasets tell you little about biology
CDD
Literature data on molecules and their targets
Similarity search with a mimic enables target fishing
SRI
Pathway data (targets)
Species differences in pathways
Where to intervene
Combine the knowledge
Select new targets
Take mimic strategy
What if you combine pathways and molecules? Can this help in drug discovery?
Developed an API that enables connection to other tools, e.g. Pipeline Pilot, KNIME
CDD is no longer an island
The TB molecules and target information database connects molecule, gene, pathway and literature data; we then use it to identify targets for the mimic strategy
Take a substrate or metabolite, generate 3D conformers and build a pharmacophore
Use the pharmacophore to search vendor libraries in 3D
Buy and test compounds
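The 3D search step of the mimic strategy can be sketched as a distance-geometry match: a library conformer "hits" if it has features of the same types whose pairwise distances reproduce the pharmacophore's within a tolerance. This is a minimal sketch, assuming conformers and feature points have already been generated by some cheminformatics toolkit; the feature types, coordinates, molecule names, the 1.0 Å tolerance and the function names are all hypothetical, not the tools used in this work.

```python
import math
from itertools import permutations

# Hypothetical pharmacophore: (feature type, (x, y, z) in angstroms),
# e.g. derived from a substrate conformer. Values are illustrative only.
QUERY = [
    ("anion", (0.0, 0.0, 0.0)),
    ("anion", (8.0, 0.0, 0.0)),
    ("donor", (4.0, 3.0, 0.0)),
]

# Hypothetical vendor library: molecule name -> list of conformers,
# each conformer being a list of (feature type, xyz) points.
LIBRARY = {
    "mol_A": [[("anion", (1.0, 1.0, 1.0)), ("anion", (9.0, 1.0, 1.0)),
               ("donor", (5.0, 4.0, 1.0)), ("acceptor", (0.0, 0.0, 5.0))]],
    "mol_B": [[("anion", (0.0, 0.0, 0.0)), ("anion", (3.0, 0.0, 0.0)),
               ("donor", (1.0, 1.0, 0.0))]],
}


def matches(query, conformer, tol=1.0):
    """True if some ordered choice of conformer features has the same types as
    the query and reproduces every query pairwise distance within tol."""
    n = len(query)
    for combo in permutations(conformer, n):
        if any(q[0] != c[0] for q, c in zip(query, combo)):
            continue  # feature types do not line up
        if all(
            abs(math.dist(query[i][1], query[j][1])
                - math.dist(combo[i][1], combo[j][1])) <= tol
            for i in range(n)
            for j in range(i + 1, n)
        ):
            return True
    return False


def screen(query, library, tol=1.0):
    """Return names of molecules with at least one matching conformer."""
    return [
        name
        for name, conformers in library.items()
        if any(matches(query, conf, tol) for conf in conformers)
    ]


if __name__ == "__main__":
    # mol_A is a translated copy of the pharmacophore plus an extra feature.
    print(screen(QUERY, LIBRARY))
```

Because only distances are compared, mirror-image arrangements also match, and the brute-force permutation search is only practical for small feature sets; real pharmacophore search tools are far more efficient.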
dethiobiotin
Two proposed mimics of D-fructose 1,6-bisphosphate in less than 6 months
DFP000133SC MIC 40 μg/mL
DFP000134SC MIC 20 μg/mL
Computationally searched >80,000 molecules, narrowed to 842 hits, tested 23 compounds in vitro (3 picked as inactives), leading to 2 proposed mimics
Sarker et al., Pharm Res, 29(8):2115-27, 2012
Continuing in Phase II on a bigger scale; validation of hits is ongoing…
Pathway analysis
Binding site similarity to Mtb proteins
Bayesian models: ligand similarity
Docking
Predicting the target(s) for small molecules
Is there a simpler way? What about molecule similarity?
TB Mobile: A free app for iPhone and Android
Each molecule can be copied to the clipboard and then opened with other apps, exported via Twitter or email, or shared via Dropbox
The first app for TB: it combines chemistry and bioinformatics and can be used to predict potential targets for new compounds
Ekins et al., J Cheminform 5:13, 2013
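The idea of predicting targets from molecule similarity can be illustrated with a nearest-neighbour sketch: score a new compound against reference molecules with known Mtb targets using Tanimoto similarity on fingerprint features, then report the targets of the closest neighbours. This is not TB Mobile's code; the feature IDs, reference molecules and target assignments below are hypothetical stand-ins for real hashed substructure fingerprints and curated target annotations.

```python
# Hypothetical reference set: (molecule name, fingerprint feature IDs, known Mtb target).
# Feature IDs stand in for hashed substructure fingerprints; labels are illustrative.
REFERENCE = [
    ("ref_1", {1, 2, 3, 4, 5}, "InhA"),
    ("ref_2", {1, 2, 3, 9}, "DprE1"),
    ("ref_3", {10, 11, 12}, "KasA"),
]


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprint feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_targets(query: set, reference, top_k: int = 3):
    """Rank reference molecules by similarity to the query and return
    (name, target, similarity) for the top_k nearest neighbours."""
    scored = [(name, target, tanimoto(query, fp)) for name, fp, target in reference]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for name, target, sim in predict_targets({1, 2, 3, 4}, REFERENCE, top_k=2):
        print(f"{name}: {target} (Tanimoto {sim:.2f})")
```

A high similarity only suggests a shared target; it is a hypothesis to be tested, not an assignment.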
Blue = 745 molecules with known targets in Mtb
Molecules active against Mtb sample only part of the target space and metabolome, based on simple properties
Yellow = 177 GSK in vitro actives
Yellow = 1429 in vitro active and non-cytotoxic hits from SRI
Yellow = 338 molecules tested in mouse
The first 3 PCs explain 83%, 88% and 86% of the variance across these panels
In vitro data is more localized, with partial coverage of target space
Historic in vivo data in mouse covers target space
Yellow = Mtb metabolome from TBCyc
3 PCs explain 89% of variance
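The "3 PCs explain N% of variance" figures come from principal component analysis of simple molecular properties. A minimal sketch of how that percentage is computed, assuming standardized descriptors (which properties were used, e.g. molecular weight or logP, is our assumption) and using a synthetic matrix in place of real molecules:

```python
import numpy as np


def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Fraction of total variance captured by each principal component.

    X has one row per molecule and one column per simple property; columns are
    standardized first (assumes no constant columns)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the standardized matrix are proportional to
    # the PC variances; numpy returns them in descending order.
    s = np.linalg.svd(Z, compute_uv=False)
    var = s ** 2
    return var / var.sum()


if __name__ == "__main__":
    # Synthetic stand-in for six simple properties of 50 molecules
    # (not real data): two latent factors plus a little noise.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(50, 2))
    X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(50, 6))
    ratio = explained_variance_ratio(X)
    print(f"First 3 PCs explain {ratio[:3].sum():.0%} of variance")
```

Plotting one set (e.g. known-target molecules) against another in the same 3-PC space is what reveals the partial overlap described above.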
Phenotypic screening HTS hit rates (SRI papers)
Become more stringent in what we call an ACTIVE for models:
IC90 < 10 μg/mL (CB2) or < 10 μM (MLSMR) and a selectivity index (SI) greater than ten.
SI was calculated as SI = CC50/IC90, where CC50 is the concentration that resulted in 50% inhibition of Vero cells.
Literature hit rates are usually < 1%
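The stricter "dual-event" definition of an active combines potency and cytotoxicity. A minimal sketch of that rule as stated above (the function names are hypothetical; the cutoff of 10 is in whichever units the assay reports, μg/mL for CB2 or μM for MLSMR):

```python
def selectivity_index(cc50: float, ic90: float) -> float:
    """SI = CC50 / IC90; both values must be in the same (positive) units."""
    return cc50 / ic90


def is_dual_event_active(ic90: float, cc50: float,
                         ic90_cutoff: float = 10.0, min_si: float = 10.0) -> bool:
    """Active only if potent (IC90 below the cutoff) AND selective (SI > min_si)."""
    return ic90 < ic90_cutoff and selectivity_index(cc50, ic90) > min_si


if __name__ == "__main__":
    print(is_dual_event_active(ic90=1.0, cc50=50.0))    # potent and selective
    print(is_dual_event_active(ic90=1.0, cc50=5.0))     # potent but too cytotoxic
    print(is_dual_event_active(ic90=20.0, cc50=1000.0)) # selective but not potent
```

A compound that is potent but also kills Vero cells is labelled inactive, which is what makes models trained on these labels favour non-cytotoxic chemistry.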
Top scoring molecules assayed for Mtb growth inhibition
Mtb screening molecule database(s)
High-throughput phenotypic Mtb screening
Descriptors + bioactivity (+ cytotoxicity)
Bayesian machine learning classification Mtb model
Molecule database (e.g. GSK malaria actives) virtually scored using Bayesian models
New bioactivity data may enhance models
Identify in vitro hits and test models
Increased hit/lead discovery efficiency
Dual-Event Bayesian Models for whole cell Mtb activity
Could use any machine learning method
Ekins and Freundlich, Pharm Res, 28, 1859-1869, 2011. Ekins et al., Chem Biol 20, 370–378, 2013
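A Bayesian classification model of this kind learns, for each structural feature, how enriched it is among actives. The sketch below is a Laplacian-corrected naïve Bayesian classifier, a common formulation for such fingerprint models, swapped in as an illustration rather than the exact implementation behind these slides. Feature IDs and training labels are hypothetical; in practice the labels would come from the dual-event (potency plus cytotoxicity) definition.

```python
import math
from collections import Counter

# Hypothetical training data: fingerprint feature sets and active/inactive labels.
TRAIN_FPS = [{1, 2}, {1, 3}, {1, 2, 4}, {3, 5}, {4, 5}, {5, 6}]
TRAIN_LABELS = [True, True, True, False, False, False]


def train_laplacian_bayes(fingerprints, labels):
    """Return per-feature weights for a Laplacian-corrected naive Bayesian model.

    weight(f) = log((A_f + 1) / (T_f * P + 1)), where T_f is the number of
    training molecules containing f, A_f the actives among them, and P the
    overall fraction of actives."""
    p = sum(labels) / len(labels)
    total, active = Counter(), Counter()
    for fp, is_active in zip(fingerprints, labels):
        for f in fp:
            total[f] += 1
            if is_active:
                active[f] += 1
    return {f: math.log((active[f] + 1) / (total[f] * p + 1)) for f in total}


def bayes_score(fp, weights):
    """Sum of weights of the features present; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in fp)


if __name__ == "__main__":
    w = train_laplacian_bayes(TRAIN_FPS, TRAIN_LABELS)
    for fp in ({1, 2}, {5, 6}):
        print(sorted(fp), round(bayes_score(fp, w), 3))
```

The +1 terms shrink weights for rarely seen features toward zero, so a feature found in a single active molecule does not dominate the score.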
5 active compounds vs Mtb in a few months
7 tested, 5 active (~70% hit rate)
Ekins et al., Chem Biol 20, 370–378, 2013
1. Virtually screen the 13,533-member GSK antimalarial hit library
2. Model = SRI TAACF-CB dose response + cytotoxicity model
3. Top 46 commercially available compounds visually inspected
4. 7 compounds chosen for Mtb testing based on:
- drug-likeness
- chemotype diversity
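The automated part of steps 1 and 3 amounts to ranking a library by model score and keeping the purchasable compounds. A minimal sketch, assuming each compound already carries a Bayesian score and an availability flag (the class, field names and compound IDs are hypothetical); visual inspection and the drug-likeness and diversity judgments in step 4 remain manual:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    compound_id: str
    score: float       # Bayesian model score (higher = more likely active)
    purchasable: bool  # commercially available?


def shortlist(library, top_n):
    """Keep purchasable compounds, rank by model score, return the top_n."""
    available = [c for c in library if c.purchasable]
    return sorted(available, key=lambda c: c.score, reverse=True)[:top_n]


# Hypothetical library entries.
LIBRARY = [
    Candidate("cpd_1", 5.7, True),
    Candidate("cpd_2", 6.1, False),  # best score, but not purchasable
    Candidate("cpd_3", 4.2, True),
    Candidate("cpd_4", 5.1, True),
]

if __name__ == "__main__":
    for c in shortlist(LIBRARY, top_n=2):
        print(c.compound_id, c.score)
```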
GSK #          Bayesian score   Mtb H37Rv MIC (μg/mL)   GSK-reported % inhibition of HepG2 at 10 μM
TCMDC-123868   5.73             >32                     40
TCMDC-125802   5.63             0.0625                  5
TCMDC-124192   5.27             2.0                     4
TCMDC-124334   5.20             2.0                     4
TCMDC-123856   5.09             1.0                     83
TCMDC-123640   4.66             >32                     10
TCMDC-124922   4.55             1.0                     9
(Chemical structures shown on the slide are not reproduced here.)
Bayesian Model Follow-up: do we have a lead?
• BAS00521003/TCMDC-125802 is reported to be a P. falciparum lactate dehydrogenase inhibitor
• Only one report (which we were unaware of when picking the compound) of antitubercular activity, from 1969:
- solid agar MIC = 1 μg/mL (“wild strain”)
- “no activity” in mouse model up to 400 mg/kg
- however, activity was judged solely by extension of survival!
Bruhin, H. et al., J. Pharm. Pharmac. 1969, 21, 423-433.
SRI MLSMR 220K library contains:
107 hits with this substructure
- 3 nitrofuryl hydrazones
- 10 furyl hydrazones
- 19 nitrophenyl hydrazones
32 inactives with this substructure
Maddry et al., Tuberculosis 2009, 89, 354.
MIC of 0.0625 μg/mL
• 64× MIC affords 6 logs of kill
• Resistance and/or drug instability beyond 14 days
Vero cells: CC50 = 4.0 μg/mL
Selectivity index SI = CC50/MIC(Mtb) = 16–64
In mouse, no toxicity but also no efficacy in the GKO model; probably metabolized.
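As a worked check of the upper end of that SI range, using the MIC and Vero CC50 quoted above (both read as μg/mL):

```latex
\mathrm{SI} = \frac{\mathrm{CC}_{50}}{\mathrm{MIC}_{\mathrm{Mtb}}}
            = \frac{4.0~\mu\mathrm{g/mL}}{0.0625~\mu\mathrm{g/mL}} = 64,
\qquad
\frac{4.0~\mu\mathrm{g/mL}}{0.25~\mu\mathrm{g/mL}} = 16
```

The lower value of 16 would correspond to an MIC of 0.25 μg/mL; that MIC is our inference, not a number given on the slide.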
To be continued…
Ekins et al., Chem Biol 20, 370–378, 2013
Filling out the triazine matrix using SARtable: A new kind of map
Green = good activity, Red = bad; colored dots are predictions
A summary of some of the numbers involved in filtering for hits:
>100,000 molecules screened through Bayesian models
~700 molecules tested in vitro
150 actives identified
>20% hit rate
Identified several novel, potent hit series with low cytotoxicity and good selectivity
Identified known human kinase inhibitors and FDA-approved drugs as hits
We also applied this approach to another institute’s data and obtained a 22.9% hit rate
Ekins et al., PLoS ONE 2013 May 7;8(5):e63240; Ekins et al., Chem Biol 20, 370–378, 2013
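The hit rates quoted here follow directly from the counts, and comparing them with the "< 1%" typical of HTS in the literature gives a rough enrichment. A minimal sketch of that arithmetic; the 1% baseline is an assumed upper bound taken from the earlier slide, and "~700" is approximate, so the results are approximate too:

```python
def hit_rate(actives: int, tested: int) -> float:
    """Fraction of tested compounds that were active."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return actives / tested


def fold_enrichment(rate: float, baseline: float) -> float:
    """How many times higher a hit rate is than a baseline hit rate."""
    return rate / baseline


# Counts quoted on the slides; the baseline is an assumed upper bound for HTS.
BAYESIAN_SELECTION = hit_rate(150, 700)  # ~21%, reported as ">20%"
GSK_FOLLOW_UP = hit_rate(5, 7)           # ~71%, reported as "70%"
HTS_BASELINE = 0.01

if __name__ == "__main__":
    print(f"Bayesian-selected hit rate: {BAYESIAN_SELECTION:.1%}")
    print(f"GSK follow-up hit rate: {GSK_FOLLOW_UP:.1%}")
    print(f"At least {fold_enrichment(BAYESIAN_SELECTION, HTS_BASELINE):.0f}x a 1% baseline")
```

Since literature HTS hit rates are usually below 1%, the true enrichment is, if anything, larger than this estimate.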
1924 new compounds were used as an external test set for all 3 dual-event models

Mtb model (training set N)                           ROC on ARRA dose response and cytotoxicity (N = 1924)
MLSMR dose response and cytotoxicity (2273)          0.82
TAACF-CB2 dose response and cytotoxicity (1783)      0.54
TAACF kinase dose response and cytotoxicity (1248)   0.74
Ideal ROC = 1

This suggests the value of using multiple models, multiple algorithms and validation
Continuous testing of in vitro models, fusing datasets and using different machine learning methods
Ekins et al., Pharm Res, In press 2013; Ekins et al., Submitted 2013
Mtb model (training set N)   ROC on ARRA dose response and cytotoxicity (N = 1924)
SVM - Combined               0.72 (binary data)
Random Forest - Combined     0.83 (probability data)
Random Forest - Combined     0.75 (binary data)
Bayesian - Combined          0.83 (Bayesian score)
Bayesian - Combined          0.69 (binary data)
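The table shows continuous outputs (probabilities, Bayesian scores) giving higher ROC values than binary calls from the same algorithm. A minimal sketch of why, computing the ROC AUC as the probability that a random active outscores a random inactive (ties count one half). The scores, labels and 0.5 threshold are hypothetical toy values, not the ARRA data, and this is not the validation code used for these models:

```python
def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney formulation: fraction of (active, inactive)
    pairs in which the active scores higher, counting ties as one half.
    Quadratic in the number of compounds; fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs for three actives and three inactives.
LABELS = [True, True, True, False, False, False]
SCORES = [0.9, 0.6, 0.4, 0.5, 0.3, 0.1]

if __name__ == "__main__":
    binary = [1.0 if s >= 0.5 else 0.0 for s in SCORES]  # thresholded calls
    print(f"continuous scores: {roc_auc(SCORES, LABELS):.3f}")
    print(f"binary calls:      {roc_auc(binary, LABELS):.3f}")
```

Thresholding throws away the ranking within each class, creating ties that the AUC can only credit by half, so binary outputs generally cannot rank compounds as well as the underlying scores.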
Summary & Questions
Why can’t we predict compounds active in vivo?
Can we go straight to the mouse based on models, what is stopping us?
Can combining all the in vitro, in vivo, pathway and structural data help us find better compounds faster?
To date we have only sampled a part of the known target space, with in vitro data (?)
We have only sampled a fraction of the metabolome space (?)
We have only sampled a minuscule fraction of the chemistry space (?)
“We shall not cease from exploration and the end of all our exploring
will be to arrive where we started... and know the place for the first
time.”
T.S. Eliot
Acknowledgments
STTR: Malabika Sarker, Carolyn Talcott, Peter Madrid, Sidharth Chopra,
Barry A. Bunin, Gyanu Lamichhane, Joel Freundlich, Alex Clark
SBIR: Joel Freundlich, Robert C. Reynolds, Hiyun Kim, Mi-Sun Koo,
Marilyn Ekonomidis, Meliza Talaue, Steve D. Paget, Lisa K. Woolhiser,
Anne J. Lenaerts, Barry A. Bunin, Nancy Connell, Baojie Wan, Scott G.
Franzblau, Allan Casey (IDRI)
Accelrys
Funding
2R42AI088893-02 “Identification of novel therapeutics for tuberculosis combining
cheminformatics, diverse databases and logic based pathway analysis” from the National
Institute of Allergy and Infectious Diseases. (PI: S. Ekins)
R43 LM011152-01 “Biocomputation across distributed private datasets to enhance drug
discovery” from the National Library of Medicine (PI: S. Ekins)
The CDD TB has been developed thanks to funding from the Bill and Melinda Gates
Foundation (Grant#49852 “Collaborative drug discovery for TB through a novel database of
SAR data optimized to promote data archiving and sharing”).
http://goo.gl/vPOKS http://goo.gl/iDJFR