Bigger Data to Increase Drug Discovery

Sean Ekins

Phoenix Nest, Inc., Brooklyn, NY. Collaborations in Chemistry, Inc., Fuquay Varina, NC. Collaborative Drug Discovery, Inc., Burlingame, CA. Collaborations Pharmaceuticals, Inc., Fuquay Varina, NC.

Upload: sean-ekins

Post on 20-Jul-2015


TRANSCRIPT

Page 1: Bigger Data to Increase Drug Discovery

Bigger Data to Increase Drug Discovery

Sean Ekins

Phoenix Nest, Inc., Brooklyn, NY.
Collaborations in Chemistry, Inc., Fuquay Varina, NC.
Collaborative Drug Discovery, Inc., Burlingame, CA.
Collaborations Pharmaceuticals, Inc., Fuquay Varina, NC.

Page 2: Bigger Data to Increase Drug Discovery

In a Perfect World…

• All major diseases cured
• All > 7000 rare diseases have treatments available
• Neglected diseases are eradicated
• Antibiotics, antivirals, vaccines developed to anticipate all future mutations
• Drug resistance eradicated
• All research coordinated globally
• Government/individual collaboration discovers and funds all research
• Billions of molecules will be available with data for different targets
• All decisions will involve machine learning
• Life expectancy is infinite

Page 3: Bigger Data to Increase Drug Discovery

Big DATA

Page 4: Bigger Data to Increase Drug Discovery
Page 5: Bigger Data to Increase Drug Discovery
Page 6: Bigger Data to Increase Drug Discovery

Ebola-related tweets in a 6-week period, 2014

Robert Moore

Page 7: Bigger Data to Increase Drug Discovery

Why ‘Bigger’ and not ‘Big’

Page 8: Bigger Data to Increase Drug Discovery

Just a matter of scale?

Drug Discovery’s definition of Big data

Everyone else’s definition of Big data

Page 9: Bigger Data to Increase Drug Discovery

What about Chemistry and Biology - Pharmacology X.0

• Data Sources
– PubChem
– ChEMBL
– ToxCast: over 1800 molecules tested against over 800 endpoints

Page 10: Bigger Data to Increase Drug Discovery

BUT WHERE ARE THE

Page 11: Bigger Data to Increase Drug Discovery

‘Big’ Chemistry DBs

Page 12: Bigger Data to Increase Drug Discovery

But what about small data?

• In some cases it's all we have
• In vivo data is not high throughput
• Small data builds networks

http://smalldatagroup.com/

Page 13: Bigger Data to Increase Drug Discovery

The past

• 1996
• Data from low throughput drug-drug interaction studies
• E.g. Ki values with CYP 3A4
• A drug company might have 10s of values
• This data was used to build 3D QSAR, pharmacophores

JPET, 290: 429-438, 1999

Page 14: Bigger Data to Increase Drug Discovery

How you dispense liquids may be important: insights from small data

Process                      Hydrophobic      Hydrogen bond    Hydrogen bond   Observed vs.
                             features (HPF)   acceptor (HBA)   donor (HBD)     predicted IC50 r
Acoustic mediated process    2                1                1               0.92
Tip-based process            0                2                1               0.80

Pharmacophore models shown: Acoustic, Tip based. Generated with Discovery Studio (Accelrys).

Cyan = hydrophobic
Green = hydrogen bond acceptor
Purple = hydrogen bond donor

Each model shows most potent molecule mapping

PLoS ONE 8(5): e62325 (2013)

Page 15: Bigger Data to Increase Drug Discovery

Ebola inhibitor Pharmacophore

Ekins S, Freundlich JS and Coffee M F1000Research 2014, 3:277

Docking FDA approved compounds in VP35 protein showing overlap with ligand (yellow)

Proposed that amodiaquine, chloroquine, clomiphene and toremifene, which are all active in vitro, may have common features and bind a common site / target

A common feature pharmacophore for FDA-approved drugs inhibiting the Ebola virus

Page 16: Bigger Data to Increase Drug Discovery

The last 5 years - Present

• 2010
• Data from high throughput screens at Pfizer
• E.g. metabolic stability data ~200K compounds
• This data was used to build machine learning models
• 2015
• Could easily be double this amount

Drug Metab Dispos, 38: 2083-2090, 2010

Page 17: Bigger Data to Increase Drug Discovery

Ebola Machine Learning Models (training set 868 compounds)

Model (validation)                                   Ebola replication   Ebola pseudotype
                                                     (actives = 20)      (actives = 41)
RP Forest (out of bag ROC)                           0.70                0.85
RP Single Tree (with 5 fold cross validation ROC)    0.78                0.81
SVM (with 5 fold cross validation ROC)               0.73                0.76
Bayesian (with 5 fold cross validation ROC)          0.86                0.85
Bayesian (leave out 50% x 100 ROC)                   0.86                0.82
Open Bayesian (with 5 fold cross validation ROC)     0.82                0.82

Ekins, Freundlich, Madrid and Clark
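The ROC values reported for these models can be read as the probability that a randomly chosen active is scored above a randomly chosen inactive. A minimal sketch of that calculation follows, assuming binary activity labels and one model score per compound; the function name and the toy labels and scores are hypothetical and are not Ebola screening data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of actives and inactives.

    labels: 1 for active, 0 for inactive. scores: higher means "more likely active".
    Ties between an active and an inactive count as half a win.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical toy data: three actives and three inactives.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"ROC AUC = {toy_auc:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the 0.70 to 0.86 values above sit.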

Page 18: Bigger Data to Increase Drug Discovery

https://goo.gl/uG8K3P

Page 19: Bigger Data to Increase Drug Discovery

Tuberculosis – a big disease

Tuberculosis still kills 1.6-1.7m/yr (~1 every 19 seconds)

1/3rd of world's population infected!!!!

• streptomycin (1943)
• para-aminosalicylic acid (1949)
• isoniazid (1952)
• pyrazinamide (1954)
• cycloserine (1955)
• ethambutol (1962)
• rifampicin (1967)

Multi drug resistance in 4.3% of cases

Extensively drug resistant increasing incidence

2 new drugs (bedaquiline, delamanid) in 40 yrs

Page 20: Bigger Data to Increase Drug Discovery

Big Data: Screening for New Tuberculosis Treatments

Tested >350,000 molecules     Tested ~2M       2M      >300,000
>1500 active and non toxic    Published 177    100s    800

How many will become a new drug?

How do we learn from this big data?

TBDA screened over 1 million, 1 million more to go

TB Alliance + Japanese pharma screens

Page 21: Bigger Data to Increase Drug Discovery

Over 8000 molecules with dose response data for Mtb in CDD Public

from NIAID/SRI

https://app.collaborativedrug.com/register

Page 22: Bigger Data to Increase Drug Discovery

Over 6 years analyzed in vitro data and built models

Workflow:

• Mtb screening molecule database/s
• High-throughput phenotypic Mtb screening
• Descriptors + Bioactivity (+Cytotoxicity)
• Bayesian Machine Learning classification Mtb Model
• Molecule Database (e.g. GSK malaria actives) virtually scored using Bayesian Models
• Top scoring molecules assayed for Mtb growth inhibition
• New bioactivity data may enhance models

Identify in vitro hits and test models:

• 3 x published prospective tests
• ~750 molecules were tested in vitro
• 198 actives were identified
• >20 % hit rate
• Multiple retrospective tests 3-10 fold enrichment

Ekins et al., Pharm Res 31: 414-435, 2014
Ekins et al., Tuberculosis 94: 162-169, 2014
Ekins et al., PLOS ONE 8: e63240, 2013
Ekins et al., Chem Biol 20: 370-378, 2013
Ekins et al., JCIM 53: 3054-3063, 2013
Ekins and Freundlich, Pharm Res 28: 1859-1869, 2011
Ekins et al., Mol BioSyst 6: 840-851, 2010
Ekins et al., Mol BioSyst 6: 2316-2324, 2010
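The hit rate and fold enrichment quoted on this slide are simple ratios. The sketch below shows one common way to compute them, assuming a list of 0/1 activity labels already sorted by model score; the function names and the ten-compound ranked list are hypothetical, and only the 198 and 750 figures come from the slide.

```python
def hit_rate(n_active, n_tested):
    """Fraction of tested molecules that were active."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_active / n_tested


def enrichment_factor(labels_ranked, top_fraction):
    """Hit rate in the top-scoring fraction divided by the hit rate overall.

    labels_ranked: 0/1 activity labels ordered from best to worst model score.
    """
    n = len(labels_ranked)
    total_actives = sum(labels_ranked)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    k = max(1, round(n * top_fraction))
    top_rate = sum(labels_ranked[:k]) / k
    base_rate = total_actives / n
    return top_rate / base_rate


# Figures from the slide: 198 actives among ~750 molecules tested.
prospective_rate = hit_rate(198, 750)
print(f"prospective hit rate = {prospective_rate:.1%}")

# Hypothetical retrospective ranking: 3 actives in 10 compounds, 2 of them in the top 20%.
toy_ranked = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
toy_ef = enrichment_factor(toy_ranked, top_fraction=0.2)
print(f"enrichment in top 20% = {toy_ef:.1f}-fold")
```

With the slide's numbers the prospective hit rate works out to roughly 26%, consistent with the ">20 %" figure.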

Page 23: Bigger Data to Increase Drug Discovery

5 active compounds vs Mtb in a few months

7 tested, 5 active (~71% hit rate)

Ekins et al.,Chem Biol 20, 370–378, 2013

1. Virtually screen 13,533-member GSK antimalarial hit library

2. Bayesian Model = SRI TAACF-CB2 dose response + cytotoxicity model

3. Top 46 commercially available compounds visually inspected

4. 7 compounds chosen for Mtb testing based on

- drug-likeness
- chemotype diversity

GSK #           Bayesian Score   Mtb H37Rv MIC (µg/mL)   GSK Reported % Inhibition HepG2 @ 10 µM cmpd
TCMDC-123868    5.73             >32                     40
TCMDC-125802    5.63             0.0625                  5
TCMDC-124192    5.27             2.0                     4
TCMDC-124334    5.20             2.0                     4
TCMDC-123856    5.09             1.0                     83
TCMDC-123640    4.66             >32                     10
TCMDC-124922    4.55             1.0                     9

(Chemical Structure column: structures appear as images on the slide.)
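The seven tested compounds can be tabulated and counted in a few lines. This is a sketch only: the values are copied from the table on this slide, but the activity cutoff of 2.0 µg/mL is an assumption chosen to reproduce the slide's count of 5 actives, and storing ">32" as infinity is a representation choice, not something stated in the source.

```python
import math
from typing import NamedTuple


class Compound(NamedTuple):
    gsk_id: str
    bayesian_score: float
    mic_ug_per_ml: float       # Mtb H37Rv MIC; ">32" is stored as math.inf
    hepg2_inhibition_pct: int  # GSK reported % inhibition of HepG2 at 10 uM


compounds = [
    Compound("TCMDC-123868", 5.73, math.inf, 40),
    Compound("TCMDC-125802", 5.63, 0.0625, 5),
    Compound("TCMDC-124192", 5.27, 2.0, 4),
    Compound("TCMDC-124334", 5.20, 2.0, 4),
    Compound("TCMDC-123856", 5.09, 1.0, 83),
    Compound("TCMDC-123640", 4.66, math.inf, 10),
    Compound("TCMDC-124922", 4.55, 1.0, 9),
]

# Assumed cutoff (not given on the slide): MIC at or below 2.0 ug/mL counts as active.
MIC_CUTOFF_UG_PER_ML = 2.0

actives = [c for c in compounds if c.mic_ug_per_ml <= MIC_CUTOFF_UG_PER_ML]
observed_hit_rate = len(actives) / len(compounds)

for c in sorted(actives, key=lambda c: c.mic_ug_per_ml):
    print(f"{c.gsk_id}: MIC {c.mic_ug_per_ml} ug/mL, Bayesian score {c.bayesian_score}")
print(f"{len(actives)} of {len(compounds)} active ({observed_hit_rate:.0%})")
```

Note that the highest-scoring compound is inactive in this small set, which is a reminder that the Bayesian score ranks candidates for testing rather than predicting MIC values.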

Page 24: Bigger Data to Increase Drug Discovery

Filling out the triazine matrix using SARtable: A new kind of map

Green = good activity, Red = bad; colored dots are predictions

Page 25: Bigger Data to Increase Drug Discovery

No relationship between internal or external ROC and the number of molecules in the training set?

PCA of combined data and ARRA(red)

Ekins et al., J Chem Inf Model 54: 2157-2165 (2014)

• Internal and leave out 50% x 100 ROC track each other
• External ROC less correlation
• Smaller models do just as well with external testing

~350,000

Page 26: Bigger Data to Increase Drug Discovery

What matters most >70 years of TB mouse in vivo data – Mind the gap - 770 molecules

MIND THE TB GAP

Ekins et al., J Chem Inf Model 54: 1070-82, 2014

Ekins, Nuermberger & Freundlich DDT 19: 1279-1282, 2014

Page 27: Bigger Data to Increase Drug Discovery

In vivo Machine Learning Models

                               RP Forest      RP Single Tree   SVM            Bayesian
ROC 5 fold cross validation    0.75           0.71             0.77           0.73
External test set              3/11 (27.3%)   4/11 (36.4%)     7/11 (63.6%)   8/11 (72.7%)

Ekins et al., J Chem Inf Model 54: 1070-82, 2014
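For readers unfamiliar with the "5 fold cross validation" label, the sketch below shows one generic way to split a dataset into five folds so that every molecule is held out exactly once. It is an illustration under assumptions, not the procedure used in the cited paper: the function name, the shuffling seed, and the use of 770 as the dataset size (the figure given on the "Mind the gap" slide) are choices made here.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)           # fixed seed for reproducibility
    folds = [indices[i::k] for i in range(k)]      # k roughly equal slices
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        yield train, test


# Illustrative size: 770 molecules split into 5 folds.
splits = list(k_fold_indices(770, k=5))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: train on {len(train)}, test on {len(test)}")
```

A model is trained on each training subset and scored on the matching held-out fold; the ROC is then computed from the held-out predictions.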

Page 28: Bigger Data to Increase Drug Discovery

How can we find the in vivo active compounds?

We need a map..

Page 29: Bigger Data to Increase Drug Discovery

Clustering in vivo mouse TB data: Hex plot

>70 years of TB in vivo data
Green = in vivo mouse active
Empty = in vivo inactive
Yellow = 2013-2015 data

Uses Bayesian fingerprints and clustering by similarity

Clark and Ekins - unpublished
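The slide does not say which similarity measure or clustering algorithm sits behind the hex plot. As a rough illustration of "clustering by similarity" on fingerprints, the sketch below uses Tanimoto similarity on sets of fingerprint bits with a simple greedy leader clustering; both choices, the 0.5 threshold, and the toy fingerprints are assumptions.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not bits_a and not bits_b:
        return 0.0
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def leader_cluster(fingerprints, threshold=0.5):
    """Greedy clustering: join the first cluster whose leader is similar enough."""
    leaders = []   # index of the first member of each cluster
    clusters = []  # lists of member indices
    for index, bits in enumerate(fingerprints):
        for cluster_id, leader in enumerate(leaders):
            if tanimoto(bits, fingerprints[leader]) >= threshold:
                clusters[cluster_id].append(index)
                break
        else:
            leaders.append(index)
            clusters.append([index])
    return clusters


# Hypothetical fingerprints, each a set of "on" bit positions.
toy_fingerprints = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {10, 11, 12},
    {10, 11, 13},
    {20},
]
toy_clusters = leader_cluster(toy_fingerprints, threshold=0.5)
print(toy_clusters)
```

Greedy leader clustering depends on input order, so a production workflow would more likely use an order-independent method; the sketch is only meant to show how fingerprint similarity groups related molecules onto neighbouring positions of a map.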

Page 30: Bigger Data to Increase Drug Discovery

Clustering in vivo mouse TB data

>70 years of TB in vivo data
Green = in vivo mouse active
Empty = in vivo inactive
Yellow = 2013-2015

Clark and Ekins - unpublished

Triazine surrounded by inactives

Issues: High Log P, poor solubility

Page 31: Bigger Data to Increase Drug Discovery

How do we ‘increase drug discovery’?

• Make data and models more accessible
• Collaborate
• Share
– Create mobile apps
• Encourage engagement from non scientists

Page 32: Bigger Data to Increase Drug Discovery

MODELS RESIDE IN PAPERS

NOT ACCESSIBLE… THIS IS UNDESIRABLE

How do we share them?

How do we use them?

Page 33: Bigger Data to Increase Drug Discovery

Bayesian models

• CDD Vision

Uses Bayesian algorithm and FCFP_6 fingerprints

Clark et al., J Cheminform 6:38 2014

Page 34: Bigger Data to Increase Drug Discovery

Building Bayesian models for each target in TB Mobile

Predictions for the InhA target: (a) the ROC curve with ECFP_6 and FCFP_6 fingerprints; (b) modified Bayesian estimators for active and inactive compounds; (c) structures of selected binders.

For each listed target with at least two binders, it is first assumed that all of the molecules in the collection that do not indicate this as one of their targets are inactive.

In the app we used ECFP_6 fingerprints

Clark et al., J Cheminform 6:38 2014
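The per-target setup described on this slide (molecules annotated with a target are actives, all others are assumed inactive) can be sketched with a Laplacian-corrected naive Bayesian score, a formulation commonly used with ECFP/FCFP fingerprints. This is an illustration under assumptions, not the app's code: the function names, the toy bit sets, and the "other_target" annotation are hypothetical, and real ECFP_6 generation is replaced by hand-written integer bits.

```python
import math
from collections import Counter


def train_laplacian_bayes(fingerprints, labels):
    """Return a per-bit log contribution from fingerprint bit sets and 0/1 labels."""
    if not labels or len(fingerprints) != len(labels):
        raise ValueError("need equally sized, non-empty inputs")
    base_rate = sum(labels) / len(labels)  # overall fraction of actives
    total, active = Counter(), Counter()
    for bits, label in zip(fingerprints, labels):
        for bit in bits:
            total[bit] += 1
            active[bit] += label
    # Laplacian correction: pulls rarely seen bits toward a neutral contribution of 0.
    return {
        bit: math.log((active[bit] + 1.0) / (total[bit] * base_rate + 1.0))
        for bit in total
    }


def bayes_score(model, bits):
    """Sum the contributions of the bits present; unseen bits contribute nothing."""
    return sum(model.get(bit, 0.0) for bit in bits)


# Hypothetical collection: (fingerprint bits, annotated targets).
collection = [
    ({1, 2, 3}, {"InhA"}),
    ({1, 2}, {"InhA", "other_target"}),
    ({3, 4}, {"other_target"}),
    ({4, 5}, set()),
]
target = "InhA"
fingerprints = [bits for bits, _ in collection]
# Molecules that do not list the target are assumed inactive, as on the slide.
labels = [1 if target in targets else 0 for _, targets in collection]

model = train_laplacian_bayes(fingerprints, labels)
likely_binder = bayes_score(model, {1, 2})
unlikely_binder = bayes_score(model, {4, 5})
print(f"{likely_binder:.2f} vs {unlikely_binder:.2f}")
```

Higher scores indicate more features shared with the annotated binders; the score is a relative ranking value rather than a calibrated probability.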

Page 35: Bigger Data to Increase Drug Discovery

TB Mobile Vers.2

Ekins et al., J Cheminform 5:13, 2013
Clark et al., J Cheminform 6:38 2014

Predict targets

Cluster molecules

http://goo.gl/vPOKS

http://goo.gl/iDJFR

Page 36: Bigger Data to Increase Drug Discovery

Predictions for 2013-2015 in vivo molecules

Page 37: Bigger Data to Increase Drug Discovery

Bayesian models added to mobile apps: MMDS

Page 38: Bigger Data to Increase Drug Discovery

Bayesian models added to mobile apps: Approved drugs

Page 39: Bigger Data to Increase Drug Discovery

Human Microsomal Intrinsic clearance

Human protein binding Solubility pH 7.4

AZ dataset models >1000 molecules

Page 40: Bigger Data to Increase Drug Discovery

Models from ChEMBL data

http://molsync.com/bayesian2

Page 41: Bigger Data to Increase Drug Discovery

What do 2000 ChEMBL models look like

Folding bit size

AverageROC

http://molsync.com/bayesian2
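The "folding bit size" axis refers to how many bits a hashed fingerprint is compressed into before modelling. The slide does not describe the folding scheme, so the sketch below assumes the common approach of taking each hashed feature modulo the bit size; the function name and the feature hashes are made up to show how smaller sizes cause collisions.

```python
def fold_fingerprint(hashed_features, n_bits):
    """Fold arbitrary integer feature hashes into the range 0 .. n_bits - 1."""
    if n_bits <= 0:
        raise ValueError("n_bits must be positive")
    return {feature % n_bits for feature in hashed_features}


# Hypothetical 32-bit feature hashes for one molecule.
toy_hashes = [3, 1027, 2051, 70000, 123456789, 4294967295]

folded_small = fold_fingerprint(toy_hashes, 1024)
folded_large = fold_fingerprint(toy_hashes, 4096)

print(f"1024 bits: {len(folded_small)} distinct bits from {len(toy_hashes)} features")
print(f"4096 bits: {len(folded_large)} distinct bits from {len(toy_hashes)} features")
```

Smaller bit sizes give more compact models but merge distinct features into the same bit, which is the trade-off a plot of average ROC against folding size explores.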

Page 42: Bigger Data to Increase Drug Discovery

Bigger datasets and model collections

• Profiling “big datasets” is going to be the norm.
• A recent study mined PubChem datasets for compounds that have rat in vivo acute toxicity data
• This could be used in other big data initiatives like ToxCast (> 1000 compounds x 800 assays) and Tox21 etc.
• Kinase screening data (1000s mols x 100s assays)
• GPCR datasets etc (1000s mols x 100s assays)

Zhang J, Hsieh JH, Zhu H (2014) Profiling Animal Toxicants by Automatically Mining Public Bioassay Data: A Big Data Approach for Computational Toxicology. PLoS ONE 9(6): e99863. doi:10.1371/journal.pone.0099863

Page 43: Bigger Data to Increase Drug Discovery

• Data is at your fingertips instantly
• Labs add data to a massive corpus of knowledge
• Instantly available to all
• Algorithms for mining, prediction
• Millions of models accessible
• Making decisions on experiments needed and running them
• Data visualization, exploration is real-time, updated
• Data follows you

Sean Ekins, a computational drug discovery consultant at Collaborations in Chemistry in North Carolina, is much more skeptical. He notes pharma companies have found hundreds of antimalaria compounds more potent than TNP-470 and says that he is not convinced Eve can do QSAR. He wants to see Eve go head-to-head with a real computational chemist. “Eve should go back to the Garden of Eden and leave drug discovery to scientists who know what they are doing,” Ekins says.

How close are we?

Page 44: Bigger Data to Increase Drug Discovery

Conclusions

• Computers and models do not replace scientists
• A tool to help us sift through ideas quickly
• Many examples have led to leads
• Bigger data not needed for good models
• More data becoming public
• Can model ADME, bioactivity and more
• Collaboration and software are important
• Mobile apps have useful cheminformatics features - aid anyone to do drug discovery
• Models are compact < 1MB and portable
• The age of model sharing is here

Page 45: Bigger Data to Increase Drug Discovery

Wanted

• “Bigger” small molecule screening datasets
• Preferably > 500,000 – 1,000,000 molecules with data
• To test how machine learning algorithms scale

• Contact [email protected]

Page 46: Bigger Data to Increase Drug Discovery

Nadia Litterman, Krishna Dole and all at CDD, Megan Coffee, SRI, MM4TB and many others …

Funding: Bill and Melinda Gates Foundation (Grant#49852), 1R41AI088893-01, 2R42AI088893-02, R43 LM011152-01, 9R44TR000942-02, 1R41AI108003-01, 1U19AI109713-01, MM4TB

Software: Biovia

Freundlich Lab