Digging out Structures for Repurposing: Non-competitive Intelligence

Uploaded by chris-southan on 11 June 2015

DESCRIPTION

Prepared as a visitor seminar for PubChem, April 2013.

TRANSCRIPT

Page 1: Digging out Structures for Repurposing: Non-competitive Intelligence

[1]

Digging out Structures for Repurposing: Non-competitive Intelligence

PubChem Seminar April 2013

Christopher Southan, TW2Informatics, Göteborg, Sweden

[2]

Dr Christopher Southan, Ph.D., M.Sc., B.Sc.
TW2Informatics: http://www.cdsouthan.info/Consult/CDS_cons.htm
Mobile: +46(0)702-530710
Skype: cdsouthan
Email: [email protected]
Twitter: http://twitter.com/#!/cdsouthan
Blog: http://cdsouthan.blogspot.com/
LinkedIn: http://www.linkedin.com/in/cdsouthan
Publications: http://www.citeulike.org/user/cdsouthan/order/year,,/publications
Presentations: http://www.slideshare.net/cdsouthan

[3]

Outline

• Trawling for repurposing-relevant data
• Code name statistics and name > structure triage
• The NCATS/MRC challenge
• Story of JNJ-39393406
• Scaling up code name hunting and cross-mapping
• Code names in clinical trials, MeSH, PubChem
• Story of PF-04457845
• Trials, MeSH and PubChem code name intersects
• Conclusions

[4]

Intelligence: trawling compound information

Competitive

• Directed towards commercial positioning and/or repurposing of the company's own portfolio
• Major big pharma activity
• Mixed commercial/public sources
• Internal specialists
• Typically a closed activity (i.e. little open “best practice”)
• Typically aligned to a therapeutic area

Non-competitive

• Directed towards repositioning any compound
• Collaborative approaches to IP holders (but new IP possible)
• Can utilise public resources alone
• Different domain expert entry points
• Predominantly an open activity (e.g. OSDD)
• Can be hypothesis-neutral

[5]

Structures: connecting to repurposing-relevant data

• Code names and synonyms
• Resolving these to structures
• Database entries
• BioAssay results
• Target/pathway links
• In vitro & in vivo research papers
• Clinical trial results and papers
• Patents for analogues and SAR
• Comparative in vivo data
• Mendelian and GWAS disease links
• Expression data for compounds
• In silico modeling (including rare diseases and NTDs)
• Vendor similarity matches

[6]

Code names: 2-15 year information hole

Pharmaprojects 2009-10 figures

[7]

Drugs, code names, INNs/USANs and structures: few congruent hard numbers

• Pharmaprojects (2013) drug profiles ~ 50,000
• Thomson Reuters Cortellis (2012) drug monographs = 41,889
• Pharmaprojects (via ProQuest, 2012) records ~ 35,000
• Thomson Reuters Partnering (2011 structures, PMID: 22024215) = 17,901
• Pharmaprojects (2003 structures) = 14,000
• ChEMBL USANs (2013) = 10,568
• PubChem (2013) “USAN [synonym] OR INN [synonym]” = 9,890
• Pharmaprojects (2010 in development, no structure count) = 9,737
• GVKBIO Clinical Candidate structures (2008, PMID:20298516) = 8,864
• Pharmaprojects (2010 review, no structures) Phase 1+2+3 = 3,828

[8]

Code names: major repurposing potential – but...

• ~ 95% of the ~ 30K code names are, or will become, “parked” or “abandoned”
• Can be repurposed in silico at least
• Obvious hierarchy: leads > development > clinical trials > INN > approved

• Problems
– New code names: < 50% - 70% blinded (i.e. no structures)
– Some older code names never un-blinded
– Code naming practices are independent and completely ad hoc
– Publications, conference reports, clinical trial entries, press releases and portfolio listings are linked to “blinded” code names (no structures)
– Even for public declarations (e.g. papers), linkage of data into “the system” (e.g. synonym mapping) is patchy
– Code originators do not provide provenance for public database entries
– Data supporting non-progression decisions are rarely disclosed
– Hundreds of research code stems: http://chembl.blogspot.se/p/research-code-stems.html
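Because code naming is ad hoc, the same compound surfaces as "PF-04457845", "PF04457845" or "pf 04457845" depending on the source. A minimal sketch of one way to collapse such variants into a single matching key, assuming a code is an alphabetic stem followed by digits; the regex, the leading-zero stripping and the function names are hypothetical choices, not an established convention:

```python
import re
from typing import List, Optional

# Assumed shape of a company code: 2-6 letters, optional separator, digits.
CODE_RE = re.compile(r"^\s*([A-Za-z]{2,6})[\s_-]*(\d+)\s*$")


def canonical_code(name: str) -> Optional[str]:
    """Return a 'STEM-DIGITS' matching key (leading zeros dropped), or None if not code-like."""
    m = CODE_RE.match(name)
    if m is None:
        return None
    stem, digits = m.groups()
    return f"{stem.upper()}-{digits.lstrip('0') or '0'}"


def search_variants(name: str) -> List[str]:
    """Spellings worth querying separately: hyphenated, run-together and spaced."""
    m = CODE_RE.match(name)
    if m is None:
        return [name]
    stem, digits = m.group(1).upper(), m.group(2)
    return [f"{stem}-{digits}", f"{stem}{digits}", f"{stem} {digits}"]
```

Keying records on `canonical_code` lets synonym lists, trial entries and paper mentions be joined even when punctuation differs; `search_variants` keeps the original digits so each spelling can still be searched literally.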

[9]

Code name-to-structure mapping triage

Dig out the code names

PubChem Substance

PubChem Compound

PubMed/MeSH

Google Scholar

Google Images

Google open (filtered)

Name/image > structure

• chemicalize.org, OPSIN, Chemical Identifier Resolver, sketchers, OSRA

• Cross-checks:
– SMILES/SDF/InChI strings in PubChem and ChemSpider
– InChIKey in Google
– SureChemOpen patent search
– Clinicaltrials.gov
– Synonym trawling
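The first PubChem steps of this triage can be scripted against PubChem's PUG REST service. A minimal sketch covering only the name-to-CID lookup in PubChem Compound; the helper names are hypothetical, and treating an HTTP 404 as "no match" is an assumption about the service's behaviour:

```python
import json
from typing import List
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cids_url(name: str) -> str:
    """PUG REST URL asking PubChem Compound for CIDs whose synonyms match a name."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def parse_cids(payload: str) -> List[int]:
    """Pull CIDs out of an IdentifierList response; empty list if none are present."""
    data = json.loads(payload)
    return data.get("IdentifierList", {}).get("CID", [])


def lookup_cids(name: str) -> List[int]:
    """Live lookup (needs network access); a 404 is assumed to mean 'PubChem -ve'."""
    try:
        with urlopen(name_to_cids_url(name), timeout=30) as resp:
            return parse_cids(resp.read().decode("utf-8"))
    except HTTPError as err:
        if err.code == 404:
            return []
        raise
```

An empty result from `lookup_cids` only says the name is not a PubChem synonym; as the slides show, the structure may still be dug out via Google Images, patents or vendors.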

[10]

The NCATS/MRC industry sponsored repurposing exercise: the joy of code lists

[11]

NCATS/MRC repurposing candidates

http://cdsouthan.blogspot.se/2012/09/mrc-22-vs-ncats-58-repurposing-lists.html

[12]

NCATS/MRC: summary statistics

• 70 code names supplied without structures
• 18 found in PubChem via INNs and 4 via code names only
• 24 structures “dug out” but not in PubChem (PubChem -ve)
• 24 codes remain blinded

PMID 23159359
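The four buckets above partition all 70 codes (18 + 4 + 24 + 24 = 70). A minimal sketch of that triage as a classifier; the record fields, the order in which evidence is checked and the placeholder code names are all hypothetical, chosen only to reproduce the slide's split:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CodeRecord:
    code: str
    inn_in_pubchem: bool = False     # structure found in PubChem via an INN synonym
    code_in_pubchem: bool = False    # found in PubChem via the code name alone
    structure_dug_out: bool = False  # structure found elsewhere (papers, patents, vendors)


def triage(rec: CodeRecord) -> str:
    """Assign exactly one bucket, checking PubChem evidence first (assumed precedence)."""
    if rec.inn_in_pubchem:
        return "INN in PubChem"
    if rec.code_in_pubchem:
        return "code only in PubChem"
    if rec.structure_dug_out:
        return "dug out, PubChem -ve"
    return "still blinded"


# Placeholder records reproducing the 18 / 4 / 24 / 24 split.
records = (
    [CodeRecord(f"CODE-{i}", inn_in_pubchem=True) for i in range(0, 18)]
    + [CodeRecord(f"CODE-{i}", code_in_pubchem=True) for i in range(18, 22)]
    + [CodeRecord(f"CODE-{i}", structure_dug_out=True) for i in range(22, 46)]
    + [CodeRecord(f"CODE-{i}") for i in range(46, 70)]
)
tally = Counter(triage(r) for r in records)
print(tally)
```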

[13]

Sleuthing down a JNJ-39393406 structure: from darkness to twilight

[14]

JNJ-39393406: NCATS documentation, PubChem -ve

[15]

JNJ-39393406: ClinicalTrials.gov

[16]

JNJ-39393406 in PubMed

[17]

JNJ-39393406: open Google

[18]

JNJ-39393406: Google Scholar (was) structure -ve

[19]

JNJ-39393406 in Google images: finally a mapping

But where did these two vendors get their mapping from?

[20]

(Probable) JNJ-39393406 in PubChem: CID 1675566 patent-only sources and near-neighbours

[21]

(Probable) JNJ-39393406: SureChemOpen patent match with corroborative data

PubChem SID 152835708

Cf. NCATS data

[22]

More JNJ-39393406 mystery: InChIKey in Google > ChemSpider > 3rd vendor
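Searching on an InChIKey works because the key has a fixed 27-character layout: a 14-character connectivity block, a 10-character block (stereo/isotope hash plus flag and version characters) and a single protonation character. A minimal sketch of checking that layout and comparing two keys found at different vendors; the function names and the three-way labels are hypothetical:

```python
import re

# 14 uppercase letters - 10 uppercase letters - 1 uppercase letter
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def is_inchikey(s: str) -> bool:
    """True if the string has the InChIKey layout (does not prove it decodes to a real InChI)."""
    return INCHIKEY_RE.match(s.strip()) is not None


def compare_keys(key_a: str, key_b: str) -> str:
    """'identical', 'same skeleton' (first block only, e.g. stereo variants) or 'different'."""
    a, b = key_a.strip(), key_b.strip()
    if not (is_inchikey(a) and is_inchikey(b)):
        raise ValueError("input is not in InChIKey format")
    if a == b:
        return "identical"
    if a[:14] == b[:14]:
        return "same skeleton"
    return "different"
```

A "same skeleton" result flags vendor entries that share connectivity but may differ in stereochemistry, which is one reason two vendors can disagree about what a code name maps to.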

[23]

Not all JNJ- codes are blinded: JNJ-40418677 IUPAC name in abstract but code still PubChem -ve

IUPAC name converted at chemicalize.org for PubChem mapping
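The slide used chemicalize.org for the IUPAC name conversion; that step is not scripted here. As an alternative, a minimal sketch using two other open name-to-structure services listed in the triage (OPSIN and the Chemical Identifier Resolver). The URL patterns are assumptions about the current public endpoints and should be checked against each service's documentation; the helper names are hypothetical:

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen

# Assumed endpoint patterns; verify before relying on them.
OPSIN_BASE = "https://opsin.ch.cam.ac.uk/opsin"
CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def opsin_smiles_url(name: str) -> str:
    """OPSIN parses systematic (IUPAC-style) names into structures."""
    return f"{OPSIN_BASE}/{quote(name, safe='')}.smi"


def cir_smiles_url(name: str) -> str:
    """CIR also resolves trivial names and some synonyms."""
    return f"{CIR_BASE}/{quote(name, safe='')}/smiles"


def name_to_smiles(name: str) -> Optional[str]:
    """Try OPSIN first, then CIR; return the first SMILES line, or None (needs network)."""
    for url in (opsin_smiles_url(name), cir_smiles_url(name)):
        try:
            with urlopen(url, timeout=30) as resp:
                text = resp.read().decode("utf-8").strip()
        except OSError:  # covers HTTPError, URLError and socket timeouts
            continue
        if text:
            return text.splitlines()[0]
    return None
```

The resulting SMILES can then be searched in PubChem by structure rather than by name, which is how a code that is PubChem -ve can still be mapped to a CID.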

[24]

Scaling up code name retrieval: wildcard searches
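Wildcard queries such as GSK* or JNJ* have a rough equivalent when harvesting code names from downloaded text (trial titles, abstracts, pipeline pages). A minimal sketch, assuming a code is a known stem, an optional hyphen or space, then at least three digits; the stem list and the digit threshold are hypothetical choices:

```python
import re
from typing import List

# Hypothetical stems mirroring the wildcard queries used in these slides.
STEMS = ["GSK", "AZD", "JNJ", "PF"]


def code_pattern(stems: List[str]):
    """Compile stem + optional hyphen/space + 3 or more digits, bounded by word breaks."""
    alternation = "|".join(re.escape(s) for s in stems)
    return re.compile(rf"\b(?:{alternation})[- ]?\d{{3,}}\b")


def find_codes(text: str, stems: List[str] = STEMS) -> List[str]:
    """Return code-like strings in order of first appearance, without duplicates."""
    found: List[str] = []
    for m in code_pattern(stems).finditer(text):
        if m.group(0) not in found:
            found.append(m.group(0))
    return found
```

Usage: `find_codes("Phase 2 of PF-04457845 vs JNJ-39393406; PF-04457845 again; also GSK1363089 and AZD 0530")` returns each code once, in the order seen. Stems that collide with ordinary abbreviations would need extra filtering.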

[25]

Phases & codes in Clinicaltrials.gov: thin on results

• Interventional studies = 115356, of which 7895 have results (7%)
• Results | Interventional Studies | Phase 1, 2, 3 | Industry = 4477
• Interventional Studies | GSK* | Phase 1, 2, 3 | Industry = 1004
• Results | Interventional Studies | GSK* | Phase 1, 2, 3 | Industry = 122 (12%)
• Interventional Studies | GSK* OR AZD* OR JNJ* OR PF0* | Phase 1, 2, 3 | Industry = 1640
• Results | Interventional Studies | GSK* OR AZD* OR JNJ* OR PF0* | Phase 1, 2, 3 | Industry = 185 (11%)
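The percentages above are simply studies-with-results divided by matching studies. A short check of that arithmetic using the counts quoted on the slide (the function name is illustrative):

```python
def result_rate(total: int, with_results: int) -> float:
    """Percentage of registered studies that have posted results."""
    return 100.0 * with_results / total


# (matching studies, studies with results) as quoted on the slide
QUERIES = {
    "All interventional studies": (115356, 7895),
    "GSK*, industry, phases 1-3": (1004, 122),
    "GSK* OR AZD* OR JNJ* OR PF0*, industry, phases 1-3": (1640, 185),
}

for label, (total, with_results) in QUERIES.items():
    print(f"{label}: {with_results}/{total} = {result_rate(total, with_results):.1f}%")
```

This prints 6.8%, 12.2% and 11.3%, which the slide rounds to 7%, 12% and 11%.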

[26]

alltrials.net: public pressure > more results > more repurposing opportunities

http://www.youtube.com/watch?v=lQ6YTU5kGXw&feature=youtu.be&t=28m39s

[27]

Stemming code names in MeSH

[28]

Code names in PubChem Compound (CIDs)

CID:SID ratio 275:1039

[29]

Codes in PubChem: selected matches

[30]

“GSK-” in ChEMBL: 61

[31]

Tracking PF-04457845 through the system

[32]

PubMed intersects: finding PF-04457845

[33]

PF-04457845: PubMed

[34]

PF-04457845: Clinicaltrials.gov

[35]

PF-04457845: PubChem CID 24771824

Substance (SID) capture of activity, vendor and patent sources

[36]

Wikipedia: links to other development compounds

But who put them in ?

[37]

PF-04457845: (almost) a total system success

• Declared efficacy failure > possible repurposing candidate
• Selection of analogues and a probe, [18F]PF-9811 (CID 70679467)
• The “system” did well because of good publishing practice (e.g. full text)
• Code, structure, target, papers, trials and patents all connected
• 5 mg for $275

But:
• Serendipitous finding (no “efficacy failure” or “study stopped” tags)
• No Clinicaltrials.gov <> PubMed linkage
• BindingDB uses a deprecated ChEBI ID
• PMID:21505060 not yet in ChEMBL
• No direct target or patent numbers in the CID record, because there is no DrugBank, SCRIPDB or IBM capture
• Inconsistent probe naming: [18F]PF-9811 (PubChem), [(18)F]PF-9811 (PubMed), PF-9811-18F (Books)
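The three spellings of the PET probe differ only in how the isotope label is written. A minimal sketch of splitting such names into a parent name and a label so they can be matched; the two regexes and the short isotope whitelist (used to avoid misreading code suffixes as labels) are hypothetical choices:

```python
import re
from typing import Optional, Tuple

# Small, assumed whitelist of radio/stable labels; extend as needed.
KNOWN_LABELS = {"3H", "11C", "13C", "14C", "13N", "15O", "18F", "123I", "124I", "125I"}

PREFIX_RE = re.compile(r"^\[\(?(\d+)\)?([A-Z][a-z]?)\](.+)$")  # [18F]X or [(18)F]X
SUFFIX_RE = re.compile(r"^(.+?)-(\d+)([A-Z][a-z]?)$")          # X-18F


def split_label(name: str) -> Tuple[str, Optional[str]]:
    """Return (parent name, isotope label such as '18F'), or (name, None) if unlabelled."""
    m = PREFIX_RE.match(name)
    if m:
        mass, element, parent = m.groups()
        if mass + element in KNOWN_LABELS:
            return parent, mass + element
    m = SUFFIX_RE.match(name)
    if m:
        parent, mass, element = m.groups()
        if mass + element in KNOWN_LABELS:
            return parent, mass + element
    return name, None
```

All three forms on the slide reduce to `("PF-9811", "18F")`, while an unlabelled code such as PF-04457845 passes through unchanged.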

[38]

Looking at code name intersects in different parts of the system

[39]

Clinicaltrials.gov JNJ* word cloud

JNJ-28431754 = Canagliflozin = CID 24812758

[40]

Company Pipelines: GSK codes for 2012

[41]

GSK codes: PubChem vs. 2012 Pipeline

[42]

Clinical Trials, PubChem, MeSH: GSK

[43]

Clinical Trials, PubChem, MeSH: JNJ

[44]

Clinical, PubChem, MeSH & 2012 Pipeline: GSK
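The intersect comparisons across trials, PubChem, MeSH and the 2012 pipeline come down to set bookkeeping: each code belongs to exactly one Venn region, defined by the set of sources it appears in. A minimal sketch, ideally run after normalising code-name punctuation; the source labels and placeholder codes are hypothetical:

```python
from typing import Dict, FrozenSet, Set


def venn_regions(sources: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Group each code by the exact combination of sources that contain it."""
    membership: Dict[str, Set[str]] = {}
    for source, codes in sources.items():
        for code in codes:
            membership.setdefault(code, set()).add(source)
    regions: Dict[FrozenSet[str], Set[str]] = {}
    for code, found_in in membership.items():
        regions.setdefault(frozenset(found_in), set()).add(code)
    return regions


# Placeholder data: three sources with partially overlapping code lists.
example = {
    "trials": {"CODE-1", "CODE-2", "CODE-3"},
    "pubchem": {"CODE-2", "CODE-3", "CODE-4"},
    "mesh": {"CODE-3"},
}
for region, codes in sorted(venn_regions(example).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(region), sorted(codes))
```

Region sizes give the Venn counts directly, and the codes in the "trials only" region are the ones with clinical exposure but no public structure or indexing.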

[45]

Conclusions

• Stalled development candidates, designated by company codes, constitute a large potential repurposing information estate

• Historical in vitro, pharmacological and clinical data are linked to ~ 30K codes
• But only 40-50% have structures assignable from open sources
• An even smaller proportion have code names in PubChem
• Public name > structure > data capture is ad hoc and needs improving
• Repurposing-relevant relationships are not easy to dig out
• Some “non-competitive intelligence” approaches are shown here
• The big push for transparency and open access should improve disclosure, data capture, linkage and repurposing opportunities

Happy hunting!

TED Talk: Francis Collins: We need better drugs -- now: http://www.ted.com/talks/francis_collins_we_need_better_drugs_now.html