digging out structures for repurposing: non-competitive intelligence
DESCRIPTION
Prepared as a visitor seminar for PubChem, April 2013
TRANSCRIPT
[1]
Digging out Structures for Repurposing: Non-competitive Intelligence
PubChem Seminar April 2013
Christopher Southan, TW2Informatics, Göteborg, Sweden
[2]
Dr Christopher Southan, Ph.D., M.Sc., B.Sc.
TW2Informatics: http://www.cdsouthan.info/Consult/CDS_cons.htm
Mobile: +46(0)702-530710
Skype: cdsouthan
Email: [email protected]
Twitter: http://twitter.com/#!/cdsouthan
Blog: http://cdsouthan.blogspot.com/
LinkedIn: http://www.linkedin.com/in/cdsouthan
Publications: http://www.citeulike.org/user/cdsouthan/order/year,,/publications
Presentations: http://www.slideshare.net/cdsouthan
[3]
Outline
• Trawling for repurposing-relevant data
• Code name statistics and name > structure triage
• The NCATS/MRC challenge
• Story of JNJ-39393406
• Scaling up code name hunting and x-mapping
• Code names in clinical trials, MeSH, PubChem
• Story of PF-04457845
• Trials, MeSH and PubChem code name intersects
• Conclusions
[4]
Intelligence: trawling compound information
Competitive
• Directed towards commercially positioning and/or repurposing own portfolio
• Major big pharma activity
• Mixed commercial/public sources
• Internal specialists
• Typically a closed activity (i.e. little open “best practice”)
• Typically therapeutic area aligned
Non-competitive
• Directed towards repositioning any compound
• Collaborative approaches to IP holders (but new IP possible)
• Can utilise public resources alone
• Different domain expert entry points
• Predominantly an open activity (e.g. OSDD)
• Can be hypothesis-neutral
[5]
Structures: connecting to repurposing-relevant data
• Code names and synonyms
• Resolving these to structures
• Database entries
• BioAssay results
• Target/pathway links
• In vitro & in vivo research papers
• Clinical trial results and papers
• Patents for analogues and SAR
• Comparative in vivo data
• Mendelian and GWAS disease links
• Expression data for compounds
• In silico modeling (including rare or NTDs)
• Vendor similarity matches
[6]
Code names: 2-15 year information hole
Pharmaprojects 2009-10 figures
[7]
Drugs, code names, INN/USANs and structures: few congruent hard numbers
• Pharmaprojects (2013) drug profiles ~ 50,000
• Thomson Reuters Cortellis (2012) drug monographs = 41,889
• Pharmaprojects (via ProQuest, 2012) records ~ 35,000
• Thomson Reuters Partnering (2011 structures, PMID: 22024215) = 17,901
• Pharmaprojects (2003 structures) = 14,000
• ChEMBL USANs (2013) = 10,568
• PubChem (2013) “USAN [synonym] OR INN [synonym]” = 9,890
• Pharmaprojects (2010 in development, no structure count) = 9,737
• GVKBIO Clinical Candidate structures (2008, PMID: 20298516) = 8,864
• Pharmaprojects (2010 review, no structures) Phase 1+2+3 = 3,828
[8]
Code names: major repurposing potential – but..
• ~ 95% of the 30K are/will become “parked” or “abandoned”
• Can be repurposed in silico at least
• Obvious hierarchy: leads > development > clinical trials > INN > approved
• Problems
– New code names < 50% - 70% blinded (i.e. no structures)
– Some older code names never un-blinded
– Code naming practices independent and completely ad hoc
– Publications, conference reports, clinical trials entries, press releases and portfolio listings linked to “blinded” code names (no structures)
– Even for public declarations (e.g. papers) data linked into “the system” (e.g. synonym mapping) is patchy
– Code originators do not provenance public database entries
– Data supporting non-progression decisions rarely disclosed
– http://chembl.blogspot.se/p/research-code-stems.html 100’s of codes
[9]
Code name-to-structure mapping triage
Dig out the code names
PubChem Substance
PubChem Compound
PubMed/MeSH
Google Scholar
Google Images
Google open (filtered)
Name/image > structure
• chemicalize.org, OPSIN, Chemical Identifier Resolver, sketchers, OSRA
• Cross-checks:
– SMILES/SDF/InChI strings in PubChem and ChemSpider
– InChIKey in Google
– SureChemOpen patent search
– Clinicaltrials.gov
– Synonym trawling
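The first triage step, digging code names out of free text, can be sketched with a regular expression. This is only an illustration: the pattern (a 2-4 letter company stem, an optional hyphen or space, then 4-9 digits) and the function name `extract_code_names` are assumptions, not something from the slides. Because code naming is ad hoc, any such pattern will miss some codes and over-match others.

```python
import re

# Hypothetical pattern: 2-4 capital letters (company stem), an optional
# hyphen or space, then 4-9 digits. A first-pass filter only.
CODE_NAME = re.compile(r"\b([A-Z]{2,4})[- ]?(\d{4,9})\b")


def extract_code_names(text):
    """Return code names as STEM-DIGITS, de-duplicated, in order of first appearance."""
    seen = []
    for stem, digits in CODE_NAME.findall(text):
        name = f"{stem}-{digits}"  # collapse hyphen/space/no-separator variants
        if name not in seen:
            seen.append(name)
    return seen


sample = ("JNJ-39393406 and JNJ39393406 appear in trial records; "
          "PF-04457845 is in PubMed.")
print(extract_code_names(sample))
```

Collapsing the separator variants to one form before searching matters because the same code may be written with or without the hyphen in different sources.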
[10]
The NCATS/MRC industry sponsored repurposing exercise: the joy of code lists
[11]
NCATS/MRC repurposing candidates
http://cdsouthan.blogspot.se/2012/09/mrc-22-vs-ncats-58-repurposing-lists.html
[12]
NCATS/MRC: summary statistics
• 70 code names – no structures
• 18 INNs & 4 codes-only in PubChem
• 24 strucs “dug out” but PubChem -ve
• 24 codes remain blinded
PMID 23159359
[13]
Sleuthing down a JNJ-39393406 structure: from darkness to twilight
[14]
JNJ-39393406: NCATS documentation, PubChem -ve
[15]
JNJ-39393406: ClinicalTrials.gov
[16]
JNJ-39393406 in PubMed
[17]
JNJ-39393406: open Google
[18]
JNJ-39393406: Google Scholar (was) structure -ve
[19]
JNJ-39393406 in Google images: finally a mapping
But where did these two vendors get their mapping from?
[20]
(Probable) JNJ-39393406 in PubChem: CID 1675566 patent-only sources and near-neighbours
[21]
(Probable) JNJ-39393406: SureChemOpen patent match with corroborative data
PubChem SID 152835708
Cf NCATS data
[22]
More JNJ-39393406 mystery: InChIKey in Google > ChemSpider > 3rd vendor
[23]
Not all JNJ-s are blinded: JNJ-40418677
IUPAC name in abstract but code still PubChem -ve
IUPAC name converted at chemicalize.org for PubChem mapping
[24]
Scaling-up code name retrieval: wild card searches
[25]
Phases & codes in Clinicaltrials.gov: thin on results
• Interventional studies = 115356, 7895 with results (7%)
• Results | Interventional Studies | Phase 1, 2, 3 | Industry = 4477
• Interventional Studies | GSK* | Phase 1, 2, 3 | Industry = 1004
• Results | Interventional Studies | GSK* | Phase 1, 2, 3 | Industry = 122 (12%)
• Interventional Studies | GSK* OR AZD* OR JNJ* OR PF0* | Phase 1, 2, 3 | Industry = 1640
• Results | Interventional Studies | GSK* OR AZD* OR JNJ* OR PF0* | Phase 1, 2, 3 | Industry = 185 (11%)
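The percentages on this slide follow directly from the counts quoted. A minimal check, using only the figures above (the helper name `pct_with_results` and the dictionary labels are illustrative):

```python
def pct_with_results(with_results, total):
    """Share of studies that have posted results, as a whole-number percentage."""
    return round(100 * with_results / total)


# (studies with results, all studies) as quoted on the slide
counts = {
    "All interventional studies": (7895, 115356),
    "GSK* | Phase 1, 2, 3 | Industry": (122, 1004),
    "GSK* OR AZD* OR JNJ* OR PF0* | Phase 1, 2, 3 | Industry": (185, 1640),
}

for label, (with_results, total) in counts.items():
    print(f"{label}: {with_results}/{total} = {pct_with_results(with_results, total)}%")
```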
[26]
alltrials.net: public pressure > more results > more repurposing opportunities
http://www.youtube.com/watch?v=lQ6YTU5kGXw&feature=youtu.be&t=28m39s
[27]
Stemming code names in MeSH
[28]
Code names in PubChem Compound (CIDs)
CID:SID ratio 275:1039
[29]
Codes in PubChem: selected matches
[30]
“GSK-” in ChEMBL: 61
[31]
Tracking PF-04457845 through the system
[32]
PubMed intersects: finding PF-04457845
[33]
PF-04457845: PubMed
[34]
PF-04457845: ClinicalTrials.gov
[35]
PF-04457845: PubChem CID 24771824
Substance (SID) capture of activity, vendor and patent sources
[36]
Wikipedia: links to other development compounds
But who put them in?
[37]
PF-04457845: (almost) a total system success
• Declared efficacy failure > possible repurposing candidate
• Selection of analogues and a probe [18F]PF-9811 (CID 70679467)
• The “system” did well because of good publishing practice (e.g. full text)
• Code, structure, target, papers, trials and patents all connected
• 5mg for $275
But:
• Serendipitous finding (no “efficacy failure” or “study stopped” tags)
• Lack of ClinicalTrials.gov <> PubMed links
• BindingDB using deprecated ChEBI ID
• PMID:21505060 not yet in ChEMBL
• No direct target or patent nos. in CID record because no DrugBank, SCRIPDB or IBM capture
• [18F]PF-9811 PubChem, [(18)F]PF-9811 PubMed, PF-9811-18F Books
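The three spellings of the probe name in the last bullet show why synonym mapping is patchy: the same compound will not match across sources by simple string comparison. A minimal sketch of normalising such isotope-label variants to one form follows. The function name `normalise_isotope_label` and the choice of `[18F]PF-9811` as the target form are assumptions for illustration, and the two patterns cover only the variants listed on this slide.

```python
import re


def normalise_isotope_label(name):
    """Rewrite '[18F]X', '[(18)F]X' and 'X-18F' to the single form '[18F]X'."""
    name = name.strip()
    # Prefix variants: [18F]PF-9811 or [(18)F]PF-9811
    m = re.match(r"^\[\(?(\d+)\)?([A-Z][a-z]?)\](.+)$", name)
    if m:
        mass, element, rest = m.groups()
        return f"[{mass}{element}]{rest}"
    # Suffix variant: PF-9811-18F
    m = re.match(r"^(.+)-(\d+)([A-Z][a-z]?)$", name)
    if m:
        rest, mass, element = m.groups()
        return f"[{mass}{element}]{rest}"
    return name  # unlabelled names pass through unchanged


variants = ["[18F]PF-9811", "[(18)F]PF-9811", "PF-9811-18F"]
print({normalise_isotope_label(v) for v in variants})
```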
[38]
Looking at code name intersects in different parts of the system
[39]
ClinicalTrials.gov JNJ* word cloud
JNJ-28431754 = Canagliflozin = CID 24812758
[40]
Company Pipelines: GSK codes for 2012
[41]
GSK codes: PubChem vs. 2012 Pipeline
[42]
Clinical Trials, PubChem, MeSH: GSK
[43]
Clinical Trials, PubChem, MeSH: JNJ
[44]
Clinical, PubChem, MeSH, & 2012 Pipeline: GSK
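The intersect comparisons on the preceding slides amount to set operations on code name lists from each source. A minimal sketch is given below. The code names are placeholders (`CODE-A` to `CODE-E`), not real data from the slides, and the function name `intersects` is an assumption.

```python
def intersects(sources):
    """Return (names found in every source, names unique to each source)."""
    in_all = set.intersection(*sources.values())
    unique = {}
    for label, names in sources.items():
        # Union of every other source's names
        others = set().union(*(v for k, v in sources.items() if k != label))
        unique[label] = names - others
    return in_all, unique


# Placeholder code names only; real lists would come from each source's search.
sources = {
    "ClinicalTrials.gov": {"CODE-A", "CODE-B", "CODE-C"},
    "PubChem": {"CODE-B", "CODE-C", "CODE-D"},
    "MeSH": {"CODE-C", "CODE-E"},
}

in_all, unique = intersects(sources)
print("In all sources:", sorted(in_all))
for label, names in unique.items():
    print(f"Only in {label}:", sorted(names))
```

Names should be normalised to one spelling before comparison, otherwise separator or case differences will understate the overlap.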
[45]
Conclusions
• Stalled development candidates, designated by company codes, constitute a large potential repurposing information estate
• Historical in vitro, pharmacological & clinical data linked to ~ 30K codes
• But only 40-50% have structures assignable from open sources
• An even smaller proportion have code names in PubChem
• Public name>struc>data capture is ad hoc and needs improving
• Repurposing-relevant relationships are not easy to dig out
• Some “non competitive intelligence” approaches are shown here
• The big push for transparency and open access should improve disclosure, data capture, linkage and repurposing opportunities
Happy hunting!
TED Talk: Francis Collins: We need better drugs -- now
http://www.ted.com/talks/francis_collins_we_need_better_drugs_now.html