correct drug structures for pharmacology
1
How can pharmacologists know which drug structures are correct?
Christopher Southan, Elena Faccenda, Simon J. Harding, Joanna L. Sharman, Adam J. Pawson, and Jamie A. Davies
IUPHAR/BPS Guide to Pharmacology (GtoPdb), University of Edinburgh, Centre for Integrated Physiology, EH8 9XD, UK.
Presentation for BPS | Pharmacology 2016, London. Scheduled for Wed, Dec 14, 2:15 PM
http://www.slideshare.net/cdsouthan/correct-drug-structures-for-pharmacology
2
Abstract (not shown in the talk; should be online at BPS)
Introduction: Human medicines represent the crown jewels of pharmacology. Paradoxically, however, there is neither a “Gold Standard” set of approved chemical structures nor agreement on their total number. A 2009 comparison of three sets of approved drugs recorded only 807 exact structures in common out of an expected ~1200 [1]. The IUPHAR/BPS Guide to Pharmacology (GtoPdb) team has grappled with this discordance when curating approved drugs and all ~6000 small-molecule ligands we deposit into PubChem [2]. Users face the same challenge of deciding which structures are correct when procuring compounds for experiments or navigating links between journals and databases. This work examines the problems and partial solutions.

Methods: We used PubChem to explore relationships for selected drugs already curated into GtoPdb. Tools included the “same connectivity” operator, which retrieves the distinct compound records (CIDs) that share the same atom connectivity while differing in, for example, stereochemistry or isotopes. We divided the causes of structural multiplexing into stereo differences, mixtures and isotopic derivatives. We then performed Venn-type comparisons between DrugBank, ChEMBL and the Therapeutic Target Database. Additional metrics were generated to dissect the factors contributing to discordance between these three and other sources.

Results: Atorvastatin has 51 different single-component representations in PubChem and 248 mixtures, while paclitaxel (taxol) has 142 and 330, respectively. Comparing the three manually curated drug sets inside PubChem showed that their consensus was only 25% of the sum. Comparisons with other drug sources also showed discordance. Causes of CID multiplexing discordance will be presented. Using PubChem tools, we assessed a curation strategy of selecting the CIDs whose structures are supported by the majority of submitting sources. While not infallible, comparison with INN documentation indicated that it is effective. We will also show how tagging our own approved-drug records enables easy retrieval of just these entries from PubChem, but also that vendor drug names sometimes map to different structures.

Conclusion: As PubChem pushes towards 100 million compounds, we have examined the problems of choosing correct structures for pharmacologically active compounds. The inherent challenges of chemical representation, and the high levels of discordance we recorded, indicate that definitive drug lists (even our own) will remain elusive until pharmaceutical companies submit their own records directly to open databases. In the meantime, we have optimised our GtoPdb curation for the submission of our own 1088 approved CID entries, both as a partial solution and as a trusted reference set for the pharmacology community.

References:
[1] Southan et al. (2009) J Cheminform. 1:1-10.
[2] Southan et al. (2016) Nucl. Acids Res. 44 (Database issue): D1054-68.
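The Methods split structural multiplexing into stereo differences, mixtures and isotopic derivatives. A rough first-pass triage of that kind can be sketched from SMILES syntax alone. This is a hypothetical heuristic (the function name `multiplexing_causes` is ours), not the GtoPdb procedure; a cheminformatics toolkit would be needed for anything rigorous, for example to tell a real salt from a co-crystallised solvent.

```python
from __future__ import annotations

import re

# Bracket atom carrying an explicit mass number, e.g. [13C], [2H], [18F].
ISOTOPE_PATTERN = re.compile(r"\[\d+[A-Za-z]")


def multiplexing_causes(smiles: str) -> set[str]:
    """Heuristic labels read off SMILES syntax; not a substitute for a toolkit."""
    causes: set[str] = set()
    if "." in smiles:
        causes.add("mixture")   # disconnected components, e.g. salts or co-formulations
    if ISOTOPE_PATTERN.search(smiles):
        causes.add("isotopic")  # at least one isotope-labelled atom
    if "@" in smiles or "/" in smiles or "\\" in smiles:
        causes.add("stereo")    # tetrahedral (@, @@) or double-bond (/, \) marks
    return causes


for smi in ["CC(=O)[O-].[Na+]", "[2H]C([2H])([2H])O", "C[C@H](N)C(=O)O"]:
    print(smi, sorted(multiplexing_causes(smi)))
```

A record can carry several labels at once (an isotope-labelled salt is both "mixture" and "isotopic"), which is one reason the counts per drug grow so quickly.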
3
Outline
• Introduction to GtoPdb
• Context of the study
• Database chemistry and approved drug counts
• Intersecting curated drugs in PubChem
• Fuzzy drug structure relationships
• GtoPdb approved drugs
• GtoPdb structures in PubChem
• Conclusions
• References
4
Introduction to IUPHAR/BPS Guide to Pharmacology (GtoPdb)
• IUPHAR = International Union of Basic and Clinical Pharmacology, BPS = British Pharmacological Society
• Formerly known as IUPHAR-DB for receptors and channels since 2009
• Since 2012, funded by the Wellcome Trust to cover all targets in the human genome
• Curated molecular mechanism of action (MMOA) as quantitative activity mapping to primary targets, including IUPHAR nomenclature
• 1,429 human proteins, 14,701 interactions, 8,674 ligands
• Described in four Nucleic Acids Research annual Database issues: PMIDs 26464438 (2016), 24234439 (2014), 23087376 (2013) and 21087994 (2011)
• Distilled into the biennial British Journal of Pharmacology “Concise Guide to PHARMACOLOGY”, published as a nine-paper series
• Presents users with the best compounds for pharmacology research in silico, in vitro, in cellulo, in vivo, or in clinico
http://www.guidetopharmacology.org/
5
Context of presentation
• In the last few years the GtoPdb team has been finding structure space around lead compounds, probes and drugs increasingly “fuzzy”
• Curatorial choices are consequently becoming more difficult
• We needed a molecular perspective on the causes of this “fuzz”
• We have increased our exploration of PubChem chemical structural neighbourhoods to gain this perspective
• This presentation distils the key points
6
There’s a lot of chemistry out there
Source Count
UniChem EBI 138 million
CAS/SciFinder 124 million
PubChem 93 million
PubChem (vendors) 64 million
ChemSpider 58 million
SureChEMBL (patents) 17 million
ChEMBL 1.6 million
PubChem BioAssay (active) 1.0 million
MeSH Pharmacological action 14,879
PubChem INN or USAN 10,858
Preclinical 6,861*
Phase I 1,856*
Phase II 2,261*
Phase III 954*
Guide to PHARMACOLOGY 6,565
November 2016 counts
* The Citeline© 2015 drug counts include, on average, ~25% biologicals
7
Approved drug structure counts: take your pick
Source | Year | Total | Reference | Notes
GVKBIO Drug Database | 2013 | 4,750 | SlideShare | Global approved
NCATS Pharmaceutical Collection | 2011 | 2,356 | PMID 21525397 | FDA, from global 3,936
Therapeutic Target Database | 2015 | 2,071 | PMID 26578601 | Small-molecule FDA
DrugCentral | 2016 | 2,021 | PMID 27789690 | FDA, from 4,456 APIs
DrugBank 5.0 | 2016 | 2,004 | PMID 24203711 | Approved small-molecule, from 2,225
ChEMBL 22 | 2016 | 1,855 | PMID 24214965 | SMILES from 2,260 Phase 4
Drug3D db | 2015 | 1,790 | PMID 22539672 | Small-molecule FDA
Cfam Chemical Families db | 2015 | 1,691 | PMID 25414339 | Approved
Map of molecular drug targets | 2016 | 1,578 | PMID 27910877 | FDA approved
FDA approved NME overview | 2013 | 1,543 | PMID 24680947 | Small-molecule FDA, no structures
Network analysis of FDA drugs | 2007 | 1,471 | PMID 17516560 | 26th Orange Book, no structures
SWEETLEAD db | 2013 | 1,427 | PMID 24223973 | FDA, from global 2,836
FDA recommended dose db | 2004 | 1,309 | PMID 15546675 | Small-molecule FDA
Guide to PHARMACOLOGY 2016.4 | 2016 | 1,291 | PMID 26464438 | Approved, selective curation
8
Discordance of curated drug sets within PubChem
http://www.slideshare.net/cdsouthan/will-the-correct-drugs-please-stand-up-68239021
• Good news: 1361 structures with at least 3-way agreement
• Bad news: no “Gold Standard” set (but the 459 with 4-way agreement would do)
• Details below
NPC = National Center for Advancing Translational Sciences (NCATS) Pharmaceutical Collection
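The 3-way and 4-way agreement figures amount to counting, for every CID, how many curated sets contain it. A minimal sketch of that tally, assuming each source has already been reduced to a set of PubChem CIDs; the source labels and CIDs below are invented, and the helper names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def agreement_histogram(sources: dict[str, set[int]]) -> Counter:
    """Map k -> number of CIDs present in exactly k of the given source sets."""
    per_cid: Counter = Counter()
    for cids in sources.values():
        per_cid.update(cids)  # sets, so each source counts a CID at most once
    return Counter(per_cid.values())


def at_least(histogram: Counter, k: int) -> int:
    """Number of CIDs supported by k or more sources."""
    return sum(n for support, n in histogram.items() if support >= k)


# Invented CIDs for four hypothetical curated drug sets.
toy_sources = {
    "source_A": {1, 2, 3, 4, 5},
    "source_B": {1, 2, 3, 6},
    "source_C": {1, 2, 7},
    "source_D": {1, 3, 8},
}

hist = agreement_histogram(toy_sources)
print(at_least(hist, 3), at_least(hist, 4))  # 3 1
```

Because the comparison is on exact CIDs, any stereo, salt or isotope variant chosen by one source but not another drops that drug out of the consensus, which is how the “fuzz” shrinks the agreed set.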
9
Exploring “fuzz” via PubChem: which of 51 atorvastatins is correct?
• Powerful structural relationship navigation
• Needs cheminformatics expertise
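The “same connectivity” navigation used here can also be scripted. A minimal sketch, assuming PubChem's PUG REST fast-identity search with `identity_type=same_connectivity` (check the PUG REST documentation for current endpoint details); the helper names are hypothetical, and only URL building and JSON parsing are exercised below, not the network call.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def same_connectivity_url(cid: int) -> str:
    """PUG REST fast-identity query for CIDs sharing the connectivity of `cid`."""
    query = urlencode({"identity_type": "same_connectivity"})
    return f"{PUG_REST}/compound/fastidentity/cid/{cid}/cids/JSON?{query}"


def parse_cid_list(payload: str) -> list[int]:
    """Pull the CID list out of a PUG REST IdentifierList JSON response."""
    return json.loads(payload)["IdentifierList"]["CID"]


def fetch_same_connectivity(cid: int) -> list[int]:
    """Live network call (not run here); PubChem asks users to keep request rates low."""
    with urlopen(same_connectivity_url(cid), timeout=30) as response:
        return parse_cid_list(response.read().decode("utf-8"))
```

Starting from the CID of a curated parent record, the returned list is the set of candidates to inspect; it frames the “which is correct?” question rather than answering it.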
10
Which of 145 taxols is correct?
• 145 distinct structures in PubChem
• 12 have BioAssay results
• 34 have vendors
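When structures are already in hand, a similar grouping can be approximated offline. The first 14-character block of a standard InChIKey hashes the connectivity (skeleton) layers, while stereo and isotopic layers go into the second block, so records sharing a first block are candidate same-connectivity variants. The CIDs and InChIKeys below are invented placeholders and the helper names are hypothetical. Multi-component mixtures hash all their components together, so they will not group with the parent this way.

```python
from __future__ import annotations

from collections import defaultdict


def skeleton_block(inchikey: str) -> str:
    """First 14-character block of a standard InChIKey (connectivity hash)."""
    return inchikey.split("-", 1)[0]


def group_by_skeleton(records: dict[int, str]) -> dict[str, list[int]]:
    """Group CIDs whose InChIKeys share the first block (candidate same-connectivity sets)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for cid, key in sorted(records.items()):
        groups[skeleton_block(key)].append(cid)
    return dict(groups)


# Invented CIDs and InChIKeys: three variants of one skeleton, one unrelated record.
toy_records = {
    101: "XXXXXXXXXXXXXX-UHFFFAOYSA-N",
    102: "XXXXXXXXXXXXXX-ABCDEFGHSA-N",
    103: "XXXXXXXXXXXXXX-IJKLMNOPSA-N",
    104: "YYYYYYYYYYYYYY-UHFFFAOYSA-N",
}
print(group_by_skeleton(toy_records))
```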
11
GtoPdb approved drug curation
• Our approach is stringent and parsimonious (i.e. not a pharmacopeia)
• Usually select the best-supported PubChem CID
• We “fuzz” check for chirality, strip salts and cross-check INN PDFs
• Focus on human diseases
• No inorganics (except Li), nutraceuticals or metabolites
• Mainly FDA and EMA
• Withdrawn or discontinued drugs are flagged
• Cross-pointers to approved salt forms, active metabolites, and drug > prodrug
• Every entry has a curator’s note
• Grateful for feedback and corrections
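The “best-supported CID” rule, like the majority-of-submitters strategy in the abstract, can be sketched as picking the candidate deposited by the most distinct sources. This assumes the depositors for each candidate CID have already been collected from PubChem; the helper names, the lower-CID tie-break and the toy data are our own assumptions, and the curator's chirality, salt and INN checks are not modelled.

```python
from __future__ import annotations


def best_supported_cid(support: dict[int, set[str]]) -> tuple[int, int]:
    """Return (cid, n_sources) for the CID deposited by the most distinct sources.

    Ties go to the lower CID number, an arbitrary choice made for this sketch.
    """
    if not support:
        raise ValueError("no candidate CIDs supplied")
    cid = max(support, key=lambda c: (len(support[c]), -c))
    return cid, len(support[cid])


def support_fraction(support: dict[int, set[str]]) -> float:
    """Share of all depositing sources that back the best-supported CID."""
    _, n_sources = best_supported_cid(support)
    all_sources = set().union(*support.values())
    return n_sources / len(all_sources)


# Invented example: three candidate CIDs and the (hypothetical) sources depositing each.
candidates = {
    501: {"src1", "src2", "src3", "src4", "src5"},
    502: {"src1", "src6"},
    503: {"src7"},
}

best, n = best_supported_cid(candidates)
print(best, n, round(support_fraction(candidates), 2))  # 501 5 0.71
```

A fraction above 0.5 corresponds to a simple majority of depositors; a low value is a cue for closer manual inspection rather than an automatic choice.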
12
GtoPdb drugs
• The PubChem query (approved[comment] AND "IUPHAR/BPS Guide to PHARMACOLOGY"[SourceName]) retrieves just our 1291 substances (SIDs)
• These convert to 1174 distinct compound entries (CIDs)
• 96% have vendor matches in PubChem
• The 117 SID difference is mainly antibodies, which have no small-molecule structure and therefore no CID
Approved set now a clean PubChem select
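The retrieval step can be sketched as an NCBI Entrez ESearch call against the PubChem Substance database, followed by collapsing SIDs to CIDs to see why 1291 SIDs become 1174 CIDs. The query string is copied from the slide; the `retmax` value, the helper names and the toy SID-to-CID mapping are our assumptions, and the real mapping would have to be fetched separately (for example from PUG REST).

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
GTOPDB_APPROVED = 'approved[comment] AND "IUPHAR/BPS Guide to PHARMACOLOGY"[SourceName]'


def esearch_url(term: str, retmax: int = 5000) -> str:
    """Entrez ESearch URL returning substance IDs (SIDs) from PubChem Substance."""
    params = {"db": "pcsubstance", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def summarise_sid_to_cid(sid_to_cid: dict[int, Optional[int]]) -> tuple[int, int, int]:
    """Return (SIDs, distinct CIDs, SIDs with no CID, e.g. antibodies)."""
    cids = {cid for cid in sid_to_cid.values() if cid is not None}
    no_cid = sum(1 for cid in sid_to_cid.values() if cid is None)
    return len(sid_to_cid), len(cids), no_cid


# Invented mapping: two SIDs share one CID, two structure-less SIDs have none.
toy_map = {11: 100, 12: 101, 13: 101, 14: None, 15: None}
print(summarise_sid_to_cid(toy_map))  # (5, 2, 2)
```

The SID-to-CID gap therefore has two sources: records with no structure at all, and several SIDs that standardise to the same CID.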
13
GtoPdb curated small-molecules: overlaps in PubChem
14
Conclusions
• Chemistry database coverage and annotation depth have expanded
• But so has the “fuzz”
• Ligand choices for pharmacology experiments can be challenging
• Controlling these factors is crucial for experimental reproducibility
• GtoPdb is a good “first-stop-shop” choice
• A “Gold Standard” is illusory, but we do our best to select the correct structures
• Feedback welcome on coverage gaps or structural equivocality
• We can assist with complex choices
• Explore PubChem as a “second-stop-shop”
• Get acquainted with medicinal chemists and/or cheminformaticians
15
Thank you; questions welcome
Find out more at the BPS stand
PMID: 26464438, PMCID: PMC4702778