Deuterated drugs in PubChem
DESCRIPTION
Presentation at BioIT World, Hannover, October 2010
TRANSCRIPT
[1]
The Unforeseen Consequences of Opportunistic Deuterated Drug Claims,
Patent Extraction Feeds and the PubChem Chemistry Rules
When data integration gets fuzzy
Chris Southan
ChrisDS Consulting, Göteborg, Sweden
BioIT World, Hannover, Oct 2010
[2]
Background
The scale of chemistry-biology-bioinformatics connectivity has made PubChem a de facto global data integration hub.
However, the fidelity of this integration is compromised by factors including:
Inherent complexities of chemical structure representation
Chemistry rules that, while rigorous, tend to split CIDs
Submitter primacy and a low proliferation bar
Vendor dilution of bioannotated compounds with no-data compounds
Increasingly complex BioAssay relationships
Patent-extraction feeds from commercial sources
[3]
Discordance of Drug Collections
[4]
Will the real Rosuvastatin stand up?
[5]
But 15 CIDs?
[6]
“Heavy” Rosuvastatin (+28) CID 25241235
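A quick check of the "+28" label, as a minimal sketch. It assumes the label is a nominal mass shift and that all 28 hydrogens of rosuvastatin (C22H28FN3O6S) are replaced by deuterium; the slide itself does not spell this out. The function name is illustrative only.

```python
# Monoisotopic masses in daltons (standard values)
MASS_H = 1.00782503   # protium (1H)
MASS_D = 2.01410178   # deuterium (2H)

# Assumption: rosuvastatin is C22H28FN3O6S, i.e. 28 hydrogens
ROSUVASTATIN_H_COUNT = 28


def deuteration_shift(n_deuterium: int) -> float:
    """Mass increase (Da) when n hydrogens are swapped for deuterium."""
    return n_deuterium * (MASS_D - MASS_H)


if __name__ == "__main__":
    shift = deuteration_shift(ROSUVASTATIN_H_COUNT)
    # Roughly one mass unit per substituted hydrogen
    print(f"Fully deuterated shift: {shift:.2f} Da")
```

Under these assumptions each substituted hydrogen adds about one mass unit, so a fully deuterated structure sits about 28 units above the parent.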
[7]
Patent Filings on Deuterated Drugs
“Protia's patents appear to be a blunderbuss approach, with a mass of US filings 237 published to date but only 11 PCT applications. None of these provide exemplification or any biological description. Concert has 39 PCTs and 26 US applications published, Auspex 57 PCTs and 39 US applications”
(Comment from “In the Pipeline” blog, June 2009)
[8]
A Pipeline With Unintended Consequences
[9]
Some of them Might Just Work?
But we don’t know which one of the 25!
[10]
Picking Off the Best-sellers
      Connectivity  Deuteros
1)    38            32
2)    4 88 0 85
3)    15            12
7)    7             4
8)    8             7
9)    11            9
10)   68            66
13)   29            26
14)   51            48
15)   16            11
[11]
Sorting
[12]
Extent of the Problem
[13]
Does this Matter?
In the grand scheme of things maybe not, but...
Not just the deuteros but also other patent-extracted compounds are causing CID “multiplexing” in PubChem (and ChemSpider)
The Pharma “crown jewels” of marketed drugs are badly hit
Less experienced PubChem users could be confused
Some types of search results get messed up
They “gum up” company internal integrated systems
Do we want prophetic or virtual structures in PubChem?
[14]
Solutions?
On an individual basis you can sort drug CIDs by molecular weight (Mr) as described
However, no filters can cleanly discriminate between data-supported and prophetic chemistry published in patents (i.e. the authentic wheat from the spurious chaff)
Patent-only flags on CIDs would be useful, but as a filter they would remove many useful non-deuterated compounds
Filtering all isotopically labeled compounds removes valuable experimental tools that are “in pots”
A (tiny?) fraction of the deuterated drugs may get data links
Drug crowdsourcing is already happening (e.g. Wikipedia) but we need the set to be flagged in PubChem
Some kind of “canonical clustering” in PubChem could help
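The sorting and clustering ideas above can be pictured with a minimal sketch. It is not PubChem's method. It assumes that the first 14-character block of an InChIKey stands in for connectivity, and that deuterium appears as `[2H]` in an isotopic SMILES string. All CIDs, InChIKeys and weights below are made-up placeholders, and the function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical records: CIDs and InChIKeys are placeholders, weights approximate
RECORDS = [
    {"cid": 101, "inchikey": "AAAAAAAAAAAAAA-UHFFFAOYSA-N", "mw": 46.07,
     "smiles": "CCO"},
    {"cid": 102, "inchikey": "AAAAAAAAAAAAAA-XXXXXXXXSA-N", "mw": 52.11,
     "smiles": "[2H]C([2H])([2H])C([2H])([2H])O[2H]"},
    {"cid": 103, "inchikey": "AAAAAAAAAAAAAA-YYYYYYYYSA-N", "mw": 49.09,
     "smiles": "[2H]C([2H])([2H])CO"},
    {"cid": 201, "inchikey": "BBBBBBBBBBBBBB-UHFFFAOYSA-N", "mw": 32.04,
     "smiles": "CO"},
]


def connectivity_block(inchikey: str) -> str:
    """Return the first InChIKey block, used here as a connectivity key."""
    return inchikey.split("-")[0]


def count_deuterium(smiles: str) -> int:
    """Count explicit deuterium atoms written as [2H] in the SMILES."""
    return smiles.count("[2H]")


def cluster_by_connectivity(records):
    """Group records by connectivity key, lightest (by Mr) first."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[connectivity_block(rec["inchikey"])].append(rec)
    for members in clusters.values():
        members.sort(key=lambda rec: rec["mw"])
    return dict(clusters)


if __name__ == "__main__":
    for key, members in cluster_by_connectivity(RECORDS).items():
        print(key)
        for rec in members:
            n_d = count_deuterium(rec["smiles"])
            print(f"  CID {rec['cid']}  Mr {rec['mw']:.2f}  deuteriums: {n_d}")
```

Within each cluster the unlabeled parent sorts to the top and the heavier deuterated forms follow, which is the manual Mr sort described above. The deuterium count only flags labeled forms; it cannot tell a prophetic patent structure from an experimental tool compound.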
[15]
Questions?