deuterated drugs in pubchem

[1]

The Unforeseen Consequences of Opportunistic Deuterated Drug Claims,

Patent Extraction Feeds and the PubChem Chemistry Rules

When data integration gets fuzzy

Chris Southan

ChrisDS Consulting, Göteborg, Sweden

BioIT World, Hannover, Oct 2010

[2]

Background

The scale of chemistry-biology-bioinformatics connectivity has made PubChem a de facto global data integration hub.

However, the fidelity of this integration is compromised by factors including:

Inherent complexities of chemical structure representation

Chemistry rules that, while rigorous, tend to split CIDs

Submitter primacy and a low proliferation bar

Vendor dilution of bioannotated compounds with no-data compounds

Increasingly complex BioAssay relationships

Patent-extraction feeds from commercial sources

[3]

Discordance of Drug Collections

[4]

Will the real Rosuvastatin stand up?

[5]

But 15 CIDs?

[6]

“Heavy” Rosuvastatin (+28) CID 25241235
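The "+28" label presumably denotes the mass shift from replacing all 28 hydrogens of rosuvastatin (C22H28FN3O6S) with deuterium, since each H to D swap adds roughly one mass unit. The sketch below is an illustration rather than part of the original analysis; it checks that arithmetic with standard monoisotopic masses, and the function name `deuteration_shift` is hypothetical.

```python
# Monoisotopic masses in daltons (standard reference values).
H_MASS = 1.0078250319  # protium, 1H
D_MASS = 2.0141017780  # deuterium, 2H


def deuteration_shift(n_deuterium: int) -> float:
    """Return the monoisotopic mass increase for n H -> D substitutions.

    Hypothetical helper for illustration only.
    """
    return n_deuterium * (D_MASS - H_MASS)


# Rosuvastatin (C22H28FN3O6S) has 28 hydrogens; replacing all of them
# gives a nominal shift of +28 (about +28.18 Da exactly).
shift = deuteration_shift(28)
print(f"nominal +{round(shift)}, exact +{shift:.3f} Da")
```

Each substitution adds about 1.0063 Da, which is why the nominal shift simply equals the number of deuteriums for small to moderate counts.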

[7]

Patent Filings on Deuterated Drugs

“Protia's patents appear to be a blunderbuss approach, with a mass of US filings (237 published to date) but only 11 PCT applications. None of these provide exemplification or any biological description. Concert has 39 PCTs and 26 US applications published, Auspex 57 PCTs and 39 US applications”

(Comment from “In the Pipeline” blog, June 2009)

[8]

A Pipeline With Unintended Consequences

[9]

Some of Them Might Just Work?

But we don’t know which one of the 25!

[10]

Picking Off the Best-sellers

       Connectivity   Deuteros
1)     38             32
2)     4 88 0 85
3)     15             12
7)     7              4
8)     8              7
9)     11             9
10)    68             66
13)    29             26
14)    51             48
15)    16             11
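Assuming the "Connectivity" column counts the CIDs sharing each drug's connectivity and "Deuteros" counts the deuterated subset (an interpretation, since the slide does not define the columns), the table can be summarized as deuterated fractions. Row 2 is left out of this sketch because its values are garbled in the source.

```python
# Counts transcribed from the best-seller table; row 2 is omitted because
# its values are garbled. Column meaning (same-connectivity CIDs vs.
# deuterated CIDs) is an assumption.
counts = {
    1: (38, 32), 3: (15, 12), 7: (7, 4), 8: (8, 7), 9: (11, 9),
    10: (68, 66), 13: (29, 26), 14: (51, 48), 15: (16, 11),
}


def deutero_fraction(connectivity: int, deuteros: int) -> float:
    """Fraction of same-connectivity CIDs that are deuterated."""
    return deuteros / connectivity


for rank, (conn, deut) in sorted(counts.items()):
    print(f"{rank:>2}) {deut}/{conn} = {deutero_fraction(conn, deut):.0%}")

total_conn = sum(c for c, _ in counts.values())
total_deut = sum(d for _, d in counts.values())
print(f"overall: {total_deut}/{total_conn} = {total_deut / total_conn:.1%}")
```

Under this reading, the nine legible rows give 215 deuterated CIDs out of 243, or about 88.5%.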

[11]

Sorting
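Ordering a drug's same-connectivity CIDs by Mr, the approach the Solutions slide refers to, pushes heavy isotopologues away from the parent. The sketch below assumes the lightest member of the set is the unlabeled parent and uses a hypothetical 0.5 Da tolerance; the record identifiers and masses are placeholders, not real PubChem data.

```python
from typing import NamedTuple


class Record(NamedTuple):
    cid: str   # placeholder identifier, not a real PubChem CID
    mr: float  # relative molecular mass


def sort_by_mr(records: list[Record], tolerance: float = 0.5) -> list[tuple[Record, bool]]:
    """Sort ascending by Mr and flag records heavier than the lightest one.

    Assumes the lightest member of a same-connectivity set is the unlabeled
    parent, so anything heavier by more than `tolerance` is a candidate
    isotopologue (e.g. a deuterated analog).
    """
    if not records:
        return []
    ordered = sorted(records, key=lambda r: r.mr)
    parent_mr = ordered[0].mr
    return [(r, r.mr - parent_mr > tolerance) for r in ordered]


# Hypothetical same-connectivity set; values are illustrative only.
cids = [Record("X3", 487.5), Record("X1", 481.5), Record("X2", 509.7)]
for rec, heavy in sort_by_mr(cids):
    print(rec.cid, rec.mr, "heavier than lightest" if heavy else "lightest Mr")
```

Stereoisomers share the parent's Mr and are therefore not flagged; separating those needs a structure-level comparison rather than a mass sort.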

[12]

Extent of the Problem

[13]

Does This Matter?

In the grand scheme of things, maybe not, but...

Not just the deuteros but also other patent-extracted compounds are causing CID “multiplexing” in PubChem (and ChemSpider)

The pharma “crown jewels” of marketed drugs are badly hit

Less experienced PubChem users could be confused

Some types of search results get messed up

They “gum up” companies' internal integrated systems

Do we want prophetic or virtual structures in PubChem?

[14]

Solutions?

On an individual basis, you can sort a drug's CIDs by Mr (relative molecular mass), as described

However, no filters can cleanly discriminate between data-supported and prophetic chemistry published in patents (i.e. the authentic wheat from the spurious chaff)

Patent-only flags on CIDs would be useful, but as a filter they would also remove many useful non-deuterated compounds

Filtering out all isotopically labeled compounds would remove valuable experimental tools that are “in pots” (i.e. physically available)

A (tiny?) fraction of the deuterated drugs may get data links

Crowdsourced curation of drug records is already happening (e.g. Wikipedia), but we need the drug set to be flagged in PubChem

Some kind of “canonical clustering” in PubChem could help
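One possible form of such clustering, offered here as an assumption rather than anything PubChem implements, is to group compounds on the first 14-character block of the standard InChIKey. That block hashes the core connectivity, while stereochemistry and isotopic substitution are encoded in the second block, so a parent drug and its deuterated analogs share a cluster. The keys below are made-up placeholders, not real InChIKeys, and `cluster_by_skeleton` is a hypothetical name.

```python
from collections import defaultdict


def cluster_by_skeleton(inchikeys: dict[str, str]) -> dict[str, list[str]]:
    """Group compound IDs whose InChIKeys share the first 14-character block.

    The first block hashes the core connectivity; stereo and isotopic
    layers go into the second block, so a parent, its stereoisomers and
    its deuterated analogs fall into one cluster.
    """
    clusters: dict[str, list[str]] = defaultdict(list)
    for cid, key in inchikeys.items():
        clusters[key.split("-")[0]].append(cid)
    return dict(clusters)


# Hypothetical, made-up keys: same skeleton block, different second blocks.
example = {
    "parent": "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    "d6_analog": "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
    "unrelated": "DDDDDDDDDDDDDD-BBBBBBBBSA-N",
}
print(cluster_by_skeleton(example))
```

A cluster built this way still mixes stereoisomers with isotopologues, so it would complement, not replace, flags that mark which member is the marketed drug.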

[15]

Questions?
