How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals? V. Tkachenko, A.J. Williams, Y. Borodina, F. Switzer, T. Peryea, L. Callahan. ACS Philly, August 2012

Upload: valery-tkachenko

Post on 10-May-2015


TRANSCRIPT

Page 1: How can the international chemical identifier (InChI) be extended to non …

How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals?

V. Tkachenko, A.J. Williams, Y. Borodina, F. Switzer, T. Peryea, L. Callahan

ACS Philly August 2012

Page 2: How can the international chemical identifier (InChI) be extended to non …

What is InChI

Page 3: How can the international chemical identifier (InChI) be extended to non …

InChI Examples

CH3CH2OH (ethanol)

InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3

L-ascorbic acid

InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1

Page 4: How can the international chemical identifier (InChI) be extended to non …

InChI Structure
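
An InChI string is a sequence of slash-separated layers: a version segment, usually a chemical formula, then layers that each start with a lowercase prefix letter (for example c for connectivity, h for hydrogens, t/m/s for stereo). The sketch below splits a string along those lines. It is a simplified illustration, not the official InChI software: the function name `split_inchi` and the returned dictionary layout are assumptions of this example, and it does not interpret the content of any layer.

```python
def split_inchi(inchi):
    """Split an InChI string into version, formula and prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("expected a string starting with 'InChI='")
    version, *rest = inchi[len(prefix):].split("/")
    formula = None
    # Layer prefixes are lowercase letters, so a segment that does not start
    # with one is taken to be the chemical formula (assumption of this sketch).
    if rest and rest[0] and not rest[0][0].islower():
        formula, *rest = rest
    # Keep layers as ordered (prefix, value) pairs so repeated prefixes survive.
    layers = [(segment[0], segment[1:]) for segment in rest if segment]
    return {"version": version, "formula": formula, "layers": layers}


ethanol = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
ascorbic = split_inchi(
    "InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5"
    "/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1"
)

for name, parsed in (("ethanol", ethanol), ("L-ascorbic acid", ascorbic)):
    print(name, parsed["version"], parsed["formula"])
    for layer_prefix, value in parsed["layers"]:
        print("  /" + layer_prefix, value)
```

Ethanol carries only connectivity and hydrogen layers, while L-ascorbic acid adds the three stereo layers, which is why its string is longer.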

Page 5: How can the international chemical identifier (InChI) be extended to non …

InChIKey

The condensed, 27-character standard InChIKey is a hashed version of the full standard InChI (using the SHA-256 algorithm)

Designed to allow for easy web searches of chemical compounds

InChIKeys consist of

14 characters resulting from a hash of the connectivity information of the InChI

followed by 8 characters resulting from a hash of the remaining layers of the InChI, plus a flag character (S for standard InChI)

followed by a single character indicating the version of InChI used

followed by a single character indicating protonation (N for neutral)

InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1

BQJCRHHNABKAKU-KBQPJGBKSA-N

Unlike an InChI, an InChIKey can be converted back to a connection table (CT) only by lookup
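
The fixed layout of the key makes it easy to take apart without any chemistry toolkit. The sketch below assumes the 27-character standard layout, in which the second block is commonly described as an 8-letter hash followed by a flag letter (S for standard) and a version letter, and the final letter after the second hyphen indicates protonation. The names `InChIKeyParts` and `split_inchikey` are labels invented for this example.

```python
import re
from typing import NamedTuple


class InChIKeyParts(NamedTuple):
    connectivity_hash: str  # 14 letters, hash of the connectivity layers
    remaining_hash: str     # 8 letters, hash of the remaining layers
    standard_flag: str      # "S" for standard InChI, "N" for non-standard
    version: str            # InChI version letter
    protonation: str        # "N" for neutral


# Three hyphen-separated blocks of 14, 10 and 1 uppercase letters.
_KEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$")


def split_inchikey(key):
    """Split a 27-character InChIKey into its fields (format check only)."""
    match = _KEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a 27-character InChIKey: {key!r}")
    return InChIKeyParts(*match.groups())


parts = split_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N")
print(parts)
```

Because the first block depends only on connectivity, comparing `connectivity_hash` values is a quick way to find records that share a skeleton but differ in stereo or other layers.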

Page 6: How can the international chemical identifier (InChI) be extended to non …

Proliferation of InChI

Page 7: How can the international chemical identifier (InChI) be extended to non …

Search by InChI

Page 8: How can the international chemical identifier (InChI) be extended to non …

ChemSpider Google Search

http://www.chemspider.com/google/

Page 9: How can the international chemical identifier (InChI) be extended to non …

What’s the catch?

InChI has limitations

InChI is ideal for simple, static, well-defined graphs

Real chemical substances can only be approximated by such graphs

Page 10: How can the international chemical identifier (InChI) be extended to non …

Limitations

Non-trivial stereo (e.g. axial, planar)

Non-trivial tautomers (e.g. ring-chain)

Mixtures – full stereo is rarely known

Polymers

Markush structures

Organometallics

Inorganics

Materials

Reactions

Etc

Page 11: How can the international chemical identifier (InChI) be extended to non …

Chemical data complexity

Page 12: How can the international chemical identifier (InChI) be extended to non …

Work in progress

InChI Extensions: Under the guidance of IUPAC, several sub-teams are now working on expanding InChI to new areas of chemical representation:

Reaction InChI (RInChI): the reaction working group has completed its recommendations, and work is ready to begin.

Polymers/Mixtures: The polymers/mixtures working group also has submitted its recommendations, and work to incorporate the new representations should begin once version 1.04 is released.

Markush: This project is the most complex undertaken to date. The initial recommendations have been submitted, but financing of the work still needs to be sorted out.

But what do we do NOW???

Page 13: How can the international chemical identifier (InChI) be extended to non …

Deposition Process

[Diagram labels: Data; Validation; Standardization; Filtering; Deduplication; Componentization; Mapping; Non-redundant data]
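
As one illustration of the deduplication stage, the sketch below keeps a single record per InChIKey. It assumes that each deposited record already carries a standard InChIKey computed by an earlier stage; the `Record` type and the `deduplicate` function are hypothetical names for this example and do not describe the actual ChemSpider implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    inchikey: str  # assumed to be computed upstream, e.g. after standardization


def deduplicate(records):
    """Keep the first record seen for each InChIKey, preserving input order."""
    first_seen = {}
    for record in records:
        first_seen.setdefault(record.inchikey, record)
    return list(first_seen.values())


deposited = [
    Record("ethanol", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
    Record("ethyl alcohol", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
    Record("morphine", "BQJCRHHNABKAKU-KBQPJGBKSA-N"),
]
unique = deduplicate(deposited)
print([record.name for record in unique])
```

Keying on the identifier rather than the name is what lets synonyms such as "ethanol" and "ethyl alcohol" collapse into one entry.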

Page 14: How can the international chemical identifier (InChI) be extended to non …

ChemSpider Data Model

Page 15: How can the international chemical identifier (InChI) be extended to non …

Organometallics

Page 16: How can the international chemical identifier (InChI) be extended to non …

Mixtures or unknown stereo

Page 17: How can the international chemical identifier (InChI) be extended to non …

Accelrys Enhanced Stereo

Page 18: How can the international chemical identifier (InChI) be extended to non …

MOL V3000

Page 19: How can the international chemical identifier (InChI) be extended to non …

Enhanced stereo and InChI…

Unfortunately not supported

Is it important?

Now real-world examples…

Page 20: How can the international chemical identifier (InChI) be extended to non …

FDA Substance Registration System

Page 21: How can the international chemical identifier (InChI) be extended to non …

Stoichiometric and non-stoichiometric mixtures

[Structure diagrams: Substance, drawn with the label "AND Enantiomer"; Moiety 1; Moiety 2]

Page 22: How can the international chemical identifier (InChI) be extended to non …

[Structure diagrams: Substance, a sodium salt drawn with the stereo labels &1, &2 and "Mixed"; Moiety 1; Moiety 2; Moiety 3; Moiety 4]

Page 23: How can the international chemical identifier (InChI) be extended to non …

[Structure diagrams: Substance, a calcium salt (Ca 2+ with two anions) drawn with OH2 marked UNDEFINED; Moiety 1; Moiety 2 (undefined)]

Page 24: How can the international chemical identifier (InChI) be extended to non …

[Structure diagrams: Substance, components A and B built from Fe 2+, Fe 3+ and O 2– ions (subscripts 2 and 3); Moiety 1: (A); Moiety 2: (B)]

Page 25: How can the international chemical identifier (InChI) be extended to non …

[Structure diagrams: multiple representations of D-glucose]

Page 26: How can the international chemical identifier (InChI) be extended to non …

SRS standardization approach

Substance description

Standardization module

Moieties generator

Normalization

InChI[Key] generator

Hash function f(InChIKeys, moieties)

Unique ID

Standard description
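
The slide does not spell out the hash function f(InChIKeys, moieties), so the following is only one possible shape for it. It assumes that each moiety is reduced to a pair of InChIKey and amount (a count, or a marker such as "undefined"), that the pairs are sorted so input order does not matter, and that SHA-256 is applied to the joined result. The function name `substance_id`, the separator characters and the choice of SHA-256 are all assumptions of this sketch.

```python
import hashlib


def substance_id(moieties):
    """Derive an order-independent ID from (inchikey, amount) pairs.

    `amount` is kept as a string so that values such as "2" or "undefined"
    can both be represented (assumption of this sketch).
    """
    canonical = sorted(f"{inchikey}:{amount}" for inchikey, amount in moieties)
    payload = "|".join(canonical).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Illustrative substance: one organic moiety plus water in undefined amount.
hydrate = [
    ("BQJCRHHNABKAKU-KBQPJGBKSA-N", "1"),
    ("XLYOFNOQVPJJNP-UHFFFAOYSA-N", "undefined"),
]
id_a = substance_id(hydrate)
id_b = substance_id(list(reversed(hydrate)))
id_c = substance_id([hydrate[0], ("XLYOFNOQVPJJNP-UHFFFAOYSA-N", "1")])
print(id_a == id_b, id_a == id_c)
```

Including the amount in the hashed text is the design choice that distinguishes a defined monohydrate from a hydrate of undefined stoichiometry, which a single InChI of the combined components would not do.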

Page 27: How can the international chemical identifier (InChI) be extended to non …

SRS TBD

Markush

Polymers

Proteins

Inorganics

Materials

Page 28: How can the international chemical identifier (InChI) be extended to non …

OpenPHACTS

Open PHACTS is a 3-year Innovative Medicines Initiative (IMI) project

To reduce the barriers to drug discovery in industry, academia and for small businesses

To build an open platform, integrating chemistry and biology data from public domain resources

Semantic web platform

Open Standards, Open Data and Open Source

Page 29: How can the international chemical identifier (InChI) be extended to non …
Page 30: How can the international chemical identifier (InChI) be extended to non …
Page 31: How can the international chemical identifier (InChI) be extended to non …

OpenPHACTS specifics

Active/inactive ingredient

Parent/child

Sample/substance

Misreferences (!!!)

Page 32: How can the international chemical identifier (InChI) be extended to non …

ChemSpider Reactions

Page 33: How can the international chemical identifier (InChI) be extended to non …
Page 34: How can the international chemical identifier (InChI) be extended to non …

ChemSpider Reaction Challenges

Deduplication

Identification

Deposition

Page 35: How can the international chemical identifier (InChI) be extended to non …

Conclusions

InChI is The Identifier

InChI has its limitations

InChI is a work in progress

InChI deficiencies can be hot-fixed

Page 36: How can the international chemical identifier (InChI) be extended to non …

Acknowledgements

RSC Cheminformatics group

FDA SRS group

OpenPHACTS consortium

Software: InChI, GGA Software

Page 37: How can the international chemical identifier (InChI) be extended to non …

Thank you

Email: [email protected]

Blog: www.chemspider.com/blog

Slides: http://www.slideshare.net/valerytkachenko16