InfoChem GmbH © 2014 Dr. Valentina Eigner Pitto ACS National Meeting, Dallas, March 19, 2014

The chemical most common denominator:

Use of chemical structures for semantic enrichment

and interlinking of scientific information

V. Eigner-Pitto, J. Eiblmaier, H. Kraut, L. Isenko, H. Saller, P. Loew

InfoChem GmbH, Landsberger Strasse 408, Munich, 81241, Germany

Outline

• Introduction

o Role of structure searching

o Where do I perform structure searches?

o Cost implications

• Setting the scene: chemical structures as common denominator?

o Publishers' efforts

Creation of chemical content

Semantic enrichment of journal articles

• Case Studies:

o Wiley: The Smart Article

o Springer: Chemistry Data Warehouse

Why Structure Searching?

• CICAG (RSC) Survey by Neil Stutchbury, May 20, 2009

Chemical Information Mining: Possibilities and Pitfalls

(http://www.rsc.org/images/ChemInfoMining_tcm18-153536.pdf)

65 responses from Pharma, Academia, Vendors, and Publishers

“Search documents by chemical structure or substructure”?

“Full Text Searching is Sufficient!”

Diazepam OR Valium OR Ansiolisina OR Diazemuls OR Relanium OR Stesolid OR Apaurin OR Faustan OR Seduxen OR Sibazon OR Methyldiazepinone OR Calmocitene OR Neurolytril OR Bialzepam OR Ceregulart OR Condition OR Diazetard OR Liberetas OR Relaminal OR Serenamin OR Tranquirit OR Ansiolin OR Apozepam OR Atensine OR Bensedin OR Calmpose OR Diacepan OR Diazepan OR Dipezona OR Domalium OR Kiatrium OR Paranten OR Quetinil OR Quiatril OR Quievita OR Renborin OR Ruhsitus OR Seduksen OR Serenack OR Serenzin OR Stesolin OR Tensopam OR Horizon OR Lembrol OR Morosan OR Saromet OR Sedipam OR Setonil OR Anxionil OR Benzopin OR Calmaven OR Chuansuan OR Desconet OR Desloneg OR Diaceplex OR Diazepin OR Gewacalm OR Jinpanfan OR Mentalium OR Metamidol OR Nixtensyn OR Novodipam OR Pacitran OR Paralium OR Prozepam OR Psychopax OR Radizepam OR Simasedan OR Trankinon OR Trazepam OR Valaxona OR Valiquid OR Valuzepam OR Vanconin OR Antenex OR Arzepam OR Betapam OR Diapine OR Diaquel OR 7-Chloro-1,3-dihydro-1-methyl-5-phenyl-2H-1,4-benzodiazepin-2-one OR NCGC00178168-01 OR WLN: T67 GNV JN IHJ CG G1 KR OR 2H-1,4-Benzodiazepin-2-one, 7-chloro-1,3-dihydro-1-methyl-5-phenyl- OR CPD000058398 OR SAM001246536 OR SMR000058398 OR 439-14-5 OR 7-Chloro-1-methyl-5-phenyl-3H-1,4-benzodiazepin-2(1H)-one OR 7-Chloro-1-methyl-2-oxo-5-phenyl-3H-1,4-benzodiazepine OR 7-Chloro-1-methyl-5-phenyl-2H-1,4-benzodiazepin-2-one OR C06948 OR D00293 OR 5-24-04-00300 OR D003975 OR A3662/0155188 OR I06-0194 OR 1-Methyl-5-phenyl-7-chloro-1,3-dihydro-2H-1,4-benzodiazepin-2-one OR 7-Chloro-1-methyl-5-3H-1,4-benzodiazepin-2(1H)-one OR 7-chloro-1-methyl-5-phenyl-3H-1,4-benzodiazepin-2-one OR DZP OR Dap OR Pax OR 11100-37-1 OR 53320-84-6 OR InChI=1/C16H13ClN2O/c1-19-14-8-7-12(17)9-13(14)16(18-10-15(19)20)11-5-3-2-4-6-11/h2-9H,10H2,1H

... (343 Synonyms!)

WLN

SMILES

SMARTS

ROSDAL

Connection Table

Molfile

SDfile

CML

InChI

InChI Key
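
The contrast on this slide, a name query with hundreds of OR terms versus a single structure representation, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synonym table, the document records, and the key "STRUCT-0001", which is a placeholder and not a real InChI Key. Only the names and the CAS number come from the slide.

```python
# Hypothetical synonym table: every name resolves to one structure key.
# "STRUCT-0001" is a placeholder, not a real InChI Key.
SYNONYMS = {
    "diazepam": "STRUCT-0001",
    "valium": "STRUCT-0001",
    "stesolid": "STRUCT-0001",
    "439-14-5": "STRUCT-0001",  # CAS number listed on the slide
}

# Hypothetical documents, each listed with the names it mentions.
DOCUMENTS = {
    "doc-a": ["Valium", "ethanol"],
    "doc-b": ["Stesolid"],
    "doc-c": ["caffeine"],
}


def structure_keys(names):
    """Map the names found in a document to structure keys, skipping unknown names."""
    return {SYNONYMS[n.lower()] for n in names if n.lower() in SYNONYMS}


def search_by_name(name):
    """Plain name search: matches only documents that use exactly this name."""
    wanted = name.lower()
    return sorted(doc for doc, names in DOCUMENTS.items()
                  if wanted in (n.lower() for n in names))


def search_by_structure(key):
    """Structure search: matches documents that mention any synonym of the structure."""
    return sorted(doc for doc, names in DOCUMENTS.items()
                  if key in structure_keys(names))


print(search_by_name("diazepam"))          # misses documents that use other names
print(search_by_structure("STRUCT-0001"))  # finds every synonym mention
```

In this toy data the name search for "diazepam" finds nothing, while the structure search returns both documents that mention a synonym. A production system would derive the key from the structure itself (for example an InChI Key) rather than from a hand-made table.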

Where am I Able to Perform Structure Searches?

Cost Implications

Manuscript submission

Publishing

Manual Indexing

Database production

Publishers' Efforts

• Production of chemical content:

o chemical named entity recognition

o automatic CDX work-up

• Semantic enrichment of journal articles

o RSC: RSC Semantic Publishing (Project Prospect)

o NPG: semantically enriched PDF

o Elsevier: Article of the future

o Wiley: The Smart Article

• Structure search on web-pages (partly)

Production of Chemical Content

Manual Indexing

Publishing

RSC: Semantic Publishing

• Pioneer work: Project Prospect (2007)

• Online since 2011

• Extraction of chemical names from over 30,000 journal articles

• Integration of compounds into ChemSpider

• Approach integrated within routine publication processes

• Features:

o Highlighting of:

Compounds

Chemical terms

Biomedical terms

o Link to compounds in ChemSpider

o Structure search only in ChemSpider

Nature Publishing Group: Semantically Enriched PDF

• XMP-embedded PDFs available online since 2008

• Entity-specific annotation services:

o SureChem for chemical compounds

o LuXiD for genes/proteins

o …

• Mix of automated services and editorial QA

• Features:

o Figures and compound browser

o Links to:

Web of Science

PubMed

CAS Reference Linking

o No structure search

Elsevier: Article of the Future

• Launched in 2012

• Guiding principles:

o readability

o discoverability

o extensibility

• Supplementary content, features, and information from external databases presented in the right sidebar

• Features:

o 3-pane presentation layout:

navigation bar

main content area

right sidebar

o Links to:

NCBI

Reaxys

… (depending on subject)

o No structure search

Wiley: The Smart Article

• Launched in 2012

• Goal: providing quick information on chemical compounds featured in an article, chemical terms in the text, and other key parts of the chemistry within the article

• Live for the following journals and major reference works:

o Chemistry: An Asian Journal

o Chirality

o Applied Organometallic Chemistry

o Journal of Physical Organic Chemistry

o Journal of Heterocyclic Chemistry

o e-EROS

o Organic Syntheses

o Organic Reactions

• Features:

o Compound browser

o Chemistry term highlighter

o Compound index

o Enhanced abstract page

o Compound record

o Chemistry structure search

Structure as Common Denominator: 2 Use Cases

Data Warehouse Concept

The Challenge*

*Reinhard Neudert: Enhancing the User Experience for Wiley Chemistry Content, ICIC 2012, Berlin, 14-17 October 2012

Text Annotation and Scheme Enumeration: Chemistry Enrichment Workflow*

*Reinhard Neudert: Enhancing the User Experience for Wiley Chemistry Content, ICIC 2012, Berlin, 14-17 October 2012
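
The text annotation step can be pictured with a minimal sketch. It is not the Wiley or InfoChem workflow, whose details the slides do not give; it is a plain dictionary lookup, assuming a hypothetical name dictionary and placeholder structure keys. Scheme enumeration is not covered.

```python
import re

# Hypothetical dictionary: compound name -> structure key (placeholder values).
DICTIONARY = {
    "diazepam": "STRUCT-0001",
    "valium": "STRUCT-0001",
    "caffeine": "STRUCT-0002",
}


def annotate(text):
    """Return (start, end, matched_text, structure_key) for each dictionary hit."""
    # Longest names first, so a longer name wins over a shorter one it starts with.
    names = sorted(DICTIONARY, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), DICTIONARY[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


def highlight(text):
    """Wrap each recognized name in brackets together with its structure key."""
    parts, last = [], 0
    for start, end, name, key in annotate(text):
        parts.append(text[last:start])
        parts.append(f"[{name}|{key}]")
        last = end
    parts.append(text[last:])
    return "".join(parts)


sample = "Valium (diazepam) was compared with caffeine."
print(highlight(sample))
```

The two synonyms in the sample sentence receive the same key, which is what allows a highlighted term to link to one compound record. Real chemical named entity recognition also has to handle systematic names, abbreviations, and misspellings, none of which a dictionary lookup like this addresses.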

Examples

Data Warehouse Concept

The Challenge

• Interlink different data repositories via chemical structure

• Create one search interface

• Data aggregation / results consolidation
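
The three points of the challenge can be sketched together: hits from separate repositories are linked through a shared structure key and consolidated into one result per structure. The repository names, references, and keys below are hypothetical placeholders, not Springer data.

```python
from collections import defaultdict

# Hypothetical hit lists from three separate repositories.
# Each hit is (structure_key, source_reference); keys are placeholders.
JOURNAL_HITS = [
    ("STRUCT-0001", "journal:article-17"),
    ("STRUCT-0002", "journal:article-03"),
]
BOOK_HITS = [("STRUCT-0001", "book:chapter-4")]
REACTION_DB_HITS = [
    ("STRUCT-0001", "reactions:entry-220"),
    ("STRUCT-0003", "reactions:entry-9"),
]


def consolidate(*hit_lists):
    """Merge hits from several repositories into one record per structure."""
    merged = defaultdict(list)
    for hits in hit_lists:
        for key, source in hits:
            if source not in merged[key]:  # drop repeated references
                merged[key].append(source)
    return dict(merged)


results = consolidate(JOURNAL_HITS, BOOK_HITS, REACTION_DB_HITS)
for key in sorted(results):
    print(key, results[key])
```

Because the structure key is the only join field, a repository can be added without any knowledge of the others, as long as it can report a key for each of its entries.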

Data: Selected SpringerLink Subject Collections (1846 – 2011):

• Biomedical and Life Sciences

• Chemistry and Materials Science

• Earth and Environmental Science

• Engineering

• Medicine

• Physics and Astronomy

Concept: Springer Chemistry Data Warehouse

[Figure: concept diagram of the Springer Chemistry Data Warehouse. Labels in the figure: Structures, Structures (annotated), Reactions, Database, Full-text]

The Demonstrator

[Figure: architecture diagram. Labels in the figure: Client computers, Internet/Intranet, HTTP(S), Display servers, Document display, Structure search, Springer Structure Data Warehouse]

Springer Structure Data Warehouse

Contains:

• master index of all structures

• basic molecule attributes

• links to the source page/document
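
One entry of such a master index might look like the following sketch. The field names, the placeholder key, and the example link are assumptions; the slides do not describe the actual schema. Molecular weight stands in for "basic molecule attributes" and is computed from the formula C16H13ClN2O that appears in the diazepam InChI string, using rounded standard atomic weights.

```python
import re
from dataclasses import dataclass, field

# Rounded standard atomic weights; only the elements needed for this example.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Cl": 35.45}


def molecular_weight(formula):
    """Sum atomic weights for a simple formula such as 'C16H13ClN2O' (no brackets)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


@dataclass
class IndexEntry:
    """One row of a hypothetical master index of structures."""
    structure_key: str  # placeholder identifier, not a real InChI Key
    formula: str        # basic molecule attribute
    source_links: list = field(default_factory=list)  # links to source pages/documents

    @property
    def weight(self):
        """Molecular weight derived from the formula, rounded to two decimals."""
        return round(molecular_weight(self.formula), 2)


entry = IndexEntry(
    structure_key="STRUCT-0001",
    formula="C16H13ClN2O",
    source_links=["https://example.org/doc/1"],  # hypothetical link
)
print(entry.structure_key, entry.formula, entry.weight, entry.source_links)
```

Keeping only the key, a few attributes, and links in the index means the full documents stay in their source systems and are fetched for display on demand.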

Example: Entry Point Document

Example: Entry Point Structure Search

Example: Entry Point Substructure Search

Summary

• Importance of structure searching and cost implications

• Publisher efforts

o Automatic generation of chemical content

o Semantic enrichment

• Case studies where structure is common denominator

o Generation of chemical content for Wiley

o Springer study: “Data Warehouse”

Proof of concept

Demonstrator

Conclusions

• Since the mid-2000s, chemical structures have gained significance among publishers

• Publishers recognize the importance of structure searching

• Chemical content is increasingly generated by automatic processes

The chemical structure is an extremely efficient entity for effective retrieval as well as for linking different sources

Acknowledgments

• Reinhard Neudert (Wiley)

• Wendy Warr

• InfoChem Team

o Josef Eiblmaier
