Chemical Entity extraction using the chemicalize.org-technology
Josef Scheiber
Novartis Pharma AG – NITAS/TMS
Where the story of this project started ...
Dreirosenbrücke
Novartis Campus
A day in October 2008
Some time around 7:45 in the morning ...
Vision for text mining
Integration of chemical and biological knowledge
Mining for Chemical Knowledge - Rationale
- Make text corpora searchable for chemistry
- Generate chemistry databases for use in research based on Scientific Papers or Patents
- Link Chemical Information with further annotation in an automated way, e.g. for Chemogenomics applications
- Patent analysis for MedChem projects
Connection table
Mining for Chemical Knowledge - Rationale
Information on compounds targeting GPCRs
2005: >14,000 publications
1992: 256 articles & 34 patents
1988: 9 journal articles
HELP
Information explosion
Source: Banville, Debra L. “Mining chemical structural information from the drug literature.” Drug Discovery Today, Number 1/2, Jan. 2006, pp. 35-42
Example: Project Prospect – Royal Society of Chemistry
Enhancing Journal Articles with Chemical Features
This helps you identify other articles that discuss the same molecule
Mining for Chemical Knowledge – Focus for today
- Make text corpora searchable for chemistry
- Generate chemistry databases for use in research based on Scientific Papers or Patents
- Link Chemical Information with further annotation in an automated way, e.g. for Chemogenomics applications
- Patent analysis for MedChem projects
Connection table
A use case for successful patent mining (molecules you sometimes find in your inbox ;-) )
Vardenafil (2003, Bayer) – € 1.24 billion (USD 1.6 billion)
Sildenafil (1998, Pfizer) – € 11.7 billion (USD 15.1 billion)
Slide inspired by an example from Steve Boyer/IBM; sales data from the Prous Integrity database
Conventional Database Building
Facts – current standard
... (ACS) owes most of its wealth to its two 'information services' divisions — the publications arm and the Chemical Abstracts Service (CAS), a rich database of chemical information and literature. Together, in 2004, these divisions made about $340 million — 82% of the society's revenue — and accounted for $300 million (74%) of its expenditure. Over the past five years, the society has seen its revenue and expenditure grow steadily ...
Source: ACS homepage
Facts
Established application; straightforward use; de facto gold standard; unique data source
Very costly; no structure export for a reasonable price; very limited for large-scale follow-up analysis; most recent patents not available
Not data (search), but integration, analysis and insight, leading to decisions and discovery
Now – What would be the perfect solution?
All patent offices would require applicants to provide all claimed structures in a machine-readable version, available for one-click download
Text extraction
Definition: Extract all molecules that are mentioned in a patent text of interest, convert them to structures and make them available in a machine-readable format
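The three steps in this definition (find names, convert them to structures, emit a machine-readable result) can be illustrated with a deliberately small sketch. It is not how chemicalize.org or any of the tools named in this talk work: real systems recognise systematic names and run a name-to-structure converter, whereas this example assumes a tiny hand-written lookup table. The table `NAME_TO_SMILES` and the function `extract_structures` are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical mini-dictionary: chemical name -> SMILES string.
# A real pipeline would call a name-to-structure converter instead.
NAME_TO_SMILES = {
    "ethanol": "CCO",
    "benzene": "c1ccccc1",
    "acetic acid": "CC(=O)O",
}


def extract_structures(text):
    """Return (name, SMILES) pairs for known names, in order of first mention."""
    # Try longer names first so multi-word names are not shadowed by shorter ones.
    names = sorted(NAME_TO_SMILES, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    seen = set()
    hits = []
    for match in pattern.finditer(text):
        name = match.group(1).lower()
        if name not in seen:  # report each molecule only once
            seen.add(name)
            hits.append((name, NAME_TO_SMILES[name]))
    return hits
```

Calling `extract_structures` on a sentence from a patent returns a list of name and SMILES pairs that can be written to a file or loaded into a database, which is the "machine-readable format" part of the definition.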
Mining for Chemical Knowledge
Technologies from providers
Text entity recognition
(a) Extractors (IUPAC names)
- TEMIS Chemical Entity Relationships Skill Cartridge
- Accelrys Pipeline Pilot extractor (Notiora)
- Fraunhofer (ProMiner Chemistry)
- Chemaxon (chemicalize.org)
- Oscar (Corbett, Murray-Rust et al.)
- SureChem
- IBM ChemFrag Annotator
(b) Converters (names to connection table)
- CambridgeSoft name=struct
- Openeye Lexichem
- Chemaxon
Image recognition
- OSRA (NIH)
- Clide Pro (Keymodule Ltd.)
- Fraunhofer chemoCR
- ChemReader
The objective
To provide a tool that offers sophisticated text analysis methods to NIBR scientists and thereby leverages the methods of TMS
Mining for Chemical Knowledge – Novartis Tools – the chemicalize-technology is working under the hood!
Clipboard Analysis
Patent text
Identified structures
View structure on mouseover
Export to other applications
Mining for Knowledge – Novartis Tools
Input example: J Med Chem Paper
Mining for Chemical Knowledge – Use Case
A medicinal chemist wants to synthesize a competitor compound as a tool compound for their own project
Identification of core scaffold
Analysis of substitution patterns
This enables the identification of the compounds most representative of a competitor patent
Example – A text-based patent
Automated text extraction: 452 compounds
Reference: 636 compounds
71%
A patent example
Example – An image-based patent
Text extraction is not suitable for this case: it finds only a meager 40 molecules versus 1129 in the reference. Why?
An entirely image-based patent example
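The percentages for the two patent examples follow directly from the compound counts on the slides. The short calculation below reproduces them. Note the assumption: it compares counts only (extracted versus reference), so it is an upper bound on true recall, which would require matching each extracted structure against the reference set. The function name `coverage` is introduced here for illustration.

```python
def coverage(extracted, reference):
    """Extracted compound count as a percentage of the reference count."""
    if reference <= 0:
        raise ValueError("reference count must be positive")
    return 100.0 * extracted / reference


# Counts taken from the two patent examples in the slides.
text_based = coverage(452, 636)    # text-based patent
image_based = coverage(40, 1129)   # entirely image-based patent

print(f"text-based patent:  {text_based:.1f}%")
print(f"image-based patent: {image_based:.1f}%")
```

This prints 71.1% for the text-based patent, matching the 71% on the slide, and 3.5% for the image-based patent, which shows how little name-based extraction recovers when the structures are only drawn.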
Language issues – e.g. Japanese patents
Encountered problems
OCR (Optical Character Recognition)!!
USPTO and WIPO patents are now available as full text in most cases
Typos!
Name2Struct problems (less of an issue here)
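One common way to soften OCR errors and typos is approximate string matching against a list of known names. The sketch below uses Python's standard `difflib` module for that. It is an illustration only, not the method used in the Novartis tools; the name list `KNOWN_NAMES`, the function `correct_name` and the cutoff of 0.8 are assumptions chosen for the example.

```python
import difflib

# Hypothetical list of known compound names (two of them appear in this talk).
KNOWN_NAMES = ["sildenafil", "vardenafil", "benzene", "ethanol"]


def correct_name(token, cutoff=0.8):
    """Map a possibly misspelled token to the closest known name, or None."""
    matches = difflib.get_close_matches(
        token.lower(), KNOWN_NAMES, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

A typical OCR confusion such as `sildenafi1` (digit 1 instead of the letter l) is mapped back to `sildenafil`, while an unrelated word returns `None`. The cutoff has to be chosen with care: set it too low and similar but distinct names such as sildenafil and vardenafil start to be confused with each other.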
IBM initiative Patent Mining / ChemVerse database (Steve Boyer)
The objective is to automatically extract all molecules from all patents available and make them searchable in a database
They leverage cloud computing and have access to all full-text patents
This is going in absolutely the right direction
They annotate the molecules with information from freely available databases
Future ideas: Patent Analysis
Markush translation, Image+Target
Ranking capabilities of outcome for the user
"Blurred" dictionaries for translating terms like aryl, cycloalkyl etc.
Select and annotate as entity: on-the-fly error correction
Results go into a database: crowdsourcing efforts to improve and store results
Suggest functionality
To enable true Patinformatics analyses ...
Definition by Tony Trippe:
Acknowledgements
Alex Fromm, Katia Vella, Olivier Kreim
Therese Vachon, Daniel Cronenberger, Pierre Parisot, Martin Romacker, Nicolas Grandjean
NITAS/TMS
Clayton Springer, Naeem Yusuff, Bharat Lagu
And many other people in different divisions of NIBR for their support