Improving Text Mining Results with Access to Full-Text Scientific Articles Mike Iarrobino Product Manager, CCC

Upload: dr-haxel-congress-and-event-management-gmbh

Post on 17-Feb-2017

TRANSCRIPT

Page 1: II-SDV 2016 Michael Iarrobino - Improving Text Mining Results with Access to Full-Text Scientific Articles

Improving Text Mining Results with Access to Full-Text Scientific Articles

Mike Iarrobino

Product Manager, CCC

Page 2

Introduction

Mike Iarrobino

Product Manager

RightFind™ XML for Mining

Copyright Clearance Center

Page 3

Making Copyright Work – CCC and RightsDirect

• Licensing Solutions

• Rights Management

• Content Delivery

• Copyright Education

Rightsholders

950+ million rights from:

• Publishers

• Authors

• Agents

• Creators

Content Users

• 35,000 companies

• Workers worldwide

• 1,200 colleges and universities

• Publishers and Authors

Page 4

CCC and Text Mining

Rightsholders Content Users

Servicing many text mining license and content requests

Managing text mining feeds

Negotiating text mining rights with multiple publishers

Page 5

“Text mining” is the process of deriving high-quality information from text materials using software.

Page 6

Text Mining Non-Patent Literature

• Mining limited to abstracts

• High cost to obtain formatted full-text content and permission from multiple publishers

• Multiple formats

• Researchers can’t mine content to which they are not subscribed

Page 7

What is the Benefit of Full Text?

Volume Timeliness Quality

Catherine Blake. “Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles.” Journal of Biomedical Informatics Volume 43, Issue 2, April 2010, Pages 173–189

Elsevier (2015) Harnessing the Power of Content - Extracting value from scientific literature: the power of mining full-text articles for pathway analysis. Available at www.elsevier.com/__data/assets/pdf_file/0016/83005/R_D-Solutions_Harnessing-Power-of-Content_DIGITAL.pdf

Enrique Bernal-Delgado and Elliot S Fisher. “Abstracts in high profile journals often fail to report harm.” BMC Medical Research Methodology (2008); 8:14

Page 8

Volume and Recall

December 2015

(Abstract: "tau hyperphosphorylation" AND Abstract: kinase OR (GSK3β OR (CDK5 OR (MAPK1 OR (MARK1 OR (MARK2 OR (MARK3 OR MARK4))))))) AND (Abstract: alzheimer OR alzheimer's)

content:"tau hyperphosphorylation kinase"~25 OR "tau hyperphosphorylation GSK3β"~25 OR "tau hyperphosphorylation CDK5"~25 OR "tau hyperphosphorylation MAPK1"~25 OR "tau hyperphosphorylation MARK1"~25 OR "tau hyperphosphorylation MARK2"~25 OR "tau hyperphosphorylation MARK3"~25 OR "tau hyperphosphorylation MARK4"~25
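
The full-text query on this slide repeats one proximity clause per kinase name, so it could be generated from a term list rather than typed by hand. The following is a minimal Python sketch under stated assumptions: the function name build_proximity_query and its parameters are hypothetical, and the "..."~25 form is assumed to be Lucene-style proximity syntax, which the slide does not state.

```python
def build_proximity_query(anchor, terms, slop=25, field="content"):
    """Return one field-prefixed query that ORs a proximity clause per term.

    Hypothetical helper: it mirrors the shape of the query on the slide and
    is not part of any product or library API.
    """
    # One quoted phrase per term, each allowing up to `slop` positions of distance.
    clauses = [f'"{anchor} {term.strip()}"~{slop}' for term in terms]
    return f"{field}:" + " OR ".join(clauses)


# Kinase names taken from the slide's query.
KINASE_TERMS = ["kinase", "GSK3β", "CDK5", "MAPK1", "MARK1", "MARK2", "MARK3", "MARK4"]

query = build_proximity_query("tau hyperphosphorylation", KINASE_TERMS)
print(query)
```

The sketch reproduces the slide's form exactly, including the single field prefix at the start. In standard Lucene query syntax a field prefix applies only to the clause that directly follows it, so a real deployment may need the clauses grouped in parentheses after the prefix; whether that applies here depends on the search engine in use.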

Page 9

Volume and Recall - Results

[Figure: bar chart of the number of articles (y-axis, 0 to 800) returned for the BTK and tau hyperphosphorylation queries, comparing abstract search with full-text search]

Page 10

Text Mining Today – Example Workflow

Search → Get permission → Download PDFs → Convert PDFs → Import into text mining software

• Perform search

• Obtain permission from publishers to mine full text for commercial use

• Requires automated tool or custom software to download in bulk

• Requires text mining permission from multiple publishers

• Requires content storage and feed management

• PDF is converted to a “blob of text”

• No tags

• Loss of metadata

• Low fidelity of content

• References induce noise

• Requires structuring text into XML

• Article text does not have “fields”

• Combining content from multiple sources takes time to normalize the metadata

TEXT MINING TOOLS

Run queries

View results

MANUAL WORK

Typically takes 4-8 weeks
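
The conversion problems listed on this slide (no tags, no fields, references adding noise) are easier to see next to structured XML. The sketch below is illustrative only: the sample article is invented, its element names follow JATS-style tagging (an assumption, since the slides do not name a schema), and field_text is a hypothetical helper. It shows how tagged XML lets a miner query the title, abstract, and body as separate fields and leave the reference list out.

```python
import xml.etree.ElementTree as ET

# Invented sample article. The JATS-style element names are an assumption.
SAMPLE_XML = """<article>
  <front>
    <article-meta>
      <title-group><article-title>Placeholder title about tau kinases</article-title></title-group>
      <abstract><p>Placeholder abstract sentence.</p></abstract>
    </article-meta>
  </front>
  <body>
    <sec><title>Results</title><p>CDK5 is discussed near tau hyperphosphorylation.</p></sec>
  </body>
  <back>
    <ref-list><ref><mixed-citation>Placeholder reference entry.</mixed-citation></ref></ref-list>
  </back>
</article>"""


def field_text(root, path):
    """Return whitespace-normalized text under the first element matching path."""
    node = root.find(path)
    if node is None:
        return ""
    # Join text fragments with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(node.itertext()).split())


root = ET.fromstring(SAMPLE_XML)

# Each field is addressed by its tag path; the reference list under <back> is not read.
fields = {
    "title": field_text(root, "front/article-meta/title-group/article-title"),
    "abstract": field_text(root, "front/article-meta/abstract"),
    "body": field_text(root, "body"),
}

for name, text in fields.items():
    print(f"{name}: {text}")
```

A PDF converted to plain text offers no equivalent of these paths, so the reference entry would sit in the same undifferentiated text as the body and could match a query by accident.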

Page 11

CCC’s RightFind™ XML for Mining Service

Build a corpus of full-text articles in XML format for mining

Text Mining Software

CCC’s Text Mining Service

Page 12

XML for Mining

• Rapid inventory growth

• MEDLINE abstract corpus

• Purchase of non-subscribed articles through a cost optimization process

• MeSH article tagging and flat synonym list

Page 13

Market Observations and Future Vision

ACCESS

AUTOMATION

Page 14

Thank you!

Mike Iarrobino

Product Manager, [email protected]