Text Mining: the next data frontier. Beyond Open Access

Beyond Open Access. Text Mining: the next data frontier. Natalia Manola, Athena Research & Innovation Centre. OpenCon Satellite Berlin, 25 Nov 2016. #openminted_eu


Page 1: Text Mining: the next data frontier. Beyond Open Access


#openminted_eu

beyond Open Access

Text Mining: the next data frontier

Natalia Manola, Athena Research & Innovation Centre

OpenCon Satellite Berlin, 25 Nov 2016

Page 2: Text Mining: the next data frontier. Beyond Open Access

A few sobering facts on content production


● 1.8 billion websites & 3.46 billion internet users, as of 25 September 2016.

● 24 million wireless sensors and actuators worldwide (up 553% between 2011 and 2016)

● 16 zettabytes of useful data (16 trillion GB) by 2020

● YouTube reports that 24 hours of video are uploaded every minute, making the site a hugely significant data aggregator.

● Every second, on average, around 6,000 tweets are posted on Twitter, which corresponds to over 350,000 tweets per minute, >500 million tweets per day and around 200 billion tweets per year.

● 74,200,000 pages existed on Facebook, with 7 million apps and websites integrated with Facebook, as of 30/5/2016
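The unit conversions behind two of the figures above can be checked with a few lines of arithmetic. This is only a sanity check on the quoted numbers, assuming decimal (SI) prefixes and a 365-day year; the variable names are illustrative.

```python
# Cross-check the per-second tweet rate against the per-minute,
# per-day and per-year figures quoted on the slide.
TWEETS_PER_SECOND = 6_000  # figure quoted on the slide

per_minute = TWEETS_PER_SECOND * 60
per_day = per_minute * 60 * 24
per_year = per_day * 365  # assumes a 365-day year

# Decimal (SI) prefixes: 1 zettabyte = 10**21 bytes, 1 gigabyte = 10**9 bytes.
ZETTABYTE = 10**21
GIGABYTE = 10**9
gigabytes_in_16_zb = 16 * ZETTABYTE // GIGABYTE

print(f"{per_minute:,} tweets per minute")
print(f"{per_day:,} tweets per day")
print(f"{per_year:,} tweets per year")
print(f"{gigabytes_in_16_zb:,} GB in 16 ZB")
```

The per-minute and per-day results agree with "over 350,000" and ">500 million"; the yearly total comes out somewhat below the rounded "around 200 billion".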


Page 3: Text Mining: the next data frontier. Beyond Open Access

… And some facts on scientific literature


The global research community generates ~2.5 million new scholarly articles per year (English only)

The STM report (2015)

… some 90% of papers … are never cited (82% in the humanities) … of those articles that are cited, only 20 percent have actually been read … 50% of papers are never read by anyone other than their authors, referees and journal editors

Lokman I. Meho, The rise and rise of citation analysis, 2007

… one paper published every 12 seconds … 70,000 papers published on a single protein, the tumor suppressor p53

Spangler et al, Automated Hypothesis Generation based on Mining Scientific Literature, 2014
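As a rough consistency check, the "one paper every 12 seconds" rate can be converted into a yearly count and compared with the ~2.5 million articles per year quoted from the STM report. The sketch assumes a 365-day year and a constant publication rate.

```python
# Convert "one paper every 12 seconds" into papers per year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year
SECONDS_BETWEEN_PAPERS = 12            # figure quoted on the slide

papers_per_year = SECONDS_PER_YEAR // SECONDS_BETWEEN_PAPERS
print(f"{papers_per_year:,} papers per year")
```

The result is of the same order as the ~2.5 million figure, so the two sources are broadly consistent.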


Page 4: Text Mining: the next data frontier. Beyond Open Access

How can we make sense of this data?


Page 5: Text Mining: the next data frontier. Beyond Open Access

Emerging solutions

Machine reading: process textual sources, organise and classify them in various dimensions, extract main (indexical) information items,

… and “understanding”: identify and extract entities and relations between entities, facilitate the transformation of unstructured textual sources into structured data,

… and predicting: enable the multidimensional analysis of structured data to extract meaningful insights and improve the ability to predict
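The three stages above can be illustrated with a deliberately tiny sketch. It is not the method of any particular tool: the gazetteer, the example sentences and all function names are invented for illustration, entity recognition is plain dictionary lookup, and a "relation" is simply two entities appearing in the same sentence.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gazetteer (an assumption): entity names mapped to a type.
GAZETTEER = {"p53": "protein", "MDM2": "protein", "apoptosis": "process"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_entities(sentence):
    """Machine reading: return gazetteer entries found as whole words."""
    found = []
    for name, kind in GAZETTEER.items():
        if re.search(rf"\b{re.escape(name)}\b", sentence):
            found.append((name, kind))
    return found


def extract_relations(text):
    """'Understanding': pair up entities that co-occur in one sentence."""
    relations = []
    for sentence in split_sentences(text):
        names = sorted(name for name, _ in extract_entities(sentence))
        relations.extend(combinations(names, 2))
    return relations


def count_relations(documents):
    """Structured output: how often each entity pair occurs across documents."""
    counts = Counter()
    for doc in documents:
        counts.update(extract_relations(doc))
    return counts


# Invented example sentences, used only to exercise the functions.
docs = [
    "MDM2 binds p53. The role of p53 in apoptosis is well studied.",
    "Loss of p53 impairs apoptosis.",
]
relation_counts = count_relations(docs)
for pair, n in relation_counts.most_common():
    print(pair, n)
```

The counts are the kind of structured data that a later analysis or prediction step could work on; real systems replace each stage with trained models and curated resources.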


Page 6: Text Mining: the next data frontier. Beyond Open Access

However, … a multitude of solutions catering for different

• Text types: newswire, scientific literature, tweets/blogs, patents, clinical/medical records, textbooks, monographs, online forums, …

• Languages: English, French, German, Spanish, Portuguese, Italian, Polish, …

• Tasks: translation, information extraction, semantic search, question answering, sentiment analysis, summarization, knowledge discovery, …

• Domains: finance/business, health, biology, social sciences, humanities, …

Creating a fragmented landscape


Page 7: Text Mining: the next data frontier. Beyond Open Access

A glimpse of the TDM landscape


Source: FutureTDM project (www.futuretdm.eu)

Page 8: Text Mining: the next data frontier. Beyond Open Access

What can we do?


Page 9: Text Mining: the next data frontier. Beyond Open Access

1. Share content
• Document literature content
• Share in a meaningful way: what does Open Access really mean?

IPR and licensing
• Study IPR restrictions for reuse of sources as well as possible exceptions
• Promote clarity and standardisation of legal rights and obligations

Challenges
• Rights statement vs. Open licenses (for repositories)
• No access to full text. We live in a metadata world
• No standard protocols, formats and APIs for access and retrieval
• No capacity to handle extra traffic


Page 10: Text Mining: the next data frontier. Beyond Open Access

Proposed solution: make TDM-enabled hubs


[Diagram labels. Sources: literature repositories, OA journals, data repositories, aggregators, archives. Content: metadata, full text, data. Hubs: OpenAIRE, CORE, PMC Europe. Access: guidelines, APIs. Consumers: TDM, research networks, Wikipedia/Media/Research.]

Page 11: Text Mining: the next data frontier. Beyond Open Access

2. Share TDM Services
• Document language processing/text mining services and workflows in a meaningful way for domain discipline researchers
• Document language/knowledge resources, data categories, taxonomies, provenance information

Interoperable services
• Common way of presenting annotated results
• Combine services into workflows
• Combine content and language resources with services and workflows
• Combine automatic and manual/crowdsourcing annotation services
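One way to picture these interoperability goals is a shared annotation record that every component reads and writes, so that components can be chained and manual annotations can sit next to automatic ones. The sketch below is an assumption made for illustration, not the platform's actual data model; `Annotation`, `Document`, `run_workflow` and the component names are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    """Stand-off annotation: character offsets into the text plus a label."""
    start: int
    end: int
    label: str
    source: str  # which component (or human annotator) produced it


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


Component = Callable[[Document], Document]


def tokenize(doc: Document) -> Document:
    """Automatic component: add one 'token' annotation per word."""
    for m in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation(m.start(), m.end(), "token", "tokenizer"))
    return doc


def tag_acronyms(doc: Document) -> Document:
    """Automatic component that reuses the tokens produced upstream."""
    tokens = [a for a in doc.annotations if a.label == "token"]
    for tok in tokens:
        word = doc.text[tok.start:tok.end]
        if len(word) > 1 and word.isupper():
            doc.annotations.append(
                Annotation(tok.start, tok.end, "acronym", "acronym-tagger"))
    return doc


def add_manual(annotations: List[Annotation]) -> Component:
    """Wrap manual or crowdsourced annotations as a workflow component."""
    def component(doc: Document) -> Document:
        doc.annotations.extend(annotations)
        return doc
    return component


def run_workflow(doc: Document, components: List[Component]) -> Document:
    """Apply the components in order; each sees the earlier annotations."""
    for component in components:
        doc = component(doc)
    return doc


manual = [Annotation(0, 8, "infrastructure", "crowd")]
doc = run_workflow(
    Document("OpenAIRE and CORE expose TDM interfaces."),
    [tokenize, tag_acronyms, add_manual(manual)],
)
acronyms = [doc.text[a.start:a.end] for a in doc.annotations if a.label == "acronym"]
print(acronyms)
```

Because every component agrees on the same record, swapping or reordering components does not require format conversion; agreeing on what the labels mean (semantic interoperability) is the harder part and is not addressed by this sketch.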

IPR and licensing
• Translate the legal & policy aspects into specifications for lawful user-to-service and service-to-service interactions

Challenges
• Bring text miners close to the researcher problems and needs
• Semantic interoperability (not just technical)


Page 12: Text Mining: the next data frontier. Beyond Open Access

OpenMinTeD

Establish an open and sustainable Text and Data Mining (TDM) platform and infrastructure where researchers can discover, collaboratively create, share and re-use knowledge from a wide range of text-based scientific and scholarly related sources.


A step from Open Access to Open Science

Page 13: Text Mining: the next data frontier. Beyond Open Access

HIGH LEVEL ARCHITECTURE


Policies & guidelines

Page 14: Text Mining: the next data frontier. Beyond Open Access

Our Services

• Register and discover TDM services and tools
• Link to content hubs
• Run a TDM job and share results
• Get people’s knowledge: crowdsourced annotation
• Build your own service: combine components into a workflow and share
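The register-and-discover service listed above can be pictured as a catalogue of metadata records that is filtered on request. The following is a minimal in-memory sketch under that assumption; the `ServiceRecord` fields, the `Registry` class and the example entries are hypothetical and do not describe the actual platform.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class ServiceRecord:
    """Hypothetical metadata describing one TDM service."""
    name: str
    task: str
    languages: FrozenSet[str]
    licence: str


class Registry:
    """Toy in-memory registry keyed by service name."""

    def __init__(self) -> None:
        self._records: Dict[str, ServiceRecord] = {}

    def register(self, record: ServiceRecord) -> None:
        if record.name in self._records:
            raise ValueError(f"service already registered: {record.name}")
        self._records[record.name] = record

    def discover(self, task: Optional[str] = None,
                 language: Optional[str] = None) -> List[ServiceRecord]:
        """Return records matching every filter that was given, sorted by name."""
        hits = [
            r for r in self._records.values()
            if (task is None or r.task == task)
            and (language is None or language in r.languages)
        ]
        return sorted(hits, key=lambda r: r.name)


# Invented example entries.
registry = Registry()
registry.register(ServiceRecord(
    "demo-ner", "information extraction", frozenset({"en", "de"}), "Apache-2.0"))
registry.register(ServiceRecord(
    "demo-summarizer", "summarization", frozenset({"en"}), "CC-BY-4.0"))
registry.register(ServiceRecord(
    "demo-sentiment", "sentiment analysis", frozenset({"en", "fr"}), "MIT"))

english = [r.name for r in registry.discover(language="en")]
german_ie = [r.name for r in registry.discover(
    task="information extraction", language="de")]
print(english)
print(german_ie)
```

Carrying the licence in each record is what would let a workflow builder check, before running a job, whether a service may lawfully be combined with a given content source.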

Page 15: Text Mining: the next data frontier. Beyond Open Access

Our Users

End users
• Researchers, database curators, Research Infrastructure operators
• Novice: use services to advance their science
• Advanced: combine TDM components into complex workflows

Content and service providers
- Publishers, libraries, scientific database centres, …
- TDM researchers
- SMEs

Page 16: Text Mining: the next data frontier. Beyond Open Access

Community Driven

From the very beginning … requirements, content, barriers, expected outcomes.

• Scholarly communication: feature extraction, data citation, research analytics
• Life sciences: curation of databases and lexica in Chembolomics & neuroinformatics
• Agriculture: extracting information from tables for food safety alerts
• Social sciences: data citation

… to the very end: create applications, validate and evaluate the results.

Page 17: Text Mining: the next data frontier. Beyond Open Access

Examples of OpenAIRE TDM services we want to share


@openaire_eu

Page 18: Text Mining: the next data frontier. Beyond Open Access


Discover research in context


Page 19: Text Mining: the next data frontier. Beyond Open Access


Research Trends and correlations

Text and data mining with domain specific knowledge

Interactive visualization for drill-down information

Trends in science

Correlations of funding programmes

Within a funder, or across countries


Page 20: Text Mining: the next data frontier. Beyond Open Access

What will it look like?


Page 21: Text Mining: the next data frontier. Beyond Open Access

The OpenMinTeD registry


Page 22: Text Mining: the next data frontier. Beyond Open Access

Browse TDM resources & tools/services


Page 23: Text Mining: the next data frontier. Beyond Open Access

Register, document, share tools


Page 24: Text Mining: the next data frontier. Beyond Open Access

Create your corpus, annotate, share


Page 25: Text Mining: the next data frontier. Beyond Open Access

How does this all bind together?


How does Open Science help?

[Diagram labels. Content: repositories, (OA) journals, other textual resources (e.g. medical records, PSI), via OpenAIRE, CORE, CrossRef, … Language resources: CLARIN, META-SHARE. Platform: OpenMinTeD registry, OpenMinTeD workflows, TDM tools.]

Page 26: Text Mining: the next data frontier. Beyond Open Access

What’s next

Participate with your ideas
• Give us your feedback on our pending guidelines and APIs
• Tell us your TDM requirements: we have the experts to advise you
• Register your TDM services
• Test the system when it goes live (spring)

Watch out for
• OpenAIRE’s datathons, tenders and challenges (60K in total)
• OpenMinTeD’s tenders and challenges (240K in total)


Page 27: Text Mining: the next data frontier. Beyond Open Access

twitter.com/openminted_eu

facebook.com/openminted

bit.do/openmintedlinkedin

vimeo.com/openminted

bit.do/openmintedplus

THANK YOU!

Natalia Manola
