text and data mining in public...

49
TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche – University of Lille, ADBU LIBER 2017 – Patras, Greece 1

Upload: vuongnguyet

Post on 09-Jan-2019

227 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

TEXT AND DATA MINING IN PUBLIC RESEARCH

Julien Roche – University of Lille, ADBU

LIBER 2017 – Patras, Greece

1

Page 2: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

2

1. Why does TDM matter?

2. Why isn’t it used more widely in public research?

3. How do we change this?

Page 3: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

Study aims

Assess economic impact of TDM on public research in France via:

• Case studies (France, UK, Europe)

• Analysis of the relevance of a copyright exception for TDM –French context

• A final report for decision makers and lobbying

3

http://adbu.fr/etude-tdm/

Page 4: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

6-fold return€6 contribution to EU economy for each €1 directly generated by research universities (source: Biggar Economics)

20% per annumEstimated rate of return to public investment in science and innovation (source: Frontier Economics)

€16 billion Value of R&D performed within French universities and

public research bodies (source: Eurostat)

4

Page 5: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

2.4 millionScientific articles per annum

ZeroNumber of researchers who can keep up

2.5 quintillion bytesData produced each day

5

Page 6: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

Any automated analytical technique aiming to analyse text and data in digital form in order to generate

information such as patterns, trends and correlations.

European Commission. Proposal for a Directive of the European Parliament and of the Council on copyright in the Digital Single Market

=> TDM is machine reading, i.e. just another way to read

6

What is TDM?

Page 7: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

BASE CAMP

Where are we now, and how did we get here?

7

Page 8: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

…countries, in which academic researchers must acquire the express consent of rights holders to

conduct lawful datamining, exhibit a significantly lower share of data mining

research output relative to total research output

Handke, Guilbault and Vallbe IS EUROPE FALLING BEHIND IN DATA MINING? (2015)

8

What is the problem?

Page 9: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

The European ecosystem for engaging in text and data mining remains

highly problematic… The end result: Europe is being leapfrogged by rising

interest in other regions, notably Asia.

Filippov, S. & Hofheinz, P. Text and Data Mining for Research and Innovation(2016)

9

What is the result?

Page 10: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

Legislative options

10

2014 2017-2018

Industry self-regulation

Mandatory exceptions to copyright

Non-commercial research only

Commercial research, beneficiaries restricted

1 2 3 4

Commercial research purpose, beneficiaries unrestricted

Loi pour une République Numérique (Loi LEMAIRE)N° 2016-1321 - 7 October 2016

1.5?

Page 11: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

Restriction France

No lawful access

Not scientific literature

-

Not public research

Commercial purpose

Conservation not bydesignated body

Using a TDM exception

11

Page 12: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

• The copyright directive (20011/29/EC) < revision in preparation -> end

of 2017 / beginning of 2018 ;

• Bringing French and European frameworks into line ;

• A MOU – best practices, waiting for the new EC copyright directive ;

• A French decree, when the European framework is set.

12

The French context – Update, July 2017

Page 13: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

• In December 2016, the FutureTDM project released a policy

framework document outlining the needs for a successful

implementation of TDM.

• Their work identifies a series of barriers that need to be

overcome opposed to high-level principles that should be

followed to address them.

13

Barriers to TDM < FutureTDM

Source: http://openminted.eu/future-tdms-policy-recommendations/

Page 14: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

• The barriers that need to be overcome are as follows:

• Uncertainty: this category includes uncertainties as to how, why and if TDM can be

carried out, as well as the lack of awareness of different aspects of TDM.

• Fragmentation: this refers to the fragmentation in the TDM landscape, which prevents

TDM from being carried out across e.g. national borders, scientific domains, companies

or fields of expertise.

• Restrictiveness: the last category refers to direct limitations to the ability to carry out

TDM, in the form of restrictive laws, lack of expertise, limited (financial) resources, etc.

14

Barriers to TDM < FutureTDM

Source: http://www.futuretdm.eu/wp-content/uploads/D5.1-Policy-Framework.pdf

Page 15: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

• The high-level principles identified to overcome the barriers are:

15

The future of TDM < FutureTDM

• Awareness and Clarity: Information and clear

actions are crucial for a flourishing TDM

environment in Europe.

• TDM without Boundaries: boundaries should be

broken down to reduce fragmentation in the TDM

landscape.

• Equitable Access: access to TDM tools,

technologies and sources should take into

account the need of both users and providers.

Source: http://openminted.eu/future-tdms-policy-recommendations/

Page 16: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

Copyright exception(Base Camp)

Camp 1: Legal clarity

EU Directive

Camp 2: Access to content

Camp 3: Technical infrastructure

Camp 4: Skills and support

Summit: Researchers embrace TDM

The long road to TDM

Page 17: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

1.ACHIEVING LEGAL CLARITY

17

Page 18: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

The exception has made a massive difference...

Petr Knoth, Open University, UK

18

Page 19: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

…the definition of commercial and non-commercial research

is creating uncertaintyPetr Knoth, Open University, UK

19

Page 20: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

What needs to happen?• Communicate legal provisions for TDM with

certainty and clarity • Clarify the exception’s scope where public

researchers collaborate with commercial partners• Monitor the interaction of the copyright

exception with digital rights management (DRM), licensing and other relevant legal regimes

20

Page 21: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

2.SECURING ACCESS

21

Page 22: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

I scaled down my TDM research, and had to exclude two publishers… I couldn’t do

what I set out to do

Chris Hartgerink, Tilburg University, Netherlands

22

Page 23: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

I had to ask too many publishers for the right to

download … it takes a lot of time and … the publishers’ servers frequently block us.

Mathieu Andro, INRA, France

23

Page 24: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

What is the problem with access?

• Technical protection measures (TPMs)• Crawler traps• Restricted access to application

programming interfaces (APIs)

24

Page 25: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

• Incorporate TDM clauses into model licence agreements

• Educate researchers on their rights• Maintain dialogue with publishers• Improve access through better

infrastructure…

25

What needs to happen?

Page 26: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

3. INFRASTRUCTURE & TOOLS

26

Image: National Geographic

Page 27: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

…Every time you have a new project or data source… you hit issues about

how the documents are structured, oddities of formatting, and so on.

Mark Greenwood, GATE, UK

27

Page 28: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

The TDM Landscape

28

Source: OpenMinTED

Page 29: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

• Invest in TDM infrastructure

• Make TDM accessible to non-specialists

• Streamline access

• Open standards and harmonised data

formats

29

What needs to happen?

Page 30: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

4.SKILLS & SUPPORT

30

Page 31: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

…We have algorithms to answer questions, but we do not have algorithms to ask

questions

François Rioult, GREYC Laboratory, Université de Caen, France

• François Rioult

31

Page 32: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

32

What is the role of the librarian?

Photo: REUTERS

Page 33: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

The library needs to be able to say: ‘If you’ve got a question

about TDM, come to us’

Danny Kingsley, Head of Scholarly Communications, University of

Cambridge, UK

33

Page 34: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

Library support for TDM

• Advocacy

• Copyright advice

• Access to legal expertise

• Skills development and training

• Advice on data sources and tools

34

Page 35: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

Small steps, big progress

Ex. “WillO” tool of the university of Lille https://decadoc.typeform.com/to/W2ZZMV

=> a tool for French researchers to know their rights andtheir obligations when they want to put their publications in OpenAccess, in the legal framework of the French law

35

Page 36: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

5.EMBRACING TDM

36

Page 37: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

37

Page 38: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

"Because it's there“

Edmund Percival Hillary

38

Why?

Page 39: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

There are so many obstructions in the way of doing this research, and doing it well. It is just too hard and so people do other

things

Ross Mounce, University of Cambridge, UK

39

Page 40: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

• Endorsement by senior research leaders

• Funding and incentives linked to TDM

• Alignment with moves to open science

40

What needs to happen?

Page 41: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

41

1. Why does TDM matter?

2. Why isn’t it used more widely in public research?

3. How do we change this?

Page 42: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

Why does TDM matter?Public research is valuable

42

TDM makes research more efficient

TDM is worth investing in

Page 43: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

43

1. Why does TDM matter?

2. Why isn’t it used more widely in public research?

3. How do we change this?

Page 44: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

Copyright exception(Base Camp)

Camp 1: Legal clarity

EU Directive

Camp 2: Access to content

Camp 3: Technical infrastructure

Camp 4: Skills and support

Summit: Researchers embrace TDM

Page 45: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

45

1. Why does TDM matter?

2. Why isn’t it used more widely in public research?

3. How do we change this?

Page 46: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

46

Libraries

• Monitor researchers’ experience

• Develop case studies and guidance

• Involve the national libraries

• Invest in TDM support

• Incorporate TDM clauses into licence agreements researchers’ experiences

Making TDM a reality

Page 47: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

Full case studies from the TDM report

http://adbu.fr/competplug/uploads/2016/12/Annex-1-Full-case-studies-Final-11-Dec-16.pdf

47

Page 48: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

48

Legislators

• Provide certainty

• Enable public/private partnerships

• Monitor interaction with other legislation (e.g. DRM)

Institutions/research leaders

• Endorse TDM

• Invest in library services

• Explore knowledge exchange opportunities

Research funders

• Invest in infrastructure

• Forum to improve access

• Link TDM to Open Science

Publishers & providers

• Cloud services for TDM

• Streamline access

• Open, harmonised standards

Making TDM a reality

Page 49: TEXT AND DATA MINING IN PUBLIC RESEARCHliber2017.lis.upatras.gr/wp-content/uploads/sites/6/2017/04/7.1.pdf · TEXT AND DATA MINING IN PUBLIC RESEARCH Julien Roche –University of

J. Roche – R. Johnson

Thank you

[email protected]

[email protected]

https://adbu.fr

www.research-consulting.com

49

http://adbu.fr/etude-tdm/Full report available at::