Research Data Management at Imperial College London

Library Services

Research Data Management at Imperial College London

17th May 2017 – University of Copenhagen Library

Sarah Stewart - Research Data Support Assistant, Scholarly Communications Team / @Biostew

ORCID: http://orcid.org/0000-0002-9465-4042

Upload: sarah-anna-stewart

Post on 28-Jan-2018



Imperial College London (some context)

• ~15,000 students and ~8,000 staff, including ~3,000 researchers

• International community, with students from 125 countries

• Focus on four main disciplines: sciences, engineering, medicine and business

• Times Higher Education World University Rankings 2016-2017: 3rd in Europe and 8th in the world

• Greatest concentration of high-impact research of any major UK university

The Strong Case for RDM

• Intensive Data-Generating Research Hubs = ‘Big Data’

• UK Med Bio - Bioinformatics Data Science Group – research into causes and progression of human diseases

• NHS Trust Research Data (Medicine)

• Research Computing Group and Research Software Engineering Community

• But also many important ‘small data’ projects across College.

Funder requirements…

“Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible…”

RCUK Common Principles on Data Policy

Funder requirements…

Data Science hub and KPMG Data Observatory launch (Nov 2015)

"At a research intensive university like Imperial it is hard to do anything that doesn't involve data."

James Stirling, Provost

"Data is at the heart of the human condition."

Joanna Shields, UK Minister for Internet Safety and Security

The importance of RDM…

“In their parents' attic, in boxes in the garage, or stored on now-defunct floppy disks — these are just some of the inaccessible places in which scientists have admitted to keeping their old research data.”

http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416

Data Loss…

Process of policy development

•2014: Draft policy: “Statement of Strategic Aims”

•Lack of reliable data, particularly on the scale of data storage needs

•Concerns about cost of maintaining infrastructure

•Concerns about uncertainties and changing market / policy landscape

•Decision: re-think approach – more cost-effective, based on better data

•Approach: RDM Green Shoots and RDM Investigation

•Funded by Vice-Provost (Research)

•Green Shoots: 6 bottom-up, academic projects (2nd half of 2014)

•RDM investigation (Oct 2014-Jan 2015)

•Online survey (academics; 390 responses)

•~40 interviews (academics)

•Workshops (academics & data managers)

Online survey – where does active data live?

[Bar chart: use of different types of storage in %. Categories: College computer; External/portable storage; Cloud storage; Personal computer; Departmental/group storage; College H drive; ICT central storage]

Online survey – growth of data volume

[Bar chart: research group data storage needs in %, now and in 2 years. Size bands: < 10 GB; 10 GB – 100 GB; 100 GB – 1 TB; 1 TB – 10 TB; 10 TB – 100 TB; 100 TB – 1 PB; > 1 PB]

Findings (best practice)

•RDM principles are considered to be sound but not fully practised

•Sharing publicly-funded data accepted in principle but some question value and cost

•Concerns about (metadata) effort to make shared data discoverable

•Metadata schemas are not yet widely available across disciplines

•Auto-generate metadata where possible

•Consensus that RDM training for PhDs is vital (also to prevent data loss when they leave)

Findings (data)

•60-100% of grant required to re-generate data used in publications

•Share of data that needs retaining to support publications: ~60%

•Data storage capacity will have to grow significantly

•Concerns around back-up and archiving, esp. considering data volume

•Popularity of cloud services (as opposed to College storage)

Researchers want a self-administered, secure, responsive solution for data sharing, storing and archiving; open APIs preferred

(“Yes [storage] is really important. Basically, whenever we have been out to talk to researchers, that's the thing they have latched on to and want to talk about the most.” 10.1371/journal.pone.0114734)

Conclusions / policy implementation principles

•Provide platform-independent, flexible data storage

•Embed RDM training into PhD progression

•Where available, use existing workflows:

Symplectic Elements: metadata management

Spiral (DSpace): public (metadata) catalogue

•Additional infrastructure:

•use external resources

•no long-term commitment

•as flexible as possible

•cost-effective

Infrastructure summary

•Flexible, can react to market / policy changes

•Components can be exchanged, no additional in-house infrastructure

•Make a start, collect data, learn – change as required

•Preservation infrastructure needs further work (discussions with Arkivum about ‘framework’ for costing into grants) – how much do we need to retain beyond published data?

•It isn’t perfect, but we can make a start

Result: Imperial College RDM Policy

“Imperial College London is committed to promoting the highest standards of academic research, including excellence in research data management. This includes a robust digital curation infrastructure that supports open data access and protects confidential data. The College acknowledges legal, ethical and commercial constraints on data sharing and the need to preserve the academic entitlement to publication.”

“Principal Investigators have overall responsibility for the effective management of research data generated within or obtained for their research, including by their research groups. The Library and ICT will provide training, guidance and services to support PIs.”

http://imperial.ac.uk/research-data-management

Who are we?

Helping the Imperial community to communicate and disseminate their research and academic work.

The RDM Workflow at Imperial

RDM Infrastructure

Data Access Statement

Data Management Plans: DMP Online

Live Data Storage: Box (and Others)

• Box for live data storage (non-sensitive) and data sharing

• Sensitive data storage via ICT secure storage and encryption

• Specialist data storage, e.g. OMERO in the Bioinformatics Data Science Group for light microscopy images

• Research Computing Repository

• Imperial GitHub for software and code

Treat software as valuable research output

PyRDM Green Shoots project

Zenodo integrates with GitHub

College survey on distributed version control

Software Sustainability Institute – I am a fellow

Archiving Data ‘without a Repository?’

• Data is archived in Zenodo or in UK Data Service (sensitive data) post-project

• Software and code archived in Zenodo via GitHub

• Metadata from data and software are deposited into Spiral via Symplectic

• Indexed by DataCite and CrossRef

ORCID – Open Researcher and Contributor ID

•Emerging global standard for identifying authors of academic outputs

•The College created ORCID iDs for academic staff in late 2014 (now 2,088 of 3,200 iDs claimed, ~1,500 linked in Elements)

•Imperial hosted launch of Jisc ORCID consortium with 50 UK universities in September 2015

http://www.imperial.ac.uk/orcid

Case for a national infrastructure?

Currently, ~100 UK institutions spend effort to define and implement an RDM infrastructure (storage, workflows, interfaces, metadata, compliance, monitoring, business model etc.). Some aspects have to be local, but…

…imagine a national research data infrastructure (say for data publishing and preservation), run by RCUK:

•Economies of scale

•No issues with funding

•Just one system to interface with

•Increased visibility/discoverability

•Solution would by default be compliant

•No commercial “ownership” of public data

Outreach – Love Your Data!

• PhD Training on RDM Basics and DMPOnline (including PhD-specific DMPOnline template)

• RDM ‘Drop-in Clinics’

• RDM ‘Byte-Size’ sessions – informal sessions on various topics

• Imperial Data Circus

• Open Access Road Show

RDM Outreach

[Diagram of outreach activities and liaisons. Recoverable labels: Liaisons; BDAU; FoE; FoLS; Business School; DoM; DoSC; Ped; Materials; Bioinf; Grad School; ESA; RDM Clinics; RDM Talks; 1:1s; RMs; Chem; Aero; HPC; RSE; CDT; CM; Hub; FoM; PhD; Webi; DMP; DOIs; DoM; Event; New Starter email; Research Doughnut; OA Team; OA/RDM; OA Week; Data Circus; Civil; Bio; engcomp]

Imperial Data Circus

• Originally for Open Access Week 2016

• Informal showcase for research conducted at Imperial with Open Data and Open Software

• Provides a forum to discuss open research across disciplines

Engaging Directly with Researchers

• Embedded approach – meet with researchers in situ – in their labs and offices

• One-on-one or group meetings

• Departmental meetings to inform on policy changes and updates and provide insight into best practice.

Stats are exciting!

[Chart: Time Series of RDM Enquiries (Totals); y-axis: number of enquiries]

[Chart: Dataset Deposits in Spiral, 2015-2016; series: number of deposits, cumulative deposits]

[Chart: Software Deposits in Spiral, 2015-2016]

Data Catalogue

RDM – How are they asking?

[Chart: How RDM Enquiries are Resolved. Categories: ASK; rdm-enquiries; one-to-one]

rdm-enquiries: what are they asking?

[Chart: Nature of RDM Enquiries. Categories: Box; Data Access Statement; Data Management Plan; Data Sharing/Publication; Data Archiving; DOIs/Metadata; Data Policy; Software; Zenodo; Outreach; Data Licenses]

On the Horizon…

• On-line MOOC, pre-recorded webinar and video presentations for researchers and students

• Medicine-specific DMPOnline template

• Jisc Shared Services Pilot – UK-wide network of data management services (in planning)

Questions?

For more information:

www.imperial.ac.uk/rdm

[email protected]

Sarah Stewart – [email protected]

@Biostew

Ash Barnes – [email protected]

@ashbarnes71