paving the way to open and interoperable research data service workflows

57
Paving the way to open and interoperable research data service workflows Progress from 3 perspectives Angus Whyte, Digital Curation Centre Rory Macneil, Research Space Stuart Lewis, University of Edinburgh Repository Fringe, Edinburgh, 2 nd August 2016

Upload: the-university-of-edinburgh

Post on 24-Jan-2018

77 views

Category:

Education


0 download

TRANSCRIPT

Page 1: Paving the way to open and interoperable research data service workflows

Paving the way to open and interoperable research data service workflows Progress from 3 perspectives

Angus Whyte, Digital Curation Centre Rory Macneil, Research SpaceStuart Lewis, University of Edinburgh

Repository Fringe, Edinburgh, 2nd August 2016

Page 2: Paving the way to open and interoperable research data service workflows

Paving the way to open and interoperable research data service workflows Progress from 3 perspectives

Angus Whyte, Digital Curation CentreResearch data service models, new DCC guidance and some draft ‘design principles’ for institutions integrating research data service workflows

Rory Macneil, Research SpaceIntegrating the RSpace ELN with University of Edinburgh’s DataShare and Harvard’s Dataverse repositories

Stuart Lewis, University of EdinburghDataVault –Jisc Research Data Spring prototype for packaging data to be archived

Page 3: Paving the way to open and interoperable research data service workflows

Guidance from Digital Curation Centre

Aims to effectively support organisations with-

• Providing effective research data services

• Promoting reusability of their research data

• and reproducibility of their research

Page 4: Paving the way to open and interoperable research data service workflows

Models that reflect and help shape reality

Higgins, S. (2008) The DCC Curation Lifecycle Model. Int. Journal Digital Curation 2008, Vol. 3, No. 1, pp. 134-140 doi:10.2218/ijdc.v3i1.48

Key element of DCC guidancee.g. Curation Lifecycle Model (2008)

Page 5: Paving the way to open and interoperable research data service workflows

DCC research data service model (2016)

RDM policy & strategyBusiness plans &

sustainability

Training

Data Management Planning

Active data management

Appraisal & risk assessment

Preservation

Access & publishing

Discovery

Ref. DCC How-to Guide on Evaluating Data Repository and Catalogue Platforms (forthcoming, 2016)

Advisory services

Page 6: Paving the way to open and interoperable research data service workflows

Some working definitions

Research data service – “a means of delivering value to the producers and users of digital objects by facilitating outcomes they want to achieve without the ownership of specific costs or risks” (derived from ITIL definition of a service)

Sub-types -

Active data management “services used to create or transform digital objects for the purposes of research”

Preservation: “services offering to ensure digital objects meet a defined level of FAIRness - findability, accessibility, interoperability, and reusability -for a designated community and period of time”

Publication: “services offering to enhance digital objects FAIRness by reviewing their quality on specified criteria, or connecting them to additional metadata”

Guidance: “services offering practical guidance on choosing or using the above services”

Page 7: Paving the way to open and interoperable research data service workflows

Draft design principles for integration of research data service workflows

1. Active data management services should use openstandards to express and expose the objects and metadata they offer to downstream services, including their access and reuse terms

Page 8: Paving the way to open and interoperable research data service workflows

Draft design principles for integration of research data service workflows

2. Preservation and publication services should publishpolicies stating what digital object types they accept, for what communities, and on what terms and conditions

Page 9: Paving the way to open and interoperable research data service workflows

Draft design principles for integration of research data service workflows

3. Preservation and publication services should makeopenly available sufficient metadata to enable reuseof their outputs, including all terms and conditionsfor third-party access and reuse

Page 10: Paving the way to open and interoperable research data service workflows

Draft design principles for integration of research data service workflows

4. Active data management, preservation and publication services should make sufficient detail of their workflows available to support thereproducibility of research that produced the digitalobjects they act upon

Page 11: Paving the way to open and interoperable research data service workflows

Draft design principles for integration of research data service workflows

5. Guidance services should support users of otherservices to make an informed choice of downstreamservice capabilities, informed by best practices forreuse and reproducibility

Page 12: Paving the way to open and interoperable research data service workflows

Draft design principles for integration of research data service workflows

6. Guidelines 1-5 should be implemented usingmachine-actionable content

Page 13: Paving the way to open and interoperable research data service workflows

DCC research data service model (2016)

RDM policy & strategyBusiness plans &

sustainability

Training Advisory services

Data Management Planning

Active data management

Appraisal & risk assessment

Preservation

Access & publishing

Discovery

Ref. DCC How-to Guide on Evaluating Data Repository and Catalogue Platforms (forthcoming, 2016)

Page 14: Paving the way to open and interoperable research data service workflows

Data Management Planning

Active data management

Appraisal & risk assessment

Preservation

Access & publishing

Discovery

Real services are made up of many more partstools and services at researchers disposal are increasing

Page 15: Paving the way to open and interoperable research data service workflows

‘lifecycles’ are often non-linear

Integration use cases do not follow neat sequences

One size does not fit all

Page 16: Paving the way to open and interoperable research data service workflows

What about Functional Models? OAIS foundation for

Trustworthy Digital Repository standards

IngestData

management

AIP AIP

Preservation PlanningContext Context

Description Description

RDM

SIP DIP

UsersProducer

Archival storage

Access

Administration

Page 17: Paving the way to open and interoperable research data service workflows

RDM platforms can be grafted on, to relate OAIS functions to context infosources

IngestData

management

AIP AIP

CRIS

Open access Repository

Preservation PlanningContext Context

Description Description

RDM

SIP DIP

UsersProducer

Data Catalogue

Archival storage

Collection

Appraisal

Access

Data Preservation

Active Data Management

Data MgmtPlanning

Data Publication

Administration

Page 18: Paving the way to open and interoperable research data service workflows

What best practice models apply ‘upstream’?

Q. How can institutions ensure researchers have informed choice of trustworthy services before they need a repository

• “Whole lifecycle” service models are one solution

• But how should they be governed?

– Commercial services?

– Commons?

– Hybrid?

Page 19: Paving the way to open and interoperable research data service workflows

Commercial services modele.g. Elsevier

“On Wednesday 1 June, Elsevier acquired Hivebench to help further streamline the workflow of researchers – putting research data management at their fingertips. The added value of the integration lies in linking Hivebench with Elsevier’s existing Research Data Management portfolio for products and services.

The research data that researchers have stored in the Hivebench notebook are linked to the Mendeley Data repository, which will be linked to Pure. This ….adds instant value to the datasets because they become far more suitable for reuse.”

Page 20: Paving the way to open and interoperable research data service workflows

Commons approach e.g. Principles for Open Scholarly Infrastructures

“…What should a shared infrastructure look like? Infrastructure at its best is invisible. We tend to only notice it when it fails. If successful, it is stable and sustainable. Above all, it is trusted and relied on by the broad community it serves. Trust must run strongly across each of the following areas: running the infrastructure (governance), funding it (sustainability), and preserving community ownership of it (insurance).” (emphasis added)

Cameron Neylon, Science in the Open, 25 Feb 2015

Page 21: Paving the way to open and interoperable research data service workflows

Commons approach e.g. Open Science Framework

Workflows in effect governed by a ‘mixed economy’of platforms

So what principles should apply?

Mathew Spitzer An Open Science Framework for Solving Institutional Challenges: Supporting the Institutional Research Method Research Data Access and Preservation Summit, 2016 Atlanta, GA May 4-7, 2016

Page 22: Paving the way to open and interoperable research data service workflows

Background - RDA Working Group on Data Publishing Workflows

Reviewed 25 examples of repository data publishing workflows, including some integrating with ‘downstream’ services e.g. data journals for peer review

Austin, Claire C et al.. (2015). Key components of data publishing: Using current best practices to develop a reference model for data publishing. Zenodo. http://dx.doi.org/10.5281/zenodo.34542

Drafted a reference model and made best practice recommendations-1. Start small, building modular, open source and shareable

components2. Follow standards that facilitate interoperability and permit

extensions 3. Facilitate data citation, e.g. through use of digital object PIDs,

data/article linkages, researcher PIDs 4. Document roles, workflows and services

Page 23: Paving the way to open and interoperable research data service workflows

Background - RDA Working Group on Data Publishing Workflows

• Follow up call (Dec 15) for examples of repositories connecting with upstream research workflows e.g. to gather metadata earlier

• Aiming to identify whether recommendations apply, and how the intention to publish data is changing research practices.

• Collected 12 cases - mix of concrete examples, prototypes (e.g. Dendro), conceptual models (e.g. Science 2.0 repositories)

• Report in preparation

Page 24: Paving the way to open and interoperable research data service workflows

Review of upstream workflow examples

Are the services components underpinning research workflows loosely coupled?

– Modular design of workflow components? yes, plenty evidence

– Standard vocabularies and protocols to describe components some evidence

– Significant investment in building trust-based relationships among participants some evidence

– Standardized ways of specifying capabilities and performance requirements limited (e.g. WDS-DSA Catalogue of Requirements)

* ‘The Joy of Flex’ Hagel and Seely-Brown, 2005 see e.g. http://www.cio.com.au/article/29148/joy_flex

Page 25: Paving the way to open and interoperable research data service workflows

So do we need more …

E.g. DCC How-to describe research data service workflows (forthcoming)

• Definitions – to describe services ?

• Design principles – to articulate best practice?

• Capability models - to articulate mutual expectations of service owners?

• Case studies of repository workflow integration?

• Understanding of how tools are actually being integrated, and effect on data publication practices

Page 26: Paving the way to open and interoperable research data service workflows

Draft design principles for integration of research data service workflows

1. Active data management services should use open standards to express and expose the objects and metadata they offer to downstream services, includingtheir access and reuse terms

2. Preservation and publication services should publish policies stating what digitalobject types they accept, for what communities, and on what terms and conditions

3. Preservation and publication services should make openly available sufficientmetadata to enable reuse of their outputs, including all terms and conditions forthird-party access and reuse

4. Active data management, preservation and publication services should makesufficient detail of their workflows available to support the reproducibility of research that produced the digital objects they act upon

5. Guidance services should support users of other services to make an informedchoice of downstream service capabilities, informed by best practices for reuseand reproducibility

6. Guidelines 1-5 should be implemented using machine-actionable content

Thanks for comments to Suenje Dalmeier Tiessen and Amy Nurnberger, co-chairs of the RDA Working Group on Data Publishing Workflows

Page 27: Paving the way to open and interoperable research data service workflows

Maintaining trust across the research cycle

Q. How do these principles apply in real cases?

Case Study 1 – Rory Macneill, Research Space

Integrating the RSpace ELN with University of Edinburgh’s DataShareand Harvard’s Dataverse repositories

Page 28: Paving the way to open and interoperable research data service workflows

Export data and metadata

Archive or Repository

Current Repository Paradigm

Page 29: Paving the way to open and interoperable research data service workflows

The reality is (a lot) more complex

DataFilesLinks

RepositoriesArchivesELNsFile systemsDatabases

Capture and structure dataGenerate metadata

Export data and metadataEstablish links to file(s) and databases

Track file locations

ActionsVehiclesResearch units

IssuesData versus file linksForms of data export

Data from intermediary vehiclesFile location and integrity of linksPost deposit access - permissionsPost deposit access - capabilities

Pre-deposit impact on post-deposit capabilities

Repositories need the ability to easily ingest diverse data types and formats, and links to files,

in a structured manner, directly and from other tools

Page 30: Paving the way to open and interoperable research data service workflows

The Dataverse – Starfish – RSpace project @Harvard Medical School

Towards a new paradigm

Data, files and research Capture and structure dataGenerate metadata

Track file locations Make data and files available for public access and query

Page 31: Paving the way to open and interoperable research data service workflows

Access hyperlinks √Track file locations √

Export data and metadataIn various formats

Using open standardsPDF √

Word √HTML √XML √

On premises orCloud file

system

Database

Capture and structure data √Generate metadata √

Track file locations √

Page 32: Paving the way to open and interoperable research data service workflows

Draft design principles for integration of research data service workflows

1. Active data management services should use open standards to express and expose the objects and metadata they offer to downstream services, includingtheir access and reuse terms

2. Preservation and publication services should publish policies stating what digitalobject types they accept, for what communities, and on what terms and conditions

3. Preservation and publication services should make openly available sufficientmetadata to enable reuse of their outputs, including all terms and conditions forthird-party access and reuse

4. Active data management, preservation and publication services should makesufficient detail of their workflows available to support the reproducibility of research that produced the digital objects they act upon

5. Guidance services should support users of other services to make an informedchoice of downstream service capabilities, informed by best practices for reuseand reproducibility

6. Guidelines 1-5 should be implemented using machine-actionable content

Page 33: Paving the way to open and interoperable research data service workflows

Case study 2

Stuart Lewis, University of Edinburgh

DataVault –Jisc Research Data Spring prototype for packaging data to be archived

Page 34: Paving the way to open and interoperable research data service workflows

Stuart LewisDeputy Director, Library &

University CollectionsThe University of Edinburgh

What is a Data Vault?

@JiscDataVault

Page 35: Paving the way to open and interoperable research data service workflows

Research Data Management Services

Data Management Support

Data Management

Planning

Active Data Infrastructure

Data Stewardship

Page 36: Paving the way to open and interoperable research data service workflows

Data Stewardship

• DataVault

– Long term archival storage

– First envisaged a few years ago…

Page 37: Paving the way to open and interoperable research data service workflows

What is the DataVault - Analogies

https://www.flickr.com/photos/brookward/8457736952

Page 38: Paving the way to open and interoperable research data service workflows

What is the DataVault - Analogies

https://www.flickr.com/photos/timshelyn/4125480767

Page 39: Paving the way to open and interoperable research data service workflows

Where does it sit?

Active Storage

Lab Equipment

Other Media

Archival Storage

Page 40: Paving the way to open and interoperable research data service workflows

Where does it sit?

Active Storage

Lab Equipment

Other Media

Archival Storage

PURE / CRIS / Data Catalogue

Page 41: Paving the way to open and interoperable research data service workflows

Where does could it sit?

Active Storage

Lab Equipment

Other Media

Archival Storage

PURE / CRIS / Data Catalogue

Open Data Repository

Page 42: Paving the way to open and interoperable research data service workflows

Phase 1 (3 months)

Phase 2 (4 months)

Phase 3 (6 months)

Page 43: Paving the way to open and interoperable research data service workflows
Page 44: Paving the way to open and interoperable research data service workflows
Page 45: Paving the way to open and interoperable research data service workflows
Page 46: Paving the way to open and interoperable research data service workflows
Page 47: Paving the way to open and interoperable research data service workflows
Page 48: Paving the way to open and interoperable research data service workflows
Page 49: Paving the way to open and interoperable research data service workflows
Page 50: Paving the way to open and interoperable research data service workflows
Page 51: Paving the way to open and interoperable research data service workflows
Page 52: Paving the way to open and interoperable research data service workflows
Page 53: Paving the way to open and interoperable research data service workflows
Page 54: Paving the way to open and interoperable research data service workflows

Make available online

Metadata already captured from CRIS, plus files from the Vault

Page 55: Paving the way to open and interoperable research data service workflows
Page 56: Paving the way to open and interoperable research data service workflows

Stuart [email protected]

@stuartlewis

What is a Data Vault?

Page 57: Paving the way to open and interoperable research data service workflows

Draft design principles for integration of research data service workflows

1. Active data management services should use open standards to express and expose the objects and metadata they offer to downstream services, includingtheir access and reuse terms

2. Preservation and publication services should publish policies stating what digitalobject types they accept, for what communities, and on what terms and conditions

3. Preservation and publication services should make openly available sufficientmetadata to enable reuse of their outputs, including all terms and conditions forthird-party access and reuse

4. Active data management, preservation and publication services should makesufficient detail of their workflows available to support the reproducibility of research that produced the digital objects they act upon

5. Guidance services should support users of other services to make an informedchoice of downstream service capabilities, informed by best practices for reuseand reproducibility

6. Guidelines 1-5 should be implemented using machine-actionable content