
Summary of RDA Outputs so far

dr. ir. Herman Stehouwer, 22 September 2015


Intangible Outcomes

• Neutral forum for discussing issues
• Generates global discussions
• Very diverse: roles, disciplines

-> Increased insight (e.g. Jamie)
-> Increased needs alignment (e.g. Antonio)

RDA Working Groups

Form the Foundation for RDA Community Impact!

Working Groups envisioned as accelerants to data sharing practice and infrastructure in the short-term with the overarching goal of advancing global data-driven discovery and innovation

RDA Working Group profile:

Short-term: 12-18 months

Focused efforts with specific actions adopted by specific communities

International participation

Open, voluntary, consensus-driven

Complementary to effective efforts elsewhere


Potential outcomes / deliverables:

• New data standards or harmonization of existing standards.

• Greater data sharing, exchange, interoperability, usability and re-usability.

• Greater discoverability of research data sets.

• Better management, stewardship, and preservation of research data.

RDA Interest Groups

• An Interest Group (IG) can be established prior to a Working Group for community discussion of issues and areas that facilitate data-driven research.
• IGs are longer-term groups defining common issues and interests.
• WGs and IGs collaborate intensively with groups in comparable initiatives such as IETF, CODATA, WDS, W3C.

Possible functions:

• Create new WGs
• Communication/Coordination
• Domains, themes, external parties (WDS, CODATA, etc.)

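First Four Outputs

Presented at P4 in Amsterdam. Far along in adoption; all have ratified recommendations.

1. DFT (Data Foundation & Terminology)

2. DTR (Data Type Registry)

3. PIT (PID Information Types)

4. PP (Practical Policies)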

RDA Results I: common data model

• PIDs at the beginning of the trust chain
• need a worldwide, independent and robust PID system
• metadata are essential in an anonymous data world

taken from RDA WG Data Foundation & Terminology

RDA Results II: Data Type Registry

• result: a registry for data types
• simple example: you get an unknown file, pull it onto the DTR, and its content is visualized
• DTR can also be used to describe and re-use semantic content
• no free lunch: someone needs to register and define each type
• PIT demo already working with DTR

[Figure: Data Type Registry in use. Labels: Human or Machine Consumers; Data Set (1010011010101…); Federated Set of Type Registries; Domain of Services (Visualization, Data Processing, Interpretation); Dissemination; Terms; Rights; Agree; steps numbered 1 to 4.]
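The "unknown file" example on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the DTR's actual interface: the `TypeRegistry` class, its `register` and `resolve` methods, and the type identifier `example/csv-table` are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TypeDefinition:
    """A registered data type: a description plus a service that can render it."""
    type_id: str                       # hypothetical identifier for the type
    description: str
    visualize: Callable[[bytes], str]  # service that turns raw bytes into a view


class TypeRegistry:
    """Toy in-memory registry; a real DTR would be a federated network service."""

    def __init__(self) -> None:
        self._types: Dict[str, TypeDefinition] = {}

    def register(self, definition: TypeDefinition) -> None:
        # "No free lunch": someone has to register and define each type.
        self._types[definition.type_id] = definition

    def resolve(self, type_id: str) -> TypeDefinition:
        if type_id not in self._types:
            raise KeyError(f"type {type_id!r} has not been registered")
        return self._types[type_id]


def show_csv(raw: bytes) -> str:
    """Render comma-separated bytes as a pipe-separated text table."""
    rows = [line.split(",") for line in raw.decode("utf-8").splitlines()]
    return "\n".join(" | ".join(cells) for cells in rows)


registry = TypeRegistry()
registry.register(TypeDefinition("example/csv-table", "comma-separated table", show_csv))

# An "unknown" file arrives with only a type identifier attached.
unknown_file = b"station,temp\nA,12.5\nB,9.1"
view = registry.resolve("example/csv-table").visualize(unknown_file)
print(view)
```

The consumer never needs to know the format in advance; it only needs the type identifier and a registry that someone has populated.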

RDA Results III: PID Information Types

• result: a generic API and a set of basic attributes
• a PID Record is like a Passport (Number, Photo, Exp-Date, etc.)
• if all PID Service-Providers agree on one API and talk the same language (registered terms), SW development will become easy
• Test-Installation in operation together with DTR

LOC location, path

CKSM checksum

CKSM_T checksum type

RoR owning repository

MD path to MD

[Figure: a Big Data Process (consuming many digital objects from different repositories) holds a List of PIDs (PID 1, PID 2, PID 3, ... PID k) and is requesting the checksum for all PIDs found. Labels: three PID Resolution Systems, marked "checksum", "check" and "cksm"; Data Type Registry; PID; "defined in DTR"; "makes use of DTR definition".]
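A PID record with the basic attributes listed on this slide, and a process that requests the checksum for a list of PIDs, might look roughly like this. It is a sketch under assumptions, not the PIT API: the attribute names (LOC, CKSM, CKSM_T, RoR, MD) come from the slide, while the PIDs, paths, repository name and helper functions are invented, and the resolver is a plain dictionary standing in for a real resolution service.

```python
import hashlib
from typing import Dict, List

# Attribute names follow the slide; the PIDs, paths and repository name are invented.
PidRecord = Dict[str, str]


def make_record(data: bytes, loc: str, repo: str, md: str, algo: str = "sha256") -> PidRecord:
    """Build a PID record carrying the basic attributes from the slide."""
    return {
        "LOC": loc,                                   # location, path
        "CKSM": hashlib.new(algo, data).hexdigest(),  # checksum
        "CKSM_T": algo,                               # checksum type
        "RoR": repo,                                  # owning repository
        "MD": md,                                     # path to MD
    }


def verify(record: PidRecord, data: bytes) -> bool:
    """Recompute the checksum with the algorithm named in CKSM_T and compare."""
    return hashlib.new(record["CKSM_T"], data).hexdigest() == record["CKSM"]


# Toy resolution system: PID -> record. A real resolver would be a network service.
objects = {"pid/1": b"first object", "pid/2": b"second object"}
resolver: Dict[str, PidRecord] = {
    pid: make_record(data, f"/store/{i}", "example-repo", f"/store/{i}.md")
    for i, (pid, data) in enumerate(objects.items(), start=1)
}


def checksums_for(pids: List[str]) -> Dict[str, str]:
    """Request the checksum for every PID in the list, using the shared term CKSM."""
    return {pid: resolver[pid]["CKSM"] for pid in pids}


print(checksums_for(["pid/1", "pid/2"]))
print(verify(resolver["pid/1"], objects["pid/1"]))  # True
print(verify(resolver["pid/1"], b"tampered"))       # False
```

Because every record uses the same registered term for the checksum, the consuming process needs no per-provider special cases.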

RDA Results IV: Practical Policies

• due to unforeseen circumstances, need until P5
• Practical Policies = executable Workflow Statements
• result at P5: a set of Best Practice PPs for a number of typical DM/DP tasks (Integrity Check, Replication, etc.)
• currently a large collection of PPs, currently being evaluated
• huge simplification for data stewards
• finally feasible quality checks and certification
• huge step in trust improvement

[Figure: Policy Inventory listing replication policy X, replication policy Y, integrity policy A, integrity policy B, integrity policy C, md extraction policy l, md extraction policy k, etc. Other labels: Repository; data manager; selection; implementation; execution.]
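The idea of policies as executable statements that a data manager selects from an inventory can be sketched as below. This is a toy illustration, not one of the WG's policies: repositories are plain dictionaries, and the function names and sample objects are invented. Only the policy labels are borrowed from the slide.

```python
import hashlib
from typing import Callable, Dict, List

Repository = Dict[str, bytes]               # object name -> content (toy stand-in)
Policy = Callable[[Repository], List[str]]  # returns the objects it flagged or acted on


def integrity_policy(expected: Dict[str, str]) -> Policy:
    """Build a policy that reports objects whose SHA-256 differs from the expected value."""
    def run(repo: Repository) -> List[str]:
        return [name for name, data in sorted(repo.items())
                if hashlib.sha256(data).hexdigest() != expected.get(name)]
    return run


def replication_policy(target: Repository) -> Policy:
    """Build a policy that copies objects missing from the target repository."""
    def run(repo: Repository) -> List[str]:
        copied = []
        for name, data in sorted(repo.items()):
            if name not in target:
                target[name] = data
                copied.append(name)
        return copied
    return run


source: Repository = {"a.dat": b"alpha", "b.dat": b"beta"}
replica: Repository = {"a.dat": b"alpha"}
expected = {"a.dat": hashlib.sha256(b"alpha").hexdigest(),
            "b.dat": hashlib.sha256(b"CHANGED").hexdigest()}  # deliberately mismatched

# Policy inventory: named, executable statements a data manager can choose from.
inventory: Dict[str, Policy] = {
    "integrity policy A": integrity_policy(expected),
    "replication policy X": replication_policy(replica),
}

# Selection and execution by the data manager.
selected = ["integrity policy A", "replication policy X"]
report = {name: inventory[name](source) for name in selected}
print(report)
```

Keeping each policy as a small callable is one possible design; it makes the inventory easy to extend and the execution report easy to audit.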

Second Group of Outcomes

Presented at the last Plenary in San Diego. Working on Adoption / Recommendations.

1. Citation of Dynamic Data

2. DDRI (Data Description Registry Interoperability)

3. Metadata Standards Directory

4. Wheat Data Interoperability

RDA Results V: Citation of Dynamic Data

We have: Data + Means-of-access

Dynamic Data Citation: Cite data dynamically via query!

Steps / Principles:

1. Data versioned (history, with time-stamps)

Researcher creates working-set via some interface:

2. Access: assign PID to “QUERY”, enhanced with
   - Time-stamping for re-execution against versioned DB
   - Re-writing for normalization, unique-sort, mapping to history
   - Hashing result-set: verifying identity/correctness
   leading to landing page

Many prototypes and pilot implementations
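The two steps above can be sketched with an in-memory SQLite database. This is a minimal illustration under assumptions, not the WG's reference implementation: the table layout, the integer time-stamps, the `query/...` PID scheme and all function names are invented, and the filter is interpolated into SQL only because this is a toy.

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
# Step 1: versioned data. Rows are never overwritten; each carries a validity interval.
db.execute("CREATE TABLE obs (id INTEGER, value REAL, valid_from INTEGER, valid_to INTEGER)")
db.executemany("INSERT INTO obs VALUES (?, ?, ?, ?)", [
    (1, 10.0, 1, None),
    (2, 20.0, 1, 5),      # superseded at time 5
    (2, 25.0, 5, None),   # corrected value, valid from time 5
])

query_store = {}  # PID -> (normalized filter, time-stamp, result hash)


def run_as_of(where: str, ts: int):
    """Re-written query: restrict to rows valid at `ts` and impose a unique sort."""
    sql = (f"SELECT id, value FROM obs WHERE ({where}) "
           "AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
           "ORDER BY id")
    return db.execute(sql, (ts, ts)).fetchall()


def result_hash(rows) -> str:
    """Hash the result set so a later re-execution can be checked against it."""
    return hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()


def cite(where: str, ts: int) -> str:
    """Step 2: assign a PID to the time-stamped query and store the result hash."""
    normalized = " ".join(where.split())  # collapse whitespace only
    pid = "query/" + hashlib.sha256(f"{normalized}@{ts}".encode("utf-8")).hexdigest()[:12]
    query_store[pid] = (normalized, ts, result_hash(run_as_of(normalized, ts)))
    return pid


def resolve(pid: str):
    """Re-execute the stored query against the versioned data and verify the hash."""
    where, ts, expected = query_store[pid]
    rows = run_as_of(where, ts)
    return rows, result_hash(rows) == expected


pid = cite("value  >= 10", ts=3)
rows, ok = resolve(pid)
print(pid, rows, ok)
```

Resolving the PID returns the data as it stood at the cited time, even though object 2 was corrected afterwards.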

RDA Results VI: DDRI

Enabling cross-platform discovery between research data registries

• Interoperability projects between ANDS, CERN, Dryad based on DataCite and ORCID services: Research Data Switchboard
• Interoperability between da-ra and DataPASS based on Dataverse
• De-duplication project; a collaboration between Data Curation Unit and DANS

This infrastructure enables anyone to query and find links between registries. It can be used by universities, repositories, registries and funders.
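One simple way to find links between registries is to connect records that share an identifier. The sketch below assumes that approach; it is not the Research Data Switchboard's actual method, and the registry names, record keys and identifiers are all invented.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Invented sample records: each belongs to a registry and carries a set of identifiers.
records: List[Dict] = [
    {"registry": "registry-A", "key": "a:1", "ids": {"doi:example-1", "orcid:example-1"}},
    {"registry": "registry-B", "key": "b:7", "ids": {"doi:example-1"}},
    {"registry": "registry-B", "key": "b:8", "ids": {"orcid:example-2"}},
    {"registry": "registry-C", "key": "c:3", "ids": {"orcid:example-1"}},
]


def find_links(recs: List[Dict]) -> Set[Tuple[str, str]]:
    """Link records from different registries that share at least one identifier."""
    by_id = defaultdict(list)
    for rec in recs:
        for ident in rec["ids"]:
            by_id[ident].append(rec)
    links: Set[Tuple[str, str]] = set()
    for sharing in by_id.values():
        for i, a in enumerate(sharing):
            for b in sharing[i + 1:]:
                if a["registry"] != b["registry"]:
                    # Store each pair in sorted order so (x, y) and (y, x) coincide.
                    links.add(tuple(sorted((a["key"], b["key"]))))
    return links


links = find_links(records)
print(sorted(links))
```

The same index of shared identifiers could also serve de-duplication, by flagging matches within a single registry instead of across registries.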

RDA Results VII: Metadata Standards Directory

• Standards are a good thing, but they only work when people use the same standards
• Too few standards -> People do their own thing
• Too many standards -> Fragmentation

Goal: Develop a directory listing Metadata standards

• Comprehensive
• Easy to contribute to
• Extend DCC Metadata Directory
• Make it community-updatable

RDA Results VIII: Wheat

• Wheat is a major food staple
• Need data interoperability to increase production
• Encourage interoperability by:
   - Creating an interoperability framework
   - Providing guidelines on Wheat data (cookbook)
   - Repository of linked vocabularies
• Adoption by WheatIS (Wheat Initiative), FAO, etc.
• Currently in community validation

Upcoming wave of Outputs

1. DSA-WDS Audit and Certification

2. Publishing:
   1. Bibliometrics
   2. Workflows
   3. Services

Will be presented at this plenary