Data Repositories: Recommendation, Certification and Models for Cost Recovery

| 1 Anita de Waard 0000-0002-9034-4119 VP Research Data Collaborations Elsevier RDM Services [email protected] NSF Workshop February 28, March 1, 2017 Data Repositories: Recommendation, Certification and Models for Cost Recovery


Post on 11-Apr-2017



| 2

Four Types of Repositories:

Diagram labels (research data workflow): Research Question; Object of Study; Method; Raw Data; Analysis; Processed Data; Tables/Figures; Data With Paper; Curate; Curated Record; Methods; Software; Publication

Local Storage/Instrument Repositories
•  Size: PB
•  Nr of files: Trillions
•  Examples: NOAA: 20 TB/; NASA streaming: > 24 PB/day; NASA Reverb: 12 PB Data; NSSD: > 230 TB of digital data; NSIDC: 1 PB data, : 1 PB total; ALMA Telescope: 40 TB/day

Institutional/Local Repositories
•  Size: GB
•  Nr of files: Billions
•  Examples: Deep Blue (UMich): 80 k; MIT DSpace: 75 k; HAL (France): 60 k; DSpace Cambr: 1.5 k; of which data: hundreds

Non-Domain Repositories
•  Size: MB
•  Nr of files: Millions
•  Examples: Figshare: 1.2 M; DataDryad: 3 k; Dataverse: 58 k

Domain Repositories
•  Size: kB
•  Nr of files: 100ks
•  Examples: PetDB: 6 k; PDB: 100 k; NIST ASD: 170 k

| 3

Recommended vs Certified Data Repositories [1]

•  Studied repositories recommended by 17 organisations:
   -  Compiled list of 242 recommended repositories
   -  Identified criteria for recommendation
   -  Identified overlap between recommendations (Fig 1)

•  Identified 5 certification schemes:
   -  Compiled list of 129 certified repositories
   -  Identified criteria for certification
   -  Identified overlap between recommended & certified repositories (Fig 2)

Figure 1: Most repositories are recommended by < 3 parties

Figure 2: Most recommended repositories are not certified

[1] All data is openly available at doi:10.17632/zx2kcyvvwm.1
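The two overlap counts behind Figures 1 and 2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the study's actual analysis: the organisation and repository names below are made-up placeholders, not entries from the published dataset, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy input: organisation -> set of repositories it recommends.
# These names are placeholders, NOT taken from the published dataset.
recommendations = {
    "org_a": {"repo_1", "repo_2", "repo_3"},
    "org_b": {"repo_2", "repo_3"},
    "org_c": {"repo_3", "repo_4"},
}
# Hypothetical set of repositories holding at least one certification.
certified = {"repo_3", "repo_5"}


def recommendation_counts(recs):
    """Count how many organisations recommend each repository."""
    return Counter(repo for repos in recs.values() for repo in repos)


def overlap_summary(recs, certified_repos):
    """Summarise overlap between recommenders and with the certified set."""
    counts = recommendation_counts(recs)
    recommended = set(counts)
    return {
        "recommended": len(recommended),
        "recommended_by_fewer_than_3": sum(1 for n in counts.values() if n < 3),
        "recommended_and_certified": len(recommended & certified_repos),
        "recommended_not_certified": len(recommended - certified_repos),
    }


summary = overlap_summary(recommendations, certified)
print(summary)
```

With real input, `recommendations` would hold one entry per recommending organisation and `certified` the union of all certification lists; the summary keys then correspond to the quantities plotted in the two figures.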

| 4

Set of Shared Criteria Between Recommendation and Certification of Repositories

Columns: Umbrella Categories; Shared Meaning; Recommended Repository Criteria; Repository Certification Scheme Criteria

•  Mission
   -  Explicit mission statement in providing long-term responsibility, persistence, and management of data(sets)

•  Community/Recognition
   -  Evidence of use by downloads or citations from an identifiable and active user community
   -  Understand and meet the needs of the designated and defined target community

•  Legal and Contractual Compliance
   -  Repository operates within a legal framework/Ensures compliance with legal regulations
   -  When applicable, have contractual regulations governing the protection of human subjects
   -  Contracts and agreements maintained with relevant parties on relevant subjects

•  Access/Accessibility
   -  Public access to the scientific/repository designated community
   -  Anonymous referees (including peer-reviewers) have access to the data before public release as indicated by policies

•  Technical Structure/Interface
   -  The software system supports data organisation and searchability by both humans and computers. The interface is intuitive and mobile user-friendly
   -  The technical (infra)structure is appropriate, protective, and secure

•  Retrievability
   -  Data need to have enough metadata. All data receive a persistent identifier

•  Preservation
   -  Long-term and formal preservation/succession plan for the data, even if the repository ceases to exist
   -  If the data are retracted, the persistent identifier needs to be maintained
   -  Preservation of data information properties and metadata

Final report: Husen, Sean Edward; de Wilde, Zoë G.; de Waard, Anita; Cousijn, Helena (2017), “Recommended versus Certified Repositories: Mind the Gap”, submitted for revision, CODATA Data Science Journal, Feb 20, 2017

| 5

Two Economies of Science [3]:

Debit Economy (like a pie)

•  Single pile of ‘stuff’ gets divided:
   -  Thing can only be for one person at one time
   -  “If you get more, I get less”
•  Examples:
   -  Money
   -  Jobs
   -  Samples, equipment, space, etc.
•  Behaviors:
   -  Hoarding, secrecy
   -  (Cut-throat) competition
   -  Winning by owning (and not sharing)

Credit Economy (like a song)

•  Credit comes from visibility:
   -  The more you give away, the more you benefit
   -  “Only if I share do I really own” (“You need me to do you!” JW)
•  Examples:
   -  Papers, citations
   -  Good ideas (if credited)
   -  Skills
•  Behaviors:
   -  Open access, citation game
   -  Collaboration with top-X
   -  Winning by sharing (to enable priority & visibility)

<<< DATA ???

[3] Paula Stephan: “How Economics Shapes Science”, Harvard University Press, 2012: http://www.jstor.org/stable/j.ctt2jbqd1

| 6

RDA Repository Cost Recovery IG

•  Interviewed 22 repositories & reported [2]
•  Different income streams:
   1.  Structurally funded
   2.  Mostly data access charges
   3.  Mostly data deposit fees
   4.  Membership fees (for deposits and/or access)
   5.  Serial project funding
   6.  Supported by host institution
•  Different new models under consideration:
   -  Sponsorships/services for the commercial sector
   -  Contracts for specific services offered (hosting, archiving, curation)
   -  Expanding the number of affiliated institutions
   -  Deposit fees
   -  More services for “national memory institutes”
•  Some comments:
   -  Some countries structurally fund repositories (not US!)
   -  Some repositories embedded in scholarly practice
   -  Hard to come up with new models: no time, no skill sets!
•  Next step: OECD/GSF WG studies more in-depth, more countries: http://www.codata.org/working-groups/oecd-gsf-sustainable-business-models

[2] Available at https://www.rd-alliance.org/final-report-income-streams-data-repositories.html

| 7

Thank you!

More on Elsevier’s RDM program and other interesting efforts:
•  https://www.hivebench.com
•  https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
•  http://www.journals.elsevier.com/softwarex/
•  https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
•  https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
•  https://rd-alliance.org/bof-data-search.html
•  https://datasearch.elsevier.com/
•  https://data.mendeley.com/
•  https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
•  https://www.force11.org/
•  http://www.nationaldataservice.org/
•  https://rd-alliance.org/
•  https://www.elsevier.com/about/open-science/research-data

Anita de Waard, [email protected]