small data: bridging the gap between generic and specific repositories
DESCRIPTION
My presentation for the http://iannotate.org// meeting in San Francisco, April 11th 2013
TRANSCRIPT
![Page 1: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/1.jpg)
Small Data, or: Bridging the Gap Between Specific and Generic Research Repositories
April 11, 2013 Anita de Waard
VP Research Data Collaborations [email protected]
http://researchdata.elsevier.com/
![Page 2: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/2.jpg)
There are many efforts to enhance data storing and sharing...
• Many different research databases, both generic (Dryad, Dataverse, …) and specific (NIF, IEDA, PDB, …)
• Many systems for creating/sharing workflows (Taverna, MyExperiment, Vistrails, Workflow4Ever etc)
• Many e-lab notebooks (LabGuru, LabArchives, LaBlog, etc)
• Scores of projects, committees, standards, bodies, grants, initiatives, conferences for discussing and connecting all of this (KEfED, Pegasus, PROV, RDA, Science Gateways, Codata, BRDI, Earthcube, etc. etc)
• You can make a living out of this ;-)! (and many of us do…)
![Page 3: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/3.jpg)
…but this is what scientists do:
Using antibodies and squishy bits, grad students experiment and enter details into their lab notebook. The PI then tries to make sense of this, and writes a paper. End of story.
![Page 4: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/4.jpg)
Why save research data?
A. Data Preservation:
– Preserve record of scientific process, provenance
– Enable reproducible research
B. Data Use:
– Use results obtained by others
– Do better science!
– Improve interdisciplinary work
![Page 5: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/5.jpg)
> 50 My Papers
2 M scientists
2 M papers/year
Where the data goes now:
• Majority of data (90%?) is stored on local hard drives
• Some data (8%?) stored in large, generic data repositories:
– Dryad: 7,631 files
– Dataverse: 0.6 M
– Datacite: 1.5 M
• A small portion of data (1-2%?) stored in small, topic-focused data repositories:
– MiRB: 25k
– PetDB: 1,5 k
– TAIR: 72,1 k
– PDB: 88,3 k
– SedDB: 0.6 k
![Page 6: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/6.jpg)
> 50 My Papers
2 M scientists
2 M papers/year
So this needs to happen:
• Majority of data (90%?) is stored on local hard drives
• Some data (8%?) stored in large, generic data repositories:
– Dryad: 7,631 files
– Dataverse: 0.6 M
– Datacite: 1.5 M
• A small portion of data (1-2%?) stored in small, topic-focused data repositories:
– MiRB: 25k
– PetDB: 1,5 k
– TAIR: 72,1 k
– PDB: 88,3 k
– SedDB: 0.6 k
INCREASE DATA PRESERVATION
![Page 7: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/7.jpg)
Data Preservation Issues:
Objection: “Our lab notebooks are all on paper – it’s how we do things”
Response: Graft tools closely onto scientists’ daily practice
Example: create tailored metadata collection tools on mini-tablets in labs to replace paper notebooks
![Page 8: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/8.jpg)
Data Preservation Issues:
Objection: “I need to see a direct benefit of any effort I put in.”
Response: Create tools to allow better insight into own and others’ results.
Example: ‘PI-Dashboard’: allow immediate access/analysis of shared data: new science!
![Page 9: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/9.jpg)
Data Use Issues:
Objection: “I don’t really trust anyone else’s data – and don’t think they’ll trust mine”
Response: Create social networking context; allow data owner to provide granular access control.
Example:
• In Urban Lab app, data stored by researcher name.
• PI decides who gets to see which data
• Match up with NIF and Eagle-I ontologies on back end so export of (part of) data is possible at any time.
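The owner-controlled access idea (data stored by researcher name, PI decides who sees which data) could be sketched roughly as follows. This is only an illustration under assumptions: the `Lab` and `Dataset` classes, the user names, and the rule that researchers can always see their own data are all hypothetical and are not taken from the Urban Lab app.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One dataset, stored under the name of the researcher who produced it."""
    name: str
    owner: str
    viewers: set = field(default_factory=set)  # people the PI has granted access


@dataclass
class Lab:
    """Hypothetical lab container; the PI controls visibility of all datasets."""
    pi: str
    datasets: dict = field(default_factory=dict)

    def add(self, name, owner):
        self.datasets[name] = Dataset(name=name, owner=owner)

    def grant(self, acting_user, dataset_name, viewer):
        """Only the PI may decide who gets to see a dataset."""
        if acting_user != self.pi:
            raise PermissionError("only the PI can grant access")
        self.datasets[dataset_name].viewers.add(viewer)

    def can_view(self, user, dataset_name):
        # Assumption: the PI and the producing researcher always have access.
        ds = self.datasets[dataset_name]
        return user in (self.pi, ds.owner) or user in ds.viewers

    def visible_to(self, user):
        return sorted(n for n in self.datasets if self.can_view(user, n))


lab = Lab(pi="pi")
lab.add("western_blot_01", owner="student_a")
lab.add("western_blot_02", owner="student_b")
lab.grant("pi", "western_blot_01", "collaborator")
print(lab.visible_to("collaborator"))
```

Access is granted per dataset rather than per lab, which is one way to read "granular": an outside collaborator sees only what the PI has explicitly shared.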
![Page 10: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/10.jpg)
Data Use Issues:
• Objection: “I am afraid other people might scoop my discoveries”
• Response: Reward system needs to move from direct competition to a ‘shared mission’ approach (cf. Mars)
• Example: Data Rescue Challenge in the geosciences: collect and reward stories/practices of data preservation, enable cross-disciplinary access and use of all data.
The 2013 International Data Rescue Award in the Geosciences
Organised by IEDA and Elsevier Research Data Services
http://researchdata.elsevier.com/datachallenge
![Page 11: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/11.jpg)
Data Preservation and Annotation: Fine, I’ll do it – but where the hell do I put it?
Funding Agency: University: Collaborators: Domain of study:
• Domain-Specific Data Repository
• Local Data Repository
• Institutional Data Repository
• Generic Data Repository
AND THEY ALL WANT DIFFERENT METADATA!!!!
![Page 12: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/12.jpg)
Comparing Repository Types:

| Repository | Advantages | Disadvantages |
| --- | --- | --- |
| Local data repository | Easy! No one steals your data. | No one sees it. Not compliant with requirements |
| Institutional Repository | Not very difficult. Administrators are happy. | Data can’t easily be reused. Credit? |
| Generic data repository | Not very hard to do. Have complied! | Data can’t be easily reused. Credit… |
| Domain-specific data repository | Data can be reused. Credit! | Lot of work – for curators |

Effort, Reuse, Credit, Compliance
Habit, Ease, Privacy, Control
MORE ANNOTATION
![Page 13: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/13.jpg)
Conclusions for data annotation:
“Instead of building newer and larger weapons of mass destruction, I think mankind should try to get more use out of the ones we have”
Deep Thoughts by Jack Handey
• Let’s use the data standards we already have, and agree on using the same ones
• Work with existing data repositories in a field to come to a lowest common denominator of metadata
• Tailor the systems to be optimally easy to use for scientists in terms of metadata: add as little as you have to, as few times as you can.
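As a rough illustration of the "lowest common denominator" idea, the shared metadata core across repositories can be computed as a set intersection. The repository names and field names below are made-up assumptions for the sketch, not the actual schemas of any repository mentioned in this talk.

```python
# Hypothetical metadata schemas: the field names are illustrative
# assumptions, not the real requirements of any repository.
REQUIRED_FIELDS = {
    "local": {"title", "creator", "date"},
    "institutional": {"title", "creator", "date", "department", "funder"},
    "generic": {"title", "creator", "date", "license", "funder"},
    "domain_specific": {"title", "creator", "date", "sample_id", "method"},
}


def common_fields(schemas):
    """Return the fields that every repository asks for."""
    field_sets = list(schemas.values())
    return set.intersection(*field_sets) if field_sets else set()


def extra_fields(schemas):
    """Return, per repository, the fields beyond the shared core."""
    core = common_fields(schemas)
    return {name: fields - core for name, fields in schemas.items()}


core = common_fields(REQUIRED_FIELDS)
extras = extra_fields(REQUIRED_FIELDS)
print("shared core:", sorted(core))
for name, fields in extras.items():
    print(name, "also wants:", sorted(fields))
```

A scientist would then enter the shared core once, and only the per-repository extras would need additional input, in the spirit of "add as little as you have to, as few times as you can".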
![Page 14: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/14.jpg)
Summary:
• Data Preservation:
– Tailor tools to fit scientists’ workflow: follow the experiment!
– We are creating repositories of shared experiments: Enable demonstrably better science!
• Data Use:
– Allow owner full control over who sees which data; create social networking context
– Collectively pioneer long-term funding options; support/develop ‘shared mission’ funding challenges
• How annotation can help reuse:
– Collaborate between (generic/specific, institutional, cross-national) data facilities to integrate repositories, enable cross-repository usage and reuse existing metadata.
![Page 15: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/15.jpg)
Questions?
Anita de Waard VP Research Data Collaborations
http://researchdata.elsevier.com/
![Page 16: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/16.jpg)
Elsevier Research Data Services Goals:
1. Increase Data Preservation: Help increase the amount and quality of data preserved and shared
2. Improve Data Use: Help increase the value and usability of the data shared by increasing annotation, normalization, and provenance, enabling enhanced interoperability
3. Develop Sustainable Models: Help measure and deliver credit for shared data to the researchers, the institute, and the funding body, enabling more sustainable platforms.
![Page 17: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/17.jpg)
Guiding Principles of RDS:
• In principle, all open data stays open and URLs, front end etc. stay where they are (i.e. with repository)
• Collaboration is tailored to data repositories’ unique needs/interests, ‘service-model’ type:
– Aspects where collaboration is needed are discussed
– A collaboration plan is drawn up using a Service-Level Agreement: agree on time, conditions, etc.
• Transparent business model
• Very small (2/3 people) department; immediate communication; instant deployment of ideas.
![Page 18: Small Data: Bridging the Gap Between Generic and Specific Repositories](https://reader033.vdocuments.site/reader033/viewer/2022060108/554e78a4b4c905f66a8b4f4f/html5/thumbnails/18.jpg)
“But aren’t you guys in it for the money?”
• Yes, we are, like most businesses…
• Is your real question perhaps: ‘Does no one want to work with you anymore because of the Open Access debate?’
• The OA debate focuses on three issues:
– IPR and Access issues
– Opaque business models
– Lack of perceived added value
E.g. BY-‐NC-‐SA? Github? ..?
E.g. Gold Open Access? Shared funding model? Commercial analytics with shared royalties?
We offer a service: only use it if it’s any good!