FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks and Off the Shelf Infrastructure Professor Carole Goble The University of Manchester, UK Software Sustainability Institute UK ELIXIR-UK, ELIXIR Interop Platform [email protected] Orcid 0000-0003-1219-2137 NSF Workshop Data and Software Citation, 6-7 June 2016, Boston USA, Software Panel

Upload: carole-goble

Post on 12-Jan-2017


TRANSCRIPT

Page 1

FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks and Off the Shelf Infrastructure

Professor Carole Goble
The University of Manchester, UK
Software Sustainability Institute UK
ELIXIR-UK, ELIXIR Interop Platform

[email protected]
ORCID 0000-0003-1219-2137

NSF Workshop Data and Software Citation, 6-7 June 2016, Boston USA, Software Panel

Page 2

Acknowledgements

U Manchester
• Stian Soiland-Reyes
• Stuart Owen
• Caroline Jay
• Robert Haines
• Norman Morrison

U Newcastle
• Paolo Missier

U Illinois Urbana-Champaign
• Dan Katz

Murphy Mitchell Consulting Ltd
• Fiona Murphy

F1000
• Liz Allen

U Oxford
• Neil Jefferies
• Lucie Burgess

ISI, USC
• Yolanda Gil
• Daniel Garijo

Force11 DCIP / Harvard
• Tim Clark

ELIXIR / BioSchemas.org
• Rafael Jimenez (Hub)
• Niall Beard (ELIXIR UK)
• Aleks Nenadic (ELIXIR UK)
• Jo McEntyre (EBI, THOR)

NIH BD2K
• Susanna Sansone (bioCADDIE, ELIXIR)
• Ian Fore (NIH)

Software Sustainability Institute
• Shoaib Sufi
• Neil Chue Hong
• Mike Jackson

STFC
• Catherine Jones

Page 3

Chief Contexts

Workflow Repository

Systems and Synthetic Biology Projects

Page 4

Page 5

FAIR

Findable

Accessible

Interoperable

Reusable

Intelligible

Reproducible

Citable

Track & Countable

Page 6

Findable

Accessible

Interoperable

Reusable

Intelligible

Change

Citable

Track & Countable

FAIR Credit

sciencecodemanifesto.org

Page 7

http://www.elixir-europe.org/

17 ELIXIR members, 2 observers

Major bioinformatics service providers (~150)

Co-operation, long term support

Page 8

Data Citation in Europe PMC full text

Page 9

http://dliservice.research-infrastructures.eu/#/

https://www.openaire.eu/

https://www.rd-alliance.org/groups/rdawds-publishing-data-services-wg.html

European Open Science Cloud

Page 10

Technical and Human infrastructure for Open Research

• Interoperability and integration between ORCID and DataCite infrastructures
• PID e-infrastructure: promote uptake and sustain

https://project-thor.eu/

Giving Researchers Credit for their Data

https://www.jisc.ac.uk/rd/projects/research-data-spring

• Carrots for authors, “pain-free” submission
• Helper app for submitting data papers and data for papers (using DataCite and ORCID)

Page 11

http://www.software.ac.uk/software-credit

Over 90 Guides
War Stories
Policy, Support

http://www.software.ac.uk/software-management-plans

Digital Curation Centre
http://dcc.ac.uk

http://openresearchsoftware.metajnl.com/

Page 12

http://www.software.ac.uk/how-cite-and-describe-software

Mike Jackson

Page 13

http://rse.ac.uk

Page 14

Not all creditable software is a “downloadable application”

Registration is hit and miss

Metrics

Indicators

Counts

Community Smarts

Page 15

Software Citation Space

Science as a Service

Open Source Codes

Virtual Machines

Portable Packaging

Libraries

Applications

Scripting environments

Infrastructure

Commercial tools

Scripts / Workflows

Packages GEMS

Dynamic Deployments

Page 16

Reproducible Research: Citing your execution environment using Docker and a DOI

http://www.software.ac.uk/blog/2016-03-29-reproducible-research-citing-your-execution-environment-using-docker-and-doi

Caroline Jay, Robert Haines

http://idinteraction.cs.manchester.ac.uk
‘ABC: Using Object Tracking to Automate Behavioural Coding.’ CHI 2016.

= Fixity Publishing

Page 17

Service vs Science
Background vs Foreground Software

Software and data* in the foreground are the most likely to be cited. The same software and data, viewed as background, are not cited or not explicitly cited, though they are equally essential.

* Wynholds, et al (2012) Data, data use, and scientific inquiry: two case studies of data practices 10.1145/2232817.2232822

The invisibility of software, esp:
• widely used
• infrastructural
• component/library
• cross-discipline

Page 18

Credit Drift

Immediate team

Background team

“Foreground” software

Authorship

Authorship?

Cited?

Acknowledged

Cited?

Mentioned

Ignored

“Background” software

Cited

Transitive, Fractional Credit

Not all software is equal

* Wynholds, et al (2012) Data, data use, and scientific inquiry: two case studies of data practices 10.1145/2232817.2232822

Page 19

https://mr-c.github.io/shouldacite

http://bit.ly/shouldacite

SSI Collaborations Workshop 2016

Should I cite the software?

Page 20

Overcoming Barriers to Software Citation
Survey of experiences citing software in research publications

http://bit.ly/1WxWFY7

Caroline Jay, Robert Haines, University of Manchester, UK
Robin Wilson, University of Southampton, UK

Page 21

System Biology Projects Commons

http://fair-dom.org

Page 22

Systems and Synthetic Biology Projects

Linking, “Packaging” & Citing Codes, Data, Models, SOPs, Samples, Strains, Articles, People, Projects….

Page 23

Repository spanning catalogue, reference (“cite”) distributed 3rd party content

Standards

Public data archives

Project data repositories

Literature archives

Public model archives

Uploaded content

Plugin Model tools

FAIRDOM

Plugin Data tools

Page 24

Structured Metadata Capture

metadata sheets sample sheets

data sheets

http://www.rightfield.org.uk

Page 25

[Martin Scharm, Rostock University]

Haus et al, BMC Systems Biology, 2011, 5:10
Solvent production by Clostridium acetobutylicum

Page 26

https://dx.doi.org/10.1111/febs.13237

https://doi.org/10.15490/seek.1.investigation.56

http://data.datacite.org/10.15490/seek.1.investigation.56

Citation G. Penkler; F. du Toit; W. Adams; M. Rautenbach; D. C. Palm; D. D. van Niekerk; J. L. Snoep; (2014): Glucose metabolism in Plasmodium falciparum trophozoites; FAIRDOMHub. http://dx.doi.org/10.15490/seek.1.investigation.56

Fixity Publishing, URIs -> DOIs
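
The citation on this slide follows a "creators (year): title; publisher. identifier" pattern. As a rough illustration only, the sketch below assembles such a string from a metadata record. The function name `format_citation` and the dictionary keys are hypothetical; the values are taken from the slide, and the exact punctuation used by FAIRDOMHub or DataCite may differ.

```python
def format_citation(meta):
    """Build a data citation string from a small metadata record.

    The keys (creators, year, title, publisher, doi) are illustrative
    assumptions, not a DataCite schema.
    """
    creators = "; ".join(meta["creators"])
    return (
        f"{creators} ({meta['year']}): {meta['title']}; "
        f"{meta['publisher']}. http://dx.doi.org/{meta['doi']}"
    )


# Values copied from the citation shown on the slide.
investigation = {
    "creators": [
        "G. Penkler", "F. du Toit", "W. Adams", "M. Rautenbach",
        "D. C. Palm", "D. D. van Niekerk", "J. L. Snoep",
    ],
    "year": 2014,
    "title": "Glucose metabolism in Plasmodium falciparum trophozoites",
    "publisher": "FAIRDOMHub",
    "doi": "10.15490/seek.1.investigation.56",
}

citation = format_citation(investigation)
print(citation)
```

Keeping the DOI as a separate field means the same record can drive both the human-readable citation and a resolvable link.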

Page 27

"Mapping present and future predicted distribution patterns for a meso-grazer guild in the Baltic Sea" Sonja Leidenberger et al

Credits

Attributions

In Multiple Packs

Page 28

Track?

Workflows

Pointer to 3rd Party Data Collection

Pointer to 3rd Party Code

Local files

Page 29

Content
• Aggregated
• Granularity
• Atomicity / Subsets
• Recombined
• Distributed
• Dynamic and versioned

Contribution
• Multi-contributors
• Spans resources
• Independently stewarded
• Shift and change

Page 30

• Metadata Framework: bundles and relates multi-hosted, scattered digital resources of a scientific experiment or investigation using standard mechanisms
• Exchange, Publishing, Reproducibility, Portability, Repair

See Stephen Abrams' talk yesterday

Page 31

Datasets, Data collections
Standard operating procedures
Software, algorithms
Configurations, Tools and apps, services

Slideshare
Github
figshare
CommunityDB
Arxiv.org
Pubmed
Docker image

Codes, code libraries
Workflows, scripts
System software
Infrastructure
Compilers, hardware

Input Data
Workflow Description
Provenance trace
Version of Codes / Services
Output

Page 32

Metadata Objects, Citable Reproducible Packaging

Manifest Construction
Identification: to locate things
Aggregates: to link things together
Annotations: about things & their relationships

Manifest Description
Type Checklists: what should be there
Provenance: where it came from
Versioning: its evolution
Dependencies: what else is needed

Container
Packaging content & links: Zip files, BagIt, Docker images
Catalogues & Commons Platforms: FAIRDOM SEEK, STELAR eLab

OAI ORE

W3C OADM

Page 33

RO Types: Manifest Content Profiles
minimal, maximal, extensible

PID

Citation

Checklist

Version

Provenance

Dependencies

JATS

Comms

DC

DCAT

Exp

ISA

EFO

Domain

SBML

MIAME

CWL

Common properties among content types

Minimum information for one content type

Page 34

Workflow RO Bundle
ZIP or BagIt folder structure

Belhajjame et al (2015) Using a suite of ontologies for preserving workflow-centric research objects, J Web Semantics doi:10.1016/j.websem.2015.01.003

application/vnd.wf4ever.robundle+zip

JSON and YAML
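
To make the bundle idea concrete, here is a minimal sketch of packaging a ZIP-based Research Object with a JSON manifest that aggregates one local file and one remote resource. The media type is the one shown on the slide. The manifest path `.ro/manifest.json`, the context URI and the field names are assumptions based on the RO Bundle draft and should be checked against the specification; `build_ro_bundle` and the file `workflow.cwl` are hypothetical.

```python
import io
import json
import zipfile

# Media type as given on the slide.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def build_ro_bundle(local_files, external_uris):
    """Return the bytes of a ZIP bundle holding local files plus a manifest.

    local_files: dict mapping a path inside the bundle to its bytes content.
    external_uris: URIs of 3rd party content that is referenced, not copied.
    """
    manifest = {
        # Assumption: context URI and field names follow the RO Bundle draft.
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        # Aggregates link local and remote things together.
        "aggregates": [{"uri": "/" + path} for path in sorted(local_files)]
        + [{"uri": uri} for uri in external_uris],
        # Annotations about things and their relationships would go here.
        "annotations": [],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        # The media type entry is written first and stored uncompressed.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for path in sorted(local_files):
            bundle.writestr(path, local_files[path])
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


bundle_bytes = build_ro_bundle(
    {"workflow.cwl": b"# hypothetical workflow description\n"},
    ["https://doi.org/10.15490/seek.1.investigation.56"],
)
```

Because remote content is only referenced by URI, the bundle stays small, at the cost of links that can break when the remote resource moves.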

Page 35

Persistent Identification of Software: a building block to citation & curation

[email protected] B. Matthews, I. Gent, J. Tedds & S Lamerton
Project URL http://rrr.cs.st-andrews.ac.uk/

Guidelines for persistently identifying software using DataCite

https://epubs.stfc.ac.uk/work/24058274

Page 36

What does the citation mean for the author or reader?

• Most recent?
  – Location indicator, crosslink
  – Credit the contributors now, the version now
  – Strong presumption it exists and is living
• Fixed Snapshot?
  – Defend publication, Reuse
  – Credit the contributors then, the version then
  – Presumption it exists and is archived
• Line in the sand?
  – Credit the contributors then, the version then
  – Weak presumption it exists
• Warrant?
  – Acknowledgement not contribution
  – Don’t care if it exists
  – Important “influence” citation for its contributors

Identifier Resolution, Citation Persistence, Content Decay?

Page 37

Commons

my Disk

Page 38

Commons

• DOI proliferation
  – Channelling for Counting and Landing Pages
• Authenticity: Tamper-proof Exchange and Provenance
  – Hashing & Checksums
  – Secure signature & probity services
  – Block chain
    • anti tampering transaction logging
    • https://www.ethereum.org/
  – Proll and Rauber, Scalable data citation in dynamic, large databases: Model and reference implementation, (2014) 10.1109/BigData.2013.6691588
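
The "Hashing & Checksums" point can be illustrated with a small fixity check: record a digest for each item when it is published, then recompute later to detect tampering or decay. This is a sketch using SHA-256 from the Python standard library; the helper names and the in-memory "files" are hypothetical, and a real deployment would also need to protect the checksum record itself (for example with a signature).

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of a bytes value as a hex string."""
    return hashlib.sha256(data).hexdigest()


def fixity_manifest(files):
    """Map each file name to the checksum of its current content."""
    return {name: sha256_hex(content) for name, content in files.items()}


def changed_files(files, manifest):
    """Return names that are missing or whose content no longer matches."""
    return sorted(
        name
        for name, digest in manifest.items()
        if name not in files or sha256_hex(files[name]) != digest
    )


# Hypothetical published content and its recorded checksums.
published = {"data.csv": b"t,glucose\n0,5.0\n", "model.xml": b"<sbml/>"}
recorded = fixity_manifest(published)

# Later: one file has been altered and one has disappeared.
current = {"data.csv": b"t,glucose\n0,5.1\n"}
print(changed_files(current, recorded))
```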

Page 39

• Uber Collection / Hierarchy / subsetting (cf. Dryad, DataONE, DataVerse)*

• RO author/contributor information in its manifest

• ROs manifest => constituent resources, provenance for contribution.

*Ball, A. & Duke, M. (2011). "How to Cite Datasets and Link to Publications?". DCC How-to Guides. Edinburgh: Digital Curation Centre. http://www.dcc.ac.uk/resources/how-guides/cite-datasets.

Granularity Atomicity

Aggregation

Page 40

Robust Transitivity & Propagation
Citation and Credit Aggregation and Granularity

• Backward Citation
  – What was this based on, who did it?
• Forward Citation
  – What is using this, who did that?
• “PageRank”

Credit Aggregation

Citation Granularity

Drift

D. S. Katz, "Transitive Credit as a Means to Address Social and Technological Concerns Stemming from Citation and Attribution of Digital Products," Journal of Open Research Software, v.2(1): e20, pp. 1-4, 2014. DOI: 10.5334/jors.be

Page 41

Who gets credit for what?

Using Provenance for Credit Mapping

[Figure: W3C PROV dependency graph with numbered nodes and contributors Alice, Bob and Charlie; “Provlets”]

Paolo Missier, Data Trajectories: tracking reuse of published data for transitive credit attribution, IDCC 2016
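
One way to read "using provenance for credit mapping" is as a walk over derivation links: start from a result and follow what it was derived from, collecting whoever each upstream item is attributed to. The sketch below does this over plain dictionaries. It is not the method from the cited IDCC talk; the entity names are invented, and the two dictionaries merely stand in for PROV-style "was derived from" and "was attributed to" relations.

```python
def upstream_agents(entity, derived_from, attributed_to):
    """Collect every agent credited on an entity or anything it derives from.

    derived_from: dict mapping an entity to the entities it was derived from.
    attributed_to: dict mapping an entity to the agent it is attributed to.
    """
    seen = set()
    agents = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against revisiting shared ancestors
        seen.add(current)
        if current in attributed_to:
            agents.add(attributed_to[current])
        stack.extend(derived_from.get(current, []))
    return agents


# Hypothetical dependency graph: Charlie's result builds on Bob's dataset,
# which in turn was derived from Alice's raw data.
derived_from = {
    "charlie_result": ["bob_dataset"],
    "bob_dataset": ["alice_raw_data"],
}
attributed_to = {
    "charlie_result": "Charlie",
    "bob_dataset": "Bob",
    "alice_raw_data": "Alice",
}

print(sorted(upstream_agents("charlie_result", derived_from, attributed_to)))
```

This only answers "who is upstream"; how much credit each person should receive needs weights on the links.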

Page 42

Transitive Credit contribution
Dan Katz and Arfon Smith

• Tracking RO usage and indirect contributions
• Awarding fractional credit to contributors

1. “Contriponents”
   • contributors + components
2. Weighted contribution
3. Networked Credit maps
   • Travel with the contriponents

*Katz, D.S. & Smith, A.M., (2015). Transitive Credit and JSON-LD. Journal of Open Research Software. 3(1), p.e7, DOI: http://doi.org/10.5334/jors.by

D. S. Katz, "Transitive Credit as a Means to Address Social and Technological Concerns Stemming from Citation and Attribution of Digital Products," Journal of Open Research Software, v.2(1): e20, pp. 1-4, 2014. DOI: 10.5334/jors.be
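
The weighted, networked credit maps listed above can be sketched as follows: each product carries a map of fractional credit to its contriponents (people or other products), and credit given to a product is passed on to that product's own contriponents by multiplying the weights. The data layout and the function `transitive_credit` are illustrative assumptions, not the JSON-LD format from the cited papers, and the sketch assumes the credit maps contain no cycles.

```python
def transitive_credit(product, credit_maps, weight=1.0, totals=None):
    """Distribute the credit for a product down to individual contributors.

    credit_maps: dict mapping a product to {contriponent: fraction}, where
    the fractions for each product sum to 1. A contriponent that has its
    own entry in credit_maps is treated as a product and expanded further.
    """
    if totals is None:
        totals = {}
    for name, share in credit_maps[product].items():
        if name in credit_maps:
            # A component: pass its share on to its own contributors.
            transitive_credit(name, credit_maps, weight * share, totals)
        else:
            # A person: accumulate the weighted share.
            totals[name] = totals.get(name, 0.0) + weight * share
    return totals


# Hypothetical maps: a paper credits its author and a library it relied on;
# the library credits its two developers equally.
credit_maps = {
    "paper": {"Alice": 0.75, "library": 0.25},
    "library": {"Bob": 0.5, "Charlie": 0.5},
}

print(transitive_credit("paper", credit_maps))
```

In this example Bob and Charlie each receive 0.125 of the credit for the paper even though the paper never names them directly.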

Page 43

How do we weight and track?

https://www.refme.com/uk/

http://depsy.org/

• Literature mining
  – Duck et al, Ambiguity and variability of database and software names in bioinformatics (2015) DOI: 10.1186/s13326-015-0026-0
• Infrastructure
  – Identifier and provenance infrastructure, dependency managers, metrics services, repositories, machine readable and processable metadata, reference managers
• CRediT – contributor taxonomy
  – http://casrai.org/CRediT
  – Time for revision?

http://mdc.lagotto.io/

Page 44

http://ivory.idyll.org/blog/2015-authorship-on-software-papers.html

Page 45

Find | Cite | Credit
Ramps “Riding the metadata COTS-tails”

• 3rd of web pages
• Opening out -> community groups and extensions
• Builds on a shared core and data structure
• Simple embedding in web pages and CMS
• Widespread tooling, harvesters and indexing
• Search engines and Integration tools
• It’s all about the metadata and knowledge graph

Google, Bing, Yahoo, Yandex

Page 46

Find | Cite | Credit
Ramps “Riding the metadata COTS-tails”

Depth

DATS

Reach

Page 47

http://codemeta.github.io/

http://ontosoft.org/

Find | Cite | Credit Ramps “Riding the metadata COTS-tails”

Reach

Depth

Page 48

Bioschemas.org

Specification

Data model

Minimum information

Controlled vocabularies

Cardinality

Documentation

Examples

New (properties | types)

Restrictions

Constraints

Extensions

Page 49

BioSchemas.org
minimal, maximal, extensible

Training materials, Events, Organizations, Data, Standards, Software

Minimum information for one content type

Training materials, Events, Organizations, Data, Software, Standards

Common properties among content types

Identifier, Title, Description, Author, Topics, Audience, Publication Date, …

Page 50

Schema.org
BioSchemas.org, W3C FHIR WG

Daniel Mietchen et al , Adapting JATS to support data citation, Journal Article Tag Suite Conference (JATS-Con) Proceedings 2015, Bethesda (MD): National Center for Biotechnology Information 2015.

Journal Article Tag Suite

DATS

SoftwareSourceCode
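
As an illustration of "simple embedding in web pages", the sketch below generates a schema.org `SoftwareSourceCode` description as JSON-LD wrapped in a script tag that could be placed on a software landing page. The properties used (`name`, `codeRepository`, `programmingLanguage`, `author`) are schema.org terms; the function name and the example tool, repository URL and authors are hypothetical.

```python
import json


def software_jsonld(name, repository_url, language, authors):
    """Return an HTML script element carrying schema.org JSON-LD markup."""
    description = {
        "@context": "https://schema.org",
        "@type": "SoftwareSourceCode",
        "name": name,
        "codeRepository": repository_url,
        "programmingLanguage": language,
        "author": [{"@type": "Person", "name": person} for person in authors],
    }
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(description, indent=2)
        + "\n</script>"
    )


# Hypothetical software entry, for illustration only.
snippet = software_jsonld(
    name="example-tool",
    repository_url="https://example.org/example-tool",
    language="Python",
    authors=["Alice Example", "Bob Example"],
)
print(snippet)
```

The same dictionary could be extended with domain-specific properties, which is the "ramp" from a shared core towards richer profiles.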

Page 51

Open Questions?

• Stretch in all directions
  – Granularity, Atomicity, Aggregation
  – Only partially automatable
• Dynamic Citation
  – “Citable Units”
  – Buneman et al, https://tinyurl.com/bdf-cacm
• ROs & Contriponents
  – Standardised metadata manifests
  – Tracking fabrics
  – Distributed => will break
• Keep it simple
  – Incremental, Commodity based, Low Tech
  – Guidelines & Conventions
  – Ramps – like Bioschemas.org
  – Capture metadata all along the way….

Getting folks (authors, reviewers, editors) to cite software and data

Page 52

For Further Information
• http://www.researchobject.org
• http://www.wf4ever-project.org
• http://www.fair-dom.org
• http://seek4science.org
• http://www.software.ac.uk
• http://www.bioschemas.org
• http://codemeta.github.io/
• http://myexperiment.org
• http://www.commonwl.org/

Page 53

EXTRAS

unshown