does it make sense to apply the fair data principles to ... · • combine the principles of core...

27
www.dans.knaw.nl DANS is an institute of KNAW and NWO Does it make sense to apply the FAIR Data Principles to Software? Peter Doorn, Director DANS Sustainable Software Sustainability Workshop The Hague, March 7-9, 2017 @pkdoorn @dansknaw

Upload: others

Post on 03-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

www.dans.knaw.nl DANS is an institute of KNAW and NWO

Does it make sense to apply the FAIR Data Principles to Software?

Peter Doorn, Director DANS

Sustainable Software Sustainability Workshop The Hague, March 7-9, 2017

@pkdoorn @dansknaw

Page 2: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

Sustaining software

Preserving the code

Keep the software running

Sustain the service

Software Archive

Keep or emulate a platform for the

software to run on

Keep the knowledge and support for users

Costs & Complexity _

+

Page 3: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

Institute of Dutch Academy and Research

Funding Organisation

(KNAW & NWO) since 2005

First predecessor dates back to

1964 (Steinmetz Foundation),

Historical Data Archive 1989

Mission: promote and

provide permanent

access to digital research resources

DANS is about keeping data FAIR

Page 4: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

1. Our customers ask us 2. Research Infrastructures demand it (eg. CLARIN) 3. Replication of results requires both data and the tools that

process them 4. Information about data is often encapsulated in software 5. Linking data and other research resources (publications,

project information - NARCIS) 6. Software as a special form of data (that is executable) 7. SoSu as required element in Research Data Management

Why do we care about software?

CLARIN: “Centres take care for the sustainability of data and tools and the permanent access to them”

Page 5: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

Work on SoSu to which DANS contributed

• TDS Curator project (2010-2012): use case to “curate” the software and data of the “Typological Database System”: https://goo.gl/uTzRhH

• Workshops on Software Sustainability (eHumanities/KNAW, Amsterdam, 2013; Knowledge Exchange, Berlin, 2015; NCDD, Amsterdam, 2016)

• Reports on SoSu: https://goo.gl/BG69Au – Software sustainability at the Heart of Discovery (with Netherlands

eScience Center) – A Conceptual Approach to Data Stewardship and Software Sustainability – Software Sustainability - final report (commissioned by NCDD and NDE) – Guidelines for Software Quality (with Radboud University Nijmegen,

Huygens Institute, CLARIAH): https://goo.gl/HLi4nD – Research Software Sustainability: Report on Knowledge Exchange

workshop by Simon Hettrick, SSI (Berlin, October 2015): https://goo.gl/0Dm4kQ

• Webinar on SoSu: https://goo.gl/ShmyZH • Memorandum of Understanding, INRIA, Software Heritage Archive

Page 6: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

DANS and Data Seal of Approval (DSA)

• 2005: Formulate quality guidelines for digital repositories

including DANS

• 2006: 5 basic principles as basis for 16 DSA guidelines

• 2009: international DSA Board

• Almost 70 seals acquired around the globe, but with a focus on Europe

Page 7: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

The Certification Pyramid

ISO 16363:2012 - Audit and certification of trustworthy digital repositories http://www.iso16363.org/

DIN 31644 standard “Criteria for trustworthy digital archives” http://www.langzeitarchivierung.de

http://www.datasealofapproval.org/ https://www.icsu-wds.org/

Page 8: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

Partnership DSA and World Data System (WDS)

New common requirements

• Context (1)

• Organizational infrastructure (6) • Digital object management (8) • Technology (2)

• Additional information and applicant

feedback (1) http://goo.gl/LLZqis

Page 9: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

Repository Requirements dealing with data quality

2. The repository maintains all applicable licenses covering data access and use and monitors compliance.

3. The repository has a continuity plan to ensure ongoing access to and preservation of its holdings.

4. The repository ensures, to the extent possible, that data are created, curated, accessed, and used in compliance with disciplinary and ethical norms.

7. The repository guarantees the integrity and authenticity of the data. 8. The repository accepts data and metadata based on defined criteria to ensure

relevance and understandability for data users. 10.The repository assumes responsibility for long-term preservation and manages

this function in a planned and documented way. 11.The repository has appropriate expertise to address technical data and

metadata quality and ensures that sufficient information is available for end users to make quality-related evaluations.

13.The repository enables users to discover the data and refer to them in a persistent way through proper citation.

14.The repository enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data

Page 10: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

Resemblance DSA – FAIR principles

DSA Principles (for data repositories) FAIR Principles (for data sets)

data can be found on the internet Findable

data are accessible Accessible

data are in a usable format Interoperable

data are reliable Reusable

data can be referred to (citable)

The resemblance is not perfect: • usable format (DSA) is an aspect of interoperability (FAIR) • FAIR explicitly addresses machine readability • etc.

A certified TDR already offers a baseline data quality level

Page 11: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

Combine and operationalize: DSA & FAIR

• Growing demand for quality criteria for research datasets and

ways to assess their fitness for use

• Combine the principles of core repository certification and FAIR for individual datasets

• Use the principles as quality criteria: • Core certification – digital repositories • FAIR – research data (sets)

• Operationalize the principles as an instrument to assess

FAIRness of existing datasets in certified TDRs

Page 12: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

FAIR badge scheme

• First Badge System based on the FAIR principles: proxy for data quality assessment

• Operationalise the original principles to ensure no interactions among dimensions to ease scoring

• Consider Reusability as the resultant of the other three:

– the average FAIRness as an indicator of data quality

– (F+A+I)/3=R • Manual and automatic scoring

F A I R 2 User Reviews 1 Archivist Assessment 24 Downloads

Page 13: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

Findable (defined by metadata (PID included) and documentation) 1. No PID nor metadata/documentation 2. PID without or with insufficient metadata 3. Sufficient/limited metadata without PID 4. PID with sufficient metadata 5. Extensive metadata and rich additional documentation available Accessible (defined by presence of user license) 1. Metadata nor data are accessible 2. Metadata are accessible but data is not accessible (no clear terms of reuse in

license) 3. User restrictions apply (i.e. privacy, commercial interests, embargo period) 4. Public access (after registration) 5. Open access unrestricted Interoperable (defined by data format) 1. Proprietary (privately owned), non-open format data 2. Proprietary format, accepted by Certified Trustworthy Data Repository 3. Non-proprietary, open format = ‘preferred format’ 4. As well as in the preferred format, data is standardised using a standard

vocabulary format (for the research field to which the data pertain) 5. Data additionally linked to other data to provide context

Page 14: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

Can we use DSA and FAIR for software?

• Neil will speak about a “Software Seal of Approval” on Thursday morning

• I will focus on FAIR principles for Software • Why use the FAIR data principles and not use existing criteria

dedicated to software quality? • Do the FAIR principles cover the right aspects? • Do they cover enough ground? • How to operationalize and implement them? • How to proceed?

Page 15: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

Why use the FAIR data principles and not use existing criteria dedicated for software quality?

• Simplicity: the FAIR principles are simple • Popularity: everybody seems to like them • Software as a special kind of data • Extend Data Management Planning with

requirements about software • Political: research funders are embracing

the FAIR principles • Operationalization and Implementation are

not fixed

“Specify what methods or software tools are needed to access the data? Is documentation about the software needed to access the data included? Is it possible to include the relevant software (e.g. in open source code)?” https://goo.gl/7Jqqs5

Page 16: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

In the FAIR Data approach, data should be:

• Findable – Easy to find by both humans and computer systems and based on mandatory description of the metadata that allow the discovery of interesting datasets;

• Accessible – Stored for long term such that they can be easily accessed and/or downloaded with well-defined license and access conditions (Open Access when possible), whether at the level of metadata, or at the level of the actual data content;

• Interoperable – Ready to be combined with other datasets by humans as well as computer systems;

• Reusable – Ready to be used for future research and to be processed further using computational methods.

https://www.dtls.nl/fair-data/

Page 17: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

Approaches to software quality in the context of sustainability • Software Quality Management:

derived or extracted from ISO 9126-3 and ISO 25000:2005 quality model (SQuaRE).

• Consortium for IT Software Quality (CISQ) has defined five major desirable structural characteristics needed for a piece of software to provide business value: Reliability, Efficiency, Security, Maintainability and (adequate) Size.

• Maintainability of software is only one of the quality dimensions. It includes concepts of modularity, understandability, changeability, testability, reusability, and transferability from one development team to another. https://en.wikipedia.org/w/index.php?curid=32370588

Page 18: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

SSI “maintainability checklist”

• Can I find the code that is related to a specific problem or change?

• Can I understand the code? Can I explain the rationale behind it to someone else?

• Is it easy to change the code? Is it easy for me to determine what I need to change as a consequence? Are the number and magnitude of such knock-on changes small?

• Can I quickly verify a change (preferably in isolation)? • Can I make a change with only a low risk of breaking

existing features? • If I do break something, is it quick and easy to detect and

diagnose the problem? https://www.software.ac.uk/developing-maintainable-software

Page 19: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

Criteria for evaluating Software Sustainability

https://goo.gl/9YRM01

https://goo.gl/HLi4nD

Page 20: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

From SSI and CLARIAH Criteria to FAIR: Usability

CLARIAH Number CLARIAH Criterion SSI Criterion Explanation FAIR

letter SSI

Criteria CLARIAH Criteria MoSCoW Repository

/ Software

5 Usability 73 42

5.1 Understandability Understandability Is the software easily understood? F1 11 6 M S

5.2 Documentation Documentation Comprehensive well-structured documentation? F2 25 12 M S

5.4 Buildability Buildability Straightforward to build from source on a supported system? ? 11 4 W S

5.5 Installability Installability Straightforward to install and deploy on a supported system? ? 19 10 W S

5.3 Learnability Learnability Easy/intuitive to learn how to use its functions? R1 7 5 C S

5.6 Performance - Does the software perform well? R2 - 5 C S

Page 21: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

From SSI and CLARIAH Criteria to FAIR: Sustainability 1

CLARIAH Number

CLARIAH Criterion SSI Criterion Explanation FAIR

letter SSI

Criteria CLARIAH Criteria MoSCoW Repository

/ Software

6 Sustainability & Manageability 130 45

6.1 Identity Identity Project/software identity is clear and unique? F3 8 3 M R / S

6.2 Copyright & Licensing Copyright Easy to see who owns the project/software? A1 7 3 M R / S

- Licencing Adoption of appropriate licence? A2 5 - (M) R / S

6.14 Governance Governance Easy to understand how the project is run and the development of the software managed? R? 2 ? W ?

6.4 Community Community Evidence of current/future community? R3 11 3 ? ?

6.3 Accessibility Accessibility Evidence of current/future ability to download? A3 12 7 M R / S

6.5 Testability Testability Easy to test correctness of source code? R4 19 4 S S

6.6 Portability Portability Usable on multiple platforms? I1 17* 3 C S

Page 22: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

CLARIAH Number CLARIAH Criterion SSI Criterion Explanation FAIR

letter SSI

Criteria CLARIAH Criteria MoSCoW Repository /

Software

6.7 Supportability Supportability Evidence of current/future developer support? R5 21 2 W R / S

6.8 Analysability** Analysability** Easy to understand at the source level? F4 20 8 M** S

6.9 Changeability Changeability Easy to modify and contribute changes to developers? I2 14 6 W S

6.12 Interoperability Evolvability Evidence of current/future development? R6 5 1 W R / S

6.12 - Interoperability Interoperable with other required/related software? I3 6 - S R / S

6.13 Interoperability for community (CLARIAH) -

Does the software comply to requirements for integration into the community (CLARIAH) infrastructure

I4 - ? C R / S

6.10 Reusability - To what extend is the software reusable? R7 - 3 W*** R / S

6.11 Security & Privacy - Are security and privacy dealt with adequately? R? - 2 S R / S

From SSI and CLARIAH Criteria to FAIR: Sustainability 2

* Several PC/Mac platforms are mentioned, no platforms for mobile devices ** Combine with understandability/documentation *** Is defined by all the other criteria

Page 23: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

CLARIAH’s Minimal Requirements for 4 Configurations

No Minimal Community Requirement (CLARIAH) 1: Actively

Supported End User Software

2: Unsupported End User Software

3: Actively Supported Experimental

Software

4: Unsupported Experimental

Software

1 version control system 3 3 3 3 2 one command that installs all dependencies 1 1 1 1 3 built with one command 4 4 4 4 4 one command that will perform the installation 1 1 - - 5 run in the foreground 1 1 - - 6 runtime dependencies 1 1 - - 7 one uninstall script 1 1 - - 8 installation exceptions - - 1 1 9 README file 20 19 13 13

10 documentation must be stored alongside the code 1 1 1 1 11 documentation must be also stored in an archivable format 1 1 1 1 12 website 6 6 - - 13 user interface of the application should point to the website 1 1 - - 14 user interface should point to the contact information 1 - 1 - 15 API must be discoverable 2 2 - - 16 configuration containing all available options 1 1 - - 17 distributed under an OSI approved licence 2 2 2 2 18 CONTRIBUTING file 1 - 1 - 19 not store usernames and passwords 1 1 - - 20 experimental nature must be clear - - 1 1 21 requirements must be distributed alongside the code 1 1 1 1

50 47 30 28

Page 24: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

Scoring the criteria: binary? 5.2 Documentation (Y=1; N=0) No Yes D1 Is the software documented? 0 1 D2 Is the documentation accessible? 0 1 D3 Is the documentation clear? 0 1 D4 Is the documentation complete? 0 1 D5 Is the documentation accurate? 0 1

D6 Does the documentation provide a high level overview of the software? 0 1

D7 Are all the necessary audiences addressed, at their appropriate levels? 0 1

D8 Does the documentation make use of adequate examples? 0 1

D9 Is there troubleshooting information? 0 1

D10 Is the documentation available from the project website? 0 1

D11 Is the documentation under version control? 0 1

D12 Does the documentation describe the latest version? 0 1

CLARIAH

SSI

Page 25: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

Or score on a 5-point scale?

5.2 Documentation (Low/Bad/No = 1; High/Very/Yes = 5) Low/Bad/No Moderate High/Very/Yes

D1 How well is the software documented? 0 1 2 3 4

D2 How well is the documentation accessible? 0 1 2 3 4

D3 How clear is the documentation? 0 1 2 3 4

D4 How complete Is the documentation? 0 1 2 3 4

D5 How accurate the documentation? 0 1 2 3 4

D6 Does the documentation provide a high level overview of the software? 0 1 2 3 4

D7 How well are all the necessary audiences addressed, at their appropriate levels? 0 1 2 3 4

D8 How adequately does the documentation make use of examples? 0 1 2 3 4

D9 Is there troubleshooting information? 0 - - - 4

D10 Is the documentation available from the project website? 0 - - - 4

D11 Is the documentation under version control? 0 - - - 4

D12 Does the documentation describe the latest version? 0 - - - 4

Page 26: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

FAIR criteria x MoSCoW

M S C W ? Total F 4 4 A 2 2 I 1 2 1 4 R 2 2 4 1 9 ? 2 2

Total 6 3 4 7 1 21

Required? Additional?

Remember KISS!

Page 27: Does it make sense to apply the FAIR Data Principles to ... · • Combine the principles of core repository certification and FAIR for individual datasets • Use the principles

Thank you for listening!

[email protected] iwww.dans.knaw.nl http://www.dtls.nl/go-fair/ Webinar Doorn & Dillo on FAIR & DSA: https://eudat.eu/events/webinar/fair-data-in-trustworthy-data-repositories-webinar