Post on 03-Aug-2020
www.dans.knaw.nl DANS is an institute of KNAW and NWO
Does it make sense to apply the FAIR Data Principles to Software?
Peter Doorn, Director DANS
Sustainable Software Sustainability Workshop The Hague, March 7-9, 2017
@pkdoorn @dansknaw
Sustaining software
• Preserving the code: Software Archive
• Keep the software running: Keep or emulate a platform for the software to run on
• Sustain the service: Keep the knowledge and support for users
Costs & Complexity: − to +
Institute of Dutch Academy and Research Funding Organisation (KNAW & NWO) since 2005
First predecessor dates back to 1964 (Steinmetz Foundation), Historical Data Archive 1989
Mission: promote and provide permanent access to digital research resources
DANS is about keeping data FAIR
Why do we care about software?
1. Our customers ask us
2. Research Infrastructures demand it (e.g. CLARIN)
3. Replication of results requires both data and the tools that process them
4. Information about data is often encapsulated in software
5. Linking data and other research resources (publications, project information - NARCIS)
6. Software as a special form of data (that is executable)
7. SoSu as required element in Research Data Management
CLARIN: “Centres take care for the sustainability of data and tools and the permanent access to them”
Work on SoSu to which DANS contributed
• TDS Curator project (2010-2012): use case to “curate” the software and data of the “Typological Database System”: https://goo.gl/uTzRhH
• Workshops on Software Sustainability (eHumanities/KNAW, Amsterdam, 2013; Knowledge Exchange, Berlin, 2015; NCDD, Amsterdam, 2016)
• Reports on SoSu: https://goo.gl/BG69Au
  – Software sustainability at the Heart of Discovery (with Netherlands eScience Center)
  – A Conceptual Approach to Data Stewardship and Software Sustainability
  – Software Sustainability - final report (commissioned by NCDD and NDE)
  – Guidelines for Software Quality (with Radboud University Nijmegen, Huygens Institute, CLARIAH): https://goo.gl/HLi4nD
  – Research Software Sustainability: Report on Knowledge Exchange workshop by Simon Hettrick, SSI (Berlin, October 2015): https://goo.gl/0Dm4kQ
• Webinar on SoSu: https://goo.gl/ShmyZH
• Memorandum of Understanding, INRIA, Software Heritage Archive
DANS and Data Seal of Approval (DSA)
• 2005: Formulate quality guidelines for digital repositories including DANS
• 2006: 5 basic principles as basis for 16 DSA guidelines
• 2009: international DSA Board
• Almost 70 seals acquired around the globe, but with a focus on Europe
The Certification Pyramid
ISO 16363:2012 - Audit and certification of trustworthy digital repositories http://www.iso16363.org/
DIN 31644 standard “Criteria for trustworthy digital archives” http://www.langzeitarchivierung.de
http://www.datasealofapproval.org/ https://www.icsu-wds.org/
Partnership DSA and World Data System (WDS)
New common requirements
• Context (1)
• Organizational infrastructure (6)
• Digital object management (8)
• Technology (2)
• Additional information and applicant feedback (1)
http://goo.gl/LLZqis
Repository Requirements dealing with data quality
2. The repository maintains all applicable licenses covering data access and use and monitors compliance.
3. The repository has a continuity plan to ensure ongoing access to and preservation of its holdings.
4. The repository ensures, to the extent possible, that data are created, curated, accessed, and used in compliance with disciplinary and ethical norms.
7. The repository guarantees the integrity and authenticity of the data.
8. The repository accepts data and metadata based on defined criteria to ensure relevance and understandability for data users.
10. The repository assumes responsibility for long-term preservation and manages this function in a planned and documented way.
11. The repository has appropriate expertise to address technical data and metadata quality and ensures that sufficient information is available for end users to make quality-related evaluations.
13. The repository enables users to discover the data and refer to them in a persistent way through proper citation.
14. The repository enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data.
Resemblance DSA – FAIR principles
DSA Principles (for data repositories) | FAIR Principles (for data sets)
data can be found on the internet | Findable
data are accessible | Accessible
data are in a usable format | Interoperable
data are reliable | Reusable
data can be referred to (citable) |
The resemblance is not perfect:
• usable format (DSA) is an aspect of interoperability (FAIR)
• FAIR explicitly addresses machine readability
• etc.
A certified TDR already offers a baseline data quality level
Combine and operationalize: DSA & FAIR
• Growing demand for quality criteria for research datasets and ways to assess their fitness for use
• Combine the principles of core repository certification and FAIR for individual datasets
• Use the principles as quality criteria:
  – Core certification – digital repositories
  – FAIR – research data (sets)
• Operationalize the principles as an instrument to assess FAIRness of existing datasets in certified TDRs
FAIR badge scheme
• First Badge System based on the FAIR principles: proxy for data quality assessment
• Operationalise the original principles to ensure no interactions among dimensions to ease scoring
• Consider Reusability as the resultant of the other three:
  – the average FAIRness as an indicator of data quality
  – (F+A+I)/3=R
• Manual and automatic scoring
[Badge mock-up: F, A, I, R scores; 2 User Reviews, 1 Archivist Assessment, 24 Downloads]
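The rule (F+A+I)/3=R can be illustrated with a minimal sketch. The function name and the range check are assumptions made for illustration; the slide only gives the formula and the 1-5 levels.

```python
def reusability_score(findable: int, accessible: int, interoperable: int) -> float:
    """Return R as the average of the F, A and I scores (each on the 1-5 scale)."""
    for name, value in (("F", findable), ("A", accessible), ("I", interoperable)):
        # Assumption: scores outside the five badge levels are rejected.
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return (findable + accessible + interoperable) / 3


# Example: F=4, A=5, I=3 gives R = 12 / 3
print(reusability_score(4, 5, 3))  # 4.0
```

Because R is derived, only three dimensions need to be scored by hand or by machine.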
Findable (defined by metadata (PID included) and documentation)
1. No PID nor metadata/documentation
2. PID without or with insufficient metadata
3. Sufficient/limited metadata without PID
4. PID with sufficient metadata
5. Extensive metadata and rich additional documentation available
Accessible (defined by presence of user license)
1. Neither metadata nor data are accessible
2. Metadata are accessible but data is not accessible (no clear terms of reuse in license)
3. User restrictions apply (i.e. privacy, commercial interests, embargo period)
4. Public access (after registration)
5. Open access unrestricted
Interoperable (defined by data format)
1. Proprietary (privately owned), non-open format data
2. Proprietary format, accepted by Certified Trustworthy Data Repository
3. Non-proprietary, open format = ‘preferred format’
4. As well as in the preferred format, data is standardised using a standard vocabulary format (for the research field to which the data pertain)
5. Data additionally linked to other data to provide context
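As a rough sketch of how the five levels per dimension could be held in a machine-readable form, the lookup below maps a dimension letter and a score to its description. The names `RUBRIC` and `describe` are hypothetical, and the level texts are shortened from the lists above.

```python
# Hypothetical lookup table; level texts are abbreviated from the slide.
RUBRIC = {
    "F": [
        "No PID nor metadata/documentation",
        "PID without or with insufficient metadata",
        "Sufficient/limited metadata without PID",
        "PID with sufficient metadata",
        "Extensive metadata and rich additional documentation available",
    ],
    "A": [
        "Neither metadata nor data are accessible",
        "Metadata are accessible but data is not accessible",
        "User restrictions apply",
        "Public access (after registration)",
        "Open access unrestricted",
    ],
    "I": [
        "Proprietary, non-open format data",
        "Proprietary format, accepted by Certified Trustworthy Data Repository",
        "Non-proprietary, open format (preferred format)",
        "Preferred format plus a standard vocabulary for the research field",
        "Data additionally linked to other data to provide context",
    ],
}


def describe(dimension: str, score: int) -> str:
    """Return the level description for a dimension ('F', 'A' or 'I') and a 1-5 score."""
    levels = RUBRIC[dimension]
    if not 1 <= score <= len(levels):
        raise ValueError(f"score must be between 1 and {len(levels)}")
    return levels[score - 1]


print(describe("F", 4))  # PID with sufficient metadata
```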
Can we use DSA and FAIR for software?
• Neil will speak about a “Software Seal of Approval” on Thursday morning
• I will focus on FAIR principles for Software
• Why use the FAIR data principles and not use existing criteria dedicated to software quality?
• Do the FAIR principles cover the right aspects?
• Do they cover enough ground?
• How to operationalize and implement them?
• How to proceed?
Why use the FAIR data principles and not use existing criteria dedicated to software quality?
• Simplicity: the FAIR principles are simple
• Popularity: everybody seems to like them
• Software as a special kind of data
• Extend Data Management Planning with requirements about software
• Political: research funders are embracing the FAIR principles
• Operationalization and Implementation are not fixed
“Specify what methods or software tools are needed to access the data? Is documentation about the software needed to access the data included? Is it possible to include the relevant software (e.g. in open source code)?” https://goo.gl/7Jqqs5
In the FAIR Data approach, data should be:
• Findable – Easy to find by both humans and computer systems and based on mandatory description of the metadata that allow the discovery of interesting datasets;
• Accessible – Stored for long term such that they can be easily accessed and/or downloaded with well-defined license and access conditions (Open Access when possible), whether at the level of metadata, or at the level of the actual data content;
• Interoperable – Ready to be combined with other datasets by humans as well as computer systems;
• Reusable – Ready to be used for future research and to be processed further using computational methods.
https://www.dtls.nl/fair-data/
Approaches to software quality in the context of sustainability
• Software Quality Management: derived or extracted from ISO 9126-3 and ISO 25000:2005 quality model (SQuaRE).
• Consortium for IT Software Quality (CISQ) has defined five major desirable structural characteristics needed for a piece of software to provide business value: Reliability, Efficiency, Security, Maintainability and (adequate) Size.
• Maintainability of software is only one of the quality dimensions. It includes concepts of modularity, understandability, changeability, testability, reusability, and transferability from one development team to another. https://en.wikipedia.org/w/index.php?curid=32370588
SSI “maintainability checklist”
• Can I find the code that is related to a specific problem or change?
• Can I understand the code? Can I explain the rationale behind it to someone else?
• Is it easy to change the code? Is it easy for me to determine what I need to change as a consequence? Are the number and magnitude of such knock-on changes small?
• Can I quickly verify a change (preferably in isolation)?
• Can I make a change with only a low risk of breaking existing features?
• If I do break something, is it quick and easy to detect and diagnose the problem?
https://www.software.ac.uk/developing-maintainable-software
Criteria for evaluating Software Sustainability
https://goo.gl/9YRM01
https://goo.gl/HLi4nD
From SSI and CLARIAH Criteria to FAIR: Usability
CLARIAH Number | CLARIAH Criterion | SSI Criterion | Explanation | FAIR letter | SSI Criteria | CLARIAH Criteria | MoSCoW | Repository / Software
5 | Usability | | | | 73 | 42 | |
5.1 | Understandability | Understandability | Is the software easily understood? | F1 | 11 | 6 | M | S
5.2 | Documentation | Documentation | Comprehensive well-structured documentation? | F2 | 25 | 12 | M | S
5.4 | Buildability | Buildability | Straightforward to build from source on a supported system? | ? | 11 | 4 | W | S
5.5 | Installability | Installability | Straightforward to install and deploy on a supported system? | ? | 19 | 10 | W | S
5.3 | Learnability | Learnability | Easy/intuitive to learn how to use its functions? | R1 | 7 | 5 | C | S
5.6 | Performance | - | Does the software perform well? | R2 | - | 5 | C | S
From SSI and CLARIAH Criteria to FAIR: Sustainability 1
CLARIAH Number | CLARIAH Criterion | SSI Criterion | Explanation | FAIR letter | SSI Criteria | CLARIAH Criteria | MoSCoW | Repository / Software
6 | Sustainability & Manageability | | | | 130 | 45 | |
6.1 | Identity | Identity | Project/software identity is clear and unique? | F3 | 8 | 3 | M | R / S
6.2 | Copyright & Licensing | Copyright | Easy to see who owns the project/software? | A1 | 7 | 3 | M | R / S
 | - | Licencing | Adoption of appropriate licence? | A2 | 5 | - | (M) | R / S
6.14 | Governance | Governance | Easy to understand how the project is run and the development of the software managed? | R? | 2 | ? | W | ?
6.4 | Community | Community | Evidence of current/future community? | R3 | 11 | 3 | ? | ?
6.3 | Accessibility | Accessibility | Evidence of current/future ability to download? | A3 | 12 | 7 | M | R / S
6.5 | Testability | Testability | Easy to test correctness of source code? | R4 | 19 | 4 | S | S
6.6 | Portability | Portability | Usable on multiple platforms? | I1 | 17* | 3 | C | S
From SSI and CLARIAH Criteria to FAIR: Sustainability 2
CLARIAH Number | CLARIAH Criterion | SSI Criterion | Explanation | FAIR letter | SSI Criteria | CLARIAH Criteria | MoSCoW | Repository / Software
6.7 | Supportability | Supportability | Evidence of current/future developer support? | R5 | 21 | 2 | W | R / S
6.8 | Analysability** | Analysability** | Easy to understand at the source level? | F4 | 20 | 8 | M** | S
6.9 | Changeability | Changeability | Easy to modify and contribute changes to developers? | I2 | 14 | 6 | W | S
6.12 | Interoperability | Evolvability | Evidence of current/future development? | R6 | 5 | 1 | W | R / S
6.12 | - | Interoperability | Interoperable with other required/related software? | I3 | 6 | - | S | R / S
6.13 | Interoperability for community (CLARIAH) | - | Does the software comply with requirements for integration into the community (CLARIAH) infrastructure? | I4 | - | ? | C | R / S
6.10 | Reusability | - | To what extent is the software reusable? | R7 | - | 3 | W*** | R / S
6.11 | Security & Privacy | - | Are security and privacy dealt with adequately? | R? | - | 2 | S | R / S
* Several PC/Mac platforms are mentioned, no platforms for mobile devices
** Combine with understandability/documentation
*** Is defined by all the other criteria
CLARIAH’s Minimal Requirements for 4 Configurations
No | Minimal Community Requirement (CLARIAH) | 1: Actively Supported End User Software | 2: Unsupported End User Software | 3: Actively Supported Experimental Software | 4: Unsupported Experimental Software
1 | version control system | 3 | 3 | 3 | 3
2 | one command that installs all dependencies | 1 | 1 | 1 | 1
3 | built with one command | 4 | 4 | 4 | 4
4 | one command that will perform the installation | 1 | 1 | - | -
5 | run in the foreground | 1 | 1 | - | -
6 | runtime dependencies | 1 | 1 | - | -
7 | one uninstall script | 1 | 1 | - | -
8 | installation exceptions | - | - | 1 | 1
9 | README file | 20 | 19 | 13 | 13
10 | documentation must be stored alongside the code | 1 | 1 | 1 | 1
11 | documentation must be also stored in an archivable format | 1 | 1 | 1 | 1
12 | website | 6 | 6 | - | -
13 | user interface of the application should point to the website | 1 | 1 | - | -
14 | user interface should point to the contact information | 1 | - | 1 | -
15 | API must be discoverable | 2 | 2 | - | -
16 | configuration containing all available options | 1 | 1 | - | -
17 | distributed under an OSI approved licence | 2 | 2 | 2 | 2
18 | CONTRIBUTING file | 1 | - | 1 | -
19 | not store usernames and passwords | 1 | 1 | - | -
20 | experimental nature must be clear | - | - | 1 | 1
21 | requirements must be distributed alongside the code | 1 | 1 | 1 | 1
 | Total | 50 | 47 | 30 | 28
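A few of these requirements lend themselves to automatic checking. The sketch below is one hypothetical way to test a source tree for requirements 1, 9, 17 and 18. The function name, the file-name prefixes, and the assumption that version control means a Git checkout are all illustrative and not taken from the CLARIAH guidelines; finding a licence file also says nothing about whether the licence is OSI approved.

```python
from pathlib import Path


def check_repository(root: str) -> dict[str, bool]:
    """Check a source tree for a handful of file-based requirements."""
    base = Path(root)
    # Compare names case-insensitively so README.md and Readme.txt both count.
    names = {entry.name.lower() for entry in base.iterdir()}

    def has_prefix(prefix: str) -> bool:
        return any(name.startswith(prefix) for name in names)

    return {
        # Assumption: a .git directory stands in for "version control system".
        "version_control": (base / ".git").is_dir(),
        "readme": has_prefix("readme"),
        "contributing": has_prefix("contributing"),
        # Presence only; the licence text itself is not inspected.
        "licence": has_prefix("license") or has_prefix("licence"),
    }


print(check_repository("."))
```

Requirements such as "built with one command" or "experimental nature must be clear" would still need a human assessor.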
Scoring the criteria: binary?
5.2 Documentation (Y=1; N=0) | No | Yes
D1 Is the software documented? | 0 | 1
D2 Is the documentation accessible? | 0 | 1
D3 Is the documentation clear? | 0 | 1
D4 Is the documentation complete? | 0 | 1
D5 Is the documentation accurate? | 0 | 1
D6 Does the documentation provide a high level overview of the software? | 0 | 1
D7 Are all the necessary audiences addressed, at their appropriate levels? | 0 | 1
D8 Does the documentation make use of adequate examples? | 0 | 1
D9 Is there troubleshooting information? | 0 | 1
D10 Is the documentation available from the project website? | 0 | 1
D11 Is the documentation under version control? | 0 | 1
D12 Does the documentation describe the latest version? | 0 | 1
CLARIAH
SSI
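Binary scoring of the twelve documentation questions amounts to counting the "Yes" answers. A minimal sketch, with a hypothetical function name and made-up example answers:

```python
def binary_score(answers: dict[str, bool]) -> int:
    """Count the questions answered 'Yes' (Y=1; N=0)."""
    return sum(1 for answered_yes in answers.values() if answered_yes)


# Made-up example: everything is 'Yes' except D9 and D11.
answers = {f"D{i}": True for i in range(1, 13)}
answers["D9"] = False
answers["D11"] = False

print(binary_score(answers), "out of", len(answers))  # 10 out of 12
```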
Or score on a 5-point scale?
5.2 Documentation (Low/Bad/No = 1; High/Very/Yes = 5) | Low/Bad/No | | Moderate | | High/Very/Yes
D1 How well is the software documented? | 0 | 1 | 2 | 3 | 4
D2 How well is the documentation accessible? | 0 | 1 | 2 | 3 | 4
D3 How clear is the documentation? | 0 | 1 | 2 | 3 | 4
D4 How complete is the documentation? | 0 | 1 | 2 | 3 | 4
D5 How accurate is the documentation? | 0 | 1 | 2 | 3 | 4
D6 Does the documentation provide a high level overview of the software? | 0 | 1 | 2 | 3 | 4
D7 How well are all the necessary audiences addressed, at their appropriate levels? | 0 | 1 | 2 | 3 | 4
D8 How adequately does the documentation make use of examples? | 0 | 1 | 2 | 3 | 4
D9 Is there troubleshooting information? | 0 | - | - | - | 4
D10 Is the documentation available from the project website? | 0 | - | - | - | 4
D11 Is the documentation under version control? | 0 | - | - | - | 4
D12 Does the documentation describe the latest version? | 0 | - | - | - | 4
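The graded variant can be sketched in the same way. The sketch follows the 0-4 values shown in the table columns (the heading mentions 1 to 5) and treats D9 to D12 as yes/no questions that can only score 0 or 4. Function and constant names, and the example ratings, are assumptions.

```python
# D9-D12 are yes/no questions in the table: only 0 or 4 is allowed.
YES_NO_ITEMS = {"D9", "D10", "D11", "D12"}


def scale_score(ratings: dict[str, int]) -> int:
    """Sum the ratings after checking each one against its allowed values."""
    total = 0
    for item, rating in ratings.items():
        allowed = (0, 4) if item in YES_NO_ITEMS else range(5)
        if rating not in allowed:
            raise ValueError(f"{item}: rating {rating} is not allowed")
        total += rating
    return total


# Made-up example: D1-D8 rated 3, D11 answered 'No', the rest 'Yes'.
ratings = {f"D{i}": 3 for i in range(1, 9)}
ratings.update({"D9": 4, "D10": 4, "D11": 0, "D12": 4})

print(scale_score(ratings), "out of", 4 * len(ratings))  # 36 out of 48
```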
FAIR criteria x MoSCoW
 | M | S | C | W | ? | Total
F | 4 | | | | | 4
A | 2 | | | | | 2
I | | 1 | 2 | 1 | | 4
R | | 2 | 2 | 4 | 1 | 9
? | | | | 2 | | 2
Total | 6 | 3 | 4 | 7 | 1 | 21
Required? Additional?
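The row and column totals of the FAIR x MoSCoW cross-tabulation can be checked with a few lines. The placement of the counts in the cells is an assumption about the slide layout, chosen so that the stated totals come out; empty cells are read as zero.

```python
COLUMNS = ["M", "S", "C", "W", "?"]

# Assumed cell placement; rows are FAIR letters, columns are MoSCoW classes.
MATRIX = {
    "F": [4, 0, 0, 0, 0],
    "A": [2, 0, 0, 0, 0],
    "I": [0, 1, 2, 1, 0],
    "R": [0, 2, 2, 4, 1],
    "?": [0, 0, 0, 2, 0],
}

row_totals = {letter: sum(counts) for letter, counts in MATRIX.items()}
column_totals = dict(zip(COLUMNS, (sum(col) for col in zip(*MATRIX.values()))))
grand_total = sum(row_totals.values())

print(row_totals)     # {'F': 4, 'A': 2, 'I': 4, 'R': 9, '?': 2}
print(column_totals)  # {'M': 6, 'S': 3, 'C': 4, 'W': 7, '?': 1}
print(grand_total)    # 21
```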
Remember KISS!
Thank you for listening!
peter.doorn@dans.knaw.nl
www.dans.knaw.nl
http://www.dtls.nl/go-fair/
Webinar Doorn & Dillo on FAIR & DSA: https://eudat.eu/events/webinar/fair-data-in-trustworthy-data-repositories-webinar