What is Reproducibility? The R* brouhaha (and how Research Objects can help)
TRANSCRIPT
![Page 1: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/1.jpg)
What is Reproducibility?
The R* brouhaha (and how Research Objects can help)
Professor Carole Goble
The University of Manchester, UK
Software Sustainability Institute, UK
ELIXIR-UK, FAIRDOM Association
International Workshop on Reproducible Open Science @ TPDL, 9 Sept 2016, Hannover, Germany
![Page 2: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/2.jpg)
Acknowledgements
• Dagstuhl Seminar 16041, January 2016
  – http://www.dagstuhl.de/en/program/calendar/semhp/?semnr=16041
• ATI Symposium Reproducibility, Sustainability and Preservation, April 2016
  – https://turing.ac.uk/events/reproducibility-sustainability-and-preservation/
  – https://osf.io/bcef5/files/
• C Titus Brown
• Juliana Freire
• David De Roure
• Stian Soiland-Reyes
• Barend Mons
• Tim Clark
• Daniel Garijo
• Norman Morrison
![Page 3: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/3.jpg)
“When I use a word," Humpty Dumpty said in rather a scornful tone, "it means just what I choose it to mean - neither more nor less.”
Carroll, Through the Looking Glass
re-compute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo, remix
robustness, tolerance
verification, compliance, validation, assurance
![Page 4: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/4.jpg)
Reproducibility of Reproducibility Research
![Page 5: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/5.jpg)
Computational Science
http://tpeterka.github.io/maui-project/
From: The Future of Scientific Workflows, Report of DOE Workshop 2015, http://science.energy.gov/~/media/ascr/pdf/programdocuments/docs/workflows_final_report.pd
1. Observational, experimental
2. Theoretical
3. Simulation
4. Data intensive
![Page 6: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/6.jpg)
BioSTIF
Computational Science
![Page 7: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/7.jpg)
Scientific publications have two goals: (i) announce a result and (ii) convince readers it is correct.
Papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension.
Papers in computational science should describe the results and provide the complete software development environment, data and set of instructions which generated the figures.
Virtual Witnessing*
*Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985) Shapin and Schaffer.
Jill Mesirov
David Donoho
![Page 8: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/8.jpg)
Datasets, data collections
Standard operating procedures
Software, algorithms
Configurations, tools and apps, services
SlideShare, GitHub, figshare, Community DB, arXiv.org, PubMed, Docker image
Codes, code libraries
Workflows, scripts
System software
Infrastructure
Compilers, hardware
Systems of Systems: a heterogeneous, hybrid patchwork of tools and services evolving over time
![Page 9: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/9.jpg)
10 “Simple” Rules for Reproducible Computational Research: RACE
1. For Every Result, Keep Track of How It Was Produced
2. Avoid Manual Data Manipulation Steps
3. Archive the Exact Versions of All External Programs Used
4. Version Control All Custom Scripts
5. Record All Intermediate Results, When Possible in Standardized Formats
6. For Analyses That Include Randomness, Note Underlying Random Seeds
7. Always Store Raw Data behind Plots
8. Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
9. Connect Textual Statements to Underlying Results
10. Provide Public Access to Scripts, Runs, and Results
Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285
Record Everything
Automate Everything
Contain Everything
Expose Everything
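Rules 1, 3 and 6 (track how each result was produced, archive exact versions, note random seeds) can be illustrated with a small provenance record written alongside a result. This is a minimal sketch under assumptions, not a tool from the cited paper: `run_analysis` is a hypothetical analysis step, the input bytes are illustrative, and the record fields are one possible choice.

```python
import hashlib
import json
import platform
import random
import sys
from datetime import datetime, timezone


def run_analysis(values, seed, k):
    """Hypothetical analysis step: mean of a seeded random subsample of size k."""
    rng = random.Random(seed)  # private RNG: the seed alone determines the draw
    sample = rng.sample(values, k)
    return sum(sample) / k


def provenance_record(input_bytes, params, seed, result):
    """Collect what is needed to trace how a result was produced."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version.split()[0],  # rule 3, partially: interpreter version
        "platform": platform.platform(),
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),  # which raw data was used
        "parameters": params,
        "random_seed": seed,  # rule 6
        "result": result,
    }


if __name__ == "__main__":
    raw = b"1,2,3,4,5,6,7,8,9,10"  # illustrative input, not real data
    values = [int(v) for v in raw.decode().split(",")]
    params = {"k": 3}
    seed = 42
    result = run_analysis(values, seed, **params)
    print(json.dumps(provenance_record(raw, params, seed, result), indent=2))
```

Because the seed is stored, re-running `run_analysis` with the same input and seed yields the same subsample; a full implementation would also capture versions of every external program, not just the interpreter.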
![Page 10: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/10.jpg)
Preparation pain
Independent testing trials and tribulations
[Norman Morrison]
Replication hostility
No funding, time, recognition or place to publish
Resource intensive
Access to the complete environment
![Page 11: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/11.jpg)
Lab Analogy: Witnessing “Datascopes”
Input Data
Software
Output Data
Config Parameters
Methods: techniques, algorithms, spec. of the steps, models
Materials: datasets, parameters, algorithm seeds
Experiment
Instruments: codes, services, scripts, underlying libraries, workflows, ref resources
Laboratory: sw and hw infrastructure, systems software, integrative platforms, computational environment
Setup
![Page 12: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/12.jpg)
“Micro” Reproducibility
“Macro” Reproducibility
Fixivity
Validate
Verify
Trust
![Page 13: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/13.jpg)
Repeat, Replicate, Robust
[C Titus Brown]
https://2016-oslo-repeatability.readthedocs.org/en/latest/repeatability-discussion.html
Why the differences?
Reproduce, Trust
![Page 14: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/14.jpg)
“An experiment is reproducible until another laboratory tries to repeat it.” Alexander Kohn
Repeatability: “Sameness”; same result; 1 lab; 1 experiment
Reproducibility: “Similarity”; similar result; > 1 lab; > 1 experiment
Validate
Verify
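The distinction between repeatability (“sameness”) and reproducibility (“similarity”) maps onto two different comparison tests on outputs. A minimal sketch, assuming outputs are either raw bytes or lists of floats; the function names and the tolerance values are illustrative choices, not thresholds from the talk.

```python
import hashlib
import math


def repeatable(original: bytes, rerun: bytes) -> bool:
    """'Sameness': outputs are byte-for-byte identical (compared by SHA-256 digest)."""
    return hashlib.sha256(original).digest() == hashlib.sha256(rerun).digest()


def reproducible(original, rerun, rel_tol=1e-6, abs_tol=1e-12):
    """'Similarity': same number of values, each pair equal within tolerance."""
    if len(original) != len(rerun):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(original, rerun)
    )
```

Printed with `repr`, `0.1 + 0.2` and `0.3` differ as bytes, so the byte-level check fails, while `reproducible([0.1 + 0.2], [0.3])` is `True`. Which test applies, and with what tolerance, is a judgement about the experiment rather than something the code can decide.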
![Page 15: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/15.jpg)
Method Reproducibility
the provision of enough detail about study procedures and data so the same procedures could, in theory or in actuality, be exactly repeated.
Result Reproducibility (aka replicability)
obtaining the same results from the conduct of an independent study whose procedures are as closely matched to the original experiment as possible
What does research reproducibility mean? Steven N. Goodman, Daniele Fanelli, John P. A. Ioannidis Science Translational Medicine 8 (341), 341ps12. [doi: 10.1126/scitranslmed.aaf5027] http://stm.sciencemag.org/content/scitransmed/8/341/341ps12.full.pdf
![Page 16: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/16.jpg)
Productivity
Track differences
Validate
Verify
![Page 17: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/17.jpg)
Reviewers want additional work
Statistician wants more runs
Analysis needs to be repeated
Post-doc leaves, student arrives
New/revised datasets
Updated/new versions of algorithms/codes
Sample was contaminated
Better kit, longer simulations
New partners, new projects
Personal & Lab Productivity
Public Good Reproducibility
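Several of the triggers listed above (revised datasets, more runs, new code versions) come down to knowing when an analysis must be re-run. A minimal sketch of that bookkeeping, in the spirit of build tools such as make, assuming inputs are local files and parameters are JSON-serialisable; the stamp-file approach and all function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(paths, params):
    """Combine input file names, their contents and the parameters into one SHA-256 key."""
    h = hashlib.sha256()
    for path in sorted(Path(x) for x in paths):
        h.update(str(path).encode() + b"\0")  # NUL separator: cannot occur in a path
        h.update(hashlib.sha256(path.read_bytes()).digest())  # fixed-length content digest
    h.update(json.dumps(params, sort_keys=True).encode())
    return h.hexdigest()


def needs_rerun(paths, params, stamp_file):
    """True if no previous run is recorded, or inputs/parameters changed since it."""
    stamp = Path(stamp_file)
    return not stamp.exists() or stamp.read_text() != fingerprint(paths, params)


def record_run(paths, params, stamp_file):
    """Store the fingerprint after a successful run."""
    Path(stamp_file).write_text(fingerprint(paths, params))
```

A pipeline would call `needs_rerun` before the expensive step and `record_run` after it succeeds, so a revised dataset or a changed parameter triggers a re-run automatically instead of relying on someone remembering.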
![Page 18: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/18.jpg)
“Datascope” Lab Analogy
Methods: techniques, algorithms, spec. of the steps, models
Materials: datasets, parameters, algorithm seeds
Experiment
Instruments: codes, services, scripts, underlying libraries, workflows, ref datasets
Laboratory: sw and hw infrastructure, systems software, integrative platforms, computational environment
Setup
![Page 19: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/19.jpg)
“Datascope” Lab Analogy
Methods: techniques, algorithms, spec. of the steps, models
Materials: datasets, parameters, algorithm seeds
Experiment
Instruments: codes, services, scripts, underlying libraries, workflows, ref datasets
Laboratory: sw and hw infrastructure, systems software, integrative platforms, computational environment
Setup
Form
Function
![Page 20: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/20.jpg)
“Datascope” Practicalities
Methods: techniques, algorithms, spec. of the steps, models
Materials: datasets, parameters, algorithm seeds
Experiment
Instruments: codes, services, scripts, underlying libraries, workflows, ref datasets
Laboratory: sw and hw infrastructure, systems software, integrative platforms, computational environment
Setup
Living Dependencies
Science, methods, datasets: questions stay, answers change
Breakage, labs decay, services and techniques come and go, new instruments, updated datasets, services, codes, hardware
One-offs, streams, stochastics, sensitivities, scale, non-portable data, black boxes
Supercomputer access, non-portable software, licensing restrictions, unreliable resources, black boxes, complexity
![Page 21: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/21.jpg)
T1 T2
Evolving ref datasets, new simulation codes
Environment
Archived vs Active
Contained vs Distributed
Regimented vs Free-for-all
Who owns the dependencies?
Dependencies -> Manage
Black boxes -> Expose
Dynamics -> Fixity
Reliability
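“Dynamics -> Fixity” can be checked mechanically: record a checksum for every file at time T1, then compare at T2. A minimal sketch, assuming local files; the `<checksum>  <path>` manifest layout borrows the style of BagIt payload manifests, but this is not a complete or validated BagIt implementation, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=65536):
    """Stream a file through SHA-256 so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root, manifest_name="manifest-sha256.txt"):
    """Record '<checksum>  <relative path>' for every file under root (T1)."""
    root = Path(root)
    lines = []
    for p in sorted(root.rglob("*")):
        if p.is_file() and p.name != manifest_name:
            lines.append(f"{sha256_file(p)}  {p.relative_to(root).as_posix()}")
    (root / manifest_name).write_text("\n".join(lines) + "\n")


def verify_manifest(root, manifest_name="manifest-sha256.txt"):
    """Return recorded paths that are now missing or whose checksum changed (T2)."""
    root = Path(root)
    changed = []
    for line in (root / manifest_name).read_text().splitlines():
        if not line.strip():
            continue
        digest, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file() or sha256_file(target) != digest:
            changed.append(rel)
    return changed
```

An empty list from `verify_manifest` means the recorded files are unchanged; a non-empty list pinpoints which dependencies drifted, which is the first step before deciding whether to repair or re-run.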
![Page 22: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/22.jpg)
Replicate harder than Reproduce?
Repeating the experiment or the set up?
Container Conundrum
Results will Vary
Replicability Window
All experiments become less replicable over time
Prepare to repair
![Page 23: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/23.jpg)
Levels of Computational Reproducibility
Coverage: how much of an experiment is reproducible
Coverage levels: Original Experiment; Similar Experiment; Different Experiment (Portability)
Depth: how much of an experiment is available
Binaries + Data
Source Code / Workflow + Data
Binaries + Data + Dependencies
Source Code / Workflow + Data + Dependencies
Virtual Machine: Binaries + Data + Dependencies
Virtual Machine: Source Code / Workflow + Data + Dependencies
Figures + Data
[Freire, 2014]
Minimum: data and source code available under terms that permit inspection and execution.
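The depth levels can be treated as an ordered scale against which a package is checked. A sketch under assumptions: the ordering (Figures + Data lowest, a VM with source code highest) is an interpretation of the slide, the component labels such as `"vm"` and `"dependencies"` are hypothetical, and `meets_minimum` encodes only the stated minimum.

```python
from enum import IntEnum


class Depth(IntEnum):
    """Depth levels from the slide, lowest to highest (the ordering is an assumption)."""
    FIGURES_DATA = 0
    BINARIES_DATA = 1
    SOURCE_DATA = 2
    BINARIES_DATA_DEPS = 3
    SOURCE_DATA_DEPS = 4
    VM_BINARIES_DATA_DEPS = 5
    VM_SOURCE_DATA_DEPS = 6


# Components required for each level, checked from the deepest level down.
REQUIREMENTS = [
    (Depth.VM_SOURCE_DATA_DEPS, {"vm", "source", "data", "dependencies"}),
    (Depth.VM_BINARIES_DATA_DEPS, {"vm", "binaries", "data", "dependencies"}),
    (Depth.SOURCE_DATA_DEPS, {"source", "data", "dependencies"}),
    (Depth.BINARIES_DATA_DEPS, {"binaries", "data", "dependencies"}),
    (Depth.SOURCE_DATA, {"source", "data"}),
    (Depth.BINARIES_DATA, {"binaries", "data"}),
    (Depth.FIGURES_DATA, {"figures", "data"}),
]


def classify(available):
    """Return the deepest level whose required components are all available, else None."""
    for level, required in REQUIREMENTS:
        if required <= set(available):
            return level
    return None


def meets_minimum(available, permits_inspection_and_execution):
    """Stated minimum: data and source code, under terms permitting inspection and execution."""
    return {"source", "data"} <= set(available) and permits_inspection_and_execution
```

For example, `classify({"vm", "binaries", "data", "dependencies"})` gives `Depth.VM_BINARIES_DATA_DEPS`, yet that package fails `meets_minimum` because no source code is included: depth and openness are separate questions.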
![Page 24: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/24.jpg)
Measuring Information Gain from Reproducibility
Components: Research goal; Method/Alg.; Implementation/Code; Platform/Exec Env; Data Parameters; Input data; Actors
Information Gain: Consistency; Robustness/Sensitivity; Generality; Portability/Adoption 1; Portability/Adoption 2; Independent validation; Repurposability
Legend: No change / Change / Don’t care
https://linkingresearch.wordpress.com/2016/02/21/dagstuhl-seminar-report-reproducibility-of-data-oriented-experiments-in-e-scienc/
http://www.dagstuhl.de/16041
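The grid on this slide relates which components of an experiment were changed in a reproduction to the kind of information gained. A minimal sketch of recording that comparison, assuming each component can be summarised as a string; the field names mirror the slide's rows, the example values are hypothetical, and mapping a given pattern of changes to a specific information gain is not attempted here.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ExperimentDescription:
    """One field per component on the slide; values are free-text summaries (hypothetical)."""
    research_goal: str
    method: str
    implementation: str
    platform: str
    data_parameters: str
    input_data: str
    actors: str


def changed_components(original, reproduction, dont_care=()):
    """List components that differ, skipping any marked as 'don't care'."""
    return [
        f.name
        for f in fields(ExperimentDescription)
        if f.name not in dont_care
        and getattr(original, f.name) != getattr(reproduction, f.name)
    ]


if __name__ == "__main__":
    original = ExperimentDescription(
        "goal", "method A", "code v1", "Linux cluster", "k=3", "dataset v1", "lab 1"
    )
    # Same code and data, different platform and people.
    rerun = replace(original, platform="laptop", actors="lab 2")
    print(changed_components(original, rerun))  # ['platform', 'actors']
```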
![Page 25: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/25.jpg)
How? Preserve by Reporting, Reproduce by Reading
Archived Record
Description Zoo: standards, common metadata
![Page 26: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/26.jpg)
How? Preserve by Maintaining, Repairing, Containing
Reproduce by Running, Emulating, Reconstructing
Active Instrument
Byte level
Buildability Zoo
![Page 27: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/27.jpg)
provenance
portability, preservation
robustness, versioning
access, description, standards
common APIs, licensing, identifiers
standards, common metadata
change, variation, sensitivity
discrepancy handling
packaging, containers
FAIR RACE Reproducibility Dimensions
dependencies, steps
![Page 28: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/28.jpg)
Research Object: standards-based metadata framework for logically and physically bundling resources with context
http://researchobject.org
Bigger on the inside than the outside: external referencing
![Page 29: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/29.jpg)
Manifest Construction
Aggregates: link things together
Annotations: about things & their relationships
Container
Research Object: standards-based metadata framework for logically and physically bundling resources with context, http://researchobject.org
Packaging content & links: Zip files, BagIt, Docker images
Catalogues & Commons Platforms: FAIRDOM
Manifest Description
Dependencies: what else is needed
Versioning: its evolution
Checklists: what should be there
Provenance: where it came from
Identification: locate things regardless where
id
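The manifest, aggregates and annotations structure can be sketched as a ZIP package. This is a simplified illustration loosely modelled on the Research Object Bundle layout, under assumptions: a `mimetype` entry carrying `application/vnd.wf4ever.robundle+zip` is written first and uncompressed, and the manifest sits at `.ro/manifest.json`. The manifest fields are a reduced, assumed subset; the real specification defines a JSON-LD context and further properties, so check it before producing bundles for other tools. `write_bundle` and `read_manifest` are hypothetical names.

```python
import json
import zipfile

MEDIA_TYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(path, files, annotations=None):
    """Write a simplified RO-style bundle.

    files: dict of bundle-relative path -> bytes (the aggregated resources)
    annotations: dict of bundle-relative path -> annotation text about that resource
    """
    notes = sorted((annotations or {}).items())
    manifest = {
        # Simplified, assumed subset of manifest fields; no JSON-LD context is written.
        "aggregates": [{"uri": "/" + name} for name in sorted(files)],
        "annotations": [
            {"about": "/" + about, "content": f"/annotations/{i}.txt"}
            for i, (about, _) in enumerate(notes)
        ],
    }
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Media type first and uncompressed, so tools can identify the package type.
        zf.writestr("mimetype", MEDIA_TYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for name in sorted(files):
            zf.writestr(name, files[name])
        for i, (_, text) in enumerate(notes):
            zf.writestr(f"annotations/{i}.txt", text)


def read_manifest(path):
    """Return the first entry name and the parsed manifest."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()[0], json.loads(zf.read(".ro/manifest.json"))
```

Keeping the description in a manifest, separate from the aggregated files, is what lets the same bundle also list external resources by URI, the “bigger on the inside than the outside” idea.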
![Page 30: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/30.jpg)
Systems Biology Commons
• Link data, models and SOPs
• Standards
• Span resources
• Snapshot + DOIs
• Bundle and export
• Logical bundles
![Page 31: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/31.jpg)
Belhajjame et al (2015) Using a suite of ontologies for preserving workflow-centric research objects, J Web Semantics doi:10.1016/j.websem.2015.01.003
application/vnd.wf4ever.robundle+zip
Workflow Research Objects exchange, portability and maintenance
*https://2016-oslo-repeatability.readthedocs.org/en/latest/overview-and-agenda.html
![Page 32: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/32.jpg)
Asthma Research e-Lab
Dataset building and releasing
Standardised packing of Systems Biology models
European Space Agency RO Library
Large dataset management for life science workflows
LHC ATLAS experiments
Notre Dame
U Rostock
Encyclopedia of DNA Elements
PeptideAtlas
![Page 33: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/33.jpg)
Words matter.
Reproducibility is not an end. It's a means to an end.
Beware reproducibility zealots.
50 Shades of Reproducibility: form vs function.
A conundrum: big co-operative data-driven science makes reproducibility desirable, but also means dependency and change are to be expected.
Lab analogy for computational science
![Page 34: What is Reproducibility? The R* brouhaha (and how Research Objects can help)](https://reader036.vdocuments.site/reader036/viewer/2022081507/5876938d1a28abab2f8b6049/html5/thumbnails/34.jpg)
Bonus Slides