Being Reproducible: SSBSS Summer School 2017



Page 1: Being Reproducible: SSBSS Summer School 2017

Being Reproducible: Models, Research Objects and R* Brouhaha

Professor Carole Goble, [email protected]

The University of Manchester, UK
The FAIRDOM Association Coordinator
ELIXIR-UK Head of Node
Co-lead ELIXIR Interoperability Platform

SSBSS 2017, July 17 2017, Cambridge, UK
4th International Synthetic & Systems Biology Summer School

Page 2: Being Reproducible: SSBSS Summer School 2017

Reproducibility Rampancy

Page 3: Being Reproducible: SSBSS Summer School 2017

47/53 “landmark” publications

could not be replicated

[Begley, Ellis Nature, 483, 2012]

Page 4: Being Reproducible: SSBSS Summer School 2017

Retraction

http://www.nature.com/news/misconduct-is-the-main-cause-of-life-sciences-retractions-1.11507

Misconduct is the main cause of life-sciences retractions, Zoë Corbyn, 01 October 2012

Page 5: Being Reproducible: SSBSS Summer School 2017

Vahan Simonyan,

Center for Biologics Evaluation and Research

Food and Drug Administration

USA

Page 6: Being Reproducible: SSBSS Summer School 2017

NIH Rigor and Reproducibility

https://www.nih.gov/research-training/rigor-reproducibility

cos.io/top

http://www.acmedsci.ac.uk/policy/policy-projects/reproducibility-and-reliability-of-biomedical-research/

Page 7: Being Reproducible: SSBSS Summer School 2017

John P. A. Ioannidis How to Make More Published Research True, October 21, 2014 DOI: 10.1371/journal.pmed.1001747

Page 8: Being Reproducible: SSBSS Summer School 2017

Reproducibility of biological experiments is hard, for in vivo/in vitro and for in silico analysis:

• OS version
• Revision of scripts
• Data analysis software versions
• Version of data files
• Command line parameters written on a napkin
• “Black magic” only a grad student knows

Fix with latest technologies, best practices and willingness

[Keiichiro Ono, Scripps Institute]

The first step is to be FAIR. See the whole of the previous talk…
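Most of the items on the list above (OS version, script revision, software versions, data file versions, command line parameters) can be recorded automatically at run time instead of on a napkin or in a grad student's head. A minimal sketch, assuming the analysis is driven by a Python script; the helper names `capture_run_record`, `sha256_of` and `git_revision` are hypothetical, and the script revision is only captured when `git` is installed and the scripts live in a git repository.

```python
import hashlib
import platform
import subprocess
import sys
from importlib import metadata
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Checksum a data file so the exact version used can be identified later."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def git_revision() -> Optional[str]:
    """Return the git commit of the current working tree, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git not installed, or not inside a repository
    return result.stdout.strip()


def capture_run_record(data_files, packages, argv=None) -> dict:
    """Hypothetical helper: OS, software versions, script revision, data checksums, parameters."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # record that the package was absent
    return {
        "os": platform.platform(),
        "python": sys.version,
        "script_revision": git_revision(),
        "package_versions": versions,
        "data_files": {str(p): sha256_of(Path(p)) for p in data_files},
        "command_line": list(sys.argv if argv is None else argv),
    }
```

For example, `capture_run_record(["data/counts.csv"], ["numpy", "pandas"])` returns a dictionary that could be serialised to JSON and stored next to the results; the file path and package names are placeholders.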

Page 9: Being Reproducible: SSBSS Summer School 2017

Record All
Automate All
Contain All
Access All

Findable (Citable)
Accessible (Trackable)
Interoperable (Intelligible)
Reusable (Reproducible)

Page 10: Being Reproducible: SSBSS Summer School 2017

Design: cherry picking data, random seed reporting, non-independent bias, poor positive and negative controls, dodgy normalisation, arbitrary cut-offs, premature data triage, un-validated materials, improper statistical analysis, poor statistical power, stopping when you “get to the right answer”, software misconfigurations, misapplied black box software

Reporting: incomplete reporting of software configurations, parameters & resource versions, missed steps, missing data, vague methods, missing software

Empirical, Statistical, Computational [V. Stodden, IMS Bulletin (2013)]

Reproducibility and reliability of biomedical research: improving research practice

https://www.sciencenews.org/article/12-reasons-research-goes-wrong
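Random seed reporting, one of the design problems listed above, costs only a few lines: create the random number generator from an explicit seed and return that seed with the result. A minimal sketch using Python's standard `random` module; the bootstrap confidence interval is a hypothetical stand-in for any stochastic analysis step, and the same seed is only guaranteed to reproduce the same numbers when the code and Python version are unchanged.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples, seed, alpha=0.05):
    """Percentile bootstrap CI for the mean, returned with the settings that produced it."""
    rng = random.Random(seed)  # private generator with an explicit seed, no hidden global state
    means = sorted(
        statistics.mean(rng.choice(values) for _ in values)
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    # Report the seed and parameters alongside the result so it can be regenerated.
    return {"ci": (lower, upper), "seed": seed,
            "n_resamples": n_resamples, "alpha": alpha}
```

Calling it twice with `seed=42` gives identical output; rerunning with other seeds shows how much the interval depends on the resampling, which is itself worth reporting.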

Page 11: Being Reproducible: SSBSS Summer School 2017
Page 12: Being Reproducible: SSBSS Summer School 2017

“When I use a word," Humpty Dumpty said in rather a scornful tone, "it means just what I choose it to mean - neither more nor less.”

Carroll, Through the Looking Glass

re-compute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo, robustness, tolerance, verification, compliance, validation, assurance, remix

Page 13: Being Reproducible: SSBSS Summer School 2017

Scientific publication goals: (i) announce a result; (ii) convince readers it is correct.

Papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension.

Papers in computational science should describe the results and provide the complete software development environment, data and set of instructions which generated the figures.

Virtual Witnessing*

*Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985), Shapin and Schaffer.

Jill Mesirov

David Donoho

Page 14: Being Reproducible: SSBSS Summer School 2017

“Micro” Reproducibility

“Macro” Reproducibility

Fixivity

Validate

Verify

Trust

Page 15: Being Reproducible: SSBSS Summer School 2017

Repeatability: “Sameness”. Same result, 1 lab, 1 experiment

Reproducibility: “Similarity”. Similar result, > 1 lab, > 1 experiment. Why the differences?

https://2016-oslo-repeatability.readthedocs.org/en/latest/repeatability-discussion.html

Validate

Verify
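On numerical outputs, the repeatability/reproducibility distinction can be made concrete: a repeat (same lab, same experiment) can be checked for identical values, while a reproduction (another lab or setup) is checked for similarity within a stated tolerance, after which the differences are examined. A minimal sketch; the function names and default tolerances are hypothetical, and a suitable tolerance has to be chosen and reported for each experiment.

```python
import math


def is_repeat(original, rerun):
    """Repeatability ("sameness"): every value must be identical."""
    return list(original) == list(rerun)


def is_reproduction(original, other, rel_tol=1e-6, abs_tol=1e-9):
    """Reproducibility ("similarity"): values agree within stated tolerances."""
    if len(original) != len(other):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(original, other)
    )


def explain_differences(original, other, rel_tol=1e-6, abs_tol=1e-9):
    """List (index, original, other) for values outside tolerance: where to ask 'why?'."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(original, other))
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    ]
```

For example, `is_repeat([0.1 + 0.2], [0.3])` is `False` because the floating-point sum is `0.30000000000000004`, while `is_reproduction([0.1 + 0.2], [0.3])` is `True`: the kind of tiny numerical difference that different compilers, libraries or summation orders can introduce.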

Page 16: Being Reproducible: SSBSS Summer School 2017

Method Reproducibility

the provision of enough detail about study procedures and data so, in theory or in actuality, the same procedures could be exactly repeated.

Result Reproducibility (aka replicability)

obtaining the same results from the conduct of an independent study whose procedures are as closely matched to the original experiment as possible

Goodman et al., Science Translational Medicine 8 (341), 2016

Validate

Verify

Page 17: Being Reproducible: SSBSS Summer School 2017

What are you reproducing? Algorithm vs. script conflation

Methods: techniques, algorithms, spec. of the steps, models

Materials: datasets, parameters, algorithm seeds

Instruments: codes, services, scripts, underlying libraries, workflows, ref datasets

Laboratory: sw and hw infrastructure, systems software, integrative platforms, computational environment

Page 18: Being Reproducible: SSBSS Summer School 2017

Productivity

Track differences

Validate

Verify

Page 19: Being Reproducible: SSBSS Summer School 2017

Validate

Verify

Recompute By Degrees

Fixivity - Liveness
• New/updated/deprecated methods, datasets, services, codes, h/w
• Snapshots

Dependency - Containment
• Streams, non-portable data/software, 3rd party services, supercomputer access, licensing restrictions…
• Locally contained and maintained
• External dependencies

Transparency
• Black boxes, proprietary software, manual steps

Robustness
• Bounds of use
• Stochastics, non-determinism, contexts

Page 20: Being Reproducible: SSBSS Summer School 2017

https://xkcd.com/797/

Components and Dependencies

Software is typically a compound work.

Libraries. Plug-ins. Code fragments.

We are encouraged to reuse and not reinvent

Combining licenses.

License compatibilities

Page 21: Being Reproducible: SSBSS Summer School 2017

Black boxes

• closed codes

• closed external or cloud services

• method obscurity

• manual steps

[Thanks to Jason Scott]

Page 22: Being Reproducible: SSBSS Summer School 2017

The Reproducibility Window

All experiments become less reproducible over time…

• Can’t contain everything: pesky Internet in a box

• Can’t automate everything: pesky people intervening

• Can’t fix and fossilise everything: pesky science keeps changing

Results may vary

Page 23: Being Reproducible: SSBSS Summer School 2017

Bonus slide

At SSBSS, Theodor Gescher came up with REALSCI:

Robust: many runs

Environment: describe the equipment/OS

Another: done by a lab other than yours

Limits: parameters

Standards: well understood/comprehensible methods

Complete: no cherry picking

Immortal: community supported commodity systems

Page 24: Being Reproducible: SSBSS Summer School 2017

Mixed Central and Distributed stores: Containment and Dependencies. Upload vs Referencing

In House Stores

External Databases

Publishing services

Model Resources

Page 25: Being Reproducible: SSBSS Summer School 2017

Mixed Central and Distributed stores: Containment and Dependencies. Upload vs Referencing

In House Stores

External Databases

Publishing services

Model Resources

Migrations into FAIRDOMHub for long-term reproducibility

Page 26: Being Reproducible: SSBSS Summer School 2017

Shades of Reproducibility

Running an active instrument

Reading an archived record

Are you using hard-wired localhost ids?

Workflows, SOPs

Containers, cloud services, common services

Markup languages, reporting guidelines and checklists, ontologies, catalogues

Sounds hard… what can I do?

Catalogue

Page 27: Being Reproducible: SSBSS Summer School 2017

Protocol specs and sharing…

A language for specifying experimental protocols for biological research in a way that is precise, unambiguous, and understandable by both humans and computers.

Page 28: Being Reproducible: SSBSS Summer School 2017

Validation Data

https://fairdomhub.org/sops/203
https://fairdomhub.org/investigations/56

Page 29: Being Reproducible: SSBSS Summer School 2017

Standard Operating Procedures

Quality Control

Page 30: Being Reproducible: SSBSS Summer School 2017

In situ reproducible models in FAIRDOM: metadata annotation against standards; validation, comparison and simulation

SBML Model simulation

Model comparison

Model versioning

Reproducing simulations

[Jacky Snoep, Dagmar Waltemath, Martin Peters, Martin Scharm]

JWS Online

Page 31: Being Reproducible: SSBSS Summer School 2017

Tracking versions

Page 32: Being Reproducible: SSBSS Summer School 2017

Tracking model versions smartly

Scharm, M., Wolkenhauer, O., & Waltemath, D. (2015). An algorithm to detect and communicate the differences in computational models describing biological systems. Bioinformatics, btv484
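The cited algorithm works on the full XML structure of a model. A much simpler illustration of the idea, and not that algorithm: compare two versions of an SBML-style model by the `id` and `value` of their parameters and report what was added, removed or changed. The model snippets, the model id `glycolysis` and the function names are hypothetical, and a real comparison would also need species, reactions, rules and annotations.

```python
import xml.etree.ElementTree as ET


def parameter_table(sbml_text):
    """Map parameter id -> numeric value for every <parameter>, ignoring namespaces."""
    table = {}
    for elem in ET.fromstring(sbml_text).iter():
        if elem.tag.rsplit("}", 1)[-1] == "parameter":  # drop any "{namespace}" prefix
            value = elem.get("value")
            table[elem.get("id")] = float(value) if value is not None else None
    return table


def diff_parameters(old_text, new_text):
    """Report added, removed and changed parameters between two model versions."""
    old, new = parameter_table(old_text), parameter_table(new_text)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": {
            pid: (old[pid], new[pid])
            for pid in sorted(set(old) & set(new))
            if old[pid] != new[pid]
        },
    }


V1 = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core">
  <model id="glycolysis"><listOfParameters>
    <parameter id="k1" value="0.5"/><parameter id="k2" value="1.2"/>
  </listOfParameters></model></sbml>"""

V2 = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core">
  <model id="glycolysis"><listOfParameters>
    <parameter id="k1" value="0.7"/><parameter id="k3" value="2.0"/>
  </listOfParameters></model></sbml>"""

print(diff_parameters(V1, V2))
# {'added': ['k3'], 'removed': ['k2'], 'changed': {'k1': (0.5, 0.7)}}
```

Values are compared as numbers, so `0.5` and `0.50` count as the same; a structural diff such as the one in the cited paper is needed to catch changes to the equations themselves.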

Page 33: Being Reproducible: SSBSS Summer School 2017

Model simulation in FAIRDOMHub using JWS Online

Page 34: Being Reproducible: SSBSS Summer School 2017

A simulation database allows one-click, live figure reproduction in FAIRDOM-SEEK

JWS model Excel data file

Dagmar Waltemath, Uni Rostock
Jacky Snoep, Uni Stellenbosch

Simulation Experiment Description Markup Language (SED-ML): an XML-based format for encoding simulation setups, to ensure exchangeability and reproducibility of simulation experiments. It describes:
• which models to use in an experiment,
• modifications to apply to the models before using them,
• which simulation procedures to run on each model,
• what analysis results to output,
• and how the results should be presented.
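The five items above correspond to the main lists in a SED-ML file: simulations, models (with changes), tasks, data generators and outputs. A minimal sketch that writes such a file with Python's standard library, assuming SED-ML Level 1 Version 3 element names and an SBML Level 3 model; the model file `model.xml`, parameter `k1`, species `S1` and all ids are hypothetical, and the output has not been validated against the SED-ML schema, so a real file should be checked with SED-ML tooling before it is shared. `KISAO:0000019` is the KiSAO term for the CVODE solver.

```python
import xml.etree.ElementTree as ET

SEDML_NS = "http://sed-ml.org/sed-ml/level1/version3"
SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("", SEDML_NS)
ET.register_namespace("math", MATHML_NS)


def sed(tag, parent=None, **attrs):
    """Create a SED-ML element, optionally as a child of parent; values become strings."""
    qname = f"{{{SEDML_NS}}}{tag}"
    attrs = {key: str(value) for key, value in attrs.items()}
    if parent is None:
        return ET.Element(qname, attrs)
    return ET.SubElement(parent, qname, attrs)


def build_sedml(model_source, param_id, new_value, species_id, end_time, points):
    root = sed("sedML", level=1, version=3)
    # Declare the sbml prefix used inside the XPath targets below.
    root.set("xmlns:sbml", SBML_NS)

    # Which simulation procedure to run: a uniform time course solved with CVODE.
    sims = sed("listOfSimulations", root)
    course = sed("uniformTimeCourse", sims, id="sim1", initialTime=0,
                 outputStartTime=0, outputEndTime=end_time, numberOfPoints=points)
    sed("algorithm", course, kisaoID="KISAO:0000019")

    # Which model to use, and the modification to apply before simulating it.
    models = sed("listOfModels", root)
    model = sed("model", models, id="model1",
                language="urn:sedml:language:sbml", source=model_source)
    sed("changeAttribute", sed("listOfChanges", model), newValue=new_value,
        target="/sbml:sbml/sbml:model/sbml:listOfParameters/"
               f"sbml:parameter[@id='{param_id}']/@value")

    # Bind the simulation to the model.
    sed("task", sed("listOfTasks", root), id="task1",
        modelReference="model1", simulationReference="sim1")

    # What to output: simulation time and one species from the task.
    generators = sed("listOfDataGenerators", root)
    species_target = ("/sbml:sbml/sbml:model/sbml:listOfSpecies/"
                      f"sbml:species[@id='{species_id}']")
    for var_id, source in [("t", {"symbol": "urn:sedml:symbol:time"}),
                           (species_id, {"target": species_target})]:
        generator = sed("dataGenerator", generators, id=f"dg_{var_id}")
        sed("variable", sed("listOfVariables", generator),
            id=var_id, taskReference="task1", **source)
        math = ET.SubElement(generator, f"{{{MATHML_NS}}}math")
        ET.SubElement(math, f"{{{MATHML_NS}}}ci").text = var_id

    # How to present it: the species against time in a 2D plot.
    plot = sed("plot2D", sed("listOfOutputs", root), id="plot1")
    sed("curve", sed("listOfCurves", plot), id="curve1", logX="false", logY="false",
        xDataReference="dg_t", yDataReference=f"dg_{species_id}")
    return root


doc = build_sedml("model.xml", "k1", 0.5, "S1", end_time=100, points=1000)
print(ET.tostring(doc, encoding="unicode"))
```

Keeping the experiment description separate from the model is what lets a figure be regenerated with one click: the simulator reads the SED-ML file, applies the change to the model and redraws the plot.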

Page 35: Being Reproducible: SSBSS Summer School 2017

FAIRDOMHub Journal Programme
Molecular Systems Biology

Page 36: Being Reproducible: SSBSS Summer School 2017

Model technical curation for journals

[Jacky Snoep (Stellenbosch), Dagmar Waltemath, Martin Peters, Martin Scharm (Rostock)]

* Store DOI-citable supplementary files on FAIRDOMHub
** Model and data curation
*** Reproducible clickable figures in papers using SED-ML

Page 37: Being Reproducible: SSBSS Summer School 2017

Cataloguing, Packaging

Penkler, G., du Toit, F., Adams, W., Rautenbach, M., Palm, D. C., van Niekerk, D. D. and Snoep, J. L. (2015), Construction and validation of a detailed kinetic model of glycolysis in Plasmodium falciparum. FEBS J, 282: 1481–1511. doi:10.1111/febs.13237

https://fairdomhub.org/investigations/56

DOI: 10.15490/seek.1.investigation.56

Snapshot, preservation

active

Page 38: Being Reproducible: SSBSS Summer School 2017


An “evolving manuscript” would begin with a pre-publication, pre-peer review “beta 0.9” version of an article, followed by the approved published article itself, [ … ] “version 1.0”.

Subsequently, scientists would update this paper with details of further work as the area of research develops. Versions 2.0 and 3.0 might allow for the “accretion of confirmation [and] reputation”.

Ottoline Leyser […] assessment criteria in science revolve around the individual. “People have stopped thinking about the scientific enterprise”.

http://www.timeshighereducation.co.uk/news/evolving-manuscripts-the-future-of-scientific-communication/2020200.article

Page 39: Being Reproducible: SSBSS Summer School 2017

Packaging: CombineArchive https://sems.uni-rostock.de/projects/combinearchive/

Scharm M, Wendland F, Peters M, Wolfien M, Theile T, Waltemath D
SEMS, University of Rostock

A zip-like file with a manifest & metadata, for:
• Bundling files
• Keeping provenance
• Exchanging data
• Shipping results

Bergmann, F. T., Adams, R., Moodie, S., Cooper, J., Glont, M., Golebiewski, M., ... & Olivier, B. G. (2014). COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics, 15(1), 1.
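A minimal sketch of the COMBINE archive idea using Python's standard `zipfile` module: payload files plus a `manifest.xml` that records each file's location and format URI, with one file flagged as the master. The format URIs follow the identifiers.org scheme used by the COMBINE archive specification; the extension-to-format mapping, file names and helper names are hypothetical, and the optional `metadata.rdf` (where provenance such as creators and dates would go) is omitted.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
SPEC = "http://identifiers.org/combine.specifications/"
# Hypothetical mapping from file extension to format URI, for this example only.
FORMATS = {".xml": SPEC + "sbml", ".sedml": SPEC + "sed-ml"}


def build_manifest(names, master):
    """Manifest listing the archive itself, the manifest, and every payload file."""
    ET.register_namespace("", OMEX_NS)
    root = ET.Element(f"{{{OMEX_NS}}}omexManifest")
    entries = [(".", SPEC + "omex"), ("./manifest.xml", OMEX_NS)]
    entries += [(f"./{n}", FORMATS[PurePosixPath(n).suffix]) for n in names]
    for location, fmt in entries:
        content = ET.SubElement(root, f"{{{OMEX_NS}}}content",
                                {"location": location, "format": fmt})
        if location == f"./{master}":
            content.set("master", "true")  # the file tools should open first
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def write_omex(target, files, master):
    """Zip payload files plus manifest.xml into a COMBINE archive (.omex)."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", build_manifest(list(files), master))
        for name, data in files.items():
            archive.writestr(name, data)
```

A call such as `write_omex("project.omex", {"model.xml": sbml_text, "simulation.sedml": sedml_text}, master="simulation.sedml")` would bundle a model and its simulation description into one shareable file; `sbml_text` and `sedml_text` stand for the actual file contents.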

Page 40: Being Reproducible: SSBSS Summer School 2017

Standards-based metadata framework for bundling (scattered) resources with context and citation

Packaging: Research Objects

http://researchobject.org

Page 41: Being Reproducible: SSBSS Summer School 2017

Packaging: Research Objects

Publishing Archive

Institutional Archive

1. Export
2. Exchange

http://researchobject.org

Page 42: Being Reproducible: SSBSS Summer School 2017

Manifest Construction

Container

Manifest Description

Packaging Platforms: Zip files, BagIt, Docker, Conda, Singularity

Repositories: FAIRDOMHub

Packaging: Research Objects in a nutshell

Different manifest description profiles for different kinds of objects
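BagIt, one of the packaging platforms listed above, is simple enough to sketch with the standard library: payload files go under `data/`, a `bagit.txt` declares the version, and a manifest records a checksum for every payload file so that a recipient can check nothing has changed. A minimal sketch following the BagIt 1.0 layout (RFC 8493), assuming SHA-256 checksums; the helper names and file names are hypothetical, and it omits the optional `bag-info.txt` and tag manifests that existing BagIt tools also write.

```python
import hashlib
from pathlib import Path


def _sha256(path):
    return hashlib.sha256(path.read_bytes()).hexdigest()


def make_bag(bag_dir, payload):
    """Create a BagIt-style bag from a {relative_path: bytes} payload dictionary."""
    bag = Path(bag_dir)
    data_dir = bag / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_path, content in sorted(payload.items()):
        target = data_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        manifest_lines.append(f"{_sha256(target)}  data/{rel_path}\n")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    (bag / "manifest-sha256.txt").write_text("".join(manifest_lines), encoding="utf-8")
    return bag


def verify_bag(bag_dir):
    """Return manifest entries that are missing or whose checksum no longer matches."""
    bag = Path(bag_dir)
    problems = []
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        checksum, rel_path = line.split(maxsplit=1)
        file_path = bag / rel_path
        if not file_path.is_file() or _sha256(file_path) != checksum:
            problems.append(rel_path)
    return problems
```

The container only guarantees fixity; the descriptive manifest profile that makes the bag a Research Object (what each file is, who made it, how it should be cited) is layered on top as additional metadata.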

Page 43: Being Reproducible: SSBSS Summer School 2017

From Virtual Machines to Executable Containers for portable execution

• Containers: everything required to make a piece of software run is packaged into isolated containers.

• Unlike VMs, containers do not bundle a full operating system, only the libraries and settings required to make the software work.

• Efficient, lightweight, self-contained systems
• Designed so that software runs the same, regardless of where it’s deployed.

https://www.software.ac.uk/c4rr/ https://biocontainers.pro/

Biocontainers

Page 44: Being Reproducible: SSBSS Summer School 2017

Use commodity and community systems

Sustained platforms
Communities to drive them
Tooling and training

Spreadsheets are the Cockroaches of Science

Page 45: Being Reproducible: SSBSS Summer School 2017

EU FAIR Data Expert Group Consultation

https://github.com/FAIR-Data-EG/consultation/issues

Page 46: Being Reproducible: SSBSS Summer School 2017

Want to know more? Go on a Software or Data Carpentry course

https://tess.elixir-europe.org

Page 47: Being Reproducible: SSBSS Summer School 2017

Make software open and reusable

Page 48: Being Reproducible: SSBSS Summer School 2017

Software Sustainability Institute , http://www.software.ac.uk

Goble, Better Software, Better Research, IEEE Internet Computing 18(5) (2014), DOI: 10.1109/MIC.2014.88

Jiménez RC, Kuzak M, Alhamdoosh M et al. Four simple recommendations to encourage best practices in research software [version 1; referees: 3 approved]. F1000Research 2017, 6:876 (doi: 10.12688/f1000research.11407.1)

Page 49: Being Reproducible: SSBSS Summer School 2017

Use Common Platforms

Get the licencing right… MATLAB, Mathematica…

Proprietary software

Cloud centralised service: in situ reproducibility…

Galaxy

FAIRDOMHub + JWS Online

Blackbox vs Whitebox

Page 50: Being Reproducible: SSBSS Summer School 2017

https://view.commonwl.org/workflows/github.com/ProteinsWebTeam/ebi-metagenomics-cwl/tree/fa86fce/workflows/rna-selector.cwl

Use and document workflows, preferably with a workflow management system. Living Research Objects!

http://commonwl.org/

Workflow repository

Page 51: Being Reproducible: SSBSS Summer School 2017

Use a workflow: the vision!
Preferably a workflow management system, preferably described using Common Workflow Language

Experimental workflows

Event BUS Business Process Management

Taverna, KNIME, Galaxy

Workflow BPM layer

Workflow Computation Application layer

Computing resources, Databases: Effector layer

Front-end: Web interface / Monitoring interface

Pipeline Pilot

FAIRDOM SEEK Workflow repository

Workflow portal

repository

launch, results

FAIRDOM

[Jean Loup Fallon, Carole Goble]

Page 52: Being Reproducible: SSBSS Summer School 2017

https://hive.biochemistry.gwu.edu/htscsrs/workshop_2017

Reproducible Pipelines for Robust Regulation

BioCompute Objects

Emphasis on fixing the pipeline so it can be replicated, and on reporting the parameter space
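One way to make "fixing the pipeline" and "reporting the parameter space" concrete is to keep, next to the results, an ordered record of every step with its tool version and parameters, plus a hash of that record so that two runs can be compared at a glance. A hypothetical sketch only: it does not follow the BioCompute Object schema, and the `Step` type, tool names and versions are placeholders.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Step:
    """One fixed pipeline step: which tool, which exact version, which parameters."""
    name: str
    tool: str
    version: str
    parameters: dict = field(default_factory=dict)


def pipeline_record(steps):
    """Ordered description of the pipeline, including its full parameter space."""
    return {"steps": [asdict(step) for step in steps]}


def pipeline_fingerprint(record):
    """Stable SHA-256 of the record: any change of version or parameter changes it."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


steps = [
    Step("trim", "example-trimmer", "0.36", {"min_quality": 20}),
    Step("align", "example-aligner", "2.3.4", {"seed_length": 22}),
]
record = pipeline_record(steps)
print(json.dumps(record, indent=2))
print(pipeline_fingerprint(record))
```

Serialising with sorted keys and fixed separators makes the fingerprint independent of dictionary ordering, so equal fingerprints mean the same tools, versions and parameters were used.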

Page 53: Being Reproducible: SSBSS Summer School 2017

Use an Electronic Lab Notebook

Page 54: Being Reproducible: SSBSS Summer School 2017

What can you do?

• Follow the 10 RACA Principles

• Take action, be imperfect

• Demand reproducibility in reviews.

• Educate your PIs and supervisors.

Page 55: Being Reproducible: SSBSS Summer School 2017

[Norman Morrison]

Technological Debt: Appropriate Effort, Retrospective Reusability

Page 56: Being Reproducible: SSBSS Summer School 2017

What are the incentives?

[Garza] [Malone] [Resnik]

Page 57: Being Reproducible: SSBSS Summer School 2017

Acknowledgements

• David De Roure
• Tim Clark
• Sean Bechhofer
• Robert Stevens
• Christine Borgman
• Victoria Stodden
• Marco Roos
• Jose Enrique Ruiz del Mazo
• Oscar Corcho
• Ian Cottam
• Steve Pettifer
• Magnus Rattray
• Chris Evelo
• Katy Wolstencroft
• Robin Williams
• Pinar Alper
• C. Titus Brown
• Greg Wilson
• Kristian Garza

• Juliana Freire
• Jill Mesirov
• Simon Cockell
• Paolo Missier
• Paul Watson
• Gerhard Klimeck
• Matthias Obst
• Jun Zhao
• Daniel Garijo
• Yolanda Gil
• James Taylor
• Alex Pico
• Sean Eddy
• Cameron Neylon
• Barend Mons
• Kristina Hettne
• Stian Soiland-Reyes
• Rebecca Lawrence
• Michael Crusoe

Page 58: Being Reproducible: SSBSS Summer School 2017

Jon Olav Vik, Norwegian University of Life Sciences

Maksim Zakhartsev, University of Hohenheim, Stuttgart, Germany

Alexey Kolodkin, Siberian Branch, Russian Academy of Sciences

Tomasz Zieliński, SynthSys Centre, University of Edinburgh, UK

Martin Peters, Martin Scharm, Systems Biology Bioinformatics, University of Rostock, Germany

Page 59: Being Reproducible: SSBSS Summer School 2017

Web sites

• Force11 http://www.force11.org

• TeSS https://tess.elixir-europe.org

• FAIRDOM http://www.fair-dom.org

• FAIRDOMHub http://www.fairdomhub.org

• Software Carpentry http://software-carpentry.org

• Data Carpentry http://datacarpentry.org

• Software Sustainability Institute http://www.software.ac.uk

• Rightfield http://www.rightfield.org.uk

• FAIRSharing http://www.fairsharing.org

• Common Workflow Language http://commonwl.org/

Page 60: Being Reproducible: SSBSS Summer School 2017

Reading List (refs also throughout)

• John P. A. Ioannidis How to Make More Published Research True, October 21, 2014 DOI: 10.1371/journal.pmed.1001747

• Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124

• Steven N. Goodman, Daniele Fanelli and John P. A. Ioannidis, What does research reproducibility mean? Science Translational Medicine 01 Jun 2016: Vol. 8, Issue 341, pp. 341ps12 DOI: 10.1126/scitranslmed.aaf5027

• Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285

• Massimiliano Assante, Leonardo Candela, Donatella Castelli, Paolo Manghi and Pasquale Pagano, Science 2.0 Repositories: Time for a Change in Scholarly Communication, D-Lib Magazine January/February 2015, Volume 21, Number 1/2 , DOI: 10.1045/january2015-assante

• Waltemath, D., Henkel, R., Hälke, R., Scharm, M., & Wolkenhauer, O. (2013). Improving the reuse of computational models through version control. Bioinformatics, 29(6), 742-748.

• Bergmann, F. T., Adams, R., Moodie, S., Cooper, J., Glont, M., Golebiewski, M., ... & Olivier, B. G. (2014). COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics, 15(1), 1.

• Scharm, M., Wolkenhauer, O., & Waltemath, D. (2015). An algorithm to detect and communicate the differences in computational models describing biological systems. Bioinformatics, btv484

• http://www.reuters.com/article/2012/03/28/us-science-cancer-idUSBRE82R12P20120328

• http://www.acmedsci.ac.uk/policy/policy-projects/reproducibility-and-reliability-of-biomedical-research/