2017-11-03 provenance and research object

Partners Funding bioexcel.eu Provenance and Research Object 1 Stian Soiland-Reyes eScience Lab, The University of Manchester 2017-11-03, Aix-en-Provence CESAB workshop: Reproducible Workflows orcid.org/0000-0001-9842-9718 @ soilandreyes This work is licensed under a Creative Commons Attribution 4.0 International License .


Page 1: 2017-11-03 Provenance and Research Object

Partners Funding

bioexcel.eu

Provenance and Research Object

Stian Soiland-Reyes

eScience Lab, The University of Manchester

2017-11-03, Aix-en-Provence

CESAB workshop: Reproducible Workflows

orcid.org/0000-0001-9842-9718 @soilandreyes

This work is licensed under a Creative Commons Attribution 4.0 International License.

Page 2: 2017-11-03 Provenance and Research Object

http://www.myexperiment.org Find and Share

Page 3: 2017-11-03 Provenance and Research Object

https://view.commonwl.org/

http://doi.org/10.7490/f1000research.1114375.1

Page 4: 2017-11-03 Provenance and Research Object

Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved.

http://www.w3.org/TR/prov-overview/

Core PROV model

Entity – A “thing” in the world
Document, Excel file, database row, molecule, LEGO structure, house, …

Activity – Something that happened
Usually defined start/end time
May use and generate entities

Agent – Someone/something participating in activities
Person, SoftwareAgent, Organization

Key principles:
Provenance statements point backwards in time
Any PROV document is one particular view on history
More than one entity can describe the same “thing”
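To make the three core types concrete, here is a minimal sketch in plain Python. The class and field names (`Entity`, `Activity`, `Agent`, `used`, `generated`, `associated_with`) and the `ex:` identifiers are illustrative assumptions, not an official PROV API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    """A "thing" in the world, e.g. a document or a database row."""
    id: str


@dataclass(frozen=True)
class Agent:
    """Someone or something participating in activities."""
    id: str
    kind: str = "Person"  # could also be "SoftwareAgent" or "Organization"


@dataclass
class Activity:
    """Something that happened; may use and generate entities."""
    id: str
    started: Optional[datetime] = None
    ended: Optional[datetime] = None
    used: List[Entity] = field(default_factory=list)
    generated: List[Entity] = field(default_factory=list)
    associated_with: List[Agent] = field(default_factory=list)


# Hypothetical example values, chosen only for illustration
sample = Entity("ex:sample")
metagenome = Entity("ex:metagenome")
alice = Agent("ex:alice")
sequencing = Activity(
    "ex:sequencing",
    used=[sample],
    generated=[metagenome],
    associated_with=[alice],
)
```

Note that the statements live on the activity and point back to what it used, in line with the principle that provenance statements point backwards in time.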

Page 5: 2017-11-03 Provenance and Research Object

Attribution

Who collected this sample? Who helped?

Which lab performed the sequencing?

Who did the data analysis?

Who wrote the analysis workflow?

Who made the data set used by analysis?

Who curated the results?

Diagram: Data wasAttributedTo Alice; Alice actedOnBehalfOf The lab

Why do I need this?
i. To be recognized for my work
ii. Who should I give credits to?
iii. Who should I complain to?
iv. Can I trust them?
v. Who should I make friends with?
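The question "who should I give credits to?" can be answered by following attribution and delegation links. The sketch below assumes that the data is attributed to Alice, who acts on behalf of the lab; the dictionary names, the `ex:` identifiers and the function `who_to_credit` are hypothetical.

```python
# entity -> agents it wasAttributedTo (assumed example statements)
ATTRIBUTED_TO = {"ex:data": ["ex:alice"]}
# agent -> agents it actedOnBehalfOf (assumed example statements)
ON_BEHALF_OF = {"ex:alice": ["ex:the-lab"]}


def who_to_credit(entity):
    """Return the attributed agents, followed by those they acted on behalf of."""
    credited = []
    queue = list(ATTRIBUTED_TO.get(entity, []))
    while queue:
        agent = queue.pop(0)
        if agent not in credited:
            credited.append(agent)
            queue.extend(ON_BEHALF_OF.get(agent, []))
    return credited


print(who_to_credit("ex:data"))  # ['ex:alice', 'ex:the-lab']
```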

Page 6: 2017-11-03 Provenance and Research Object

Derivation

Which sample was this metagenome sequenced from?

Which metagenomes was this sequence extracted from?

Which sequence was the basis for the results?

What is the previous revision of the new results?

Diagram: entities Sample, Metagenome, Sequence, Old results, New results; relations wasDerivedFrom, wasQuotedFrom, wasRevisionOf, wasInfluencedBy

Why do I need this?
i. To verify consistency (did I use the correct sequence?)
ii. To find the latest revision
iii. To backtrack where a diversion appeared after a change
iv. To credit work I depend on
v. Auditing and defence for peer review
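Backtracking and finding the latest revision both amount to walking derivation links. A minimal sketch follows; the edges are read from the questions on this slide, while the `ex:` identifiers and function names are hypothetical. It assumes the links contain no cycles.

```python
# entity -> the entity it wasDerivedFrom (assumed example statements)
WAS_DERIVED_FROM = {
    "ex:new-results": "ex:sequence",
    "ex:sequence": "ex:metagenome",
    "ex:metagenome": "ex:sample",
}
# newer entity -> the entity it wasRevisionOf
WAS_REVISION_OF = {"ex:new-results": "ex:old-results"}


def lineage(entity):
    """Walk backwards in time from an entity to its earliest known source."""
    chain = [entity]
    while chain[-1] in WAS_DERIVED_FROM:
        chain.append(WAS_DERIVED_FROM[chain[-1]])
    return chain


def latest_revision(entity):
    """Follow wasRevisionOf links forwards to the newest revision."""
    newer = {old: new for new, old in WAS_REVISION_OF.items()}
    while entity in newer:
        entity = newer[entity]
    return entity


print(lineage("ex:new-results"))
print(latest_revision("ex:old-results"))
```

Because provenance points backwards in time, finding the latest revision requires inverting the wasRevisionOf statements first.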

Page 7: 2017-11-03 Provenance and Research Object

Activities

What happened? When? Who?

What was used and generated?

Why was this workflow started?

Which workflow ran? Where?

Diagram: entities Sample, Metagenome, Results; activities Sequencing, Workflow run; agents Alice, Workflow server; plan Workflow definition; role Lab technician; relations used, wasGeneratedBy, wasAssociatedWith, wasInformedBy, wasStartedBy, hadPlan, hadRole, wasStartedAt "2012-06-21"

Why do I need this?
i. To see which analysis was performed
ii. To find out who did what
iii. What was the metagenome used for?
iv. To understand the whole process “make me a Methods section”
v. To track down inconsistencies
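The “make me a Methods section” idea can be sketched as turning activity records into sentences. The record layout and the function `methods_sentence` are hypothetical, and which agent, role, plan and start date belong to which activity is an assumed reading of the diagram rather than something the slide states outright.

```python
from datetime import date

# Assumed reading of the diagram; the keys are illustrative, not PROV terms
ACTIVITIES = [
    {
        "id": "Sequencing",
        "used": ["Sample"],
        "generated": ["Metagenome"],
        "agent": "Alice",
        "role": "Lab technician",
        "started": date(2012, 6, 21),
    },
    {
        "id": "Workflow run",
        "used": ["Metagenome"],
        "generated": ["Results"],
        "agent": "Workflow server",
        "plan": "Workflow definition",
    },
]


def methods_sentence(act):
    """Render one activity record as a plain-English sentence."""
    text = (
        f"{act['id']} used {', '.join(act['used'])} "
        f"to generate {', '.join(act['generated'])}"
    )
    text += f", associated with {act['agent']}"
    if act.get("role"):
        text += f" (role: {act['role']})"
    if act.get("plan"):
        text += f" following the plan '{act['plan']}'"
    if act.get("started"):
        text += f", started {act['started'].isoformat()}"
    return text + "."


for activity in ACTIVITIES:
    print(methods_sentence(activity))
```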

Page 8: 2017-11-03 Provenance and Research Object

Typical (?) workflow structure

Diagram labels: Workflow, Input ports, Processors, Data links, Output ports

http://taverna.incubator.apache.org/

Page 9: 2017-11-03 Provenance and Research Object

Workflow description (wfdesc)

http://purl.org/wf4ever/wfdesc#

Page 10: 2017-11-03 Provenance and Research Object

Workflow run provenance (wfprov)

http://purl.org/wf4ever/wfprov#

Page 11: 2017-11-03 Provenance and Research Object

Workflow Run Bundle

ZIP folder structure (RO Bundle)

mimetype (application/vnd.wf4ever.robundle+zip)
.ro/manifest.json
workflowrun.prov.ttl
input/X.txt
output/A.txt
output/B/ (1.txt, 2.txt, 3.txt)
output/C.jpg
intermediates/de/def2e58b-50e2-4949-9980-fd310166621a.txt

Diagram labels: workflow, URI references, attribution, execution environment

https://doi.org/10.5281/zenodo.51314
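The ZIP layout above can be sketched with Python's standard `zipfile` module. This is a simplified illustration, not the Taverna implementation: the function `write_bundle` is hypothetical, the manifest holds only a minimal assumed subset of fields, and the sketch assumes the `mimetype` entry should be written first and stored uncompressed.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(target, files):
    """Write a minimal RO Bundle-style ZIP; files maps path -> bytes."""
    manifest = {
        # Assumed minimal manifest; a real one carries much more metadata
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(files)],
    }
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        # mimetype goes first and uncompressed so tools can sniff the type
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for path in sorted(files):
            zf.writestr(path, files[path])


# Build an example bundle in memory with placeholder contents
buf = io.BytesIO()
write_bundle(buf, {
    "input/X.txt": b"example input",
    "output/A.txt": b"example output",
    "workflowrun.prov.ttl": b"# provenance trace would go here",
})
```

Passing a file path instead of the `BytesIO` buffer would write the bundle to disk.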

Page 12: 2017-11-03 Provenance and Research Object

https://doi.org/10.1016/j.websem.2015.01.003

application/vnd.wf4ever.robundle+zip

Research Object Bundle

http://www.researchobject.org/

Page 13: 2017-11-03 Provenance and Research Object

A Research Object bundles and relates digital resources of a scientific experiment/investigation + context

Data used and results produced in experimental study

Methods employed to produce and analyse that data

Provenance and settings for the experiments

People involved in the investigation

Annotations about these resources, to improve understanding and interpretation

Page 14: 2017-11-03 Provenance and Research Object

Standards-based metadata framework for bundling embedded and referenced resources with context

Citable Reproducible Packaging

researchobject.org

Page 15: 2017-11-03 Provenance and Research Object

Systems Biology Research Objects: exchange, portability and maintenance

components packaged into various containers

ISA-TAB

checksum

Page 16: 2017-11-03 Provenance and Research Object

Download as a Research Object Bundle

Snapshots evolving CWL files in GitHub

Permalink to snapshot the workflow identifier for RO

Common Workflow Language Viewer

CWL files packaged in a RO CWL RO + added richness

Lift out parts into the manifest

Page 17: 2017-11-03 Provenance and Research Object

Artist's Impression

Page 18: 2017-11-03 Provenance and Research Object

https://osf.io/h59uh/ https://doi.org/10.1101/191783

Page 19: 2017-11-03 Provenance and Research Object

https://doi.org/10.1101/191783

identifiers.org

Page 20: 2017-11-03 Provenance and Research Object

identifiers.org

PROV

JSON

https://doi.org/10.1109/BigData.2016.7840618

manifest.json

Page 21: 2017-11-03 Provenance and Research Object

Provenance from cwltool

Farah Z Khan:

Modify cwltool reference implementation to capture provenance

Generates Bag-It Research Object

Mints identifiers for data and run

Capture intermediate values

Workflow activities as PROV

wfdesc, OPMW, ProvONE

http://doi.org/10.7490/f1000research.1114781.1
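As a rough illustration of the BagIt packaging and identifier minting mentioned above, the sketch below builds the tag files of a bag in memory and mints a UUID-based identifier for a run. It is not the cwltool code: the function `make_bag` and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
import uuid


def make_bag(payload):
    """Return bag path -> bytes for a payload of relative path -> bytes."""
    bag = {
        "bagit.txt": b"BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
    }
    manifest_lines = []
    for path in sorted(payload):
        data = payload[path]
        # Payload files live under data/ and are listed with their checksum
        bag["data/" + path] = data
        digest = hashlib.sha256(data).hexdigest()
        manifest_lines.append(f"{digest}  data/{path}\n")
    bag["manifest-sha256.txt"] = "".join(manifest_lines).encode("utf-8")
    return bag


# Mint a fresh identifier for this (hypothetical) workflow run
run_id = uuid.uuid4().urn

bag = make_bag({"A.txt": b"hello"})
print(run_id)
print(bag["manifest-sha256.txt"].decode("utf-8"))
```

The checksum manifest lets a recipient verify that captured intermediate values and outputs have not changed since the run.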

Page 22: 2017-11-03 Provenance and Research Object

Acknowledgements

Farah Z Khan

Carole Goble

Michael R. Crusoe

Apache Taverna

BioExcel

Common Workflow Language

Research Object

W3C PROV WG

Page 23: 2017-11-03 Provenance and Research Object

https://www.slideshare.net/StianSoilandReyes/