Partners Funding
bioexcel.eu
Provenance and Research Object
Stian Soiland-Reyes
eScience Lab, The University of Manchester
2017-11-03, Aix-en-Provence
CESAB workshop: Reproducible Workflows
orcid.org/0000-0001-9842-9718 @soilandreyes
This work is licensed under a Creative Commons Attribution 4.0 International License.
https://view.commonwl.org/
http://doi.org/10.7490/f1000research.1114375.1
Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved.
http://www.w3.org/TR/prov-overview/
Core PROV model
Entity – A “thing” in the world
Document, Excel file, database row, molecule, LEGO structure, house, …
Activity – Something that happened
Usually defined start/end time
May use and generate entities
Agent – Someone/something participating in activities
Person, SoftwareAgent, Organization
Key principles:
Provenance statements point backwards in time
Any PROV document is one particular view on history
More than one entity can describe same “thing”
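The three core types can be sketched as plain Python data classes. This is only an illustration of the model described above, not the W3C data model itself; the `ex:` identifiers and the field names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """A "thing" in the world, e.g. a document or a database row."""
    id: str


@dataclass(frozen=True)
class Agent:
    """Someone or something participating in activities."""
    id: str
    kind: str = "Person"  # or "SoftwareAgent", "Organization"


@dataclass
class Activity:
    """Something that happened; may use and generate entities."""
    id: str
    started: Optional[datetime] = None
    ended: Optional[datetime] = None
    used: list = field(default_factory=list)
    generated: list = field(default_factory=list)
    associated_with: list = field(default_factory=list)


# Hypothetical example values, chosen only for illustration.
sample = Entity("ex:sample")
metagenome = Entity("ex:metagenome")
alice = Agent("ex:alice")
sequencing = Activity(
    "ex:sequencing",
    used=[sample],
    generated=[metagenome],
    associated_with=[alice],
)
```

Start and end times are optional here because an activity usually, but not always, has them defined.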
Attribution
Who collected this sample? Who helped?
Which lab performed the sequencing?
Who did the data analysis?
Who wrote the analysis workflow?
Who made the data set used by analysis?
Who curated the results?
[Diagram: nodes Data, Alice and The lab, linked by wasAttributedTo and actedOnBehalfOf]
Why do I need this?
i. To be recognized for my work
ii. Who should I give credits to?
iii. Who should I complain to?
iv. Can I trust them?
v. Who should I make friends with?
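The question "Who should I give credits to?" can be answered by following attribution and delegation links. A minimal sketch, assuming that Data wasAttributedTo Alice and Alice actedOnBehalfOf The lab; the `ex:` identifiers and the function name are hypothetical.

```python
# Hypothetical identifiers; each relation is a list of (subject, object) pairs.
was_attributed_to = [("ex:data", "ex:alice")]
acted_on_behalf_of = [("ex:alice", "ex:the-lab")]


def agents_to_credit(entity):
    """Return the agents an entity is attributed to, followed by the
    agents they acted on behalf of (transitively), without duplicates."""
    credited = []
    queue = [agent for ent, agent in was_attributed_to if ent == entity]
    while queue:
        agent = queue.pop(0)
        if agent in credited:
            continue
        credited.append(agent)
        # Delegation: whoever this agent acted on behalf of also gets credit.
        queue.extend(boss for a, boss in acted_on_behalf_of if a == agent)
    return credited


print(agents_to_credit("ex:data"))  # ['ex:alice', 'ex:the-lab']
```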
Derivation
Which sample was this metagenome sequenced from?
Which metagenome was this sequence extracted from?
Which sequence was the basis for the results?
What is the previous revision of the new results?
[Diagram: nodes Sample, Metagenome, Sequence, Old results and New results, linked by wasDerivedFrom, wasQuotedFrom, wasRevisionOf and wasInfluencedBy]
Why do I need this?
i. To verify consistency (did I use the correct sequence?)
ii. To find the latest revision
iii. To backtrack where a diversion appeared after a change
iv. To credit work I depend on
v. Auditing and defence for peer review
Activities
What happened? When? Who?
What was used and generated?
Why was this workflow started?
Which workflow ran? Where?
[Diagram: nodes Sample, Sequencing, Metagenome, Workflow run, Results, Alice, Lab technician, Workflow server and Workflow definition, linked by used, wasGeneratedBy, wasStartedAt "2012-06-21", wasAssociatedWith, wasInformedBy, wasStartedBy, hadPlan and hadRole]
Why do I need this?
i. To see which analysis was performed
ii. To find out who did what
iii. What was the metagenome used for?
iv. To understand the whole process: “make me a Methods section”
v. To track down inconsistencies
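The "make me a Methods section" idea can be sketched by turning activity records into sentences. This is an illustration only: the record layout, the function name, and the way the agents, plan and start date are attached to each activity are assumptions, not something the slide specifies.

```python
# Hypothetical activity records. Which agent, plan and start date belong
# to which activity is an assumption made for this example.
activities = [
    {
        "id": "Sequencing",
        "used": ["Sample"],
        "generated": ["Metagenome"],
        "agent": "Alice",
        "role": "Lab technician",
        "started": "2012-06-21",
    },
    {
        "id": "Workflow run",
        "used": ["Metagenome"],
        "generated": ["Results"],
        "agent": "Workflow server",
        "plan": "Workflow definition",
    },
]


def methods_sentence(activity):
    """Render one activity as a sentence for a draft Methods section."""
    sentence = "{} used {} and generated {}".format(
        activity["id"],
        ", ".join(activity["used"]),
        ", ".join(activity["generated"]),
    )
    extras = []
    agent = activity.get("agent")
    if agent:
        role = activity.get("role")
        who = f"{agent} ({role})" if role else agent
        extras.append(f"carried out by {who}")
    if activity.get("plan"):
        extras.append(f"following the plan '{activity['plan']}'")
    if activity.get("started"):
        extras.append(f"started on {activity['started']}")
    if extras:
        sentence += ", " + ", ".join(extras)
    return sentence + "."


for activity in activities:
    print(methods_sentence(activity))
```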
Typical (?) workflow structure
[Diagram labels: Workflow, Input ports, Processors, Data links, Output ports]
http://taverna.incubator.apache.org/
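The structure named on this slide (a workflow with input ports, processors, data links and output ports) can be sketched as a small data structure. The class and field names and the `owner:port` notation are assumptions for illustration; this is not the Apache Taverna API.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


@dataclass
class Workflow:
    name: str
    input_ports: list = field(default_factory=list)
    output_ports: list = field(default_factory=list)
    processors: dict = field(default_factory=dict)
    # Data links as (source, sink) pairs. Ports are written "owner:port",
    # with an empty owner for the workflow's own ports.
    data_links: list = field(default_factory=list)

    def upstream_of(self, processor_name):
        """Names of processors whose outputs feed the given processor."""
        result = []
        for source, sink in self.data_links:
            src_owner = source.split(":")[0]
            sink_owner = sink.split(":")[0]
            if sink_owner == processor_name and src_owner in self.processors:
                if src_owner not in result:
                    result.append(src_owner)
        return result


# Hypothetical two-step workflow.
wf = Workflow("example", input_ports=["reads"], output_ports=["report"])
wf.processors = {
    "align": Processor("align", ["in"], ["out"]),
    "summarise": Processor("summarise", ["in"], ["out"]),
}
wf.data_links = [
    (":reads", "align:in"),
    ("align:out", "summarise:in"),
    ("summarise:out", ":report"),
]
print(wf.upstream_of("summarise"))  # ['align']
```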
Workflow description (wfdesc)
http://purl.org/wf4ever/wfdesc#
Workflow run provenance (wfprov)
http://purl.org/wf4ever/wfprov#
Workflow Run Bundle
output/A.txt
output/C.jpg
output/B/
intermediates/
1.txt
2.txt
3.txt
de/def2e58b-50e2-4949-9980-fd310166621a.txt
input/X.txt
workflow
URI references
attribution
execution environment
ZIP folder structure (RO Bundle)
mimetype
application/vnd.wf4ever.robundle+zip
.ro/manifest.json
https://doi.org/10.5281/zenodo.51314
workflowrun.prov.ttl
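The ZIP layout above can be sketched with the Python standard library. This is a minimal illustration, not a complete or validated RO Bundle writer: the manifest keys (`@context`, `id`, `aggregates`, `uri`) reflect my reading of the RO Bundle specification and should be checked against it, and `write_bundle` is a hypothetical helper name.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(target, files):
    """Write a minimal RO-Bundle-style ZIP to `target` (path or file object).

    `files` maps paths inside the bundle to their content as bytes.
    """
    # Manifest layout is an assumption based on the RO Bundle specification.
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(files)],
    }
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as bundle:
        # The mimetype entry goes first and is stored uncompressed, so the
        # media type can be detected from the start of the file.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for path in sorted(files):
            bundle.writestr(path, files[path])
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))


# Hypothetical content; the paths follow the folder structure on the slide.
buffer = io.BytesIO()
write_bundle(buffer, {
    "input/X.txt": b"input value\n",
    "output/A.txt": b"output value\n",
})
```

Writing to an in-memory buffer keeps the example free of side effects; passing a file path instead would write the bundle to disk.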
https://doi.org/10.1016/j.websem.2015.01.003
application/vnd.wf4ever.robundle+zip
Research Object Bundle
http://www.researchobject.org/
A Research Object bundles and relates digital resources of a scientific experiment/investigation + context
Data used and results produced in experimental study
Methods employed to produce and analyse that data
Provenance and settings for the experiments
People involved in the investigation
Annotations about these resources, to improve understanding and interpretation
Standards-based metadata framework for bundling embedded and referenced resources with context
Citable Reproducible Packaging
researchobject.org
Systems Biology Research Objects exchange, portability and maintenance
components packaged into various containers
ISA-TAB
checksum
Common Workflow Language Viewer
Snapshots evolving CWL files in GitHub
Permalink to snapshot the workflow identifier for RO
Download as a Research Object Bundle
CWL files packaged in a RO
CWL RO + added richness
Lift out parts into the manifest
Artist's impression
https://osf.io/h59uh/ https://doi.org/10.1101/191783
identifiers.org
PROV
JSON
https://doi.org/10.1109/BigData.2016.7840618
manifest.json
Provenance from cwltool
Farah Z Khan:
Modify cwltool reference implementation to capture provenance
Generates Bag-It Research Object
Mints identifiers for data and run
Capture intermediate values
Workflow activities as PROV
wfdesc, OPMW, ProvONE
http://doi.org/10.7490/f1000research.1114781.1
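Minting identifiers for data and runs can be sketched as follows. This is an assumed approach for illustration, not a description of what the modified cwltool does: runs get a random UUID, while data values get an identifier derived from their content. The function names and the `urn:hash::sha256:` prefix are choices made for this example.

```python
import hashlib
import uuid


def mint_run_id():
    """Mint a fresh, globally unique identifier for one workflow run."""
    return uuid.uuid4().urn  # a string of the form "urn:uuid:..."


def mint_data_id(content: bytes):
    """Derive an identifier from the content itself, so that identical
    values (inputs, intermediates, outputs) share one identifier."""
    return "urn:hash::sha256:" + hashlib.sha256(content).hexdigest()


run_id = mint_run_id()
data_id = mint_data_id(b"intermediate value\n")
print(run_id)
print(data_id)
```

Content-derived identifiers make it possible to recognise the same intermediate value across runs, whereas each run identifier is unique by construction.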
Acknowledgements
Farah Z Khan
Carole Goble
Michael R. Crusoe
Apache Taverna
BioExcel
Common Workflow Language
Research Object
W3C PROV WG
https://www.slideshare.net/StianSoilandReyes/