2017-11-03 scientific workflow systems
![Page 1: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/1.jpg)
Partners Funding
bioexcel.eu
Scientific Workflow Systems
Stian Soiland-Reyes
eScience Lab, The University of Manchester
2017-11-03, Aix-en-Provence
CESAB workshop: Reproducible Workflows
orcid.org/0000-0001-9842-9718 @soilandreyes
This work is licensed under a Creative Commons Attribution 4.0 International License.
![Page 2: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/2.jpg)
What is a Workflow?
Orchestrating computational tasks
Managing the control and data flow
Homogeneous or heterogeneous tasks:
– Local / remote
– Own / third party
– White, grey or black boxes
– Reliable / fragile
– Reserved / dynamic
– Various underpinning infrastructure
– Various access controls
BioExcel: Biomolecular recognition
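
As a minimal sketch of what "orchestrating computational tasks" and "managing the control and data flow" can mean, the Python below runs a few tasks in dependency order and passes each task the results of its upstream tasks. The task names and functions are hypothetical; a real workflow system would add scheduling, remote execution, retries and access control on top of this.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each receives the results of its upstream tasks as a dict.
def fetch(inputs):
    return [3, 1, 2]

def clean(inputs):
    return sorted(inputs["fetch"])

def summarise(inputs):
    return {"n": len(inputs["clean"]), "max": max(inputs["clean"])}

TASKS = {"fetch": fetch, "clean": clean, "summarise": summarise}
# Data flow: task name -> set of tasks it depends on.
DEPENDS_ON = {"fetch": set(), "clean": {"fetch"}, "summarise": {"clean"}}

def run_workflow(tasks, depends_on):
    """Run every task once, after all of its dependencies have finished."""
    results = {}
    for name in TopologicalSorter(depends_on).static_order():
        upstream = {dep: results[dep] for dep in depends_on[name]}
        results[name] = tasks[name](upstream)
    return results

results = run_workflow(TASKS, DEPENDS_ON)
```

The control flow (execution order) is derived from the data flow (who consumes whose output), which is the pattern most of the systems on the following slides share.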
![Page 3: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/3.jpg)
Not on the agenda: Business workflows
Control flow of who has responsibility for what
BPM
Business workflows + computational workflows
IBISBA
![Page 4: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/4.jpg)
Why use workflows?
Automation
– Automate computational aspects
– Repetitive pipelines, sweep campaigns
Scaling – compute cycles
– Make use of computational infrastructure & handle large data
Abstraction – people cycles
– Shield complexity and incompatibilities
– Report, re-use, evolve, share, compare
– Repeat – Tweak – Repeat
– First class commodities
Provenance – reporting
– Capture, report and utilize log and data lineage
auto-documentation
– Traceable evolution, audit, transparency
– Compare
Findable
Accessible
Interoperable
Reusable
(Reproducible)
Adapted from Bertram Ludäscher at WORKS2015 https://www.slideshare.net/ludaesch/works-2015provenancemileage
![Page 5: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/5.jpg)
The humble Makefile
https://github.com/vak/makefile2dot
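
The core idea behind a Makefile, rebuilding a target only when it is missing or older than its prerequisites, can be sketched in a few lines of Python. This is an illustration of the rule, not how `make` is implemented; the function names and the file names in the demo are hypothetical.

```python
import os
import tempfile

def needs_rebuild(target, prerequisites):
    """Make-style check: rebuild if the target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)

def build(target, prerequisites, recipe):
    """Run the recipe only when needed; return True if it ran."""
    if needs_rebuild(target, prerequisites):
        recipe(target, prerequisites)
        return True
    return False

def to_upper(out_path, in_paths):
    """Hypothetical recipe: upper-case the first prerequisite into the target."""
    with open(in_paths[0]) as src, open(out_path, "w") as dst:
        dst.write(src.read().upper())

with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "data.txt")
    target = os.path.join(workdir, "data.upper.txt")
    with open(source, "w") as handle:
        handle.write("hello\n")

    first_run = build(target, [source], to_upper)   # target missing, recipe runs
    second_run = build(target, [source], to_upper)  # up to date, recipe skipped

    # Simulate a later edit of the source by moving its timestamp forward.
    later = os.path.getmtime(target) + 10
    os.utime(source, (later, later))
    third_run = build(target, [source], to_upper)   # source newer, recipe runs again
```

Chaining such rules, where one rule's target is another rule's prerequisite, gives the dependency graph that tools like makefile2dot visualise.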
![Page 6: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/6.jpg)
Laser Interferometer Gravitational-Wave Observatory
First detection of gravitational waves from colliding black holes
https://pegasus.isi.edu/2016/02/11/pegasus-powers-ligo-gravitational-waves-detection-analysis/
https://pegasus.isi.edu/
![Page 7: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/7.jpg)
Workflow Environment Ecosystem
![Page 8: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/8.jpg)
https://s.apache.org/existing-workflow-systems
![Page 10: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/10.jpg)
https://www.knime.org/
https://www.openphacts.org/
Pharmacological queries
target, compound and pathway data
https://doi.org/10.1371/journal.pone.0115460
http://www.myexperiment.org/workflows/4292
![Page 12: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/12.jpg)
Stop Press!
GUIs not essential!
GUI: Canvas, drag-drop blocks, arrows, run button, data visualization
Script: Textual, command line, view data externally. Script easily run from other apps.
Scripts can be workflows!
Workflow systems ⇆ Scripts
Scripts on ASAP meter:
Automation: ★ ★ ★ ★ ★
Scaling: ★ ★
Abstraction: ★
Provenance: ★ ★
![Page 13: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/13.jpg)
https://www.nextflow.io/
Script-like, define flow as channels
Streaming
Automatic Parallelism
Checkpoints
Virtualization and packaging
Portable
Reproducibility
![Page 14: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/14.jpg)
Snakemake
Makefile + Python ⇝ Snakemake
Filename patterns
Shell commands
Inline Python, R
Scalable to grid/cloud
https://snakemake.readthedocs.io/
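
The "filename patterns" idea can be illustrated in plain Python: a requested output file is matched against a pattern with a named wildcard, and the wildcard value is then substituted into the input patterns. This is a sketch of the concept only, not Snakemake's implementation or its rule syntax; the function names and the `sorted/` and `mapped/` paths are hypothetical, and the sketch assumes each wildcard appears at most once per pattern.

```python
import re

def pattern_to_regex(pattern):
    """Turn 'sorted/{sample}.bam' into a regex with one named group per wildcard."""
    # re.split with a capture group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        regex += re.escape(part) if index % 2 == 0 else f"(?P<{part}>.+)"
    return re.compile(regex)

def resolve_inputs(target, output_pattern, input_patterns):
    """Return the concrete input files needed for target, or None if the rule does not apply."""
    match = pattern_to_regex(output_pattern).fullmatch(target)
    if match is None:
        return None
    return [p.format(**match.groupdict()) for p in input_patterns]

inputs = resolve_inputs("sorted/A.bam", "sorted/{sample}.bam", ["mapped/{sample}.bam"])
no_match = resolve_inputs("plots/A.png", "sorted/{sample}.bam", ["mapped/{sample}.bam"])
```

Working backwards from the requested file to the inputs it needs is the same target-driven style as a Makefile, with the shell command or inline Python attached to each rule.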
![Page 15: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/15.jpg)
YesWorkflow
Declare workflow steps as #annotations in existing scripts
Graphical visualization of workflow
http://yesworkflow.org/
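
To make the annotation idea concrete, the sketch below embeds a small script whose comments use the YesWorkflow keywords `@begin`, `@in`, `@out` and `@end`, and then extracts the declared steps with a few lines of Python. The extractor is a hypothetical stand-in written for this illustration, not the YesWorkflow tool itself, and the step and data names are made up.

```python
SCRIPT = '''\
# @begin clean_data
# @in raw_file
# @out clean_file
rows = [line.strip() for line in open("raw.csv")]
# @end clean_data

# @begin plot_data
# @in clean_file
# @out figure
# (plotting code would go here)
# @end plot_data
'''

def extract_steps(script_text):
    """Collect step names with their declared inputs and outputs from '# @...' comments."""
    steps, current = [], None
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#"):
            continue  # only comments can carry annotations
        words = stripped.lstrip("#").split()
        if len(words) < 2 or not words[0].startswith("@"):
            continue
        keyword, name = words[0], words[1]
        if keyword == "@begin":
            current = {"name": name, "in": [], "out": []}
        elif keyword in ("@in", "@out") and current is not None:
            current[keyword[1:]].append(name)
        elif keyword == "@end" and current is not None:
            steps.append(current)
            current = None
    return steps

def data_edges(steps):
    """Link a producing step to a consuming step through each shared data name."""
    return [
        (producer["name"], name, consumer["name"])
        for producer in steps
        for consumer in steps
        for name in producer["out"]
        if name in consumer["in"]
    ]

steps = extract_steps(SCRIPT)
edges = data_edges(steps)
```

The edges are what a graphical view of the script would draw; the script itself keeps running exactly as before, since the annotations are only comments.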
![Page 16: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/16.jpg)
https://github.com/chapmanb/bcbio-nextgen
Distributed workflows for Next-Gen Sequencing analysis
Domain-specific language
Focus on parameters, algorithms
Workflow fixed – no command lines!
https://bcbio-nextgen.readthedocs.org
![Page 17: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/17.jpg)
http://commonwl.org/
Workflow interoperability
Common workflow format
Community based standards effort
Designed for clusters & clouds
Use containers (e.g. Docker)
Textual YAML files
(GUIs available)
Workflow: Steps with data dependencies
Step: command line or inline scripts
Scatter/gather on steps
Rich annotations
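
"Scatter/gather on steps" means running one step independently over each element of a list and then collecting the results. The Python below sketches that pattern with a thread pool; it is not CWL syntax or any CWL runner's API, and `count_words` is a hypothetical step.

```python
from concurrent.futures import ThreadPoolExecutor

def count_words(text):
    """Hypothetical step: invoked once per scattered input."""
    return len(text.split())

def scatter_gather(step, inputs, max_workers=4):
    """Scatter: run step on each input independently. Gather: collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(step, inputs))

documents = ["a b c", "d e", "f"]
counts = scatter_gather(count_words, documents)
total = sum(counts)  # a downstream step consuming the gathered list
```

Because each scattered invocation depends only on its own input, an engine is free to place the invocations on different cluster nodes or cloud instances.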
![Page 18: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/18.jpg)
http://www.commonwl.org/
![Page 19: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/19.jpg)
Containers
Linux Container technology
..light-weight "virtual" virtual machine
A container is started from an image
Images downloaded from Docker Hub
Dockerfile: Layer-based recipe
Philosophy: One service, one image → microservices
Cloud's best friend: scalable, reproducible, customizable
![Page 20: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/20.jpg)
Publish your own container images
https://hub.docker.com/r/openphacts/
Dockerfile
![Page 22: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/22.jpg)
https://view.commonwl.org/
http://doi.org/10.7490/f1000research.1114375.1
![Page 23: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/23.jpg)
Running workflows, tracking provenance
![Page 24: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/24.jpg)
Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved.
http://www.w3.org/TR/prov-overview/
Provenance
W3C standard: PROV
But multiple formats
Multiple styles
Multiple extensions
Best practice for Workflow Provenance?
wfprov (Research Object, Taverna) https://w3id.org/ro/2016-01-28/wfprov/
OPMW/P-Plan (WINGS) http://www.opmw.org
ProvONE (DataONE) http://vcvcomputing.com/provone/provone.html
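
As a rough sketch of what a workflow engine records, the Python below wraps a step so that each run appends statements using core PROV concepts: an activity, the entities it used, and the entity it generated. The statements are written in a PROV-N-like textual style; the `ex:` prefix, the helper name and the `_output` naming are assumptions made for this illustration, and none of the workflow-specific vocabularies listed above is modelled.

```python
def run_with_provenance(name, func, inputs, log):
    """Run func on named inputs and append PROV-style statements to log."""
    activity = f"ex:{name}"
    log.append(f"activity({activity})")
    for input_name in inputs:
        log.append(f"entity(ex:{input_name})")
        log.append(f"used({activity}, ex:{input_name})")
    result = func(**inputs)
    output_name = f"{name}_output"  # hypothetical naming scheme for the result
    log.append(f"entity(ex:{output_name})")
    log.append(f"wasGeneratedBy(ex:{output_name}, {activity})")
    return result

prov_log = []
answer = run_with_provenance("add", lambda a, b: a + b, {"a": 2, "b": 3}, prov_log)
```

The same trace could be serialised in several PROV formats and extended in several ways, which is where the question of best practice for workflow provenance comes from.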
![Page 25: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/25.jpg)
https://twitter.com/ianholmes/status/288689712636493824
![Page 26: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/26.jpg)
Research Object Bundle
application/vnd.wf4ever.robundle+zip
http://www.researchobject.org/
https://doi.org/10.1016/j.websem.2015.01.003
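
A Research Object Bundle is a ZIP archive identified by the media type above. The sketch below builds one in memory with Python's `zipfile`. The layout it uses (an uncompressed `mimetype` entry first, and a JSON manifest at `.ro/manifest.json` listing the aggregated resources) reflects my reading of the bundle specification and should be checked against it before relying on it; the manifest keys shown are a minimal assumption, and the file names and contents are hypothetical.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"

def make_bundle(files):
    """Build an in-memory ZIP laid out like a Research Object Bundle (sketch)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        # Media type first and stored uncompressed, so tools can sniff the archive type.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for path, content in files.items():
            bundle.writestr(path, content)
        # Assumed minimal manifest: the bundle root aggregates every payload file.
        manifest = {
            "@context": ["https://w3id.org/bundle/context"],
            "id": "/",
            "aggregates": [{"uri": "/" + path} for path in files],
        }
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()

bundle_bytes = make_bundle({
    "workflow.cwl": "# hypothetical workflow definition\n",
    "outputs/result.txt": "42\n",
})
```

Packaging the workflow, its data and its provenance trace in one archive is what lets a run be shared and inspected as a single unit.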
![Page 27: 2017-11-03 Scientific Workflow systems](https://reader031.vdocuments.site/reader031/viewer/2022022415/5a6deb6b7f8b9afc578b53eb/html5/thumbnails/27.jpg)
Acknowledgements
Carole Goble
Michael R. Crusoe
Apache Taverna
BioExcel
Common Workflow Language
Research Object