a definition and analysis of the role of meta-workflows in workflow interoperability junaid arshad,...
TRANSCRIPT
A Definition and Analysis of the Role of Meta-workflows in Workflow
Interoperability
Junaid Arshad, Gabor Terstyanszky, Tamas Kiss, Noam WeingartenCenter for Parallel Computing, University of Westminster, London, UK
(J.Arshad, G.Z.Terstyanszky, T.Kiss, Weingan)@westminster.ac.uk
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
2
Scientific Workflows
• Complex computational experiments involving
• Analysis of large volumes of data• Use of distributed/high performance
computing infrastructures (DCIs)
• Individual tasks performing control and data analytics functions
Collect Data
Transform Data
Analytics
Data Store
Flush Outputs
External Aggregation
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
3
Meta-workflows
• Composed of one or more entities such as jobs and/or sub-workflows
• Re-use these existing entities as components
• Faster and more efficient development
Collect Data
Transform Data
Data Store
Flush Outputs
External Aggregation
Analytics
Join
Aggregate
results
Run algorith
m
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
4
Types of Workflows (1)• Single Workflow
• all the nodes of the workflow are individual jobs that are executed and managed by a single workflow system such as P-GRADE [1], Galaxy [2] or Taverna [3]
N1
CN2
CN1
N2
N3
Embedded WF
ce1
e1
e3
e2
Host Workflow System (WS1)
N1
N2
N3
e1
e3
e2
Host Workflow System (WS1)
5A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
N1
CN2
CN1
N2
N3
Embedded WF
ce1
e1
e3
e2
Host Workflow System (WS1)
N1
N2
N3
e1
e3
e2
Host Workflow System (WS1)
Single workflow Native meta-workflow
N1
CN2
CN1
N2N3
e1
e3
e2
Host Workflow System (WS1)
N4e4
CN3
CN2
CN1
WS3 WS2
ce1
ce1
ce2
Non-native meta-workflow
Types of Workflows (2)
• Native meta-workflow• the embedded sub-workflows are all from the host workflow system (WS1)
• Non-native meta-workflow• at least one of the embedded workflows is from external workflow systems
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
6
Non-native meta-workflows
• Two approaches have been established to execute non-native meta-workflows
• Coarse Grained Interoperability (CGI)• Host workflow engine and workflows are considered native whereas all other workflows
are considered non-native• The non-native workflows and workflow engines are managed as legacy applications
• Fine Grained Interoperability (FGI)• A workflow is considered to have two distinguished parts i.e. abstract and concrete• Abstract includes workflow graph and abstract I/O whereas concrete part contains
implementation details• FGI approach creates a IWIR bundle containing IWIR representation of abstract part and
the concrete part.
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
7
Meta-workflow Execution Approaches
Workflow Engine A
Workflow Repository
Submission Service
Workflow Engine B
DCIWF2WF1
WF3
Meta-workflowNative WF
Engine
Non-native workflow
Native WFNative WF
Non-native WF Engine
Coarse-grain Workflow Interoperability usage scenario Transformation of workflows for FGI approach
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
8
Formalizing CGI based Approach
• Based on existing formalization presented in [4]• Objective to improve the usability of existing meta-workflows by
highlighting the requirements of CGI based workflow interoperability approach
• Formalization achieved using the Z-notations
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
9
WFmeta =: {J1…Jk, WFnat1…WFnatl, WFnnt1,..WFnntm}, where Jnath - native job h = 1 . . . k
Wfnati - native workflow i = 1 . . . lWFnntj - non-native workflow j = 1 . . . m.
Existing formalization
Formal description of the CGI approach using Z
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
10
Types of CGI-based meta-workflow
• Focused on CGI-based meta-workflows• Aimed to facilitate workflow developers to design new workflows by
enabling them to comprehend with attributes and semantics of each type of workflow
• Following types of meta-workflows consideredi. Single job meta-workflowii. Linear multi-job meta-workflowiii. Parallel multi-job meta-workflow iv. Parameter sweep meta-workflow
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
11
Single job meta-workflow
• A job representing this workflow can be a simple native job, a native workflow or even a non-native workflow
Jn = {Jabs, Jcnf, Jcnr, Jennat}
where Jn : native job, Jabs : abstract description of the job, Jcnf : configuration of the job, Jcnr : implementation of the job Jennat : native workflow engine
WFn = {WFabs, WFcnr, WFcnf, WFenn}
where WFabs : abstract part of workflow WFcnr : concrete part of workflow WFcnf : workflow configuration WFenn : native workflow engine
WFnn = {TASKS, WFabs, WFcnr, WFcnf, WFennn}
where Jnn = {Jabs, Jcnf, Jcnr, Jennnat} Jnn ∈ TASKS WFnn : non-native workflow WFennn : non-native workflow engine
Single native job Single native workflow Single non-native workflow
1 2
Job0
Single Job Meta-Workflow
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
12
Linear multi-job meta-workflow
• A pipeline of multiple jobs in the native workflow system where any (or even all) of these jobs can be native jobs, native or non-native workflows
WFlmj = {J1…, Jm}={Jn1…, Jnk, WFn1, …,WFny, WFnn1, …,WFnnz} where Jj job of the linear multi-job meta-workflow, j = 1 … m Jni native job J∈ j, i = 1 … k WFnp native workflow J∈ j, p = 1… y WFnnt non-native workflow, J∈ j, t = 1… z
If Jx and Jy are two consecutive jobs and Jx is not the last job of the workflow, there must be an edge ei = (Pxo, Pyi) where Pxo : output port of job Jx Pyi : input port of job Jy
Job0
1 2 12 1 2
Job1 Job2
Linear multi-job meta-workflow
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
13
Parallel multi-job meta-workflow• A native workflow including parallel branches which can include either single jobs
or native and non-native workflows, i.e. composed of several single and/or linear multi-job meta-workflows
WFpmj = {J1…, Jm}={Jn1…, Jnk, WFn1, …,WFny, WFnn1, …,WFnnz} where Jj job of the parallel multi-job meta-workflow, j = 1 … m Jni native job J∈ j, i = 1 … k WFnp native workflow J∈ j, p = 1… y WFnnt non-native workflow, J∈ j, t = 1… z
A parallel multi-job meta-workflow must contain an edge that connects split and worker jobs es = (Pgo, Pwxi), where Pso : output port of the split job Jwx : worker job and x = 1 … k Pwxi : input port of the worker job Jwx
and, an edge em such that connects worker and merge jobs em = (Pwxo, Pmi), where Pwxo : output port of a worker job Jwx and x = 1 … k Pmi : input port of the merge job Jm and i = 1 … k
1
2
1
1 1
1
2
2
32
4
Job0
Job1
Job2 Job3
Job4
Parallel multi-job meta-workflow
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
14
Parameter-sweep meta-workflow• A parameter sweep workflow in the native workflow system that
includes one or more non-native workflows
1
1
1
1
2 2
2
Generator
Job2Job1
Collector
WFps = = {J1…, Jm}={Jn1…, Jnk, WFn1, …,WFny, WFnn1, …,WFnnz} where Jj job of the parallel sweep meta-workflow, j = 1 … m Jni native job J∈ j, i = 1 … k WFnp native workflow J∈ j, p = 1… y WFnnt non-native workflow, J∈ j, t = 1… z
Connections between the generator, collector and the worker jobs of the parameter sweep meta-workflow are described by two specific types of edges eg = (Pgo, Pwxi) where
Pgo : output port of the generator job Jg
Pwxi : input port of the worker job Jwx and x = 1 … kand ec = (Pwxo, Pci) where
Pwxo : output of a worker job Jwx and x = 1 … kPci : input port of collector job Jc and i = 1 … k Parameter-sweep meta-workflow
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
15
SHIWA Simulation Platform (1)• The simulation platform contains
• science gateway SHIWA Science Gateway• a submission service SHIWA Submission Service• a workflow repository SHIWA Repository• a data transfer service Data Avenue Service
• Enables data, infrastructure and workflow interoperability at both coarse- and fine-grained level
• Supports the whole workflow life-cycle addressing the challenges of• executing workflows of different workflow systems as non-native workflows• combining workflows of different workflow systems into meta-workflows• running these workflows on different DCIs
• The gateway is integrated with the WS-PGRADE Workflow System which is used as native workflow engine in the simulation platform
• Supports ASKALON, Galaxy, GWES, Kepler, MOTEUR, Pegasus, ProActive, Taverna, Triana
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
16
SHIWA Simulation Platform (2)
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
17
Meta-workflows in Heliophysics (1)• Simulating complex scenarios such as propagation of Coronal Mass
Ejection (CME)• Two types of meta-workflows used
• Linear multi-job meta-workflow• Investigation of the fastest CME in a given period of time using an embedded Taverna
workflow• Parameter sweep meta-workflow
• finds and propagates the fastest CME over a given time range• The WS-PGRADE parameter sweep extension of the Taverna meta-workflow performs the
same operation over many time ranges
• Experiments performed using meta-workflow support in SHIWA Simulation Platform
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
18
Linear multi-job meta-workflow for CME propagation Parameter sweep meta-workflow in WS-PGRADE
Meta-workflows in Heliophysics (2)
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
19
Conclusions and Further Ahead
• Meta-workflows represent an exciting but challenging prospect within the context of workflow execution and interoperability
• Formalization for workflow types and CGI based approach for workflow interoperability has been achieved
• Significant contribution to facilitate design and development of new and existing workflows and workflow engines
• An example use case from Heliophysics has been presented demonstrating the effectiveness of the efforts for broader scientific domains
• Envisage expanding the scope of formalization to fine grained workflow interoperability approach
A Definition and Analysis of the Role of Meta-workflows in Workflow Interoperability
20
References
• [1] A. Kertész, G. Sipos and P. Kacsuk; Brokering multi-grid workflows in the P-GRADE portal, in: Euro-Par 2006: Parallel Processing, vol. 4375, Springer, Berlin, 138–149, 2007.
• [2] Giardine, B., Riemer et al.; Galaxy: A platform for interactive large-scale genome analysis. Genome Research, 15(10):1451-5, 2005.
• [3] Oinn, T., Addis et al.; Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics, 20(17):3045-54, 2004
• [4] Terstyanszky, G. et al; Enabling Scientific Workflow Sharing Through Coarse Grained Interoperability in the Future Generation Com[puter Systems Vol 37, 46-59, 2014.