TRANSCRIPT
© 2006 Open Grid Forum
Workflow Management Research Group - WFM-RG
Chairs: Ian Taylor and Ewa Deelman
Secretaries: Andrew Harrison and Matthew Shields
GridNet2 Activities for the WFM Research Group
Matthew Shields
18th October 2007
Group Focus
• The group was pretty quiet at OGF 17, 18 and 19
• Taylor became co-chair in April 2007; Shields and Harrison became secretaries
• New focus for the group based on use-case gathering in two specific areas:
  • Workflow Sharing (OGF 20)
  • Workflow Interoperability (OGF 21)
The Group's New Focus
• There are a number of options for standards:
  • Standardise scientific workflow somehow
  • Or accept there are a number of co-existing workflow systems
• Encourage reuse
• Encourage sharing
• What interfaces are needed for this?
• Are there use cases for sharing and interoperability?
There are many Workflow systems…
• 27 chapters, from applications and environments:
  • BPEL, Taverna, Triana, Pegasus, Kepler, P-Grade, Sedna, ICENI, Java CoG Workflow, Condor, ASKALON, Swift, Petri nets, and so on …
• All successfully used - the choice depends on requirements and politics :)
Motivation
• Focus:
  • Accept co-existing workflow representations/environments
  • How do we share/reuse workflows?
  • Focus on the scientist performing the experiment
    • How can sharing help him/her?
• Workflow interoperability:
  • Do we need this? Use cases?
  • Workflow embedding?
Interoperability Levels
• Enactment level
  • Triana, Kepler etc.
• Representation level
  • BPEL, SCUFL etc.
• Data level
  • Files, XML Schema etc.
• Metadata level
  • Provenance, data description etc.
• Community level
  • Cross-domain algorithms, results sharing etc.
Enactment Level
• Workflow enactment engines are complex. They usually have:
  • Their own component architecture
  • Their own flow mechanisms
    • e.g. dataflow, control flow, or a mixture of the two
  • Their own history!
    • Many person-hours
    • Specialisms in certain domains
    • Designers/developers with particular research interests
• They are usually tightly coupled to their chosen workflow representation.
Representation Level
• Workflow languages for distributed computing differ, particularly when it comes to control structures
  • e.g. ASKALON's Abstract Grid Workflow Language (AGWL) supports if, forEach, while
  • Triana taskgraphs are really just component dependency graphs
• They have different levels of abstraction
  • And therefore are coupled to their target environment in different ways
  • This is not bad - it means some languages are better suited to certain environments, others are not
• The tight coupling between enactment engine and workflow representation means:
  • Changing the representation means changing the enactment engine
  • Homogenising existing systems would destroy their history and their domain- and environment-specific strengths
Data Level
• Data-level interoperability allows workflows to be treated as black boxes:
  • What does it consume?
  • What does it spit out?
• Data level is not service level
  • How do I get it to consume and spit out?
• The data level is the pivot between the other levels. Once you have interoperable data you can:
  • Represent and enact a workflow in your own favorite way
  • Contextualize the data with metadata, e.g. provenance
  • Share the data and metadata in a community
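The black-box view above can be sketched in code. This is a minimal illustration, not any real Triana or Kepler API: each workflow is opaque, described only by the named data it consumes and produces, so two workflows from engines with incompatible internals can still be chained.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Data = Dict[str, bytes]  # named chunks of opaque data

@dataclass
class BlackBoxWorkflow:
    """A workflow viewed only through what it consumes and produces.

    The engine that enacts it, and the language it is written in,
    are deliberately invisible at this level.
    """
    name: str
    consumes: set                 # names of required inputs
    produces: set                 # names of outputs it promises
    run: Callable[[Data], Data]   # the opaque enactment itself

    def enact(self, inputs: Data) -> Data:
        missing = self.consumes - inputs.keys()
        if missing:
            raise ValueError(f"{self.name} is missing inputs: {missing}")
        outputs = self.run(inputs)
        assert self.produces <= outputs.keys(), f"{self.name} broke its contract"
        return outputs

# Two toy workflows standing in for different engines; they interoperate
# because at this level data is just named bytes at an address.
upcase = BlackBoxWorkflow("upcase", {"text"}, {"text"},
                          lambda d: {"text": d["text"].upper()})
count = BlackBoxWorkflow("count", {"text"}, {"length"},
                         lambda d: {"length": str(len(d["text"])).encode()})

result = count.enact(upcase.enact({"text": b"hello"}))
# result["length"] == b"5"
```

The point of the sketch is that neither workflow needs to know the other's representation or enactment engine; the pipe between them carries only data.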
Workflow Interoperability
• Enactment level
• Representation level
• Data level
• Metadata level
• Community level
[Diagram: the five levels arranged along a private-to-public axis]
More Use Cases in the Area
• At OGF 20, there were 7 talks focused on sharing workflows
• At OGF 21, there are 4 talks on interoperability and workflow embedding
• Recently, there was an NSF workshop on interoperability
• We summarise these ideas/examples/workshops here:
OGF 20 - Manchester
• Standards work from WfMC, OASIS, OMG
  • Gateways -> shared definition (XPDL) -> protocol compatibility (Wf-XML/ASAP)
• Sharing abstract vs concrete, taskgraph vs services
• Data-intensive workflows; sharing optimisation information
• Sharing through social networks
  • Shibboleth extensions for workflow and service security
  • Common model for securing services across VOs
• QoS parameters across federated providers
OGF 21 - Seattle
• NSF/Mellon Workshop on Scientific and Scholarly Workflows
• Users don't always want interoperability
  • Academic exercise?
• Users do need to know the capabilities of systems
• Challenges:
  • Fault tolerance, parallelism
  • Long-running workflows
• Driving forces in interoperability:
  • Scientific use case: hybrid coupled model
  • Author in one system, execute in another
Workflow Interoperability through data interoperability - Our perspective
• Focus on sharing data:
  • Make getting and sending it as simple as possible
  • Make the data available in as flexible a way as possible
• We are exploring RESTful approaches:
  • Expose and provide access to data simply - CRUD
  • Makes no demands on how the data is interpreted
    • Data is just 'stuff' at an address
  • Does not enforce typing of data (unlike WS interfaces)
    • I can choose whether to perceive the data as a stream, a file, a programming-language object etc.
  • Good at binary data
  • The Atom feed format and publishing protocol allow more complex interactions beyond simple request/response pairs
    • E.g. tracking job status through time
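Tracking job status through an Atom feed could look like the sketch below. The feed shape (one entry per status change, with the status in the entry title) is an illustrative assumption, not the actual WHIP protocol; only the Atom namespace and element names are standard.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# An Atom feed a workflow service might publish: each entry records one
# status change for a job. Putting the status in <title> is our assumed
# convention here, not a fixed WHIP rule.
feed_xml = """\
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Job 42 status</title>
  <entry>
    <title>submitted</title><updated>2007-10-18T09:00:00Z</updated>
  </entry>
  <entry>
    <title>running</title><updated>2007-10-18T09:05:00Z</updated>
  </entry>
  <entry>
    <title>done</title><updated>2007-10-18T10:12:00Z</updated>
  </entry>
</feed>
"""

def status_history(xml_text):
    """Return (status, timestamp) pairs in feed order."""
    root = ET.fromstring(xml_text)
    return [(e.findtext(ATOM + "title"), e.findtext(ATOM + "updated"))
            for e in root.findall(ATOM + "entry")]

history = status_history(feed_xml)
# history[-1] is the latest status change: ("done", "2007-10-18T10:12:00Z")
```

Because the feed is plain data at an address, any client that can fetch and parse XML can follow a job's history, without knowing anything about the enactment engine behind it.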
Sharing and interoperability - Cardiff’s contributions
• Workflows Hosted in Portals (WHIP) is addressing both sharing and interoperability themes
WHIP - Sharing
• Extending myExperiment to support:
  • Exposing Triana workflows
  • Launching Triana locally
  • Passing workflows to local Triana
  • Uploading workflows to the server from local Triana
• Integrating the WHIP archiving format:
  • Enables upload/download of compound objects (workflow description, executable code, metadata)
  • Enables signing of workflow archives using X.509 certificates
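A compound archive of this kind can be sketched as a zip bundle. The file names and manifest layout are hypothetical, not the real WHIP format, and a SHA-256 digest stands in for the X.509 signing step (real signing would use a certificate and a crypto library).

```python
import hashlib
import io
import json
import zipfile

def make_archive(workflow_xml: bytes, metadata: dict) -> bytes:
    """Bundle a workflow description and its metadata into one archive.

    A real WHIP archive would also carry executable code and an X.509
    signature; here a content digest in a MANIFEST file stands in for
    the signature, just to show the compound-object idea.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("workflow.xml", workflow_xml)
        zf.writestr("metadata.json", json.dumps(metadata))
        digest = hashlib.sha256(workflow_xml).hexdigest()
        zf.writestr("MANIFEST", f"sha256(workflow.xml)={digest}\n")
    return buf.getvalue()

def read_archive(blob: bytes) -> dict:
    """Unpack an archive back into a name -> bytes mapping."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}

archive = make_archive(b"<workflow/>", {"author": "example"})
contents = read_archive(archive)
```

The value of the compound object is that description, code and metadata travel together, so a receiving portal or engine gets everything it needs to verify and re-enact the workflow in one upload.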
WHIP - Interoperability
• Use case:
  • Kepler wants to use Triana data-mining workflow tools from within the Kepler environment
  • These components are part of larger workflows that are 'native' to Kepler
• We are looking at embedding workflows as a solution:
  • Does not impose alien workflow representations onto enactment engines
  • Uses the WHIP archiving format
  • Uses a RESTful approach, e.g. the Atom syndication format and publishing protocol for sending and retrieving data and job/workflow status
Use Cases @ Cardiff
• Sharing:
  • There are a number of projects looking at sharing workflows, e.g. myExperiment
  • At Cardiff we have WHIP, Music Information Retrieval in Triana (DART), Omer (CATNETS, FAEHIM, The Provenance Project)
• Interoperability:
  • At Cardiff we have identified many levels of interoperability for Triana, for example:
    • Workflow embedding: looking into Kepler-Triana interoperability
      • Kepler wants to make use of Triana's data mining tools - DMG, FAEHIM (Rana)
      • Reuse existing work by embedding Triana rather than re-inventing the wheel
    • Graphical workflow editing: Triana-Pegasus integration
      • Triana used to graphically edit DAGMan workflows; current and on-going
      • Might extend to execute Pegasus workflows also
WFM-RG
• WFM-RG has started a research document
  • Shields is editor, gathering more use cases from the community
  • Send us yours!
• This is a current, on-going activity for the group
• Email use cases to: [email protected]