
© 2006 Open Grid Forum

Workflow Management Research Group - WFM-RG

Chairs: Ian Taylor and Ewa Deelman

Secretaries: Andrew Harrison and Matthew Shields

GridNet2 Activities for WFM Research Group
Matthew Shields

18th October 2007


Group Focus

• The group was fairly quiet at OGF 17, 18 and 19

• Ian Taylor became co-chair in April 2007; Matthew Shields and Andrew Harrison became secretaries

• New focus for the group based on use-case gathering in two specific areas:
  • Workflow Sharing (OGF 20)
  • Workflow Interoperability (OGF 21)


New Group Focus

• There are a number of options for standards:
  • Standardise scientific workflow somehow

• Or accept there are a number of co-existing workflow systems
  • Encourage reuse
  • Encourage sharing
  • What interfaces are needed for this?
  • Are there use cases for sharing and interoperability?

There are many Workflow systems…

• 27 chapters, from applications and environments
  • BPEL, Taverna, Triana, Pegasus, Kepler, P-GRADE, Sedna, ICENI, Java CoG Workflow, Condor, ASKALON, Swift, Petri nets, and so on…
• Successfully used; the choice depends on requirements and politics :)


Motivation

• Focus
  • Accept co-existing workflow representations/environments
  • How do we share/reuse workflows?
  • Focus on the scientist performing the experiment
  • How can sharing help him/her?

• Workflow Interoperability
  • Do we need this? Use cases?
  • Workflow embedding?


Interoperability Levels

• Enactment level: Triana, Kepler, etc.

• Representation level: BPEL, SCUFL, etc.

• Data level: files, XML Schema, etc.

• Metadata level: provenance, data description, etc.

• Community level: cross-domain algorithms, results sharing, etc.


Enactment Level

• Workflow enactment engines are complex. They usually have:
  • Their own component architecture
  • Their own flow mechanisms, e.g. dataflow, control flow, or a mixture of the two
  • Their own history!
    • Many person-hours invested
    • Specialisms in certain domains
    • Designers/developers with particular research interests

• They are usually tightly coupled to their chosen workflow representation.


Representation Level

• Workflow languages for distributed computing differ, particularly when it comes to control structures
  • e.g. ASKALON’s Abstract Grid Workflow Language (AGWL) supports if, forEach, while
  • Triana taskgraphs are really just component dependency graphs (a rough contrast is sketched below)

• They have different levels of abstraction
  • And are therefore coupled to their target environment in different ways
  • This is not bad: it means some languages are better suited to certain environments, others are not

• The tight coupling between enactment engine and workflow representation means:
  • Changing the representation means changing the enactment engine
  • Homogenising existing systems would destroy their history and their domain- and environment-specific strengths
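To make the contrast concrete, here is a rough Python illustration of the two styles. It is neither actual AGWL syntax nor the real Triana taskgraph format; the field names and the three-step workflow are hypothetical.

```python
# Illustration only: not real AGWL or Triana syntax, just a contrast between
# an explicit-control-flow representation and a plain dependency graph.

# Control-flow style (AGWL-like): if / forEach / while are first-class.
control_flow_workflow = {
    "activities": ["prepare", "simulate", "analyse"],
    "control": [
        {"forEach": {"items": "input_files", "do": "simulate"}},
        {"while": {"condition": "quality < threshold", "do": "simulate"}},
    ],
}

# Dependency-graph style (Triana-taskgraph-like): only data dependencies;
# any iteration or branching lives inside the components themselves.
dependency_graph_workflow = {
    "tasks": ["prepare", "simulate", "analyse"],
    "edges": [("prepare", "simulate"), ("simulate", "analyse")],
}
```

The point is not the concrete syntax but the level of abstraction: the first form needs an engine that understands loops and conditions, while the second can be enacted by anything able to order a graph of dependencies.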


Data Level

• Data-level interoperability allows workflows to be treated as black boxes (see the sketch below)
  • What does it consume?
  • What does it spit out?

• Data level is not service level
  • How do I get it to consume and spit out?

• Data level is the pivot between the other levels. Once you have interoperable data you can:
  • Represent and enact a workflow in your own favourite way
  • Contextualise the data with metadata, e.g. provenance
  • Share the data and metadata within a community
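As a rough illustration of the black-box idea, the sketch below describes a workflow purely by what it consumes and what it produces. The field names and the example workflow are hypothetical, not a format the group has defined.

```python
# Hypothetical black-box description of a workflow: inputs and outputs only,
# nothing about its internal representation or enactment engine.
black_box = {
    "workflow": "data-mining-run",  # hypothetical example
    "consumes": [
        {"name": "training_set", "media_type": "text/csv"},
        {"name": "parameters", "media_type": "application/xml"},
    ],
    "produces": [
        {"name": "model", "media_type": "application/octet-stream"},
        {"name": "provenance", "media_type": "application/xml"},
    ],
}
```

Any system that can supply the inputs and accept the outputs can reuse the workflow, regardless of how it is represented or enacted internally.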


Workflow Interoperability

• Enactment level

• Representation level

• Data level

• Metadata level

• Community level

(Figure: the five levels arranged on a spectrum from private to public.)


More Use Cases in the Area

• In OGF 20, there were 7 talks focused on sharing workflows

• In OGF 21, there are 4 talks on interoperability - workflow embedding

• Recently, there was an NSF workshop on interoperability

• We provide a summary of these ideas/examples/workshops here:


OGF 20 - Manchester

• Standards work from WfMC, OASIS, OMG
  • Gateways -> shared definition (XPDL) -> protocol compatibility (Wf-XML/ASAP)

• Sharing abstract vs concrete, taskgraphs vs services

• Data-intensive workflows, sharing optimisation information

• Sharing through social networks

• Shibboleth extensions for workflow and service security
  • Common model for securing services across a VO

• QoS parameters across federated providers


OGF 21 - Seattle

• NSF/Mellon Workshop on Scientific and Scholarly Workflows

• Users don’t always want interoperability
  • Is it an academic exercise?

• Users do need to know the capabilities of systems

• Challenges
  • Fault tolerance, parallelism
  • Long-running workflows

• Driving forces in interoperability
  • Scientific use case: hybrid coupled models
  • Author in one system, execute in another


Workflow Interoperability through data interoperability - Our perspective

• Focus on sharing data
  • Make getting and sending it as simple as possible
  • Make the data available in as flexible a way as possible

• We are exploring RESTful approaches (see the sketch below):
  • Expose and provide access to data simply: CRUD
  • Make no demands on how the data is interpreted
    • Data is just ‘stuff’ at an address
    • Does not enforce typing of data (as WS interfaces do)
    • I can choose whether to perceive the data as a stream, a file, a programming-language object, etc.
  • Good with binary data
  • The Atom feed format and publishing protocol allow more complex interactions beyond simple request/response pairs, e.g. tracking job status through time
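A minimal sketch of this RESTful style, assuming a hypothetical service at example.org and using Python's requests library; the URLs and resource layout are illustrative, not an interface the group has specified.

```python
# Hypothetical CRUD access to a workflow run's data, plus an Atom status feed.
import requests
import xml.etree.ElementTree as ET

BASE = "http://example.org/workflows/run-42"  # hypothetical resource

# Create/update: PUT input data at an address, as opaque bytes.
requests.put(BASE + "/inputs/training_set",
             data=b"raw input bytes",
             headers={"Content-Type": "application/octet-stream"})

# Read: GET a result; the client decides how to interpret the bytes.
result = requests.get(BASE + "/outputs/model").content

# Delete: remove the run's data when it is no longer needed.
requests.delete(BASE)

# Track job status through time via an Atom feed, one entry per status change.
feed = ET.fromstring(requests.get(BASE + "/status.atom").content)
ns = {"atom": "http://www.w3.org/2005/Atom"}
for entry in feed.findall("atom:entry", ns):
    print(entry.findtext("atom:updated", namespaces=ns),
          entry.findtext("atom:title", namespaces=ns))
```

Because every operation is plain HTTP on an address, nothing here depends on which workflow system produced or will consume the data.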


Sharing and interoperability - Cardiff’s contributions

• Workflows Hosted in Portals (WHIP) is addressing both sharing and interoperability themes


WHIP - Sharing

• Extending myExperiment to support:
  • Exposing Triana workflows
  • Launching Triana locally
  • Passing workflows to local Triana
  • Uploading workflows to the server from local Triana

• Integrating the WHIP archiving format (a rough sketch follows)
  • Enables upload/download of compound objects (workflow description, executable code, metadata)
  • Enables signing of workflow archives using X.509 certificates.
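The sketch below illustrates the compound-object idea only. It is not the real WHIP archive layout; the file names and metadata fields are hypothetical, and a SHA-256 digest stands in for the X.509 signing step.

```python
# Hypothetical compound object: workflow description + executable code +
# metadata bundled into one archive that can be shared as a unit.
import hashlib
import json
import zipfile

with zipfile.ZipFile("shared-workflow.zip", "w") as archive:
    archive.writestr("workflow.xml", "<workflow/>")   # workflow description
    archive.writestr("components.jar", b"")           # executable code (placeholder)
    archive.writestr("metadata.json", json.dumps({
        "author": "example-user",
        "system": "Triana",
        "description": "Data-mining workflow",
    }))

# Digest of the archive; in WHIP this would instead be signed with the
# author's X.509 certificate so consumers can verify who published it.
digest = hashlib.sha256(open("shared-workflow.zip", "rb").read()).hexdigest()
print("archive digest:", digest)
```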


WHIP - Interoperability

• Use case:
  • Kepler wants to use Triana’s data-mining workflow tools from within the Kepler environment
  • These components are part of larger workflows that are ‘native’ to Kepler

• We are looking at embedding workflows as a solution
  • Does not impose alien workflow representations onto enactment engines
  • Uses the WHIP archiving format
  • Uses a RESTful approach, e.g. the Atom syndication format and publishing protocol, for sending and retrieving data and job/workflow status (see the sketch below)
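As an illustration of the Atom publishing protocol interaction, the sketch below posts a hypothetical job-status entry to a status collection; the URL and entry fields are assumptions, not part of any published WHIP interface.

```python
# Publish a job-status update as an Atom entry (AtomPub POST).
import datetime
import requests
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)

entry = ET.Element("{%s}entry" % ATOM)
ET.SubElement(entry, "{%s}title" % ATOM).text = "run-42: data-mining step finished"
ET.SubElement(entry, "{%s}updated" % ATOM).text = (
    datetime.datetime.now(datetime.timezone.utc).isoformat())
ET.SubElement(entry, "{%s}content" % ATOM).text = "status=COMPLETED"

# POST the entry to the (hypothetical) status collection; the server replies
# with the created entry and its Location header.
response = requests.post(
    "http://example.org/workflows/run-42/status",
    data=ET.tostring(entry, encoding="utf-8"),
    headers={"Content-Type": "application/atom+xml;type=entry"})
print(response.status_code, response.headers.get("Location"))
```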


Use Cases @ Cardiff

• Sharing
  • There are a number of projects looking at sharing workflows, e.g. myExperiment
  • At Cardiff we have WHIP, Music Information Retrieval in Triana (DART), and Omer Rana’s projects (CATNETS, FAEHIM, The Provenance Project)

• Interoperability
  • At Cardiff we have identified many levels of interoperability for Triana, for example:
  • Workflow embedding:
    • Looking into Kepler-Triana interoperability
    • Kepler wants to make use of Triana’s data-mining tools: DMG, FAEHIM (Rana)
    • Reuse existing work by embedding Triana rather than reinventing the wheel
  • Graphical workflow editing:
    • Triana-Pegasus integration
    • Triana used to graphically edit DAGMan workflows; current and ongoing
    • Might extend to executing Pegasus workflows as well


WFM-RG

• WFM-RG has started a research document
  • Shields is the editor, gathering more use cases from the community
  • Send us yours!

• This is the current ongoing activity for the group
  • Email use cases to: [email protected]