“Workflow” in Data Access and Integration
An OGSA-DAI/DAIS Perspective
Mario Antonioletti
EPCC
e-Science Workflow Services - www.ogsadai.org.uk 2
Talk Overview
Background: OGSA-DAI and DAIS
Motivation and Definitions
Hierarchies of Service Coordination
Conclusions
OGSA-DAI and DAIS
GGF DAIS WG
  Database Access and Integration Services
  Attempting to standardise interfaces based on OGSI
OGSA-DAI
  Aim to provide an implementation of DAIS
  Serve UK e-Science Community
OGSA-DAI and DAIS
  Currently not aligned
  Data service interface in OGSA-DAI coarse grained
    Based on an earlier version of DAIS
  Data service interface in DAIS currently fine grained
    Scope for more coarse grained interfaces
  OGSA-DAI will realign with DAIS once the latter stabilises
OGSA-DAI Project Partners
Powered by ….
Simple Data Service Scenario
[Diagram: a Client, a Data Service, and three Data Resources]
1. Provides access to a data resource.
2. May provide integration of several data resources.
Some Definitions
Data Resource
  An object that can source/sink data
  Currently databases in scope
    Files and file systems may come in scope
Data Services
  Grid services
  Provides common interface to data resources
  Exposes some capabilities of a data resource
    SQL Queries, XPath, BinX, …
  Can also provide additional capabilities
    Transformations, Third party data delivery, etc …
Motivation
Want common interfaces for:
  Data access
  Data integration
As requests to data service may produce lots of data
  Want to minimise data movement
Hence encapsulate interactions with service
  Serialise multiple interactions into one interaction
  Abstract each interaction into an “activity”
  Data flows between activities
  Use a document mechanism to describe this
DAIS and OGSA-DAI
  Concerned with data flow
  Currently do not have control constructs
    No looping, conditionals, splits, joins, …
Service Coordination Patterns
[Diagram: a Client, Data Services, and a set of Services]
1. Coordination of activities performed at one Data Service.
2. Client choreographs a set of services to work together, or a service may orchestrate on behalf of the client.
3. Orchestration of services using a document directed to one service.
4. Possibly interface with standard workflow languages, e.g. BPEL4WS, WSCI, …
Coordination Hierarchies
Service coordination may take place:
  Intra service
    Document based
  Inter service – application driven
    Choreographed/orchestrated by a client or service
  Inter service – document driven
    Orchestration
    Ideally would look the same as the intra service document based interface
  Combined with other workflow languages
Intra Service Processing
Service processing described by a document
Possible activities (OGSA-DAI perspective):
  Statement
    SQL Query, XPath Query
  Delivery
    Input data from third party
    Output data to a third party
    Deliver data in the response
  Transformations
    XSL Transformations, compression
OGSA-DAI has produced a framework for this
Simple Example: no data flow
[Diagram: sqlQueryStatement and DeliverToURL, with no connection between them]
<sqlQueryStatement name="statement">
  <expression>
    select * from myTable where id=10
  </expression>
</sqlQueryStatement>
<deliverToURL name="deliverOutput">
  <toURL>ftp://anon:[email protected]/home</toURL>
</deliverToURL>
Simple Example: with data flow
[Diagram: sqlQueryStatement feeding its output into DeliverToURL]
<sqlQueryStatement name="statement">
  <expression>
    select * from myTable where id=10
  </expression>
  <resultSetStream name="output1"/>
</sqlQueryStatement>
<deliverToURL name="deliverOutput">
  <fromLocal from="output1"/>
  <toURL>ftp://anon:[email protected]/home</toURL>
</deliverToURL>
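The data flow in this example can be read off the document mechanically. The Python sketch below is illustrative only and is not part of OGSA-DAI: it wraps the two activities in a hypothetical <perform> root so they parse as one document, uses a placeholder FTP URL, and assumes that a child element with a `name` attribute declares an output stream while a child with a `from` attribute consumes one. The function name data_flow_edges is made up for this example.

```python
import xml.etree.ElementTree as ET

# Fragment from the slide, wrapped in a hypothetical <perform> root so that
# it parses as a single document. The FTP URL is a placeholder.
FRAGMENT = """
<perform>
  <sqlQueryStatement name="statement">
    <expression>select * from myTable where id=10</expression>
    <resultSetStream name="output1"/>
  </sqlQueryStatement>
  <deliverToURL name="deliverOutput">
    <fromLocal from="output1"/>
    <toURL>ftp://example.org/home</toURL>
  </deliverToURL>
</perform>
"""

def data_flow_edges(xml_text):
    """Return (producer, stream, consumer) triples for a perform fragment.

    Assumption: a child element carrying a `name` attribute declares an
    output stream, and a child carrying a `from` attribute consumes one.
    """
    root = ET.fromstring(xml_text)
    producers = {}   # stream name -> name of the activity that produces it
    consumers = []   # (stream name, name of the activity that consumes it)
    for activity in root:
        for child in activity:
            if "name" in child.attrib:
                producers[child.get("name")] = activity.get("name")
            if "from" in child.attrib:
                consumers.append((child.get("from"), activity.get("name")))
    # Keep only inputs that match a declared output stream.
    return [(producers[s], s, c) for s, c in consumers if s in producers]

print(data_flow_edges(FRAGMENT))
# prints [('statement', 'output1', 'deliverOutput')]
```

Without the resultSetStream and fromLocal elements the function returns an empty list, which corresponds to the "no data flow" variant of the example.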
The Perform Document
<?xml version="1.0" encoding="UTF-8"?>
<gridDataServicePerform
    xmlns="http://ogsadai.org.uk/namespaces/2003/07/gds/types"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ogsadai.org.uk/namespaces/2003/07/gds/types
    ../../../../schema/ogsadai/xsd/activities/activities.xsd">
  <documentation>
    This example performs a simple select statement to retrieve
    one row from the test database. The results are delivered
    within the response document.
  </documentation>
  <sqlQueryStatement name="statement">
    <expression>
      select * from littleblackbook where id=10
    </expression>
    <resultSetStream name="output"/>
  </sqlQueryStatement>
  <deliverToURL name="deliverOutput">
    <fromLocal from="output"/>
    <toURL>ftp://anon:[email protected]/home</toURL>
  </deliverToURL>
</gridDataServicePerform>
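A client does not have to write such a document by hand. As a rough sketch, the Python below builds the same query-then-deliver structure with the standard library. The namespace and element names are copied from the document above; the function name build_perform, its parameters, and the placeholder FTP URL are assumptions made for illustration and are not part of any OGSA-DAI client API.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the perform document on the slide.
GDS_NS = "http://ogsadai.org.uk/namespaces/2003/07/gds/types"

def build_perform(sql, to_url, stream="output"):
    """Build a query-then-deliver perform document as an XML string.

    Hypothetical helper: the same stream name is used for the statement's
    output and for the delivery's input, so the two are always wired up.
    """
    ET.register_namespace("", GDS_NS)  # serialise GDS_NS as the default namespace

    def tag(local):
        return "{%s}%s" % (GDS_NS, local)

    root = ET.Element(tag("gridDataServicePerform"))
    stmt = ET.SubElement(root, tag("sqlQueryStatement"), name="statement")
    ET.SubElement(stmt, tag("expression")).text = sql
    ET.SubElement(stmt, tag("resultSetStream"), name=stream)
    deliver = ET.SubElement(root, tag("deliverToURL"), name="deliverOutput")
    # `from` is a Python keyword, so the attribute is passed in a dict.
    ET.SubElement(deliver, tag("fromLocal"), {"from": stream})
    ET.SubElement(deliver, tag("toURL")).text = to_url
    return ET.tostring(root, encoding="unicode")

# The URL below is a placeholder, not the one from the slide.
doc = build_perform("select * from littleblackbook where id=10",
                    "ftp://example.org/home")
print(doc)
```

Generating the stream name once and reusing it for both elements avoids the mismatched-name mistakes that are easy to make when editing the XML manually.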
Predefined Building Blocks
sqlQueryStatement
sqlStoredProcedure
sqlUpdateStatement
sqlBulkLoadRowset
xPathStatement
xUpdateStatement
xQueryStatement
xmlResourceManagement
xmlCollectionManagement
relationalResourceManager
gzipCompression
zipArchive
xslTransform
inputStream
outputStream
DeliverFromURL
DeliverToURL
DeliverToGFTP
DeliverFromGFTP
DeliverToStream
DeliverFromGDT
DeliverToGDT
Activities: positives
Simple sequence pattern
  Data-flow
Avoid multiple message exchanges
Minimise data movement
Extensible
  XML Schema excerpt gives syntax
  Associate an implementation with activity
  Done at configuration
Allows optimisation
  Enactment engine can optimise interaction
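The idea of associating an implementation with each activity name can be pictured with a toy enactment loop. Everything in this Python sketch is hypothetical: REGISTRY, enact, fake_query and fake_delivery are invented names, the two functions are stand-ins that touch no database or network, and the <perform> root and FTP URL are placeholders. It only illustrates the shape of the mechanism, not how OGSA-DAI implements it.

```python
import xml.etree.ElementTree as ET

def fake_query(activity, streams):
    """Toy stand-in for a query activity: emits two fixed rows."""
    out = activity.find("resultSetStream").get("name")
    streams[out] = ["row 1", "row 2"]

def fake_delivery(activity, streams):
    """Toy stand-in for a delivery activity: records what it was given."""
    source = activity.find("fromLocal").get("from")
    streams["delivered"] = list(streams[source])

# Hypothetical registry mapping activity element names to implementations.
# In this sketch it plays the role of the configuration-time association.
REGISTRY = {
    "sqlQueryStatement": fake_query,
    "deliverToURL": fake_delivery,
}

def enact(xml_text, registry=REGISTRY):
    """Run each activity in document order, sharing named streams."""
    streams = {}
    for activity in ET.fromstring(xml_text):
        registry[activity.tag](activity, streams)
    return streams

DOC = """
<perform>
  <sqlQueryStatement name="statement">
    <expression>select * from myTable where id=10</expression>
    <resultSetStream name="output1"/>
  </sqlQueryStatement>
  <deliverToURL name="deliverOutput">
    <fromLocal from="output1"/>
    <toURL>ftp://example.org/home</toURL>
  </deliverToURL>
</perform>
"""

print(enact(DOC)["delivered"])
# prints ['row 1', 'row 2']
```

Adding a new activity in this model means adding one registry entry, which is the sense in which the approach is extensible.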
Activities: negatives
Incomplete syntax
  Activity inputs and outputs are not typed
  No typing of data streams
  Possible issue in coming up with a sensible document
Activity implementation & XML schema loosely coupled
  Keeping activity and implementation in synch
Semantics are not specified
Puts work load on the server
  Workloads on the server may need to be managed
Activities not exposed at the interface level
  This may change in line with DAIS
Perform document factored out from DAIS base specs
  Standardisation to become a DAIS informational document
  Scope may be bigger than DAIS
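Because inputs and outputs are untyped, one of the few checks a client can make before submitting a document is whether the stream names line up. The Python sketch below shows one way this might be done. It is not an OGSA-DAI facility: check_streams is an invented name, the <perform> root and FTP URL are placeholders, and the rule that outputs carry a `name` attribute and inputs carry a `from` attribute is an assumption drawn from the examples in this talk.

```python
import xml.etree.ElementTree as ET

def check_streams(xml_text):
    """Return a list of wiring problems found in a perform document.

    Assumption: outputs are child elements with a `name` attribute and
    inputs are child elements with a `from` attribute.
    """
    root = ET.fromstring(xml_text)
    declared, used, problems = set(), set(), []
    for activity in root:
        for child in activity:
            if "name" in child.attrib:
                if child.get("name") in declared:
                    problems.append("duplicate stream: " + child.get("name"))
                declared.add(child.get("name"))
            if "from" in child.attrib:
                used.add(child.get("from"))
    problems += ["undeclared input: " + s for s in sorted(used - declared)]
    problems += ["unconsumed output: " + s for s in sorted(declared - used)]
    return problems

# The delivery reads "output" but the statement writes "output1".
BROKEN = """
<perform>
  <sqlQueryStatement name="statement">
    <expression>select * from myTable where id=10</expression>
    <resultSetStream name="output1"/>
  </sqlQueryStatement>
  <deliverToURL name="deliverOutput">
    <fromLocal from="output"/>
    <toURL>ftp://example.org/home</toURL>
  </deliverToURL>
</perform>
"""

print(check_streams(BROKEN))
# prints ['undeclared input: output', 'unconsumed output: output1']
```

A check like this catches naming mistakes only. It cannot tell whether the data carried by a stream is something the consuming activity can actually process, which is the typing gap noted above.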
Inter Service Application Defined "Workflow"
Services stitched together by an application
  Could be a client
    Use the OGSA-DAI GridDataTransport (GDT) portType
  Could be another service
    Distributed Query Processing (DQP)
Service configured separately
  Each performs its part in the workflow
Client Driven Scenario (aka poor man's data integration)
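[Diagram: a Client and two Data Services connected by GDT]
Client creates Data Services.
Sent to the Data Service that produces the data:
<sqlQueryStatement>…</sqlQueryStatement>
<deliverToGDT … />
Sent to the Data Service that consumes the data:
<inputStream … />
<sqlUpdateStatement>…</sqlUpdateStatement>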
Service Driven Scenario
Distributed Query Processing
[Diagram: a Client, one GDQS, and three GQES instances]
GDQS: Query planning, compilation, scheduling, evaluation, partitioning
GQES: Evaluate sub-queries
More Complex DQP Scenario
[Figure: DQP across nodes N0 to N4, showing a Client, GDS, GDQS, GQES factories (GQESF), GQES instances and GDT connections.
Numbered steps: 1 perform(Query); 2 createService; 3 perform(QuerySubplan); 4 results.
Query plan operators shown: sequential_scan; reduce(proteinID, sequence); sequential_scan(term=8372); reduce(proteinID); hash_join(p.proteinID = t.proteinID); operation_call blast(p.sequence), backed by Web Services (BLAST); reduce(p.proteinID, blast).]
Application Driven "Workflow"
Labour intensive
Client driven (service choreography)
  Restricted to small numbers of services
  Need tooling
    Even then this is best done through other means
Service driven (service orchestration)
  DQP hides details
  There may be other examples …
Need to explore this space further
  Can probably accommodate these patterns in an existing workflow language
For more general data integration:
  Need to describe more sophisticated behaviour
Inter Service Document Coordination
Currently evolving
Document describes:
  Sequence of operations that may span multiple services
Single document includes enough information to:
  Run an expression on a source data service
  Deliver the results to a target data service
  Run an expression on the target data service
Informational document to be presented at GGF10
A Dataset Example
[Diagram: a Client and two Data Services exchanging the documents below]
RequestDataRequest.xsd
<dataRequest> … </dataRequest>
RemoteRequiredTableDataAccessRecipe.xsd
<dar>
  <gsh> … </gsh>
  <type> … </type>
  <dataSet> … </dataSet>
</dar>
Document Driven "Workflow"
Work in this area is tentative
  No implementations as yet
  OGSA-DAI needs to see how it matures
Shows versatility
  Carries over some of the OGSA-DAI activity framework
Focused on data
  Can track provenance in the dataSet
Needs to be positioned against general workflow languages
Traditional Workflow
OGSA-DAI has not explored this space … yet
  May need such a framework to facilitate data integration
Traditionally workflow:
  Revolves around the execution of atomic activities
  Use a processing model, e.g. WfMC based
  Akin to how people talk about service orchestration
Want to use existing frameworks as far as possible
  OGSA-DAI does not want to define its own workflow
  DAIS may come up with something
Clearly:
  Activity model can be used to implement a workflow
Collecting use cases
Workflow Issues
OGSA-DAI needs to play to see what works
Standards still evolving
IP rights:
  BPEL4WS
    Royalty-free … ?
  WSCI
    Royalty-free
Need workflow engines
Tooling to construct workflow
  Ptolemy II … Triana … ?
Summary & Conclusions
Base standards in a state of flux
  DAIS not settled down yet
    If you don't like what you see, get involved and change it
  Document based interface needs to be re-worked
OGSA-DAI implemented simple "workflow" patterns
  Successful for data access
  Shied away from real workflow
  Should try to use emerging standards if possible
Data integration will require workflow patterns
  Need to examine use cases
Positioning of OGSA-DAI
  Want it to be the leaves of your complex workflow graphs
  Wrap your data sources and sinks
Try OGSA-DAI and give feedback!
Further information
The OGSA-DAI Project Site:
  http://www.ogsadai.org.uk
The DAIS-WG site:
  http://cs.man.ac.uk/grid-db
OGSA-DAI Users Mailing list
  [email protected]
  General discussion on grid DAI matters
Formal support for OGSA-DAI releases
  http://www.ogsadai.org.uk/support
  [email protected]
OGSA-DAI training courses