Ngu, Texas State, Ptolemy Miniconference, February 13, 2007
Flexible Scientific Workflows Using Dynamic Embedding
Anne H.H. Ngu, Nicholas Haasch, Terence Critchlow,
Shawn Bowers, Timothy McPhillips, Bertram Ludaescher
Outline
• Scientific workflows
• Problems with static scientific workflows
• Frame actor
• Dynamic embedding
• Implementation of dynamic embedding
• TSI case study
• Conclusion
Scientific Workflows
• A model of the way a scientist works with their data and tools
• Scientists mentally coordinate data export, import, analysis, and visualization across various software tools
• Goals:
  • Design
  • Automation
  • Component reuse
• … make data analysis and management tasks easier for the scientist!
SPA/Kepler Scientific Workflow System
• Scientific Process Automation (SPA)
  • Modeling workflows using an actor-oriented framework
  • Executing and monitoring workflows
• Built on top of Ptolemy II (Berkeley)
• Graphical user interface
  • Similar to a charting program (Visio)
    • Drag-and-drop components
    • Connect components
  • Execute workflows
  • Monitor execution
[Figure: SPA shown together with Ptolemy II]
Actor-Oriented Modeling
• Actors
  • Single component or task
  • Well-defined interface (signature)
  • Given input data, produce output data
Actor-Oriented Modeling
• Parameter
  • Input that is configured statically
  • The user changes it at design time
  • Produces data
Actor-Oriented Modeling
• Sub-workflows (a.k.a. composite actors)
  • Composite actors "wrap" sub-workflows
  • Like actors, they have signatures (the I/O ports of the sub-workflow)
  • Hierarchical workflows (arbitrary nesting levels) for abstraction
  • Contrast with "atomic actors"
Actor-Oriented Modeling
• Directors
  • Define the execution semantics of workflow graphs
  • Schedule and execute workflow graphs
  • Sub-workflows may be governed by different directors
  • Examples: Synchronous Data-Flow (SDF), Process Networks (PN), Discrete Event (DE), Finite State Machine (FSM)
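The actor-oriented concepts above (atomic actors, composite actors, directors) can be sketched in a few lines of Java. This is a minimal illustration, not the Ptolemy II or Kepler API: the names `Actor`, `CompositeActor`, `SequentialDirector`, and `atomic` are hypothetical, and the director simply fires actors one after another rather than implementing SDF, PN, DE, or FSM semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative types only; these are not the Ptolemy II or Kepler classes.
final class ActorModelSketch {
    // An actor has a well-defined interface: given input data, it produces output data.
    interface Actor {
        String fire(String input);
    }

    // An atomic actor wraps a single task.
    static Actor atomic(UnaryOperator<String> task) {
        return task::apply;
    }

    // A director defines execution semantics; this one fires actors in sequence.
    static final class SequentialDirector {
        String run(List<Actor> workflow, String input) {
            String token = input;
            for (Actor actor : workflow) {
                token = actor.fire(token);
            }
            return token;
        }
    }

    // A composite actor wraps a sub-workflow and exposes the same interface as an atomic actor.
    static final class CompositeActor implements Actor {
        private final List<Actor> subWorkflow = new ArrayList<>();
        private final SequentialDirector director; // a sub-workflow may have its own director

        CompositeActor(SequentialDirector director) { this.director = director; }

        CompositeActor add(Actor actor) { subWorkflow.add(actor); return this; }

        @Override
        public String fire(String input) { return director.run(subWorkflow, input); }
    }

    static String runDemo() {
        CompositeActor clean = new CompositeActor(new SequentialDirector())
                .add(atomic(String::trim))
                .add(atomic(String::toUpperCase));
        List<Actor> outer = new ArrayList<>();
        outer.add(clean);                 // nested sub-workflow, used like any actor
        outer.add(atomic(s -> s + "!"));  // atomic actor
        return new SequentialDirector().run(outer, "  data ");
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Because `CompositeActor` implements the same `Actor` interface, the outer workflow cannot tell a nested sub-workflow from an atomic actor, which is the abstraction the slides describe.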
Problems in current SPA/Kepler modeling framework
• The workflow is static and must be completely specified before orchestration
  • Specific tools used in the workflow must be picked at design time (so the best tool may not be picked)
  • All alternative tools must be enumerated exhaustively (resulting in a complex workflow)
Problems in current SPA/Kepler modeling framework (cont.)
• Our work aims to provide an abstraction of a tool, a resource, or an algorithm
• The specific tool or resource is selected during execution
Terascale Supernova Initiative (TSI) Workflow
[Figure: TSI workflow with the stages Submit Job, Transfer Files, Process Files, and Visualize; the plot axes are labeled X-Axis and Y-Axis]
Different TSI Workflows
• Each contains a Submit Job actor
[Figure: three TSI workflows labeled A, B, and C]
TSI Submit Job Actors… when we look at the sub-workflows
• Each workflow does job submission in a different way
• Each contains a remote execution
[Figure: Submit Job sub-workflows labeled A, B, and C]
Goals
• There exist actors (atomic and composite) that perform similar tasks
  • Submit Job
  • Transfer Files
  • Process Files
  • Remote Execution
    • SSH2Exec
    • SSHWrapper
    • InvokeSubmitJob
• Can we provide an abstraction that encapsulates different implementations of a specific task and can be reused across different workflows?
• Can we execute the workflow flexibly?
  • Choose a specific implementation depending on run-time conditions
Requirements
• Select and execute an actor based on run-time conditions
• Execute with data process networks
  • Built-in capability for streaming data
  • Built-in concurrency
• Selected actors are instantiated on demand
• Reusable actors can be nested
• Actors are usable by scientists
Frames
• Actors are concrete
  • Correspond to particular implementations
• Frames are abstract
  • Placeholder for an actor or composite actor
  • Input, output, and parameter ports
• An embedding occurs when an actor is placed in a frame
• A refinement is an actor that can be embedded in a frame
[Figure: frame F and actor a; the embedding is written F[a]. Example: RemoteExecutionFrame as F, SSH actor as a]
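The frame, refinement, and embedding vocabulary can be made concrete with a small Java sketch. The types `Frame` and `Refinement` and the helper `refinement(...)` are hypothetical names chosen for illustration, and ports are reduced to a single string command. The sketch shows only that a frame is an empty placeholder until a refinement is embedded, written F[a] as on the slide.

```java
import java.util.function.UnaryOperator;

// Illustrative only; not the authors' Frame implementation.
final class FrameSketch {
    // A refinement is a concrete actor that can be embedded in a frame.
    interface Refinement {
        String name();
        String execute(String command);
    }

    // A frame is an abstract placeholder; it can fire only once a refinement is embedded.
    static final class Frame {
        private final String name;
        private Refinement embedded; // null while the frame is still abstract

        Frame(String name) { this.name = name; }

        // Embedding F[a]: place refinement a in frame F.
        Frame embed(Refinement a) { this.embedded = a; return this; }

        String describe() {
            return embedded == null ? name : name + "[" + embedded.name() + "]";
        }

        String fire(String command) {
            if (embedded == null) {
                throw new IllegalStateException(name + " has no embedded refinement");
            }
            return embedded.execute(command);
        }
    }

    // Convenience factory for a named refinement with a stand-in body.
    static Refinement refinement(String name, UnaryOperator<String> body) {
        return new Refinement() {
            @Override public String name() { return name; }
            @Override public String execute(String command) { return body.apply(command); }
        };
    }

    public static void main(String[] args) {
        Frame f = new Frame("RemoteExecutionFrame");
        System.out.println(f.describe());
        f.embed(refinement("SSH2Exec", cmd -> "ssh: " + cmd));
        System.out.println(f.describe());
        System.out.println(f.fire("ls"));
    }
}
```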
Static Embedding Frames
• A refinement to a frame is embedded at design time
• Frames become concrete and cannot be reconfigured during workflow execution
[Figure: frame F with the candidate refinements WebService and SSH2Exec, shown at design time and at run time]
Static Embedding Frames (cont.)
• Execute with data process networks
• Refinement instantiated as needed
• Can be nested
• Actors are usable by scientists
• Select refinements based on run-time conditions: not met, because the refinement is fixed at design time
[Figure: run-time view of frames F with SSH2Exec and WebService]
Dynamic Embedding Frames
• Refinements to frames are embedded during execution
• Frames are not concrete during workflow execution
[Figure: at run time, frame F can take either WebService or SSH2Exec as its refinement]
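The difference between static and dynamic embedding comes down to when the refinement is chosen. The following Java sketch contrasts the two under simplifying assumptions: `StaticFrame`, `DynamicFrame`, and the two stand-in refinements are hypothetical, and a refinement is modeled as a plain function from a command string to a result string.

```java
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// Illustrative contrast of the two embedding modes; names are hypothetical.
final class EmbeddingModesSketch {
    // Stand-ins for the two refinements named on the slides; the bodies are placeholders.
    static final UnaryOperator<String> WEB_SERVICE = cmd -> "webservice:" + cmd;
    static final UnaryOperator<String> SSH2_EXEC = cmd -> "ssh2exec:" + cmd;

    // Static embedding: the refinement is fixed when the workflow is designed.
    static final class StaticFrame {
        private final UnaryOperator<String> refinement;
        StaticFrame(UnaryOperator<String> refinement) { this.refinement = refinement; }
        String fire(String cmd) { return refinement.apply(cmd); }
    }

    // Dynamic embedding: a selection policy is consulted on every firing.
    static final class DynamicFrame {
        private final Supplier<UnaryOperator<String>> selector;
        DynamicFrame(Supplier<UnaryOperator<String>> selector) { this.selector = selector; }
        String fire(String cmd) { return selector.get().apply(cmd); }
    }

    public static void main(String[] args) {
        boolean[] webServiceUp = { true }; // hypothetical run-time condition
        StaticFrame fixed = new StaticFrame(WEB_SERVICE);
        DynamicFrame flexible = new DynamicFrame(() -> webServiceUp[0] ? WEB_SERVICE : SSH2_EXEC);

        System.out.println(fixed.fire("qsub") + " | " + flexible.fire("qsub"));
        webServiceUp[0] = false; // the condition changes while the workflow runs
        System.out.println(fixed.fire("qsub") + " | " + flexible.fire("qsub"));
    }
}
```

After the condition changes, only the dynamic frame switches to the other refinement; the static frame keeps using the one chosen up front.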
Why Dynamic Embedding?
• Select refinements based on run-time conditions
• Execute with data process networks
• Refinement instantiated as needed
• Can be nested
• Actors are usable by scientists
Implementation Of Dynamic Embedding
• Construct a new workflow to execute the actor
[Figure: the Remote Execution Frame generates a workflow (labels: Remote Execution Frame, Generates, Generated Workflow)]
Dynamic Embedding Process
1. Wait for inputs to arrive at the frame.
2. Select a refinement.
3. Transfer input tokens from the frame to the refinement.
4. Select mappings for input ports, output ports, and parameters.
5. Construct the internal workflow.
6. Run the internal workflow.
7. Transfer output tokens from the refinement to the frame.
[Figure: the model generated by the dynamic frame, shown next to the Remote Execution Frame]
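The seven steps above can be traced in a compact Java sketch of a single frame firing. Everything here is an assumption made for illustration: the class and method names, the use of maps from port name to token, the stand-in refinement bodies, and the output port names `stdout` and `result`. Parameter mappings are omitted, and step 3 is performed after step 4 because the transfer needs the selected mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// A sketch of one frame firing; all names are illustrative, not the authors' code.
final class DynamicEmbeddingSketch {
    // Copies tokens from source ports to destination ports (mapping: source name -> destination name).
    static Map<String, String> transfer(Map<String, String> tokens, Map<String, String> mapping) {
        Map<String, String> moved = new LinkedHashMap<>();
        mapping.forEach((from, to) -> {
            if (tokens.containsKey(from)) {
                moved.put(to, tokens.get(from));
            }
        });
        return moved;
    }

    static Map<String, String> fire(Map<String, String> frameInputs, boolean webServiceUp) {
        // Step 1: inputs have arrived at the frame (passed in rather than awaited on a port).
        // Step 2: select a refinement from a run-time condition.
        boolean useWebService = webServiceUp;

        // Step 4: select input and output port mappings for the chosen refinement.
        Map<String, String> inputMapping = useWebService
                ? Map.of("hostname", "url", "command", "method")
                : Map.of("hostname", "hostname", "command", "cmd");
        Map<String, String> outputMapping = useWebService
                ? Map.of("result", "out")   // assumed web service output port
                : Map.of("stdout", "out");  // assumed SSH2Exec output port

        // Step 5: construct the internal workflow (a single stand-in function here).
        UnaryOperator<Map<String, String>> internalWorkflow = useWebService
                ? in -> Map.of("result", "POST " + in.get("url") + "/" + in.get("method"))
                : in -> Map.of("stdout", "ssh " + in.get("hostname") + " " + in.get("cmd"));

        // Step 3: transfer input tokens from the frame to the refinement.
        Map<String, String> refinementInputs = transfer(frameInputs, inputMapping);
        // Step 6: run the internal workflow.
        Map<String, String> refinementOutputs = internalWorkflow.apply(refinementInputs);
        // Step 7: transfer output tokens from the refinement back to the frame.
        return transfer(refinementOutputs, outputMapping);
    }

    public static void main(String[] args) {
        Map<String, String> inputs = Map.of("hostname", "node1", "command", "qsub job.sh");
        System.out.println(fire(inputs, false));
        System.out.println(fire(inputs, true));
    }
}
```

In both cases the caller sees only the frame's own ports (`hostname`, `command`, `out`); which refinement ran is hidden behind the mappings.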
ModelReference Actor
• A higher-order actor that can execute a given model (workflow) through its input port
• It fits most of the requirements for dynamic embedding, except:
  • The user must pre-construct the given model
  • Output tokens from the given model are transferred only after the internal workflow completes
• Our implementation of dynamic embedding leverages the capability of the ModelReference actor, with two major improvements:
  • The given model is constructed automatically
  • Output tokens are transferred synchronously
• Frame is thus implemented as a subclass of the ModelReference actor with four additional components: SelectActor(), FrameSourceActor(), FrameSinkActor(), and PortWiring()
Implementing a Type of Frame
• Subclass Frame and implement the selection process
  • selectActor()
• Configure ports and parameters
  • getInputMappings()
  • getOutputMappings()
  • getParameterMappings()
Selection Process: selectActor()
• Implements the selection policy
• Returns a refinement
• The refinement that is returned is embedded automatically

    // Remote Execution frame
    String selectedActor;

    // Return type Actor is assumed; the slide does not state it.
    Actor selectActor() {
        if (testWebService()) {
            selectedActor = "webservice";
            return getWebServiceActor();
        } else {
            selectedActor = "ssh2exec";
            return getSSH2ExecActor();
        }
    }
Port Wiring
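• Transfer tokens
  • Frame input port → actor input port
  • Actor output port → frame output port
• Expressed as a list of string pairs: {"hostname","hostname"}, {"command","cmd"}
[Figure: Remote Execution frame F wired to SSH2Exec. Port labels on the frame: hostname, command, out, errors. Port labels on SSH2Exec: hostname, cmd, stdout, errors]

    // Remote Execution frame
    String selectedActor;

    // Return type String[][] is assumed; each pair is {frame port, actor port}.
    String[][] getInputPortMapping() {
        if ("ssh2exec".equals(selectedActor)) {
            return new String[][] { {"hostname", "hostname"}, {"command", "cmd"} };
        } else if ("webservice".equals(selectedActor)) {
            return new String[][] { {"hostname", "url"}, {"command", "method"} };
        }
        return new String[0][]; // assumed fallback: no mapping for an unknown refinement
    }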
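Because port wiring is expressed as plain string pairs, a mistyped port name would only surface when tokens are transferred. One possible safeguard, sketched below in Java, is to check a mapping against the declared port names before wiring. This check is not described in the slides; `PortWiringCheck` and `problems(...)` are hypothetical names, and the port sets are taken from the example mappings above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical pre-wiring check; not part of the presented Frame implementation.
final class PortWiringCheck {
    // Returns one message per pair that names a port missing on the frame or on the actor.
    static List<String> problems(String[][] mapping, Set<String> framePorts, Set<String> actorPorts) {
        List<String> problems = new ArrayList<>();
        for (String[] pair : mapping) {
            if (pair.length != 2) {
                problems.add("malformed pair");
                continue;
            }
            if (!framePorts.contains(pair[0])) {
                problems.add("no frame port: " + pair[0]);
            }
            if (!actorPorts.contains(pair[1])) {
                problems.add("no actor port: " + pair[1]);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Set<String> framePorts = Set.of("hostname", "command");
        Set<String> ssh2ExecPorts = Set.of("hostname", "cmd");

        String[][] sshMapping = { {"hostname", "hostname"}, {"command", "cmd"} };
        String[][] webMapping = { {"hostname", "url"}, {"command", "method"} };

        // The SSH2Exec mapping matches the SSH2Exec ports; the web service mapping does not.
        System.out.println(problems(sshMapping, framePorts, ssh2ExecPorts));
        System.out.println(problems(webMapping, framePorts, ssh2ExecPorts));
    }
}
```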
TSI Case Study
[Figure: the TSI-A and TSI-B workflows with the SubmitJobFrame]
[Figure labels: Remote Execution Frame, SubmitJobFrame, TSI-A sub-workflow, TSI-B sub-workflow]
Benefits
• Select refinements based on run-time conditions
• Execute with data process networks
• Refinement instantiated as needed
• Can be nested
• Actors are usable by scientists
Limitations
• Unable to type check the internal workflow before execution
• Creating an additional workflow to execute a refinement adds overhead
• A change in the selection process requires recoding and recompiling
• Cannot monitor the internal workflow (monitoring would be useful for debugging)
Future Work
• Semantic binding of ports and parameters
• Configurable selection criteria
  • Intelligent brokering
  • Ptolemy Expression Language, Python, Perl, …
• Simplified refinement creation
• Caching of generated workflows
• Design-time type checking of the internal workflow
Work performed under the auspices of the U. S. Department of Energy by Lawrence Livermore National Laboratory under Contract W-7405-Eng-48
UCRL-ABS-226047