7th Biennial Ptolemy Miniconference
Berkeley, CA, February 13, 2007
Scheduling Data-Intensive Workflows
Tim H. Wong, Daniel Zinn, Bertram Ludäscher
(UC Davis)
Ptolemy Miniconference 2007, Daniel Zinn
Outline
Problem motivation
Assumptions
Cost model
Problem formalization
Different “simplifications” and their complexity
Prototypical Java implementation for Kepler
Summary
Motivation: Distributed Execution of Scientific Workflows
Process a set of data on a set of machines
GOAL: Minimize WF execution time!
Allocation Problem: Which actors are computed on which hosts?
Assumptions
Arbitrary data size
Arbitrary machine speed
Arbitrary bandwidth
Arbitrary number of inputs
Scientific workflow is a DAG (!)
GRID COMPUTING
Cost Model
Communication Time: TC
Function Execution Time: TE
Total Time: TT = TC + TE
Shipping and Handling Problem: Schedule all tasks such that the total time is minimal
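The cost model above can be illustrated with a small sketch. The slide only states TT = TC + TE; how TC and TE are derived from data sizes, bandwidths, and machine speeds is an assumption made here for illustration (transfer time = data size / bandwidth, execution time = work / host speed), and all names and numbers below are hypothetical.

```python
def total_time(assignment, work, speed, edges, bandwidth):
    """Return (TC, TE, TT) for one allocation of actors to hosts.

    assignment: actor -> host
    work:       actor -> amount of computation (hypothetical unit)
    speed:      host -> computation units per second
    edges:      list of (producer, consumer, data_size)
    bandwidth:  (host_a, host_b) -> data units per second
    """
    # Assumption: execution time of an actor is work divided by host speed.
    te = sum(work[a] / speed[assignment[a]] for a in assignment)
    tc = 0.0
    for src, dst, size in edges:
        h1, h2 = assignment[src], assignment[dst]
        if h1 != h2:  # data staying on the same host is not shipped
            tc += size / bandwidth[(h1, h2)]
    return tc, te, tc + te


# Hypothetical three-actor pipeline on two hosts.
work = {"read": 4.0, "filter": 8.0, "plot": 2.0}
speed = {"h1": 2.0, "h2": 4.0}
edges = [("read", "filter", 10.0), ("filter", "plot", 5.0)]
bandwidth = {("h1", "h2"): 5.0, ("h2", "h1"): 5.0}

spread = {"read": "h1", "filter": "h2", "plot": "h1"}
local = {"read": "h1", "filter": "h1", "plot": "h1"}
tc, te, tt = total_time(spread, work, speed, edges, bandwidth)
tc_local, te_local, tt_local = total_time(local, work, speed, edges, bandwidth)
```

In this toy instance, moving `filter` to the faster host saves execution time but costs more in shipping, so the all-local plan has the smaller total time. This is the trade-off the Shipping and Handling Problem has to resolve.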
Problem Variants and Complexities
Problem                               Communication cost   Function execution cost   Complexity
Shipping and Handling Problem (SHP)   Non-uniform          Non-uniform               NP-complete
Task Handling Problem (THP)           Zero                 Non-uniform               NP-complete
Data Shipping Problem (DSP)           Non-uniform          Zero                      NP-complete

Reduction from Task Scheduling Problem [ERLA94]
Reduction from Multiprocessor Scheduling Problem [KA99]
Reduction from 1-Multiterminal Cut
easy-DSP: Uniform Transfer Rate, Uniform Data Size
Given:
  Directed acyclic graph
  Set of colors
  Some vertices are already colored
  Edge weight = 1 if the two adjacent vertices are of different colors
  Edge weight = 0 otherwise
TASK: Color the rest of the vertices such that the total weight is minimal!
Cost Model: Minimize total shipped volume!
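To make the easy-DSP objective concrete, here is a minimal sketch that evaluates the total weight of a coloring and finds an optimal completion by exhaustive search. The exhaustive search is only an illustration for toy instances (it is exponential in the number of uncolored vertices) and is not taken from the talk; the function names and the example graph are hypothetical.

```python
from itertools import product


def shipped_volume(edges, coloring):
    """Total weight: number of edges whose endpoints have different colors."""
    return sum(1 for u, v in edges if coloring[u] != coloring[v])


def best_completion(vertices, edges, colors, precolored):
    """Try every completion of the partial coloring and keep a cheapest one."""
    free = [v for v in vertices if v not in precolored]
    best_cost, best = None, None
    for choice in product(colors, repeat=len(free)):
        coloring = dict(precolored)
        coloring.update(zip(free, choice))
        cost = shipped_volume(edges, coloring)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, coloring
    return best_cost, best


# Hypothetical DAG: two red sources feed "c", which feeds a blue and a green sink.
vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]
precolored = {"a": "red", "b": "red", "d": "blue", "e": "green"}
cost, coloring = best_completion(vertices, edges, ["red", "blue", "green"], precolored)
```

Here colors stand for hosts: placing `c` on the red host ships data over two edges, while either other choice ships over three.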
1-Multi-Terminal Cut (1-MTC)
Given:
  Undirected graph: G = (V, E)
  Set of terminals: S ⊆ V
  Edge weights: 1
TASK: Find a multi-way cut of G with a minimum number of edges
NP-complete for 3 or more terminals!
Minimize the number of edges between different terminals!
Reduction: 1-MTC <= DSP
[Figure: a DSP instance and the corresponding 1-MTC instance, with unit edge weights]
Idea: “Order graph, color terminals”
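One possible reading of the hint “Order graph, color terminals” is sketched below: fix any total order on the vertices, orient each undirected edge from the lower to the higher vertex (which yields a DAG), and give every terminal its own color. This is an interpretation of the slide, not the authors' stated construction; the function name and the example are hypothetical.

```python
def mtc_to_dsp(vertices, undirected_edges, terminals):
    """Build an easy-DSP instance from a unit-weight multi-terminal cut instance."""
    # Any fixed total order works; here we use the position in `vertices`.
    rank = {v: i for i, v in enumerate(vertices)}
    # Orienting every edge from lower to higher rank cannot create a cycle.
    dag_edges = [(u, v) if rank[u] < rank[v] else (v, u)
                 for u, v in undirected_edges]
    # Each terminal gets its own color (colors are named after the terminals).
    precolored = {t: t for t in terminals}
    colors = list(terminals)
    return dag_edges, colors, precolored


# Hypothetical instance: a path s1 - x - s2 with terminals s1 and s2.
dag_edges, colors, precolored = mtc_to_dsp(
    ["s1", "x", "s2"], [("x", "s1"), ("s2", "x")], ["s1", "s2"]
)
```

Because the edge weight depends only on whether the two endpoint colors differ, the orientation does not change the cost: under this reading, a completed coloring of total weight k corresponds to removing k edges so that no two terminals remain connected.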
NP-hard ... but we still need to solve it
Greedy algorithm
Dynamic programming algorithm
Investigate approximation algorithms for MTC and related problems!
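The slides name a greedy algorithm without giving its details. As an illustration only, the following sketch shows one simple greedy heuristic for easy-DSP: repeatedly pick an uncolored vertex that already has a colored neighbor and give it the most common color among those neighbors. This is an assumed heuristic, not the authors' algorithm, and it comes with no optimality guarantee.

```python
from collections import Counter


def greedy_coloring(vertices, edges, colors, precolored):
    """Color each free vertex with the majority color of its colored neighbors."""
    neighbors = {v: [] for v in vertices}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    coloring = dict(precolored)
    pending = [v for v in vertices if v not in coloring]
    while pending:
        # Prefer a vertex with at least one colored neighbor; else take the first.
        v = next((p for p in pending
                  if any(n in coloring for n in neighbors[p])), pending[0])
        votes = Counter(coloring[n] for n in neighbors[v] if n in coloring)
        # With no colored neighbor, fall back to the first available color.
        coloring[v] = votes.most_common(1)[0][0] if votes else colors[0]
        pending.remove(v)
    return coloring


# Hypothetical chain a -> x -> y -> b with a fixed to red and b fixed to blue.
vertices = ["a", "x", "y", "b"]
edges = [("a", "x"), ("x", "y"), ("y", "b")]
coloring = greedy_coloring(vertices, edges, ["red", "blue"],
                           {"a": "red", "b": "blue"})
cost = sum(1 for u, v in edges if coloring[u] != coloring[v])
```

On this chain the heuristic ships data across exactly one edge, which is also the optimum; on other graphs a local majority vote can be arbitrarily far from the best plan, which is what the open problems at the end of the talk ask about.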
Prototypical Implementation ...
Abstract: only some nodes assigned
Scheduling
Concrete: all nodes assigned
Prototypical Implementation ... in Kepler!
Abstract Workflow ...
SCHEDULING
Concrete Workflow ...
Future Work
Use heuristics about looping to guess multiplicities (the workflow is then no longer acyclic!)
Investigate approximation algorithms with error guarantees for 1-MTC, then try to apply them to DSP
Also relevant for COMAD workflows: they can be “compiled” into a low-level conventional workflow
Summary
Bad news
  Scheduling is hard
  DSP is hard (for BEST plans)
Good news
  Finding a quite good plan is easy
  Greedy / dynamic programming algorithms
Open problems
  Approximation quality of “simple algorithms”?
  When do they perform badly?
  Does this occur often in real-life workflows?
References
Thank You. Questions?