www.infotech.monash.edu.au PhD Confirmation 19574746 Scheduling Multiple Scientific Workflows Based on Resources in Grid Environment Sucha Smanchat Supervisors: Dr. Chris Ling Dr. Maria Indrawan


www.infotech.monash.edu.au

PhD Confirmation 19574746

Scheduling Multiple Scientific Workflows Based on Resources in Grid Environment

Sucha Smanchat

Supervisors: Dr. Chris Ling

Dr. Maria Indrawan


Content

Introduction

Motivation

Related Work

Research Objective

Proposed Methodology

Evaluation

Project Plan

Conclusion


Introduction

Workflows have been employed in the business domain to streamline business processes.

A workflow is a representation of a business process that consists of a set of tasks performed to achieve a goal.

Workflow became popular for its ability to represent the orchestration of services from heterogeneous sources.

Recently, workflow technology has been introduced into scientific domains to help scientists perform their work.

Scientific workflows usually require high computation power, which a grid environment can provide.


Motivation

To execute a workflow, the tasks in the workflow need to be scheduled onto grid resources. This is scheduling from the application side.

Several scheduling algorithms exist but few deal with multiple workflows.

Multiple grid workflows scheduled separately might compete for the same resources which may result in the constraint imposed on each workflow being violated.

Parameter sweep workflow – a workflow that is executed several times with different data for optimisation purposes.


Motivation (2)

Execute multiple instances of a parameter sweep workflow concurrently.

Resource competition will occur, which makes it necessary to schedule tasks efficiently to avoid bottlenecks and delays.

Each instance (or workflow) might have different constraints.


Related Work

Workflow

Business Workflow & Scientific Workflow

Scientific Workflow Management Systems

Grid Workflow Scheduling


Workflow

A workflow is a collection of tasks that are connected together to achieve a goal.

A task is a single unit of work.

Tasks in a workflow can be arranged in four basic control structures: sequential, choice, concurrent, and loop.

A workflow has a single starting point and a single exit point.

A workflow can be specified in a language such as BPEL4WS.

Graphical models such as Petri nets and DAGs (Directed Acyclic Graphs) can be used to model workflows.


Workflow Example

Example of a workflow modelled as Petri Net


Executing Workflow

To execute a workflow, an instance of the workflow definition is created.

The tasks within the workflow instance are bound to actual services (e.g. web services, software packages), using either static binding (fully ahead of execution) or dynamic binding (late binding during execution).

The workflow engine in a Workflow Management System executes the workflow instance.


Business Workflow & Scientific Workflow

The focus of business workflow is on the composition of services, flow control, and coordination of tasks to provide more powerful and meaningful support for users.

Dynamism in business workflow mainly concerns the ability to customise the workflow to suit the user and the ability to adapt to failure of workflow execution.

Scientific workflow captures and automates scientific processes to help scientists from various domains perform their experiments.

Scientific workflow is data- and computation-intensive. Scientific workflows can contain a large number of tasks, which require high computation power and involve complex data of various sizes.

Dynamism in scientific workflow currently focuses on performance optimisation, which involves the mapping and scheduling of workflow execution.


Scientific Workflow Management Systems

Pegasus WMS [1]

Taverna [2]

Kepler [3]


Pegasus WMS

Developed by the Information Sciences Institute of the University of Southern California and the Computer Science department of the University of Wisconsin Madison.

Pegasus allows scientists to define an abstract workflow without specifying execution details such as resources and data locations.

The abstract scientific workflow is mapped to an executable workflow by the “Pegasus Mapper” and passed on to DAGMan for execution.


Taverna

Designed for bioinformatics workflows as part of the myGrid project.

Taverna focuses on the orchestration of bioinformatics web services and applications by using the “Scufl” workflow definition language.

Users can create and edit workflows using the Scufl workbench. The workflow is then executed by the “Freefluo” enactment engine.


Kepler

Designed specifically for scientific workflows and has been used in various scientific domains.

Kepler is developed based on “Ptolemy II” and adopts the “Vergil” GUI [4].

Kepler provides “actors” which are used to perform tasks.

A workflow is created by connecting actors via input and output ports. Actors can be built-in actors or actors created by users.

Workflow execution in Kepler is controlled by directors.


Directors in Kepler

SDF Director and DDF Director execute tasks sequentially. A task is started once the required inputs are available.

DDF allows choice structure (if-then) while SDF does not.

PN Director executes tasks in parallel.

CT Director and DE Director include a time aspect to control the execution of tasks and are used for simulation.


TDA Director

The TDA Director proposed by Abramson et al. [5] clones the tasks involved in a parameter sweep and executes them in parallel (Nimrod/K).

Each copy can be seen as an instance of the parameter sweep workflow.

Every copy is of the same workflow definition and requires the same resources.

Efficient workflow scheduling is required to avoid delay and bottleneck.


Grid Workflow Scheduling

Best-effort scheduling: batch mode, dependency mode, cluster and duplication based scheduling, and meta-heuristics based scheduling.

QoS-constraint scheduling: deadline-constraint based scheduling and budget-constraint based scheduling.

*Classification based on Yu et al. [6]


Static and Dynamic Aspects

Static scheduling algorithms schedule tasks to resources once.

Dynamic scheduling usually involves run-time rescheduling using a static algorithm, and adaptation triggered by the availability of resources and by performance.


Batch Mode

Task Prioritising Phase – creates a list of tasks that are ready to execute and, for each task, finds the resource that can complete it earliest, giving its Minimum Estimated Completion Time (MCT).

Resource Selection Phase:

Min-Min - the task-resource pair with the minimum MCT is scheduled first [7].

Max-Min - the task-resource pair with the maximum MCT is scheduled first [7].

Sufferage - the task that would suffer most if it did not get its best resource is scheduled first; the sufferage of a task is the difference between its second-minimum and its minimum completion time [7].
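The three selection rules can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation from [7]: the function name `batch_schedule`, the `exec_time` table and its numbers are made up, the tasks are assumed to be independent and ready, and a task with only one capable resource is assumed to have infinite sufferage.

```python
def batch_schedule(exec_time, rule="min-min"):
    """Schedule independent ready tasks with Min-Min, Max-Min or Sufferage.

    exec_time[task][resource] is an estimated execution time (hypothetical data).
    Returns a list of (task, resource, completion_time) in scheduling order.
    """
    resources = {r for row in exec_time.values() for r in row}
    ready_at = {r: 0.0 for r in resources}  # time at which each resource is free
    unscheduled = set(exec_time)
    plan = []
    while unscheduled:
        candidates = []
        for task in sorted(unscheduled):
            # Completion time of this task on every resource able to run it.
            ct = sorted((ready_at[r] + t, r) for r, t in exec_time[task].items())
            mct, best = ct[0]
            # Sufferage: second-minimum completion time minus the minimum.
            # Assumption: a task with a single capable resource suffers most.
            sufferage = ct[1][0] - mct if len(ct) > 1 else float("inf")
            candidates.append((task, best, mct, sufferage))
        if rule == "min-min":
            choice = min(candidates, key=lambda c: c[2])
        elif rule == "max-min":
            choice = max(candidates, key=lambda c: c[2])
        elif rule == "sufferage":
            choice = max(candidates, key=lambda c: c[3])
        else:
            raise ValueError(f"unknown rule: {rule}")
        task, resource, mct, _ = choice
        ready_at[resource] = mct  # the chosen resource is busy until then
        unscheduled.remove(task)
        plan.append((task, resource, mct))
    return plan


# Hypothetical execution-time estimates for three ready tasks on two resources.
exec_time = {
    "t1": {"r1": 2, "r2": 4},
    "t2": {"r1": 3, "r2": 10},
    "t3": {"r1": 8, "r2": 9},
}

for rule in ("min-min", "max-min", "sufferage"):
    print(rule, batch_schedule(exec_time, rule))
```

The MCT values are recomputed after every assignment because the chosen resource becomes busy, which is what distinguishes these heuristics from a one-shot sort.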


Batch Mode - Extended

XSufferage – the sufferage value also considers the time to transfer files. A task should be assigned to a resource that already has the file that the task requires [8].

QoS guided Min-Min – tasks are grouped into tasks requiring high and low bandwidth. Tasks requiring high bandwidth are scheduled first using Min-Min [9].

Selective Min-Min Max-Min – uses the standard deviation to determine if there are a few longer tasks and many shorter tasks. If so, Max-Min is used, otherwise Min-Min is used [10].


Dependency Mode – HEFT

HEFT - Heterogeneous-Earliest-Finish-Time [11]

Tasks are ranked based on the average execution time of the tasks, the average time required for transferring data between resources, and the position of the tasks in the workflow.

The tasks with a higher rank are scheduled first, each to the resource that can complete it at the earliest time.
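The ranking step can be illustrated with the upward rank used by HEFT, where the rank of a task is its average execution time plus the largest value, over its successors, of average transfer time plus successor rank. The sketch below covers only this ranking phase, not processor selection; the function name `upward_ranks` and the four-task workflow with its numbers are assumptions made for illustration.

```python
from functools import lru_cache


def upward_ranks(avg_exec, avg_comm, successors):
    """Compute the HEFT upward rank of every task.

    avg_exec[t]      average execution time of task t over all resources
    avg_comm[(t, s)] average data transfer time from t to its successor s
    successors[t]    list of direct successors of t (missing key = exit task)
    """
    @lru_cache(maxsize=None)
    def rank(task):
        # Longest remaining path below this task, including transfer times.
        tail = max(
            (avg_comm.get((task, s), 0.0) + rank(s) for s in successors.get(task, ())),
            default=0.0,
        )
        return avg_exec[task] + tail

    return {t: rank(t) for t in avg_exec}


# Hypothetical diamond-shaped workflow: A -> B, A -> C, B -> D, C -> D.
avg_exec = {"A": 10, "B": 6, "C": 8, "D": 4}
avg_comm = {("A", "B"): 2, ("A", "C"): 3, ("B", "D"): 1, ("C", "D"): 5}
successors = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}

ranks = upward_ranks(avg_exec, avg_comm, successors)
order = sorted(ranks, key=ranks.get, reverse=True)  # highest rank first
print(ranks, order)
```

Sorting by decreasing upward rank always places a task before its successors, so the resulting order respects the dependencies of the workflow.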


Dependency Mode – CPOP

CPOP - Critical-Path-on-a-Processor [11]

Find the “critical path processor” that minimises the critical path.

Tasks on the critical path are assigned to the critical path processor. Other tasks are assigned to the resource that minimises their execution time.

CPOP fails if there is no processor that can execute every task on the critical path, as pointed out by Shi and Dongarra [12].


Dependency Mode – Hybrid HEFT

Hybrid HEFT [13]

Rank tasks as normal HEFT.

Group tasks that do not depend on each other, similar to batch mode algorithms.

Schedule tasks in each group using “Balanced Minimum Completion Time” (BMCT) algorithm. Tasks scheduled on a resource can be moved to another resource to balance resource load.


Dependency Mode – SDC

SDC - Scheduling algorithm for heterogeneous processors with Different Capabilities, by Shi and Dongarra [12].

A task can only be executed by certain resources.

Considers “tasks with scarce capable resources” - the tasks that fewer resources are able to execute, measured by the ratio of capable resources to total resources.

Tasks with scarce capable resources are scheduled first (ranked higher) to avoid the delay caused by allocating a scarce resource to a task that can be executed elsewhere.

Does not deal with resource competition. For example, there may be 10 resources that can execute task T, but there may also be 100 other tasks that need the same 10 resources.
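The gap in this example can be made concrete with a small calculation. The sketch below is an illustration only: the helper names, the total of 20 resources, and the extra task U are assumptions added to the 10-resource, 100-task example, and the ratio is computed simply as capable resources over total resources.

```python
def capability_ratio(task, capable, total_resources):
    """Share of all resources able to execute the task (lower = scarcer)."""
    return len(capable[task]) / total_resources


def demand_per_resource(capable):
    """Number of tasks that list each resource as a candidate."""
    demand = {}
    for resources in capable.values():
        for r in resources:
            demand[r] = demand.get(r, 0) + 1
    return demand


TOTAL_RESOURCES = 20  # assumed grid size, not given in the example
shared = {f"r{i}" for i in range(10)}  # the 10 resources able to run task T

capable = {"T": shared}
capable.update({f"other{i}": shared for i in range(100)})  # 100 competing tasks
capable["U"] = {"r10", "r11"}  # hypothetical task with only two capable resources

demand = demand_per_resource(capable)
print(capability_ratio("T", capable, TOTAL_RESOURCES))  # ratio for T
print(capability_ratio("U", capable, TOTAL_RESOURCES))  # ratio for U
print(demand["r0"], demand["r10"])  # tasks competing for r0 and for r10
```

By the ratio alone, U looks far scarcer than T, yet each resource able to run T is wanted by 101 tasks while the resources for U are wanted by one. A ranking that ignores this demand cannot see the bottleneck.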


Cluster and Duplication Based Scheduling - TDS

TDS - Task Duplication Based Scheduling Scheme [14]

The tasks that have lower communication cost between each other are clustered together.

Each cluster is assigned to the processor that takes the minimum execution time to execute the tasks within it.

Where possible, duplicate the predecessor of a task on the same processor to minimise or eliminate the communication cost.


Cluster and Duplication Based Scheduling - TANH

TANH - Task Duplication Based Scheduling Algorithm for Heterogeneous Systems [15]

Clusters tasks based on the number of available processors.

If there are more processors than clusters, tasks are duplicated and scheduled to the available processors.

If there are more clusters than processors, clusters are merged until the number of clusters is equal to the number of processors.

Both TDS and TANH fail if there is no processor that can execute every task in a cluster [12].


Meta-heuristics Based Scheduling

GRASP [16] - Greedy Randomised Adaptive Search Procedure. Iteratively generates a randomised schedule and finds a locally optimal solution in each iteration. Once the iterations stop, the best solution stored is returned as the result.

Genetic Algorithm [17] - Generates an initial set of solutions (the first parents), creates new solutions (children) based on known good solutions (parents), and repeats until a preset condition is met.

Meta-heuristics give a solution based on the entire workflow but take a longer scheduling time.


Deadline-constraint Based Scheduling

Backtracking [18] - Minimises cost while meeting the deadline. Allocates the ready tasks to the resources with the least cost, then calculates the execution time. If the deadline is violated, the last allocated task is reallocated to a faster resource. Multiple backtracking steps may be required.

Partitioning [19] - The workflow is partitioned into branches of sequential tasks. The deadline constraint of the workflow is then distributed to each of the partitions. If the deadline of a partition is violated during execution, the subsequent partition(s) adjust to handle the delay.
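Deadline distribution over partitions can be sketched as follows. This is a simplified illustration, not the policy of [19]: it assumes the partitions run one after another, that each partition has a known estimated execution time, and that the available time is shared in proportion to those estimates. The function name `distribute_deadline` and the numbers are hypothetical.

```python
from itertools import accumulate


def distribute_deadline(est_times, deadline, start=0.0):
    """Return the sub-deadline (absolute time) of each sequential partition.

    est_times  estimated execution time of each remaining partition
    deadline   overall deadline of the workflow
    start      current time; later partitions absorb any delay so far
    """
    total = sum(est_times)
    available = deadline - start
    if total > available:
        raise ValueError("deadline cannot be met even without slack")
    # Assumed rule: each partition gets time in proportion to its estimate.
    shares = [available * t / total for t in est_times]
    return [start + c for c in accumulate(shares)]


# Three partitions estimated at 10, 30 and 20 time units, deadline 120.
print(distribute_deadline([10, 30, 20], 120))

# The first partition finished late, at time 35; the remaining two partitions
# are given new sub-deadlines inside what is left of the overall deadline.
print(distribute_deadline([30, 20], 120, start=35))
```

Calling the function again with the remaining partitions and the current time is one simple way to model how subsequent partitions adjust to handle a delay.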


Scheduling Multiple Grid Workflows

Merge multiple non-loop workflows (DAGs), then use any of the existing algorithms for a single workflow to schedule the result [20]. The DAGs can be merged by merging the entry and exit points of all DAGs, or by connecting a shorter DAG in the middle of a longer DAG.

Schedule the tasks in each DAG sequentially (one DAG after another), in round-robin fashion, or by selecting tasks based on fairness.

Does not deal with constraints or the resource aspect.
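Merging the entry and exit points can be sketched as building one composite DAG with a shared entry node and a shared exit node. The sketch below is an illustration under assumptions, not the algorithm of [20]: the function name `merge_dags`, the node names `ENTRY` and `EXIT`, and the two small workflows are hypothetical, and the added nodes are assumed to cost nothing to execute.

```python
def merge_dags(dags):
    """Combine several DAGs into one with a common entry and a common exit.

    dags maps a workflow name to {task: [successor tasks]}.
    Task names are prefixed with the workflow name to keep them unique.
    """
    merged = {"ENTRY": [], "EXIT": []}
    for name, dag in dags.items():
        has_pred = {s for succ in dag.values() for s in succ}
        tasks = set(dag) | has_pred
        for t in sorted(tasks):
            succ = [f"{name}.{s}" for s in dag.get(t, [])]
            # Tasks without successors now lead to the common exit node.
            merged[f"{name}.{t}"] = succ if succ else ["EXIT"]
            # Tasks without predecessors now follow the common entry node.
            if t not in has_pred:
                merged["ENTRY"].append(f"{name}.{t}")
    return merged


dags = {
    "w1": {"a": ["b", "c"], "b": ["d"], "c": ["d"]},
    "w2": {"x": ["y"]},
}
merged = merge_dags(dags)
for node, succ in merged.items():
    print(node, "->", succ)
```

The composite graph is itself a DAG, so a single-workflow algorithm can be applied to it unchanged; what it cannot express is a separate constraint per original workflow.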


Scheduling Multiple Grid Workflows (2)

xDCP (extended Dynamic Critical Path) [21] - Initialises the schedule by randomly allocating resources to parameter sweep tasks. For each task, if there is a resource that can execute it faster, the task is moved to that resource.

pM-S (priority based Master-Slave) [21] - Arranges the execution order of parameter sweep tasks to allow dependent tasks further down the workflow to execute earlier on another resource. This increases the parallelisation of parameter sweep tasks. The technique assumes that resources are assigned to each set of parameter sweep tasks exclusively.


Scheduling Multiple Grid Workflows (3)

Game-quick & Game-cost by Duan et al. [22]

Based on game theory

Iterating through each workflow, tasks that are ready to execute are grouped together and compete for resources. A task can win the game and get a resource but will lose on another resource.

Does not look at tasks further down in the whole of each workflow.

Applicability to our work needs further investigation.


Discussion

Little work has been done on scheduling multiple workflows, and none of it sufficiently considers the resource aspect or utilises information about the resources required by the entire workflows.


Research Objective

To propose a scheduling technique for executing multiple scientific grid workflows.

The scheduling must be aware of the resources required by every task in every workflow.

The scheduling algorithm must be able to schedule tasks in multiple workflows based on the resources required, to avoid bottlenecks and delays.

The time constraints of every workflow should be satisfied.

Implementation in the Kepler system.

Dynamic changes of resources are excluded from our scope.


Ongoing Scenario

Quantum Chemical Calculation using GAMESS quantum chemistry package [23].

Optimise four parameters to give the best pseudo atom surface [5].


Scenario Explanation

Parameter Sweep actor creates combinations of parameters and assigns a unique token to each combination.

The TDA director (Nimrod/K) in the Kepler system clones the four GAMESS tasks for each parameter combination.

The outputs of the four GAMESS tasks are sent to the RMS task to calculate the Root Mean Square error.

The results are ordered using the Order Tags task and displayed in a graph.
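The first step of the scenario, creating parameter combinations and tagging each one, can be sketched as below. This is not the Kepler or Nimrod/K actor API: the function name `parameter_sweep`, the parameter names `p1` to `p4`, and their values are hypothetical, and the unique token is modelled as a simple integer tag.

```python
from itertools import product


def parameter_sweep(parameters):
    """Yield (tag, combination) for every combination of parameter values.

    parameters maps a parameter name to the list of values to try.
    The tag is a unique integer, standing in for the token assigned
    to each combination.
    """
    names = list(parameters)
    value_lists = (parameters[n] for n in names)
    for tag, values in enumerate(product(*value_lists)):
        yield tag, dict(zip(names, values))


# Four hypothetical parameters with a few candidate values each.
parameters = {
    "p1": [0.1, 0.2],
    "p2": [1, 2, 3],
    "p3": [10],
    "p4": [5, 6],
}

combos = list(parameter_sweep(parameters))
print(len(combos), "workflow instances")
print(combos[0])
```

Each tagged combination corresponds to one cloned instance of the workflow, so the number of instances grows with the product of the value counts.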



Assumptions from Scenario

The same tasks in every instance require the same resource/resource types.

Within the same instance, the four GAMESS tasks may require the same or different resource types.

A task can only be executed by certain resources, and a resource cannot execute every task.

Depending on the currently available resources in the grid and the number of instances cloned, the execution plan for this scenario may be different.


Proposed Methodology

Task and Resource Prioritising Phase

Resource Selection Phase

Implementation Prototype in Kepler


Task and Resource Prioritising Phase

Tasks from multiple workflows (or instances) that are independent of each other can be grouped together.

Resources are also ranked based on the scarcity of resources and the degree of competition.

Tasks in each group can be ranked based on execution time, dependencies between tasks, and the ranks of the resources they require.

Need to know the resources required by every task in every workflow instance.


Possible Issues

A resource may be non-scarce for one workflow but scarce for another workflow.

A resource might not be a highly demanded resource at the beginning of the workflow but becomes one later on.

Complexity introduced by multiple workflows


Possible Issues (2)

Input model for the algorithm: existing work mostly uses Directed Acyclic Graphs (DAGs), in which loops are not allowed.

Kepler allows loops.

Petri nets, which also support workflow analysis, might be a good alternative.


Resource Selection Phase

Resources are allocated to tasks based on task rank and resource rank.

At this stage, fastest execution and deadline constraint will be considered in resource allocation.

Each workflow (or workflow instance) might specify different constraints.

Allocation of resources to the tasks that are ready to execute must also consider the tasks further down the workflow, using the resource ranks and the estimated time at which these resources are in demand.

Output of this phase is the execution plan for executing multiple grid workflows.


Example of Execution Plan for 3 Instances



Implementation in Kepler

As a multiple workflow scheduler for Kepler.

As part of TDA director (Nimrod/K) for parameter sweep.


Evaluation

The proposed scheduling technique will finally be implemented and tested on Kepler using real scientific workflow scenarios against the following criteria.

The time required for the scheduling process for different numbers of workflows.

The efficiency of the algorithm in comparison with existing work. How efficient is the resulting execution plan? A proper comparison scheme needs to be identified, as most existing work is for a single workflow. Comparison with Kepler’s existing directors, including the TDA director without the proposed algorithm.


Evaluation (2)

The ability of the algorithm to satisfy the user’s requirements. The time constraints of each workflow (instance) specified by the users must be satisfied. It might be possible to apply the verification technique proposed by Chen and Yang [24] for this purpose.

The ability of the algorithm to maintain task dependencies. The proposed algorithm should preserve the original task dependencies and the order of tasks in the original workflows.


Project Plan

Expected Date Project Plan

Year 1

Jan 2008 – Sep 2008: Initial research and literature review; analysing existing work and identifying the gaps.

Oct 2008 – Jan 2009: Formulation of problems and objectives; preparation of research proposal.

Year 2

Feb 2009 – Aug 2009: Development of the algorithm; study of Kepler architecture.

Sep 2009 – Jan 2010: Initial evaluation of the proposed algorithm; initial implementation of the scheduler.

Year 3

Feb 2010 – Apr 2010: Implementation of the scheduler in Kepler; start writing thesis.

May 2010 – Nov 2010: Complete implementation of scheduler; evaluation of the proposed algorithm using real scenarios; continue writing thesis.

Dec 2010 – Jan 2011: Complete and finalise thesis.


Published Paper

S. Smanchat, S. Ling, and M. Indrawan, "A Survey on Context-aware Workflow Adaptations," in Proceedings of the 3rd International Workshop on Trustworthy Ubiquitous Computing (TwUC 2008). Linz, Austria, 2008, pp. 422-425.


Conclusion

The need for scheduling multiple scientific workflows in a grid environment is identified.

Existing grid workflow scheduling techniques are described.

We aim to develop an algorithm that can schedule multiple workflows into an execution plan so that the time constraints of those workflows are considered together.

Our work should be able to help improve the performance of the execution of multiple scientific workflows over the existing approaches.


Questions?


References

[1] E. Deelman, G. Singh, M.-H. Su, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, G. B. Berriman, J. Good, A. Laity, J. C. Jacob, and D. S. Katz, "Pegasus: A framework for mapping complex scientific workflows onto distributed systems," Sci. Program., vol. 13, pp. 219-237, 2005.

[2] T. Oinn, M. Addis, J. Ferris, D. Marvin, M. Senger, M. Greenwood, T. Carver, K. Glover, M. R. Pocock, A. Wipat, and P. Li, "Taverna: a tool for the composition and enactment of bioinformatics workflows," Bioinformatics, vol. 20, pp. 3045-3054, 2004.

[3] B. Ludäscher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger, M. Jones, E. A. Lee, J. Tao, and Y. Zhao, "Scientific workflow management and the Kepler system," Concurr. Comput. : Pract. Exper., vol. 18, pp. 1039-1065, 2006.

[4] Ptolemy II, http://ptolemy.eecs.berkeley.edu/ptolemyII/, Accessed October, 2008

[5] D. Abramson, C. Enticott, and I. Altintas, "Nimrod/K: towards massively parallel dynamic grid workflows," in Proceedings of the 2008 ACM/IEEE conference on Supercomputing. Austin, Texas: IEEE Press, 2008.

[6] J. Yu, R. Buyya, and K. Ramamohanarao, "Workflow Scheduling Algorithms for Grid Computing," in Metaheuristics for Scheduling in Distributed Computing Environments, 2008, pp. 173-214.


References (2)

[7] M. Maheswaran, S. Ali, H. J. Siegel, D. Hensgen, and R. F. Freund, "Dynamic matching and scheduling of a class of independent tasks onto heterogeneous computing systems," in Proceedings of the 8th Heterogeneous Computing Workshop (HCW '99), 1999, pp. 30-44.

[8] H. Casanova, A. Legrand, D. Zagorodnov, and F. Berman, "Heuristics for scheduling parameter sweep applications in grid environments," in Proceedings of the 9th Heterogeneous Computing Workshop (HCW 2000), 2000, pp. 349-363.

[9] X. He, X. Sun, and G. v. Laszewski, "QoS guided min-min heuristic for grid task scheduling," J. Comput. Sci. Technol., vol. 18, pp. 442-451, 2003.

[10] K. Etminani and M. Naghibzadeh, "A Min-Min Max-Min selective algorithm for grid task scheduling," in Proceedings of the 3rd IEEE/IFIP International Conference in Central Asia on Internet (ICI 2007), 2007, pp. 1-7.

[11] H. Topcuoglu, S. Hariri, and M.-Y. Wu, "Performance-effective and low-complexity task scheduling for heterogeneous computing," IEEE Transactions on Parallel and Distributed Systems, vol. 13, pp. 260-274, 2002.

[12] Z. Shi and J. J. Dongarra, "Scheduling workflow applications on processors with different capabilities," Future Gener. Comput. Syst., vol. 22, pp. 665-675, 2006.


References (3)

[13] R. Sakellariou and H. Zhao, "A hybrid heuristic for DAG scheduling on heterogeneous systems," in Proceedings of the 18th International Conference on Parallel and Distributed Processing Symposium, 2004, pp. 111.

[14] S. Ranaweera and D. P. Agrawal, "A task duplication based scheduling algorithm for heterogeneous systems," in Proceedings of the 14th International Conference on Parallel and Distributed Processing Symposium (IPDPS 2000), 2000, pp. 445-450.

[15] R. Bajaj and D. P. Agrawal, "Improving scheduling of tasks in a heterogeneous environment," IEEE Transactions on Parallel and Distributed Systems, vol. 15, pp. 107-118, 2004.

[16] T. A. Feo and M. G. C. Resende, "Greedy Randomized Adaptive Search Procedures," Journal of Global Optimization, vol. 6, pp. 109-133, 1995.

[17] J. Stender, "Introduction to genetic algorithms," in IEE Colloquium on Applications of Genetic Algorithms, 1994, pp. 1/1-1/4.

[18] D. A. Menascé and E. Casalicchio, "A framework for resource allocation in grid computing," in Proceedings of the 12th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS 2004), 2004, pp. 259-267.


References (4)

[19] J. Yu, R. Buyya, and C. K. Tham, "Cost-based scheduling of scientific workflow applications on utility grids," in Proceedings of the First International Conference on e-Science and Grid Computing, 2005, pp. 8 pp.

[20] H. Zhao and R. Sakellariou, "Scheduling multiple DAGs onto heterogeneous systems," in Proceedings of the 20th International Conference on Parallel and Distributed Processing Symposium (IPDPS 2006), 2006, pp. 14 pp.

[21] T. Ma and R. Buyya, "Critical-path and priority based algorithms for scheduling workflows with parameter sweep tasks on global grids," in Proceedings of the 17th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2005), 2005, pp. 251-258.

[22] R. Duan, R. Prodan, and T. Fahringer, "Performance and cost optimization for multiple large-scale grid workflow applications," in Proceedings of the 2007 ACM/IEEE Conference on Supercomputing. Reno, Nevada: ACM, 2007.

[23] The General Atomic and Molecular Electronic Structure System (GAMESS), http://www.msg.chem.iastate.edu/GAMESS/, Accessed November, 2008.

[24] J. Chen and Y. Yang, "Temporal Dependency for Dynamic Verification of Fixed-Date Constraints in Grid Workflow Systems," in Web Technologies Research and Development - APWeb 2005, 2005, pp. 820-831.


Petri Net Basic

Event-driven graphical model consisting of places, transitions, and arcs.

Enabled transition: there are tokens in every input place.

Transition fires: tokens are moved from the input places to every output place.

(Figure: example Petri net with transitions labelled Enabled and Fire.)
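The enabling and firing rules can be sketched in a few lines. This is a minimal illustration that assumes every arc has weight one; the function names, the dictionary layout of a transition, and the small fork-join net are hypothetical.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds a token."""
    return all(marking.get(p, 0) >= 1 for p in transition["inputs"])


def fire(marking, transition):
    """Fire an enabled transition and return the new marking.

    One token is removed from each input place and one token is added
    to every output place (arc weights are assumed to be 1).
    """
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for p in transition["inputs"]:
        new_marking[p] -= 1
    for p in transition["outputs"]:
        new_marking[p] = new_marking.get(p, 0) + 1
    return new_marking


# Hypothetical net: t1 forks p1 into p2 and p3; t2 joins p2 and p3 into p4.
t1 = {"inputs": ["p1"], "outputs": ["p2", "p3"]}
t2 = {"inputs": ["p2", "p3"], "outputs": ["p4"]}

m0 = {"p1": 1}
m1 = fire(m0, t1)  # t1 is the only enabled transition in m0
m2 = fire(m1, t2)  # firing t1 enabled t2
print(m0, m1, m2)
```

The fork and join in this small net correspond to the concurrent control structure of a workflow: t2 cannot start until both branches have produced a token.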


GAMESS Tasks Detail

Four parameters are input to each calculation.

Spherical and Cartesian bases on pseudoethane and the pseudomethyl radical thus give four GAMESS tasks:

GAMESS - Spherical basis on pseudoethane

GAMESS - Cartesian basis on pseudoethane

GAMESS - Spherical basis on pseudomethyl radical

GAMESS - Cartesian basis on pseudomethyl radical



Assumption for Example Execution Plan

The task GAMESS_spher_eth can be executed by resource r1 or r2.

The task GAMESS_cart_eth can be executed by resource r2 or r3.

The task GAMESS_spher_rad can be executed by resource r1 and r2.

The task GAMESS_cart_rad can only be executed by resource r3.

The task RMS, which can only be executed by resource r1, is also cloned in this example to show possible bottleneck problem.
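The bottleneck implied by these assumptions can be checked with a short calculation. The sketch below is illustrative only: the helper names are hypothetical, three instances are assumed as in the example execution plan, and every task, including GAMESS_spher_rad, is assumed to need just one of its listed resources.

```python
# Capable resources per task, taken from the assumptions listed above.
# Assumption: each task needs only one of its listed resources.
CAPABLE = {
    "GAMESS_spher_eth": {"r1", "r2"},
    "GAMESS_cart_eth": {"r2", "r3"},
    "GAMESS_spher_rad": {"r1", "r2"},
    "GAMESS_cart_rad": {"r3"},
    "RMS": {"r1"},
}


def exclusive_demand(capable, instances):
    """Per resource, count the cloned tasks that can run nowhere else."""
    demand = {}
    for resources in capable.values():
        if len(resources) == 1:
            (only,) = resources
            demand[only] = demand.get(only, 0) + instances
    return demand


def candidate_demand(capable, instances):
    """Per resource, count the cloned tasks that could be placed on it."""
    demand = {}
    for resources in capable.values():
        for r in resources:
            demand[r] = demand.get(r, 0) + instances
    return demand


INSTANCES = 3  # number of cloned workflow instances in the example plan
print(exclusive_demand(CAPABLE, INSTANCES))
print(candidate_demand(CAPABLE, INSTANCES))
```

With three instances, r1 and r3 each have three tasks that can run nowhere else, so placing flexible tasks such as GAMESS_spher_eth on r2 where possible keeps r1 free for RMS.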
