Tchernykh — Workflow Scheduling with User Estimates (2012), Journal of Grid Computing


7/30/2019 Tchernykh Workflow Scheduling With User Estimate 2012 Journal Grid Computing
Journal of Grid Computing, ISSN 1570-7873, DOI 10.1007/s10723-012-9215-6

Multiple Workflow Scheduling Strategies with User Run Time Estimates on a Grid

Adán Hirales-Carbajal, Andrei Tchernykh, Ramin Yahyapour, José Luis González-García, Thomas Röblitz & Juan Manuel Ramírez-Alcaraz


Your article is protected by copyright and all rights are held exclusively by Springer Science+Business Media B.V. This e-offprint is for personal use only and shall not be self-archived in electronic repositories. If you wish to self-archive your work, please use the accepted author's version for posting to your own website or your institution's repository. You may further deposit the accepted author's version on a funder's repository at a funder's request, provided it is not made publicly available until 12 months after publication.


Multiple Workflow Scheduling Strategies with User Run Time Estimates on a Grid

Adán Hirales-Carbajal · Andrei Tchernykh · Ramin Yahyapour · José Luis González-García · Thomas Röblitz · Juan Manuel Ramírez-Alcaraz

Received: 22 December 2011 / Accepted: 13 March 2012 / © Springer Science+Business Media B.V. 2012

Abstract In this paper, we present an experimental study of deterministic non-preemptive multiple workflow scheduling strategies on a Grid. We distinguish twenty-five strategies depending on the type and amount of information they require. We analyze scheduling strategies that consist of two and four stages: labeling, adaptive allocation, prioritization, and parallel machine scheduling. We apply these strategies in the context of executing the Cybershake, Epigenomics, Genome, Inspiral,

A. Hirales-Carbajal · A. Tchernykh (B)
Computer Science Department, CICESE Research Center, Ensenada, BC, México
e-mail: [email protected]

A. Hirales-Carbajal
e-mail: [email protected]

R. Yahyapour · J. L. González-García · T. Röblitz
GWDG University of Göttingen, 37077 Göttingen, Germany
R. Yahyapour
e-mail: [email protected]

J. L. González-García
e-mail: [email protected]

T. Röblitz
e-mail: [email protected]

J. M. Ramírez-Alcaraz
Colima University, C.P. 28040 Colima, Col., México
e-mail: [email protected]

LIGO, Montage, and SIPHT workflow applications. In order to provide a performance comparison, we performed a joint analysis considering three metrics. A case study is given, and the corresponding results indicate that well-known DAG scheduling algorithms designed for single DAG and single machine settings are not well suited for Grid scheduling scenarios, where user run time estimates are available. We show that the proposed new strategies outperform other strategies in terms of approximation factor, mean critical path waiting time, and critical path slowdown. The robustness of these strategies is also discussed.

Keywords: Grid computing · Workflow scheduling · Resource management · User run time estimates

    1 Introduction

The problem of scheduling jobs with precedence constraints is an important problem in scheduling theory and has been shown to be NP-hard [1]. It arises in many industrial and scientific applications. The manner in which jobs can be allocated depends not only on their properties and constraints, but also on the nature of the infrastructure, which may be subject to an unpredictable workload generated by independent users in a distributed environment. While


scheduling for Grids has been subject to research for many years, we see a trend to more complex workflows as the Grid has become a common production environment in the scientific field.

Most of the studies have addressed scheduling of a single DAG on a single computer. Only a few studies have considered scheduling of multiple DAGs. Similarly, workflow scheduling has also been considered for Grids over the past decade. Several algorithms have been proposed for different types of Grids. Details are discussed in Section 3.

There are two major drawbacks of these approaches: they are based on the premise of knowledge of the exact execution times, and they consider a single optimization criterion.

In this paper, we consider workflow scheduling with no or estimated runtime information, and multiple optimization goals.

User run time estimates, available in many real Grid scenarios, have been shown to be quite inaccurate [23], and may result in bad workload distribution or inefficient machine utilization [24]. We evaluate whether this information has any practical benefit for workflow scheduling. To our knowledge, this issue has not been addressed in other studies. In addition, we propose two strategies which are free of run time information.

Most strategies only consider a single criterion, like total completion time, overall resource utilization, or slowdown. The multi-objective workflow allocation problem has rarely been considered so far. Especially the Grid scenario contains aspects which are multi-objective by nature. For instance, system performance related issues and user QoS demands must be considered. Moreover, resource providers and users have different, often conflicting performance goals: from minimizing response time to optimizing resource utilization. Grid resource management should include multiple objectives and use multi-criteria decision support in a way that the demands of both stakeholders are satisfied. In this paper, we propose a multi-criteria analysis of workflow scheduling algorithms as well as of the applicability of user job run time estimates in Grid environments.

Another factor that should be considered in Grid scenarios is the unpredictable workload generated by other workflows due to the distributed multi-user environment. Hence, we also propose strategies that take into account both workflow properties and current site state information.

More precisely, we consider workflow job scheduling in Grids with two levels. At the first level, jobs are allocated to a suitable Grid site, while local scheduling is independently applied to each site on the second level. The first level is a part of the Grid broker and is often called a Grid-layer scheduler. It typically has a general view of job requests, while specific details on the state of the resources remain hidden from it.

We address offline scheduling of workflows under consideration of zero release times of workflows and a non-clairvoyant execution, where the scheduler has no knowledge of the real execution length of the tasks in the workflow. While real Grid systems exhibit online behavior, it is known that jobs typically remain in queues for a significant time. Therefore, an offline scheduling strategy is beneficial for the efficient online scheduling of such a set of workflows in the current queue. Moreover, many offline scheduling algorithms also exhibit good performance in the online case. From theory, it is known that the performance bounds of offline scheduling strategies can be approximated for the online case [25]. As there are yet no established online workload traces of workflows, we start with existing job traces and model the missing workflow features. We present and evaluate two novel groups of workflow scheduling strategies named MWGS4 (Multiple Workflow Grid Scheduling 4 stages) and MWGS2 (Multiple Workflow Grid Scheduling 2 stages). The stages of these strategies consist of labeling (WGS_Label), adaptive allocation (WGS_Alloc), prioritization (PRIO), and parallel machine scheduling (PS).

We continue this paper by formally presenting our Grid scheduling model in Section 2, and discuss related work in Section 3. We introduce workflow scheduling algorithms and classify them in Section 4. The experimental setup, performance analysis methodology, and experimental results are presented in Section 5. We evaluate scheduling strategies with respect to the robustness of the schedules they produce in Section 6. Finally, we conclude with a summary and an outlook on future work in Section 7.


    2 The Scheduling Model

First, we address an offline (deterministic), non-preemptive, non-clairvoyant multiple parallel workflow scheduling problem on a computational Grid, where n workflow jobs J_1, J_2, ..., J_n must be scheduled on m parallel machines (sites) N_1, N_2, ..., N_m. Let m_i be the size of machine N_i (number of identical processors), and m_{1,m} be the number of processors in the Grid. Jobs are scheduled on a job-by-job basis; no rescheduling is allowed.

A workflow is a composition of tasks subject to precedence constraints. Workflows are modeled by a directed acyclic graph G_j = (V_j, E_j), where V_j is the set of tasks, and E_j = {(T_u, T_v) | T_u, T_v ∈ V_j, u ≠ v}, with no cycles T_u → T_v → T_u, is the set of edges between tasks in V_j. Each edge (T_u, T_v) ∈ E_j represents the precedence constraint between tasks T_u and T_v, such that T_u must be completed before the execution of T_v is initiated.
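For concreteness, a workflow G_j = (V_j, E_j) with this precedence-constrained readiness rule can be sketched in a few lines of Python. The class and field names below are our own illustration, not from the paper:

```python
# Minimal workflow DAG with precedence-constrained readiness.
from collections import defaultdict

class Workflow:
    def __init__(self, tasks, edges):
        # tasks: {task_id: execution time}; edges: iterable of (T_u, T_v)
        self.tasks = dict(tasks)
        self.preds = defaultdict(set)
        for u, v in edges:
            self.preds[v].add(u)

    def ready(self, done):
        """Tasks whose predecessors have all completed (the set `done`)."""
        return {t for t in self.tasks
                if t not in done and self.preds[t] <= done}
```

A task with no predecessors is ready immediately; any other task becomes ready only once all of its predecessors are in `done`, matching the release rule described above.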

Each workflow job J_j is described by the tuple (G_j, size_j, p_j, pG_j, p̃_j, p̃G_j, cpn_j): with G_j = (V_j, E_j); its size size_j, referred to as the workflow processor requirement or maximum degree of parallelism; critical path execution time p_j; total workflow execution time pG_j; user critical path run time estimate p̃_j; workflow run time estimate p̃G_j; and number of tasks in the critical path cpn_j.

The graph size (width) is the cardinality of the largest set of nodes in G_j that are disjoint (there is no path connecting any pair of them). Such a set represents nodes that might be executed simultaneously.

Each workflow task T_k is a sequential application and is described by the tuple (r_k, p_k, p̃_k): with release date r_k, execution time p_k, and user run time estimate p̃_k. Due to the offline scheduling model, the release date of a workflow is r_j = 0; however, the release date of a task r_k is not available before the task is released. Tasks are released over time according to the precedence constraints. A task can start its execution only after all its dependencies have been satisfied. At its release date, a task must be immediately and irrevocably allocated to a single machine. However, we do not demand that a specific processor is immediately assigned to a task at its release date; that is, the processor allocation of a task can be delayed. We

use g(T_k) = N_i to denote that task T_k is allocated to machine N_i, and n_i to denote the number of tasks allocated to site N_i. A machine must execute a task by allocating a processor to it for an uninterrupted period of time p_k. We use s_{ik} to denote the start time of task T_k on machine N_i. The total workflow processing time pG_j and the critical path execution cost p_j are unknown until the job has completed its execution. They represent the time requirements of all tasks of the job, pG_j = Σ_{T_v ∈ V_j} p_v, and the time requirements of the tasks that belong to the critical path, p_j = Σ_{T_v ∈ cp} p_v.

We allow multisite workflow execution; hence, tasks of a job J_j can be run on different sites. We consider jobs that tolerate latency, since sites may not be at the same geographical location. We also assume that the resources are stable and dedicated to the Grid.

We focus on the analysis of scheduling systems where all jobs are given and processed in the same batch. A set of available and ready jobs will be executed up to the completion of the last one. All jobs that arrive during this time interval will be processed in the next batch. This allows the online scheduling problem to be transformed and solved in an offline fashion. A relation between this scheme and the scheme where jobs are released over time, either at their release date or according to the precedence constraints, is known, and has been studied for different scheduling strategies for general or restricted cases [25]. There are several production systems in use whose primary focus is job batching and resource scheduling, like Condor [26] and PBS (Portable Batch System) [27]. They are supported as a job scheduler mechanism by several meta-schedulers, like GRAM (Grid Resource Allocation Manager) [28].

The following criteria are used to evaluate the proposed algorithms: approximation factor, critical path waiting time, and critical path slowdown. Let c_j and c_k be the completion time of job J_j and task T_k, respectively. Let C^i_max be the maximum completion time of tasks allocated to machine N_i. The approximation factor of the strategy is defined as ρ = C_max / C*_max, where C_max = max_{i=1..m} C^i_max is the makespan of a schedule, and C*_max is the optimal makespan. The waiting time of a task, t^w_k = c_k − p_k − r_k, is the difference between the completion time of the task, its execution time, and its release date.


The waiting time of a critical path, cpw_j = c_j − p_j, is the difference between the completion time of the job and the length of its critical path. It takes into account the waiting times of all tasks in the critical path. The critical path slowdown, cps_j = 1 + cpw_j / p_j, is the relative critical path waiting time and evaluates the quality of the critical path execution. A slowdown of one indicates zero waiting times for critical path tasks, while a value greater than one indicates that the critical path completion is increased by the waiting time of critical path tasks. The approximation factor is used to qualify the efficiency of the scheduling algorithms. We use the approximation factor over the makespan criterion, as both metrics differ in constants regardless of the scheduler being used. In addition, to estimate the quality of workflow executions, two workflow metrics are used. They are commonly used to express the objectives of different stakeholders of Grid scheduling (end-users, local resource providers, and Grid administrators).
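The three metrics follow directly from the definitions above; a small Python rendering (ours, for illustration, using the paper's symbol names) may make them concrete:

```python
# Evaluation metrics from Section 2, written out as plain functions.

def task_waiting_time(c_k, p_k, r_k):
    # t^w_k = c_k - p_k - r_k
    return c_k - p_k - r_k

def critical_path_waiting_time(c_j, p_j):
    # cpw_j = c_j - p_j
    return c_j - p_j

def critical_path_slowdown(c_j, p_j):
    # cps_j = 1 + cpw_j / p_j; equals 1 when critical path tasks never wait
    return 1.0 + critical_path_waiting_time(c_j, p_j) / p_j

def approximation_factor(site_makespans, c_max_opt):
    # rho = C_max / C*_max, with C_max = max_i C^i_max
    return max(site_makespans) / c_max_opt
```

For example, a job whose critical path has length 10 but completes at time 15 has cpw_j = 5 and cps_j = 1.5.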

    3 Related Work

For single parallel computers, several scheduling algorithms for tasks with arbitrary precedence constraints have been studied in the literature [1–6]. Most of them are designed to schedule only a single Directed Acyclic Graph (DAG) at a time.

Similarly, workflow scheduling has also been considered for Grids over the past decade. Several algorithms have been proposed for different types of Grids. This includes incremental workflow partitioning and full graph scheduling strategies [7], scheduling with communication and processing variations [8], QoS workflow scheduling with deadline and budget constraints [9], scheduling data intensive workflows [10], opportunistic workflow scheduling [11], and level based task clustering of data intensive workflows on computational Grids [12].

Another approach to schedule workflows is to decompose them into sub-graphs, which are then treated as moldable tasks [13, 14]. This can be used to schedule multiple workflows, where workflows are compacted into moldable tasks. This method reduces the problem of scheduling multiple workflows to scheduling multiple moldable tasks. Once a moldable task schedule is obtained, the constraints of the workflows can be removed, and tasks can be backfilled to further improve efficiency.

Workflow scheduling has diversified into many research directions: analysis of workflow structure properties in order to identify task clusters; minimization of critical path execution time; selection of admissible resources; allocation of suitable resources for data intensive workflows; scheduling subject to QoS constraints; fine tuning of workflow execution; and performance analysis [7–12, 15]. Most of them have considered single workflow scheduling problems.

Only a few studies have addressed scheduling of multiple DAGs. In [16], the authors discussed clustering DAG tasks into chains and allocating them to single machines. They proposed four scheduling heuristics that differ in the way several DAGs are scheduled. DAGs can either be scheduled independently by interleaving their execution, or can be combined into a single DAG. In [17], a fairness based approach for scheduling multiple DAGs on heterogeneous multiprocessors is proposed. Two strategies are considered: Fairness Policy based on Finishing Time (FPFT) and Fairness Policy based on Concurrent Time (FPCT). Both strategies arrange DAGs in ascending order of their slowdown value, select independent tasks from the DAG with minimum slowdown, and schedule them using Heterogeneous Earliest Finishing Time (HEFT) [18] or Hybrid.BMCT [19]. FPFT re-calculates the slowdown of a DAG each time a task of that DAG completes execution, while FPCT re-computes the slowdown of all DAGs each time any task in any DAG completes execution. In [20], online scheduling of multiple DAGs is addressed. The authors proposed two strategies based on aggregating DAGs into a single DAG. A modified FCFS and Service-On-Time (SOT) scheduling are applied. FCFS appends arriving DAGs to an exit node of the single DAG, while SOT appends arriving DAGs to a task whose predecessors have not completed execution and will be ready afterward. Once the single DAG has been built, scheduling is carried out by HEFT.


In [21], the authors focused on developing strategies that provide a proper distribution of resources among Parallel Task Graphs (PTGs), providing fairness and makespan minimization. Constraints are defined according to four general resource sharing policies: unbounded Share (S), Equal Share (ES), Proportional Share (PS), and Weighted Proportional Share (WPS). The S policy uses all available resources. The ES policy uses equal resources for each PTG. PS and WPS use resources proportional to the work of each PTG, where the work is considered as the critical path cost multiplied by the width of the PTG.

In [22], an Online Workflow Management (OWM) strategy for scheduling multiple mix-parallel workflows is proposed. OWM includes three stages: workflow scheduling, task scheduling, and resource allocation. Workflow scheduling, referred to as Critical Path Workflow Scheduling (CPWS), labels DAG tasks, sorts them, and stores them into independent buffers. Labeling is based on the upward rank strategy. The sorting arranges tasks in descending order of the task rank. Task scheduling, referred to as a rank hybrid phase, determines the task execution order. Tasks are sorted in descending order when all tasks in the queue belong to the same workflow. Otherwise, they are sorted in ascending order. Allocation assigns idle processors to tasks from the waiting queue. Tasks with the highest priority are allocated only if their resource requirements can be satisfied, subject to minimizing the earliest estimated finishing time.

    4 Proposed Workflow Scheduling Strategies

In this section, we present details of our scheduling strategies. We consider multi-stage scheduling named MWGS4 (Multiple Workflow Grid Scheduling with 4 stages) and MWGS2 (Multiple Workflow Grid Scheduling with 2 stages). The stages of these strategies consist of (1) labeling (WGS_Label), (2) adaptive allocation (WGS_Alloc), (3) prioritization (PRIO), and (4) parallel machine scheduling (PS). Hence, we regard them as MWGS4 = WGS_Label + WGS_Alloc + PRIO + PS and MWGS2 = WGS_Alloc + PS.

At the WGS_Label stage, workflow tasks are labeled by their ranks using user run time estimates.

At the WGS_Alloc stage, the list of tasks that are ready to be started is maintained. Tasks from the list are allocated to suitable resources using a given optimization criterion.

At the PRIO stage, labels are used to prioritize tasks.

At the PS stage, a PS algorithm is applied to the tasks that were allocated during the WGS_Alloc stage, for each parallel machine independently.

Note that WGS_Label procedures are designed to schedule only a single workflow at a time. To study the impact of task labeling and task allocation policies in a multiple workflow scheduling environment with unpredictable workload, we compare MWGS4 with the two stage strategies MWGS2 = WGS_Alloc + PS, where task labeling is not performed and prioritization is not used. Hence, we compare MWGS4 with workflow scheduling strategies that are based only on allocation policies designed for scheduling independent tasks.
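The two pipelines can be pictured as a composition of interchangeable stage functions. The following Python sketch is purely schematic; the stage signatures are our own illustration, not the authors' implementation:

```python
# Schematic composition of the MWGS4 / MWGS2 pipelines; label, alloc, prio
# and ps are placeholders for the concrete strategies discussed in the text.

def mwgs4(workflows, sites, label, alloc, prio, ps):
    labels = {w: label(w) for w in workflows}                 # 1. WGS_Label
    placement = alloc(workflows, sites)                       # 2. WGS_Alloc
    queues = {s: prio(placement[s], labels) for s in sites}   # 3. PRIO
    return {s: ps(queues[s], s) for s in sites}               # 4. PS per site

def mwgs2(workflows, sites, alloc, ps):
    # Same pipeline with labeling and prioritization omitted
    placement = alloc(workflows, sites)
    return {s: ps(placement[s], s) for s in sites}
```

Any labeling rule, allocation policy, queue ordering, and local scheduler can be plugged in, which is what produces the twenty-five strategy combinations evaluated in the paper.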

    4.1 Task Labeling (WGS_Label)

Task labeling (WGS_Label) prioritizes workflow tasks, considering each workflow independently. Labels are neither changed nor recomputed on completion of predecessor tasks. Task labels are used to identify properties of a given workflow, such as the task level, critical path length, etc. We distinguish two labeling strategies: Downward Rank (DR) and Constant Rank (CR). DR estimates the length of the longest path from the considered task to a terminal task in a workflow. Descending order of DR supports scheduling tasks on the critical path first.
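The DR rule can be sketched as a short recursion over the successor relation, using user run time estimates (the function and argument names below are our own illustration):

```python
# Downward Rank: DR(t) = est(t) + max over successors s of DR(s),
# i.e. the length of the longest path from t to a terminal task.

def downward_ranks(est, succs):
    """est: {task: run time estimate}; succs: {task: set of successors}."""
    ranks = {}
    def dr(t):
        if t not in ranks:
            ranks[t] = est[t] + max((dr(s) for s in succs.get(t, ())),
                                    default=0)
        return ranks[t]
    for t in est:
        dr(t)
    return ranks
```

A terminal task's rank is simply its own estimate; scheduling in descending rank order then favors the critical path.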

CR labels tasks using the following procedure: all tasks of the critical path are labeled with the critical path cost. Then the critical path tasks are removed together with the corresponding edges. The procedure is repeated recursively for each graph of the residual forest.


Figure 1 illustrates the CR labeling procedure. In Fig. 1a, the critical path nodes {A, C, F, I} (dashed lines), with total length 11, are labeled with 11 and removed. A dummy zero-cost task {P} is added to the residual graph, and the nodes with no predecessors are used as its successors (Fig. 1b). The new critical path nodes {P, E, G}, with length 7, are labeled with 7 and removed. The resulting DAG is shown in Fig. 1c. The process is repeated until the residual graph vertex set V_j is empty. User run time estimates are used for labeling. Descending order of CR supports scheduling all tasks on the longer path first.
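The procedure above can be sketched in Python as follows. This is our own illustration: the paper's dummy zero-cost task P is implicit here, since the code simply searches for the longest path remaining in the residual graph at each round.

```python
# CR (Constant Rank): label every task on the current critical path with that
# path's total cost, remove the path, and recurse on the residual forest.

def constant_ranks(est, edges):
    est = dict(est)                    # remaining tasks and their estimates
    succs = {t: {v for (u, v) in edges if u == t} for t in est}
    labels = {}
    while est:
        # downward rank (longest-path length) of each remaining task
        dr = {}
        def rank(t):
            if t not in dr:
                dr[t] = est[t] + max((rank(s) for s in succs[t] if s in est),
                                     default=0)
            return dr[t]
        for t in est:
            rank(t)
        # trace one critical path starting from the task with maximal rank
        node, cost, path = max(dr, key=dr.get), max(dr.values()), []
        while node is not None:
            path.append(node)
            target = dr[node] - est[node]   # rank the next path node must have
            node = next((s for s in succs[node]
                         if s in est and dr[s] == target), None)
        for t in path:                      # label and remove the path
            labels[t] = cost
            del est[t]
    return labels
```

For example, with tasks A(5), B(2), C(3) and edges A→B, A→C, the critical path A→C of cost 8 is labeled and removed first, leaving B with label 2.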

    4.2 Task Allocation (WGS_Alloc)

The list of tasks that are ready to be started is maintained. Independent tasks with no predecessors, and tasks whose predecessors have completed their execution, are entered into the list. Upon completion of a workflow task, its immediate successors may become available and are entered into the list. Allocation policies are responsible for selecting a suitable site for task allocation. We use the adaptive task allocation strategies presented in [24, 29] as independent task allocation policies, and introduce a novel strategy, MaxAR (Table 1). We distinguish allocation strategies depending on the amount of information they require. Four levels of available information are considered.

Level 1: Information about the status of available resources is available.

Level 2: Once a task has been submitted, the site with the least load per processor is known.

Level 3: All information of level 2 and the job run time estimates are available.

Level 4: Information of level 3, all local schedules, and site status are available.

Levels 1–4 information may be provided via a Grid information service. Note that the number and the sizes of the parallel machines are known.

In addition, the following single workflow scheduling strategies, used in many performance evaluation studies, are considered: HEFT (Heterogeneous Earliest Finishing Time First) and CPOP (Critical Path on Processor) [18].

HEFT schedules DAGs in two phases: job labeling and processor selection. In the job labeling phase, a rank value (upward rank) based on mean computation and communication costs is assigned to each task of a DAG. In the following, we consider only computation costs.

The CPOP labeling phase computes the rank of each task as the sum of its upward and downward ranks. It identifies tasks on the critical path as those having equal cost.

Prior to the job allocation phase, CPOP selects a processor that minimizes its computation cost. Such a processor is referred to as the critical path

Fig. 1 CR labeling of tasks (panels a, b, c)


Table 1 Task allocation strategies

MaxAR (level 1): Allocates task T_k to the site with the maximum fraction of available resources at time r_k: max_{i=1..m} avail_i / m_i, where avail_i denotes the number of available processors at time r_k. MaxAR does not consider the requirements of queued jobs.

Rand (level 1): Allocates task T_k to a machine whose number is randomly generated from a uniform distribution in the range [1, m].

MLp (level 2): Allocates task T_k to the site with the least load per processor at time r_k: min_{i=1..m} (n_i / m_i).

MLB (level 3): Allocates task T_k to the site with the least work per processor at time r_k: min_{i=1..m} Σ_{g(T_k)=N_i} p̃_k / m_i.

MWT (level 4): Allocates task T_k to the site with the minimum average task waiting time: min_{i=1..m} Σ_{g(T_k)=N_i} t^w_k / n_i.

MST (level 4): Allocates task T_k to the site with the earliest start time for this task: min_{i=1..m} s_{ik}.

MCT (level 4): Allocates task T_k to the site with the earliest site completion time: min_{i=1..m} C^i_max before allocating task T_k.

processor. In the allocation phase, tasks that are on the critical path are assigned to the critical path processor; otherwise, they are allocated to a processor that minimizes the task's earliest finishing time. Since knowledge of local schedules is required by both HEFT and CPOP, their allocation strategies are categorized as level 4.
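As an illustration, the low-information rules from Table 1 translate directly from their formulas into code. The site dictionaries and field names below are our own hypothetical representation, not part of the paper:

```python
# Level-1 (MaxAR) and level-2 (MLp) allocation rules from Table 1.
# Each site dict holds m (processors), avail (idle processors at r_k),
# and n (tasks currently allocated to the site).

def max_ar(sites):
    """MaxAR: site with the maximum fraction of available processors,
    max_i avail_i / m_i."""
    return max(sites, key=lambda s: s["avail"] / s["m"])

def mlp(sites):
    """MLp: site with the least load per processor, min_i n_i / m_i."""
    return min(sites, key=lambda s: s["n"] / s["m"])

sites = [
    {"name": "site1", "m": 3072, "avail": 512, "n": 40},
    {"name": "site2", "m": 1536, "avail": 640, "n": 10},
    {"name": "site3", "m": 768,  "avail": 96,  "n": 2},
]
```

With these example numbers, MaxAR picks site2 (640/1536 is the largest fraction of idle processors), while MLp picks site3 (2/768 is the smallest load per processor); the two rules can disagree because they look at different state.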

4.3 Local Queue Prioritization (PRIO) and Site Scheduling Algorithm (PS)

The Local Resource Management System (LRMS) prioritizes tasks in the waiting queue based on their labels and assigns the tasks with the highest priority to execution. To this end, the LRMS applies the local queue ordering procedure PRIO. Two local queue prioritizing policies are distinguished: increasing and decreasing, referred to as Shortest Chain First (SCF) and Longest Chain First (LCF); hence, PRIO ∈ {SCF, LCF}. A decreasing order allows high priority tasks to be scheduled first, while an increasing order schedules low priority tasks first. For tasks belonging to a critical path, an increasing order schedules the longest critical path tasks first, while a decreasing order schedules the shortest critical path tasks first. The labeling stage is omitted when workflow properties are not used for job prioritization.

Different scheduling algorithms may be used by the LRMS. We assume that the LRMS uses the online parallel scheduling algorithm EASY (First-Come-First-Serve with EASY backfilling). In order to apply EASY backfilling, user run time estimates are used.
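For readers unfamiliar with EASY, the core idea can be sketched as follows: serve the queue FCFS; when the head task does not fit, give it a reservation at the earliest time enough processors will be free, and let later tasks jump ahead only if they cannot delay that reservation. This is a simplified illustration, not the simulator's implementation:

```python
# Simplified FCFS with EASY backfilling. Each queued entry is
# (task, procs, estimate); `running` holds (finish_time, procs) pairs;
# `free` is the number of idle processors at time `now`.

def easy_backfill(queue, running, free, now):
    """Start queue-head tasks that fit, then backfill later tasks that
    cannot delay the reservation of the blocked head task."""
    started = []
    while queue and queue[0][1] <= free:          # plain FCFS while it fits
        task, procs, est = queue.pop(0)
        started.append(task)
        free -= procs
    if not queue:
        return started
    # Reservation ("shadow time") for the blocked head task: the earliest
    # moment enough processors have been released by running tasks.
    head_procs = queue[0][1]
    avail, shadow, extra = free, float("inf"), free
    for finish, procs in sorted(running):
        avail += procs
        if avail >= head_procs:
            shadow, extra = finish, avail - head_procs
            break
    # Backfill: a later task may start now if it finishes (per its user
    # estimate) before the shadow time, or only uses leftover processors.
    for item in list(queue[1:]):
        task, procs, est = item
        if procs <= free and (now + est <= shadow or procs <= extra):
            queue.remove(item)
            started.append(task)
            free -= procs
            if now + est > shadow:
                extra -= procs
    return started
```

This is where the user run time estimates enter: a task with an optimistic estimate may be backfilled even though its real execution would overrun the reservation, which is one reason estimate inaccuracy matters for the strategies studied here.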

    5 Experimental Validation

In order to provide a performance comparison, we use workloads from a parametric workload generator that produces workflows resembling those of real workflow applications such as Cybershake, Epigenomics, Genome, Inspiral, LIGO, Montage, and SIPHT [30]. We show that known single DAG scheduling algorithms are not suitable for Grid multiple workflow scheduling contexts, while our MWGS4 approach delivers good performance and outperforms other algorithms in terms of approximation factor, mean critical path waiting time, and critical path slowdown. However, one drawback is that these strategies are based on labeling and ordering of site local queues, and have significant computational complexity.

We show that the quite simple scheduler MaxAR, with minimal information requirements, can provide good performance for multiple workflow scheduling with user run time estimates. Note that it neither uses specific workflow information nor requires task labeling. Besides the performance aspect, it does not require additional management overhead such as DAG analysis,


requesting information about local schedules, and constructing preliminary schedules by the Grid broker.

While machines in real Grids often exhibit different forms of heterogeneity, like different hardware, operating systems, software, dynamic behavior, many different types of jobs, and other restrictions, the model of any study on Grid scheduling must be an abstraction of reality in order to perform performance evaluation in a repeatable and controllable manner. On the other hand, key properties of Grids should be observed to provide benefits for real installations. As computational Grids are successors of parallel computers, we extend their most basic model, that of Garey and Graham [31], which assumes identical processors as well as jobs with unknown processing times. Hence, we restrict ourselves to heterogeneous machines with different numbers of identical processors or cores, as the architectures of individual cores and their clock frequencies tend to be rather similar. This simplified model neither matches every real installation nor all real applications, but the assumptions are nonetheless reasonable. The proposed algorithms may serve as a starting point for algorithms that can be implemented in real computing Grids.

    5.1 Experimental Setup

Several Grid scheduling toolkits with different functionality have been developed in recent years. Most of them are used by their own developers, target a specific community, and are designed to study specific features, such as data replication, P2P applications, etc. [43]. GridSim [41] and SimGrid [42] are freely available, widely used and acknowledged frameworks for the simulation of distributed resource management and scheduling. GridSim was initially intended for grid economy scheduling, and later became used in other areas of Grid computing. Likewise, the scope of SimGrid has been generalized beyond a community-specific toolkit. However, validating the performance of multiple workflow scheduling strategies needs the flexibility to manage Grid scenarios with unpredictable workloads generated by other workflows due to the distributed multi-user environment. To this end, we require a Grid simulation tool that is able to take into account both workflow properties and current site state information.

All experiments are performed using the Grid scheduling simulator tGSF (Teikoku Grid Scheduling Framework). tGSF is a standard trace based simulator that is used to study Grid resource management problems. We have extended Teikoku to include Grid workflow job scheduling capabilities. Design details of the simulator are described in [32]. Two workload types are used in the experiments for a comprehensive analysis: workload A (306 workflows) and workload B (508 workflows). Their properties are described in Appendix A.1.

Note that the number of tasks in workload B (120,300 tasks) is almost five times that of workload A (25,002 tasks). The resource consumption is also five times as much. This gives us two different scenarios for simulation. The Cybershake, Epigenomics, Genome, Inspiral, LIGO, Montage, and SIPHT workflows [33] are used. They are publicly available via the Pegasus project portal [30], and include |V_j|, |E_j|, size_j, p_j, pG_j and cpn_j. Examples of the workflow types are shown in Appendix A.1 (Fig. 4). Their characteristics are presented in Appendix A.1 (Tables 7 and 8). These workflows were executed using the Pegasus Workflow Management System (Pegasus-WMS) on the TeraGrid.

    In this paper, we also use TeraGrid site information for the experimental analysis. We chose the TeraGrid since it has been used in workflow scheduling studies [14]. Existing TeraGrid workloads are recognized to be intensive, complex and multidimensional in terms of job type, job size and job duration. To make possible a variety of user and usage scenarios with multiple workflow workloads, we have chosen only three computational sites with a total of 5376 processors (Table 2), comprising the three smallest machines in the TeraGrid. Background workload (locally generated jobs), an important issue in non-dedicated Grid environments, is not addressed. Communication cost is not considered.

    Users of large parallel computers are typically required to provide run time estimates for submitted jobs. These estimates are used by production system schedulers to bound job execution time, and to avoid


    Table 2 Experimental testbed

    Description             Settings
    Workload type           Workflows; workload types A and B (see Appendix A.1)
    Number of grid sites    3
    Site sizes              Taken from the TeraGrid: site 1 (Bigred), 3072 procs;
                            site 2 (Ember), 1536 procs; site 3 (Pople), 768 procs;
                            5376 procs in total
    Metrics                 Mean critical path waiting time, critical path slowdown,
                            and approximation factor
    Number of experiments   30
    Experimental unit       LiDO Linux-HPC-Cluster at the Technische Universität Dortmund
    Problem model           Offline

    wasting resources. Thus, when a job reaches such a limit, it is typically aborted. To our knowledge, run time estimates for workflows are not provided. Thus, for simulation purposes, we set the run time estimate of job j to p_j + c·p_j, where c is a randomly generated value uniformly distributed between 0 and 5, and set the estimate of each task T_k ∈ V_j to that of p_j, modeling the inaccuracy of user run time estimates as proposed in [34]. Note that only level 3 and 4 allocation strategies use run time estimates.
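The estimate-inflation model described above can be written as a minimal Python sketch; the function name and the sample value below are hypothetical, chosen only to illustrate the computation.

```python
import random

def estimate_runtime(p_j, rng=random.Random(42)):
    """Inflate the real critical path cost p_j by a random factor c ~ U(0, 5):
    estimate = p_j + c * p_j, as in the model above (sketch, not tGSF code)."""
    c = rng.uniform(0.0, 5.0)
    return p_j + c * p_j

# Every task of a workflow inherits the workflow-level estimate in this model.
workflow_estimate = estimate_runtime(1336.0)  # e.g., an Inspiral critical path cost
assert 1336.0 <= workflow_estimate <= 6.0 * 1336.0
```

The estimate is thus between 1x and 6x the real run time, which is consistent with the large over-estimation typically observed in production traces.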

    Table 3 summarizes site scheduling strategies, site queuing policies, workflow scheduling strategies, and workflow labeling policies. Two classes of strategies, MWGS2 and MWGS4, are evaluated.

    5.2 Performance Analysis

    A good scheduling algorithm should schedule jobs to achieve high Grid performance while satisfying various user demands in an equitable fashion. Often, resource providers and users have different, conflicting performance goals: from minimizing response time to optimizing resource utilization. Grid resource management involves multiple objectives and may use multi-criteria decision support, for instance, based on Pareto optimality. However, it is very difficult to achieve the fast solutions needed for Grid resource management by using Pareto dominance.

    The problem is very often simplified to a single objective problem or to different methods of combining objectives. There are various ways to model preferences; for instance, they can be given explicitly by stakeholders to specify the importance of every criterion or a relative importance between criteria. Due to the different nature of the criteria, the actual difference may have a different meaning.

    In order to provide effective guidance in choosing the best strategy, we performed a joint analysis of several metrics according to the methodology used in Ramírez-Alcaraz et al. [24].

    They use an approach to multi-criteria analysis assuming equal importance of each metric. The goal is to find a robust and well performing strategy under all test cases, with the expectation that it will also perform well under other

    Table 3 Experimental settings

    MWGS model                             Parameters                   Description
    I:  WGS_Alloc + PS                     Labeling is not performed;   9 scheduling strategies are analyzed: MLp, MaxAR, MLB,
                                           PRIO = FCFS; PS = EASY       MWT, MCT, MST, CPOP, Rand, and HEFT
    II: WGS_Label + WGS_Alloc + PRIO + PS  WGS_Label ∈ {DR, CR};        16 workflow scheduling strategies and the 3 best rigid
                                           PRIO ∈ {SCF, LCF};           job strategies are analyzed:
                                           PS = EASY                    DR+MLp+LCF, DR+MaxAR+LCF, DR+MLB+LCF, DR+MST+LCF,
                                                                        DR+MLp+SCF, DR+MaxAR+SCF, DR+MLB+SCF, DR+MST+SCF,
                                                                        CR+MLp+LCF, CR+MaxAR+LCF, CR+MLB+LCF, CR+MST+LCF,
                                                                        CR+MLp+SCF, CR+MaxAR+SCF, CR+MLB+SCF, CR+MST+SCF;
                                                                        MLp, MaxAR, MLB


    conditions, e.g., with different Grid configurationsand workloads.

    The analysis is conducted as follows. First, weevaluate the degradation in performance of eachstrategy under each of the three metrics (Table 4).

    They are well known performance metrics commonly used to express the objectives of different stakeholders of Grid scheduling (end-users, local resource providers, and Grid administrators). The objective of the mean critical path waiting time and slowdown metrics is to evaluate the quality of critical path execution in the context of scheduling several workflows.

    Note that in the experimental analysis the lower bound of the optimal completion time

    C*_max = max{ max_{j=1,...,n} p_j , (Σ_{j=1}^{n} p^G_j) / m },

    where m is the total number of processors, is used instead of the optimal makespan C_max to calculate the approximation factor. The degradation in performance is computed relative to the best performing strategy for each metric, as follows: (strategy_metric / best_metric − 1) · 100.

    Thus, each strategy is now characterized by 3 numbers, reflecting its relative performance degradation under the test cases. In [24], these 3 values (assuming equal importance of each metric) are averaged and ranked. The best strategy, with the lowest average performance degradation, has rank 1. However, some metrics might have little variation by nature or in a given scenario, and the averaging would then yield a lesser impact on the overall score. In this paper, a different approach is taken: we average the individual ranks instead of the metric degradations themselves.
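The degradation-and-ranking computation can be sketched as follows; the strategy names and metric values below are hypothetical, chosen only to illustrate the procedure, not measured data.

```python
def degradation(values):
    """Percent degradation relative to the best (smallest) metric value:
    (strategy_metric / best_metric - 1) * 100."""
    best = min(values.values())
    return {s: (v / best - 1.0) * 100.0 for s, v in values.items()}

def ranks(values):
    """Rank strategies by a metric value: the best (smallest) gets rank 1."""
    ordered = sorted(values, key=values.get)
    return {s: i + 1 for i, s in enumerate(ordered)}

# Hypothetical per-metric results for three strategies (illustration only).
metrics = {
    "approx": {"MLp": 1.0, "MaxAR": 1.1, "MCT": 48.2},
    "cps":    {"MLp": 2.0, "MaxAR": 2.1, "MCT": 4.0},
    "cpw":    {"MLp": 10.0, "MaxAR": 150.0, "MCT": 9000.0},
}

# Average the per-metric ranks (instead of averaging the degradations).
rank_sum = {}
for vals in metrics.values():
    for s, r in ranks(vals).items():
        rank_sum[s] = rank_sum.get(s, 0) + r
avg_rank = {s: total / len(metrics) for s, total in rank_sum.items()}

assert avg_rank == {"MLp": 1.0, "MaxAR": 2.0, "MCT": 3.0}
assert degradation(metrics["cpw"])["MaxAR"] == 1400.0  # (150/10 - 1) * 100
```

Averaging ranks rather than degradations keeps a metric with extreme values (such as cpw here) from dominating the overall score.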

    Note that we try to identify strategies whichperform reliably well in different scenarios; thatis we try to find a compromise that considers allof our test cases. For example, the rank of the

    Table 4 Metrics

    Metric                           Definition
    Approximation factor             C_max / C*_max
    Mean critical path waiting time  cpw = (1/n) Σ_{j=1}^{n} cpw_j
    Mean critical path slowdown     cps = (1/n) Σ_{j=1}^{n} cps_j

    strategy under the average performance degradation need not coincide with its rank under any of the metrics individually.

    5.3 Experimental Results

    In this section, 2- and 4-stage scheduling systems are simulated with various types of input processes and constraints. The following simulation results compare the performance of the 25 strategies (6 MWGS2 strategies designed for independent tasks, 2 workflow scheduling strategies, Rand, and 16 MWGS4 priority based workflow scheduling strategies). We conduct a comprehensive performance evaluation for two different scenarios considering three metrics (Tables 3 and 4). In experiment I, an analysis of 9 two-stage strategies is performed based on workload A. In experiment II, we select the 3 best performing strategies of the previous experiment, add 16 four-stage workflow strategies, and analyze them with the heavier load B.

    5.3.1 Scheduling Strategies

    In this section, we evaluate the scheduling strategies MWGS2 = WGS_Alloc + PS using workload A (Table 3). First, we compare allocation strategies considering the approximation factor, mean critical path waiting time, and mean critical path slowdown independently. Then, we perform a joint analysis of these metrics according to the methodology described in Section 5, and present their ranking when considering the average over all metrics.

    Table 5 shows the performance degradation of all strategies for the approximation factor, cpw, and cps. It also shows the mean degradation of each strategy when considering the average over all metrics, and the ranking for all test cases. A small percentage of degradation indicates that the performance of a strategy for a given metric is close to that of the best performing strategy for the same metric. Therefore, small degradations represent better results.

    From Table 5 we observe that MCT and MWT are the strategies with the worst approximation factors, with 4717% and 3005% degradation in performance. One of the reasons for such large values


    Table 5 Rounded performance degradation

    Metric    CPOP      HEFT     MCT        MLB    MLp  MST      MWT         MaxAR  Rand
    Approx.   8         5        4717       0      0    5        3005        0      0
    cps       61        50       98         5      0    50       98          3      10
    cpw       15470095  9823658  789285050  56269  9    9862349  1167811792  13862  143334
    Mean      5156721   3274571  263096622  18758  3    3287468  389271632   4622   47781
    Ranking   7         5        8          3      1    6        9           2      4

    is the high inaccuracy of the user run time estimates used in scheduling. This problem is also addressed in [24, 34]. The MLp strategy has small degradations in all metrics. We see that the approximation factor and cps metrics have less variation compared with cpw, and thus a lesser impact on the overall score. The approximation factors of MLB, MLp, MaxAR, and Rand are near the lowest value. MLp, MaxAR and MLB have similar behavior in terms of critical path slowdown. Since our model is a simplified representation of a system, we can conclude that these strategies might have similar efficiency in a real Grid environment when considering the above metrics.

    However, there exist significant differences when comparing cpw, where MLp shows the best performance. In this strategy the critical path completion time does not grow significantly with respect to p_j; therefore, tasks in the critical path experience small waiting times. Results also show that for all strategies a small mean critical path waiting time degradation corresponds to a small mean critical path slowdown.

    We presented the averaged metric degradations to evaluate the performance of the strategies, and showed that some strategies tend to dominate the results. The degradation approach provides the percent improvement, but does not reveal the negative effects of allowing a small portion of the problems to dominate the conclusions. To analyze these possible negative effects of letting a small portion of problem instances with large deviation dominate conclusions based on averages, and to help with the interpretation of the data generated by the benchmarking process, we present performance profiles of our strategies in Fig. 2.

    The performance profile ρ(τ) is a non-decreasing, piecewise constant function that gives the probability that a performance ratio r = strategy_metric / best_metric is within a factor τ of the best ratio [35, 36].
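This definition can be sketched directly; the ratios below are hypothetical values for one strategy over five problem instances, used only to illustrate the computation.

```python
def performance_profile(ratios, tau):
    """rho(tau): fraction of problem instances whose performance ratio
    r = strategy_metric / best_metric is within a factor tau of the best."""
    return sum(1 for r in ratios if r <= tau) / len(ratios)

# Hypothetical performance ratios of one strategy over five instances.
ratios = [1.0, 1.02, 1.08, 1.3, 2.0]
assert performance_profile(ratios, 1.08) == 0.6  # within 8% of the best on 60%
assert performance_profile(ratios, 2.0) == 1.0
```

Because the function is cumulative, a strategy whose profile lies above another's for all τ dominates it on this test set.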

    Fig. 2 Performance profiles of CPOP, HEFT, MaxAR, MLB, MLp, MST, and Rand: (a) τ = 1..5·10^8; (b) τ = 1..1.2


    The function ρ(τ) is a cumulative distribution function. Strategies with a large probability ρ(τ) for smaller τ are to be preferred. For instance, in Fig. 2b, ρ(1.08) = 0.65 means that MaxAR performed at most 8% worse than the best strategy on 65% of the instances considered.

    Figure 2 shows the average performance profile ρ̄(τ) = (1/3) Σ_{s=1..3} ρ_s(τ) over the three metrics, assuming equal importance of each metric, where τ ∈ [1, r_M] and r_M is the maximum over all performance ratios [35]. We show the performance profiles in different ranges of τ to provide objective information for the analysis of a test set. Figure 2a and b show the performance profiles of the 7 strategies in the intervals τ = [1, 5·10^8] and τ = [1, 1.2], respectively. The figure displays the large discrepancies in the performance ratios on a substantial percentage of the problems.

    MLp has the highest ranking and the highest probability of being the best strategy. The probability that it is the winner on a given problem within factors 1.0–1.2 of the best solution is about 1. If we choose being within a factor of 1.07 as the scope of our interest, then the probability that the low complexity strategy MaxAR is the winner is 0.65. If we choose being within a factor of 1.15 of the best solution, then either MaxAR, MLB or Rand would suffice with a probability of 0.65.

    The most significant aspect of Fig. 2 is that on this test set MLp dominates the other strategies: its performance profile lies above all others for all values of the performance ratio. We see that an appropriate distribution of jobs over the Grid yields higher performance than an allocation of jobs based on user run time estimates and information on local schedules, even when specialized workflow scheduling strategies are used. It turns out that MLp is the best performing algorithm for scheduling several workflows. MaxAR provides the second best result. Both take into account dynamic site state information. Note that they use neither specific workflow information nor task labeling. Besides the performance aspect, the use of these strategies does not require additional management overhead such as DAG analysis and information about local schedules.

    5.3.2 Priority Based Workflow Scheduling Strategies

    In experiment II, we employ workflow properties for scheduling purposes. We evaluate 16 priority based workflow scheduling strategies MWGS4 = WGS_Label + WGS_Alloc + PRIO + PS, where the two DAG labeling algorithms CR and DR (Section 3) are used, and compare them with 3 non-priority based scheduling strategies MWGS2 = WGS_Alloc + PS for a comprehensive analysis. The WGS_Label procedure generates labels that are used as task priorities. The PRIO procedure uses these priorities to sort site local queues. Results are presented in Table 6.
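The PRIO stage can be illustrated with a small sketch; the queue entries, the `label` field, and the exact SCF/LCF semantics (smallest/largest critical path label first) are simplifying assumptions for illustration, not the simulator's implementation.

```python
def prio_sort(queue, policy="SCF"):
    """Sort a site's local task queue by its workflow critical path label:
    SCF = smallest critical path first, LCF = largest critical path first.
    (Sketch: `label` stands in for the WGS_Label output of each task.)"""
    return sorted(queue, key=lambda task: task["label"], reverse=(policy == "LCF"))

# Hypothetical queue entries: task id plus its workflow's critical path label.
queue = [{"id": "t1", "label": 300}, {"id": "t2", "label": 50}, {"id": "t3", "label": 120}]
assert [t["id"] for t in prio_sort(queue, "SCF")] == ["t2", "t3", "t1"]
assert [t["id"] for t in prio_sort(queue, "LCF")] == ["t1", "t3", "t2"]
```

Since Python's sort is stable, tasks with equal labels keep their arrival order, which matches a FCFS tie-break.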

    We find that MLp and MaxAR with different labeling and prioritization policies are more beneficial than the other strategies. Their average degradations are less than 18.6%.

    Results show that the labeling strategies DR and CR affect the performance of the workflow scheduling strategies in a Grid setting only very slightly. We find that the classical single DAG scheduling algorithms CPOP and HEFT, designed for single machine settings, are not suitable for Grid scheduling contexts. Workflow scheduling algorithms that use task allocation strategies based on user run time estimates give unstable results.

    As expected, LCF prioritization performs well for minimizing the approximation factor with the MLB, MLp, and MaxAR allocation strategies, irrespective of the labeling strategy used, while SCF decreases the mean critical path slowdown and the mean critical path waiting time.

    Results show that CR and DR labeling reduce degradation in most cases. DR labeling performs better than CR for each metric and for the average over all metrics. SCF ordering performs better than LCF for the considered metrics.

    To analyze possible bias in the results and avoid the negative effects of allowing a small portion of the problem instances with large deviation to dominate the conclusions, we present performance profiles ρ(τ) of our strategies in Fig. 3.

    To reduce the complexity of the presented results, we show only the 7 strategies with rankings from 1 to 4: (2) DR+MaxAR+LCF, (5) DR+MLp+SCF, (10)


    Table 6 Performance degradations of all strategies

    #   Strategy       Approx.  cps   cpw    Mean   Ranking
    2   DR+MaxAR+LCF   0        0     0      0      1
    5   DR+MLp+SCF     0        0     0      0      1
    10  CR+MaxAR+LCF   0        0     0      0      1
    13  CR+MLp+SCF     0        0     0      0      1
    18  MaxAR          7.9      0     0      2.6    2
    9   CR+MLp+LCF     0        3.8   24.5   9.4    3
    14  CR+MaxAR+SCF   2.1      4.6   30     12.2   4
    1   DR+MLp+LCF     0        4     48.2   17.4   5
    6   DR+MaxAR+SCF   1.8      7     47.1   18.6   6
    17  MLp            13.9     5.3   51.9   23.7   7
    7   DR+MLB+SCF     11.5     1.7   71.2   28.1   8
    15  CR+MLB+SCF     11.8     4.6   74.6   30.3   9
    11  CR+MLB+LCF     0        9.5   190.6  66.7   10
    3   DR+MLB+LCF     0        10    234.9  81.6   11
    19  MLB            18.2     14.3  219.7  84.1   12
    16  CR+MST+SCF     51.1     22.7  558.1  210.6  13
    8   DR+MST+SCF     50.7     24.1  592.1  222.3  14
    12  CR+MST+LCF     513.4    7.3   345.8  288.8  15

    CR+MaxAR+LCF, (13) CR+MLp+SCF, (18) MaxAR, (9) CR+MLp+LCF, and (14) CR+MaxAR+SCF (see Table 6). Figure 3 shows the performance profiles in the intervals τ = [1, 10] and τ = [1, 1.2]. We see that (5) DR+MLp+SCF has the highest probability of being the best strategy. The probability that it is the winner on a given problem within a factor of 1.1 of the best solution is about 0.70, and within a factor of 1.2 about 1 (Fig. 3b). It dominates the other strategies: its performance profile lies above all others. It also has rank 1 for the average metric degradation.

    If we choose being within a factor of 1.2 as the scope of our interest, then (5) DR+MLp+SCF would suffice with a probability of 0.99, (13) CR+MLp+SCF with a probability of 0.7, (14) CR+MaxAR+SCF with a probability of 0.62, or (18) MaxAR with a probability of 0.5 (Fig. 3b).

    We see that an appropriate distribution of load per processor by the MLp strategy with priority based

    Fig. 3 Performance profiles of strategies 2, 5, 10, 13, 18, 9, and 14: (a) τ = [1, 10]; (b) τ = [1, 1.2]


    scheduling yields higher performance than an allocation of jobs based on user run time estimates and information on local schedules, even when specialized workflow scheduling strategies are used.

    Strategies (2) DR+MaxAR+LCF, (5) DR+MLp+SCF, (10) CR+MaxAR+LCF, and (13) CR+MLp+SCF have the best ranking considering the average degradation of the three metrics. Considering the performance profiles, it turns out that DR+MLp+SCF is the best performing algorithm for scheduling several workflows.

    Note that these strategies use specific workflow information and require task labeling and ordering. Hence, they suffer from a larger time complexity. Moreover, ordering by PRIO is the responsibility of the Grid site policy; Grid resource management might not have control over it, and local clusters might not perform task ordering.

    The degradation of MaxAR has rank 2; however, its average degradation is only within 2.6% of the best strategy. If we choose being within a factor of 1.2 of the best solution, MaxAR would suffice with a probability of 0.5.

    We conclude that in real Grid scenarios this strategy might have performance similar to the best ones. Besides the performance aspect, the use of MaxAR does not require additional management overhead such as DAG analysis and site local queue ordering. It has small time complexity.

    6 Robustness

    Most of the work in the literature has focused on DAG scheduling heuristics that minimize the makespan. Only a few studies have evaluated DAG scheduling with respect to the robustness of the schedules they produce.

    In [37], a bi-objective algorithm for scheduling independent tasks and DAGs subject to optimizing makespan and reliability is addressed. The RHEFT (Reliable HEFT) strategy is proposed, where tasks are allocated to the resources with the smallest values of failure rate and execution time. The authors showed that optimizing both criteria simultaneously is not successful; therefore, a user has to find a compromise between minimization of the makespan and maximization of the reliability. In [38], the authors extend the results of [37] by developing a scheduling strategy that increases reliability by replicating tasks. Two fundamental problems are addressed: determining the number of replicas and defining a criterion for a proper allocation of the replicated tasks. Scheduling is performed by a list scheduling based heuristic called Scheduling by Replication (LSR). LSR allocates replicas on the least loaded processors.

    In [39], the robustness of twenty DAG scheduling heuristics from the literature designed to minimize the makespan was studied. A stochastic model was used to evaluate the standard deviation of the makespan, so that the higher the deviation, the less robust the strategy. The authors showed that static scheduling strategies tend to be more robust. They demonstrated that makespan and robustness are correlated, and addressed problems of uncertainty due to resource failures.

    In [40], the authors study the impact of information quality on performance estimation accuracy for different scheduling strategies by adding a percentage of random noise to the performance estimates.

    We study another aspect of the robustness of DAG scheduling heuristics in the Grid environment. MWGS strategies work under the assumption that when a broker requests site state information, this information is provided instantly and is most recent. In a real Grid scenario, this assumption often does not hold due to several factors such as communication delays, congestion, processing load, etc. An information service is used to collect information from multiple scattered resources. It collects configuration, state, load, etc., and reports them periodically or on request. It is also responsible for assembling this information, handling conflict resolution, identifying critical events, generating reports, and performing various automated actions.

    Once the data have been retrieved, they can be used for resource allocation decisions. If the data are not updated frequently, they can become stale. Such a scenario is highly likely when the system load is high. In this section, we study the impact of data retrieval delay on the performance of allocation policies, and on the overall Grid performance. We assume that site state


    information on the Broker level is updated at a given time interval. Resource allocation decisions made within this time interval are based on the same, potentially out of date, information.

    Experiments were conducted using workload B and the strategies which showed good performance (see Section 5.3.1): MWGS2 with WGS_Alloc ∈ {MLB, MLp, MaxAR} and PS = EASY.

    In order to evaluate their robustness, we incorporate a site refresh interval into the broker information service. We use refresh intervals of 0, 2^1, 2^2, ..., 2^21 milliseconds (up to about 35 min).
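The refresh-interval model can be sketched as a broker-side cache; the class and field names below are hypothetical, and the simulator's actual implementation is the one described in [32].

```python
class BrokerInfoCache:
    """Broker-side site state cache refreshed every `interval_ms` milliseconds.
    Between refreshes, all allocation decisions see the same, possibly stale,
    snapshot of the site state (sketch of the simulated model)."""

    def __init__(self, interval_ms, fetch):
        self.interval = interval_ms
        self.fetch = fetch          # callable returning fresh site state
        self.snapshot = None
        self.last_refresh = None

    def state(self, now_ms):
        # Refresh only when the interval has elapsed; otherwise serve stale data.
        if self.last_refresh is None or now_ms - self.last_refresh >= self.interval:
            self.snapshot = self.fetch()
            self.last_refresh = now_ms
        return self.snapshot

# Hypothetical site loads observed by two successive fetches.
loads = iter([{"site1": 10}, {"site1": 99}])
cache = BrokerInfoCache(4000, lambda: next(loads))
assert cache.state(0) == {"site1": 10}     # initial fetch
assert cache.state(3999) == {"site1": 10}  # stale within the interval
assert cache.state(4000) == {"site1": 99}  # refreshed
```

With a 4 s interval, every allocation decision made within those 4 s is based on the same snapshot, which is exactly the staleness effect measured below.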

    The degradation in performance of each strategy under each of the three metrics versus the refresh interval is shown in Figs. 9, 10 and 11 in the Appendix. The mean performance degradation of each strategy under each of the three metrics versus the refresh interval is shown in Fig. 12 in the Appendix. We see that the workflow allocation strategies under the short delay scenario are invariant to stale information for refresh intervals of 0 to 4 s.

    The MaxAR and MLp strategies are more robust for this refresh interval. The performance of the strategies deteriorates when the refresh interval exceeds 4 s. The degradations vary greatly if the refresh interval is on the order of minutes. We conclude that MaxAR is a robust and well performing strategy.

    7 Conclusions

    Effective workflow management requires an efficient allocation of tasks to limited resources over time, and is currently the subject of many research projects. Multiple workflows add a new complexity to the problem. The manner in which the allocation of a workflow task can be done depends not only on the workflow properties and constraints, but also on the unpredictable workload generated by other workflows in the distributed context.

    In this paper, we concentrate on allocation strategies that take into account both dynamic site state information and workflow properties. We characterize the type and amount of information they require, and analyze strategies that consist of two and four stages: labeling, adaptive allocation, prioritization and parallel machine scheduling.

    We conduct a comprehensive performance evaluation study of 25 workflow scheduling strategies in Grids using simulation. In order to provide effective guidance in choosing the best strategy, we performed a joint analysis of three metrics (approximation factor, mean critical path waiting time, and critical path slowdown) according to a degradation methodology that considers multi-criteria analysis assuming equal importance of each metric.

    To analyze possibly biased results and the negative effects of allowing a small portion of the problem instances with large deviation to dominate the conclusions, we presented performance profiles of our strategies. The goal is to find robust and well performing strategies under all test cases, with the expectation that they will also perform well under other conditions, e.g., with different Grid configurations and workloads.

    Our study results in several contributions.

    - We identify several levels of information available for making scheduling decisions with respect to workflow and workflow task allocation.
    - We discussed and analyzed a variety of two- and four-stage allocation strategies with user run time estimates together with the EASY backfilling local scheduling algorithm.
    - We propose to use adaptive allocation strategies that demonstrate high efficiency as independent job allocation policies and take into account dynamic site state information.
    - We demonstrate that DAG scheduling algorithms designed for a single DAG and single machine settings are not suitable for Grid scheduling contexts.
    - We show that information about local schedules traditionally used for workflow allocation, like Earliest Finishing Time, does not help to improve the outcome of multiple workflow scheduling strategies when limited information about task run times is available.
    - Algorithms that use task allocation strategies based on user run time estimates give unstable results.

    When we examine the overall Grid performancebased on real data, we find that an appropriatedistribution of load per processor by MLp with


    priority based scheduling yields higher performance than an allocation of jobs based on user run time estimates and information on local schedules, even when specialized workflow scheduling strategies are used. However, these strategies use specific workflow information and require task labeling and ordering. Hence, they suffer from a significant time complexity. Moreover, ordering by PRIO is the responsibility of the Grid site policy; Grid resource management might not have control over it, and local clusters might not perform task ordering.

    MaxAR is the second best strategy. Its degradation in performance has rank 2; however, it is only within 2.6% of the best strategy. Even if we choose being within a factor of 1.2 of the best solution, MaxAR would suffice with a probability of 0.5.

    In real Grid environments this strategy might have performance similar to the best ones when considering the above metrics. Besides the performance aspect, the use of MaxAR does not require additional management overhead such as DAG analysis, site local queue ordering, and constructing preliminary schedules by the Grid broker. It has small time complexity.

    In conclusion, for practical purposes the quite simple MaxAR scheduler with minimal information requirements can provide good performance for multiple workflow scheduling.

    Our results consider offline scheduling, which can be used as a starting point for addressing the online case. Online Grid workflow management brings new challenges to the above problem, as it requires more flexible balancing of the load of workflows and their tasks over time. Many offline scheduling strategies perform reasonably well when used in online scenarios. The proposed strategies can be easily adapted to solve the online multiple workflow scheduling problem without extra computational complexity. However, further study is required to assess their actual efficiency and effectiveness. This will be the subject of future work, requiring a better understanding of online workflows and, ideally, the availability of Grid logs with workflows from real installations.

    Currently, such data are not available. Moreover, scheduling in non-dedicated Grid environments, where the background workload (locally generated jobs) is unpredictable, is another important issue to be addressed. The communication latency, which is a major factor in data Grid scheduling performance, is an additional relevant issue to be considered.

    Acknowledgements Part of this work was supported by UABC under grant #4006/C/35/14, by the Deutscher Akademischer Austauschdienst (DAAD) under grants A/07/74928 and A/09/03177, and by CONACYT under grant #32989-A.

    Appendix

    A.1 Workload Characteristics

    To facilitate the evaluation of the proposed multiple workflow scheduling algorithms, Fig. 4 shows examples of the workflow types available from the Pegasus project [30]. Tables 7 and 8 list the characteristics of the used workloads, where J_j is the number of jobs; |V_j| is the number of tasks in a job; |E_j| is the number of edges; cpn_j is the average number of tasks in the critical path; p_j is the critical path cost; and size_j is the average width of a given DAG.
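For illustration, the critical path cost p_j of a workflow DAG can be computed by a longest-path pass over a topological order; this is a generic sketch with hypothetical task run times, not Pegasus or tGSF code.

```python
from collections import defaultdict

def critical_path_cost(tasks, edges):
    """Critical path cost p_j of a workflow DAG: the longest chain of task
    run times. `tasks` maps task id -> run time; `edges` lists (u, v)
    precedence pairs (u must finish before v starts)."""
    succ = defaultdict(list)
    indeg = {t: 0 for t in tasks}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    # Longest-path dynamic program over a topological order (Kahn's algorithm).
    finish = {t: tasks[t] for t in tasks}
    ready = [t for t in tasks if indeg[t] == 0]
    while ready:
        u = ready.pop()
        for v in succ[u]:
            finish[v] = max(finish[v], finish[u] + tasks[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return max(finish.values())

# Tiny diamond-shaped workflow with hypothetical run times in seconds.
tasks = {"a": 10, "b": 30, "c": 5, "d": 10}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
assert critical_path_cost(tasks, edges) == 50  # a -> b -> d
```

The p_j values in Tables 7 and 8 are of exactly this kind: the run time of the longest precedence chain, independent of the number of processors.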

    Figures 5 and 6 show the critical path run time in workload A. 83% of the workflows have a short critical path run time, in the range between 28 and 83 min, while the remaining 17% have a critical path run time within the range of 137–520 min. The width of the workflows is within the range of 5 to 162 tasks, and the number of tasks per job is within the range of 24 to 200.

    Figures 7 and 8 show the critical path run time in workload B. 65% of the workflows have a short critical path run time, in the range between 28 and 83 min, while the remaining 35% have a critical path run time of 583 min on average. The width of the workflows is within the range of 5–578 tasks, and the number of tasks per job is within the range of 24–700.


    Fig. 4 Examples of workflow types: (a) Montage, (b) CyberShake, (d) LIGO, (e) SIPHT

    Table 7 Characteristics of workload A

    Log name     J_j  |V_j|  |E_j|  p_j (s)  p_j range (s)  cpn_j  size_j
    Cybershake   20   200    392    226      [183–274]      4      99
    Cybershake   20   100    192    222      [169–271]      4      49
    Cybershake   20   50     92     224      [159–272]      4      24
    Epigenomics  1    100    122    29878    29878          8      24
    Epigenomics  1    46     54     7734     7734           9      10
    Epigenomics  1    24     27     5586     5586           8      5
    Gnome        20   200    244    20585    [9397–32434]   9      48
    Gnome        20   100    122    16322    [7624–31212]   9      24
    Gnome        20   50     54     20095    [9026–33091]   9      10
    Inspiral     1    100    119    1336     1336           6      24
    Inspiral     1    50     60     1415     1415           6      12
    Inspiral     1    30     35     1337     1337           6      7
    Ligo         20   200    241    1389     [1368–1408]    6      48
    Ligo         20   100    120    1349     [1290–1411]    6      24
    Ligo         20   50     60     1348     [1263–1396]    6      12
    Montage      20   200    485    110      [101–123]      9      162
    Montage      20   100    234    76       [74–79]        9      78
    Montage      20   50     108    58       [56–61]        9      36
    Sipht        20   200    224    4811     [4000–5467]    5      146
    Sipht        20   100    112    4610     [4083–5374]    5      73
    Sipht        20   50     58     4453     [3715–5368]    5      32


    Table 8 Characteristics of workload B

    Log name     J_j  |V_j|  |E_j|  p_j (s)  p_j range (s)  cpn_j  size_j
    Cybershake   5    700    1388   252      [200–274]      4      348
    Cybershake   5    600    1189   250      [215–275]      4      298
    Cybershake   10   500    990    247      [179–276]      4      248
    Cybershake   10   400    792    220      [160–266]      4      199
    Cybershake   10   300    592    218      [188–270]      4      149
    Cybershake   20   200    392    226      [183–274]      4      99
    Cybershake   20   100    192    222      [169–271]      4      49
    Cybershake   20   50     92     224      [159–272]      4      24
    Epigenomics  1    100    122    29878    29878          8      24
    Epigenomics  1    46     54     7734     7734           9      10
    Epigenomics  1    24     27     5586     5586           8      5
    Gnome        5    700    854    21276    [7728–30454]   9      168
    Gnome        5    600    704    20751    [9515–30911]   9      135
    Gnome        10   500    591    27527    [8526–35022]   9      114
    Gnome        10   400    470    17505    [8858–26685]   9      90
    Gnome        10   300    359    18423    [9214–34686]   9      70
    Gnome        20   200    244    20585    [9397–32434]   9      48
    Gnome        20   100    122    16322    [7624–31212]   9      24
    Gnome        20   50     54     20095    [9026–33091]   9      10
    Inspiral     1    100    119    1336     1336           6      24
    Inspiral     1    50     60     1415     1415           6      12
    Inspiral     1    30     35     1337     1337           6      7
    Ligo         5    700    873    1407     [1394–1415]    6      174
    Ligo         5    600    752    1407     [1400–1413]    6      149
    Ligo         10   500    617    1404     [1370–1416]    6      123
    Ligo         10   400    490    1404     [1390–1413]    6      97
    Ligo         10   300    369    1394     [1379–1418]    6      73
    Ligo         20   200    241    1389     [1368–1408]    6      48
    Ligo         20   100    120    1349     [1290–1411]    6      24
    Ligo         20   50     60     1348     [1263–1396]    6      12
    Montage      5    700    1734   263      [236–298]      9      578
    Montage      5    600    1485   238      [209–274]      9      495
    Montage      10   500    1236   209      [186–237]      9      412
    Montage      10   400    983    185      [158–196]      9      328
    Montage      10   300    734    140      [129–161]      9      245
    Montage      20   200    485    110      [101–123]      9      162
    Montage      20   100    234    76       [74–79]        9      78
    Montage      20   50     108    58       [56–61]        9      36
    Sipht        5    700    792    5285     [5044–5473]    5      493
    Sipht        5    600    676    5292     [5116–5515]    5      429
    Sipht        10   500    564    5143     [4845–5367]    5      356
    Sipht        10   400    452    5126     [4447–5602]    5      283
    Sipht        10   300    340    4994     [4637–5553]    5      210
    Sipht        20   200    224    4811     [4000–5467]    5      146
    Sipht        20   100    112    4610     [4083–5374]    5      73
    Sipht        1    60     68     4491     4491           5      42
    Sipht        20   50     58     4453     [3715–5368]    5      32
    Sipht        1    30     34     3399     3399           5      21


[Scatter plot not reproduced. x axis: number of tasks |V_j| (20–200); y axis: critical path cost cpc_j (0–3.5 × 10^4).]
Fig. 5 Critical path cost versus number of tasks per workflow in workload A

[Histogram not reproduced. x axis: critical path cost cpc_j (0–3.5 × 10^4); y axis: frequency (0–200).]
Fig. 6 Critical path cost histogram in workload A

[Scatter plot not reproduced. x axis: number of tasks |V_j| (200–700); y axis: critical path cost cpc_j.]
Fig. 7 Critical path cost and number of tasks per workflow in workload B

[Histogram not reproduced. x axis: critical path cost cpc_j (0–3.5 × 10^4); y axis: frequency (0–350).]
Fig. 8 Critical path cost histogram in workload B

  • 7/30/2019 Tchernykh Workflow Scheduling With User Estimate 2012 Journal Grid Computing

    22/24

    A. Hirales-Carbajal et al.

    A.2 Robustness

[Plot not reproduced. x axis: resource-state refresh interval (0 ms, 2 ms, 4 ms, …, 18 m, 35 m); y axis: 5–50; legend: MLB, MLp, MaxAR.]
Fig. 9 Approximation factor performance degradation versus refresh interval

[Plot not reproduced. x axis: refresh interval (0 ms–35 m); y axis: % degradation (0–15); legend: MLB, MLp, MaxAR.]
Fig. 10 Mean critical path slowdown performance degradation versus refresh interval

[Plot not reproduced. x axis: refresh interval (0 ms–35 m); y axis: 0–250; legend: MLB, MLp, MaxAR.]
Fig. 11 Mean critical path waiting time performance degradation versus refresh interval

[Plot not reproduced. x axis: refresh interval (0 ms–35 m); y axis: % degradation (0–90); legend: MLB, MLp, MaxAR.]
Fig. 12 Average performance degradation versus refresh interval
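The degradation figures compare each strategy's behavior at a given resource-state refresh interval against the same strategy with instantaneous (0 ms) refresh. One plausible reading of the percentage metric, sketched under that assumption (function name and the example values are illustrative, not the paper's):

```python
# Hypothetical sketch of a per-metric "% degradation": relative
# increase of a lower-is-better metric at some refresh interval,
# measured against the zero-delay (0 ms refresh) baseline.
def pct_degradation(metric_at_interval, metric_at_zero):
    """Percentage increase over the instantaneous-refresh baseline."""
    return 100.0 * (metric_at_interval - metric_at_zero) / metric_at_zero

# e.g. a mean critical path waiting time of 120 s at a 35 m refresh
# interval versus 96 s with instantaneous refresh
print(pct_degradation(120.0, 96.0))  # -> 25.0
```

Under this reading, a flat curve across refresh intervals indicates a strategy that is robust to stale resource-state information.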


    References

    1. Pinedo, M.L.: Scheduling: Theory, Algorithms, andSystems, 3rd edn. Springer (2008)

    2. Mccreary, C., Khan, A.A., Thompson, J.J., Mcardle,M.E.: A comparison of heuristics for scheduling dags

    on multiprocessors. In: International Parallel and Dis-tributed Processing Symposium (IPPS94), pp. 446451.Cancun, Mxico (1994)

    3. Kwong, K.Y., Ahmad, I.: Dynamic critical-pathscheduling: an effective technique for allocating taskgraphs to multiprocessors. IEEE Trans. Parallel Dis-trib. Syst. 7, 506521 (1996)

    4. Kwok, Y.-K., Ahmad, I.: Static scheduling algorithmsfor allocating directed task graphs to multiprocessors.ACM Comput. Surv. 31(4), 406471 (1999)

    5. Leung, J., Kelly, L., Anderson, J.H.: Handbook of Scheduling: Algorithms, Models, and PerformanceAnalysis. CRC Press, Inc., Boca Raton (2004)

    6. Rajakumar, S., Arunachalam, V.P., Selladurai, V.:Workflow balancing strategies in parallel machinescheduling. Int. J. Adv. Manuf. Technol. 23, 366374(2004)

    7. Wieczorek, M., Prodan, R., Fahringer, T.: Schedulingof scientific workflows in the askalon grid environment.SIGMOD Record 34(3), 5662 (2005)

    8. Bittencourt, L.F., Madeira, E.R.M.: A dynamic ap-proach for scheduling dependent tasks on the xavantesgrid middleware. In: MCG06: Proceedings of the 4thInternational Workshop on Middleware for Grid Com-puting. MCG06, pp. 1016. ACM, New York (2006)

    9. Jia, Y., Rajkumar, B.: Scheduling scientific workflowapplications with deadline and budget constraints us-ing genetic algorithms. Sci. Program. 14(3), 217230(2006)

    10. Ramakrishnan, A., Singh, G., Zhao, H., Deelman,E., Sakellariou, R., Vahi, K., Blackburn, K., Meyers,D., Samidi, M.: Scheduling data-intensive workflowsonto storage-constrained distributed resources. In:CCGRID07: Proceedings of the 7th IEEE Sympo-sium on Cluster Computing and the Grid. CCGRID07,pp. 1417 (2007)

    11. Szepieniec, T., Bubak, M.: Investigation of the dag el-igible jobs maximization algorithm in a grid. In: Pro-ceedings of the 2008 9th IEEE/ACM InternationalConference on Grid Computing, GRID08, pp. 340345. IEEE Computer Society, Washington (2008)

    12. Singh, G., Su, M.-H., Vahi, K., Deelman, E., Berriman,B., Good, J., Katz, D.S., Mehta, G.:Workflow task clus-tering for best effort systems with Pegasus. In: MG08:Proceedings of the 15th ACM Mardi Gras Conference,pp. 18. ACM, New York (2008)

    13. Masko, L., Dutot, P.F., Mounie, G., Trystram, D., Tu-druj, M.: Scheduling moldable tasks for dynamic SMPclusters in soc technology. In: Parallel Processing andApplied Mathematics. Lecture Notes in Computer Sci-ence, vol. 3911, pp. 879887. Springer (2005)

    14. Masko, L., Mounie, G., Trystram, D., Tudruj, M.: Pro-gram graph structuring for execution in dynamic SMP

    clusters using moldable tasks. In: International Sympo-sium on Parallel Computing in Electrical Engineering,PAR ELEC 2006, pp. 95100 (2006)

    15. Singh, G., Kesselman, C., Deelman, E.: Optimizinggrid-based workflow execution. J. Grid Computing 3,201219 (2005)

    16. Bittencourt, L.F., Madeira, E.R.M.: Towards the

    scheduling of multiple workflows on computationalgrids. J. Grid Computing 8, 419441 (2010)17. Zhao, H., Sakellariou, R.: Scheduling multiple dags

    onto heterogeneous systems. In: Parallel and Dis-tributed Processing Symposium, 20th International,IPDPS06, p. 14. IEEE Computer Society, Washington(2006)

    18. Topcuouglu, H., Hariri, S., Wu, M.-Y.: Performance-effective and low-complexity task scheduling for het-erogeneous computing. IEEE Trans. Parallel Distrib.Syst. 13(3), 260274 (2002)

    19. Sakellariou, R., Zhao, H.: A hybrid heuristic for dagscheduling on heterogeneous systems. In: 13th IEEEHeterogeneous Computing Workshop (HCW04).IPDPS04, pp. 111123. IEEE Computer Society,Santa Fe (2004)

    20. Zhu, L., Sun, Z., Guo, W., Jin, Y., Sun, W., Hu, W.:Dynamic multi dag scheduling algorithm for opticalgrid environment. Netw. Architect. Manag. Appl. V6784(1), 1122 (2007)

    21. Ntakp, T., Suter, F.: Concurrent scheduling of par-allel task graphs on multi-clusters using constrainedresource allocations. In: International Parallel and Dis-tributed Processing Symposium/International ParallelProcessing Symposium, pp. 18 (2009)

    22. Hsu, C.-C., Huang, K.-C., Wang, F.-J.: Online schedul-ing of workflow applications. In Grid environments.Future Gen. Comput. Syst. 27, 860870 (2011)

    23. Mualem, A.W., Feitelson, D.G.: Utilization, pre-dictability, workloads, and user runtime estimates inscheduling the IBM SP2 with backfilling. IEEE Trans.Parallel Distrib. Syst. 12, 529543 (2001)

    24. Ramirez-Alcaraz, J.M., Tchernykh, A., Yahyapour,R., Schwiegelshohn, U., Quezada-Pina, A., Gonzalez-Garca, J.L., Hirales-Carbajal, A.: Job allocation strate-gies with user run time estimates for online schedul-ing in hierarchical Grids. J. Grid Computing 9, 95116(2011)

    25. Shmoys, D.B., Wein, J., Williamson, D.P.: Schedulingparallel machines on-line. SIAM J. Comput. 24, 13131331 (1995)

    26. Condor high throughput computing. Available in:http://www.cs.wisc.edu/condor/ . Cited August 201127. Openpbs. Available in: http://www.mcs.anl.gov/

    research/projects/openpbs/ . Cited August 201128. Globus. Available in http://www.globus.org/ . Cited

    August 201129. Tchernykh, A., Schwiegelshohn, U., Yahyapour, R.,

    Kuzjurin, N.: On-line hierarchical job scheduling onGrids with admissible allocation. J. Scheduling 13, 545552 (2010)

    30. Workflow generator. Available in https://confluence.pegasus.isi.edu/display/pegasus/ . Cited August 2010

    http://www.cs.wisc.edu/condor/http://www.cs.wisc.edu/condor/http://www.mcs.anl.gov/research/projects/openpbs/http://www.mcs.anl.gov/research/projects/openpbs/http://www.mcs.anl.gov/research/projects/openpbs/http://www.globus.org/https://confluence.pegasus.isi.edu/display/pegasus/https://confluence.pegasus.isi.edu/display/pegasus/https://confluence.pegasus.isi.edu/display/pegasus/https://confluence.pegasus.isi.edu/display/pegasus/https://confluence.pegasus.isi.edu/display/pegasus/http://www.globus.org/http://www.mcs.anl.gov/research/projects/openpbs/http://www.mcs.anl.gov/research/projects/openpbs/http://www.cs.wisc.edu/condor/
  • 7/30/2019 Tchernykh Workflow Scheduling With User Estimate 2012 Journal Grid Computing

    24/24

    A. Hirales-Carbajal et al.

    31. Garey, M.R., Graham, R.L.: Bounds for multiproces-sor scheduling with resource constraints. SIAM J.Comput. 4, 187200 (1975)

    32. Hirales-Carbajal, A., Tchernykh, A., Roblitz, T.,Yahyapour, R.: A grid simulation framework to studyadvance scheduling strategies for complex workflowapplications. In: IEEE International Symposium on

    Parallel Distributed Processing, Workshops and PhdForum (IPDPSW), pp. 18 (2010)33. Bharathi, S., Chervenak, A., Deelman, E., Mehta,

    G., Su, M.-H., Vahi, K.: Characterization of scien-tific workflows. In: Third Workshop on Workflows inSupport of Large-Scale Science, WORKS08, pp. 110(2008)

    34. Lee, C.B., Schwartzman, Y.,Hardy, J.,Snavely, A.:Areuser runtime estimates inherently inaccurate? In: JobScheduling Strategies for Parallel Processing, pp. 253263 (2004)

    35. Dolan, E.D., Mor, J.J.: Benchmarking optimizationsoftware with performance profiles. Math. Program.91(2), 201213 (2002)

    36. Dolan, E.D., Mor, J.J., Munson, T.S.: Optimality mea-sures for performance profiles. Siam. J. Optim. 16, 891909 (2006)

    37. Dongarra, J.J., Jeannot, E., Saule, E., Shi, Z.:Bi-objective scheduling algorithms for optimizingmakespan and reliability on heterogeneous systems.In: Proceedings of the Nineteenth Annual ACM Sym-

    posium on Parallel Algorithms and Architectures,SPAA07, pp. 280288. ACM, New York (2007)

    38. Saule, E., Trystram, D.: Analyzing scheduling withtransient failures. Inform. Process. Lett. 109(11), 539542 (2009)

    39. Canon, L.-C., Jeannot, E., Sakellariou, R., Zheng,W.: Comparative evaluation of the robustness of dag

    scheduling heuristics. In: Gorlatch, S., Fragopoulou, P.,Priol, T. (eds.) Journal of Grid Computing, pp. 7384.Springer, New York (2008)

    40. Casanova, H., Legrand, A., Zagorodnov, D., Berman,F.: Heuristics for scheduling parameter sweep applica-tions in grid environments. In: Heterogeneous Com-puting Workshop, pp. 349363 (2000)

    41. Buyya, R., Murshed, M.: GridSim: a toolkit forthe modeling and simulation of distributed re-source management and scheduling for grid comput-ing. J. Concurr. Comput. Pract. Exp. 14, 11751220(2002)

    42. Casanova, H.: SimGrid: a toolkit for the simula-tion of application scheduling. In: Proceedings of theFirst IEEE/ACM International Symposium on ClusterComputing and the Grid, pp. 430437 (2001)

    43. Sulistio, A., Yeo, C.S., Buyya, R.A.: A taxonomy of computer-based simulations and its mapping to paral-lel and distributed systems simulation tools. Software:Practice and Experience (SPE) 34(7), 653673 (2004).ISSN: 0038-0644