Scheduling Precedence Constrained Tasks with Reduced Processor Energy on Multiprocessor Computers
Keqin Li, Senior Member, IEEE
Abstract—Energy-efficient scheduling of sequential tasks with precedence constraints on multiprocessor computers with dynamically
variable voltage and speed is investigated as combinatorial optimization problems. In particular, the problem of minimizing schedule
length with energy consumption constraint and the problem of minimizing energy consumption with schedule length constraint are
considered. Our scheduling problems contain three nontrivial subproblems, namely, precedence constraining, task scheduling, and
power supplying. Each subproblem should be solved efficiently so that heuristic algorithms with overall good performance can be
developed. Such decomposition of our optimization problems into three subproblems makes design and analysis of heuristic
algorithms tractable. Three types of heuristic power allocation and scheduling algorithms are proposed for precedence constrained
sequential tasks with energy and time constraints, namely, prepower-determination algorithms, postpower-determination algorithms,
and hybrid algorithms. The performance of our algorithms is analyzed and compared with optimal schedules analytically. Such
analysis has not been conducted in the literature for any algorithm. Therefore, our investigation in this paper makes an initial contribution
to the analytical performance study of heuristic power allocation and scheduling algorithms for precedence constrained sequential tasks.
Our extensive simulation data demonstrate that for wide task graphs, the performance ratios of all our heuristic algorithms approach
one as the number of tasks increases.
Index Terms—Energy consumption, list scheduling, performance analysis, power-aware scheduling, precedence constraint,
simulation, task scheduling
1 INTRODUCTION
PERFORMANCE-DRIVEN computer development has lasted for over six decades. Computers have been developed to achieve higher performance. While performance/hardware-cost has increased dramatically, power consumption in computer systems has also increased according to Moore's law [3]. To achieve higher computing performance per processor, microprocessor manufacturers have doubled the power density at an exponential speed over decades, which will soon reach that of a nuclear reactor [37]. Such increased energy consumption causes severe economic, ecological, and technical problems. Power conservation is critical in many computation and communication environments and has attracted extensive research activities. Reducing processor energy consumption has been an important and pressing research issue in recent years. There has been increasing interest in developing high performance and energy-efficient computing systems. There exists a large body of literature on power-aware computing and communication. The reader is referred to [5], [9], [36], [37] for comprehensive surveys.
There are two approaches to reducing power consumption in computing systems. The first approach is the method of thermal-aware hardware design, which can be carried out at various levels, including device-level power reduction, circuit and logic-level techniques, and architecture-level power reduction (low-power processor architecture adaptations, low-power memories and memory hierarchies, and low-power interconnects). Low-power consumption and high-system reliability, availability, and usability are main concerns of modern high-performance computing system development. In addition to the traditional performance measure using FLOPS, the Green500 list uses FLOPS per Watt to rank the performance of computing systems, so that the awareness of other performance metrics such as energy efficiency and system reliability can be raised [4]. The second approach to reducing energy consumption in computing systems is the method of power-aware software design at various levels, including operating system-level power management, compiler-level power management, application-level power management, and cross-layer (from transistors to applications) adaptations. The power reduction technique discussed in this paper belongs to the operating system level.
Software techniques for power reduction are supported by a mechanism called dynamic voltage scaling [2] (equivalently, dynamic frequency scaling, dynamic speed scaling, dynamic power scaling). Dynamic power management at the operating system level refers to supply voltage and clock frequency adjustment schemes implemented while tasks are running. These energy conservation techniques explore the opportunities for tuning the energy-delay tradeoff [35]. Power-aware task scheduling on processors with variable voltages and speeds has been extensively studied since the mid-1990s. In a pioneering paper [38], Weiser et al. first proposed the approach to energy saving by using fine-grain control of CPU speed by an operating system scheduler. The main idea is to monitor CPU idle time and to reduce energy consumption by reducing clock speed and idle time to a minimum. In a subsequent work [40], Yao et al. analyzed offline and online algorithms for scheduling tasks with arrival times and deadlines on a uniprocessor computer with minimum energy consumption. This research has been extended in [7], [11], [19], [26], [27], [28], and [41] and has inspired substantial further investigation, much of which focuses on real-time applications, namely, adjusting the supply voltage and clock frequency to minimize CPU energy consumption while still meeting the deadlines for task execution. In [6], [15], [16], [18], [21], [29], [30], [31], [33], [34], [39], [43], [44], [45], and [46] and many other related works, the authors addressed the problem of scheduling independent or precedence constrained tasks on uniprocessor or multiprocessor computers where the actual execution time of a task may be less than the estimated worst case execution time. The main issue is energy reduction by slack time reclamation.
1668 IEEE TRANSACTIONS ON COMPUTERS, VOL. 61, NO. 12, DECEMBER 2012
• The author is with the Department of Computer Science, State University of New York, New Paltz, New York 12561. E-mail: [email protected].
Manuscript received 31 Oct. 2011; revised 17 Feb. 2012; accepted 22 Apr. 2012; published online 30 May 2012. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TCSI-2011-10-0795. Digital Object Identifier no. 10.1109/TC.2012.120.
0018-9340/12/$31.00 © 2012 IEEE. Published by the IEEE Computer Society
There are two considerations in dealing with the energy-delay tradeoff. On the one hand, in high performance computing systems, power-aware design techniques and algorithms attempt to maximize performance under certain energy consumption constraints. On the other hand, low-power and energy-efficient design techniques and algorithms aim to minimize energy consumption while still meeting certain performance goals. In [8], Barnett studied the problems of minimizing the expected execution time given a hard energy budget and minimizing the expected energy expenditure given a hard execution deadline for a single task with randomized execution requirement. In [10], Bunde considered scheduling jobs with equal requirements on multiprocessors. In [13], Cho and Melhem studied the relationship among parallelization, performance, and energy consumption, and the problem of minimizing energy-delay product. In [17] and [20], Khan and Ahmad and Lee and Zomaya, respectively, attempted joint minimization of energy consumption and task execution time. In [32], Rusu et al. investigated the problem of system value maximization subject to both time and energy constraints.
In [22], [23], and [25], we addressed energy and time constrained power allocation and task scheduling on multiprocessor computers with dynamically variable voltage, frequency, speed, and power as combinatorial optimization problems. In particular, we defined the problem of minimizing schedule length with energy consumption constraint and the problem of minimizing energy consumption with schedule length constraint on multiprocessor computers. The first problem has applications in general multiprocessor and multicore processor computing systems where energy consumption is an important concern and in mobile computers where energy conservation is a main concern. The second problem has applications in real-time multiprocessing systems and environments such as parallel signal processing, automated target recognition, and real-time MPEG encoding, where timing constraint is a major requirement. Our scheduling problems are defined such that the energy-delay product is optimized by fixing one factor and minimizing the other. In [22] and [25], we studied the problems of scheduling independent sequential tasks. In [23] and [24], we studied the problems of scheduling independent parallel tasks.
In this paper, we investigate scheduling sequential tasks with precedence constraints on multiprocessor computers with dynamically variable voltage and speed as combinatorial optimization problems. Our scheduling problems contain three nontrivial subproblems, namely, precedence constraining, task scheduling, and power supplying.
• Precedence Constraining. Precedence constraints make design and analysis of heuristic scheduling algorithms more difficult.
• Task Scheduling. Scheduling is NP-hard even for independent sequential tasks without precedence constraints.
• Power Supplying. Tasks should be supplied with appropriate powers and execution speeds, such that the schedule length is minimized by consuming a given amount of energy or the energy consumed is minimized without missing a given deadline.
Each subproblem should be solved efficiently so that heuristic algorithms with overall good performance can be developed.
There are naturally three types of power-aware task scheduling algorithms, depending on the order of power supplying and task scheduling.
• Prepower-Determination Algorithms. In this type of algorithms, we first determine power supplies and then schedule the tasks.
• Postpower-Determination Algorithms. In this type of algorithms, we first schedule the tasks and then determine power supplies.
• Hybrid Algorithms. In this type of algorithms, scheduling tasks and determining power supplies are interleaved among different stages of an algorithm.
We will propose heuristic power allocation and scheduling algorithms of the above three types for precedence constrained sequential tasks with energy and time constraints. We will also analyze their performance and demonstrate simulation results. Notice that we compare the performance of our algorithms with optimal schedules analytically. Such analysis has not been conducted in the literature for any algorithm. Therefore, our investigation in this paper makes an initial contribution to the analytical performance study of heuristic power allocation and scheduling algorithms for precedence constrained sequential tasks.
The rest of the paper is organized as follows: In Section 2, we provide background information, including the power consumption model, definitions of our problems, lower bounds for optimal solutions, and performance measures. In Section 3, we propose and analyze prepower-determination algorithms. In Section 4, we propose and analyze postpower-determination algorithms. In Section 5, we propose and analyze hybrid algorithms. In Section 6, we present simulation data and show that for wide task graphs, the performance ratios of all our heuristic algorithms approach one as the number of tasks increases. We conclude the paper in Section 7.
2 PRELIMINARIES
Power dissipation and circuit delay in digital CMOS circuits can be accurately modeled by simple equations, even for complex microprocessor circuits. CMOS circuits have dynamic, static, and short-circuit power dissipation; however, the dominant component in a well designed circuit is dynamic power consumption $p$ (i.e., the switching component of power), which is approximately $p = aCV^2 f$, where $a$ is an activity factor, $C$ is the loading capacitance, $V$ is the supply voltage, and $f$ is the clock frequency [12]. Since $s \propto f$, where $s$ is the processor speed, and $f \propto V^{\phi}$ with $0 < \phi \le 1$ [42], which implies that $V \propto f^{1/\phi}$, we know that power consumption is $p \propto f^{\alpha}$ and $p \propto s^{\alpha}$, where $\alpha = 1 + 2/\phi \ge 3$.
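The exponent in this power law can be checked by direct substitution. The short derivation below uses only the proportionalities stated above, writing $\phi$ for the exponent that relates frequency to voltage and $\alpha$ for the resulting power exponent:

$$
\begin{aligned}
p &\propto V^{2} f, \qquad V \propto f^{1/\phi} \\
\Rightarrow\quad p &\propto \left(f^{1/\phi}\right)^{2} f = f^{\,1 + 2/\phi} = f^{\alpha}, \qquad \alpha = 1 + \frac{2}{\phi}.
\end{aligned}
$$

Since $0 < \phi \le 1$, we have $2/\phi \ge 2$ and hence $\alpha \ge 3$. Because $s \propto f$, the same exponent gives $p \propto s^{\alpha}$.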
It is clear that due to precedence constraints, a processor may be idle between execution of tasks. It is well known that an idle processor also consumes certain power, which includes static power dissipation, short circuit power dissipation, and other leakage and wasted power [1]. In this paper, we do not take this part of the power consumption into consideration. We are only interested in managing the switching power consumption by dynamic voltage/frequency/speed/power scaling. Therefore, by power/energy consumption we mean dynamic power/energy consumption. We assume that a processor consumes no dynamic energy when it is idle.
Assume that we are given $n$ precedence constrained sequential tasks to be executed on $m$ identical processors. Let $r_i$ represent the execution requirement (i.e., the number of CPU cycles or the number of instructions) of task $i$, where $1 \le i \le n$. We use $p_i$ to represent the power allocated to execute task $i$. For ease of discussion, we will assume that $p_i$ is simply $s_i^{\alpha}$, where $s_i = p_i^{1/\alpha}$ is the execution speed of task $i$. The execution time of task $i$ is $t_i = r_i/s_i = r_i/p_i^{1/\alpha}$. The energy consumed to execute task $i$ is $e_i = p_i t_i = r_i p_i^{1-1/\alpha} = r_i s_i^{\alpha-1}$.
Given task execution requirements $r_1, r_2, \ldots, r_n$, the problem of minimizing schedule length with energy consumption constraint $E$ on a multiprocessor computer with $m$ processors is to find the power supplies $p_1, p_2, \ldots, p_n$ and a nonpreemptive schedule of the $n$ tasks on the $m$ processors such that the schedule length is minimized and the total energy consumed does not exceed $E$.
Given task execution requirements $r_1, r_2, \ldots, r_n$, the problem of minimizing energy consumption with schedule length constraint $T$ on a multiprocessor computer with $m$ processors is to find the power supplies $p_1, p_2, \ldots, p_n$ and a nonpreemptive schedule of the $n$ tasks on the $m$ processors such that the total energy consumption is minimized and the schedule length does not exceed $T$.
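The per-task model used in both problem definitions (speed, execution time, and energy as functions of the power supply) can be sketched in a few lines of Python. This is only an illustration of the formulas above; the function names and the sample values are assumptions, not part of the paper.

```python
def speed(power, alpha):
    """Execution speed s = p**(1/alpha) of a task supplied with power p."""
    return power ** (1.0 / alpha)


def exec_time(r, power, alpha):
    """Execution time t = r / s = r / p**(1/alpha) for requirement r."""
    return r / speed(power, alpha)


def energy(r, power, alpha):
    """Energy e = p * t, which equals r * p**(1 - 1/alpha) = r * s**(alpha - 1)."""
    return power * exec_time(r, power, alpha)
```

For instance, with the illustrative values `alpha = 3`, `r = 10`, and `power = 8`, the speed is 2, so the task runs for 5 time units and consumes 40 units of energy, which agrees with $r_i s_i^{\alpha-1} = 10 \cdot 2^2$.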
Notice that we assume that the $m$ processors are tightly coupled with certain highly efficient communication mechanisms such as shared memory. Therefore, we will not include the communication times among the tasks in our problems. This is because the data produced by a predecessor can be readily accessed by a successor without any delay. In other words, we focus on computation times and processor energy consumption, not communication times and network energy consumption. This is different from [47], which addresses task scheduling on clusters that incur significant communication costs. Our computing model is the class of multiprocessor computers with negligible communication costs.
To compare the solutions of our algorithms with optimal solutions, we need lower bounds for the optimal solutions. Let $R = r_1 + r_2 + \cdots + r_n$ be the total execution requirement of the $n$ tasks.
The following theorem gives a lower bound for the optimal schedule length $T^*$ for the problem of minimizing schedule length with energy consumption constraint when tasks are independent [22], and is certainly applicable to tasks with precedence constraints.
Theorem 1. For the problem of minimizing schedule length with energy consumption constraint on a multiprocessor computer, we have the following lower bound,
$$T^* \ge \left( \frac{m}{E} \left( \frac{R}{m} \right)^{\alpha} \right)^{1/(\alpha-1)},$$
for the optimal schedule length $T^*$.
Notice that the above lower bound can be achieved only when the workload $R$ can be evenly distributed among the $m$ processors.
The following theorem gives a lower bound for the minimum energy consumption $E^*$ for the problem of minimizing energy consumption with schedule length constraint when tasks are independent [22], and is certainly applicable to tasks with precedence constraints.
Theorem 2. For the problem of minimizing energy consumption with schedule length constraint on a multiprocessor computer, we have the following lower bound,
$$E^* \ge m \left( \frac{R}{m} \right)^{\alpha} \frac{1}{T^{\alpha-1}},$$
for the minimum energy consumption $E^*$.
Again, the above lower bound can be achieved only when the workload $R$ can be evenly distributed among the $m$ processors.
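The two lower bounds of Theorems 1 and 2 are closed-form expressions and are easy to evaluate. The following Python sketch (function and argument names are illustrative assumptions) computes both, which is convenient when measuring performance ratios of a heuristic.

```python
def schedule_length_lower_bound(R, m, E, alpha):
    """Theorem 1: T* >= ((m / E) * (R / m)**alpha) ** (1 / (alpha - 1))."""
    return ((m / E) * (R / m) ** alpha) ** (1.0 / (alpha - 1.0))


def energy_lower_bound(R, m, T, alpha):
    """Theorem 2: E* >= m * (R / m)**alpha / T**(alpha - 1)."""
    return m * (R / m) ** alpha / T ** (alpha - 1.0)
```

The two bounds are inverse to each other: feeding the time bound obtained for an energy budget `E` back into `energy_lower_bound` returns `E` again, up to floating-point rounding.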
We define the performance ratio as $\beta = T/T^*$ and the asymptotic performance ratio as $\beta_\infty = \lim_{R/r^* \to \infty} \beta$ (by fixing $m$) for heuristic algorithms that solve the problem of minimizing schedule length with energy consumption constraint on a multiprocessor computer, where $T$ is the length of the schedule produced by an algorithm, and $r^* = \max(r_1, r_2, \ldots, r_n)$ is the maximum task execution requirement.
We define the performance ratio as $\beta = E/E^*$ and the asymptotic performance ratio as $\beta_\infty = \lim_{R/r^* \to \infty} \beta$ (by fixing $m$) for heuristic algorithms that solve the problem of minimizing energy consumption with schedule length constraint on a multiprocessor computer, where $E$ is the amount of energy consumed by an algorithm, and $r^* = \max(r_1, r_2, \ldots, r_n)$ is the maximum task execution requirement.
An algorithm is asymptotically optimal if the asymptotic performance ratio is $\beta_\infty = 1$.
Notice that the condition $R/r^* \to \infty$ can be satisfied typically when the $r_i$'s are bounded from above and the number of tasks increases. In this case, each individual $r_i$ becomes smaller and smaller when compared with $R$.
3 PREPOWER-DETERMINATION ALGORITHMS
The prepower-determination algorithms in this section are called ES-A, where ES stands for equal-speed, and A is a list scheduling algorithm. The strategies to solve the three subproblems are described as follows:
• Power Supplying. All tasks are supplied with the same power and executed with the same speed (hence the name equal speed).
• Task Scheduling. The list scheduling algorithms are used to schedule the tasks.
• Precedence Constraining. Precedence constraints are dealt with by a list scheduling algorithm during task scheduling.
The class of list scheduling algorithms was originally designed for scheduling tasks with precedence constraints [14]. Given a list of precedence constrained tasks, a list scheduling algorithm works as follows: as soon as a processor is available, the first ready task whose predecessors have all been completed is scheduled on that processor and removed from the list. This process repeats until all tasks in the list are finished. There are different strategies in the initial ordering of the tasks. We mention three of them here.
• List Scheduling (LS). The tasks are arranged in a random order.
• Largest Requirement First (LRF). The tasks are arranged such that $r_1 \ge r_2 \ge \cdots \ge r_n$.
• Smallest Requirement First (SRF). The tasks are arranged such that $r_1 \le r_2 \le \cdots \le r_n$.
In this paper, we consider $A \in \{\mathrm{LS}, \mathrm{LRF}, \mathrm{SRF}\}$. Hence, we have three prepower-determination algorithms, namely, ES-LS, ES-LRF, and ES-SRF.
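The list scheduling rule described above can be sketched as a small event-driven simulation. The code below is a minimal illustration under stated assumptions, not the paper's implementation: tasks are indexed from 0, `preds[i]` lists the predecessors of task `i`, and `order` is the initial list (any permutation for LS, or the output of the hypothetical helpers `lrf_order` and `srf_order`).

```python
import heapq


def list_schedule(times, preds, m, order):
    """Simulate list scheduling on m processors.

    times[i] is the execution time of task i, preds[i] its predecessors,
    and order the initial task list. Returns (schedule_length, start_times).
    """
    remaining = list(order)   # tasks not yet started, in list order
    start = [None] * len(times)
    running = []              # min-heap of (finish_time, task)
    done = set()              # completed tasks
    free = m                  # number of idle processors
    now = 0.0
    while remaining or running:
        # While a processor is idle, start the first ready task in the list.
        progressed = True
        while free > 0 and progressed:
            progressed = False
            for i in remaining:
                if all(j in done for j in preds[i]):
                    remaining.remove(i)
                    start[i] = now
                    heapq.heappush(running, (now + times[i], i))
                    free -= 1
                    progressed = True
                    break
        if not running:
            raise ValueError("no task can run: cyclic constraints or m < 1")
        # Advance the clock to the next completion time.
        now, i = heapq.heappop(running)
        done.add(i)
        free += 1
        while running and running[0][0] == now:
            _, j = heapq.heappop(running)
            done.add(j)
            free += 1
    return now, start


def lrf_order(r):
    """Largest Requirement First: indices sorted by decreasing requirement."""
    return sorted(range(len(r)), key=lambda i: -r[i])


def srf_order(r):
    """Smallest Requirement First: indices sorted by increasing requirement."""
    return sorted(range(len(r)), key=lambda i: r[i])
```

As a usage example, four tasks with times `[3, 2, 2, 1]`, where tasks 1 and 2 depend on task 0 and task 3 depends on tasks 1 and 2, scheduled on two processors in the order `[0, 1, 2, 3]`, give a schedule of length 6: the second processor stays idle until task 0 completes.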
3.1 Minimizing Schedule Length
Let $A(t_1, t_2, \ldots, t_n)$ represent the length of the schedule produced by algorithm $A$ for $n$ precedence constrained tasks with execution times $t_1, t_2, \ldots, t_n$, where $A$ is a list scheduling algorithm. The following theorem gives the performance ratio of algorithm ES-A to solve the problem of minimizing schedule length with energy consumption constraint.
Theorem 3. By using algorithm ES-A to solve the problem of minimizing schedule length with energy consumption constraint on a multiprocessor computer, the schedule length is
$$T = A(r_1, r_2, \ldots, r_n) \left( \frac{R}{E} \right)^{1/(\alpha-1)},$$
where $A$ is a list scheduling algorithm. The performance ratio is
$$\beta \le \frac{A(r_1, r_2, \ldots, r_n)}{R/m}.$$
For independent tasks, as $R/r^* \to \infty$, the asymptotic performance ratio is $\beta_\infty = 1$.
Proof. To solve the problem of minimizing schedule length with energy consumption constraint $E$ by using algorithm ES-A, we have $p_1 = p_2 = \cdots = p_n = p$ and $s_1 = s_2 = \cdots = s_n = s$. Based on the fact that
$$E = r_1 p^{1-1/\alpha} + r_2 p^{1-1/\alpha} + \cdots + r_n p^{1-1/\alpha} = R p^{1-1/\alpha},$$
we get $p = (E/R)^{\alpha/(\alpha-1)}$, and $s = p^{1/\alpha} = (E/R)^{1/(\alpha-1)}$, and $t_i = r_i/s = r_i (R/E)^{1/(\alpha-1)}$. We notice that for all $x \ge 0$, we have $A(t_1, t_2, \ldots, t_n) = x A(t'_1, t'_2, \ldots, t'_n)$, if $t_i = x t'_i$ for all $1 \le i \le n$. That is, the schedule length is scaled by a factor of $x$ if all the task execution times are scaled by a factor of $x$. When the $n$ tasks with execution times $t_1, t_2, \ldots, t_n$ are scheduled by using a list scheduling algorithm $A$, we get the length of the schedule produced by algorithm ES-A as
$$T = A(t_1, t_2, \ldots, t_n) = A(r_1, r_2, \ldots, r_n) \left( \frac{R}{E} \right)^{1/(\alpha-1)}.$$
Thus, by Theorem 1, we get
$$\beta = \frac{T}{T^*} \le \frac{A(r_1, r_2, \ldots, r_n)}{R/m}.$$
For any list scheduling algorithm $A$ and independent tasks, we have
$$A(r_1, r_2, \ldots, r_n) \le \frac{R}{m} + r^*.$$
The asymptotic optimality follows from the last two inequalities. □
The power supply to each task is simply $(E/R)^{\alpha/(\alpha-1)}$.
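The equal-speed computation in Theorem 3 can be illustrated numerically. The sketch below assumes independent tasks, so that list scheduling reduces to assigning each task in list order to the earliest-idle processor; `greedy_makespan` and `es_a_schedule_length` are hypothetical names introduced only for this example.

```python
import heapq


def greedy_makespan(times, m):
    """List scheduling of independent tasks: each task, taken in list
    order, goes to the processor that becomes idle first."""
    loads = [0.0] * m
    heapq.heapify(loads)
    for t in times:
        heapq.heappush(loads, heapq.heappop(loads) + t)
    return max(loads)


def es_a_schedule_length(r, m, E, alpha):
    """ES-A for independent tasks under energy budget E.

    Returns (schedule length, energy used, upper bound on the ratio beta).
    """
    R = sum(r)
    p = (E / R) ** (alpha / (alpha - 1.0))   # common power supply
    s = p ** (1.0 / alpha)                   # common execution speed
    T = greedy_makespan([ri / s for ri in r], m)
    energy_used = sum(ri * s ** (alpha - 1.0) for ri in r)
    ratio_bound = greedy_makespan(r, m) / (R / m)
    return T, energy_used, ratio_bound
```

With the illustrative input `r = [4, 3, 2, 1]`, `m = 2`, `E = 40`, and `alpha = 3`, the common speed is 2, the energy used equals the budget, and the schedule length is 2.5, which matches $A(r_1, \ldots, r_n)(R/E)^{1/(\alpha-1)} = 5 \cdot (10/40)^{1/2}$.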
3.2 Minimizing Energy Consumption
The following theorem gives the performance ratio of
algorithm ES-A to solve the problem of minimizing energy
consumption with schedule length constraint.
Theorem 4. By using algorithm ES-A to solve the problem of minimizing energy consumption with schedule length constraint on a multiprocessor computer, the energy consumed is
$$E = \left( \frac{A(r_1, r_2, \ldots, r_n)}{T} \right)^{\alpha-1} R,$$
where $A$ is a list scheduling algorithm. The performance ratio is
$$\beta \le \left( \frac{A(r_1, r_2, \ldots, r_n)}{R/m} \right)^{\alpha-1}.$$
For independent tasks, as $R/r^* \to \infty$, the asymptotic performance ratio is $\beta_\infty = 1$.
Proof. To solve the problem of minimizing energy consumption with schedule length constraint $T$ by using algorithm ES-A, we need $E$ such that
$$A(t_1, t_2, \ldots, t_n) = A(r_1, r_2, \ldots, r_n) \left( \frac{R}{E} \right)^{1/(\alpha-1)} = T,$$
which gives rise to
$$E = \left( \frac{A(r_1, r_2, \ldots, r_n)}{T} \right)^{\alpha-1} R.$$
By Theorem 2, we get
$$\beta = \frac{E}{E^*} \le \left( \frac{A(r_1, r_2, \ldots, r_n)}{R/m} \right)^{\alpha-1}.$$
The asymptotic optimality for independent tasks follows from the last inequality. □
The power supply to each task is simply $(E/R)^{\alpha/(\alpha-1)} = (A(r_1, r_2, \ldots, r_n)/T)^{\alpha}$.
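The quantities in Theorem 4 depend on the list schedule only through its unit-speed length $A(r_1, \ldots, r_n)$. A small sketch, taking that length as an input `A_r` (an assumed parameter name), is:

```python
def es_a_energy(A_r, R, T, alpha):
    """Energy and per-task power of ES-A under deadline T.

    A_r is the length of the list schedule when tasks run at unit speed,
    so every task must run at speed A_r / T to finish exactly at time T.
    """
    s = A_r / T
    power = s ** alpha
    energy = R * s ** (alpha - 1.0)
    return energy, power


def es_a_energy_ratio_bound(A_r, R, m, alpha):
    """Upper bound (A_r / (R / m)) ** (alpha - 1) on the ratio beta."""
    return (A_r / (R / m)) ** (alpha - 1.0)
```

For example, `A_r = 5`, `R = 10`, `T = 2.5`, and `alpha = 3` give speed 2, per-task power 8, and total energy 40.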
4 POSTPOWER-DETERMINATION ALGORITHMS
The postpower-determination algorithms in this section are called LL-A, where LL stands for level by level, and A is a list scheduling algorithm. In this paper, we assume that A is LS, LRF, or SRF. Hence, we have three postpower-determination algorithms, namely, LL-LS, LL-LRF, and LL-SRF. The strategies to solve the three subproblems are described as follows:
• Precedence Constraining. We propose to use level-by-level scheduling algorithms to deal with precedence constraints.
• Task Scheduling. Since tasks in the same level are independent of each other, they can be scheduled by any of the efficient algorithms such as list scheduling algorithms previously developed for scheduling independent tasks.
• Power Supplying. We then find the optimal power supplies for the given schedule. We adopt a two-level energy/time/power allocation scheme for a given schedule, namely, optimal energy/time allocation among levels of tasks (Theorems 6 and 8) and optimal energy allocation among groups of tasks in the same level and optimal power supplies to tasks in the same group (Theorems 5 and 7).
The decomposition of scheduling precedence constrained tasks into scheduling levels of independent tasks makes analysis of level-by-level scheduling algorithms tractable.
A set of $n$ tasks with precedence constraints can be represented by a partial order $\prec$ on the tasks, i.e., for two tasks $i$ and $j$, if $i \prec j$, then task $j$ cannot start its execution until task $i$ finishes. It is clear that the $n$ tasks and the partial order $\prec$ can be represented by a directed task graph, in which there are $n$ vertices for the $n$ tasks and $(i, j)$ is an arc if and only if $i \prec j$. Furthermore, such a task graph must be a directed acyclic graph (dag). An arc $(i, j)$ is redundant if there exists $k$ such that $(i, k)$ and $(k, j)$ are also arcs in the task graph. We assume that there is no redundant arc in the task graph.
A dag can be decomposed into levels, with $v$ being the number of levels. Tasks with no predecessors (called initial tasks) constitute level 1. Generally, a task $i$ is in level $l$ if the number of nodes on the longest path from some initial task to task $i$ is $l$, where $1 \le l \le v$. Note that all tasks in the same level are independent of each other, and hence, they can be scheduled by any of the algorithms (e.g., list scheduling algorithms) for scheduling independent tasks. Algorithm LL-A, standing for level-by-level scheduling with algorithm A, where A is a list scheduling algorithm, schedules the $n$ tasks level by level in the order level 1, level 2, ..., level $v$. Tasks in level $l+1$ cannot start their execution until all tasks in level $l$ are completed. For each level $l$, where $1 \le l \le v$, we use algorithm A to generate its schedule.
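The level decomposition can be computed directly from the definition: the level of a task is one more than the largest level among its predecessors, and initial tasks are in level 1. The following sketch assumes tasks are numbered from 0 and the task graph is given as a list of arcs `(i, j)`; both function names are illustrative, and the input is assumed to be acyclic.

```python
from functools import lru_cache


def levels(n, arcs):
    """Return a list whose i-th entry is the level of task i, i.e., the
    number of nodes on the longest path from an initial task to task i."""
    preds = [[] for _ in range(n)]
    for i, j in arcs:
        preds[j].append(i)

    @lru_cache(maxsize=None)
    def level(i):
        # Initial tasks have no predecessors and therefore get level 1.
        return 1 + max((level(j) for j in preds[i]), default=0)

    return [level(i) for i in range(n)]


def group_by_level(level_of):
    """Group task indices by level; entry l - 1 holds the tasks of level l."""
    groups = [[] for _ in range(max(level_of, default=0))]
    for i, l in enumerate(level_of):
        groups[l - 1].append(i)
    return groups
```

For the arcs `[(0, 2), (1, 2), (2, 3), (1, 4)]` on five tasks, tasks 0 and 1 form level 1, tasks 2 and 4 form level 2, and task 3 forms level 3. The recursion is adequate for small graphs; a topological-order pass would avoid deep recursion on long chains.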
In the following, we analyze the performance of algorithm LL-A, and also specify energy/time allocation
among levels of tasks, energy allocation among groups of
tasks in the same level, and power supplies to tasks in the
same group, thus completing the description of our post-
power-determination algorithms.
4.1 Minimizing Schedule Length
Assume that a set of $n$ independent tasks is partitioned into $m$ groups, such that all the tasks in group $k$ are executed on processor $k$, where $1 \le k \le m$. Let $R_k$ denote the total execution requirement of the tasks in group $k$. For a given partition of the $n$ tasks into $m$ groups, we are seeking power supplies that minimize the schedule length. Let $E_k$ be the energy consumed by all the tasks in group $k$. The following result characterizes the optimal power supplies [22].
Theorem 5. For a given partition $R_1, R_2, \ldots, R_m$ of the $n$ tasks into $m$ groups on a multiprocessor computer, the schedule length is minimized when all tasks in group $k$ are executed with the same power $(E_k/R_k)^{\alpha/(\alpha-1)}$, where
$$E_k = \left( \frac{R_k^{\alpha}}{R_1^{\alpha} + R_2^{\alpha} + \cdots + R_m^{\alpha}} \right) E,$$
for all $1 \le k \le m$. The optimal schedule length is
$$T = \left( \frac{R_1^{\alpha} + R_2^{\alpha} + \cdots + R_m^{\alpha}}{E} \right)^{1/(\alpha-1)},$$
for the above power supplies.
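A quick numerical check of Theorem 5: under the stated energy split, every group finishes at the same time, which equals the optimal schedule length. The sketch below uses illustrative function names and assumes every group has a positive total requirement.

```python
def optimal_group_energies(R_groups, E, alpha):
    """Split energy E among groups in proportion to R_k**alpha (Theorem 5).

    Returns (list of group energies E_k, optimal schedule length T).
    """
    total = sum(Rk ** alpha for Rk in R_groups)
    E_k = [Rk ** alpha / total * E for Rk in R_groups]
    T = (total / E) ** (1.0 / (alpha - 1.0))
    return E_k, T


def group_finish_time(Rk, Ek, alpha):
    """Completion time of a group run at one constant speed s, where s
    solves Ek = Rk * s**(alpha - 1)."""
    s = (Ek / Rk) ** (1.0 / (alpha - 1.0))
    return Rk / s
```

With `R_groups = [3, 1]`, `E = 28`, and `alpha = 3`, the split is 27 and 1, and both groups complete at time 1.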
To use a postpower-determination algorithm to solve the problem of minimizing schedule length with energy consumption constraint $E$, we need to allocate the available energy $E$ to the $v$ levels. We use $E_1, E_2, \ldots, E_v$ to represent an energy allocation to the $v$ levels, where level $l$ consumes energy $E_l$, and $E_1 + E_2 + \cdots + E_v = E$. Let $n_l$ be the number of tasks in level $l$, and $r_{l,1}, r_{l,2}, \ldots, r_{l,n_l}$ be the execution requirements of the $n_l$ tasks in level $l$, and $R_l = r_{l,1} + r_{l,2} + \cdots + r_{l,n_l}$ be the total execution requirement of tasks in level $l$, where $1 \le l \le v$. Theorem 6 provides optimal energy allocation to the $v$ levels for minimizing schedule length with energy consumption constraint in scheduling precedence constrained tasks by using scheduling algorithms LL-A, where A is a list scheduling algorithm.
Theorem 6. For a given partition $R_{l,1}, R_{l,2}, \ldots, R_{l,m}$ of the $n_l$ tasks in level $l$ into $m$ groups produced by a list scheduling algorithm $A$, where $1 \le l \le v$, and an energy allocation $E_1, E_2, \ldots, E_v$ to the $v$ levels, algorithm LL-A produces schedule length
$$T = \sum_{l=1}^{v} \left( \frac{R_{l,1}^{\alpha} + R_{l,2}^{\alpha} + \cdots + R_{l,m}^{\alpha}}{E_l} \right)^{1/(\alpha-1)}.$$
The energy allocation $E_1, E_2, \ldots, E_v$ which minimizes $T$ is
$$E_l = \left( \frac{S_l^{1/\alpha}}{S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha}} \right) E,$$
where $S_l = R_{l,1}^{\alpha} + R_{l,2}^{\alpha} + \cdots + R_{l,m}^{\alpha}$, for all $1 \le l \le v$, and the minimized schedule length is
$$T = \frac{\left( S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha} \right)^{\alpha/(\alpha-1)}}{E^{1/(\alpha-1)}},$$
by using the above energy allocation. The performance ratio is
$$\beta \le \left( 1 + \frac{m v r^*}{R} \right)^{\alpha/(\alpha-1)},$$
where $r^* = \max(r_1, r_2, \ldots, r_n)$ is the maximum task execution requirement. For a fixed $m$, the performance ratio $\beta$ can be arbitrarily close to 1 as $R/(v r^*) \to \infty$.
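Proof. By Theorem 5, for a given partition $R_{l,1}, R_{l,2}, \ldots, R_{l,m}$ of the $n_l$ tasks in level $l$ into $m$ groups, the schedule length $T_l$ for level $l$ is minimized when all tasks in group $k$ of level $l$ are executed with the same power $(E_{l,k}/R_{l,k})^{\alpha/(\alpha-1)}$, where
$$E_{l,k} = \left( \frac{R_{l,k}^{\alpha}}{R_{l,1}^{\alpha} + R_{l,2}^{\alpha} + \cdots + R_{l,m}^{\alpha}} \right) E_l,$$
for all $1 \le l \le v$ and $1 \le k \le m$. The optimal schedule length is
$$T_l = \left( \frac{R_{l,1}^{\alpha} + R_{l,2}^{\alpha} + \cdots + R_{l,m}^{\alpha}}{E_l} \right)^{1/(\alpha-1)},$$
for the above power supplies. Since the level-by-level scheduling algorithm produces schedule length $T = T_1 + T_2 + \cdots + T_v$, we have
$$T = \sum_{l=1}^{v} \left( \frac{R_{l,1}^{\alpha} + R_{l,2}^{\alpha} + \cdots + R_{l,m}^{\alpha}}{E_l} \right)^{1/(\alpha-1)}.$$
By the definition of $S_l$, we obtain
$$T = \left( \frac{S_1}{E_1} \right)^{1/(\alpha-1)} + \left( \frac{S_2}{E_2} \right)^{1/(\alpha-1)} + \cdots + \left( \frac{S_v}{E_v} \right)^{1/(\alpha-1)}.$$
To minimize $T$, we use the Lagrange multiplier system
$$\nabla T(E_1, E_2, \ldots, E_v) = \lambda \nabla F(E_1, E_2, \ldots, E_v),$$
where $\lambda$ is the Lagrange multiplier, and $F$ is the constraint $E_1 + E_2 + \cdots + E_v - E = 0$. Since
$$\frac{\partial T}{\partial E_l} = \lambda \frac{\partial F}{\partial E_l},$$
that is,
$$S_l^{1/(\alpha-1)} \left( -\frac{1}{\alpha-1} \right) \frac{1}{E_l^{1/(\alpha-1)+1}} = \lambda,$$
for all $1 \le l \le v$, we get
$$E_l = S_l^{1/\alpha} \left( \frac{1}{\lambda(1-\alpha)} \right)^{(\alpha-1)/\alpha},$$
which implies that
$$E = \left( S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha} \right) \left( \frac{1}{\lambda(1-\alpha)} \right)^{(\alpha-1)/\alpha},$$
and
$$E_l = \left( \frac{S_l^{1/\alpha}}{S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha}} \right) E,$$
for all $1 \le l \le v$. By using the above energy allocation, we have
$$
\begin{aligned}
T &= \sum_{l=1}^{v} \left( \frac{S_l}{E_l} \right)^{1/(\alpha-1)} \\
&= \sum_{l=1}^{v} \frac{S_l^{1/(\alpha-1)}}{\left( \left( \dfrac{S_l^{1/\alpha}}{S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha}} \right) E \right)^{1/(\alpha-1)}} \\
&= \sum_{l=1}^{v} \frac{S_l^{1/\alpha} \left( S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha} \right)^{1/(\alpha-1)}}{E^{1/(\alpha-1)}} \\
&= \frac{\left( S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha} \right)^{\alpha/(\alpha-1)}}{E^{1/(\alpha-1)}}.
\end{aligned}
$$
For any list scheduling algorithm $A$, we have
$$R_{l,k} \le \frac{R_l}{m} + r^*,$$
for all $1 \le l \le v$ and $1 \le k \le m$. Therefore,
$$S_l \le m \left( \frac{R_l}{m} + r^* \right)^{\alpha},$$
and
$$S_l^{1/\alpha} \le m^{1/\alpha} \left( \frac{R_l}{m} + r^* \right),$$
and
$$S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha} \le m^{1/\alpha} \left( \frac{R}{m} + v r^* \right),$$
which implies that
$$T \le m^{1/(\alpha-1)} \left( \frac{R}{m} + v r^* \right)^{\alpha/(\alpha-1)} \frac{1}{E^{1/(\alpha-1)}}.$$
By Theorem 1, we get
$$\beta = \frac{T}{T^*} \le \left( 1 + \frac{m v r^*}{R} \right)^{\alpha/(\alpha-1)}.$$
It is clear that for a fixed $m$, $\beta$ can be arbitrarily close to 1 as $R/(v r^*)$ becomes large. □
Theorems 5 and 6 give the power supply to the tasks in group $k$ of level $l$ as
$$\left( \frac{E_{l,k}}{R_{l,k}} \right)^{\alpha/(\alpha-1)} = \left( \left( \frac{R_{l,k}^{\alpha}}{R_{l,1}^{\alpha} + R_{l,2}^{\alpha} + \cdots + R_{l,m}^{\alpha}} \right) \left( \frac{S_l^{1/\alpha}}{S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha}} \right) \frac{E}{R_{l,k}} \right)^{\alpha/(\alpha-1)},$$
for all $1 \le l \le v$ and $1 \le k \le m$.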
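The energy allocation among levels in Theorem 6 can be illustrated with a short computation. In this sketch, `partitions[l]` holds the group requirements $R_{l,1}, \ldots, R_{l,m}$ of one level; the function name and the data layout are assumptions made for the example, and each level is assumed to contain at least one task.

```python
def allocate_energy_to_levels(partitions, E, alpha):
    """Allocate energy E to levels in proportion to S_l**(1/alpha).

    Returns (level energies, schedule length summed over levels,
    schedule length from the closed form of Theorem 6).
    """
    S = [sum(R ** alpha for R in groups) for groups in partitions]
    roots = [S_l ** (1.0 / alpha) for S_l in S]
    total = sum(roots)
    E_levels = [x / total * E for x in roots]
    T_sum = sum((S_l / E_l) ** (1.0 / (alpha - 1.0))
                for S_l, E_l in zip(S, E_levels))
    T_closed = total ** (alpha / (alpha - 1.0)) / E ** (1.0 / (alpha - 1.0))
    return E_levels, T_sum, T_closed
```

For `partitions = [[2, 2], [1, 1]]`, `E = 12`, and `alpha = 3`, the levels receive energies 8 and 4, and the level-by-level sum agrees with the closed form. Shifting energy between the levels, for instance to 7 and 5, yields a longer schedule, as the theorem predicts.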
4.2 Minimizing Energy Consumption
The following result gives the optimal power supplies that
minimize energy consumption for a given partition of a set
of n independent tasks into m groups on a multiprocessor
computer [22].
Theorem 7. For a given partition $R_1, R_2, \ldots, R_m$ of the $n$ tasks into $m$ groups on a multiprocessor computer, the total energy consumption is minimized when all tasks in group $k$ are executed with the same power $(R_k/T)^{\alpha}$, where $1 \le k \le m$. The minimum energy consumption is
$$E = \frac{R_1^{\alpha} + R_2^{\alpha} + \cdots + R_m^{\alpha}}{T^{\alpha-1}},$$
for the above power supplies.
To use a postpower-determination algorithm to solve the problem of minimizing energy consumption with schedule length constraint $T$, we need to allocate the time $T$ to the $v$ levels. We use $T_1, T_2, \ldots, T_v$ to represent a time allocation to the $v$ levels, where tasks in level $l$ are executed within deadline $T_l$, and $T_1 + T_2 + \cdots + T_v = T$. Theorem 8 provides optimal time allocation to the $v$ levels for minimizing energy consumption with schedule length constraint in scheduling precedence constrained tasks by using scheduling algorithms LL-A, where A is a list scheduling algorithm.
Theorem 8. For a given partition $R_{l,1}, R_{l,2}, \ldots, R_{l,m}$ of the $n_l$ tasks in level $l$ into $m$ groups produced by a list scheduling algorithm $A$, where $1 \le l \le v$, and a time allocation $T_1, T_2, \ldots, T_v$ to the $v$ levels, algorithm LL-A consumes energy
$$E = \sum_{l=1}^{v} \left( \frac{R_{l,1}^{\alpha} + R_{l,2}^{\alpha} + \cdots + R_{l,m}^{\alpha}}{T_l^{\alpha-1}} \right).$$
The time allocation $T_1, T_2, \ldots, T_v$ which minimizes $E$ is
$$T_l = \left( \frac{S_l^{1/\alpha}}{S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha}} \right) T,$$
where $S_l = R_{l,1}^{\alpha} + R_{l,2}^{\alpha} + \cdots + R_{l,m}^{\alpha}$, for all $1 \le l \le v$, and the minimized energy consumption is
$$E = \frac{\left( S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha} \right)^{\alpha}}{T^{\alpha-1}},$$
by using the above time allocation. The performance ratio is
$$\beta \le \left( 1 + \frac{m v r^*}{R} \right)^{\alpha},$$
where $r^* = \max(r_1, r_2, \ldots, r_n)$ is the maximum task execution requirement. For a fixed $m$, the performance ratio $\beta$ can be arbitrarily close to 1 as $R/(v r^*) \to \infty$.
Proof. By Theorem 7, for a given partition $R_{l,1}, R_{l,2}, \ldots, R_{l,m}$ of the $n_l$ tasks in level $l$ into $m$ groups, the total energy $E_l$ consumed by level $l$ is minimized when all tasks in group $k$ are executed with the same power $(R_{l,k}/T_l)^{\alpha}$, where $1 \le l \le v$ and $1 \le k \le m$. The minimum energy consumption is
$$E_l = \frac{R_{l,1}^{\alpha} + R_{l,2}^{\alpha} + \cdots + R_{l,m}^{\alpha}}{T_l^{\alpha-1}},$$
for the above power supplies. Since the level-by-level scheduling algorithm consumes energy $E = E_1 + E_2 + \cdots + E_v$, we have
$$E = \sum_{l=1}^{v} \left( \frac{R_{l,1}^{\alpha} + R_{l,2}^{\alpha} + \cdots + R_{l,m}^{\alpha}}{T_l^{\alpha-1}} \right).$$
By the definition of $S_l$, we obtain
$$E = \frac{S_1}{T_1^{\alpha-1}} + \frac{S_2}{T_2^{\alpha-1}} + \cdots + \frac{S_v}{T_v^{\alpha-1}}.$$
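To minimize $E$, we use the Lagrange multiplier system
$$\nabla E(T_1, T_2, \ldots, T_v) = \lambda \nabla F(T_1, T_2, \ldots, T_v),$$
where $\lambda$ is the Lagrange multiplier, and $F$ is the constraint $T_1 + T_2 + \cdots + T_v - T = 0$. Since
$$\frac{\partial E}{\partial T_l} = \lambda \frac{\partial F}{\partial T_l},$$
that is,
$$S_l \left( \frac{1-\alpha}{T_l^{\alpha}} \right) = \lambda,$$
for all $1 \le l \le v$, we get
$$T_l = S_l^{1/\alpha} \left( \frac{1-\alpha}{\lambda} \right)^{1/\alpha},$$
which implies that
$$T = \left( S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha} \right) \left( \frac{1-\alpha}{\lambda} \right)^{1/\alpha},$$
and
$$T_l = \left( \frac{S_l^{1/\alpha}}{S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha}} \right) T,$$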
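for all $1 \le l \le v$. By using the above time allocation, we have
$$\begin{aligned}
E &= \sum_{l=1}^{v} \frac{S_l}{T_l^{\alpha-1}} \\
&= \sum_{l=1}^{v} S_l \Bigg/ \left( \left( \frac{S_l^{1/\alpha}}{S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha}} \right) T \right)^{\alpha-1} \\
&= \frac{\sum_{l=1}^{v} S_l^{1/\alpha} \left( S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha} \right)^{\alpha-1}}{T^{\alpha-1}} \\
&= \frac{\left( S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha} \right)^{\alpha}}{T^{\alpha-1}}.
\end{aligned}$$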
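Similar to the proof of Theorem 6, we have
$$S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha} \le m^{1/\alpha} \left( \frac{R}{m} + v r^* \right),$$
which implies that
$$E \le m \left( \frac{R}{m} + v r^* \right)^{\alpha} \frac{1}{T^{\alpha-1}}.$$
By Theorem 2, we get
$$\beta = \frac{E}{E^*} \le \left( 1 + \frac{m v r^*}{R} \right)^{\alpha}.$$
It is clear that for a fixed $m$, $\beta$ can be arbitrarily close to 1 as $R/(v r^*)$ becomes large. □

Theorems 7 and 8 give the power supply to the tasks in group $k$ of level $l$ as
$$\left( \frac{R_{l,k}}{T_l} \right)^{\alpha} = \left( \frac{R_{l,k} \left( S_1^{1/\alpha} + S_2^{1/\alpha} + \cdots + S_v^{1/\alpha} \right)}{S_l^{1/\alpha} T} \right)^{\alpha},$$
for all $1 \le l \le v$ and $1 \le k \le m$.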
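The closed forms of Theorems 7 and 8 are easy to evaluate numerically. The following Python sketch is an illustration, not the paper's implementation: the function names and the two-level example partition are assumptions introduced here. It computes the level sums $S_l$, the time allocation $T_l$, the energy of an arbitrary time allocation, and the per-group powers $(R_{l,k}/T_l)^{\alpha}$.

```python
from typing import List, Sequence


def level_sums(partition: Sequence[Sequence[float]], alpha: float) -> List[float]:
    """Return S_l = R_{l,1}^alpha + ... + R_{l,m}^alpha for every level l."""
    return [sum(load ** alpha for load in groups) for groups in partition]


def optimal_time_allocation(sums: Sequence[float], T: float, alpha: float) -> List[float]:
    """Theorem 8: T_l is proportional to S_l^(1/alpha), and the T_l add up to T."""
    weights = [s ** (1.0 / alpha) for s in sums]
    total = sum(weights)
    return [w / total * T for w in weights]


def energy(sums: Sequence[float], times: Sequence[float], alpha: float) -> float:
    """E = sum_l S_l / T_l^(alpha - 1), valid for any time allocation."""
    return sum(s / t ** (alpha - 1.0) for s, t in zip(sums, times))


def group_powers(partition: Sequence[Sequence[float]],
                 times: Sequence[float], alpha: float) -> List[List[float]]:
    """Power (R_{l,k} / T_l)^alpha supplied to group k of level l."""
    return [[(load / t) ** alpha for load in groups]
            for groups, t in zip(partition, times)]


if __name__ == "__main__":
    # Hypothetical instance: v = 2 levels, m = 2 groups per level.
    alpha, T = 3.0, 4.0
    partition = [[3.0, 1.0], [2.0, 2.0]]
    sums = level_sums(partition, alpha)
    times = optimal_time_allocation(sums, T, alpha)
    print("time allocation:", times)
    print("energy:", energy(sums, times, alpha))
    print("group powers:", group_powers(partition, times, alpha))
```

Comparing `energy(sums, times, alpha)` against any other allocation that sums to $T$ (for example an equal split) is a simple way to check the optimality claim on a given instance.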
5 HYBRID ALGORITHMS
The hybrid algorithms in this section are called LL-ES-A, where LL stands for level by level, ES stands for equal speed, and A is a list scheduling algorithm. In this paper, we consider $A \in \{\mathrm{LS}, \mathrm{LRF}, \mathrm{SRF}\}$. Hence, we have three hybrid algorithms, namely, LL-ES-LS, LL-ES-LRF, and LL-ES-SRF. The strategies to solve the three subproblems are described as follows:
• Precedence Constraining. The level-by-level scheduling algorithms are employed to deal with precedence constraints.
• Task Scheduling. Independent tasks in the same level are supplied with the same power and executed with the same speed and scheduled by using list scheduling algorithms.
• Power Supplying. We then determine optimal energy/time allocation among levels of tasks for the given schedule (Theorems 9 and 10).
Notice that power allocation and task scheduling are mixed.
The equal-speed strategy implies prepower-determination
for tasks in the same level. The level-by-level scheduling
method implies postpower-determination for tasks of
different levels.
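The task scheduling step within one level can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, list scheduling is taken to mean that each task in list order goes to the processor that becomes free earliest, and LRF and SRF are assumed to order the list by largest and smallest requirement first, as their names suggest. Because all tasks in a level run at the same speed, scheduling on the requirements $r_{l,i}$ directly yields the quantity $A_l$ used in Theorems 9 and 10.

```python
import heapq
from typing import List, Sequence


def list_schedule(requirements: Sequence[float], m: int, order: str = "LS") -> List[List[float]]:
    """Partition the tasks of one level into m groups by list scheduling."""
    if order == "LRF":      # assumed: largest requirement first
        tasks = sorted(requirements, reverse=True)
    elif order == "SRF":    # assumed: smallest requirement first
        tasks = sorted(requirements)
    elif order == "LS":     # tasks taken in the given order
        tasks = list(requirements)
    else:
        raise ValueError(f"unknown order: {order}")

    # Min-heap of (current load, processor index); the least loaded
    # processor is the one that becomes free earliest.
    heap = [(0.0, k) for k in range(m)]
    heapq.heapify(heap)
    groups: List[List[float]] = [[] for _ in range(m)]
    for r in tasks:
        load, k = heapq.heappop(heap)
        groups[k].append(r)
        heapq.heappush(heap, (load + r, k))
    return groups


def schedule_length(groups: Sequence[Sequence[float]]) -> float:
    """A_l: the largest total requirement assigned to any processor."""
    return max(sum(g) for g in groups)
```

For a level with requirements `[3.0, 1.0, 2.0, 2.0]` on `m = 2` processors, `schedule_length(list_schedule(reqs, 2, "LRF"))` gives a balanced schedule, whereas the LS and SRF orders leave one processor with more work.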
5.1 Minimizing Schedule Length
Theorem 9 provides optimal energy allocation to the v levels
for minimizing schedule length with energy consumption
constraint in scheduling precedence constrained tasks by
using hybrid scheduling algorithms LL-ES-A, where A is a
list scheduling algorithm.
Theorem 9. For a given energy allocation $E_1, E_2, \ldots, E_v$ to the $v$ levels, the scheduling algorithm LL-ES-A, where A is a list scheduling algorithm, produces schedule length
$$T = A_1 \left( \frac{R_1}{E_1} \right)^{1/(\alpha-1)} + A_2 \left( \frac{R_2}{E_2} \right)^{1/(\alpha-1)} + \cdots + A_v \left( \frac{R_v}{E_v} \right)^{1/(\alpha-1)},$$
where $A_l = A(r_{l,1}, r_{l,2}, \ldots, r_{l,n_l})$ is the length of the schedule produced by algorithm A for $n_l$ tasks with execution times $r_{l,1}, r_{l,2}, \ldots, r_{l,n_l}$. The energy allocation $E_1, E_2, \ldots, E_v$ which minimizes $T$ is
$$E_l = \left( \frac{A_l^{1-1/\alpha} R_l^{1/\alpha}}{A_1^{1-1/\alpha} R_1^{1/\alpha} + A_2^{1-1/\alpha} R_2^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha}} \right) E,$$
for all $1 \le l \le v$, and the minimized schedule length is
$$T = \frac{\left( A_1^{1-1/\alpha} R_1^{1/\alpha} + A_2^{1-1/\alpha} R_2^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha} \right)^{\alpha/(\alpha-1)}}{E^{1/(\alpha-1)}},$$
by using the above energy allocation. The performance ratio is
$$\beta \le \left( 1 + \frac{m v r^*}{R} \right)^{\alpha/(\alpha-1)},$$
where $r^* = \max(r_1, r_2, \ldots, r_n)$ is the maximum task execution requirement. For a fixed $m$, the performance ratio $\beta$ can be arbitrarily close to 1 as $R/(v r^*) \to \infty$.
Proof. From Theorem 3, we know that the schedule length for the $n_l$ tasks in level $l$ is
$$T_l = A(r_{l,1}, r_{l,2}, \ldots, r_{l,n_l}) \left( \frac{R_l}{E_l} \right)^{1/(\alpha-1)} = A_l \left( \frac{R_l}{E_l} \right)^{1/(\alpha-1)},$$
by using algorithm ES-A, where A is a list scheduling algorithm. Since the level-by-level scheduling algorithm produces schedule length $T = T_1 + T_2 + \cdots + T_v$, we have
$$T = A_1 \left( \frac{R_1}{E_1} \right)^{1/(\alpha-1)} + A_2 \left( \frac{R_2}{E_2} \right)^{1/(\alpha-1)} + \cdots + A_v \left( \frac{R_v}{E_v} \right)^{1/(\alpha-1)}.$$
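To minimize $T$, we use the Lagrange multiplier system
$$\nabla T(E_1, E_2, \ldots, E_v) = \lambda \nabla F(E_1, E_2, \ldots, E_v),$$
where $\lambda$ is the Lagrange multiplier, and $F$ is the constraint $E_1 + E_2 + \cdots + E_v - E = 0$. Since
$$\frac{\partial T}{\partial E_l} = \lambda \frac{\partial F}{\partial E_l},$$
that is,
$$A_l R_l^{1/(\alpha-1)} \left( -\frac{1}{\alpha-1} \right) \frac{1}{E_l^{1/(\alpha-1)+1}} = \lambda,$$
for all $1 \le l \le v$, we get
$$E_l = A_l^{1-1/\alpha} R_l^{1/\alpha} \left( \frac{1}{\lambda(1-\alpha)} \right)^{(\alpha-1)/\alpha},$$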
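which implies that
$$E = \left( A_1^{1-1/\alpha} R_1^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha} \right) \left( \frac{1}{\lambda(1-\alpha)} \right)^{(\alpha-1)/\alpha},$$
and
$$E_l = \left( \frac{A_l^{1-1/\alpha} R_l^{1/\alpha}}{A_1^{1-1/\alpha} R_1^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha}} \right) E,$$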
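for all $1 \le l \le v$. By using the above energy allocation, we have
$$\begin{aligned}
T &= \sum_{l=1}^{v} A_l \left( \frac{R_l}{E_l} \right)^{1/(\alpha-1)} \\
&= \sum_{l=1}^{v} A_l R_l^{1/(\alpha-1)} \Bigg/ \left( \left( \frac{A_l^{1-1/\alpha} R_l^{1/\alpha}}{A_1^{1-1/\alpha} R_1^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha}} \right) E \right)^{1/(\alpha-1)} \\
&= \frac{\left( A_1^{1-1/\alpha} R_1^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha} \right)^{1/(\alpha-1)} \sum_{l=1}^{v} A_l^{1-1/\alpha} R_l^{1/\alpha}}{E^{1/(\alpha-1)}} \\
&= \frac{\left( A_1^{1-1/\alpha} R_1^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha} \right)^{\alpha/(\alpha-1)}}{E^{1/(\alpha-1)}}.
\end{aligned}$$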
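Since
$$A_l \le \frac{R_l}{m} + r^*,$$
we obtain
$$\begin{aligned}
A_l^{1-1/\alpha} R_l^{1/\alpha} &\le \left( \frac{R_l}{m} + r^* \right)^{1-1/\alpha} R_l^{1/\alpha} \\
&= \left( \frac{R_l + m r^*}{m} \right)^{1-1/\alpha} R_l^{1/\alpha} \\
&\le \left( \frac{R_l + m r^*}{m} \right)^{1-1/\alpha} (R_l + m r^*)^{1/\alpha} \\
&= \frac{R_l + m r^*}{m^{1-1/\alpha}},
\end{aligned}$$
and
$$A_1^{1-1/\alpha} R_1^{1/\alpha} + A_2^{1-1/\alpha} R_2^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha} \le \frac{R + m v r^*}{m^{1-1/\alpha}},$$
and
$$T \le \frac{(R + m v r^*)^{\alpha/(\alpha-1)}}{m E^{1/(\alpha-1)}}.$$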
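By Theorem 1, we get
$$\beta = \frac{T}{T^*} \le \left( 1 + \frac{m v r^*}{R} \right)^{\alpha/(\alpha-1)}.$$
It is clear that for a fixed $m$, $\beta$ can be arbitrarily close to 1 as $R/(v r^*)$ becomes large. □

Theorems 3 and 9 give the power supply to the tasks in level $l$ as
$$\left( \frac{E_l}{R_l} \right)^{\alpha/(\alpha-1)} = \left( \left( \frac{A_l^{1-1/\alpha} R_l^{1/\alpha}}{A_1^{1-1/\alpha} R_1^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha}} \right) \frac{E}{R_l} \right)^{\alpha/(\alpha-1)},$$
for all $1 \le l \le v$.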
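The energy allocation of Theorem 9 can be sketched in a few lines of Python. The function names and the example values of $A_l$ and $R_l$ below are assumptions made for illustration; in practice $A_l$ would come from a list scheduling algorithm applied to level $l$.

```python
from typing import List, Sequence


def level_weights(A: Sequence[float], R: Sequence[float], alpha: float) -> List[float]:
    """w_l = A_l^(1 - 1/alpha) * R_l^(1/alpha), the terms of the sum in Theorem 9."""
    return [a ** (1.0 - 1.0 / alpha) * r ** (1.0 / alpha) for a, r in zip(A, R)]


def optimal_energy_allocation(A, R, E: float, alpha: float) -> List[float]:
    """E_l = (w_l / sum_j w_j) * E."""
    w = level_weights(A, R, alpha)
    total = sum(w)
    return [x / total * E for x in w]


def schedule_length(A, R, energies, alpha: float) -> float:
    """T = sum_l A_l * (R_l / E_l)^(1 / (alpha - 1)) for any energy allocation."""
    return sum(a * (r / e) ** (1.0 / (alpha - 1.0))
               for a, r, e in zip(A, R, energies))


def level_powers(R, energies, alpha: float) -> List[float]:
    """Power (E_l / R_l)^(alpha / (alpha - 1)) supplied to every task of level l."""
    return [(e / r) ** (alpha / (alpha - 1.0)) for r, e in zip(R, energies)]


if __name__ == "__main__":
    # Hypothetical two-level instance.
    A, R, E, alpha = [5.0, 4.0], [8.0, 8.0], 10.0, 3.0
    energies = optimal_energy_allocation(A, R, E, alpha)
    print("energy allocation:", energies)
    print("schedule length:", schedule_length(A, R, energies, alpha))
    print("level powers:", level_powers(R, energies, alpha))
```

Levels with a longer list schedule $A_l$ receive a larger share of the energy budget, even when their total requirement $R_l$ is the same.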
5.2 Minimizing Energy Consumption
Theorem 10 provides optimal time allocation to the $v$ levels for minimizing energy consumption with schedule length constraint in scheduling precedence constrained tasks by using hybrid scheduling algorithms LL-ES-A, where A is a list scheduling algorithm.
Theorem 10. For a given time allocation $T_1, T_2, \ldots, T_v$ to the $v$ levels, the scheduling algorithm LL-ES-A, where A is a list scheduling algorithm, consumes energy
$$E = \left( \frac{A_1}{T_1} \right)^{\alpha-1} R_1 + \left( \frac{A_2}{T_2} \right)^{\alpha-1} R_2 + \cdots + \left( \frac{A_v}{T_v} \right)^{\alpha-1} R_v,$$
where $A_l = A(r_{l,1}, r_{l,2}, \ldots, r_{l,n_l})$ is the length of the schedule produced by algorithm A for $n_l$ tasks with execution times $r_{l,1}, r_{l,2}, \ldots, r_{l,n_l}$. The time allocation $T_1, T_2, \ldots, T_v$ which minimizes $E$ is
$$T_l = \left( \frac{A_l^{1-1/\alpha} R_l^{1/\alpha}}{A_1^{1-1/\alpha} R_1^{1/\alpha} + A_2^{1-1/\alpha} R_2^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha}} \right) T,$$
for all $1 \le l \le v$, and the minimized energy consumption is
$$E = \frac{\left( A_1^{1-1/\alpha} R_1^{1/\alpha} + A_2^{1-1/\alpha} R_2^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha} \right)^{\alpha}}{T^{\alpha-1}},$$
by using the above time allocation. The performance ratio is
$$\beta \le \left( 1 + \frac{m v r^*}{R} \right)^{\alpha},$$
where $r^* = \max(r_1, r_2, \ldots, r_n)$ is the maximum task execution requirement. For a fixed $m$, the performance ratio $\beta$ can be arbitrarily close to 1 as $R/(v r^*) \to \infty$.
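Proof. From Theorem 4, we know that the energy consumed by the $n_l$ tasks in level $l$ is
$$E_l = \left( \frac{A(r_{l,1}, r_{l,2}, \ldots, r_{l,n_l})}{T_l} \right)^{\alpha-1} R_l = \left( \frac{A_l}{T_l} \right)^{\alpha-1} R_l,$$
by using algorithm ES-A, where A is a list scheduling algorithm. Since the level-by-level scheduling algorithm consumes energy $E = E_1 + E_2 + \cdots + E_v$, we have
$$E = \left( \frac{A_1}{T_1} \right)^{\alpha-1} R_1 + \left( \frac{A_2}{T_2} \right)^{\alpha-1} R_2 + \cdots + \left( \frac{A_v}{T_v} \right)^{\alpha-1} R_v.$$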
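To minimize $E$, we use the Lagrange multiplier system
$$\nabla E(T_1, T_2, \ldots, T_v) = \lambda \nabla F(T_1, T_2, \ldots, T_v),$$
where $\lambda$ is the Lagrange multiplier, and $F$ is the constraint $T_1 + T_2 + \cdots + T_v - T = 0$. Since
$$\frac{\partial E}{\partial T_l} = \lambda \frac{\partial F}{\partial T_l},$$
that is,
$$A_l^{\alpha-1} R_l \left( \frac{1-\alpha}{T_l^{\alpha}} \right) = \lambda,$$
for all $1 \le l \le v$, we get
$$T_l = A_l^{1-1/\alpha} R_l^{1/\alpha} \left( \frac{1-\alpha}{\lambda} \right)^{1/\alpha},$$
which implies that
$$T = \left( A_1^{1-1/\alpha} R_1^{1/\alpha} + A_2^{1-1/\alpha} R_2^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha} \right) \left( \frac{1-\alpha}{\lambda} \right)^{1/\alpha},$$
and
$$T_l = \left( \frac{A_l^{1-1/\alpha} R_l^{1/\alpha}}{A_1^{1-1/\alpha} R_1^{1/\alpha} + A_2^{1-1/\alpha} R_2^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha}} \right) T,$$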
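for all $1 \le l \le v$. By using the above time allocation, we have
$$\begin{aligned}
E &= \sum_{l=1}^{v} \left( \frac{A_l}{T_l} \right)^{\alpha-1} R_l \\
&= \sum_{l=1}^{v} A_l^{\alpha-1} R_l \Bigg/ \left( \left( \frac{A_l^{1-1/\alpha} R_l^{1/\alpha}}{A_1^{1-1/\alpha} R_1^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha}} \right) T \right)^{\alpha-1} \\
&= \frac{\left( A_1^{1-1/\alpha} R_1^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha} \right)^{\alpha-1}}{T^{\alpha-1}} \sum_{l=1}^{v} A_l^{1-1/\alpha} R_l^{1/\alpha} \\
&= \frac{\left( A_1^{1-1/\alpha} R_1^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha} \right)^{\alpha}}{T^{\alpha-1}}.
\end{aligned}$$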
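Similar to the proof of Theorem 9, we have
$$A_1^{1-1/\alpha} R_1^{1/\alpha} + A_2^{1-1/\alpha} R_2^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha} \le \frac{R + m v r^*}{m^{1-1/\alpha}},$$
and
$$E \le \frac{(R + m v r^*)^{\alpha}}{m^{\alpha-1} T^{\alpha-1}}.$$
By Theorem 2, we get
$$\beta = \frac{E}{E^*} \le \left( 1 + \frac{m v r^*}{R} \right)^{\alpha}.$$
It is clear that for a fixed $m$, $\beta$ can be arbitrarily close to 1 as $R/(v r^*)$ becomes large. □

Theorems 4 and 10 give the power supply to the tasks in level $l$ as
$$\left( \frac{A_l}{T_l} \right)^{\alpha} = \left( \frac{A_l \left( A_1^{1-1/\alpha} R_1^{1/\alpha} + A_2^{1-1/\alpha} R_2^{1/\alpha} + \cdots + A_v^{1-1/\alpha} R_v^{1/\alpha} \right)}{A_l^{1-1/\alpha} R_l^{1/\alpha} T} \right)^{\alpha},$$
for all $1 \le l \le v$.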
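Theorems 9 and 10 use the same weights $A_l^{1-1/\alpha} R_l^{1/\alpha}$, so the two closed forms should be mutually consistent: spending the minimized energy of Theorem 10 as the budget in Theorem 9 should give back the original deadline $T$. The Python sketch below checks this on a hypothetical instance; the function names and input values are assumptions made for illustration.

```python
from typing import List, Sequence


def level_weights(A: Sequence[float], R: Sequence[float], alpha: float) -> List[float]:
    """w_l = A_l^(1 - 1/alpha) * R_l^(1/alpha)."""
    return [a ** (1.0 - 1.0 / alpha) * r ** (1.0 / alpha) for a, r in zip(A, R)]


def optimal_time_allocation(A, R, T: float, alpha: float) -> List[float]:
    """Theorem 10: T_l = (w_l / sum_j w_j) * T."""
    w = level_weights(A, R, alpha)
    total = sum(w)
    return [x / total * T for x in w]


def energy(A, R, times, alpha: float) -> float:
    """E = sum_l (A_l / T_l)^(alpha - 1) * R_l for any time allocation."""
    return sum((a / t) ** (alpha - 1.0) * r for a, r, t in zip(A, R, times))


def min_schedule_length(A, R, E: float, alpha: float) -> float:
    """Closed form of Theorem 9 for energy budget E."""
    total = sum(level_weights(A, R, alpha))
    return total ** (alpha / (alpha - 1.0)) / E ** (1.0 / (alpha - 1.0))


if __name__ == "__main__":
    A, R, T, alpha = [5.0, 4.0], [8.0, 8.0], 6.0, 3.0
    times = optimal_time_allocation(A, R, T, alpha)
    E_min = energy(A, R, times, alpha)
    print("time allocation:", times)
    print("minimized energy:", E_min)
    print("schedule length recovered from Theorem 9:",
          min_schedule_length(A, R, E_min, alpha))
```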
6 SIMULATION RESULTS
In this section, we present simulation data for several typical task graphs in parallel and distributed computing. The following task graphs are considered in our experiments.
1. Tree-Structured Computations. Many computations are tree-structured, including backtracking search, branch-and-bound computations, game-tree evaluation, functional and logical programming, and various numeric computations. For simplicity, we consider $CT(b, h)$, i.e., complete $b$-ary trees of height $h$ (see Fig. 1 with $b = 2$ and $h = 4$). It is easy to see that there are $v = h + 1$ levels numbered as $0, 1, 2, \ldots, h$, and $n_l = b^l$ for $0 \le l \le h$, and $n = (b^{h+1} - 1)/(b - 1)$.
2. Partitioning Algorithms. A partitioning algorithm $PA(b, h)$ represents a divide-and-conquer computation with branching factor $b$ and height (i.e., depth of recursion) $h$ (see Fig. 2 with $b = 2$ and $h = 3$). The dag of $PA(b, h)$ has $v = 2h + 1$ levels numbered as $0, 1, 2, \ldots, 2h$. A partitioning algorithm proceeds in three stages. In levels $0, 1, \ldots, h - 1$, each task is divided into $b$ subtasks. Then, in level $h$, subproblems of small sizes are solved directly. Finally, in levels $h + 1, h + 2, \ldots, 2h$, solutions to subproblems are combined to form the solution to the original problem. Clearly, $n_l = n_{2h-l} = b^l$ for all $0 \le l \le h - 1$, $n_h = b^h$, and $n = (b^{h+1} + b^h - 2)/(b - 1)$.
3. Linear Algebra Task Graphs. A linear algebra task graph $LA(v)$ with $v$ levels (see Fig. 3 where $v = 5$) has $n_l = v - l + 1$ for $l = 1, 2, \ldots, v$, and $n = v(v + 1)/2$.
4. Diamond Dags. A diamond dag $DD(d)$ (see Fig. 4 with $d = 4$) contains $v = 2d - 1$ levels numbered as $1, 2, \ldots, 2d - 1$. It is clear that $n_l = n_{2d-l} = l$ for all $1 \le l \le d - 1$, $n_d = d$, and $n = d^2$.
Fig. 1. $CT(b, h)$: a complete binary tree with $b = 2$ and height $h = 4$.
Fig. 2. $PA(b, h)$: a partitioning algorithm with $b = 2$ and $h = 3$.
Since each task graph has at least one parameter, we are actually dealing with classes of task graphs.
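The level sizes $n_l$ of the four classes follow directly from the definitions above. The short Python sketch below (the function names are assumptions introduced here) lists them level by level, which makes it easy to confirm the stated totals for $n$.

```python
from typing import List


def ct_levels(b: int, h: int) -> List[int]:
    """CT(b, h): n_l = b^l for levels l = 0, ..., h."""
    return [b ** l for l in range(h + 1)]


def pa_levels(b: int, h: int) -> List[int]:
    """PA(b, h): n_l = n_{2h-l} = b^l for l < h, and n_h = b^h."""
    divide = [b ** l for l in range(h)]
    return divide + [b ** h] + divide[::-1]


def la_levels(v: int) -> List[int]:
    """LA(v): n_l = v - l + 1 for levels l = 1, ..., v."""
    return [v - l + 1 for l in range(1, v + 1)]


def dd_levels(d: int) -> List[int]:
    """DD(d): n_l = n_{2d-l} = l for l < d, and n_d = d."""
    growing = list(range(1, d))
    return growing + [d] + growing[::-1]


if __name__ == "__main__":
    print("CT(2, 4):", ct_levels(2, 4))
    print("PA(2, 3):", pa_levels(2, 3))
    print("LA(5):  ", la_levels(5))
    print("DD(4):  ", dd_levels(4))
```

Summing each list reproduces the closed-form task counts, for example $n = d^2$ for $DD(d)$.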
We define the normalized schedule length (NSL) as
$$\mathrm{NSL} = \frac{T}{\left( \dfrac{m}{E} \left( \dfrac{R}{m} \right)^{\alpha} \right)^{1/(\alpha-1)}},$$
where $T$ is the schedule length produced by a heuristic algorithm. NSL is an upper bound for the performance ratio $\beta = T/T^*$ for the problem of minimizing schedule length with energy consumption constraint on a multiprocessor computer. When the $r_i$'s are random variables, $T$, $T^*$, $\beta$, and NSL all become random variables. It is clear that for the problem of minimizing schedule length with energy consumption constraint, we have $\bar{\beta} \le \overline{\mathrm{NSL}}$, i.e., the expected performance ratio is no larger than the expected normalized schedule length. (We use $\bar{x}$ to represent the expectation of a random variable $x$.)
We define the normalized energy consumption (NEC) as
$$\mathrm{NEC} = \frac{E}{m \left( \dfrac{R}{m} \right)^{\alpha} \dfrac{1}{T^{\alpha-1}}},$$
where $E$ is the energy consumed by a heuristic algorithm. NEC is an upper bound for the performance ratio $\beta = E/E^*$ for the problem of minimizing energy consumption with schedule length constraint on a multiprocessor computer. For the problem of minimizing energy consumption with schedule length constraint, we have $\bar{\beta} \le \overline{\mathrm{NEC}}$.
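Both normalizations divide a heuristic result by a lower bound on the optimum, so they translate directly into code. The helper names below are assumptions for illustration, and the sample values describe a hypothetical, perfectly balanced schedule for which both ratios should equal one.

```python
def nsl(T: float, E: float, R: float, m: int, alpha: float) -> float:
    """Normalized schedule length: T over ((m/E) * (R/m)^alpha)^(1/(alpha-1))."""
    lower_bound = ((m / E) * (R / m) ** alpha) ** (1.0 / (alpha - 1.0))
    return T / lower_bound


def nec(E: float, T: float, R: float, m: int, alpha: float) -> float:
    """Normalized energy consumption: E over m * (R/m)^alpha / T^(alpha-1)."""
    lower_bound = m * (R / m) ** alpha / T ** (alpha - 1.0)
    return E / lower_bound


if __name__ == "__main__":
    # Hypothetical balanced case: m = 2 processors, each with requirement 2,
    # both running at speed 1 for T = 2, so E = 2 * (1^3 * 2) = 4.
    print(nsl(T=2.0, E=4.0, R=4.0, m=2, alpha=3.0))
    print(nec(E=4.0, T=2.0, R=4.0, m=2, alpha=3.0))
```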
Notice that for a given algorithm and a given class of task graphs, the expected normalized schedule length $\overline{\mathrm{NSL}}$ and the expected normalized energy consumption $\overline{\mathrm{NEC}}$ are determined by $m$, $n$, $\alpha$, and the probability distribution of the $r_i$'s. In our simulations, the number of processors is set as $m = 10$ and the parameter $\alpha$ is set as 3. For convenience, the $r_i$'s are treated as independent and identically distributed (i.i.d.) continuous random variables uniformly distributed in $[0, 1)$.
Tables 1, 2, 3, and 4 show our simulation data of the expected NSL and the expected NEC for the four classes of task graphs. For each combination of $n$ and algorithm $A \in \{$LL-ES-SRF, LL-ES-LS, LL-ES-LRF, LL-SRF, LL-LS, LL-LRF, ES-SRF, ES-LS, ES-LRF$\}$, we generate 1,000 sets of $n$ tasks, produce their schedules by using algorithm A, calculate their NSL (or NEC), and report the average of NSL (or NEC), which is the experimental value of $\overline{\mathrm{NSL}}$ (or $\overline{\mathrm{NEC}}$). The 99 percent confidence interval of all the data in the same table is also given.
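The experimental procedure can be sketched for one family of algorithms. The Python code below is a simplified illustration, not the paper's simulator: it covers only LL-ES-A on $CT(2, h)$, uses hypothetical function names, and assumes list scheduling assigns each task to the earliest-free processor. It relies on the fact that, under the time allocation of Theorem 10, the deadline $T$ cancels out of NEC, leaving $\mathrm{NEC} = \left( \sum_l A_l^{1-1/\alpha} R_l^{1/\alpha} \right)^{\alpha} / \left( m (R/m)^{\alpha} \right)$.

```python
import heapq
import random
from typing import List


def list_schedule_length(times: List[float], m: int) -> float:
    """Length of the list schedule of `times` on m processors."""
    finish = [0.0] * m          # finish time of every processor (a valid heap)
    for t in times:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + t)
    return max(finish)


def nec_ll_es(levels: List[List[float]], m: int, alpha: float, order: str = "LS") -> float:
    """NEC of LL-ES-A with the time allocation of Theorem 10 (T cancels out)."""
    weight_sum, R = 0.0, 0.0
    for reqs in levels:
        if order == "LRF":
            reqs = sorted(reqs, reverse=True)
        elif order == "SRF":
            reqs = sorted(reqs)
        A_l = list_schedule_length(reqs, m)
        R_l = sum(reqs)
        weight_sum += A_l ** (1.0 - 1.0 / alpha) * R_l ** (1.0 / alpha)
        R += R_l
    return weight_sum ** alpha / (m * (R / m) ** alpha)


def mean_nec_on_ct(h: int, m: int = 10, alpha: float = 3.0, order: str = "LS",
                   trials: int = 1000, seed: int = 0) -> float:
    """Average NEC over random CT(2, h) instances with r_i uniform in [0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        levels = [[rng.random() for _ in range(2 ** l)] for l in range(h + 1)]
        total += nec_ll_es(levels, m, alpha, order)
    return total / trials


if __name__ == "__main__":
    for h in (4, 6, 8, 10):
        print(h, mean_nec_on_ct(h, order="LRF", trials=100))
```

Since NSL under the energy allocation of Theorem 9 equals this NEC raised to the power $1/(\alpha-1)$, the same routine covers both normalized measures for LL-ES-A.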
It is observed that in all cases, $\overline{\mathrm{NSL}}$ and $\overline{\mathrm{NEC}}$ (and $\bar{\beta}$ as well) quickly approach one as $n$ increases. This is explained as follows:
A class of task graphs is called wide task graphs if $v/n \to 0$ as $n \to \infty$. It is easily verified that all four classes of task graphs are wide task graphs. Since wide task graphs exhibit large parallelism due to the increasing number of independent tasks as $v$ increases, the performance of algorithm ES-A in scheduling precedence constrained tasks is dominated by the performance of algorithm ES-A in scheduling levels of independent tasks, where A is a list scheduling algorithm. By Theorems 3 and 4, the asymptotic performance ratio is $\beta_{\infty} = 1$ as $R/r^* \to \infty$.
Fig. 3. $LA(v)$: a linear algebra task graph with $v = 5$.
Fig. 4. $DD(d)$: a diamond dag with $d = 4$.
TABLE 1. Simulation Data for Expected (a) NSL on $CT(2, h)$ and (b) NEC on $CT(2, h)$
It is clear that for a class of wide task graphs, a uniform distribution of the $r_i$'s in $[0, 1)$, and any small $\epsilon > 0$, there exists sufficiently large $n^*$, such that $(v r^*)/R \le \epsilon$ with high probability (w.h.p.) if $n \ge n^*$. By Theorems 6 and 9, the performance ratio is $\beta \le (1 + m\epsilon)^{\alpha/(\alpha-1)}$ w.h.p. for the problem of minimizing schedule length with energy consumption constraint, for all LL-A and LL-ES-A, where A is a list scheduling algorithm. By Theorems 8 and 10, the performance ratio is $\beta \le (1 + m\epsilon)^{\alpha}$ w.h.p. for the problem of minimizing energy consumption with schedule length constraint, for all LL-A and LL-ES-A, where A is a list scheduling algorithm.
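To get a feel for how tight these bounds are, the two expressions can be evaluated for concrete values of $m$, $v$, $r^*$, and $R$. The function name and the sample numbers in the Python sketch below are assumptions chosen for illustration only.

```python
from typing import Tuple


def performance_bounds(m: int, v: int, r_star: float, R: float,
                       alpha: float) -> Tuple[float, float]:
    """Return the upper bounds on the performance ratio for
    (schedule length minimization, energy minimization)."""
    eps = v * r_star / R              # epsilon = (v * r*) / R
    base = 1.0 + m * eps
    return base ** (alpha / (alpha - 1.0)), base ** alpha


if __name__ == "__main__":
    # Hypothetical values: m = 10 processors, v = 5 levels, r* = 1, R = 5000.
    time_bound, energy_bound = performance_bounds(10, 5, 1.0, 5000.0, 3.0)
    print("schedule length bound:", time_bound)
    print("energy bound:", energy_bound)
```

For $\alpha > 2$ the exponent $\alpha/(\alpha-1)$ is smaller than $\alpha$, so the schedule-length bound is the tighter of the two for the same $\epsilon$.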
It is also observed that the postpower-determination algorithm LL-LRF performs better than all other algorithms.
7 CONCLUDING REMARKS
We have addressed power-aware scheduling of sequential tasks with precedence constraints on multiprocessor computers with dynamically variable voltage and speed as combinatorial optimization problems. We have investigated two problems, namely, the problem of minimizing schedule length with energy consumption constraint and the problem of minimizing energy consumption with schedule length constraint. We identified three nontrivial subproblems in our scheduling problems, i.e., precedence constraining, task scheduling, and power supplying. Such decomposition of our optimization problems into three subproblems makes design and analysis of heuristic algorithms tractable. We have proposed three types of heuristic power allocation and scheduling algorithms for precedence constrained sequential tasks with energy and time constraints, namely, prepower-determination algorithms, postpower-determination algorithms, and hybrid algorithms. We have analyzed the performance of our algorithms and compared their solutions with optimal schedules analytically. We also presented extensive simulation data and demonstrated that for wide task graphs, the performance ratios of all our heuristic algorithms approach one as the number of tasks increases. Our algorithms are applicable to energy-efficient task scheduling in general multiprocessor and multicore processor computing systems and large scale parallel computing systems, real-time multiprocessing systems and environments, and mobile computing and communication environments.
TABLE 2. Simulation Data for Expected (a) NSL on $PA(2, h)$ and (b) NEC on $PA(2, h)$
TABLE 3. Simulation Data for Expected (a) NSL on $LA(v)$ and (b) NEC on $LA(v)$
TABLE 4. Simulation Data for Expected (a) NSL on $DD(d)$ and (b) NEC on $DD(d)$
We would like to mention the following further research directions. 1) The lower bounds in Theorems 1 and 2 should be refined for precedence constrained tasks. 2) We conjecture that for any class of wide task graphs, any probability distribution of the $r_i$'s with finite mean and variance, and any small $\epsilon > 0$, there exists sufficiently large $n^*$, such that $A(r_1, r_2, \ldots, r_n)/(R/m) \le 1 + \epsilon$ w.h.p. if $n \ge n^*$, where A is a list scheduling algorithm, and hence, the performance ratio is $\beta \le 1 + \epsilon$ w.h.p. for the problem of minimizing schedule length with energy consumption constraint and $\beta \le (1 + \epsilon)^{\alpha-1}$ w.h.p. for the problem of minimizing energy consumption with schedule length constraint, for all ES-A, where A is a list scheduling algorithm. A proof of the above conjecture would be of great interest. 3) We conjecture that for any class of wide task graphs, any probability distribution of the $r_i$'s with finite mean and variance, and any small $\epsilon > 0$, there exists sufficiently large $n^*$, such that $(v r^*)/R \le \epsilon$ w.h.p. if $n \ge n^*$, and hence, the performance ratio is $\beta \le (1 + m\epsilon)^{\alpha/(\alpha-1)}$ w.h.p. for the problem of minimizing schedule length with energy consumption constraint and $\beta \le (1 + m\epsilon)^{\alpha}$ w.h.p. for the problem of minimizing energy consumption with schedule length constraint, for all LL-A and LL-ES-A, where A is a list scheduling algorithm. A proof of the above conjecture would be of great interest.
ACKNOWLEDGMENTS
Thanks are due to the reviewers for their comments. A preliminary version of the paper was presented at the Seventh Workshop on High-Performance, Power-Aware Computing, Anchorage, Alaska, May 16-20, 2011.
REFERENCES
[1] http://en.wikipedia.org/wiki/CMOS, 2012.
[2] http://en.wikipedia.org/wiki/Dynamic_voltage_scaling, 2012.
[3] http://en.wikipedia.org/wiki/Moore’s_law, 2012.
[4] http://www.green500.org/, 2012.
[5] S. Albers, “Energy-Efficient Algorithms,” Comm. of the ACM, vol. 53, no. 5, pp. 86-96, 2010.
[6] H. Aydin, R. Melhem, D. Mosse, and P. Mejía-Alvarez, “Power-Aware Scheduling for Periodic Real-Time Tasks,” IEEE Trans. Computers, vol. 53, no. 5, pp. 584-600, May 2004.
[7] N. Bansal, T. Kimbrel, and K. Pruhs, “Dynamic Speed Scaling to Manage Energy and Temperature,” Proc. IEEE 45th Symp. Foundation of Computer Science, pp. 520-529, 2004.
[8] J.A. Barnett, “Dynamic Task-Level Voltage Scheduling Optimizations,” IEEE Trans. Computers, vol. 54, no. 5, pp. 508-520, May 2005.
[9] L. Benini, A. Bogliolo, and G. De Micheli, “A Survey of Design Techniques for System-Level Dynamic Power Management,” IEEE Trans. Very Large Scale Integration Systems, vol. 8, no. 3, pp. 299-316, June 2000.
[10] D.P. Bunde, “Power-Aware Scheduling for Makespan and Flow,” Proc. 18th ACM Symp. Parallelism in Algorithms and Architectures, pp. 190-196, 2006.
[11] H.-L. Chan, W.-T. Chan, T.-W. Lam, L.-K. Lee, K.-S. Mak, and P.W.H. Wong, “Energy Efficient Online Deadline Scheduling,” Proc. 18th ACM-SIAM Symp. Discrete Algorithms, pp. 795-804, 2007.
[12] A.P. Chandrakasan, S. Sheng, and R.W. Brodersen, “Low-Power CMOS Digital Design,” IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473-484, Apr. 1992.
[13] S. Cho and R.G. Melhem, “On the Interplay of Parallelization, Program Performance, and Energy Consumption,” IEEE Trans. Parallel and Distributed Systems, vol. 21, no. 3, pp. 342-353, Mar. 2010.
[14] R.L. Graham, “Bounds on Multiprocessing Timing Anomalies,” SIAM J. Applied Math., vol. 2, pp. 416-429, 1969.
[15] I. Hong, D. Kirovski, G. Qu, M. Potkonjak, and M.B. Srivastava, “Power Optimization of Variable-Voltage Core-Based Systems,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 12, pp. 1702-1714, Dec. 1999.
[16] C. Im, S. Ha, and H. Kim, “Dynamic Voltage Scheduling with Buffers in Low-Power Multimedia Applications,” ACM Trans. Embedded Computing Systems, vol. 3, no. 4, pp. 686-705, 2004.
[17] S.U. Khan and I. Ahmad, “A Cooperative Game Theoretical Technique for Joint Optimization of Energy Consumption and Response Time in Computational Grids,” IEEE Trans. Parallel and Distributed Systems, vol. 20, no. 3, pp. 346-360, Mar. 2009.
[18] C.M. Krishna and Y.-H. Lee, “Voltage-Clock-Scaling Adaptive Scheduling Techniques for Low Power in Hard Real-Time Systems,” IEEE Trans. Computers, vol. 52, no. 12, pp. 1586-1593, Dec. 2003.
[19] W.-C. Kwon and T. Kim, “Optimal Voltage Allocation Techniques for Dynamically Variable Voltage Processors,” ACM Trans. Embedded Computing Systems, vol. 4, no. 1, pp. 211-230, 2005.
[20] Y.C. Lee and A.Y. Zomaya, “Energy Conscious Scheduling for Distributed Computing Systems Under Different Operating Conditions,” IEEE Trans. Parallel and Distributed Systems, vol. 22, no. 8, pp. 1374-1381, Aug. 2011.
[21] Y.-H. Lee and C.M. Krishna, “Voltage-Clock Scaling for Low Energy Consumption in Fixed-Priority Real-Time Systems,” Real-Time Systems, vol. 24, no. 3, pp. 303-317, 2003.
[22] K. Li, “Performance Analysis of Power-Aware Task Scheduling Algorithms on Multiprocessor Computers with Dynamic Voltage and Speed,” IEEE Trans. Parallel and Distributed Systems, vol. 19, no. 11, pp. 1484-1497, Nov. 2008.
[23] K. Li, “Energy Efficient Scheduling of Parallel Tasks on Multiprocessor Computers,” J. Supercomputing, vol. 60, no. 2, pp. 223-247, 2012.
[24] K. Li, “Algorithms and Analysis of Energy-Efficient Scheduling of Parallel Tasks,” Handbook of Energy-Aware and Green Computing, I. Ahmad and S. Ranka, eds., vol. 1, ch. 15, pp. 331-360, CRC Press/Taylor & Francis Group, 2012.
[25] K. Li, “Power Allocation and Task Scheduling on Multiprocessor Computers with Energy and Time Constraints,” Energy Aware Distributed Computing Systems, A. Zomaya and Y.-C. Lee, eds., ch. 1, John Wiley & Sons, July 2012.
[26] M. Li, B.J. Liu, and F.F. Yao, “Min-Energy Voltage Allocation for Tree-Structured Tasks,” J. Combinatorial Optimization, vol. 11, pp. 305-319, 2006.
[27] M. Li, A.C. Yao, and F.F. Yao, “Discrete and Continuous Min-Energy Schedules for Variable Voltage Processors,” Proc. Nat’l Academy of Sciences of USA, vol. 103, no. 11, pp. 3983-3987, 2006.
[28] M. Li and F.F. Yao, “An Efficient Algorithm for Computing Optimal Discrete Voltage Schedules,” SIAM J. Computing, vol. 35, no. 3, pp. 658-671, 2006.
[29] J.R. Lorch and A.J. Smith, “PACE: A New Approach to Dynamic Voltage Scaling,” IEEE Trans. Computers, vol. 53, no. 7, pp. 856-869, July 2004.
[30] R.N. Mahapatra and W. Zhao, “An Energy-Efficient Slack Distribution Technique for Multimode Distributed Real-Time Embedded Systems,” IEEE Trans. Parallel and Distributed Systems, vol. 16, no. 7, pp. 650-662, July 2005.
[31] G. Quan and X.S. Hu, “Energy Efficient DVS Schedule for Fixed-Priority Real-Time Systems,” ACM Trans. Embedded Computing Systems, vol. 6, no. 4, article 29, 2007.
[32] C. Rusu, R. Melhem, and D. Mosse, “Maximizing the System Value While Satisfying Time and Energy Constraints,” Proc. IEEE 23rd Real-Time Systems Symp., pp. 256-265, 2002.
[33] D. Shin and J. Kim, “Power-Aware Scheduling of Conditional Task Graphs in Real-Time Multiprocessor Systems,” Proc. Int’l Symp. Low Power Electronics and Design, pp. 408-413, 2003.
[34] D. Shin, J. Kim, and S. Lee, “Intra-Task Voltage Scheduling for Low-Energy Hard Real-Time Applications,” IEEE Design & Test of Computers, vol. 18, no. 2, pp. 20-30, Mar./Apr. 2001.
[35] M.R. Stan and K. Skadron, “Guest Editors’ Introduction: Power-Aware Computing,” Computer, vol. 36, no. 12, pp. 35-38, Dec. 2003.
[36] O.S. Unsal and I. Koren, “System-Level Power-Aware Design Techniques in Real-Time Systems,” Proc. IEEE, vol. 91, no. 7, pp. 1055-1069, July 2003.
[37] V. Venkatachalam and M. Franz, “Power Reduction Techniques for Microprocessor Systems,” ACM Computing Surveys, vol. 37, no. 3, pp. 195-237, 2005.
[38] M. Weiser, B. Welch, A. Demers, and S. Shenker, “Scheduling for Reduced CPU Energy,” Proc. First USENIX Symp. Operating Systems Design and Implementation, pp. 13-23, 1994.
[39] P. Yang, C. Wong, P. Marchal, F. Catthoor, D. Desmet, D. Verkest, and R. Lauwereins, “Energy-Aware Runtime Scheduling for Embedded-Multiprocessor SOCs,” IEEE Design & Test of Computers, vol. 18, no. 5, pp. 46-58, Sept./Oct. 2001.
[40] F. Yao, A. Demers, and S. Shenker, “A Scheduling Model for Reduced CPU Energy,” Proc. IEEE 36th Symp. Foundations of Computer Science, pp. 374-382, 1995.
[41] H.-S. Yun and J. Kim, “On Energy-Optimal Voltage Scheduling for Fixed-Priority Hard Real-Time Systems,” ACM Trans. Embedded Computing Systems, vol. 2, no. 3, pp. 393-430, 2003.
[42] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner, “Theoretical and Practical Limits of Dynamic Voltage Scaling,” Proc. 41st Design Automation Conf., pp. 868-873, 2004.
[43] X. Zhong and C.-Z. Xu, “Energy-Aware Modeling and Scheduling for Dynamic Voltage Scaling with Statistical Real-Time Guarantee,” IEEE Trans. Computers, vol. 56, no. 3, pp. 358-372, Mar. 2007.
[44] D. Zhu, R. Melhem, and B.R. Childers, “Scheduling with Dynamic Voltage/Speed Adjustment Using Slack Reclamation in Multiprocessor Real-Time Systems,” IEEE Trans. Parallel and Distributed Systems, vol. 14, no. 7, pp. 686-700, July 2003.
[45] D. Zhu, D. Mosse, and R. Melhem, “Power-Aware Scheduling for AND/OR Graphs in Real-Time Systems,” IEEE Trans. Parallel and Distributed Systems, vol. 15, no. 9, pp. 849-864, Sept. 2004.
[46] J. Zhuo and C. Chakrabarti, “Energy-Efficient Dynamic Task Scheduling Algorithms for DVS Systems,” ACM Trans. Embedded Computing Systems, vol. 7, no. 2, article 17, 2008.
[47] Z. Zong, A. Manzanares, X. Ruan, and X. Qin, “EAD and PEBD: Two Energy-Aware Duplication Scheduling Algorithms for Parallel Tasks on Homogeneous Clusters,” IEEE Trans. Computers, vol. 60, no. 3, pp. 360-374, Mar. 2011.
Keqin Li is a SUNY distinguished professor of computer science in the State University of New York at New Paltz. He is also an intellectual ventures endowed visiting chair professor at the National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China. His research interests include design and analysis of algorithms, parallel and distributed computing, and computer networking. He has contributed extensively to processor allocation and resource management; design and analysis of sequential/parallel, deterministic/probabilistic, and approximation algorithms; parallel and distributed computing systems performance analysis, prediction, and evaluation; job scheduling, task dispatching, and load balancing in heterogeneous distributed systems; dynamic tree embedding and randomized load distribution in static networks; parallel computing using optical interconnections; dynamic location management in wireless communication networks; routing and wavelength assignment in optical networks; energy-efficient computing and communication. His current research interests include lifetime maximization in sensor networks, file sharing in peer-to-peer systems, power management and performance optimization, and cloud computing. He has published more than 240 journal articles, book chapters, and research papers in refereed international conference proceedings. He has received several Best Paper Awards for his highest quality work. He has served in various capacities for numerous international conferences as general chair, program chair, workshop chair, track chair, and steering/advisory/award/program committee member. Currently, he is on the editorial board of IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Computers, Journal of Parallel and Distributed Computing, International Journal of Parallel, Emergent and Distributed Systems, International Journal of High Performance Computing and Networking, and Optimization Letters. He is a senior member of the IEEE and the IEEE Computer Society.