
Information Sciences 181 (2011) 3995–4008

Contents lists available at ScienceDirect

Information Sciences

journal homepage: www.elsevier.com/locate/ins

Communication-aware task scheduling and voltage selection for total energy minimization in a multiprocessor system using Ant Colony Optimization

HyunJin Kim, Sungho Kang*

Computer Systems and Reliable SOC Lab., Department of Electrical and Electronic Engineering, Yonsei University, 120-749 Seoul, Republic of Korea

Article info

Article history: Received 31 December 2009; received in revised form 16 November 2010; accepted 26 April 2011; available online 12 May 2011

Keywords: Task scheduling; Voltage selection; Ant Colony Optimization; Multiprocessor system; Low power

0020-0255/$ - see front matter © 2011 Elsevier Inc. All rights reserved.
doi:10.1016/j.ins.2011.04.037

* Corresponding author. Tel.: +822 2123 2775; fax: +822 313 8053.
E-mail addresses: [email protected] (H. Kim), [email protected] (S. Kang).
URL: http://soc.yonsei.ac.kr (H. Kim).

Abstract

Energy consumption is a key parameter when highly computational tasks must be performed on a multiprocessor system. In this case, to reduce total energy consumption, task scheduling and low-power methodology should be combined efficiently. This paper proposes an algorithm for off-line communication-aware task scheduling and voltage selection using Ant Colony Optimization. The proposed algorithm minimizes the total energy consumption of an application executing on a homogeneous multiprocessor system. Artificial agents explore the search space through stochastic decision-making that uses global heuristic information based on total energy consumption and local heuristic information based on interprocessor communication volume. During search space exploration, both voltage selection and the dependencies between tasks are considered. The pheromone trails, which represent the global heuristic information, are updated using the normalized total energy consumption so that all the energy consumption information from previously evaluated solutions is exploited. Experimental results show that the proposed algorithm outperforms traditional communication-aware task scheduling and task scheduling using genetic algorithms in terms of total energy consumption.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

Beyond ASICs and single-processor systems, a new paradigm for developing integrated circuits has emerged with the introduction of deep sub-micron (DSM) technology. In addition, multiprocessor systems have been widely adopted to increase the parallelism of highly computational tasks. Modern multiprocessor systems are composed of a bundle of processing elements (PEs) (computational resources) and their communication links (communication resources). The large-scale integration of PEs and increasing task parallelism cause large task computation energy consumption. Moreover, as shown in [9], the trend of shrinking feature sizes in DSM technology requires highly efficient communication resources for the interconnections between PEs. In addition, the communication volume in a multiprocessor system increases with the number of PEs. Therefore, the ratio of communication energy consumption to total energy consumption becomes higher with the adoption of highly efficient communication resources. Minimizing communication energy consumption in communication links is thus a serious problem arising from the development of multiprocessor systems [17]. Traditionally, task computation energy-saving techniques have been studied intensively. In [6], it was shown that high-level task computation energy-saving techniques implemented during the design process were more effective than low-level techniques. Voltage selection (VS) and power management (PM) were the main high-level task computation energy-saving techniques. VS techniques scaled the voltage swing and the clock frequency of voltage-variable PEs on the fly [6,21], whereas PM techniques shut down the power of PEs or made the PEs dormant [21,1,12,3]. Previous research showed that VS techniques were more efficient than PM techniques [14].

Several previous works on energy-saving techniques using VS on multiprocessor systems concentrated on energy savings for dependent tasks running on multiprocessors. On-line algorithms provide real-time task scheduling while the tasks are executing [22,20,31]. In contrast, off-line or static scheduling, as in [42,36], is performed before the tasks execute, and it can provide better worst-case energy consumption than on-line scheduling. In off-line task scheduling, to schedule the tasks of an application onto a given number of voltage-variable PEs, each task is first assigned to a voltage-variable PE (task assignment), and the tasks assigned to each PE are then ordered in time slots (task ordering) [42]. The starting time and the computational time of each task are determined by task scheduling and VS. If communicating tasks are mapped onto the same PE, the volume of communication between PEs is reduced, so the interprocessor communication energy consumption can be minimized. Therefore, to reduce total energy consumption with task scheduling and VS, a communication-aware approach that combines task scheduling and VS efficiently is necessary, especially in the context of a multiprocessor system.

This paper describes an off-line task scheduling and VS algorithm that minimizes total energy consumption by reducing the total interprocessor communication volume and by using VS to stretch task computations into the slack of time-constrained tasks. To explore the search space efficiently, we exploit Ant Colony Optimization (ACO) [10,11], a metaheuristic inspired by the ecological study of ant communities. In the proposed algorithm, stochastic task mapping is based on both global and local heuristics. Pheromone trails, which are updated using total energy consumption, serve as global heuristic information that exploits the energy consumption results of previously evaluated solutions. The local heuristic information is computed from the interprocessor communication volume. An extension of Ant System (AS) based on the elitist strategy is adopted to impose strong weights on the global-best solution [36,10]. Moreover, the collected global information and the a priori available information support an efficient, iterative, stochastic decision-making process. The values of the AS parameters are determined experimentally. Experimental results show that total energy savings are improved on average by 5.65% and 7.59% compared with traditional communication-aware task scheduling [36] and task scheduling using a genetic algorithm, respectively. In addition, compared with task scheduling without pheromone reinforcement, total energy consumption decreases on average by 4.65%.

This paper is structured as follows. In Section 2, related works and the concept of ACO are briefly reviewed. Section 3 explains the models adopted in this paper. Section 4 provides a motivational example. In Section 5, the proposed algorithm is explained in detail. Experimental results are provided and discussed in Section 6.

2. Related works

Several heuristics have been studied for off-line or static scheduling of an application on multiprocessor systems, where task scheduling is performed before the application runs. In [42], Zhang et al. proposed a framework that integrated priority-based task scheduling and VS to minimize the energy consumption of dependent tasks on a given number of voltage-variable PEs. The priority-based task scheduling in [42] avoided putting tasks on unnecessary paths that caused tighter timing constraints. Another contribution of [42] was an exact algorithm for the VS problem using a Linear Programming (LP) formulation. Interprocessor communication energy consumption, however, was not considered in [42]. In [41], Yu and Prasanna formulated the static allocation or mapping problem for independent real-time tasks as an extension of the generalized assignment problem. The approach in [41] considered neither the dependencies between tasks nor communication awareness. Chowdhury and Chakrabarti proposed a task scheduling algorithm based on a complex battery model using electrochemistry equations, in which only task computation energy was minimized [8]. The previous works in [17,15] considered only communication energy consumption. Hu et al. presented an energy-aware task mapping technique for regular Networks-on-Chip (NoCs) [17]. This technique adopted a branch-and-bound algorithm to explore the search space and improve solution quality. However, it considered only the communication load of on-chip networks and did not consider the time constraints on tasks. Hou et al. presented a periodic task scheduling method that minimizes interprocessor communication volume [15]. Varatkar et al. combined communication-aware task scheduling and VS to minimize total energy consumption [36], introducing a communication ignorance parameter to select the best-fit PE onto which a task is mapped.
The optimal value of the communication ignorance parameter was determined by exhaustively sweeping its possible values and keeping the value that minimized total energy consumption. However, only the interprocessor communication with the maximum communication volume was considered when selecting the PE onto which a task is mapped. Moreover, a single value of the communication ignorance parameter was applied uniformly to every task rather than individually to each task; therefore, the parameter-sweeping technique did not explore the search space efficiently. In [39], Watanabe et al. proposed a communication-aware task scheduling and VS technique for multiprocessor systems-on-chip (MP-SoCs) under latency and throughput constraints. In [39], the time constraint could be relaxed by pipelining periodic tasks. Pipelining, however, applies only to periodically scheduled tasks, so the algorithm in [39] is limited by this loss of generality. Kim et al. presented an off-line task scheduling algorithm for MP-SoCs in [23]. Because a single energy source was assumed to power all processors, the same operation frequency was applied to all processors at any point in time. In addition, communication energy consumption was not explicitly considered. Kwok and Ahmad surveyed static scheduling algorithms that minimize the makespan (application computation time) in parallel processing [24].

On the other hand, metaheuristics have been applied to generalized scheduling problems [24,4,28,37,32,25,38,18,26,7,2,30,33,16,40]. The growing number of studies on soft computing techniques and metaheuristics based on Monte Carlo methods, such as genetic algorithms and ACO, is due to the NP-hardness of these combinatorial problems. By combining user heuristics with an efficient search over candidate solutions, metaheuristics provide solutions for general classes of combinatorial problems. Unlike local search algorithms, which iteratively search neighboring solutions, metaheuristics perform a coarse-grained or global search for candidate solutions. In particular, several works [38,18,26,7,2,5] applied ACO to static task scheduling for multiprocessor systems. However, those works focused only on performance optimization, that is, on minimizing the completion time of applications. Because the objective of those approaches was to finish an application as soon as possible, they cannot be extended directly into energy-saving techniques that consider interprocessor communication energy. In other words, a search-space exploration method must be designed for the adopted abstract model and evaluation method.

The concept of ACO was introduced by Dorigo et al. under the name Ant System (AS), a metaheuristic search algorithm inspired by the ethological study of the behavior of ants [10,11]. In ACO, agents called ants crawl over their available paths to explore the search space and find near-optimal solutions. The process of ACO resembles the behavior of ants in real ecosystems: the ants communicate indirectly through pheromone trails on the available paths, and good paths are reinforced by the amount of deposited pheromone. The probability of selecting a path is proportional to the amount of pheromone accumulated on the path and to the local heuristics. This collective autocatalytic behavior increases the probability that the ants find better solutions. Updating the pheromone trails involves two main operations: reinforcement and evaporation. Reinforcement ensures that frequently selected paths, or better solutions, have better chances of being selected in future iterations, so the probability that ants find better solutions gradually increases. Evaporation makes the ants explore different parts of the search space efficiently by gradually removing a small portion of the global heuristic information. The iterative crawling of ants over the search space is repeated until an end condition is satisfied (e.g., no improvement after the global-best solution is obtained, or a fixed number of iterations). There are three reasons why ACO can be efficient for static task scheduling. First, the evaluated energy consumption results can be stored as pheromone on each ant path, and ACO iteratively exploits all of this indirectly accumulated solution information. Second, ACO is less affected by the initial condition than a local search or a genetic algorithm.
Third, local heuristic information can be combined with distributed global information. Details of the proposed algorithm and the effectiveness of applying ACO to off-line task scheduling and VS are shown later in this paper.

3. Preliminaries

Before describing the proposed algorithm, application, hardware, and energy models are provided.

3.1. Application model

A communication task graph (CTG) is defined as $g(V, E)$. Each vertex $v_i \in V$ represents a task $T_i$. Each edge $e(i, j) \in E$ denotes a control or data dependency between a source $v_i$ and a destination $v_j$, where the subscripts $i$ and $j$ are the indexes of tasks $T_i$ and $T_j$. A vertex $v_i$ contains the task computation time at the maximum operation frequency, $NC_i$, and the task-finishing deadline or time constraint, $d_i$. The communication volume between tasks $T_i$ and $T_j$ is denoted $C_{ij}$.
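As an illustration only, the CTG of Section 3.1 can be held in a small Python structure. The class name `CTG`, its field names, and the three-task example graph are hypothetical choices for this sketch; they are not taken from the authors' C++ implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CTG:
    """Communication task graph g(V, E) (illustrative sketch)."""
    nc: dict[int, int]          # NC_i: computation time at maximum frequency (cycles)
    deadline: dict[int, int]    # d_i: task-finishing deadline
    volume: dict[tuple[int, int], int] = field(default_factory=dict)  # C_ij per edge e(i, j)

    def successors(self, i: int) -> list[int]:
        """Destinations j of all edges e(i, j)."""
        return [j for (src, j) in self.volume if src == i]

    def predecessors(self, j: int) -> list[int]:
        """Sources i of all edges e(i, j)."""
        return [i for (i, dst) in self.volume if dst == j]


# Hypothetical three-task example: T1 -> T3 and T2 -> T3.
example = CTG(nc={1: 2, 2: 3, 3: 1},
              deadline={1: 10, 2: 10, 3: 10},
              volume={(1, 3): 4, (2, 3): 6})
```

Keeping the edge volumes in a dictionary keyed by `(i, j)` makes both the dependency structure and $C_{ij}$ available from one place.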

3.2. Hardware model

A multiprocessor system has N homogeneous PEs. Because the computation of a PE and its communication with a target can proceed in parallel, the time overhead of interprocessor communication is neglected in this paper. Based on the multiprocessor development in [29], the overhead of each communication event can be reduced to just a handful of processor cycles. In addition, several previous works [42,36] did not consider communication time overhead explicitly. Therefore, it is assumed that the PEs are fully connected to each other through zero-delay communication networks. Each PE is a voltage-variable embedded core that changes its clock speed depending on its voltage mode. Communication energy is consumed only by interprocessor communications over the communication networks. For a CTG, if both the source and the target of an edge are mapped onto the same PE, no interprocessor communication energy is consumed; in that case, energy consumption is required only for buffering the data in registers or local memory.
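The rule that only edges crossing PEs consume interprocessor communication energy can be sketched as follows. The function names are hypothetical, and the linear cost `energy_per_unit` per unit of volume is an assumption of this sketch; this section of the paper does not state its communication energy function.

```python
def interprocessor_volume(volume: dict, mapping: dict) -> int:
    """Sum C_ij over edges e(i, j) whose endpoints sit on different PEs.

    volume:  {(i, j): C_ij}
    mapping: {task index: PE index}
    Edges inside one PE contribute nothing, per the hardware model.
    """
    return sum(c for (i, j), c in volume.items() if mapping[i] != mapping[j])


def communication_energy(volume: dict, mapping: dict, energy_per_unit: float) -> float:
    """Communication energy under an assumed linear cost per unit of volume."""
    return energy_per_unit * interprocessor_volume(volume, mapping)


# Edges e(1,4) and e(2,4) with hypothetical volumes.
vol = {(1, 4): 1, (2, 4): 3}
print(interprocessor_volume(vol, {1: 0, 2: 1, 4: 1}))  # -> 1 (only e(1,4) crosses PEs)
```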


3.3. Energy model

The total power consumption of a PE is composed of dynamic power consumption due to switching activity and static power consumption due to leakage current. Other components, such as short-circuit power consumption, are neglected. Dynamic power consumption, $P_{dyn}$, is given by:

$$P_{dyn} = C_{eff} \cdot V_{dd}^{2} \cdot f_{clock} \qquad (1)$$

where $C_{eff}$, $V_{dd}$, and $f_{clock}$ denote the normalized switching capacitance, the supply voltage, and the operation frequency, respectively. Static power consumption, $P_{sta}$, is given by:

$$P_{sta} = V_{dd} \cdot I_{leak} \qquad (2)$$

where $I_{leak}$ denotes the leakage current. Therefore, the total energy consumption per cycle, $E_{cycle}$, is given by:

$$E_{cycle} = C_{eff} \cdot V_{dd}^{2} + \frac{V_{dd} \cdot I_{leak}}{f_{clock}} \qquad (3)$$

This means that the dynamic energy consumption per cycle is proportional to the square of the supply voltage, whereas the static energy consumption per cycle is linearly proportional to the supply voltage and inversely proportional to the clock frequency. Therefore, VS can minimize dynamic energy consumption by reducing the supply voltage and the clock frequency. When a PE does not compute any task, its switching activity is negligible, so only static energy consumption needs to be considered; moreover, the ratio of static energy to total energy consumption increases as the feature size shrinks. Therefore, the static energy consumed while a PE is idle is included when calculating total energy consumption.
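Eqs. (1) to (3) translate directly into code. The sketch below uses hypothetical normalized values (a "high" mode with $V_{dd}=2$, $f_{clock}=1$ and a "low" mode with $V_{dd}=1$, $f_{clock}=0.5$) purely to show how the dynamic and static terms trade off; these numbers are not from the paper.

```python
def energy_per_cycle(c_eff: float, v_dd: float, i_leak: float, f_clock: float) -> float:
    """Eq. (3): E_cycle = C_eff * V_dd^2 + (V_dd * I_leak) / f_clock."""
    dynamic = c_eff * v_dd ** 2          # Eq. (1) divided by f_clock
    static = v_dd * i_leak / f_clock     # Eq. (2) divided by f_clock
    return dynamic + static


def task_energy(cycles: int, c_eff: float, v_dd: float, i_leak: float, f_clock: float) -> float:
    """Energy of a task that executes `cycles` clock cycles in one voltage mode."""
    return cycles * energy_per_cycle(c_eff, v_dd, i_leak, f_clock)


def idle_energy(v_dd: float, i_leak: float, idle_time: float) -> float:
    """Static energy of an idle PE: P_sta * idle time."""
    return v_dd * i_leak * idle_time


high = energy_per_cycle(1.0, 2.0, 0.5, 1.0)   # 4.0 dynamic + 1.0 static = 5.0
low = energy_per_cycle(1.0, 1.0, 0.5, 0.5)    # 1.0 dynamic + 1.0 static = 2.0
```

With these illustrative values, halving the voltage cuts the dynamic term by a factor of four, while halving the frequency keeps the per-cycle static term from shrinking along with the voltage.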

4. Motivational example

The proposed algorithm minimizes the total energy consumption of applications by considering both the communication and the task computation energy consumption. To illustrate the motivation for the proposed algorithm, consider the CTG illustrated in Fig. 1, which consists of several tasks (vertices) and their dependencies (edges). In Fig. 1, the number inside each vertex is the task computation time at the maximum operation frequency, and one clock cycle takes one time unit. The value c near each edge is the communication volume between the source and the target of the edge. Fig. 2 adopts a multiprocessor system with two homogeneous PEs that run in a two-step voltage-variable mode, where a clock cycle at the higher voltage mode (white) takes one time unit and a clock cycle at the lower voltage mode (gray) takes two time units. The tasks of the CTG in Fig. 1 are mapped onto the two PEs. Figs. 1(a) and 2(a) show the VS-aware task mapping described in [42] and its ordered tasks, respectively. In Fig. 2(a), the tasks are ordered on the two PEs as compactly as possible while respecting the dependencies between tasks and the deadline of each task. The VS results in Fig. 2(b) are obtained by applying the LP formulation in [42]. The number of tasks at the lower voltage mode is maximized by using up the slack (the maximum amount of time that a task can be slowed down without violating the timing constraints) as much as possible. Communication energy consumption, however, is not considered in the example of Fig. 2(b). In contrast, the communication- and VS-aware task mapping and its ordered tasks are illustrated in Figs. 1(b) and 2(c), respectively. The task schedule in Fig. 2(c) swaps the mapping of T1 and T2 because the communication volume between T2 and T4 is greater than that between T1 and T4; interprocessor communication energy consumption is reduced when the source and target of an edge are mapped onto the same PE. The mapping of T5 is not changed because the communication volume between T4 and T5 is greater than that between T3 and T5. The number of time units executed at the lower voltage mode in Fig. 2(d) is the same as that in Fig. 2(b), so Fig. 2(d) shows how total energy consumption is reduced by reducing interprocessor communication volume.

Fig. 1. Example of CTG mapping onto two PEs.

Fig. 2. Example of task scheduling and VS: (a) priority-based scheduling in [42]; (b) VS results of (a); (c) communication and voltage selection-aware scheduling; (d) voltage selection results of (c).

The examples in Fig. 2(c) and (d) show that minimizing total energy consumption with communication- and VS-aware task scheduling is time-consuming, because the search space grows exponentially. For the example illustrated in Fig. 2, four solutions must be evaluated by the LP formulation to cover all cases of task mapping. Because the search space cannot, in general, be explored exhaustively, this example motivates a heuristic that explores the search space efficiently while considering interprocessor communication energy.
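To make the growth concrete, the sketch below enumerates raw task-to-PE assignments, of which there are $N^{|V|}$. It counts assignments only, before task ordering and before any LP solve; the paper's Fig. 2 example considers only four candidate mappings, so this count is an illustrative upper bound rather than the paper's search space. The function name is hypothetical.

```python
from itertools import product


def all_mappings(tasks, num_pes):
    """Yield every assignment {task: PE}; there are num_pes ** len(tasks) of them."""
    for pes in product(range(num_pes), repeat=len(tasks)):
        yield dict(zip(tasks, pes))


tasks = [1, 2, 3, 4, 5]
print(sum(1 for _ in all_mappings(tasks, 2)))   # 2**5 = 32 raw assignments
```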

5. Proposed task scheduling and voltage selection

In this section, the proposed algorithm that combines task scheduling and VS is explained in detail, together with the search space exploration using AS.

5.1. Outline of the proposed task scheduling and voltage selection

The outline of the proposed algorithm based on AS is shown in Fig. 3: the outer loop updates the pheromone trails according to solution quality, and the inner loop performs the iterative crawling of ants and task scheduling to obtain feasible solutions. A feasible solution is a set of scheduled or voltage-scaled tasks that violates no timing or resource constraints. First, the target application is transformed into a CTG. For a given CTG, a set of s feasible solutions is obtained by scheduling tasks with ant crawling. During ant crawling, every task is scheduled, until no unscheduled tasks remain, by repeating priority-based task selection, stochastic task mapping, and task ordering, while the a priori available heuristic information is obtained dynamically. The stochastic task mapping relies on information from previously obtained feasible solutions and on the energy calculation parameters. The feasible solutions are then voltage-scaled by applying the LP formulation. Among the voltage-scaled solutions, the one with the minimum total energy consumption is selected, and the pheromone trails related to the ants' decisions are updated as global heuristic information. In the next iteration, the ants select their paths according to the updated pheromone amounts and the interprocessor communication volume. The process is repeated until an end condition is reached.

Fig. 3. Outline of proposed task scheduling and VS.

Task scheduling with ant crawling produces a set of m feasible solutions; this process is illustrated in the right part of Fig. 3. For a given CTG g, every task is scheduled until no unscheduled tasks remain, by repeating priority-based task selection, stochastic task mapping, and task ordering. In each turn, the a priori available heuristic information is obtained dynamically, and the resulting heuristic information is used for stochastic task mapping. The submodules of the proposed task scheduling and VS are explained in detail in the following subsections.
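The two-loop structure described above can be sketched as a generic driver. This is a minimal sketch, not the authors' implementation: the callables `construct`, `evaluate`, and `update` and the `patience` end condition (stop after a number of iterations without improving the global best, one of the end conditions mentioned in Section 2) are hypothetical parameters.

```python
def ant_system(construct, evaluate, update, num_ants, max_iters, patience):
    """Generic AS driver loosely following the outline in Fig. 3 (sketch).

    construct(): one ant crawl -> a feasible solution
    evaluate(s): total energy of s (e.g., after LP-based VS)
    update(scored, best, best_energy): pheromone evaporation/reinforcement
    Stops after max_iters iterations or `patience` iterations without improvement.
    """
    best, best_energy, stale = None, float("inf"), 0
    for _ in range(max_iters):
        # Inner loop: each ant builds one feasible solution.
        solutions = [construct() for _ in range(num_ants)]
        scored = [(evaluate(s), s) for s in solutions]
        it_energy, it_best = min(scored, key=lambda pair: pair[0])
        if it_energy < best_energy:
            best, best_energy, stale = it_best, it_energy, 0
        else:
            stale += 1
        # Outer loop: update pheromone trails from this iteration's results.
        update(scored, best, best_energy)
        if stale >= patience:
            break
    return best, best_energy


# Toy run: every "solution" is the number 7 and its "energy" is |s - 7|.
best, energy = ant_system(lambda: 7, lambda s: abs(s - 7), lambda *args: None,
                          num_ants=3, max_iters=10, patience=2)
```

Passing the problem-specific steps as callables keeps the loop skeleton separate from the CTG, LP, and pheromone details covered in the following subsections.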

5.2. Priority-based task selection

The proposed algorithm adopts a priority rule based on [42,36]. Priority-based task selection proceeds as follows: the priority values of all unscheduled tasks are calculated, and the unscheduled task with the smallest priority value is selected. The selection is dynamic because the priority of each task is not fixed until all tasks are scheduled. Priority-based task selection attempts to avoid unnecessarily long paths (dependent sets of ordered tasks), and therefore provides opportunities to minimize computation energy consumption: although task computations are constrained by tight deadlines, they can then be stretched by VS as much as possible. The priority value of a task $T_i$, $PRI_i$, is obtained as follows. The latest finish time of task $T_i$, $lft_i$, is defined as:

$$lft_i = \min\left(d_i,\; lft_j - NC_j \;\middle|\; \forall i, j \in \mathrm{indexes}(V), \text{ and } \forall e(i, j) \in E\right) \qquad (4)$$

As shown in (4), the latest finish time $lft_i$ is computed backward from the deadlines of the leaf tasks, or assigned directly from the CTG. The task ready time $r_i$ is defined as the time when the computations of all predecessors of $T_i$ finish. The PE available time of $PE_k$, $ap_{ik}$, denotes the earliest time at which $T_i$, if mapped onto $PE_k$, can start its computation. Based on these notations, the earliest starting time of $T_i$, $es_i$, is defined as:

$$es_i = \max\left(r_i,\; \min\left(ap_{ik} \;\middle|\; \forall PE_k \in PE \text{ s.t. } k \le N\right)\right) \qquad (5)$$

where $N$ and $PE$ denote the number of PEs and the set of all PEs, respectively, and $k$ denotes the PE index. The priority value of $T_i$ is defined as $PRI_i = lft_i + es_i$. The unscheduled task with the smallest priority value is scheduled next, which accounts for both the dependencies between tasks (through the latest finish time) and the hardware resource constraint (through the earliest starting time).
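Eqs. (4) and (5) and the priority rule can be sketched as below. The function names and the small three-task example are hypothetical; `pe_available[i]` is assumed to hold the list of $ap_{ik}$ values over all PEs for task $T_i$, and the CTG is assumed to be acyclic.

```python
def latest_finish_times(nc, deadline, succ):
    """Eq. (4): lft_i = min(d_i, lft_j - NC_j for every edge e(i, j)).

    Evaluated recursively from the leaf tasks with memoization; assumes an acyclic CTG.
    """
    lft = {}

    def visit(i):
        if i not in lft:
            lft[i] = min([deadline[i]] + [visit(j) - nc[j] for j in succ.get(i, [])])
        return lft[i]

    for i in nc:
        visit(i)
    return lft


def earliest_start(ready_time, pe_available):
    """Eq. (5): es_i = max(r_i, min_k ap_ik)."""
    return max(ready_time, min(pe_available))


def select_task(unscheduled, lft, ready, pe_available):
    """Return the unscheduled task with the smallest PRI_i = lft_i + es_i."""
    return min(unscheduled,
               key=lambda i: lft[i] + earliest_start(ready[i], pe_available[i]))


# Hypothetical CTG: T1 -> T3, T2 -> T3.
nc = {1: 2, 2: 3, 3: 1}
deadline = {1: 10, 2: 6, 3: 8}
lft = latest_finish_times(nc, deadline, {1: [3], 2: [3]})   # {1: 7, 2: 6, 3: 8}
```

Here T2 wins the selection because its own deadline makes its latest finish time tighter than that inherited from T3.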

5.3. Stochastic task mapping

The ants select their paths according to the amount of pheromone on the ant trails and a priori available information. Interprocessor communication volume is used as a local heuristic when an ant crawls over the paths related to a task; the unreduced interprocessor communication volume can be computed before the LP formulation for VS is applied. The probability of each path is formulated as follows. Ant_Path_i denotes the set of feasible decisions for a task $T_i$, each element of which is a path. Ant_Path_i contains the paths related to the input edges of $T_i$ and one path related to VS-aware task mapping. If the path for an edge $e(h, i)$, $p_i(e(h, i))$, is selected, tasks $T_h$ and $T_i$ are mapped onto the same PE. When VS-aware task mapping, denoted $p_i(VS)$, is selected for a task $T_i$, the unmapped task with the smallest priority value is mapped onto the earliest available PE when the task ready time is earlier than the PE available time. The normalized communication volume for a path $p_i$, $C_{normalized}(p_i)$, is defined as:

$$C_{normalized}(p_i) = \frac{Traffic_{avg}}{\sum_{a} C_{ai} + Traffic_{avg}} \quad \text{s.t. } \forall v_a \in V \text{ and } \forall e(a, i) \in \text{input edges of } T_i \qquad (6)$$

where $Traffic_{avg}$ denotes the average traffic (communication volume) per edge, which is calculated before task scheduling. The term $\sum_{a} C_{ai}$ is the unreduced input communication volume of $T_i$ when interprocessor communications between different PEs occur. The normalized communication volume for the path is used as a priori available local heuristic information, denoted $\eta_{ip}$. In the stochastic task mapping process based on AS described in [10,11], the probability that an ant selects a path $p$ in Ant_Path_i, $prob_{ip}$, is given by:

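$$prob_{ip} = \frac{\tau_{ip}^{\alpha} \cdot \eta_{ip}^{\beta}}{\sum_{l} \left(\tau_{il}^{\alpha} \cdot \eta_{il}^{\beta}\right)} \quad \text{s.t. } \forall l \in \mathrm{indexes}(\mathrm{Ant\_Path}_i) \qquad (7)$$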

where $\tau_{ip}$ denotes the pheromone trail for path $p$ in Ant_Path_i, and $\alpha$ and $\beta$ are parameters that determine the relative influence of the pheromone trail and of the local heuristic information, respectively.
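A minimal sketch of Eqs. (6) and (7) with roulette-wheel path selection follows. The function names, the volumes ($C_{14}=1$, $C_{24}=3$, $Traffic_{avg}=2$), and the initial pheromone of 1.0 are hypothetical. One reading is also an assumption: the text does not state how $\sum_a C_{ai}$ differs between paths, so here a path $p_4(e(h,4))$ is assumed to remove $C_{h4}$ from the unreduced input volume, while $p_4(VS)$ leaves all input volume unreduced.

```python
import random


def c_normalized(unreduced_volume, traffic_avg):
    """Eq. (6): eta = Traffic_avg / (unreduced input volume + Traffic_avg)."""
    return traffic_avg / (unreduced_volume + traffic_avg)


def path_probabilities(tau, eta, alpha, beta):
    """Eq. (7): prob_ip = tau_ip^alpha * eta_ip^beta / sum_l tau_il^alpha * eta_il^beta."""
    weights = {p: (tau[p] ** alpha) * (eta[p] ** beta) for p in tau}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


def choose_path(probs, rng):
    """Roulette-wheel choice of one path in Ant_Path_i."""
    paths = list(probs)
    return rng.choices(paths, weights=[probs[p] for p in paths], k=1)[0]


# Ant_Path_4 under the assumed per-path reading of the unreduced volume.
eta = {"e(1,4)": c_normalized(3, 2), "e(2,4)": c_normalized(1, 2), "VS": c_normalized(4, 2)}
tau = {p: 1.0 for p in eta}
probs = path_probabilities(tau, eta, alpha=1.0, beta=1.0)
chosen = choose_path(probs, random.Random(0))
```

With equal pheromone, the path that keeps the heavier edge $e(2,4)$ on one PE gets the largest probability, which matches the swap of T1 and T2 discussed in Section 4.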

5.4. Task ordering

After the most appropriate PE is selected during stochastic task mapping, the starting time of the mapped task is determined. If the task ready time $r_i$ is later than or equal to the PE available time $ap_{ik}$, task $T_i$ is ordered to start at $r_i$; otherwise, its start time is set to $ap_{ik}$. After a task is ordered, it is removed from the list of unscheduled tasks. Then, the time constraint of the scheduled task is checked to determine whether the schedule is valid. If it is not valid, task scheduling is restarted from the original CTG. As shown in Fig. 4, the outer loop is repeated to find s feasible solutions, which are then used to determine the VS for each task.
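The ordering rule and validity check can be sketched as a single function. The name `order_task` is hypothetical, and checking the finish time at maximum frequency (before any VS stretching) against $d_i$ is this sketch's reading of "the time constraint of the scheduled task is checked".

```python
def order_task(ready_time, pe_available_time, nc, deadline):
    """Start T_i at r_i if r_i >= ap_ik, else at ap_ik; then check d_i.

    Returns (start, finish, valid). The finish time is computed at maximum
    frequency, i.e., before any VS stretching (an assumption of this sketch).
    """
    start = ready_time if ready_time >= pe_available_time else pe_available_time
    finish = start + nc
    return start, finish, finish <= deadline


print(order_task(3, 5, 2, 8))  # (5, 7, True): the task waits for the PE
print(order_task(6, 5, 4, 8))  # (6, 10, False): deadline violated, so restart from the CTG
```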

Fig. 4. Ant_Paths for CTG in Fig. 1.

Fig. 5. Ant crawling for Ant_Paths in Fig. 4.


5.5. Example of task scheduling with ant crawling

Based on the descriptions above, an example of task scheduling with ant crawling is given. Fig. 4 shows the Ant_Paths for tasks T4 and T5 in Fig. 1. Each Ant_Path is described using task notation in brackets, where Ant_Path_i contains the paths related to the input edges of task Ti and one path related to VS-aware task mapping. Only tasks T4 and T5 have input edges, so only they have Ant_Path_4 and Ant_Path_5. A task with no input edges is handled by VS-aware task mapping alone; therefore, no Ant_Paths are provided for tasks T1, T2, and T3. Fig. 5 illustrates two cases of ant crawling over the paths of tasks T4 and T5. In Fig. 5(a), tasks T4 and T5 are mapped onto PE1, which means that if the task ready time is later than or equal to the PE available time, the task is mapped onto the latest available PE at the task ready time. In Fig. 5(a), the ant crawls over p4(e(1,4)) and p4(VS), which are the path related to edge e(1,4) and the path related to the VS-aware task mapping of T4, respectively. The ant in Fig. 5(b) crawls over the path for edge e(2,4), p4(e(2,4)), but not over the VS-aware task mapping path p4(VS). As shown in Fig. 5, the ant visits every Ant_Path and selects paths in the Ant_Path of each task during stochastic task mapping. After task scheduling is finished, the information about the selected paths is used to update the pheromone trails.
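Building the Ant_Paths from the edge list can be sketched as follows. The function name and the tagged-tuple path representation are hypothetical. The edge list e(1,4), e(2,4), e(3,5), e(4,5) is inferred from the text of Sections 4 and 5.5, not read off Fig. 1 itself, so treat it as an assumption.

```python
def build_ant_paths(edges):
    """Ant_Path_i = one path p_i(e(h, i)) per input edge of T_i, plus p_i(VS).

    Tasks without input edges get no Ant_Path (they use VS-aware mapping only).
    Paths are tagged tuples: ("edge", h, i) or ("VS", i).
    """
    ant_paths = {}
    for h, i in sorted(edges):
        ant_paths.setdefault(i, []).append(("edge", h, i))
    for i, paths in ant_paths.items():
        paths.append(("VS", i))
    return ant_paths


# Edges of the Fig. 1 CTG as inferred from the text (assumption).
paths = build_ant_paths([(1, 4), (2, 4), (3, 5), (4, 5)])
```

As in Fig. 4, only T4 and T5 receive an Ant_Path, each with two edge-related paths and one VS-aware path.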

5.6. Total energy calculation

After feasible solutions are obtained from task scheduling, the voltage mode of each task is selected. The feasible solutions are evaluated by an LP formulation, and then total energy consumption is calculated. Our algorithm adopts the LP formulation described in [42] to calculate the task computation energy after applying VS. The timing constraints on a CTG g(V, E) are modeled in [42] as follows:

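$$TS_j - TS_i - ex_i \ge 0 \quad \text{s.t. } \forall e(i, j) \in E, \qquad (8)$$
$$TS_i + ex_i \le d_i \quad \text{s.t. } \forall i \in \mathrm{indexes}(V), \text{ and} \qquad (9)$$
$$ex_i \ge NC_i \quad \text{s.t. } \forall i \in \mathrm{indexes}(V), \qquad (10)$$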

where $TS_i$ and $ex_i$ are the start time and the execution time of task $T_i$ after applying VS, respectively. The data dependencies between tasks and the deadline constraints are formulated in (8) and (9), respectively. By (10), the stretched execution time of $T_i$ must be equal to or greater than its execution time at the maximum frequency. The objective of the formulation is to minimize task computation energy, where each task computation is stretched into the slack of the time constraints as much as possible. The latest finish time of each task is determined by (4), and the dependencies between tasks serve as the time constraints when building the LP formulation. Then, total energy consumption is calculated by summing the interprocessor communication energy consumption, the task computation energy consumption, and the static energy consumption. The value of total energy consumption is used to update the ant pheromone trails. When the end condition is met, the feasible solution with the minimum energy consumption is reported as the best solution.
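The energy objective of the LP in [42] is not reproduced in this section, so the sketch below does not solve the LP; it only checks whether a given voltage-scaled schedule satisfies constraints (8) to (10). The function name and the two-task example (T1 stretched from 2 to 4 time units, as the lower voltage mode of Fig. 2 would do) are hypothetical.

```python
def satisfies_timing(ts, ex, nc, deadline, edges, eps=1e-9):
    """Check constraints (8)-(10) for a voltage-scaled schedule.

    ts[i]: start time TS_i; ex[i]: stretched execution time ex_i.
    eps absorbs floating-point round-off.
    """
    deps_ok = all(ts[j] - ts[i] - ex[i] >= -eps for i, j in edges)      # (8)
    deadline_ok = all(ts[i] + ex[i] <= deadline[i] + eps for i in ts)   # (9)
    stretch_ok = all(ex[i] >= nc[i] - eps for i in ts)                  # (10)
    return deps_ok and deadline_ok and stretch_ok


# T1 (NC = 2) stretched to 4 time units, then T2 (NC = 2) starts at time 4.
ok = satisfies_timing({1: 0, 2: 4}, {1: 4, 2: 2}, {1: 2, 2: 2}, {1: 5, 2: 6}, [(1, 2)])
```

Stretching T1 to 5 units would violate (8), and shortening T2 below its $NC_2$ would violate (10).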

5.7. Global best solution finding and pheromone updating

In each iteration of the loop body, the pheromones on the paths in the Ant_Path of each task are updated. The updated pheromones contain all of the indirectly accumulated information from previous evaluation results, which the proposed algorithm exploits iteratively. The proposed algorithm adopts an extension of AS, the global-best AS with an elitist strategy, in which the pheromone reinforcement is strongly weighted to explore the search space efficiently; in the global-best AS, a gradated reinforcement is applied to the best solution. To limit the amount of updated pheromone, a normalized energy consumption is proposed. When an artificial ant (agent) $ant_k$ visits every Ant_Path, the normalized energy consumption of its feasible solution, $E_{normalized}(ant_k)$, is defined as:

$$E_{\mathrm{normalized}}(\mathit{ant}_k) = \frac{E(\mathit{ant}_k)}{\sum_{i} E(T_i) + \sum_{i}\sum_{j} E(e(i,j))} \quad \text{s.t. } \forall e(i,j) \in E. \tag{11}$$

That is, E_normalized(ant_k) in (11) is the ratio of the evaluated energy consumption of ant_k, E(ant_k), to the sum of two quantities: the unreduced total task energy consumption at the highest voltage mode, and the unreduced total communication energy calculated from the total communication volume. Both unreduced totals are available a priori, before task scheduling by ant crawling starts. The normalized energy consumption therefore keeps the amount of deposited pheromone within upper and lower bounds. The pheromone update in the global-best AS is formulated in (12). Given s feasible solutions for a task T_i, the update of τ_ip in the ant pheromone trail for a path p of Ant_Path_i is defined as:

$$\tau_{ip}(t+1) = (1-\rho)\,\tau_{ip}(t) + \sum_{k=1}^{s} \Delta\tau^{k}_{ip}(t) + \omega\,\Delta\tau^{gb}_{ip}, \tag{12}$$

where ρ, ω, and k are the evaporation ratio, the weight for the global-best solution, and the index of an ant that crawls over path p, respectively. The values of ρ and ω are predetermined experimentally. The amount of deposited pheromone Δτ^k_ip(t) is calculated by:

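$$\Delta\tau^{k}_{ip}(t) = \frac{1}{E_{\mathrm{normalized}}(\mathit{ant}_k)} \quad \text{s.t. } \forall p \in \text{Ant\_Path}_i \text{ and } k \in \mathrm{indexes}(\mathit{ant}). \tag{13}$$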

If ant_k does not crawl over path p, Δτ^k_ip(t) is set to zero. In addition, the amount of pheromone deposited for the global-best solution, Δτ^gb_ip, is determined by:

$$\Delta\tau^{gb}_{ip} = \frac{1}{E_{\mathrm{normalized}}(\mathit{ant}_{gb})} \quad \text{s.t. } \forall p \in \text{Ant\_Path}_i, \tag{14}$$

where ant_gb is the ant with the global-best result among the ants for task T_i. From (13) and (14), the amount of deposited pheromone is inversely proportional to the energy consumption. In other words, paths that lead to lower energy consumption receive more pheromone and are therefore more likely to be chosen by later ants.
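Update rules (11) to (14) for the paths of a single task T_i can be sketched in C++ as below. All function and parameter names are hypothetical. The global-best deposit of (14) is applied only to the path chosen by ant_gb, as in the usual elitist AS. That is our reading of (14), not a detail stated in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// (11): E_normalized(ant_k) = E(ant_k) divided by the unreduced task energy
// (every task at the highest voltage) plus the unreduced communication energy.
double normalizedEnergy(double antEnergy,
                        const std::vector<double>& unreducedTaskEnergy,
                        const std::vector<double>& unreducedEdgeEnergy) {
    double denom = 0.0;
    for (double e : unreducedTaskEnergy) denom += e;
    for (double e : unreducedEdgeEnergy) denom += e;
    assert(denom > 0.0);
    return antEnergy / denom;
}

// (12)-(14) for the paths of one task T_i (hypothetical interface):
//   tau[p]           pheromone tau_ip(t) on path p of Ant_Path_i (updated in place)
//   chosenPath[k]    path that ant k crawled for T_i
//   antNormEnergy[k] E_normalized(ant_k)
//   gbPath, gbNormEnergy  path and E_normalized of the global-best ant
void updatePheromone(std::vector<double>& tau,
                     const std::vector<std::size_t>& chosenPath,
                     const std::vector<double>& antNormEnergy,
                     std::size_t gbPath, double gbNormEnergy,
                     double rho, double omega) {
    assert(chosenPath.size() == antNormEnergy.size() && gbPath < tau.size());
    std::vector<double> deposit(tau.size(), 0.0);
    // (13): ant k deposits 1 / E_normalized(ant_k) only on the path it crawled.
    for (std::size_t k = 0; k < chosenPath.size(); ++k) {
        deposit[chosenPath[k]] += 1.0 / antNormEnergy[k];
    }
    // (14), weighted by omega in (12); assumed to apply to ant_gb's path only.
    deposit[gbPath] += omega * (1.0 / gbNormEnergy);
    // (12): evaporate the old trail, then add this iteration's deposits.
    for (std::size_t p = 0; p < tau.size(); ++p) {
        tau[p] = (1.0 - rho) * tau[p] + deposit[p];
    }
}
```

Because the unreduced energies in the denominator of (11) are fixed before scheduling starts, E_normalized values from different iterations are directly comparable. This is why the deposits in (13) and (14) stay within a predictable range.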

6. Experimental results

The proposed algorithm was implemented in C++ using the Boost Graph Library [34] and the Standard Template Library (STL) [19]. The implementation was compiled with GNU GCC and evaluated on a Red Hat Linux workstation. For comparison with other energy-aware task scheduling algorithms in terms of total energy consumption, communication-aware task scheduling [36] was implemented. Task scheduling using a genetic algorithm, as suggested in [24,25,38], was also implemented; the genetic algorithm was used to map tasks onto PEs stochastically. To analyze the effectiveness of the proposed stochastic task mapping, task scheduling without the pheromone reinforcements in (7) was implemented as well. The application program interface (API) functions of lpsolve [27] were interfaced with STL containers in order to apply the LP formulation described in [42] to the VS of each task.

6.1. Experimental environments

To evaluate the proposed algorithm, three standard task graphs (STGs) [35] modeled from real application programs were adopted. The STGs were transformed into CTGs in the Graphviz dot format [13] so that they could be read by the parser provided by the Boost Graph Library. Three CTGs, named fpppp, robot, and sparse, were extracted from three different real applications: SPEC fpppp, robot control, and a sparse matrix solver. The fpppp CTG represented a subroutine of the SPEC benchmark fpppp. The robot CTG was a task graph for the Newton–Euler dynamic control calculation. The sparse CTG represented a random sparse matrix solver for electronic circuit simulation.

The sparse CTG had a high degree of parallelism, which made it suitable for evaluating a large number of PEs. In the robot CTG, the maximum number of predecessors of any task was three, whereas in fpppp it was 81. Moreover, the fpppp CTG had more than 300 tasks. Because the original STGs did not contain communication volumes or costs, the communication volume of each edge in the CTGs was selected randomly from 32, 64, 128, and 256 bits. The CTGs used in the experiments had three types of deadlines, tight, normal, and loose, determined from the critical path (CP) of the original STGs: 1.5 times, twice, and 2.5 times the CP, respectively. The numbers of tasks and edges and the deadlines of the CTGs are listed in Table 1, excluding dummy tasks and edges. The energy consumption of each PE was based on the silicon-measured data of the ULTRA926 SoC [21], which provides four settings of clock speed and energy consumption per cycle: (180 MHz, 142.64 pJ/cycle), (240 MHz, 165.15 pJ/cycle), (300 MHz, 201.52 pJ/cycle), and (360 MHz, 244.74 pJ/cycle). The unit of a CTG's deadline was set to one cycle of the fastest clock speed. As in [36], the register buffer energy and the interprocessor communication energy were assumed to be 0.75 pJ/bit and 20 pJ/bit, respectively. The static energy of a PE that is not computing any task was assumed to be 28.52 pJ/cycle, which is twenty percent of the energy consumption per cycle at the slowest clock speed (180 MHz).
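The per-PE energy model above can be made concrete with a small C++ sketch. For illustration only, it assumes three things: a task's dynamic energy is its cycle count times the pJ/cycle of the chosen setting; execution time is measured in fastest-clock cycles, as stated for the deadlines; and static energy accrues per idle fastest-clock cycle. The rule `slowestFeasibleSetting` is a simplified discrete stand-in, not the LP-based VS of [42]. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// ULTRA926 clock settings quoted in the text, slowest first.
struct ClockSetting {
    double mhz;
    double pjPerCycle;
};
const std::vector<ClockSetting> kSettings = {
    {180.0, 142.64}, {240.0, 165.15}, {300.0, 201.52}, {360.0, 244.74}};
constexpr double kFastestMhz = 360.0;
constexpr double kStaticPjPerCycle = 28.52;  // idle PE (about 20% of 142.64 pJ/cycle)

// Time, in fastest-clock cycles (the deadline unit), to run `cycles` at setting s.
double execTime(double cycles, const ClockSetting& s) {
    return cycles * kFastestMhz / s.mhz;
}

// Dynamic energy in pJ: cycle count times energy per cycle of the setting.
double dynamicEnergy(double cycles, const ClockSetting& s) {
    return cycles * s.pjPerCycle;
}

// Static energy in pJ for a PE idle for `idleCycles` cycles
// (assumption: idle time is counted in fastest-clock cycles).
double staticEnergy(double idleCycles) {
    return idleCycles * kStaticPjPerCycle;
}

// Simplified discrete VS for one task (not the LP of [42]): return the index of
// the slowest setting whose stretched time fits into `window`, or nothing.
std::optional<std::size_t> slowestFeasibleSetting(double cycles, double window) {
    assert(cycles >= 0.0);
    for (std::size_t idx = 0; idx < kSettings.size(); ++idx) {
        if (execTime(cycles, kSettings[idx]) <= window) return idx;
    }
    return std::nullopt;
}
```

Energy per cycle grows with clock speed in these four settings. As a result, the slowest setting that meets a task's time window also minimizes that task's dynamic energy when the task is considered in isolation. For example, a 100-cycle task with a window of 160 fastest-clock cycles gets the 240 MHz setting: 150 cycles and 16,515 pJ.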

6.2. Parameter sensitivity and selection

To predetermine parameter values for the proposed algorithm, experiments were performed while varying several parameters. The fpppp CTG, which has the largest number of tasks and edges of the three applications, was used to determine parameter sensitivity, with six PEs and the normal deadline. To examine the effect of the weights for the pheromone trail α and the local heuristic information β, evaluations were performed with combinations of the values 1, 2, and 5. The proposed algorithm outperformed the other approaches for every combination, and the setting α = β = 2 was found to be the most efficient in our implementation. The evaporation ratio ρ in (12) was varied over three values: 0.01, 0.05, and 0.1. Changing the evaporation ratio produced no significant difference (less than 1%), so ρ was set to 0.05. In addition, the effect of the weight for the global-best solution, ω, in (12) was evaluated for four values: 1, 5, 10, and 20. The setting ω = 5 was about 3% more efficient than the other three settings. The effect of the number of feasible solutions was evaluated by changing the number of ants and the number of iterations. With the number of feasible solutions held constant, three (number of ants, number of iterations) pairs were evaluated: (5, 80), (10, 40), and (20, 20). The pair (10, 40) outperformed the other two pairs by 2% to 3%, so the number of ants and the number of iterations were set to ten and forty, respectively. To examine the effect of a large number of ants, the pair (100, 40) was also evaluated.
During task scheduling without pheromone reinforcements, the weights for the pheromone trail α and the local heuristic information β in (7) were set to zero. The other parameter values were the same as those of the proposed algorithm.
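The effect of setting α = β = 0 can be illustrated with a short C++ sketch. Equation (7) is not reproduced in this section, so the sketch assumes the standard Ant System form, in which path p is chosen with probability proportional to τ_ip^α · η_ip^β. Under that assumption, α = β = 0 makes every path equally likely, which amounts to purely random mapping. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Probability of each candidate path for one task, ASSUMING rule (7) has the
// standard Ant System form: p ~ tau^alpha * eta^beta.
std::vector<double> selectionProbabilities(const std::vector<double>& tau,
                                           const std::vector<double>& eta,
                                           double alpha, double beta) {
    assert(!tau.empty() && tau.size() == eta.size());
    std::vector<double> prob(tau.size());
    double sum = 0.0;
    for (std::size_t p = 0; p < tau.size(); ++p) {
        // std::pow(x, 0.0) is 1 for every x, so alpha = beta = 0 gives equal weights.
        prob[p] = std::pow(tau[p], alpha) * std::pow(eta[p], beta);
        sum += prob[p];
    }
    assert(sum > 0.0);
    for (double& x : prob) x /= sum;
    return prob;
}

// Roulette-wheel draw of a path index according to `prob`.
std::size_t choosePath(const std::vector<double>& prob, std::mt19937& rng) {
    std::discrete_distribution<int> dist(prob.begin(), prob.end());
    return static_cast<std::size_t>(dist(rng));
}
```

With τ = {3, 1} and η = {1, 4}, α = β = 2 gives probabilities 0.36 and 0.64, while α = β = 0 gives 0.5 and 0.5.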

For communication-aware task scheduling, the communication ignorance parameter was swept over its suitable range, from 0.1 to 10, as in [36]. The number of feasible solutions obtained by this sweep was set to 400, and to 4000 when a large number of feasible solutions was evaluated.

Table 1. Characteristics of CTGs for target applications.

CTG      #tasks   #edges   Deadline (cycles)
                           Tight   Normal   Loose
fpppp    334      1196     1593    2124     2655
robot    88       130      854     1138     1423
sparse   96       128      183     244      305
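The deadline rule behind Table 1 can be checked with a minimal C++ sketch. The CP lengths used in the example (1062, 569, and 122 cycles for fpppp, robot, and sparse) are not reported in the text. They are back-computed as half of each normal deadline. Rounding half away from zero is also an assumption, chosen because it reproduces the robot row.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Tight, normal, and loose deadlines as 1.5x, 2x, and 2.5x the critical-path
// length cp (in fastest-clock cycles), rounded half away from zero.
std::array<long, 3> deadlinesFromCriticalPath(long cp) {
    assert(cp > 0);
    return {std::lround(1.5 * cp), std::lround(2.0 * cp), std::lround(2.5 * cp)};
}
```

For example, deadlinesFromCriticalPath(569) returns {854, 1138, 1423}, which matches the robot row of Table 1.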


For task scheduling using the genetic algorithm, parameter values were selected based on the guidelines in [25]: tournament selection and uniform crossover were adopted, and the probabilities of applying crossover to a chromosome and mutation to a gene were set to 80% and 5%, respectively. The initial population was obtained by mapping tasks to paths randomly. This showed how the initial condition of the genetic algorithm affects the quality of the best solution, compared with the proposed algorithm. For a side-by-side comparison with the other approaches, the population size and the number of iterations of the genetic algorithm were set to ten and forty, respectively. To evaluate a large population, the population size and the number of iterations were also set to one hundred and forty, respectively.
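The GA operators named above can be sketched in C++ for a chromosome that maps each task to a PE. Several choices here are assumptions not stated in the text: the encoding, a tournament size of two, and remapping a mutated gene to a uniformly random PE. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical encoding: chromosome[t] is the PE that task t is mapped to.
using Chromosome = std::vector<int>;

// Binary tournament (size 2 is an assumption): draw two individuals at
// random and keep the one with lower total energy.
std::size_t tournament(const std::vector<double>& energy, std::mt19937& rng) {
    assert(!energy.empty());
    std::uniform_int_distribution<std::size_t> pick(0, energy.size() - 1);
    const std::size_t a = pick(rng);
    const std::size_t b = pick(rng);
    return energy[a] <= energy[b] ? a : b;
}

// Uniform crossover applied with probability pc (80% in the text): each gene
// of the child is copied from either parent with probability 1/2.
Chromosome uniformCrossover(const Chromosome& p1, const Chromosome& p2,
                            double pc, std::mt19937& rng) {
    assert(p1.size() == p2.size());
    std::bernoulli_distribution doCross(pc);
    std::bernoulli_distribution coin(0.5);
    if (!doCross(rng)) return p1;  // no crossover: child is a copy of p1
    Chromosome child(p1.size());
    for (std::size_t g = 0; g < p1.size(); ++g) {
        child[g] = coin(rng) ? p1[g] : p2[g];
    }
    return child;
}

// Per-gene mutation with probability pm (5% in the text): the task is
// remapped to a uniformly random PE (the remapping rule is an assumption).
void mutate(Chromosome& c, int numPEs, double pm, std::mt19937& rng) {
    assert(numPEs > 0);
    std::bernoulli_distribution doMutate(pm);
    std::uniform_int_distribution<int> pe(0, numPEs - 1);
    for (int& gene : c) {
        if (doMutate(rng)) gene = pe(rng);
    }
}
```

With a random initial population, nothing in these operators biases the search toward low-communication mappings. This is consistent with the observation that the GA's results depend strongly on its initial population.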

6.3. Energy evaluation

Based on the predetermined parameters, Table 2 shows the evaluations of total energy consumption for fpppp, robot, and sparse, with 400 feasible solutions and the normal deadline. In Table 2, CA, GA, and No weight denote communication-aware task scheduling, task scheduling using a genetic algorithm, and task scheduling without pheromone reinforcements, respectively. Because task mapping in GA, No weight, and the proposed algorithm is probabilistic, each of these three approaches was evaluated over five runs with the same parameters and resource environments. Proposed and σ denote the average total energy consumption of the proposed algorithm and its standard deviation. D, S, and C denote its dynamic, static, and communication energy consumption. As Table 2 shows, the proposed algorithm outperforms the other approaches. Compared with the communication-aware task scheduling in [36], the average total energy consumption was reduced by 8.34% for fpppp, 4.76% for robot, and 4.50% for sparse. On average, the proposed algorithm improved energy savings by 5.13% over communication-aware task scheduling, 6.69% over task scheduling with a genetic algorithm, and 4.15% over task scheduling without pheromone reinforcements. Task scheduling using a genetic algorithm performed worse than the other algorithms. For fpppp, the ACO approach may find a near-optimal solution for task scheduling and VS more quickly than the genetic algorithm. In addition, the performance of the genetic algorithm can be strongly influenced by its initial population, which may degrade its results. Increasing the number of PEs did not always decrease total energy consumption for any of the algorithms, because static energy consumption and interprocessor communication volume grow with hardware resources, as shown in Table 2. Therefore, a threshold number of PEs that minimizes total energy consumption can be identified.
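The percentage reductions and the threshold number of PEs can be computed as in the following C++ sketch. The function names are hypothetical. The usage example takes the Proposed column of Table 2 for robot.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Percentage reduction of `proposed` energy relative to a `baseline` energy.
double reductionPercent(double baseline, double proposed) {
    assert(baseline > 0.0);
    return 100.0 * (baseline - proposed) / baseline;
}

// Given (number of PEs, total energy) pairs, return the PE count with the
// lowest total energy, i.e., the threshold number of PEs.
int bestPeCount(const std::vector<std::pair<int, double>>& energyByPe) {
    assert(!energyByPe.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < energyByPe.size(); ++i) {
        if (energyByPe[i].second < energyByPe[best].second) best = i;
    }
    return energyByPe[best].first;
}
```

For robot, bestPeCount({{3, 529.0}, {4, 483.0}, {5, 466.0}, {6, 488.0}}) returns 5, since adding a sixth PE raises total energy again. reductionPercent(563.0, 529.0) gives about 6.04% for robot with three PEs, relative to CA.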

Fig. 6 summarizes the averaged energy-saving improvements over the other approaches for fpppp, robot, and sparse. Across the three deadline types and applications, the proposed algorithm reduced average total energy consumption by 5.65%, 7.59%, and 4.65% compared with communication-aware task scheduling, task scheduling using a genetic algorithm, and task scheduling without pheromone reinforcements, respectively. The results in Fig. 6 thus also show that the proposed algorithm outperformed the other approaches. In general, the energy-saving improvements decreased as the number of PEs increased for fpppp and robot. For sparse, however, the energy-saving improvements showed no significant relationship with either the deadline type or the number of PEs. Therefore, the energy-saving improvements over the other approaches depend strongly on the CTG structure and the hardware resources.

To evaluate a large number of feasible solutions, Table 3 shows the energy evaluations with 4000 feasible solutions and the normal deadline. In Table 3, compared with the communication-aware task scheduling in [36], the average total energy consumption was reduced by 8.24% for fpppp, 5.21% for robot, and 7.24% for sparse. On average, the proposed algorithm improved energy savings by 5.82% over communication-aware task scheduling, 6.05% over task scheduling using a genetic algorithm, and 3.48% over task scheduling without pheromone reinforcements. Compared with the results in Table 2, the total energy consumption of the proposed algorithm decreased by 2.07% on average. The standard deviations in Tables 2 and 3 show that increasing the number of feasible solutions did not always guarantee a better solution with lower total energy consumption. However, the averaged results show that the probability of obtaining a better solution increased with the number of feasible solutions.

Table 2. Evaluation results for three CTGs with 400 feasible solutions and normal deadline.

CTG     #PE  CA (nJ)  GA (nJ)  No weight (nJ)  Proposed (nJ) (D, S, C)  σ (nJ)
fpppp   6    2810     2913     2966            2451 (1001, 160, 1290)   29.9
        7    2847     2914     2971            2587 (992, 251, 1344)    17.9
        8    2777     2975     2793            2690 (987, 233, 1470)    29.6
robot   3    563      568      559             529 (401, 49, 79)        4.5
        4    522      517      501             483 (373, 39, 71)        8.1
        5    484      502      495             466 (362, 36, 68)        3.6
        6    497      506      502             488 (361, 41, 86)        5.4
sparse  14   359      371      367             351 (221, 83, 47)        3.1
        15   350      360      356             349 (212, 78, 59)        1.7
        16   347      359      355             342 (204, 80, 58)        1.8
        17   377      365      356             348 (203, 83, 65)        4.1
        18   372      365      353             350 (202, 82, 66)        0.8
        19   374      370      358             356 (200, 89, 67)        1.2

Table 3. Evaluation results for three CTGs with 4000 feasible solutions and normal deadline.

CTG     #PE  CA (nJ)  GA (nJ)  No weight (nJ)  Proposed (nJ)  σ (nJ)
fpppp   6    2801     2870     2777            2424           14.1
        7    2833     2864     2792            2531           15.0
        8    2724     2905     2780            2611           17.7
robot   3    563      559      544             524            3.2
        4    521      503      490             480            3.2
        5    482      485      474             461            1.9
        6    478      493      464             460            0.7
sparse  14   358      367      362             347            3.1
        15   350      360      351             347            1.7
        16   345      352      347             339            1.8
        17   374      351      346             343            4.1
        18   366      351      343             338            0.8
        19   364      352      346             342            1.2

Fig. 6. Summary of energy consumption of the proposed algorithm and the other approaches with 400 feasible solutions: (a) energy consumption in fpppp; (b) energy consumption in robot; (c) energy consumption in sparse.


7. Conclusion

This paper proposes an algorithm for communication-aware task scheduling and VS that uses an extension of AS to minimize the total energy consumption of applications executing on homogeneous multiprocessor systems. The proposed algorithm adopts a new task scheduling approach consisting of priority-based task selection, stochastic task mapping, and task ordering, for which the ACO formulations and the definition of the search space are provided. The energy evaluations show that the proposed static task scheduling and VS using the ACO approach improves energy savings over the other approaches in multiprocessor systems.

References

[1] ARM reference book, web site: <http://infocenter.arm.com>.
[2] M. Bank, U. Honig, W. Schiffmann, An ACO-based approach for scheduling task graphs with communication costs, in: Proc. Int. Conf. Parallel Processing, 2005, pp. 623–629.
[3] L. Benini, A. Bogliolo, G. De Micheli, A survey of design techniques for system-level dynamic power management, IEEE Trans. VLSI Syst. 8 (3) (2000) 299–316.
[4] C. Blum, M. Sampels, An ant colony optimization algorithm for shop scheduling problems, J. Math. Model. Algorithms 3 (3) (2004) 285–308.
[5] P.C. Chang, I. Wu, ETAHM: an energy-aware task allocation algorithm for heterogeneous multiprocessor, in: Proc. Design Automat. Conf., 2008, pp. 776–779.
[6] A. Chandrakasan, R. Brodersen, Low Power Digital CMOS Design, Kluwer Academic Publishers, 1995.
[7] C. Chiang, Y. Lee, C. Lee, T. Chou, Ant colony optimisation for task matching and scheduling, IET Proc. Comput. Digital Tech. 153 (6) (2006) 373–380.
[8] P. Chowdhury, C. Chakrabarti, Static task-scheduling algorithm for battery-powered DVS systems, IEEE Trans. VLSI Syst. 13 (2) (2005) 226–237.
[9] W.J. Dally, B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann Publishers, 2004.
[10] M. Dorigo, T. Stutzle, Ant Colony Optimization, Bradford Book, 2004.
[11] M. Dorigo, T. Stutzle, Ant Colony Optimization Metaheuristic, Handbook of Metaheuristics, Kluwer Academic Publishers, 2003.
[12] S. Gary, Low Power Design Methodologies, Kluwer Academic Publishers, 1996.
[13] Graphviz, web site: <http://www.graphviz.org>.
[14] I. Hong, D. Kirovski, G. Qu, M. Potkonjak, M.B. Srivastava, Power optimization of variable-voltage core-based systems, IEEE Trans. Comput.-Aided Design 18 (12) (1999) 1702–1714.
[15] C.J. Hou, K.G. Shin, Allocation of periodic task modules with precedence and deadline constraints in distributed real-time systems, IEEE Trans. Comput. 46 (12) (1997) 1338–1356.
[16] H. Hernández, C. Blum, G. Francès, Ant Colony Optimization for Energy-Efficient Broadcasting in Ad-Hoc Networks, Ant Colony Optimization and Swarm Intelligence, Springer, 2008.
[17] J. Hu, R. Marculescu, Energy- and performance-aware mapping for regular NoC architectures, IEEE Trans. Comput.-Aided Design 24 (4) (2005) 551–562.
[18] H. Jin, H. Wang, H. Wang, G. Dai, An ACO-based approach for task assignment and scheduling of multiprocessor control systems, Lect. Notes Comput. Sci. 3959 (2006) 138–147.
[19] N.M. Josuttis, C++ Standard Library, A Tutorial and Reference, Addison Wesley, 1999.
[20] M. Kang, D.I. Kang, J. Suh, J. Lee, An energy-efficient real-time scheduling scheme on dual-channel networks, Inform. Sci. 178 (12) (2008) 2553–2563.
[21] M. Keating, D. Flynn, R. Aitken, A. Gibbons, K. Shi, Low Power Methodology Manual: For System-on-Chip Design, Springer, 2007.
[22] H. Kooti, E. Bozorgzadeh, S. Liao, L. Bao, Transition-aware real-time task scheduling for reconfigurable embedded systems, in: Proc. Design Automat. Test Europe Conf., 2010, pp. 232–237.
[23] N.M. Josuttis, C++ Standard Library, A Tutorial and Reference, Addison Wesley, 1999.
[24] Y. Kwok, I. Ahmad, Static scheduling algorithms for allocating directed task graphs to multiprocessors, ACM Comput. Survey 31 (4) (1999) 406–471.
[25] D. Levine, Genetic algorithms: a practitioner's view, INFORMS J. Comput. 9 (1997) 256–259.
[26] S. Lo, R. Chen, Y. Huang, C. Wu, Multiprocessor system scheduling with precedence and resource constraints using an enhanced ant colony system, Expert Systems with Applications 34 (3) (2008) 2071–2081.
[27] Lpsolve, web site: <http://sourceforge.net/project/lpsolve>.
[28] D. Merkle, M. Middendorf, H. Schmeck, Ant colony optimization for resource-constrained project scheduling, IEEE Trans. Evol. Comput. 6 (4) (2002) 333–346.
[29] K. Olukotun, L. Hammond, The future of microprocessors, ACM Queue 3 (7) (2005) 26–29.
[30] F.A. Omara, M.M. Arafa, Genetic algorithms for task scheduling problem, J. Parallel Distributed Comput. 70 (1) (2010) 13–22.
[31] S.J. Park, K.H. Cho, Real-Time Preemptive Scheduling of Sporadic Tasks based on Supervisory Control of Discrete Event Systems 178 (17) (2008) 3393–3401.
[32] C. Reeves, Genetic Algorithms, Handbook of Metaheuristics, Kluwer Academic Publishers, 2003.
[33] A. Shah, K. Kotecha, D. Shah, Dynamic scheduling for real-time distributed systems using ant colony optimization, Int. J. Intel. Comput. Cybernet. 3 (2) (2010) 279–292.
[34] G.J. Siek, L.Q. Lee, A. Lumsdaine, The boost graph library user guide and reference manual, Addison-Wesley Professional, 2001.
[35] Standard Task Graph, web site: <http://www.kasahara.elec.waseda.ac.jp>.
[36] G. Varatkar, R. Marculescu, Communication-aware task scheduling and voltage selection for total systems energy minimization, in: Proc. Int. Conf. Comput.-Aided Design, 2003, pp. 510–517.
[37] G. Wang, W. Gong, R. Kastner, A new approach for task level computational resource bi-partitioning, in: Proc. Int. Conf. Parallel Distributed Comput. Syst., 2003.
[38] G. Wang, B. DeRenzi, R. Kastner, Ant colony optimizations for resource- and timing-constrained operation scheduling, IEEE Trans. Comput.-Aided Design 26 (6) (2007) 1010–1029.
[39] R. Watanabe, M. Kondo, M. Imai, H. Nakamura, T. Nanya, Task scheduling under performance constraints for reducing the energy consumption of the GALS multi-processor SoC, in: Proc. Design Automat. Test Europe Conf., 2007, pp. 1–6.
[40] Y. Wen, H. Xu, J. Yang, A heuristic-based hybrid genetic-variable neighborhood search algorithm for task scheduling in heterogeneous multiprocessor system, Inform. Sci. (2010).
[41] Y. Yu, V.K. Prasanna, Power-aware resource allocation for independent tasks in heterogeneous real-time systems, in: Proc. Int. Conf. Parallel Distributed Comput. Syst., 2002, pp. 341–348.
[42] Y. Zhang, X.S. Hu, D.Z. Chen, Task scheduling and voltage selection for energy minimization, in: Proc. Design Automat. Conf., 2002, pp. 183–188.



HyunJin Kim received B.S., M.S., and Ph.D. degrees in Electrical Engineering from Yonsei University, Seoul, Korea, in 1997, 1999, and 2010, respectively. From 2002 to 2004, he worked as a research engineer in the R&D center of Samsung Electro-Mechanics. He is currently a senior engineer at Samsung Electronics. His interests include mixed-signal hardware integration, parallel string matching, reconfigurable computing, interconnection networks, microarchitecture, and compilers.

Sungho Kang received a B.S. degree from Seoul National University, Seoul, Korea, and M.S. and Ph.D. degrees in electrical and computer engineering from the University of Texas at Austin. He was a postdoctoral fellow at UT Austin, a research scientist at the Schlumberger Laboratory for Computer Science, Schlumberger Inc., and a senior staff engineer at Semiconductor Systems Design Technology, Motorola Inc. Since 1994, he has been a professor in the Department of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea. His current research interests include VLSI design, VLSI CAD, and VLSI testing and design for testability.
