04.03 imprecise task schedule optimization [i].pdf




1 Introduction

One of the most important issues in high-level synthesis is obtaining a good schedule so as to reduce the total computation time of an application when the system is implemented. To produce such a schedule, exact knowledge of the execution time of each task is normally provided; however, from time to time, these values are uncertain. Most previous work assumed that each task is associated with a fixed computation time, e.g., either the worst or the average case is considered. Since the knowledge about the computation time of a task could be uncertain, these assumptions may mislead and result in a schedule that gives a long execution time after the actual system is built [11]. Assuming that the execution time of a task is a random variable with a probability distribution seems to be an interesting approach. Existing scheduling algorithms could, therefore, be extended to compute the resulting schedule with probabilistic information. Nevertheless, collecting the probability data is costly and time consuming. Furthermore, an algorithm that handles probability values is more complex than one that manipulates imprecise data in fuzzy arithmetic [7]. In other words, using fuzzy numbers and fuzzy arithmetic, which includes max-min operations, would be a good choice to avoid such complications. In this paper, a task is associated with a computation time represented by a triangular fuzzy number. Fuzzy arithmetic is applied to compute a possible schedule length so that the best schedule can be decided. This notion is then incorporated in an efficient scheduling optimization algorithm called rotation scheduling, in which fuzzy arithmetic helps select a schedule position for each task.

Many researchers have applied the fuzzy logic approach to various kinds of scheduling problems. In compiler optimization, fuzzy set theory has been used to represent unpredictable real-time events and imprecise knowledge about variables [8]. Lee et al. applied the fuzzy inference technique to find a feasible real-time schedule where each task satisfies its deadline under resource constraints [15]. In the production management area, fuzzy rules were applied to job shop and shop floor scheduling [17,21]. Kaviani and Vranesic used fuzzy rules to determine the appropriate number of processors for a given set of tasks and deadlines for real-time systems [13]. Likewise, fuzzy calculus was applied to real-time scheduling in [19,20]. Soma et al. considered schedule optimization based on a fuzzy inference engine [18]. Their approach, however, does not take into account the fact that the execution time of each job can be imprecise.

In data-flow analysis, an iterative code segment can be modeled as a cyclic data-flow graph (DFG)

or task graph where a set of nodes represents computation tasks and edges represent dependencies

(precedence relations) between such tasks. Each node is weighted according to its computation time,

i.e., the total time required to execute a task. The dependency distances or delays, between tasks in

different iterations, can be represented by bar lines on the edges. Considerable research has been

conducted in the area of scheduling nodes from directed-acyclic graphs (DAGs), a simpler version

of the cyclic data flow graph. Many heuristics have been proposed, e.g., list scheduling, and graph




or equal to zero.

Figure 1(a) illustrates the task graph , representing a set of tasks represented by vertices

and . The set of edges contains , , , , , and

. The computation time of each node is assumed to be crisp-valued. Each node in this graph takes 1

time unit except node B, which takes two time units to execute. The delays of the edges are ,

and , for . The direct edge from to , conveys that must be executed before

since may need data produced by at the current iteration and so on. An edge from to with

two bar lines (delays) conveys that requires data produced by in the two previous iterations (see

Figure 1(b)). This graph represents only one iteration of the execution pattern. This pattern is repeated for n iterations as shown in Figure 1(b). Figure 1(c) shows a possible legal static schedule for this graph where the target system is assumed to consist of two homogeneous processing elements. In this schedule, the total execution time of this graph is 4 time units. Since B takes 2 time units to finish, D cannot start the execution until time step 4.

[Figure 1 panels: (a) Task graph with nodes A, B, C, and D; (b) Repeated execution over n iterations; (c) Static schedule, listed below]

Time PE1 PE2
1    A
2    B   C
3    B
4    D

Figure 1: Task graph, repeated execution pattern, and its schedule

The computation time for each node in the graph is modeled by a fuzzy number. We assume that a fuzzy number is normal and triangular-shaped, as presented in Figure 2(a). Suppose the computation time of node A is assigned as shown in this figure with the confidence interval [2, 6]. The most possible computation time of node A is 4 since its confidence level or presumption level is 1. Similarly, Figure 2(b) shows the fuzzy number with the confidence interval [3, 7] representing the execution time of B.

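[Figure 2 panels: (a) membership function µ(x) of A, a triangle over [2, 6] peaking at 4; (b) membership function µ(y) of B, a triangle over [3, 7] peaking at 5; (c) membership function µ(z) of A+B]

Figure 2: Two fuzzy numbers representing computation time of nodes A and B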




In order to calculate the addition of two fuzzy numbers, the following equation is used [7]:

$$\mu_{A(+)B}(z) = \bigvee_{z = x + y} \bigl( \mu_A(x) \wedge \mu_B(y) \bigr) \qquad (1)$$

where $\vee$ and $\wedge$ denote the max and min operations, respectively. A similar idea can also be applied to other operations such as subtraction, multiplication, maximum, and minimum. Figure 2(c) demonstrates a graphical result of adding the two fuzzy numbers from Figures 2(a)–2(b).
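The addition of Figure 2 can be sketched in a few lines of Python. This is only an illustration, not the paper's implementation: the class name TriFuzzy and the helper sup_min_sum are hypothetical, and the brute-force grid search is an assumed way to check the max-min definition against the component-wise shortcut that holds for triangular numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriFuzzy:
    """Triangular fuzzy number with support [lo, hi] and membership 1 at peak."""
    lo: float
    peak: float
    hi: float

    def membership(self, x: float) -> float:
        if x == self.peak:
            return 1.0
        if x <= self.lo or x >= self.hi:
            return 0.0
        if x < self.peak:
            return (x - self.lo) / (self.peak - self.lo)
        return (self.hi - x) / (self.hi - self.peak)

    def __add__(self, other: "TriFuzzy") -> "TriFuzzy":
        # For triangular numbers the max-min sum reduces to adding
        # the three defining points component-wise.
        return TriFuzzy(self.lo + other.lo,
                        self.peak + other.peak,
                        self.hi + other.hi)


def sup_min_sum(a: TriFuzzy, b: TriFuzzy, z: float, step: float = 0.25) -> float:
    """Evaluate the max-min sum at z by brute force over a grid of x values."""
    best = 0.0
    n = int(round((a.hi - a.lo) / step))
    for i in range(n + 1):
        x = a.lo + i * step
        best = max(best, min(a.membership(x), b.membership(z - x)))
    return best


A = TriFuzzy(2, 4, 6)   # computation time of node A (Figure 2(a))
B = TriFuzzy(3, 5, 7)   # computation time of node B (Figure 2(b))
C = A + B               # fuzzy sum, a triangle over [5, 13] peaking at 9
print(C, sup_min_sum(A, B, 7), C.membership(7))
```

At z = 7 both the grid search and the closed-form triangle give a membership of 0.5, which is the agreement the shortcut relies on.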

Rotation scheduling

Since rotation scheduling implicitly retimes a node in a task graph, we first briefly review the retiming concepts. Intuitively, retiming rearranges the placement of the delays in the graph (the dependency distances between iterations) in such a way that the graph yields a shorter total execution time per iteration while its behavior is preserved. That is, a shorter static schedule can be obtained by retiming "some" nodes. Retiming a node moves one delay from each of its incoming edges to each of its outgoing edges. In other words, the number of delays on every incoming edge is decremented by one and one delay is added to every outgoing edge. A node can legally be retimed (without changing the graph behavior) if all its incoming edges contain at least one delay (see footnote 1).
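The retiming rule can be illustrated with a short Python sketch. The edge set below is an assumption made for the example (the extracted text does not preserve the edge list of Figure 1(a)); only the two-delay edge into A follows the description in the text. The function names are hypothetical.

```python
def can_retime(delays, node):
    """A node may be retimed if every incoming edge carries at least one delay."""
    incoming = [d for (u, v), d in delays.items() if v == node]
    return all(d >= 1 for d in incoming)


def retime(delays, node):
    """Return a new delay map after moving one delay across `node`."""
    if not can_retime(delays, node):
        raise ValueError(f"node {node} cannot be legally retimed")
    new = dict(delays)
    for (u, v) in delays:
        if v == node:
            new[(u, v)] -= 1   # one delay leaves every incoming edge
        if u == node:
            new[(u, v)] += 1   # one delay enters every outgoing edge
    return new


def cycle_delays(delays, cycle=(("A", "B"), ("B", "D"), ("D", "A"))):
    """Total number of delays along one cycle of the graph."""
    return sum(delays[e] for e in cycle)


# Hypothetical edge set: (source, target) -> number of delays.
delays = {("A", "B"): 0, ("A", "C"): 0, ("B", "D"): 0,
          ("C", "D"): 0, ("D", "A"): 2}

retimed = retime(delays, "A")
print(retimed)
print(cycle_delays(delays), cycle_delays(retimed))
```

The last line prints the same total twice, reflecting the property that retiming leaves the number of delays in any cycle unchanged.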

[Figure 3 panels: (a) Retime, the retimed task graph; (b) Original, the schedule of Figure 1(c) repeated for successive iterations; (c) After shifted, the same pattern with the iteration boundary moved down by one step; (d) Reposition, the resulting schedule listed below]

Time PE1 PE2
-    A
1    B   C
2    B   A
3    D

Figure 3: Retiming and scheduling

In Figure 1(a), node can be retimed. If is retimed once, . Figure 3(a) shows the

retimed graph after has been retimed. A delay is moved from edge to edges ,

and . Nevertheless, applying retiming to nodes in the graph and producing a static

1 One of the properties that makes the retiming approach preserve the graph behavior is that the number of delays in any cycle in the graph is always constant. Details of retiming and its properties can be found in [16].




schedule from the retimed graph sometimes does not yield a good static schedule under resource con-

straints. Rotation scheduling systematically includes the notion of retiming while considering resource

constraints [1]. We adopt the algorithm and extend its principles in order to schedule and optimize an

imprecise task graph.

Consider the example in Figure 1(a). Assume that the target system consists of 2 processors. Figure 3(b) shows an initial repeated execution pattern obtained from Figure 1(c). The rotation scheduling algorithm first selects nodes that can legally be retimed; in this case, node A. Retiming such a node is equivalent to shifting the iteration boundary down, as shown in Figure 3(c). Node A in the first iteration becomes a prologue instruction while task A in the next iteration is moved to the current iteration. A prologue is the set of instructions that must be executed to provide the necessary data before entering the iterative process. In the retimed graph of Figure 3(a), node A has no direct dependency on any node in the current iteration since it produces data for B and C in the next iteration. Any available position is legal for placing node A. Figure 3(d) shows the schedule where A is assigned after C. The resulting schedule then has a smaller total execution time than the previous one.
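One down-rotation on a crisp schedule can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of [1]: the edge set is hypothetical (the figure's edge list is not recoverable from the text), node A is assumed to sit on PE1, and the function names down_rotate, earliest_slot, and schedule_length are invented for the example.

```python
def earliest_slot(schedule, times, num_pes, ready, duration):
    """Find the first (pe, step) at or after `ready` with `duration` free steps."""
    busy = {(pe, s + k) for n, (pe, s) in schedule.items()
            for k in range(times[n])}
    step = ready
    while True:
        for pe in range(1, num_pes + 1):
            if all((pe, step + k) not in busy for k in range(duration)):
                return pe, step
        step += 1


def down_rotate(schedule, times, delays, num_pes):
    """Rotate the first row of the schedule down and reschedule its nodes."""
    first = min(start for _, start in schedule.values())
    rotated = [n for n, (_, s) in schedule.items() if s == first]

    # Retime each rotated node: one delay moves from incoming to outgoing edges.
    delays = dict(delays)
    for n in rotated:
        for (u, v) in list(delays):
            if v == n:
                delays[(u, v)] -= 1
            if u == n:
                delays[(u, v)] += 1

    # Shift the iteration boundary: the remaining nodes move up by one step.
    new = {n: (pe, s - 1) for n, (pe, s) in schedule.items() if n not in rotated}

    for n in rotated:
        # Earliest legal step: after every zero-delay predecessor has finished.
        ready = first
        for (u, v), d in delays.items():
            if v == n and d == 0 and u in new:
                ready = max(ready, new[u][1] + times[u])
        new[n] = earliest_slot(new, times, num_pes, ready, times[n])
    return new, delays


def schedule_length(schedule, times):
    return max(start + times[n] - 1 for n, (_, start) in schedule.items())


times = {"A": 1, "B": 2, "C": 1, "D": 1}
schedule = {"A": (1, 1), "B": (1, 2), "C": (2, 2), "D": (1, 4)}  # Figure 1(c)
delays = {("A", "B"): 0, ("A", "C"): 0, ("B", "D"): 0,
          ("C", "D"): 0, ("D", "A"): 2}  # hypothetical edge set

new_schedule, new_delays = down_rotate(schedule, times, delays, num_pes=2)
print(new_schedule, schedule_length(new_schedule, times))
```

With these inputs, A lands on PE2 at step 2 and the schedule length drops from 4 to 3, which matches the table in Figure 3(d).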

3 Algorithms

Once the computation time of a task is a fuzzy number, two issues should be considered: how to

determine the “best” place to schedule a node and how to determine whether a new schedule is better

than the old one. Therefore, the traditional scheduling algorithm has to be extended in order to handle

fuzzy situations. In our implementation, we use the conventional scheduling heuristic which attempts

to place a task at the time slot where a processor is idle, i.e., waiting for data needed to execute the

successive task. In order to do so, a task in a schedule is assigned a time step or control step (cs),

e.g., a legal earliest starting time that the task can begin execution. After that, the available time slot is

calculated by the difference between two successive tasks in the same processor.

As an example, Figure 4(a) diagrams a task graph where the computation time of each node is not

specified. Assume that two processing elements are available. The execution order of this graph can

be presented in Figure 4(b). Since requires data produced by both and , it needs to wait until

finishes in order to start the execution. Hence, PE has an empty time slot. The size of the available

time slot between tasks and needs to be estimated so that the scheduler can decide where to place

a new task that has no data dependency to any other tasks. When there is more than one empty

time slot in a processor, the biggest one will be chosen. Based on the rotation framework mentioned

in the previous section, a task can also be rescheduled when the iteration boundary is shifted. For

simplicity, the computation of each node is defuzzified using the weighted average method. After that,

the defuzzified time step can be assigned to each node in the schedule. Then the size of the free time




slot for each processor is computed as usual. Although such a method does not take advantage of the fuzzy nature, it is easy to implement and yields a comparable approximation. Our future work will include how to compute fuzzy time steps and use the fuzzy nature to decide the most likely position to schedule a task.
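The heuristic above can be sketched as two small helpers. The paper does not spell out its weighted average formula, so the sampled version below is an assumption: each sample point of the support is weighted by its triangular membership. The function names and the example task list are hypothetical.

```python
def defuzzify(lo, peak, hi, samples=1001):
    """Weighted average of sample points, weighted by triangular membership."""
    num = den = 0.0
    for i in range(samples):
        x = lo + (hi - lo) * i / (samples - 1)
        if x <= peak:
            mu = (x - lo) / (peak - lo) if peak > lo else 1.0
        else:
            mu = (hi - x) / (hi - peak)
        num += mu * x
        den += mu
    return num / den


def largest_idle_slot(tasks):
    """tasks: (start, duration) pairs on one processor.

    Returns (slot_start, slot_size) of the biggest gap between successive tasks.
    """
    tasks = sorted(tasks)
    best = (None, 0.0)
    for (s0, d0), (s1, _) in zip(tasks, tasks[1:]):
        gap = s1 - (s0 + d0)
        if gap > best[1]:
            best = (s0 + d0, gap)
    return best


crisp = defuzzify(2, 4, 6)            # a symmetric triangle defuzzifies to its peak
pe_tasks = [(0.0, crisp), (9.0, 3.0), (13.0, 1.0)]
print(crisp, largest_idle_slot(pe_tasks))
```

For a symmetric triangle the result coincides with the most possible value; for a skewed one it shifts toward the longer tail, which is why the choice of defuzzification method affects where a rotated task is placed.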

[Figure 4 panels: (a) Task graph with nodes A, B, C, and D; (b) Schedule on PE1 and PE2; (c) Max. operation: node v with parents u_1, u_2, ..., u_i and the value max_i(u_i.t)]

Figure 4: Another task graph, its execution order, and maximum operation

Secondly, we use fuzzy arithmetic in order to compute the size of a schedule, i.e., the total execution time of the graph. If node v depends on more than one parent u_i, e.g., u_1 and u_2, the fuzzy maximum of the computation times of its parents is computed (max_i(u_i.t)) (see Figure 4(c)). Then, the result is added to the computation time of the node v, i.e., v.t + max_i(u_i.t), via the fuzzy addition operation. After the overall computation time of the graph is computed, the fuzzy schedule length is used to determine whether such processor assignments are good. In order to compare the effectiveness, the defuzzified crisp schedule length is recorded and the best schedule, which results in the shortest execution time, is returned. The following presents the overall scheduling optimization algorithm. Procedure

initiates the first processor assignment which is a tuple . is a

function mapping from a node to its processor and is a function mapping from a node to its suc-

cessor in processor . The optimized schedule can be viewed by combining with the resulting task

graph . This procedure is based on the list-scheduling approach [5]. Lines 2–6 begin the schedule

optimization phase, rotation iterations. It returns a new legal task graph with the same behavior and

the processor assignments. Function selects a legal task to retime and updates the task

graph. Function finds a new legal place for a node according to the task graph and the

Algorithm 3.1 (Optimization)

Input: A fuzzy task graph , # processors,

heuristic to reschedule a node

Output: A processor assignment ,initially, NULL and

1 // create initial schedule assignment

2 for to do

3

4 // down-rotate a legal node from

5

6 if then end do

7 return




given scheduling heuristic . Note that during the optimization phase, the algorithm only needs to

find a place for a rescheduled node . After is retimed, may only require data produced by some

nodes but does not give data to any other node. Hence, only can be rescheduled while other nodes

remain at the same position in the schedule. Recall that the scheduling problem is NP-hard [6]. A lot of

computation would be required if all nodes were rescheduled. At each iteration of rotation, Procedure Better compares the new schedule to the reference one, and the better schedule is saved for future reference.

Algorithm 3.2 (Init Schedule)

Input: , # PEs,cur. schedule

Output: A schedule

1 where edges with delays

2 vertices in with no incoming edges // roots

3 while empty do

4 mark scheduled

5 if NULL

6 then assign at PE

7 else NULL

8 foreach PE PE to PE do

9

10 assign after the previous node at PE

11 if then end do

12

13 foreach do

14

15 if then end do end do

16 return

Now, let us consider the initial scheduling process and schedule comparison in more detail. Since

the non-zero delay edges come from previous iteration(s), Algorithm 3.2 first ignores the edges with

delays and constructs a DAG out of the input graph. Then, nodes without incoming edges (or root-

nodes) are scheduled one by one (Line 2). In Lines 8 –12, the algorithm assigns a node, with respect

to its ancestor(s), to every processor and checks which assignment gives the smallest schedule size.

Again, in our algorithm procedure is used to compute and the size of the old and the new

schedule are compared. Function counts the number of parents of . A node can be

scheduled next if all parents of a node are scheduled or . After all nodes are scheduled,

the set of processor assignments is returned.

Algorithm 3.3 considers a sub-graph (DAG) consisting of the scheduled nodes. It updates

by adding a sink node and its zero-delay edges connecting to all nodes with no outgoing edges (or

leaf-nodes). It also adds extra edges corresponding to the successor values in Line 3. Node

is used as a reference to compute the total execution time of the graph. A temporary variable

is initialized to zero. For each set of assignments (Line 4), the total schedule length is calculated by

traversing the graph in topological order [4]. The fuzzy computation time of each node is added to its

temporary variable in Line 9. This variable, actually, keeps the current fuzzy maximum computation

time from roots to all of its parents (Line 11). A node is enqueued if all of its parents have been




Algorithm 3.3 (Better)

Input: , schedules

Output: 1 if is better than , 0 otherwise.

1 where unscheduled nodes

2 // joint node

3 PE

4 foreach assignment to do

5 // init. sum. var.

6 vertices in with no incoming edges // roots

7 while empty do

8

9 // fuzzy add. to comp. time

10 foreach do

11

12

13 if then end do end do

14 length end do

15 return length length

visited (Lines 12 –13). After the last node ( ) is considered, contains the total length of such

an assignment. Both s obtained from both and are compared and the boolean value is

returned in Line 15.
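The traversal described for Algorithm 3.3 can be sketched as below. Several choices are assumptions rather than the paper's method: triangular numbers are stored as (lo, peak, hi) tuples, the fuzzy maximum is approximated component-wise (the exact maximum of two triangular numbers need not be triangular), a running maximum replaces the explicit sink node, and the mean of the three points stands in for the defuzzification step. All function names are hypothetical.

```python
from collections import deque


def f_add(a, b):
    """Fuzzy addition of triangular numbers (component-wise)."""
    return tuple(x + y for x, y in zip(a, b))


def f_max(a, b):
    """Component-wise approximation of the fuzzy maximum."""
    return tuple(max(x, y) for x, y in zip(a, b))


def fuzzy_length(times, edges):
    """times: node -> (lo, peak, hi); edges: zero-delay (u, v) pairs,
    including the extra edges that order nodes on the same processor."""
    parents_left = {n: 0 for n in times}
    children = {n: [] for n in times}
    for u, v in edges:
        children[u].append(v)
        parents_left[v] += 1

    zero = (0.0, 0.0, 0.0)
    start = {n: zero for n in times}      # fuzzy max over finished parents
    queue = deque(n for n in times if parents_left[n] == 0)   # roots
    length = zero
    while queue:
        u = queue.popleft()
        finish = f_add(start[u], times[u])        # fuzzy add of comp. time
        length = f_max(length, finish)
        for v in children[u]:
            start[v] = f_max(start[v], finish)
            parents_left[v] -= 1
            if parents_left[v] == 0:              # all parents visited
                queue.append(v)
    return length


def better(len_a, len_b):
    """True if schedule length a defuzzifies to a smaller value than b."""
    return sum(len_a) / 3 < sum(len_b) / 3


times = {"u1": (2, 4, 6), "u2": (3, 5, 7), "v": (1, 2, 3)}
edges = [("u1", "v"), ("u2", "v")]
length = fuzzy_length(times, edges)
print(length, better(length, (5, 9, 13)))
```

Here v starts after the fuzzy maximum of its two parents, so the schedule length is the triangle (4, 7, 10).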

4 Results

We have tested the algorithms on different benchmarks from the digital signal processing area. The target systems consist of 3 functional units (2 adders, 1 multiplier) and 4 functional units (2 adders, 2 multipliers). In this experiment, the computation times of adders and multipliers were obtained from [9].

The confidence interval of the execution time of the adder is ns. with the typical value ns.

while the confidence interval of the execution time of the multiplier is ns. with the typical

value ns. The following table presents some numerical results. Columns start. and rot. show the

initial defuzzified schedule length and the schedule length after the optimization is applied respectively.

Column red. shows the percentage reduction after the optimization algorithm is applied, which indicates the effectiveness of the proposed method.

Benchmarks                   # tasks   2 adders, 1 mult.     2 adders, 2 mults.
                                       start.  rot.  red.    start.  rot.  red.
Diff. Equation               11        158     121   23%     111     78    29%
3-stage direct IIR Filter    12        174     141   19%     111     78    30%
All-pole Lattice Filter      15        220     142   36%     220     142   36%
Volterra Filter              27        519     352   32%     349     225   36%
Elliptic Filter              34        310     281   10%     278     266   4%
All-pole Lattice (uf=2) [1]  45        488     410   16%     478     405   15%

Table 1: Experimental results on 3 functional units and 4 functional units




5 Conclusion

Frequently, uncertainty occurs when estimating the computation time of a task, which makes it difficult to derive a good static task schedule for high-level synthesis. In this paper, a triangular fuzzy number is used to express these uncertainties. This representation is simple and easy to approximate. Fuzzy arithmetic can be applied to compute the overall schedule length and determine the best place to schedule a task. Such a calculation is effective and saves time when compared with probability computation. The rotation scheduling algorithm is applied to the initial processor assignment in order to reduce the possible schedule size while considering resource constraints. The experiments show that the algorithm can efficiently optimize a schedule.

As an initial step of this research, in the scheduling heuristic, fuzzy values are defuzzified into crisp values, which, however, does not exploit the fuzzy nature. We are currently generalizing the use of fuzzy arithmetic to the scheduling heuristic. Formal definitions of fuzzy time step and schedule length are necessary. Methods to decide where to place a task have to be refined. Further, we are also experimenting with other approaches to compare two fuzzy schedule lengths, which incorporate fuzzy distance and confidence intervals. The comparison to the use of probability theory is also under investigation.

References

[1] L. Chao. Scheduling and Behavioral Transformations for Parallel Systems. PhD thesis, Princeton University, October 1993.

[2] L. Chao, A. LaPaugh, and E. Sha. Rotation scheduling: A loop pipelining algorithm. In Proceedings of the 30th Design Automation Conference, pages 566–572, Dallas, TX, June 1993.

[3] L. Chao and E. Sha. Static scheduling for synthesis of DSP algorithms on various models. Journal of VLSI Signal Processing, pages 207–223, October 1995.

[4] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. The MIT Electrical Engineering and Computer Science Series. McGraw-Hill Book Company, New York, 1990.

[5] El-Rewini, Lewis, and Ali. Task Scheduling in Parallel and Distributed Systems. Prentice-Hall, 1994.

[6] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, New York, 1979.

[7] K. Gupta. Introduction to Fuzzy Arithmetic. 1 edition, 1984.

[8] O. Hammami. Fuzzy scheduling in compiler optimizations. In Proceedings of the ISUMA-NAFIPS, 1995.

[9] Texas Instruments. The TTL Data Book, volume 2. Texas Instruments Incorporated, 1985.

[10] R. A. Kamin, G. B. Adams, and P. K. Dubey. Dynamic list-scheduling with finite resources. In Proceedings of the 1994 International Conference on Computer Design, pages 140–144, Cambridge, MA, October 1994.

[11] I. Karkowski. Architectural synthesis with possibilistic programming. In HICSS-28, January 1995.

[12] I. Karkowski and R. H. J. M. Otten. Retiming synchronous circuitry with imprecise delays. In Proceedings of the 32nd Design Automation Conference, pages 322–326, San Francisco, CA, 1995.

[13] A. S. Kaviani and Z. G. Vranesic. On scheduling in multiprocess systems using fuzzy logic. In Proceedings of the International Symposium on Multiple-valued Logic, pages 141–147, 1994.

[14] A. A. Khan, C. L. McCreary, and M. S. Jones. A comparison of multiprocessor scheduling heuristics. In Proceedings of the 1994 International Conference on Parallel Processing, volume II, pages 243–250, 1994.

[15] J. Lee, A. Tiao, and J. Yen. A fuzzy rule-based approach to real-time scheduling. In Proceedings of the International Conference on Fuzzy Systems, volume 2, 1994.

[16] C. E. Leiserson and J. B. Saxe. Retiming synchronous circuitry. Algorithmica, 6:5–35, 1991.

[17] K. Mertins et al. Set-up scheduling by fuzzy logic. In Proceedings of the International Conference on Computer Integrated Manufacturing and Automation Technology, pages 345–350, 1994.

[18] H. Soma, M. Hori, and T. Sogou. Schedule optimization using fuzzy inference. In Proceedings of the International Conference on Fuzzy Systems, pages 1171–1176, 1995.

[19] F. Terrier and Z. Chen. Fuzzy calculus applied to real-time scheduling. In Proceedings of the International Conference on Fuzzy Systems, pages 1905–1910, 1994.

[20] F. Terrier, L. Rioux, and Z. Chen. Real time scheduling under uncertainty. In Proceedings of the International Conference on Fuzzy Systems, pages 1177–1184, 1995.

[21] I. B. Turksen et al. Fuzzy expert system shell for scheduling. SPIE, pages 308–319, 1993.

[22] L. A. Zadeh. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1:3–28, 1978.
