
Vol. 11 No. 3 J. of Comput. Sci. & Technol. May 1996

A Direct Approach for Finding Loop Transformation Matrices

LIN Hua and LU Mi*

Electrical Engineering Department Texas A&M University, College Station, TX 77843, U.S.A.

email: {lin1246,mlu}@ee.tamu.edu

Jesse Z. FANG

Hewlett-Packard Lab., P.O. Box 10490, Palo Alto, CA 94303, U.S.A. email: j [email protected]

Received July, 1995.

Abstract

Loop transformations, such as loop interchange, reversal and skewing, have been unified under linear matrix transformations. A legal transformation matrix is usually generated based upon distance vectors or direction vectors. Unfortunately, for some nested loops, distance vectors may not be computable and direction vectors, on the other hand, may not contain useful information. We propose the use of linear equations or inequalities of distance vectors to approximate data dependence. This approach is advantageous since (1) many loops having no constant distance vectors have very simple equations of distance vectors; (2) these equations contain more information than direction vectors do, and thus the chance of exploiting potential parallelism is improved.

In general, the equations or inequalities that approximate the data dependence of a given nested loop are not unique; hence a classification is discussed for the purpose of loop transformation. Efficient algorithms are developed to generate all kinds of linear equations of distance vectors for a given nested loop. The issue of how to obtain a desired transformation matrix from those equations is also addressed.

1 Introduction

Loop transformations, such as loop interchange, reversal and skewing, have been proposed for automatically or manually parallelizing sequential nested loops as well as for enhancing data locality in memory hierarchies [1-3]. The success of conducting these transformations relies on the collection of information on data dependence in nested loops. Data dependence is usually represented by either distance vectors or direction vectors. Distance vectors, which are defined by the difference of loop indices of dependent iterations, contain sufficient information for optimizing loop transformations; however, for many nested loops computing distance vectors is expensive, if not impossible, especially when the data dependence cannot be represented by a finite number of distance vectors. On the other hand, as a compromise, direction vectors, which are defined by the direction of distance vectors, are usually used to approximate distance vectors; however, for many nested loops direction vectors may not contain the useful information needed for loop transformation. We use the nested loop in Example 1.1 to illustrate this.

* This research was supported by the Texas Advanced Technology Program under Grant No.999903-165.


Example 1.1:

do I1 = 0, n
  do I2 = 0, n
    do I3 = 0, n
      S[I1 + I2 + I3 + 1, I1 + 2I2 + 3I3 + 1] = ...
      ... = h(S[I1 + I2 + I3, I1 + 2I2 + 3I3])
    enddo
  enddo
enddo

The size of the iteration space of this triple loop is controlled by the variable n. We notice that loop-carried data dependence exists when n > 1. Suppose n = 4, for instance. We can find that iteration (0, 2, 0) is flow dependent on iteration (0, 0, 1), since the data written into S[2, 4] in iteration (0, 0, 1) is to be read in iteration (0, 2, 0). By subtracting (0, 0, 1) from (0, 2, 0), we get the corresponding distance vector (0, 2, -1). There is another data dependence between iteration (0, 4, 0) and iteration (1, 0, 2). This is an anti-dependence because S[4, 8], read in iteration (0, 4, 0), is to be updated in iteration (1, 0, 2). The distance vector is (1, -4, 2). Other data dependences can be found without any difficulty. It is further noticed that the number of distinct distance vectors increases with n; in fact, it is proportional to the size of the iteration space. Clearly, distance vectors are not capable of describing the dependence information in this situation. Now let us look at the direction vectors. Direction vectors, which use the symbols "<", "=", ">", or their combinations, represent the direction of each component of the distance vectors. For the above triple loop, (=, <, >) and (<, >, <) are two direction vectors among others: the former is due to distance (0, 2, -1) and the latter is due to distance (1, -4, 2). Obviously, loop I2 cannot be parallelized because of the existence of direction vector (=, <, >), which shows the data dependence carried by loop I2. We would like to know whether transformations like loop skewing and loop interchange can be applied to this triple loop so that the middle loop of the transformed code can be parallelized. Direction vectors in this case provide a negative answer because of (=, <, >) and (<, >, <). Nevertheless, our desired transformation does exist.

A further study of this nested loop can show that for any distance vector d = (d1, d2, d3), the slope d2/d1 (where d1 ≠ 0) is greater than or equal to -4 regardless of the value of n. The theory in [2] tells us that the nested loop can be parallelized through the transformation
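The dependence pattern just described can be checked by brute force. The sketch below (an illustration written for this discussion, not from the paper) enumerates all dependent iteration pairs of Example 1.1 for n = 4 and confirms that every distance vector satisfies 2d1 + d2 = 2 (flow) or 2d1 + d2 = -2 (anti), together with d2 + 2d3 = 0:

```python
from itertools import product

def lex_positive(v):
    # first nonzero component must be positive
    for x in v:
        if x != 0:
            return x > 0
    return False

n = 4
write = lambda i: (i[0] + i[1] + i[2] + 1, i[0] + 2*i[1] + 3*i[2] + 1)
read  = lambda i: (i[0] + i[1] + i[2],     i[0] + 2*i[1] + 3*i[2])

flow, anti = set(), set()
for i in product(range(n + 1), repeat=3):      # writing iteration
    for j in product(range(n + 1), repeat=3):  # reading iteration
        if write(i) == read(j):                # same array element touched
            d = tuple(a - b for a, b in zip(j, i))
            if lex_positive(d):
                flow.add(d)                    # write executes first: flow
            else:
                anti.add(tuple(-x for x in d)) # read executes first: anti

assert (0, 2, -1) in flow and (1, -4, 2) in anti
# the number of distinct vectors grows with n, but all obey two equations:
for d1, d2, d3 in flow:
    assert 2*d1 + d2 == 2 and d2 + 2*d3 == 0
for d1, d2, d3 in anti:
    assert 2*d1 + d2 == -2 and d2 + 2*d3 == 0
```

Enlarging n adds new distance vectors but never violates the two equations, which is exactly why the equational approximation proposed below stays exact here.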

( I1' )   ( 5 1 0 ) ( I1 )
( I2' ) = ( 1 0 0 ) ( I2 )
( I3' )   ( 0 0 1 ) ( I3 )

The parallelized nested loop is as follows:

do I1' = 0, 6n
  doall I2' = ..., ...
    do I3' = ..., ...
      S[I1' - 4I2' + I3' + 1, 2I1' - 9I2' + 3I3' + 1] = ...
      ... = h(S[I1' - 4I2' + I3', 2I1' - 9I2' + 3I3'])
    enddo
  enddoall
enddo


Even coarser parallelism can be achieved through such transformations. We will show, later in this paper, that the following transformation

( I1' )   ( 0 1 2 ) ( I1 )
( I2' ) = ( 1 0 0 ) ( I2 )
( I3' )   ( 0 1 0 ) ( I3 )

can transform the triple loop into the following form, which has no dependence carried by the outermost loop.

doall I1' = ..., ...
  do I2' = ..., ...
    do I3' = ..., ...
      S[(I1' + 2I2' + I3')/2 + 1, (3I1' + 2I2' + I3')/2 + 1] = ...
      ... = h(S[(I1' + 2I2' + I3')/2, (3I1' + 2I2' + I3')/2])
    enddo
  enddo
enddoall
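The claim that the transformed outermost loop carries no dependence can be checked numerically: every distance vector of the triple loop satisfies d2 + 2d3 = 0, so any transformation whose first row is proportional to (0, 1, 2) (an assumption about the reconstructed matrix above) sends the first component of every distance vector to zero. A small sketch:

```python
# Distance vectors of Example 1.1 form the families (t, 2-2t, t-1) for flow
# and (t, -2-2t, t+1) for anti-dependence (derived from the subscript
# equations; see the discussion below).  The first row (0, 1, 2) of the
# candidate transformation annihilates all of them.
first_row = (0, 1, 2)
flow = [(t, 2 - 2*t, t - 1) for t in range(0, 6)]
anti = [(t, -2 - 2*t, t + 1) for t in range(1, 6)]
for d in flow + anti:
    assert sum(a * b for a, b in zip(first_row, d)) == 0
```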

It is worth pointing out that here the failure to recognize parallelism based on direction vectors is not due to a lack of algorithms, but rather due to the fact that the direction vectors do not contain the necessary information.

To deal with this situation, we propose a new approach to approximating data dependence, using linear equations or inequalities of distance vectors. This approach is advantageous in the following sense. First, for many loops whose distance vectors are difficult to obtain, computing equations or inequalities of distance vectors is easy. Second, the information that is important to transformation design is contained in those equations, and moreover, it is easy to extract. To show this, let us continue the discussion of Example 1.1. It can be shown that any distance vector of this loop satisfies either 2d1 + d2 - 2 = 0 (flow dependence) or 2d1 + d2 + 2 = 0 (anti-dependence), even though there is no way of enumerating each individual vector. From these equations, one can easily see that the slope d2/d1, where d1 ≠ 0, is greater than or equal to -4, which immediately results in the first transformation shown previously. It can also be seen that any distance vector satisfies the equation d2 + 2d3 = 0. This property of the distance vectors of the loop results in the second transformation shown previously [2,3]. It is quite obvious that there are loops whose distance vectors do not satisfy linear equations or inequalities, and our approach may not work for those loops.

In line with this approach, we discuss two important issues in this paper. First, how to determine whether the distance vectors of a loop can be approximated by linear equations or inequalities, and how to generate those linear systems. Second, how to design a desired transformation based on a linear system of distance vectors.

The rest of this paper is organized as follows. Section 2 discusses the program model and some background. The idea of approximating data dependence with linear systems of distance vectors is introduced in Section 3. Section 4 focuses on how to generate all kinds of linear systems of dependence from index functions, ignoring the iteration space. Section 5 discusses loop transformation based on the information provided by linear systems. In Section 6 we continue the discussion of linear systems, considering both index functions and iteration space. The conclusion is given in Section 7.


2 Preliminaries

2.1 Model of Nested Loops

Although it is not a necessary condition for applying our approach, we restrict our attention to perfectly nested sequential loops for simplicity. An r-deep nested loop with m-dimensional array references has the following general structure:

do I1 = L1(n), U1(n)
  do I2 = L2(I1, n), U2(I1, n)
    ...
    do Ir = Lr(I1, ..., Ir-1, n), Ur(I1, ..., Ir-1, n)
      S1: A[f1(I), ..., fm(I)] = ...
      S2: ... = ... A[g1(I), ..., gm(I)] ...
    enddo
    ...
  enddo
enddo

Here, I denotes the vector of loop variables and n the loop-independent variables. As functions of I and n, the Lk's and Uk's are the lower and upper limits. The iteration space, denoted as R, is defined by the Lk's and Uk's, and its size is controlled by n. The fk(I)'s and gk(I)'s are referred to as index functions. We assume that the index functions are linear and that their coefficients are loop-invariant constants. Certain types of nested loops which do not meet this requirement may be converted into this class. For example, when the indices of a nested loop with a multi-dimensional array are linearized, the coefficients could be functions of the variables n. This type of nest can be delinearized to meet our assumption [4].

2.2 Data Dependence

When a memory location is shared by two data references, data dependence occurs, which imposes a partial order on the execution of iterations. Let i and i' be two iterations in R. They are considered dependent on each other if fk(i) = gk(i') holds for every k, 1 ≤ k ≤ m; otherwise, they are independent. Further, data dependence can be classified into several types, and we are only concerned with two types of data dependence in this paper: flow dependence and anti-dependence. Iteration i' is flow dependent on iteration i if iteration i precedes iteration i', that is, i' ≻ i; anti-dependence occurs if iteration i succeeds iteration i'.

Traditionally, information about data dependence is approximated by the set of distance vectors, denoted as D. For iterations i and i' with i' ≻ i, where i' is flow dependent on i, the distance vector d is defined as d = i' - i. It is important to note that distance vectors are always lexicographically positive. As mentioned, for some loops the number of distinct distance vectors is finite; however, for other loops this number is a function of the size of the iteration space. In this paper, we use Wk, 1 ≤ k ≤ r, to represent a partition of D, the set of distance vectors, such that any d in D with dk ≠ 0 and dj = 0 for 1 ≤ j ≤ k - 1 belongs

1) A vector x is lexicographically positive, denoted x ≻ 0, if there exists k such that xk > 0 and xj = 0 for any 1 ≤ j ≤ k - 1. A vector x is lexicographically greater than a vector y, denoted x ≻ y, if x - y is lexicographically positive.


to Wk. It is possible that some Wk's might be empty. Since a distance vector in Wk must be lexicographically positive, the following fact is trivial.

Observation 2.1. For any d in Wk, dk > 0.

As mentioned before, the distance vectors of a nested loop could be incomputable, in which case direction vectors are usually used to further approximate the data dependence. Suppose i and i' are two iterations with i' ≻ i; the direction vector d corresponding to i and i' is defined as:

dk = "<"  if ik < i'k
dk = "="  if ik = i'k
dk = ">"  if ik > i'k

Obviously, the number of direction vectors is always finite, and each vector approximates a subset of distance vectors. The drawback of direction vectors is that they may lose useful information for parallelization, as seen in Example 1.1.
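The mapping from a distance vector to its direction vector is mechanical; the helper below (an illustration, not from the paper) reproduces the two direction vectors of Example 1.1:

```python
def direction(d):
    # '<' when i_k < i'_k (d_k > 0), '=' when equal, '>' when i_k > i'_k
    return tuple('<' if x > 0 else '=' if x == 0 else '>' for x in d)

assert direction((0, 2, -1)) == ('=', '<', '>')
assert direction((1, -4, 2)) == ('<', '>', '<')
```

Note how both (0, 2, -1) and, say, (0, 4, -2) collapse onto the same direction vector (=, <, >): this many-to-one collapse is exactly the information loss discussed above.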

2.3 Linear Loop Transformations

A nested loop can be converted into a new nested loop by conducting linear transformations over the iteration space, without changing the original data dependence. Let T be the transformation matrix for a given nested loop. Each iteration i' in the new iteration space R' is a mapping from an iteration i in R, that is, i' = Ti. The index functions of the new nested loop are therefore obtained from the original index functions by substituting I with T^(-1)I', where T^(-1) is the inverse of T and I' denotes the loop index variables of the new nested loop. Readers may refer to [2, 3] and [1] for more details, including how to form the new iteration space R' given T. Several powerful loop transformations, like loop interchange, reversal, skewing and any combination of them, are special cases of such linear transformations. In fact, each of these three transformations can be viewed as an elementary matrix transformation, and a combination of them is viewed as a product of a series of elementary matrices. For an r-deep nested loop, the transformation matrix must be of size r x r. To be a legal transformation, i.e., for the new nested loop to preserve the original data dependence, the transformation matrix must satisfy the following condition:

Lemma 2.2 [2,3]. Let T be the matrix corresponding to a legal loop transformation on a nested loop with dependence vector set D. Then for any d in D, Td is lexicographically positive.
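Lemma 2.2 suggests a direct legality test: apply T to each known (or sampled) distance vector and check lexicographic positivity. A sketch, using three distance vectors of Example 1.1; the skewing candidate's rows are an assumption based on the reconstruction in Section 1:

```python
def lex_positive(v):
    # first nonzero component must be positive
    for x in v:
        if x != 0:
            return x > 0
    return False

def is_legal(T, distances):
    # T preserves the dependence order iff every T*d stays lexicographically positive
    return all(
        lex_positive(tuple(sum(T[r][c] * d[c] for c in range(len(d)))
                           for r in range(len(T))))
        for d in distances)

D = [(0, 2, -1), (1, -4, 2), (1, 0, 0)]       # sample distances, Example 1.1
T_skew = [[5, 1, 0], [1, 0, 0], [0, 0, 1]]    # skew-and-interchange candidate
T_rev  = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]   # reversal of the outer loop
assert is_legal(T_skew, D)
assert not is_legal(T_rev, D)                 # reversal flips (1, -4, 2)
```

In general D may be infinite, which is precisely why the paper replaces the per-vector test with equations over the whole distance set.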

2.4 Loop Parallelization

If the iterations of a loop can be executed simultaneously without violating the partial order imposed by data dependence, that loop of the nested loop is parallelizable. It is a well-known fact that a loop is parallelizable if and only if there is no dependence carried by that loop. It is not hard to see that data dependence carried by loop Ik exists only if there is a distance vector of the form (0, 0, ..., 0, dk, dk+1, ..., dr), where dk is the k-th component of the vector and dk ≠ 0.

Lemma 2.3. For an r-deep nested loop, its k-th loop Ik is parallelizable if Wk is empty.

3 Approximating Data Dependence with Linear Systems

A linear system consists of equations and inequalities. Let Df and Da be the sets of distance vectors of flow-dependence and of anti-dependence, respectively, of a nested loop L. We say that the dependence of nested loop L can be approximated by linear system Sf and linear system Sa if any df in Df satisfies Sf and any da in Da satisfies Sa. Consider the nested loop in Example 3.1. Both flow-dependence and anti-dependence exist when n > 1. Suppose iteration (i1', i2') and iteration (i1, i2) are dependent on each other. The indices must satisfy the equation i1 + 2i2 + 1 = i1' + 2i2'. If (i1', i2') is flow dependent on (i1, i2), then the distance d = (d1, d2), where d1 = i1' - i1 and d2 = i2' - i2, satisfies d1 + 2d2 - 1 = 0. That is, the flow-dependence of this double loop can be approximated by the system Sf : d1 + 2d2 - 1 = 0. Likewise, if (i1', i2') is anti-dependent on (i1, i2), then the distance d = (d1, d2), where d1 = i1' - i1 and d2 = i2' - i2, satisfies d1 + 2d2 + 1 = 0. Thus, the anti-dependence can be approximated by the system Sa : d1 + 2d2 + 1 = 0.

Example 3.1:

do I1 = 1, n
  do I2 = 1, n
    A(I1 + 2I2 + 1) = A(I1 + 2I2) + ...
  enddo
enddo
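Both systems can be confirmed by exhaustive search over a small iteration space. The sketch below (illustrative only, not from the paper) collects every dependent pair of Example 3.1 for n = 6:

```python
from itertools import product

n = 6
space = list(product(range(1, n + 1), repeat=2))
flow, anti = [], []
for i in space:          # iteration performing the write A(I1 + 2I2 + 1)
    for j in space:      # iteration performing the read  A(I1 + 2I2)
        if i[0] + 2*i[1] + 1 == j[0] + 2*j[1]:     # same array element
            d = (j[0] - i[0], j[1] - i[1])
            if d[0] > 0 or (d[0] == 0 and d[1] > 0):
                flow.append(d)                     # write first: flow
            else:
                anti.append((-d[0], -d[1]))        # read first: anti

assert flow and anti
assert all(d1 + 2*d2 - 1 == 0 for d1, d2 in flow)   # system S_f
assert all(d1 + 2*d2 + 1 == 0 for d1, d2 in anti)   # system S_a
```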

It is important to notice that for a given nested loop, the linear systems that approximate the data dependence are usually not unique. Take Example 3.2 and consider the following equations

d1 + 2d2 + 5d3 - 1 = 0

or

d1 + d2 + 3d3 - 1 = 0
2d1 - d2 - 2 = 0
d2 + 2d3 = 0

where the first equation is obtained from the index functions in the first subscript position, the second one is due to the index functions in the second subscript position, and the third and the fourth ones are linear combinations of the previous two. It is quite clear that any of these equations approximates the flow-dependence, since any distance vector of flow dependence must satisfy it. Identifying properties of linear systems has a direct impact on loop transformations. Further discussion on this matter is included in Sections 4 and 5.

Example 3.2:

do I1 = 1, n
  do I2 = 1, n
    do I3 = 1, n
      A(I1 + 2I2 + 5I3 + 1, I1 + I2 + 3I3 + 1) = ...
      ... = A(I1 + 2I2 + 5I3, I1 + I2 + 3I3) + ...
    enddo
  enddo
enddo
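That all four equations above hold simultaneously for every flow-dependence distance of Example 3.2 can again be checked by brute force (an illustrative sketch, not from the paper):

```python
from itertools import product

def lex_positive(d):
    # first nonzero component must be positive
    for x in d:
        if x != 0:
            return x > 0
    return False

n = 5
space = list(product(range(1, n + 1), repeat=3))
write = lambda i: (i[0] + 2*i[1] + 5*i[2] + 1, i[0] + i[1] + 3*i[2] + 1)
read  = lambda i: (i[0] + 2*i[1] + 5*i[2],     i[0] + i[1] + 3*i[2])

flow = {tuple(a - b for a, b in zip(j, i))
        for i in space for j in space
        if write(i) == read(j)
        and lex_positive(tuple(a - b for a, b in zip(j, i)))}

assert flow                                   # e.g. (1, 0, 0) is in the set
for d1, d2, d3 in flow:
    assert d1 + 2*d2 + 5*d3 - 1 == 0          # from subscript position 1
    assert d1 + d2 + 3*d3 - 1 == 0            # from subscript position 2
    assert 2*d1 - d2 - 2 == 0                 # linear combination
    assert d2 + 2*d3 == 0                     # linear combination
```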

A Property of Linear Systems of Dependence. Suppose both flow-dependence and anti-dependence exist in a nested loop. If the flow-dependence can be approximated by the linear system c1 d1 + c2 d2 + ... + cr dr + c0 = 0, then the anti-dependence can be approximated by the linear system c1 d1 + c2 d2 + ... + cr dr - c0 = 0.

4 Linear Systems of Dependence from Index Functions

To find linear systems of dependence in general, both index functions and the iteration space should be considered. We focus on index functions in this section; in Section 6, index functions as well as the iteration space will be considered.

4.1 A Sufficient Condition

In this subsection, we give a condition which can be used to determine if the distance vectors of a nested loop can be approximated by linear systems. We consider an r-deep perfectly nested loop with m-dimensional array references:

S : A[f1(I), ..., fm(I)] = ...
T : ... = ... A[g1(I), ..., gm(I)] ...

where

fk(I) = ak,0 + Σj ak,j Ij   (j = 1, ..., r)
gk(I) = bk,0 + Σj bk,j Ij   (j = 1, ..., r)

for 1 ≤ k ≤ m. We assume that the coefficients ak,j's and bk,j's are loop-invariant variables, and we also assume that the fk(I)'s and gk(I)'s are not constant. An r-deep nested loop of this type is denoted by L = (fk, gk, r). By F we denote the matrix of coefficients as follows:

F = ( a1,1 - b1,1   a1,2 - b1,2   ...   a1,r - b1,r )
    ( a2,1 - b2,1   a2,2 - b2,2   ...   a2,r - b2,r )
    ( ...                                           )
    ( am,1 - bm,1   am,2 - bm,2   ...   am,r - bm,r )

Theorem 4.1. If Rank(F) ≤ m - 1, where Rank(F) is the rank of matrix F, then the distance vectors of a nested loop described above can be approximated by the linear system

c1 d1 + c2 d2 + ... + cr dr + c0 = 0

or the linear system

c1 d1 + c2 d2 + ... + cr dr - c0 = 0.

(Note that this condition is not necessary.)

Proof. When Rank(F) ≤ m - 1, there exists a nonzero vector n = (n1, n2, ..., nm) satisfying n^t F = 0^t. Therefore, we have

Σk nk ak,j = Σk nk bk,j   (k = 1, ..., m), for 1 ≤ j ≤ r.

Let d be an arbitrary distance vector in the distance set D, and let iterations i and i' be a pair of dependent iterations represented by d. They must satisfy fk(i) = gk(i'), for 1 ≤ k ≤ m, by definition. That is,

ak,0 + Σj ak,j ij = bk,0 + Σj bk,j i'j   (j = 1, ..., r),

for 1 ≤ k ≤ m. By multiplying both sides of the k-th equation by nk, for 1 ≤ k ≤ m, and then adding them together, we obtain

Σk nk ak,1 i1 + ... + Σk nk ak,r ir + Σk nk ak,0 = Σk nk bk,1 i'1 + ... + Σk nk bk,r i'r + Σk nk bk,0.

Since

Σk nk ak,j = Σk nk bk,j,

the above equation is equivalent to

Σk nk ak,1 (i1 - i'1) + ... + Σk nk ak,r (ir - i'r) + Σk nk ak,0 - Σk nk bk,0 = 0.

Remember that d could be a flow-dependence or an anti-dependence distance. If the former is the case, that is, d = i' - i, then we have

Σk nk ak,1 d1 + ... + Σk nk ak,r dr - Σk nk ak,0 + Σk nk bk,0 = 0.

If the latter is the case, that is, d = i - i', then we have

Σk nk ak,1 d1 + ... + Σk nk ak,r dr + Σk nk ak,0 - Σk nk bk,0 = 0.

4.2 Generating Linear Systems Using Index Functions

As seen before, in general linear systems of dependence are not unique for a given nested loop. We define two classes of linear systems below. Linear systems of Type I are those whose offsets c0 are nonzero, and linear systems of Type II are those whose offsets c0 equal zero. It is possible that for a given nested loop, there are several linear systems of the same type that approximate the data dependence. In the loop of Example 3.2, for instance, we have already seen three linear systems of Type I. They are d1 + 2d2 + 5d3 - 1 = 0, d1 + d2 + 3d3 - 1 = 0, and 2d1 - d2 - 2 = 0. From the transformation point of view, all of them are different, because different linear systems produce different transformations. However, in some applications the difference between the third one and the first two is more significant than that between the first two. For example, the transformation that makes the innermost loop parallelizable can be obtained from either of the first two systems, while the transformation that makes the middle loop parallelizable can be obtained from the third system (more details can be seen later); the other way around does not hold. We refer to a linear equation of Type I (or Type II) as being of form Iq (or of form IIq) if cq ≠ 0 and cj = 0 for q + 1 ≤ j ≤ r. In this section, we are interested in the following problem: given an r-deep nested loop with rank(F) ≤ m - 1, can the data dependence be approximated by a linear system of a certain form? If the answer is yes, then what is that system?


We shall develop a technique to deal with this problem based on manipulating the coefficients of the index functions. Let L = (fk, gk, r) be an r-deep nested loop, where fk(I) = ak,0 + Σj ak,j Ij and gk(I) = bk,0 + Σj bk,j Ij. Matrix Fk is defined as follows:

Fk = (F, Ar, Ar-1, ..., Ak)

where Aj = (a1,j, a2,j, ..., am,j)^t.

As an example, the coefficient matrix F1 of the loop in Example 4.1 is given as follows:

F1 = ( 1 1 1 2 1 2 )
     ( 1 1 1 2 2 2 )
     ( 1 1 1 1 2 3 )

Example 4.1.

do I1 = 0, n - 1
  do I2 = 0, n - 1
    do I3 = 0, n - 1
      A[2I1 + I2 + 2I3 + 2, 2I1 + 2I2 + 2I3 + 1, 3I1 + 2I2 + I3] = ...
      ... = h(A[I1 + I3, I1 + I2 + I3, 2I1 + I2])
    enddo
  enddo
enddo

We now give the conditions for the existence of linear systems of Type I and Type II.

Theorem 4.2. For an r-deep nested loop with m-dimensional array references, the data dependence can be approximated by a linear system of Type I if rank(F) < rank((F, V)), where V = (a1,0 - b1,0, a2,0 - b2,0, ..., am,0 - bm,0)^t; the data dependence can be approximated by a linear system of Type II if rank((F, V)) < m.

To prove this we need a result from linear algebra.

Lemma 4.1. Let P be an m x n matrix with rank(P) < m, and let Q = (P, A) be another matrix where A is an m x l matrix. There exists a 1 x m row vector n such that nP = 0 and nQ ≠ 0 if and only if rank(P) < rank(Q).

Proof of Theorem 4.2. According to the proof of Theorem 4.1, it is not hard to see that if there exists n such that n^t F = 0^t and n^t V ≠ 0, then linear systems of Type I exist. This condition is equivalent to n^t F = 0^t and n^t (F, V) ≠ 0^t. According to Lemma 4.1, this is equivalent to rank(F) < rank((F, V)). The second statement of the theorem is trivial.

Consider the loop in Example 4.1 for instance. Since

F = ( 1 1 1 )              ( 1 1 1 2 )
    ( 1 1 1 )    (F, V) =  ( 1 1 1 1 )
    ( 1 1 1 )              ( 1 1 1 0 )

we have rank(F) = 1 and rank((F, V)) = 2. Applying Theorem 4.2, we know that linear systems of both types can be used to approximate the data dependence of the triple loop.
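The rank tests of Theorems 4.1 and 4.2 are cheap to mechanize. A sketch in exact rational arithmetic (so there are no floating-point rank surprises), applied to the F and (F, V) of Example 4.1:

```python
from fractions import Fraction

def rank(M):
    # rank via Gaussian elimination over the rationals
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            if M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

F = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]         # Example 4.1
FV = [row + [v] for row, v in zip(F, [2, 1, 0])]
assert rank(F) == 1 and rank(FV) == 2         # Type I and Type II both exist
```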

We now give the condition for checking if a specific form of linear system exists.

Theorem 4.3. For an r-deep nested loop with m-dimensional array references, the data dependence can be approximated by a linear system of form Iq, 1 ≤ q ≤ r, if rank(Fq+1) < rank(Fq) and rank(Fq+1) < rank((Fq+1, V)); the data dependence can be approximated by a linear system of form IIq, 1 ≤ q ≤ r, if rank(Fq+1) < rank(Fq) and rank(Fq+1) = rank((Fq+1, V)). (Here Fr+1 = F.)

Proof. According to the proof of Theorem 4.1, to prove the existence of a linear system of a certain form, say form Iq, we need to show the existence of a vector n such that

n^t F = 0^t                          (1)
n^t V ≠ 0                            (2)
n^t Aj = 0   for q + 1 ≤ j ≤ r       (3)
n^t Aq ≠ 0,                          (4)

where

Aj = (a1,j, a2,j, ..., am,j)^t.

Eq.(1) guarantees the existence of linear systems; Eq.(2) guarantees the linear systems are of Type I; Eqs.(3) and (4) further assure the linear systems are of form Iq. Eqs.(1), (3) and (4) are equivalent to

n^t (F, Ar, Ar-1, ..., Aq+1) = 0^t
n^t (F, Ar, Ar-1, ..., Aq+1, Aq) ≠ 0^t.

Notice that Fq+1 = (F, Ar, Ar-1, ..., Aq+1) and Fq = (F, Ar, Ar-1, ..., Aq+1, Aq). According to Lemma 4.1, the above conditions hold if and only if rank(Fq+1) < rank(Fq). Similarly, Eqs.(1), (2) and (3) are equivalent to n^t Fq+1 = 0^t and n^t (Fq+1, V) ≠ 0^t, which hold if and only if rank(Fq+1) < rank((Fq+1, V)). Thus, we have proved the first half of the theorem; the second half can be handled similarly.

The Gaussian elimination algorithm can be used to compute the rank of matrices and to compute the vector n as well. We shall not give the Gaussian elimination algorithm here; readers can find it in any textbook on numerical algorithms. Let us use Example 4.1 to show the process. Suppose we want to know if the data dependence can be approximated by a linear system of form I3. According to Theorem 4.3, we need to check whether rank(F) < rank(F3) and rank(F) < rank((F, V)); if these conditions hold, we also need to compute n such that n^t F = 0^t, n^t F3 ≠ 0^t, and n^t V ≠ 0. One can notice that in the following process these two jobs are done in one pass. We start with a unit matrix U, coefficient matrix F3 and V:

U = ( 1 0 0 )    F3 = ( 1 1 1 2 )    V = (  2 )
    ( 0 1 0 )         ( 1 1 1 2 )        (  1 )
    ( 0 0 1 )         ( 1 1 1 1 )        (  0 )

Now, use the first row to cancel the rest of the entries in the first column of F3:

U <- (  1 0 0 )    F3 <- ( 1 1 1  2 )    V <- (  2 )
     ( -1 1 0 )          ( 0 0 0  0 )         ( -1 )
     ( -1 0 1 )          ( 0 0 0 -1 )         ( -2 )

Interchange row 2 and row 3:

U <- (  1 0 0 )    F3 <- ( 1 1 1  2 )    V <- (  2 )
     ( -1 0 1 )          ( 0 0 0 -1 )         ( -2 )
     ( -1 1 0 )          ( 0 0 0  0 )         ( -1 )


Notice that F3 is diagonalized through so-called elementary row operations. From these matrices we see that rank(F) = 1, rank(F3) = 2 and rank((F, V)) = 2; thus a linear system of form I3 exists. Row 2 of U forms n, that is, n = (-1, 0, 1). Since

A1 = ( 2 )    A2 = ( 1 )    A3 = ( 2 )
     ( 2 )         ( 2 )         ( 2 )
     ( 3 )         ( 2 )         ( 1 )

we have

c1 = n^t A1 = 1,
c2 = n^t A2 = 1,
c3 = n^t A3 = -1,
c0 = -n^t V = 2.

Hence, the desired linear system is d1 + d2 - d3 + 2 = 0 for flow dependence and d1 + d2 - d3 - 2 = 0 for anti-dependence.

The time complexity of the above process on a matrix of size m x r is O(m^2 r). The largest matrix among those coefficient matrices is (F1, V), which has a size of m x (2r + 1). Therefore, checking the existence of a specific form of linear system and computing the corresponding coefficients takes O(m^2 r) time.
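The hand computation above reduces to a few dot products; the snippet below (illustrative only) re-derives the coefficients from the annihilating vector n = (-1, 0, 1):

```python
n = (-1, 0, 1)                                # row 2 of U after elimination
A1, A2, A3 = (2, 2, 3), (1, 2, 2), (2, 2, 1)  # columns of write coefficients
V = (2, 1, 0)                                 # a_{k,0} - b_{k,0}
dot = lambda u, v: sum(x * y for x, y in zip(u, v))

assert dot(n, (1, 1, 1)) == 0                 # n annihilates every column of F
c1, c2, c3, c0 = dot(n, A1), dot(n, A2), dot(n, A3), -dot(n, V)
assert (c1, c2, c3, c0) == (1, 1, -1, 2)      # flow system: d1 + d2 - d3 + 2 = 0
```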

Until now we have been dealing with generating only one specific form of linear system. In applications, we may want to know all possible forms of linear systems. That is, given an r-deep nested loop, we want to know whether linear systems of forms I1, I2, ..., Ir, II1, ..., IIr exist. One way to do this is to repeat the above process 2r times, which would take O(m^2 r^2) time. Here, we present a more elegant way, which takes one pass to identify all forms of I or all forms of II.

Let F1' be the diagonalized form of F1 obtained by executing elementary row operations; let U' and V' be the matrices obtained by applying those same elementary operations to U and V. Any nonzero entry of F1' that is to the right of zero entries is called a breaking entry. In the example below, F1'(2, 4) and F1'(3, 5) are breaking entries. Notice that no two breaking entries are in the same column. By the k-th breaking entry, we refer to the k-th breaking entry counted from left to right. We define two functions Pos[F1'] and Pos[V']. Pos[F1'] returns a vector whose k-th component is Pos[F1'](k) = (j1, j2), where j1 and j2 are the row and column indices of the k-th breaking entry. Function Pos[V'] returns a scalar j such that V'(j) ≠ 0 and V'(p) = 0 for j < p ≤ m.

Suppose L is an r-deep nested loop with m-dimensional array references. Let Pos[F1'](k) be any component of Pos[F1'], and let Pos[F1'](k) = (p, q).

1. If q > r and p ≤ Pos[V'], then the data dependence of L can be approximated by linear systems of form I(2r-q+1); the coefficients (c1, ..., c(2r-q+1)) of the system are the products u'p(A1, ..., A(2r-q+1)), where u'p is the p-th row vector of U'; the coefficient c0 is the product u'p(-A0 + B0), where A0 = (a1,0, ..., am,0)^t and B0 = (b1,0, ..., bm,0)^t.

2. If q > r and p > Pos[V'], then the data dependence of L can be approximated by linear systems of form II(2r-q+1); the coefficients (c1, ..., c(2r-q+1)) of the system are the products u'p(A1, ..., A(2r-q+1)), where u'p is the p-th row vector of U'.

As a summary, we present below the algorithm that finds all forms of linear systems for a given nested loop.


Algorithm. Finding All Forms of Linear Systems of Dependence

Input: coefficient matrices A and B of the index functions of an r-deep nested loop with m-dimensional array references;

Output: coefficients of all forms of linear systems that approximate the data dependence of the nested loop.

1) Construct F1, V and U;
2) Use elementary row operations to diagonalize F1, obtaining F1', V' and U';
3) Compute functions Pos[F1'] and Pos[V'];
4) For every component in Pos[F1'] do the following.
   /* Let (p, q) be the value of the component */
   4.1) If q > r and p ≤ Pos[V'] then
        compute (c1, ..., c(2r-q+1)) = u'p(A1, ..., A(2r-q+1));
        compute c0 = u'p(-A0 + B0);
   4.2) If q > r and p > Pos[V'] then
        compute (c1, ..., c(2r-q+1)) = u'p(A1, ..., A(2r-q+1)).
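The steps above can be sketched in executable form. The function below is an illustrative reading of the algorithm (forward elimination only, exact rational arithmetic), checked against Example 4.1; the input layout, with the constant term in column 0, is a convention chosen here:

```python
from fractions import Fraction

def find_all_systems(A, B):
    """A, B: m x (r+1) integer matrices; column 0 holds the constant terms
    a_{k,0} and b_{k,0}.  Returns {('I' or 'II', q): [c1..cq] (+ [c0] for Type I)}."""
    m, r = len(A), len(A[0]) - 1
    # F1 = (F, A_r, ..., A_1), V = A0 - B0, U = identity
    F1 = [[Fraction(A[k][j] - B[k][j]) for j in range(1, r + 1)] +
          [Fraction(A[k][j]) for j in range(r, 0, -1)] for k in range(m)]
    V = [Fraction(A[k][0] - B[k][0]) for k in range(m)]
    U = [[Fraction(i == j) for j in range(m)] for i in range(m)]
    row = 0
    for c in range(2 * r):                    # elementary row operations
        piv = next((i for i in range(row, m) if F1[i][c] != 0), None)
        if piv is None:
            continue
        for M in (F1, U):
            M[row], M[piv] = M[piv], M[row]
        V[row], V[piv] = V[piv], V[row]
        for i in range(row + 1, m):
            if F1[i][c] != 0:
                f = F1[i][c] / F1[row][c]
                F1[i] = [x - f * y for x, y in zip(F1[i], F1[row])]
                U[i] = [x - f * y for x, y in zip(U[i], U[row])]
                V[i] -= f * V[row]
        row += 1
    pos_V = max((j + 1 for j in range(m) if V[j] != 0), default=0)
    systems = {}
    for p in range(m):
        q = next((c + 1 for c in range(2 * r) if F1[p][c] != 0), None)
        if q is None or q <= r:
            continue                          # pivot inside F: not a breaking entry
        width = 2 * r - q + 1                 # the q of form I_q / II_q
        u = U[p]
        coeffs = [sum(u[k] * A[k][j] for k in range(m))
                  for j in range(1, width + 1)]
        if p + 1 <= pos_V:
            # V' = U'V, so u'_p(-A0 + B0) is simply -V'(p)
            systems[('I', width)] = coeffs + [-V[p]]
        else:
            systems[('II', width)] = coeffs
    return systems

# Example 4.1: write and read coefficient matrices, constant term first
A = [[2, 2, 1, 2], [1, 2, 2, 2], [0, 3, 2, 1]]
B = [[0, 1, 0, 1], [0, 1, 1, 1], [0, 2, 1, 0]]
S = find_all_systems(A, B)
assert S[('I', 3)] == [1, 1, -1, 2]   # d1 + d2 - d3 + 2 = 0  (flow)
assert S[('I', 2)] == [0, 1, 1]       # d2 + 1 = 0            (flow)
```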

As an application of this algorithm, we consider the loop in Example 4.1 again. We start with the unit matrix U, coefficient matrix F1 and V:

U = ( 1 0 0 )    F1 = ( 1 1 1 2 1 2 )    V = (  2 )
    ( 0 1 0 )         ( 1 1 1 2 2 2 )        (  1 )
    ( 0 0 1 )         ( 1 1 1 1 2 3 )        (  0 )

After diagonalizing F1 using elementary row operations, we have

U' = (  1 0 0 )    F1' = ( 1 1 1  2 1 2 )    V' = (  2 )
     ( -1 0 1 )          ( 0 0 0 -1 1 1 )         ( -2 )
     ( -1 1 0 )          ( 0 0 0  0 1 0 )         ( -1 )

By scanning matrix F1', we obtain

Pos[F1'] = ((2, 4), (3, 5))

and Pos[V'] = 3.

We first consider the component (p, q) = (2, 4). Since q > r and p ≤ Pos[V'] hold, a linear system of form I(2r-q+1) (i.e., I3) exists. Notice that u'2 = (-1, 0, 1). Hence,

c1 = (-1, 0, 1)(2, 2, 3)^t = 1,
c2 = (-1, 0, 1)(1, 2, 2)^t = 1,
c3 = (-1, 0, 1)(2, 2, 1)^t = -1,
c0 = -(-1, 0, 1)(2, 1, 0)^t = 2.


The corresponding linear systems are Sf : d1 + d2 - d3 + 2 = 0 and Sa : d1 + d2 - d3 - 2 = 0. We now consider the component (p, q) = (3, 5). Since q > r and p ≤ Pos[V'] hold, a linear system of form I(2r-q+1) (i.e., I2) exists. Notice that u'3 = (-1, 1, 0). Hence,

c1 = u'3 A1 = (-1, 1, 0)(2, 2, 3)^t = 0,
c2 = u'3 A2 = (-1, 1, 0)(1, 2, 2)^t = 1,
c0 = -(-1, 1, 0)(2, 1, 0)^t = 1.

The corresponding linear systems are

Sf : d2 + 1 = 0  and  Sa : d2 - 1 = 0.

As far as the time complexity is concerned, we notice that the diagonalization takes O(m^2 r) time since the size of F1 is m x 2r, that Pos[F1'] and Pos[V'] can be computed in O(m + r) time, and that all the coefficients of the linear systems can be computed in O(m^2) time. Hence, the algorithm takes O(m^2 r) time, much faster than generating each form of system individually, which takes O(m^2 r^2) time.
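Steps 2) and 3) of the algorithm can be sketched concretely. The sketch below is our own illustration, not the paper's implementation: plain Gaussian forward elimination with exact rational arithmetic stands in for the elementary row operations, mirroring each operation on U and V, and the helper `positions` (name ours) scans the echelon form for the Pos components. On the matrices of Example 4.1 it reproduces F1', U', V' and Pos[F1'] = ((2, 4), (3, 5)).

```python
from fractions import Fraction

def diagonalize(F, V):
    """Forward-eliminate the m x 2r matrix F into row echelon form,
    mirroring every elementary row operation on an m x m accumulator U
    (initially the identity) and on the vector V. Returns (F', U', V')."""
    m, n = len(F), len(F[0])
    F = [[Fraction(x) for x in row] for row in F]
    V = [Fraction(x) for x in V]
    U = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    row = 0
    for col in range(n):
        if row == m:
            break
        piv = next((i for i in range(row, m) if F[i][col] != 0), None)
        if piv is None:
            continue  # no pivot in this column
        F[row], F[piv] = F[piv], F[row]
        U[row], U[piv] = U[piv], U[row]
        V[row], V[piv] = V[piv], V[row]
        for i in range(row + 1, m):
            if F[i][col] != 0:
                t = F[i][col] / F[row][col]
                F[i] = [x - t * y for x, y in zip(F[i], F[row])]
                U[i] = [x - t * y for x, y in zip(U[i], U[row])]
                V[i] = V[i] - t * V[row]
        row += 1
    return F, U, V

def positions(F):
    """Pos[F']: for every row whose leading zeros extend past column 1,
    the pair (p, q) of its 1-based row index and first nonzero column."""
    pos = []
    for p, r in enumerate(F, start=1):
        q = next((j for j, x in enumerate(r, start=1) if x != 0), None)
        if q is not None and q > 1:
            pos.append((p, q))
    return pos
```

Running `diagonalize` on F1, V of Example 4.1 yields exactly the F1', U', V' displayed above, and `positions` recovers the two components (2, 4) and (3, 5) whose rows u_2' and u_3' supply the coefficients of the linear systems.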

5 Finding Parallelism Through Linear Systems of Dependence

In this section we show how linear systems of dependence can be used to find parallelism in nested loops. Parallelism can be explicit, that is, some loops are parallelizable; it can also be non-explicit, that is, a parallelizable loop may be obtained through loop transformation. We shall show how linear systems of dependence can help identify both. Though our attention is on exploring parallelism in nested loops, as mentioned before, loop transformations also have other important applications.

1) Innermost Loop Level Parallelism

For those machines which favor low-level parallelism, like superscalar or VLIW machines, it is desirable that the innermost loop of a nested loop be parallelizable. The following lemma helps to recognize this type of loop.

Lemma 5.1. Let L be an r-deep nested loop with m-dimensional array reference. The innermost loop of L is parallelizable if the data dependence can be approximated by a linear system of either form Iq, where q < r, or form II_r.

Indeed, if the former is the case, one of the components of any distance vector (d1, d2, ..., dq) must be nonzero; thus, W_r is empty since r > q. If the latter is the case, no distance vector can be of the form (0, 0, ..., 0, d_r), where d_r != 0; thus, W_r is empty. Combining this with the results described in the previous section, we have the following.

Corollary 5.1. Let L be an r-deep nested loop with m-dimensional array reference. The innermost loop of L is parallelizable if rank(F_r) < rank((F_r, V)) or rank((F, V)) <


When the condition in Corollary 5.1 is not met, a parallelizable innermost loop may still be obtained through loop transformations.

Lemma 5.2. Let L be an r-deep nested loop with m-dimensional array reference. There exists a loop transformation T that produces a parallelizable innermost loop if the data dependence can be approximated by linear systems of form I_r.

Proof. Let S_f: sum_{j=1..r} c_j d_j + c0 = 0 and S_a: sum_{j=1..r} c_j d_j - c0 = 0 be the systems of the loop, where c0 != 0. The following matrix T performs the desired transform:

    T = | U_{r-2}  0  |
        |    0     T0 |

where U_{r-2} is an (r-2) x (r-2) identity matrix, and

    T0 = | u 1 |
         | 1 0 |

with u = min{ (c0 - c_{r-1})/c_r , (-c0 - c_{r-1})/c_r }. Indeed, due to U_{r-2}, any distance vector d in W_i, for 1 <= i <= r-2, satisfies Td > 0 lexicographically. It is also clear that any d in W_r, say (0, 0, ..., 0, d_r), will be transformed through T into (0, 0, ..., 0, d_r, 0), which is lexicographically positive. The rest is to show that for any d in W_{r-1}, Td is lexicographically positive. We leave this to readers. Note that d = (0, 0, ..., 0, d_{r-1}, d_r) can be assumed; to show that Td is lexicographically positive, one just needs to show d_{r-1} u + d_r > 0.

Consider Example 4.1 again. We have already known from the preceding section the following linear systems: S_f: d1 - d3 + 1 = 0 and S_a: d1 - d3 - 1 = 0. Hence, the transformation matrix is

    T = | 1 0 0 |
        | 0 2 1 |
        | 0 1 0 |

The transformed program is as follows.

Example 4.1 (continued):

    do I1 = 0, n-1
      do I2 = 0, n-1
        do I3 = 0, n-1
          A[2I1 + I2 + 2I3 + 2, 2I1 + 2I2 + 2I3 + 1, 3I1 + 2I2 + I3] = ... = h(A[I1 + I3, I1 + I2 + I3, 2I1 + I2])
        enddo
      enddo
    enddo

Corollary 5.2. Let L be an r-deep nested loop with m-dimensional array reference. There exists a loop transformation T that produces a parallelizable innermost loop if rank(F) < rank(F_r) and rank(F) < rank((F, V)).

The conclusion below follows Corollary 5.1 and Corollary 5.2 immediately.

Theorem 5.1. Let L be an r-deep nested loop with m-dimensional array reference. A parallelizable innermost loop is available, either directly or through loop transformation, if rank(F) < rank((F_r, V)).

2) Parallelizing Outermost Loops

For MIMD machines, higher-level parallelism is desirable since the synchronization overhead can be reduced. Exploring the coarsest-grain parallelism, that is, making the outermost loop parallelizable, is particularly interesting. It is known that a parallelizable outermost loop is readily available if there is no data dependence carried by loop I1, or that a parallelizable outermost loop can be obtained through transformation if the distance vectors of a nested loop do not span the entire iteration space [2,3].


Lemma 5.3. Let L be an r-depth nested loop with m-dimensional array reference. The outermost loop is parallelizable if rank((F_2, V)) < rank(F_1).

Indeed, when this condition holds, according to the results described in the previous section, data dependence can be approximated by a linear system of form II_1, that is, d1 = 0. This implies that there is no data dependence carried by loop I1.

When this condition does not hold, we check the spanning space of the distance vectors. It is quite clear that if data dependence can be approximated by linear systems of form I, i.e., c1 d1 + c2 d2 + ... + c_r d_r = 0, the distance vectors span a space with dimension less than r. Applying Theorem 4.2, we have the following result.

Lemma 5.4. Let L be an r-deep nested loop with m-dimensional array reference. There is a transformation T that produces a parallelizable outermost loop if rank((F, V)) < r.

Both non-unimodular and unimodular transformation matrices should be available. The way to obtain a unimodular matrix has been introduced in [2,3]. Here we just give the construction of a non-unimodular matrix. It is an important advantage of the linear system representation that a non-unimodular transformation matrix can be obtained directly from the coefficients of a system.

Theorem 4.5. Let sum_{j=1..q} c_j d_j = 0, where 1 <= q <= r and c_q != 0, be a system of form I_q of a nested loop. The r x r transformation matrix

    T = | c1 c2 ... c_{q-1} cq  0 ... 0 |
        |  1  0 ...    0     0  0 ... 0 |
        |  0  1 ...    0     0  0 ... 0 |
        |  .  .        .     .  .     . |
        |  0  0 ...    1     0  0 ... 0 |
        |  0  0 ...    0     0  1 ... 0 |
        |  .  .        .     .  .     . |
        |  0  0 ...    0     0  0 ... 1 |

(the first row consists of the coefficients c1, ..., cq followed by zeros; the next q-1 rows are the unit vectors e1, ..., e_{q-1}; the last r-q rows are the unit vectors e_{q+1}, ..., e_r) produces a parallelizable outermost loop.

To prove the legitimacy of T, we need to verify that d' = Td is lexicographically positive for any d in D. Indeed, for any distance vector (d1, d2, ..., dr) in W1, d' is of the form (0, d1, d2, ...), which is lexicographically positive since d1 >= 1. For any distance vector (d1, d2, ..., dr) in Wj, 2 <= j <= q-1, d' is of the form (0, 0, ..., 0, dj, ...), which is lexicographically positive since dj > 0. Note that the set Wq is empty. For any distance vector (d1, d2, ..., dr) in Wj, q+1 <= j <= r, d' is of the form (0, 0, ..., 0, dj, ...), which is also lexicographically positive since dj > 0. Finally, we have already seen that the first component of the transformed distance vector is zero. Hence, T produces a parallelizable outermost loop.

As an application, let us continue with Example 1.1. The coefficient matrix (F, V) has rank 1. Using the technique described in the preceding section, we have the linear system of dependence d2 + 2d3 = 0. Therefore, the desired transformation matrix is

    T = | 0 1 2 |
        | 1 0 0 |
        | 0 1 0 |

and the transformed program is as follows.

Example 1.1 (continued):

    do I1 = 0, n-1
      do I2 = 0, n-1
        do I3 = 0, n-1
          S[I1 + I2 + I3 + 1, I1 + 2I2 + 3I3 + 1] = ... = h(S[I1 + I2 + I3, I1 + 2I2 + 3I3])
        enddo
      enddo
    enddo

6 Linear Systems of Dependence from Iteration Space and Index Functions as Well

To illustrate our idea about how to consider the effect of index functions and iteration space together, let us first study Example 6.1.

Example 6.1:

    do I1 = 0, n-1
      do I2 = I1 + 1, n-1
        S[I1, I2] = h(S[I2, I1])
      enddo
    enddo

Since the matrix

    F = |  1 -1 |
        | -1  1 |

has rank 1, we can apply the technique developed in the previous section as follows. It is not hard to see that the distance vectors, if there are any, can be approximated by the linear system d1 + d2 = 0. Therefore, innermost-loop-level and outermost-loop-level parallelism can be explored. However, the parallelism is not limited to these two levels. In fact, every loop level can be executed in a parallel fashion since there is no data dependence at all. To see this, the information provided by the iteration space must be considered.

Suppose (i1, i2) and (i1', i2') are two dependent iterations. Then they must satisfy i1 = i2' and i2 = i1'. On the other hand, by looking at the functions of the loop limits, we realize that the indices must also satisfy i2 > i1 and i2' > i1'. Combining equation i2 = i1' with inequality i2 > i1, we have i1' - i1 > 0; combining the other two relations we have i2' - i2 > 0. These imply d1 > 0 and d2 > 0. Recall that we obtained d1 + d2 = 0 from the index functions. We conclude from the inconsistency among these relations that there is no dependence in this double loop.

In the above example, besides the linear system obtained from the index functions, we obtained two inequalities, d1 > 0 and d2 > 0, from the combination of the index functions and the iteration space. These inequalities of dependence provide additional information about data dependence. In order to obtain inequalities of dependence, a geometric approach is adopted below. We view the iteration space as a polytope, and view the index functions as hyperplanes. The inequalities of dependence are obtained by analyzing the geometric relation between the polytope and the hyperplanes. For the sake of simplicity we discuss only one-dimensional array references here; the approach can be easily extended to multi-dimensional arrays.

Let f(I) = sum_{k=1..r} a_k I_k + a0 and g(I) = sum_{k=1..r} b_k I_k + b0 be two index functions of an r-depth nested loop with iteration space R. Since the loop limits are formed with linear functions of the induction variables, the iteration space R defined by these functions is a polytope from a geometric point of view. A hyperplane is defined from the index functions: f(I) - g(I) = 0, i.e., sum_{k=1..r} (a_k - b_k) I_k + a0 - b0 = 0. We say that hyperplane f(I) - g(I) = 0 has no intersection with iteration space R if every instance of iterations in R, say (i1, ..., ir), satisfies either sum_{k=1..r} (a_k - b_k) i_k + a0 - b0 > 0 or sum_{k=1..r} (a_k - b_k) i_k + a0 - b0 < 0. We say that the hyperplane is above iteration space R if the former is true; otherwise we say that this hyperplane is below iteration space R. The following theorem tells us when the inequalities of dependence can be obtained.

Theorem 6.1. Suppose the hyperplane f(I) - g(I) = 0 of an r-depth nested loop has no intersection with iteration space R. If the hyperplane is above R, flow dependencies of the loop, if they exist, can be approximated by

    sum_{k=1..r} b_k d_k > 0                    (5)

and

    sum_{k=1..r} a_k d_k > 0,                   (6)

and anti-dependencies of the loop, if they exist, can be approximated by

    sum_{k=1..r} b_k d_k < 0                    (7)

and

    sum_{k=1..r} a_k d_k < 0.                   (8)

If the hyperplane is below R, flow dependencies of the loop, if they exist, can be approximated by

    sum_{k=1..r} b_k d_k < 0                    (9)

and

    sum_{k=1..r} a_k d_k < 0,                   (10)

and anti-dependencies of the loop, if they exist, can be approximated by

    sum_{k=1..r} b_k d_k > 0                    (11)

and

    sum_{k=1..r} a_k d_k > 0.                   (12)

Proof. The proof is quite straightforward. We only show Inequalities (5) and (6); the others can be handled similarly. Suppose flow dependence exists. Any pair of dependent iterations, say i and i', should satisfy

    sum_{k=1..r} a_k i_k + a0 = sum_{k=1..r} b_k i_k' + b0.    (13)


On the other hand, since the hyperplane f(I) - g(I) = 0 is above R, i and i' should further satisfy

    sum_{k=1..r} (a_k - b_k) i_k + a0 - b0 > 0      (14)

and

    sum_{k=1..r} (a_k - b_k) i_k' + a0 - b0 > 0.    (15)

Inequality (5) is obtained by combining Equation (13) with (14); Inequality (6) is obtained by combining Equation (13) with (15).

6.1 Applications

Theorem 6.2. Suppose L is an r-depth nested loop with m-dimensional array reference and R its iteration space, and f_k(I) = sum_{j=1..r} a_{k,j} I_j + a0 and g_k(I) = sum_{j=1..r} b_{k,j} I_j + b0, for 1 <= k <= m, are the index functions. If there is a hyperplane f_k(I) - g_k(I) = 0 which has no intersection with R, then

1. the innermost loop I_r is parallelizable if a_{k,r} = b_{k,r} != 0;
2. the loops I_p, for q <= p <= r, are parallelizable if either a_{k,p} = 0, for q <= p <= r, or b_{k,p} = 0, for q <= p <= r.

Proof. We assume that hyperplane f_k(I) - g_k(I) = 0 is above R; the other case can be handled similarly. If a_{k,r} = b_{k,r} != 0, the flow dependence is approximated by sum_{j=1..r} a_{k,j} d_j > 0 and sum_{j=1..r} b_{k,j} d_j > 0, and the anti-dependence is approximated by sum_{j=1..r} a_{k,j} d_j < 0 and sum_{j=1..r} b_{k,j} d_j < 0, according to Theorem 6.1. Obviously, no dependence of the form (0, 0, ..., 0, d_r) can satisfy these inequalities, and W_r is empty. Hence, I_r is parallelizable. If, for instance, a_{k,p} = 0 for q <= p <= r, we then have sum_{j=1..q-1} a_{k,j} d_j > 0 for flow dependence and sum_{j=1..q-1} a_{k,j} d_j < 0 for anti-dependence, which implies that the sets W_p, for q <= p <= r, are empty. Hence, the loops I_p, for q <= p <= r, are parallelizable.

7 Conclusion

We have presented in this paper a new technique for generating loop transformation matrices. In our approach, it is not necessary to compute distance vectors or direction vectors. Instead, linear equations or inequalities of distance vectors are used to approximate data dependence. This method is effective in dealing with nested loops of any depth and is capable of exploiting parallelism of nested loops at different levels. Our future research will focus on finding an efficient algorithm to determine whether a hyperplane is above, below, or intersecting the iteration space R, and on finding a general way to fully utilize the information provided by the inequalities in loop transformation.

References

[1] Banerjee U. Unimodular transformations of double loops. In Advances in Languages and Compilers for Parallel Processing, Nicolau A, Gelernter D, Gross T, Padua D (eds.), The MIT Press, 1991, pp. 192-219.

[2] Wolf M E, Lam M S. A data locality optimizing algorithm. In Proc. ACM SIGPLAN'91 Conf. on Programming Language Design and Implementation, June 1991, pp. 30-44.

[3] Wolf M E, Lam M S. A loop transformation theory and an algorithm to maximize parallelism. IEEE Trans. on Parallel and Distributed Systems, 1991, 2(2).

[4] Maslov V. Delinearization: An efficient way to break multiloop dependence equations. In ACM SIGPLAN'92 Conf. on Programming Languages Design and Implementation, June 1992.

[5] Allen J R, Kennedy K. Automatic translation of Fortran programs to vector form. ACM Trans. on Programming Languages and Systems, 1987, 9(4): 491-542.

[6] Banerjee U. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, Norwell, Mass., 1988.

[7] Fang Z, Lu M, Lin H. An approach to solve the cache thrashing problem. In Proc. of 5th Int'l Parallel Processing Symp., Anaheim, CA, April 1991, pp. 330-335.

[8] Fang Z, Lu M. An iteration partition approach for cache and local memory thrashing on parallel processing. IEEE Trans. on Comput., 1993, 42(5): 529-546.

[9] Fang Z, Yew P C, Tang C, Zhu C. Dynamic processor self-scheduling for general parallel nested loops. IEEE Trans. on Comput., 1990, 39(7): 919-929.

[10] Gannon D, Jalby W, Gallivan K. Strategies for cache and local memory management by global program transformation. J. Parallel and Distributed Comput., 1988, 5: 587-616.

[11] Goff G, Kennedy K, Tseng C. Practical dependence testing. In ACM SIGPLAN'91 Conf. on Programming Languages Design and Implementation, June 1991.

[12] Haghighat M, Polychronopoulos C. Symbolic dependence analysis for high performance parallelizing compilers. In Advances in Languages and Compilers for Parallel Processing, Nicolau A, Gelernter D, Gross T, Padua D (eds.), The MIT Press, 1991, pp. 310-330.

[13] Li Z, Yew P C, Zhu C. An efficient data dependence analysis for parallelizing compilers. IEEE Trans. on Parallel and Distributed Systems, 1990, 1(1).

[14] Maydan D E, Hennessy J L, Lam M S. Efficient and exact data dependence analysis. In ACM SIGPLAN'91 Conf. on Programming Languages Design and Implementation, June 1991.

[15] Padua D A, Wolfe M J. Advanced compiler optimizations for supercomputers. Communications of the ACM, 1986, 29(12).

[16] Shen Z, Li Z, Yew P C. An empirical study on array subscripts and data dependence. IEEE Trans. on Parallel and Distributed Systems, 1991, 2: 145-150.

[17] Tzen T H, Ni L M. Dependence uniformization: A loop parallelization technique. IEEE Trans. on Parallel and Distributed Systems, 1993, 4(5).

[18] Wolfe M J. Optimizing Supercompilers for Supercomputers. Cambridge, MA: MIT Press, 1989.

LIN Hua received his B.S. and M.S. degrees in electrical engineering from Fudan University, People's Republic of China, in 1983 and 1986, respectively. Beginning in 1986, he taught for three years in the Department of Electronics Engineering at Fudan University as a Lecturer. He is currently a Ph.D. candidate in the Department of Electrical Engineering at Texas A&M University, College Station, TX, USA. His research interests include the design and analysis of parallel algorithms for combinatorial optimization problems, and parallelizing compilers.

LU Mi received her M.S. and Ph.D. degrees in electrical engineering from Rice University, Houston, TX, USA, in 1984 and 1987, respectively.

She joined the Department of Electrical Engineering, Texas A&M University in 1987, where she is currently an Associate Professor. Her research interests include parallel computing, distributed processing, parallel computer architectures and applications, computational geometry and VLSI algorithms. She has published over 60 technical papers in these areas. Her research has been funded by the National Science Foundation and the Texas Advanced Technology Program.

Dr. LU is a senior member of the IEEE Computer Society. She is an Associate Editor of a number of professional journals, and the Stream Chairman of the 7th International Conference on Computing and Information. She served on the panels of NSF and the IEEE Workshop on Imprecise and Approximate Computation'92, and on the Program Committees of the International Conference on Computing and Information'94, the Joint Conference on Information Science'94 and the IASTED-ISMM International Conference on Parallel and Distributed Computing and Systems'95. She is a registered professional engineer.

Jesse Z. FANG received his B.S. degree in mathematics from Fudan University, Shanghai, China, and his M.S. and Ph.D. degrees in computer science from The University of Nebraska, Lincoln, in 1982 and 1984, respectively.

After graduation, he taught in the Computer Science Department at Wichita State University and was a visiting senior computer scientist in the Center for Supercomputing Research and Development at the University of Illinois, Urbana-Champaign. From 1986 to 1989, he was a consultant member of technical staff at Concurrent Computer Corp. From 1989 to 1991, he worked on parallel/vectorized compiler and supercomputing system design for CONVEX Computer Corp. as a program manager in the Software Development Department. He is currently working at Hewlett-Packard Laboratories as a project manager, developing compilers for Hewlett-Packard's new generation RISC architecture. His research interests are instruction-level parallel compiler technologies on RISC architectures, superscalar and VLIW RISC architectures, parallel processing systems, parallel/vectorized compilers, scheduling and synchronization.