
CMPUT680 - Winter 2006

Topic C: Loop Fusion

Kit Barton

www.cs.ualberta.ca/~cbarton

March 14, 2002


Outline

• Definition of loop fusion

• Basic concepts

• Prerequisites of loop fusion

• A loop fusion algorithm

• Example


Loop Fusion

• Combine 2 or more loops into a single loop

• This cannot violate any dependencies between the loop bodies

• Several conditions must be met before fusion can occur

• Often these conditions are not initially satisfied
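A minimal sketch of the idea, in Python rather than the notation used on the slides. The arrays `a`, `b`, `c` and the function names are hypothetical; the two loops are conforming and adjacent, and the second loop only reads the element that the first loop wrote in the same iteration, so fusing them should preserve the result.

```python
def separate(c):
    n = len(c)
    a = [0] * n
    b = [0] * n
    for i in range(n):          # loop 1
        a[i] = c[i] + 10
    for j in range(n):          # loop 2: reads a[j], written by iteration j of loop 1
        b[j] = a[j] * 2
    return a, b


def fused(c):
    n = len(c)
    a = [0] * n
    b = [0] * n
    for i in range(n):          # one loop, one increment and branch per iteration
        a[i] = c[i] + 10
        b[i] = a[i] * 2         # a[i] is reused right after it is produced
    return a, b


if __name__ == "__main__":
    print(separate([1, 2, 3]) == fused([1, 2, 3]))
```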


Advantages of Loop Fusion

• Save increment and branch instructions

• Creates opportunities for data reuse

• Provide more instructions to instruction scheduler to balance the use of functional units


Disadvantages of Loop Fusion

• Increase code size, affecting instruction cache performance

• Increase register pressure within a loop

• Could cause the formation of loops with more complex control flow


Background

• There has been extensive work done on loop fusion

• Most has focused on weighted loop fusion (Gao et al., Kennedy and McKinley, Megiddo and Sarkar)

• Extensive work has also been done on performing loop fusion to increase parallelism


Weighted Loop Fusion

• Associates non-negative weights with each pair of loop nests

• Weights are a measurement of the expected gain if the two loops are fused

• Gains include potential for array contraction, data reuse and improved local register allocation


Optimal Loop Fusion

• Fuse loops to optimize data reuse, taking into consideration resource constraints and register usage

• This problem is NP-Hard


Maximal Loop Fusion

• Our approach is to perform maximal loop fusion

• Fuse as many loops as possible, without considering resource constraints

• Fuse loops as soon as possible, not considering the consequences


Dominators and Post Dominators

• A node x in a directed graph G with a single entry node dominates node y in G if every path from the entry node of G to y must pass through x

• A node x in a directed graph G with a single exit node post-dominates node y in G if every path from y to the exit node of G must pass through x

(Allen & Kennedy, p. 150, 353)
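The two definitions can be computed with the standard iterative data-flow formulation. The sketch below is one possible way to do it, not the method used in the lecture; the function names and the successor-map representation of the CFG are assumptions, and every node is assumed to appear as a key of the map.

```python
def dominators(succ, entry):
    """Return {node: set of nodes that dominate it} for a CFG given as a successor map."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    dom = {n: set(nodes) for n in nodes}   # start from "everything dominates everything"
    dom[entry] = {entry}
    changed = True
    while changed:                         # shrink the sets until a fixed point is reached
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in pred[n]]
            new = (set.intersection(*incoming) if incoming else set()) | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def post_dominators(succ, exit_node):
    """Post-dominators are the dominators of the reversed CFG, rooted at the exit node."""
    reverse = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            reverse[t].append(n)
    return dominators(reverse, exit_node)


# Hypothetical diamond-shaped CFG: entry branches to a or b, both reach exit.
cfg = {"entry": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []}
dom = dominators(cfg, "entry")
pdom = post_dominators(cfg, "exit")
```

In this diamond, neither `a` nor `b` dominates `exit`, because a path through the other branch avoids it.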


Requirements for Loop Fusion

i. Loops must have identical iteration counts (be conforming)

ii. Loops must be control-flow equivalent

iii. Loops must be adjacent

iv. There cannot be any negative distance dependencies between the loops


Non-conforming Loops

• If iteration counts are different, one loop must be manipulated to make the iteration counts the same

1. Loop peeling

2. Introduce a guard into one of the loops


Loop Peeling

• Find the difference between the iteration count of the two loops (n)

• Duplicate the body of the loop with the higher iteration count n times

• Update the iteration count of the peeled loop


Loop Peeling Example

Before:

while (i < 10)
{
    a[i] = a[i - 1] * 2;
    i++;
}
while (j < 12)
{
    b[j] = b[j - 1] - 2;
    j++;
}

After peeling two iterations from the second loop:

while (i < 10)
{
    a[i] = a[i - 1] * 2;
    i++;
}
while (j < 10)
{
    b[j] = b[j - 1] - 2;
    j++;
}
b[j] = b[j - 1] - 2;
j++;
b[j] = b[j - 1] - 2;
j++;
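The peeling example can be checked with a small Python sketch. The starting index `j = 1` and the array contents are assumptions (the slide does not show them); the function names are hypothetical.

```python
def original(b):
    b = list(b)
    j = 1                        # assumed starting value
    while j < 12:
        b[j] = b[j - 1] - 2
        j += 1
    return b


def peeled(b):
    b = list(b)
    j = 1                        # assumed starting value
    while j < 10:                # bound now matches the first loop
        b[j] = b[j - 1] - 2
        j += 1
    # 12 - 10 = 2 peeled iterations, executed after the loop
    b[j] = b[j - 1] - 2
    j += 1
    b[j] = b[j - 1] - 2
    j += 1
    return b


data = [100] + [0] * 11
same = original(data) == peeled(data)
```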


Guarding Iterations

• Increase the iteration count of the loop with fewer iterations

• Insert a guard branch around statements that would not normally be executed


Guarding Iterations Example

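Before:

while (i < 10)
{
    a[i] = a[i - 1] * 2;
    i++;
}
while (j < 12)
{
    b[j] = b[j - 1] - 2;
    j++;
}

After guarding the first loop:

while (i < 12)
{
    if (i < 10)
    {
        a[i] = a[i - 1] * 2;
    }
    i++;    /* incremented outside the guard so that the loop terminates */
}
while (j < 12)
{
    b[j] = b[j - 1] - 2;
    j++;
}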
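A Python sketch of the guarded version, taken one step further than the slide by also fusing the two loops. Starting indices of 1 and the array contents are assumptions; the function names are hypothetical.

```python
def separate(a, b):
    a, b = list(a), list(b)
    i = 1
    while i < 10:
        a[i] = a[i - 1] * 2
        i += 1
    j = 1
    while j < 12:
        b[j] = b[j - 1] - 2
        j += 1
    return a, b


def guarded_and_fused(a, b):
    a, b = list(a), list(b)
    i = 1
    while i < 12:                # extended to the larger iteration count
        if i < 10:               # guard: skip the iterations the first loop never had
            a[i] = a[i - 1] * 2
        b[i] = b[i - 1] - 2
        i += 1                   # increment stays outside the guard
    return a, b


a0 = [1] + [0] * 9
b0 = [100] + [0] * 11
same = separate(a0, b0) == guarded_and_fused(a0, b0)
```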


Loop Peeling

• Advantage:
  o Does not generate control flow within a loop body

• Disadvantage:
  o Generates additional code outside of loops, which could possibly intervene with other loops


Guarding Iterations

• Advantages:
  o Does not introduce intervening code
  o Can be “undone” later

• Disadvantage:
  o Generates control flow within a loop


Control Flow Equivalence

• Two loops are control-flow equivalent if, whenever one executes, the other is also guaranteed to execute

[Figure: two example CFGs, one containing Loop 1, BB and Loop 2, the other containing Loop 1, Loop 3, BB and Loop 2]


Determining Control Flow Equivalence

• Use the concepts of dominators and post dominators. Two loops L1 and L2 are control-flow equivalent if the following two conditions are true:
  o L1 dominates L2; and
  o L2 post dominates L1.
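The two conditions translate directly into a check on dominator and post-dominator sets. The following Python sketch is an illustration under assumptions: each loop is collapsed into a single CFG node, the CFG is a successor map with one entry and one exit, and all names (`control_flow_equivalent`, the example graphs) are hypothetical.

```python
def _dominators(succ, root):
    """Iterative dominator sets for a graph given as {node: [successors]}."""
    nodes = set(succ)
    pred = {n: {p for p in nodes if n in succ[p]} for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            incoming = [dom[p] for p in pred[n]]
            new = (set.intersection(*incoming) if incoming else set()) | {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def control_flow_equivalent(succ, entry, exit_node, l1, l2):
    dom = _dominators(succ, entry)
    reverse = {n: [p for p in succ if n in succ[p]] for n in succ}
    pdom = _dominators(reverse, exit_node)      # post-dominators via the reversed CFG
    return l1 in dom[l2] and l2 in pdom[l1]     # L1 dom L2 and L2 pdom L1


# Straight-line case: L1, then a basic block, then L2.
straight = {"entry": ["L1"], "L1": ["BB"], "BB": ["L2"], "L2": ["exit"], "exit": []}
# Branching case: after L1, only one of the two paths reaches L2.
branch = {"entry": ["L1"], "L1": ["L3", "BB"], "L3": ["L2"],
          "BB": ["exit"], "L2": ["exit"], "exit": []}
```

In the branching graph L1 still dominates L2, but L2 does not post-dominate L1, so the pair is rejected.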


Intervening Code

• Two loops are adjacent if there are no statements between the two loops

• Can be determined using the CFG:
  o If the immediate successor of the first loop is the second loop, the two loops are adjacent

• If two loops are not adjacent, there is intervening code between them.


Dealing with Non-Adjacent Loops

• If two loops are not adjacent, we attempt to make them adjacent by moving the intervening code

• Intervening code can be moved, as long as no data dependencies are violated:
  o Above the first loop
  o Below the second loop
  o Both


Intervening Code Example

• Assume CFG has 20 nodes

• 0-5 are above Loop 1

• 17-19 are below Loop 2

• What algorithm should be used to determine which nodes are between Loop 1 and Loop 2?

[Figure: CFG containing Loop 1, Loop 2, and nodes 6 through 16]


Gathering Intervening Code

• Given two loops L1 and L2, a basic block B is intervening code between L1 and L2 if and only if:
  o B is strictly dominated by L1
  o B is not dominated by L2

• Once the dominance relations are known, the set subtraction can be efficiently computed using bit vectors


Intervening Code Example

[Figure: CFG containing Loop 1, Loop 2, and nodes 6 through 16]

Loop 1:      0000 0011 1111 1111 1111 1
Loop 2:      0000 0000 0000 0000 1111 1
Difference:  0000 0011 1111 1111 0000 0
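The set subtraction amounts to one AND and one NOT per machine word. A small Python sketch using the bit vectors from this example; the helper names `parse` and `show` are hypothetical, and Python integers stand in for fixed-width bit vectors.

```python
def parse(bits):
    """Turn a spaced bit string such as '0000 0011' into an integer."""
    return int(bits.replace(" ", ""), 2)


def show(value, width):
    """Format an integer as a bit string in groups of four, as on the slide."""
    s = format(value, "0{}b".format(width))
    return " ".join(s[k:k + 4] for k in range(0, width, 4))


WIDTH = 21
loop1 = parse("0000 0011 1111 1111 1111 1")   # blocks dominated by Loop 1
loop2 = parse("0000 0000 0000 0000 1111 1")   # blocks dominated by Loop 2

# Set difference: dominated by Loop 1 AND NOT dominated by Loop 2.
intervening = loop1 & ~loop2
result = show(intervening, WIDTH)
```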


Analyze Intervening Code

• Build a DDG of the intervening code

• Put all nodes with no predecessors into queue

• For each node in the queue:
  o If there are no dependencies between the node and the loop:
    - Mark node as moveable
    - Add all of the node's immediate successors to the queue

• All nodes marked can be moved around the loop
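The worklist above can be sketched in Python. This is an illustration under assumptions, not the lecture's implementation: statements are modelled by hypothetical read/write sets, a dependence with the loop is approximated as any read/write or write/write overlap, and the sketch additionally waits until every DDG predecessor of a node is moveable (a condition the slide does not state). The statements are those of the non-adjacent loops example from these slides.

```python
from collections import deque


def moveable_nodes(nodes, ddg_succ, loop_reads, loop_writes):
    """nodes: {name: (reads, writes)}; ddg_succ: {name: [dependent node names]}."""
    preds = {n: set() for n in nodes}
    for n, targets in ddg_succ.items():
        for t in targets:
            preds[t].add(n)

    def depends_on_loop(name):
        reads, writes = nodes[name]
        return bool(writes & (loop_reads | loop_writes)) or bool(reads & loop_writes)

    queue = deque(n for n in nodes if not preds[n])   # nodes with no predecessors
    moveable = []
    while queue:
        n = queue.popleft()
        if n in moveable or depends_on_loop(n):
            continue
        # Assumption beyond the slide: all DDG predecessors must already be moveable.
        if not preds[n] <= set(moveable):
            continue
        moveable.append(n)
        queue.extend(ddg_succ.get(n, []))             # immediate successors
    return moveable


stmts = {
    "b": ({"a"}, {"b"}),          # b := a * 2
    "c": ({"b"}, {"c"}),          # c := b + 6
    "g": (set(), {"g"}),          # g := 0
    "h": ({"g"}, {"h"}),          # h := g + 10
    "if": ({"c"}, {"d", "e"}),    # if (c < 100) d := c/2 else e := c * 2
}
ddg = {"b": ["c"], "c": ["if"], "g": ["h"]}

around_loop1 = moveable_nodes(stmts, ddg, {"i", "N", "a"}, {"a", "i"})
around_loop2 = moveable_nodes(stmts, ddg, {"j", "N", "g"}, {"f", "j"})
```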


Non-Adjacent loops example

while (i < N) {
    a += i;
    i++;
}
b := a * 2;
c := b + 6;
g := 0;
h := g + 10;
if (c < 100)
    d := c/2;
else
    e := c * 2;
while (j < N) {
    f := g + 6;
    j++;
}

Intervening code:

b := a * 2;
c := b + 6;
g := 0;
if (c < 100)
    d := c/2;
else
    e := c * 2;
h := g + 10;


Non-Adjacent loops example

Before:

while (i < N) {
    a += i;
    i++;
}
b := a * 2;
c := b + 6;
g := 0;
h := g + 10;
if (c < 100)
    d := c/2;
else
    e := c * 2;
while (j < N) {
    f := g + 6;
    j++;
}

After moving the intervening code:

g := 0;
h := g + 10;
while (i < N) {
    a += i;
    i++;
}
while (j < N) {
    f := g + 6;
    j++;
}
b := a * 2;
c := b + 6;
if (c < 100)
    d := c/2;
else
    e := c * 2;


Non-Adjacent loops example

DDG nodes:

b := a * 2;
c := b + 6;
g := 0;
if (c < 100)
    d := c/2;
else
    e := c * 2;
h := g + 10;

Node Queue:

b := a * 2;
g := 0;
c := b + 6;
if (c < 100)
    d := c/2;
else
    e := c * 2;

Moveable Nodes (with respect to Loop 2):

b := a * 2;
c := b + 6;
if (c < 100)
    d := c/2;
else
    e := c * 2;

Loop 2:

while (j < N) {
    f := g + 6;
    j++;
}


Non-Adjacent loops example

DDG nodes:

b := a * 2;
c := b + 6;
g := 0;
if (c < 100)
    d := c/2;
else
    e := c * 2;
h := g + 10;

Node Queue:

b := a * 2;
g := 0;
h := g + 10;

Moveable Nodes (with respect to Loop 1):

g := 0;
h := g + 10;

Loop 1:

while (i < N) {
    a += i;
    i++;
}


Dependencies Preventing Fusion

Can the following loops be fused?

i = j = 1;
while (i < 10)
{
    a[i] = c[i] + 10;
    i++;
}
while (j < 10)
{
    b[j] = a[j+1] * 2;
    j++;
}


Dependencies Preventing Fusion

• If we look at the array access patterns of a[], we see the following

a[i] = c[i] + 10;

b[j] = a[j+1] * 2;


Dependencies Preventing Fusion

• By aligning the array access patterns, we get the following:

a[i] = c[i] + 10;

b[j] = a[j+1] * 2;
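Why these two statements cannot simply be placed in one loop body can be seen by running both versions. In the Python sketch below the array sizes and the zero-initialised contents of `a` are assumptions; the function names are hypothetical. Iteration j of the second loop reads `a[j+1]`, which the first loop writes in iteration j+1, so the dependence has distance -1.

```python
def separate(c):
    a = [0] * 11
    b = [0] * 11
    for i in range(1, 10):
        a[i] = c[i] + 10
    for j in range(1, 10):
        b[j] = a[j + 1] * 2       # every a[j+1] with j+1 <= 9 is already written
    return b


def naive_fusion(c):
    a = [0] * 11
    b = [0] * 11
    for i in range(1, 10):
        a[i] = c[i] + 10
        b[i] = a[i + 1] * 2       # reads a[i+1] one iteration before it is written
    return b


c = list(range(11))
differs = separate(c) != naive_fusion(c)
```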


Loop Alignment

Before:

i = j = 1;
while (i < 10)
{
    a[i] = c[i] + 10;
    i++;
}
while (j < 10)
{
    b[j] = a[j+1] * 2;
    j++;
}

After aligning the first loop:

j = 1;
i = 2;
a[1] = c[1] + 10;
while (i < 10)
{
    a[i] = c[i] + 10;
    i++;
}
while (j < 10)
{
    b[j] = a[j+1] * 2;
    j++;
}
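Once the first loop runs one iteration ahead, the two bodies can share a loop. The Python sketch below is an illustration under assumptions (array sizes, zero-initialised `a`, hypothetical function names). Because the aligned loops no longer have the same iteration count, the sketch also executes the last iteration of the second loop after the fused loop; that step is an addition made here, not something shown on the slide.

```python
def separate(c):
    a = [0] * 11
    b = [0] * 11
    for i in range(1, 10):
        a[i] = c[i] + 10
    for j in range(1, 10):
        b[j] = a[j + 1] * 2
    return a, b


def aligned_fusion(c):
    a = [0] * 11
    b = [0] * 11
    a[1] = c[1] + 10              # first iteration of loop 1, hoisted out
    i, j = 2, 1
    while i < 10:                 # loop 1 now runs one iteration ahead of loop 2
        a[i] = c[i] + 10
        b[j] = a[j + 1] * 2       # a[j+1] is a[i], written just above
        i += 1
        j += 1
    b[j] = a[j + 1] * 2           # leftover iteration j = 9 of loop 2
    return a, b


c = list(range(11))
same = separate(c) == aligned_fusion(c)
```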


Loop Alignment

• Loop alignment can be used to remove dependencies between loop bodies

• Easy to do when all dependencies have the same distance

• Gets tricky when there are multiple dependencies with different distances


Putting it all together

• We’ve seen ways to deal with each of the preconditions of loop fusion

• If the conditions are not met, we apply transformations to try to modify the code so that they are

• If the transformations are successful, loop fusion can occur

• But in what order should these transformations be applied?


Loop Fusion Algorithm

For each Ni from outermost to innermost:
    Gather control equivalent loops in Ni into LoopSets
    For each set Si in LoopSets
        remove non-eligible loops from Si
        FusedLoops = true
        Direction = forward
        while FusedLoops == true
            if |Si| < 2 break
            Compute Dominance Relation
            FusedLoops = LoopFusionPass(Si, Direction)
            Reverse Direction


Loop Fusion Algorithm

LoopFusionPass(S, Direction)
    FusedLoops = false
    For each pair of loops Lj and Lk in S such that Lj dominates Lk in Direction
        if (DependenceDistance(Lj, Lk) < 0) continue
        if (InterveningCode(Lj, Lk) == true and
            IsInterveningCodeMoveable(Lj, Lk) == false) continue
        d = | IterationCount(Lj) - IterationCount(Lk) |
        if (Lj and Lk are non-conforming and
            (d cannot be determined at compile time or d > MAXPEEL)) continue
        if (Lj and Lk are non-conforming) Peel iterations
        MoveInterveningCode(Lj, Lk)
        if InterveningCode(Lj, Lk) == false
            FuseLoops(Lj, Lk)
            FusedLoops = true
    Return FusedLoops


Example

L1: do i1 = 1, n
      a(i1) = a(i1) * k1
    end do
L2: do i2 = 1, n-1
      d(i2) = a(i2) - b(i2+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do

Loop Set: L1, L2, L3, L4


Peeling Loop 1

After peeling one iteration from L1:

S7: a(1) = a(1) * k1
L1: do i1 = 1, n-1
      a(i1+1) = a(i1+1) * k1
    end do
L2: do i2 = 1, n-1
      d(i2) = a(i2) - b(i2+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do


Fuse L1 and L2

After fusing L1 and L2 into L5:

S7: a(1) = a(1) * k1
L5: do i5 = 1, n-1
      a(i5+1) = a(i5+1) * k1
      d(i5) = a(i5) - b(i5+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do


Compare L5 and L3

• We now compare loops L5 and L3

• They are not adjacent, but the intervening code can move

• The difference in iteration counts (n-1 versus m) is not known at compile time, so fusion fails



Compare L5 and L4

Intervening Code:

S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m


Peel L5

After peeling one iteration from L5:

S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do


Move Intervening Code

After moving S1 and S2-S5 above L5:

S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do


Reverse Pass

Loop Set: L5, L3, L4

Sorted in Reverse Dominance Direction: L4, L3, L5


Compare L4 and L3

• No dependencies to prevent fusion

• Iteration count cannot be determined at compile time

• Fusion fails


Compare L4 and L5

Intervening Code:

L3: do i3 = 1, m
      ds = ds + d(i3)
    end do


Move Intervening Code

After moving L3 below L4:

S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do


Fuse L4 and L5

After fusing L5 and L4 into L6:

S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L6: do i6 = 1, n-2
      a(i6+2) = a(i6+2) * k1
      d(i6+1) = a(i6+1) - b(i6+2) * k2
      b(i6) = a(i6) + b(i6) / c(i6)
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do