basic loop parallelization

17
Basic Loop Parallelization Basic Loop Parallelization Taewook Eom RTOS Lab., School of EECS, SNU

Upload: jett

Post on 05-Jan-2016

36 views

Category:

Documents


0 download

DESCRIPTION

Basic Loop Parallelization. Taewook Eom RTOS Lab., School of EECS, SNU. Outline. Data dependences Loop parallelization How to extract data dependence information. Loop Parallelization Goal. DoAll loops Loops whose iterations can execute in parallel Example For (i = 0; i < n; i++) - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Basic Loop Parallelization

Basic Loop ParallelizationBasic Loop Parallelization

Taewook Eom

RTOS Lab., School of EECS, SNU

Page 2: Basic Loop Parallelization

2

OutlineOutline

Data dependences

Loop parallelization

How to extract data dependence information

Page 3: Basic Loop Parallelization

3

Loop Parallelization GoalLoop Parallelization Goal

DoAll loopsLoops whose iterations can execute in parallel

Example

For (i = 0; i < n; i++)

A[i] = B[i] + C[i]

Statically assign each iteration to a processor

Page 4: Basic Loop Parallelization

4

Data Dependence of ScalarsData Dependence of Scalars

Data DependencesTrue dependence: write read

Anti dependence: read write

Output dependence: write write

Data dependence exists from a dynamic instance i to i’ iff

either i or i’ is a write operation

i and i’ refer to the same variable

i executes before i’

Page 5: Basic Loop Parallelization

5

Data Dependences for Arrays (1)Data Dependences for Arrays (1)

DoAll loop vs. DoAcross loop

Loop-carried vs. Loop-independent dependenceDependence exists across iterations vs. within one iteration

Recognize DoAll loops (intuitively)Find data dependences in loops

No dependences crossing iteration boundaries ⇒ Parallelizable

for (i = 2; i < 5; i++)A[i] = A[i-2] + 1

for (i = 0; i < 3; i++)A[i] = A[i] + 1

Page 6: Basic Loop Parallelization

6

Data Dependences for Arrays (2)Data Dependences for Arrays (2)

AssumptionsA loop is in canonical form

• Index runs from 1 to N by increments of 1

Perfectly nested loops• Only the innermost loop has statements other than for

statements

ExampleFor (i1 = 1; i1 <= n1; i1++)

For (i2 = 1; i2 <= n2; i2++)

For (i3 = 1; i3 <= n3; i3++)

A[i1, i2, i3] = A[i1, i2, i3] + 1;

Page 7: Basic Loop Parallelization

7

Iteration Space (1)Iteration Space (1)

Iteration is represented as coordinates in iteration space

n-dimensional discrete space for n-deep loops

Iteration space diagrams show dependences between all iterations of a loop

Vs. Dependence graph shows all dependences between two statements in a loop

Example

i

j

Iteration Space Traversal

FOR (i = 0; i < 5; i++) FOR (j = 0; j < 7; j++) A[i, j] = A[i-1, j-1] + 1

Page 8: Basic Loop Parallelization

8

Iteration Space (2)Iteration Space (2)

Sequential execution order of iterationLexicographic order

[0,0], [0,1], … , [0,6], [1,1], … , [1,6], [2,2], …

Iteration is lexicographically less than , iff

Examples• (1,6,0) < (2, 0, 0)

• (0,3,1) < (0,5,0)

cccc iiandiiiitsc )..,()..,.(. 1111

i

i

Page 9: Basic Loop Parallelization

9

Distance Vectors (1)Distance Vectors (1)

A loop has a distance vectorIf there exists data dependence from a node to a later node and

Indicates how many iterations apart are source and sink of dependence

Example

Distance vector = [1,1]

FOR (i = 0; i < 5; i++) FOR (j = 0; j < 7; j++) A[i, j] = A[i-1, j-1] + 1

iid

i

i

d

i

j

Iteration Space

Page 10: Basic Loop Parallelization

10

Distance Vectors (2)Distance Vectors (2)

Since then is lexicographically

greater than or equal to 0

All dependence must point forward in time

Distance vector (infinitely large set)[0,0], [0,1], … , [0,n], [1,-n], … , [1,0], … , [1,n], … , [n,-n], … , [n,0], … , [n,n]

Summary: Direction vectors0 or lexicographically positive

[0,0], [0,+], [+,-], [+,0], [+,+]

ii

d 0d

Page 11: Basic Loop Parallelization

11

Direction Vectors (1)Direction Vectors (1)

The sign of the distance

There are different notations< or +: dependence from earlier to later iteration

= or 0: dependence within the same iteration

> or -: dependence from later to earlier iteration

ExampleFor (i = 0; i < n; i++) For (j = 0; j < n; j++) A[i, j] = A[i-2, j+3]

i = (1, 4)i’ = (3, 1)d = i’ - i = (2, -3)Direction = (<, >) or (+, -)

Page 12: Basic Loop Parallelization

12

Direction Vectors (2)Direction Vectors (2)

What can the direction vectors for 3-D loops be like?

[? ? ?]

[0 ? ?]

[0 0 ?]

[0 0 0]

[0 0 - ]

[0 0 +]

[0 + ?]

[+ ? ?]

[0 - ?]

[ - ? ?]

Backwarddirection

Page 13: Basic Loop Parallelization

13

Dependence VectorsDependence Vectors

Dependence VectorSet of distance vectors

ExampleFor (I = 0; i < n; i++)

For (j = 0; j < n; j++) {

A[i][j] = A[i][j] + 1;

B[i][j] = B[i][j-1] + 1;

}

Test for parallelization:A loop with data dependence is parallelizable if for all = (0)d

Dd

Distance vectors : [0, 0], [0, 1]Dependence vectors : [0,[0, 1]]

Page 14: Basic Loop Parallelization

14

n-Dimensional Loopsn-Dimensional Loops

1

0FOR (i = 0; i < n; i++) FOR (j = 0; j < n; j++) A[i, j] = A[i, j-1] + 1

FOR (i = 0; i < n; i++) FOR (j = 0; j < n; j++) A[i, j] = A[i-1, j] + 1

FOR (i = 0; i < n; i++) FOR (j = 0; j < n; j++) A[i, j] = A[i-1, j-1] + 1

0

1

1

1

j

i

j

i

j

i

Loop i isparallelizable

Loop j isparallelizable

Loop j isparallelizable

Page 15: Basic Loop Parallelization

15

Test for ParallelizationTest for Parallelization

is loop-carried at level iif di is the first non-zero element

⇒ does not affect the parallelizability of inner loop

Ex) (0,0,2,0,1, …) : level 3 dependence

The i th loop of an n-dimensional loop is parallelizable if there does not exist any level i dependences

The i th loop of an n-dimensional loop is parallelizable for all dependences , either

(d1, … ,di-1) > 0 (= lexicographically forward)

(d1, … ,di) = 0

)...,,( 1 nddd

),...,,( 21 ndddd

Page 16: Basic Loop Parallelization

16

Improving ParallelizabilityImproving Parallelizability

Scalar Privatization

Loop-carried dependence is an anti dependence• No value is really carried across the loop

Data is local to each loop iteration• Eliminating loop carried dependences

Scalar privatization should be applied before any serious parallelization attempt

float t;for (i = 0; i < n; i++) { t = A[i]; b[i] = t * t;}

for (i = 0; i < n; i++) { float t; t = A[i]; b[i] = t * t;}

True dependence(loop-indep.)

Anti dependence(loop-carried)

Not parallelizable!! Parallelizable!!

Page 17: Basic Loop Parallelization

17

SummarySummary

Data Dependenceread/write

same variable

in direction of sequential ordering

RepresentationIteration space

Dependence vectors• Distance vectors

• Direction vectors

Parallelization Testing