a compilation framework for irregular memory accesses on ... · 4 compiler-directed communication...

42
MAINAK C HAUDHURI

Upload: others

Post on 22-May-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummary

A Compilation Framework forIrregular Memory Accesses

on the Cell Broadband Engine

MAINAK CHAUDHURI

Department of Computer Science and Engineering, IIT Kanpur

Pramod Bhatotia � IBM ResearchSanjeev Aggarwal � IIT Kanpur

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 2: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryOutline

1 Compiling Irregular Accesses

2 Data�ow Analysis

3 Compiler Transformations and Code Generation

4 Compiler-Directed Communication Mechanism

5 Run-time Parallelization

6 Experiments and Results

7 Summary

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 3: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCell Broadband Engine OverviewCell Architecture

SPU ( 128 128b registers)

Local Store(256 KB)

(LS)

Memory Flow Controller

(MFC)

Power Processing Unit(PPU)

L1 Cache

L2 Cache

Element Interconnect Bus (EIB)

SPE (x8)

IOController

BIF & IO

PPE

MIC

System Memory

IOController

BIF & IO

Memory Subsystem IO Subystem

The Cell Broadband Engine

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 4: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiling Irregular Accesses on the Cell ProcessorIrregular Array Accesses

Irregular Array AccessAn array access is irregular if no closed-form expression, interms of the loop indices and constants, for the subscripts ofthe accessed array is available at compile-time.

do t =do i = num_edges

n1 = left[i ]n2 = right[i ]force = (x[n1]−x[n2])/4y [n1] + = forcey [n2] + = force

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 5: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiling Irregular Accesses on the Cell ProcessorIrregular Array Accesses

Irregular Array AccessAn array access is irregular if no closed-form expression, interms of the loop indices and constants, for the subscripts ofthe accessed array is available at compile-time.

do t =do i = num_edges

n1 = left[i ]n2 = right[i ]force = (x[n1]−x[n2])/4y [n1] + = forcey [n2] + = force

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 6: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiling Irregular Accesses

Compiling Irregular Accesses

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 7: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiling Irregular Accesses on the Cell ProcessorCompiler Analysis for Irregular Array Accesses

force = x[r ight [ i ] ] - x[ lef t [ i ] ] ;

access pattern @ run-time

int main(){ ..... .......... for() { ................... for() { ................... } }

for() { force = x[r ight[ i ] ] - x[ left[ i ] ] ; } ......... ........}

static analysis for DMA comm.

MFC-DMA

MFC-DMA

MFC-DMA

Dataflow frameworkbased on run-timepreprocessing for DMA comm.

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 8: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiling Irregular Accesses on the Cell ProcessorExplicit Dynamic Memory Management

Execution Unit

Local StoreSPU

DMA controller MFC

EIB

Main Memory

Limited amount of LS (256 KB)

Code

Data

Local Store

1

SPE vector processor computes quickly

2 Efficient t i l ing

3

Mult iple-buffering

4

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 9: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiling Irregular Accesses on the Cell ProcessorCell Architecture Speci�c Challenges

EU LSSPU

MFC

EU LSSPU

MFC

EU LSSPU

MFC

EU LSSPU

MFC

L1-C L1-DPPU

L2

SPE-1

SPE-3

SPE-2

SPE-4

Main Memory

SPE-all

SPMD ||el execution

workload part i t ioning + monitor control

1

Vector processor Data al ignment

2 Branch prediction compiler hints

3

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 10: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryNeed for Run-time Parallelization

Sparse UpdatesSparse update refers to reduction operations on some elements ofan array within a loop where the access pattern of the array in theloop may be irregular.

for(i = 1 to n)A[B[i ]] = A[B[i ]] + X ;

Flow

S1: A(B(i))=...S2: ...=A(B(i))

Anti

S1: ...=A(B(i))S2: A(B(i))=...

Output

S1: A(B(i))=...S2: A(B(i))=...

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 11: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiling Irregular Accesses on the Cell ProcessorRun-time Parallelization

EU LSSPU

MFC

SPE-0

EU LSSPU

MFC

No H/w Coherency

1 2 543 6

X2 3 2 3 1 3X

Iteration i

Array B

Array A X X X X

SPE-2

A[B[ i ] ]

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 12: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiling Irregular Accesses on the Cell ProcessorRun-time Parallelization

EU LSSPU

MFC

SPE-0

EU LSSPU

MFC

No H/w Coherency

1 2 543 6

X2 3 2 3 1 3X

Iteration i

Array B

Array A X X X X

SPE-2

A[B[ i ] ]

4

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 13: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiling Irregular Accesses on the Cell ProcessorRun-time Parallelization

EU LSSPU

MFC

SPE-0

EU LSSPU

MFC

No H/w Coherency

1 2 543 6

X2 3 2 3 1 3X

Iteration i

Array B

Array A X X X X

SPE-2

A[B[ i ] ]

4 3

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 14: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiling Irregular Accesses on the Cell ProcessorRun-time Parallelization

EU LSSPU

MFC

SPE-0

EU LSSPU

MFC

No H/w Coherency

1 2 543 6

X2 3 2 3 1 3X

Iteration i

Array B

Array A X X X X

SPE-2

A[B[ i ] ]

4 3 5

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 15: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryOverview of the System

Sequential Input program

Dataflow VariableAnalysis

Runtime System

SPMD Cell Code

CompilerTransformation

MemoryCommunicat ion

Runtime Parallelization

Parallelizing Compiler Framework

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 16: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryData�ow Analysis

Data�ow Analysis

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 17: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryData�ow Analysis for Memory Communication

Loop Flow Graph Builder

Result FlowVariable Analysis

Bit-VectorLibrary

GlobalInformation

Builder

global flow analysis

result flow analysis

Dependence Analyzer

Local FlowAnalysis

Global FlowAnalysis

Result FlowAnalysis

SourceLoop Flow

Graph

Global Info-DS

c2suifAST

(SUIF IR)

InspectorOutput

Figure: Data�ow analysis for memory communication

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 18: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiler Transformations and Code Generation

Compiler Transformations and CodeGeneration

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 19: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiler Transformations and Code Generation

Run-Time Library

Loop Transformations

Kernel Transformations

Communication Builder

Compiler Transformations and Code Generation

CommunicationTransformations

Multi-BufferingBuilder

Tiling InfoBuilder

InspectorOutput

GlobalInfo-DS

Input SourceSUIF IR

suif2c

TransformedSUIF IR

SPMD CellCode

Figure: Compiler transformations and code generation

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 20: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiler-Directed Communication Mechanism

Compiler-Directed CommunicationMechanism

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 21: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiler-directed Communication MechanismBlock Access Method for Regular Accesses

EU LSSPU

MFC

Main Memory

for( i=1 to 4){ sum += a[ i ] ;}

a [0 ]

a [1 ]a [2 ]a [3 ]

Block Access for Regular Accesses

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 22: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiler-directed Communication MechanismBlock Access Method for Regular Accesses

Execution Unit

Local StoreSPU

DMA controller MFC

EIB

Main Memory

1Tile-Size

Code

Data

Local Store

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 23: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiler-directed Communication MechanismBlock Access Method for Regular Accesses

Execution Unit

Local StoreSPU

DMA controller MFC

EIB

Main Memory

2Addressing

Code

Data

LS Address

EA

*Cache-line aligned*last 4 bits

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 24: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiler-directed Communication MechanismBlock Access Method for Regular Accesses

Execution Unit

Local StoreSPU

DMA controller MFC

EIB

Main Memory

3 DataPadding

Code

Data

Multiple of 16

Transfer unit

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 25: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiler-directed Communication MechanismBounded Method for Irregular Read Accesses

EU LSSPU

MFC

Main Memory

for( i=1 to 4){ sum += a[b[ i ] ] ;} a [b [0 ] ]

a [b [1 ] ]

a [b [2 ] ]a [b [3 ] ]

Bounded Access for Irregular read-only Accesses

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 26: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiler-directed Communication MechanismBounded Method for Irregular Read Accesses

EU LSSPU

MFC

Main Memory

for( i=1 to 4){ sum += a[b[ i ] ] ;}

a [b [0 ] ]

a [b [1 ] ]

a [b [2 ] ]a [b [3 ] ]

Bulk Access

a[0]

a[n]

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 27: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiler-directed Communication MechanismBounded Method for Irregular Read Accesses

EU LSSPU

MFC

Main Memory

for( i=1 to 4){ a [b [ i ] ]= a [b [ i ] ]+x ;}

a [b [0 ] ]

a [b [1 ] ]

a [b [2 ] ]a [b [3 ] ]

Gather Method

a[0]

a[n]

a [b [0 ] ]

a [b [1 ] ]

a [b [2 ] ]

a [b [3 ] ]

DMA-LIST

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 28: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiler-directed Communication MechanismBounded Method for Irregular Read Accesses

Execution Unit

Local StoreSPU

DMA controller MFC

Modif ied bounding box method

Main Memory

1 Prepare list ofbounding boxes

DMA-LIST

Executor prepares l ist of all the neededelements and ini t iates the communicat ionwith MFC DMA list command

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 29: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiler-directed Communication MechanismBounded Method for Irregular Read Accesses

Execution Unit

Local StoreSPU

DMA controller MFC

Modif ied bounding box method

Main Memory

2

*Keeps the size of box as 128 bytes, since DMA transfer unit is 128bytes

DMA-LIST

*Prepares single bounding box within same-ti le for duplicated values of ref-array

128 bytes

*Does memcopy for values across the t i le

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 30: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiler-directed Communication MechanismCompiler Controlled Cache for Sparse Updates

EU LSSPU

MFC

Main Memory

for( i=1 to 10){ a [b[ i ] ] += X;}

a [10]

Compiler Controlled Cache Regular Accesses

a[10]

a[10]

a[10]

1

2

3

DMARead-List

DMAWrite-List

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 31: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiler-directed Communication MechanismCompiler Controlled Cache for Sparse Updates

EU LSSPU

MFC

Main Memory

for( i=1 to 10){ a [b[ i ] ] += X;}

Compiler Controlled Cache Regular Accesses

Tag

Tag

Tag

L-0

L-i

L-128

0

1

32

128 bytes

Tag Line word

Compare

Hit inCache

Miss inCache

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 32: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryCompiler-directed Communication MechanismCompiler Controlled Cache for Sparse Updates

EU LSSPU

MFC

Main Memory

Tag

Tag

Tag

EU LSSPU

MFC

SPE-0 SPE-1

Tag

Tag

Tag

X

Y

Shared

Dirty

Dir ty

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 33: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryRun-time Parallelization

Run-time Parallelization

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 34: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryParallelization of Irregular Reduction Loops

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 35: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryParallel Construction of the Dependence Chain

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 36: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryExperiment and Results

Experiments and Results

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 37: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryIterative Partial Di�erential Equation (PDE) Solver

Figure: Speedup for IRREG

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 38: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryMolecular Dynamics Code

Figure: Speedup for MOLDYN

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 39: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryNonbonded Force Calculations

Figure: Speedup for NBF

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 40: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummarySummary

Summary

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 41: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummarySummary

Data�ow analysis framework for determining the programpoints for memory communication.

Compiler transformation for run-time data processing andactual computation.

Builds the memory communication schedules.

Run-time parallelization of loops judiciously partitions the dataand computational work.

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor

Page 42: A Compilation Framework for Irregular Memory Accesses on ... · 4 Compiler-Directed Communication Mechanism 5 Run-time Parallelization 6 Experiments and Results 7 Summary ... Unit

Compiling Irregular AccessesData�ow AnalysisCompiler Transformations and Code GenerationCompiler-Directed Communication MechanismRun-time ParallelizationExperiments and ResultsSummaryThank You

Thank You

Mainak Chaudhuri Compiling Irregular Accesses for the Cell Processor