sparse code optimization

Post on 04-Jan-2016

SPARSE CODE OPTIMIZATION

Automatic transformation of linked list pointer structures

Sven Groot



RESTRUCTURING COMPILER

- Special type of optimizing compiler
- Restructures source code (e.g. to enable vectorization or parallelization)
- Techniques:
  - Loop interchange
  - Strip mining
  - Loop collapsing
  - Loop fusion and fission
  - Data structure transformation
- A sparse compiler is a type of restructuring compiler
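As an illustration of one technique from the list above, strip mining splits a single loop into an outer loop over fixed-size strips and an inner loop within each strip. The sketch below is not taken from the slides: the function names and the strip length of 4 are assumptions chosen for illustration only.

```c
#include <stddef.h>

#define STRIP 4  /* assumed strip length, e.g. the width of a vector register */

/* Original loop: scale every element of a by factor. */
void ScaleSimple(float *a, size_t n, float factor)
{
    size_t i;
    for (i = 0; i < n; ++i)
        a[i] *= factor;
}

/* Strip-mined loop: same work, iterating in strips of STRIP elements. */
void ScaleStripMined(float *a, size_t n, float factor)
{
    size_t start, end, i;
    for (start = 0; start < n; start += STRIP) {
        /* The last strip may be shorter than STRIP. */
        end = (start + STRIP < n) ? start + STRIP : n;
        for (i = start; i < end; ++i)
            a[i] *= factor;
    }
}
```

Both functions visit the elements in the same order, so they produce identical results; the strip-mined form merely exposes a fixed-length inner loop that a vectorizer could target.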

RESTRUCTURING COMPILER (CONT’D)

Loop interchange example

    void MatrixMultiply1(float **result, float **left, float **right, int size, int rightWidth)
    {
        int row, col, x;
        for( row = 0; row < size; ++row ) {
            for( col = 0; col < rightWidth; ++col ) {
                for( x = 0; x < size; ++x ) {
                    result[row][col] += left[row][x] * right[x][col];
                }
            }
        }
    }

RESTRUCTURING COMPILER (CONT’D)

Loop interchange result

    void MatrixMultiply2(float **result, float **left, float **right, int size, int rightWidth)
    {
        int row, col, x;
        for( row = 0; row < size; ++row ) {
            for( x = 0; x < size; ++x ) {
                for( col = 0; col < rightWidth; ++col ) {
                    result[row][col] += left[row][x] * right[x][col];
                }
            }
        }
    }

RESTRUCTURING COMPILER (CONT’D)

Pointers pose problems

    void MatrixMultiply3(float **result, float **left, float **right, int size, int rightWidth)
    {
        int row, col;
        float **tempRight;
        float *tempLeft;

        for( row = 0; row < size; ++row ) {
            for( col = 0; col < rightWidth; ++col ) {
                tempRight = right;
                tempLeft = left[row];
                while( tempRight < right + size ) {
                    result[row][col] += *tempLeft * (*tempRight)[col];
                    ++tempRight;
                    ++tempLeft;
                }
            }
        }
    }

LINKED LIST MATRIX

[Figure: linked list representation of the matrix

    1 1 0
    0 0 0
    1 0 1

The Matrix node points to a list of column heads (ColHead, Index = 1, 2, 3) and a list of row heads (RowHead, Index = 1, 3), each chained through Next. Each head points to a Cell; cells are chained through ColNext and RowNext.]

LINKED LIST MATRIX (CONT’D)

    struct Cell
    {
        float Value;
        int ColIndex;
        int RowIndex;
        struct Cell *RowNext; // Cell in the next row
        struct Cell *ColNext; // Cell in the next column
    };

    struct RowHead
    {
        int RowIndex;
        struct Cell *Cell;
        struct RowHead *Next;
    };

    struct ColHead
    {
        int ColIndex;
        struct Cell *Cell;
        struct ColHead *Next;
    };

    struct Matrix
    {
        int Dimensions;
        struct ColHead *Col;
        struct RowHead *Row;
    };
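To make the structure concrete, the following sketch builds the 3x3 example matrix (rows 1 1 0, 0 0 0, 1 0 1) from these structs. The helpers `GetRowHead`, `GetColHead`, `InsertCell` and `BuildExampleMatrix` are hypothetical and do not appear in the slides. The sketch assumes 0-based indices (as the multiplication loops use), heads and cells kept in ascending index order, no duplicate insertions, and it omits allocation checks and cleanup for brevity.

```c
#include <stdlib.h>

struct Cell { float Value; int ColIndex; int RowIndex;
              struct Cell *RowNext; struct Cell *ColNext; };
struct RowHead { int RowIndex; struct Cell *Cell; struct RowHead *Next; };
struct ColHead { int ColIndex; struct Cell *Cell; struct ColHead *Next; };
struct Matrix { int Dimensions; struct ColHead *Col; struct RowHead *Row; };

/* Hypothetical helper: find the head for `row`, creating it in sorted
   position if the row has no cells yet. */
static struct RowHead *GetRowHead(struct Matrix *m, int row)
{
    struct RowHead **link = &m->Row;
    while (*link != NULL && (*link)->RowIndex < row)
        link = &(*link)->Next;
    if (*link == NULL || (*link)->RowIndex != row) {
        struct RowHead *head = malloc(sizeof *head);
        head->RowIndex = row;
        head->Cell = NULL;
        head->Next = *link;
        *link = head;
    }
    return *link;
}

/* Hypothetical helper: same as GetRowHead, for column heads. */
static struct ColHead *GetColHead(struct Matrix *m, int col)
{
    struct ColHead **link = &m->Col;
    while (*link != NULL && (*link)->ColIndex < col)
        link = &(*link)->Next;
    if (*link == NULL || (*link)->ColIndex != col) {
        struct ColHead *head = malloc(sizeof *head);
        head->ColIndex = col;
        head->Cell = NULL;
        head->Next = *link;
        *link = head;
    }
    return *link;
}

/* Hypothetical helper: add one non-zero element to both its row list
   and its column list. */
void InsertCell(struct Matrix *m, int row, int col, float value)
{
    struct RowHead *rowHead = GetRowHead(m, row);
    struct ColHead *colHead = GetColHead(m, col);
    struct Cell *cell = malloc(sizeof *cell);
    struct Cell **link;

    cell->Value = value;
    cell->RowIndex = row;
    cell->ColIndex = col;

    /* Cells of one row are chained by ColNext, ordered by column. */
    link = &rowHead->Cell;
    while (*link != NULL && (*link)->ColIndex < col)
        link = &(*link)->ColNext;
    cell->ColNext = *link;
    *link = cell;

    /* Cells of one column are chained by RowNext, ordered by row. */
    link = &colHead->Cell;
    while (*link != NULL && (*link)->RowIndex < row)
        link = &(*link)->RowNext;
    cell->RowNext = *link;
    *link = cell;
}

/* Builds the example matrix; only the four non-zero elements get cells. */
struct Matrix BuildExampleMatrix(void)
{
    struct Matrix m = { 3, NULL, NULL };
    InsertCell(&m, 0, 0, 1.0f);
    InsertCell(&m, 0, 1, 1.0f);
    InsertCell(&m, 2, 0, 1.0f);
    InsertCell(&m, 2, 2, 1.0f);
    return m;
}
```

Note that the empty middle row gets no RowHead at all, which is why the multiplication code has to compare `RowIndex` against the loop counter rather than assume one head per row.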

LINKED LIST MATRIX (CONT’D)

Matrix multiplication using linked lists

    void MatrixMultiply(struct Matrix left, float **right, float **result, int rightWidth)
    {
        struct RowHead *leftRow = left.Row;
        struct Cell *leftCell;
        int dimensions = left.Dimensions;
        int col, row, x;
        for( col = 0; col < rightWidth; ++col ) {
            leftRow = left.Row;
            for( row = 0; row < dimensions; ++row ) {
                if( leftRow != NULL && leftRow->RowIndex < row )
                    leftRow = leftRow->Next;
                if( leftRow != NULL && leftRow->RowIndex == row ) {
                    leftCell = leftRow->Cell;
                    for( x = 0; x < dimensions; ++x ) {
                        if( leftCell != NULL && leftCell->ColIndex < x )
                            leftCell = leftCell->ColNext;
                        if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
                            result[row][col] += leftCell->Value * right[x][col];
                        }
                    }
                }
            }
        }
    }

LINKED LIST MATRIX (CONT’D)

Alternative matrix multiplication

    int MatrixMultiplyAlternative(struct Matrix left, float **right, float **result, int rightWidth)
    {
        struct RowHead *leftRow = left.Row;
        struct Cell *leftCell;
        int dimensions = left.Dimensions;
        int col;
        for( col = 0; col < rightWidth; ++col ) {
            leftRow = left.Row;
            while( leftRow != NULL ) {
                leftCell = leftRow->Cell;
                while( leftCell != NULL ) {
                    result[leftRow->RowIndex][col] += leftCell->Value * right[leftCell->ColIndex][col];
                    leftCell = leftCell->ColNext;
                }
                leftRow = leftRow->Next;
            }
        }
        return 0;
    }

TRANSFORMATION

- The goal: remove all references to the linked list from the loop
- The means: move linked list references into an initialization loop
- Initialization copies linked list contents into an array
- Transformed loop uses the array
- Two methods: sublimation and annihilation
- Must be done automatically

SUBLIMATION

Transforming the innermost loop

    for( x = 0; x < dimensions; ++x ) {
        if( leftCell != NULL && leftCell->ColIndex < x )
            leftCell = leftCell->ColNext;
        if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
            result[row][col] += leftCell->Value * right[x][col];
        }
    }

SUBLIMATION (CONT’D)

Initialization

    leftCellArray = malloc(sizeof(float) * dimensions);
    for( x = 0; x < dimensions; ++x ) {
        if( leftCell != NULL && leftCell->ColIndex < x )
            leftCell = leftCell->ColNext;
        if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
            leftCellArray[x] = leftCell->Value;
        }
        else
            leftCellArray[x] = 0;
    }

Transformed main loop

    for( x = 0; x < dimensions; ++x ) {
        result[row][col] += leftCellArray[x] * right[x][col];
    }
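The fragments above depend on surrounding state, so here is a self-contained sketch of the same idea on a single sparse row multiplied with a dense vector. The `RowCell` struct and the functions `DotList` and `DotSublimated` are simplified, hypothetical stand-ins for the slide code, and the sketch assumes the cells are sorted by ascending `ColIndex` with every index below `dimensions`.

```c
#include <stdlib.h>

/* Simplified, hypothetical cell: one non-zero element of a sparse row. */
struct RowCell {
    float Value;
    int ColIndex;
    struct RowCell *ColNext;
};

/* Original form: the linked list is walked inside the computation loop. */
float DotList(const struct RowCell *cell, const float *dense, int dimensions)
{
    float sum = 0.0f;
    int x;
    for (x = 0; x < dimensions; ++x) {
        if (cell != NULL && cell->ColIndex < x)
            cell = cell->ColNext;
        if (cell != NULL && cell->ColIndex == x)
            sum += cell->Value * dense[x];
    }
    return sum;
}

/* Sublimated form: the list is first expanded into a dense array
   (omitted elements are filled in as zero), then the main loop
   touches only arrays. */
float DotSublimated(const struct RowCell *cell, const float *dense, int dimensions)
{
    float *values = calloc((size_t)dimensions, sizeof *values);
    float sum = 0.0f;
    int x;

    if (values == NULL)
        return 0.0f;  /* allocation failed; real code would report this */

    for (; cell != NULL; cell = cell->ColNext)  /* initialization loop */
        values[cell->ColIndex] = cell->Value;

    for (x = 0; x < dimensions; ++x)            /* main loop */
        sum += values[x] * dense[x];

    free(values);
    return sum;
}
```

The main loop of `DotSublimated` has no pointer chasing and no conditionals, which is the property a restructuring compiler needs before it can apply loop transformations such as interchange.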

SUBLIMATION (CONT’D)

Transforming the inner loop (alternative)

    while( leftCell != NULL ) {
        result[leftRow->RowIndex][col] += leftCell->Value * right[leftCell->ColIndex][col];
        leftCell = leftCell->ColNext;
    }

Initialization

    leftCellArray = malloc(sizeof(float) * dimensions);
    memset(leftCellArray, 0, sizeof(float) * dimensions);
    while( leftCell != NULL ) {
        leftCellArray[leftCell->ColIndex] = leftCell->Value;
        leftCell = leftCell->ColNext;
    }

Transformed main loop

    for( leftCellCounter = 0; leftCellCounter < dimensions; ++leftCellCounter ) {
        result[leftRow->RowIndex][col] += leftCellArray[leftCellCounter] * right[leftCellCounter][col];
    }

LOOP EXTRACTION

Putting it in context

    for( row = 0; row < dimensions; ++row ) {
        if( leftRow != NULL && leftRow->RowIndex < row )
            leftRow = leftRow->Next;
        if( leftRow != NULL && leftRow->RowIndex == row ) {
            leftCell = leftRow->Cell;

            /* initialization */
            leftCellArray = malloc(sizeof(float) * dimensions);
            for( x = 0; x < dimensions; ++x ) {
                if( leftCell != NULL && leftCell->ColIndex < x )
                    leftCell = leftCell->ColNext;
                if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
                    leftCellArray[x] = leftCell->Value;
                }
                else
                    leftCellArray[x] = 0;
            }

            /* main loop */
            for( x = 0; x < dimensions; ++x ) {
                result[row][col] += leftCellArray[x] * right[x][col];
            }

            free(leftCellArray);
        }
    }

LOOP EXTRACTION (CONT’D)

Initialization

    for( row = 0; row < dimensions; ++row ) {
        if( leftRow != NULL && leftRow->RowIndex < row )
            leftRow = leftRow->Next;

        leftCellArrayArray[row] = malloc(sizeof(float) * dimensions);
        memset(leftCellArrayArray[row], 0, sizeof(float) * dimensions);

        if( leftRow != NULL && leftRow->RowIndex == row ) {
            leftCell = leftRow->Cell;
            for( x = 0; x < dimensions; ++x ) {
                if( leftCell != NULL && leftCell->ColIndex < x )
                    leftCell = leftCell->ColNext;
                if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
                    leftCellArrayArray[row][x] = leftCell->Value;
                }
                else
                    leftCellArrayArray[row][x] = 0;
            }
        }
    }

LOOP EXTRACTION (CONT’D)

Transformed main loop

    for( row = 0; row < dimensions; ++row ) {
        for( x = 0; x < dimensions; ++x ) {
            result[row][col] += leftCellArrayArray[row][x] * right[x][col];
        }
    }

LOOP EXTRACTION (CONT’D)

Putting it in context (alternative)

    while( leftRow != NULL ) {
        leftCell = leftRow->Cell;

        leftCellArray = malloc(sizeof(float) * dimensions);
        memset(leftCellArray, 0, sizeof(float) * dimensions);
        while( leftCell != NULL ) {
            leftCellArray[leftCell->ColIndex] = leftCell->Value;
            leftCell = leftCell->ColNext;
        }

        for( leftCellCounter = 0; leftCellCounter < dimensions; ++leftCellCounter ) {
            result[leftRow->RowIndex][col] += leftCellArray[leftCellCounter] * right[leftCellCounter][col];
        }

        free(leftCellArray);

        leftRow = leftRow->Next;
    }

LOOP EXTRACTION (CONT’D)

Initialization (alternative)

    leftCellArrayArray = malloc(sizeof(float*) * dimensions);
    for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter ) {
        leftCellArrayArray[leftRowCounter] = malloc(dimensions * sizeof(float));
        memset(leftCellArrayArray[leftRowCounter], 0, dimensions * sizeof(float));

        if( leftRow != NULL && leftRowCounter == leftRow->RowIndex ) {
            leftCell = leftRow->Cell;
            while( leftCell != NULL ) {
                leftCellArrayArray[leftRowCounter][leftCell->ColIndex] = leftCell->Value;
                leftCell = leftCell->ColNext;
            }
            leftRow = leftRow->Next;
        }
    }

Transformed main loop

    for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter ) {
        for( leftCellCounter = 0; leftCellCounter < dimensions; ++leftCellCounter ) {
            result[leftRowCounter][col] += leftCellArrayArray[leftRowCounter][leftCellCounter] * right[leftCellCounter][col];
        }
    }

LOOP EXTRACTION (CONT’D)

Once more, in context

    for( col = 0; col < rightWidth; ++col ) {
        leftRow = left.Row;

        /* initialization */
        leftCellArrayArray = malloc(sizeof(float*) * dimensions);
        for( row = 0; row < dimensions; ++row ) {
            if( leftRow != NULL && leftRow->RowIndex < row )
                leftRow = leftRow->Next;
            leftCellArrayArray[row] = malloc(sizeof(float) * dimensions);
            memset(leftCellArrayArray[row], 0, sizeof(float) * dimensions);
            if( leftRow != NULL && leftRow->RowIndex == row ) {
                leftCell = leftRow->Cell;
                for( x = 0; x < dimensions; ++x ) {
                    if( leftCell != NULL && leftCell->ColIndex < x )
                        leftCell = leftCell->ColNext;
                    if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
                        leftCellArrayArray[row][x] = leftCell->Value;
                    }
                    else
                        leftCellArrayArray[row][x] = 0;
                }
            }
        }

        /* main loop */
        for( row = 0; row < dimensions; ++row ) {
            for( x = 0; x < dimensions; ++x ) {
                result[row][col] += leftCellArrayArray[row][x] * right[x][col];
            }
        }

        for( row = 0; row < dimensions; ++row )
            free(leftCellArrayArray[row]);
        free(leftCellArrayArray);
    }

LOOP EXTRACTION (CONT’D)

Once more, in context (alternative)

    for( col = 0; col < rightWidth; ++col ) {
        leftRow = left.Row;

        /* initialization */
        leftCellArrayArray = malloc(sizeof(float*) * dimensions);
        for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter ) {
            leftCellArrayArray[leftRowCounter] = malloc(dimensions * sizeof(float));
            memset(leftCellArrayArray[leftRowCounter], 0, dimensions * sizeof(float));
            if( leftRow != NULL && leftRowCounter == leftRow->RowIndex ) {
                leftCell = leftRow->Cell;
                while( leftCell != NULL ) {
                    leftCellArrayArray[leftRowCounter][leftCell->ColIndex] = leftCell->Value;
                    leftCell = leftCell->ColNext;
                }
                leftRow = leftRow->Next;
            }
        }

        /* main loop */
        for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter ) {
            for( leftCellCounter = 0; leftCellCounter < dimensions; ++leftCellCounter ) {
                result[leftRowCounter][col] += leftCellArrayArray[leftRowCounter][leftCellCounter] * right[leftCellCounter][col];
            }
        }

        for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter )
            free(leftCellArrayArray[leftRowCounter]);
        free(leftCellArrayArray);
    }

TRANSFORMATION RESULT

    void MatrixMultiplySublimation(struct Matrix left, float **right, float **result, int rightWidth)
    {
        struct RowHead *leftRow = left.Row;
        struct Cell *leftCell;
        float **leftCellArrayArray;   /* generated declaration */
        int dimensions = left.Dimensions;
        int col, row, x;

        /* initialization */
        leftCellArrayArray = malloc(sizeof(float*) * dimensions);
        for( row = 0; row < dimensions; ++row ) {
            if( leftRow != NULL && leftRow->RowIndex < row )
                leftRow = leftRow->Next;
            leftCellArrayArray[row] = malloc(sizeof(float) * dimensions);
            memset(leftCellArrayArray[row], 0, sizeof(float) * dimensions);
            if( leftRow != NULL && leftRow->RowIndex == row ) {
                leftCell = leftRow->Cell;
                for( x = 0; x < dimensions; ++x ) {
                    if( leftCell != NULL && leftCell->ColIndex < x )
                        leftCell = leftCell->ColNext;
                    if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
                        leftCellArrayArray[row][x] = leftCell->Value;
                    }
                    else
                        leftCellArrayArray[row][x] = 0;
                }
            }
        }

        /* main loop */
        for( col = 0; col < rightWidth; ++col ) {
            for( row = 0; row < dimensions; ++row ) {
                for( x = 0; x < dimensions; ++x ) {
                    result[row][col] += leftCellArrayArray[row][x] * right[x][col];
                }
            }
        }

        for( row = 0; row < dimensions; ++row )
            free(leftCellArrayArray[row]);
        free(leftCellArrayArray);
    }

TRANSFORMATION RESULT (CONT’D)

    void MatrixMultiplyAlternativeSublimation(struct Matrix left, float **right, float **result, int rightWidth)
    {
        struct RowHead *leftRow = left.Row;
        struct Cell *leftCell;
        int dimensions = left.Dimensions;
        int col;
        float **leftCellArrayArray;            /* generated declarations */
        int leftCellCounter, leftRowCounter;

        /* initialization */
        leftCellArrayArray = malloc(sizeof(float*) * dimensions);
        for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter ) {
            leftCellArrayArray[leftRowCounter] = malloc(dimensions * sizeof(float));
            memset(leftCellArrayArray[leftRowCounter], 0, dimensions * sizeof(float));
            if( leftRow != NULL && leftRowCounter == leftRow->RowIndex ) {
                leftCell = leftRow->Cell;
                while( leftCell != NULL ) {
                    leftCellArrayArray[leftRowCounter][leftCell->ColIndex] = leftCell->Value;
                    leftCell = leftCell->ColNext;
                }
                leftRow = leftRow->Next;
            }
        }

        /* main loop */
        for( col = 0; col < rightWidth; ++col ) {
            for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter ) {
                for( leftCellCounter = 0; leftCellCounter < dimensions; ++leftCellCounter ) {
                    result[leftRowCounter][col] += leftCellArrayArray[leftRowCounter][leftCellCounter] * right[leftCellCounter][col];
                }
            }
        }

        for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter )
            free(leftCellArrayArray[leftRowCounter]);
        free(leftCellArrayArray);
    }

ANNIHILATION

- Alternative method of transformation
- No fill-in: omitted values stay omitted
- Sublimation:
  - Sparse loop: more iterations
  - Semi-dense loop: same number of iterations
- Annihilation:
  - Sparse loop: same number of iterations
  - Semi-dense loop: fewer iterations
- Can require other transformations

ANNIHILATION (CONT’D)

Recall the innermost loop

    for( x = 0; x < dimensions; ++x ) {
        if( leftCell != NULL && leftCell->ColIndex < x )
            leftCell = leftCell->ColNext;
        if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
            result[row][col] += leftCell->Value * right[x][col];
        }
    }

ANNIHILATION (CONT’D)

Initialization

    leftCellArraySize = 100;
    leftCellArray = malloc(sizeof(float) * leftCellArraySize);
    newDimensions = 0;
    leftCellCopy = leftCell;
    for( x = 0; x < dimensions; ++x ) {
        if( newDimensions >= leftCellArraySize ) {
            leftCellArraySize *= 2;
            leftCellArray = realloc(leftCellArray, sizeof(float) * leftCellArraySize);
        }
        if( leftCellCopy != NULL && leftCellCopy->ColIndex < x )
            leftCellCopy = leftCellCopy->ColNext;
        if( leftCellCopy != NULL && leftCellCopy->ColIndex == x && leftCellCopy->RowIndex == row ) {
            leftCellArray[newDimensions] = leftCellCopy->Value;
            ++newDimensions;
        }
    }

ANNIHILATION (CONT’D)

Initialization (cont’d)

    rightArraySize = 100;
    rightArray = malloc(sizeof(float*) * rightArraySize);
    newDimensions = 0;
    leftCellCopy = leftCell;
    for( x = 0; x < dimensions; ++x ) {
        if( newDimensions >= rightArraySize ) {
            rightArraySize *= 2;
            rightArray = realloc(rightArray, sizeof(float*) * rightArraySize);
        }
        if( leftCellCopy != NULL && leftCellCopy->ColIndex < x )
            leftCellCopy = leftCellCopy->ColNext;
        if( leftCellCopy != NULL && leftCellCopy->ColIndex == x && leftCellCopy->RowIndex == row ) {
            rightArray[newDimensions] = right[x];
            ++newDimensions;
        }
    }

ANNIHILATION (CONT’D)

Transformed main loop

    for( x = 0; x < newDimensions; ++x ) {
        result[row][col] += leftCellArray[x] * rightArray[x][col];
    }
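For comparison with sublimation, here is a self-contained sketch of annihilation on a single sparse row multiplied with a dense vector. The `RowCell` struct and the function `DotAnnihilated` are hypothetical. The sketch also makes two simplifications relative to the slides: it counts the cells first instead of growing the arrays with `realloc`, and because the dense operand is one-dimensional it copies the matching operand values rather than row pointers.

```c
#include <stdlib.h>

/* Simplified, hypothetical cell: one non-zero element of a sparse row. */
struct RowCell {
    float Value;
    int ColIndex;
    struct RowCell *ColNext;
};

/* Annihilated dot product: only stored values are copied (no fill-in),
   together with the matching entries of the dense operand. The number
   of main-loop iterations is written to *iterations. */
float DotAnnihilated(const struct RowCell *cell, const float *dense, int *iterations)
{
    const struct RowCell *copy;
    float *values, *operands;
    float sum = 0.0f;
    int count = 0, i;

    *iterations = 0;
    for (copy = cell; copy != NULL; copy = copy->ColNext)
        ++count;                      /* size the compact arrays */
    if (count == 0)
        return 0.0f;

    values = malloc((size_t)count * sizeof *values);
    operands = malloc((size_t)count * sizeof *operands);
    if (values == NULL || operands == NULL) {
        free(values);
        free(operands);
        return 0.0f;                  /* real code would report the failure */
    }

    /* Initialization: compact both operands into dense arrays. */
    for (i = 0, copy = cell; copy != NULL; copy = copy->ColNext, ++i) {
        values[i] = copy->Value;
        operands[i] = dense[copy->ColIndex];
    }

    /* Main loop: one iteration per stored cell, arrays only. */
    for (i = 0; i < count; ++i)
        sum += values[i] * operands[i];
    *iterations = count;

    free(values);
    free(operands);
    return sum;
}
```

For a row with two stored cells in a vector of length four, the main loop runs twice here, whereas a sublimated loop would run four times.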

ANNIHILATION (CONT’D)

Inner loop (alternative)

    while( leftCell != NULL ) {
        result[leftRow->RowIndex][col] += leftCell->Value * right[leftCell->ColIndex][col];
        leftCell = leftCell->ColNext;
    }

ANNIHILATION (CONT’D)

Initialization (alternative)

    /* leftCell */
    leftCellArraySize = 100;
    leftCellArray = malloc(sizeof(float) * leftCellArraySize);
    newDimensions = 0;
    leftCellCopy = leftCell;
    while( leftCellCopy != NULL ) {
        if( newDimensions >= leftCellArraySize ) {
            leftCellArraySize *= 2;
            leftCellArray = realloc(leftCellArray, sizeof(float) * leftCellArraySize);
        }
        leftCellArray[newDimensions] = leftCellCopy->Value;
        ++newDimensions;
        leftCellCopy = leftCellCopy->ColNext;
    }

    /* right */
    rightArraySize = 100;
    rightArray = malloc(sizeof(float*) * rightArraySize);
    newDimensions = 0;
    leftCellCopy = leftCell;
    while( leftCellCopy != NULL ) {
        if( newDimensions >= rightArraySize ) {
            rightArraySize *= 2;
            rightArray = realloc(rightArray, sizeof(float*) * rightArraySize);
        }
        rightArray[newDimensions] = right[leftCellCopy->ColIndex];
        ++newDimensions;
        leftCellCopy = leftCellCopy->ColNext;
    }

ANNIHILATION (CONT’D)

Transformed main loop (alternative)

    for( leftCellCounter = 0; leftCellCounter < newDimensions; ++leftCellCounter ) {
        result[leftRow->RowIndex][col] += leftCellArray[leftCellCounter] * rightArray[leftCellCounter][col];
    }

POST-INITIALIZATION

- Pre-initialization: before the main loop
- Post-initialization: after the main loop
- Needed when an expression that needs to be transformed is written to
- Needs to use index expression
- Fill-in value not needed

POST-INITIALIZATION (CONT’D)

Example

    while( node != NULL ) {
        node->Value = node->Value * 2;
        node = node->Next;
    }

Result

    /* pre-initialization */
    nodeArray = malloc(size * sizeof(int));
    nodeCopy = node;
    memset(nodeArray, 0, size * sizeof(int));
    while( nodeCopy != NULL ) {
        nodeArray[nodeCopy->Index] = nodeCopy->Value;
        nodeCopy = nodeCopy->Next;
    }

    /* main loop */
    for( nodeCounter = 0; nodeCounter < size; ++nodeCounter ) {
        nodeArray[nodeCounter] = nodeArray[nodeCounter] * 2;
    }

    /* post-initialization */
    nodeCopy = node;
    while( nodeCopy != NULL ) {
        nodeCopy->Value = nodeArray[nodeCopy->Index];
        nodeCopy = nodeCopy->Next;
    }
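The three phases can be packaged into one runnable function, sketched below. The `Node` struct layout and the function name `DoubleValuesTransformed` are assumptions for illustration; the slides only show that a node has `Value`, `Index` and `Next`. The sketch further assumes `size > 0` and that every `Index` is unique and lies in the range 0 to `size - 1`.

```c
#include <stdlib.h>

/* Hypothetical node layout matching the members used on the slide. */
struct Node {
    int Value;
    int Index;
    struct Node *Next;
};

/* Doubles every node value by way of a dense array.
   Returns 0 on success, -1 if the array cannot be allocated. */
int DoubleValuesTransformed(struct Node *node, int size)
{
    struct Node *nodeCopy;
    int nodeCounter;
    int *nodeArray = calloc((size_t)size, sizeof *nodeArray);

    if (nodeArray == NULL)
        return -1;

    /* Pre-initialization: copy list values into the dense array. */
    for (nodeCopy = node; nodeCopy != NULL; nodeCopy = nodeCopy->Next)
        nodeArray[nodeCopy->Index] = nodeCopy->Value;

    /* Main loop: works on the array only. */
    for (nodeCounter = 0; nodeCounter < size; ++nodeCounter)
        nodeArray[nodeCounter] = nodeArray[nodeCounter] * 2;

    /* Post-initialization: write the results back through the index
       expression. Slots that belong to no node are never read, so
       their fill-in value does not matter. */
    for (nodeCopy = node; nodeCopy != NULL; nodeCopy = nodeCopy->Next)
        nodeCopy->Value = nodeArray[nodeCopy->Index];

    free(nodeArray);
    return 0;
}
```

Calling it on a three-node list with indices 0, 2 and 5 and `size` 6 doubles exactly those three values and leaves the list structure untouched.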

AUTOMATED TRANSFORMATION

Seven steps

1. Find candidate structures
2. Analyze usage of these structures in the code
3. Determine transformation safety
4. Identify data members
5. Generate dense data structures
6. Transform
7. Loop extraction

Code must be normalized

CONDITIONS

- The linked list expression must not have side effects.
- Loop termination control must be trivial.
- The linked list iteration statement may be the only statement in the loop body that modifies the linked list expression.
- The “next” pointer member may not be a data member.
- Any expression, other than the linked list iteration statement, that might be moved to an initialization loop may not have side effects, and may use only constants, loop-invariant values, linked list members and loop control variables.
- If the linked list expression is guarded, it must be possible to move that entire guard, including both the true and false parts, to the initialization loop.
- When performing annihilation on a semi-dense loop, there must be a single guard that covers all statements in the loop body except for the linked list iteration statement and its guard, and statements related to loop control (such as those that increment the counter).

TRANSFORMATION DIRECTIVES

- Fill in gaps in the compiler’s knowledge
- Embedded in source code as comments
- Examples:
  - SAFE_CODE, UNSAFE_CODE
  - SAFE_LOOP, UNSAFE_LOOP
  - DENSE_INDEX
  - DENSE_DIMENSIONS
  - FILL_IN
  - Etc.

TRANSFORMATION DIRECTIVES (CONT’D)

Example

    /***SAFE_CODE***/
    /***DENSE_INDEX(node, node->Index)***/
    /***DENSE_DIMENSION(node, size)***/
    while( node != NULL ) {
        node->Value = node->Value * 2;
        node = node->Next;
    }
    /***UNSAFE_CODE***/

EXPERIMENTATION

- Tested on three matrices
- Sublimation code: ran through MT1
- Annihilation code: loop interchange
- Used tools: Intel C Compiler, Intel FORTRAN Compiler
- Test system: Dual Intel Xeon 3.06GHz, 1GB RAM

EXPERIMENTATION (CONT’D)

                         sherman3     e40r5000      af23560
Size                     5005x5005    17281x17281   23560x23560
Non-zero elements        20033        553965        484256
Density                  0.080%       0.186%        0.087%

Results (seconds)
Original algorithm       22.813 s     266.404 s     443.982 s
Alternative algorithm    2.138 s      46.108 s      29.940 s
Annihilation             0.195 s      1.742 s       2.744 s
MT1 (Fortran)            0.095 s      5.228 s       2.216 s
MT1 (C)                  0.124 s      5.153 s       3.314 s
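The density row follows from the size and the non-zero count, and the timings are easiest to compare as speedup factors. The two small helpers below are not part of the presentation; their names are hypothetical, and they only restate the arithmetic behind the table.

```c
/* Percentage of non-zero elements in an n x n matrix. */
double DensityPercent(long n, long nonZero)
{
    return 100.0 * (double)nonZero / ((double)n * (double)n);
}

/* How many times faster the transformed code ran than the baseline. */
double Speedup(double baselineSeconds, double transformedSeconds)
{
    return baselineSeconds / transformedSeconds;
}
```

For sherman3, `DensityPercent(5005, 20033)` comes out at about 0.080, matching the table, and `Speedup(22.813, 0.195)` shows the annihilated code running roughly 117 times faster than the original algorithm.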

EXPERIMENTATION (CONT’D)

[Figure: bar chart for sherman3, e40r5000 and af23560 with the series Alternative algorithm, Annihilation, MT1 (Fortran) and MT1 (C); vertical axis from 0 to 300]
