
Post on 04-Jan-2016

Page 1: Sparse code optimization

SPARSE CODE OPTIMIZATION

Automatic transformation of linked list pointer structures

Sven Groot

Page 2: Sparse code optimization

[Figure: a large matrix printed out in full; almost every entry is 0 and only an occasional entry is 1]

Page 3: Sparse code optimization

RESTRUCTURING COMPILER

- Special type of optimizing compiler
- Restructures source code (e.g. to enable vectorization or parallelization)
- Techniques:
    - Loop interchange
    - Strip mining
    - Loop collapsing
    - Loop fusion and fission
    - Data structure transformation
- A sparse compiler is a type of restructuring compiler
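As an illustration of one technique from the list above, here is a minimal sketch of strip mining. It is not taken from the slides: the function names and the strip length `STRIP` are assumptions chosen for the example. Both functions compute the same result; the strip-mined version only regroups the iterations.

```c
#include <stddef.h>

#define STRIP 4  /* hypothetical strip length, e.g. the width of one vector register */

/* Baseline: one plain loop over all n elements. */
void scale_plain(float *a, size_t n, float k)
{
    for (size_t i = 0; i < n; ++i)
        a[i] *= k;
}

/* Strip-mined: the same iteration space, split into strips of at most
 * STRIP elements. The inner loop has a short, bounded trip count that a
 * vectorizer can map onto a single vector operation. */
void scale_strip_mined(float *a, size_t n, float k)
{
    for (size_t start = 0; start < n; start += STRIP) {
        size_t end = (n - start < STRIP) ? n : start + STRIP;
        for (size_t i = start; i < end; ++i)
            a[i] *= k;
    }
}
```

Calling either function on the same input leaves the array in the same state, which is what makes the rewrite safe for a compiler to apply.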

Page 4: Sparse code optimization

RESTRUCTURING COMPILER (CONT’D)

Loop interchange example

    void MatrixMultiply1(float **result, float **left, float **right, int size, int rightWidth)
    {
        int row, col, x;
        for( row = 0; row < size; ++row ) {
            for( col = 0; col < rightWidth; ++col ) {
                for( x = 0; x < size; ++x ) {
                    result[row][col] += left[row][x] * right[x][col];
                }
            }
        }
    }

Page 5: Sparse code optimization

RESTRUCTURING COMPILER (CONT’D)

Loop interchange result

    void MatrixMultiply2(float **result, float **left, float **right, int size, int rightWidth)
    {
        int row, col, x;
        for( row = 0; row < size; ++row ) {
            for( x = 0; x < size; ++x ) {
                for( col = 0; col < rightWidth; ++col ) {
                    result[row][col] += left[row][x] * right[x][col];
                }
            }
        }
    }

Page 6: Sparse code optimization

RESTRUCTURING COMPILER (CONT’D)

Pointers pose problems

    void MatrixMultiply3(float **result, float **left, float **right, int size, int rightWidth)
    {
        int row, col;
        float **tempRight;
        float *tempLeft;

        for( row = 0; row < size; ++row ) {
            for( col = 0; col < rightWidth; ++col ) {
                tempRight = right;
                tempLeft = left[row];
                while( tempRight < right + size ) {
                    result[row][col] += *tempLeft * (*tempRight)[col];
                    ++tempRight;
                    ++tempLeft;
                }
            }
        }
    }

Page 7: Sparse code optimization

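LINKED LIST MATRIX

[Figure: linked-list representation of the 3x3 matrix

    1 1 0
    0 0 0
    1 0 1

The Matrix node points to a list of column heads (ColHeadIndex=1, 2, 3, chained through Next) and a list of row heads (RowHeadIndex=1, 3, chained through Next). Each head points to a Cell; the cells are chained through ColNext and RowNext.]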

Page 8: Sparse code optimization

LINKED LIST MATRIX (CONT’D)

    struct Cell
    {
        float Value;
        int ColIndex;
        int RowIndex;
        struct Cell *RowNext; // Cell in the next row
        struct Cell *ColNext; // Cell in the next column
    };

    struct RowHead
    {
        int RowIndex;
        struct Cell *Cell;
        struct RowHead *Next;
    };

    struct ColHead
    {
        int ColIndex;
        struct Cell *Cell;
        struct ColHead *Next;
    };

    struct Matrix
    {
        int Dimensions;
        struct ColHead *Col;
        struct RowHead *Row;
    };
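The slides do not show how such a matrix is built. The sketch below is one possible way to do it, under stated assumptions: `matrix_from_dense` and `xcalloc` are hypothetical helpers, the input is a square row-major array, and indices are 0-based (as in the multiplication code, not the 1-based labels of the figure). The struct definitions are repeated so the block compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

struct Cell    { float Value; int ColIndex; int RowIndex;
                 struct Cell *RowNext; struct Cell *ColNext; };
struct RowHead { int RowIndex; struct Cell *Cell; struct RowHead *Next; };
struct ColHead { int ColIndex; struct Cell *Cell; struct ColHead *Next; };
struct Matrix  { int Dimensions; struct ColHead *Col; struct RowHead *Row; };

/* Hypothetical helper: zero-filled allocation that aborts on failure.
 * Relies on all-bits-zero being a null pointer, which holds on
 * mainstream platforms. */
static void *xcalloc(size_t count, size_t size)
{
    void *p = calloc(count, size);
    if (p == NULL) abort();
    return p;
}

/* Hypothetical helper: build the linked-list form of an n x n dense,
 * row-major matrix. Only non-zero entries get a Cell; empty rows and
 * columns get no head. */
struct Matrix matrix_from_dense(const float *dense, int n)
{
    assert(n > 0);
    struct Matrix m = { n, NULL, NULL };
    struct Cell **colFirst = xcalloc((size_t)n, sizeof *colFirst);
    struct Cell **colLast  = xcalloc((size_t)n, sizeof *colLast);
    struct RowHead **rowLink = &m.Row;  /* where the next row head is stored */
    struct ColHead **colLink = &m.Col;  /* where the next column head is stored */

    for (int r = 0; r < n; ++r) {
        struct Cell **cellLink = NULL;  /* where the next cell of this row is stored */
        for (int c = 0; c < n; ++c) {
            float v = dense[(size_t)r * (size_t)n + (size_t)c];
            if (v == 0.0f) continue;
            struct Cell *cell = xcalloc(1, sizeof *cell);
            cell->Value = v;
            cell->RowIndex = r;
            cell->ColIndex = c;
            if (cellLink == NULL) {     /* first non-zero in this row */
                struct RowHead *head = xcalloc(1, sizeof *head);
                head->RowIndex = r;
                *rowLink = head;
                rowLink = &head->Next;
                cellLink = &head->Cell;
            }
            *cellLink = cell;           /* append along the row */
            cellLink = &cell->ColNext;
            if (colLast[c] == NULL) colFirst[c] = cell;
            else colLast[c]->RowNext = cell;  /* append down the column */
            colLast[c] = cell;
        }
    }
    for (int c = 0; c < n; ++c) {       /* heads for non-empty columns only */
        if (colFirst[c] == NULL) continue;
        struct ColHead *head = xcalloc(1, sizeof *head);
        head->ColIndex = c;
        head->Cell = colFirst[c];
        *colLink = head;
        colLink = &head->Next;
    }
    free(colFirst);
    free(colLast);
    return m;
}
```

For the 3x3 example matrix this yields row heads for rows 0 and 2 and column heads for columns 0, 1 and 2. A matching routine that frees the structure is omitted for brevity.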

Page 9: Sparse code optimization

LINKED LIST MATRIX (CONT’D)

Matrix multiplication using linked lists

    void MatrixMultiply(struct Matrix left, float **right, float **result, int rightWidth)
    {
        struct RowHead *leftRow = left.Row;
        struct Cell *leftCell;
        int dimensions = left.Dimensions;
        int col, row, x;
        for( col = 0; col < rightWidth; ++col ) {
            leftRow = left.Row;
            for( row = 0; row < dimensions; ++row ) {
                if( leftRow != NULL && leftRow->RowIndex < row )
                    leftRow = leftRow->Next;
                if( leftRow != NULL && leftRow->RowIndex == row ) {
                    leftCell = leftRow->Cell;
                    for( x = 0; x < dimensions; ++x ) {
                        if( leftCell != NULL && leftCell->ColIndex < x )
                            leftCell = leftCell->ColNext;
                        if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
                            result[row][col] += leftCell->Value * right[x][col];
                        }
                    }
                }
            }
        }
    }

Page 10: Sparse code optimization

LINKED LIST MATRIX (CONT’D)

Alternative matrix multiplication

    int MatrixMultiplyAlternative(struct Matrix left, float **right, float **result, int rightWidth)
    {
        struct RowHead *leftRow = left.Row;
        struct Cell *leftCell;
        int dimensions = left.Dimensions;
        int col;
        for( col = 0; col < rightWidth; ++col ) {
            leftRow = left.Row;
            while( leftRow != NULL ) {
                leftCell = leftRow->Cell;
                while( leftCell != NULL ) {
                    result[leftRow->RowIndex][col] += leftCell->Value * right[leftCell->ColIndex][col];
                    leftCell = leftCell->ColNext;
                }
                leftRow = leftRow->Next;
            }
        }
        return 0;
    }

Page 11: Sparse code optimization

TRANSFORMATION

- The goal: remove all references to the linked list from the loop
- The means: move linked list references into an initialization loop
- Initialization copies linked list contents into an array
- Transformed loop uses the array
- Two methods: sublimation and annihilation
- Must be done automatically
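The idea can be sketched on a structure simpler than the matrix. The example below is an illustration only, not code from the slides: `struct Node`, `dot_list` and `dot_sublimated` are hypothetical names, and the list is assumed to hold the non-zero entries of a sparse vector with `0 <= Index < n`. The second function is written in the style of sublimation: an initialization loop copies the list into a zero-filled array, and the main loop no longer touches any pointer.

```c
#include <stdlib.h>

/* Hypothetical sparse vector: one node per stored entry; indices that
 * have no node are treated as zero. */
struct Node { int Index; float Value; struct Node *Next; };

/* Original form: the computation walks the linked list directly. */
float dot_list(const struct Node *head, const float *dense)
{
    float sum = 0.0f;
    for (const struct Node *p = head; p != NULL; p = p->Next)
        sum += p->Value * dense[p->Index];
    return sum;
}

/* Transformed form: all list references live in the initialization
 * loop; the main loop only uses arrays. */
float dot_sublimated(const struct Node *head, const float *dense, int n)
{
    if (n <= 0) return 0.0f;
    float *values = calloc((size_t)n, sizeof *values);  /* fill-in value 0 */
    if (values == NULL) abort();

    /* initialization: copy list contents into the array */
    for (const struct Node *p = head; p != NULL; p = p->Next)
        values[p->Index] = p->Value;

    /* main loop: dense, pointer-free, open to further restructuring */
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += values[i] * dense[i];

    free(values);
    return sum;
}
```

The main loop now runs over all `n` positions rather than only the stored entries, which is the price paid for the fill-in.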

Page 12: Sparse code optimization

SUBLIMATION

Transforming the innermost loop

    for( x = 0; x < dimensions; ++x ) {
        if( leftCell != NULL && leftCell->ColIndex < x )
            leftCell = leftCell->ColNext;
        if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
            result[row][col] += leftCell->Value * right[x][col];
        }
    }

Page 13: Sparse code optimization

SUBLIMATION (CONT’D)

Initialization

    leftCellArray = malloc(sizeof(float) * dimensions);
    for( x = 0; x < dimensions; ++x ) {
        if( leftCell != NULL && leftCell->ColIndex < x )
            leftCell = leftCell->ColNext;
        if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
            leftCellArray[x] = leftCell->Value;
        }
        else
            leftCellArray[x] = 0;
    }

Transformed main loop

    for( x = 0; x < dimensions; ++x ) {
        result[row][col] += leftCellArray[x] * right[x][col];
    }

Page 14: Sparse code optimization

SUBLIMATION (CONT’D)

Transforming the inner loop (alternative)

    while( leftCell != NULL ) {
        result[leftRow->RowIndex][col] += leftCell->Value * right[leftCell->ColIndex][col];
        leftCell = leftCell->ColNext;
    }

Initialization

    leftCellArray = malloc(sizeof(float) * dimensions);
    memset(leftCellArray, 0, sizeof(float) * dimensions);
    while( leftCell != NULL ) {
        leftCellArray[leftCell->ColIndex] = leftCell->Value;
        leftCell = leftCell->ColNext;
    }

Transformed main loop

    for( leftCellCounter = 0; leftCellCounter < dimensions; ++leftCellCounter ) {
        result[leftRow->RowIndex][col] += leftCellArray[leftCellCounter] * right[leftCellCounter][col];
    }

Page 15: Sparse code optimization

LOOP EXTRACTION

Putting it in context

    for( row = 0; row < dimensions; ++row ) {
        if( leftRow != NULL && leftRow->RowIndex < row )
            leftRow = leftRow->Next;
        if( leftRow != NULL && leftRow->RowIndex == row ) {
            leftCell = leftRow->Cell;

            /* initialization */
            leftCellArray = malloc(sizeof(float) * dimensions);
            for( x = 0; x < dimensions; ++x ) {
                if( leftCell != NULL && leftCell->ColIndex < x )
                    leftCell = leftCell->ColNext;
                if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
                    leftCellArray[x] = leftCell->Value;
                }
                else
                    leftCellArray[x] = 0;
            }

            /* main loop */
            for( x = 0; x < dimensions; ++x ) {
                result[row][col] += leftCellArray[x] * right[x][col];
            }

            free(leftCellArray);
        }
    }

Page 16: Sparse code optimization

LOOP EXTRACTION (CONT’D)

Initialization

    for( row = 0; row < dimensions; ++row ) {
        if( leftRow != NULL && leftRow->RowIndex < row )
            leftRow = leftRow->Next;

        /* each row holds floats, so the element size is sizeof(float) */
        leftCellArrayArray[row] = malloc(sizeof(float) * dimensions);
        memset(leftCellArrayArray[row], 0, sizeof(float) * dimensions);

        if( leftRow != NULL && leftRow->RowIndex == row ) {
            leftCell = leftRow->Cell;
            for( x = 0; x < dimensions; ++x ) {
                if( leftCell != NULL && leftCell->ColIndex < x )
                    leftCell = leftCell->ColNext;
                if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
                    leftCellArrayArray[row][x] = leftCell->Value;
                }
                else
                    leftCellArrayArray[row][x] = 0;
            }
        }
    }

Page 17: Sparse code optimization

LOOP EXTRACTION (CONT’D)

Transformed main loop

    for( row = 0; row < dimensions; ++row ) {
        for( x = 0; x < dimensions; ++x ) {
            result[row][col] += leftCellArrayArray[row][x] * right[x][col];
        }
    }

Page 18: Sparse code optimization

LOOP EXTRACTION (CONT’D)

Putting it in context (alternative)

    while( leftRow != NULL ) {
        leftCell = leftRow->Cell;

        leftCellArray = malloc(sizeof(float) * dimensions);
        memset(leftCellArray, 0, sizeof(float) * dimensions);
        while( leftCell != NULL ) {
            leftCellArray[leftCell->ColIndex] = leftCell->Value;
            leftCell = leftCell->ColNext;
        }

        for( leftCellCounter = 0; leftCellCounter < dimensions; ++leftCellCounter ) {
            result[leftRow->RowIndex][col] += leftCellArray[leftCellCounter] * right[leftCellCounter][col];
        }

        free(leftCellArray);

        leftRow = leftRow->Next;
    }

Page 19: Sparse code optimization

LOOP EXTRACTION (CONT’D)

Initialization (alternative)

    leftCellArrayArray = malloc(sizeof(float*) * dimensions);
    for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter ) {
        leftCellArrayArray[leftRowCounter] = malloc(dimensions * sizeof(float));
        memset(leftCellArrayArray[leftRowCounter], 0, dimensions * sizeof(float));

        if( leftRow != NULL && leftRowCounter == leftRow->RowIndex ) {
            leftCell = leftRow->Cell;
            while( leftCell != NULL ) {
                leftCellArrayArray[leftRowCounter][leftCell->ColIndex] = leftCell->Value;
                leftCell = leftCell->ColNext;
            }
            leftRow = leftRow->Next;
        }
    }

Transformed main loop

    for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter ) {
        for( leftCellCounter = 0; leftCellCounter < dimensions; ++leftCellCounter ) {
            result[leftRowCounter][col] += leftCellArrayArray[leftRowCounter][leftCellCounter] * right[leftCellCounter][col];
        }
    }

Page 20: Sparse code optimization

LOOP EXTRACTION (CONT’D)

Once more, in context

Page 21: Sparse code optimization

    for( col = 0; col < rightWidth; ++col ) {
        leftRow = left.Row;

        /* initialization */
        leftCellArrayArray = malloc(sizeof(float*) * dimensions);
        for( row = 0; row < dimensions; ++row ) {
            if( leftRow != NULL && leftRow->RowIndex < row )
                leftRow = leftRow->Next;
            /* each row holds floats, so the element size is sizeof(float) */
            leftCellArrayArray[row] = malloc(sizeof(float) * dimensions);
            memset(leftCellArrayArray[row], 0, sizeof(float) * dimensions);
            if( leftRow != NULL && leftRow->RowIndex == row ) {
                leftCell = leftRow->Cell;
                for( x = 0; x < dimensions; ++x ) {
                    if( leftCell != NULL && leftCell->ColIndex < x )
                        leftCell = leftCell->ColNext;
                    if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
                        leftCellArrayArray[row][x] = leftCell->Value;
                    }
                    else
                        leftCellArrayArray[row][x] = 0;
                }
            }
        }

        /* main loop */
        for( row = 0; row < dimensions; ++row ) {
            for( x = 0; x < dimensions; ++x ) {
                result[row][col] += leftCellArrayArray[row][x] * right[x][col];
            }
        }

        for( row = 0; row < dimensions; ++row )
            free(leftCellArrayArray[row]);
        free(leftCellArrayArray);
    }

Page 22: Sparse code optimization

    /* column bound is rightWidth, as in MatrixMultiplyAlternative */
    for( col = 0; col < rightWidth; ++col ) {
        leftRow = left.Row;

        /* initialization */
        leftCellArrayArray = malloc(sizeof(float*) * dimensions);
        for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter ) {
            leftCellArrayArray[leftRowCounter] = malloc(dimensions * sizeof(float));
            memset(leftCellArrayArray[leftRowCounter], 0, dimensions * sizeof(float));
            if( leftRow != NULL && leftRowCounter == leftRow->RowIndex ) {
                leftCell = leftRow->Cell;
                while( leftCell != NULL ) {
                    leftCellArrayArray[leftRowCounter][leftCell->ColIndex] = leftCell->Value;
                    leftCell = leftCell->ColNext;
                }
                leftRow = leftRow->Next;
            }
        }

        /* main loop: indexed by leftRowCounter, since leftRow has already
           been advanced past the end of the list by the initialization */
        for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter ) {
            for( leftCellCounter = 0; leftCellCounter < dimensions; ++leftCellCounter ) {
                result[leftRowCounter][col] += leftCellArrayArray[leftRowCounter][leftCellCounter] * right[leftCellCounter][col];
            }
        }

        for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter )
            free(leftCellArrayArray[leftRowCounter]);
        free(leftCellArrayArray);
    }

Page 23: Sparse code optimization


TRANSFORMATION RESULT

Page 24: Sparse code optimization

    void MatrixMultiplySublimation(struct Matrix left, float **right, float **result, int rightWidth)
    {
        struct RowHead *leftRow = left.Row;
        struct Cell *leftCell;
        float **leftCellArrayArray;   /* generated declaration */
        int dimensions = left.Dimensions;
        int col, row, x;

        /* initialization */
        leftCellArrayArray = malloc(sizeof(float*) * dimensions);
        for( row = 0; row < dimensions; ++row ) {
            if( leftRow != NULL && leftRow->RowIndex < row )
                leftRow = leftRow->Next;
            /* each row holds floats, so the element size is sizeof(float) */
            leftCellArrayArray[row] = malloc(sizeof(float) * dimensions);
            memset(leftCellArrayArray[row], 0, sizeof(float) * dimensions);
            if( leftRow != NULL && leftRow->RowIndex == row ) {
                leftCell = leftRow->Cell;
                for( x = 0; x < dimensions; ++x ) {
                    if( leftCell != NULL && leftCell->ColIndex < x )
                        leftCell = leftCell->ColNext;
                    if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
                        leftCellArrayArray[row][x] = leftCell->Value;
                    }
                    else
                        leftCellArrayArray[row][x] = 0;
                }
            }
        }

        /* main loop */
        for( col = 0; col < rightWidth; ++col ) {
            for( row = 0; row < dimensions; ++row ) {
                for( x = 0; x < dimensions; ++x ) {
                    result[row][col] += leftCellArrayArray[row][x] * right[x][col];
                }
            }
        }

        for( row = 0; row < dimensions; ++row )
            free(leftCellArrayArray[row]);
        free(leftCellArrayArray);
    }

Page 25: Sparse code optimization

    void MatrixMultiplyAlternativeSublimation(struct Matrix left, float **right, float **result, int rightWidth)
    {
        struct RowHead *leftRow = left.Row;
        struct Cell *leftCell;
        int dimensions = left.Dimensions;
        int col;
        float **leftCellArrayArray;            /* generated declarations */
        int leftCellCounter, leftRowCounter;

        /* initialization */
        leftCellArrayArray = malloc(sizeof(float*) * dimensions);
        for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter ) {
            leftCellArrayArray[leftRowCounter] = malloc(dimensions * sizeof(float));
            memset(leftCellArrayArray[leftRowCounter], 0, dimensions * sizeof(float));
            if( leftRow != NULL && leftRowCounter == leftRow->RowIndex ) {
                leftCell = leftRow->Cell;
                while( leftCell != NULL ) {
                    leftCellArrayArray[leftRowCounter][leftCell->ColIndex] = leftCell->Value;
                    leftCell = leftCell->ColNext;
                }
                leftRow = leftRow->Next;
            }
        }

        /* main loop */
        for( col = 0; col < rightWidth; ++col ) {
            for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter ) {
                for( leftCellCounter = 0; leftCellCounter < dimensions; ++leftCellCounter ) {
                    result[leftRowCounter][col] += leftCellArrayArray[leftRowCounter][leftCellCounter] * right[leftCellCounter][col];
                }
            }
        }

        for( leftRowCounter = 0; leftRowCounter < dimensions; ++leftRowCounter )
            free(leftCellArrayArray[leftRowCounter]);
        free(leftCellArrayArray);
    }

Page 26: Sparse code optimization

ANNIHILATION

- Alternative method of transformation
- No fill-in: omitted values stay omitted
- Sublimation:
    - Sparse loop: more iterations
    - Semi-dense loop: same number of iterations
- Annihilation:
    - Sparse loop: same number of iterations
    - Semi-dense loop: fewer iterations
- Can require other transformations
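The difference in iteration counts can be seen on a small sparse-vector example. This is an illustrative sketch, not the generated code from the slides: `struct Node` and `dot_annihilated` are hypothetical names, and the array is sized by counting the nodes first, which is a simplification of the growable buffer used in the slides. Only stored entries are packed, and the matching operand of the dense vector is gathered alongside them, so the main loop runs once per stored entry.

```c
#include <stdlib.h>

/* Hypothetical sparse vector: one node per stored entry. */
struct Node { int Index; float Value; struct Node *Next; };

/* Annihilation-style sketch: no fill-in. The number of main-loop
 * iterations is reported through *iterations. */
float dot_annihilated(const struct Node *head, const float *dense, int *iterations)
{
    const struct Node *p;
    int count = 0, k;
    float sum = 0.0f;

    for (p = head; p != NULL; p = p->Next)   /* size of the packed arrays */
        ++count;
    *iterations = count;
    if (count == 0) return 0.0f;

    float *values   = malloc((size_t)count * sizeof *values);
    float *operands = malloc((size_t)count * sizeof *operands);
    if (values == NULL || operands == NULL) abort();

    /* initialization: pack stored values and gather their operands */
    for (p = head, k = 0; p != NULL; p = p->Next, ++k) {
        values[k]   = p->Value;
        operands[k] = dense[p->Index];
    }

    /* main loop: one iteration per stored entry, no pointers */
    for (k = 0; k < count; ++k)
        sum += values[k] * operands[k];

    free(values);
    free(operands);
    return sum;
}
```

For a vector of length 5 with two stored entries, this main loop runs twice, where a zero-filled dense copy would need five iterations.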

Page 27: Sparse code optimization

ANNIHILATION (CONT’D)

Recall the innermost loop

    for( x = 0; x < dimensions; ++x ) {
        if( leftCell != NULL && leftCell->ColIndex < x )
            leftCell = leftCell->ColNext;
        if( leftCell != NULL && leftCell->ColIndex == x && leftCell->RowIndex == row ) {
            result[row][col] += leftCell->Value * right[x][col];
        }
    }

Page 28: Sparse code optimization

ANNIHILATION (CONT’D)

Initialization

    leftCellArraySize = 100;
    leftCellArray = malloc(sizeof(float) * leftCellArraySize);
    newDimensions = 0;
    leftCellCopy = leftCell;
    for( x = 0; x < dimensions; ++x ) {
        if( newDimensions >= leftCellArraySize ) {
            leftCellArraySize *= 2;
            leftCellArray = realloc(leftCellArray, sizeof(float) * leftCellArraySize);
        }
        if( leftCellCopy != NULL && leftCellCopy->ColIndex < x )
            leftCellCopy = leftCellCopy->ColNext;
        if( leftCellCopy != NULL && leftCellCopy->ColIndex == x && leftCellCopy->RowIndex == row ) {
            leftCellArray[newDimensions] = leftCellCopy->Value;
            ++newDimensions;
        }
    }

Page 29: Sparse code optimization

ANNIHILATION (CONT’D)

Initialization (cont’d)

    rightArraySize = 100;
    rightArray = malloc(sizeof(float*) * rightArraySize);
    newDimensions = 0;
    leftCellCopy = leftCell;
    for( x = 0; x < dimensions; ++x ) {
        if( newDimensions >= rightArraySize ) {
            rightArraySize *= 2;
            /* rightArray holds row pointers, so grow it in units of sizeof(float*) */
            rightArray = realloc(rightArray, sizeof(float*) * rightArraySize);
        }
        if( leftCellCopy != NULL && leftCellCopy->ColIndex < x )
            leftCellCopy = leftCellCopy->ColNext;
        if( leftCellCopy != NULL && leftCellCopy->ColIndex == x && leftCellCopy->RowIndex == row ) {
            rightArray[newDimensions] = right[x];
            ++newDimensions;
        }
    }

Page 30: Sparse code optimization

ANNIHILATION (CONT’D)

Transformed main loop

    for( x = 0; x < newDimensions; ++x ) {
        result[row][col] += leftCellArray[x] * rightArray[x][col];
    }

Page 31: Sparse code optimization

ANNIHILATION (CONT’D)

Inner loop (alternative)

    while( leftCell != NULL ) {
        result[leftRow->RowIndex][col] += leftCell->Value * right[leftCell->ColIndex][col];
        leftCell = leftCell->ColNext;
    }

Page 32: Sparse code optimization

ANNIHILATION (CONT’D)

Initialization (alternative)

    /* leftCell */
    leftCellArraySize = 100;
    leftCellArray = malloc(sizeof(float) * leftCellArraySize);
    newDimensions = 0;
    leftCellCopy = leftCell;
    while( leftCellCopy != NULL ) {
        if( newDimensions >= leftCellArraySize ) {
            leftCellArraySize *= 2;
            leftCellArray = realloc(leftCellArray, sizeof(float) * leftCellArraySize);
        }
        leftCellArray[newDimensions] = leftCellCopy->Value;
        ++newDimensions;
        leftCellCopy = leftCellCopy->ColNext;
    }

    /* right */
    rightArraySize = 100;
    rightArray = malloc(sizeof(float*) * rightArraySize);
    newDimensions = 0;
    leftCellCopy = leftCell;
    while( leftCellCopy != NULL ) {
        if( newDimensions >= rightArraySize ) {
            rightArraySize *= 2;
            /* rightArray holds row pointers, so grow it in units of sizeof(float*) */
            rightArray = realloc(rightArray, sizeof(float*) * rightArraySize);
        }
        /* index through the cell being visited, not the unchanged list head */
        rightArray[newDimensions] = right[leftCellCopy->ColIndex];
        ++newDimensions;
        leftCellCopy = leftCellCopy->ColNext;
    }

Page 33: Sparse code optimization

ANNIHILATION (CONT’D)

Transformed main loop (alternative)

    for( leftCellCounter = 0; leftCellCounter < newDimensions; ++leftCellCounter ) {
        result[leftRow->RowIndex][col] += leftCellArray[leftCellCounter] * rightArray[leftCellCounter][col];
    }

Page 34: Sparse code optimization

POST-INITIALIZATION

- Pre-initialization: before the main loop
- Post-initialization: after the main loop
- Needed when an expression that needs to be transformed is written to
- Needs to use index expression
- Fill-in value not needed

Page 35: Sparse code optimization

POST-INITIALIZATION (CONT’D)

Example

    while( node != NULL ) {
        node->Value = node->Value * 2;
        node = node->Next;
    }

Result

    /* pre-initialization */
    nodeArray = malloc(size * sizeof(int));
    nodeCopy = node;
    memset(nodeArray, 0, size * sizeof(int));
    while( nodeCopy != NULL ) {
        /* the array is indexed by the node's index member, not by the pointer */
        nodeArray[nodeCopy->Index] = nodeCopy->Value;
        nodeCopy = nodeCopy->Next;
    }

    /* main loop */
    for( nodeCounter = 0; nodeCounter < size; ++nodeCounter ) {
        nodeArray[nodeCounter] = nodeArray[nodeCounter] * 2;
    }

    /* post-initialization */
    nodeCopy = node;
    while( nodeCopy != NULL ) {
        nodeCopy->Value = nodeArray[nodeCopy->Index];
        nodeCopy = nodeCopy->Next;
    }
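The fragment above leaves out the declarations. A self-contained version might look like the sketch below, under stated assumptions: `struct Node` with an `Index` member and the wrapper `double_values` are hypothetical, and every node is assumed to satisfy `0 <= Index < size`.

```c
#include <stdlib.h>

/* Hypothetical node type. Index is assumed to be the node's position
 * in the dense array, with 0 <= Index < size. */
struct Node { int Value; int Index; struct Node *Next; };

/* Doubles every value in the list via a dense temporary array. */
void double_values(struct Node *node, int size)
{
    struct Node *nodeCopy;
    int nodeCounter;
    int *nodeArray;

    if (size <= 0) return;
    nodeArray = calloc((size_t)size, sizeof *nodeArray);
    if (nodeArray == NULL) abort();

    /* pre-initialization: copy the list into the array */
    for (nodeCopy = node; nodeCopy != NULL; nodeCopy = nodeCopy->Next)
        nodeArray[nodeCopy->Index] = nodeCopy->Value;

    /* main loop: operates on the array only */
    for (nodeCounter = 0; nodeCounter < size; ++nodeCounter)
        nodeArray[nodeCounter] = nodeArray[nodeCounter] * 2;

    /* post-initialization: write the results back into the list */
    for (nodeCopy = node; nodeCopy != NULL; nodeCopy = nodeCopy->Next)
        nodeCopy->Value = nodeArray[nodeCopy->Index];

    free(nodeArray);
}
```

Array slots that have no node are never read back, which is why the fill-in value does not matter in this case.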

Page 36: Sparse code optimization

AUTOMATED TRANSFORMATION

Seven steps

1. Find candidate structures
2. Analyze usage of these structures in the code
3. Determine transformation safety
4. Identify data members
5. Generate dense data structures
6. Transform
7. Loop extraction

Code must be normalized

Page 37: Sparse code optimization

CONDITIONS

- The linked list expression must not have side effects.
- Loop termination control must be trivial.
- The linked list iteration statement must be the only statement in the loop body that modifies the linked list expression.
- The “next” pointer member may not be a data member.
- Any expression, other than the linked list iteration statement, that might be moved to an initialization loop may not have side effects, and may use only constants, loop-invariant values, linked list members and loop control variables.
- If the linked list expression is guarded, it must be possible to move that entire guard, including both the true and false parts, to the initialization loop.
- When performing annihilation on a semi-dense loop, there must be a single guard that covers all statements in the loop body except for the linked list iteration statement and its guard, and statements related to loop control (such as those that increment the counter).
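As an illustration of the condition on the linked list iteration statement, the sketch below contrasts two loops. It reflects one reading of that condition and is not taken from the slides; `struct Node` and both function names are hypothetical. In the first loop `node = node->Next` is the only statement that changes `node`. In the second, an extra data-dependent assignment also changes it, so the list position reached in a given iteration can no longer be derived from the iteration count alone.

```c
#include <stddef.h>

/* Hypothetical list node used only for this illustration. */
struct Node { int Value; struct Node *Next; };

/* Satisfies the condition: one iteration statement, trivial
 * termination test, no other writes to node. */
int sum_all(const struct Node *node)
{
    int sum = 0;
    while (node != NULL) {
        sum += node->Value;
        node = node->Next;          /* the linked list iteration statement */
    }
    return sum;
}

/* Does not satisfy the condition: when the current value is negative
 * and a successor exists, the body steps to the successor before
 * adding, so node is modified in two places. */
int sum_skipping_negatives(const struct Node *node)
{
    int sum = 0;
    while (node != NULL) {
        if (node->Value < 0 && node->Next != NULL)
            node = node->Next;      /* extra modification of the list expression */
        sum += node->Value;
        node = node->Next;          /* the linked list iteration statement */
    }
    return sum;
}
```

On the list 1, -2, 3 the two functions return different sums, because the second one never adds the -2 node.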

Page 38: Sparse code optimization

TRANSFORMATION DIRECTIVES

- Fill in gaps in the compiler’s knowledge
- Embedded in source code as comments
- Examples:
    - SAFE_CODE, UNSAFE_CODE
    - SAFE_LOOP, UNSAFE_LOOP
    - DENSE_INDEX
    - DENSE_DIMENSIONS
    - FILL_IN
    - Etc.

Page 39: Sparse code optimization

TRANSFORMATION DIRECTIVES (CONT’D)

Example

    /***SAFE_CODE***/
    /***DENSE_INDEX(node, node->Index)***/
    /***DENSE_DIMENSION(node, size)***/
    while( node != NULL ) {
        node->Value = node->Value * 2;
        node = node->Next;
    }
    /***UNSAFE_CODE***/

Page 40: Sparse code optimization

EXPERIMENTATION

- Tested on three matrices
- Sublimation code: ran through MT1
- Annihilation code: loop interchange
- Used tools: Intel C Compiler, Intel FORTRAN Compiler
- Test system: Dual Intel Xeon 3.06GHz, 1GB RAM

Page 41: Sparse code optimization

EXPERIMENTATION (CONT’D)

                          sherman3     e40r5000       af23560
    Size                  5005x5005    17281x17281    23560x23560
    Non-zero elements     20033        553965         484256
    Density               0.080%       0.186%         0.087%

    Results (seconds)
    Original algorithm    22.813 s     266.404 s      443.982 s
    Alternative algorithm 2.138 s      46.108 s       29.940 s
    Annihilation          0.195 s      1.742 s        2.744 s
    MT1 (Fortran)         0.095 s      5.228 s        2.216 s
    MT1 (C)               0.124 s      5.153 s        3.314 s
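The density row and the relative speedups follow directly from the other figures in the table. The helpers below are a small sketch for redoing that arithmetic; `density_percent` and `speedup` are hypothetical names, not part of the original work.

```c
/* Percentage of entries that are non-zero in an n x n matrix. */
double density_percent(long nonZero, long n)
{
    return 100.0 * (double)nonZero / ((double)n * (double)n);
}

/* How many times faster the optimized run is than the baseline run. */
double speedup(double baselineSeconds, double optimizedSeconds)
{
    return baselineSeconds / optimizedSeconds;
}
```

For example, `density_percent(20033, 5005)` comes out at about 0.080, matching the sherman3 column, and `speedup(22.813, 0.195)` is roughly 117, the ratio between the original algorithm and annihilation on that matrix.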

Page 42: Sparse code optimization

EXPERIMENTATION (CONT’D)

[Figure: chart comparing the Alternative algorithm, Annihilation, MT1 (Fortran) and MT1 (C) on sherman3, e40r5000 and af23560; vertical axis ticks from 0 to 300]
