Sparse Matrix Dense Vector Multiplication by Pedro A. Escallon Parallel Processing Class Florida Institute of Technology April 2002


Page 1

Sparse Matrix Dense Vector Multiplication

by Pedro A. Escallon

Parallel Processing Class
Florida Institute of Technology

April 2002

Page 2

The Problem

• Improve the speed of sparse matrix-dense vector multiplication using MPI on a Beowulf parallel computer.

Page 3

What To Improve

• Current algorithms use excessive indirect addressing

• Current optimizations depend on the structure of the matrix (distribution of the nonzero elements)

Page 4

Sparse Matrix Representations

• Coordinate format

• Compressed Sparse Row (CSR)

• Compressed Sparse Column (CSC)

• Modified Sparse Row (MSR)

Page 5

Compressed Sparse Row (CSR)

A =
    0    A01  A02  0
    0    A11  0    A13
    A20  0    0    0

rS  = 0 2 4 5                  (start of each row in ndx and val, plus the end)
ndx = 1 2 1 3 0                (column index of each nonzero)
val = A01 A02 A11 A13 A20      (nonzero values, row by row)

Page 6

CSR Code

void sparseMul(int m, double *val, int *ndx, int *rS, double *x, double *y)
{
    int i, j;
    /* y is accumulated into, so the caller must initialize it */
    for (i = 0; i < m; i++) {
        for (j = rS[i]; j < rS[i+1]; j++) {
            /* indirect addressing: x is indexed through ndx */
            y[i] += (*val++) * x[*ndx++];
        }
    }
}

Page 7

Goals

• Eliminate indirect addressing

• Remove the dependency on the distribution of the nonzero elements

• Further compress the matrix storage

• Most of all, speed up the operation

Page 8

Proposed Solution

A =
    0    A01  A02  0
    0    A11  0    A13
    A20  0    0    0

{0,0} {1,A01} {2,A02} {-1,0} {1,A11} {3,A13} {-2,A20}

Page 9

Data Structure

typedef struct {
    int rCol;
    double val;
} dSparS_t;

{rCol,val}

Page 10

Process

[Figure: A, of hdr.size elements, divided among the processes in blocks of local_size elements, leaving a residual of fewer than p elements]

local_size = hdr.size / p
residual = hdr.size % p
residual < p

Page 11

Scatter

[Figure: A divided into blocks 0, 1, 2, …, p of local_size elements each; each block is scattered into the local_A of the corresponding process]

Page 12

Multiplication Code

/* first element: a row marker (rCol <= 0) pairs with X[0] */
if ((index = local_A[0].rCol) > 0)
    local_Y[0].val = local_A[0].val * X[index];
else
    local_Y[0].val = local_A[0].val * X[0];
local_Y[0].rCol = -1;
k = 1; h = 0;
while (k < local_size) {
    /* bounds check first, so local_A[k] is never read past the end */
    while ((k < local_size) && (0 < (index = local_A[k].rCol)))
        local_Y[h].val += local_A[k++].val * X[index];
    if (k < local_size) {
        /* row marker: label the finished row, start the next one */
        local_Y[h++].rCol = -index - 1;
        local_Y[h].val = local_A[k++].val * X[0];
    }
}
/* two statements, because reading and incrementing h in one
   expression is undefined; assumes h > 0 */
local_Y[h].rCol = local_Y[h-1].rCol + 1;
h++;
while (h < stride) local_Y[h++].rCol = -1;

Page 13

Multiplication

[Figure: local_Y = local_A * X, where local_A has local_size elements, local_Y has stride elements (the range), and X is the domain]

Page 14

Algorithm

local_A      X         Y.val             Y.rCol
{r0,v0}0     X[0]      = X[0]*v00        -
{c1,v1}0     X[c01]    += X[c01]*v01     -
..           ..
{c2,v2}0     X[c02]    += X[c02]*v02     -r1-1
{r1,v0}1     X[0]      = X[0]*v10        -
{c1,v1}1     X[c11]    += X[c11]*v11     -

Page 15

Gather

[Figure: the local_Y blocks of processes 0, 1, 2, …, p and the residual are collected into gatherBuffer; labels: split element, stride, range]

Page 16

Consolidation of Split Rows

[Figure: Y += gatherBuffer; labels: residual, nCols]

Page 17

Results (vavasis3)

vavasis3.rua - Total non-zero values: 1,683,902 - p = 10

      Broadcast Time  Scatter Time  Gather Time  Computation Time
P0    0.103930        2.380285      0.096051     0.012123
P1    0.107588        0.457140      0.012000     0.011504
P2    0.107667        0.706087      0.012022     0.011642
P3    0.103155        0.951814      0.011971     0.011560
P4    0.107644        1.206376      0.012210     0.011536
P5    0.109243        1.452563      0.012032     0.011506
P6    0.108477        1.702571      0.012044     0.011506
P7    0.109446        1.948481      0.012004     0.011658
P8    0.055822        2.208924      0.012079     0.011540
P9    0.059023        2.459900      0.012009     0.011438

Page 18

Results (vavasis3)

vavasis3.rua - Total non-zero values: 1,683,902 - p = 8

      Broadcast Time  Scatter Time  Gather Time  Computation Time
P0    0.089478        2.264316      0.121741     0.014860
P1    0.093083        0.569091      1.711789     0.014105
P2    0.093217        0.866460      1.429352     0.014227
P3    0.091012        1.160591      1.146954     0.014457
P4    0.081719        1.462335      0.865520     0.014365
P5    0.085375        1.756941      0.582353     0.014341
P6    0.085418        2.055651      0.299847     0.014362
P7    0.089087        2.350998      0.017813     0.014728

vavasis3.rua - Total non-zero values: 1,683,902 - p = 1

      Broadcast Time  Scatter Time  Gather Time  Computation Time
P0    0.000002        1.412774      0.033015     0.112132

Page 19

Results (vavasis3)

vavasis3.rua - Total non-zero values: 1,683,902 - p = 4

      Broadcast Time  Scatter Time  Gather Time  Computation Time
P0    0.051980        3.026846      0.217574     0.028587
P1    0.055605        1.725272      1.027928     0.028258
P2    0.055703        2.319343      0.451021     0.028141
P3    0.056422        3.212518      0.018073     0.027988

vavasis3.rua - Total non-zero values: 1,683,902 - p = 2

      Broadcast Time  Scatter Time  Gather Time  Computation Time
P0    0.233200        5.810814      0.426097     0.056334
P1    0.236864        6.521328      0.032125     0.055866

Page 20

Results (vavasis3)

vavasis3.rua - Calculated Results

P     Computation  Speedup   E_p       Gather    C_p
1     0.112132     ---       ---       0.033015  1.294430
2     0.056334     1.990485  0.995243  0.426097  8.563763
4     0.028587     3.922482  0.980621  1.027928  36.957883
8     0.014860     7.545895  0.943237  1.711789  116.194415
10    0.012123     9.249526  0.924953  0.096051  8.923039

Page 21

Results (bayer02)

bayer02.rua - Total non-zero values: 63,679 - p = 10

      Broadcast Time  Scatter Time  Gather Time  Computation Time
P0    0.046136        0.093143      0.011733     0.000926
P1    0.048824        0.018207      0.001567     0.000423
P2    0.048627        0.027146      0.002054     0.000456
P3    0.044416        0.034386      0.002440     0.000445
P4    0.048214        0.046365      0.002457     0.000397
P5    0.048481        0.053511      0.001978     0.000425
P6    0.045666        0.063204      0.002015     0.000467
P7    0.048173        0.070167      0.002440     0.000419
P8    0.033947        0.088532      0.002323     0.000395
P9    0.032110        0.097866      0.001959     0.000479

Page 22

Results (bayer02)

bayer02.rua - Total non-zero values: 63,679 - p = 8

      Broadcast Time  Scatter Time  Gather Time  Computation Time
P0    0.040159        0.103422      0.011810     0.001020
P1    0.042743        0.023353      0.001728     0.000549
P2    0.042709        0.035670      0.001777     0.000607
P3    0.039322        0.047141      0.001738     0.000599
P4    0.041584        0.064024      0.001724     0.000702
P5    0.039229        0.075528      0.001725     0.000568
P6    0.037206        0.089757      0.001733     0.000565
P7    0.039912        0.101267      0.002111     0.000541

bayer02.rua - Total non-zero values: 63,679 - p = 1

      Broadcast Time  Scatter Time  Gather Time  Computation Time
P0    0.000003        0.063824      0.010975     0.006090

Page 23

Results (bayer02)

bayer02.rua - Total non-zero values: 63,679 - p = 4

      Broadcast Time  Scatter Time  Gather Time  Computation Time
P0    0.049680        0.096930      0.018308     0.001888
P1    0.052379        0.048924      0.003765     0.001555
P2    0.051944        0.076405      0.003609     0.001561
P3    0.046413        0.101871      0.003636     0.001528

bayer02.rua - Total non-zero values: 63,679 - p = 2

      Broadcast Time  Scatter Time  Gather Time  Computation Time
P0    0.025494        0.520611      0.008192     0.003445
P1    0.028157        0.504081      0.032848     0.003121

Page 24

Results (bayer02)

bayer02.rua - Calculated Results

P     Computation  Speedup   E_p       Gather    C_p
1     0.006090     ---       ---       0.010975  2.802135
2     0.003445     1.767779  0.883890  0.032848  10.534978
4     0.001888     3.225636  0.806409  0.018308  10.697034
8     0.001020     5.970588  0.746324  0.011810  12.578431
10    0.000926     6.576674  0.657667  0.011733  13.670626

Page 25

Conclusions

• The proposed representation speeds up the matrix calculation

• The handling of the data mismatch before the gather should be improved

• There seems to be a communication penalty for moving structured data

Page 26

Bibliography

• “Optimizing the Performance of Sparse Matrix-Vector Multiplication” dissertation by Eun-Jin Im.

• “Iterative Methods for Sparse Linear Systems” by Yousef Saad

• “Users’ Guide for the Harwell-Boeing Sparse Matrix Collection” by Iain S. Duff