![Page 1: Sparse Matrix Dense Vector Multiplication by Pedro A. Escallon Parallel Processing Class Florida Institute of Technology April 2002](https://reader033.vdocuments.site/reader033/viewer/2022051104/5a4d1b647f8b9ab0599af416/html5/thumbnails/1.jpg)
Sparse Matrix Dense Vector Multiplication
by Pedro A. Escallon
Parallel Processing Class
Florida Institute of Technology
April 2002
The Problem
• Improve the speed of sparse matrix-dense vector multiplication using MPI on a Beowulf parallel computer.
What To Improve
• Current algorithms use excessive indirect addressing
• Current optimizations depend on the structure of the matrix (distribution of the nonzero elements)
Sparse Matrix Representations
• Coordinate format
• Compressed Sparse Row (CSR)
• Compressed Sparse Column (CSC)
• Modified Sparse Row (MSR)
Compressed Sparse Row (CSR)
A =
0 A01 A02 0
0 A11 0 A13
A20 0 0 0
rS = 0 2 4 5
ndx = 1 2 1 3 0
val = A01 A02 A11 A13 A20
CSR Code
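    /* y += A*x for an m-row CSR matrix; row i owns entries rS[i] .. rS[i+1]-1.
       The caller must zero y beforehand. */
    void sparseMul(int m, double *val, int *ndx, int *rS, double *x, double *y)
    {
        int i, j;
        for (i = 0; i < m; i++) {
            for (j = rS[i]; j < rS[i+1]; j++) {
                y[i] += (*val++) * x[*ndx++];
            }
        }
    }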
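As a worked illustration of the CSR layout, the sketch below multiplies the slide's 3x4 example matrix by a vector. The routine `csr_mul`, the helper `csr_demo`, and the numeric values standing in for A01 through A20 are assumptions made for this example; the column indices are read off the positions of the nonzeros in the matrix.

```c
#include <assert.h>

/* CSR product y = A*x, written with explicit indices so the indirect
   access x[ndx[j]] is visible. It overwrites y rather than accumulating. */
void csr_mul(int m, const double *val, const int *ndx, const int *rS,
             const double *x, double *y)
{
    assert(m >= 0);
    for (int i = 0; i < m; i++) {
        double sum = 0.0;
        for (int j = rS[i]; j < rS[i + 1]; j++)
            sum += val[j] * x[ndx[j]];   /* one indirection per nonzero */
        y[i] = sum;
    }
}

/* Hypothetical demo: the 3x4 example with A01=1, A02=2, A11=3, A13=4,
   A20=5 and x = (1, 2, 3, 4). */
void csr_demo(double y[3])
{
    static const double val[] = {1.0, 2.0, 3.0, 4.0, 5.0};
    static const int ndx[] = {1, 2, 1, 3, 0};   /* column of each nonzero */
    static const int rS[] = {0, 2, 4, 5};       /* start of each row in val */
    static const double x[] = {1.0, 2.0, 3.0, 4.0};
    csr_mul(3, val, ndx, rS, x, y);
}
```

Calling `csr_demo(y)` fills `y` with the three row sums; every term costs a lookup through `ndx`, which is the indirect addressing the deck sets out to reduce.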
Goals
• Eliminate indirect addressing
• Remove the dependency on the distribution of the nonzero elements
• Further compress the matrix storage
• Most of all, to speed up the operation
Proposed Solution
A =
0 A01 A02 0
0 A11 0 A13
A20 0 0 0
{0,0} {1,A01} {2,A02} {-1,0} {1,A11} {3,A13} {-2,A20}
Data Structure
    typedef struct {
        int rCol;
        double val;
    } dSparS_t;
{rCol,val}
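The slides do not spell out the encoding rule, so the following is a sketch under an assumption inferred from the example on the Proposed Solution slide: each row starts with a marker element whose `rCol` is the negated row index and whose `val` is the column-0 entry (stored even when it is zero), followed by one `{column, value}` element per nonzero in columns 1 and up. The function name `dsparse_encode` and the dense row-major input are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int rCol;    /* <= 0: start of row -rCol; > 0: column index */
    double val;
} dSparS_t;

/* Hypothetical encoder: packs a dense row-major m x n matrix into out[]
   and returns the number of elements written. out must have room for
   m + (number of nonzeros outside column 0) elements. */
size_t dsparse_encode(const double *a, int m, int n, dSparS_t *out)
{
    assert(m >= 0 && n >= 1);
    size_t k = 0;
    for (int i = 0; i < m; i++) {
        /* Row marker: negated row index plus the column-0 value (may be 0). */
        out[k].rCol = -i;
        out[k].val = a[(size_t)i * n];
        k++;
        for (int j = 1; j < n; j++) {
            double v = a[(size_t)i * n + j];
            if (v != 0.0) {
                out[k].rCol = j;
                out[k].val = v;
                k++;
            }
        }
    }
    return k;
}
```

Under this reading the 3x4 example produces seven elements, one per nonzero outside column 0 plus one marker per row, with no separate row-start array.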
Process
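[Figure: A, holding hdr.size elements, is divided among the processes (blocks labeled 0, 1, …, p) into blocks of local_size elements; the leftover elements form the residual.]
local_size = hdr.size / p
residual = hdr.size % p
residual < p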
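The block sizes on this slide amount to an integer division and its remainder. A minimal sketch follows; the type `partition_t` and the function `partition` are names introduced here for illustration, and `size` stands in for `hdr.size`.

```c
#include <assert.h>

/* Hypothetical helper: split `size` elements evenly across `p` processes. */
typedef struct {
    int local_size;  /* elements per process: size / p */
    int residual;    /* leftover elements: size % p, always < p */
} partition_t;

partition_t partition(int size, int p)
{
    assert(size >= 0 && p > 0);
    partition_t part;
    part.local_size = size / p;
    part.residual = size % p;
    return part;
}
```

For instance, 23 elements over 4 processes give blocks of 5 with a residual of 3, which has to be handled outside the equal-sized scatter.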
Scatter
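[Figure: A is scattered in blocks of local_size elements to the processes (labeled 0, 1, 2, …, p); each process receives its block in local_A.]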
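Multiplication Code
    /* First element: it either continues a row begun on the previous
       process (rCol > 0) or starts a new row (its value multiplies X[0]). */
    if ((index = local_A[0].rCol) > 0)
        local_Y[0].val = local_A[0].val * X[index];
    else
        local_Y[0].val = local_A[0].val * X[0];
    local_Y[0].rCol = -1;
    k = 1; h = 0;
    while (k < local_size) {
        /* Accumulate the current row; test k before reading local_A[k]. */
        while ((k < local_size) && (0 < (index = local_A[k].rCol)))
            local_Y[h].val += local_A[k++].val * X[index];
        if (k < local_size) {
            /* Row marker: label the finished row, then start the next one. */
            local_Y[h++].rCol = -index - 1;
            local_Y[h].val = local_A[k++].val * X[0];
        }
    }
    /* Label the last row (assumes h >= 1), then mark unused slots with -1. */
    local_Y[h].rCol = local_Y[h - 1].rCol + 1;
    h++;
    while (h < stride)
        local_Y[h++].rCol = -1;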
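To isolate the core of the loop above from the bookkeeping for split rows, here is a serial sketch over a whole encoded array. It is a simplification rather than the parallel routine from the slide: the function name `dsparse_mul` is hypothetical, and it assumes that the first element is a row marker and that a non-positive `rCol` encodes the negated row index.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int rCol;    /* <= 0: start of row -rCol; > 0: column index */
    double val;
} dSparS_t;

/* Serial sketch: y = A*x for an array a[0..n-1] in the proposed format.
   Assumes a[0] is a row marker and y has one slot per row. Rows that
   never appear in a[] are left untouched. */
void dsparse_mul(const dSparS_t *a, size_t n, const double *x, double *y)
{
    assert(n == 0 || a[0].rCol <= 0);
    int row = 0;
    for (size_t k = 0; k < n; k++) {
        if (a[k].rCol <= 0) {
            /* Row marker: its value is the column-0 entry of the new row. */
            row = -a[k].rCol;
            y[row] = a[k].val * x[0];
        } else {
            y[row] += a[k].val * x[a[k].rCol];
        }
    }
}
```

A single pass over one array drives both the row changes and the accumulation, so no separate row-start array is consulted.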
Multiplication
[Figure: local_A (local_size elements) * X = local_Y (stride elements); the remaining labels read "domain" and "range".]
Algorithm

| local_A | X | Y.val | Y.rCol |
| --- | --- | --- | --- |
| {r0,v0}0 | X[0] | =X[0]*v00 | - |
| {c1,v1}0 | X[c01] | +=X[c01]*v01 | - |
| {c2,v2}0 | X[c02] | +=X[c02]*v02 | -r1-1 |
| {r1,v0}1 | X[0] | =X[0]*v10 | - |
| {c1,v1}1 | X[c11] | +=X[c11]*v11 | - |
| .. | .. | | |
Gather
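[Figure: the local_Y blocks of the processes (labeled 0, 1, 2, …, p), each stride elements long, are gathered into gatherBuffer; labels mark the residual, a split element, and the range.]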
Consolidation of Split Rows
[Figure: the entries of gatherBuffer, together with the residual, are accumulated (+=) into Y; the figure labels Y with nCols.]
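One plausible reading of this step, sketched below, is that every gathered entry carries the row it belongs to in `rCol`, unused slots carry -1, and the pieces of a row split across two processes are simply added into the same slot of Y. The function `consolidate` and its signature are assumptions for illustration, not the author's routine.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int rCol;    /* row index of this partial sum, or -1 for padding */
    double val;
} dSparS_t;

/* Hypothetical consolidation: add every labeled partial sum in buf[0..n-1]
   into y[0..rows-1]. Entries with rCol < 0 are padding and are skipped.
   The caller zeroes y first. */
void consolidate(const dSparS_t *buf, size_t n, double *y, int rows)
{
    for (size_t k = 0; k < n; k++) {
        int row = buf[k].rCol;
        if (row < 0)
            continue;
        assert(row < rows);
        y[row] += buf[k].val;  /* pieces of a split row land in the same slot */
    }
}
```

In the test case used to check this sketch, row 1 arrives as two partial sums (6 and 16) from neighbouring blocks and ends up as 22 in Y.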
Results (vavasis3)
vavasis3.rua - Total non-zero values: 1,683,902 - p = 10

| Process | Broadcast Time | Scatter Time | Gather Time | Computation Time |
| --- | --- | --- | --- | --- |
| P0 | 0.103930 | 2.380285 | 0.096051 | 0.012123 |
| P1 | 0.107588 | 0.457140 | 0.012000 | 0.011504 |
| P2 | 0.107667 | 0.706087 | 0.012022 | 0.011642 |
| P3 | 0.103155 | 0.951814 | 0.011971 | 0.011560 |
| P4 | 0.107644 | 1.206376 | 0.012210 | 0.011536 |
| P5 | 0.109243 | 1.452563 | 0.012032 | 0.011506 |
| P6 | 0.108477 | 1.702571 | 0.012044 | 0.011506 |
| P7 | 0.109446 | 1.948481 | 0.012004 | 0.011658 |
| P8 | 0.055822 | 2.208924 | 0.012079 | 0.011540 |
| P9 | 0.059023 | 2.459900 | 0.012009 | 0.011438 |
Results (vavasis3)
vavasis3.rua - Total non-zero values: 1,683,902 - p = 8

| Process | Broadcast Time | Scatter Time | Gather Time | Computation Time |
| --- | --- | --- | --- | --- |
| P0 | 0.089478 | 2.264316 | 0.121741 | 0.014860 |
| P1 | 0.093083 | 0.569091 | 1.711789 | 0.014105 |
| P2 | 0.093217 | 0.866460 | 1.429352 | 0.014227 |
| P3 | 0.091012 | 1.160591 | 1.146954 | 0.014457 |
| P4 | 0.081719 | 1.462335 | 0.865520 | 0.014365 |
| P5 | 0.085375 | 1.756941 | 0.582353 | 0.014341 |
| P6 | 0.085418 | 2.055651 | 0.299847 | 0.014362 |
| P7 | 0.089087 | 2.350998 | 0.017813 | 0.014728 |

vavasis3.rua - Total non-zero values: 1,683,902 - p = 1

| Process | Broadcast Time | Scatter Time | Gather Time | Computation Time |
| --- | --- | --- | --- | --- |
| P0 | 0.000002 | 1.412774 | 0.033015 | 0.112132 |
Results (vavasis3)
vavasis3.rua - Total non-zero values: 1,683,902 - p = 4

| Process | Broadcast Time | Scatter Time | Gather Time | Computation Time |
| --- | --- | --- | --- | --- |
| P0 | 0.051980 | 3.026846 | 0.217574 | 0.028587 |
| P1 | 0.055605 | 1.725272 | 1.027928 | 0.028258 |
| P2 | 0.055703 | 2.319343 | 0.451021 | 0.028141 |
| P3 | 0.056422 | 3.212518 | 0.018073 | 0.027988 |

vavasis3.rua - Total non-zero values: 1,683,902 - p = 2

| Process | Broadcast Time | Scatter Time | Gather Time | Computation Time |
| --- | --- | --- | --- | --- |
| P0 | 0.233200 | 5.810814 | 0.426097 | 0.056334 |
| P1 | 0.236864 | 6.521328 | 0.032125 | 0.055866 |
Results (vavasis3)
vavasis3.rua - Calculated Results

| P | Computation | Speedup | E_p | Gather | C_p |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.112132 | --- | --- | 0.033015 | 1.294430 |
| 2 | 0.056334 | 1.990485 | 0.995243 | 0.426097 | 8.563763 |
| 4 | 0.028587 | 3.922482 | 0.980621 | 1.027928 | 36.957883 |
| 8 | 0.014860 | 7.545895 | 0.943237 | 1.711789 | 116.194415 |
| 10 | 0.012123 | 9.249526 | 0.924953 | 0.096051 | 8.923039 |
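The derived columns can be reproduced from the measured times. Speedup and E_p follow the usual definitions (T_1 / T_p, and speedup divided by p). The slides do not define C_p; the tabulated values agree with (computation + gather) / computation, so that reading is assumed in the sketch below. The names `metrics_t` and `metrics` are introduced here for illustration.

```c
#include <assert.h>

typedef struct {
    double speedup;     /* T_1 / T_p */
    double efficiency;  /* speedup / p */
    double c_p;         /* assumed: (computation + gather) / computation */
} metrics_t;

/* t1: computation time on one process; tp: computation time on p processes;
   gather: gather time reported for the same run. */
metrics_t metrics(double t1, double tp, double gather, int p)
{
    assert(t1 > 0.0 && tp > 0.0 && p > 0);
    metrics_t m;
    m.speedup = t1 / tp;
    m.efficiency = m.speedup / p;
    m.c_p = (tp + gather) / tp;
    return m;
}
```

With the p = 2 row of vavasis3 (`metrics(0.112132, 0.056334, 0.426097, 2)`), this reproduces the tabulated speedup, E_p and C_p to within rounding.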
Results (bayer02)
bayer02.rua - Total non-zero values: 63,679 - p = 10

| Process | Broadcast Time | Scatter Time | Gather Time | Computation Time |
| --- | --- | --- | --- | --- |
| P0 | 0.046136 | 0.093143 | 0.011733 | 0.000926 |
| P1 | 0.048824 | 0.018207 | 0.001567 | 0.000423 |
| P2 | 0.048627 | 0.027146 | 0.002054 | 0.000456 |
| P3 | 0.044416 | 0.034386 | 0.002440 | 0.000445 |
| P4 | 0.048214 | 0.046365 | 0.002457 | 0.000397 |
| P5 | 0.048481 | 0.053511 | 0.001978 | 0.000425 |
| P6 | 0.045666 | 0.063204 | 0.002015 | 0.000467 |
| P7 | 0.048173 | 0.070167 | 0.002440 | 0.000419 |
| P8 | 0.033947 | 0.088532 | 0.002323 | 0.000395 |
| P9 | 0.032110 | 0.097866 | 0.001959 | 0.000479 |
Results (bayer02)
bayer02.rua - Total non-zero values: 63,679 - p = 8

| Process | Broadcast Time | Scatter Time | Gather Time | Computation Time |
| --- | --- | --- | --- | --- |
| P0 | 0.040159 | 0.103422 | 0.011810 | 0.001020 |
| P1 | 0.042743 | 0.023353 | 0.001728 | 0.000549 |
| P2 | 0.042709 | 0.035670 | 0.001777 | 0.000607 |
| P3 | 0.039322 | 0.047141 | 0.001738 | 0.000599 |
| P4 | 0.041584 | 0.064024 | 0.001724 | 0.000702 |
| P5 | 0.039229 | 0.075528 | 0.001725 | 0.000568 |
| P6 | 0.037206 | 0.089757 | 0.001733 | 0.000565 |
| P7 | 0.039912 | 0.101267 | 0.002111 | 0.000541 |

bayer02.rua - Total non-zero values: 63,679 - p = 1

| Process | Broadcast Time | Scatter Time | Gather Time | Computation Time |
| --- | --- | --- | --- | --- |
| P0 | 0.000003 | 0.063824 | 0.010975 | 0.006090 |
Results (bayer02)
bayer02.rua - Total non-zero values: 63,679 - p = 4

| Process | Broadcast Time | Scatter Time | Gather Time | Computation Time |
| --- | --- | --- | --- | --- |
| P0 | 0.049680 | 0.096930 | 0.018308 | 0.001888 |
| P1 | 0.052379 | 0.048924 | 0.003765 | 0.001555 |
| P2 | 0.051944 | 0.076405 | 0.003609 | 0.001561 |
| P3 | 0.046413 | 0.101871 | 0.003636 | 0.001528 |

bayer02.rua - Total non-zero values: 63,679 - p = 2

| Process | Broadcast Time | Scatter Time | Gather Time | Computation Time |
| --- | --- | --- | --- | --- |
| P0 | 0.025494 | 0.520611 | 0.008192 | 0.003445 |
| P1 | 0.028157 | 0.504081 | 0.032848 | 0.003121 |
Results (bayer02)
bayer02.rua - Calculated Results

| P | Computation | Speedup | E_p | Gather | C_p |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.006090 | --- | --- | 0.010975 | 2.802135 |
| 2 | 0.003445 | 1.767779 | 0.883890 | 0.032848 | 10.534978 |
| 4 | 0.001888 | 3.225636 | 0.806409 | 0.018308 | 10.697034 |
| 8 | 0.001020 | 5.970588 | 0.746324 | 0.011810 | 12.578431 |
| 10 | 0.000926 | 6.576674 | 0.657667 | 0.011733 | 13.670626 |
Conclusions
• The proposed representation speeds up the matrix calculation
• The handling of the data mismatch before the gather should be improved
• There seems to be a communication penalty for moving structured data
Bibliography
• “Optimizing the Performance of Sparse Matrix-Vector Multiplication” dissertation by Eun-Jin Im.
• “Iterative Methods for Sparse Linear Systems” by Yousef Saad
• “Users’ Guide for the Harwell-Boeing Sparse Matrix Collection” by Iain S. Duff