Parallel Software for SemiDefinite Programming with Sparse Schur Complement Matrix
Makoto Yamashita @ Tokyo-Tech, Katsuki Fujisawa @ Chuo University, Mituhiro Fukuda @ Tokyo-Tech, Yoshiaki Futakata @ University of Virginia, Kazuhiro Kobayashi @ National Maritime Research Institute, Masakazu Kojima @ Tokyo-Tech, Kazuhide Nakata @ Tokyo-Tech, Maho Nakata @ RIKEN
ISMP 2009 @ Chicago [2009/08/26]
Extremely Large SDPs Arising from Various Fields
Quantum chemistry, sensor network problems, polynomial optimization problems.
Most of the computation time is spent on the Schur complement matrix (SCM).
[SDPARA] Parallel computation for the SCM, in particular a sparse SCM.
Outline
1. Semidefinite programming and the Schur complement matrix
2. Parallel implementation
3. Parallelism for the sparse Schur complement
4. Numerical results
5. Future works
Standard form of SDP
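The slide states the standard form. For reference, one common way to write the primal-dual pair (generic textbook notation, which may differ from the symbols used on the slide) is

$$
\begin{aligned}
\text{(P)}\quad & \min_{X}\ C \bullet X \quad \text{s.t.}\ A_p \bullet X = b_p \ (p = 1,\dots,m),\ X \succeq 0,\\
\text{(D)}\quad & \max_{y,\,Z}\ b^{\mathsf T} y \quad \text{s.t.}\ \sum_{p=1}^{m} y_p A_p + Z = C,\ Z \succeq 0,
\end{aligned}
$$

where $C$, $A_p$, $X$, $Z$ are symmetric $n \times n$ matrices and $A \bullet B = \mathrm{Tr}(A^{\mathsf T} B)$.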
Primal-Dual Interior-Point Methods
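As standard background (not specific to this talk): a primal-dual interior-point method follows the central path defined by the perturbed optimality conditions

$$
A_p \bullet X = b_p \ (p = 1,\dots,m),\qquad \sum_{p=1}^{m} y_p A_p + Z = C,\qquad X Z = \mu I,\qquad X, Z \succ 0,
$$

and drives the barrier parameter $\mu$ toward zero. Each iteration computes a Newton-type search direction for this system, which leads to the Schur complement equation on the next slide.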
Computation for Search Direction
Schur complement matrix ⇒ Cholesky factorization
Exploitation of sparsity in:
1. ELEMENTS (evaluation of the Schur complement matrix)
2. CHOLESKY (its Cholesky factorization)
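For reference, with the HRVW/KSH/M search direction the linear system for the dual step takes the form below (a standard formula; SDPARA's exact scaling may differ in detail):

$$
B\,\mathrm{d}y = r,\qquad B_{pq} = A_p \bullet \bigl( X A_q Z^{-1} \bigr),\qquad p, q = 1,\dots,m .
$$

ELEMENTS denotes the evaluation of all entries $B_{pq}$ of the Schur complement matrix $B$, and CHOLESKY denotes its Cholesky factorization.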
Bottlenecks on a Single Processor
Apply parallel computation to the bottlenecks.
Times in seconds on an Opteron 246 (2.0 GHz):

|          | LiOH         | HF           |
|----------|--------------|--------------|
| m        | 10592        | 15018        |
| ELEMENTS | 6150 (43%)   | 16719 (35%)  |
| CHOLESKY | 7744 (54%)   | 20995 (44%)  |
| TOTAL    | 14250 (100%) | 47483 (100%) |
SDPARA: the parallel version of SDPA (a generic SDP solver), built on MPI & ScaLAPACK.
Row-wise distribution for ELEMENTS; parallel Cholesky factorization for CHOLESKY.
http://sdpa.indsys.chuo-u.ac.jp/sdpa/
Row-wise distribution for evaluation of the Schur complement matrix
If 4 CPUs are available, each CPU computes only its assigned rows.
No communication between CPUs; efficient memory management.
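A minimal sketch of the row-wise idea (not SDPARA's actual code; the cyclic mapping and the helper compute_schur_row are hypothetical illustrations):

```cpp
#include <mpi.h>
#include <vector>

// Hypothetical helper: evaluates row p of the Schur complement matrix,
// i.e. B[p][q] = A_p • (X A_q Z^{-1}) for q = 0..m-1, into row_buffer.
void compute_schur_row(int p, int m, double* row_buffer) {
    for (int q = 0; q < m; ++q) row_buffer[q] = 0.0;  // placeholder body
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int m = 10000;              // number of constraints (illustrative)
    std::vector<double> row(m);

    // Row-wise distribution: process `rank` evaluates rows
    // p = rank, rank + size, rank + 2*size, ...
    // No inter-process communication is needed during ELEMENTS, and each
    // process stores only the rows it owns.
    for (int p = rank; p < m; p += size) {
        compute_schur_row(p, m, row.data());
        // ... keep `row` in the locally owned part of B ...
    }

    MPI_Finalize();
    return 0;
}
```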
Parallel Cholesky factorization
We adopt ScaLAPACK for the Cholesky factorization of the Schur complement matrix.
We redistribute the matrix from the row-wise distribution to a two-dimensional block-cyclic distribution.
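A rough sketch of the corresponding ScaLAPACK calls (assumptions: the BLACS grid and the two array descriptors have already been set up, e.g. with descinit_, and the redistribution uses ScaLAPACK's Cpdgemr2d; this is an illustration, not SDPARA's source):

```cpp
// Fortran-interface ScaLAPACK routines callable from C/C++.
extern "C" {
// Parallel dense Cholesky factorization of a 2D block-cyclic matrix.
void pdpotrf_(const char* uplo, const int* n, double* a,
              const int* ia, const int* ja, const int* desca, int* info);
// Redistribution between two ScaLAPACK distributions/contexts.
void Cpdgemr2d(int m, int n,
               double* a, int ia, int ja, int* desca,
               double* b, int ib, int jb, int* descb, int ictxt);
}

// B_row : local piece of the SCM in the row-wise distribution (descRow)
// B_2d  : local piece in the 2D block-cyclic distribution (desc2d)
void factorize_scm(int m, double* B_row, int* descRow,
                   double* B_2d, int* desc2d, int ictxt) {
    // 1. Redistribute from the row-wise layout used by ELEMENTS
    //    to the 2D block-cyclic layout required by pdpotrf_.
    Cpdgemr2d(m, m, B_row, 1, 1, descRow, B_2d, 1, 1, desc2d, ictxt);

    // 2. Parallel Cholesky factorization (lower triangular part).
    const int one = 1;
    int info = 0;
    pdpotrf_("L", &m, B_2d, &one, &one, desc2d, &info);
}
```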
Computation time on an SDP from Quantum Chemistry [LiOH]
AIST super cluster, Opteron 246 (2.0 GHz), 6 GB memory/node; times in seconds.

| #processors | 1     | 4    | 16  | 64  |
|-------------|-------|------|-----|-----|
| TOTAL       | 14250 | 3514 | 969 | 414 |
| ELEMENTS    | 6150  | 1654 | 308 | 84  |
| CHOLESKY    | 7744  | 1186 | 357 | 141 |
Scalability on an SDP from Quantum Chemistry [NF]
[Chart: speed-up vs #processors (1 to 64) for TOTAL, ELEMENTS, and CHOLESKY]
Speed-ups: TOTAL 29x, ELEMENTS 63x, CHOLESKY 39x.
Parallelization of ELEMENTS is very effective.
Sparse Schur complement matrix
The Schur complement matrix becomes very sparse for some applications,
so the simple row-wise distribution loses its efficiency.
Examples: from control theory (density 100%) vs. from sensor networks (density 2.12%).
Sparseness of the Schur complement matrix
Many applications have a diagonal block structure.
Exploitation of Sparsity in SDPA
We switch among the formulas F1, F2, and F3 on a row-by-row basis (see the note below).
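For context, each entry of the Schur complement matrix can be written as (stated roughly here, following the SDPA literature; the slide's exact formulas may differ in detail)

$$
B_{pq} = A_p \bullet \bigl( X A_q Z^{-1} \bigr)
       = \sum_{(\alpha,\beta):\,(A_p)_{\alpha\beta} \neq 0} (A_p)_{\alpha\beta}\,\bigl( X A_q Z^{-1} \bigr)_{\beta\alpha} .
$$

Roughly speaking, F1 forms $X A_q Z^{-1}$ with full dense multiplications, F2 forms only the part of that product that is actually needed, and F3 accumulates the sum directly over the nonzero entries of $A_p$ and $A_q$. The cheapest formula is chosen for each row from estimated floating-point operation counts.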
ELEMENTS for a Sparse Schur complement
Example: nonzero costs per row of a sparse SCM are 150 40 30 20 / 135 20 / 70 10 / 50 5 / 30 / 3.
Rows are assigned so that the load on each CPU is balanced (see the sketch below):
CPU1: 190, CPU2: 185, CPU3: 188.
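A minimal sketch of one way such a balanced row assignment could be computed (a greedy heuristic with hypothetical names; the slide does not specify SDPARA's exact rule):

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Greedily assigns each row to the currently least-loaded processor.
// cost[p] is the estimated cost of evaluating row p of the sparse SCM.
std::vector<int> assign_rows(const std::vector<long>& cost, int num_procs) {
    std::vector<int> owner(cost.size());
    // Min-heap of (current load, processor id).
    using Entry = std::pair<long, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (int cpu = 0; cpu < num_procs; ++cpu) heap.push({0L, cpu});

    for (std::size_t p = 0; p < cost.size(); ++p) {
        auto [load, cpu] = heap.top();
        heap.pop();
        owner[p] = cpu;                        // row p is evaluated by this CPU
        heap.push({load + cost[p], cpu});
    }
    return owner;
}
```

The per-CPU loads on the slide (190 / 185 / 188) illustrate the kind of near-even split such an assignment aims for.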
CHOLESKY for a Sparse Schur complement
Parallel sparse Cholesky factorization implemented in MUMPS; MUMPS adopts the multifrontal method.
(Same sparse SCM example as on the previous slide.)
Memory storage on each processor should be consecutive,
and the distribution used for ELEMENTS matches this requirement.
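A very small sketch of the MUMPS C interface, to show what "parallel sparse Cholesky via MUMPS" involves (assumptions: the matrix is passed in coordinate form through the standard DMUMPS_STRUC_C fields and MPI is already initialized; this is illustrative only, not SDPARA's actual integration):

```cpp
#include "dmumps_c.h"   // MUMPS double-precision C interface

// Factorizes a sparse symmetric positive definite matrix given in
// coordinate form (1-based indices irn/jcn, values a) with MUMPS.
// MPI_Init must have been called by the caller.
void sparse_cholesky_mumps(int n, int nnz, int* irn, int* jcn, double* a) {
    DMUMPS_STRUC_C id;
    id.job = -1;                    // initialize a MUMPS instance
    id.par = 1;                     // host process also works
    id.sym = 1;                     // symmetric positive definite
    id.comm_fortran = -987654;      // conventional value for MPI_COMM_WORLD
    dmumps_c(&id);

    id.n   = n;                     // order of the Schur complement matrix
    id.nz  = nnz;                   // number of nonzeros (lower triangle)
    id.irn = irn;
    id.jcn = jcn;
    id.a   = a;

    id.job = 4;                     // analysis + numerical factorization
    dmumps_c(&id);

    id.job = -2;                    // release MUMPS internal data
    dmumps_c(&id);
}
```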
Computation time for SDPs from Polynomial Optimization Problems
[Chart: seconds (log scale) vs #processors (1 to 32) for TOTAL, ELEMENTS, and CHOLESKY]
tsubasa: Xeon E5440 (2.83 GHz), 8 GB memory/node
Parallel sparse Cholesky achieves mild scalability; ELEMENTS attains a 24x speed-up on 32 CPUs.
ELEMENTS Load-balance on 32 CPUs
Only the first processor has slightly heavier computation.
[Chart: per-processor time (second) and #distributed elements for each processor number]
Automatic selection of sparse / dense SCM
Dense parallel Cholesky achieves higher scalability than sparse parallel Cholesky,
so dense becomes better when many processors are used.
We estimate both computation times from the computational cost and the scalability, and select the faster one (see the sketch below).
[Chart: seconds vs #processors (1 to 32) for auto, dense, and sparse]
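A minimal sketch of what such an automatic selection rule could look like (the cost and scalability models, and all names below, are hypothetical illustrations of "estimate both times, pick the smaller"):

```cpp
enum class CholeskyKind { Dense, Sparse };

// Hypothetical model: predicted time = serial cost / effective speed-up,
// where the effective speed-up grows with #processors but saturates
// earlier for the sparse factorization than for the dense one.
double predicted_time(double serial_flops, double flops_per_sec,
                      int procs, double parallel_efficiency) {
    double speedup = 1.0 + parallel_efficiency * (procs - 1);
    return serial_flops / (flops_per_sec * speedup);
}

CholeskyKind select_cholesky(double dense_flops, double sparse_flops,
                             double flops_per_sec, int procs) {
    // Illustrative efficiencies: dense Cholesky scales better than sparse.
    const double kDenseEff  = 0.90;
    const double kSparseEff = 0.40;
    double t_dense  = predicted_time(dense_flops,  flops_per_sec, procs, kDenseEff);
    double t_sparse = predicted_time(sparse_flops, flops_per_sec, procs, kSparseEff);
    return (t_dense < t_sparse) ? CholeskyKind::Dense : CholeskyKind::Sparse;
}
```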
Sparse/Dense CHOLESKY for a small SDP from POP
[Chart: seconds (log scale) vs #processors (1 to 32) for auto, dense, and sparse]
tsubasa: Xeon E5440 (2.83 GHz), 8 GB memory/node
Only on 4 CPUs did the automatic selection fail (the scalability of sparse Cholesky is unstable on 4 CPUs).
Numerical Results
- Comparison with PCSDP on sensor network problems generated by SFSDP
- Multi-threading on quantum chemistry problems
SDPs from Sensor Networks (time unit: second)

#sensors 1,000 (m = 16,450; density 1.23%)

| #CPU   | 1    | 2    | 4    | 8    | 16   |
|--------|------|------|------|------|------|
| SDPARA | 28.2 | 22.1 | 16.7 | 13.8 | 27.3 |
| PCSDP  | M.O. | 1527 | 887  | 591  | 368  |

#sensors 35,000 (m = 527,096)

| #CPU   | 1    | 2    | 4    | 8    | 16   |
|--------|------|------|------|------|------|
| SDPARA | 1080 | 845  | 614  | 540  | 506  |

PCSDP runs out of memory (M.O.) whenever #sensors >= 4,000.
MPI + Multi-Threading for Quantum Chemistry
Problem: N.4P.DZ.pqgt11t2p (m = 7,230)
[Chart: seconds (log scale) vs #nodes (1 to 16) for PCSDP and for SDPARA with 1, 2, 4, and 8 threads per node]
64x speed-up on 16 nodes × 8 threads.
Concluding Remarks & Future Works
1. New parallel schemes for the sparse Schur complement matrix
2. Reasonable scalability
3. Extremely large-scale SDPs with a sparse Schur complement matrix
Future work: improvement of multi-threading for the sparse Schur complement matrix.