An Efficient Parallel Solver for SDD Linear Systems
Richard Peng, M.I.T.
Joint work with Dan Spielman (Yale)
Efficient Parallel Solvers for SDD Linear Systems
Richard Peng, M.I.T.
Work in progress with Dehua Cheng (USC), Yu Cheng (USC), Yin Tat Lee (MIT), Yan Liu (USC), Dan Spielman (Yale), and Shang-Hua Teng (USC)
OUTLINE
• L_G x = b
• Why is it hard?
• Key Tool
• Parallel Solver
• Other Forms
LARGE GRAPHS
Examples: images, meshes, roads, social networks
Algorithmic challenges:
• How to store?
• How to analyze?
• How to optimize?
GRAPH LAPLACIAN
• Row/column ↔ vertex
• Off-diagonal entry: -(edge weight)
• Diagonal entry: weighted degree
Input: graph Laplacian L, vector b
Output: vector x s.t. Lx ≈ b
Lx = b: n vertices, m edges
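The definition above can be sketched in a few lines of NumPy (the 3-vertex weighted triangle is an illustrative example, not from the talk):

```python
import numpy as np

# Build the Laplacian of a small weighted graph following the slide's
# definition: off-diagonal (u, v) entry = -w(u, v); diagonal = weighted degree.
edges = [(0, 1, 2.0), (1, 2, 1.0), (0, 2, 1.0)]   # (u, v, weight)
n = 3
L = np.zeros((n, n))
for u, v, w in edges:
    L[u, u] += w
    L[v, v] += w
    L[u, v] -= w
    L[v, u] -= w

# Rows sum to zero, so L is singular (hence the pseudoinverse caveat later),
# and L is symmetric positive semidefinite.
assert np.allclose(L.sum(axis=1), 0)
assert np.linalg.eigvalsh(L).min() > -1e-12
```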
THE LAPLACIAN PARADIGM
Lx=b
Directly related: elliptic systems
Few iterations: eigenvectors, heat kernels
Many iterations / modified algorithm: graph problems, image processing
SOLVERS
Lx = b: n × n matrix, m non-zeros
Direct methods: O(n^3), O(n^2.3727)
Iterative methods: O(nm), O(mκ^1/2)
Combinatorial preconditioning:
• [Vaidya `91]: O(m^7/4)
• [Boman-Hendrickson `01]: O(mn)
• [Spielman-Teng `03, `04]: O(m^1.31), O(m log^c n)
• [KMP `10][KMP `11][KOSZ `13][LS `13]
• [CKMPPRX `14]: O(m log^2 n), O(m log^1/2 n)
PARALLEL SPEEDUPS
Speedups by splitting work:
• Time (depth): max # of dependent steps
• Work: # of operations
Common architectures: multicore, MapReduce
Nearly-linear work parallel Laplacian solvers:
• [KM `07]: O(n^(1/6+a)) for planar
• [BGKMPT `11]: O(m^(1/3+a))
OUR RESULT
Input: graph Laplacian L_G with condition number κ
Output: access to an operator Z s.t. Z ≈_ε L_G^-1
Cost: O(log^c1 m · log^c2 κ · log(1/ε)) depth, O(m · log^c1 m · log^c2 κ · log(1/ε)) work
Note: L_G is rank deficient; pseudoinverse details omitted
• Logarithmic dependency on error
• κ ≤ O(n^2 · w_max / w_min)
Extension: sparse approximation of L_G^p for any -1 ≤ p ≤ 1, with poly(1/ε) dependency
SUMMARY
• Would like to solve L_G x = b
• Goal: polylog depth, nearly-linear work
OUTLINE
• L_G x = b
• Why is it hard?
• Key Tool
• Parallel Solver
• Other Forms
EXTREME INSTANCES
Highly connected graphs: need global steps
Long paths / trees: need many steps
Each is easy on its own: iterative methods handle high connectivity, Gaussian elimination handles paths/trees
Solvers must handle both simultaneously
PREVIOUS FAST ALGORITHMS
Combinatorial preconditioning:
• Spectral sparsification: reduce G to a sparser G' (focus of subsequent improvements)
• Low-stretch spanning trees / tree routing: terminate at a spanning tree T
• Local partitioning, tree contraction
Iterative methods (the `driver'):
• Polynomial in L_G L_T^-1; need L_G^-1 L_T = (L_G L_T^-1)^-1
• Horner's method: degree d → O(d log n) depth
• [Spielman-Teng `04]: d ≈ n^1/2
• Fast due to sparser graphs
POLYNOMIAL APPROXIMATIONS
Division via multiplication: (1 - a)^-1 = 1 + a + a^2 + a^3 + a^4 + a^5 + ...
If |a| ≤ ρ, the first κ = (1 - ρ)^-1 terms give a good approximation to (1 - a)^-1
• Spectral theorem: this works for matrices!
• Better: Chebyshev / heavy ball: d = O(κ^1/2) sufficient; optimal ([OSV `12])
There exist G (e.g. the cycle) where κ(L_G L_T^-1) needs to be Ω(n)
Ω(n^1/2) lower bound on depth?
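A numeric sketch of the truncation claim above (ρ and the term counts are illustrative): about κ = (1 - ρ)^-1 terms of the geometric series already give a constant-factor approximation, and the error keeps shrinking geometrically with more terms.

```python
# Truncate the geometric series for (1 - a)^-1 at a = rho, the worst case
# when |a| <= rho; the relative error of a kappa-term truncation is rho^kappa.
rho = 0.99
kappa = round(1.0 / (1.0 - rho))            # kappa = 100
truth = 1.0 / (1.0 - rho)
approx = sum(rho**i for i in range(kappa))  # kappa-term truncation
rel_err = abs(truth - approx) / truth       # = rho^kappa, about e^-1
assert rel_err < 0.5

# 5x more terms drive the relative error down to rho^(5*kappa), about e^-5.
approx5 = sum(rho**i for i in range(5 * kappa))
assert abs(truth - approx5) / truth < 0.01
```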
LOWER BOUND FOR LOWER BOUND
[BGKMPT `11]: O(m^(1/3+a)) via (pseudo)inverse:
• Preprocess: O(log^2 n) depth, O(n^ω) work
• Solve: O(log n) depth, O(n^2) work
• Inverse is dense, expensive to use
• Only used on O(n^1/3)-sized instances
Multiplying by L_G^-1 is highly parallel!
Possible improvement: can we make L_G^-1 sparse?
[George `73][LRT `79]: yes, for planar graphs
SUMMARY
• Would like to solve L_G x = b
• Goal: polylog depth, nearly-linear work
• `Standard' numerical methods have high depth
• Equivalent: sparse inverse representations
Aside: cut approximation / oblivious routing schemes of [Madry `10][Sherman `13][KLOS `13] are parallel, and can be viewed as asynchronous iterative methods
OUTLINE
• L_G x = b
• Why is it hard?
• Key Tool
• Parallel Solver
• Other Forms
DEGREE d POLYNOMIAL ⇒ DEPTH d?
Apply to the power series:
(1 - a)^-1 = 1 + a + a^2 + a^3 + a^4 + a^5 + a^6 + a^7 + ... = (1 + a)(1 + a^2)(1 + a^4)...
• a^16 = (((a^2)^2)^2)^2: four multiplications, not fifteen
• Repeated squaring sidesteps the assumption in the lower bound!
Matrix version: I + A^(2^i)
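A scalar sketch of the factorization above (the values of a and k are illustrative): the product over k squarings equals the 2^k-term partial sum of the series, so the depth drops from 2^k to k.

```python
# (1 + a)(1 + a^2)(1 + a^4)...(1 + a^(2^(k-1))) = 1 + a + ... + a^(2^k - 1),
# computed with k squarings instead of 2^k - 1 sequential multiplications.
a, k = 0.9, 6
product, power = 1.0, a
for _ in range(k):
    product *= 1.0 + power
    power *= power            # repeated squaring: a -> a^2 -> a^4 -> ...

partial_sum = sum(a**i for i in range(2**k))
assert abs(product - partial_sum) < 1e-9
```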
REDUCTION TO (I - A)^-1
• Adjust/rescale so that the diagonal = I
• Add to diag(L) to make it full rank
A: weighted degree < 1; a random walk with |A| < 1
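A minimal sketch of this reduction (the triangle graph and the slack factor 1.1 are illustrative assumptions, not from the talk): add a little to the diagonal for full rank, then rescale by D^(-1/2) on both sides so the diagonal becomes I, leaving I - A with |A| < 1.

```python
import numpy as np

W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])          # weighted adjacency of a triangle
deg = W.sum(axis=1)
D = np.diag(1.1 * deg)                   # slack on the diagonal => full rank
M = D - W                                # SDD matrix derived from L = diag(deg) - W
D_inv_half = np.diag(1.0 / np.sqrt(1.1 * deg))
A = D_inv_half @ W @ D_inv_half          # rescaled walk matrix

# The rescaled system is I - A, and every eigenvalue of A has magnitude < 1.
assert np.allclose(np.eye(3) - A, D_inv_half @ M @ D_inv_half)
assert np.abs(np.linalg.eigvalsh(A)).max() < 1.0
```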
INTERPRETATION
A: one-step transition of a random walk
A^(2^i): 2^i-step transition of the random walk
One step of the walk on each A_i = A^(2^i), until A^(2^i) becomes an `expander'
(I - A)^-1 = (I + A)(I + A^2)...(I + A^(2^i))...
• O(log κ) matrix multiplications
• O(n^ω log κ log n) work
Need: size reductions
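A matrix sketch of the identity above (the 5×5 symmetric A with |A| = 0.45 is an illustrative example): O(log κ) squarings make the product agree with (I - A)^-1 to machine precision.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.random((5, 5))
A = 0.45 * (B + B.T) / np.linalg.norm(B + B.T, 2)   # symmetric, |A| = 0.45

Z, P = np.eye(5), A.copy()
for _ in range(7):                 # |A|^(2^7) is far below machine epsilon
    Z = Z @ (np.eye(5) + P)        # multiply in the next factor (I + A^(2^i))
    P = P @ P                      # square the walk: A -> A^2 -> A^4 -> ...

assert np.allclose(Z, np.linalg.inv(np.eye(5) - A))
```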
SIMILAR TO
                   Connectivity       Parallel solver
Iteration          A_{i+1} ≈ A_i^2    A_{i+1} ≈ A_i^2
Until              |A_d| small        |A_d| small
Size reduction     Low degree         Sparse graph
Method             Derandomized       Randomized
Solution transfer  Connectivity       (I - A_i) x_i = b_i
• Multiscale methods
• NC algorithm for shortest path
• Logspace connectivity: [Reingold `02]
• Deterministic squaring: [Rozenman-Vadhan `05]
SUMMARY
• Would like to solve L_G x = b
• Goal: polylog depth, nearly-linear work
• `Standard' numerical methods have high depth
• Equivalent: sparse inverse representations
• Squaring gets around the lower bound
OUTLINE
• L_G x = b
• Why is it hard?
• Key Tool
• Parallel Solver
• Other Forms
WHAT IS AN ALGORITHM?
b → x: a linear operator Z
Algorithm ⇔ matrix Z ≈_ε (I - A)^-1
Goal: Z = sum/product of a few matrices
• ≈_ε: spectral similarity with relative error ε
• Symmetric, invertible, composable (additive)
SQUARING
• [BSS `09]: there exists I - A' ≈_ε I - A^2 with O(n ε^-2) entries
• [ST `04][SS `08][OV `11] + some modifications: O(n log^c n · ε^-2) entries, efficient, parallel
[Koutis `14]: faster algorithm based on spanners / low-diameter decompositions
APPROXIMATE INVERSE CHAIN
I - A_1 ≈_ε I - A_0^2
I - A_2 ≈_ε I - A_1^2
...
I - A_i ≈_ε I - A_{i-1}^2
I - A_d ≈ I
• Convergence: |A_{i+1}| < |A_i|/2
• With approximation (I - A_{i+1} ≈_ε I - A_i^2): |A_{i+1}| < |A_i|/1.5
⇒ d = O(log κ)
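A sketch of the convergence behavior with exact (unsparsified) squaring, where the norm squares each level; the 2×2 example with |A| = 0.9 is illustrative.

```python
import numpy as np

A = 0.9 * np.array([[0.0, 1.0], [1.0, 0.0]])   # |A| = 0.9
norms = []
for _ in range(6):
    norms.append(np.linalg.norm(A, 2))         # spectral norm at this level
    A = A @ A                                   # exact squaring: A_{i+1} = A_i^2

# Norm squares each step: 0.9, 0.81, 0.6561, ... so O(log kappa) levels
# drive |A_d| toward zero.
for i in range(5):
    assert abs(norms[i + 1] - norms[i] ** 2) < 1e-9
```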
ISSUE 1
Only have 1 - a_{i+1} ≈ 1 - a_i^2
Solution: apply one factor at a time
Need to invoke: (1 - a)^-1 = (1 + a)(1 + a^2)(1 + a^4)...
(1 - a_i)^-1 = (1 + a_i)(1 - a_i^2)^-1 ≈ (1 + a_i)(1 - a_{i+1})^-1
Induction: z_{i+1} ≈ (1 - a_{i+1})^-1
Base case: z_d = (1 - a_d)^-1 ≈ 1
z_i = (1 + a_i) z_{i+1} ≈ (1 + a_i)(1 - a_{i+1})^-1 ≈ (1 - a_i)^-1
ISSUE 2
In the matrix setting, replacements by approximations need to be symmetric:
Z ≈ Z' ⇒ U^T Z U ≈ U^T Z' U
In Z_i, the terms around (I - A_i^2)^-1 ≈ Z_{i+1} need to be symmetric;
(I + A_i) Z_{i+1} is not symmetric around Z_{i+1}
Solution 1 ([PS `14]): (1 - a)^-1 = ½ (1 + (1 + a)(1 - a^2)^-1 (1 + a))
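A scalar check of the Solution 1 identity (the value of a is illustrative; any |a| < 1 works):

```python
# (1 - a)^-1 = 1/2 * (1 + (1 + a)(1 - a^2)^-1 (1 + a)); unlike the
# one-sided (1 + a)(1 - a^2)^-1, the matrix version of the right-hand side
# keeps Z_{i+1} sandwiched symmetrically between two (I + A_i) factors.
a = 0.7
lhs = 1.0 / (1.0 - a)
rhs = 0.5 * (1.0 + (1.0 + a) * (1.0 + a) / (1.0 - a * a))
assert abs(lhs - rhs) < 1e-12
```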
ALGORITHM
(I - A_i)^-1 = ½ [I + (I + A_i)(I - A_i^2)^-1 (I + A_i)]
Chain: (I - A_{i+1})^-1 ≈_ε (I - A_i^2)^-1
Induction: Z_{i+1} ≈_α (I - A_{i+1})^-1, so Z_{i+1} ≈_{α+ε} (I - A_i^2)^-1
Z_i ← ½ [I + (I + A_i) Z_{i+1} (I + A_i)]
• Composition: Z_i ≈_{α+ε} (I - A_i)^-1
• Total error = dε = O(ε log κ)
PSEUDOCODE
x = Solve(I, A_0, ..., A_d, b)
1. Set b_0 = b. For i from 1 to d, set b_i = (I + A_{i-1}) b_{i-1}.
2. Set x_d = b_d.
3. For i from d - 1 downto 0, set x_i = ½ [b_i + (I + A_i) x_{i+1}].
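A runnable sketch of this pseudocode (the exact, unsparsified squaring chain below stands in for the talk's sparsified chain; the function and variable names are illustrative):

```python
import numpy as np

def solve_chain(As, b):
    """Forward pass builds b_i, backward pass applies
    Z_i = 1/2 [I + (I + A_i) Z_{i+1} (I + A_i)] for a chain A_0, ..., A_d."""
    d = len(As) - 1
    bs = [b]                                   # b_0 = b
    for i in range(1, d + 1):                  # b_i = (I + A_{i-1}) b_{i-1}
        bs.append(bs[i - 1] + As[i - 1] @ bs[i - 1])
    x = bs[d]                                  # x_d = b_d, since I - A_d ≈ I
    for i in range(d - 1, -1, -1):             # x_i = 1/2 [b_i + (I + A_i) x_{i+1}]
        x = 0.5 * (bs[i] + x + As[i] @ x)
    return x

# Exact squaring chain for a small symmetric A with |A| = 0.5:
rng = np.random.default_rng(0)
B = rng.random((4, 4))
A = 0.5 * (B + B.T) / np.linalg.norm(B + B.T, 2)
chain = [A]
for _ in range(20):
    chain.append(chain[-1] @ chain[-1])        # A_{i+1} = A_i^2

b = rng.random(4)
x = solve_chain(chain, b)
assert np.allclose(x, np.linalg.solve(np.eye(4) - A, b))
```

With an exact chain the identity holds at every level, so the result matches a direct solve; sparsifying each square trades this exactness for the ≈_ε guarantees of the previous slides.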
TOTAL COST
• d = O(log κ)
• ε = 1/d
• nnz(A_i) = O(n log^c n · log^2 κ)
O(log^c n · log κ) depth, O(n log^c n · log^3 κ) work
• Multigrid V-cycle-like call structure: each level makes one call to the next
• Answer obtained from d = O(log κ) matrix-vector multiplications
SUMMARY
• Would like to solve L_G x = b
• Goal: polylog depth, nearly-linear work
• `Standard' numerical methods have high depth
• Equivalent: sparse inverse representations
• Squaring gets around the lower bound
• Can keep squares sparse
• Operator view of algorithms can drive their design
OUTLINE
• L_G x = b
• Why is it hard?
• Key Tool
• Parallel Solver
• Other Forms
REPRESENTATION OF (I - A)^-1
The algorithm from [PS `14] gives:
(I - A)^-1 ≈ ½ [I + (I + A_0)[I + (I + A_1)(I - A_2)^-1 (I + A_1)](I + A_0)]
This is a sum and product of O(log κ) matrices; we need just a product.
Gaussian graphical model sampling:
• Sample from a Gaussian with covariance (I - A)^-1
• Need C s.t. C^T C ≈ (I - A)^-1
SOLUTION 2
(I - A)^-1 = (I + A)^(1/2) (I - A^2)^-1 (I + A)^(1/2)
           ≈ (I + A)^(1/2) (I - A_1)^-1 (I + A)^(1/2)
Repeat on A_1: (I - A)^-1 ≈ C^T C
where C = (I + A_0)^(1/2) (I + A_1)^(1/2) ... (I + A_d)^(1/2)
How to evaluate (I + A_i)^(1/2)?
• A_1 ≈ A_0^2 has eigenvalues in [0, 1], so eigenvalues of I + A_i lie in [1, 2] for i ≥ 1
• Well-conditioned matrix ⇒ Maclaurin series expansion = low-degree polynomial
• What about (I + A_0)^(1/2)? (A_0 may have eigenvalues near -1)
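A scalar check of the square-root splitting in Solution 2 (the value of a is illustrative):

```python
import math

# (1 - a)^-1 = (1 + a)^(1/2) (1 - a^2)^-1 (1 + a)^(1/2): the two square-root
# factors recombine into (1 + a)/(1 - a^2) = 1/(1 - a).
a = 0.6
lhs = 1.0 / (1.0 - a)
rhs = math.sqrt(1.0 + a) * (1.0 / (1.0 - a * a)) * math.sqrt(1.0 + a)
assert abs(lhs - rhs) < 1e-12
```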
SOLUTION 3 ([CCLPT `14])
(I - A)^-1 = (I + A/2)^(1/2) (I - A/2 - A^2/2)^-1 (I + A/2)^(1/2)
• Modified chain: I - A_{i+1} ≈ I - A_i/2 - A_i^2/2
• I + A_i/2 has eigenvalues in [1/2, 3/2]
• Replace (I + A_i/2)^(1/2) with an O(log log κ)-degree polynomial / Maclaurin series T_1/2:
C = T_1/2(I + A_0/2) T_1/2(I + A_1/2) ... T_1/2(I + A_d/2)
gives (I - A)^-1 ≈ C^T C
Generalization to (I - A)^p (-1 < p < 1): T_{-p/2}(I + A_0) T_{-p/2}(I + A_1) ... T_{-p/2}(I + A_d)
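A scalar check of the modified splitting in Solution 3, which works even near the troublesome a ≈ -1 (the test value is illustrative):

```python
import math

# Since 1 - a/2 - a^2/2 = (1 - a)(1 + a/2), the splitting
# (1 - a)^-1 = (1 + a/2)^(1/2) (1 - a/2 - a^2/2)^-1 (1 + a/2)^(1/2)
# is exact, and 1 + a/2 stays in [1/2, 3/2] for all |a| <= 1.
a = -0.95
lhs = 1.0 / (1.0 - a)
mid = 1.0 / (1.0 - a / 2.0 - a * a / 2.0)
rhs = math.sqrt(1.0 + a / 2.0) * mid * math.sqrt(1.0 + a / 2.0)
assert abs(lhs - rhs) < 1e-12
```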
SUMMARY
• Would like to solve L_G x = b
• Goal: polylog depth, nearly-linear work
• `Standard' numerical methods have high depth
• Equivalent: sparse inverse representations
• Squaring gets around the lower bound
• Can keep squares sparse
• Operator view of algorithms can drive their design
• Entire class of algorithms / factorizations
• Can approximate a wider class of functions
OPEN QUESTIONS
Generalizations:
• (Sparse) squaring as an iterative method?
• Connections to multigrid / multiscale methods?
• Other functions? log(I - A)? Rational functions?
• Other structured systems?
• Different notions of sparsification?
More efficient:
• How fast for an O(n)-sized sparsifier?
• Better sparsifiers for I - A^2?
• How to represent resistances?
• O(n)-time solver? (with O(m log^c n) preprocessing)
Applications / implementations:
• How fast can spectral sparsifiers run?
• What does L^p give for -1 < p < 1?
• Trees (from sparsifiers) as a stand-alone tool?
THANK YOU!
Questions?
Manuscripts on arXiv:• http://arxiv.org/abs/1311.3286• http://arxiv.org/abs/1410.5392