MS58: Approaches to Reducing Communication in Krylov Subspace Methods
Organizers: Laura Grigori (INRIA) and Erin Carson (NYU)
Talks:
1. The s-Step Lanczos Method and its Behavior in Finite Precision (Erin Carson, James W. Demmel)
2. Enlarged Krylov Subspace Methods for Reducing Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf)
3. Preconditioning Communication-Avoiding Krylov Methods (Siva Rajamanickam, Ichitaro Yamazaki, Andrey Prokopenko, Erik G. Boman, Michael Heroux, Jack J. Dongarra)
4. Sparse Approximate Inverse Preconditioners for Communication-Avoiding Bicgstab Solvers (Maryam MehriDehnavi, Erin Carson, Nicholas Knight, James W. Demmel, David Fernandez)
The s-Step Lanczos Method and its Behavior in Finite Precision
Erin Carson, NYU
James Demmel, UC Berkeley
SIAM LA ‘15
October 30, 2015
Why Avoid “Communication”?
• Algorithms have two costs: computation and communication
• Communication: moving data between levels of the memory hierarchy (sequential) or between processors (parallel)
• On today’s computers, communication is expensive, computation is cheap, in terms of both time and energy!
[Figure: sequential model (CPU, cache, DRAM) alongside parallel model (several CPU/DRAM nodes)]
Future Exascale Systems

| | Petascale Systems (2009) | Predicted Exascale Systems* | Factor Improvement |
|---|---|---|---|
| System Peak | $2 \cdot 10^{15}$ flops | $10^{18}$ flops | ~1000 |
| Node Memory Bandwidth | 25 GB/s | 0.4-4 TB/s | ~10-100 |
| Total Node Interconnect Bandwidth | 3.5 GB/s | 100-400 GB/s | ~100 |
| Memory Latency | 100 ns | 50 ns | ~1 |
| Interconnect Latency | 1 μs | 0.5 μs | ~1 |

*Sources: from P. Beckman (ANL), J. Shalf (LBL), and D. Unat (LBL)

• Gaps between communication/computation cost only growing larger in future systems
• Avoiding communication will be essential for applications at exascale!
Krylov Subspace Methods
• General class of iterative solvers: used for linear systems, eigenvalue problems, singular value problems, least squares, etc.
• Projection process onto the expanding Krylov subspace
$$\mathcal{K}_m(A, r_0) = \operatorname{span}\{r_0, A r_0, A^2 r_0, \dots, A^{m-1} r_0\}$$
• In each iteration,
  • Add a dimension to the Krylov subspace $\mathcal{K}_m$
  • Orthogonalize (with respect to some $\mathcal{L}_m$)
• Examples: Lanczos/Conjugate Gradient (CG), Arnoldi/Generalized Minimum Residual (GMRES), Biconjugate Gradient (BICG), BICGSTAB, GKL, LSQR, etc.
[Figure: projection diagram showing $r_0$, $A\delta$, $r_{\text{new}}$, and the subspace $\mathcal{L}$]
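As a rough illustration of the definition of $\mathcal{K}_m(A, r_0)$, the following Python sketch builds the vectors $r_0, A r_0, \dots, A^{m-1} r_0$ by repeated matrix-vector products. The helper names `matvec` and `krylov_basis` and the small dense test matrix are assumptions made for this example; the slides do not prescribe an implementation, and a practical solver would use a sparse format and orthogonalize as it goes.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows (stand-in for an SpMV)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def krylov_basis(A, r0, m):
    """Return the m vectors r0, A r0, ..., A^(m-1) r0 that span K_m(A, r0)."""
    basis = [list(r0)]
    for _ in range(m - 1):
        basis.append(matvec(A, basis[-1]))  # one product adds one dimension
    return basis

# Hypothetical 3x3 symmetric example.
A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
r0 = [1.0, 0.0, 0.0]
K3 = krylov_basis(A, r0, 3)
```

Each new basis vector costs one matrix-vector product, which is the step the later slides identify as communication-bound.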
Krylov Solvers: Limited by Communication
In terms of communication:
• “Add a dimension to $\mathcal{K}_m$” → Sparse Matrix-Vector Multiplication (SpMV)
  • Parallel: communicate vector entries with neighbors
  • Sequential: read $A$/vectors from slow memory
• “Orthogonalize (with respect to some $\mathcal{L}_m$)” → Inner products
  • Parallel: global reduction (All-Reduce)
  • Sequential: multiple reads/writes to slow memory
Dependencies between communication-bound kernels (SpMV, orthogonalize) in each iteration limit performance!
The Classical Lanczos Method

Given: initial vector $v_1$ with $\|v_1\|_2 = 1$
$u_1 = A v_1$
for $i = 1, 2, \dots,$ until convergence do
    $\alpha_i = v_i^T u_i$  (inner product)
    $w_i = u_i - \alpha_i v_i$
    $\beta_{i+1} = \|w_i\|_2$  (inner product)
    $v_{i+1} = w_i / \beta_{i+1}$
    $u_{i+1} = A v_{i+1} - \beta_{i+1} v_i$  (SpMV)
end for
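A minimal Python sketch of the classical Lanczos recurrence, assuming a small dense symmetric matrix stored as a list of rows in place of a real sparse format. The names `matvec`, `dot`, and `lanczos` are illustrative, and the sketch omits breakdown handling ($\beta_{i+1} = 0$) and reorthogonalization. Comments mark the two communication-bound kernels.

```python
import math

def matvec(A, x):
    """Dense stand-in for the SpMV; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def lanczos(A, v1, steps):
    """Run `steps` iterations; return (alphas, betas, V) with V = [v_1, ..., v_{steps+1}]."""
    v = list(v1)
    u = matvec(A, v)                                  # u_1 = A v_1 (SpMV)
    alphas, betas, V = [], [], [v]
    for _ in range(steps):
        alpha = dot(v, u)                             # inner product
        w = [ui - alpha * vi for ui, vi in zip(u, v)]
        beta = math.sqrt(dot(w, w))                   # inner product
        v_next = [wi / beta for wi in w]
        Av = matvec(A, v_next)                        # SpMV
        u = [a - beta * vi for a, vi in zip(Av, v)]   # u_{i+1} = A v_{i+1} - beta_{i+1} v_i
        alphas.append(alpha)
        betas.append(beta)
        v = v_next
        V.append(v)
    return alphas, betas, V

# Hypothetical symmetric tridiagonal test matrix with start vector e_1.
A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
alphas, betas, V = lanczos(A, [1.0, 0.0, 0.0], steps=2)
```

For a symmetric tridiagonal matrix with positive off-diagonals and $v_1 = e_1$, the computed `alphas` and `betas` should match the diagonal and off-diagonal entries of `A`, which makes this a convenient sanity check.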
Communication-Avoiding KSMs
• Idea: Compute blocks of 𝑠 iterations at once
  • Communicate every 𝑠 iterations instead of every iteration
  • Reduces communication cost by a factor of 𝑶(𝒔)!
    • (latency in parallel, latency and bandwidth in sequential)
• An idea rediscovered many times…
  • First related work: s-dimensional steepest descent - Khabaza (‘63), Forsythe (‘68), Marchuk and Kuznecov (‘68)
  • Flurry of work on s-step Krylov methods in ‘80s/early ‘90s: see, e.g., Van Rosendale, 1983; Chronopoulos and Gear, 1989
  • Goals: increasing parallelism, avoiding I/O, increasing “convergence rate”
• Resurgence of interest in recent years due to growing problem sizes; growing relative cost of communication
Communication-Avoiding KSMs: CA-Lanczos
• Main idea: Unroll iteration loop by a factor of 𝑠; split iteration loop into an outer loop (k) and an inner loop (j)
• Key observation: starting at some iteration $i \equiv sk + j$,
$$v_{sk+j},\ u_{sk+j} \in \mathcal{K}_{s+1}(A, v_{sk+1}) + \mathcal{K}_{s+1}(A, u_{sk+1}) \quad \text{for } j \in \{1, \dots, s+1\}$$

Outer loop $k$: Communication step

Expand solution space $s$ dimensions at once
• Compute “basis matrix” $\mathcal{Y}_k$ with columns spanning $\mathcal{K}_{s+1}(A, v_{sk+1}) + \mathcal{K}_{s+1}(A, u_{sk+1})$
• Requires reading $A$/communicating vectors only once
  • Using “matrix powers kernel”

Orthogonalize all at once
• Compute/store block of inner products between basis vectors in Gram matrix: $\mathcal{G}_k = \mathcal{Y}_k^T \mathcal{Y}_k$
• Communication cost of one global reduction
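Communication-Avoiding KSMs: CA-Lanczos

Inner loop: Computation steps, no communication!

Perform $s$ iterations of updates
• Using $\mathcal{Y}_k$ and $\mathcal{G}_k$, this requires no communication!
• Represent $n$-vectors by their $O(s)$ coordinates in $\mathcal{Y}_k$:
$$v_{sk+j} = \mathcal{Y}_k v'_{k,j}, \quad u_{sk+j} = \mathcal{Y}_k u'_{k,j}, \quad w_{sk+j} = \mathcal{Y}_k w'_{k,j}$$
• With $i = sk + j$, each length-$n$ operation becomes a length-$O(s)$ operation:
  • SpMV: $A v_{i+1}$ ($n \times n$ matrix times $n$-vector) → $\mathcal{B}_k v'_{k,j+1}$ ($O(s) \times O(s)$ matrix times $O(s)$-vector)
  • Inner product: $v_i^T u_i$ → $v'^T_{k,j} \mathcal{G}_k u'_{k,j}$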
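The two substitutions used in the inner loop can be checked numerically. The sketch below assumes a monomial basis $\mathcal{Y}_k = [v, Av, \dots, A^s v, u, Au, \dots, A^s u]$, for which $\mathcal{B}_k$ reduces to a shift within each block; the slides do not fix the basis, so this choice and all helper names (`monomial_basis`, `gram`, `expand`, `apply_B`) are assumptions. The shift is only valid for coordinate vectors whose last entry in each block is zero.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def monomial_basis(A, v, u, s):
    """Columns [v, Av, ..., A^s v, u, Au, ..., A^s u] of the basis matrix Y."""
    cols = []
    for start in (v, u):
        cols.append(list(start))
        for _ in range(s):
            cols.append(matvec(A, cols[-1]))
    return cols

def gram(cols):
    """G = Y^T Y, the block of inner products between basis vectors."""
    return [[dot(ci, cj) for cj in cols] for ci in cols]

def expand(cols, coords):
    """Map O(s) coordinates back to an n-vector: Y @ coords."""
    n = len(cols[0])
    return [sum(c * col[i] for c, col in zip(coords, cols)) for i in range(n)]

def apply_B(coords, s):
    """Monomial-basis B: shift coordinates one slot down within each block."""
    out = [0.0] * len(coords)
    for block in (0, s + 1):
        for i in range(s):
            out[block + i + 1] = coords[block + i]
    return out

# Hypothetical 4x4 symmetric matrix and start vectors.
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
v = [1.0, 0.0, 0.0, 0.0]
u = [0.0, 1.0, 0.0, 0.0]
s = 2
Y = monomial_basis(A, v, u, s)
G = gram(Y)

x = [1.0, 2.0, 0.0, 0.5, -1.0, 0.0]   # last entry of each block is zero
y = [0.0, 1.0, 0.0, 2.0, 0.0, 0.0]

direct = dot(expand(Y, x), expand(Y, y))              # length-n inner product
via_gram = sum(x[i] * G[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))
spmv_direct = matvec(A, expand(Y, x))                 # A (Y x)
spmv_coords = expand(Y, apply_B(x, s))                # Y (B x)
```

Both pairs (`direct` vs. `via_gram`, `spmv_direct` vs. `spmv_coords`) should agree up to rounding, which is why the inner loop can work on short coordinate vectors alone.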
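The CA-Lanczos Method

Given: initial vector $v_1$ with $\|v_1\|_2 = 1$
$u_1 = A v_1$
for $k = 0, 1, \dots,$ until convergence do
    Compute $\mathcal{Y}_k$ (via CA matrix powers kernel), compute $\mathcal{G}_k = \mathcal{Y}_k^T \mathcal{Y}_k$ (global reduction to compute $\mathcal{G}_k$)
    Let $v'_{k,1} = e_1$, $u'_{k,1} = e_{s+2}$
    for $j = 1, \dots, s$ do  (local computations: no communication!)
        $\alpha_{sk+j} = v'^T_{k,j} \mathcal{G}_k u'_{k,j}$
        $w'_{k,j} = u'_{k,j} - \alpha_{sk+j} v'_{k,j}$
        $\beta_{sk+j+1} = (w'^T_{k,j} \mathcal{G}_k w'_{k,j})^{1/2}$
        $v'_{k,j+1} = w'_{k,j} / \beta_{sk+j+1}$
        $u'_{k,j+1} = \mathcal{B}_k v'_{k,j+1} - \beta_{sk+j+1} v'_{k,j}$
    end for
    Compute $v_{sk+s+1} = \mathcal{Y}_k v'_{k,s+1}$, $u_{sk+s+1} = \mathcal{Y}_k u'_{k,s+1}$
end for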
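The CA-Lanczos loop can be sketched in a few lines of Python. This is an illustrative serial version, not the authors' implementation: it assumes a monomial basis (so $\mathcal{B}_k$ is a shift within each block of $s+1$ columns), uses a dense list-of-rows matrix in place of a matrix powers kernel, and ignores breakdown. The function name `ca_lanczos` and its parameters are hypothetical.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def ca_lanczos(A, v1, s, outer_steps):
    """s-step Lanczos with a monomial basis; returns (alphas, betas)."""
    n, m = len(v1), 2 * s + 2
    v, u = list(v1), matvec(A, v1)
    alphas, betas = [], []
    for _ in range(outer_steps):
        # Basis Y_k = [v, Av, ..., A^s v, u, Au, ..., A^s u] (matrix powers step).
        Y = []
        for start in (v, u):
            Y.append(list(start))
            for _ in range(s):
                Y.append(matvec(A, Y[-1]))
        # Gram matrix G_k = Y_k^T Y_k: the only global reduction per outer step.
        G = [[dot(yi, yj) for yj in Y] for yi in Y]

        def g_dot(x, y):
            return sum(x[i] * G[i][j] * y[j] for i in range(m) for j in range(m))

        vc = [0.0] * m
        vc[0] = 1.0                      # v'_{k,1} = e_1
        uc = [0.0] * m
        uc[s + 1] = 1.0                  # u'_{k,1} = e_{s+2}
        for _ in range(s):               # inner loop touches only length-O(s) vectors
            alpha = g_dot(vc, uc)
            wc = [ui - alpha * vi for ui, vi in zip(uc, vc)]
            beta = math.sqrt(g_dot(wc, wc))
            vc_next = [wi / beta for wi in wc]
            Bv = [0.0] * m               # B_k v' for the monomial basis: a shift
            for block in (0, s + 1):
                for i in range(s):
                    Bv[block + i + 1] = vc_next[block + i]
            uc = [b - beta * vi for b, vi in zip(Bv, vc)]
            vc = vc_next
            alphas.append(alpha)
            betas.append(beta)
        # Recover the n-vectors that seed the next outer step.
        v = [sum(c * col[i] for c, col in zip(vc, Y)) for i in range(n)]
        u = [sum(c * col[i] for c, col in zip(uc, Y)) for i in range(n)]
    return alphas, betas

# Hypothetical 6x6 symmetric tridiagonal test matrix, start vector e_1.
diag = [4.0, 3.0, 5.0, 2.0, 6.0, 1.0]
off = [1.0, 2.0, 0.5, 1.5, 1.0]
n = len(diag)
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = diag[i]
    if i + 1 < n:
        A[i][i + 1] = A[i + 1][i] = off[i]
alphas, betas = ca_lanczos(A, [1.0] + [0.0] * (n - 1), s=2, outer_steps=2)
```

Because Lanczos started from $e_1$ on a symmetric tridiagonal matrix with positive off-diagonals reproduces that matrix, `alphas` and `betas` should match the leading entries of `diag` and `off` up to rounding. With larger `s` the monomial basis becomes ill-conditioned and the agreement can degrade.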
Complexity Comparison

Example of parallel (per processor) complexity for $s$ iterations of Classical Lanczos vs. CA-Lanczos for a 2D 9-point stencil:
(Assuming each of $p$ processors owns $n/p$ rows of the matrix and $s \le \sqrt{n/p}$)

| | Flops (SpMV) | Flops (Orth.) | Words Moved (SpMV) | Words Moved (Orth.) | Messages (SpMV) | Messages (Orth.) |
|---|---|---|---|---|---|---|
| Classical CG | $sn/p$ | $sn/p$ | $s\sqrt{n/p}$ | $s \log_2 p$ | $s$ | $s \log_2 p$ |
| CA-CG | $sn/p$ | $s^2 n/p$ | $s\sqrt{n/p}$ | $s^2 \log_2 p$ | $1$ | $\log_2 p$ |

All values in the table meant in the Big-O sense (i.e., lower order terms and constants not included)
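To get a feel for the tradeoff in the table, the sketch below evaluates its terms for concrete values. It assumes every hidden constant equals 1 and reads the SpMV words-moved entry as $s\sqrt{n/p}$ (the usual surface term for a 2D stencil); both are assumptions, so only the ratios between the two methods are meaningful. The function name `cost_table` and the sample sizes are hypothetical.

```python
import math

def cost_table(n, p, s):
    """Big-O terms for s iterations, per processor, with all constants set to 1."""
    log_p = math.log2(p)
    surface = math.sqrt(n / p)   # boundary size of an (n/p)-point 2D subdomain
    classical = {
        "flops_spmv": s * n / p, "flops_orth": s * n / p,
        "words_spmv": s * surface, "words_orth": s * log_p,
        "msgs_spmv": s, "msgs_orth": s * log_p,
    }
    ca = {
        "flops_spmv": s * n / p, "flops_orth": s * s * n / p,
        "words_spmv": s * surface, "words_orth": s * s * log_p,
        "msgs_spmv": 1, "msgs_orth": log_p,
    }
    return classical, ca

classical, ca = cost_table(n=2**20, p=2**10, s=8)
msg_saving = classical["msgs_orth"] / ca["msgs_orth"]        # messages drop by a factor of s
extra_flops = ca["flops_orth"] / classical["flops_orth"]     # orthogonalization flops grow by s
```

In this model the message count falls by a factor of $s$ while orthogonalization flops and words rise by the same factor, which is the exchange the method relies on when latency dominates.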
From Theory to Practice
• Parameter 𝑠 is limited by machine parameters and matrix sparsity structure
• We can auto-tune to find the best 𝑠 based on these properties
• That is, find the 𝑠 that gives the lowest time per iteration
• In practice, we care not just about time per iteration but also about the number of iterations
Runtime = (time/iteration) × (# iterations)
• We also need to consider how convergence rate and accuracy are affected by choice of 𝑠!
From Theory to Practice
• CA-KSMs are mathematically equivalent to classical KSMs
• But they can behave very differently in finite precision!
  • Roundoff error bounds generally grow with increasing 𝑠
• Two effects of roundoff error:
  1. Decrease in accuracy → Tradeoff: increasing blocking factor 𝑠 past a certain point: accuracy limited
  2. Delay of convergence → Tradeoff: increasing blocking factor 𝑠 past a certain point: no speedup expected
Runtime = (time/iteration) × (# iterations)
Paige’s Results for Classical Lanczos
• Using bounds on local rounding errors in Lanczos, Paige showed that
1. The computed Ritz values always lie between the extreme eigenvalues of 𝐴 to within a small multiple of machine precision.
2. At least one small interval containing an eigenvalue of 𝐴 is found by the 𝑛th iteration.
3. The algorithm behaves numerically like Lanczos with full reorthogonalization until a very close eigenvalue approximation is found.
4. The loss of orthogonality among basis vectors follows a rigorous pattern and implies that some Ritz values have converged.
Do the same statements hold for CA-Lanczos?
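Paige’s Lanczos Convergence Analysis

Finite precision Lanczos process ($A$ is $n \times n$ with at most $N$ nonzeros per row):

$$A V_m = V_m T_m + \beta_{m+1} v_{m+1} e_m^T + \delta V_m$$

$$V_m = [v_1, \ldots, v_m], \quad \delta V_m = [\delta v_1, \ldots, \delta v_m], \quad T_m = \begin{pmatrix} \alpha_1 & \beta_2 & & \\ \beta_2 & \ddots & \ddots & \\ & \ddots & \ddots & \beta_m \\ & & \beta_m & \alpha_m \end{pmatrix}$$

For $i \in \{1, \ldots, m\}$:

$$\|\delta v_i\|_2 \le \varepsilon_1 \sigma$$

$$\beta_{i+1} |v_i^T v_{i+1}| \le 2 \varepsilon_0 \sigma$$

$$|v_{i+1}^T v_{i+1} - 1| \le \varepsilon_0 / 2$$

$$\left| \beta_{i+1}^2 + \alpha_i^2 + \beta_i^2 - \|A v_i\|_2^2 \right| \le 4i (3 \varepsilon_0 + \varepsilon_1) \sigma^2$$

where $\sigma \equiv \|A\|_2$ and $\theta\sigma \equiv \| \, |A| \, \|_2$.

Classic Lanczos (Paige, 1976): $\varepsilon_0 = O(\varepsilon n)$, $\varepsilon_1 = O(\varepsilon N \theta)$

CA-Lanczos: $\varepsilon_0 = O(\varepsilon n \Gamma^2)$, $\varepsilon_1 = O(\varepsilon N \theta \Gamma)$

$$\Gamma \le \max_{\ell \le k} \|\mathcal{Y}_\ell^+\|_2 \cdot \| \, |\mathcal{Y}_\ell| \, \|_2 \le (2s+1) \cdot \max_{\ell \le k} \kappa(\mathcal{Y}_\ell)$$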
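As a rough illustration of the two local quantities bounded above, the following sketch runs classical Lanczos (no reorthogonalization) on a 2D Poisson matrix and records $|v_{i+1}^T v_{i+1} - 1|$ and $\beta_{i+1} |v_i^T v_{i+1}|$ at each step. The helper names `poisson_2d` and `local_orthogonality`, the 16 × 16 grid, the seed, and the step count are assumptions made for this example; they are not taken from the talk.

```python
import numpy as np

def poisson_2d(k):
    """5-point 2D Poisson matrix on a k x k grid (dense, for illustration only)."""
    T = 2.0 * np.eye(k) - np.eye(k, k=1) - np.eye(k, k=-1)
    I = np.eye(k)
    return np.kron(I, T) + np.kron(T, I)

def local_orthogonality(A, v0, m):
    """Run m steps of classical Lanczos and return, per step,
    |v_{i+1}^T v_{i+1} - 1| and beta_{i+1} * |v_i^T v_{i+1}|."""
    v_prev = np.zeros_like(v0)
    v = v0 / np.linalg.norm(v0)
    beta = 0.0
    norm_err, orth_err = [], []
    for _ in range(m):
        w = A @ v - beta * v_prev      # three-term recurrence
        alpha = v @ w
        w = w - alpha * v
        beta = np.linalg.norm(w)
        v_next = w / beta
        norm_err.append(abs(v_next @ v_next - 1.0))
        orth_err.append(beta * abs(v @ v_next))
        v_prev, v = v, v_next
    return np.array(norm_err), np.array(orth_err)

A = poisson_2d(16)                      # n = 256
rng = np.random.default_rng(0)          # arbitrary seed
norm_err, orth_err = local_orthogonality(A, rng.standard_normal(256), 50)

sigma = np.linalg.norm(A, 2)
eps = np.finfo(float).eps
# Report the observed errors in units of eps and eps * sigma.
print(norm_err.max() / eps, orth_err.max() / (eps * sigma))
```

Both quantities should stay at a small multiple of machine precision for the classical method, which is the baseline that the factors $\Gamma$ and $\Gamma^2$ amplify in the CA variant.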
![Page 49: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding](https://reader035.vdocuments.site/reader035/viewer/2022063015/5fd2d6691d956152f43dd5c1/html5/thumbnails/49.jpg)
The Amplification Term Γ

• Roundoff errors in CA variant follow same pattern as classical variant, but amplified by factor of $\Gamma$ or $\Gamma^2$
• Theoretically confirms empirical observations on importance of basis conditioning (dating back to late ‘80s)
• A loose bound for the amplification term:

$$\Gamma \le \max_{\ell \le k} \|\mathcal{Y}_\ell^+\|_2 \cdot \| \, |\mathcal{Y}_\ell| \, \|_2 \le (2s+1) \cdot \max_{\ell \le k} \kappa(\mathcal{Y}_\ell)$$

• What we really need: $\| \, |\mathcal{Y}| \, |y'| \, \|_2 \le \Gamma \|\mathcal{Y} y'\|_2$ to hold for the computed basis $\mathcal{Y}$ and coordinate vector $y'$ in every bound.
• Tighter bound on $\Gamma$ possible; requires some light bookkeeping
• Example: for bounds on $\beta_{i+1} |v_i^T v_{i+1}|$ and $|v_{i+1}^T v_{i+1} - 1|$, we can use the definition

$$\Gamma_{k,j} \equiv \max_{x \in \{ w'_{k,j},\, u'_{k,j},\, v'_{k,j},\, v'_{k,j-1} \}} \frac{\| \, |\mathcal{Y}_k| \, |x| \, \|_2}{\|\mathcal{Y}_k x\|_2}$$
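To make the loose bound concrete, here is a minimal sketch that evaluates $\|\mathcal{Y}^+\|_2 \cdot \| \, |\mathcal{Y}| \, \|_2$ and compares it with (number of columns) × $\kappa(\mathcal{Y})$. As a simplifying assumption, it uses a single unscaled monomial basis $[v, Av, \ldots, A^s v]$ with $s+1$ columns in place of the $(2s+1)$-column basis $\mathcal{Y}_\ell$ of CA-Lanczos, so the column count stands in for the factor $2s+1$. The function names and the test matrix are hypothetical.

```python
import numpy as np

def poisson_2d(k):
    """5-point 2D Poisson matrix on a k x k grid (dense, for illustration only)."""
    T = 2.0 * np.eye(k) - np.eye(k, k=1) - np.eye(k, k=-1)
    I = np.eye(k)
    return np.kron(I, T) + np.kron(T, I)

def monomial_basis(A, v, s):
    """Columns v, A v, ..., A^s v (assumption: no scaling or shifting)."""
    cols = [v / np.linalg.norm(v)]
    for _ in range(s):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

def gamma_and_bound(Y):
    """Return ||Y^+||_2 * || |Y| ||_2 and the looser (#columns) * kappa(Y)."""
    sing = np.linalg.svd(Y, compute_uv=False)   # singular values, descending
    norm_pinv = 1.0 / sing[-1]                  # ||Y^+||_2 for full column rank
    norm_abs = np.linalg.norm(np.abs(Y), 2)     # || |Y| ||_2
    gamma = norm_pinv * norm_abs
    bound = Y.shape[1] * sing[0] / sing[-1]
    return gamma, bound

A = poisson_2d(16)
rng = np.random.default_rng(0)                  # arbitrary seed
v = rng.standard_normal(A.shape[0])
results = {s: gamma_and_bound(monomial_basis(A, v, s)) for s in (4, 8)}
for s, (gamma, bound) in results.items():
    print(f"s = {s}: gamma = {gamma:.3e}, loose bound = {bound:.3e}")
```

The growth of both numbers with $s$ reflects the worsening conditioning of the monomial basis, which is why basis choice matters for the amplified bounds.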
![Page 50: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding](https://reader035.vdocuments.site/reader035/viewer/2022063015/5fd2d6691d956152f43dd5c1/html5/thumbnails/50.jpg)
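[Figure, $s = 4$. Problem: 2D Poisson, $n = 256$, random starting vector. Curves: computed value, bound, and amplification factor $\Gamma^2$, for the quantities in $|v_{i+1}^T v_{i+1} - 1| \le \varepsilon_0 / 2$ and $\beta_{i+1} |v_i^T v_{i+1}| \le 2 \varepsilon_0 \sigma$.]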
![Page 51: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding](https://reader035.vdocuments.site/reader035/viewer/2022063015/5fd2d6691d956152f43dd5c1/html5/thumbnails/51.jpg)
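[Figure, $s = 8$. Problem: 2D Poisson, $n = 256$, random starting vector. Curves: computed value, bound, and amplification factor $\Gamma$, for the quantities in $|v_{i+1}^T v_{i+1} - 1| \le \varepsilon_0 / 2$ and $\beta_{i+1} |v_i^T v_{i+1}| \le 2 \varepsilon_0 \sigma$.]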
![Page 52: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding](https://reader035.vdocuments.site/reader035/viewer/2022063015/5fd2d6691d956152f43dd5c1/html5/thumbnails/52.jpg)
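[Figure, $s = 12$. Problem: 2D Poisson, $n = 256$, random starting vector. Curves: computed value, bound, and amplification factor $\Gamma^2$, for the quantities in $|v_{i+1}^T v_{i+1} - 1| \le \varepsilon_0 / 2$ and $\beta_{i+1} |v_i^T v_{i+1}| \le 2 \varepsilon_0 \sigma$.]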
![Page 56: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding](https://reader035.vdocuments.site/reader035/viewer/2022063015/5fd2d6691d956152f43dd5c1/html5/thumbnails/56.jpg)
Results for CA-Lanczos

• Back to our question: Do Paige’s results, e.g., loss of orthogonality ⇒ eigenvalue convergence, hold for CA-Lanczos?
• The answer is YES! …but
• Only if:
  • $\varepsilon_0 \equiv 2 \varepsilon (n + 11s + 15) \Gamma^2 \le \frac{1}{12}$
  • i.e., $\Gamma \le \left( 24 \varepsilon (n + 11s + 15) \right)^{-1/2} = O\left( (n \varepsilon)^{-1/2} \right)$
• Otherwise, e.g., can lose orthogonality due to computation with (numerically) rank-deficient basis
• Take-away: we can use this bound on Γ to design a better algorithm!
  • Mixed precision, selective reorthogonalization, dynamic basis size, etc.
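The condition above is cheap to evaluate, so a solver could in principle monitor it at run time. The sketch below computes the largest admissible $\Gamma$ and checks whether a given $\Gamma$ satisfies $\varepsilon_0 \le 1/12$. The function names are hypothetical, and taking $\varepsilon$ to be NumPy's double-precision machine epsilon is an assumption (the slides do not say whether $\varepsilon$ is the machine epsilon or the unit roundoff).

```python
import numpy as np

EPS = np.finfo(np.float64).eps  # assumption: epsilon = double-precision machine epsilon

def gamma_limit(n, s, eps=EPS):
    """Largest Gamma with 2*eps*(n + 11s + 15)*Gamma^2 <= 1/12."""
    return (24.0 * eps * (n + 11 * s + 15)) ** -0.5

def paige_results_apply(gamma, n, s, eps=EPS):
    """True if eps_0 = 2*eps*(n + 11s + 15)*gamma^2 is at most 1/12."""
    eps0 = 2.0 * eps * (n + 11 * s + 15) * gamma**2
    return eps0 <= 1.0 / 12.0

n = 100
for s in (2, 4, 12):
    print(f"s = {s}: Gamma must not exceed {gamma_limit(n, s):.3e}")
```

A check like `paige_results_apply` is the kind of test that could drive the strategies listed in the take-away, for example shrinking the basis size when the computed $\Gamma$ approaches the limit.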
![Page 57: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding](https://reader035.vdocuments.site/reader035/viewer/2022063015/5fd2d6691d956152f43dd5c1/html5/thumbnails/57.jpg)
[Figure, $s = 2$. Problem: Diagonal matrix with $n = 100$ with evenly spaced eigenvalues between $\lambda_{min} = 0.1$ and $\lambda_{max} = 100$; random starting vector. Top plots: computed $\Gamma^2$ and $\left( 24 \epsilon (n + 11s + 15) \right)^{-1}$. Bottom plots: computed Ritz values, true eigenvalues, and bounds on range of computed Ritz values.]
![Page 58: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding](https://reader035.vdocuments.site/reader035/viewer/2022063015/5fd2d6691d956152f43dd5c1/html5/thumbnails/58.jpg)
[Figure, $s = 4$. Problem: Diagonal matrix with $n = 100$ with evenly spaced eigenvalues between $\lambda_{min} = 0.1$ and $\lambda_{max} = 100$; random starting vector. Top plots: computed $\Gamma^2$ and $\left( 24 \epsilon (n + 11s + 15) \right)^{-1}$. Bottom plots: computed Ritz values, true eigenvalues, and bounds on range of computed Ritz values.]
![Page 59: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding](https://reader035.vdocuments.site/reader035/viewer/2022063015/5fd2d6691d956152f43dd5c1/html5/thumbnails/59.jpg)
[Figure, $s = 12$. Problem: Diagonal matrix with $n = 100$ with evenly spaced eigenvalues between $\lambda_{min} = 0.1$ and $\lambda_{max} = 100$; random starting vector. Top plots: computed $\Gamma^2$ and $\left( 24 \epsilon (n + 11s + 15) \right)^{-1}$. Bottom plots: computed Ritz values, true eigenvalues, and bounds on range of computed Ritz values.]
![Page 61: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding](https://reader035.vdocuments.site/reader035/viewer/2022063015/5fd2d6691d956152f43dd5c1/html5/thumbnails/61.jpg)
[Figure. Problem: Diagonal matrix with $n = 100$ with evenly spaced eigenvalues between $\lambda_{min} = 0.1$ and $\lambda_{max} = 100$; random starting vector. Plotted quantities: $\max_i |z_i^{(m)T} v_{m+1}|$ (measure of loss of orthogonality) and $\min_i \beta_{m+1} \eta_{m,i}^{(m)}$ (measure of Ritz value convergence).]
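The two plotted measures can be reproduced in a few lines for the classical method. The sketch below runs plain Lanczos on the diagonal test problem, forms the Ritz vectors $z_i^{(m)} = V_m s_i$ from the eigenvectors $s_i$ of $T_m$, and evaluates $\max_i |z_i^{(m)T} v_{m+1}|$ and $\min_i \beta_{m+1} |\eta_{m,i}^{(m)}|$, where $\eta_{m,i}^{(m)}$ is the last component of $s_i$. This is a sketch of classical Lanczos only, not of CA-Lanczos; the function names, seed, and step counts are assumptions chosen for illustration.

```python
import numpy as np

def lanczos(A, v0, m):
    """Classical Lanczos without reorthogonalization.
    Returns V (n x (m+1)), alpha (length m), and beta, where beta[j]
    couples columns j-1 and j of V (beta[0] is unused)."""
    n = v0.size
    V = np.zeros((n, m + 1))
    alpha = np.zeros(m)
    beta = np.zeros(m + 1)
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(m):
        w = A @ V[:, j]
        if j > 0:
            w -= beta[j] * V[:, j - 1]
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        beta[j + 1] = np.linalg.norm(w)
        V[:, j + 1] = w / beta[j + 1]
    return V, alpha, beta

def paige_measures(V, alpha, beta):
    """Return (max_i |z_i^T v_{m+1}|, min_i beta_{m+1} |eta_{m,i}|)."""
    m = alpha.size
    T = np.diag(alpha) + np.diag(beta[1:m], 1) + np.diag(beta[1:m], -1)
    _, S = np.linalg.eigh(T)           # columns of S are eigenvectors of T_m
    Z = V[:, :m] @ S                   # Ritz vectors
    loss = np.max(np.abs(Z.T @ V[:, m]))
    conv = np.min(beta[m] * np.abs(S[-1, :]))
    return loss, conv

A = np.diag(np.linspace(0.1, 100.0, 100))
rng = np.random.default_rng(0)         # arbitrary seed
v0 = rng.standard_normal(100)
for m in (5, 20, 40):
    loss, conv = paige_measures(*lanczos(A, v0, m))
    print(f"m = {m}: loss of orthogonality = {loss:.2e}, convergence = {conv:.2e}")
```

Tracking both numbers as $m$ grows shows the relationship the slides refer to: orthogonality is expected to degrade as the convergence measure becomes small.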
![Page 70: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding](https://reader035.vdocuments.site/reader035/viewer/2022063015/5fd2d6691d956152f43dd5c1/html5/thumbnails/70.jpg)
Extending the results of Greenbaum (1989):

Eigenvalue approximations generated at each step by a perturbed Lanczos recurrence for $A$ are equal to those generated by exact Lanczos applied to a larger matrix whose eigenvalues lie within intervals about the eigenvalues of $A$.

Size of the intervals about each eigenvalue $\lambda$:
• Classical Lanczos: $O(\epsilon n^3 \|A\|)$
• CA-Lanczos: $O(\epsilon n^3 \|A\| \Gamma^2)$

Ongoing work…
![Page 71: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding](https://reader035.vdocuments.site/reader035/viewer/2022063015/5fd2d6691d956152f43dd5c1/html5/thumbnails/71.jpg)
Future Directions

• New Algorithms/Applications
  • Application of communication-avoiding ideas and solvers to new computational science domains
  • Design of new high-performance preconditioners
• Improving Usability
  • Automating parameter selection via “numerical auto-tuning”
• Finite-Precision Analysis
  • Bounds on stability and convergence for other Krylov methods (particularly in the nonsymmetric case)
  • Extension of “Backwards-like” error analyses

Broad research agenda: Design methods for large-scale problems that optimize performance subject to application-specific numerical constraints