a high-performance parallel hybrid method for large sparse

25
Outline A High-Performance Parallel Hybrid Method for Large Sparse Linear Systems Azzam Haidar CERFACS, Toulouse joint work with Luc Giraud (N7-IRIT, France) and Layne Watson (Virginia Polytechnic Institute, USA) Sparse Days Meeting 2007 October 10th-12th, 2007, CERFACS, Toulouse, France 1/25 Numerical experiments with additive Schwaz preconditioner

Upload: others

Post on 18-Dec-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A High-Performance Parallel Hybrid Method for Large Sparse

Outline

A High-Performance Parallel Hybrid Method forLarge Sparse Linear Systems

Azzam Haidar

CERFACS, Toulouse

joint work with

Luc Giraud (N7-IRIT, France) and Layne Watson (Virginia Polytechnic Institute, USA)

Sparse Days Meeting 2007

October 10th-12th, 2007, CERFACS, Toulouse, France

1/25 Numerical experiments with additive Schwaz preconditioner

Page 2: A High-Performance Parallel Hybrid Method for Large Sparse

Outline

Outline

1 Motivations

2 Description in a PDE’s framework

3 Algebraic Additive Schwarz preconditionerDescription of the preconditionerVariant of Additive Shwarz preconditioner MAS

4 Parallel numerical experimentsNumerical scalability

5 Perspectives

2/25 Numerical experiments with additive Schwaz preconditioner

Page 3: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Outline

1 Motivations

2 Description in a PDE’s framework

3 Algebraic Additive Schwarz preconditionerDescription of the preconditionerVariant of Additive Shwarz preconditioner MAS

4 Parallel numerical experimentsNumerical scalability

5 Perspectives

3/25 Numerical experiments with additive Schwaz preconditioner

Page 4: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Motivations

Solution of very Large/Huge ill-conditioned linear systems

Such problems can require thousands of CPU-hours and manyGigabytes of memory

Direct solvers:- Robust and usually do not fail- Memory and computational costs grow nonlinearly

Iterative solvers:- Reduce memory requirements- They may fail to converge- Typically implemented with preconditioning to accelerate convergence

In an effort to reduce these requirements, a parallel mechanism forcombining solvers is needed

4/25 Numerical experiments with additive Schwaz preconditioner

Page 5: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

GoalDevelop a robust scalable parallel hybrid direct/iterative linear solvers

Exploit the efficiency and robustness of the sparse direct solvers

Take advantage of the natural scalable parallel implementation ofiterative solvers

Develop robust parallel preconditioners for iterative solvers

Non-overlapping domain decomposition

Natural approach for PDE’s

Extend to general sparse matrices

Partition the problem into subdomains,subgraphs

Use a direct solver on the subdomains

Robust preconditioned iterative solver oninterface

5/25 Numerical experiments with additive Schwaz preconditioner

Page 6: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Outline

1 Motivations

2 Description in a PDE’s framework

3 Algebraic Additive Schwarz preconditionerDescription of the preconditionerVariant of Additive Shwarz preconditioner MAS

4 Parallel numerical experimentsNumerical scalability

5 Perspectives

6/25 Numerical experiments with additive Schwaz preconditioner

Page 7: A High-Performance Parallel Hybrid Method for Large Sparse

Background

Algebraic splitting and block Gaussian elimination: N sub-domains case0BBB@AI1I1 . . . 0 AI1Γ1

.... . .

......

0 . . . AIN IN AIN ΓN

AΓ1I1 . . . AΓN IN AΓΓ

1CCCA0BBB@

uI1...

uINuΓ

1CCCA =

0BBB@fI1...

fINfΓ

1CCCASuΓ =

NX

i=1

RTΓi

S(i)RΓi

!uΓ = fΓ −

NXi=1

RTΓi

AΓi Ii A−1Ii Ii

fIi

where S(i) = A(i)Γi Γi

− AΓi Ii A−1Ii Ii

AIi Γi

Spectral properties for elliptic PDE’s

κ(A) = O(h−2) κ(S) = O(h−1)

||e(k)||A ≤ 2 ·

pκ(A)− 1pκ(A) + 1

!k

||e(0)||A

Page 8: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Description of the preconditionerVariant of Additive Shwarz preconditioner MAS

Outline

1 Motivations

2 Description in a PDE’s framework

3 Algebraic Additive Schwarz preconditionerDescription of the preconditionerVariant of Additive Shwarz preconditioner MAS

4 Parallel numerical experimentsNumerical scalability

5 Perspectives

8/25 Numerical experiments with additive Schwaz preconditioner

Page 9: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Description of the preconditionerVariant of Additive Shwarz preconditioner MAS

Additive Schwarz preconditioner [ Carvalho, Giraud, Meurant, 01]

Preconditionner properties

MAS =

#domains∑i=1

RTi (S(i))−1Ri

S(i) =

0BB@Smm Smg Smk Sm`

Sgm Sgg Sgk Sg`

Skm Skg Skk Sk`

S`m S`g S`k S``

1CCAAssembled local Schur complement

S(i) =

0BBB@S(i)

mm Smg Smk Sm`

Sgm S(i)gg Sgk Sg`

Skm Skg S(i)kk Sk`

S`m S`g S`k S(i)``

1CCCAlocal Schur complement

Smm =∑

j∈adj(m)

S(j)mm

9/25 Numerical experiments with additive Schwaz preconditioner

Page 10: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Description of the preconditionerVariant of Additive Shwarz preconditioner MAS

Parallel implementation for solving Au = f

Each subdomain A(i) is handled by one processor

A(i) ≡„

AIi Ii AIi Γi

AIi Γi AΓi Γi

«

Concurrent partial factorizations are performed on each processor toform the so called “local Schur complement”

S(i) = A(i)Γi Γi

− AΓi Ii A−1Ii Ii

AIi Γi

The reduced system Sx = b is solved using a distributed Krylov solver- One matrix vector product per iteration each processor compute S(i)(x (i))k = (y (i))k

- One local preconditioner apply (S(i))−1(z(i))k = (r (i))k

- Local neighbor-neighbor communication per iteration- Dot products per iteration (reduction)

Compute simultaneously the solution for the interior unknownsAIi Ii uIi = fIi − AIi Γi uΓi

10/25 Numerical experiments with additive Schwaz preconditioner

Page 11: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Description of the preconditionerVariant of Additive Shwarz preconditioner MAS

What tricks exist to construct cheaper preconditioners

Mixed arithmetic strategy

Idea: Exploit 32 bit operation whenever possible and ressort to double at critical stages

Compute and store the preconditioner in single precision arithmetic

Sparsification strategy

Allow entries whose magnitude exceeds a ”drop tolerance”

sk` ={

sk` if sk` ≥ ε(|skk | + |s``|)0 else

Two-level preconditioner [Carvalho, Giraud, Le Tallec, 01]

Domain based coarse space correction

M = MAS + RTOA−1

O R0 where A0 = R0SRTO

11/25 Numerical experiments with additive Schwaz preconditioner

Page 12: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Numerical scalability

Outline

1 Motivations

2 Description in a PDE’s framework

3 Algebraic Additive Schwarz preconditionerDescription of the preconditionerVariant of Additive Shwarz preconditioner MAS

4 Parallel numerical experimentsNumerical scalability

5 Perspectives

12/25 Numerical experiments with additive Schwaz preconditioner

Page 13: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Numerical scalability

Computational framework

Target computer

IBM-SP4 @ CERFACS (216 procs)

Blue Gene @ CERFACS (2048 procs)

System X @ VIRGINIA TECH (2200 procs)

Local direct solver : MUMPS [Amestoy, Duff, Koster, L’Excellent - 01]

Main features- Parallel distributed multifrontal solver (F90, MPI)- Symmetric and Unsymmetric factorizations- Element entry matrices, distributed matrices- Efficient Schur complement calculation- Iterative refinement and backward error analysis

Public domain: new version 4.7.3www.enseeiht.fr/apo/MUMPS - [email protected]

13/25 Numerical experiments with additive Schwaz preconditioner

Page 14: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Numerical scalability

Backward error convergence history

Mixed arithmetic and Sparse preconditioners behavior

3D Poisson problem

0 20 40 60 80 100 120 140 160 180 200 220 24025010

−18

10−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

# iter

||rk||/

||b||

32−bit calculation64−bit calculationmixed arithmetic calculationSparse with ξ=10−4

3D Heterogenous diffusion

0 20 40 60 80 100 120 140 160 180 200 220 24025010

−18

10−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

# iter

||rk||/

||b||

32−bit calculation64−bit calculationmixed arithmetic calculationSparse with ξ=10−4

Convergence history on the Schur complement for a problem with 43 millions dof mapped on

1000 processors

14/25 Numerical experiments with additive Schwaz preconditioner

Page 15: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Numerical scalability

Backward error history vs time

Mixed arithmetic and Sparse preconditioners behavior

3D Poisson problem

0 20 40 60 80 100 120 140 160 180 200 220226.3910

−18

10−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

Time(sec)

||rk||/

||b||

32−bit calculation64−bit calculationmixed arithmetic calculationSparse with ξ=10−4

3D Heterogenous diffusion

0 20 40 60 80 100 120 140 160 180 200 220226.3910

−18

10−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

Time(sec)

||rk||/

||b||

32−bit calculation64−bit calculationmixed arithmetic calculationSparse with ξ=10−4

Time (setup preconditioner + iterative loop) history on the Schur complement for a problem

with 43 millions dof mapped on 1000 processors

15/25 Numerical experiments with additive Schwaz preconditioner

Page 16: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Numerical scalability

Numerical scalability in 3D

3D Poisson problem

2764 125 216 343 512 729 10000

20

40

60

80

100

120

140

160

180

# Procs

Tim

e (s

ec)

3D Poisson Problem

64−bit calculation32−bit calculationSparse calculation

5.3.106 9.2.106 15.106 22.106 31.106 43.106

3D Heterogenous diffusion

2764 125 216 343 512 729 10000

20

40

60

80

100

120

140

160

180

# Procs

Tim

e (s

ec)

3D Heterogenous diffusion Problem

64−bit calculation32−bit calculationSparse calculation

5.3.106 9.2.106 15.106 22.106 31.106 43.106

The trend is similar for all variants using CG Krylov slover

The computing time increases slightly when increasing # sub-domains

The solved problem size vary from 1.1 up to 43 Millions of unknowns

16/25 Numerical experiments with additive Schwaz preconditioner

Page 17: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Numerical scalability

Numerical scalability in 3D

−div(K .∇φ) + v .∇φ = f

xy plan view of thecircular velocity field

0 0.2 0.4 0.6 0.8 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Circular flow velocity Problem −1−

Heterogenousconvection diffusion

0 27 125216 343 512 729 1,000 1,331 1,7280

20

40

60

80

100

120

140

160

180

# Procs

Tim

e (s

ec)

3D Heterogenous convection diffusion Problem −1−

Local Preconditionner

Local+Coarse correction

5 106 8 106 12 106 16 106 21 106 27 106

The trend is similar for all variants using GMRES Krylov slover

The computing time increases slightly when increasing # sub-domains

17/25 Numerical experiments with additive Schwaz preconditioner

Page 18: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Numerical scalability

Numerical scalability in 3D

−div(K .∇φ) + v .∇φ = f

xy plan view of the4-area velocity field

4−area velocity flow

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Anisotropic Heterogenousconvection diffusion

0 27 125216 343 512 729 1,000 1,331 1,7280

20

40

60

80

100

120

140

160

180

# Procs

Tim

e (s

ec)

3D Anisotropic Heterogenous convection diffusion Problem −2−

Local Preconditionner

Local+Coarse correction

5 106 8 106 12 106 16 106 21 106 27 106

The trend is similar for all variants using GMRES Krylov slover

The computing time increases slightly when increasing # sub-domains

18/25 Numerical experiments with additive Schwaz preconditioner

Page 19: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Outline

1 Motivations

2 Description in a PDE’s framework

3 Algebraic Additive Schwarz preconditionerDescription of the preconditionerVariant of Additive Shwarz preconditioner MAS

4 Parallel numerical experimentsNumerical scalability

5 Perspectives

19/25 Numerical experiments with additive Schwaz preconditioner

Page 20: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Perspectives joint work with S. Pralet, SAMTECH

Structural mechanic problem Parallel Performance and Scalability

32 64 800

50

100

150

200

250

# Procs

Tim

e (s

ec)

Structural mechanic problem

338000 elements

1.3 106 degree of freedoms

# processors 32 64 80

# iter 108 160 170Total time(sec) 240 132 66

20/25 Numerical experiments with additive Schwaz preconditioner

Page 21: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Perspectives joint work with S. Pralet, SAMTECH

Structural mechanic problem Parallel Performance and Scalability

0 16 32 64 960

50

100

150

200

250

300

350

400

# Procs

Tim

e (s

ec)

Structural mechanic problem

125000 elements

0.7 106 degree of freedoms

# processors 16 32 64 80

# iter 40 50 69 73Total time(sec) 347 143 76 39

21/25 Numerical experiments with additive Schwaz preconditioner

Page 22: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Perspectives

Objective

Study the behavior of the preconditioner on more general industrialproblems

Control the growth of iterations when increasing the # processors

Various possibilities

Numerical remedy: two-level preconditioner

- Coarse space correction, ie solve a closed problem on a coarsespace

- Various choices for the coarse component (eg one d.o.f. persub-domain)

Computer Science remedy : several processors per sub-domain

- Two-level of parallelism- 2D cyclic data storage

22/25 Numerical experiments with additive Schwaz preconditioner

Page 23: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Parallel computing alternative

Main characteristics of the two-level of parallelism

Allocate each subdomain to many processors

Benefit from the parallel efficiency of direct solver

23/25 Numerical experiments with additive Schwaz preconditioner

Page 24: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

More details

Acknowledgements

More details on this work can be found in [1, 2, 3, 4].

This research activity was partially developed in the framework of theANR-CIS project Solstice (ANR-06-CIS6- 010).

THANKS

24/25 Numerical experiments with additive Schwaz preconditioner

Page 25: A High-Performance Parallel Hybrid Method for Large Sparse

MotivationsDescription in a PDE’s framework

Algebraic Additive Schwarz preconditionerParallel numerical experiments

Perspectives

Bibliography

L. M. Carvalho, L. Giraud, and G. Meurant.Local preconditioners for two-level non-overlapping domain decomposition methods.Numerical Linear Algebra with Applications, 8(4):207–227, 2001.

L. Giraud, A. Haidar, and L. T. Watson.Parallel scalability study of three dimensional additive Schwarz preconditioners innon-overlapping domain decomposition.Technical Report TR/PA/07/05, CERFACS, Toulouse, France, 2007.Also appeared as ENSEEIHT-IRIT Technical report RT/APO/07/01.Under revision for Parallel Computing

L. Giraud, A. Haidar, and L. T. Watson.Mixed-precision preconditioners in parallel domain decomposition solvers.Technical Report TR/PA/06/84, CERFACS, Toulouse, France, 2006.Also appeared as ENSEEIHT-IRIT Technical report RT/APO/06/08.Accepted for publication in the proceedings of DD17.

J. Langou, J. Langou, P. Luszczek, J. Kurzak, A. Buttari, and J. Dongarra.Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy.Technical Report LAPACK Working Note #175 UT-CS-06-574, University of TennesseeComputer Science, April 2006.

25/25 Numerical experiments with additive Schwaz preconditioner