Introduction to Domain Decomposition Methods David E. Keyes Department of Applied Physics & Applied Mathematics Columbia University As delivered at Kavli Institute for Theoretical Physics, 17 March 2005

Page 1: Introduction to Domain Decomposition Methodsonline.itp.ucsb.edu/online/lattice05/keyes/pdf/Keyes...Introduction to Domain Decomposition Methods David E. Keyes Department of Applied

Introduction to Domain Decomposition Methods

David E. Keyes

Department of Applied Physics & Applied Mathematics

Columbia University

As delivered at Kavli Institute for Theoretical Physics, 17 March 2005

Definition and motivation

Domain decomposition (DD) is a “divide and conquer” technique for arriving at the solution of a problem defined over a domain from the solution of related problems posed on subdomains

Motivating assumption #1: the solution of the subproblems is qualitatively or quantitatively “easier” than the original

Motivating assumption #2: the original problem does not fit into the available memory space

Motivating assumption #3 (parallel context): the subproblems can be solved with some concurrency

Remarks on definition

“Divide and conquer” is not a fully satisfactory description

  “divide, conquer, and combine” is better
  combination is often through iterative means

True “divide-and-conquer” (only) algorithms are rare in computing (unfortunately)

It might be preferable to focus on “subdomain composition” rather than “domain decomposition”

We often think we know all about “two” because two is “one and one”. We forget that we have to make a study of “and.”

A. S. Eddington (1882-1944)

Remarks on definition

Domain decomposition has generic and specific senses within the universe of parallel computing

  generic sense: any data decomposition (considered in contrast to task decomposition)
  specific sense: the domain is the domain of definition of an operator equation (differential, integral, algebraic)

In a generic sense the process of constructing a parallel program consists of

  Decomposition into tasks
  Assignment of tasks to processes
  Orchestration of processes
    Communication and synchronization
  Mapping of processes to processors

Subproblem structure

The subdomains may be of the same or different dimensionality as the original

[Figure: two example decompositions, with pieces labeled 2D, 2D and 2D, 1D, 0D]

Plan of presentation

Imperative of domain decomposition (DD) for terascale computing
Basic DD algorithmic concepts
  Schwarz
  Schur
  Schwarz-Schur combinations
Basic DD convergence and scaling properties
En route:
  mention some “high watermarks” for DD-based simulations

Prime sources for domain decomposition

[Figure: book covers, dated 1996, 1997, 2001, 2004]

Other sources for domain decomposition

[Figure: book covers, dated 1992, 1994, 1995]

+ DDM.ORG and other proceedings volumes, 1988-2004

Platforms for high-end simulation

ASCI roadmap: go to 100 Teraflop/s by 2005
Use variety of vendors
  Compaq
  Cray
  Intel
  IBM
  SGI
Rely on commodity processor/memory units, with tightly coupled network
Massive software project to rewrite physics codes for distributed shared memory??

IBM’s BlueGene/L, 64K procs, 180 Tflop/s

Packaging level   Composition                           Speed           Memory
Chip              2 processors                          2.8/5.6 GF/s    4 MB
Compute Card      2 chips, 2x1x1                        5.6/11.2 GF/s   0.5 GB DDR
Node Board        32 chips, 4x4x2 (16 Compute Cards)    90/180 GF/s     8 GB DDR
Cabinet           32 Node boards, 8x8x16                2.9/5.7 TF/s    256 GB DDR
System            64 cabinets, 64x32x32                 180/360 TF/s    16 TB DDR

For the BGL academic consortium:
  Full single rack
  4.8 TFlop/s peak
  70% of peak LINPACK
  $2M per rack

Algorithmic requirements from architecture

Must run on physically distributed memory units connected by message-passing network, each serving one or more processors with multiple levels of cache

[Figure: “horizontal” aspects and “vertical” aspects of the architecture; one panel labeled T3E]

Building platforms is the “easy” part

Algorithms must be
  highly concurrent and straightforward to load balance
  latency tolerant
  cache friendly (good temporal and spatial locality)
  highly scalable (in the sense of convergence)

Domain decomposition “natural” for all of these

Domain decomposition also “natural” for software engineering

Fortunate that its theory was built in advance of requirements!

The earliest DD paper?

What Schwarz proposed…

Solve PDE in circle with BC taken from interior of square

Solve PDE in square with BC taken from interior of circle

And iterate!

Rationale

Convenient analytic means (separation of variables) are available for the regular problems in the subdomains, but not for the irregular “keyhole” problem defined by their union

Schwarz iteration defines a functional map from the values defined along (either) artificial interior boundary segment completing a subdomain (arc or segments) to an updated set of values

A contraction map is derived for the error

Rate of convergence is not necessarily rapid – this was not a concern of Schwarz

Subproblems are not solved concurrently – neither was this Schwarz’ concern
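The alternating iteration can be tried on a toy problem. The sketch below is not from the talk: it assumes a 1D model problem, -u'' = 1 on (0,1) with u(0) = u(1) = 0, discretized by finite differences, with two overlapping intervals standing in for the circle and the square. The names `solve_dirichlet` and `alternating_schwarz`, and the grid and overlap sizes, are illustrative choices.

```python
import numpy as np

def solve_dirichlet(f, left, right, h):
    """Solve -u'' = f at interior points with Dirichlet values left/right (dense, for clarity)."""
    m = len(f)
    A = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
    rhs = f.copy()
    rhs[0] += left / h**2
    rhs[-1] += right / h**2
    return np.linalg.solve(A, rhs)

def alternating_schwarz(n=49, overlap=8, sweeps=30):
    """Return the max-norm error after each sweep of the two-subdomain alternating method."""
    h = 1.0 / (n + 1)
    f = np.ones(n)
    u = np.zeros(n + 2)                       # nodes 0..n+1, boundary values stay 0
    mid = (n + 1) // 2
    a, b = mid - overlap // 2, mid + overlap // 2
    exact = np.zeros(n + 2)
    exact[1:-1] = solve_dirichlet(f, 0.0, 0.0, h)
    errors = []
    for _ in range(sweeps):
        # subdomain 1 = nodes 1..b-1, artificial boundary value taken from u[b]
        u[1:b] = solve_dirichlet(f[0:b - 1], 0.0, u[b], h)
        # subdomain 2 = nodes a+1..n, artificial boundary value taken from the updated u[a]
        u[a + 1:n + 1] = solve_dirichlet(f[a:n], u[a], 0.0, h)
        errors.append(np.max(np.abs(u - exact)))
    return errors

errs = alternating_schwarz()
print(errs[0], errs[-1])
```

In this setting the error shrinks by a fixed factor per sweep, and that factor approaches one as the overlap narrows, which illustrates why the convergence rate is "not necessarily rapid".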

Other early DD papers

Rationale

For Kron: direct Gaussian elimination has superlinear complexity

union of subproblems and the connecting problem (each also superlinear) could be solved in fewer overall operations than one large problem

For Przemieniecki: full airplane structural analysis would not fit in memory of available computers

individual subproblems fit in memory

Rationale

Let problem size be $N$, number of subdomains be $P$, and memory capacity be $M$

Let problem solution complexity be $N^a$, ($a \ge 1$)

Then subproblem solution complexity is $(N/P)^a$

Let the cost of connecting the subproblems be $c(N,P)$

Kron wins if

$P\,(N/P)^a + c(N,P) < N^a$

or $c(N,P) < N^a\,(1 - P^{1-a})$

Przemieniecki wins if

$(N/P) < M$

NB: Kron does not win directly if $a = 1$!
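The two criteria are easy to evaluate numerically. The following is a small sketch, with function names (`kron_wins`, `kron_budget`, `przemieniecki_wins`) and sample values that are illustrative and not from the talk.

```python
def kron_wins(N, P, a, connect_cost):
    """True if P subproblems plus the connecting problem cost less than one solve of size N."""
    return P * (N / P) ** a + connect_cost < N ** a

def kron_budget(N, P, a):
    """Largest connecting cost c(N,P) that still lets Kron win: N^a (1 - P^(1-a))."""
    return N ** a * (1.0 - P ** (1.0 - a))

def przemieniecki_wins(N, P, M):
    """True if each subproblem of size N/P fits in memory of capacity M."""
    return N / P < M

# Hypothetical numbers: cubic complexity leaves a large budget, linear complexity leaves none.
print(kron_budget(1000, 10, 3.0))
print(kron_budget(1000, 10, 1.0))
```

With a = 1 the budget is exactly zero, so any nonzero connecting cost defeats Kron's argument, which is the point of the NB on the slide.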

Contemporary interest

Goal is algorithmic scalability: fill up memory of arbitrarily large machines to increase resolution, while preserving nearly constant* running times with respect to proportionally smaller problem on one processor

*at worst logarithmically growing

Two definitions of scalability

“Strong scaling”
  execution time decreases in inverse proportion to the number of processors
  fixed size problem overall

“Weak scaling”
  execution time remains constant, as problem size and processor number are increased in proportion
  fixed size problem per processor
  also known as “Gustafson scaling”

[Figure: left panel, log T vs. log p with N constant, curves labeled good (slope = -1) and poor; right panel, T vs. p with N ∝ p, curves labeled good (slope = 0) and poor]

Strong scaling illus. (1999 Bell Prize)

Newton-Krylov-Schwarz (NKS) algorithm for compressible and incompressible Euler and Navier-Stokes flows

Used in NASA application FUN3D (M6 wing results below with 11M dof)

[Figure annotations: 128 nodes, 43 min; 3072 nodes, 2.5 min, 226 Gf/s; 15 µs/unknown; 70% efficient]
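The quoted efficiency can be checked against the two timings on this slide. This is a sketch that assumes the usual definitions (strong-scaling efficiency relative to the smallest run, weak-scaling efficiency as a ratio of times); the function names are illustrative.

```python
def strong_efficiency(t_ref, p_ref, t, p):
    """Fixed total size: ideal time falls as 1/p, so efficiency = (t_ref * p_ref) / (t * p)."""
    return (t_ref * p_ref) / (t * p)

def weak_efficiency(t_ref, t):
    """Fixed size per processor: ideal time is constant, so efficiency = t_ref / t."""
    return t_ref / t

# Timings quoted on the slide: 43 min on 128 nodes, 2.5 min on 3072 nodes.
eff = strong_efficiency(43.0, 128, 2.5, 3072)
print(f"strong-scaling efficiency: {eff:.2f}")
```

The result is roughly 0.72, consistent with the "70% efficient" annotation given the rounding of the quoted times.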

Weak scaling illus. (2002 Bell Prize)

Finite Element Tearing and Interconnecting (FETI) algorithm for solid/shell models

Used in Sandia applications Salinas, Adagio, Andante

[Figure: time (seconds, 0 to 250) vs. number of ASCI-White processors (0 to 4000); curves labeled Total Salinas and FETI-DP; data points labeled 1mdof, 4mdof, 9mdof, 18mdof, 30mdof, 60mdof; c/o C. Farhat and K. Pierson]

Decomposition strategies for Lu=f in Ω

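Operator decomposition

$\mathcal{L} = \sum_k \mathcal{L}_k$

Function space decomposition

$f = \sum_k f_k \Phi_k, \quad u = \sum_k u_k \Phi_k$

Domain decomposition

$\Omega = \bigcup_k \Omega_k$

Consider the implicitly discretized parabolic case

$\left[\frac{I}{\tau} + \mathcal{L}_x + \mathcal{L}_y\right] u^{(k+1)} = \frac{I}{\tau}\,u^{(k)} + f$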

Operator decomposition

Consider ADI

$\left[\frac{I}{\tau/2} + \mathcal{L}_x\right] u^{(k+1/2)} = \left[\frac{I}{\tau/2} - \mathcal{L}_y\right] u^{(k)} + f$

$\left[\frac{I}{\tau/2} + \mathcal{L}_y\right] u^{(k+1)} = \left[\frac{I}{\tau/2} - \mathcal{L}_x\right] u^{(k+1/2)} + f$

Iteration matrix consists of four multiplicative substeps per timestep
  two sparse matrix-vector multiplies
  two sets of unidirectional bandsolves

Parallelism within each substep

But global data exchanges between bandsolve substeps

Function space decomposition

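Consider a spectral Galerkin method

$u(x,y,t) = \sum_{j=1}^{N} a_j(t)\,\Phi_j(x,y)$

$\left(\frac{d}{dt}u, \Phi_i\right) = (\mathcal{L}u, \Phi_i) + (f, \Phi_i), \quad i = 1, \ldots, N$

$\sum_j (\Phi_j, \Phi_i)\,\frac{da_j}{dt} = \sum_j (\mathcal{L}\Phi_j, \Phi_i)\,a_j + (f, \Phi_i), \quad i = 1, \ldots, N$

$\frac{da}{dt} = M^{-1}Ka + M^{-1}f$

Method-of-lines system of ODEs

Perhaps $M \equiv [(\Phi_j, \Phi_i)]$, $K \equiv [(\mathcal{L}\Phi_j, \Phi_i)]$ are diagonal matrices

Parallelism across spectral index

But global data exchanges to transform back to physical variables at each step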

SPMD parallelism w/domain decomposition

Partitioning of the grid induces block structure on the system matrix (Jacobian)

[Figure: grid partitioned into subdomains Ω1, Ω2, Ω3; block row (A21 A22 A23) of the system matrix marked as the rows assigned to proc “2”]

DD relevant to any local stencil formulation

finite differences, finite elements, finite volumes

• All lead to sparse Jacobian matrices

[Figure: sparsity pattern of J, with row i corresponding to node i]

• However, the inverses are generally dense; even the factors suffer unacceptable fill-in in 3D

• Want to solve in subdomains only, and use to precondition full sparse problem

Digression for notation’s sake

We need a convenient notation for mapping vectors (representing discrete samples of a continuous field) from full domain to subdomain and back

Let $R_i$ be a Boolean operator that extracts the elements of the $i$th subdomain from the global vector

$R_1 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}, \qquad R_1 u = R_1 \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_3 \end{pmatrix} \equiv u_1$

Then $R_i^T$ maps the elements of the $i$th subdomain back into the global vector, padding with zeros

$R_1^T = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad R_1^T u_1 = R_1^T \begin{pmatrix} x_1 \\ x_3 \end{pmatrix} = \begin{pmatrix} x_1 \\ 0 \\ x_3 \\ 0 \\ 0 \\ 0 \end{pmatrix}$

[Figure: grid with nodes x1, x2, x3, x4, x5, x6 and subdomain u1]
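A Boolean restriction operator and its transpose can be built directly from an index list. This is a minimal NumPy sketch; the helper name `restriction`, the sample values, and the choice of which two nodes form the subdomain are assumptions made for illustration.

```python
import numpy as np

def restriction(indices, n):
    """Boolean R (len(indices) x n) that picks the listed entries of a length-n vector."""
    R = np.zeros((len(indices), n))
    R[np.arange(len(indices)), indices] = 1.0
    return R

u = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])   # stand-ins for x1..x6
R1 = restriction([0, 2], 6)     # hypothetical subdomain holding x1 and x3
u1 = R1 @ u                     # restriction: subdomain values only
padded = R1.T @ u1              # extension: back to global length, zeros elsewhere
print(u1, padded)
```

In practice one would store only the index list and use array slicing rather than forming R explicitly; the matrix form is shown because it is the notation the following slides rely on.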

Schwarz domain decomposition method

Consider restriction and extension operators for subdomains $\Omega_i$, $R_i$, $R_i^T$, and for possible coarse grid, $R_0$, $R_0^T$

Replace discretized $Au = f$ with $B^{-1}Au = B^{-1}f$

$B^{-1} = R_0^T A_0^{-1} R_0 + \sum_i R_i^T A_i^{-1} R_i, \qquad A_i = R_i A R_i^T$

Solve by a Krylov method

Matrix-vector multiplies with
  parallelism on each subdomain
  nearest-neighbor exchanges, global reductions
  possible small global system (not needed for parabolic case)
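The preconditioner $B^{-1} = R_0^T A_0^{-1} R_0 + \sum_i R_i^T A_i^{-1} R_i$ can be exercised on a small model problem. The sketch below is serial and dense, and it assumes a 1D Poisson matrix, overlapping index blocks as subdomains, a piecewise-constant coarse space for $R_0$, and a hand-written preconditioned CG; none of these choices come from the talk.

```python
import numpy as np

def poisson1d(n):
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def additive_schwarz(A, subdomains, R0=None):
    """Return r -> B^{-1} r, with B^{-1} = R0^T A0^{-1} R0 + sum_i Ri^T Ai^{-1} Ri."""
    local = [(idx, np.linalg.inv(A[np.ix_(idx, idx)])) for idx in subdomains]
    A0_inv = None if R0 is None else np.linalg.inv(R0 @ A @ R0.T)
    def apply(r):
        z = np.zeros_like(r)
        for idx, Ai_inv in local:            # independent subdomain solves
            z[idx] += Ai_inv @ r[idx]
        if A0_inv is not None:               # optional coarse-grid term
            z += R0.T @ (A0_inv @ (R0 @ r))
        return z
    return apply

def pcg(A, f, precond, tol=1e-8, maxit=500):
    """Preconditioned conjugate gradients; returns (solution, iterations used)."""
    u = np.zeros_like(f)
    r = f - A @ u
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxit + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        u += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(f):
            return u, k
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return u, maxit

n, size, overlap = 64, 16, 2
A, f = poisson1d(n), np.ones(n)
subs = [np.arange(max(0, s - overlap), min(n, s + size + overlap)) for s in range(0, n, size)]
R0 = np.zeros((n // size, n))
for k in range(n // size):
    R0[k, size * k:size * (k + 1)] = 1.0     # one coarse unknown per subdomain

u1, it1 = pcg(A, f, additive_schwarz(A, subs))
u2, it2 = pcg(A, f, additive_schwarz(A, subs, R0))
print("iterations, 1-level:", it1, " 2-level:", it2)
```

In a parallel code each subdomain solve would live on its own process and only the overlap data and inner products would be communicated; here everything runs in one loop to keep the algebra visible.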

Krylov bases for sparse systems

E.g., conjugate gradients (CG) for symmetric, positive definite systems, and generalized minimal residual (GMRES) for nonsymmetry or indefiniteness

Krylov iteration is an algebraic projection method for converting a high-dimensional linear system into a lower-dimensional linear system

$Ax = b, \quad x = Vy, \quad H \equiv W^T A V, \quad g = W^T b \quad \Longrightarrow \quad Hy = g$
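The projection $H = W^T A V$, $g = W^T b$, $Hy = g$, $x = Vy$ can be written out in a few lines. The sketch below assumes the Galerkin choice $W = V$ with $V$ an orthonormal basis of the Krylov subspace built from $b$; practical codes build the basis incrementally (Lanczos or Arnoldi) rather than by an explicit QR factorization, and the function name `krylov_galerkin` is illustrative.

```python
import numpy as np

def krylov_galerkin(A, b, m):
    """Project Ax = b onto span{b, Ab, ..., A^(m-1) b} using W = V (orthonormal basis)."""
    n = len(b)
    K = np.empty((n, m))
    v = b / np.linalg.norm(b)
    for j in range(m):
        K[:, j] = v
        v = A @ v
        v = v / np.linalg.norm(v)
    V, _ = np.linalg.qr(K)        # orthonormal basis of the Krylov subspace
    W = V
    H = W.T @ A @ V               # m x m reduced operator
    g = W.T @ b                   # reduced right-hand side
    y = np.linalg.solve(H, g)
    return V @ y                  # lift back to the full space

# Hypothetical test matrix with three distinct eigenvalues: a 3-dimensional
# Krylov subspace already contains the exact solution.
A = np.diag([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
b = np.ones(6)
x3 = krylov_galerkin(A, b, 3)
print(np.linalg.norm(b - A @ x3))
```

The example is chosen so that the number of distinct eigenvalues, not the matrix dimension, sets the subspace size needed, which is the property a good preconditioner tries to approximate by clustering the spectrum.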

Krylov bases for sparse systems

Krylov bases, cont.

Krylov bases, cont.

Krylov bases, cont.

Remember this formula of Schwarz …

For a “good” approximation, $B^{-1}$, to $A^{-1}$:

$B^{-1} = \sum_i R_i^T (R_i A R_i^T)^{-1} R_i$

Now, let’s compare!

Operator decomposition (ADI)
  natural row-based assignment requires global all-to-all, bulk data exchanges in each step (for transpose)

Function space decomposition (Fourier)
  natural mode-based assignment requires global all-to-all, bulk data exchanges in each step (for transform)

Domain decomposition (Schwarz)
  natural domain-based assignment requires local surface data exchanges, global reductions, and optional small global problem

(Of course, domain decomposition can be interpreted as a special operator or function space decomposition)

Schwarz subspace decomposition

Schwarz subspace decomposition

Four steps in creating a parallel program

[Figure: a sequential computation is turned into a parallel program in stages labeled Decomposition, Assignment, Orchestration, and Mapping, passing through tasks, processes (p0, p1, p2, p3), and processors (P0, P1, P2, P3); Decomposition and Assignment are grouped under Partitioning]

Decomposition of computation in tasks
Assignment of tasks to processes
Orchestration of data access, communication, synchronization
Mapping processes to processors

Krylov-Schwarz parallelization summary

Decomposition into concurrent tasks
  by domain

Assignment of tasks to processes
  typically one subdomain per process

Orchestration of communication between processes
  to perform sparse matvec – near neighbor communication
  to perform subdomain solve – nothing
  to build Krylov basis – global inner products
  to construct best fit solution – global sparse solve (redundantly)

Mapping of processes to processors
  typically one process per processor

Krylov-Schwarz kernel in parallel

[Figure: per-process timelines (P1, P2, ..., Pn) for one Krylov iteration, with phases labeled local scatter, Jac-vec multiply, precond sweep, daxpy, inner product]

What happens if, for instance, in this (schematicized) iteration, arithmetic speed is doubled, scalar all-gather is quartered, and local scatter is cut by one-third? Each phase is considered separately. Answer is to the right.

[Figure: second set of timelines for P1, P2, ..., Pn]

Krylov-Schwarz compelling in serial, too

As successive working sets “drop” into a level of memory, capacity (and with effort conflict) misses disappear, leaving only compulsory misses, reducing demand on main memory bandwidth

Cache size is not easily manipulated, but domain size is

Traffic decreases as cache gets bigger or subdomains get smaller

Estimating scalability of stencil computations

Given complexity estimates of the leading terms of:
  the concurrent computation (per iteration phase)
  the concurrent communication
  the synchronization frequency

And a bulk synchronous model of the architecture including:
  internode communication (network topology and protocol reflecting horizontal memory structure)
  on-node computation (effective performance parameters including vertical memory structure)

One can estimate optimal concurrency and optimal execution time
  on per-iteration basis, or overall (by taking into account any granularity-dependent convergence rate)
  simply differentiate time estimate in terms of (N,P) with respect to P, equate to zero and solve for P in terms of N

Estimating 3D stencil costs (per iteration)

grid points in each direction $n$, total work $N = O(n^3)$
processors in each direction $p$, total procs $P = O(p^3)$
memory per node requirements $O(N/P)$

concurrent execution time per iteration $A\,n^3/p^3$

grid points on side of each processor subdomain $n/p$
concurrent neighbor commun. time per iteration $B\,n^2/p^2$

cost of global reductions in each iteration $C \log p$ or $C\,p^{(1/d)}$
  $C$ includes synchronization frequency

same dimensionless units for measuring $A$, $B$, $C$
  e.g., cost of scalar floating point multiply-add

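3D stencil computation illustration

Rich local network, tree-based global reductions

total wall-clock time per iteration

$T(n,p) = A\,\frac{n^3}{p^3} + B\,\frac{n^2}{p^2} + C \log p$

for optimal $p$, $\frac{\partial T}{\partial p} = 0$, or

$-3A\,\frac{n^3}{p^4} - 2B\,\frac{n^2}{p^3} + \frac{C}{p} = 0,$

or (with $\theta \equiv \frac{32 B^3}{243 A^2 C}$),

$p_{opt} = \left(\frac{3A}{2C}\right)^{1/3} \left( \left[1 + (1-\theta)^{1/2}\right]^{1/3} + \left[1 - (1-\theta)^{1/2}\right]^{1/3} \right) \cdot n$

without “speeddown,” $p$ can grow with $n$

in the limit as $B/C \to 0$

$p_{opt} = \left(\frac{3A}{C}\right)^{1/3} \cdot n$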
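The closed form for the optimal processor count can be checked numerically. With $p = \rho n$, the optimality condition $-3A\,n^3/p^4 - 2B\,n^2/p^3 + C/p = 0$ reduces to the cubic $\rho^3 - (2B/C)\rho - 3A/C = 0$, which the sketch below verifies. The values of A, B, C are hypothetical, and the function names are illustrative.

```python
import math

def wall_time(n, p, A, B, C):
    """Per-iteration model: computation + neighbor exchange + tree-based reduction."""
    return A * n**3 / p**3 + B * n**2 / p**2 + C * math.log(p)

def p_opt(n, A, B, C):
    """Closed-form minimizer of wall_time in p (valid for theta <= 1)."""
    theta = 32.0 * B**3 / (243.0 * A**2 * C)
    if theta > 1.0:
        raise ValueError("closed form below assumes theta <= 1")
    s = math.sqrt(1.0 - theta)
    rho = (3.0 * A / (2.0 * C)) ** (1.0 / 3.0) * (
        (1.0 + s) ** (1.0 / 3.0) + (1.0 - s) ** (1.0 / 3.0)
    )
    return rho * n

A, B, C, n = 1.0, 0.5, 2.0, 100          # hypothetical cost coefficients
p_star = p_opt(n, A, B, C)
rho = p_star / n
print(p_star, rho**3 - (2.0 * B / C) * rho - 3.0 * A / C)
```

Because $\rho$ depends only on A, B, and C, the optimal p is proportional to n, which is the "p can grow with n" statement on the slide.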

3D stencil computation illustration

Rich local network, tree-based global reductions

optimal running time

$T(n, p_{opt}(n)) = \frac{A}{\rho^3} + \frac{B}{\rho^2} + C \log(\rho n),$

where

$\rho = \left(\frac{3A}{2C}\right)^{1/3} \left( \left[1 + (1-\theta)^{1/2}\right]^{1/3} + \left[1 - (1-\theta)^{1/2}\right]^{1/3} \right)$

limit of infinite neighbor bandwidth, zero neighbor latency ($B \to 0$)

$T(n, p_{opt}(n)) = C \left[ \log n + \frac{1}{3} \log \frac{A}{C} + \mathrm{const.} \right]$

(This analysis is on a per iteration basis; complete analysis multiplies this cost by an iteration count estimate that generally depends on n and p.)

Scalability results for DD stencil computations

With tree-based (logarithmic) global reductions and scalable nearest neighbor hardware:

optimal number of processors scales linearly with problem size

With 3D torus-based global reductions and scalable nearest neighbor hardware:

optimal number of processors scales as three-fourths power of problem size (almost “scalable”)

With common network bus (heavy contention):

optimal number of processors scales as one-fourth power of problem size (not “scalable”)

Factoring convergence rate into estimates

In terms of $N$ and $P$, where for $d$-dimensional isotropic problems, $N = h^{-d}$ and $P = H^{-d}$, for mesh parameter $h$ and subdomain diameter $H$, iteration counts may be estimated as follows:

Preconditioning Type        in 2D              in 3D
Point Jacobi                $O(N^{1/2})$       $O(N^{1/3})$
Domain Jacobi (δ=0)         $O((NP)^{1/4})$    $O((NP)^{1/6})$
1-level Additive Schwarz    $O(P^{1/2})$       $O(P^{1/3})$
2-level Additive Schwarz    $O(1)$             $O(1)$

Krylov-Schwarz iterative methods typically converge in a number of iterations that scales as the square-root of the condition number of the Schwarz-preconditioned system

Where do these results come from?

Point Jacobi result is well known (see any book on the numerical analysis of elliptic problems)

Subdomain Jacobi result has interesting history
  Was derived independently from functional analysis, linear algebra, and graph theory

Schwarz theory is neatly and abstractly summarized in Section 5.2 of Smith, Bjorstad & Gropp (1996) and Chapter 2 of Toselli & Widlund (2004)
  condition number, $\kappa \le \omega\,[1 + \rho(\varepsilon)]\,C_0^2$
  $C_0^2$ is a splitting constant for the subspaces of the decomposition
  $\rho(\varepsilon)$ is a measure of the orthogonality of the subspaces
  $\omega$ is a measure of the approximation properties of the subspace solvers (can be unity for exact subdomain solves)
  These properties are estimated for different subspaces, different operators, and different subspace solvers and the “crank” is turned

Comments on the Schwarz results

Original basic Schwarz estimates were for:
  self-adjoint elliptic operators
  positive definite operators
  exact subdomain solves, two-way overlapping with generous overlap, δ=O(H) (original 2-level result was O(1+H/δ))

Subsequently extended to (within limits):
  nonself-adjointness (e.g., convection)
  indefiniteness (e.g., wave Helmholtz)
  inexact subdomain solves
  one-way overlap communication (“restricted additive Schwarz”)
  small overlap

[Slide annotations: $R_i$, $R_i^T$; $A_i^{-1}$]

Comments on the Schwarz results, cont.

Theory still requires “sufficiently fine” coarse mesh
  However, coarse space need not be nested in the fine space or in the decomposition into subdomains

Practice is better than one has any right to expect
  “In theory, theory and practice are the same ... In practice they’re not!” (Yogi Berra)

Wave Helmholtz (e.g., acoustics) is delicate at high frequency:
  standard Schwarz Dirichlet boundary conditions, $u|_\Gamma = 0$, can lead to undamped resonances within subdomains
  remedy involves Robin-type transmission boundary conditions on subdomain boundaries, $(u + \alpha\,\partial u/\partial n)|_\Gamma = 0$

Illustration of 1-level vs. 2-level tradeoff

Thermal Convection Problem (Ra = 1000)

[Figure: 3D results, average iterations per Newton step vs. total unknowns, from 1 proc to 512 procs; curves labeled 1-Level DD, 2-Level DD Approx. Coarse Solve, and 2-Level DD Exact Coarse Solve; growth-rate annotations N^.45, N^.24, N^0]

[Figure: temperature iso-lines on slice plane, velocity iso-surfaces and streamlines in 3D; c/o J. Shadid and R. Tuminaro]

Newton-Krylov solver with Aztec non-restarted GMRES with 1-level domain decomposition preconditioner, ILUT subdomain solver, and ML 2-level DD with Gauss-Seidel subdomain solver. Coarse Solver: “Exact” = SuperLU (1 proc), “Approx” = one step of ILU (8 proc. in parallel)

“Unreasonable effectiveness” of Schwarz

When does the sum of partial inverses equal the inverse of the sums? When the decomposition is right!

Let $\{r_i\}$ be a complete set of orthonormal row eigenvectors for $A$: $r_i A = a_i r_i$ or $a_i = r_i A r_i^T$

Then $A = \sum_i r_i^T a_i r_i$

and $A^{-1} = \sum_i r_i^T a_i^{-1} r_i = \sum_i r_i^T (r_i A r_i^T)^{-1} r_i$, the Schwarz formula!

Good decompositions are a compromise between conditioning and parallel complexity, in practice
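The identity $A^{-1} = \sum_i r_i^T (r_i A r_i^T)^{-1} r_i$ for orthonormal eigenvectors can be confirmed numerically. This is a small sketch assuming a randomly generated symmetric positive definite test matrix; the matrix and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((5, 5))
A = G @ G.T + 5.0 * np.eye(5)           # symmetric positive definite test matrix

a, Q = np.linalg.eigh(A)                # columns of Q are orthonormal eigenvectors
A_inv = np.zeros_like(A)
for i in range(5):
    r = Q[:, i][None, :]                # 1 x n "restriction" onto eigenvector i
    A_i = r @ A @ r.T                   # 1 x 1 "subdomain" operator, equal to a[i]
    A_inv += r.T @ np.linalg.inv(A_i) @ r

print(np.max(np.abs(A_inv - np.linalg.inv(A))))
```

Each eigenvector plays the role of a one-dimensional "subdomain", so the sum of the partial inverses reproduces the full inverse exactly. Geometric subdomains do not diagonalize A, which is why the Schwarz sum is only a preconditioner in practice.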

Schur complement substructuring

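Given a partition

$\begin{bmatrix} A_{ii} & A_{i\Gamma} \\ A_{\Gamma i} & A_{\Gamma\Gamma} \end{bmatrix} \begin{bmatrix} u_i \\ u_\Gamma \end{bmatrix} = \begin{bmatrix} f_i \\ f_\Gamma \end{bmatrix}$

Condense:

$S u_\Gamma = g, \qquad S \equiv A_{\Gamma\Gamma} - A_{\Gamma i} A_{ii}^{-1} A_{i\Gamma}, \qquad g \equiv f_\Gamma - A_{\Gamma i} A_{ii}^{-1} f_i$

Properties of the Schur complement:
  smaller than original $A$, but generally dense
  expensive to form, to store, to factor, and to solve
  better conditioned than original $A$, for which $\kappa(A) = O(h^{-2})$
  for a single interface, $\kappa(S) = O(h^{-1})$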

Therefore, solve iteratively, with action of S on each Krylov vector
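The condensation step can be written out with dense blocks. The sketch below forms S explicitly and solves it directly, which is exactly what the slide advises against at scale; it is meant only to make the algebra concrete. It assumes a 1D Poisson matrix with a single interface node, and the function name `schur_solve` is illustrative.

```python
import numpy as np

def schur_solve(A, f, interior, interface):
    """Solve Au = f by condensing onto the interface unknowns, then back-substituting."""
    Aii = A[np.ix_(interior, interior)]
    AiG = A[np.ix_(interior, interface)]
    AGi = A[np.ix_(interface, interior)]
    AGG = A[np.ix_(interface, interface)]
    S = AGG - AGi @ np.linalg.solve(Aii, AiG)              # Schur complement
    g = f[interface] - AGi @ np.linalg.solve(Aii, f[interior])
    u = np.zeros_like(f)
    u[interface] = np.linalg.solve(S, g)                   # interface problem S u_G = g
    u[interior] = np.linalg.solve(Aii, f[interior] - AiG @ u[interface])
    return u, S

n = 9
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.arange(1.0, n + 1.0)
interface = [4]                                            # middle node separates two halves
interior = [k for k in range(n) if k not in interface]
u, S = schur_solve(A, f, interior, interface)
print(np.max(np.abs(A @ u - f)))
```

In an iterative setting S is never formed: each application of S to a Krylov vector costs one solve with $A_{ii}$, which decouples into independent subdomain solves.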

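Schur preconditioning

Note the factorization of the system matrix

$A = \begin{bmatrix} I & 0 \\ A_{\Gamma i} A_{ii}^{-1} & I \end{bmatrix} \begin{bmatrix} A_{ii} & A_{i\Gamma} \\ 0 & S \end{bmatrix}$

Hence a perfect preconditioner is

$A^{-1} = \begin{bmatrix} A_{ii} & A_{i\Gamma} \\ 0 & S \end{bmatrix}^{-1} \begin{bmatrix} I & 0 \\ A_{\Gamma i} A_{ii}^{-1} & I \end{bmatrix}^{-1} = \begin{bmatrix} A_{ii}^{-1} & -A_{ii}^{-1} A_{i\Gamma} S^{-1} \\ 0 & S^{-1} \end{bmatrix} \begin{bmatrix} I & 0 \\ -A_{\Gamma i} A_{ii}^{-1} & I \end{bmatrix}$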

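Schur preconditioning

Let $M^{-1}$ be any good preconditioner for $S$

Let

$B^{-1} = \begin{bmatrix} \tilde{A}_{ii} & A_{i\Gamma} \\ 0 & M \end{bmatrix}^{-1} \begin{bmatrix} I & 0 \\ A_{\Gamma i} \tilde{A}_{ii}^{-1} & I \end{bmatrix}^{-1}$

Then $B^{-1}$ is a good preconditioner for $A$, for recall

$A^{-1} = \begin{bmatrix} A_{ii} & A_{i\Gamma} \\ 0 & S \end{bmatrix}^{-1} \begin{bmatrix} I & 0 \\ A_{\Gamma i} A_{ii}^{-1} & I \end{bmatrix}^{-1}$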

Schur preconditioning

So, instead of $M^{-1} S u_\Gamma = M^{-1} g$, use full system

$B^{-1} \begin{bmatrix} A_{ii} & A_{i\Gamma} \\ A_{\Gamma i} & A_{\Gamma\Gamma} \end{bmatrix} \begin{bmatrix} u_i \\ u_\Gamma \end{bmatrix} = B^{-1} \begin{bmatrix} f_i \\ f_\Gamma \end{bmatrix}$

Here, solves with $A_{ii}$ may be done approximately since all degrees of freedom are retained

Once this simple block decomposition is understood, everything boils down to two more profound questions:
  How to approximate $S$ cheaply
  How should the relative quality of $M$ and $\tilde{A}_{ii}$ compare
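Applying $B^{-1}$ amounts to a forward sweep with the lower block factor followed by a back-substitution with the upper one. The sketch below takes the approximate interior solve and the interface solve as callables; the function name `apply_B_inv` and its argument names are illustrative, not from the talk.

```python
import numpy as np

def apply_B_inv(r_i, r_G, Aii_tilde_solve, AiG, AGi, M_solve):
    """Apply B^{-1} = [[Aii~, AiG], [0, M]]^{-1} [[I, 0], [AGi Aii~^{-1}, I]]^{-1} to (r_i, r_G)."""
    w_i = Aii_tilde_solve(r_i)                 # interior solve used by the lower factor
    t_G = r_G - AGi @ w_i                      # forward sweep onto the interface
    z_G = M_solve(t_G)                         # interface solve with M (approximating S)
    z_i = Aii_tilde_solve(r_i - AiG @ z_G)     # back-substitution into the interior
    return z_i, z_G
```

With exact solves (the interior solve using $A_{ii}$ and the interface solve using $S$) this reproduces $A^{-1}$; replacing either callable by something cheaper, for example an incomplete factorization or a few relaxation sweeps, gives a preconditioner whose quality depends on both approximations.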

Schur preconditioning

How to approximate S cheaply?

Many techniques for a single interface
  Factorizations of narrow band approximations
  Spectral (FFT-implementable) decompositions
  Algebraic “probing” of a specified sparsity pattern for inverse

For separator sets more complicated than a single interface, we componentize, creating the preconditioner of the union from the sum of preconditioners of the individual pieces

Schwarz-on-Schur

Beyond a simple interface, preconditioning the Schur complement is complex in and of itself; Schwarz is used on the reduced problem

Neumann-Neumann

$M^{-1} = \sum_i D_i R_i^T S_i^{-1} R_i D_i$

Balancing Neumann-Neumann

$M^{-1} = M_0^{-1} + (I - M_0^{-1} S) \left( \sum_i D_i R_i^T S_i^{-1} R_i D_i \right) (I - S M_0^{-1})$

Numerous other variants allow inexact subdomain solves, combining additive Schwarz-like preconditioning of the separator set components with inexact subdomain solves on the subdomains

Schwarz-on-Schur

As an illustration of the algorithmic structure, we consider the 2D Bramble-Pasciak-Schatz (1984) preconditioner for the case of many subdomains

Schwarz-on-Schur

For this case $\kappa(S) = O(H^{-1} h^{-1})$, which is not as good as the single interface case, for which $\kappa(S) = O(h^{-1})$

The Schur complement has the block structure

$$
S = \begin{bmatrix} S_{EE} & S_{EV} \\ S_{EV}^T & S_{VV} \end{bmatrix}
$$

for which the following block diagonal preconditioner improves conditioning only to $O(H^{-2} \log^2(H h^{-1}))$

$$
M^{-1} = \begin{bmatrix} S_{EE}^{-1} & 0 \\ 0 & S_{VV}^{-1} \end{bmatrix}
$$

Note that we can write $M^{-1}$ equivalently as

$$
M^{-1} = \sum_{E_i} R_{E_i}^T S_{E_i E_i}^{-1} R_{E_i} + \sum_{V_j} R_{V_j}^T S_{V_j V_j}^{-1} R_{V_j}
$$

Schwarz-on-Schur

If we replace the diagonal vertex term of $M^{-1}$ with a coarse grid operator

$$
M^{-1} = \sum_{E_i} R_{E_i}^T S_{E_i E_i}^{-1} R_{E_i} + R_H^T A_H^{-1} R_H
$$

then

$$
\kappa(M^{-1} S) = C \left( 1 + \log^2(H h^{-1}) \right)
$$

where $C$ may still retain dependencies on other bad parameters, such as jumps in the diffusion coefficients

The edge term can be replaced with cheaper components

There are numerous variations in 2D and 3D that conquer various additional weaknesses

Schwarz polynomials

Polynomials of Schwarz projections that are combinations of additive and multiplicative may be appropriate for certain implementations

We may solve the fine subdomains concurrently and follow with a coarse grid (redundantly/cooperatively)

$$
u \leftarrow u + \sum_i B_i^{-1} (f - Au)
$$

$$
u \leftarrow u + B_0^{-1} (f - Au)
$$

This leads to algorithm “Hybrid II” in S-B-G’96:

$$
B^{-1} = B_0^{-1} + (I - B_0^{-1} A) \left( \sum_i B_i^{-1} \right)
$$

Convenient for “SPMD” (single program, multiple data)
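
The two-stage update above can be sketched as a single preconditioner application. This is only an illustration under stated assumptions: the subdomain operators are taken as $B_i^{-1} = R_i^T (R_i A R_i^T)^{-1} R_i$ with index-set restrictions, the coarse operator as $B_0^{-1} = R_0^T (R_0 A R_0^T)^{-1} R_0$ with a piecewise-constant $R_0$, and the name `hybrid_ii_apply` and the small 1D test matrix are hypothetical.

```python
import numpy as np

def hybrid_ii_apply(A, subdomains, R0, r):
    """Return z = B^{-1} r with B^{-1} = B0^{-1} + (I - B0^{-1} A) sum_i B_i^{-1}."""
    # Additive fine stage: independent subdomain solves (concurrent in SPMD).
    z = np.zeros_like(r)
    for idx in subdomains:
        z[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
    # Multiplicative coarse stage, applied to the updated residual.
    A0 = R0 @ A @ R0.T
    z += R0.T @ np.linalg.solve(A0, R0 @ (r - A @ z))
    return z

# Hypothetical test problem: 1D Laplacian, two overlapping subdomains,
# and a two-aggregate piecewise-constant coarse space.
n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
subdomains = [[0, 1, 2, 3, 4], [3, 4, 5, 6, 7]]
R0 = np.zeros((2, n))
R0[0, :4] = 1.0
R0[1, 4:] = 1.0

r = np.ones(n)
z = hybrid_ii_apply(A, subdomains, R0, r)
```

The loop over subdomains has no data dependence between iterations, which is the property that makes the fine stage convenient for SPMD execution; only the coarse stage needs global information.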

Onward to nonlinearity

Linear versus nonlinear problems

Solving linear algebraic problems often constitutes 90% of the running time of a large simulation

The nonlinearity is often a fairly straightforward outer loop, in that it introduces no new types of messages or synchronizations, and has overall many fewer synchronizations than the preconditioned Krylov method or other linear solver inside it

We can wrap Newton, Picard, fixed-point or other iterations outside, linearize, and apply what we know

We consider both Newton-outside and Newton-inside methods

Newton-Krylov-Schur-Schwarz: a solver “workhorse”

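Newton: nonlinear solver, asymptotically quadratic

$$
F(u) \approx F(u_c) + F'(u_c)\,\delta u = 0, \qquad u = u_c + \lambda\,\delta u
$$

Krylov: accelerator, spectrally adaptive

$$
J\,\delta u = -F, \qquad \delta u = \arg\min_{x \in V \equiv \{F, JF, J^2F, \ldots\}} \| Jx + F \|
$$

Schur: preconditioner, parallelizable by structure

$$
B^{-1} J\,\delta u = -B^{-1} F, \qquad B^{-1} = \left( \begin{bmatrix} \tilde{A}_{ii} & 0 \\ A_{\Gamma i} & I \end{bmatrix} \begin{bmatrix} I & \tilde{A}_{ii}^{-1} A_{i\Gamma} \\ 0 & M \end{bmatrix} \right)^{-1}
$$

Schwarz: preconditioner, parallelizable by domain

$$
\tilde{A}^{-1} = \sum_i R_i^T (R_i A R_i^T)^{-1} R_i
$$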

Jacobian-free Newton-Krylov

In the Jacobian-Free Newton-Krylov (JFNK) method, a Krylov method solves the linear Newton correction equation, requiring Jacobian-vector products

These are approximated by the Fréchet derivatives

$$
J(u)\,v \approx \frac{1}{\varepsilon} \left[ F(u + \varepsilon v) - F(u) \right]
$$

(where $\varepsilon$ is chosen with a fine balance between approximation and floating point rounding error) or automatic differentiation, so that the actual Jacobian elements are never explicitly needed

One builds the Krylov space on a true F’(u) (to within numerical approximation)
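
A minimal sketch of the finite-difference Jacobian-vector product. The step-size rule used here (square root of machine epsilon, scaled by the sizes of $u$ and $v$) is one common heuristic and is an assumption, not a choice stated in the talk; the function name `jv_fd` and the two-equation test system are likewise hypothetical.

```python
import numpy as np

def jv_fd(F, u, v, Fu=None):
    """Approximate J(u) v by a forward difference of the residual F."""
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return np.zeros_like(u)
    # Heuristic step (assumption): balances truncation against rounding error.
    eps = np.sqrt(np.finfo(float).eps) * (1.0 + np.linalg.norm(u)) / norm_v
    if Fu is None:
        Fu = F(u)  # in a Krylov solve, F(u) is computed once and reused
    return (F(u + eps * v) - Fu) / eps

# Hypothetical test system with a known Jacobian for comparison.
def F(u):
    return np.array([u[0] ** 2 + u[1] - 3.0, u[0] + np.sin(u[1])])

u = np.array([1.0, 2.0])
v = np.array([0.5, -1.0])
J_exact = np.array([[2 * u[0], 1.0], [1.0, np.cos(u[1])]])
err = np.linalg.norm(jv_fd(F, u, v) - J_exact @ v)
```

Each Krylov iteration then costs one extra evaluation of $F$, and no Jacobian entries are formed.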

How to accommodate preconditioning

Krylov iteration is expensive in memory and in function evaluations, so subspace dimension k must be kept small in practice, through preconditioning the Jacobian with an approximate inverse, so that the product matrix has low condition number in

$$
(B^{-1} A)\,x = B^{-1} b
$$

Given the ability to apply the action of $B^{-1}$ to a vector, preconditioning can be done on either the left, as above, or the right, as in, e.g., for matrix-free:

$$
J B^{-1} v \approx \frac{1}{\varepsilon} \left[ F(u + \varepsilon B^{-1} v) - F(u) \right]
$$
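
A small sketch of the right-preconditioned, matrix-free product and of how the Newton correction is recovered afterwards as $\delta u = B^{-1} w$. Everything concrete here is an assumption for illustration: the test residual $F(u) = Au + u^3 - b$, the Jacobi preconditioner, the fixed $\varepsilon$, and the dense solve that stands in for a Krylov method.

```python
import numpy as np

def right_prec_jv(F, u, Fu, B_inv, v, eps=1e-7):
    """Approximate J(u) B^{-1} v with one extra evaluation of F."""
    return (F(u + eps * B_inv(v)) - Fu) / eps

# Hypothetical test problem: F(u) = A u + u**3 - b.
n = 5
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
F = lambda u: A @ u + u ** 3 - b
u = np.full(n, 0.5)
Fu = F(u)

# Jacobi preconditioner (assumption): B = diag(J(u)).
d = np.diag(A) + 3 * u ** 2
B_inv = lambda v: v / d

# Stand-in for a Krylov solver: assemble J B^{-1} column by column, solve densely.
JB = np.column_stack([right_prec_jv(F, u, Fu, B_inv, e) for e in np.eye(n)])
w = np.linalg.solve(JB, -Fu)
du = B_inv(w)  # map the right-preconditioned unknown back to the Newton correction
```

With right preconditioning the Krylov method monitors the true linear residual, at the cost of one final application of $B^{-1}$ to recover $\delta u$.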

Philosophy of Jacobian-free NK

To evaluate the linear residual, we use the true F’(u), giving a true Newton step and asymptotic quadratic Newton convergence

To precondition the linear residual, we do anything convenient that uses understanding of the dominant physics/mathematics in the system and respects the limitations of the parallel computer architecture and the cost of various operations:

Jacobian blocks decomposed for parallelism (Schwarz)

Jacobian of lower-order discretization

Jacobian with “lagged” values for expensive terms

Jacobian stored in lower precision

Jacobian of related discretization

operator-split Jacobians

physics-based preconditioning

Nonlinear Schwarz preconditioning

Nonlinear Schwarz has Newton both inside and outside and is fundamentally Jacobian-free

It replaces $F(u) = 0$ with a new nonlinear system possessing the same root, $\Phi(u) = 0$

Define a correction $\delta_i(u)$ to the $i^{th}$ partition (e.g., subdomain) of the solution vector by solving the following local nonlinear system:

$$
R_i F(u + \delta_i(u)) = 0
$$

where $\delta_i(u) \in \Re^n$ is nonzero only in the components of the $i^{th}$ partition

Then sum the corrections: $\Phi(u) = \sum_i \delta_i(u)$ to get an implicit function of u
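
The construction of $\Phi$ can be sketched on a toy problem. The residual $F(u) = Au + u^3 - b$, the two non-overlapping partitions, and the names `local_correction` and `phi` are hypothetical. For brevity the inner Newton solves use an explicit local Jacobian block, which is an assumption of this sketch rather than a feature of the method. The example evaluates $\Phi$ at a starting guess and at a root of $F$ found by ordinary Newton.

```python
import numpy as np

# Hypothetical test problem: F(u) = A u + u**3 - b.
n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
F = lambda u: A @ u + u ** 3 - b
J = lambda u: A + np.diag(3 * u ** 2)

def local_correction(u, idx, tol=1e-12, maxit=50):
    """Solve R_i F(u + delta_i) = 0 for delta_i supported on idx (inner Newton)."""
    delta = np.zeros_like(u)
    for _ in range(maxit):
        Fi = F(u + delta)[idx]
        if np.linalg.norm(Fi) < tol:
            break
        Ji = J(u + delta)[np.ix_(idx, idx)]
        delta[idx] -= np.linalg.solve(Ji, Fi)
    return delta

def phi(u, partitions):
    """Phi(u) = sum_i delta_i(u); the local solves are mutually independent."""
    return sum(local_correction(u, idx) for idx in partitions)

partitions = [[0, 1, 2], [3, 4, 5]]
u0 = np.zeros(n)

# Reference root of F from plain Newton, used only to inspect Phi there.
u_star = u0.copy()
for _ in range(20):
    u_star -= np.linalg.solve(J(u_star), F(u_star))

phi0 = phi(u0, partitions)
phi_star = phi(u_star, partitions)
```

At the root of $F$ every local system is already satisfied, so each correction vanishes and $\Phi$ is zero there, while away from the root $\Phi$ is nonzero.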

Nonlinear Schwarz – picture

[Figure: u and F(u), with the restriction R_i (a 0/1 selection matrix) picking out R_i u and R_i F]

Nonlinear Schwarz – picture

[Figure: as above, with a second restriction R_j picking out R_j u and R_j F]

Nonlinear Schwarz – picture

[Figure: as above, with the local Jacobian F_i’(u_i) and the summed correction δ_i u + δ_j u]

Nonlinear Schwarz, cont.

It is simple to prove that if the Jacobian of F(u) is nonsingular in a neighborhood of the desired root then $\Phi(u) = 0$ and $F(u) = 0$ have the same unique root

To lead to a Jacobian-free Newton-Krylov algorithm we need to be able to evaluate for any $u, v \in \Re^n$:

The residual $\Phi(u) = \sum_i \delta_i(u)$

The Jacobian-vector product $\Phi'(u)\,v$

Remarkably, (Cai-Keyes, 2000) it can be shown that

$$
\Phi'(u)\,v \approx \sum_i (R_i^T J_i^{-1} R_i)\,J v
$$

where $J = F'(u)$ and $J_i = R_i J R_i^T$

All required actions are available in terms of $F(u)$!

Experimental example of nonlinear Schwarz

[Figure, two panels. Vanilla Newton’s method: difficulty at critical Re, stagnation beyond critical Re. Nonlinear Schwarz: convergence for all Re.]

Multiphysics coupling: nonlinear Schwarz

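Given initial iterate $u_1^0, u_2^0$

For k=1, 2, …, until convergence, do

Define $G_1(u_1, u_2) \equiv \delta u_1$ by $F_1(u_1^{k-1} + \delta u_1, u_2^{k-1}) = 0$

Define $G_2(u_1, u_2) \equiv \delta u_2$ by $F_2(u_1^{k-1}, u_2^{k-1} + \delta u_2) = 0$

Then solve

$$
\begin{cases} G_1(u, v) = 0 \\ G_2(u, v) = 0 \end{cases}
$$

in matrix-free manner

Jacobian:

$$
\begin{bmatrix} \dfrac{\partial G_1}{\partial u} & \dfrac{\partial G_1}{\partial v} \\ \dfrac{\partial G_2}{\partial u} & \dfrac{\partial G_2}{\partial v} \end{bmatrix} \approx \begin{bmatrix} I & \left( \dfrac{\partial F_1}{\partial u} \right)^{-1} \dfrac{\partial F_1}{\partial v} \\ \left( \dfrac{\partial F_2}{\partial v} \right)^{-1} \dfrac{\partial F_2}{\partial u} & I \end{bmatrix}
$$

Finally $u_1^k = u$, $u_2^k = v$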

State of the art

Domain decomposition is the dominant paradigm in contemporary terascale PDE simulation

Several freely available software toolkits exist, and successfully scale to thousands of tightly coupled processors for problems on quasi-static meshes

Concerted efforts are underway to make elements of these toolkits interoperate, and to allow expression of the best methods, which tend to be modular, hierarchical, recursive, and above all, adaptive!

Many challenges loom at the “next scale” of computation

Implementation of domain decomposition methods on parallel computers has inspired many useful variants of domain decomposition methods

The past few years have produced an incredible variety of interesting results (in both the continuous and the discrete senses) in domain decomposition methods, with no slackening in sight

Closing inspiration

“… at this very moment the search is on – every numerical analyst has a favorite preconditioner, and you have a perfect chance to find a better one.”

- Gil Strang (1986)