a direct solver for time parallelization of wave equationshalpern/talkcsi.pdf · (h size of the...

Outline Introduction The wave equation Conclusion and Perspectives Bibliography

A direct solver for time parallelization of waveequations

Laurence HALPERNLAGA - Universite Paris 13 and C.N.R.S.

6th Parallel in Time Workshop Monte Verita, Octobre 23, 2017

Joint work with Martin Gander (Geneve), Johann Rannou and JulietteRyan (ONERA)PhD Thuy Thi Bich Tran

1/41


1 Introduction

2 The wave equationAn algorithm for the ODEError analysisDiagonalization of BOptimization of the algorithmApplication to an industrial case

3 Conclusion and Perspectives

4 Bibliography

2/41


Outline

1 Introduction



4 Bibliography

3/41


Parallelism and PDE: distribute the computation

Time discretization Space-discretization

Elliptic problem: Domain Decomposition in space (Cai) , if (H size of the subdomains) convergence independent of the mesh

How to solve in parallel @tu��u = f

un

�t��un = fn +

un�1

�t@tU + A U = F

Parallelism across the system:

Waveform relaxation (Domain

Decomposition in space )

�t⌧ H2

Parallelism across the steps (Domain

Decomposition in time)

Parallelism across the method:

Multistep methods.

4/41


Time discretization for the heat equation. 0-D

dtu + au = 0, u(0) = u0, t ∈ (0,T ) ⇐⇒ u(t) = e−atu0.

un − un−1

kn+ aun = 0, u0 = u0,

∑kn = T

1k1

− 1k2

1k2

0

0. . .

. . .

− 1kN

1kN

u1

u2

...uN

+ a

u1

u2

...uN

=

1k1u0

0...0

(B + a I )u = F , u = (u1, . . . uN)

B = SDS−1, S(D + aI )S−1u = F

(1) SG = F , (2) (D + aI )v = G , (3) u = S v.

5/41


Time discretization for the heat equation. d-D

∂tu −∆u = 0, u(0) = u0.

un − un−1

kn−∆un = 0, u0 = u0.

1k1Ix −∆

− 1k2Ix

1k2Ix −∆ 0

0.. .

. . .

− 1kNIx

1kNIx −∆

.

︸︷︷︸M

u1

u2

...uN

︸︷︷︸u

=

1k1u0

0...0

︸︷︷︸F

Resolution by multigrid, or iterative, or direct methods.

u = e−t∆u0.

6/41


Time-space discretization for the heat equation.

Discretization in space, M degrees of freedom.

Mhuh = Fh, uh ∈ (RM)N.

Mh =

1k1Ix −∆h

− 1k2Ix

1k2Ix −∆h 0

0. . .

. . .

− 1kNIx

1kNIx −∆h

.

7/41


Time-space discretization for the heat equation.

Discretization in space, M degrees of freedom.

Mhuh = Fh, uh ∈ (RM)N.

Mh =

1k1Ix −∆h

− 1k2Ix

1k2Ix −∆h 0

0. . .

. . .

− 1kNIx

1kNIx −∆h

.

Mh = B ⊗ Ix + It ⊗ (−∆h), B =

1k1

− 1k2

1k2

0

0. . .

. . .

− 1kN

1kN

.

7/41


Direct method (Maday-Ronquist, CRAS 2007)

(B ⊗ Ix + It ⊗ (−∆h)︸︷︷︸Mh

) uh = Fh, B =

1k1

− 1k2

1k2

0

0. . .

. . .

− 1kN

1kN

.

8/41



(B ⊗ Ix + It ⊗ (−∆h)︸︷︷︸Mh

) uh = Fh, B =

1k1

− 1k2

1k2

0

0. . .

. . .

− 1kN

1kN

.

B = SDS−1

8/41



(B ⊗ Ix + It ⊗ (−∆h)︸︷︷︸Mh

) uh = Fh, B =

1k1

− 1k2

1k2

0

0. . .

. . .

− 1kN

1kN

.

B = SDS−1

(S ⊗ Ix)(D ⊗ Ix + It ⊗ (−∆h))(S−1 ⊗ Ix) uh = Fh

8/41



(B ⊗ Ix + It ⊗ (−∆h)︸︷︷︸Mh

) uh = Fh, B =

1k1

− 1k2

1k2

0

0. . .

. . .

− 1kN

1kN

.

B = SDS−1


(1) (S ⊗ Ix)G = Fh,

(2) ( 1kn−∆h)vn = G n, 1 ≤ n ≤ N,

(3) uh = (S ⊗ Ix) v

8/41



(B ⊗ Ix + It ⊗ (−∆h)︸︷︷︸Mh

) uh = Fh, B =

1k1

− 1k2

1k2

0

0. . .

. . .

− 1kN

1kN

.

B = SDS−1


(1) (S ⊗ Ix)G = Fh,

(2) ( 1kn−∆h)vn = G n, 1 ≤ n ≤ N,

(3) uh = (S ⊗ Ix) v

N equations in space can thus be solved independently on the processors.(2) is better conditioned than ∆h, easily parallelized with OSM.

8/41



The method we have just proposed is first order in time, and since itrequires that all the time steps are different, the accuracy will be relatedto the largest time step.

In order to make the method more efficient, we propose to use a higherorder scheme in time with time steps kn = ρn−1k1, withρ larger but close to 1, e.g. ρ = 1.2.

Note that, as can be expected, choosing ρ much closer to 1 may lead toinstabilities due to numerical errors.

9/41

16

Error analysis (1)We look for an exact solution of the form Uz = sin

(πl x

)sin

(πL y

)sin (ωπt)

(a) space solution on a 30× 20× 1 mesh (b) time solution

we use a known solution to compare

1) the ρ = 1 sequential newmark scheme error

2) the ρ < 1 sequential newmark scheme error

3) the ρ < 1 parallel newmark scheme error

4) the introduced parallelism error (i.e (3) - (2) )

Johann Rannou

17

Error analysis (2)Nt = 8, ρ = 0.80 → too much discretization error

17

Error analysis (2)Nt = 8, ρ = 0.95 → too much roundoff error

17

Error analysis (2)Nt = 8, ρ = 0.90 → OK


Specifications

Choice of the timesteps kn = ρnk1,∑N

n=1 kn = T

(B ⊗ Ix + It ⊗ (−∆h)) uh = Fh, B = SDS−1, D = diag(k1, . . . , kn).

(1) (S ⊗ Ix)G = Fh, ,(2) ( 1

kn−∆h)vn = G n, 1 ≤ n ≤ N,

(3) uh = (S ⊗ Ix)V

1 The timesteps have to be all different for B to be diagonalizable.

2 The matrix S must be easy and cheap to invert (closed form is amust).

3 The precision of the scheme can be affected.4 Therefore it is better to keep the time steps close to equidistant.5 Then the condition number of matrix S increases, deteriorating the

results of steps (1) and (3).

10/41


Specifications


n=1 kn = T


(1) (S ⊗ Ix)G = Fh,(2) ( 1

kn−∆h)vn = G n, 1 ≤ n ≤ N,

(3) uh = (S ⊗ Ix)v

1 The timesteps have to be all different for B to be diagonalizable.2 The matrix S must be easy and cheap to invert (closed form is a

must).

3 The precision of the scheme can be affected.4 Therefore it is better to keep the time steps close to equidistant.5 Then the condition number of matrix S increases, deteriorating the


10/41


Specifications


n=1 kn = T


(1) (S ⊗ Ix)G = Fh,(2) ( 1

kn−∆h)vn = G n, 1 ≤ n ≤ N,

(3) uh = (S ⊗ Ix)v


must).3 The precision of the scheme can be affected.

4 Therefore it is better to keep the time steps close to equidistant.5 Then the condition number of matrix S increases, deteriorating the


10/41


Specifications


n=1 kn = T


(1) (S ⊗ Ix)G = Fh,(2) ( 1

kn−∆h)vn = G n, 1 ≤ n ≤ N,

(3) uh = (S ⊗ Ix)v


must).3 The precision of the scheme can be affected.4 Therefore it is better to keep the time steps close to equidistant.

5 Then the condition number of matrix S increases, deteriorating theresults of steps (1) and (3).

10/41


Specifications


n=1 kn = T


(1) (S ⊗ Ix)G = Fh,(2) ( 1

kn−∆h)vn = G n, 1 ≤ n ≤ N,

(3) uh = (S ⊗ Ix)v


must).3 The precision of the scheme can be affected.4 Therefore it is better to keep the time steps close to equidistant.5 Then the condition number of matrix S increases, deteriorating the


10/41


Specifications


n=1 kn = T


(1) (S ⊗ Ix)G = Fh,(2) ( 1

kn−∆h)vn = G n, 1 ≤ n ≤ N,

(3) uh = (S ⊗ Ix)v




QUANTIFY ?

10/41


Specifications


n=1 kn = T


(1) (S ⊗ Ix)G = Fh,(2) ( 1

kn−∆h)vn = G n, 1 ≤ n ≤ N,

(3) uh = (S ⊗ Ix)v




QUANTIFY ? STRATEGIZE ?

10/41


Definitions

u(t, ·) = S(t)u0

(B ⊗ Ix + It ⊗ (−∆h)) u = Fh

u ←→ T = (k1, . . . kN)

u ←→ T = (k , . . . , k),

(1) (S ⊗ Ix)G = Fh,

(2)( 1

kn−∆h

)un = G n, 1 ≤ n ≤ N,

(3) u = (S ⊗ Ix)v.

11/41


Total error

|S(T )u0 − uN ||u0|

≤ |S(T )u0 − uN ||u0|︸︷︷︸

truncation error with equal time steps

+|uN − uN ||u0|︸︷︷︸

error due to heterogeneous time steps

+|uN − uN ||u0|︸︷︷︸

error due to diagonalization

12/41


Outline

1 Introduction



4 Bibliography

13/41


Program for the wave equation

• The PDE u −∆u = 0.

• Work on the O.D.E. u + a2u = 0 (Fourier in space, a = ‖ξ‖) withCrank-Nicolson scheme.

1 Evaluate the loss of precision produced by a set of kn = ρn−1k1 for

ρ = 1 + ε .

2 write (B + a2I )U = F , and B = SDS−1

3 Find explicit forms for S and S−1.4 For given a and T , estimate the round-off error for the resolution of

the diagonalized system.5 For given a and T , equilibrate 1 and 4.

• Apply to the P.D.E.

14/41






ρ = 1 + ε .


3 Find explicit forms for S and S−1.

4 For given a and T , estimate the round-off error for the resolution ofthe diagonalized system.

5 For given a and T , equilibrate 1 and 4.


14/41






ρ = 1 + ε .



the diagonalized system.

5 For given a and T , equilibrate 1 and 4.


14/41






ρ = 1 + ε .





14/41






ρ = 1 + ε .





Perturbation analysis in ε

14/41


1 Introduction



4 Bibliography

15/41


The Crank-Nicolson method

dtu = udt u = uu + a2u = 0

1kn

(un − un−1) = 12 (un + un−1),

1kn

(un − un−1) = 12 (un + un−1),

un + a2un = 0.

U =

(uau

), Un =

(unaun

)

16/41


The Crank-Nicolson method

dtu = udt u = uu + a2u = 0

1kn

(un − un−1) = 12 (un + un−1),

1kn

(un − un−1) = 12 (un + un−1),

un + a2un = 0.

U =

(uau

), Un =

(unaun

)

kn(ρ) := ρn−1k1, Tρ := (k1 · · · , kN) = k1(1, · · · , ρN−1),N∑

n=1kn = T

THEOREM Given a,T and N, for ε small,

‖UN(T1+ε)− UN(T1)‖ = φ( aT2N ,N) ε2 ‖U0‖+O(ε3),

where φ(y ,N) := N(N2−1)6

y3

(1+y2)2 .

16/41


Matrix formulation (J. Rannou, T. Tran’s Thesis)

dtu = u

dt u = u

u − a2u = 0

1kn

(un − un−1) = 12 (un + un−1)

1kn

(un − un−1) = 12 (un + un−1)

un − a2un = 0

17/41



dtu = u

dt u = u

u − a2u = 0

1kn

(un − un−1) = 12 (un + un−1)

1kn

(un − un−1) = 12 (un + un−1)

un − a2un = 0

u = (u1, · · · , uN) (B + aI )u = f

17/41



dtu = u

dt u = u

u − a2u = 0

1kn

(un − un−1) = 12 (un + un−1)

1kn

(un − un−1) = 12 (un + un−1)

un − a2un = 0

u = (u1, · · · , uN) (B + aI )u = f

B = (C−1B1)2

C =1

2

11 1

. . .. . .

1 1

B1 =

1/k1

−1/k2 1/k2

. . .. . .

−1/kN 1/kN

17/41



dtu = u

dt u = u

u − a2u = 0

1kn

(un − un−1) = 12 (un + un−1)

1kn

(un − un−1) = 12 (un + un−1)

un − a2un = 0

u = (u1, · · · , uN) (B + aI )u = f

B = (C−1B1)2

C =1

2

11 1

. . .. . .

1 1

B1 =

1

k1

1− 1ρ

1ρ

. . .. . .

− 1ρN−1

1ρN−1

17/41


Computation of the eigenvectors

Special family : triangular unipotent Toeplitz matrices

T (X1, . . . ,XM−1) =

1

X1. . . 0

X2. . . 1

.... . .

. . .. . .

XN−1 X2 X1 1

THEOREM kn = ρn−1k1 =⇒ B = VDV−1, with

V = T (P1, . . . ,PN−1), with Pn :=n∏

j=1

1 + ρj

1− ρj ,

V−1 = T (Q1, . . . ,QN−1), with Qn := ρ−nn∏

j=1

1 + ρ−j+2

1− ρ−j

D = diag(4

k12, · · · , 4

kN2)

18/41


Sketch of proof

THEOREM

(1) V = T (P1, . . . ,PN−1) Pn =n∏

i=1

1 + ρj

1− ρj easy

(2) V−1 = T (Q1, . . . ,QN−1) Qn =n∏

i=1

1 + ρ−j+2

1− ρ−j ??

Equivalent to proving that

Pn + Pn−1Q1 + . . .+ P1Qn−1 + Qn = 0 for 1 ≤ n ≤ N − 1

Convention: P0 = Q0 = 1.

19/41


Sketch of proof

n∑

k=0

Pk Qn−k = 0, Pn =n∏

i=1

1 + ρj

1− ρj

20/41


Sketch of proof

n∑

k=0


i=1

1 + ρj

1− ρj

Gauss’ hypergeometric series (1812)

2F1(a1, a2; b; x) :=∞∑n=0

[a1]n[a2]n

[b]n n!xn,

[a]n := a(a+1) · · · (a+n−1) =Γ(a + n)

Γ(a)

20/41


Sketch of proof

n∑

k=0


i=1

1 + ρj

1− ρj


2F1(a1, a2; b; x) :=∞∑n=0

[a1]n[a2]n

[b]n n!xn,

[a]n := a(a+1) · · · (a+n−1) =Γ(a + n)

Γ(a)

Heine’s q-hypergeometric series (1847)

2ϕ1(a1, a2; b; ρ; x) :=∞∑n=0

(a1; ρ)n(a2; ρ)n

(b; ρ)n(ρ; ρ)nxn,

(a; ρ)n := (1−a)(1−ρa) · · · (1−ρn−1a)

20/41


Sketch of proof

n∑

k=0


i=1

1 + ρj

1− ρj


2F1(a1, a2; b; x) :=∞∑n=0

[a1]n[a2]n

[b]n n!xn,

[a]n := a(a+1) · · · (a+n−1) =Γ(a + n)

Γ(a)

Summation formula

2F1(a1, a2; b; 1) =Γ(b)Γ(b − a1 − a2)

Γ(b − a1)Γ(b − a2)


2ϕ1(a1, a2; b; ρ; x) :=∞∑n=0

(a1; ρ)n(a2; ρ)n


(a; ρ)n := (1−a)(1−ρa) · · · (1−ρn−1a)

Summation formula

2ϕ1(a1, a2; b; ρ;b

a1a2) =

( ba1

; ρ)∞( ba2

; ρ)∞

(b; ρ)∞( ba1a2

; ρ)∞

20/41


Sketch of proof

n∑

k=0


i=1

1 + ρj

1− ρj


2F1(a1, a2; b; x) :=∞∑n=0

[a1]n[a2]n

[b]n n!xn,

[a]n := a(a+1) · · · (a+n−1) =Γ(a + n)

Γ(a)

Summation formula

2F1(a1, a2; b; 1) =Γ(b)Γ(b − a1 − a2)

Γ(b − a1)Γ(b − a2)


2ϕ1(a1, a2; b; ρ; x) :=∞∑n=0

(a1; ρ)n(a2; ρ)n


(a; ρ)n := (1−a)(1−ρa) · · · (1−ρn−1a)

Summation formula

2ϕ1(a1, a2; b; ρ;b

a1a2) =

( ba1

; ρ)∞( ba2

; ρ)∞

(b; ρ)∞( ba1a2

; ρ)∞

Pn =(−ρ; ρ)n(ρ; ρ)n

20/41


Sketch of proof, continue

n∑k=0


i=1

1 + ρj

1 − ρj=

(−ρ; ρ)n

(ρ; ρ)n

2ϕ1(a1, a2; b; ρ;b

a1a2) :=

∞∑k=0

(a1; ρ)k (a2; ρ)k

(b; ρ)k (ρ; ρ)k

(b

a1a2

)k

=( ba1

; ρ)∞( ba2

; ρ)∞

(b; ρ)∞( ba1a2

; ρ)∞

(a; ρ)k :=k−1∏

i=0

(1− ρia)

21/41



n∑k=0


i=1

1 + ρj

1 − ρj=

(−ρ; ρ)n

(ρ; ρ)n

2ϕ1(a1, a2; b; ρ;b

a1a2) :=

∞∑k=0

(a1; ρ)k (a2; ρ)k

(b; ρ)k (ρ; ρ)k

(b

a1a2

)k

=( ba1

; ρ)∞( ba2

; ρ)∞

(b; ρ)∞( ba1a2

; ρ)∞

(a; ρ)k :=k−1∏

i=0

(1− ρia)

q-Zhu-Vandermonde formula

21/41



n∑k=0


i=1

1 + ρj

1 − ρj=

(−ρ; ρ)n

(ρ; ρ)n

2ϕ1(a1, a2; b; ρ;b

a1a2) :=

∞∑k=0

(a1; ρ)k (a2; ρ)k

(b; ρ)k (ρ; ρ)k

(b

a1a2

)k

=( ba1

; ρ)∞( ba2

; ρ)∞

(b; ρ)∞( ba1a2

; ρ)∞

(a; ρ)k :=k−1∏

i=0

(1− ρia)

q-Zhu-Vandermonde formula

a1 = ρ−k , a2 = −ρ, b = −ρ−k+2,

n∑

k=0

(−ρ; ρ)k(ρ−n; ρ)k(ρ; ρ)k(−ρ−n+2; ρ)k

ρk = 0.

21/41


Matrix S, properties

Normalize the eigenvectors with respect to the `2 norm: S = V D,di = 1/‖V (i)‖2.

B = VDV−1 = SDS−1.

22/41


Roundoff estimate

(1) (B + aI )u = F , (2) S(D + aI )S−1u = F

Backward error analysis (Higham, Golub):u denotes the machine precision

(2) ⇐⇒ (B+δB)u = F , ‖δB‖ ≤ (2N+1)u‖ |S ||S−1| ‖‖D+aI‖+O(u2).

‖u− u‖‖u‖ ≤ cond(B)

‖δB‖‖B‖ ≤ (2N + 1)u ‖B−1‖ ‖ |S ||S−1| ‖ ‖D + aI‖

23/41


Roundoff estimate

(1) (B + aI )u = F , (2) S(D + aI )S−1u = F

Backward error analysis (Higham, Golub):u denotes the machine precision

(2) ⇐⇒ (B+δB)u = F , ‖δB‖ ≤ (2N+1)u‖ |S ||S−1| ‖‖D+aI‖+O(u2).

‖u− u‖‖u‖ ≤ cond(B)

‖δB‖‖B‖ ≤ (2N + 1)u ‖B−1‖ ‖ |S ||S−1| ‖ ‖D + aI‖

THEOREM.‖u− u‖∞‖u‖∞

. u ψ1( aT2N ,N)ε−(N−1) ,

where ψ1(y ,N) :=22(N+1)

(N − 1)!(1 + 2N(N − 1))(1 + y2).

Sharper estimate: ψ3(y ,N) := 22N− 12 N

(N−1)!1

y2+1 .

23/41


-50

-40

-30

15

-20

-10

0

10

20

4010

aT

20

N

50

1

2

3

Figure: Comparison of the logarithm of the functions ψj , j = 1, 2, 3

24/41


Total error

|u(tN)− uN ||u0|︸︷︷︸

Total error

.|u(tN)− uN ||u0|︸︷︷︸

error 1: approximation with equal time steps

+|uN − uN ||u0|︸︷︷︸

error 2: due to heterogeneous time steps

φ(y ,N)ε2

+|uN − uN ||u0|︸︷︷︸

error 3: due to diagonalization

ψ3(y ,N) u ε−(N−1)

25/41


Total error

Total error . error 1: approximation with equal time steps+ error 2: due to heterogeneous time steps+ error 3: due to diagonalization

10-2

10-1

100

10-15

10-10

10-5

100

105

10-2

10-1

100

10-20

10-10

100

1010

1020

T = 5, a = 1, N = 10 T = 10, a = 1, N = 20

25/41


Optimization of ε

THEOREM For ε = ε∗(aT ,N) with

ε∗(aT ,N) =

(3 22N

(N2 − 1)(N − 1)!

1 + y2

y3u

) 1N+1

, with y =aT

2N,

the error due to time parallelization is asymptotically comparable to theone produced by the geometric time partition.

26/41


Optimization of ε

THEOREM For ε = ε∗(aT ,N) with

ε∗(aT ,N) =

(3 22N

(N2 − 1)(N − 1)!

1 + y2

y3u

) 1N+1

, with y =aT

2N,

the error due to time parallelization is asymptotically comparable to theone produced by the geometric time partition.

φ(y ,N)ε2

︸︷︷︸Discretization error

= ψ3(y ,N) u ε−(N−1)

︸︷︷︸Parallelization error

26/41


Limiting value ε∗(aT ,N)

0.0010.0010.0010.001

0.010.010.010.01

0.030.030.03

0.03

0.050.05

0.05

0.05

0.0750.075

0.075

0.075

0.09

0.09

0.09

0.09

0.1

0.1

0.1

0.1

0.11

0.11

0.11

0.11

0.1

2

0.12

0.12

0.12

0.1

3

0.1

3

0.1

4

0.1

4

2 4 6 8 10 12 14

aT

5

10

15

20

25

30

35

40

N

ε∗(aT ,N)

0.00010.00010.00010.0001

0.010.01

0.01

0.01

0.10.1

0.10.1

11

1

1

1

5

5

5

5

5

0.0

1

0.0

1

0.0

1

0.0

1

0.1

0.1

0.1

0.1

0.1

0.3

0.3

0.3

0.3

0.3

1

1

1

1

2 4 6 8 10 12 14

aT

5

10

15

20

25

30

35

40

Nerror 2/error 1(blue), and error 1

(red).

27/41


One dimensional wave equation

-110

0

1

xt

0.5 0.51 0

-10 1

0

1

xt

0.5 0.501

-10 1

0

1

xt

0.5 0.51 0

Figure: Approximate solutions obtained by the time parallel algorithm usingdiagonalization. Left: ε = 0.015. Middle: ε = ε∗ = 0.05. Right: ε = 0.3.

28/41


Two dimensional wave equation

10-2

10-1

100

10-15

10-10

10-5

100

105

10-2

10-1

100

10-20

10-10

100

1010

1020

Figure: Discretization and parallelization errors in 1d, together with ourtheoretical bounds for the PDE. Left: T = 1, N = 10. Right: T = 2, N = 20.

29/41


Description

Response of a carbon/epoxy laminated composite panel (used inaeronautical industry) to an impact-like loading (transverse isotropicHooke law).

Figure: Mesh configuration and loading for the elasticity problem.

2000 time steps over the 10ms simulation range. 152607 degrees offreedom, 2000 time steps over the 10ms simulation range (time windows).

30/41


Results, MPI

Figure: Computing times for the industrial elasticity problem.

31/41


Results

0 2 4 6 8 10

time (ms)

0.06

0.05

0.04

0.03

0.02

0.01

0.00

0.01

dis

pla

cem

ent

(mm

)

sequential solution (ε= 0)

N=16 parallel solution (ε= 0. 25)

Figure: Deflection of the central node on the back face of the plate for thesequential and the parallel solution with N = 16.

N 2 4 8 16

Eff := Time(1proc)N×Time(Nproc) 0.96 0.86 0.66 0.45

32/41


Improving the efficiency: asynchronous computations

N Numwin Time Error Eff1 128 0.497E+01 5.77E-0072 64 0.254E+01 6.13E-007 97.83 %4 32 0.132E+01 7.71E-007 94.13 %8 16 0.709E+00 1.71E-006 88.75 %

16 8 0.407E+00 5.15E-005 77.65 %

33/41


Improving the efficiency: asynchronous computations

N Numwin Time Error Eff1 128 0.497E+01 5.77E-0072 64 0.254E+01 6.13E-007 97.83 %4 32 0.132E+01 7.71E-007 94.13 %8 16 0.709E+00 1.71E-006 88.75 %

16 8 0.407E+00 5.15E-005 77.65 %

N 2 4 8 16CG 0.243E+01 0.125E+01 0.636E+00 0.319E+00

Total 0.254E+01 0.132E+01 0.709E+00 0.407E+00

33/41


Outline

1 Introduction



4 Bibliography

34/41


Conclusion

1 Robust strategy for parallelization in time. Independent of thespace-discretization.

2 The gain in optimal number of processors is significative: one couldsolve the problem using 30 processors, and would obtain an errorwhich is within a factor two of the sequential computation.

3 Extension to nonlinear problems, coupled with Newton (DD23).

35/41


Perspectives

1 Parallelization in space in combination with the time-parallel methodto solve the PDE thus adding another dimension to theparallelization process through a completely parallel time-spacesubdomains.

2 Application to control problems.

36/41


Outline

1 Introduction



4 Bibliography

37/41


A few references for iterative methods

J.-L. Lions, Y. Maday, and G. Turinici.Resolution d’EDP par un schema en temps “parareel”.C. R. Acad. Sci. Paris Ser. I Math., 332(7):661–668, 2001.

Amodio, Pierluigi, and Luigi Brugnano.Parallel solution in time of ODEs: some achievements and perspectives.Applied Numerical Mathematics ,59 (3): 424–435, 2009.

M. J. Gander and S. Guttel.PARAEXP: A parallel integrator for linear initial-value problems.SIAM Journal on Scientific Computing, 35(2):C123–C142, 2013.

38/41


References for the direct method

Y. Maday and E. M. Rønquist.Parallelization in time through tensor-product space–time solvers.Comptes Rendus Mathematiques, 346(1):113–118, 2008.

J. Rannou, J. Ryan,Time parallelization of linear transient dynamic problems through theNewmark tensor-product form.ECCOMAS, Wien, september 2012

M. Gander, L. Halpern, J. Ryan, and T. T. B. Tran.A direct solver for time parallelization.DD22, Lugano, september 2014, proceeding to appear

M. Gander, L. Halpern, J. Rannou, and J. RyanA Direct Time Parallel Solver by Diagonalization for the Wave Equation.

to be submitted soon

39/41


Nonlinear problems

ut = f (u),un − un−1

kn= f (un), F(u) := Bu− f (u) = 0

Newton’s method, D(u) := diag(f ′(u1), f ′(u2), . . . , f ′(un))

(B − D(um−1))um = f(um−1)− D(um−1)um−1,

Quasi-Newton D(u) ≈ 1

N

n∑

j=1

f ′(uj) I .

40/41


Nonlinear problems

iteration

0 1 2 3 4 5

err

or

10-5

100

Newton Ex 1

Quasi-Newton Ex 1

Newton Ex 2

Quasi-Newton Ex 2

epsilon10

-310

-210

-110

0

err

or

10-2

10-1

100

101

Newton Ex 1Quasi-Newton Ex 1Newton Ex 2Quasi-Newton Ex 2

Figure: Left: linear convergence of the time parallel Quasi-Newton method fortwo model problems(−u2 and

√u) . Right: accuracy for different choices of

the time grid stretching ε.

41/41

a direct solver for time parallelization of wave equationshalpern/talkcsi.pdf · (h size of the...

Documents