Bi-CG, CGS, Bi-CGSTAB and implementation aspects

Henk van der Vorst

January 8, 2007, Francqui Masterclass

Page 2: Bi-CG, CGS, Bi-CGSTAB and implementation aspects...Convergence behavior Bi-CG 0 10 20 30 40 50 60 70 80 90 −14 −12 −10 −8 −6 −4 −2 0 2 4 comparison of Bi−CG and CGS

Krylov subspace

Standard iteration: x(i) = x(i−1) + r(i−1)

Take x(0) = 0; then x(i) = r(0) + r(1) + . . . + r(i−1)

Since r(k) = (I − A)^k r(0), we have x(i) = Σ_{k=0}^{i−1} (I − A)^k r(0)

This shows that x(i) can be expressed as a sum of powers of A times r(0):

x(i) ∈ span{r(0), A r(0), . . . , A^{i−1} r(0)} ≡ K_i(A; r(0)),

the Krylov subspace of dimension i generated by A and r(0)

A general x(i) ∈ K_i(A; r(0)) can be written as x(i) = Q_{i−1}(A) r(0)

The corresponding residual is r(i) = b − A x(i) = (I − A Q_{i−1}(A)) r(0)

Hence r(i) = P_i(A) r(0), with P_i(0) = 1
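This can be verified numerically. The small NumPy check below (our own construction, not from the slides) runs the standard iteration and confirms that its iterates coincide with the partial sums x(i) = Σ_{k<i} (I − A)^k r(0):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
b = rng.standard_normal(n)

x = np.zeros(n)                     # x(0) = 0, so r(0) = b
iterates = []
for _ in range(4):
    r = b - A @ x
    x = x + r                       # x(i) = x(i-1) + r(i-1)
    iterates.append(x.copy())

M = np.eye(n) - A
acc = np.zeros(n)
P = np.eye(n)                       # P holds (I - A)^k
for i in range(4):
    acc = acc + P @ b               # add (I - A)^k r(0)
    P = P @ M
    assert np.allclose(acc, iterates[i])
```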


The Petrov-Galerkin approach

The usual approach is to construct an x_i such that

r_i ⊥ K_i(A^T; s_0), with s_0 = r_0, or A^T r_0, or random, or ...

It can be shown that this can be done by constructing biorthogonal bases {v_j} for K_i(A; r_0) and {w_j} for K_i(A^T; s_0), with v_j^T w_k = 0 for j ≠ k.

These two sets of basis vectors can be generated by three-term recurrences.

This leads to A V_i = V_{i+1} T_{i+1,i} and A^T W_i = W_{i+1} T_{i+1,i}, with W_i^T V_i = D_i

We look for x_i ∈ K_i(A; r_0), which means x_i = V_i y_i, such that

W_i^T (b − A V_i y_i) = 0, and hence W_i^T V_i T_{i,i} y_i = W_i^T b, or:

D_i T_{i,i} y_i = b_i; solved as for CG: Bi-CG

Many practical problems: breakdowns; irregular convergence


Convergence behavior Bi-CG

[Figure: comparison of Bi-CG and CGS for definite A. Horizontal axis: iteration number (0 to 90); vertical axis: 10log(residual), from −14 to 4. Dots: CGS r_i; line: Bi-CG.]


Convergence behavior Bi-CG (2)

[Figure: comparison of Bi-CG and CGS for indefinite A. Horizontal axis: iteration number (0 to 150); vertical axis: 10log(residual), from −6 to 6. Dots: CGS r_i; line: Bi-CG.]


Bi-CG and variants

With short recurrences we can construct x_i such that r_i ⊥ K_i(A^T; s_0)

• 2 MVs per iteration (one with A and one with A^T)

• CG-like computational overhead (twice!)

• CG-like memory requirements (twice!)

• not optimal in K_i(A; r_0)

• more iterations than GMRES: ‖A x_i^{BiCG} − b‖_2 ≥ ‖A x_i^{GMRES} − b‖_2

The choice of s_0 gives freedom, e.g., r_0, A^T r_0, random


variants of Bi-CG: QMR

Bi-orthogonalization leads to:

A V_i = V_{i+1} T_{i+1,i} and A^T W_i = W_{i+1} T_{i+1,i}, with W_i^T V_i = D_i

Try the GMRES idea, that is, try to minimize ‖b − A x_i‖_2 for x_i ∈ K_i(A; r_0)

Since x_i = V_i y and b = µ v_1,

we have ‖b − A x_i‖_2 = ‖b − A V_i y‖_2 = ‖µ V_{i+1} e_1 − V_{i+1} T_{i+1,i} y‖_2

In the case of GMRES we had V_{i+1} orthogonal, so we could skip V_{i+1}

Now we pretend that the Bi-CG V_{i+1} is orthogonal, and we minimize:

‖µ e_1 − T_{i+1,i} y‖_2: NOT a true minimum residual, hence Quasi-Minimal Residual

Solve the small system as in GMRES (with Givens rotations)

QMR (Freund & Nachtigal, 1991):
- slightly better than Bi-CG
- smoother convergence
- more iterations than GMRES


variants of Bi-CG: CGS

Bases for K_m(A; r_0) and K_m(A^T; s_0) with the same 3-term recursions:

r_i = R_i(A) r_0, and also: r̃_i = R_i(A^T) s_0

Bi-CG coefficients come through inner products like (r_i, r̃_i)

Sonneveld (1984):

(r_i, r̃_i) = (R_i(A) r_0, R_i(A^T) s_0) = (R_i(A) R_i(A) r_0, s_0)

The r̃_j are not necessary!! No operations with A^T

However: now we need recursions for R_i^2(A) r_0 and other vectors

By the way: it would be nice to have r_i = R_i^2(A) r_0 and corresponding x_i; why?
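Sonneveld's identity holds for any polynomial. The check below (our own construction; a random degree-3 polynomial stands in for R_i) verifies (R_i(A) r_0, R_i(A^T) s_0) = (R_i(A)^2 r_0, s_0) numerically:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
A = rng.standard_normal((n, n))
r0 = rng.standard_normal(n)
s0 = rng.standard_normal(n)
coeffs = rng.standard_normal(4)          # hypothetical R_i of degree 3

def poly(M, v, c):
    # evaluate (c[0] I + c[1] M + c[2] M^2 + c[3] M^3) v, Horner style
    out = np.zeros_like(v)
    for ck in reversed(c):
        out = M @ out + ck * v
    return out

lhs = poly(A, r0, coeffs) @ poly(A.T, s0, coeffs)    # (R(A) r0, R(A^T) s0)
rhs = poly(A, poly(A, r0, coeffs), coeffs) @ s0      # (R(A)^2 r0, s0)
assert np.isclose(lhs, rhs)
```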


Bi-CG Algorithm

r_0 = b − A x_0;  r̃_0 arbitrary
for i = 1, 2, 3, ...
    ρ_{i−1} = (r̃_{i−1}, r_{i−1})
    if i = 1
        p_1 = r_0;  p̃_1 = r̃_0
    else
        β_{i−1} = ρ_{i−1}/ρ_{i−2}
        p_i = r_{i−1} + β_{i−1} p_{i−1}
        p̃_i = r̃_{i−1} + β_{i−1} p̃_{i−1}
    q_i = A p_i;  q̃_i = A^T p̃_i
    α_i = ρ_{i−1}/(p̃_i, q_i)
    x_i = x_{i−1} + α_i p_i
    r_i = r_{i−1} − α_i q_i
    r̃_i = r̃_{i−1} − α_i q̃_i
end
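The recursions above can be sketched in NumPy as follows (a minimal unpreconditioned version; the function name, test matrix, and tolerances are our own choices, and there is no look-ahead or breakdown handling):

```python
import numpy as np

def bicg(A, b, tol=1e-10, maxit=200):
    """Unpreconditioned Bi-CG; rtld is the shadow residual r~."""
    x = np.zeros(len(b))
    r = b - A @ x
    rtld = r.copy()                  # common choice: r~0 = r0
    p = ptld = None
    rho_old = 1.0
    for i in range(1, maxit + 1):
        rho = rtld @ r
        if i == 1:
            p, ptld = r.copy(), rtld.copy()
        else:
            beta = rho / rho_old
            p = r + beta * p
            ptld = rtld + beta * ptld
        q = A @ p
        qtld = A.T @ ptld            # the extra MV with A^T
        alpha = rho / (ptld @ q)
        x = x + alpha * p
        r = r - alpha * q
        rtld = rtld - alpha * qtld
        rho_old = rho
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x

rng = np.random.default_rng(1)
n = 30
A = 10 * np.eye(n) + rng.standard_normal((n, n))   # nonsymmetric test matrix
b = rng.standard_normal(n)
x = bicg(A, b)
```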


Bi-CG recursions

Focus on recursions in Bi-CG

p_i = r_{i−1} + β_{i−1} p_{i−1} and r_i = r_{i−1} − α_i q_i = r_{i−1} − α_i A p_i

p_i and r_i can be expressed as: r_i = R_i(A) r_0 and p_i = P_{i−1}(A) r_0

We are interested in r_i = R_i^2(A) r_0

From the recursion for r_i: R_i(A) = R_{i−1}(A) − α_i A P_{i−1}(A)

and from p_i we have: P_{i−1}(A) = R_{i−1}(A) + β_{i−1} P_{i−2}(A)

Squaring the expression for R_i(A) gives:

R_i^2(A) = R_{i−1}^2(A) + α_i^2 A^2 P_{i−1}^2(A) − 2 α_i A R_{i−1}(A) P_{i−1}(A)

Now we also need recursions for P_{i−1}^2(A) and R_{i−1}(A) P_{i−1}(A)


Bi-CG recursions (2)

Recursions for P_{i−1}^2(A) and R_{i−1}(A) P_{i−1}(A):

p_i = r_{i−1} + β_{i−1} p_{i−1} and r_i = r_{i−1} − α_i A p_i, with r_i = R_i(A) r_0 and p_i = P_{i−1}(A) r_0

Squaring the expression for p_i gives:

P_{i−1}^2(A) = R_{i−1}^2(A) + β_{i−1}^2 P_{i−2}^2(A) + 2 β_{i−1} R_{i−1}(A) P_{i−2}(A)

Continuing in this fashion leads to recursions for:

r_i ≡ R_i^2(A) r_0 (and for the corresponding x_i)

p_i ≡ P_{i−1}^2(A) r_0

u_i ≡ R_{i−1}(A) P_{i−1}(A) r_0 and

q_{i−1} ≡ R_{i−1}(A) P_{i−2}(A) r_0

CGS


CGS Algorithm

r_0 = b − A x_0;  r̃ arbitrary
for i = 1, 2, 3, ...
    ρ_{i−1} = (r̃, r_{i−1})
    if i = 1
        u_1 = r_0;  p_1 = u_1
    else
        β_{i−1} = ρ_{i−1}/ρ_{i−2}
        u_i = r_{i−1} + β_{i−1} q_{i−1}
        p_i = u_i + β_{i−1} (q_{i−1} + β_{i−1} p_{i−1})
    Solve p̂ from K p̂ = p_i
    v_i = A p̂
    α_i = ρ_{i−1}/(r̃, v_i)
    q_i = u_i − α_i v_i
    Solve z from K z = u_i + q_i
    x_i = x_{i−1} + α_i z
    r_i = r_{i−1} − α_i A z
end
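A minimal NumPy sketch of the algorithm above with K = I (function name, test matrix, and tolerances are our own choices; no breakdown checks):

```python
import numpy as np

def cgs(A, b, tol=1e-10, maxit=200):
    """Unpreconditioned CGS (take K = I in the preconditioned algorithm)."""
    x = np.zeros(len(b))
    r = b - A @ x
    rtld = r.copy()                  # shadow vector r~
    rho_old = 1.0
    p = q = np.zeros(len(b))
    for i in range(1, maxit + 1):
        rho = rtld @ r
        if i == 1:
            u = r.copy()
            p = u.copy()
        else:
            beta = rho / rho_old
            u = r + beta * q
            p = u + beta * (q + beta * p)
        v = A @ p                    # with K = I, phat = p
        alpha = rho / (rtld @ v)
        q = u - alpha * v
        z = u + q                    # with K = I, z solves Kz = u + q
        x = x + alpha * z
        r = r - alpha * (A @ z)
        rho_old = rho
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x

rng = np.random.default_rng(5)
n = 30
A = 10 * np.eye(n) + rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = cgs(A, b)
```

Note that, as the slides emphasize, no operation with A^T occurs anywhere in the loop.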


Convergence behavior Bi-CG (2)

[Figure (repeated): comparison of Bi-CG and CGS for indefinite A. Horizontal axis: iteration number (0 to 150); vertical axis: 10log(residual), from −6 to 6. Dots: CGS r_i; line: Bi-CG.]


variants of Bi-CG: CGS

• CGS (Sonneveld, 1989)

- 2 MV’s in BiCG can be used to apply BiCG twice: CGS

- same costs as BiCG

- often twice as fast

- very irregular convergence

- often faster than GMRES

- more MV's than GMRES (but far less overhead)


Convergence behavior CGS

[Figure: comparison of exact error and CGS for indefinite A. Horizontal axis: iteration number (0 to 180); vertical axis: 10log(residual), from −15 to 10. Dots: CGS r_i; line: true residuals.]


Computed and true residuals

Algorithm template for a Krylov method:

Input: x_0;  r_0 = b − A x_0
For i = 1, 2, . . . until convergence
    Generate p_i by the method
    x_i = x_{i−1} + p_i
    r_i = r_{i−1} − A p_i
End for

r_n is the computed residual

b − A x_n is the true residual

In exact arithmetic they are equal.

Examples: CG, Bi-CG, CGS, and Bi-CGSTAB
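The drift between the two residuals shows up readily in single precision. The experiment below (our own construction; a damped Richardson update plays the role of "generate p_i by the method") runs the template and measures the gap:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
A = (10 * np.eye(n) + rng.standard_normal((n, n))).astype(np.float32)
b = rng.standard_normal(n).astype(np.float32)

omega = np.float32(0.05)            # damping for the Richardson update
x = np.zeros(n, dtype=np.float32)
r = b - A @ x
for _ in range(200):
    p = omega * r                   # p_i from the method
    x = x + p
    r = r - A @ p                   # computed (updated) residual
true_r = b - A @ x                  # true residual
gap = float(np.linalg.norm(r - true_r))
print("gap between computed and true residual:", gap)
```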


Are peaks bad?

Bi-CG type processes (Bi-CG, CGS, ...):

x_i = x_{i−1} + α_i p_i
r_i = r_{i−1} − α_i A p_i

Errors in x_i have no effect on r_i.

In finite precision:

r_i = r_{i−1} − α_i A p_i − α_i ΔA p_i,  with |ΔA| ≤ n_A ξ |A|

r_i − (b − A x_i) = −Σ_{j=1}^{i} α_j ΔA p_j

| ‖r_i‖_2 − ‖b − A x_i‖_2 | ≤ 2 i n_A ξ ‖|A|‖ ‖A^{−1}‖ max_j ‖r_j‖


Cure: reliable updating

From a suggestion by Neumaier ('94), made for CGS:

x = x_0;  r = r_0;  x_u = 0
for i = 0, 1, 2, . . .
    . . .
    x_u = x_u + α_i p_i
    r = r − α_i A p_i
    . . .
    if (‖r‖ < ‖r‖ at the previous update ∧ i − i_prev < m_i)
        x = x + x_u
        r = b − A x
        x_u = 0
    endif

If ‖r‖ ≈ ξ ‖r_0‖, then r_i ≈ b − A x_i.

For the analysis: Sleijpen and VDV '94; simple criteria for m_i: Ye and VDV '99.
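One possible shape of this scheme, sketched in NumPy around a simple Richardson iteration (our own construction; the periodic flush criterion below is a simplified stand-in for the norm-based m_i criteria cited above):

```python
import numpy as np

def richardson_reliable(A, b, omega, maxit=400, flush_every=25):
    """Reliable updating wrapped around damped Richardson iteration.
    x_u accumulates corrections; periodically they are flushed into x
    and the residual is replaced by the true residual b - A x."""
    x = np.zeros(len(b))
    r = b - A @ x
    xu = np.zeros(len(b))            # accumulated correction since last flush
    for i in range(maxit):
        p = omega * r
        xu = xu + p                  # x_u = x_u + alpha_i p_i
        r = r - A @ p                # updated (computed) residual
        if (i + 1) % flush_every == 0:
            x = x + xu               # group-wise update of x
            r = b - A @ x            # replace by the true residual
            xu = np.zeros(len(b))
    return x + xu, r

rng = np.random.default_rng(7)
n = 50
A = 20 * np.eye(n) + rng.standard_normal((n, n))
b = rng.standard_normal(n)
x, r = richardson_reliable(A, b, omega=0.03)
```

Because the residual is periodically recomputed from x, the returned r stays close to the true residual b − A x instead of drifting away from it.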


motivation to improve CGS

GOALS:

• smoother convergence

• faster convergence

POSSIBILITIES:

• clever choice of s_0

• instead of r_i = R_i^2(A) r_0: take r_i = R̃_i(A) R_i(A) r_0, with a "damping" R̃_i


Variants of Bi-CG: Bi-CGSTAB

Construct r_i = R̃_i(A) R_i(A) r_0

Idea: take a simple R̃_i(A):

R̃_i(A) = (I − ω_1 A)(I − ω_2 A) · · · (I − ω_i A)

This leads to simple recursions, but how to select the ω_j?

Take ω_j such that it minimizes ‖r_j‖_2 with respect to ω_j, for

residuals that are expressed as r_j = R̃_j(A) R_j(A) r_0

This leads directly to Bi-CGSTAB (van der Vorst, 1992),

in fact a combination of Bi-CG with a product of GMRES(1) steps

• ≈ same costs as Bi-CG, often much faster than Bi-CG

• much smoother than CGS, often faster than CGS

• breakdown when GMRES(1) stagnates; poor when GMRES(1) is very poor


Bi-CGSTAB Algorithm (with prec.)

r_0 = b − A x_0
ρ_{−1} = α_{−1} = ω_{−1} = 1
v_{−1} = p_{−1} = 0
for i = 0, 1, 2, ...
    ρ_i = (r̃_0, r_i);  β_{i−1} = (ρ_i/ρ_{i−1}) (α_{i−1}/ω_{i−1})
    p_i = r_i + β_{i−1} (p_{i−1} − ω_{i−1} v_{i−1})
    Solve p̂ from K p̂ = p_i
    v_i = A p̂
    α_i = ρ_i/(r̃_0, v_i)
    s = r_i − α_i v_i
    Solve z from K z = s
    t = A z
    ω_i = (t, s)/(t, t)
    x_{i+1} = x_i + α_i p̂ + ω_i z
    if x_{i+1} is accurate enough then stop
    r_{i+1} = s − ω_i t
end
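A minimal NumPy sketch of the algorithm above with K = I (function name, test matrix, and tolerances are our own choices; there is no safeguard for ω ~ 0 or Bi-CG breakdown):

```python
import numpy as np

def bicgstab(A, b, tol=1e-10, maxit=200):
    """Unpreconditioned Bi-CGSTAB (take K = I in the algorithm)."""
    x = np.zeros(len(b))
    r = b - A @ x
    rtld0 = r.copy()                 # shadow vector r~0
    rho = alpha = omega = 1.0
    v = p = np.zeros(len(b))
    for _ in range(maxit):
        rho_new = rtld0 @ r
        beta = (rho_new / rho) * (alpha / omega)
        rho = rho_new
        p = r + beta * (p - omega * v)
        v = A @ p                    # with K = I, phat = p
        alpha = rho / (rtld0 @ v)
        s = r - alpha * v
        t = A @ s                    # with K = I, z = s
        omega = (t @ s) / (t @ t)    # minimizes ||s - omega t||_2
        x = x + alpha * p + omega * s
        r = s - omega * t
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x

rng = np.random.default_rng(6)
n = 30
A = 10 * np.eye(n) + rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = bicgstab(A, b)
```

The line computing omega is the GMRES(1) step: it is the least-squares minimizer of ‖s − ω t‖_2.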


variants of Bi-CG (3)

Bi-CGSTAB2 (Gutknecht, 1993): recombine successive Bi-CGSTAB iterations

BiCGSTAB(2) (Sleijpen, Fokkema, van der Vorst '94):

- after each two Bi-CG steps: GMRES(2)

- often faster than Bi-CGSTAB

- also for nearly skew-symmetric matrices

- ≈ same costs as Bi-CG (and CGS)

- can be further generalized: BiCGSTAB(ℓ)

- BiCGSTAB(4): fast and rather robust

- but, of course, breakdown when GMRES(ℓ) stagnates


avoiding breakdown

Two reasons for breakdown in the Bi-CGSTAB methods:

(1) The Bi-CG part may break down: look-ahead techniques (complicated)

(2) The GMRES part gives no reduction: no expansion of the Krylov subspace

In that case, use a combination of GMRES and FOM.

This gives a locally larger ‖r_i‖, but often helps to restore global convergence (Sleijpen and VDV '95)


how to select?

For Ax = b, with A ≠ A^T, A ∈ ℝ^{n×n}:

1. If overhead is no problem: GMRES

2. If too much overhead: QMR, Bi-CGSTAB, TFQMR, CGS, Bi-CGSTAB(ℓ)

3. Variable preconditioning: GMRESR, FGMRES


Often preconditioning required

Convergence behavior depends on spectral properties.

Iterative methods are often applied to the

ℓ  left-preconditioned system K^{−1} A x = K^{−1} b

r  right-preconditioned system A K^{−1} z = b, with x = K^{−1} z

c  centrally preconditioned system L^{−1} A U^{−1} w = L^{−1} b

If K (= LU) is a good approximation to A, then all iterative methods are robust.
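The left- and right-preconditioned systems deliver the same solution. A small check of this (our own construction: a Jacobi-type diagonal K, with direct solves standing in for the iterative method):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
A = np.diag(np.linspace(1, 100, n)) + 0.5 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
Kinv = np.diag(1.0 / np.diag(A))     # inverse of the Jacobi preconditioner

# left-preconditioned system: K^{-1} A x = K^{-1} b
x_left = np.linalg.solve(Kinv @ A, Kinv @ b)

# right-preconditioned system: A K^{-1} z = b, then x = K^{-1} z
z = np.linalg.solve(A @ Kinv, b)
x_right = Kinv @ z

assert np.allclose(x_left, x_right)  # same solution either way
assert np.allclose(A @ x_left, b)
```

The difference in practice is what the iterative method "sees": with left preconditioning the method monitors the preconditioned residual K^{−1}(b − Ax), while with right preconditioning it monitors the true residual b − Ax.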
