Bi-CG, CGS, Bi-CGSTAB and implementation aspects

Henk van der Vorst

January 8, 2007, Francqui Masterclass

Page 2: Bi-CG, CGS, Bi-CGSTAB and implementation aspects...Convergence behavior Bi-CG 0 10 20 30 40 50 60 70 80 90 −14 −12 −10 −8 −6 −4 −2 0 2 4 comparison of Bi−CG and CGS

Krylov subspace

Standard iteration: x(i) = x(i−1) + r(i−1)

Take x(0) = 0; then x(i) = r(0) + r(1) + . . . + r(i−1)

Since r(k) = (I − A)^k r(0), we have x(i) = Σ_{k=0}^{i−1} (I − A)^k r(0)

This shows that x(i) can be expressed as a sum of powers of A times r(0):

x(i) ∈ span{r(0), A r(0), . . . , A^{i−1} r(0)} ≡ K_i(A; r(0)),

the Krylov subspace of dimension i generated by A and r(0)

A general x(i) ∈ K_i(A; r(0)) can be written as x(i) = Q_{i−1}(A) r(0)

The corresponding residual is r(i) = b − A x(i) = (I − A Q_{i−1}(A)) r(0)

Hence r(i) = P_i(A) r(0), with P_i(0) = 1
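This can be verified numerically. The small NumPy check below (our own construction, not from the slides) runs the standard iteration and confirms that its iterates coincide with the partial sums x(i) = Σ_{k<i} (I − A)^k r(0):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
b = rng.standard_normal(n)

x = np.zeros(n)                     # x(0) = 0, so r(0) = b
iterates = []
for _ in range(4):
    r = b - A @ x
    x = x + r                       # x(i) = x(i-1) + r(i-1)
    iterates.append(x.copy())

M = np.eye(n) - A
acc = np.zeros(n)
P = np.eye(n)                       # P holds (I - A)^k
for i in range(4):
    acc = acc + P @ b               # add (I - A)^k r(0)
    P = P @ M
    assert np.allclose(acc, iterates[i])
```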


The Petrov-Galerkin approach

The usual approach is to construct an x_i such that

r_i ⊥ K_i(A^T; s_0), with s_0 = r_0, or A^T r_0, or random, or ...

It can be shown that this can be done by constructing biorthogonal bases {v_j} for K_i(A; r_0) and {w_j} for K_i(A^T; s_0), with v_j^T w_k = 0 for j ≠ k.

These two sets of basis vectors can be generated by three-term recurrences.

This leads to A V_i = V_{i+1} T_{i+1,i} and A^T W_i = W_{i+1} T_{i+1,i}, with W_i^T V_i = D_i

We look for x_i ∈ K_i(A; r_0), which means x_i = V_i y_i, such that

W_i^T (b − A V_i y_i) = 0, and hence W_i^T V_i T_{i,i} y_i = W_i^T b, or:

D_i T_{i,i} y_i = b_i; solved as for CG: Bi-CG

Many practical problems: breakdowns; irregular convergence


Convergence behavior Bi-CG

[Figure: comparison of Bi-CG and CGS for definite A. Horizontal axis: iteration number (0 to 90); vertical axis: 10log(residual), from −14 to 4. Dots: CGS r_i; line: Bi-CG.]


Convergence behavior Bi-CG (2)

[Figure: comparison of Bi-CG and CGS for indefinite A. Horizontal axis: iteration number (0 to 150); vertical axis: 10log(residual), from −6 to 6. Dots: CGS r_i; line: Bi-CG.]


Bi-CG and variants

With short recurrences we can construct x_i such that r_i ⊥ K_i(A^T; s_0)

• 2 MVs per iteration (one with A and one with A^T)

• CG-like computational overhead (twice!)

• CG-like memory requirements (twice!)

• not optimal in K_i(A; r_0)

• more iterations than GMRES: ‖A x_i^{BiCG} − b‖_2 ≥ ‖A x_i^{GMRES} − b‖_2

The choice of s_0 gives freedom, e.g., r_0, A^T r_0, random


variants of Bi-CG: QMR

Bi-orthogonalization leads to:

A V_i = V_{i+1} T_{i+1,i} and A^T W_i = W_{i+1} T_{i+1,i}, with W_i^T V_i = D_i

Try the GMRES idea, that is, try to minimize ‖b − A x_i‖_2 for x_i ∈ K_i(A; r_0)

Since x_i = V_i y and b = µ v_1,

we have ‖b − A x_i‖_2 = ‖b − A V_i y‖_2 = ‖µ V_{i+1} e_1 − V_{i+1} T_{i+1,i} y‖_2

In the case of GMRES we had V_{i+1} orthogonal, so we could skip V_{i+1}

Now we pretend that the Bi-CG V_{i+1} is orthogonal, and we minimize:

‖µ e_1 − T_{i+1,i} y‖_2: NOT a true minimum residual, hence Quasi-Minimal Residual

Solve the small system as in GMRES (with Givens rotations)

QMR (Freund & Nachtigal, 1991):
- slightly better than Bi-CG
- smoother convergence
- more iterations than GMRES


variants of Bi-CG: CGS

Bases for K_m(A; r_0) and K_m(A^T; s_0) with the same 3-term recursions:

r_i = R_i(A) r_0, and also: r̃_i = R_i(A^T) s_0

Bi-CG coefficients come through inner products like (r_i, r̃_i)

Sonneveld (1984):

(r_i, r̃_i) = (R_i(A) r_0, R_i(A^T) s_0) = (R_i(A) R_i(A) r_0, s_0)

The r̃_j are not necessary!! No operations with A^T

However: now we need recursions for R_i^2(A) r_0 and other vectors

By the way: it would be nice to have r_i = R_i^2(A) r_0 and corresponding x_i; why?
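Sonneveld's identity holds for any polynomial. The check below (our own construction; a random degree-3 polynomial stands in for R_i) verifies (R_i(A) r_0, R_i(A^T) s_0) = (R_i(A)^2 r_0, s_0) numerically:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
A = rng.standard_normal((n, n))
r0 = rng.standard_normal(n)
s0 = rng.standard_normal(n)
coeffs = rng.standard_normal(4)          # hypothetical R_i of degree 3

def poly(M, v, c):
    # evaluate (c[0] I + c[1] M + c[2] M^2 + c[3] M^3) v, Horner style
    out = np.zeros_like(v)
    for ck in reversed(c):
        out = M @ out + ck * v
    return out

lhs = poly(A, r0, coeffs) @ poly(A.T, s0, coeffs)    # (R(A) r0, R(A^T) s0)
rhs = poly(A, poly(A, r0, coeffs), coeffs) @ s0      # (R(A)^2 r0, s0)
assert np.isclose(lhs, rhs)
```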


Bi-CG Algorithm

r_0 = b − A x_0;  r̃_0 arbitrary
for i = 1, 2, 3, ...
    ρ_{i−1} = (r̃_{i−1}, r_{i−1})
    if i = 1
        p_1 = r_0;  p̃_1 = r̃_0
    else
        β_{i−1} = ρ_{i−1}/ρ_{i−2}
        p_i = r_{i−1} + β_{i−1} p_{i−1}
        p̃_i = r̃_{i−1} + β_{i−1} p̃_{i−1}
    q_i = A p_i;  q̃_i = A^T p̃_i
    α_i = ρ_{i−1}/(p̃_i, q_i)
    x_i = x_{i−1} + α_i p_i
    r_i = r_{i−1} − α_i q_i
    r̃_i = r̃_{i−1} − α_i q̃_i
end
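The recursions above can be sketched in NumPy as follows (a minimal unpreconditioned version; the function name, test matrix, and tolerances are our own choices, and there is no look-ahead or breakdown handling):

```python
import numpy as np

def bicg(A, b, tol=1e-10, maxit=200):
    """Unpreconditioned Bi-CG; rtld is the shadow residual r~."""
    x = np.zeros(len(b))
    r = b - A @ x
    rtld = r.copy()                  # common choice: r~0 = r0
    p = ptld = None
    rho_old = 1.0
    for i in range(1, maxit + 1):
        rho = rtld @ r
        if i == 1:
            p, ptld = r.copy(), rtld.copy()
        else:
            beta = rho / rho_old
            p = r + beta * p
            ptld = rtld + beta * ptld
        q = A @ p
        qtld = A.T @ ptld            # the extra MV with A^T
        alpha = rho / (ptld @ q)
        x = x + alpha * p
        r = r - alpha * q
        rtld = rtld - alpha * qtld
        rho_old = rho
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x

rng = np.random.default_rng(1)
n = 30
A = 10 * np.eye(n) + rng.standard_normal((n, n))   # nonsymmetric test matrix
b = rng.standard_normal(n)
x = bicg(A, b)
```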


Bi-CG recursions

Focus on recursions in Bi-CG

p_i = r_{i−1} + β_{i−1} p_{i−1} and r_i = r_{i−1} − α_i q_i = r_{i−1} − α_i A p_i

p_i and r_i can be expressed as: r_i = R_i(A) r_0 and p_i = P_{i−1}(A) r_0

We are interested in r_i = R_i^2(A) r_0

From the recursion for r_i: R_i(A) = R_{i−1}(A) − α_i A P_{i−1}(A)

and from p_i we have: P_{i−1}(A) = R_{i−1}(A) + β_{i−1} P_{i−2}(A)

Squaring the expression for R_i(A) gives:

R_i^2(A) = R_{i−1}^2(A) + α_i^2 A^2 P_{i−1}^2(A) − 2 α_i A R_{i−1}(A) P_{i−1}(A)

Now we also need recursions for P_{i−1}^2(A) and R_{i−1}(A) P_{i−1}(A)


Bi-CG recursions (2)

Recursions for P_{i−1}^2(A) and R_{i−1}(A) P_{i−1}(A):

p_i = r_{i−1} + β_{i−1} p_{i−1} and r_i = r_{i−1} − α_i A p_i, with r_i = R_i(A) r_0 and p_i = P_{i−1}(A) r_0

Squaring the expression for p_i gives:

P_{i−1}^2(A) = R_{i−1}^2(A) + β_{i−1}^2 P_{i−2}^2(A) + 2 β_{i−1} R_{i−1}(A) P_{i−2}(A)

Continuing in this fashion leads to recursions for:

r_i ≡ R_i^2(A) r_0 (and for the corresponding x_i)

p_i ≡ P_{i−1}^2(A) r_0

u_i ≡ R_{i−1}(A) P_{i−1}(A) r_0 and

q_{i−1} ≡ R_{i−1}(A) P_{i−2}(A) r_0

CGS


CGS Algorithm

r_0 = b − A x_0;  r̃ arbitrary
for i = 1, 2, 3, ...
    ρ_{i−1} = (r̃, r_{i−1})
    if i = 1
        u_1 = r_0;  p_1 = u_1
    else
        β_{i−1} = ρ_{i−1}/ρ_{i−2}
        u_i = r_{i−1} + β_{i−1} q_{i−1}
        p_i = u_i + β_{i−1} (q_{i−1} + β_{i−1} p_{i−1})
    Solve p̂ from K p̂ = p_i
    v_i = A p̂
    α_i = ρ_{i−1}/(r̃, v_i)
    q_i = u_i − α_i v_i
    Solve z from K z = u_i + q_i
    x_i = x_{i−1} + α_i z
    r_i = r_{i−1} − α_i A z
end
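A minimal NumPy sketch of the algorithm above with K = I (function name, test matrix, and tolerances are our own choices; no breakdown checks):

```python
import numpy as np

def cgs(A, b, tol=1e-10, maxit=200):
    """Unpreconditioned CGS (take K = I in the preconditioned algorithm)."""
    x = np.zeros(len(b))
    r = b - A @ x
    rtld = r.copy()                  # shadow vector r~
    rho_old = 1.0
    p = q = np.zeros(len(b))
    for i in range(1, maxit + 1):
        rho = rtld @ r
        if i == 1:
            u = r.copy()
            p = u.copy()
        else:
            beta = rho / rho_old
            u = r + beta * q
            p = u + beta * (q + beta * p)
        v = A @ p                    # with K = I, phat = p
        alpha = rho / (rtld @ v)
        q = u - alpha * v
        z = u + q                    # with K = I, z solves Kz = u + q
        x = x + alpha * z
        r = r - alpha * (A @ z)
        rho_old = rho
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x

rng = np.random.default_rng(5)
n = 30
A = 10 * np.eye(n) + rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = cgs(A, b)
```

Note that, as the slides emphasize, no operation with A^T occurs anywhere in the loop.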


Convergence behavior Bi-CG (2)

[Figure (repeated): comparison of Bi-CG and CGS for indefinite A. Horizontal axis: iteration number (0 to 150); vertical axis: 10log(residual), from −6 to 6. Dots: CGS r_i; line: Bi-CG.]


variants of Bi-CG: CGS

• CGS (Sonneveld, 1989)

- 2 MV’s in BiCG can be used to apply BiCG twice: CGS

- same costs as BiCG

- often twice as fast

- very irregular convergence

- often faster than GMRES

- more MV's than GMRES (but far less overhead)


Convergence behavior CGS

[Figure: comparison of exact error and CGS for indefinite A. Horizontal axis: iteration number (0 to 180); vertical axis: 10log(residual), from −15 to 10. Dots: CGS r_i; line: true residuals.]


Computed and true residuals

Algorithm template for a Krylov method:

Input: x_0;  r_0 = b − A x_0
For i = 1, 2, . . . until convergence
    Generate p_i by the method
    x_i = x_{i−1} + p_i
    r_i = r_{i−1} − A p_i
End for

r_n is the computed residual

b − A x_n is the true residual

In exact arithmetic they are equal.

Examples: CG, Bi-CG, CGS, and Bi-CGSTAB
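The drift between the two residuals shows up readily in single precision. The experiment below (our own construction; a damped Richardson update plays the role of "generate p_i by the method") runs the template and measures the gap:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
A = (10 * np.eye(n) + rng.standard_normal((n, n))).astype(np.float32)
b = rng.standard_normal(n).astype(np.float32)

omega = np.float32(0.05)            # damping for the Richardson update
x = np.zeros(n, dtype=np.float32)
r = b - A @ x
for _ in range(200):
    p = omega * r                   # p_i from the method
    x = x + p
    r = r - A @ p                   # computed (updated) residual
true_r = b - A @ x                  # true residual
gap = float(np.linalg.norm(r - true_r))
print("gap between computed and true residual:", gap)
```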


Are peaks bad?

Bi-CG type processes (Bi-CG, CGS, ...):

x_i = x_{i−1} + α_i p_i
r_i = r_{i−1} − α_i A p_i

Errors in x_i have no effect on r_i.

In finite precision:

r_i = r_{i−1} − α_i A p_i − α_i ΔA p_i,  with |ΔA| ≤ n_A ξ |A|

r_i − (b − A x_i) = −Σ_{j=1}^{i} α_j ΔA p_j

| ‖r_i‖_2 − ‖b − A x_i‖_2 | ≤ 2 i n_A ξ ‖|A|‖ ‖A^{−1}‖ max_j ‖r_j‖


Cure: reliable updating

From a suggestion by Neumaier ('94), made for CGS:

x = x_0;  r = r_0;  x_u = 0
for i = 0, 1, 2, . . .
    . . .
    x_u = x_u + α_i p_i
    r = r − α_i A p_i
    . . .
    if (‖r‖ < ‖r‖ at the previous update ∧ i − i_prev < m_i)
        x = x + x_u
        r = b − A x
        x_u = 0
    endif

If ‖r‖ ≈ ξ ‖r_0‖, then r_i ≈ b − A x_i.

For the analysis: Sleijpen and VDV '94; simple criteria for m_i: Ye and VDV '99.
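One possible shape of this scheme, sketched in NumPy around a simple Richardson iteration (our own construction; the periodic flush criterion below is a simplified stand-in for the norm-based m_i criteria cited above):

```python
import numpy as np

def richardson_reliable(A, b, omega, maxit=400, flush_every=25):
    """Reliable updating wrapped around damped Richardson iteration.
    x_u accumulates corrections; periodically they are flushed into x
    and the residual is replaced by the true residual b - A x."""
    x = np.zeros(len(b))
    r = b - A @ x
    xu = np.zeros(len(b))            # accumulated correction since last flush
    for i in range(maxit):
        p = omega * r
        xu = xu + p                  # x_u = x_u + alpha_i p_i
        r = r - A @ p                # updated (computed) residual
        if (i + 1) % flush_every == 0:
            x = x + xu               # group-wise update of x
            r = b - A @ x            # replace by the true residual
            xu = np.zeros(len(b))
    return x + xu, r

rng = np.random.default_rng(7)
n = 50
A = 20 * np.eye(n) + rng.standard_normal((n, n))
b = rng.standard_normal(n)
x, r = richardson_reliable(A, b, omega=0.03)
```

Because the residual is periodically recomputed from x, the returned r stays close to the true residual b − A x instead of drifting away from it.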


motivation to improve CGS

GOALS:

• smoother convergence

• faster convergence

POSSIBILITIES:

• clever choice of s_0

• instead of r_i = R_i^2(A) r_0: take r_i = R̃_i(A) R_i(A) r_0, with a "damping" R̃_i


Variants of Bi-CG: Bi-CGSTAB

Construct r_i = R̃_i(A) R_i(A) r_0

Idea: take a simple R̃_i(A):

R̃_i(A) = (I − ω_1 A)(I − ω_2 A) · · · (I − ω_i A)

This leads to simple recursions, but how to select the ω_j?

Take ω_j such that it minimizes ‖r_j‖_2 with respect to ω_j, for

residuals that are expressed as r_j = R̃_j(A) R_j(A) r_0

This leads directly to Bi-CGSTAB (van der Vorst, 1992),

in fact a combination of Bi-CG with a product of GMRES(1) steps

• ≈ same costs as Bi-CG, often much faster than Bi-CG

• much smoother than CGS, often faster than CGS

• breakdown when GMRES(1) stagnates; poor when GMRES(1) is very poor


Bi-CGSTAB Algorithm (with prec.)

r_0 = b − A x_0
ρ_{−1} = α_{−1} = ω_{−1} = 1
v_{−1} = p_{−1} = 0
for i = 0, 1, 2, ...
    ρ_i = (r̃_0, r_i);  β_{i−1} = (ρ_i/ρ_{i−1}) (α_{i−1}/ω_{i−1})
    p_i = r_i + β_{i−1} (p_{i−1} − ω_{i−1} v_{i−1})
    Solve p̂ from K p̂ = p_i
    v_i = A p̂
    α_i = ρ_i/(r̃_0, v_i)
    s = r_i − α_i v_i
    Solve z from K z = s
    t = A z
    ω_i = (t, s)/(t, t)
    x_{i+1} = x_i + α_i p̂ + ω_i z
    if x_{i+1} is accurate enough then stop
    r_{i+1} = s − ω_i t
end
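A minimal NumPy sketch of the algorithm above with K = I (function name, test matrix, and tolerances are our own choices; there is no safeguard for ω ~ 0 or Bi-CG breakdown):

```python
import numpy as np

def bicgstab(A, b, tol=1e-10, maxit=200):
    """Unpreconditioned Bi-CGSTAB (take K = I in the algorithm)."""
    x = np.zeros(len(b))
    r = b - A @ x
    rtld0 = r.copy()                 # shadow vector r~0
    rho = alpha = omega = 1.0
    v = p = np.zeros(len(b))
    for _ in range(maxit):
        rho_new = rtld0 @ r
        beta = (rho_new / rho) * (alpha / omega)
        rho = rho_new
        p = r + beta * (p - omega * v)
        v = A @ p                    # with K = I, phat = p
        alpha = rho / (rtld0 @ v)
        s = r - alpha * v
        t = A @ s                    # with K = I, z = s
        omega = (t @ s) / (t @ t)    # minimizes ||s - omega t||_2
        x = x + alpha * p + omega * s
        r = s - omega * t
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x

rng = np.random.default_rng(6)
n = 30
A = 10 * np.eye(n) + rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = bicgstab(A, b)
```

The line computing omega is the GMRES(1) step: it is the least-squares minimizer of ‖s − ω t‖_2.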


variants of Bi-CG (3)

Bi-CGSTAB2 (Gutknecht, 1993): recombine successive Bi-CGSTAB iterations

BiCGSTAB(2) (Sleijpen, Fokkema, van der Vorst '94):

- after each two Bi-CG steps: GMRES(2)

- often faster than Bi-CGSTAB

- also for nearly skew-symmetric matrices

- ≈ same costs as Bi-CG (and CGS)

- can be further generalized: BiCGSTAB(ℓ)

- BiCGSTAB(4): fast and rather robust

- but, of course, breakdown when GMRES(ℓ) stagnates


avoiding breakdown

Two reasons for breakdown in the Bi-CGSTAB methods:

(1) The Bi-CG part may break down: look-ahead techniques (complicated)

(2) The GMRES part gives no reduction: no expansion of the Krylov subspace

In that case, use a combination of GMRES and FOM.

This gives a locally larger ‖r_i‖, but often helps to restore global convergence (Sleijpen and VDV '95)


how to select?

For Ax = b, with A ≠ A^T, A ∈ ℝ^{n×n}:

1. If overhead is no problem: GMRES

2. If too much overhead: QMR, Bi-CGSTAB, TFQMR, CGS, Bi-CGSTAB(ℓ)

3. Variable preconditioning: GMRESR, FGMRES


Often preconditioning required

Convergence behavior depends on spectral properties.

Iterative methods are often applied to the

ℓ  left-preconditioned system K^{−1} A x = K^{−1} b

r  right-preconditioned system A K^{−1} z = b, with x = K^{−1} z

c  centrally preconditioned system L^{−1} A U^{−1} w = L^{−1} b

If K (= LU) is a good approximation to A, then all iterative methods are robust.
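The left- and right-preconditioned systems deliver the same solution. A small check of this (our own construction: a Jacobi-type diagonal K, with direct solves standing in for the iterative method):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
A = np.diag(np.linspace(1, 100, n)) + 0.5 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
Kinv = np.diag(1.0 / np.diag(A))     # inverse of the Jacobi preconditioner

# left-preconditioned system: K^{-1} A x = K^{-1} b
x_left = np.linalg.solve(Kinv @ A, Kinv @ b)

# right-preconditioned system: A K^{-1} z = b, then x = K^{-1} z
z = np.linalg.solve(A @ Kinv, b)
x_right = Kinv @ z

assert np.allclose(x_left, x_right)  # same solution either way
assert np.allclose(A @ x_left, b)
```

The difference in practice is what the iterative method "sees": with left preconditioning the method monitors the preconditioned residual K^{−1}(b − Ax), while with right preconditioning it monitors the true residual b − Ax.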
