analisis libro 1
TRANSCRIPT
-
7/27/2019 Analisis Libro 1
1/99
Script for the lectures on: Numerical Linear Algebra
Einführung in die numerische Mathematik (Introduction to Numerical Mathematics)
Prof. Dr. P.E. Kloeden
Institut für Mathematik
Johann Wolfgang Goethe Universität
Zimmer 101, Robert-Mayer-Straße 10
Telefon: (069) 798 28622 Sekretariat (069) 798 22422
email: [email protected]
February 9, 2009
Contents
1 Computer Arithmetic 3
2 Vector and matrix norms 7
2.1 Matrix norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Error estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.1 Condition number of a matrix . . . . . . . . . . . . . . . . 12
2.2.2 Fixed point theorem and successive iterations . . . . . . . 13
3 Linear systems of equations 15
3.1 Gaussian elimination . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.1.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.1.2 Row interchange . . . . . . . . . . . . . . . . . . . . . . . 18
3.2 Formulation as matrix multiplication . . . . . . . . . . . . . . . . 18
4 The LU decomposition 21
4.1 Row interchange . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2 Pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3 Post iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.4 The LU decomposition of the transposed matrix . . . . . . . . . 28
5 Matrices with a special structure 30
5.1 Symmetric matrices . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.1.1 Gaussian elimination without row interchange . . . . . . . 30
5.1.2 The LDLT and Cholesky decompositions . . . . . . . . . 31
5.2 Positive definite symmetric matrices . . . . . . . . . . . . . . 33
5.3 Diagonally dominant matrices . . . . . . . . . . . . . . . . . . 36
5.4 Band matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.5 Tridiagonal matrices . . . . . . . . . . . . . . . . . . . . . . . . . 41
6 The QR decomposition 44
6.1 Householder matrices . . . . . . . . . . . . . . . . . . . . . . . . . 46
6.2 Construction of the QR factors . . . . . . . . . . . . . . . . . . . 49
6.2.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
7 Iterative methods 53
7.1 Relaxation methods . . . . . . . . . . . . . . . . . . . . . . . 58
7.2 The SOR method . . . . . . . . . . . . . . . . . . . . . . . . 63
8 Krylov space methods 65
8.1 Krylov spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
8.1.1 Properties of Krylov spaces . . . . . . . . . . . . . . . . . 66
8.2 The OR-approach for symmetric, positive definite matrices . . . 67
8.2.1 Existence, uniqueness and minimality . . . . . . . . . . . 67
8.2.2 The OR approach for an A-conjugate basis . . . . . . . . 68
8.3 The CG method for positive definite matrices . . . . . . . . . . . 70
8.3.1 Computing A-conjugate search directions in Kn(A, b) . . 70
8.3.2 The Algorithm for the CG method . . . . . . . . . . . . . 72
8.3.3 The CG method for the normal equations . . . . . . . . . 73
8.4 The GMRES method and Arnoldi process . . . . . . . . . . . . . 73
8.4.1 The Arnoldi process . . . . . . . . . . . . . . . . . . . . . 74
8.4.2 A matrix version of the Arnoldi process . . . . . . . . . . 76
9 Calculating eigenvalues 78
9.1 The location of eigenvalues . . . . . . . . . . . . . . . . . . . . . 78
9.1.1 Gerschgorin's theorem . . . . . . . . . . . . . . . . . . . . 81
9.2 The power method . . . . . . . . . . . . . . . . . . . . . . . . . . 85
9.3 The QR algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 90
9.3.1 The QR transformation of Hessenberg matrices . . . . . 92
9.3.2 Convergence in the simplest case . . . . . . . . . . . . . . 96
Chapter 1
Computer Arithmetic
Literature Oevel, Kap. 1.2.3
The number field of a computer is only finite. Hence, in general, we can only
calculate numbers approximately with a computer.
Computers use a floating point representation for numbers, i.e. the numbers have the form

x = \pm\, \underbrace{0.x_1 x_2 \ldots x_d}_{\text{mantissa}} \cdot 10^{t},

with base 10 and exponent t. The length of the mantissa (here: d) determines the accuracy of the computer representation of a number. For a base b (here b = 10; for computers b = 2 is typical) we have x_i \in \{0, 1, \ldots, b-1\} and x_1 \neq 0. The exponent is also bounded: -N + 1 \le t \le N.

Two immediate consequences of the boundedness of the exponent are overflow and underflow:

overflow: if |x| > 10^N, the calculation stops!

underflow: if |x| < 10^{-N}, the calculation continues with x = 0 (possibly with difficulties later).
Often we can avoid difficulties with a clever reformulation, e.g.

a = 10^{N-1} \;\Rightarrow\; a^2 = 10^{2N-2} > 10^N, \quad \text{overflow!}

but

\sqrt{a^2 + 1} = |a| \sqrt{1 + a^{-2}} = 10^{N-1} \sqrt{1 + 10^{2-2N}} \qquad (\text{underflow: } 10^{2-2N} \to 0)

= 10^{N-1} \sqrt{1 + 0} = 10^{N-1}.
Underflow can also be dangerous. Let a = 10^{N-1} and consider

\frac{1}{\sqrt{a^2 + 1} - a} \, .

Computing a^2 + 1 directly gives an overflow! Writing instead \sqrt{a^2+1} = a \sqrt{1 + a^{-2}}, the term a^{-2} = 10^{2-2N} underflows to 0, and we get

\frac{1}{a\sqrt{1+0} - a} = \frac{1}{0} \; !

But with a little bit of algebra:

\frac{1}{\sqrt{a^2+1} - a} = \frac{1}{\sqrt{a^2+1} - a} \cdot \frac{\sqrt{a^2+1} + a}{\sqrt{a^2+1} + a} = \sqrt{a^2+1} + a = a\sqrt{1 + a^{-2}} + a \approx 2 \cdot 10^{N-1} \quad \text{after underflow.}
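The same cancellation appears in IEEE double precision (binary floating point instead of the decimal model above). The following Python sketch compares the two algebraically equivalent formulas; the value a = 1e8 is an illustrative choice, large enough that a^2 + 1 is indistinguishable from a^2 in double precision.

```python
import math

a = 1.0e8  # large enough that sqrt(a*a + 1.0) rounds to a in double precision

# Direct evaluation: sqrt(a^2 + 1) - a suffers catastrophic cancellation.
direct = math.sqrt(a * a + 1.0) - a

# Cancellation-free reformulation: 1 / (sqrt(a^2 + 1) + a).
stable = 1.0 / (math.sqrt(a * a + 1.0) + a)

# The true value is approximately 1 / (2a) = 5e-9; `direct` loses all digits.
```

Here `stable` agrees with 1/(2a) to full precision, while `direct` evaluates to 0.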
Overflow and underflow are extreme cases. There is a more general complication: although each x \in [1/10, 1) has a unique decimal representation

x = 0.x_1 \ldots x_d x_{d+1} \ldots = \sum_{i=1}^{\infty} x_i \, 10^{-i} \qquad (x_1 \neq 0),

many x have a representation with more than d digits, e.g.

x = \frac{1}{3} = 0.333\ldots3\ldots

In a computer such numbers are replaced by numbers with d digits.
Truncation: x = 0.x_1 x_2 \ldots x_d x_{d+1} \ldots \;\mapsto\; \mathrm{Tr}(x) = 0.x_1 x_2 \ldots x_d

Rounding: x = 0.x_1 x_2 \ldots x_d x_{d+1} \ldots \;\mapsto\; R(x) = 0.x_1 \ldots x_{d-1} \tilde{x}_d, where

\tilde{x}_d = \begin{cases} x_d & \text{if } x_{d+1} < 5 \\ x_d + 1 & \text{if } x_{d+1} \ge 5 \end{cases}

Example

0.12344 \mapsto 0.1234 (truncation or rounding), but 0.12345 \mapsto 0.1234 (truncation), 0.12345 \mapsto 0.1235 (rounding).

The representation error by rounding is often smaller than by truncation:

|x - \mathrm{Tr}(x)| \le 10 \cdot 10^{-d-1}, \qquad |x - R(x)| \le 5 \cdot 10^{-d-1}, \qquad x \in [1/10, 1).

A similar situation holds for more general floating point numbers

x = 0.x_1 \ldots x_d x_{d+1} \ldots \cdot 10^t, \qquad x_1 \neq 0,

with

\mathrm{Tr}(x) = 0.x_1 \ldots x_d \cdot 10^t, \qquad R(x) = 0.x_1 \ldots \tilde{x}_d \cdot 10^t.

Here the relative error is more appropriate, e.g.

\left| \frac{x - \mathrm{Tr}(x)}{x} \right| \le 10 \cdot 10^{-d}, \qquad \text{since } |x - \mathrm{Tr}(x)| \le 10^{t-d} \text{ and } |x| \ge 10^{t-1}.
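The two operators Tr and R can be sketched with Python's decimal module, which works in base 10 like the model above; ROUND_DOWN corresponds to truncation and ROUND_HALF_UP to the rounding rule for the digit x_{d+1}.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def truncate(x: Decimal, d: int) -> Decimal:
    # Tr(x): keep the first d digits after the decimal point, drop the rest.
    return x.quantize(Decimal(1).scaleb(-d), rounding=ROUND_DOWN)

def round_d(x: Decimal, d: int) -> Decimal:
    # R(x): round to d digits, rounding up when digit d+1 is >= 5.
    return x.quantize(Decimal(1).scaleb(-d), rounding=ROUND_HALF_UP)

x1 = Decimal("0.12344")   # both operators give 0.1234
x2 = Decimal("0.12345")   # truncation 0.1234, rounding 0.1235
```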
Remark: Rounding is preferred, but one should nevertheless be careful, e.g. with 5-digit arithmetic:
Evaluating 37654 + 25.874 - 37679 from left to right:

37654 + 25.874 = 37679.874 \approx 37680 \; (\text{rounding}), \qquad 37680 - 37679 = 1,

while the exact value is 37679.874 - 37679 = 0.874: not a single digit is accurate!

But

(37654 - 37679) + 25.874 = -25 + 25.874 = 0.874 \; !

i.e. the order in which numbers are handled can be important; it is better to combine numbers of similar magnitudes first.
In addition: rounding errors can accumulate. How this happens depends on the type of calculation, as we shall see later!
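The 5-digit example can be simulated in Python with a small helper fl that rounds every intermediate result to 5 significant decimal digits, an idealisation of a 5-digit decimal computer:

```python
import math

def fl(x: float, digits: int = 5) -> float:
    # Round x to `digits` significant decimal digits, simulating a
    # decimal computer with a short mantissa.
    if x == 0.0:
        return 0.0
    e = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - e)

# (37654 + 25.874) - 37679 evaluated left to right in 5-digit arithmetic:
bad = fl(fl(37654.0 + 25.874) - 37679.0)   # 37679.874 -> 37680, then - 37679

# (37654 - 37679) + 25.874: combining numbers of similar magnitude first:
good = fl(fl(37654.0 - 37679.0) + 25.874)  # -25, then 0.874 exactly
```

The left-to-right order returns 1, the reordered sum returns the exact value 0.874.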
Chapter 2
Vector and matrix norms
Literatur Oevel, Kap. 5.13; Stummel/Hainer, Kap. 5
Problems:

(1) Many applications in practice involve linear systems of equations Ax = b
\Rightarrow error estimates, convergence ??

\|Ax - Ay\| \le K \|x - y\|

(2) e.g., line of best fit problems lead, for measurement errors, to minimisation problems of the form

\|Ax - b\| \to \min_x

Starting point: the vector space \mathbb{R}^n, or more generally K^n over a field K = \mathbb{C} or K = \mathbb{R}.

Example: x = (x_1, x_2)^T \in \mathbb{R}^2.

[Figure: the vector x in the plane, drawn from the origin 0.]
\|x\|_2 = \sqrt{x_1^2 + x_2^2} \quad \text{(length of } x\text{)}
Definition: A normed space (E, \|\cdot\|) consists of a vector space E over a field K and a norm \|\cdot\| : E \to \mathbb{R}, where

(i) \|x\| \ge 0 for all x
(ii) \|x\| = 0 \iff x = 0
(iii) \|\alpha x\| = |\alpha| \, \|x\| for all \alpha \in K
(iv) \|x + y\| \le \|x\| + \|y\| (triangle inequality)
Examples: For x = (x_1, \ldots, x_n)^T \in \mathbb{R}^n

\|x\|_2 = \sqrt{|x_1|^2 + \ldots + |x_n|^2} \quad \text{(euclidean norm)}
\|x\|_\infty = \max\{|x_1|, \ldots, |x_n|\} \quad \text{(maximum norm)}
\|x\|_1 = |x_1| + \ldots + |x_n| \quad \text{(summation norm, or Manhattan norm)}

are norms on \mathbb{R}^n, and the inequalities

\|x\|_\infty \le \|x\|_2 \le \|x\|_1

hold for all x \in \mathbb{R}^n. In addition, there are the p-norms:

\|x\|_p := \sqrt[p]{|x_1|^p + \ldots + |x_n|^p}, \qquad p = 1, 2, 3, \ldots .
Geometrical visualisation

d(x, y) = \|x - y\| is the distance between two points x and y.

Unit sphere S_1^{(p)} = \{x \in \mathbb{R}^n : \|x\|_p = 1\}, p = 1, 2, 3, \ldots .

[Figure: unit balls in \mathbb{R}^2 for p = 1, 2 and \infty.]
theorem All norms on \mathbb{R}^n are equivalent, i.e. for two norms \|\cdot\|_a and \|\cdot\|_b there exist numbers c, C > 0 with

c \|x\|_a \le \|x\|_b \le C \|x\|_a, \qquad x \in \mathbb{R}^n.
proof: We will show that an arbitrary norm \|\cdot\| on \mathbb{R}^n is equivalent to the summation norm \|\cdot\|_1.

Every vector x \in \mathbb{R}^n has a unique coordinate representation x = \sum_{i=1}^n x_i e_i, where e_i is the ith unit vector, i.e., with e_{i,j} = \delta_{i,j} (Kronecker delta). Thus we have

\|x\| \le \sum_{i=1}^n |x_i| \, \|e_i\| \le C_1 \sum_{i=1}^n |x_i| = C_1 \|x\|_1,

where C_1 = \max_{i=1,\ldots,n} \|e_i\|.

The mapping x \mapsto \|x\| is Lipschitz continuous w.r.t. (i.e. with respect to) the norm \|\cdot\|_1:

\big| \, \|x\| - \|y\| \, \big| \le \|x - y\| \le C_1 \|x - y\|_1,

and S_1 = \{x \in \mathbb{R}^n : \|x\|_1 = 1\} is compact w.r.t. the norm \|\cdot\|_1. Therefore there exists a constant

C_0 := \min_{x \in S_1} \|x\| > 0,

where C_0 > 0 because \|x\| = 0 if and only if x = 0. For an arbitrary x \in \mathbb{R}^n \setminus \{0\} we have x/\|x\|_1 \in S_1 and

\left\| \frac{x}{\|x\|_1} \right\| = \frac{1}{\|x\|_1} \|x\| \ge C_0 \quad\Rightarrow\quad \|x\| \ge C_0 \|x\|_1.
Example: \|x\|_\infty \le \|x\|_2 \le \sqrt{n} \, \|x\|_\infty.
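The three norms and the inequalities between them can be checked with a few lines of Python (the vector x below is an illustrative choice):

```python
import math

def norm_1(x):    # summation (Manhattan) norm
    return sum(abs(xi) for xi in x)

def norm_2(x):    # euclidean norm
    return math.sqrt(sum(abs(xi) ** 2 for xi in x))

def norm_inf(x):  # maximum norm
    return max(abs(xi) for xi in x)

x = [3.0, -4.0, 1.0]
n = len(x)
```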
Definition: A sequence \{x^{(k)}\}_k \subset \mathbb{R}^n is said to converge to x \in \mathbb{R}^n when

x_i^{(k)} \to x_i \quad (k \to \infty) \quad \text{for } i = 1, \ldots, n,

i.e., component-wise convergence.

theorem:

x^{(k)} \to x \; (k \to \infty) \iff \|x^{(k)} - x\| \to 0 \; (k \to \infty).
proof: Consider w.l.o.g. (i.e., without loss of generality) \|x\| = \|x\|_\infty = \max\{|x_1|, \ldots, |x_n|\}.
2.1 Matrix norms
The space

K^{m \times n} := \{A : A = [a_{i,j}] \text{ matrix with } m \text{ rows and } n \text{ columns}, \; a_{i,j} \in K\}

is a vector space over the field K.

Consider matrices A, B \in K^{m \times n} and a scalar \alpha \in K:

A + B \in K^{m \times n}, \qquad \alpha A \in K^{m \times n}.

Task: Define a norm \|A\| on K^{m \times n} !
1st Approach: vectorize A = [a_{i,j}], i.e., reformulate the m \times n-matrix as an mn-dimensional vector through

a_\nu = a_{i,j}, \qquad \nu = i + m(j-1), \quad i = 1, \ldots, m, \; j = 1, \ldots, n.

Maximum norm: \|A\|_{\max} = \max_i \max_j |a_{i,j}|,

Frobenius norm: \|A\|_F = \sqrt{ \sum_{i=1}^m \sum_{j=1}^n |a_{i,j}|^2 }.
2nd Approach: the induced (or natural) matrix norm

\|A\| = \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \max_{\|x\|=1} \|Ax\|,

which is often called the operator norm.
Definition: A matrix norm \|\cdot\|_M is said to be consistent with the vector norm \|\cdot\|_V if

\|Ax\|_V \le \|A\|_M \|x\|_V, \qquad A \in K^{n \times n}, \; x \in K^n.

Remark: After matrix multiplication a vector x becomes at most \|A\| times bigger.

Remark: The induced matrix norm is the smallest of all consistent matrix norms.

Examples: A \in \mathbb{R}^{n \times n}, x = (x_1, \ldots, x_n)^T \in \mathbb{R}^n.
(1) Maximum norm \|x\|_\infty = \max\{|x_1|, \ldots, |x_n|\}:

\|Ax\|_\infty = \max_i \Big| \sum_{j=1}^n a_{i,j} x_j \Big| \le \underbrace{\Big( \max_i \sum_{j=1}^n |a_{i,j}| \Big)}_{\|A\|_\infty} \cdot \underbrace{\max_j |x_j|}_{\|x\|_\infty}

\Rightarrow row summation norm

\|A\|_\infty = \max_i \sum_{j=1}^n |a_{i,j}|.
(2) Summation norm \|x\|_1 = |x_1| + \ldots + |x_n|:

\Rightarrow column summation norm

\|A\|_1 = \max_j \sum_{i=1}^n |a_{i,j}|.

(3) \|A\|_2 = \sqrt{\text{largest eigenvalue of } A^T A}.
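A short Python sketch of the row and column summation norms, together with a consistency check for the maximum norm (the 2x2 matrix and vector are illustrative choices):

```python
def row_sum_norm(A):
    # ||A||_inf: induced by the maximum norm, the largest row sum.
    return max(sum(abs(a) for a in row) for row in A)

def col_sum_norm(A):
    # ||A||_1: induced by the summation norm, the largest column sum.
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def mat_vec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

A = [[1.0, -2.0], [3.0, 4.0]]
x = [1.0, -1.0]
```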
theorem All matrix norms on K^{n \times n} are equivalent.

theorem A^{(k)} converges to A component-wise \iff \|A^{(k)} - A\| \to 0 for k \to \infty.

proof: As for vector norms.

theorem The induced matrix norms on K^{n \times n} are submultiplicative, i.e.

\|AB\| \le \|A\| \, \|B\|, \qquad A, B \in K^{n \times n}.

proof: For x with Bx \neq 0,

\frac{\|ABx\|}{\|x\|} = \frac{\|ABx\|}{\|Bx\|} \cdot \frac{\|Bx\|}{\|x\|} \overset{y = Bx}{=} \frac{\|Ay\|}{\|y\|} \cdot \frac{\|Bx\|}{\|x\|} \le \|A\| \, \|B\|.
Definition: Let A be a square matrix. Then

\rho(A) = \max\{ |\lambda| : \lambda \text{ eigenvalue of } A \}

is called the spectral radius of A.
theorem The spectral radius \rho(A) is a lower bound for all submultiplicative matrix norms on K^{n \times n}, i.e.

\|A\| \ge \rho(A) \quad \text{for all such matrix norms } \|\cdot\|.

proof: See Oevel, page 158.

Remark: \rho(A) = \|A\|_2 for symmetric matrices A.
2.2 Error estimates
Let us consider the effects of computational errors in solving a system of linear
equations Ax = b
a-posteriori error: Let x_{num} be the numerical solution of Ax = b. Define the defect d = A x_{num} - b. The error r = x - x_{num} then satisfies Ar = -d.

Let \|x\| be a vector norm and \|A\| the induced matrix norm. Then

(1) Ar = -d \;\Rightarrow\; \|d\| \le \|A\| \, \|r\| \;\Rightarrow\; \|r\| \ge \|d\| / \|A\|,

(2) r = -A^{-1} d \;\Rightarrow\; \|r\| \le \|A^{-1}\| \, \|d\|.
2.2.1 Condition number of a matrix
We have

\frac{\|d\|}{\|A\|} \le \|x - x_{num}\| \le \|A^{-1}\| \, \|d\|,

but it is better to use the relative error

\frac{1}{K} \frac{\|d\|}{\|b\|} \le \frac{\|x - x_{num}\|}{\|x\|} \le K \frac{\|d\|}{\|b\|},

where K = \|A\| \, \|A^{-1}\| = \mathrm{cond}(A) (divide here by \|x\| = \|A^{-1} b\|).

Definition: \mathrm{cond}(A) = \|A\| \, \|A^{-1}\| is called the condition number of A.

Remarks:

(1) 1 = \|A A^{-1}\| \le \|A\| \, \|A^{-1}\| = \mathrm{cond}(A).
(2) \mathrm{cond}(A) is a measure of the degree of invertibility of A, i.e. a measure of the quality (conditioning) of the problem.

(3) An orthogonal A always has, w.r.t. \|\cdot\|_2, the minimal condition number 1.

(4) \frac{1}{\|A^{-1}\|} = \Big( \max_{y \neq 0} \frac{\|A^{-1} y\|}{\|y\|} \Big)^{-1} \overset{y = Ax}{=} \min_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \min_{\|x\|=1} \|Ax\|

\Rightarrow \quad \mathrm{cond}(A) = \frac{ \max_{\|x\|=1} \|Ax\| }{ \min_{\|x\|=1} \|Ax\| } \, .
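For a small matrix with an explicitly known inverse the condition number can be computed directly; the sketch below uses the maximum norm, and the 2x2 example matrix is an illustrative choice, not taken from the text.

```python
def row_sum_norm(A):
    # Matrix norm induced by the maximum vector norm (row summation norm).
    return max(sum(abs(a) for a in row) for row in A)

A     = [[1.0, 2.0], [3.0, 4.0]]
A_inv = [[-2.0, 1.0], [1.5, -0.5]]   # exact inverse, since det(A) = -2

# Verify the inverse: A * A_inv should be the identity.
I2 = [[sum(A[i][k] * A_inv[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]

cond_inf = row_sum_norm(A) * row_sum_norm(A_inv)   # ||A|| * ||A^{-1}|| = 7 * 3
```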
2.2.2 Fixed point theorem and successive iterations
A contraction is a mapping f : \mathbb{R}^d \to \mathbb{R}^d with

\|f(x) - f(y)\| \le K \|x - y\|, \qquad x, y \in \mathbb{R}^d,

for a constant K < 1.

theorem Let T be a d \times d-matrix and c \in \mathbb{R}^d. The linear mapping \Phi(x) = Tx + c is a contraction if and only if \|T\| < 1.

proof

\|T\| = \max_{z \neq 0} \frac{\|Tz\|}{\|z\|} = \max_{x \neq y} \frac{\|Tx - Ty\|}{\|x - y\|} = \max_{x \neq y} \frac{\|\Phi(x) - \Phi(y)\|}{\|x - y\|} \, .
theorem The successive iterations x^{(i+1)} = T x^{(i)} + c converge for all x^{(0)} if and only if \rho(T) < 1.

A fixed point theorem ensures that a mapping \Phi(x) has a unique fixed point, i.e. a point \bar{x} \in \mathbb{R}^d with \bar{x} = \Phi(\bar{x}), and that the successive iterations x^{(i+1)} = \Phi(x^{(i)}) converge to this fixed point. In the fixed point theorem of Banach the mapping \Phi is a contraction.

We use the following estimates for the iterative solution of a system of linear equations Ax = b, which we rewrite as

x = Tx + c,

e.g. write x = (I - A)x + b, so T = I - A and c = b; but there are other and better possibilities (later!).

a priori error (what we know at the start):

\|x^{(i)} - \bar{x}\| \le \frac{\|T\|^i}{1 - \|T\|} \, \|x^{(1)} - x^{(0)}\|
a posteriori error (what we know in the course of a calculation):

\|x^{(i)} - \bar{x}\| \le \frac{\|T\|}{1 - \|T\|} \, \|x^{(i)} - x^{(i-1)}\|.

We use these estimates in particular for a stop command: let \varepsilon > 0 be the desired precision and i = i(\varepsilon) the first whole number for which the right hand side above is smaller than or equal to \varepsilon. If we stop after i(\varepsilon) iterations, then we have (at least) the precision \varepsilon.
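The successive iterations together with the a-posteriori stop command can be sketched in Python; the 2x2 matrix T is an illustrative choice with \|T\|_\infty = 0.6 < 1, so \Phi(x) = Tx + c is a contraction.

```python
def norm_inf(v):
    return max(abs(vi) for vi in v)

def successive_iterations(T, c, eps):
    # x^(i+1) = T x^(i) + c, stopped with the a-posteriori estimate
    #   ||x^(i) - xbar|| <= ||T|| / (1 - ||T||) * ||x^(i) - x^(i-1)|| <= eps
    n = len(c)
    norm_T = max(sum(abs(t) for t in row) for row in T)  # row summation norm
    assert norm_T < 1.0, "Phi(x) = Tx + c must be a contraction"
    x = [0.0] * n
    while True:
        x_new = [sum(T[i][j] * x[j] for j in range(n)) + c[i] for i in range(n)]
        diff = norm_inf([a - b for a, b in zip(x_new, x)])
        if norm_T / (1.0 - norm_T) * diff <= eps:
            return x_new
        x = x_new

T = [[0.5, 0.1], [0.2, 0.3]]   # ||T||_inf = 0.6 < 1
c = [1.0, 1.0]
x = successive_iterations(T, c, eps=1e-10)

# Exact fixed point of x = Tx + c, i.e. solution of (I - T) x = c:
x_bar = [0.8 / 0.33, 0.7 / 0.33]
```

The returned iterate is guaranteed (up to rounding) to be within eps of the fixed point.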
Chapter 3
Linear systems of equations
Literatur Oevel, Kap. 5; Schwarz Kap.1; Stummel/Hainer, Kap. 6
Consider a linear system of equations with n equations and n unknowns

a_{1,1} x_1 + \ldots + a_{1,n} x_n = b_1
\vdots
a_{n,1} x_1 + \ldots + a_{n,n} x_n = b_n

or in matrix-vector form

Ax = b,

where A = [a_{i,j}] is an n \times n invertible matrix and

x = (x_1, \ldots, x_n)^T, \qquad b = (b_1, \ldots, b_n)^T

are n-dimensional vectors. This system has a unique solution

x = A^{-1} b,

which we can represent explicitly using Cramer's rule, i.e. with

A^{-1} = \frac{1}{\det(A)} \, \big( \text{adjugate matrix of } A \big)^T \qquad (n^2 \text{ determinants}).

The Cramer formula is not very practical as a solution method and is often almost impossible to use:

\det(A) = \sum_p \mathrm{sign}(p) \, a_{1,p_1} \cdots a_{n,p_n},
where the summation is over all permutations p = (p_1, \ldots, p_n) of \{1, 2, \ldots, n\}. There are n! permutations, so we need O(n!) arithmetic operations, e.g.

100! \approx 9 \cdot 10^{157}.

Such a summation is often numerically unstable due to possible cancellation errors.
3.1 Gaussian elimination
A practical alternative is the Gaussian elimination method, by means of which
we convert the original system of equations to an easily solved triangular system using successive linear transformations
(1)

a_{1,1} x_1 + a_{1,2} x_2 + \ldots + a_{1,n} x_n = b_1
a_{2,1} x_1 + a_{2,2} x_2 + \ldots + a_{2,n} x_n = b_2
\vdots
a_{n,1} x_1 + a_{n,2} x_2 + \ldots + a_{n,n} x_n = b_n

Let a_{1,1} \neq 0. Then we eliminate x_1 from the last n-1 equations with the linear transformation

a_{i,j} \mapsto a'_{i,j} = a_{i,j} - \frac{a_{i,1}}{a_{1,1}} a_{1,j}, \qquad j = 1, \ldots, n,

b_i \mapsto b'_i = b_i - \frac{a_{i,1}}{a_{1,1}} b_1,

for i = 2, \ldots, n, i.e. we subtract \frac{a_{i,1}}{a_{1,1}} times the first equation from the equations for i = 2, \ldots, n.
Then we obtain the equivalent system of equations (i.e. with the same
solution):
(2)

a_{1,1} x_1 + a_{1,2} x_2 + \ldots + a_{1,n} x_n = b_1
\phantom{a_{1,1} x_1 +{}} a'_{2,2} x_2 + \ldots + a'_{2,n} x_n = b'_2
\vdots
\phantom{a_{1,1} x_1 +{}} a'_{n,2} x_2 + \ldots + a'_{n,n} x_n = b'_n

We repeat the procedure for the last n-1 equations with the n-1 unknowns x_2, \ldots, x_n (under the assumption that a'_{2,2} \neq 0), and so on for the last n - \ell equations with the n - \ell unknowns x_{\ell+1}, \ldots, x_n for \ell = 1, \ldots, n-1.
After the final step we obtain an equivalent triangular system of the form
a^{(n)}_{1,1} x_1 + a^{(n)}_{1,2} x_2 + a^{(n)}_{1,3} x_3 + \ldots + a^{(n)}_{1,n} x_n = b^{(n)}_1
\phantom{a^{(n)}_{1,1} x_1 +{}} a^{(n)}_{2,2} x_2 + a^{(n)}_{2,3} x_3 + \ldots + a^{(n)}_{2,n} x_n = b^{(n)}_2
\phantom{a^{(n)}_{1,1} x_1 + a^{(n)}_{2,2} x_2 +{}} a^{(n)}_{3,3} x_3 + \ldots + a^{(n)}_{3,n} x_n = b^{(n)}_3
\vdots
\phantom{a^{(n)}_{1,1} x_1 + \ldots +{}} a^{(n)}_{n,n} x_n = b^{(n)}_n

which we can solve through backwards substitution:

x_n = b^{(n)}_n / a^{(n)}_{n,n},

x_i = \Big( b^{(n)}_i - \sum_{j=i+1}^n a^{(n)}_{i,j} x_j \Big) \Big/ a^{(n)}_{i,i}, \qquad i = n-1, \ldots, 1.
For this we need

\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 1 = \sum_{i=1}^{n-1} (n-i) = \frac{n(n-1)}{2}

additions/subtractions and

1 + \sum_{i=1}^{n-1} \Big( 1 + \sum_{j=i+1}^{n} 1 \Big) = 1 + \sum_{i=1}^{n-1} (n-i+1) = \frac{n(n+1)}{2}

multiplications/divisions, in total \approx n^2 operations.
The Gaussian elimination method needs

\sum_{j=1}^{n-1} \sum_{i=j+1}^{n} \Big( 1 + \sum_{k=j+1}^{n+1} 1 \Big) = \frac{1}{6} (n-1) n (2n+5) = O(n^3)

multiplications/divisions and a similar number of additions/subtractions.

Thus for a large n Gaussian elimination with backwards substitution needs O(n^3) arithmetic operations. Compare with O(n!) for Cramer's formula!
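The elimination and backwards substitution formulas above translate directly into Python; this is a minimal sketch without row interchange, assuming all pivots are nonzero, and the 3x3 system is an illustrative example with exact solution x = (1, 1, 1)^T.

```python
def gauss_solve(A, b):
    # Gaussian elimination without row interchange, followed by
    # backwards substitution.  Assumes all pivots A[k][k] are nonzero.
    n = len(b)
    A = [row[:] for row in A]   # work on copies
    b = b[:]
    for k in range(n - 1):                      # elimination steps
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):              # backwards substitution
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
b = [4.0, 10.0, 24.0]
x = gauss_solve(A, b)
```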
3.1.1 Summary
Write Ax = b as A^{(1)} x = b^{(1)}, i.e. with a^{(1)}_{i,j} \equiv a_{i,j} and b^{(1)}_i \equiv b_i. We have replaced A^{(1)} x = b^{(1)} by successively simplified systems:
A^{(1)} x = b^{(1)} \;\Rightarrow\; A^{(2)} x = b^{(2)} \;\Rightarrow\; A^{(3)} x = b^{(3)} \;\Rightarrow\; \ldots \;\Rightarrow\; A^{(n)} x = b^{(n)},

using the following linear transformations:

a^{(\ell+1)}_{i,j} \equiv a^{(\ell)}_{i,j}, \qquad i = 1, \ldots, \ell, \quad j = 1, \ldots, n+1,

and

a^{(\ell+1)}_{i,j} = a^{(\ell)}_{i,j} - \frac{a^{(\ell)}_{i,\ell}}{a^{(\ell)}_{\ell,\ell}} \, a^{(\ell)}_{\ell,j}, \qquad i = \ell+1, \ldots, n, \quad j = 1, \ldots, n+1,

where we have written a^{(\ell)}_{i,n+1} = b^{(\ell)}_i.

In fact we only have to calculate the a^{(\ell+1)}_{i,j} with i \ge \ell + 1 and j \ge \ell + 1, because the other components are all equal to 0 or will become equal to 0.
3.1.2 Row interchange
Above we have assumed that a^{(\ell)}_{\ell,\ell} \neq 0 for \ell = 1, \ldots, n. This does not always hold, but (due to the assumption that \det(A) \neq 0) we can always exchange the \ell th row, for which a^{(\ell)}_{\ell,\ell} = 0, with a row under it with a^{(\ell)}_{j,\ell} \neq 0, where j > \ell.

For simplicity we will assume for now that a row interchange is not necessary.
3.2 Formulation as matrix multiplication
We can represent the elimination procedure by matrix multiplication

A^{(k+1)} = F^{(k)} A^{(k)},
where

F^{(k)} = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & & -\ell_{k+1,k} & 1 & \\ & & \vdots & & \ddots \\ & & -\ell_{n,k} & & & 1 \end{pmatrix}

(the other components are all equal to 0) with

\ell_{i,j} = \frac{a^{(j)}_{i,j}}{a^{(j)}_{j,j}}, \qquad i = j+1, \ldots, n, \quad j = 1, \ldots, n-1.

Thus we have

A^{(n)} = F^{(n-1)} \cdots F^{(1)} A^{(1)},

or

A^{(1)} = \big(F^{(1)}\big)^{-1} \cdots \big(F^{(n-1)}\big)^{-1} A^{(n)} \qquad \big(\text{invertible since } \det(F^{(k)}) = 1\big).
But

\big(F^{(k)}\big)^{-1} = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & & \ell_{k+1,k} & 1 & \\ & & \vdots & & \ddots \\ & & \ell_{n,k} & & & 1 \end{pmatrix},

where the other components are all equal to 0, and

L := \big(F^{(1)}\big)^{-1} \cdots \big(F^{(n-1)}\big)^{-1} = \begin{pmatrix} 1 & & & & \\ \ell_{2,1} & 1 & & & \\ \ell_{3,1} & \ell_{3,2} & 1 & & \\ \vdots & \vdots & & \ddots & \\ \ell_{n,1} & \ell_{n,2} & \ldots & \ell_{n,n-1} & 1 \end{pmatrix},

a lower triangular matrix, i.e. A = LU, where
R = U = A^{(n)} = \begin{pmatrix} a^{(n)}_{1,1} & a^{(n)}_{1,2} & \ldots & a^{(n)}_{1,n} \\ & a^{(n)}_{2,2} & \ldots & a^{(n)}_{2,n} \\ & & \ddots & \vdots \\ & & & a^{(n)}_{n,n} \end{pmatrix},

an upper triangular matrix.

(German: Links-/Rechtsdreiecksmatrizen, with A = LR.)

This LU decomposition has many advantages:
This LU decomposition has many advantages
(1) \det(A) = \det(L) \det(U) = 1 \cdot a^{(n)}_{1,1} a^{(n)}_{2,2} \cdots a^{(n)}_{n,n}.

(2) We can quickly solve Ax = b for all b without having to repeat the elimination procedure:

b = Ax = LUx = Ly \quad \text{with} \quad Ux = y,

solve (1) Ly = b (forwards substitution),
then (2) Ux = y (backwards substitution).

Here forwards substitution for Ly = b means

y_1 = b_1, \qquad y_i = b_i - \sum_{j=1}^{i-1} \ell_{i,j} y_j, \quad i = 2, \ldots, n.

Forwards and backwards substitution both need O(n^2) arithmetic operations.

(3) To calculate the inverse matrix A^{-1} we solve the n systems of equations

A x^{(j)} = e^{(j)} \; (j\text{th unit vector}), \qquad j = 1, \ldots, n,

i.e. L y^{(j)} = e^{(j)}, then U x^{(j)} = y^{(j)}, and

A^{-1} = [x^{(1)} | \ldots | x^{(n)}].

This method needs O(n^3) arithmetic operations.
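Advantage (2) can be sketched in Python: factor once, then solve for several right hand sides by forwards and backwards substitution. This is a Doolittle-style sketch without row interchange, and the 3x3 matrix is an illustrative example.

```python
def lu_decompose(A):
    # A = L U with unit lower triangular L; assumes nonzero pivots.
    n = len(A)
    U = [row[:] for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

def lu_solve(L, U, b):
    n = len(b)
    y = [0.0] * n
    for i in range(n):                      # forwards substitution  L y = b
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):          # backwards substitution U x = y
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
L, U = lu_decompose(A)                      # factor once ...
x1 = lu_solve(L, U, [4.0, 10.0, 24.0])      # ... then solve for several b
x2 = lu_solve(L, U, [2.0, 4.0, 8.0])
```

Each additional right hand side costs only the O(n^2) substitutions.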
Chapter 4
The LU decomposition
Literatur Oevel, Kap. 5.5; Schwarz Kap.1.1; Stummel/Hainer, Kap. 6.1
Consider an n \times n invertible matrix A = [a_{i,j}]. If row interchange is not necessary, the Gaussian elimination procedure

A = A^{(1)} \to A^{(2)} \to \ldots \to A^{(n)}

leads to the following LU decomposition:

A = LU = \begin{pmatrix} 1 & & & & \\ \ell_{2,1} & 1 & & & \\ \ell_{3,1} & \ell_{3,2} & \ddots & & \\ \vdots & \vdots & \ddots & 1 & \\ \ell_{n,1} & \ell_{n,2} & \ldots & \ell_{n,n-1} & 1 \end{pmatrix} \begin{pmatrix} a^{(n)}_{1,1} & \ldots & a^{(n)}_{1,n} \\ & a^{(n)}_{2,2} & \ldots & a^{(n)}_{2,n} \\ & & \ddots & \vdots \\ & & & a^{(n)}_{n,n} \end{pmatrix},

where the \ell_{i,j} are defined by

\ell_{i,j} = \frac{a^{(j)}_{i,j}}{a^{(j)}_{j,j}}, \qquad i = j+1, \ldots, n, \quad j = 1, \ldots, n-1.
theorem The LU decomposition is unique.

proof: Let L_1, U_1 and L_2, U_2 be two LU decompositions of the matrix A. Then

A = L_1 U_1 \quad \text{and} \quad A = L_2 U_2,

and therefore

L_1 U_1 = L_2 U_2 \qquad \text{or} \qquad L_2^{-1} L_1 = U_2 U_1^{-1}.

But L_1, L_2 and therefore L_2^{-1} and L_2^{-1} L_1 are lower triangular matrices. Similarly, U_1, U_2 and therefore U_1^{-1} and U_2 U_1^{-1} are upper triangular matrices. These products can only be equal when they are diagonal matrices, i.e.

L_2^{-1} L_1 = U_2 U_1^{-1} = D = \mathrm{diag}(d_{1,1}, d_{2,2}, \ldots, d_{n,n}).

But L_1, L_2 and therefore L_2^{-1} and L_2^{-1} L_1 all have 1 as their diagonal components. Thus D = I, the n \times n identity matrix, i.e.

L_2^{-1} L_1 = I = U_2 U_1^{-1},

or L_1 = L_2 and U_1 = U_2. Hence the LU decomposition is unique.
In the kth step A^{(k)} \to A^{(k+1)} of the elimination process we apply the linear transformations

a^{(k+1)}_{i,j} \equiv a^{(k)}_{i,j}, \qquad i = 1, \ldots, k, \quad j = 1, \ldots, n,

and

a^{(k+1)}_{i,j} = a^{(k)}_{i,j} - \ell_{i,k} \, a^{(k)}_{k,j}, \qquad i = k+1, \ldots, n, \quad j = 1, \ldots, n.

But the new a^{(k+1)}_{i,j} are all equal to 0 for j = 1, \ldots, k when i = k+1, \ldots, n. Thus we can replace these components immediately by 0 without doing a calculation (which may have a round off error). Then the corresponding components are always equal to 0 in the following steps.

Thus it is not necessary to store these 0 components of the matrix. Instead we can store the \ell_{i,j} components in these free places. (We do not have to store the diagonal components of L, since we know that they are equal to 1.) In the following steps of the elimination procedure these \ell_{i,j} components are not transformed.

Consequently, we need to store only one n \times n matrix instead of two.

[Diagram: in A^{(2)} the first column below the diagonal holds \ell_{2,1}, \ldots, \ell_{n,1}; in A^{(3)} the second column below the diagonal additionally holds \ell_{3,2}, \ldots, \ell_{n,2}; and so on.]
4.1 Row interchange
An interchange of the \ell th and jth rows with j > \ell corresponds to multiplication by the permutation matrix

P_{\ell j} = \begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 0 & \ldots & 1 & \\ & & \vdots & \ddots & \vdots & \\ & & 1 & \ldots & 0 & \\ & & & & & \ddots \end{pmatrix} \begin{matrix} \\ \leftarrow (\ell) \\ \\ \leftarrow (j) \\ \\ \end{matrix}

where \det(P_{\ell j}) = -1.

If a^{(k)}_{k,k} = 0, then we can find such a permutation matrix P^{(k)}. We replace A^{(k)} by P^{(k)} A^{(k)} and continue with the elimination procedure. If a^{(k)}_{k,k} \neq 0, then we take P^{(k)} = I. At the end we obtain an LU decomposition

LU = PA,

where P = P^{(n-1)} \cdots P^{(2)} P^{(1)}, with

\det P = (-1)^{\sigma_A}, \qquad \sigma_A = \#\{k : P^{(k)} \neq I\},

\det(A) = (-1)^{\sigma_A} \, a^{(n)}_{1,1} a^{(n)}_{2,2} \cdots a^{(n)}_{n,n}.

Thus in the derivation of the LU decomposition we have to exchange the whole row

[\ell_{i,1}, \ldots, \ell_{i,k-1}, a^{(k)}_{i,k}, \ldots, a^{(k)}_{i,n}]
if a row interchange is necessary. In this case we should introduce a permutation vector

p^{(k)} = \big( p^{(k)}_1, \ldots, p^{(k)}_n \big)^T

in order to retain a list of the corresponding row ordering, i.e. elimination with row interchange gives:
[Diagram: A^{(1)} = A \to A^{(2)} \to \ldots \to A^{(n)} = R = U after interchange and elimination; the \ell_{i,j} are stored below the diagonal, and alongside each A^{(k)} the permutation vector p^{(k)} records the current row ordering.]

with the permutation vectors

p^{(1)} = (1, 2, \ldots, n)^T, \quad p^{(2)} = \big( p^{(2)}_1, p^{(2)}_2, \ldots, p^{(2)}_n \big)^T, \quad \ldots, \quad p^{(n)} = \big( p^{(n)}_1, p^{(n)}_2, \ldots, p^{(n)}_n \big)^T.
At the end we define a permutation matrix P = [p_{i,j}] by

p_{i,j} = \begin{cases} 1 & \text{if } j = p^{(n)}_i \\ 0 & \text{otherwise} \end{cases}

(This permutation matrix and the permutation matrix P = P^{(n-1)} \cdots P^{(2)} P^{(1)} defined above are the same.)

This way we obtain

PA = LU

with

L = \begin{pmatrix} 1 & & & \\ \ell_{2,1} & 1 & & \\ \vdots & & \ddots & \\ \ell_{n,1} & \ell_{n,2} & \ldots & 1 \end{pmatrix} \quad \text{and} \quad U = \begin{pmatrix} a^{(n)}_{1,1} & \ldots & a^{(n)}_{1,n} \\ & a^{(n)}_{2,2} & \ldots & a^{(n)}_{2,n} \\ & & \ddots & \vdots \\ & & & a^{(n)}_{n,n} \end{pmatrix}.
Example (Oevel, page 115, Example 5.6):

A = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 2 & 2 & 2 & 2 \\ 1 & 2 & 2 & 2 \\ 1 & 2 & 3 & 6 \end{pmatrix}, \qquad P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},

PA = \begin{pmatrix} 2 & 2 & 2 & 2 \\ 1 & 2 & 2 & 2 \\ 0 & 0 & 1 & 1 \\ 1 & 2 & 3 & 6 \end{pmatrix} = LR = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1/2 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1/2 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 2 & 2 & 2 & 2 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 3 \end{pmatrix}.
proof (elimination with row interchange, storing the \ell_{i,j} below the diagonal and the permutation vector p in the first column):

(p \,|\, A) = \begin{pmatrix} 1 & 0 & 0 & 1 & 1 \\ 2 & 2 & 2 & 2 & 2 \\ 3 & 1 & 2 & 2 & 2 \\ 4 & 1 & 2 & 3 & 6 \end{pmatrix} \overset{P}{\to} \begin{pmatrix} 2 & 2 & 2 & 2 & 2 \\ 1 & 0 & 0 & 1 & 1 \\ 3 & 1 & 2 & 2 & 2 \\ 4 & 1 & 2 & 3 & 6 \end{pmatrix} \overset{E}{\to} \begin{pmatrix} 2 & 2 & 2 & 2 & 2 \\ 1 & 0 & 0 & 1 & 1 \\ 3 & 1/2 & 1 & 1 & 1 \\ 4 & 1/2 & 1 & 2 & 5 \end{pmatrix}

\overset{P}{\to} \begin{pmatrix} 2 & 2 & 2 & 2 & 2 \\ 3 & 1/2 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 \\ 4 & 1/2 & 1 & 2 & 5 \end{pmatrix} \overset{E}{\to} \begin{pmatrix} 2 & 2 & 2 & 2 & 2 \\ 3 & 1/2 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 \\ 4 & 1/2 & 1 & 1 & 4 \end{pmatrix} \overset{E}{\to} \begin{pmatrix} 2 & 2 & 2 & 2 & 2 \\ 3 & 1/2 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 \\ 4 & 1/2 & 1 & 1 & 3 \end{pmatrix}

i.e., with the combined permutation

p^{(1)} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} \quad\to\quad p^{(n)} = \begin{pmatrix} 2 \\ 3 \\ 1 \\ 4 \end{pmatrix},

and hence

P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad LU = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1/2 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1/2 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 2 & 2 & 2 & 2 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 3 \end{pmatrix},

PA = \begin{pmatrix} 2 & 2 & 2 & 2 \\ 1 & 2 & 2 & 2 \\ 0 & 0 & 1 & 1 \\ 1 & 2 & 3 & 6 \end{pmatrix} = LU.
Here \det P = 1 (the permutation (2, 3, 1, 4) is even), so

\det A = \det P \cdot \det A = \det(PA) = \det L \cdot \det U = 1 \cdot (2 \cdot 1 \cdot 1 \cdot 3) = 6.

(This provides a quick and accurate way to calculate the determinant of the matrix A.)
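The example can be verified with a small Python LU routine with row interchange. The sketch below uses the common largest-pivot rule (swap in the row with the largest pivot candidate, not only when the pivot is zero, with ties broken by the first occurrence); for this matrix it happens to reproduce the permutation of the Oevel example.

```python
def lu_pivot(A):
    # PA = LU with row interchange: returns p, L, U, where the
    # permutation vector p means (PA)[i] = A[p[i]].
    n = len(A)
    U = [row[:] for row in A]
    L = [[0.0] * n for _ in range(n)]
    p = list(range(n))
    for k in range(n - 1):
        m = max(range(k, n), key=lambda i: abs(U[i][k]))  # pivot row
        if m != k:                      # swap rows of U, L and p together
            U[k], U[m] = U[m], U[k]
            L[k], L[m] = L[m], L[k]
            p[k], p[m] = p[m], p[k]
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    for i in range(n):
        L[i][i] = 1.0
    return p, L, U

A = [[0.0, 0.0, 1.0, 1.0],
     [2.0, 2.0, 2.0, 2.0],
     [1.0, 2.0, 2.0, 2.0],
     [1.0, 2.0, 3.0, 6.0]]
p, L, U = lu_pivot(A)

PA = [A[i][:] for i in p]
LU_prod = [[sum(L[i][k] * U[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
det_A = U[0][0] * U[1][1] * U[2][2] * U[3][3]   # times det(P) = +1 here
```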
4.2 Pivoting
Consider the elimination step A^{(k)} \to A^{(k+1)}. If a^{(k)}_{k,k} \neq 0, then we use a^{(k)}_{k,k} as the pivot element in the elimination process.

If a^{(k)}_{k,k} = 0, then we swap the kth row with the jth row for some j > k. Then we use the new component a^{(k)}_{k,k} (in fact the old component a^{(k)}_{j,k} \neq 0) as the pivot element.

How should we choose j ?

In addition, we can also encounter difficulties due to round off error even when a^{(k)}_{k,k} \neq 0, especially when it is very small or very large. In this situation it is also useful to look for a new pivot element.

There are various pivoting strategies, which may involve column as well as row interchanges, for choosing an appropriate pivot element a^{(k)}_{p,q} with p, q \ge k. Such strategies are usually very expensive to use. (See text books for some examples.)
4.3 Post iteration
Literatur Schwarz: Seite 27-29, Stummel/Hainer: Seite 118-119
Consider the system of equations

Ax = b, \qquad A \; n \times n \text{ and invertible.}

Let \bar{x} be the exact solution, i.e. A\bar{x} = b, and let \tilde{x} be the numerical solution with the defect (or residual)

d = A\tilde{x} - b \; (\neq 0).

Then the error r = \bar{x} - \tilde{x} satisfies the system of equations

Ar = -d,
i.e. with the same matrix A.
We can solve this system quickly because we have already determined the
LU decomposition of the matrix A.
Method of post iteration

Compute:

(i) the LU decomposition of A,
(ii) a numerical solution \tilde{x},
(iii) the defect d = A\tilde{x} - b,
(iv) a numerical solution \tilde{r} of Ar = -d with a higher precision.

Then \tilde{x} + \tilde{r} should be a better approximation of the exact solution \bar{x} than \tilde{x}.
Example: See Example 1.7 in Schwarz.
We can use the error estimate

\frac{\|b\|}{\|A\|} \le \|x\| \le \|A^{-1}\| \, \|b\|

from Chapter 2 for the system Ax = b, where the vector and matrix norms are consistent.

Now consider the system of equations Ar = -d. Then we have the following estimate of the absolute error

\frac{\|d\|}{\|A\|} \le \|\bar{x} - \tilde{x}\| \le \|A^{-1}\| \, \|d\|

and of the relative error

\frac{1}{K} \frac{\|d\|}{\|b\|} \le \frac{\|\bar{x} - \tilde{x}\|}{\|\bar{x}\|} \le K \frac{\|d\|}{\|b\|},

where K = \|A\| \, \|A^{-1}\| is the condition number of the matrix A. The term \|\bar{x}\| is unknown, but \|\tilde{x}\| is known and is often an adequate approximation for it in this relative error expression.
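One step of post iteration can be sketched in Python. The plain elimination solver below stands in for reusing the stored LU factors of A; the 2x2 matrix and the deliberately perturbed "numerical" solution are illustrative choices.

```python
def solve(A, b):
    # Gaussian elimination + backwards substitution (a stand-in for
    # reusing the already-computed LU decomposition of A).
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def refine(A, b, x_tilde):
    # One step of post iteration: d = A x_tilde - b, solve A r = -d,
    # return the corrected approximation x_tilde + r.
    n = len(b)
    d = [sum(A[i][j] * x_tilde[j] for j in range(n)) - b[i] for i in range(n)]
    r = solve(A, [-di for di in d])
    return [xi + ri for xi, ri in zip(x_tilde, r)]

A = [[4.0, 1.0], [1.0, 3.0]]
b = [5.0, 4.0]                  # exact solution (1, 1)
x_tilde = [0.99, 1.02]          # deliberately perturbed "numerical" solution
x_better = refine(A, b, x_tilde)
```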
4.4 The LU decomposition of the transposed matrix
Consider an n \times n invertible matrix A = [a_{i,j}] and suppose that we can apply Gaussian elimination without row interchange, i.e.

A^{(1)} \to \ldots \to A^{(k)} \to A^{(k+1)} \to \ldots \to A^{(n)},

with the linear transformations

a^{(k+1)}_{i,j} \equiv a^{(k)}_{i,j}, \qquad i = 1, \ldots, k, \quad j = 1, \ldots, n,

and

a^{(k+1)}_{i,j} = a^{(k)}_{i,j} - \ell_{i,k} \, a^{(k)}_{k,j}, \qquad i = k+1, \ldots, n, \quad j = 1, \ldots, n,

where

\ell_{i,j} = \frac{a^{(j)}_{i,j}}{a^{(j)}_{j,j}}, \qquad j = 1, \ldots, n-1, \quad i = j+1, \ldots, n.

(No row interchange means that all a^{(j)}_{j,j} \neq 0.)

Then we obtain

A = LU = \begin{pmatrix} 1 & & & \\ \ell_{2,1} & 1 & & \\ \vdots & & \ddots & \\ \ell_{n,1} & \ell_{n,2} & \ldots & 1 \end{pmatrix} \begin{pmatrix} a^{(n)}_{1,1} & \ldots & \ldots & a^{(n)}_{1,n} \\ & a^{(n)}_{2,2} & \ldots & a^{(n)}_{2,n} \\ & & \ddots & \vdots \\ & & & a^{(n)}_{n,n} \end{pmatrix}.
Moreover, this LU decomposition is unique!

Thus we have

A^T = (LU)^T = U^T L^T = \begin{pmatrix} a^{(n)}_{1,1} & & & \\ a^{(n)}_{1,2} & a^{(n)}_{2,2} & & \\ \vdots & \vdots & \ddots & \\ a^{(n)}_{1,n} & a^{(n)}_{2,n} & \ldots & a^{(n)}_{n,n} \end{pmatrix} \begin{pmatrix} 1 & \ell_{2,1} & \ldots & \ell_{n,1} \\ & 1 & \ldots & \ell_{n,2} \\ & & \ddots & \ell_{n,n-1} \\ & & & 1 \end{pmatrix}.

Define
D = \mathrm{diag}\big( a^{(n)}_{1,1}, a^{(n)}_{2,2}, \ldots, a^{(n)}_{n,n} \big).

All a^{(n)}_{j,j} \neq 0, so D^{-1} exists with

D^{-1} = \mathrm{diag}\big( 1/a^{(n)}_{1,1}, 1/a^{(n)}_{2,2}, \ldots, 1/a^{(n)}_{n,n} \big).

Thus we obtain

A^T = U^T L^T = (U^T D^{-1})(D L^T) = (D^{-1} U)^T (LD)^T = \tilde{L} \tilde{U},

with \tilde{L} = (D^{-1} U)^T, a lower triangular matrix with 1 along the diagonal, and \tilde{U} = (LD)^T = D L^T, an upper triangular matrix, i.e. the LU decomposition of A^T is

A^T = \tilde{L} \tilde{U}.
Chapter 5
Matrices with a special structure
Literatur Oevel, Kap. 5; Schwarz Kap.1; Stummel/Hainer, Kap. 5
The Gaussian elimination method is often a lot easier and more efficient
when the matrices have a special structure, e.g.
- symmetric
- positive definite
- band
- diagonally dominant
Such matrices often arise in various applications, e.g. in numerical methods
for splines or partial differential equations.
5.1 Symmetric matrices
A symmetric matrix satisfies A^T = A, i.e. a_{i,j} = a_{j,i}, i, j = 1, \ldots, n.
5.1.1 Gaussian elimination without row interchange
The computational cost for Gaussian elimination, i.e. the LU decomposition,
for symmetric matrices is reduced by roughly a half.
This is easy to see when row interchange is not needed. Consider
a_{1,1} x_1 + a_{1,2} x_2 + \ldots + a_{1,n} x_n = b_1
a_{2,1} x_1 + a_{2,2} x_2 + \ldots + a_{2,n} x_n = b_2
\vdots
a_{n,1} x_1 + a_{n,2} x_2 + \ldots + a_{n,n} x_n = b_n

Using the linear transformations

a'_{i,j} = a_{i,j} - \frac{a_{i,1}}{a_{1,1}} a_{1,j}, \qquad i = 2, \ldots, n, \quad j = 1, \ldots, n+1 \quad (b_i \equiv a_{i,n+1}),

we obtain the reduced system of equations

a_{1,1} x_1 + a_{1,2} x_2 + \ldots + a_{1,n} x_n = b_1
\phantom{a_{1,1} x_1 +{}} a'_{2,2} x_2 + \ldots + a'_{2,n} x_n = b'_2
\vdots
\phantom{a_{1,1} x_1 +{}} a'_{n,2} x_2 + \ldots + a'_{n,n} x_n = b'_n

The last (n-1) \times (n-1) block here is also symmetric, because

a'_{j,i} = a_{j,i} - \frac{a_{j,1}}{a_{1,1}} a_{1,i} = a_{i,j} - \frac{a_{1,j}}{a_{1,1}} a_{i,1} = a'_{i,j}

for i, j = 2, \ldots, n.

This means that we need to evaluate the components a'_{i,j} only for i = 2, \ldots, n and j = i, \ldots, n; the other values then follow immediately by symmetry a'_{j,i} = a'_{i,j}.
If a row interchange is necessary, then at the same time we should also exchange the corresponding columns in order to retain the symmetry of the reduced matrix.

In this case we have

LU = PAP^T

(instead of LU = PA). See Oevel: Example 5.7, page 120.
5.1.2 The LDLT and Cholesky decompositions
Let A be symmetric, i.e. A^T = A or a_{i,j} = a_{j,i} for all i, j = 1, \ldots, n, with LU decomposition A = LU.

Let D be the diagonal matrix D = \mathrm{diag}\big( a^{(n)}_{1,1}, \ldots, a^{(n)}_{n,n} \big) formed from the diagonal of U. Then

A^T = (LU)^T = U^T L^T = U^T D^{-1} D L^T = (D^{-1} U)^T (D L^T) = \tilde{L} \tilde{U},

where \tilde{L} = (D^{-1} U)^T is a lower triangular matrix with 1 along the diagonal and \tilde{U} = D L^T is an upper triangular matrix. The LU decomposition of A^T = A is unique, so

\tilde{L} = L, \;\text{i.e.}\; (D^{-1} U)^T = L, \qquad \text{and} \qquad \tilde{U} = U, \;\text{i.e.}\; D L^T = U.

Thus the LU decomposition of A is

A = LU = L D L^T.

This is the LDL^T decomposition; for this we assume that A is invertible and symmetric, and that Gaussian elimination goes through without requiring any row interchange.
Suppose in addition that all a^{(n)}_{j,j} > 0 and define

\sqrt{D} = \mathrm{diag}\Big( \sqrt{a^{(n)}_{1,1}}, \sqrt{a^{(n)}_{2,2}}, \ldots, \sqrt{a^{(n)}_{n,n}} \Big)

and

\bar{L} = L \sqrt{D} = \begin{pmatrix} \bar\ell_{1,1} & & & \\ \bar\ell_{2,1} & \bar\ell_{2,2} & & \\ \vdots & \vdots & \ddots & \\ \bar\ell_{n,1} & \bar\ell_{n,2} & \ldots & \bar\ell_{n,n} \end{pmatrix},

i.e. with

\bar\ell_{j,j} = \sqrt{a^{(n)}_{j,j}}, \qquad j = 1, \ldots, n,

\bar\ell_{i,j} = \ell_{i,j} \sqrt{a^{(n)}_{j,j}}, \qquad j = 1, \ldots, n-1, \quad i = j+1, \ldots, n.

Then we have

A = L D L^T = (L\sqrt{D})(\sqrt{D} L^T) = (L\sqrt{D})(L\sqrt{D})^T = \bar{L} \bar{L}^T,
i.e. A = \bar{L} \bar{L}^T; this is called the Cholesky decomposition of A. Here A is

- invertible,
- symmetric,
- such that Gaussian elimination goes through without requiring any row interchange, with all a^{(n)}_{j,j} > 0.

When do these properties all hold?
5.2 Positive definite symmetric matrices
A symmetric matrix A = [a_{i,j}] \in \mathbb{R}^{n \times n} is said to be positive definite if the quadratic form

Q(x) = x^T A x = \sum_{i,j=1}^{n} a_{i,j} x_i x_j, \qquad x = (x_1, \ldots, x_n)^T,

is positive, i.e. if

(1) Q(x) \ge 0 for all x \in \mathbb{R}^n, and
(2) Q(x) = 0 if and only if x = 0.
Remark: Let A = [a_{i,j}] be positive definite. Then

(i) A is invertible,
(ii) all a_{j,j} > 0, j = 1, \ldots, n.

proof:

(i) If Ax = 0 with x \neq 0, then Q(x) = x^T A x = 0 with x \neq 0: contradiction!

(ii) Q(e^{(j)}) = a_{j,j} > 0, where e^{(j)} = (0, \ldots, 0, 1, 0, \ldots, 0)^T with the 1 in the jth position.
We will now assume that the matrix A = [a_{i,j}] is positive definite. From Linear Algebra we know that the eigenvalues of A are real and positive.
However, eigenvalues are difficult to calculate. We want to be able to carry out Gaussian elimination without row interchange.
theorem (See Schwarz, Theorem 1.8, page 40)

A symmetric matrix A = [a_{i,j}] \in \mathbb{R}^{n \times n} with a_{1,1} > 0 is positive definite if and only if the reduced matrix [a'_{i,j}] \in \mathbb{R}^{(n-1) \times (n-1)} with

a'_{i,j} = a_{i,j} - \frac{a_{i,1}}{a_{1,1}} a_{1,j}, \qquad i, j = 2, \ldots, n,

is positive definite.
proof: By assumption we have a1,1 > 0 and ai,j = aj,i, i, j = 1, . . . , n
Q(x) = ni,j=1 ai,jxixj= a1,1x
21 + 2
ni=2 ai,1x1xi +
ni,j=2 ai,jxixj
= a1,1
x1 +
ni=2
ai,1a1,1
xi
2+n
i,j=2
ai,j ai,1
a1,1a1,j
xixj
= a1,1 x1 +ni=2
ai,1a1,1
xi2
+ni,j=2 a
i,jxixj
(a) Necessary Suppose that A = [ai,j ] is positive definite. Consider a vectorx = (x2, . . . , xn) = (0, . . . , 0) in Rn1 and setx1 =
ni=2
ai,1a1,1
xi.
Then for x = (x1|x) := (x1, x2, . . . , xn) = (0, 0, . . . , 0) in Rn we have0 < Q(x) = a1,1 02 +
n
i,j=2ai,jxixj
i.e. 0 0)
(b) Sufficiency: Suppose that [ã_{i,j}] is positive definite and let Q(x) = 0 for some x ∈ R^n. In the decomposition above, both the reduced form (i) Σ_{i,j=2}^n ã_{i,j} x_i x_j and the square term (ii) a_{1,1}(x_1 + Σ_{i=2}^n (a_{i,1}/a_{1,1}) x_i)² are nonnegative, so both must vanish.
It follows from (i) that x_2 = x_3 = … = x_n = 0.
Then (ii) gives
a_{1,1} x_1² = 0,
but a_{1,1} > 0, so x_1 = 0, i.e. x = (0, …, 0) ∈ R^n.
Hence A is positive definite.
From this theorem and the above remarks we can use Gaussian elimination without row interchange when the matrix A = [a_{i,j}] is positive definite.
Moreover,
a^{(j)}_{j,j} = a^{(n)}_{j,j} > 0
holds for each j = 1, 2, …, n.
Thus the matrix A = [a_{i,j}] has a Cholesky decomposition
A = LL^T.
Instead of computing the Cholesky factor L as the matrix product L√D, we can calculate its components ℓ_{i,j} directly with the following transformations:
Algorithm for the Cholesky decomposition
Define a^{(1)}_{i,j} := a_{i,j}, i, j = 1, 2, …, n.
Compute for k = 1, 2, …, n−1:
(1) ℓ_{k,k} = √(a^{(k)}_{k,k})
(2) ℓ_{i,k} = a^{(k)}_{i,k} / ℓ_{k,k}, i = k+1, …, n
(3) a^{(k+1)}_{i,j} = a^{(k)}_{i,j} − ℓ_{i,k} ℓ_{j,k}, i, j = k+1, …, n
and finally ℓ_{n,n} = √(a^{(n)}_{n,n}).
N.B. The a^{(k)}_{i,j} components for k > 1 here are not the same as those in the corresponding step of the Gaussian elimination procedure.
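As a minimal sketch (not from the original script), steps (1)-(3) can be written in Python; the function name and the list-of-lists matrix format are my own conventions:

```python
import math

def cholesky(A):
    """Cholesky factor L with A = L L^T, following steps (1)-(3):
    l_kk = sqrt(a_kk), l_ik = a_ik / l_kk, then reduce the trailing block."""
    n = len(A)
    a = [row[:] for row in A]          # working copy a^{(k)}; A is untouched
    L = [[0.0] * n for _ in range(n)]
    for k in range(n):
        L[k][k] = math.sqrt(a[k][k])                 # step (1)
        for i in range(k + 1, n):
            L[i][k] = a[i][k] / L[k][k]              # step (2)
        for i in range(k + 1, n):                    # step (3)
            for j in range(k + 1, n):
                a[i][j] -= L[i][k] * L[j][k]
    return L
```

For example, A = [[4, 2], [2, 3]] gives L = [[2, 0], [1, √2]].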
5.3 Diagonally dominant matrices
The Gaussian elimination procedure is possible without row interchange for diagonally dominant matrices.
A matrix A = [a_{i,j}] ∈ R^{n×n} is called diagonally dominant if
Σ_{j=1, j≠i}^n |a_{i,j}| ≤ |a_{i,i}|, i = 1, …, n,
holds.
Example: the 4×4 matrix
A =
[  2 −1  0  0 ]
[ −1  2 −1  0 ]
[  0 −1  2 −1 ]
[  0  0 −1  2 ]
theorem (See Stummel/Hainer, Theorem 12, page 122)
If A = [a_{i,j}] ∈ R^{n×n} is invertible and diagonally dominant, then the reduced matrix [ã_{i,j}] ∈ R^{(n−1)×(n−1)} with
ã_{i,j} = a_{i,j} − (a_{i,1}/a_{1,1}) a_{1,j}, i, j = 2, …, n,
is invertible and diagonally dominant with no diagonal component equal to 0.
proof: If a diagonal element a_{i,i} of A is equal to zero, then by diagonal dominance the i-th row of A consists entirely of zeros, which is not possible because A is invertible.
In particular, we have a_{1,1} ≠ 0.
Then for i = 2, …, n we have
Σ_{j=2, j≠i}^n |ã_{i,j}| = Σ_{j=2, j≠i}^n | a_{i,j} − (a_{i,1}/a_{1,1}) a_{1,j} |
≤ Σ_{j=2, j≠i}^n |a_{i,j}| + (|a_{i,1}|/|a_{1,1}|) Σ_{j=2, j≠i}^n |a_{1,j}|
≤ ( |a_{i,i}| − |a_{i,1}| ) + (|a_{i,1}|/|a_{1,1}|) ( |a_{1,1}| − |a_{1,i}| )
= |a_{i,i}| − (|a_{i,1}|/|a_{1,1}|) |a_{1,i}|
≤ | a_{i,i} − (a_{i,1}/a_{1,1}) a_{1,i} | = |ã_{i,i}|.
⟹ [ã_{i,j}] is diagonally dominant. This matrix is also invertible, because
det A = a_{1,1} det[ã_{i,j}] ≠ 0
and a_{1,1} ≠ 0.
Then, as above, we must have all ã_{i,i} ≠ 0.
If the matrix A is diagonally dominant and symmetric, then A has an LDL^T decomposition,
A = LDL^T,
because all d_{i,i} = a^{(i)}_{i,i} ≠ 0.
5.4 Band matrices
Literature: Oevel, Kap. 5.7; Schwarz, Kap. 1.3.2; Stummel/Hainer, Kap. 6.2.3
Let n ≥ 1 and 0 ≤ p, q ≤ n−1.
An n×n matrix A = [a_{i,j}] is called a band matrix of band type (p, q) if
a_{i,j} = 0 for j < i − p and for j > i + q, i = 1, 2, …, n.
Schematically, the band consists of the q upper minor diagonals (components denoted by +), the main diagonal, and the p lower minor diagonals (components denoted by −); all components outside this band are zero.
The a_{i,j} in the band are arbitrary (0 is allowed), but the other a_{i,j}, which are not in the band, are all equal to 0.
The number 1 + p + q is called the band width of a band matrix of band type (p, q).
Examples
(1) diagonal matrix: band type (0, 0), D = diag(d_{1,1}, d_{2,2}, …, d_{n,n})
(2) lower triangular matrix: band type (n−1, 0), L = [ℓ_{i,j}] with ℓ_{i,j} = 0 for j > i
(3) upper triangular matrix: band type (0, n−1), U = [r_{i,j}] with r_{i,j} = 0 for j < i
(4) upper Hessenberg matrix: band type (1, n−1)
(5) tridiagonal matrix: band type (1, 1)
If such a band matrix is also positive definite and symmetric or diagonally
dominant, then we can use Gaussian elimination without row interchange. The
reduced matrices are band matrices of the same type, etc.
theorem (See Stummel/Hainer, Theorem 26, page 127)
Let A = [a_{i,j}] be a band matrix of band type (p, q) for which the elimination procedure goes through without row interchange. Then all of the reduced matrices are band matrices of the same type (p, q), and the factors L, U of the LU decomposition of A are band triangular matrices of band types (p, 0) and (0, q).
proof: Consider the first step of the elimination process, A = [a_{i,j}] → [ã_{i,j}], with the linear transformations
ã_{1,j} = a_{1,j}, j = 1, …, n,
and
ã_{i,j} = a_{i,j} − (a_{i,1}/a_{1,1}) a_{1,j}, i, j = 2, …, n.
Let i ≥ 2 and j > i + q:
a_{i,j} = 0 and a_{1,j} = 0 (since j > i + q > 1 + q) ⟹ ã_{i,j} = 0.
Let i ≥ 2 and 2 ≤ j < i − p:
a_{i,j} = 0 and a_{i,1} = 0 ⟹ ã_{i,j} = 0 for 1 < j < i − p.
Together: the reduced (n−1)×(n−1) matrix [ã_{i,j}] with i, j = 2, …, n is a band matrix of band type (p, q).
The other elimination steps follow successively by induction: A^{(k−1)} → A^{(k)} with the first k−1 rows unchanged,
a^{(k)}_{i,j} = a^{(k−1)}_{i,j} = a^{(i)}_{i,j}, i = 1, …, k−1, j = i, …, n,
and a^{(k)}_{i,j} = 0 for j > i + q and for k ≤ j < i − p when i ≥ k.
But U = A^{(n)} = [a^{(n)}_{i,j}] with
a^{(n)}_{i,j} = a^{(i)}_{i,j}, i = 1, …, n, j = i, …, n,
a^{(n)}_{i,j} = 0 for j > i + q, i = 1, …, n,
⟹ U is a band matrix of type (0, q).
In addition,
ℓ_{i,k} = a^{(k)}_{i,k} / a^{(k)}_{k,k}
for i = k+1, …, n and k = 1, …, n−1.
But a^{(k)}_{i,k} = 0 for j = k < i − p with i ≥ k, i.e. for i > k + p and k = 1, …, n−1,
⟹ ℓ_{i,k} = 0 for k < i − p and k = 1, …, n−1, i.e.
L =
[ 1                          ]
[ ℓ_{2,1} 1                  ]
[ ⋮              ⋱           ]
[ ℓ_{n,1} ℓ_{n,2} … ℓ_{n,n−1} 1 ]
is a band matrix of type (p, 0).
The elimination process needs
Σ_{j=1}^{n−1} Σ_{i=j+1}^{min(j+p, n)} ( 1 + Σ_{k=j+1}^{min(j+q, n)} 1 )
multiplications/divisions and a similar number of additions/subtractions. From this expression we see that there are roughly
npq − (p/6)(p² + 3q² + 3(p − q) + 2) multiplications/divisions if p ≤ q,
npq − (q/6)(q² + 3p² + 3(q − p) + 2) multiplications/divisions if q ≤ p.
For p, q ≪ n this is O(npq).
For fixed p and q this order estimate depends only linearly on n.
Compare with O(n³) for a general matrix, i.e. with p = q = n−1.
5.5 Tridiagonal matrices
Literature: Schwarz, Kap. 1.3.3; Stummel/Hainer, Kap. 6.2.4
A tridiagonal matrix is a band matrix of band type (1, 1).
T =
[ t_{1,1} t_{1,2}                 ]
[ t_{2,1} t_{2,2} t_{2,3}         ]
[         t_{3,2} t_{3,3} ⋱       ]
[                 ⋱       ⋱       ]
If Gaussian elimination is possible without row interchange, then L is a band matrix of type (1, 0) and U is a band matrix of type (0, 1). We can replace the entire elimination process by a simple recursion.
Write:
T =
[ a_1 b_1                          ]
[ c_1 a_2 b_2                      ]
[     c_2 a_3 b_3                  ]
[         ⋱   ⋱   ⋱                ]
[         c_{n−2} a_{n−1} b_{n−1}  ]
[                 c_{n−1} a_n      ]
Define d_1 = a_1 and then compute, for j = 2, …, n,
p_j = c_{j−1} / d_{j−1},
d_j = a_j − p_j b_{j−1}.
The LU decomposition of T is given by
L =
[ 1              ]
[ p_2 1          ]
[     p_3 1      ]
[         ⋱  ⋱   ]
[         p_n 1  ]
U =
[ d_1 b_1                         ]
[     d_2 b_2                     ]
[         ⋱   ⋱                   ]
[         d_{n−1} b_{n−1}         ]
[                 d_n             ]
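The recursion is cheap to implement. A Python sketch (function name and list conventions are mine), assuming the elimination goes through without row interchange:

```python
def tridiag_lu(a, b, c):
    """LU factors of the tridiagonal matrix with diagonal a (length n),
    superdiagonal b and subdiagonal c (length n-1 each).
    Returns (p, d): p[j] = c_{j-1}/d_{j-1} (subdiagonal of L; p[0] unused),
    d = diagonal of U, via d_j = a_j - p_j * b_{j-1}."""
    n = len(a)
    d = [0.0] * n
    p = [0.0] * n
    d[0] = a[0]
    for j in range(1, n):
        p[j] = c[j - 1] / d[j - 1]
        d[j] = a[j] - p[j] * b[j - 1]
    return p, d
```

For a_j ≡ 2 and b_j ≡ c_j ≡ −1 this reproduces d_j = (j+1)/j, as in the example below.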
If T is also a symmetric matrix, then T has an LDL^T decomposition,
T = LDL^T, D = diag(d_1, d_2, …, d_n).
Example:
T =
[  2 −1            ]
[ −1  2 −1         ]
[    −1  2 −1      ]
[        ⋱  ⋱  ⋱   ]
i.e. with a_j ≡ 2 for j = 1, …, n and b_j ≡ c_j ≡ −1 for j = 1, …, n−1.
T is symmetric, T^T = T, and diagonally dominant:
|−1| ≤ |2|, |−1| + |−1| ≤ |2|.
Define d_1 = a_1 = 2 and compute, for j = 2, …, n,
p_j = c_{j−1}/d_{j−1} = −1/d_{j−1},
d_j = a_j − p_j b_{j−1} = 2 − 1/d_{j−1}.
By induction this gives
d_j = (j+1)/j, j = 1, …, n,
and
p_j = −1/d_{j−1} = −1/( j/(j−1) ) = −(j−1)/j, j = 2, …, n.
Then we have
L =
[ 1                        ]
[ −1/2  1                  ]
[      −2/3  1             ]
[            ⋱        ⋱    ]
[            −(n−1)/n  1   ]
and
U =
[ 2 −1                          ]
[    3/2 −1                     ]
[        4/3 −1                 ]
[            ⋱        ⋱         ]
[            n/(n−1) −1         ]
[                    (n+1)/n    ]
or, for the LDL^T decomposition, L as above and
D = diag(2, 3/2, 4/3, …, (n+1)/n).
A Cholesky decomposition T = L̃L̃^T with L̃ = L D^{1/2} is also possible, because the matrix T is positive definite.
Chapter 6
The QR decomposition
Literature: Oevel, Kap. 5.9
Consider an invertible n×n matrix A = [a_{i,j}]. If A has an LU decomposition A = LU, then we can solve the system of equations
Ax = b
quickly by applying forward/backward substitution to the simpler triangular systems
Ly = b, Ux = y.
There are other systems which are quickly solvable, e.g. systems with an orthogonal coefficient matrix Q, i.e., for which Q^T Q = I, so Q^{−1} = Q^T. Then we have
Qx = b ⟺ x = Q^T b: one matrix-vector multiplication.
The QR decomposition of an invertible matrix A is
A = QR,
where Q is an orthogonal matrix and R is an upper triangular matrix.
Ax = b ⟺ QRx = b ⟺ Rx = Q^T b,
i.e. one matrix-vector multiplication, then backwards substitution.
Moreover, this method is numerically more stable than the corresponding method with the LU decomposition, in the sense that the upper (right) triangular system is no worse conditioned than the original system.
theorem Let A = QR. Then
cond₂(R) = cond₂(A).
proof: We have
‖A‖₂ = max_{x≠0} ‖Ax‖₂ / ‖x‖₂
with the Euclidean norm ‖x‖₂ = √(x^T x). Thus
‖R‖₂ = ‖Q^T A‖₂ (because R = Q^{−1}A = Q^T A)
= max_{x≠0} ‖Q^T Ax‖₂ / ‖x‖₂
= max_{x≠0} √( x^T (Q^T A)^T (Q^T A) x ) / ‖x‖₂
= max_{x≠0} √( x^T A^T Q Q^T A x ) / ‖x‖₂
= max_{x≠0} √( x^T A^T A x ) / ‖x‖₂ = ‖A‖₂, because QQ^T = I,
as well as
‖R^{−1}‖₂ = ‖A^{−1}Q‖₂
= max_{x≠0} ‖A^{−1}Qx‖₂ / ‖x‖₂
= max_{y=Qx, x≠0} ‖A^{−1}y‖₂ / ‖y‖₂ (because y^T y = x^T Q^T Q x = x^T x)
= max_{y≠0} ‖A^{−1}y‖₂ / ‖y‖₂ = ‖A^{−1}‖₂.
From this we obtain
cond₂(R) = ‖R‖₂ ‖R^{−1}‖₂ = ‖A‖₂ ‖A^{−1}‖₂ = cond₂(A).
Remarks:
(1) cond₂(Q) = 1 for an orthogonal matrix Q;
(2) for an LU decomposition we often have
cond₂(U) ≫ cond₂(A);
(3) the QR decomposition is unique up to a multiplication Q̃ = QD, R̃ = DR by a diagonal matrix
D = diag(±1, ±1, …, ±1).
(See Oevel, Theorem 5.12, page 132.)
We will construct the factors Q and R with the help of Householder matrices.
6.1 Householder matrices
The Householder matrix H = H(v) of a vector
v =
v1...
vn
Rn \ {0}is the n n matrix defined by
H = H(v) = I 2vTv
vvT
wher
I = n n identity matrix
vTv =n
i=1 v2i = |v|22 scalar
vvT = n n matrix [vivj ] .Example:
v = 12 R2 H(v) = 1 00 1 25 1 1 1 22 1 2 2 =
1 2/5 4/54/5 1 8/5
=
3/5 4/5
4/5 3/5
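This construction is a short Python one-liner (a sketch; the helper name is mine):

```python
def householder(v):
    """H(v) = I - (2 / v^T v) * v v^T, returned as a list of lists."""
    n = len(v)
    vtv = float(sum(x * x for x in v))
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vtv
             for j in range(n)] for i in range(n)]
```

Calling householder([1, 2]) reproduces the 2×2 example above.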
theorem A Householder matrix is symmetric and orthogonal.
proof: Write H = H(v) = I − (2/|v|²) vv^T = [h_{i,j}], with |v| instead of |v|₂. Then we have
h_{i,j} = δ_{i,j} − (2/|v|²) v_i v_j = h_{j,i},
where δ_{i,j} is the Kronecker delta symbol, i.e.
δ_{i,j} = 1 if i = j, 0 otherwise,
i.e. H^T = H: H is symmetric.
We also have
HH = ( I − (2/|v|²) vv^T )( I − (2/|v|²) vv^T ) = I − (4/|v|²) vv^T + (4/|v|⁴) (vv^T)(vv^T).
But
(vv^T)(vv^T) = v (v^T v) v^T = |v|² vv^T,
since v^T v is a scalar. Thus we obtain
HH = I − (4/|v|²) vv^T + (4/|v|²) vv^T = I,
i.e. HH = I ⟹ H^{−1} = H.
But H = H^T ⟹ H^{−1} = H^T, or H^T H = HH^T = I, i.e. H is also orthogonal.
Consider a given vector
a = (a_1, a_2, …, a_n)^T ∈ R^n \ {0}
and a given index j ∈ {1, …, n} such that a_j ≠ 0.
Define
1) c = c^{(j)}(a) := sgn(a_j) √( a_j² + a_{j+1}² + … + a_n² ),
2) v = v^{(j)}(a) := (0, …, 0, c + a_j, a_{j+1}, …, a_n)^T
(where c + a_j ≠ 0 due to the choice of sign in 1)),
3) H = H^{(j)}(a) := I − (2/(v^T v)) vv^T if v = v^{(j)}(a) ≠ 0, and H = I otherwise.
I otherwise
theorem H = H^{(j)}(a) is a Householder matrix with the following properties:
(a) H^{(j)}(a) a = (a_1, …, a_{j−1}, −c, 0, …, 0)^T, with −c in the j-th position;
(b) H^{(j)}(a) b = b for each vector b = (b_1, b_2, …, b_{j−1}, 0, …, 0)^T.
proof
2 v^T a = 2(c + a_j) a_j + 2( a_{j+1}² + … + a_n² )
= c² + 2ca_j + a_j² + a_{j+1}² + … + a_n² (since c² = a_j² + … + a_n²)
= (c + a_j)² + a_{j+1}² + … + a_n² = v^T v.
H^{(j)}(a) a = ( I − (2/(v^T v)) vv^T ) a = Ia − (2v^T a/(v^T v)) v
= a − v = (a_1, …, a_{j−1}, −c, 0, …, 0)^T, with −c in the j-th position.
It is clear that v^T b = 0, so
H^{(j)}(a) b = ( I − (2/(v^T v)) vv^T ) b = Ib − (2/(v^T v)) v (v^T b) = b.
6.2 Construction of the QR factors
Write A = A^{(1)} = ( a^{(1)}_1 | a^{(1)}_2 | … | a^{(1)}_n ), i.e. with the column vectors of A.
Define H^{(1)} = H^{(1)}(a^{(1)}_1) and A^{(2)} = H^{(1)}A^{(1)}.
Then we have
A^{(2)} = H^{(1)} ( a^{(1)}_1 | a^{(1)}_2 | … | a^{(1)}_n ) = ( H^{(1)}a^{(1)}_1 | H^{(1)}a^{(1)}_2 | … | H^{(1)}a^{(1)}_n ),
where the first column H^{(1)}a^{(1)}_1 = (−c^{(1)}, 0, …, 0)^T has zeros below its first entry,
i.e. a^{(2)}_k = H^{(1)}a^{(1)}_k for k = 2, …, n.
Define now
H^{(2)} = H^{(2)}(a^{(2)}_2)
and
A^{(3)} = H^{(2)}A^{(2)} = ( H^{(2)}a^{(2)}_1 | H^{(2)}a^{(2)}_2 | H^{(2)}a^{(2)}_3 | … | H^{(2)}a^{(2)}_n ).
The first column is invariant (by property (b)), the first two columns of A^{(3)} have zeros below the diagonal and remain unchanged in all later steps, and a^{(3)}_k = H^{(2)}a^{(2)}_k for k = 3, …, n.
Repeat for j = 2, …, n−1:
H^{(j)} = H^{(j)}(a^{(j)}_j), A^{(j+1)} = H^{(j)}A^{(j)},
until the end:
A^{(n)} = H^{(n−1)} ⋯ H^{(2)}H^{(1)}A^{(1)}, an upper triangular matrix!
i.e. R = A^{(n)} = H^{(n−1)} ⋯ H^{(1)}A^{(1)}.
But the H^{(j)} are orthogonal and symmetric: (H^{(j)})^{−1} = (H^{(j)})^T = H^{(j)},
⟹ A = A^{(1)} = H^{(1)}H^{(2)} ⋯ H^{(n−1)} R = QR,
where Q := H^{(1)}H^{(2)} ⋯ H^{(n−1)} is orthogonal as the product of orthogonal matrices.
But Q is not necessarily a symmetric matrix, since the product of symmetric matrices need not be symmetric.
6.2.1 Example

A^{(1)} = A =
[ 1 2 3 ]
[ 0 0 1 ]
[ 2 3 4 ]

a^{(1)}_1 = (1, 0, 2)^T,
c^{(1)} = sgn(1) √(1² + 0² + 2²) = √5,
v = v^{(1)} = (1 + √5, 0, 2)^T,
v^T v = (1 + √5)² + 0² + 2² = 10 + 2√5 = 2√5 (1 + √5),

vv^T =
[ (1+√5)²  0  2(1+√5) ]
[ 0        0  0       ]
[ 2(1+√5)  0  4       ]

(2/(v^T v)) vv^T =
[ 1 + 1/√5  0  2/√5     ]
[ 0         0  0        ]
[ 2/√5      0  1 − 1/√5 ]

H^{(1)} = I − (2/(v^T v)) vv^T =
[ −1/√5  0  −2/√5 ]
[ 0      1   0    ]
[ −2/√5  0   1/√5 ]

A^{(2)} = H^{(1)}A^{(1)} =
[ −√5  −8/√5  −11/√5 ]
[ 0     0      1     ]
[ 0    −1/√5  −2/√5  ]

a^{(2)}_2 = (−8/√5, 0, −1/√5)^T,

c^{(2)} = sgn(0) √( 0² + (−1/√5)² ) = 1/√5 (with the convention sgn(0) := +1!),

v = v^{(2)} = (0, 1/√5, −1/√5)^T, v^T v = 0 + 1/5 + 1/5 = 2/5,

and

vv^T =
[ 0   0     0   ]
[ 0   1/5  −1/5 ]
[ 0  −1/5   1/5 ]

H^{(2)} = I − (2/(v^T v)) vv^T =
[ 1 0 0 ]
[ 0 0 1 ]
[ 0 1 0 ]

A^{(3)} = H^{(2)}A^{(2)} =
[ −√5  −8/√5  −11/√5 ]
[ 0    −1/√5  −2/√5  ]
[ 0     0      1     ]

(The first row is unchanged, while the second and third rows are exchanged here automatically!)

R = A^{(3)},

Q = H^{(1)}H^{(2)} =
[ −1/√5  −2/√5  0 ]
[ 0       0     1 ]
[ −2/√5   1/√5  0 ]

Q is not symmetric, but Q^T Q = I: orthogonal!
QR = A here!
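The whole construction can be sketched in Python; the sgn(0) := +1 convention follows the text, while the function names and list-based matrix helpers are my own:

```python
import math

def matmul(X, Y):
    """Plain list-of-lists matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def qr_householder(A):
    """Q = H^(1) H^(2) ... H^(n-1), R = A^(n) upper triangular."""
    n = len(A)
    R = [row[:] for row in A]
    Q = [[float(i == j) for j in range(n)] for i in range(n)]
    for j in range(n - 1):
        s = math.sqrt(sum(R[i][j] ** 2 for i in range(j, n)))
        c = -s if R[j][j] < 0 else s        # c = sgn(a_j) * s, sgn(0) := +1
        v = [0.0] * j + [c + R[j][j]] + [R[i][j] for i in range(j + 1, n)]
        vtv = sum(x * x for x in v)
        if vtv == 0.0:
            continue                        # H^(j) = I in this case
        H = [[float(i == k) - 2.0 * v[i] * v[k] / vtv for k in range(n)]
             for i in range(n)]
        R = matmul(H, R)
        Q = matmul(Q, H)
    return Q, R
```

For the matrix A of the worked example this reproduces R with R[0][0] = −√5 and R[2][2] = 1, and QR = A.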
Chapter 7
Iterative methods for linear
systems
Literature: Oevel, Kap. 5.17; Schwarz, Kap. 11.1; Stummel/Hainer, Kap. 8.1-8.2
The numerical solution of a linear system of equations
Ax = b
through the LU or QR decomposition needs only finitely many arithmetic operations, O(n³). But the computational cost is so high for n ≫ 1 that such a direct method is not practical. In such cases, in particular if the matrix has many zero components, an iterative method can be realistic although in principle it needs infinitely many iterations.
Let T be an n×n matrix and consider the iterative method
x^(k+1) = T x^(k) + b
with x^(k), x^(k+1), b ∈ R^n.
If the sequence {x^(k)} converges, then by continuity we have
x̄ = T x̄ + b,
i.e. the limit x̄ is a fixed point of the linear mapping F(x) = Tx + b on R^n, or
(I − T)x̄ = b, which is uniquely solvable with solution x̄ = (I − T)^{−1}b if and only if I − T is invertible.
Question: When does the iteration sequence {x^(k)} converge?
We can use the Banach contraction mapping (fixed point) theorem, i.e.,
F contraction ⟹ x^(k) → x̄, where x̄ is the unique fixed point of F.
In our case: F is a contraction ⟺ T is a contraction, because
‖F(x) − F(y)‖ = ‖Tx − Ty‖ ≤ ‖T‖ ‖x − y‖ < ‖x − y‖ if ‖T‖ < 1.
(We assume here that the matrix and vector norms are consistent!)
Question: What has this got to do with the linear system Ax = b?
Let A be invertible and let x̄ = A^{−1}b be the unique solution. Then we have
x̄ = x̄ + (Ax̄ − b).
This equation suggests the iterative method
x^(k+1) = (I + A)x^(k) − b,
i.e. with T = I + A and −b in place of b.
But T = I + A is almost never a contraction.
Assume now that a_{i,i} ≠ 0 for i = 1, …, n, and define
D = diag(a_{1,1}, a_{2,2}, …, a_{n,n}).
This D is invertible with
D^{−1} = diag(1/a_{1,1}, …, 1/a_{n,n}).
Consider the equation
b = Ax = (A − D)x + Dx
or
Dx = (D − A)x + b, i.e. x = D^{−1}(D − A)x + D^{−1}b.
The last equation suggests the iterative method
x^(k+1) = (I − D^{−1}A)x^(k) + D^{−1}b, k = 0, 1, 2, …
or, componentwise,
x_i^(k+1) = ( −Σ_{j=1, j≠i}^n a_{i,j} x_j^(k) + b_i ) / a_{i,i}, i = 1, …, n, k = 0, 1, 2, …
This is called the Jacobi iteration method.
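One step of this componentwise formula, as a Python sketch (the function name and list conventions are mine):

```python
def jacobi_step(A, b, x):
    """One Jacobi iteration x^(k) -> x^(k+1) for Ax = b (lists of lists)."""
    n = len(A)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]
```

Iterating from x^(0) = 0 converges for strictly diagonally dominant A, the sufficient condition derived next.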
In this case we have
T = D^{−1}(D − A) = I − D^{−1}A.
Question: When is this T a contraction?
Consider the max matrix norm (row sum norm)
‖T‖_∞ = max_{i=1,…,n} Σ_{j=1}^n |t_{i,j}|.
For T = I − D^{−1}A we have
t_{i,j} = δ_{i,j} − a_{i,j}/a_{i,i} = 0 if i = j, and −a_{i,j}/a_{i,i} if i ≠ j,
where δ_{i,j} is the Kronecker delta symbol.
Hence we have
Σ_{j=1}^n |t_{i,j}| = Σ_{j=1, j≠i}^n |a_{i,j}| / |a_{i,i}| = (1/|a_{i,i}|) Σ_{j=1, j≠i}^n |a_{i,j}|
as well as
‖T‖_∞ = max_{i=1,…,n} (1/|a_{i,i}|) Σ_{j=1, j≠i}^n |a_{i,j}|.
We will have ‖T‖_∞ < 1 when
(1/|a_{i,i}|) Σ_{j=1, j≠i}^n |a_{i,j}| < 1, i = 1, …, n,
i.e.
Σ_{j=1, j≠i}^n |a_{i,j}| < |a_{i,i}|, i = 1, …, n.
Such a matrix A is called strictly diagonally dominant.
Under this sufficient condition, all iteration sequences converge to the unique solution x̄ = A^{−1}b.
The convergence order is p = 1, i.e., linear convergence, as for any contractive iteration method.
Question: Can we speed up this convergence?
As a variation of the Jacobi method, we can use the components of the vector x^(k+1) which we have just calculated in the calculation of the following components of x^(k+1), i.e.
x_1^(k+1) = ( −Σ_{j=2}^n a_{1,j} x_j^(k) + b_1 ) / a_{1,1}
x_i^(k+1) = ( −Σ_{j=1}^{i−1} a_{i,j} x_j^(k+1) − Σ_{j=i+1}^n a_{i,j} x_j^(k) + b_i ) / a_{i,i}, i = 2, …, n−1
x_n^(k+1) = ( −Σ_{j=1}^{n−1} a_{n,j} x_j^(k+1) + b_n ) / a_{n,n}
This new method is called the Gauss-Seidel iteration method.
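A sketch of one Gauss-Seidel sweep in Python (names are mine); the only change from the Jacobi step is that updated components are used immediately:

```python
def gauss_seidel_step(A, b, x):
    """One Gauss-Seidel sweep; x is updated in place and returned."""
    n = len(A)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]   # x[j] for j < i is already updated
    return x
```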
We can represent it in a matrix-vector form. Consider the following additive decomposition of A:
A = D − L − R,
where D = diag[a_{i,i}] (as above) and
L =
[ 0                              ]
[ −a_{2,1}  0                    ]
[ −a_{3,1} −a_{3,2}  0           ]
[ ⋮                     ⋱        ]
[ −a_{n,1}  …  −a_{n,n−1}  0     ]
R =
[ 0 −a_{1,2} −a_{1,3} … −a_{1,n} ]
[   0        −a_{2,3} … −a_{2,n} ]
[            ⋱                   ]
[              0      −a_{n−1,n} ]
[                      0         ]
The Gauss-Seidel method then reads
Dx^(k+1) = Lx^(k+1) + Rx^(k) + b
or
x^(k+1) = (D − L)^{−1}Rx^(k) + (D − L)^{−1}b, k = 0, 1, 2, …,
where
D − L =
[ a_{1,1}                       ]
[ a_{2,1} a_{2,2}               ]
[ ⋮               ⋱             ]
[ a_{n,1} a_{n,2} …  a_{n,n}    ]
is invertible, because det(D − L) = a_{1,1}a_{2,2} ⋯ a_{n,n} ≠ 0.
In this case the iteration method is
x^(k+1) = T x^(k) + b̃
with T = (D − L)^{−1}R and b̃ = (D − L)^{−1}b.
theorem: (See Oevel, Theorem 5.35, pages 177-178)
Let A be strictly diagonally dominant. Then
‖(D − L)^{−1}R‖_∞ ≤ ‖I − D^{−1}A‖_∞ < 1,
i.e. T is a contraction and the iteration sequences converge to the unique solution x̄ = A^{−1}b.
proof: We have already seen in the Jacobi method above that
K := ‖I − D^{−1}A‖_∞ = max_{i=1,…,n} (1/|a_{i,i}|) Σ_{j=1, j≠i}^n |a_{i,j}| < 1
when A is strictly diagonally dominant. Consider now the mapping
y = Tx.
For i = 1,
y_1 = −(1/a_{1,1}) Σ_{j=2}^n a_{1,j} x_j
⟹ |y_1| ≤ (1/|a_{1,1}|) Σ_{j=2}^n |a_{1,j}| |x_j| ≤ max_{i=1,…,n} (1/|a_{i,i}|) Σ_{j=1, j≠i}^n |a_{i,j}| · ‖x‖_∞ = K‖x‖_∞,
i.e. |y_1| ≤ K‖x‖_∞.
The proof continues by induction. Let i = 2, …, n and assume that |y_j| ≤ K‖x‖_∞ for j = 1, …, i−1. Then we have
y_i = −(1/a_{i,i}) Σ_{j=1}^{i−1} a_{i,j} y_j − (1/a_{i,i}) Σ_{j=i+1}^n a_{i,j} x_j
⟹ |y_i| ≤ (1/|a_{i,i}|) Σ_{j=1}^{i−1} |a_{i,j}| |y_j| + (1/|a_{i,i}|) Σ_{j=i+1}^n |a_{i,j}| |x_j|
≤ ‖x‖_∞ ( K (1/|a_{i,i}|) Σ_{j=1}^{i−1} |a_{i,j}| + (1/|a_{i,i}|) Σ_{j=i+1}^n |a_{i,j}| ).
But K < 1 ⟹
|y_i| ≤ ‖x‖_∞ (1/|a_{i,i}|) Σ_{j=1, j≠i}^n |a_{i,j}| ≤ max_{i=1,…,n} (1/|a_{i,i}|) Σ_{j=1, j≠i}^n |a_{i,j}| · ‖x‖_∞ = K‖x‖_∞.
By induction we have
|y_i| ≤ K‖x‖_∞, i = 1, …, n,
⟹ ‖y‖_∞ = max_{i=1,…,n} |y_i| ≤ K‖x‖_∞,
i.e. ‖Tx‖_∞ ≤ K‖x‖_∞. Hence
‖T‖_∞ = max_{x≠0} ‖Tx‖_∞ / ‖x‖_∞ ≤ K,
i.e.
‖(D − L)^{−1}R‖_∞ (the contraction constant of the Gauss-Seidel method)
≤ ‖I − D^{−1}A‖_∞ (the contraction constant of the Jacobi method) < 1.
⟹ the Gauss-Seidel method should converge quicker than the Jacobi method, because its contraction constant is smaller.
(But the convergence order still remains linear, because the GS method is a contraction iteration method.)
7.1 Relaxation methods
Literature: Oevel, Kap. 5.18; Schwarz, Kap. 11.1; Stummel/Hainer, Kap. 8.3
We can often modify an iteration method to ensure the convergence to a desired fixed point or to accelerate the convergence.
Consider the following mappings in R¹:
(1) f₁(x) = (1/2)x + 2
(2) f₂(x) = 2x − 1
The function f₁ has a unique fixed point x̄ = 4, and the iterations
x^(k+1) = f₁(x^(k))
converge to x̄ = 4, since f₁ is a contraction with contraction constant 1/2. We can accelerate this convergence with the iteration method
x^(k+1) = f₁,ω(x^(k)),
where
f₁,ω(x) = (1 − ω)x + ω( (1/2)x + 2 )
and ω is a parameter with ω ≠ 0.
x̄ = 4 is the unique fixed point of the function f₁,ω for each ω ≠ 0. But
f₁,ω(x) = (1 − ω/2)x + 2ω.
f₁,ω is a contraction with contraction constant |1 − ω/2| for 0 < ω < 4, because
|1 − ω/2| < 1 for 0 < ω < 4.
But |1 − ω/2| < 1/2, i.e. −1/2 < 1 − ω/2 < 1/2, when 1 < ω < 3.
For such ω the iteration method converges quicker!
Consider now f₂(x) = 2x − 1 with the unique fixed point x̄ = 1. This mapping is not a contraction and the iteration sequences x^(k+1) = f₂(x^(k)) diverge. Instead consider now
x^(k+1) = f₂,ω(x^(k))
with
f₂,ω(x) = (1 − ω)x + ω f₂(x) = (1 − ω)x + ω(2x − 1).
x̄ = 1 is the unique fixed point for each ω ≠ 0.
But f₂,ω(x) = (1 + ω)x − ω is a contraction with contraction constant
|1 + ω| < 1,
provided −1 < 1 + ω < 1, i.e. when −2 < ω < 0. For such ω the iteration sequences converge to the desired fixed point x̄ = 1.
Such modified iteration methods are called relaxation methods and the parameter ω is called the relaxation parameter.
Now consider a linear system of equations
Ax = b,
with A an invertible n×n matrix. The unique solution is x̄ = A^{−1}b.
Assume that
a_{i,i} ≠ 0, i = 1, …, n,
so the diagonal matrix D = diag[a_{i,i}] is invertible. The Jacobi method here is
x^(k+1) = D^{−1}(D − A)x^(k) + D^{−1}b,
i.e. with the mapping
F(x) = D^{−1}(D − A)x + D^{−1}b = x + D^{−1}(b − Ax).
Clearly x̄ = F(x̄), but without additional assumptions we cannot be certain that the iterations of the Jacobi method will converge to x̄.
Consider instead the mapping
F_ω(x) = (1 − ω)x + ωF(x) = x + ωD^{−1}(b − Ax),
which has a unique fixed point (when ω ≠ 0)
x̄ = F_ω(x̄) = A^{−1}b.
We want to choose ω so that the iteration method
x^(k+1) = F_ω(x^(k)) = x^(k) + ωD^{−1}(b − Ax^(k))
converges to x̄.
For this we assume that A is positive definite and symmetric. Then a_{i,i} > 0 for i = 1, …, n, and the diagonal matrix
D = diag[a_{i,i}]
is also positive definite and invertible. We define the following scalar product (·,·)_D and the corresponding norm ‖·‖_D by
(x, y)_D = x^T Dy = Σ_{j=1}^n a_{j,j} x_j y_j, ‖x‖_D = √(x^T Dx) = √((x, x)_D) = √( Σ_{j=1}^n a_{j,j} x_j² )
for x, y ∈ R^n.
theorem: (Stummel/Hainer, Theorem 8, page 165)
Let A ∈ R^{n×n} be positive definite and symmetric. The relaxed Jacobi method
x^(k+1) = F_ω(x^(k)) := x^(k) + ωD^{−1}( b − Ax^(k) )
converges to x̄ = A^{−1}b for each initial vector x^(0) ∈ R^n if and only if the parameter ω and the largest eigenvalue λ₁ of D^{−1}A satisfy
0 < ω < 2/λ₁.
proof: The mapping
x ↦ D^{−1}Ax, x ∈ R^n,
is symmetric and positive definite w.r.t. the scalar product (·,·)_D in the sense that
(D^{−1}Ax, y)_D = (D^{−1}Ax)^T Dy = x^T A^T D^{−1}Dy = x^T Ay = x^T D(D^{−1}Ay) = (x, D^{−1}Ay)_D
and
(D^{−1}Ax, x)_D = x^T Ax > 0 for all x ≠ 0, and = 0 if and only if x = 0.
The matrix D^{−1}A thus has a complete orthonormal system of eigenvectors v₁, …, v_n in the space R^n w.r.t. the scalar product (·,·)_D, with associated real positive eigenvalues λ₁ ≥ … ≥ λ_n > 0, i.e.
D^{−1}Av_j = λ_j v_j, (v_i, v_j)_D = δ_{i,j}, Av_j = λ_j Dv_j.
The matrix T_ω = I − ωD^{−1}A has eigenvalues 1 − ωλ_j with associated eigenvectors v₁, …, v_n, and
‖T_ω x‖²_D = Σ_{j=1}^n |1 − ωλ_j|² |(x, v_j)_D|²
≤ max_{j=1,…,n} |1 − ωλ_j|² Σ_{j=1}^n |(x, v_j)_D|²
= q² ‖x‖²_D, with q := max_{j=1,…,n} |1 − ωλ_j|,
for each x ∈ R^n.
If λ₁ is the largest eigenvalue of D^{−1}A and 0 < ω < 2/λ₁, then we have
1 > 1 − ωλ_j ≥ 1 − ωλ₁ > −1, j = 1, …, n ⟹ q < 1,
i.e. T_ω (and hence F_ω) is a contraction w.r.t. the norm ‖·‖_D on R^n. For ω ≤ 0 or ω ≥ 2/λ₁ we have
q = |1 − ωλ₁| ≥ 1.
Thus for the initial value x^(0) = v₁ (the eigenvector for λ₁):
‖T_ω^k v₁‖_D = q^k ≥ 1.
In this case the necessary and sufficient condition T_ω^k → 0 for the convergence of the method is violated.
Remarks:
(1) The a-posteriori and a-priori error estimates
‖x^(k) − x̄‖_D ≤ (q/(1 − q)) ‖x^(k−1) − x^(k)‖_D ≤ (q^k/(1 − q)) ‖x^(0) − x^(1)‖_D
hold under the condition that 0 < ω < 2/λ₁.
(2) We can choose the parameter ω so that the factor
q_ω = max_{j=1,…,n} |1 − ωλ_j|
is as small as possible. Namely, for ω* = 2/(λ₁ + λ_n):
q_{ω*} = (λ₁ − λ_n)/(λ₁ + λ_n) ≤ q_ω for all ω ∈ (0, 2/λ₁).
(3) The eigenvalues of D^{−1}A are often unknown, but we have the following estimate:
λ₁ ≤ ‖D^{−1}A‖_∞ = 1 + max_{i=1,…,n} Σ_{j=1, j≠i}^n |a_{i,j}/a_{i,i}|
⟹ convergence if
0 < ω < 2/‖D^{−1}A‖_∞ ≤ 2/λ₁.
If A is strictly diagonally dominant, then ‖D^{−1}A‖_∞ < 2. But we also have convergence when A is not strictly diagonally dominant (provided ω is small enough!).
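The relaxed Jacobi iteration is a two-line loop; a Python sketch (names mine; the choice of ω is left to the user, subject to 0 < ω < 2/λ₁):

```python
def relaxed_jacobi(A, b, omega, x0, steps):
    """Iterate x <- x + omega * D^{-1} (b - A x) for 'steps' steps."""
    n = len(A)
    x = list(x0)
    for _ in range(steps):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + omega * r[i] / A[i][i] for i in range(n)]
    return x
```

With omega = 1 this is exactly the Jacobi method.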
7.2 The SOR method
Literature: Oevel, Kap. 5.18; Schwarz, Kap. 11.1; Stummel/Hainer, Kap. 8.3
The SOR method is a relaxed version of the Gauss-Seidel method.
Let A be an n×n invertible matrix with a_{i,i} ≠ 0 for i = 1, …, n and consider the additive decomposition
A = D − L − R
(see the first part of the chapter).
The Gauss-Seidel method is
x^(k+1) = (D − L)^{−1}Rx^(k) + (D − L)^{−1}b
and has the fixed point x̄ = A^{−1}b, i.e.
Dx^(k+1) = Lx^(k+1) + Rx^(k) + b
with
Dx̄ = Lx̄ + Rx̄ + b
or
ωDx̄ = ω[ Lx̄ + Rx̄ + b ] (ω ≠ 0)
Dx̄ = (1 − ω)Dx̄ + ω[ Lx̄ + Rx̄ + b ].
As with the Gauss-Seidel method we can introduce an iteration method which uses the newly computed components of x^(k+1) straight away, i.e.
Dx^(k+1) = (1 − ω)Dx^(k) + ω[ Lx^(k+1) + Rx^(k) + b ]
or
(D − ωL)x^(k+1) = [ (1 − ω)D + ωR ]x^(k) + ωb
or
( (1/ω)D − L ) x^(k+1) = ( (1/ω − 1)D + R ) x^(k) + b.
We have
det( (1/ω)D − L ) = det
[ a_{1,1}/ω                        ]
[ a_{2,1} a_{2,2}/ω                ]
[ ⋮               ⋱                ]
[ a_{n,1} a_{n,2} …  a_{n,n}/ω     ]
= (1/ω^n) a_{1,1} ⋯ a_{n,n} ≠ 0,
⟹ (1/ω)D − L is invertible for all ω ≠ 0. Therefore we have
x^(k+1) = ( (1/ω)D − L )^{−1} ( (1/ω − 1)D + R ) x^(k) + ( (1/ω)D − L )^{−1} b.
This method is called the SOR method [SOR = Successive Over-Relaxation].
Let A be positive definite and symmetric. From Theorem 16 in Stummel/Hainer (page 167) we have
x^(k) → x̄ = A^{−1}b for all x^(0) ∈ R^n ⟺ 0 < ω < 2.
The parameter region (0, 2) here is more favourable than that for the relaxed Jacobi method, i.e., (0, 2/λ₁), because it does not depend on the particular matrix A.
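Componentwise, one SOR sweep is a Gauss-Seidel update blended with the old value; a Python sketch (names mine):

```python
def sor_step(A, b, x, omega):
    """One SOR sweep: x_i <- (1 - omega)*x_i + omega*(Gauss-Seidel value)."""
    n = len(A)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - s) / A[i][i]
    return x
```

Setting omega = 1 recovers the plain Gauss-Seidel sweep.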
Chapter 8
Krylov space methods
Literature Plato, Kap. 11
8.1 Krylov spaces
We consider again the approximation of the solution of a linear system of equations Ax = b, where A ∈ R^{N×N} is regular (i.e. invertible) and b ∈ R^N, with the unique solution x* = A^{−1}b ∈ R^N.
Let
{0} ⊂ D₁ ⊂ D₂ ⊂ ⋯ ⊂ R^N (8.1)
be linear subspaces (finitely or infinitely many), which we will specify in more detail later.
We will investigate the following approaches to determining different sequences of vectors x_n ∈ D_n, n = 1, 2, ….
Definition 1 (Orthogonal residual approach)
x_n ∈ D_n, Ax_n − b ∈ D_n^⊥, n = 1, 2, …. (8.2)
Definition 2 (Minimal residual approach)
x_n ∈ D_n, ‖Ax_n − b‖₂ = min_{x∈D_n} ‖Ax − b‖₂, n = 1, 2, …. (8.3)
Here
M^⊥ := { y ∈ R^N : ⟨y, x⟩₂ = 0 for each x ∈ M }
denotes the orthogonal complement of an arbitrary set M ⊂ R^N, while ‖·‖₂ denotes the Euclidean vector norm and ⟨·,·⟩₂ the corresponding Euclidean scalar product.
The vector Ax − b is called the residual of x ∈ R^N (with respect to the system of equations Ax = b).
Krylov spaces play a leading role in the choice of the subspaces in the above approaches.
Definition 3 The Krylov spaces corresponding to a given matrix A ∈ R^{N×N} and vector b are defined by
K_n(A, b) := span{ b, Ab, …, A^{n−1}b } ⊂ R^N, n = 1, 2, …,
with K₀(A, b) := {0}.
8.1.1 Properties of Krylov spaces
Krylov spaces are clearly increasing:
{0} = K₀(A, b) ⊂ K₁(A, b) ⊂ K₂(A, b) ⊂ ⋯,
and there is a uniquely determined integer 0 ≤ n* ≤ N such that
K_{n*−1}(A, b) ⊊ K_{n*}(A, b) = K_{n*+1}(A, b) = ⋯,
x* = A^{−1}b ∈ K_{n*}(A, b), x* ∉ K_n(A, b) for n = 0, 1, …, n* − 1.
These properties follow immediately from the next lemma.
Lemma 4 Given a regular matrix A ∈ R^{N×N} and a vector b ∈ R^N, the following statements are equivalent for each integer n ≥ 1:
(a) the vectors b, Ab, …, A^n b are linearly dependent;
(b) K_n(A, b) = K_{n+1}(A, b);
(c) AK_n(A, b) ⊂ K_n(A, b);
(d) there exists a linear subspace M ⊂ R^N with dim M ≤ n such that b ∈ M and M is invariant w.r.t. A, i.e. A(M) ⊂ M;
(e) x* := A^{−1}b ∈ K_n(A, b).
Proof hint: The Cayley-Hamilton theorem says that the matrix A ∈ R^{N×N} is a zero of its own characteristic polynomial, i.e.
p_A(A) = A^N + α_{N−1}A^{N−1} + ⋯ + α₁A¹ + α₀I_N = 0 ∈ R^{N×N},
where
p_A(z) = det(zI_N − A) = z^N + α_{N−1}z^{N−1} + ⋯ + α₁z¹ + α₀.
8.2 The OR approach for symmetric, positive definite matrices
Here we consider the Orthogonal Residual approach for general subspaces under the additional assumption that A ∈ R^{N×N} is a symmetric and positive definite matrix, i.e.,
A ∈ R^{N×N}, A = A^T > 0. (8.4)
Define
⟨x, y⟩_A = x^T Ay, ‖x‖_A = √(x^T Ax), x, y ∈ R^N.
Since A = A^T > 0, it follows that ⟨·,·⟩_A is a scalar product on R^N with corresponding norm ‖·‖_A.
In the Orthogonal Residual approach it is often more convenient to derive error estimates in the norm ‖·‖_A rather than the Euclidean norm ‖·‖₂.
8.2.1 Existence, uniqueness and minimality
Here we discuss the existence and uniqueness of the vectors x_n that arise in the Orthogonal Residual approach.
Theorem 5 For a given symmetric and positive definite matrix A ∈ R^{N×N} the vectors x_n, n = 1, 2, … in the Orthogonal Residual approach with general subspaces D_n are unique and
‖x_n − x*‖_A = min_{x∈D_n} ‖x − x*‖_A, n = 1, 2, …. (8.5)
Proof
1. Uniqueness: Assume for a fixed n that two vectors x_n and x̃_n satisfy property (8.2). Then
x_n − x̃_n ∈ D_n and A(x_n − x̃_n) ∈ D_n^⊥,
and it follows that
‖x_n − x̃_n‖²_A = ⟨x_n − x̃_n, A(x_n − x̃_n)⟩₂ = 0 ⟹ x_n = x̃_n.
2. Solvability: Consider an arbitrary basis d₀, d₁, …, d_{m−1} of D_n. Then a vector
x_n = Σ_{j=0}^{m−1} γ_j d_j ∈ D_n
satisfies property (8.2) if and only if
Ax_n − b ∈ D_n^⊥ ⟺ ⟨Ax_n − b, d_k⟩₂ = 0 for k = 0, …, m−1,
i.e. if and only if the m coefficients γ₀, …, γ_{m−1} satisfy the system of m linear equations
Σ_{j=0}^{m−1} ⟨Ad_j, d_k⟩₂ γ_j = ⟨b, d_k⟩₂ for k = 0, …, m−1. (8.6)
This system of equations is uniquely solvable due to the uniqueness of the solution which we showed above. (There are just three possibilities: no solution at all, exactly one solution, or infinitely many solutions.)
3. Minimality: Finally, for an arbitrary vector x ∈ D_n we have
‖x − x*‖²_A = ‖x_n − x* + x − x_n‖²_A
= ‖x_n − x*‖²_A + 2⟨A(x_n − x*), x − x_n⟩₂ + ‖x − x_n‖²_A
≥ ‖x_n − x*‖²_A,
where the middle term vanishes because A(x_n − x*) = Ax_n − b ∈ D_n^⊥ and x − x_n ∈ D_n;
i.e., the unique solution x_n satisfies the minimality property in (8.5).
8.2.2 The OR approach for an A-conjugate basis
In the proof of Theorem 5 we used an arbitrary basis for the subspace D_n. The resulting system of equations (8.6) is much easier to solve when we use a particular basis.
Definition 6 Let A ∈ R^{N×N} be a symmetric and positive definite matrix. Then the vectors d₀, d₁, …, d_{m−1} ∈ R^N \ {0} are said to be A-conjugate if
⟨Ad_k, d_j⟩₂ = 0 for k ≠ j.
Remark: A-conjugacy and pairwise orthogonality w.r.t. the scalar product ⟨·,·⟩_A are the same.
The Orthogonal Residual approach is simple to implement for symmetric, positive definite matrices A ∈ R^{N×N} when an A-conjugate basis of D_n is given.
Theorem 7 Suppose that for a given symmetric, positive definite matrix A ∈ R^{N×N} and given A-conjugate vectors d₀, d₁, … ∈ R^N \ {0} we define
D_n = span{ d₀, …, d_{n−1} }, n = 1, 2, ….
Then the vectors x_n in the Orthogonal Residual approach w.r.t. these D_n have the representation, for n = 1, 2, …,
x_n = Σ_{j=0}^{n−1} γ_j d_j with γ_j = −⟨r_j, d_j⟩₂ / ⟨Ad_j, d_j⟩₂, (8.7)
where r_j := Ax_j − b for j ≥ 1 and r₀ := −b.
Proof We make use of the A-conjugacy in the proof of Theorem 5 and obtain
x_n = Σ_{j=0}^{n−1} γ_j d_j with γ_j = ⟨b, d_j⟩₂ / ⟨Ad_j, d_j⟩₂, j = 0, 1, …, n−1. (8.8)
The number γ_j in (8.8) agrees with (8.7), which is clear for j = 0 and follows for j ≥ 1 from
⟨−r_n, d_n⟩₂ = ⟨b − Ax_n, d_n⟩₂ = ⟨b, d_n⟩₂ − Σ_{j=0}^{n−1} γ_j ⟨Ad_j, d_n⟩₂ = ⟨b, d_n⟩₂, n = 1, 2, …,
because
⟨Ad_j, d_n⟩₂ = 0 for j ≠ n.
Remark 1: We see from the representation (8.7) that the number γ_j is independent of n, so with the stepsize α_n := γ_n we have
x_{n+1} = x_n + α_n d_n, (8.9)
r_{n+1} = r_n + α_n Ad_n (8.10)
with x₀ := 0 and r₀ = −b.
This gives a further simplification of the implementation of (8.7), because we have already calculated the matrix-vector product Ad_n which is needed for the determination of α_n. Thus we do not need any more matrix-vector products in order to calculate the residual r_{n+1} through (8.10).
This representation is important because most stopping criteria are based on the value of the residual.
Remark 2: In view of formula (8.9) the vector d_n is called the search direction and the number α_n is called the stepsize. They are optimal in the following sense:
‖x_{n+1} − x*‖_A = min_{t∈R} ‖x_n + t d_n − x*‖_A.
8.3 The CG method for positive definite matrices
We will now use the Krylov spaces in the Orthogonal Residual approach.
Definition 8 For a symmetric, positive definite matrix A ∈ R^{N×N}, the conjugate gradient method is given by (8.9)-(8.10) with the special choice of subspaces
D_n = K_n(A, b), n = 0, 1, …. (8.11)
This method is often abbreviated as the CG method.
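Anticipating the formulas (8.12) and (8.15)-(8.16) derived in the next subsection, the CG method can be sketched in Python, with the sign conventions x₀ = 0, r₀ = −b, d₀ = b of this text (the function name and list format are mine):

```python
def cg(A, b, tol=1e-12):
    """Conjugate gradient sketch for symmetric positive definite A:
    x <- x + alpha*d, r <- r + alpha*A d, d <- -r + beta*d."""
    n = len(b)
    x = [0.0] * n
    r = [-bi for bi in b]                                 # r_0 = -b
    d = list(b)                                           # d_0 = b
    rr = sum(ri * ri for ri in r)
    for _ in range(n):                                    # at most N steps
        if rr <= tol:
            break
        Ad = [sum(A[i][j] * d[j] for j in range(n)) for i in range(n)]
        alpha = rr / sum(Ad[i] * d[i] for i in range(n))  # stepsize (8.15)
        x = [x[i] + alpha * d[i] for i in range(n)]       # (8.9)
        r = [r[i] + alpha * Ad[i] for i in range(n)]      # (8.10)
        rr_new = sum(ri * ri for ri in r)
        beta = rr_new / rr                                # (8.16)
        d = [-r[i] + beta * d[i] for i in range(n)]       # (8.12)
        rr = rr_new
    return x
```

In exact arithmetic the loop terminates after at most n* ≤ N iterations with x = x*.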
8.3.1 Computing A-conjugate search directions in K_n(A, b)
Here we use the notation from Theorem 7. Starting from an already constructed A-conjugate basis d₀, …, d_{n−1} of K_n(A, b), we will construct an A-conjugate basis for K_{n+1}(A, b) by the Gram-Schmidt orthogonalisation of the vectors
d₀, …, d_{n−1}, r_n ∈ R^N
with respect to the scalar product ⟨·,·⟩_A. In the proof of Lemma 9 we will see that a Gram-Schmidt orthogonalisation of the two vectors d_{n−1}, r_n ∈ R^N suffices.
Lemma 9 Suppose for a given symmetric, positive definite matrix A ∈ R^{N×N} that the search directions are chosen so that
d_n := −r_n + β_{n−1} d_{n−1} with β_{n−1} := ⟨Ar_n, d_{n−1}⟩₂ / ⟨Ad_{n−1}, d_{n−1}⟩₂ (8.12)
for n = 1, …, n* − 1, and d₀ := b, where n* denotes the first index for which r_{n*} = 0.
These vectors d₀, d₁, …, d_{n*−1} ∈ R^N \ {0} are A-conjugate and
span{ d₀, …, d_{n−1} } = span{ b, r₁, r₂, …, r_{n−1} } = K_n(A, b) (8.13)
holds for n = 1, …, n*.
Proof: We will show the A-conjugacy of the vectors d_0, d_1, d_2, …, d_{n∗-1} ∈ ℝ^N as well as the two identities in (8.13) by means of mathematical induction
over n = 1, 2, …, n∗. The first step is clear from

span {d_0} = span {b} = K_1(A, b).

Now consider a fixed index 1 ≤ n ≤ n∗ - 1 and assume that the procedure
(8.12) delivers a system d_0 = b, d_1, d_2, …, d_{n-1} of A-conjugate vectors with
the property (8.13). From (8.2) we have r_n ⊥ K_n(A, b) and, in the case r_n ≠ 0, the vectors d_0, …, d_{n-1}, r_n are linearly independent. A Gram-Schmidt
orthogonalisation of these vectors w.r.t. the scalar product ⟨·, ·⟩_A delivers
the vector
d_n := -r_n + ∑_{j=0}^{n-1} (⟨A r_n, d_j⟩₂ / ⟨A d_j, d_j⟩₂) d_j (∗)= -r_n + β_{n-1} d_{n-1}, (8.14)

where (∗) follows from the facts that A K_{n-1}(A, b) ⊆ K_n(A, b) and r_n ⊥ K_n(A, b):

⟨A r_n, d_j⟩₂ = ⟨r_n, A d_j⟩₂ = 0, j = 0, 1, …, n - 2.
The vectors d_0, …, d_{n-1}, d_n are A-conjugate by the construction and

span {d_0, …, d_{n-1}, d_n} = span {b, r_1, r_2, …, r_n}

holds. In view of equation (8.10) we also have span {b, r_1, r_2, …, r_n} ⊆ K_{n+1}(A, b).
The required equality then follows on dimensional grounds. □
Remark: The solution of the system of equations Ax = b is obtained simul-
taneously with the stopping criterion described in Lemma 9: r_{n∗} = 0 means x_{n∗} = x.
Since the two vector systems in (8.13) are linearly independent, it follows that

dim K_n(A, b) = n, for n = 0, 1, …, n∗,

and hence, necessarily, that n∗ ≤ N.
As an immediate consequence of the proof of Lemma 9 we obtain the follow-
ing representation for the stepsize, which is typically used in numerical
implementations.
Corollary 10 In the notation of Lemma 9 we have the representations

α_n = ‖r_n‖₂² / ⟨A d_n, d_n⟩₂, n = 0, 1, …, n∗ - 1, (8.15)

β_{n-1} = ‖r_n‖₂² / ‖r_{n-1}‖₂², n = 1, …, n∗ - 1, (8.16)

with r_0 := -b.
Proof: With r_n ⊥ K_n(A, b) and the expression (8.12) for the search direction
d_n we obtain ⟨r_n, d_n⟩₂ = -‖r_n‖₂². Together with (8.7) this yields (8.15).
This representation (8.15) for α_n, together with the identity r_n = r_{n-1} +
α_{n-1} A d_{n-1} (i.e., the identity (8.10) with n replaced by n - 1), gives

‖r_n‖₂² = ⟨r_n, r_{n-1}⟩₂ + α_{n-1} ⟨r_n, A d_{n-1}⟩₂ = β_{n-1} ‖r_{n-1}‖₂²,

where ⟨r_n, r_{n-1}⟩₂ = 0 and the last equality follows from (8.15) and the
definition of β_{n-1} in (8.12); thus (8.16) also holds for β_{n-1}. □
8.3.2 The Algorithm for the CG method
We combine the above considerations and results to obtain the following algo-
rithm:
Algorithm for the CG method

Step 0: Set r_0 = A x_0 - b.

Step n = 0, 1, …:

(a) If r_n = 0, then stop; n∗ = n.

(b) If, on the other hand, r_n ≠ 0, then in Step n + 1 proceed as follows:

d_n = -r_0 if n = 0, and
d_n = -r_n + β_{n-1} d_{n-1} with β_{n-1} = ‖r_n‖₂² / ‖r_{n-1}‖₂² if n ≥ 1,

x_{n+1} = x_n + α_n d_n with α_n = ‖r_n‖₂² / ⟨A d_n, d_n⟩₂,

r_{n+1} = r_n + α_n A d_n.
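The steps above can be sketched in Python with numpy; this is a minimal illustration (the function and variable names are ours, not from the script), using the script's sign convention r_n = A x_n - b:

```python
import numpy as np

def cg(A, b, x0=None, tol=1e-10, maxiter=None):
    """Conjugate gradient sketch following the algorithm above.

    Sign convention of the script: r_n = A x_n - b, so the search
    direction is d_n = -r_n + beta_{n-1} d_{n-1}.
    """
    N = len(b)
    x = np.zeros(N) if x0 is None else x0.astype(float)
    r = A @ x - b                      # Step 0: r_0 = A x_0 - b
    d = -r                             # d_0 = -r_0 (= b when x_0 = 0)
    rr = r @ r
    for _ in range(maxiter or N):
        if np.sqrt(rr) <= tol:         # stopping criterion on the residual
            break
        Ad = A @ d                     # the only matrix-vector product per step
        alpha = rr / (Ad @ d)          # (8.15): alpha_n = ||r_n||^2 / <A d_n, d_n>
        x = x + alpha * d              # (8.9)
        r = r + alpha * Ad             # (8.10): no extra product needed
        rr_new = r @ r
        beta = rr_new / rr             # (8.16): beta_{n-1} = ||r_n||^2 / ||r_{n-1}||^2
        d = -r + beta * d              # (8.12)
        rr = rr_new
    return x
```

Note how (8.10) lets the loop reuse the single product A d_n both for the stepsize and for the residual update, as discussed in Remark 1.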
Remark: The expression "conjugate gradient method" originates in the follow-
ing two properties:

(a) For each index n the residual r_n is identical to the gradient of the energy
functional

J(x) = (1/2) ⟨Ax, x⟩₂ - ⟨x, b⟩₂

at x_n, i.e., r_n = ∇J(x_n).

(b) As an immediate consequence of (8.2) and (8.13), the residuals are pairwise
orthogonal:

⟨r_k, r_j⟩₂ = 0 for k ≠ j.
8.3.3 The CG method for the normal equations

If the regular system of linear equations Ax = b is symmetric and indefinite
or nonsymmetric, then we can apply the classical CG method to the normal
equations

AᵀA x = Aᵀb.
In this case the method is called the CGNR method.
As a direct consequence of Theorem 7, the following minimality property is
obtained for the iterates of the CGNR method:

‖A x_n - b‖₂ = min_{x ∈ K_n(AᵀA, Aᵀb)} ‖Ax - b‖₂. (8.17)

This property justifies the letter R ("residual") in the CGNR notation, while
the letter N stands for "normal equations". It is also clear from this property
that with the special choice of subspaces D_n = K_n(AᵀA, Aᵀb) for n = 0, 1, …, the
CGNR method coincides with the Minimal Residual approach (8.3).
Two matrix-vector multiplications are required in each iteration step of the
CG algorithm applied to the normal equations AᵀAx = Aᵀb (i.e., to calculate
A d_n and Aᵀ(A d_n)), but the numerically more expensive calculation of the
matrix AᵀA itself is not required.
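As a sketch of this remark, CGNR can be written so that only the two products A d_n and Aᵀ(A d_n) appear and AᵀA is never formed (names are illustrative, assuming numpy):

```python
import numpy as np

def cgnr(A, b, tol=1e-10, maxiter=None):
    """CG applied to the normal equations A^T A x = A^T b (CGNR sketch).

    A^T A is never formed; each iteration uses exactly the two
    matrix-vector products A d_n and A^T (A d_n).
    """
    N = A.shape[1]
    x = np.zeros(N)
    r = -(A.T @ b)                 # residual of the normal equations at x_0 = 0
    d = -r
    rr = r @ r
    for _ in range(maxiter or N):
        if np.sqrt(rr) <= tol:
            break
        Ad = A @ d                 # first matrix-vector product
        AtAd = A.T @ Ad            # second product; avoids forming A^T A
        alpha = rr / (AtAd @ d)
        x = x + alpha * d
        r = r + alpha * AtAd
        rr_new = r @ r
        d = -r + (rr_new / rr) * d
        rr = rr_new
    return x
```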
8.4 The GMRES method and Arnoldi process
The GMRES method provides another possibility for solving a regular system of
linear equations Ax = b with a symmetric indefinite or a nonsymmetric matrix
A ∈ ℝ^{N×N}.
Definition 11 The GMRES method is defined by the Minimal Residual ap-
proach (8.3) with the special choice of subspaces D_n = K_n(A, b), i.e., so that

x_n ∈ K_n(A, b), ‖A x_n - b‖₂ = min_{x ∈ K_n(A, b)} ‖Ax - b‖₂, n = 0, …, n∗. (8.18)
The abbreviation GMRES stands for generalized minimal residual method.
For n = 1, 2, …, the basic procedure for realising the GMRES method is as
follows:

(a) Use the Arnoldi process (which will be described below) to generate an
orthogonal basis for K_n(A, b) with respect to the Euclidean scalar product.

(b) With this orthogonal basis, the minimization problem (8.18) can be refor-
mulated as a simpler minimization problem which can be quickly solved.
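Steps (a)-(b) can be sketched naively with numpy: build an orthonormal basis Q of K_n(A, b) and minimize ‖A Q y - b‖₂ by a small least-squares solve. This is only an illustration of (8.18) under our own naming; it does not use the efficient reformulation alluded to in (b):

```python
import numpy as np

def gmres_naive(A, b, n_steps):
    """Naive GMRES sketch: minimize ||A x - b||_2 over x in K_n(A, b).

    Builds an orthonormal basis Q of K_n(A, b) by Gram-Schmidt
    (the Arnoldi vectors) and solves the small least-squares problem
    for x = Q y directly.
    """
    Q = [b / np.linalg.norm(b)]          # q_1 = b / ||b||_2
    for _ in range(n_steps - 1):
        v = A @ Q[-1]                    # next direction A q_n
        for q in Q:                      # Gram-Schmidt orthogonalization
            v = v - (v @ q) * q
        norm = np.linalg.norm(v)
        if norm == 0.0:                  # A q_n in span{q_1, ..., q_n}: stop
            break
        Q.append(v / norm)
    Qn = np.column_stack(Q)              # basis of K_n(A, b)
    y, *_ = np.linalg.lstsq(A @ Qn, b, rcond=None)
    return Qn @ y                        # x_n = Q_n y minimizes ||A x - b||_2
```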
8.4.1 The Arnoldi process
The Arnoldi process is easy to explain: starting from a given normalized vector
q_1 ∈ ℝ^N, a sequence of pairwise orthonormal vectors q_1, q_2, … w.r.t. the classi-
cal scalar product ⟨·, ·⟩₂ is generated by Gram-Schmidt orthogonalization of
the vectors q_1, A q_1, A q_2, …. (The vectors required are generated in the course
of the process and are not known a priori.)
The following algorithm describes the exact procedure.
Algorithm for the Arnoldi process
Starting from a given vector b ≠ 0, b ∈ ℝ^N, set

q_1 = b / ‖b‖₂ ∈ ℝ^N

and proceed as follows for n = 1, 2, …:

(1) Orthogonalization: Set

h_{k,n} := (A q_n)ᵀ q_k ∈ ℝ, k = 1, 2, …, n, (8.19)

q̃_{n+1} := A q_n - ∑_{k=1}^{n} h_{k,n} q_k ∈ ℝ^N. (8.20)

(2) Normalization: The process stops if q̃_{n+1} = 0, in which case the
stopping index is denoted by n∗ = n. On the other hand, if q̃_{n+1} ≠ 0, then set

h_{n+1,n} := ‖q̃_{n+1}‖₂ ∈ ℝ, q_{n+1} := q̃_{n+1} / h_{n+1,n} ∈ ℝ^N. (8.21)
Remark: The definitions (8.19)-(8.20) imply that the Arnoldi process will
stop the first time that A q_n ∈ span{q_1, …, q_n} holds.
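Assuming numpy, the process above might be implemented as follows; the names are illustrative, and the exact breakdown test q̃_{n+1} = 0 is replaced by a small threshold, as is usual in floating-point arithmetic:

```python
import numpy as np

def arnoldi(A, b, max_steps=None, tol=1e-12):
    """Arnoldi process sketch following (8.19)-(8.21).

    Returns the orthonormal vectors q_1, q_2, ... as the columns of Q,
    together with the coefficients h_{k,n} collected in a
    Hessenberg-shaped array H.
    """
    max_steps = max_steps or len(b)
    Q = [b / np.linalg.norm(b)]           # q_1 = b / ||b||_2
    H = np.zeros((max_steps + 1, max_steps))
    m = 0                                 # number of completed steps
    for n in range(max_steps):
        w = A @ Q[n]                      # A q_n
        for k in range(n + 1):
            H[k, n] = w @ Q[k]            # (8.19): h_{k,n}
            w = w - H[k, n] * Q[k]        # (8.20): orthogonalize
        H[n + 1, n] = np.linalg.norm(w)
        m = n + 1
        if H[n + 1, n] <= tol:            # stop: A q_n in span{q_1, ..., q_n}
            break
        Q.append(w / H[n + 1, n])         # (8.21): normalization
    return np.column_stack(Q), H[:m + 1, :m]
```

The inner loop subtracts each projection immediately (modified Gram-Schmidt), which computes the same h_{k,n} in exact arithmetic but is numerically more stable.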
The following Lemma summarizes the most important properties associated
with the Arnoldi process.
Lemma 12 The vectors q_1, …, q_{n∗} ∈ ℝ^N produced by the Arnoldi pro-
cess are pairwise orthonormal, and

span{q_1, …, q_n} = span{q_1, …, q_{n-1}, A q_{n-1}} = K_n(A, b) (8.22)

for n = 1, …, n∗. If the matrix A is regular, then the unique solution
x ∈ ℝ^N of the system of equations Ax = b satisfies

x ∈ K_{n∗}(A, b). (8.23)
Proof: The pairwise orthonormality is obtained by mathematical induction
w.r.t. n using (8.19):

⟨q_{n+1}, q_k⟩₂ = (1 / h_{n+1,n}) ((A q_n)ᵀ q_k - h_{k,n}) = 0, k = 1, 2, …, n,

for n = 1, 2, …, n∗ - 1. The property ‖q_{n+1}‖₂ = 1 follows from (8.21).
The two identities in (8.22) will now be proved, also by mathematical induc-
tion w.r.t. n. In view of q_1 = b/‖b‖₂, the assertion is true for n = 1.
The induction step n - 1 → n (for 1 ≤ n - 1 < n ≤ n∗) will now be verified. Since n ≤ n∗,
the vectors q_1, …, q_{n-1}, A q_{n-1} ∈ ℝ^N are linearly independent, and thus by the
construction the first identity in (8.22) is true. The second identity in (8.22)
is obtained as follows: the relation ⊆ follows from A q_{n-1} ∈ A K_{n-1}(A, b) ⊆ K_n(A, b), and the identity = then results from the dimension argument

n = dim span {q_1, …, q_{n-1}, A q_{n-1}} ≤ dim K_n(A, b) ≤ n.
We prove the statement (8.23) as follows: from the definition of n∗ we have

A q_{n∗} ∈ span {q_1, …, q_{n∗}} = K_{n∗}(A, b),

and by construction

A q_k ∈ K_{k+1}(A, b) ⊆ K_{n∗}(A, b), k = 1, …, n∗ - 1,

which combine to give A(K_{n∗}(A, b)) ⊆ K_{n∗}(A, b). Again by a dimension argu-
ment, the mapping A : K_{n∗}(A, b) → K_{n∗}(A, b) is bijective, so, since b ∈ K_{n∗}(A, b),
we then have x = A⁻¹b ∈ K_{n∗}(A, b). □
Remark 1: It is clear from (8.22) that

dim K_n(A, b) = n for n = 1, …, n∗,

so the Arnoldi process necessarily stops after at most N steps: n∗ ≤ N.
Remark 2: If the matrix A is symmetric, then the identity

h_{k,n} = q_nᵀ A q_k = 0

holds for k ≤ n - 2, because

A q_k ∈ K_{k+1}(A, b) ⊆ K_{n-1}(A, b)

and q_n ⊥ K_{n-1}(A, b). The Gram-Schmidt orthogonalization (8.