TRANSCRIPT
-
8/12/2019 Numerical Linear Algebra Applications Jin
1/196
Numerical Linear AlgebraAnd Its Applications
Xiao-Qing JIN 1 Yi-Min WEI 2
August 29, 2008
1Department of Mathematics, University of Macau, Macau, P. R. China.2Department of Mathematics, Fudan University, Shanghai, P.R. China
To Our Families
CONTENTS
page
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Basic symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Basic problems in NLA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Why shall we study numerical methods? . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Matrix factorizations (decompositions) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Perturbation and error analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6 Operation cost and convergence rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Chapter 2 Direct Methods for Linear Systems . . . . . . . . . . . . . . . . . 9
2.1 Triangular linear systems and LU factorization . . . . . . . . . . . . . . . . . . . . . 9
2.2 LU factorization with pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Cholesky factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Chapter 3 Perturbation and Error Analysis . . . . . . . . . . . . . . . . . . . 25
3.1 Vector and matrix norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Perturbation analysis for linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Error analysis on floating point arithmetic . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4 Error analysis on partial pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Chapter 4 Least Squares Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.1 Least squares problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2 Orthogonal transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.3 QR decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Chapter 5 Classical Iterative Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.1 Jacobi and Gauss-Seidel method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2 Convergence analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3 Convergence rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.4 SOR method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Chapter 6 Krylov Subspace Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.1 Steepest descent method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.2 Conjugate gradient method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.3 Practical CG method and convergence analysis . . . . . . . . . . . . . . . . . . . . 92
6.4 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.5 GMRES method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Chapter 7 Nonsymmetric Eigenvalue Problems . . . . . . . . . . . . . . . 111
7.1 Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.2 Power method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.3 Inverse power method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.4 QR method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.5 Real version of QR algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Chapter 8 Symmetric Eigenvalue Problems . . . . . . . . . . . . . . . . . . . 131
8.1 Basic spectral properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
8.2 Symmetric QR method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
8.3 Jacobi method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.4 Bisection method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.5 Divide-and-conquer method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Chapter 9 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
9.2 Background of BVMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
9.3 Strang-type preconditioner for ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
9.4 Strang-type preconditioner for DDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
9.5 Strang-type preconditioner for NDDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
9.6 Strang-type preconditioner for SPDDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Preface
Numerical linear algebra, also called matrix computation, has been a center of scientific and engineering computing since 1946, when the first modern computer was born. Most problems in science and engineering finally become problems in matrix computation. Therefore, it is important for us to study numerical linear algebra. This book gives an elementary introduction to matrix computation and it also includes some new results obtained in recent years. In the beginning of this book, we first give an outline of numerical linear algebra in Chapter 1.
In Chapter 2, we introduce Gaussian elimination, a basic direct method, for solving general linear systems. Usually, Gaussian elimination is used for solving a dense linear system of medium size with no special structure. The operation cost of Gaussian elimination is O(n^3), where n is the size of the system. The pivoting technique is also studied.

In Chapter 3, in order to discuss the effects of perturbation and error on numerical solutions, we introduce vector and matrix norms and study their properties. The error analysis on floating point operations and on the partial pivoting technique is also given.
In Chapter 4, linear least squares problems are studied. We will concentrate on the problem of finding the least squares solution of an overdetermined linear system Ax = b where A has more rows than columns. Some orthogonal transformations and the QR decomposition are used to design efficient algorithms for solving least squares problems.
We study classical iterative methods for the solution of Ax = b in Chapter 5. Iterative methods are quite different from direct methods such as Gaussian elimination. Direct methods based on an LU factorization of the matrix A are prohibitive in terms of computing time and computer storage if A is quite large. Usually, in most large problems, the matrices are sparse. The sparsity may be lost during the LU factorization procedure, and then, at the end of the LU factorization, the storage becomes a crucial issue. For such problems, we can use a class of methods called iterative methods. We only consider some classical iterative methods in this chapter.
In Chapter 6, we introduce another, more recently proposed class of iterative methods called Krylov subspace methods. We will only study two of these Krylov subspace methods: the conjugate gradient (CG) method and the generalized minimum residual (GMRES) method. The CG method, proposed in 1952, is one of the best known iterative methods for solving symmetric positive definite linear systems. The GMRES method was proposed in 1986 for solving nonsymmetric linear systems. The preconditioning technique is also studied.
Eigenvalue problems are particularly interesting in scientific computing. In Chapter
7, nonsymmetric eigenvalue problems are studied. We introduce some well-known methods such as the power method, the inverse power method and the QR method.
The symmetric eigenvalue problem, with its nice properties and rich mathematical theory, is one of the most interesting topics in numerical linear algebra. In Chapter 8, we will study this topic. The symmetric QR iteration method, the Jacobi method, the bisection method and a divide-and-conquer technique will be discussed in this chapter.
In Chapter 9, we will briefly survey some of the latest developments in using boundary value methods for solving systems of ordinary differential equations with initial values. These methods require the solutions of one or more nonsymmetric, large and sparse linear systems. Therefore, we will use the GMRES method of Chapter 6 with some preconditioners for solving these linear systems. One of the main results is that if an A_{k1,k2}-stable boundary value method is used for an m-by-m system of ODEs, then the preconditioned matrix can be decomposed as I + L, where I is the identity matrix and the rank of L is at most 2m(k1 + k2). It follows that when the GMRES method is applied to the preconditioned system, the method will converge in at most 2m(k1 + k2) + 1 iterations. Applications to different delay differential equations are also given.
If any other mathematical topic is as fundamental to the mathematical sciences as calculus and differential equations, it is numerical linear algebra.

-- L. Trefethen and D. Bau III
Acknowledgments: We would like to thank Professor Raymond H. F. Chan of
the Department of Mathematics, Chinese University of Hong Kong, for his constant encouragement, long-standing friendship, and financial support; and Professor Z. H. Cao of the Department of Mathematics, Fudan University, for his many helpful discussions and useful suggestions. We also would like to thank our friend Professor Z. C. Shi for his encouraging support and valuable comments. Of course, special appreciation goes to two important institutions in the authors' lives: the University of Macau and Fudan University, for providing a wonderful intellectual atmosphere for writing this book. Most of the writing was done during evenings, weekends and holidays. Finally, thanks are also due to our families for their endless love, understanding, encouragement and support, all essential to the completion of this book. The most heartfelt thanks to all of them!
The publication of the book is supported in part by the research grants No. RG024/01-02S/JXQ/FST, No. RG031/02-03S/JXQ/FST and No. RG064/03-04S/JXQ/FST from the University of Macau; the research grant No. 10471027 from the National Natural Science Foundation of China; and some financial support from the Shanghai Education Committee and Fudan University.
Authors' words on the corrected and revised second printing: In its second printing, we corrected some minor mathematical and typographical mistakes in the first printing of the book. We would like to thank all those people who pointed these out to us. Additional comments and some revisions have been made in Chapter 7. The references have been updated. More exercises are also to be found in the book. The second printing of the book is supported by the research grant No. RG081/04-05S/JXQ/FST.
Chapter 1
Introduction
Numerical linear algebra (NLA) is also called matrix computation. It has been a center of scientific and engineering computing since the first modern computer came into the world around 1946. Most problems in science and engineering are finally transformed into problems in NLA. Thus, it is very important for us to study NLA. This book gives an elementary introduction to NLA and it also includes some new results obtained in recent years.
1.1 Basic symbols
We will use the following symbols throughout this book.
Let R denote the set of real numbers, C denote the set of complex numbers, and i = sqrt(-1). Let R^n denote the set of real n-vectors and C^n denote the set of complex n-vectors.
Vectors will almost always be column vectors.
Let R^{m×n} denote the linear vector space of m-by-n real matrices and C^{m×n} denote the linear vector space of m-by-n complex matrices.
We will use upper case letters such as A, B, C, etc., to denote matrices, and lower case letters such as x, y, z, etc., to denote vectors.
The symbol aij will denote the (i, j)-th entry of a matrix A.

The symbol A^T will denote the transpose of the matrix A and A^* will denote the conjugate transpose of the matrix A.

Let a1, ..., am ∈ R^n (or C^n). We will use span{a1, ..., am} to denote the linear vector space of all the linear combinations of a1, ..., am.
Let rank(A) denote the rank of the matrix A.
Let dim(S) denote the dimension of the vector space S.
We will use det(A) to denote the determinant of the matrix A and use diag(a11, ..., ann) to denote the n-by-n diagonal matrix:

    diag(a11, ..., ann) = [ a11  0    ...  0   ]
                          [ 0    a22  ...  0   ]
                          [ ...  ...  ...  ... ]
                          [ 0    0    ...  ann ].
For a matrix A = [aij], the symbol |A| will denote the matrix with entries (|A|)ij = |aij|.

The symbol I will denote the identity matrix, i.e.,

    I = [ 1    0    ...  0   ]
        [ 0    1    ...  0   ]
        [ ...  ...  ...  ... ]
        [ 0    0    ...  1   ],

and ei will denote the i-th unit vector, i.e., the i-th column vector of I.

We will use || · || to denote a norm of a matrix or vector. The symbols || · ||_1, || · ||_2 and || · ||_∞ will denote the p-norm with p = 1, 2, ∞, respectively.
As in MATLAB, in algorithms, A(i, j) will denote the (i, j)-th entry of the matrix A; A(i, :) and A(:, j) will denote the i-th row and the j-th column of A, respectively; A(i1 : i2, k) will denote the column vector constructed from the i1-th through the i2-th entries of the k-th column of A; A(k, j1 : j2) will denote the row vector constructed from the j1-th through the j2-th entries of the k-th row of A; A(k : l, p : q) will denote the (l - k + 1)-by-(q - p + 1) submatrix constructed from rows k through l and columns p through q of A.
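For readers following along in code, this MATLAB-style notation has a direct analogue in NumPy slicing; the sketch below is illustrative only, since NumPy indexing is 0-based and end-exclusive, unlike the 1-based, inclusive ranges in the text:

```python
import numpy as np

# The book's MATLAB-style notation next to NumPy's 0-based, end-exclusive
# slicing: A(i1:i2, k) in the text corresponds to A[i1-1:i2, k-1] here.
A = np.arange(1, 17).reshape(4, 4)   # 4-by-4 matrix with entries 1..16

row2 = A[1, :]        # A(2, :)   -- the 2nd row
col3 = A[:, 2]        # A(:, 3)   -- the 3rd column
piece = A[0:2, 3]     # A(1:2, 4) -- entries 1..2 of the 4th column
sub = A[1:3, 0:2]     # A(2:3, 1:2) -- a 2-by-2 submatrix
print(sub)
```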
1.2 Basic problems in NLA
NLA includes the following three important problems, which will be studied in this book:
(1) Find the solution of linear systems
    Ax = b

where A is an n-by-n nonsingular matrix and b is an n-vector.

(2) Linear least squares problems: For any m-by-n matrix A and any m-vector b, find an n-vector x such that

    ||Ax - b||_2 = min_{y ∈ R^n} ||Ay - b||_2.

(3) Eigenvalue problems: For any n-by-n matrix A, find a part (or all) of its eigenvalues and corresponding eigenvectors. We remark here that a complex number λ is called an eigenvalue of A if there exists a nonzero vector x ∈ C^n such that

    Ax = λx,

where x is called the eigenvector of A associated with λ.
Besides these main problems, there are many other fundamental problems in NLA, for instance, total least squares problems, matrix equations, generalized inverses, inverse eigenvalue problems, and singular value problems.
1.3 Why shall we study numerical methods?
To answer this question, let us consider the following linear system

    Ax = b

where A is an n-by-n nonsingular matrix and x = (x1, x2, ..., xn)^T. If we use the well-known Cramer rule, then we have the following solution:

    x1 = det(A1)/det(A),  x2 = det(A2)/det(A),  ...,  xn = det(An)/det(A),

where Ai, for i = 1, 2, ..., n, are matrices with the i-th column of A replaced by the vector b. Then we should compute n + 1 determinants: det(Ai), i = 1, 2, ..., n, and det(A).
There are

    [n!(n - 1)](n + 1) = (n - 1)(n + 1)!

multiplications. When n = 25, by using a computer performing 10 billion operations per second, we need

    24 · 26! / (10^10 · 3600 · 24 · 365) ≈ 30.6 billion years.
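The estimate above can be checked with a few lines of arithmetic; this back-of-the-envelope script is an illustration, not part of the original text:

```python
import math

# Back-of-the-envelope check of the Cramer-rule estimate:
# (n - 1)(n + 1)! multiplications at 10 billion operations per second.
n = 25
mults = (n - 1) * math.factorial(n + 1)      # 24 * 26!
seconds = mults / 1e10                       # 10^10 operations/sec
years = seconds / (3600 * 24 * 365)
print(f"{years:.3e}")                        # roughly 3.1e10 years, ~30.6 billion
```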
If one uses Gaussian elimination, it requires

    sum_{i=1}^{n} (i - 1)(i + 1) = sum_{i=1}^{n} i^2 - n = n(n + 1)(2n + 1)/6 - n = O(n^3)

multiplications. Then, in less than 1 second, we could solve a 25-by-25 linear system by using the same computer. From the above discussion, we note that when the same problem is solved by different numerical methods, the results can be much different. Therefore, it is essential for us to study the properties of numerical methods.
1.4 Matrix factorizations (decompositions)
For any linear system Ax = b, if we can factorize (decompose) A as A = LU, where L is a lower triangular matrix and U is an upper triangular matrix, then we have

    Ly = b,
    Ux = y.        (1.1)

By substituting, we can easily solve (1.1) and then Ax = b. Therefore, matrix factorizations (decompositions) are very important tools in NLA. The following theorem is basic and useful in linear algebra, see [17].
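The two-step solve in (1.1) can be sketched as follows; the factors L and U below are an arbitrary pair made up for the demonstration:

```python
import numpy as np

# Two-step solve of Ax = b via A = LU, as in (1.1).  L and U here are a
# made-up pair chosen only to illustrate the idea.
L = np.array([[1.0, 0.0],
              [2.0, 1.0]])          # lower triangular
U = np.array([[4.0, 3.0],
              [0.0, 5.0]])          # upper triangular
A = L @ U
b = np.array([7.0, 19.0])

y = np.linalg.solve(L, b)           # step 1: Ly = b  (forward)
x = np.linalg.solve(U, y)           # step 2: Ux = y  (backward)
assert np.allclose(A @ x, b)
```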
Theorem 1.1 (Jordan Decomposition Theorem) If A ∈ C^{n×n}, then there exists a nonsingular matrix X ∈ C^{n×n} such that

    X^{-1} A X = J ≡ diag(J1, J2, ..., Jp),

or A = X J X^{-1}, where J is called the Jordan canonical form of A and

    Ji = [ λi  1               ]
         [     λi  1           ]
         [         ...  ...    ]
         [              ...  1 ]
         [                  λi ]  ∈ C^{ni×ni},

for i = 1, 2, ..., p, are called Jordan blocks, with n1 + ... + np = n. The Jordan canonical form of A is unique up to the permutation of the diagonal Jordan blocks. If A ∈ R^{n×n} with only real eigenvalues, then the matrix X can be taken to be real.
1.5 Perturbation and error analysis
The solutions provided by numerical algorithms are seldom absolutely correct. Usually, there are two kinds of errors. First, errors appear in the input data, caused by prior computations or measurements. Second, there may be errors caused by the algorithms themselves, because of approximations made within the algorithms. Thus, we need to carry out a perturbation and error analysis.
(1) Perturbation.
For a given x, we want to compute the value of a function f(x). Suppose there is a perturbation δx of x and |δx|/|x| is very small. We want to find a positive number c(x), as small as possible, such that

    |f(x + δx) - f(x)| / |f(x)| ≤ c(x) |δx| / |x|.

Then c(x) is called the condition number of f(x) at x. If c(x) is large, we say that the function f is ill-conditioned at x; if c(x) is small, we say that the function f is well-conditioned at x.
Remark: Whether a computational problem is ill-conditioned or not has no relation to the numerical method that we use.
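The definition of c(x) can be explored numerically. The sketch below uses the made-up function f(x) = x - 1, which is ill-conditioned near x = 1 (where |f(x)| is tiny) and well-conditioned away from it:

```python
# Numerical estimate of the condition number c(x) straight from its
# definition.  f(x) = x - 1 is a made-up example: ill-conditioned near
# x = 1, well-conditioned far from 1.
def rel_condition(f, x, dx):
    return (abs(f(x + dx) - f(x)) / abs(f(x))) / (abs(dx) / abs(x))

f = lambda t: t - 1.0
c_far = rel_condition(f, 2.0, 1e-8)         # ~2   : well-conditioned
c_near = rel_condition(f, 1.000001, 1e-10)  # ~1e6 : ill-conditioned
print(c_far, c_near)
```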
(2) Error.
By using some numerical method, we calculate the value of a function f at a point x and we obtain y. Because of rounding errors (or chopping errors), usually

    y ≠ f(x).

If there exists a δx such that

    y = f(x + δx),   |δx| ≤ ε|x|,

where ε is a positive constant closely related to the numerical method and the computer used, then we say that the method is stable if ε is small, and the method is unstable if ε is large.
Remark: Whether a numerical method is stable or not has no relation to the computational problem that we face.
With the perturbation and error analysis, we obtain

    |y - f(x)| / |f(x)| = |f(x + δx) - f(x)| / |f(x)| ≤ c(x) |δx| / |x| ≤ c(x) ε.
Therefore, whether a numerical result is accurate depends on both the stability of thenumerical method and the condition number of the computational problem.
1.6 Operation cost and convergence rate
Usually, numerical algorithms are divided into two classes:

(i) direct methods;
(ii) iterative methods.
By using direct methods, one can obtain an accurate solution of computational prob-lems within finite steps in exact arithmetic. By using iterative methods, one can onlyobtain an approximate solution of computational problems within finite steps.
The operation cost is an important measurement of algorithms. The operation cost of an algorithm is the total number of operations +, -, ×, ÷ used in the algorithm. We remark that the speed of an algorithm depends only partially on the operation cost. In modern computers, the speed of operations is much faster than that of data transfer. Therefore, sometimes, the speed of an algorithm mainly depends on the total amount of data transfer.
For direct methods, usually, we use the operation cost as a main measurement ofthe speed of algorithms. For iterative methods, we need to consider
(i) operation cost in each iteration;
(ii) convergence rate of the method.
For a sequence {xk} provided by an iterative algorithm, if {xk} → x*, the exact solution, and if {xk} satisfies

    ||xk - x*|| ≤ c ||x_{k-1} - x*||,   k = 1, 2, ...,

where 0 < c < 1, then we say that {xk} converges linearly.
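A minimal illustration of linear convergence, using the made-up fixed-point iteration x_{k+1} = 0.5 x_k + 1, whose limit is x* = 2:

```python
# The fixed-point iteration x_{k+1} = 0.5*x_k + 1 (a made-up example with
# limit x* = 2) satisfies ||x_k - x*|| = 0.5 ||x_{k-1} - x*||, i.e. c = 0.5.
x, x_star = 0.0, 2.0
errors = []
for _ in range(10):
    errors.append(abs(x - x_star))
    x = 0.5 * x + 1.0
ratios = [errors[k] / errors[k - 1] for k in range(1, len(errors))]
print(ratios)   # every ratio is exactly 0.5
```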
3. Let

    A = [ A11  A12 ]
        [ A21  A22 ],

where the Aij, for i, j = 1, 2, are square matrices with det(A11) ≠ 0, and satisfy A11A21 = A21A11. Then

    det(A) = det(A11A22 - A21A12).
4. Show that det(I - uv^*) = 1 - v^*u, where u, v ∈ C^m are column vectors.

5. Prove Hadamard's inequality for A ∈ C^{n×n}:

    |det(A)| ≤ prod_{j=1}^{n} ||aj||_2,

where aj = A(:, j) and ||aj||_2 = ( sum_{i=1}^{n} |A(i, j)|^2 )^{1/2}. When does the equality hold?
6. Let B be nilpotent, i.e., there exists an integer k > 0 such that B^k = 0. Show that if AB = BA, then

    det(A + B) = det(A).
7. Let A be an m-by-n matrix and B be an n-by-m matrix. Show that the matrices

    [ AB  0 ]        [ 0  0  ]
    [ B   0 ]  and   [ B  BA ]

are similar. Conclude that the nonzero eigenvalues of AB are the same as those of BA, and

    det(Im + AB) = det(In + BA).
8. A matrix M ∈ C^{n×n} is Hermitian positive definite if it satisfies

    M^* = M,   x^* M x > 0 for all x ≠ 0, x ∈ C^n.

Let A and B be Hermitian positive definite matrices.

(1) Show that the matrix product AB has positive eigenvalues.

(2) Show that AB is Hermitian if and only if A and B commute.
9. Show that any matrix A ∈ C^{n×n} can be written uniquely in the form

    A = B + iC,

where B and C are Hermitian.
10. Show that if A is skew-Hermitian, i.e., A^* = -A, then all its eigenvalues lie on the imaginary axis.
11. Let

    A = [ A11  A12 ]
        [ A21  A22 ].

Assume that A11, A22 are square, and that A11 and A22 - A21A11^{-1}A12 are nonsingular. Let

    B = [ B11  B12 ]
        [ B21  B22 ]

be the inverse of A. Show that

    B22 = (A22 - A21A11^{-1}A12)^{-1},   B12 = -A11^{-1}A12B22,
    B21 = -B22A21A11^{-1},               B11 = A11^{-1} - B12A21A11^{-1}.
12. Suppose that A and B are Hermitian with A positive definite. Show that A + B is positive definite if and only if all the eigenvalues of A^{-1}B are greater than -1.
13. Let A be idempotent, i.e., A^2 = A. Show that each eigenvalue of A is either 0 or 1.
14. Let A be the matrix with all entries equal to one. Show that A can be written as A = ee^T, where e^T = (1, 1, ..., 1), and that A is positive semi-definite. Find the eigenvalues and eigenvectors of A.
15. Prove that any matrix A ∈ C^{n×n} has a polar decomposition A = HQ, where H is Hermitian positive semi-definite and Q is unitary. We recall that M ∈ C^{n×n} is a unitary matrix if M^{-1} = M^*. Moreover, if A is nonsingular, then H is Hermitian positive definite and the polar decomposition of A is unique.
Chapter 2
Direct Methods for Linear Systems
The problem of solving linear systems is central in NLA. For solving linear systems, ingeneral, we have two classes of methods. One is called the direct method and the otheris called the iterative method. By using direct methods, within finite steps, one canobtain an accurate solution of computational problems in exact arithmetic. By usingiterative methods, within finite steps, one can only obtain an approximate solution ofcomputational problems.
In this chapter, we will introduce a basic direct method called Gaussian eliminationfor solving general linear systems. Usually, Gaussian elimination is used for solving a
dense linear system of medium size with no special structure.
2.1 Triangular linear systems and LU factorization
We first study triangular linear systems.
2.1.1 Triangular linear systems
We consider the following nonsingular lower triangular linear system
Ly= b (2.1)
where b = (b1, b2, ..., bn)^T ∈ R^n is a known vector, y = (y1, y2, ..., yn)^T is an unknown vector, and L = [lij] ∈ R^{n×n} is given by

    L = [ l11                     ]
        [ l21  l22                ]
        [ l31  l32  l33           ]
        [ ...  ...  ...  ...      ]
        [ ln1  ln2  ln3  ...  lnn ]

with lii ≠ 0, i = 1, 2, ..., n. By the first equation in (2.1), we have
    l11 y1 = b1,

and then

    y1 = b1 / l11.

Similarly, by the second equation in (2.1), we have

    y2 = (b2 - l21 y1) / l22.

In general, if we have already obtained y1, y2, ..., y_{i-1}, then by using the i-th equation in (2.1), we have

    yi = ( bi - sum_{j=1}^{i-1} lij yj ) / lii.

This algorithm is called the forward substitution method, which needs O(n^2) operations.
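The forward substitution formula translates directly into code; the following sketch uses a 3-by-3 system made up for the demonstration:

```python
import numpy as np

# Direct transcription of forward substitution:
# y_i = (b_i - sum_{j<i} l_ij * y_j) / l_ii.
def forward_substitution(L, b):
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

L = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 5.0, 6.0]])
b = np.array([2.0, 7.0, 32.0])
y = forward_substitution(L, b)
assert np.allclose(L @ y, b)   # y = (1, 2, 3)
```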
Now, we consider the following nonsingular upper triangular linear system
    Ux = y        (2.2)

where x = (x1, x2, ..., xn)^T is an unknown vector, and U ∈ R^{n×n} is given by

    U = [ u11  u12  u13  ...  u1n ]
        [      u22  u23  ...  ... ]
        [           u33  ...  ... ]
        [                ...  ... ]
        [                     unn ]

with uii ≠ 0, i = 1, 2, ..., n. Beginning from the last equation of (2.2), we can obtain xn, x_{n-1}, ..., x1 step by step. We have xn = yn / unn, and xi is given by
    xi = ( yi - sum_{j=i+1}^{n} uij xj ) / uii
for i = n - 1, ..., 1. This algorithm is called the backward substitution method, which also needs O(n^2) operations.
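Backward substitution can be sketched the same way, again with a made-up example system:

```python
import numpy as np

# Direct transcription of backward substitution:
# x_i = (y_i - sum_{j>i} u_ij * x_j) / u_ii, starting from x_n = y_n / u_nn.
def backward_substitution(U, y):
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = np.array([[2.0, 1.0, 1.0],
              [0.0, 3.0, 2.0],
              [0.0, 0.0, 4.0]])
y = np.array([6.0, 10.0, 8.0])
x = backward_substitution(U, y)
assert np.allclose(U @ x, y)   # x = (1, 2, 2)
```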
Consider the general linear system

    Ax = b        (2.3)

where A ∈ R^{n×n} and b ∈ R^n are known. If we can factorize the matrix A into A = LU, where L is a lower triangular matrix and U is an upper triangular matrix, then we can find the solution of (2.3) by the following two steps:
(1) Use the forward substitution method to find the solution y of Ly = b.

(2) Use the backward substitution method to find the solution x of Ux = y.
Now the problem we face is how to factorize the matrix A into A = LU. We therefore introduce Gaussian transform matrices.
2.1.2 Gaussian transform matrix
Let

    Lk = I - lk ek^T

where I ∈ R^{n×n} is the identity matrix, lk = (0, ..., 0, l_{k+1,k}, ..., l_{nk})^T ∈ R^n, and ek ∈ R^n is the k-th unit vector. Then for any k,

    Lk = [ 1                            ]
         [    ...                       ]
         [         1                    ]
         [        -l_{k+1,k}  1         ]
         [         ...           ...    ]
         [        -l_{nk}             1 ]

is called the Gaussian transform matrix. Such a matrix is a unit lower triangular matrix. We remark that a unit triangular matrix is a triangular matrix with ones on its diagonal. For any given vector

    x = (x1, x2, ..., xn)^T ∈ R^n,

we have

    Lk x = (x1, ..., xk, x_{k+1} - xk l_{k+1,k}, ..., xn - xk l_{nk})^T
         = (x1, ..., xk, 0, ..., 0)^T
if we take

    l_{ik} = xi / xk,   i = k + 1, ..., n,

with xk ≠ 0. It is easy to check that

    Lk^{-1} = I + lk ek^T

by noting that ek^T lk = 0. For a given matrix A ∈ R^{n×n}, we have

    Lk A = (I - lk ek^T) A = A - lk (ek^T A)

and

    rank( lk (ek^T A) ) = 1.

Therefore, Lk A is a rank-one modification of the matrix A.
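A small numerical illustration of a Gaussian transform (the vector x is made up): applying Lk zeroes the entries below position k, and the inverse is I + lk ek^T because ek^T lk = 0:

```python
import numpy as np

# A Gaussian transform L_k = I - l_k e_k^T zeroes the entries of x below
# position k (k = 1 in the text's 1-based numbering; index 0 here).
x = np.array([2.0, 4.0, 6.0])
k = 0
l = np.zeros(3)
l[k + 1:] = x[k + 1:] / x[k]            # multipliers l_ik = x_i / x_k
e = np.zeros(3)
e[k] = 1.0
Lk = np.eye(3) - np.outer(l, e)

print(Lk @ x)                            # [2. 0. 0.]
# The inverse is I + l_k e_k^T, since e_k^T l_k = 0:
assert np.allclose(Lk @ (np.eye(3) + np.outer(l, e)), np.eye(3))
```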
2.1.3 Computation ofLU factorization
We consider the following simple example. Let

    A = [ 1  5  9  ]
        [ 2  4  7  ]
        [ 3  3  10 ].

By using the Gaussian transform matrix

    L1 = [  1  0  0 ]
         [ -2  1  0 ]
         [ -3  0  1 ],

we have

    L1 A = [ 1   5    9  ]
           [ 0  -6  -11  ]
           [ 0 -12  -17  ].

Followed by using the Gaussian transform matrix

    L2 = [ 1   0  0 ]
         [ 0   1  0 ]
         [ 0  -2  1 ],

we have

    L2 (L1 A) ≡ U = [ 1   5    9 ]
                    [ 0  -6  -11 ]
                    [ 0   0    5 ].
Therefore, we finally have

    A = LU,

where

    L ≡ (L2 L1)^{-1} = L1^{-1} L2^{-1} = [ 1  0  0 ]
                                         [ 2  1  0 ]
                                         [ 3  2  1 ].
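The worked example can be verified numerically; the check below reproduces the two elimination steps and confirms A = LU:

```python
import numpy as np

# Numerical check of the worked example: two Gaussian transforms reduce A
# to upper triangular U, and L = (L2 L1)^{-1} is unit lower triangular.
A  = np.array([[1.0, 5.0, 9.0],
               [2.0, 4.0, 7.0],
               [3.0, 3.0, 10.0]])
L1 = np.array([[1.0, 0.0, 0.0],
               [-2.0, 1.0, 0.0],
               [-3.0, 0.0, 1.0]])
L2 = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, -2.0, 1.0]])

U = L2 @ (L1 @ A)
L = np.linalg.inv(L2 @ L1)
assert np.allclose(L @ U, A)
print(U)   # [[1, 5, 9], [0, -6, -11], [0, 0, 5]]
print(L)   # [[1, 0, 0], [2, 1, 0], [3, 2, 1]]
```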
For a general n-by-n matrix A, we can use n - 1 Gaussian transform matrices L1, L2, ..., L_{n-1} such that L_{n-1} ... L1 A is an upper triangular matrix. In fact, let A^{(0)} ≡ A and assume that we have already found k - 1 Gaussian transform matrices L1, ..., L_{k-1} ∈ R^{n×n} such that

    A^{(k-1)} = L_{k-1} ... L1 A = [ A11^{(k-1)}  A12^{(k-1)} ]
                                   [ 0            A22^{(k-1)} ]

where A11^{(k-1)} is a (k-1)-by-(k-1) upper triangular matrix and

    A22^{(k-1)} = [ a_{kk}^{(k-1)}  ...  a_{kn}^{(k-1)} ]
                  [ ...             ...  ...            ]
                  [ a_{nk}^{(k-1)}  ...  a_{nn}^{(k-1)} ].

If a_{kk}^{(k-1)} ≠ 0, then we can use the Gaussian transform matrix

    Lk = I - lk ek^T,

where

    lk = (0, ..., 0, l_{k+1,k}, ..., l_{nk})^T

with

    l_{ik} = a_{ik}^{(k-1)} / a_{kk}^{(k-1)},   i = k + 1, ..., n,

such that the last n - k entries in the k-th column of Lk A^{(k-1)} become zeros. We therefore have

    A^{(k)} ≡ Lk A^{(k-1)} = [ A11^{(k)}  A12^{(k)} ]
                             [ 0          A22^{(k)} ]

where A11^{(k)} is a k-by-k upper triangular matrix. After n - 1 steps, we obtain A^{(n-1)}, which is the upper triangular matrix that we need. Let

    L = (L_{n-1} ... L1)^{-1},   U = A^{(n-1)},
then A = LU. Now we want to show that L is a unit lower triangular matrix. By noting that ej^T li = 0 for j < i, we have

    L = L1^{-1} ... L_{n-1}^{-1}
      = (I + l1 e1^T)(I + l2 e2^T) ... (I + l_{n-1} e_{n-1}^T)
      = I + l1 e1^T + ... + l_{n-1} e_{n-1}^T
      = I + [l1, l2, ..., l_{n-1}, 0]

      = [ 1                        ]
        [ l21  1                   ]
        [ l31  l32  1              ]
        [ ...  ...  ...  ...       ]
        [ ln1  ln2  ln3  ...  1    ].

This computational process of the LU factorization is called Gaussian elimination. Thus, we have the following algorithm.
Algorithm 2.1 (Gaussian elimination)
for k = 1 : n-1
    A(k+1:n, k) = A(k+1:n, k) / A(k, k)
    A(k+1:n, k+1:n) = A(k+1:n, k+1:n) - A(k+1:n, k) * A(k, k+1:n)
end
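Algorithm 2.1 can be transcribed almost line by line into Python with NumPy; this sketch (our translation, assuming no zero pivots arise) returns L and U explicitly:

```python
import numpy as np

# Python transcription of Algorithm 2.1 (no pivoting): after the loop, the
# upper triangle of A holds U and the strict lower triangle holds the
# multipliers, from which the unit lower triangular L is recovered.
def lu_no_pivot(A):
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(n - 1):
        A[k + 1:, k] = A[k + 1:, k] / A[k, k]                    # multipliers
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])
    L = np.tril(A, -1) + np.eye(n)
    U = np.triu(A)
    return L, U

A = np.array([[1.0, 5.0, 9.0],
              [2.0, 4.0, 7.0],
              [3.0, 3.0, 10.0]])
L, U = lu_no_pivot(A)
assert np.allclose(L @ U, A)
```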
The operation cost of Gaussian elimination is

    sum_{k=1}^{n-1} [ (n - k) + 2(n - k)^2 ] = n(n - 1)/2 + n(n - 1)(2n - 1)/3
                                             = (2/3) n^3 + O(n^2) = O(n^3).
We remark that in Gaussian elimination, the entries a_{kk}^{(k-1)}, k = 1, ..., n-1, are required to be nonzero. We have the following theorem.

Theorem 2.1 The entries a_{ii}^{(i-1)} ≠ 0, i = 1, ..., k, if and only if all the leading principal submatrices Ai of A, i = 1, ..., k, are nonsingular.
Proof: By induction. For k = 1, it is obviously true. Assume that the statement is true up to k - 1. We want to show that if A1, ..., A_{k-1} are nonsingular, then

    Ak is nonsingular  <==>  a_{kk}^{(k-1)} ≠ 0.

By assumption, we know that

    a_{ii}^{(i-1)} ≠ 0,   i = 1, ..., k - 1.

By using k - 1 Gaussian transform matrices L1, ..., L_{k-1}, we obtain

    A^{(k-1)} = L_{k-1} ... L1 A = [ A11^{(k-1)}  A12^{(k-1)} ]
                                   [ 0            A22^{(k-1)} ]        (2.4)

where A11^{(k-1)} is an upper triangular matrix with nonzero diagonal entries a_{ii}^{(i-1)}, i = 1, ..., k - 1. Therefore, the k-th leading principal submatrix of A^{(k-1)} has the following form:

    [ A11^{(k-1)}  *              ]
    [ 0            a_{kk}^{(k-1)} ].

Let (L1)k, ..., (L_{k-1})k denote the k-th leading principal submatrices of L1, ..., L_{k-1}, respectively. By using (2.4), we obtain

    (L_{k-1})k ... (L1)k Ak = [ A11^{(k-1)}  *              ]
                              [ 0            a_{kk}^{(k-1)} ].

By noting that the Li, i = 1, ..., k - 1, are unit lower triangular matrices, we immediately know that

    det(Ak) = a_{kk}^{(k-1)} det(A11^{(k-1)}) ≠ 0

if and only if a_{kk}^{(k-1)} ≠ 0.
Thus, we have
Theorem 2.2 If all the leading principal submatrices Ai of a matrix A Rnn arenonsingular fori= 1, , n 1, then there exists a uniqueLU factorization ofA.
2.2 LU factorization with pivoting

Before we study pivoting techniques, we first consider the following simple example:

$$\begin{pmatrix} 0.3 \times 10^{-11} & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0.7 \\ 0.9 \end{pmatrix}.$$
If we use Gaussian elimination with 10-decimal-digit floating point arithmetic, we have

$$L = \begin{pmatrix} 1 & 0 \\ 0.3333333333 \times 10^{12} & 1 \end{pmatrix} \quad \text{and} \quad U = \begin{pmatrix} 0.3 \times 10^{-11} & 1 \\ 0 & -0.3333333333 \times 10^{12} \end{pmatrix}.$$

Then the computed solution is

$$\tilde{x} = (0.0000000000,\ 0.7000000000)^T,$$

which is not good compared with the accurate solution

$$x = (0.2000000000006,\ 0.6999999999994)^T.$$
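This failure is easy to reproduce. The sketch below uses IEEE single precision in place of the book's 10-digit decimal arithmetic (our substitution), but the mechanism is identical: the tiny pivot $0.3 \times 10^{-11}$ produces a huge multiplier, the entries equal to 1 are swamped, and the digits of $x_1$ are lost, while a double precision solver with partial pivoting recovers the answer:

```python
import numpy as np

f32 = np.float32  # simulate short precision; the mechanism, not the digits, matters
a11, a12, a21, a22 = f32(0.3e-11), f32(1.0), f32(1.0), f32(1.0)
b1, b2 = f32(0.7), f32(0.9)

m   = a21 / a11            # huge multiplier, about 3.3e11
u22 = a22 - m * a12        # the entry 1 is completely swamped by m
y2  = b2 - m * b1
x2  = y2 / u22             # still comes out close to 0.7
x1  = (b1 - x2) / a11      # catastrophic: x1 bears no resemblance to 0.2

# The same system solved in double precision with partial pivoting (LAPACK):
x_piv = np.linalg.solve(np.array([[0.3e-11, 1.0], [1.0, 1.0]]),
                        np.array([0.7, 0.9]))
```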
If we just interchange the first equation and the second equation, we have

$$\begin{pmatrix} 1 & 1 \\ 0.3 \times 10^{-11} & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0.9 \\ 0.7 \end{pmatrix}.$$

By using Gaussian elimination with 10-decimal-digit floating point arithmetic again, we have

$$L = \begin{pmatrix} 1 & 0 \\ 0.3 \times 10^{-11} & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$

Then the computed solution is

$$\tilde{x} = (0.2000000000,\ 0.7000000000)^T,$$

which is very good. So we need to introduce permutations into Gaussian elimination. We first define a permutation matrix.
Definition 2.1 A permutation matrix $P$ is an identity matrix with permuted rows.

The important properties of permutation matrices are included in the following lemma. Its proof is straightforward.

Lemma 2.1 Let $P, P_1, P_2 \in \mathbb{R}^{n \times n}$ be permutation matrices and $X \in \mathbb{R}^{n \times n}$. Then:

(i) $PX$ is the same as $X$ with its rows permuted; $XP$ is the same as $X$ with its columns permuted.

(ii) $P^{-1} = P^T$.

(iii) $\det(P) = \pm 1$.

(iv) $P_1 P_2$ is also a permutation matrix.

Now we introduce the main theorem of this section.

Theorem 2.3 If $A$ is nonsingular, then there exist permutation matrices $P_1$ and $P_2$, a unit lower triangular matrix $L$, and a nonsingular upper triangular matrix $U$ such that

$$P_1 A P_2 = LU.$$

Only one of $P_1$ and $P_2$ is necessary.
Proof: We use induction on the dimension $n$. For $n = 1$, it is obviously true. Assume that the statement is true for $n-1$. If $A$ is nonsingular, then it has a nonzero entry. Choose permutation matrices $P_1$ and $P_2$ such that the $(1,1)$ entry of $P_1 A P_2$ is nonzero. Now we write a desired factorization and solve for the unknown components:

$$P_1 A P_2 = \begin{pmatrix} a_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ L_{21} & I \end{pmatrix} \begin{pmatrix} u_{11} & U_{12} \\ 0 & \tilde{A}_{22} \end{pmatrix} = \begin{pmatrix} u_{11} & U_{12} \\ L_{21} u_{11} & L_{21} U_{12} + \tilde{A}_{22} \end{pmatrix}, \qquad (2.5)$$

where $A_{22}$, $\tilde{A}_{22}$ are $(n-1)$-by-$(n-1)$ matrices, and $L_{21}$, $U_{12}^T$ are $(n-1)$-by-$1$ matrices. Solving for the components of this 2-by-2 block factorization, we get

$$u_{11} = a_{11} \neq 0, \qquad U_{12} = A_{12},$$

and

$$L_{21} u_{11} = A_{21}, \qquad A_{22} = L_{21} U_{12} + \tilde{A}_{22}.$$

Therefore, we obtain

$$L_{21} = \frac{A_{21}}{a_{11}}, \qquad \tilde{A}_{22} = A_{22} - L_{21} U_{12}.$$

We want to apply induction to $\tilde{A}_{22}$, but to do so we need to check that $\det(\tilde{A}_{22}) \neq 0$. Since

$$\det(P_1 A P_2) = \pm\det(A) \neq 0$$
Algorithm 2.2 (Gaussian elimination with complete pivoting)

for k = 1 : n-1
    choose p, q (k <= p, q <= n) such that
        |A(p, q)| = max{|A(i, j)| : i = k:n, j = k:n}
    A(k, 1:n) <-> A(p, 1:n)
    A(1:n, k) <-> A(1:n, q)
    if A(k, k) != 0
        A(k+1:n, k) = A(k+1:n, k)/A(k, k)
        A(k+1:n, k+1:n) = A(k+1:n, k+1:n) - A(k+1:n, k)A(k, k+1:n)
    else
        stop
    end
end
We remark that although the LU factorization with complete pivoting can overcome some shortcomings of the LU factorization without pivoting, the cost of complete pivoting is very high: it usually requires $O(n^3)$ comparisons of matrix entries for pivoting.

In order to reduce the cost of pivoting, the LU factorization with partial pivoting is proposed. In partial pivoting, at the $k$-th step, we choose $a_{pk}^{(k-1)}$ from the submatrix $A_{22}^{(k-1)}$ which satisfies

$$|a_{pk}^{(k-1)}| = \max\left\{|a_{ik}^{(k-1)}| : k \le i \le n\right\}.$$

When $A$ is nonsingular, the LU factorization with partial pivoting can be carried out until we finally obtain

$$PA = LU.$$

In this algorithm, the cost of comparisons of matrix entries for pivoting is $O(n^2)$. We have
Algorithm 2.3 (Gaussian elimination with partial pivoting)

for k = 1 : n-1
    choose p (k <= p <= n) such that
        |A(p, k)| = max{|A(i, k)| : i = k:n}
    A(k, 1:n) <-> A(p, 1:n)
    if A(k, k) != 0
        A(k+1:n, k) = A(k+1:n, k)/A(k, k)
        A(k+1:n, k+1:n) = A(k+1:n, k+1:n) - A(k+1:n, k)A(k, k+1:n)
    else
        stop
    end
end
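A compact NumPy version of Algorithm 2.3 (a sketch with our own function name; we return the permutation explicitly so that the identity $PA = LU$ from the discussion above can be verified):

```python
import numpy as np

def lu_partial_pivot(A):
    """Algorithm 2.3: Gaussian elimination with partial pivoting.

    Returns P, L, U with P @ A = L @ U; multipliers satisfy |l_ij| <= 1."""
    A = A.astype(float).copy()
    n = A.shape[0]
    perm = np.arange(n)
    for k in range(n - 1):
        p = k + int(np.argmax(np.abs(A[k:, k])))   # largest entry in column k
        if A[p, k] == 0.0:
            break                                   # the "stop" branch of the algorithm
        A[[k, p], :] = A[[p, k], :]                 # swap whole rows: A(k,1:n) <-> A(p,1:n)
        perm[[k, p]] = perm[[p, k]]
        A[k+1:, k] /= A[k, k]
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    P = np.eye(n)[perm]
    return P, np.tril(A, -1) + np.eye(n), np.triu(A)

B = np.array([[0.3e-11, 1.0], [1.0, 1.0]])
P, L, U = lu_partial_pivot(B)
```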
2.3 Cholesky factorization

Let $A \in \mathbb{R}^{n \times n}$ be symmetric positive definite, i.e., $A = A^T$ and $x^T A x > 0$ for all $x \neq 0$ in $\mathbb{R}^n$. We have

Theorem 2.4 Let $A \in \mathbb{R}^{n \times n}$ be symmetric positive definite. Then there exists a lower triangular matrix $L \in \mathbb{R}^{n \times n}$ with positive diagonal entries such that

$$A = LL^T.$$
This factorization is called the Cholesky factorization.
Proof: Since $A$ is positive definite, all the leading principal submatrices of $A$ are positive definite. By Theorem 2.2, there exist a unit lower triangular matrix $\tilde{L}$ and an upper triangular matrix $U$ such that

$$A = \tilde{L}U.$$

Let

$$D = \operatorname{diag}(u_{11}, \ldots, u_{nn}), \qquad \tilde{U} = D^{-1}U,$$

where $u_{ii} > 0$ for $i = 1, \ldots, n$. Then we have $\tilde{U}^T D \tilde{L}^T = A^T = A = \tilde{L} D \tilde{U}$. Therefore,

$$\tilde{L}^T \tilde{U}^{-1} = D^{-1} \tilde{U}^{-T} \tilde{L} D.$$

We note that $\tilde{L}^T \tilde{U}^{-1}$ is a unit upper triangular matrix and $D^{-1} \tilde{U}^{-T} \tilde{L} D$ is a lower triangular matrix. Hence

$$\tilde{L}^T \tilde{U}^{-1} = I = D^{-1} \tilde{U}^{-T} \tilde{L} D,$$

which implies $\tilde{U} = \tilde{L}^T$. Thus

$$A = \tilde{L} D \tilde{L}^T.$$

Let $L = \tilde{L}\operatorname{diag}(\sqrt{u_{11}}, \ldots, \sqrt{u_{nn}})$. We finally have

$$A = LL^T.$$
Thus, when a matrix $A$ is symmetric positive definite, we can find the solution of the system $Ax = b$ by the following three steps:

(1) Compute the Cholesky factorization $A = LL^T$.

(2) Solve $Ly = b$ for $y$.

(3) Solve $L^T x = y$ for $x$.
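Steps (2) and (3) are forward and back substitution, the triangular solvers of Section 2.1.1. A minimal sketch with our own helper names:

```python
import numpy as np

def forward_sub(L, b):
    """Solve L y = b for a nonsingular lower triangular L."""
    n = L.shape[0]
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_sub(U, c):
    """Solve U x = c for a nonsingular upper triangular U."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

# The three steps for an SPD system A x = b, given the Cholesky factor L:
L = np.array([[2.0, 0.0],
              [1.0, 3.0]])          # so A = L L^T = [[4, 2], [2, 10]]
b = np.array([2.0, 7.0])
y = forward_sub(L, b)               # step (2)
x = back_sub(L.T, y)                # step (3)
```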
From Theorem 2.4, we know that no pivoting is needed in the Cholesky factorization. Also, we can calculate $L$ directly by comparing the corresponding entries on the two sides of $A = LL^T$. We have the following algorithm.
Algorithm 2.4 (Cholesky factorization)

for k = 1 : n
    A(k, k) = sqrt(A(k, k))
    A(k+1:n, k) = A(k+1:n, k)/A(k, k)
    for j = k+1 : n
        A(j:n, j) = A(j:n, j) - A(j:n, k)A(j, k)
    end
end
The operation cost of the Cholesky factorization is $n^3/3$.
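Algorithm 2.4 translates directly into NumPy (a sketch; the test matrix is built as $B^T B + 5I$, which is one standard way to manufacture a symmetric positive definite example):

```python
import numpy as np

def cholesky_lower(A):
    """Algorithm 2.4: overwrite the lower triangle of a copy of A with L, A = L L^T."""
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(n):
        A[k, k] = np.sqrt(A[k, k])
        A[k+1:, k] /= A[k, k]
        for j in range(k + 1, n):
            A[j:, j] -= A[j:, k] * A[j, k]
    return np.tril(A)

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = B.T @ B + 5.0 * np.eye(5)   # symmetric positive definite
L = cholesky_lower(A)
```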
-
8/12/2019 Numerical Linear Algebra Applications Jin
32/196
22 CHAPTER 2. DIRECT METHODS FOR LINEAR SYSTEMS
Exercises:

1. Let $S, T \in \mathbb{R}^{n \times n}$ be upper triangular matrices such that $(ST - I)x = b$ is a nonsingular system. Find an algorithm of $O(n^2)$ operations for computing $x$.

2. Show that the $LDL^T$ factorization of a symmetric positive definite matrix $A$ is unique.

3. Let $A \in \mathbb{R}^{n \times n}$ be symmetric positive definite. Find an algorithm for computing an upper triangular matrix $U \in \mathbb{R}^{n \times n}$ such that $A = UU^T$.

4. Let $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ be a strictly diagonally dominant matrix, i.e.,

$$|a_{kk}| > \sum_{j=1,\, j \neq k}^{n} |a_{kj}|, \qquad k = 1, 2, \ldots, n.$$

Prove that a strictly diagonally dominant matrix is nonsingular, and that a strictly diagonally dominant symmetric matrix with positive diagonal entries is positive definite.

5. Let

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$

with $A_{11}$ being a $k$-by-$k$ nonsingular matrix. Then

$$S = A_{22} - A_{21} A_{11}^{-1} A_{12}$$

is called the Schur complement of $A_{11}$ in $A$. Show that after $k$ steps of Gaussian elimination without pivoting, $A_{22}^{(k-1)} = S$.

6. Let $A$ be a symmetric positive definite matrix. At the end of the first step of Gaussian elimination, we have

$$\begin{pmatrix} a_{11} & a_1^T \\ 0 & \tilde{A}_{22} \end{pmatrix}.$$

Prove that $\tilde{A}_{22}$ is also symmetric positive definite.

7. Let $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ be a strictly diagonally dominant matrix. After one step of Gaussian elimination, we have

$$\begin{pmatrix} a_{11} & a_1^T \\ 0 & \tilde{A}_{22} \end{pmatrix}.$$

Show that $\tilde{A}_{22}$ is also strictly diagonally dominant.

8. Show that if $PAQ = LU$ is obtained via Gaussian elimination with pivoting, then $|u_{ii}| \ge |u_{ij}|$ for $j = i+1, \ldots, n$.

9. Let $H = A + iB$ be a Hermitian positive definite matrix, where $A, B \in \mathbb{R}^{n \times n}$.
Chapter 3
Perturbation and Error Analysis
In this chapter, we will discuss the effects of perturbations and errors on numerical solutions. The error analysis of floating point operations and of the partial pivoting technique is also given. It is well known that the essential notions of distance and size in linear vector spaces are captured by norms. We therefore need to introduce vector and matrix norms and study their properties before we develop our perturbation and error analysis.
3.1 Vector and matrix norms
We first introduce vector norms.
3.1.1 Vector norms
Let $x = (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n$.

Definition 3.1 A vector norm on $\mathbb{R}^n$ is a function that assigns to each $x \in \mathbb{R}^n$ a real number $\|x\|$, called the norm of $x$, such that the following three properties are satisfied for all $x, y \in \mathbb{R}^n$ and all $\alpha \in \mathbb{R}$:

(i) $\|x\| > 0$ if $x \neq 0$, and $\|x\| = 0$ if and only if $x = 0$;

(ii) $\|\alpha x\| = |\alpha| \, \|x\|$;

(iii) $\|x + y\| \le \|x\| + \|y\|$.

A useful class of vector norms is the $p$-norm defined by

$$\|x\|_p \equiv \left(\sum_{i=1}^n |x_i|^p\right)^{1/p},$$
where $1 \le p \le \infty$. The following $p$-norms are the most commonly used norms in practice:

$$\|x\|_1 = \sum_{i=1}^n |x_i|, \qquad \|x\|_2 = \left(\sum_{i=1}^n |x_i|^2\right)^{1/2}, \qquad \|x\|_\infty = \max_{1 \le i \le n} |x_i|.$$

The Cauchy–Schwarz inequality concerning $\|\cdot\|_2$ is given as follows:

$$|x^T y| \le \|x\|_2 \|y\|_2$$

for $x, y \in \mathbb{R}^n$, which is a special case of the Hölder inequality

$$|x^T y| \le \|x\|_p \|y\|_q, \qquad 1/p + 1/q = 1.$$

A very important property of vector norms on $\mathbb{R}^n$ is that all vector norms on $\mathbb{R}^n$ are equivalent, as the following theorem says; see [35].

Theorem 3.1 If $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ are two norms on $\mathbb{R}^n$, then there exist two positive constants $c_1$ and $c_2$ such that

$$c_1 \|x\|_\alpha \le \|x\|_\beta \le c_2 \|x\|_\alpha$$

for all $x \in \mathbb{R}^n$.
For example, if $x \in \mathbb{R}^n$, then we have

$$\|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2, \qquad \|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty, \qquad \|x\|_\infty \le \|x\|_1 \le n\,\|x\|_\infty.$$
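These equivalence constants are easy to confirm numerically (a sketch with an arbitrary random vector):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 7
x = rng.standard_normal(n)
n1, n2, ninf = (np.linalg.norm(x, 1),
                np.linalg.norm(x, 2),
                np.linalg.norm(x, np.inf))
```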
We remark that for any sequence of vectors $\{x_k\}$, where $x_k = (x_1^{(k)}, \ldots, x_n^{(k)})^T \in \mathbb{R}^n$, and $x = (x_1, \ldots, x_n)^T \in \mathbb{R}^n$, by Theorem 3.1, one can prove that

$$\lim_{k \to \infty} \|x_k - x\| = 0 \iff \lim_{k \to \infty} |x_i^{(k)} - x_i| = 0, \quad i = 1, \ldots, n.$$
3.1.2 Matrix norms

Let $A = [a_{ij}]_{i,j=1}^n \in \mathbb{R}^{n \times n}$. We now turn our attention to matrix norms.

Definition 3.2 A matrix norm is a function that assigns to each $A \in \mathbb{R}^{n \times n}$ a real number $\|A\|$, called the norm of $A$, such that the following four properties are satisfied for all $A, B \in \mathbb{R}^{n \times n}$ and all $\alpha \in \mathbb{R}$:

(i) $\|A\| > 0$ if $A \neq 0$, and $\|A\| = 0$ if and only if $A = 0$;

(ii) $\|\alpha A\| = |\alpha| \, \|A\|$;

(iii) $\|A + B\| \le \|A\| + \|B\|$;

(iv) $\|AB\| \le \|A\| \, \|B\|$.

An important property of matrix norms on $\mathbb{R}^{n \times n}$ is that all matrix norms on $\mathbb{R}^{n \times n}$ are equivalent. For the relation between a vector norm and a matrix norm, we have

Definition 3.3 If a matrix norm $\|\cdot\|_M$ and a vector norm $\|\cdot\|_v$ satisfy

$$\|Ax\|_v \le \|A\|_M \|x\|_v$$

for $A \in \mathbb{R}^{n \times n}$ and $x \in \mathbb{R}^n$, then these norms are called mutually consistent.

For any vector norm $\|\cdot\|_v$, we can define a matrix norm in the following natural way:

$$\|A\|_M \equiv \max_{x \neq 0} \frac{\|Ax\|_v}{\|x\|_v} = \max_{\|x\|_v = 1} \|Ax\|_v.$$
The most important matrix norms are the matrix $p$-norms induced by the vector $p$-norms for $p = 1, 2, \infty$. We have the following theorem.

Theorem 3.2 Let $A = [a_{ij}]_{i,j=1}^n \in \mathbb{R}^{n \times n}$. Then we have:

(i) $\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}|$;

(ii) $\|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|$;

(iii) $\|A\|_2 = \sqrt{\lambda_{\max}(A^T A)}$, where $\lambda_{\max}(A^T A)$ is the largest eigenvalue of $A^T A$.

Proof: We only give the proofs of (i) and (iii). In the following, we always assume that $A \neq 0$.
For (i), we partition the matrix $A$ by columns:

$$A = [a_1, \ldots, a_n].$$

Let

$$\nu = \|a_{j_0}\|_1 = \max_{1 \le j \le n} \|a_j\|_1.$$

Then for any vector $x \in \mathbb{R}^n$ which satisfies $\|x\|_1 = \sum_{i=1}^n |x_i| = 1$, we have

$$\|Ax\|_1 = \Big\|\sum_{j=1}^n x_j a_j\Big\|_1 \le \sum_{j=1}^n |x_j| \, \|a_j\|_1 \le \Big(\sum_{j=1}^n |x_j|\Big) \max_{1 \le j \le n} \|a_j\|_1 = \|a_{j_0}\|_1 = \nu.$$

Let $e_{j_0}$ denote the $j_0$-th unit vector; then

$$\|A e_{j_0}\|_1 = \|a_{j_0}\|_1 = \nu.$$

Therefore,

$$\|A\|_1 = \max_{\|x\|_1 = 1} \|Ax\|_1 = \nu = \max_{1 \le j \le n} \|a_j\|_1 = \max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}|.$$
For (iii), we have

$$\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \max_{\|x\|_2 = 1} \big[(Ax)^T (Ax)\big]^{1/2} = \max_{\|x\|_2 = 1} \big[x^T (A^T A) x\big]^{1/2}.$$

Since $A^T A$ is positive semi-definite, its eigenvalues can be assumed to be in the following order:

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0.$$
Let $v_1, v_2, \ldots, v_n \in \mathbb{R}^n$ denote the orthonormal eigenvectors corresponding to $\lambda_1, \lambda_2, \ldots, \lambda_n$, respectively. Then for any vector $x \in \mathbb{R}^n$ with $\|x\|_2 = 1$, we have

$$x = \sum_{i=1}^n \alpha_i v_i, \qquad \sum_{i=1}^n \alpha_i^2 = 1.$$

Therefore,

$$x^T A^T A x = \sum_{i=1}^n \lambda_i \alpha_i^2 \le \lambda_1.$$

On the other hand, letting $x = v_1$, we have

$$x^T A^T A x = v_1^T A^T A v_1 = \lambda_1 v_1^T v_1 = \lambda_1.$$

Thus

$$\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \sqrt{\lambda_1} = \sqrt{\lambda_{\max}(A^T A)}.$$
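The three formulas of Theorem 3.2 can be checked against NumPy's built-in norms (a sketch; `np.linalg.norm(A, 1)`, `np.linalg.norm(A, np.inf)` and `np.linalg.norm(A, 2)` compute exactly these induced norms):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))

one_norm = np.abs(A).sum(axis=0).max()                  # (i): max column sum
inf_norm = np.abs(A).sum(axis=1).max()                  # (ii): max row sum
two_norm = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())   # (iii): sqrt of lambda_max(A^T A)
```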
We have the following theorem for the norm $\|\cdot\|_2$.

Theorem 3.3 Let $A \in \mathbb{R}^{n \times n}$. Then we have:

(i) $\|A\|_2 = \max_{\|x\|_2 = 1} \max_{\|y\|_2 = 1} |y^* A x|$, where $x, y \in \mathbb{C}^n$;

(ii) $\|A^T\|_2 = \|A\|_2 = \sqrt{\|A^T A\|_2}$;

(iii) $\|A\|_2 = \|QAZ\|_2$ for any orthogonal matrices $Q$ and $Z$. We recall that a matrix $M \in \mathbb{R}^{n \times n}$ is called orthogonal if $M^{-1} = M^T$.

Proof: We only prove (i). We first introduce the dual norm $\|\cdot\|_D$ of a vector norm $\|\cdot\|$, defined as follows:

$$\|y\|_D = \max_{\|x\| = 1} |y^* x|.$$

For $\|\cdot\|_2$, we have by the Cauchy–Schwarz inequality

$$|y^* x| \le \|y\|_2 \|x\|_2,$$

with equality when $x = \frac{1}{\|y\|_2}\, y$. Therefore, the dual norm of $\|\cdot\|_2$ is given by

$$\|y\|_2^D = \max_{\|x\|_2 = 1} |y^* x| = \max_{\|x\|_2 = 1} \|y\|_2 \|x\|_2 = \|y\|_2.$$
So $\|\cdot\|_2$ is its own dual. Now, we consider

$$\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2^D = \max_{\|x\|_2 = 1} \max_{\|y\|_2 = 1} |(Ax)^* y| = \max_{\|x\|_2 = 1} \max_{\|y\|_2 = 1} |y^* A x|.$$

Another useful norm is the Frobenius norm, which is defined by

$$\|A\|_F \equiv \left(\sum_{j=1}^n \sum_{i=1}^n |a_{ij}|^2\right)^{1/2}.$$

One of the most important properties of $\|\cdot\|_F$ is that for any orthogonal matrices $Q$ and $Z$,

$$\|A\|_F = \|QAZ\|_F.$$

In the following, we will extend our discussion of norms to the field $\mathbb{C}$. We remark that from the viewpoint of norms, there is no essential difference between matrices or vectors over $\mathbb{R}$ and matrices or vectors over $\mathbb{C}$.
Definition 3.4 Let $A \in \mathbb{C}^{n \times n}$. Then the set of all the eigenvalues of $A$ is called the spectrum of $A$, and

$$\rho(A) = \max\{|\lambda| : \lambda \text{ belongs to the spectrum of } A\}$$

is called the spectral radius of $A$.

For the relation between the spectral radius and matrix norms, we have

Theorem 3.4 Let $A \in \mathbb{C}^{n \times n}$. Then:

(i) for any matrix norm $\|\cdot\|$, we have $\rho(A) \le \|A\|$;

(ii) for any $\epsilon > 0$, there exists a norm $\|\cdot\|$ defined on $\mathbb{C}^{n \times n}$ such that $\|A\| \le \rho(A) + \epsilon$.
Proof: For (i), let $x \in \mathbb{C}^n$ satisfy $x \neq 0$, $Ax = \lambda x$, $|\lambda| = \rho(A)$. Then we have

$$\rho(A) \|x e_1^T\| = \|\lambda x e_1^T\| = \|A x e_1^T\| \le \|A\| \, \|x e_1^T\|.$$

Hence $\rho(A) \le \|A\|$.

For (ii), by using Theorem 1.1 (Jordan Decomposition Theorem), we know that there is a nonsingular matrix $X \in \mathbb{C}^{n \times n}$ such that

$$X^{-1} A X = \begin{pmatrix}
\lambda_1 & \delta_1 & & \\
& \lambda_2 & \ddots & \\
& & \ddots & \delta_{n-1} \\
& & & \lambda_n
\end{pmatrix},$$

where $\delta_i = 1$ or $0$. For any given $\epsilon > 0$, let

$$D_\epsilon = \operatorname{diag}(1, \epsilon, \epsilon^2, \ldots, \epsilon^{n-1});$$

then

$$D_\epsilon^{-1} X^{-1} A X D_\epsilon = \begin{pmatrix}
\lambda_1 & \epsilon\delta_1 & & \\
& \lambda_2 & \ddots & \\
& & \ddots & \epsilon\delta_{n-1} \\
& & & \lambda_n
\end{pmatrix}.$$

Now, define

$$\|G\| = \|D_\epsilon^{-1} X^{-1} G X D_\epsilon\|_\infty, \qquad G \in \mathbb{C}^{n \times n}.$$

It is easy to see that this matrix norm actually is induced by the vector norm defined as follows:

$$\|x\|_{XD} = \|(X D_\epsilon)^{-1} x\|_\infty, \qquad x \in \mathbb{C}^n.$$

Therefore,

$$\|A\| = \|D_\epsilon^{-1} X^{-1} A X D_\epsilon\|_\infty = \max_{1 \le i \le n} \big(|\lambda_i| + \epsilon|\delta_i|\big) \le \rho(A) + \epsilon,$$

where $\delta_n = 0$.
We remark that for any sequence of matrices $\{A^{(k)}\}$, where $A^{(k)} = [a_{ij}^{(k)}] \in \mathbb{R}^{n \times n}$, and $A = [a_{ij}] \in \mathbb{R}^{n \times n}$,

$$\lim_{k \to \infty} \|A^{(k)} - A\| = 0 \iff \lim_{k \to \infty} a_{ij}^{(k)} = a_{ij}, \quad i, j = 1, \ldots, n.$$

Theorem 3.5 Let $A \in \mathbb{C}^{n \times n}$. Then

$$\lim_{k \to \infty} A^k = 0 \iff \rho(A) < 1.$$

Proof: We first assume that $\lim_{k \to \infty} A^k = 0$. Let $\lambda$ be an eigenvalue of $A$ such that $\rho(A) = |\lambda|$. Then $\lambda^k$ is an eigenvalue of $A^k$ for any $k$. By Theorem 3.4 (i), we know that for any $k$,

$$\rho(A)^k = |\lambda|^k = |\lambda^k| \le \rho(A^k) \le \|A^k\|.$$

Therefore,

$$\lim_{k \to \infty} \rho(A)^k = 0,$$

which implies $\rho(A) < 1$.

Conversely, assume that $\rho(A) < 1$. By Theorem 3.4 (ii), there exists a matrix norm $\|\cdot\|$ such that $\|A\| < 1$. Then $\|A^k\| \le \|A\|^k \to 0$ as $k \to \infty$, and hence $\lim_{k \to \infty} A^k = 0$.
Corollary 3.1 Let $\|\cdot\|$ be a norm defined on $\mathbb{C}^{n \times n}$ with $\|I\| = 1$, and let $A \in \mathbb{C}^{n \times n}$ satisfy $\|A\| < 1$. Then $I - A$ is nonsingular and

$$\|(I - A)^{-1}\| \le \frac{1}{1 - \|A\|}.$$

3.2 Perturbation analysis for linear systems

For a nonsingular matrix $A$, the condition number of $A$ is defined by

$$\kappa(A) \equiv \|A\| \, \|A^{-1}\|. \qquad (3.1)$$

Obviously, the condition number depends on the matrix norm used. When $\kappa(A)$ is small, then $A$ is said to be well-conditioned, whereas if $\kappa(A)$ is large, then $A$ is said to be ill-conditioned. Note that for any $p$-norm, we have

$$1 = \|I\| = \|A A^{-1}\| \le \|A\| \, \|A^{-1}\| = \kappa(A).$$
Let $\tilde{x}$ be an approximation of the exact solution $x$ of $Ax = b$. The error vector is defined as follows:

$$e = \tilde{x} - x,$$

i.e.,

$$\tilde{x} = x + e. \qquad (3.2)$$

The absolute error is given by $\|e\| = \|\tilde{x} - x\|$ for any vector norm. If $x \neq 0$, then the relative error is defined by

$$\frac{\|e\|}{\|x\|} = \frac{\|\tilde{x} - x\|}{\|x\|}.$$

We have, by substituting (3.2) into $Ax = b$,

$$A(x + e) = Ax + Ae = b + Ae.$$

Therefore, $A\tilde{x} = b + Ae \equiv \tilde{b}$; thus $\tilde{x}$ is the exact solution of $A\tilde{x} = \tilde{b}$, where $\tilde{b}$ is a perturbed vector of $b$. Since $x = A^{-1}b$ and $\tilde{x} = A^{-1}\tilde{b}$, we have

$$\|\tilde{x} - x\| = \|A^{-1}(\tilde{b} - b)\| \le \|A^{-1}\| \, \|\tilde{b} - b\|. \qquad (3.3)$$

Similarly, $\|b\| = \|Ax\| \le \|A\| \, \|x\|$, i.e.,

$$\frac{1}{\|x\|} \le \frac{\|A\|}{\|b\|}. \qquad (3.4)$$

Combining (3.3), (3.4) and (3.1), we obtain the following theorem, which gives the effect of perturbations of the vector $b$ on the solution of $Ax = b$ in terms of the condition number.
Theorem 3.7 Let $\tilde{x}$ be an approximate solution of the exact solution $x$ of $Ax = b$. Then

$$\frac{\|\tilde{x} - x\|}{\|x\|} \le \kappa(A)\, \frac{\|\tilde{b} - b\|}{\|b\|}.$$
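Theorem 3.7 can be observed numerically. The sketch below builds a matrix with $\kappa_2(A) \approx 10^6$ (an arbitrary construction of ours: an orthogonal factor times a graded diagonal), perturbs $b$, and checks that the relative error never exceeds the condition-number bound:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
A = Q @ np.diag(np.logspace(0, -6, n))      # singular values from 1 down to 1e-6
x = rng.standard_normal(n)
b = A @ x
db = 1e-10 * rng.standard_normal(n)         # perturbation of b
x_tilde = np.linalg.solve(A, b + db)

kappa = np.linalg.cond(A, 2)
rel_err = np.linalg.norm(x_tilde - x) / np.linalg.norm(x)
bound = kappa * np.linalg.norm(db) / np.linalg.norm(b)
```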
The next theorem gives the effect of perturbations of the coefficient matrix $A$ on the solution of $Ax = b$ in terms of the condition number.

Theorem 3.8 Let $A$ be a nonsingular matrix and $\tilde{A} = A + E$ be a perturbed matrix of $A$ such that $\|E\| \, \|A^{-1}\| < 1$. Let $x$ and $\tilde{x}$ satisfy $Ax = b$ and $\tilde{A}\tilde{x} = b + \delta b$, respectively. Then

$$\frac{\|\tilde{x} - x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \kappa(A)\frac{\|E\|}{\|A\|}} \left(\frac{\|E\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|}\right).$$

Proof: Since $\|A^{-1}E\| \le \|A^{-1}\| \, \|E\| < 1$, by Corollary 3.1 the matrix $I + A^{-1}E$ is nonsingular and $\|(I + A^{-1}E)^{-1}\| \le (1 - \|A^{-1}E\|)^{-1}$. Subtracting $Ax = b$ from $(A + E)\tilde{x} = b + \delta b$ gives $(A + E)(\tilde{x} - x) = \delta b - Ex$, i.e.,

$$\tilde{x} - x = (I + A^{-1}E)^{-1} A^{-1} (\delta b - Ex).$$

Taking norms and using (3.4), we get

$$\frac{\|\tilde{x} - x\|}{\|x\|} \le \big(1 - \|A^{-1}E\|\big)^{-1} \left(\|A^{-1}E\| + \kappa(A)\frac{\|\delta b\|}{\|b\|}\right).$$

By using

$$\|A^{-1}E\| \le \|A^{-1}\| \, \|E\| = \kappa(A)\frac{\|E\|}{\|A\|},$$

we finally have

$$\frac{\|\tilde{x} - x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \kappa(A)\frac{\|E\|}{\|A\|}} \left(\frac{\|E\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|}\right).$$
Theorems 3.7 and 3.8 give upper bounds for the relative error of $\tilde{x}$ in terms of the condition number of $A$. From Theorems 3.7 and 3.8, we know that if $A$ is well-conditioned, i.e., $\kappa(A)$ is small, the relative error in $\tilde{x}$ will be small whenever the relative errors in both $A$ and $b$ are small.
Corollary 3.2 Let $\|\cdot\|$ be any matrix norm with $\|I\| = 1$, and let $A$ be a nonsingular matrix with $A + \delta A$ being a perturbed matrix of $A$ such that

$$r \equiv \|A^{-1}\| \, \|\delta A\| < 1.$$

Then $A + \delta A$ is nonsingular and

$$\frac{\|(A + \delta A)^{-1} - A^{-1}\|}{\|A^{-1}\|} \le \frac{\kappa(A)}{1 - \kappa(A)\frac{\|\delta A\|}{\|A\|}} \cdot \frac{\|\delta A\|}{\|A\|}.$$

Proof: Since $\|A^{-1}\delta A\| \le \|A^{-1}\| \, \|\delta A\| = r < 1$, by Corollary 3.1, $A + \delta A = A(I + A^{-1}\delta A)$ is nonsingular and

$$\|(A + \delta A)^{-1}\| \le \frac{\|A^{-1}\|}{1 - r}.$$

By using the identity

$$B^{-1} = A^{-1} - B^{-1}(B - A)A^{-1}$$

with $B = A + \delta A$, we have

$$(A + \delta A)^{-1} - A^{-1} = -(A + \delta A)^{-1}\, \delta A \, A^{-1}.$$

Then

$$\|(A + \delta A)^{-1} - A^{-1}\| \le \|A^{-1}\| \, \|\delta A\| \, \|(A + \delta A)^{-1}\| \le \frac{\|A^{-1}\|^2 \, \|\delta A\|}{1 - r}.$$

Finally, we obtain

$$\frac{\|(A + \delta A)^{-1} - A^{-1}\|}{\|A^{-1}\|} \le \frac{\|A^{-1}\| \, \|\delta A\|}{1 - r} = \frac{\kappa(A)}{1 - \kappa(A)\frac{\|\delta A\|}{\|A\|}} \cdot \frac{\|\delta A\|}{\|A\|}.$$
3.3 Error analysis on floating point arithmetic

In computers, floating point numbers $f$ are expressed as

$$f = \pm \beta^J \cdot \alpha, \qquad L \le J \le U,$$

where $\beta$ is the base, $J$ is the exponent, and $\alpha$ is the fraction. Usually, $\alpha$ has the following form:

$$\alpha = 0.d_1 d_2 \cdots d_t,$$

where $t$ is the length (precision) of $\alpha$, $d_1 \neq 0$, and $0 \le d_i < \beta$ for $i = 2, \ldots, t$. Let

$$F = \{0\} \cup \{f : f = \pm\beta^J \cdot 0.d_1 d_2 \cdots d_t,\ 0 \le d_i < \beta,\ d_1 \neq 0,\ L \le J \le U\}.$$

Then $F$ contains

$$2(\beta - 1)\beta^{t-1}(U - L + 1) + 1$$

floating point numbers. These numbers are symmetrically distributed in the intervals $[m, M]$ and $[-M, -m]$, where

$$m = \beta^{L-1}, \qquad M = \beta^U(1 - \beta^{-t}). \qquad (3.5)$$

We remark that $F$ is only a finite set, which cannot contain all the real numbers in these two intervals.

Let $fl(x)$ denote the floating point representation of any real number $x$. Then

$$fl(x) = 0, \qquad \text{for } x = 0.$$
If $m \le |x| \le M$, by rounding, $fl(x)$ is the floating point number attaining

$$|fl(x) - x| = \min_{f \in F} |f - x|.$$

By chopping, $fl(x)$ is the floating point number attaining

$$|fl(x) - x| = \min_{f \in F,\ |f| \le |x|} |f - x|.$$

For example, let $\beta = 10$, $t = 3$, $L = 0$ and $U = 2$. We consider the floating point expression of $x = 5.45627$. By rounding, we have $fl(x) = 0.546 \times 10$. By chopping, we have $fl(x) = 0.545 \times 10$. The following theorem gives an estimate of the relative error of floating point representations.

Theorem 3.9 Let $m \le |x| \le M$, where $m$ and $M$ are defined by (3.5). Then

$$fl(x) = x(1 + \delta), \qquad |\delta| \le u,$$

where $u$ is the machine precision, i.e.,

$$u = \begin{cases} \frac{1}{2}\beta^{1-t}, & \text{by rounding},\\[2pt] \beta^{1-t}, & \text{by chopping}. \end{cases}$$
Proof: In the following, we assume that $x \neq 0$ and, without loss of generality, $x > 0$. Let $J$ be the integer satisfying

$$\beta^{J-1} \le x < \beta^J. \qquad (3.6)$$

Since the spacing of the floating point numbers in $[\beta^{J-1}, \beta^J)$ is $\beta^{J-t}$, for the rounding error we have, by (3.6),

$$|fl(x) - x| \le \frac{1}{2}\beta^{J-t} = \frac{1}{2}\beta^{J-1}\beta^{1-t} \le \frac{1}{2}x\,\beta^{1-t},$$

i.e.,

$$\frac{|fl(x) - x|}{x} \le \frac{1}{2}\beta^{1-t}.$$

For the chopping error, we have

$$|fl(x) - x| \le \beta^{J-t} = \beta^{J-1}\beta^{1-t} \le x\,\beta^{1-t},$$

i.e.,

$$\frac{|fl(x) - x|}{x} \le \beta^{1-t}.$$
Let $x = u$. By the left inequality of (3.9), we have

$$(1 + u)^n \le e^{nu}. \qquad (3.10)$$

Let $x = nu$. By the right inequality of (3.9), we have

$$e^{nu} \le 1 + 1.01nu. \qquad (3.11)$$

Combining (3.10) and (3.11), we have

$$(1 + u)^n \le 1 + 1.01nu. \qquad (3.12)$$

By (3.7), (3.8) and (3.12), the proof is complete.
We consider the following example.

Example 3.1. For given $x, y \in \mathbb{R}^n$, estimate the upper bound of $|fl(x^T y) - x^T y|$. Let

$$S_k = fl\Big(\sum_{i=1}^k x_i y_i\Big).$$

By Theorem 3.10, we have

$$S_1 = x_1 y_1 (1 + \gamma_1), \qquad |\gamma_1| \le u,$$

and

$$S_k = fl\big(S_{k-1} + fl(x_k y_k)\big) = \big[S_{k-1} + x_k y_k(1 + \gamma_k)\big](1 + \delta_k), \qquad |\gamma_k|, |\delta_k| \le u.$$

Therefore,

$$fl(x^T y) = S_n = \sum_{i=1}^n x_i y_i (1 + \gamma_i) \prod_{j=i}^n (1 + \delta_j) = \sum_{i=1}^n (1 + \epsilon_i)\, x_i y_i,$$

where

$$1 + \epsilon_i = (1 + \gamma_i) \prod_{j=i}^n (1 + \delta_j)$$

with $\delta_1 = 0$. Thus, if $nu \le 0.01$, we then have by Theorem 3.11,

$$|fl(x^T y) - x^T y| \le \sum_{i=1}^n |\epsilon_i| \, |x_i y_i| \le 1.01nu \sum_{i=1}^n |x_i y_i|.$$
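The bound of Example 3.1 can be tested directly by accumulating a dot product in single precision ($u = 2^{-24}$) and comparing against a double precision reference (a sketch; the sequential loop mirrors the recurrence for $S_k$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x = rng.standard_normal(n).astype(np.float32)
y = rng.standard_normal(n).astype(np.float32)

u = 2.0 ** -24                     # unit roundoff of IEEE single precision
s = np.float32(0.0)
for xi, yi in zip(x, y):           # S_k = fl(S_{k-1} + fl(x_k y_k))
    s = np.float32(s + xi * yi)

exact = np.dot(x.astype(np.float64), y.astype(np.float64))
bound = 1.01 * n * u * np.sum(np.abs(x.astype(np.float64) * y.astype(np.float64)))
err = abs(float(s) - exact)
```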
Before we finish this section, let us briefly discuss the floating point analysis of elementary matrix operations. We first introduce the following notation:

$$|E| = [\,|e_{ij}|\,], \qquad E = [e_{ij}] \in \mathbb{R}^{n \times n},$$

and $|E| \le |F|$ means $|e_{ij}| \le |f_{ij}|$ for $i, j = 1, 2, \ldots, n$. Let $A, B \in \mathbb{R}^{n \times n}$ be matrices with entries in $F$, and $\alpha \in F$. By Theorem 3.10, we have

$$fl(\alpha A) = \alpha A + E, \qquad |E| \le u|\alpha A|,$$

and

$$fl(A + B) = (A + B) + E, \qquad |E| \le u|A + B|.$$

From Example 3.1, we also have

$$fl(AB) = AB + E, \qquad |E| \le 1.01nu\,|A|\,|B|.$$

Note that $|A|\,|B|$ may be much larger than $|AB|$. Therefore the relative error of $AB$ may not be small.
3.4 Error analysis on partial pivoting

We will show that if Gaussian elimination with partial pivoting is used to solve $Ax = b$, then the computed solution $\hat{x}$ satisfies

$$(A + E)\hat{x} = b,$$

where $E$ is an error matrix. An upper bound on $E$ is also given. We first study the rounding error of the LU factorization of $A$.

Lemma 3.1 Let $A \in \mathbb{R}^{n \times n}$ with floating point entries. Assume that $A$ has an LU factorization and $6nu \le 1$, where $u$ is the machine precision. Then by using Gaussian elimination, we have $\hat{L}\hat{U} = A + E$, where

$$|E| \le 3nu\big(|A| + |\hat{L}|\,|\hat{U}|\big).$$
Proof: We use induction on $n$. Obviously, Lemma 3.1 is true for $n = 1$. Assume that the lemma holds for $n - 1$. Now, we consider a matrix $A \in \mathbb{R}^{n \times n}$:

$$A = \begin{pmatrix} \alpha & w^T \\ v & A_1 \end{pmatrix},$$

where $A_1 \in \mathbb{R}^{(n-1) \times (n-1)}$. At the first step of Gaussian elimination, we compute the vector $l_1 = fl(v/\alpha)$ and modify the matrix $A_1$ as

$$\tilde{A}_1 = fl\big(A_1 - fl(l_1 w^T)\big).$$

By Theorem 3.10, we have

$$l_1 = v/\alpha + f, \qquad |f| \le u\,\frac{|v|}{|\alpha|}, \qquad (3.13)$$

and

$$\tilde{A}_1 = A_1 - l_1 w^T + F, \qquad |F| \le (2 + u)u\big(|A_1| + |l_1|\,|w|^T\big). \qquad (3.14)$$

For $\tilde{A}_1$, by using the induction assumption, we obtain an LU factorization with a unit lower triangular matrix $\hat{L}_1$ and an upper triangular matrix $\hat{U}_1$ such that

$$\hat{L}_1 \hat{U}_1 = \tilde{A}_1 + E_1, \qquad |E_1| \le 3(n-1)u\big(|\tilde{A}_1| + |\hat{L}_1|\,|\hat{U}_1|\big).$$

Thus, we have

$$\hat{L}\hat{U} = \begin{pmatrix} 1 & 0 \\ l_1 & \hat{L}_1 \end{pmatrix} \begin{pmatrix} \alpha & w^T \\ 0 & \hat{U}_1 \end{pmatrix} = A + E,$$

where

$$E = \begin{pmatrix} 0 & 0 \\ \alpha f & E_1 + F \end{pmatrix}.$$

By using (3.14), we obtain

$$|\tilde{A}_1| \le (1 + 2u + u^2)\big(|A_1| + |l_1|\,|w|^T\big).$$

Therefore, by using the condition $6nu \le 1$, we have

$$\begin{aligned}
|E_1 + F| &\le |E_1| + |F|\\
&\le 3(n-1)u\big(|\tilde{A}_1| + |\hat{L}_1|\,|\hat{U}_1|\big) + (2+u)u\big(|A_1| + |l_1|\,|w|^T\big)\\
&\le 3(n-1)u\Big[(1 + 2u + u^2)\big(|A_1| + |l_1|\,|w|^T\big) + |\hat{L}_1|\,|\hat{U}_1|\Big] + (2+u)u\big(|A_1| + |l_1|\,|w|^T\big)\\
&\le u\big\{3n - 1 + [6n + 3(n-1)u - 5]u\big\}\big(|A_1| + |l_1|\,|w|^T\big) + 3(n-1)u\,|\hat{L}_1|\,|\hat{U}_1|\\
&\le 3nu\big(|A_1| + |l_1|\,|w|^T + |\hat{L}_1|\,|\hat{U}_1|\big).
\end{aligned}$$

Combining with (3.13), we obtain

$$|E| = \begin{pmatrix} 0 & 0 \\ |\alpha|\,|f| & |E_1 + F| \end{pmatrix} \le 3nu \begin{pmatrix} 0 & 0 \\ |v| & |A_1| + |l_1|\,|w|^T + |\hat{L}_1|\,|\hat{U}_1| \end{pmatrix} \le 3nu\left[\begin{pmatrix} |\alpha| & |w|^T \\ |v| & |A_1| \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ |l_1| & |\hat{L}_1| \end{pmatrix}\begin{pmatrix} |\alpha| & |w|^T \\ 0 & |\hat{U}_1| \end{pmatrix}\right] = 3nu\big(|A| + |\hat{L}|\,|\hat{U}|\big).$$

The proof is complete.
Corollary 3.3 Let $A \in \mathbb{R}^{n \times n}$ be nonsingular with floating point entries and $6nu \le 1$. Assume that by using Gaussian elimination with partial pivoting, we obtain

$$\hat{L}\hat{U} = PA + E,$$

where $\hat{L} = [l_{ij}]$ is a unit lower triangular matrix with $|l_{ij}| \le 1$, $\hat{U}$ is an upper triangular matrix, and $P$ is a permutation matrix. Then $E$ satisfies the following inequality:

$$|E| \le 3nu\big(|PA| + |\hat{L}|\,|\hat{U}|\big).$$
After we obtain the LU factorization of $A$, the problem of solving $Ax = b$ becomes the problem of solving the following two triangular systems:

$$Ly = Pb, \qquad Ux = y.$$

Therefore, we need to estimate the rounding error of solving triangular systems.

Lemma 3.2 Let $S \in \mathbb{R}^{n \times n}$ be a nonsingular triangular matrix with floating point entries and $1.01nu \le 0.01$. By using the method proposed in Section 2.1.1 to solve $Sx = b$, we obtain a computed solution $\hat{x}$ which satisfies

$$(S + H)\hat{x} = b, \qquad |H| \le 1.01nu\,|S|.$$

Proof: We use induction on $n$. Without loss of generality, let $S = L$ be a lower triangular matrix. Obviously, Lemma 3.2 is true for $n = 1$. Assume that the lemma is true for $n - 1$. Now, we consider a lower triangular matrix $L \in \mathbb{R}^{n \times n}$. Let $\hat{x}$ be the computed solution of $Lx = b$, and partition $L$, $b$ and $\hat{x}$ as follows:

$$L = \begin{pmatrix} l_{11} & 0 \\ l_1 & L_1 \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ c \end{pmatrix}, \qquad \hat{x} = \begin{pmatrix} \hat{x}_1 \\ \hat{y} \end{pmatrix},$$

where $c, \hat{y} \in \mathbb{R}^{n-1}$ and $L_1 \in \mathbb{R}^{(n-1) \times (n-1)}$. By Theorem 3.10, we have

$$\hat{x}_1 = fl(b_1/l_{11}) = \frac{b_1}{l_{11}(1 + \epsilon_1)}, \qquad |\epsilon_1| \le u. \qquad (3.15)$$

Note that $\hat{y}$ is the computed solution of the $(n-1)$-by-$(n-1)$ system

$$L_1 y = fl(c - \hat{x}_1 l_1).$$

By assumption, we have

$$(L_1 + H_1)\hat{y} = fl(c - \hat{x}_1 l_1),$$

where

$$|H_1| \le 1.01(n-1)u\,|L_1|. \qquad (3.16)$$

By Theorem 3.10 again, we obtain

$$fl(c - \hat{x}_1 l_1) = fl\big(c - fl(\hat{x}_1 l_1)\big) = (I + D')^{-1}\big(c - \hat{x}_1 l_1 - \hat{x}_1 D l_1\big),$$

where $D = \operatorname{diag}(\epsilon_2, \ldots, \epsilon_n)$, $D' = \operatorname{diag}(\epsilon_2', \ldots, \epsilon_n')$ with $|\epsilon_i|, |\epsilon_i'| \le u$, $i = 2, \ldots, n$. Therefore,

$$\hat{x}_1 l_1 + \hat{x}_1 D l_1 + (I + D')(L_1 + H_1)\hat{y} = c,$$

and then $(L + H)\hat{x} = b$, where

$$H = \begin{pmatrix} \epsilon_1 l_{11} & 0 \\ D l_1 & H_1 + D'(L_1 + H_1) \end{pmatrix}.$$

By using (3.15), (3.16) and the condition $1.01nu \le 0.01$, we have

$$\begin{aligned}
|H| &\le \begin{pmatrix} |\epsilon_1|\,|l_{11}| & 0 \\ |D|\,|l_1| & |H_1| + |D'|\big(|L_1| + |H_1|\big) \end{pmatrix} \le \begin{pmatrix} u|l_{11}| & 0 \\ u|l_1| & |H_1| + u\big(|L_1| + |H_1|\big) \end{pmatrix}\\
&\le u\begin{pmatrix} |l_{11}| & 0 \\ |l_1| & \big[1.01(n-1) + 1 + 1.01(n-1)u\big]|L_1| \end{pmatrix} \le 1.01nu\,|L|.
\end{aligned}$$
We then have the main theorem of this section.

Theorem 3.12 Let $A \in \mathbb{R}^{n \times n}$ be a nonsingular matrix with floating point entries and $1.01nu \le 0.01$. If Gaussian elimination with partial pivoting is used to solve $Ax = b$, then we obtain a computed solution $\hat{x}$ which satisfies

$$(A + \delta A)\hat{x} = b, \qquad \|\delta A\|_\infty \le u\big(3n + 5.04n^3\rho\big)\|A\|_\infty, \qquad (3.17)$$

with the growth factor

$$\rho \equiv \frac{1}{\|A\|_\infty} \max_{i,j,k} |a_{ij}^{(k)}|.$$

Proof: By using Gaussian elimination with partial pivoting, we have the following two triangular systems:

$$\hat{L}y = Pb, \qquad \hat{U}x = y.$$
By using Lemma 3.2, the computed solution $\hat{x}$ satisfies

$$(\hat{L} + F)(\hat{U} + G)\hat{x} = Pb,$$

i.e.,

$$(\hat{L}\hat{U} + F\hat{U} + \hat{L}G + FG)\hat{x} = Pb, \qquad (3.18)$$

where

$$|F| \le 1.01nu\,|\hat{L}|, \qquad |G| \le 1.01nu\,|\hat{U}|. \qquad (3.19)$$

Substituting $\hat{L}\hat{U} = PA + E$ into (3.18), we have

$$(A + \delta A)\hat{x} = b,$$

where

$$\delta A = P^T\big(E + F\hat{U} + \hat{L}G + FG\big).$$

By using (3.19), Corollary 3.3 and the condition $1.01nu \le 0.01$, we have

$$|\delta A| \le P^T\big(3nu\,|PA| + (3n + 2.04n)u\,|\hat{L}|\,|\hat{U}|\big) = nu\,P^T\big(3|PA| + 5.04|\hat{L}|\,|\hat{U}|\big). \qquad (3.20)$$

By Corollary 3.3 again, the absolute values of the entries of $\hat{L}$ are less than or equal to 1. Therefore, we have

$$\|\hat{L}\|_\infty \le n. \qquad (3.21)$$

With the growth factor $\rho$ defined above, we have

$$\|\hat{U}\|_\infty \le n\rho\,\|A\|_\infty. \qquad (3.22)$$

Substituting (3.21) and (3.22) into (3.20), we obtain (3.17). The proof is complete.

We remark that $\|\delta A\|$ is usually very small compared with the initial error in the given data. Thus, Gaussian elimination with partial pivoting is numerically stable.
Exercises:

1. Let

$$A = \begin{pmatrix} 1 & 0.999999 \\ 0.999999 & 1 \end{pmatrix}.$$

Compute $A^{-1}$, $\det(A)$ and the condition number of $A$.
2. Prove that $\|AB\|_F \le \|A\|_2\,\|B\|_F$ and $\|AB\|_F \le \|A\|_F\,\|B\|_2$.

3. Prove that $\|A\|_2^2 \le \|A\|_1\,\|A\|_\infty$ for any square matrix $A$.

4. Show that

$$\left\|\begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}\right\|_2 \le \left\|\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}\right\|_2.$$

5. Let $A$ be nonsingular. Show that

$$\|A^{-1}\|_2^{-1} = \min_{\|x\|_2 = 1} \|Ax\|_2.$$

6. Show that if $S$ is real and $S = -S^T$, then $I - S$ is nonsingular and the matrix

$$(I - S)^{-1}(I + S)$$

is orthogonal. This is known as the Cayley transform of $S$.

7. Prove that if both $A$ and $A + E$ are nonsingular, then

$$\|(A + E)^{-1} - A^{-1}\| \le \|A^{-1}\| \, \|(A + E)^{-1}\| \, \|E\|.$$

8. Let $A \in \mathbb{R}^{n \times n}$ be nonsingular and let $x, y, z \in \mathbb{R}^n$ be such that $Ax = b$ and $Ay = b + z$. Show that

$$\frac{\|z\|_2}{\|A\|_2} \le \|x - y\|_2 \le \|A^{-1}\|_2\,\|z\|_2.$$

9. Let $A = [a_{ij}]$ be an $m$-by-$n$ matrix. Define

$$|||A|||_l = \max_{i,j} |a_{ij}|.$$

Is $|||\cdot|||_l$ a matrix norm? Give a reason for your answer.

10. Show that if $X \in \mathbb{C}^{n \times n}$ is nonsingular, then $\|A\|_X = \|X^{-1}AX\|_2$ defines a matrix norm.

11. Let $A = LDL^T \in \mathbb{R}^{n \times n}$ be a symmetric positive definite matrix with

$$D = \operatorname{diag}(d_{11}, \ldots, d_{nn}).$$

Show that

$$\kappa_2(A) \ge \frac{\max_i \{d_{ii}\}}{\min_i \{d_{ii}\}}.$$

12. Verify that

$$\|xy^*\|_F = \|xy^*\|_2 = \|x\|_2\,\|y\|_2$$

for any $x, y \in \mathbb{C}^n$.

13. Show that if $0 \neq v \in \mathbb{R}^n$ and $E \in \mathbb{R}^{n \times n}$, then

$$\left\|E\Big(I - \frac{v v^T}{v^T v}\Big)\right\|_F^2 = \|E\|_F^2 - \frac{\|Ev\|_2^2}{v^T v}.$$
Chapter 4

Least Squares Problems

In this chapter, we study linear least squares problems:

$$\min_{y \in \mathbb{R}^n} \|Ay - b\|_2,$$

where the data matrix $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ and the observation vector $b \in \mathbb{R}^m$ are given. We introduce some well-known orthogonal transformations and the QR decomposition for constructing efficient algorithms for these problems. For the literature on least squares problems, we refer to [15, 21, 42, 44, 45, 48].
4.1 Least squares problems

In practice, if we are given $m$ points $t_1, t_2, \ldots, t_m$ with data $y_1, y_2, \ldots, y_m$ on these points, and functions $\phi_1(t), \phi_2(t), \ldots, \phi_n(t)$ defined on these points, we then try to find $f(x, t)$ defined by

$$f(x, t) \equiv \sum_{j=1}^n x_j \phi_j(t)$$

such that the residuals defined by

$$r_i(x) \equiv y_i - f(x, t_i) = y_i - \sum_{j=1}^n x_j \phi_j(t_i), \qquad i = 1, 2, \ldots, m,$$

are as small as possible. In matrix form, we have

$$r(x) = b - Ax,$$

where

$$A = \begin{pmatrix} \phi_1(t_1) & \cdots & \phi_n(t_1) \\ \vdots & & \vdots \\ \phi_1(t_m) & \cdots & \phi_n(t_m) \end{pmatrix},$$
and $b = (y_1, \ldots, y_m)^T$, $x = (x_1, \ldots, x_n)^T$, $r(x) = (r_1(x), \ldots, r_m(x))^T$.

When $m = n$, we can require that $r(x) = 0$, and $x$ can be found by solving the system $Ax = b$. When $m > n$, we require that $r(x)$ reach its minimum under the norm $\|\cdot\|_2$. We therefore introduce the following definition of the least squares problem.

Definition 4.1 Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Find $x \in \mathbb{R}^n$ such that

$$\|b - Ax\|_2 = \|r(x)\|_2 = \min_{y \in \mathbb{R}^n} \|r(y)\|_2 = \min_{y \in \mathbb{R}^n} \|b - Ay\|_2. \qquad (4.1)$$

This is called the least squares (LS) problem, and $r(x)$ is called the residual.
In the following, we only consider the case of

$$\operatorname{rank}(A) = n < m.$$

We first study the solutions $x$ of the following equation:

$$Ax = b, \qquad A \in \mathbb{R}^{m \times n}. \qquad (4.2)$$

The range of the matrix $A$ is defined by

$$R(A) \equiv \{y \in \mathbb{R}^m : y = Ax,\ x \in \mathbb{R}^n\}.$$

It is easy to see that

$$R(A) = \operatorname{span}\{a_1, \ldots, a_n\},$$

where $a_i$, $i = 1, \ldots, n$, are the column vectors of $A$. The nullspace of $A$ is defined by

$$N(A) \equiv \{x \in \mathbb{R}^n : Ax = 0\}.$$

The dimension of $N(A)$ is denoted by $\operatorname{null}(A)$. The orthogonal complement of a subspace $S \subseteq \mathbb{R}^n$ is defined by

$$S^\perp \equiv \{y \in \mathbb{R}^n : y^T x = 0 \text{ for all } x \in S\}.$$
We have the following theorems for (4.2).

Theorem 4.1 The equation (4.2) has a solution $\iff$ $\operatorname{rank}(A) = \operatorname{rank}([A, b])$.

Theorem 4.2 Let $x$ be a special solution of (4.2). Then the solution set of (4.2) is given by $x + N(A)$.
Corollary 4.1 Assume that the equation (4.2) has a solution. The solution is unique $\iff$ $\operatorname{null}(A) = 0$.

We have the following essential theorem for the solution of (4.1).

Theorem 4.3 The LS problem (4.1) always has solutions. The solution is unique if and only if $\operatorname{null}(A) = 0$.

Proof: Since

$$\mathbb{R}^m = R(A) \oplus R(A)^\perp,$$

the vector $b$ can be expressed uniquely as

$$b = b_1 + b_2,$$

where $b_1 \in R(A)$ and $b_2 \in R(A)^\perp$. For any $x \in \mathbb{R}^n$, since $b_1 - Ax \in R(A)$ is orthogonal to $b_2$, we therefore have

$$\|r(x)\|_2^2 = \|b - Ax\|_2^2 = \|(b_1 - Ax) + b_2\|_2^2 = \|b_1 - Ax\|_2^2 + \|b_2\|_2^2.$$

Note that $\|r(x)\|_2^2$ reaches its minimum if and only if $\|b_1 - Ax\|_2^2$ reaches its minimum. Since $b_1 \in R(A)$, $\|r(x)\|_2^2$ reaches its minimum if and only if

$$Ax = b_1,$$

i.e., $\|b_1 - Ax\|_2^2 = 0$. Thus, by Corollary 4.1, we know that the solution of $Ax = b_1$ is unique, i.e., the solution of (4.1) is unique, if and only if $\operatorname{null}(A) = 0$.
Let

$$X = \{x \in \mathbb{R}^n : x \text{ is a solution of (4.1)}\}.$$

We have

Theorem 4.4 A vector $x \in X$ if and only if

$$A^T A x = A^T b. \qquad (4.3)$$
Proof: Let $x \in X$. By Theorem 4.3, we know that $Ax = b_1$, where $b_1 \in R(A)$, and

$$r(x) = b - Ax = b - b_1 = b_2 \in R(A)^\perp.$$

Therefore,

$$A^T r(x) = A^T b_2 = 0.$$

Substituting $r(x) = b - Ax$ into $A^T r(x) = 0$, we obtain (4.3).

Conversely, let $x \in \mathbb{R}^n$ satisfy $A^T A x = A^T b$. Then for any $y \in \mathbb{R}^n$, we have

$$\|b - A(x + y)\|_2^2 = \|b - Ax\|_2^2 - 2y^T A^T(b - Ax) + \|Ay\|_2^2 = \|b - Ax\|_2^2 + \|Ay\|_2^2 \ge \|b - Ax\|_2^2.$$

Thus, $x \in X$.

We therefore have the following algorithm for LS problems:

(1) Compute $C = A^T A$ and $d = A^T b$.

(2) Find the Cholesky factorization $C = LL^T$.

(3) Solve the triangular linear systems $Ly = d$ and $L^T x = y$.
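The three steps can be sketched as follows (our own wrapper; `np.linalg.cholesky` returns the lower triangular factor $L$, and for brevity the two triangular solves are done with the generic solver):

```python
import numpy as np

def ls_normal_equations(A, b):
    """Solve min ||b - A x||_2 via the normal equations A^T A x = A^T b."""
    C = A.T @ A                    # step (1)
    d = A.T @ b
    L = np.linalg.cholesky(C)      # step (2): C = L L^T
    y = np.linalg.solve(L, d)      # step (3): L y = d ...
    return np.linalg.solve(L.T, y) #           ... then L^T x = y

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 3))    # full column rank with probability 1
b = rng.standard_normal(8)
x = ls_normal_equations(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
```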
We remark that the computation of $A^T A$ usually costs $O(n^2 m)$ operations, and some information in the matrix $A$ can be lost. For example, we consider

$$A = \begin{pmatrix} 1 & 1 & 1 \\ \epsilon & 0 & 0 \\ 0 & \epsilon & 0 \\ 0 & 0 & \epsilon \end{pmatrix}.$$

We have

$$A^T A = \begin{pmatrix} 1 + \epsilon^2 & 1 & 1 \\ 1 & 1 + \epsilon^2 & 1 \\ 1 & 1 & 1 + \epsilon^2 \end{pmatrix}.$$

Assume that $\epsilon = 10^{-3}$ and a 6-digit decimal floating point system is used. Then $1 + \epsilon^2 = 1 + 10^{-6}$ is rounded off to $1$, which means that the computed $A^T A$ is singular!
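The same loss of information occurs in binary arithmetic. In IEEE single precision ($u \approx 6 \times 10^{-8}$), taking $\epsilon = 10^{-4}$ makes $1 + \epsilon^2 = 1 + 10^{-8}$ round to exactly 1, so the computed Gram matrix collapses to the rank-one matrix of all ones even though $A$ has full column rank:

```python
import numpy as np

eps = np.float32(1e-4)
A = np.array([[1,   1,   1],
              [eps, 0,   0],
              [0,   eps, 0],
              [0,   0,   eps]], dtype=np.float32)
ATA = A.T @ A    # every entry rounds to exactly 1 in float32
```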
We note that the solutionx of (4.3) can be expressed as
x= (ATA)1ATb.
If we let
A = (ATA)1AT,
then the LS solutionx could be written as
x= Ab.
Actually, the n-by-m matrix A is the Moore-Penrose generalized inverse ofA, whichis unique, see [14, 17, 42]. In general, we have
Definition 4.2 Let X ∈ R^{n×m}. If X satisfies the following conditions:

AXA = A,  XAX = X,  (AX)^T = AX,  (XA)^T = XA,

then X is called the Moore-Penrose generalized inverse of A, denoted by A^†.
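The four conditions of Definition 4.2 are easy to check numerically. The sketch below (assuming NumPy, whose `np.linalg.pinv` computes the Moore-Penrose inverse) also verifies the formula A^† = (A^T A)^{-1} A^T for a full-column-rank A:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # full column rank with probability 1
X = np.linalg.pinv(A)             # the n-by-m Moore-Penrose inverse

# the four Penrose conditions of Definition 4.2:
assert np.allclose(A @ X @ A, A)        # A X A = A
assert np.allclose(X @ A @ X, X)        # X A X = X
assert np.allclose((A @ X).T, A @ X)    # (A X)^T = A X
assert np.allclose((X @ A).T, X @ A)    # (X A)^T = X A

# for full column rank, A^+ = (A^T A)^{-1} A^T:
assert np.allclose(X, np.linalg.inv(A.T @ A) @ A.T)
```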
Now we develop the perturbation analysis of LS problems. Assume that there is a perturbation δb on b, and let x and x + δx denote the solutions of the following LS problems, respectively:

min_x ||b − Ax||_2,   min_x ||(b + δb) − Ax||_2.

Then x = A^† b, and

x + δx = A^† (b + δb) = A^† b̃,

where b̃ = b + δb. We have
Theorem 4.5 Let b_1 and δb_1 denote the orthogonal projections of b and δb on R(A), respectively. If b_1 ≠ 0, then

||δx||_2 / ||x||_2 ≤ κ_2(A) ||δb_1||_2 / ||b_1||_2,

where κ_2(A) = ||A||_2 ||A^†||_2 and b̃_1 = b_1 + δb_1.
Proof: Let b_2 denote the orthogonal projection of b on R(A)^⊥. Then b = b_1 + b_2 and A^T b_2 = 0. Note that

A^† b = A^† b_1 + A^† b_2 = A^† b_1 + (A^T A)^{-1} A^T b_2 = A^† b_1.
where x_1 is the first component of the vector x. Setting the coefficient of x to zero, we have the following equation:

1 − 2(||x||_2^2 − αx_1) / ||x − αe_1||_2^2 = 0.

Solving this equation for α, we have α = ±||x||_2. Substituting it into (4.7), we therefore have

Hx = αe_1 = ±||x||_2 e_1.
We remark that for any vector 0 ≠ x ∈ R^n, by Theorem 4.8, one can construct a Householder matrix H such that the last n − 1 components of Hx are zeros. We can use the following two steps to construct the unit vector ω of H:

(1) compute v = x ± ||x||_2 e_1;
(2) compute ω = v / ||v||_2.
Now a natural question is: how to choose the sign in front of ||x||_2? Usually, we choose

v = x + sign(x_1) ||x||_2 e_1,

where x_1 ≠ 0 is the first component of the vector x; see [38]. Since

H = I − 2ωω^T = I − (2 / v^T v) vv^T = I − βvv^T,

where β = 2 / (v^T v), we only need to compute β and v instead of forming ω. Thus, we have the following algorithm.
Algorithm 4.1 (Householder transformation)

function: [v, β] = house(x)
    n = length(x)
    σ = x(2:n)^T x(2:n)
    v(1) = x(1) + sign(x(1)) sqrt(x(1)^2 + σ)
    v(2:n) = x(2:n)
    if σ = 0
        β = 0
    else
        β = 2 / (v(1)^2 + σ)
    end
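A direct NumPy translation of Algorithm 4.1 might look as follows; a sketch, using the convention sign(0) = 1 so the construction also works when x(1) = 0:

```python
import numpy as np

def house(x):
    """Return (v, beta) such that H = I - beta*v*v^T is a Householder
    matrix mapping x to -sign(x1)*||x||_2 * e1."""
    x = np.asarray(x, dtype=float)
    sigma = x[1:] @ x[1:]               # sigma = x(2:n)^T x(2:n)
    v = x.copy()
    sign = 1.0 if x[0] >= 0 else -1.0   # convention: sign(0) = 1
    v[0] = x[0] + sign * np.sqrt(x[0] ** 2 + sigma)
    beta = 0.0 if sigma == 0 else 2.0 / (v[0] ** 2 + sigma)
    return v, beta

x = np.array([3.0, 1.0, 5.0, 1.0])      # ||x||_2 = 6
v, beta = house(x)
H = np.eye(4) - beta * np.outer(v, v)
print(H @ x)                            # approximately (-6, 0, 0, 0)
```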
4.2.2 Givens rotation
A Givens rotation is defined as follows:

G(i, k, θ) = I + s(e_i e_k^T − e_k e_i^T) + (c − 1)(e_i e_i^T + e_k e_k^T),

i.e., G(i, k, θ) agrees with the identity matrix except for the four entries in rows and columns i and k:

G_ii = c,  G_ik = s,  G_ki = −s,  G_kk = c,

where c = cos θ and s = sin θ. It is easy to prove that G(i, k, θ) is an orthogonal matrix. Let x ∈ R^n and y = G(i, k, θ)x. We then have
y_i = cx_i + sx_k,  y_k = −sx_i + cx_k,  y_j = x_j,  j ≠ i, k.
If we want to make y_k = 0, then we only need to take

c = x_i / sqrt(x_i^2 + x_k^2),  s = x_k / sqrt(x_i^2 + x_k^2).

Therefore,

y_i = sqrt(x_i^2 + x_k^2),  y_k = 0.
We remark that for any vector 0 ≠ x ∈ R^n, one can construct a Givens rotation G(i, k, θ) acting on x to make a chosen component of x zero.
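The formulas above can be sketched in NumPy; the helper name `givens` is ours, not the book's:

```python
import numpy as np

def givens(xi, xk):
    # c, s with [[c, s], [-s, c]] @ [xi, xk] = [r, 0], r = sqrt(xi^2 + xk^2)
    r = np.hypot(xi, xk)
    if r == 0.0:
        return 1.0, 0.0
    return xi / r, xk / r

x = np.array([4.0, 7.0, 3.0])
c, s = givens(x[0], x[2])          # rotate in the (1, 3) plane to zero x[2]
G = np.eye(3)
G[0, 0] = c; G[0, 2] = s
G[2, 0] = -s; G[2, 2] = c
print(G @ x)                       # [5, 7, 0]: sqrt(4^2 + 3^2) = 5 moves to y_i
```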
4.3 QR decomposition
Let A ∈ R^{m×n} and b ∈ R^m. By Theorem 3.3 (iii), for any orthogonal matrix Q, we have

||Ax − b||_2 = ||Q^T (Ax − b)||_2.

Therefore, the LS problem

min_x ||Q^T Ax − Q^T b||_2

is equivalent to (4.1). We wish to find a suitable orthogonal matrix Q such that the original LS problem becomes an LS problem that is easier to solve. We have
Theorem 4.9 (QR decomposition) Let A ∈ R^{m×n} (m ≥ n). Then A has a QR decomposition:

A = Q [ R ; 0 ], (4.8)

where Q ∈ R^{m×m} is an orthogonal matrix and R ∈ R^{n×n} is an upper triangular matrix with nonnegative diagonal entries. The decomposition is unique when m = n and A is nonsingular.
Proof: We use induction on n. When n = 1, the result is true by Theorem 4.8. Now, we assume that the theorem is true for all matrices in R^{p×(n−1)} with p ≥ n − 1. Let the first column of A ∈ R^{m×n} be a_1. By Theorem 4.8 again, there exists an orthogonal matrix Q_1 ∈ R^{m×m} such that

Q_1^T a_1 = ||a_1||_2 e_1.

Therefore, we have

Q_1^T A = [ ||a_1||_2  v^T ; 0  A_1 ].

For the matrix A_1 ∈ R^{(m−1)×(n−1)}, we obtain by the induction assumption

A_1 = Q_2 [ R_2 ; 0 ],

where Q_2 ∈ R^{(m−1)×(m−1)} is an orthogonal matrix and R_2 is an upper triangular matrix with nonnegative diagonal entries. Thus, let

Q = Q_1 [ 1  0 ; 0  Q_2 ],   R = [ ||a_1||_2  v^T ; 0  R_2 ].

Then Q and R are the matrices satisfying the conditions of the theorem.
When A ∈ R^{m×m} is nonsingular, we want to show that the QR decomposition is unique. Let

A = QR = Q̃R̃,

where Q, Q̃ ∈ R^{m×m} are orthogonal matrices, and R, R̃ ∈ R^{m×m} are upper triangular matrices with nonnegative diagonal entries. Since A is nonsingular, the diagonal entries of R and R̃ are positive. Therefore, the matrix Q^T Q̃ = R R̃^{-1} is both orthogonal and upper triangular with positive diagonal entries. Thus

Q^T Q̃ = R R̃^{-1} = I,
i.e., Q̃ = Q and R̃ = R.
A complex version of the QR decomposition is needed later on.
Corollary 4.2 Let A ∈ C^{m×n} (m ≥ n). Then A has a QR decomposition:

A = Q [ R ; 0 ],

where Q ∈ C^{m×m} is a unitary matrix and R ∈ C^{n×n} is an upper triangular matrix with nonnegative diagonal entries. The decomposition is unique when m = n and A is nonsingular.
Now we use the QR decomposition to solve the LS problem (4.1). Suppose that A ∈ R^{m×n} (m ≥ n) has linearly independent columns, b ∈ R^m, and A has a QR decomposition (4.8). Let Q be partitioned as

Q = [ Q_1  Q_2 ],

where Q_1 consists of the first n columns of Q, and let

Q^T b = [ Q_1^T ; Q_2^T ] b = [ c_1 ; c_2 ].

Then

||Ax − b||_2^2 = ||Q^T Ax − Q^T b||_2^2 = ||Rx − c_1||_2^2 + ||c_2||_2^2.

Thus x is the solution of the LS problem (4.1) if and only if it is the solution of Rx = c_1. Note that it is much easier to get the solution of (4.1) by solving Rx = c_1 since R is an upper triangular matrix. We have the following algorithm for LS problems:

(1) Compute a QR decomposition of A.
(2) Compute c_1 = Q_1^T b.
(3) Solve the upper triangular system Rx = c_1.
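With NumPy's built-in reduced QR factorization, which returns Q_1 and R directly, the three steps read as follows (a sketch; `np.linalg.solve` stands in for a back-substitution routine):

```python
import numpy as np

def ls_qr(A, b):
    Q1, R = np.linalg.qr(A)        # (1) reduced QR: A = Q1 R
    c1 = Q1.T @ b                  # (2) c1 = Q1^T b
    return np.linalg.solve(R, c1)  # (3) solve the triangular system R x = c1

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))    # full column rank with probability 1
b = rng.standard_normal(6)
x = ls_qr(A, b)
```

Unlike the normal equations method, this approach never forms A^T A, so the conditioning of the problem is not squared.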
Finally, we discuss how to use Householder transformations to compute the QR decomposition of A. Let m = 7 and n = 5. Assume that we have already found Householder transformations H_1 and H_2 such that

H_2 H_1 A = [ × × × × ×
              0 × × × ×
              0 0 + + +
              0 0 + + +
              0 0 + + +
              0 0 + + +
              0 0 + + + ].
Now we construct a Householder transformation H̃_3 ∈ R^{5×5} such that

H̃_3 [ + ; + ; + ; + ; + ] = [ × ; 0 ; 0 ; 0 ; 0 ].

Let H_3 = diag(I_2, H̃_3). We obtain

H_3 H_2 H_1 A = [ × × × × ×
                  0 × × × ×
                  0 0 × × ×
                  0 0 0 × ×
                  0 0 0 × ×
                  0 0 0 × ×
                  0 0 0 × × ].
In general, after n such steps, we reduce the matrix A to the following form:

H_n H_{n−1} ⋯ H_1 A = [ R ; 0 ],

where R is an upper triangular matrix with nonnegative diagonal entries. By setting Q = H_1 ⋯ H_n, we obtain

A = Q [ R ; 0 ].
Thus, we have the following algorithm.
Algorithm 4.2 (QR decomposition: Householder transformation)
for j = 1 : n
    [v, β] = house(A(j:m, j))
    A(j:m, j:n) = (I_{m−j+1} − βvv^T) A(j:m, j:n)
    if j < m
        A(j+1:m, j) = v(2 : m−j+1)
    end
end
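A sketch of Algorithm 4.2 in NumPy, reusing the `house` routine of Algorithm 4.1. For clarity the orthogonal factor Q is accumulated explicitly, whereas the algorithm above instead stores the Householder vectors in the strict lower triangle of A:

```python
import numpy as np

def house(x):
    # Householder vector of Algorithm 4.1 (sign(0) = 1 convention)
    sigma = x[1:] @ x[1:]
    v = x.astype(float).copy()
    sign = 1.0 if x[0] >= 0 else -1.0
    v[0] = x[0] + sign * np.sqrt(x[0] ** 2 + sigma)
    beta = 0.0 if sigma == 0 else 2.0 / (v[0] ** 2 + sigma)
    return v, beta

def qr_householder(A):
    """Return Q (m-by-m orthogonal) and R (m-by-n upper triangular)
    with A = Q R, built from n Householder steps."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.eye(m)
    for j in range(n):
        v, beta = house(A[j:, j])
        Hj = np.eye(m - j) - beta * np.outer(v, v)
        A[j:, j:] = Hj @ A[j:, j:]   # apply H_j = diag(I_j, Hj) from the left
        Q[:, j:] = Q[:, j:] @ Hj     # accumulate Q = H_1 H_2 ... H_n
    return Q, np.triu(A)

rng = np.random.default_rng(1)
A = rng.standard_normal((7, 5))
Q, R = qr_householder(A)
```

Note that with the sign choice of Algorithm 4.1 the diagonal of R need not be nonnegative; flipping the signs of the offending rows of R (and the corresponding columns of Q) recovers the normalization of Theorem 4.9.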
We remark that the QR decomposition is not only a basic tool for solving LS problems but also an important tool for solving some other fundamental problems in NLA.
Exercises:
1. Let A ∈ R^{m×n} have full column rank. Prove that A + E also has full column rank if E satisfies ||E||_2 < 1 / ||A^†||_2, where A^† = (A^T A)^{-1} A^T.
2. Let U = [u_ij] be a nonsingular upper triangular matrix. Show that

κ(U) ≥ max_i |u_ii| / min_i |u_ii|,

where κ(U) = ||U|| ||U^{-1}||.

3. Let A ∈ R^{m×n} with m ≥ n have full column rank. Show that

[ I  A ; A^T  0 ] [ r ; x ] = [ b ; 0 ]

has a solution where x minimizes ||Ax − b||_2.
4. Let x ∈ R^n and P be a Householder transformation such that Px = ||x||_2 e_1. Let G_12, G_23, …, G_{n−1,n} be Givens rotations, and let Q = G_12 G_23 ⋯ G_{n−1,n}. Suppose Qx = ||x||_2 e_1. Is P equal to Q? Give a proof or a counterexample.

5. Let A ∈ R^{m×n}. Show that X = A^† minimizes ||AX − I||_F over all X ∈ R^{n×m}. What is the minimum?
6. Let x = [ x_1 ; x_2 ] ∈ C^2. Find an algorithm to compute the following unitary matrix

Q = [ c  s ; −s̄  c ],  c ∈ R,  c^2 + |s|^2 = 1,

such that the second component of Qx is zero.
7. Suppose an m-by-n matrix A has the form

A = [ A_1 ; A_2 ],

where A_1 is an n-by-n nonsingular matrix and A_2 is an (m − n)-by-n arbitrary matrix. Prove that ||A^†||_2 ≤ ||A_1^{-1}||_2.
8. Consider the following well-known ill-conditioned matrix

A = [ 1  1  1
      ε  0  0
      0  ε  0
      0  0  ε ],  |ε| ≪ 1.
(a) Choose a small ε such that rank(A) = 3. Then compute κ_2(A) to show that A is ill-conditioned.

(b) Find the LS solution with A given as above and b = (3, ε, ε, ε)^T by using

(i) the normal equations method;
(ii) the QR method.
9. Let A = BC, where B ∈ C^{m×r} and C ∈ C^{r×n} with r = rank(A) = rank(B) = rank(C). Show that A^† = C^*(CC^*)^{-1}(B^*B)^{-1}B^*.
10. Let A = UΣV^* ∈ C^{m×n}, where U ∈ C^{m×n} satisfies U^*U = I, V ∈ C^{n×n} satisfies V^*V = I, and Σ is an n-by-n diagonal matrix. Show that A^† = VΣ^†U^*.
11. Prove that A^† = lim_{μ→0} (A^*A + μI)^{-1}A^* = lim_{μ→0} A^*(AA^* + μI)^{-1}.
12. Show that R(A^†) ∩ N(A) = {0}.
13. Let A = [a_ij] ∈ C^{n×n} be idempotent. Then

R(A) ⊕ N(A) = C^n,  rank(A) = Σ_{i=1}^n a_ii.
14. Let A ∈ C^{m×n}. Prove that

R(AA^†) = R(AA^*) = R(A),
R(A^†A) = R(A^*A) = R(A^*) = R(A^†),
N(AA^†) = N(AA^*) = N(A^*) = N(A^†),
N(A^†A) = N(A^*A) = N(A).

Therefore A^†A and AA^† are orthogonal projectors.
15. Prove Corollary 4.2.
Chapter 5
Classical Iterative Methods
We study classical iterative methods for the solution of Ax = b. Iterative methods, originally proposed by Gauss in 1823, Liouville in 1837, and Jacobi in 1845, are quite different from direct methods such as Gaussian elimination; see [2].
Direct methods based on an LU factorization of A become prohibitive in terms of computing time and computer storage if the matrix A is quite large. In some practical situations, such as the discretization of partial differential equations, the matrix size can be as large as several hundreds of thousands. For such problems, direct methods become impractical. Furthermore, most large problems are sparse, and usually the sparsity is lost during LU factorization. Therefore, we have to face a very large matrix with many nonzero entries at the end of the LU factorization, and then storage becomes a crucial issue. For such problems, we can use a class of methods called iterative methods. In this chapter, we only consider some classical iterative methods.
We remark that the disadvantage of classical iterative methods is that the convergence rate may be slow, or the iteration may even diverge; moreover, a suitable stopping criterion needs to be chosen.
5.1 Jacobi and Gauss-Seidel method
5.1.1 Jacobi method
Consider the following linear system
Ax= b
where A = [a_ij] ∈ R^{n×n}. We can write the matrix A in the following form:

A = D − L − U,

where D = diag(a_11, a_22, …, a_nn),
L = − [ 0
        a_21   0
        a_31   a_32   0
        ⋮      ⋮      ⋱
        a_n1   a_n2   ⋯   a_{n,n−1}   0 ],

and

U = − [ 0   a_12   a_13   ⋯   a_1n
            0      a_23   ⋯   a_2n
                   ⋱      ⋱    ⋮
                          0    a_{n−1,n}
                               0 ]

(that is, L and U are the negatives of the strictly lower and strictly upper triangular parts of A). Then it is easy to see that the solution of Ax = b satisfies

x = B_J x + g,

where B_J = D^{-1}(L + U) and g = D^{-1}b.
The matrix B_J is called the Jacobi iteration matrix. The corresponding iteration

x_k = B_J x_{k−1} + g,  k = 1, 2, …, (5.1)

is known as the Jacobi method if an initial vector x_0 = (x_1^(0), x_2^(0), …, x_n^(0))^T is given.
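Componentwise, iteration (5.1) computes x_i^(k) = (b_i − Σ_{j≠i} a_ij x_j^(k−1)) / a_ii. A minimal NumPy sketch with a fixed iteration count (a practical code would add a stopping test):

```python
import numpy as np

def jacobi(A, b, x0, iters=100):
    D = np.diag(A)               # diagonal entries of A
    R = A - np.diag(D)           # off-diagonal part of A, i.e. -(L + U)
    x = x0.astype(float).copy()
    for _ in range(iters):
        x = (b - R @ x) / D      # x_k = D^{-1}((L + U) x_{k-1} + b)
    return x

A = np.array([[4.0, 1.0],
              [2.0, 5.0]])       # strictly diagonally dominant
b = np.array([6.0, 12.0])
x = jacobi(A, b, np.zeros(2))    # converges to the solution (1, 2)
```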
5.1.2 Gauss-Seidel method
In the Jacobi method, to compute the components of the vector

x_{k+1} = (x_1^(k+1), x_2^(k+1), …, x_n^(k+1))^T,

only the components of the vector x_k are used. However, note that to compute x_i^(k+1), we could use x_1^(k+1), x_2^(k+1), …, x_{i−1}^(k+1), which are already available. Thus a natural modification of the Jacobi method is to rewrite the Jacobi iteration (5.1) in the following form:

x_k = (D − L)^{-1} U x_{k−1} + (D − L)^{-1} b,  k = 1, 2, …. (5.2)

The idea is to use each new component as soon as it is available in the computation of the next component. The iteration (5.2) is known as the Gauss-Seidel method.
Note that the matrix D − L is a lower triangular matrix with a_11, …, a_nn on the diagonal. Because these entries are assumed to be nonzero, the matrix D − L is nonsingular. The matrix

B_GS = (D − L)^{-1} U

is called the Gauss-Seidel iteration matrix.
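In code, the idea of using each new component as soon as it is available appears as an in-place update; this sketch mirrors (5.2) without forming the triangular solve explicitly:

```python
import numpy as np

def gauss_seidel(A, b, x0, iters=100):
    n = len(b)
    x = x0.astype(float).copy()
    for _ in range(iters):
        for i in range(n):
            # x[0..i-1] hold the already-updated components of x_k,
            # x[i+1..n-1] still hold components of x_{k-1}
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
    return x

A = np.array([[4.0, 1.0],
              [2.0, 5.0]])            # strictly diagonally dominant
b = np.array([6.0, 12.0])
x = gauss_seidel(A, b, np.zeros(2))   # converges to the solution (1, 2)
```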
5.2 Convergence analysis
5.2.1 Convergence theorems
It is often hard to make a good initial approximation x_0. Thus, it is desirable to have conditions that guarantee the convergence of the Jacobi and Gauss-Seidel methods for an arbitrary choice of the initial approximation.
Both the Jacobi iteration and the Gauss-Seidel iteration can be expressed as

x_{k+1} = Bx_k + g,  k = 0, 1, …. (5.3)

For the Jacobi iteration, we have

B_J = D^{-1}(L + U),  g = D^{-1}b;

and for the Gauss-Seidel iteration, we have

B_GS = (D − L)^{-1}U,  g = (D − L)^{-1}b.

The iteration (5.3) is called a linear stationary iteration, where B ∈ R^{n×n} is called the iteration matrix, g ∈ R^n the constant term, and x_0 ∈ R^n the initial vector. In the following, we give a convergence theorem.
Theorem 5.1 The iteration (5.3) converges for an arbitrary initial guess x_0 if and only if B^k → 0 as k → ∞.
Proof: From x = Bx + g and x_{k+1} = Bx_k + g, we have

x − x_{k+1} = B(x − x_k). (5.4)

Because this is true for any value of k, we can write

x − x_k = B(x − x_{k−1}). (5.5)

Substituting (5.5) into (5.4), we have

x − x_{k+1} = B^2(x − x_{k−1}).

Continuing this process k times, we can write

x − x_{k+1} = B^{k+1}(x − x_0).

This shows that {x_k} converges to the solution x for any choice of x_0 if and only if B^k → 0 as k → ∞.
Recall that B^k → 0 as k → ∞ if and only if the spectral radius ρ(B) < 1. Since |λ_i| ≤ ||B|| for every eigenvalue λ_i of B, a convenient way to see whether ρ(B) < 1 is to check whether ||B|| < 1 by computing ||B|| with the row-sum or column-sum norm. Note that the converse is not true: ||B|| ≥ 1 does not imply divergence. Combining the result of Theorem 5.1 with the above observation, we have the following theorem.
Theorem 5.2 The iteration (5.3) converges for any choice of x_0 if and only if ρ(B) < 1.

In particular, a sufficient condition for the convergence of the Jacobi and Gauss-Seidel methods is that A be strictly diagonally dominant, i.e.,

|a_ii| > Σ_{j=1, j≠i}^n |a_ij|,  i = 1, 2, …
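These quantities are easy to check numerically. For the hypothetical test matrix A = [ 4 1 ; 2 5 ] (ours, not from the book), the row-sum norm of B_J is already below 1, and both spectral radii confirm convergence:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 5.0]])
D = np.diag(np.diag(A))
L = -np.tril(A, -1)                 # splitting A = D - L - U
U = -np.triu(A, 1)

B_J = np.linalg.solve(D, L + U)     # Jacobi iteration matrix D^{-1}(L + U)
B_GS = np.linalg.solve(D - L, U)    # Gauss-Seidel matrix (D - L)^{-1} U

rho = lambda B: max(abs(np.linalg.eigvals(B)))
print(np.linalg.norm(B_J, np.inf))  # 0.4 < 1: the row-sum norm already suffices
print(rho(B_J), rho(B_GS))          # both spectral radii are < 1
```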