numerical linear algebra applications jin

Upload: latec

Post on 03-Jun-2018

236 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    1/196

    Numerical Linear AlgebraAnd Its Applications

    Xiao-Qing JIN 1 Yi-Min WEI 2

    August 29, 2008

    1Department of Mathematics, University of Macau, Macau, P. R. China.2Department of Mathematics, Fudan University, Shanghai, P.R. China

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    2/196

    2

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    3/196

    i

    To Our Families

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    4/196

    ii

    CONTENTS

    page

    Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

    Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

    1.1 Basic symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

    1.2 Basic problems in NLA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

    1.3 Why shall we study numerical methods? . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

    1.4 Matrix factorizations (decompositions) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

    1.5 Perturbation and error analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.6 Operation cost and convergence rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

    Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

    Chapter 2 Direct Methods for Linear Systems . . . . . . . . . . . . . . . . . 9

    2.1 Triangular linear systems andLU factorization . . . . . . . . . . . . . . . . . . . . . 9

    2.2 LU factor ization with pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

    2.3 Cholesky factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

    Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

    Chapter 3 Perturbation and Error Analysis . . . . . . . . . . . . . . . . . . . 25

    3.1 Vector and matrix norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

    3.2 Perturbation analysis for linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

    3.3 Error analysis on floating point arithmetic . . . . . . . . . . . . . . . . . . . . . . . . 37

    3.4 Error analysis on partial pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

    Chapter 4 Least Squares Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

    4.1 Least squares problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

    4.2 Orthogonal transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    5/196

    iii

    4.3 QR decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

    Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

    Chapter 5 Classical Iterative Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 65

    5.1 Jacobi and Gauss-Seidel method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

    5.2 Convergence analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

    5.3 Convergence rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

    5.4 SOR method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

    Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

    Chapter 6 Krylov Subspace Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

    6.1 Steepest descent method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

    6.2 Conjugate gradient method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

    6.3 Practical CG method and convergence analysis . . . . . . . . . . . . . . . . . . . . 92

    6.4 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

    6.5 GMRES method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

    Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

    Chapter 7 Nonsymmetric Eigenvalue Problems . . . . . . . . . . . . . . . 1117.1 Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

    7.2 Power method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

    7.3 Inverse power method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

    7.4 QR method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

    7.5 Real version ofQR algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

    Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

    Chapter 8 Symmetric Eigenvalue Problems . . . . . . . . . . . . . . . . . . . 1318.1 Basic spectral properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

    8.2 Symmetric QR method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

    8.3 Jacobi method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    6/196

    iv

    8.4 Bisection method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

    8.5 Divide-and-conquer method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 46

    Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

    Chapter 9 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

    9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

    9.2 Background of BVMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

    9.3 Strang-type preconditioner for ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

    9.4 Strang-type preconditioner for DDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

    9.5 Strang-type preconditioner for NDDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

    9.6 Strang-type preconditioner for SPDDEs .. . . . . . . . . . . . . . . . . . . . . . . . . 177

    Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

    Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 8 5

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    7/196

    v

    Preface

    Numerical linear algebra, also called matrix computation, has been a center of sci-

    entific and engineering computing since 1946, the first modern computer was born.Most of problems in science and engineering finally become problems in matrix compu-tation. Therefore, it is important for us to study numerical linear algebra. This bookgives an elementary introduction to matrix computation and it also includes some newresults obtained in recent years. In the beginning of this book, we first give an outlineof numerical linear algebra in Chapter 1.

    In Chapter 2, we introduce Gaussian elimination, a basic direct method, for solvinggeneral linear systems. Usually, Gaussian elimination is used for solving a dense linearsystem with median size and no special structure. The operation cost of Gaussianelimination is O(n3) where n is the size of the system. The pivoting technique is also

    studied.In Chapter 3, in order to discuss effects of perturbation and error on numerical

    solutions, we introduce vector and matrix norms and study their properties. The erroranalysis on floating point operations and on partial pivoting technique is also given.

    In Chapter 4, linear least squares problems are studied. We will concentrate onthe problem of finding the least squares solution of an overdetermined linear systemAx = b where A has more rows than columns. Some orthogonal transformations andthe QR decomposition are used to design efficient algorithms for solving least squaresproblems.

    We study classical iterative methods for the solution of Ax = b in Chapter 5.Iterative methods are quite different from direct methods such as Gaussian elimination.Direct methods based on an LUfactorization of the matrix A are prohibitive in termsof computing time and computer storage if A is quite large. Usually, in most largeproblems, the matrices are sparse. The sparsity may be lost during the LU factorizationprocedure and then at the end ofLU factorization, the storage becomes a crucial issue.For such kind of problem, we can use a class of methods called iterative methods. Weonly consider some classical iterative methods in this chapter.

    In Chapter 6, we introduce another class of iterative methods called Krylov sub-space methods proposed recently. We will only study two versions among those Krylovsubspace methods: the conjugate gradient (CG) method and the generalized mini-

    mum residual (GMRES) method. The CG method proposed in 1952 is one of the bestknown iterative method for solving symmetric positive definite linear systems. TheGMRES method was proposed in 1986 for solving nonsymmetric linear systems. Thepreconditioning technique is also studied.

    Eigenvalue problems are particularly interesting in scientific computing. In Chapter

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    8/196

    vi

    7, nonsymmetric eigenvalue problems are studied. We introduce some well-knownmethods such as the power method, the inverse power method and the QR method.

    The symmetric eigenvalue problem with its nice properties and rich mathematicaltheory is one of the most interesting topics in numerical linear algebra. In Chapter 8,we will study this topic. The symmetric QR iteration method, the Jacobi method, thebisection method and a divide-and-conquer technique will be discussed in this chapter.

    In Chapter 9, we will briefly survey some of the latest developments in using bound-ary value methods for solving systems of ordinary differential equations with initialvalues. These methods require the solutions of one or more nonsymmetric, large andsparse linear systems. Therefore, we will use the GMRES method in Chapter 6 withsome preconditioners for solving these linear systems. One of the main results is thatif an A1,2-stable boundary value method is used for an m-by-m system of ODEs,then the preconditioned matrix can be decomposed as I+L where I is the identitymatrix and the rank of L is at most 2m(1+ 2). It follows that when the GMRESmethod is applied to the preconditioned system, the method will converge in at most2m(1 + 2) + 1 iterations. Applications to different delay differential equations are alsogiven.

    If any other mathematical topic is as fundamental to the mathematicalsciences as calculus and differential equations, it is numerical linear algebra. L. Trefethen and D. Bau III

    Acknowledgments: We would like to thank Professor Raymond H. F. Chan of

    the Department of Mathematics, Chinese University of Hong Kong, for his constantencouragement, long-standing friendship, financial support; Professor Z. H. Cao of theDepartment of Mathematics, Fudan University, for his many helpful discussions anduseful suggestions. We also would like to thank our friend Professor Z. C. Shi forhis encouraging support and valuable comments. Of course, special appreciation goesto two important institutions in the authors life: University of Macau and FudanUniversity for providing a wonderful intellectual atmosphere for writing this book.Most of the writing was done during evenings, weekends and holidays. Finally, thanksare also due to our families for their endless love, understanding, encouragement andsupport essential to the completion of this book. The most heartfelt thanks to all ofthem!

    The publication of the book is supported in part by the research grants No.RG024/01-02S/JXQ/FST, No.RG031/02-03S/JXQ/FST and No.RG064/03-04S/JXQ/FST fromUniversity of Macau; the research grant No.10471027 from the National Natural ScienceFoundation of China and some financial support from Shanghai Education Committeeand Fudan University.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    9/196

    vii

    Authors words on the corrected and revised second printing: In its secondprinting, we corrected some minor mathematical and typographical mistakes in thefirst printing of the book. We would like to thank all those people who pointed these

    out to us. Additional comments and some revision have been made in Chapter 7.The references have been updated. More exercises are also to be found in the book.The second printing of the book is supported by the research grant No.RG081/04-05S/JXQ/FST.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    10/196

    viii

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    11/196

    Chapter 1

    Introduction

    Numerical linear algebra (NLA) is also called matrix computation. It has been a centerof scientific and engineering computing since the first modern computer came to thisworld around 1946. Most of problems in science and engineering are finally transferredinto problems in NLA. Thus, it is very important for us to study NLA. This book givesan elementary introduction to NLA and it also includes some new results obtained inrecent years.

    1.1 Basic symbols

    We will use the following symbols throughout this book.

    LetR

    denote the set of real numbers,C

    denote the set of complex numbers andi 1. Let Rn denote the set of realn-vectors and Cn denote the set of complexn-vectors.

    Vectors will almost always be column vectors.

    Let Rmn denote the linear vector space ofm-by-nreal matrices and Cmn denotethe linear vector space ofm-by-n complex matrices.

    We will use the upper case letters such as A, B, C, and , etc, to denotematrices and use the lower case letters such as x, y , z , etc, to denote vectors.

    The symbol aij will denote the ij -th entry in a matrix A.

    The symbolAT will denote the transpose of the matrix A and A will denote theconjugate transpose of the matrix A.

    Let a1, , am Rn (or Cn). We will use span{a1, , am} to denote the linearvector space of all the linear combinations ofa1, , am.

    1

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    12/196

    2 CHAPTER 1. INTRODUCTION

    Let rank(A) denote the rank of the matrix A.

    Let dim(S) denote the dimension of the vector space S.

    We will use det(A) to denote the determinant of the matrix A and use diag(a11, , ann)to denote the n-by-n diagonal matrix:

    diag(a11, , ann) =

    a11 0 0

    0 a22. . .

    ......

    . . . . . . 0

    0 0 ann

    .

    For matrixA = [aij], the symbol |A| will denote the matrix with entries (|A|)ij =|aij|.

    The symbol Iwill denote the identity matrix, i.e.,

    I=

    1 0 00 1

    . . . ...

    ... . . .

    . . . 00 0 1

    ,

    andei will denote the i-th unit vector, i.e., the i-th column vector ofI.

    We will use to denote a norm of matrix or vector. The symbols 1, 2and will denote the p-norm withp = 1, 2, , respectively.

    As in MATLAB, in algorithms,A(i, j) will denote the (i, j)-th entry of matrixA;A(i, :) andA(:, j) will denote thei-th row and thej-th column ofA, respectively;A(i1: i2, k) will express the column vector constructed by using entries from thei1-th entry to thei2-th entry in thek-th column ofA;A(k, j1 : j2) will express therow vector constructed by using entries from the j1-th entry to the j2-th entryin the k-th row of A; A(k : l, p : q) will denote the (l k+ 1)-by-(qp+ 1)submatrix constructed by using the rows from the k-th row to the l-th row andthe columns from the p-th column to the q-th column.

    1.2 Basic problems in NLA

    NLA includes the following three main important problems which will be studied inthis book:

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    13/196

    1.3. WHY SHALL WE STUDY NUMERICAL METHODS? 3

    (1) Find the solution of linear systems

    Ax= b

    whereA is ann-by-n nonsingular matrix and b is an n-vector.

    (2) Linear least squares problems: For anym-by-nmatrixA and anm-vectorb, findann-vectorx such that

    Ax b2= minyRn

    Ay b2.

    (3) Eigenvalues problems: For any n-by-n matrix A, find a part (or all) of its eigen-values and corresponding eigenvectors. We remark here that a complex numberis called an eigenvalue ofAif there exists a nonzero vector x Cn such that

    Ax= x,

    wherex is called the eigenvector ofA associated with .

    Besides these main problems, there are many other fundamental problems in NLA,for instance, total least squares problems, matrix equations, generalized inverses, in-verse problems of eigenvalues, and singular value problems, etc.

    1.3 Why shall we study numerical methods?

    To answer this question, let us consider the following linear system,

    Ax= b

    where A is an n-by-n nonsingular matrix and x = (x1, x2, , xn)T. If we use thewell-known Cramer rule, then we have the following solution:

    x1=det(A1)

    det(A), x2=

    det(A2)

    det(A), , xn= det(An)

    det(A) ,

    whereAi, for i = 1, 2, , n, are matrices with the i-th column replaced by the vectorb. Then we should compute n+ 1 determinants det(Ai), i = 1, 2, , n, and det(A).

    There are [n!(n 1)](n + 1) = (n 1)(n + 1)!multiplications. Whenn = 25, by using a computer with 10 billion operations/sec., weneed

    24 26!1010 3600 24 365 30.6 billion years.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    14/196

    4 CHAPTER 1. INTRODUCTION

    If one uses Gaussian elimination, it requires

    n

    i=1

    (i 1)(i + 1) =n

    i=1

    i2

    n=1

    6 n(n + 1)(2n + 1) n= O(n3

    )

    multiplications. Then less than 1 second, we could solve 25-by-25 linear systems byusing the same computer. From above discussions, we note that for solving the sameproblem by using different numerical methods, the results are much different. There-fore, it is essential for us to study the properties of numerical methods.

    1.4 Matrix factorizations (decompositions)

    For any linear system Ax = b, if we can factorize (decompose) A as A = LU where L

    is a lower triangular matrix and Uis an upper triangular matrix, then we haveLy= b

    U x= y.(1.1)

    By substituting, we can easily solve (1.1) and then Ax= b. Therefore, matrix factor-izations (decompositions) are very important tools in NLA. The following theorem isbasic and useful in linear algebra, see [17].

    Theorem 1.1 (Jordan Decomposition Theorem) IfACnn, then there exists

    a nonsingular matrixX Cnn such that

    X1AX=J diag(J1, J2, , Jp),

    orA= X JX1, whereJis called the Jordan canonical form ofA and

    Ji=

    i 1 0 00 i 1

    . . . ...

    ... 0 . . .

    . . . 0...

    . . . . . . 1

    0 0 i

    Cnini ,

    fori = 1, 2, , p, are called Jordan blocks withn1 + +np= n. The Jordan canonicalform ofA is unique up to the permutation of diagonal Jordan blocks. IfA Rnn withonly real eigenvalues, then the matrixXcan be taken to be real.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    15/196

    1.5. PERTURBATION AND ERROR ANALYSIS 5

    1.5 Perturbation and error analysis

    The solutions provided by numerical algorithms are seldom absolutely correct. Usu-

    ally, there are two kinds of errors. First, errors appear in input data caused by priorcomputations or measurements. Second, there may be errors caused by algorithmsthemselves because of approximations made within algorithms. Thus, we need to carryout a perturbation and error analysis.

    (1) Perturbation.

    For a given x, we want to compute the value of function f(x). Suppose thereis a perturbation x ofx and|x|/|x| is very small. We want to find a positivenumber c(x) as small as possible such that

    |f(x + x) f(x)|

    |f(x)

    | c(x) |x|

    |x|.

    Thenc(x) is called the condition number off(x) atx. Ifc(x) is large, we say thatthe functionfis ill-conditioned at x; ifc(x) is small, we say that the function fis well-conditioned at x.

    Remark: A computational problem being ill-conditioned or not has no relationwith numerical methods that we used.

    (2) Error.

    By using some numerical methods, we calculate the value of a function f at apoint x and we obtain y. Because of the rounding error (or chopping error),usually

    y=f(x).If there exists x such that

    y= f(x + x), |x| |x|,where is a positive constant having a closed relation with numerical methodsand computers used, then we say that the method is stable if is small; themethod is unstable if is large.

    Remark: A numerical method being stable or not has no relation with computa-tional problems that we faced.

    With the perturbation and error analysis, we obtain|y f(x)||f(x)| =

    |f(x + x) f(x)||f(x)| c(x)

    |x||x| c(x).

    Therefore, whether a numerical result is accurate depends on both the stability of thenumerical method and the condition number of the computational problem.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    16/196

    6 CHAPTER 1. INTRODUCTION

    1.6 Operation cost and convergence rate

    Usually, numerical algorithms are divided into two classes:(i) direct methods;

    (ii) iterative methods.

    By using direct methods, one can obtain an accurate solution of computational prob-lems within finite steps in exact arithmetic. By using iterative methods, one can onlyobtain an approximate solution of computational problems within finite steps.

    The operation cost is an important measurement of algorithms. The operationcost of an algorithm is the total operations of +, ,, used in the algorithm.We remark that the speed of algorithms is only partially depending on the operationcost. In modern computers, the speed of operations is much faster than that of datatransfer. Therefore, sometimes, the speed of an algorithm is mainly depending on thetotal amount of data transfers.

    For direct methods, usually, we use the operation cost as a main measurement ofthe speed of algorithms. For iterative methods, we need to consider

    (i) operation cost in each iteration;

    (ii) convergence rate of the method.

    For a sequence{xk} provided by an iterative algorithm, if{xk} x, the exactsolution, and if{xk} satisfies

    xk x cxk1 x, k= 1, 2, ,where 0< c

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    17/196

    1.6. OPERATION COST AND CONVERGENCE RATE 7

    3. Let

    A=

    A11 A12A21 A22

    ,

    whereAij , fori, j = 1, 2, are square matrices with det(A11) = 0, and satisfyA11A21= A21A11.

    Thendet(A) = det(A11A22 A21A12).

    4. Show that det(I uv) = 1 vu where u, v Cm are column vectors.5. Prove Hadamards inequality forA Cnn:

    |det(A)| n

    j=1

    aj2,

    whereaj =A(:, j) andaj2= ni=1

    |A(i, j)|2 1

    2

    . When does the equality hold?

    6. Let B be nilpotent, i.e., there exists an integer k > 0 such that Bk = 0. Show that ifAB= BA, then

    det(A + B) = det(A).

    7. Let A be an m-by-n matrix and B be an n-by-m matrix. Show that the matrices AB 0

    B 0

    and

    0 0B BA

    are similar. Conclude that the nonzero eigenvalues ofAB are the same as those ofBA,

    anddet(Im+ AB) = det(In+ BA).

    8. A matrix M Cnn is Hermitian positive definite if it satisfiesM=M, xM x >0,

    for all x = 0 Cn. LetA and B be Hermitian positive definite matrices.

    (1) Show that the matrix productAB has positive eigenvalues.

    (2) Show thatAB is Hermitian if and only ifA and B commute.

    9. Show that any matrixA

    Cnn can be written uniquely in the form

    A= B + iC,

    whereB and Care Hermitian.

    10. Show that if A is skew-Hermitian, i.e., A =A, then all its eigenvalues lie on theimaginary axis.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    18/196

    8 CHAPTER 1. INTRODUCTION

    11. Let

    A=

    A11 A12A21 A22

    .

    Assume that A11, A22 are square, and A11, A22 A21A111A12 are nonsingular. Let

    B=

    B11 B12B21 B22

    be the inverse ofA. Show that

    B22= (A22 A21A111A12)1, B12= A111A12B22,

    B21= B22A21A111, B11= A111 B12A21A111.

    12. Suppose thatA and B are Hermitian with A being positive definite. Show that A+Bis positive definite if and only if all the eigenvalues ofA1B are greater than

    1.

    13. Let A be idempotent, i.e., A2 =A. Show that each eigenvalue ofA is either 0 or 1.

    14. Let A be a matrix with all entries equal to one. Show that A can be written as A =eeT, where eT = (1, 1, , 1), and A is positive semi-definite. Find the eigenvalues andeigenvectors ofA.

    15. Prove that any matrix A Cnn has a polar decomposition A = HQ, where H isHermitian positive semi-definite and Q is unitary. We recall that M Cnn is a unitarymatrix ifM1 =M. Moreover, ifAis nonsingular, thenHis Hermitian positive definiteand the polar decomposition ofA is unique.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    19/196

    Chapter 2

    Direct Methods for LinearSystems

    The problem of solving linear systems is central in NLA. For solving linear systems, ingeneral, we have two classes of methods. One is called the direct method and the otheris called the iterative method. By using direct methods, within finite steps, one canobtain an accurate solution of computational problems in exact arithmetic. By usingiterative methods, within finite steps, one can only obtain an approximate solution ofcomputational problems.

    In this chapter, we will introduce a basic direct method called Gaussian eliminationfor solving general linear systems. Usually, Gaussian elimination is used for solving a

    dense linear system with median size and no special structure.

    2.1 Triangular linear systems and LUfactorization

    We first study triangular linear systems.

    2.1.1 Triangular linear systems

    We consider the following nonsingular lower triangular linear system

    Ly= b (2.1)

    9

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    20/196

    10 CHAPTER 2. DIRECT METHODS FOR LINEAR SYSTEMS

    whereb = (b1, b2, , bn)T Rn is a known vector,y = (y1, y2, , yn)T is an unknownvector, and L = [lij] Rnn is given by

    L=

    l11 0 0l21 l22 0

    ...

    l31 l32 l33. . .

    ......

    ... ...

    . . . 0ln1 ln2 ln3 lnn

    withlii= 0, i = 1, 2, , n. By the first equation in (2.1), we have

    l11y1= b1,

    and then

    y1=

    b1l11 .

    Similarly, by the second equation in (2.1), we have

    y2= 1

    l22(b2 l21y1).

    In general, if we have already obtained y1, y2, , yi1, then by using the i-th equationin (2.1), we have

    yi= 1

    lii

    bi

    i1j=1

    lijyj

    .

    This algorithm is called the forward substitution method which needsO(n2) operations.

    Now, we consider the following nonsingular upper triangular linear system

    U x= y (2.2)

    wherex = (x1, x2, , xn)T is an unknown vector, andU Rnn is given by

    U=

    u11 u12 u13 u1n0 u22 u23

    ...

    0 0 u33. . .

    ......

    ... . . .

    . . . ...

    0

    0 unn

    with uii= 0, i= 1, 2, , n. Beginning from the last equation of (2.2), we can obtain

    xn, xn1, , x1 step by step. The xn= yn/unn and xi is given by

    xi= 1

    uii

    yi

    nj=i+1

    uijxj

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    21/196

    2.1. TRIANGULAR LINEAR SYSTEMS ANDLU FACTORIZATION 11

    for i = n 1, , 1. This algorithm is called the backward substitution method whichalso needs O(n2) operations.

    For general linear systems

    Ax= b (2.3)

    whereA Rnn andb Rn are known. If we can factorize the matrix A intoA= LU

    where L is a lower triangular matrix and U is an upper triangular matrix, then wecould find the solution of (2.3) by the following two steps:

    (1) By using the forward substitution method to find solution y ofLy = b.

    (2) By using the backward substitution method to find solutionx ofU x= y.

    Now the problem that we are facing is how to factorize the matrix A intoA = LU. Wetherefore introduce Gaussian transform matrices.

    2.1.2 Gaussian transform matrix

    LetLk = I lkeTk

    whereI Rnn is the identity matrix,lk = (0, , 0, lk+1,k, , lnk)T Rn andek Rnis the k-th unit vector. Then for any k,

    Lk =

    1 0 00 . . . ...... 1

    ...... lk+1,k 1

    ......

    ... . . . 0

    0 lnk 0 1

    is called the Gaussian transform matrix. Such a matrix is a unit lower triangularmatrix. We remark that a unit triangular matrix is a triangular matrix with ones onits diagonal. For any given vector

    x= (x1, x2, , xn)T Rn,we have

    Lkx = (x1, , xk, xk+1 xklk+1,k, , xn xklnk)T

    = (x1, , xk, 0, , 0)T

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    22/196

    12 CHAPTER 2. DIRECT METHODS FOR LINEAR SYSTEMS

    if we takelik =

    xixk

    , i= k+ 1, , n

    withxk= 0. It is easy to check thatL1k =I+ lke

    Tk

    by noting that eTk lk = 0.For a given matrix A Rnn, we have

    LkA= (I lkeTk )A= A lk(eTk A)

    andrank(lk(e

    Tk A)) = 1.

    Therefore,LkAis a rank-one modification of the matrix A.

    2.1.3 Computation ofLU factorization

    We consider the following simple example. Let

    A=

    1 5 92 4 73 3 10

    .By using the Gaussian transform matrix

    L1= 1 0 02 1 03 0 1

    ,we have

    L1A=

    1 5 90 6 110 12 17

    .Followed by using the Gaussian transform matrix

    L2= 1 0 00 1 0

    0 2 1 ,we have

    L2(L1A) U= 1 5 90 6 11

    0 0 5

    .

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    23/196

    2.1. TRIANGULAR LINEAR SYSTEMS ANDLU FACTORIZATION 13

    Therefore, we finally have

    A= LU

    where

    L (L2L1)1 =L11 L12 = 1 0 02 1 0

    3 2 1

    .For general n-by-n matrix A, we can use n 1 Gaussian transform matrices L1,

    L2, , Ln1such thatLn1 L1Ais an upper triangular matrix. In fact, letA(0) Aand assume that we have already found k1 Gaussian transform matrices L1, , Lk1Rnn such that

    A(k1) =Lk1 L1A=

    A(k1)11 A

    (k1)12

    0 A(k1)22

    whereA(k1)11 is a (k 1)-by-(k 1) upper triangular matrix and

    A(k1)22 =

    a(k1)kk a(k1)kn

    ... . . .

    ...

    a(k1)nk a(k1)nn

    .Ifa

    (k1)kk = 0, then we can use the Gaussian transform matrix

    Lk = I lkeTkwhere

    lk = (0, , 0, lk+1,k, , lnk)T

    with

    lik =a

    (k1)ik

    a(k1)kk

    , i= k+ 1, , n,

    such that the last n k entries in the k-th column of LkA(k1) become zeros. Wetherefore have

    A(k) LkA(k1) = A

    (k)11 A

    (k)12

    0 A

    (k)

    22 where A

    (k)11 is a k-by-k upper triangular matrix. Aftern 1 steps, we obtain A(n1)

    which is an upper triangular matrix that we need. Let

    L= (Ln1 L1)1, U=A(n1),

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    24/196

    14 CHAPTER 2. DIRECT METHODS FOR LINEAR SYSTEMS

    then A = LU. Now we want to show that L is a unit lower triangular matrix. Bynoting thateTjli= 0 for j < i, we have

    L = L11 L1n1= (I+ l1e

    T1)(I+ l2e

    T2) (I+ ln1eTn1)

    = I+ l1eT1 + + ln1eTn1

    = I+ [l1, l2, , ln1, 0]

    =

    1 0 0 0

    l21 1 0 0l

    31 l

    32 1

    . . . ...

    ... ...

    ... . . . 0

    ln1 ln2 ln3 1

    .This computational process of the LU factorization is called Gaussian elimination.Thus, we have the following algorithm.

    Algorithm 2.1 (Gaussian elimination)

    for k= 1 :n 1A(k+ 1 :n, k) =A(k+ 1 :n, k)/A(k, k)

    A(k+ 1 :n, k+ 1 :n) =A(k+ 1 :n, k+ 1 :n)A(k+ 1 :n, k)A(k, k+ 1 :n)end

    The operation cost of Gaussian elimination is

    n1k=1

    (n k) + 2(n k)2

    =

    n(n 1)2

    +n(n 1)(2n 1)

    3

    = 2

    3n3 + O(n2) =O(n3).

    We remark that in Gaussian elimination, a(k1)kk , k = 1, , n 1, are required tobe nonzero. We have the following theorem.

    Theorem 2.1 The entries a(i1)ii = 0, i = 1, , k, if and only if all the leading

    principal submatricesAi ofA, i= 1, , k, are nonsingular.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    25/196

    2.2. LU FACTORIZATION WITH PIVOTING 15

    Proof: By induction, for k = 1, it is obviously true. Assume that the statement istrue until k 1. We want to show that ifA1, , Ak1 are nonsingular, then

    Ak is nonsingular

    a

    (k1)

    kk = 0 .

    By assumption, we know that

    a(i1)ii = 0, i= 1, , k 1.

    By using k 1 Gaussian transform matricesL1, , Lk1, we obtain

    A(k1) =Lk1 L1A=

    A(k1)11 A

    (k1)12

    0 A(k1)22

    (2.4)

    where A(k1)11 is an upper triangular matrix with nonzero diagonal entries a

    (i1)ii , i =

    1, , k1. Therefore, thek-th leading principal submatrix ofA(k1)

    has the followingform A

    (k1)11 0 a

    (k1)kk

    .

    Let (L1)k, , (Lk1)k denote the k-th leading principal submatrices ofL1, , Lk1,respectively. By using (2.4), we obtain

    (Lk1)k (L1)kAk =

    A(k1)11 0 a

    (k1)kk

    .

    By noting that Li,i = 1,

    , k

    1, are unit lower triangular matrices, we immediatelyknow that

    det(Ak) =a(k1)kk det(A

    (k1)11 ) = 0

    if and only ifa(k1)kk = 0.

    Thus, we have

    Theorem 2.2 If all the leading principal submatrices Ai of a matrix A Rnn arenonsingular fori= 1, , n 1, then there exists a uniqueLU factorization ofA.

    2.2 LU factorization with pivotingBefore we study pivoting techniques, we first consider the following simple example:

    0.3 1011 11 1

    x1x2

    =

    0.70.9

    .

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    26/196

    16 CHAPTER 2. DIRECT METHODS FOR LINEAR SYSTEMS

    If we using Gaussian elimination with the 10-decimal-digit floating point arithmetic,we have

    L= 1 00.3333333333 1012 1 and U = 0.3 1011 1

    0 0.3333333333 1012

    .

    Then the computational solution is

    x= (0.0000000000, 0.7000000000)T

    which is not good comparing with the accurate solution

    x= (0.2000000000006

    , 0.6999999999994

    )T.

    If we just interchange the first equation and the second equation, we have 1 1

    0.3 1011 1

    x1x2

    =

    0.90.7

    .

    By using Gaussian elimination with the 10-decimal-digit floating point arithmetic again,we have L= 1 0

    0.3 1011 1

    , U= 1 10 1

    .

    Then the computational solution is

    x= (0.2000000000, 0.7000000000)T

    which is very good. So we need to introduce permutations into Gaussian elimination.We first define a permutation matrix.

    Definition 2.1 A permutation matrixP is an identity matrix with permuted rows.

    The important properties of the permutation matrix are included in the followinglemma. Its proof is straightforward.

    Lemma 2.1 LetP, P1, P2 Rnn

    be permutation matrices andX Rnn

    . Then

    (i) P X is the same as X with its rows permuted. XP is the same as X with itscolumns permuted.

    (ii) P1 =PT.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    27/196

    2.2. LU FACTORIZATION WITH PIVOTING 17

    (iii) det(P) = 1.(iv) P1 P2 is also a permutation matrix.

    Now we introduce the main theorem of this section.

    Theorem 2.3 If A is nonsingular, then there exist permutation matricesP1 andP2,a unit lower triangular matrix L, and a nonsingular upper triangular matrix U suchthat

    P1AP2= LU.

    Only one ofP1 andP2 is necessary.

    Proof: We use induction on the dimension n. Forn = 1, it is obviously true. Assume

    that the statement is true for n 1. IfA is nonsingular, then it has a nonzero entry.Choose permutation matrices P

    1 and P

    2 such that the (1, 1)-th position ofP

    1AP

    2 isnonzero. Now we write a desired factorization and solve for unknown components:

    P

    1AP

    2 =

    a11 A12A21 A22

    =

    1 0L21 I

    u11 U120 A22

    =

    u11 U12L21u11 L21U12+A22

    ,

    (2.5)

    whereA22,A22 are (n 1)-by-(n 1) matrices, andL21, UT12 are (n 1)-by-1 matrices.

    Solving for the components of this 2-by-2 block factorization, we get

    u11= a11= 0, U12= A12,

    and

    L21u11= A21, A22= L21U12+A22.Therefore, we obtain

    L21 = A21

    a11, A22= A22 L21U12.

    We want to apply induction toA22, but to do so we need to check thatdet( A22) = 0.

    Since

    det(P

    1AP

    2) = det(A) = 0

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    28/196

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    29/196

    2.2. LU FACTORIZATION WITH PIVOTING 19

    Algorithm 2.2 (Gaussian elimination with complete pivoting)

    for k= 1 :n 1choose p, q, (k p, q n)such that

    |A(p, q)| = max {|A(i, j)| : i= k : n, j = k : n}A(k, 1 :n) A(p, 1 :n)A(1 :n, k) A(1 :n, q)if A(k, k) = 0

    A(k+ 1 :n, k) =A(k+ 1 :n, k)/A(k, k)A(k+ 1 :n, k+ 1 :n) =A(k+ 1 :n, k+ 1 :n)

    A(k+ 1 :n, k)A(k, k+ 1 :n)else

    stop

    endend

    We remark that although theLUfactorization with complete pivoting can overcomesome shortcomings of the LU factorization without pivoting, the cost of completepivoting is very high. Usually, it requires O(n3) operations in comparison with entriesof the matrix for pivoting.

    In order to reduce the operation cost of pivoting, the LU factorization with partial

    pivoting is proposed. In partial pivoting, at the k-th step, we choose a(k1)

    pk from the

    submatrixA(k1)22 which satisfies

    |a(k1)pk | = max|a(k1)ik | :k i n

    .

    When A is nonsingular, the LU factorization with partial pivoting can be carried outuntil we finally obtain

    P A= LU.

    In this algorithm, the operation cost in comparison with entries of the matrix forpivoting is O(n2). We have

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    30/196

    20 CHAPTER 2. DIRECT METHODS FOR LINEAR SYSTEMS

    Algorithm 2.3 (Gaussian elimination with partial pivoting)

    for k= 1 :n

    1

    choose p, (k p n)such that|A(p, k)| = max {|A(i, k)| : i= k : n}A(k, 1 :n) A(p, 1 :n)if A(k, k) = 0

    A(k+ 1 :n, k) =A(k+ 1 :n, k)/A(k, k)A(k+ 1 :n, k+ 1 :n) =A(k+ 1 :n, k+ 1 :n)

    A(k+ 1 :n, k)A(k, k+ 1 :n)else

    stopend

    end

    2.3 Cholesky factorization

    LetA Rnn be symmetric positive definite, i.e., it satisfiesA= AT, xTAx >0,

    for all x = 0 Rn. We have

    Theorem 2.4 LetA Rnn be symmetric positive definite. Then there exists a lowertriangular matrixL Rnn with positive diagonal entries such that

    A= LLT.

    This factorization is called the Cholesky factorization.

    Proof: SinceAis positive definite, all the principal submatrices ofAshould be positivedefinite. By Theorem 2.2, there exist a unit lower triangular matrixL and an uppertriangular matrix U such that

    A=LU.Let

    D= diag(u11, , unn),

    U =D1U,

    whereuii> 0, for i = 1, , n. Then we haveUTDLT =AT =A =LDU .Therefore, LTU1 =D1 UTLD.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    31/196

    2.3. CHOLESKY FACTORIZATION 21

    We note thatLTU1 is a unit upper triangular matrix and D1 UTLD is a lowertriangular matrix. Hence

    LTU1 =I=D1 UTLDwhich impliesU=LT. Thus

    A=LDLT.Let

    L=Ldiag(u11, , unn).We finally have

    A= LLT.

    Thus, when a matrix A is symmetric positive definite, we could find the solution ofthe systemAx = b by the following three steps:

    (1) Find the Cholesky factorization ofA: A= LLT.

    (2) Find solutiony ofLy = b.

    (3) Find solutionx ofLTx= y.

    From Theorem 2.4, we know that we do not need a pivoting in Cholesky factor-ization. Also we could calculate L directly through a comparison in the correspondingentries between two sides ofA = LLT. We have the following algorithm.

    Algorithm 2.4 (Cholesky factorization)

    for k= 1 :n

    A(k, k) =

    A(k, k)A(k+ 1 :n, k) =A(k+ 1 :n, k)/A(k, k)for j = k+ 1 :n

    A(j : n, j) =A(j : n, j) A(j : n, k)A(j, k)end

    end

    The operation cost of Cholesky factorization is n3

    /3.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    32/196

    22 CHAPTER 2. DIRECT METHODS FOR LINEAR SYSTEMS

    Exercises:

    1. Let S, T Rnn be upper triangular matrices such that

    (ST I)x= b

    is a nonsingular system. Find an algorithm ofO(n2) operations for computingx.

    2. Show that theLDLT factorization of a symmetric positive definite matrix A is unique.

    3. LetA Rnn be symmetric positive definite. Find an algorithm for computing an uppertriangular matrix U Rnn such that A = U UT.

    4. Let A = [aij ] Rnn be strictly diagonally dominant matrix, i.e.,

    |akk | >n

    j=1j=k|akj |, k= 1, 2, , n.

    Prove that a strictly diagonally dominant matrix is nonsingular, and a strictly diagonallydominant symmetric matrix with positive diagonal entries is positive definite.

    5. Let

    A=

    A11 A12A21 A22

    with A11 being ak-by-k nonsingular matrix. Then

    S= A22 A21A111A12is called the Schur complement ofA11 in A. Show that after k steps of Gaussian elimi-

    nation without pivoting, A(k1)22 =S.

    6. Let A be a symmetric positive definite matrix. At the end of the first step of Gaussianelimination, we have

    a11 aT1

    0 A22

    .

    Prove thatA22 is also symmetric positive definite.

    7. Let A = [aij ] Rnn be a strictly diagonally dominant matrix. After one step ofGaussian elimination, we have

    a11 a

    T1

    0 A22 .

    Show thatA22 is also strictly diagonally dominant.

    8. Show that ifP AQ= LUis obtained via Gaussian elimination with pivoting, then |uii| |uij |, forj = i + 1, , n.

    9. Let H=A + iB be a Hermitian positive definite matrix, where A, B Rnn.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    33/196

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    34/196

    24 CHAPTER 2. DIRECT METHODS FOR LINEAR SYSTEMS

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    35/196

    Chapter 3

    Perturbation and Error Analysis

    In this chapter, we will discuss effects of perturbation and error on numerical solutions.The error analysis on floating point operations and on partial pivoting technique is alsogiven. It is well-known that the essential notions of distance and size in linear vectorspaces are captured by norms. We therefore need to introduce vector and matrix normsand study their properties before we develop our perturbation and error analysis.

    3.1 Vector and matrix norms

    We first introduce vector norms.

    3.1.1 Vector norms

    Letx= (x1, x2, , xn)T Rn.

    Definition 3.1 A vector norm onRn is a function that assigns to eachx Rn a realnumberx, called the norm ofx, such that the following three properties are satisfied

    for allx, y Rn and all R:(i)x >0 ifx = 0, andx = 0 if and only ifx= 0;

    (ii)x = || x;

    (iii)x + y x + y.A useful class of vector norms is the p-norm defined by

    xp

    ni=1

    |xi|p 1

    p

    25

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    36/196

    26 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS

    where 1 p. The following p-norms are the most commonly used norms in practice:

    x1=ni=1

    |xi|, x2 = ni=1

    |xi|212

    , x = max1in

    |xi|.

    The Cauchy-Schwarz inequality concerning 2 is given as follows,

    |xTy| x2y2

    for x, y Rn, which is a special case of the Holder inequality given as follows,

    |xTy| xpyq

    where 1/p + 1/q= 1.A very important property of vector norms on Rn is that all the vector norms on

    Rn are equivalent as the following theorem said, see [35].

    Theorem 3.1 If and are two norms onRn, then there exist two positiveconstantsc1 andc2 such that

    c1x x c2x

    for allx Rn.

    For example, ifx Rn, then we have

    x2 x1

    nx2,

    x x2

    nxand

    x x1 nx.

    We remark that for any sequence of vectors

    {xk

    }wherexk = (x

    (k)1 ,

    , x

    (k)n )T

    Rn,

    and x = (x1, , xn)T Rn, by Theorem 3.1, one can prove that

    limk

    xk x = 0 limk

    |x(k)i xi| = 0,

    for i= 1, , n.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    37/196

    3.1. VECTOR AND MATRIX NORMS 27

    3.1.2 Matrix norms

    Let

    A= [aij]ni,j=1 R

    nn

    .We now turn our attention to matrix norms.

    Definition 3.2 A matrix norm is a function that assigns to each A Rnn a realnumberA, called the norm ofA, such that the following four properties are satisfied

    for allA, B Rnn and all R:(i)A >0 ifA = 0, andA = 0 if and only ifA= 0;

    (ii)A = || A;

    (iii)A + B A + B;(iv)AB A B.

    An important property of matrix norms on Rnn is that all the matrix norms onRnn are equivalent. For the relation between a vector norm and a matrix norm, wehave

    Definition 3.3 If a matrix norm M and a vector norm v satisfy

    Axv AMxv,

    forA Rnn andx Rn, then these norms are called mutually consistent.

    For any vector norm v, we can define a matrix norm in the following naturalway:

    AM maxx=0

    Axvxv = maxxv=1 Axv.

    The most important matrix norms are the matrix p-norms induced by the vector p-norms for p = 1, 2, . We have the following theorem.

    Theorem 3.2 Let

    A= [aij]ni,j=1 Rnn.Then we have

    (i)A1 = max1jn

    ni=1

    |aij|.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    38/196

    28 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS

    (ii)A = max1in

    nj=1

    |aij|.

    (iii)A2= max(ATA), wheremax(ATA) is the largest eigenvalue ofATA.Proof: We only give the proof of (i) and (iii). In the following, we always assume thatA = 0.

    For (i), we partition the matrix A by columns:

    A= [a1, , an].

    Let

    = aj01 = max1jn

    aj1.

    Then for any vector x Rn which satisfiesx1=ni=1

    |xi| = 1, we have

    Ax1 = nj=1

    xjaj

    1

    n

    j=1|xj | aj1

    (n

    j=1|xj|) max

    1jnaj1

    = aj01= .Letej0 denote the j0-th unit vector and then

    Aej01= aj01 = .

    Therefore

    A1= maxx1=1

    Ax1 = = max1jn

    aj1 = max1jn

    ni=1

    |aij|.

    For (iii), we have

    A2 = maxx2=1

    Ax2= maxx2=1

    [(Ax)T(Ax)]1/2

    = maxx2=1[xT(ATA)x]1/2.

    SinceATAis positive semi-definite, its eigenvalues can be assumed to be in the followingorder:

    1 2 n 0.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    39/196

    3.1. VECTOR AND MATRIX NORMS 29

    Letv1, v2, , vn Rn

    denote the orthonormal eigenvectors corresponding to 1, 2, , n, respectively. Thenfor any vector x Rn withx2= 1, we have

    x=ni=1

    ivi,ni=1

    2i = 1.

    Therefore,

    xTATAx=ni=1

    i2i 1.

    On the other hand, letx = v1, we have

    xT

    AT

    Ax= vT1A

    T

    Av1 = vT11v1= 1.

    Thus

    A2= maxx2=1

    Ax2=

    1=

    max(ATA).

    We have the following theorem for the norm 2.

    Theorem 3.3 LetA Rnn. Then we have

    (i)A2= maxx2=1

    maxy2=1

    |yAx|, wherex, y Cn.

    (ii)AT2= A2=ATA2.

    (iii)A2= QAZ2, for any orthogonal matricesQ andZ. We recall that a matrixM Rnn is called orthogonal ifM1 =MT.

    Proof: We only prove (i). We first introduce the dual norm D of a vector norm defined as follows,

    yD = maxx=1

    |yx|.

    For 2, we have by the Cauchy-Schwarz inequality,

    |yx| y2x2with equality when x = 1y2 y. Therefore, the dual norm of 2 is given by

    yD2 = maxx2=1

    |yx| = maxx2=1

    y2x2= y2.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    40/196

    30 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS

    So, 2 is its own dual. Now, we consider

    A

    2 = max

    x2=1 Ax

    2 = max

    x2=1 Ax

    D2

    = maxx2=1

    maxy2=1

    |(Ax)y|

    = maxx2=1

    maxy2=1

    |yAx|.

    Another useful norm is the Frobenius norm which is defined by

    A

    F

    n

    j=1n

    i=1 |aij|212

    .

    One of the most important properties of F is that for any orthogonal matrices Qand Z,

    AF = QAZF.In the following, we will extend our discussion on norms to the field ofC. We remark

    that from the viewpoint of norms, there is no essential difference between matrices orvectors in the field ofR and matrices or vectors in the field ofC.

    Definition 3.4 Let A Cnn. Then the set of all the eigenvalues ofA is called thespectrum ofA and

    (A) = max{|| : belongs to the spectrum ofA}

    is called the spectral radius ofA.

    For the relation between the spectral radius and matrix norms, we have

    Theorem 3.4 LetA Cnn. Then(i) For any matrix norm, we have

    (A) A.

    (ii) For any >0, there exists a norm defined onCnn such that

    A (A) + .

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    41/196

    3.1. VECTOR AND MATRIX NORMS 31

    Proof: For (i), letx Cn satisfyx = 0, Ax= x, || =(A).

    Then we have(A)xeT1 = xeT1 = AxeT1 A xeT1 .

    Hence(A) A.

    For (ii), by using Theorem 1.1 (Jordan Decomposition Theorem), we know thatthere is a nonsingular matrix X Cnn such that

    X1AX=

    1 1

    2 2. . .

    . . .

    n1 n1n

    wherei= 1 or 0. For any given >0, let

    D= diag(1, , 2, , n1),

    then

    D1 X1AXD=

    1 12 2

    . . . . . .

    n1 n1

    n

    .

    Now, defineG = D1 X1GXD, G Cnn.

    It is easy to see this matrix norm actually is induced by the vector norm definedas follows:

    xXD = (XD)1x, x Cn.Therefore,

    A= D1 X1AXD = max1in

    (|i| + |i|) (A) + ,

    wheren= 0.

    We remark that for any sequence of matrices{A(k)} where A(k) = [a(k)ij ] Rnn,and A = [aij] Rnn,

    limk

    A(k) A = 0 limk

    a(k)ij =aij,

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    42/196

    32 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS

    for i, j= 1, , n.

    Theorem 3.5 LetA

    Cnn. Then

    limk

    Ak = 0 (A)< 1.

    Proof: We first assume thatlimk

    Ak = 0.

    Let be an eigenvalue ofA such that (A) =||. Then k is an eigenvalue ofAk foranyk . By Theorem 3.4 (i), we know that for any k,

    (A)k = ||k = |k| (Ak) Ak.Therefore,

    limk

    (A)k = 0

    which implies (A)< 1.Conversely, assume that(A)< 1. By Theorem 3.4 (ii), there exists a matrix norm

    such thatA

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    43/196

    3.2. PERTURBATION ANALYSIS FOR LINEAR SYSTEMS 33

    Corollary 3.1 Letbe a norm defined onCnn withI = 1 andA Cnn satisfyA

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    44/196

    34 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS

    Obviously, the condition number depends on the matrix norm used. When (A) issmall, thenA is said to be well-conditioned, whereas if(A) is large, then A is said tobe ill-conditioned. Note that for any p-norm, we have

    1 = I = A A1 A A1 =(A).

    Let x be an approximation of the exact solution x ofAx = b. The error vector isdefined as follows,

    e= x x,i.e.,

    x= x + e. (3.2)

    The absolute error is given by

    e = x xfor any vector norm. Ifx = 0, then the relative error is defined by

    ex =

    x xx .

    We have by substituting (3.2) into Ax = b,

    A(x + e) =Ax + Ae= b.

    Therefore,

    Ax= b Ae=b.The xis the exact solution ofAx= bwherebis a perturbed vector ofb. Sincex= A1band x= A1b, we have

    x x = A1(b b) A1 b b. (3.3)

    Similarly,

    b = Ax A x,i.e.,

    1

    xA

    b . (3.4)Combining (3.3), (3.4) and (3.1), we obtain the following theorem which gives the effectof perturbations of the vector b on the solution of Ax = b in terms of the conditionnumber.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    45/196

    3.2. PERTURBATION ANALYSIS FOR LINEAR SYSTEMS 35

    Theorem 3.7 Let x be an approximate solution of the exact solution x of Ax = b.Then

    x

    x

    x (A)b

    b

    b .

    The next theorem includes the effect of perturbations of the coefficient matrix Aon the solution ofAx = b in terms of the condition number.

    Theorem 3.8 LetA be a nonsingular matrix andA be a perturbed matrix ofA suchthat

    A A A1

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    46/196

    36 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS

    we getx x

    x

    (1 A1E)1

    A1E + (A)

    b

    .

    By using

    A1E A1 E =(A)EA ,

    we finally havex x

    x (A)

    1 (A) EA

    EA +

    b

    .

    Theorems 3.7 and 3.8 give upper bounds for the relative error of x in terms ofthe condition number ofA. From Theorems 3.7 and 3.8, we know that if A is well-

    conditioned, i.e.,(A) is small, the relative error in x will be small if the relative errorsin both A and b are small.

    Corollary 3.2 Let be any matrix norm withI = 1 and A be a nonsingularmatrix withA +A being a perturbed matrix ofA such that

    A1A

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    47/196

    3.3. ERROR ANALYSIS ON FLOATING POINT ARITHMETIC 37

    By using identity

    B1 =A1 B1(B A)A1,

    we have,(A +A)1 A1 = (A +A)1AA1.

    Then

    (A +A)1 A1 A1 A (A +A)1 A12 A1 r .

    Finally, we obtain

    (A +A)1 A1A1

    A1 A1 r

    A1 A1 A1

    A

    (A)

    1 (A) AA

    AA .

    3.3 Error analysis on floating point arithmetic

    In computers, the floating point numbers fare expressed as

    f= J, L J U,

    where is the base, J is the order, and is the fraction. Usually, has the followingform:

    = 0.d1d2

    dt

    wheret is the length (precision) of , d1= 0, and 0 di< , for i = 2, , t.Let

    F= {0} {f :f = J, 0 di< , d1= 0, L J U}.ThenFcontains

    2( 1)t1(U L + 1) + 1floating point numbers. These numbers are symmetrically distributed in the intervals[m, M] and [M, m], where

    m= L1, M =U(1 t). (3.5)

    We remark thatF is only a finite set which cannot contain all the real numbers inthese two intervals.

    Letf l(x) denote the floating point number of any real number x. Then

    f l(x) = 0, forx = 0.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    48/196

    38 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS

    Ifm |x| M, by rounding, f l(x) is the minimum of|f l(x) x| = min

    fF|f x|.

    By chopping, f l(x) is the minimum of

    |f l(x) x| = min|f||x|

    |f x|.

    For example, let = 10, t = 3, L = 0 and U = 2. We consider the floating pointexpression ofx = 5.45627. By rounding, we have f l(x) = 0.546 10. By chopping, wehave f l(x) = 0.545 10. The following theorem gives an estimate of the relative errorof floating point expressions.

    Theorem 3.9 Letm

    |x

    | M, wherem andM are defined by (3.5). Then

    f l(x) =x(1 + ), || u,whereu is the machine precision, i.e.,

    u=

    12

    1t, by rounding,

    1t, by chopping.

    Proof: In the following, we assume that x= 0 and x > 0. Let be an integer andsatisfy

    1 x < . (3.6)Since the order of floating point numbers in [1, ) is , all the numbers

    0.d1d2 dt

    are distributed in the interval with distancet. For the rounding error, by (3.6), wehave

    |f l(x) x| 12

    t =1

    211t 1

    2x1t,

    i.e.,|f l(x) x|

    x 1

    21t.

    For the chopping error, we have

    |f l(x) x| t =11t x1t,i.e.,

    |f l(x) x|x

    1t.

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    49/196

  • 8/12/2019 Numerical Linear Algebra Applications Jin

    50/196

    40 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS

    Let $x = u$. By the left inequality of (3.9), we have

$$(1+u)^n \le e^{nu}. \tag{3.10}$$

    Let $x = nu$. By the right inequality of (3.9), we have

$$e^{nu} \le 1 + 1.01nu. \tag{3.11}$$

    Combining (3.10) and (3.11), we have

$$(1+u)^n \le 1 + 1.01nu. \tag{3.12}$$

    By (3.7), (3.8) and (3.12), the proof is complete.

    We consider the following example.

    Example 3.1. For given $x, y \in \mathbb{R}^n$, estimate an upper bound of $|fl(x^T y) - x^T y|$.

    Let

$$S_k = fl\Big(\sum_{i=1}^{k} x_i y_i\Big).$$

    By Theorem 3.10, we have

$$S_1 = x_1 y_1(1+\delta_1), \qquad |\delta_1| \le u,$$

    and

$$S_k = fl\big(S_{k-1} + fl(x_k y_k)\big) = \big[S_{k-1} + x_k y_k(1+\delta_k)\big](1+\epsilon_k), \qquad |\delta_k|, |\epsilon_k| \le u.$$

    Therefore,

$$fl(x^T y) = S_n = \sum_{i=1}^{n} x_i y_i (1+\delta_i)\prod_{j=i}^{n}(1+\epsilon_j) = \sum_{i=1}^{n}(1+\gamma_i)x_i y_i,$$

    where

$$1 + \gamma_i = (1+\delta_i)\prod_{j=i}^{n}(1+\epsilon_j)$$

    with $\epsilon_1 = 0$. Thus, if $nu \le 0.01$, we then have by Theorem 3.11,

$$|fl(x^T y) - x^T y| \le \sum_{i=1}^{n}|\gamma_i|\,|x_i y_i| \le 1.01nu\sum_{i=1}^{n}|x_i y_i|.$$
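    This bound is easy to test numerically. The sketch below (an illustration, not from the text; it assumes NumPy, uses float32 as the working precision with a float64 reference, and takes $u$ as the float32 unit roundoff) compares the observed error of a dot product with $1.01\,nu\sum_i |x_i y_i|$:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)

    exact = np.dot(x, y)                                           # float64 reference value
    computed = float(np.dot(x.astype(np.float32), y.astype(np.float32)))

    u = np.finfo(np.float32).eps / 2                               # unit roundoff (rounding)
    bound = 1.01 * n * u * float(np.sum(np.abs(x * y)))            # 1.01*n*u*sum|x_i y_i|

    # The analysis assumes recursive summation; library dot products are often
    # more accurate, so the observed error is typically far below the bound.
    print(abs(computed - exact), "<=", bound)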


    Before we finish this section, let us briefly discuss the floating point analysis of elementary matrix operations. We first introduce the following notation:

$$|E| = [\,|e_{ij}|\,],$$

    where $E = [e_{ij}] \in \mathbb{R}^{n\times n}$, and

$$|E| \le |F| \iff |e_{ij}| \le |f_{ij}|$$

    for $i, j = 1, 2, \dots, n$. Let $A, B \in \mathbb{R}^{n\times n}$ be matrices with entries in $F$, and $\alpha \in F$. By Theorem 3.10, we have

$$fl(\alpha A) = \alpha A + E, \qquad |E| \le u|\alpha A|,$$

    and

$$fl(A+B) = (A+B) + E, \qquad |E| \le u|A+B|.$$

    From Example 3.1, we also have

$$fl(AB) = AB + E, \qquad |E| \le 1.01nu\,|A|\,|B|.$$

    Note that $|A|\,|B|$ may be much larger than $|AB|$. Therefore the relative error of $AB$ may not be small.

    3.4 Error analysis on partial pivoting

    We will show that if Gaussian elimination with partial pivoting is used to solve $Ax = b$, then the computed solution $\hat{x}$ satisfies

$$(A+E)\hat{x} = b,$$

    where $E$ is an error matrix. An upper bound on $E$ is also given. We first study the rounding error of the LU factorization of $A$.

    Lemma 3.1 Let $A \in \mathbb{R}^{n\times n}$ with floating point entries. Assume that $A$ has an LU factorization and $6nu \le 1$, where $u$ is the machine precision. Then the factors $L$ and $U$ computed by Gaussian elimination satisfy $LU = A + E$, where

$$|E| \le 3nu(|A| + |L|\,|U|).$$


    Proof: We use induction on $n$. Obviously, Lemma 3.1 is true for $n = 1$. Assume that the lemma holds for $n-1$. Now, we consider a matrix $A \in \mathbb{R}^{n\times n}$:

$$A = \begin{pmatrix} \alpha & w^T \\ v & A_1 \end{pmatrix},$$

    where $A_1 \in \mathbb{R}^{(n-1)\times(n-1)}$. At the first step of Gaussian elimination, we compute the vector $l_1 = fl(v/\alpha)$ and modify the matrix $A_1$ as

$$\tilde{A}_1 = fl\big(A_1 - fl(l_1 w^T)\big).$$

    By Theorem 3.10, we have

$$l_1 = v/\alpha + f, \qquad |f| \le u\,|v|/|\alpha|, \tag{3.13}$$

    and

$$\tilde{A}_1 = A_1 - l_1 w^T + F, \qquad |F| \le (2+u)u(|A_1| + |l_1|\,|w|^T). \tag{3.14}$$

    For $\tilde{A}_1$, by using the induction assumption, we obtain an LU factorization with a unit lower triangular matrix $L_1$ and an upper triangular matrix $U_1$ such that

$$L_1 U_1 = \tilde{A}_1 + E_1,$$

    where

$$|E_1| \le 3(n-1)u(|\tilde{A}_1| + |L_1|\,|U_1|).$$

    Thus, we have

$$LU = \begin{pmatrix} 1 & 0 \\ l_1 & L_1 \end{pmatrix}\begin{pmatrix} \alpha & w^T \\ 0 & U_1 \end{pmatrix} = A + E,$$

    where

$$E = \begin{pmatrix} 0 & 0 \\ \alpha f & E_1 + F \end{pmatrix}.$$

    By using (3.14), we obtain

$$|\tilde{A}_1| \le (1 + 2u + u^2)(|A_1| + |l_1|\,|w|^T).$$

    Therefore, by using the condition $6nu \le 1$, we have

$$\begin{aligned}
|E_1 + F| &\le |E_1| + |F| \\
&\le 3(n-1)u(|\tilde{A}_1| + |L_1|\,|U_1|) + (2+u)u(|A_1| + |l_1|\,|w|^T) \\
&\le 3(n-1)u\big[(1+2u+u^2)(|A_1| + |l_1|\,|w|^T) + |L_1|\,|U_1|\big] + (2+u)u(|A_1| + |l_1|\,|w|^T) \\
&\le u\big\{3n - 1 + [6n + 3(n-1)u - 5]u\big\}(|A_1| + |l_1|\,|w|^T) + 3(n-1)u\,|L_1|\,|U_1| \\
&\le 3nu(|A_1| + |l_1|\,|w|^T + |L_1|\,|U_1|).
\end{aligned}$$

    Combining with (3.13), we obtain

$$|E| = \begin{pmatrix} 0 & 0 \\ |\alpha|\,|f| & |E_1 + F| \end{pmatrix} \le 3nu\begin{pmatrix} 0 & 0 \\ |v| & |A_1| + |l_1|\,|w|^T + |L_1|\,|U_1| \end{pmatrix}$$

$$\le 3nu\left[\begin{pmatrix} |\alpha| & |w|^T \\ |v| & |A_1| \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ |l_1| & |L_1| \end{pmatrix}\begin{pmatrix} |\alpha| & |w|^T \\ 0 & |U_1| \end{pmatrix}\right] = 3nu(|A| + |L|\,|U|).$$

    The proof is complete.

    Corollary 3.3 Let $A \in \mathbb{R}^{n\times n}$ be nonsingular with floating point entries and $6nu \le 1$. Assume that by using Gaussian elimination with partial pivoting, we obtain

$$LU = PA + E,$$

    where $L = [l_{ij}]$ is a unit lower triangular matrix with $|l_{ij}| \le 1$, $U$ is an upper triangular matrix, and $P$ is a permutation matrix. Then $E$ satisfies the following inequality:

$$|E| \le 3nu(|PA| + |L|\,|U|).$$
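    Corollary 3.3 can be checked numerically. The following sketch (an illustration only, not from the text; it assumes NumPy and SciPy, whose `scipy.linalg.lu` returns factors with $A = PLU$, so $P^T A$ plays the role of $PA$ above) verifies the componentwise bound for a random matrix:

    import numpy as np
    from scipy.linalg import lu

    rng = np.random.default_rng(2)
    n = 200
    A = rng.standard_normal((n, n))

    P, L, U = lu(A)              # SciPy convention: A = P @ L @ U
    PA = P.T @ A                 # row-permuted A, the "PA" of Corollary 3.3
    E = L @ U - PA

    u_mach = np.finfo(float).eps / 2
    bound = 3 * n * u_mach * (np.abs(PA) + np.abs(L) @ np.abs(U))
    print(np.all(np.abs(E) <= bound))   # expected: True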


    After we obtain the LU factorization of $A$, the problem of solving $Ax = b$ becomes the problem of solving the following two triangular systems:

$$Ly = Pb, \qquad Ux = y.$$

    Therefore, we need to estimate the rounding error of solving triangular systems.

    Lemma 3.2 Let $S \in \mathbb{R}^{n\times n}$ be a nonsingular triangular matrix with floating point entries and $1.01nu \le 0.01$. By using the method proposed in Section 2.1.1 to solve $Sx = b$, we obtain a computed solution $\hat{x}$ which satisfies

$$(S+H)\hat{x} = b,$$

    where

$$|H| \le 1.01nu|S|.$$

    Proof: We use induction on $n$. Without loss of generality, let $S = L$ be a lower triangular matrix. Obviously, Lemma 3.2 is true for $n = 1$. Assume that the lemma is true for $n-1$. Now, we consider a lower triangular matrix $L \in \mathbb{R}^{n\times n}$. Let $\hat{x}$ be the computed solution of $Lx = b$, and partition $L$, $b$ and $\hat{x}$ as follows:

$$L = \begin{pmatrix} l_{11} & 0 \\ l_1 & L_1 \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ c \end{pmatrix}, \qquad \hat{x} = \begin{pmatrix} \hat{x}_1 \\ \hat{y} \end{pmatrix},$$

    where $c, \hat{y} \in \mathbb{R}^{n-1}$ and $L_1 \in \mathbb{R}^{(n-1)\times(n-1)}$. By Theorem 3.10, we have

$$\hat{x}_1 = fl(b_1/l_{11}) = \frac{b_1}{l_{11}(1+\epsilon_1)}, \qquad |\epsilon_1| \le u. \tag{3.15}$$

    Note that $\hat{y}$ is the computed solution of the $(n-1)$-by-$(n-1)$ system

$$L_1 y = fl(c - \hat{x}_1 l_1).$$

    By the induction assumption, we have

$$(L_1 + H_1)\hat{y} = fl(c - \hat{x}_1 l_1),$$

    where

$$|H_1| \le 1.01(n-1)u|L_1|. \tag{3.16}$$

    By Theorem 3.10 again, we obtain

$$fl(c - \hat{x}_1 l_1) = fl\big(c - fl(\hat{x}_1 l_1)\big) = (I+D)^{-1}\big(c - \hat{x}_1 l_1 - \hat{x}_1 D' l_1\big),$$

    where

$$D = \mathrm{diag}(\epsilon_2, \dots, \epsilon_n), \qquad D' = \mathrm{diag}(\delta_2, \dots, \delta_n),$$

    with $|\epsilon_i|, |\delta_i| \le u$, $i = 2, \dots, n$. Therefore,

$$\hat{x}_1 l_1 + \hat{x}_1 D' l_1 + (I+D)(L_1+H_1)\hat{y} = c,$$

    and then

$$(L+H)\hat{x} = b,$$

    where

$$H = \begin{pmatrix} \epsilon_1 l_{11} & 0 \\ D' l_1 & H_1 + D(L_1+H_1) \end{pmatrix}.$$

    By using (3.15), (3.16) and the condition $1.01nu \le 0.01$, we have

$$\begin{aligned}
|H| &\le \begin{pmatrix} |\epsilon_1|\,|l_{11}| & 0 \\ |D'|\,|l_1| & |H_1| + |D|(|L_1|+|H_1|) \end{pmatrix}
\le \begin{pmatrix} u|l_{11}| & 0 \\ u|l_1| & |H_1| + u(|L_1|+|H_1|) \end{pmatrix} \\
&\le u\begin{pmatrix} |l_{11}| & 0 \\ |l_1| & [1.01(n-1) + 1 + 1.01(n-1)u]\,|L_1| \end{pmatrix}
\le 1.01nu|L|.
\end{aligned}$$

    We then have the main theorem of this section.

    Theorem 3.12 Let $A \in \mathbb{R}^{n\times n}$ be a nonsingular matrix with floating point entries and $1.01nu \le 0.01$. If Gaussian elimination with partial pivoting is used to solve $Ax = b$, then we obtain a computed solution $\hat{x}$ which satisfies

$$(A + \delta A)\hat{x} = b,$$

    where

$$\|\delta A\|_\infty \le u(3n + 5.04n^3)\rho\,\|A\|_\infty \tag{3.17}$$

    with the growth factor

$$\rho \equiv \frac{1}{\|A\|_\infty}\max_{i,j,k}|a_{ij}^{(k)}|.$$

    Proof: By using Gaussian elimination with partial pivoting, we have the following two triangular systems:

$$Ly = Pb, \qquad Ux = y.$$


    By using Lemma 3.2, the computed solution $\hat{x}$ satisfies

$$(L+F)(U+G)\hat{x} = Pb,$$

    i.e.,

$$(LU + FU + LG + FG)\hat{x} = Pb, \tag{3.18}$$

    where

$$|F| \le 1.01nu|L|, \qquad |G| \le 1.01nu|U|. \tag{3.19}$$

    Substituting $LU = PA + E$ into (3.18), we have

$$(A + \delta A)\hat{x} = b,$$

    where

$$\delta A = P^{T}(E + FU + LG + FG).$$

    By using (3.19), Corollary 3.3 and the condition $1.01nu \le 0.01$, we have

$$|\delta A| \le P^{T}\big(3nu|PA| + (3n + 2.04n)u|L|\,|U|\big) = nuP^{T}\big(3|PA| + 5.04|L|\,|U|\big). \tag{3.20}$$

    By Corollary 3.3 again, the absolute values of the entries in $L$ are less than or equal to 1. Therefore, we have

$$\|L\|_\infty \le n. \tag{3.21}$$

    We now define

$$\rho \equiv \frac{1}{\|A\|_\infty}\max_{i,j,k}|a_{ij}^{(k)}|,$$

    and then we have

$$\|U\|_\infty \le n\rho\|A\|_\infty. \tag{3.22}$$

    Substituting (3.21) and (3.22) into (3.20), we obtain (3.17). The proof is complete.

    We remark that $\delta A$ is usually very small compared with the initial error in the given data. Thus, Gaussian elimination with partial pivoting is numerically stable.
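    A rough empirical check of this stability (not from the text; a sketch assuming NumPy, whose `solve` uses LU with partial pivoting) is to look at the normwise relative backward error of a computed solution, which should be a modest multiple of the machine precision:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 500
    A = rng.standard_normal((n, n))
    x_true = rng.standard_normal(n)
    b = A @ x_true

    x_hat = np.linalg.solve(A, b)        # LU with partial pivoting under the hood

    # Normwise relative backward error estimated from the scaled residual.
    r = b - A @ x_hat
    eta = np.linalg.norm(r, np.inf) / (np.linalg.norm(A, np.inf) * np.linalg.norm(x_hat, np.inf)
                                       + np.linalg.norm(b, np.inf))
    print(eta, "vs machine precision", np.finfo(float).eps / 2)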

    Exercises:

    1. Let

$$A = \begin{pmatrix} 1 & 0.999999 \\ 0.999999 & 1 \end{pmatrix}.$$

    Compute $A^{-1}$, $\det(A)$ and the condition number of $A$.

    2. Prove that $\|AB\|_F \le \|A\|_2\|B\|_F$ and $\|AB\|_F \le \|A\|_F\|B\|_2$.

    3. Prove that $\|A\|_2^2 \le \|A\|_1\|A\|_\infty$ for any square matrix $A$.

    4. Show that

$$\left\|\begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}\right\|_2 \le \left\|\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}\right\|_2.$$

    5. Let $A$ be nonsingular. Show that

$$\|A^{-1}\|_2^{-1} = \min_{\|x\|_2=1}\|Ax\|_2.$$

    6. Show that if $S$ is real and $S = -S^T$, then $I - S$ is nonsingular and the matrix

$$(I-S)^{-1}(I+S)$$

    is orthogonal. This is known as the Cayley transform of $S$.

    7. Prove that if both $A$ and $A+E$ are nonsingular, then

$$\|(A+E)^{-1} - A^{-1}\| \le \|A^{-1}\|\,\|(A+E)^{-1}\|\,\|E\|.$$

    8. Let $A \in \mathbb{R}^{n\times n}$ be nonsingular and let $x, y, z \in \mathbb{R}^n$ be such that $Ax = b$ and $Ay = b+z$. Show that

$$\frac{\|z\|_2}{\|A\|_2} \le \|x - y\|_2 \le \|A^{-1}\|_2\|z\|_2.$$

    9. Let $A = [a_{ij}]$ be an $m$-by-$n$ matrix. Define

$$|||A|||_{l} = \max_{i,j}|a_{ij}|.$$

    Is $|||\cdot|||_{l}$ a matrix norm? Give a reason for your answer.

    10. Show that if $X \in \mathbb{C}^{n\times n}$ is nonsingular, then $\|A\|_X = \|X^{-1}AX\|_2$ defines a matrix norm.

    11. Let $A = LDL^T \in \mathbb{R}^{n\times n}$ be a symmetric positive definite matrix and

$$D = \mathrm{diag}(d_{11}, \dots, d_{nn}).$$

    Show that

$$\kappa_2(A) \ge \frac{\max_i\{d_{ii}\}}{\min_i\{d_{ii}\}}.$$

    12. Verify that

$$\|xy^*\|_F = \|xy^*\|_2 = \|x\|_2\|y\|_2,$$

    for any $x, y \in \mathbb{C}^n$.

    13. Show that if $0 \neq v \in \mathbb{R}^n$ and $E \in \mathbb{R}^{n\times n}$, then

$$\left\|E\Big(I - \frac{vv^T}{v^Tv}\Big)\right\|_F^2 = \|E\|_F^2 - \frac{\|Ev\|_2^2}{v^Tv}.$$


    Chapter 4

    Least Squares Problems

    In this chapter, we study linear least squares problems:

$$\min_{y\in\mathbb{R}^n} \|Ay - b\|_2,$$

    where the data matrix $A \in \mathbb{R}^{m\times n}$ with $m \ge n$ and the observation vector $b \in \mathbb{R}^m$ are given. We introduce some well-known orthogonal transformations and the QR decomposition for constructing efficient algorithms for these problems. For a literature on least squares problems, we refer to [15, 21, 42, 44, 45, 48].

    4.1 Least squares problems

    In practice, suppose we are given $m$ points $t_1, t_2, \dots, t_m$ with data $y_1, y_2, \dots, y_m$ on these points, and functions $\phi_1(t), \phi_2(t), \dots, \phi_n(t)$ defined on these points. We then try to find

$$f(x, t) \equiv \sum_{j=1}^{n} x_j\phi_j(t)$$

    such that the residuals defined by

$$r_i(x) \equiv y_i - f(x, t_i) = y_i - \sum_{j=1}^{n} x_j\phi_j(t_i), \qquad i = 1, 2, \dots, m,$$

    are as small as possible. In matrix form, we have

$$r(x) = b - Ax,$$

    where

$$A = \begin{pmatrix} \phi_1(t_1) & \cdots & \phi_n(t_1) \\ \vdots & & \vdots \\ \phi_1(t_m) & \cdots & \phi_n(t_m) \end{pmatrix},$$

    and

$$b = (y_1, \dots, y_m)^T, \qquad x = (x_1, \dots, x_n)^T, \qquad r(x) = (r_1(x), \dots, r_m(x))^T.$$

    When $m = n$, we can require that $r(x) = 0$, and $x$ can be found by solving the system $Ax = b$. When $m > n$, we require that $r(x)$ reach its minimum in the norm $\|\cdot\|_2$. We therefore introduce the following definition of the least squares problem.

    Definition 4.1 Let $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$. Find $x \in \mathbb{R}^n$ such that

$$\|b - Ax\|_2 = \|r(x)\|_2 = \min_{y\in\mathbb{R}^n}\|r(y)\|_2 = \min_{y\in\mathbb{R}^n}\|b - Ay\|_2. \tag{4.1}$$

    This is called the least squares (LS) problem and $r(x)$ is called the residual.

    In the following, we only consider the case of

$$\mathrm{rank}(A) = n < m.$$

    We first study the solution $x$ of the following equation:

$$Ax = b, \qquad A \in \mathbb{R}^{m\times n}. \tag{4.2}$$

    The range of the matrix $A$ is defined by

$$R(A) \equiv \{y \in \mathbb{R}^m : y = Ax,\ x \in \mathbb{R}^n\}.$$

    It is easy to see that

$$R(A) = \mathrm{span}\{a_1, \dots, a_n\},$$

    where $a_i$, $i = 1, \dots, n$, are the column vectors of $A$. The nullspace of $A$ is defined by

$$N(A) \equiv \{x \in \mathbb{R}^n : Ax = 0\}.$$

    The dimension of $N(A)$ is denoted by $\mathrm{null}(A)$. The orthogonal complement of a subspace $S \subseteq \mathbb{R}^n$ is defined by

$$S^{\perp} \equiv \{y \in \mathbb{R}^n : y^T x = 0, \ \text{for all } x \in S\}.$$

    We have the following theorems for (4.2).

    Theorem 4.1 The equation (4.2) has a solution if and only if $\mathrm{rank}(A) = \mathrm{rank}([A, b])$.

    Theorem 4.2 Let $x$ be a particular solution of (4.2). Then the solution set of (4.2) is given by $x + N(A)$.


    Corollary 4.1 Assume that the equation (4.2) has a solution. The solution is unique if and only if $\mathrm{null}(A) = 0$.

    We have the following essential theorem for the solution of (4.1).

    Theorem 4.3 The LS problem (4.1) always has a solution. The solution is unique if and only if $\mathrm{null}(A) = 0$.

    Proof: Since

$$\mathbb{R}^m = R(A) \oplus R(A)^{\perp},$$

    the vector $b$ can be expressed uniquely as

$$b = b_1 + b_2,$$

    where $b_1 \in R(A)$ and $b_2 \in R(A)^{\perp}$. For any $x \in \mathbb{R}^n$, since $b_1 - Ax \in R(A)$ is orthogonal to $b_2$, we therefore have

$$\|r(x)\|_2^2 = \|b - Ax\|_2^2 = \|(b_1 - Ax) + b_2\|_2^2 = \|b_1 - Ax\|_2^2 + \|b_2\|_2^2.$$

    Note that $\|r(x)\|_2^2$ reaches its minimum if and only if $\|b_1 - Ax\|_2^2$ reaches its minimum. Since $b_1 \in R(A)$, $\|r(x)\|_2^2$ reaches its minimum if and only if

$$Ax = b_1,$$

    i.e.,

$$\|b_1 - Ax\|_2^2 = 0.$$

    Thus, by Corollary 4.1, we know that the solution of $Ax = b_1$ is unique, i.e., the solution of (4.1) is unique, if and only if $\mathrm{null}(A) = 0$.

    Let

$$X = \{x \in \mathbb{R}^n : x \ \text{is a solution of (4.1)}\}.$$

    We have

    Theorem 4.4 A vector $x \in X$ if and only if

$$A^TAx = A^Tb. \tag{4.3}$$


    Proof: Let $x \in X$. By Theorem 4.3, we know that $Ax = b_1$ where $b_1 \in R(A)$, and

$$r(x) = b - Ax = b - b_1 = b_2 \in R(A)^{\perp}.$$

    Therefore

$$A^T r(x) = A^T b_2 = 0.$$

    Substituting $r(x) = b - Ax$ into $A^T r(x) = 0$, we obtain (4.3).

    Conversely, let $x \in \mathbb{R}^n$ satisfy

$$A^TAx = A^Tb;$$

    then for any $y \in \mathbb{R}^n$, we have

$$\|b - A(x+y)\|_2^2 = \|b - Ax\|_2^2 - 2y^TA^T(b - Ax) + \|Ay\|_2^2 = \|b - Ax\|_2^2 + \|Ay\|_2^2 \ge \|b - Ax\|_2^2.$$

    Thus, $x \in X$.

    We therefore have the following algorithm for LS problems (a runnable sketch follows the steps):

    (1) Compute $C = A^TA$ and $d = A^Tb$.

    (2) Compute the Cholesky factorization $C = LL^T$.

    (3) Solve the triangular linear systems $Ly = d$ and $L^Tx = y$.
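    A compact sketch of these three steps (illustrative only; it assumes NumPy/SciPy and that $A$ has full column rank, so that $A^TA$ is symmetric positive definite) is:

    import numpy as np
    from scipy.linalg import solve_triangular

    def ls_normal_equations(A, b):
        C = A.T @ A                                    # (1) normal-equations matrix
        d = A.T @ b                                    #     and right-hand side
        L = np.linalg.cholesky(C)                      # (2) C = L L^T
        y = solve_triangular(L, d, lower=True)         # (3) L y = d
        return solve_triangular(L.T, y, lower=False)   #     L^T x = y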

    We remark that the computation of $A^TA$ usually costs $O(n^2m)$ operations, and some information in the matrix $A$ could be lost. For example, we consider

$$A = \begin{pmatrix} 1 & 1 & 1 \\ \epsilon & 0 & 0 \\ 0 & \epsilon & 0 \\ 0 & 0 & \epsilon \end{pmatrix}.$$

    We have

$$A^TA = \begin{pmatrix} 1+\epsilon^2 & 1 & 1 \\ 1 & 1+\epsilon^2 & 1 \\ 1 & 1 & 1+\epsilon^2 \end{pmatrix}.$$

    Assume that $\epsilon = 10^{-3}$ and a 6-digit decimal floating point system is used. Then $1 + \epsilon^2 = 1 + 10^{-6}$ is rounded to 1, which means that the computed $A^TA$ is singular!


    We note that the solution $x$ of (4.3) can be expressed as

$$x = (A^TA)^{-1}A^Tb.$$

    If we let

$$A^{\dagger} = (A^TA)^{-1}A^T,$$

    then the LS solution $x$ can be written as

$$x = A^{\dagger}b.$$

    Actually, the $n$-by-$m$ matrix $A^{\dagger}$ is the Moore-Penrose generalized inverse of $A$, which is unique; see [14, 17, 42]. In general, we have

    Definition 4.2 Let $X \in \mathbb{R}^{n\times m}$. If it satisfies the following conditions:

$$AXA = A, \qquad XAX = X, \qquad (AX)^T = AX, \qquad (XA)^T = XA,$$

    then $X$ is called the Moore-Penrose generalized inverse of $A$ and is denoted by $A^{\dagger}$.
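    NumPy provides this generalized inverse as `numpy.linalg.pinv`; the following sketch (illustrative, not from the text) checks the four conditions of Definition 4.2 numerically and, for a full-column-rank matrix, the formula $A^{\dagger} = (A^TA)^{-1}A^T$:

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((7, 3))          # full column rank (with probability 1)
    X = np.linalg.pinv(A)                    # Moore-Penrose generalized inverse

    checks = [
        np.allclose(A @ X @ A, A),           # A X A = A
        np.allclose(X @ A @ X, X),           # X A X = X
        np.allclose((A @ X).T, A @ X),       # (A X)^T = A X
        np.allclose((X @ A).T, X @ A),       # (X A)^T = X A
    ]
    print(checks)                            # expected: [True, True, True, True]
    print(np.allclose(X, np.linalg.solve(A.T @ A, A.T)))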

    Now we develop the perturbation analysis of LS problems. Assume that there is a perturbation $\delta b$ of $b$, and let $x$ and $x+\delta x$ denote the solutions of the following LS problems, respectively:

$$\min_{x}\|b - Ax\|_2, \qquad \min_{x}\|(b+\delta b) - Ax\|_2.$$

    Then

$$x = A^{\dagger}b,$$

    and

$$x + \delta x = A^{\dagger}(b + \delta b) = A^{\dagger}\tilde{b},$$

    where $\tilde{b} = b + \delta b$. We have

    Theorem 4.5 Let $b_1$ and $\delta b_1$ denote the orthogonal projections of $b$ and $\delta b$ on $R(A)$, respectively. If $b_1 \neq 0$, then

$$\frac{\|\delta x\|_2}{\|x\|_2} \le \kappa_2(A)\frac{\|\delta b_1\|_2}{\|b_1\|_2},$$

    where $\kappa_2(A) = \|A\|_2\|A^{\dagger}\|_2$ and $\tilde{b}_1 = b_1 + \delta b_1$.

    Proof: Let $b_2$ denote the orthogonal projection of $b$ on $R(A)^{\perp}$. Then $b = b_1 + b_2$ and $A^Tb_2 = 0$. Note that

$$A^{\dagger}b = A^{\dagger}b_1 + A^{\dagger}b_2 = A^{\dagger}b_1 + (A^TA)^{-1}A^Tb_2 = A^{\dagger}b_1.$$


    where $x_1$ is the first component of the vector $x$. Setting the coefficient of $x$ to zero, we obtain the following equation:

$$1 - \frac{2(\|x\|_2^2 - \alpha x_1)}{\|x - \alpha e_1\|_2^2} = 0.$$

    Solving this equation for $\alpha$, we have $\alpha = \pm\|x\|_2$. Substituting it into (4.7), we therefore have

$$Hx = \pm\|x\|_2 e_1.$$

    We remark that for any vector $0 \neq x \in \mathbb{R}^n$, by Theorem 4.8, one can construct a Householder matrix $H$ such that the last $n-1$ components of $Hx$ are zeros. We can use the following two steps to construct the unit vector $\omega$ of $H$:

    (1) compute $v = x \mp \|x\|_2 e_1$;
    (2) compute $\omega = v/\|v\|_2$.

    Now a natural question is: how to choose the sign in front of $\|x\|_2$? Usually, we choose

$$v = x + \mathrm{sign}(x_1)\|x\|_2 e_1,$$

    where $x_1 \neq 0$ is the first component of the vector $x$; see [38]. Since

$$H = I - 2\omega\omega^T = I - \frac{2}{v^Tv}vv^T = I - \beta vv^T,$$

    where $\beta = 2/v^Tv$, we only need to compute $\beta$ and $v$ instead of forming $\omega$. Thus, we have the following algorithm.

    Algorithm 4.1 (Householder transformation)

    function: [v, β] = house(x)
        n = length(x)
        η = x(2:n)^T x(2:n)
        v(1) = x(1) + sign(x(1)) * sqrt(x(1)^2 + η)
        v(2:n) = x(2:n)
        if η = 0
            β = 0
        else
            β = 2/(v(1)^2 + η)
        end
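    An equivalent runnable version (a sketch assuming NumPy; the MATLAB-style indexing of Algorithm 4.1 is translated to 0-based indexing, and $x_1 \neq 0$ is assumed as in the text) is:

    import numpy as np

    def house(x):
        # Return (v, beta) such that H = I - beta*v*v^T maps x to a multiple of e1.
        x = np.asarray(x, dtype=float)
        eta = np.dot(x[1:], x[1:])                      # x(2:n)^T x(2:n)
        v = x.copy()
        v[0] = x[0] + np.sign(x[0]) * np.sqrt(x[0] ** 2 + eta)
        beta = 0.0 if eta == 0.0 else 2.0 / (v[0] ** 2 + eta)
        return v, beta

    # Example: reduce a vector to a multiple of e1.
    x = np.array([3.0, 1.0, 5.0, 1.0])
    v, beta = house(x)
    H = np.eye(4) - beta * np.outer(v, v)
    print(H @ x)    # approximately (-6, 0, 0, 0), since ||x||_2 = 6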


    4.2.2 Givens rotation

    A Givens rotation is defined as follows:

$$G(i, k, \theta) = I + s(e_ie_k^T - e_ke_i^T) + (c-1)(e_ie_i^T + e_ke_k^T),$$

    where $c = \cos\theta$ and $s = \sin\theta$; it differs from the identity matrix only in rows and columns $i$ and $k$, where it carries the 2-by-2 block

$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix}$$

    in positions $(i,i)$, $(i,k)$, $(k,i)$, $(k,k)$. It is easy to prove that $G(i, k, \theta)$ is an orthogonal matrix. Let $x \in \mathbb{R}^n$ and $y = G(i, k, \theta)x$. We then have

$$y_i = cx_i + sx_k, \qquad y_k = -sx_i + cx_k, \qquad y_j = x_j, \quad j \neq i, k.$$

    If we want to make $y_k = 0$, then we only need to take

$$c = \frac{x_i}{\sqrt{x_i^2 + x_k^2}}, \qquad s = \frac{x_k}{\sqrt{x_i^2 + x_k^2}}.$$

    Therefore,

$$y_i = \sqrt{x_i^2 + x_k^2}, \qquad y_k = 0.$$

    We remark that for any vector $0 \neq x \in \mathbb{R}^n$, one can construct a Givens rotation $G(i, k, \theta)$ acting on $x$ to zero out a chosen component of $x$.
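    A small runnable sketch of these formulas (illustrative only, assuming NumPy; `np.hypot` is used to form $\sqrt{x_i^2+x_k^2}$ without overflow) is:

    import numpy as np

    def givens(xi, xk):
        # Return (c, s) of the rotation that zeros xk against xi, as in the formulas above.
        r = np.hypot(xi, xk)
        if r == 0.0:
            return 1.0, 0.0              # nothing to rotate
        return xi / r, xk / r

    x = np.array([3.0, 4.0])
    c, s = givens(x[0], x[1])
    y = np.array([c * x[0] + s * x[1], -s * x[0] + c * x[1]])   # y_i, y_k
    print(y)    # approximately [5, 0]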

    4.3 QR decomposition

    Let $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$. By Theorem 3.3 (iii), for any orthogonal matrix $Q$, we have

$$\|Ax - b\|_2 = \|Q^T(Ax - b)\|_2.$$

    Therefore, the LS problem

$$\min_{x}\|Q^TAx - Q^Tb\|_2$$

    is equivalent to (4.1). We wish to find a suitable orthogonal matrix $Q$ such that the original LS problem becomes an easier solvable LS problem. We have


    Theorem 4.9 (QR decomposition) Let $A \in \mathbb{R}^{m\times n}$ $(m \ge n)$. Then $A$ has a QR decomposition:

$$A = Q\begin{pmatrix} R \\ 0 \end{pmatrix}, \tag{4.8}$$

    where $Q \in \mathbb{R}^{m\times m}$ is an orthogonal matrix and $R \in \mathbb{R}^{n\times n}$ is an upper triangular matrix with nonnegative diagonal entries. The decomposition is unique when $m = n$ and $A$ is nonsingular.

    Proof: We use induction on $n$. When $n = 1$, the result is true by Theorem 4.8. Now, we assume that the theorem is true for all matrices in $\mathbb{R}^{p\times(n-1)}$ with $p \ge n-1$. Let the first column of $A \in \mathbb{R}^{m\times n}$ be $a_1$. By Theorem 4.8 again, there exists an orthogonal matrix $Q_1 \in \mathbb{R}^{m\times m}$ such that

$$Q_1^T a_1 = \|a_1\|_2 e_1.$$

    Therefore, we have

$$Q_1^TA = \begin{pmatrix} \|a_1\|_2 & v^T \\ 0 & A_1 \end{pmatrix}.$$

    For the matrix $A_1 \in \mathbb{R}^{(m-1)\times(n-1)}$, we obtain by the induction assumption,

$$A_1 = Q_2\begin{pmatrix} R_2 \\ 0 \end{pmatrix},$$

    where $Q_2 \in \mathbb{R}^{(m-1)\times(m-1)}$ is an orthogonal matrix and $R_2$ is an upper triangular matrix with nonnegative diagonal entries. Thus, let

$$Q = Q_1\begin{pmatrix} 1 & 0 \\ 0 & Q_2 \end{pmatrix}, \qquad \begin{pmatrix} R \\ 0 \end{pmatrix} = \begin{pmatrix} \|a_1\|_2 & v^T \\ 0 & R_2 \\ 0 & 0 \end{pmatrix}.$$

    Then $Q$ and $R$ are the matrices satisfying the conditions of the theorem.

    When $A \in \mathbb{R}^{m\times m}$ is nonsingular, we want to show that the QR decomposition is unique. Let

$$A = QR = \tilde{Q}\tilde{R},$$

    where $Q, \tilde{Q} \in \mathbb{R}^{m\times m}$ are orthogonal matrices, and $R, \tilde{R} \in \mathbb{R}^{m\times m}$ are upper triangular matrices with nonnegative diagonal entries. Since $A$ is nonsingular, we know that the diagonal entries of $R$ and $\tilde{R}$ are positive. Therefore, the matrix

$$Q^T\tilde{Q} = R\tilde{R}^{-1}$$

    is both orthogonal and upper triangular with positive diagonal entries. Thus

$$Q^T\tilde{Q} = R\tilde{R}^{-1} = I,$$

    i.e.,

$$Q = \tilde{Q}, \qquad R = \tilde{R}.$$

    A complex version of the QR decomposition is needed later on.

    Corollary 4.2 Let $A \in \mathbb{C}^{m\times n}$ $(m \ge n)$. Then $A$ has a QR decomposition:

$$A = Q\begin{pmatrix} R \\ 0 \end{pmatrix},$$

    where $Q \in \mathbb{C}^{m\times m}$ is a unitary matrix and $R \in \mathbb{C}^{n\times n}$ is an upper triangular matrix with nonnegative diagonal entries. The decomposition is unique when $m = n$ and $A$ is nonsingular.

    Now we use the QR decomposition to solve the LS problem (4.1). Suppose that $A \in \mathbb{R}^{m\times n}$ $(m \ge n)$ has linearly independent columns, $b \in \mathbb{R}^m$, and $A$ has a QR decomposition (4.8). Let $Q$ be partitioned as

$$Q = [\,Q_1\ Q_2\,],$$

    and

$$Q^Tb = \begin{pmatrix} Q_1^T \\ Q_2^T \end{pmatrix}b = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}.$$

    Then

$$\|Ax - b\|_2^2 = \|Q^TAx - Q^Tb\|_2^2 = \|Rx - c_1\|_2^2 + \|c_2\|_2^2.$$

    Thus $x$ is the solution of the LS problem (4.1) if and only if it is the solution of $Rx = c_1$. Note that it is much easier to get the solution of (4.1) by solving $Rx = c_1$ since $R$ is an upper triangular matrix. We have the following algorithm for LS problems (a runnable sketch follows the steps):

    (1) Compute a QR decomposition of $A$.

    (2) Compute $c_1 = Q_1^Tb$.

    (3) Solve the upper triangular system $Rx = c_1$.
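    A minimal sketch of these three steps (illustrative only; it assumes NumPy/SciPy, and `np.linalg.qr` with `mode="reduced"` returns the thin factors $Q_1$ and $R$ directly) is:

    import numpy as np
    from scipy.linalg import solve_triangular

    def ls_qr(A, b):
        Q1, R = np.linalg.qr(A, mode="reduced")   # (1) thin QR: A = Q1 R, R upper triangular
        c1 = Q1.T @ b                             # (2) c1 = Q1^T b
        return solve_triangular(R, c1, lower=False)   # (3) R x = c1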

    Finally, we discuss how to use Householder transformations to compute the QR decomposition of $A$. Let $m = 7$ and $n = 5$. Assume that we have already found Householder transformations $H_1$ and $H_2$ such that $H_2H_1A$ is zero below the diagonal in its first two columns. Now we construct a Householder transformation $\tilde{H}_3 \in \mathbb{R}^{5\times 5}$ acting on the last five entries of the third column so that all but the first of these entries become zero, and let $H_3 = \mathrm{diag}(I_2, \tilde{H}_3)$. Then $H_3H_2H_1A$ is zero below the diagonal in its first three columns.

    In general, after $n$ such steps, we reduce the matrix $A$ to the following form:

$$H_nH_{n-1}\cdots H_1A = \begin{pmatrix} R \\ 0 \end{pmatrix},$$

    where $R$ is an upper triangular matrix with nonnegative diagonal entries. By setting $Q = H_1\cdots H_n$, we obtain

$$A = Q\begin{pmatrix} R \\ 0 \end{pmatrix}.$$

    Thus, we have the following algorithm.

    Algorithm 4.2 (QR decomposition: Householder transformation)

    for j = 1 : n
        [v, β] = house(A(j:m, j))
        A(j:m, j:n) = (I_{m-j+1} - β v v^T) A(j:m, j:n)
        if j < m
            A(j+1:m, j) = v(2 : m-j+1)
        end
    end
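    The following is a runnable variant in the same spirit (a sketch assuming NumPy; unlike Algorithm 4.2 it forms each $H_j$ explicitly and accumulates $Q$, which is less storage-efficient, and the sign choice may give $R$ negative diagonal entries, which is harmless for solving LS problems):

    import numpy as np

    def qr_householder(A):
        # Returns an orthogonal Q (m-by-m) and R (m-by-n) with A = Q @ R.
        R = np.array(A, dtype=float)
        m, n = R.shape
        Q = np.eye(m)
        for j in range(n):
            x = R[j:, j]
            s = 1.0 if x[0] >= 0 else -1.0
            v = x.copy()
            v[0] += s * np.linalg.norm(x)                 # Householder vector, as in house()
            vtv = np.dot(v, v)
            if vtv == 0.0:
                continue                                  # column already zero below the diagonal
            beta = 2.0 / vtv
            H = np.eye(m - j) - beta * np.outer(v, v)     # H_j acting on rows j,...,m-1
            R[j:, j:] = H @ R[j:, j:]
            Q[:, j:] = Q[:, j:] @ H                       # accumulate Q = H_1 H_2 ... H_n
        return Q, R

    A = np.random.default_rng(5).standard_normal((7, 5))
    Q, R = qr_householder(A)
    print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(7)))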

    We remark that the QR decomposition is not only a basic tool for solving LS problems but also an important tool for solving some other fundamental problems in NLA.

    Exercises:


    1. Let $A \in \mathbb{R}^{m\times n}$ have full column rank. Prove that $A+E$ also has full column rank if $E$ satisfies $\|E\|_2 < 1/\|A^{\dagger}\|_2$, where $A^{\dagger} = (A^TA)^{-1}A^T$.

    2. Let $U = [u_{ij}]$ be a nonsingular upper triangular matrix. Show that

$$\kappa(U) \ge \frac{\max_i|u_{ii}|}{\min_i|u_{ii}|},$$

    where $\kappa(U) = \|U\|\,\|U^{-1}\|$.

    3. Let $A \in \mathbb{R}^{m\times n}$ with $m \ge n$ have full column rank. Show that

$$\begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix}\begin{pmatrix} r \\ x \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix}$$

    has a solution where $x$ minimizes $\|Ax - b\|_2$.

    4. Let $x \in \mathbb{R}^n$ and $P$ be a Householder transformation such that $Px = \|x\|_2e_1$. Let $G_{12}, G_{23}, \dots, G_{n-1,n}$ be Givens rotations, and let $Q = G_{12}G_{23}\cdots G_{n-1,n}$. Suppose $Qx = \|x\|_2e_1$. Is $P$ equal to $Q$? Give a proof or a counterexample.

    5. Let $A \in \mathbb{R}^{m\times n}$. Show that $X = A^{\dagger}$ minimizes $\|AX - I\|_F$ over all $X \in \mathbb{R}^{n\times m}$. What is the minimum?

    6. Let $x = (x_1, x_2)^T \in \mathbb{C}^2$. Find an algorithm to compute a unitary matrix

$$Q = \begin{pmatrix} c & s \\ -\bar{s} & c \end{pmatrix}, \qquad c \in \mathbb{R},\ c^2 + |s|^2 = 1,$$

    such that the second component of $Qx$ is zero.

    7. Suppose an $m$-by-$n$ matrix $A$ has the form

$$A = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix},$$

    where $A_1$ is an $n$-by-$n$ nonsingular matrix and $A_2$ is an $(m-n)$-by-$n$ arbitrary matrix. Prove that $\|A^{\dagger}\|_2 \le \|A_1^{-1}\|_2$.

    8. Consider the following well-known ill-conditioned matrix

$$A = \begin{pmatrix} 1 & 1 & 1 \\ \epsilon & 0 & 0 \\ 0 & \epsilon & 0 \\ 0 & 0 & \epsilon \end{pmatrix}, \qquad |\epsilon| \ll 1.$$

    (a) Choose a small $\epsilon$ such that $\mathrm{rank}(A) = 3$. Then compute $\kappa_2(A)$ to show that $A$ is ill-conditioned.

    (b) Find the LS solution with $A$ given as above and $b = (3, \epsilon, \epsilon, \epsilon)^T$ by using

    (i) the normal equation method;

    (ii) the QR method.

    9. Let $A = BC$ where $B \in \mathbb{C}^{m\times r}$ and $C \in \mathbb{C}^{r\times n}$ with $r = \mathrm{rank}(A) = \mathrm{rank}(B) = \mathrm{rank}(C)$. Show that

$$A^{\dagger} = C^*(CC^*)^{-1}(B^*B)^{-1}B^*.$$

    10. Let $A = U\Sigma V^* \in \mathbb{C}^{m\times n}$, where $U \in \mathbb{C}^{m\times n}$ satisfies $U^*U = I$, $V \in \mathbb{C}^{n\times n}$ satisfies $V^*V = I$, and $\Sigma$ is an $n$-by-$n$ diagonal matrix. Show that

$$A^{\dagger} = V\Sigma^{\dagger}U^*.$$

    11. Prove that

$$A^{\dagger} = \lim_{\mu\to 0}(A^*A + \mu I)^{-1}A^* = \lim_{\mu\to 0}A^*(AA^* + \mu I)^{-1}.$$

    12. Show that $R(A^{\dagger}) \cap N(A) = \{0\}$.

    13. Let $A = [a_{ij}] \in \mathbb{C}^{n\times n}$ be idempotent. Then

$$R(A) \oplus N(A) = \mathbb{C}^n, \qquad \mathrm{rank}(A) = \sum_{i=1}^{n}a_{ii}.$$

    14. Let $A \in \mathbb{C}^{m\times n}$. Prove that

$$R(AA^{\dagger}) = R(AA^*) = R(A),$$
$$R(A^{\dagger}A) = R(A^*A) = R(A^{\dagger}) = R(A^*),$$
$$N(AA^{\dagger}) = N(AA^*) = N(A^{\dagger}) = N(A^*),$$
$$N(A^{\dagger}A) = N(A^*A) = N(A).$$

    Therefore $A^{\dagger}A$ and $AA^{\dagger}$ are orthogonal projectors.

    15. Prove Corollary 4.2.


    Chapter 5

    Classical Iterative Methods

    We study classical iterative methods for the solution of $Ax = b$. Iterative methods, originally proposed by Gauss in 1823, Liouville in 1837, and Jacobi in 1845, are quite different from direct methods such as Gaussian elimination; see [2].

    Direct methods based on an LU factorization of $A$ become prohibitive in terms of computing time and computer storage if the matrix $A$ is very large. In some practical situations, such as the discretization of partial differential equations, the matrix size can be as large as several hundreds of thousands. For such problems, direct methods become impractical. Furthermore, most large problems are sparse, and usually the sparsity is lost during an LU factorization. Therefore, we would have to face a very large matrix with many nonzero entries at the end of the LU factorization, and then the storage becomes a crucial issue. For such problems, we can use a class of methods called iterative methods. In this chapter, we only consider some classical iterative methods.

    We remark that the disadvantage of classical iterative methods is that the convergence may be slow or the iteration may even diverge, and a suitable stopping criterion must be chosen.

    5.1 Jacobi and Gauss-Seidel method

    5.1.1 Jacobi method

    Consider the following linear system

$$Ax = b,$$

    where $A = [a_{ij}] \in \mathbb{R}^{n\times n}$. We can write the matrix $A$ in the following form:

$$A = D - L - U,$$

    where

$$D = \mathrm{diag}(a_{11}, a_{22}, \dots, a_{nn}),$$

$$L = -\begin{pmatrix} 0 & & & & \\ a_{21} & 0 & & & \\ a_{31} & a_{32} & 0 & & \\ \vdots & \vdots & \ddots & \ddots & \\ a_{n1} & a_{n2} & \cdots & a_{n,n-1} & 0 \end{pmatrix},
\qquad
U = -\begin{pmatrix} 0 & a_{12} & a_{13} & \cdots & a_{1n} \\ & 0 & a_{23} & \cdots & a_{2n} \\ & & \ddots & \ddots & \vdots \\ & & & 0 & a_{n-1,n} \\ & & & & 0 \end{pmatrix}.$$

    Then it is easy to see that

$$x = B_Jx + g,$$

    where

$$B_J = D^{-1}(L+U), \qquad g = D^{-1}b.$$

    The matrix $B_J$ is called the Jacobi iteration matrix. The corresponding iteration

$$x_k = B_Jx_{k-1} + g, \qquad k = 1, 2, \dots, \tag{5.1}$$

    is known as the Jacobi method if an initial vector $x_0 = \big(x_1^{(0)}, x_2^{(0)}, \dots, x_n^{(0)}\big)^T$ is given.
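    A short runnable sketch of iteration (5.1) (illustrative only; it assumes NumPy, nonzero diagonal entries, and a simple stopping test on successive iterates) is:

    import numpy as np

    def jacobi(A, b, x0, maxit=500, tol=1e-10):
        d = np.diag(A)                      # diagonal entries of D
        R = A - np.diag(d)                  # -(L + U), the off-diagonal part of A
        x = np.array(x0, dtype=float)
        for _ in range(maxit):
            x_new = (b - R @ x) / d         # D^{-1}((L+U)x + b)
            if np.linalg.norm(x_new - x, np.inf) < tol:
                return x_new
            x = x_new
        return x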

    5.1.2 Gauss-Seidel method

    In the Jacobi method, to compute the components of the vector

$$x_{k+1} = \big(x_1^{(k+1)}, x_2^{(k+1)}, \dots, x_n^{(k+1)}\big)^T,$$

    only the components of the vector $x_k$ are used. However, note that to compute $x_i^{(k+1)}$, we could use $x_1^{(k+1)}, x_2^{(k+1)}, \dots, x_{i-1}^{(k+1)}$, which are already available. Thus a natural modification of the Jacobi method is to rewrite the Jacobi iteration (5.1) in the following form:

$$x_k = (D-L)^{-1}Ux_{k-1} + (D-L)^{-1}b, \qquad k = 1, 2, \dots. \tag{5.2}$$

    The idea is to use each new component as soon as it is available in the computation of the next component. The iteration (5.2) is known as the Gauss-Seidel method.

    Note that the matrix $D - L$ is a lower triangular matrix with $a_{11}, \dots, a_{nn}$ on the diagonal. Because these entries are assumed to be nonzero, the matrix $D - L$ is nonsingular. The matrix

$$B_{GS} = (D-L)^{-1}U$$

    is called the Gauss-Seidel iteration matrix.
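    Iteration (5.2) can be sketched in the same style as the Jacobi code above (illustrative only; it assumes NumPy/SciPy, and uses the fact that the lower triangle of $A$, including its diagonal, equals $D - L$ in the splitting $A = D - L - U$):

    import numpy as np
    from scipy.linalg import solve_triangular

    def gauss_seidel(A, b, x0, maxit=500, tol=1e-10):
        DL = np.tril(A)                     # D - L
        U = DL - A                          # so that A = (D - L) - U
        x = np.array(x0, dtype=float)
        for _ in range(maxit):
            x_new = solve_triangular(DL, U @ x + b, lower=True)   # (D-L)^{-1}(U x + b)
            if np.linalg.norm(x_new - x, np.inf) < tol:
                return x_new
            x = x_new
        return x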


    5.2 Convergence analysis

    5.2.1 Convergence theorems

    It is often hard to make a good initial approximation $x_0$. Thus, it is desirable to have conditions that guarantee the convergence of the Jacobi and Gauss-Seidel methods for an arbitrary choice of the initial approximation.

    Both the Jacobi iteration and the Gauss-Seidel iteration can be expressed as

$$x_{k+1} = Bx_k + g, \qquad k = 0, 1, \dots. \tag{5.3}$$

    For the Jacobi iteration, we have

$$B_J = D^{-1}(L+U), \qquad g = D^{-1}b;$$

    and for the Gauss-Seidel iteration, we have

$$B_{GS} = (D-L)^{-1}U, \qquad g = (D-L)^{-1}b.$$

    The iteration (5.3) is called a linear stationary iteration, where $B \in \mathbb{R}^{n\times n}$ is called the iteration matrix, $g \in \mathbb{R}^n$ the constant term, and $x_0 \in \mathbb{R}^n$ the initial vector. In the following, we give a convergence theorem.

    Theorem 5.1 The iteration (5.3) converges for an arbitrary initial guess $x_0$ if and only if $B^k \to 0$ as $k \to \infty$.

    Proof: From $x = Bx + g$ and $x_{k+1} = Bx_k + g$, we have

$$x - x_{k+1} = B(x - x_k). \tag{5.4}$$

    Because this is true for any value of $k$, we can write

$$x - x_k = B(x - x_{k-1}). \tag{5.5}$$

    Substituting (5.5) into (5.4), we have

$$x - x_{k+1} = B^2(x - x_{k-1}).$$

    Continuing this process $k$ times, we can write

$$x - x_{k+1} = B^{k+1}(x - x_0).$$

    This shows that $\{x_k\}$ converges to the solution $x$ for any choice of $x_0$ if and only if $B^k \to 0$ as $k \to \infty$.


    Recall that $B^k \to 0$ as $k \to \infty$ if and only if the spectral radius $\rho(B) < 1$. Since $|\lambda_i| \le \|B\|$ for every eigenvalue $\lambda_i$ of $B$, a convenient way to check whether $\rho(B) < 1$ is to check whether $\|B\| < 1$ by computing $\|B\|$ with a row-sum or column-sum norm. Note that the converse is not true. Combining the result of Theorem 5.1 with the above observation, we have the following theorem.
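    A rough numerical check of this criterion (not from the text; a sketch assuming NumPy, with a small diagonally dominant example matrix chosen only for illustration) is:

    import numpy as np

    A = np.array([[4.0, 1.0, 1.0],
                  [1.0, 5.0, 2.0],
                  [1.0, 2.0, 6.0]])
    D = np.diag(np.diag(A))
    B_J = np.eye(3) - np.linalg.solve(D, A)          # B_J = D^{-1}(L+U) = I - D^{-1}A
    print(np.max(np.abs(np.linalg.eigvals(B_J))))    # spectral radius rho(B_J) < 1
    print(np.linalg.norm(B_J, np.inf))               # row-sum norm, an easy upper bound on rho(B_J)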

    Theorem 5.2 The iteration (5.3) converges for any choice of $x_0$ if and only if $\rho(B) < 1$.

    Recall that a matrix $A = [a_{ij}]$ is strictly diagonally dominant if

$$|a_{ii}| > \sum_{j=1,\,j\neq i}^{n}|a_{ij}|, \qquad i = 1, 2, \dots, n.$$