
  • Linear Algebra in Action

    Harry Dym

    Graduate Studies in Mathematics Volume 78

    American Mathematical Society


  • Editorial Board David Cox

    Walter Craig Nikolai Ivanov

    Steven G. Krantz David Saltman (Chair)

    2000 Mathematics Subject Classification. Primary 15-01, 30-01, 34-01, 39-01, 52-01, 93-01.

    For additional information and updates on this book, visit www.ams.org/bookpages/gsm-78

    Library of Congress Cataloging-in-Publication Data Dym, H. (Harry), 1938-.

    Linear algebra in action / Harry Dym. p. cm. - (Graduate studies in mathematics, ISSN 1065-7339 ; v. 78)

    Includes bibliographical references and index. ISBN-13: 978-0-8218-3813-6 (alk. paper) ISBN-10: 0-8218-3813-X (alk. paper) 1. Algebras, Linear. I. Title.

    QA184.2.D96 2006 512'.5--dc22 2006049906

    Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy a chapter for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given.

    Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society. Requests for such permission should be addressed to the Acquisitions Department, American Mathematical Society, 201 Charles Street, Providence, Rhode Island 02904-2294, USA. Requests can also be made by e-mail to reprint-permission@ams.org.

    © 2007 by the American Mathematical Society. All rights reserved. The American Mathematical Society retains all rights except those granted to the United States Government. Printed in the United States of America.

    The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability.

    Visit the AMS home page at http://www.ams.org/

    10 9 8 7 6 5 4 3 2 1    12 11 10 09 08 07

  • Dedicated to the memory of our oldest son Jonathan Carroll Dym and our first granddaughter Avital Chana Dym, who were recalled prematurely for no apparent reason, he but 44 and she but 12. Yhi zichram baruch

  • Contents

    Preface xv

    Chapter 1. Vector spaces 1 1.1. Preview 1 1.2. The abstract definition of a vector space 2 1.3. Some definitions 5 1.4. Mappings 11 1.5. Triangular matrices 13 1.6. Block triangular matrices 16 1.7. Schur complements 17 1.8. Other matrix products 19

    Chapter 2. Gaussian elimination 21 2.1. Some preliminary observations 22 2.2. Examples 24 2.3. Upper echelon matrices 30 2.4. The conservation of dimension 36 2.5. Quotient spaces 38 2.6. Conservation of dimension for matrices 38 2.7. From U to A 40 2.8. Square matrices 41

    Chapter 3. Additional applications of Gaussian elimination 45 3.1. Gaussian elimination redux 45 3.2. Properties of BA and AC 48 3.3. Extracting a basis 50 3.4. Computing the coefficients in a basis 51 3.5. The Gauss-Seidel method 52 3.6. Block Gaussian elimination 55 3.7. {0, 1, ∞} 56 3.8. Review 57

    Chapter 4. Eigenvalues and eigenvectors 61 4.1. Change of basis and similarity 62 4.2. Invariant subspaces 64 4.3. Existence of eigenvalues 64 4.4. Eigenvalues for matrices 66 4.5. Direct sums 69 4.6. Diagonalizable matrices 71 4.7. An algorithm for diagonalizing matrices 73 4.8. Computing eigenvalues at this point 74 4.9. Not all matrices are diagonalizable 76 4.10. The Jordan decomposition theorem 78 4.11. An instructive example 79 4.12. The binomial formula 82 4.13. More direct sum decompositions 82 4.14. Verification of Theorem 4.12 84 4.15. Bibliographical notes 87

    Chapter 5. Determinants 89 5.1. Functionals 89 5.2. Determinants 90 5.3. Useful rules for calculating determinants 93 5.4. Eigenvalues 97 5.5. Exploiting block structure 99 5.6. The Binet-Cauchy formula 102 5.7. Minors 104 5.8. Uses of determinants 108 5.9. Companion matrices 108 5.10. Circulants and Vandermonde matrices 109


    Chapter 6. Calculating Jordan forms 111 6.1. Overview 112 6.2. Structure of the nullspaces NBi 112 6.3. Chains and cells 115 6.4. Computing J 116 6.5. An algorithm for U 117 6.6. An example 120 6.7. Another example 122 6.8. Jordan decompositions for real matrices 126 6.9. Companion and generalized Vandermonde matrices 128

    Chapter 7. Normed linear spaces 133 7.1. Four inequalities 133 7.2. Normed linear spaces 138 7.3. Equivalence of norms 140 7.4. Norms of linear transformations 142 7.5. Multiplicative norms 143 7.6. Evaluating some operator norms 145 7.7. Small perturbations 147 7.8. Another estimate 149 7.9. Bounded linear functionals 150 7.10. Extensions of bounded linear functionals 152 7.11. Banach spaces 155

    Chapter 8. Inner product spaces and orthogonality 157 8.1. Inner product spaces 157 8.2. A characterization of inner product spaces 160 8.3. Orthogonality 161 8.4. Gram matrices 163 8.5. Adjoints 163 8.6. The Riesz representation theorem 166 8.7. Normal, selfadjoint and unitary transformations 168 8.8. Projections and direct sum decompositions 170 8.9. Orthogonal projections 172 8.10. Orthogonal expansions 174 8.11. The Gram-Schmidt method 177

    8.12. Toeplitz and Hankel matrices 178 8.13. Gaussian quadrature 180 8.14. Bibliographical notes 183

    Chapter 9. Symmetric, Hermitian and normal matrices 185 9.1. Hermitian matrices are diagonalizable 186 9.2. Commuting Hermitian matrices 188 9.3. Real Hermitian matrices 190 9.4. Projections and direct sums in 𝔽ⁿ 191 9.5. Projections and rank 195 9.6. Normal matrices 195 9.7. Schur's theorem 198 9.8. QR factorization 201 9.9. Areas, volumes and determinants 202 9.10. Bibliographical notes 206

    Chapter 10. Singular values and related inequalities 207 10.1. Singular value decompositions 207 10.2. Complex symmetric matrices 212 10.3. Approximate solutions of linear equations 213 10.4. The Courant-Fischer theorem 215 10.5. Inequalities for singular values 218 10.6. Bibliographical notes 225

    Chapter 11. Pseudoinverses 227 11.1. Pseudoinverses 227

    11.2. The Moore-Penrose inverse 234 11.3. Best approximation in terms of Moore-Penrose inverses 237

    Chapter 12. Triangular factorization and positive definite matrices 239 12.1. A detour on triangular factorization 240 12.2. Definite and semidefinite matrices 242 12.3. Characterizations of positive definite matrices 244 12.4. An application of factorization 247 12.5. Positive definite Toeplitz matrices 248 12.6. Detour on block Toeplitz matrices 254 12.7. A maximum entropy matrix completion problem 258 12.8. Schur complements for semidefinite matrices 262

    12.9. Square roots 265 12.10. Polar forms 267 12.11. Matrix inequalities 268 12.12. A minimal norm completion problem 271 12.13. A description of all solutions to the minimal norm completion problem 273 12.14. Bibliographical notes 274

    Chapter 13. Difference equations and differential equations 275 13.1. Systems of difference equations 276 13.2. The exponential e^{tA} 277 13.3. Systems of differential equations 279 13.4. Uniqueness 281 13.5. Isometric and isospectral flows 282 13.6. Second-order differential systems 283 13.7. Stability 284 13.8. Nonhomogeneous differential systems 285 13.9. Strategy for equations 285 13.10. Second-order difference equations 286 13.11. Higher order difference equations 289 13.12. Ordinary differential equations 290 13.13. Wronskians 293 13.14. Variation of parameters 295

    Chapter 14. Vector valued functions 297 14.1. Mean value theorems 298 14.2. Taylor's formula with remainder 299 14.3. Application of Taylor's formula with remainder 300 14.4. Mean value theorem for functions of several variables 301 14.5. Mean value theorems for vector valued functions of several variables 301 14.6. Newton's method 304 14.7. A contractive fixed point theorem 306 14.8. A refined contractive fixed point theorem 308 14.9. Spectral radius 309 14.10. The Brouwer fixed point theorem 313 14.11. Bibliographical notes 316


    Chapter 15. The implicit function theorem 317 15.1. Preliminary discussion 317 15.2. The main theorem 319 15.3. A generalization of the implicit function theorem 324 15.4. Continuous dependence of solutions 326 15.5. The inverse function theorem 327 15.6. Roots of polynomials 329 15.7. An instructive example 329 15.8. A more sophisticated approach 331 15.9. Dynamical systems 333 15.10. Lyapunov functions 335 15.11. Bibliographical notes 336

    Chapter 16. Extremal problems 337 16.1. Classical extremal problems 337 16.2. Extremal problems with constraints 341 16.3. Examples 344 16.4. Krylov subspaces 349 16.5. The conjugate gradient method 349 16.6. Dual extremal problems 354 16.7. Bibliographical notes 356

    Chapter 17. Matrix valued holomorphic functions 357 17.1. Differentiation 357 17.2. Contour integration 361 17.3. Evaluating integrals by contour integration 365 17.4. A short detour on Fourier analysis 370 17.5. Contour integrals of matrix valued functions 372 17.6. Continuous dependence of the eigenvalues 375 17.7. More on small perturbations 377 17.8. Spectral radius redux 378 17.9. Fractional powers 381

    Chapter 18. Matrix equations 383 18.1. The equation X - AX B = C 383 18.2. The Sylvester equation AX - X B = C 385 18.3. Special classes of solutions 388

    18.4. Riccati equations 390 18.5. Two lemmas 396 18.6. An LQR problem 398 18.7. Bibliographical notes 400

    Chapter 19. Realization theory 401 19.1. Minimal realizations 408 19.2. Stabilizable and detectable realizations 415 19.3. Reproducing kernel Hilbert spaces 416 19.4. de Branges spaces 418 19.5. R_α invariance 420 19.6. Factorization of Θ(λ) 421 19.7. Bibliographical notes 425

    Chapter 20. Eigenvalue location problems 427 20.1. Interlacing 427 20.2. Sylvester's law of inertia 430 20.3. Congruence 431 20.4. Counting positive and negative eigenvalues 433 20.5. Exploiting continuity 437 20.6. Gersgorin disks 438 20.7. The spectral mapping principle 439 20.8. AX = XB 440 20.9. Inertia theorems 441 20.10. An eigenvalue assignment problem 443 20.11. Bibliographical notes 446

    Chapter 21. Zero location problems 447 21.1. Bezoutians 447

    21.2. A derivation of the formula for H, based on realization 452 21.3. The Barnett identity 453 21.4. The main theorem on Bezoutians 455 21.5. Resultants 457 21.6. Other directions 461 21.7. Bezoutians for real polynomials 463 21.8. Stable polynomials 464 21.9. Kharitonov's theorem 466

    21.10. Bibliographical notes 467

    Chapter 22. Convexity 469 22.1. Preliminaries 469 22.2. Convex functions 471 22.3. Convex sets in ℝⁿ 473 22.4. Separation theorems in ℝⁿ 475 22.5. Hyperplanes 477 22.6. Support hyperplanes 479 22.7. Convex hulls 480 22.8. Extreme points 482 22.9. Brouwer's theorem for compact convex sets 485 22.10. The Minkowski functional 485 22.11. The Gauss-Lucas theorem 488 22.12. The numerical range 489 22.13. Eigenvalues versus numerical range 491 22.14. The Heinz inequality 492 22.15. Bibliographical notes 494

    Chapter 23. Matrices with nonnegative entries 495 23.1. Perron-Frobenius theory 496 23.2. Stochastic matrices 503 23.3. Doubly stochastic matrices 504 23.4. An inequality of Ky Fan 507 23.5. The Schur-Horn convexity theorem 509 23.6. Bibliographical notes 513

    Appendix A. Some facts from analysis 515 A.1. Convergence of sequences of points 515 A.2. Convergence of sequences of functions 516 A.3. Convergence of sums 516 A.4. Sups and infs 517 A.5. Topology 518 A.6. Compact sets 518 A.7. Normed linear spaces 518

    Appendix B. More complex variables 521 B.1. Power series 521

    B.2. Isolated zeros 523 B.3. The maximum modulus principle 525 B.4. ln(1 − λ) when |λ| < 1 525 B.5. Rouché's theorem 526 B.6. Liouville's theorem 528 B.7. Laurent expansions 528 B.8. Partial fraction expansions 529

    Bibliography 531

    Notation Index 535

    Subject Index 537

  • Preface

    A foolish consistency is the hobgoblin of little minds, ... Ralph Waldo Emerson, Self Reliance

    This book is based largely on courses that I have taught at the Feinberg Graduate School of the Weizmann Institute of Science over the past 35 years to graduate students with widely varying levels of mathematical sophistication and interests. The objective of a number of these courses was to present a user-friendly introduction to linear algebra and its many applications. Over the years I wrote and rewrote (and then, more often than not, rewrote some more) assorted sets of notes and learned many interesting things en route. This book is the current end product of that process. The emphasis is on developing a comfortable familiarity with the material. Many lemmas and theorems are made plausible by discussing an example that is chosen to make the underlying ideas transparent in lieu of a formal proof; i.e., I have tried to present the material in the way that most of the mathematicians that I know work rather than in the way they write. The coverage is not intended to be exhaustive (or exhausting), but rather to indicate the rich terrain that is part of the domain of linear algebra and to present a decent sample of some of the tools of the trade of a working analyst that I have absorbed and have found useful and interesting in more than 40 years in the business. To put it another way, I wish someone had taught me this material when I was a graduate student. In those days, in the arrogance of youth, I thought that linear algebra was for boys and girls and that real men and women worked in functional analysis. However, this is but one of many opinions that did not stand the test of time.

    In my opinion, the material in this book can (and has been) used on many levels. A core course in classical linear algebra topics can be based on the first six chapters, plus selected topics from Chapters 7-9 and 13. The latter treats difference equations, differential equations and systems thereof. Chapters 14-16 cover applications to vector calculus, including a proof of the implicit function theorem based on the contractive fixed point theorem, and extremal problems with constraints. Subsequent chapters deal with matrix valued holomorphic functions, matrix equations, realization theory, eigenvalue location problems, zero location problems, convexity, and matrices with nonnegative entries. I have taken the liberty of straying into areas that I consider significant, even though they are not usually viewed as part of the package associated with linear algebra. Thus, for example, I have added short sections on complex function theory, Fourier analysis, Lyapunov functions for dynamical systems, boundary value problems and more. A number of the applications are taken from control theory.

    I have adapted material from many sources. But the one which was most significant for at least the starting point of a number of topics covered in this work is the wonderful book [45] by Lancaster and Tismenetsky.

    A number of students read and commented on substantial sections of assorted drafts: Boris Ettinger, Ariel Ginis, Royi Lachmi, Mark Kozdoba, Evgeny Muzikantov, Simcha Rimler, Jonathan Ronen, Idith Segev and Amit Weinberg. I thank them all, and extend my appreciation to two senior readers, Aad Dijksma and Andrei Iacob, for their helpful, insightful remarks. A special note of thanks goes to Deborah Smith, my copy editor at AMS, for her sharp eye and expertise in the world of commas and semicolons.

    On the production side, I thank Jason Friedman for typing an early version, and our secretaries Diana Mandelik, Ruby Musrie, Linda Alman and Terry Debesh, all of whom typed selections, and Diana again for preparing all the figures and clarifying numerous mysterious intricacies of LaTeX. I also thank Barbara Beeton of AMS for helpful advice on AMS LaTeX.

    One of the difficulties in preparing a manuscript for a book is knowing when to let go. It is always possible to write it better.¹ Fortunately AMS maintains a web page: http://www.ams.org/bookpages/gsm-78, for sins of omission and commission (or just plain afterthoughts).

    TAM, ACH TEREM NISHLAM ...
    October 18, 2006
    Rehovot, Israel

    1 Israel Gohberg tells of a conversation with Lev Sakhnovich that took place in Odessa many years ago: Lev: Israel, how is your book with Mark Gregorovic (Krein) progressing? Israel: It's about 85% done. Lev: That's great! Why so sad? Israel: If you would have asked me yesterday, I would have said 95%.

  • Chapter 1

    Vector spaces

    The road to wisdom? Well it's plain and simple to express. Err and err and err again, but less and less and less.

    Cited in [43]

    1.1. Preview

    One of the fundamental issues that we shall be concerned with is the solution of linear equations of the form

    $$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1q}x_q &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2q}x_q &= b_2 \\ &\;\;\vdots \\ a_{p1}x_1 + a_{p2}x_2 + \cdots + a_{pq}x_q &= b_p\,, \end{aligned}$$

    where the a_{ij} and the b_i are given numbers (either real or complex) for i = 1, ..., p and j = 1, ..., q, and we are looking for the x_j for j = 1, ..., q. Such a system of equations is equivalent to the matrix equation

    $$Ax = b\,,$$

    where

    $$A = \begin{bmatrix} a_{11} & \cdots & a_{1q} \\ \vdots & & \vdots \\ a_{p1} & \cdots & a_{pq} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ \vdots \\ x_q \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} b_1 \\ \vdots \\ b_p \end{bmatrix}.$$


    RC Cola: The term aij in the matrix A sits in the i'th row and the j'th column of the matrix; Le., the first index stands for the number of the row and the second for the number of the column. The order is rc as in the popular drink by that name.

    Given A and b, the basic questions are: 1. When does there exist at least one solution x? 2. When does there exist at most one solution x? 3. How to calculate the solutions, when they exist? 4. How to find approximate solutions?

    The answers to these questions are part and parcel of the theory of vector spaces.

    1.2. The abstract definition of a vector space

    This subsection is devoted to the abstract definition of a vector space. Even though the emphasis in this course is definitely computational, it seems advisable to start with a few abstract definitions which will be useful in future situations as well as in the present.

    A vector space V over the real numbers is a nonempty collection of objects called vectors, together with an operation called vector addition, which assigns a new vector u + v in V to every pair of vectors u in V and v in V, and an operation called scalar multiplication, which assigns a vector av in V to every real number a and every vector v in V such that the following hold:

    1. For every pair of vectors u and v, u + v = v + u; i.e., vector addition is commutative.

    2. For any three vectors u, v and w, u + (v + w) = (u + v) + w; i.e., vector addition is associative.

    3. There is a zero vector (or, in other terminology, additive identity) 0 ∈ V such that 0 + v = v + 0 = v for every vector v in V.

    4. For every vector v there is a vector w (an additive inverse of v) such that v + w = 0.

    5. For every vector v, 1v = v.

    6. For every pair of real numbers α and β and every vector v, α(βv) = (αβ)v.

    7. For every pair of real numbers α and β and every vector v, (α + β)v = αv + βv.

    8. For every real number α and every pair of vectors u and v, α(u + v) = αu + αv.

    Because of Item 2, we can write u + v + w without brackets; similarly, because of Item 6 we can write αβv without brackets. It is also easily checked that there is exactly one zero vector 0 ∈ V: if 0′ is a second zero vector, then 0′ = 0′ + 0 = 0. A similar argument shows that each vector v ∈ V has exactly one additive inverse, −v = (−1)v in V. Correspondingly, we write u + (−v) = u − v.

    From now on we shall use the symbol lR to designate the real numbers, the symbol C to designate the complex numbers and the symbol IF when the statement in question is valid for both lR and C and there is no need to specify. Numbers in IF are often referred to as scalars.

    A vector space V over C is defined in exactly the same way as a vector space V over lR except that the numbers a and {3 which appear in the definition above are allowed to be complex.

    Exercise 1.1. Show that if V is a vector space over C, then Ov = 0 for every vector v E V.

    Exercise 1.2. Let V be a vector space over 𝔽. Show that if α, β ∈ 𝔽 and if v is a nonzero vector in V, then αv = βv ⟹ α = β. [HINT: α − β ≠ 0 ⟹ v = (α − β)^{-1}((α − β)v).]

    Example 1.1. The set of column vectors
    $$\mathbb{F}^p = \left\{ \begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix} : x_i \in \mathbb{F},\ i = 1, \ldots, p \right\}$$
    of height p with entries x_i ∈ 𝔽 that are subject to the natural rules of vector addition
    $$\begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix} + \begin{bmatrix} y_1 \\ \vdots \\ y_p \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ \vdots \\ x_p + y_p \end{bmatrix}$$
    and multiplication
    $$\alpha \begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix} = \begin{bmatrix} \alpha x_1 \\ \vdots \\ \alpha x_p \end{bmatrix}$$
    of the vector x by a number α ∈ 𝔽 is the most basic example of a vector space. Note the difference between the number 0 and the vector 0 ∈ 𝔽^p. The latter is a column vector of height p with all p entries equal to the number zero.

    The set 𝔽^{p×q} of p × q matrices with entries in 𝔽 is a vector space with respect to the rules of vector addition
    $$\begin{bmatrix} x_{11} & \cdots & x_{1q} \\ \vdots & & \vdots \\ x_{p1} & \cdots & x_{pq} \end{bmatrix} + \begin{bmatrix} y_{11} & \cdots & y_{1q} \\ \vdots & & \vdots \\ y_{p1} & \cdots & y_{pq} \end{bmatrix} = \begin{bmatrix} x_{11}+y_{11} & \cdots & x_{1q}+y_{1q} \\ \vdots & & \vdots \\ x_{p1}+y_{p1} & \cdots & x_{pq}+y_{pq} \end{bmatrix}$$
    and multiplication by a scalar α ∈ 𝔽:
    $$\alpha \begin{bmatrix} x_{11} & \cdots & x_{1q} \\ \vdots & & \vdots \\ x_{p1} & \cdots & x_{pq} \end{bmatrix} = \begin{bmatrix} \alpha x_{11} & \cdots & \alpha x_{1q} \\ \vdots & & \vdots \\ \alpha x_{p1} & \cdots & \alpha x_{pq} \end{bmatrix}.$$

    Notice that the vector space IFP dealt with a little earlier coincides with the vector space that is designated IFpxl in the current example.

    Exercise 1.3. Show that the space ℝ³ endowed with the rule
    $$x \,\square\, y = \begin{bmatrix} \max(x_1, y_1) \\ \max(x_2, y_2) \\ \max(x_3, y_3) \end{bmatrix}$$
    for vector addition and the usual rule for scalar multiplication is not a vector space over ℝ. [HINT: Show that this "addition" rule does not admit a zero element; i.e., there is no vector a ∈ ℝ³ such that a □ x = x □ a = x for every x ∈ ℝ³.]

    Exercise 1.4. Let C ⊂ ℝ³ denote the set of column vectors a with entries a_1, a_2, a_3 such that the polynomial a_1 + a_2t + a_3t² ≥ 0 for every t ∈ ℝ. Show that C is closed under vector addition (i.e., a, b ∈ C ⟹ a + b ∈ C) and under multiplication by positive numbers (i.e., a ∈ C and α > 0 ⟹ αa ∈ C), but that C is not a vector space over ℝ. [REMARK: A set C with the indicated two properties is called a cone.]

    Exercise 1.5. Show that for each positive integer n, the space of polynomials
    $$p(\lambda) = \sum_{j=0}^{n} a_j \lambda^j \quad\text{of degree } n$$
    with coefficients a_j ∈ ℂ is a vector space over ℂ under the natural rules of addition and scalar multiplication. [REMARK: You may assume that Σ_{j=0}^{n} a_jλ^j = 0 for every λ ∈ ℂ if and only if a_0 = a_1 = ⋯ = a_n = 0.]

    Exercise 1.6. Let F denote the set of continuous real-valued functions f(x) on the interval 0 ≤ x ≤ 1. Show that F is a vector space over ℝ with respect to the natural rules of vector addition ((f_1 + f_2)(x) = f_1(x) + f_2(x)) and scalar multiplication ((αf)(x) = αf(x)).


    1.3. Some definitions Subspaces: A subspace M of a vector space V over IF is a nonempty

    subset of V that is closed under vector addition and scalar multiplication. In other words if x and y belong to M, then x+y E M and ax E M for every scalar a Elf. A subspace of a vector space is automatically a vector space in its own right.

    Exercise 1.7. Let Fo denote the set of continuous real-valued functions f(x) on the interval 0 ::; x ::; 1 that meet the auxiliary constraints f(O) = 0 and f(1) = O. Show that Fo is a vector space over ~ with respect to the natural rules of vector addition and scalar multiplication that were intro-duced in Exercise 1.6 and that Fo is a subspace of the vector space F that was considered there.

    Exercise 1.8. Let Fl denote the set of continuous real-valued functions f (x) on the interval 0 ::; x ::; 1 that meet the auxiliary constraints f (0) = 0 and f(1) = 1. Show that Fl is not a vector space over ~ with respect to the natural rules of vector addition and scalar multiplication that were introduced in Exercise 1.6.

    Span: If v_1, ..., v_k is a given set of vectors in a vector space V over 𝔽, then
    $$\text{span}\{v_1, \ldots, v_k\} = \left\{ \sum_{j=1}^{k} \alpha_j v_j : \alpha_1, \ldots, \alpha_k \in \mathbb{F} \right\}.$$

    In words, the span is the set of all linear combinations α_1v_1 + ⋯ + α_kv_k of the indicated set of vectors, with coefficients α_1, ..., α_k in 𝔽. It is important to keep in mind that span{v_1, ..., v_k} may be small in some sense. In fact, span{v_1, ..., v_k} is the smallest vector space that contains the vectors v_1, ..., v_k. The number of vectors k that were used to define the span is not a good indicator of the size of this space. Thus, for example, if v_2 and v_3 are both scalar multiples of v_1, then

    span{v_1, v_2, v_3} = span{v_1}.

    To clarify the notion of the size of the span we need the concept of linear dependence.

    To clarify the notion of the size of the span we need the concept of linear dependence .

    Linear dependence: A set of vectors v_1, ..., v_k in a vector space V over 𝔽 is said to be linearly dependent over 𝔽 if there exists a set of scalars α_1, ..., α_k ∈ 𝔽, not all of which are zero, such that
    $$\alpha_1 v_1 + \cdots + \alpha_k v_k = 0\,.$$
    Notice that this permits you to express one or more of the given vectors in terms of the others. Thus, if α_1 ≠ 0, then
    $$v_1 = -\frac{\alpha_2}{\alpha_1}v_2 - \cdots - \frac{\alpha_k}{\alpha_1}v_k$$
    and hence
    $$\text{span}\{v_1, \ldots, v_k\} = \text{span}\{v_2, \ldots, v_k\}\,.$$
    Further reductions are possible if the vectors v_2, ..., v_k are still linearly dependent.

    Linear independence: A set of vectors v_1, ..., v_k in a vector space V over 𝔽 is said to be linearly independent over 𝔽 if the only scalars α_1, ..., α_k ∈ 𝔽 for which
    $$\alpha_1 v_1 + \cdots + \alpha_k v_k = 0$$
    are α_1 = ⋯ = α_k = 0. This is just another way of saying that you cannot express one of these vectors in terms of the others. Moreover, if {v_1, ..., v_k} is a set of linearly independent vectors in a vector space V over 𝔽 and if
    $$(1.1)\qquad v = \alpha_1 v_1 + \cdots + \alpha_k v_k \quad\text{and}\quad v = \beta_1 v_1 + \cdots + \beta_k v_k$$
    for some choice of constants α_1, ..., α_k, β_1, ..., β_k ∈ 𝔽, then α_j = β_j for j = 1, ..., k.

    Exercise 1.9. Verify the last assertion; i.e., if (1.1) holds for a linearly independent set of vectors {v_1, ..., v_k}, then α_j = β_j for j = 1, ..., k. Show by example that this conclusion is false if the given set of k vectors is not linearly independent.

    Basis: A set of vectors v_1, ..., v_k is said to form a basis for a vector space V over 𝔽 if

    (1) span{v_1, ..., v_k} = V.
    (2) The vectors v_1, ..., v_k are linearly independent.

    Both of these conditions are essential. The first guarantees that the given set of k vectors is big enough to express every vector v ∈ V as a linear combination of v_1, ..., v_k; the second that you cannot achieve this with fewer than k vectors.

    A nontrivial vector space V has many bases. However, the number of elements in each basis for V is exactly the same and is referred to as the dimension of V and will be denoted dim V. A proof of this statement will be furnished later. The next example should make it plausible.

    Example 1.2. It is readily checked that the vectors
    $$\begin{bmatrix}1\\0\\0\end{bmatrix},\quad \begin{bmatrix}0\\1\\0\end{bmatrix}\quad\text{and}\quad \begin{bmatrix}0\\0\\1\end{bmatrix}$$
    form a basis for the vector space 𝔽³ over the field 𝔽. It is also not hard to show that no smaller set of vectors will do. (Thus, dim 𝔽³ = 3, and, of course, dim 𝔽^k = k for every positive integer k.)

    In a similar vein, the p × q matrices E_{ij}, i = 1, ..., p, j = 1, ..., q, that are defined by setting every entry in E_{ij} equal to zero except for the ij entry, which is set equal to one, form a basis for the vector space 𝔽^{p×q}.

    Matrix multiplication: Let A = [a_{ij}] be a p × q matrix and B = [b_{st}] be a q × r matrix. Then the product AB is the p × r matrix C = [c_{kℓ}] with entries
    $$c_{k\ell} = \sum_{j=1}^{q} a_{kj}\, b_{j\ell}\,, \qquad k = 1, \ldots, p;\ \ell = 1, \ldots, r\,.$$
    Notice that c_{kℓ} is the matrix product of the k'th row of A with the ℓ'th column of B:
    $$c_{k\ell} = \begin{bmatrix} a_{k1} & \cdots & a_{kq} \end{bmatrix} \begin{bmatrix} b_{1\ell} \\ \vdots \\ b_{q\ell} \end{bmatrix}.$$

    Thus, for example, the product of a 2 × 3 matrix A with a 3 × 4 matrix B is the 2 × 4 matrix AB whose kℓ entry is the product of the k'th row of A with the ℓ'th column of B.

    Moreover, if A ∈ 𝔽^{p×q} and x ∈ 𝔽^q, then y = Ax is the vector in 𝔽^p with components y_i = Σ_{j=1}^{q} a_{ij}x_j for i = 1, ..., p.
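    The entry-by-entry recipe above translates directly into a few lines of code. The following sketch is my own illustration, not taken from the text; the function names mat_mul and mat_vec and the sample matrices are arbitrary choices.

    # Illustration of the definition: c[k][l] = sum_j a[k][j] * b[j][l].
    def mat_mul(A, B):
        """Multiply a p x q matrix A by a q x r matrix B, given as lists of rows."""
        p, q, r = len(A), len(B), len(B[0])
        assert all(len(row) == q for row in A), "inner dimensions must agree"
        return [[sum(A[k][j] * B[j][l] for j in range(q)) for l in range(r)]
                for k in range(p)]

    def mat_vec(A, x):
        """Compute y = Ax, where y_i = sum_j a_{ij} x_j."""
        return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

    if __name__ == "__main__":
        A = [[1, 2, 0],
             [0, 1, 3]]
        B = [[1, 0],
             [2, 1],
             [0, 4]]
        print(mat_mul(A, B))          # [[5, 2], [2, 13]]
        print(mat_vec(A, [1, 1, 1]))  # [3, 4]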

    Identity matrix: We shall use the symbol I_n to denote the n × n matrix A = [a_{ij}], i, j = 1, ..., n, with a_{ii} = 1 for i = 1, ..., n and a_{ij} = 0 for i ≠ j. Thus,
    $$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

    The matrix In is referred to as the n x n identity matrix, or just the identity matrix if the size is clear from the context. The name stems from the fact that Inx = x for every vector x E lFn.

    Zero matrix: We shall use the symbol Opxq for the matrix in lF pxq all of whose entries are equal to zero. The subscript p x q will be dropped if the size is clear from the context.

    The definition of matrix multiplication is such that:

    Matrix multiplication is not commutative; i.e., even if A and B are both p × p matrices, in general AB ≠ BA. In fact, if p > 1, then one can find A and B such that AB = 0_{p×p}, but BA ≠ 0_{p×p}.

    Exercise 1.10. Find a pair of 2 × 2 matrices A and B such that AB = 0_{2×2} but BA ≠ 0_{2×2}.

    Matrix multiplication is associative: If A ∈ 𝔽^{p×q}, B ∈ 𝔽^{q×r} and C ∈ 𝔽^{r×s}, then
    $$(AB)C = A(BC)\,.$$

    Matrix multiplication is distributive: If A, A_1, A_2 ∈ 𝔽^{p×q} and B, B_1, B_2 ∈ 𝔽^{q×r}, then
    $$(A_1 + A_2)B = A_1B + A_2B \quad\text{and}\quad A(B_1 + B_2) = AB_1 + AB_2\,.$$

    If A ∈ 𝔽^{p×q} is expressed both as an array of p row vectors of length q and as an array of q column vectors of height p:
    $$A = \begin{bmatrix} \vec{a}_1 \\ \vdots \\ \vec{a}_p \end{bmatrix} = \begin{bmatrix} a_1 & \cdots & a_q \end{bmatrix},$$
    and if B ∈ 𝔽^{q×r} is expressed both as an array of q row vectors of length r and as an array of r column vectors of height q:
    $$B = \begin{bmatrix} \vec{b}_1 \\ \vdots \\ \vec{b}_q \end{bmatrix} = \begin{bmatrix} b_1 & \cdots & b_r \end{bmatrix},$$
    then the product AB can be expressed in the following three ways:
    $$(1.2)\qquad AB = \begin{bmatrix} \vec{a}_1 B \\ \vdots \\ \vec{a}_p B \end{bmatrix} = \begin{bmatrix} Ab_1 & \cdots & Ab_r \end{bmatrix} = \sum_{i=1}^{q} a_i\, \vec{b}_i\,.$$
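    As a quick numerical illustration of formula (1.2), the following sketch (mine, not the book's) checks that the three expressions, rows of A times B, A times columns of B, and a sum of column-times-row products, all reproduce the usual product; numpy and the random seed are my own choices.

    import numpy as np

    rng = np.random.default_rng(0)
    p, q, r = 3, 4, 2
    A = rng.integers(-3, 4, size=(p, q))
    B = rng.integers(-3, 4, size=(q, r))

    # (i) stack the products (row i of A) times B
    rows_times_B = np.vstack([A[i:i+1, :] @ B for i in range(p)])
    # (ii) place the products A times (column j of B) side by side
    A_times_cols = np.hstack([A @ B[:, j:j+1] for j in range(r)])
    # (iii) sum of rank-one products (column i of A) times (row i of B)
    outer_sum = sum(np.outer(A[:, i], B[i, :]) for i in range(q))

    assert np.array_equal(rows_times_B, A @ B)
    assert np.array_equal(A_times_cols, A @ B)
    assert np.array_equal(outer_sum, A @ B)
    print("all three expressions agree with A @ B")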

    Exercise 1.11. Show that if
    $$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} b_{11} & \cdots & b_{14} \\ b_{21} & \cdots & b_{24} \\ b_{31} & \cdots & b_{34} \end{bmatrix},$$
    then
    $$AB = \begin{bmatrix} a_{11} & 0 & 0 \\ a_{21} & 0 & 0 \end{bmatrix} B + \begin{bmatrix} 0 & a_{12} & 0 \\ 0 & a_{22} & 0 \end{bmatrix} B + \begin{bmatrix} 0 & 0 & a_{13} \\ 0 & 0 & a_{23} \end{bmatrix} B$$
    and hence that
    $$AB = \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix}\begin{bmatrix} b_{11} & \cdots & b_{14} \end{bmatrix} + \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix}\begin{bmatrix} b_{21} & b_{22} & b_{23} & b_{24} \end{bmatrix} + \begin{bmatrix} a_{13} \\ a_{23} \end{bmatrix}\begin{bmatrix} b_{31} & \cdots & b_{34} \end{bmatrix}.$$

    Exercise 1.12. Verify the three ways of writing a matrix product in for-mula (1.2). [HINT: Let Exercise 1.11 serve as a guide.]

    Block multiplication: It is often convenient to express a large matrix as an array of sub-matrices (i.e., blocks of numbers) rather than as an array of numbers. Then the rules of matrix multiplication still apply (block by block) provided that the block decompositions are compatible. Thus, for example, if A = [A_{ij}] and B = [B_{jk}] are block decompositions with blocks A_{ij} ∈ 𝔽^{p_i×q_j} and B_{jk} ∈ 𝔽^{q_j×r_k}, then
    $$C = AB = [C_{ik}]\,, \quad\text{where}\quad C_{ik} = \sum_{j} A_{ij}B_{jk}$$
    is a p_i × r_k matrix.

    Transposes: The transpose of a p × q matrix A is the q × p matrix whose k'th row is equal to the k'th column of A laid sideways, k = 1, ..., q. In other words, the ij entry of A is equal to the ji entry of its transpose. The symbol A^T is used to designate the transpose of A. Thus, for example,
    $$(1.3)\qquad A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \quad\Longrightarrow\quad A^T = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{bmatrix}.$$
    It is readily checked that
    $$(A^T)^T = A \quad\text{and}\quad (AB)^T = B^T A^T\,.$$

    Hermitian transposes: The Hermitian transpose A^H of a p × q matrix A is the same as the transpose A^T of A, except that all the entries in the transposed matrix are replaced by their complex conjugates. Thus, for example, if
    $$A = \begin{bmatrix} 1 & 3i & 5 + i \\ 4 & 2 - i & 6i \end{bmatrix}, \quad\text{then}\quad A^H = \begin{bmatrix} 1 & 4 \\ -3i & 2 + i \\ 5 - i & -6i \end{bmatrix}.$$
    It is readily checked that
    $$(1.4)\qquad (A^H)^H = A \quad\text{and}\quad (AB)^H = B^H A^H\,.$$

    Inverses: Let A ∈ 𝔽^{p×q}. Then:
    (1) A matrix C ∈ 𝔽^{q×p} is said to be a left inverse of A if CA = I_q.
    (2) A matrix B ∈ 𝔽^{q×p} is said to be a right inverse of A if AB = I_p.
    In the first case A is said to be left invertible. In the second case A is said to be right invertible. It is readily checked that if a matrix A ∈ 𝔽^{p×q} has both a left inverse C and a right inverse B, then B = C:
    $$C = CI_p = C(AB) = (CA)B = I_qB = B\,.$$
    Notice that this implies that if A has both a left and a right inverse, then it has exactly one left inverse and exactly one right inverse and (as shown just above) the two are equal. In this instance, we shall say that A is invertible and refer to B = C as the inverse of A and denote it by A^{-1}. In other words, a matrix A ∈ 𝔽^{p×q} is invertible if and only if there exists a matrix B ∈ 𝔽^{q×p} such that AB = I_p and BA = I_q. In fact, as we shall see later, we must also have q = p in this case.

    Exercise 1.13. Show that if A and B are invertible matrices of the same size, then AB is invertible and (AB)-l = B-IA-l.

    Exercise 1.14. Show that the matrix A = [ 1~ o~ ~ll has no left inverses and no right inverses.

    Exercise 1.15. Show that the matrix A = [~ ~ ~] has at least two right inverses, but no left inverses.

    Exercise 1.16. Show that if a matrix A ∈ ℂ^{p×q} has two right inverses B_1 and B_2, then λB_1 + (1 − λ)B_2 is also a right inverse for every choice of λ ∈ ℂ.

    Exercise 1.17. Show that a given matrix A ∈ 𝔽^{p×q} has either 0, 1 or infinitely many right inverses and that the same conclusion prevails for left inverses.


    Exercise 1.18. Let A_{11} ∈ 𝔽^{p×p}, A_{12} ∈ 𝔽^{p×q} and A_{21} ∈ 𝔽^{q×p}. Show that if A_{11} is invertible, then
    $$\begin{bmatrix} A_{11} & A_{12} \end{bmatrix} \ \text{is right invertible} \quad\text{and}\quad \begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} \ \text{is left invertible.}$$

    1.4. Mappings Mappings: A mapping (or transformation) T from a subset 'DT of

    a vector space U into a vector space V is a rule that assigns exactly one vector v E V to each u E 'DT. The set 'DT is called the domain ofT.

    The following three examples give some idea of the possibilities:

    (a) T : [x_1; x_2] ∈ ℝ² ↦ [x_2 − x_1; x_1 + 2x_2 + 6] ∈ ℝ².

    (b) T : {[x_1; x_2] ∈ ℝ² : x_1 − x_2 ≠ 0} ↦ [1/(x_1 − x_2)] ∈ ℝ¹.

    (c) T : [x_1; x_2] ∈ ℝ² ↦ [3x_1 + x_2; x_1 − x_2; 3x_1 + x_2] ∈ ℝ³.

    The restriction on the domain in the second example is imposed in order to insure that the definition is meaningful. In the other two examples the domain is taken equal to the full vector space.

    In this framework we shall refer to the set

    NT = {u E 'DT : Tu = Ov} as the nullspace (or kernel) of T and the set

    'RT = {Tu : u E VT} as the range (or image) ofT. The subscript V is added to the symbol o in the first definition to emphasize that it is the zero vector in V, not in U .

    Linear mapping: A mapping T from a vector space U over IF into a vector space V over the same number field IF is said to be a linear mapping (or a linear transformation) if for every choice of u, v E U and a ElF the following two conditions are met:

    (1) T(u + v) = Tu + Tv. (2) T(au) = aTu.

    It is readily checked that if T is a linear mapping from a vector space U over IF into a vector space V over IF, then NTis a subspace of U and 'RT is a subspace of V . Moreover, in the preceding set of three examples, T is linear only in case (c).


    The identity: Let U be a vector space over IF. The special linear transformation from U into U that maps each vector U E U into itself is called the identity mapping. It is denoted by the symbol In if U = IFn and by Iu otherwise, though, more often than not, when the underlying space U is clear from the context, the subscript U will be dropped and I will be written in place of Iu. Thus, Iuu = I U = U for every vector U E U.

    Exercise 1.19. Compute NT and 'RT for each of the three cases (a), (b) and (c) considered above and say which are subspaces and which are not.

    Linear transformations are intimately connected with matrix multiplication:

    Exercise 1.20. Show that if T is a linear transformation from a vector space U over IF with basis {Ul, ... , uq } into a vector space V over IF with basis {VI, ... , V p}, then there exists a unique set of scalars aij E IF, i = 1, ... , p and j = 1, . .. , q such that

    $$(1.5)\qquad Tu_j = \sum_{i=1}^{p} a_{ij}\, v_i \quad\text{for } j = 1, \ldots, q$$

    and hence that

    $$(1.6)\qquad T\Big(\sum_{j=1}^{q} x_j u_j\Big) = \sum_{i=1}^{p} y_i v_i \iff Ax = y\,,$$

    where x ∈ 𝔽^q has components x_1, ..., x_q, y ∈ 𝔽^p has components y_1, ..., y_p and the entries a_{ij} of A ∈ 𝔽^{p×q} are determined by formula (1.5).
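    The recipe of formulas (1.5) and (1.6) can be tried out numerically. The sketch below is my own illustration under assumed data: the map T, the two bases and all the names are hypothetical choices, not taken from the text.

    import numpy as np

    def T(x):                      # a sample linear map from R^3 to R^2 (my choice)
        return np.array([x[0] + 2 * x[1], 3 * x[2] - x[1]], dtype=float)

    U_basis = np.array([[1, 1, 0], [0, 1, 1], [0, 0, 1]], dtype=float).T  # columns u_1, u_2, u_3
    V_basis = np.array([[1, 1], [1, -1]], dtype=float).T                  # columns v_1, v_2

    # Column j of A holds the coefficients of T(u_j) in the basis {v_1, v_2}, as in (1.5):
    A = np.linalg.solve(V_basis, np.column_stack([T(U_basis[:, j]) for j in range(3)]))

    # Check formula (1.6) on a coefficient vector x:
    x = np.array([2.0, -1.0, 0.5])
    lhs = T(U_basis @ x)           # T applied to sum_j x_j u_j
    rhs = V_basis @ (A @ x)        # sum_i y_i v_i with y = Ax
    assert np.allclose(lhs, rhs)
    print("A =\n", A)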

    WARNING: If A E C pxq , then matrix multiplication defines a lin-ear map from x E C q to Ax E C p. Correspondingly, the nullspace of this map,

    NA = {x E C q : Ax = O}, is a subspace of C q , and the range of this map,

    'RA = {Ax: x E C q }, is a subspace of CP. However, if A E IR pxq , then matrix multiplication also defines a linear map from x E IR q to Ax E IR P; and in this setting

    NA = {x E IRq: Ax = O} is a subspace of IRq, and the range of this map,

    'RA = {Ax: x E IRq}, is a subspace of IRP. In short, it is important to clarify the space on which A is acting, i.e., the domain of A. This will usually be clear from the context.


    1.5. Triangular matrices

    An n x n matrix A = [aij] is said to be

    upper triangular if all its nonzero entries sit either on or above the diagonal, i.e., if aij = 0 when i > j.

    lower triangular if all its nonzero entries sit either on or below the diagonal, i.e., if AT is upper triangular.

    triangular if it is either upper triangular or lower triangular. diagonal if aij = 0 when i t= j.

    Systems of equations based on a triangular matrix are particularly conve-nient to work with, even if the matrix is not invertible.

    Example 1.3. Let A E lF 4x4 be a 4 x 4 upper triangular matrix with nonzero diagonal entries and let b be any vector in IF 4 . Then the vector x is a solution of the equation

    (1.7) Ax=b if and only if

    $$\begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{14}x_4 &= b_1 \\ a_{22}x_2 + a_{23}x_3 + a_{24}x_4 &= b_2 \\ a_{33}x_3 + a_{34}x_4 &= b_3 \\ a_{44}x_4 &= b_4\,. \end{aligned}$$

    Therefore, since the diagonal entries of A are nonzero, it is readily seen that these equations admit a (unique) solution, by working from the bottom up:
    $$\begin{aligned} x_4 &= a_{44}^{-1}b_4 \\ x_3 &= a_{33}^{-1}(b_3 - a_{34}x_4) \\ x_2 &= a_{22}^{-1}(b_2 - a_{23}x_3 - a_{24}x_4) \\ x_1 &= a_{11}^{-1}(b_1 - a_{12}x_2 - a_{13}x_3 - a_{14}x_4)\,. \end{aligned}$$

    Thus, we have shown that for any right-hand side b, the equation (1.7) admits a (unique) solution x.

    Exploiting the freedom in the choice of b, let ej, j = 1, ... ,4, denote the j'th column of the identity matrix 14 and let Xj denote the solution of the equation AXj = ej for j = 1, . .. ,4. Then the 4 x 4 matrix

    X = [Xl X2 X3 X4]

    with columns x_1, ..., x_4 is a right inverse of A:
    $$AX = A\begin{bmatrix} x_1 & \cdots & x_4 \end{bmatrix} = \begin{bmatrix} Ax_1 & \cdots & Ax_4 \end{bmatrix} = \begin{bmatrix} e_1 & \cdots & e_4 \end{bmatrix} = I_4\,.$$
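    Example 1.3 amounts to the familiar back-substitution algorithm. A minimal sketch in Python (my own, with an arbitrarily chosen 4 x 4 upper triangular matrix):

    def back_substitute(A, b):
        """Solve Ax = b for an upper triangular A with nonzero diagonal entries."""
        n = len(b)
        x = [0.0] * n
        for i in range(n - 1, -1, -1):                 # work from the bottom up
            s = sum(A[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (b[i] - s) / A[i][i]                # a_ii != 0 is assumed
        return x

    if __name__ == "__main__":
        A = [[2.0, 1.0, -1.0, 3.0],
             [0.0, 1.0,  2.0, 0.0],
             [0.0, 0.0,  4.0, 1.0],
             [0.0, 0.0,  0.0, 2.0]]
        b = [5.0, 3.0, 9.0, 4.0]
        x = back_substitute(A, b)
        # verify Ax = b
        print(x, [sum(A[i][j] * x[j] for j in range(4)) for i in range(4)])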

    Analogous examples can be built for pxp lower triangular matrices. The only difference is that now it is advantageous to work from the top down. The existence of a left inverse can also be obtained by writing down the requisite equations that must be solved. It is easier, however, to play with transposes. This works because A is a triangular matrix with nonzero diagonal entries if and only if AT is a triangular matrix with nonzero diagonal entries and

    YA = I_p ⟹ A^T Y^T = I_p.

    Exercise 1.21. Show that the right inverse X of the upper triangular ma-trix A that is constructed in the preceding example is also a left inverse and that it is upper triangular.

    Lemma 1.4. Let A be a p x p triangular matrix. Then

    (1) A is invertible if and only if all its diagonal entries are different from zero.

    Moreover, if A is an invertible triangular matrix, then (2) A is upper triangular :::::} A-I is upper triangular. (3) A is lower triangular :::::} A-I is lower triangular.

    Proof. Suppose first that
    $$A = \begin{bmatrix} a_{11} & a_{12} \\ 0 & a_{22} \end{bmatrix}$$
    is a 2 × 2 upper triangular matrix with nonzero diagonal entries a_{11} and a_{22}. Then it is readily checked that the matrix equation
    $$A \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},$$
    which is equivalent to the pair of equations
    $$A \begin{bmatrix} x_{11} \\ x_{21} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad\text{and}\quad A \begin{bmatrix} x_{12} \\ x_{22} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix},$$
    has exactly one solution
    $$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} = \begin{bmatrix} a_{11}^{-1} & -a_{11}^{-1}a_{12}a_{22}^{-1} \\ 0 & a_{22}^{-1} \end{bmatrix}$$
    and that this solution is also a left inverse of A:
    $$XA = \begin{bmatrix} a_{11}^{-1} & -a_{11}^{-1}a_{12}a_{22}^{-1} \\ 0 & a_{22}^{-1} \end{bmatrix}\begin{bmatrix} a_{11} & a_{12} \\ 0 & a_{22} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

    Thus, every 2 × 2 upper triangular matrix A with nonzero diagonal entries is invertible and
    $$(1.8)\qquad A^{-1} = \begin{bmatrix} a_{11}^{-1} & -a_{11}^{-1}a_{12}a_{22}^{-1} \\ 0 & a_{22}^{-1} \end{bmatrix}$$
    is also upper triangular. Now let A and B be upper triangular k × k matrices such that AB = BA = I_k. Then for every choice of a, b, c ∈ ℂ^k and α, β ∈ ℂ with α ≠ 0,
    $$\begin{bmatrix} A & a \\ 0 & \alpha \end{bmatrix}\begin{bmatrix} B & b \\ c^T & \beta \end{bmatrix} = \begin{bmatrix} AB + ac^T & Ab + \beta a \\ \alpha c^T & \alpha\beta \end{bmatrix} = \begin{bmatrix} I_k + ac^T & Ab + \beta a \\ \alpha c^T & \alpha\beta \end{bmatrix}.$$
    Consequently, the product of these two matrices will be equal to I_{k+1} if and only if c = 0, Ab + βa = 0 and αβ = 1, that is, if and only if c = 0, b = −βBa and β = 1/α. Moreover, if c, b and β are chosen to meet these conditions, then
    $$\begin{bmatrix} B & b \\ 0 & \beta \end{bmatrix}\begin{bmatrix} A & a \\ 0 & \alpha \end{bmatrix} = \begin{bmatrix} BA & Ba + \alpha b \\ 0 & \alpha\beta \end{bmatrix} = I_{k+1}\,,$$
    since Ba + αb = Ba + α(−βBa) = 0.

    Thus, we have shown that if k × k upper triangular matrices with nonzero entries on the diagonal are invertible, then the same holds true for (k + 1) × (k + 1) upper triangular matrices with nonzero entries on the diagonal. Therefore, since we already know that 2 × 2 upper triangular matrices with nonzero entries on the diagonal are invertible, it follows by induction that every upper triangular matrix with nonzero entries on the diagonal is invertible and that the inverse is upper triangular.

    Suppose next that A ∈ ℂ^{p×p} is an invertible upper triangular matrix with inverse B ∈ ℂ^{p×p}. Then, upon expressing the identity AB = I_p in block form as
    $$\begin{bmatrix} A_1 & a_1 \\ 0 & \alpha_1 \end{bmatrix}\begin{bmatrix} B_1 & b_1 \\ c_1^T & \beta_1 \end{bmatrix} = \begin{bmatrix} I_{p-1} & 0 \\ 0 & 1 \end{bmatrix}$$
    with diagonal blocks of size (p − 1) × (p − 1) and 1 × 1, respectively, it is readily seen that α_1β_1 = 1. Therefore, α_1 ≠ 0. The next step is to play the same game with A_1 to show that its bottom diagonal entry is nonzero and, continuing this way down the line, to conclude that the diagonal entries of A are nonzero and that the inverse matrix B is also automatically upper triangular. The details are left to the reader.

    This completes the proof of the asserted statements for upper triangular matrices. The proof for lower triangular matrices may be carried out in much the same way or, what is simpler, by taking transposes. D

    Exercise 1.22. Show that if A E c nxn and Ak = Onxn for some positive integer k, then In - A is invertible. [HINT: It's enough to show that (In -A)(In+ A + A2+ ... +Ak- 1) = (In +A+A2+ .. . + Ak- 1)(In -A) = In.J Exercise 1.23. Show that even though all the diagonal entries of the ma-trix

    A=[H n are equal to zero, A is invertible, and find A-I.

    Exercise 1.24. Use Exercise 1.22 to show that a triangular n x n matrix A with nonzero diagonal entries is invertible by writing

    A = D + (A − D) = D(I_n + D^{-1}(A − D)), where D is the diagonal matrix with d_{jj} = a_{jj} for j = 1, ..., n. [HINT: The key observation is that (D^{-1}(A − D))^n = 0.]
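    The hint in Exercise 1.24 can be checked numerically: for a triangular A with nonzero diagonal D, the matrix N = D^{-1}(A − D) is nilpotent, so the finite sum I − N + N² − ⋯ inverts I + N. The sketch below is my own illustration; the sample matrix is arbitrary.

    import numpy as np

    A = np.array([[2.0, 1.0, 3.0],
                  [0.0, 1.0, 4.0],
                  [0.0, 0.0, 5.0]])
    n = A.shape[0]
    D = np.diag(np.diag(A))
    N = np.linalg.solve(D, A - D)          # strictly upper triangular, so N^n = 0

    inv_I_plus_N = np.zeros_like(A)
    term = np.eye(n)
    for k in range(n):                     # I - N + N^2 - ... + (-N)^(n-1)
        inv_I_plus_N += term
        term = term @ (-N)

    A_inv = inv_I_plus_N @ np.linalg.inv(D)    # A = D(I + N)  =>  A^{-1} = (I + N)^{-1} D^{-1}
    assert np.allclose(A @ A_inv, np.eye(n))
    print(A_inv)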

    1.6. Block triangular matrices

    A matrix A ∈ 𝔽^{n×n} with block decomposition
    $$A = \begin{bmatrix} A_{11} & \cdots & A_{1k} \\ \vdots & & \vdots \\ A_{k1} & \cdots & A_{kk} \end{bmatrix},$$
    where A_{ij} ∈ 𝔽^{p_i×q_j} for i, j = 1, ..., k and p_1 + ⋯ + p_k = q_1 + ⋯ + q_k = n, is said to be

    upper block triangular if p_i = q_i for i = 1, ..., k and A_{ij} = 0 for i > j.

    lower block triangular if p_i = q_i for i = 1, ..., k and A_{ij} = 0 for i < j.

    block triangular if it is either upper block triangular or lower block triangular.

    block diagonal if p_i = q_i for i = 1, ..., k and A_{ij} = 0 for i ≠ j.

    Note that the blocks A_{ii} in a block triangular decomposition need not be triangular.

    Exercise 1.25. Let A = [A_{11}  A_{12}; 0_{q×p}  A_{22}] be an upper block triangular matrix with invertible diagonal blocks A_{11} of size p × p and A_{22} of size q × q. Show that A is invertible and that
    $$(1.9)\qquad A^{-1} = \begin{bmatrix} A_{11}^{-1} & -A_{11}^{-1}A_{12}A_{22}^{-1} \\ 0_{q\times p} & A_{22}^{-1} \end{bmatrix},$$

    which generalizes formula (1.8). Exercise 1.26. Use formula (1.9) to calculate the inverse of the matrix

    A=[~ ~ ~l. 005

    Exercise 1.27. Let A = [A_{11}  0_{p×q}; A_{21}  A_{22}] be a lower block triangular matrix with invertible diagonal blocks A_{11} of size p × p and A_{22} of size q × q. Find a matrix B of the same form as A such that AB = BA = I_{p+q}.

    1.7. Schur complements

    Let
    $$(1.10)\qquad E = \begin{bmatrix} A & B \\ C & D \end{bmatrix},$$
    where A ∈ ℂ^{p×p}, B ∈ ℂ^{p×q}, C ∈ ℂ^{q×p} and D ∈ ℂ^{q×q}. Then the following two factorization formulas are extremely useful:

    (1) If A is an invertible matrix, then
    $$(1.11)\qquad E = \begin{bmatrix} I_p & 0 \\ CA^{-1} & I_q \end{bmatrix}\begin{bmatrix} A & 0 \\ 0 & D - CA^{-1}B \end{bmatrix}\begin{bmatrix} I_p & A^{-1}B \\ 0 & I_q \end{bmatrix}$$
    and D − CA^{-1}B is referred to as the Schur complement of A with respect to E.

    (2) If D is an invertible matrix, then
    $$(1.12)\qquad E = \begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix}\begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix}\begin{bmatrix} I_p & 0 \\ D^{-1}C & I_q \end{bmatrix}$$
    and A − BD^{-1}C is referred to as the Schur complement of D with respect to E.

    At this point, these two formulas may appear to be simply tedious exercises in block matrix multiplication. However, they are extremely useful. Another proof based on block Gaussian elimination, which leads to even more general factorization formulas, will be presented in Chapter 3. Notice that the first formula exhibits E as the product of an invertible lower triangular matrix


    times a block diagonal matrix times an invertible upper triangular matrix, whereas the second formula exhibits E as the product of an invertible upper triangular matrix times a block diagonal matrix times an invertible lower triangular matrix.
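    A small numerical check of formula (1.11) can be instructive. The sketch below is my own illustration, not part of the text; the matrix sizes, the names L, M, U and the random seed are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(1)
    p, q = 3, 2
    A = rng.standard_normal((p, p)) + 3 * np.eye(p)   # keep A comfortably invertible
    B = rng.standard_normal((p, q))
    C = rng.standard_normal((q, p))
    D = rng.standard_normal((q, q))

    E = np.block([[A, B], [C, D]])
    Ainv = np.linalg.inv(A)
    S = D - C @ Ainv @ B                               # Schur complement of A in E

    L = np.block([[np.eye(p), np.zeros((p, q))], [C @ Ainv, np.eye(q)]])
    M = np.block([[A, np.zeros((p, q))], [np.zeros((q, p)), S]])
    U = np.block([[np.eye(p), Ainv @ B], [np.zeros((q, p)), np.eye(q)]])

    assert np.allclose(E, L @ M @ U)
    print("E = L M U verified")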

    Exercise 1.28. Verify formulas (1.11) and (1.12) under the stated condi-tions.

    Exercise 1.29. Show that if B ∈ ℂ^{p×q} and C ∈ ℂ^{q×p}, then
    $$(1.13)\qquad I_p - BC \ \text{is invertible} \iff I_q - CB \ \text{is invertible}$$
    and that if these two matrices are invertible, then
    $$(1.14)\qquad (I_q - CB)^{-1} = I_q + C(I_p - BC)^{-1}B\,.$$
    [HINT: Exploit formulas (1.11) and (1.12).]

    Exercise 1.30. Let the matrix E be defined by formula (1.10). Show that:
    A and D − CA^{-1}B invertible ⟹ E is invertible,
    and construct an example to show that the opposite implication is false.

    Exercise 1.31. Show that if the matrix E is defined by formula (1.10), then

    D and A - BD-IC invertible ==> E is invertible, and show by example that the opposite implication is false.

    Exercise 1.32. Show that if the blocks A and D in the matrix E defined by formula (1.10) are invertible, then

    E is invertible =::? D - CA-1 B is invertible =::? A - BD-1C is invertible.

    Exercise 1.33. Show that if blocks A and D in the matrix E defined by formula (1.10) are invertible and A - BD-IC is invertible, then (1.15) (A - BD-IC)-l = A-I + A-I B(D - CA-I B)-ICA-I . [HINT: Multiply both sides of the asserted identity by A - BD-IC.] Exercise 1.34. Show that if if blocks A and D in the matrix E defined by formula (1.10) are invertible and D - CA-1B is invertible, then (1.16) (D - CA-I B)-l = D-1 + D-1C(A - BD-IC)-l BD-I . [HINT: Multiply both sides of the asserted identity by D - CA-I B.] Exercise 1.35. Show that if A E CpxP, B E Cpxq , C E Cqxp and the matrices A and A + BC are both invertible, then the matrix Iq + CA -1 B is invertible and (Iq + CA-1 B)-l = Iq - C(A + BC)-l B.


    Exercise 1.36. Show that if A E CpxP, B E Cpx q, C E C qxp and the

    matrix A + BC is invertible, then the matrix [~ ~q] is invertible, and find its inverse.

    Exercise 1.37. Let A ∈ ℂ^{p×p}, u ∈ ℂ^p, v ∈ ℂ^p and assume that A is invertible. Show that
    $$\begin{bmatrix} A & -u \\ v^H & 1 \end{bmatrix} \ \text{is invertible} \iff 1 + v^HA^{-1}u \ne 0$$
    and that if these conditions are met, then
    $$(I_p + uv^HA^{-1})^{-1}u = u(1 + v^HA^{-1}u)^{-1}\,.$$

    Exercise 1.38. Show that if in the setting of Exercise 1.37 the condition 1 + v^HA^{-1}u ≠ 0 is met, then the Sherman-Morrison formula
    $$(1.17)\qquad (A + uv^H)^{-1} = A^{-1} - \frac{A^{-1}uv^HA^{-1}}{1 + v^HA^{-1}u}$$
    holds.
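    Formula (1.17) is easy to test numerically. The following sketch is my own illustration; the matrix A and the vectors u, v are arbitrary random choices.

    import numpy as np

    rng = np.random.default_rng(2)
    p = 4
    A = rng.standard_normal((p, p)) + p * np.eye(p)
    u = rng.standard_normal((p, 1)) + 1j * rng.standard_normal((p, 1))
    v = rng.standard_normal((p, 1)) + 1j * rng.standard_normal((p, 1))

    Ainv = np.linalg.inv(A)
    denom = 1 + (v.conj().T @ Ainv @ u).item()
    lhs = np.linalg.inv(A + u @ v.conj().T)
    rhs = Ainv - (Ainv @ u @ v.conj().T @ Ainv) / denom
    print(np.allclose(lhs, rhs))          # True whenever 1 + v^H A^{-1} u != 0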

    Exercise 1.39. Show that if A is a p × q matrix and C is a q × q invertible matrix, then R_{AC} = R_A.

    Exercise 1.40. Show that the upper block triangular matrix
    $$A = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ 0 & A_{22} & A_{23} \\ 0 & 0 & A_{33} \end{bmatrix}$$
    with blocks A_{ij} of size p_i × p_j is invertible if the diagonal blocks A_{11}, A_{22} and A_{33} are invertible, and find a formula for A^{-1}. [HINT: Look for a matrix B of the same form as A such that AB = I_{p_1+p_2+p_3}.]

    1.8. Other matrix products

    Two other product rules for matrices that arise in assorted applications are:

    The Schur product C = A∘B of A = [a_{ij}] ∈ ℂ^{n×n} with B = [b_{ij}] ∈ ℂ^{n×n} is defined as the n × n matrix C = [c_{ij}] with entries c_{ij} = a_{ij}b_{ij} for i, j = 1, ..., n.

    The Kronecker product A⊗B of A = [a_{ij}] ∈ ℂ^{p×q} with B = [b_{ij}] ∈ ℂ^{n×m} is defined by the formula
    $$A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1q}B \\ \vdots & & \vdots \\ a_{p1}B & \cdots & a_{pq}B \end{bmatrix}.$$

    The Schur product of two square matrices of the same size is clearly commutative. It is also readily checked that the Kronecker product of real (or complex) matrices is associative:
    $$(A \otimes B) \otimes C = A \otimes (B \otimes C)$$
    and satisfies the rules
    $$(A \otimes B)^T = A^T \otimes B^T, \qquad (A \otimes B)(C \otimes D) = AC \otimes BD,$$
    when the indicated matrix multiplications are meaningful. If x ∈ 𝔽^k, u ∈ 𝔽^k, y ∈ 𝔽^ℓ and v ∈ 𝔽^ℓ, then the last rule implies that
    $$(x^Tu)(y^Tv) = (x^T \otimes y^T)(u \otimes v)\,.$$
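    Both products are available in standard numerical libraries; the short sketch below (my own illustration, not from the text, with arbitrary sample matrices) computes a Schur product and a Kronecker product and checks the mixed-product rule quoted above.

    import numpy as np

    A = np.array([[1, 2], [3, 4]])
    B = np.array([[0, 1], [1, 0]])
    print(A * B)              # Schur (entrywise) product of two matrices of equal size
    print(np.kron(A, B))      # Kronecker product: the block matrix [a_ij * B]

    C = np.array([[2, 0], [1, 1]])
    D = np.array([[1, 1], [0, 2]])
    lhs = np.kron(A, B) @ np.kron(C, D)
    rhs = np.kron(A @ C, B @ D)
    print(np.array_equal(lhs, rhs))   # True: the mixed-product rule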

  • Chapter 2

    Gaussian elimination

    ... People can tell you... do it like this. But that ain't the way to learn. You got to do it for yourself.

    Willie Mays, cited in Kahn [40], p.163

    Gaussian elimination is a way of passing from a given system of equations to a new system of equations that is easier to analyze. The passage from the given system to the new system is effected by multiplying both sides of the given equation, say

    Ax=b, successively on the left by appropriately chosen invertible matrices. The restriction to invertible multipliers is essential. Otherwise, the new system will not have the same set of solutions as the given one. In particular, the left multipliers will be either permutation matrices (which are defined below) or lower triangular matrices with ones on the diagonal. Both types are invertible. The first operation serves to interchange (Le., permute) rows, whereas the second serves to add a multiple of one row to other rows. Thus, for example,

    $$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ a_{31} & a_{32} & \cdots & a_{3n} \end{bmatrix} = \begin{bmatrix} a_{21} & a_{22} & \cdots & a_{2n} \\ a_{11} & a_{12} & \cdots & a_{1n} \\ a_{31} & a_{32} & \cdots & a_{3n} \end{bmatrix},$$
    whereas
    $$\begin{bmatrix} 1 & 0 & 0 \\ \alpha & 1 & 0 \\ \beta & 0 & 1 \end{bmatrix}\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ a_{31} & \cdots & a_{3n} \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \alpha a_{11} + a_{21} & \cdots & \alpha a_{1n} + a_{2n} \\ \beta a_{11} + a_{31} & \cdots & \beta a_{1n} + a_{3n} \end{bmatrix}.$$

    2.1. Some preliminary observations

    The operation of adding (or subtracting) a constant multiple of one row of a p × q matrix from another row of that matrix can always be achieved by multiplying on the left by a p × p matrix with ones on the diagonal and one other nonzero entry. Every such matrix can be expressed in the form
    $$(2.1)\qquad E_\alpha = I_p + \alpha\, e_i e_j^T \quad\text{with } i \text{ and } j \text{ fixed and } i \ne j\,,$$
    where the vectors e_1, ..., e_p denote the standard basis for 𝔽^p (i.e., the columns in the identity matrix I_p) and α ∈ 𝔽.

    It is readily seen that the following conclusions hold for the class of matrices of the form (2.1) with i and j fixed:

    (1) The class is closed under multiplication: E_α E_β = E_{α+β}.
    (2) The identity belongs to the class: E_0 = I_p.
    (3) Every matrix in the class is invertible: E_α is invertible and E_α^{-1} = E_{−α}.
    (4) Multiplication is commutative in the class: E_α E_β = E_β E_α.

    Thus, the class of matrices of the form (2.1) is a commutative group with respect to matrix multiplication. The same conclusion holds for the more general class of p × p matrices of the form
    $$(2.2)\qquad E_u = I_p + u\, e_j^T, \quad\text{with } u \in \mathbb{F}^p \text{ and } e_j^Tu = 0\,.$$
    The trade secret is the identity, which is considered in the next exercise, or, in less abstract terms, the observation that
    $$\begin{bmatrix} 1 & a & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & b & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & c & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & d & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & a+c & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & b+d & 0 & 1 \end{bmatrix}$$

    and the realization that there is nothing special about the size of this matrix or the second column.

    Exercise 2.1. Let u, v ∈ 𝔽^p be such that e_j^Tu = 0 and e_j^Tv = 0. Show that
    $$(I_p + u\, e_j^T)(I_p + v\, e_j^T) = (I_p + v\, e_j^T)(I_p + u\, e_j^T) = I_p + (v + u)e_j^T\,.$$

    Permutation matrices: Every n × n permutation matrix P is obtained by taking the identity matrix I_n and interchanging some of the rows. Consequently, P can be expressed in terms of the columns e_j, j = 1, ..., n, of I_n and a one to one mapping σ of the set of integers {1, ..., n} onto itself by the formula
    $$(2.3)\qquad P = P_\sigma = \sum_{j=1}^{n} e_j\, e_{\sigma(j)}^T\,.$$

    Thus, for example, if n = 4 and σ(1) = 3, σ(2) = 2, σ(3) = 4 and σ(4) = 1, then
    $$P_\sigma = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}.$$

    The set of n × n permutation matrices also forms a group under multiplication, but this group is not commutative (i.e., conditions (1)-(3) in the list given above are satisfied, but not (4)).
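    The two kinds of left multipliers introduced in this section are easy to generate and experiment with. The sketch below is my own illustration (0-based indices and the helper names E and P are my choices, not the book's):

    import numpy as np

    def E(n, i, j, alpha):
        """I_n + alpha * e_i e_j^T (i != j), as in (2.1); indices are 0-based here."""
        M = np.eye(n)
        M[i, j] = alpha
        return M

    def P(sigma):
        """Permutation matrix of formula (2.3); sigma is a tuple of 0-based indices."""
        n = len(sigma)
        return np.array([[1.0 if k == sigma[j] else 0.0 for k in range(n)] for j in range(n)])

    A = np.arange(1.0, 13.0).reshape(3, 4)
    print(E(3, 2, 0, -2.0) @ A)                # subtracts 2 * (row 0) from row 2
    print(P((2, 1, 0)) @ A)                    # interchanges rows 0 and 2
    print(E(3, 2, 0, 5.0) @ E(3, 2, 0, -5.0))  # = I_3, since E_alpha E_beta = E_{alpha+beta}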

    Orthogonal matrices: An n x n matrix V with real entries is said to be an orthogonal matrix if VTV = In.

    Exercise 2.2. Show that every permutation matrix is an orthogonal ma-trix. [HINT: Use formula (2.3).]

    The following notions will prove useful:

    Upper echelon: A p x q matrix U is said to be an upper echelon matrix if the first nonzero entry in row i lies to the left of the first nonzero entry in row i + 1. Thus, for example, the first of the following two matrices is an upper echelon matrix, while the second is not.

    [~~~~!~] [~~~~] o 0 0 0 2 0 0 5 0 5 o 0 0 0 0 0 000 0

    Pivots: The first nonzero entry in each row of an upper echelon matrix is termed a pivot. The pivots in the matrix on the left just above are 3, 1 and 2.

    Pivot columns: A column in an upper echelon matrix U will be referred to as a pivot column if it contains a pivot. Thus, the first, third and fifth columns of the matrix considered in the preceding paragraph are pivot columns. If GA = U, where G is invertible and U ∈ 𝔽^{p×q} is in upper echelon form with k pivots, then the columns a_{i_1}, ..., a_{i_k} of A that correspond in position to the pivot columns u_{i_1}, ..., u_{i_k} of U will also be called pivot columns (even though the pivots are in U, not in A) and the entries x_{i_1}, ..., x_{i_k} in x ∈ 𝔽^q will be referred to as pivot variables.

    2.2. Examples

    Example 2.1. Consider the equation Ax = b, where
    $$(2.4)\qquad A = \begin{bmatrix} 0 & 2 & 3 & 1 \\ 1 & 5 & 3 & 4 \\ 2 & 6 & 3 & 2 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}.$$

    1. Construct the augmented matrix
    $$(2.5)\qquad \tilde{A} = \begin{bmatrix} 0 & 2 & 3 & 1 & 1 \\ 1 & 5 & 3 & 4 & 2 \\ 2 & 6 & 3 & 2 & 1 \end{bmatrix}$$

    that is formed by adding b as an extra column to the matrix A on the far right. The augmented matrix is introduced to insure that the row operations that are applied to the matrix A are also applied to the vector b.

    2. Interchange the first two rows of A to get

    $$(2.6)\qquad \begin{bmatrix} 1 & 5 & 3 & 4 & 2 \\ 0 & 2 & 3 & 1 & 1 \\ 2 & 6 & 3 & 2 & 1 \end{bmatrix} = P_1\tilde{A}\,,$$
    where
    $$P_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
    has been chosen to obtain a nonzero entry in the upper left-hand corner of the new matrix.

    3. Subtract two times the top row of the matrix P_1Ã from its bottom row to get
    $$(2.7)\qquad \begin{bmatrix} 1 & 5 & 3 & 4 & 2 \\ 0 & 2 & 3 & 1 & 1 \\ 0 & -4 & -3 & -6 & -3 \end{bmatrix} = E_1P_1\tilde{A}\,, \quad\text{where}\quad E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}$$
    is chosen to obtain all zeros below the pivot in the first column.

    4. Add two times the second row of E_1P_1Ã to its third row to get
    $$(2.8)\qquad \begin{bmatrix} 1 & 5 & 3 & 4 & 2 \\ 0 & 2 & 3 & 1 & 1 \\ 0 & 0 & 3 & -4 & -1 \end{bmatrix} = E_2E_1P_1\tilde{A} = \begin{bmatrix} U & c \end{bmatrix},$$
    where
    $$E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}$$
    is chosen to obtain all zeros below the pivot in the second column, U = E_2E_1P_1A is in upper echelon form and c = E_2E_1P_1b. It was not necessary to permute the rows, since the upper left-hand corner of the block
    $$\begin{bmatrix} 2 & 3 & 1 & 1 \\ 0 & 3 & -4 & -1 \end{bmatrix}$$
    was already nonzero.

    5. Try to solve the new system of equations

    $$(2.9)\qquad Ux = \begin{bmatrix} 1 & 5 & 3 & 4 \\ 0 & 2 & 3 & 1 \\ 0 & 0 & 3 & -4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$$

    by solving for the pivot variables from the bottom row up: The bottom row equation is

    3X3 - 4X4 = -1, and hence for the third pivot variable X3 we obtain the formula

    3X3 = 4X4 -1.

    The second row equation is

    2X2 + 3X3 + X4 = 1 , and hence for the second pivot variable X2 we obtain the formula

    2X2 = -3X3 - X4 + 1 = -5X4 + 2 .

    Finally, the top row equation is

    Xl + 5X2 + 3X3 + 4X4 = 2,

    and hence for the first pivot variable x_1 we get
    $$x_1 = -5x_2 - 3x_3 - 4x_4 + 2 = -\frac{5(-5x_4 + 2)}{2} - (4x_4 - 1) - 4x_4 + 2 = \frac{9}{2}x_4 - 2\,.$$

    Thus, we have expressed each of the pivot variables x_1, x_2, x_3 in terms of the variable x_4. In vector notation,
    $$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \\ -1/3 \\ 0 \end{bmatrix} + x_4\begin{bmatrix} 9/2 \\ -5/2 \\ 4/3 \\ 1 \end{bmatrix}$$
    is a solution of the system of equations (2.9), or equivalently,
    $$(2.10)\qquad E_2E_1P_1Ax = E_2E_1P_1b$$

    (with A and b as in (2.4)) for every choice of X4. However, since the matrices E2, EI and PI are invertible, x is a solution of (2.10) if and only if Ax = b, i.e., if and only if x is a solution of the original equation.

    6. Check that the computed solution solves the original system of equa-tions. Strictly speaking, this step is superfluous, because the construction guarantees that every solution of the new system is a solution of the old sys-tem, and vice versa. Nevertheless, this is an extremely important step, because it gives you a way of verifying that your calculations are correct.
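    For the record, the check of step 6 can be automated. The sketch below is mine, not the book's; it verifies that the family of solutions found above satisfies Ax = b for the A and b of (2.4), using exact rational arithmetic.

    from fractions import Fraction as F

    A = [[0, 2, 3, 1],
         [1, 5, 3, 4],
         [2, 6, 3, 2]]
    b = [F(1), F(2), F(1)]
    u = [F(-2), F(1), F(-1, 3), F(0)]
    v = [F(9, 2), F(-5, 2), F(4, 3), F(1)]

    for x4 in (F(0), F(1), F(-7, 3)):                 # a few arbitrary choices of x4
        x = [ui + x4 * vi for ui, vi in zip(u, v)]
        Ax = [sum(F(a) * xi for a, xi in zip(row, x)) for row in A]
        assert Ax == b, (x4, Ax)
    print("Ax = b holds for every tested x4")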

    Conclusions: Since U is a 3 x 4 matrix with 3 pivots, much the same sorts of calculations as those carried out above imply that for each choice of bE ]F3, the equation Ax = b considered in this example has at least one solution x E IF4. Therefore, RA = IF3. Moreover, for any given b, there is a family of solutions of the form x = u + X4V for every choice of X4 E IF. But this implies that Ax = Au + x4Av = Au for every choice of X4 E IF, and hence that vENA. In fact,

    This, as we shall see shortly, is a consequence of the number of pivots and their positions. (In particular, anticipating a little, it is not an accident that the dimensions of these two spaces sum to the number of columns of A.) Example 2.2. Consider the equation Ax = b with

A = \begin{bmatrix} 0 & 0 & 4 & 3 \\ 1 & 2 & 4 & 1 \\ 1 & 2 & 8 & 4 \end{bmatrix}  and  b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}.

1. Form the augmented matrix

\tilde{A} = \begin{bmatrix} 0 & 0 & 4 & 3 & b_1 \\ 1 & 2 & 4 & 1 & b_2 \\ 1 & 2 & 8 & 4 & b_3 \end{bmatrix}.

2. Interchange the first two rows to get

\begin{bmatrix} 1 & 2 & 4 & 1 & b_2 \\ 0 & 0 & 4 & 3 & b_1 \\ 1 & 2 & 8 & 4 & b_3 \end{bmatrix} = P_1\tilde{A}

with P_1 as in Step 2 of the preceding example.


3. Subtract the top row of P_1\tilde{A} from its bottom row to get

\begin{bmatrix} 1 & 2 & 4 & 1 & b_2 \\ 0 & 0 & 4 & 3 & b_1 \\ 0 & 0 & 4 & 3 & b_3 - b_2 \end{bmatrix} = E_1P_1\tilde{A},

where

E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}.

4. Subtract the second row of E_1P_1\tilde{A} from its third row to get

\begin{bmatrix} 1 & 2 & 4 & 1 & b_2 \\ 0 & 0 & 4 & 3 & b_1 \\ 0 & 0 & 0 & 0 & b_3 - b_2 - b_1 \end{bmatrix} = E_2E_1P_1\tilde{A} = [U \; c],

where

E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}  and  U = \begin{bmatrix} 1 & 2 & 4 & 1 \\ 0 & 0 & 4 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.

5. Try to solve the new system of equations

\begin{bmatrix} 1 & 2 & 4 & 1 \\ 0 & 0 & 4 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} b_2 \\ b_1 \\ b_3 - b_2 - b_1 \end{bmatrix}

working from the bottom up. To begin with, the bottom row yields the equation 0 = b_3 - b_2 - b_1. Thus, it is clear that there are no solutions unless b_3 = b_1 + b_2. If this restriction is in force, then the second row gives us the equation

4x_3 + 3x_4 = b_1,

and hence, for the pivot variable x_3,

x_3 = \frac{b_1 - 3x_4}{4}.

Next, the first row gives us the equation

x_1 + 2x_2 + 4x_3 + x_4 = b_2,

and hence, for the other pivot variable x_1,

x_1 = b_2 - 2x_2 - 4x_3 - x_4 = b_2 - 2x_2 - (b_1 - 3x_4) - x_4 = b_2 - b_1 - 2x_2 + 2x_4.


Consequently, if b_3 = b_1 + b_2, then

x = \begin{bmatrix} b_2 - b_1 \\ 0 \\ b_1/4 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} 2 \\ 0 \\ -3/4 \\ 1 \end{bmatrix}

is a solution of the given system of equations for every choice of x_2 and x_4 in F.

    6. Check that the computed solution solves the original system of equations.

Conclusions: The preceding calculations imply that the equation Ax = b is solvable if and only if

b_3 = b_1 + b_2.

Moreover, for each such b ∈ F^3 there exists a solution of the form x = u + x_2v_1 + x_4v_2 for every x_2, x_4 ∈ F. In particular, x_2Av_1 + x_4Av_2 = 0 for every choice of x_2 and x_4. But this is possible only if Av_1 = 0 and Av_2 = 0.

Exercise 2.3. Check that for the matrix A in Example 2.2, R_A is the span of the pivot columns of A:

R_A = span\left\{ \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 4 \\ 4 \\ 8 \end{bmatrix} \right\}.

    The next example is carried out more quickly.

Example 2.3. Let

A = \begin{bmatrix} 0 & 0 & 3 & 4 & 7 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 2 & 3 & 6 & 8 \\ 0 & 0 & 6 & 8 & 14 \end{bmatrix}  and  b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}.

Then a vector x ∈ F^5 is a solution of the equation Ax = b if and only if

\begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 3 & 4 & 7 \\ 0 & 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} b_2 \\ b_1 \\ b_3 - 2b_2 - b_1 \\ b_4 - 2b_1 \end{bmatrix}.


The pivots of the upper echelon matrix on the left are in columns 2, 3 and 4. Therefore, upon solving for the pivot variables x_2, x_3 and x_4 in terms of x_1, x_5 and b_1, ..., b_4 from the bottom row up, we obtain the formulas

0 = b_4 - 2b_1,
2x_4 = b_3 - 2b_2 - b_1 - x_5,
3x_3 = b_1 - 4x_4 - 7x_5 = 3b_1 + 4b_2 - 2b_3 - 5x_5,
x_2 = b_2.

But this is the same as

x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} x_1 \\ b_2 \\ (-5x_5 + 3b_1 + 4b_2 - 2b_3)/3 \\ (-x_5 + b_3 - 2b_2 - b_1)/2 \\ x_5 \end{bmatrix}
  = x_1 \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + x_5 \begin{bmatrix} 0 \\ 0 \\ -5/3 \\ -1/2 \\ 1 \end{bmatrix} + b_1 \begin{bmatrix} 0 \\ 0 \\ 1 \\ -1/2 \\ 0 \end{bmatrix} + b_2 \begin{bmatrix} 0 \\ 1 \\ 4/3 \\ -1 \\ 0 \end{bmatrix} + b_3 \begin{bmatrix} 0 \\ 0 \\ -2/3 \\ 1/2 \\ 0 \end{bmatrix}
  = x_1u_1 + x_5u_2 + b_1u_3 + b_2u_4 + b_3u_5,

where u_1, ..., u_5 denote the five vectors in F^5 of the preceding line. Thus, we have shown that for each vector b ∈ F^4 with b_4 = 2b_1, the vector

x = x_1u_1 + x_5u_2 + b_1u_3 + b_2u_4 + b_3u_5

is a solution of the equation Ax = b for every choice of x_1 and x_5. Therefore, x_1u_1 + x_5u_2 is a solution of the equation Ax = 0 for every choice of x_1, x_5 ∈ F. Thus, u_1, u_2 ∈ N_A and, as

Ax = x_1Au_1 + x_5Au_2 + b_1Au_3 + b_2Au_4 + b_3Au_5 = b_1Au_3 + b_2Au_4 + b_3Au_5,

the vectors

v_1 = Au_3,  v_2 = Au_4  and  v_3 = Au_5

belong to R_A.
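The conclusions of Example 2.3 are easy to confirm numerically. The sketch below (SciPy/NumPy, not part of the text) uses the rank function as a stand-in for the pivot count (rank is defined formally in Section 2.6): it checks that dim N_A = 2, that rank A = 3, and that a right-hand side b is attainable exactly when b_4 = 2b_1, by comparing the rank of [A  b] with the rank of A.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[0., 0., 3., 4.,  7.],
              [0., 1., 0., 0.,  0.],
              [0., 2., 3., 6.,  8.],
              [0., 0., 6., 8., 14.]])

# dim N_A = 2, matching the two free variables x1 and x5 found above.
N = null_space(A)                      # columns form an orthonormal basis of N_A
assert N.shape[1] == 2

# dim R_A = 3, so rank + nullity = 5 = number of columns of A.
assert np.linalg.matrix_rank(A) == 3

# b lies in R_A exactly when b4 = 2*b1: compare a compliant and a non-compliant b.
b_good = np.array([1., 2., 3., 2.])    # b4 = 2*b1
b_bad  = np.array([1., 2., 3., 5.])    # b4 != 2*b1
rk = np.linalg.matrix_rank
assert rk(np.column_stack([A, b_good])) == rk(A)       # augmenting does not raise the rank
assert rk(np.column_stack([A, b_bad]))  == rk(A) + 1   # b_bad is not in R_A
```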

Exercise 2.4. Let a_j, j = 1, ..., 5, denote the j'th column vector of the matrix A considered in the preceding example. Show that

(1) span{v_1, v_2, v_3} = span{a_2, a_3, a_4}, i.e., the span of the pivot columns of A.


    2.3. Upper echelon matrices

    The examples in the preceding section serve to illustrate the central role played by the number of pivots in an upper echelon matrix U and their positions when trying to solve systems of equations by Gaussian elimination.

    Our next main objective is to exploit the special structure of upper echelon matrices in order to draw some general conclusions for matrices in this class. Extensions to general matrices will then be made on the basis of the following lemma:

Lemma 2.4. Let A ∈ F^{p×q} and assume that A ≠ O_{p×q}. Then there exists an invertible matrix G ∈ F^{p×p} such that

(2.11)    GA = U

is in upper echelon form.

Proof. By Gaussian elimination there exists a sequence P_1, P_2, ..., P_k of p × p permutation matrices and a sequence E_1, E_2, ..., E_k of lower triangular matrices with ones on the diagonal such that

E_kP_k \cdots E_2P_2E_1P_1A = U

is in upper echelon form. Consequently the matrix G = E_kP_k \cdots E_2P_2E_1P_1 fulfills the asserted conditions, since it is the product of invertible matrices.  □
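Lemma 2.4 can also be illustrated in floating point: row reducing the block matrix [A  I_p] produces [U  G] with GA = U. The routine below is a minimal sketch in plain NumPy (the name echelon_with_multiplier and the use of partial pivoting are ours, not the book's); it is meant only to exhibit an invertible G, not to be a production solver.

```python
import numpy as np

def echelon_with_multiplier(A):
    """Return (U, G) with G invertible and G @ A = U in upper echelon form."""
    A = np.asarray(A, dtype=float)
    p, q = A.shape
    M = np.hstack([A, np.eye(p)])      # row operations on [A | I] also build G
    row = 0
    for col in range(q):
        if row == p:
            break
        pivot = row + np.argmax(np.abs(M[row:, col]))   # partial pivoting
        if np.isclose(M[pivot, col], 0.0):
            continue                                    # no pivot in this column
        M[[row, pivot]] = M[[pivot, row]]               # permutation step
        for r in range(row + 1, p):                     # elimination step
            M[r] -= (M[r, col] / M[row, col]) * M[row]
        row += 1
    U, G = M[:, :q], M[:, q:]
    return U, G

A = np.array([[0., 2., 3., 1.],
              [1., 5., 3., 4.],
              [2., 6., 3., 2.]])       # the matrix of Example 2.1, reused for illustration
U, G = echelon_with_multiplier(A)
assert np.allclose(G @ A, U)
assert not np.isclose(np.linalg.det(G), 0.0)   # G is invertible
```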

Lemma 2.5. Let U ∈ F^{p×q} be an upper echelon matrix with k pivots and let e_j denote the j'th column of I_p for j = 1, ..., p. Then:

(1) k ≤ min{p, q}.
(2) The pivot columns of U are linearly independent.
(3) The span of the pivot columns = span{e_1, ..., e_k} = R_U; i.e.,
    (a) If k < p, then

        R_U = \left\{ \begin{bmatrix} b \\ 0 \end{bmatrix} : b ∈ F^k and 0 ∈ F^{p-k} \right\}.

    (b) If k = p, then R_U = F^p.
(4) The first k columns of U^T form a basis for R_{U^T}.

Proof. The first assertion follows from the fact that there is at most one pivot in each column and at most one pivot in each row. Next, let u_1, ..., u_q


denote the columns of U and let u_{i_1}, ..., u_{i_k} (with i_1 < \cdots < i_k) denote the pivot columns of U. Then clearly

(2.12)    span\{u_{i_1}, ..., u_{i_k}\} ⊆ span\{u_1, ..., u_q\} ⊆ \left\{ \begin{bmatrix} b \\ 0 \end{bmatrix} : b ∈ F^k and 0 ∈ F^{p-k} \right\},

if k < p. On the other hand, the matrix formed by arraying the pivot columns one after the other is of special form:

[u_{i_1} \; \cdots \; u_{i_k}] = \begin{bmatrix} U_{11} \\ U_{21} \end{bmatrix},

where U_{11} is a k × k upper triangular matrix with the pivots as diagonal entries and U_{21} = O_{(p-k)×k}. Therefore, U_{11} is invertible, and, for any choice of b ∈ F^k, the formulas

[u_{i_1} \; \cdots \; u_{i_k}] U_{11}^{-1}b = \begin{bmatrix} U_{11} \\ U_{21} \end{bmatrix} U_{11}^{-1}b = \begin{bmatrix} b \\ 0 \end{bmatrix}

imply (2) and that

(2.13)    \left\{ \begin{bmatrix} b \\ 0 \end{bmatrix} : b ∈ F^k and 0 ∈ F^{p-k} \right\} ⊆ \{Ux : x ∈ F^q\} ⊆ span\{u_{i_1}, ..., u_{i_k}\}.

The two inclusions (2.12) and (2.13) yield the equality advertised in (a) of (3). The same argument (but with U = U_{11}) serves to justify (b) of (3).

Item (4) is easy and is left to the reader.  □

Exercise 2.5. Verify (4) of Lemma 2.5.

Exercise 2.6. Let U ∈ F^{p×q} be an upper echelon matrix with k pivots. Show that there exists an invertible matrix K ∈ F^{q×q} such that:

(1) If k < q, then

    R_{U^T} = \left\{ K \begin{bmatrix} b \\ 0 \end{bmatrix} : b ∈ F^k and 0 ∈ F^{q-k} \right\}.

(2) If k = q, then R_{U^T} = F^q.

[HINT: In case of difficulty, try some numerical examples for orientation.]

Exercise 2.7. Let U be a 4 × 5 matrix of the form

U = [u_1 \; u_2 \; u_3 \; u_4 \; u_5] = \begin{bmatrix} u_{11} & u_{12} & u_{13} & u_{14} & u_{15} \\ 0 & 0 & u_{23} & u_{24} & u_{25} \\ 0 & 0 & 0 & u_{34} & u_{35} \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}

with u_{11}, u_{23} and u_{34} all nonzero. Show that span{u_1, u_3, u_4} = R_U.


Exercise 2.8. Find a basis for the null space N_U of the 4 × 5 matrix U considered in Exercise 2.7 in terms of its entries u_{ij}, when the pivots of U are all set equal to one.

Lemma 2.6. Let U ∈ F^{p×q} be in upper echelon form with k pivots. Then:

(1) k ≤ min{p, q}.
(2) k = q ⟺ U is left invertible ⟺ N_U = {0}.
(3) k = p ⟺ U is right invertible ⟺ R_U = F^p.

    Proof. The first assertion is established in Lemma 2.5 (and is repeated here for perspective).

    Suppose next that U has q pivots. Then

U = \begin{bmatrix} U_{11} \\ O_{(p-q)×q} \end{bmatrix}  if q < p,  and  U = U_{11}  if q = p,

where U_{11} is a q × q upper triangular matrix with nonzero diagonal entries. Thus, if q < p and V ∈ F^{q×p} is written in block form as

V = [V_{11} \; V_{12}]  with  V_{11} = U_{11}^{-1}  and  V_{12} ∈ F^{q×(p-q)},

then V is a left inverse of U for every choice of V_{12} ∈ F^{q×(p-q)}; i.e., k = q ⟹ U is left invertible.

Suppose next that U is left invertible with a left inverse V. Then

x ∈ N_U ⟹ Ux = 0 ⟹ 0 = V(Ux) = (VU)x = x,

i.e., U left invertible ⟹ N_U = {0}.

To complete the proof of (2), observe that the span of the pivot columns of U is equal to the span of all the columns of U, alias R_U. Therefore, every column of U can be expressed as a linear combination of the pivot columns. Thus, as

N_U = {0} ⟹ the q columns of U are linearly independent,

it follows that

N_U = {0} ⟹ U has q pivots.

Finally, even though the equivalence k = p ⟺ R_U = F^p is known from Lemma 2.5, we shall present an independent proof of all of (3), because it is instructive and indicates how to construct right inverses, when they exist. We proceed in three steps:

(a) k = p ⟹ U is right invertible: If k = p = q, then U is right (and left) invertible by Lemma 1.4. If k = p and q > p, then there exists a


q × q permutation matrix P that (multiplying U on the right) serves to interchange the columns of U so that the pivots are concentrated on the left, i.e.,

UP = [U_{11} \; U_{12}],

where U_{11} is a p × p upper triangular matrix with nonzero diagonal entries. Thus, if q > p and V ∈ F^{q×p} is written in block form as

V = \begin{bmatrix} V_{11} \\ V_{21} \end{bmatrix}  with  V_{11} ∈ F^{p×p}  and  V_{21} ∈ F^{(q-p)×p},

then

UPV = I_p ⟺ U_{11}V_{11} + U_{12}V_{21} = I_p ⟺ V_{11} = U_{11}^{-1}(I_p - U_{12}V_{21}).

Consequently, for any choice of V_{21} ∈ F^{(q-p)×p}, the matrix PV will be a right inverse of U if V_{11} is chosen as indicated just above; i.e., (a) holds.

(b) U is right invertible ⟹ R_U = F^p: If U is right invertible and V is a right inverse of U, then for each choice of b ∈ F^p, x = Vb is a solution of the equation Ux = b:

UV = I_p ⟹ U(Vb) = (UV)b = b;

i.e., (b) holds.

(c) R_U = F^p ⟹ k = p: If R_U = F^p, then there exists a vector v ∈ F^q such that Uv = e_p, where e_p denotes the p'th column of I_p. If U has fewer than p pivots, then the last row of U is zero, i.e., e_p^TU = 0^T, and hence

1 = e_p^Te_p = e_p^T(Uv) = (e_p^TU)v = 0^Tv = 0,

which is impossible. Therefore, R_U = F^p ⟹ U has p pivots and (c) holds.

□
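The construction in part (a) of the proof can be carried out concretely. The sketch below (NumPy; the matrix is an invented 2 × 4 example, and the free block V_{21} is simply set to zero) builds one right inverse of an upper echelon matrix with k = p pivots by permuting the pivot columns to the left, inverting the resulting triangular block, and padding with zeros.

```python
import numpy as np

# A 2 x 4 upper echelon matrix with pivots in columns 1 and 3 (so k = p = 2).
U = np.array([[2., 4., 1., 3.],
              [0., 0., 5., 7.]])
pivot_cols, other_cols = [0, 2], [1, 3]

# Permute columns so that U @ P = [U11  U12] with U11 upper triangular and invertible.
perm = pivot_cols + other_cols
P = np.eye(4)[:, perm]
U11 = (U @ P)[:, :2]

# Take V21 = 0 and V11 = U11^{-1}; then V = P @ [[V11], [V21]] is a right inverse of U.
V11 = np.linalg.inv(U11)
V = P @ np.vstack([V11, np.zeros((2, 2))])
assert np.allclose(U @ V, np.eye(2))
```

Choosing a different V_{21} produces a different right inverse, which is one way to see that right inverses of a strictly wide matrix are never unique (compare Exercise 2.13).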

Exercise 2.9. Let A = [~ ~ ~] and B = [~ ~ ~]. Find a basis for each of the spaces R_{BA}, R_A and R_{AB}.

Exercise 2.10. Find a basis for each of the spaces N_{BA}, N_A and N_{AB} for the matrices A and B that are given in the preceding exercise.

Exercise 2.11. Show that if A ∈ F^{p×q}, B ∈ F^{p×p} and u_1, ..., u_k is a basis for R_A, then span{Bu_1, ..., Bu_k} = R_{BA} and that this second set of vectors will be a basis for R_{BA} if B is left invertible.

Exercise 2.12. Show that if A is a p × q matrix and C is a q × q invertible matrix, then R_{AC} = R_A.


Exercise 2.13. Show that if U ∈ F^{p×q} is a p × q matrix in upper echelon form with p pivots, then U has exactly one right inverse if and only if p = q.

If A ∈ F^{p×q} and U is a subspace of F^q, then

(2.14)    AU = \{Au : u ∈ U\}.

Exercise 2.14. Show that if GA = B and G is invertible (as is the case in formula (2.11) with U = B), then

R_B = GR_A,  N_B = N_A,  R_{B^T} = R_{A^T}  and  G^TN_{B^T} = N_{A^T}.

Exercise 2.15. Let U ∈ F^{p×q} be an upper echelon matrix with k pivots, where 1 ≤ k ≤ p < q. Show that N_U ≠ {0}. [HINT: There exists a q × q permutation matrix P (that is introduced to permute the columns of U, if need be) such that

UP = \begin{bmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{bmatrix},

where U_{11} is a k × k upper triangular matrix with nonzero diagonal entries, U_{12} ∈ F^{k×(q-k)}, U_{21} = O_{(p-k)×k} and U_{22} = O_{(p-k)×(q-k)}, and hence that

x = P \begin{bmatrix} U_{11}^{-1}U_{12} \\ -I_{q-k} \end{bmatrix} y

is a nonzero solution of the equation Ux = 0 for every nonzero vector y ∈ F^{q-k}.]

Exercise 2.16. Let n_L = n_L(U) and n_R = n_R(U) denote the number of left and right inverses, respectively, of an upper echelon matrix U ∈ F^{p×q}. Show that the combinations (n_L = 0, n_R = 0), (n_L = 0, n_R = ∞), (n_L = 1, n_R = 1) and (n_L = ∞, n_R = 0) are possible.

Exercise 2.17. In the notation of the previous exercise, show that the combinations (n_L = 0, n_R = 1), (n_L = 1, n_R = 0), (n_L = ∞, n_R = 1), (n_L = 1, n_R = ∞) and (n_L = ∞, n_R = ∞) are impossible.

Lemma 2.7. Let A ∈ F^{p×q} and assume that N_A = {0}. Then p ≥ q.

Proof. Lemma 2.4 guarantees the existence of an invertible matrix G ∈ F^{p×p} such that formula (2.11) is in force and hence that

N_A = {0} ⟺ N_U = {0}.

Moreover, in view of Lemma 2.6,

N_U = {0} ⟺ U has q pivots.

Therefore, by another application of Lemma 2.6, q ≤ p.  □

Theorem 2.8. Let v_1, ..., v_ℓ be a basis for a vector space V over F and let u_1, ..., u_k be a basis for a subspace U of V. Then:


(1) k ≤ ℓ.
(2) k = ℓ if and only if U = V.


then the procedure continues until ℓ - k new vectors {w_1, ..., w_{ℓ-k}} have been added to the original set {u_1, ..., u_k} to form a basis for V.  □

dimension: The preceding theorem guarantees that if V is a vector space over F with a finite basis, then every basis of V has exactly the same number of vectors. This number is called the dimension of the vector space.

zero dimension: The dimension of the vector space {0} will be assigned the number zero.

Exercise 2.18. Show that F^k is a k-dimensional vector space over F.

Exercise 2.19. Show that if U and V are finite dimensional subspaces of a vector space W over F, then the set U + V that is defined by the formula

(2.15)    U + V = \{u + v : u ∈ U and v ∈ V\}

is a vector space over F and

(2.16)    dim(U + V) = dim U + dim V - dim(U ∩ V).

Exercise 2.20. Let T be a linear mapping from a finite dimensional vector space U over F into a vector space V over F. Show that dim R_T ≤ dim U. [HINT: If u_1, ..., u_n is a basis for U, then the vectors Tu_j, j = 1, ..., n, span R_T.]

Exercise 2.21. Construct a linear mapping T from a vector space U over F into a vector space V over F such that dim R_T < dim U.
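The dimension formula (2.16) of Exercise 2.19 is easy to test numerically before proving it. The sketch below (NumPy/SciPy, not part of the text) generates random subspaces of R^6 as column spans; since the random columns are linearly independent with probability one, dim(U ∩ V) equals the nullity of the stacked matrix [B_U  -B_V], which pairs off x and y with B_Ux = B_Vy.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
BU = rng.standard_normal((6, 3))      # columns span a subspace U of R^6
BV = rng.standard_normal((6, 4))      # columns span a subspace V of R^6

dim_U = np.linalg.matrix_rank(BU)
dim_V = np.linalg.matrix_rank(BV)
dim_sum = np.linalg.matrix_rank(np.hstack([BU, BV]))   # dim(U + V)
dim_cap = null_space(np.hstack([BU, -BV])).shape[1]    # dim(U ∩ V), see remark above
assert dim_sum == dim_U + dim_V - dim_cap              # formula (2.16)
```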

    2.4. The conservation of dimension

Theorem 2.9. Let T be a linear mapping from a finite dimensional vector space U over F into a vector space V over F (finite dimensional or not). Then

(2.17)    dim N_T + dim R_T = dim U.

Proof. In view of Exercise 2.20, R_T is automatically a finite dimensional space regardless of the dimension of V. Suppose first that N_T ≠ {0}, R_T ≠ {0} and let u_1, ..., u_k be a basis for N_T, v_1, ..., v_ℓ be a basis for R_T and choose vectors y_j ∈ U such that

Ty_j = v_j,  j = 1, ..., ℓ.

The first item of business is to show that the vectors u_1, ..., u_k and y_1, ..., y_ℓ are linearly independent over F. Suppose, to the contrary, that there exist scalars α_1, ..., α_k and β_1, ..., β_ℓ such that

(2.18)    \sum_{i=1}^{k} α_iu_i + \sum_{j=1}^{ℓ} β_jy_j = 0.


Then

T\left( \sum_{i=1}^{k} α_iu_i + \sum_{j=1}^{ℓ} β_jy_j \right) = T(0) = 0.

But the left-hand side of the last equality can be reexpressed as

\sum_{i=1}^{k} α_iTu_i + \sum_{j=1}^{ℓ} β_jTy_j = 0 + \sum_{j=1}^{ℓ} β_jv_j.

Therefore, β_1 = \cdots = β_ℓ = 0 and so too, by (2.18), α_1 = \cdots = α_k = 0. This completes the proof of the asserted linear independence.

The next step is to verify that the vectors u_1, ..., u_k, y_1, ..., y_ℓ span U and thus that this set of vectors is a basis for U. To this end, let w ∈ U. Then, since

Tw = \sum_{j=1}^{ℓ} β_jv_j = \sum_{j=1}^{ℓ} β_jTy_j

for some choice of β_1, ..., β_ℓ ∈ F, it follows that

T\left( w - \sum_{j=1}^{ℓ} β_jy_j \right) = 0.

This means that

w - \sum_{j=1}^{ℓ} β_jy_j ∈ N_T

and, consequently, this vector can be expressed as a linear combination of u_1, ..., u_k. In other words,

w = \sum_{i=1}^{k} α_iu_i + \sum_{j=1}^{ℓ} β_jy_j

for some choice of scalars α_1, ..., α_k and β_1, ..., β_ℓ in F. But this means that

span\{u_1, ..., u_k, y_1, ..., y_ℓ\} = U

and hence, in view of the already exhibited linear independence, that

dim U = k + ℓ = dim N_T + dim R_T,

as claimed.

Suppose next that N_T = {0} and R_T ≠ {0}. Then much the same sort of argument serves to prove that if v_1, ..., v_ℓ is a basis for R_T and if y_j ∈ U is such that Ty_j = v_j for j = 1, ..., ℓ, then the vectors y_1, ..., y_ℓ are


linearly independent and span U. Thus, dim U = dim R_T = ℓ, and hence formula (2.17) is still in force, since dim N_T = 0.

It remains only to consider the case R_T = {0}. But then N_T = U, and formula (2.17) is still valid.  □

Remark 2.10. We shall refer to formula (2.17) as the principle of conservation of dimension. Notice that it is correct as stated if U is a finite dimensional subspace of some other vector space W.

2.5. Quotient spaces

This section is devoted to a brief sketch of another approach to establishing Theorem 2.9 that is based on quotient spaces. It can be skipped without loss of continuity.

Quotient spaces: Let V be a vector space over F, let M be a subspace of V and, for u ∈ V, let u_M = {u + m : m ∈ M}. Then V/M = {u_M : u ∈ V} is a vector space over F with respect to the rules u_M + v_M = (u + v)_M and α(u_M) = (αu)_M of vector addition and scalar multiplication, respectively. The details are easily filled in with the aid of Exercises 2.22-2.24.

Exercise 2.22. Let M be a proper nonzero subspace of a vector space V over F and, for u ∈ V, let u_M = {u + m : m ∈ M}. Show that if x, y ∈ V, then

x_M = y_M ⟺ x - y ∈ M

and use this result to describe the set of vectors u ∈ V such that u_M = 0_M.

Exercise 2.23. Show that if, in the setting of Exercise 2.22, u, v, x, y ∈ V and if also u_M = x_M and v_M = y_M, then (u + v)_M = (x + y)_M.

Exercise 2.24. Show that if, in the setting of Exercise 2.22, α, β ∈ F and u ∈ V, but u ∉ M, then (αu)_M = (βu)_M if and only if α = β.

Exercise 2.25. Let U be a finite dimensional vector space over F and let V be a subspace of U. Show that dim U = dim(U/V) + dim V.

Exercise 2.26. Establish the principle of conservation of dimension with the aid of Exercise 2.25.

    2.6. Conservation of dimension for matrices

One of the prime applications of the principle of conservation of dimension is to the particular linear transformation T from F^q into F^p that is defined by multiplying each vector x ∈ F^q by a given matrix A ∈ F^{p×q}. Because of its importance, the main conclusions are stated as a theorem, even though


    they are easily deduced from the definitions of the requisite spaces and Theorem 2.9.

Theorem 2.11. If A ∈ F^{p×q}, then

(2.19)    N_A = \{x ∈ F^q : Ax = 0\} is a subspace of F^q,

(2.20)    R_A = \{Ax : x ∈ F^q\} is a subspace of F^p and

(2.21)    q = dim N_A + dim R_A.

rank: If A ∈ F^{p×q}, then the dimension of R_A is termed the rank of A:

rank A = dim R_A.
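Formula (2.21) is easy to watch in action. The sketch below (NumPy/SciPy, a random matrix of a prescribed rank, not an example from the text) checks that rank plus nullity equals the number of columns, and that the rank is unchanged by transposition, as Theorem 2.12 below asserts.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
p, q, r = 5, 7, 3
A = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))   # a p x q matrix of rank r

rank_A = np.linalg.matrix_rank(A)             # dim R_A
dim_null = null_space(A).shape[1]             # dim N_A
assert rank_A == r
assert rank_A + dim_null == q                 # conservation of dimension, (2.21)
assert np.linalg.matrix_rank(A.T) == rank_A   # rank A = rank A^T
```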

Exercise 2.27. Let A ∈ F^{p×q}, B ∈ F^{p×p} and C ∈ F^{q×q}. Show that:

(1) rank BA ≤ rank A, with equality if B is invertible.
(2) rank AC ≤ rank A, with equality if C is invertible.

Theorem 2.12. If A ∈ F^{p×q}, then

(2.22)    rank A = rank A^T = rank A^H ≤ min{p, q}.

Proof. The statement is obvious if A = O_{p×q}. If A ≠ O_{p×q}, then there exists an invertible matrix G ∈ F^{p×p} such that GA = U is in upper echelon form. Thus,

rank A = rank GA = rank U = the number of pivots of U,

whereas,

rank A^T = rank A^TG^T = rank U^T = the number of pivots of U.

The proof that rank A = rank A^H is left to the reader as an exercise.  □

Exercise 2.28. Show that if A ∈ C^{p×q}, then rank A = rank A^H.

Exercise 2.29. Show that if A ∈ C^{p×q} and C ∈ C^{k×q}, then

(2.23)    rank \begin{bmatrix} A \\ C \end{bmatrix} = q ⟺ N_A ∩ N_C = \{0\}.

Exercise 2.30. Show that if A ∈ C^{p×q} and B ∈ C^{p×r}, then

(2.24)    rank [A \; B] = p ⟺ N_{A^H} ∩ N_{B^H} = \{0\}.

Exercise 2.31. Show that if A is a triangular matrix (either upper or lower), then rank A is bigger than or equal to the number of nonzero diagonal entries in A. Give an example of an upper triangular matrix A for which the inequality is strict.

Exercise 2.32. Calculate dim N_A and dim R_A in the setting of Exercise 2.4 and confirm that these numbers are consistent with the principle of conservation of dimension.


    2.7. From U to A

The next theorem is an analogue of Lemma 2.6 that is stated for general matrices A ∈ F^{p×q}; i.e., the conclusions are not restricted to upper echelon matrices. It may be obtained from Lemma 2.6 by exploiting formula (2.11). However, it is more instructive to give a direct proof.

Theorem 2.13. Let A ∈ F^{p×q}. Then

(1) rank A = p ⟺ A is right invertible ⟺ R_A = F^p.
(2) rank A = q ⟺ A is left invertible ⟺ N_A = {0}.
(3) If A has both a left inverse B ∈ F^{q×p} and a right inverse C ∈ F^{q×p}, then B = C and p = q.

Proof. Since R_A ⊆ F^p, it is clear that rank A = p if and only if R_A = F^p.

Suppose next that R_A = F^p. Then the equations

Ax_j = b_j,  j = 1, ..., p,

are solvable for every choice of the vectors b_j. If, in particular, b_j is set equal to the j'th column of the identity matrix I_p, then

A[x_1 \; \cdots \; x_p] = [b_1 \; \cdots \; b_p] = I_p.

This exhibits the q × p matrix

X = [x_1 \; \cdots \; x_p]

with columns x_1, ..., x_p as a right inverse of A. Conversely, if AC = I_p for some matrix C ∈ F^{q×p}, then x = Cb is a solution of the equation Ax = b for every choice of b ∈ F^p, i.e., R_A = F^p. This completes the proof of (1).

Next, (2) follows from (1) and the observation that

N_A = \{0\} ⟹ rank A = q (by Theorem 2.11)
      ⟹ rank A^T = q (by Theorem 2.12)
      ⟹ A^T is right invertible (by part (1))
      ⟹ A is left invertible.

Moreover, (1) and (2) imply that if A is both left invertible and right invertible, then p = q and, as has already been shown, the two one-sided inverses coincide:

B = BI_p = B(AC) = (BA)C = I_qC = C.  □
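Theorem 2.13 is constructive, and the constructions are easy to reproduce numerically. The following sketch (NumPy, random matrices rather than the author's examples) builds a right inverse of a full-row-rank A by solving AX = I_p column by column, and a left inverse of a full-column-rank A via the formula (A^TA)^{-1}A^T; the invertibility of A^TA used in the second step is exactly what Lemma 2.17 below guarantees.

```python
import numpy as np

rng = np.random.default_rng(2)

# Right inverse: A is 3 x 5 with full row rank.
A = rng.standard_normal((3, 5))
C = np.linalg.lstsq(A, np.eye(3), rcond=None)[0]   # solve A X = I_3 column by column
assert np.allclose(A @ C, np.eye(3))

# Left inverse: A is 5 x 3 with full column rank.
A = rng.standard_normal((5, 3))
B = np.linalg.inv(A.T @ A) @ A.T                   # one left inverse of A
assert np.allclose(B @ A, np.eye(3))
```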


    Exercise 2.33. Find the null space NA and the range RA of the matrix

A = \begin{bmatrix} ~ & ~ & ~ & ~ \\ 5 & 2 & 0 & 4 \end{bmatrix}  acting on R^4

    and check that the principle of conservation of dimension holds.

    2.8. Square matrices

Theorem 2.14. Let A ∈ F^{p×p}. Then the following statements are equivalent:

(1) A is left invertible.
(2) A is right invertible.
(3) N_A = {0}.
(4) R_A = F^p.

Proof. This is an immediate corollary of Theorem 2.13.  □

    Remark 2.15. The equivalence of (3) and (4) in Theorem 2.14 is a special case of the Fredholm alternative, which, in its most provocative form, states that if the solution to the equation Ax = b is unique, then it exists, or to put it better:

    If A E IFPxp and the equation Ax = b has at most one solution, then it has exactly one.

Lemma 2.16. If A ∈ F^{p×p}, B ∈ F^{p×p} and AB is invertible, then both A and B are invertible.

Proof. Clearly, R_{AB} ⊆ R_A and N_{AB} ⊇ N_B. Therefore,

p = rank AB ≤ rank A ≤ p  and  0 = dim N_{AB} ≥ dim N_B ≥ 0.

The rest is immediate from Theorem 2.14.  □

Lemma 2.17. If V ∈ F^{p×q}, then

(2.25)    N_{V^HV} = N_V  and  rank V^HV = rank V.

Proof. It is easily seen that N_V ⊆ N_{V^HV}, since Vx = 0 clearly implies that V^HVx = 0. On the other hand, if V^HVx = 0, then x^HV^HVx = 0 and hence the vector y = Vx, with entries y_1, ..., y_p, is subject to the constraint

0 = y^Hy = \sum_{j=1}^{p} |y_j|^2.


Therefore, V^HVx = 0 ⟹ y = Vx = 0. This completes the proof of the first assertion in (2.25). The second then follows easily from the principle of conservation of dimension, since V^HV and V both have q columns.  □
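A quick numerical sanity check of (2.25), with a complex V whose columns are deliberately made dependent (the particular matrix is just an illustration, not from the text):

```python
import numpy as np
from scipy.linalg import null_space

V = np.array([[1.0 + 1j, 2.0 + 2j],
              [0.0 - 1j, 0.0 - 2j],
              [3.0 + 0j, 6.0 + 0j]])          # second column = 2 * first column
VHV = V.conj().T @ V
assert np.linalg.matrix_rank(VHV) == np.linalg.matrix_rank(V) == 1
# The null spaces coincide as well: both are spanned by (2, -1)^T.
assert np.allclose(V @ null_space(VHV), 0.0)
assert np.allclose(VHV @ null_space(V), 0.0)
```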

Exercise 2.34. Show that if A ∈ F^{p×q} and B ∈ F^{q×p}, then N_{AB} = {0} if and only if N_A ∩ R_B = {0} and N_B = {0}.

Exercise 2.35. Find a p × q matrix A and a vector b ∈ R^p such that N_A = {0} and yet the equation Ax = b has no solutions.

Exercise 2.36. Let B ∈ F^{n×p}, A ∈ F^{p×q} and let {u_1, ..., u_k} be a basis for R_A. Show that {Bu_1, ..., Bu_k} is a basis for R_{BA} if and only if R_A ∩ N_B = {0}.

Exercise 2.37. Find a basis for R_A and N_A if

A = \begin{bmatrix} ~ & ~ & 2 & ~ & ~ \\ 1 & 3 & 1 & 8 & 2 \\ 1 & 6 & 11 & 5 & 9 \end{bmatrix}.

Exercise 2.38. Let B ∈ F^{n×p}, A ∈ F^{p×q} and let {u_1, ..., u_k} be a basis for R_A. Show that:

(a) span{Bu_1, ..., Bu_k} = R_{BA}.
(b) If B is left invertible, then {Bu_1, ..., Bu_k} is a basis for R_{BA}.

Exercise 2.39. Let A ∈ F^{4×5}, let v_1, v_2, v_3 be a basis for R_A and let V = [v_1 \; v_2 \; v_3]. Show that V^HV is invertible, that B = V(V^HV)^{-1}V^H is not left invertible and yet R_B = R_{BA}.

Exercise 2.40. Let u_1, u_2, u_3 be linearly independent vectors in a vector space U over F and let u_4 = u_1 + 2u_2 + u_3.

(a) Show that the vectors u_1, u_2, u_4 are linearly independent and that span{u_1, u_2, u_3} = span{u_1, u_2, u_4}.
(b) Express the vector 7u_1 + 13u_2 + 5u_3 as a linear combination of the vectors u_1, u_2, u_4. [Note that the coefficients of all three vectors change.]

Exercise 2.41. For which values of x is the matrix

\begin{bmatrix} ~ & 3 & 2 \\ ~ & 4 & 1 \\ ~ & 4 & x \end{bmatrix}

invertible?

Exercise 2.42. Show that the matrix A = [~1 3: 2~] is invertible and find its inverse by solving the system of equations A[x_1 \; x_2 \; x_3] = I_3 column by column.


Exercise 2.43. Show that if A ∈ C^{p×p} is invertible, B ∈ C^{p×q}, C ∈ C^{q×p}, D ∈ C^{q×q} and

E = \begin{bmatrix} A & B \\ C & D \end{bmatrix},

then dim R_E = dim R_A + dim R_{(D - CA^{-1}B)}.

Exercise 2.44. Show that if, in the setting of the previous exercise, D is invertible, then rank E = rank D + rank(A - BD^{-1}C).

Exercise 2.45. Use the method of Gaussian elimination to solve the equation Ax = b when

A = \begin{bmatrix} ~ & ~ & ~ & ~ \\ ~ & ~ & ~ & ~ \\ 3 & 0 & 1 & 2 \end{bmatrix}  and  b = \begin{bmatrix} ~ \\ ~ \\ 1 \end{bmatrix},

if possible, and find a basis for R_A and a basis for N_A.

Exercise 2.46. Use the method of Gaussian elimination to solve the equation Ax = b when A = [~ ~ ~] and b = [~], if possible, and find a basis for R_A and a basis for N_A.

Exercise 2.47. Use the method of Gaussian elimination to solve the equation Ax = b when A = [~ ~ ~] and b = [~], if possible, and find a basis for R_A and a basis for N_A.

Exercise 2.48. Find lower triangular matrices with ones on the diagonal E_1, E_2, ... and permutation matrices P_1, P_2, ... such that

E_kP_kE_{k-1}P_{k-1} \cdots E_1P_1A

is in upper echelon form for any two of the three preceding exercises.

Exercise 2.49. Use Gaussian elimination to find at least two right inverses to the matrix A given in Exercise 2.45. [HINT: Try to solve the equation

A \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \end{bmatrix} = I_3

column by column.]

Exercise 2.50. Use Gaussian elimination to find at least two left inverses to the matrix A given in Exercise 2.46. [HINT: Find right inverses to A^T.]

Chapter 3

Additional applications of Gaussian elimination

    I was working on the proof of one of my poems all morning, and took out a comma. In the afternoon I put it back again.

    Oscar Wilde

This chapter is devoted to a number of applications of Gaussian elimination, both theoretical and computational. There is some overlap with conclusions reached in the preceding chapter, but the methods of obtaining them are usually different.

    3.1. Gaussian elimination redux

Recall that the method of Gaussian elimination leads to the following conclusion:

Theorem 3.1. Let A ∈ F^{p×q} be a nonzero matrix. Then there exists a set of lower triangular p × p matrices E_1, ..., E_k with ones on the diagonal and a set of p × p permutation matrices P_1, ..., P_k such that

(3.1)    E_kP_k \cdots E_1P_1A

is in upper echelon form. Moreover, in this formula, P_j acts only (if at all) on rows j, ..., p and E_j - I_p has nonzero entries in at most the j'th column.

The extra information on the structure of the permutation matrices may seem tedious, but, as we shall see shortly, it has significant implications: it enables us to slide all the permutations to the right in formula (3.1) without changing the form of the matrices E_1, ..., E_k.



Theorem 3.2. Let A ∈ F^{p×q} be any nonzero matrix. Then there exists a lower triangular p × p matrix E with ones on the diagonal and a p × p permutation matrix P such that

EPA = U

is in upper echelon form.

Discussion. To understand where this theorem comes from, suppose first that A is a nonzero 4 × 5 matrix and let e_1, ..., e_4 denote the columns of I_4. Then there exists a choice of permutation matrices P_1, P_2, P_3 and lower triangular matrices

E_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ a & 1 & 0 & 0 \\ b & 0 & 1 & 0 \\ c & 0 & 0 & 1 \end{bmatrix} = I_4 + u_1e_1^T  with  u_1 = \begin{bmatrix} 0 \\ a \\ b \\ c \end{bmatrix},

E_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & d & 1 & 0 \\ 0 & e & 0 & 1 \end{bmatrix} = I_4 + u_2e_2^T  with  u_2 = \begin{bmatrix} 0 \\ 0 \\ d \\ e \end{bmatrix},

E_3 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & f & 1 \end{bmatrix} = I_4 + u_3e_3^T  with  u_3 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ f \end{bmatrix}

(for suitable scalars a, ..., f ∈ F), such that

(3.2)    E_3P_3E_2P_2E_1P_1A

is in upper echelon form. In fact, since P_2 is chosen so that it interchanges the second row of E_1P_1A with its third or fourth row, if necessary, and P_3 is chosen so that it interchanges the third row of E_2P_2E_1P_1A with its fourth row, if necessary, these two permutation matrices have a special form:

P_2 = \begin{bmatrix} 1 & 0 \\ 0 & \Pi_1 \end{bmatrix},  where \Pi_1 is a 3 × 3 permutation matrix,

P_3 = \begin{bmatrix} I_2 & 0 \\ 0 & \Pi_2 \end{bmatrix},  where \Pi_2 is a 2 × 2 permutation matrix.

This exhibits the pattern, which underlies the fact that

E_3P_3E_2P_2E_1P_1 = E_3E_2'E_1''P_3P_2P_1,


where E_2' and E_1'' denote matrices of the same form as E_2 and E_1, respectively. Thus, for example, since e_2^TP_3 = e_2^T and v_2 = P_3u_2 is a vector of the same form as u_2, it follows that

P_3E_2 = P_3(I_4 + u_2e_2^T) = P_3 + v_2e_2^T = P_3 + v_2e_2^TP_3 = E_2'P_3,

where

E_2' = I_4 + v_2e_2^T

is a matrix of the same form as E_2. In a similar vein,

P_3P_2E_1 = P_3E_1'P_2 = E_1''P_3P_2,

and consequently,

E_3P_3E_2P_2E_1P_1 = E_3E_2'E_1''P_3P_2P_1 = EP,  with  E = E_3E_2'E_1''  and  P = P_3P_2P_1.

Much the same argument works in the general setting of Theorem 3.2. You have only to exploit the fact that Gaussian elimination corresponds to multiplication on the left by E_kP_k \cdots E_1P_1, where

E_j = I_p + \begin{bmatrix} 0 \\ b_j \end{bmatrix} e_j^T  with  0 ∈ F^j  and  b_j ∈ F^{p-j},

and that the p × p permutation matrix P_i may be written in block form as

P_i = \begin{bmatrix} I_{i-1} & 0 \\ 0 & \Pi_{i-1} \end{bmatrix},

where \Pi_{i-1} is a (p - i + 1) × (p - i + 1) permutation matrix. Then, letting c_j denote the vector that is obtained from b_j by permuting its last p - i + 1 entries in accordance with \Pi_{i-1},

P_iE_j = P_i\left(I_p + \begin{bmatrix} 0 \\ b_j \end{bmatrix}e_j^T\right) = P_i + \begin{bmatrix} 0 \\ c_j \end{bmatrix}e_j^T = \left(I_p + \begin{bmatrix} 0 \\ c_j \end{bmatrix}e_j^T\right)P_i = E_j'P_i  for i > j,

since

e_j^TP_i = e_j^T  for i > j.
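In floating-point practice the factorization EPA = U of Theorem 3.2 is usually packaged as PA = LU with L = E^{-1}. A brief SciPy sketch (the matrix of Example 2.1 is reused purely for illustration; scipy.linalg.lu returns factors with A = P_mLU, so P = P_m^T):

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[0., 2., 3., 1.],
              [1., 5., 3., 4.],
              [2., 6., 3., 2.]])

Pm, L, U = lu(A)              # A = Pm @ L @ U, i.e. Pm.T @ A = L @ U
P = Pm.T                      # the permutation of Theorem 3.2
E = np.linalg.inv(L)          # lower triangular with ones on the diagonal
assert np.allclose(E @ P @ A, U)
assert np.allclose(np.diag(L), 1.0)
```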

    Remark 3.3. Theorem 3.2 has interesting implications. However, we wish to emphasize that when Gaussian elimination is used in practice to study the equation Ax = b, it is not necessary to go through all this theoretical


    analysis. It suffices to carry out all the row operations on the augmented matrix

\begin{bmatrix} a_{11} & \cdots & a_{1q} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{p1} & \cdots & a_{pq} & b_p \end{bmatrix}

    and then to solve for the "pivot variables", just as in the examples. But, do check that your answer works.

In what follows, we shall reap some extra dividends from the representation formula

(3.3)    EPA = U

(that is valid for both real and complex matrices) by exploiting the special structure of the upper echelon matrix U.

Exercise 3.1. Show that if A ∈ F^{n×n} is an invertible matrix, then there exists a permutation matrix P such that

(3.4)    PA = LDU,

where L is lower triangular with ones on the diagonal, U is upper triangular with ones on the diagonal and D is a diagonal matrix.

Exercise 3.2. Show that if L_1D_1U_1 = L_2D_2U_2, where L_j, D_j and U_j are n × n matrices of the form exhibited in Exercise 3.1, then L_1 = L_2, D_1 = D_2 and U_1 = U_2. [HINT: Consider L_2^{-1}L_1D_1 = D_2U_2U_1^{-1}.]

Exercise 3.3. Show that there exists a 3 × 3 permutation matrix P and a lower triangular matrix

B = [~ ~ ~] such that [~ ~ ~][~ ~ ~] = BP if and only if a = 0.

    Exercise 3.4. Find a permutation matrix P such that P A = LU, where L

is a lower triangular matrix with ones on the diagonal and U is an upper triangular matrix, when A = [~ ~ ~].

3.2. Properties of BA and AC

    In this section a number of basic properties of the product of two matrices in terms of the properties of their factors are reviewed for future use.

Lemma 3.4. Let A ∈ F^{p×q} and let B ∈ F^{p×p} be invertible. Then:


(1) A is left invertible if and only if BA is left invertible.
(2) A is right invertible if and only if BA is right invertible.
(3) N_A = N_{BA}.
(4) BR_A = R_{BA}.
(5) rank BA = rank A.
(6) N_A = {0} ⟺ N_{BA} = {0}.
(7) R_A = F^p ⟺ R_{BA} = F^p.


    Proof. The first assertion follows easily from the o