
A Second Semester of Linear Algebra

S. E. Payne

19 January ’09


Contents

1 Preliminaries
  1.1 Fields
  1.2 Groups
  1.3 Matrix Algebra
  1.4 Linear Equations Solved with Matrix Algebra
  1.5 Exercises

2 Vector Spaces
  2.1 Definition of Vector Space
      2.1.1 Prototypical Example
      2.1.2 A Second Example
  2.2 Basic Properties of Vector Spaces
  2.3 Subspaces
  2.4 Sums and Direct Sums of Subspaces
  2.5 Exercises

3 Dimensional Vector Spaces
  3.1 The Span of a List
  3.2 Linear Independence and the Concept of Basis
  3.3 Exercises

4 Linear Transformations
  4.1 Definitions and Examples
  4.2 Kernels and Images
  4.3 Rank and Nullity Applied to Matrices
  4.4 Projections
  4.5 Bases and Coordinate Matrices
  4.6 Matrices as Linear Transformations
  4.7 Change of Basis
  4.8 Exercises

5 Polynomials
  5.1 Algebras
  5.2 The Algebra of Polynomials
  5.3 Lagrange Interpolation
  5.4 Polynomial Ideals
  5.5 Exercises

6 Determinants
  6.1 Determinant Functions
      6.1.3 n-Linear Alternating Functions
      6.1.5 A Determinant Function - The Laplace Expansion
  6.2 Permutations & Uniqueness of Determinants
      6.2.1 A Formula for the Determinant
  6.3 Additional Properties of Determinants
      6.3.1 If A is a unit in Mn(K), then det(A) is a unit in K
      6.3.2 Triangular Matrices
      6.3.3 Transposes
      6.3.4 Elementary Row Operations
      6.3.5 Triangular Block Form
      6.3.6 The Classical Adjoint
      6.3.8 Characteristic Polynomial of a Linear Map
      6.3.10 The Cayley-Hamilton Theorem
      6.3.11 The Companion Matrix of a Polynomial
      6.3.13 The Cayley-Hamilton Theorem: A Second Proof
      6.3.14 Cramer's Rule
      6.3.15 Comparing ST with TS
  6.4 Deeper Results with Some Applications∗
      6.4.1 Block Matrices whose Blocks Commute∗
      6.4.2 Tensor Products of Matrices∗
      6.4.3 The Cauchy-Binet Theorem - A Special Version∗
      6.4.5 The Matrix-Tree Theorem∗
      6.4.10 The Cauchy-Binet Theorem - A General Version∗
      6.4.12 The General Laplace Expansion∗
      6.4.13 Determinants, Ranks and Linear Equations∗
  6.5 Exercises

7 Operators and Invariant Subspaces
  7.1 Eigenvalues and Eigenvectors
  7.2 Upper-Triangular Matrices
  7.3 Invariant Subspaces of Real Vector Spaces
  7.4 Two Commuting Linear Operators
  7.5 Commuting Families of Operators∗
  7.6 The Fundamental Theorem of Algebra∗
  7.7 Exercises

8 Inner Product Spaces
  8.1 Inner Products
  8.2 Orthonormal Bases
  8.3 Orthogonal Projection and Minimization
  8.4 Linear Functionals and Adjoints
  8.5 Exercises

9 Operators on Inner Product Spaces
  9.1 Self-Adjoint Operators
  9.2 Normal Operators
  9.3 Decomposition of Real Normal Operators
  9.4 Positive Operators
  9.5 Isometries
  9.6 The Polar Decomposition
  9.7 The Singular-Value Decomposition
      9.7.3 Two Examples
  9.8 Pseudoinverses and Least Squares∗
  9.9 Norms, Distance and More on Least Squares∗
  9.10 The Rayleigh Principle∗
  9.11 Exercises

10 Decomposition WRT a Linear Operator
  10.1 Powers of Operators
  10.2 The Algebraic Multiplicity of an Eigenvalue
  10.3 Elementary Operations
  10.4 Transforming Nilpotent Matrices
  10.5 An Alternative Approach to the Jordan Form
  10.6 A "Jordan Form" for Real Matrices
  10.7 Exercises

11 Matrix Functions∗
  11.1 Operator Norms and Matrix Norms∗
  11.2 Polynomials in an Elementary Jordan Matrix∗
  11.3 Scalar Functions of a Matrix∗
  11.4 Scalar Functions as Polynomials∗
  11.5 Power Series∗
  11.6 Commuting Matrices∗
  11.7 A Matrix Differential Equation∗
  11.8 Exercises

12 Infinite Dimensional Vector Spaces∗
  12.1 Partially Ordered Sets & Zorn's Lemma∗
  12.2 Bases for Vector Spaces∗
  12.3 A Theorem of Philip Hall∗
  12.4 A Theorem of Marshall Hall, Jr.∗
  12.5 Exercises∗


Chapter 1

Preliminaries

Preface to the Student

This book is intended to be used as a text for a second semester of linear algebra either at the senior or first-year-graduate level. It is written for you under the assumption that you already have successfully completed a first course in linear algebra and a first course in abstract algebra. The first short chapter is a very quick review of the basic material with which you are supposed to be familiar. If this material looks new, this text is probably not written for you. On the other hand, if you made it into graduate school, you must have already acquired some background in modern algebra. Perhaps all you need is to spend a little time with your undergraduate texts reviewing the most basic facts about equivalence relations, groups, matrix computations, row reduction techniques, and the basic concepts of linear independence, span, and basis in the context of R^n.

On the other hand, some of you will be ready for a more advanced approach to the material covered here than we can justify making a part of the course. For this reason I have included some starred sections that may be skipped without disturbing the general flow of ideas, but that might be of interest to some students. If material from a starred section is ever cited in a later section, that section will necessarily also be starred.

For the material we do cover in detail, we hope that you will find our presentation to be thorough and our proofs to be complete and clear. However, when we indicate that you should verify something for yourself, that means you should use paper and pen and write out the appropriate steps in detail.

One major difference between your undergraduate linear algebra text and this one is that we discuss abstract vector spaces over arbitrary fields instead of restricting our discussion to the standard space of n-tuples of real numbers. A second difference is that the emphasis is on linear transformations from one vector space over the field F to a second one, rather than on matrices over F. However, a great deal of the general theory can be effortlessly translated into results about matrices.

1.1 Fields

The fields of most concern in this text are the complex numbers C, the real numbers R, the rational numbers Q, and the finite Galois fields GF(q), where q is a prime power. In fact, it is possible to work through the entire book and use only the fields R and C. And if finite fields are being considered, most of the time it is possible to use just those finite fields with a prime number of elements, i.e., the fields Z_p ≅ Z/pZ, where p is a prime integer, with the algebraic operations just being addition and multiplication modulo p. For the purpose of reading this book it is sufficient to be able to work with the fields just mentioned. However, we urge you to pick up your undergraduate abstract algebra text and review what a field is. For most purposes the symbol F denotes an arbitrary field except where inner products and/or norms are involved. In those cases F denotes either R or C.
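For readers who like to experiment, the arithmetic of Z_p is easy to play with on a computer. A minimal Python sketch (not part of the text; it assumes p is prime):

    # Arithmetic in the field Z_p = Z/pZ (illustration; p must be prime).
    p = 7
    a, b = 5, 6
    print((a + b) % p)        # addition modulo p   -> 4
    print((a * b) % p)        # multiplication mod p -> 2
    # A multiplicative inverse via Fermat's little theorem: a^(p-2) mod p.
    print(pow(a, p - 2, p))   # -> 3, since 5 * 3 = 15 = 1 (mod 7)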

Kronecker delta: When the underlying field F is understood, the symbol δ_ij (called the Kronecker delta) denotes the element 1 ∈ F if i = j and the element 0 ∈ F if i ≠ j. Occasionally it means 0 or 1 in some other structure, but the context should make that clear.

If you are not really familiar with the field of complex numbers, you should spend a little time getting acquainted. A complex number α = a + bi, where a, b ∈ R and i² = −1, has a conjugate ᾱ = a − bi, and α · ᾱ = a² + b² = |α|². It is easily checked that the conjugate of α · β is ᾱ · β̄. Note that |α| = √(a² + b²) = 0 if and only if α = 0. You should show how to compute the multiplicative inverse of any nonzero complex number.

We define the square-root symbol √ as usual: For 0 ≤ x ∈ R, put √x = |y| where y is a real number such that y² = x.
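As a quick sanity check of the identity α · ᾱ = |α|², the multiplicative inverse of a nonzero α is ᾱ/|α|². A small Python sketch (an illustration, not part of the text):

    # Numerical check of alpha^{-1} = conj(alpha) / |alpha|^2 for nonzero alpha.
    alpha = 3 + 4j
    inv = alpha.conjugate() / abs(alpha) ** 2
    print(inv)          # (0.12-0.16j)
    print(alpha * inv)  # (1+0j), as expected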

The following two lemmas are often useful.

Lemma 1.1.1. Every complex number has a square root.


Proof. Let α = a + bi be any complex number. With the definition of √ given above, and with γ = √(a² + b²) ≥ |a|, we find that

( √((γ + a)/2) ± i √((γ − a)/2) )² = a ± |b|i.

Now just pick the sign ± so that ±|b| = b.
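A numerical check of the square-root formula in the proof, sketched in Python (illustration only; the particular numbers are arbitrary):

    import math

    # Check the formula from Lemma 1.1.1 on one example: alpha = 3 - 4i.
    a, b = 3.0, -4.0
    gamma = math.sqrt(a * a + b * b)       # gamma = |alpha| >= |a|
    re = math.sqrt((gamma + a) / 2)
    im = math.sqrt((gamma - a) / 2)
    sign = 1 if b >= 0 else -1             # choose the sign so that sign*|b| = b
    root = complex(re, sign * im)
    print(root ** 2)                       # (3-4j), up to rounding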

Lemma 1.1.2. Every polynomial of odd degree with real coefficients has a (real) zero.

Proof. It suffices to prove that a monic polynomial

P(x) = x^n + a_1 x^(n−1) + · · · + a_n,

with a_1, . . . , a_n ∈ R, some a_i ≠ 0, and n odd, has a zero. Put a = |a_1| + |a_2| + · · · + |a_n| + 1 > 1, and ε = ±1. Then

|a_1(εa)^(n−1) + · · · + a_(n−1)(εa) + a_n| ≤ |a_1| a^(n−1) + · · · + |a_(n−1)| a + |a_n|
  ≤ (|a_1| + |a_2| + · · · + |a_n|) a^(n−1) = (a − 1) a^(n−1) < a^n.

It readily follows that P(a) > 0 and P(−a) < 0. (Check this out for yourself!) Hence by the Intermediate Value Theorem (from calculus) there is a λ ∈ (−a, a) such that P(λ) = 0.
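The bound used in the proof can be watched in action numerically. A Python sketch (illustration only, with an arbitrarily chosen cubic):

    # For P(x) = x^3 - 2x + 5 the proof's bound is a = |a_1|+|a_2|+|a_3|+1 = 8,
    # and indeed P(-a) < 0 < P(a), so the IVT gives a real zero in (-a, a).
    def P(x):
        return x**3 - 2*x + 5

    a = abs(0) + abs(-2) + abs(5) + 1
    print(P(-a), P(a))            # -491  501

    lo, hi = -a, a                # simple bisection on the sign change
    for _ in range(60):
        mid = (lo + hi) / 2
        if P(mid) <= 0:
            lo = mid
        else:
            hi = mid
    print(lo)                     # approximately -2.0945, a real zero of P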

1.2 Groups

Let G be an arbitrary set and let ◦ : G × G → G be a binary operation on G. Usually we denote the image of (g1, g2) ∈ G × G under the map ◦ by g1 ◦ g2. Then you should know what it means for (G, ◦) to be a group. If this is the case, G is an abelian group provided g1 ◦ g2 = g2 ◦ g1 for all g1, g2 ∈ G. Our primary example of a group will be a vector space whose elements (called vectors) form an abelian group under vector addition. It is also helpful if you remember how to construct the quotient group G/N, where N is a normal subgroup of G. However, this latter concept will be introduced in detail in the special case where it is needed.


1.3 Matrix Algebra

An m × n matrix A over F is an m by n array of elements from the field F. We may think of A as an ordered list of m row vectors from F^n or equally well as an ordered list of n column vectors from F^m. The element in row i and column j is usually denoted A_ij. The symbol M_{m,n}(F) denotes the set of all m × n matrices over F. It is readily made into a vector space over F. For A, B ∈ M_{m,n}(F) define A + B by

(A + B)_ij = A_ij + B_ij.

Similarly, define scalar multiplication by

(aA)_ij = a A_ij.

Just to practice rehearsing the axioms for a vector space, you should show that M_{m,n}(F) really is a vector space over F. (See Section 2.1.)

If A and B are m × n and n × p over F, respectively, then the product AB is an m × p matrix over F defined by

(AB)_ij = ∑_{k=1}^{n} A_ik B_kj.
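The defining formula is easy to compute directly. A Python sketch (not part of the text; assumes the numpy library) comparing the definition with numpy's built-in product:

    import numpy as np

    # (AB)_{ij} = sum_k A_{ik} B_{kj}, computed from the definition.
    A = np.array([[1, 2, 0],
                  [3, -1, 4]])       # 2 x 3
    B = np.array([[2, 1],
                  [0, 1],
                  [5, -2]])          # 3 x 2
    m, n = A.shape
    _, p = B.shape
    C = np.zeros((m, p), dtype=int)
    for i in range(m):
        for j in range(p):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    print(np.array_equal(C, A @ B))  # True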

Lemma 1.3.1. Matrix multiplication, when defined, is associative.

Proof. (Sketch) If A, B, C are m × n, n × p and p × q matrices over F, respectively, then

((AB)C)_ij = ∑_{l=1}^{p} (AB)_il C_lj = ∑_{l=1}^{p} ( ∑_{k=1}^{n} A_ik B_kl ) C_lj = ∑_{1≤k≤n; 1≤l≤p} A_ik B_kl C_lj = (A(BC))_ij.

The following observations are not especially deep, but they come in so handy that you should think about them until they are second nature to you.

Obs. 1.3.2. The ith row of AB is the ith row of A times the matrix B.


Obs. 1.3.3. The jth column of AB is the matrix A times the jth column of B.

If A is m × n, then the n × m matrix whose (i, j) entry is the (j, i) entry of A is called the transpose of A and is denoted A^T.

Obs. 1.3.4. If A is m × n and B is n × p, then (AB)^T = B^T A^T.

Obs. 1.3.5. If A is m × n with columns C_1, . . . , C_n and X = (x_1, . . . , x_n)^T is n × 1, then AX is the linear combination of the columns of A given by AX = ∑_{j=1}^{n} x_j C_j.

Obs. 1.3.6. If A is m × n with rows α_1, . . . , α_m and X = (x_1, . . . , x_m), then XA is the linear combination of the rows of A given by XA = ∑_{i=1}^{m} x_i α_i.

At this point you should review block multiplication of partitioned matrices. The following special case is sometimes helpful.

Obs. 1.3.7. If A is m × n and B is n × p, view A as partitioned into n column vectors in F^m and view B as partitioned into n row vectors in F^p. Then using block multiplication we can view AB as

AB = ∑_{j=1}^{n} col_j(A) · row_j(B).

Note that col_j(A) · row_j(B) is an m × p matrix with rank at most 1.
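Obs. 1.3.7 can be checked numerically. A Python sketch (assumes numpy; the matrices are randomly chosen for illustration):

    import numpy as np

    # AB equals the sum of the rank-one products col_j(A) * row_j(B).
    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, size=(4, 3))
    B = rng.integers(-3, 4, size=(3, 5))
    S = sum(np.outer(A[:, j], B[j, :]) for j in range(A.shape[1]))
    print(np.array_equal(S, A @ B))   # True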

1.4 Linear Equations Solved with Matrix Algebra

For each i, 1 ≤ i ≤ m, let

∑_{j=1}^{n} a_ij x_j = b_i

be a linear equation in the indeterminates x_1, . . . , x_n, with the coefficients a_ij and b_i all being real numbers. One of the first things you learned to do in your undergraduate linear algebra course was to replace this system of linear equations with a single matrix equation of the form


A~x = ~b, where A = (a_ij) is m × n with m rows and n columns, and ~x = (x_1, . . . , x_n)^T, ~b = (b_1, . . . , b_m)^T.

You then augmented the matrix A with the column ~b to obtain

A′ =
 [ a_11 · · · a_1n  b_1 ]
 [ a_21 · · · a_2n  b_2 ]
 [   ·          ·    ·  ]
 [ a_m1 · · · a_mn  b_m ].

At this point you performed elementary row operations on the matrix A′ so as to replace the submatrix A with a row-reduced echelon matrix R and at the same time replace ~b with a probably different column vector ~b′. You then used this matrix (R ~b′) to read off all sorts of information about the original matrix A and the system of linear equations. The matrix R has r nonzero rows for some r, 0 ≤ r ≤ m. The leftmost nonzero entry in each nonzero row of R is a 1, and it is the only nonzero entry in its column. Then the matrix (R ~b′) is used to solve the original system of equations by writing each of the "leading variables" as a linear combination of the other (free) variables. The r nonzero rows of R form a basis for the row space of A, the vector subspace of R^n spanned by the rows of A. The columns of A in the same positions as the leading 1's of R form a basis for the column space of A, i.e., the subspace of R^m spanned by the columns of A. There are n − r free variables. Let each of these variables take a turn being equal to 1 while the other free variables are all equal to 0, and solve for the leading variables using the matrix equation R~x = ~0. This gives a basis of the (right) null space of A, i.e., a basis of the space of solutions to the system of homogeneous equations obtained by replacing ~b with ~0. In particular we note that r is the dimension of both the row space of A and of the column space of A. We call r the rank of A. The dimension of the right null space of A is n − r. Replacing A with its transpose A^T, so that the left null space of A becomes the right null space of A^T, shows that the left null space of A has dimension m − r.
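If you want a machine to carry out the row reduction, a Python sketch using the sympy library (an outside tool, not part of the text) reads off the same data:

    from sympy import Matrix

    # Row reduction by machine: rank, pivot columns, and null space basis.
    A = Matrix([[1, 2, 1, 3],
                [2, 4, 0, 2],
                [3, 6, 1, 5]])
    R, pivots = A.rref()
    print(R)               # the row-reduced echelon form R
    print(pivots)          # positions of the leading 1's (column space basis in A)
    print(A.rank())        # r
    print(A.nullspace())   # n - r basis vectors of the right null space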

A good reference for the basic material is the following text: David C. Lay, Linear Algebra and Its Applications, 3rd Edition, Addison-Wesley, 2003. An excellent reference at a somewhat higher level is R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge Univ. Press, 1985.


1.5 Exercises

1. Consider the matrix

A =
 [  1  1 −1 ]
 [ −1  1 −3 ]
 [  2  1  0 ]
 [  1  2 −3 ].

(a) Find the rank of A.

(b) Compute a basis for the column space of A, the row space of A, the null space of A, and the null space of A^T.

(c) Give the orthogonality relations among the four subspaces of part (b).

(d) Find all solutions x = (x_1, x_2, x_3)^T to Ax = (1, −3, 3, 0)^T, and to Ax = (3, −1, 4, 4)^T.
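If you want to check your hand computations for this exercise, a Python sketch using sympy (not part of the text) will produce the rank, the row-reduced form, and the two null spaces:

    from sympy import Matrix

    # A quick check of the data asked for in parts (a) and (b).
    A = Matrix([[1, 1, -1], [-1, 1, -3], [2, 1, 0], [1, 2, -3]])
    print(A.rank())
    print(A.rref())            # read off bases for the row and column spaces
    print(A.nullspace())       # null space of A
    print(A.T.nullspace())     # null space of A^T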

2. Block multiplication of matrices. Let A be an m × n matrix over F whose elements have been partitioned into blocks. Also, let B be an n × p matrix over F whose elements are partitioned into blocks so that the number of blocks in each row of A is the same as the number of blocks in each column of B. Moreover, suppose that the block A_ij in the i-th row and j-th column of blocks of A and the block B_jk in the j-th row and k-th column of blocks of B are compatible for multiplication, i.e., A_ij · B_jk should be well-defined for all appropriate indices i, j, k. Then the matrix product AB (which is well-defined) may be computed by block multiplication. For example, if A and B are partitioned in this way, then the (i, j)-th block of AB is given by the natural formula


(AB)_ij = [ (A_kl) (B_kl) ]_ij = ∑_{k=1}^{r} A_ik B_kj,

where (A_kl) denotes the s × r array of blocks

 [ A_11 A_12 · · · A_1r ]
 [ A_21 A_22 · · · A_2r ]
 [   ·    ·          ·  ]
 [ A_s1 A_s2 · · · A_sr ]

and (B_kl) denotes the r × t array of blocks

 [ B_11 B_12 · · · B_1t ]
 [ B_21 B_22 · · · B_2t ]
 [   ·    ·          ·  ]
 [ B_r1 B_r2 · · · B_rt ].
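A numerical illustration of block multiplication, sketched in Python (assumes numpy; the partition sizes are chosen arbitrarily):

    import numpy as np

    # Split A (3x4) and B (4x5) into 2x2 grids of blocks and multiply blockwise.
    A = np.arange(1, 13).reshape(3, 4)
    B = np.arange(1, 21).reshape(4, 5)
    A11, A12, A21, A22 = A[:2, :2], A[:2, 2:], A[2:, :2], A[2:, 2:]
    B11, B12, B21, B22 = B[:2, :3], B[:2, 3:], B[2:, :3], B[2:, 3:]
    top = np.hstack([A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22])
    bot = np.hstack([A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22])
    print(np.array_equal(np.vstack([top, bot]), A @ B))   # True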

3. Suppose the matrix A (which might actually be huge!) is partitioned into four blocks in such a way that it is block upper triangular and the two diagonal blocks are square and invertible.

A =
 [ A_11  A_12 ]
 [  0    A_22 ].

Knowing that A_11 and A_22 are both invertible, compute A^(−1). Then illustrate your solution by using block computations to compute the inverse of the following matrix.

A =
 [ 1  1  1 −1  2 ]
 [ 3  2  0  1 −1 ]
 [ 0  0  1  0  1 ]
 [ 0  0  1  1  0 ]
 [ 0  0  0  1  0 ]
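After you have computed the inverse by block computations, one way to check the answer is to compare it with a numerical inverse. A Python sketch (assumes numpy):

    import numpy as np

    # Numerical inverse of the 5x5 matrix above, for checking a hand computation.
    A = np.array([[1, 1, 1, -1, 2],
                  [3, 2, 0, 1, -1],
                  [0, 0, 1, 0, 1],
                  [0, 0, 1, 1, 0],
                  [0, 0, 0, 1, 0]], dtype=float)
    print(np.round(np.linalg.inv(A), 3))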


Chapter 2

Vector Spaces

Linear algebra is primarily the study of linear maps on finite-dimensional vector spaces. In this chapter we define the concept of vector space and discuss its elementary properties.

2.1 Definition of Vector Space

A vector space over the field F is a set V together with a binary operation "+" on V such that (V, +) is an abelian group, along with a scalar multiplication on V (i.e., a map F × V → V) such that the following properties hold:

1. For each a ∈ F and each v ∈ V, av is a unique element of V, with 1v = v for all v ∈ V. Here 1 denotes the multiplicative identity of F.

2. Scalar multiplication distributes over vector addition: a(u + v) = (au) + (av), which is usually written as au + av, for all a ∈ F and all u, v ∈ V.

3. (a + b)u = au + bu, for all a, b ∈ F and all u ∈ V.

4. a(bv) = (ab)v, for all a, b ∈ F and all v ∈ V.

2.1.1 Prototypical Example

Let S be any nonempty set and let F be any field. Put V = F^S = {f : S → F : f is a function}. Then we can make V into a vector space over F as follows. For f, g ∈ V, define the vector sum f + g : S → F by

(f + g)(s) = f(s) + g(s) for all s ∈ S.


Then define a scalar multiplication F × V → V as follows:

(af)(s) = a(f(s)) for all a ∈ F, f ∈ V, s ∈ S.

It is a very easy but worthwhile exercise to show that with this vector addition and this scalar multiplication, V is a vector space over F. It is also interesting to see that this family of examples includes (in some abstract sense) all the examples of vector spaces you might have studied in your first linear algebra course. For example, let F = R and let S = {1, 2, . . . , n}. Then each f : {1, 2, . . . , n} → R is given by the n-tuple (f(1), f(2), . . . , f(n)). So with almost no change in your point of view you can see that this example is essentially just R^n as you knew it before.
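A tiny Python sketch (not part of the text) of this prototypical example, with S = {1, 2, 3}, F = R, and the operations defined pointwise:

    # Vectors are functions S -> R; addition and scalar multiplication are pointwise.
    S = [1, 2, 3]

    def f(s): return s ** 2          # corresponds to the triple (1, 4, 9)
    def g(s): return 2 * s           # corresponds to the triple (2, 4, 6)

    def vec_add(f, g):
        return lambda s: f(s) + g(s)

    def scalar_mult(a, f):
        return lambda s: a * f(s)

    h = vec_add(f, scalar_mult(3, g))
    print([h(s) for s in S])         # [7, 16, 27], i.e. f + 3g as a 3-tuple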

2.1.2 A Second Example

Let F be any field and let x be an indeterminate over F. Then the ring F[x] of all polynomials in x with coefficients in F is a vector space over F. In Chapter 4 we will see how to view F[x] as a subspace of the special case of the preceding example where S = {0, 1, 2, . . .}. However, in this case, any two elements of F[x] can be multiplied to give another element of F[x], and F[x] has the structure of a commutative ring with 1. It is even an integral domain! In fact, it is a linear algebra. See the beginning of Chapter 4 for the definition of a linear algebra, a term that really ought to be defined somewhere in a Linear Algebra course. (Look up any of these words that you are unsure about in your abstract algebra text.) (Note: Our convention is that the zero polynomial has degree equal to −∞.)

If we fix the nonnegative integer n, then

P_n = {f(x) ∈ F[x] : degree(f(x)) ≤ n}

is a vector space with the usual addition of polynomials and scalar multiplication.

2.2 Basic Properties of Vector Spaces

We are careful to distinguish between the zero 0 of the field F and the zero vector ~0 in the vector space V.


Theorem 2.2.1. (Properties of zero) For a ∈ F and v ∈ V we have the following:

av = ~0 if and only if a = 0 ∈ F or v = ~0 ∈ V.

Proof. First suppose that a = 0. Then 0v = (0 + 0)v = 0v + 0v. Now adding the additive inverse of 0v to both sides of this equation yields ~0 = 0v, for all v ∈ V. Similarly, if v = ~0, we have a~0 = a(~0 + ~0) = a~0 + a~0. Now add the additive inverse of a~0 to both sides to obtain ~0 = a~0.

What remains to be proved is that if av = ~0, then either a = 0 or v = ~0. So suppose av = ~0. If a = 0 we are done. So suppose a ≠ 0. In this case a has a multiplicative inverse a^(−1) ∈ F. So v = (a^(−1)a)v = a^(−1)(av) = a^(−1)~0 = ~0 by the preceding paragraph. This completes the proof.

Let −v denote the additive inverse of v in V, and let −1 denote the additive inverse of 1 ∈ F.

Lemma 2.2.2. For all vectors v ∈ V , (−1)v = −v.

Proof. The idea is to show that if (−1)v is added to v, then the result is ~0, the additive identity of V. So, (−1)v + v = (−1)v + 1v = (−1 + 1)v = 0v = ~0 by the preceding result.

2.3 Subspaces

Definition A subset U of V is called a subspace of V provided that with the vector addition and scalar multiplication of V restricted to U, the set U becomes a vector space in its own right.

Theorem 2.3.1. The subset U of V is a subspace of V if and only if the following properties hold:

(i) U 6= ∅.

(ii) If u, v ∈ U , then u+ v ∈ U .

(iii) If a ∈ F and u ∈ U , then au ∈ U .

Proof. Clearly the three properties all hold if U is a subspace. So now suppose the three properties hold. Then U is not empty, so it has some vector u. Then 0u = ~0 ∈ U. For any v ∈ U, (−1)v = −v ∈ U. By these properties and property (ii) of the theorem, (U, +) is a subgroup of (V, +). (Recall this from your abstract algebra course.) It is now easy to see that U, with the addition and scalar multiplication inherited from V, must be a vector space, since all the other properties hold for U automatically because they hold for V.

2.4 Sums and Direct Sums of Subspaces

Definition Let U_1, . . . , U_m be subspaces of V. The sum U_1 + U_2 + · · · + U_m is defined to be

∑_{i=1}^{m} U_i := {u_1 + u_2 + · · · + u_m : u_i ∈ U_i for 1 ≤ i ≤ m}.

You are asked in the exercises to show that the sum of subspaces is a subspace.

Definition The indexed family, or ordered list, (U_1, . . . , U_m) of subspaces is said to be independent provided that if ~0 = u_1 + u_2 + · · · + u_m with u_i ∈ U_i, 1 ≤ i ≤ m, then u_i = ~0 for all i = 1, 2, . . . , m.

Theorem 2.4.1. Let U_i be a subspace of V for 1 ≤ i ≤ m. Each element v ∈ ∑_{i=1}^{m} U_i has a unique expression of the form v = ∑_{i=1}^{m} u_i with u_i ∈ U_i for all i if and only if the list (U_1, . . . , U_m) is independent.

Proof. Suppose the list (U_1, . . . , U_m) is independent. Then by definition ~0 has a unique representation of the desired form (as a sum of zero vectors). Suppose that some v = ∑_{i=1}^{m} u_i = ∑_{i=1}^{m} v_i has two such representations with u_i, v_i ∈ U_i, 1 ≤ i ≤ m. Then ~0 = v − v = ∑_{i=1}^{m} (u_i − v_i) with u_i − v_i ∈ U_i since U_i is a subspace. So u_i = v_i for all i. Conversely, if each element of the sum has a unique representation as an element of the sum, then certainly ~0 does also, implying that the list of subspaces is independent.

Definition If each element of ∑_{i=1}^{m} U_i has a unique representation as an element in the sum, then we say that the sum is the direct sum of the subspaces, and write

∑_{i=1}^{m} U_i = U_1 ⊕ U_2 ⊕ · · · ⊕ U_m = ⊕∑_{i=1}^{m} U_i.


Theorem 2.4.2. Let U_1, . . . , U_m be subspaces of V for which V = ∑_{i=1}^{m} U_i. Then the following are equivalent:

(i) V = ⊕∑_{i=1}^{m} U_i.

(ii) U_j ∩ ∑_{1≤i≤m; i≠j} U_i = {~0} for each j, 1 ≤ j ≤ m.

Proof. Suppose V = ⊕∑_{i=1}^{m} U_i. If u ∈ U_j ∩ ∑_{1≤i≤m; i≠j} U_i, say u = −u_j ∈ U_j and u = ∑_{1≤i≤m; i≠j} u_i, then ∑_{i=1}^{m} u_i = ~0, forcing all the u_i's to equal ~0. This shows that (i) implies (ii). It is similarly easy to see that (ii) implies that ~0 has a unique representation in ∑_{i=1}^{m} U_i.

Note: When m = 2 this says that U_1 + U_2 = U_1 ⊕ U_2 if and only if U_1 ∩ U_2 = {~0}.

2.5 Exercises

1. Determine all possible subspaces of F^2 = {f : {1, 2} → F : f is a function}.

2. Prove that the intersection of any collection of subspaces of V is again a subspace of V. Here V is any vector space over any field F.

3. Define the sum of a countably infinite number of subspaces of V and discuss what it should mean for such a collection of subspaces to be independent.

4. Prove that the set-theoretic union of two subspaces of V is a subspace if and only if one of the subspaces is contained in the other.

5. Prove or disprove: If U_1, U_2, W are subspaces of V for which U_1 + W = U_2 + W, then U_1 = U_2.

6. Prove or disprove: If U_1, U_2, W are subspaces of V for which U_1 ⊕ W = U_2 ⊕ W, then U_1 = U_2.

7. Let U_1, U_2, U_3 be three subspaces of a vector space V over the field F for which U_1 ∩ U_2 = U_1 ∩ U_3 = U_2 ∩ U_3 = {~0}. Prove that

U_1 + U_2 + U_3 = U_1 ⊕ U_2 ⊕ U_3,

or give a counterexample.


Chapter 3

Dimensional Vector Spaces

Our main concern in this course will be finite dimensional vector spaces, a concept that will be introduced in this chapter. However, in a starred section of Chapter 12, and with the help of the appropriate axioms from set theory, we show that every vector space has a well-defined dimension. The key concepts here are: span, linear independence, basis, dimension.

We assume throughout this chapter that V is a vector space over the field F.

3.1 The Span of a List

For each positive integer N consider the ordered set {1, 2, . . . , N}, and let N denote the ordered set {1, 2, . . .} of all natural numbers. Definition: A list of elements of V is a function from some {1, 2, . . . , N} to V or from N to V. Usually such a list is indicated by (v_1, v_2, . . . , v_m) for some positive integer m, or perhaps by (v_1, v_2, . . .) if the list is finite of unknown length or countably infinite. An important aspect of a list of vectors of V is that it is ordered. A second important difference between lists and sets is that elements of lists may be repeated, but in a set repetitions are not allowed. For most of the work in this course the lists we consider will be finite, but it is important to keep an open mind about the infinite case.

Definition: Let L = (v_1, v_2, . . .) be a list of elements of V. We say that v ∈ V is in the span of L provided there are finitely many scalars a_1, a_2, . . . , a_m ∈ F such that v = ∑_{i=1}^{m} a_i v_i. Then the set of all vectors in the span of L is said to be the span of L and is denoted span(v_1, v_2, . . .). By convention we say that the span of the empty list ( ) is the zero space {~0}.

The proof of the following lemma is a routine exercise.

Lemma 3.1.1. The span of any list of vectors of V is a subspace of V .

If span(v_1, . . . , v_m) = V, we say (v_1, . . . , v_m) spans V. A vector space is said to be finite dimensional if some finite list spans V. For example, let F^n denote the vector space of all column vectors with n entries from the field F, and let ~e_i be the column vector (0, . . . , 0, 1, 0, . . . , 0)^T with n − 1 entries equal to 0 ∈ F and a 1 ∈ F in position i. Then F^n is finite dimensional because the list (~e_1, ~e_2, . . . , ~e_n) spans F^n.

Let f(x) ∈ F[x] have the form f(x) = a_0 + a_1x + · · · + a_nx^n with a_n ≠ 0. We say that f(x) has degree n. The zero polynomial has degree −∞. We let P_n(F) denote the set of all polynomials with degree at most n. Then P_n(F) is a subspace of F[x] with spanning list L = (1, x, x², . . . , x^n).

3.2 Linear Independence and the Concept of Basis

One of the most fundamental concepts of linear algebra is that of linear independence.

Definition: A finite list (v_1, . . . , v_m) of vectors in V is said to be linearly independent provided the only choice of scalars a_1, . . . , a_m ∈ F for which ∑_{i=1}^{m} a_i v_i = ~0 is a_1 = a_2 = · · · = a_m = 0. An infinite list (v_1, v_2, . . .) of vectors in V is said to be linearly independent provided that for each positive integer m, the list (v_1, . . . , v_m) consisting of the first m vectors of the infinite list is linearly independent.

A finite set S of vectors of V is said to be linearly independent provided each list of distinct vectors of S (no repetition allowed) is linearly independent. An arbitrary set S of vectors of V is said to be linearly independent provided every finite subset of S is linearly independent.

Any subset of V or list of vectors in V is said to be linearly dependent provided it is not linearly independent. It follows immediately that a list L = (v_1, v_2, . . .) is linearly dependent provided there is some integer m ≥ 1 for which there are m scalars a_1, . . . , a_m ∈ F such that ∑_{i=1}^{m} a_i v_i = ~0 and not all the a_i's are equal to zero.

The following Linear Dependence Lemma will turn out to be extremely useful.


Lemma 3.2.1. Let L = (v_1, v_2, . . .) be a nonempty list of nonzero vectors in V. (Of course we know that any list that includes the zero vector is automatically linearly dependent.) Then the following are equivalent:

(i) L is linearly dependent.

(ii) There is some integer j ≥ 2 such that v_j ∈ span(v_1, . . . , v_(j−1)). (Usually we will want to choose the smallest index j for which this holds.)

(iii) There is some integer j ≥ 2 such that if the jth term v_j is removed from the list L, the span of the remaining list equals the span of L.

Proof. Suppose that the list L = (v_1, v_2, . . .) is linearly dependent and v_1 ≠ ~0. For some m there are scalars a_1, . . . , a_m for which ∑_{i=1}^{m} a_i v_i = ~0 with at least one of the a_i not equal to 0. Since v_1 ≠ ~0, not all of the scalars a_2, . . . , a_m can equal 0. So let j ≥ 2 be the largest index for which a_j ≠ 0. Then

v_j = (−a_1 a_j^(−1)) v_1 + (−a_2 a_j^(−1)) v_2 + · · · + (−a_(j−1) a_j^(−1)) v_(j−1),

proving (ii).

To see that (ii) implies (iii), just note that in any linear combination of vectors from L, if v_j appears, it can be replaced by its expression as a linear combination of the vectors v_1, . . . , v_(j−1).

It should be immediately obvious that if (iii) holds, then the list L is linearly dependent.

The following theorem is of major importance in the theory. It says that (finite) linearly independent lists are never longer than (finite) spanning lists.

Theorem 3.2.2. Suppose that V is spanned by the finite list L = (w_1, . . . , w_n) and that M = (v_1, . . . , v_m) is a linearly independent list. Then m ≤ n.

Proof. We shall prove that m ≤ n by using an algorithm that is interesting in its own right. It amounts to starting with the list L and removing one w and adding one v at each step so as to maintain a spanning list. Since the list L = (w_1, . . . , w_n) spans V, adjoining any vector to the list produces a linearly dependent list. In particular,

(v_1, w_1, . . . , w_n)

is linearly dependent with its first element different from ~0. So by the Linear Dependence Lemma we may remove one of the w's so that the remaining list of length n is still a spanning list. Suppose this process has been carried out until a spanning list of the form

B = (v_1, . . . , v_j, w′_1, . . . , w′_(n−j))

has been obtained. If we now adjoin v_(j+1) to the list B by inserting it immediately after v_j, the resulting list will be linearly dependent. By the Linear Dependence Lemma, one of the vectors in this list must be a linear combination of the vectors preceding it in the list. Since the list (v_1, . . . , v_(j+1)) must be linearly independent, this vector must be one of the w′'s and not one of the v's. So we can remove this w′ and obtain a new list of length n which still spans V and has v_1, . . . , v_(j+1) as its initial members. If at some step we had added a v and had no more w's to remove, we would have a contradiction, since the entire list of v's must be linearly independent even though when we added a v the list was to become dependent. So we may continue the process until all the vectors v_1, . . . , v_m have been added to the list, i.e., m ≤ n.

Definition A vector space V over the field F is said to be finite dimensional provided there is a finite list that spans V.

Theorem 3.2.3. If U is a subspace of the finite-dimensional space V, then U is finite dimensional.

Proof. Suppose that V is spanned by a list of length m. If U = {~0}, then certainly U is spanned by a finite list and is finite dimensional. So suppose that U contains a nonzero vector v_1. If the list (v_1) does not span U, let v_2 be a vector in U that is not in span(v_1). By the Linear Dependence Lemma, the list (v_1, v_2) must be linearly independent. Continue this process. At each step, if the linearly independent list obtained does not span the space U, we can add one more vector of U to the list keeping it linearly independent. By the preceding theorem, this process has to stop before a linearly independent list of length m + 1 has been obtained, i.e., U is spanned by a list of length at most m, so is finite dimensional.

Note: In the preceding proof, the spanning list obtained for U was also linearly independent. Moreover, we could have taken V as the subspace U on which to carry out the algorithm to obtain a linearly independent spanning list. This is a very important type of list.

Definition A basis for a vector space V over F is a list L of vectors of V that is a spanning list as well as a linearly independent list. We have just observed the following fact:


Lemma 3.2.4. Each finite dimensional vector space V has a basis. By convention the empty set ∅ is a basis for the zero space {~0}.

Lemma 3.2.5. If V is a finite dimensional vector space, then there is a unique integer n ≥ 0 such that each basis of V has length exactly n.

Proof. If B_1 = (v_1, . . . , v_n) and B_2 = (u_1, . . . , u_m) are two bases of V, then since B_1 spans V and B_2 is linearly independent, m ≤ n. Since B_1 is linearly independent and B_2 spans V, n ≤ m. Hence m = n.

Definition If the finite dimensional vector space V has a basis with length n, we say that V is n-dimensional or that n is the dimension of V. Moreover, each finite dimensional vector space has a well-defined dimension.

For this course it is completely satisfactory to consider bases only for finite dimensional spaces. However, students occasionally ask about the infinite dimensional case and we think it is pleasant to have a convenient treatment available. Hence we have included a treatment of the infinite dimensional case in Chapter 12, which treats a number of special topics. For our general purposes, however, we are content merely to say that any vector space which is not finite dimensional is infinite dimensional (without trying to associate any specific infinite cardinal number with the dimension).

Lemma 3.2.6. A list B = (v_1, . . . , v_n) of vectors in V is a basis of V if and only if every v ∈ V can be written uniquely in the form

v = a_1v_1 + · · · + a_nv_n. (3.1)

Proof. First assume that B is a basis. Since it is a spanning set, each v ∈ V can be written in the form given in Eq. 3.1. Since B is linearly independent, such an expression is easily seen to be unique. Conversely, suppose each v ∈ V has a unique expression in the form of Eq. 3.1. The existence of the expression for each v ∈ V implies that B is a spanning set. The uniqueness then implies that B is linearly independent.

Theorem 3.2.7. If L = (v_1, . . . , v_n) is a spanning list of V, then a basis of V can be obtained by deleting certain elements from L. Consequently every finite dimensional vector space has a basis.

Proof. If L is linearly independent, it must already be a basis of V. If not, consider v_1. If v_1 = ~0, discard v_1. If not, then leave L unchanged. Since L is assumed to be linearly dependent, there must be some j such that v_j ∈ span(v_1, . . . , v_(j−1)). Choose the smallest j for which this is true and delete that v_j from L. This will yield a list L_1 of n − 1 elements that still spans V. If L_1 is linearly independent, then L_1 is the desired basis of V. If not, then proceed as before to obtain a spanning list of size n − 2. Continue this way, always producing spanning lists of shorter length, until eventually a linearly independent spanning list is obtained.

Lemma 3.2.8. Every linearly independent list of vectors in a finite dimensional vector space V can be extended to a basis of V.

Proof. Let V be a finite dimensional space, say with dim(V) = n. Let L = (v_1, . . . , v_k) be a linearly independent list. So 0 ≤ k ≤ n. If k < n we know that L cannot span the space, so there is a vector v_(k+1) not in span(L). Then L′ = (v_1, . . . , v_k, v_(k+1)) must still be linearly independent. If k + 1 < n we can repeat this process, adjoining vectors one at a time to produce longer linearly independent lists until a basis is obtained. As we have seen above, this must happen when we have an independent list of length n.

Lemma 3.2.9. If V is a finite-dimensional space and U is a subspace of V, then there is a subspace W of V such that V = U ⊕ W. Moreover, dim(U) ≤ dim(V) with equality if and only if U = V.

Proof. We have seen that U must also be finite-dimensional, so that it has a basis B_1 = (v_1, . . . , v_k). Since B_1 is a linearly independent set of vectors in a finite-dimensional space V, it can be completed to a basis B = (v_1, . . . , v_k, v_(k+1), . . . , v_n) of V. Put W = span(v_(k+1), . . . , v_n). Clearly V = U + W, and U ∩ W = {~0}. It follows that V = U ⊕ W. The last part of the lemma should also be clear.

Theorem 3.2.10. Let L = (v_1, . . . , v_k) be a list of vectors of an n-dimensional space V. Then any two of the following properties imply the third one:

(i) k = n;
(ii) L is linearly independent;
(iii) L spans V.

Proof. Assume that (i) and (ii) both hold. Then L can be completed to a basis, which must have exactly n vectors, i.e., L already must span V. If (ii) and (iii) both hold, L is a basis by definition and must have n elements by definition of dimension. If (iii) and (i) both hold, L can be restricted to form a basis, which must have n elements. Hence L must already be linearly independent.

Theorem 3.2.11. Let U and W be finite-dimensional subspaces of the vector space V. Then

dim(U +W ) = dim(U) + dim(W )− dim(U ∩W ).

Proof. Let B_1 = (v_1, . . . , v_k) be a basis for U ∩ W. Complete it to a basis B_2 = (v_1, . . . , v_k, u_1, . . . , u_r) of U and also complete it to a basis B_3 = (v_1, . . . , v_k, w_1, . . . , w_t) for W. Put B = (v_1, . . . , v_k, u_1, . . . , u_r, w_1, . . . , w_t). We claim that B is a basis for U + W, from which the theorem follows.

First we show that B is linearly independent. So suppose that there are scalars a_i, b_i, c_i ∈ F for which ∑_{i=1}^{k} a_iv_i + ∑_{i=1}^{r} b_iu_i + ∑_{i=1}^{t} c_iw_i = ~0. It follows that ∑_{i=1}^{k} a_iv_i + ∑_{i=1}^{r} b_iu_i = −∑_{i=1}^{t} c_iw_i ∈ U ∩ W. Being in U ∩ W, this vector is also a linear combination of v_1, . . . , v_k; since B_2 is linearly independent, this means all the b_i's are equal to 0. This forces ∑_{i=1}^{k} a_iv_i + ∑_{i=1}^{t} c_iw_i = ~0. Since B_3 is linearly independent, this forces all the a_i's and c_i's to be 0. But it should be quite clear that B spans U + W, so that in fact B is a basis for U + W.

At this point the following theorem is easy to prove. We leave the proof as an exercise.

Theorem 3.2.12. Let U_1, . . . , U_m be finite-dimensional subspaces of V with V = U_1 + · · · + U_m and with B_i a basis for U_i, 1 ≤ i ≤ m. Then the following are equivalent:

(i) V = U_1 ⊕ · · · ⊕ U_m;
(ii) dim(V) = dim(U_1) + dim(U_2) + · · · + dim(U_m);
(iii) (B_1, B_2, . . . , B_m) is a basis for V;
(iv) the spaces U_1, . . . , U_m are independent.

3.3 Exercises

1. Write out a proof of Lemma 3.1.1.

2. Let L be any list of vectors of V, and let S be the set of all vectors in L. Show that the intersection of all subspaces having S as a subset is just the span of L.


3. Show that any subset T of a linearly independent set S of vectors is also linearly independent, and observe that this is equivalent to the fact that if T is a linearly dependent subset of S, then S is also linearly dependent. Note: The empty set ∅ of vectors is linearly independent.

4. Show that the intersection of any family of linearly independent sets of vectors of V is also linearly independent.

5. Give an example of two linearly dependent sets whose intersection is linearly independent.

6. Show that a list (v) of length 1 is linearly dependent if and only if v = ~0.

7. Show that a list (v_1, v_2) of length 2 is linearly independent if and only if neither vector is a scalar times the other.

8. Let m be a positive integer. Let V = {f ∈ F[x] : deg(f) = m or f = ~0}. Determine, with proof, whether or not V is a subspace of F[x].

9. Prove or disprove: There is a basis of P_m(F) all of whose members have degree m.

10. Prove or disprove: there exists a basis (p_0, p_1, p_2, p_3) of P_3(F) such that

(a) all the polynomials p_0, . . . , p_3 have degree 3;

(b) all the polynomials p_0, . . . , p_3 give the value 0 when evaluated at 3;

(c) all the polynomials p_0, . . . , p_3 give the value 3 when evaluated at 0;

(d) all the polynomials p_0, . . . , p_3 give the value 3 when evaluated at 0 and give the value 1 when evaluated at 1.

11. Prove that if U_1, U_2, . . . , U_m are subspaces of V, then dim(U_1 + · · · + U_m) ≤ dim(U_1) + dim(U_2) + · · · + dim(U_m).

12. Prove or give a counterexample: If U_1, U_2, U_3 are three subspaces of a finite dimensional vector space V, then

dim(U_1 + U_2 + U_3) = dim(U_1) + dim(U_2) + dim(U_3)
  − dim(U_1 ∩ U_2) − dim(U_2 ∩ U_3) − dim(U_3 ∩ U_1)
  + dim(U_1 ∩ U_2 ∩ U_3).

13. Suppose that p_0, p_1, . . . , p_m are polynomials in the space P_m(F) (of polynomials over F with degree at most m) such that p_j(2) = 0 for each j. Prove that the set {p_0, p_1, . . . , p_m} is not linearly independent in P_m(F).

14. Let U, V, W be vector spaces over a field F, with U and W subspaces of V. We say that V is the direct sum of U and W, and write V = U ⊕ W, provided that for each vector v ∈ V there are unique vectors u ∈ U, w ∈ W for which v = u + w.

(a) Prove that if dim U + dim W = dim V and U ∩ W = {0}, then V = U ⊕ W.

(b) Prove that if C is an m × n matrix over the field R of real numbers, then R^n = Col(C^T) ⊕ N(C), where Col(A) denotes the column space of the matrix A and N(A) denotes the right null space of A.


Chapter 4

Linear Transformations

4.1 Definitions and Examples

Throughout this chapter we let U, V and W be vector spaces over the field F.

Definition A function (or map) T from U to V is called a linear map or linear transformation provided T satisfies the following two properties:

(i) T(u + v) = T(u) + T(v) for all u, v ∈ U; and

(ii) T(au) = aT(u) for all a ∈ F and all u ∈ U.

These two properties can be combined into the following single property:

Obs. 4.1.1. T : U → V is linear provided T(au + bv) = aT(u) + bT(v) for all a, b ∈ F and all u, v ∈ U.

Notice that T is a homomorphism of the additive group (U, +) into the additive group (V, +). So you should be able to show that T(~0) = ~0, where the first ~0 is the zero vector of U and the second is the zero vector of V.

The zero map: The map 0 : U → V defined by 0(v) = ~0 for all v ∈ U is easily seen to be linear.

The identity map: Similarly, the map I : U → U defined by I(v) = v for all v ∈ U is linear.

For f(x) = a_0 + a_1x + a_2x² + · · · + a_nx^n ∈ F[x], the formal derivative of f(x) is defined to be f′(x) = a_1 + 2a_2x + 3a_3x² + · · · + n a_n x^(n−1).

Differentiation: Define D : F[x] → F[x] by D(f) = f′, where f′ is the formal derivative of f. Then D is linear.


The prototypical linear map is given by the following. Let F^n and F^m be the vector spaces of column vectors with n and m entries, respectively. Let A ∈ M_{m,n}(F). Define T_A : F^n → F^m by

T_A : X ↦ AX for all X ∈ F^n.

The usual properties of matrix algebra force TA to be linear.

Theorem 4.1.2. Suppose that U is n-dimensional with basis B = (u_1, . . . , u_n), and let L = (v_1, . . . , v_n) be any list of n vectors of V. Then there is a unique linear map T : U → V such that T(u_i) = v_i for 1 ≤ i ≤ n.

Proof. Let u be any vector of U, so u = ∑_{i=1}^{n} a_iu_i for unique scalars a_i ∈ F. Then the desired T has to be defined by T(u) = ∑_{i=1}^{n} a_iT(u_i) = ∑_{i=1}^{n} a_iv_i. This clearly defines T uniquely. The fact that T is linear follows easily from the basic properties of vector spaces.

Put L(U, V ) = {T : U → V : T is linear}.

The interesting fact here is that L(U, V) is again a vector space in its own right. Vector addition S + T is defined for S, T ∈ L(U, V) by (S + T)(u) = S(u) + T(u) for all u ∈ U. Scalar multiplication is defined for a ∈ F and T ∈ L(U, V) by (aT)(u) = a(T(u)) for all u ∈ U. You should verify that with this vector addition and scalar multiplication L(U, V) is a vector space with the zero map being the additive identity.

Now suppose that T ∈ L(U, V) and S ∈ L(V, W). Then the composition (S ◦ T) : U → W, usually just written ST, defined by ST(u) = S(T(u)), is well-defined and is easily shown to be linear. In general the composition product is associative (when a triple product is defined) because this is true of the composition of functions in general. But also we have the distributive properties (S_1 + S_2)T = S_1T + S_2T and S(T_1 + T_2) = ST_1 + ST_2 whenever the products are defined.

In general the multiplication of linear maps is not commutative even when both products are defined.

4.2 Kernels and Images

We use the following language. If f : A → B is a function, we say that A is the domain of f and B is the codomain of f. But f might not be onto B. We define the image (or range) of f by Im(f) = {b ∈ B : f(a) = b for at least one a ∈ A}. And we use this language for linear maps also. The null space (or kernel) of T ∈ L(U, V) is defined by null(T) = {u ∈ U : T(u) = ~0}. NOTE: The terms range and image are interchangeable, and the terms null space and kernel are interchangeable.

In the exercises you are asked to show that the null space and image of a linear map are subspaces of the appropriate spaces.

Lemma 4.2.1. Let T ∈ L(U, V). Then T is injective (i.e., one-to-one) if and only if null(T) = {~0}.

Proof. Since T(~0) = ~0, if T is injective, clearly null(T) = {~0}. Conversely, suppose that null(T) = {~0}. Then suppose that T(u_1) = T(u_2) for u_1, u_2 ∈ U. Then by the linearity of T we have T(u_1 − u_2) = T(u_1) − T(u_2) = ~0, so u_1 − u_2 ∈ null(T) = {~0}. Hence u_1 = u_2, implying T is injective.

The following theorem and its method of proof are extremely useful in many contexts.

Theorem 4.2.2. If U is finite dimensional and T ∈ L(U, V), then Im(T) is finite-dimensional and

dim(U) = dim(null(T)) + dim(Im(T)).

Proof. (Pay close attention to the details of this proof. You will want to use them for some of the exercises.)

Start with a basis (u_1, . . . , u_k) of null(T). Extend this list to a basis (u_1, . . . , u_k, w_1, . . . , w_r) of U. Thus dim(null(T)) = k and dim(U) = k + r. To complete a proof of the theorem we need only show that dim(Im(T)) = r. We will do this by showing that (T(w_1), . . . , T(w_r)) is a basis of Im(T). Let u ∈ U. Because (u_1, . . . , u_k, w_1, . . . , w_r) spans U, there are scalars a_i, b_i ∈ F such that

u = a_1u_1 + · · · + a_ku_k + b_1w_1 + · · · + b_rw_r.

Remember that u_1, . . . , u_k are in null(T) and apply T to both sides of the preceding equation:

T(u) = b_1T(w_1) + · · · + b_rT(w_r).

This last equation implies that (T(w_1), . . . , T(w_r)) spans Im(T), so at least Im(T) is finite dimensional. To show that (T(w_1), . . . , T(w_r)) is linearly independent, suppose that

∑_{i=1}^{r} c_iT(w_i) = ~0 for some c_i ∈ F.

It follows easily that ∑_{i=1}^{r} c_iw_i ∈ null(T), so ∑_{i=1}^{r} c_iw_i = ∑_{i=1}^{k} d_iu_i for some d_i ∈ F. Since (u_1, . . . , u_k, w_1, . . . , w_r) is linearly independent, we must have that all the c_i's and d_i's are zero. Hence (T(w_1), . . . , T(w_r)) is linearly independent, and hence is a basis for Im(T).

Obs. 4.2.3. There are two other ways to view the equality of the preceding theorem. If n = dim(U), k = dim(null(T)) and r = dim(Im(T)), then the theorem says n = k + r. And if dim(V) = m, then clearly r ≤ m. So k = n − r ≥ n − m. If n > m, then k > 0 and T is not injective. If n < m, then r ≤ n < m says T is not surjective (i.e., onto).

Definition: Often we say that the dimension of the null space of a linear map T is the nullity of T.

4.3 Rank and Nullity Applied to Matrices

Let A ∈ M_{m,n}(F). Recall that the row rank of A (i.e., the dimension of the row space row(A) of A) equals the column rank (i.e., the dimension of the column space col(A)) of A, and the common value rank(A) is called the rank of A. Let T_A : F^n → F^m : X ↦ AX as usual. Then the null space of T_A is the right null space rnull(A) of the matrix A, and the image of T_A is the column space of A. So by Theorem 4.2.2, n = rank(A) + dim(rnull(A)). Similarly, m = rank(A) + dim(lnull(A)). (Clearly lnull(A) denotes the left null space of A.)
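A concrete check of the equality n = rank(A) + dim(rnull(A)), sketched in Python with the sympy library (not part of the text):

    from sympy import Matrix

    # Rank plus nullity for a concrete 3 x 4 matrix: rank + nullity = n.
    A = Matrix([[1, 2, 3, 4],
                [2, 4, 6, 8],
                [1, 0, 1, 0]])
    n = A.cols
    print(A.rank(), len(A.nullspace()), n)   # 2, 2, 4 and indeed 2 + 2 = 4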

Theorem 4.3.1. Let A be an m × n matrix and B an n × p matrix over F. Then

(i) dim(rnull(AB)) ≤ dim(rnull(A)) + dim(rnull(B)), and

(ii) rank(A) + rank(B) − rank(AB) ≤ n.


Proof. Start with

rnull(B) ⊆ rnull(AB) ⊆ F^p and col(B) ∩ rnull(A) ⊆ rnull(A) ⊆ F^n.

Consider the map T_B : rnull(AB) → col(B) ∩ rnull(A) : ~x ↦ B~x. Note that T_B could be defined on all of F^p, but we have defined it only on rnull(AB). On the other hand, rnull(B) really is in rnull(AB), so null(T_B) = rnull(B). Also, Im(T_B) ⊆ col(B) ∩ rnull(A) ⊆ rnull(A), since T_B is being applied only to vectors ~x for which A(B~x) = 0. Hence we have the following:

dim(rnull(AB)) = dim(null(T_B)) + dim(Im(T_B))
  = dim(rnull(B)) + dim(Im(T_B))
  ≤ dim(rnull(B)) + dim(rnull(A)).

This proves (i). This can be rewritten as p − rank(AB) ≤ (p − rank(B)) + (n − rank(A)), which implies (ii).
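Part (ii) (Sylvester's inequality) can be tested numerically. A Python sketch (assumes numpy; random integer matrices are used only for illustration):

    import numpy as np

    # Check rank(A) + rank(B) - rank(AB) <= n on random integer matrices.
    rng = np.random.default_rng(1)
    m, n, p = 4, 3, 5
    r = np.linalg.matrix_rank
    for _ in range(100):
        A = rng.integers(-2, 3, size=(m, n))
        B = rng.integers(-2, 3, size=(n, p))
        assert r(A) + r(B) - r(A @ B) <= n
    print("inequality holds in all trials")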

4.4 Projections

Suppose V = U ⊕ W. Define P : V → V as follows. For each v ∈ V, write v = u + w with u ∈ U and w ∈ W. Then u and w are uniquely defined for each v ∈ V. Put P(v) = u. It is straightforward to verify the following properties of P.

Obs. 4.4.1. (i) P ∈ L(V).
(ii) P² = P.
(iii) U = Im(P).
(iv) W = null(P).
(v) U and W are both P-invariant.

This linear map P is called the projection onto U along W (or parallel to W) and is often denoted by P = P_{U,W}. Using this notation we see that

Obs. 4.4.2. I = PU,W + PW,U .


As a kind of converse, suppose that P ∈ L(V) is idempotent, i.e., P² = P. Put U = Im(P) and W = null(P). Then for each v ∈ V we can write v = P(v) + (v − P(v)) ∈ Im(P) + null(P). Hence V = U + W. Now suppose that u ∈ U ∩ W. Then on the one hand u = P(v) for some v ∈ V. On the other hand P(u) = ~0. Hence ~0 = P(u) = P²(v) = P(v) = u, implying U ∩ W = {~0}. Hence V = U ⊕ W. It follows readily that P = P_{U,W}. Hence we have the following:

Obs. 4.4.3. The linear map P is idempotent if and only if it is the projection onto its image along its null space.
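A small Python sketch (assumes numpy) of a projection P_{U,W} for one concrete direct sum R³ = U ⊕ W; the particular subspaces are chosen only for illustration:

    import numpy as np

    # U = span of the first two standard basis vectors, W = span of (1, 1, 1).
    U = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # columns span U
    W = np.array([[1.0], [1.0], [1.0]])                  # column spans W
    B = np.hstack([U, W])                                # a basis of R^3
    coords = np.linalg.inv(B)                            # coordinates w.r.t. B
    P = U @ coords[:2, :]               # keep only the U-part of each vector
    print(np.allclose(P @ P, P))        # True: P is idempotent
    print(P @ np.array([2.0, 3.0, 5.0]))  # (-3, -2, 0), the U-component of (2, 3, 5)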

4.5 Bases and Coordinate Matrices

In this section let V be a finite dimensional vector space over the field F. Since F is a field, i.e., since the multiplication in F is commutative, it turns out that it really does not matter whether V is a left vector space over F (i.e., the scalars from F are placed on the left side of the vectors of V) or V is a right vector space. Let the list B = (v_1, . . . , v_n) be a basis for V. So if v is an arbitrary vector in V there are unique scalars c_1, . . . , c_n in F for which v = ∑_{i=1}^{n} c_iv_i. The column vector [v]_B = (c_1, . . . , c_n)^T ∈ F^n is then called the coordinate matrix for v with respect to the basis B, and we may write

v = ∑_{i=1}^{n} c_iv_i = (v_1, . . . , v_n) · (c_1, c_2, . . . , c_n)^T = B[v]_B.

Perhaps we should discuss this last multiplication a bit.

In the usual theory of matrix manipulation, if we want to multiply two matrices A and B to get a matrix AB = C, there are integers n, m and p such that A is m × n, B is n × p, and the product C is m × p. If in general the entry in the ith row and jth column of a matrix A is denoted by A_ij, then the (i, j)th entry of AB = C is (AB)_ij = ∑_{k=1}^{n} A_ikB_kj. If A is a row or a column the entries are usually indicated by a single subscript. So we might write

A = (a_1, . . . , a_n); B = (b_1, b_2, . . . , b_n)^T; AB = ∑_k a_kb_k.

In this context it is usually assumed that the entries of A and B (and hence also of C) come from some ring, probably a commutative ring R, so that in particular this sum ∑_k a_kb_k is a uniquely defined element of R. Moreover, using the usual properties of arithmetic in R it is possible to show directly that matrix multiplication (when defined!) is associative. Also, matrix addition is defined and matrix multiplication (when defined!) distributes over addition, etc. However, it is not always necessary to assume that the entries of A and B come from the same kind of algebraic system. We may just as easily multiply a column (c_1, . . . , c_n)^T of scalars from the field F by the row (v_1, . . . , v_n) representing an ordered basis of V over F to obtain

v = (v_1, . . . , v_n) · (c_1, c_2, . . . , c_n)^T = ∑_{i=1}^{n} c_iv_i.

We may also suppose A is an n× n matrix over F and write

(u1, u2, . . . , un) = (v1, v2, . . . , vn)A, so uj =n∑i=1

Aijvi.

Then it follows readily that (u1, . . . , un) is an ordered basis of V if andonly if the matrix A is invertible, in which case it is also true that

(v1, v2, . . . , vn) = (u1, u2, . . . , un)A−1, so vj =n∑i=1

(A−1)ijui.

To see this, observe that

(v1, . . . , vn) ·

c1

c2...cn

= (v1, v2, . . . , vn)A ·

c1

c2...cn

Page 38: Linear Algebra by S. E. Payne

38 CHAPTER 4. LINEAR TRANSFORMATIONS

= (v1, v2, . . . , vn) · (A

c1

c2...cn

)

Since (v1, v2, . . . , vn) is an independent list, this product is the zero vector ifand only if

A

c1

c2...cn

is the zero vector, from which the desired result is clear.

4.6 Matrices as Linear Transformations

Let B1 = (u1, . . . , un) be an ordered basis for the vector space U over thefield F , and let B2 = (v1, . . . , vm) be an ordered basis for the vector spaceV over F . Let A be an m × n matrix over F . Define TA : U → V by[TA(u)]B2 = A · [u]B1 for all u ∈ U . It is quite straightforward to show thatTA ∈ L(U, V ). It is also clear (by letting u = uj), that the jth column of Ais [T (uj)]B2 . Conversely, if T ∈ L(U, V ), and if we define the matrix A to bethe matrix with jth column equal to [T (uj)]B2 , then [T (u)]B2 = A · [u]B1 forall u ∈ U . In this case we say that A is the matrix that represents T withrespect to the pair (B1,B2) of ordered bases of U and V , respectively, and wewrite A = [T ]B2,B1 .

Note that a coordinate matrix of a vector with respect to a basis has asubscript that is a single basis, whereas the matrix representing a linear mapT has a subscript which is a pair of bases, with the basis of the range spacelisted first and that of the domain space listed second. Soon we shall see whythis order is the convenient one. When U = V and B1 = B2 it is sometimesthe case that we write [T ]B1 in place of [T ]B1,B1 . And we usually write L(V )in place of L(V, V ), and T ∈ L(V ) is called a linear operator on V .

In these notes, however, even when there is only one basis of V beingused and T is a linear operator on V , we sometimes indicate the matrix thatrepresents T with a subscript that is a pair of bases, instead of just one basis,because there are times when we want to think of T as a member of a vector

Page 39: Linear Algebra by S. E. Payne

4.6. MATRICES AS LINEAR TRANSFORMATIONS 39

space so that it has a coordinate matrix with respect to some basis of thatvector space. Our convention makes it easy to recognize when the matrixrepresents T with respect to a basis as a linear map and when it represents Tas a vector itself which is a linear combination of the elements of some basis.

We give an example of this.

Example 4.6.1. Let V = M2,3(F ). Put B = (v1, . . . , v6) where the vi aredefined as follows:

v1 =

(1 0 00 0 0

); v2 =

(0 1 00 0 0

); v3 =

(0 0 10 0 0

);

v4 =

(0 0 01 0 0

); v5 =

(0 0 00 1 0

); v6 =

(0 0 00 0 1

).

It is clear that B is a basis for M2,3(F ), and if A ∈ M2,3(F ), then thecoordinate matrix [A]B of A with respect to the basis B is

[A]B = (A11, A12, A13, A21, A22, A23)T .

Given such a matrix A, define a linear map TA : F 3 → F 2 by TA(u) = Aufor all u ∈ F 3. Let S1 = (e1, e2, e3) be the standard ordered basis of F 3,and let S2 = (h1, h2) be the standard ordered basis of F 2. So, for example,h2 = (0, 1)T . You should check that

[TA]S2,S1 = A.

For 1 ≤ i ≤ 3; 1 ≤ j ≤ 2, let fij ∈ L(F 3, F 2) be defined by fij(ek) =δikhj. Let B3 = (f11, f21, f31, f12, f22, f32). We want to figure out what is thecoordinate matrix [TA]B3. (The next theorem sows that B3 is indeed a basisof L(F 3, F 2).)

We claim that [TA]B3 = (a11, a12, a13, a21, a22, a23)T . Because of the orderin which we listed the basis vectors fij, this is equivalent to saying that TA =∑

i,j aijfji. If we evaluate this sum at (ek) we get

∑i,j

aijfji(ek) =∑i

aikfki(ek) =∑i

aikhi =

(a1k

a2k

)= Aek = TA(ek).

This establishes our claim.

Page 40: Linear Algebra by S. E. Payne

40 CHAPTER 4. LINEAR TRANSFORMATIONS

Recall that B1 = (u1, . . . , un) is an ordered basis for U over the field F ,and that B2 = (v1, . . . , vm) is an ordered basis for V over F . Now supposethat W has an ordered basis B3 = (w1, . . . , wp).

Let S ∈ L(U, V ) and T ∈ L(V,W ), so that T ◦S ∈ L(U,W ), where T ◦Smeans do S first. Then we have

[(T ◦ S)(u)]B3 = [T (S(u))]B3 = [T ]B3,B2 [S(u)]B2 = [T ]B3,B2 [S]B2,B1 [u]B1 =

= [T ◦ S]B3,B1 [u]B1 for all u ∈ U.

This implies that

[T ◦ S]B3,B1 = [T ]B3,B2 · [S]B2,B1 .

This is the equation that suggests that the subscript on the matrix repre-senting a linear map should have the basis for the range space listed first.

Recall that L(U, V ) is naturally a vector space over F with the usualaddition of linear maps and scalar multiplication of linear maps. Moreover,for a, b ∈ F and S, T ∈ L(U, V ), it follows easily that

[aS + bT ]B2,B1 = a[S]B2,B1 + b[T ]B2,B1 .

We leave the proof of this fact as a straightforward exercise. It thenfollows that if U = V and B1 = B2, the correspondence T 7→ [T ]B1,B1 = [T ]B1

is an algebra isomorphism. This includes consequences such as [T−1]B =([T ]B)−1 when T happens to be invertible. Proving these facts is a worthwhileexercise!

Let fij ∈ L(U, V ) be defined by

fij(uk) = δikvj, 1 ≤ i, k ≤ n; 1 ≤ j ≤ m.

So fij maps ui to vj and maps uk to the zero vector for k 6= i. Thiscompletely determines fij as a linear map from U to V .

Theorem 4.6.2. The set B∗ = {fij : 1 ≤ i ≤ m; 1 ≤ j ≤ n} is a basis forL(U, V ) as a vector space over F .

Note: We could turn B∗ into a list, but we don’t need to.

Page 41: Linear Algebra by S. E. Payne

4.7. CHANGE OF BASIS 41

Proof. We start by showing that B∗ is linearly independent. Suppose that∑ij cijfij = 0, so that ~0 =

∑ij cijfij(uk) =

∑j ckjvj for each k. Since

(v1, . . . , vm) is linearly independent, ck1 = ck2 = · · · = ckm = 0, and this holdsfor each k, so the set of fij must be linearly independent. We now show that itspans L(U, V ). For suppose that S ∈ L(U, V ) and that [S]B2,B1 = C = (cij),i.e., S(uj) =

∑ni=1 cijvi. Put T =

∑i,k cikfki. Then T (uj) =

∑cikfki(uj) =∑

i cijvi, implying that S = T since they agree on a basis.

Corollary 4.6.3. If dim(U) = n and dim(V ) = m, then dimL(U, V ) = mn.

4.7 Change of Basis

We want to investigate what happens to coordinate matrices and to matricesrepresenting linear operators when the ordered basis is changed. For thesake of simplicity we shall consider this question only for linear operatorson a space V , so that we can use a single ordered basis. So in this sectionwe write matrices representing linear operators with a subscript which is asingle basis.

Let F be any field and let V be a finite dimensional vector space overF , say dim(V ) = n. Let B1 = (u1, . . . , un) and B2 = (v1, . . . , vn) be two(ordered) bases of V . So for v ∈ V , and for i = 1, say that [v]B1 =(c1, c2, . . . , cn)T , i.e., v =

∑ni=1 ciui. We often write this equality in the

form

v = (u1, . . . , un)[v]B1 = B1[v]B1 .

Similarly, v = B2[v]B2 .

Since B1 and B2 are both bases for V , there is an invertible matrix Qsuch that

B1 = B2Q and B2 = B1Q−1.

The first equality indicates that (u1, . . . , un) = (v1, . . . , vn)Q, or

uj =n∑i=1

Qijvi. (4.1)

Page 42: Linear Algebra by S. E. Payne

42 CHAPTER 4. LINEAR TRANSFORMATIONS

This equation says that the jth column of Q is the coordinate matrix ofuj with respect to B2. Similarly, the jth column of Q−1 is the coordinatematrix of vj with respect to B1.

For every v ∈ V we now have

v = B1[v]B1 = (B2Q)[v]B1 = B2[v]B2 .

It follows thatQ[v]B1 = [v]B2 . (4.2)

Now let T ∈ L(V ). Recall that the matrix [T ]B that represents T withrespect to the basis B is the unique matrix for which

[T (v)]B = [T ]B[v]B for all v ∈ V.

Theorem 4.7.1. Let B1 = B2Q as above. Then [T ]B2 = Q[T ]B1Q−1.

Proof. In Eq. 4.2 replace v with T (v) to get

Q[T (v)]B1 = [T (v)]B2 = [T ]B2 [v]B2 = [T ]B2Q[v]B1

for all v ∈ V . It follows that

Q[T ]B1 [v]B1 = [T ]B2Q[v]B1

for all v ∈ V , implyingQ[T ]B1 = [T ]B2Q, (4.3)

which is equivalent to the statement of the theorem.

It is often convenient to remember this fact in the following form (let Pplay the role of Q−1 above):

If B2 = B1P, then [T ]B2 = P−1[T ]B1P. (4.4)

A Specific Setting

Now let V = F n, whose elements we think of as being column vectors.Let A be an n× n matrix over F and define TA ∈ L(F n) by

TA(v) = Av, for all v ∈ F n.

Page 43: Linear Algebra by S. E. Payne

4.7. CHANGE OF BASIS 43

Let S = (e1, . . . , en) be the standard ordered basis for F n, i.e., ej is thecolumn vector in F n whose jth entry is 1 and all other entries are equal to0. It is clear that we can identify each vector v ∈ F n with [v]S . Moreover,the jth column of A is Aej = [Aej]S = [TAej]S = [TA]S [ej]S = [TA]Sej = thejth column of [TA]S , which implies that

A = [TA]S . (4.5)

Putting all this together, we have

Theorem 4.7.2. Let S = (e1, . . . , en) be the standard ordered basis of F n.Let B = (v1, . . . , vn) be a second ordered basis. Let P be the matrix whosejth column is vj = [vj]S . Let A be an n × n matrix over F and defineTA : F n → F n by TA(v) = Av. So [TA]S = A. Then [TA]B = P−1AP .

Definition Two n × n matrices A and B over F are said to be similar(written A ∼ B) if and only if there is an invertible n × n matrix P suchthat B = P−1AP . You should prove that “similarity” is an equivalencerelation on Mn(F ) and then go on to complete the details giving a proof ofthe following corollary.

Corollary 4.7.3. If A, B ∈ Mn(F ), and if V is an n-dimensional vectorspace over F , then A and B are similar if and only if there are bases B1 andB2 of V and T ∈ L(V ) such that A = [T ]B1 and B = [T ]B2.

The Dual SpaceWe now specialize to the case where V = F is viewed as a vector space

over F . Here L(U, F ) is denoted U∗ and is called the dual space of U . Anelement of L(U, F ) is called a linear functional. Write B = (u1, . . . , un) forthe fixed ordered basis of U . Then 1 ∈ F is a basis of F over F , so we writethis basis as 1 = (1) and m = 1, and there is a basis B∗ of U∗ defined byB∗ = (f1, . . . , fn), where fi(uj) = δij ∈ F . This basis B∗ is called the basisdual to B. If f ∈ U∗ satisfies f(uj) = cj for 1 ≤ j ≤ n, then

[f ]1,B1= [c1, . . . , cn] = [f(u1), . . . , f(un)].

As above in the more general case, if g =∑

i cifi, then g(uj) =∑

i cifi(uj) =cj, so f = g, and [f ]B∗ = [c1, . . . , cn]T = ([f ]1,B1

)T . We restate this in general:

For f ∈ U∗, [f ]B∗ = ([f ]1,B1)T .

Page 44: Linear Algebra by S. E. Payne

44 CHAPTER 4. LINEAR TRANSFORMATIONS

Given a vector space U , let GL(U) denote the set of all invertible linearoperators on U . If we let composition of maps be the binary operation ofGL(U), then GL(U) turns out to be a group called the general linear groupon U .

Suppose that T ∈ GL(U), i.e., T is an invertible element of L(U,U). Wedefine a map T : U∗ → U∗ by

T (f) = f ◦ T−1, for all f ∈ U∗.

We want to determine the matrix [T ]B∗,B∗ . We know that the jth column of

this matrix is [T (fj)]B∗ = [fj◦T−1]B∗ =([fj ◦ T−1]1,B

)T=([fj]1,B · [T−1]B,B

)T=

((0, · · · , 1j, · · · , 0) ([T ]B,B)−1)T = ([T ]B,B)−T ·

0...

1j...0

= jth column of ([T ]B,B)−T .

This says that

[T ]B∗,B∗ = ([T ]B,B)−T .

Exercise 18. asks you to show that the map GL(V )→ GL(V ∗) : T 7→ Tis an isomorphism.

4.8 Exercises

1. Let T ∈ L(U, V ). Then null(T ) is a subspace of U and Im(T ) is asubspace of V .

2. Suppose that V and W are finite-dimensional and that U is a subspaceof V . Prove that there exists a T ∈ L(V,W ) such that null(T ) = U ifand only if dim(U) ≥ dim(V )− dim(W ).

3. Let T ∈ L(V ). Put R = Im(T ) and N = null(T ). Note that both Rand N are T -invariant. Show that R has a complementary T -invariantsubspace W (i.e., V = R⊕W and T (W ) ⊆ W ) if and only if R∩N ={~0}, in which case N is the unique T -invariant subspace complementaryto R.

Page 45: Linear Algebra by S. E. Payne

4.8. EXERCISES 45

4. State and prove Theorem 4.3.1 for linear maps (instead of for matrices).

5. Prove Corollary 4.6.3

6. If T ∈ L(U, V ), we know that as a function from U to V , T has aninverse if and only if it is bijective (i.e., one-to-one and onto). Showthat when T is invertible as a function, then its inverse is in L(V, U).

7. If T ∈ L(U, V ) is invertible, and if B1 is a basis for U and B2 is a basisfor V , then ([T ]B2,B1)

−1 = [T−1]B1,B2 .

8. Two vector spaces U and V are said to be isomorphic provided there isan invertible T ∈ L(U, V ). Show that if U and V are finite-dimensionalvector spaces over F , then U and V are isomorphic if and only ifdim(U) = dim(V ).

9. Let A ∈ Mm,n(F ) and b ∈ Mm,1(F ) = Fm. Consider the matrix

equation A~x = ~b as a system of m linear equations in n unknownsx1, . . . , xn. Interpret Obs. 4.2.3 for this system of linear equations.

10. Let B1, B2 be bases for U and V , respectively, with dim(U) = n anddim(V ) = m. Show that the map

M : L(U, V )→Mm,n(F ) : T 7→ [T ]B2,B1

is an invertible linear map.

11. Suppose that V is finite-dimensional and that T ∈ L(V ). Show thatthe following are equivalent:

(i) T is invertible.

(ii) T is injective.

(iii) T is surjective.

12. Suppose that V is finite dimensional and S, T ∈ L(V ). Prove that STis invertible if and only if both S and T are invertible.

13. Suppose that V is finite dimensional and T ∈ L(V ). Prove that Tis a scalar multiple of the identity if and only if ST = TS for everyS ∈ L(V ).

Page 46: Linear Algebra by S. E. Payne

46 CHAPTER 4. LINEAR TRANSFORMATIONS

14. Suppose that W is finite dimensional and T ∈ L(V,W ). Prove that Tis injective if and only if there exists an S ∈ L(W,V ) such that ST isthe identity map on V .

15. Suppose that V is finite dimensional and T ∈ L(V,W ). Prove that Tis surjective if and only if there exists an S ∈ L(W,V ) such that TS isthe identity map on W .

16. Let V be the vector space over the realsR consisting of the polynomialsin x of degree at most 4 with coefficients in R, and with the usualaddition of vectors (i.e., polynomials) and scalar multiplication.

Let B1 = {1, x, x2, . . . , x4} be the standard ordered basis of V . Let Wbe the vector space of 2 × 3 matrices over R with the usual additionof matrices and scalar multiplication. Let B2 be the ordered basis of

W given as follows: B2 = {v1 =

(1 0 00 0 0

), v2 =

(0 1 00 0 0

), v3 =(

0 0 10 0 0

), v4 =

(0 0 01 0 0

), v5 =

(0 0 00 1 0

), v6 =

(0 0 00 0 1

)}.

Define T : V 7→ W by:

For f = a0 + a1x+ a2x2 + a3x

3 + a4x4, put

T (f) =

(0 a3 a2 + a4

a1 + a0 a0 0

).

Construct the matrix A = [T ]B1,B2 that represents T with respect tothe pair B1,B2 of ordered bases.

17. Let T ∈ L(V ). Prove or disprove each of the following:

(a) V = null(T )⊕ range(T ).

(b) There exists a subspace U of V such that U ∩ null(T ) = {0} andrange(T ) = {T (u) : u ∈ U}.

18. Let V be a finite dimensional vector space over F . Show that the map

GL(V )→ GL(V ∗) : T 7→ T

is an isomorphism.

Page 47: Linear Algebra by S. E. Payne

Chapter 5

Polynomials

5.1 Algebras

It is often the case that basic facts about polynomials are taken for granted asbeing well-known and the subject is never developed in a formal manner. Inthis chapter, which we usually assign as independent reading, we wish to givethe student a somewhat formal introduction to the algebra of polynomialsover a field. It is then natural to generalize to polynomials with coefficientsfrom some more general algebraic structure, such as a commutative ring. Thetitle of the course for which this book is intended includes the words “linearalgebra,” so we feel some obligation to define what a linear algebra is.

Definition Let F be a field. A linear algebra over the field F is a vectorspace A over F with an additional operation called multiplication of vectorswhich associates with each pair of vectors u, v ∈ A a vector uv in A calledthe product of u and v in such a way that

(a) multiplication is associative: u(vw) = (uv)w for all u, v, w ∈ A;

(b) multiplication distributes over addition: u(v +w) = (uv) + (uw) and(u+ v)w = (uw) + (vw), for all u, v, w ∈ A;

(c) for each scalar c ∈ F , c(uv) = (cu)v = u(cv) for all u, v ∈ A.

If there is an element 1 ∈ A such that 1u = u1 = u for each u ∈ A, wecall A a linear algebra with identity over F , and call 1 the identity of A. Thealgebra A is called commutative provided uv = vu for all u, v ∈ A.

Warning: What we have just called a “linear algebra” is sometimescalled a “linear associative algebra,” because multiplication of vectors is as-sociative. There are important situations in which nonassociative linear al-

47

Page 48: Linear Algebra by S. E. Payne

48 CHAPTER 5. POLYNOMIALS

gebras (i.e., not necessarily associative algebras) are studied. We will notmeet them in this course, so for us all linear algebras are associative.

Example 5.1.1. The set of n × n matrices over a field, with the usual op-erations, is a linear algebra with identity; in particular the field itself is analgebra with identity. This algebra is not commutative if n ≥ 2. Of course,the field itself is commutative.

Example 5.1.2. The space of all linear operators on a vector space, withcomposition as the product, is a linear algebra with identity. It is commutativeif and only if the space is one-dimensional.

Now we turn our attention to the construction of an algebra which isquite different from the two just given. Let F be a field and let S be the setof all nonnegative integers. We have seen that the set of all functions fromS into F is a vector space which we now denote by F∞. The vectors in F∞

are just infinite sequences (i.e., lists) f = (f0, f1, f2, . . .) of scalars fi ∈ F . Ifg = (g0, g1, g2, . . .) and a, b ∈ F , then af + bg is the infinite list given by

af + bg = (af0 + bg0, af1 + bg1, . . .) (5.1)

We define a product in F∞ by associating with each pair (f, g) of vectorsin F∞ the vector fg which is given by

(fg)n =n∑i=0

fign−i, n = 0, 1, 2, . . . (5.2)

Since multiplication in F is commutative, it is easy to show that multi-plication in F∞ is also commutative. In fact, it is a relatively routine taskto show that F∞ is now a linear algebra with identity over F . Of course thevector (1, 0, 0, . . .) is the identity, and the vector x = (0, 1, 0, 0, . . .) plays adistinguished role. Throughout this chapter x will continue to denote thisparticular vector (and will never be an element of the field F ). The productof x with itself n times will be denoted by xn, and by convention x0 = 1.Then

x2 = (0, 0, 1, 0, . . .), x3 = (0, 0, 0, 1, 0, . . .), etc.

Obs. 5.1.3. The list (1, x, x2, . . .) is both independent and infinite. Thus thealgebra F∞ is not finite dimensional.

Page 49: Linear Algebra by S. E. Payne

5.2. THE ALGEBRA OF POLYNOMIALS 49

The algebra F∞ is sometimes called the algebra of formal power seriesover F . The element f = (f0, f1, f2, . . .) is frequently written as

f =∞∑n=0

fnxn. (5.3)

This notation is very convenient, but it must be remembered that it ispurely formal. In algebra there is no such thing as an ‘infinite sum,’ and thepower series notation is not intended to suggest anything about convergence.

5.2 The Algebra of Polynomials

Definition Let F [x] be the subspace of F∞ spanned by the vectors 1, x, x2, . . . ,.An element of F [x] is called a polynomial over F .

Since F [x] consists of all (finite) linear combinations of x and its powers,a non-zero vector f in F∞ is a polynomial if and only if there is an integern ≥ 0 such that fn 6= 0 and such that fk = 0 for all integers k > n. Thisinteger (when it exists) is called the degree of f and is denoted by deg(f).Thezero polynomial is said to have degree −∞. So if f ∈ F [x] has degree n, itmay be written in the form

f = f0x0 + f1x+ f2x

2 + · · ·+ fnxn, fn 6= 0.

Usually f0x0 is simply written f0 and called a scalar polynomial. A non-zero

polynomial f of degree n such that fn = 1 is called a monic polynomial. Theverification of the various parts of the next result guaranteeing that A = F [x]is an algebra is routine and is left to the reader.

Theorem 5.2.1. Let f and g be non-zero polynomials over F . Then(i) fg is a non-zero polynomial;(ii) deg(fg) = deg(f) + deg(g) ;(iii) fg is a monic polynomial if both f and g are monic;(iv) fg is a scalar polynomial if and only if both f and g are scalar

polynomials;(v) deg(f + g) ≤ max{deg(f), deg(g)}.

Corollary 5.2.2. The set F [x] of all polynomials over a given field F withthe addition and multiplication given above is a commutative linear algebrawith identity over F .

Page 50: Linear Algebra by S. E. Payne

50 CHAPTER 5. POLYNOMIALS

Corollary 5.2.3. Suppose f, g, and h are polynomials over F such thatf 6= 0 and fg = fh. Then g = h.

Proof. Since fg = fh, also f(g − h) = 0. Since f 6= 0, it follows from (i)above that g − h = 0.

Let f =∑m

i=0 fixi and g =

∑nj=0 gjx

j, and interpret fk = 0, gt = 0, ifk > m, t > n, respectively. Then

fg =∑i,j

figjxi+j =

m+n∑i=0

(i∑

j=0

fjgi−j

)xi,

where the first sum is extended over all integer pairs (i, j) with 0 ≤ i ≤ mand 0 ≤ j ≤ n.

Definition Let A be a linear algebra with identity over the field F . Wedenote the identity of A by 1 and make the convention that u0 = 1 for eachu ∈ A. Then to each polynomial f =

∑ni=0 fix

i over F and u ∈ A, weassociate an element f(u) in A by the rule

f(u) =n∑i=0

fiui.

Warning: f0u0 = f0 · 1, where 1 is the multiplicative identity of A.

Example 5.2.4. Let C be the field of complex numbers and let f = x2 + 2.

(a) If A = C and z ∈ C, f(z) = z2 + 2, in particular f(3) = 11 and

f

(1 + i

1− i

)= 1.

(b) If A is the algebra of all 2× 2 matrices over C and if

B =

(1 0−1 2

),

then

f(B) = 2

(1 00 1

)+

(1 0−1 2

)2

=

(3 0−3 6

).

Page 51: Linear Algebra by S. E. Payne

5.2. THE ALGEBRA OF POLYNOMIALS 51

(c) If A is the algebra of all linear operators on C3 and T is the element ofA given by

T (c1, c2, c3) = (i√

2c1, c2, i√

2c3),

then f(T ) is the linear operator on C3 defined by

f(T )(c1, c2, c3) = (0, 3c2, 0).

(d) If A is the algebra of all polynomials over C and g = x4 + 3i, then f(g)is the polynomial in A given by

f(g) = −7 + 6ix4 + x8.

Theorem 5.2.5. Let F be a field, A a linear algebra with identity over F ,f, g ∈ F [x], u ∈ A and c ∈ F . Then:

(i) (cf + g)(u) = cf(u) + g(u);(ii) (fg)(u) = f(u)g(u) = (gf)(u).

Proof. We leave (i) as an exercise. So for (ii), suppose

f =m∑i=0

fixi and g =

n∑j=0

gjxj.

Recall that fg =∑

i,j figjxi+j. So using (i) we obtain

(fg)(u) =∑i,j

figjui+j =

(m∑i=0

fiui

)(n∑j=0

gjuj

)= f(u)g(u).

Fix u ∈ A and define Eu : F [x]→ A by

Eu(f) = f(u). (5.4)

Using Theorem 5.2.5 it is now easy to see that the map Eu : F [x] → Ais an algebra homomorphism, i.e., it preserves addition and multiplication.There is a special case of this that is so important for us that we state it asa separate corollary.

Corollary 5.2.6. If A = L(V ) and T ∈ A, and if f, g ∈ F [x],then(f · g)(T ) = f(T ) ◦ g(T ).

Page 52: Linear Algebra by S. E. Payne

52 CHAPTER 5. POLYNOMIALS

5.3 Lagrange Interpolation

Throughout this section F is a fixed field and t0, t1, . . . , tn are n+ 1 distinctelements of F . Put V = {f ∈ F [x] : deg(f) ≤ n}, and define Ei : V → Fby Ei(f) = f(ti), 0 ≤ i ≤ n.

By Theorem 5.2.5 each Ei is a linear functional on V . Moreover, we showthat B∗ = (E0, E1, . . . , En) is the basis of V ∗ dual to a particular basis of V .

Put

pi =∏j 6=i

(x− tjti − tj

).

Then each pi has degree n, so belongs to V , and

Ej(pi) = pi(tj) = δij. (5.5)

It will turn out that B = (p0, . . . , pn) is a basis for V , and then Eq. 5.5expresses what we mean by saying that B∗ is the basis dual to B.

If f =∑n

i=0 cipi, then for each j

f(tj) =∑i

cipi(tj) = cj. (5.6)

So if f is the zero polynomial, each cj must equal 0, implying that the list(p0, . . . , pn) is linearly independent in V . Since (1, x, x2, . . . , xn) is a basis forV , clearly dim(V ) = n+ 1. Hence B = (p0, . . . , pn) must be a basis for V . Itthen follows from Eq. 5.6 that for each f ∈ V , we have

f =n∑i=0

f(ti)pi. (5.7)

The expression in Eq. 5.7 is known as Lagrange’s Interpolation For-mula. Setting f = xj in Eq. 5.7 we obtain

xj =n∑i=0

(ti)jpi (5.8)

Definition Let A1 and A2 be two linear algebras over F . They are saidto be isomorphic provided there is a one-to-one mapping u 7→ u′ of A1 ontoA2 such that

(a) (cu+ dv)′ = cu′ + dv′

Page 53: Linear Algebra by S. E. Payne

5.3. LAGRANGE INTERPOLATION 53

and(b) (uv)′ = u′v′

for all u, v ∈ A1 and all scalars c, d ∈ F . The mapping u 7→ u′ is calledan isomorphism of A1 onto A2. An isomorphism of A1 onto A2 is thus avector space isomorphism of A1 onto A2 which has the additional propertyof preserving products.

Example 5.3.1. Let V be an n-dimensional vector space over the field F . Aswe have seen earlier, each ordered basis B of V determines an isomorphismT 7→ [T ]B of the algebra of linear operators on V onto the algebra of n × nmatrices over F . Suppose now that S is a fixed linear operator on V and thatwe are given a polynomial

f =n∑i=0

cixi

with coefficients ci ∈ F . Then

f(S) =n∑i=0

ciSi.

Since T 7→ [T ]B is a linear mapping,

[f(S)]B =n∑i=0

ci[Si]B.

From the additional fact that

[T1T2]B = [T1]B[T2]B

for all T1, T2 ∈ L(V ), it follows that

[Si]B = ([S]B)i , 2 ≤ i ≤ n.

As this relation is also valid for i = 0 and 1, we obtain the result that

Obs. 5.3.2.[f(S)]B = f ([S]B) .

In other words, if S ∈ L(V ), the matrix of a polynomial in S, with respectto a given basis, is the same polynomial in the matrix of S.

Page 54: Linear Algebra by S. E. Payne

54 CHAPTER 5. POLYNOMIALS

5.4 Polynomial Ideals

In this section we are concerned primarily with the fact that F [x] is a prin-cipal ideal domain.

Lemma 5.4.1. Suppose f and d are non-zero polynomials in F [x] such thatdeg(d) ≤ deg(f). Then there exists a poynomial g ∈ F [x] for which

deg(f − dg) < deg(f).

Note: This includes the possibility that f = dg so deg(f − dg) = −∞.

Proof. Suppose

f = amxm +

m−1∑i=0

aixi, am 6= 0

and that

d = bnxn +

n−1∑i=0

bixi, bn 6= 0.

Then m ≥ n and

f −(ambn

)xm−nd = 0 or deg

[f −

(ambn

)xm−nd

]< deg(f).

Thus we may take g =(ambn

)xm−n.

This lemma is useful in showing that the usual algorithm for “long divi-sion” of polynomials works over any field.

Theorem 5.4.2. If f, d ∈ F [x] and d 6= 0, then there are unique polynomialsq, r ∈ F [x] such that

(i) f = dq + r;(ii) deg(r) < deg(d).

Proof. If deg(f) < deg(d) we may take q = 0 and r = f . In case f 6= 0and deg(f) ≥ deg(d), the preceding lemma shows that we may choose apolynomial g ∈ F [x] such that deg(f − dg) < deg(f). If f − dg 6= 0 anddeg(f − dg) ≥ deg(d) we choose a polynomial h ∈ F [x] such that

deg[f − d(g + h)] < deg(f − dg).

Page 55: Linear Algebra by S. E. Payne

5.4. POLYNOMIAL IDEALS 55

Continuing this process as long as necessary, we ultimately obtain polynomi-als q, r satisfying (i) and (ii).

Suppose we also have f = dq1 + r1 where deg(r1) < deg(d). Thendq+ r = dq1 + r1 and d(q− q1) = r1− r. If q−a1 6= 0, then d(q− q1) 6= 0 and

deg(d) + deg(q − q1) = deg(r1 − r).

But since the degree of r1 − r is less than the degree of d, this is impossible.Hence q = a1 and then r = r1.

Definition Let d be a non-zero polynomial over the field F . If f ∈ F [x],the preceding theorem shows that there is at most one polynomial q ∈ F [x]such that f = dq. If such a q exists we say that d divides f , that f is divisibleby d, and call q the quotient of f by d. We also write q = f/d.

Corollary 5.4.3. Let f ∈ F [x], and let c ∈ F . Then f is divisible by x− cif and ony if f(c) = 0.

Proof. By the theorem, f = (x− c)q + r where r is a scalar polynomial. ByTheorem 5.2.5,

f(c) = 0q(c) + r(c) = r(c).

Hence r = 0 if and only if f(c) = 0.

Definition Let F be a field. An element c ∈ F is said to be a root or azero of a given polynomial f ∈ F [x] provided f(c) = 0.

Corollary 5.4.4. A polynomial f ∈ F [x] of degree n has at most n roots inF .

Proof. The result is obviously true for polynomials of degree 0 or 1. Weassume it to be true for polynomials of degree n − 1. If a is a root of f ,f = (x − a)q where q has degree n − 1. Since f(b) = 0 if and only if a = bor q(b) = 0, it follows by our induction hypothesis that f has at most nroots.

Definition Let F be a field. An ideal in F [x] is a subspace M of F [x]such that fg belongs to M whenever f ∈ F [x] and g ∈M .

Page 56: Linear Algebra by S. E. Payne

56 CHAPTER 5. POLYNOMIALS

Example 5.4.5. If F is a field and d ∈ F [x], the set M = dF [x] of allmultiples df of d by arbitrary f in F [x] is an ideal. This is because d ∈ M(so M is nonempty), and it is easy to check that M is closed under additionand under multiplication by any element of F [x]. The ideal M is called theprincipal ideal generated by d and is denoted by dF [x]. If d is not the zeropolynomial and its leading coefficient is a, then d1 = a−1d is monic andd1F [x] = dF [x].

Example 5.4.6. Let d1, . . . , dn be a finite number of polynomials over F .Then the (vector space) sum M of the subspaces diF [x] is a subspace and isalso an ideal. M is the ideal generated by the polynomials d1, . . . , dn.

The following result is the main theorem on ideals in F [x].

Theorem 5.4.7. Let M be any non-zero ideal in F [x]. Then there is aunique monic polynomial d ∈ F [x] such that M is the principal ideal dF [x]generated by d.

Proof. Among all nonzero polynomials in M there is (at least) one of minimaldegree. Hence there must be a monic polynomial d of least degree in M .Suppose that f is any element of M . We can divide f by d and get aunique quotient and remainder: f = qd + r where deg(r) < deg(d). Thenr = f − qd ∈ M , but deg(r) is less than the smallest degree of any nonzeropolynomial in M . Hence r = 0. So f = qd. This shows that M ⊆ dF [x].Clearly d ∈ M implies dF [x] ⊆ M , so in fact M = dF [x]. If d1 and d2

are two monic polynomials in F [x] for which M = d1F [x] = d2F [x], thend1 divides d2 and d2 divides d1. Since they are monic, they clearly must beidentical.

Definition If p1, . . . , pk ∈ F [x] and not all of them are zero, then themonic generator d of the ideal p1F [x] + · · · + pkF [x] is called the greatestcomon divisor (gcd) of p1, · · · , pk. We say that the polynomials p1, . . . , pkare relatively prime if the greatest common divisor is 1, or equivalently if theideal they generate is all of F [x].

There is a very important (but now almost trivial) consequence of The-orem 5.4.7. It is easy to show that if A is any n × n matrix over any fieldF , the set {f(x) ∈ F [x] : f(A) = 0} is an ideal in F [x]. Hence there isa unique monic polynomial p(x) of least degree such that p(A) = 0, and ifg(x) ∈ F [x] satisfies g(A) = 0, then g(x) = p(x)h(x) for some h(x) ∈ F [x].This polynomial p(x) is called the minimal polynomial of A.

Page 57: Linear Algebra by S. E. Payne

5.4. POLYNOMIAL IDEALS 57

The exercises at the end of this chapter are to be considered an integralpart of the chapter. You should study them all.

Definition The field F is called algebraically closed provided each poly-nomial in F [x] that is irreducible over F has degree 1.

To say that F is algebraically closed means the every non-scalar irre-ducible monic polynomial over F is of the form x − c. So to say F is alge-braically closed really means that each non-scalar polynomial f in F [x] canbe expressed in the form

f = c(x− c1)n1 · · · (x− ck)nk

where c is a scalar and c1, . . . , ck are distinct elements of F . It is also truethat F is algebraically closed provided that each non-scalar polynomial overF has a root in F .

The field R of real numbers is not algebraically closed, since the polyno-mial (x2 + 1) is irreducible over R but not of degree 1. The FundamentalTheorem of Algebra states that the field C of complex numbers is algebraicallyclosed. We shall prove this theorem later after we have introduced the con-cepts of determinant and of eigenvalues.

The Fundamental Theorem of Algebra also makes it clear what the possi-bilities are for the prime factorization of a polynomial with real coefficients. Iff is a polynomial with real coefficients and c is a complex root of f , then thecomplex conjugate c is also a root of f . Therefore, those complex roots whichare not real must occur in conjugate pairs, and the entire set of roots has theform {b1, . . . , br, c1, c1, . . . , ck, ck}, where b1, . . . , br are real and c1, . . . , ck arenon-real complex numbers. Thus f factors as

f = c(x− b1) . . . (x− br)p1 · · · pk

where pi is the quadratic polynomial

pi = (x− ci)(x− ci).

These polynomials pi have real coefficients. We see that every irreduciblepolynomial over the real number field has degree 1 or 2. Each polynomialover R is the product of certain linear factors given by the real roots of fand certain irreducible quadratic polynomials.

Page 58: Linear Algebra by S. E. Payne

58 CHAPTER 5. POLYNOMIALS

5.5 Exercises

1. (The Binomial Theorem) Let(mk

)= m!

k!(m−k)!be the usual binomial co-

efficient. Let a, b be elements of any commutative ring. Then

(a+ b)m =m∑i=0

(m

k

)am−kbk.

Note that the binomial coefficient is an integer that may be reducedmodulo any modulus p. It follows that even when the denominator of(mk

)appears to be 0 in some ring of characteristic p, for example, this

binomial coefficient can be interpreted modulo p. For example(6

3

)= 20 ≡ 2 (mod 3),

even though 3! is zero modulo 3.

The derivative of the polynomial

f = c0 + c1x+ · · ·+ cnxn

is the polynomial

f ′ = Df = c1 + 2c2x+ · · ·+ ncnxn−1.

Note: D is a linear operator on F [x].

2. (Taylor’s Formula). Let F be any field, let n be any positive integer,and let f ∈ F have degree m ≤ n. Then

f =n∑k=0

Dk(f)(c)

k!(x− c)k.

Be sure to explain how to deal with the case where k! is divisible bythe characteristic of the field F .

If f ∈ F [x] and c ∈ F , the multiplicity of c as a root of f is the largestpositive integer r such that (x− c)r divides f .

Page 59: Linear Algebra by S. E. Payne

5.5. EXERCISES 59

3. Show that if the multiplicity of c as a root of f is r ≥ 2, then themultiplicity of c as a root of f ′ is at least r − 1.

4. Let A =

(a bc d

). Let M = {f ∈ F [x] : f(A) = 0}. Show that M is

a nonzero ideal. (Hint: consider the polynomial f(x) = x2− (a+d)x+(ad− bc).)Definition Let F be a field. A polynomial f ∈ F [x] is said to bereducible over F provided there are polynomials g, h ∈ F [x] with degreeat least 1 for which f = gh. If f is not reducible over F , it is said tobe irreducible over F . A polynomial p(x) ∈ F [x] of degree at least 1is said to be a prime polynomial over F provided whenever p dividesa product gh of two polynomials in F [x] then it has to divide at leastone of g and h.

5. Show that a polynomial p(x) ∈ F [x] with degree at least 1 is primeover F if and only if it is irreducible over F .

6. (The Primary Decomposition of f) If F is a field, a non-scalar monicpolynomial in F [x] can be factored as a product of monic primes inF [x] in one and, except for order, only one way. If p1, . . . , pk are thedistinct monic primes occurring in this factorization of f , then

f = pn11 p

n22 · · · p

nkk ,

where ni is the number of times the prime pi occurs in this factorization.This decomposition is also clearly unique and is called the primarydecomposition of f (or simply the prime factorization of f).

7. Let f be a non-scalar monic polynomial over the field F , and let

f = pn11 · · · p

nkk

be the prime factorization of f . For each j, 1 ≤ j ≤ k, let

fj =f

pjnj=∏i 6=j

pnii .

Then f1, . . . , fk are relatively prime.

Page 60: Linear Algebra by S. E. Payne

60 CHAPTER 5. POLYNOMIALS

8. Using the same notation as in the preceding problem, suppose thatf = p1 · · · pk is a product of distinct non-scalar irreducible polynomialsover F . So fj = f/pj. Show that

f ′ = p′1f1 + p′2f2 + · · ·+ p′kfk.

9. Let f ∈ F [x] have derivative f ′. Then f is a product of distinct irre-ducible polynomials over F if and only if f and f ′ are relatively prime.

10. Euclidean algorithm for polynomials

Let f(x) and g(x) be polynomials over F for which

deg(f(x)) ≥ deg(g(x)) ≥ 1.

Use the division algorithm for polynomials to compute polynomialsqi(x) and ri(x) as follows.

f(x) = q1(x)g(x) + r1(x), degr1(x) < deg(g(x)). If r1(x) 6= 0, then

g(x) = q2r1(x) + r2(x), degr2(x) < deg(g(x)). If r2(x) 6= 0, then

r1(x) = q3(x)r2(x) + r3(x), degr3(x) < deg(r2(x)). If r3(x) 6= 0, then

r2(x) = q4(x)r3(x) + r4(x), degr4(x) < deg(r3(x)). If r4(x) 6= 0, then...

rj(x) = qj+2(x)rj+1(x) + rj+2(x), degrj+2(x) < deg(rj+1(x)). If rj+2(x) = 0,

rj(x) = qj+2(x)rj+1(x) + 0.

Show that rj+1(x) = gcd(f(x), g(x)). Then use these equations to ob-tain polynomials a(x) and b(x) for which rj+1(x) = a(x)f(x)+b(x)g(x).The case where 1 is the gcd of f(x) and g(x) is especially useful.

Page 61: Linear Algebra by S. E. Payne

Chapter 6

Determinants

We assume that the reader has met the notion of a commutative ring K with1. Our main goal in this chapter is to study the usual determinant functiondefined on the set of square matrices with entries from such a K. However,essentially nothing from the general theory of commutative rings with 1 willbe used.

One of the main types of application of the notion of determinant isto determinants of matrices whose entries are polynomials in one or moreindeterminates over a field F . So we might have K = F [x], the ring ofpolynomials in the indeterminate x with coefficients from the field F . It isalso quite useful to consider the theory of determinants over the ring Z ofrational integers.

6.1 Determinant Functions

Throughout these notes K will be a commutative ring with identity. Thenfor each positive integer n we wish to assign to each n× n matrix over K ascalar (element of K) to be known as the determinant of the matrix. As soonas we have defined these terms we may say that the determinant function isn-linear alternating with value 1 at the identity matrix.

Definition Let D be a function which assigns to each n × n matrix Aover K a scalar D(A) in K. We say that D is n-linear provided that for eachi, 1 ≤ i ≤ n, D is a linear function of the ith row when the other n− 1 rowsare held fixed.

Perhaps this definition needs some clarification. If D is a function from

61

Page 62: Linear Algebra by S. E. Payne

62 CHAPTER 6. DETERMINANTS

Mm,n(K) into K, and if α1, . . . , αn are the rows of the matrix A, we alsowrite

D(A) = D(α1, . . . , αn),

that is, we think of D as a function of the rows of A. The statement that Dis n-linear then means

D(α1, . . . , cαi + α′i, . . . , αn) = cD(α1, . . . , αi, . . . , αn) + (6.1)

+ D(α1, . . . , α′i, . . . , αn).

If we fix all rows except row i and regard D as a function of the ith row, itis often convenient to write D(αi) for D(A). Thus we may abbreviate Eq. 6.1to

D(cαi + α′i) = cD(αi) +D(α′i),

so long as it is clear what the meaning is.

In the following sometimes we use Aij to denote the element in row i andcolumn j of the matrix A, and sometimes we write A(i, j).

Example 6.1.1. Let k1, . . . , kn be positive integers, 1 ≤ ki ≤ n, and let a beany element of K. For each n× n matrix A over K, define

D(A) = aA(1, k1)A(2, k2) · · ·A(n, kn). (6.2)

Then the function defined by Eq. 6.2 is n-linear. For, if we regard D asa function of the ith row of A, the others being fixed, we may write

D(αi) = A(i, ki)b

where b is some fixed element of K (b is a multiplied by one entry of A fromeach row other than the i-th row.) . Let α′i = (A′i1, . . . , A

′in). Then we have

D(cαi + α′i) = [cA(i, ki) + A′(i, ki)]b

= cD(αi) +D(α′i).

Thus D is a linear function of each of the rows of A.

A particular n-linear function of this type is just the product of the diag-onal entries:

D(A) = A11A22 · · ·Ann.

Page 63: Linear Algebra by S. E. Payne

6.1. DETERMINANT FUNCTIONS 63

Example 2. We find all 2-linear functions on 2×2 matrices over K. LetD be such a function. If we denote the rows of the 2× 2 identity matrix byε1 and ε2, then we have

D(A) = D(A11ε1 + A12ε2, A21ε1 + A22ε2).

Using the fact that D is 2-linear, we have

D(A) = A11D(ε1, A21ε1 + A22ε2) + A12D(ε2, A21ε1 + A22ε2) =

= A11A21D(ε1, ε1) + A11A22D(ε1, ε2) + A12A21D(ε2, ε1) + A12A22D(ε2, ε2).

This D is completely determined by the four scalars

D(ε1, ε1), D(ε1, ε2), D(ε2, ε1), D(ε2, ε2).

It is now routine to verify the following. If a, b, c, d are any four scalarsin K and if we define

D(A) = A11A21a+ A11A22b+ A12A21c+ A12A22d,

then D is a 2-linear function on 2× 2 matrices over K and

D(ε1, ε1) = a, D(ε1, ε2) = bD(ε2, ε1) = c, D(ε2, ε2) = d.

Lemma 6.1.2. A linear combination of n-linear functions is n-linear.

Proof. It suffices to prove that a linear combination of two n-linear functionsis n-linear. Let D and E be n-linear functions. If a and b are elements of Kthe linear combination aD + bE is defined by

(aD + bE)(A) = aD(A) + bE(A).

Hence, if we fix all rows except row i,

(aD + bE)(cαi + α′i) = aD(cαi + α′i) + bE(cαi + α′i)

= acD(αi) + aD(α′i) + bcE(αi) + bE(α′i)

= c(aD + bE)(αi) + (aD + bE)(α′i).

Page 64: Linear Algebra by S. E. Payne

64 CHAPTER 6. DETERMINANTS

NOTE: If K is a field and V is the set of n×n matrices over K, the abovelemma says the following. The set of n-linear functions on V is a subspaceof the space of all functions from V into K.

Example 3. Let D be the function defined on 2× 2 matrices over K by

D(A) = A11A22 − A12A21. (6.3)

This D is the sum of two functions of the type described in Example 1:

D = D1 +D2

D1(A) = A11A22

D2(A) = −A12A21

(6.4)

Most readers will recognize this D as the “usual” determinant functionand will recall that it satisfies several additional properties, such as the fol-lowing one.

6.1.3 n-Linear Alternating Functions

Definition: Let D be an n-linear function. We say D is alternating providedD(A) = 0 whenever two rows of A are equal.

Lemma 6.1.4. Let D be an n-linear alternating function, and let A be n×n.If A′ is obtained from A by interchanging two rows of A, then D(A′) =−D(A).

Proof. If the ith row of A is α and the jth row of A is β, i 6= j, and all otherrows are being held constant, we write D(α, β) in place of D(A).

D(α + β, α + β) = D(α, α) +D(α, β) +D(β, α) +D(β, β).

By hypothesis, D(α + β, α + β) = D(α, α) = D(β, β) = 0. So

0 = D(α, β) +D(β, α).

Page 65: Linear Algebra by S. E. Payne

6.1. DETERMINANT FUNCTIONS 65

If we assume that D is n-linear and has the property that D(A′) = −D(A)when A′ is obtained from A by interchanging any two rows of A, then if Ahas two equal rows, clearly D(A) = −D(A). If the characteristic of the ringK is odd or zero, then this forces D(A) = 0, so D is alternating. But, forexample, if K is an integral domain with characteristic 2, this is clearly notthe case.

6.1.5 A Determinant Function - The LaplaceExpansion

Definition: Let K be a commutative ring with 1, and let n be a positiveinteger. Suppose D is a function from n×n matrices over K into K. We saythat D is a determinant function if D is n-linear, alternating and D(I) = 1.

It is clear that there is a unique determinant function on 1× 1 matrices,and we are now in a position to handle the 2×2 case. It should be clear thatthe function given in Example 3. is a determinant function. Furthermore,the formulas exhibited in Example 2. make it easy to see that the D givenin Example 3. is the unique determinant function.

Lemma 6.1.6. Let D be an n-linear function on n × n matrices over K.Suppose D has the property that D(A) = 0 when any two adjacent rows of Aare equal. Then D is alternating.

Proof. Let B be obtained by interchanging rows i and j of A, where i < j.We can obtain B from A by a succession of interchanges of pairs of adjacentrows. We begin by interchanging row i with row i+ 1 and continue until therows are in the order

α1, . . . , αi−1, αi+1, . . . , αj, αi, αj+1, . . . , αn.

This requires k = j − i interchanges of adjacent rows. We now move αjto the ith position using (k − 1) interchanges of adjacent rows. We havethus obtained B from A by 2k − 1 interchanges of adjacent rows. Thus byLemma 6.1.4,

D(B) = −D(A).

Suppose A is any n × n matrix with two equal rows, say αi = αj withi < j. If j = i + 1, then A has two equal and adjacent rows, so D(A) = 0.

Page 66: Linear Algebra by S. E. Payne

66 CHAPTER 6. DETERMINANTS

If j > i + 1, we intechange αi+1 and αj and the resulting matrix B has twoequal and adjacent rows, so D(B) = 0. On the other hand, D(B) = −D(A),hence D(A) = 0.

Lemma 6.1.7. Let K be a commutative ring with 1 and let D be an alter-nating n-linear function on n× n matrices over K. Then

(a) D(A) = 0 if one of the rows of A is 0.

(b) D(B) = D(A) if B is obtained from A by adding a scalar multiple ofone row of A to a different row of A.

Proof. For part (a), suppose the ith row αi is a zero row. Using the linearityof D in the ith row of A says D(αi + αi) = D(αi) + D(αi), which forcesD(A) = D(αi) = 0. For part (b), if i 6= j, write D(A) = D(αi, αj), withall rows other than the ith one held fixed. D(B) = D(αi + cαj, αj) =D(αi, αj) + cD(αj, αj) = D(A) + 0.

Definition: If n > 1 and A is an n × n matrix over K, we let A(i|j)denote the (n− 1)× (n− 1) matrix obtained by deleting the ith row and jthcolumn of A. If D is an (n− 1)-linear function and A is an n× n matrix, weput Dij(A) = D[A(i|j)].

Theorem 6.1.8. Let n > 1 and let D be an alternating (n−1)-linear functionon (n − 1) × (n − 1) matrices over K. For each j, 1 ≤ j ≤ n, the functionEj defined by

Ej(A) =n∑i=1

(−1)i+jAijDij(A) (6.5)

is an alternating n-linear function on n×n matrices A. If D is a determinantfunction, so is each Ej.

Proof. If A is an n × n matrix, Dij(A) is independent of the ith row of A.Since D is (n−1)-linear, it is clear that Dij is linear as a function of any rowexcept row i. Therefore AijDij(A) is an n-linear function of A. Hence Ej isn-linear by Lemma 6.1.2. To prove that Ej is alternating it will suffice toshow that Ej(A) = 0 whenever A has two equal and adjacent rows. Supposeαk = αk+1. If i 6= k and i 6= k+ 1, the matrix A(i|j) has two equal rows, andthus Dij(A) = 0. Therefore,

Ej(A) = (−1)k+jAkjDkj(A) + (−1)k+1+jA(k+1)jD(k+1)j(A).

Page 67: Linear Algebra by S. E. Payne

6.2. PERMUTATIONS & UNIQUENESS OF DETERMINANTS 67

Since αk = αk+1,

Akj = A(k+1)j and A(k|j) = A(k + 1|j).

Clearly then Ej(A) = 0.

Now suppose D is a determinant function. If I(n) is the n × n identitymatrix, then I(n)(j|j) is the (n − 1) × (n − 1) identity matrix I(n−1). Since

I(n)ij = δij, it follows from Eq. 6.5 that

Ej(I(n)) = D(I(n−1)). (6.6)

Now D(I(n−1)) = 1, so that Ej(I(n)) = 1 and Ej is a determinant function.

We emphasize that this last Theorem (together with a simple inductionargument) shows that if K is a commutative ring with identity and n ≥ 1,then there exists at least one determinant function on Kn×n. In the nextsection we will show that there is only one determinant function. The de-terminant function Ej is referred to as the Laplace expansion of the deter-minant along the jth column. There is a similar Laplace expansion of thedeterminant along the ith row of A which will eventually show up as an easycorollary.

6.2 Permutations & Uniqueness of Determi-

nants

6.2.1 A Formula for the Determinant

Suppose that D is an alternating n-linear function on n × n matrices overK. Let A be an n× n matrix over K with rows α1, α2, . . . , αn. If we denotethe rows of the n× n identity matrix over K by ε1, ε2, . . . , εn, then

αi =n∑j=1

Aijεj, 1 ≤ i ≤ n. (6.7)

Hence

Page 68: Linear Algebra by S. E. Payne

68 CHAPTER 6. DETERMINANTS

D(A) = D

(∑j

A1jεj, α2, . . . , αn

)=

∑j

A1jD(εj, α2, . . . , αn).

If we now replace α2 with∑

k A2kεk, we see that

D(A) =∑j,k

A1jA2kD(εj, εk, . . . , αn).

In this expression replace α3 by∑

lA3lεl, etc. We finally obtain

D(A) =∑

k1,k2,...,kn

A1k1A2k2 · · ·AnknD(εk1 , . . . , εkn). (6.8)

Here the sum is over all sequences (k1, k2, . . . , kn) of positive integers notexceeding n. This shows that D is a finite sum of functions of the type de-scribed by Eq. 6.2. Note that Eq. 6.8 is a consequence just of the assumptionthat D is n-linear, and that a special case was obtained earlier in Example2. Since D is alternating,

D(εk1 , εk2 , . . . , εkn) = 0

whenever two of the indices ki are equal. A sequence (k1, k2, . . . , kn) of pos-itive integers not exceeding n, with the property that no two of the ki areequal, is called a permutation of degree n. In Eq. 6.8 we need therefore sumonly over those sequences which are permutations of degree n.

A permutation of degree n may be defined as a one-to-one function fromthe set {1, 2, . . . , n} onto itself. Such a function σ corresponds to the n-tuple(σ1, σ2, . . . , σn) and is thus simply a rule for ordering 1, 2, . . . , n in somewell-defined way.

If D is an alternating n-linear function and A is a n× n matrix over K,we then have

D(A) =∑σ

A1(σ1) · · ·An(σn)D(εσ1, . . . , εσn) (6.9)

where the sum is extended over the distinct permutations σ of degree n.

Page 69: Linear Algebra by S. E. Payne

6.2. PERMUTATIONS & UNIQUENESS OF DETERMINANTS 69

Next we shall show that

D(εσ1, . . . , εσn) = ±D(ε1, . . . , εn) (6.10)

where the sign ± depends only on the permutation σ. The reason for this isas follows. The sequence (σ1, σ2, . . . , σn) can be obtained from the sequence(1, 2, . . . , n) by a finite number of interchanges of pairs of elements. Forexample, if σ1 6= 1, we can transpose 1 and σ1, obtaining (σ1, . . . , 1, . . .).Proceeding in this way we shall arrive at the sequence (σ1, . . . , σn)) after nor fewer such interchanges of pairs. Since D is alternating, the sign of itsvalue changes each time that we interchange two of the rows εi and εj. Thus,if we pass from (1, 2, . . . , n) to (σ1, σ2, . . . , σn) by means of m interchangesof pairs (i, j), we shall have

D(εσ1, . . . , εσn) = (−1)mD(ε1, . . . , εn).

In particular, if D is a determinant function

D(εσ1, . . . , εσn) = (−1)m, (6.11)

where m depends only on σ, not on D. Thus all determinant functions assignthe same value to the matrix with rows εσ1, . . . , εσn, and this value is either1 or -1.

A basic fact about permutations is the following: if σ is a permutationof degree n, one can pass from the sequence (1, 2, . . . , n) to the sequence(σ1, σ2, . . . , σn) by a succession of interchanges of pairs, and this can bedone in a variety of ways. However, no matter how it is done, the number ofinterchanges used is either always even or always odd. The permutation isthen called even or odd, respectively. One defines the sign of a permutationby

sgn σ =

{1, if σ is even;−1, if σ is odd.

We shall establish this basic properties of permutations below from whatwe already know about determinant functions. However, for the momentlet us assume this property. Then the integer m occurring in Eq. 6.11 isalways even if σ is an even permutation, and is always odd if σ is an oddpermutation. For any alternating n-linear function D we then have

D(εσ1, . . . , εσn) = (sgn σ)D(ε1, . . . , εn),

Page 70: Linear Algebra by S. E. Payne

70 CHAPTER 6. DETERMINANTS

and using Eq. 6.9 we obtain

D(A) =

[∑σ

(sgn σ)A1(σ1) · · ·An(σn)

]D(I). (6.12)

From Eq. 6.12 we see that there is precisely one determinant function onn× n matrices over K. If we denote this function by det, it is given by

det(A) =∑σ

(sgn σ)A1(σ1)A2(σ2) · · ·An(σn), (6.13)

the sum being extended over the distinct permutations σ of degree n. Wecan formally summarize this as follows.

Theorem 6.2.2. Let K be a commutative ring with 1 and let n be a positiveinteger. There is precisely one determinant function on the set of n × nmatrices over K and it is the function det defined by Eq. 6.13. If D is anyalternating n-linear function on Mn(K), then for each n× n matrix A,

D(A) = (det A)D(I).

This is the theorem we have been working towards, but we have left a gapin the proof. That gap is the proof that for a given permutation σ, when wepass from (1, 2, . . . , n) to (σ1, . . . , σn) by interchanging pairs, the number ofinterchanges is always even or always odd. This basic combintaorial fact canbe proved without any reference to determinants. However, we now pointout how it follows from the existence of a determinant function on n × nmatrices.

Let K be the ring of rational integers. Let D be a determinant functionon the n × n matrices over K. Let σ be a permutation of degree n, andsuppose we pass from (1, 2, . . . , n) to (σ1, . . . , σn) by m interchanges of pairs(i, j), i 6= j. As we showed in Eq. 6.11

(−1)m = D(εσ1, . . . , εσn),

that is, the number (−1)m must be the value of D on the matrix with rowsεσ1, . . . , εσn. If

D(εσ1, . . . , εσn) = 1,

then m must be even. If

D(εσ1, . . . , εσn) = −1,

Page 71: Linear Algebra by S. E. Payne

6.2. PERMUTATIONS & UNIQUENESS OF DETERMINANTS 71

then m must be odd.From the point of view of products of permutations, the basic property

of the sign of a permutation is that

sgn (στ) = (sgn σ)(sgn τ). (6.14)

This result also follows from the theory of determinants (but is well knownin the theory of the symmetric group independent of any determinant theory).In fact, it is an easy corollary of the following theorem.

Theorem 6.2.3. Let K be a commutative ring with identity, and let A andB be n× n matrices over K. Then

det (AB) = (det A)( det B).

Proof. Let B be a fixed n × n matrix over K, and for each n × n matrix Adefine D(A) = det (AB). If we denote the rows of A by α1, . . . , αn, then

D(α1, . . . , αn) = det (α1B, . . . , αnB).

Here αjB denotes the 1× n matrix which is the product of the 1× n matrixαj and the n× n matrix B. Since

(cαi + α′i)B = cαiB + α′iB

and det is n-linear, it is easy to see that D is n-linear. If αi = αj, thenαiB = αjB, and since det is alternating,

D(α1, . . . , αn) = 0.

Hence, D is alternating. So D is an alternating n-linear function, and byTheorem 6.2.2

D(A) = (det A)D(I).

But D(I) = det (IB) = det B, so

det (AB) = D(A) = (det A)(det B).

Page 72: Linear Algebra by S. E. Payne

72 CHAPTER 6. DETERMINANTS

6.3 Additional Properties of Determinants

Several well-known properties of the determinant function are now easy con-sequences of results we have already obtained, Eq. 6.13 and Theorem 6.2.3,for example. We give a few of these, proving some and leaving the others asrather routine exercises.

Recall that a unit u in a ring K with 1 is an element for which there isan element v ∈ K for which uv = vu = 1.

6.3.1 If A is a unit in Mn(K), then det(A) is a unit inK.

If A is an invertible n × n matrix over a commutative ring K with 1, thendet (A) is a unit in the ring K.

6.3.2 Triangular Matrices

If the square matrix A over K is upper or lower triangular, then det (A) isthe product of the diagonal entries of A.

6.3.3 Transposes

If AT is the transpose of the square matrix A, thendet (AT ) = det (A).

Proof. If σ is a permutation of degree n,

(AT )i(σi) = A(σi)i.

Hence

det (AT ) =∑σ

(sgn σ)A(σ1)1 · · ·A(σn)n.

When i = σ−1j, A(σi)i = Aj(σ−1j). Thus

A(σ1)1 · · ·A(σn)n = A1(σ−11) · · ·An(σ−1n).

Since σσ−1 is the identity permutation,

Page 73: Linear Algebra by S. E. Payne

6.3. ADDITIONAL PROPERTIES OF DETERMINANTS 73

(sgn σ)(sgn σ−1) = 1, so sgn (σ−1) = sgn (σ).

Furthermore, as σ varies over all permutations of degree n, so does σ−1.Therefore,

det (AT ) =∑σ

(sgn σ−1)A1(σ−11) · · ·An(σ−1n) = det (A).

6.3.4 Elementary Row Operations

If B is obtained from A by adding a multiple of one row of A to another (ora multiple of one column to another), then det(A) = det(B). If B = cA,then det (B) = cndet (A).

6.3.5 Triangular Block Form

Suppose an n× n matrix A is given in block form

A =

(B C0 E

),

where B is an r × r matrix, E is an s× s matrix, C is r × s, and 0 denotesthe s× r zero matrix. Then

det

(B C0 E

)= (det B)(det E). (6.15)

Proof. To prove this, define

D(B,C,E) = det

(B C0 E

).

If we fix B and C, then D is alternating and s-linear as a function of therows of E. Hence by Theorem 6.2.2

D(B,C,E) = (det E)D(B,C, I),

Page 74: Linear Algebra by S. E. Payne

74 CHAPTER 6. DETERMINANTS

where I is the s× s identity matrix. By subtracting multiples of the rows ofI from the rows of B and using the result of 6.3.4, we obtain

D(B,C, I) = D(B, 0, I).

Now D(B, 0, I) is clearly alternating and r-linear as a function of the rowsof B. Thus

D(B, 0, I) = (det B)D(I, 0, I) = 1.

Hence

D(B,C,E) = (det E)D(B,C, I)

= (det E)D(B, 0, I)

= (det E)(det B).

By taking transposes we obtain

det

(A 0B C

)= (det A)(det C).

It is now easy to see that this result generalizes immediately to the casewhere A is in upper (or lower) block triangular form.

6.3.6 The Classical Adjoint

Since the determinant function is unique and det(A) = det(AT ), we knowthat for each fixed column index j,

det (A) =n∑i=1

(−1)i+jAijdet A(i|j), (6.16)

and for each row index i,

det (A) =n∑j=1

(−1)i+jAijdet A(i|j). (6.17)

Page 75: Linear Algebra by S. E. Payne

6.3. ADDITIONAL PROPERTIES OF DETERMINANTS 75

As we mentioned earlier, the formulas of Eqs. 6.16 and 6.17 are known asthe Laplace expansion of the determinant in terms of columns, respectively,rows. Later we will present a more general version of the Laplace expansion.

The scalar (−1)i+jdet A(i|j) is usually called the i, j cofactor of A or thecofactor of the i, j entry of A. The above formulas for det (A) are called theexpansion of det (A) by cofactors of the jth column (sometimes the expansionby minors of the jth column), or respectively, the expansion of det (A) bycofactors of the ith row (sometimes the expansion by minors of the ith row).If we set

Cij = (−1)i+jdet A(i|j),

then the formula in Eq. 6.16 says that for each j,

det (A) =n∑i=1

AijCij,

where the cofactor Cij is (−1)i+j times the determinant of the (n−1)×(n−1)matrix obtained by deleting the ith row and jth column of A.

Similarly, for each fixed row index i,

det (A) =n∑j=1

AijCij.

If j 6= k, then

n∑i=1

AikCij = 0.

To see this, replace the jth column of A by its kth column, and call theresulting matrix B. Then B has two equal columns and so det(B) = 0.

Page 76: Linear Algebra by S. E. Payne

76 CHAPTER 6. DETERMINANTS

Since B(i|j) = A(i|j), we have

0 = det (B)

=n∑i=1

(−1)i+jBijdet (B(i|j))

=n∑i=1

(−1)i+jAikdet (A(i|j))

=n∑i=1

AikCij.

These properties of the cofactors can be summarized by

n∑i=1

AikCij = δjkdet (A). (6.18)

The n×n matrix adj A , which is the transpose of the matrix of cofactorsof A is called the classical adjoint of A. Thus

(adj A)ij = Cji = (−1)i+jdet (A(j|i)). (6.19)

These last two formulas can be summarized in the matrix equation

(adj A)A = (det (A))I. (6.20)

(To see this, just compute the (j, k) entry on both sides of this equation.)

We wish to see that A(adj A) = (det A)I also. Since AT (i|j) = (A(j|i))T ,we have

(−1)i+jdet (AT (i|j)) = (−1)i+jdet (A(j|i)),

which simply says that the i, j cofactor of AT is the j, i cofactor of A. Thus

adj (AT ) = (adj A)T . (6.21)

Applying Eq. 6.20 to AT , we have

(adj AT )AT = (det (AT ))I = (det A)I.

Transposing, we obtain

Page 77: Linear Algebra by S. E. Payne

6.3. ADDITIONAL PROPERTIES OF DETERMINANTS 77

A(adj AT )T = (det (A))I.

Using Eq. 6.21 we have what we want:

A(adj A) = (det (A))I. (6.22)

An almost immediate corollary of the previous paragraphs is the following:

Theorem 6.3.7. Let A be an n × n matrix over K. Then A is invertibleover K if and only if det (A) is invertible in K. When A is invertible, theunique inverse for A is

A−1 = (det A)−1adj A.

In particular, an n × n matrix over a field is invertible if and only if itsdeterminant is different from zero.

NOTE: This determinant criterion for invertibility proves that an n× nmatrix with either a left or right inverse is invertible.

NOTE: The reader should think about the consequences of Theorem 6.3.7in case K is the ring F [x] of polynomials over a field F , or in case K is thering of rational integers.

6.3.8 Characteristic Polynomial of a Linear Map

If P is also an n× n invertible matrix, then because K is commutative anddet is multiplicative, it is immediate that

det (P−1AP ) = det (A). (6.23)

This means that if K is actually a field, if V is an n-dimensional vectorspace over K, if T : V → V is any linear map, and if B is any basis of V ,then we may unambiguously define the characteristic polynomial cT (x) of Tto be

cT (x) = det(xI − [T ]B).

This is because if A and B are two matrices that represent the samelinear transformation with respect to some bases of V , then by Eq. 6.23 andTheorem 4.7.1

Page 78: Linear Algebra by S. E. Payne

78 CHAPTER 6. DETERMINANTS

det(xI − A) = det(xI −B).

Given a square matrix A, a principal submatrix of A is a square submatrixcentered about the diagonal of A. So a 1 × 1 principal submatrix is simplya diagonal element of A.

Theorem 6.3.9. Let K be a commutative ring with 1, and let A be an n×nmatrix over K. The characteristic polynomial of A is given by

f(x) = det(xI − A) =n∑i=0

cixn−i (6.24)

where c0 = 1, and for 1 ≤ i ≤ n, ci =∑det(B), where B ranges over all the

i× i principal submatrices of −A.

For an n× n matrix A, the trace of A is defined to be

tr(A) =n∑i=1

Aii.

Note: Putting i = 1 yields the fact that the coefficient of xn−1 is−∑n

i=1Aii = −tr(A), and putting i = n says that the constant term is(−1)ndet(A).

Proof. Clearly det(xI − A) is a polynomial of degree n which is monic, i.e.,c0 = 1, and and with constant term det(−A) = (−1)ndet(A). Suppose1 ≤ i ≤ n − 1 and consider the coefficient ci of xn−i in the polynomialdet(xI − A). Recall that in general, if D = (dij) is an n × n matrix over acommutative ring with 1, then

det(D) =∑π∈Sn

(−1)sgn(π) · d1,π(1)d2,π(2) · · · dn,π(n).

So to get a term of degree n − i in det(xI − A) =∑

π∈Sn(−1)sgn(π)(xI −A)1,π(1) · · · (xI−A)n,π(n) we first select n− i indices j1, . . . , jn−i, with comple-mentary indices k1, . . . , ki. Then in expanding the product (xI−A)1,π(1) · · · (xI−A)n,π(n) when π fixes j1, . . . , jn−i, we select the term x from the factors(xI−A)j1,j1 , . . . , (xI−A)jn−ijn−i , and the terms (−A)k1,π(k1), . . . , (−A)ki,π(ki)

otherwise. So if A(k1, . . . , ki) is the principal submatrix of A indexed by rows

Page 79: Linear Algebra by S. E. Payne

6.3. ADDITIONAL PROPERTIES OF DETERMINANTS 79

and columns k1, . . . , ki, then det(−A(k1, . . . , ki)) is the associated contribu-tion to the coefficient of xn−i. It follows that ci =

∑det(B) where B ranges

over all the principal i× i submatrices of −A.

Suppose the permutation π ∈ Sn consists of k permutation cycles of sizesl1, . . . , lk, respectively, where

∑li = n. Then sgn(π) can be computed by

sgn(π) = (−1)l1−1+l2−1+···lk−1 = (−1)n−k = (−1)n(−1)k.

We record this formally as:

sgn(π) = (−1)n(−1)k if π ∈ Sn is the product of k disjoint cycles. (6.25)

6.3.10 The Cayley-Hamilton Theorem

In this section we let K be an integral domain (i.e., K is a commutative ringwith 1 such that a · b = 0 if and only if either a = 0 or b = 0) and let λ be anindeterminate over K, so that the polynomial ring K[λ] is also an integraldomain. If B is any n × n matrix over an appropriate ring, |B| denotes itsdeterminant. Let A be an n× n matrix over K and let

f(λ) = |λI − A| = λn + cn−1λn−1 + · · ·+ c1λ+ c0 =

n∑i=0

ciλi

be the characteristic polynomial of A. (Note that we put cn = 1.) TheCayley-Hamilton Theorem asserts that f(A) = 0. The proof we give actuallyallows us to give some other interesting results also. The classical adjointof the matrix A (also known as the adjugate of A), will be denoted by Aadj.Recall that (Aadj)ij = (−1)i+j det(A(j|i)).

Since λI − A has entries from an integral domain, it also has a classicaladjoint (i.e., an adjugate), and(

(λI − A)adj)ij

= (−1)i+j det ((λI − A)(j|i)) .

Since the (i, j) entry of (λI − A)adj is a polynomial in λ of degree at mostn− 1, it follows that (λI − A)adj must itself be a polynomial in λ of degreeat most n− 1 and whose coefficients are n× n matrices over K. Say

(λI − A)adj =n−1∑i=0

Diλi

Page 80: Linear Algebra by S. E. Payne

80 CHAPTER 6. DETERMINANTS

for some n× n matrices D0, . . . , Dn−1.

At this point note that if λ = 0 we have: (−A)adj = (−1)n−1Aadj = D0.

On the one hand we know that

(λI − A)(λI − A)adj = det(λI − A) · I = f(λ) · I.

On the other hand we have

(λI−A)

(n−1∑i=0

Diλi

)= −AD0 +

n−1∑i=1

(Di−1−ADi)λi+Dn−1λ

n =n∑i=0

ci ·I ·λi.

If we define Dn = D−1 = 0, we can write

n∑i=0

(Di−1 − ADi)λi =

n∑i=0

ci · I · λi.

So Di−1−ADi = ci·I, 0 ≤ i ≤ n, which implies that AjDj−1−Aj+1Dj = cjAj.

Thus we find

f(A) =n∑j=0

cjAj =

n∑j=0

Aj(Dj−1 − ADj) =

= (D−1 − AD0) + (AD0 − A2D1) + (A2D1 − A3D2) +

+ · · ·+ (An−1Dn−2 − AnDn−1) + (AnDn−1 − An+1Dn)

= D−1 − An+1Dn = 0.

This proves the Cayley-Hamilton theorem.

Multiply Dj−1−AjDj = cjI on the left by Aj−1 to get Aj−1D−1−AjDj =cjA

j−1, and then sum for 1 ≤ j ≤ n. This gives

n∑j=1

cjAj−1 =

n∑j=1

(Aj−1Dj−1 − AjDj) = D0 − AnDn = D0,

so(−1)n−1Aadj = D0 = An−1 + cn−1A

n−2 + · · ·+ c1I.

Put g(λ) = f(λ)−f(0)λ−0

= λn−1 + cn−1λn−2 + · · · + c1 to obtain g(A) =

(−1)n−1Aadj, or

Aadj = (−1)n−1g(A). (6.26)

Page 81: Linear Algebra by S. E. Payne

6.3. ADDITIONAL PROPERTIES OF DETERMINANTS 81

This shows that the adjugate of a matrix is a polynomial in that matrix.Then if det(A) is a unit in K, so A is invertible, we see how to view A−1 asa polynomial in A.

Now use the equations Dj−1 − ADj = cjI (with D−1 = Dn = 0, cn = 1)in a slightly different manner.

Dn−1 = I

Dn−2 = ADn−1 + cn−1I = A+ cn−1I

Dn−3 = ADn−2 + cn−2I = A2 + cn−1A+ cn−2I

Dn−4 = An−3 + cn−3I = A3 + cn−1A2 + cn−2A+ cn−3I

...

Dn−j = Aj−1 + cn−1Aj−2 + cn−2A

j−3 + · · ·+ cn−j+2A+ cn−j+1I...

Dj = An−j−1 + cn−1An−j−2 + cn−2A

n−j−3 + · · ·+ cj+2A+ cj+1I...

D2 = AD3 + c3I = An−3 + cn−1An−4 + cn−2A

n−5 + · · ·+ c4A+ c3I

D1 = AD2 + c2I = An−2 + cn−1An−3 + cn−2A

n−4 + · · ·+ c3A+ c2I

D0 = AD1 + c1I = An−1 + cn−1An−2 + cn−2A

n−3 + · · ·+ c2A+ c1I

Substituting these values for Di into (λI − A)adj =∑n−1

i=0 Diλi and col-

lecting coefficients on fixed powers of A we obtain

(λI − A)adj = An−1 + (λ+ cn−1)An−2 + (λ2 + cn−1λ+ cn−2)An−3

+ (λ3 + cn−1λ2 + cn−2λ+ cn−3)An−4

... (6.27)

+ (λn−2 + cn−1λn−3 + cn−2λ

n−4 + · · ·+ c3λ+ c2)A

+ (λn−1 + cn−1λn−2 + cn−2λ

n−3 + cn−3λn−4 + · · ·+ c2λ+ c1) · A0

6.3.11 The Companion Matrix of a Polynomial

In this section K is a field and f(x) = xn +an−1xn−1 + · · ·+a1x+a0 ∈ K[x].

Define the companion matrix C(f(x)) by

Page 82: Linear Algebra by S. E. Payne

82 CHAPTER 6. DETERMINANTS

C(f(x)) =

0 0 · · · 0 0 −a0

1 0 · · · 0 0 −a1

0 1 · · · 0 0 −a2...

.... . .

......

...0 0 · · · 1 0 −an−2

0 0 · · · 0 1 −an−1

.

The main facts about C(f(x)) are in the next result.

Theorem 6.3.12.det (xIn − C(f(x))) = f(x)

is both the minimal and characteristic polynomial of C(f(x)).

Proof. First we establish that f(x) = det(xIn−C(f(x))). This result is clearif n = 1 and we proceed by induction. Suppose that n > 1 and compute thedeterminant by cofactor expansion along the first row, applying the inductionhypothesis to the first summand.

det(xIn − C(f(x))) = det

x 0 · · · 0 0 a0

−1 x · · · 0 0 a1

0 −1 · · · 0 0 a2...

.... . .

......

...0 0 · · · −1 x an−2

0 0 · · · 0 −1 x+ an−1

= x det

x · · · 0 0 a1

−1 · · · 0 0 a2...

. . ....

......

0 · · · −1 x an−2

0 · · · 0 −1 x+ an−1

+a0(−1)n+1det

−1 x · · · 0 00 −1 · · · 0 0...

.... . .

......

0 0 · · · −1 x0 0 · · · 0 −1

Page 83: Linear Algebra by S. E. Payne

6.3. ADDITIONAL PROPERTIES OF DETERMINANTS 83

= x(xn−1 + an−1xn−2 + · · ·+ a1) + a0(−1)n+1(−1)n−1

= xn + an−1xn−1 + · · ·+ a1x+ a0 = f(x).

This shows that f(x) is the characteristic polynomial of C(f(x)).Now let T be the linear operator on Kn whose matrix with respect to the

standard basis S = (e1, e2, . . . , en) is C(f(x)). Then Te1 = e2, T 2e1 = Te2 =e3, . . . , T je1 = T (ej) = ej+1 for 1 ≤ j ≤ n − 1, and Ten = −a0e1 − a1e2 −· · · − an−1en, so

(T n + an−1Tn−1 + · · ·+ a1T + a0I)e1 = 0.

Also

(T n + · · ·+ a1T + a0I)ej+1 = (T n + · · ·+ a1T + a0I)T je1

= T j(T n + · · ·+ a1T + a0I)e1 = 0.

It follows that f(T ) must be the zero operator. On the other hand,(e1, T e1, . . . , T

n−1e1) is a linearly independent list, so that no nonzero poly-nomial in T with degree less than n can be the zero operator. Then sincef(x) is monic it must be that f(x) is also the minimal polynomial for T andhence for C(f(x)).

6.3.13 The Cayley-Hamilton Theorem: A Second Proof

Let dim(V ) = n and let T ∈ L(V ). If f is the characteristic polynomial forT , then f(T ) = 0. This is equivalent to saying that the minimal polynomialfor T divides the characteristic polynomial for T .

Proof. This proof is an illuminating and fairly sophisticated application ofthe general theory of determinants developed above.

Let K be the commutative ring with identity consisting of all polynomialsin T . Actually, K is a commutative algebra with identity over the scalarfield F . Choose a basis B = (v1, . . . , vn) for V and let A be the matrix whichrepresents T in the given basis. Then

T (vj) =n∑i=1

Aijvi, 1 ≤ j ≤ n.

These equations may be written in the equivalent form

Page 84: Linear Algebra by S. E. Payne

84 CHAPTER 6. DETERMINANTS

n∑i=1

(δijT − AijI)vi = ~0, 1 ≤ j ≤ n.

Let B ∈Mn(K) be the matrix with entries

Bij = δijT − AjiI.

Note the interchanging of the i and j in the subscript on A. Also, keep inmind that each element of B is a polynomial in T . Let f(x) = det (xI − A) =det(xI − AT

). Then f(T ) = det(B). (This is an element of K. Think about

this equation until it seems absolutely obvious.) Our goal is to show thatf(T ) = 0. In order that f(T ) be the zero operator, it is necessary andsufficient that det(B)(vk) = ~0 for 1 ≤ k ≤ n. By the definition of B, thevectors v1, . . . , vn satisfy the equations

~0 =n∑i=1

(δijT − AijI)(vi) =n∑i=1

Bjivi, 1 ≤ j ≤ n. (6.28)

Let B be the classical adjoint of B, so that BB = BB = det(B)I. Notethat B also has entries that are polynomials in the operator T . Let Bkj

operate on the right side of Eq. 6.28 to obtain

~0 = Bkj

n∑i=1

Bji(vi) =n∑i=1

(bkjBji)(vi).

So summing over j we have

~0 =n∑i=1

(n∑j=1

BkjBji

)(vi) =

n∑i=1

(δkidet(B)) (vi) = det(B)(vk).

At this point we know that each irreducible factor of the minimal poly-nomial of T is also a factor of the characteristic polynomial of T . A converseis also true: Each irreducible factor of the characteristic polynomial of T isalso a factor of the minimal polynomial of T . However, we are not yet readyto give a proof of this fact.

Page 85: Linear Algebra by S. E. Payne

6.3. ADDITIONAL PROPERTIES OF DETERMINANTS 85

6.3.14 Cramer’s Rule

We now discuss Cramer’s rule for solving systems of linear equations. Sup-pose A is an n× n matrix over the field F and we wish to solve the systemof linear equations AX = Y for some given n-tuple (y1, . . . , yn). If AX = Y ,then

(adj A)AX = (adj A)Y

implying(det A)X = (adj A)Y.

Thus computing the j-th row on each side we have

(det A)xj =n∑i=1

(adj A)jiyi

=n∑i=1

(−1)i+jyidet A(i|j).

This last expression is the determinant of the n × n matrix obtained byreplacing the jth column of A by Y . If det A = 0, all this tells us nothing.But if det A 6= 0, we have Cramer’s rule:

Let A be an n × n matrix over the field F such that det A 6= 0. Ify1, . . . , yn are any scalars in F , the unique solution X = A−1Y of the systemof equations AX = Y is given by

xj =det Bj

det A, j = 1, . . . , n,

where Bj is the n× n matrix obtained from A by replacing the jth columnof A by Y .

6.3.15 Comparing ST with TS

We begin this subsection with a discussion (in terms of linear operators) thatis somewhat involved and not nearly as elegant as that given at the end. Butwe find the techniques introduced first to be worthy of study. Then at theend of the subsection we give a better result with a simpler proof.

Theorem 6.3.16 (a) Let V be finite dimensional over F and supposeS, T ∈ L(V ). Then ST and TS have the same eigenvalues, but not necessarilythe same minimal polynomial.

Page 86: Linear Algebra by S. E. Payne

86 CHAPTER 6. DETERMINANTS

Proof. Let λ be an eigenvalue of ST with nonzero eigenvector v. ThenTS(T (v)) = T (ST (v)) = T (λv) = λT (v). This says that if T (v) 6= 0,then λ is an eigenvalue of TS with eigenvector T (v). However, it might bethe case that T (v) = 0. In that case T is not invertible, so TS is not invert-ible. Hence it is not one-to-one. This implies that λ = 0 is an eigenvalueof TS. So each eigenvalue of ST is an eigenvalue of TS. By a symmetricargument, each eigenvalue of TS is an eigenvalue of ST .

To show that ST and TS might not have the same minimal polynomial,consider the matrices

S =

(0 10 0

); T =

(0 00 1

).

Then

ST =

(0 10 0

)and TS =

(0 00 0

).

The minimal polynomial of ST is x2 and that of TS is x.

Additional Comment 1: The above proof (and example) show that themultiplicities of 0 as a root of the minimal polynomials of ST and TS may bedifferent. This raises the question as to whether the multiplicities of nonzeroeigenvalues as roots of the minimal polynomials of ST and TS could bedifferent. However, this cannot happen. We sketch a proof of this. Supposethe minimal polynomial of ST has the factor (x−λ)r for some positive integerr and some nonzero eigenvalue λ. This means that on the one hand, for eachgeneralized eigenvector w 6= 0 associated with λ, (ST −λI)r(w) = 0, and onthe other hand, there is a nonzero generalized eigenvector v associated withλ such that (ST − λI)r−1(v) 6= 0. (See the beginning of Chapter 10.)

Step 1. First prove (say by induction on m) that T (ST − λI)m = (TS −λI)mT and (interchanging the roles of S and T ) S(TS−λI)m = (ST−λI)mS.This is for all positive integers m.

Step 2. Show ST (v) 6= 0, as follows:Assume that ST (v) = 0. Then

0 = (ST − λI)r(v)

= (ST − λI)r−1(ST − λI)(v)

= (ST − λI)r−1(0− λv)

= −λ(ST − λI)r−1(v) 6= 0,

Page 87: Linear Algebra by S. E. Payne

6.3. ADDITIONAL PROPERTIES OF DETERMINANTS 87

a clear contradiction.

Note that this also means that T (v) 6= 0.Step 3. From the above we have 0 = T (ST − λI)r(v) = (TS − λI)rT (v),

so T (v) is a generalized eigenvector of TS associated with λ.Step 4. Show that (TS − λI)r−1T (v) 6= 0. This will imply that (x− λ)r

divides the minimum polynomial of TS. Then interchanging the roles of Sand T will show that if (x−λ)r divides the minimum polynomial of TS, it alsodivides the minimum polynomial of ST . Hence for each nonzero eigenvalueλ, it has the same multiplicity as a factor in the minimum polynomial of TSas it has in the minimum polynomial of ST . The details are:

Suppose that 0 = (TS − λI)r−1T (v), then

0 = S(0) = S(TS − λI)r−1T (v)

= (ST − λI)r−1ST (v)

But then

0 = (ST − λI)r(v) = (ST − λI)r−1(ST − λI)(v)

= (ST − λI)r−1(ST (v)− λv)

= 0− λ(ST − λI)r−1(v) 6= 0,

a clear contradiction.This completes the proof.

Additional Comment 2: It would be easy to show that the two charac-teristic polynomials are equal if any two n× n matrices were simultaneouslyupper triangularizable. It is true that if two such matrices commute then(over an algebraically closed field) they are indeed simultaneously upper tri-angularizable. However, consider the following example:

S =

(1 00 0

), and T =

(1 11 0

).

Suppose there were an invertible matrix P =

(a bc d

)for which both

PSP−1 and PTP−1 were upper triangular. Let ∆ = detP = ad − bc 6= 0.Then

PSP−1 =1

(ad −abcd −bc

),

Page 88: Linear Algebra by S. E. Payne

88 CHAPTER 6. DETERMINANTS

which is upper triangular if and only if cd = 0.

PTP−1 =1

(ad+ bd− ac −ab− b2 + a2

cd+ d2 − c2 cd+ d2 − c2

),

which is upper triangular if and only if d2 + cd− c2 = 0. If c = 0 this forcesd = 0, giving ∆ = 0, a contradiction. If d = 0, then c = 0.

Additional Comment 3: The preceding Comment certainly raises thequestion as to whether or not the characteristic polynomial of ST can bedifferent from that of TS. In fact, they must always be equal, and thereis a proof of this that introduces a valuable technique. Suppose that Z isthe usual ring of integers. Suppose that Y = (yij) and Z = (zij) are twon×n matrices of indeterminates over the integers. Let D = Z[yij, zij], where1 ≤ i, j ≤ n. So D is the ring of polynomials over Z in 2n2 commutingbut algebraically independent indeterminates. Then let L = Q(yij, zij) bethe field of fractions of the integral domain D. Clearly the characteristicpolynomials of Y Z and of ZY are the same whether Y and Z are viewedas being over D or L. But over L both Y and Z are invertible, so Y Z =Y (ZY )Y −1 and ZY are similar. Let x be an additional indeterminate so thatthe characteristic polynomial of ZY is

det(xI − ZY ) = det(Y ) · det(xI − ZY ) · det(Y −1) = det(xI − Y Z),

which is then also the characteristic polynomial of Y Z. Hence ZY and Y Zhave the same characteristic polynomial.

Now let R be a commutative ring with 1 6= 0. (You may take R tobe a field, but this is not necessary.) Also let S = (sij) and T = (tij) ben× n matrices over R. There is a homomorphism θ : D[x] = Z[yij, zij][x]→R[x] that maps yij 7→ sij and zij 7→ tij for all i, j, and that maps x tox. This homomorphism maps the characteristic polynomial of Y Z to thecharacteristic polynomial of ST and the characteristic polynomial of ZYto that of TS. It follows that ST and TS have the same characteristicpolynomial since ZY and Y Z do.

Exercise: If 0 has multiplicity at most 1 as a root of the characteristicpolynomial of ST , then ST and TS have the same minimal polynomial.

Now suppose that R is a commutative ring with 1, that A and B arematrices over R with A being m × n and B being n ×m, with m ≤ n. Let

Page 89: Linear Algebra by S. E. Payne

6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗ 89

fAB, fBA be the characteristic polynomials of AB and BA, respectively, overR.

Theorem 6.3.15(b) fBA = xn−mfAB.

Proof. For square matrices of size m+n we have the following two identities:

(AB 0B 0

)(I A0 I

)=

(AB ABAB BA

)(I A0 I

)(0 0B BA

)=

(AB ABAB BA

).

Since the (m+ n)× (m+ n) block matrix(I A0 I

)is nonsingular (all its eigenvalues are +1), it follows that(

I A0 I

)−1(AB 0B 0

)(I A0 I

)=

(0 0B BA

).

Hence the two (m+ n)× (m+ n) matrices

C1 =

(AB 0B 0

)and C2 =

(0 0B BA

)are similar.

The eigenvalues of C1 are the eigenvalues of AB together with n zeros.The eigenvalues of C2 are the eigenvalues of BA together with m zeros. SinceC1 and C2 are similar, they have exactly the same eigenvalues, includingmultiplicities, the theorem follows.

6.4 Deeper Results with Some Applications∗

The remainder of this chapter may be omitted without loss of continuity.

In general we continue to let K be a commutative ring with 1. HereMatn(K) denotes the ring of n × n matrices over K, and |A| denotes theelement of K that is the determinant of A.

Page 90: Linear Algebra by S. E. Payne

90 CHAPTER 6. DETERMINANTS

6.4.1 Block Matrices whose Blocks Commute∗

We can regard a k × k matrix M = (A(i,j)) over Matn(K) as a block matrix,a matrix that has been partitioned into k2 submatrices (blocks) over K, eachof size n × n. When M is regarded in this way, we denote its determinantin K by |M |. We use the symbol D(M) for the determinant of M viewed asa k × k matrix over Matn(K). It is important to realize that D(M) is ann× n matrix.

Theorem. Assume that M is a k× k block matrix of n×n blocksA(i,j) over K that pairwise commute. Then

|M | = |D(M)| =

∣∣∣∣∣∑σ∈Sk

(sgn σ)A(1,σ(1))A(2,σ(2)) · · ·A(k,σ(k))

∣∣∣∣∣ . (6.29)

Here Sk is the symmetric group on k symbols, so the summation is theusual one that appears in a formula for the determinant. The first proofof this result to come to our attention is the one in N. Jacobson, Lecturesin Abstract algebra, Vol. III – Theory of Fields and Galois Theory, D. VanNostrand Co., Inc., 1964, pp 67 – 70. The proof we give now is from I.Kovacs, D. S. Silver, and Susan G. Williams, Determinants of Commuting-Block Matrices, Amer. Math. Monthly, Vol. 106, Number 10, December1999, pp. 950 – 952.

Proof. We use induction on k. The case k = 1 is evident. We suppose thatEq. 6.27 is true for k− 1 and then prove it for k. Observe that the followingmatrix equation holds:

I 0 · · · 0

−A(2,1) I · · · 0...

... · · · ...−A(k,1) 0 · · · I

I · · · 00 A(1,1) · · · 0...

.... . .

...0 0 · · · A(1,1)

M =

A(1,1) ∗ ∗ ∗

0... N0

,

where N is a (k− 1)× (k− 1) matrix. To simplify the notation we write thisas

PQM = R, (6.30)

Page 91: Linear Algebra by S. E. Payne

6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗ 91

where the symbols are defined appropriately. By the multiplicative propertyof determinants we have D(PQM) = D(P )D(Q)D(M) = (A(1,1))k−1D(M)and D(R) = A(1,1)D(N). Hence we have (A(1,1))k−1D(M) = A(1,1)D(N).Take the determinant of both sides of the last equation. Since |D(N)| = |N |by the induction hypothesis, and using PQM = R, we find

|A(1,1)|k−1|D(M)| = |A(1,1)||D(N)| = |A(1,1)||N |= |R| = |P ||Q||M | = |A(1,1)|k−1|M |.

If |A(1,1)| is neither zero nor a zero divisor, then we can cancel |A(1,1)|k−1

from both sides to get Eq. 6.29.

For the general case, we embed K in the polynomial ringK[z], where zis an indeterminate, and replace A(1,1) with the matrix zI + A(1,1). Sincethe determinant of zI + A(1,1) is a monic polynomial of degree n, and henceis neither zero nor a zero divisor, Eq. 6.27 holds again. Substituting z =0 (equivalently, equating constant terms of both sides) yields the desiredresult.

6.4.2 Tensor Products of Matrices∗

Definition: Let A = (aij) ∈ Mm1,n1(K), and let B ∈ Bm2,n2(K). Thenthe tensor product or Kronecker product of A and B, denoted A ⊗ B ∈Mm1m2,n1n2(K), is the partitioned matrix

A⊗B =

a11B a12B · · · a1n1Ba21B a22B · · · a2n1B

......

. . ....

am11B am12B · · · am1n1B

. (6.31)

It is clear that Im ⊗ In = Imn.

Lemma Let A1 ∈ Mm1,n1(K), A2 ∈ Mn1,r1(K), B1 ∈ Mm2,n2(K), andB2 ∈Mn2,r2(K). Then

(A1 ⊗B1)(A2 ⊗B2) = (A1A2)⊗ (B1B2). (6.32)

Page 92: Linear Algebra by S. E. Payne

92 CHAPTER 6. DETERMINANTS

Proof. Using block multiplication, we see that the (i, j) block of (A1A2) ⊗(B1B2) is

n1∑k=1

((A1)ikB1) ((A2)kjB2) =

=

(n1∑k=1

(A1)ik(A2)kj

)B1B2,

which is also seen to be the (i, j) block of (A1A2)⊗ (B1B2).

Corollary Let A ∈Mm(K) and B ∈Mn(K). Then

A⊗B = (A⊗ In)(Im ⊗B). (6.33)

and

|A⊗B| = |A|n|B|m. (6.34)

Proof. Eq. 6.33 is an easy consequence of Eq. 6.32. Then prove Eq. 6.34 asfollows. Use the fact that the determinant function is multiplicative. ClearlyIm ⊗ B is a block diagonal matrix with m blocks along the diagonal, eachequal to B, so det(Im ⊗ B) = (det(B))m. The determinant det(A ⊗ In) isa little trickier. The matrix A ⊗ In is a block matrix whose (i, j)-th blockis aijIn, 1 ≤ i, j ≤ m. Since each two of the blocks commute, we can usethe theorem that says that we can first compute the determinant as thoughit were an m × m matrix of elements from a commutative ring of n × nmatrices, and then compute the determinant of this n × n matrix. Hencedet(A⊗ I) = det(det(A) · In) = (det(A))n, and the proof is complete.

6.4.3 The Cauchy-Binet Theorem-A Special Version∗

The main ingredient in the proof of the Matrix-Tree theorem (see the nextsection) is the following theorem known as the Cauchy-Binet Theorem. Itis more commonly stated and applied with the diagonal matrix 4 belowtaken to be the identity matrix. However, the generality given here actuallysimplifies the proof.

Theorem 6.4.4. Let A and B be, respectively, r ×m and m × r matrices,with r ≤ m. Let 4 be the m×m diagonal matrix with entry ei in the (i, i)-position. For an r-subset S of [m], let AS and BS denote, respectively, the

Page 93: Linear Algebra by S. E. Payne

6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗ 93

r× r submatrices of A and B consisting of the columns of A, or the rows ofB, indexed by the elements of S. Then

det(A4B) =∑S

det(AS)det(BS)∏i∈S

ei,

where the sum is over all r-subsets S of [m].

Proof. We prove the theorem assuming that e1, . . . , em are independent (com-muting) indeterminates over F . Of course it will then hold for all values ofe1, . . . , em in F .

Recall that if C = (cij) is any r × r matrix over F , then

det(C) =∑σ∈Sr

sgn(σ)c1σ(1)c2σ(2) · · · crσ(r).

Given thatA = (aij) andB = (bij), the (i,j)-entry ofA4B is∑m

k=1 aikekbkj,and this is a linear form in the indeterminates e1, . . . , em. Hence det(A4B) isa homogeneous polynomial of degree r in e1, . . . , em. Suppose that det(A4B)has a monomial et11 e

t22 . . . where the number of indeterminates ei that have

ti > 0 is less than r. Substitute 0 for the indeterminates ei that do notappear in et11 e

t22 . . ., i.e., that have ti = 0. This will not affect the monomial

et11 et22 . . . or its coefficient in det(A4 B). But after this substitution 4 has

rank less than r, so A4 B has rank less than r, implying that det(A4 B)must be the zero polynomial. Hence we see that the coefficient of a monomialin the polynomial det(A4 B) is zero unless that monomial is the productof r distinct indeterminates ei, i.e., unless it is of the form

∏i∈S ei for some

r-subset S of [m].The coefficient of a monomial

∏i∈S ei in det(A4 B) is found by setting

ei = 1 for i ∈ S, and ei = 0 for i 6∈ S. When this substitution is made in4, A4B evaluates to ASB

S. So the coefficient of∏

i∈S ei in det(A4B) isdet(AS)det(BS).

Exercise 6.4.4.1. Let M be an n×n matrix all of whose linesums are zero.Then one of the eigenvalues of M is λ1 = 0. Let λ2, . . . , λn be the othereigenvalues of M . Show that all principal n − 1 by n − 1 submatrices havethe same determinant and that this value is 1

nλ2λ3 · · ·λn.

Sketch of Proof: First note that since all line sums are equal to zero, theentries of the matrix are completely determined by the entries of M in the

Page 94: Linear Algebra by S. E. Payne

94 CHAPTER 6. DETERMINANTS

first n − 1 rows and first n − 1 columns, and that the entry in the (n, n)position is the sum of all (n− 1)2 entries in the first n− 1 rows and columns.

Clearly λ1 = 0 is an eigenvalue of A. Observe the appearance of the(n− 1)× (n− 1) subdeterminant obtained by deleting the bottom row andright hand column. Then consider the principal subdeterminant obtained bydeleting row j and column j, 1 ≤ j ≤ n− 1, from the original matrix M . Inthis (n− 1)× (n− 1) matrix, add the first n− 2 columns to the last one, andthen add the first n− 2 rows to the last one. Now multiply the last columnand the last row by -1. This leaves a matrix that could have been obtainedfrom the original upper (n − 1) × (n − 1) submatrix by moving its jth rowand column to the last positions. So it has the same determinant.

Now note that the coefficient of x in the characteristic polynomial f(x) =det(xI−A) is (−1)n−1λ2λ3 · · ·λn, since λ1 = 0, and it is also (−1)n−1

∑det(B),

where the sum is over all principal subdeterminants of order n− 1, which bythe previous paragraph all have the same value. Hence det(B) = 1

nλ2λ3 · · ·λn,

for any principal subdeterminant det(B) of order n− 1.

6.4.5 The Matrix-Tree Theorem∗

The “matrix-tree” theorem expresses the number of spanning trees in a graphas the determinant of an appropriate matrix, from which we obtain one moreproof of Cayley’s theorem counting labeled trees.

An incidence matrix N of a directed graph H is a matrix whose rowsare indexed by the vertices V of H, whose columns are indexed by the edgesE of H, and whose entries are defined by:

N(x, e) =

0 if x is not incident with e, or e is a loop,1 if x is the head of e,−1 if x is the tail of e.

Lemma 6.4.6. If H has k components, then rank(N) = |V | − k.

Proof. N has v = |V | rows. The rank of N is v−n, where n is the dimensionof the left null space of N , i.e., the dimension of the space of row vectors g forwhich gN = 0. But if e is any edge, directed from x to y, then gN = 0 if andonly if g(x) − g(y) = 0. Hence gN = 0 iff g is constant on each componentof H, which says that n is the number k of components of H.

Page 95: Linear Algebra by S. E. Payne

6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗ 95

Lemma 6.4.7. Let A be a square matrix that has at most two nonzero entriesin each column, at most one 1 in each column, at most one -1 in each column,and whose entries are all either 0, 1 or -1. Then det(A) is 0, 1 or -1.

Proof. This follows by induction on the number of rows. If every columnhas both a 1 and a -1, then the sum of all the rows is zero, so the matrix issingular and det(A) = 0. Otherwise, expand the determinant by a columnwith one nonzero entry, to find that it is equal to ±1 times the determinantof a smaller matrix with the same property.

Corollary 6.4.8. Every square submatrix of an incidence matrix of a di-rected graph has determinant 0 or ±1. (Such a matrix is called totallyunimodular.)

Theorem 6.4.9. (The Matrix-Tree Theorem) The number of spanning treesin a connected graph G on n vertices and without loops is the determinant ofany n− 1 by n− 1 principal submatrix of the matrix D −A, where A is theadjacency matrix of G and D is the diagonal matrix whose diagonal containsthe degrees of the corresponding vertices of G.

Proof. First let H be a connected digraph with n vertices and with incidencematrix N . H must have at least n − 1 edges, because it is connected andmust have a spanning tree, so we may let S be a set of n−1 edges. Using thenotation of the Cauchy-Binet Theorem, consider the n by n − 1 submatrixNS of N whose columns are indexed by elements of S. By Lemma 6.4.6, NS

has rank n−1 iff the spanning subgraph of H with S as edge set is connected,i.e., iff S is the edge set of a tree in H. Let N ′ be obtained by dropping anysingle row of the incidence matrix N . Since the sum of all rows of N (or ofNS) is zero, the rank of N ′S is the same as the rank of NS. Hence we havethe following:

det(N ′S) =

{±1 if S is the edge set of a spanning tree in H,0 otherwise.

(6.35)

Now let G be a connected loopless graph on n vertices. Let H be anydigraph obtained by orienting G, and let N be an incidence matrix of H.Then we claim NNT = D − A. For,

Page 96: Linear Algebra by S. E. Payne

96 CHAPTER 6. DETERMINANTS

(NNT )xy =∑

e∈E(G) N(x, e)N(y, e)

=

{deg(x) if x = y,−t if x and y are joined by t edges in G.

An n − 1 by n − 1 principal submatrix of D − A is of the form N ′N ′T

where N ′ is obtained from N by dropping any one row. By Cauchy-Binet,

det(N ′N ′T ) =∑S

det(N ′S) det(N ′TS ) =∑S

(det(N ′S))2,

where the sum is over all n− 1 subsets S of the edge set. By Eq. 6.33 this isthe number of spanning trees of G.

Exercise 6.4.9.1. (Cayley’s Theorem) In the Matrix-Tree Theorem, take Gto be the complete graph Kn. Here the matrix D − A is nI − J , where I isthe identity matrix of order n, and J is the n by n matrix of all 1’s. Nowcalculate the determinant of any n − 1 by n − 1 principal submatrix of thismatrix to obtain another proof that Kn has nn−2 spanning trees.

Exercise 6.4.9.2. In the statement of the Matrix-Tree Theorem it is notnecessary to use principal subdeterminants. If the n − 1 × n − 1 submatrixM is obtained by deleting the ith row and jth column from D − A, thenthe number of spanning trees is (−1)i+j det(M). This follows from the moregeneral lemma: If A is an n − 1 × n matrix whose row sums are all equalto 0 and if Aj is obtained by deleting the jth column of A, 1 ≤ j ≤ n, thendet(Aj) = − det(Aj+1).

6.4.10 The Cauchy-Binet Theorem - A General Version∗

Let 1 ≤ p ≤ m ∈ Z, and let Qp,m denote the set of all sequences α =(i1, i2, · · · , ip) of p integers with 1 ≤ i1 < i2 < · · · < ip ≤ m. Note that|Qp,m| =

(mp

).

Let K be a commutative ring with 1, and let A ∈Mm,n(K). If α ∈ Qp,m

and β ∈ Qj,n, let A[α|β] denote denote the submatrix of A consisting ofthe elements whose row index is in α and whose column index is in β. Ifα ∈ Qp,m, then there is a complementary sequence α ∈ Qm−p,m consisting ofthe list of exactly those positive integers between 1 and m that are not in α,and the list is in increasing order.

Theorem 6.4.11. Let A ∈ Mm,n(K) and B ∈ Mn,p(K). Assume that 1 ≤t ≤ min{m,n, p} and let α ∈ Qt,m, β ∈ Qt,p. Then

Page 97: Linear Algebra by S. E. Payne

6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗ 97

det(AB[α|β]) =∑γ∈Qt,n

det(A[α|γ]) · det(B[γ|β]).

Proof. Suppose that α = (α1, . . . , αt), β = (β1, . . . , βt), and let C = AB[α|β].Then

Cij =n∑k=1

aαikbkβj .

So we have

C =

∑n

k=1 aα1kbkβ1 · · ·∑n

k=1 bkβt...

. . ....∑n

k=1 aαtkbkβ1 · · ·∑n

k=1 aαtkbkβt

.

To calculate the determinant of C we start by using the n-linearity in thefirst row, then the second, row, etc.

det(C) =n∑

k1=1

aα1k1 · det

bk1β1 · · · bk1βt∑n

k=1 aα2kbkβ1 · · ·∑n

k=1 aα2kbkβt... · · · ...

∑nk=1 aα2kbkβt∑n

k=1 aαtkbkβ1 · · ·∑n

k=1 aαtkbkβt

=

=n∑

k1=1

· · ·n∑

kt=1

aα1k1 · · · aαtkt · det

bk1β1 · · · bk1βt...

...bktβ1 · · · bktβt

. (6.36)

If ki = kj for i 6= j, then

det

bk1β1 · · · bk1βt...

...bktβ1 · · · bktβt

= 0.

Then the only possible nonzero determinant occurs when (k1, . . . , kt) isa permutation of a sequence γ = (γ1, . . . , γt) ∈ Qt,n. Let σ ∈ St be thepermutation of {1, 2, . . . , t} such that γi = kσ(i) for 1 ≤ i ≤ t. Then

Page 98: Linear Algebra by S. E. Payne

98 CHAPTER 6. DETERMINANTS

det

bk1β1 · · · bk1βt...

...bktβ1 · · · bktβt

sgn(σ) det(B[γ|β]). (6.37)

Given a fixed γ ∈ Qt,n, all possible permutations of γ are included in thesummation in Eq. 6.36. Therefore Eq. 6.36 may be rewritten , using Eq.6.37, as

det(C) =∑γ∈Qt,n

(∑σ∈St

sgn(σ)aα1γσ(1)· · · aαtγσ(t)

)det(B[γ|β]),

which is the desired formula.

The Cauchy-Binet formula gives another verification of the fact thatdet(AB) = det(A) · det(B) for square matrices A and B.

6.4.12 The General Laplace Expansion∗

For γ = (γ1, . . . , γt) ∈ Qt,n, put s(γ) =∑t

j=1 γj. Theorem: Let A ∈Mn(K)and let α ∈ Qt,n (1 ≤ t ≤ n) be given. Then

det(A) =∑γ∈Qt,n

(−1)s(α)+s(γ) det(A[α|γ]) · det(A[α|γ]). (6.38)

Proof. For A ∈Mn(K), define

Dα(A) =∑γ∈Qt,n

(−1)s(α)+s(γ) det(A[α|γ]) · det(A[α|γ]). (6.39)

Then Dα : Mn(K) → K is easily shown to be n-linear as a function onthe columns of A. To complete the proof, it is only necessary to show thatDα is alternating and that Dα(In) = 1. Thus, suppose that the columns of Alabeled p and q, p < q, are equal. If p and q are both in γ ∈ Qt,n, then A[α|γ]will have two columns equal and hence have zero determinant. Similarly, ifp and q are both in γ ∈ Qn−t,n, then det(A[α|γ) = 0. Thus in the evaluationof Dα(A) it is only necessary to consider those γ ∈ Qt,n such that p ∈ γ andq ∈ γ, or vice versa. So suppose p ∈ γ, q ∈ γ, and defind a new sequenceγ′ in Qt,n by replacing p ∈ γ by q. Thus γ′ agrees with γ except that q hasbeen replaced by p. Thus

Page 99: Linear Algebra by S. E. Payne

6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗ 99

s(γ′)− s(γ) = q − p. (6.40)

(Note that s(γ)− p and s(γ′)− q are both the sum of all the things in γexcept for p.) Now consider the sum

(−1)s(γ) det(A[α|γ]) det(A[α|γ]) + (−1)s(γ′) det(A[α|γ′]) det(A[α|γ′]),

which we denote by S(A). We claim that this sum is 0. Assuming this,since γ and γ′ appear in pairs in Qt,n, it follows that Dα(A) = 0 whenevertwo columns of A agree, forcing Dα to be alternating. So now we show thatS(A) = 0.

Suppose that p = γk and q = γl. Then γ and γ′ agree except in the rangefrom p to q, as do γ and γ′. This includes a total of q− p+ 1 entries. If r ofthese entries are included in γ, then

γ1 < · · · < γk = p < γk+1 < · · · < γk+r−1 < q < γk+r < · · · < γt

and

A[α|γ′] = A[α|γ]Pw−1 ,

where w is the r-cycle (k + r − 1, k + r − 2, . . . , k). Similarly,

A[α|γ′] = A[α|γ]Pw′

where w′ is a (q − p+ 1− r)-cycle. Thus,

(−1)s(γ′) det(A[α|γ′]) det(A[α|γ′]) =

= (−1)s(γ′)+(r−1)+(q−p)−r det(A[α|γ]) det(A[α|γ]).

Since s(γ′) + (q − p) − 1 − s(γ) = 2(q − p) − 1 is odd, we conclude thatS(A) = 0. Thus Dα is n-linear and alternating. It is routine to check thatDα(In) = 1, completing the proof.

Applying this formula for det(A) to det(AT ) gives the Laplace expansionin terms of columns.

Page 100: Linear Algebra by S. E. Payne

100 CHAPTER 6. DETERMINANTS

6.4.13 Determinants, Ranks and Linear Equations∗

If K is a commutative ring with 1 and A ∈ Mm,n(K), and if 1 ≤ t ≤min{m,n}, then a t × t minor of A is the determinant of any submatrixA[α|β] where α ∈ Qt,m, β ∈ Qt,n. The determinantal rank of A, denotedD-rank(A), is the largest t such that there is a nonzero t× t minor of A.

With the same notation,

Ft(A) = 〈{detA[α|β] : α ∈ Qt,m, β ∈ Qt,n}〉 ⊆ K.

That is, Ft(A) is the ideal of K generated by all the t×t minors of A. Put

F0(A) = K and Ft(A) = 0 if t > min{m,n}. Ft(A) is called the tth-Fittingideal of A. The Laplace expansion of determinants along a row or columnshows that Ft+1(A) ⊆ Ft(A). Thus there is a decreasing chain of ideals

K = F0(A) ⊇ F1(A) ⊇ F2(A) ⊇ · · · .

Definition: If K is a PID, then Ft(A) is a principal ideal, say Ft(A) =〈dt(A)〉 where dt(A) is the greatest common divisor of all the t× t minors of

A. In this case, a generator of Ft(A) is called the tth-determinantal divisorof A.

Definition If A ∈ Mm,n(K), then the M -rank(A) is defined to be thelargest t such that {0} = Ann(Ft(A)) = {k ∈ K : kd = 0 for all d ∈ Ft(A)}.

Obs. 6.4.14. 1. M-rank(A) = 0 means that Ann(F1(a)) 6= {0}. Thatis, there is a nonzero a ∈ K with a · aij = 0 for all entries aij of A.Note that this is stronger than saying that every element of A is a zerodivisor. For example, if A = (2 3) ∈ M1,2(Z6), then every element ofA is a zero divisor in Z6, but there is no single nonzero element of Z6

that annihilates both entries in the matrix.

2. If A ∈ Mn(K), then M-rank(A) = n means that det(A) is not a zerodivisor of K.

3. To say that M-rank(A) = t means that there is an a 6= 0 ∈ K witha ·D = 0 for all (t+1)×(t+1) minors D of A, but there is no nonzerob ∈ K which annihilates all t × t minors of A by multiplication. Inparticular, if det(A[α|β]) is not a zero divisor of K for some α ∈ Qs,m,β ∈ Qs,n, then M-rank(A) ≥ s.

Page 101: Linear Algebra by S. E. Payne

6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗ 101

Lemma 6.4.15. If A ∈Mm,n(K), then

0 ≤M − rank(A) ≤ D-rank (A) ≤ min{m,n}.

Proof. Routine exercise.

We can now give a criterion for solvability of the homogeneous linearequation AX = 0, where A ∈ Mm,n(K). This equation always has thetrivial solution X = 0, so we want a criterion for the existence of a solutionX 6= 0 ∈Mn,1(K).

Theorem 6.4.16. Let K be a commutative ring with 1 and let A ∈Mm,n(K).The matrix equation AX = 0 has a nontrivial solution X 6= 0 ∈ Mn,1(K) ifand only if

M − rank(A) < n.

Proof. Suppose that M -rank(A) = t < n. then Ann(Ft+1(A)) 6= {0}, sochoose b 6= 0 ∈ K with b ·Ft+1(A) = {0}. Without loss of generality, we mayassume that t < m, since, if necessary, we may replace the system AX = 0with an equivalent one (i.e., one with the same solutions) by adding somerows of zeros to the bottom of A. If t = 0, then baij = 0 for all aij and wemay take

X =

b...b

.

Then X 6= 0 ∈Mn,1(K) and AX = 0.

So suppose that t > 0. Then b 6∈ Ann(Ft(A)) = {0}, so b ·det(A[ζ|β]) 6= 0for some α ∈ Qt,m, β ∈ Qt,n. By permuting rows and columns, whichdoes note affect whether AX = 0 has a nontrivial solution, we can assumeα = (1, . . . , t) = β. For 1 ≤ i ≤ t+ 1 let βi = (1, 2, . . . , i, . . . , t+ 1) ∈ Qt,t+1,where i indicates that i is deleted. Let di = (−1)t+1+i det(A[α|βi]). Thusd1, . . . , dt+1 are the cofactors of the matrix

A1 = A[(1, . . . , t+ 1)|(1, . . . , t+ 1)]

obtained by deleting row t + 1 and column i. Hence the Laplace expansiongives

Page 102: Linear Algebra by S. E. Payne

102 CHAPTER 6. DETERMINANTS

{ ∑t+1j=1 aijdj = 0, if 1 ≤ i ≤ t,∑t+1j=i aijdj = det(A[(1, . . . , t, i)|(1, . . . , t, t+ 1)]), if t < i ≤ m.

(6.41)

Let X =

x1...xn

, where

{xi = bdi, if 1 ≤ i ≤ t+ 1,xi = 0, if t+ 2 ≤ i ≤ n.

Then X 6= 0 since xt+1 = b · det(A[α|β]) 6= 0. But Eq. 6.39 and the factthat b ∈ Ann(Ft+1(A)) show that

AX =

b∑t+1

j=1 a1jdj...

b∑t+1

j=1 amjdj

=

0...0

b det(A[(1, . . . , t, t+ 1)|(1, . . . , t, t+ 1)])...

b det(A[(1, . . . , t,m)|(1, . . . , t, t+ 1)])

= 0.

Thus X is a nontrivial solution to the equation AX = 0.

Conversely, assume that X 6= 0 ∈ Mn,1(K) is a nontrivial solution toAX = 0, and choose k with xk 6= 0. We claim that Ann(Fn(A)) 6= {0}. Ifn > m, then Fn(A) = {0}, and hence Ann(Fn(A)) = K 6= (0). Thus wemay assume that n ≤ m. Let α = (1, . . . , n) and for each β ∈ Qn,m, letBβ = A[α|β]. Then since AX = 0 and since each row of Bβ is a full row ofA, we conclude that Bβ = 0. The adjoint matrix formula (Eq. 6.20) thenshows that

(det(Bβ))X = (Adj Bβ)BβX = 0,

from which we conclude that xk det(Bβ) = 0. Since β ∈ Qn,m is arbitrary,we conclude that xk · Fn(A) = 0, i.e., xk ∈ Ann(Fn(A)). But xk 6= 0, soAnn(Fn(A)) 6= {0}, and we conclude that M -rank(A) < n, completing theproof.

Page 103: Linear Algebra by S. E. Payne

6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗ 103

In case K is an integral domain we may replace the M -rank by the ordi-nary determinantal rank to conclude the following:

Corollary 6.4.17. If K is an integral domain and A ∈ Mm,n(K), thenAX = 0 has a nontrivial solution if and only if D-rank(A) < n.

Proof. If I ⊆ K, then Ann(I) 6= {0} if and only if I = {0} since an integraldomain has no nonzero zero divisors. Therefore, in an integral domain D-rank(A) = M -rank(A).

The results for n equations are in n unknowns are even simpler.

Corollary 6.4.18. Let K be a commutative ring with 1.

1. If A ∈ Mn(K), then AX = 0 has a nontrivial solution if and only ifdet(A) is a zero divisor of K.

2. If K is an integral domain and A ∈ Mn(K), then AX = 0 has anontrivial solution if and only if det (A) = 0.

Proof. If A ∈ Mn(K), then Fn(A) = 〈det(A)〉, so M -rank(A) < n if andonly if det(A) is a zero divisor. In particular, if K is an integral domain thenM -rank(A) < n if and only if det(A) = 0.

There are still two other concepts of rank which can be defined for ma-trices with entries in a commutative ring.

Definition Let K be a commutative ring with 1 and let A ∈ Mm,n(K).Then we will define the row rank of A, denoted by row-rank(A), to be themaximum number of linearly independent rows in A, while the column rankof A, denoted col-rank(A), is the maximum number of linearly independentcolumns.

Corollary 6.4.19. Let K be a commutative ring with 1.

1. If A ∈ Am,n(K), then

max{row-rank(A), col-rank(A) ≤M − rank(A) ≤ D-rank(A).

2. If K is an integral domain, then

row-rank(A) = col-rank(A) = M − rank(A) = D-rank(A).

Page 104: Linear Algebra by S. E. Payne

104 CHAPTER 6. DETERMINANTS

Proof. We sketch the proof of the result when K is an integral domain. Infact, it is possible to embed K in its field F of quotients and do all thealgebra in F . Recall the following from an undergraduate linear algebracourse. The proofs work over any field, even if you did only consider themover the real numbers. Let A be an m × n matrix with entries in F . Rowreduce A until arriving at a matrix R in row-reduced echelon form. The firstthing to remember here is that the row space of A and the row space of Rare the same. Similarly, the right null space of A and the right null space ofR are the same. (Warning: the column space of A and that of R usually arenot the same!) So the leading (i.e., leftmost) nonzero entry in each nonzerorow of R is a 1, called a leading 1. Any column with a leading 1 has that1 as its only nonzero entry. The nonzero rows of R form a basis for therow space of A, so the number r of them is the row-rank of A. The (right)null space of R (and hence of A) has a basis of size n − r. Also, one basisof the column space of A is obtained by taking the set of columns of A inthe positions now indicated by the columns of R in which there are leading1’s. So the column rank of A is also r. This has the interesting corollarythat if any r independent columns of A are selected, there must be somer rows of those columns that are linearly independent, so there is an r × rsubmatrix with rank r. Hence this submatrix has determinant different from0. Conversely, if some r×r submatrix has determinant different from 0, thenthe “short” columns of the submatrix must be independent, so the “long”columns of A to which they belong must also be independent. It is now clearthat the row-rank, column-rank, M-rank and determinantal rank of A are allthe same.

Obs. 6.4.20. Since all four ranks of A are the same when K is an integraldomain, in this case we may speak unambiguously of the rank of A, denotedrank(A). Moreover, the condition that K be an integral domain is trulynecessary, as the following example shows.

Define A ∈M4(Z210) by

A =

0 2 3 52 0 6 03 0 3 00 0 0 7

.

It is an interesting exercise to show the following:

1. row-rank(A) = 1.

Page 105: Linear Algebra by S. E. Payne

6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗ 105

2. col-rank(A) = 2.

3. M-rank(A) = 3.

4. D-rank(A) = 4.

Theorem 6.4.21. Let K be a commutative ring with 1, let M be a finitelygenerated K-module, and let S ⊆M be a subset. If |S| > µ(M) = rank(M),then S is K-linearly dependent.

Proof. Let µ(M) = m and let T = {w1, . . . , wm} be a generating set for Mconsisting of m elements. Choose n distinct elements {v1, . . . , vn} of S forsome n > m, which is possible by hypothesis. Since M = 〈w1, . . . , wm〉, wemay write

vj =m∑i=1

aijwi, with aij ∈ K.

Let A = (aij) ∈ Mm,n(K). Since n > m, it follows that M -rank(A)≤ m < n, so Theorem 6.4.16 shows that there is an X 6= 0 ∈ Mn,1(K) suchthat AX = 0. Then

n∑j=1

xjvj =n∑j=1

xj

(m∑i=1

aijwi

)

=m∑i=1

(n∑j=1

aijxj

)wi (6.42)

= 0, since AX = 0.

Therefore, S is K-linearly dependent.

Corollary 6.4.22. Let K be a commutative ring with 1, let M be a K-module, and let N ⊆M be a free submodule. Then rank(N) ≤ rank(M).

Proof. If rank(M) =∞, there is nothing to prove, so assume that rank(M)= m < ∞. If rank(N) > m, then there is a linearly independent subsetof M , namely a basis of N , with more than m elements, which contradictsTheorem 6.4.21.

Theorem 6.4.23. If A ∈Mm,n(K) and B ∈Mn,p(K), then

D-rank(AB) ≤ min{D-rank(A),D-rank(B)}. (6.43)

Page 106: Linear Algebra by S. E. Payne

106 CHAPTER 6. DETERMINANTS

Proof. Let t > min{D-rank(A),D-rank(B)} and suppose that α ∈ Qt,m,β ∈ Qt,p. Then by the Cauchy-Binet formula

det(AB[α|β]) =∑γ∈Qt,n

det(A[α|γ]) det(B[γ|β]).

Since t > min {D-rank (A),D-rank(B)}, at least one of the determinantsdet(A[α|β]) or det(B[γ|β]) must be 0 for each γ ∈ Qt,n. Thus det(AB[α|β]) =0, and since α and β are arbitrary, it follows that D-rank(AB) < t, asrequired.

The preceding theorem has a useful corollary.

Corollary 6.4.24. Let A ∈ Mm,n(K), U ∈ GL(m,K), V ∈ GL(n,K).Then

D-rank(UAV ) = D-rank(A).

Proof. Any matrix B ∈ Mm,n(K) satisfies D-rank(B) ≤min{m,n}. SinceD-rank(U) = m and D-rank(V ) = n, it follows from Eq. 6.41 that

D-rank(UAV ) ≤ min{D-rank(A), n,m} = D-rank(A)

and

D-rank(A) = D-rank(U−1(UAV )V −1) ≤ D-rank(UAV ).

This completes the proof.

6.5 Exercises

Except where otherwise noted, the matrices in these exercises have entriesfrom a commutative ring K with 1.

1. If A =

∗ ∗ ∗0 0 ∗0 0 ∗

, show that no matter how the five elements ∗

might be replaced by elements of K it is always the case that det(A) =0.

Page 107: Linear Algebra by S. E. Payne

6.5. EXERCISES 107

2. Suppose that A is n× n with n odd and that AT = −A. Show that:

(i) If 2 6= 0 in K, then det(A) = 0.

(ii)∗ If 2 = 0 but we assume that each diagonal entry of A equals 0,then det(A) = 0.

3. Let A be an m × n complex matrix and let B be an n × m complexmatrix. Also, for any positive integer p, Ip denotes the p × p identitymatrix. So AB is m×m, while BA is n× n.

Show that the complex number λ 6= 1 is an eigenvalue of Im − AB ifand only if λ is an eigenvalue of In −BA.

4. (Vandermonde Determinant)

Let t1, . . . , tn be commuting indeterminates over K, and let A be then×n matrix whose entries are from the commutative ring K[t1, . . . , tn]defined by

A =

1 t1 t21 · · · tn−1

1

1 t2 t22 · · · tn−12

......

......

...1 tn t2n · · · tn−1

n

.

Then

det A =∏

1≤j<i≤n

(ti − tj).

5. Define the following determinants of matrices whose elements comefrom the ring F [x]:

Page 108: Linear Algebra by S. E. Payne

108 CHAPTER 6. DETERMINANTS

δ1 = −1

δ2 =

∣∣∣∣ 0 −1−1 x

∣∣∣∣δ3 =

∣∣∣∣∣∣0 −1 00 x −1−1 0 x

∣∣∣∣∣∣δ4 =

∣∣∣∣∣∣∣∣0 −1 0 00 x −1 00 0 x −1−1 0 0 x

∣∣∣∣∣∣∣∣Continue in this fashion. δn is the determinant of an n×n matrix eachof whose entries is either 0 or -1 or x according to the following rule.The diagonal entries are all equal to x except for the first diagonalentry which is 0. Each entry along the super diagonal, (i.e., just abovethe main diagonal ) equals -1, as does the entry in the lower left-handcorner. All other entries are 0. Evaluate δn.

6. Suppose that A is square and singular and A~x = ~b is consistent. Showthat Aadj~b = ~0.

7. Define the following determinants:

40 = 1.

41 = |a| = a (a 1× 1 determinant).

42 =

∣∣∣∣ a bb a

∣∣∣∣ = a2 − b2.

43 =

∣∣∣∣∣∣a b 0b a b0 b a

∣∣∣∣∣∣

Page 109: Linear Algebra by S. E. Payne

6.5. EXERCISES 109

In general, 4n is the determinant of an n×n matrix with each diagonalentry equal to a and each entry just above or just below the maindiagonal equal to b, and all other entries equal to 0.

(i) Show that 4n+2 = a4n+1 − b24n.

(ii) Show that 4n =∑

i(−1)i(n−ii

)an−2ib2i.

8. Show that if A is a row vector and B is a column vector then A⊗B =B ⊗ A.

For an m× n matrix A let Ai denote its ith row and Aj denote its jthcolumn.

9. Show that (A⊗B)T = AT ⊗BT .

10. If A and B are invertible, then so is A⊗B, and (A⊗B)−1 = A−1⊗B−1.

11. Suppose that A ∈ M(m,m) and B ∈ M(n,n). Show that there is anmn×mn permutation matrix P for which P (A⊗B)P−1 = B ⊗ A.

12. Let A and B be as in Exercise 11. Show that det(A⊗ B) = det(A)n ·det(B)m.

13. Let Eij be the m× n matrix over F with a 1 in the (i, j) position and0 elsewhere; let Fij be the k × l matrix over F with a 1 in the (i, j)position and 0 elsewhere; let Gij be the mk × nl matrix over F with a1 in the (i, j) position and 0 elsewhere. Then for (d, g) and (h, e) with1 ≤ d ≤ m, 1 ≤ g ≤ n, 1 ≤ h ≤ k, 1 ≤ e ≤ l, we have

Edg ⊗ Fke = G(d−1)k+h,(g−1)l+e.

Show that M(mk,nl)∼= M(m,n) ⊗M(k,l).

—————————–

For the next exercise we need to define a function Vec from the setof m × n matrices into Rmn as follows. If A has columns ~a1, . . . , ~an,each in Rm, then Vec(A) is the column vector obtained by putting the

Page 110: Linear Algebra by S. E. Payne

110 CHAPTER 6. DETERMINANTS

columns of A in one long column starting with ~c1 and proceeding to~cn. As an example,

Vec

(1 2 34 5 6

)=

142536

.

14. Let M be m× n and N be n× p. Then we have

(i) Vec(MN) = (Ip ⊗M)Vec(N).

(ii)Vec(MN) = (NT ⊗ Im)Vec(M).

(iii) If A is m× n, X is n× p and B is p× q, then

Vec(AXB) = (BT ⊗ A)Vec(X).

(iv) If all products are defined,

A1XB1+A2XB2 = C ⇐⇒((BT

1 ⊗ A1) + (BT2 ⊗ A2)

)Vec(X) = Vec(C).

15. Let F be any field. For positive integers m, n, let Sn = (e1, . . . , en),Sm = (h1, . . . , hm) be the standard ordered bases of F n and Fm, re-spectively. For A ∈ Mm,n(F ) there is the linear map TA ∈ L(F n, Fm)defined by TA : x 7→ Ax. Moreover, we showed that A is the uniquematrix [TA]Sm,Sn for which [TA(x)]Sm = [TA]Sm,Sn · [x]Sn for all x ∈ F n.The map A 7→ TA is an isomorphism from Mm,n(F ) onto L(F n, Fm).Let Eij denote the m × n matrix with 1 in positiion (i, j) and zeroelsewhere. Then

Eij · ek = δjk · hi. (6.44)

Define fij ∈ L(F n, Fm) by fij = TEji . So

fij : ek 7→ Eji · ek = δik · hj, 1 ≤ i, k ≤ n; 1 ≤ j ≤ m. (6.45)

Let B1 = (v1, v2, . . . , vmn) be the standard ordered basis of Fmn, andlet B2 be the ordered basis of L(F n, Fm) given by

B2 = (f11, f21, . . . , fn1, f12, f22, . . . , fn2, . . . , fnm).

Page 111: Linear Algebra by S. E. Payne

6.5. EXERCISES 111

Prove the following:

(i) Vec(Eji) = v(i−1)m+j, 1 ≤ i ≤ n, 1 ≤ j ≤ m. Moreover, Vec:Mm,n(F )→ Fmn is a vector space isomorphism.

(ii) If A ∈ Mm,n(F ), then TA ∈ L(F n, Fm), so TA is some linear com-bination of the elements of B2. Hence it has a coordinate matrix withrespect to B2. In fact,

[TA]B2 = Vec(AT ).

Let B ∈Mm,m(F ), C ∈Mn,n. Define TB,C : Mmn, (F )→Mm,n(F ) by

TB,C : A 7→ BAC.

Recall that Vec(BAC) = CT ⊗ B · Vec(A). So if we interpret TB,C asan element of L(Fmn, Fmn), we have TB,C : Vec(A) 7→ CT ⊗B ·Vec(A),i.e., TB,C = TCT⊗B. So we have

TB,C : Fmn → Fmn : Vec(A) 7→ CT ⊗B · Vec(A).

(iii) [TB,C ]B1,B1 = CT ⊗B.

Define gij : Fmn → Fmn : vk 7→ δik · vj, 1 ≤ i, j ≤ mn. Put G = (gij)and B3 = (Vec(G))T = (g11, g21, . . . , gmn,1, . . . , g1,mn, . . . , gmn,mn). Then

(iv) [TB,C ]B3 = [TCT⊗B]B3 = Vec((C ⊗BT ).

Page 112: Linear Algebra by S. E. Payne

112 CHAPTER 6. DETERMINANTS

Page 113: Linear Algebra by S. E. Payne

Chapter 7

Operators and InvariantSubspaces

Throughout this chapter F will be an arbitrary field unless otherwise re-stricted, and V will denote an arbitrary vectors space over F . Much of ourwork will require that V be finite dimensional, but we shall be general aslong as possible. Given an operator T ∈ L(V ), our main interest will be infinding T -invariant subspaces U1, . . . , Ur such that V = U1 ⊕ U2 ⊕ · · · ⊕ Ur.Since each Ui is invariant under T , we may consider the restriction Ti = T |Uiof T to the subspace Ui. Then we write T = T1⊕· · ·⊕Tr. It is usually easierto analyze T by writing it as the sum of the operators Ti and then analyzingthe operators Ti on the smaller subspaces Ui.

7.1 Eigenvalues and Eigenvectors

Let T ∈ L(V ). Then {~0}, V, null(T ), and Im(T ) are invariant under T .However, often these are not especially interesting as invariant subspaces,and we want to begin with 1-dimensional T -invariant subspaces.

Suppose that U is a 1-dimensional T -invariant subspace. Then thereexists some nonzero vector u ∈ U . Since T (u) ∈ U , there is some scalara ∈ F such that T (u) = au. By hypothesis (u) is a basis for U . If bu is anyvector in U , then T (bu) = bT (u) = b(au) = a(bu). Hence for each v ∈ U ,T (v) = av. If a ∈ F satisfies the property that there is some nonzero vectorv ∈ V such that T (v) = av, then a is called an eigenvalue of T . A vectorv ∈ V is called an eigenvector of T belonging to the eigenvalue λ provided


T(v) = λv, i.e., (T − λI)(v) = ~0. Note that the eigenvalue λ could be the zero scalar, but it is an eigenvalue if and only if there is a nonzero eigenvector belonging to it. For this reason many authors restrict all eigenvectors to be nonzero. However, it is convenient to include ~0 in the set of eigenvectors belonging to any particular eigenvalue, so that the set of all eigenvectors belonging to some eigenvalue is a subspace, in fact a T-invariant subspace.

We have the following:

Obs. 7.1.1. If λ is an eigenvalue of T , then null(T −λI) is the T -invariantsubspace of V consisting of all eigenvectors belonging to λ.

Consider the example T ∈ L(F^2) defined by T(y, z) = (2z, −y). Then T(y, z) = λ(y, z) if and only if 2z = λy and −y = λz, so 2z = λ(−λz). If (y, z) ≠ (0, 0), then λ^2 = −2. If F = R, for example, then T has no eigenvalue. However, if F is algebraically closed, for example, then T has two eigenvalues ±λ, where λ is one of the two solutions to λ^2 = −2. If F = Z5, then −2 = 3 is a non-square in F, so T has no eigenvalues. But if F = Z11, then λ = ±3 satisfies λ^2 = −2 in F.
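For the algebraically closed case one can see this numerically. The following sketch (which assumes Python with the numpy library, and is only an illustration, not part of the text) computes the eigenvalues of the matrix of T over C:

    import numpy as np

    # Matrix of T(y, z) = (2z, -y) with respect to the standard basis of C^2.
    A = np.array([[0.0, 2.0],
                  [-1.0, 0.0]])
    print(np.linalg.eigvals(A))   # approximately [0.+1.41421356j, 0.-1.41421356j], the roots of lambda^2 = -2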

Nonzero eigenvectors belonging to distinct eigenvalues are linearly inde-pendent.

Theorem 7.1.2. Let T ∈ L(V) and suppose that λ1, . . . , λm are distinct eigenvalues of T with corresponding nonzero eigenvectors v1, . . . , vm. Then the list (v1, . . . , vm) is linearly independent.

Proof. Suppose (v1, . . . , vm) is linearly dependent. By the Linear Dependence Lemma we may let j be the smallest positive integer for which (v1, . . . , vj) is linearly dependent, so that vj ∈ span(v1, . . . , vj−1). So there are scalars a1, . . . , aj−1 for which

vj = a1v1 + · · ·+ aj−1vj−1. (7.1)

Apply T to both sides of this equation to obtain

λjvj = a1λ1v1 + a2λ2v2 + · · ·+ aj−1λj−1vj−1.

Multiply both sides of Eq. 7.1 by λj and subtract the equation above fromit. This gives

~0 = a1(λj − λ1)v1 + · · ·+ aj−1(λj − λj−1)vj−1.


Because j was chosen to be the smallest integer for which

vj ∈ span(v1, . . . , vj−1)

we now have that (v1, . . . , vj−1) is linearly independent. Since the λ's were all distinct, this means that a1 = · · · = aj−1 = 0, implying that vj = ~0 (by Eq. 7.1), contradicting our hypothesis that all the vi's are nonzero. Hence our assumption that (v1, . . . , vj) is linearly dependent must be false.

Since each linearly independent set of a finite dimensional space has nomore elements than the dimension of that space, we obtain the followingresult.

Corollary 7.1.3. If dim(V ) = n <∞, then for any T ∈ L(V ), T can neverhave more than n distinct eigenvalues.

7.2 Upper-Triangular Matrices

In Chapter 5 we applied polynomials to elements of some linear algebra overF . In particular, if T ∈ L(V ) and f, g, h ∈ F [x] with f = gh, then byTheorem 5.2.5 we know that f(T ) = g(T )h(T ).

Theorem 7.2.1. Let V be a nonzero, finite dimensional vector space over thealgebraically closed field F . Then each operator T on V has an eigenvalue.

Proof. Suppose dim(V) = n > 0 and choose a nonzero vector v ∈ V. Let T ∈ L(V). Then the set

(v, T(v), T^2(v), . . . , T^n(v))

of n + 1 vectors in an n-dimensional space cannot be linearly independent. So there must be scalars, not all zero, such that ~0 = a0v + a1T(v) + a2T^2(v) + · · · + anT^n(v). Let m be the largest index such that am ≠ 0. Since v ≠ ~0, the coefficients a1, . . . , an cannot all be 0, so 0 < m ≤ n. Use the ai's to construct a polynomial which can be written in factored form as

a0 + a1z + a2z^2 + · · · + amz^m = c(z − λ1)(z − λ2) · · · (z − λm),


where c ∈ F is nonzero, each λj ∈ F , and the equation holds for all z ∈ F .We then have

~0 = a0v + a1T(v) + · · · + amT^m(v)
   = (a0I + a1T + · · · + amT^m)(v)                                  (7.2)
   = c(T − λ1I)(T − λ2I) · · · (T − λmI)(v).

If (T − λmI)(v) = ~0, then λm must be an eigenvalue. If it is not the zero vector, consider (T − λm−1I)(T − λmI)(v). If this is zero, then λm−1 must be an eigenvalue. Proceeding this way, we see that the first λj (i.e., with the largest j) for which (T − λjI)(T − λj+1I) · · · (T − λmI)(v) = ~0 must be an eigenvalue.

Theorem 7.2.2. Suppose T ∈ L(V ) and B = (v1, . . . , vn) is a basis of V .Then the following are equivalent:

(i) [T]B is upper triangular.
(ii) T(vk) ∈ span(v1, . . . , vk) for each k = 1, . . . , n.
(iii) span(v1, . . . , vk) is T-invariant for each k = 1, . . . , n.

Proof. By now the proof of this result should be clear to the reader.

Theorem 7.2.3. Let F be an algebraically closed field, let V be an n-dimensional vector space over F with n ≥ 1, and let T ∈ L(V ). Thenthere is a basis B for V such that [T ]B is upper triangular.

Proof. We use induction on the dimension of V. Clearly the theorem is true if n = 1. So suppose that n > 1 and that the theorem holds for all vector spaces over F whose dimension is a positive integer less than n. Let λ be any eigenvalue of T (which we know must exist by Theorem 7.2.1). Let

U = Im(T − λI).

Because T −λI is not injective, it is also not surjective, so dim(U) < n =dim(V ). If u ∈ U , then

T (u) = (T − λI)(u) + λu.

Obviously (T − λI)(u) ∈ U (from the definition of U) and λu ∈ U . Thusthe equation above shows that T (u) ∈ U , hence U is T -invariant. ThusT |U ∈ L(U). By our induction hypothesis, there is a basis (u1, . . . , um) of U


with respect to which T |U has an upper triangular matrix. Thus for each jwe have (using Theorem 7.2.2)

T (uj) = (T |U)(uj) ∈ span(u1, . . . , uj). (7.3)

Extend (u1, . . . , um) to a basis (u1, . . . , um, v1, . . . , vr) of V . For each k,1 ≤ k ≤ r, we have

T (vk) = (T − λI)(vk) + λvk.

The definition of U shows that (T − λI)(vk) ∈ U = span(u1, . . . , um).Clearly

T (vk) ∈ span(u1, . . . , um, v1, . . . , vk).

It is now clear that T has an upper triangular matrix with respect to the basis (u1, . . . , um, v1, . . . , vr).

Obs. 7.2.4. Suppose T ∈ L(V ) has an upper triangular matrix with respectto some basis B of V . Then T is invertible if and only if all the entries onthe diagonal of that upper triangular matrix are nonzero.

Proof. We know that T is invertible if and only if [T ]B is invertible, whichby Result 6.3.2 and Theorem 6.3.7 is invertible if and only if the diagonalentries of [T ]B are all nonzero.

Corollary 7.2.5. Suppose T ∈ L(V ) has an upper triangular matrix withrespect to some basis B of V . Then the eigenvalues of T consist precisely ofthe entries on the diagonal of [T ]B. (Note: With a little care the diagonalof the upper triangular matrix can be arranged to have all equal eigenvaluesbunched together. To see this, rework the proof of Theorem 7.2.3.)

Proof. Suppose the diagonal entries of the n × n upper triangular matrix [T]B are λ1, . . . , λn. Let λ ∈ F. Then [T − λI]B is upper triangular with diagonal elements equal to λ1 − λ, λ2 − λ, . . . , λn − λ. Hence T − λI is not invertible if and only if λ equals one of the λj's. In other words, λ is an eigenvalue of T if and only if λ equals one of the λj's, as desired.

Obs. 7.2.6. Let B = (v1, . . . , vn) be a basis for V . An operator T ∈ L(V )has a diagonal matrix diag(λ1, . . . , λn) with respect to B if and only if T (vi) =λivi, i.e., each vector in B is an eigenvector of T .


In some ways, the nicest operators are those which are diagonalizable, i.e., those for which there is some basis with respect to which they are represented by a diagonal matrix. But this is not always the case even when the field F is algebraically closed. Consider the following example over the complex numbers. Define T ∈ L(C^2) by T(y, z) = (z, 0). As you should verify, if S is the standard basis of C^2, then

[T]S = ( 0  1 )
       ( 0  0 ),

so the only eigenvalue of T is 0. But null(T − 0I) is 1-dimensional. So clearly C^2 does not have a basis consisting of eigenvectors of T.
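The failure is easy to see numerically as well. A small sketch (assuming Python with numpy; purely illustrative):

    import numpy as np

    # The operator T(y, z) = (z, 0) from the example above, as a matrix.
    A = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
    print(np.linalg.eigvals(A))             # [0. 0.] -- the only eigenvalue is 0
    print(2 - np.linalg.matrix_rank(A))     # dim null(A - 0I) = 1, so no eigenvector basis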

One of the recurring themes in linear algebra is that of obtaining conditions that guarantee that some operator has a diagonal matrix with respect to some basis.

Theorem 7.2.7. If dim(V ) = n and T ∈ L(V ) has exactly n distinct eigen-values, then T has a diagonal matrix with respect to some basis.

Proof. Suppose that T has distinct eigenvalues λ1, . . . , λn, and let vj be anonzero eigenvector belonging to λj, 1 ≤ j ≤ n. Because nonzero eigenvec-tors corresponding to distinct eigenvalues are linearly independent, (v1, . . . , vn)is linearly independent, and hence a basis of V . So with respect to this basisT has a diagonal matrix.

The following proposition gathers some of the necessary and sufficientconditions for an operator T to be diagonalizable.

Theorem 7.2.8. Let λ1, . . . , λm denote the distinct eigenvalues of T ∈ L(V ).Then the following are equivalent:

(i) T has a diagonal matrix with respect to some basis of V.
(ii) V has a basis consisting of eigenvectors of T.
(iii) There exist one-dimensional T-invariant subspaces U1, . . . , Un of V such that V = U1 ⊕ · · · ⊕ Un.
(iv) V = null(T − λ1I) ⊕ · · · ⊕ null(T − λmI).
(v) dim(V) = dim(null(T − λ1I)) + · · · + dim(null(T − λmI)).

Proof. At this stage it should be clear to the reader that (i), (ii) and (iii) areequivalent and that (iv) and (v) are equivalent. At least you should thinkabout this until the equivalences are quite obvious. We now show that (ii)and (iv) are equivalent.


Suppose that V has a basis B = (v1, . . . , vn) consisting of eigenvectors of T. We may group together those vi's belonging to the same eigenvalue, say v1, . . . , vd1 belong to λ1 and so span a subspace U1 of null(T − λ1I); vd1+1, . . . , vd1+d2 belong to λ2 and so span a subspace U2 of null(T − λ2I); . . . ; and the last dm of the vi's belong to λm and span a subspace Um of null(T − λmI). Since B spans all of V, it is clear that V = U1 + · · · + Um. Since the subspaces of eigenvectors belonging to distinct eigenvalues are independent (an easy corollary of Theorem 7.1.2), it must be that V = U1 ⊕ · · · ⊕ Um. Hence each Ui is all of null(T − λiI).

Conversely, if V = U1⊕· · ·⊕Um, by joining together bases of the Ui’s weget a basis of V consisting of eigenvectors of T .

There is another handy condition that is necessary and sufficient for anoperator on a finite-dimensional vector space to be diagonalizable. Beforegiving it we need the following lemma.

Lemma 7.2.9. Let T ∈ L(V) where V is an n-dimensional vector space over the algebraically closed field F. Suppose that λ ∈ F is an eigenvalue of T. Then λ must be a root of the minimal polynomial p(x) of T.

Proof. Let v be a nonzero eigenvector of T associated with λ. It is easy to verify that p(T)(v) = p(λ) · v. But p(T) is the zero operator, so p(T)(v) = 0 = p(λ) · v, and since v ≠ ~0 this implies that p(λ) = 0.

Theorem 7.2.10. Let V be an n-dimensional vector space over the alge-braically closed field F , and let T ∈ L(V ). Then T is diagonalizable if andonly if the minimal polynomial for T has no repeated root.

Proof. Suppose that T ∈ L(V) is diagonalizable. This means that there is a basis B of V such that [T]B is diagonal, say it is the matrix diag(λ1, . . . , λn) = A. If f(x) is any polynomial over F, then f(A) = diag(f(λ1), . . . , f(λn)). So if λ1, . . . , λr are the distinct eigenvalues of T, and if p(x) = (x − λ1)(x − λ2) · · · (x − λr), then p(A) = 0, so p(T) = 0. Hence the minimal polynomial has no repeated roots (since it must divide p(x)).

The converse is a bit more complicated. Suppose that

p(x) = (x− λ1) · · · (x− λr)

is the minimal polynomial of T with λ1, . . . , λr distinct. By the precedinglemma we know that all the eigenvalues of T appear among λ1, . . . , λr. Let


Wi = null(T − λiI) for 1 ≤ i ≤ r. We proved earlier that W1 + · · · + Wr = W1 ⊕ · · · ⊕ Wr. What we need at this point is to show that W1 ⊕ · · · ⊕ Wr is all of V. For 1 ≤ j ≤ r put

pj(x) = p(x)/(x − λj).

Note that p′(x) = Σ_{j=1}^{r} pj(x), so that p′(λi) = ∏_{k≠i}(λi − λk). Also, pj(λi) = 0 if i ≠ j, and pj(λi) = p′(λi) if i = j. Use the partial fraction technique to write

1/p(x) = Σ_{i=1}^{r} ai/(x − λi).

The usual process shows that ai = 1/p′(λi), so we find

1 = Σ_{i=1}^{r} (1/p′(λi)) pi(x).

For each i, 1 ≤ i ≤ r, put Pi = (1/p′(λi)) pi(T), so I = P1 + P2 + · · · + Pr. Since p(x) = (x − λi)pi(x), we have (T − λiI)pi(T)(v) = 0 for all v ∈ V. This means that Pi(v) ∈ Wi for all v ∈ V. Moreover, for each v ∈ V, v = P1(v) + P2(v) + · · · + Pr(v) ∈ W1 ⊕ · · · ⊕ Wr. This says V is the direct sum of the eigenspaces of T. Hence if we choose any basis for each Wi and take their union, we have a basis B for V consisting of eigenvectors of T, so that [T]B is diagonal.
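The projections Pi constructed in this proof are easy to compute for a concrete matrix. The following sketch assumes Python with numpy; the matrix Q and the eigenvalues are my own illustrative choices, so that the minimal polynomial of T is (x − 2)(x − 5).

    import numpy as np

    Q = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])                     # an invertible change-of-basis matrix
    T = Q @ np.diag([2.0, 2.0, 5.0]) @ np.linalg.inv(Q)
    lams = [2.0, 5.0]                                   # distinct roots of the minimal polynomial
    I = np.eye(3)

    projections = []
    for i, li in enumerate(lams):
        P = I.copy()
        for j, lj in enumerate(lams):
            if j != i:
                P = P @ (T - lj * I) / (li - lj)        # builds p_i(T)/p'(lambda_i)
        projections.append(P)

    print(np.allclose(sum(projections), I))             # P_1 + ... + P_r = I
    print(all(np.allclose(T @ P, lam * P) for lam, P in zip(lams, projections)))
    # each P_i maps V into the eigenspace W_i = null(T - lambda_i I)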

Corollary 7.2.11. Let T ∈ L(V ) be diagonalizable. Let U be a T -invariantsubspace of V . Then T |U is diagonalizable.

Proof. Let p(x) be the minimal polynomial for T . If u ∈ U , then p(T )(u) = 0.So p(T )|U is the zero operator on U , implying that p(x) is a multiple of theminimal polynomial for T |U . Since T is diagonalizable its minimal polynomialhas no repeated roots. Hence the minimal polynomial for T |U can have norepeated roots, implying that T |U is diagonalizable.

7.3 Invariant Subspaces of Real Vector Spaces

We have seen that if V is a finite dimensional vector space over an alge-braically closed field F then each linear operator on V has an eigenvalue in


F . We have also seen an example that shows that this is not the case for realvector spaces. This means that an operator on a nonzero finite dimensionalreal vector space may have no invariant subspace of dimension 1. However,we now show that an invariant subspace of dimension 1 or 2 always exists.

Theorem 7.3.1. Every operator on a finite dimensional, nonzero, real vectorspace has an invariant subspace of dimension 1 or 2.

Proof. Suppose V is a real vector space with dim(V) = n > 0 and let T ∈ L(V). Choose v ∈ V with v ≠ ~0. Since (v, T(v), . . . , T^n(v)) must be linearly dependent (Why?), there are scalars a0, a1, . . . , an ∈ R such that not all the ai's are zero and

~0 = a0v + a1T(v) + · · · + anT^n(v).

Construct the polynomial f(x) = Σ_{i=0}^{n} ai x^i, which can be factored in the form

f(x) = c(x − λ1) · · · (x − λr)(x^2 + α1x + β1) · · · (x^2 + αkx + βk),

where c is a nonzero real number, each λj, αj, and βj is real, r + k ≥ 1, αj^2 < 4βj, and the equation holds for all x ∈ R. We then have

0 = a0v + a1T(v) + · · · + anT^n(v)
  = (a0I + a1T + · · · + anT^n)(v)                                   (7.4)
  = c(T − λ1I) · · · (T − λrI)(T^2 + α1T + β1I) · · · (T^2 + αkT + βkI)(v),

which means that T − λjI is not injective for at least one j or that T^2 + αjT + βjI is not injective for at least one j. If T − λjI is not injective for some j, then T has an eigenvalue and hence a one-dimensional invariant subspace. If T^2 + αjT + βjI is not injective for some j, then we can find a nonzero vector w for which

T^2(w) + αjT(w) + βjw = ~0.                                          (7.5)

Using Eq. 7.5 it is easy to show that span(w, T(w)) is T-invariant and it clearly has dimension 1 or 2. If it had dimension 1, then w would be an eigenvector of T belonging to an eigenvalue λ that would have to be a root of x^2 + αjx + βj = 0, contradicting the assumption that αj^2 < 4βj. Hence T has a 2-dimensional invariant subspace.

In fact we now have an easy proof of the following:


Theorem 7.3.2. Every operator on an odd-dimensional real vector space hasan eigenvalue.

Proof. Let V be a real vector space with dim(V ) odd and let T ∈ L(V ). Weknow that the eigenvalues of T are the roots of the characteristic polynomialof T which has real coefficients and degree equal to dim(V ). But every realpolynomial of odd degree has a real root by Lemma 1.1.2, i.e., T has a realeigenvalue.

7.4 Two Commuting Linear Operators

Let V be a finite-dimensional vector space over the field F, and let T be a linear operator on V. Suppose T has matrix

A = (  0  1 )
    ( −1  0 )

with respect to some basis. If i is an element in F (or in some extension of F) for which i^2 = −1, then the eigenvalues of T are ±i. So T has eigenvectors in V if and only if i ∈ F. For example, if F = R, then T has no eigenvectors. To avoid having to deal with this kind of situation we assume from now on that F is algebraically closed, so that each polynomial that has coefficients in F splits into linear factors over F. In particular, any linear operator T on V will have minimal and characteristic polynomials that split into linear factors over F. Our primary example of an algebraically closed field is the field C of complex numbers.

Recall that the ring F[x] of polynomials in the indeterminate x with coefficients from F is a principal ideal domain. This means that if I is an ideal of F[x], it must consist of all multiples of some particular element of F[x]. Our chief example is the following: Let T be any linear operator on V, let W be a T-invariant subspace of V, and let v be any vector in V. Put T(v, W) = {f(x) ∈ F[x] : f(T)(v) ∈ W}. It is easy to show that T(v, W) is an ideal of F[x]. (This just means that the sum of any two polynomials in T(v, W) is also in T(v, W), and if f(x) ∈ T(v, W) and g(x) is any polynomial in F[x], then the product f(x)g(x) is back in T(v, W).) Hence there is a unique monic polynomial g(x) of minimal degree in T(v, W) called the T-conductor of v into W. For this conductor g(x) it is true that f(x) ∈ T(v, W) if and only if there is some h(x) ∈ F[x] for which f(x) = g(x) · h(x). If W = {~0}, then g(x) is called the T-annihilator of v. Clearly the minimal polynomial p(x) of T is in T(v, W), so g(x) divides p(x). All


these polynomials have coefficients in F , so by hypothesis they all split intolinear factors over F .

The fact that p(x) divides any polynomial q(x) for which q(T ) = 0 isquite important. Here is an example. Suppose W is a subspace invariantunder T , so the restriction T |W of T to vectors in W is a linear operator onW . Let g(x) be the minimal polynomial for T |W . Since p(T ) = 0, clearlyp(T |W ) = 0, so g(x) divides p(x).

Theorem 7.4.1. Let V be a finite-dimensional vector space over (the alge-braically closed) field F , with n = dim(V ) ≥ 1. Let S and T be commutinglinear operators on V . Then every eigenspace of T is invariant under S, andS and T have a common eigenvector in V .

Proof. Since dim(V ) ≥ 1 and F is algebraically closed, T has an eigenvectorv1 in V with associated eigenvalue c ∈ F , i.e., 0 6= v1 ∈ V and (T − cI)(v1) =0. Put W = {v ∈ V : (T − cI)(v) = 0}, so W is the eigenspace of Tassociated with the eigenvalue c.

To see that W is invariant under S let w ∈ W , so T (w) = cw. Then sinceS commutes with T , it also commutes with T − cI, and (T − cI)(S(w)) =[(T − cI)S](w) = [S(T − cI)](w) = S((T − cI)(w)) = S(0) = 0. This saysthat S(w) is in W , so S acts on W , which has dimension at least 1. Butthen S|W is a linear operator on W and must have an eigenvector w1 in W .So w1 is a common eigenvector of S and T .

Note: In the above proof we do not claim that every element of W is aneigenvector of S. Also, S and T play symmetric roles in the above proof, sothat also each eigenspace of S is invariant under T .

We need to use the concept of quotient space. Let W be a subspace ofV , where V is any vector space over a field F . For each v ∈ V , the setv+W = {v+w : w ∈ W} is called a coset of the subgroup W in the additivegroup (V,+), and these cosets form a group V/W (called a quotient group)with the following binary operation: (v1 +W ) + (v2 +W ) := (v1 + v2) +W .From Modern Algebra we know that the quotient group (V/W,+) is also anabelian group (Think about the different roles being played by the symbol“+” here.) We can make this quotient group V/W into a vector space overthe same field by defining a scalar multiplication as follows: for c ∈ F andv+W ∈ V/W , put c(v+W ) = cv+W . It is routine to show that this makes


V/W into a vector space. Moreover, if B1 = (v1, . . . , vr) is a basis for W, and B2 = (v1, . . . , vr, vr+1, . . . , vn) is a basis for V, then (vr+1 + W, . . . , vn + W) is a basis for V/W. (Be sure to check this out!!) Hence dim(V) = dim(W) + dim(V/W). Moreover, if W is invariant under the operator T ∈ L(V), then T induces a linear operator T̄ on V/W as follows:

T̄ : V/W → V/W : v + W ↦ T(v) + W.

Clearly if T, S ∈ L(V), if W is invariant under both S and T, and if S · T = T · S, then the induced operators satisfy T̄ · S̄ = S̄ · T̄.

We are now ready for the following theorem on two commuting operators.

Theorem 7.4.2. Let S and T be two commuting operators on V over the algebraically closed field F. Then there is a basis B of V with respect to which both [T]B and [S]B are upper triangular.

Proof. We proceed by induction on n, the dimension of V. By the previous theorem there is a vector v1 ∈ V such that Tv1 = λ1v1 and Sv1 = µ1v1 for some scalars λ1 and µ1. Let W be the subspace spanned by v1. Then the dimension of V/W is n − 1, and the induced operators T̄ and S̄ on V/W commute, so by our induction hypothesis there is a basis B1 = (v2 + W, v3 + W, . . . , vn + W) of V/W with respect to which both T̄ and S̄ have upper triangular matrices. It follows that B = (v1, v2, . . . , vn) is a basis of V with respect to which both T and S have upper triangular matrices.

Theorem 7.4.3. Let T be a diagonalizable operator on V and let S ∈ L(V ).Then ST = TS if and only if each eigenspace of T is invariant under S.

Proof. Since T is diagonalizable, there is a basis B of V consisting of eigen-vectors of T . Let W be the eigenspace of T associated with the eigenvalueλ, and let w ∈ W . If S(w) ∈ W , then (TS)(w) = T (Sw) = λSw =S(λw) = S(Tw) = ST (w). So if each eigenspace of T is invariant un-der S, S and T commute at each element of B, implying that ST = TSon all of V . Conversely, suppose that ST = TS. Then for any w ∈ W ,T (Sw) = S(Tw) = S(λw) = λS(w), implying that Sw ∈ W . So eacheigenspace of T must be invariant under S.

Note that even if T is diagonalizable and S commutes with T , it need notbe the case that S must be diagonalizable. For example, if T = I, then V isthe only eigenspace of T , and if S is any non-diagonalizable operator on V ,


then S still commutes with T . However, if both T and S are known to bediagonalizable, then we can say a bit more.

Theorem 7.4.4. Let S and T both be diagonalizable operators on the n-dimensional vector space V over the field F . Then S and T commute if andonly if S and T are simultaneously diagonalizable.

Proof. First suppose that S and T are simultaneously diagonalizable. Let Bbe a basis of V with respect to which both [T ]B and [S]B are diagonal. Sincediagonal matrices commute, [T ]B and [S]B commute, implying that T and Scommute.

Conversely, suppose that T and S commute. We proceed by induction on n. If n = 1, then every matrix is 1 × 1 and hence diagonal, so there is nothing to prove. So assume that 1 < n and the result holds over all vector spaces of dimension less than n. If T is a scalar times the identity operator, then clearly any basis that diagonalizes S will diagonalize both S and T. So suppose T is not a scalar times the identity. Let λ be an eigenvalue of T and put W = null(T − λI). So W is the eigenspace of T associated with λ, and by hypothesis 1 ≤ dim(W) < n. By Theorem 7.4.3 W is invariant under S. It follows that T|W and S|W are both diagonalizable by Corollary 7.2.11, and hence by the induction hypothesis the two are simultaneously diagonalizable. Let BW be a basis for W which consists of eigenvectors of both T|W and S|W, so they are also eigenvectors of both T and S. Repeat this process for each eigenvalue of T and let B be the union of all the bases of the various eigenspaces. Then the matrices [T]B and [S]B are both diagonal.

Theorem 7.4.5. Let T be a diagonalizable operator on V , an n-dimensionalvector space over F . Let S ∈ L(V ). Then there is a polynomial f(x) ∈ F [x]such that S = f(T ) if and only if each eigenspace of T is contained in asingle eigenspace of S.

Proof. First suppose that S = f(T ) for some f(x) ∈ F [x]. If T (v) = λv,then S(v) = f(T )(v) = f(λ)v. Hence the entire eigenspace of T associatedwith λ is contained in the eigenspace of S associated with its eigenvalue f(λ).This completes the proof in one direction.

For the converse, let λ1, . . . , λr be the distinct eigenvalues of T , and letWi = null(T − λiI), the eigenspace of T associated with λi. Let µi be


the eigenvalue of S whose corresponding eigenspace contains Wi. Note that the values µ1, . . . , µr might not be distinct. Use Lagrange interpolation to construct the polynomial f(x) ∈ F[x] for which f(λi) = µi, 1 ≤ i ≤ r. Then define an operator S′ ∈ L(V) as follows. Let B be a basis of V consisting of the union of bases of the eigenspaces Wi. For each v ∈ B, say v ∈ Wi, put S′(v) = f(T)(v) = f(λi)v = µiv = S(v). Then since S′ and S agree on a basis of V, they must be the same operator. Hence S = f(T). (Here we have used the observation that if S′(v) = f(T)(v) for each v in some basis of V, then S′ = f(T).)
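The construction in this proof can be carried out numerically. The sketch below assumes Python with numpy; the matrix Q and the eigenvalue lists are my own example (T has distinct eigenvalues and S is diagonal in the same basis), and np.polyfit is used only as a convenient way to obtain the Lagrange interpolating polynomial.

    import numpy as np

    Q = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])                 # any invertible matrix will do
    lams = np.array([1.0, 2.0, 4.0])                # eigenvalues of T (distinct)
    mus = np.array([7.0, 7.0, -3.0])                # corresponding eigenvalues of S
    T = Q @ np.diag(lams) @ np.linalg.inv(Q)
    S = Q @ np.diag(mus) @ np.linalg.inv(Q)

    coeffs = np.polyfit(lams, mus, deg=len(lams) - 1)   # f with f(lambda_i) = mu_i
    fT = np.zeros_like(T)
    for c in coeffs:                                    # evaluate f at T by Horner's rule
        fT = fT @ T + c * np.eye(3)

    print(np.allclose(S @ T, T @ S))                # S and T commute
    print(np.allclose(fT, S))                       # f(T) = S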

Corollary 7.4.6. Let T ∈ L(V) have n distinct eigenvalues where n = dim(V). Then the following are equivalent:
(i) ST = TS.
(ii) S and T are simultaneously diagonalizable.
(iii) S is a polynomial in T.

Proof. Since T has n distinct eigenvalues, its minimal polynomial has norepeated factors, so T is diagonalizable. Then Theorem 7.4.3 says that ST =TS iff each eigenspace of T is invariant under S. Since each eigenspace of Tis 1-dimensional, this means that each eigenvector of T is also an eigenvectorof S. Using Theorem 7.4.5 we easily see that the theorem is completelyproved.

We note the following example: If T is any invertible operator, so the con-stant term of its minimal polynomial is not zero, it is easy to use the minimalpolynomial to write I as a polynomial in T . However, any polynomial in Iis just some constant times I. So if T is any invertible operator that is not ascalar times I, then I is a polynomial in T , but T is not a polynomial in I.

7.5 Commuting Families of Operators∗

Let V be an n-dimensional vector space over F , and let F be a family of linearoperators on V . We want to know when we can simultaneously triangularizeor diagonalize the operators in F , i.e., find one basis B such that all of thematrices [T ]B for T ∈ F are upper triangular, or they are all diagonal. Inthe case of diagonalization, it is necessary that F be a commuting family of


operators: UT = TU for all T, U ∈ F . That follows from the fact that alldiagonal matrices commute. Of course, it is also necessary that each operatorin F be a diagonalizable operator. In order to simultaneously triangularize,each operator in F must be triangulable. It is not necessary that F bea commuting family; however, that condition is sufficient for simultaneoustriangulation as long as each T in F can be individually triangulated.

The subspace W of V is invariant under (the family of operators) F ifW is invariant under each operator in F .

Suppose that W is a subspace that is T -invariant for some T ∈ L(V ).Recall the definition of the T -conductor of v into W . It is the set {f(x) ∈F [x] : f(T )(v) ∈ W}. We saw that since F [x] is a PID there must be a monicpolynomial g(x) of minimal degree in this ideal such that f(x) belongs to thisideal if and only if f(x) = g(x)h(x) for some h(x) ∈ F [x]. This polynomialg(x) was called the T -conductor of v into W . Since ~0 ∈ W , it is clear thatg(x) divides the minimal polynomial for T .

Lemma 7.5.1. Let F be a commuting family of triangulable linear operatorson V . Let W be a proper subspace of V which is invariant under F . Thereexists a vector v ∈ V such that:

(a) v is not in W ;(b) for each T in F , the vector T (v) is in the subspace spanned by v and

W .

Proof. Since the space of all linear operators on V is a vector space withdimension n2, it is easy to see that without loss of generality we may assumethat F has only finitely many operators. For let {T1, . . . , Tr} be a maxi-mal linearly independent subset of F , i.e., a basis for the subspace of L(V )spanned by F . If v is a vector such that (b) holds for each Ti, then (b) holdsfor each each operator which is a linear combination of T1, . . . , Tr.

First we establish the lemma for one operator T . To do this we need toshow that there is some vector v ∈ (V \W ) for which the T -conductor of vinto W is a linear polynomial. Since T is triangulable, its minimal polynomialp(x), as well as its characteristic polynomial, factor over F into a product oflinear factors. Say p(x) = (x − c1)e1(x − c2)e2 · · · (x − cr)er . Let w be anyvector of V that is not in W , and let g(x) be the T -conductor of w into W .Then g divides the minimal polynomial for T . Since w is not in W , g is notconstant. So g(x) = (x − c1)f1(x − c2)f2 · · · (x − cr)fr where at least one ofthe integers fi is positive. Choose j so that fj > 0. Then (x − cj) dividesg(x):


g = (x− cj)h(x).

If h(x) = 1 (it must be monic in any case), then w is the vector we seek. Bydefinition of g, if h(x) has degree at least 1, then the vector v = h(T )(w)cannot be in W . But

(T − cjI)(v) = (T − cjI)h(T )(w) (7.6)

= g(T )(w) ∈ W.

Now return to thinking about the family F . By the previous paragraphwe can find a vector v1 not in W and a scalar c1 such that (T1−c1I)(v1) ∈ W .Let V1 be the set of all vectors v ∈ V such that (T1 − c1I)(v) ∈ W . Then V1

is a subspace of V that properly contains W . Since T ∈ F commutes withT1 we have

(T1 − c1I)(T (v)) = T (T1 − c1I)(v).

If v ∈ V1, then (T1− c1I)v ∈ W . Since W is invariant under each T in F , wehave T (T1 − c1I)(v) ∈ W i.e., Tv ∈ V1, for all v ∈ V1 and all T ∈ F .

Note again that W is a proper subspace of V1. Put U2 = T2|V1, the operator obtained by restricting T2 to the subspace V1. The minimal polynomial for U2 divides the minimal polynomial for T2. By the second paragraph of this proof applied to U2 and the invariant subspace W, there is a vector v2 ∈ V1 but not in W, and a scalar c2 such that (T2 − c2I)(v2) ∈ W. Note that

(a) v2 ∉ W;
(b) (T1 − c1I)(v2) ∈ W;
(c) (T2 − c2I)(v2) ∈ W.

Let V2 = {v ∈ V1 : (T2 − c2I)(v) ∈ W}. Then V2 is invariant under F .Apply the same ideas to U3 = T3|V2 . Continuing in this way we eventuallyfind a vector v = vr not in W such that (Tj−cjI)(v) ∈ W , for 1 ≤ j ≤ r.

Theorem 7.5.2. Let V be a finite-dimensional vector space over the fieldF . Let F be a commuting family of triangulable linear operators on V (i.e.,the minimal polynomial of each T ∈ F splits into linear factors). Thereexists an ordered basis for V such that every operator in F is represented bya triangular matrix with respect to that basis.

Proof. Start by applying Lemma 7.5.1 to the F -invariant subspace W = {0}to obtain a nonzero vector v1 for which T (v1) ∈ W1 = 〈v1〉 for all T ∈ F .


Then apply the lemma to W1 to find a vector v2 not in W1 but for whichT (v2) ∈ W2 = 〈v1, v2〉. Proceed in this way until a basis B = (v1, v2, . . .) ofV has been obtained. Clearly [T ]B is upper triangular for every T ∈ F .

Corollary 7.5.3. Let F be a commuting family of n × n matrices over analgebraically closed field F . There exists a nonsingular n× n matrix P withentries in F such that P−1AP is upper-triangular, for every matrix A in F .

Theorem 7.5.4. Let F be a commuting family of diagonalizable linear operators on the finite-dimensional vector space V. There exists an ordered basis for V such that every operator in F is represented in that basis by a diagonal matrix.

Proof. If dim(V) = 1 or if each T ∈ F is a scalar times the identity, then there is nothing to prove. So suppose 1 < n = dim(V) and that the theorem is true for vector spaces of dimension less than n. Also assume that for some T ∈ F, T is not a scalar multiple of the identity. Since the operators in F are all diagonalizable, we know that each minimal polynomial splits into distinct linear factors. Let c1, . . . , ck be the distinct eigenvalues of T, and for each index i put Wi = null(T − ciI). Fix i. Then Wi is invariant under every operator that commutes with T. Let Fi be the family of linear operators on Wi obtained by restricting the operators in F to the invariant subspace Wi. Each operator in Fi is diagonalizable, because its minimal polynomial divides the minimal polynomial for the corresponding operator in F. Since T is not a scalar multiple of the identity, dim(Wi) < dim(V), so by the induction hypothesis the operators in Fi can be simultaneously diagonalized. In other words, Wi has a basis Bi which consists of vectors which are simultaneously characteristic vectors for every operator in Fi. Then B = (B1, . . . , Bk) is the basis of V that we seek.

7.6 The Fundamental Theorem of Algebra∗

This section is based on the following article: Harm Derksen, The Fundamental Theorem of Algebra and Linear Algebra, The American Mathematical Monthly, vol. 110, no. 7, August–September 2003, pp. 620–623. We start by quoting from the third paragraph of the article by Derksen.

“Since the fundamental theorem of algebra is needed in linear algebracourses, it would be desirable to have a proof of it in terms of linear algebra.In this paper we prove that every square matrix with complex coefficients has


an eigenvector. This statement is equivalent to the fundamental theorem ofalgebra. In fact, we will prove the slightly stronger result that any number ofcommuting square matrices with complex entries have a common eigenvector.The proof lies entirely within the framework of linear algebra, and unlikemost other algebraic proofs of the fundamental theorem of algebra, it doesnot require Galois theory or splitting fields. ”

Preliminaries

Several results we have obtained so far have made the assumption thatthe field F was algebraically closed. Moreover, we often gave the complexnumbers C as the prototypical example. So in this section we have to be care-ful not to quote any results that might have hidden in them the assumptionthat C is algebraically closed.

For the proof we use only the following elementary properties of real andcomplex numbers that were established much earlier.

Lemma Every polynomial of odd degree with real coefficients has a (real)zero.

Lemma Every complex number has a square root.

Theorem 7.3.2 If A is real, n×n, with n odd, then A has an eigenvector(belonging to a real eigenvalue).

An Induction Argument

Keep in mind that we cannot use results that might have hidden in themthe assumption that C is algebraically closed.

For a field K and for positive integers d and r, consider the followingstatement:

P (K, d, r): Any r commuting linear transformations A1, A2, . . . , Ar of aK- vector space V of dimension n such that d does not divide n have acommon eigenvector.

Lemma 7.6.1. If P (K, d, 1) holds, then P (K, d, r) holds for all r ≥ 1.

It is important to realize that the smallest d for which the hypothesis ofthis lemma holds in a given situation might be much larger than d = 1.

Proof. The proof is by induction on r. The case P(K, d, 1) is true by hypothesis. For r ≥ 2, suppose that P(K, d, r − 1) is true and let A1, . . . , Ar be commuting linear transformations of a K-vector space V of dimension n such that d does not


divide n. Because P(K, d, 1) holds, Ar has an eigenvalue λ in K. Let W be the kernel and Z the image of Ar − λI. It is now easy to show that each of W and Z is left invariant by each of A1, . . . , Ar−1.

First suppose that W 6= V . Because dim W + dim Z = dim V , either ddoes not divide dim W or d does not divide dim Z. Since dim W < n anddim Z < n, we may assume by induction on n that A1, . . . , Ar already havea common eigenvector in W or in Z.

In the remaining case, W = V . Because P (K, d, r − 1) holds, we mayassume that A1, . . . , Ar−1 have a common eigenvector in V , say v. SinceArv = λv (because W = V ), v is a common eigenvector of A1, . . . , Ar.

Lemma 7.6.2. P(R, 2, r) holds for all r ≥ 1. In other words, if A1, . . . , Ar are commuting linear transformations on an odd dimensional R-vector space, then they have a common eigenvector.

Proof. By Lemma 7.6.1 it is enough to show that P(R, 2, 1) is true. If A is a linear transformation of an odd dimensional R-vector space, det(xI − A) is a polynomial of odd degree, which has a zero λ by Lemma 1.1.2. Then λ is a real eigenvalue of A.

We now “lift” the result of Lemma 7.6.2 to the analogous result over thefield C.

Definition If A is any m × n matrix over C we let A∗ denote the transpose of the complex conjugate of the matrix A, i.e., (A∗)ij is the complex conjugate of Aji. Then A is said to be Hermitian if m = n and A∗ = A.

Lemma 7.6.3. P(C, 2, 1) holds, i.e., every linear transformation of a C-vector space of odd dimension has an eigenvector.

Proof. Suppose that TA : C^n → C^n : v ↦ Av is a C-linear map with n odd. Let V be the R-vector space Hermn(C), the set of n × n Hermitian matrices. Define two linear operators L1 and L2 on V by

L1(B) = (AB + BA∗)/2    and    L2(B) = (AB − BA∗)/(2i).

It is now easy to show that dim V = n^2, which is odd. It is also routine to check that L1 and L2 commute. Hence by Lemma 7.6.2, P(R, 2, 2) holds


and implies that L1 and L2 have a common eigenvector B, say L1(B) = λBand L2(B) = µB, with λ and µ both real. But then

(L1 + iL2)(B) = AB = (λ+ µi)B,

and any nonzero column vector of B gives an eigenvector for the matrixA.

Lemma 7.6.4. P(C, 2^k, r) holds for all k ≥ 1 and r ≥ 1.

Proof. The proof is by induction on k. The case k = 1 follows from Lemmas 7.6.3 and 7.6.1. Assume that P(C, 2^l, r) holds for l < k. We will establish that P(C, 2^k, r) holds. In view of Lemma 7.6.1 it suffices to prove P(C, 2^k, 1). Suppose that A : C^n → C^n is linear, where n is divisible by 2^{k−1} but not by 2^k. Let V be the C-vector space Skewn(C) = {B ∈ Mn(C) : B^T = −B}, the set of n × n skew-symmetric matrices with complex entries. Note that dim V = n(n − 1)/2, which ensures that 2^{k−1} does not divide dim V. Define two commuting linear transformations L1 and L2 of V by

L1(B) = AB + BA^T    and    L2(B) = ABA^T.

It is an easy exercise to show that L1 and L2 are indeed both in L(V )and they commute.

By P(C, 2^{k−1}, 2), L1 and L2 have a common eigenvector B, say

L1(B) = λB = AB + BA^T,  i.e.,  BA^T = (λI − A)B,

and

L2(B) = µB = ABA^T = A(λI − A)B = (λA − A^2)B,

which implies

(A^2 − λA + µI)B = 0,

where λ and µ are now complex numbers. Let v be a nonzero column of B. Then

(A^2 − λA + µI)v = 0.


Since each element of C has a square root, there is a δ in C such that δ^2 = λ^2 − 4µ. We can write x^2 − λx + µ = (x − α)(x − β), where α = (λ + δ)/2 and β = (λ − δ)/2. We then have

(A− αI)w = 0,

where w = (A−βI)v. If w = 0, then v is an eigenvector of A with eigenvalueβ; if w 6= 0, then w is an eigenvector of A with eigenvalue α.

We have now reached the point where we can prove the main result forcommuting operators on a complex space.

Theorem 7.6.5. If A1, A2, . . . , Ar are commuting linear transformations ofa finite dimensional nonzero C-vector space V , then they have a commoneigenvector.

Proof. Let n be the dimension of V . There exists a positive integer k suchthat 2k does not divide n. Since P (C, 2k, r) holds by Lemma 7.6.4, the theo-rem follows.

Theorem 7.6.6. (The Fundamental Theorem of Algebra) If P (x) is a non-constant polynomial with complex coefficients, then there exists a λ in C suchthat P (λ) = 0.

Proof. It suffices to prove this for monic polynomials. So let

P(x) = x^n + a1x^{n−1} + a2x^{n−2} + · · · + an.

Then P(x) = det(xI − A), where A is the companion matrix of P:

A = ( 0  0  · · ·  0  −an   )
    ( 1  0  · · ·  0  −an−1 )
    ( 0  1  · · ·  0  −an−2 )
    ( .  .    .    .    .   )
    ( 0  0  · · ·  1  −a1   ).

Theorem 7.6.5 implies that A has a complex eigenvalue λ in C, from whichit follows that P (λ) = 0.
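In numerical practice this connection runs the other way: polynomial roots are often computed as the eigenvalues of the companion matrix. A small sketch (assuming Python with numpy; the cubic below, with roots 1, 2, 3, is my own example):

    import numpy as np

    # P(x) = x^3 - 6x^2 + 11x - 6, written as x^3 + a1 x^2 + a2 x + a3.
    a = [-6.0, 11.0, -6.0]
    n = len(a)
    A = np.zeros((n, n))
    A[1:, :-1] = np.eye(n - 1)                     # 1's on the subdiagonal
    A[:, -1] = [-a[n - 1 - i] for i in range(n)]   # last column: -a_n, ..., -a_1 (top to bottom)

    print(np.sort(np.linalg.eigvals(A).real))      # approximately [1. 2. 3.]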


7.7 Exercises

1. Let V be finite dimensional over F and let P ∈ L(V ) be idempotent.Determine the eigenvalues of P and show that P is diagonalizable.

2. If T, S ∈ L(V ) and TS = ST , show that

(i) null(T ) is S-invariant; and

(ii) If f(x) ∈ F [x], then null(f(T )) is S-invariant.

3. Suppose n is a positive integer and T ∈ L(F n) is defined by

T (z1, z2, . . . , zn) = (z1 + · · ·+ zn, z1 + · · ·+ zn, . . . , z1 + · · ·+ zn).

Determine all eigenvalues and eigenvectors of T .

4. Suppose T ∈ L(V ) and dim(Im(T )) = k. Prove that T has at mostk + 1 distinct eigenvalues.

5. Suppose that S, T ∈ L(V), λ ∈ F and 1 ≤ k ∈ Z. Show that (TS − λI)^k T = T(ST − λI)^k.

6. Suppose that S, T ∈ L(V ). Prove that ST and TS have the sameeigenvalues but not necessarily the same minimal polynomial.

7. Suppose that S, T ∈ L(V ) and at least one of S, T is invertible. Showthat ST and TS have the same minimal and characteristic polynomials.

8. Suppose that F is algebraically closed, p(z) ∈ F [z] and a ∈ F . Provethat a is an eigenvalue of p(T ) if and only if a = p(λ) for some eigenvalueλ of T . (Hint: Suppose that a is an eigenvalue of p(T ). Factor p(z)−a =c(z−λ1) · · · (z−λm). Use the fact that p(T )−aI is not injective. Don’tforget to consider what happens if c = 0.)

9. Suppose that S, T ∈ L(V ) and that T is diagonalizable. Suppose thateach eigenvector of T is an eigenvector of S. Show that ST = TS.

10. Let T : V → V be a linear operator on the vector space V over the field F. Let v ∈ V and let m be a positive integer for which v ≠ 0, T(v) ≠ 0, . . . , T^{m−1}(v) ≠ 0, but T^m(v) = 0. Show that {v, T(v), . . . , T^{m−1}(v)} is a linearly independent set.


11. Let W be a subspace of the vector space V over any field F .

(a) Prove the statements about the quotient space V/W made in theparagraph just before Theorem 7.4.2.

(b) First Isomorphism Theorem: Each linear transformation T : V →W induces a linear isomorphism Φ : V/(null(T ))→ Im(T ).


Chapter 8

Inner Product Spaces

8.1 Inner Products

Throughout this chapter F will denote a subfield of the complex numbers C,and V will denote a vector space over F .

Definition An inner product on V is a scalar-valued function 〈 , 〉 :V × V → F that satisfies the following properties:

(i) 〈u+ v, w〉 = 〈u,w〉+ 〈v, w〉 for all u, v, w ∈ V .

(ii) 〈cu, v〉 = c〈u, v〉 for all c ∈ F , u, v ∈ V .

(iii) 〈v, u〉 = 〈u, v〉‾ for all u, v ∈ V, where the overline denotes the complex conjugate.

(iv) 〈u, u〉 > 0 if u 6= ~0.

It is easy to check that the above properties force

〈u, cv + w〉 = c̄〈u, v〉 + 〈u, w〉  ∀ u, v, w ∈ V, c ∈ F, where c̄ denotes the complex conjugate of c.

Example 8.1.1. On F^n there is a “standard” inner product defined as follows: for ~x = (x1, . . . , xn) and ~y = (y1, . . . , yn), put

〈~x, ~y〉 = Σ_{i=1}^{n} xi ȳi,

where ȳi denotes the complex conjugate of yi.

It is easy enough to show that the above definition really does give aninner product on F n. In fact, it is a special case of the following example.


Example 8.1.2. Let A be an invertible n × n matrix over F . Define 〈 , 〉on F n (whose elements are written as row vectors for the purpose of thisexample) as follows. For u, v ∈ V put

〈u, v〉 = uAA∗v∗, where B∗ denotes the complex conjugate transpose of B.

It is a fairly straightforward exercise to show that this definition really givesan inner product. If A = I the standard inner product of the previous exampleis obtained.
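A quick numerical check of two of the inner product axioms for one particular choice of A (the example below assumes Python with numpy and is only an illustration):

    import numpy as np

    A = np.array([[1, 2],
                  [0, 1]], dtype=complex)            # an invertible 2 x 2 matrix

    def ip(u, v):
        # <u, v> = u A A* v*, with u, v treated as row vectors
        return u @ A @ A.conj().T @ v.conj()

    u = np.array([1 + 1j, 2 + 0j])
    v = np.array([0.5 + 0j, -1j])
    print(np.isclose(ip(u, v), np.conj(ip(v, u))))               # conjugate symmetry
    print(ip(u, u).real > 0 and np.isclose(ip(u, u).imag, 0.0))  # positivity for u != 0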

Example 8.1.3. Let C(0, 1) denote the vector space of all continuous, real-valued functions on the interval [0, 1]. For f, g ∈ C(0, 1) define

〈f, g〉 = ∫_0^1 f(t)g(t) dt.

Again it is a routine exercise to show that this really gives an inner product.

Definition An inner product space is a vector space over F (a subfieldof C) together with a specified inner product on that space.

Let V be an inner product space with inner product 〈 , 〉. The length of a vector v ∈ V is defined to be ||v|| = √〈v, v〉.

Theorem 8.1.4. If V is an inner product space, then for any vectors u, v ∈ V and any scalar c ∈ F,

(i) ||cu|| = |c| ||u||;
(ii) ||u|| > 0 for u ≠ ~0;
(iii) |〈u, v〉| ≤ ||u|| ||v||;
(iv) ||u + v|| ≤ ||u|| + ||v||.

Proof. Statements (i) and (ii) follow almost immediately from the various definitions involved. The inequality in (iii) is clearly valid when u = ~0. If u ≠ ~0, put

w = v − (〈v, u〉/||u||^2) u.

It is easily checked that 〈w, u〉 = 0 and


0 ≤ ||w||^2 = 〈v − (〈v, u〉/||u||^2)u, v − (〈v, u〉/||u||^2)u〉
            = 〈v, v〉 − 〈v, u〉〈u, v〉/||u||^2
            = ||v||^2 − |〈u, v〉|^2/||u||^2.

Hence |〈u, v〉|2 ≤ ||u||2||v||2. It now follows that

Re〈u, v〉 ≤ |〈u, v〉| ≤ ‖u‖ · ‖v‖,

and

||u + v||^2 = ||u||^2 + 〈u, v〉 + 〈v, u〉 + ||v||^2
            = ||u||^2 + 2 Re〈u, v〉 + ||v||^2
            ≤ ||u||^2 + 2||u|| ||v|| + ||v||^2
            = (||u|| + ||v||)^2.

Thus ||u+ v|| ≤ ||u||+ ||v||.

The inequality in (iii) is called the Cauchy-Schwarz inequality. It has a very wide variety of applications. The proof shows that if u is nonzero, then |〈u, v〉| < ||u|| ||v|| unless

v = (〈v, u〉/||u||^2) u,

which occurs if and only if (u, v) is a linearly dependent list. You should try out the Cauchy-Schwarz inequality on the examples of inner products given above. The inequality in (iv) is called the triangle inequality.

Definitions Let u and v be vectors in an inner product space V . Thenu is orthogonal to v if and only if 〈u, v〉 = 0 if and only if 〈v, u〉 = 0, in whichcase we say u and v are orthogonal and write u ⊥ v. If S is a set of vectorsin V , S is called an orthogonal set provided each pair of distinct vectors in Sis orthogonal. An orthonormal set is an orthogonal set S with the additionalproperty that ||u|| = 1 for every u ∈ S. Analogous definitions are made forlists of vectors.


Note: The standard basis of F n is an orthonormal list with respect tothe standard inner product.

Also, the zero vector is the only vector orthogonal to every vector. (Provethis!)

Theorem 8.1.5. An orthogonal set of nonzero vectors is linearly indepen-dent.

Proof. Let S be a finite or infinite orthogonal set of nonzero vectors in a given inner product space. Suppose v1, . . . , vm are distinct vectors in S and that

w = c1v1 + c2v2 + · · · + cmvm.

Then

〈w, vk〉 = 〈Σ_j cjvj, vk〉 = Σ_j cj〈vj, vk〉 = ck〈vk, vk〉.

Since 〈vk, vk〉 ≠ 0, it follows that

ck = 〈w, vk〉/||vk||^2,  1 ≤ k ≤ m.

When w = ~0, each ck = 0, so S is an independent set.

Corollary 8.1.6. If a vector w is a linear combination of an orthogonal list (v1, . . . , vm) of nonzero vectors, then w is the particular linear combination

w = Σ_{k=1}^{m} (〈w, vk〉/||vk||^2) vk.                               (8.1)

Theorem 8.1.7. (Pythagorean Theorem) If u ⊥ v, then

||u+ v||2 = ||u||2 + ||v||2. (8.2)

Proof. Suppose u ⊥ v. Then

||u + v||^2 = 〈u + v, u + v〉 = ||u||^2 + ||v||^2 + 〈u, v〉 + 〈v, u〉 = ||u||^2 + ||v||^2.


Theorem 8.1.8. (Parallelogram Equality) If u, v are vectors in the innerproduct space V , then

||u+ v||2 + ||u− v||2 = 2(||u||2 + ||v||2).

Proof. The details are routine and are left as an exercise (see exercise 1).

Starting with an inner product 〈 , 〉 on a vector space U over F (where F is some subfield of C), we defined a norm on U by ||v|| = √〈v, v〉 for all v ∈ U. This norm function satisfies a variety of properties as we have seen in this section. Sometimes we have a norm function given and would like to know if it came from an inner product. The next theorem provides an answer to this question, but first we give an official definition of a norm.

Definition A norm on a vector space U over the field F is a function

|| || : U → [0,∞) ⊆ R such that

(i) ||u|| = 0 iff u = ~0;
(ii) ||au|| = |a| · ||u|| for all a ∈ F, u ∈ U;
(iii) ||u + v|| ≤ ||u|| + ||v||.

Theorem 8.1.9. Let || || be a norm on U. Then there is an inner product 〈 , 〉 on U such that ||u|| = 〈u, u〉^{1/2} for all u ∈ U if and only if || || satisfies the parallelogram equality.

Proof. We have already seen that if the norm is derived from an inner product, then it satisfies the parallelogram equality. For the converse, now suppose that || || is a norm on U satisfying the parallelogram equality. We will show that there must have been an inner product from which the norm was derived in the usual fashion. We first consider the case F = R. It is then clear (from the real polarization identity, see Exercise 13) that 〈 , 〉 must be defined in the following way:

〈u, v〉 = (||u + v||^2 − ||u − v||^2)/4.                              (8.3)

It is then clear that 〈u, u〉 = (||2u||^2 − ||~0||^2)/4 = ||u||^2, so ||u|| = 〈u, u〉^{1/2} for all u ∈ U, but it is not at all clear that 〈 , 〉 is an inner product. However, since 〈u, u〉 = ||u||^2, by the definition of norm we see that


(a) 〈u, u〉 ≥ 0, with equality if and only if u = ~0.

Next we show that 〈 , 〉 is additive in the first slot. We use the parallelogram equality in the form

||u||^2 + ||v||^2 = ||u + v||^2/2 + ||u − v||^2/2.

Let u, v, w ∈ U. Then from the definition of 〈 , 〉 we have:

4(〈u + v, w〉 − 〈u, w〉 − 〈v, w〉)   (which should be 0)

= ||u+v+w||^2 − ||u+v−w||^2 − ||u+w||^2 + ||u−w||^2 − ||v+w||^2 + ||v−w||^2

= ||u+v+w||^2 + (||u−w||^2 + ||v−w||^2) − ||u+v−w||^2 − (||u+w||^2 + ||v+w||^2)

= ||u+v+w||^2 + ||u+v−2w||^2/2 + ||u−v||^2/2 − ||u+v−w||^2 − ||u+v+2w||^2/2 − ||u−v||^2/2

= (||u+v+w||^2 + ||w||^2) + ||u+v−2w||^2/2 − (||u+v−w||^2 + ||w||^2) − ||u+v+2w||^2/2

= ||u+v+2w||^2/2 + ||u+v||^2/2 + ||u+v−2w||^2/2 − ||u+v||^2/2 − ||u+v−2w||^2/2 − ||u+v+2w||^2/2

= 0.

Hence 〈u+ v, w〉 = 〈u,w〉+ 〈v, w〉, proving

(b) 〈 , 〉 is additive in the first slot.

To prove that 〈 , 〉 is homogeneous in the first slot is rather more involved. First suppose that n is a positive integer. Then using additivity in the first slot we have 〈nu, v〉 = n〈u, v〉. Replacing u with (1/n)u gives 〈u, v〉 = n〈(1/n)u, v〉, so (1/n)〈u, v〉 = 〈(1/n)u, v〉. So if m, n are positive integers,

〈(m/n)u, v〉 = m〈(1/n)u, v〉 = (m/n)〈u, v〉.

Using property (ii) in the definition of norm we see || − u|| = | − 1| ||u|| = ||u||. Then using the definition of 〈 , 〉, we have

〈−u, v〉 = (|| − u + v||^2 − || − u − v||^2)/4 = (|| − (u − v)||^2 − || − (u + v)||^2)/4
        = (−||u + v||^2 + ||u − v||^2)/4 = −〈u, v〉.


This shows that if r is any rational number, then

〈ru, v〉 = r〈u, v〉.

Now suppose that λ ∈ R is any real number (with special interest in the case where λ is not rational). There must be a sequence {rn}∞n=1 of rational numbers for which lim_{n→∞} rn = λ. Thus

λ〈u, v〉 = lim_{n→∞} rn〈u, v〉 = lim_{n→∞}〈rnu, v〉 = lim_{n→∞} (||rnu + v||^2 − ||rnu − v||^2)/4.

We claim that lim_{n→∞}||rnu + v||^2 = ||λu + v||^2 and lim_{n→∞}||rnu − v||^2 = ||λu − v||^2. Once we have shown this, we will have

λ〈u, v〉 = (||λu + v||^2 − ||λu − v||^2)/4 = 〈λu, v〉,

completing the proof that 〈 , 〉 is homogeneous in the first slot.

For x, y ∈ U, ||x|| = ||y + (x − y)|| ≤ ||y|| + ||x − y||, so ||x|| − ||y|| ≤ ||x − y||. Interchanging the roles of x and y we see that also ||y|| − ||x|| ≤ ||y − x|| = ||x − y||. Hence | ||x|| − ||y|| | ≤ ||x − y||. With x = rnu + v and y = λu + v, this latter inequality gives

|||rnu+ v|| − ||λu+ v||| ≤ ||(rn − λ)u|| = |rn − λ| · ||u||.

Since rn − λ → 0, we have ||rnu + v|| → ||λu + v||. Replacing v with −v gives lim_{n→∞}||rnu − v|| = ||λu − v||. So indeed 〈 , 〉 is homogeneous in the first slot. Finally, we show that

(d) 〈u, v〉 = 〈v, u〉.So:

〈u, v〉 = (||u + v||^2 − ||u − v||^2)/4 = (||v + u||^2 − ||v − u||^2)/4 = 〈v, u〉.

This completes the proof in the case that F = R.

Now consider the case F = C. By the complex polarization identity (see Exercise 15) we must have

〈u, v〉 = (1/4) Σ_{n=1}^{4} i^n ||u + i^n v||^2.                      (8.4)


Then putting v = u we find

〈u, u〉 = (1/4) Σ_{n=1}^{4} i^n ||u + i^n u||^2 = (1/4) Σ_{n=1}^{4} i^n |1 + i^n|^2 ||u||^2
       = (||u||^2/4)[2i − 0 − 2i + 4] = ||u||^2,

as desired. But we must still show that 〈 , 〉 has the properties of an innerproduct.

Because 〈u, u 〉 = ||u||2, it follows immediately from the properties of anorm that 〈u, u〉 ≥ 0 with equality if and only if u = ~0.

For convenience define 〈 , 〉R to be the real inner product defined above, so

〈u, v〉R = (||u + v||^2 − ||u − v||^2)/4.

Note that〈u, v〉 = 〈u, v〉R + i〈u, iv〉R.

We have already proved that 〈 , 〉R is additive in the first slot, which cannow be used in a routine fashion to show that 〈 , 〉 is also additive in thefirst slot. It is even easier to show that 〈au, v〉 = a〈u, v〉 for a ∈ R using thehomogeneity of 〈 , 〉R in the first slot. However, we must still extend this toall complex numbers. Note that:

〈iu, v〉 = (||iu + v||^2 − ||iu − v||^2 + i||iu + iv||^2 − i||iu − iv||^2)/4
        = (i||i(u + v)||^2 − i||i(u − v)||^2 − ||i(u + iv)||^2 + ||i(u − iv)||^2)/4
        = (i||u + v||^2 − i||u − v||^2 − ||u + iv||^2 + ||u − iv||^2)/4
        = i〈u, v〉.

Combining this with additivity and homogeneity with respect to realnumbers, we get that

〈(a+ bi)u, v〉 = (a+ bi)〈u, v〉 ∀a, b ∈ R.


Hence 〈 , 〉 is homogeneous in the first slot. The only thing remaining to show is that 〈v, u〉 = 〈u, v〉‾, the complex conjugate of 〈u, v〉. Indeed,

〈v, u〉 = (||v + u||^2 − ||v − u||^2 + i||v + iu||^2 − i||v − iu||^2)/4
       = (||u + v||^2 − ||u − v||^2 + i||i(u − iv)||^2 − i||(−i)(u + iv)||^2)/4
       = (||u + v||^2 − ||u − v||^2 + i||u − iv||^2 − i||u + iv||^2)/4
       = 〈u, v〉‾.

Now suppose that F is a subfield of C and that V = F n. There are threenorms on V that are most commonly used in applications.

Definition For vectors x = (x1, . . . , xn)^T ∈ V, the norms ‖ · ‖1, ‖ · ‖2, and ‖ · ‖∞, called the 1-norm, 2-norm, and ∞-norm, respectively, are defined as:

‖x‖1 = |x1| + |x2| + · · · + |xn|;
‖x‖2 = (|x1|^2 + |x2|^2 + · · · + |xn|^2)^{1/2};                     (8.5)
‖x‖∞ = max{|x1|, |x2|, . . . , |xn|}.

Put x = (1, 1)^T ∈ F^2 and y = (−1, 1)^T ∈ F^2. Using these vectors x and y it is routine to show that ‖ · ‖1 and ‖ · ‖∞ do not satisfy the parallelogram equality, hence must not be derived from an inner product in the usual way. On the other hand, all three norms are equivalent in a sense that we are about to make clear. First we pause to notice that the so-called norms really are norms. The only step that is challenging is the triangle inequality for the 2-norm, and we proved this earlier. The details for the other two norms are left to the reader.
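The failure of the parallelogram equality for these two vectors can be checked directly. A short sketch (assuming Python with numpy):

    import numpy as np

    x = np.array([1.0, 1.0])
    y = np.array([-1.0, 1.0])

    for p in (1, 2, np.inf):
        lhs = np.linalg.norm(x + y, p) ** 2 + np.linalg.norm(x - y, p) ** 2
        rhs = 2 * (np.linalg.norm(x, p) ** 2 + np.linalg.norm(y, p) ** 2)
        print(p, lhs, rhs)
    # Only the 2-norm gives lhs == rhs (8.0 and 8.0); the 1-norm gives 8.0
    # versus 16.0 and the infinity-norm gives 8.0 versus 4.0.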

Definition Let ‖ · ‖ be a norm on V . A sequence {vi}∞i=1 of vectors issaid to converge to the vector v∞ provided the sequence {‖vi − v∞‖} of realnumbers converges to 0.

With this definition we can now talk of a sequence of vectors in F n con-verging by using norms. But which norm should we use? What we meanby saying that all three norms are equivalent is that a sequence of vectors


converges to a vector v using one of the norms if and only if it converges tothe same vector v using either of the other norms.

Theorem 8.1.10. The 1-norm, 2-norm and ∞-norm on F^n are all equivalent in the sense that

(a) If a sequence {xi} of vectors converges to x∞ as determined in one of the norms, then it converges to x∞ in all three norms, and for fixed index j, the entries (xi)j converge to the entry (x∞)j. This is an easy consequence of

(b) ‖x‖1 ≤ ‖x‖2 √n ≤ n‖x‖∞ ≤ n‖x‖1.

Proof. If u = (u1, . . . , un), put x = (|u1|, . . . , |un|)^T and y = (1, 1, . . . , 1)^T. Then by the Cauchy-Schwarz inequality applied to x and y, we have

|〈x, y〉| = Σ|ui| = ‖u‖1 ≤ ‖x‖2‖y‖2 = √(Σ|ui|^2) · √n = ‖u‖2 √n.

So ‖x‖1 ≤ ‖x‖2 √n for all x ∈ F^n.

Next, ‖x‖2^2 = |x1|^2 + · · · + |xn|^2 ≤ n(max{|xi|})^2 = n · ‖x‖∞^2, implying ‖x‖2 ≤ √n ‖x‖∞. This proves the first and second inequalities in (b), and the third is quite obvious.

In fact, any two norms on a finite dimensional vector space over F areequivalent (in the sense given above), but we do not need this result.

8.2 Orthonormal Bases

Theorem 8.2.1. Let L = (v1, . . . , vm) be an orthonormal list of vectors in V and put W = span(L). If w = Σ_{i=1}^{m} aivi is an arbitrary element of W, then

(i) ai = 〈w, vi〉, and
(ii) ||w||^2 = Σ_{i=1}^{m} |ai|^2 = Σ_{i=1}^{m} |〈w, vi〉|^2.

Proof. If w =∑m

i=1 aivi, compute 〈w, vj〉 = 〈∑m

i=1 aivi, vj〉 = aj. Then applythe Pythagorean Theorem.

The preceding result shows that an orthonormal basis can be extremely handy. The next result gives the Gram-Schmidt algorithm for replacing a linearly independent list with an orthonormal one having the same span as the original.


Theorem 8.2.2. (Gram-Schmidt method) If L = (v1, . . . , vm) is a linearly independent list of vectors in V , then there is an orthonormal list B = (e1, . . . , em) such that span(e1, . . . , ej) = span(v1, . . . , vj) for each j = 1, 2, . . . ,m.

Proof. Start by putting e1 = v1/||v1||, so e1 has norm 1 and spans the same space as does v1.

We construct the remaining vectors e2, . . . , em inductively. Suppose that e1, . . . , ek have been determined so that (e1, . . . , ek) is orthonormal and span(e1, . . . , ej) = span(v1, . . . , vj) for each j = 1, 2, . . . , k. We then construct ek+1 as follows. Put

e′k+1 = vk+1 − ∑_{i=1}^{k} 〈vk+1, ei〉ei.

Check that e′k+1 is orthogonal to each of e1, . . . , ek. Then put ek+1 = e′k+1/||e′k+1||.
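For readers who want to experiment, here is a minimal NumPy sketch of the Gram-Schmidt procedure just described (our illustration, not part of the text); it uses the standard inner product on Cⁿ, and the function name is ours.

    import numpy as np

    def gram_schmidt(vectors):
        """Given a linearly independent list of vectors (the rows of a 2-D array),
        return an orthonormal list with the same span, as in Theorem 8.2.2:
        subtract the projections onto the e_i found so far, then normalize."""
        es = []
        for v in vectors:
            w = v.astype(complex)
            for e in es:
                w = w - np.vdot(e, w) * e       # np.vdot conjugates its first argument, so this is <w, e> e
            es.append(w / np.linalg.norm(w))
        return np.array(es)

    # Example: orthonormalize ((1,1,0), (1,0,1))
    B = gram_schmidt(np.array([[1.0, 1, 0], [1, 0, 1]]))
    print(np.round(B @ B.conj().T, 10))         # should print the 2x2 identity matrix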

At this point we need to assume that V is finite dimensional.

Corollary 8.2.3. Let V be a finite dimensional inner product space.
(i) V has an orthonormal basis.
(ii) If L is any orthonormal set in V it can be completed to an orthonormal basis of V .

Proof. We know that any independent set can be completed to a basis to which we can then apply the Gram-Schmidt algorithm.

Lemma 8.2.4. If T ∈ L(V ) has an upper triangular matrix with respect to some basis of V , then it has an upper triangular matrix with respect to some orthonormal basis of V .

Proof. Suppose that B = (v1, . . . , vn) is a basis such that [T ]B is upper triangular. Basically this just means that for each j = 1, 2, . . . , n the subspace span(v1, . . . , vj) is T -invariant. Use the Gram-Schmidt algorithm to construct an orthonormal basis S = (e1, . . . , en) for V such that span(v1, . . . , vj) = span(e1, . . . , ej) for each j = 1, 2, . . . , n. Hence for each j, span(e1, . . . , ej) is T -invariant, so that [T ]S is upper triangular.

Using the fact that C is algebraically closed we showed that if V is a finite dimensional complex vector space, then there is a basis B for V such that [T ]B is upper triangular. Hence we have the following result which is sometimes called Schur's Theorem.

Corollary 8.2.5. (Schur's Theorem) If T ∈ L(V ) where V is a finite dimensional complex inner product space, then there is an orthonormal basis for V with respect to which T has an upper triangular matrix.


We now apply the preceding results to the case where V = Cⁿ. First we introduce a little more language. If P is an invertible matrix for which P⁻¹ = P ∗, where P ∗ is the conjugate transpose of P , then P is said to be unitary. If P is real and unitary (so P⁻¹ = P ᵀ), we say P is an orthogonal matrix. Let A be an n × n matrix over C. View Cⁿ as an inner product space with the usual inner product. Let S be the standard ordered (orthonormal) basis. Define TA ∈ L(Cⁿ) by TA(~x) = A~x. We know that [TA]S = A. Let B = (v1, . . . , vn) be an orthonormal basis with respect to which TA has an upper triangular matrix. Let P be the matrix whose jth column is vj = [vj]S . Then [TA]B = P⁻¹AP by Theorem 4.7.2. Since B is orthonormal it is easy to check that P is a unitary matrix. We have proved the following.

Corollary 8.2.6. If A is an n × n complex matrix, there is a unitary matrix P such that P⁻¹AP is upper triangular.
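Corollary 8.2.6 is exactly what the complex Schur decomposition computes, and it can be checked numerically. The sketch below (ours, assuming SciPy is available) uses scipy.linalg.schur, which returns an upper triangular T and a unitary P with A = P T P ∗, i.e., P⁻¹AP = T.

    import numpy as np
    from scipy.linalg import schur

    A = np.array([[0.0, -1.0], [1.0, 0.0]])           # rotation by 90 degrees: no real eigenvalues
    T, P = schur(A, output='complex')                  # A = P @ T @ P.conj().T with P unitary
    print(np.round(T, 10))                             # upper triangular; eigenvalues i and -i on the diagonal
    print(np.allclose(P.conj().T @ P, np.eye(2)))      # P is unitary
    print(np.allclose(np.linalg.inv(P) @ A @ P, T))    # P^{-1} A P is upper triangular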

8.3 Orthogonal Projection and Minimization

Let V be a vector space over the field F , F a subfield of C, and let 〈 , 〉 be an inner product on V . Let W be a subspace of V , and put

W⊥ = {v ∈ V : 〈w, v〉 = 0 for all w ∈ W}.

Obs. 8.3.1. W⊥ is a subspace of V and W ∩W⊥ = {0}. Hence W +W⊥ =W ⊕W⊥.

Proof. Easy exercise.

We do not know if each v ∈ V has a representation in the form v = w + w′ with w ∈ W and w′ ∈ W⊥, but at least we know from the preceding observation that it has at most one. There is one case where we know that V = W ⊕ W⊥.

Theorem 8.3.2. If W is finite dimensional, then V = W ⊕W⊥.

Proof. Suppose W is finite dimensional. Then using the Gram-Schmidt process, for example, we can find an orthonormal basis D = (v1, . . . , vm) of W . For arbitrary v ∈ V , put ai = 〈v, vi〉. Then we know that

w = ∑_{i=1}^{m} ai vi

is in W , and we write

v = w + w′ = ∑_{i=1}^{m} ai vi + (v − ∑_{i=1}^{m} ai vi).

We show that v − ∑_{i=1}^{m} ai vi ∈ W⊥. So let u ∈ W , say u = ∑_{j=1}^{m} bj vj. Since D is orthonormal (writing b̄j for the complex conjugate of bj),

〈v − ∑_{i=1}^{m} ai vi, u〉 = 〈v − ∑_{i=1}^{m} ai vi, ∑_{j=1}^{m} bj vj〉
                          = ∑_{j=1}^{m} b̄j〈v, vj〉 − ∑_{i,j=1}^{m} b̄j ai〈vi, vj〉
                          = ∑_{j=1}^{m} b̄j aj − ∑_{j=1}^{m} b̄j aj〈vj, vj〉 = 0.

So with w = ∑_{i=1}^{m} ai vi and w′ = v − w, v = w + w′ is the unique way to write v as the sum of an element of W plus an element of W⊥.

Now suppose V = W ⊕ W⊥ (which we have just seen is the case if W is finite dimensional). A linear transformation P : V → V is said to be an orthogonal projection of V onto W provided the following hold:

(i) P (w) = w for all w ∈ W ,
(ii) P (w′) = 0 for all w′ ∈ W⊥.

So let P be an orthogonal projection onto W by this definition. Let v = w + w′ be any element of V with w ∈ W , w′ ∈ W⊥. Then P (v) = P (w + w′) = P (w) + P (w′) = w + 0 = w ∈ W . So P (v) is a uniquely defined element of W for all v.

Conversely, still under the hypothesis that V = W ⊕ W⊥, define P ′ : V → V as follows. For v ∈ V , write v = w + w′, w ∈ W , w′ ∈ W⊥ (uniquely!), and put P ′(v) = w. It is an easy exercise to show that P ′ really is linear, P ′(w) = w for all w ∈ W , and P ′(w′) = P ′(0 + w′) = 0 for w′ ∈ W⊥. So P ′ : v = w + w′ 7→ w is really the unique orthogonal projection of V onto W . Moreover,

Obs. 8.3.3. P 2 = P ; P (v) = v if and only if v ∈ W ; P (v) = 0 if and onlyif v ∈ W⊥.

Page 150: Linear Algebra by S. E. Payne

150 CHAPTER 8. INNER PRODUCT SPACES

Obs. 8.3.4. I − P is the unique orthogonal projection of V onto W⊥.

Both Obs. 8.3.3 and 8.3.4 are fairly easy to prove, and their proofs are worthwhile exercises.

Obs. 8.3.5. W ⊆ W⊥⊥.

Proof. W⊥ consists of all vectors in V orthogonal to every vector of W . So in particular, every vector of W is orthogonal to every vector of W⊥, i.e., each vector of W is in (W⊥)⊥. But in general we do not know if there could be some vector outside W that is in W⊥⊥.

Theorem 8.3.6. If V = W ⊕W⊥, then W = (W⊥)⊥.

Proof. By Obs. 8.3.5, we need only show that W⊥⊥ ⊆ W . Since V = W ⊕ W⊥, we know there is a unique orthogonal projection P of V onto W , and for each v ∈ V , v − P (v) ∈ W⊥. Keep in mind that W ⊆ (W⊥)⊥ and P (v) ∈ W ⊆ (W⊥)⊥, so 〈v − P (v), P (v)〉 = 0. It follows that for v ∈ W⊥⊥ we have

||v − P (v)||² = 〈v − P (v), v − P (v)〉 = 〈v, v − P (v)〉 − 〈P (v), v − P (v)〉 = 0 − 0 = 0

by the comments just above. Hence v = P (v), which implies that v ∈ W , i.e., W⊥⊥ ⊆ W . Hence W⊥⊥ = W .

Note: If V = W ⊕W⊥, then (W⊥)⊥ = W , so also V = W⊥ ⊕ (W⊥)⊥,and (W⊥)⊥⊥ = W⊥.

Given a finite dimensional subspace U of the inner product space V and a point v ∈ V , we want to find a point u ∈ U closest to v in the sense that ||v − u|| is as small as possible. To do this we first construct an orthonormal basis B = (e1, . . . , em) of U . The unique orthogonal projection of V onto U is given by

PU(v) = ∑_{i=1}^{m} 〈v, ei〉ei.

We show that PU(v) is the unique u ∈ U closest to v.

Theorem 8.3.7. Suppose U is a finite dimensional subspace of the inner product space V and v ∈ V . Then

||v − PU(v)|| ≤ ||v − u|| ∀u ∈ U.

Furthermore, if u ∈ U and equality holds, then u = PU(v).


Proof. Suppose u ∈ U . Then v − PU(v) ∈ U⊥ and PU(v) − u ∈ U , so we may use the Pythagorean Theorem in the following:

||v − PU(v)||² ≤ ||v − PU(v)||² + ||PU(v) − u||²        (8.6)
             = ||(v − PU(v)) + (PU(v) − u)||²
             = ||v − u||²,

where taking square roots gives the desired inequality. Also, the inequality of the theorem is an equality if and only if the inequality in Eq. 8.6 is an equality, which is if and only if u = PU(v).

Example 8.3.8. There are many applications of the above theorem. Here is one example. Let V = R[x], and let U be the subspace consisting of all polynomials f(x) with degree less than 4 and satisfying f(0) = f ′(0) = 0. Find a polynomial p(x) ∈ U such that ∫₀¹ |2 + 3x − p(x)|² dx is as small as possible.

Solution: Define an inner product on R[x] by 〈f, g〉 = ∫₀¹ f(x)g(x) dx. Put g(x) = 2 + 3x, and note that U = {p(x) = a2x² + a3x³ : a2, a3 ∈ R}. We need an orthonormal basis of U . So start with the basis B = (x², x³) and apply the Gram-Schmidt algorithm. We want to put e1 = x²/||x²||. First compute ||x²||² = ∫₀¹ x⁴ dx = 1/5, so

e1 = √5 x².        (8.7)

Next we want to put

e2 = (x³ − 〈x³, e1〉e1) / ||x³ − 〈x³, e1〉e1||.

Here 〈x³, e1〉 = √5 ∫₀¹ x⁵ dx = √5/6. Then x³ − 〈x³, e1〉e1 = x³ − (5/6)x².

Now ||x³ − (5/6)x²||² = ∫₀¹ (x³ − (5/6)x²)² dx = 1/(7·36). Hence

e2 = 6√7 (x³ − (5/6)x²) = √7(6x³ − 5x²).        (8.8)

Then the point p ∈ U closest to g = 2 + 3x is

p = 〈g, e1〉e1 + 〈g, e2〉e2
  = [√5 ∫₀¹ (2 + 3x)x² dx] √5 x²                                   (8.9)
    + [√7 ∫₀¹ (2 + 3x)(6x³ − 5x²) dx] √7(6x³ − 5x²),               (8.10)

so that after a bit more computation we have

p = 24x² − (203/10) x³.        (8.11)
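The result of Example 8.3.8 is easy to double-check numerically. The following sketch (ours, assuming SciPy is available) evaluates the integrals with scipy.integrate.quad and confirms that the residual g − p is orthogonal to U, which characterizes the closest point.

    import numpy as np
    from scipy.integrate import quad

    inner = lambda f, g: quad(lambda x: f(x) * g(x), 0, 1)[0]   # <f, g> = integral of f g over [0, 1]

    g  = lambda x: 2 + 3*x
    e1 = lambda x: np.sqrt(5) * x**2                  # Eq. (8.7)
    e2 = lambda x: np.sqrt(7) * (6*x**3 - 5*x**2)     # Eq. (8.8)

    a1, a2 = inner(g, e1), inner(g, e2)
    p = lambda x: a1*e1(x) + a2*e2(x)                 # P_U(g)

    print(np.isclose(p(1.0), 24 - 20.3))              # p(x) = 24x^2 - (203/10)x^3 at x = 1
    print(abs(inner(lambda x: g(x) - p(x), lambda x: x**2)) < 1e-10)   # residual orthogonal to x^2
    print(abs(inner(lambda x: g(x) - p(x), lambda x: x**3)) < 1e-10)   # residual orthogonal to x^3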

8.4 Linear Functionals and Adjoints

Recall that if V is a vector space over any field F , then a linear map from V to F (viewed as a vector space over F ) is called a linear functional. The set V ∗ of all linear functionals on V is called the dual space of V . When V is a finite dimensional inner product space the linear functionals on V have a particularly nice form. Fix v ∈ V . Then define φv : V → F : u 7→ 〈u, v〉. It is easy to see that φv is a linear functional. If V is finite dimensional then every linear functional on V arises this way.

Theorem 8.4.1. Let V be a finite-dimensional inner product space. Let φ ∈ V ∗. Then there is a unique vector v ∈ V such that

φ(u) = 〈u, v〉

for every u ∈ V .

Proof. Let (e1, . . . , en) be an orthonormal basis of V . Then for any u ∈ V , we have

u = ∑_{i=1}^{n} 〈u, ei〉ei.


Hence

φ(u) = φ(∑_{i=1}^{n} 〈u, ei〉ei) = ∑_{i=1}^{n} 〈u, ei〉φ(ei) = ∑_{i=1}^{n} 〈u, φ(ei)‾ ei〉,        (8.12)

where φ(ei)‾ denotes the complex conjugate of φ(ei). So if we put v = ∑_{i=1}^{n} φ(ei)‾ ei, we have φ(u) = 〈u, v〉 for every u ∈ V . This shows the existence of the desired v. For the uniqueness, suppose that

φ(u) = 〈u, v1〉 = 〈u, v2〉 ∀u ∈ V.

Then 0 = 〈u, v1 − v2〉 for all u ∈ V , forcing v1 = v2.

Now let V and W both be inner product spaces over F . Let T ∈ L(V,W ) and fix w ∈ W . Define φ : V → F by φ(v) = 〈T (v), w〉. First, it is easy to check that φ ∈ V ∗. Second, by the previous theorem there is a unique vector (that we now denote by T ∗(w)) for which

φ(v) = 〈T (v), w〉 = 〈v, T ∗(w)〉 ∀v ∈ V.

It is clear that T ∗ is some kind of map from W to V . In fact, it is routine to check that T ∗ is linear, i.e., T ∗ ∈ L(W,V ). T ∗ is called the adjoint of T .

Theorem 8.4.2. Let V and W be inner product spaces and suppose that S, T ∈ L(V,W ) are both such that S∗ and T ∗ exist. Then

(i) (S + T )∗ = S∗ + T ∗.
(ii) (aT )∗ = ā T ∗ (where ā is the complex conjugate of a).
(iii) (T ∗)∗ = T .
(iv) I∗ = I.
(v) (ST )∗ = T ∗S∗.

Proof. The routine proofs are left to the reader.

Theorem 8.4.3. Let V and W be finite dimensional inner product spaces over F . Suppose T ∈ L(V,W ). Then

(i) null(T ∗) = (Im(T ))⊥;
(ii) Im(T ∗) = (null(T ))⊥;
(iii) null(T ) = (Im(T ∗))⊥;
(iv) Im(T ) = (null(T ∗))⊥.

Page 154: Linear Algebra by S. E. Payne

154 CHAPTER 8. INNER PRODUCT SPACES

Proof. For (i), w ∈ null(T ∗) iff T ∗(w) = ~0 iff 〈v, T ∗(w)〉 = 0 ∀v ∈ V iff 〈T (v), w〉 = 0 ∀v ∈ V iff w ∈ (Im(T ))⊥. This proves (i). By taking orthogonal complements of both sides of an equality, or by replacing an operator with its adjoint, the other three equalities are easily established.

Theorem 8.4.4. Let V be a finite dimensional inner product space over F and let B = (e1, . . . , en) be an orthonormal basis for V . Let T ∈ L(V ) and let A = [T ]B. Then Aij = 〈T (ej), ei〉.

Proof. The matrix A is defined by

T (ej) = ∑_{i=1}^{n} Aij ei.

Since B is orthonormal, we also have

T (ej) = ∑_{i=1}^{n} 〈T (ej), ei〉ei.

Hence Aij = 〈T (ej), ei〉.

Corollary 8.4.5. Let V be a finite dimensional inner product space over F and let B = (e1, . . . , en) be an orthonormal basis for V . Let T ∈ L(V ) and let A = [T ]B. Then [T ∗]B = A∗, where A∗ is the conjugate transpose of A.

Proof. According to Theorem 8.4.4, Aij = 〈T (ej), ei〉, and if B = [T ∗]B, then Bij = 〈T ∗(ej), ei〉. Since 〈T ∗(ej), ei〉 is the complex conjugate of 〈ei, T ∗(ej)〉 = 〈T (ei), ej〉 = Aji, we see that Bij is the complex conjugate of Aji, i.e., [T ∗]B = A∗.
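For V = Cⁿ with the standard inner product and B the standard (orthonormal) basis, Corollary 8.4.5 says that the matrix of T ∗ is just the conjugate transpose of the matrix of T. A short numerical sketch (ours, not part of the text) checks the defining identity 〈T(u), w〉 = 〈u, T ∗(w)〉:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))   # [T]_B for T(x) = Ax
    Astar = A.conj().T                                            # candidate matrix of T*

    u = rng.normal(size=3) + 1j * rng.normal(size=3)
    w = rng.normal(size=3) + 1j * rng.normal(size=3)

    # <x, y> = sum x_i conj(y_i) = np.vdot(y, x), since np.vdot conjugates its first argument
    print(np.isclose(np.vdot(w, A @ u), np.vdot(Astar @ w, u)))   # <T(u), w> = <u, T*(w)>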

8.5 Exercises

1. Parallelogram Equality: If u, v are vectors in the inner product space V , then

||u+ v||2 + ||u− v||2 = 2(||u||2 + ||v||2).

Explain why this equality should be so-named.

2. Let T : Rn → Rn be defined by

T (z1, z2, . . . , zn) = (z2 − z1, z3 − z2, . . . , z1 − zn).

Page 155: Linear Algebra by S. E. Payne

8.5. EXERCISES 155

(a) Give an explicit expression for the adjoint, T ∗.

(b) Is T invertible? Explain.

(c) Find the eigenvalues of T .

(d) Compute the characteristic polynomial of T .

(e) Compute the minimal polynomial of T .

3. Let T ∈ L(V ), where V is a complex inner product space and T has an adjoint T ∗. Show that if T is normal and T (v) = λv for some v ∈ V , then T ∗(v) = λ̄v, where λ̄ denotes the complex conjugate of λ.

4. Let V and W be inner product spaces over F , and let T ∈ L(V,W ). Ifthere is an adjoint map T ∗ : W → V such that 〈T (v), w〉 = 〈v, T ∗(w)〉for all v ∈ V and all w ∈ W , show that T ∗ ∈ L(W,V ).

5. Let T : V → W be a linear transformation whose adjoint T ∗ : W → Vwith respect to 〈 , 〉V , 〈 , 〉W does exist. So for all v ∈ V , w ∈ W ,〈T (v), w〉W = 〈v, T ∗(w)〉V . Put N = null(T ) and R∗ = Im(T ∗).

(a) Show that (R∗)⊥ = N .

(b) Suppose that R∗ is finite dimensional and show that N⊥ = R∗.

6. Prove Theorem 8.4.2.

7. Prove Theorem 8.4.3.

8. On an in-class linear algebra exam, a student was asked to give (for ten points) the definition of "real symmetric matrix." He couldn't remember the definition and offered the following alternative: A real, n × n matrix A is symmetric if and only if A² = AAᵀ. When he received no credit for this answer, he went to see the instructor to find out what was wrong with his definition. By that time he had looked up the definition and tried to see if he could prove that his definition was equivalent to the standard one: A is symmetric if and only if A = Aᵀ. He had a proof that worked for 2 × 2 matrices and thought he could prove it for 3 × 3 matrices. The instructor said this was not good enough. However, the instructor would give the student 5 points for a counterexample, or the full ten points if he could prove that the two definitions were equivalent for all n. Your problem is to determine whether or not the student should have been able to earn ten points or merely five points. And you might consider the complex case also: show that A² = AA∗ if and only if A = A∗, or find a counterexample.

9. Let A be a real symmetric matrix. Either prove the following state-ment or disprove it with a counterexample: Each diagonal element ofA is bounded above (respectively, below) by the largest (respectively,smallest) eigenvalue of A.

10. Suppose V and W are finite dimensional inner product spaces withorthonormal bases B1 and B2, respectively. Let T ∈ L(V,W ), so weknow that T ∗ ∈ L(W,V ) exists and is unique. Prove that [T ∗]B1,B2 isthe conjugate transpose of [T ]B2,B1 .

11. Suppose that V is an inner product space over F . Let u, v ∈ V . Prove that 〈u, v〉 = 0 iff ||u|| ≤ ||u + av|| ∀a ∈ F .

(Hint: If u ⊥ v, use the Pythagorean theorem on u + av. For the converse, square the assumed inequality and then show that −2Re(a〈u, v〉) ≤ |a|²||v||² for all a ∈ F . Then make a suitable choice of a in the case v ≠ ~0.)

12. For arbitrary real numbers a1, . . . , an and b1, . . . , bn, show that

(∑_{j=1}^{n} aj bj)² ≤ (∑_{j=1}^{n} j aj²)(∑_{j=1}^{n} bj²/j).

13. Let V be a real inner product space. Show that

〈u, v〉 = (1/4)||u + v||² − (1/4)||u − v||².        (8.13)

(This equality is the real polarization identity.)

14. Let V be a complex inner product space. Show that

〈u, v〉 = Re(〈u, v〉) + i Re(〈u, iv〉). (8.14)

15. Let V be a complex inner product space. Show that

〈u, v〉 = (1/4) ∑_{n=1}^{4} iⁿ ||u + iⁿv||².        (8.15)


(This equality is the complex polarization identity.)

For the next two exercises let V be a finite dimensional inner product space, and let P ∈ L(V ) satisfy P² = P .

16. Show that P is an orthogonal projection if and only if null(P ) ⊆(Im(P ))⊥.

17. Show that P is an orthogonal projection if and only if ||P (v)|| ≤ ||v||for all v ∈ V . (Hint: You will probably need to use Exercises 11 and16.)

18. Fix a vector v ∈ V and define T ∈ V ∗ by T (u) = 〈u, v〉. For a ∈ F determine the formula for T ∗(a). Here we use the standard inner product on F given by 〈a, b〉 = ab̄ for a, b ∈ F .

19. Let T ∈ L(V,W ). Prove that

(a) T is injective if and only if T ∗ is surjective;

(b) T is surjective if and only if T ∗ is injective.

20. If T ∈ L(V ) and U is a T -invariant subspace of V , then U⊥ is T ∗-invariant.

21. Let V be a vector space with norm ‖ · ‖. Prove that

| ‖u‖ − ‖v‖ | ≤ ‖u− v‖.

22. Use exercise 21 to show that if {vi}∞i=1 converges to v in V , then {‖vi‖}∞i=1 converges to ‖v‖. Give an example in R² to show that the norms may converge without the vectors converging.

23. Let B be the basis for R² given by B = (e1, e1 + e2) where e1 = (1, 0)ᵀ and e2 = (0, 1)ᵀ. Let T ∈ L(R²) be the operator whose matrix with respect to the basis B is

[T ]B = (  1  1 )
        ( −1  2 ).

Compute the matrix [T ∗]B, where the standard inner product on R² is used.

24. Let A, B denote matrices in Mn(C).


(a) Define the term unitary matrix; define what it means for A and B to be unitarily equivalent; and show that unitary equivalence on Mn(C) is an equivalence relation.

(b) Show that if A = (aij) and B = (bij) are unitarily equivalent, then ∑_{i,j=1}^{n} |aij|² = ∑_{i,j=1}^{n} |bij|². (Hint: Consider tr(A∗A).)

(c) Show that

A = (  3  1 )          B = ( 1  1 )
    ( −2  0 )    and       ( 0  2 )

are similar but not unitarily equivalent.

(d) Give two 2 × 2 matrices that satisfy the equality of part (b) above but are not unitarily equivalent. Explain why they are not unitarily equivalent.

25. Let U and W be subspaces of the finite-dimensional inner product spaceV .

(a) Prove that U⊥ ∩W⊥ = (U +W )⊥.

(b) Prove that

dim(W )− dim(U ∩W ) = dim(U⊥)− dim(U⊥ ∩W⊥).

26. Let u, v ∈ V . Prove that 〈u, v〉 = 0 iff ‖u‖ ≤ ‖u+ av‖ for all a ∈ F .


Chapter 9

Operators on Inner Product Spaces

Throughout this chapter V will denote an inner product space over F . To make life simpler we also assume that V is finite dimensional, so operators on V always have adjoints, etc.

9.1 Self-Adjoint Operators

An operator T ∈ L(V ) is called Hermitian or self-adjoint provided T = T ∗. The same language is used for matrices. An n × n complex matrix A is called Hermitian provided A = A∗ (where A∗ is the transpose of the complex conjugate of A). If A is real and Hermitian it is merely called symmetric.

Theorem 9.1.1. Let F = C, i.e., V is a complex inner product space, and suppose T ∈ L(V ). Then

(a) If 〈T (v), v〉 = 0 ∀v ∈ V , then T = 0.
(b) If T is self-adjoint, then each eigenvalue of T is real.
(c) T is self-adjoint if and only if 〈T (v), v〉 ∈ R ∀v ∈ V .

Proof. It is routine to show that for all u,w ∈ V ,

〈T (u), w〉 = (〈T (u + w), u + w〉 − 〈T (u − w), u − w〉)/4
           + (〈T (u + iw), u + iw〉 − 〈T (u − iw), u − iw〉) i/4.


Note that each term on the right hand side is of the form 〈T (v), v〉 for an appropriate v ∈ V , so by hypothesis 〈T (u), w〉 = 0 for all u,w ∈ V . Put w = T (u) to conclude that T = 0. This proves part (a).

For part (b), suppose that T = T ∗, and let v be a nonzero vector in V such that T (v) = λv for some complex number λ. Then

λ〈v, v〉 = 〈T (v), v〉 = 〈v, T (v)〉 = λ̄〈v, v〉.

Hence λ = λ̄, implying λ ∈ R, proving part (b).

For part (c) note that for all v ∈ V we have (writing z̄ for the complex conjugate of z)

〈T (v), v〉 − 〈T (v), v〉‾ = 〈T (v), v〉 − 〈v, T (v)〉
                        = 〈T (v), v〉 − 〈T ∗(v), v〉
                        = 〈(T − T ∗)(v), v〉.

If 〈T (v), v〉 ∈ R for every v ∈ V , then the left side of the equation equals 0, so 〈(T − T ∗)(v), v〉 = 0 for each v ∈ V . Hence by part (a), T − T ∗ = 0, i.e., T is self-adjoint.

Conversely, suppose T is self-adjoint. Then the right hand side of the equation above equals 0, so the left hand side must also be 0, implying 〈T (v), v〉 ∈ R for all v ∈ V , as claimed.

Now suppose that V is a real inner product space. Consider the operator T ∈ L(R²) defined by T (x, y) = (−y, x) with the standard inner product on R². Then 〈T (v), v〉 = 0 for all v ∈ R² but T ≠ 0. However, this cannot happen if T is self-adjoint.

Theorem 9.1.2. Let T be a self-adjoint operator on the real inner product space V and suppose that 〈T (v), v〉 = 0 for all v ∈ V . Then T = 0.

Proof. Suppose the hypotheses of the theorem hold. It is routine to verify

〈T (u), w〉 = (〈T (u + w), u + w〉 − 〈T (u − w), u − w〉)/4,

using 〈T (w), u〉 = 〈w, T (u)〉 = 〈T (u), w〉 because T is self-adjoint and V is a real vector space. Hence also 〈T (u), w〉 = 0 for all u,w ∈ V . With w = T (u) we see T = 0.


Lemma 9.1.3. Let F be any subfield of C and let T ∈ L(V ) be self-adjoint. If α, β ∈ R are such that x² + αx + β is irreducible over R, i.e., α² < 4β, then T² + αT + βI is invertible.

Proof. Suppose α² < 4β and ~0 ≠ v ∈ V . Then

〈(T² + αT + βI)(v), v〉 = 〈T²(v), v〉 + α〈T (v), v〉 + β〈v, v〉
                       = 〈T (v), T (v)〉 + α〈T (v), v〉 + β||v||²
                       ≥ ||T (v)||² − |α| ||T (v)|| ||v|| + β||v||²
                       = (||T (v)|| − |α| ||v||/2)² + (β − α²/4)||v||²
                       > 0,        (9.1)

where the first inequality holds by the Cauchy-Schwarz inequality. The last inequality implies that (T² + αT + βI)(v) ≠ ~0. Thus T² + αT + βI is injective, hence it is invertible.

We have seen that some operators on a real vector space fail to have eigenvalues, but now we show that this cannot happen with self-adjoint operators.

Lemma 9.1.4. Let T be a self-adjoint linear operator on the real vector space V . Then T has an eigenvalue.

Proof. Suppose n = dim(V ) and choose v ∈ V with ~0 ≠ v. Then

(v, T (v), . . . , Tⁿ(v))

must be linearly dependent. Hence there exist real numbers a0, . . . , an, not all 0, such that

~0 = a0v + a1T (v) + · · · + anTⁿ(v).

Construct the polynomial f(x) = a0 + a1x + a2x² + · · · + anxⁿ, which can be written in factored form as

f(x) = c(x² + α1x + β1) · · · (x² + αkx + βk)(x − λ1) · · · (x − λm),

where c is a nonzero real number, each αj, βj and λj is real, each αj² < 4βj, m + k ≥ 1, and the equation holds for all real x. Then we have

0 = a0v + a1T (v) + · · · + anTⁿ(v)
  = (a0I + a1T + · · · + anTⁿ)(v)
  = c(T² + α1T + β1I) · · · (T² + αkT + βkI)(T − λ1I) · · · (T − λmI)(v).


Each T² + αjT + βjI is invertible by Lemma 9.1.3 because T is self-adjoint and each αj² < 4βj. Also c ≠ 0. Hence the equation above implies that

0 = (T − λ1I) · · · (T − λmI)(v).

It then follows that T − λjI is not injective for at least one j. This says that T has an eigenvalue.

The next theorem is very important for operators on real inner product spaces.

Theorem 9.1.5. The Real Spectral Theorem: Let T be an operator on the real inner product space V . Then V has an orthonormal basis consisting of eigenvectors of T if and only if T is self-adjoint.

Proof. First suppose that V has an orthonormal basis B consisting of eigenvectors of T . Then [T ]B is a real diagonal matrix, so it equals its conjugate transpose, i.e., T is self-adjoint.

For the converse, suppose that T is self-adjoint. Our proof is by induction on n = dim(V ). The desired result clearly holds if n = 1. So assume that dim(V ) = n > 1 and that the desired result holds on vector spaces of smaller dimension. By Lemma 9.1.4 we know that T has an eigenvalue λ with a nonzero eigenvector u, and without loss of generality we may assume that ||u|| = 1. Let U = span(u). Suppose v ∈ U⊥, i.e., 〈u, v〉 = 0. Then

〈u, T (v)〉 = 〈T (u), v〉 = λ〈u, v〉 = 0,

so T (v) ∈ U⊥ whenever v ∈ U⊥, showing that U⊥ is T -invariant. Hence the map S = T |U⊥ ∈ L(U⊥). If v, w ∈ U⊥, then

〈S(v), w〉 = 〈T (v), w〉 = 〈v, T (w)〉 = 〈v, S(w)〉,

which shows that S is self-adjoint. Thus by the induction hypothesis there is an orthonormal basis of U⊥ consisting of eigenvectors of S. Clearly every eigenvector of S is an eigenvector of T . Thus adjoining u to an orthonormal basis of U⊥ consisting of eigenvectors of S gives an orthonormal basis of V consisting of eigenvectors of T , as desired.

Corollary 9.1.6. Let A be a real n × n matrix. Then there is an orthogonal matrix P such that P⁻¹AP is a (necessarily real) diagonal matrix if and only if A is symmetric.
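Corollary 9.1.6 is what numpy.linalg.eigh computes for a real symmetric matrix: an orthogonal matrix whose columns are orthonormal eigenvectors. A brief numerical sketch (ours, not part of the text):

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])                   # real symmetric

    evals, P = np.linalg.eigh(A)                       # columns of P are orthonormal eigenvectors
    print(np.allclose(P.T @ P, np.eye(3)))             # P is orthogonal, so P^{-1} = P^T
    print(np.allclose(P.T @ A @ P, np.diag(evals)))    # P^{-1} A P is a real diagonal matrix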


Corollary 9.1.7. Let A be a real symmetric n × n matrix with distinct (necessarily real) eigenvalues λ1, . . . , λm. Then

Rⁿ = null(A − λ1I) ⊕ · · · ⊕ null(A − λmI).

9.2 Normal Operators

Definition An operator T ∈ L(V ) is called normal provided TT ∗ = T ∗T . Clearly any self-adjoint operator is normal, but there are many normal operators in general that are not self-adjoint. For example, if A is an n × n nonzero real matrix with Aᵀ = −A (i.e., A is skew-symmetric), then A ≠ A∗ but A is a normal matrix because AA∗ = A∗A. It follows that if T ∈ L(Rⁿ) is the operator with [T ]S = A where S is the standard basis of Rⁿ, then T is normal but not self-adjoint.

Recall Theorem 7.1.2 (A list of nonzero eigenvectors belonging to distincteigenvalues must be linearly independent.)

Theorem 9.2.1. Let T ∈ L(V ). Then ||T (v)|| = ||T ∗(v)|| ∀v ∈ V iff T is normal.

Proof.

T is normal ⇐⇒ T ∗T − TT ∗ = 0
            ⇐⇒ 〈(T ∗T − TT ∗)(v), v〉 = 0 ∀v ∈ V
            ⇐⇒ 〈T ∗T (v), v〉 = 〈TT ∗(v), v〉 ∀v ∈ V
            ⇐⇒ ||T (v)||² = ||T ∗(v)||² ∀v ∈ V.        (9.2)

Since T ∗T − TT ∗ is self-adjoint, the theorem follows from Theorems 9.1.1 part (a) and 9.1.2.

Corollary 9.2.2. Let T ∈ L(V ) be normal. Then
(a) If v ∈ V is an eigenvector of T with eigenvalue λ ∈ F , then v is also an eigenvector of T ∗ with eigenvalue λ̄.
(b) Eigenvectors of T corresponding to distinct eigenvalues are orthogonal.

Proof. Note that (T − λI)∗ = T ∗ − λ̄I, and that T is normal if and only if T − λI is normal. Suppose T (v) = λv. Since T − λI is normal, we have

0 = ||(T − λI)(v)||² = 〈v, (T ∗ − λ̄I)(T − λI)(v)〉 = 〈v, (T − λI)(T ∗ − λ̄I)(v)〉 = ||(T ∗ − λ̄I)(v)||².


Part (a) follows.

For part (b), suppose λ and µ are distinct eigenvalues with associated eigenvectors u and v, respectively. So T (u) = λu and T (v) = µv, and from part (a), T ∗(v) = µ̄v. Hence

(λ − µ)〈u, v〉 = 〈λu, v〉 − 〈u, µ̄v〉
             = 〈T (u), v〉 − 〈u, T ∗(v)〉
             = 0.

Because λ ≠ µ, the above equation implies that 〈u, v〉 = 0.

The next theorem is one of the truly important results from the theory of complex inner product spaces. Be sure to compare it with the Real Spectral Theorem.

Theorem 9.2.3. Complex Spectral Theorem Let V be a finite dimensional complex inner product space and T ∈ L(V ). Then T is normal if and only if V has an orthonormal basis consisting of eigenvectors of T .

Proof. First suppose that V has an orthonormal basis B consisting of eigenvectors of T , so that [T ]B = A is a diagonal matrix. Then A∗ is also diagonal and is the matrix A∗ = [T ∗]B. Since any two diagonal matrices commute, AA∗ = A∗A. This implies that T is normal.

For the converse, suppose that T is normal. Since V is a complex vector space, we know (by Schur's Theorem) that there is an orthonormal basis B = (e1, . . . , en) for which A = [T ]B is upper triangular. If A = (aij), then aij = 0 whenever i > j. Also T (e1) = a11e1, so

||T (e1)||² = |a11|²,
||T ∗(e1)||² = |a11|² + |a12|² + · · · + |a1n|².

Because T is normal, ||T (e1)|| = ||T ∗(e1)||. So the two equations above imply that all entries in the first row of A, except possibly the diagonal entry a11, equal 0. It now follows that T (e2) = a12e1 + a22e2 = a22e2, so

||T (e2)||² = |a22|²,
||T ∗(e2)||² = |a22|² + |a23|² + · · · + |a2n|².

Because T is normal, ||T (e2)|| = ||T ∗(e2)||. Thus the two equations just above imply that all the entries in the second row of A, except possibly the diagonal entry a22, must equal 0. Continuing in this fashion we see that all the nondiagonal entries of A equal 0, i.e., A is diagonal.


Corollary 9.2.4. Let A be a normal, n × n complex matrix. Then there is a unitary matrix P such that P⁻¹AP is diagonal. It follows that the minimal polynomial of A has no repeated roots.
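Corollary 9.2.4 can be illustrated with a matrix that is normal but not self-adjoint, e.g. a real skew-symmetric matrix. Since an upper triangular normal matrix is diagonal, the complex Schur form returned by SciPy is (up to rounding) already the diagonal form of the corollary. A sketch of ours, for illustration:

    import numpy as np
    from scipy.linalg import schur

    A = np.array([[0.0, -2.0], [2.0, 0.0]])                 # skew-symmetric: normal but not self-adjoint
    print(np.allclose(A @ A.conj().T, A.conj().T @ A))      # normality check

    D, P = schur(A.astype(complex), output='complex')       # Schur form of a normal matrix is diagonal
    print(np.round(D, 10))                                   # essentially diag(2i, -2i), up to ordering
    print(np.allclose(np.linalg.inv(P) @ A @ P, D))          # P^{-1} A P is diagonal, P unitary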

Theorem 9.2.5. Let A be n × n. Then A is normal if and only if the eigenspaces of AA∗ are A-invariant.

Proof. First suppose that A is normal, so AA∗ = A∗A, and suppose that AA∗~x = λ~x. We show that A~x also belongs to λ for AA∗: AA∗(A~x) = A(AA∗~x) = A · λ~x = λ(A~x). For the converse, suppose the eigenspaces of AA∗ are A-invariant. We want to show that A is normal. We start with the easy case.

Lemma 9.2.6. Suppose BB∗ = diag(λ1, . . . , λk, 0, . . . , 0) is a diagonal matrix, and suppose that the eigenspaces of BB∗ are B-invariant. Then B is normal.

Proof of Lemma: ~u = (0, . . . , 0, uk+1, . . . , un)ᵀ is a typical element of the null space of BB∗, i.e., the eigenspace belonging to the eigenvalue 0. First note that the bottom n − k rows of B must be zero, since the (i, i) entry of BB∗ is the inner product of the ith row of B with itself, which must be 0 if i ≥ k + 1. So by hypothesis B(0, . . . , 0, uk+1, . . . , un)ᵀ = (0, . . . , 0, vk+1, . . . , vn)ᵀ. Since the top k entries of B(0, . . . , 0, uk+1, . . . , un)ᵀ must be zero, the entries in the top k rows and last n − k columns must be zero, so

B = ( B1  0 )
    ( 0   0 ),

where B1 is k × k with rank k.

For 1 ≤ i ≤ k, the standard basis vector ~ei is an eigenvector of BB∗ belonging to λi. And by hypothesis, BB∗ · B~ei = λiB~ei. So B · BB∗~ei = B · λi~ei, implying that [B²B∗ − BB∗B]~ei = ~0. But this latter equality is also seen to hold for k + 1 ≤ i ≤ n. Hence B²B∗ = BB∗B. Now using block multiplication, B1²B1∗ = B1B1∗B1, implying that B1B1∗ = B1∗B1. But this implies that BB∗ = B∗B. So B is normal, proving the lemma.

Now return to the general case. Let ~u1, . . . , ~un be an orthonormal basis of eigenvectors of AA∗. Use these vectors as the columns of a matrix U , so that U∗(AA∗)U = diag(λ1, . . . , λk, 0 = λk+1, . . . , 0 = λn), where 0 ≠ λ1 · · ·λk and k = rank(A). Our hypothesis says that if AA∗~uj = λj~uj, then AA∗ · A~uj = λjA~uj. Put B = U∗AU . So BB∗ = U∗AUU∗A∗U = U∗(AA∗)U = diag(λ1, . . . , λn). Compute BB∗(U∗~uj) = U∗AA∗~uj = U∗ · λj~uj = λjU∗~uj. So U∗~uj = ~ej is an eigenvector of BB∗ belonging to λj.

Also (BB∗)BU∗~uj = U∗AUU∗A∗UU∗AUU∗~uj = U∗AA∗A~uj = U∗ · λjA~uj = λj · U∗AUU∗~uj = λjB(U∗~uj). So the eigenspaces of BB∗ are B-invariant. Since BB∗ is diagonal, by the Lemma we know B is normal. But now it follows easily that A = UBU∗ must be normal.

9.3 Decomposition of Real Normal Operators

Throughout this section V is a real inner product space.

Lemma 9.3.1. Suppose dim(V ) = 2 and T ∈ L(V ). Then the following are equivalent:

(a) T is normal but not self-adjoint.
(b) The matrix of T with respect to every orthonormal basis of V has the form

( a  −b )
( b   a ),   with b ≠ 0.

(c) The matrix of T with respect to some orthonormal basis of V has the form

( a  −b )
( b   a ),   with b > 0.

Proof. First suppose (a) holds and let S = (e1, e2) be an orthonormal basis of V . Suppose

[T ]S = ( a  c )
        ( b  d ).

Then ||T (e1)||² = a² + b² and ||T ∗(e1)||² = a² + c². Because T is normal, by Theorem 9.2.1 ||T (e1)|| = ||T ∗(e1)||. Hence b² = c². Since T is not self-adjoint, b ≠ c, so we have c = −b. Then

[T ∗]S = (  a  b )
         ( −b  d ).

So

[TT ∗]S = ( a² + b²  ab − bd )          [T ∗T ]S = (  a² + b²  −ab + bd )
          ( ab − bd  b² + d² ),   and              ( −ab + bd   b² + d² ).

Since T is normal it follows that b(a − d) = 0. Since T is not self-adjoint, b ≠ 0, implying that a = d, completing the proof that (a) implies (b).

Now suppose that (b) holds and let B = (e1, e2) be any orthonormal basis of V . Then either B or B′ = (e1,−e2) will be a basis of the type needed to show that (c) is satisfied.


Finally, suppose that (c) holds, i.e., there is an orthonormal basis B = (e1, e2) such that [T ]B has the form given in (c). Clearly T ≠ T ∗, but a simple computation with the matrices representing T and T ∗ shows that TT ∗ = T ∗T , i.e., T is normal, implying that (a) holds.

Theorem 9.3.2. Suppose that T ∈ L(V ) is normal and U is a T -invariant subspace. Then

(a) U⊥ is T -invariant.
(b) U is T ∗-invariant.
(c) (T |U)∗ = (T ∗)|U .
(d) T |U is a normal operator on U .
(e) T |U⊥ is a normal operator on U⊥.

Proof. Let B′ = (e1, . . . , em) be an orthonormal basis of U and extend it to an orthonormal basis B = (e1, . . . , em, f1, . . . , fn) of V . Since U is T -invariant,

[T ]B = ( A  B )
        ( 0  C ),   where A = [T |U ]B′ .

For each j, 1 ≤ j ≤ m, ||T (ej)||² equals the sum of the squares of the absolute values of the entries in the jth column of A. Hence

∑_{j=1}^{m} ||T (ej)||² = the sum of the squares of the absolute values of the entries of A.        (9.3)

For each j, 1 ≤ j ≤ m, ||T ∗(ej)||² equals the sum of the squares of the absolute values of the entries in the jth rows of A and B. Hence

∑_{j=1}^{m} ||T ∗(ej)||² = the sum of the squares of the absolute values of the entries of A and B.        (9.4)

Because T is normal, ||T (ej)|| = ||T ∗(ej)|| for each j. It follows that the entries of B must all be 0, so

[T ]B = ( A  0 )
        ( 0  C ).

This shows that U⊥ is T -invariant, proving (a). But now we see that

[T ∗]B = ( A∗  0  )
         ( 0   C∗ ),

implying that U is T ∗-invariant. This completes a proof of (b).

Now let S = T |U . Fix v ∈ U . Then

〈S(u), v〉 = 〈T (u), v〉 = 〈u, T ∗(v)〉 ∀u ∈ U.

Because T ∗(v) ∈ U (by (b)), the equation above shows that S∗(v) = T ∗(v), i.e., (T |U)∗ = (T ∗)|U , completing the proof of (c). Parts (d) and (e) now follow easily.

At this point the reader should review the concept of block multiplication for matrices partitioned into blocks of the appropriate sizes. In particular, if A and B are two block diagonal matrices each with m blocks down the diagonal, with the jth block being nj × nj, then the product AB is also block diagonal. Suppose that Aj, Bj are the jth blocks of A and B, respectively. Then the jth block of AB is AjBj:

diag(A1, A2, . . . , Am) · diag(B1, B2, . . . , Bm) = diag(A1B1, A2B2, . . . , AmBm).

We have seen the example T (x, y) = (−y, x) of an operator on R² that is normal but has no eigenvalues, so has no diagonal matrix. However, the following theorem says that normal operators have block-diagonal matrices with blocks of size at most 2 by 2.

Theorem 9.3.3. Suppose that V is a real inner product space and T ∈ L(V ). Then T is normal if and only if there is an orthonormal basis of V with respect to which T has a block diagonal matrix where each block is a 1-by-1 matrix or a 2-by-2 matrix of the form

( a  −b )
( b   a ),        (9.5)

with b > 0.


Proof. First suppose that V has an orthonormal basis B for which [T ]B is block diagonal of the type described in the theorem. Since a matrix of the form given in Eq. 9.5 commutes with its adjoint, clearly T is also normal.

For the converse, suppose that T is normal. Our proof proceeds by induction on the dimension n of V . For n = 1 the result is obvious. For n = 2, if T is self-adjoint it follows from the Real Spectral Theorem; if T is not self-adjoint, use Lemma 9.3.1. Now assume that n = dim(V ) > 2 and that the desired result holds on vector spaces of dimension smaller than n. By Theorem 7.3.1 we may let U be a T -invariant subspace of dimension 1 if there is one. If there is not, then we let U be a 2-dimensional T -invariant subspace. First, if dim(U) = 1, let e1 be a nonzero vector in U with norm 1. So B′ = (e1) is an orthonormal basis of U . Clearly the matrix [T |U ]B′ is 1-by-1. If dim(U) = 2, then T |U is normal (by Theorem 9.3.2), but not self-adjoint (since otherwise T |U , and hence T , would have an eigenvector in U by Lemma 9.1.4). So we may choose an orthonormal basis of U with respect to which the matrix of T |U has the desired form. We know that U⊥ is T -invariant and T |U⊥ is a normal operator on U⊥. By our induction hypothesis there is an orthonormal basis of U⊥ of the desired type. Putting together the bases of U and U⊥ we obtain an orthonormal basis of V of the desired type.

9.4 Positive Operators

In this section V is a finite dimensional inner product space. An operator T ∈ L(V ) is said to be a positive operator provided

T = T ∗ and 〈T (v), v〉 ≥ 0 ∀v ∈ V.

Note that if V is a complex space, then having 〈T (v), v〉 ∈ R for all v ∈ V is sufficient to force T to be self-adjoint (by Theorem 9.1.1, part (c)). So 〈T (v), v〉 ≥ 0 for all v ∈ V is sufficient to force T to be positive.

There are many examples of positive operators. If P is any orthogonal projection, then P is positive. (You should verify this!) In the proof of Lemma 9.1.3 we showed that if T ∈ L(V ) is self-adjoint and if α, β ∈ R are such that α² < 4β, then T² + αT + βI is positive. You should think about the analogy between positive operators (among all operators) and the nonnegative real numbers among all complex numbers. This will be made easier


by the theorem that follows, which collects the main facts about positive operators.

If S, T ∈ L(V ) and S² = T , we say that S is a square root of T .

Theorem 9.4.1. Let T ∈ L(V ). Then the following are equivalent:
(a) T is positive;
(b) T is self-adjoint and all the eigenvalues of T are nonnegative;
(c) T has a positive square root;
(d) T has a self-adjoint square root;
(e) There exists an operator S ∈ L(V ) such that T = S∗S.

Proof. We will prove that (a) =⇒ (b) =⇒ (c) =⇒ (d) =⇒ (e) =⇒ (a). Suppose that (a) holds, i.e., T is positive, so in particular T is self-adjoint. Let v be a nonzero eigenvector belonging to the eigenvalue λ. Then

0 ≤ 〈T (v), v〉 = 〈λv, v〉 = λ〈v, v〉,

implying that λ is a nonnegative number, so (b) holds.

Now suppose that (b) holds, so T is self-adjoint and all the eigenvalues of T are nonnegative. By the Real and Complex Spectral Theorems, there is an orthonormal basis B = (e1, . . . , en) of V consisting of eigenvectors of T . Say T (ej) = λjej, where each λj ≥ 0. Define S ∈ L(V ) by

S(ej) = √λj ej, 1 ≤ j ≤ n.

Since S has a real diagonal matrix with respect to the basis B, clearly S is self-adjoint. Now suppose v = ∑_{j=1}^{n} ajej. Then 〈S(v), v〉 = ∑_{j=1}^{n} √λj |aj|² ≥ 0, so S is a positive square root of T . This shows (b) =⇒ (c).

Clearly (c) implies (d), since by definition every positive operator is self-adjoint. So suppose (d) holds and let S be a self-adjoint operator with T = S². Since S is self-adjoint, T = S∗S, proving that (e) holds.

Finally, suppose that T = S∗S for some S ∈ L(V ). It is easy to check that T ∗ = T , and then 〈T (v), v〉 = 〈S∗S(v), v〉 = 〈S(v), S(v)〉 ≥ 0, showing that (e) implies (a).

An operator can have many square roots. For example, in addition to ±I being square roots of I, for each a ∈ F and for each nonzero b ∈ F , we have that

A = (      a        b )
    ( (1 − a²)/b   −a )    satisfies A² = I.

However, things are different if we restrict our attention to positive operators.


Theorem 9.4.2. Each positive operator on V has a unique positive square root.

Proof. Let T ∈ L(V ) be positive with nonnegative distinct eigenvalues λ1, . . . , λm. Since T is self-adjoint, we know by Theorem 7.2.8 (and the Spectral Theorems) that

V = null(T − λ1I) ⊕ · · · ⊕ null(T − λmI).

By the preceding theorem we know that T has a positive square root S. Suppose α is an eigenvalue of S. If v ∈ null(S − αI), then T (v) = S²(v) = α²v, so α² is some eigenvalue of T , i.e., α = √λi for some i. Clearly

null(S − √λj I) ⊆ null(T − λjI).

Since the only possible eigenvalues of S are √λ1, . . . , √λm, and because S is self-adjoint, we also know that

V = null(S − √λ1 I) ⊕ · · · ⊕ null(S − √λm I).

A dimension argument then shows that

null(S − √λj I) = null(T − λjI)

for each j. In other words, on null(T − λjI), the operator S is just multiplication by √λj. Thus S, the positive square root of T , is uniquely determined by T .
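The construction used in the proofs above translates directly into a numerical recipe: diagonalize the positive matrix with an orthonormal eigenbasis and take square roots of the eigenvalues. The following sketch (ours, for illustration) builds the positive square root this way and checks it.

    import numpy as np

    def positive_sqrt(T):
        """Positive square root of a positive semidefinite Hermitian matrix T,
        built as in Theorem 9.4.1: act as sqrt(lambda_j) on an orthonormal
        eigenbasis of T."""
        evals, Q = np.linalg.eigh(T)                      # T = Q diag(evals) Q*
        evals = np.clip(evals, 0.0, None)                 # guard against tiny negative round-off
        return Q @ np.diag(np.sqrt(evals)) @ Q.conj().T

    B = np.array([[1.0, 2.0], [0.0, 1.0]])
    T = B.conj().T @ B                                     # of the form S*S, hence positive
    S = positive_sqrt(T)
    print(np.allclose(S @ S, T))                           # S^2 = T
    print(np.allclose(S, S.conj().T))                      # S is self-adjoint
    print(np.all(np.linalg.eigvalsh(S) >= -1e-12))         # eigenvalues of S are nonnegative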

9.5 Isometries

An operator S ∈ L(V ) is called an isometry provided

||S(v)|| = ||v|| ∀v ∈ V.

For example, λI is an isometry whenever λ ∈ F satisfies |λ| = 1. More generally, suppose that λ1, . . . , λn are scalars with absolute value 1 and S ∈ L(V ) satisfies S(ej) = λjej for some orthonormal basis B = (e1, . . . , en) of V . For v ∈ V we have

v = 〈v, e1〉e1 + · · · + 〈v, en〉en        (9.6)


and (using the Pythagorean theorem)

||v||² = |〈v, e1〉|² + · · · + |〈v, en〉|².        (9.7)

Apply S to both sides of Eq. 9.6:

S(v) = λ1〈v, e1〉e1 + · · · + λn〈v, en〉en.

This last equation along with |λj| = 1 shows that

||S(v)||² = |〈v, e1〉|² + · · · + |〈v, en〉|².        (9.8)

If we compare Eqs. 9.7 and 9.8, we see that ||v|| = ||S(v)||, i.e., S is an isometry. In fact, this is the prototypical isometry, as we shall see. The next theorem collects the main results concerning isometries.

Theorem 9.5.1. Suppose V is an n-dimensional inner product space and S ∈ L(V ). Then the following are equivalent:

(a) S is an isometry;
(b) 〈S(u), S(v)〉 = 〈u, v〉 ∀u, v ∈ V ;
(c) S∗S = I;
(d) (S(e1), . . . , S(en)) is an orthonormal basis of V whenever (e1, . . . , en) is an orthonormal basis of V ;
(e) there exists an orthonormal basis (e1, . . . , en) of V for which (S(e1), . . . , S(en)) is orthonormal;
(f) S∗ is an isometry;
(g) 〈S∗(u), S∗(v)〉 = 〈u, v〉 ∀u, v ∈ V ;
(h) SS∗ = I;
(i) (S∗(e1), . . . , S∗(en)) is orthonormal whenever (e1, . . . , en) is an orthonormal list of vectors in V ;
(j) there exists an orthonormal basis (e1, . . . , en) of V for which (S∗(e1), . . . , S∗(en)) is orthonormal.

Proof. To start, suppose S is an isometry. If V is a real inner-product space, then for all u, v ∈ V , using the real polarization identity we have

〈S(u), S(v)〉 = (||S(u) + S(v)||² − ||S(u) − S(v)||²)/4
             = (||S(u + v)||² − ||S(u − v)||²)/4
             = (||u + v||² − ||u − v||²)/4 = 〈u, v〉.


If V is a complex inner product space, use the complex polarization identity in the same fashion. In either case we see that (a) implies (b).

Now suppose that (b) holds. Then

〈(S∗S − I)(u), v〉 = 〈S(u), S(v)〉 − 〈u, v〉 = 0

for every u, v ∈ V . In particular, if v = (S∗S − I)(u), then necessarily (S∗S − I)(u) = 0 for all u ∈ V , forcing S∗S = I. Hence (b) implies (c).

Suppose that (c) holds and let (e1, . . . , en) be an orthonormal list of vectors in V . Then

〈S(ej), S(ek)〉 = 〈S∗S(ej), ek〉 = 〈ej, ek〉.

Hence (S(e1), . . . , S(en)) is orthonormal, proving that (c) implies (d).

Clearly (d) implies (e).

Suppose that (e1, . . . , en) is a basis of V for which (S(e1), . . . , S(en)) is orthonormal. For v ∈ V ,

||S(v)||² = ||S(∑_{i=1}^{n} 〈v, ei〉ei)||² = ||∑_{i=1}^{n} 〈v, ei〉S(ei)||² = ∑_{i=1}^{n} |〈v, ei〉|² = ||v||².

Taking square roots we see that (e) implies (a).

We have now shown that (a) through (e) are equivalent. Hence replacing S by S∗ we have that (f) through (j) are equivalent. Clearly (c) and (h) are equivalent, so the proof of the theorem is complete.

The preceding theorem shows that an isometry is necessarily normal (see parts (a), (c) and (h)). Using the characterization of normal operators proved earlier we can now give a complete description of all isometries. But as usual, there are separate statements for the real and the complex cases.


Theorem 9.5.2. Let V be a (finite dimensional) complex inner product space and let S ∈ L(V ). Then S is an isometry if and only if there is an orthonormal basis of V consisting of eigenvectors of S all of whose corresponding eigenvalues have absolute value 1.

Proof. The example given at the beginning of this section shows that the condition given in the theorem is sufficient for S to be an isometry. For the converse, suppose that S ∈ L(V ) is an isometry. By the complex spectral theorem there is an orthonormal basis (e1, . . . , en) of V consisting of eigenvectors of S. For 1 ≤ j ≤ n, let λj be the eigenvalue corresponding to ej. Then

|λj| = ||λjej|| = ||S(ej)|| = ||ej|| = 1.

Hence each eigenvalue of S has absolute value 1, completing the proof.

The next result states that every isometry on a real inner product space is the direct sum of pieces that look like rotations on 2-dimensional subspaces, pieces that equal the identity operator, and pieces that equal multiplication by −1. It follows that an isometry on an odd-dimensional real inner product space must have 1 or −1 as an eigenvalue.

Theorem 9.5.3. Suppose that V is a real inner product space and S ∈ L(V ). Then S is an isometry if and only if there is an orthonormal basis of V with respect to which S has a block diagonal matrix where each block on the diagonal is a 1-by-1 matrix containing 1 or −1, or a 2-by-2 matrix of the form

( cos(θ)  −sin(θ) )
( sin(θ)   cos(θ) ),   with θ ∈ (0, π).        (9.9)

Proof. First suppose that S is an isometry. Because S is normal, there is an orthonormal basis of V such that with respect to this basis S has a block diagonal matrix, where each block is a 1-by-1 matrix or a 2-by-2 matrix of the form

( a  −b )
( b   a ),   with b > 0.        (9.10)

If λ is an entry in a 1-by-1 block along the diagonal of the matrix of S (with respect to the basis just mentioned), then there is a basis vector ej such that S(ej) = λej. Because S is an isometry, this implies that |λ| = 1 with λ real, forcing λ = ±1.


Now consider a 2-by-2 matrix of the form in Eq. 9.10 along the diagonal of the matrix of S. There are basis vectors ej, ej+1 such that

S(ej) = aej + bej+1.

Thus

1 = ||ej||² = ||S(ej)||² = a² + b².

This equation, along with the condition that b > 0, implies that there exists a number θ ∈ (0, π) such that a = cos(θ) and b = sin(θ). Thus the matrix in Eq. 9.10 has the required form. This completes the proof in one direction.

Conversely, suppose that there is an orthonormal basis of V with respect to which the matrix of S has the form stated in the theorem. Then there is a direct sum decomposition

V = U1 ⊕ · · · ⊕ Um,

where each subspace Uj is a subspace of V having dimension 1 or 2. Furthermore, any two vectors belonging to distinct Uj's are orthogonal, and each S|Uj is an isometry mapping Uj into Uj. If v ∈ V , write

v = ∑_{j=1}^{m} uj,   uj ∈ Uj.

Applying S to this equation and taking norms gives

||S(v)||² = ||S(u1) + · · · + S(um)||²
         = ||S(u1)||² + · · · + ||S(um)||²
         = ||u1||² + · · · + ||um||²
         = ||v||².

This shows that S is an isometry, as desired.

9.6 The Polar Decomposition

Theorem 9.6.1. Polar Decomposition If T ∈ L(V ), then there exists an isometry S ∈ L(V ) such that

T = S ◦√T ∗T .


Proof. Let T ∈ L(V ). Then for each v ∈ V ,

||T (v)||² = 〈T (v), T (v)〉 = 〈T ∗T (v), v〉 = 〈√T ∗T (v), √T ∗T (v)〉 = ||√T ∗T (v)||².

So we have established

||T (v)|| = ||√T ∗T (v)|| ∀v ∈ V.        (9.11)

The next step is to construct a map

S1 : Im(√T ∗T ) → Im(T ) : √T ∗T (v) 7→ T (v),

and show that it is a well-defined isometry.

||T (v1) − T (v2)|| = ||T (v1 − v2)|| = ||√T ∗T (v1 − v2)|| = ||√T ∗T (v1) − √T ∗T (v2)||.

This shows that T (v1) = T (v2) if and only if √T ∗T (v1) = √T ∗T (v2). In fact it shows that S1 is well-defined and is a bijection from Im(√T ∗T ) onto Im(T ). One consequence of this is that

dim(Im(√T ∗T )) = dim(Im(T )) and dim((Im(√T ∗T ))⊥) = dim((Im(T ))⊥).

It is also easy to check that S1 is linear. Moreover, if v = √T ∗T (u), then ||S1(v)|| = ||T (u)|| = ||√T ∗T (u)|| = ||v||, implying that S1 is an isometry.

Now construct an orthonormal basis (e1, . . . , em) of (Im(√T ∗T ))⊥ and an orthonormal basis (f1, . . . , fm) of (Im(T ))⊥. Define

S2 : (Im(√T ∗T ))⊥ → (Im(T ))⊥

by: S2(ej) = fj (and extend linearly). It follows that ||S2(w)|| = ||w|| for all w ∈ (Im(√T ∗T ))⊥. Here S2 is an isometry by part (e) of Theorem 9.5.1.

(Notation: If U and W are independent, orthogonal subspaces of V so U + W = U ⊕ W and U ⊆ W⊥, we write U ⊥ W in place of U ⊕ W .)

We know that

V = Im(√T ∗T ) ⊥ (Im(√T ∗T ))⊥.

For v ∈ V , write v = u + w with u ∈ Im(√T ∗T ), w ∈ (Im(√T ∗T ))⊥. Define S : V → V by S(v) = S1(u) + S2(w). It is easy to check that S ∈ L(V ). Moreover, ||S(v)||² = ||S1(u) + S2(w)||² = ||S1(u)||² + ||S2(w)||² = ||u||² + ||w||² = ||v||², implying S is an isometry. (Here we used the fact that S1(u) ∈ Im(T ) and S2(w) ∈ (Im(T ))⊥.) The only thing left to check is that T = S ◦ √T ∗T , but this is obvious by the way S1 is defined on the image of √T ∗T .


We can visualize this proof as follows. Start with T ∈ L(V ). Decompose the domain as V = Im(√T ∗T ) ⊥ (Im(√T ∗T ))⊥, with orthonormal basis (e1, . . . , em) for the second summand, and the codomain as V = Im(T ) ⊥ (Im(T ))⊥, with orthonormal basis (f1, . . . , fm) for the second summand. The map S1 sends √T ∗T (v) to T (v), and S2 sends each ej to fj. Then S = S1 ⊕ S2 is an isometry, and we obtain the

Polar Decomposition: T = S ◦ √T ∗T .

If T is invertible, then S = T ◦ (√T ∗T )⁻¹ is unique. If T is not invertible, then Im(T ) ≠ V , so S2 ≠ −S2. Hence S′ = S1 ⊕ (−S2) yields a polar decomposition of T distinct from that given by S.

The polar decomposition states that each operator on V can be written as the product of an isometry and a positive operator. Thus we can write each operator on V as the product of two operators, each of which is of a type that we have completely described and understand reasonably well. We know there is an orthonormal basis of V with respect to which the isometry S has a diagonal matrix (if F = C) or a block diagonal matrix with blocks of size at most 2-by-2 (if F = R), and there is an orthonormal basis of V with respect to which √T ∗T has a diagonal matrix. Unfortunately, there may not be one orthonormal basis that does both at the same time. However, we can still say something interesting. This is given by the singular value decomposition which is discussed in the next section.
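For matrices, the polar decomposition is available in SciPy as scipy.linalg.polar, which returns a unitary factor and a positive semidefinite factor. The sketch below (ours, not part of the text) also rebuilds √(A∗A) directly for comparison.

    import numpy as np
    from scipy.linalg import polar, sqrtm

    A = np.array([[1.0, 2.0], [0.0, 3.0]])
    S, P = polar(A, side='right')                       # A = S P, S unitary, P positive semidefinite
    print(np.allclose(S @ P, A))
    print(np.allclose(S.conj().T @ S, np.eye(2)))       # S is an isometry (unitary)
    print(np.allclose(P, sqrtm(A.conj().T @ A)))        # P = sqrt(A* A)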

9.7 The Singular-Value Decomposition

Statement of the General Result

Let F be a subfield of the complex number field C, and let A ∈ Mm,n(F ). Clearly A∗A is self-adjoint, so there is an orthonormal basis (v1, . . . , vn) for F^n (whose elements we view as column vectors) consisting of eigenvectors of A∗A with associated eigenvalues λ1, . . . , λn. Let 〈 , 〉 denote the usual


inner product on F^n. Since ||Avi||² = 〈Avi, Avi〉 = 〈A∗Avi, vi〉 = 〈λivi, vi〉 = λi||vi||² = λi, we have proved the following lemma.

Lemma 9.7.1. With the notation as above,
(i) Each eigenvalue λi of A∗A is real and nonnegative. Hence WLOG we may assume that λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0.
(ii) With si = √λi for 1 ≤ i ≤ n, we say that the si, 1 ≤ i ≤ n, are the singular values of A. Hence the singular values of A are the lengths of the vectors Av1, . . . , Avn.

This is enough for us to state the theorem concerning the Singular ValueDecomposition of A.

Theorem 9.7.2. Given A ∈ Mm,n(F ) as above, there are unitary matrices U ∈ Mm(F ) and V ∈ Mn(F ) such that

A = UΣV ∗, where Σ = diag(s1, . . . , sr, 0, . . . , 0)        (9.12)

is a diagonal m × n matrix and s1 ≥ s2 ≥ · · · ≥ sr are the positive (i.e., nonzero) singular values of A.

(i) The columns of U are eigenvectors of AA∗, the first r columns of U form an orthonormal basis for the column space col(A) of A, and the last m − r columns of U form an orthonormal basis for the (right) null space of the matrix A∗.

(ii) The columns of V are eigenvectors of A∗A, the last n − r columns of V form an orthonormal basis for the (right) null space null(A) of A, and the first r columns of V form an orthonormal basis for the column space col(A∗).

(iii) The rank of A is r.

Proof. Start by supposing that λ1 ≥ λ2 ≥ · · · ≥ λr > λr+1 = · · · = λn = 0. Since ||Avi|| = √λi = si, clearly Avi ≠ 0 iff 1 ≤ i ≤ r. Also, if i ≠ j, then

〈Avi, Avj〉 = 〈A∗Avi, vj〉 = λi〈vi, vj〉 = 0.

So (Av1, . . . , Avn) is an orthogonal list. Moreover, Avi ≠ 0 iff 1 ≤ i ≤ r, since ||Avi|| = √λi = si. So clearly (Av1, . . . , Avr) is a linearly independent list that spans a subspace of col(A). On the other hand suppose that y = Ax ∈ col(A). Then x = ∑ civi, so y = Ax = ∑_{i=1}^{n} ciAvi = ∑_{i=1}^{r} ciAvi. It follows that (Av1, . . . , Avr) is a basis for col(A), implying that r = dim(col(A)) = rank(A). Of course, then the right null space of A has dimension n − r.

For 1 ≤ i ≤ r, put ui = Avi/||Avi|| = (1/si)·Avi, so that Avi = si·ui. Now extend (u1, . . . , ur) to an orthonormal basis (u1, . . . , um) of F^m. Let U = [u1, . . . , um] be the matrix whose jth column is uj. Similarly, put V = [v1, . . . , vn], so U and V are unitary matrices. And

AV = [Av1, . . . , Avr, 0, . . . , 0] = [s1u1, . . . , srur, 0, . . . , 0] = [u1, . . . , um] · diag(s1, . . . , sr, 0, . . . , 0) = UΣ,

where Σ is diagonal, m × n, with the singular values of A along the diagonal. Then AV = UΣ implies A = UΣV ∗, A∗ = V Σ∗U∗, so A∗A = V Σ∗U∗ · UΣV ∗ = V Σ∗ΣV ∗. Then

(A∗A)V = V Σ∗ΣV ∗V = V Σ∗Σ = V · diag(λ1, . . . , λr, 0, . . . , 0) = (λ1v1, . . . , λrvr, 0, · · · , 0).

Since (A∗A)vj = 0 for j > r, implying vj∗A∗Avj = 0, forcing Avj = 0, it is clear that vr+1, . . . , vn form an orthonormal basis for null(A). Since λjvj = (A∗A)vj = A∗ · √λj uj, we see A∗uj = sjvj. It now follows readily that v1, . . . , vr form a basis for the column space of A∗.

Similarly, AA∗ = UΣΣ∗U∗, so that

(AA∗)U = UΣΣ∗ = (λ1u1, . . . , λrur, 0, · · · , 0).

It follows that (u1, . . . , um) consists of eigenvectors of AA∗. Also, ur+1, . . . , um form an orthonormal basis for the (right) null space of A∗, and u1, . . . , ur form an orthonormal basis for col(A).

Recapitulation

We want to practice recognizing and/or finding a singular value decomposition for a given m × n matrix A over the subfield F of C.


A = UΣV ∗ if and only if A∗ = V Σ∗U∗. (9.13)

Here Σ is m × n and diagonal with the nonzero singular values s1 ≥ s2 ≥ · · · ≥ sr down the main diagonal as its only nonzero entries. Similarly, Σ∗ is n × m and diagonal with s1 ≥ · · · ≥ sr down the main diagonal as its only nonzero entries. So the nonzero eigenvalues of A∗A are identical to the nonzero eigenvalues of AA∗. The only difference is the multiplicity of 0 as an eigenvalue.

For 1 ≤ i ≤ r, Avi = siui and A∗ui = sivi. (9.14)

(v1, . . . , vr, vr+1, . . . , vn) is an orthonormal basis of eigenvectors of A∗A.(9.15)

(u1, . . . , ur, ur+1, . . . , um) is an orthonormal basis of eigenvectors of AA∗.(9.16)

(v1, . . . , vr) is a basis for col(A∗) and (vr+1, . . . , vn) is a basis for null(A).(9.17)

(u1, . . . , ur) is a basis for col(A) and (ur+1, . . . , um) is a basis for null(A∗).(9.18)

We can first find (v1, . . . , vn) as an orthonormal basis of eigenvectors of A∗A, put ui = (1/si)·Avi for 1 ≤ i ≤ r, and then complete (u1, . . . , ur) to an orthonormal basis (u1, . . . , um) of eigenvectors of AA∗. Sometimes here it is efficient to use the fact that (ur+1, . . . , um) form a basis for the null space of AA∗ or of A∗. If n < m, this is the approach usually taken.


Alternatively, we can find (u1, . . . , um) as an orthonormal basis of eigenvectors of AA∗ (with eigenvalues ordered from largest to smallest), put vi = (1/si)·A∗ui for 1 ≤ i ≤ r, and then complete (v1, . . . , vr) to an orthonormal basis (v1, . . . , vn) of eigenvectors of A∗A. If m < n, this is the approach usually taken.

9.7.3 Two Examples

Problem 1. Find the Singular Value Decomposition of

A = (  1  −1 )
    ( −2   2 )
    (  2  −2 ).

Solution: A∗A = AᵀA = (  9  −9 )
                       ( −9   9 ).

In this case it is easy to find an orthonormal basis of R² consisting of eigenvectors of A∗A. Put

v1 = ( −1/√2, 1/√2 )ᵀ,   v2 = ( 1/√2, 1/√2 )ᵀ,   V = [v1, v2].

Then A∗A[v1, v2] = [18v1, 0 · v2]. So put

u1 = Av1/||Av1|| = (1/(3√2)) ( −√2, 2√2, −2√2 )ᵀ = ( −1/3, 2/3, −2/3 )ᵀ = the first column of U .

It is now easy to see that

w1 = ( 2, 1, 0 )ᵀ,   w2 = ( −2, 0, 1 )ᵀ

form a basis of u1⊥. We apply Gram-Schmidt to (w1, w2) to obtain

u2 = ( 2/√5, 1/√5, 0 )ᵀ,   u3 = ( −2/√45, 4/√45, 5/√45 )ᵀ.

Now put U = [u1, u2, u3]. It follows that A = UΣV ∗ is one singular value decomposition of A, where

Σ = ( 3√2  0 )
    (  0   0 )
    (  0   0 ).

Page 182: Linear Algebra by S. E. Payne

182 CHAPTER 9. OPERATORS ON INNER PRODUCT SPACES

Problem 2. Compute a singular value decomposition ofA =

(1 0 i0 1 −i

).

Solution: In this case AA∗ is 2× 2, while A∗A is 3× 3, so we start with

AA∗ =

(2 −1−1 2

). Here AA∗ has eigenvalues λ1 = 3 and λ2 = 1. So the

singular values of A∗ (and hence of A) are s1 =√

3 and s2 = 1. It follows

that Σ =

( √3 0 0

0 1 0

). Also Σ∗ =

√3 00 10 0

.

In this case we choose to compute an orthonormal basis (u1, u2) of eigen-

vectors of AA∗. It is simple to check that AA∗− 3I =

(−1 −1−1 −1

), and we

may take u1 =

(1√2−1√

2

). Similarly, AA∗ − I =

(1 −1−1 1

), and we may

take u2 =

(1√2

1√2

). Then U = [u1, u2]. At this point we must put

v1 =1

s1

A∗u1 =1√3

1 00 1−i i

( 1√2−1√

2

)=

1√6−1√

6−2i√

6

,

and

v2 = 1 · A∗u2 =

1 00 1−i i

( 1√2

1√2

)=

1√2

1√2

0

.

At this point we know that (v3) must be an orthonormal basis for{span(v1, v2)}⊥. So we want v3 = (x, y, z)T with 0 = 〈(1,−1,−2i), (x, y, z)〉 =x − y − 2iz, and 0 = 〈(1, 1, 0), (x, y, z)〉 = x + y. It follows easily that(x, y, z) = (iz,−iz, z), where we must choose z so that the norm of thisvector is 1. If we put z = −i√

3, i.e., z = i√

3, then (x, y, z) = ( 1√

3, −1√

3, i√

3).

Then

A =

(1√2

1√2

− 1√2

1√2

)( √3 0 0

0 1 0

)1√6− 1√

62i√

61√2

1√2

01√3− 1√

3− i√

3

,

which is a singular value decomposition of A.

Page 183: Linear Algebra by S. E. Payne

9.7. THE SINGULAR-VALUE DECOMPOSITION 183

THE REMAINDER OF THIS SECTION MAY BE CONSIDERED TOBE STARRED.

Definition If λ is an eigenvalue of the matrix A, then dim(null(T −λI))is called the geometric multiplicity of the eigenvalue λ.

Theorem 9.7.4. Let M be m2 ×m1 and N be m1 ×m2. Put

A =

(0 NM 0

).

Then the following are equivalent:(i) λ 6= 0 is an eigenvalue of A with (geometric) multiplicity f .

(ii) −λ 6= 0 is an eigenvalue of A with (geometric) multiplicity f .

(iii) λ2 6= 0 is an eigenvalue of MN with (geometric) multiplicity f .

(iv) λ2 6= 0 is an eigenvalue of NM with (geometric) multiplicity f .

Proof. Step 1. Show (i) ↔ (ii) Let AU = λU for some matrix U of rank

f . Write U =

(U1

U2

), and put U =

(U1

−U2

), where Ui has mi rows for

i = 1, 2. Then AU = λU becomes(0 NM 0

)(U1

U2

)=

(NU2

MU1

)= λ

(U1

U2

),

so NU2 = λU1 and MU1 = λU2.

This implies AU =

(0 NM 0

)(U1

−U2

)=

(−λU1

λU2

)= −λU . Since

rank(U) = rank(U), the first equivalence follows.

Step 2. Show (iii) ↔ (iv). Let MNU ′ = λ2U ′ for some matrix U ′ ofrank f . Then (NM)(NU ′) = λ2NU ′, and rank(NU ′) = rank(U ′), sincerank(λ2U ′) = rank(MNU ′) ≤ rank(U ′), and λ 6= 0. So NM has λ2 aseigenvalue with geometric multiplicity at least f . Interchanging roles of Nand M proves (iii) ↔ (iv).

Step 3. Show (i) ↔ (iii) Let AU = λU with U having rank f , and

U =

(U1

−U2

)as in Step 1. Put n = m1 +m2. Then A2

(U ; U

)= λ2

(U ; U

).

since NU2 = λU1 and MU1 = λU2, U1 and U2 have the same row space. Sorow(U) = row(U1) = row(U2). This implies rank(U1) = rank(U2) = f .

Page 184: Linear Algebra by S. E. Payne

184 CHAPTER 9. OPERATORS ON INNER PRODUCT SPACES

Using column operations we can transform

(U1 U1

U2 −U2

)into

(U1 00 U2

),

which has rank 2f . So λ2 is an eigenvalue of A2 with geometric multiplicityat least 2f .

On the other hand, the geometric multiplicity of an eigenvalue λ2 6= 0 ofA2 equals n − rank(A2 − λ2I) = n − rank((A − λI)(A + λI)) ≤ n + n −rank(A− λI)− rank(A+ λI) = 2f .

Note: The singular values of a complex matrix N are sometimes de-

fined to be the positive eigenvalues of

(0 NN∗ 0

). By the above result

we see that they are the same as the positive square roots of the nonzeroeigenvalues of NN∗ (or of N∗N) as we have defined them above.

9.8 Pseudoinverses and Least Squares∗

Let A be an m × n matrix over F (where F is either R or C) and supposethe rank of A is r. Let

A = UΣV ∗ be a singular value decomposition of A.

So Σ is an m× n diagonal matrix with the nonzero singular values s1 ≥s2 ≥ . . . ,≥ sr down the main diagonal as its only nonzero entries. Define Σ+

to be the n×m diagonal matrix with diagonal equal to ( 1s1, 1s2, . . . , 1

sr, 0, . . .).

Then both ΣΣ+ and Σ+Σ have the general block form(Ir 00 0

),

but the first product is m×m and the second is n× n.Definition The Moore-Penrose generalized inverse of A (sometimes just

called the pseudoinverse of A) is the n×m matrix A+ over F defined by

A+ = V Σ+U∗.

Theorem 9.8.1. Let A be an m× n matrix over F with pseudoinverse A+

(as defined above). Then the following three properties hold:(a) AA+A = A;(b) A+AA+ = A+;(c) AA+ and A+A are hermitian (i.e., self-adjoint).

Page 185: Linear Algebra by S. E. Payne

9.8. PSEUDOINVERSES AND LEAST SQUARES∗ 185

Proof. All three properties are easily shown to hold using the definition ofA+. You should do this now.

Our definition of “the” Moore-Penrose generalized inverse would not bevalid if it were possible for there to be more than one. However, the fol-lowing theorem shows that there is at most one (hence exactly one!) suchpseudoinverse.

Theorem 9.8.2. Given an m × n matrix A, there is at most one matrixsatisfying the three properties of A+ given in Theorem 9.8.1. This meansthat A has a unique pseudoinverse.

Proof. Let B and C be pseudoinverses of A. i.e., satisfying the three prop-erties of A+ in Theorem 9.8.1. Then

CA = C(ABA) = CA(BA)∗ = CAA∗B∗ = (A(CA)∗)∗B∗ = (ACA)∗B∗

= A∗B∗ = (BA)∗ = BA,

i.e.,

CA = BA.

Then

B = BAB = B(AB)∗ = BB∗A∗,

so that

B = BB∗(ACA)∗ = BB∗A∗(AC)∗ = BAC = CAC = C.

At this point we want to review and slightly revise the Gram-Schmidtalgorithm given earlier. Let (v1, . . . , vn) be any list of column vectors in Fm.Let these be the columns of an m × n matrix A. Let W = span(v1, . . . , vn)be the column space of A. Define u1 = v1. Then for 2 ≤ i ≤ n put

ui = vi − α1iu1 − α2iu2 − · · · − αi−1,iui−1,

where αji =〈vi,uj〉〈uj ,uj〉 , if uj 6= ~0, and αji = 0 if uj = ~0. Hence

vi = α1iu1 + α2iu2 + α3iu3 + · · ·+ αi−1,iui−1 + ui. (9.19)

Page 186: Linear Algebra by S. E. Payne

186 CHAPTER 9. OPERATORS ON INNER PRODUCT SPACES

Let Q0 be the m×n matrix whose columns are u1, u2, . . . , un, respectively,and let R0 be the n× n matrix given by

R0 =

1 α12 · · · α1n

0 1 · · · α2q...

......

...0 0 · · · 1

. (9.20)

The ui in Q0 are constructed by the Gram-Schmidt method and form anorthogonal set. This means that Q0 has orthogonal columns, some of whichmay be zero. Let Q and R be the matrices obtained by deleting the zerocolumns from Q0 and the corresponding rows from R0, and by dividing eachnonzero column of Q0 by its norm and multiplying each corresponding rowof R0 by that same norm. Then Eq. 9.20 becomes

A = QR with R upper triangular and Q having orthonormal columns.(9.21)

Eq. 9.21 is the normalized QR-decomposition of A. Note that if A hasrank k, then Q is m × k with rank k and R is k × n and upper triangularwith rank k. The columns of Q form an orthonormal basis for the columnspace W of A.

If we compute Q∗Q, since the columns of Q are orthonormal, we get

Q∗Q = Ik.

Since (u1, . . . , uk) is an orthonormal basis for W , if P0 ∈ L(F n) is theorthogonal projection onto W , then for v ∈ F n we have

P0(v) =k∑i=1

〈v, ui〉ui = (u1, . . . , uk)

〈v, u1〉...

〈v, uk〉

= Q ·

u∗1...u∗k

· v = QQ∗v.

So QQ∗ is the projection matrix projecting v onto W = col(Q). Hence QQ∗vis the unique vector in W closest to v.

Lemma 9.8.3. Suppose that the m × n matrix A has rank k and that A =BC, where B is m× k with rank k and C is k × n with rank k. Then

A++ = C∗(CC∗)−1(B∗B)−1B∗

is the pseudoinverse of A.

Page 187: Linear Algebra by S. E. Payne

9.8. PSEUDOINVERSES AND LEAST SQUARES∗ 187

Proof. It is rather straightforward to verify that the three properties of The-orem 9.8.1 are satisfied by this A++. Then by Theorem 9.8.2 we know thata matrix A+ satisfying the three properties of Theorem 9.8.1 is uniquelydetermined by these properties.)

Corollary 9.8.4. If A = QR is a normalized QR-decomposition of A, thenA+++ = R∗(RR∗)−1Q∗ is the pseudoinverse of A.

Given that A = UΣV ∗ is a singular value decomposition of A, partitionthe two matrices as follows: U = (Uk, Um−k) and V = (Vk, Vn−k), where Ukis m × k and its columns are the first k columns of U , i.e., they form anorthonormal basis of col(A). Similarly, Vk is k × n and it columns are thefirst k columns of V , so they form an orthonormal basis for col(A∗). This isthe same as saying that the rows of V ∗k form an orthonormal basis for row(A).Now let D be the k × k diagonal matrix with the nonzero singular values of

A down the diagonal, i.e., Σ =

(D 00 0

). Now using block multiplication

we see that

A = (Uk, Um−k)

(D 00 0

)(V ∗kV ∗n−k

)= UkDkV

∗k .

This expression A = UkDkV∗k is called the reduced singular value decompo-

sition of A. Here Uk is m × k with rank k, and DkV∗k is k × n with rank

k. Then A = QR = Uk · DkV∗k where Q = Uk is m × k with rank k, and

R = DkV∗k is k × n with rank k. The columns of Q form an orthonormal

basis for col(A), and the rows of R form an orthogonal basis for row(A). Soa pseudoinverse A+is given by

A+ = R∗(RR∗)−1(Q∗Q)−1Q∗ = VkD−1k U∗k , after some computation.

Suppose we are given the equation

Ax = b, (9.22)

where A is m × n and b is m × 1. It is possible that this equation is notconsistent. In this case we want to find x so that Ax is as close to b aspossible, i.e., it should be the case that Ax is the projection b of b onto thecolumn space of A. Put

x = A+b = VkD−1k U∗k b.

Page 188: Linear Algebra by S. E. Payne

188 CHAPTER 9. OPERATORS ON INNER PRODUCT SPACES

Then

Ax = (UkDkV∗k )VkD

−1k U∗k b

= UkDD−1U∗k b (9.23)

= UkU∗k b,

which by the paragraph preceding Lemma 9.8.3 must be the projection b ofb onto col(A). Thus x is a “least-squares solution” to Ax = b, in the sensethat x minimizes ||Ax − b|| (which is the square root of a sum of squares).Moreover, it is true that x has the smallest length among all least-squaressolutions to Ax = b.

Theorem 9.8.5. Let A be m×n with rank k. Let A = UΣV ∗ be a normalizedsingular value decomposition of A. Let U = (Uk, Um−k) and V = (Vk, Vn−k)be partitioned as above. Then A+ = VkD

−1k U∗k , as above, and x = A+b is a

least-squares solution to Ax = b. If x0 is any other least-squares solution,i.e., ||Ax0 − b|| = ‖Ax − b||, then ||x|| ≤ ||x0||, with equality if and only ifx0 = x.

Proof. What remains to be shown is that if ||Ax0 − b|| = ‖Ax − b||, then||x|| ≤ ||x0||, with equality if and only if x0 = x. We know that ||Ax− b|| isminimized precisely when Ax = b is the projection UkU

∗k b of b onto col(A).

So suppose Ax0 = b = Ax, which implies A(x0−x) = 0, i.e., x0−x ∈ null(A).By Eq. 9.17 this means that x0 = x+Vn−kz for some z ∈ F n−k. We claim thatx = A+b = VkD

−1k U∗k b and Vn−kz are orthogonal. For, 〈VkD−1

K U∗k b, Vn−kz〉 =z∗(V ∗n−kVk)D

−1k U∗Kb = 0, because by Eq. 9.15, V ∗n−kVk = 0(n−k)×k. Then

||x0||2 = ||x+ Vn−kz||2 = ||x||2 + ||Vn−kz||2 ≥ ||x||2, with equality if and onlyif ||Vn−kz||2 = 0 which is if and only if Vn−kz = ~0 which is if and only ifx0 = x.

Theorem 9.8.6. Let A be m×n with rank k. Then A has a unique (Moore-Penrose) pseudoinverse. If k = n, A+ = (A∗A)−1A∗, and A+ is a leftinverse of A. If k = m, A+ = A∗(AA∗)−1, and A+ is a right inverse of A.If k = m = n, A+ = A−1.

Proof. We have already seen that A has a unique pseudoinverse. If k = n,A = B, C = Ik yields a factorization of the type used in Lemma 9.8.3,so A+ = (A∗A)−1A∗. Similarly, if k = m, put B = Ik, C = A. ThenA+ = A∗(AA∗)−1. And if A is invertible, so that (A∗A)−1 = A−1(A∗)−1, bythe k = n case we have A+ = (A∗A)−1A∗ = A−1(A∗)−1A∗ = A−1.

Page 189: Linear Algebra by S. E. Payne

9.9. NORMS, DISTANCE AND MORE ON LEAST SQUARES∗ 189

9.9 Norms, Distance and More on Least Squares∗

In this section the norm ‖A‖ of an m× n matrix A over C is defined by:

‖A‖ = (tr(A∗A))1/2 =

(∑i,j

|Aij|2)1/2

.

(Here we write tr(A) for the trace of A.)Note: This norm is the 2-norm of A thought of as a vector in the mn-

dimensional vector space Mm,n(C). It is almost obvious that ‖A‖ = ‖A∗‖.

Theorem 9.9.1. Let A, P , Q be arbitrary complex matrices for which theappropriate multiplications are defined. Then

‖AP‖2 + ‖(I − AA+)Q‖2 = ‖AP + (I − AA+)Q‖2.

Proof.

‖AP + (I − AA+)Q‖2 = tr{

[AP + (I − AA+)Q]∗[AP + (I − AA+)Q]}

= ‖AP‖2+tr{

(AP )∗(I − AA+)Q}

+tr{

((I − AA+)Q)∗AP}

+‖(I−AA+)Q‖2.

We now show that each of the two middle terms is the trace of the zeromatrix. Since the second of these matrices is the conjugate transpose of theother, it is necessary only to see that one of them is the zero matrix. So forthe first matrix:

(AP )∗(I − AA+)Q = (AP )∗(I − AA+)∗Q = ((I − AA+)AP )∗Q =

= (AP − AA+AP )∗Q = (AP − AP )∗Q = 0.

Theorem 9.9.2. Let A and B be m× n and m× p complex matrices. Thenthe matrix X0 = A+B enjoys the following properties:

(i) ‖AX −B‖ ≥ ‖AX0 −B‖ for all n× p X. Moreover, equality holds ifand only if AX = AA+B.

(ii) ‖X‖ ≥ ‖X0‖ for all X such that ‖AX − B‖ = ‖AX0 − B‖, withequality if and only if X = X0.

Page 190: Linear Algebra by S. E. Payne

190 CHAPTER 9. OPERATORS ON INNER PRODUCT SPACES

Proof.

‖AX −B‖2 =

= ‖A(X−A+B)+((I−AA+)(−B)‖2 = ‖A(X−A+B‖2 +‖(I−AA+)(−B)‖2

= ‖AX − AA+B‖2 + ‖AA+B −B‖2 ≥ ‖AA+B −B‖2,

with equality if and only if AX = AA+B. This completes the proof of (i).Now interchange A+ and A in the preceding theorem and assume AX =AA+B so that equality holds in (i). Then we have

‖X‖2 = ‖A+B +X − A+(AA+B)‖2 = ‖A+B + (I − A+A)X‖2

= ‖A+B‖2 + ‖X − A+AX‖2 = ‖A+B‖2 + ‖X − A+B‖2 ≥ ‖A+B‖2,

with equality if and only if X = A+B.

NOTE: Suppose we have n (column) vectors v1, . . . , vn in Rm and somevector y inRm, and we want to find that vector in the space span(v1, . . . , vn) =V which is closest to the vector y. First, we need to use only an independentsubset of the vi’s. So use row-reduction techniques to find a basis (w1, . . . , wk)of V . Now use these wi to form the columns of a matrix A. So A is m × kwith rank k, and the pseudoinverse of A is simpler to compute than if wehad used all of the vi’s to form the columns of A. Put x0 = A+y. ThenAx0 = AA+y is the desired vector in V closest to y. Now suppose we wantto find the distance from y to V , i.e., the distance

d(y, AA+y) = ‖y − AA+y‖.

It is easier to compute the square of the distance first:

‖y − AA+y‖2 = ‖(I − AA+)y‖2 = y∗(I − AA+)∗(I − AA+)y =

= y∗(I−AA+)(I−AA+)y = y∗(I−AA+−AA+AA+AA+)y = y∗(I−AA+)y.

We have proved the following:

Theorem 9.9.3. The distance between a vector y of Rm and the columnspace of an m× n matrix A is (y∗(I − AA+)y)

12 .

Page 191: Linear Algebra by S. E. Payne

9.9. NORMS, DISTANCE AND MORE ON LEAST SQUARES∗ 191

Theorem 9.9.4. AXB = C has a solution X if and only if AA+CB+B = C,in which case for any Y ,

X = A+CB+ + Y − A+AY BB+ is a solution.

This gives the general solution. (The reader might want to review the exer-cises in Chapter 6 for a different approach to a more general problem.)

Proof. If AXB = C, then C = AXB = AA+AXBB+B = AA+CB+B.Conversely, if C = AA+CB+B, then X = A+CB+ is a particular solutionof AXB = C. Any expression of the form X = Y − A+AY BB+ satisfiesAXB = 0. And if AXB = 0, then X = Y − A+AY BB+ for Y = X.Put X0 = A+CB+ and let X1 be any other solution to AXB = C. ThenX = X1 −X0 satisfies AXB = 0, so that

X1 −X0 = X = Y − A+AY BB+ for some Y.

If x = (x1, . . . , xn)T and y = (y1, . . . , yn)T are two points of Cn, thedistance between x and y is defined to be

d(x, y) = ‖x− y‖ =(∑

|xi − yi|2)1/2

.

A hyperplane H in Cn is the set of vectors x = (x1, . . . , xn)T satisfying anequation of the form

∑ni=1 aixi + d = 0, i.e.,

H = {x ∈ Cn : Ax+ d = 0}, where A = (a1, . . . , an) 6= ~0.

A = 1 · (a1, . . . , an), where 1 is 1 × 1 of rank 1 and (a1, . . . , an) is 1 × n ofrank 1, so

A+ = A∗(AA∗)−1 =1

‖A‖2A∗.

For such an A you should now show that ‖A+‖ = 1‖A‖ .

Let y = (y1, . . . , yn)T be a given point of Cn and let H : Ax+ d = 0 be agiven hyperplane. We propose to find the point x0 of H that is closest to y0

and to find the distance d(x0, y0).Note that x is on H if and only if Ax+ d = 0 if and only if A(x− y0)−

(−d− Ay0) = 0. Hence our problem is to find x0 such that

Page 192: Linear Algebra by S. E. Payne

192 CHAPTER 9. OPERATORS ON INNER PRODUCT SPACES

(i) A(x0 − y0)− (−d− Ay0) = 0, (i.e., x0 is on H),

and

(ii) ‖x0 − y0‖ is minimal (so x0 is the point of H closest to y0).

Theorem 9.9.2 says that the vector x = x0− y0 that satisfies Ax− (−d−Ay0) = 0 with ‖x‖ minimal is given by

x = x0 − y0 = A+(−d− Ay0) =1

‖A‖2A∗(−d− Ay0).

Hence

x0 = y0 +−d‖A‖2

A∗ − 1

‖A‖2A∗Ay0.

And

d(y0, H) = d(y0, x0) = ‖x0 − y0‖ = ‖A+(−d− Ay0)‖

= |d+ Ay0| · ‖A+‖ =|Ay0 + d|‖A‖

.

Note: This distance formula generalizes well known formulas for thedistance from a point to a line in R2 and from a point to a plane in R3.

We now recall the method of approximating by least squares. Supposey is a function of n real variables t(1), . . . , t(n), and we want to approximatey as a linear function of these variables. this means we want to find (real)numbers x0, . . . , xn for which

1. x0 +x1t(1) + · · ·+xnt

(n) is as “close” to the function y = y(t(1), . . . , t(n))as possible.

Suppose we have m measurements of y corresponding to m differentsets of values of the t(j) : (yi; t

(1)i , . . . , t

(n)i ), i = 1, . . . ,m. The problem

is to determine x0, . . . , xn so that

2. yi = x1t(1)i + · · · + xnt

(n) + ri, 1 ≤ i ≤ m, where the ri’s are small insome sense.

Put t(j)i = aij, y = (y1, . . . , ym)T , x = (x0, . . . , xn)T , r = (r1, . . . , rm)T ,

and

Page 193: Linear Algebra by S. E. Payne

9.9. NORMS, DISTANCE AND MORE ON LEAST SQUARES∗ 193

A =

1 a11 · · · a1n

1 a21 · · · a2n...

......

...1 am1 · · · amn

.

Then 2. becomes:

3. y = Ax+ r.

A standard interpretation of saying that “r is small” is to say that forsome some weighting constants w1, . . . , wm, the number S =

∑mi=1wir

2i =

rTWr, where W = diag(w1, . . . , wm) is minimal.

As S =∑m

i=1wi(yi − s0 − x1ai1 − x2ai2 − · · · − xnain)2, to minimize Sas a function of x0, . . . , xn, we require that ∂S

∂xk= 0 for all k.

Then

∂S

∂x0

= −2∑i

wi(yi − x0 − x1ai1 − · · · − xnain) = 0

implies

4.

(m∑i=1

wi)x0 + (m∑i=1

wiai1)x1 + · · ·+ (m∑i=1

wiain)xn =∑

wiyi.

And for 1 ≤ k ≤ n : ∂S∂xk

= −2∑m

i=1wi(yi−x0−x1ai1−· · ·xnain)aik = 0implies

5.

(m∑i=1

wiaik)x0 + (m∑i=1

wiai1aik)x1 + · · ·+ (m∑i=1

wiainaik)xn =∑

wiyiaik.

It is easy to check that

ATW =

1 · · · 1a11 · · · am1... · · · ...a1n · · · amn

w1 · · · 0

.... . .

...0 · · · wm

Page 194: Linear Algebra by S. E. Payne

194 CHAPTER 9. OPERATORS ON INNER PRODUCT SPACES

=

w1 · · · wma11w1 · · · am1wm

... · · · ...a1nwn · · · amnwm

.

So putting together the right hand sides of 4. and 5., we obtain

6.

(ATW )y =

∑wiyi∑ai1wiyi

...∑ainwiyi

.

Also we compute

ATWA =

w1 · · · wma11w1 · · · am1wm

......

...a1nw1 · · · amnwm

1 a11 · · · a1n

......

......

1 am1 · · · amn

.

Comparing the above with the left hand sides of 4. and 5., and puttingthe above equations together, we obtain the following system of n + 1equations in n+ 1 unknowns:

7. (ATWA)x = ATWy.

We now reconsider this problem taking advantage of our results ongeneralized inverses. So A is a real m × n matrix, y is a real m × 1matrix, and x ∈ Rn is sought for which (y−Ax)T (y−Ax) = ‖Ax−y‖2

is minimal (here we put W = I). Theorem 9.9.2 solves this problem byputting x = A+y. We show that this also satisfies the above conditionATAx = ATy, as should be expected. For suppose A = BC with Bm× k, C k × n, where k = rank(A) = rank(B) = rank(C). Then

ATAx = ATAA+y = CTBTBCCT (CCT )−1(BTB)−1BTy

= CTBTy = ATy.

So when W = I the generalized inverse solution does the following:First, it actually constructs a solution of the original problem that,second, has the smallest norm of any solution.

Page 195: Linear Algebra by S. E. Payne

9.10. THE RAYLEIGH PRINCIPLE∗ 195

9.10 The Rayleigh Principle∗

All matrices in this section are over the field C of complex numbers, andfor any matrix B, B∗ denotes the conjugate transpose of B. Also, 〈 , 〉denotes the standard inner product on Cn given by: 〈x, y〉 = xTy = y∗x.Let A be n × n hermitian, so A has real eigenvalues λ1 ≤ λ2 ≤ · · · ≤ λnwith an orthonormal set {v1, . . . , vn} of associated eigenvectors: Avj = λjvj,〈vj, vi〉 = vTj vi = v∗i vj = δij.

Let Q = (v1, . . . , vn) be the matrix whose jth column is vj. Then Q∗Q =In and Q∗AQ = Q∗(λ1v1, . . . , λnvn) = Q∗QΛ = Λ = diag(λ1, . . . , λn), and Qis unitary (Q∗ = Q−1).

For 0 6= x ∈ Cn define the Rayleigh Quotient ρA(x) for A by

ρA(x) =〈Ax, x〉〈x, x〉

=x∗Ax

||x||2. (9.24)

Put O = {x ∈ Cn : 〈x, x〉 = 1}, and note that for 0 6= k ∈ C, 0 6= x ∈ Cn,

ρA(kx) = ρA(x). (9.25)

Hence

{ρA(x) : x 6= 0} = {ρA(x) : x ∈ O} = {x∗Ax : x ∈ O}. (9.26)

The set W (A) = {ρA(x) : x ∈ O} is called the numerical range of A.Observe that if x = Qy, then x ∈ O iff y ∈ O. Since W (A) is the continuousimage of a compact connected set, it must be a closed bounded intervalwith a maximum M and a minimum m. Since Q : Cn → Cn : x 7→ Qx isnonsingular, Q maps O to O in a one-to-one and onto manner. Hence

M = maxx∈O{x∗Ax} = maxy∈O{(Qy)∗A(Qy) = y∗Q∗AQy} =

= maxy∈O{y∗Λy} = maxy∈O{n∑j=1

λj|yj|2},

where y = (y1, y2, . . . , yn)T ∈ Cn,∑|yi|2 = 1.

Similarly,

m = minx∈O{ρA(x)} = miny∈O{n∑j=1

λj|yj|2}.

Page 196: Linear Algebra by S. E. Payne

196 CHAPTER 9. OPERATORS ON INNER PRODUCT SPACES

By the ordering of the eigenvalues, for y ∈ O we have

λ1 = λ1

∑|yj|2 ≤

n∑j=1

λj|yj|2 = ρλ(y) ≤ λn

n∑j=1

|yj|2 = λn. (9.27)

Furthermore, with y = (1, 0, . . . , 0)∗ ∈ O and z = (0, . . . , 0, 1)∗ ∈ O, wehave ρΛ(y) = λ1 and ρΛ(z) = λn. Hence we have almost proved the followingfirst approximation to the Rayleigh Principle.

Theorem 9.10.1. Let A be an n × n hermitian matrix with eigenvaluesλ1 ≤ λ2 ≤ · · · ≤ λn. Then for any nonzero x ∈ O,

λ1 ≤ ρA(x) ≤ λn, and (9.28)

λ1 = minx∈OρA(x); λn = maxx∈OρA(x). (9.29)

If 0 6= x ∈ Cn satisfies ρA(x) = λi for either i = 1 or i = n, (9.30)

then x is an eigenvector of A belonging to the eigenvalue λi.

Proof. Clearly Eqs. 9.28 and 9.29 are already proved. So consider Eq. 9.30.Without loss of generality we may assume x ∈ O. Suppose x =

∑nj=1 cjvj,

so that ρA(x) = x∗Ax =(∑n

j=1 cjv∗j

)(∑nj=1 cjλjvj

)=∑n

j=1 λj|cj|2.

Clearly λ1 = λ1

∑nj=1 |cj|2 ≤

∑nj=1 λj|cj|2 with equality iff λj = λ1 when-

ever cj 6= 0. Hence ρA(x) = λ1 iff x belongs to the eigenspace associated withλ1. The argument for λn is similar.

Recall that Q = (v1, . . . , vn), and note that if x = Qy, so y = Q∗x,then x = vi = Qy iff y = Q∗vi = ei. So with the notation x = Qy,y = (y1, . . . , yn)T , we have

〈x, vi〉 = x∗vi = x∗QQ∗vi = y∗ei = yi.

Hence x ⊥ vi iff y = Q∗x satisfies yi = 0.Def. Tj = {x 6= 0 : 〈x, vk〉 = 0 for k = 1, . . . , j} = {v1, . . . , vj}⊥ \ {0}.

Theorem 9.10.2. ρA(x) ≥ λj+1 for all x ∈ Tj, and ρA(x) = λj+1 for somex ∈ Tj iff x is an eigenvector associated with λj+1. Thus

λj+1 = minx∈Tj{ρA(x)} = ρA(vj+1).

Page 197: Linear Algebra by S. E. Payne

9.11. EXERCISES 197

Proof. 0 6= x ∈ Tj iff x = Qy where y =∑n

k=j+1 ykek iff x =∑n

k=j+1 ykvk.Without loss of generality we may assume x ∈ O ∩ Tj. Then y = Q∗x ∈ Oand x ∈ Tj ∩ O iff ρA(x) =

∑nk=j+1 λk|yk|2 ≥ λj+1

∑nk=j+1 |y2

k| ≥ λj+1,with equality iff yk = 0 whenever λk > λj+1. In particular, if y = ej+1, sox = vj+1, ρA(x) = λj+1.

Theorem 9.10.3. Put Sj = {x 6= 0 : 〈x, vk〉 = 0 for k = n, n − 1, . . . , n −(j − 1)}. So Sj = {vn, vn−1, . . . , vn−(j−1)}⊥ \ {~0}. Then ρA(x) ≤ λn−j for allx ∈ Sj, and equality holds iff x is an eigenvector associated with λn−j. Thus

λn−j = maxx∈Sj{ρA(x)} = ρA(vn−j).

Proof. We leave the proof as an exercise for the reader.

The Rayleigh Principle consists of Theorems 9.10.2 and 9.10.3.

9.11 Exercises

1. Show that if T ∈ L(V ), where V is any finite-dimensional inner productspace, and if T is normal, then

(a) Im(T ) = Im(T ∗), and

(b) null(T ) = null(T ∗).

2. Prove that if T ∈ L(V ) is normal, then

null(T k) = null(T ) and Im(T k) = Im(T )

for every positive integer k.

3. Prove that a normal operator on a complex inner product space isself-adjoint if and only if all its eigenvalues are real.

4. Let V be a finite dimensional inner product space over C. SupposeT ∈ L(V ) and U is a subspace of V . Prove that U is invariant underT if and only if U⊥ is invariant under T ∗.

Page 198: Linear Algebra by S. E. Payne

198 CHAPTER 9. OPERATORS ON INNER PRODUCT SPACES

5. Let V be a finite dimensional inner product space over C. Supposethat T is a positive operator on V (called positive semidefinite by someauthors). Prove that T is invertible if and only if

〈Tv, v〉 > 0

for every v ∈ V \ {0}.

6. In this problem Mn(R) denotes the set of all n× n real matrices, andRn denotes the usual real inner product space of all column vectorswith n real entries and inner product (x,y) = xTy = yTx = (y,x).

Define the following subsets of MN(R):

Πn = {A ∈Mn(R) : (Ax,x) > 0 for all 0 6= x ∈ Rn}Sn = {A ∈Mn(R) : AT = A}Σn = Πn ∩ SnKn = {A ∈Mn(R) : AT = −A}

For this problem we say that A ∈ Mn(R) is positive definite if andonly if A ∈ Πn. A is symmetric if and only if it belongs to Sn. It isskew-symmetric if and only if it belongs to Kn.

Note: It is sometimes the case that a real matrix is said to be positivedefinite if and only if it belongs to Sn also, i.e. A ∈ Σn. Rememberthat we do not do that here.

Problem: Let A ∈Mn(R).

(i) Prove that there are unique matrices B ∈ Sn and C ∈ Kn forwhich A = B +C. Here B is called the symmetric part of A andC is called the skew-symmetric part of A.

(ii) Show that A is positive definite if and only if the symmetric partof A is positive definite.

(iii) Let A be symmetric. Show that A is positive definite if and only ifall eigenvalues of A are positive. If this is the case, then det(A) >0.

Page 199: Linear Algebra by S. E. Payne

9.11. EXERCISES 199

(iv) Let ∅ 6= S ⊆ {1, 2, . . . , n}. Let AS denote the submatrix ofA formed by using the rows and columns of A indexed by theelements of S. (So in particular if S = {k}, then AS is the 1 × 1matrix whose entry is the diagonal entry Akk of A.) Prove that ifA is positive definite (but not necessarily symmetric), then AS ispositive definite. Hence conclude that each diagonal entry of A ispositive.

(v) Let A ∈ Σn and for 1 ≤ i ≤ n let Ai denote the principal subma-trix of A defined by using the first i rows and first i columns ofA. Prove that det(Ai) > 0 for each i = 1, 2, . . . , n. (Note: Theconverse is also true, but the proof is a bit more difficult.)

7. A is a real rectangular matrix (possibly square).

(a) Prove that A+ = A−1 when A is nonsingular.

(b) Determine all singular value decompositions of I with U = V .Show that all of them lead to exactly the same pseudoinverse.

(c) The rank of A+ is the same as the rank of A.

(d) If A is self-conjugate, then A+ is self-conjugate.

(e) (cA)+ = 1cA+ for c 6= 0.

(f) (A+)∗ = (A∗)+.

(g) (A+)+ = A.

(h) Show by a counterexample that in general (AB)+ 6= B+A+.

(i) If A is m × r, B is r × n, and both matrices have rank k, then(AB)+ = B+A+.

(j) Suppose that x is m× 1 and y is 1×m. Compute

(a) x+; (b) y+; (c) (xy)+.

8. Let A =

1 −2−1 20 32 5

, ~b =

31−42

.

(a) Find a least-squares solution of A~x = ~b.

Page 200: Linear Algebra by S. E. Payne

200 CHAPTER 9. OPERATORS ON INNER PRODUCT SPACES

(b) Find the orthogonal projection of ~b onto the column space of A.

(c) Compute the least-squares error (in the solution of part (a)).

9. (a) Show that if C ∈Mn(C) is Hermitian and x∗Cx = 0 for all x ∈ Cn,then C = 0.

(b) Show that for any A ∈ Mn(C) there are (unique) Hermitian ma-trices B and C for which A = B + iC.

(c) Show that if x∗Ax is real for all x ∈ Cn, then A is Hermitian.

10. Let V be the vector space over the realsR consisting of the polynomialsin x of degree at most 4 with coefficients in R, and with the usualaddition of vectors (i.e., polynomials) and scalar multiplication.

Let B1 = {1, x, x2, . . . , x4} be the standard ordered basis of V . Let Wbe the vector space of 2 × 3 matrices over R with the usual additionof matrices and scalar multiplication. Let B2 be the ordered basis of

W given as follows: B2 = {v1 =

(1 0 00 0 0

), v2 =

(0 1 00 0 0

), v3 =(

0 0 10 0 0

), v4 =

(0 0 01 0 0

), v5 =

(0 0 00 1 0

), v6 =

(0 0 00 0 1

)}.

Define T : V 7→ W by:

For f = a0 + a1x+ a2x2 + a3x

3 + a4x4, put

T (f) =

(0 a3 a2 + a4

a1 + a0 a0 0

).

Construct the matrix A = [T ]B1,B2 that represents T with respect tothe pair B1,B2 of ordered bases.

11. Show that if A ∈Mn(C) is normal, then A~x = ~0 if and only if A∗~x = ~0.

Use the matrices B =

(0 10 0

)and B∗ =

(0 01 0

)to show that

this “if and only if” does not hold in general.

Page 201: Linear Algebra by S. E. Payne

9.11. EXERCISES 201

12. Let U ∈Mp,m(C).Show that the following are equivalent:

(a)

(Ip UU∗ Im

)is positive definite;

(b) Ip − UU∗ is positive definite;

(c) Im − U∗U is positive definite.

(Hint: consider

(v∗, w∗)

(Ip UU∗ Im

)(vw

)=

= v∗(Ip − UU∗)v + ??? = w∗(Im − U∗U)w + ???)

13. Let A =

2 −1 0−1 2 −10 −1 2

. Show that A is positive definite in three

different ways.

14. Compute a singular value decomposition of A where

A =

(1 + i 1 01− i 0 1

).

15. You may assume that V is a finite-dimensional vector space over F , asubfield of C, say dim(V ) = n. Let T ∈ L(V ). Prove or disprove eachof the following:

(a) V = null(T )⊕ range(T ).

(b) There exists a subspace U of V such that U ∩ null(T ) = {0} andrange(T ) = {T (u) : u ∈ U}.

16. Let R, S, T ∈ L(V ), where V is a complex inner product space.

(i) Suppose that S is an isometry and R is a positive operator suchthat T = SR. Prove that R =

√T ∗T .

Page 202: Linear Algebra by S. E. Payne

202 CHAPTER 9. OPERATORS ON INNER PRODUCT SPACES

(ii) Let σ denote the smallest singular value of T , and let σ∗ denote

the largest singular value of T . Prove that σ ≤∥∥∥T (v)‖v‖

∥∥∥ ≤ σ∗ for

every nonzero v ∈ V .

17. Let A be an n×n matrix over F , where F is either C or R. Show thatthere is a polynomial f(x) ∈ F [x] for which A∗ = f(A) if and only ifA is normal. What can you say about the minimal degree of such apolynomial?

18. Let F be any subfield of C. Let F n be endowed with the standard innerproduct, and let A be a normal, n×n matrix over F . (For example, Fcould be the rational field Q.) The following steps give another viewof normal matrices. Prove the following statements without using thespectral theorem (for C).

(i) null(A) = (col(A))⊥ = null(A∗).

(ii) If A2x = 0, then Ax = 0.

(iii) If f(x) ∈ F [x], then f(A) is normal.

(iv) Suppose f(x), g(x) ∈ F [x] with 1 = gcd(f(x), g(x)). If f(A)x = 0and g(A)y = 0, then 〈x,y〉 = 0.

(v) Suppose p(x) = is the minimal polynomial of A. Then p(x) hasno repeated (irreducible) factors.

19. (a) Prove that a normal operator on a complex inner product spacewith real eigenvalues is self-adjoint.

(b) Let T : V → V be a self-adjoint operator. Is it true that T musthave a cube root? Explain. (A cube root of T is an operatorS : V → V such that S3 = T .)

Page 203: Linear Algebra by S. E. Payne

Chapter 10

Decomposition WRT a LinearOperator

To start this chapter we assume that F is an arbitrary field and let V be ann-dimensional vector space over F . Note that this necessarily means that weare not assuming that V is an inner product space.

10.1 Powers of Operators

First note that if T ∈ L(V ), if k is a nonnegative integer, and if T k(v) = ~0,then T k+1(v) = ~0. Thus null(T k) ⊆ null(T k+1). It follows that

{~0} = null(T 0) ⊆ null(T 1) ⊆ · · · null(T k) ⊆ null(T k+1) ⊆ · · · . (10.1)

Theorem 10.1.1. Let T ∈ L(V ) and suppose that m is a nonnegative integersuch that null(Tm) = null(Tm+1). Then null(Tm) = null(T k) for all k ≥ m.

Proof. Let k be a positive integer. We want to prove that

null(Tm+k) = null(Tm+k+1).

We already know that null(Tm+k) ⊆ null(Tm+k+1). To prove the inclusion inthe other direction, suppose that v ∈ null(Tm+k+1). Then ~0 = Tm+1(T k)(v).So T k(v) ∈ null(Tm+1) = null(Tm), implying that~0 = Tm(T k(v)) = Tm+k(v).Hence null(Tm+k+1) ⊆ null(Tm+k), completing the proof.

203

Page 204: Linear Algebra by S. E. Payne

204 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR

Corollary 10.1.2. If T ∈ L(V ), n = dim(V ), and k is any positive integer,then null(T n) = null(T n+k).

Proof. If the set containments in Eq. 10.1 are strict for as long as possible, ateach stage the dimension of a given null space is at least one more than thedimension of the preceding null space. Since the entire space has dimensionn, at most n + 1 proper containments can occur. Hence by Theorem 10.1.1the Corollary must be true.

Definition: Let T ∈ L(V ) and suppose that λ is an eigenvalue of T .A vector v ∈ V is called a generalized eigenvector of T corresponding to λprovided

(T − λI)j(v) = ~0 for some positive integer j. (10.2)

By taking j = 1 we see that each eigenvector is also a generalized eigen-vector. Also, the set of generalized eigenvectors is a subspace of V . Moreover,by Corollary 10.1.2 (with T replaced by T − λI) we see that

Corollary 10.1.3. The set of generalized eigenvectors corresponding to λ isexactly equal to null[(T − λI)n], where n = dim(V ).

An operator N ∈ L(V ) is said to be nilpotent provided some power of N isthe zero operator. This is equivalent to saying that the minimal polynomialp(x) for N has the form p(x) = xj for some positive integer j, implying thatthe characteristic polynomial for N is xn. Then by the Cayley-HamiltonTheorem we see that Nn is the zero operator.

Now we turn to dealing with images of operators. Let T ∈ L(V ) andk ≥ 0. If w ∈ Im(T k+1), say w = T k+1(v) for some v ∈ V , then w =T k(T (v) ∈ Im(T k). In other words we have

V = Im(T 0) ⊇ ImT 1 ⊇ · · · ⊇ Im(T k) ⊇ Im(T k+1) · · · . (10.3)

Theorem 10.1.4. If T ∈ L(V ), n = dim(V ), and k is any positive integer,then

Im(T n) = Im(T n+k).

Proof. We use the corresponding result already proved for null spaces.

dim(Im(T n+k)) = n− dim(null(T n+k))

= n− dim(null(T n))

= dim(Im(T n)).

Now the proof is easily finished using Eq. 10.3

Page 205: Linear Algebra by S. E. Payne

10.2. THE ALGEBRAIC MULTIPLICITY OF AN EIGENVALUE 205

10.2 The Algebraic Multiplicity of an

Eigenvalue

If the matrix A is upper triangular, so is the matrix xI − A, whose deter-minant is the product of its diagonal elements and also is the characteristicpolynomial of A. So the number of times a given scalar λ appears on thediagonal of A is also the algebraic multiplicity of λ as a root of the character-istic polynomial of A. The next theorem states that the algebraic multiplicityof an eigenvalue is also the dimension of the space of generalized eigenvectorsassociated with that eigenvalue.

Theorem 10.2.1. Let n = dim(V ), let T ∈ L(V ) and let λ ∈ F . Thenfor each basis B of V for which [T ]B is upper triangular, λ appears on thediagonal of [T ]B exactly dim(null[(T − λI)n]) times.

Proof. For notational convenience, we first assume that λ = 0. Once thiscase is handled, the general case is obtained by replacing T with T − λI.The proof is by induction on n, and the theorem is clearly true when n = 1.We assume that n > 1 and that the theorem holds on spaces of dimensionn− 1.

Suppose that B = (v1, . . . , vn) is a basis of V for which [T ]B is the uppertriangular matrix

λ1 ∗. . .

λn−1

0 λn

. (10.4)

Let U = span(v1, . . . , vn−1). Clearly U is invariant under T and thematrix of T |U with respect to the basis (v1, . . . , vn−1) is λ1 ∗

. . .

0 λn−1

. (10.5)

By our induction hypothesis, 0 appears on the diagonal of the matrix inEq. 10.5 exactly dim(null((T |U)n−1))) times. Also, we know that null((T |U)n−1) =null((T |U)n), because dim(U) = n− 1. Hence

0 appears on the diagonal of 10.5 dim(null((T |U)n))) times. (10.6)

Page 206: Linear Algebra by S. E. Payne

206 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR

The proof now breaks into two cases, depending on whether λn = 0 ornot. First consider the case where λn 6= 0. We show in this case that

null(T n) ⊆ U. (10.7)

This will show that null(T n) = null((T |U)n), and hence Eq 10.6 will say that 0appears on the diagonal of Eq. 10.4 exactly dim(null(T n)) times, completingthe proof in the case where λn 6= 0.

It follows from Eq. 10.4 that

[T n]B = ([T ]B)n =

λn1 ∗

. . .

λnn−1

0 λnn

. (10.8)

This shows that

T n(vn) = u+ λnnvn

for some u ∈ U . Suppose that v ∈ null(T n). Then v = u+ avn where u ∈ Uand a ∈ F . Thus

~0 = T n(v) = T n(u) + aT n(vn) = T n(u) + au+ aλnnvn.

Because T n(u) and au are in U and vn 6∈ U , this implies that aλnn = 0. Sinceλn 6= 0, clearly a = 0. Thus v = u ∈ U , completing the proof of Eq. 10.7,and hence finishing the case with λn 6= 0.

Suppose λn = 0. Here we show that

dim(null(T n)) = dim(null((T |U)n)) + 1, (10.9)

which along with Eq. 10.6 will complete the proof when λn = 0.First consider

dim(null(T n)) = dim(U ∩ null(T n) + dim(U + null(T n))− dim(U)

= dim(null((T |U)n)) + dim(U + null(T n))− (n− 1).

Also,

n = dim(V ) ≥ dim(U + null(T n)) ≥ dim(U) = n− 1.

Page 207: Linear Algebra by S. E. Payne

10.2. THE ALGEBRAIC MULTIPLICITY OF AN EIGENVALUE 207

It follows that if we can show that null(T n) contains a vector not in U , thenEq. 10.9 will be established. First note that since λn = 0, we have T (vn) ∈ U ,hence

T n(vn) = T n−1(T (vn)) ∈ Im[(T |U)n−1] = Im[(T |U)n].

This says that there is some u ∈ U for which T n(u) = T n(vn). Then u−vn isnot in U but T n(u−vn) = ~0. Hence Eq. 10.9 holds, completing the proof.

At this point we know that the geometric multiplicity of λ is the dimensionof the null space of T − λI, i.e., the dimension of the eigenspace associatedwith λ, and this is less than or equal to the algebraic multiplicity of λ, which isthe dimension of the null space of (T−λI)n and also equal to the multiplicityof λ as a root of the characteristic polynomial of T , at least in the case thatT is upper triangularizable. This is always true if F is algebraically closed.Moreover, in this case the following corollary is clearly true:

Corollary 10.2.2. If F is algebraically closed, then the sum of the algebraicmultiplicities of of all the eigenvalues of T equals dim(V ).

For any f(x) ∈ F [x], T and f(T ) commute, so that the null space of p(T )is invariant under T .

Theorem 10.2.3. Suppose F is algebraically closed and T ∈ L(V ). Letλ1, . . . , λm be the distinct eigenvalues of T , and let U1, . . . , Um be the corre-sponding subspaces of generalized eigenvectors. Then

(a) V = U1 ⊕ . . .⊕ Um;(b) each Uj is T -invariant;(c) each (T − λjI)|Uj is nilpotent.

Proof. Since Uj = null(T − λjI)n for each j, clearly (b) follows. Clearly(c) follows from the definitions. Also the sum of the multiplicities of theeigenvalues equals dim(V ), i.e.

dim(V ) =m∑j=1

dim(Uj).

Put U = U1 + · · · + Um. Clearly U is invariant under T . Hence we candefine S ∈ L(U) by

S = T |U .

Page 208: Linear Algebra by S. E. Payne

208 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR

It is clear that S has the same eigenvalues, with the same multiplicities,as does T , since all the generalized eigenvectors of T are in U . Then thedimension of U is the sum of the dimensions of the generalized eigenspacesof T , forcing V = U , and V = ⊕

∑mj=1 Uj, completing the proof of (a).

There is another style of proof that offers a somewhat different insight intothis theorem, so we give it also. Suppose that f(λ) = (λ−λ1)k1 · · · (λ−λm)km

is the characteristic polynomial of T . Then by the Cayley-Hamilton theorem(T −λ1I)k1 · · · (T −λmI)km is the zero operator. Put bj(λ) = f(λ)

(λ−λj)kj, for 1 ≤

j ≤ m. Then since gcd(b1(λ), . . . , bm(λ)) = 1 there are polynomials aj(λ) forwhich 1 =

∑j aj(λ)bj(λ), which implies that I =

∑j aj(T )bj(T ). Then for

any vector u ∈ V we have u =∑

j aj(T )bj(T )(u), where aj(T )bj(T )(u) ∈ Uj.It follows that V =

∑j Uj. Since the Uj are linearly independent, we have

V = ⊕∑

j Uj.

By joining bases of the various generalized eigenspaces we obtain a basisfor V conisting of generalized eigenvectors, proving the following corollary.

Corollary 10.2.4. Let F be algebraically closed and T ∈ L(V ). Then thereis a basis of V consisting of generalized eigenvectors of T .

Lemma 10.2.5. Let N be a nilpotent operator on a vector space V over anyfield F . Then there is a basis of V with respect to which the matrix of N hasthe form 0 ∗

. . .

0 0

, (10.10)

Proof. First choose a basis of null(N). Then extend this to a basis ofnull(N2). Then extend this to a basis of null(N3). Continue in this fashionuntil eventually a basis of V is obtained, since V = null(Nm) for sufficientlylarge m. A little thought should make it clear that with respect to this basis,the matrix of N is upper triangular with zeros on the diagonal.

10.3 Elementary Operations

For 1 ≤ i ≤ n, let ei denote the column vector with a 1 in the ith positionand zeros elsewhere. Then eie

Tj is an n×n matrix with all entries other than

the (i, j) entry equal to zero, and with that entry equal to 1.

Page 209: Linear Algebra by S. E. Payne

10.3. ELEMENTARY OPERATIONS 209

LetEij(c) = I + ceie

Tj .

We leave to the reader the exercise of proving the following elementaryresults:

Lemma 10.3.1. The following elementary row and column operations areobtained by pre- and post-multiplying by the elementary matrix Eij(c):

(i) Eij(c)A is obtained by adding c times the jth row of A to the ith row.(ii) AEij(c) is obtained by adding c times the ith column of A to the jth

column.(iii) Eij(−c) is the inverse of the matrix Eij(c).Moreover, if T is upper triangular with diagonal diag(λ1, . . . , λn), and if

λi 6= λj with i < j, then

T ′ = Eij(−c)TEij(c) = Eij(−c)

λ1

. . . ∗λi

. . .

0 λj. . .

Eij(c),

where T ′ is obtained from T by replacing the (i,j) entry tij of T with tij +c(λi − λj). The only other entries of T that can possibly be affected are tothe right of tij or above tij.

Using Lemma 10.3.1 over and over, starting low, working left to right ina given row and moving upward, we can transform T into a direct sum ofblocks

P−1TP =

A1

A2

. . .

Ar

,

where each block Ai is upper triangular with each diagonal element equal toλi, 1 ≤ i ≤ r.

Our next goal is to find an invertible matrix Ui that transforms the blockAi into a matrix U−1

i AiUi = λiI + N where N is nilpotent with a special

Page 210: Linear Algebra by S. E. Payne

210 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR

form. All entries on or below the main diagonal are 0, all entries immediatelyabove the diagonal are 0 or 1, and all entries further above the diagonal are0. This matrix λiI + N is called a Jordan block. We want to arrange it sothat it is a direct sum of elementary Jordan blocks. These are matrices of theform λiI +N where each entry N just above the diagonal is 1 and all otherelements of N are 0. Then we want to use block multiplication to transformP−1TP into a direct sum of elementary Jordan blocks.

10.4 Transforming Nilpotent Matrices

Let B be n × n, complex and nilpotent of order p: Bp−1 6= 0 = Bp. ByN(Bj) we mean the right null space of the matrix Bj. Note that if j > i,then N(Bi) ⊆ N(Bj). We showed above that if N(Bi) = N(Bi+1), thenN(Bi) = N(Bk) for all k ≥ k.

Step 1.

Lemma 10.4.1. Let W = (w1, . . . , wr) be an independent list of vectors inN(Bj+1) with 〈W 〉 ∩ N(Bj) = {0}. Then BW = (Bw1, . . . , Bwr) is anindependent list in N(Bj) with 〈BW 〉 ∩N(Bj−1) = {0}.Proof. Clearly BW ⊆ N(Bj). Suppose

∑ri=1 ciBwi + uj−1 = 0, with uj−1 ∈

N(Bj−1). Then 0 = Bj−1(∑r

i=1 ciBwi + uj−1) =∑r

i=1 ciBjwi + 0. This says∑r

i=1 ciwi ∈ N(Bj), so by hypothesis c1 = c2 = · · · = cr = 0, and hence alsouj−1 = 0. It is now easy to see that the Lemma must be true.

Before proceeding to the next step, we introduce some new notation.Suppose V1 is a subspace of V . To say that the list (w1, . . . , wr) is independentin V \V1 means first that it is independent, and second that if W is the span〈w1, . . . , wr〉 of the given list, then W ∩ V1 = {0}. To say that the list(w1, . . . , wr) is a basis of V \ V1 means that a basis of V1 adjoined to the list(w1, . . . , wr) is a basis of V . Also, 〈L〉 denotes the space spanned by the listL.

Step 2. Let Up be a basis of N(Bp) \ N(Bp−1) = Cn \ N(Bp−1), sinceBp = 0. By Lemma 10.4.1 BUp is an independent list in N(Bp−1)\N(Bp−2),with 〈BUp〉 ∩N(Bp−2) = {0}.

Complete BUp to a basis (BUp, Up−1) of N(Bp−1) \ N(Bp−2). At thispoint we have that

(BUp, Up−1, Up) is a basis of N(Bp) \N(Bp−2) = Cn \N(Bp−2).

Page 211: Linear Algebra by S. E. Payne

10.4. TRANSFORMING NILPOTENT MATRICES 211

Step 3. (B2Up, BUp−1) is an independent list in N(Bp−2) \ N(Bp−3).Complete this to a basis (B2Up, BUp−1, Up−2) of N(Bp−2)\N(Bp−3). At thisstage we have that

(Up, BUp, Up−1, B2Up, BUp−1, Up−2) is a basis of N(Bp) \N(Bp−3).

Step 4. Proceed in this way until a basis for the entire space V has beenobtained. At that point we will have a situation described in the followingarray:

Independent set basis for subspace

(Up) N(Bp) \N(Bp−1) = Cn \N(Bp−1)

(Up−1, BUp) N(Bp−1) \N(Bp−2)

(Up−2, BUp−1, B2Up) N(Bp−2) \N(Bp−3) (10.11)

(Up−3, BUp−2, B2Up−1, B

3Up) N(Bp−3) \N(Bp−4)...

...

(U1, BU2, . . . , Bp−2Up−1, B

p−1Up) N(Bp−(p−1)) \N(B0) = N(B)

Choose x = x1 ∈ Bp−jUp−j+1 and follow it back up to the top of thatcolumn: xp−j ∈ Up−j; Bxp−j = xp−j−1; . . . ; Bx2 = x1. So Bp−j−1xp−j = x1.

We now want to interpret what it means for

P−1BP = H =

H1

. . .

Hs

with Hi =

0 1 0

0 1. . .

10

.

This last matrix is supposed to have 1’s along the superdiagonal and 0’selsewhere. This says that the jth column of BP (which is B times the jthcolumn of P ) equals the jth column of PH, which is either the zero columnor the (j − 1)st column of P . So we need

B(jth column of P ) = 0 or the (j − 1)st column of B.

Page 212: Linear Algebra by S. E. Payne

212 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR

So to form P , as the columns of P we take the vectors in the Array 10.11starting at the bottom of a column, moving up to the top, then from thebottom to the top of the next column to the left, etc. Each column ofArray 10.11 represents one block Hi, and the bottom row consists of a basisfor the null space of B. Then we do have

P−1BP = H =

H1

. . .

Hs

.

Suppose we have a matrix A = λI + B where B is nilpotent with allelements on or below the diagonal equal to 0. Construct P so that P−1BP =H, i.e., P−1AP = P−1(B + λI)P = λI + H is a direct sum of elementaryJordan blocks each with the same eigenvalue λ along the diagonal and 1’sjust above the diagonal. Such a matrix is said to be in Jordan form.

Start with a general n×n matrix A over C. Here is the general algorithmfor finding a Jordan form for A.

1. Find P1 so that P−11 AP1 = T is in upper triangular Schur form, so

T =

λ1

. . .

λ1 ∗λ2

. . .

λ2

0. . .

λr. . .

λr

.

2. Find P2 so that

P−12 TP2 =

Tλ1

Tλ2 0. . .

0 Tλr

= C.

Page 213: Linear Algebra by S. E. Payne

10.4. TRANSFORMING NILPOTENT MATRICES 213

3. Find P3 so that P−13 CP3 = D +H where

(P1P2P3)−1 · A · (P1P2P3) = D +H is in Jordan form.

An n×n elementary Jordan block is a matrix of the form Jn(λ) = λI+Nn,where λ ∈ C, Nn is the n×n nilpotent matrix with 1’s along the superdiago-nal and 0’s elsewhere. The minimal and characteristic polynomials of Jn(λ)are both equal to (x− λ)n, and Jn has a 1-dimensional space of eigenvectorsspanned by (1, 0, . . . , 0)T and an n-dimensional space of generalized eigen-vectors, all belonging to the eigenvalue λ. If A has a Jordan form consistingof a direct sum of s elementary Jordan blocks, the characteristic polynomi-als of these blocks are the elementary divisors of A. The characteristicpolynomial of A is the product of its elementary divisors. If λ1, . . . , λs arethe distinct eigenvalues of A, and if for each i, 1 ≤ i ≤ s, (x − λi)mi is thelargest elementary divisor involving the eigenvalue λi, then

∏si=1(x − λi)mi

is the minimal polynomial for A. It is clear that (x− λ) divides the minimalpolynomial of A if and only if it divides the characteristic polynomial of A ifand only if λ is an eigenvalue of A.

At this point we consider an example in great detail. In order to minimizethe tedium of working through many computations, we start with a block-upper triangluar matrix N with square nilpotent matrices along the maindiagonal, so it is clear from the start that N is nilpotent. Let N = (Pij)where the blocks Pij are given by

P11 =

0 1 0 00 0 1 00 0 0 10 0 0 0

P22 = P33 =

0 1 00 0 10 0 0

P44 =

(0 10 0

)P55 = (0)

These diagonal blocks determine the sizes of each of the other blocks, andall blocks Pij with i > j are zero blocks. For the blocks above the diagonal

Page 214: Linear Algebra by S. E. Payne

214 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR

we have the following.

P12 =

−1 1 0−1 1 0−1 1 00 1 0

; P13 =

−1 1 0−1 1 0−1 1 0−1 1 0

;

P14 =

−1 1−1 1−1 1−1 1

; P15 =

−1−1−1−1

;

P23 =

−1 1 0−1 1 00 1 0

; P24 =

−1 1−1 1−1 1

;

P25 = P35 =

−1−1−1

; P34 =

−1 1−1 10 1

;

P45 =

(−10

).

The reader might want to write out the matrix N as a 13 × 13 matrixto visualize it better, but for computing N2 using block multiplication theabove blocks suffice. We give N2 as an aid to computing its null space.

N2 =

0 0 1 0 −1 0 1 −1 0 1 −1 0 00 0 0 1 −1 0 1 −1 0 1 −1 0 00 0 0 0 0 0 1 −1 0 1 −1 0 00 0 0 0 0 0 1 −1 0 1 −1 0 00 0 0 0 0 0 1 −1 0 1 −1 0 00 0 0 0 0 0 0 0 0 1 −1 0 00 0 0 0 0 0 0 0 0 1 −1 0 00 0 0 0 0 0 0 0 0 1 −1 0 00 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0

Another routine computation shows that N3 has first row equal to

(0 0 0 1 − 1 0 0 0 0 0 0 0 0)

Page 215: Linear Algebra by S. E. Payne

10.4. TRANSFORMING NILPOTENT MATRICES 215

and all other entries are zeros.

Now let x = (x1, x2, . . . , x13)T and write out the conditions for x to be inthe null space of N j for j = 1, 2, 3, 4.

The null space of N4 is R13. So N4 has rank 0 and nullity 13.

x is in the null space of N3 if and only if x4 = x5. N3 has rank 1 andnullity 12.

x is in the null space of N2 if and only if x3 = x4 = x5, x7 = x8 andx10 = x11. N2 has rank 4 and nullity 9.

x is in the null space of N if and only if x2 = x3 = x4 = x5, x6 = x7 = x8,x9 = x10 = x11 and x12 = x13. N has rank 8 and nullity 5.

Now let S = (e1, e2, . . . , e13) be the standard basis of R13. U4 is to be abasis of null(N4) \ null(N3) and must have one element since the nullity ofN4 is one greater than the nullity of N3. Put U4 = {e1 + e2 + e3 + e4}. It iseasy to see that this vector is not in the null space of N3.

For the next step first compute N(e1 + e2 + e3 + e4) = e1 + e2 + e3.This vector is in null(N3) \ null(N2). Since the nullity of N3 is three morethan the nullity of N2, we need two more vectors in N3 \ N2. We proposee1 + e2 + · · ·+ e7 and e1 + · · ·+ e10. Then we would have

{e1 + · · ·+ e10, e1 + · · ·+ e7, e1 + e2 + e3}

as a basis of null(N3) \ null(N2). This is fairly easy to check. The next stepis to compute the image of each of these three vectors under N . We get thethree vectors

{e1 + · · ·+ e9, e1 + · · ·+ e6, e1 + e2} .

Since the nullity of N2 is four more than the nullity of N , we need oneadditional vector to have a basis of null(N2)\null(N). We propose e1 + · · ·+e12. Then it is routine to show that

{e1 + · · ·+ e12, e1 + · · · e9, e1 + · · ·+ e6, e1 + e2}

is a basis of null(N2) \ null(N).We next compute the image of each of these four vectors under N and

get{e1 + · · ·+ e11, e1 + · · ·+ e8, e1 + · · ·+ e5, e1}

Page 216: Linear Algebra by S. E. Payne

216 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR

in the null space of N . Since the nullity of N is 5, we need one more vector.We propose e1 + · · ·+ e13. So then we have

{e1 + · · ·+ e13, e1 + · · ·+ e11, e1 + · · ·+ e8, e1 + · · ·+ e5, e1}

as a basis for the null space of N .

At this point we are ready to form the chains of generalized eigenvectorsinto a complete Jordan basis. The first chain is:

v1 = e1, v2 = e1 + e2, v3 = e1 + e2 + e3, v4 = e1 + e2 + e3 + e4.

The second chain is

v5 = e1 + · · ·+ e5, v6 = e1 + · · ·+ e6, v7 = e1 + · · ·+ e7.

The third chain is

v8 = e1 + · · ·+ e8, v9 = e1 + · · ·+ e9, v10 = e1 + · · ·+ e10.

The fourth chain is

v11 = e1 + · · ·+ e11, v12 = e1 + · · ·+ e12.

The fifth chain isv13 = e1 + · · ·+ e13.

With the basis B = (v1, v2, . . . , v13) forming the columns of a matrixP , we have that P−1NP is in Jordan form with elementary Jordan blocks ofsizes 4, 3, 3, 2 and 1.

10.5 An Alternative Approach to the Jordan

Form

Let F be an algebraically closed field. Suppose A is n × n over F withcharacteristic polynomial f(λ) = |λI − A| = (λ − λ1)m1(λ − λ2)m2 · · · (λ −λq)

mq , where λi 6= λj if i 6= j, and each mi ≥ 1. Let TA ∈ L(F n) be definedas usual by TA : F n → F n : x 7→ Ax. We have just seen that there is abasis B of F n for which [TA]B is in Jordan form. This was accomplished

Page 217: Linear Algebra by S. E. Payne

10.5. AN ALTERNATIVE APPROACH TO THE JORDAN FORM 217

by finding a basis B which is the union of bases, each corresponding to anelementary Jordan block. If Bi corresponds to the k × k elementary Jordanblock J = λiI + Nk, then Bi = (v1, v2, . . . , vk) where Avj = λivj + vj−1 for2 ≤ j ≤ k, and Av1 = λiv1. We call (v1, . . . , vk) a k-chain of generalizedeigenvectors for the eigenvalue λi.

If P is the n× n matrix whose coluns are the vectors in B arranged intochains, with all the chains associated with a given eigenvalue being groupedtogether, then P−1AP = J1 ⊕ J2 ⊕ · · · ⊕ Jq, where Ji is the direct sumof the elementary Jordan blocks associated with λi. It is easy to see thatP−1AP = D + N , where D is a diagonal matrix whose diagonal entriesare λ1, . . . , λq, with each λi repeated mi times, and N is nilpotent. For anelementary block λI +N it is clear that λI ·N = N · λI = λN , from whichit follows that also DN = ND. So

Lemma 10.5.1. A = PDP−1 + PNP−1, where PDP−1 is diagonalizable,PNP−1 is nilpotent and PDP−1 commutes with PNP−1.

Write the partial fraction decomposition of 1/f(λ) as

1

f(λ)=

a1(λ)

(λ− λ1)m1+ · · ·+ aq(λ)

(λ− λq)mq, (10.12)

where ai(λ) is a polynomial in λ with degree at most mi − 1.

Put

bi(λ) =f(λ)

(λ− λi)mi=∏j 6=i

(λ− λj)mj . (10.13)

Then put

Pi = ai(A) · bi(A) = ai(A) ·∏j 6=i

(A− λjI)mj . (10.14)

Since f(λ) divides bi(λ)bj(λ) when i 6= j, we have

PiPj = 0, if i 6= j. (10.15)

Since 1 =∑q

i=1 ai(λ)bi(λ), we have

I =

q∑i=1

ai(A)bi(A) = P1 + · · ·+ Pq. (10.16)

Page 218: Linear Algebra by S. E. Payne

218 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR

Then Pi = PiI = Pi(P1 + · · · + Pq) = P 2i , so we can combine these last

two results asPiPj = δijPi. (10.17)

Then using Pmii = Pi we have

[Pi(A− λiI)]mi = Pi(A− λiI)mi = ai(A)bi(A)(A− λiI)mi = ai(A)f(A) = 0.(10.18)

Hence if we put Ni = Pi(A− λiI), then if i 6= j

NiNj = 0 and Nmii = 0. (10.19)

Note that Pi and Ni are both polynomials in A.Start with I = P1 + · · ·+ Pq. Multiply by A on the right.

A = P1A+ P2A+ · · ·+ PqA

= P1(λ1I + A− λ1I) + · · ·+ Pq(λqI + A− λqI)

=

q∑i=1

(λiPi + Pi(A− λiI))

=

q∑i=1

λiPi +

q∑i=1

Ni = D +N, (10.20)

where D =∑q

i=1 λiPi and N =∑q

i=1 Ni. Clearly D and N are polynomialsin A, so they commute with each other and with A, and N is nilpotent.Starting with the observation that D2 = (

∑qi=1 λiPi)

2 =∑q

i=1 λ2iPi it is easy

to see that for any polynomial g(λ) ∈ F [λ] we have

g(D) =

q∑i=1

g(λi)Pi. (10.21)

It then follows that if g(λ) = (λ−λ1)(λ−λ2) · · · (λ−λq), then g(D) = 0.This says the minimal polynomial of D has no repeated roots, so that D isdiagonalizable.

Theorem 10.5.2. Let A be n×n over F and suppose the minimal polynomialfor A factors into linear factors. Then there is a diagonalizable matrix D anda nilpotent matrix N such that

(i) A = D +N ; and(ii) DN = ND.The diagonalizable matrix D and the nilpotent matrix N are uniquely

determined by (i) and (ii) and each of them is a polynomial in A.

Page 219: Linear Algebra by S. E. Payne

10.6. A “JORDAN FORM” FOR REAL MATRICES 219

Proof. We have just observed that we can write A = D + N where D isdiagonalizable and N is nilpotent, and where D and N not only commutebut are polynomials in A. Suppose that we also have A = D′ +N ′ where D′

is diagonalizable and N ′ is nilpotent, and D′N ′ = N ′D′. (Recall that thiswas the case in Lemma 10.5.1.) Since D′ and N ′ commute with one anotherand A = D′ +N ′, they commute with A, and hence with any polynomial inA, especially with D and N . Thus D and D′ are commuting diagonalizablematrices, so by Theorem 7.4.4 they are simultaneously diagonalizable. Thismeans there is some invertible matrix P with P−1DP and P−1D′P bothdiagonal and, of course, P−1NP and P−1N ′P are both nilpotent. FromA = D + N = D′ + N ′ we have D − D′ = N ′ − N . Then P−1(D − D′)Pis diagonal and equals P−1(N ′ −N)P , which is nilpotent (Why?), implyingP−1(D − D′)P is both diagonal and nilpotent, forcing P−1(D − D′)P = 0.Hence D = D′ and N = N ′.

10.6 A “Jordan Form” for Real Matrices

Theorem 10.6.1. Let V be a real vector space and T ∈ L(V ). Then thereis a basis B of V for which

[T ]B =

A1 ∗. . .

0 Am

,

where each Aj is a 1-by-1 matrix or a 2-by-2 matrix with no eigenvalues.

Proof. The result is clearly true if n = dim(V ) = 1, so suppose n = 2. IfT has an eigenvalue λ, let v1 be any nonzero eigenvector of T belonging toλ. Extend (v1) to a basis (v1, v2) of V . With respect to this basis T has anupper triangular matrix of the form(

λ a0 b

).

In particular, if T has an eigenvalue, then there is a basis of V with respectto which T has an upper triangular matrix. If T has no eigenvalues, thenchoose any basis (v1, v2) of V . With respect to this basis, the matrix of T has

Page 220: Linear Algebra by S. E. Payne

220 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR

no eigenvalues. So we have the desired result when n = 2. Now suppose thatn = dim(V ) > 2 and that the desired result holds for all real vector spaceswith smaller dimension. If T has an eigenvalue, let U be a 1-dimensionalsubspace of V that is T -invariant. Otherwise, by Theorem 7.3.1 let U be a2-dimensional T -invariant subspace of V . Choose any basis of U and let A1

denote the matrix of T |U with respect to this basis. If A1 is a 2-by-2 matrixthen T has no eigenvalues, since otherwise we would have chosen U to be1-dimensional. Hence T |U and A1 have no eigenvalues.

Let W be any subspace of V for which V = U ⊕W . We would like toapply the induction hypothesis to the subspace W . Unfortunately, W mightnot be T -invariant, so we have to be a little tricky. Define S ∈ L(W ) by

S(w) = PW,U(T (w)) ∀w ∈ W.

Note that

T (w) = PU,W (T (w)) + PW,U(T (w)) (10.22)

= PU,W (T (w)) + S(w) ∀w ∈ W.

By our induction hypothesis, there is a basis for W with respect to whichS has a block upper triangular matrix of the form S2 ∗

. . .

0 Am

,

where each Aj is a 1-by-1 matrix or a 2-by-2 matrix with no eigenvalues.Adjoin this basis of W to the basis of U chosen above, getting a basis ofV . The corresponding matrix of T is a block upper triangular matrix of thedesired form.

Our definition of the characteristic polynomial of the 2 × 2 matrix A =(a cb d

)is that it is

f(x) = (x− a)(x− d)− bc = x2 − (trace(A))x+ det(A).

The fact that this is a reasonable definition (the only reasonable definitionif we want the Cayley-Hamilton theorem to be valid for such matrices) followsfrom the next result.

Page 221: Linear Algebra by S. E. Payne

10.6. A “JORDAN FORM” FOR REAL MATRICES 221

Theorem 10.6.2. Suppose V is a real vector space with dimension 2 andT ∈ L(V ) has no eigenvalues. Suppose A is the matrix of T with respect tosome basis. Let p(x) ∈ R[x] be a polynomial of degree 2.

(a) If p(x) equals the characteristic polynomial of A, then p(T ) = 0.(b) If p(x) does not equal the characteristic polynomial of A, then p(T )

is invertible.

Proof. Part (a) follows immediately from the Cayley-Hamilton theorem, butit is also easy to derive it independently of that result. For part (b), letq(x) denote the characteristic polynomial of A with p(x) 6= q(x). Writep(x) = x2 + α1x + β1 and q(x) = x2 + α2x + β2 for some α1, α2, β1, β2 ∈ R.Now

p(T ) = p(T )− q(T ) = (α1 − α2)T + (β1 − β2)I.

If α1 = α2, then β1 6= β2, since otherwise p = q. In this case p(T ) is somemultiple of the identity and hence is invertible. If α1 6= α2, then

p(T ) = (α1 − α2)(T − β2 − β1)

α1 − α2

I),

which is an invertible operator since T has no eigenvalues. Thus (b) holds.

Now suppose V is 1-dimensional and T ∈ L(V ). For λ ∈ R, null(T −λI)equals V if λ is an eigenvalue of T and {0} otherwise. If α, β ∈ R withα2 < 4β, so that x2 + αx+ β = 0 has no real roots, then

null(T 2 + αT + βI) = {0}.

(Proof: Because V is 1-dimensional, there is a constant λ ∈ R such thatTv = λv for all v ∈ V . (Why is this true?) So if v is a nonzero vector in V ,then (T 2 + αT + βI)v = (λ2 + αλ + β)v. The only way this can be 0 is forv = 0 or (λ2 + αλ + β) = 0. But this second equality cannot hold becausewe are assuming that α2 < 4β. Hence null(T 2 + αT + βI) = {0}.)

Now suppose that V is a 2-dimensional real vector space and that T ∈L(V ) still has no eigenvalues. For λ ∈ R, null(T−λI) = {0} exactly becauseT has no eigenvalues. If α, β ∈ R with α2 < 4β, then null(T 2 + αT + βI)equals all of V if x2+αx+β is the characteristic polynomial of T with respectto some (and hence to all) ordered bases of V , and equals {0} otherwise by

Page 222: Linear Algebra by S. E. Payne

222 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR

part (b) of Theorem 10.6.2. It is important to note that in this case the nullspace of T 2 + αT + βI is either {0} or the whole space 2-dimensional space!

The goal of this section is to prove the following theorem.

Theorem 10.6.3. Suppose that V is a real vector space of dimension n andT ∈ L(V ). Suppose that with respect to some basis of V , the matrix of T hasthe form

A =

A1 ∗. . .

0 Am

, (10.23)

where each Aj is a 1× 1 matrix or a 2× 2 matrix with no eigenvalues (as inTheorem 9.4).

(a) If λ ∈ R, then precisely dim(null((T−λI)n)) of the matrices A1, . . . , Amequal the 1× 1 matrix [λ].

(b) If α, β ∈ R satisfy α2 < 4β, then precisely

dim(null(T 2 + αT + βI)n)

2

of the matrices A1, . . . , Am have characteristic polynomial equal to x2+αx+b.

Note that this implies that null((T 2 +αT +βI)n) must have even dimen-sion.

Proof. With a little care we can construct one proof that can be used to proveboth (a) and (b). For this, let λ, α, β ∈ R with α2 < 4β. Define p(x) ∈ R[x]by

p(x) =

{x− λ, if we are trying to prove (a);x2 + αx+ β, if we are trying to prove (b).

Let d denote the degree of p(x). Thus d = 1 or d = 2, depending onwhether we are trying to prove (a) or (b).

The basic idea of the proof is to proceed by induction on m, the numberof blocks along the diagonal in Eq. 10.23. If m = 1, then dim(V ) = 1 ordim(V ) = 2. In this case the discussion preceding this theorem implies thatthe desired result holds. Our induction hypothesis is that for m > 1, thedesired result holds when m is replaced with m− 1.

Page 223: Linear Algebra by S. E. Payne

10.6. A “JORDAN FORM” FOR REAL MATRICES 223

Let B be a basis of V with respect to which T has the block upper-triangular matrix of Eq. 10.23. Let Uj denote the span of the basis vectorscorresponding to Aj. So dim (Uj) = 1 if Aj is 1 × 1 and dim(Uj) = 2 if Ajis a 2× 2 matrix (with no eigenvalues). Let

U = U1 + U2 + · · ·+ Um−1 = U1 ⊕ U2 ⊕ · · · ⊕ Um−1.

Clearly U is invariant under T and the matrix of T |U with respect to thebasis B′ obtained from the basis vectors corresponding to A1, . . . , Am−1 is

[T |U ]B′ =

A1 ∗. . .

0 Am−1

, (10.24)

Suppose that dim(U) = n′, so n′ is either n− 1 or n− 2. Also,dim(null(p(T |U)n

′)) = dim(null(p(T |U)n)). Hence our induction hypothesis

implies that

Precisely 1d

dim(null(p(T |U))n) of the matricesA1, . . . , Am−1 have characteristic polynomial p.

(10.25)

Let um be a vector in Um. T (um) might not be in Um, since the entries ofthe matrix A in the columns above the matrix Am might not all be 0. Butwe can project T (um) onto Um. So let S ∈ L(Um) be the operator whosematrix with respect to the basis corresponding to Um is Am. It follows thatS(um) = PUm,UT (um). Since V = U⊕Um, we know that for any vector v ∈ Vwe have v = PU,Um(v) + PUm,U(v). Putting T (um) in place of v, we have

T (um) = PU,UmT (um) + PUm,UT (um) (10.26)

= ∗u+ S(um),

where ∗u denotes an unknown vector in U . (Each time the symbol ∗u is used,it denotes some vector in U , but it might be a different vector each time thesymbol is used.) Since S(um) ∈ Um, so T (S(um)) = ∗u + S2(um), we canapply T to both sides of Eq. 10.26 to obtain

T 2(um) = ∗u+ S2(um). (10.27)

(It is important to keep in mind that each time the symbol ∗u is used, itprobably means a different vector in U .)

Page 224: Linear Algebra by S. E. Payne

224 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR

Using equations Eqs. 10.26 and 10.27 it is easy to show that

p(T )(um) = ∗u+ p(S)um. (10.28)

Note that p(S)(um) ∈ Um. Thus iterating the last equation gives

p(T )n(um) = ∗u+ p(S)n(um). (10.29)

Since V = U⊕Um, for any v ∈ V , we can write v = ∗u+um, with ∗u ∈ Uand um ∈ Um. Then using Eq. 10.29 and the fact that U is invariant underany polynomial in T , we have

p(T )n(v) = ∗u+ p(S)n(um), (10.30)

where v = ∗u + um as above. If v ∈ null(p(T )n), we now see that 0 =∗u+ P (S)n(um), from which it follows that P (S)n(um) = 0.

The proof now breaks into two cases: Case 1 is where p(x) is not thecharacteristic polynomial of Am; Case 2 is where p(x) is the characteristicpolynomial of Am.

So consider Case 1. Since p(x) is not the characteristic polynomial of Am,we see that p(S) must be invertible. This follows from Theorem 10.6.3 andthe discussion immediately following, since the dimension of Um is at most2. Hence P (S)n(um) = 0 implies um = 0. This says:

The null space of P (T )n is contained in U. (10.31)

This says that

null(p(T )n) = null(p(T |U)n).

But now we can apply Eq. 10.25 to see that precisely(

1d

)dim(null(p(T )n)

of the matrices A1, . . . , Am−1 have characteristic polynmial p(x). But thismeans that precisely

(1d

)dim(null(p(T )n) of the matrices A1, . . . , Am have

characteristic polynomial p(x). This completes Case 1. Now suppose thatp(x) is the characteristic polynomial of Am. It is clear that dim(Um) = d.

Lemma 10.6.4. We claim that

dim(null(p(T )n)) = dim(null(p(T |U)n) + d.

Page 225: Linear Algebra by S. E. Payne

10.6. A “JORDAN FORM” FOR REAL MATRICES 225

This along with the induction hypothesis Eq. 10.25 would complete theproof of the theorem.

But we still have some work to do.

Lemma 10.6.5. V = U + null(p(T )n).

Proof. Because the characteristic polynomial of the matrix Am of S equalsp(x), we have p(S) = 0. So if um ∈ Um, from Eq. 10.26 we see thatp(T )(um) ∈ U . So

p(T )n(um) = p(T )n−1(p(T )(um)) ∈ range(p(T |U)n−1) = range(p(T |U)n),

where the last identity follows from the fact that dim(U) < n. Thus we canchoose u ∈ U such that p(T )n(um) = p(T )n(u). Then

p(T )n(um − u) = p(T )n(um)− p(T )n(u)

= p(T |U)n(u)− p(T |U)n(u) (10.32)

= 0.

This says that um−u ∈ null(p(T )n). Hence um, which equals u+(um−u),is in U + null(p(T )n), implying Um ⊆ U + null(p(t)n). Therefore V =U + null(p(T )n) ⊆ V . This proves the lemma 10.6.5.

Since dim(Um) = d and dim(U) = n− d, we have

dim(null(p(T )n) = dim(U ∩ null(p(T )n) + dim(U + null(p(T )n)− dimU

= dim(null(p(T |U)n)) + dim(V )− (n− d) (10.33)

= dim(null(p(T |U)n)) + d. (10.34)

This completes a proof of the claim in Lemma 10.6.4, and hence a proofof Theorem 10.6.3

Suppose V is a real vector space and T ∈ L(V ). An ordered pair (α, β)of real numbers is called an eigenpair of T if α2 < 4β and

T 2 + αT + βI

is not injective. The previous theorem shows that T can have only finitelymany eigenpairs, because each eigenpair correpsponds to the characteristic

Page 226: Linear Algebra by S. E. Payne

226 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR

polynomial of a 2× 2 matrix on the diagonal of A in Eq. 10.23, and there isroom for only finitely many such matrices along that diagonal.

We define the multiplicity of an eigenpair (α, β) of T to be

dim(null(

(T 2 + αT + βI)dim(V ))

)

2.

From Theorem 10.5.3 we see that the multiplicity of (α, β) equals thenumber of times that x2 + αx+ β is the characteristic polynomial of a 2× 2matrix on the diagonal of A in Eq. 10.24.

Theorem 10.6.6. If V is a real vector space and T ∈ L(V ), then the sumof the multiplicities of all the eigenvalues of T plus the sum of twice themultiplicities of all the eigenpairs of T equals dim(V ).

Proof. There is a basis of V with respect to which the matrix of T is asin Theorem 10.5.3. The multiplicity of an eigenvalue λ equals the numberof times the 1 × 1 matrix [λ] appears on the diagonal of this matrix (from10.5.3). The multiplicity of an eigenpair (α, β) equal the number of timesx2 +αx+β is the characteristic polynomial of a 2×2 matirx on the diagonalof this matrix (from 10.5.3). Because the diagonal of this matrix has lengthdim(V ), the sum of the multiplicities of all the eigenvalues of T plus the sumof twice the multiplicities of all the eigenpairs of T must equal dim(V ).

Axler’s approach to the characteristic polynomial of a real matrix is todefine them for matrices of sizes 1 and 2, and then define the characteristicpolynomial of a real matrix A as follows. First find the matrix J = P−1APthat is the Jordan form of A. Then the characteristic polynomial of A isthe product of the characteristic polynomials of the 1× 1 and 2× 2 matricesalong the diagonal of J . Then he gives a fairly involved proof (page 207)that the Cayley-Hamilton theorem holds, i.e., the characteristic polynomialof a matrix A has A as a zero. So the minimal polynomial of A divides thecharacteristic polynomial of A.

Theorem 10.6.7. Suppose V is a real, n-dimensional vector space and T ∈L(V ). Let λ1, . . . , λm be the distinct eigenvalues of T , with U1, . . . , Um thecorresponding spaces of generalized eigenvectors. So Uj = null(T − λjI)n.Let (α1, β1), . . . , (αr, βr) be the distinct eigenpairs of T . Let Vj = null(T 2 +αjT + βj)

n, for 1 ≤ j ≤ r. Then

Page 227: Linear Algebra by S. E. Payne

10.7. EXERCISES 227

(a) V = U1 ⊕ · · · ⊕ Um ⊕ V1 ⊕ · · · ⊕ Vr.(b) Each Ui and each Vj are invariant under T .(c) Each (T − λiI)|Ui and each (T 2 + αjT + βjI)|Vj are nilpotent.

10.7 Exercises

1. Prove or disprove: Two n×n real matrices with the same characteristicpolynomials and the same minimal polynomials must be similar.

2. Prove: Let A be an n × n idempotent matrix (i.e., A2 = A) with realentries (or entries from ANY field). Prove that A must be diagonaliz-able.

3. Suppose A is a block upper-triangular matrix

A =

A1 ∗. . .

0 Am

,

where each Aj is a square matrix. Prove that the set of eigenvalues ofA equals the union of the sets of eigenvalues of A1, . . . , Am.

4. Suppose V is a real vector space and T ∈ L(V ). Suppose α, β ∈ R aresuch that α2 < 4β. Prove that

null(T 2 + αT + βI)k

has even dimension for every positive integer k.

5. Suppose V is a real vector space and T ∈ L(V ). Suppose α, β ∈ R aresuch that α2 < 4β and T 2 + αT + βI is nilpotent. Prove that dim(V )is even and

(T 2 + αT + βI)dim(V )/2 = 0.

6. Let a and b be arbitrary complex numbers with b 6= 0. Put A =(a b−b a

). Define a map TA on M2(C) by

TA : M2(C)→M2(C) : B 7→ AB −BA.

Page 228: Linear Algebra by S. E. Payne

228 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR

(i) Show that TA ∈ L(Mn(C)), i.e., show that TA is linear.

(ii) Let B be the ordered basis of M2(C) given by

B = {E11, E12, E21, E22} ,

where Eij has a 1 in position (i, j) and zeros elsewhere. Computethe matrix [TA]B that represents TA with respect to the basis B.

(iii) Compute the characteristic polynomial of the linear operator TAand determine its eigenvalues.

(iv) Compute a basis of each eigenspace of TA.

(v) Give a basis for the range of TA.

(vi) Give the Jordan form of the matrix [TA]B and explain why yourform is correct.

7. Define T ∈ L(C3) by T : (x1, x2, x3) 7→ (x3, 0, 0).

(a) Find all eigenvalues and correspopnding eigenvectors of T .

(b) Find all generalized eigenvectors of T .

(c) Compute the minimal polynomial for T .

(d) Give the Jordan form for T .

8. An n×n matrix A is called nilpotent if Ak = 0 for some positive integerk. Prove:

(a) A is nilpotent if and only if all its eigenvalues are 0.

(b) If A is a nonzero real nilpotent matrix, it can not be symmetric.

(c) If A is a nonzero complex nilpotent matrix, it can not be hermitian(i.e., self-adjoint).

9. Let A =

1 1 0 00 0 1 00 0 0 10 1 0 0

, where A is to be considered as a matrix over

C.

Page 229: Linear Algebra by S. E. Payne

10.7. EXERCISES 229

(a) Determine the minimal and characteristic polynomials of A and theJordan form for A.

(b) Determine all generalized eigenvectors of A and a basis B of C4 withrespect to which the operator TA : x 7→ Ax has Jordan form. Use thisto write down a matrix P such that P−1AP is in Jordan form.

10. Let A be an n× n symmetric real matrix with some power of A beingthe identity matrix. Show that either A = I or A = −I or A2 = I butA is not a scalar times I.

11. LetV be the usual real vector space of all 3 × 3 real matrices. Definesubsets of V by

U = {A ∈ V : AT = A} and W = {A ∈ V : AT = −A}.

(a) (3 points) Show that both U and W are subspaces of V .

(b) (3 points) Show that V = U ⊕W .

(c) (3 points) Determine the dimensions of U and W .

(d) (3 points) Let Tr : V → R : A 7→ trace(A). Put R = {A ∈ U :Tr(A) = 0}. Show that R is a subspace of U and compute a basisfor R.

(e) (8 points) Put P =

0 1 00 0 11 0 0

and define T : W → W : B 7→

PB + BP T . Show that T ∈ L(W ) and determine a basis B forW . Then compute the matrix [T ]B. What are the minimal andcharacteristic polynomials for T?

12. LetW be the space of 3×2 matrices overR. Put V1 =

a 0

b 0c 0

: a, b, c ∈ R

,

and V2 =

0 a

0 b0 c

: a, b, c ∈ R

. So W is isomorphic to V1 ⊕ V2.

Let A be the matrix

A =

1 2 32 0 43 4 −1

Page 230: Linear Algebra by S. E. Payne

230 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR

Define a map T ∈ L(W ) by T : W → W : B 7→ AB.

Compute the eigenvalues of T , the minimal polynomial for T , and thecharacteristic polynomial for T . Compute the Jordan form for T . (Hint:

One eigenvector of the matrix A is

111

.)

13. Let V = R5 and let T ∈ L(V ) be defined by T (a, b, c, d, e) = (2a, 2b, 2c+d, a+ 2d, b+ 2e).

(a) Find the characteristic and minimal polynomial of T .

(b) Determine a basis of F 5 consisting of eigenvectors and generalizedeigenvectors of T .

(c) Find the Jordan form of T (or of M(T )) with respect to yourbasis.

14. Suppose that S, T ∈ L(V ). Suppose that T has dim(V ) distinct eigen-values and that S has the same eigenvectors as T (though not neces-sarily with the same eigenvalues). Prove that ST = TS.

15. Let k be positive integer greater than 1 and let N be an n×n nilpotentmatrix over C. Say Nm = 0 6= Nm−1 for some positive integer m ≥ 2.

(a) Prove that I +N is invertible

(b) Prove that there is some polynomial f(x) ∈ C[x] with degree atmost m − 1 and constant term equal to 1 for which B = f(N)satisfies Bk = I +N .

16. Let A be any invertible n×n complex matrix, and let k be any positiveinteger greater than 1. Show that there is a matrix B for which Bk = A.

17. If A is not invertible the situation is much more complicated. Showthat if the minimal polynomial for A has 0 as a root with multiplicity1, then A has a k-th root. In particular if A is normal, then it has ak-th root.

18. If A is nilpotent a wide variety of possibilities can occur concerningk-th roots.

Page 231: Linear Algebra by S. E. Payne

10.7. EXERCISES 231

(a) Show that there is an A ∈M10(C) for which A5 = 0 but A4 6= 0.

(b) Show that any A as in Part (a) cannot have a cube root or anyk-th root for k ≥ 3.

19. Let m, k be integers greater than 1, and let J be an m × m Jordanblock with 1’s along the superdiagonal and zeros elsewhere. Show thatthere is no m×m complex matrix B for which Bk = J .

Page 232: Linear Algebra by S. E. Payne

232 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR

Page 233: Linear Algebra by S. E. Payne

Chapter 11

Matrix Functions∗

11.1 Operator Norms and Matrix Norms∗

Let V and W be vector spaces over F having respective norms ‖·‖V and ‖·‖W .Let T ∈ L(V,W ). One common problem is to understand the “size” of thelinear map T in the sense of its effects on the magnitude of the inputs. Foreach nonzero v ∈ V , the quotient ‖T (v)‖W/‖v‖V measures the magnificationcaused by the transformation T on that specific vector v. An upper bound onthis quotient valid for all v would thus measure the overall effect of T on thesize of vectors in V . It is well known that in every finite dimensional spaceV , the quotient ‖T (v)‖W/‖v‖V has a maximum value which is achieved withsome specific vector v0. Note that if v is replaced by cv for some nonzeroc ∈ F , the quotient is not changed. Hence if O = {x ∈ V : ‖x‖V = 1}, thenwe may define a norm for T (called a transformation norm) by

‖T‖V,W = max{‖T (v)‖W/‖v‖V : ~0 6= v ∈ V } = max{‖T (v)‖W : v ∈ O}.(11.1)

(In an Advanced Calculus course a linear map T is shown to be continu-ous, and the continuous image of a compact set is compact.) In this definitionwe could replace “max” with “supremum” in order to guarantee that ‖T‖V,Wis always defined even when V and W are not finite dimensional. However,in this text we just deal with the three specific norms we have already de-fined on the finite dimensional vector spaces (See Section 8.1). (For moreon matrix norms see the book MATRIX ANALYSIS by Horn and Johnson.)The norm of a linear map from V to W defined in this way is a vector norm

233

Page 234: Linear Algebra by S. E. Payne

234 CHAPTER 11. MATRIX FUNCTIONS∗

on the vector space L(V,W ). (See Exercise 1.) We can actually say a bitmore.

Theorem 11.1.1. Let U , V and W be vector spaces over F endowed withvector norms ‖ · ‖U , ‖ · ‖V , ‖ · ‖W , respectively. Let S, T ∈ L(U, V ) andL ∈ L(V,W ) and suppose they each have norms defined by Eq. 11.1 Then:

(a) ‖T‖U,V ≥ 0, and ‖T‖U,V = 0 if and only if T (u) = ~0 for all u ∈ V .(b) ‖aT‖U,V = |a| ‖T‖U,V for all scalars a.(c) ‖S + T‖U,V ≤ ‖S‖U,V + ‖T‖U,V .(d) ‖T (u)‖V ≤ ‖T‖U,V · ‖u‖U for all u ∈ U .(e) ‖I‖U,U = 1, where I ∈ L(U) is defined by I(u) = u for all u ∈ U .(f) ‖L ◦ T‖U,W ≤ ‖L‖V,W · ‖T‖U,V .(g) If U = V , then ‖T i‖U,U ≤ (‖T‖U,U)i.

Proof. Parts (a), (d) and (e) follow immediately from the definition of thenorm of a linear map. Part (b) follows easily since |a| can be factored outof the appropriate maximum (or supremum). For part (c), from the triangleinequality for vector norms, we have

‖(S + T )(u)‖V = ‖S(u) + T (u)‖V ≤ ‖S(u)‖V + ‖T (u)‖V≤ (‖S‖U,V + ‖T‖U,V )‖u‖U ,

from part (d). This says the quotient used to define the norm of S + T isbounded above by (‖S‖U,V + ‖T‖U,V ), so the least upper bound is less thanor equal to this. Part (f) follows in a similar manner, and then (g) followsby repeated applications of part (f).

Let A be an m × n matrix over F (with F a subfield of C). As usual,we may consider the linear map TA : F n → Fm : x 7→ Ax. We consider thenorm on L(F n, Fm) induced by each of the standard norms ‖·‖1, ‖·‖2, ‖·‖∞(see Section 8.1), and use it to define a corresponding norm on the vectorspace of m× n matrices. It seems natural to let the transformation norm ofTA also be a “norm” of A. Specifically, we have

‖A‖1 = max{‖Ax‖1‖x‖1 : x 6= ~0

}‖A‖2 = max

{‖Ax‖2‖x‖2 : x 6= ~0

}‖A‖∞ = max

{‖Ax‖∞‖x‖∞ : x 6= ~0

}.

Page 235: Linear Algebra by S. E. Payne

11.1. OPERATOR NORMS AND MATRIX NORMS∗ 235

Theorem 11.1.2. Let A be m× n over the field F . Then:(a) ‖A‖1 = max{

∑mi=1 |Aij| : 1 ≤ j ≤ n} (maximum absolute column

sum).(b) ‖A‖∞ = max{

∑nj=1 |Aij| : 1 ≤ i ≤ m} (maximum absolute row

sum).(c) ‖A‖2 = maximum singular value of A.

Proof. To prove (a), observe

‖Ax‖1 =m∑i=1

∣∣∣∣∣n∑j=1

Aijxj

∣∣∣∣∣ ≤m∑i=1

n∑j=1

|Aij| |xj|

=n∑j=1

(m∑i=1

|Aij|

)|xj| ≤

n∑j=1

(maxj

∑i

|Aij|

)|xj|

=

(maxj

m∑i=1

|Aij|

)‖x‖1 = α‖x‖1.

So ‖A‖1 ≤ α = maxj∑m

i=1 |Aij|. To complete the proof of (a) weneed to construct a vector x ∈ F n such that ‖Ax‖1 = α‖x‖1. Put x =(0, 0, . . . , 1, . . . , 0)T where the single nonzero entry 1 is in column j0, andthe maximum value of

∑mi=1 |Aij| occurs when j = j0. Then ‖x‖1 = 1, and

‖Ax‖1 =∑m

i=1 |Aij0 | = α = α‖x‖1 as desired.We now consider part (b). Define α by

α = max{n∑j=1

|Aij| : 1 ≤ i ≤ m}.

For an arbitrary v = (v1, . . . , vn)T ∈ F n suppose that the maximum definingα above is attained when i = i0. Then we have

‖TA(v)‖∞ = ‖Av‖∞ = maxi|(Av)i| = maxi

∣∣∣∣∣n∑j=1

Aijvj

∣∣∣∣∣≤ maxi

n∑j=1

(|Aij| |vj|) ≤ maxi

n∑j=1

(|Aij| maxk|vk|)

≤ α‖v‖∞,

Page 236: Linear Algebra by S. E. Payne

236 CHAPTER 11. MATRIX FUNCTIONS∗

so ‖TA(v)‖∞/‖v‖∞ ≤ α for all nonzero v. This says that this norm of TA isat most α. To show that it equals α we must find a vector v such that theusual quotient equals α. For each j = 1, . . . , n, chose vj with absolute value1 so that Ai0jvj = |Ai0j|. Clearly ‖v‖∞ = 1, so that ‖TA(v)‖∞ ≤ α, whichwe knew would have to be the case anyway. On the other hand

|(TA(v))i0| =

∣∣∣∣∣n∑j=1

Ai0jvj

∣∣∣∣∣ =

∣∣∣∣∣n∑j=1

|Ai0j|

∣∣∣∣∣ = α.

This shows that ‖TA(v)‖∞ = α for this particular v, and hence this norm ofTA is α.

Part (c) is a bit more involved. First note that ‖v‖ as used in earlierchapters is just ‖v‖2. Suppose that S is unitary and m ×m, and that A ism×n. Then ‖(SA)(v)‖2 = ‖S(A(v))‖2 = ‖(A(v)‖2 for all v ∈ F n. It followsimmediately that ‖SA‖2 = ‖A‖2. Now let A be an arbitrary m × n matrixwith singular value decomposition A = UΣV ∗. So ‖A‖2 = ‖UΣV ∗‖2 =‖ΣV ∗‖2. Since ‖ · ‖2 is a matrix norm coming from a transformation norm,by part (f) of Theorem 11.1.1 ‖ΣV ∗‖2 ≤ ‖Σ‖2 ‖V ∗‖2 = ‖Σ‖2. If Σ =diag(σ1, σ2, . . . , σk, 0, . . . , 0), with σ1 ≥ σ2 ≥ · · · ≥ σk > 0, then ‖Σx‖2 =‖(σ1x1, . . . , σkxk, 0, . . . , 0)T‖2 =

√σ2

1x21 + · · ·σ2

kx2k ≤ σ1 ·‖x‖2. So ‖Σ‖2 ≤ σ1.

Suppose x = (1, 0, . . . , 0)T . Then Σx = (σ1x1, 0, . . . , 0)T . So ‖x‖2 = 1,‖Σx‖2 = |σ1x1| = σ1 = σ1‖x‖2. This says ‖Σ‖2 = σ1. So we know that‖ΣV ∗‖2 ≤ σ1. Let x be the first column of V , so V ∗x = (1, 0, . . . , 0)T . Then‖ΣV ∗x‖2 = σ1, showing that ‖ΣV ∗‖2 = ‖Σ‖2 = σ1, so that finally we see‖A‖2 = σ1.

11.2 Polynomials in an Elementary Jordan

Matrix∗

Let Nn denote the n × n matrix with all entries equal to 0 except for thosealong the super diagonal just above the main diagonal:

Nn =

0 1 0 · · · 00 0 1 · · · 00 0 0 · · · 0...

.... . . 0 1

0 · · · · · · 0 0

. (11.2)

Page 237: Linear Algebra by S. E. Payne

11.2. POLYNOMIALS IN AN ELEMENTARY JORDAN MATRIX∗ 237

So (Nn)ij =

{1, if j = i+ 1;0, otherwise.

The following lemma is easily established

Lemma 11.2.1. For 1 ≤ m ≤ n − 1, (Nmn )ij =

{1, if j = i+m;0, otherwise.

Also,

N0n = I and Nn

n = 0.

Then let Jn(λ) be the elementary n × n Jordan block with eigenvalue λgiven by Jn(λ) = λI +Nn. So

Jn(λ) =

λ 1 0 · · · 00 λ 1 · · · 00 0 λ · · · 0...

.... . . λ 1

0 · · · · · · 0 λ

= λI +Nn,

where Nn is nilpotent with minimal polynomial xn.

Theorem 11.2.2. In this Theorem (and until notified otherwise) write J inplace of Jn(λ). For each positive integer m,

Jm =

λm

(m1

)λm−1 · · ·

(mm−1

)λm−n+1

λm · · ·(mm−2

)λm−n+2

. . .

λm

,

that is,

(Jm)ij =

(m

j − i

)λm−j+i ∀ 1 ≤ i, j ≤ n.

Proof. It is easy to see that the desired result holds form = 1 by the definitionof J , so suppose that it holds for a given m ≥ 1. Then

(Jm+1)ij = (J · Jm)ij =n∑k=1

Jik(Jm)kj = λ(Jm)ij + 1 · (Jm)i+1,j

= λ

(m

j − i

)λm−j+i +

(m

j − i− 1

)λm−j+i+1

=

(m

j − i

)λm+1−j+i +

(m

j − i− 1

)λm+1−j+i =

(m+ 1

j − i

)λm+1−j+i.

Page 238: Linear Algebra by S. E. Payne

238 CHAPTER 11. MATRIX FUNCTIONS∗

Now suppose that f(x) =∑k

m=0 amxm, so that f(J) = a0I + a1J + · · ·+

akJk. Recall (or prove by induction on s) that for s ≥ 0,

f (s)(x) =k∑

m=0

am

(m

s

)s!xm−s, so

k∑m=0

am

(m

s

)λm−s =

f (s)(λ)

s!. (11.3)

So for s ≥ 0,

(f(J))i,i+s =k∑

m=0

am

(m

s

)λm−s =

1

s!f (s)(λ).

This says that

f(J) =

f(λ) 1

1!f (1)(λ) 1

2!f (2)(λ) · · · 1

(n−1)!f (n−1)(λ)

0 f(λ) 11!f (1)(λ) · · · 1

(n−2)!f (n−2)(λ)

0 0 f(λ) · · · 1(n−3)!

f (n−3)(λ)...

.... . .

...0 0 · · · · · · f(λ)

. (11.4)

Now suppose that J is a direct sum of elementary Jordan blocks:

J = J1 ⊕ J2 ⊕ · · · ⊕ Js.

Then for the polynomial f(x) we have

f(J) = f(J1)⊕ f(J2)⊕ · · · ⊕ f(Js).

Here f(J1), . . . , f(Js) are polynomials in separate Jordan blocks, whose val-ues are given by Eq. 11.3. This result may be applied in the computation off(A) even when A is not originally in Jordan form. First determine a T forwhich J = T−1AT has Jordan form. Then compute f(J) as above and use

f(A) = f(TJT−1) = Tf(J)T−1.

11.3 Scalar Functions of a Matrix∗

For certain functions f : F → F (satisfying requirements stipulated below)we can define a matrix function f : A 7→ f(A) so that if f is a polynomial, the

Page 239: Linear Algebra by S. E. Payne

11.3. SCALAR FUNCTIONS OF A MATRIX∗ 239

value f(A) agrees with the value given just above. Start with an arbitrarysquare matrix A over F and let λ1, . . . , λs be the distinct eigenvalues of A.Reduce A to Jordan form

T−1AT = J1 ⊕ J2 ⊕ · · · ⊕ Jt,

where J1, . . . , Jt are elementary Jordan blocks. Consider the Jordan block

Ji = Jni(λi) =

λi 1 0 · · · 0

λi 1 · · · 0...

...λi

, (11.5)

which has (x − λi)ni as its minimal and characteristic polynomials. If the

function f : F → F is defined in a neighborhood of the point λi and hasderivatives f (1)(λi), . . . , f

(ni−1)(λi), then define f(Ji) by

f(Ji) =

f(λ) 1

1!f (1)(λ) 1

2!f (2)(λ) · · · 1

(n−1)!f (n−1)(λ)

0 f(λ) 11!f (1)(λ) · · · 1

(n−2)!f (n−2)(λ)

0 0 f(λ) · · · 1(n−3)!

f (n−3)(λ)...

.... . .

...0 0 · · · · · · f(λ)

. (11.6)

This says that

(f(Ji))r,r+j =1

j!f (j)(λ), for 0 ≤ j ≤ n− r. (11.7)

If f is defined in a neighborhood of each of the eigenvalues λ1, . . . , λs andhas (finite) derivatives of the proper orders in these neighborhoods, then also

f(J) = f(J1)⊕ f(J2)⊕ · · · ⊕ f(Jt), (11.8)

andf(A) = Tf(J)T−1 = Tf(J1)T−1 ⊕ · · · ⊕ Tf(Jt)T

−1. (11.9)

The matrix f(A) is called the value of the function f at the matrix A.We will show below that f(A) does not depend on the method of reducingA to Jordan form (i.e., it does not depend on the particular choice of T ),

Page 240: Linear Algebra by S. E. Payne

240 CHAPTER 11. MATRIX FUNCTIONS∗

and thus f really defines a function on the n × n matrices A. This matrixfunction is called the correspondent of the numerical function f . Not allmatrix functions have corresponding numerical functions. Those that do arecalled scalar functions.

Here are some of the simplest properties of scalar functions.

Theorem 11.3.1. It is clear that the definition of scalar function was chosenprecisely so that part (a) below would be true.

(a) If f(λ) is a polynomial in λ, then the value f(A) of the scalar functioinf coincides with the value of the polynomial f(λ) evaluated at λ = A.

(b) Let A be a square matrix over F and suppose that f1(λ) and f2(λ) aretwo numerical functions for which the expressions f1(A) and f2(A) aremeaningful. If f(λ) = f1(λ) + f2(λ), then f(A) is also meaningful andf(A) = f1(A) + f2(A).

(c) With A, f1 and f2 as in the preceding part, if f(λ) = f1(λ)f2(λ), thenf(A) is meaningful and f(A) = f1(A)f2(A).

(d) Let A be a matrix with eigenvalues λ1, . . . , λn, each appearing as oftenas its algebraic multiplicity as an eigenvalue of A. If f : F → F isa numerical function and f(A) is defined, then the eigenvalues of thematrix f(A) are f(λ1), . . . , f(λn).

Proof. The proofs of parts (b) and (c) are analogous so we just give thedetails for part (c). To compute the values f1(A), f2(A) and f(A) accordingto the definition, we must reduce A to Jordan form J and apply the formulasgiven in Eqs. 11.8 and 11.9. If we show that f(J) = f1(J)f2(J), then fromEq. 11.8 we immediately obtain f(A) = f1(A)f2(A). In fact,

f(J) = f(J1)⊕ · · · ⊕ f(Jt),

and

f1(J)f2(J) = f1(J)f2(J) = f1(J1)f2(J1)⊕ · · · ⊕ f1(Jt)f2(Jt),

so that the proof is reduced to showing that

f(Ji) = f1(Ji)f2(Ji), (i = 1, 2, . . . , t),

Page 241: Linear Algebra by S. E. Payne

11.3. SCALAR FUNCTIONS OF A MATRIX∗ 241

where Ji is an elementary Jordan block.Start with the values of f1(Ji) and f2(Ji) given in Eq. 11.6, and multiply

them together to find that

[f1(Ji)f2(Ji)]r,r+s =s∑j=0

(f1(Ji))r,r+j(f2(Ji))r+j,r+s =

=s∑j=0

1

j!f

(j)1 (λi)

1

(s− j)!f

(s−j)2 (λi) =

=1

s!

s∑j=0

s!

j!(s− j)!f

(j)1 (λi)f

(s−j)2 (λi) =

1

s!(f1f2)(s)(λi),

where the last equality comes from Exercise 2, part (i).Thus f1(Ji)f2(Ji) = f(Ji), completing the proof of part (c) of the theo-

rem. For part (d), the eigenvalues of the matrices f(A) and T−1f(A)T =f(T−1AT ) are equal, and therefore we may assume that A has Jordan form.Formulas in Eq. 11.5 and 11.6 show that in this case f(A) is upper triangularwith f(λ1), . . . , f(λn) along the main diagonal. Since the diagonal elementsof an upper triangular matrix are its eigenvalues, part (d) is proved.

Similarly, if f and g are functions such that f(g(A)) is defined, and ifh(λ) = f(g(λ)), then h(A) = f(g(A)).

To finish this section we consider two examples.

Example 11.3.2. Let f(λ) = λ−1. This function is defined everywhereexcept at λ = 0, and has finite derivatives of all orders everywhere it isdefined. Consequently, if the matrix A does not have zero as an eigenvalue,i.e., if A is nonsingular, then f(A) is meaningful. But λ · f(λ) = 1, henceA·f(A) = I, so f(A) = A−1. Thus the matrix function A 7→ A−1 correspondsto the numerical function λ 7→ λ−1, as one would hope.

Example 11.3.3. Let f(λ) =√λ. To remove the two-valuedness of

√· it is

sufficient to slit the complex plane from the origin along a ray not containingany eigenvalues of A, and to consider one branch of the radical. Then thisfunction, for λ 6= 0, has finite derivatives of all orders. It follows that theexpression

√A is meaningful for all nonsingular matrices A. Putting λ = A

in the equationf(λ)f(λ) = λ,

Page 242: Linear Algebra by S. E. Payne

242 CHAPTER 11. MATRIX FUNCTIONS∗

we obtain

f(A)f(A) = A.

This shows that each nonsingular matrix has a square root.

11.4 Scalar Functions as Polynomials∗

At this point we need to generalize the construction given in Lagrange inter-polation (see Section 5.3).

Lemma 11.4.1. Let r1, . . . , rs be distinct complex numbers, and for each i,1 ≤ i ≤ s, let mi be a nonnegative integer. Let (aij) be a table of arbitrarynumbers , 1 ≤ i ≤ s, and for each fixed i, 0 ≤ j ≤ mi. Then there exists apolynomial p(x) such that p(j)(ri) = aij, for 1 ≤ i ≤ s and for each fixed i,0 ≤ j ≤ mi.

Proof. It is convenient first to construct auxiliary polynomials pi(x) suchthat pi(x) and its derivatives to the mith order assume the required valuesat the point ri and are all zero at the other given points. Put

φi(x) = bi0 + bi1(x− ri) + · · ·+ bimi(x− ri)mi =

mi∑j=0

bij(x− ri)j,

where the bij are complex numbers to be determined later. Note that φ(j)i (ri) =

j!bij.Set

Φi(x) = (x− r1)mi+1 · · · (x− ri−1)mi+1(x− ri+1)mi+1 · · · (x− rs)mi+1,

and

pi(x) = φi(x)Φi(x) =

{mi∑j=0

bij(x− ri)j}

1≤j≤s∏j 6=i

(x− rj)mi+1.

By the rule for differentiating a product (see Exercise 2),

p(j)i (ri) =

j∑l=0

(j

l

(l)i (ri)Φ

(j−l)i (ri),

Page 243: Linear Algebra by S. E. Payne

11.4. SCALAR FUNCTIONS AS POLYNOMIALS∗ 243

or

aij =

j∑l=0

(j

l

)l!bilΦ

(j−l)i (ri). (11.10)

Using Eq. 11.10 with j = 0 and the fact that Φi(ri) 6= 0 we find

bi0 =ai0

Φi(ri), 1 ≤ i ≤ s. (11.11)

For each i = 1, 2, . . . , s, and for a given j with 0 ≤ j < mi, once bil isdetermined for all l with 0 ≤ l < j, we can solve Eq. 11.10 for

bij =aij

j!Φi(ri)−∑j−1

l=0 bilΦ(j−l)i (ri)

(j − l)!Φi(ri), 1 ≤ i ≤ s. (11.12)

This determines pi(x) so that pi(x) and its derivatives up to the mith deriva-tive have all the required values at ri and all equal zero at rt for t 6= i. It isnow clear that the polynomial p(x) = p1(x) + p2(x) + · · ·+ ps(x) satisfies allthe requirements of the Lemma.

Consider a numerical function λ 7→ f(λ) and an n×n matrix A for whichthe value f(A) is defined. We show that there is a polynomial p(x) for whichp(A) equals f(A). Let λ1, . . . , λs denote the distinct eigenvalues of the matrixA. Using only the proof of the lemma we can construct a polynomial p(x)which satisfies the conditions

p(λi) = f(λi), p′(λi) = f ′(λi), . . . , p

(n−1)(λi) = f (n−1)(λi), (11.13)

where if some of the derivatives f (j)(ri) are superfluous for the determinationof f(A), then the corresponding numbers in Eq. 11.12 may be replaced byzeros. Since the values of p(x) and f(λ) (and their derivatives) coincide atthe numbers λi, then f(A) = p(A). This completes a proof of the followingtheorem.

Theorem 11.4.2. The values of all scalar functions in a matrix A can beexpressed by polynomials in A.

Caution: The value f(A) of a given scalar function f can be representedin the form of some polynomial p(A) in A. However, this polynomial, for agiven function f will be different for different matrices A. For example, the

Page 244: Linear Algebra by S. E. Payne

244 CHAPTER 11. MATRIX FUNCTIONS∗

minimal polynomial p(x) of a nonsingular matrix A can be used to write A−1

as a polynomial in A, but for different matrices A different polynomials willoccur giving A−1 as a polynomial in A.

Also, considering the function f(λ) =√λ, we see that for every nonsin-

gular matrix A there exists a polynomial p(x) for which

p(A)p(A) = A.

VERY IMPORTANT: With the help of Theorem 11.4.2 we can nowresolve the question left open just preceding Theorem 11.3.1 concerningwhether f(A) was well-defined. If we know the function f(λ) and its deriva-tives at the points λ1, . . . , λs, we can construct the polynomial p(x) whosevalue p(A) does not depend on the reduction of the matrix A to Jordan form,and at the same time is equal to f(A). Consequently, the value f(A) definedin the preceding section using the reduction of A to Jordan form, does notdepend on the way this reduction is carried out.

Let f(λ) be a numerical function, and let A be a matrix for which f(A)is meaningful. By Theorem 11.4.2 we can find a polynomial p(x) for whichp(A) = f(A). For a given function f(λ), the polynomial p(x) depends onlyon the elementary divisors of the matrix A. But the elementary divisors ofA and its transpose AT coincide, so p(AT ) = f(AT ). Since for a polynomialp(x) we have p(AT ) = p(A)T , it must be that f(AT ) = f(A)T for all scalarfunctions f(λ) for which f(A) is meaningful.

11.5 Power Series∗

All matrices are n× n over F , as usual. Let

G(x) =∞∑k=0

akxk

be a power series with coefficients from F . Let GN(x) =∑N

k=0 akxk be the

Nth partial sum of G(x). For each A ∈Mn(F ) let GN(A) be the element ofMn(F ) obtained by substituting A in this polynomial. For each fixed i, j weobtain a sequence of real or complex numbers cNij , N = 0, 1, 2, . . . by takingcNij to be the (i, j) entry of the matrix GN(A). The series

G(A) =∞∑k=0

akAk

Page 245: Linear Algebra by S. E. Payne

11.5. POWER SERIES∗ 245

is said to converge to the matrix C in Mn(F ) if for each i, j ∈ {1, 2, . . . , n}the sequence {cNij}∞N=0 converges to the (i, j) entry of C (in which case wewrite G(A) = C). We say G(A) converges if there is some C ∈ Mn(F ) suchthat G(A) = C.

For A = (aij) ∈Mn(F ) define

‖A‖ =n∑

i,j=1

|aij|,

i.e., ‖A‖ is the sum of the absolute values of all the entries of A.

Lemma 11.5.1. For all A,B ∈Mn(F ) and all a ∈ F(a) ‖A+B‖ ≤ ‖A‖+ ‖B‖;(b) ‖AB‖ ≤ ‖A‖ · ‖B‖;(c) ‖aA‖ = |a| · ‖A‖.

Proof. The proofs are rather easy. We just give a proof of (b).

‖AB‖ =∑i,j

|(AB)ij| =∑i,j

∣∣∣∣∣∑k

AikBkj

∣∣∣∣∣ ≤∑i,j,k

|Aik| · |Bkj| ≤

≤∑i,j,k,r

|Aik| · |Brj| = ‖A‖ · ‖B‖.

Suppose that G(x) =∑∞

k=0 akxk has radius of convergence equal to R.

Hence if ‖A‖ < R, then∑∞

k=0 ak‖A‖k converges absolutely, i.e.,∑∞

k=0 |ak| ·‖A‖k converges. But then |ak(AK)ij| ≤

∣∣ak‖Ak‖∣∣ ≤ |ak| ·‖A‖k, implying that∑∞k=0 |ak(Ak)ij| converges, hence

∑∞k=0 ak(A

k)ij converges. At this point wehave shown the following:

Lemma 11.5.2. If ‖A‖ < R where R is the radius of convergence of G(x),then G(A) converges to a matrix C.

We now give an example of special interest. Let f(λ) be the numericalfunction

f(λ) = exp(λ) =∞∑k=0

λk

k!.

Page 246: Linear Algebra by S. E. Payne

246 CHAPTER 11. MATRIX FUNCTIONS∗

Since this power series converges for all complex numbers λ, each matrixA ∈ Mn(F ) satisfies the condition in Lemma 11.5.2 so exp(A) converges toa matrix C = eA. Also, we know that f(λ) = exp(λ) has derivatives of allorders at each complex number. Hence f(A) is meaningful in the sense of thepreceding section. We want to be sure that the matrix C to which the series∑∞

k=01k!Ak converges is the same as the value f(A) given in the preceding

section. So let us start this section over.A sequence of square matrices

A1, A2, · · · , Am, Am+1, · · · , (11.14)

all of the same order, is said to converge to the matrixA provided the elementsof the matrices in a fixed row and column converge to the correspondingelement of the matrix A.

It is clear that if the sequences {Am} and {Bm} converge to matrices Aand B, respectively, then {Am + Bm} and {AmBm} converge to A + B andAB, respectively. In particular, if T is a constant matrix, and the sequence{Am} converges to A, then the sequence {T−1AmT} will converge to T−1AT .Further, if

Am = A(1)m ⊕ · · · ⊕ A(s)

m , (m = 1, 2, . . .),

where the orders of the blocks do not depend on m, then {Am} will converge

to some limit if and only if each block {A(i)m } converges separately.

The last remark permits a completely simple solution of the question ofthe convergence of a matrix power series. Let

a0 + a1x+ a2x2 + · · ·+ amx

m + · · · (11.15)

be a formal power series in an indeterminate x. The expression

a0I + a1A+ a2A2 + · · ·+ amA

m + · · · (11.16)

is called the corresponding power series in the matrix A, and the polynomial

fn(A) = a0I + a1A+ · · ·+ anAn

is the nth partial sum of the series. The series in Eq. 11.16 is convergentprovided the sequence {fn(A)}∞n=1 of partial sums has a limit. If this limitexists it is called the sum of the series.

Reduce the matrix A to Jordan form:

Page 247: Linear Algebra by S. E. Payne

11.5. POWER SERIES∗ 247

T−1AT = J = J1 ⊕ · · · ⊕ Jt,

where J1, · · · , Jt are elementary Jordan blocks. We have seen above thatconvergence of the sequence {fn(A)} is equivalent to the convergence of thesequence {T−1fn(A)T}. But

T−1fn(A)T = fn(T−1AT ) = fn(J) = fn(J1)⊕ · · · ⊕ fn(Jt),

and the question of the convergence of the series Eq. 11.16 is equivalent tothe following: Under what conditions is this series convergent for the Jordanblocks J1, · · · , Jt? Consider one of these blocks, say Ji. Let it have theelementary divisor (x − λi)

ni , i.e., it is an elementary Jordan block withminimal and characteristic polynomial equal to (x− λi)ni . By Eq. 11.4

fn(Ji) =

fn(λi)

11!f

(1)n (λi)

12!f

(2)n (λi) · · · 1

(n−1)!f

(n−1)n (λi)

0 fn(λi)11!f

(1)n (λi) · · · 1

(n−2)!f

(n−2)n (λi)

0 0 fn(λi) · · · 1(n−3)!

f(n−3)n (λi)

......

. . ....

0 0 · · · · · · f(λ)

. (11.17)

Consequently, {fn(Ji)} converges if and only if the sequences {f (j)n (λi)}∞n=1

for each j = 0, 1, . . . , ni − 1 converge, i.e., if and only if the series Eq. 11.15converges, and the series obtained by differentiating it term by term up toni − 1 times, inclusive, converges. It is known from the theory of analyticfunctions that all these series are convergent if either λi lies inside the circleof convergence of Eq. 11.15 or λi lies on the circle of convergence and the(ni − 1)st derivative of Eq. 11.15 converges at λi. Moreover, when λi liesinside the circle of convergence of Eq. 11.15, then the derivative of the seriesevaluated at λi gives the derivative of the original function evaluated at λi.Thus we have

Theorem 11.5.3. A matrix power series in A converges if and only if eacheigenvalue λi of A either lies inside the circle of convergence of the corre-sponding power series f(λ) or lies on the circle of convergence, and at thesame time the series of (ni − 1)st derivatives of the terms of f(λ) convergesat λi to the derivative of the function given by the original power series eval-uated at λi, where ni is the highest degree of an elementary divisor belonging

Page 248: Linear Algebra by S. E. Payne

248 CHAPTER 11. MATRIX FUNCTIONS∗

to λi (i.e., where ni is the size of the largest elementary Jordan block witheigenvalue λi). Moreover, if each eigenvalue λi of A lies inside the circle ofconvergence of the power series f(λ), then the jth derivative of the powerseries in A converges to the jth derivative of f(λ) evaluated at A.

Now reconsider the exponential function mentioned above. We know thatf(λ) =

∑∞k=0

λk

k!converges, say to eλ, for all complex numbers λ. Moreover,

the function f(λ) = eλ has derivatives of all orders at each λ ∈ C andthe power series obtained from the original by differentiating term by termconverges to the derivative of the function at each complex number λ. In fact,the derivative of the function is again the original function, and the powerseries obtained by differentiating the original power series term by term isagain the original power series. Hence the power series

∑∞k=0

1k!Ak not only

converges to some matrix C, it converges to the value exp(A) defined in theprevious section using the Jordan form of A. It follows that for each complexn× n matrix A there is a polynomial p(x) such that exp(A) = p(A).

Let Jλ denote the n× n elementary Jordan block

Jλ =

λ 1 0 . . . 00 λ 1 . . . 0...

. . . . . ....

0 λ 10 · · · · · · · · · λ

= λI +Nn.

If f(λ) = eλ, then we know

f(Jλ) =

eλ 1

1!eλ 1

2!eλ · · · 1

(n−1)!eλ

0 eλ 11!eλ · · · 1

(n−2)!eλ

.... . . . . .

...0 · · · · · · eλ 1

1!eλ

.

Now let t be any complex number. Let Bn(t) = Bt be the n × n matrixdefined by

Bt =

1 t

1!t2

2!· · · tn−1

(n−1)!

0 1 t1!· · · tn−2

(n−2)!

0 0 1 · · · tn−3

(n−3)!...

. . . . . ....

0 · · · · · · · · · 1

.

Page 249: Linear Algebra by S. E. Payne

11.5. POWER SERIES∗ 249

So (Bt)ij = 0 if j < i and (Bt)ij = tj−i

(j−i)! if i ≤ j.

With f(λ) = eλ put g(λ) = f(tλ) = etλ. A simple induction shows thatg(j)(λ) = tjg(λ). Then

g(Jλ) = etJλ = etλBt.

In particular,eJλ = eλB1.

We can even determine the polynomial p(x) for which p(Jλ) = eJλ . Put

pn(x) =n−1∑j=0

(x− λ)j

j!.

It follows easily that

p(j)n (λ) = 1 for 0 ≤ j ≤ n− 1, and p(n)

n (x) = 0.

From Eq. 11.6 we see that

pn(Jn(λ)) = Bn(1), so eλpn(Jn(λ)) = eλBn(1) = eJλ .

We next compute BtBs, for arbitrary complex numbers t and s. Clearlythe product is upper triangular. Then for i ≤ j we have

(Bt ·Bs)ij =

j∑k=i

(Bt)ik · (Bs)kj) =

j∑k=i

tk−i

(k − i)!· sj−k

(j − k)!

=1

(j − i)!

j∑k=i

(j − ij − k

)t(j−i)−(j−k)sj−k =

1

(j − i)!(t+ s)j−i = (Bt+s)ij.

HenceBt ·Bs = Bt+s = Bs ·Bt.

It now follows that

etJλ · esJλ = etλBt · esλBs = e(t+s)λBt+s = e(t+s)Jλ = esJλ · etJλ .

Fix the complex number s and put T0 = diag(sn−1, sn−2, . . . , s, 1). It isnow easy to verify that

T−10 · sJλ · T0 = Jsλ. (11.18)

Page 250: Linear Algebra by S. E. Payne

250 CHAPTER 11. MATRIX FUNCTIONS∗

Hence Jsλ is the Jordan form of sJλ. Also,(T0AT

−10

)ij

= sj−iAij. From this

it follows easily thatT0B1T

−10 = Bs.

It now follows that

esJλ = f(sJλ) = f(T0 · Jsλ · T−10 ) = T0 · f(Jsλ)T

−10 =

= T0 · eJsλ · T−10 = T0(esλB1)T−1

0 = esλBs,

which agrees with an earlier equation.Suppose A is a general n× n matrix with Jordan form

T−1AT = Jλ1 ⊕ · · · ⊕ Jλt , Jλi ∈Mni(F ).

Let T0 be the direct sum of the matrices diag(sni−1, sni−2, . . . , s, 1). Then

T−10 T−1sATT0 = Jsλ1 ⊕ · · · ⊕ Jsλt ,

soesA = T

(esλ1Bn1(s)⊕ · · · ⊕ esλtBnt(s)

)T−1.

Several properties of the exponential function are now easy corollaries.

Corollary 11.5.4. Using the above descriptions of A and esA, etc., we ob-tain:

(a) AeA = eAA for all square A.(b) esAetA = e(s+t)A = etAesA, for all s, t ∈ C.(c) e0 = I, where 0 is the zero matrix.(d) eA is nonsingular and e−A = (eA)−1.(e) eI = eI.

(f) det(eJλ) = enλ = etrace(Jλ), from which it follows that det(eA) =

etrace(A).

11.6 Commuting Matrices∗

Let A be a fixed n× n matrix over C. We want to determine which matricescommute with A. If A commutes with matrices B and C, clearly A commuteswith BC and with any linear combination of B and C. Also, if A commuteswith every n × n matrix, then A must be a scalar multiple A = aI of theidentity matrix I. (If you have not already verified this, do it now.)

Page 251: Linear Algebra by S. E. Payne

11.6. COMMUTING MATRICES∗ 251

Now supposeT−1AT = J = J1 ⊕ · · · ⊕ Js (11.19)

is the Jordan form of A with each Ji an elementary Jordan block. It is easyto check that X commutes with J if and only if TXT−1 commutes with A.Therefore the problem reduces to finding the matrices X that commute withJ . Write X in block form corresponding to Eq. 11.19:

X =

X11 X12 · · · X1s

X21 X22 · · · X2s...

... · · · ...Xs1 Xs2 · · · Xss

. (11.20)

The condition JX = XJ reduces to the equalities

JpXpq = XpqJq (p, q = 1, · · · , s). (11.21)

Note that there is only one equation for each Xpq. Fix attention on oneblock Xpq = B = (bij). Suppose Jp is r × r and Jq is t× t. Then Xpq = B isr × t, Jp = λpI +Nr, Jq = λqI +Nt. From Eq. 11.21 we easily get

λpBuv +r∑

k=1

(Nr)ukBkv = λqBuv +t∑

k=1

Buk(Nt)kv, (11.22)

for all p, q, u, v with 1 ≤ p, q ≤ s, 1 ≤ u ≤ r; 1 ≤ v ≤ t.

Eq. 11.22 quickly gives the following four equations:

(For v 6= 1, u 6= r), λpBuv +Bu+1,v = λqBuv +Bu,v−1. (11.23)

(For v = 1;u 6= r), λpBu1 +Bu+1,1 = λqBu1 + 0. (11.24)

(For v 6= 1;u = r), λpBrv + 0 = λqBr,v +Br,v−1. (11.25)

(For u = r, v = 1), λpBr1 + 0 = λqBr1. (11.26)

First suppose that λp 6= λq. Then from Eq. 11.26 Br1 = 0. Then fromEq. 11.24 (λq−λp)Bu1 = Bu+1,1. Put u = r−1, r−2, . . . , 1, in that order, to

Page 252: Linear Algebra by S. E. Payne

252 CHAPTER 11. MATRIX FUNCTIONS∗

get Bu1 = 0 for 1 ≤ u ≤ r, i.e., the first column of B is the zero column. InEq. 11.25 put v = 2, 3, . . . , t to get Brv = 0 for 1 ≤ v ≤ t i.e., the bottom rowof B is the zero row. Put u = r− 1 in Eq. 11.23 and then put v = 2, 3, . . . , t,in that order, to get Br−1,v = 0 for all v, i.e., the (r − 1)st column has allentries equal to zero. Now put u = r−2 and let v = 2, 3, . . . , t, in that order,to get Br−2,v = 0 for 1 ≤ v ≤ t. Continue in this way for u = r − 3, r − 4,etc., to see that B = 0.

So for λp 6= λq we have Xpq = 0.

Now consider the case λp = λq. The equations Eq. 11.23 through 11.25become

(For v 6= 1;u 6= r), Bu+1,v = Bu,v−1. (11.27)

(For v = 1;u 6= r)Bu+1,1 = 0. (11.28)

(For v 6= 1;u = r)Br,v−1 = 0. (11.29)

It is now relatively straightforward to check that there are only the follow-ing three possibilities (each of which is said to be in linear triangular form):

1. r = t, in which case B =∑t

i=0 ζiNit for some scalars ζi ∈ C.

2. r > t, in which case B has the form

B =

( ∑ti=1 ζiN

it

0r−t,t

).

3. r < t, in which case B has the form

B =(

0r,t−r∑r

i=1 ζiNir

).

Conversely, the determination of the form of B shows that if B has anyof the forms shown above then B commutes with X. For example, if

B = ρI3 +N3 ⊕ ρI2 +N2 ⊕ σI3 +N3 (11.30)

for ρ 6= σ, then the matrices X that commute with V have the form

Page 253: Linear Algebra by S. E. Payne

11.6. COMMUTING MATRICES∗ 253

X =

∑3

i=1 aiNi3

∑2i=1 ciN

i2

01,2

02,1

∑2i=1 diN

i2

∑2i=1 biN

i2

⊕ 3∑i=1

λiNi3,

where ai, bi, cidi, λi are arbitrary scalars.

Polynomials in a matrix A have the special property of commuting, notonly with the matrix A, but also with any matrix X which commutes withA.

Theorem 11.6.1. If C commutes with every matrix which commutes withB, then C is a polynomial in B.

Proof. In the usual way we may assume that B is in Jordan form. Suppose

B =s∑i=1

Bi =s∑i=1

Jni(λi) =s∑i=1

λiIni +Nni .

The auxiliary matrices

X =s∑i=1

aiIni

where the ai are arbitrary numbers, are known to commute with B. So byhypothesis X also commutes with C. From the preceding discussion, if wetake the ai to be distinct, we see that C must be decomposable into blocks:

C = C1 ⊕ · · · ⊕ Cs.Moreover, it follows from CB = BC that these blocks have linear trian-

gular form. Now let X be an arbitrary matrix that commutes with B. Thegeneral form of the matrix X was established in the preceding section. Byassumption, C commutes with X. Representing X in block form, we see thatthe equality CX = XC is equivalent to the relations

CpXpq = XpqCq (p, q = 1, 2, . . . , s). (11.31)

If the corresponding blocksBp, Bq have distinct eigenvalues, then Eq. 11.31contributes nothing, since then Xpq = 0. Hence we assume that Bp and Bq

have the same eigenvalues. Let

Cp =r∑i=1

aiNir, Cq =

t∑i=1

biNit .

Page 254: Linear Algebra by S. E. Payne

254 CHAPTER 11. MATRIX FUNCTIONS∗

For definiteness, suppose that r < t. Then Eq. 11.31 becomes{r∑i=1

aiNir

}·{(

0r,t−r∑r

i=1 ζiNir

)}={(

0r,t−r∑r

i=1 ζiNir

)}·

{t∑i=1

biNit

},

where the ζi are arbitrary complex numbers. Multiply the top row times theright hand column on both sides of the equation to obtain:

(a1, . . . , ar) · (ζr, . . . , ζ1)T = (0, . . . , 0, ζ1, . . . , ζr) · (bt, . . . , b1)T ,

which is equivalent to

a1ζr + a2ζr−1 + · · ·+ arζ1 = b1ζr + b2ζr−1 + · · ·+ brζ1.

Since the ζi can be chosen arbitrarily, it must be that

ai = bi, for 1 ≤ i ≤ r. (11.32)

It is routine to verify that a similar equality is obtained for r ≥ t.

Suppose that the Jordan blocks of the matrix B are such that blocks withthe same eigenvalues are adjacent. For example, let

B = (B1 ⊕ · · · ⊕Bm1)⊕ (Bm1+1 ⊕ · · · ⊕Bm2)⊕ · · · ⊕ (Bmk+1 ⊕ · · · ⊕Bs),

where blocks with the sam eigenvalues are contained in the same set of paren-theses. Denoting the sums in parentheses by B(1), B(2), · · · , respectively ,divide the matrix B into larger blocks which we call cells. Then divide thematrices X and C into cells in a corresponding way. The results above showthat all the nondiagonal cells of the matrix X are zero; the structure of thediagonal cells was also described above. The conditions for the matrix Cobtained in the present section show that the diagonal blocks of this matrixdecompose into blocks in the same way as those of B. Moreover, the blocksof the matrix C have a special triangular form. Equalities 11.32 mean thatin the blocks of the matrix C belonging to a given cell, the nonzero elementslying on a line parallel to the main diagonal are equal.

For example, let B have the form given in Eq. 11.30,

Page 255: Linear Algebra by S. E. Payne

11.7. A MATRIX DIFFERENTIAL EQUATION∗ 255

B =

ρ 1 0 0 0 0 0 00 ρ 1 0 0 0 0 00 0 ρ 0 0 0 0 00 0 0 ρ 1 0 0 00 0 0 0 ρ 0 0 00 0 0 0 0 σ 1 00 0 0 0 0 0 σ 10 0 0 0 0 0 0 σ

. (11.33)

Our results show that every matrix C that commutes with each matrixwhich commutes with B must have the form

C =

a0 a1 a2

a0 a1

a0

a0 a1

a0

d0 d1 d2

d0 d1

d0

. (11.34)

We need to prove that C can be represented as a polynomial in B. Wedo this only for the particular case where B has the special form given inEq. 11.33, so that C has its form given in Eq. 11.34. By Lemma 11.4.1 thereis a polynomial f(λ) that satisfies the conditions:

f(ρ) = a0, f ′(ρ) = a1, f ′′(ρ) = a2,f(σ) = d0, f ′(σ) = d1, f ′′(σ) = d2.

By applying Eq. 11.6 we see that f(B) = C

11.7 A Matrix Differential Equation∗

Let F denote either R or C. If B(t) is an n× n matrix each of whose entriesis a differentiable function bij(t) from F to F , we say that the derivative ofB with respect to t is the matrix d

dt(B(t)) whose (i, j)th entry is (bij)

′(t).Let x(t) = (x1(t), . . . , xn(t))T be an n-tuple of unknown functions F → F .

The derivative of x(t) will be denoted x(t). Let A be an n × n matrix overF , and consider the matrix differential equation (with initial condition):

Page 256: Linear Algebra by S. E. Payne

256 CHAPTER 11. MATRIX FUNCTIONS∗

x(t) = Ax(t), x(0) = x0. (11.35)

Theorem 11.7.1. The solution to the system of differential equations inEq. 11.35 is given by x(t) = etAx0.

Before we can prove this theorem, we need the following lemma.

Lemma 11.7.2.d

dt(etA) = AetA.

Proof. First suppose that A is the elementary Jordan block A = J = Jn(λ) =λI+N , where N = Nn. If n = 1, recall that the ordinary differential equationx′(t) = λx(t) has the solution x(t) = aeλt where a = x(0). Since N1 = 0,we see that the theorem holds in this case. Now suppose that n > 1. SoA = J = λI +N where Nn = 0. Recall that

eA = etJ = eλtBt =n−1∑j=0

eλt · tj

j!N j.

A simple calculation shows that

d

dt

(etJ)

= λeλtI +n−1∑j=1

(λeλttj + jeλttj−1

j!

)N j.

On the other hand,

JetJ = (λI +N)

(eλt

n−1∑j=0

tj

j!N j

)

= eλt

[(λI +N)

(I +

n−2∑j=1

tj

j!N j +

tn−1

j!Nn−1

)]

= eλt

[λI + (1 + λt)N +

n−2∑j=2

(λtj

j!+

tj−1

(j − 1)!

)N j+

+

(λtn−1

(n− 1)!+

tn−2

(n− 2)!

)Nn−1)

]

Page 257: Linear Algebra by S. E. Payne

11.8. EXERCISES 257

=d

dt

(etJ), as desired.

This completes a proof of the Lemma. Now turn to a proof of Theorem 11.7.1.First suppose that A has the Jordan form J = P−1AP , where J =

J1 ⊕ · · · ⊕ Js. Since etA is a polynomial in tA, eP−1tAP = P−1etAP , i.e.

etA = P · etJ · P−1 = T · (etJ1 ⊕ · · · ⊕ etJs)P−1.

It now follows easily that

d

dt

(etA)

= AetA.

Since x0 = x(0) is a constant matrix, ddt

(etAx0

)= AtAx0, so that x = etAx0

satisfies the differential equation x = Ax.

It is a standard result from the theory of differential equations that thissolution is the unique solution to the given equation.

11.8 Exercises

1. Show that the norm of T ∈ L(V,W ) as defined in Equation 11.1 is avector norm on L(V,W ) viewed as a vector space in the usual way.

2. Let D denote the derivative operator, and for a function f (in our case aformal power series or Laurent series) let f (j) denote the jth derivativeof f , i.e., Dj(f) = f (j).

(i) Prove that

    D^n(f \cdot g) = \sum_{i=0}^{n} \binom{n}{i} f^{(i)} g^{(n-i)}.

(ii) Derive as a corollary to part (i) the fact that

    D^j(f^2) = \sum_{i_1 + i_2 = j} \binom{j}{i_1, i_2} f^{(i_1)} f^{(i_2)}.

(iii) Now use part (i) and induction on n to prove that

    D^j(f^n) = \sum_{i_1 + \cdots + i_n = j} \binom{j}{i_1, \ldots, i_n} f^{(i_1)} \cdots f^{(i_n)}.


3. The Frobenius norm ‖A‖F of an m × n matrix A is defined to be the square root of the sum of the squares of the magnitudes of all the entries of A.

(a) Show that the Frobenius norm really is a matrix norm (i.e., a vector norm on the vector space of all m × n matrices).

(b) Compute ‖I‖F and deduce that ‖ · ‖F cannot be a transformation norm induced by some vector norm.

(c) Let U and V be unitary matrices of the appropriate sizes and show that

    ‖A‖F = ‖UA‖F = ‖AV‖F = ‖UAV‖F.

4. Let λ ↦ f(λ) and λ ↦ g(λ) be two numerical functions and let A be an n × n matrix for which both f(A) and g(A) are defined. Show that f(A)g(A) = g(A)f(A).


Chapter 12

Infinite Dimensional Vector Spaces∗

12.1 Partially Ordered Sets & Zorn’s Lemma∗

There are occasions when we would like to indulge in a kind of “infinite induction.” Basically this means that we want to show the existence of some set which is maximal with respect to certain specified properties. In this text we want to use Zorn’s Lemma to show that every vector space has a basis, i.e., a maximal linearly independent set of vectors in some vector space. The maximality is needed to show that these vectors span the entire space.

In order to state Zorn’s Lemma we need to set the stage.

Definition A partial order on a nonempty set A is a relation ≤ on A satisfying the following:

1. x ≤ x for all x ∈ A (reflexive);

2. if x ≤ y and y ≤ x then x = y for all x, y ∈ A (antisymmetric);

3. if x ≤ y and y ≤ z then x ≤ z for all x, y, z ∈ A (transitive).

Given a partial order ≤ on A we often say that A is partially ordered by ≤, or that (A, ≤) is a partially ordered set.

Definition Let the nonempty set A be partially ordered by ≤.

1. A subset B of A is called a chain if for all x, y ∈ B, either x ≤ y or y ≤ x.


2. An upper bound for a subset B of A is an element u ∈ A such that b ≤ u for all b ∈ B.

3. A maximal element of A is an element m ∈ A such that if m ≤ x for some x ∈ A, then m = x.

In the literature there are several names for chains, such as linearly ordered subset or simply ordered subset. The existence of upper bounds and maximal elements depends on the nature of (A, ≤).

As an example, let A be the collection of all proper subsets of Z+ (the set of positive integers) ordered by ⊆. Then, for example, the chain

{1} ⊆ {1, 2} ⊆ {1, 2, 3} ⊆ · · ·

does not have an upper bound. However, the set A does have maximal elements: for example, Z+ \ {n} is a maximal element of A for any n ∈ Z+.

Zorn’s Lemma If A is a nonempty partially ordered set in which every chain has an upper bound, then A has a maximal element.

It is a nontrivial result that Zorn’s Lemma is independent of the usual (Zermelo-Fraenkel) axioms of set theory, in the sense that if the axioms of set theory are consistent, then so are these axioms together with Zorn’s Lemma or with the negation of Zorn’s Lemma. The two other most nearly standard axioms that are equivalent to Zorn’s Lemma (in the presence of the usual Z-F axioms) are the Axiom of Choice and the Well Ordering Principle. In this text, we just use Zorn’s Lemma and leave any further discussion of these matters to others.

12.2 Bases for Vector Spaces∗

Let V be any vector space over an arbitrary field. Let S = {A ⊆ V : A is linearly independent} and let S be partially ordered by inclusion. If C is any chain in S, then the union of all the sets in C is an upper bound for C (it is linearly independent, since any finite subset of the union is contained in some member of the chain). Hence by Zorn’s Lemma S must have a maximal element B. By definition B is a linearly independent set not properly contained in any other linearly independent set. If a vector v were not in the space spanned by B, then B ∪ {v} would be a linearly independent set properly containing B, a contradiction. Hence B is a basis for V. This proves the following theorem:

Theorem 12.2.1. Each vector space V over an arbitrary field F has a basis. This is a subset B of V such that each vector v ∈ V can be written in just one way as a linear combination of a finite list of vectors in B.


12.3 A Theorem of Philip Hall∗

Let S and I be arbitrary sets. For each i ∈ I let Ai ⊆ S. If ai ∈ Ai for all i ∈ I, we say {ai : i ∈ I} is a system of representatives for A = (Ai : i ∈ I). If in addition ai ≠ aj whenever i ≠ j, even though Ai may equal Aj, then {ai : i ∈ I} is a system of distinct representatives (SDR) for A. Our first problem is: Under what conditions does some family A of subsets of a set S have an SDR?

For a finite collection of sets a reasonable answer was given by Philip Hall in 1935. It is obvious that if A = (Ai : i ∈ I) has an SDR, then the union of any k of the members of A must have at least k elements. Hall’s observation was that this obvious necessary condition is also sufficient. We state the condition formally as follows:

Condition (H): Let I = [n] = {1, 2, . . . , n}, and let S be any (nonempty) set. For each i ∈ I, let Si ⊆ S. Then A = (S1, . . . , Sn) satisfies Condition (H) provided for each K ⊆ I, | ∪k∈K Sk| ≥ |K|.

Theorem 12.3.1. The family A = (S1, . . . , Sn) of finitely many (not necessarily distinct) sets has an SDR if and only if it satisfies Condition (H).

Proof. As Condition (H) is clearly necessary, we now show that it is also sufficient. Br,s denotes a block of r subsets (Si1, . . . , Sir) belonging to A, where s = | ∪ {Sj : Sj ∈ Br,s}|. So Condition (H) says: s ≥ r for each block Br,s. If s = r, then Br,s is called a critical block. (By convention, the empty block B0,0 is critical.)

If Br,s = (A1, . . . , Au, Cu+1, . . . , Cr) and Bt,v = (A1, . . . , Au, Du+1, . . . , Dt), write Br,s ∩ Bt,v = (A1, . . . , Au) and Br,s ∪ Bt,v = (A1, . . . , Au, Cu+1, . . . , Cr, Du+1, . . . , Dt). Here the notation implies that A1, . . . , Au are precisely the subsets in both blocks. Then write Br,s ∩ Bt,v = Bu,w, where w = | ∪ {Ai : 1 ≤ i ≤ u}|, and Br,s ∪ Bt,v = By,z, where y = r + t − u and z = | ∪ {Si : Si ∈ Br,s ∪ Bt,v}|.

The proof will be by induction on the number n of sets in the family A, but first we need two lemmas.

Lemma 12.3.2. If A satisfies Condition (H), then the union and intersection of critical blocks are themselves critical blocks.


Proof of Lemma 12.3.2. Let Br,r and Bt,t be given critical blocks. Say Br,r ∩ Bt,t = Bu,v and Br,r ∪ Bt,t = By,z. The z elements of the union will be the r + t elements of Br,r and Bt,t reduced by the number of elements in both blocks, and this latter number includes at least the v elements in the intersection: z ≤ r + t − v. Also v ≥ u and z ≥ y by Condition (H). Note: y + u = r + t. Hence r + t − v ≥ z ≥ y = r + t − u ≥ r + t − v, implying that equality holds throughout. Hence u = v and y = z, as desired for the proof of Lemma 12.3.2.

Lemma 12.3.3. If Bk,k is any critical block of A, the deletion of elements of Bk,k from all sets in A not belonging to Bk,k produces a new family A′ in which Condition (H) is still valid.

Proof of Lemma 12.3.3. Let Br,s be an arbitrary block, and let (Br,s)′ = B′r,s′ be the block after the deletion. We must show that s′ ≥ r. Let Br,s ∩ Bk,k = Bu,v and Br,s ∪ Bk,k = By,z. Say

    Br,s = (A1, . . . , Au, Cu+1, . . . , Cr),
    Bk,k = (A1, . . . , Au, Du+1, . . . , Dk).

So Bu,v = (A1, . . . , Au) and By,z = (A1, . . . , Au, Cu+1, . . . , Cr, Du+1, . . . , Dk). The deleted block (Br,s)′ = B′r,s′ is (A1, . . . , Au, C′u+1, . . . , C′r). But Cu+1, . . . , Cr, as sets in the block By,z, contain at least z − k elements not in Bk,k. Thus s′ ≥ v + (z − k) ≥ u + y − k = u + (r + k − u) − k = r. Hence s′ ≥ r, as desired for the proof of Lemma 12.3.3.

As indicated above, for the proof of the main theorem we now use induction on n. For n = 1 the theorem is obviously true.

Induction Hypothesis: Suppose the theorem holds (Condition (H) implies that there is an SDR) for any family of m sets, 1 ≤ m < n.

We need to show the theorem holds for a system of n sets. So let 1 < n, assume the induction hypothesis, and let A = (S1, . . . , Sn) be a given collection of subsets of S satisfying Condition (H).

First Case: There is some critical block Bk,k with 1 ≤ k < n. Delete the elements in the members of Bk,k from the remaining subsets to obtain a new family A′ = Bk,k ∪ B′n−k,v, where Bk,k and B′n−k,v have no common elements in their members. By Lemma 12.3.3, Condition (H) holds in A′, and hence holds separately in Bk,k and in B′n−k,v viewed as families of sets.


By the induction hypothesis, Bk,k and B′n−k,v have (disjoint) SDR’s whose union is an SDR for A.

Remaining Case: There is no critical block for A except possibly the entire system. Select any Sj of A and then select any element of Sj as its representative. Delete this element from all remaining sets to obtain a family A′. Hence a block Br,s with r < n becomes a block B′r,s′ with s′ ∈ {s, s − 1}. By hypothesis Br,s was not critical, so s ≥ r + 1 and s′ ≥ r. So Condition (H) holds for the family A′ \ {Sj}, which by induction has an SDR. Add to this SDR the element selected as a representative for Sj to obtain an SDR for A.
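For small families, both Condition (H) and the existence of an SDR can be checked by machine. The following sketch is my own illustration (the family used is hypothetical): it tests Condition (H) by brute force over all index subsets and finds an SDR by backtracking.

```python
from itertools import combinations

def satisfies_H(sets):
    """Condition (H): every k of the sets have a union with at least k elements."""
    n = len(sets)
    return all(len(set().union(*(sets[i] for i in idx))) >= k
               for k in range(1, n + 1)
               for idx in combinations(range(n), k))

def find_sdr(sets, chosen=None):
    """Backtracking search for a system of distinct representatives."""
    if chosen is None:
        chosen = []
    i = len(chosen)
    if i == len(sets):
        return list(chosen)
    for a in sets[i]:
        if a not in chosen:
            chosen.append(a)
            result = find_sdr(sets, chosen)
            if result is not None:
                return result
            chosen.pop()
    return None

family = [{1, 2}, {2, 3}, {1, 3}, {1, 2, 3, 4}]   # a hypothetical family
print(satisfies_H(family))    # True, so Theorem 12.3.1 guarantees an SDR
print(find_sdr(family))       # one SDR, e.g. [1, 2, 3, 4]
```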

We now interpret the SDR problem as one on matchings in bipartite graphs. Let G = (X, Y, E) be a bipartite graph. For each S ⊆ X, let N(S) denote the set of elements of Y connected to at least one element of S by an edge, and put δ(S) = |S| − |N(S)|. Put δ(G) = max{δ(S) : S ⊆ X}. Since δ(∅) = 0, clearly δ(G) ≥ 0. Then Hall’s theorem states that G has an X-saturating matching if and only if δ(G) = 0.

Theorem 12.3.4. G has a matching of size t (or larger) if and only if t ≤ |X| − δ(S) for all S ⊆ X.

Proof. First note that Hall’s theorem says that G has a matching of size t = |X| if and only if δ(S) ≤ 0 for all S ⊆ X, i.e., iff |X| ≤ |X| − δ(S) for all S ⊆ X. So our theorem is true in case t = |X|. Now suppose that t < |X|. Form a new graph G′ = (X, Y ∪ Z, E′) by adding new vertices Z = {z1, . . . , z|X|−t} to Y, and join each zi to each element of X by an edge of G′.

If G has a matching of size t, then G′ has a matching of size |X|, implying that for all S ⊆ X,

    |S| ≤ |N′(S)| = |N(S)| + |X| − t,

implying

    |N(S)| ≥ |S| − |X| + t = t − (|X| − |S|) = t − |X \ S|.

This is also equivalent to t ≤ |X| − (|S| − |N(S)|) = |X| − δ(S).


Conversely, suppose |N(S)| ≥ t − |X \ S| = t − (|X| − |S|). Then |N′(S)| = |N(S)| + |X| − t ≥ (t − |X| + |S|) + |X| − t = |S|. By Hall’s theorem, G′ has an X-saturating matching M. At most |X| − t edges of M join X to Z, so at least t edges of M are from X to Y.

Note that t ≤ |X| − δ(S) for all S ⊆ X iff t ≤ min_{S⊆X}(|X| − δ(S)) = |X| − max_{S⊆X} δ(S) = |X| − δ(G).

Corollary 12.3.5. The largest matching of G has size |X| − δ(G) = m(G), i.e., m(G) + δ(G) = |X|.
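The corollary is easy to test on small examples. The sketch below is my own (not from the text): it computes a maximum matching by Kuhn’s augmenting-path algorithm, computes δ(G) by brute force over the subsets of X, and checks that m(G) + δ(G) = |X| for a hypothetical small graph.

```python
from itertools import chain, combinations

def max_matching(adj, num_y):
    """Kuhn's augmenting-path algorithm.  adj[x] lists the Y-neighbours of x in X."""
    match_y = [-1] * num_y                       # match_y[y] = matched x, or -1

    def try_augment(x, visited):
        for y in adj[x]:
            if y not in visited:
                visited.add(y)
                if match_y[y] == -1 or try_augment(match_y[y], visited):
                    match_y[y] = x
                    return True
        return False

    return sum(try_augment(x, set()) for x in range(len(adj)))

def deficiency(adj):
    """delta(G) = max over subsets S of X of |S| - |N(S)|, computed by brute force."""
    X = range(len(adj))
    subsets = chain.from_iterable(combinations(X, k) for k in range(len(adj) + 1))
    return max(len(S) - len(set().union(*(adj[x] for x in S))) for S in subsets)

# Hypothetical example: X = {0, 1, 2}, Y = {0, 1}; x = 0 and x = 1 both see only y = 0.
adj = [[0], [0], [0, 1]]
m, d = max_matching(adj, 2), deficiency(adj)
print(m, d, m + d == len(adj))    # 2 1 True
```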

12.4 A Theorem of Marshall Hall, Jr.∗

Many of the ideas of “finite” combinatorics have generalizations to situations in which some of the sets involved are infinite. We just touch on this subject.

Given a family A of sets, if the number of sets in the family is infinite, there are several ways the theorem of P. Hall can be generalized. One of the first (and to our mind one of the most useful) was given by Marshall Hall, Jr. (no relative of P. Hall), and is as follows.

Theorem 12.4.1. Suppose that for each i in some index set I there is a finite subset Ai of a set S. The system A = (Ai)i∈I has an SDR if and only if the following Condition (H′) holds: For each finite subset I′ of I the system A′ = (Ai)i∈I′ satisfies Condition (H).

Proof. We establish a partial order on deletions, writing D1 ⊆ D2 for deletions D1 and D2 iff each element deleted by D1 is also deleted by D2. Of course, we are interested only in deletions which preserve Condition (H′). If all deletions in an ascending chain D1 ⊆ D2 ⊆ · · · ⊆ Di ⊆ · · · preserve Condition (H), let D be the deletion which consists of deleting an element b from a set A iff there is some i for which b is deleted from A by Di. We assert that deletion D also preserves Condition (H).

In any block Br,s of A (r, s < ∞), at most a finite number of deletions in the chain can affect Br,s. If no deletion of the chain affects Br,s, then of course D does not affect Br,s, and Condition (H) still holds for Br,s. Otherwise, let Dn be the last deletion that affects Br,s. So under Dn (and hence also under D) the block (Br,s)′ = B′r,s′ still satisfies Condition (H) by hypothesis, i.e., s′ ≥ r. But Br,s is arbitrary, so D preserves Condition (H) on A. By Zorn’s Lemma,


there will be a maximal deletion D preserving Condition (H). We show that under such a maximal deletion D, each deleted set S′i has only a single element. Clearly these elements would form an SDR for the original A.

Suppose there is an a1 not belonging to a critical block. Delete a1 from every set Ai containing a1. Under this deletion a block Br,s is replaced by a block B′r,s′ with s′ ≥ s − 1 ≥ r, so Condition (H) is preserved. Hence after a maximal deletion each element left is in some critical block. And if Bk,k is a critical block, we may delete elements of Bk,k from all sets not in Bk,k and still preserve Condition (H) by Lemma 12.3.3 (since it needs to apply only to finitely many sets at a time). By Theorem 12.3.1 each critical block Bk,k (being finite) possesses an SDR when Condition (H) holds. Hence we may perform an additional deletion leaving Bk,k as a collection of singleton sets and with Condition (H) still holding for the entire remaining family. It is now clear that after a maximal deletion D preserving Condition (H), each element is in a critical block, and each critical block consists of singleton sets. Hence after a maximal deletion D preserving Condition (H), each set consists of a single element, and these elements form an SDR for A.

The following theorem, sometimes called the Cantor–Schroeder–Bernstein Theorem, will be used with the theorem of M. Hall, Jr. to show that any two bases of a vector space V over a field F must have the same cardinality.

Theorem 12.4.2. Let X, Y be sets, and let θ : X → Y and ψ : Y → X be injective mappings. Then there exists a bijection φ : X → Y.

Proof. The elements of X will be referred to as males, those of Y as females. For x ∈ X, if θ(x) = y, we say y is the daughter of x and x is the father of y. Analogously, if ψ(y) = x, we say x is the son of y and y is the mother of x. A male with no mother is said to be an “adam.” A female with no father is said to be an “eve.” Ancestors and descendants are defined in the natural way, except that each x or y is both an ancestor of itself and a descendant of itself. If z ∈ X ∪ Y has an ancestor that is an adam (resp., eve) we say that z has an adam (resp., eve). Partition X and Y into the following disjoint sets:

X1 = {x ∈ X : x has no eve};

X2 = {x ∈ X : x has an eve};


Y1 = {y ∈ Y : y has no eve};

Y2 = {y ∈ Y : y has an eve}.

Now a little thought shows that θ : X1 → Y1 is a bijection, and ψ−1 : X2 → Y2 is a bijection. So

    φ = θ|X1 ∪ ψ−1|X2

is a bijection from X to Y.

Corollary 12.4.3. If V is a vector space over the field F and if B1 and B2 are two bases for V, then |B1| = |B2|.

Proof. Let B1 = {xi : i ∈ I} and B2 = {yj : j ∈ J}. For each i ∈ I, let Γi = {j ∈ J : yj occurs with nonzero coefficient in the unique linear expression for xi in terms of the yj’s}. Then the union of any k (≥ 1) of the Γi’s, say Γi1, . . . , Γik, each of which of course is finite, must contain at least k distinct elements. For otherwise xi1, . . . , xik would belong to a space of dimension less than k, and hence be linearly dependent. Thus the family (Γi : i ∈ I) of sets must have an SDR. This means there is a function θ : I → J which is an injection. Similarly, there is an injection ψ : J → I. So by the preceding theorem there is a bijection J ↔ I, i.e., |B1| = |B2|.

12.5 Exercises∗

Exercise 12.5.0.1. Let A = (A1, . . . , An) be a family of subsets of {1, . . . , n}. Suppose that the incidence matrix of the family is invertible. Show that the family has an SDR.

Exercise 12.5.0.2. Prove the following generalization of Hall’s Theorem: Let A = (A1, . . . , An) be a family of subsets of X that satisfies the following property: There is an integer r with 0 ≤ r < n for which the union of each subfamily of k subsets of A, for all k with 0 ≤ k ≤ n, has at least k − r elements. Then there is a subfamily of size n − r which has an SDR. (Hint: Start by adding r “dummy” elements that belong to all the sets.)


Exercise 12.5.0.3. Let G be a (finite, undirected, simple) graph with vertex set V. Let C = {Cx : x ∈ V} be a family of sets indexed by the vertices of G. For X ⊆ V, let CX = ∪x∈X Cx. A set X ⊆ V is C-colorable if one can assign to each vertex x ∈ X a “color” cx ∈ Cx so that cx ≠ cy whenever x and y are adjacent in G. Prove that if |CX| ≥ |X| whenever X induces a connected subgraph of G, then V is C-colorable. (In the current literature of graph theory, the sets assigned to the vertices are called lists, and the desired proper coloring of G chosen from the lists is a list coloring of G. When G is a complete graph, this exercise gives precisely Hall’s Theorem on SDR’s. A current research topic in graph theory is the investigation of modifications of this condition that suffice for the existence of list colorings.)

Exercise 12.5.0.4. With the same notation as in the previous exercise, prove that if every proper subset of V is C-colorable and |CV| ≥ |V|, then V is C-colorable.


Index

A = [T]B2,B1, 38
QR-decomposition, 186
T-annihilator of v, 122
T-conductor of v into W, 122, 127
[v]B, 36
∞-norm, 145
L(U, V), 32
n-linear, 61
1-norm, 145
2-norm, 145

adjoint of linear operator, 153
adjoint, classical, 76
adjugate, 79
algebraic multiplicity of an eigenvalue, 205
algebraically closed, 57
alternating (n-linear), 64

basis of a vector space, 24
basis of quotient space, 210
bijective, 45
binomial theorem, 58
block multiplication, 13

Cayley-Hamilton Theorem, 79, 83
chain, 259
characteristic polynomial, 77
classical adjoint, 76
codomain, 33
companion matrix, 81
complex number, 8
convergence of a sequence of matrices, 246
convergence of a sequence of vectors, 145
convergence of power series in a matrix, 245
coordinate matrix, 36

determinant, 65
diagonalizable, 118
dimension, 25
direct sum of subspaces, 18
dual basis, 43
dual space, 43, 152

eigenvalue, 113
eigenvector, 114
elementary divisors of a matrix, 213
elementary Jordan block, 213
Euclidean algorithm, 60
exponential of a matrix, 246, 250

finite dimensional, 24
Frobenius norm of a matrix, 258
Fundamental Theorem of Algebra, 57

generalized eigenvector, 204
geometric multiplicity of an eigenvalue, 183
Gram-Schmidt algorithm, 146

Hermitian matrix, 131, 159
Hermitian operator, 159
hyperplane, 191

ideal in F[x], 56
idempotent, 36
independent list of subspaces, 18
injective, 33
inner product, 137
inner product space, 138
isometry, 171
isomorphic, 45, 52

Jordan form, 212

kernel, 33
Kronecker delta, 8

Lagrange Interpolation, 52
Lagrange interpolation generalized, 242
Laplace expansion, 67, 75
least-squares solution, 188
linear algebra, 47
linear functional, 43, 152
linear operator, 38
linear transformation, 31
linear triangular form, 252
linearly dependent, 22
linearly independent, 22
list, 21
lnull(A), 34

maximal element, 260
minimal polynomial, 56
monic polynomial, 49
Moore-Penrose generalized inverse, 184

nilpotent, 204
norm, 145
norm (on a vector space), 141
normal operator, 163
normal matrix, 163
normalized QR-decomposition, 186
null space, 33
nullity, 34

orthogonal, 139
orthogonal matrix, 148
orthonormal, 139

partial order, 259
partially ordered set, 259
polarization identity, complex, 157
polarization identity, real, 156
polynomial, 49
positive (semidefinite) operator, 169
positive definite, 198
prime polynomial, 59
principal ideal, 56
principal submatrix, 78
projection, 35
projection matrix, 186
pseudoinverse, 184

quotient space, 123

range, 33
Rayleigh Principle, 196
reduced singular value decomposition, 187
rnull(A), 34

scalar matrix functions, 240
Schur’s Theorem, 147
self-adjoint operator, 159
singular value, 178
singular value decomposition, 178
skew-symmetric matrix, 163
span, 22
sum of subspaces, 18
surjective, 34
system of distinct representatives (SDR), 261

Taylor’s Formula, 58
tensor product, 91
trace of a matrix, 78
transformation norm, 233

unitary matrix, 148
upper bound, 260

Vec, 109
vector space, 15
vector subspace, 17

Zorn’s Lemma, 260