
Mathematics 595 (CAP/TRA) Introduction to Linear Analysis

Fall 2005

1 Linear Algebra

1.1 Vector spaces

Definition 1.1.1. Let F be a field. A vector space over F is a set V of elements, called vectors, equipped with two operations + : V × V → V (addition) and · : F × V → V (scalar multiplication) such that (V, +) is an abelian group, and the following compatibility relations hold:

1. (distributivity) a, b ∈ F, v, w ∈ V ⇒ a(v + w) = av + aw, (a + b)v = av + bv,

2. (associativity of multiplication) a(bv) = (ab)v,

3. (identity) 1v = v.

Here 1 denotes the unit element of F. Notice that we typically omit the notation · for scalar multiplication, writing av = a · v.

Examples 1.1.2. 1. n-dimensional Euclidean space Rn is a vector space over R.

2. The space C[0, 1] of continuous functions on the interval [0, 1] is a vector space over R.

3. Let k = 0, 1, 2, . . .. The space Ck[0, 1] of functions f on the interval [0, 1] whose kth derivative f(k) exists and is continuous is a vector space over R. Similarly, the space C∞[0, 1] of functions on [0, 1] for which f(k) exists for all k is a vector space over R.

4. R is a vector space over Q (see Exercise 1.1.17).

5. This example requires some basic fluency in abstract algebra. Let p be a prime and let K be a finite field of characteristic p. Then K is a vector space over Zp.

6. Let V be a vector space over F, and let W ⊂ V be closed under addition and scalar multiplication: v, w ∈ W, a ∈ F ⇒ v + w ∈ W, av ∈ W. Then W is a vector space over F (called a (vector) subspace of V).

7. Let W be a vector subspace of V, both over a field F. Define a relation on V as follows: v ∼ v′ iff v − v′ ∈ W. Then ∼ is an equivalence relation on V, and the collection of equivalence classes V/∼ is a vector space over F. This space is called the quotient space of V by W, and is written V/W. We typically write elements of V/W in the form v + W, v ∈ V, with the understanding that v + W = v′ + W iff v ∼ v′, i.e., v − v′ ∈ W.

8. Let V and W be vector spaces over F. The product space V × W equipped with the operations (v, w) + (v′, w′) = (v + v′, w + w′) and a(v, w) = (av, aw) is a vector space over F. It is called the direct sum of V and W, and is written V ⊕ W.

Exercise 1.1.3. Verify the claims in the various parts of Example 1.1.2.

Throughout this course, we will work almost entirely with either the field R of real numbers or the field C of complex numbers. From now on we omit the phrase “over F”.

Let S be any set. A set R ⊂ S is called cofinite if S \ R is a finite set. A function λ : S → F is called essentially zero if {s : λ(s) = 0} is a cofinite subset of S.

Definition 1.1.4. Let V be a vector space. A set S ⊂ V is said to be linearly independent if whenever λ : S → F is essentially zero and

∑_{s∈S} λ(s)s = 0,

then λ is identically zero. A set S ⊂ V is linearly dependent if it is not linearly independent.

Notice that the sum ∑_{s∈S} λ(s)s only involves a finite number of nonzero terms, hence is well-defined.

Definition 1.1.5. Let V be a vector space, and let S ⊂ V. A linear combination of elements of S is an element of V which may be written in the form

∑_{s∈S} λ(s)s

for some essentially zero function λ : S → F. The span of S is the set span(S) of linear combinations of elements of S.

Definition 1.1.6. A set B ⊂ V is a basis if B is linearly independent and span(B) = V.

Theorem 1.1.7. Every vector space admits a basis.

Proof. We will use Zorn’s Lemma: every partially ordered set P, such that all totally ordered subsets of P admit upper bounds in P, has maximal elements.

Let V be a vector space, and let L be the collection of all linearly independent subsets of V, partially ordered by inclusion. Let (Si)i∈I be a chain in L, i.e. a totally ordered subset. We claim that ⋃_i Si is in L, i.e., ⋃_i Si is linearly independent. If λ : ⋃_i Si → F is essentially zero and ∑_{s∈⋃_i Si} λ(s)s = 0, then λ(s) ≠ 0 only for finitely many elements si1, . . . , sim in ⋃_i Si. Since the chain is totally ordered, there exists an index k so that sij ∈ Sk for all j = 1, . . . , m. Since Sk is linearly independent, we conclude that λ is identically zero as desired. Thus every chain in L has an upper bound in L.

Zorn’s lemma therefore guarantees the existence of a maximal element B in L. We claim that V = span(B). To do this, we will show that v ∉ span(B) ⇒ B ∪ {v} is linearly independent. Suppose that v ∉ span(B) and there exists an essentially zero function λ : B ∪ {v} → F such that

λ(v)v + ∑_{w∈B} λ(w)w = 0.

If λ(v) ≠ 0, then v = ∑_{w∈B} (−λ(w)/λ(v)) w, which contradicts the fact that v ∉ span(B). Thus λ(v) = 0. But then

∑_{w∈B} λ(w)w = 0;

since B is linearly independent we find λ ≡ 0, and so B ∪ {v} is linearly independent. Since B is maximal in L, no such v can exist, and therefore V = span(B).

Corollary 1.1.8. Any linearly independent set S in a vector space V can be extended to a basis.

Proof. Repeat the proof of the theorem, replacing L with the collection of all linearly independent subsets of V which contain S.

Theorem 1.1.9. Any two bases for a vector space V have the same cardinality.

Proof. We consider the finite cardinality case. Suppose that V has a basis B = {v1, . . . , vm} consisting of m elements. It suffices to prove that every collection of m + 1 elements in V is linearly dependent. (Why?) Let C = {w1, . . . , wm+1} be such a collection. Consider the expression

σ = c1w1 + · · · + cm+1wm+1.

Since each wj may be written as a linear combination of the elements of B, σ may be written as a linear combination of v1, . . . , vm. Setting σ = 0 generates a system of m linear homogeneous equations in the m + 1 variables c1, . . . , cm+1, which necessarily has a nontrivial solution. But such a nontrivial solution is clearly equivalent to a linear dependence among the vectors w1, . . . , wm+1.

The infinite cardinality case can be treated by transfinite induction. We omit the proof.

Definition 1.1.10. Let V be a vector space. The number of elements in a basis of V is called the dimension of V and written dim V. We say that V is finite dimensional if dim V < ∞.

Example 1.1.11. The standard basis for Rn is the collection of n vectors e1, . . . , en, where the coordinates of ej are equal to 0 in all entries except for the jth entry, where the coordinate is equal to 1.

Exercise 1.1.12. Let V be a vector space of dimension n, and let S be a set of n linearly independent elements of V. Then S is a basis for V.

Exercise 1.1.13. Let V = W1 ⊕ W2 be finite dimensional. Then dim V = dim W1 + dim W2.

Exercise 1.1.14. Let W be a subspace of a finite dimensional vector space V. Then dim V = dim W + dim V/W. (This result will also follow as an easy consequence of Theorem 1.2.21.)

Let B be a basis for a vector space V and let S ⊂ V. For each s ∈ S, there exist scalars λsb such that

s = ∑_{b∈B} λsb b.

Definition 1.1.15. The matrix MSB := (λsb)_{s∈S, b∈B} is called the transition matrix of S with respect to B.

Lemma 1.1.16. Let B and S be bases for V with transition matrices MSB = (λsb) and MBS = (µbs). Then

I = MBS MSB = MSB MBS.

Note that MSB and MBS may be infinite matrices, in which case the meaning of the matrix multiplications may not be immediately clear. However, the fact that the coefficients λsb define essentially zero functions of s and b, provided the remaining variable is fixed, ensures that the notion of matrix multiplication is well-defined for such matrices.

Proof. For b ∈ B, we have

b = ∑_{s∈S} µbs s = ∑_{s∈S} ∑_{b′∈B} µbs λsb′ b′ = ∑_{b′∈B} ( ∑_{s∈S} µbs λsb′ ) b′.

By the linear independence of B, we find

∑_{s∈S} µbs λsb′ = δbb′,

where δbb′ denotes the Kronecker delta function: δbb′ = 0 if b ≠ b′ and δbb = 1. In other words,

MBS MSB = I.

The other direction is similar.
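As a quick numerical illustration of Lemma 1.1.16 in R3, the following sketch (assuming NumPy is available; the second basis is an arbitrary choice) computes the two transition matrices between a pair of bases and checks that they are mutually inverse. Depending on whether one arranges coordinates in rows or columns, these arrays may be the transposes of the matrices (λsb) of Definition 1.1.15; the identity I = MBS MSB holds either way.

import numpy as np

# Two bases of R^3, written as the columns of B and S (both invertible).
B = np.eye(3)                          # the standard basis e1, e2, e3
S = np.array([[1., 1., 1.],
              [0., 1., 1.],
              [0., 0., 1.]])           # another basis, chosen arbitrarily

# With basis vectors as columns, the matrix expressing S in terms of B
# solves B @ M_SB = S, and conversely for M_BS.
M_SB = np.linalg.solve(B, S)
M_BS = np.linalg.solve(S, B)

print(np.allclose(M_BS @ M_SB, np.eye(3)))   # True
print(np.allclose(M_SB @ M_BS, np.eye(3)))   # True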

Exercise 1.1.17. What is the cardinality of a basis for R over Q?

1.2 Linear Maps

Definition 1.2.18. Let V and W be vector spaces. A map T : V → W is called linear if

T(av + bw) = aT(v) + bT(w)

for all v, w ∈ V and a, b ∈ F. We use the terms linear map and linear transformation interchangeably. An isomorphism is a bijective linear map. Two vector spaces V and W are called isomorphic (written V ≅ W) if there exists an isomorphism T : V → W.

A few easy observations:

1. T (V ) is a subspace of W .

2. If W1 is a subspace of W, then T−1(W1) := {v ∈ V : T(v) ∈ W1} is a subspace of V.

3. If S : U → V and T : V → W are linear, then T ◦ S : U → W is linear.

4. If T is a bijective linear map, then T−1 is linear.

5. If T is surjective, then dim V ≥ dim W. If T is injective, then dim V ≤ dim W. In particular, isomorphic vector spaces have the same dimension.

Examples 1.2.19. 1. V = Rn, W = Rm, A an m × n real matrix, T (v) = Av.

2. V = Ck[0, 1], W = Ck−1[0, 1], T (f) = f ′.

3. W a subspace of a vector space V, T : V → V/W given by T(v) = v + W. This example will be important for us in later work. The map T : V → V/W is called a quotient map.

Definition 1.2.20. The kernel of a linear map T : V → W is

ker(T ) = {v ∈ V : T (v) = 0}.

The range of T is range(T ) = T (V ).

Theorem 1.2.21 (Rank–Nullity Theorem). Let T : V → W be a linear map. Then

V ≅ ker(T) ⊕ range(T)

and

dim ker(T) + dim range(T) = dim V.
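Before turning to the proof (given below), here is a quick numerical illustration of the dimension identity, a sketch assuming NumPy and SciPy are available; the matrix is an arbitrary rank-deficient example viewed as a map from R5 to R3.

import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 0., 1., 3.],
              [0., 1., 1., 0., 1.],
              [1., 3., 1., 1., 4.]])    # row 3 = row 1 + row 2, so the rank is 2

rank = np.linalg.matrix_rank(A)          # dim range(T)
nullity = null_space(A).shape[1]         # dim ker(T), computed independently
print(rank, nullity, rank + nullity)     # 2 3 5  (= dimension of the domain R^5)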

Corollary 1.2.22. Let W be a subspace of a vector space V. Then V ≅ W ⊕ V/W and dim V/W = dim V − dim W.

Proof. Apply Theorem 1.2.21 to the linear map Q : V → V/W given by Q(v) = v + W.

Corollary 1.2.23. Let V be a finite dimensional vector space and let T : V → V be linear. Then T is injective if and only if T is surjective.

Proof. T is injective if and only if dim ker(T) = 0, and T is surjective if and only if dim range(T) = dim V. By Theorem 1.2.21, dim ker(T) = 0 holds exactly when dim range(T) = dim V.

Exercise 1.2.24. Show that Corollary 1.2.23 is false for infinite dimensional vector spaces; a linear self-map of such a space can be injective without being surjective, and vice versa.

Proof of Theorem 1.2.21. Let S1 = (vi)i∈I be a basis for ker(T), and let T(S2), S2 = (wj)j∈J, be a basis for range(T). We claim that S1 ∪ S2 is a basis for V. This suffices to complete the proof, since we may then exhibit an isomorphism between V and ker(T) ⊕ range(T) by sending vi to (vi, 0) and wj to (0, T(wj)).

First, we show that S1 ∪ S2 is linearly independent. Let λ : S1 ∪ S2 → F be essentially zero, with

0 = ∑_{v∈S1∪S2} λ(v)v = ∑_i λ(vi)vi + ∑_j λ(wj)wj.

Then

0 = T(0) = T( ∑_i λ(vi)vi + ∑_j λ(wj)wj ) = ∑_j λ(wj)T(wj)

since vi ∈ ker(T). Since T(S2) is a basis for range(T), we see that λ(wj) = 0 for all j ∈ J. Thus

0 = ∑_i λ(vi)vi;

since S1 is a basis for ker(T) we conclude that λ(vi) = 0 for all i ∈ I. Thus λ ≡ 0 and we have shown that S1 ∪ S2 is linearly independent.

Next, we show that S1 ∪ S2 spans all of V. Given any v ∈ V, we may write T(v) as a linear combination of the elements of T(S2):

T(v) = ∑_j µ(wj)T(wj)

for a suitable essentially zero function µ : S2 → F. Then v − ∑_j µ(wj)wj ∈ ker(T), whence

v − ∑_j µ(wj)wj = ∑_i λ(vi)vi

for some essentially zero function λ : S1 → F. This exhibits v as a linear combination of S1 ∪ S2, as desired.

Any linear map may be factored through its kernel by an injective linear map.

Proposition 1.2.25. Let T : V → W be a linear map, and let Q : V → V/ker(T) be the quotient map Q(v) = v + ker(T). Then there exists a unique injective linear map T′ : V/ker(T) → W such that T′ ◦ Q = T.

Proof. Define T′(v + ker(T)) = T(v). This is well-defined: if v′ + ker(T) = v + ker(T) then v′ − v ∈ ker(T), so T(v′ − v) = 0, i.e., T(v′) = T(v). It is obviously linear. To see that it is injective, suppose that T′(v + ker(T)) = T′(v′ + ker(T)). Then T(v) = T(v′), so v − v′ ∈ ker(T) and v + ker(T) = v′ + ker(T).

The collection of all linear maps from V to W is denoted by L(V, W). Of particular interest is the space L(V, V) of linear maps of V into itself. Note that any two such maps can be added and multiplied (composition), and multiplied by a scalar. Thus L(V, V) is an example of an algebra. It is typically not commutative, since S ◦ T need not equal T ◦ S. Furthermore, L(V, V) typically contains zero divisors, elements S, T ≠ 0 such that S ◦ T = 0. For example, choose S and T so that range(T) ⊂ ker(S).

The set of invertible elements in L(V, V) forms a group. The isomorphism type of this group depends only on the dimension of V, and on the scalar field F. It is called the general linear group of dimension n = dim V over F, and denoted by GL(n, F).

Given S ∈ GL(n, F), the transformation T ↦ STS−1 is an invertible linear map of L(Fn, Fn), i.e., an element of GL(N, F), where N = dim L(Fn, Fn). It is called a similarity transformation, and the maps T and STS−1 are said to be similar to each other.

Exercise 1.2.26. Verify that similarity transformations of V = Fn are elements of GL(N, F), N = dim L(V, V). What is the value of dim L(V, V)?

Linear maps from V to the base field F are called linear functionals on V. The space L(V, F) of linear functionals on V is also known as the dual space of V, and is sometimes denoted V*.

Exercise 1.2.27. Let V be a finite dimensional vector space of dimension n, with basis v1, . . . , vn.

1. Show that every element v ∈ V can be written uniquely in the form v = ∑_{i=1}^n λi vi.

2. Show that the functions ki : V → F given by ki(v) = λi, where λi is as in the previous part, are linear functionals on V.

3. Show that every linear functional T on V can be written in the form T = ∑_{i=1}^n ai ki, for some ai ∈ F.

4. Show that the mapping T ↦ ∑_{i=1}^n ai vi defines a (non-canonical) isomorphism between V* and V. Conclude that dim V* = dim V.

Exercise 1.2.28. Let P be the vector space of polynomials over C with degree strictly less than n, let a1, . . . , an ∈ C be distinct, and let c1, . . . , cn ∈ C be arbitrary. Show that there is an element p ∈ P so that p(ai) = ci for all i = 1, . . . , n.
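One way to approach Exercise 1.2.28 is through the Vandermonde linear system, which is invertible exactly when the nodes are distinct. The sketch below (assuming NumPy; the nodes and target values are arbitrary example data) illustrates the statement numerically rather than proving it.

import numpy as np

# Distinct nodes a_i and arbitrary target values c_i (example data).
a = np.array([0.0, 1.0, 2.0, 5.0])
c = np.array([1.0, -2.0, 0.5, 3.0])

# Coefficients of p(x) = x_0 + x_1 x + ... + x_{n-1} x^{n-1} solve V x = c,
# where V is the Vandermonde matrix built from the nodes.
V = np.vander(a, increasing=True)
coeffs = np.linalg.solve(V, c)

print(np.allclose(np.polyval(coeffs[::-1], a), c))   # True: p(a_i) = c_i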

Matrix representations of linear maps: If T : V → W is a linear map and C, resp. B, are bases for V, resp. W, then the image of each element c ∈ C can be written uniquely in the form

T(c) = ∑_{b∈B} λcb b.

In the finite-dimensional case, the matrix M = [T]C,B = (λcb)_{c∈C, b∈B} is called the matrix representation for T with respect to the bases C and B.

Lemma 1.2.29. Let A and B be bases for V, let C and D be bases for W, and let T : V → W be a linear map. Then

[T]D,A = MDC [T]C,B MBA.

Corollary 1.2.30. If A and B are bases for V and T ∈ L(V, V ), then

[T]A,A = MAB [T]B,B MBA = MAB [T]B,B (MAB)−1.

In particular, the matrices [T ]A,A and [T ]B,B are similar.

1.3 Geometry of linear transformations of finite-dimensional vector spaces

Instead of beginning with the algebraic definitions for determinant and trace, we will take a geometric perspective. This is arguably more enlightening, as it indicates the underlying physical meaning of these quantities.

We assume that the notion of volume of open sets in Rn is understood. In fact, we will only need to work with the volumes of simplexes, which admit a simple recursive definition (see Definition 1.3.34). The definition relies on the geometric structure of Rn, specifically the notions of perpendicularity (orthogonality) and distance, which we take as known.

Definition 1.3.31. An oriented simplex in Rn is the convex hull of a set of n + 1 vertices. More precisely, the simplex spanned by points a0, . . . , an ∈ Rn is

S = [a0, . . . , an] = {x ∈ Rn : x = ∑_{i=0}^n λi ai for some λi ≥ 0 with ∑_i λi = 1}.

The order of the vertices matters in this notation, i.e., [a0, a1, . . . , an] and [a1, a0, . . . , an] are different as simplices (although they define the same set in Rn).

An oriented simplex S is called degenerate if it lies in an (n − 1)-dimensional subspace of Rn.

Exercise 1.3.32. Show that [a0, . . . , an] is degenerate if and only if the vectors a1 − a0, . . . , an − a0 are linearly dependent.

Our considerations will be invariant under rigid translations of Rn, so we may (and will) from now on assume that a0 = 0, since we may identify [a0, . . . , an] with [0, a1 − a0, . . . , an − a0].

Next, we would like to assign an orientation O(S) to each nondegenerate oriented simplex S. We will classify all such simplexes as either positively oriented (O(S) = +1) or negatively oriented (O(S) = −1), so that the following requirements are satisfied:

1. (anti-symmetry) The orientation of S is reversed if any two vertices ai and aj of S are interchanged.

2. (continuity) The orientation O(S(t)) is constant when S(t) = [0, a1(t), . . . , an(t)] is a continuous family of nondegenerate simplexes.

3. (normalization) The orientation of the standard n-dimensional simplex S_0^n = [0, e1, . . . , en] is +1.

Here (ej) denotes the standard basis for Rn, see Example 1.1.11.

Proposition 1.3.33. There is a unique notion of orientation for simplexes in Rn satisfying these three requirements.

Proof. Consider those simplexes S = [a0, . . . , an] which may be deformed by a continuous family of nondegenerate simplexes to the standard simplex. That is, there exist n continuous functions aj : [0, 1] → Rn, j = 1, . . . , n, so that (i) aj(0) = ej for all j, (ii) aj(1) = aj for all j, and (iii) S(t) = [0, a1(t), . . . , an(t)] is nondegenerate for all 0 ≤ t ≤ 1. We call each of these simplexes positively oriented and assign it the orientation O(S) = +1. Otherwise we call S negatively oriented and assign it the orientation O(S) = −1. It is straightforward to check that all three requirements are satisfied in this case.

Definition 1.3.34. The n-volume of a simplex S = [0, a1, . . . , an] ⊂ Rn is

Voln(S) = (1/n) Voln−1(B) h,

where B = [0, a1, . . . , âj, . . . , an] is a base of S (the notation âj means that the point aj is omitted from the list a1, . . . , an) and h is the altitude of S with respect to B, i.e., h is the distance from aj to the (n − 1)-dimensional hyperplane Π containing B. (To begin the induction, we let Vol1 be the standard length measure for intervals on the real line R1.) To simplify the notation, we drop the subscripts, writing Vol = Voln for every n. Notice that Vol(S) = 0 if and only if S is degenerate.

The signed volume of S is

signVol(S) = O(S) Vol(S).

Proposition 1.3.35. The signed volume satisfies the following properties:

1. signVol(S) = 0 if and only if S is degenerate.

2. signVol([0, a1, . . . , an]) is a linear function of each of the vertices a1, . . . , an (the other vertices being held fixed).

The first claim is obvious by the definition of O(S) and Vol(S). To prove the second claim, it is enough to consider signVol([0, a1, . . . , an]) as a function of an (by anti-symmetry). From the definitions we have

signVol([0, a1, . . . , an]) = (1/n) Voln−1(B) · O(S)h.

The expression O(S)h represents the signed distance from an to Π. This is nothing more than a particular coordinate function, if coordinates in Rn are chosen so that one axis is perpendicular to Π and the remaining n − 1 axes are chosen to lie in Π. Exercise 1.2.27 (2) shows that this is a linear function.

We now come to the definition of determinant.

Definition 1.3.36. Let A be an n × n matrix whose columns are given by the vectors a1, . . . , an. The determinant of A is equal to

det(A) = n! · signVol([0, a1, . . . , an]).

An immediate corollary of Proposition 1.3.35 is

Proposition 1.3.37. The determinant satisfies the following properties:

1. (antisymmetry) det(A) is an alternating function of the columns of A; its sign changes when two columns of A are interchanged. In particular, det(A) = 0 if two columns of A are identical.

2. (multilinearity) det(A) is a linear function of each column of A (the other columns being held fixed).

3. (normalization condition) det(I) = +1, where I is the n × n identity matrix, with columns e1, . . . , en (in that order).

4. det(A) = 0 if and only if the columns of A are linearly dependent.

5. det(A) is unchanged if a multiple of one column of A is added to another column.

In fact, the determinant is the unique function on n × n matrices satisfying parts 1, 2 and 3 of Proposition 1.3.37.

Part 1 is immediate from the properties of O(S) and Vol(S). Parts 2 and 4 follow from Proposition 1.3.35 and Exercise 1.3.32. To verify part 3, we first compute

Vol(S_0^n) = (1/n) Vol(S_0^{n−1}),

so

Vol(S_0^n) = 1/n!,

and

det(I) = n! · signVol(S_0^n) = n! O(S_0^n) Vol(S_0^n) = (n!)(+1)(1/n!) = 1.

Finally, part 5 follows from antisymmetry and multilinearity.

The determinant is a multiplicative function:

Theorem 1.3.38. det(AB) = det(A) det(B) for n × n matrices A, B.

Exercise 1.3.39. Use the uniqueness assertion for the determinant function to prove Theorem 1.3.38. Hint: In the case det A ≠ 0, consider the function

C(B) = det(AB) / det A.

Show that C satisfies Proposition 1.3.37 parts 1, 2 and 3. Note that, upon writing out B = (b1, . . . , bn) in columns, we have

C(b1, . . . , bn) = det(Ab1, . . . , Abn) / det A.

For the case det A = 0, use a continuity argument.

Definition 1.3.40. The ijth minor of an n × n matrix A = (aij) is the (n − 1) × (n − 1) matrix Aij obtained by deleting the ith row and jth column of A. The adjoint matrix of A is the matrix adj A = (bij), where

bij = (−1)^{i+j} det Aji.

Lemma 1.3.41. (1) For each j, det A = ∑_{i=1}^n (−1)^{i+j} aij det Aij.
(2) A adj A = det A · I.

Proof. To prove (1) we begin with the special case

A =
[ 1    0   ]
[ 0    A11 ]

where the zeros denote rows (and columns) of zero entries. The map A11 ↦ det A clearly satisfies parts 1, 2 and 3 of Proposition 1.3.37, whence det A = det A11 in this case. Next, the case

A =
[ 1    *   ]
[ 0    A11 ]

(where * denotes a row of unspecified entries) follows from part 5. Finally, the general case follows by antisymmetry and multilinearity of the determinant. We leave the full details as an exercise to the reader.

To prove (2), we compute the entries of C = A adj A = (cij):

cii = ∑_{k=1}^n aik (adj A)ki = ∑_{k=1}^n (−1)^{i+k} aik det Aik = det A

by (1), while for i ≠ j,

cij = ∑_{k=1}^n aik (adj A)kj = ∑_{k=1}^n (−1)^{j+k} aik det Ajk = det A′ij,

where A′ij is the n × n matrix whose rows are the same as those of A, except that the jth row is replaced by a copy of the ith row. Since A′ij then has two identical rows, antisymmetry gives det A′ij = 0, so cij = det A · δij and the proof is complete.

Part (1) of Lemma 1.3.41 is the well-known Laplace expansion for the determinant.
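Both parts of Lemma 1.3.41 are easy to check on a small example. The sketch below (assuming SymPy; the matrix is an arbitrary integer example) builds adj A directly from Definition 1.3.40 and verifies the Laplace expansion and the identity A adj A = det A · I.

import sympy as sp

A = sp.Matrix([[2, 1, 0],
               [1, 3, 4],
               [0, 5, 6]])            # an arbitrary integer matrix
n = A.shape[0]

# adj A = (b_ij) with b_ij = (-1)^(i+j) det A_ji  (Definition 1.3.40).
adjA = sp.Matrix(n, n, lambda i, j: (-1)**(i + j) * A.minor_submatrix(j, i).det())

print(adjA == A.adjugate())                       # True: matches SymPy's adjugate
print(A * adjA == A.det() * sp.eye(n))            # True: Lemma 1.3.41(2)
# Laplace expansion along the first column, Lemma 1.3.41(1):
print(sum((-1)**i * A[i, 0] * A.minor_submatrix(i, 0).det()
          for i in range(n)) == A.det())          # True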

Definition 1.3.42. A matrix A is invertible if det A ≠ 0. The inverse of an invertible matrix A is

A−1 = (1/det A) adj A.

Lemma 1.3.41(2) yields the standard properties of the inverse:

A · A−1 = A−1 · A = I.

Proposition 1.3.43. The determinant of a linear transformation T : Rn → Rn is independent of the choice of basis of Rn.

Proof. Changing the basis of Rn corresponds to performing a similarity transformation T ↦ STS−1. Denoting by A ↦ BAB−1 the matrix representation of this transformation with respect to the standard basis of Rn, we have

det(BAB−1) = det(B) det(A) det(B−1) = det(A).

Because of this proposition, we may write det(T) for T ∈ L(Rn, Rn) to denote the determinant of any matrix representation [T]B,B for T. Moreover, the notion of invertibility is well-defined for elements of L(Rn, Rn).

Corollary 1.3.44. Let T be a linear map of Rn to Rn. Then

|det(T)| = Vol(T(S)) / Vol(S)

for any nondegenerate simplex S in Rn.

This corollary provides the “true” geometric interpretation of the determinant: it measures the distortion of volume induced by the linear map.
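The same volume-distortion factor appears for any reasonable set, not just simplexes. The sketch below (assuming NumPy; the matrix, seed and sample size are arbitrary) estimates the volume of the image of the unit cube under T by Monte Carlo and compares it with |det T|.

import numpy as np

rng = np.random.default_rng(0)
T = np.array([[2., 1., 0.],
              [0., 1., 1.],
              [0., 0., 3.]])            # arbitrary invertible matrix, det = 6

# Bounding box of T([0,1]^3), computed from the images of the cube's corners.
corners = T @ np.array(np.meshgrid([0, 1], [0, 1], [0, 1])).reshape(3, -1)
lo, hi = corners.min(axis=1), corners.max(axis=1)
box_vol = np.prod(hi - lo)

# Sample the box uniformly; a point y lies in the image iff T^{-1} y is in [0,1]^3.
y = rng.uniform(lo, hi, size=(200_000, 3))
pre = np.linalg.solve(T, y.T).T
inside = np.all((pre >= 0) & (pre <= 1), axis=1)

print(box_vol * inside.mean())        # Monte Carlo estimate, roughly 6
print(abs(np.linalg.det(T)))          # 6.0 exactly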

Definition 1.3.45. The trace of an n × n matrix A with columns a1, . . . , an is

tr(A) = ∑_{i=1}^n ai · ei.

Here v · w denotes the usual Euclidean dot product of vectors v, w ∈ Rn. If ai = (a1,i, . . . , an,i) then tr(A) = ∑_{i=1}^n aii.

Exercise 1.3.46. 1. Show that trace is a linear function on matrices: tr(A + B) = tr(A) + tr(B) and tr(kA) = k tr(A).

2. Show that tr(AB) = tr(BA) for any n × n matrices A and B.

Proposition 1.3.47. The trace of a linear transformation T : Rn → Rn is independent of the choice of basis of Rn.

Proof. Changing the basis of Rn corresponds to performing a similarity transformation T ↦ STS−1. If A ↦ BAB−1 denotes the matrix representation of this transformation with respect to the standard basis of Rn, then

tr(BAB−1) = tr(AB−1B) = tr(A)

by Exercise 1.3.46 (2).

Because of this proposition, we may write tr(T) for T ∈ L(Rn, Rn) to denote the trace of any matrix representation for T.
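A small numerical check of Exercise 1.3.46(2) together with Propositions 1.3.43 and 1.3.47, a sketch assuming NumPy; the matrices are random, so B is invertible with probability one.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))        # also plays the role of a change of basis

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))            # True
similar = B @ A @ np.linalg.inv(B)                              # B A B^{-1}
print(np.isclose(np.trace(similar), np.trace(A)))               # True
print(np.isclose(np.linalg.det(similar), np.linalg.det(A)))     # True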

1.4 Spectral theory of finite-dimensional vector spaces and Jordan canonical form

Definition 1.4.48. Let V be a vector space and let T ∈ L(V, V). An element λ ∈ F is called an eigenvalue of T if there exists a nonzero vector v ∈ V (called an eigenvector for λ) so that Tv = λv.

Equivalently, if we denote by I the identity element in L(V, V), then λ is an eigenvalue of T if and only if

ker(λI − T) ≠ 0.

The set of eigenvalues of T is called the spectrum of T , and is written Spec T .

Exercise 1.4.49. Let λ be an eigenvalue for a linear transformation T on a vector space V. Show that the set of eigenvectors associated with λ, together with the zero vector, is a vector subspace of V.

For the rest of this section, we assume that V is a finite-dimensional vector space, defined over an algebraically closed field F (typically C).

Definition 1.4.50. The characteristic polynomial for a linear transformation T on V is

pT(λ) = det(λI − T).

Here λ denotes an indeterminate variable, and the computation of the determinant is understood in terms of the formulas from the previous section, carried out in the field F(λ) of rational functions in λ with coefficients from F.

The Cayley–Hamilton theorem states that T satisfies its own characteristic polynomial.

Theorem 1.4.51 (Cayley–Hamilton). For any linear transformation T on a finite-dimensional vector space V, the identity pT(T) = 0 holds.

Proof. By the invariance of determinant with respect to basis, we may choose a basis B for V and study the matrix representation A = [T]B,B. By Lemma 1.3.41(2),

(λI − A) adj(λI − A) = det(λI − A) I = pA(λ) I. (1.4.52)

Set

adj(λI − A) := ∑_{k=0}^n Bk λ^k,

where Bn = 0, and set

pA(λ) := ∑_{k=0}^n ck λ^k,

where cn = 1. Then (1.4.52) gives

∑_{k=0}^n ck λ^k I = (λI − A) adj(λI − A) = (λI − A) ∑_{k=0}^n Bk λ^k = −AB0 + ∑_{k=1}^n (Bk−1 − ABk) λ^k.

Thus Bn−1 = I, −AB0 = c0 I, and

Bk−1 − ABk = ck I, for k = 1, . . . , n − 1.

Multiplying by A^k gives

A^k Bk−1 − A^{k+1} Bk = ck A^k, for k = 1, . . . , n − 1.

Finally,

pA(A) = ∑_{k=0}^n ck A^k = A^n + ∑_{k=1}^{n−1} (A^k Bk−1 − A^{k+1} Bk) − AB0 = 0,

since the middle sum telescopes to AB0 − A^n Bn−1 = AB0 − A^n.
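The theorem is easy to test numerically. The sketch below (assuming NumPy; the matrix is an arbitrary example) uses np.poly, which returns the coefficients of the characteristic polynomial det(λI − A) in descending powers, and evaluates that polynomial at the matrix itself.

import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])                    # an arbitrary 3x3 example

coeffs = np.poly(A)                             # [1, -tr A, ..., (-1)^n det A]
pA = sum(c * np.linalg.matrix_power(A, k)
         for k, c in enumerate(coeffs[::-1]))   # evaluate p_A at the matrix A
print(np.allclose(pA, np.zeros_like(A)))        # True: p_A(A) = 0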

By the fundamental theorem of algebra, each n × n matrix has n complex eigenvalues (counted with multiplicity).

Exercise 1.4.53. Show that pT(λ) = λ^n − (tr T)λ^{n−1} + · · · + (−1)^n det T for any T. Deduce that the sum of the eigenvalues of T is tr T and that the product of the eigenvalues is det T. (Eigenvalues must be counted according to multiplicity.)

The set of polynomials in F[λ] which vanish on T forms an ideal I (I is closed under addition and closed under multiplication by elements of F[λ]). Since F[λ] is a principal ideal domain, there exists an element mT ∈ F[λ] so that I = (mT), i.e., every element of I is a multiple of mT. We can and do assume that mT is monic, and we call it the minimal polynomial for T. By the Cayley–Hamilton theorem pT ∈ I, so pT is divisible by mT.

In the principal theorem of this section (Theorem 1.4.57) we give, among other things, a geometric description of the characteristic and minimal polynomials. To obtain this description, we introduce a generalization of the notion of eigenvector.

Definition 1.4.54. A nonzero vector v ∈ V is a generalized eigenvector of T with eigenvalue λ if (λI − T)^k v = 0 for some positive integer k.

For a given eigenvalue λ of T, consider the increasing sequence of subspaces

N1(λ) ⊂ N2(λ) ⊂ · · ·

where Nk(λ) = ker((λI − T)^k). Since these are all subspaces of a finite-dimensional vector space V, they eventually stabilize: there exists an index K so that NK(λ) = NK+1(λ) = · · · . We denote by d = d(λ) the smallest such index, i.e.,

Nd−1(λ) ⊊ Nd(λ) = Nd+1(λ) = · · · .

d(λ) is called the index of λ. Thus the nonzero elements of Nd(λ) are precisely the generalized eigenvectors associated with λ.

Lemma 1.4.55. For each j, d(λj) ≤ n = dim V .

Proof. Let λ = λj and d = d(λj) as above, and let v ∈ Nd(λ) \ Nd−1(λ). To prove the result, it suffices to establish that the d vectors v, (λI − T)(v), . . . , (λI − T)^{d−1}(v) are linearly independent. Suppose that a0, . . . , ad−1 ∈ F satisfy

∑_{k=0}^{d−1} ak (λI − T)^k (v) = 0. (1.4.56)

Applying (λI − T)^{d−1} to both sides gives

a0 (λI − T)^{d−1}(v) = 0,

so a0 = 0. Returning to (1.4.56) and applying (λI − T)^{d−2} to both sides gives a1 = 0. Continuing in this fashion, we find that ak = 0 for all k. The proof is complete.

We now state the main theorem of this section, a combination of the classical spectral theorem for transformations of finite-dimensional vector spaces and the Jordan canonical form for matrix representations of such transformations.

Theorem 1.4.57 (Spectral theorem and Jordan canonical form). Let T ∈ L(V, V) be a linear transformation of an n-dimensional vector space V with distinct eigenvalues λ1, . . . , λm. Then

1. V = N^1 ⊕ · · · ⊕ N^m, where N^j := N_{d(λj)}(λj). Moreover, T : N^j → N^j and (λjI − T)|_{N^j} is nilpotent; in fact, (λjI − T)^n|_{N^j} = 0.

2.

pT(λ) = ∏_j (λ − λj)^{dim N^j} (1.4.58)

and

mT(λ) = ∏_j (λ − λj)^{d(λj)}. (1.4.59)

3. For each j there exists a block diagram of generalized eigenvectors {v^j_{il}}, where i = 1, . . . , kj and l = 1, . . . , p^j_i, which forms a basis for the generalized eigenspace N^j. Here p^j_1 ≥ p^j_2 ≥ p^j_3 ≥ · · · , ∑_i p^j_i = dim N^j, and

(T − λjI)(v^j_{il}) = v^j_{i,l−1}

for all relevant j, i and l (with the convention v^j_{i,0} := 0).

4. The full collection B = ⋃_{j=1}^m {v^j_{il}} forms a basis for V. With respect to that (ordered) basis, [T]B,B is a block diagonal matrix with Jordan blocks.

A k × k matrix J is a Jordan block if it takes the form

J =
[ λ  1  0  · · ·  0 ]
[ 0  λ  1  · · ·  0 ]
[ 0  0  λ  · · ·  0 ]
[ ⋮            ⋱  ⋮ ]
[ 0  0  · · ·  0  λ ]

Remark 1.4.60. (1) The Jordan canonical form of a matrix is uniquely determined by the following data:

• the values and multiplicities of the eigenvalues λj,

• the block diagrams {v^j_{il}} associated with each λj, specifically, the number of rows kj and the values

{dim Ni(λj)/Ni−1(λj) : j = 1, . . . , m, i = 1, . . . , kj}.

In other words, any two similar matrices have all of this data in common, and any two matrices which have all of this data in common are necessarily similar.

Exercise 1.4.61. For each j, show that λj is the only eigenvalue of T|_{N^j} : N^j → N^j.

Examples 1.4.62. (1) Let

A =
[ 1  1  −1 ]
[ 0  1   0 ]
[ 0  0   1 ]

and let T : C3 → C3 be the linear transformation T(v) = Av. The only eigenvalue is λ = 1, of multiplicity r = 3. Thus the characteristic polynomial pT(λ) = (λ − 1)^3. A computation gives

A − I =
[ 0  1  −1 ]
[ 0  0   0 ]
[ 0  0   0 ]

and (A − I)^2 = 0, so d = d(1) = 2 and mT(λ) = (λ − 1)^2. The vectors e1 and e1 + e2 + e3 span the eigenspace N1(1), while N2(1) = C3. Choosing a vector in N2(1) \ N1(1), e.g., v12 := e1 + e2, we compute (T − I)(v12) =: v11 := e1. (Note: to simplify the notation, since there is only one eigenvalue to keep track of in this example, we omit it from the notation.) To complete a basis for C3, we choose any other element of N1(1), e.g., v21 = e1 + e2 + e3. Relative to the basis B = {v11, v12, v21}, the transformation T takes the Jordan canonical form

[T]B,B =
[ 1  1  0 ]
[ 0  1  0 ]
[ 0  0  1 ]

(2) Let

A =
[ 2  1  1   1   1 ]
[ 0  2  0  −1  −1 ]
[ 0  0  2   1   1 ]
[ 0  0  0   2   0 ]
[ 0  0  0   0   2 ]

and let T : C5 → C5 be the linear transformation T(v) = Av. As before, there is only one eigenvalue, λ = 2, of multiplicity r = 5. Thus the characteristic polynomial pT(λ) = (λ − 2)^5. A computation gives

A − 2I =
[ 0  1  1   1   1 ]
[ 0  0  0  −1  −1 ]
[ 0  0  0   1   1 ]
[ 0  0  0   0   0 ]
[ 0  0  0   0   0 ]

and (A − 2I)^2 = 0, so d = d(2) = 2 and mT(λ) = (λ − 2)^2. The vectors e1, e2 − e3 and e4 − e5 span the eigenspace N1(2), while N2(2) = C5. Choosing two vectors in N2(2) \ N1(2), e.g., v12 := e1 + e2 and v22 := e1 + e2 + e3 − e4, we compute (T − 2I)(v12) =: v11 := e1 and (T − 2I)(v22) =: v21 := e1 + e2 − e3. To complete a basis for C5, we choose any other element of N1(2), e.g., v31 = e1 + e2 − e3 + e4 − e5. Relative to the basis B = {v11, v12, v21, v22, v31}, the transformation T takes the Jordan canonical form

[T]B,B =
[ 2  1  0  0  0 ]
[ 0  2  0  0  0 ]
[ 0  0  2  1  0 ]
[ 0  0  0  2  0 ]
[ 0  0  0  0  2 ]
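Both examples can be verified symbolically. The sketch below (assuming SymPy, whose jordan_form method returns a pair (P, J) with A = P J P^{-1}) checks example (2) and the claim that (A − 2I)^2 = 0.

import sympy as sp

A = sp.Matrix([[2, 1, 1,  1,  1],
               [0, 2, 0, -1, -1],
               [0, 0, 2,  1,  1],
               [0, 0, 0,  2,  0],
               [0, 0, 0,  0,  2]])

P, J = A.jordan_form()                              # A = P J P^{-1}
print(J)                                            # two 2x2 blocks and one 1x1, eigenvalue 2
print(sp.simplify(A - P * J * P.inv()) == sp.zeros(5, 5))   # True

lam = sp.symbols('lambda')
print(sp.factor(A.charpoly(lam).as_expr()))         # (lambda - 2)**5
print((A - 2*sp.eye(5))**2 == sp.zeros(5, 5))       # True: the index d(2) is 2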

We now start the proof of Theorem 1.4.57.

Lemma 1.4.63. Generalized eigenvectors associated with distinct eigenvalues are linearly independent.

Proof. Let λ1, . . . , λm be distinct eigenvalues of a transformation T : V → V of an n-dimensional vector space V, and suppose that v1, . . . , vm are associated generalized eigenvectors. Suppose that

a1 v1 + · · · + am vm = 0; (1.4.64)

we must show that aj = 0 for all j.

Let j be arbitrary. Choose k ≤ d(λj) so that vj ∈ Nk(λj) \ Nk−1(λj). Applying the operator

(λjI − T)^{k−1} ∏_{j′=1,...,m: j′≠j} (λj′I − T)^n

to each side of (1.4.64) yields

aj (λjI − T)^{k−1} ∏_{j′=1,...,m: j′≠j} (λj′I − T)^n (vj) = 0. (1.4.65)

(Here we used Lemma 1.4.55.) For each j′ = 1, . . . , m, j′ ≠ j, we write λj′I − T = (λj′ − λj)I + (λjI − T). Expanding (1.4.65) using the Multinomial Theorem gives

aj { (λjI − T)^{k−1} ∏_{j′≠j} (λj′ − λj)^n + terms divisible by (λjI − T)^k } (vj) = 0.

Hence

aj · ∏_{j′≠j} (λj′ − λj)^n · (λjI − T)^{k−1}(vj) = 0,

where the scalar ∏_{j′≠j} (λj′ − λj)^n is nonzero (the λj are distinct) and the vector (λjI − T)^{k−1}(vj) is nonzero (by the choice of k), so aj = 0.

Lemma 1.4.66. The collection of all generalized eigenvectors spans V .

Proof. We prove this by induction on n = dim V. The case n = 1 is obvious. Given V and an eigenvalue λ of T ∈ L(V, V), we begin by establishing the decomposition

V = ker(λI − T)^n ⊕ range(λI − T)^n. (1.4.67)

Let V1 = ker(λI − T)^n and V2 = range(λI − T)^n, and suppose that v ∈ V1 ∩ V2. Then (λI − T)^n v = 0 and (λI − T)^n u = v for some u. But then (λI − T)^{2n} u = 0, so u is a generalized eigenvector for T and so (by Lemma 1.4.55) (λI − T)^n u = 0. Thus v = 0. The Rank–Nullity theorem gives the equality of dimensions n = dim V = dim V1 + dim V2 which, together with the fact that V1 ∩ V2 = {0}, shows (1.4.67).

Since λ is an eigenvalue of T, V1 ≠ 0, so dim V2 < n. Moreover T(V2) ⊂ V2, since T commutes with (λI − T)^n. Thus the inductive hypothesis applies to V2 and T|_{V2} ∈ L(V2, V2). We conclude that V2 is spanned by generalized eigenvectors of T|_{V2}, hence also of T. Clearly, V1 is also spanned by generalized eigenvectors of T. This finishes the proof.

Proof of Theorem 1.4.57(1). The decomposition V = N^1 ⊕ · · · ⊕ N^m follows from Lemmas 1.4.63 and 1.4.66. The fact that T : N^j → N^j is a consequence of the observation that T commutes with (λjI − T)^n for each j. Finally, it is clear that (λjI − T)^n|_{N^j} = 0. This finishes the proof.

Exercise 1.4.68. Give an alternate proof of the decomposition V = N^1 ⊕ · · · ⊕ N^m using the division algorithm for polynomials:

If P and Q are polynomials in F[λ] with no common zeros, then there exist A, B ∈ F[λ] so that AP + BQ ≡ 1.

(Hint: Induct to get a similar statement for m-tuples of polynomials P1, . . . , Pm with no common zeros. Then apply this to the polynomials P1(λ) = (λ − λ1)^{d(λ1)}, . . . , Pm(λ) = (λ − λm)^{d(λm)}.)

Proof of Theorem 1.4.57(2). We will prove the identity (1.4.59). The other identity (1.4.58) will follow as a consequence of the Jordan canonical form in part (3) of the theorem.

Form the polynomial

Q(λ) = ∏_j (λ − λj)^{d(λj)}.

In order to show that Q = mT, it suffices to prove that Q(T) = 0 and that Q divides any polynomial P with P(T) = 0.

Let v ∈ V be arbitrary. By part (1), we may write v = ∑_{j′} vj′ with vj′ ∈ N^{j′}. In the j′th term of the expansion

Q(T)(v) = ∑_{j′} ∏_j (T − λjI)^{d(λj)} (vj′)

we commute the operators (T − λjI)^{d(λj)} so that the factor with j = j′ acts on vj′ first. Since that factor annihilates vj′, we find Q(T)(v) = 0. Thus Q(T) = 0 as desired.

Next, suppose that P ∈ F[λ] satisfies P(T) = 0. Fix an index j and write

P(λ) = c ∏_{zk ≠ λj} (λ − zk)^{δk} · (λ − λj)^δ

for some values zk, c ∈ F and nonnegative exponents δk, δ. We may also assume that c ≠ 0. Let v ∈ N^j be arbitrary. Then (T − λjI)^δ(v) ∈ N^j and P(T)(v) = 0. By Exercise 1.4.61 the operator c ∏_k (T − zkI)^{δk} is one-to-one on N^j, so (T − λjI)^δ(v) = 0. Since v was arbitrary, (T − λjI)^δ annihilates all of N^j, so δ ≥ d(λj). Repeating this for each index j proves that Q divides P. This finishes the proof.

To prove part (3) of the theorem, we need the following

Lemma 1.4.69. Let T ∈ L(V, V′) and let W ⊂ V, W′ ⊂ V′ be subspaces so that W = T−1(W′) and ker(T) ⊂ W. Then there exists a one-to-one map T̄ ∈ L(V/W, V′/W′) so that Q′ ◦ T = T̄ ◦ Q, where Q : V → V/W and Q′ : V′ → V′/W′ are the canonical quotient maps.

The proof of this lemma is an easy extension of Homework #1, Problem 3(a), apart from the claim that T̄ is one-to-one. Let us sketch the proof of that claim. The map T̄ is defined by the identity T̄(v + W) := T(v) + W′. If

T̄(v1 + W) = T̄(v2 + W),

then T(v1) + W′ = T(v2) + W′, so T(v1 − v2) ∈ W′ and v1 − v2 ∈ T−1(W′) = W. Thus v1 + W = v2 + W and we have shown that T̄ is injective.

Proof of Theorem 1.4.57(3). For the purposes of this proof, we fix j and write d = d(λj), λ = λj, N = N^j, and so on.

We begin by choosing a basis {v^j_{i,d}}_i for Nd(λ)/Nd−1(λ). We now apply Lemma 1.4.69 with V = Nd(λ), W = V′ = Nd−1(λ), W′ = Nd−2(λ), and with the transformation in the lemma (called T there) given by T − λI. Check that all of the hypotheses are verified. The lemma ensures that the induced quotient map

T − λI : Nd(λ)/Nd−1(λ) → Nd−1(λ)/Nd−2(λ)

is one-to-one. Thus the images (T − λI)(v^j_{i,d} + Nd−1(λ)) are linearly independent in Nd−1(λ)/Nd−2(λ); we write these in the form v^j_{i,d−1} + Nd−2(λ) and complete them to a basis {v^j_{i,d−1}}_i of Nd−1(λ)/Nd−2(λ). We repeat the process inductively, constructing bases {v^j_{i,l}}_i for Nl(λ)/Nl−1(λ). (In the final case we assume N0(λ) = {0}.) The result of all of this is a set of vectors {v^j_{il}} which forms a basis for N = N^j. (See Homework #1, Problem 3(b) for a similar claim.)

Proof of Theorem 1.4.57(4). By part (3),

T(v^j_{il}) = λj v^j_{il} + v^j_{i,l−1} if 1 < l ≤ p^j_i, and T(v^j_{il}) = λj v^j_{il} if l = 1,

for all j = 1, . . . , m and i = 1, . . . , kj. Consequently, the matrix representation of T with respect to the basis B is a block diagonal matrix made up of Jordan blocks with diagonal entries λj, as asserted.

1.5 Exponentiation of matrices and calculus of vector- and matrix-valued functions

Throughout this section, all matrices are defined over the complex numbers.

Definition 1.5.70. The Hilbert–Schmidt norm on n × n matrices is

||A||HS := ( ∑_{i,j=1}^n |aij|^2 )^{1/2}, A = (aij).

Exercise 1.5.71. Verify that || · ||HS is a norm (in particular, ||A||HS = 0 only for A = 0, ||cA||HS = |c| ||A||HS, and ||A + B||HS ≤ ||A||HS + ||B||HS), and that it is submultiplicative: ||AB||HS ≤ ||A||HS ||B||HS for all n × n matrices A, B and all c ∈ F.

Definition 1.5.72. Let A be an n × n matrix. The matrix exponential e^A is defined to be

e^A = ∑_{j=0}^∞ (1/j!) A^j.

Lemma 1.5.73. The infinite sum in the definition of eA converges.

Proof. Let EN(A) := ∑_{j=0}^N (1/j!) A^j denote the Nth partial sum. Then, for M < N,

EN(A) − EM(A) = ∑_{j=M+1}^N (1/j!) A^j,

so

||EN(A) − EM(A)||HS ≤ ∑_{j=M+1}^N (1/j!) ||A||_{HS}^j → 0 as M, N → ∞.

Thus the partial sums form a Cauchy sequence, and the conclusion follows.

Exercise 1.5.74. Show that e^{A+B} = e^A e^B if A and B commute. Show that in general e^{A+B} and e^A e^B can disagree.
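A numerical illustration of this exercise, a sketch assuming SciPy's expm is available; the matrices are small arbitrary examples, one commuting pair and one non-commuting pair.

import numpy as np
from scipy.linalg import expm

# Commuting pair: two diagonal matrices.
A = np.diag([1.0, 2.0])
B = np.diag([3.0, -1.0])
print(np.allclose(expm(A + B), expm(A) @ expm(B)))     # True

# Non-commuting pair: e^{A+B} and e^A e^B differ.
A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0., 0.], [1., 0.]])
print(np.allclose(expm(A + B), expm(A) @ expm(B)))     # False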

Lemma 1.5.75. Let A, S be n × n matrices with S invertible. For every power series f(t) = ∑_j cj t^j converging on C we have f(S−1AS) = S−1 f(A) S.

Proof. By uniform convergence it suffices to check this for polynomials. By linearity it suffices to check this for monomials. Finally,

(S−1AS)^j = (S−1AS)(S−1AS) · · · (S−1AS) = S−1 A^j S,

since in the product of j factors each interior S S−1 cancels.

How do we calculate e^A in practice? Let B = S−1AS be the Jordan canonical form for A. Thus B is a block diagonal matrix with blocks in Jordan form. In this case e^A = S e^B S−1, so it suffices to describe e^B for matrices B in Jordan canonical form. Matrix exponentiation factors across block diagonal decomposition of matrices, so it suffices to describe e^J for a matrix J consisting of a single Jordan block.

Proposition 1.5.76. Let J be an n × n Jordan block:

J =
[ λ  1  0  · · ·  0 ]
[ 0  λ  1  · · ·  0 ]
[ 0  0  λ  · · ·  0 ]
[ ⋮            ⋱  ⋮ ]
[ 0  0  · · ·  0  λ ]

Then

e^J =
[ e^λ  e^λ  (1/2)e^λ  · · ·  (1/(n−1)!)e^λ ]
[ 0    e^λ  e^λ       · · ·  (1/(n−2)!)e^λ ]
[ 0    0    e^λ       · · ·  (1/(n−3)!)e^λ ]
[ ⋮                     ⋱    ⋮            ]
[ 0    0    · · ·      0     e^λ          ]

Proof. Let

A =
[ 0  1  0  · · ·  0 ]
[ 0  0  1  · · ·  0 ]
[ 0  0  0  · · ·  0 ]
[ ⋮            ⋱  ⋮ ]
[ 0  0  · · ·  0  0 ]

so J = λI + A. By Exercise 1.5.74, e^J = e^{λI} e^A = e^λ (e^A). Since A is nilpotent of step n,

e^A = I + A + (1/2)A^2 + · · · + (1/(n−1)!)A^{n−1} =
[ 1  1  1/2  · · ·  1/(n−1)! ]
[ 0  1  1    · · ·  1/(n−2)! ]
[ 0  0  1    · · ·  1/(n−3)! ]
[ ⋮                  ⋱  ⋮   ]
[ 0  0  · · ·   0    1      ]

This finishes the proof.
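The formula is easy to test against a general-purpose implementation; the sketch below (assuming SciPy; λ and n are arbitrary choices) compares the entries predicted by Proposition 1.5.76 with scipy.linalg.expm.

import numpy as np
from scipy.linalg import expm
from math import factorial, exp

lam, n = 0.7, 4
J = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)     # n x n Jordan block

# Entries predicted by Proposition 1.5.76: (e^J)_{ik} = e^lam / (k - i)! for k >= i.
predicted = np.array([[exp(lam) / factorial(k - i) if k >= i else 0.0
                       for k in range(n)] for i in range(n)])
print(np.allclose(expm(J), predicted))                  # True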

The theory of matrix exponentials can also be developed via systems of ODE’s.

Theorem 1.5.77. Let A be an n × n matrix and let j ∈ {1, . . . , n}. The jth column of e^A is the value x(1) at time t = 1 of the solution to the initial value problem

x′(t) = Ax(t), x(0) = ej.

Proof. Expanding both sides as series shows that x(t) = e^{tA} x0 is the solution to the initial value problem

x′(t) = Ax(t), x(0) = x0.

Choosing x0 = ej and evaluating at t = 1 finishes the proof.
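This connection can be tested by integrating the ODE numerically; a sketch assuming SciPy (solve_ivp with tight tolerances), with an arbitrary 2 × 2 matrix.

import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0., 1.],
              [-2., -3.]])                 # an arbitrary 2x2 example
j = 0                                      # compare with the first column of e^A

sol = solve_ivp(lambda t, x: A @ x, (0.0, 1.0), np.eye(2)[:, j],
                rtol=1e-10, atol=1e-12)
print(sol.y[:, -1])                        # x(1) from the ODE solver
print(expm(A)[:, j])                       # jth column of e^A: the same vector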

The function t ↦ e^{tA} is an example of a matrix-valued function. To finish this section, we describe a few results from the calculus of such functions.

We consider functions v : [a, b] → Fn and A : [a, b] → Fn×n taking values in a finite-dimensional vector space or the space of n × n matrices over F. Continuity and differentiability of such functions are defined in the usual way. (We use the usual Euclidean norm on Fn, and the Hilbert–Schmidt norm on Fn×n.) Most of the standard rules of differentiation hold true, but there are some complications in the matrix-valued case due to the possibility of non-commutativity. The chain rule for differentiating matrix-valued maps need not hold in general. Indeed, if A : [a, b] → Fn×n is a differentiable matrix-valued map, then

d/dt A^j = Ȧ A^{j−1} + A Ȧ A^{j−2} + · · · + A^{j−1} Ȧ,

where Ȧ = dA/dt. In particular, if A(t) and Ȧ(t) commute for some value of t, then d/dt A^j = j A^{j−1} Ȧ and

d/dt p(A)(t) = p′(A)(t) Ȧ(t)

for every polynomial p. Even if A(t) and Ȧ(t) do not commute, we still have

d/dt tr p(A)(t) = tr( p′(A)(t) Ȧ(t) )

since the trace function is commutative.

To conclude, we describe a remarkable connection between determinant and trace. Roughly speaking, trace is the derivative of determinant. Consider the function t ↦ det A(t) for some matrix-valued function A satisfying A(0) = I. We may write A(t) = (a1(t), . . . , an(t)) in terms of columns, with aj(0) = ej, and view t ↦ det(a1(t), . . . , an(t)) as a multilinear function of n-tuples of vectors. Then

d/dt det(a1, . . . , an) = det(ȧ1, a2, . . . , an) + · · · + det(a1, a2, . . . , ȧn);

evaluating at t = 0 gives

d/dt det A(t)|_{t=0} = det(ȧ1(0), e2, . . . , en) + · · · + det(e1, e2, . . . , ȧn(0)) = ȧ11(0) + · · · + ȧnn(0) = tr Ȧ(0).

More generally, if Y(t) is any differentiable matrix-valued function and Y(t0) is invertible, then

d/dt log det Y(t)|_{t=t0} = tr( Y(t0)−1 Ẏ(t0) ).

To prove this, set A(t) = Y(t0)−1 Y(t0 + t) and apply the preceding result.
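The identity is easy to check by finite differences; a sketch assuming NumPy, with Y(t) an arbitrary smooth matrix-valued curve that is invertible near the chosen t0.

import numpy as np

def Y(t):
    # An arbitrary smooth curve of 2x2 matrices, invertible near t0 = 0.3.
    return np.array([[1.0 + t,       np.sin(t)],
                     [t**2,     2.0 + np.cos(t)]])

t0, h = 0.3, 1e-6
lhs = (np.log(np.linalg.det(Y(t0 + h))) - np.log(np.linalg.det(Y(t0 - h)))) / (2 * h)
Ydot = (Y(t0 + h) - Y(t0 - h)) / (2 * h)        # numerical derivative of Y at t0
rhs = np.trace(np.linalg.inv(Y(t0)) @ Ydot)
print(lhs, rhs)                                  # the two values agree to high accuracy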