CHAPTER III
SPECTRA OF OPERATORS
In this chapter, we investigate the central problem in linear algebra: the eigenvalue
and eigenvector problem. The importance of this problem can be understood from a purely
mathematical point of view: it is the gateway leading to our understanding of the structure
of a linear operator. It is also needed for understanding our physical world. We can tell
whether a star millions of light years away is composed mainly of hydrogen atoms by reading
through a spectroscope the “eigenvalues” of the Schrödinger operator for hydrogen in the
light it emits. When a bell shows cracks, it starts to sound dull because each eigenvalue of
a certain operator decreases. Working in natural science or engineering research,
we should always be prepared to encounter eigenvalue problems.
§1. Eigenvalues and Eigenvectors
1.1. As we know, given a linear operator T on a finite dimensional space V,
we can convert it into a matrix by using a basis of V. However, different bases give
different matrix representations of T. Naturally we wonder: can we pick a basis so that the
corresponding matrix representing T is a diagonal matrix, which is considered to be the
simplest form? Certainly, diagonal matrices are easy for computation purposes, and an answer
to this question has practical value. Suppose that a basis B of V consisting of vectors
v1, v2, . . . , vn is judiciously chosen so that the matrix representing T relative to B is
diagonal, say
[T]_B = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.
This means Tvj = λjvj , or (T −λjI)vj = 0 for all j = 1, 2, . . . , n. This identity indicates
that λj is an eigenvalue of T and vj is a corresponding eigenvector.
Definition 1.1. By the spectrum of an operator T on a finite dimensional complex
vector space V, denoted by σ(T), we mean the set of all complex numbers λ such that
T − λI is not invertible. We know that an operator on a finite dimensional vector space is
not invertible if and only if its kernel is nonzero. Therefore we may put

σ(T) = {λ ∈ C : ker(T − λI) ≠ {0}}.

A complex number λ in the spectrum σ(T) is also called an eigenvalue of T. By our
definition, if λ is an eigenvalue of T, the subspace ker(T − λI) is nonzero. This
subspace is called the eigenspace corresponding to λ. A nonzero vector in this eigenspace
is called an eigenvector of T corresponding to the eigenvalue λ.
Notice that
v ∈ ker(T − λI) ⇔ (T − λI)v = 0 ⇔ Tv = λv.
Hence we have:
λ ∈ σ(T ) if and only if Tv = λv holds for some nonzero vector v.
♠ Aside: The importance of the word “nonzero” in the above statement cannot be overem-
phasized. If it were dropped, the statement would become absolute nonsense, because
Tv = λv is always satisfied for some vector v, namely, 0. ♠
1.2. How do we find eigenvalues? Take any basis E of V and consider the matrix
representation [T − λI] := [T] − λI of T − λI relative to this basis (here we use the same
symbol I for the identity operator and the identity matrix). Now, λ is an eigenvalue
of T precisely when T − λI is not invertible; descending to matrices, this means that
[T] − λI is not invertible, or equivalently that λI − [T] is not invertible. But we know that
a matrix is not invertible if and only if its determinant is zero. Hence λ is an eigenvalue
of T if and only if it satisfies:
det(λI − [T ]) = 0. (1.2.1)
If the dimension of the space V is n, i.e. if [T ] is an n × n matrix, then (1.2.1) is a
polynomial equation in λ of degree n. It is called the characteristic equation of the
matrix [T ], or of the operator T . The expression det(λI − [T ]), which is a polynomial in
λ of degree n, (n = dimV ), is called the characteristic polynomial of the operator T .
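Numerically, the equivalence between eigenvalues and roots of the characteristic polynomial is easy to check. The following is a small sketch (assuming Python with NumPy is available, and using a made-up triangular matrix whose eigenvalues are its diagonal entries) comparing the two computations:

```python
import numpy as np

# A made-up 2x2 matrix; being triangular, its eigenvalues are the
# diagonal entries 1 and 3.
M = np.array([[1.0, 2.0],
              [0.0, 3.0]])

# np.poly(M) returns the coefficients of det(lambda*I - M),
# highest degree first; its roots are the eigenvalues.
char_coeffs = np.poly(M)
roots = np.sort(np.roots(char_coeffs).real)
eigvals = np.sort(np.linalg.eigvals(M).real)
print(roots, eigvals)  # both give [1. 3.]
```

In practice, library routines such as `np.linalg.eigvals` do not form the characteristic polynomial at all (that is numerically unstable for large matrices), but for small examples the two routes agree.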
Example 1.2.1. Define an operator T on P2 by

T(p(x)) = x p′(x) − p(x + 1).

Take the standard basis B = {1, x, x^2} of P2. To find the matrix [T]_B, we compute:

T(1) = 0 − 1 = −1,
T(x) = x − (x + 1) = −1,
T(x^2) = x(2x) − (x + 1)^2 = x^2 − 2x − 1.
Hence

[T] ≡ A = \begin{pmatrix} -1 & -1 & -1 \\ 0 & 0 & -2 \\ 0 & 0 & 1 \end{pmatrix},  and  det([T] − λI) = \begin{vmatrix} -1-\lambda & -1 & -1 \\ 0 & -\lambda & -2 \\ 0 & 0 & 1-\lambda \end{vmatrix}.
Thus the characteristic equation of T is
(−1− λ)(−λ)(1− λ) = 0
and hence the eigenvalues of T are 1, 0,−1. In other words, σ(T ) = {1, 0,−1}. Next we
find an eigenvector of T for each eigenvalue. First consider the eigenvalue λ = 1. An
eigenvector corresponding to λ = 1 is a polynomial p ≡ p(x) such that (T − λI)(p) = 0.
This gives [T − λI][p] = 0, or ([T ]− λI)[p] = 0. Here [T ] is the matrix A given above, λ is
1, and [p] is a column, say [p] = X = [x1, x2, x3]⊤. Thus we have (A− I)X = O, i.e.
\begin{pmatrix} -2 & -1 & -1 \\ 0 & -1 & -2 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
This matrix equation gives the following homogeneous system of linear equations:
(−2)x1 + (−1)x2 + (−1)x3 = 0
(−1)x2 + (−2)x3 = 0
We only need to find one nontrivial solution to this equation. This is easy to do. Set x3 = 1.
Then, from the second equation, we have x2 = (−2)x3 = −2. From the first equation,
we have x_1 = (1/2)(−x_2 − x_3) = (1/2)(−(−2) − 1) = 1/2. Thus X = [1/2, −2, 1]^⊤ is
a solution. To get a neater expression, we multiply this solution by 2 to obtain another
solution X_1 = [1, −4, 2]^⊤. The polynomial p_1(x) with X_1 = [1, −4, 2]^⊤ as its column
representation relative to the standard basis {1, x, x^2} is p_1(x) = 1 − 4x + 2x^2. (Aside: We
can check that this polynomial is indeed an eigenvector corresponding to λ = 1:

(T − I)(p_1) = x p_1′(x) − p_1(x + 1) − p_1(x)
             = x(−4 + 4x) − (1 − 4(x + 1) + 2(x + 1)^2) − (1 − 4x + 2x^2)
             = 0.  End of aside.)
In the same way, we find an eigenvector p2(x) = −1 + x corresponding to λ = 0 and
an eigenvector p3(x) = 1 corresponding to λ = −1. It is easy to see that p1(x), p2(x)
and p3(x) are linearly independent. (This fact is not accidental: in the next section
we will prove that eigenvectors corresponding to distinct eigenvalues are always linearly
independent.) Since dimP2 = 3, three linearly independent “vectors” in P2 form a basis
of P2. Therefore
B = {p1, p2, p3} ≡ {1 − 4x + 2x^2, −1 + x, 1}
is a basis of P2. From T (p1(x)) = p1(x), T (p2(x)) = 0 and T (p3(x)) = −p3(x), we get a
diagonal [T ]B:
[T]_B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}.
Thus we can say: the basis B diagonalizes the operator T .
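The diagonalization just carried out can be double-checked numerically. A small sketch (assuming Python with NumPy), using the matrix A of Example 1.2.1 and the eigenvector columns found above:

```python
import numpy as np

# Matrix of T relative to the standard basis {1, x, x^2} (Example 1.2.1).
A = np.array([[-1.0, -1.0, -1.0],
              [ 0.0,  0.0, -2.0],
              [ 0.0,  0.0,  1.0]])

# Columns are the coordinate vectors of p1 = 1-4x+2x^2, p2 = -1+x, p3 = 1.
P = np.array([[ 1.0, -1.0, 1.0],
              [-4.0,  1.0, 0.0],
              [ 2.0,  0.0, 0.0]])

# P^{-1} A P should be the diagonal matrix diag(1, 0, -1).
D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))
```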
1.3. The key to dealing with the diagonalization problem is the following
Theorem 1.3.1. If v1,v2, . . . ,vr are eigenvectors of a linear operator T on a complex
vector space corresponding to eigenvalues λ1, λ2, . . . , λr respectively and if these eigenvalues
λ1, λ2, . . . , λr are all distinct, then the vectors v1,v2, . . . ,vr are linearly independent.
We prove this theorem by induction on the number r of given eigenvectors. In the
case r = 1, we consider only one vector v1. This vector is an eigenvector of T. Since
an eigenvector is by definition a nonzero vector, the single vector v1 forms a linearly
independent set.
Now we make the induction hypothesis: the statement is true for r = k. We consider
the situation r = k + 1. Thus we are given k + 1 eigenvectors v_1, v_2, . . . , v_k, v_{k+1}
corresponding to distinct eigenvalues λ_1, λ_2, . . . , λ_k, λ_{k+1}:

Tv_1 = λ_1 v_1, . . . , Tv_k = λ_k v_k, Tv_{k+1} = λ_{k+1} v_{k+1}. (1.3.1)

To show that these k + 1 eigenvectors are linearly independent, we set

a_1 v_1 + a_2 v_2 + · · · + a_k v_k + a_{k+1} v_{k+1} = 0. (1.3.2)

We have to show that a_1, a_2, . . . , a_{k+1} all equal zero. To this end, we apply the operator T to
this identity and use the relations (1.3.1) to obtain

a_1 λ_1 v_1 + a_2 λ_2 v_2 + · · · + a_k λ_k v_k + a_{k+1} λ_{k+1} v_{k+1} = 0.

Subtract this identity from λ_{k+1} times (1.3.2), noticing that the last term on the left-hand
side cancels:

a_1(λ_1 − λ_{k+1}) v_1 + a_2(λ_2 − λ_{k+1}) v_2 + · · · + a_k(λ_k − λ_{k+1}) v_k = 0.

Since v_1, v_2, . . . , v_k are eigenvectors of T corresponding to the distinct eigenvalues
λ_1, λ_2, . . . , λ_k respectively, these k eigenvectors, by our induction hypothesis, are linearly independent.
Thus the above identity entails

a_1(λ_1 − λ_{k+1}) = 0, a_2(λ_2 − λ_{k+1}) = 0, . . . , a_k(λ_k − λ_{k+1}) = 0.

We have assumed that λ_1, λ_2, . . . , λ_k, λ_{k+1} are distinct. In particular

λ_1 − λ_{k+1} ≠ 0, λ_2 − λ_{k+1} ≠ 0, . . . , λ_k − λ_{k+1} ≠ 0.

Therefore we must have a_1 = 0, a_2 = 0, . . . , a_k = 0. It remains to show a_{k+1} = 0.
Return to (1.3.2). We can now rewrite this identity as a_{k+1} v_{k+1} = 0. Since v_{k+1} is nonzero
(because it is an eigenvector), we also have a_{k+1} = 0. The proof is complete.
Theorem 1.3.2. If a linear operator T defined on an n-dimensional complex vector
space V has n distinct eigenvalues, then T is diagonalizable, that is, there exists a basis B
consisting of eigenvectors of T so that the representing matrix [T ]B of T relative to B is a
diagonal matrix.
Let λ1, λ2, . . . , λn be the distinct eigenvalues of T and let v1, v2, . . . , vn be their corresponding
eigenvectors: Tv1 = λ1v1, Tv2 = λ2v2, . . . , Tvn = λnvn. By Theorem 1.3.1, we know
that v1, v2, . . . , vn are linearly independent. Since n = dim V, these vectors form a basis
of V. This proves the theorem.
1.4. Let A be an n× n complex matrix. We may regard A as a linear operator on
Cn; (that is, we identify A with the linear operator MA defined by MAx = Ax for all
x in Cn). Assume that A has n distinct eigenvalues λ1, λ2, . . . , λn with corresponding
eigenvectors P1, P2, . . . , Pn (which are column vectors). According to Theorem 1.3.1,
these column vectors are linearly independent. Let us write
AP1 = P1λ1, AP2 = P2λ2, . . . , APn = Pnλn.
Here, certainly Pkλk is the scalar multiple of the column vector Pk by λk. The reason
we write in this way instead of λkPk is because Pk is regarded as an n × 1 matrix and
λk as a 1 × 1 matrix. The correct order here is crucial for performing the block matrix
multiplication below. Let P be the n × n matrix [P1 P2 · · · Pn]. The matrix P is
invertible, since it is a square matrix and its columns are linearly independent. Now
AP = A[P1 P2 · · · Pn] = [AP1 AP2 · · · APn]
   = [P1λ1 P2λ2 · · · Pnλn] = [P1 P2 · · · Pn] \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} = PD.

Thus AP = PD, where D is the diagonal matrix with the eigenvalues of A along its diagonal;
equivalently A = PDP^{-1}, which gives a diagonalization of the matrix A.
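In code, `np.linalg.eig` produces exactly the P and D of this factorization. A sketch (assuming Python with NumPy, and a made-up matrix with distinct eigenvalues so that P is invertible):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 0.0]])  # eigenvalues 2 and -1, so diagonalizable

# eig returns the eigenvalues and a matrix whose k-th column is an
# eigenvector for the k-th eigenvalue: the P in A = P D P^{-1}.
eigvals, P = np.linalg.eig(A)
D = np.diag(eigvals)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```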
Example 1.4.1. In this example we show how diagonalization helps for solving linear
differential equations. We are asked to find a general solution to the system of equations
dy_1/dt = y_1 + y_2,   dy_2/dt = 2y_1.

We can rewrite this system as

dy/dt = Ay  with  y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix},  A = \begin{pmatrix} 1 & 1 \\ 2 & 0 \end{pmatrix}.
Using the method described in Example 1.2.1, we find the eigenvalues 2, −1 of A with
corresponding eigenvectors P_1 = (1, 1), P_2 = (1, −2). Let

P = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}  and  D = \begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix}  with  P^{-1} = -\frac{1}{3}\begin{pmatrix} -2 & -1 \\ -1 & 1 \end{pmatrix}.
Then we have AP = PD, as we can check directly. Replacing A in dy/dt = Ay by
PDP^{-1}, we have dy/dt = PDP^{-1}y, or d(P^{-1}y)/dt = DP^{-1}y. Let w = P^{-1}y. Then
dw/dt = Dw and y = Pw. Thus we have
dw_1/dt = 2w_1,   dw_2/dt = -w_2,   with  y_1 = w_1 + w_2,  y_2 = w_1 - 2w_2.

The new system of differential equations is easy to solve: w_1 = C_1 e^{2t}, w_2 = C_2 e^{-t}. Our
final answer is

y_1 = C_1 e^{2t} + C_2 e^{-t},   y_2 = C_1 e^{2t} - 2C_2 e^{-t}.
(The reader should check this answer.)
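One way to carry out that check is numerical. The sketch below (assuming Python with NumPy, and with two arbitrarily chosen constants C1, C2) compares a finite-difference derivative of the claimed general solution with Ay:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 0.0]])
C1, C2 = 0.7, -1.3   # arbitrary constants for the check

def y(t):
    # General solution found by diagonalization in Example 1.4.1.
    return np.array([C1*np.exp(2*t) + C2*np.exp(-t),
                     C1*np.exp(2*t) - 2*C2*np.exp(-t)])

# Compare dy/dt (central difference) with A y at a sample time.
t, h = 0.5, 1e-6
dydt = (y(t + h) - y(t - h)) / (2*h)
print(np.allclose(dydt, A @ y(t), atol=1e-4))  # True
```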
Example 1.4.2. Consider the system of difference equations
u_{n+1} = u_n + v_n,   v_{n+1} = 2u_n;   n ≥ 0.

We can rewrite this system as

y_{n+1} = Ay_n  with  y_n = \begin{pmatrix} u_n \\ v_n \end{pmatrix},  A = \begin{pmatrix} 1 & 1 \\ 2 & 0 \end{pmatrix}.
We have y_1 = Ay_0, y_2 = Ay_1 = A^2 y_0, y_3 = Ay_2 = A^3 y_0, etc. In general, y_n = A^n y_0.
So, in order to find the general solution to this system of difference equations, we need to
give an explicit expression for A^n. Using the method described in Example 1.2.1, we find
A = PDP^{-1}, where

P = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix},  D = \begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix}  and  P^{-1} = -\frac{1}{3}\begin{pmatrix} -2 & -1 \\ -1 & 1 \end{pmatrix}.
Now
A^n = (PDP^{-1})^n = PDP^{-1} \cdot PDP^{-1} \cdots PDP^{-1} = PD^nP^{-1}.
So

A^n = -\frac{1}{3}\begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}\begin{pmatrix} 2^n & 0 \\ 0 & (-1)^n \end{pmatrix}\begin{pmatrix} -2 & -1 \\ -1 & 1 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 2^{n+1} + (-1)^n & 2^n - (-1)^n \\ 2^{n+1} - 2(-1)^n & 2^n + 2(-1)^n \end{pmatrix}.
Thus [u_n\ v_n]^⊤ = y_n = A^n y_0 gives

u_n = \frac{2^{n+1} + (-1)^n}{3}\, u_0 + \frac{2^n - (-1)^n}{3}\, v_0,
v_n = \frac{2^{n+1} - 2(-1)^n}{3}\, u_0 + \frac{2^n + 2(-1)^n}{3}\, v_0.
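The closed form for A^n can be verified against repeated matrix multiplication. A sketch (assuming Python with NumPy):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 0.0]])

def A_power(n):
    # Closed form obtained from A^n = P D^n P^{-1} in Example 1.4.2.
    return (1.0 / 3.0) * np.array(
        [[2**(n + 1) + (-1)**n,    2**n - (-1)**n],
         [2**(n + 1) - 2*(-1)**n,  2**n + 2*(-1)**n]])

ok = all(np.allclose(A_power(n), np.linalg.matrix_power(A, n))
         for n in range(1, 10))
print(ok)  # True
```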
Example 1.4.3. An elevator in Herzberg Building has two states: s1 for “working”
and s2 for “out of order”. Let pij denote the probability of being in state i on the next day,
when today’s elevator is in state j. A student trying to take this elevator every day comes
up with the following subjective probabilities after a year of observation: p_{11} = 0.5,
p_{21} = 0.5, p_{12} = 0.1, p_{22} = 0.9. Let

P = \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix} = \begin{pmatrix} 0.5 & 0.1 \\ 0.5 & 0.9 \end{pmatrix}.
To find the long-run frequency with which the elevator is in working condition, we need to
compute lim_{n→∞} P^n. Following the method described in Example 1.2.1, we find the
eigenvalues 1, 0.4 with corresponding eigenvectors (1, 5), (1, −1). Then we have P = SDS^{-1}, where

S = \begin{pmatrix} 1 & 1 \\ 5 & -1 \end{pmatrix},  D = \begin{pmatrix} 1 & 0 \\ 0 & 0.4 \end{pmatrix}  and  S^{-1} = \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 5 & -1 \end{pmatrix}.
As n → ∞,

P^n = SD^nS^{-1} = \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 5 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0.4^n \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 5 & -1 \end{pmatrix}

tends to

\frac{1}{6}\begin{pmatrix} 1 & 1 \\ 5 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 5 & -1 \end{pmatrix} = \begin{pmatrix} 1/6 & 1/6 \\ 5/6 & 5/6 \end{pmatrix}.
So, on average, the elevator is in working condition about 1/6 of the time. This seems to
fit the student’s experience over the year.
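The limit can also be seen by simply raising P to a high power. A sketch (assuming Python with NumPy):

```python
import numpy as np

# Transition matrix from Example 1.4.3 (columns sum to 1).
P = np.array([[0.5, 0.1],
              [0.5, 0.9]])

# Since the second eigenvalue is 0.4, the term 0.4^n dies out quickly
# and P^n approaches the limit found by diagonalization.
limit = np.array([[1/6, 1/6],
                  [5/6, 5/6]])
print(np.allclose(np.linalg.matrix_power(P, 50), limit))  # True
```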
1.5. We have seen that, if A = PDP^{-1}, then A^n = PD^nP^{-1}. More generally, if
p is a polynomial, then p(A) = P p(D) P^{-1}. This suggests defining f(A) for any
function f (defined on the spectrum σ(A) of A) by putting f(A) = P f(D) P^{-1}, where
f(D) = \begin{pmatrix} f(\lambda_1) & 0 & \cdots & 0 \\ 0 & f(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(\lambda_n) \end{pmatrix}  for  D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.
Computing f(A) is called the functional calculus of A. Besides polynomials, another
commonly used function for the functional calculus is f_t(x) = e^{xt}, where t is a parameter. We
will write f_t(A) as e^{At} from now on.
Consider the initial value problem dy/dt = Ay with y(0) = y_0, where y_0 is a given
vector in C^n. Formally we can write down the solution as

y(t) = e^{At} y_0. (1.5.1)
The 1-dimensional case is well known: the solution of the initial value problem dy/dt = ay with
y(0) = y_0 is y(t) = e^{at} y_0. It is known that the Taylor expansion of the exponential function is
e^{at} = \sum_{n=0}^{\infty} a^n t^n / n!. Analogously we have
e^{At} = \sum_{n=0}^{\infty} \frac{A^n}{n!} t^n = I + At + \frac{A^2}{2!}t^2 + \frac{A^3}{3!}t^3 + \cdots . (1.5.2)
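The series (1.5.2) can be summed numerically and compared with the functional-calculus value P e^{Dt} P^{-1}. A sketch (assuming Python with NumPy, and a diagonalizable matrix A so that the eigenvector route applies):

```python
import numpy as np

def expm_series(A, t, terms=30):
    # Truncation of the series (1.5.2); adequate for small matrices
    # and moderate |t|.
    total = np.zeros_like(A)
    term = np.eye(A.shape[0])
    for n in range(terms):
        total = total + term          # term == (A t)^n / n!
        term = term @ A * t / (n + 1)
    return total

def expm_eig(A, t):
    # Functional calculus e^{At} = P e^{Dt} P^{-1}, assuming A diagonalizable.
    w, P = np.linalg.eig(A)
    return (P @ np.diag(np.exp(w * t)) @ np.linalg.inv(P)).real

A = np.array([[1.0, 1.0],
              [2.0, 0.0]])
print(np.allclose(expm_series(A, 0.5), expm_eig(A, 0.5)))  # True
```

Production code would use a dedicated routine (e.g. SciPy's `expm`) rather than the raw series, which can lose accuracy for matrices with large norm.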
Example 1.5.1. From Example 1.4.2 we see that

A^n = \frac{1}{3}\begin{pmatrix} 2 \cdot 2^n + (-1)^n & 2^n - (-1)^n \\ 2 \cdot 2^n - 2(-1)^n & 2^n + 2(-1)^n \end{pmatrix}  where  A = \begin{pmatrix} 1 & 1 \\ 2 & 0 \end{pmatrix}.
So, from (1.5.2) we find that

e^{At} = \frac{1}{3}\begin{pmatrix} 2\sum \frac{2^n}{n!}t^n + \sum \frac{(-1)^n}{n!}t^n & \sum \frac{2^n}{n!}t^n - \sum \frac{(-1)^n}{n!}t^n \\ 2\sum \frac{2^n}{n!}t^n - 2\sum \frac{(-1)^n}{n!}t^n & \sum \frac{2^n}{n!}t^n + 2\sum \frac{(-1)^n}{n!}t^n \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 2e^{2t} + e^{-t} & e^{2t} - e^{-t} \\ 2e^{2t} - 2e^{-t} & e^{2t} + 2e^{-t} \end{pmatrix}.

Here we simply write \sum for \sum_{n=0}^{\infty}. Hence we have

\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = y = e^{At}y_0 = \frac{1}{3}\begin{pmatrix} (2e^{2t} + e^{-t})\,y_1(0) + (e^{2t} - e^{-t})\,y_2(0) \\ (2e^{2t} - 2e^{-t})\,y_1(0) + (e^{2t} + 2e^{-t})\,y_2(0) \end{pmatrix}.
The reader should compare this answer with the general solution in Example 1.4.1, namely
y_1 = C_1 e^{2t} + C_2 e^{-t}, y_2 = C_1 e^{2t} - 2C_2 e^{-t}. Setting t = 0, we have y_1(0) = C_1 + C_2 and
y_2(0) = C_1 - 2C_2, which gives C_1 = (2y_1(0) + y_2(0))/3, C_2 = (y_1(0) - y_2(0))/3 and hence

y_1 = \frac{2e^{2t} + e^{-t}}{3}\, y_1(0) + \frac{e^{2t} - e^{-t}}{3}\, y_2(0),
y_2 = \frac{2e^{2t} - 2e^{-t}}{3}\, y_1(0) + \frac{e^{2t} + 2e^{-t}}{3}\, y_2(0).

This agrees with the present answer.
In some cases, e^{At} can be obtained directly by using (1.5.2).
Example 1.5.2. Find e^{At} in each of the following cases:

(a) A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}   (b) A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}.

Solution. (a) Direct computation shows A^n = O for n ≥ 2. So (1.5.2) gives

e^{At} = I + At = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}.

(b) Direct computation shows A^n = A for all n ≥ 1. So (1.5.2) gives

e^{At} = I + A\sum_{n=1}^{\infty}\frac{t^n}{n!} = I + A(e^t - 1) = (I - A) + e^tA = \begin{pmatrix} e^t & e^t - 1 \\ 0 & 1 \end{pmatrix}.
EXERCISE SET III.1.
Review Questions. What is the spectrum σ(T) of a linear operator T (defined on a
finite dimensional complex vector space V)? What are the numbers in σ(T) called? How
does one find σ(T) by using the characteristic polynomial? What does the word “eigenspace”
mean? What is the diagonalization problem? Why is it intimately related to the eigenvalue
problem? What is the significance for diagonalization of the fact which says, roughly, that
eigenvectors corresponding to distinct eigenvalues are linearly independent? How do we
use this fact to conclude that a linear operator defined on an n-dimensional complex space
with n distinct eigenvalues is diagonalizable?
Drills
1. Find the eigenvalues and their corresponding eigenvectors for each of the following
matrices:
(a) \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}  (b) \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}  (c) \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}  (d) \begin{pmatrix} 1 & 3 \\ 4 & 2 \end{pmatrix}  (e) \begin{pmatrix} 0 & 1+i \\ 1-i & 0 \end{pmatrix}.
2. Find the eigenvalues and their corresponding eigenvectors for each of the following
matrices (here a, b, p, q, z are arbitrary complex numbers and θ is a real number):

(a) \begin{pmatrix} a & b \\ b & a \end{pmatrix}  (b) \begin{pmatrix} p & 1-p \\ q & 1-q \end{pmatrix}  (c) \begin{pmatrix} 0 & z \\ z & 0 \end{pmatrix}  (d) \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}.
3. In each of the following cases, for the given matrix A, find an invertible matrix P such
that P^{-1}AP is a diagonal matrix.

(a) \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}  (b) \begin{pmatrix} 2 & 3 \\ 4 & 1 \end{pmatrix}  (c) \begin{pmatrix} 0 & 3+4i \\ 3-4i & 0 \end{pmatrix}  (d) \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}
4. Find the eigenvalues and their corresponding eigenvectors for each of the following
matrices. (If the corresponding eigenspace has dimension > 1, you should find a basis
for this eigenspace.)

(a) \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{pmatrix}  (b) \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}  (c) \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
5. Find the eigenvalues and their corresponding eigenvectors for the linear operator T
on V in each of the following cases:

(a) V = P2, T(p(x)) = x p′(x).
(b) V = M2,2 (the space of all 2 × 2 matrices), T(A) = A^⊤ (the transpose of A).
(c) V = R^3, T(x) = ω × x (cross product), where ω is a fixed unit vector.
6. In each of the following cases, find the 2× 2 matrix A.
(a) 0 and 1 are eigenvalues of MA with corresponding eigenvectors [0 1]⊤ and [i 1]⊤
respectively.
(b) [1 i]⊤ is an eigenvector of MA with corresponding eigenvalue i, and the first
column of A is [1 1]⊤.
7. True or false:
(a) Real matrices have real eigenvalues.
(b) If λ, µ are eigenvalues of n× n matrices A and B respectively, then λ + µ must
be an eigenvalue of A+B.
(c) If λ is an eigenvalue of a linear operator T , then, for each scalar a, λ − a is an
eigenvalue of T − aI.
(d) The sum of two diagonalizable linear operators is diagonalizable.
(e) The product of two diagonalizable linear operators is diagonalizable.
(f) If a diagonalizable operator is invertible, then its inverse is also diagonalizable.
(g) If a linear operator T is diagonalizable, then so is its square T 2.
8. In each of the following cases, find a basis relative to which the representation matrix
of the given operator T on V is diagonal, if possible:
(a) V = P2, T(p(x)) = p(0) + p(2)x^2.
(b) V = P2, T(p(x)) = p(0) + p′(1)x + p(2)x^2.
(c) V = M2,2 (the space of 2 × 2 matrices), T(X) = AX, where A = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}.
9. In each of the following cases, use the formula e^{At} = \sum_{n=0}^{\infty} (At)^n/n! to find e^{At}, where
A is the given matrix.

(a) \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}  (b) \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}  (c) \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix}  (d) \begin{pmatrix} 1 & a \\ 0 & 0 \end{pmatrix}  (e) \begin{pmatrix} 2 & 0 \\ 2 & 0 \end{pmatrix}

(f) D = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}  (g) D = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
Exercises
1. Let T be a linear operator on a complex vector space V (not necessarily finite dimen-
sional) and let λ be a complex number. Prove the following statements:
(a) If λ is not an eigenvalue of T, then T − λI is injective.
(b) If λ^2 is an eigenvalue of T^2, then either λ or −λ is an eigenvalue of T.
2. Find the characteristic polynomial of the matrix

A = \begin{pmatrix} 0 & 0 & 0 & -a_0 \\ 1 & 0 & 0 & -a_1 \\ 0 & 1 & 0 & -a_2 \\ 0 & 0 & 1 & -a_3 \end{pmatrix}.

Also show that, if α is an eigenvalue of A, then [1, α, α^2, α^3]^⊤ is an eigenvector of A^⊤.
3. In each of the following cases, use the formula e^{At} = \sum_{n=0}^{\infty} (At)^n/n! to find e^{At} for
the given matrix A (where a is any nonzero constant).

(a) A = \begin{pmatrix} a & 1 \\ 0 & a \end{pmatrix},  (b) A = \begin{pmatrix} a & 1 \\ 0 & 0 \end{pmatrix},  (c) A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.

(Hint for (a): write A = aI + N with N^2 = O. Hint for (b): write A = aP with
P^2 = P.)
4. Solve the eigenvalue problem for the following circulant matrix:

C = \begin{pmatrix} a_0 & a_1 & a_2 & a_3 \\ a_3 & a_0 & a_1 & a_2 \\ a_2 & a_3 & a_0 & a_1 \\ a_1 & a_2 & a_3 & a_0 \end{pmatrix},

where a_0, a_1, a_2, a_3 are arbitrary complex numbers. (As you may recognize, the
difficulty of this problem lies in the fact that the numbers a_0, a_1, a_2, a_3 are not
specifically given.) Hint: Consider the special case P, obtained from C by
setting a_0 = 0, a_1 = 1, a_2 = 0 and a_3 = 0. Notice that C = a_0 I + a_1 P + a_2 P^2 + a_3 P^3.
5. Let T ∈ L(V) be a diagonalizable operator, that is, suppose there is a basis of V consisting of
eigenvectors of T, say v1, v2, . . . , vn. Prove that, if v := v1 + v2 + · · · + vn is also an
eigenvector of T, then T is a scalar multiple of the identity operator, i.e. T = λI for
some scalar λ.
§2. SUMMATION NOW! (And Change of Basis)
2.1 The present section has a technical aspect different from other sections: the heavy
dose of the summation symbol \sum. The summation symbol is used widely in the
science and engineering literature (such as research papers and technical reports), and
skill in handling it is a “must” for any professional scientist or engineer. In what follows
you will find detailed explanations of many steps involving the summation symbol, to
help you understand the mental process of juggling this symbol.
The mathematical expression \sum_{k=1}^{n} a_k is read as “the sum of all a_k, where k runs
from 1 to n”. When k runs from 1 to n, a_k goes through the following list of symbols

a_1, a_2, . . . , a_n (2.1)

and hence \sum_{k=1}^{n} a_k stands for the sum a_1 + a_2 + · · · + a_n. The letter k in the
expression \sum_{k=1}^{n} a_k is said to be dummy because you can replace it by another letter,
say j. Indeed, \sum_{j=1}^{n} a_j is the same as \sum_{k=1}^{n} a_k, because both of them represent
the sum a_1 + a_2 + · · · + a_n. This is not a frivolous remark because, as you will see, in
some circumstances changing letters is absolutely necessary.
Besides changing letters, there are other ways to write the sum \sum_{k=1}^{n} a_k, such as
\sum_{k=0}^{n-1} a_{k+1}. To see that they are the same sum, we rewrite \sum_{k=1}^{n} a_k as \sum_{j=1}^{n} a_j and then
set j = k + 1. Notice that a_j becomes a_{k+1} and, as j runs from 1 to n, k = j − 1 runs from
0 to n − 1. Also notice that we can rewrite \sum_{k=0}^{n} a_k as a_0 + \sum_{k=1}^{n} a_k. The following
two identities are self-evident:

\sum_{k=1}^{n} c\,a_k = c \sum_{k=1}^{n} a_k,    \sum_{k=1}^{n} (a_k + b_k) = \sum_{k=1}^{n} a_k + \sum_{k=1}^{n} b_k.

Avoid by all means bad mistakes such as putting \sum_{k=1}^{n} a_k b_k = \left(\sum_{k=1}^{n} a_k\right)\left(\sum_{k=1}^{n} b_k\right), which
is terribly wrong.
Example 2.1.1. What is the value of \sum_{k=0}^{n} c, where c is a constant? What is the
value of \sum_{k=0}^{n} (−1)^k?

Solution. The sum \sum_{k=0}^{n} c is obtained by adding a number of c’s. The crucial thing
is to count how many c’s there are. Notice that there are n + 1 integers from 0 to n. So we
have \sum_{k=0}^{n} c = (n + 1)c. The sum \sum_{k=0}^{n} (−1)^k is 1 − 1 + 1 − 1 + · · · + (−1)^n. If the last
term (−1)^n is −1, which occurs exactly when n is odd, then all terms cancel and the
answer is 0; otherwise the sum is 1. We conclude that \sum_{k=0}^{n} (−1)^k is 1 when n is even,
and 0 when n is odd.
We will encounter “double summations” of the form

\sum_{j,k=1}^{n} a_{jk}. (2.1.2)

This is read as “the sum of all a_{jk}, where j and k run independently from 1 to n”.
We can rewrite it as one of the following:

\sum_{j=1}^{n} \sum_{k=1}^{n} a_{jk},    \sum_{k=1}^{n} \sum_{j=1}^{n} a_{jk}. (2.1.3)

When n = 2, the last two expressions are (a_{11} + a_{12}) + (a_{21} + a_{22}) and (a_{11} + a_{21}) + (a_{12} + a_{22})
respectively, which are certainly equal. More generally, we have the following identity:

\sum_{j=1}^{m} \sum_{k=1}^{n} a_{jk} = \sum_{k=1}^{n} \sum_{j=1}^{m} a_{jk}. (2.1.4)

The validity of this identity is explained as follows. Both sides of this identity are the sum
of all entries a_{jk} of the matrix [a_{jk}]. The only difference is in the way the sum is performed:
one side takes the row sums first and then adds up all the row sums; the other takes the
column sums first. Common sense tells us that they lead to the same total. Thus, we can
switch the order of summation over j and summation over k, provided j and k are indices running
independently.
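Identity (2.1.4) is easy to confirm in code. A plain-Python sketch with a made-up rectangular array of numbers:

```python
# Summing the entries of an m-by-n array row-first or column-first
# gives the same total -- identity (2.1.4).
m, n = 3, 4
a = [[10 * j + k for k in range(1, n + 1)] for j in range(1, m + 1)]

row_first = sum(sum(a[j][k] for k in range(n)) for j in range(m))
col_first = sum(sum(a[j][k] for j in range(m)) for k in range(n))
print(row_first == col_first)  # True
```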
2.2. We go back to check the identity (2.7.1) at the end of §2 in Chapter I, which
says, Relative to a basis B = {b1, b2, . . . , bn} of V , for S, T ∈ L (V ), a, b ∈ F and
v ∈ V , we have
[aS + bT ] = a[S] + b[T ], [ST ] = [S][T ], [Tv] = [T ][v]. (2.2.1)
We begin by verifying the last identity [Tv] = [T][v]. Let us put

[v] = [v_1 v_2 · · · v_n]^⊤,  [Tv] = [u] = [u_1 u_2 · · · u_n]^⊤  and  [T] = [t_{ij}]_{1 ≤ i,j ≤ n}.

According to our definitions of the column representation of a vector and the matrix representation
of a linear map, we have

v = \sum_{j=1}^{n} v_j b_j,   Tv = \sum_{i=1}^{n} u_i b_i   and   T b_j = \sum_{i=1}^{n} t_{ij} b_i. (2.2.2)
Notice that our choice of letters for indices in (2.2.2) allows an immediate substitution.
Now
\sum_{i=1}^{n} u_i b_i = Tv = T\left( \sum_{j=1}^{n} v_j b_j \right) = \sum_{j=1}^{n} v_j\, T b_j
= \sum_{j=1}^{n} v_j \sum_{i=1}^{n} t_{ij} b_i = \sum_{j=1}^{n} \sum_{i=1}^{n} v_j t_{ij} b_i
= \sum_{i=1}^{n} \sum_{j=1}^{n} t_{ij} v_j b_i = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} t_{ij} v_j \right) b_i. (2.2.3)
♠ Aside: Here we use various elementary properties of summation to manipulate complicated sums. We have used

v_j \sum_{i=1}^{n} t_{ij} b_i = \sum_{i=1}^{n} v_j t_{ij} b_i.

This is legitimate because only i is running in taking the sum, while the index j is independent
of i and hence can be regarded as fixed if you want. So, essentially we are using
the identity a \sum_{i=1}^{m} b_i = \sum_{i=1}^{m} a b_i here, which is clearly true. For the same reason we can
pull b_i out of a sum in which j is running:

\sum_{j=1}^{n} t_{ij} v_j b_i = \left( \sum_{j=1}^{n} t_{ij} v_j \right) b_i.

We have also switched the order of the two summation symbols \sum_{i=1}^{n} and \sum_{j=1}^{n} in the above
manipulation. This is possible – in fact we have the following general identity:

\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} = \sum_{j=1}^{n} \sum_{i=1}^{n} a_{ij}.

(See (2.1.3) above.) ♠
Both sides of the identity (2.2.3) are linear combinations of vectors b1,b2, . . . ,bn,
which form a basis of V . Hence the corresponding coefficients of these two linear combi-
nations are the same:
u_i = \sum_{j=1}^{n} t_{ij} v_j,   1 ≤ i ≤ n.
We can collect the above identities and put them into the matrix form:
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} = \begin{pmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ t_{21} & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & & \vdots \\ t_{n1} & t_{n2} & \cdots & t_{nn} \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix},
which is just [u] = [T ][v].
Next we check [aS + bT ] = a[S] + b[T ] and [ST ] = [S][T ]. For each vector x in V ,
[aS + bT ][x] = [(aS + bT )(x)] = [aS(x) + bT (x)]
= a[S(x)] + b[T (x)] = a[S][x] + b[T ][x] = (a[S] + b[T ])[x].
Since an arbitrary column X with n entries can always be put in the form [x] for some
vector x in V, we must have [aS + bT] = a[S] + b[T], in view of the following
Fact. m× n matrices A and B are equal if AX = BX holds for each column X.
This fact can be quickly checked as follows. Suppose AX = BX for all X . Let X =
[1, 0, 0, . . . , 0]⊤. Then AX and BX are the first columns of A and B respectively. So
AX = BX tells us that the first columns of A and B are equal. The same argument shows
that other columns are equal.
Similarly, we have [ST ][x] = [(ST )(x)] = [S(Tx)] = [S][Tx] = [S][T ][x], from which
it follows that [ST ] = [S][T ].
2.3. Let V be a finite dimensional vector space and let E and F be two bases of V.
What is the connection between the column representations [v]_E and [v]_F relative to these
two bases, for an arbitrary vector v in V? In other words, what is the effect on the column
representation of a vector when the basis is changed? To answer this question, we have
to be more specific about these two bases: let E consist of vectors e_1, e_2, . . . , e_n and let F
consist of f_1, f_2, . . . , f_n. Take an arbitrary vector v in V. Let

[v]_E = (v_1, v_2, . . . , v_n)  and  [v]_F = (w_1, w_2, . . . , w_n).

These are the column representations of the same vector v relative to the two different bases
E and F. Recalling what is meant by the column representation of a vector, we write

v = v_1 e_1 + v_2 e_2 + · · · + v_n e_n = w_1 f_1 + w_2 f_2 + · · · + w_n f_n.

Again we use summation symbols to streamline this:

v = \sum_j v_j e_j = \sum_k w_k f_k;

it is understood that both j and k run from 1 to n. Suppose, for each j with 1 ≤ j ≤ n,
the column representation of e_j relative to F is given by [e_j]_F = (p_{1j}, p_{2j}, . . . , p_{nj}).
Similarly, for each k with 1 ≤ k ≤ n, we suppose [f_k]_E = (q_{1k}, q_{2k}, . . . , q_{nk}). Thus
e_j = \sum_k p_{kj} f_k and f_k = \sum_j q_{jk} e_j. We have set up everything. It is time to compute:
v = \sum_j v_j e_j = \sum_j v_j \sum_k p_{kj} f_k = \sum_j \sum_k v_j p_{kj} f_k
= \sum_k \sum_j p_{kj} v_j f_k = \sum_k \left( \sum_j p_{kj} v_j \right) f_k.
(Aside: By now you should be capable of understanding what is going on in the above
manipulation involving double summation.) Recall that we also have v = \sum_k w_k f_k. Use
this to replace the left-hand side of the previous identity:

\sum_k w_k f_k = \sum_k \left( \sum_j p_{kj} v_j \right) f_k.
Since F = {f_1, f_2, . . . , f_n} is a basis, this identity entails

w_k = \sum_j p_{kj} v_j, (2.3.1)

which is called the “change of basis formula”. It can be rewritten in matrix form as

[v]_F = [p_{ij}][v]_E,  or  [v]_F = P[v]_E, (2.3.2)

where P = [p_{ij}] is of course the n × n matrix with p_{ij} as its (i, j)-entry. (♠ Aside: This
is the identity telling us the relation between the column representations of v relative to
the different bases E and F. But it is not easy to apply, unless you are a highly organized
person with great skill in bookkeeping. Fortunately, only the existence of P concerns us. ♠)
Reversing the roles of the v’s and w’s, exchanging j and k, and replacing p by q, we obtain
something similar:

v_j = \sum_k q_{jk} w_k,  or  [v]_E = Q[v]_F, (2.3.3)

where Q = [q_{ij}]. We need a new letter ℓ for the index k to rewrite the above identity as
v_j = \sum_ℓ q_{jℓ} w_ℓ. Then we substitute this into (2.3.1) to get

w_k = \sum_j p_{kj} v_j = \sum_j p_{kj} \sum_ℓ q_{jℓ} w_ℓ = \sum_j \sum_ℓ p_{kj} q_{jℓ} w_ℓ
= \sum_ℓ \sum_j p_{kj} q_{jℓ} w_ℓ = \sum_ℓ \left( \sum_j p_{kj} q_{jℓ} \right) w_ℓ.

This identity holds for all w’s and hence we must have

\sum_j p_{kj} q_{jℓ} = δ_{kℓ} ≡ \begin{cases} 1, & k = ℓ, \\ 0, & \text{otherwise}. \end{cases} (2.3.4)

We can rewrite (2.3.4) in matrix form: PQ = I_n. By reversing the roles of the p’s and q’s,
we obtain in the same way that QP = I_n. This shows that P is an invertible matrix with
[v]_F = P[v]_E and hence [v]_E = P^{-1}[v]_F, which is the same as (2.3.3) above.
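The change of basis formula can be tried out numerically. A sketch (assuming Python with NumPy, and two made-up bases of R^3 stored as matrix columns):

```python
import numpy as np

# Columns of E and F are the basis vectors e_j and f_k of two bases of R^3.
E = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
F = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

v = np.array([2.0, -1.0, 3.0])
vE = np.linalg.solve(E, v)   # coordinates of v relative to E
vF = np.linalg.solve(F, v)   # coordinates of v relative to F

# The change of basis matrix has columns [e_j]_F, i.e. P = F^{-1} E.
P = np.linalg.solve(F, E)
print(np.allclose(vF, P @ vE))  # True: [v]_F = P [v]_E
```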
2.4. Next we consider the effect on the matrix representation of an operator T on V
due to a change of basis. We retain the above notation, such as E and F. We have to find
a relation between [T ]E and [T ]F , the matrix representations of the same linear operator
T relative to two different bases E and F .
Take an arbitrary vector x in V and let y be its image under T : y = T (x). Relative
to the basis E , we have the matrix representation [T ] E for the operator T and the column
representations [x]E and [y] E respectively for the vectors x and y. Recall from §2.2 that
they are related by [y]E = [T ] E [x]E . If we replace the basis E by F , in the same way we
have [y]F = [T ]F [x]F . By the above discussion about change of basis, we know that there
is an invertible matrix P such that [x]F = P [x] E . We have the following four identities:
[x]F = P [x] E , [y]F = P [y]E , [y]E = [T ] E [x] E , [y]F = [T ]F [x]F .
What can we do with them? Well, something natural: substitute the first and the second
into the fourth identity to obtain P[y]_E = [T]_F P[x]_E. Then substitute the third identity into
the left-hand side of this identity to arrive at P[T]_E [x]_E = [T]_F P[x]_E. The last identity
holds for every column X ≡ [x]_E. Therefore we must have P[T]_E = [T]_F P (by the Fact
stated in §2.2). As P is an invertible matrix, this can be rearranged as

[T]_F = P[T]_E P^{-1}.
Definition. We say that two n×n matrices A and B are similar, or A is similar to
B, and we write A ∼ B, if there is an invertible n× n matrix P such that A = PBP−1.
(Aside: In this definition we may replace the last identity by A = P^{-1}BP, because it is
just a matter of a change in notation: replace P by P^{-1}.) By means of this definition, we
can summarize the above discussion as follows:
Theorem 2.4.1. Representing matrices of a linear operator relative to different bases
are similar.
2.5. In general it is rather difficult to tell if two matrices are similar. You cannot
tell by observing how “similar” they look. For example, the following two matrices look
completely different, yet they are similar:
A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},   B = \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}.

In fact, a direct computation shows PAP^{-1} = B, where

P = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix},  and hence  P^{-1} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}.
Another example: the matrices

C = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},   D = \begin{pmatrix} 0 & 88 \\ 0 & 0 \end{pmatrix}

are similar. You can check that QCQ^{-1} = D, where Q is the diagonal matrix with 88, 1 as its
diagonal entries. To tell whether two matrices are similar in general, we use “canonical forms” and, to do
this, we begin by solving some eigenvalue problems.
Similar matrices can be regarded as representing matrices of the same operator (relative
to different bases), and hence they are regarded as “essentially” the same. For the
rest of the present section we discuss some basic aspects of similarity of matrices and their
implications for operators.
Notice that similarity is an equivalence relation. This means it obeys the following
three laws of equivalence:
(S1) A ∼ A. (Similarity is reflexive.)
(S2) If A ∼ B, then B ∼ A. (Similarity is symmetric.)
(S3) If A ∼ B and B ∼ C, then A ∼ C. (Similarity is transitive.)
Here A,B,C are square matrices of the same size, say n × n. The first law says that
every matrix is similar to itself. This is obvious: you do have A = PAP−1 where P = I.
The second law is also clear, because you can rewrite a relation like A = PBP−1 as
B = QAQ−1 where Q = P−1. The proof of (S3) is a bit more interesting. Suppose A ∼ B
and B ∼ C, that is A = PBP−1 and B = QCQ−1 for some invertible matrices P and
Q. Then A = PBP−1 = P (QCQ−1)P−1 = (PQ)C(Q−1P−1) = (PQ)C(PQ)−1. Hence
A = RCR−1, where R = PQ is an invertible matrix. Therefore A ∼ C.
Similarity preserves algebraic operations of matrices. Suppose A ∼ B. Then we also
have 2A+A3 ∼ 2B+B3. Why? Well, by our assumption, we have A = PBP−1 for some
invertible matrix P . So
2A+A3 = 2PBP−1 + PBP−1PBP−1PBP−1
= 2PBP−1 + PB3P−1 = P (2B +B3)P−1.
More generally, if p(x) is a polynomial, we can substitute x by a square matrix, say
A, to form a new matrix p(A). For example, the matrix 2A + A3 is just q(A), where
q(x) = 2x + x3. Of course there is nothing special about the polynomial q(x) = 2x + x3
and the following statement holds for an arbitrary polynomial p(x):
If A ∼ B, then p(A) ∼ p(B).
A quick example: the matrices

A = [ 1  0 ]          B = [ 0  1 ]
    [ 0  0 ]   and        [ 0  0 ]
are not similar. Why? Well, from A² = A ≠ O and B² = O, we see that A² and B² are
not similar; (notice that only O can be similar to O). By our discussion here, A and B cannot
be similar; (otherwise A² and B² would be similar).
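The fact that similarity survives polynomial substitution is easy to check numerically. Below is a small NumPy sketch (a verification aid only, not part of the development), using the similar pair A, B and the matrix P from 2.5 together with the polynomial q(x) = 2x + x³:

```python
import numpy as np

# The similar pair from 2.5: P A P^{-1} = B.
A = np.array([[1.0, 1.0], [1.0, 1.0]])
B = np.array([[0.0, 0.0], [0.0, 2.0]])
P = np.array([[1.0, -1.0], [1.0, 1.0]])
Pinv = np.linalg.inv(P)

def q(M):
    # q(x) = 2x + x^3, applied to a square matrix
    return 2 * M + np.linalg.matrix_power(M, 3)

assert np.allclose(P @ A @ Pinv, B)        # A ~ B
assert np.allclose(P @ q(A) @ Pinv, q(B))  # q(A) ~ q(B), via the same P
```

Note that the same P implements the similarity for every polynomial, exactly as in the computation above.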
2.6. Similar matrices have the same determinant:
A ∼ B implies det(A) = det(B).
To prove this significant fact, we need a basic property of the determinant function det(·):
for n×n matrices C and D, we have det(CD) = det(C) det(D). (Aside: You may describe
this property in words: the determinant function is multiplicative.) Now assume that A
and B are similar matrices, say A = PBP−1 for some invertible matrix P . Then
det(A) = det(PBP−1) = det(P ) det(B) det(P−1)
= det(P ) det(P−1) det(B) = det(PP−1B) = det(B).
So A and B have the same determinant.
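Both the multiplicative property of det and the resulting similarity invariance can be spot-checked numerically; the random matrices below are an arbitrary choice for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))   # almost surely invertible
A = P @ B @ np.linalg.inv(P)      # A is similar to B by construction

# det is a similarity invariant
assert np.isclose(np.linalg.det(A), np.linalg.det(B))

# the multiplicative property behind the proof: det(CD) = det(C) det(D)
C, D = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
assert np.isclose(np.linalg.det(C @ D), np.linalg.det(C) * np.linalg.det(D))
```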
What is the upshot of all this? Well, it allows us to define the determinant of a
linear operator (instead of matrix) legitimately. Given an operator, say T , on a (finite
dimensional) vector space V . How should we define its determinant? Well, the natural
definition is to take a basis E of V , look at the matrix representation [T ] E relative to this
basis, and call the determinant of the matrix [T ] E to be the determinant of T . But V
has many different bases; which one should we pick? There is no clear choice. So pick
any one. Now the problem is: if I pick a basis E and call det([T ] E ) the determinant of T ,
and you pick another basis, say F , and call det([T ]F ) the determinant of T , will it ever
happen that my determinant of T differs from yours? This is something we really have to
worry about. Fortunately, such discrepancy never occurs. The reason is, even though our
matrices are different, they are similar and hence they have the same determinant. Now
we can legitimately define the determinant det(T ) of a linear operator T by putting
det(T ) = det([T ]),
where [T ] is the matrix representation of T relative to any basis. The determinant function
for matrices is a similarity invariant, or simply an invariant. This means that the
determinant of a matrix is a quantity unchanged when this matrix is replaced by a similar
one. Figuratively speaking, a matrix representation of an operator is like a disguise. A
different disguise changes the look of an operator. But an invariant is like a characteristic
trait of a person, impossible to hide under any disguise.
Besides the determinant function, there is another important similarity invariant for
matrices, called the trace function. By definition, the trace of a square matrix A, denoted
by tr(A) or just tr A, is simply the sum of its diagonal entries. A quick example:
tr [ 1  2 ] = 1 + 4 = 5.
   [ 3  4 ]
♠ Aside: This seems to be rather cheap. In mathematics it is very unlikely that you can
obtain something significant at no cost. Now we would like to show that similar matrices
have the same trace. This is by no means a trivial matter. The effort to prove this is
the price we have to pay. One might try to prove this by establishing an identity for the
trace function similar to the one for determinant function, namely tr(AB) = tr(A) tr(B)
and then follow the same argument as above for the determinant function. But, alas! This
identity is not true. In fact, it is terribly wrong, even under the most favorable condition
that both A and B are diagonal matrices! The saving grace is the following theorem. ♠
Theorem 2.6.1. For n× n matrices A and B, tr(AB) = tr(BA).
We prove this theorem as follows. Let A = [aij] and B = [bij]. Then the (i, j) entry
of C ≡ AB is given by cij = ∑_k aik bkj. Hence the sum of the diagonal entries of C = AB,
giving us the trace of AB, is

tr(AB) = ∑_i cii = ∑_i ∑_k aik bki.

Similarly, the trace of BA is tr(BA) = ∑_i ∑_k bik aki. The question is: how do we see that the
two "double sums" above are equal? Of course you can "expand" them and check. This
method is fool-proof but quite tedious. A better way is: in the second double sum, swap
the names of the subscripts (i is renamed as k and k is renamed as i) and then swap the
summation signs:

tr(BA) = ∑_i ∑_k bik aki = ∑_k ∑_i bki aik = ∑_i ∑_k aik bki = tr(AB).
Now we return to the proof of the fact that similar matrices have the same trace. Indeed,
if B = PAP−1, then
tr(B) = tr(PAP−1) = tr((PA)P−1) = tr(P−1(PA)) = tr(A).
Since the trace function is a similarity invariant for matrices, we can define the trace of
an operator T by putting tr(T ) = tr([T ]), where [T ] is a matrix representation of T . We
should mention that the trace function is linear: tr(aA + bB) = a tr(A) + b tr(B). This
follows directly from the definition of trace.
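These properties of the trace are also easy to test numerically; a NumPy sketch (random matrices are an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

# tr(AB) = tr(BA), even though AB != BA in general
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# hence trace is a similarity invariant: tr(P A P^{-1}) = tr(A)
P = rng.standard_normal((5, 5))   # almost surely invertible
assert np.isclose(np.trace(P @ A @ np.linalg.inv(P)), np.trace(A))

# trace is linear
assert np.isclose(np.trace(2 * A + 3 * B), 2 * np.trace(A) + 3 * np.trace(B))
```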
We have introduced two invariants for a linear operator, namely, the determinant and
the trace. More invariants will be considered in the next chapter, such as the spectrum
and the characteristic polynomial. (♠ Aside: Do not belittle the trace function! It is
widely used in diverse areas such as group characters, which have bearings on chemical
properties of crystals. It has been recently generalized immensely, giving rise to the theory
of so-called “noncommutative integration” and “tracial states” in the algebraic approach
to quantum mechanics. ♠.)
Let T be a linear operator defined on a finite dimensional vector space V . Suppose
that T is diagonalizable, that is, there is a basis B in V consisting of eigenvectors
of T , say vj (1 ≤ j ≤ n) with Tvj = λjvj . Then [T ]B is a diagonal matrix with
λ1, λ2, . . . , λn as its diagonal entries. According to the definition of the trace and the
determinant of an operator, we have
tr(T) = λ1 + λ2 + · · · + λn, det(T) = λ1λ2 · · ·λn.
Actually, the above identities still hold without assuming T diagonalizable. Thus, given a
linear operator on a finite dimensional complex vector space, its trace is the sum of all its
eigenvalues and its determinant is the product of all its eigenvalues (counting multiplici-
ties). For example, the trace and the determinant of
A = [ 1  2 ]
    [ 4  3 ]
are tr(A) = 4 and det(A) = −5 respectively. A quick guess gives (−1) + 5 = 4 and
(−1)× 5 = −5. So the eigenvalues of A are −1, 5. Another example: for
B = [ 1  −i ]
    [ i   1 ]
we have tr(B) = 2 and det(B) = 0 respectively. A quick guess gives 0+2 = 2 and 0×2 = 0.
So the eigenvalues of B are 0, 2.
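The trace/determinant bookkeeping in these two examples can be confirmed with NumPy (a sanity-check sketch only):

```python
import numpy as np

# Sum of eigenvalues = trace, product = determinant (counting multiplicities).
A = np.array([[1.0, 2.0], [4.0, 3.0]])
w = np.linalg.eigvals(A)
assert np.isclose(w.sum(), np.trace(A))        # (-1) + 5 = 4
assert np.isclose(w.prod(), np.linalg.det(A))  # (-1) * 5 = -5
assert np.allclose(np.sort(w), [-1.0, 5.0])

B = np.array([[1.0, -1.0j], [1.0j, 1.0]])
wB = np.linalg.eigvals(B)
assert np.allclose(np.sort(wB.real), [0.0, 2.0])
```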
EXERCISE SET III.2.
Review Questions. What does it mean to say that two matrices are similar? Why do
we care about the concept of similarity between matrices? Why are the determinant and the
trace of a matrix similarity invariants? Am I still intimidated by the summation
sign? From now on, can I handle it with ease?
Drills
1. Simplify each of the following expressions:
(a) ∑_{k=1}^{n} ak + ∑_{j=1}^{n} (b − aj).
(b) ∑_{k=1}^{n} (ak − ak+1).
(c) ∑_{k=1}^{n} (ak − ak−1).
(d) ∑_{k=1}^{n} (ak − ak+1)(ak + ak+1).
(e) ∑_{k=−n}^{n} k.
2. In each of the following cases, the matrix of a linear operator T on V relative to a
basis B = {b1, b2} is given, and the representing columns [e1]B and [e2]B (relative to
B) of the vectors in the basis E = {e1, e2} are also given. Find the representing matrix
[T]E of T relative to E in each case.

(a) [e1]B = [1, 1]⊤, [e2]B = [1, −1]⊤, and [T]B = [ 1  1 ]
                                                  [ 1  1 ].

(b) E is the same as the one in (a), and [T]B = [  1  1 ]
                                                [ −1  1 ].

(c) [e1]B = [i, 1]⊤, [e2]B = [1, i]⊤, and [T]B = [ 1   i ]
                                                 [ i  −1 ].

3. Same as the previous question, but with [b1]E, [b2]E given instead of [e1]B, [e2]B.

(a) [b1]E = [1, 0]⊤, [b2]E = [1, 1]⊤, and [T]B = [ 1  1 ]
                                                 [ 1  1 ].

(b) [b1]E = [1, 1]⊤, [b2]E = [i, −i]⊤, and [T]B = [  0  i ]
                                                  [ −i  0 ].
4. True or false (S and T are operators on some vector space V and A is a square matrix):
(a) det(ST ) = det(S) det(T ). (b) det(S + T ) = det(S) + det(T ).
(c) tr(S + T ) =tr(S)+tr(T ). (d) tr(ST ) =tr(S)tr(T ).
(e) det(A⊤) = det(A). (f) tr(A⊤) =tr(A).
5. True or false (A,B,C and D are n× n matrices):
(a) If A is similar to B and if C is similar to D, then AC is similar to BD.
(b) If A is similar to B, then A2 is similar to B2.
(c) If A is similar to B, then A⊤ is similar to B⊤.
(d) If A is invertible and B is similar to A, then B is also invertible.
(e) If A is similar to the identity matrix I, then A = I.
Exercises
1. Show that the matrices

A = [ 1  1 ]          B = [ 1  2 ]
    [ 0  2 ]   and        [ 0  2 ]

are similar by finding an invertible matrix P such that PAP−1 = B.
2. Consider the 2-dimensional complex vector space V of functions spanned by sinx and
cosx. For a fixed real number α, define a linear operator T ≡ Tα on V by putting
T (f(x)) = f(x + α). Find the matrices [T ]B and [T ]E of T relative to the bases
B = {cosx, sinx} and E = {cosx + i sinx, cosx − i sinx}. Find an invertible matrix
P implementing the similarity between [T ]B and [T ] E : [T ]B = P [T ]E P−1.
3. Show that, if A = [aij ] is an n× n matrix, then
tr(A⊤A) = ∑_{j,k=1}^{n} a_{jk}².
4. Show that the n × n identity matrix I cannot be written in the form AB − BA for
some n× n matrices A and B. (Hint: Use a basic property of tr(·).)
5. Let A and B be n× n matrices. (a) Show that, if A is invertible, then AB is similar
to BA. (b) Show that AB and BA may not be similar in general.
6. Criticize the following "proof" of the statement "if a (square) matrix A is similar to
(1/2)A, then A = O". "Proof": For simplicity, we write B ∼ C for B is similar to C.
Notice that, if B ∼ C, then (1/2)B ∼ (1/2)C. So, from A ∼ (1/2)A we obtain
(1/2)A ∼ (1/4)A. In the same way, we get (1/4)A ∼ (1/8)A, etc. Hence A ∼ (1/2^n)A
for every positive integer n. Letting n → ∞, we get A ∼ O, from which it follows that A = O.
7. Let A and B be n × n matrices with B invertible. Simplify the expression

∑_{k=1}^{n} A^k (A − B−1) B^k.
8. Use the summation sign to give a careful inductive proof of the binomial formula
(a + b)^n = ∑_{k=0}^{n} C(n, k) a^{n−k} b^k.
§3. Basic Spectral Theory
3.1. Spectral Theory is considered "the heart of the matter" in linear algebra. Here
we only present some basic aspects of this theory, which are adequate for most applications.
The full theory, not described here, includes the Jordan canonical form (see Appendix C),
which is substantially more difficult.
Let T be a linear operator on a finite dimensional vector space V over the complex
field C with dim V = n. As we know, L(V), the space of all linear
operators on V, is also finite dimensional, with dim L(V) = n². Thus, letting N be any
integer with N ≥ n², the N + 1 operators I, T, T², . . . , T^N must be linearly dependent.
Hence there exist complex numbers a0, a1, . . . , aN , not all zeroes, such that
a0I + a1T + · · · + aNTN = O.
Let p(x) = a0 + a1x + · · · + aNxN . Then p(x) is a nonzero polynomial such that
p(T ) = O. We have proved that, given a linear operator on a finite dimensional space,
there is a nonzero polynomial p(x) such that p(T ) = O.
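This linear-dependence argument can be carried out concretely: flatten the powers I, T, . . . , T^{n²} into columns and take a null-space vector of the resulting matrix. A NumPy sketch (the SVD-based null-space extraction is one convenient choice, not the text's method):

```python
import numpy as np

# I, T, ..., T^{n^2} are n^2 + 1 vectors in an n^2-dimensional space,
# hence linearly dependent; a null-space vector of the matrix whose
# columns are the flattened powers gives coefficients with p(T) = O.
rng = np.random.default_rng(2)
n = 3
T = rng.standard_normal((n, n))

powers = [np.linalg.matrix_power(T, j) for j in range(n * n + 1)]
M = np.stack([Pj.ravel() for Pj in powers], axis=1)   # n^2 x (n^2 + 1)

coeffs = np.linalg.svd(M)[2][-1]   # last right-singular vector: null space
pT = sum(a * Pj for a, Pj in zip(coeffs, powers))
assert np.allclose(pT, 0, atol=1e-6)   # p(T) = O
```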
Among all nonzero polynomials p(x) satisfying p(T ) = O, we pick the one with
the smallest degree, say m, with the leading coefficient one, and denote it by pT (x).
Now, suppose that q(x) is any nonzero polynomial such that q(T ) = O. Divide q(x) by
pT (x) to get q(x) = Q(x)pT (x)+r(x), where r(x) is the remainder, which is a polynomial
of degree less than m, the degree of pT (x). Now q(T ) = Q(T )pT (T ) + r(T ). From
q(T ) = O and pT (T ) = O we get r(T ) = O. Since the degree of r(x) is less than that of
pT (x), it is necessarily the zero polynomial; (otherwise it would be a nonzero polynomial
of lower degree satisfying r(T ) = O, contradicting our choice of pT (x)). Thus we have
q(x) = Q(x)pT (x). In the future we will call pT (x) the minimal polynomial of T . We
have proved that any polynomial q(x) satisfying q(T ) = O is a multiple of the minimal
polynomial pT (x) of T .
Let λ1, λ2, . . . , λr be all the roots of pT(x), with multiplicities m1, m2, . . . , mr respec-
tively. Thus pT(x) = (x − λ1)^{m1} (x − λ2)^{m2} · · · (x − λr)^{mr} and

(T − λ1I)^{m1} (T − λ2I)^{m2} · · · (T − λrI)^{mr} = pT(T) = O.

Let q1(x) = (x − λ1)^{m1−1} (x − λ2)^{m2} · · · (x − λr)^{mr}. Since the degree of q1(x) is less than
the degree of pT(x), we must have q1(T) ≠ O. Hence there exists a vector u in V such
that v ≡ q1(T)u ≠ 0. Since (x − λ1)q1(x) = pT(x), we have

(T − λ1I)v = (T − λ1I)q1(T)u = pT(T)u = 0.
This shows that λ1 is an eigenvalue of T . In the same way, we can show λk for any k ≤ r
is an eigenvalue of T . We have proved that the roots of the minimal polynomial of a linear
operator on a finite dimensional complex vector space are eigenvalues of T . In particular,
we have proved that eigenvalues do exist for such an operator. (♠ Remark: If we work in
a field other than C, say R, eigenvalues may not exist.♠)
Next, we use an idea in the proof of Proposition 3.4.1 in Chapter I to investigate
the so-called spectral decomposition of T. Let fk(x) be the polynomial obtained from
pT(x) = (x − λ1)^{m1} (x − λ2)^{m2} · · · (x − λr)^{mr} by deleting the factor (x − λk)^{mk}. Thus we
have (x − λk)^{mk} fk(x) = pT(x) for all k. Clearly, the polynomials f1(x), f2(x), . . . , fr(x)
do not have any common root. So they are coprime polynomials. Hence there exist
polynomials g1(x), g2(x), . . . , gr(x) such that

f1(x)g1(x) + f2(x)g2(x) + · · · + fr(x)gr(x) = 1. (3.1.1)

Let pk(x) = fk(x)gk(x), so that p1(x) + p2(x) + · · · + pr(x) = 1. Let Pk = pk(T). The
above identity gives p1(T) + p2(T) + · · · + pr(T) = I, or

P1 + P2 + · · · + Pr = I. (3.1.2)
Notice that, when j ≠ k, pT(x) is a factor of fj(x)fk(x) and hence

fj(T)fk(T) = O, (3.1.3)

which gives PjPk = fj(T)gj(T)fk(T)gk(T) = O. Multiply both sides of (3.1.2) by Pk.
The right-hand side becomes Pk. The left-hand side is the sum ∑_{j=1}^{r} PjPk, in which the
term PjPk vanishes if j ≠ k. Hence this sum becomes Pk², which gives Pk² = Pk. We
conclude

Pk² = Pk, and PkPj = O if k ≠ j. (3.1.4)
Notice that Pk can be expressed as a polynomial in T, namely Pk = pk(T). From T pk(T) =
pk(T) T we get TPk = PkT, or, putting it in words, Pk commutes with T. The following
observation is crucial: (T − λkI)^{mk} Pk = (T − λkI)^{mk} pk(T) = (T − λkI)^{mk} fk(T)gk(T) =
pT(T)gk(T) = O. We make another short summary:

TPk = PkT, (T − λkI)^{mk} Pk = O. (3.1.5)
The relations (3.1.2), (3.1.4) and (3.1.5) all together give us the spectral decomposition
of T where Pk is called the spectral projection corresponding to the eigenvalue λk. From
the computational point of view, the important step of obtaining the spectral decompo-
sition is to find the polynomials gk(x) so that (3.1.1) holds. This can be done by finding
partial fractions of the rational function 1/pT (x), as shown in the following examples.
Example 3.1.1. Consider the operator T = MA on R3 (Tx = Ax), where

A = [ 1  2  3 ]
    [ 0  1  1 ]
    [ 0  0  2 ].

The characteristic polynomial of A is p(x) = (x − 1)²(x − 2). The partial fraction decom-
position of 1/p(x) is

1/((x − 1)²(x − 2)) = 1/(x − 2) − 1/(x − 1) − 1/(x − 1)² = 1/(x − 2) − x/(x − 1)².

(A basic working knowledge of partial fractions is assumed here.) So 1 = (x − 1)² − x(x − 2).
Thus, for the eigenvalues λ1 = 1 and λ2 = 2 we let f1(x) = x − 2 and f2(x) = (x − 1)²,
so that g1(x) = −x and g2(x) = 1. Correspondingly, P1 = f1(A)g1(A) = −A(A − 2I)
and P2 = f2(A)g2(A) = (A − I)². After some matrix computation, we arrive at

P1 = [ 1  0  −5 ]          P2 = [ 0  0  5 ]
     [ 0  1  −1 ]               [ 0  0  1 ]
     [ 0  0   0 ],              [ 0  0  1 ].

As you can check, P1² = P1, P2² = P2, P1P2 = P2P1 = O, P1 + P2 = I, (A − I)²P1 = O,
and (A − 2I)P2 = O.
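All the asserted properties of these projections can be verified with NumPy. In the sketch below, P1 = −A(A − 2I) denotes the spectral projection belonging to the eigenvalue 1 and P2 = (A − I)² the one belonging to the eigenvalue 2 (a verification aid only):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 2.0]])
I = np.eye(3)

P1 = -A @ (A - 2 * I)    # spectral projection for eigenvalue 1
P2 = (A - I) @ (A - I)   # spectral projection for eigenvalue 2

assert np.allclose(P1 + P2, I)
assert np.allclose(P1 @ P1, P1) and np.allclose(P2 @ P2, P2)
assert np.allclose(P1 @ P2, 0) and np.allclose(P2 @ P1, 0)
assert np.allclose((A - I) @ (A - I) @ P1, 0)   # (A - I)^2 P1 = O
assert np.allclose((A - 2 * I) @ P2, 0)         # (A - 2I) P2 = O
```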
Example 3.1.2. Consider the operator T = MA on R3, where

A = [ 0  2  3 ]
    [ 0  1  1 ]
    [ 0  0  2 ].

The characteristic polynomial of A is p(x) = x(x − 1)(x − 2). The partial fraction decom-
position of 1/p(x) is

1/(x(x − 1)(x − 2)) = (1/2)/x − 1/(x − 1) + (1/2)/(x − 2).

So 1 = (1/2)(x − 1)(x − 2) − x(x − 2) + (1/2)x(x − 1). Thus P1 = (1/2)(A − I)(A − 2I),
P2 = −A(A − 2I) and P3 = (1/2)A(A − I). Actual computation shows

P1 = [ 1  −2  −1/2 ]      P2 = [ 0  2  −2 ]      P3 = [ 0  0  5/2 ]
     [ 0   0    0  ]           [ 0  1  −1 ]           [ 0  0   1  ]
     [ 0   0    0  ],          [ 0  0   0 ],          [ 0  0   1  ].

As you can check, PjPk = δjkPj, P1 + P2 + P3 = I, AP1 = O, AP2 = P2, AP3 = 2P3.
3.2. We go back to our discussion of a linear operator T on V with minimal polynomial
pT(x) = (x − λ1)^{m1} (x − λ2)^{m2} · · · (x − λr)^{mr}. In a lucky situation, all of the exponents mk
are equal to 1; in other words, the minimal polynomial is of the form

pT(x) = (x − λ1)(x − λ2) · · · (x − λr),

where λk (1 ≤ k ≤ r) are the distinct roots of pT(x), and thus pT(x) has simple roots (that is,
no root repeats itself). Then (3.1.5) says that (T − λkI)Pk = O for all k. This
identity tells us that the range Pk(V) of Pk is contained in the eigenspace of T corresponding
to the eigenvalue λk. From (3.1.2) we see that each vector v can be expressed as a sum

v = ∑_{k=1}^{r} Pk v,

where Pk v is either zero or an eigenvector of T. This tells us that T is diagonalizable, that
is, V has a basis consisting of eigenvectors; (a more detailed argument is given in Appendix
B). The converse is also true: if T is diagonalizable, then its minimal polynomial pT(x) has
simple roots. Indeed, if b1, b2, . . . , bn is a basis of eigenvectors and λ1, λ2, . . . , λr
are the distinct eigenvalues of T, then each bj is annihilated by T − λkI for some k and
hence (T − λ1I)(T − λ2I) · · · (T − λrI)bj = 0. Since the vectors bj (1 ≤ j ≤ n) span V,

(T − λ1I)(T − λ2I) · · · (T − λrI)v = 0

for all v. Thus p(T) = O, where p(x) = (x − λ1)(x − λ2) · · · (x − λr) is a polynomial with
simple roots. We have proved:
Theorem 3.2.1. A linear operator T defined on a finite dimensional complex
vector space is diagonalizable if and only if its minimal polynomial is of the form pT(x) =
(x − λ1)(x − λ2) · · · (x − λr), where λ1, λ2, . . . , λr is the set of distinct eigenvalues of T.
According to this theorem, to see whether an operator T is diagonalizable, we can take the
following two steps: first, find all distinct eigenvalues λ1, λ2, . . . , λr of T; second, form
the polynomial p(x) = (x − λ1)(x − λ2) · · · (x − λr) and check whether p(T) = O holds. If
p(T) = O, the answer is yes; if p(T) ≠ O, the answer is no.
Example 3.2.1. Find the condition on a, b, c such that the matrix

A = [ 1  0  a ]
    [ b  2  c ]
    [ 0  0  2 ]

is diagonalizable.

Solution. We find that the characteristic polynomial of A is (x − 1)(x − 2)² and hence
the distinct eigenvalues are 1, 2. Form p(x) = (x − 1)(x − 2). Then

p(A) = [ 0  0  a ] [ −1  0  a ]   [ 0  0    0   ]
       [ b  1  c ] [  b  0  c ] = [ 0  0  ab+c ]
       [ 0  0  1 ] [  0  0  0 ]   [ 0  0    0   ],

which is the zero matrix if and only if ab + c = 0. Thus A is diagonalizable if and only if
ab + c = 0.
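The condition ab + c = 0 can be spot-checked numerically; the helper name p_of_A below is ours, not the text's (NumPy sketch):

```python
import numpy as np

def p_of_A(a, b, c):
    # p(x) = (x - 1)(x - 2) evaluated at the matrix of Example 3.2.1
    A = np.array([[1.0, 0.0, a], [b, 2.0, c], [0.0, 0.0, 2.0]])
    I = np.eye(3)
    return (A - I) @ (A - 2 * I)

assert np.allclose(p_of_A(a=1.0, b=2.0, c=-2.0), 0)      # ab + c = 0: diagonalizable
assert not np.allclose(p_of_A(a=1.0, b=2.0, c=0.0), 0)   # ab + c = 2: not diagonalizable
```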
If we know that T satisfies p(T ) = O for a polynomial p(x) with simple roots, then T is
diagonalizable. This is because the minimal polynomial pT (x) is a factor of p(x) and hence
it also has simple roots.
Example 3.2.2. An operator P is called a projection if P 2 = P . A projection P
is diagonalizable, since P 2 = P tells us that p(P ) = O, where p(x) = x2 − x = x(x − 1),
which is a polynomial with simple roots. Next, suppose that T is an operator satisfying
T^m = I for some positive integer m; then T is diagonalizable, because p(T) = O is satisfied
with p(x) = x^m − 1, and p(x) has simple roots: indeed,

x^m − 1 = (x − 1)(x − ω)(x − ω²) · · · (x − ω^{m−1}),

with m distinct roots 1, ω, ω², . . . , ω^{m−1}, where ω = e^{2πi/m}.
3.3. We continue with the general discussion of the spectral decomposition of T and
keep the notation used in subsection 3.1. LetMk be the range of the spectral projection Pk:
Mk = Pk(V ). We call Mk the spectral subspace of T corresponding to the eigenvalue
λk. The identity TPk = PkT (see (3.1.5)) tells us that Mk is invariant for T . Indeed a
vector in Mk := Pk(V ) has the form Pkv and T (Pkv) = Pk(Tv), showing that T (Pkv) is
also in Mk. Thus T sends vectors in Mk to vectors in Mk. So we can define an operator
Tk on Mk by putting Tk v = T v for all v in Mk. (The operator Tk defined in this
way is called the restriction of T to Mk.) Let Qk = Tk − λkIk, where Ik is the identity
operator on Mk. Then

Tk = λkIk + Qk and Qk^{mk} = O, (3.3.1)

according to (3.1.5). We call a linear operator Q a nilpotent operator if some
power of it vanishes, that is, Q^m = O for some positive integer m. Thus Qk here is a nilpotent
operator on Mk for each k. So, by means of the spectral decomposition, the problem
about the structure of a general operator is boiled down to the one about the structure of
a general nilpotent operator.
Example 3.3.1. Consider the operator D on the space Pn (of polynomials of degree
at most n) defined by D(p(x)) = p′(x), the derivative of p(x). We have D^{n+1} = O, which
tells us that D is nilpotent. This can be seen from the fact that D reduces the degree
of a nonconstant polynomial by one and sends constant polynomials to zero. Take any
constant a and let

τk(x) = (x − a)^k / k!,    k = 0, 1, 2, . . . , n.

(By convention, 0! = 1 and hence τ0(x) = 1. The Greek letter τ is pronounced as "tau".
We choose this letter because of its association with "Taylor".) Notice that the degree of
τk(x) is k. From this fact it is not hard to deduce that τ0(x), τ1(x), τ2(x), . . . , τn(x) form
a basis of Pn, say T. Notice that, for k ≥ 1,

D(τk(x)) = d/dx [(x − a)^k / k!] = k(x − a)^{k−1} / k! = (x − a)^{k−1} / (k − 1)! = τk−1(x).

This shows that the matrix [D]T of D relative to T is given by

N = [ 0  1  0  · · ·  0  0 ]
    [ 0  0  1  · · ·  0  0 ]
    [ ...                  ]
    [ 0  0  0  · · ·  0  1 ]
    [ 0  0  0  · · ·  0  0 ]    (3.3.2)

One can check directly that N^{n+1} = O.
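In coordinates, (3.3.2) is just the matrix with ones on the superdiagonal, and its nilpotency can be checked directly (NumPy sketch; the size n = 4 is an arbitrary choice):

```python
import numpy as np

# Matrix of D on P_n relative to the basis tau_k(x) = (x - a)^k / k!:
# ones on the superdiagonal, zeros elsewhere.
n = 4
N = np.diag(np.ones(n), k=1)   # (n + 1) x (n + 1)

assert np.allclose(np.linalg.matrix_power(N, n + 1), 0)   # N^{n+1} = O
assert not np.allclose(np.linalg.matrix_power(N, n), 0)   # but N^n != O
```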
Example 3.3.2. Next we study the "difference operator" ∆ on Pn defined by

∆(p(x)) = p(x) − p(x − 1).

We introduce the polynomials

δk(x) = x(x + 1)(x + 2) · · · (x + k − 1) / k!,    k = 0, 1, 2, . . . , n.

Again, by convention, δ0(x) = 1. Now

∆(δk(x)) = [x(x+1) · · · (x+k−2)(x+k−1) − (x−1)x(x+1) · · · (x+k−2)] / k!
         = x(x+1) · · · (x+k−2) [(x+k−1) − (x−1)] / k!
         = x(x+1) · · · (x+k−2) / (k−1)! = δk−1(x).

So the matrix [∆]D of ∆ relative to the basis D = (δ0(x), δ1(x), . . . , δn(x)) is also given
by (3.3.2) above.
In the above two examples, notice that, except for k = 0, we have τk(a) = 0 and
δk(0) = 0. This common feature deserves some additional discussion, as follows. Let
T be a linear operator defined on Pn and B = (p0(x), p1(x), . . . , pn(x)) a basis of Pn such
that T(pk(x)) = pk−1(x) for k ≥ 1 and T(p0(x)) = 0. Furthermore, suppose there is a constant a
such that pk(a) = 0 for all k ≥ 1 and p0(a) = 1. Take any polynomial p(x) in Pn. Since B
is a basis of Pn, we can write p(x) as a linear combination of vectors in B, say

p(x) = a0 p0(x) + a1 p1(x) + · · · + an pn(x) = ∑_{k=0}^{n} ak pk(x). (3.3.3)

(This means [p(x)]B = (a0, a1, . . . , an).) Evaluating at x = a, we get p(a) = a0 p0(a) = a0.
So a0 = p(a). Next, apply T to both sides of (3.3.3) to get

T p(x) = a1 p0(x) + a2 p1(x) + · · · + an pn−1(x). (3.3.4)

Evaluating at x = a, we get T p(a) = a1. (Here we write T p(a) for T(p(x)) evaluated
at x = a.) Applying T to (3.3.4), we get T² p(x) = a2 p0(x) + a3 p1(x) + · · · + an pn−2(x).
Evaluating at x = a, we get T² p(a) = a2. Continuing in this manner, we obtain T^k p(a) =
ak for all k. Thus we have

p(x) = ∑_{k=0}^{n} (T^k p(a)) pk(x). (3.3.5)
In case T = D and pk(x) = τk(x), we have T^k p = p^(k) (the kth derivative of p) and hence (3.3.5)
becomes

p(x) = ∑_{k=0}^{n} p^(k)(a) (x − a)^k / k!.

Usually we prefer to write this as

p(x) = ∑_{k=0}^{n} [p^(k)(a) / k!] (x − a)^k, (3.3.6)

which is Taylor's formula for polynomials. In case T = ∆ and pk(x) = δk(x), we have

p(x) = ∑_{k=0}^{n} [∆^k p(0) / k!] x(x + 1) · · · (x + k − 1).

This neat formula has many applications, but we have no intention of pursuing this matter.
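The difference formula can be spot-checked numerically; in the plain-Python sketch below the helper names delta, rising and newton_value are ours, with ∆ defined as in Example 3.3.2:

```python
from math import factorial

def delta(p):
    # the difference operator of Example 3.3.2: (Δp)(x) = p(x) - p(x - 1)
    return lambda x: p(x) - p(x - 1)

def rising(x, k):
    # x(x + 1)...(x + k - 1), with the empty product 1 for k = 0
    out = 1.0
    for j in range(k):
        out *= x + j
    return out

def newton_value(p, n, x):
    # right-hand side of the formula: sum of Δ^k p(0) / k! times rising(x, k)
    total, q = 0.0, p
    for k in range(n + 1):
        total += q(0) * rising(x, k) / factorial(k)
        q = delta(q)
    return total

p = lambda x: 2 * x**3 - x + 5   # a cubic, so n = 3 suffices
for x in [-2.0, 0.5, 3.0]:
    assert abs(newton_value(p, 3, x) - p(x)) < 1e-9
```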
Now we apply Taylor's formula (3.3.6) to p(x) = x^n. We have

p^(k)(x) = n(n − 1) · · · (n − k + 1) x^{n−k},

and hence (3.3.6) gives

x^n = ∑_{k=0}^{n} [n(n − 1) · · · (n − k + 1) a^{n−k} / k!] (x − a)^k = ∑_{k=0}^{n} C(n, k) a^{n−k} (x − a)^k.

Evaluating at x = a + b, the above identity becomes

(a + b)^n = ∑_{k=0}^{n} C(n, k) a^{n−k} b^k,

which is the binomial theorem.
3.4. In the present subsection we show how to use the spectral decomposition to
compute the powers Tn and the exponential etT of T . Identity (3.1.2) tells us that
T^n = T^n P1 + T^n P2 + · · · + T^n Pr and e^{tT} = e^{tT} P1 + e^{tT} P2 + · · · + e^{tT} Pr.

So it is enough to consider the restriction Tk of T to the spectral subspace Mk for each k.
Identity (3.3.1) tells us that Tk = λkIk + Qk with Qk^{mk} = O. Thus

e^{tTk} = e^{tλk} e^{tQk} = e^{tλk} ∑_{j=0}^{mk−1} (t^j / j!) Qk^j,

in view of (1.5.2) in §1 of the present chapter and the fact that Qk^j = O for j ≥ mk. We
can see the pattern better by considering small values of mk, say mk = 1, 2, 3:

(mk = 1) e^{tTk} = e^{tλk} Ik
(mk = 2) e^{tTk} = e^{tλk} (Ik + t Qk)
(mk = 3) e^{tTk} = e^{tλk} (Ik + t Qk + (t²/2) Qk²)

For the powers Tk^n, we can use the binomial expansion to evaluate

Tk^n = (λkIk + Qk)^n = ∑_{j=0}^{mk−1} C(n, j) λk^{n−j} Qk^j.

For small values of mk, we have

(mk = 1) Tk^n = λk^n Ik
(mk = 2) Tk^n = λk^n Ik + n λk^{n−1} Qk
(mk = 3) Tk^n = λk^n Ik + n λk^{n−1} Qk + (n(n−1)/2) λk^{n−2} Qk²
Recognizing the patterns here is all we need for computing T^n and e^{tT}, as shown in
the following examples. In this course we only focus on powers of matrices.
Example 3.4.1. Find a closed formula for the powers of

A = [ 2  1 ]
    [ 0  3 ].

Solution. We find that the eigenvalues of A are λ1 = 2 and λ2 = 3, and p(A) = O,
where p(x) = (x − 2)(x − 3). So m1 = m2 = 1. Thus the powers of A have the form

A^n = 2^n X + 3^n Y

for some matrices X and Y; (here we write X, Y for P1, P2, since they are treated as
unknowns in the equations given below). Setting n = 0 and n = 1, we get

I = X + Y, A = 2X + 3Y.

Solving this matrix equation, we obtain

X = 3I − A = [ 1 −1 ]          Y = A − 2I = [ 0  1 ]
             [ 0  0 ],                      [ 0  1 ].

Our final answer is

A^n = 2^n [ 1 −1 ] + 3^n [ 0  1 ] = [ 2^n   3^n − 2^n ]
          [ 0  0 ]       [ 0  1 ]   [  0       3^n    ].
(♠ Remark: Contrary to some textbooks in linear algebra, there is no need to find eigen-
vectors to answer this kind of question. ♠)
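The closed formula can be checked against direct matrix powers (NumPy sketch, a verification aid only):

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])

def A_power(n):
    # closed formula of Example 3.4.1: A^n = 2^n X + 3^n Y
    X = np.array([[1.0, -1.0], [0.0, 0.0]])
    Y = np.array([[0.0, 1.0], [0.0, 1.0]])
    return 2.0**n * X + 3.0**n * Y

for n in range(6):
    assert np.allclose(A_power(n), np.linalg.matrix_power(A, n))
```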
Example 3.4.2. Find a closed formula for the powers of

A = [  1  2  2 ]
    [  0  2  1 ]
    [ −1  2  2 ].

Solution. The characteristic polynomial of A is found to be p(z) = (z − 1)(z − 2)².
We can check that p(A) ≡ (A − I)(A − 2I)² = O but (A − I)(A − 2I) ≠ O. Hence A is
not diagonalizable. So we write

A^n = 1^n X + 2^n Y + n 2^{n−1} Z,

where X, Y and Z are to be determined. Setting n = 0, 1, 2, we obtain I = X + Y, A =
X + 2Y + Z and A² = X + 4Y + 4Z. Using the first identity I = X + Y, the other two can be
written as A = I + Y + Z and A² = I + 3Y + 4Z. Multiplying the first of these by 4 and
subtracting the second eliminates Z and gives Y = 4A − A² − 3I, hence
X = I − Y = A² − 4A + 4I = (A − 2I)². So

X = (A − 2I)² = [ −1  2  0 ]
                [ −1  2  0 ]
                [  1 −2  0 ],

Y = I − X = [  2 −2  0 ]
            [  1 −1  0 ]
            [ −1  2  1 ],

Z = A − I − Y = [ −2  4  2 ]
                [ −1  2  1 ]
                [  0  0  0 ].

Hence

A^n = [ −1  2  0 ]       [  2 −2  0 ]             [ −2  4  2 ]
      [ −1  2  0 ] + 2^n [  1 −1  0 ] + n 2^{n−1} [ −1  2  1 ]
      [  1 −2  0 ]       [ −1  2  1 ]             [  0  0  0 ].
Example 3.4.3. Find a closed form for the nth power of

A = [ 14   4  −2 ]
    [  4  14   2 ]
    [ −2   2  17 ].

Solution. The distinct eigenvalues of A are 18 and 9. A brute force computation
shows that p(A) = O for p(x) = (x − 18)(x − 9), which is a polynomial with simple roots.
(Notice that A is a real symmetric matrix. It follows from a theorem presented in the next
chapter that A is diagonalizable and hence the minimal polynomial of A has simple roots.)
Thus we may write A^n = 18^n X + 9^n Y, where X and Y are 3×3 matrices to be determined.
Setting n = 0 and n = 1, we obtain I = X + Y and A = 18X + 9Y respectively. From
these two identities we obtain X = 9−1(A − 9I) and Y = −9−1(A − 18I). Thus we have
A^n = 18^n 9−1(A − 9I) − 9^n 9−1(A − 18I) = 2^n 9^{n−1}(A − 9I) − 9^{n−1}(A − 18I), or

[ 14   4  −2 ]^n               [  5   4  −2 ]            [ −4   4  −2 ]
[  4  14   2 ]   = 2^n 9^{n−1} [  4   5   2 ] − 9^{n−1}  [  4  −4   2 ]
[ −2   2  17 ]                 [ −2   2   8 ]            [ −2   2  −1 ].
3.5. The rest of this section will be a brief description of some advanced material in
spectral theory. Detailed arguments will be presented in the Appendices at the end of the
present chapter. Mathematically inclined students are encouraged to study them on their
own, to enjoy a truly beautiful piece of mathematics.
Let us recall some notation. Let pT(x) = (x − λ1)^{m1} (x − λ2)^{m2} · · · (x − λr)^{mr} be
the minimal polynomial of T, let Pk be the spectral projection and Mk = Pk(V) the
spectral subspace corresponding to the eigenvalue λk, and let Tk be the restriction of T to
Mk. Recall from subsection 3.1 that

P1 + P2 + · · · + Pr = I, PjPk = δjkPk, (3.5.1)

and

TPk = PkT, (T − λkI)^{mk} Pk = O. (3.5.2)

Also recall from (3.3.1) that Tk = λkIk + Qk, with Qk^{mk} = O. The last identity tells us
that λk is the only eigenvalue of Tk.
Let nk = dim Mk. The characteristic polynomial of Tk is det(xIk − Tk) = (x − λk)^{nk}.
Take a basis of Mk, say Bk = (b^{(k)}_1, b^{(k)}_2, . . . , b^{(k)}_{nk}). Putting all the Bk
together, we obtain

B = (b^{(1)}_1, . . . , b^{(1)}_{n1}, b^{(2)}_1, . . . , b^{(2)}_{n2}, . . . , b^{(r)}_1, . . . , b^{(r)}_{nr}),

which can be shown to be a basis of V. The matrix [T]B representing T relative to B is a
block diagonal matrix with the blocks [T1]B1, [T2]B2, . . . , [Tr]Br along its diagonal:

[T]B = [ [T1]B1    O     · · ·    O     ]
       [   O     [T2]B2  · · ·    O     ]
       [  ...                           ]
       [   O       O     · · ·  [Tr]Br  ].

Thus the characteristic polynomial of T is

cT(x) = det(xI − [T]B) = det(xI1 − [T1]B1) det(xI2 − [T2]B2) · · · det(xIr − [Tr]Br)
      = (x − λ1)^{n1} (x − λ2)^{n2} · · · (x − λr)^{nr}.
The positive integer nk, which is the dimension of the spectral subspace Mk and appears
as the power of the factor x − λk in the characteristic polynomial, is called the algebraic
multiplicity of λk. From the fact that Qk is a nilpotent operator on Mk and mk is the
least positive integer satisfying Qk^{mk} = O, it can be proved that mk ≤ nk. This shows that
the minimal polynomial

pT(x) = (x − λ1)^{m1} (x − λ2)^{m2} · · · (x − λr)^{mr}

is a factor of cT(x). Thus, from pT(T) = O we obtain cT(T) = O. We have arrived at the
following celebrated theorem.

Cayley–Hamilton Theorem. If p(x) is the characteristic polynomial of a linear
operator T on a finite dimensional vector space, then p(T) = O.
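The Cayley–Hamilton theorem is easy to test numerically; np.poly returns the coefficients of the characteristic polynomial of a square matrix (NumPy sketch with a random 4×4 matrix as an arbitrary choice):

```python
import numpy as np

# Cayley-Hamilton: every square matrix satisfies its characteristic polynomial.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

c = np.poly(A)     # coefficients of det(xI - A), highest power first
n = len(c) - 1
pA = sum(ck * np.linalg.matrix_power(A, n - k) for k, ck in enumerate(c))
assert np.allclose(pA, 0, atol=1e-9)   # p(A) = O
```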
The deepest (and the hardest) result in linear algebra is perhaps the Jordan canon-
ical form theorem. It says that, relative to an appropriate basis in V , the matrix [T ]
representing T is a block diagonal matrix with blocks of the form

J = [ λ  1  0  · · ·  0  0 ]
    [ 0  λ  1  · · ·  0  0 ]
    [ 0  0  λ  · · ·  0  0 ]
    [ ...                  ]
    [ 0  0  0  · · ·  λ  1 ]
    [ 0  0  0  · · ·  0  λ ]
along the diagonal, where λ is an eigenvalue of T . A matrix of the form J above is called
a Jordan block. It turns out that the dimension of the eigenspace ker(λkI − T ) is the
number of Jordan blocks with λk as its eigenvalue in [T ]. Also, mk is the maximal size
of the Jordan blocks with λk as its eigenvalue in [T ]. Thanks to the Jordan canonical
form, the structure of a linear operator on a finite dimensional complex vector space is
considered to be completely understood.
EXERCISE SET III.3.
Review Questions. What is the minimal polynomial of a linear operator T (on a finite
dimensional space)? Why is the spectrum σ(T ) equal to the set of all roots of the minimal
polynomial of T? What is the spectral decomposition of T ? Am I able to state all
important facts about spectral decomposition described in this section? How do I take
advantage of these facts to compute the powers and the exponential of T in an efficient
way (without finding eigenvectors)? What is a nilpotent operator? What is the Jordan
canonical form?
Drills
1. Find the spectral decomposition for each of the following matrices:
(a) [ 1  2 ]      (b) [ 1  1 ]      (c) [ 1   3 ]      (d) [   0    1+i ]
    [ 0  3 ],         [ 1  1 ],         [ 1  −1 ],         [ 1−i     0  ].

(e) [ 0  1  1 ]      (f) [ 0  1  0 ]      (g) [ −1  0   2 ]
    [ 0  1  1 ]          [ 0  0  0 ]          [  1  0  −2 ]
    [ 0  0  0 ],         [ 1  2  1 ],         [  0  1   2 ].
2. Given the minimal polynomial pT (x) of an operator T in each of the following cases,
determine if T is diagonalizable:
(a) pT(x) = x² + 2x (b) pT(x) = x² + 2x + 1 (c) pT(x) = x²
(d) pT(x) = x² + 1 (e) pT(x) = x² − 3x + 2
3. Use Taylor’s formula to express the polynomial 2x2 + x+ 1 in the form
a2(x− 1)2 + a1(x− 1) + a0.
Then use you answer to find the partial fraction decomposition
2x2 + x+ 1
(x− 1)3=
c1x− 1
+c2
(x− 1)2+
c3(x− 1)3
.
(In this question you are asked to find the constants a0, a1, a2, c1, c2, c3.)
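One way to check an answer to Drill 3: the Taylor coefficients of a polynomial about x = c can be read off by repeated synthetic division by (x − c), since each remainder is the value of the current quotient at c. The helper below is my own illustration in plain Python, not part of the text:

```python
# Taylor coefficients of a polynomial about x = c via repeated synthetic
# division by (x - c).  (Helper written for this note; not from the text.)
def taylor_coeffs(coeffs, c):
    """coeffs lists the polynomial highest degree first, e.g. 2x^2+x+1 -> [2, 1, 1].
    Returns [a0, a1, a2, ...] with p(x) = sum of a_j (x - c)^j."""
    coeffs = list(coeffs)
    out = []
    while coeffs:
        rem = 0
        quot = []
        for a in coeffs:
            rem = rem * c + a
            quot.append(rem)
        out.append(rem)          # remainder on division by (x - c)
        coeffs = quot[:-1]       # continue with the quotient
    return out

a0, a1, a2 = taylor_coeffs([2, 1, 1], 1)
# Sanity check: a2(x-1)^2 + a1(x-1) + a0 reproduces 2x^2 + x + 1 at several points.
assert all(a2 * (x - 1) ** 2 + a1 * (x - 1) + a0 == 2 * x * x + x + 1
           for x in range(-3, 4))
```

Dividing the rewritten numerator by (x − 1)^3 then hands you the partial fraction constants directly.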
4. True or false:
(a) The minimal polynomial of an operator divides its characteristic polynomial.
(b) If an operator T on a vector space V of dimension n has n distinct eigenvalues,
then its minimal polynomial is equal to its characteristic polynomial.
(c) If ST = TS, then the range of S is invariant for T .
(d) A projection is diagonalizable. (Recall that a projection is an operator P satis-
fying P 2 = P .)
(e) A Jordan block is diagonalizable.
5. In each of the following cases, prove that the given matrix is not diagonalizable (you
may either use your knowledge about the minimal polynomial or argue from first principles):
(a) \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix},  (b) \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix},  (c) \begin{bmatrix} 1 & i \\ i & -1 \end{bmatrix}.
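As a numerical sanity check for Drill 5 (a sketch assuming NumPy; it illustrates, but does not replace, the proofs asked for): in case (a), subtracting I leaves a nonzero matrix N with N^2 = O, so the minimal polynomial (x − 1)^2 has a repeated root; in case (b), the matrix itself is nonzero with square O.

```python
import numpy as np

# Drill 5(a): A = [[1, 1], [0, 1]].  N = A - I is nonzero but N^2 = O,
# so the minimal polynomial (x - 1)^2 has a repeated root and A cannot
# be diagonalizable.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
N = A - np.eye(2)
assert np.any(N != 0)
assert np.allclose(N @ N, 0)

# Drill 5(b): the matrix itself is nonzero with square O (compare Exercise 1).
B = np.array([[1.0, 1.0],
              [-1.0, -1.0]])
assert np.any(B != 0)
assert np.allclose(B @ B, 0)
```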
6. In each of the following cases, use the method described in the present section to find
an explicit expression for the nth power of the given matrix.
(a) \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix},  (b) \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},  (c) \begin{bmatrix} 1 & 3 \\ 1 & -1 \end{bmatrix},  (d) \begin{bmatrix} 1 & 3 \\ 4 & 2 \end{bmatrix},  (e) \begin{bmatrix} 0 & 1+i \\ 1-i & 0 \end{bmatrix},

(f) \begin{bmatrix} 0 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix},  (g) \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix},  (h) \begin{bmatrix} -1 & 0 & 2 \\ 1 & 0 & -2 \\ 0 & 1 & 2 \end{bmatrix}.
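For Drill 6(a) the answer can be cross-checked numerically. The sketch below (assuming NumPy; eigenvector scaling is left to `np.linalg.eig`) computes A^n as P D^n P^{-1} from the diagonalization and compares it with the closed form one obtains by hand:

```python
import numpy as np

# Drill 6(a): A = [[1, 2], [0, 3]] has distinct eigenvalues 1 and 3,
# so A = P D P^{-1} and A^n = P D^n P^{-1}.
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
evals, P = np.linalg.eig(A)

def nth_power(n):
    return P @ np.diag(evals ** n) @ np.linalg.inv(P)

# Working by hand gives A^n = [[1, 3^n - 1], [0, 3^n]] for this matrix.
n = 5
expected = np.array([[1.0, 3.0**n - 1.0],
                     [0.0, 3.0**n]])
assert np.allclose(nth_power(n), expected)
assert np.allclose(nth_power(n), np.linalg.matrix_power(A, n))
```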
Exercises
1. Prove that if T is an operator such that T^2 = O but T ≠ O, then T is not
diagonalizable. Try to do this without using the minimal polynomial.
2. Let T be a linear operator on a complex vector space V (not necessarily finite di-
mensional) and let λ be a complex number. (a) Show that the ranges of (T − λI)^n
(n = 1, 2, . . .) form a decreasing sequence:

(T − λI)(V) ⊇ (T − λI)^2(V) ⊇ (T − λI)^3(V) ⊇ · · ·

(b) Show that if

(T − λI)^k(V) = (T − λI)^{k+1}(V)

for some k, then (T − λI)^n(V) = (T − λI)^k(V) for all n ≥ k.
3. Let T be a linear operator on a complex vector space V (not necessarily finite di-
mensional) and let λ be a complex number. (a) Show that the kernels of (T − λI)^n
(n = 1, 2, . . .) form an increasing sequence:

ker(T − λI) ⊆ ker(T − λI)^2 ⊆ ker(T − λI)^3 ⊆ · · ·

(b) Show that if

ker(T − λI)^k = ker(T − λI)^{k+1}

for some k, then ker(T − λI)^n = ker(T − λI)^k for all n ≥ k.
4. Let T be a linear operator on a finite dimensional complex vector space V and let λ
be an eigenvalue of T . Prove that the spectral subspace corresponding to λ is equal
to the set
M_λ = {v ∈ V : (T − λI)^n v = 0 for some positive integer n}.
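The kernel chain of Exercise 3 and the set M_λ of Exercise 4 can be seen concretely on a nilpotent Jordan block (an illustrative NumPy sketch; the matrix is my own example with λ = 0): the kernel dimensions grow strictly until they stabilize, and the stable kernel is the whole spectral subspace.

```python
import numpy as np

# A single 3x3 Jordan block with eigenvalue 0.  The kernels of N, N^2, N^3
# strictly increase (dimensions 1, 2, 3) and then stabilize; the stable
# kernel is M_0, the spectral subspace for lambda = 0.
N = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
dims = [3 - np.linalg.matrix_rank(np.linalg.matrix_power(N, k))
        for k in range(1, 5)]
# dims grows 1, 2, 3 and then stays at 3 once the chain stabilizes.
```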
Appendices for Chapter III
Appendix A*: cyclicity and companion matrices
In the present appendix we establish the Jordan canonical form theorem for a linear
operator T on an n–dimensional space V with a cyclic vector v0: the vectors
v_0, Tv_0, T^2 v_0, T^3 v_0, T^4 v_0, . . .   (A1)
span the whole space V. Under this cyclicity assumption, the heavy technicality needed for study-
ing the structure of operators in the general case is avoided, while the special case still shares
many important general features. Furthermore, this special case is important in many applica-
tions. For example, in linear control theory, the state space model of a controllable system
with a single input/single output is essentially a linear operator with a cyclic vector.
Since V is finite dimensional, the vectors in (A1) above must be linearly dependent.
It turns out that n, the dimension of V, is the smallest number for which the vectors
v_0, Tv_0, . . . , T^n v_0 are linearly dependent. (Exercise A1: Prove this assertion.) Conse-
quently the vectors v_0, Tv_0, T^2 v_0, . . . , T^{n-1} v_0 form a basis B of V. The vector T^n v_0
can be expressed as a linear combination of these basis vectors, say

T^n v_0 = a_0 v_0 + a_1 Tv_0 + a_2 T^2 v_0 + \cdots + a_{n-1} T^{n-1} v_0.
The representation matrix [T ]B of T relative to this basis now can be determined:
[T]_B = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & 0 & a_0 \\
1 & 0 & 0 & \cdots & 0 & 0 & a_1 \\
0 & 1 & 0 & \cdots & 0 & 0 & a_2 \\
0 & 0 & 1 & \cdots & 0 & 0 & a_3 \\
\vdots & & & \ddots & & & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 & a_{n-2} \\
0 & 0 & 0 & \cdots & 0 & 1 & a_{n-1}
\end{pmatrix}.   (A2)
A matrix of the above form (or its transpose) is called a companion matrix. We have
seen that an operator with a cyclic vector can be represented by a companion matrix of the
form (A2) above. It turns out that the converse of this statement is also true. (Exercise
A2: Prove the last assertion.) The polynomial
p_T(x) = x^n - a_{n-1}x^{n-1} - \cdots - a_2 x^2 - a_1 x - a_0
is the minimal polynomial as well as the characteristic polynomial of T . (Exercise A3:
Prove this.) A linear operator with a cyclic vector is essentially determined by its char-
acteristic polynomial: if both S ∈ L (U) and T ∈ L (V ) have cyclic vectors and have the
same characteristic polynomial, then S and T are similar, that is, there is an isomorphism
P from U onto V such that T = PSP−1. (Exercise A4: Prove this assertion.)
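The discussion above can be tried out numerically. The sketch below (coefficients chosen purely for illustration; NumPy assumed) builds a companion matrix of the form (A2) for n = 3, confirms that e_1 is a cyclic vector, and checks that p_T annihilates the matrix and has the matrix's eigenvalues as its roots:

```python
import numpy as np

# Companion matrix of p(x) = x^3 - a2 x^2 - a1 x - a0 in the form (A2):
# a subdiagonal of 1's, coefficients a0, a1, a2 down the last column.
a0, a1, a2 = 6.0, -11.0, 6.0      # then p(x) = (x - 1)(x - 2)(x - 3)
A = np.array([[0.0, 0.0, a0],
              [1.0, 0.0, a1],
              [0.0, 1.0, a2]])

# e1 is a cyclic vector: e1, A e1, A^2 e1 span the whole space.
e1 = np.array([1.0, 0.0, 0.0])
K = np.column_stack([e1, A @ e1, A @ A @ e1])
assert np.linalg.matrix_rank(K) == 3

# p_T is both the characteristic and the minimal polynomial, so p_T(A) = O.
pA = np.linalg.matrix_power(A, 3) - a2 * (A @ A) - a1 * A - a0 * np.eye(3)
assert np.allclose(pA, 0)

# Its eigenvalues are exactly the roots 1, 2, 3 of p_T.
assert np.allclose(sorted(np.linalg.eigvals(A).real), [1.0, 2.0, 3.0])
```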
If we assume that T is nilpotent, then its characteristic polynomial becomes p(x) = x^n
and hence its representation matrix (A2) becomes

[T] = \begin{pmatrix}
0 & 0 & 0 & 0 & \cdots & 0 & 0 \\
1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & & & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 0
\end{pmatrix}.
More generally, if the spectrum of the operator T consists of a single point λ and T has a
cyclic vector v_0, then S = T − λI is nilpotent and v_0 is also a cyclic vector for S. Relative
to the basis B = {v_0, Sv_0, S^2 v_0, . . . , S^{n-1} v_0}, the matrix representing T is

[T] = \begin{pmatrix}
λ & 0 & 0 & 0 & \cdots & 0 & 0 \\
1 & λ & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & λ & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & λ & \cdots & 0 & 0 \\
\vdots & & & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & λ & 0 \\
0 & 0 & 0 & 0 & \cdots & 1 & λ
\end{pmatrix}.   (A3)

(Exercise: prove this.) A matrix of the above form is called a Jordan block.
Suppose that λ_1, λ_2, . . . , λ_r are distinct eigenvalues of T and let P_k and V_k = P_k(V)
be the spectral projection and the spectral subspace respectively corresponding to λ_k. Let
T_k be the restriction of T to V_k. Again, we assume that v_0 is a cyclic vector for T. Then
one can prove that the vector P_k v_0 in V_k is a cyclic vector for T_k. (Exercise A5: prove
this.) Notice that the spectrum σ(T_k) consists of a single point λ_k and hence we can choose
a basis B_k of V_k relative to which the representation matrix [T_k]_{B_k} is a Jordan block, say
J_k. Let B = B_1 ∪ B_2 ∪ · · · ∪ B_r. Then B is a basis of V and
[T]_B = \bigoplus_{k=1}^{r} [T]_{B_k} = \bigoplus_{k=1}^{r} J_k ≡ \begin{pmatrix}
J_1 & O & O & \cdots & O \\
O & J_2 & O & \cdots & O \\
O & O & J_3 & \cdots & O \\
\vdots & & & \ddots & \vdots \\
O & O & O & \cdots & J_r
\end{pmatrix}.
We have proved the Jordan canonical form theorem for the cyclic case.
Appendix B*: operator-valued functions and linear ODEs
Let us consider differentiation of operator–valued or matrix–valued functions of one
variable. Let V be a finite dimensional vector space and let Φ(t), Ψ(t) be linear operators
on V depending on one variable t. Assume that both of them are differentiable as functions
of t, that is, the limits

Φ′(t) = \lim_{h \to 0} \frac{1}{h}(Φ(t+h) − Φ(t))  and  Ψ′(t) = \lim_{h \to 0} \frac{1}{h}(Ψ(t+h) − Ψ(t))
exist. Then, just like the usual product rule, we have
(ΦΨ)′ = ΦΨ′ +Φ′Ψ (∗)
except that we must be very careful about the order of the factors on the right-hand side,
because in general ΦΨ and ΨΦ are not the same. The validity of (∗) can be shown in the usual way:
for h ≠ 0, we have

\frac{Φ(t+h)Ψ(t+h) − Φ(t)Ψ(t)}{h} = Φ(t+h)\frac{Ψ(t+h) − Ψ(t)}{h} + \frac{Φ(t+h) − Φ(t)}{h}Ψ(t)
and, letting h→ 0, we get the desired identity.
Suppose furthermore that Φ(t) is invertible. We expect \frac{d}{dt}Φ(t)^{-1} = −Φ(t)^{-2}Φ′(t),
but unfortunately this is incorrect! To get the correct formula, we begin by writing down
Φ(t)Ψ(t) = I, where Ψ(t) = Φ(t)^{-1}. Differentiating both sides and using the product
rule, we have ΦΨ′ + Φ′Ψ = O. Multiplying by Φ^{-1} on the left, we have Ψ′ + Φ^{-1}Φ′Ψ = O,
which gives Ψ′ = −Φ^{-1}Φ′Ψ = −Φ^{-1}Φ′Φ^{-1}. So we have

\frac{d}{dt}Φ(t)^{-1} = −Φ(t)^{-1}Φ′(t)Φ(t)^{-1}.
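A quick finite-difference experiment makes the point. The family Φ(t) below is my own illustrative choice (NumPy assumed), picked so that Φ(t) and Φ′(t) do not commute; the correct formula matches the numerical derivative while the naive guess −Φ^{-2}Φ′ does not:

```python
import numpy as np

# Numerically check d/dt Phi(t)^{-1} = -Phi^{-1} Phi' Phi^{-1} on an
# illustrative family Phi(t) that does NOT commute with its derivative.
def Phi(t):
    return np.array([[1.0, t],
                     [t, 1.0 + t * t]])   # det = 1, always invertible

def Phi_prime(t):
    return np.array([[0.0, 1.0],
                     [1.0, 2.0 * t]])

t, h = 0.7, 1e-6
inv = np.linalg.inv
numeric = (inv(Phi(t + h)) - inv(Phi(t))) / h        # finite difference
correct = -inv(Phi(t)) @ Phi_prime(t) @ inv(Phi(t))  # formula in the text
naive = -inv(Phi(t)) @ inv(Phi(t)) @ Phi_prime(t)    # tempting -Phi^{-2} Phi'

assert np.allclose(numeric, correct, atol=1e-4)
assert not np.allclose(naive, correct)               # the naive guess fails
```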
As you can tell, differentiation of an operator-valued or matrix-valued function some-
times is a very touchy business. Let me mention another difficult situation. When
AB = BA, we do have (d/dt)e^{At+B} = Ae^{At+B}, as we expect. But when AB ≠ BA,
this is no longer true; however, there is something called the Baker–Campbell–Hausdorff
formula to handle the derivative of e^{At+B}, which is too complicated to describe here.
Now we use operator-valued functions to describe a (time-dependent) linear dynamical
system. We use a vector space V to model such a system. We call it the state space
and each vector in V a state. Assume that the dynamics of the system is governed by a
differential equation of the form

\frac{dy}{dt} = A(t)y + f(t),   (B1)
where f(t) is an “input” or an external force. The corresponding homogeneous equation is

\frac{dy}{dt} = A(t)y.   (B2)
Let s be a real number that stands for “starting time” and let y0 be a vector that stands
for the “initial state”. Then, under some very general conditions, there is a unique
solution to (B2) satisfying the initial condition
y(s) = y0. (B3)
For studying this initial value problem, it is convenient to introduce the operator equation

\frac{d}{dt}Φ(t, s) = A(t)Φ(t, s)  with  Φ(s, s) = I.   (B4)
A solution to (B4) gives rise to a solution to (B2) satisfying (B3) by putting y(t) = Φ(t, s)y_0. The
operator-valued function Φ(t, s), called a flow, satisfies the following identities which sig-
nify the evolution or the dynamics of the system:
Φ(t, s)Φ(s, r) = Φ(t, r), Φ(s, s) = I (B5)
Here we mention that, if the system is time independent, or, in other words, stationary,
meaning that the “generator” A(t) in (B1) or (B2) is independent of t, then Φ(t, s) depends
only on the difference t − s and we can write Φ(t, s) = Ψ(t − s) for some operator-valued
function Ψ(t). In that case (B5) becomes
Ψ(s)Ψ(t) = Ψ(s+ t), Ψ(0) = I.
Equation (B4) can be converted into an integral equation:
Φ(t, s) = I + \int_s^t A(u)Φ(u, s)\,du.
One way to solve this is to define a sequence {Φ_n(t, s)}_{n≥0} recursively by putting
Φ_0(t, s) = I and

Φ_{n+1}(t, s) = I + \int_s^t A(u)Φ_n(u, s)\,du.

It can be proved that this sequence converges to some function Φ(t, s) which is a solution
to (B4). In the stationary case, it can be easily checked that

Ψ_n(t) = Φ_n(t, 0) = \sum_{k=0}^{n} \frac{A^k t^k}{k!},
which converges to e^{At} as n → ∞. Clearly, the sequence y_n(t) = Φ_n(t, s)y_0 (n ≥ 0)
of vector-valued functions converges to y(t) = Φ(t, s)y_0, which is the solution to (B2)
satisfying the initial condition (B3).
There is a trick to solve the initial value problem for the nonhomogeneous equation (B1),
called “variation of constants”, and it works as follows. In the solution Φ(t, s)y_0 to the
homogeneous equation, replace the “constant” y_0 by a function of t, say z(t). Thus we
are looking for a solution of the form Φ(t, s)z(t) to (B1) satisfying the initial condition
(B3). Substitute y(t) = Φ(t, s)z(t) into (B1):
\frac{dΦ(t, s)}{dt}z(t) + Φ(t, s)\frac{dz(t)}{dt} = A(t)Φ(t, s)z(t) + f(t).
In view of (B4), we have
\frac{dz(t)}{dt} = Φ(t, s)^{-1}f(t) = Φ(s, t)f(t).   (B6)
Notice that Φ(s, s)z(s) = y(s) = y0 and hence z(s) = y0. Thus (B6) gives
z(t) = y_0 + \int_s^t Φ(s, u)f(u)\,du.
So
y(t) = Φ(t, s)z(t) = Φ(t, s)y_0 + \int_s^t Φ(t, s)Φ(s, u)f(u)\,du
     = Φ(t, s)y_0 + \int_s^t Φ(t, u)f(u)\,du.
Thus the solution to (B1) satisfying the initial condition (B3) is given by

y(t) = Φ(t, s)y_0 + \int_s^t Φ(t, u)f(u)\,du.
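As a concrete check of this formula, take the simplest stationary scalar case y′ = ay + c, where Φ(t, u) = e^{a(t−u)} and the closed-form solution is known. The sketch below (constants chosen for illustration; NumPy assumed) evaluates the integral term by a trapezoid rule and compares:

```python
import numpy as np
from math import exp

# Variation-of-constants check in the stationary scalar case y' = a y + c,
# where Phi(t, u) = e^{a(t-u)} and the exact solution is
# y(t) = e^{a(t-s)} y0 + (c/a)(e^{a(t-s)} - 1).
a, c, y0, s, t = 0.5, 2.0, 1.0, 0.0, 1.5

us = np.linspace(s, t, 20001)
integrand = np.exp(a * (t - us)) * c            # Phi(t, u) f(u)
h = us[1] - us[0]
integral = h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

y_formula = exp(a * (t - s)) * y0 + integral     # formula from the text
y_exact = exp(a * (t - s)) * y0 + (c / a) * (exp(a * (t - s)) - 1.0)
assert abs(y_formula - y_exact) < 1e-6
```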
Now you can see the advantage of using linear operators for solving linear ODEs: complete
generality, presented in the simplest way!