CHAPTER III
SPECTRA OF OPERATORS
In this chapter, we investigate the central problem in linear algebra: the eigenvalue
and eigenvector problem. The importance of this problem can be understood from a purely
mathematical point of view: it is the gateway leading to our understanding of the structure
of a linear operator. It is also needed for understanding our physical world. We can tell
whether a star millions of light years away is composed mainly of hydrogen atoms by reading
through a spectroscope the “eigenvalues” of the Schrödinger operator for hydrogen in the
light it emits. When a bell shows cracks, it starts to sound dull because each eigenvalue of
a certain operator decreases. Working in natural science or engineering research,
we should always be prepared to encounter eigenvalue problems.
§1. Eigenvalues and Eigenvectors
1.1. As we know, given a linear operator T on a finite dimensional space V,
we can convert it into a matrix by using a basis of V. However, different bases give
different matrix representations of T. Naturally we wonder: can we pick a basis so that the
corresponding matrix representing T is a diagonal matrix, which is considered to be the
simplest form? Certainly, diagonal matrices are easy for computation purposes, and an answer
to this question has practical value. Suppose that a basis B of V consisting of vectors
v1, v2, . . . , vn is judiciously chosen so that the matrix representing T relative to B is
diagonal, say
[T]_B = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.
This means Tvj = λjvj , or (T −λjI)vj = 0 for all j = 1, 2, . . . , n. This identity indicates
that λj is an eigenvalue of T and vj is a corresponding eigenvector.
Definition 1.1. By the spectrum of an operator T on a finite dimensional complex
vector space V, denoted by σ(T), we mean the set of all complex numbers λ such that
T − λI is not invertible. We know that an operator on a finite dimensional vector space is
not invertible if and only if its kernel is nonzero. Therefore we may put

σ(T) = {λ ∈ C : ker(T − λI) ≠ {0}}.

A complex number λ in the spectrum σ(T) is also called an eigenvalue of T. By our
definition, if λ is an eigenvalue of T, the subspace ker(T − λI) is nonzero. This
subspace is called the eigenspace corresponding to λ. A nonzero vector in this eigenspace
is called an eigenvector of T corresponding to the eigenvalue λ.
Notice that
v ∈ ker(T − λI) ⇔ (T − λI)v = 0 ⇔ Tv = λv.
Hence we have:
λ ∈ σ(T ) if and only if Tv = λv holds for some nonzero vector v.
♠ Aside: The importance of the word “nonzero” in the above statement cannot be overem-
phasized. If it were dropped, the statement would become absolute nonsense, because
Tv = λv is always satisfied for some vector v, namely, 0. ♠
1.2. How do we find eigenvalues? Take any basis E of V and consider the matrix
representation [T − λI] := [T] − λI of T − λI relative to this basis (here we use the same
symbol I for the identity operator and the identity matrix). Now, λ is an eigenvalue
of T precisely when T − λI is not invertible; descending to matrices, this means that
[T] − λI is not invertible, or equivalently that λI − [T] is not invertible. But we know that
a matrix is not invertible if and only if its determinant is zero. Hence λ is an eigenvalue
of T if and only if it satisfies:
det(λI − [T ]) = 0. (1.2.1)
If the dimension of the space V is n, i.e. if [T ] is an n × n matrix, then (1.2.1) is a
polynomial equation in λ of degree n. It is called the characteristic equation of the
matrix [T ], or of the operator T . The expression det(λI − [T ]), which is a polynomial in
λ of degree n, (n = dimV ), is called the characteristic polynomial of the operator T .
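Numerically, the equivalence between eigenvalues and roots of the characteristic polynomial is easy to check. The following is a small sketch (assuming Python with NumPy is available, and using a made-up triangular matrix whose eigenvalues are its diagonal entries) comparing the two computations:

```python
import numpy as np

# A made-up 2x2 matrix; being triangular, its eigenvalues are the
# diagonal entries 1 and 3.
M = np.array([[1.0, 2.0],
              [0.0, 3.0]])

# np.poly(M) returns the coefficients of det(lambda*I - M),
# highest degree first; its roots are the eigenvalues.
char_coeffs = np.poly(M)
roots = np.sort(np.roots(char_coeffs).real)
eigvals = np.sort(np.linalg.eigvals(M).real)
print(roots, eigvals)  # both give [1. 3.]
```

In practice, library routines such as `np.linalg.eigvals` do not form the characteristic polynomial at all (that is numerically unstable for large matrices), but for small examples the two routes agree.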
Example 1.2.1. Define an operator T on P2 by

T(p(x)) = x p′(x) − p(x + 1).

Take the standard basis B = {1, x, x^2} of P2. To find the matrix [T]_B, we compute:

T(1) = 0 − 1 = −1,
T(x) = x − (x + 1) = −1,
T(x^2) = x(2x) − (x + 1)^2 = x^2 − 2x − 1.
Hence

[T] ≡ A = \begin{pmatrix} -1 & -1 & -1 \\ 0 & 0 & -2 \\ 0 & 0 & 1 \end{pmatrix},  and  det([T] − λI) = \begin{vmatrix} -1-\lambda & -1 & -1 \\ 0 & -\lambda & -2 \\ 0 & 0 & 1-\lambda \end{vmatrix}.
Thus the characteristic equation of T is
(−1− λ)(−λ)(1− λ) = 0
and hence the eigenvalues of T are 1, 0,−1. In other words, σ(T ) = {1, 0,−1}. Next we
find an eigenvector of T for each eigenvalue. First consider the eigenvalue λ = 1. An
eigenvector corresponding to λ = 1 is a polynomial p ≡ p(x) such that (T − λI)(p) = 0.
This gives [T − λI][p] = 0, or ([T ]− λI)[p] = 0. Here [T ] is the matrix A given above, λ is
1, and [p] is a column, say [p] = X = [x1, x2, x3]⊤. Thus we have (A− I)X = O, i.e.
\begin{pmatrix} -2 & -1 & -1 \\ 0 & -1 & -2 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
This matrix equation gives the following homogeneous system of linear equations:
(−2)x1 + (−1)x2 + (−1)x3 = 0
(−1)x2 + (−2)x3 = 0
We only need to find one nontrivial solution to this equation. This is easy to do. Set x3 = 1.
Then, from the second equation, we have x2 = (−2)x3 = −2. From the first equation,
we have x_1 = (1/2)(−x_2 − x_3) = (1/2)(−(−2) − 1) = 1/2. Thus X = [1/2, −2, 1]^⊤ is
a solution. To get a neater expression, we multiply this solution by 2 to obtain another
solution X_1 = [1, −4, 2]^⊤. The polynomial p_1(x) with X_1 = [1, −4, 2]^⊤ as its column
representation relative to the standard basis {1, x, x^2} is p_1(x) = 1 − 4x + 2x^2. (Aside: We
can check that this polynomial is indeed an eigenvector corresponding to λ = 1:

(T − I)(p_1) = x p_1′(x) − p_1(x + 1) − p_1(x)
             = x(−4 + 4x) − (1 − 4(x + 1) + 2(x + 1)^2) − (1 − 4x + 2x^2)
             = 0.  End of aside.)
In the same way, we find an eigenvector p2(x) = −1 + x corresponding to λ = 0 and
an eigenvector p3(x) = 1 corresponding to λ = −1. It is easy to see that p1(x), p2(x)
and p3(x) are linearly independent. (This fact is not accidental: in the next section
we will prove that eigenvectors corresponding to distinct eigenvalues are always linearly
independent.) Since dimP2 = 3, three linearly independent “vectors” in P2 form a basis
of P2. Therefore
B = {p1, p2, p3} ≡ {1 − 4x + 2x^2, −1 + x, 1}
is a basis of P2. From T (p1(x)) = p1(x), T (p2(x)) = 0 and T (p3(x)) = −p3(x), we get a
diagonal [T ]B:
[T]_B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}.
Thus we can say: the basis B diagonalizes the operator T .
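The diagonalization just carried out can be double-checked numerically. A small sketch (assuming Python with NumPy), using the matrix A of Example 1.2.1 and the eigenvector columns found above:

```python
import numpy as np

# Matrix of T relative to the standard basis {1, x, x^2} (Example 1.2.1).
A = np.array([[-1.0, -1.0, -1.0],
              [ 0.0,  0.0, -2.0],
              [ 0.0,  0.0,  1.0]])

# Columns are the coordinate vectors of p1 = 1-4x+2x^2, p2 = -1+x, p3 = 1.
P = np.array([[ 1.0, -1.0, 1.0],
              [-4.0,  1.0, 0.0],
              [ 2.0,  0.0, 0.0]])

# P^{-1} A P should be the diagonal matrix diag(1, 0, -1).
D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))
```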
1.3. The key to dealing with the diagonalization problem is the following
Theorem 1.3.1. If v1,v2, . . . ,vr are eigenvectors of a linear operator T on a complex
vector space corresponding to eigenvalues λ1, λ2, . . . , λr respectively and if these eigenvalues
λ1, λ2, . . . , λr are all distinct, then the vectors v1,v2, . . . ,vr are linearly independent.
We prove this theorem by induction on the number r of given eigenvectors. In the
case r = 1, we consider only one vector v1. This vector is an eigenvector of T. Since
an eigenvector is by definition a nonzero vector, the single vector v1 forms a linearly
independent set.
Now we make the induction hypothesis: the statement is true for r = k. We consider
the situation r = k + 1. Thus we are given k + 1 eigenvectors v_1, v_2, . . . , v_k, v_{k+1}
corresponding to distinct eigenvalues λ_1, λ_2, . . . , λ_k, λ_{k+1}:

Tv_1 = λ_1 v_1, . . . , Tv_k = λ_k v_k, Tv_{k+1} = λ_{k+1} v_{k+1}. (1.3.1)

To show that these k + 1 eigenvectors are linearly independent, we set

a_1 v_1 + a_2 v_2 + · · · + a_k v_k + a_{k+1} v_{k+1} = 0. (1.3.2)

We have to show that a_1, a_2, . . . , a_{k+1} all equal zero. To this end, we apply the operator T to
this identity and use the relations (1.3.1) to obtain

a_1 λ_1 v_1 + a_2 λ_2 v_2 + · · · + a_k λ_k v_k + a_{k+1} λ_{k+1} v_{k+1} = 0.

Subtract this identity from λ_{k+1} times (1.3.2), noticing that the last term on the left-hand
side cancels:

a_1(λ_1 − λ_{k+1}) v_1 + a_2(λ_2 − λ_{k+1}) v_2 + · · · + a_k(λ_k − λ_{k+1}) v_k = 0.

Since v_1, v_2, . . . , v_k are eigenvectors of T corresponding to the distinct eigenvalues
λ_1, λ_2, . . . , λ_k respectively, these k eigenvectors, by our induction hypothesis, are linearly independent.
Thus the above identity entails

a_1(λ_1 − λ_{k+1}) = 0, a_2(λ_2 − λ_{k+1}) = 0, . . . , a_k(λ_k − λ_{k+1}) = 0.

We have assumed that λ_1, λ_2, . . . , λ_k, λ_{k+1} are distinct. In particular

λ_1 − λ_{k+1} ≠ 0, λ_2 − λ_{k+1} ≠ 0, . . . , λ_k − λ_{k+1} ≠ 0.

Therefore we must have a_1 = 0, a_2 = 0, . . . , a_k = 0. It remains to show a_{k+1} = 0.
Return to (1.3.2). We can now rewrite this identity as a_{k+1} v_{k+1} = 0. Since v_{k+1} is nonzero
(because it is an eigenvector), we also have a_{k+1} = 0. The proof is complete.
Theorem 1.3.2. If a linear operator T defined on an n-dimensional complex vector
space V has n distinct eigenvalues, then T is diagonalizable, that is, there exists a basis B
consisting of eigenvectors of T so that the representing matrix [T ]B of T relative to B is a
diagonal matrix.
Let λ1, λ2, . . . , λn be the distinct eigenvalues of T and let v1, v2, . . . , vn be their corresponding
eigenvectors: Tv1 = λ1v1, Tv2 = λ2v2, . . . , Tvn = λnvn. By Theorem 1.3.1, we know
that v1, v2, . . . , vn are linearly independent. Since n = dim V, these vectors form a basis
of V. This proves the theorem.
1.4. Let A be an n× n complex matrix. We may regard A as a linear operator on
Cn; (that is, we identify A with the linear operator MA defined by MAx = Ax for all
x in Cn). Assume that A has n distinct eigenvalues λ1, λ2, . . . , λn with corresponding
eigenvectors P1, P2, . . . , Pn (which are column vectors). According to Theorem 1.3.1,
these column vectors are linearly independent. Let us write
AP1 = P1λ1, AP2 = P2λ2, . . . , APn = Pnλn.
Here, certainly Pkλk is the scalar multiple of the column vector Pk by λk. The reason
we write in this way instead of λkPk is because Pk is regarded as an n × 1 matrix and
λk as a 1 × 1 matrix. The correct order here is crucial for performing the block matrix
multiplication below. Let P be the n × n matrix [P1 P2 · · · Pn]. The matrix P is
invertible, since it is a square matrix and its columns are linearly independent. Now
AP = A[P1 P2 · · · Pn] = [AP1 AP2 · · · APn]
   = [P1λ1 P2λ2 · · · Pnλn] = [P1 P2 · · · Pn] \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} = PD.

Thus AP = PD, where D is the diagonal matrix with the eigenvalues of A along its diagonal;
equivalently A = PDP^{-1}, which gives a diagonalization of the matrix A.
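In code, `np.linalg.eig` produces exactly the P and D of this factorization. A sketch (assuming Python with NumPy, and a made-up matrix with distinct eigenvalues so that P is invertible):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 0.0]])  # eigenvalues 2 and -1, so diagonalizable

# eig returns the eigenvalues and a matrix whose k-th column is an
# eigenvector for the k-th eigenvalue: the P in A = P D P^{-1}.
eigvals, P = np.linalg.eig(A)
D = np.diag(eigvals)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```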
Example 1.4.1. In this example we show how diagonalization helps for solving linear
differential equations. We are asked to find a general solution to the system of equations
dy_1/dt = y_1 + y_2,   dy_2/dt = 2y_1.

We can rewrite this system as

dy/dt = Ay  with  y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix},  A = \begin{pmatrix} 1 & 1 \\ 2 & 0 \end{pmatrix}.
Using the method described in Example 1.2.1, we find the eigenvalues 2, −1 of A with
corresponding eigenvectors P_1 = (1, 1), P_2 = (1, −2). Let

P = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}  and  D = \begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix}  with  P^{-1} = -\frac{1}{3}\begin{pmatrix} -2 & -1 \\ -1 & 1 \end{pmatrix}.
Then we have AP = PD, as we can check directly. Replacing A in dy/dt = Ay by
PDP^{-1}, we have dy/dt = PDP^{-1}y, or d(P^{-1}y)/dt = DP^{-1}y. Let w = P^{-1}y. Then
dw/dt = Dw and y = Pw. Thus we have
dw_1/dt = 2w_1,   dw_2/dt = -w_2,   with  y_1 = w_1 + w_2,  y_2 = w_1 - 2w_2.

The new system of differential equations is easy to solve: w_1 = C_1 e^{2t}, w_2 = C_2 e^{-t}. Our
final answer is

y_1 = C_1 e^{2t} + C_2 e^{-t},   y_2 = C_1 e^{2t} - 2C_2 e^{-t}.
(The reader should check this answer.)
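One way to carry out that check is numerical. The sketch below (assuming Python with NumPy, and with two arbitrarily chosen constants C1, C2) compares a finite-difference derivative of the claimed general solution with Ay:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 0.0]])
C1, C2 = 0.7, -1.3   # arbitrary constants for the check

def y(t):
    # General solution found by diagonalization in Example 1.4.1.
    return np.array([C1*np.exp(2*t) + C2*np.exp(-t),
                     C1*np.exp(2*t) - 2*C2*np.exp(-t)])

# Compare dy/dt (central difference) with A y at a sample time.
t, h = 0.5, 1e-6
dydt = (y(t + h) - y(t - h)) / (2*h)
print(np.allclose(dydt, A @ y(t), atol=1e-4))  # True
```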
Example 1.4.2. Consider the system of difference equations
u_{n+1} = u_n + v_n,   v_{n+1} = 2u_n;   n ≥ 0.

We can rewrite this system as

y_{n+1} = Ay_n  with  y_n = \begin{pmatrix} u_n \\ v_n \end{pmatrix},  A = \begin{pmatrix} 1 & 1 \\ 2 & 0 \end{pmatrix}.
We have y_1 = Ay_0, y_2 = Ay_1 = A^2 y_0, y_3 = Ay_2 = A^3 y_0, etc. In general, y_n = A^n y_0.
So, in order to find the general solution to this system of difference equations, we need to
give an explicit expression for A^n. Using the method described in Example 1.2.1, we find
A = PDP^{-1}, where

P = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix},  D = \begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix}  and  P^{-1} = -\frac{1}{3}\begin{pmatrix} -2 & -1 \\ -1 & 1 \end{pmatrix}.
Now
A^n = (PDP^{-1})^n = PDP^{-1} \cdot PDP^{-1} \cdots PDP^{-1} = PD^nP^{-1}.
So

A^n = -\frac{1}{3}\begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}\begin{pmatrix} 2^n & 0 \\ 0 & (-1)^n \end{pmatrix}\begin{pmatrix} -2 & -1 \\ -1 & 1 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 2^{n+1} + (-1)^n & 2^n - (-1)^n \\ 2^{n+1} - 2(-1)^n & 2^n + 2(-1)^n \end{pmatrix}.
Thus [u_n\ v_n]^⊤ = y_n = A^n y_0 gives

u_n = \frac{2^{n+1} + (-1)^n}{3}\, u_0 + \frac{2^n - (-1)^n}{3}\, v_0,
v_n = \frac{2^{n+1} - 2(-1)^n}{3}\, u_0 + \frac{2^n + 2(-1)^n}{3}\, v_0.
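The closed form for A^n can be verified against repeated matrix multiplication. A sketch (assuming Python with NumPy):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 0.0]])

def A_power(n):
    # Closed form obtained from A^n = P D^n P^{-1} in Example 1.4.2.
    return (1.0 / 3.0) * np.array(
        [[2**(n + 1) + (-1)**n,    2**n - (-1)**n],
         [2**(n + 1) - 2*(-1)**n,  2**n + 2*(-1)**n]])

ok = all(np.allclose(A_power(n), np.linalg.matrix_power(A, n))
         for n in range(1, 10))
print(ok)  # True
```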
Example 1.4.3. An elevator in Herzberg Building has two states: s1 for “working”
and s2 for “out of order”. Let pij denote the probability of being in state i on the next day,
when today’s elevator is in state j. A student trying to take this elevator every day comes
up with the following subjective probabilities after a year of observation: p_{11} = 0.5,
p_{21} = 0.5, p_{12} = 0.1, p_{22} = 0.9. Let

P = \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix} = \begin{pmatrix} 0.5 & 0.1 \\ 0.5 & 0.9 \end{pmatrix}.
To find the long-run frequency with which the elevator is in working condition, we need to
compute lim_{n→∞} P^n. Following the method described in Example 1.2.1, we find the
eigenvalues 1, 0.4 with corresponding eigenvectors (1, 5), (1, −1). Then we have P = SDS^{-1}, where

S = \begin{pmatrix} 1 & 1 \\ 5 & -1 \end{pmatrix},  D = \begin{pmatrix} 1 & 0 \\ 0 & 0.4 \end{pmatrix}  and  S^{-1} = \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 5 & -1 \end{pmatrix}.
As n → ∞,

P^n = SD^nS^{-1} = \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 5 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0.4^n \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 5 & -1 \end{pmatrix}

tends to

\frac{1}{6}\begin{pmatrix} 1 & 1 \\ 5 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 5 & -1 \end{pmatrix} = \begin{pmatrix} 1/6 & 1/6 \\ 5/6 & 5/6 \end{pmatrix}.
So, on average, the elevator is in working condition about 1/6 of the time. This seems to
fit the student’s experience over the year.
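The limit can also be seen by simply raising P to a high power. A sketch (assuming Python with NumPy):

```python
import numpy as np

# Transition matrix from Example 1.4.3 (columns sum to 1).
P = np.array([[0.5, 0.1],
              [0.5, 0.9]])

# Since the second eigenvalue is 0.4, the term 0.4^n dies out quickly
# and P^n approaches the limit found by diagonalization.
limit = np.array([[1/6, 1/6],
                  [5/6, 5/6]])
print(np.allclose(np.linalg.matrix_power(P, 50), limit))  # True
```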
1.5. We have seen that, if A = PDP^{-1}, then A^n = PD^nP^{-1}. More generally, if
p is a polynomial, then p(A) = P p(D) P^{-1}. This suggests defining f(A) for any
function f (defined on the spectrum σ(A) of A) by putting f(A) = P f(D) P^{-1}, where
f(D) = \begin{pmatrix} f(\lambda_1) & 0 & \cdots & 0 \\ 0 & f(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(\lambda_n) \end{pmatrix}  for  D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.
Computing f(A) is called the functional calculus of A. Besides polynomials, another
commonly used function for the functional calculus is f_t(x) = e^{xt}, where t is a parameter. We
will write f_t(A) as e^{At} from now on.
Consider the initial value problem dy/dt = Ay with y(0) = y_0, where y_0 is a given
vector in C^n. Formally we can write down the solution as

y(t) = e^{At} y_0. (1.5.1)
The 1-dimensional case is well known: the solution of the initial value problem dy/dt = ay with
y(0) = y_0 is y(t) = e^{at} y_0. It is known that the Taylor expansion of the exponential function is
e^{at} = \sum_{n=0}^{\infty} a^n t^n / n!. Analogously we have
e^{At} = \sum_{n=0}^{\infty} \frac{A^n}{n!} t^n = I + At + \frac{A^2}{2!}t^2 + \frac{A^3}{3!}t^3 + \cdots . (1.5.2)
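The series (1.5.2) can be summed numerically and compared with the functional-calculus value P e^{Dt} P^{-1}. A sketch (assuming Python with NumPy, and a diagonalizable matrix A so that the eigenvector route applies):

```python
import numpy as np

def expm_series(A, t, terms=30):
    # Truncation of the series (1.5.2); adequate for small matrices
    # and moderate |t|.
    total = np.zeros_like(A)
    term = np.eye(A.shape[0])
    for n in range(terms):
        total = total + term          # term == (A t)^n / n!
        term = term @ A * t / (n + 1)
    return total

def expm_eig(A, t):
    # Functional calculus e^{At} = P e^{Dt} P^{-1}, assuming A diagonalizable.
    w, P = np.linalg.eig(A)
    return (P @ np.diag(np.exp(w * t)) @ np.linalg.inv(P)).real

A = np.array([[1.0, 1.0],
              [2.0, 0.0]])
print(np.allclose(expm_series(A, 0.5), expm_eig(A, 0.5)))  # True
```

Production code would use a dedicated routine (e.g. SciPy's `expm`) rather than the raw series, which can lose accuracy for matrices with large norm.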
Example 1.5.1. From Example 1.4.2 we see that

A^n = \frac{1}{3}\begin{pmatrix} 2 \cdot 2^n + (-1)^n & 2^n - (-1)^n \\ 2 \cdot 2^n - 2(-1)^n & 2^n + 2(-1)^n \end{pmatrix}  where  A = \begin{pmatrix} 1 & 1 \\ 2 & 0 \end{pmatrix}.
So, from (1.5.2) we find that

e^{At} = \frac{1}{3}\begin{pmatrix} 2\sum \frac{2^n}{n!}t^n + \sum \frac{(-1)^n}{n!}t^n & \sum \frac{2^n}{n!}t^n - \sum \frac{(-1)^n}{n!}t^n \\ 2\sum \frac{2^n}{n!}t^n - 2\sum \frac{(-1)^n}{n!}t^n & \sum \frac{2^n}{n!}t^n + 2\sum \frac{(-1)^n}{n!}t^n \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 2e^{2t} + e^{-t} & e^{2t} - e^{-t} \\ 2e^{2t} - 2e^{-t} & e^{2t} + 2e^{-t} \end{pmatrix}.

Here we simply write \sum for \sum_{n=0}^{\infty}. Hence we have

\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = y = e^{At}y_0 = \frac{1}{3}\begin{pmatrix} (2e^{2t} + e^{-t})\,y_1(0) + (e^{2t} - e^{-t})\,y_2(0) \\ (2e^{2t} - 2e^{-t})\,y_1(0) + (e^{2t} + 2e^{-t})\,y_2(0) \end{pmatrix}.
The reader should compare this answer with the general solution in Example 1.4.1, namely
y_1 = C_1 e^{2t} + C_2 e^{-t}, y_2 = C_1 e^{2t} - 2C_2 e^{-t}. Setting t = 0, we have y_1(0) = C_1 + C_2 and
y_2(0) = C_1 - 2C_2, which gives C_1 = (2y_1(0) + y_2(0))/3, C_2 = (y_1(0) - y_2(0))/3 and hence

y_1 = \frac{2e^{2t} + e^{-t}}{3}\, y_1(0) + \frac{e^{2t} - e^{-t}}{3}\, y_2(0),
y_2 = \frac{2e^{2t} - 2e^{-t}}{3}\, y_1(0) + \frac{e^{2t} + 2e^{-t}}{3}\, y_2(0).

This agrees with the present answer.
In some cases, e^{At} can be obtained directly by using (1.5.2).
Example 1.5.2. Find e^{At} in each of the following cases:

(a) A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}   (b) A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}.

Solution. (a) Direct computation shows A^n = O for n ≥ 2. So (1.5.2) gives

e^{At} = I + At = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}.

(b) Direct computation shows A^n = A for all n ≥ 1. So (1.5.2) gives

e^{At} = I + A\sum_{n=1}^{\infty}\frac{t^n}{n!} = I + A(e^t - 1) = (I - A) + e^tA = \begin{pmatrix} e^t & e^t - 1 \\ 0 & 1 \end{pmatrix}.
EXERCISE SET III.1.
Review Questions. What is the spectrum σ(T) of a linear operator T (defined on a
finite dimensional complex vector space V)? What are the numbers in σ(T) called? How
does one find σ(T) by using the characteristic polynomial? What does the word “eigenspace”
mean? What is the diagonalization problem? Why is it intimately related to the eigenvalue
problem? What is the significance for diagonalization of the fact which says, roughly, that
eigenvectors corresponding to distinct eigenvalues are linearly independent? How do we
use this fact to conclude that a linear operator defined on an n-dimensional complex space
with n distinct eigenvalues is diagonalizable?
Drills
1. Find the eigenvalues and their corresponding eigenvectors for each of the following
matrices:
(a) \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}  (b) \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}  (c) \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}  (d) \begin{pmatrix} 1 & 3 \\ 4 & 2 \end{pmatrix}  (e) \begin{pmatrix} 0 & 1+i \\ 1-i & 0 \end{pmatrix}.
2. Find the eigenvalues and their corresponding eigenvectors for each of the following
matrices (here a, b, p, q, z are arbitrary complex numbers and θ is a real number):

(a) \begin{pmatrix} a & b \\ b & a \end{pmatrix}  (b) \begin{pmatrix} p & 1-p \\ q & 1-q \end{pmatrix}  (c) \begin{pmatrix} 0 & z \\ z & 0 \end{pmatrix}  (d) \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}.
3. In each of the following cases, for the given matrix A, find an invertible matrix P such
that P^{-1}AP is a diagonal matrix.

(a) \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}  (b) \begin{pmatrix} 2 & 3 \\ 4 & 1 \end{pmatrix}  (c) \begin{pmatrix} 0 & 3+4i \\ 3-4i & 0 \end{pmatrix}  (d) \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}
4. Find the eigenvalues and their corresponding eigenvectors for each of the following
matrices. (If the corresponding eigenspace has dimension > 1, you should find a basis
for this eigenspace.)

(a) \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{pmatrix}  (b) \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}  (c) \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
5. Find the eigenvalues and their corresponding eigenvectors for the linear operator T
on V in each of the following cases:

(a) V = P2, T(p(x)) = x p′(x).
(b) V = M2,2 (the space of all 2 × 2 matrices), T(A) = A^⊤ (the transpose of A).
(c) V = R^3, T(x) = ω × x (cross product), where ω is a fixed unit vector.
6. In each of the following cases, find the 2× 2 matrix A.
(a) 0 and 1 are eigenvalues of MA with corresponding eigenvectors [0 1]⊤ and [i 1]⊤
respectively.
(b) [1 i]⊤ is an eigenvector of MA with corresponding eigenvalue i, and the first
column of A is [1 1]⊤.
7. True or false:
(a) Real matrices have real eigenvalues.
(b) If λ, µ are eigenvalues of n× n matrices A and B respectively, then λ + µ must
be an eigenvalue of A+B.
(c) If λ is an eigenvalue of a linear operator T , then, for each scalar a, λ − a is an
eigenvalue of T − aI.
(d) The sum of two diagonalizable linear operators is diagonalizable.
(e) The product of two diagonalizable linear operators is diagonalizable.
(f) If a diagonalizable operator is invertible, then its inverse is also diagonalizable.
(g) If a linear operator T is diagonalizable, then so is its square T 2.
8. In each of the following cases, find a basis relative to which the representation matrix
of the given operator T on V is diagonal, if possible:
(a) V = P2, T(p(x)) = p(0) + p(2)x^2.
(b) V = P2, T(p(x)) = p(0) + p′(1)x + p(2)x^2.
(c) V = M2,2 (the space of 2 × 2 matrices), T(X) = AX, where A = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}.
9. In each of the following cases, use the formula e^{At} = \sum_{n=0}^{\infty} (At)^n/n! to find e^{At}, where
A is the given matrix.

(a) \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}  (b) \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}  (c) \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix}  (d) \begin{pmatrix} 1 & a \\ 0 & 0 \end{pmatrix}  (e) \begin{pmatrix} 2 & 0 \\ 2 & 0 \end{pmatrix}

(f) D = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}  (g) D = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
Exercises
1. Let T be a linear operator on a complex vector space V (not necessarily finite dimen-
sional) and let λ be a complex number. Prove the following statements:
(a) If λ is not an eigenvalue of T, then T − λI is injective.
(b) If λ^2 is an eigenvalue of T^2, then either λ or −λ is an eigenvalue of T.
2. Find the characteristic polynomial of the matrix

A = \begin{pmatrix} 0 & 0 & 0 & -a_0 \\ 1 & 0 & 0 & -a_1 \\ 0 & 1 & 0 & -a_2 \\ 0 & 0 & 1 & -a_3 \end{pmatrix}.

Also show that, if α is an eigenvalue of A, then [1, α, α^2, α^3]^⊤ is an eigenvector of A^⊤.
3. In each of the following cases, use the formula e^{At} = \sum_{n=0}^{\infty} (At)^n/n! to find e^{At} for
the given matrix A (where a is any nonzero constant).

(a) A = \begin{pmatrix} a & 1 \\ 0 & a \end{pmatrix},  (b) A = \begin{pmatrix} a & 1 \\ 0 & 0 \end{pmatrix},  (c) A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.

(Hint for (a): write A = aI + N with N^2 = O. Hint for (b): write A = aP with
P^2 = P.)
4. Solve the eigenvalue problem for the following circulant matrix:

C = \begin{pmatrix} a_0 & a_1 & a_2 & a_3 \\ a_3 & a_0 & a_1 & a_2 \\ a_2 & a_3 & a_0 & a_1 \\ a_1 & a_2 & a_3 & a_0 \end{pmatrix},

where a_0, a_1, a_2, a_3 are arbitrary complex numbers. (As you may recognize, the
difficulty of this problem lies in the fact that the numbers a_0, a_1, a_2, a_3 are not
specifically given.) Hint: Consider the special case P, obtained from C by
setting a_0 = 0, a_1 = 1, a_2 = 0 and a_3 = 0. Notice that C = a_0 I + a_1 P + a_2 P^2 + a_3 P^3.
5. Let T ∈ L(V) be a diagonalizable operator, that is, suppose there is a basis of V consisting of
eigenvectors of T, say v1, v2, . . . , vn. Prove that, if v := v1 + v2 + · · · + vn is also an
eigenvector of T, then T is a scalar multiple of the identity operator, i.e. T = λI for
some scalar λ.
§2. SUMMATION NOW! (And Change of Basis)
2.1 The present section has a technical aspect different from other sections: the heavy
dose of the summation symbol \sum. The summation symbol is used widely in the
science and engineering literature (such as research papers and technical reports), and
skill in handling it is a “must” for any professional scientist or engineer. In what follows
you will find detailed explanations of many steps involving the summation symbol, to
help you understand the mental process of juggling this symbol.
The mathematical expression \sum_{k=1}^{n} a_k is read as “the sum of all a_k, where k runs
from 1 to n”. When k runs from 1 to n, a_k goes through the following list of symbols

a_1, a_2, . . . , a_n (2.1)

and hence \sum_{k=1}^{n} a_k stands for the sum a_1 + a_2 + · · · + a_n. The letter k in the
expression \sum_{k=1}^{n} a_k is said to be dummy because you can replace it by another letter,
say j. Indeed, \sum_{j=1}^{n} a_j is the same as \sum_{k=1}^{n} a_k, because both of them represent
the sum a_1 + a_2 + · · · + a_n. This is not a frivolous remark because, as you will see, in
some circumstances changing letters is absolutely necessary.
Besides changing letters, there are other ways to write the sum \sum_{k=1}^{n} a_k, such as
\sum_{k=0}^{n-1} a_{k+1}. To see that they are the same sum, we rewrite \sum_{k=1}^{n} a_k as \sum_{j=1}^{n} a_j and then
set j = k + 1. Notice that a_j becomes a_{k+1} and, as j runs from 1 to n, k = j − 1 runs from
0 to n − 1. Also notice that we can rewrite \sum_{k=0}^{n} a_k as a_0 + \sum_{k=1}^{n} a_k. The following
two identities are self-evident:

\sum_{k=1}^{n} c\,a_k = c \sum_{k=1}^{n} a_k,    \sum_{k=1}^{n} (a_k + b_k) = \sum_{k=1}^{n} a_k + \sum_{k=1}^{n} b_k.

Avoid by all means bad mistakes such as putting \sum_{k=1}^{n} a_k b_k = \left(\sum_{k=1}^{n} a_k\right)\left(\sum_{k=1}^{n} b_k\right), which
is terribly wrong.
Example 2.1.1. What is the value of \sum_{k=0}^{n} c, where c is a constant? What is the
value of \sum_{k=0}^{n} (−1)^k?

Solution. The sum \sum_{k=0}^{n} c is obtained by adding a number of c’s. The crucial thing
is to count how many c’s there are. Notice that there are n + 1 integers from 0 to n. So we
have \sum_{k=0}^{n} c = (n + 1)c. The sum \sum_{k=0}^{n} (−1)^k is 1 − 1 + 1 − 1 + · · · + (−1)^n. If the last
term (−1)^n is −1, which occurs exactly when n is odd, then all terms cancel and the
answer is 0; otherwise the sum is 1. We conclude that \sum_{k=0}^{n} (−1)^k is 1 when n is even,
and 0 when n is odd.
We will encounter “double summations” of the form

\sum_{j,k=1}^{n} a_{jk}. (2.1.2)

This is read as “the sum of all a_{jk}, where j and k run independently from 1 to n”.
We can rewrite it as one of the following:

\sum_{j=1}^{n} \sum_{k=1}^{n} a_{jk},    \sum_{k=1}^{n} \sum_{j=1}^{n} a_{jk}. (2.1.3)

When n = 2, the last two expressions are (a_{11} + a_{12}) + (a_{21} + a_{22}) and (a_{11} + a_{21}) + (a_{12} + a_{22})
respectively, which are certainly equal. More generally, we have the following identity:

\sum_{j=1}^{m} \sum_{k=1}^{n} a_{jk} = \sum_{k=1}^{n} \sum_{j=1}^{m} a_{jk}. (2.1.4)

The validity of this identity is explained as follows. Both sides of this identity are the sum
of all entries a_{jk} of the matrix [a_{jk}]. The only difference is in the way the sum is performed:
one side takes the row sums first and then adds up all the row sums; the other takes the
column sums first. Common sense tells us that they lead to the same total. Thus, we can
switch the order of summation over j and summation over k, provided j and k are indices running
independently.
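Identity (2.1.4) is easy to confirm in code. A plain-Python sketch with a made-up rectangular array of numbers:

```python
# Summing the entries of an m-by-n array row-first or column-first
# gives the same total -- identity (2.1.4).
m, n = 3, 4
a = [[10 * j + k for k in range(1, n + 1)] for j in range(1, m + 1)]

row_first = sum(sum(a[j][k] for k in range(n)) for j in range(m))
col_first = sum(sum(a[j][k] for j in range(m)) for k in range(n))
print(row_first == col_first)  # True
```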
2.2. We go back to check the identity (2.7.1) at the end of §2 in Chapter I, which
says, Relative to a basis B = {b1, b2, . . . , bn} of V , for S, T ∈ L (V ), a, b ∈ F and
v ∈ V , we have
[aS + bT ] = a[S] + b[T ], [ST ] = [S][T ], [Tv] = [T ][v]. (2.2.1)
We begin by verifying the last identity [Tv] = [T][v]. Let us put

[v] = [v_1 v_2 · · · v_n]^⊤,  [Tv] = [u] = [u_1 u_2 · · · u_n]^⊤  and  [T] = [t_{ij}]_{1 ≤ i,j ≤ n}.

According to our definitions of the column representation of a vector and the matrix representation
of a linear map, we have

v = \sum_{j=1}^{n} v_j b_j,   Tv = \sum_{i=1}^{n} u_i b_i   and   T b_j = \sum_{i=1}^{n} t_{ij} b_i. (2.2.2)
Notice that our choice of letters for indices in (2.2.2) allows an immediate substitution.
Now
\sum_{i=1}^{n} u_i b_i = Tv = T\left( \sum_{j=1}^{n} v_j b_j \right) = \sum_{j=1}^{n} v_j\, T b_j
= \sum_{j=1}^{n} v_j \sum_{i=1}^{n} t_{ij} b_i = \sum_{j=1}^{n} \sum_{i=1}^{n} v_j t_{ij} b_i
= \sum_{i=1}^{n} \sum_{j=1}^{n} t_{ij} v_j b_i = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} t_{ij} v_j \right) b_i. (2.2.3)
♠ Aside: Here we use various elementary properties of summation to manipulate complicated sums. We have used

v_j \sum_{i=1}^{n} t_{ij} b_i = \sum_{i=1}^{n} v_j t_{ij} b_i.

This is legitimate because only i is running in taking the sum, while the index j is independent
of i and hence can be regarded as fixed if you want. So, essentially we are using
the identity a \sum_{i=1}^{m} b_i = \sum_{i=1}^{m} a b_i here, which is clearly true. For the same reason we can
pull b_i out of a sum in which j is running:

\sum_{j=1}^{n} t_{ij} v_j b_i = \left( \sum_{j=1}^{n} t_{ij} v_j \right) b_i.

We have also switched the order of the two summation symbols \sum_{i=1}^{n} and \sum_{j=1}^{n} in the above
manipulation. This is possible – in fact we have the following general identity:

\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} = \sum_{j=1}^{n} \sum_{i=1}^{n} a_{ij}.

(See (2.1.3) above.) ♠
Both sides of the identity (2.2.3) are linear combinations of vectors b1,b2, . . . ,bn,
which form a basis of V . Hence the corresponding coefficients of these two linear combi-
nations are the same:
u_i = \sum_{j=1}^{n} t_{ij} v_j,   1 ≤ i ≤ n.
We can collect the above identities and put them into the matrix form:
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} = \begin{pmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ t_{21} & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & & \vdots \\ t_{n1} & t_{n2} & \cdots & t_{nn} \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix},
which is just [u] = [T ][v].
Next we check [aS + bT ] = a[S] + b[T ] and [ST ] = [S][T ]. For each vector x in V ,
[aS + bT ][x] = [(aS + bT )(x)] = [aS(x) + bT (x)]
= a[S(x)] + b[T (x)] = a[S][x] + b[T ][x] = (a[S] + b[T ])[x].
Since an arbitrary column X with n entries can always be put in the form [x] for some
vector x in V, we must have [aS + bT] = a[S] + b[T], in view of the following
Fact. m× n matrices A and B are equal if AX = BX holds for each column X.
This fact can be quickly checked as follows. Suppose AX = BX for all X . Let X =
[1, 0, 0, . . . , 0]⊤. Then AX and BX are the first columns of A and B respectively. So
AX = BX tells us that the first columns of A and B are equal. The same argument shows
that other columns are equal.
Similarly, we have [ST ][x] = [(ST )(x)] = [S(Tx)] = [S][Tx] = [S][T ][x], from which
it follows that [ST ] = [S][T ].
2.3. Let V be a finite dimensional vector space and let E and F be two bases of V.
What is the connection between the column representations [v]_E and [v]_F relative to these
two bases, for an arbitrary vector v in V? In other words, what is the effect on the column
representation of a vector when the basis is changed? To answer this question, we have
to be more specific about these two bases: let E consist of vectors e_1, e_2, . . . , e_n and let F
consist of f_1, f_2, . . . , f_n. Take an arbitrary vector v in V. Let

[v]_E = (v_1, v_2, . . . , v_n)  and  [v]_F = (w_1, w_2, . . . , w_n).

These are the column representations of the same vector v relative to the two different bases
E and F. Recalling what is meant by the column representation of a vector, we write

v = v_1 e_1 + v_2 e_2 + · · · + v_n e_n = w_1 f_1 + w_2 f_2 + · · · + w_n f_n.

Again we use summation symbols to streamline this:

v = \sum_j v_j e_j = \sum_k w_k f_k;

it is understood that both j and k run from 1 to n. Suppose, for each j with 1 ≤ j ≤ n,
the column representation of e_j relative to F is given by [e_j]_F = (p_{1j}, p_{2j}, . . . , p_{nj}).
Similarly, for each k with 1 ≤ k ≤ n, we suppose [f_k]_E = (q_{1k}, q_{2k}, . . . , q_{nk}). Thus
e_j = \sum_k p_{kj} f_k and f_k = \sum_j q_{jk} e_j. We have set up everything. It is time to compute:
v = \sum_j v_j e_j = \sum_j v_j \sum_k p_{kj} f_k = \sum_j \sum_k v_j p_{kj} f_k
= \sum_k \sum_j p_{kj} v_j f_k = \sum_k \left( \sum_j p_{kj} v_j \right) f_k.
(Aside: By now you should be capable of understanding what is going on in the above
manipulation involving double summation.) Recall that we also have v = \sum_k w_k f_k. Use
this to replace the left-hand side of the previous identity:

\sum_k w_k f_k = \sum_k \left( \sum_j p_{kj} v_j \right) f_k.
Since F = {f_1, f_2, . . . , f_n} is a basis, this identity entails

w_k = \sum_j p_{kj} v_j, (2.3.1)

which is called the “change of basis formula”. It can be rewritten in matrix form as

[v]_F = [p_{ij}][v]_E,  or  [v]_F = P[v]_E, (2.3.2)

where P = [p_{ij}] is of course the n × n matrix with p_{ij} as its (i, j)-entry. (♠ Aside: This
is the identity telling us the relation between the column representations of v relative to
the different bases E and F. But it is not easy to apply, unless you are a highly organized
person with great skill in bookkeeping. Fortunately, only the existence of P concerns us. ♠)
Reversing the roles of the v’s and w’s, exchanging j and k, and replacing p by q, we obtain
something similar:

v_j = \sum_k q_{jk} w_k,  or  [v]_E = Q[v]_F, (2.3.3)

where Q = [q_{ij}]. We need a new letter ℓ for the index k to rewrite the above identity as
v_j = \sum_ℓ q_{jℓ} w_ℓ. Then we substitute this into (2.3.1) to get

w_k = \sum_j p_{kj} v_j = \sum_j p_{kj} \sum_ℓ q_{jℓ} w_ℓ = \sum_j \sum_ℓ p_{kj} q_{jℓ} w_ℓ
= \sum_ℓ \sum_j p_{kj} q_{jℓ} w_ℓ = \sum_ℓ \left( \sum_j p_{kj} q_{jℓ} \right) w_ℓ.

This identity holds for all w’s and hence we must have

\sum_j p_{kj} q_{jℓ} = δ_{kℓ} ≡ \begin{cases} 1, & k = ℓ, \\ 0, & \text{otherwise}. \end{cases} (2.3.4)

We can rewrite (2.3.4) in matrix form: PQ = I_n. By reversing the roles of the p’s and q’s,
we obtain in the same way that QP = I_n. This shows that P is an invertible matrix with
[v]_F = P[v]_E and hence [v]_E = P^{-1}[v]_F, which is the same as (2.3.3) above.
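The change of basis formula can be tried out numerically. A sketch (assuming Python with NumPy, and two made-up bases of R^3 stored as matrix columns):

```python
import numpy as np

# Columns of E and F are the basis vectors e_j and f_k of two bases of R^3.
E = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
F = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

v = np.array([2.0, -1.0, 3.0])
vE = np.linalg.solve(E, v)   # coordinates of v relative to E
vF = np.linalg.solve(F, v)   # coordinates of v relative to F

# The change of basis matrix has columns [e_j]_F, i.e. P = F^{-1} E.
P = np.linalg.solve(F, E)
print(np.allclose(vF, P @ vE))  # True: [v]_F = P [v]_E
```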
2.4. Next we consider the effect on the matrix representation of an operator T on V
due to a change of basis. We retain the above notation, such as E and F. We have to find
a relation between [T ]E and [T ]F , the matrix representations of the same linear operator
T relative to two different bases E and F .
Take an arbitrary vector x in V and let y be its image under T : y = T (x). Relative
to the basis E , we have the matrix representation [T ] E for the operator T and the column
representations [x]E and [y] E respectively for the vectors x and y. Recall from §2.2 that
they are related by [y]E = [T ] E [x]E . If we replace the basis E by F , in the same way we
have [y]F = [T ]F [x]F . By the above discussion about change of basis, we know that there
is an invertible matrix P such that [x]F = P [x] E . We have the following four identities:
[x]F = P [x] E , [y]F = P [y]E , [y]E = [T ] E [x] E , [y]F = [T ]F [x]F .
What can we do with them? Well, something natural: substitute the first and the second
into the fourth identity to obtain P[y]_E = [T]_F P[x]_E. Then substitute the third identity into
the left-hand side of this identity to arrive at P[T]_E [x]_E = [T]_F P[x]_E. The last identity
holds for every column X ≡ [x]_E. Therefore we must have P[T]_E = [T]_F P (by the Fact
stated in §2.2). As P is an invertible matrix, this can be rearranged as

[T]_F = P[T]_E P^{-1}.
Definition. We say that two n×n matrices A and B are similar, or A is similar to
B, and we write A ∼ B, if there is an invertible n× n matrix P such that A = PBP−1.
(Aside: In this definition we may replace the last identity by A = P^{-1}BP, because it is
just a matter of a change in notation: replace P by P^{-1}.) By means of this definition, we
can summarize the above discussion as follows:
Theorem 2.4.1. Representing matrices of a linear operator relative to different bases
are similar.
2.5. In general it is rather difficult to tell if two matrices are similar. You cannot
tell by observing how “similar” they look. For example, the following two matrices look
completely different, yet they are similar:
A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},   B = \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}.

In fact, a direct computation shows PAP^{-1} = B, where

P = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix},  and hence  P^{-1} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}.
Another example: the matrices

C = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},   D = \begin{pmatrix} 0 & 88 \\ 0 & 0 \end{pmatrix}

are similar. You can check that QCQ^{-1} = D, where Q is the diagonal matrix with 88, 1 as its
diagonal entries. To tell whether two matrices are similar in general, we use “canonical forms” and, to do
this, we begin by solving some eigenvalue problems.
Similar matrices can be regarded as representing matrices of the same operator (relative
to different bases), and hence they are regarded as “essentially” the same. For the
rest of the present section we discuss some basic aspects of similarity of matrices and their
implications for operators.
Notice that similarity is an equivalence relation. This means it obeys the following
three laws of equivalence:
(S1) A ∼ A. (Similarity is reflexive.)
(S2) If A ∼ B, then B ∼ A. (Similarity is symmetric.)
(S3) If A ∼ B and B ∼ C, then A ∼ C. (Similarity is transitive.)
Here A,B,C are square matrices of the same size, say n × n. The first law says that
every matrix is similar to itself. This is obvious: you do have A = PAP−1 where P = I.
The second law is also clear, because you can rewrite a relation like A = PBP−1 as
B = QAQ−1 where Q = P−1. The proof of (S3) is a bit more interesting. Suppose A ∼ B
and B ∼ C, that is A = PBP−1 and B = QCQ−1 for some invertible matrices P and
Q. Then A = PBP−1 = P (QCQ−1)P−1 = (PQ)C(Q−1P−1) = (PQ)C(PQ)−1. Hence
A = RCR−1, where R = PQ is an invertible matrix. Therefore A ∼ C.
Similarity preserves algebraic operations of matrices. Suppose A ∼ B. Then we also
have 2A+A3 ∼ 2B+B3. Why? Well, by our assumption, we have A = PBP−1 for some
invertible matrix P . So
2A+A3 = 2PBP−1 + PBP−1PBP−1PBP−1
= 2PBP−1 + PB3P−1 = P (2B +B3)P−1.
More generally, if p(x) is a polynomial, we can substitute x by a square matrix, say
A, to form a new matrix p(A). For example, the matrix 2A + A3 is just q(A), where
q(x) = 2x + x3. Of course there is nothing special about the polynomial q(x) = 2x + x3
and the following statement holds for an arbitrary polynomial p(x):
If A ∼ B, then p(A) ∼ p(B).
A quick example: the matrices

A = [ 1  0 ]          B = [ 0  1 ]
    [ 0  0 ]   and        [ 0  0 ]
are not similar. Why? Well, from A² = A ≠ O and B² = O, we see that A² and B² are
not similar; (notice that only O can be similar to O). By our discussion here, A and B cannot
be similar; (otherwise A² and B² would be similar).
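The fact that similarity survives polynomial substitution is easy to check numerically. Below is a small NumPy sketch (a verification aid only, not part of the development), using the similar pair A, B and the matrix P from 2.5 together with the polynomial q(x) = 2x + x³:

```python
import numpy as np

# The similar pair from 2.5: P A P^{-1} = B.
A = np.array([[1.0, 1.0], [1.0, 1.0]])
B = np.array([[0.0, 0.0], [0.0, 2.0]])
P = np.array([[1.0, -1.0], [1.0, 1.0]])
Pinv = np.linalg.inv(P)

def q(M):
    # q(x) = 2x + x^3, applied to a square matrix
    return 2 * M + np.linalg.matrix_power(M, 3)

assert np.allclose(P @ A @ Pinv, B)        # A ~ B
assert np.allclose(P @ q(A) @ Pinv, q(B))  # q(A) ~ q(B), via the same P
```

Note that the same P implements the similarity for every polynomial, exactly as in the computation above.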
2.6. Similar matrices have the same determinant:
A ∼ B implies det(A) = det(B).
To prove this significant fact, we need a basic property of the determinant function det(·):
for n×n matrices C and D, we have det(CD) = det(C) det(D). (Aside: You may describe
this property in words: the determinant function is multiplicative.) Now assume that A
and B are similar matrices, say A = PBP−1 for some invertible matrix P . Then
det(A) = det(PBP−1) = det(P ) det(B) det(P−1)
= det(P ) det(P−1) det(B) = det(PP−1B) = det(B).
So A and B have the same determinant.
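Both the multiplicative property of det and the resulting similarity invariance can be spot-checked numerically; the random matrices below are an arbitrary choice for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))   # almost surely invertible
A = P @ B @ np.linalg.inv(P)      # A is similar to B by construction

# det is a similarity invariant
assert np.isclose(np.linalg.det(A), np.linalg.det(B))

# the multiplicative property behind the proof: det(CD) = det(C) det(D)
C, D = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
assert np.isclose(np.linalg.det(C @ D), np.linalg.det(C) * np.linalg.det(D))
```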
What is the upshot of all this? Well, it allows us to define the determinant of a
linear operator (instead of matrix) legitimately. Given an operator, say T , on a (finite
dimensional) vector space V . How should we define its determinant? Well, the natural
definition is to take a basis E of V , look at the matrix representation [T ] E relative to this
basis, and call the determinant of the matrix [T ] E to be the determinant of T . But V
has many different bases; which one should we pick? There is no clear choice. So pick
any one. Now the problem is: if I pick a basis E and call det([T ] E ) the determinant of T ,
and you pick another basis, say F , and call det([T ]F ) the determinant of T , will it ever
happen that my determinant of T differs from yours? This is something we really have to
worry about. Fortunately, such discrepancy never occurs. The reason is, even though our
matrices are different, they are similar and hence they have the same determinant. Now
we can legitimately define the determinant det(T ) of a linear operator T by putting
det(T ) = det([T ]),
where [T ] is the matrix representation of T relative to any basis. The determinant function
for matrices is a similarity invariant, or simply an invariant. This means that the
determinant of a matrix is a quantity unchanged when this matrix is replaced by a similar
one. Figuratively speaking, a matrix representation of an operator is like a disguise. A
different disguise changes the look of an operator. But an invariant is like a characteristic
trait of a person, impossible to hide under any disguise.
Besides the determinant function, there is another important similarity invariant for
matrices, called the trace function. By definition, the trace of a square matrix A, denoted
by tr(A) or just tr A, is simply the sum of its diagonal entries. A quick example:
tr [ 1  2 ] = 1 + 4 = 5.
   [ 3  4 ]
♠ Aside: This seems to be rather cheap. In mathematics it is very unlikely that you can
obtain something significant at no cost. Now we would like to show that similar matrices
have the same trace. This is by no means a trivial matter. The effort to prove this is
the price we have to pay. One might try to prove this by establishing an identity for the
trace function similar to the one for determinant function, namely tr(AB) = tr(A) tr(B)
and then follow the same argument as above for the determinant function. But, alas! This
identity is not true. In fact, it is terribly wrong, even under the most favorable condition
that both A and B are diagonal matrices! The saving grace is the following theorem. ♠
Theorem 2.6.1. For n× n matrices A and B, tr(AB) = tr(BA).
We prove this theorem as follows. Let A = [aij] and B = [bij]. Then the (i, j) entry
of C ≡ AB is given by cij = ∑_k aik bkj. Hence the sum of the diagonal entries of C = AB,
giving us the trace of AB, is

tr(AB) = ∑_i cii = ∑_i ∑_k aik bki.

Similarly, the trace of BA is tr(BA) = ∑_i ∑_k bik aki. The question is: how do we see that the
two "double sums" above are equal? Of course you can "expand" them and check. This
method is fool-proof but quite tedious. A better way is: in the second double sum, swap
the names of the subscripts (i is renamed as k and k is renamed as i) and then swap the
summation signs:

tr(BA) = ∑_i ∑_k bik aki = ∑_k ∑_i bki aik = ∑_i ∑_k aik bki = tr(AB).
Now we return to the proof of the fact that similar matrices have the same trace. Indeed,
if B = PAP−1, then
tr(B) = tr(PAP−1) = tr((PA)P−1) = tr(P−1(PA)) = tr(A).
Since the trace function is a similarity invariant for matrices, we can define the trace of
an operator T by putting tr(T ) = tr([T ]), where [T ] is a matrix representation of T . We
should mention that the trace function is linear: tr(aA + bB) = a tr(A) + b tr(B). This
follows directly from the definition of trace.
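These properties of the trace are also easy to test numerically; a NumPy sketch (random matrices are an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

# tr(AB) = tr(BA), even though AB != BA in general
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# hence trace is a similarity invariant: tr(P A P^{-1}) = tr(A)
P = rng.standard_normal((5, 5))   # almost surely invertible
assert np.isclose(np.trace(P @ A @ np.linalg.inv(P)), np.trace(A))

# trace is linear
assert np.isclose(np.trace(2 * A + 3 * B), 2 * np.trace(A) + 3 * np.trace(B))
```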
We have introduced two invariants for a linear operator, namely, the determinant and
the trace. More invariants will be considered in the next chapter, such as the spectrum
and the characteristic polynomial. (♠ Aside: Do not belittle the trace function! It is
widely used in diverse areas such as group characters, which have bearings on chemical
properties of crystals. It has been recently generalized immensely, giving rise to the theory
of so-called “noncommutative integration” and “tracial states” in the algebraic approach
to quantum mechanics. ♠.)
Let T be a linear operator defined on a finite dimensional vector space V . Suppose
that T is diagonalizable, that is, there is a basis B in V consisting of eigenvectors
of T , say vj (1 ≤ j ≤ n) with Tvj = λjvj . Then [T ]B is a diagonal matrix with
λ1, λ2, . . . , λn as its diagonal entries. According to the definition of the trace and the
determinant of an operator, we have
tr(T) = λ1 + λ2 + · · · + λn, det(T) = λ1λ2 · · ·λn.
Actually, the above identities still hold without assuming T diagonalizable. Thus, given a
linear operator on a finite dimensional complex vector space, its trace is the sum of all its
eigenvalues and its determinant is the product of all its eigenvalues (counting multiplici-
ties). For example, the trace and the determinant of
A = [ 1  2 ]
    [ 4  3 ]
are tr(A) = 4 and det(A) = −5 respectively. A quick guess gives (−1) + 5 = 4 and
(−1)× 5 = −5. So the eigenvalues of A are −1, 5. Another example: for
B = [ 1  −i ]
    [ i   1 ]
we have tr(B) = 2 and det(B) = 0 respectively. A quick guess gives 0+2 = 2 and 0×2 = 0.
So the eigenvalues of B are 0, 2.
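The trace/determinant bookkeeping in these two examples can be confirmed with NumPy (a sanity-check sketch only):

```python
import numpy as np

# Sum of eigenvalues = trace, product = determinant (counting multiplicities).
A = np.array([[1.0, 2.0], [4.0, 3.0]])
w = np.linalg.eigvals(A)
assert np.isclose(w.sum(), np.trace(A))        # (-1) + 5 = 4
assert np.isclose(w.prod(), np.linalg.det(A))  # (-1) * 5 = -5
assert np.allclose(np.sort(w), [-1.0, 5.0])

B = np.array([[1.0, -1.0j], [1.0j, 1.0]])
wB = np.linalg.eigvals(B)
assert np.allclose(np.sort(wB.real), [0.0, 2.0])
```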
EXERCISE SET III.2.
Review Questions. What does it mean to say that two matrices are similar? Why do
we care about the concept of similarity between matrices? Why are the determinant and the
trace of a matrix similarity invariants? Am I still intimidated by the summation
sign? From now on, can I handle it with ease?
Drills
1. Simplify each of the following expressions:
(a) ∑_{k=1}^{n} ak + ∑_{j=1}^{n} (b − aj).
(b) ∑_{k=1}^{n} (ak − ak+1).
(c) ∑_{k=1}^{n} (ak − ak−1).
(d) ∑_{k=1}^{n} (ak − ak+1)(ak + ak+1).
(e) ∑_{k=−n}^{n} k.
2. In each of the following cases, the matrix of a linear operator T on V relative to a
basis B = {b1, b2} is given, and the representing columns [e1]B and [e2]B (relative to
B) of the vectors in the basis E = {e1, e2} are also given. Find the representing matrix
[T]E of T relative to E in each case.

(a) [e1]B = [1, 1]⊤, [e2]B = [1, −1]⊤, and [T]B = [ 1  1 ]
                                                  [ 1  1 ].

(b) E is the same as the one in (a), and [T]B = [  1  1 ]
                                                [ −1  1 ].

(c) [e1]B = [i, 1]⊤, [e2]B = [1, i]⊤, and [T]B = [ 1   i ]
                                                 [ i  −1 ].

3. Same as the previous question, but with [b1]E, [b2]E given instead of [e1]B, [e2]B.

(a) [b1]E = [1, 0]⊤, [b2]E = [1, 1]⊤, and [T]B = [ 1  1 ]
                                                 [ 1  1 ].

(b) [b1]E = [1, 1]⊤, [b2]E = [i, −i]⊤, and [T]B = [  0  i ]
                                                  [ −i  0 ].
4. True or false (S and T are operators on some vector space V and A is a square matrix):
(a) det(ST ) = det(S) det(T ). (b) det(S + T ) = det(S) + det(T ).
(c) tr(S + T ) =tr(S)+tr(T ). (d) tr(ST ) =tr(S)tr(T ).
(e) det(A⊤) = det(A). (f) tr(A⊤) =tr(A).
5. True or false (A,B,C and D are n× n matrices):
(a) If A is similar to B and if C is similar to D, then AC is similar to BD.
(b) If A is similar to B, then A2 is similar to B2.
(c) If A is similar to B, then A⊤ is similar to B⊤.
(d) If A is invertible and B is similar to A, then B is also invertible.
(e) If A is similar to the identity matrix I, then A = I.
Exercises
1. Show that the matrices

A = [ 1  1 ]          B = [ 1  2 ]
    [ 0  2 ]   and        [ 0  2 ]

are similar by finding an invertible matrix P such that PAP−1 = B.
2. Consider the 2-dimensional complex vector space V of functions spanned by sinx and
cosx. For a fixed real number α, define a linear operator T ≡ Tα on V by putting
T (f(x)) = f(x + α). Find the matrices [T ]B and [T ]E of T relative to the bases
B = {cosx, sinx} and E = {cosx + i sinx, cosx − i sinx}. Find an invertible matrix
P implementing the similarity between [T ]B and [T ] E : [T ]B = P [T ]E P−1.
3. Show that, if A = [aij ] is an n× n matrix, then
tr(A⊤A) = ∑_{j,k=1}^{n} a_{jk}².
4. Show that the n × n identity matrix I cannot be written in the form AB − BA for
some n× n matrices A and B. (Hint: Use a basic property of tr(·).)
5. Let A and B be n× n matrices. (a) Show that, if A is invertible, then AB is similar
to BA. (b) Show that AB and BA may not be similar in general.
6. Criticize the following "proof" of the statement "if a (square) matrix A is similar to
(1/2)A, then A = O". "Proof": For simplicity, we write B ∼ C for B is similar to C.
Notice that, if B ∼ C, then (1/2)B ∼ (1/2)C. So, from A ∼ (1/2)A we obtain
(1/2)A ∼ (1/4)A. In the same way, we get (1/4)A ∼ (1/8)A, etc. Hence A ∼ (1/2^n)A
for every positive integer n. Letting n → ∞, we get A ∼ O, from which it follows that A = O.
7. Let A and B be n × n matrices with B invertible. Simplify the expression

∑_{k=1}^{n} A^k (A − B−1) B^k.
8. Use the summation sign to give a careful inductive proof of the binomial formula
(a + b)^n = ∑_{k=0}^{n} C(n, k) a^{n−k} b^k.
§3. Basic Spectral Theory
3.1. Spectral Theory is considered "the heart of the matter" in linear algebra. Here
we only present some basic aspects of this theory, which are adequate for most applications.
The full theory, not described here, includes the Jordan canonical form (see Appendix C),
which is substantially more difficult.
Let T be a linear operator on a finite dimensional vector space V over the complex
field C with dim V = n. As we know, L(V), the space of all linear
operators on V, is also finite dimensional, with dim L(V) = n². Thus, letting N be any
integer with N ≥ n², the N + 1 operators I, T, T², . . . , T^N must be linearly dependent.
Hence there exist complex numbers a0, a1, . . . , aN , not all zeroes, such that
a0I + a1T + · · · + aNTN = O.
Let p(x) = a0 + a1x + · · · + aNxN . Then p(x) is a nonzero polynomial such that
p(T ) = O. We have proved that, given a linear operator on a finite dimensional space,
there is a nonzero polynomial p(x) such that p(T ) = O.
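This linear-dependence argument can be carried out concretely: flatten the powers I, T, . . . , T^{n²} into columns and take a null-space vector of the resulting matrix. A NumPy sketch (the SVD-based null-space extraction is one convenient choice, not the text's method):

```python
import numpy as np

# I, T, ..., T^{n^2} are n^2 + 1 vectors in an n^2-dimensional space,
# hence linearly dependent; a null-space vector of the matrix whose
# columns are the flattened powers gives coefficients with p(T) = O.
rng = np.random.default_rng(2)
n = 3
T = rng.standard_normal((n, n))

powers = [np.linalg.matrix_power(T, j) for j in range(n * n + 1)]
M = np.stack([Pj.ravel() for Pj in powers], axis=1)   # n^2 x (n^2 + 1)

coeffs = np.linalg.svd(M)[2][-1]   # last right-singular vector: null space
pT = sum(a * Pj for a, Pj in zip(coeffs, powers))
assert np.allclose(pT, 0, atol=1e-6)   # p(T) = O
```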
Among all nonzero polynomials p(x) satisfying p(T ) = O, we pick the one with
the smallest degree, say m, with the leading coefficient one, and denote it by pT (x).
Now, suppose that q(x) is any nonzero polynomial such that q(T ) = O. Divide q(x) by
pT (x) to get q(x) = Q(x)pT (x)+r(x), where r(x) is the remainder, which is a polynomial
of degree less than m, the degree of pT (x). Now q(T ) = Q(T )pT (T ) + r(T ). From
q(T ) = O and pT (T ) = O we get r(T ) = O. Since the degree of r(x) is less than that of
pT (x), it is necessarily the zero polynomial; (otherwise it would be a nonzero polynomial
of lower degree satisfying r(T ) = O, contradicting our choice of pT (x)). Thus we have
q(x) = Q(x)pT (x). In the future we will call pT (x) the minimal polynomial of T . We
have proved that any polynomial q(x) satisfying q(T ) = O is a multiple of the minimal
polynomial pT (x) of T .
Let λ1, λ2, . . . , λr be all the roots of pT(x), with multiplicities m1, m2, . . . , mr respec-
tively. Thus pT(x) = (x − λ1)^{m1} (x − λ2)^{m2} · · · (x − λr)^{mr} and

(T − λ1I)^{m1} (T − λ2I)^{m2} · · · (T − λrI)^{mr} = pT(T) = O.

Let q1(x) = (x − λ1)^{m1−1} (x − λ2)^{m2} · · · (x − λr)^{mr}. Since the degree of q1(x) is less than
the degree of pT(x), we must have q1(T) ≠ O. Hence there exists a vector u in V such
that v ≡ q1(T)u ≠ 0. Since (x − λ1)q1(x) = pT(x), we have

(T − λ1I)v = (T − λ1I)q1(T)u = pT(T)u = 0.
This shows that λ1 is an eigenvalue of T . In the same way, we can show λk for any k ≤ r
is an eigenvalue of T . We have proved that the roots of the minimal polynomial of a linear
operator on a finite dimensional complex vector space are eigenvalues of T . In particular,
we have proved that eigenvalues do exist for such an operator. (♠ Remark: If we work in
a field other than C, say R, eigenvalues may not exist.♠)
Next, we use an idea in the proof of Proposition 3.4.1 in Chapter I to investigate
the so-called spectral decomposition of T. Let fk(x) be the polynomial obtained from
pT(x) = (x − λ1)^{m1} (x − λ2)^{m2} · · · (x − λr)^{mr} by deleting the factor (x − λk)^{mk}. Thus we
have (x − λk)^{mk} fk(x) = pT(x) for all k. Clearly, the polynomials f1(x), f2(x), . . . , fr(x)
do not have any common root. So they are coprime polynomials. Hence there exist
polynomials g1(x), g2(x), . . . , gr(x) such that

f1(x)g1(x) + f2(x)g2(x) + · · · + fr(x)gr(x) = 1. (3.1.1)

Let pk(x) = fk(x)gk(x), so that p1(x) + p2(x) + · · · + pr(x) = 1. Let Pk = pk(T). The
above identity gives p1(T) + p2(T) + · · · + pr(T) = I, or

P1 + P2 + · · · + Pr = I. (3.1.2)
Notice that, when j ≠ k, pT(x) is a factor of fj(x)fk(x) and hence

fj(T)fk(T) = O, (3.1.3)

which gives PjPk = fj(T)gj(T)fk(T)gk(T) = O. Multiply both sides of (3.1.2) by Pk.
The right-hand side becomes Pk. The left-hand side is the sum ∑_{j=1}^{r} PjPk, in which the
term PjPk vanishes if j ≠ k. Hence this sum becomes Pk², which gives Pk² = Pk. We
conclude

Pk² = Pk, and PkPj = O if k ≠ j. (3.1.4)
Notice that Pk can be expressed as a polynomial in T, namely Pk = pk(T). From T pk(T) =
pk(T) T we get TPk = PkT, or, putting it in words, Pk commutes with T. The following
observation is crucial: (T − λkI)^{mk} Pk = (T − λkI)^{mk} pk(T) = (T − λkI)^{mk} fk(T)gk(T) =
pT(T)gk(T) = O. We make another short summary:

TPk = PkT, (T − λkI)^{mk} Pk = O. (3.1.5)
The relations (3.1.2), (3.1.4) and (3.1.5) all together give us the spectral decomposition
of T where Pk is called the spectral projection corresponding to the eigenvalue λk. From
the computational point of view, the important step of obtaining the spectral decompo-
sition is to find the polynomials gk(x) so that (3.1.1) holds. This can be done by finding
partial fractions of the rational function 1/pT (x), as shown in the following examples.
Example 3.1.1. Consider the operator T = MA on R3 (Tx = Ax), where

A = [ 1  2  3 ]
    [ 0  1  1 ]
    [ 0  0  2 ].

The characteristic polynomial of A is p(x) = (x − 1)²(x − 2). The partial fraction decom-
position of 1/p(x) is

1/((x − 1)²(x − 2)) = 1/(x − 2) − 1/(x − 1) − 1/(x − 1)² = 1/(x − 2) − x/(x − 1)².

(A basic working knowledge of partial fractions is assumed here.) So 1 = (x − 1)² − x(x − 2).
Thus, for the eigenvalues λ1 = 1 and λ2 = 2 we let f1(x) = x − 2 and f2(x) = (x − 1)²,
so that g1(x) = −x and g2(x) = 1. Correspondingly, P1 = f1(A)g1(A) = −A(A − 2I)
and P2 = f2(A)g2(A) = (A − I)². After some matrix computation, we arrive at

P1 = [ 1  0  −5 ]          P2 = [ 0  0  5 ]
     [ 0  1  −1 ]               [ 0  0  1 ]
     [ 0  0   0 ],              [ 0  0  1 ].

As you can check, P1² = P1, P2² = P2, P1P2 = P2P1 = O, P1 + P2 = I, (A − I)²P1 = O,
and (A − 2I)P2 = O.
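All the asserted properties of these projections can be verified with NumPy. In the sketch below, P1 = −A(A − 2I) denotes the spectral projection belonging to the eigenvalue 1 and P2 = (A − I)² the one belonging to the eigenvalue 2 (a verification aid only):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 2.0]])
I = np.eye(3)

P1 = -A @ (A - 2 * I)    # spectral projection for eigenvalue 1
P2 = (A - I) @ (A - I)   # spectral projection for eigenvalue 2

assert np.allclose(P1 + P2, I)
assert np.allclose(P1 @ P1, P1) and np.allclose(P2 @ P2, P2)
assert np.allclose(P1 @ P2, 0) and np.allclose(P2 @ P1, 0)
assert np.allclose((A - I) @ (A - I) @ P1, 0)   # (A - I)^2 P1 = O
assert np.allclose((A - 2 * I) @ P2, 0)         # (A - 2I) P2 = O
```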
Example 3.1.2. Consider the operator T = MA on R3, where

A = [ 0  2  3 ]
    [ 0  1  1 ]
    [ 0  0  2 ].

The characteristic polynomial of A is p(x) = x(x − 1)(x − 2). The partial fraction decom-
position of 1/p(x) is

1/(x(x − 1)(x − 2)) = (1/2)/x − 1/(x − 1) + (1/2)/(x − 2).

So 1 = (1/2)(x − 1)(x − 2) − x(x − 2) + (1/2)x(x − 1). Thus P1 = (1/2)(A − I)(A − 2I),
P2 = −A(A − 2I) and P3 = (1/2)A(A − I). Actual computation shows

P1 = [ 1  −2  −1/2 ]      P2 = [ 0  2  −2 ]      P3 = [ 0  0  5/2 ]
     [ 0   0    0  ]           [ 0  1  −1 ]           [ 0  0   1  ]
     [ 0   0    0  ],          [ 0  0   0 ],          [ 0  0   1  ].

As you can check, PjPk = δjkPj, P1 + P2 + P3 = I, AP1 = O, AP2 = P2, AP3 = 2P3.
3.2. We go back to our discussion of a linear operator T on V with minimal polynomial
pT(x) = (x − λ1)^{m1} (x − λ2)^{m2} · · · (x − λr)^{mr}. In a lucky situation, all of the exponents mk
are equal to 1; in other words, the minimal polynomial is of the form

pT(x) = (x − λ1)(x − λ2) · · · (x − λr),

where λk (1 ≤ k ≤ r) are the distinct roots of pT(x), and thus pT(x) has simple roots (that is,
no root repeats itself). Then (3.1.5) says that (T − λkI)Pk = O for all k. This
identity tells us that the range Pk(V) of Pk is contained in the eigenspace of T corresponding
to the eigenvalue λk. From (3.1.2) we see that each vector v can be expressed as a sum

v = ∑_{k=1}^{r} Pk v,

where Pk v is either zero or an eigenvector of T. This tells us that T is diagonalizable, that
is, V has a basis consisting of eigenvectors; (a more detailed argument is given in Appendix
B). The converse is also true: if T is diagonalizable, then its minimal polynomial pT(x) has
simple roots. Indeed, if b1, b2, . . . , bn is a basis of eigenvectors and λ1, λ2, . . . , λr
are the distinct eigenvalues of T, then each bj is annihilated by T − λkI for some k and
hence (T − λ1I)(T − λ2I) · · · (T − λrI)bj = 0. Since the vectors bj (1 ≤ j ≤ n) span V,

(T − λ1I)(T − λ2I) · · · (T − λrI)v = 0

for all v. Thus p(T) = O, where p(x) = (x − λ1)(x − λ2) · · · (x − λr) is a polynomial with
simple roots. We have proved:
Theorem 3.2.1. A linear operator T defined on a finite dimensional complex
vector space is diagonalizable if and only if its minimal polynomial is of the form pT(x) =
(x − λ1)(x − λ2) · · · (x − λr), where λ1, λ2, . . . , λr is the set of distinct eigenvalues of T.
According to this theorem, to see whether an operator T is diagonalizable, we can take the
following two steps: first, find all distinct eigenvalues λ1, λ2, . . . , λr of T; second, form
the polynomial p(x) = (x − λ1)(x − λ2) · · · (x − λr) and check whether p(T) = O holds. If
p(T) = O, the answer is yes; if p(T) ≠ O, the answer is no.
Example 3.2.1. Find the condition on a, b, c such that the matrix

A = [ 1  0  a ]
    [ b  2  c ]
    [ 0  0  2 ]

is diagonalizable.

Solution. We find that the characteristic polynomial of A is (x − 1)(x − 2)² and hence
the distinct eigenvalues are 1, 2. Form p(x) = (x − 1)(x − 2). Then

p(A) = [ 0  0  a ] [ −1  0  a ]   [ 0  0    0   ]
       [ b  1  c ] [  b  0  c ] = [ 0  0  ab+c ]
       [ 0  0  1 ] [  0  0  0 ]   [ 0  0    0   ],

which is the zero matrix if and only if ab + c = 0. Thus A is diagonalizable if and only if
ab + c = 0.
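The condition ab + c = 0 can be spot-checked numerically; the helper name p_of_A below is ours, not the text's (NumPy sketch):

```python
import numpy as np

def p_of_A(a, b, c):
    # p(x) = (x - 1)(x - 2) evaluated at the matrix of Example 3.2.1
    A = np.array([[1.0, 0.0, a], [b, 2.0, c], [0.0, 0.0, 2.0]])
    I = np.eye(3)
    return (A - I) @ (A - 2 * I)

assert np.allclose(p_of_A(a=1.0, b=2.0, c=-2.0), 0)      # ab + c = 0: diagonalizable
assert not np.allclose(p_of_A(a=1.0, b=2.0, c=0.0), 0)   # ab + c = 2: not diagonalizable
```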
If we know that T satisfies p(T ) = O for a polynomial p(x) with simple roots, then T is
diagonalizable. This is because the minimal polynomial pT (x) is a factor of p(x) and hence
it also has simple roots.
Example 3.2.2. An operator P is called a projection if P 2 = P . A projection P
is diagonalizable, since P 2 = P tells us that p(P ) = O, where p(x) = x2 − x = x(x − 1),
which is a polynomial with simple roots. Next, suppose that T is an operator satisfying
T^m = I for some positive integer m; then T is diagonalizable, because p(T) = O is satisfied
with p(x) = x^m − 1, and p(x) has simple roots: indeed,

x^m − 1 = (x − 1)(x − ω)(x − ω²) · · · (x − ω^{m−1}),

with m distinct roots 1, ω, ω², . . . , ω^{m−1}, where ω = e^{2πi/m}.
3.3. We continue with the general discussion of the spectral decomposition of T and
keep the notation used in subsection 3.1. LetMk be the range of the spectral projection Pk:
Mk = Pk(V ). We call Mk the spectral subspace of T corresponding to the eigenvalue
λk. The identity TPk = PkT (see (3.1.5)) tells us that Mk is invariant for T . Indeed a
vector in Mk := Pk(V ) has the form Pkv and T (Pkv) = Pk(Tv), showing that T (Pkv) is
also in Mk. Thus T sends vectors in Mk to vectors in Mk. So we can define an operator
Tk on Mk by putting Tk v = T v for all v in Mk. (The operator Tk defined in this
way is called the restriction of T to Mk.) Let Qk = Tk − λkIk, where Ik is the identity
operator on Mk. Then

Tk = λkIk + Qk and Qk^{mk} = O, (3.3.1)

according to (3.1.5). We call a linear operator Q a nilpotent operator if some
power of it vanishes, that is, Q^m = O for some positive integer m. Thus Qk here is a nilpotent
operator on Mk for each k. So, by means of the spectral decomposition, the problem
about the structure of a general operator is boiled down to the one about the structure of
a general nilpotent operator.
Example 3.3.1. Consider the operator D on the space Pn (of polynomials of degree
at most n) defined by D(p(x)) = p′(x), the derivative of p(x). We have D^{n+1} = O, which
tells us that D is nilpotent. This can be seen from the fact that D reduces the degree
of a nonconstant polynomial by one and sends constant polynomials to zero. Take any
constant a and let

τk(x) = (x − a)^k / k!,    k = 0, 1, 2, . . . , n.

(By convention, 0! = 1 and hence τ0(x) = 1. The Greek letter τ is pronounced as "tau".
We choose this letter because of its association with "Taylor".) Notice that the degree of
τk(x) is k. From this fact it is not hard to deduce that τ0(x), τ1(x), τ2(x), . . . , τn(x) form
a basis of Pn, say T. Notice that, for k ≥ 1,

D(τk(x)) = d/dx [(x − a)^k / k!] = k(x − a)^{k−1} / k! = (x − a)^{k−1} / (k − 1)! = τk−1(x).

This shows that the matrix [D]T of D relative to T is given by

N = [ 0  1  0  · · ·  0  0 ]
    [ 0  0  1  · · ·  0  0 ]
    [ ...                  ]
    [ 0  0  0  · · ·  0  1 ]
    [ 0  0  0  · · ·  0  0 ]    (3.3.2)

One can check directly that N^{n+1} = O.
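In coordinates, (3.3.2) is just the matrix with ones on the superdiagonal, and its nilpotency can be checked directly (NumPy sketch; the size n = 4 is an arbitrary choice):

```python
import numpy as np

# Matrix of D on P_n relative to the basis tau_k(x) = (x - a)^k / k!:
# ones on the superdiagonal, zeros elsewhere.
n = 4
N = np.diag(np.ones(n), k=1)   # (n + 1) x (n + 1)

assert np.allclose(np.linalg.matrix_power(N, n + 1), 0)   # N^{n+1} = O
assert not np.allclose(np.linalg.matrix_power(N, n), 0)   # but N^n != O
```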
Example 3.3.2. Next we study the "difference operator" ∆ on Pn defined by

∆(p(x)) = p(x) − p(x − 1).

We introduce the polynomials

δk(x) = x(x + 1)(x + 2) · · · (x + k − 1) / k!,    k = 0, 1, 2, . . . , n.

Again, by convention, δ0(x) = 1. Now

∆(δk(x)) = [x(x+1) · · · (x+k−2)(x+k−1) − (x−1)x(x+1) · · · (x+k−2)] / k!
         = x(x+1) · · · (x+k−2) [(x+k−1) − (x−1)] / k!
         = x(x+1) · · · (x+k−2) / (k−1)! = δk−1(x).

So the matrix [∆]D of ∆ relative to the basis D = (δ0(x), δ1(x), . . . , δn(x)) is also given
by (3.3.2) above.
In the above two examples, notice that, except for k = 0, we have τk(a) = 0 and
δk(0) = 0. This common feature deserves some additional discussion, as follows. Let
T be a linear operator defined on Pn and B = (p0(x), p1(x), . . . , pn(x)) a basis of Pn such
that T(pk(x)) = pk−1(x) for k ≥ 1 and T(p0(x)) = 0. Furthermore, suppose there is a constant a
such that pk(a) = 0 for all k ≥ 1 and p0(a) = 1. Take any polynomial p(x) in Pn. Since B
is a basis of Pn, we can write p(x) as a linear combination of vectors in B, say

p(x) = a0 p0(x) + a1 p1(x) + · · · + an pn(x) = ∑_{k=0}^{n} ak pk(x). (3.3.3)

(This means [p(x)]B = (a0, a1, . . . , an).) Evaluating at x = a, we get p(a) = a0 p0(a) = a0.
So a0 = p(a). Next, apply T to both sides of (3.3.3) to get

T p(x) = a1 p0(x) + a2 p1(x) + · · · + an pn−1(x). (3.3.4)

Evaluating at x = a, we get T p(a) = a1. (Here we write T p(a) for T(p(x)) evaluated
at x = a.) Applying T to (3.3.4), we get T² p(x) = a2 p0(x) + a3 p1(x) + · · · + an pn−2(x).
Evaluating at x = a, we get T² p(a) = a2. Continuing in this manner, we obtain T^k p(a) =
ak for all k. Thus we have

p(x) = ∑_{k=0}^{n} (T^k p(a)) pk(x). (3.3.5)
In case T = D and pk(x) = τk(x), we have T^k p = p^(k) (the kth derivative of p) and hence (3.3.5)
becomes

p(x) = ∑_{k=0}^{n} p^(k)(a) (x − a)^k / k!.

Usually we prefer to write this as

p(x) = ∑_{k=0}^{n} [p^(k)(a) / k!] (x − a)^k, (3.3.6)

which is Taylor's formula for polynomials. In case T = ∆ and pk(x) = δk(x), we have

p(x) = ∑_{k=0}^{n} [∆^k p(0) / k!] x(x + 1) · · · (x + k − 1).

This neat formula has many applications, but we have no intention of pursuing this matter.
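The difference formula can be spot-checked numerically; in the plain-Python sketch below the helper names delta, rising and newton_value are ours, with ∆ defined as in Example 3.3.2:

```python
from math import factorial

def delta(p):
    # the difference operator of Example 3.3.2: (Δp)(x) = p(x) - p(x - 1)
    return lambda x: p(x) - p(x - 1)

def rising(x, k):
    # x(x + 1)...(x + k - 1), with the empty product 1 for k = 0
    out = 1.0
    for j in range(k):
        out *= x + j
    return out

def newton_value(p, n, x):
    # right-hand side of the formula: sum of Δ^k p(0) / k! times rising(x, k)
    total, q = 0.0, p
    for k in range(n + 1):
        total += q(0) * rising(x, k) / factorial(k)
        q = delta(q)
    return total

p = lambda x: 2 * x**3 - x + 5   # a cubic, so n = 3 suffices
for x in [-2.0, 0.5, 3.0]:
    assert abs(newton_value(p, 3, x) - p(x)) < 1e-9
```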
Now we apply Taylor's formula (3.3.6) to p(x) = x^n. We have

p^(k)(x) = n(n − 1) · · · (n − k + 1) x^{n−k},

and hence (3.3.6) gives

x^n = ∑_{k=0}^{n} [n(n − 1) · · · (n − k + 1) a^{n−k} / k!] (x − a)^k = ∑_{k=0}^{n} C(n, k) a^{n−k} (x − a)^k.

Evaluating at x = a + b, the above identity becomes

(a + b)^n = ∑_{k=0}^{n} C(n, k) a^{n−k} b^k,

which is the binomial theorem.
3.4. In the present subsection we show how to use the spectral decomposition to
compute the powers Tn and the exponential etT of T . Identity (3.1.2) tells us that
T^n = T^n P1 + T^n P2 + · · · + T^n Pr and e^{tT} = e^{tT} P1 + e^{tT} P2 + · · · + e^{tT} Pr.

So it is enough to consider the restriction Tk of T to the spectral subspace Mk for each k.
Identity (3.3.1) tells us that Tk = λkIk + Qk with Qk^{mk} = O. Thus

e^{tTk} = e^{tλk} e^{tQk} = e^{tλk} ∑_{j=0}^{mk−1} (t^j / j!) Qk^j,

in view of (1.5.2) in §1 of the present chapter and the fact that Qk^j = O for j ≥ mk. We
can see the pattern better by considering small values of mk, say mk = 1, 2, 3:

(mk = 1) e^{tTk} = e^{tλk} Ik
(mk = 2) e^{tTk} = e^{tλk} (Ik + t Qk)
(mk = 3) e^{tTk} = e^{tλk} (Ik + t Qk + (t²/2) Qk²)

For the powers Tk^n, we can use the binomial expansion to evaluate

Tk^n = (λkIk + Qk)^n = ∑_{j=0}^{mk−1} C(n, j) λk^{n−j} Qk^j.

For small values of mk, we have

(mk = 1) Tk^n = λk^n Ik
(mk = 2) Tk^n = λk^n Ik + n λk^{n−1} Qk
(mk = 3) Tk^n = λk^n Ik + n λk^{n−1} Qk + (n(n−1)/2) λk^{n−2} Qk²
Recognizing the patterns here is all we need for computing T^n and e^{tT}, as shown in
the following examples. In this course we only focus on powers of matrices.
Example 3.4.1. Find a closed formula for the powers of

A = [ 2  1 ]
    [ 0  3 ].

Solution. We find that the eigenvalues of A are λ1 = 2 and λ2 = 3, and p(A) = O,
where p(x) = (x − 2)(x − 3). So m1 = m2 = 1. Thus the powers of A have the form

A^n = 2^n X + 3^n Y

for some matrices X and Y; (here we write X, Y for P1, P2, since they are treated as
unknowns in the equations given below). Setting n = 0 and n = 1, we get

I = X + Y, A = 2X + 3Y.

Solving this matrix equation, we obtain

X = 3I − A = [ 1 −1 ]          Y = A − 2I = [ 0  1 ]
             [ 0  0 ],                      [ 0  1 ].

Our final answer is

A^n = 2^n [ 1 −1 ] + 3^n [ 0  1 ] = [ 2^n   3^n − 2^n ]
          [ 0  0 ]       [ 0  1 ]   [  0       3^n    ].
(♠ Remark: Contrary to some textbooks in linear algebra, there is no need to find eigen-
vectors to answer this kind of question. ♠)
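The closed formula can be checked against direct matrix powers (NumPy sketch, a verification aid only):

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])

def A_power(n):
    # closed formula of Example 3.4.1: A^n = 2^n X + 3^n Y
    X = np.array([[1.0, -1.0], [0.0, 0.0]])
    Y = np.array([[0.0, 1.0], [0.0, 1.0]])
    return 2.0**n * X + 3.0**n * Y

for n in range(6):
    assert np.allclose(A_power(n), np.linalg.matrix_power(A, n))
```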
Example 3.4.2. Find a closed formula for the powers of

A = [  1  2  2 ]
    [  0  2  1 ]
    [ −1  2  2 ].

Solution. The characteristic polynomial of A is found to be p(z) = (z − 1)(z − 2)².
We can check that p(A) ≡ (A − I)(A − 2I)² = O but (A − I)(A − 2I) ≠ O. Hence A is
not diagonalizable. So we write

A^n = 1^n X + 2^n Y + n 2^{n−1} Z,

where X, Y and Z are to be determined. Setting n = 0, 1, 2, we obtain I = X + Y, A =
X + 2Y + Z and A² = X + 4Y + 4Z. Using the first identity I = X + Y, the other two can be
written as A = I + Y + Z and A² = I + 3Y + 4Z. Multiplying the first of these by 4 and
subtracting the second eliminates Z and gives Y = 4A − A² − 3I, hence
X = I − Y = A² − 4A + 4I = (A − 2I)². So

X = (A − 2I)² = [ −1  2  0 ]
                [ −1  2  0 ]
                [  1 −2  0 ],

Y = I − X = [  2 −2  0 ]
            [  1 −1  0 ]
            [ −1  2  1 ],

Z = A − I − Y = [ −2  4  2 ]
                [ −1  2  1 ]
                [  0  0  0 ].

Hence

A^n = [ −1  2  0 ]       [  2 −2  0 ]             [ −2  4  2 ]
      [ −1  2  0 ] + 2^n [  1 −1  0 ] + n 2^{n−1} [ −1  2  1 ]
      [  1 −2  0 ]       [ −1  2  1 ]             [  0  0  0 ].
Example 3.4.3. Find a closed form for the nth power of

A = [ 14   4  −2 ]
    [  4  14   2 ]
    [ −2   2  17 ].

Solution. The distinct eigenvalues of A are 18 and 9. A brute force computation
shows that p(A) = O for p(x) = (x − 18)(x − 9), which is a polynomial with simple roots.
(Notice that A is a real symmetric matrix. It follows from a theorem presented in the next
chapter that A is diagonalizable and hence the minimal polynomial of A has simple roots.)
Thus we may write A^n = 18^n X + 9^n Y, where X and Y are 3×3 matrices to be determined.
Setting n = 0 and n = 1, we obtain I = X + Y and A = 18X + 9Y respectively. From
these two identities we obtain X = 9−1(A − 9I) and Y = −9−1(A − 18I). Thus we have
A^n = 18^n 9−1(A − 9I) − 9^n 9−1(A − 18I) = 2^n 9^{n−1}(A − 9I) − 9^{n−1}(A − 18I), or

[ 14   4  −2 ]^n               [  5   4  −2 ]            [ −4   4  −2 ]
[  4  14   2 ]   = 2^n 9^{n−1} [  4   5   2 ] − 9^{n−1}  [  4  −4   2 ]
[ −2   2  17 ]                 [ −2   2   8 ]            [ −2   2  −1 ].
3.5. The rest of this section will be a brief description of some advanced material in
spectral theory. Detailed arguments will be presented in the Appendices at the end of the
present chapter. Mathematically inclined students are encouraged to study them on their
own, to enjoy a truly beautiful piece of mathematics.
Let us recall some notation. Let pT(x) = (x − λ1)^{m1} (x − λ2)^{m2} · · · (x − λr)^{mr} be
the minimal polynomial of T, let Pk be the spectral projection and Mk = Pk(V) the
spectral subspace corresponding to the eigenvalue λk, and let Tk be the restriction of T to
Mk. Recall from subsection 3.1 that

P1 + P2 + · · · + Pr = I, PjPk = δjkPk, (3.5.1)

and

TPk = PkT, (T − λkI)^{mk} Pk = O. (3.5.2)

Also recall from (3.3.1) that Tk = λkIk + Qk, with Qk^{mk} = O. The last identity tells us
that λk is the only eigenvalue of Tk.
Let nk = dim Mk. The characteristic polynomial of Tk is det(xIk − Tk) = (x − λk)^{nk}.
Take a basis of Mk, say Bk = (b^{(k)}_1, b^{(k)}_2, . . . , b^{(k)}_{nk}). Putting all the Bk
together, we obtain

B = (b^{(1)}_1, . . . , b^{(1)}_{n1}, b^{(2)}_1, . . . , b^{(2)}_{n2}, . . . , b^{(r)}_1, . . . , b^{(r)}_{nr}),

which can be shown to be a basis of V. The matrix [T]B representing T relative to B is a
block diagonal matrix with the blocks [T1]B1, [T2]B2, . . . , [Tr]Br along its diagonal:

[T]B = [ [T1]B1    O     · · ·    O     ]
       [   O     [T2]B2  · · ·    O     ]
       [  ...                           ]
       [   O       O     · · ·  [Tr]Br  ].

Thus the characteristic polynomial of T is

cT(x) = det(xI − [T]B) = det(xI1 − [T1]B1) det(xI2 − [T2]B2) · · · det(xIr − [Tr]Br)
      = (x − λ1)^{n1} (x − λ2)^{n2} · · · (x − λr)^{nr}.
The positive integer nk, which is the dimension of the spectral subspace Mk and appears
as the power of the factor x − λk in the characteristic polynomial, is called the algebraic
multiplicity of λk. From the fact that Qk is a nilpotent operator on Mk and mk is the
least positive integer satisfying Qk^{mk} = O, it can be proved that mk ≤ nk. This shows that
the minimal polynomial

pT(x) = (x − λ1)^{m1} (x − λ2)^{m2} · · · (x − λr)^{mr}

is a factor of cT(x). Thus, from pT(T) = O we obtain cT(T) = O. We have arrived at the
following celebrated theorem.

Cayley–Hamilton Theorem. If p(x) is the characteristic polynomial of a linear
operator T on a finite dimensional vector space, then p(T) = O.
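The Cayley–Hamilton theorem is easy to test numerically; np.poly returns the coefficients of the characteristic polynomial of a square matrix (NumPy sketch with a random 4×4 matrix as an arbitrary choice):

```python
import numpy as np

# Cayley-Hamilton: every square matrix satisfies its characteristic polynomial.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

c = np.poly(A)     # coefficients of det(xI - A), highest power first
n = len(c) - 1
pA = sum(ck * np.linalg.matrix_power(A, n - k) for k, ck in enumerate(c))
assert np.allclose(pA, 0, atol=1e-9)   # p(A) = O
```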
The deepest (and the hardest) result in linear algebra is perhaps the Jordan canon-
ical form theorem. It says that, relative to an appropriate basis in V , the matrix [T ]
representing T is a block diagonal matrix with blocks of the form

J = [ λ  1  0  · · ·  0  0 ]
    [ 0  λ  1  · · ·  0  0 ]
    [ 0  0  λ  · · ·  0  0 ]
    [ ...                  ]
    [ 0  0  0  · · ·  λ  1 ]
    [ 0  0  0  · · ·  0  λ ]
along the diagonal, where λ is an eigenvalue of T . A matrix of the form J above is called
a Jordan block. It turns out that the dimension of the eigenspace ker(λkI − T ) is the
number of Jordan blocks with λk as its eigenvalue in [T ]. Also, mk is the maximal size
of the Jordan blocks with λk as its eigenvalue in [T ]. Thanks to the Jordan canonical
form, the structure of a linear operator on a finite dimensional complex vector space is
considered to be completely understood.
EXERCISE SET III.3.
Review Questions. What is the minimal polynomial of a linear operator T (on a finite
dimensional space)? Why is the spectrum σ(T ) equal to the set of all roots of the minimal
polynomial of T? What is the spectral decomposition of T ? Am I able to state all
important facts about spectral decomposition described in this section? How do I take
advantage of these facts to compute the powers and the exponential of T in an efficient
way (without finding eigenvectors)? What is a nilpotent operator? What is the Jordan
canonical form?
Drills
1. Find the spectral decomposition for each of the following matrices:
(a) [ 1  2 ]      (b) [ 1  1 ]      (c) [ 1   3 ]      (d) [   0    1+i ]
    [ 0  3 ],         [ 1  1 ],         [ 1  −1 ],         [ 1−i     0  ].

(e) [ 0  1  1 ]      (f) [ 0  1  0 ]      (g) [ −1  0   2 ]
    [ 0  1  1 ]          [ 0  0  0 ]          [  1  0  −2 ]
    [ 0  0  0 ],         [ 1  2  1 ],         [  0  1   2 ].
2. Given the minimal polynomial pT (x) of an operator T in each of the following cases,
determine if T is diagonalizable:
(a) pT(x) = x² + 2x (b) pT(x) = x² + 2x + 1 (c) pT(x) = x²
(d) pT(x) = x² + 1 (e) pT(x) = x² − 3x + 2
3. Use Taylor’s formula to express the polynomial 2x2 + x+ 1 in the form
a2(x− 1)2 + a1(x− 1) + a0.
Then use you answer to find the partial fraction decomposition
2x2 + x+ 1
(x− 1)3=
c1x− 1
+c2
(x− 1)2+
c3(x− 1)3
.
(In this question you are asked to find the constants a0, a1, a2, c1, c2, c3.)
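One way to check an answer to Drill 3: the Taylor coefficients of a polynomial about x = c can be read off by repeated synthetic division by (x − c), since each remainder is the value of the current quotient at c. The helper below is my own illustration in plain Python, not part of the text:

```python
# Taylor coefficients of a polynomial about x = c via repeated synthetic
# division by (x - c).  (Helper written for this note; not from the text.)
def taylor_coeffs(coeffs, c):
    """coeffs lists the polynomial highest degree first, e.g. 2x^2+x+1 -> [2, 1, 1].
    Returns [a0, a1, a2, ...] with p(x) = sum of a_j (x - c)^j."""
    coeffs = list(coeffs)
    out = []
    while coeffs:
        rem = 0
        quot = []
        for a in coeffs:
            rem = rem * c + a
            quot.append(rem)
        out.append(rem)          # remainder on division by (x - c)
        coeffs = quot[:-1]       # continue with the quotient
    return out

a0, a1, a2 = taylor_coeffs([2, 1, 1], 1)
# Sanity check: a2(x-1)^2 + a1(x-1) + a0 reproduces 2x^2 + x + 1 at several points.
assert all(a2 * (x - 1) ** 2 + a1 * (x - 1) + a0 == 2 * x * x + x + 1
           for x in range(-3, 4))
```

Dividing the rewritten numerator by (x − 1)^3 then hands you the partial fraction constants directly.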
4. True or false:
(a) The minimal polynomial of an operator divides its characteristic polynomial.
(b) If an operator T on a vector space V of dimension n has n distinct eigenvalues,
then its minimal polynomial is equal to its characteristic polynomial.
(c) If ST = TS, then the range of S is invariant for T .
(d) A projection is diagonalizable. (Recall that a projection is an operator P satis-
fying P 2 = P .)
(e) A Jordan block is diagonalizable.
5. In each of the following cases, prove that the given matrix is not diagonalizable (you
may either use your knowledge about the minimal polynomial or argue from first principles):
(a) \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix},  (b) \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix},  (c) \begin{bmatrix} 1 & i \\ i & -1 \end{bmatrix}.
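As a numerical sanity check for Drill 5 (a sketch assuming NumPy; it illustrates, but does not replace, the proofs asked for): in case (a), subtracting I leaves a nonzero matrix N with N^2 = O, so the minimal polynomial (x − 1)^2 has a repeated root; in case (b), the matrix itself is nonzero with square O.

```python
import numpy as np

# Drill 5(a): A = [[1, 1], [0, 1]].  N = A - I is nonzero but N^2 = O,
# so the minimal polynomial (x - 1)^2 has a repeated root and A cannot
# be diagonalizable.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
N = A - np.eye(2)
assert np.any(N != 0)
assert np.allclose(N @ N, 0)

# Drill 5(b): the matrix itself is nonzero with square O (compare Exercise 1).
B = np.array([[1.0, 1.0],
              [-1.0, -1.0]])
assert np.any(B != 0)
assert np.allclose(B @ B, 0)
```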
6. In each of the following cases, use the method described in the present section to find
an explicit expression for the nth power of the given matrix.
(a) \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix},  (b) \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},  (c) \begin{bmatrix} 1 & 3 \\ 1 & -1 \end{bmatrix},  (d) \begin{bmatrix} 1 & 3 \\ 4 & 2 \end{bmatrix},  (e) \begin{bmatrix} 0 & 1+i \\ 1-i & 0 \end{bmatrix},

(f) \begin{bmatrix} 0 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix},  (g) \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix},  (h) \begin{bmatrix} -1 & 0 & 2 \\ 1 & 0 & -2 \\ 0 & 1 & 2 \end{bmatrix}.
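For Drill 6(a) the answer can be cross-checked numerically. The sketch below (assuming NumPy; eigenvector scaling is left to `np.linalg.eig`) computes A^n as P D^n P^{-1} from the diagonalization and compares it with the closed form one obtains by hand:

```python
import numpy as np

# Drill 6(a): A = [[1, 2], [0, 3]] has distinct eigenvalues 1 and 3,
# so A = P D P^{-1} and A^n = P D^n P^{-1}.
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
evals, P = np.linalg.eig(A)

def nth_power(n):
    return P @ np.diag(evals ** n) @ np.linalg.inv(P)

# Working by hand gives A^n = [[1, 3^n - 1], [0, 3^n]] for this matrix.
n = 5
expected = np.array([[1.0, 3.0**n - 1.0],
                     [0.0, 3.0**n]])
assert np.allclose(nth_power(n), expected)
assert np.allclose(nth_power(n), np.linalg.matrix_power(A, n))
```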
Exercises
1. Prove that if T is an operator such that T^2 = O but T ≠ O, then T is not
diagonalizable. Try to do this without using the minimal polynomial.
2. Let T be a linear operator on a complex vector space V (not necessarily finite di-
mensional) and let λ be a complex number. (a) Show that the ranges of (T − λI)^n
(n = 1, 2, . . .) form a decreasing sequence:

(T − λI)(V) ⊇ (T − λI)^2(V) ⊇ (T − λI)^3(V) ⊇ · · ·

(b) Show that if

(T − λI)^k(V) = (T − λI)^{k+1}(V)

for some k, then (T − λI)^n(V) = (T − λI)^k(V) for all n ≥ k.
3. Let T be a linear operator on a complex vector space V (not necessarily finite di-
mensional) and let λ be a complex number. (a) Show that the kernels of (T − λI)^n
(n = 1, 2, . . .) form an increasing sequence:

ker(T − λI) ⊆ ker(T − λI)^2 ⊆ ker(T − λI)^3 ⊆ · · ·

(b) Show that if

ker(T − λI)^k = ker(T − λI)^{k+1}

for some k, then ker(T − λI)^n = ker(T − λI)^k for all n ≥ k.
4. Let T be a linear operator on a finite dimensional complex vector space V and let λ
be an eigenvalue of T . Prove that the spectral subspace corresponding to λ is equal
to the set
M_λ = {v ∈ V : (T − λI)^n v = 0 for some positive integer n}.
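The kernel chain of Exercise 3 and the set M_λ of Exercise 4 can be seen concretely on a nilpotent Jordan block (an illustrative NumPy sketch; the matrix is my own example with λ = 0): the kernel dimensions grow strictly until they stabilize, and the stable kernel is the whole spectral subspace.

```python
import numpy as np

# A single 3x3 Jordan block with eigenvalue 0.  The kernels of N, N^2, N^3
# strictly increase (dimensions 1, 2, 3) and then stabilize; the stable
# kernel is M_0, the spectral subspace for lambda = 0.
N = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
dims = [3 - np.linalg.matrix_rank(np.linalg.matrix_power(N, k))
        for k in range(1, 5)]
# dims grows 1, 2, 3 and then stays at 3 once the chain stabilizes.
```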
Appendices for Chapter III
Appendix A*: cyclicity and companion matrices
In the present appendix we establish the Jordan canonical form theorem for a linear
operator T on an n–dimensional space V with a cyclic vector v0: the vectors
v_0, Tv_0, T^2 v_0, T^3 v_0, T^4 v_0, . . .   (A1)
span the whole space V. Under this cyclicity assumption, the heavy technicality needed for study-
ing the structure of operators in the general case is avoided, while the special case still shares
many important general features. Furthermore, this special case is important in many applica-
tions. For example, in linear control theory, the state space model of a controllable system
with a single input/single output is essentially a linear operator with a cyclic vector.
Since V is finite dimensional, the vectors in (A1) above must be linearly dependent.
It turns out that n, the dimension of V, is the smallest number for which the vectors
v_0, Tv_0, . . . , T^n v_0 are linearly dependent. (Exercise A1: Prove this assertion.) Conse-
quently the vectors v_0, Tv_0, T^2 v_0, . . . , T^{n-1} v_0 form a basis B of V. The vector T^n v_0
can be expressed as a linear combination of these basis vectors, say

T^n v_0 = a_0 v_0 + a_1 Tv_0 + a_2 T^2 v_0 + \cdots + a_{n-1} T^{n-1} v_0.
The representation matrix [T ]B of T relative to this basis now can be determined:
[T]_B = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & 0 & a_0 \\
1 & 0 & 0 & \cdots & 0 & 0 & a_1 \\
0 & 1 & 0 & \cdots & 0 & 0 & a_2 \\
0 & 0 & 1 & \cdots & 0 & 0 & a_3 \\
\vdots & & & \ddots & & & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 & a_{n-2} \\
0 & 0 & 0 & \cdots & 0 & 1 & a_{n-1}
\end{pmatrix}.   (A2)
A matrix of the above form (or its transpose) is called a companion matrix. We have
seen that an operator with a cyclic vector can be represented by a companion matrix of the
form (A2) above. It turns out that the converse of this statement is also true. (Exercise
A2: Prove the last assertion.) The polynomial
p_T(x) = x^n - a_{n-1}x^{n-1} - \cdots - a_2 x^2 - a_1 x - a_0
is the minimal polynomial as well as the characteristic polynomial of T . (Exercise A3:
Prove this.) A linear operator with a cyclic vector is essentially determined by its char-
acteristic polynomial: if both S ∈ L (U) and T ∈ L (V ) have cyclic vectors and have the
same characteristic polynomial, then S and T are similar, that is, there is an isomorphism
P from U onto V such that T = PSP−1. (Exercise A4: Prove this assertion.)
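The discussion above can be tried out numerically. The sketch below (coefficients chosen purely for illustration; NumPy assumed) builds a companion matrix of the form (A2) for n = 3, confirms that e_1 is a cyclic vector, and checks that p_T annihilates the matrix and has the matrix's eigenvalues as its roots:

```python
import numpy as np

# Companion matrix of p(x) = x^3 - a2 x^2 - a1 x - a0 in the form (A2):
# a subdiagonal of 1's, coefficients a0, a1, a2 down the last column.
a0, a1, a2 = 6.0, -11.0, 6.0      # then p(x) = (x - 1)(x - 2)(x - 3)
A = np.array([[0.0, 0.0, a0],
              [1.0, 0.0, a1],
              [0.0, 1.0, a2]])

# e1 is a cyclic vector: e1, A e1, A^2 e1 span the whole space.
e1 = np.array([1.0, 0.0, 0.0])
K = np.column_stack([e1, A @ e1, A @ A @ e1])
assert np.linalg.matrix_rank(K) == 3

# p_T is both the characteristic and the minimal polynomial, so p_T(A) = O.
pA = np.linalg.matrix_power(A, 3) - a2 * (A @ A) - a1 * A - a0 * np.eye(3)
assert np.allclose(pA, 0)

# Its eigenvalues are exactly the roots 1, 2, 3 of p_T.
assert np.allclose(sorted(np.linalg.eigvals(A).real), [1.0, 2.0, 3.0])
```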
If we assume that T is nilpotent, then its characteristic polynomial becomes p(x) = x^n
and hence its representation matrix (A2) becomes

[T] = \begin{pmatrix}
0 & 0 & 0 & 0 & \cdots & 0 & 0 \\
1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & & & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 0
\end{pmatrix}.
More generally, if the spectrum of the operator T consists of a single point λ and T has a
cyclic vector v_0, then S = T − λI is nilpotent and v_0 is also a cyclic vector for S. Relative
to the basis B = {v_0, Sv_0, S^2 v_0, . . . , S^{n-1} v_0}, the matrix representing T is

[T] = \begin{pmatrix}
λ & 0 & 0 & 0 & \cdots & 0 & 0 \\
1 & λ & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & λ & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & λ & \cdots & 0 & 0 \\
\vdots & & & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & λ & 0 \\
0 & 0 & 0 & 0 & \cdots & 1 & λ
\end{pmatrix}.   (A3)

(Exercise: prove this.) A matrix of the above form is called a Jordan block.
Suppose that λ_1, λ_2, . . . , λ_r are distinct eigenvalues of T and let P_k and V_k = P_k(V)
be the spectral projection and the spectral subspace respectively corresponding to λ_k. Let
T_k be the restriction of T to V_k. Again, we assume that v_0 is a cyclic vector for T. Then
one can prove that the vector P_k v_0 in V_k is a cyclic vector for T_k. (Exercise A5: prove
this.) Notice that the spectrum σ(T_k) consists of a single point λ_k and hence we can choose
a basis B_k of V_k relative to which the representation matrix [T_k]_{B_k} is a Jordan block, say
J_k. Let B = B_1 ∪ B_2 ∪ · · · ∪ B_r. Then B is a basis of V and
[T]_B = \bigoplus_{k=1}^{r} [T]_{B_k} = \bigoplus_{k=1}^{r} J_k ≡ \begin{pmatrix}
J_1 & O & O & \cdots & O \\
O & J_2 & O & \cdots & O \\
O & O & J_3 & \cdots & O \\
\vdots & & & \ddots & \vdots \\
O & O & O & \cdots & J_r
\end{pmatrix}.
We have proved the Jordan canonical form theorem for the cyclic case.
Appendix B*: operator-valued functions and linear ODEs
Let us consider differentiation of operator–valued or matrix–valued functions of one
variable. Let V be a finite dimensional vector space and let Φ(t), Ψ(t) be linear operators
on V depending on one variable t. Assume that both of them are differentiable as functions
of t, that is, the limits

Φ′(t) = \lim_{h \to 0} \frac{1}{h}(Φ(t+h) − Φ(t))  and  Ψ′(t) = \lim_{h \to 0} \frac{1}{h}(Ψ(t+h) − Ψ(t))
exist. Then, just like the usual product rule, we have
(ΦΨ)′ = ΦΨ′ +Φ′Ψ (∗)
except that we must be very careful about the order of the factors on the right-hand side,
because in general ΦΨ and ΨΦ are not the same. The validity of (∗) can be shown in the usual way:
for h ≠ 0, we have

\frac{Φ(t+h)Ψ(t+h) − Φ(t)Ψ(t)}{h} = Φ(t+h)\frac{Ψ(t+h) − Ψ(t)}{h} + \frac{Φ(t+h) − Φ(t)}{h}Ψ(t)
and, letting h→ 0, we get the desired identity.
Suppose furthermore that Φ(t) is invertible. We expect \frac{d}{dt}Φ(t)^{-1} = −Φ(t)^{-2}Φ′(t),
but unfortunately this is incorrect! To get the correct formula, we begin by writing down
Φ(t)Ψ(t) = I, where Ψ(t) = Φ(t)^{-1}. Differentiating both sides and using the product
rule, we have ΦΨ′ + Φ′Ψ = O. Multiplying by Φ^{-1} on the left, we have Ψ′ + Φ^{-1}Φ′Ψ = O,
which gives Ψ′ = −Φ^{-1}Φ′Ψ = −Φ^{-1}Φ′Φ^{-1}. So we have

\frac{d}{dt}Φ(t)^{-1} = −Φ(t)^{-1}Φ′(t)Φ(t)^{-1}.
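A quick finite-difference experiment makes the point. The family Φ(t) below is my own illustrative choice (NumPy assumed), picked so that Φ(t) and Φ′(t) do not commute; the correct formula matches the numerical derivative while the naive guess −Φ^{-2}Φ′ does not:

```python
import numpy as np

# Numerically check d/dt Phi(t)^{-1} = -Phi^{-1} Phi' Phi^{-1} on an
# illustrative family Phi(t) that does NOT commute with its derivative.
def Phi(t):
    return np.array([[1.0, t],
                     [t, 1.0 + t * t]])   # det = 1, always invertible

def Phi_prime(t):
    return np.array([[0.0, 1.0],
                     [1.0, 2.0 * t]])

t, h = 0.7, 1e-6
inv = np.linalg.inv
numeric = (inv(Phi(t + h)) - inv(Phi(t))) / h        # finite difference
correct = -inv(Phi(t)) @ Phi_prime(t) @ inv(Phi(t))  # formula in the text
naive = -inv(Phi(t)) @ inv(Phi(t)) @ Phi_prime(t)    # tempting -Phi^{-2} Phi'

assert np.allclose(numeric, correct, atol=1e-4)
assert not np.allclose(naive, correct)               # the naive guess fails
```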
As you can tell, differentiation of an operator-valued or matrix-valued function some-
times is a very touchy business. Let me mention another difficult situation. When
AB = BA, we do have (d/dt)e^{At+B} = Ae^{At+B}, as we expect. But when AB ≠ BA,
this is no longer true; however, there is something called the Baker–Campbell–Hausdorff
formula to handle the derivative of e^{At+B}, which is too complicated to describe here.
Now we use operator-valued functions to describe a (time-dependent) linear dynamical
system. We use a vector space V to model such a system. We call it the state space
and each vector in V a state. Assume that the dynamics of the system is governed by a
differential equation of the form

\frac{dy}{dt} = A(t)y + f(t),   (B1)
where f(t) is an “input” or an external force. The corresponding homogeneous equation is

\frac{dy}{dt} = A(t)y.   (B2)
Let s be a real number that stands for “starting time” and let y0 be a vector that stands
for the “initial state”. Then, under some very general conditions, there is a unique
solution to (B2) satisfying the initial condition
y(s) = y0. (B3)
For studying this initial value problem, it is convenient to introduce the operator equation

\frac{d}{dt}Φ(t, s) = A(t)Φ(t, s)  with  Φ(s, s) = I.   (B4)
A solution to (B4) gives rise to a solution to (B2) satisfying (B3) by putting y(t) = Φ(t, s)y_0. The
operator-valued function Φ(t, s), called a flow, satisfies the following identities which sig-
nify the evolution or the dynamics of the system:
Φ(t, s)Φ(s, r) = Φ(t, r), Φ(s, s) = I (B5)
Here we mention that, if the system is time independent, or, in other words, stationary,
meaning that the “generator” A(t) in (B1) or (B2) is independent of t, then Φ(t, s) depends
only on the difference t − s and we can write Φ(t, s) = Ψ(t − s) for some operator-valued
function Ψ(t). In that case (B5) becomes
Ψ(s)Ψ(t) = Ψ(s+ t), Ψ(0) = I.
Equation (B4) can be converted into an integral equation:
Φ(t, s) = I + \int_s^t A(u)Φ(u, s)\,du.
One way to solve this is to define a sequence {Φ_n(t, s)}_{n≥0} recursively by putting
Φ_0(t, s) = I and

Φ_{n+1}(t, s) = I + \int_s^t A(u)Φ_n(u, s)\,du.

It can be proved that this sequence converges to some function Φ(t, s) which is a solution
to (B4). In the stationary case, it can be easily checked that

Ψ_n(t) = Φ_n(t, 0) = \sum_{k=0}^{n} \frac{A^k t^k}{k!},
which converges to e^{At} as n → ∞. Clearly, the sequence y_n(t) = Φ_n(t, s)y_0 (n ≥ 0)
of vector-valued functions converges to y(t) = Φ(t, s)y_0, which is the solution to (B2)
satisfying the initial condition (B3).
There is a trick to solve the initial value problem for the nonhomogeneous equation (B1),
called “variation of constants”, and it works as follows. In the solution Φ(t, s)y_0 to the
homogeneous equation, replace the “constant” y_0 by a function of t, say z(t). Thus we
are looking for a solution of the form Φ(t, s)z(t) to (B1) satisfying the initial condition
(B3). Substitute y(t) = Φ(t, s)z(t) into (B1):
\frac{dΦ(t, s)}{dt}z(t) + Φ(t, s)\frac{dz(t)}{dt} = A(t)Φ(t, s)z(t) + f(t).
In view of (B4), we have
\frac{dz(t)}{dt} = Φ(t, s)^{-1}f(t) = Φ(s, t)f(t).   (B6)
Notice that Φ(s, s)z(s) = y(s) = y0 and hence z(s) = y0. Thus (B6) gives
z(t) = y_0 + \int_s^t Φ(s, u)f(u)\,du.
So
y(t) = Φ(t, s)z(t) = Φ(t, s)y_0 + \int_s^t Φ(t, s)Φ(s, u)f(u)\,du
     = Φ(t, s)y_0 + \int_s^t Φ(t, u)f(u)\,du.
Thus the solution to (B1) satisfying the initial condition (B3) is given by

y(t) = Φ(t, s)y_0 + \int_s^t Φ(t, u)f(u)\,du.
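As a concrete check of this formula, take the simplest stationary scalar case y′ = ay + c, where Φ(t, u) = e^{a(t−u)} and the closed-form solution is known. The sketch below (constants chosen for illustration; NumPy assumed) evaluates the integral term by a trapezoid rule and compares:

```python
import numpy as np
from math import exp

# Variation-of-constants check in the stationary scalar case y' = a y + c,
# where Phi(t, u) = e^{a(t-u)} and the exact solution is
# y(t) = e^{a(t-s)} y0 + (c/a)(e^{a(t-s)} - 1).
a, c, y0, s, t = 0.5, 2.0, 1.0, 0.0, 1.5

us = np.linspace(s, t, 20001)
integrand = np.exp(a * (t - us)) * c            # Phi(t, u) f(u)
h = us[1] - us[0]
integral = h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

y_formula = exp(a * (t - s)) * y0 + integral     # formula from the text
y_exact = exp(a * (t - s)) * y0 + (c / a) * (exp(a * (t - s)) - 1.0)
assert abs(y_formula - y_exact) < 1e-6
```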
Now you can see the advantage of using linear operators for solving linear ODEs: complete
generality, presented in the simplest way!