
Math 2201 Lecture Notes

Minhyong Kim, based on Richard Hill’s notes, based on John Talbot’s notes

February 4, 2009

Contents

1 Number Theory
  1.1 Euclid’s algorithm
  1.2 Factorization into primes
  1.3 Congruences
  1.4 RSA cryptography

2 Polynomial Rings
  2.1 Irreducible elements in Rings
  2.2 Euclid’s algorithm in k[X]

3 Jordan Canonical Form
  3.1 Revision of linear algebra
  3.2 Matrix representation of linear maps
  3.3 Minimal polynomials
  3.4 Generalized Eigenspaces
  3.5 Jordan Bases in the one eigenvalue case
  3.6 Jordan Canonical Form in the one eigenvalue case
  3.7 Calculating the Jordan canonical form in the one eigenvalue case
  3.8 The Jordan canonical form with more than one eigenvalue
  3.9 Existence of a Jordan basis

4 Bilinear and Quadratic Forms
  4.1 Matrix Representation
  4.2 Symmetric bilinear forms and quadratic forms
  4.3 Orthogonality and diagonalization
  4.4 Examples of Diagonalizing
  4.5 Canonical forms over C
  4.6 Canonical forms over R

5 Inner Product Spaces
  5.1 Geometry of Inner Product Spaces
  5.2 Gram–Schmidt Orthogonalization
  5.3 Adjoints
  5.4 Isometries
  5.5 Orthogonal Diagonalization


Lecture 1

• My office hour is Friday, 3-4 pm, in the 5th floor common room of the mathematics building. If you need to see me and can’t make that time then please email me to make an appointment ([email protected]). You can find instructions on my webpage

http://www.ucl.ac.uk/~ucahmki

for reaching my office in the Kathleen Lonsdale building.

• Be sure to make use also of the course blog:

http://minhyongkim.wordpress.com/

• The assessment of the course is 90% exam and 10% coursework.

• Coursework is collected and handed back in the problem class (Monday 9-10).

Sketch of the course:

• Number theory (prime numbers, factorization, congruences);

• Polynomials (factorization);

• Jordan canonical form (generalization of diagonalizing);

• Quadratic and bilinear forms;

• Euclidean and Hermitian spaces.

Prerequisites (you should know all this from first year algebra courses)

• Fields and vector spaces (bases, linear independence, span, subspaces).

• Linear maps (rank, nullity, kernel and image, matrix representation).

• Matrix algebra (row reduction, determinants, eigenvalues and eigenvectors, diagonalization).

Books: there are hundreds of books on linear algebra, and almost any one of them will do. As a general rule, the bigger the book, the more examples will be in it. If you can understand something after just a small number of examples then read a small book. If you need hundreds of examples then read a big book.

1 Number Theory

Number theory is the theory of Z = {0,±1,±2, . . .}. Recall also the notation N = {0, 1, 2, 3, 4, . . .}.

1.1 Euclid’s algorithm

• We say that a ∈ Z divides b ∈ Z iff there exists c ∈ Z such that b = ac. We write a|b.

• A common divisor of a and b is an integer d that divides both a and b.

• A highest common factor of a and b is a common factor d of a and b such that any other common factor is smaller than d. This is written d = hcf(a, b). It is also common to refer to the highest common factor as the greatest common divisor, with the notation gcd(a, b).


Note that every a ∈ Z is a factor of 0, since 0 = 0 × a. Therefore every number is a common factor of 0 and 0, so there is no such thing as hcf(0, 0). However if a, b ∈ Z are not both zero, then they have a highest common factor. In particular if a > 0 then hcf(a, 0) = a. If you examine the definition carefully, you’ll notice that if a, b are not both zero so that an hcf exists, then hcf(a, b) > 0.

Euclid’s algorithm is a method for calculating highest common factors. Recall that if we have two integers a, b with b ≠ 0 then we can divide a by b with remainder, i.e.

a = qb + r, q, r ∈ Z, 0 ≤ r < |b|.

The reason for doing this is that

1.1.1 Euclid’s algorithm If a, b, q, r ∈ Z and a = qb + r then

hcf(a, b) = hcf(b, r).

Proof. One easily checks that the common factors of a and b are the same as the common factors of b and r. □

Using the theorem repeatedly, we can calculate the highest common factor of any two integers. The method is this: set r_1 = a and r_2 = b. For i ≥ 3, as long as r_{i−1} ≠ 0, define r_i by

r_{i−2} = q_i r_{i−1} + r_i, 0 ≤ r_i < r_{i−1}.

We continue this until r_i = 0. Then by the theorem we have

hcf(a, b) = hcf(r_2, r_3) = … = hcf(r_{i−1}, 0) = r_{i−1}.

1.1.2 Example Take a = 27 and b = 7. We have

27 = 3 × 7 + 6
7 = 1 × 6 + 1
6 = 6 × 1 + 0.

Therefore
hcf(27, 7) = hcf(7, 6) = hcf(6, 1) = hcf(1, 0) = 1.

1.1.3 Example Take a = 666 and b = 153.

666 = 4 × 153 + 54
153 = 2 × 54 + 45
54 = 1 × 45 + 9
45 = 5 × 9 + 0.

Therefore
hcf(666, 153) = hcf(153, 54) = hcf(54, 45) = hcf(45, 9) = hcf(9, 0) = 9.
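The procedure is easy to mechanize. Here is a minimal Python sketch (the function name hcf is our own choice, not from the notes); it repeats the division step until the remainder is zero:

```python
def hcf(a, b):
    """Highest common factor via Euclid's algorithm: repeatedly
    replace (a, b) by (b, a mod b); the last non-zero remainder
    is the hcf."""
    while b != 0:
        a, b = b, a % b
    return abs(a)

# The worked examples above: hcf(27, 7) == 1 and hcf(666, 153) == 9.
```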


1.1.4 Bezout’s Lemma Let d = hcf(a, b). Then there are integers h, k ∈ Z such that

d = ha + kb.

Proof. Consider the sequence given by Euclid’s algorithm:

a = r_1, b = r_2, r_3, r_4, …, r_n = d.

In fact we’ll show that each of the integers r_i may be expressed in the form ha + kb. We prove this by induction on i: it is certainly the case for i = 1, 2 since r_1 = 1 × a + 0 × b and r_2 = 0 × a + 1 × b. For the inductive step, assume it is the case for r_{i−1} and r_{i−2}, i.e.

r_{i−1} = ha + kb, r_{i−2} = h′a + k′b.

We have
r_{i−2} = q r_{i−1} + r_i.

Therefore
r_i = h′a + k′b − q(ha + kb) = (h′ − qh)a + (k′ − qk)b.

□

1.1.5 Example Again we take a = 27 and b = 7.

27 = 3 × 7 + 6
7 = 1 × 6 + 1
6 = 6 × 1 + 0.

Therefore

1 = 7 − 1 × 6
= 7 − 1 × (27 − 3 × 7)
= 4 × 7 − 1 × 27.

So we take h = −1 and k = 4.

1.1.6 Example For a = 666 and b = 153, we have

9 = 54 − 45
= 54 − (153 − 2 × 54) = 3 × 54 − 153
= 3 × (666 − 4 × 153) − 153
= 3 × 666 − 13 × 153.

So h = 3, k = −13.
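The back-substitution in these examples can be folded into Euclid’s algorithm itself, tracking how each remainder is written in the form ha + kb, exactly as in the proof of Bezout’s Lemma. A hedged sketch (the function name bezout is ours):

```python
def bezout(a, b):
    """Return (d, h, k) with d = hcf(a, b) and d == h*a + k*b,
    by running Euclid's algorithm while recording how each
    remainder is expressed as h*a + k*b."""
    r0, r1 = a, b          # r_1 = a, r_2 = b in the notes' numbering
    h0, k0 = 1, 0          # a = 1*a + 0*b
    h1, k1 = 0, 1          # b = 0*a + 1*b
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        h0, h1 = h1, h0 - q * h1
        k0, k1 = k1, k0 - q * k1
    return r0, h0, k0

# Example 1.1.5 above: bezout(27, 7) gives (1, -1, 4).
```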


1.1.7 Application Does the equation

666x + 153y = 43

have solutions x, y ∈ Z? Obviously not, since 9 divides the left hand side for any integers x, y, while 9 does not divide the right hand side. What about the equation

666x + 153y = 72?

This does have a solution. First, we know that 9|72, so that the above objection does not apply. But it’s better than that. We know there are h, k ∈ Z such that

666h + 153k = 9.

So
666(8h) + 153(8k) = 72.

We took above h = 3 and k = −13. So x = 24, y = −104 will be integral solutions for the equation above.

The more general assertion is this:

1.1.8 The equation

ax + by = c

has a solution x, y ∈ Z if and only if hcf(a, b)|c.

Although the proof is straightforward, it might be instructive for you to write it down carefully as an exercise.

Here is a nice corollary: If a and b are coprime, then

ax + by = c

has integral solutions x and y for any c.
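Statement 1.1.8 is constructive: run extended Euclid, test whether hcf(a, b) divides c, and scale the Bezout identity. A sketch along those lines (the extended-Euclid steps are inlined; all names are ours):

```python
def solve_linear(a, b, c):
    """Return integers (x, y) with a*x + b*y == c, or None if no
    solution exists (by 1.1.8: solvable iff hcf(a, b) divides c)."""
    # extended Euclid: d = h*a + k*b
    r0, r1, h0, h1, k0, k1 = a, b, 1, 0, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        h0, h1 = h1, h0 - q * h1
        k0, k1 = k1, k0 - q * k1
    d, h, k = r0, h0, k0
    if c % d != 0:
        return None            # hcf(a, b) does not divide c
    m = c // d
    return h * m, k * m        # scale the Bezout identity by c/d

# From 1.1.7: 666x + 153y = 43 has no solution,
# while 666x + 153y = 72 gives (24, -104).
```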


Lecture 2

1.2 Factorization into primes

All numbers occurring in this lecture are integers.

1.2.9 Definition An integer p ∈ Z is prime iff p ≠ ±1 and the only divisors of p are ±1 and ±p.

Including ±1 in the list of primes is possible, but it turns out to be a less convenient convention than excluding them.

1.2.10 Definition Two integers a, b are coprime or relatively prime, if hcf(a, b) = 1.

For example, 10 and 33 are coprime, while 63 and 99 are not coprime.

1.2.11 Proposition If p is prime and p ∤ a, then p and a are coprime.

Proof. Let d = hcf(a, p). Since d|p and d > 0, we have d = 1 or d = p. In the latter case, we would have p|a, contradicting the assumption. So we must have d = 1. □

1.2.12 Euclid’s Theorem If p is a prime and p|ab then p|a or p|b.

Proof. Suppose that p ∤ a. Then hcf(a, p) = 1 (otherwise p would have a non-trivial proper divisor, contradicting the fact that it is prime). So by Bezout’s Lemma there exist h, k ∈ Z such that

ha + kp = 1.

Hence
hab + kpb = b.

Now p|ab, so ab = cp. Hence b = p(hc + kb), so p|b. □

1.2.13 Corollary If p|a_1 · · · a_n then there exists 1 ≤ i ≤ n such that p|a_i.

Proof. This is true for n = 1, 2 (the case n = 2 is Euclid’s Theorem), so suppose it is true for n − 1 and that p|a_1 · · · a_n. Let A = a_1 · · · a_{n−1} and B = a_n; then p|AB =⇒ p|A or p|B. In the latter case we are done, and in the former case the inductive hypothesis implies that p|a_i for some 1 ≤ i ≤ n − 1. □

Question: Compute some powers of 2:

2^2 = 4, 2^3 = 8, 2^4 = 16, …, 2^9 = 512, 2^10 = 1024, …

Notice that 2^10 is very close to 1000. Similarly, 2^20 will be quite close to 1,000,000. Now, can it happen that for some n, we have

2^n = 1000…000

exactly? Why or why not?


1.2.14 Unique Factorisation Theorem If a ≥ 2 is an integer then there are primes p_i > 0 such that

a = p_1 p_2 · · · p_n.

Moreover if

a = q_1 q_2 · · · q_m

for primes q_j > 0, then n = m and there is a permutation σ ∈ Σ_n such that p_i = q_{σ(i)} for i = 1, . . . , n.

Proof. Existence: Proof by induction on a ≥ 2. The result is obvious for a = 2. Now assume it for all b ≥ 2 such that b < a. We will prove it for a based on this assumption. First, if a is itself prime, then ‘a = a’ is already written as a product of primes. If not, then a has a proper factor, which we can clearly take to be positive. (b|a iff −b|a.) Thus, we can write a = bc with 2 ≤ b < a and 2 ≤ c < a. But then, by the inductive hypothesis, both b and c can be written as a product of primes. So a = bc can be written as a product of primes. This finishes the induction.

Uniqueness: Again, we prove this by induction on a. For a = 2, if 2 = p_1 p_2 · · · p_n, then each p_i|2, and hence, each p_i = 2. Thus, we must have n = 1 and p_1 = 2. Now assume that the uniqueness of prime factorization is true for all b such that 2 ≤ b < a. Suppose

a = p_1 p_2 · · · p_n

and

a = q_1 q_2 · · · q_m

where all the p_i and q_j are primes. Since q_m|(p_1 p_2 · · · p_n), we must have q_m|p_i for some i. Since these are all positive primes, q_m = p_i for some i. After re-ordering the p_i if necessary, we can assume q_m = p_n. Let b = a/p_n. Then

b = p_1 p_2 · · · p_{n−1}

and

b = q_1 q_2 · · · q_{m−1}.

But by the inductive hypothesis, we have uniqueness of factorization for b. Therefore, n − 1 = m − 1 (and hence n = m) and there is a permutation τ ∈ Σ_{n−1} such that p_i = q_{τ(i)} for i = 1, . . . , n − 1. So if we let σ ∈ Σ_n be the permutation that acts as τ on {1, . . . , n − 1} and σ(n) = n, then we have n = m and p_i = q_{σ(i)} for i = 1, . . . , n. □
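The existence half of the theorem can be sketched as trial division; the uniqueness half is what makes the returned list canonical, since any correct method must produce the same multiset of primes. A minimal Python sketch (the function name is ours):

```python
def prime_factors(a):
    """Return the prime factorization of an integer a >= 2 as a
    sorted list of positive primes (with repetition), found by
    trial division."""
    assert a >= 2
    factors = []
    p = 2
    while p * p <= a:
        while a % p == 0:     # strip out every copy of p
            factors.append(p)
            a //= p
        p += 1
    if a > 1:                 # whatever remains has no small factor,
        factors.append(a)     # so it is prime
    return factors
```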

1.2.15 Euclid’s Theorem There exist infinitely many primes.

Proof. Let p_1, . . . , p_n be any non-empty finite list of positive primes. Consider Q = p_1 p_2 · · · p_n + 1. Since Q is an integer ≥ 2, it has a prime factorisation. In particular, there is a positive prime P that divides Q. But for any of the p_i in the list, Q leaves a remainder of 1 when we divide by that p_i. Thus, the divisor P can’t be any of the p_i. Therefore, for any finite list of positive primes, there exists a positive prime not on that list. Now, if we are given any finite list of primes, we can change some signs and get a finite list of positive primes, whereupon we can find a positive prime not on the changed list, and hence, not on the original list. This shows that there are infinitely many primes.

□


Lecture 3

1.3 Congruences

We define a ≡ b mod m iff m|(a − b). We say a is congruent to b modulo m. We will often fix a modulus m, and discuss congruences without referring to the ‘mod m’ all of the time. The congruency class of a is the set of numbers congruent to a modulo m. This is written [a]. The congruence class [a] consists of all numbers that differ from a by a multiple of m, so you can visualize the congruence class [a] as being lined up as:

. . . , a− 3m, a− 2m, a−m, a, a + m, a + 2m, a + 3m, . . .

Note that if a ≡ b mod m, then a and b determine the same congruence class mod m, i.e., [a] = [b]. Every integer is congruent mod m to one of the numbers 0, 1, . . . , m − 1, so the set of all congruency classes is {[0], . . . , [m − 1]}. This is written Z/m.

1.3.16 Proposition If a ≡ a′ mod m and b ≡ b′ mod m then a + b ≡ a′ + b′ mod m and ab ≡ a′b′ mod m.

Proof. Easy. □

The proposition says that the operations

[a] + [b] = [a + b], [a][b] = [ab]

are well defined on Z/m, turning it into a ring. This notion will be reviewed more systematically later. Note that [a] + [0] = [a] and [a][1] = [a], so that [0] and [1] are additive and multiplicative identities.

Which congruence classes in Z/m have multiplicative inverses? We are asking for which [a] one can find [b] such that [a][b] = [1]. In terms of the usual integers themselves (rather than congruence classes) we are asking about the equation

ab ≡ 1 mod m.

If such a b exists, we also say a has an inverse modulo m and also write

b ≡ a−1 mod m.

1.3.17 Lemma An integer a has an inverse modulo m if and only if a and m are coprime.

Proof. Assume hcf(a, m) = 1. Let ha + km = 1 (using Bezout’s Lemma). Then km = 1 − ha, so m|(1 − ha), hence ha ≡ 1 mod m.

Conversely, if ha ≡ 1 mod m then 1 = ha + km for some integer k. Hence any common factor of a and m must be a factor of 1, so hcf(a, m) = 1. □

We denote by (Z/m)× the set of invertible congruence classes. For example,

(Z/10)× = {[1], [3], [7], [9]}.

1.3.18 Example of finding inverses. m = 100. What is [17]^{−1}? Following the proof, we must solve

17h + 100k = 1

and then put [h] = [17]^{−1} in Z/100. But

100 = 5 × 17 + 15


17 = 15 + 2

15 = 7× 2 + 1

1 = 15− 7× 2 = 15− 7× (17− 15) = 8× 15− 7× 17

= 8× (100− 5× 17)− 7× 17 = 8× 100− 47× 17

So
(−47) × 17 ≡ 1 mod 100.

But also,
−47 ≡ −47 + 100 ≡ 53 mod 100.

Therefore,
[17]^{−1} = [53].
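The same Euclid-based computation can be packaged as a reusable sketch (name and interface are ours). We only need to track the coefficient of a, since the coefficient of m vanishes modulo m:

```python
def inverse_mod(a, m):
    """Return b with a*b ≡ 1 (mod m), or None when hcf(a, m) != 1
    (by Lemma 1.3.17 an inverse exists iff a and m are coprime).
    Invariant: h * a ≡ r (mod m) for each tracked pair (h, r)."""
    r0, r1, h0, h1 = m, a % m, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        h0, h1 = h1, h0 - q * h1
    if r0 != 1:
        return None            # a and m are not coprime
    return h0 % m

# The example above: inverse_mod(17, 100) == 53.
```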

1.3.19 Corollary Z/p = F_p is a field. (Recall that F_p = {0, 1, . . . , p − 1} with addition and multiplication defined modulo p.)

Proof. This was proved last year. The only axiom that is not trivial to check is the one which states that every non-zero element has an inverse. □

1.3.20 Corollary F_p^× = {1, 2, . . . , p − 1} is a group with the operation of multiplication.

Proof. A group is a set with a binary operation (in this case multiplication), such that (i) the operation is associative; (ii) there is an identity element; (iii) every element has an inverse. Clearly [1] is the identity element, and the Lemma says that every element has an inverse. □

1.3.21 Fermat’s Little Theorem If p is prime and a ∈ Z then

a^p ≡ a mod p.

Hence if p ∤ a then a^{p−1} ≡ 1 mod p.

Proof. If p|a then a ≡ 0 mod p and a^p ≡ 0 mod p, so suppose p ∤ a, and so a ∈ F_p^×. Recall that by a corollary to Lagrange’s Theorem, the order of an element of a group divides the order of the group. Let n be the order of a, so a^n ≡ 1. By the corollary to Lagrange’s theorem, n|(p − 1), say p − 1 = nm. Then a^{p−1} = (a^n)^m ≡ 1 mod p, and multiplying by a gives a^p ≡ a mod p. □

Example What is 33^22 mod 23? 23 is prime, so 33^22 ≡ 1 mod 23.
How about 3^101 mod 103? Well, 103 is prime, so 3^102 ≡ 1 mod 103. So 3^101 ≡ 3^{−1} mod 103. To find 3^{−1} mod 103, use Euclid’s algorithm.

103 = 3 × 34 + 1 ⇒ (−34) × 3 + 103 = 1.

So 3^{−1} ≡ −34 ≡ 69 mod 103. Hence 3^101 ≡ 69 mod 103.
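These examples are easy to confirm mechanically; Python’s built-in three-argument pow(a, e, m) computes a^e mod m efficiently:

```python
# Sanity checks of the Fermat examples above.
assert pow(33, 22, 23) == 1       # 23 is prime and does not divide 33
assert pow(3, 102, 103) == 1      # 103 is prime
assert pow(3, 101, 103) == 69     # 3^101 is the inverse of 3 mod 103
assert (3 * 69) % 103 == 1
```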


1.3.22 Chinese Remainder Theorem Suppose m and n are coprime; let [x] ∈ Z/m and [y] ∈ Z/n. Then there is a unique [z] ∈ Z/nm such that z ≡ x mod m and z ≡ y mod n.

Proof. (existence) By Bezout’s Lemma, we can find h, k ∈ Z such that

hn + km = 1.

Given x, y we choose z by
z = hnx + kmy.

Clearly z ≡ hnx ≡ x mod m and z ≡ y mod n.
(uniqueness) For uniqueness, suppose z′ is another solution. Then z ≡ z′ mod n and z ≡ z′ mod m.

Hence there exist integers r, s such that

z − z′ = nr = ms.

Since hn + km = 1 we have

z − z′ = (z − z′)hn + (z − z′)km = mshn + nrkm = nm(sh + rk).

Hence z ≡ z′ mod nm. □

Example More powers: What is 2^200 mod 143? 143 is not prime, but we have 143 = 13 × 11. 2^10 ≡ 1 mod 11, so 2^200 ≡ 1 mod 11. Also, 2^12 ≡ 1 mod 13, and hence,

2^200 = 2^192 × 2^8 ≡ 2^8 mod 13.

Furthermore,
2^8 = (2^4)^2 ≡ 3^2 = 9 mod 13.

So we need to find z ∈ Z/143 such that z ≡ 1 mod 11 and z ≡ 9 mod 13. Going through the Euclidean algorithm, we find 1 = 6 × 11 − 5 × 13. So, as in the proof of the Chinese remainder theorem, we can take

z = 9 × 6 × 11 − 1 × 5 × 13 = 594 − 65 = 529 ≡ 100 mod 143.

That is,
2^200 ≡ 100 mod 143.
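The constructive proof of the Chinese Remainder Theorem translates directly into code: find h, k with hn + km = 1 and return z = hnx + kmy reduced mod mn. A sketch (names are ours):

```python
def crt(x, m, y, n):
    """Return z in {0, ..., m*n - 1} with z ≡ x (mod m) and
    z ≡ y (mod n), for coprime m and n, via the Bezout identity
    h*n + k*m = 1 and the formula z = h*n*x + k*m*y."""
    # extended Euclid on (n, m): maintain r = h*n + k*m
    r0, r1, h0, h1, k0, k1 = n, m, 1, 0, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        h0, h1 = h1, h0 - q * h1
        k0, k1 = k1, k0 - q * k1
    assert r0 == 1, "m and n must be coprime"
    return (h0 * n * x + k0 * m * y) % (m * n)

# The example above: crt(1, 11, 9, 13) recovers 2^200 mod 143 = 100.
```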


Lecture 4

1.3.23 Example Find the unique solution of x ≡ 3 mod 7 and x ≡ 9 mod 11 satisfying 0 ≤ x ≤ 76.
Solution: find h, k such that 7h + 11k = 1 using Euclid:

11 = 7 + 4
7 = 4 + 3
4 = 3 + 1
So 1 = 4 − 3 = 4 − (7 − 4) = 2 × 4 − 7 = 2 × (11 − 7) − 7 = 2 × 11 − 3 × 7.
Hence let h = −3 and k = 2, so take x = (−3) × 7 × 9 + 2 × 11 × 3 = −189 + 66 = −123 ≡ 31 mod 77.

1.4 RSA cryptography

Problem: Alice wants to send Bob a secret message, but they have never met up and so they do not have a secret key.
Solution: use a public-key cryptosystem.

Bob chooses two large primes p ≠ q and chooses e coprime to (p − 1)(q − 1). He then forms n = pq and publishes (n, e) for all to see.

Alice encrypts a message 0 ≤ M ≤ n− 1 by forming the cryptogram

C ≡ M^e mod n.

To decrypt, Bob finds d such that de ≡ 1 mod (p − 1)(q − 1) (this exists by Lemma 1.3.17 since hcf(e, (p − 1)(q − 1)) = 1). He then forms

M ≡ C^d mod n.

1.4.24 Example Suppose Bob chooses the primes p = 7 and q = 11. So n = 77, (p − 1)(q − 1) = 60, and he takes e = 7, since 7 and 60 are coprime. Bob then calculates his private key using Euclid’s Algorithm to give:

60 = 8× 7 + 4

7 = 4 + 3

4 = 3 + 1

So
1 = 4 − 3 = 4 − (7 − 4) = 2 × 4 − 7 = 2 × (60 − 8 × 7) − 7 = 2 × 60 − 17 × 7.

So the inverse of 7 mod 60 is −17 ≡ 43 mod 60. Hence Bob’s public key is the pair (77, 7) and his private key is 43.

If Alice wants to send the message M = 4 she encrypts it as

C ≡ M^e = 4^7 = 16384 = 212 × 77 + 60 ≡ 60 mod 77.

Bob then decrypts using his private key and recovers the message

M ≡ C^d = 60^43 ≡ 4 mod 77.
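The whole toy example can be replayed in a few lines of Python. This is a sketch of the textbook scheme only, not real-world RSA (which uses padding and far larger primes); pow(e, -1, m) for the modular inverse needs Python 3.8+:

```python
# Bob's toy parameters from the example above.
p, q = 7, 11
n = p * q                           # 77, published
e = 7                               # public exponent, coprime to 60
d = pow(e, -1, (p - 1) * (q - 1))   # private key: inverse of e mod 60

M = 4                               # Alice's message
C = pow(M, e, n)                    # the cryptogram she sends
assert d == 43 and C == 60          # matches the hand computation
assert pow(C, d, n) == M            # Bob recovers the message
```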


1.4.25 Theorem RSA works, i.e. C^d ≡ M mod n.

Proof. We will show that C^d ≡ M mod p and C^d ≡ M mod q, and then deduce (by the Chinese Remainder Theorem) that C^d ≡ M mod n, and so Bob really does decrypt correctly.

Now ed ≡ 1 mod (p − 1)(q − 1), so there exists t ∈ Z such that ed = 1 + t(p − 1)(q − 1). Now either M ≡ 0 mod p, and so C^d = M^{ed} ≡ 0 mod p, or, using Fermat’s Little Theorem, we have

C^d = M^{ed} = M^{1+t(p−1)(q−1)} = M · (M^{p−1})^{t(q−1)} ≡ M mod p.

So C^d ≡ M mod p. Similarly C^d ≡ M mod q. Hence by the CRT we have C^d ≡ M mod n. □

Summary: This material can be misleading, because we emphasized a good deal the utility of the Chinese remainder theorem for taking powers mod n when you are computing by hand. But in fact, for a computer, taking powers mod any number is very efficient even if you don’t know that n is a product. There are many good algorithms for doing this, all having to do with the fact that the exponentiation can be done successively. We can already see this in direct hand computation. For example, to compute

2^100 mod 200

we note that
2^100 = (2^10)^10 ≡ 24^10 mod 200

= (24^2)^5 ≡ 176^5 ≡ (−24)^5 = (24^2)(24^2)(−24) ≡ (−24)(−24)(−24) ≡ (−24)(−24) ≡ −24

≡ 176 mod 200.

That is, the process of repeatedly computing remainders after taking powers a little bit at a time allows efficient computation. However, inverting the process is hard. For example, if I told you

x^100 ≡ 155 mod 200

would you be able to find x? There is no quick trick for doing this directly! The fact is, when the modulus m is, say, several million digits long, it is impossible even for a computer to solve the equation

x^100 ≡ 1209340980988 mod m.

However, if you know that m = pq for two primes p, q, then there is an easy way. You don’t need to invert the procedure at all. You just need to take further powers, which, you see, is efficient.
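The ‘a little bit at a time’ idea is binary exponentiation (square-and-multiply): reduce mod m after every multiplication so the intermediate numbers stay small. A minimal sketch (function name ours):

```python
def power_mod(a, e, m):
    """Compute a**e mod m by repeated squaring, reducing mod m
    after every multiplication so intermediates never exceed m**2."""
    result = 1
    a %= m
    while e > 0:
        if e & 1:                 # current binary digit of e is 1
            result = (result * a) % m
        a = (a * a) % m           # square for the next binary digit
        e >>= 1
    return result
```

This is essentially what Python’s built-in pow(a, e, m) does.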


Lecture 5

2 Polynomial Rings

2.1 Irreducible elements in Rings

2.1.26 Definition A ring is a triple (R, +, ·), where R is a set and +, · are binary operations, such that (R, +) is an Abelian group, (R, ·) is a monoid, and multiplication is distributive over addition. In detail:

• ∀a, b, c ∈ R (a + b) + c = a + (b + c),

• ∃0 ∈ R ∀a ∈ R a + 0 = a = 0 + a,

• ∀a ∈ R∃ − a ∈ R a + (−a) = 0 = (−a) + a,

• ∀a, b ∈ R a + b = b + a,

• ∀a, b, c ∈ R (ab)c = a(bc),

• ∃1 ∈ R ∀a ∈ R 1 · a = a = a · 1,

• ∀a, b, c ∈ R a(b + c) = ab + ac,

• ∀a, b, c ∈ R (b + c)a = ba + ca.

2.1.27 Example There are lots of examples of rings:

• Z is a ring;

• Z/n is a ring;

• Q and R and C are rings;

• More generally, every field is a ring. Conversely, if R is a ring in which 0 ≠ 1, xy is always the same as yx, and every non-zero element has a multiplicative inverse, then R is a field.

• The set Mn(R) of real n× n matrices is a ring;

• More generally, given any ring R, the set Mn(R) is a ring.

• The set R[x] of all polynomials in x with coefficients in R is a ring. Note that a polynomial is an expression of the form

a_0 + a_1 x + · · · + a_n x^n, a_0, . . . , a_n ∈ R.

• More generally, for any ring R, the set R[x] of polynomials with coefficients in R is a ring. Addition and multiplication are defined as one expects: if f(X) = ∑ a_n X^n and g(X) = ∑ b_n X^n then we define

(f + g)(X) = ∑ (a_n + b_n) X^n,

(fg)(X) = ∑ c_n X^n, where c_n = ∑_{i=0}^{n} a_i b_{n−i}.
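The convolution formula for c_n is a two-line loop over coefficient lists. A sketch with our own representation convention (f[i] holds the coefficient of X^i):

```python
def poly_mul(f, g):
    """Multiply two polynomials given as coefficient lists,
    using the convolution c_n = sum over i of a_i * b_{n-i}."""
    c = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            c[i + j] += a * b      # a_i * b_j contributes to X**(i+j)
    return c

# (1 + X)(1 + X) = 1 + 2X + X**2, i.e. [1, 1] * [1, 1] -> [1, 2, 1]
```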


We’ll actually study polynomial rings k[X] over a field k. If f = ∑ a_n X^n is a non-zero polynomial in k[X], then the degree of f is the largest n such that a_n ≠ 0. We also define deg(0) = −∞. The point of this definition is so that we always have:

deg(f × g) = deg(f) + deg(g)

(we are using the convention that −∞ + n = −∞). If f = ∑ a_n X^n ≠ 0 has degree d, then the coefficient a_d is called the leading coefficient of f. If f has leading coefficient 1 then f is called monic.

2.1.28 Example f(X) = X^3 + X + 2 has degree 3, and is monic.

2.1.29 Proposition If f and g are non-zero, f|g, and deg(f) = deg(g), then g = cf for a non-zero constant c.

Proof. Since f |g, we have g = hf for some h ∈ k[x]. Therefore,

deg(g) = deg(h) + deg(f).

But then, since deg(g) = deg(f), we have deg(h) = 0. This implies that h is a constant, non-zero since g ≠ 0. □

One can prove in an entirely similar manner:

2.1.30 Proposition If f and g are non-zero, f |g and g|f , then g = cf for a non-zero constant c.

2.1.31 Proposition If f and g are non-zero monic, f |g and deg(f) = deg(g), then g = f .

2.1.32 Proposition If f and g are non-zero monic, f |g and g|f , then f = g.

2.1.33 Definition Let R be any ring. There are three kinds of element of R:

• An element a ∈ R is a unit if there exists a^{−1} ∈ R such that aa^{−1} = a^{−1}a = 1. The set of units of R is denoted by R×.

• An element a ∈ R is reducible if it factorizes as a = bc with neither b nor c a unit.

• If a is neither a unit nor reducible then a is called irreducible.

2.1.34 Example If R = Z then Z× = {−1, 1}. The irreducible elements are ±p with p prime.

2.1.35 Example If k is a field then k× = k \ {0}. The element 0 is reducible since 0 = 0 × 0.

2.1.36 Proposition The units in k[X] are precisely the polynomials of degree 0, i.e. the non-zero constant polynomials.

Proof. Clearly if a is a non-zero constant polynomial then it is a unit in k[X]. Conversely, suppose ab = 1. Then we have deg(a) + deg(b) = 0. Hence deg(a) = deg(b) = 0. □

The question of which polynomials are irreducible is much harder, and depends on the field. For example X^2 − 2 factorizes in R[X] as (X + √2)(X − √2), but is irreducible in Q[X] (since √2 is irrational). The only general statement about irreducible polynomials is the following:


2.1.37 Proposition If deg(f) = 1 then f is irreducible.

Proof. Suppose f = gh. Then deg(g) + deg(h) = 1. Therefore the degrees of g and h are 0 and 1, so one of them is a unit. □

Note that the converse to the above is false, as we have already seen with X^2 − 2 in Q[X]. Note also that even in R[X], the polynomial X^2 + 1 is irreducible, although it factorizes in C[X] as (X + i)(X − i). One might ask whether there are similar phenomena for C and bigger fields, but in fact we have:

2.1.38 Fundamental Theorem of Algebra Let f ∈ C[X] be a non-zero polynomial. Then f factorizes as a product of linear factors:

f(X) = c(X − λ_1) · · · (X − λ_d),

where c is the leading coefficient of f .

Proof. This is proved in M2101. It is an interesting fact that all known proofs of the fundamental theorem have a geometric flavor. None of them use ‘pure algebra’, even though the result could be seen as belonging to algebra proper. Of course, somewhere you need to use the special properties of the field C, and perhaps the point is that the properties in question inevitably touch on geometry. □

The result is surprising in that the construction of C from R, in some sense, is motivated by having a solution to just one equation, namely,

x^2 = −1.

But then, suddenly, you can solve any polynomial equation at all.

In the notation of this course, the theorem means that in C[X] the irreducible polynomials are exactly the polynomials of degree 1, with no exceptions. In R[X] the description of the irreducible polynomials is a little more complicated. (How would you describe them?) In Q[X] things are much more complicated, and it can take some time to determine whether a polynomial is irreducible or not.

Constructing irreducible polynomials in F_2[x] is a major industry, as it applies to the problem of efficiently storing information in such a way that it can also be efficiently processed.


Lecture 6

2.2 Euclid’s algorithm in k[X]

The rings Z and k[X] are very similar. This is because in both rings we are able to divide with remainder in such a way that the remainder is smaller than the element we divided by. In Z if we divide a by b we find:

a = qb + r, 0 ≤ r < |b|.

In k[X] if we divide a polynomial f by a non-zero polynomial g then we find

f = qg + r, deg(r) < deg(g).

This allows us to prove the same theorems for k[X] as we proved for Z.

2.2.39 Example In Q[X] divide X^4 + 2X^3 + X^2 + 2X + 1 by X^2 + X + 1.

2.2.40 Example In F_5[X] divide x^3 + x^2 + x + 1 by 2x + 1. Note that in F_5 we have 2^{−1} = 3, 3^{−1} = 2 and 4^{−1} = 4.

2.2.41 Example In F7[x], divide x^3 + x + 1 by 3x^2 + 1.
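The divisions in the last two examples can be carried out mechanically: repeatedly cancel the leading term using the inverse of the divisor's leading coefficient mod p. Below is a minimal sketch of polynomial division in F_p[x] in Python; the function name `poly_divmod` and the coefficient convention (lowest degree first) are my own choices, not from the notes.

```python
def poly_divmod(a, b, p):
    """Divide a by b in F_p[x]; coefficients are listed lowest degree first.

    Returns (q, r) with a = q*b + r and deg(r) < deg(b)."""
    a = a[:]                                  # work on a copy of the dividend
    b_lead_inv = pow(b[-1], -1, p)            # inverse of b's leading coefficient mod p
    q = [0] * max(len(a) - len(b) + 1, 1)
    while len(a) >= len(b) and any(a):
        shift = len(a) - len(b)
        c = (a[-1] * b_lead_inv) % p          # next coefficient of the quotient
        q[shift] = c
        for i, bc in enumerate(b):            # subtract c * x^shift * b from a
            a[i + shift] = (a[i + shift] - c * bc) % p
        while len(a) > 1 and a[-1] == 0:      # strip leading zeros
            a.pop()
    return q, a

# Example 2.2.40: in F_5[x], x^3 + x^2 + x + 1 = (3x^2 + 4x + 1)(2x + 1) + 0
print(poly_divmod([1, 1, 1, 1], [1, 2], 5))   # ([1, 4, 3], [0])
```

Running it on Example 2.2.41 gives quotient 5x and remainder 3x + 1 in F_7[x].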

2.2.42 Division Algorithm Given a, b ∈ k[X] with b ≠ 0 there exist unique q, r ∈ k[X] such that

a = qb + r and deg(r) < deg(b), with the convention deg(0) = −∞.

Not actually gone through in class:

Proof. Existence: Choose q so that deg(a − qb) is minimal. If this is not less than deg(b), then suppose

(a − qb)(x) = c_k x^k + · · · + c_0,

with c_k ≠ 0. If b has degree m ≤ k, say

b(x) = b_m x^m + · · · + b_0,

where b_m ≠ 0, then we may subtract c_k b_m^{-1} x^{k−m} b from a − qb; that is, we set

q′ = q + c_k b_m^{-1} x^{k−m}.

Then

a − q′b = a − qb − c_k b_m^{-1} x^{k−m} b = c_k x^k − c_k x^k + (terms of degree at most k − 1),

which has degree smaller than deg(a − qb). This contradicts the minimality of deg(a − qb). Hence we can choose q such that deg(a − qb) < deg(b), and then set r = a − qb.

Uniqueness: Suppose we have a = q1b + r1 = q2b + r2, with deg(r1), deg(r2) < deg(b). Then

b(q1 − q2) = r2 − r1.

So if q1 ≠ q2 then deg(q1 − q2) ≥ 0, so deg(b(q1 − q2)) ≥ deg(b). But then (using 2.2 (ii), that deg(f + g) ≤ max{deg(f), deg(g)})

deg(r2 − r1) ≤ max{deg(r2),deg(r1)} < deg(b) ≤ deg(b(q1 − q2)) = deg(r2 − r1),

a contradiction. So q1 = q2 and r1 = r2. 2


2.2.43 The Remainder Theorem If f ∈ k[X] and a ∈ k then

f(a) = 0 ⇐⇒ (X − a)|f.

Proof. If (X − a)|f then there exists g ∈ k[x] such that f(X) = (X − a)g(X). Then f(a) = (a − a)g(a) = 0 · g(a) = 0.

Conversely, suppose f(a) = 0. By the Division Algorithm we have q, r ∈ k[x] with deg(r) < deg(X − a) = 1 such that f(X) = q(X)(X − a) + r(X). So r is a constant, r ∈ k. Then

r = f(a) − q(a)(a − a) = 0 − 0 = 0.

Hence (X − a)|f . 2
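The second half of this proof is just division by a linear polynomial, which can be done by Horner's rule ("synthetic division"): the intermediate Horner values are the quotient coefficients and the final value is the remainder f(a). A small sketch (the function name `divide_by_linear` is mine, not from the notes):

```python
def divide_by_linear(f, a):
    """Divide f (coefficients highest degree first) by (X - a) via Horner's rule.

    Returns (q, r) with f = q*(X - a) + r; the constant r equals f(a)."""
    horner = [f[0]]
    for c in f[1:]:
        horner.append(horner[-1] * a + c)   # Horner step: multiply by a, add next coefficient
    r = horner.pop()                        # the last Horner value is f(a)
    return horner, r

# f = X^2 - 3X + 2 = (X - 1)(X - 2): remainder at a = 1 is f(1) = 0, so (X - 1) | f
print(divide_by_linear([1, -3, 2], 1))      # ([1, -2], 0)
```

So f(a) = 0 exactly when the remainder on dividing by (X − a) vanishes, which is the content of the theorem.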

We can also use the division algorithm to calculate highest common factors as before:

2.2.44 Definition Let f, g ∈ k[X], not both zero. A highest common factor of f and g is a monic polynomial h such that:

• h|f and h|g.

• if a|f and a|g then deg(a) ≤ deg(h).

2.2.45 Example hcf(2x^2 − 1, 0) = x^2 − 1/2.

2.2.46 Proposition Let f = qg + r. Then h is a hcf of f and g iff h is a hcf of g and r.

Proof. as before. The common factors are the same. 2

Note that hcf(f, 0) = (1/c)f , where c is the leading coefficient of f .


Lecture 7

2.2.47 Bezout’s Lemma Let f, g ∈ k[X] not both zero. Then there exist a, b ∈ k[X] such that

hcf(f, g) = af + bg.

2.2.48 Example hcf(2x^5 − 2, 2x^3 − 2).
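Example 2.2.48 can be checked by running Euclid's algorithm directly, replacing (f, g) by (g, r) until the remainder vanishes, just as over Z. Here is a sketch over Q using exact rational arithmetic; `prem` and `phcf` are hypothetical helper names of my own, and coefficients are listed highest degree first.

```python
from fractions import Fraction

def prem(a, b):
    """Remainder of a divided by b in Q[x] (coefficients highest degree first, b != 0)."""
    a = [Fraction(c) for c in a]
    while len(a) >= len(b) and any(a):
        c = a[0] / Fraction(b[0])             # cancel the leading term of a
        for i in range(len(b)):
            a[i] -= c * Fraction(b[i])
        a.pop(0)                              # the leading coefficient is now zero
    while len(a) > 1 and a[0] == 0:           # strip any remaining leading zeros
        a.pop(0)
    return a

def phcf(f, g):
    """Monic highest common factor of f and g, computed by Euclid's algorithm."""
    f, g = [Fraction(c) for c in f], [Fraction(c) for c in g]
    while any(g):
        f, g = g, prem(f, g)
    return [c / f[0] for c in f]              # normalize to a monic polynomial

# hcf(2x^5 - 2, 2x^3 - 2) = x - 1
print(phcf([2, 0, 0, 0, 0, -2], [2, 0, 0, -2]))   # [Fraction(1, 1), Fraction(-1, 1)]
```

The successive remainders here are 2x^2 − 2, 2x − 2 and 0, so the monic hcf is x − 1.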

2.2.49 Corollary A highest common factor is unique.

Proof. First, by Bezout’s lemma, if a is any common factor of f and g, then a|hcf(f, g). Let h1 and h2

be two polynomials satisfying the properties of the hcf. Then we must have h1|h2 and h2|h1. So h2 = ch1

for a non-zero constant c. But then, since both are monic, we must have c = 1. 2

2.2.50 Lemma Let p ∈ k[X] be irreducible. If p|ab then p|a or p|b.

Proof. as before. 2

2.2.51 Unique Factorisation Theorem Let f ∈ k[x] be monic and non-constant. Then there exist monic irreducibles p1, p2, . . . , pr ∈ k[x] such that

f = p1p2 · · · pr.

If q1, . . . , qs are monic and irreducible and f = q1 · · · qs then r = s and (after reordering) p1 = q1, . . . , pr = qr.

Proof. (Existence): We prove the existence by induction on deg(f). If f is linear then it is irreducible and the result holds. So suppose the result holds for polynomials of smaller degree. Either f is irreducible and so the result holds, or f = gh for g, h non-constant polynomials of smaller degree. By our inductive hypothesis g and h can be factorized into irreducibles, and hence so can f .

(Uniqueness): Factorization is obviously unique for linear polynomials (or, more generally, for irreducible polynomials). For the inductive step, assume all polynomials of smaller degree than f have unique factorization. Let

f = g1 · · · gs = h1 · · ·ht,

with gi, hj monic irreducible. Now g1 is irreducible and g1|h1 · · ·ht. By the Lemma, there is 1 ≤ j ≤ t such that g1|hj . This implies

g1 = hj since they are both monic irreducibles. After reordering, we can assume j = 1, so

g2 · · · gs = h2 · · ·ht,

is a polynomial of smaller degree than f . By the inductive hypothesis, this has unique factorization, i.e. we can reorder things so that s = t and

g2 = h2, . . . , gs = hs.

2

A simple corollary that’s good to keep in mind leaves out the monic condition on f :


2.2.52 Corollary Let f ∈ k[x] be non-zero. Then there exist a non-zero constant c and monic irreducibles p1, p2, . . . , pr ∈ k[x] such that

f = cp1p2 · · · pr.

If q1, . . . , qs are monic and irreducible, b is a non-zero constant, and f = bq1 · · · qs, then r = s, c = b, and (after reordering) p1 = q1, . . . , pr = qr.


Lecture 8

3 Jordan Canonical Form

3.1 Revision of linear algebra

• Fields. A field is a commutative ring with 1 such that every non-zero element has an inverse. Examples: Q, R, C, Fp. If k is any field then k(X) (the field of rational functions) is a field.

• Vector spaces, subspaces, direct sums. A vector space over a field k is a set V with two operations: addition and scalar multiplication by elements of k. Elements of V are called vectors, and elements of k are called scalars. The axioms are:

– (V,+) is an abelian group.

– (xy)v = x(yv).

– (x + y)v = xv + yv and x(v + w) = xv + xw.

– 1v = v.

• A linear combination of {v1, . . . , vn} is a vector of the form x1v1 + . . . + xnvn.

• The span of a set of vectors is the set of linear combinations of those vectors.

• Vectors v1, . . . , vn ∈ V are called linearly dependent if some non-trivial linear combination is zero.

• Vectors v1, . . . , vn ∈ V are called linearly independent if they are not linearly dependent. It's useful to know that this condition is the same as the implication:

c1v1 + · · · + cnvn = 0 ⇒ c1 = c2 = · · · = cn = 0.

• A set {v1, . . . , vn} of vectors is a basis for V if it is linearly independent and its span is V . If this is the case then every vector has a unique expression as a linear combination of {v1, . . . , vn}.

• The dimension of a vector space is the number of vectors in a basis. This does not depend on the basis: any two bases have the same number of elements.

• A subspace of V is a subset, which is a vector space with the same addition and scalar multiplicationas V .

Suppose U is a subspace of V . Then we can define the quotient space V/U as follows: we call two vectors v, w ∈ V congruent modulo U if v − w ∈ U . We'll write v + U for the congruence class of v modulo U .

[picture]

Then V/U is the set of congruence classes modulo U .

3.1.53 Proposition The set V/U is a vector space with the operations:

(v + U) + (w + U) = v + w + U,

x(v + U) = xv + U.

Proof. One checks that the operations are well defined (i.e. independent of the choice of representatives) and that the vector space axioms hold. 2


3.1.54 Proposition dim(V/U) = dim V − dim U .

Proof. Choose a basis for U , then extend it to a basis of V . Claim: the added vectors, when reduced modulo U , form a basis for the quotient space. 2

Let V,W be vector spaces. A function T : V → W is a linear map if

• T (v + w) = T (v) + T (w),

• T (xv) = xT (v).

or equivalently, T (v + xw) = T (v) + xT (w). A bijective linear map is called an isomorphism of vector spaces.

If T is linear, we define its kernel and image:

ker(T ) = {v ∈ V : T (v) = 0},

Im(T ) = {T (v) : v ∈ V }.

The rank of T is the dimension of the image of T , and the nullity of T is the dimension of the kernel of T .

3.1.55 Isomorphism Theorem for Vector spaces Let T : V → W be a linear map. Then there isan isomorphism of vector spaces

V/ ker(T ) ∼= Im(T ), v + ker(T ) 7→ T (v)

This implies the following:

3.1.56 Rank-Nullity Theorem Let T : V → W be a linear map. Then

rank(T ) + null(T ) = dim V.
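The rank–nullity theorem can be checked numerically for a map given by a matrix: the rank is the dimension of the column space, and the nullity is the number of (numerically) zero singular values. A quick sketch using numpy, which is assumed available; the example matrix is my own.

```python
import numpy as np

# T : R^3 -> R^2 given by a 2x3 matrix; here dim V = 3 (the number of columns)
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])          # second row is twice the first

rank = np.linalg.matrix_rank(A)          # dim Im(T)

# dim ker(T): count the right-singular directions with zero singular value
s = np.linalg.svd(A, compute_uv=False)
nullity = A.shape[1] - np.sum(s > 1e-10)

print(rank, nullity)                     # 1 2, and 1 + 2 = 3 = dim V
```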


Lecture 9

3.2 Matrix representation of linear maps

Let V be a finite dimensional vector space over a field k. We shall write End(V ) for the set of linear maps T : V → V . Note that:

• The identity map id(v) = v is a linear map, so id ∈ End(V ).

• The zero map 0(v) = 0 is a linear map,

• If T,U ∈ End(V ) then T + U ∈ End(V ), where T + U is the linear map defined by

(T + U)(v) = T (v) + U(v).

• If T,U ∈ End(V ) then TU ∈ End(V ), where TU is the linear map defined by

(TU)(v) = T (U(v)).

In fact the operations T + U and TU make End(V ) into a ring. The identity map id is the multiplicative identity and the zero map is the additive identity.

Now let B = {b1, . . . , bn} be a basis for V . For any vector v ∈ V we shall write [v]B for the column vector of coefficients of v with respect to the basis B, i.e.

[v]B = (x1, . . . , xn)^t,  where v = x1b1 + . . . + xnbn.

Given a linear map T : V → V we represent T by the matrix

[T ]B = (ai,j ),  where T (bj ) = ∑i ai,j bi.

In other words, the columns of [T ]B are the vectors [T (bj )]B. This gives a bijective correspondence between linear maps T : V → V and the ring Mn(k) of n × n matrices. We recall the following:

3.2.57 Proposition [T (v)]B = [T ]B[v]B.

Proof. Let v = ∑j xjbj . Since T is linear we have

T (v) = ∑j xjT (bj).

Therefore,

T (v) = ∑i,j xj ai,j bi.

So the coefficient of bi in T (v) is

∑j ai,j xj .

This is also the i-th entry of the column vector [T ]B[v]B. 2

In fact this correspondence is a ring homomorphism in the following sense:

3.2.58 Lemma If T, U : V → V are two linear maps then

[T + U ]B = [T ]B + [U ]B, [TU ]B = [T ]B[U ]B, [idV ]B = In, [0]B = 0.


Lecture 10

Proof. Most of these are trivial; the only one which is not is the multiplication relation. By the previous proposition we have

[(TU)(v)] = [TU ][v].

On the other hand,

[(TU)(v)] = [T (U(v))] = [T ][U(v)] = [T ][U ][v].

2

3.2.59 Definition Let f(X) = ∑n anX^n ∈ k[X]. We define

f(T ) = ∑n anT^n,

where we define T^0 = id. If A ∈ Mn(k) is a matrix then we define

f(A) = ∑n anA^n,

where A^0 = In.

3.2.60 Proposition With this notation we have [f(T )]B = f([T ]B).

Proof. This follows from the lemma. 2

Recall that if C = {c1, . . . , cn} is another basis, then we may write each ci as a linear combination of B:

ci = ∑j λj,i bj .

The matrix M = (λj,i), whose i-th column is [ci]B,

3.2.61 Proposition For any vector v ∈ V we have

[v]C = M^{-1}[v]B.

For any T ∈ End(V ) we have

[T ]C = M^{-1}[T ]BM.

More generally, for any polynomial f ∈ k[X] we have:

[f(T )]C = M^{-1}[f(T )]BM.


3.2.62 Definition Recall that the characteristic polynomial of an n× n matrix A is defined by

chA(X) = det(X · In −A).

This is a monic polynomial of degree n over k. Now suppose T : V → V is a linear map. We can define chT to be ch[T ]B , but we need to check that this does not depend on the basis B. If C is another basis with transition matrix M then we have:

ch[T ]C (X) = det(X · In − M^{-1}[T ]BM)
= det(M^{-1}(X · In − [T ]B)M)
= det(M)^{-1} det(X · In − [T ]B) det(M)
= det(X · In − [T ]B)
= ch[T ]B (X).

The following theorem was proved in M12B.

3.2.63 Cayley–Hamilton Theorem For any A ∈ Mn(k) we have chA(A) = 0.

We therefore have:

3.2.64 Cayley–Hamilton Theorem For any T ∈ End(V ) we have chT (T ) = 0.

3.2.65 Example

A =
( 1 2 )
( 3 4 ).
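Example 3.2.65 presumably asks us to verify the Cayley–Hamilton theorem for this matrix. For a 2 × 2 matrix, ch_A(X) = X^2 − (tr A)X + det A = X^2 − 5X − 2 here, and substituting A should give the zero matrix. A quick check with numpy (assumed available):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

# ch_A(X) = X^2 - (tr A) X + (det A) = X^2 - 5X - 2 for this matrix
I = np.eye(2, dtype=int)
result = A @ A - 5 * A - 2 * I

print(result)        # [[0 0], [0 0]]: ch_A(A) = 0, as Cayley-Hamilton predicts
```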


Lecture 11

3.3 Minimal polynomials

3.3.66 Definition Let V be a finite dimensional vector space over a field k and T : V → V a linear map. A minimal polynomial of T is a monic polynomial mT ∈ k[X] such that

• mT (T ) = 0;

• if f(T ) = 0 and f ≠ 0 then deg f ≥ deg mT .

The monic condition in particular implies that mT has to be non-zero. Note that the Cayley–Hamilton theorem implies that chT (T ) = 0. Thus, the set of non-zero f such that f(T ) = 0 is non-empty. Taking a monic element of minimal degree from this set, we see that an mT as above exists.

3.3.67 Theorem Given a linear map T : V → V we have f(T ) = 0 iff mT |f .

Proof. If mT |f , then f(X) = mT (X)g(X) for some g ∈ k[x]. Therefore, f(T ) = mT (T )g(T ) = 0. Conversely, suppose f(T ) = 0. By the division algorithm, we can write

f = qmT + r

with deg r < deg mT . But then

0 = f(T ) = q(T )mT (T ) + r(T ) = r(T ).

If r were non-zero, the definition of mT would give deg mT ≤ deg r, contradicting deg r < deg mT . Therefore, we must have r = 0. So mT |f . 2

3.3.68 Corollary mT as above is unique.

Proof. If we had m_T^1 and m_T^2 both satisfying the conditions, they would have to divide each other. Thus, one would have to be a non-zero constant multiple of the other. But then, since both are monic, they would have to be equal. 2

3.3.69 Corollary If T : V → V is a linear map then mT |chT .

Proof. By the Cayley-Hamilton Theorem chT (T ) = 0. 2

Using the corollary we can calculate the minimal polynomial as follows:

• Calculate chT and factorize it into irreducibles.

• Make a list of all the factors.

• Find the monic factor m of smallest degree such that m(T ) = 0.

3.3.70 Example Suppose T is represented by the matrix

( 2 1 0 )
( 0 2 0 )
( 0 0 2 ).

The characteristic polynomial is

chT (X) = (X − 2)^3.

The factors of this are:

1, (X − 2), (X − 2)^2, (X − 2)^3.

The minimal polynomial is (X − 2)^2.

In fact this method can be speeded up: there are certain factors of the characteristic polynomial which cannot arise. To explain this we recall the definition of an eigenvalue.


3.3.71 Definition Recall that a number λ ∈ k is called an eigenvalue of T if there is a non-zero vector v satisfying

T (v) = λ · v.

The non-zero vector v is called an eigenvector.

3.3.72 Proposition Let v be an eigenvector of T with eigenvalue λ ∈ k. Then for any polynomialf ∈ k[X],

(f(T ))(v) = f(λ) · v.

Proof. easy. 2

3.3.73 Theorem If T : V → V is linear and λ ∈ k then the following are equivalent:

(i) λ is an eigenvalue of T .

(ii) mT (λ) = 0.

(iii) chT (λ) = 0.

Proof. (i) ⇒ (ii): Assume T (v) = λv with v ≠ 0. Then by the proposition,

(mT (T ))(v) = mT (λ) · v.

But mT (T ) = 0 so we have mT (λ) · v = 0. Since v ≠ 0 this implies mT (λ) = 0.

(ii) ⇒ (iii): This is trivial since we have already shown that mT is a factor of chT .

(iii) ⇒ (i): Suppose chT (λ) = 0. Then det(λ · id − T ) = 0. It follows that (λ · id − T ) is not invertible, so there is a non-zero solution to (λ · id − T )(v) = 0. But then T (v) = λ · v. 2

Now suppose the characteristic polynomial of T factorizes into irreducibles as

chT (X) = ∏_{i=1}^{r} (X − λi)^{ai}.

We have shown that the minimal polynomial has the form

mT (X) = ∏_{i=1}^{r} (X − λi)^{bi},  1 ≤ bi ≤ ai.

This makes it much quicker to calculate the minimal polynomial.

3.3.74 Example Suppose T is represented by the matrix diag(2, 2, 3). The characteristic polynomial is

chT (X) = (X − 2)^2(X − 3).

The possibilities for the minimal polynomial are:

(X − 2)(X − 3), (X − 2)^2(X − 3).

The minimal polynomial is (X − 2)(X − 3).


Lecture 12

3.4 Generalized Eigenspaces

3.4.75 Definition Let V be a finite dimensional vector space over a field k, and let λ ∈ k be an eigenvalue of a linear map T : V → V . We define for t ∈ N the t-th generalized eigenspace by:

Vt(λ) = ker((λ · id − T )^t).

Note that V1(λ) is the usual eigenspace (i.e. the set of eigenvectors together with zero).

3.4.76 Remark We obviously have

V1(λ) ⊆ V2(λ) ⊆ . . .

and by definition,

dim Vt(λ) = null((λ · id − T )^t).

3.4.77 Example Let

A =
( 2 2 2 )
( 0 2 2 )
( 0 0 2 ).

We have chA(X) = (X − 2)^3, so 2 is the only eigenvalue. We'll now calculate the generalized eigenspaces Vt(2):

V1(2) = ker
( 0 2 2 )
( 0 0 2 )
( 0 0 0 ).

We calculate the kernel by row-reducing the matrix:

V1(2) = ker
( 0 1 0 )
( 0 0 1 )
( 0 0 0 )
= span{ (1, 0, 0)^t }.

Similarly,

V2(2) = ker
( 0 0 1 )
( 0 0 0 )
( 0 0 0 )
= span{ (1, 0, 0)^t, (0, 1, 0)^t },

V3(2) = ker
( 0 0 0 )
( 0 0 0 )
( 0 0 0 )
= span{ (1, 0, 0)^t, (0, 1, 0)^t, (0, 0, 1)^t }.
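The growth of these generalized eigenspaces can be confirmed by computing the nullity of (A − 2I)^t for t = 1, 2, 3. A sketch with numpy (assumed available):

```python
import numpy as np

A = np.array([[2.0, 2.0, 2.0],
              [0.0, 2.0, 2.0],
              [0.0, 0.0, 2.0]])
N = A - 2 * np.eye(3)                     # N = A - 2I, so V_t(2) = ker(N^t)

nullities = []
for t in (1, 2, 3):
    Nt = np.linalg.matrix_power(N, t)
    nullities.append(3 - np.linalg.matrix_rank(Nt))   # dim ker = 3 - rank

print(nullities)                          # [1, 2, 3]: the spaces grow until V_3(2) = k^3
```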

3.4.78 Example Let

A =
( 1 1 −2 )
( 1 1 −2 )
( 1 1 −2 ).


3.4.79 Primary Decomposition Theorem If V is a finite dimensional F-vector space and T : V → V is linear, with distinct eigenvalues λ1, . . . , λr ∈ F and minimal polynomial

mT (X) = ∏_{i=1}^{r} (X − λi)^{ei},

then

V = V_{e1}(λ1) ⊕ · · · ⊕ V_{er}(λr).

3.4.80 Lemma If f, g ∈ F[x] satisfy hcf(f, g) = 1 and T is as above then

ker(fg(T )) = ker(f(T ))⊕ ker(g(T )).

Proof of theorem using lemma.

By definition of mT we have mT (T ) = 0, so ker(mT (T )) = V . We have a factorization of mT into pairwise coprime factors of the form (X − λi)^{ei}, so the lemma implies that

V = ker(mT (T )) = ker( ∏_{i=1}^{r} (T − λi · id)^{ei} )

= ker((T − λ1 · id)^{e1}) ⊕ ker( ∏_{i=2}^{r} (T − λi · id)^{ei} )

= ker((T − λ1 · id)^{e1}) ⊕ ker((T − λ2 · id)^{e2}) ⊕ ker( ∏_{i=3}^{r} (T − λi · id)^{ei} )

= · · · = V_{e1}(λ1) ⊕ V_{e2}(λ2) ⊕ · · · ⊕ V_{er}(λr).

2

Proof of lemma. Let f, g ∈ F[x] satisfy hcf(f, g) = 1.

Firstly, if v ∈ ker f(T ) + ker g(T ), say v = w1 + w2 with w1 ∈ ker f(T ) and w2 ∈ ker g(T ), then

fg(T )v = fg(T )(w1 + w2) = g(T )f(T )w1 + f(T )g(T )w2 = g(T )0 + f(T )0 = 0 + 0 = 0.

So ker(f(T )) + ker(g(T )) ⊆ ker(fg(T )).

Now since hcf(f, g) = 1 there exist a, b ∈ F[x] such that

af + bg = 1.

So

a(T )f(T ) + b(T )g(T ) = id (the identity map).

Let v ∈ ker(fg(T )). If

v1 = a(T )f(T )v,  v2 = b(T )g(T )v,

then v = v1 + v2 and

g(T )v1 = (gaf)(T )v = a(T )(fg(T )v) = a(T )0 = 0.

So v1 ∈ ker(g(T )). Similarly v2 ∈ ker(f(T )) since

f(T )v2 = (fbg)(T )v = b(T )(fg(T )v) = b(T )0 = 0.

Hence ker fg(T ) = ker f(T ) + ker g(T ). Moreover, if v ∈ ker f(T ) ∩ ker g(T ) then v1 = 0 = v2, so v = 0. Hence

ker fg(T ) = ker f(T ) ⊕ ker g(T ).

2


Lecture 13

3.4.81 Definition Recall that a linear map T : V → V is diagonalizable if there is a basis B of V such that [T ]B is a diagonal matrix. This is equivalent to saying that the basis vectors in B are all eigenvectors.

3.4.82 Theorem Let V be a finite dimensional vector space over a field k and let T : V → V be a linear map with distinct eigenvalues λ1, . . . , λr ∈ k. Then T is diagonalizable iff we have (in k[X]):

mT (X) = (X − λ1) . . . (X − λr).

Proof. First suppose that T is diagonalizable and let B be a basis of eigenvectors. For simplicity let f(X) = (X − λ1) . . . (X − λr). We already know that f |mT , so to prove that f = mT we just have to check that f(T ) = 0. To show this, it is sufficient to check that f(T )(v) = 0 for each basis vector v ∈ B. Suppose v ∈ B, so v is an eigenvector with some eigenvalue λi. Then we have

f(T )(v) = f(λi) · v = 0 · v = 0.

Therefore mT = f .

Conversely, if mT = f then by the primary decomposition theorem we have

V = V1(λ1)⊕ . . .⊕ V1(λr).

Let Bi be a basis for V1(λi). Then obviously the elements of Bi are eigenvectors and B = B1 ∪ . . .∪Br isa basis of V . Therefore T is diagonalizable. 2

3.4.83 Example Let k = C and let

A =
( 1 2 )
( 3 4 ).

The characteristic polynomial is X^2 − 5X − 2. This is not a perfect square, so it factorizes over C as (X − λ1)(X − λ2) with λ1 ≠ λ2. Hence mT = chT and since this is a product of distinct linear factors, A is diagonalizable over C.

3.4.84 Example Let k = Q and let

A =
( 1 2 )
( 3 4 ).

The characteristic polynomial is X^2 − 5X − 2. This is irreducible over Q. Hence mT = chT and since this is not a product of distinct linear factors in Q[X], A is not diagonalizable over Q.

3.4.85 Example Let k = C and let

A =
( 1 −1 )
( 1 −1 ).

The characteristic polynomial is X^2. Since A ≠ 0 the minimal polynomial is also X^2. Since this is not a product of distinct linear factors, A is not diagonalizable over C.


3.5 Jordan Bases in the one eigenvalue case

Let T : V → V be a linear map. It is not always the case that T can be diagonalized; i.e. there is not always a basis of V consisting of eigenvectors of T . In the case that there is no basis of eigenvectors, the best kind of basis is a Jordan basis. We shall define a Jordan basis in several steps.

Suppose λ ∈ k is an eigenvalue of a linear map T : V → V . We have defined generalized eigenspaces:

V1(λ) ⊆ V2(λ) ⊆ . . . ⊆ Vb(λ),

where b is the power of X − λ in the minimal polynomial mT .

We can choose a basis B1 for V1(λ). Then we can choose B2 so that B1 ∪ B2 is a basis for V2(λ), etc. Eventually we end up with a basis B1 ∪ . . . ∪ Bb for Vb(λ). We'll call such a basis a pre-Jordan basis.

3.5.86 Example

A =
( 2 1 −2 )
( 1 2 −2 )
( 1 1 −1 )

We have chA(X) = (X − 1)^3 and mA(X) = (X − 1)^2. There is only one eigenvalue λ = 1, and we have generalized eigenspaces

V1(1) = ker( 1 1 −2 ),  V2(1) = ker(0) = R^3.

So we can choose a pre-Jordan basis as follows:

B1 = { (1, −1, 0)^t, (0, 2, 1)^t },  B2 = { (1, 0, 0)^t }.

Now note the following:

3.5.87 Lemma If v ∈ Vt(λ) with t > 1 then

(T − λ · id)(v) ∈ Vt−1(λ).

Proof. easy. 2

Now suppose we have a pre-Jordan basis B1 ∪ . . . ∪ Bb. We call this a Jordan basis if in addition we have the condition:

(T − λ · id)Bt ⊂ Bt−1,  t = 2, 3, . . . , b.

If we have a pre-Jordan basis B1 ∪ . . . ∪ Bb, then to find a Jordan basis, we do the following:

• For each basis vector v ∈ Bb, replace one of the vectors in Bb−1 by (T − λ · id)(v). When choosing which vector to replace, we just need to take care that we still have a basis at the end.

• For each basis vector v ∈ Bb−1, replace one of the vectors in Bb−2 by (T − λ · id)(v). When choosing which vector to replace, we just need to take care that we still have a basis at the end.

• etc.

• For each basis vector v ∈ B2, replace one of the vectors in B1 by (T − λ · id)(v). When choosing which vector to replace, we just need to take care that we still have a basis at the end.

We’ll prove later that this method always works.


3.5.88 Example In the example above, we replace one of the vectors in B1 by

(A − I3)(1, 0, 0)^t = (1, 1, 1)^t.

So we can choose a Jordan basis as follows:

B1 = { (1, −1, 0)^t, (1, 1, 1)^t },  B2 = { (1, 0, 0)^t }.
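With the Jordan basis ordered so that the chain vector (1, 1, 1)^t = (A − I)(1, 0, 0)^t comes directly before (1, 0, 0)^t, followed by the remaining eigenvector, the change-of-basis matrix M should bring A into Jordan form. A numerical check with numpy (assumed available; the column ordering is my reading of the example):

```python
import numpy as np

A = np.array([[2.0, 1.0, -2.0],
              [1.0, 2.0, -2.0],
              [1.0, 1.0, -1.0]])

# Jordan basis columns: the chain (1,1,1)^t = (A - I)(1,0,0)^t, then (1,0,0)^t,
# then the remaining eigenvector (1,-1,0)^t on its own.
M = np.array([[1.0, 1.0,  1.0],
              [1.0, 0.0, -1.0],
              [1.0, 0.0,  0.0]])

J = np.linalg.inv(M) @ A @ M
print(np.round(J))    # diag(J_2(1), 1): rows [1 1 0], [0 1 0], [0 0 1]
```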


Lecture 14

3.5.89 Example

A =
( 1 −1 )
( 1 −1 ).

3.5.90 Example

A =
( 2 2 2 )
( 0 2 2 )
( 0 0 2 ).

3.6 Jordan Canonical Form in the one eigenvalue case

The Jordan canonical form of a linear map T : V → V is essentially the matrix of T with respect to a Jordan basis. However, the matrix [T ]B depends on the order of the basis vectors in B, so we'll first need to say exactly which order to take the basis vectors in a Jordan basis.

Suppose for the moment that T has only one eigenvalue λ and choose a Jordan basis:

B = B1 ∪ . . . ∪ Bb,

where b is the power of X − λ in the minimal polynomial of T . We can arrange the basis vectors in a table:

Bb      ∗  ∗
Bb−1    ∗  ∗  ∗
...
B1      ∗  ∗  ∗  . . .  ∗

with (T − λ · id)(v) always directly below v in the table. We then number these basis vectors as follows:

Bb      bb       b2b
Bb−1    bb−1     b2b−1     b3b−1
...
B1      b1       bb+1      b2b+1     . . .    ba

where a = dim Vb(λ). In other words, we order the basis vectors in columns, starting at the bottom of each column. Note that

• If bi is at the bottom of a column then T (bi) = λbi. This means that the i-th column of [T ]B is

[T (bi)]B = (0, . . . , 0, λ, 0, . . . , 0)^t,

with the λ in the i-th row.


• If bi is not at the bottom of a column then T (bi) = λbi + bi−1. This means that the i-th column of [T ]B is

[T (bi)]B = (0, . . . , 0, 1, λ, 0, . . . , 0)^t,

with the 1 in the (i − 1)-th row and the λ in the i-th row.

We can therefore write down the Jordan canonical form [T ]B straight from the table of the basis B. For example suppose B has the form:

B4    b4   b8
B3    b3   b7
B2    b2   b6   b10
B1    b1   b5   b9   b11   b12

Then the matrix [T ]B is the 12 × 12 matrix

( λ 1                     )
(   λ 1                   )
(     λ 1                 )
(       λ                 )
(         λ 1             )
(           λ 1           )
(             λ 1         )
(               λ         )
(                 λ 1     )
(                   λ     )
(                     λ   )
(                       λ )

with all blank entries equal to 0.

3.6.91 Definition The Jordan block matrix Js(λ) is defined to be the s× s matrix:

Js(λ) =
( λ 1         )
(   λ 1       )
(     .  .    )
(       .  1  )
(          λ  ).

Using this notation we can write the above matrix as

[T ]B = diag(J4(λ), J4(λ), J2(λ), λ, λ).

We see that the sizes of the Jordan block matrices are exactly the heights of the columns in the diagram of B. In other words we have:


3.6.92 Proposition Let T be a linear map with only one eigenvalue λ. Suppose that in the table for the Jordan basis there are columns of height h1, . . . , hw. Then the matrix of T with respect to the Jordan basis is

[T ]B = diag(Jh1(λ), . . . , Jhw(λ)).

3.6.93 Definition The matrix diag(Jh1(λ), . . . , Jhw(λ)) is called the Jordan canonical form of T .
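Proposition 3.6.92 translates directly into code: given the column heights in the table, assemble the block-diagonal matrix from Jordan blocks. A sketch with numpy (assumed available; `jordan_block` and `jordan_form` are hypothetical names of my own):

```python
import numpy as np

def jordan_block(s, lam):
    """The s x s Jordan block J_s(lambda): lambda on the diagonal, 1 just above it."""
    return lam * np.eye(s) + np.eye(s, k=1)

def jordan_form(heights, lam):
    """Block-diagonal matrix diag(J_h1(lam), ..., J_hw(lam)) for column heights h1..hw."""
    n = sum(heights)
    J = np.zeros((n, n))
    pos = 0
    for h in heights:
        J[pos:pos + h, pos:pos + h] = jordan_block(h, lam)
        pos += h
    return J

# Columns of heights 2 and 1 with eigenvalue 2: diag(J_2(2), J_1(2))
print(jordan_form([2, 1], 2))
# [[2. 1. 0.]
#  [0. 2. 0.]
#  [0. 0. 2.]]
```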


Lecture 15

3.7 Calculating the Jordan canonical form in the one eigenvalue case

See ‘Tom Leinster’s notes on the Jordan normal form’ and ‘Some examples by Alastair Fletcher’ from the course webpage.

See also section 4 of the note by Saxl and Hyland. Some of the terminology there may be unfamiliar. I will explain some of it in class. Otherwise, email me with questions.


Lecture 16

3.8 The Jordan canonical form with more than one eigenvalue

See ‘Tom Leinster’s notes on the Jordan normal form’ and ‘Some examples by Alastair Fletcher’ from the course webpage.


Lecture 17

3.9 Existence of a Jordan basis

Recall that our algorithm for converting a pre-Jordan basis into a Jordan basis is as follows:

• For each basis vector v ∈ Bb, replace some vector in Bb−1 by (T − λ)(v) in such a way that we still have a basis.

• For each basis vector v ∈ Bb−1, replace some vector in Bb−2 by (T − λ)(v) in such a way that we still have a basis.

• etc.

• For each basis vector v ∈ B2, replace some vector in B1 by (T − λ)(v) in such a way that we still have a basis.

To prove that this is always possible we need the following result.

3.9.94 Lemma Let B1 ∪ . . . ∪ Bb be a pre-Jordan basis in Vb(λ). For n = 1, . . . , b − 1 the set

B1 ∪ . . . ∪ Bn−1 ∪ (T − λ)Bn+1

is a linearly independent subset of Vn(λ).

Proof. Let

B1 ∪ . . . ∪ Bn−1 = {b1, . . . , bp},  Bn = {c1, . . . , cq},  Bn+1 = {d1, . . . , dr}.

Suppose some linear combination is zero:

∑ xibi + ∑ yi(T − λ)(di) = 0.

Since T − λ is linear, we have

∑ xibi + (T − λ)(∑ yidi) = 0.

Applying (T − λ)^{n−1} to both sides of this equation we have:

(T − λ)^n (∑ yidi) = 0.

This implies that ∑ yidi ∈ Vn(λ), so we may expand it as a linear combination of B1 ∪ . . . ∪ Bn:

∑ yidi = ∑ wibi + ∑ zici.

Taking things to the same side of the equation we have:

∑ yidi − ∑ wibi − ∑ zici = 0.

However, since B1 ∪ . . . ∪ Bn+1 is linearly independent, the coefficients yi, wi and zi are all zero. Hence our original equation becomes

∑ xibi = 0.

Then since B1 ∪ . . . ∪ Bn−1 is linearly independent it follows that the coefficients xi are all zero. 2

3.9.95 Corollary The algorithm for finding a Jordan basis works. Hence a Jordan basis exists.


Lecture 18

4 Bilinear and Quadratic Forms

4.1 Matrix Representation

For some motivation, read the ‘Remark on quadratic forms’ from the course webpage.

4.1.96 Definition Let V be a vector space over k. A bilinear form on V is a function f : V × V → k such that

• f(u + λv, w) = f(u,w) + λf(v, w);

• f(u, v + λw) = f(u, v) + λf(u,w).

I.e. f(v, w) is linear in both v and w.

4.1.97 Example For example, given a matrix A ∈ Mn(k), the following is a bilinear form on k^n:

f(v, w) = v^t A w = ∑_{i,j} vi ai,j wj ,  v = (v1, . . . , vn)^t,  w = (w1, . . . , wn)^t.

We’ll see that in fact these are the only examples.

4.1.98 Example If

A =
( 1 2 )
( 3 4 )

then the corresponding bilinear form is

f( (x1, y1)^t, (x2, y2)^t ) = ( x1  y1 ) A ( x2, y2 )^t = x1x2 + 2x1y2 + 3y1x2 + 4y1y2.
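The formula f(v, w) = v^t A w is a one-liner with numpy (assumed available); evaluating it on standard basis vectors reads off individual entries of A, matching the expansion in the example:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

def f(v, w):
    """Bilinear form on k^2 determined by A: f(v, w) = v^t A w."""
    return v @ A @ w

v = np.array([1, 0])   # (x1, y1) = (1, 0)
w = np.array([0, 1])   # (x2, y2) = (0, 1)
print(f(v, w))         # 2 = the coefficient of x1*y2 in the expansion
```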

Recall that if B = {b1, . . . , bn} is a basis for V and v = ∑ xibi then we write [v]B for the column vector

[v]B = (x1, . . . , xn)^t.

4.1.99 Definition If f is a bilinear form on V and B = {b1, . . . , bn} is a basis for V then we define the matrix of f with respect to B by

[f ]B =
( f(b1, b1) . . . f(b1, bn) )
(    ...           ...      )
( f(bn, b1) . . . f(bn, bn) )


4.1.100 Proposition Let B be a basis for a finite dimensional vector space V over k, dim(V ) = n. Any bilinear form f on V is determined by the matrix [f ]B. Moreover for v, w ∈ V ,

f(v, w) = [v]tB[f ]B[w]B.

Proof. Let

v = x1b1 + x2b2 + · · · + xnbn,

so

[v]B = (x1, . . . , xn)^t.

Similarly suppose

[w]B = (y1, . . . , yn)^t.

Then

f(v, w) = f( ∑_{i=1}^{n} xibi, w )

= ∑_{i=1}^{n} xi f(bi, w)

= ∑_{i=1}^{n} xi f( bi, ∑_{j=1}^{n} yjbj )

= ∑_{i=1}^{n} xi ∑_{j=1}^{n} yj f(bi, bj)

= ∑_{i=1}^{n} ∑_{j=1}^{n} ai,j xiyj   (where ai,j = f(bi, bj) is the (i, j) entry of [f ]B)

= [v]tB[f ]B[w]B.

2

Now suppose B = {b1, . . . , bn} and C = {c1, . . . , cn} are two bases for V . We may write one basis in terms of the other:

ci = ∑_{j=1}^{n} λj,i bj .

The matrix M = (λj,i), whose i-th column is [ci]B,


is called the transition matrix from B to C. It is always an invertible matrix: its inverse is the transition matrix from C to B. Recall that for any vector v ∈ V we have

[v]B = M [v]C ,

and for any linear map T : V → V we have

[T ]C = M^{-1}[T ]BM.

We’ll now describe how bilinear forms behave under change of basis.

4.1.101 Change of Basis Formula Let f be a bilinear form on a finite dimensional vector space V over k. Let B and C be two bases for V and let M be the transition matrix from B to C. Then

[f ]C = M^t [f ]B M.

Proof. Let u, v ∈ V with [u]_B = x, [v]_B = y, [u]_C = s and [v]_C = t, and write A = (a_{i,j}) = [f]_B.

By Proposition 4.1.100 we have

f(u, v) = x^t A y.

Now x = M s and y = M t, so

f(u, v) = (M s)^t A (M t)
= (s^t M^t) A (M t)
= s^t (M^t A M) t.

Applying Proposition 4.1.100 in the basis C (take u = c_i, v = c_j), it follows that f(c_i, c_j) = (M^t A M)_{i,j}. Hence

[f]_C = M^t A M = M^t [f]_B M.  □
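The change of basis formula can likewise be verified numerically. A sketch, with an arbitrarily chosen form and transition matrix (all names illustrative):

```python
import numpy as np

# [f]_B: matrix of a bilinear form in the basis B (arbitrary choice)
FB = np.array([[2., 1.],
               [1., 3.]])

# Transition matrix M from B to C: column i holds the B-coordinates of c_i
M = np.array([[1., 1.],
              [0., 2.]])

FC = M.T @ FB @ M  # change of basis formula [f]_C = M^t [f]_B M

# Sanity check: the (i, j) entry of [f]_C is f(c_i, c_j),
# computed here in B-coordinates as c_i^t [f]_B c_j.
for i in range(2):
    for j in range(2):
        ci, cj = M[:, i], M[:, j]
        assert np.isclose(FC[i, j], ci @ FB @ cj)
```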


Lecture 19

4.2 Symmetric bilinear forms and quadratic forms

As before let V be a finite dimensional vector space over a field k.

4.2.102 Definition A bilinear form f on V is called symmetric if it satisfies f(v, w) = f(w, v) for all v, w ∈ V .

4.2.103 Definition Given a symmetric bilinear form f on V , the associated quadratic form is the function q(v) = f(v, v).

4.2.104 Proposition Let f be a bilinear form on V and let B be a basis for V . Then f is a symmetric bilinear form if and only if [f]_B is a symmetric matrix.

Proof. This is an easy exercise. □

4.2.105 Polarization Theorem If 1 + 1 ≠ 0 in k then for any quadratic form q the underlying symmetric bilinear form f is unique.

Proof. If u, v ∈ V then

q(u + v) = f(u + v, u + v)
= f(u, u) + 2f(u, v) + f(v, v)
= q(u) + q(v) + 2f(u, v).

So f(u, v) = ½ (q(u + v) − q(u) − q(v)), which expresses f in terms of q.  □
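The polarization identity can be checked computationally. A sketch (the symmetric matrix and vectors are arbitrary illustrative choices):

```python
import numpy as np

A = np.array([[1., 4.],
              [4., -2.]])        # symmetric matrix of f (arbitrary)

def q(v):
    # associated quadratic form q(v) = f(v, v)
    return v @ A @ v

def f_recovered(u, v):
    # Polarization: f(u, v) = (q(u + v) - q(u) - q(v)) / 2
    return (q(u + v) - q(u) - q(v)) / 2

u = np.array([1., 2.])
v = np.array([3., -1.])

# The recovered value agrees with f(u, v) = u^t A v
assert np.isclose(f_recovered(u, v), u @ A @ v)
```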

4.2.106 Example Correspondence between f and q, and symmetric matrices.

4.3 Orthogonality and diagonalization

4.3.107 Definition Let V be a vector space over k with a symmetric bilinear form f . We call two vectors v, w ∈ V orthogonal if f(v, w) = 0. It is a good idea to imagine this means that v and w are at right angles to each other. This is written v ⊥ w. If S ⊂ V then the orthogonal complement of S is defined to be

S⊥ = {v ∈ V : ∀w ∈ S, w ⊥ v}.

4.3.108 Proposition S⊥ is a subspace of V .

Proof. Let v, w ∈ S⊥ and λ ∈ k. Then for any u ∈ S we have

f(v + λw, u) = f(v, u) + λf(w, u) = 0 + λ · 0 = 0.

Therefore v + λw ∈ S⊥. Since also 0 ∈ S⊥, it follows that S⊥ is a subspace. □


4.3.109 Definition A basis B is called an orthogonal basis if any two distinct basis vectors are orthogonal. Thus B is an orthogonal basis if and only if [f]_B is diagonal.

4.3.110 Diagonalization Theorem Let f be a symmetric bilinear form on a finite dimensional vector space V over a field k in which 1 + 1 ≠ 0. Then there is an orthogonal basis B for V ; i.e. a basis such that [f]_B is a diagonal matrix.


Lecture 20

4.3.111 Recall Let U, W be two subspaces of V . The sum of U and W is the subspace

U + W = {u + w : u ∈ U, w ∈ W}.

We call this a direct sum U ⊕ W if U ∩ W = {0}. This is the same as saying that every element of U + W can be written uniquely as u + w with u ∈ U and w ∈ W .

4.3.112 Key Lemma Let v ∈ V and assume that q(v) ≠ 0. Then

V = span{v} ⊕ {v}⊥.

Proof. For w ∈ V , let

w_1 = (f(v, w) / f(v, v)) v,   w_2 = w − (f(v, w) / f(v, v)) v.

(This makes sense since f(v, v) = q(v) ≠ 0.) Clearly w = w_1 + w_2 and w_1 ∈ span{v}. Note also that

f(w_2, v) = f(w − (f(v, w) / f(v, v)) v, v) = f(w, v) − (f(v, w) / f(v, v)) f(v, v) = 0,

using the symmetry of f . Therefore w_2 ∈ {v}⊥. It follows that span{v} + {v}⊥ = V . To prove that the sum is direct, suppose that w ∈ span{v} ∩ {v}⊥. Then w = λv for some λ ∈ k and we have f(w, v) = 0. Hence λf(v, v) = 0. Since f(v, v) ≠ 0 it follows that λ = 0, so w = 0. □

Proof of the theorem. We use induction on dim(V ) = n. If n = 1 then the theorem is true, since any 1 × 1 matrix is diagonal. So suppose the result holds for vector spaces of dimension less than n = dim(V ).

If f(v, v) = 0 for every v ∈ V then, by the Polarization Theorem 4.2.105, for any basis B we have [f]_B = [0], which is diagonal. [This is true since

f(b_i, b_j) = ½ (f(b_i + b_j, b_i + b_j) − f(b_i, b_i) − f(b_j, b_j)) = 0.]

So we can suppose there exists v ∈ V such that f(v, v) ≠ 0. By the Key Lemma we have

V = span{v} ⊕ {v}⊥.

Since span{v} is 1-dimensional, it follows that {v}⊥ is (n − 1)-dimensional. Hence by the inductive hypothesis there is an orthogonal basis {b_1, . . . , b_{n−1}} of {v}⊥ (for the restriction of f ).

Now let B = {b_1, . . . , b_{n−1}, v}. This is a basis for V . Any two of the vectors b_i are orthogonal by definition. Furthermore b_i ∈ {v}⊥, so b_i ⊥ v. Hence B is an orthogonal basis. □

4.4 Examples of Diagonalizing

4.4.113 Definition Two matrices A, B ∈ M_n(k) are congruent if there is an invertible matrix P ∈ GL_n(k) such that

B = P^t A P.

We have shown that if B and C are two bases then for a bilinear form f , the matrices [f]_B and [f]_C are congruent.


4.4.114 Theorem Let A ∈ M_n(k) be symmetric, where k is a field in which 1 + 1 ≠ 0. Then A is congruent to a diagonal matrix.

Proof. This is just the matrix version of the previous theorem. □

We shall next find out how to calculate a diagonal matrix congruent to a given symmetric matrix.

4.4.115 Recall There are three kinds of row operation:

• swap rows i and j;

• multiply row(i) by λ ≠ 0;

• add λ× row(i) to row(j).

To each row operation there is a corresponding elementary matrix E; the matrix E is the result of doing the row operation to I_n. The row operation transforms a matrix A into EA.

We may also define three corresponding column operations:

• swap columns i and j;

• multiply column(i) by λ ≠ 0;

• add λ× column(i) to column(j).

Doing a column operation to A is the same as doing the corresponding row operation to A^t and transposing back. If E is the elementary matrix of the row operation, the column operation therefore transforms A into (EA^t)^t = AE^t.

4.4.116 Definition By a double operation we shall mean a row operation followed by the corresponding column operation.

If E is the corresponding elementary matrix then the double operation transforms a matrix A into EAE^t.

4.4.117 Lemma If we do a double operation to A then we obtain a matrix congruent to A.

Proof. The double operation transforms A into EAE^t = (E^t)^t A E^t, which is congruent to A: take P = E^t in Definition 4.4.113 (E^t is invertible since E is). □


Lecture 21

Recall that symmetric bilinear forms are represented by symmetric matrices. If we change the basis then we obtain a congruent matrix. We've seen that if we do a double operation to a matrix A then we obtain a congruent matrix. This corresponds to the same quadratic form with respect to a different basis. We can always do a sequence of double operations to transform any symmetric matrix into a diagonal matrix.

4.4.118 Example Consider the quadratic form q((x, y)^t) = x^2 + 4xy + 3y^2, with matrix

( 1 2 )      ( 1  2 )      ( 1  0 )
( 2 3 )  →   ( 0 −1 )  →   ( 0 −1 ),

performing the two halves of the double operation "subtract twice 1 from 2" one at a time. This shows that there is a basis B = {b_1, b_2} such that

q(x b_1 + y b_2) = x^2 − y^2.

Lemma 4.4.117 can be used not just to diagonalize, but also to find the matrix P such that P^t A P is diagonal. That is, if we used the double operations corresponding to elementary matrices E_1, E_2, . . . , E_N (in that order), then we can take P = E_1^t E_2^t · · · E_N^t = (E_N · · · E_2 E_1)^t. To compute this efficiently, we can simultaneously perform the corresponding column operations starting from the identity matrix, as we perform the double operations for diagonalizing.
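The procedure can be sketched in code. The following is one possible implementation over R (a sketch, not the notes' verbatim algorithm; it works in floating point and handles a zero pivot as in the proof of the Diagonalization Theorem, using a double swap or a double addition to make the pivot non-zero):

```python
import numpy as np

def congruence_diagonalize(A):
    """Diagonalize a real symmetric matrix by double operations.

    Returns (D, P) with P invertible and P.T @ A @ P = D diagonal.
    """
    A = A.astype(float).copy()
    n = A.shape[0]
    P = np.eye(n)

    def double_add(src, dst, lam):
        # add lam * row(src) to row(dst), then the corresponding column
        # operation; record the column operation in P
        A[dst, :] += lam * A[src, :]
        A[:, dst] += lam * A[:, src]
        P[:, dst] += lam * P[:, src]

    def double_swap(i, j):
        # swap rows i and j, then columns i and j; record in P
        A[[i, j], :] = A[[j, i], :]
        A[:, [i, j]] = A[:, [j, i]]
        P[:, [i, j]] = P[:, [j, i]]

    for k in range(n):
        if A[k, k] == 0:
            # try to bring a non-zero entry onto the diagonal
            for j in range(k + 1, n):
                if A[j, j] != 0:
                    double_swap(k, j)
                    break
            else:
                for j in range(k + 1, n):
                    if A[k, j] != 0:
                        # row/column k += row/column j gives A[k,k] = 2*A[k,j] != 0
                        double_add(j, k, 1.0)
                        break
        if A[k, k] != 0:
            for i in range(k + 1, n):
                double_add(k, i, -A[i, k] / A[k, k])
    return A, P

A = np.array([[1., 2.],
              [2., 3.]])        # the matrix of Example 4.4.118
D, P = congruence_diagonalize(A)
```

Running this on the matrix of Example 4.4.118 recovers the diagonal form diag(1, −1) found there, together with a transition matrix P satisfying P^t A P = D.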

For example, let

A =
( 0 1 0 )
( 1 1 2 )
( 0 2 1 ).

We can use the following sequence of double operations, performing the two halves of each double operation one at a time.

Switch 1 and 2:

( 0 1 0 )     ( 1 0 0 )     ( 1 1 2 )
( 1 1 2 )  →  ( 1 1 2 )  →  ( 1 0 0 )
( 0 2 1 )     ( 2 0 1 )     ( 2 0 1 )

Subtract 1 from 2:

( 1 1 2 )     ( 1  0 2 )     ( 1  0  2 )
( 1 0 0 )  →  ( 1 −1 0 )  →  ( 0 −1 −2 )
( 2 0 1 )     ( 2 −2 1 )     ( 2 −2  1 )

Subtract twice 1 from 3:

( 1  0  2 )     ( 1  0  0 )     ( 1  0  0 )
( 0 −1 −2 )  →  ( 0 −1 −2 )  →  ( 0 −1 −2 )
( 2 −2  1 )     ( 2 −2 −3 )     ( 0 −2 −3 ).

Subtract twice 2 from 3:

( 1  0  0 )     ( 1  0 0 )     ( 1  0 0 )
( 0 −1 −2 )  →  ( 0 −1 0 )  →  ( 0 −1 0 )  =: D
( 0 −2 −3 )     ( 0 −2 1 )     ( 0  0 1 )

To find the transition matrix, we just perform the column operations on the identity matrix:

( 1 0 0 )     ( 0 1 0 )     ( 0  1 0 )     ( 0  1  0 )     ( 0  1 −2 )
( 0 1 0 )  →  ( 1 0 0 )  →  ( 1 −1 0 )  →  ( 1 −1 −2 )  →  ( 1 −1  0 )  =: P
( 0 0 1 )     ( 0 0 1 )     ( 0  0 1 )     ( 0  0  1 )     ( 0  0  1 )

Then we will have

P^t A P = D,

as can be readily verified.
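The claim P^t A P = D can indeed be checked directly; a NumPy sketch using the matrices from the example:

```python
import numpy as np

# The matrices from the worked example above.
A = np.array([[0, 1, 0],
              [1, 1, 2],
              [0, 2, 1]])
P = np.array([[0, 1, -2],
              [1, -1, 0],
              [0, 0, 1]])
D = np.diag([1, -1, 1])

# P^t A P = D, as claimed, and P is invertible (det P = -1)
assert (P.T @ A @ P == D).all()
assert np.isclose(np.linalg.det(P), -1.0)
```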

4.4.119 Example Consider the quadratic form q((x, y)^t) = 4xy + y^2, with matrix

( 0 2 )      ( 2 1 )      ( 1 2 )
( 2 1 )  →   ( 0 2 )  →   ( 2 0 )

   →   ( 1  2 )      ( 1  0 )
       ( 0 −4 )  →   ( 0 −4 )

   →   ( 1  0 )      ( 1  0 )
       ( 0 −2 )  →   ( 0 −1 )

(a double swap of 1 and 2, then the double operation subtracting twice 1 from 2, then the double operation multiplying 2 by ½, each shown one half-step at a time). This shows that there is a basis {b_1, b_2} such that

q(x b_1 + y b_2) = x^2 − y^2.

The last step in the previous example transformed the −4 into a −1. In general, once we have a diagonal matrix we are free to multiply or divide the diagonal entries by squares:

4.4.120 Lemma Let D(λ_1, . . . , λ_n) denote the diagonal matrix with diagonal entries λ_1, . . . , λ_n. For µ_1, . . . , µ_n ∈ k× = k \ {0} and λ_1, . . . , λ_n ∈ k,

D(λ_1, . . . , λ_n) is congruent to D(µ_1^2 λ_1, . . . , µ_n^2 λ_n).

Proof. Since µ_1, . . . , µ_n ∈ k \ {0} we have µ_1 · · · µ_n ≠ 0. So

P = D(µ_1, . . . , µ_n)

is invertible. Then

P^t D(λ_1, . . . , λ_n) P = D(µ_1, . . . , µ_n) D(λ_1, . . . , λ_n) D(µ_1, . . . , µ_n) = D(µ_1^2 λ_1, . . . , µ_n^2 λ_n).  □

4.4.121 Definition Two bilinear forms f, f′ on V are equivalent if they are the same up to a change of basis; i.e. if there are bases B and C of V with [f]_B = [f′]_C.

4.4.122 Definition The rank of a bilinear form f is the rank of [f]_B for any basis B. This is well defined, since congruent matrices have the same rank.

Clearly if f and f′ have different rank then they are not equivalent.


Lecture 22

4.5 Canonical forms over C

4.5.123 Definition Let q be a quadratic form on a vector space V over C, and suppose there is a basis B of V such that

[q]_B =
( I_r  0 )
( 0    0 ).

We call this block matrix, with the r × r identity block I_r in the top left corner and zeros elsewhere, a canonical form of q (over C).

4.5.124 Canonical forms over C Let V be a finite dimensional vector space over C and let q be a quadratic form on V . Then q has exactly one canonical form.

Proof. (Existence) We first choose an orthogonal basis B = {b_1, . . . , b_n}. After reordering the basis we may assume that q(b_1), . . . , q(b_r) ≠ 0 and q(b_{r+1}), . . . , q(b_n) = 0. Since every complex number has a square root in C, we may divide b_i by √q(b_i) if i ≤ r; the resulting basis gives the canonical form.

(Uniqueness) Row and column operations do not change the rank of a matrix. Hence congruent matrices have the same rank, so r is determined by q. □

4.5.125 Corollary Two quadratic forms over C are equivalent iff they have the same canonical form.

4.6 Canonical forms over R

4.6.126 Definition Let q be a quadratic form on a vector space V over R, and suppose there is a basis B of V such that

[q]_B =
( I_r            )
(      −I_s      )
(             0  ),

block diagonal with all other entries zero. We call this matrix a canonical form of q (over R).

4.6.127 Sylvester's Law of Inertia Let V be a finite dimensional vector space over R and let q be a quadratic form on V . Then q has exactly one (real) canonical form.

Proof. (Existence) Let B = {b_1, . . . , b_n} be an orthogonal basis. We can reorder the basis so that

q(b_1), . . . , q(b_r) > 0,   q(b_{r+1}), . . . , q(b_{r+s}) < 0,   q(b_{r+s+1}), . . . , q(b_n) = 0.

Then define a new basis C = {c_1, . . . , c_n} by

c_i = (1 / √|q(b_i)|) b_i   if i ≤ r + s,
c_i = b_i                   if i > r + s.

The matrix of q with respect to C is a canonical form.

(Uniqueness) Suppose we have two bases B and C with (writing diag(I_r, −I_s, 0) for the block diagonal matrix)

[q]_B = diag(I_r, −I_s, 0),   [q]_C = diag(I_{r′}, −I_{s′}, 0).


By comparing the ranks we know that r + s = r′ + s′. It's therefore sufficient to prove that r = r′. Define two subspaces of V by

U = span{b_1, . . . , b_r},   W = span{c_{r′+1}, . . . , c_n}.

If u is a non-zero vector of U then we have u = x_1 b_1 + . . . + x_r b_r with some x_i ≠ 0. Hence

q(u) = x_1^2 + . . . + x_r^2 > 0.

Similarly if w ∈ W then w = y_{r′+1} c_{r′+1} + . . . + y_n c_n, and

q(w) = −y_{r′+1}^2 − . . . − y_{r′+s′}^2 ≤ 0.

It follows that U ∩ W = {0}. Therefore

U + W = U ⊕ W ⊂ V.

From this we have

dim U + dim W ≤ dim V.

Hence

r + (n − r′) ≤ n.

This implies r ≤ r′. A similar argument shows that r′ ≤ r, so we have r = r′. □
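By Sylvester's law the pair (r, s) is an invariant of q. Over R one convenient way to compute it, not the method used in the notes but equivalent to it (an orthogonal diagonalization is in particular a congruence, and by Lemma 4.4.120 only the signs of the diagonal entries matter), is to count the signs of the eigenvalues of a representing symmetric matrix:

```python
import numpy as np

def signature(A, tol=1e-10):
    """Return (r, s) for a real symmetric matrix A: the number of
    positive and negative entries in any congruent diagonal matrix."""
    eigenvalues = np.linalg.eigvalsh(A)   # real, since A is symmetric
    r = int((eigenvalues > tol).sum())
    s = int((eigenvalues < -tol).sum())
    return r, s

# q(x, y) = x^2 + 4xy + 3y^2 from Example 4.4.118: canonical form x^2 - y^2
A = np.array([[1., 2.],
              [2., 3.]])
assert signature(A) == (1, 1)
```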


Lecture 23

Examples of canonical forms over R and C.


Lecture 24

5 Inner Product Spaces

5.1 Geometry of Inner Product Spaces

5.1.128 Definition Let V be a vector space over R and let 〈−,−〉 be a symmetric bilinear form on V . We shall call the form positive definite if for all non-zero vectors v ∈ V we have

〈v, v〉 > 0.

5.1.129 Remark A symmetric bilinear form is positive definite if and only if its canonical form (overR) is In.

Proof. The form with matrix I_n is x_1 y_1 + . . . + x_n y_n, and x_1^2 + . . . + x_n^2 > 0 for any non-zero vector; so a form with canonical form I_n is positive definite. Conversely, suppose the form is positive definite and B is a basis such that the matrix with respect to B is the canonical form. For any basis vector b_i, the diagonal entry satisfies 〈b_i, b_i〉 > 0, and hence 〈b_i, b_i〉 = 1. So the canonical form is I_n. □

5.1.130 Definition Let V be a vector space over C. A Hermitian form on V is a function 〈−,−〉 :V × V → C such that:

• For all u, v, w ∈ V and all λ ∈ C,

〈u + λv, w〉 = 〈u,w〉+ λ〈v, w〉;

• For all u, v ∈ V ,〈u, v〉 = 〈v, u〉.

5.1.131 Example A matrix A ∈ M_n(C) is called a Hermitian matrix if A^t = Ā (equivalently, Ā^t = A). If A is a Hermitian matrix then the following is a Hermitian form on C^n:

〈v, w〉 = v^t A w̄.

In fact every Hermitian form on Cn is one of these.

Note that a Hermitian form is conjugate-linear in the second variable, i.e.

〈u, v + λw〉 = 〈u, v〉 + λ̄〈u, w〉.

Note also that by the second axiom 〈u, u〉 ∈ R.

5.1.132 Definition A Hermitian form is positive definite if for all non-zero vectors v we have

〈v, v〉 > 0.


5.1.133 Definition By an inner product space we shall mean one of the following:

either A finite dimensional vector space V over R with a positive definite symmetric bilinear form;

or A finite dimensional vector space V over C with a positive definite Hermitian form.

We shall often write K to mean the field R or C, depending on which is relevant. On R^n,

〈v, w〉 = Σ_i v_i w_i

defines a positive definite symmetric bilinear form, and on C^n,

〈v, w〉 = Σ_i v_i w̄_i

defines a positive definite Hermitian form. When referring to either of these, we often speak of the standard inner product.

5.1.134 Example Suppose V is a vector space of functions X → R, where X is some set on which we can define integration. Suppose that products of functions in V are integrable. Then we can define

〈f, g〉 = ∫_X f(x) g(x) dx.

This will often be positive definite. For example if X = [0, 1] and V consists of continuous functions then the form is positive definite. Similarly if our functions take values in C then we can define

〈f, g〉 = ∫_X f(x) ḡ(x) dx.

5.1.135 Definition Let V be an inner product space. We define the norm of a vector v ∈ V by

||v|| =√〈v, v〉.

5.1.136 Lemma For λ ∈ K we have λλ̄ = |λ|^2, and for v ∈ V we have ||λv|| = |λ| ||v||.

Proof. Easy. □

5.1.137 Cauchy-Schwarz inequality If V is an inner product space then

∀u, v ∈ V : |〈u, v〉| ≤ ||u|| · ||v||.

Proof. If v = 0 then the result holds, so suppose v ≠ 0. We have for all λ ∈ K,

〈u − λv, u − λv〉 ≥ 0.

Expanding this out we have:

||u||^2 − λ̄〈u, v〉 − λ〈v, u〉 + |λ|^2 ||v||^2 ≥ 0.


Setting λ = 〈u, v〉 / ||v||^2 (so that λ̄ = 〈v, u〉 / ||v||^2) we have:

||u||^2 − (〈v, u〉/||v||^2)〈u, v〉 − (〈u, v〉/||v||^2)〈v, u〉 + (|〈u, v〉|^2/||v||^4) ||v||^2 ≥ 0.

Multiplying by ||v||^2 we get

||u||^2 ||v||^2 − 2|〈u, v〉|^2 + |〈u, v〉|^2 ≥ 0.

Hence

||u||^2 ||v||^2 ≥ |〈u, v〉|^2.

Taking the square root of both sides we get the result. □

5.1.138 Triangle inequality If V is an inner product space with norm || · || then

∀u, v ∈ V : ||u + v|| ≤ ||u|| + ||v||.

Proof. We have

||u + v||^2 = 〈u + v, u + v〉 = ||u||^2 + 2 Re〈u, v〉 + ||v||^2.

Note that for any complex number z, Re z ≤ |z|, where Re z refers to the real part of z. So the last quantity is less than or equal to

||u||^2 + 2|〈u, v〉| + ||v||^2.

So the Cauchy–Schwarz inequality implies that

||u + v||^2 ≤ ||u||^2 + 2||u|| ||v|| + ||v||^2 = (||u|| + ||v||)^2.

Hence

||u + v|| ≤ ||u|| + ||v||.  □

5.1.139 Definition Two vectors v, w in an inner product space are called orthogonal if 〈v, w〉 = 0.

5.1.140 Pythagoras' Theorem If v, w ∈ V , an inner product space, and v and w are orthogonal, then

||v||^2 + ||w||^2 = ||v + w||^2.

Proof. Since

||v + w||^2 = 〈v + w, v + w〉 = ||v||^2 + 2 Re〈v, w〉 + ||w||^2,

we have

||v||^2 + ||w||^2 = ||v + w||^2

if 〈v, w〉 = 0. □


Lecture 25

5.2 Gram–Schmidt Orthogonalization

5.2.141 Definition Let V be an inner product space. We shall call a basis B of V an orthonormal basis if 〈b_i, b_j〉 = δ_{i,j}.

5.2.142 Proposition If B is an orthonormal basis then for v, w ∈ V we have:

〈v, w〉 = Σ_i x_i ȳ_i, where [v]_B = (x_1, . . . , x_n)^t and [w]_B = (y_1, . . . , y_n)^t.

Over K = R this is just 〈v, w〉 = [v]_B^t [w]_B.

Proof. Easy. □

5.2.143 Gram–Schmidt Orthogonalization Let B be any basis. Then the basis C defined by

c_1 = b_1,
c_2 = b_2 − (〈b_2, c_1〉/〈c_1, c_1〉) c_1,
c_3 = b_3 − (〈b_3, c_1〉/〈c_1, c_1〉) c_1 − (〈b_3, c_2〉/〈c_2, c_2〉) c_2,
...
c_n = b_n − Σ_{r=1}^{n−1} (〈b_n, c_r〉/〈c_r, c_r〉) c_r,

is orthogonal. Furthermore the basis D defined by

d_r = (1/||c_r||) c_r

is orthonormal.

Proof. Each b_i is a linear combination of c_1, . . . , c_i , so C spans V and hence C is a basis. It follows also that D is a basis. We'll prove by induction that {c_1, . . . , c_r} is orthogonal. Clearly any single vector forms an orthogonal set. Suppose {c_1, . . . , c_{r−1}} are orthogonal. Then for s < r we have

〈c_r, c_s〉 = 〈b_r, c_s〉 − Σ_{t=1}^{r−1} (〈b_r, c_t〉/〈c_t, c_t〉) 〈c_t, c_s〉.

By the inductive hypothesis the only surviving term in the sum is t = s, so

〈c_r, c_s〉 = 〈b_r, c_s〉 − (〈b_r, c_s〉/〈c_s, c_s〉) 〈c_s, c_s〉 = 〈b_r, c_s〉 − 〈b_r, c_s〉 = 0.

This shows that {c_1, . . . , c_r} is orthogonal. Hence C is an orthogonal basis. It follows easily that D is orthonormal. □
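The Gram–Schmidt formulas translate directly into code. A sketch for the standard inner product on R^n (the starting basis B is an arbitrary illustrative choice):

```python
import numpy as np

def gram_schmidt(basis):
    """Orthonormalize a list of vectors, following the formulas of
    5.2.143, for the standard inner product on R^n."""
    C = []
    for b in basis:
        # c_r = b_r - sum of <b_r, c_t>/<c_t, c_t> * c_t over earlier c_t
        c = b - sum((b @ c0) / (c0 @ c0) * c0 for c0 in C)
        C.append(c)
    # d_r = c_r / ||c_r||  gives the orthonormal basis D
    return [c / np.linalg.norm(c) for c in C]

B = [np.array([1., 1., 0.]),
     np.array([1., 0., 1.]),
     np.array([0., 1., 1.])]
D = gram_schmidt(B)

# D is orthonormal: <d_i, d_j> = delta_ij
G = np.array([[di @ dj for dj in D] for di in D])
assert np.allclose(G, np.eye(3))
```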

5.2.144 Example A few examples.


5.2.145 Fourier Expansion If V is an inner product space with an orthonormal basis B = {b_1, . . . , b_n}, then any v ∈ V can be written as

v = Σ_{i=1}^n 〈v, b_i〉 b_i.

Proof. We have v = Σ_{i=1}^n λ_i b_i and 〈v, b_j〉 = Σ_{i=1}^n λ_i 〈b_i, b_j〉 = λ_j. □

The connection with Fourier expansions is as follows. Let V be the vector space of continuous periodic functions f : R → C with period 1, i.e. f(x + 1) = f(x). This space has the following positive definite Hermitian form:

〈f, g〉 = ∫_0^1 f(x) ḡ(x) dx.

This is not an inner product space as defined above, since it is infinite dimensional. However a lot of the theory still works for infinite dimensional spaces (this is studied more in the course "Functional Analysis"). It turns out that in some sense the functions

e_n(x) = e^{2πinx},  n ∈ Z,

are an orthonormal basis. We can at least check that they are orthonormal:

〈e_n, e_m〉 = ∫_0^1 exp(2πi(n − m)x) dx = 1 if n = m, and 0 otherwise.

Hence any f ∈ V can be expanded in terms of the basis:

f(x) = Σ_{n∈Z} a_n exp(2πinx),

and the coefficients are given by

a_n = 〈f, e_n〉 = ∫_0^1 f(x) exp(−2πinx) dx.

This is the usual Fourier expansion of a periodic function f .
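The coefficients a_n = 〈f, e_n〉 can be approximated numerically. A sketch for f(x) = cos(2πx), whose only non-zero coefficients are a_1 = a_{−1} = ½ (the Riemann-sum quadrature over one period is an assumption of this sketch, not part of the notes):

```python
import numpy as np

# Sample one period of f(x) = cos(2*pi*x) at N equally spaced points;
# for smooth periodic functions this plain Riemann sum is very accurate.
N = 1024
x = np.arange(N) / N
f = np.cos(2 * np.pi * x)

def coefficient(n):
    # a_n = <f, e_n> = integral over [0, 1] of f(x) exp(-2 pi i n x) dx
    return (f * np.exp(-2j * np.pi * n * x)).mean()
```

Here a_1 and a_{−1} come out as ½ and every other coefficient as (numerically) 0, matching cos(2πx) = ½(e^{2πix} + e^{−2πix}).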


Lecture 26

5.2.146 Definition Let S be a subspace of an inner product space V . The orthogonal complement ofS is defined to be

S⊥ = {v ∈ V : ∀w ∈ S 〈v, w〉 = 0}.

5.2.147 Theorem If V is a Euclidean space and W is a subspace of V then

V = W ⊕ W⊥,

and hence any v ∈ V can be written as

v = w + w⊥,

for unique w ∈ W and w⊥ ∈ W⊥.

Proof (sketch). Write n = dim V and let {b_1, . . . , b_r} be a basis for W . One easily shows that W⊥ = {b_1, . . . , b_r}⊥, which is the intersection of the subspaces {b_i}⊥. Each {b_i}⊥ is the kernel of the non-zero linear functional v ↦ 〈v, b_i〉 and so has dimension n − 1; hence the dimension of W⊥ is at least n − r. On the other hand W ∩ W⊥ = {0}, since any v in the intersection satisfies 〈v, v〉 = 0 and so v = 0 by positive definiteness. Therefore dim(W + W⊥) = dim W + dim W⊥ ≥ n, and the result follows. □

Proof. We show first that V = W + W⊥.

Let E = {e_1, . . . , e_n} be an orthonormal basis for V such that {e_1, . . . , e_r} is a basis for W . This can be constructed by Gram–Schmidt orthogonalization (start with a basis of W , extend it to a basis of V , and orthonormalize). If v ∈ V then

v = Σ_{i=1}^r λ_i e_i + Σ_{i=r+1}^n λ_i e_i.

Now

Σ_{i=1}^r λ_i e_i ∈ W.

If w ∈ W then there exist µ_i ∈ R such that

w = Σ_{i=1}^r µ_i e_i.

So

〈w, Σ_{j=r+1}^n λ_j e_j〉 = Σ_{i=1}^r Σ_{j=r+1}^n µ_i λ_j 〈e_i, e_j〉 = 0.

Hence

Σ_{i=r+1}^n λ_i e_i ∈ W⊥.

Therefore

V = W + W⊥.

Next suppose v ∈ W ∩ W⊥. Then 〈v, v〉 = 0 and so v = 0.

Hence V = W ⊕ W⊥ and so any vector v ∈ V can be expressed uniquely as

v = w + w⊥,

where w ∈ W and w⊥ ∈ W⊥. □
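Given an orthonormal basis of W , the decomposition v = w + w⊥ is computable: project onto W using the Fourier Expansion 5.2.145. A sketch with a hypothetical subspace W ⊂ R^3 (the vectors e1, e2 and v are illustrative choices):

```python
import numpy as np

# W = span of an orthonormal set {e1, e2} in R^3 (a hypothetical choice)
e1 = np.array([1., 0., 0.])
e2 = np.array([0., 1., 1.]) / np.sqrt(2)

v = np.array([2., 3., 5.])

# Component in W: sum of <v, e_i> e_i; the remainder lies in W-perp
w = (v @ e1) * e1 + (v @ e2) * e2
w_perp = v - w

assert np.allclose(w + w_perp, v)
assert np.isclose(w_perp @ e1, 0.0) and np.isclose(w_perp @ e2, 0.0)
```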


5.3 Adjoints

5.3.148 Definition An adjoint of a linear map T : V → V is a linear map T ∗ such that 〈T (u), v〉 =〈u, T ∗(v)〉 for all u, v ∈ V .

5.3.149 Existence and uniqueness Every linear map T : V → V has a unique adjoint. If T is represented by A (with respect to an orthonormal basis) then T∗ is represented by Ā^t (which is just A^t when K = R).

Proof. (Existence) Let T∗ be the linear map represented by Ā^t, and write x = [v], y = [w] for coordinates with respect to the orthonormal basis, so that 〈v, w〉 = x^t ȳ. Then

〈Tv, w〉 = (Ax)^t ȳ = x^t A^t ȳ = x^t (Ā^t y)¯ = 〈v, T∗w〉.

(Uniqueness) Let T∗, T′ be two adjoints. Then for all u, v ∈ V ,

〈u, (T∗ − T′)v〉 = 〈Tu, v〉 − 〈Tu, v〉 = 0.

Taking u = (T∗ − T′)v gives ||(T∗ − T′)v|| = 0 for all v, so T∗ = T′. □

5.4 Isometries

5.4.150 Theorem Let α : V → V be a linear map of a Euclidean space V . Then the following are equivalent.

(i) αα∗ = id.

(ii) ∀u, v ∈ V 〈αu, αv〉 = 〈u, v〉. (i.e. α preserves the inner product.)

(iii) ∀v ∈ V ||αv|| = ||v||. (i.e. α preserves the norm.)

5.4.151 Definition If α satisfies any of the above (and so all of them) then α is called an isometry.

Proof. (i) =⇒ (ii): Let u, v ∈ V . Then

〈αu, αv〉 = 〈u, α∗αv〉 = 〈u, v〉,

since α∗ = α^{−1}.

(ii) =⇒ (iii): If v ∈ V then

||αv||^2 = 〈αv, αv〉,

so by (ii)

||αv||^2 = 〈v, v〉 = ||v||^2.

Hence ||αv|| = ||v||, so (iii) holds.

(iii) =⇒ (ii): We just show that the form can be recovered from the norm. We have

2〈u, v〉 = ||u + v||^2 − ||u||^2 − ||v||^2,

so if α preserves the norm then it preserves the right hand side, and hence the inner product.

(ii) =⇒ (i):

〈α∗αu, v〉 = 〈αu, αv〉 = 〈u, v〉

for all u, v ∈ V . Therefore 〈(α∗α − id)u, v〉 = 0 for all v; taking v = (α∗α − id)u gives α∗α = id. Since V is finite dimensional, α is then invertible with α∗ = α^{−1}, so also αα∗ = id. □

For some motivation for these notions, see the note ‘Isometries in 2 and 3 dimensions’ from the coursewebpage.


Lecture 27

If a real n × n matrix A is viewed as a linear map from R^n to R^n and R^n has the standard inner product, then A∗ = A^t, so one of the equivalent conditions for being an isometry says

AAt = id

orAt = A−1.

A matrix satisfying this condition is also called an orthogonal matrix, and the set of them is denoted

On(R).

5.4.152 Proposition (a) If A ∈ O_n(R) then |A| = ±1.
(b) O_n(R) is a subgroup of GL_n(R).

Proof. (a) If A ∈ O_n(R) then A^t = A^{−1}, so

|A| = |A^t| = |A^{−1}| = |A|^{−1}.

Hence |A|^2 = 1 and, since |A| ∈ R, we have |A| = ±1.

(b) Clearly O_n(R) is a subset of GL_n(R), so to show that it is a subgroup it is enough to show that if A, B ∈ O_n(R) then AB^{−1} ∈ O_n(R).

Let A, B ∈ O_n(R). Then

(AB^{−1})^{−1} = BA^{−1} = BA^t = (AB^t)^t = (AB^{−1})^t.

Hence AB^{−1} ∈ O_n(R) and so O_n(R) is a subgroup of GL_n(R). □

5.4.153 Theorem If A ∈ GLn(R) then the following are equivalent.

(i) A ∈ On(R).

(ii) The columns of A form an orthonormal basis for Rn.

(iii) The rows of A form an orthonormal basis for Rn.

Proof. We prove (i) ⇐⇒ (ii) (the proof of (i) ⇐⇒ (iii) is similar, using AA^t). Consider A^t A. If A = [a_1, . . . , a_n], so the jth column of A is a_j, then the (i, j)th entry of A^t A is a_i^t a_j. So

A^t A = I_n ⇐⇒ a_i^t a_j = δ_{i,j} ⇐⇒ 〈a_i, a_j〉 = δ_{i,j} ⇐⇒ {a_1, . . . , a_n} is an orthonormal basis for R^n. □

The preceding theorem explains the terminology ‘orthogonal matrix,’ although ‘orthonormal matrix’may have been more accurate. In any case, even a general isometry on an abstract real inner productspace is sometimes referred to as an orthogonal transformation by extension from the matrix case.

5.4.154 Theorem Let V be a Euclidean space with orthonormal basis E = {e1, . . . , en}. If F ={f1, . . . , fn} is a basis for V and P is the transition matrix from E to F , then

P ∈ On(R) ⇐⇒ F is an orthonormal basis for V .


Proof. The jth column of P is [f_j]_E , so

f_j = Σ_{k=1}^n p_{k,j} e_k.

Hence

〈f_i, f_j〉 = 〈Σ_{k=1}^n p_{k,i} e_k, Σ_{l=1}^n p_{l,j} e_l〉 = Σ_{k=1}^n Σ_{l=1}^n p_{k,i} p_{l,j} 〈e_k, e_l〉 = Σ_{k=1}^n p_{k,i} p_{k,j} = (P^t P)_{i,j}.

So F is an orthonormal basis for V ⇐⇒ 〈f_i, f_j〉 = δ_{i,j} ⇐⇒ P^t P = I_n ⇐⇒ P ∈ O_n(R). □

5.4.155 Lemma Suppose α : V → V is a linear map on a vector space V over a field k, and E , F are bases for V . If

A = [α]_E and B = [α]_F ,

then c_A(x) = c_B(x); i.e. the characteristic polynomial does not depend on the choice of basis.

Proof. If P is the transition matrix from E to F then B = P^{−1}AP . So

c_B(x) = |xI − B| = |xP^{−1}P − P^{−1}AP| = |P^{−1}(xI − A)P| = |P^{−1}| |xI − A| |P| = c_A(x).  □


Lecture 28

5.5 Orthogonal Diagonalization

5.5.156 Definition (self-adjoint) A map

α : V → V

from a finite-dimensional inner product space (V, 〈·, ·〉) to itself is self-adjoint if α∗ = α. Notice that the definition of α∗ explicitly requires the inner product. Hence, so does the notion of a self-adjoint map.

5.5.157 Theorem If A ∈ Mn(C) is Hermitian then all the eigenvalues of A are real.

5.5.158 Fundamental Theorem of Algebra If f ∈ C[x] has degree n then f is a product of n linearfactors.

Proof of Theorem 5.5.157. We use the following facts about complex conjugation: for a, b ∈ C we have ā b̄ = (ab)¯, ā + b̄ = (a + b)¯ and a ā = |a|^2 ∈ R. If A = (a_{i,j}) ∈ M_n(C) is a matrix then Ā denotes the matrix (ā_{i,j}).

Consider c_A(x) ∈ C[x]. By the Fundamental Theorem of Algebra (which you prove using analysis) c_A(x) is a product of n linear factors,

c_A(x) = (x − λ_1) · · · (x − λ_n),

with λ_i ∈ C for 1 ≤ i ≤ n.

Suppose that λ is a root of c_A(x) = 0, i.e. λ is an eigenvalue of A. So there exists x ∈ C^n, x ≠ 0, such that Ax = λx. Taking complex conjugates,

Āx̄ = λ̄x̄.

So

x^t Āx̄ = x^t λ̄x̄ = λ̄ Σ_{i=1}^n x_i x̄_i = λ̄ Σ_{i=1}^n |x_i|^2.

Since this is simply a number (and so equal to its transpose), and since A is Hermitian, i.e. Ā^t = A,

λ̄ Σ_{i=1}^n |x_i|^2 = (x^t Āx̄)^t = x̄^t Ā^t x = x̄^t Ax = x̄^t λx = λ Σ_{i=1}^n |x_i|^2.

So

(λ − λ̄) Σ_{i=1}^n |x_i|^2 = 0.

But x ≠ 0 so

Σ_{i=1}^n |x_i|^2 ≠ 0.

Hence λ = λ̄, so λ ∈ R. □
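Theorem 5.5.157 can be illustrated numerically. A sketch with an arbitrarily chosen Hermitian matrix (`eigvals` makes no symmetry assumption, yet the computed eigenvalues come out real up to rounding):

```python
import numpy as np

# A Hermitian matrix: A^t equals the complex conjugate of A (entries arbitrary)
A = np.array([[2.0, 1 + 1j],
              [1 - 1j, 3.0]])
assert np.allclose(A, A.conj().T)          # check A is Hermitian

eigenvalues = np.linalg.eigvals(A)
assert np.allclose(eigenvalues.imag, 0.0)  # all eigenvalues are real
```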

Remark: For another, more abstract, proof that only requires the notion of a self-adjoint map, lookat the ‘Supplementary note on self-adjoint maps.’


5.5.159 Spectral theorem Let α : V → V be a self-adjoint linear map of a Euclidean space V . ThenV has an orthonormal basis of eigenvectors.

Proof. See the article ‘The Spectral Theorem’ on the course webpage. 2

5.5.160 Theorem Let α : V → V be a self-adjoint linear map of a Euclidean space V . If λ, µ are distinct eigenvalues of α, with eigenspaces V_λ and V_µ, then

∀u ∈ V_λ ∀v ∈ V_µ : 〈u, v〉 = 0.

Proof. If u ∈ V_λ and v ∈ V_µ then

λ〈u, v〉 = 〈λu, v〉 = 〈αu, v〉 = 〈u, α∗v〉 = 〈u, αv〉 = 〈u, µv〉 = µ〈u, v〉.

So (λ − µ)〈u, v〉 = 0, with λ ≠ µ. Hence 〈u, v〉 = 0. □


Lecture 29

5.5.161 Example Examples of orthogonal diagonalization.
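One concrete instance, as a NumPy sketch (the symmetric matrix is an arbitrary choice; `eigh` returns exactly the data promised by the Spectral Theorem, namely an orthonormal basis of eigenvectors):

```python
import numpy as np

# A symmetric matrix, hence self-adjoint w.r.t. the standard inner product
A = np.array([[2., 1.],
              [1., 2.]])

# eigh returns the eigenvalues and an orthonormal eigenbasis (as columns of Q)
eigenvalues, Q = np.linalg.eigh(A)

assert np.allclose(Q.T @ Q, np.eye(2))                 # Q is orthogonal
assert np.allclose(Q.T @ A @ Q, np.diag(eigenvalues))  # Q^t A Q is diagonal
```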

See the ‘Supplementary note on self-adjoint maps.’
