
MA251 Algebra I – Advanced Linear Algebra

Daan Krammer

November 27, 2014

Contents

1 Review of Some Linear Algebra
  1.1 The matrix of a linear map with respect to two bases
  1.2 The column vector of a vector with respect to a basis
  1.3 Change of basis

2 The Jordan Canonical Form
  2.1 Introduction
  2.2 The Cayley-Hamilton theorem
  2.3 The minimal polynomial
  2.4 Jordan chains and Jordan blocks
  2.5 Jordan bases and the Jordan canonical form
  2.6 The JCF when n = 2 and 3
  2.7 The JCF for general n
  2.8 Proof of theorem 26 (non-examinable)
  2.9 Examples
  2.10 Powers of matrices
  2.11 Applications to difference equations
  2.12 Functions of matrices
  2.13 Applications to differential equations

3 Bilinear Maps and Quadratic Forms
  3.1 Bilinear maps: definitions
  3.2 Bilinear maps: change of basis
  3.3 Quadratic forms: introduction
  3.4 Quadratic forms: definitions
  3.5 Change of variable under the general linear group
  3.6 Change of variable under the orthogonal group
  3.7 Applications of quadratic forms to geometry
    3.7.1 Reduction of the general second degree equation
    3.7.2 The case n = 2
    3.7.3 The case n = 3
  3.8 Unitary, hermitian and normal matrices
  3.9 Applications to quantum mechanics (non-examinable)

4 Finitely Generated Abelian Groups
  4.1 Definitions
  4.2 Subgroups
  4.3 Cosets and quotient groups
  4.4 Homomorphisms and the first isomorphism theorem
  4.5 Free abelian groups
  4.6 Unimodular elementary row and column operations and the Smith normal form for integral matrices
  4.7 Subgroups of free abelian groups
  4.8 General finitely generated abelian groups
  4.9 Finite abelian groups
  4.10 Tensor products
  4.11 Hilbert's Third problem
  4.12 Possible topics for the second year essays

1 Review of Some Linear Algebra

Students will need to be familiar with the whole of the contents of the first year Linear Algebra module (MA106). In this section, we shall review the material on matrices of linear maps and change of basis. Other material will be reviewed as it arises.

Throughout these lecture notes, $K$ is a field and all vector spaces and linear maps are over $K$. If $K$ is not allowed to be an arbitrary field we will say so.

1.1 The matrix of a linear map with respect to two bases

Let $T : V \to W$ be a linear map. Let $E = (e_1, \dots, e_n)$ be a basis of $V$ and $F = (f_1, \dots, f_m)$ of $W$. It is known that there exist unique scalars $a_{ij}$ for $1 \le j \le n$ and $1 \le i \le m$ such that
$$T(e_j) = \sum_{i=1}^{m} a_{ij} f_i$$
for all $j$. The matrix
$$(a_{ij})_{ij} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$
is written $[F, T, E]$ or $[FTE]$ and called the matrix of $T$ with respect to $E$ and $F$.

The set (or vector space) of linear maps $V \to W$ is written $\mathrm{Hom}(V, W)$. For fixed $V, W, E, F$ as above, the association $T \mapsto [FTE]$ defines a bijective map $\mathrm{Hom}(V, W) \to K^{m,n}$.

Theorem 1. Let $S : U \to V$ and $T : V \to W$ be linear maps. Let $E, F, G$ be bases of $U, V, W$ respectively. Then
$$[G, TS, E] = [G, T, F][F, S, E]. \qquad \Box$$
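Theorem 1 is easy to check by machine. Below is a minimal sketch (in Python, assuming the sympy library; the maps and bases are made-up examples, not taken from these notes). A basis is stored as the matrix having the basis vectors as columns, so that $[F, T, E] = F^{-1}ME$, where $M$ is the matrix of $T$ with respect to the standard bases.

```python
# Sanity check of theorem 1 (a sketch, assuming sympy).
from sympy import Matrix

def matrix_wrt(M, F, E):
    """[F, T, E] for the map T whose standard matrix is M."""
    return F.inv() * M * E

M_S = Matrix([[1, 2], [0, 1], [1, 1]])         # S : U -> V (dim U = 2, dim V = 3)
M_T = Matrix([[1, 0, 1], [2, 1, 0]])           # T : V -> W (dim W = 2)
E = Matrix([[1, 1], [0, 1]])                   # a basis of U (columns)
F = Matrix([[1, 0, 1], [0, 1, 1], [0, 0, 1]])  # a basis of V
G = Matrix([[2, 1], [1, 1]])                   # a basis of W

lhs = matrix_wrt(M_T * M_S, G, E)                      # [G, TS, E]
rhs = matrix_wrt(M_T, G, F) * matrix_wrt(M_S, F, E)    # [G,T,F][F,S,E]
assert lhs == rhs
```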

1.2 The column vector of a vector with respect to a basis

Let $E = (e_1, \dots, e_n)$ be a basis of a vector space $V$ and let $v \in V$. It is known that there are unique scalars $a_i$ for $1 \le i \le n$ such that
$$v = \sum_{i=1}^{n} a_i e_i.$$
The column vector
$$\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \tag{2}$$
is written $[E, v]$ or $[Ev]$ and is called the column vector of $v$ with respect to $E$. The $a_i$ are known as the coordinates of $v$ with respect to $E$.

For typographical reasons we often denote the column vector (2) by $(a_1, \dots, a_n)^T$ ($T$ denotes transpose).

For fixed $V, E$ as above, the association $v \mapsto [E, v]$ defines a bijective map $V \to K^{n,1}$.

Theorem 3. Let $T : V \to W$ be a linear map and let $v \in V$. Let $E, F$ be bases of $V, W$ respectively. Then
$$[F, T(v)] = [F, T, E][E, v]. \qquad \Box$$

If $E = (e_1, \dots, e_n)$ is the standard basis of $K^n$ and $v \in K^n$ then $[E, v] = v$. If $F = (f_1, \dots, f_n)$ is another basis of $K^n$ then $[E, 1, F]\,e_i = f_i$ for all $i$ because
$$[E, 1, F]\,e_i = [E, 1, F][F, f_i] = [E, f_i] = f_i.$$

1.3 Change of basis

Let $U$ be a vector space. The identity map $U \to U$ is defined by $x \mapsto x$ and is variously written $I$, $\mathrm{id}$, $1$, $I_U$, $\mathrm{id}_U$, $1_U$.

An easy consequence of theorem 3 is:

Corollary 4. Let $T : V \to W$ be a linear map. Let $E_1, E_2, F_1, F_2$ be bases of $V, V, W, W$ respectively. Then
$$[F_2, T, E_2] = [F_2, 1, F_1][F_1, T, E_1][E_1, 1, E_2]. \qquad \Box$$

The foregoing corollary explains why matrices of the form $[F, 1, E]$ are called change of basis matrices.

Definition 5. Two matrices $A, B \in K^{m,n}$ are said to be equivalent if there exist invertible matrices $P \in K^{m,m}$, $Q \in K^{n,n}$ such that $B = PAQ$.

Theorem 6. Let $A, B \in K^{m,n}$. Then the following are equivalent:

(a) The matrices $A, B$ are equivalent.

(b) The matrices $A, B$ represent the same linear map with respect to possibly different bases.

(c) The matrices $A, B$ have the same rank. $\Box$

2 The Jordan Canonical Form

2.1 Introduction

Throughout this section $V$ will be a vector space of dimension $n$ over a field $K$, $T : V \to V$ will be a linear operator¹, and $A$ will be the matrix of $T$ with respect to a fixed basis $e_1, \dots, e_n$ of $V$. Our aim is to find a new basis $e'_1, \dots, e'_n$ for $V$ such that the matrix of $T$ with respect to the new basis is as simple as possible. Equivalently (by corollary 4), we want to find an invertible matrix $P$ (the associated basis change matrix) such that $P^{-1}AP$ is as simple as possible.

¹ i.e. a linear map from a space to itself.

Our preferred form of matrix is a diagonal matrix, but we saw in MA106 that the matrix $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, for example, is not similar to a diagonal matrix. We shall generally assume that $K = C$. This is to ensure that the characteristic polynomial of $A$ factorises into linear factors. Under this assumption, it can be proved that $A$ is always similar to a matrix $B = (\beta_{ij})$ of a certain type (called the Jordan canonical form or sometimes Jordan normal form of the matrix), which is not far off being diagonal. In fact $\beta_{ij}$ is zero except when $j = i$ or $j = i+1$, and $\beta_{i,i+1}$ is either 0 or 1.

We start by summarising some definitions and results from MA106. We shall use 0 both for the zero vector in $V$ and the zero $n \times n$ matrix. The zero linear operator $0_V : V \to V$ corresponds to the zero matrix 0, and the identity linear operator $I_V : V \to V$ corresponds to the identity $n \times n$ matrix $I_n$.

Because of the correspondence between linear maps and matrices, which respects addition and multiplication, all statements about $A$ can be rephrased as equivalent statements about $T$. For example, if $p(x)$ is a polynomial in a variable $x$, then $p(A) = 0 \iff p(T) = 0_V$.

If $Tv = \lambda v$ for $\lambda \in K$ and $0 \ne v \in V$, or equivalently, if $Av = \lambda v$, then $\lambda$ is an eigenvalue, and $v$ a corresponding eigenvector, of $T$ and $A$. The eigenvalues can be computed as the roots of the characteristic polynomial $c_A(x) = \det(A - xI_n)$ of $A$.

The eigenvectors corresponding to $\lambda$ are the non-zero elements in the nullspace (= kernel) of the linear operator $T - \lambda I_V$. This nullspace is called the eigenspace of $T$ with respect to the eigenvalue $\lambda$. In other words, the eigenspace is equal to $\{v \in V \mid T(v) = \lambda v\}$, which is equal to the set of eigenvectors together with 0.

The dimension of the eigenspace, which is called the nullity of $T - \lambda I_V$, is therefore equal to the number of linearly independent eigenvectors corresponding to $\lambda$. This number plays an important role in the theory of the Jordan canonical form. From the Dimension Theorem, proved in MA106, we know that
$$\mathrm{rank}(T - \lambda I_V) + \mathrm{nullity}(T - \lambda I_V) = n,$$
where $\mathrm{rank}(T - \lambda I_V)$ is equal to the dimension of the image of $T - \lambda I_V$.

For the sake of completeness, we shall now repeat the results proved in MA106 about the diagonalisability of matrices. We shall use the theorem that a set of $n$ linearly independent vectors of $V$ forms a basis of $V$ without further explicit reference.

Theorem 7. Let $T : V \to V$ be a linear operator. Then the matrix of $T$ is diagonal with respect to some basis of $V$ if and only if $V$ has a basis consisting of eigenvectors of $T$.

Proof. Suppose that the matrix $A = (\alpha_{ij})$ of $T$ is diagonal with respect to the basis $e_1, \dots, e_n$ of $V$. Recall that the image of the $i$-th basis vector of $V$ is represented by the $i$-th column of $A$. But since $A$ is diagonal, this column has the single non-zero entry $\alpha_{ii}$. Hence $T(e_i) = \alpha_{ii}e_i$, and so each basis vector $e_i$ is an eigenvector of $A$.

Conversely, suppose that $e_1, \dots, e_n$ is a basis of $V$ consisting entirely of eigenvectors of $T$. Then, for each $i$, we have $T(e_i) = \lambda_i e_i$ for some $\lambda_i \in K$. But then the matrix of $T$ with respect to this basis is the diagonal matrix $A = (\alpha_{ij})$ with $\alpha_{ii} = \lambda_i$ for each $i$. $\Box$

Theorem 8. Let $\lambda_1, \dots, \lambda_r$ be distinct eigenvalues of $T : V \to V$, and let $v_1, \dots, v_r$ be corresponding eigenvectors. (So $T(v_i) = \lambda_i v_i$ for $1 \le i \le r$.) Then $v_1, \dots, v_r$ are linearly independent.

Proof. We prove this by induction on $r$. It is true for $r = 1$, because eigenvectors are non-zero by definition. For $r > 1$, suppose that for some $\alpha_1, \dots, \alpha_r \in K$ we have
$$\alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_r v_r = 0.$$
Then, applying $T$ to this equation gives
$$\alpha_1 \lambda_1 v_1 + \alpha_2 \lambda_2 v_2 + \dots + \alpha_r \lambda_r v_r = 0.$$
Now, subtracting $\lambda_1$ times the first equation from the second gives
$$\alpha_2(\lambda_2 - \lambda_1)v_2 + \dots + \alpha_r(\lambda_r - \lambda_1)v_r = 0.$$
By inductive hypothesis, $v_2, \dots, v_r$ are linearly independent, so $\alpha_i(\lambda_i - \lambda_1) = 0$ for $2 \le i \le r$. But, by assumption, $\lambda_i - \lambda_1 \ne 0$ for $i > 1$, so we must have $\alpha_i = 0$ for $i > 1$. But then $\alpha_1 v_1 = 0$, so $\alpha_1$ is also zero. Thus $\alpha_i = 0$ for all $i$, which proves that $v_1, \dots, v_r$ are linearly independent. $\Box$

Corollary 9. If the linear operator $T : V \to V$ (or equivalently the $n \times n$ matrix $A$) has $n$ distinct eigenvalues, where $n = \dim(V)$, then $T$ (or $A$) is diagonalisable.

Proof. Under the hypothesis, there are $n$ linearly independent eigenvectors, which therefore form a basis of $V$. The result follows from theorem 7. $\Box$


2.2 The Cayley-Hamilton theorem

This theorem says that a matrix satisfies its own characteristic equation. It is easy to visualise with the following "non-proof":
$$c_A(A) = \det(A - AI) = \det(0) = 0.$$
This argument is faulty because you cannot really plug the matrix $A$ into $\det(A - xI)$: you must compute this polynomial first.

Theorem 10 (Cayley-Hamilton). Let $c_A(x)$ be the characteristic polynomial of the $n \times n$ matrix $A$ over an arbitrary field $K$. Then $c_A(A) = 0$.

Proof. Recall from MA106 that, for any $n \times n$ matrix $B$, the adjoint $\mathrm{adj}(B)$ is the $n \times n$ matrix whose $(j,i)$th entry is the cofactor $c_{ij} = (-1)^{i+j}\det(B_{ij})$, where $B_{ij}$ is the matrix obtained from $B$ by deleting the $i$-th row and the $j$-th column of $B$. We proved that $B\,\mathrm{adj}(B) = \det(B)I_n$.

By definition, $c_A(x) = \det(A - xI_n)$, and $(A - xI_n)\,\mathrm{adj}(A - xI_n) = \det(A - xI_n)I_n$. Now $\det(A - xI_n)$ is a polynomial of degree $n$ in $x$; that is, $\det(A - xI_n) = a_0 x^0 + a_1 x^1 + \dots + a_n x^n$, with $a_i \in K$. Similarly, putting $B = A - xI_n$ in the last paragraph, we see that the $(j,i)$-th entry $(-1)^{i+j}\det(B_{ij})$ of $\mathrm{adj}(B)$ is a polynomial of degree at most $n-1$ in $x$. Hence $\mathrm{adj}(A - xI_n)$ is itself a polynomial of degree at most $n-1$ in $x$ in which the coefficients are $n \times n$ matrices over $K$. That is, $\mathrm{adj}(A - xI_n) = B_0 x^0 + B_1 x + \dots + B_{n-1}x^{n-1}$, where each $B_i$ is an $n \times n$ matrix over $K$. So we have
$$(A - xI_n)(B_0 x^0 + B_1 x + \dots + B_{n-1}x^{n-1}) = (a_0 x^0 + a_1 x^1 + \dots + a_n x^n)I_n.$$
Since this is a polynomial identity, we can equate coefficients of the powers of $x$ on the left and right hand sides. In the list of equations below, the equations on the left are the result of equating coefficients of $x^i$ for $0 \le i \le n$, and those on the right are obtained by multiplying the corresponding left hand equation by $A^i$:
$$\begin{aligned}
AB_0 &= a_0 I_n & AB_0 &= a_0 I_n \\
AB_1 - B_0 &= a_1 I_n & A^2B_1 - AB_0 &= a_1 A \\
AB_2 - B_1 &= a_2 I_n & A^3B_2 - A^2B_1 &= a_2 A^2 \\
&\;\;\vdots & &\;\;\vdots \\
AB_{n-1} - B_{n-2} &= a_{n-1} I_n & A^nB_{n-1} - A^{n-1}B_{n-2} &= a_{n-1} A^{n-1} \\
-B_{n-1} &= a_n I_n & -A^nB_{n-1} &= a_n A^n.
\end{aligned}$$
Now summing all of the equations in the right hand column gives
$$0 = a_0 A^0 + a_1 A + \dots + a_{n-1}A^{n-1} + a_n A^n$$
(remember $A^0 = I_n$), which says exactly that $c_A(A) = 0$. $\Box$

By the correspondence between linear maps and matrices, we also have $c_A(T) = 0$.
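The theorem is also easy to test numerically. A minimal sketch, assuming Python with sympy (note that sympy's charpoly uses the convention $\det(xI_n - A)$, which differs from ours by a sign at most, so it also vanishes at $A$):

```python
# Check c_A(A) = 0 for a sample matrix (a sketch, assuming sympy).
from sympy import Matrix, symbols

x = symbols('x')
A = Matrix([[3, -1, 0, 1], [1, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]])
p = A.charpoly(x)            # the characteristic polynomial
coeffs = p.all_coeffs()      # highest degree first
n = len(coeffs) - 1
cA_of_A = sum((c * A**(n - i) for i, c in enumerate(coeffs)),
              Matrix.zeros(4, 4))
assert cA_of_A == Matrix.zeros(4, 4)   # c_A(A) = 0
```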

2.3 The minimal polynomial

We start this section with a brief general discussion of polynomials in a single variable $x$ with coefficients in a field $K$, such as $p = p(x) = 2x^2 - 3x + 11$. The set of all such polynomials is denoted by $K[x]$. There are two binary operations on this set: addition and multiplication of polynomials. These operations turn $K[x]$ into a ring, which will be studied in great detail in Algebra-II.


As a ring, $K[x]$ has a number of properties in common² with the integers $Z$. The notation $a \mid b$ means that $a$ divides $b$. It can be applied to integers (for instance, $3 \mid 12$), and also to polynomials (for instance, $(x-3) \mid (x^2 - 4x + 3)$).

We can divide one polynomial $p$ (with $p \ne 0$) into another polynomial $q$ and get a remainder with degree less than that of $p$. For example, if $q = x^5 - 3$, $p = x^2 + x + 1$, then we find $q = sp + r$ with $s = x^3 - x^2 + 1$ and $r = -x - 4$. For both $Z$ and $K[x]$, this is known as the Euclidean Algorithm.

A polynomial $r$ is said to be a greatest common divisor of $p, q \in K[x]$ if $r \mid p$, $r \mid q$, and, for any polynomial $r'$ with $r' \mid p$, $r' \mid q$, we have $r' \mid r$. Any two polynomials $p, q \in K[x]$ have a greatest common divisor and a least common multiple (which is defined similarly), but these are only determined up to multiplication by a constant. For example, $x - 1$ is a greatest common divisor of $x^2 - 2x + 1$ and $x^2 - 3x + 2$, but so are $1 - x$ and $2x - 2$. To resolve this ambiguity, we make the following definition.

² Technically speaking, they are both Euclidean Domains, an important topic in Algebra-II.

Definition 11. A polynomial with coefficients in a field $K$ is called monic if the coefficient of the highest power of $x$ is 1.

For example, $x^3 - 2x^2 + x + 11$ is monic, but $2x^2 - x - 1$ is not.

Now we can define $\gcd(p, q)$ to be the unique monic greatest common divisor of $p$ and $q$, and similarly for $\mathrm{lcm}(p, q)$.

As with the integers, we can use the Euclidean Algorithm to compute $\gcd(p, q)$. For example, if $p = x^4 - 3x^3 + 2x^2$, $q = x^3 - 2x^2 - x + 2$, then $p = q(x - 1) + r$ with $r = x^2 - 3x + 2$, and $q = r(x + 1)$, so $\gcd(p, q) = r$.
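A computer algebra system performs the same Euclidean Algorithm; a quick check of the example above (a sketch, assuming sympy):

```python
# gcd of the two polynomials above (a sketch, assuming sympy).
from sympy import gcd, factor, symbols

x = symbols('x')
p = x**4 - 3*x**3 + 2*x**2
q = x**3 - 2*x**2 - x + 2
print(gcd(p, q))           # x**2 - 3*x + 2
print(factor(gcd(p, q)))   # (x - 2)*(x - 1)
```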

Theorem 12. Let $A$ be an $n \times n$ matrix over $K$ representing the linear operator $T : V \to V$. The following hold:

(a) There is a unique monic non-zero polynomial $p(x)$ of minimal degree with coefficients in $K$ such that $p(A) = 0$.

(b) If $q(x)$ is any polynomial with $q(A) = 0$, then $p \mid q$.

Proof. (a). If we have any polynomial $p(x)$ with $p(A) = 0$, then we can make $p$ monic by multiplying it by a constant. By theorem 10, there exists such a $p(x)$, namely $c_A(x)$. If we had two distinct monic polynomials $p_1(x), p_2(x)$ of the same minimal degree with $p_1(A) = p_2(A) = 0$, then $p = p_1 - p_2$ would be a non-zero polynomial of smaller degree with $p(A) = 0$, contradicting the minimality of the degree, so $p$ is unique.

(b). Let $p(x)$ be the minimal monic polynomial in (a) and suppose that $q(A) = 0$. As we saw above, we can write $q = sp + r$ where $r$ has smaller degree than $p$. If $r$ is non-zero, then $r(A) = q(A) - s(A)p(A) = 0$, contradicting the minimality of $p$, so $r = 0$ and $p \mid q$. $\Box$

Definition 13. The unique monic polynomial $\mu_A(x)$ of minimal degree with $\mu_A(A) = 0$ is called the minimal polynomial of $A$ or of the corresponding linear operator $T$. (Note that $p(A) = 0 \iff p(T) = 0$ for $p \in K[x]$.)

By theorem 10 and theorem 12(b), we have:

Corollary 14. The minimal polynomial of a square matrix $A$ divides its characteristic polynomial. $\Box$

Similar matrices $A$ and $B$ represent the same linear operator $T$, and so their minimal polynomial is the same as that of $T$. Hence we have:

Proposition 15. Similar matrices have the same minimal polynomial. $\Box$

For a vector $v \in V$, we can also define a relative minimal polynomial $\mu_{A,v}$ as the unique monic polynomial $p$ of minimal degree for which $p(T)(v) = 0_V$. Since $p(T) = 0$ if and only if $p(T)(v) = 0_V$ for all $v \in V$, $\mu_A$ is the least common multiple of the polynomials $\mu_{A,v}$ for all $v \in V$.

But $p(T)(v) = 0_V$ for all $v \in V$ if and only if $p(T)(b_i) = 0_V$ for all $b_i$ in a basis $b_1, \dots, b_n$ of $V$ (exercise), so $\mu_A$ is the least common multiple of the polynomials $\mu_{A,b_i}$.

This gives a method of calculating $\mu_A$. For any $v \in V$, we can compute $\mu_{A,v}$ by calculating the sequence of vectors $v, T(v), T^2(v), T^3(v), \dots$ and stopping when it becomes linearly dependent. In practice, we compute $T(v)$ etc. as $Av$ for the corresponding column vector $v \in K^{n,1}$.

For example, let $K = R$ and
$$A = \begin{pmatrix} 3 & -1 & 0 & 1 \\ 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
Using the standard basis $b_1 = (1\ 0\ 0\ 0)^T$, $b_2 = (0\ 1\ 0\ 0)^T$, $b_3 = (0\ 0\ 1\ 0)^T$, $b_4 = (0\ 0\ 0\ 1)^T$ of $R^{4,1}$, we have:

$Ab_1 = (3\ 1\ 0\ 0)^T$, $A^2b_1 = A(Ab_1) = (8\ 4\ 0\ 0)^T = 4Ab_1 - 4b_1$, so $(A^2 - 4A + 4)b_1 = 0$, and hence $\mu_{A,b_1} = x^2 - 4x + 4 = (x-2)^2$.

$Ab_2 = (-1\ 1\ 0\ 0)^T$, $A^2b_2 = (-4\ 0\ 0\ 0)^T = 4Ab_2 - 4b_2$, so $\mu_{A,b_2} = x^2 - 4x + 4$.

$Ab_3 = b_3$, so $\mu_{A,b_3} = x - 1$.

$Ab_4 = (1\ 1\ 0\ 1)^T$, $A^2b_4 = (3\ 3\ 0\ 1)^T = 3Ab_4 - 2b_4$, so $\mu_{A,b_4} = x^2 - 3x + 2 = (x-2)(x-1)$.

So we have $\mu_A = \mathrm{lcm}(\mu_{A,b_1}, \mu_{A,b_2}, \mu_{A,b_3}, \mu_{A,b_4}) = (x-2)^2(x-1)$.
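The whole procedure is mechanical and easy to program. Below is a sketch (assuming sympy): stack the vectors $v, Av, A^2v, \dots$ as columns and stop at the first linear dependence; the dependence coefficients, made monic, are the coefficients of $\mu_{A,v}$.

```python
# Compute mu_{A,v} and mu_A by the method described above (a sketch,
# assuming sympy).
from functools import reduce
from sympy import Matrix, Poly, factor, lcm, symbols

x = symbols('x')

def relative_min_poly(A, v):
    vecs = [v]
    while True:
        M = Matrix.hstack(*vecs)
        ns = M.nullspace()
        if ns:
            c = ns[0] / ns[0][-1]   # dependence coefficients, made monic
            return Poly(sum(c[i] * x**i for i in range(len(c))), x)
        vecs.append(A * vecs[-1])   # next vector in v, Av, A^2 v, ...

A = Matrix([[3, -1, 0, 1], [1, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]])
mus = [relative_min_poly(A, Matrix.eye(4).col(i)) for i in range(4)]
muA = reduce(lcm, mus)
print(factor(muA.as_expr()))   # (x - 2)**2 * (x - 1)
```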

2.4 Jordan chains and Jordan blocks

The Cayley-Hamilton theorem and the theory of minimal polynomials are valid for any matrix over an arbitrary field $K$, but the theory of Jordan forms requires the additional assumption that the characteristic polynomial $c_A(x)$ is split in $K[x]$, i.e. that it factorises into linear factors. If $K = C$ then all polynomials in $K[x]$ factorise into linear factors by the Fundamental Theorem of Algebra, and the JCF works for any matrix.

Definition 16. A Jordan chain of length $k$ is a sequence of non-zero vectors $v_1, \dots, v_k \in K^{n,1}$ that satisfies
$$Av_1 = \lambda v_1, \qquad Av_i = \lambda v_i + v_{i-1} \ \text{ for } 2 \le i \le k,$$
for some eigenvalue $\lambda$ of $A$.

Exercise 17. Prove that every Jordan chain is linearly independent.

Example 18. Let $V$ be the vector space of functions of the form $f(z)e^{\lambda z}$ where $f(z)$ is a polynomial of degree less than $k$. Consider the derivative, that is, the linear operator $T : V \to V$ given by $T(\varphi(z)) = \varphi'(z)$. The vectors $v_i = z^{i-1}e^{\lambda z}/(i-1)!$ form a Jordan chain for $T$ and a basis of $V$. In particular, the matrix of $T$ in this basis is the Jordan block defined below.

Definition 19. Let $T : V \to V$ be linear, $\lambda \in K$ and $i > 0$. The kernel of the linear map $(T - \lambda I_V)^i$ is called a generalised eigenspace. Likewise for matrices.

Note that $\ker(T - \lambda I_V)$ is an eigenspace; this is the case $i = 1$. The nonzero elements of generalised eigenspaces are called generalised eigenvectors.

Notice that $v \in V$ is an eigenvector with eigenvalue $\lambda$ if and only if $\mu_{A,v} = x - \lambda$. Similarly, generalised eigenvectors are characterised by the property $\mu_{A,v} = (x - \lambda)^i$ for some $\lambda$ and $i$.

Example 20. Let's find the generalised eigenspaces and a Jordan chain of
$$A = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix}.$$
Firstly, $\ker(A - \lambda I_3) = 0$ if $\lambda \ne 3$. Furthermore
$$A - 3I_3 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \quad (A - 3I_3)^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad (A - 3I_3)^3 = 0.$$
So the remaining generalised eigenspaces are
$$\ker(A - 3I_3) = \langle b_1 \rangle, \quad \ker(A - 3I_3)^2 = \langle b_1, b_2 \rangle, \quad \ker(A - 3I_3)^3 = \langle b_1, b_2, b_3 \rangle,$$
where $b_i$ denotes the standard basis vectors.

Also $Ab_1 = 3b_1$, $Ab_2 = 3b_2 + b_1$, $Ab_3 = 3b_3 + b_2$, so $b_1, b_2, b_3$ is a Jordan chain of length 3 for the eigenvalue 3 of $A$. The generalised eigenspaces of index 1, 2, and 3 are respectively $\langle b_1 \rangle$, $\langle b_1, b_2 \rangle$, and $\langle b_1, b_2, b_3 \rangle$.

Notice that the dimension of a generalised eigenspace of $A$ is the nullity of $(T - \lambda I_V)^i$, which is a function of the linear operator $T$ associated with $A$. Since similar matrices represent the same linear operator, we have:

Proposition 21. Let $A, B \in K^{n,n}$ be similar. Then
$$\dim \ker(A - \lambda)^i = \dim \ker(B - \lambda)^i$$
for all $\lambda \in K$ and $i > 0$. In words, the dimensions of the generalised eigenspaces of $A$ and $B$ are equal. $\Box$

Definition 22. We define a Jordan block with eigenvalue $\lambda$ of degree $k$ to be a $k \times k$ matrix $J_{\lambda,k} = (\beta_{ij})$, such that $\beta_{ii} = \lambda$ for $1 \le i \le k$, $\beta_{i,i+1} = 1$ for $1 \le i < k$, and $\beta_{ij} = 0$ if $j$ is not equal to $i$ or $i + 1$.

So, for example,
$$J_{1,2} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad J_{\lambda,3} = \begin{pmatrix} \tfrac{3-i}{2} & 1 & 0 \\ 0 & \tfrac{3-i}{2} & 1 \\ 0 & 0 & \tfrac{3-i}{2} \end{pmatrix}, \quad \text{and} \quad J_{0,4} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
are Jordan blocks, where $\lambda = \tfrac{3-i}{2}$ in the second example.

It should be clear that the matrix of $T$ with respect to the basis $v_1, \dots, v_n$ of $K^{n,1}$ is a Jordan block of degree $n$ if and only if $v_1, \dots, v_n$ is a Jordan chain for $A$.

Note also that for $A = J_{\lambda,k}$ we have $\mu_{A,v_i} = (x - \lambda)^i$, so $\mu_A = (x - \lambda)^k$. Since $J_{\lambda,k}$ is an upper triangular matrix with entries $\lambda$ on the diagonal, we see that the characteristic polynomial $c_A$ of $A$ is also equal to $(\lambda - x)^k$.

Warning: some authors put the 1's below rather than above the main diagonal in a Jordan block. This corresponds to either writing the Jordan chain in the reversed order or using rows instead of columns for the standard vector space. However, if an author does both (uses rows and reverses the order) then the 1's go back above the diagonal.
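A quick machine check of the claim $\mu_A = (x - \lambda)^k$ for a single Jordan block, namely that $(J - \lambda I)^k = 0$ while $(J - \lambda I)^{k-1} \ne 0$ (a sketch, assuming sympy; the eigenvalue and size below are arbitrary choices):

```python
# mu_{J_{lam,k}} = (x - lam)^k, verified for one sample block (sympy sketch).
from sympy import Matrix, Rational, eye, zeros

lam, k = Rational(3, 2), 4
N = Matrix(k, k, lambda i, j: 1 if j == i + 1 else 0)  # 1's above the diagonal
J = lam * eye(k) + N                                   # the Jordan block J_{lam,k}
assert (J - lam * eye(k))**k == zeros(k, k)
assert (J - lam * eye(k))**(k - 1) != zeros(k, k)
```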

2.5 Jordan bases and the Jordan canonical form

Definition 23. We denote the $m \times n$ matrix in which all entries are 0 by $0_{m,n}$. If $A$ is an $m \times m$ matrix and $B$ an $n \times n$ matrix, then we denote the $(m+n) \times (m+n)$ matrix with block form
$$\begin{pmatrix} A & 0_{m,n} \\ 0_{n,m} & B \end{pmatrix}$$
by $A \oplus B$. We call $A \oplus B$ the block sum of $A$ and $B$.

Definition 24. Let $T : V \to V$ be linear. A Jordan basis for $T$ and $V$ is a finite basis $E$ of $V$ such that there exist Jordan blocks $J_1, \dots, J_k$ such that
$$[ETE] = J_1 \oplus \dots \oplus J_k.$$
Likewise for matrices instead of linear maps.

Note that a linear map admits a Jordan basis if and only if it is similar to a block sum of Jordan blocks.

Combining definitions 16, 22 and 24 we find the following explicit characterisation of Jordan bases in terms of Jordan chains.

Proposition 25. Let $T : V \to V$ be linear and $E = (e_1, \dots, e_n)$ a basis of $V$. Then $E$ is a Jordan basis for $T$ and $V$ if and only if there exist $k \ge 1$ and integers $i(0), \dots, i(k)$ such that $0 = i(0) < \dots < i(k) = n$ and
$$\big(e_{i(t)+1}, e_{i(t)+2}, \dots, e_{i(t+1)}\big)$$
is a Jordan chain for all $t$ with $0 \le t < k$. $\Box$

We can now state without proof the main theorem of this section, which says that Jordan bases exist.

Theorem 26. Let $A$ be an $n \times n$ matrix over $K$ such that $c_A(x)$ splits into linear factors in $K[x]$.

(a) Then there exists a Jordan basis for $A$, and hence $A$ is similar to a matrix $J$ which is a block sum of Jordan blocks.

(b) The Jordan blocks occurring in $J$ are uniquely determined by $A$. More precisely, if $A$ is similar to $J_1 \oplus \dots \oplus J_j$ and to $L_1 \oplus \dots \oplus L_\ell$, where the $J_i$ and $L_i$ are (nonempty) Jordan blocks, then $j = \ell$ and there exists a permutation $\sigma$ of $\{1, \dots, j\}$ such that $J_i = L_{\sigma(i)}$ for all $i$.

The matrix $J$ in the theorem is said to be the Jordan canonical form (JCF) or sometimes Jordan normal form of $A$.

We will prove the theorem later. First we derive some consequences and study methods for calculating the JCF of a matrix. As we have discussed before, polynomials over $C$ always split. This gives the following corollary.

Corollary 27. Let $A$ be an $n \times n$ matrix over $C$. Then there exists a Jordan basis for $A$. $\Box$

The proof of the following corollary requires algebraic techniques beyond the scope of this course. You can try to prove it yourself after you have done Algebra-II³. The trick is to find a field extension $F \supseteq K$ such that $c_A(x)$ splits in $F[x]$. For example, consider the rotation by 90 degrees matrix $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. Since $c_A(x) = x^2 + 1$, its eigenvalues are the imaginary numbers $i$ and $-i$. Hence it admits no JCF over $R$, but over the complex numbers it has JCF $\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}$.

³ Or you can take Galois Theory next year and this should become obvious.

Corollary 28. Let $A$ be an $n \times n$ matrix over $K$. Then there exists a field extension $F \supseteq K$ and a Jordan basis for $A$ in $F^{n,1}$. $\Box$

The next two corollaries are immediate⁴ consequences of theorem 26, but they are worth stating because of their computational significance.

⁴ This means I am not proving them here, but I expect you to be able to prove them.

Corollary 29. Let $A$ be an $n \times n$ matrix over $K$ that admits a Jordan basis. If $P$ is the matrix having a Jordan basis as columns, then $P^{-1}AP$ is the JCF of $A$.

Notice that a Jordan basis is not, in general, unique. Thus there exist multiple matrices $P$ such that $J = P^{-1}AP$ is the JCF of $A$.

The final corollary follows from an explicit calculation⁵ for $J$, because the minimal and characteristic polynomials of $J$ and $A$ are the same.

⁵ The characteristic polynomial of $J$ is the product of the characteristic polynomials of the Jordan blocks, and the minimal polynomial of $J$ is the least common multiple of the characteristic polynomials of the Jordan blocks.

Corollary 30. Suppose that the eigenvalues of $A$ are $\lambda_1, \dots, \lambda_t$, and that the Jordan blocks in $J$ for the eigenvalue $\lambda_i$ are $J_{\lambda_i,k_{i,1}}, \dots, J_{\lambda_i,k_{i,j_i}}$, where $k_{i,1} \ge k_{i,2} \ge \dots \ge k_{i,j_i}$. Then the characteristic polynomial is $c_A(x) = \prod_{i=1}^{t} (\lambda_i - x)^{k_i}$, where $k_i = k_{i,1} + \dots + k_{i,j_i}$ for $1 \le i \le t$. The minimal polynomial is $\mu_A(x) = \prod_{i=1}^{t} (x - \lambda_i)^{k_{i,1}}$.

2.6 The JCF when n = 2 and 3

Figure 1: All Jordan matrices of size 2 or 3 up to reordering the blocks. A dot means zero; $\lambda, \mu, \pi \in K$ are distinct. For each Jordan matrix we list $c_A$ and $\mu_A$:

$\begin{pmatrix} \lambda & \cdot \\ \cdot & \mu \end{pmatrix}$ : $c_A = (\lambda - x)(\mu - x)$, $\mu_A = (x - \lambda)(x - \mu)$

$\begin{pmatrix} \lambda & 1 \\ \cdot & \lambda \end{pmatrix}$ : $c_A = (\lambda - x)^2$, $\mu_A = (x - \lambda)^2$

$\begin{pmatrix} \lambda & \cdot \\ \cdot & \lambda \end{pmatrix}$ : $c_A = (\lambda - x)^2$, $\mu_A = (x - \lambda)$

$\begin{pmatrix} \lambda & \cdot & \cdot \\ \cdot & \mu & \cdot \\ \cdot & \cdot & \pi \end{pmatrix}$ : $c_A = (\lambda - x)(\mu - x)(\pi - x)$, $\mu_A = (x - \lambda)(x - \mu)(x - \pi)$

$\begin{pmatrix} \lambda & 1 & \cdot \\ \cdot & \lambda & \cdot \\ \cdot & \cdot & \mu \end{pmatrix}$ : $c_A = (\lambda - x)^2(\mu - x)$, $\mu_A = (x - \lambda)^2(x - \mu)$

$\begin{pmatrix} \lambda & \cdot & \cdot \\ \cdot & \lambda & \cdot \\ \cdot & \cdot & \mu \end{pmatrix}$ : $c_A = (\lambda - x)^2(\mu - x)$, $\mu_A = (x - \lambda)(x - \mu)$

$\begin{pmatrix} \lambda & 1 & \cdot \\ \cdot & \lambda & 1 \\ \cdot & \cdot & \lambda \end{pmatrix}$ : $c_A = (\lambda - x)^3$, $\mu_A = (x - \lambda)^3$

$\begin{pmatrix} \lambda & 1 & \cdot \\ \cdot & \lambda & \cdot \\ \cdot & \cdot & \lambda \end{pmatrix}$ : $c_A = (\lambda - x)^3$, $\mu_A = (x - \lambda)^2$

$\begin{pmatrix} \lambda & \cdot & \cdot \\ \cdot & \lambda & \cdot \\ \cdot & \cdot & \lambda \end{pmatrix}$ : $c_A = (\lambda - x)^3$, $\mu_A = (x - \lambda)$

In figure 1 you can find a full list of the Jordan matrices of size 2 or 3 up to reordering the blocks. For each we give the minimal and characteristic polynomials.

Corollary 31. Let $A \in K^{n,n}$ with $n \in \{2, 3\}$ admit a JCF. Then a JCF of $A$ is determined by the minimal and characteristic polynomials of $A$.

Proof. Immediate from figure 1. $\Box$

In the rest of this section we shall show examples where $A \in K^{n,n}$ is given with $n \in \{2, 3\}$ and one is asked to find a Jordan basis.

Example 32. $A = \begin{pmatrix} 1 & 4 \\ 1 & 1 \end{pmatrix}$. We calculate $c_A(x) = x^2 - 2x - 3 = (x-3)(x+1)$, so there are two distinct eigenvalues, 3 and $-1$. Associated eigenvectors are $(2\ 1)^T$ and $(-2\ 1)^T$, so we put $P = \begin{pmatrix} 2 & -2 \\ 1 & 1 \end{pmatrix}$ and then $P^{-1}AP = \begin{pmatrix} 3 & 0 \\ 0 & -1 \end{pmatrix}$.

If the eigenvalues are equal, then there are two possible JCFs, $J_{\lambda_1,1} \oplus J_{\lambda_1,1}$, which is a scalar matrix, and $J_{\lambda_1,2}$. The minimal polynomial is respectively $(x - \lambda_1)$ and $(x - \lambda_1)^2$ in these two cases. In fact, these cases can be distinguished without any calculation whatsoever, because in the first case $A = PJP^{-1} = J$, so $A$ is its own JCF.

In the second case, a Jordan basis consists of a single Jordan chain of length 2. To find such a chain, let $v_2$ be any vector for which $(A - \lambda_1 I_2)v_2 \ne 0$ and let $v_1 = (A - \lambda_1 I_2)v_2$. (In practice, it is often easier to find the vectors in a Jordan chain in reverse order.)

Example 33. $A = \begin{pmatrix} 1 & 4 \\ -1 & -3 \end{pmatrix}$. We have $c_A(x) = x^2 + 2x + 1 = (x+1)^2$, so there is a single eigenvalue $-1$ with multiplicity 2. Since the first column of $A + I_2$ is non-zero, we can choose $v_2 = (1\ 0)^T$ and $v_1 = (A + I_2)v_2 = (2\ -1)^T$, so $P = \begin{pmatrix} 2 & 1 \\ -1 & 0 \end{pmatrix}$ and $P^{-1}AP = \begin{pmatrix} -1 & 1 \\ 0 & -1 \end{pmatrix}$.

Now let $n = 3$. If there are three distinct eigenvalues, then $A$ is diagonalisable.

Suppose that there are two distinct eigenvalues, so one has multiplicity 2 and the other has multiplicity 1. Let the eigenvalues be $\lambda_1, \lambda_1, \lambda_2$, with $\lambda_1 \ne \lambda_2$. Then there are two possible JCFs for $A$, $J_{\lambda_1,1} \oplus J_{\lambda_1,1} \oplus J_{\lambda_2,1}$ and $J_{\lambda_1,2} \oplus J_{\lambda_2,1}$, and the minimal polynomial is $(x - \lambda_1)(x - \lambda_2)$ in the first case and $(x - \lambda_1)^2(x - \lambda_2)$ in the second.

In the first case, a Jordan basis is a union of three Jordan chains of length 1, each of which consists of an eigenvector of $A$.

Example 34. $A = \begin{pmatrix} 2 & 0 & 0 \\ 1 & 5 & 2 \\ -2 & -6 & -2 \end{pmatrix}$. Then
$$c_A(x) = (2 - x)[(5 - x)(-2 - x) + 12] = (2 - x)(x^2 - 3x + 2) = (2 - x)^2(1 - x).$$
We know from the theory above that the minimal polynomial must be $(x - 2)(x - 1)$ or $(x - 2)^2(x - 1)$. We can decide which simply by calculating $(A - 2I_3)(A - I_3)$ to test whether or not it is 0. We have
$$A - 2I_3 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 3 & 2 \\ -2 & -6 & -4 \end{pmatrix}, \quad A - I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 4 & 2 \\ -2 & -6 & -3 \end{pmatrix},$$
and the product of these two matrices is 0, so $\mu_A = (x - 2)(x - 1)$.

The eigenvectors $v$ for $\lambda_1 = 2$ satisfy $(A - 2I_3)v = 0$, and we must find two linearly independent solutions; for example we can take $v_1 = (0\ 2\ -3)^T$, $v_2 = (1\ -1\ 1)^T$. An eigenvector for the eigenvalue 1 is $v_3 = (0\ 1\ -2)^T$, so we can choose
$$P = \begin{pmatrix} 0 & 1 & 0 \\ 2 & -1 & 1 \\ -3 & 1 & -2 \end{pmatrix}$$
and then $P^{-1}AP$ is diagonal with entries 2, 2, 1.

In the second case, there are two Jordan chains, one for $\lambda_1$ of length 2, and one for $\lambda_2$ of length 1. For the first chain, we need to find a vector $v_2$ with $(A - \lambda_1 I_3)^2 v_2 = 0$ but $(A - \lambda_1 I_3)v_2 \ne 0$, and then the chain is $v_1 = (A - \lambda_1 I_3)v_2$, $v_2$. For the second chain, we simply need an eigenvector for $\lambda_2$.

Example 35. $A = \begin{pmatrix} 3 & 2 & 1 \\ 0 & 3 & 1 \\ -1 & -4 & -1 \end{pmatrix}$. Then
$$c_A(x) = (3 - x)[(3 - x)(-1 - x) + 4] - 2 + (3 - x) = -x^3 + 5x^2 - 8x + 4 = (2 - x)^2(1 - x),$$
$$A - 2I_3 = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ -1 & -4 & -3 \end{pmatrix}, \quad (A - 2I_3)^2 = \begin{pmatrix} 0 & 0 & 0 \\ -1 & -3 & -2 \\ 2 & 6 & 4 \end{pmatrix}, \quad A - I_3 = \begin{pmatrix} 2 & 2 & 1 \\ 0 & 2 & 1 \\ -1 & -4 & -2 \end{pmatrix},$$
and we can check that $(A - 2I_3)(A - I_3)$ is non-zero, so we must have $\mu_A = (x - 2)^2(x - 1)$.

For the Jordan chain of length 2, we need a vector $v_2$ with $(A - 2I_3)^2 v_2 = 0$ but $(A - 2I_3)v_2 \ne 0$, and we can choose $v_2 = (2\ 0\ -1)^T$. Then $v_1 = (A - 2I_3)v_2 = (1\ -1\ 1)^T$. An eigenvector for the eigenvalue 1 is $v_3 = (0\ 1\ -2)^T$, so we can choose
$$P = \begin{pmatrix} 1 & 2 & 0 \\ -1 & 0 & 1 \\ 1 & -1 & -2 \end{pmatrix}$$
and then
$$P^{-1}AP = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Finally, suppose that there is a single eigenvalue $\lambda_1$, so $c_A = (\lambda_1 - x)^3$. There are three possible JCFs for $A$, $J_{\lambda_1,1} \oplus J_{\lambda_1,1} \oplus J_{\lambda_1,1}$, $J_{\lambda_1,2} \oplus J_{\lambda_1,1}$, and $J_{\lambda_1,3}$, and the minimal polynomials in the three cases are $(x - \lambda_1)$, $(x - \lambda_1)^2$, and $(x - \lambda_1)^3$, respectively.

In the first case, $J$ is a scalar matrix, and $A = PJP^{-1} = J$, so this is recognisable immediately.

In the second case, there are two Jordan chains, one of length 2 and one of length 1. For the first, we choose $v_2$ with $(A - \lambda_1 I_3)v_2 \ne 0$, and let $v_1 = (A - \lambda_1 I_3)v_2$. (This case is easier than the case illustrated in example 35, because here we have $(A - \lambda_1 I_3)^2 v = 0$ for all $v \in C^{3,1}$.) For the second Jordan chain, we choose $v_3$ to be an eigenvector for $\lambda_1$ such that $v_1$ and $v_3$ are linearly independent.

Example 36. $A = \begin{pmatrix} 0 & 2 & 1 \\ -1 & -3 & -1 \\ 1 & 2 & 0 \end{pmatrix}$. Then
$$c_A(x) = -x[(3 + x)x + 2] - 2(x + 1) - 2 + (3 + x) = -x^3 - 3x^2 - 3x - 1 = -(1 + x)^3.$$
We have
$$A + I_3 = \begin{pmatrix} 1 & 2 & 1 \\ -1 & -2 & -1 \\ 1 & 2 & 1 \end{pmatrix},$$
and we can check that $(A + I_3)^2 = 0$. The first column of $A + I_3$ is non-zero, so $(A + I_3)(1\ 0\ 0)^T \ne 0$, and we can choose $v_2 = (1\ 0\ 0)^T$ and $v_1 = (A + I_3)v_2 = (1\ -1\ 1)^T$. For $v_3$ we need to choose a vector which is not a multiple of $v_1$ such that $(A + I_3)v_3 = 0$, and we can choose $v_3 = (0\ 1\ -2)^T$. So we have
$$P = \begin{pmatrix} 1 & 1 & 0 \\ -1 & 0 & 1 \\ 1 & 0 & -2 \end{pmatrix}$$
and then
$$P^{-1}AP = \begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.$$

In the third case, there is a single Jordan chain, and we choose $v_3$ such that $(A - \lambda_1 I_3)^2 v_3 \ne 0$, then set $v_2 = (A - \lambda_1 I_3)v_3$ and $v_1 = (A - \lambda_1 I_3)^2 v_3$.

Example 37. $A = \begin{pmatrix} 0 & 1 & 0 \\ -1 & -1 & 1 \\ 1 & 0 & -2 \end{pmatrix}$. Then
$$c_A(x) = -x[(2 + x)(1 + x)] - (2 + x) + 1 = -(1 + x)^3.$$
We have
$$A + I_3 = \begin{pmatrix} 1 & 1 & 0 \\ -1 & 0 & 1 \\ 1 & 0 & -1 \end{pmatrix}, \quad (A + I_3)^2 = \begin{pmatrix} 0 & 1 & 1 \\ 0 & -1 & -1 \\ 0 & 1 & 1 \end{pmatrix},$$
so $(A + I_3)^2 \ne 0$ and $\mu_A = (x + 1)^3$. For $v_3$, we need a vector that is not in the nullspace of $(A + I_3)^2$. Since the second column, which is the image of $(0\ 1\ 0)^T$, is non-zero, we can choose $v_3 = (0\ 1\ 0)^T$, and then $v_2 = (A + I_3)v_3 = (1\ 0\ 0)^T$ and $v_1 = (A + I_3)v_2 = (1\ -1\ 1)^T$. So we have
$$P = \begin{pmatrix} 1 & 1 & 0 \\ -1 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \quad P^{-1}AP = \begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{pmatrix}.$$
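The computation in example 37 can be replayed by machine. A sketch, assuming sympy: pick $v_3$ outside the nullspace of $(A + I_3)^2$, build the Jordan chain, and check $P^{-1}AP$.

```python
# Re-derive example 37 (a sketch, assuming sympy).
from sympy import Matrix, eye

A = Matrix([[0, 1, 0], [-1, -1, 1], [1, 0, -2]])
N = A + eye(3)
v3 = Matrix([0, 1, 0])
assert (N**2 * v3) != Matrix.zeros(3, 1)   # v3 is a generalised eigenvector of index 3
v2 = N * v3                                # (1, 0, 0)^T
v1 = N * v2                                # (1, -1, 1)^T
P = Matrix.hstack(v1, v2, v3)
print(P.inv() * A * P)                     # the Jordan block J_{-1,3}
```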

2.7 The JCF for general n

Reminder on direct sums. Let $W_1, \dots, W_k$ be subspaces of a vector space $V$. We say that $V$ is a direct sum of $W_1, \dots, W_k$ and write
$$V = W_1 \oplus \dots \oplus W_k$$
if the map $\varphi$ from the Cartesian product $W_1 \times \dots \times W_k$ to $V$ defined by $\varphi(x_1, \dots, x_k) = x_1 + \dots + x_k$ is bijective.

Lemma 38. Let $V = W_1 \oplus \dots \oplus W_k$. Let $T : V \to V$ be linear. Assume $T(W_i) \subseteq W_i$ for all $i$ and let $T_i : W_i \to W_i$ be the restriction of $T$ to $W_i$. Then
$$\ker(T) = \ker(T_1) \oplus \dots \oplus \ker(T_k).$$

Proof. Let $\rho : W_1 \times \dots \times W_k \to V$ be defined by addition: $\rho(x_1, \dots, x_k) = x_1 + \dots + x_k$. Let $\sigma$ be the restriction of $\rho$ to $\ker(T_1) \times \dots \times \ker(T_k)$.

Claim 1: $\mathrm{im}(\sigma) \subseteq \ker(T)$. Proof of this. Let $x_i \in \ker(T_i)$ for all $i$ and put $x = (x_1, \dots, x_k)$. Then
$$T\sigma x = T\sigma(x_1, \dots, x_k) = T(x_1 + \dots + x_k) = Tx_1 + \dots + Tx_k = T_1x_1 + \dots + T_kx_k = 0 + \dots + 0 = 0$$
(using, in turn, that $T$ is linear, that $x_i \in W_i$ for all $i$, and that $T_ix_i = 0$ for all $i$), which proves $\sigma(x) \in \ker(T)$ as required.

Claim 2: $\mathrm{im}(\sigma) \supseteq \ker(T)$. Proof of this. Let $y \in \ker(T)$, say $y = \rho(x_1, \dots, x_k)$. Then
$$0 = Ty = T\rho(x_1, \dots, x_k) = T(x_1 + \dots + x_k) = Tx_1 + \dots + Tx_k = T_1x_1 + \dots + T_kx_k = \rho(T_1x_1, \dots, T_kx_k)$$
(using that $T$ is linear, that $x_i \in W_i$ for all $i$, and that $T_ix_i \in W_i$ for all $i$). But $\rho$ is bijective, so $T_ix_i = 0$ for all $i$, that is, $x_i \in \ker(T_i)$. So $y = \sigma(x_1, \dots, x_k)$ as required.

The above claims show that $\sigma : \ker(T_1) \times \dots \times \ker(T_k) \to \ker(T)$ defined by $\sigma(x_1, \dots, x_k) = x_1 + \dots + x_k$ is surjective. It remains to prove that it is injective. Well, it is, because $\rho$ is. The proof is finished. $\Box$

A JCF is determined by the dimensions of the generalised eigenspaces. Recall that the degree of a Jordan block $J_{\lambda,k}$ is $k$.

Theorem 39. Let $T : V \to V$ be linear, admitting a JCF $J$. Let $i > 0$ and $\lambda \in K$. Then the number of Jordan blocks of $J$ with eigenvalue $\lambda$ and degree at least $i$ is equal to
$$\mathrm{nullity}\big((T - \lambda I_V)^i\big) - \mathrm{nullity}\big((T - \lambda I_V)^{i-1}\big).$$

Proof. Let us first prove this if $J$ is a Jordan block, $J = J_{\lambda,n}$. Let $(b_1, \dots, b_n)$ be a Jordan basis for $J$. For all $a_i \in K$,
$$(J - \lambda I_n)^k \Big(\sum_i a_i b_i\Big) = \sum_{i=k+1}^{n} a_i b_{i-k},$$
which shows
$$\ker(J - \lambda I_n)^k = \begin{cases} \mathrm{Span}(b_1, \dots, b_k) & \text{if } 0 \le k \le n; \\ K^n & \text{if } n \le k, \end{cases}$$
so
$$\mathrm{nullity}(J - \lambda I_n)^k = \begin{cases} k & \text{if } 0 \le k \le n; \\ n & \text{if } n \le k. \end{cases}$$
The result for $J = J_{\lambda,n}$ follows.

Knowing the result for Jordan blocks, we deduce the full result as follows. There are subspaces $W_1, \dots, W_k$ such that $V = W_1 \oplus \dots \oplus W_k$ and $T(W_j) \subseteq W_j$ for all $j$ and the restriction $T_j$ of $T$ to $W_j$ is a Jordan block (that is, the matrix of $T_j$ with respect to a suitable basis is a Jordan block). Then

#{Jordan blocks of $T$ of eigenvalue $\lambda$ and degree at least $i$}
$= \sum_j$ #{Jordan blocks of $T_j$ of eigenvalue $\lambda$ and degree at least $i$}
$= \sum_j \big(\mathrm{nullity}(T_j - \lambda I_{W_j})^i - \mathrm{nullity}(T_j - \lambda I_{W_j})^{i-1}\big)$ (by the special case)
$= \big(\sum_j \mathrm{nullity}(T_j - \lambda I_{W_j})^i\big) - \big(\sum_j \mathrm{nullity}(T_j - \lambda I_{W_j})^{i-1}\big)$
$= \mathrm{nullity}(T - \lambda I_V)^i - \mathrm{nullity}(T - \lambda I_V)^{i-1}$ (by lemma 38). $\Box$
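Theorem 39 translates directly into an algorithm for reading off the block structure from ranks. A sketch, assuming sympy (the helper name block_counts is mine):

```python
# Number of Jordan blocks of each degree for a given eigenvalue, via
# theorem 39 (a sketch, assuming sympy).
from sympy import Matrix, eye

def block_counts(A, lam):
    n = A.shape[0]
    M = A - lam * eye(n)
    nullity = [n - (M**i).rank() for i in range(n + 1)]
    # blocks of degree >= i, for i = 1..n:
    at_least = [nullity[i] - nullity[i - 1] for i in range(1, n + 1)]
    # blocks of degree exactly i = index + 1:
    return [at_least[i] - (at_least[i + 1] if i + 1 < n else 0)
            for i in range(n)]

A = Matrix([[-2, 0, 0, 0], [0, -2, 1, 0], [0, 0, -2, 0], [1, 0, -2, -2]])
print(block_counts(A, -2))   # [0, 2, 0, 0]: two blocks J_{-2,2}, as in example 41
```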

2.8 Proof of theorem 26 (non-examinable)

We proceed by induction on $n = \dim(V)$. The case $n = 0$ is clear.

Let $\lambda$ be an eigenvalue of $T$ and let $U = \mathrm{im}(T - \lambda I_V)$ and $m = \dim(U)$. Then $m = \mathrm{rank}(T - \lambda I_V) = n - \mathrm{nullity}(T - \lambda I_V) < n$, because the eigenvectors for $\lambda$ lie in the nullspace of $T - \lambda I_V$.

Note that $T(U) \subseteq U$, because for $u \in U$ we have $u = (T - \lambda I_V)(v)$ for some $v \in V$, and hence $T(u) = T(T - \lambda I_V)(v) = (T - \lambda I_V)T(v) \in U$.

Let $T_U : U \to U$ denote the restriction of $T$ to $U$. We apply the inductive hypothesis to $T_U$ to deduce that $U$ has a Jordan basis $e_1, \dots, e_m$.

We now show how to extend the Jordan basis of $U$ to one of $V$. We do this in two stages. For the first stage, suppose that $\ell$ of the (nonempty) Jordan chains of $T_U$ are for the eigenvalue $\lambda$ (possibly $\ell = 0$). For each such chain $v_1, \dots, v_k$ with $T(v_1) = \lambda v_1$ and $T(v_i) = \lambda v_i + v_{i-1}$ for $2 \le i \le k$, since $v_k \in U = \mathrm{im}(T - \lambda I_V)$, we can find $v_{k+1} \in V$ with $T(v_{k+1}) = \lambda v_{k+1} + v_k$, thereby extending the chain by an extra vector. So far we have adjoined $\ell$ new vectors to the basis, by extending each of the Jordan chains of eigenvalue $\lambda$ by one vector. Let us call these new vectors $w_1, \dots, w_\ell$.

Now for the second stage. Recall that $W := \ker(T - \lambda I_V)$ has dimension $n - m$. We already have $\ell$ vectors in $W$, namely the first vector in each Jordan chain of eigenvalue $\lambda$.

We can adjoin $(n - m) - \ell$ further eigenvectors of $T$ to the $\ell$ that we have already, to complete a basis of $W$. Let us call these $(n - m) - \ell$ new vectors $w_{\ell+1}, \dots, w_{n-m}$. They are adjoined to our basis of $V$ in the second stage. They each form a Jordan chain of length 1, so we now have a collection of $n$ vectors which form a disjoint union of Jordan chains.

To complete the proof, we need to show that these $n$ vectors form a basis of $V$, for which it is enough to show that they are linearly independent.

By lemma 40 we may assume that $T$ has only one eigenvalue $\lambda$.

Suppose that $\alpha_1 w_1 + \dots + \alpha_{n-m} w_{n-m} + x = 0$, where $x \in U$. Applying $T - \lambda I_V$ gives
$$\alpha_1(T - \lambda I_V)(w_1) + \dots + \alpha_\ell(T - \lambda I_V)(w_\ell) + (T - \lambda I_V)(x) = 0.$$
For all $i$ with $1 \le i \le \ell$ then, $(T - \lambda I_V)(w_i)$ is the last member of one of the $\ell$ Jordan chains for $T_U$. Moreover, $(T - \lambda I_V)(x)$ is a linear combination of the basis vectors of $U$ other than the $(T - \lambda I_V)(w_i)$ for $1 \le i \le \ell$. Hence, by linear independence of the basis of $U$, we deduce that $\alpha_i = 0$ for $1 \le i \le \ell$ and $(T - \lambda I_V)(x) = 0$.

So $x \in \ker(T_U - \lambda I_U)$. But, by construction, $w_{\ell+1}, \dots, w_{n-m}$ extend a basis of $\ker(T_U - \lambda I_U)$ to $W = \ker(T - \lambda I_V)$, so we also get $\alpha_i = 0$ for $\ell + 1 \le i \le n - m$, which completes the proof. $\Box$

Lemma 40. Let $T : V \to V$ be linear and $\dim(V) = n < \infty$. Let $\Lambda$ be a set of eigenvalues of $T$. For all $\lambda \in \Lambda$ let $x_\lambda \in V$ be such that $(T - \lambda I_V)^n x_\lambda = 0$. Assume $\sum_\Lambda x_\lambda = 0$. Then $x_\lambda = 0$ for all $\lambda$.

Proof. Induction on $k = \#\Lambda$. For $k = 0$ there is nothing to prove. Assume the result is true for $k - 1$. Pick $\pi \in \Lambda$ and apply $(T - \pi I_V)^n$ to $\sum_\Lambda x_\lambda = 0$. We get
$$\sum_{\Lambda \setminus \{\pi\}} (T - \pi I_V)^n x_\lambda = 0.$$
By the induction hypothesis, $(T - \pi I_V)^n x_\lambda = 0$ for all $\lambda \in \Lambda \setminus \{\pi\}$.

Fix $\lambda \in \Lambda \setminus \{\pi\}$. Let $\mu$ be the unique monic polynomial of least degree such that $\mu(T)x_\lambda = 0$ (the relative minimal polynomial). Then $\mu(x)$ divides both $(x - \pi)^n$ and $(x - \lambda)^n$, and $\pi \ne \lambda$, so $\mu = 1$ and $x_\lambda = 0$. Finally $x_\pi = -\sum_{\Lambda \setminus \{\pi\}} x_\lambda = 0$ as well. $\Box$

2.9 Examples

Example 41. $A = \begin{pmatrix} -2 & 0 & 0 & 0 \\ 0 & -2 & 1 & 0 \\ 0 & 0 & -2 & 0 \\ 1 & 0 & -2 & -2 \end{pmatrix}$.

Then $c_A(x) = (-2 - x)^4$, so there is a single eigenvalue $-2$ with multiplicity 4. We find
$$A + 2I_4 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & -2 & 0 \end{pmatrix},$$
and $(A + 2I_4)^2 = 0$, so $\mu_A = (x + 2)^2$, and the JCF of $A$ could be $J_{-2,2} \oplus J_{-2,2}$ or $J_{-2,2} \oplus J_{-2,1} \oplus J_{-2,1}$.

To decide which case holds, we calculate the nullity of $A + 2I_4$ which, by theorem 39, is equal to the number of Jordan blocks with eigenvalue $-2$. Since $A + 2I_4$ has just two non-zero rows, which are linearly independent, its rank is clearly 2, so its nullity is $4 - 2 = 2$, and hence the JCF of $A$ is $J_{-2,2} \oplus J_{-2,2}$.

A Jordan basis consists of a union of two Jordan chains, which we will call $v_1, v_2$ and $v_3, v_4$, where $v_1$ and $v_3$ are eigenvectors and $v_2$ and $v_4$ are generalised eigenvectors of index 2. To find such chains, it is probably easiest to find $v_2$ and $v_4$ first and then to calculate $v_1 = (A + 2I_4)v_2$ and $v_3 = (A + 2I_4)v_4$.

Although it is not hard to find $v_2$ and $v_4$ in practice, we have to be careful, because they need to be chosen so that no nonzero linear combination of them lies in the nullspace of $A + 2I_4$. In fact, since this nullspace is spanned by the second and fourth standard basis vectors, the obvious choice is $v_2 = (1\ 0\ 0\ 0)^T$, $v_4 = (0\ 0\ 1\ 0)^T$, and then $v_1 = (A + 2I_4)v_2 = (0\ 0\ 0\ 1)^T$, $v_3 = (A + 2I_4)v_4 = (0\ 1\ 0\ -2)^T$, so to transform $A$ to JCF, we put
$$P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & -2 & 0 \end{pmatrix}, \quad P^{-1} = \begin{pmatrix} 0 & 2 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad P^{-1}AP = \begin{pmatrix} -2 & 1 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & -2 & 1 \\ 0 & 0 & 0 & -2 \end{pmatrix}.$$

Example 42. $A = \begin{pmatrix} -1 & -3 & -1 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 3 & 1 & -1 \end{pmatrix}$.

Then $c_A(x) = (-1 - x)^2(2 - x)^2$, so there are two eigenvalues $-1$, 2, both with multiplicity 2. There are four possibilities for the JCF (one or two blocks for each of the two eigenvalues). We could determine the JCF by computing the minimal polynomial $\mu_A$ but it is probably easier to compute the nullities of the eigenspaces and use theorem 39. We have
$$A + I_4 = \begin{pmatrix} 0 & -3 & -1 & 0 \\ 0 & 3 & 1 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 3 & 1 & 0 \end{pmatrix},$$
$$A - 2I_4 = \begin{pmatrix} -3 & -3 & -1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 3 & 1 & -3 \end{pmatrix}, \quad (A - 2I_4)^2 = \begin{pmatrix} 9 & 9 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & -9 & 0 & 9 \end{pmatrix}.$$
The rank of $A + I_4$ is clearly 2, so its nullity is also 2, and hence there are two Jordan blocks with eigenvalue $-1$. The three non-zero rows of $A - 2I_4$ are linearly independent, so its rank is 3, hence its nullity 1, so there is just one Jordan block with eigenvalue 2, and the JCF of $A$ is $J_{-1,1} \oplus J_{-1,1} \oplus J_{2,2}$.

For the two Jordan chains of length 1 for eigenvalue $-1$, we just need two linearly independent eigenvectors, and the obvious choice is $v_1 = (1\ 0\ 0\ 0)^T$, $v_2 = (0\ 0\ 0\ 1)^T$. For the Jordan chain $v_3, v_4$ for eigenvalue 2, we need to choose $v_4$ in the nullspace of $(A - 2I_4)^2$ but not in the nullspace of $A - 2I_4$. (This is why we calculated $(A - 2I_4)^2$.) An obvious choice here is $v_4 = (0\ 0\ 1\ 0)^T$, and then $v_3 = (-1\ 1\ 0\ 1)^T$, and to transform $A$ to JCF, we put
$$P = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}, \quad P^{-1} = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & -1 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad P^{-1}AP = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.$$

2.10 Powers of matrices

The theory we developed can be used to compute powers of matrices efficiently. Suppose we need to compute $A^{2012}$ where
$$A = \begin{pmatrix} -2 & 0 & 0 & 0 \\ 0 & -2 & 1 & 0 \\ 0 & 0 & -2 & 0 \\ 1 & 0 & -2 & -2 \end{pmatrix}$$
is the matrix from example 41.

There are two practical ways of computing $A^n$ for a general matrix. The first one involves Jordan forms. If $J = P^{-1}AP$ is the JCF of $A$ then it is sufficient to compute $J^n$ because of the telescoping product:
$$A^n = (PJP^{-1})^n = P(JP^{-1}P)^{n-1}JP^{-1} = PJ^{n-1}JP^{-1} = PJ^nP^{-1}.$$

If
$$J = \begin{pmatrix} J_{\lambda_1,k_1} & 0 & \cdots & 0 \\ 0 & J_{\lambda_2,k_2} & \cdots & 0 \\ & & \ddots & \\ 0 & 0 & \cdots & J_{\lambda_t,k_t} \end{pmatrix} \quad \text{then} \quad J^n = \begin{pmatrix} J_{\lambda_1,k_1}^n & 0 & \cdots & 0 \\ 0 & J_{\lambda_2,k_2}^n & \cdots & 0 \\ & & \ddots & \\ 0 & 0 & \cdots & J_{\lambda_t,k_t}^n \end{pmatrix}.$$
Finally, the power of an individual Jordan block can be computed as
$$J_{\lambda,k}^n = \begin{pmatrix} \lambda^n & n\lambda^{n-1} & \cdots & C^n_{k-2}\lambda^{n-k+2} & C^n_{k-1}\lambda^{n-k+1} \\ 0 & \lambda^n & \cdots & C^n_{k-3}\lambda^{n-k+3} & C^n_{k-2}\lambda^{n-k+2} \\ & & \ddots & & \vdots \\ 0 & 0 & \cdots & \lambda^n & n\lambda^{n-1} \\ 0 & 0 & \cdots & 0 & \lambda^n \end{pmatrix},$$
where $C^n_t = n!/\big((n-t)!\,t!\big)$ is the choose function, interpreted as $C^n_t = 0$ whenever $t > n$.

Let us apply this to the matrix from example 41:
$$A^n = PJ^nP^{-1} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & -2 & 0 \end{pmatrix} \begin{pmatrix} -2 & 1 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & -2 & 1 \\ 0 & 0 & 0 & -2 \end{pmatrix}^n \begin{pmatrix} 0 & 2 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
$$= \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & -2 & 0 \end{pmatrix} \begin{pmatrix} (-2)^n & n(-2)^{n-1} & 0 & 0 \\ 0 & (-2)^n & 0 & 0 \\ 0 & 0 & (-2)^n & n(-2)^{n-1} \\ 0 & 0 & 0 & (-2)^n \end{pmatrix} \begin{pmatrix} 0 & 2 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
$$= \begin{pmatrix} (-2)^n & 0 & 0 & 0 \\ 0 & (-2)^n & n(-2)^{n-1} & 0 \\ 0 & 0 & (-2)^n & 0 \\ n(-2)^{n-1} & 0 & n(-2)^n & (-2)^n \end{pmatrix}.$$

The second method of computing $A^n$ uses Lagrange's interpolation polynomial. It is less labour-intensive and more suitable for pen-and-paper calculations. Suppose $\psi(A) = 0$ for a polynomial $\psi(z)$. In practice, $\psi(z)$ is either the minimal or the characteristic polynomial. Dividing with remainder, $z^n = q(z)\psi(z) + h(z)$, we conclude that
$$A^n = q(A)\psi(A) + h(A) = h(A).$$
Division with remainder may appear problematic⁶ for large $n$, but there is a shortcut. If we know the roots of $\psi(z)$, say $\alpha_1, \dots, \alpha_k$ with their multiplicities $m_1, \dots, m_k$, then $h(z)$ can be found by solving the system of simultaneous equations in the coefficients of $h(z)$:
$$f^{(t)}(\alpha_j) = h^{(t)}(\alpha_j) \quad \text{whenever } 1 \le j \le k,\ 0 \le t < m_j,$$
where $f(z) = z^n$ and $f^{(t)} = (f^{(t-1)})'$ is the $t$-th derivative. In other words, $h(z)$ is Lagrange's interpolation polynomial for the function $z^n$ at the roots of $\psi(z)$.

⁶ Try to divide $z^{2012}$ by $z^2 + z + 1$ without reading any further.

We know that $\mu_A(z) = (z + 2)^2$ for the matrix $A$ above. Suppose the Lagrange interpolation of $z^n$ at the roots of $(z + 2)^2$ is $h(z) = \alpha z + \beta$. The conditions on the coefficients are given by
$$(-2)^n = h(-2) = -2\alpha + \beta, \qquad n(-2)^{n-1} = h'(-2) = \alpha.$$
Solving them gives $\alpha = n(-2)^{n-1}$ and $\beta = (1 - n)(-2)^n$. It follows that
$$A^n = n(-2)^{n-1}A + (1 - n)(-2)^n I = \begin{pmatrix} (-2)^n & 0 & 0 & 0 \\ 0 & (-2)^n & n(-2)^{n-1} & 0 \\ 0 & 0 & (-2)^n & 0 \\ n(-2)^{n-1} & 0 & n(-2)^n & (-2)^n \end{pmatrix}.$$
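The interpolation formula is easy to check against a direct matrix power. A sketch, assuming sympy:

```python
# Check A^n = n(-2)^{n-1} A + (1-n)(-2)^n I for several n (sympy sketch).
from sympy import Matrix, Integer, eye

A = Matrix([[-2, 0, 0, 0], [0, -2, 1, 0], [0, 0, -2, 0], [1, 0, -2, -2]])
for n in [1, 2, 5, 12]:
    n = Integer(n)
    h_of_A = n * (-2)**(n - 1) * A + (1 - n) * (-2)**n * eye(4)
    assert A**n == h_of_A
```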


2.11 Applications to difference equations

Let us consider an initial value problem for an autonomous system with discrete time:
$$x(n + 1) = A\,x(n), \quad n \in N, \quad x(0) = w.$$
Here $x(n) \in K^m$ is a sequence of vectors in a vector space over a field $K$. One thinks of $x(n)$ as the state of the system at time $n$. The initial state is $x(0) = w$. The $m \times m$ matrix $A$ with coefficients in $K$ describes the evolution of the system. The adjective autonomous means that the evolution equation does not change with time⁷.

⁷ A nonautonomous system would be described by $x(n + 1) = A(n)x(n)$ here.

It takes longer to formulate this problem than to solve it. The solution is a no-brainer:
$$x(n) = Ax(n - 1) = A^2x(n - 2) = \dots = A^nx(0) = A^nw.$$

As a working example, let us consider a 2-step linearly recursive sequence. It is determined by a quadruple $(a, b, c, d) \in K^4$ and the rules
$$s_0 = a, \quad s_1 = b, \quad s_n = cs_{n-1} + ds_{n-2} \ \text{ for } n \ge 2.$$
Such sequences are ubiquitous. Arithmetic sequences form a subclass with $c = 2$, $d = -1$. In general, $(a, b, 2, -1)$ determines the arithmetic sequence starting at $a$ with difference $b - a$. For instance, $(0, 1, 2, -1)$ determines the sequence of natural numbers $s_n = n$.

A geometric sequence starting at $a$ with ratio $q$ admits a non-unique description. One obvious quadruple giving it is $(a, aq, q, 0)$. However, it is conceptually better to use the quadruple $(a, aq, 2q, -q^2)$, because the sequences coming from $(a, b, 2q, -q^2)$ include both arithmetic and geometric sequences and can be called arithmo-geometric sequences.

If $c = d = 1$ then this is a Fibonacci type sequence. For instance, $(0, 1, 1, 1)$ determines the Fibonacci numbers $F_n$ while $(2, 1, 1, 1)$ determines the Lucas numbers $L_n$.

All of these examples admit closed⁸ formulae for a generic term $s_n$. Can we find a closed formula for $s_n$ in general? Yes, we can, because this problem reduces to an initial value problem with discrete time if we set
$$x(n) = \begin{pmatrix} s_n \\ s_{n+1} \end{pmatrix}, \quad w = \begin{pmatrix} a \\ b \end{pmatrix}, \quad A = \begin{pmatrix} 0 & 1 \\ d & c \end{pmatrix}.$$

⁸ Closed means non-recursive; for instance, $s_n = a + n(b - a)$ for the arithmetic sequence.

Computing the characteristic polynomial, $c_A(z) = z^2 - cz - d$. If $c^2 + 4d = 0$, the JCF of $A$ is $J = \begin{pmatrix} c/2 & 1 \\ 0 & c/2 \end{pmatrix}$. Let $q = c/2$. Then $d = -q^2$ and we are dealing with the arithmo-geometric sequence $(a, b, 2q, -q^2)$. Let us find the closed formula for $s_n$ in this case using Jordan forms. As $A = \begin{pmatrix} 0 & 1 \\ -q^2 & 2q \end{pmatrix}$, one can choose the Jordan basis $e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, $e_1 = \begin{pmatrix} 1 \\ q \end{pmatrix}$. If $P = \begin{pmatrix} 1 & 0 \\ q & 1 \end{pmatrix}$ then $P^{-1} = \begin{pmatrix} 1 & 0 \\ -q & 1 \end{pmatrix}$ and
$$A^n = (PJP^{-1})^n = PJ^nP^{-1} = P\begin{pmatrix} q^n & nq^{n-1} \\ 0 & q^n \end{pmatrix}P^{-1} = \begin{pmatrix} (1-n)q^n & nq^{n-1} \\ -nq^{n+1} & (1+n)q^n \end{pmatrix}.$$
This gives the closed formula for the arithmo-geometric sequence we were seeking:
$$s_n = (1-n)q^na + nq^{n-1}b.$$

If $c^2 + 4d \ne 0$, the JCF of $A$ is
$$\begin{pmatrix} (c + \sqrt{c^2 + 4d})/2 & 0 \\ 0 & (c - \sqrt{c^2 + 4d})/2 \end{pmatrix}$$
and the closed formula for $s_n$ will involve the sum of two geometric sequences. Let us see it through for Fibonacci and Lucas numbers using Lagrange's polynomial. Since $c = d = 1$, we have $c^2 + 4d = 5$ and the roots of $c_A(z)$ are the golden ratio $\phi = (1 + \sqrt{5})/2$ and $1 - \phi = (1 - \sqrt{5})/2$. It is useful to observe that $2\phi - 1 = \sqrt{5}$ and $\phi(1 - \phi) = -1$. Let us introduce the numbers $\mu_n = \phi^n - (1 - \phi)^n$. Suppose the Lagrange interpolation of $z^n$ at the roots of $z^2 - z - 1$ is $h(z) = \alpha z + \beta$. The conditions on the coefficients are given by
$$\phi^n = h(\phi) = \alpha\phi + \beta, \qquad (1 - \phi)^n = h(1 - \phi) = \alpha(1 - \phi) + \beta.$$
Solving them gives $\alpha = \mu_n/\sqrt{5}$ and $\beta = \mu_{n-1}/\sqrt{5}$. It follows that
$$A^n = \alpha A + \beta I_2 = \frac{\mu_n}{\sqrt{5}}A + \frac{\mu_{n-1}}{\sqrt{5}}I_2 = \begin{pmatrix} \mu_{n-1}/\sqrt{5} & \mu_n/\sqrt{5} \\ \mu_n/\sqrt{5} & (\mu_n + \mu_{n-1})/\sqrt{5} \end{pmatrix}.$$

Since $\begin{pmatrix} F_n \\ F_{n+1} \end{pmatrix} = A^n\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, it immediately implies that
$$A^n = \begin{pmatrix} F_{n-1} & F_n \\ F_n & F_{n+1} \end{pmatrix} \quad \text{and} \quad F_n = \mu_n/\sqrt{5}.$$
Similarly for the Lucas numbers, we get $\begin{pmatrix} L_n \\ L_{n+1} \end{pmatrix} = A^n\begin{pmatrix} 2 \\ 1 \end{pmatrix}$ and
$$L_n = 2F_{n-1} + F_n = F_{n-1} + F_{n+1} = (\mu_{n-1} + \mu_{n+1})/\sqrt{5}.$$
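Both identities are easy to confirm by machine. A sketch, assuming sympy:

```python
# Check A^n = ((F_{n-1}, F_n), (F_n, F_{n+1})) and Binet's formula
# F_n = mu_n / sqrt(5) for one value of n (a sketch, assuming sympy).
from sympy import Matrix, simplify, sqrt

A = Matrix([[0, 1], [1, 1]])
phi = (1 + sqrt(5)) / 2

F = [0, 1]                      # Fibonacci numbers by the recursion itself
for _ in range(20):
    F.append(F[-1] + F[-2])

n = 10
assert A**n == Matrix([[F[n - 1], F[n]], [F[n], F[n + 1]]])
mu_n = phi**n - (1 - phi)**n
assert simplify(mu_n / sqrt(5)) == F[n]
```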

2.12 Functions of matrices

We restrict to $K = R$ in this section. Let us consider a power series $\sum_n a_nz^n$, $a_n \in R$, with a positive radius of convergence $\varepsilon$. It defines a function $f : (-\varepsilon, \varepsilon) \to R$ by
$$f(x) = \sum_{n=0}^{\infty} a_nx^n$$
and the power series is Taylor's series of $f(x)$ at zero. In particular,
$$a_n = f^{[n]}(0) = f^{(n)}(0)/n!,$$
where $f^{[n]}(z) = f^{(n)}(z)/n!$ is a divided derivative. We extend the function $f(z)$ to matrices by the formula
$$f(A) = \sum_{n=0}^{\infty} f^{[n]}(0)A^n.$$
The right hand side of this formula is a matrix whose entries are series. All these series need to converge for $f(A)$ to be well defined. If the norm⁹ of $A$ is less than $\varepsilon$ then $f(A)$ is well defined. Alternatively, if all eigenvalues of $A$ belong to $(-\varepsilon, \varepsilon)$ then $f(A)$ is well defined, as can be seen from the JCF method of computing $f(A)$. If
$$J = \begin{pmatrix} J_{\lambda_1,k_1} & 0 & \cdots & 0 \\ 0 & J_{\lambda_2,k_2} & \cdots & 0 \\ & & \ddots & \\ 0 & 0 & \cdots & J_{\lambda_t,k_t} \end{pmatrix} = P^{-1}AP$$
is the JCF of $A$ then
$$f(A) = Pf(J)P^{-1} = P\begin{pmatrix} f(J_{\lambda_1,k_1}) & 0 & \cdots & 0 \\ 0 & f(J_{\lambda_2,k_2}) & \cdots & 0 \\ & & \ddots & \\ 0 & 0 & \cdots & f(J_{\lambda_t,k_t}) \end{pmatrix}P^{-1}$$
while
$$f(J_{\lambda,k}) = \begin{pmatrix} f(\lambda) & f^{[1]}(\lambda) & \cdots & f^{[k-1]}(\lambda) \\ 0 & f(\lambda) & \cdots & f^{[k-2]}(\lambda) \\ & & \ddots & \vdots \\ 0 & 0 & \cdots & f(\lambda) \end{pmatrix}.$$

⁹ This notion is beyond the scope of this module and will be discussed in Differentiation.

Lagrange’s method works as well:

Proposition 43. Let an 2 R for n � 0. Suppose that the radius of convergence off (x) = Ân�0

an xn is " > 0.Let A 2 Rk,k and 2 R[x] be such that (A) = 0 and splits in R[x] and all roots

of are in (�",").Let h 2 R[x] (Lagrance interpolation) be the unique polynomial of degree < deg( )

such that h(`)(↵) = f (`)(↵) whenever (x �↵)`+1 | . Then f (A) = h(A).

Proof. In analysis people prove that there exists a unique function q on (�","), givenby a converging Taylor series, such that f (x)� h(x) = (x) q(x). Now set x = A. ⇤

Recall that Taylor’s series for exponent ex = Â1n=0

xn/n! converges for all x.Consequently the matrix exponent eA = Â1

n=0

An/n! is defined for all real m-by-mmatrices A. Let us compute eA for the matrix A from example 41.

Suppose the Lagrange interpolation of ez at the roots of µA(z) = (z + 2)2 ish(z) = ↵z +�. The condition on the coefficients is given by

⇢e�2 = h(�2) = �2↵ +�e�2 = h0(�2) = ↵

Solving them gives ↵ = e�2 and � = 3e�2. It follows that

eA = e�2 A + 3e�2 I =

0

BB@

e�2

0 0 0

0 e�2 e�2

0

0 0 e�2

0

e�2

0 �2e�2 e�2

1

CCA.

2.13 Applications to differential equations

Let us now consider an initial value problem for an autonomous system with contin-uous time:

dx(t)dt

= Ax(t), t 2 [0, 1), x(0) = w.

Here A 2 Rn⇥n, w 2 Rn are given, x : R�0

! Rn is a smooth function to be found.One thinks of x(t) as a state of the system at time t. The solution to this problem is

x(t) = etAw

because, as one can easily check,

ddt(x(t)) = Â

n

ddt

✓tn

n!

Anw◆= Â

n

tn�1

(n � 1)!Anw = A Â

k

tk

k!

Akw = Ax(t).

Example 44. Let us consider a harmonic oscillator described by equation y00(t) +y(t) = 0. The general solution y(t) = ↵ sin(t) + � cos(t) is well known. Let usobtain it using matrix exponents. Setting

x(t) =✓

y(t)y0(t)

◆, A =

✓0 1

�1 0

Page 21: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

November 27, 2014 MA251 Algebra I 21

the harmonic oscillator becomes the initial value problem with a solution x(t) =etAx(0). The eigenvalues of A are i and �i. Interpolating ezt at these values of z givesthe following condition on h(z) = ↵z +�

⇢eit = h(i) = ↵i +�e�it = h(�i) = �↵i +�

Solving them gives ↵ = (eit � e�it)/2i = sin(t) and � = (eit + e�it)/2 = cos(t). Itfollows that

etA = sin(t)A + cos(t)I2

=

✓cos(t) sin(t)� sin(t) cos(t)

and y(t) = cos(t)y(0) + sin(t)y0(0).

Example 45. Let us consider a system of differential equations

8<

:

y01

= y1

� 3y3

y02

= y1

� y2

� 6y3

y03

= �y1

+ 2y2

+ 5y3

with initial condition

8<

:

y1

(0) = 1

y2

(0) = 1

y3

(0) = 0

Using matrices

x(t) =

0

@y

1

(t)y

2

(t)y

3

(t)

1

A, w =

0

@1

1

0

1

A, A =

0

@1 0 �3

1 �1 �6

�1 2 5

1

A,

it becomes an initial value problem. The characteristic polynomial is cA(z) = �z3 +5z2 � 8z + 4 = (1 � z)(2 � z)2. We need to interpolate etz at 1 and 2 by h(z) =↵z2 +�z+�. At the multiple root 2 we need to interpolate up to order 2 that involvestracking the derivative (etz)0 = tetz:

8<

:

et = h(1) = ↵ +�+ �e2t = h(2) = 4↵ + 2�+ �te2t = h0(2) = 4↵ +�

Solving, ↵ = (t � 1)e2t + et, � = (4 � 3t)e2t � 4et, � = (2t � 3)e2t + 4et. It followsthat

etA = e2t

0

@3t � 3 �6t + 6 �9t + 6

3t � 2 �6t + 4 �9t + 3

�t 2t 3t + 1

1

A+ et

0

@4 �6 �6

2 �3 �3

0 0 0

1

A

and

x(t) =

0

@y

1

(t)y

2

(t)y

3

(t)

1

A = etA

0

@1

1

0

1

A =

0

@(3 � 3t)e2t � 2et

(2 � 3t)e2t � et

te2t

1

A.

3 Bilinear Maps and Quadratic Forms

3.1 Bilinear maps: definitions

Definition 46. Let V and W be vector spaces over a field K. A bilinear map on Wand V is a map ⌧ : W ⇥ V ! K such that

(a) ⌧(↵1

w

1

+↵2

w

2

, v) = ↵1

⌧(w1

, v) +↵2

⌧(w2

, v) and

(b) ⌧(w,↵1

v

1

+↵2

v

2

) = ↵1

⌧(w, v

1

) +↵2

⌧(w, v

2

)

for all w, w

1

, w

2

2 W, v, v

1

, v

2

2 V, and ↵1

,↵2

2 K.

Page 22: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

22 MA251 Algebra I November 27, 2014

Notice the difference between linear and bilinear maps. For instance, let V =W = K. Addition is a linear map but not bilinear. On the other hand, multiplicationis bilinear but not linear.

Let us choose a basis E = (e1

, . . . , en) of V and a basis F = ( f

1

, . . . , fm) of W.Let ⌧ : W ⇥ V ! K be a bilinear map, and let ↵i j = ⌧( fi, e j), for 1 i m,

1 j n. The m ⇥ n matrix A = (↵i j) is called the matrix of ⌧ with respect to thebases E and F.

Let v 2 V, w 2 W and write v = x1

e

1

+ · · ·+ xnen and w = y1

f

1

+ · · ·+ ym fm.Recall our notation from section 1.2

[E, v] =

0

B@

x1

...xn

1

CA 2 Kn,1

, and [F, w] =

0

B@

y1

...ym

1

CA 2 Km,1

Then, by using the equations (a) and (b) above, we get

⌧(w, v) =m

Âi=1

n

Âj=1

yi ⌧( fi, e j) xj =m

Âi=1

n

Âj=1

yi↵i j x j = [F, w]TA[E, v] (47)

For example, let V = W = R2 and use the natural basis of V. Suppose thatA =

✓1 �1

2 0

◆. Then

⌧((y1

, y2

), (x1

, x2

)) = (y1

, y2

)✓

1 �1

2 0

◆✓x

1

x2

◆= y

1

x1

� y1

x2

+ 2y2

x1

.

Proposition 48. Let E be a basis of V and F of W. Write n = dim V, m = dim W.Then there is a bijection from the set of bilinear forms on W ⇥ V to Km,n, taking ⌧ to itsmatrix with respect to (E, F).

Proof. Exercise. ⇤

3.2 Bilinear maps: change of basis

Theorem 49. Let A be the matrix of a bilinear map ⌧ : W ⇥ V ! K with respect to thebases E

1

and F1

of V and W, and let B be its matrix with respect to the bases E2

andF

2

of V and W. Consider the basis change matrices P = [E1

, 1, E2

] and Q = [F1

, 1, F2

].Then B = QTAP.

Proof. Recall (CD)T = DT CT whenever C 2 Kp,q and D 2 Kq,r. By (47)

[F2

, y]TB[E2

, x] = ⌧(x, y) = [F1

, x]TA[E1

, y]

=�[F

1

, 1, F2

][F2

, y]�T A [E

1

, 1, E2

][E2

, x]

=�Q [F

2

, y]�T A P [E

2

, x] = [F2

, y]T�QT A P

�[E

2

, x]

for all (x, y) 2 V ⇥ W. By proposition 48 it follows that B = QTAP. ⇤Compare this result with the formula B = QAP of corollary 4.

Definition 50. A bilinear map ⌧ : V ⇥ V ! K (so ‘V = W’) is called a bilinear formon V.

From now on we shall mainly be concerned with bilinear forms rather than bilin-ear maps.

Instead of saying that the matrix of a bilinear form is with respect to (E, E) wejust say that it is with respect to E. Theorem 49 immediately yields:

Theorem 51. Let A be the matrix of a bilinear form ⌧ on V with respect to a basis E1

of V, and let B be its matrix with respect to a basis E2

of V. Consider the change of basismatrix P = [E

1

, 1, E2

]. Then B = PTAP. ⇤

Page 23: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

November 27, 2014 MA251 Algebra I 23

So, in the example at the end of Subsection 3.1, if we choose the new basis e

01

=

(1 �1), e

02

= (1 0) then P =✓

1 1

�1 0

◆, PTAP =

✓0 �1

2 1

◆, and

⌧�

y01

e

01

+ y02

e

02

, x01

e

01

+ x02

e

02

�= �y0

1

x02

+ 2y02

x01

+ y02

x02

.

Definition 52. Two square matrices A and B are called congruent if there exists aninvertible matrix P with B = PTAP.

Compare this with similarity of matrices which is defined by the formula B =P�1 AP.

Definition 53. A bilinear form ⌧ on V is called symmetric if ⌧(w, v) = ⌧(v, w) for allv, w 2 V. An n ⇥ n matrix A is called symmetric if AT = A.

Proposition 54. Let E be a finite basis of a vector space V. Then, a bilinear form ⌧ onV is symmetric if and only if its matrix with respect to E is symmetric.

Proof. Easy exercise. ⇤Example 55. The best known example of a bilinear form is when V = Rn, and ⌧ isdefined by

⌧((x1

, x2

, . . . , xn), (y1

, y2

, . . . , yn)) = x1

y1

+ x2

y2

+ · · ·+ xn yn.

The matrix of this bilinear form with respect to the standard basis of Rn is the iden-tity matrix In. Geometrically, it is equal to the normal scalar product ⌧(v, w) =|vkw| cos✓, where ✓ is the angle between the vectors v and w.

3.3 Quadratic forms: introduction

A quadratic form on Kn is a polynomial function of several variables x1

, . . . , xn inwhich each term has total degree two, such as 3x2 + 2xz + z2 � 4yz + xy. One mo-tivation to study them comes from the geometry of curves or surfaces defined byquadratic equations.

As an example, consider the equation 5x2 + 5y2 � 6xy = 2. See figure 2(a).This represents an ellipse, in which the two principal axes are at an angle of ⇡/4

with the x- and y-axes. To study such curves in general, it is desirable to changevariables (which will turn out to be equivalent to a change of basis) so as to makethe principal axes of the ellipse coincide with the x- and y-axes. This is equivalentto eliminating the xy-term in the equation. We can do this easily by completing thesquare.

In the example

5x2 + 5y2 � 6xy = 2 , 5(x � 3y/5)2 � 9y2/5 + 5y2 = 2

, 5(x � 3y/5)2 + 16y2/5 = 2

so if we change variables, and put x0 = x � 3y/5 and y0 = y, then the equationbecomes 5(x0)2 + 16(y0)2/5 = 2. See figure 2(b).

Here we have allowed an arbitrary basis change. We shall study this situation insection 3.5.

One disadvantage of doing this is that the shape of the curve has become distorted.If we wish to preserve the shape, then we should restrict our basis changes to thosethat preserve distance and angle. These are called orthogonal basis changes, andwe shall study that situation in section 3.6. In the example, we can use the changeof variables x0 = (x + y)/

p2, y0 = (x � y)/

p2 (which represents a non-distorting

rotation through an angle of ⇡/4), and the equation becomes (x0)2 + 4(y0)2 = 1. Seefigure 3.

Page 24: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

24 MA251 Algebra I November 27, 2014

Figure 2:

–0.8

–0.6

–0.4

–0.2

0.2

0.4

0.6

0.8

y

–0.8 –0.6 –0.4 –0.2 0.2 0.4 0.6 0.8x

–0.8

–0.6

–0.4

–0.2

0.2

0.4

0.6

0.8

y’

–0.6 –0.4 –0.2 0.2 0.4 0.6x’

(a). 5x2 + 5y2 � 6xy = 2 (b). 5(x0)2 + 16(y0)2/5 = 2

–0.4

–0.2

0

0.2

0.4

y’

–1 –0.5 0.5 1x’

Figure 3: (x0)2 + 4(y0)2 = 1

Page 25: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

November 27, 2014 MA251 Algebra I 25

3.4 Quadratic forms: definitions

Definition 56. Let V be a finite-dimensional vector space. A function q : V ! K issaid to be a quadratic form on V if there exists a bilinear form ⌧ : V ⇥ V ! K suchthat q(v) = ⌧(v, v) for all v 2 V.

As this is the official definition of a quadratic form we will use, we do not reallyneed to observe that it yields the same notion for the standard vector space Kn as thedefinition in the previous section. However, it is a good exercise that an inquisitivereader should definitely do. The key is to observe that the function xix j comes fromthe bilinear form ⌧i, j such that ⌧i, j(ei, e j) = 1 and zero elsewhere.

In proposition 57 we need to be able to divide by 2 in the field K. This means thatwe must assume10 that 1 + 1 6= 0 in K. For example, we would like to avoid the fieldof two elements. If you prefer to avoid worrying about such technicalities, then youcan safely assume that K is either Q, R or C.

Let us consider the following three sets. The first set Q(V, K) consists of allquadratic forms on V. It is a subset of the set of all functions from V to K. Thesecond set Bil(V ⇥ V, K) consists of all bilinear forms on V. It is a subset of the setof all functions from V ⇥ V to K. Finally, we need Sym(V ⇥ V, K), the subset ofBil(V ⇥ V, K) consisting of the symmetric bilinear forms.

There are two interesting functions connecting these sets:

Bil(V ⇥ V, K) �����! Q(V, K) ����! Sym(V ⇥ V, K) :��(⌧)

�(v) = ⌧(v, v),

� (q)

�(u, v) = q(u + v)� q(u)� q(v).

Proposition 57. The following hold for all q 2 Q(V, K) and ⌧ 2 Sym(V ⇥ V, K):

(a) (q) 2 Sym(V ⇥ V, K),

(b) �( (q)) = 2q,

(c) (�(⌧)) = 2⌧ ,

(d) If 1 + 1 6= 0 in K then : Q(V, K) ! Sym(V ⇥ V, K) is bijective.

Proof. Proof of (a). Let � be a bilinear form on V such that q = �(�). So q(v) =�(v, v) for all v. Then

� (q)

�(v, w) = q(v + w)� q(v)� q(w)

= �(v + w, v + w)��(v, v)��(w, w) = �(v, w) +�(w, v) (58)

for all v, w. Prove yourself that �(w, v) and �(v, w) +�(w, v) are bilinear expressionsin (v, w). This proves (a).

Proof of (b). Using the notation of (a)�� (q)

�(v) = �(v, v)+�(v, v) = 2�(v, v) =

2 q(v) for all v so � (q) = 2q as required.Proof of (c). Write r = �(⌧). So

� (r)

�(v, w) = ⌧(v, w) + ⌧(w, v) by (58).

Therefore� �(⌧))(v, w) = ⌧(v, w) + ⌧(w, v) = 2 ⌧(v, w) because ⌧ is symmetric.

This proves (c).Proof of (d). By (b) and (c) �/2 is an inverse to . ⇤Let ⌧ : V ⇥ V ! K be a symmetric bilinear form. Let E = (e

1

, . . . , en) be a basisof V. Recall that the coordinates of v with respect to E are defined to be the scalarsxi such that v = Ân

i=1

xi ei.Let A = (↵i j) be the matrix of ⌧ with respect to E. We will also call A the matrix

of q := �(⌧) with respect to this basis. Then A is symmetric because ⌧ is, and by (47)

q(v) = [E, v]T A [E, v] =n

Âi=1

n

Âj=1

xi↵i j x j = Â1in

↵ii x2

i + 2 Â1i< jn

↵i j xi x j.

10Fields with 1 + 1 = 0 are fields of characteristic 2. One can actually do quadratic and bilinear formsover them but the theory is quite specific. It could be a good topic for a second year essay.

Page 26: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

26 MA251 Algebra I November 27, 2014

When n 3, we shall usually write x, y, z instead of x1

, x2

, x3

. For example, if

n = 2 and A =✓

1 3

3 �2

◆, then q(v) = x2 � 2y2 + 6xy.

Conversely, if we are given a quadratic form as in the right hand side of (3.4),then it is easy to write down its matrix A. For example, if n = 3 and q(v) = 3x2 +

y2 � 2z2 + 4xy � xz, then A =

0

@3 2 �1/2

2 1 0

�1/2 0 �2

1

A.

3.5 Change of variable under the general linear group

Remark 59. Let V be a finite-dimensional vector space. A cobasis of V is a sequence( f

1

, . . . , fn) of linear maps fi : V ! K such that for every linear map g : V ! Kthere are unique scalars ↵i such that g = ↵

1

f1

+ · · ·+↵n fn (recall that this meansg(x) = ↵

1

f1

(x) + · · ·+↵n fn(x) for all x 2 V).We also say that ( f

1

, . . . , fn) is a system of linear coordinates on V. We sometimesabuse notation and write fi instead of fi(x), as in {x 2 V | f

1

+ 2 f3

= 0}.A basis (e

1

, . . . , en) and cobasis ( f1

, . . . , fn) of V are said to be dual to each other(or correspond to each other) if fi(e j) = �i j for all i, j.

Sometimes a cobasis is more useful than a basis. For example, if ( f1

, . . . , fn) is acobasis and ↵i j 2 K for 1 i j n then

Â1i jn

↵i j fi f j

is a quadratic form on V.It is for the sake of simplicity that we have introduced cobasis in the foregoing.

The grown-up way to say ‘cobasis of V’ is ‘basis of the dual of V’.

Theorem 60. Assume that 2 6= 0 in K.Let q be a quadratic form on V and write dim(V) = n. Then there are ↵

1

, . . . ,↵n 2 K and a basis F of V such that q(v) = Ân

i=1

↵i y2

i , where the yi are the coordinatesof v with respect to F.

Equivalently, any symmetric matrix is congruent to a diagonal one.

Proof. This is by induction on n. There is nothing to prove when n = 1. As usual,let A = (↵i j) be the matrix of q with respect to an initial basis e

1

, . . . , en and writev = ⌃i xi ei.

Case 1. First suppose that ↵11

6= 0. As in the example in Subsection 3.3, we cancomplete the square. Since 2 6= 0 we can write

q(v) = ��x2

1

+ 2�2

x1

x2

+ · · ·+ 2�n x1

xn�+ q

0

(v),

where q0

is a quadratic form involving only the coordinates x2

, . . . , xn and �i,� 2 K.We make the change of coordinates

y1

= x1

+�2

x2

+ · · ·+�n xn, yi = xi for 2 i n.

Then q(v) = � y2

1

+ q1

(v) for another quadratic form q1

(v) involving only y2

, . . . , yn.By the inductive hypothesis (applied to the subspace of V spanned by e

2

, . . . , en),we can change the coordinates of q

1

from y2

, . . . , yn to z2

, . . . , zn, say, to bring it tothe required form, and then we get q(v) = Ân

i=1

↵i z2

i (where z1

= y1

) as required.

Case 2. ↵11

= 0 but↵ii 6= 0 for some i > 1. In this case, we start by interchanginge

1

with ei (or equivalently x1

with xi), which takes us back to case 1.

Case 3. ↵ii = 0 for all i. If ↵i j = 0 for all i and j then there is nothing to prove,so assume that ↵i j 6= 0 for some i, j. Then we start by making a coordinate changexi = yi + yj, xj = yi � yj, xk = yk for k 6= i, j. This introduces terms 2↵i j(y2

i � y2

j )into q, taking us back to case 2. ⇤

Page 27: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

November 27, 2014 MA251 Algebra I 27

Example 61. Let n = 3 and q(v) = xy + 3yz � 5xz, so A =

0

@0 1/2 �5/2

1/2 0 3/2

�5/2 3/2 0

1

A.

Since we are using x, y, z for our variables, we can use x1

, y1

, z1

(rather thanx0, y0, z0) for the variables with respect to a new basis, which will make things typo-graphically simpler!

We are in Case 3 of the proof above, and so we start with a coordinate changex = x

1

+ y1

, y = x1

� y1

, z = z1

, which corresponds to the basis change matrixP

1

=0

@1 1 0

1 �1 0

0 0 1

1

A. Then we get q(v) = x2

1

� y2

1

� 2x1

z1

� 8y1

z1

.

We are now in Case 1 of the proof above, and the next basis change, from com-pleting the square, is x

2

= x1

� z1

, y2

= y1

, z2

= z1

, or equivalently, x1

= x2

+ z2

,y

1

= y2

, z1

= z2

, and then the associated basis change matrix is P2

=0

@1 0 1

0 1 0

0 0 1

1

A,

and q(v) = x2

2

� y2

2

� 8y2

z2

� z2

2

.We now proceed by induction on the 2-coordinate form in y

2

, z2

, and completingthe square again leads to the basis change x

3

= x2

, y3

= y2

+ 4z2

, z3

= z2

, whichcorresponds to the basis change matrix P

3

=0

@1 0 0

0 1 �4

0 0 1

1

A, and q(v) = x2

3

� y2

3

+ 15z2

3

.

The total basis change in moving from the original basis with coordinates x, y, zto the final basis with coordinates x

3

, y3

, z3

is

P = P1

P2

P3

=

0

@1 1 �3

1 �1 5

0 0 1

1

A,

and you can check that PTAP =

0

@1 0 0

0 �1 0

0 0 15

1

A, as expected.

Lemma 62. Let P, A 2 Kn,n with P invertible. Then rank(A) = rank(PTAP).

Proof. In MA106 we proved that rank(A) = rank(QAP) for any invertible matricesP, Q. Well, PT is also invertible and (PT)�1 = (P�1)T. Setting Q := PT finishes theproof. ⇤Definition 63. Assume 2 6= 0 in K and q 2 Q(V, K). We define rank(q) to be therank of the matrix of q with respect to a basis E of V. This rank doesn’t depend on Eby theorem 51 and lemma 62.

If PTAP is diagonal, then its rank is equal to the number of non-zero terms on thediagonal.

Remark 64. Notice that both statements of theorem 60 fail in characteristic 2. Let Kbe the field of two elements. The quadratic form q(x, y) = xy cannot be diagonalised.

Similarly, the symmetric matrix ( 0 1

1 0

) is not congruent to a diagonal matrix. Doit as an exercise: there are 6 possible change of variable (you need to choose twoamong three possible variables x, y and x + y) and you can observe directly whathappens with each change of variables.

Proposition 65. Any quadratic form q over C has the form q(v) = Âri=1

y2

i with respectto a suitable basis, where r = rank(q).

Equivalently, for any symmetric matrix A 2 Cn,n, there is an invertible matrix P 2Cn,n such that PTAP = B, where B = (�i j) is a diagonal matrix with �ii = 1 for1 i r, �ii = 0 for r + 1 i n, and r = rank(A).

Proof. By theorem 60 there are coordinates xi such that q(v) = Âni=1

↵ii x2

i .We may assume that ↵ii 6= 0 for 1 i r and ↵ii = 0 for r + 1 i n, where

r = rank(q) (otherwise we permute the coordinates).Finally we make a coordinate change yi =

p↵ii xi (1 i r), giving q(v) =

Âri=1

y2

i . ⇤

Page 28: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

28 MA251 Algebra I November 27, 2014

When K = R, we cannot take square roots of negative numbers, but we canreplace each positive ↵i by 1 and each negative ↵i by �1 to get:

Proposition 66 (Sylvester’s theorem). Any quadratic form q over R has the formq(v) = Ât

i=1

x2

i � Âui=1

x2

t+i with respect to a suitable basis, where t + u = rank(q).Equivalently, given a symmetric matrix A 2 Rn,n, there is an invertible matrix P 2

Rn,n such that PTAP = B, where B = (�i j) is a diagonal matrix with �ii = 1 for1 i t, �ii = �1 for t + 1 i t + u, and �ii = 0 for t + u + 1 i n, andt + u = rank(A). ⇤

We shall now prove that the numbers t and u of positive and negative terms areinvariants of q, that is, don’t depend on the basis. The difference t � u is called thesignature of q.

Theorem 67 (Sylvester’s Law of Inertia). Let q be a quadratic form on the vector spaceV over R. Suppose that E and F are two bases of V with respective coordinates xi andyi such that

q(v) =t

Âi=1

x2

i �u

Âi=1

x2

t+i =t0

Âi=1

y2

i �u0

Âi=1

y2

t0+i. (68)

Then t = t0 and u = u0.

Proof. Write n = dim(V). We know that t + u = rank(q) = t0 + u0, so it is enoughto prove that t = t0. Suppose not, and suppose that t > t0. Let

V1

= {v 2 V | xt+1

= xt+2

= . . . = xn = 0}V

2

= {v 2 V | y1

= y2

= . . . = yt0 = 0}.

Then

dim(V1

\ V2

) = dim(V1

) + dim(V2

)� dim(V1

+ V2

) = t + (n � t0)� dim(V

1

+ V2

) � t + (n � t0)� dim(V) = t � t0 > 0

and there is a non-zero vector v 2 V1

\ V2

. But it is easily seen from (68) that0 6= v 2 V

1

) q(v) > 0 and 0 6= v 2 V2

) q(v) < 0. This contradiction completesthe proof. ⇤

3.6 Change of variable under the orthogonal group

In subsection 3.6, we assume throughout that K = R.

Definition 69. A quadratic form q on V is said to be positive definite if q(v) > 0

whenever 0 6= v 2 V. The associated symmetric bilinear form ⌧ = (q) is also calledpositive definite when q is.

Proposition 70. Let q be a positive definite quadratic form on V and put n = dim(V).Then there are coordinates xi on V such that q(v) = x2

1

+ · · ·+ x2

n.Equivalently, the matrix of q with respect to a suitable basis is In.

Proof. Easy using proposition 66. ⇤Definition 71. A vector space V over R together with a positive definite symmetricbilinear form ⌧ is called a Euclidean space. We write v · w instead of ⌧(v, w).

Definition 72. Let ⌧ be a bilinear form on a vector space V. A sequence of vectorsF = ( f

1

, . . . , fn) in V is said to be orthonormal if ⌧( fi, f j) = �i j for all i, j.

Page 29: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

November 27, 2014 MA251 Algebra I 29

Every Euclidean space admits an orthonormal basis by proposition 70.We shall assume from now on that (V, ⌧) is a Euclidean space and we fix an

orthonormal basis E = (e1

, . . . , en) of V.Note that v · w = [E, v]T [E, w] for all v, w 2 V.

Definition 73. A linear operator T : V ! V is said to be orthogonal if T(v) · T(w) =v · w for all v, w 2 V. In words, it preserves the scalar product on V.

Definition 74. An n ⇥ n matrix A is called orthogonal if ATA = In.

Proposition 75. A linear operator T : V ! V is orthogonal if and only if [E, T, E] isorthogonal (recall our assumption that E is orthonormal).

Proof. Write A := [E, T, E]. For all v, w 2 V

T(v) · T(w) = [E, Tv]T [E, Tw] =�

A[E, v]�T�A[E, w]

�= [E, v]T

�ATA

�[E, w]

and likewise v · w = [E, v]T[E, w]. Therefore

T is orthogonal () T(v) · T(w) = v · w for all v, w 2 v

() [E, v]T�

ATA�[E, w] = [E, v]T In[E, w] for all v, w () ATA = In. ⇤

Proposition 76. Every orthogonal matrix A is invertible. Equivalently, every orthogo-nal operator is invertible.

Proof. The identity ATA = In implies that AT is an inverse to A. ⇤Proposition 77. Let T : V ! V be linear. Then T is orthogonal if and only if (T(e

1

),. . . , T(en)) is an orthonormal basis.

Proof. Proof of ). By proposition 76 T is invertible so (T(e1

), . . . , T(en)) is a basis.Setting (v, w) = (ei, e j) in the equation T(v) · T(w) = v · w gives T(ei) · T(e j) = �i jfor all i, j as required.

Proof of (. Let v, w 2 V and write v = Âi ai ei, w = Â j b j e j. Using bilinearity ofthe scalar product

T(v) · T(w) =⇣

T Âi

ai ei

⌘·⇣

T Âj

b j e j

⌘= Â

i, jai b j (Tei · Te j) = Â

iai bi = v · w. ⇤

Proposition 78. Let A 2 Rn,n and let c j denote the jth column of A. Then A isorthogonal if and only if c

Ti c j = �i j for all i, j.

Proof. Let c j denote the jth column of A. Then c

Ti is the ith row of AT. So the (i, j)th

entry of ATA is c

Ti c j. So ATA = In if and only if c

Ti c j = �i j for all i, j. ⇤

Example 79. For any ✓ 2 R, let A =✓

cos✓ � sin✓sin✓ cos✓

◆. (This represents a counter-

clockwise rotation through an angle ✓.) Then it is easily checked that ATA = AAT =I2

. Notice that the columns of A are mutually orthogonal vectors of length 1, and thesame applies to the rows of A.

Definition/Lemma 80 (Gram-Schmidt step). Let F = ( f

1

, . . . , fn) be a basis of aEuclidean space V whose first t � 1 vectors are orthonormal. We define St(F) =( f

1

, . . . , ft�1

, h, ft+1

, . . . , fn) where

g = ft �t�1

Âi=1

( ft · fi) fi, h = g (g · g)�1/2

(note that g · g 6= 0 because F is independent). Then St(F) is a basis of V whose firstt vectors are orthonormal.

Page 30: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

30 MA251 Algebra I November 27, 2014

Proof. For 1 j < t

g · f j = ft · f j �t�1

Âi=1

( ft · fi)( fi · f j) = ft · f j � ft · f j = 0.

Note that g 6= 0 because F is independent. Therefore h := g (g · g)�1/2 is well-defined. We still have h · f j = 0 whenever 1 j < t. We also have h · h = 1 byconstruction. ⇤Definition 81. Let F be any basis of an n-dimensional Euclidean space V. Note thatSn · · · S

1

(F) is an orthonormal basis of V by lemma 80. Turning F into the latter iscalled the Gram-Schmidt orthonormalisation process.

A slight variation of the Gram-Schmidt process replaces ft by

ft �t�1

Âi=1

ft · fi

fi · fifi

in the tth step and rescales all vectors at the very end. This gives the same result asthe original Gram-Schmidt process but is faster in practice.

Corollary 82. Let f

1

, . . . , fr be orthonormal vectors in an n-dimensional Euclidean vec-tor space. Then they can be extended to an orthonormal basis ( f

1

, . . . , fn).

Proof. We prove first that f

1

, . . . , fr are linearly independent. Suppose Âri=1

xi fi = 0for some x

1

, . . . , xr 2 R and let j be such that 1 j r. Taking the scalar productwith f j gives 0 = f j · 0 = Âr

i=1

xi f j · fi = xj, by since f

1

, . . . , fr are orthonormal.In MA106 you proved that these can be extended to a basis F = ( f

1

, . . . , fn). NowSn · · · Sr+1

(F) is an orthonormal basis and includes fi for i r as required. ⇤Proposition 83. Let A 2 Rn,n be a real symmetric matrix. Then:

(a) All complex eigenvalues of A lie in R.

(b) If n > 0 then A has a real eigenvalue.

Proof. Proof of (a). For a column vector v or matrix B over C, we denote by v orB the result of replacing all entries of v or B by their complex conjugates. Since theentries of A lie in R, we have A = A.

Let v be a complex eigenvector associated with �. Then

Av = �v (84)

so, taking complex conjugates and using A = A, we get

Av = �v. (85)

Transposing (84) and using AT = A gives

v

TA = �v

T. (86)

By (85) and (86) we have�v

Tv = v

TAv = �v

Tv

and so (�� �)vTv = 0.

But v

Tv 6= 0 because writing v = (↵

1

, . . . ,↵n)T we have v

Tv = ↵

1

↵1

+ · · · +↵n↵n 2 R>0

as v 6= 0. Thus � = �, so � 2 R.Proof of (b). In MA106 we proved that there exists a complex eigenvalue � of A.

By (a) � is a real eigenvalue of A. ⇤

Page 31: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

November 27, 2014 MA251 Algebra I 31

Before coming to the main theorem of this section, we recall the block sum A � Bof matrices, which we introduced in section 2.5. It is straightforward to check that(A

1

� B1

)(A2

� B2

) = (A1

A2

� B1

B2

), provided the sizes of the matrices are suchthat A

1

A2

and B1

B2

are defined.

Theorem 87. For any symmetric matrix A 2 Rn,n, there is an orthogonal matrix Psuch that PTAP is a diagonal matrix.

Note PT = P�1 so that A is simultaneously congruent and similar to the diagonalmatrix PTAP.

Proof. Induction on n. For n = 1 there is nothing to prove. Assume it’s true for n� 1.We use the standard inner product on Rn defined by v · w = v

Tw and the stan-

dard (orthonormal) basis E = (e1

, . . . , en) of Rn.By proposition 83 there exists an eigenvector g 2 Rn of A with eigenvalue, say,

�. Note that g · g > 0 and put f

1

= g (g · g)�1/2. Then f

1

· f

1

= 1 so by corollary 82there exists an orthonormal basis F of Rn whose first vector is f

1

.Put S = [E, 1, F]. Then S is orthogonal by proposition 77. So ST = S�1.Put B = STAS. Then B is again symmetric. Moreover Be

1

= � e

1

because

Be

1

= S�1 ASe

1

= [F1E]A[E1F]e1

= [F1E]A f

1

= [F1E]� f

1

= �e

1

.

In other words, B is a block sum ��C for some symmetric (n� 1)⇥ (n� 1) matrix C.By the induction hypothesis there exists an orthogonal (n � 1)⇥ (n � 1) matrix

Q such that QTCQ is diagonal. Putting P = S(1 � Q) we deduce

PTAP = (1 � Q)TSTA S (1 � Q) = (1 � Q)TB (1 � Q)

= (1 � QT)(�� C)(1 � Q) = (�� QTCQ)

which is diagonal. Also, P is orthogonal because

PTP = (1 � Q)T(STS)(1 � Q)

= (1 � QT)(1 � Q) = 1 � QTQ = 1 � In�1

= In. ⇤

The following is just a restatement of theorem 87:

Theorem 88. Let q be a (second) quadratic form defined on a Euclidean space V. Thenthere exists an orthonormal basis F, with coordinates yi say, and scalars ↵i 2 R suchthat q(v) = Ân

i=1

↵i y2

i . Furthermore, the numbers ↵i are uniquely determined by q.

Proof. Fix an orthonormal basis E of V. Let A be the matrix of q with respect toE. Then A is symmetric. By theorem 87 there exists an orthogonal matrix P 2 Rn,n

such that B := PTAP is diagonal. But B is the matrix of q with respect to the basis Fdefined by P = [E1F]. So q is of the form q(v) = Ân

i=1

↵i y2

i as stated. Moreover F isorthonormal by proposition 77.

Finally, the ↵i are the eigenvalues of the matrix of q with respect to any orthonor-mal basis and hence depend only on q. ⇤

Although it is not used in the proof of the theorem above, the following proposi-tion is useful when calculating examples. It helps us to write down more vectors inthe final orthonormal basis immediately, without having to use corollary 82 repeat-edly.

We use the standard inner product on Rn defined by v · w = v

Tw.

Proposition 89. Let A be a real symmetric matrix, and let �1

, �2

be two distinct eigen-values of A, with corresponding eigenvectors v

1

, v

2

. Then v

1

· v

2

= 0.

Proof. We have

Av

1

= �1

v

1

(1) and Av

2

= �2

v

2

(2).

Page 32: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

32 MA251 Algebra I November 27, 2014

Transposing (1) and using A = AT gives v

T1

A = �1

v

T1

, and so

v

T1

Av

2

= �1

v

T1

v

2

(3) and by (2) v

T1

Av

2

= �2

v

T1

v

2

(4).

Subtracting (3) from (4) gives (�2

� �1

)vT1

v

2

= 0. Since �2

� �1

6= 0 by assumption,we have v

T1

v

2

= 0. ⇤In the following examples a real symmetric matrix A is given and we aim to find

an orthogonal P such that PTAP is diagonal.

Example 90. Let n = 2 and q(v) = x2 + y2 + 6xy, so A = ( 1 3

3 1

). Then

det(A � xI2

) = (1 � x)2 � 9 = x2 � 2x � 8 = (x � 4)(x + 2),

so the eigenvalues of A are 4 and �2. Solving Av = �v for � = 4 and �2, we findcorresponding eigenvectors (1 1)T and (1 �1)T. Proposition 89 tells us that thesevectors are orthogonal to each other (which we can of course check directly!), soif we divide them by their lengths to give vectors of length 1, giving ( 1p

2

1p2

)T and

( 1p2

�1p2

)T then we get an orthonormal basis consisting of eigenvectors of A, whichis what we want. The corresponding basis change matrix P has these vectors ascolumns, so P = 1p

2

( 1 1

1 �1

), and we can check that PTP = I2

(hence P is orthogonal)and that PTAP = ( 4 0

0 �2

).

Example 91. Let n = 3 and q(v) = 3x2 + 6y2 + 3z2 � 4xy � 4yz + 2xz, so

A =

0

@3 �2 1

�2 6 �2

1 �2 3

1

A.

Then, expanding by the first row,

det(A � xI3

) = (3 � x)(6 � x)(3 � x)� 4(3 � x)� 4(3 � x) + 4 + 4 � (6 � x)

= �x3 + 12x2 � 36x + 32 = (2 � x)2(8 � x),

so the eigenvalues are 8, 2, 2.For the eigenvalue 8, if we solve Av = 8v then we find a solution v = (1 �2 1)T.

Since 2 is a repeated eigenvalue, we need two corresponding eigenvectors, whichmust be orthogonal to each other. The equations Av = 2v all reduce to x � 2y + z =0, and so any vector (x, y, z)T satisfying this equation is an eigenvector for � = 2. Byproposition 89 these eigenvectors will all be orthogonal to the eigenvector for � = 8,but we will have to choose them orthogonal to each other. We can choose the first onearbitrarily, so let’s choose (1 0 �1)T. We now need another solution that is orthogonalto this. In other words, we want x, y and z not all zero satisfying x � 2y + z = 0 andx � z = 0, and x = y = z = 1 is a solution. So we now have a basis (1 �2 1)T,(1 0 �1)T, (1 1 1)T of three mutually orthogonal eigenvectors.

To get an orthonormal basis, we just need to divide by their lengths, which are,respectively,

p6,

p2, and

p3, and then the basis change matrix P has these vectors

as columns, so

P =

0

BB@

1p6

1p2

1p3

� 2p6

0

1p3

1p6

� 1p2

1p3

1

CCA .

It can then be checked that PTP = I3

and that PTAP is the diagonal matrix withdiagonal 8, 2, 2.

Page 33: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

November 27, 2014 MA251 Algebra I 33

3.7 Applications of quadratic forms to geometry

3.7.1 Reduction of the general second degree equation

Definition 92. A quadric in Rn is a subset of Rn of the form {v 2 Rn | f (v) = 0}where f is a polynomial of degree 2, that is, of the form

f =n

Âi=1

↵i x2

i +n

Âi=1

i�1

Âj=1

↵i j xi x j +n

Âi=1

�i xi + � (93)

such that some ↵i or ↵i j is nonzero.

For example, both x2

1

+ 1 = 0 and x2

1

+ 2 = 0 define the same quadric: the emptyset.

A quadric in R2 is also called a quadric curve, in R3 a quadric surface.An isometry of Rn is a permutation preserving distance. Equivalently, it’s the

composition of an orthogonal operator with a translation x 7! x + a.

Theorem 94. Any quadric in Rn can be moved by an appropriate isometry to one of thefollowing:

r

Âi=1

↵i x2

i = 0

r

Âi=1

↵i x2

i + 1 = 0

r

Âi=1

↵i x2

i + xr+1

= 0.

Proof. Step 1. By theorem 88, we can apply an orthogonal basis change (that is,an isometry of Rn that fixes the origin) which has the effect of eliminating the terms↵i j xi x j in (93).

Step 2. Whenever ↵i 6= 0, we can replace xi by xi ��i/(2↵i), and thereby elimi-nate the term �i xi from the equation. This transformation is just a translation, whichis also an isometry.

Step 3. If ↵i = 0, then we cannot eliminate the term �ixi. Let us permute thecoordinates such that ↵i 6= 0 for 1 i r, and �i 6= 0 for r + 1 i r + s. Then ifs > 1, by using corollary 82, we can find an orthogonal transformation that leaves xiunchanged for 1 i r and replaces Âs

i=1

�r+ j xr+ j by �xr+1

(where � is the lengthof Âs

i=1

�r+ j xr+ j), and then we have only a single non-zero �i, namely �r+1

.Step 4. Finally, if there is a non-zero �r+1

= �, then we can perform the transla-tion that replaces xr+1

by xr+1

��/�, and thereby eliminate �. We have now reducedto one of two possible types of equation:

r

Âi=1

↵i x2

i + � = 0 andr

Âi=1

↵i x2

i +� xr+1

= 0.

By dividing through by � or �, we can assume that � = 0 or 1 in the first equation,and that � = 1 in the second. ⇤

A curve defined by the first or second equation is called central quadric because ithas central symmetry, that is, if a vector v satisfies the equation, then so does �v.

We shall now consider the types of curves and surfaces that can arise in the famil-iar cases n = 2 and n = 3. These different types correspond to whether the ↵i arepositive, negative or zero, and whether � = 0 or 1.

We shall use x, y, z instead of x1

, x2

, x3

, and ↵, �, � instead of ↵1

, ↵2

, ↵3

. We shallassume also that ↵, �, � are all positive, and write �↵, etc., for the negative case.

3.7.2 The case n = 2

When n = 2 we have the following possibilities.

(1) ↵x2 = 0. This just defines the line x = 0 (the y-axis).

Page 34: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

34 MA251 Algebra I November 27, 2014

(2) ↵x2 = 1. This defines the two parallel lines x = ± 1p↵

.

(3) �↵x2 = 1. This is the empty curve!

(4) ↵x2 +�y2 = 0. The single point (0, 0).

(5) ↵x2 ��y2 = 0. Two straight lines y = ±q↵� x, which intersect at (0, 0).

(6) ↵x2 +�y2 = 1. An ellipse.

(7) ↵x2 ��y2 = 1. A hyperbola.

(8) �↵x2 ��y2 = 1. The empty curve again.

(9) ↵x2 � y = 0. A parabola.

3.7.3 The case n = 3

–8–6

–4–2

02

46

8

x–4

–20

24

y

–4

–2

0

2

4

z

Figure 4: x2/4 + y2 � z2 = 0

When n = 3, we still get the nine possibilities (1)–(9) that we had in the casen = 2, but now they must be regarded as equations in the three variables x, y, z thathappen not to involve z.

So, in case (1), we now get the plane x = 0, in case (2) we get two parallel planesx = ±1/

p↵, in case (4) we get the line x = y = 0 (the z-axis), in case (5) two

intersecting planes y = ±p↵/�x, and in cases (6), (7) and (9), we get, respectively,

elliptical, hyperbolic and parabolic cylinders.The remaining cases involve all of x, y and z. We omit �↵x2 � �y2 � �z2 = 1,

which is empty.

(10). ↵x2 +�y2 + �z2 = 0. The single point (0, 0, 0).

(11). ↵x2 +�y2 � �z2 = 0. See Figure 4.This is an elliptical cone. The cross sections parallel to the xy-plane are ellipses of

the form ↵x2 +�y2 = c, whereas the cross sections parallel to the other coordinateplanes are generally hyperbolas. Notice also that if a particular point (a, b, c) is onthe surface, then so is t(a, b, c) for any t 2 R. In other words, the surface contains thestraight line through the origin and any of its points. Such lines are called generators.

Page 35: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

November 27, 2014 MA251 Algebra I 35

Figure 5:

–2

–1

0

1

2

x

–2

–1

0

1

2

y

–2

–1

0

1

2

z

(a). x2 + 2y2 + 4z2 = 7

–4

–2

0

2

4

x–2–1

01

2

y

–2

0

2

z

(b). x2/4 + y2 � z2 = 1

Page 36: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

36 MA251 Algebra I November 27, 2014

–8–6

–4–2

02

46

8

x–4–2

02

4

y

–6

–4

–2

0

2

4

6

z

Figure 6: x2/4 � y2 � z2 = 1

When each point of a 3-dimensional surface lies on one or more generators, it ispossible to make a model of the surface with straight lengths of wire or string.

(12). ↵x2 +�y2 + �z2 = 1. An ellipsoid. See Figure 5(a).(13). ↵x2 +�y2 � �z2 = 1. A hyperboloid. See Figure 5(b).There are two types of 3-dimensional hyperboloids. This one is connected, and

is known as a hyperboloid of one sheet. Although it is not immediately obvious, eachpoint of this surface lies on exactly two generators; that is, lines that lie entirely onthe surface. For each � 2 R, the line defined by the pair of equations

p↵ x �p

� z = �(1 �p� y); �(

p↵ x +

p� z) = 1 +

p� y.

lies entirely on the surface; to see this, just multiply the two equations together. Thesame applies to the lines defined by the pairs of equations

p� y �p

� z = µ(1 �p↵ x); µ(

p� y +

p� z) = 1 +

p↵ x.

It can be shown that each point on the surface lies on exactly one of the lines in eachif these two families.

(14). ↵x2 � �y2 � �z2 = 1. A hyperboloid. See Figure 6. hyperboloid of twosheets. It does not have generators. Besides it is easy to observe that it is disconnected.Substitute x = 0 into its equation. The resulting equation ��y2 � �z2 = 1 has nosolutions. This means that the hyperboloid does not intersect the plane x = 0. Acloser inspection confirms that the two parts of the hyperboloid lie on opposite sidesof the plane: intersect the hyperboloid with the line y = z = 0 to see two points onboth sides.

(15). ↵x2 +�y2 � z = 0. An elliptical paraboloid. See Figure 7(a).(16). ↵x2 � �y2 � z = 0. A hyperbolic paraboloid. See Figure 7(b). As in the

case of the hyperboloid of one sheet, there are two generators passing through eachpoint of this surface, one from each of the following two families of lines:

�(p↵ x �

p�) y = z;

p↵ x +

p� y = �.

µ(p↵ x +

p�) y = z;

p↵ x �

p� y = µ.

Page 37: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

November 27, 2014 MA251 Algebra I 37

Figure 7:

–4

–2

0

2

4

x–2–1

01

2

y

0

2

4

z

(a). z = x2/2 + y2

–4

–2

0

2

4

x

–4–2

02

4

y

–10–8–6–4–20246810

z

(b). z = x2 � y2

Page 38: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

38 MA251 Algebra I November 27, 2014

3.8 Unitary, hermitian and normal matrices

In the rest of this chapter K = C unless stated otherwise. All our vector spaces arefinite-dimensional.

Definition 95. Let W, V be complex vector spaces. A map ⌧ : W ⇥ V ! C is calledsesquilinear 11 if

(a) ⌧(↵1

w

1

+↵2

w

2

, v) = ↵1

⌧(w1

, v) +↵2

⌧(w2

, v) and

(b) ⌧(w,↵1

v

1

+↵2

v

2

) = ↵1

⌧(w, v

1

) +↵2

⌧(w, v

2

)

for all w, w

1

, w

2

2 W, v, v

1

, v

2

2 V, and ↵1

,↵2

2 C. Here z denotes the complexconjugate of z. If W = V then ⌧ is called a sesquilinear form on V.

Let us choose a basis E = (e1

, . . . , en) of V and a basis F = ( f

1

, . . . , fm) of W.Let ⌧ : W ⇥ V ! K be a sesquilinear map, and let ↵i j = ⌧( fi, e j), for 1 i m,

1 j n. The m ⇥ n matrix A = (↵i j) is called the matrix of ⌧ with respect to thebases E and F.

Let A⇤ = ¯AT denote the conjugate matrix of a matrix A. Similar to (47) we have

⌧(w, v) =m

Âi=1

n

Âj=1

yi ⌧( fi, e j) xj =m

Âi=1

n

Âj=1

yi↵i j x j = [F, w]⇤A[E, v] (96)

The standard inner product on Cn is defined by v · w = v

⇤w rather than v

Tw. Note

that, for v 2 Rn, v

⇤ = v

T, so this definition is compatible with the one for realvectors. The length |v| of a vector is given by |v|2 = v · v = v

⇤v, which is always a

non-negative real number.We shall formulate several propositions that generalise results of the previous

sections to hermitian matrices. The proofs are very similar and left for you to fill inas an exercise. The first two propositions are analogous to theorems 49 and 51.

Proposition 97. Let ⌧ : W ⇥V ! C be sesquilinear. Let E1

, E2

be bases of V and F1

, F2

of W. Let A (respectively, B) be the matrix of ⌧ with respect to E1

, F1

(respectively,E

2

, F2

). Then B = [F1

, 1, F2

]⇤A[E1

, 1, E2

]. ⇤Proposition 98. Let ⌧ : V ⇥ V ! C be sesquilinear. Let E

1

, E2

be bases of V. Let A(respectively, B) be the matrix of ⌧ with respect to E

1

(respectively, E2

). Then B =[E

1

, 1, E2

]⇤A[E1

, 1, E2

]. ⇤Definition 99. A matrix A 2 Cn,n is called hermitian if A = A⇤. A sesquilinear form⌧ on V is called hermitian if ⌧(w, v) = ⌧(v, w) for all v, w 2 V. ⇤

These are the complex analogues of symmetric matrices and symmetric bilinearforms. The following proposition is an analogue of proposition 54.

Proposition 100. Let ⌧ : V ⇥ V ! C be sesquilinear and let E be a basis of V. Then ⌧is hermitian if and only if the matrix of ⌧ with respect to E is hermitian. ⇤

Two hermitian matrices A and B are congruent if there exists an invertible matrixP with B = P⇤AP. A Hermitian quadratic form is a function q : V ! C such that thereexists a sesquilinear form ⌧ on V with q(v) = ⌧(v, v) for all v 2 V. The following isa hermitian version of Sylvester’s theorems (proposition 66 and theorem 67).

Proposition 101. Any hermitian quadratic form q has the form q(v) = Âti=1

|xi|2 �Âu

i=1

|xt+i|2 with respect to a suitable basis.Equivalently, given a hermitian matrix A 2 Cn,n, there is an invertible matrix P 2

Cn,n such that P⇤AP = B, where B = (�i j) is a diagonal matrix with �ii = 1 for1 i t, �ii = �1 for t + 1 i t + u, and �ii = 0 for t + u + 1 i n, andt + u = rank(A).

11from Latin one and a half

Page 39: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

November 27, 2014 MA251 Algebra I 39

The numbers t and u are uniquely determined by q (or A). ⇤Similarly to the real case, t + u is called the rank of q (or A) and t � u the signa-

ture.A hermitian quadratic form q on V is said to be positive definite if q(v) > 0 for all

nonzero v 2 V. By proposition 101 a positive definite hermitian form looks like thestandard inner product on Cn in some choice of a basis. A hermitian vector space is avector space over C equipped with a hermitian positive definite form.

Definition 102. A linear operator T : V ! V on a hermitian vector space (V, ⌧) issaid to be unitary if it preserves ⌧ , that is, if ⌧(Tv, Tw) = ⌧(v, w) for all v, w 2 V.

Definition 103. A matrix A 2 Cn,n is called unitary if A⇤A = In.

Definition 104. A sequence of vectors (e1

, . . . , en) in a hermitian space (V, ⌧) is or-thonormal if ⌧(ei, e j) = �i j for all i, j.

The following is an analogue of proposition 75.

Proposition 105. Let E be an orthonormal basis of a hermitian space (V, ⌧) and letT : V ! V be linear. Then T is unitary if and only if [ETE] is unitary. ⇤

The Gram-Schmidt process works perfectly well in the hermitian setting. In par-ticular we obtain the following analogue to corollary 82:

Proposition 106. Let (V, ⌧) be a hermitian space of dimension n. Then any orthonor-mal vectors f

1

, . . . , fr can be extended to an orthonormal basis ( f

1

, . . . , fn). ⇤Proposition 106 ensures the existence of orthonormal bases in hermitian spaces.

Proposition 83 and theorems 87 and 88 have analogues as well.

Proposition 107. Let A 2 Cn,n be a complex hermitian matrix. Then all complexeigenvalues of A are real. If n > 0 then A has a real eigenvalue. ⇤Theorem 108. Let q be a (second) hermitian quadratic form defined on a hermitianspace (V, ⌧). Then there exists an orthonormal basis F, with coordinates yi say, and realscalars ↵i 2 R such that q(v) = Ân

i=1

↵i y2

i . Furthermore, the numbers ↵i are uniquelydetermined by q.

Equivalently, for any hermitian matrix A there is a unitary matrix P such that P⇤APis a real diagonal matrix. ⇤

Notice the crucial difference between theorem 87 and proposition 108. In theformer we start with a real matrix to end up with a real diagonal matrix. In the latterwe start with a complex matrix but still we end up with a real diagonal matrix. Thepoint is that theorem 88 admits a useful generalisation to a wider class of matrices.

Definition 109. A matrix A 2 Cn,n is called normal if AA⇤ = A⇤A.

In particular, all diagonal, all hermitian and all unitary matrices are normal. Con-sequently, all real symmetric and real orthogonal matrices are normal.

Lemma 110. If A 2 Cn,n is normal and P 2 Cn,n is unitary, then P⇤AP is normal.

Proof. If B = P⇤AP then using (CD)⇤ = D⇤C⇤

BB⇤ = (P⇤AP)(P⇤AP)⇤ = P⇤APP⇤A⇤P= P⇤AA⇤P = P⇤A⇤AP = (P⇤A⇤P)(P⇤AP) = B⇤B. ⇤

Theorem 111. A matrix A 2 Cn,n is normal if and only if there exists a unitary matrixP 2 Cn,n such that P⇤AP is diagonal12.

12with complex entries

Page 40: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

40 MA251 Algebra I November 27, 2014

Proof. The “if” part follows from lemma 110 as diagonal matrices are normal.For the “only if” part we proceed by induction on n. If n = 1, there is nothing to

prove. Let us assume we have proved the statement for all dimensions less than n.The matrix A admits an eigenvector v 2 Cn with eigenvalue �. Let W be the vectorsubspace of all vectors x satisfying Ax = �x. If W = Cn then A is a scalar matrix andwe are done. Otherwise, we have a nontrivial13 decomposition Cn = W �W? whereW? = {v 2 Cn | v

⇤w = 0 for all w 2 W}.

Let us notice that A⇤W ⇢ W because AA⇤x = A⇤Ax = A⇤�x = �(A⇤

x) for anyx 2 W.

It follows that AW? ⇢ W? since (Ay)⇤x = y

⇤(A⇤x) 2 y

⇤W = 0 so (Ay)⇤x = 0

for all x 2 W, y 2 W?. Also AW ⇢ W.Now choose orthonormal bases of W and W?. Together they form a new or-

thonormal basis of Cn. The change of basis matrix P is unitary, hence by lemma 110the matrix P⇤AP = ( B 0

0 C ) is normal. It follows that the matrices B and C are normalof smaller size and we can use the inductive hypothesis to complete the proof. ⇤

Theorem 111 is an extremely useful criterion for diagonalisability of matrices. Tofind P in practice, we use similar methods to those used in the real case.

Example 112. Let A be the matrix A =✓

6 2+2i2�2i 4

◆. Then

cA(x) = (6 � x)(4 � x)� (2+2i)(2�2i) = x2 � 10x + 16 = (x � 2)(x � 8),

so the eigenvalues of A are 2 and 8. (We saw in proposition 107 that the eigenvaluesof any Hermitian matrix are real.) The corresponding eigenvectors are v

1

= (1+i,�2)T and v

2

= (1+i, 1)T. We find that |v1

|2 = v

⇤1

v

1

= 6 and |v2

|2 = 3, so wedivide by their lengths to get an orthonormal basis v

1

/|v1

|, v

2

/|v2

| of C2. Then thematrix

P =

0

@1+ip

6

1+ip3

�2p6

1p3

1

A

having this basis as columns is unitary and satisfies P⇤AP =✓

2 0

0 8

◆.

3.9 Applications to quantum mechanics (non-examinable)

With all the linear algebra we know it is a little step aside to understand the basics ofquantum mechanics. We discuss Schrodinger’s picture14 of quantum mechanics and(mathematically) derive Heisenberg’s uncertainty principle.

The main ingredient of quantum mechanics is a hermitian vector space (V, h·, ·i).There are physical arguments showing that real Euclidean vector spaces are no goodand that V must be infinite-dimensional. Here we just take their conclusions forgranted. We denote by [v] the line Cv spanned by v 2 V r {0}. The states of thephysical system are the lines [v]. We use normalised vectors, that is, v such thathv, vi = 1, to present states as this makes our formulae slightly easier.

It is impossible to observe the state of the quantum system but we can try to ob-serve some physical quantities such as momentum, energy, spin, etc. Such physicalquantities become observables, that is, hermitian linear operators � : V ! V. Her-mitian in this context means that hx,�yi = h�x, yi for all x, y 2 V. Sweeping asubtle mathematical point under the carpet15, we assume that � is diagonalisable

13that is, neither W nor W? is zero.14The alternative is Heisenberg’s picture which we have no time to discuss here.15If V were finite-dimensional we could have used proposition 108. But V is infinite dimensional! To

ensure diagonalisability V must be complete with respect to the hermitian norm. Such spaces are called

Page 41: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

November 27, 2014 MA251 Algebra I 41

with eigenvectors ei with eigenvalues �i (i � 1). The proof of proposition 107 goesthrough in the infinite dimensional case, so we conclude that all�i belong to R. Backto physics, if we measure � on a state [v] where v is Ân↵nen and is normalised thenthe measurement will return �n as a result with probability |↵n|2.

One observable is energy H : V ! V, often called hamiltonian. It is central to thetheory because it determines the time evolution [v(t)] of the system by Schrodinger’sequation:

dv(t)dt

=1

ihHv(t)

where h ⇡ 10

�34 Joule per second16 is the reduced Planck constant. We know howto solve this equation: v(t) = etH/ih

v(0).As a concrete example, let us look at the quantum oscillator. The full energy of

the classical harmonic oscillator mass m and frequency! is

h =p2

2m+

1

2

m!2x2

where x is the position and p = mx0 is the momentum. To quantise it, we have toplay with this expression. The vector space of all smooth functions C1(R,C) admitsa convenient subspace V = { f (x)e�x2/2 | f (x) 2 C[x]}, which we make hermitian byh�(x), (x)i =

R1�1

¯�(x) (x)dx. Quantum momentum and quantum position arelinear operators (observables) on this space:

P( f (x)) = �ih f 0(x), X( f (x)) = f (x) · x.

The quantum Hamiltonian is a second order differential operator given by the sameequation

H =P2

2m+

1

2

m!2X2 = � h2

2md2

dx2

+1

2

m!2x2

.

As mathematicians, we can assume that m = 1 and ! = 1, so that H( f ) = ( f x2 �f 00)/2. The eigenvectors of H are the Hermite functions

n(x) = (�1)nex2/2(e�x2

)(n) , n = 0, 1, 2 . . .

with eigenvalues n + 1/2 which are discrete energy levels of the quantum oscillator.Notice that h k, ni = �k,n 2

n n!

p⇡ , so they are orthogonal but not orthonormal. The

states [ n] are pure states: they do not change with time and always give n + 1/2 asenergy. If we take a system in a state [v] where v is normalised and

v = Ân↵n

1

⇡4

2

n/2 n!

n

then the measurement of energy will return n + 1/2 with probability |↵n|2. Noticethat the measurement breaks the system!! It changes it to the state [ n] and all futuremeasurements will return the same energy!

Alternatively, it is possible to model the quantum oscillator on the vector spaceW = C[x] of polynomials. One has to use the natural linear bijection

↵ : W ! V, ↵( f (x)) = f (x) e�x2/2

and transfer all the formulae to W. The metric becomes h f , gi = h↵( f ),↵(g)i =R1�1

¯f (x) g(x) e�x2

dx, the formulae for P and X changes accordingly, and at the end

Hilbert spaces. Diagonalisability is still subtle as eigenvectors do not span the whole of V but only adense subspace. Furthermore, if V admits no dense countably dimensional subspace, further difficultiesarise. . . Pandora’s box of functional analysis is wide open, so let us try to keep it shut.

16Notice the physical dimensions: H is energy, t is time, i dimensionless, h equalises the dimensionsin the both sides irrespectively of what v is.

Page 42: MA251 Algebra I – Advanced Linear Algebramasbal/Algebra 1... · MA251 Algebra I – Advanced Linear Algebra Daan Krammer November 27, 2014 Contents 1 Review of Some Linear Algebra

42 MA251 Algebra I November 27, 2014

one arrives at Hermite polynomials ↵�1( n(x)) = (�1)nex2

(e�x2

)(n) instead of Her-mite functions.

Let us go back to an abstract system with two observables $P$ and $Q$. It is pointless to measure $Q$ after measuring $P$ as the system is broken. But can we measure them simultaneously? The answer is given by Heisenberg's uncertainty principle. Mathematically, it is a corollary of Schwarz's inequality:
$$\|v\|^2 \cdot \|w\|^2 = \langle v, v \rangle \langle w, w \rangle \geq |\langle v, w \rangle|^2.$$

Let $e_1, e_2, \ldots$ be eigenvectors for $P$ and let $p_1, p_2, \ldots$ be the corresponding eigenvalues. The probability that $p_j$ is returned after measuring on $[v]$, with $v = \sum_n \alpha_n e_n$ normalised, depends on the multiplicity of the eigenvalue:
$$\mathrm{Prob}(p_j \text{ is returned}) = \sum_{p_k = p_j} |\alpha_k|^2.$$
Hence, we should have the expected value
$$E(P, v) = \sum_k p_k |\alpha_k|^2 = \sum_k \langle \alpha_k e_k, p_k \alpha_k e_k \rangle = \langle v, P(v) \rangle.$$

To compute the expected quadratic error we use the shifted observable $P_v = P - E(P, v)\, I$:
$$D(P, v) = E(P_v^2, v)^{1/2} = \langle P_v^2(v), v \rangle^{1/2} = \langle P_v(v), P_v(v) \rangle^{1/2} = \| P_v(v) \|$$
where we use the fact that $P$ and $P_v$ are hermitian. Notice that $D(P, v)$ has the physical meaning of the uncertainty of a measurement of $P$. Notice also that the operator $PQ - QP$ is no longer hermitian in general, but we can still talk about its expected value. Here is Heisenberg's principle.

Theorem 113. $D(P, v) \cdot D(Q, v) \geq \frac{1}{2}\, |E(PQ - QP, v)|$.

Proof. In the right hand side,
$$E(PQ - QP, v) = E(P_v Q_v - Q_v P_v, v) = \langle v, P_v Q_v(v) \rangle - \langle v, Q_v P_v(v) \rangle = \langle P_v(v), Q_v(v) \rangle - \langle Q_v(v), P_v(v) \rangle.$$
Remembering that the form is hermitian,
$$E(PQ - QP, v) = \langle P_v(v), Q_v(v) \rangle - \overline{\langle P_v(v), Q_v(v) \rangle} = 2i \cdot \mathrm{Im}\big( \langle P_v(v), Q_v(v) \rangle \big),$$
that is, $2i$ times the imaginary part. So the right hand side is estimated by Schwarz's inequality:
$$\big| \mathrm{Im}\big( \langle P_v(v), Q_v(v) \rangle \big) \big| \leq |\langle P_v(v), Q_v(v) \rangle| \leq \| P_v(v) \| \cdot \| Q_v(v) \|. \qquad \Box$$

Two cases of particular physical interest are commuting observables, that is, $PQ = QP$, and conjugate observables, that is, $PQ - QP = i\hbar I$. Commuting observables can be measured simultaneously with any degree of certainty. Conjugate observables obey Heisenberg's uncertainty:
$$D(P, v) \cdot D(Q, v) \geq \frac{\hbar}{2}.$$
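The operators $X$ and $P$ of the oscillator section are an example: $XP - PX = i\hbar I$, so they are conjugate observables. A quick symbolic sketch of this commutator, again using sympy:

```python
import sympy as sp

x, hbar = sp.symbols('x hbar')
f = sp.Function('f')(x)

X = lambda g: x * g                         # position operator X(f) = x f
P = lambda g: -sp.I * hbar * sp.diff(g, x)  # momentum operator P(f) = -i hbar f'

print(sp.simplify(X(P(f)) - P(X(f))))       # I*hbar*f(x), i.e. XP - PX = i*hbar*I
```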

4 Finitely Generated Abelian Groups

4.1 Definitions

Groups were introduced in the first year in Foundations, and will be studied in detail next term in Algebra II: Groups and Rings. In this module, we are only interested in abelian (= commutative) groups, which are defined as follows.


Definition 114. An abelian group is a set $G$ together with a binary operation $G \times G \to G : (g, h) \mapsto g + h$, which we write as addition, and which satisfies the following properties:

(a) (Associativity). For all $g, h, k \in G$, $(g + h) + k = g + (h + k)$;

(b) (Identity or zero). There exists an element $0_G \in G$ such that $g + 0_G = g$ for all $g \in G$;

(c) (Inverse or negative). For all $g \in G$ there exists $-g \in G$ such that $g + (-g) = 0_G$;

(d) (Commutativity). For all $g, h \in G$, $g + h = h + g$.

It can be shown that $0_G$ is uniquely determined by $(G, +)$. Likewise $-g$ is determined by $(G, +, g)$.

Usually we just write 0 rather than $0_G$. We only write $0_G$ if we need to distinguish between the zero elements of different groups.

The commutativity axiom (d) is not part of the definition of a general group, and for general (non-abelian) groups it is more usual to use multiplicative rather than additive notation. All groups in this course should be assumed to be abelian, although some of the definitions and results apply equally well to general groups.

The terms identity and inverse are intended for the multiplicative notation of groups. If groups are written additively, as in this chapter, then one says zero and negative instead.

Examples 115. 1. The integers $\mathbb{Z}$.

2. Fix a positive integer $n > 0$ and let
$$\mathbb{Z}_n = \{0, 1, 2, \ldots, n-1\} = \{x \in \mathbb{Z} \mid 0 \leq x < n\}$$
where addition is computed modulo $n$. So, for example, when $n = 9$, we have $2 + 5 = 7$, $3 + 8 = 2$, $6 + 7 = 4$, etc. Note that the negative $-x$ of $x \in \mathbb{Z}_n$ is equal to $n - x$ (if $x \neq 0$) in this example. (Later we shall see a better construction of $\mathbb{Z}_n$ as a quotient $\mathbb{Z}/n\mathbb{Z}$.)
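For experimenting, arithmetic in $\mathbb{Z}_n$ is a one-liner; a tiny sketch with $n = 9$:

```python
n = 9
add = lambda a, b: (a + b) % n   # addition in Z_9
neg = lambda a: (n - a) % n      # the negative of a in Z_9
print(add(3, 8), add(6, 7), neg(4))   # 2 4 5
```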

3. Examples from linear algebra. Let $K$ be a field, for example $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$, or $\mathbb{Z}_p$ with $p$ prime.

(a) The elements of $K$ form an abelian group under addition.
(b) The non-zero elements of $K$ form an abelian group $K^\times$ under multiplication.
(c) The vectors in any vector space form an abelian group under addition.

4. A group $G$ with just one element (that is, $G = \{0_G\}$) is said to be a trivial group. We simply write $G = 0$.

Proposition 116 (The cancellation law). Let $G$ be any group, and let $g, h, k \in G$. Then $g + h = g + k \Rightarrow h = k$.

Proof. Add $-g$ to both sides of the equation and use the axioms of groups. $\Box$

Definition 117. For $n \in \mathbb{Z}$ and $g$ in an abelian group $G$ we define $ng$ recursively as follows. Firstly $0g := 0$. Next $(n+1)g := ng + g$ for $n \geq 0$. Finally, $(-n)g := -(ng)$ for $n > 0$.

Exercise 118. Let $G$ be an abelian group. Prove
$$(m + n)g = mg + ng, \qquad m(ng) = (mn)g, \qquad n(g + h) = ng + nh \tag{119}$$
for all $m, n \in \mathbb{Z}$ and $g, h \in G$.

Scalar multiplication $\mathbb{Z} \times G \to G$ allows us to think of abelian groups as "vector spaces over $\mathbb{Z}$" (or, using correct terminology, $\mathbb{Z}$-modules; modules over rings will play a significant role in Rings and Modules in year 3). We shall often use (119) in this chapter.


One can identify abelian groups with $\mathbb{Z}$-modules and the two terms can be used interchangeably. However, $\mathbb{Z}$-modules would be a better term than abelian groups given the material in this chapter.

Definition 120. A group $G$ is called cyclic if there exists an element $x \in G$ such that $G = \{mx \mid m \in \mathbb{Z}\}$.

The element $x$ in the definition is called a generator of $G$. Note that $\mathbb{Z}$ and $\mathbb{Z}_n$ are cyclic with generator 1.

Definition 121. A bijection $\phi : G \to H$ between two (abelian) groups is called an isomorphism if $\phi(g + h) = \phi(g) + \phi(h)$ for all $g, h \in G$. The groups $G$ and $H$ are called isomorphic, and we write $G \cong H$, if there is an isomorphism $\phi : G \to H$.

Isomorphic groups are often thought of as being essentially the same group, but with elements having different names.

Exercise 122. Prove that any isomorphism $\phi : G \to H$ satisfies $\phi(ng) = n\phi(g)$ for all $g \in G$, $n \in \mathbb{Z}$.

Proposition 123. Any cyclic group $G$ is isomorphic either to $\mathbb{Z}$ or to $\mathbb{Z}_n$ for some $n > 0$.

Proof. Let $G$ be cyclic with generator $x$, so $G = \{mx \mid m \in \mathbb{Z}\}$. Suppose first that the elements $mx$ for $m \in \mathbb{Z}$ are all distinct. Then the map $\phi : \mathbb{Z} \to G$ defined by $\phi(m) = mx$ is a bijection, and it is clearly an isomorphism.

Otherwise, we have $lx = mx$ for some $l < m$, and so $(m - l)x = 0$ with $m - l > 0$. Let $n$ be the least integer with $n > 0$ and $nx = 0$. Then the elements $0x = 0, 1x, 2x, \ldots, (n-1)x$ of $G$ are all distinct, because otherwise we could find a smaller $n$. Furthermore, for any $mx \in G$, we can write $m = rn + s$ for some $r, s \in \mathbb{Z}$ with $0 \leq s < n$. Then $mx = (rn + s)x = sx$, so $G = \{0, 1x, 2x, \ldots, (n-1)x\}$, and the map $\phi : \mathbb{Z}_n \to G$ defined by $\phi(m) = mx$ for $0 \leq m < n$ is a bijection, which is easily seen to be an isomorphism. $\Box$

Definition 124. For an element $g \in G$, the least positive integer $n$ with $ng = 0$, if it exists, is called the order $|g|$ of $g$. If there is no such $n$, then $g$ has infinite order and we write $|g| = \infty$. The order $|G|$ of a group $G$ is just the number of elements of $G$.

Exercise 125. If $\phi : G \to H$ is an isomorphism, then $|g| = |\phi(g)|$ for all $g \in G$.

Exercise 126. Let $G$ be a finite cyclic group with generator $g$. Prove: $|g| = |G|$.

Exercise 127. If $G$ is a finite abelian group then any element of $G$ has finite order.

Definition 128. Let $X$ be a subset of a group $G$. We say that $G$ is generated or spanned by $X$, and $X$ is said to be a generating set of $G$, if every $g \in G$ can be written as a finite sum $\sum_{i=1}^{k} m_i x_i$, with $m_i \in \mathbb{Z}$ and $x_i \in X$ for all $i$.

If $G$ is generated by $X$, then we write $G = \langle X \rangle$. We write $G = \langle x_1, \ldots, x_n \rangle$ instead of $G = \langle \{x_1, \ldots, x_n\} \rangle$.

If $G$ admits a finite generating set $X$ then $G$ is said to be finitely generated.

So a group is cyclic if and only if it has a generating set $X$ with $|X| = 1$.

Definition 129. Let $G_1, \ldots, G_n$ be groups. Their direct sum is written $G_1 \oplus \cdots \oplus G_n$ or $G_1 \times \cdots \times G_n$ and defined to be the set
$$\{ (g_1, \ldots, g_n) \mid g_i \in G_i \text{ for all } i \}$$
(the Cartesian product) with component-wise addition
$$(g_1, \ldots, g_n) + (h_1, \ldots, h_n) = (g_1 + h_1, \ldots, g_n + h_n).$$


Exercise 130. Prove that $G_1 \oplus \cdots \oplus G_n$ is again an abelian group with zero element $(0, \ldots, 0)$ and $-(g_1, \ldots, g_n) = (-g_1, \ldots, -g_n)$.

In general (non-abelian) group theory this is more often known as the direct product of groups.

One of the main results of this chapter is known as the fundamental theorem of finitely generated abelian groups, and states that every finitely generated abelian group is isomorphic to a direct sum of cyclic groups.

Exercise 131. Prove that the group $(\mathbb{Q}, +)$ is not finitely generated. Prove that it is not isomorphic to a direct sum of cyclic groups.

4.2 Subgroups

Definition 132. A subset $H$ of a group $G$ is called a subgroup of $G$ if it forms a group under the same operation as that of $G$.

Lemma 133. If $H$ is a subgroup of $G$, then the identity element $0_H$ of $H$ is equal to the identity element $0_G$ of $G$.

Proof. Using the identity axioms for $H$ and $G$, $0_H + 0_H = 0_H = 0_H + 0_G$. Now by the cancellation law, $0_H = 0_G$. $\Box$

The definition of a subgroup is semantic in its nature. While it precisely pinpoints what a subgroup is, it is quite cumbersome to use. The following proposition gives a usable criterion.

Proposition 134. Let $H$ be a subset of a group $G$. The following statements are equivalent.

(a) $H$ is a subgroup of $G$.

(b) $H$ is nonempty; and $h_1, h_2 \in H \Rightarrow h_1 + h_2 \in H$; and $h \in H \Rightarrow -h \in H$.

(c) $H$ is nonempty; and $h_1, h_2 \in H \Rightarrow h_1 - h_2 \in H$.

Proof. (a) $\Rightarrow$ (c). If $H$ is a subgroup of $G$ then it is nonempty as it contains $0_H$. Moreover, $h_1 - h_2 = h_1 + (-h_2) \in H$ if so are $h_1$ and $h_2$.

(c) $\Rightarrow$ (b). Pick $x \in H$. Then $0 = x - x \in H$. Now $-h = 0 - h \in H$ for any $h \in H$. Finally, $h_1 + h_2 = h_1 - (-h_2) \in H$ for all $h_1, h_2 \in H$.

(b) $\Rightarrow$ (a). We need to verify the four group axioms in $H$. Two of these, 'Closure' and 'Inverse', are the two conditions in (b). The other two axioms are 'Associativity' and 'Identity'. Associativity holds because it holds in $G$, and $H$ is a subset of $G$. Since we are assuming that $H$ is nonempty, there exists $h \in H$; then $-h \in H$ and $h + (-h) = 0 \in H$ by (b), and so 'Identity' holds, and $H$ is a subgroup. $\Box$

Examples 135. 1. There are two standard subgroups of any group $G$: the whole group $G$ itself, and the trivial subgroup $\{0\}$ consisting of the identity alone. Subgroups other than $G$ are called proper subgroups, and subgroups other than $\{0\}$ are called non-trivial subgroups.

2. If $g$ is any element of any group $G$, then the set of all integer multiples $\langle g \rangle = \{mg \mid m \in \mathbb{Z}\}$ forms a cyclic subgroup of $G$ called the subgroup generated by $g$. Note $|\langle g \rangle| = |g|$.

Let us look at a few specific examples. If $G = \mathbb{Z}$, then $5\mathbb{Z}$, which consists of all multiples of 5, is the cyclic subgroup generated by 5. Of course, we can replace 5 by any integer here, but note that the cyclic subgroups generated by 5 and $-5$ are the same.

If $G = \langle g \rangle$ is a finite cyclic group of order $n$ and $m$ is a positive integer dividing $n$, then the cyclic subgroup generated by $mg$ has order $n/m$ and consists of the elements $kmg$ for $0 \leq k < n/m$.
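A brute-force sketch for exploring cyclic subgroups of $\mathbb{Z}_n$ (the function name is illustrative; the second call hints at exercise 136 below):

```python
def cyclic_subgroup(m, n):
    """Elements of the subgroup of Z_n generated by m, by brute force."""
    H, x = set(), 0
    while True:
        x = (x + m) % n
        H.add(x)
        if x == 0:
            return sorted(H)

print(cyclic_subgroup(4, 12))       # [0, 4, 8]: order 12/4 = 3, as above
print(len(cyclic_subgroup(9, 12)))  # 4, i.e. 12/gcd(9, 12) -- cf. exercise 136
```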


Exercise 136. What is the order of the cyclic subgroup generated by $mg$ for general $m$ (where we drop the assumption that $m \mid n$)?

Exercise 137. Show that the group $\mathbb{C}^\times$ of non-zero complex numbers under the operation of multiplication has finite cyclic subgroups of all possible orders.

Exercise 138. Let $G$ be an (abelian) group.

(a) Let $\{H_i \mid i \in I\}$ be a family of subgroups of $G$. Prove that the intersection $K = \bigcap_{i \in I} H_i$ is also a subgroup of $G$.

(b) Let $X$ be a subset of $G$ and let $K$ be the intersection of those subgroups $H \subset G$ satisfying $X \subset H$. Prove $K = \langle X \rangle$. In words: $\langle X \rangle$ is the least subgroup of $G$ containing $X$.

(c) Give an example showing that the union of two subgroups of $G$ may not be a subgroup of $G$.

Exercise 139. Let $G$ be a group and $n \in \mathbb{Z}$. Prove that $\{x \in G \mid nx = 0\}$ is a subgroup of $G$.

Exercise 140. Let $A, B$ be subgroups of a group $G$. Prove that $\{a + b \mid a \in A, b \in B\}$ (written $A + B$) is also a subgroup of $G$.

4.3 Cosets and quotient groups

Definition 141. Let $H$ be a subgroup of $G$ and $g \in G$. Then the coset $H + g$ is the subset $\{h + g \mid h \in H\}$ of $G$.

Note: since our groups are abelian, we have $H + g = g + H$, but in general group theory the right and left cosets $Hg$ and $gH$ can be different.

Examples 142. 3. $G = \mathbb{Z}$, $H = 5\mathbb{Z}$. There are just 5 distinct cosets: $H = H + 0 = \{5n \mid n \in \mathbb{Z}\}$, $H + 1 = \{5n + 1 \mid n \in \mathbb{Z}\}$, $H + 2$, $H + 3$, $H + 4$. Note that $H + i = H + j$ whenever $i \equiv j \pmod 5$.

4. $G = \mathbb{Z}_6$, $H = \{0, 3\}$. There are 3 distinct cosets: $H = H + 3 = \{0, 3\}$, $H + 1 = H + 4 = \{1, 4\}$, and $H + 2 = H + 5 = \{2, 5\}$.

5. $G = \mathbb{C}^\times$, the group of non-zero complex numbers under multiplication, $H = S^1 = \{z \in \mathbb{C} : |z| = 1\}$, the unit circle. The cosets are the circles centered at the origin. There are uncountably many distinct cosets, one for each positive real number (the radius of a circle).
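Small finite examples such as example 4 are easy to enumerate by machine; a brute-force sketch:

```python
n, H = 6, {0, 3}   # the group Z_6 and the subgroup H = {0, 3}
cosets = {frozenset((h + g) % n for h in H) for g in range(n)}
print(sorted(sorted(c) for c in cosets))   # [[0, 3], [1, 4], [2, 5]]
```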

Proposition 143. Let $H$ be a subgroup of a group $G$ and $g, k \in G$. The following are equivalent:

(a) $k \in H + g$;

(b) $H + g = H + k$;

(c) $k - g \in H$.

Proof. (b) $\Rightarrow$ (a). Clearly $H + g = H + k \Rightarrow k \in H + g$.

(a) $\Rightarrow$ (b). If $k \in H + g$, then $k = h + g$ for some fixed $h \in H$, so $g = k - h$. Let $f \in H + g$. Then, for some $h_1 \in H$, we have $f = h_1 + g = h_1 + k - h \in H + k$, so $H + g \subseteq H + k$. Similarly, if $f \in H + k$, then for some $h_1 \in H$ we have $f = h_1 + k = h_1 + h + g \in H + g$, so $H + k \subseteq H + g$. Thus $H + g = H + k$.

(a) $\Rightarrow$ (c). If $k \in H + g$, then, as above, $k = h + g$, so $k - g = h \in H$.

(c) $\Rightarrow$ (a). If $k - g \in H$, then putting $h = k - g$, we have $h + g = k$, so $k \in H + g$. $\Box$

Corollary 144. Two cosets of $H$ in $G$ are either equal or disjoint.


Proof. If $H + g_1$ and $H + g_2$ are not disjoint, then there exists an element $k \in (H + g_1) \cap (H + g_2)$, but then $H + g_1 = H + k = H + g_2$ by proposition 143. $\Box$

Corollary 145. The cosets of $H$ in $G$ partition $G$. $\Box$

Proposition 146. If $H$ is finite, then all cosets have exactly $|H|$ elements.

Proof. The map $\phi : H \to H + g$ defined by $\phi(h) = h + g$ is bijective because $h_1 + g = h_2 + g \Rightarrow h_1 = h_2$ by the cancellation law. $\Box$

Corollary 145 and proposition 146 together imply:

Theorem 147 (Lagrange’s Theorem). Let G be a finite (abelian) group and H a sub-group of G. Then the order of H divides the order of G. ⇤Definition 148. Let H be a subgroup of H. The set of cosets of H in G is writtenG/H. The number of elements of G/H is called the index of H in G and is written as|G : H|.

If G is finite, then we clearly have |G : H| = |G|/|H|. But, from the exampleG = Z, H = 5Z above, we see that |G : H| can be finite even when G and H areinfinite.

Proposition 149. Let $G$ be a finite (abelian) group. Then for any $g \in G$, the order $|g|$ of $g$ divides the order $|G|$ of $G$.

Proof. Let $|g| = n$. We saw in example 2 above that the integer multiples $\{mg \mid m \in \mathbb{Z}\}$ of $g$ form a subgroup $H$ of $G$. By minimality of $n$ we have $|H| = n$. The result now follows from Lagrange's Theorem. $\Box$

As an application, we can now immediately classify all finite (abelian) groups whose order is prime.

Proposition 150. Let $G$ be an (abelian) group having prime order $p$. Then $G$ is cyclic; that is, $G \cong \mathbb{Z}_p$.

Proof. Let $g \in G$ with $g \neq 0$. Then $|g| > 1$, but $|g|$ divides $p$ by proposition 149, so $|g| = p$. But then $G$ must consist entirely of the integer multiples $mg$ ($0 \leq m < p$) of $g$, so $G$ is cyclic. $\Box$

Definition 151. For subsets $A$ and $B$ of a group $G$ we define the sum $A + B = \{a + b \mid a \in A, b \in B\}$.

Lemma 152. Let $H$ be a subgroup of $G$ and $g, h \in G$. Then $(H + g) + (H + h) = H + (g + h)$.

Proof. We have $H + H = H$, so
$$(H + g) + (H + h) = \{ x + y \mid x \in H + g,\ y \in H + h \} = \{ (a + g) + (b + h) \mid a, b \in H \} = \{ (a + b) + (g + h) \mid a, b \in H \}$$
$$= \{ c + (g + h) \mid c \in H + H \} = \{ c + (g + h) \mid c \in H \} = H + (g + h). \qquad \Box$$

Theorem 153. Let $H$ be a subgroup of an abelian group $G$. Then $G/H$ forms a group under addition of subsets.

Proof. We have just seen that $(H + g) + (H + h) = H + (g + h)$, so we have closure, and associativity follows easily from associativity of $G$. Since $(H + 0) + (H + g) = H + g$ for all $g \in G$, $H = H + 0$ is an identity element, and since $(H - g) + (H + g) = H - g + g = H$, $H - g$ is an inverse to $H + g$ for all cosets $H + g$. Thus the four group axioms are satisfied and $G/H$ is a group. $\Box$


Definition 154. The group $G/H$ is called the quotient group (or the factor group) of $G$ by $H$.

Notice that if $G$ is finite, then $|G/H| = |G : H| = |G|/|H|$. So, although the quotient group seems a rather complicated object at first sight, it is actually a smaller group than $G$.

Examples 155. 1. Let $G = \mathbb{Z}$ and $H = m\mathbb{Z}$ for some $m > 0$. Then there are exactly $m$ distinct cosets: $H, H + 1, \ldots, H + (m - 1)$. If we add together $k$ copies of $H + 1$, then we get $H + k$. So $G/H$ is cyclic of order $m$ with generator $H + 1$. So by proposition 123, $\mathbb{Z}/m\mathbb{Z} \cong \mathbb{Z}_m$. The original definition of $\mathbb{Z}_m$ was rather clumsy, so we put $\mathbb{Z}_m := \mathbb{Z}/m\mathbb{Z}$ from now on.

2. $G = \mathbb{R}$ and $H = \mathbb{Z}$. The quotient group $G/H$ is isomorphic to the circle subgroup $S^1$ of the multiplicative group $\mathbb{C}^\times$. One writes an explicit isomorphism $\phi : G/H \to S^1$ by $\phi(x + \mathbb{Z}) = e^{2\pi x i}$.

3. $G = \mathbb{Q}$ and $H = \mathbb{Z}$. The quotient group $G/H$ featured in a previous exam, where it was asked to show that this group is infinite, not finitely generated, and that every element of $G/H$ has finite order.

4. Let $V$ be a vector space over $K$ and $W \subset V$ a subspace. In particular $V$ is an abelian group with subgroup $W$, so the quotient group $V/W$ is defined. It can naturally be made into a vector space over $K$ (outside our scope).

4.4 Homomorphisms and the first isomorphism theorem

Definition 156. Let $G$ and $H$ be groups. A homomorphism $\phi$ from $G$ to $H$ is a map $\phi : G \to H$ such that $\phi(g_1 + g_2) = \phi(g_1) + \phi(g_2)$ for all $g_1, g_2 \in G$.

Linear maps between vector spaces are examples of homomorphisms. Note that an isomorphism is just a bijective homomorphism.

Lemma 157. Let $\phi : G \to H$ be a homomorphism. Then $\phi(ng) = n\phi(g)$ for all $g \in G$, $n \in \mathbb{Z}$.

Proof. Exercise. $\Box$

Example 158. Let $G$ be any abelian group, and let $n \in \mathbb{Z}$. Then $\phi : G \to G$ defined by $\phi(g) = ng$ for all $g \in G$ is a homomorphism.

Definition 159. Let $\phi : G \to H$ be a homomorphism. Then the kernel $\ker(\phi)$ of $\phi$ is defined to be the set of elements of $G$ that map onto $0_H$; that is,
$$\ker(\phi) = \{ g \in G \mid \phi(g) = 0_H \}.$$

Note that by lemma 157 above, $\ker(\phi)$ always contains $0_G$.

Proposition 160. Let $\phi : G \to H$ be a homomorphism. Then $\phi$ is injective if and only if $\ker(\phi) = \{0_G\}$.

Proof. ($\Rightarrow$) Since $0_G \in \ker(\phi)$, if $\phi$ is injective then we must have $\ker(\phi) = \{0_G\}$.

($\Leftarrow$) Suppose that $\ker(\phi) = \{0_G\}$, and let $g_1, g_2 \in G$ with $\phi(g_1) = \phi(g_2)$. Then $0_H = \phi(g_1) - \phi(g_2) = \phi(g_1 - g_2)$ (by lemma 157), so $g_1 - g_2 \in \ker(\phi)$ and hence $g_1 - g_2 = 0_G$ and $g_1 = g_2$. So $\phi$ is injective. $\Box$

Theorem 161.

(a) Let $\phi : G \to H$ be a homomorphism. Then $\ker(\phi)$ is a subgroup of $G$ and $\mathrm{im}(\phi)$ is a subgroup of $H$.


(b) Let $H$ be a subgroup of a group $G$. Then the map $\phi : G \to G/H$ defined by $\phi(g) = H + g$ is a surjective homomorphism with kernel $H$. We call $\phi$ the natural map or quotient map.

Proof. Part (a) is straightforward using proposition 134.

Proof of (b). The map $\phi$ is a homomorphism by lemma 152. It is clearly surjective. Also, for all $g \in G$ we have $\phi(g) = 0_{G/H} \Leftrightarrow H + g = H + 0_G \Leftrightarrow g \in H$, so $\ker(\phi) = H$. $\Box$

The following lemma explains a connection between quotients and homomorphisms. It clarifies the trickiest point in the proof of the forthcoming First Isomorphism Theorem.

Lemma 162. Let $\phi : G \to H$ be a homomorphism with kernel $K$, and let $A$ be a subgroup of $G$. Then $A \subset K$ if and only if there exists a homomorphism $\psi : G/A \to H$ given by $\psi(A + g) = \phi(g)$ for all $g \in G$.

Proof. We say that the map $\psi$ in the lemma is well-defined if for all $g, h \in G$ with $A + g = A + h$ one has $\phi(g) = \phi(h)$. Then

$\psi$ is well-defined
$\Leftrightarrow$ for all $g, h \in G$ with $A + g = A + h$ one has $\phi(g) = \phi(h)$
$\Leftrightarrow$ for all $g, h \in G$ with $g - h \in A$ one has $g - h \in K$
$\Leftrightarrow$ $A \subset K$.

Once $\psi$ is well-defined, it is a homomorphism because
$$\psi(A + h) + \psi(A + g) = \phi(h) + \phi(g) = \phi(h + g) = \psi(A + h + g) = \psi((A + h) + (A + g)). \qquad \Box$$

We denote the set of all homomorphisms from $G$ to $H$ by $\hom(G, H)$. There is an elegant way to reformulate lemma 162. Composition with the quotient map $\pi : G \to G/A$ defines a bijection
$$\hom(G/A, H) \to \{ \alpha \in \hom(G, H) \mid \alpha(A) = \{0\} \}, \qquad \psi \mapsto \psi \circ \pi.$$

Theorem 163 (First Isomorphism Theorem). Let $\phi : G \to H$ be a homomorphism with kernel $K$. Then $G/K \cong \mathrm{im}(\phi)$. More precisely, there is an isomorphism $\psi : G/K \to \mathrm{im}(\phi)$ defined by $\psi(K + g) = \phi(g)$ for all $g \in G$.

Proof. The map $\psi$ is a well-defined homomorphism by lemma 162. Clearly, $\mathrm{im}(\psi) = \mathrm{im}(\phi)$. Finally,
$$\psi(K + g) = 0_H \Leftrightarrow \phi(g) = 0_H \Leftrightarrow g \in K \Leftrightarrow K + g = 0_{G/K}.$$
By proposition 160, $\psi$ is injective. Thus $\psi : G/K \to \mathrm{im}(\phi)$ is an isomorphism. $\Box$

One can associate two quotient groups to a homomorphism $\phi : G \to H$: the cokernel of $\phi$ is $\mathrm{Coker}(\phi) = H/\mathrm{im}(\phi)$ and the coimage of $\phi$ is $\mathrm{Coim}(\phi) = G/\ker(\phi)$. In short, the first isomorphism theorem states that the natural homomorphism from the coimage to the image is an isomorphism.

4.5 Free abelian groups

In linear algebra you learned about bases of finite-dimensional vector spaces. We shall now define bases of abelian groups. Another notion that may be new to you is that of infinite bases, which we introduce at the same time.


Let $X$ be an infinite set, $G$ a group, and $f : X \to G$ any map. The support of $f$ is $\{a \in X \mid f(a) \neq 0\}$. An (apparently infinite) sum $\sum_{a \in X} f(a)$ is said to be a finite sum if the support of $f$ is finite. A finite sum in this sense has a well-defined value. In these notes $\sum_{a \in X} f(a)$ is not allowed unless $f$ has finite support.

Definition 164. Let $X$ be a possibly infinite subset of an abelian group $G$.

(a) A linear combination of $X$ is a finite sum $\sum_{a \in X} c_a a$ with $c_a \in \mathbb{Z}$, all but finitely many zero.

(b) We call $X$ linearly independent if $\sum_{a \in X} c_a a = 0$ with $c_a \in \mathbb{Z}$, all but finitely many zero, implies $c_a = 0$ for all $a \in X$.

(c) We call $X$ an unordered basis of $G$ if it is independent and spans $G$.

(d) An abelian group is free if it admits a basis.

(e) An ordered basis or simply basis of $G$ is an $n$-tuple $(x_1, \ldots, x_n)$ of distinct elements such that $\{x_1, \ldots, x_n\}$ is a basis in the above sense.

It is possible to define infinite ordered bases but we won’t need them.

Exercise 165. Give an example where $\{x_1, \ldots, x_n\}$ is an unordered basis but $(x_1, \ldots, x_n)$ is not a basis.

Example 166. The standard vectors $e_1, \ldots, e_n \in \mathbb{Z}^n$ are defined as in $K^n$ for fields $K$. Then $(e_1, \ldots, e_n)$ is a basis of $\mathbb{Z}^n$. It is known as the standard basis. So $\mathbb{Z}^n$ is free.

Exercise 167. Contrary to vector spaces, an abelian group may not have a basis. Prove that $\mathbb{Z}_n$ has no basis for $n > 1$.

Proposition 168. Let $X$ be a set. Then there exists a group $G$ such that $X \subset G$ and such that $X$ is a basis of $G$. Moreover, if $H$ has the same properties then there exists an isomorphism $\phi : G \to H$ which is the identity on $X$.

Proof. Existence. The elements of $G$ are the 'formal finite sums' $\sum_{a \in X} c_a a$ where $c_a \in \mathbb{Z}$, all but finitely many zero. This contains $X$ in an obvious way. Addition in $G$ is defined to be pointwise:
$$\Big( \sum_{a \in X} c_a a \Big) + \Big( \sum_{a \in X} d_a a \Big) = \sum_{a \in X} (c_a + d_a)\, a.$$
Prove yourself that this makes $G$ into a group and that $X$ is a basis of $G$.

Uniqueness. For $a \in X$ let $g(a)$ be the element $a \in X$ viewed as an element of $G$, and $h(a)$ the same element viewed as an element of $H$. We define $\phi : G \to H$ by
$$\phi\Big( \sum_{a \in X} c_a\, g(a) \Big) = \sum_{a \in X} c_a\, h(a).$$
It is clear that this $\phi : G \to H$ is an isomorphism which is the identity on $X$. $\Box$
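The 'formal finite sums' in this proof can be modelled concretely as finitely supported maps $X \to \mathbb{Z}$; a sketch using Python dictionaries (the names are illustrative only):

```python
def add(u, v):
    """Pointwise sum of two formal finite sums, kept finitely supported
    by discarding zero coefficients."""
    w = dict(u)
    for a, c in v.items():
        w[a] = w.get(a, 0) + c
    return {a: c for a, c in w.items() if c != 0}

u = {'x': 2, 'y': -1}   # the formal sum 2x - y
v = {'y': 1, 'z': 5}    # the formal sum y + 5z
print(add(u, v))        # {'x': 2, 'z': 5}, i.e. 2x + 5z
```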

Definition 169. Let $X$ be a set. A free (abelian) group on $X$ is a pair $(F, f)$ where $F$ is a group and $f : X \to F$ an injective map (of sets) such that $f(X)$ is a basis of $F$.

Many free groups $(F, f)$ on $X$ are such that $f(a) = a$ for all $a \in X$ (we say that $f$ is the inclusion map), but we allow otherwise.

Proposition 170. Let $(F, f)$ be a free (abelian) group on $X$. Let $G$ be any group and $g : X \to G$ any map of sets.

• (Universal property). There exists a unique homomorphism $h : F \to G$ such that $h \circ f = g$.

• $h$ is injective if and only if $g$ is injective with independent image.

• $h$ is surjective if and only if $G$ is spanned by $g(X)$.


• $h$ is an isomorphism if and only if $(G, g)$ is a free group on $X$.

Proof. Exercise. $\Box$

If $G$ has a basis of $n$ elements then $G \cong \mathbb{Z}^n$. A finitely generated group is free if and only if it is isomorphic to $\mathbb{Z}^n$ for some $n$.

As for finite-dimensional vector spaces, it turns out that any two bases of a free abelian group have the same size, but this has to be proved. It will follow directly from the next theorem.

A square matrix $P \in \mathbb{Z}^{n,n}$ is said to be unimodular if $\det(P) \in \{-1, 1\}$.

Theorem 171. Let $P \in \mathbb{Z}^{m,n}$. Then the following are equivalent:

(a) The columns of $P$ form a basis of $\mathbb{Z}^m$.

(b) $m = n$ and $P^{-1} \in \mathbb{Z}^{n,n}$.

(c) $m = n$ and $P$ is unimodular.

Proof. Let $f_i$ denote the $i$th column of $P$ and $F = (f_1, \ldots, f_n)$. Let $E = (e_1, \ldots, e_m)$ be the standard basis of $\mathbb{Z}^m$.

(a) $\Rightarrow$ (b). Since $F$ is a $\mathbb{Z}$-basis of $\mathbb{Z}^m$, it is a $\mathbb{Q}$-basis of $\mathbb{Q}^m$. So $m = n$. We have $P = [E1F]$ and $P^{-1} = [F1E]$. But $F$ is a $\mathbb{Z}$-basis, so $[F1E] \in \mathbb{Z}^{n,n}$.

(b) $\Rightarrow$ (a). By what we learned in linear algebra, $F$ is a $\mathbb{Q}$-basis of $\mathbb{Q}^n$ and $P = [E1F]$. So $F$ is $\mathbb{Z}$-independent. But $P^{-1} \in \mathbb{Z}^{n,n}$, so $[F1E] \in \mathbb{Z}^{n,n}$, so every standard basis vector is a linear combination of the $f_j$, so $F$ spans $\mathbb{Z}^n$.

(b) $\Rightarrow$ (c). If $T = P^{-1}$ has entries in $\mathbb{Z}$, then $\det(P)\det(T) = \det(PT) = \det(I_n) = 1$, and since $\det(P), \det(T) \in \mathbb{Z}$, this implies $\det(P) \in \{-1, 1\}$.

(c) $\Rightarrow$ (b). From first year linear algebra, $P^{-1} = (\det(P))^{-1} \mathrm{adj}(P)$, so $\det(P) \in \{-1, 1\}$ implies that $P^{-1}$ has entries in $\mathbb{Z}$. $\Box$

Examples. 1. If $n = 2$ and $P = \begin{pmatrix} 2 & 1 \\ 7 & 4 \end{pmatrix}$ then $\det(P) = 8 - 7 = 1$, so the columns of $P$ form a basis of $\mathbb{Z}^2$.

2. If $P = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$ then $\det(P) = 2$, so the columns of $P$ do not form a basis of $\mathbb{Z}^2$. Recall that in linear algebra over a field, any set of $n$ linearly independent vectors in a vector space $V$ of dimension $n$ forms a basis of $V$. This fails in $\mathbb{Z}^n$: the columns of $P$ are linearly independent but do not span $\mathbb{Z}^2$.
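For $2 \times 2$ matrices the criterion of theorem 171 is easy to test directly, using the adjugate formula from (c) $\Rightarrow$ (b); a sketch:

```python
def det2(P):
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

def adj2(P):
    # adjugate of a 2x2 matrix; P^{-1} = adj(P) / det(P)
    return [[P[1][1], -P[0][1]], [-P[1][0], P[0][0]]]

P = [[2, 1], [7, 4]]
print(det2(P), adj2(P))  # 1 [[4, -1], [-7, 2]]: integral inverse, columns are a basis
Q = [[1, 0], [0, 2]]
print(det2(Q), adj2(Q))  # 2 [[2, 0], [0, 1]]: inverse adj(Q)/2 is not integral
```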

4.6 Unimodular elementary row and column operations and the Smith normal form for integral matrices

We interrupt our discussion of finitely generated abelian groups at this stage to investigate how the row and column reduction process of Linear Algebra can be adapted to matrices over $\mathbb{Z}$. Recall from MA106 that we can use elementary row and column operations to reduce an $m \times n$ matrix of rank $r$ over a field $K$ to a matrix $B = (\beta_{ij})$ with $\beta_{ii} = 1$ for $1 \leq i \leq r$ and $\beta_{ij} = 0$ otherwise. We called this the Smith Normal Form of the matrix. We can do something similar over $\mathbb{Z}$, but the non-zero elements $\beta_{ii}$ will not necessarily all be equal to 1.

The reason that we disallowed $\lambda = 0$ for the row and column operations (R3) and (C3) (multiply a row or column by a scalar $\lambda$) was that we wanted all of our elementary operations to be reversible. When performed over $\mathbb{Z}$, (R1), (C1), (R2) and (C2) are reversible, but (R3) and (C3) are reversible only when $\lambda = \pm 1$. So, if $A$ is an $m \times n$ matrix over $\mathbb{Z}$, then we define the three types of unimodular elementary row operations as follows:

(UR1): Replace some row $r_i$ of $A$ by $r_i + t r_j$, where $j \neq i$ and $t \in \mathbb{Z}$;
(UR2): Interchange two rows of $A$;
(UR3): Replace some row $r_i$ of $A$ by $-r_i$.


The unimodular column operations (UC1), (UC2), (UC3) are defined similarly. Recall from MA106 that performing elementary row or column operations on a matrix $A$ corresponds to multiplying $A$ on the left or right, respectively, by an elementary matrix. These elementary matrices all have determinant $\pm 1$ (1 for (UR1) and $-1$ for (UR2) and (UR3)), so they are unimodular matrices over $\mathbb{Z}$.

Definition 172. A matrix $A = (\alpha_{ij}) \in \mathbb{Z}^{m,n}$ is said to be in Smith normal form if $\alpha_{ij} = 0$ whenever $i \neq j$, and $\alpha_{i,i} \mid \alpha_{i+1,i+1}$ (divides) whenever $1 \leq i < \min(m, n)$. (Note that any integer divides 0.)

Theorem 173 (Smith Normal Form). Any matrix $A \in \mathbb{Z}^{m,n}$ can be put into Smith normal form $B$ through a sequence of unimodular elementary row and column operations. Moreover, $B$ is unique.

Proof. We shall not prove the uniqueness part here. We use induction on $m + n$. The base case is $m = n = 1$, where there is nothing to prove. Also, if $A$ is the zero matrix then there is nothing to prove, so assume not.

Let $d$ be the smallest positive entry in any matrix $C = (\gamma_{ij})$ that we can obtain from $A$ by using unimodular elementary row and column operations. By using (UR2) and (UC2), we can move $d$ to position $(1,1)$ and hence assume that $\gamma_{11} = d$. If $d$ does not divide $\gamma_{1j}$ for some $j > 1$, then we can write $\gamma_{1j} = qd + r$ with $q, r \in \mathbb{Z}$ and $0 < r < d$, and then replacing the $j$-th column $c_j$ of $C$ by $c_j - q c_1$ results in the entry $r$ in position $(1, j)$, contrary to the choice of $d$. Hence $d \mid \gamma_{1j}$ for $2 \leq j \leq n$ and similarly $d \mid \gamma_{i1}$ for $2 \leq i \leq m$.

Now, if $\gamma_{1j} = qd$, then replacing $c_j$ of $C$ by $c_j - q c_1$ results in the entry 0 in position $(1, j)$. So we can assume that $\gamma_{1j} = 0$ for $2 \leq j \leq n$ and $\gamma_{i1} = 0$ for $2 \leq i \leq m$. If $m = 1$ or $n = 1$, then we are done. Otherwise, we have $C = (d) \oplus C'$ for some $(m-1) \times (n-1)$ matrix $C'$. By the inductive hypothesis, the result of the theorem applies to $C'$, so by applying unimodular row and column operations to $C$ which do not involve the first row or column, we can reduce $C$ to $D = (\delta_{ij})$, which satisfies $\delta_{11} = d$, $\delta_{ii} = d_i > 0$ for $2 \leq i \leq r$, and $\delta_{ij} = 0$ otherwise, where $d_i \mid d_{i+1}$ for $2 \leq i < r$. To complete the proof, we still have to show that $d \mid d_2$. If not, then adding row 2 to row 1 results in $d_2$ in position $(1,2)$ not divisible by $d$, and we obtain a contradiction as before. $\Box$

In the following two examples we determine the Smith normal form of a matrix $A$.

Example 174. Let $A = \begin{pmatrix} 42 & 21 \\ -35 & -14 \end{pmatrix}$. The general strategy is to reduce the size of entries in the first row and column, until the $(1,1)$-entry divides all other entries in the first row and column. Then we can clear all of these other entries.

$$\begin{pmatrix} 42 & 21 \\ -35 & -14 \end{pmatrix}
\xrightarrow{c_1 \to c_1 - 2c_2}
\begin{pmatrix} 0 & 21 \\ -7 & -14 \end{pmatrix}
\xrightarrow{r_2 \to -r_2,\; r_1 \leftrightarrow r_2}
\begin{pmatrix} 7 & 14 \\ 0 & 21 \end{pmatrix}
\xrightarrow{c_2 \to c_2 - 2c_1}
\begin{pmatrix} 7 & 0 \\ 0 & 21 \end{pmatrix}$$

Example 175. Let $A$ be the matrix
$$A = \begin{pmatrix} -18 & -18 & -18 & 90 \\ 54 & 12 & 45 & 48 \\ 9 & -6 & 6 & 63 \\ 18 & 6 & 15 & 12 \end{pmatrix}.$$

$$A \xrightarrow{c_1 \to c_1 - c_3}
\begin{pmatrix} 0 & -18 & -18 & 90 \\ 9 & 12 & 45 & 48 \\ 3 & -6 & 6 & 63 \\ 3 & 6 & 15 & 12 \end{pmatrix}
\xrightarrow{r_1 \leftrightarrow r_4}
\begin{pmatrix} 3 & 6 & 15 & 12 \\ 9 & 12 & 45 & 48 \\ 3 & -6 & 6 & 63 \\ 0 & -18 & -18 & 90 \end{pmatrix}
\xrightarrow{r_2 \to r_2 - 3r_1,\; r_3 \to r_3 - r_1}
\begin{pmatrix} 3 & 6 & 15 & 12 \\ 0 & -6 & 0 & 12 \\ 0 & -12 & -9 & 51 \\ 0 & -18 & -18 & 90 \end{pmatrix}$$

$$\xrightarrow{c_2 \to c_2 - 2c_1,\; c_3 \to c_3 - 5c_1,\; c_4 \to c_4 - 4c_1}
\begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & -6 & 0 & 12 \\ 0 & -12 & -9 & 51 \\ 0 & -18 & -18 & 90 \end{pmatrix}
\xrightarrow{c_2 \to -c_2,\; c_2 \to c_2 + c_3}
\begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 6 & 0 & 12 \\ 0 & 3 & -9 & 51 \\ 0 & 0 & -18 & 90 \end{pmatrix}
\xrightarrow{r_2 \leftrightarrow r_3}
\begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 3 & -9 & 51 \\ 0 & 6 & 0 & 12 \\ 0 & 0 & -18 & 90 \end{pmatrix}$$

$$\xrightarrow{r_3 \to r_3 - 2r_2}
\begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 3 & -9 & 51 \\ 0 & 0 & 18 & -90 \\ 0 & 0 & -18 & 90 \end{pmatrix}
\xrightarrow{c_3 \to c_3 + 3c_2,\; c_4 \to c_4 - 17c_2}
\begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 18 & -90 \\ 0 & 0 & -18 & 90 \end{pmatrix}
\xrightarrow{c_4 \to c_4 + 5c_3,\; r_4 \to r_4 + r_3}
\begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 18 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
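The reduction strategy used in the proof of theorem 173 and in these examples can be turned into a rough algorithm; the following Python sketch (function name illustrative, transformation matrices not tracked) repeatedly picks a smallest pivot, clears its row and column, and enforces the divisibility condition:

```python
def smith_normal_form(A):
    """Reduce an integer matrix to Smith normal form by unimodular
    row/column operations (a sketch following theorem 173)."""
    A = [row[:] for row in A]
    m, n = len(A), len(A[0])
    t = 0
    while t < min(m, n):
        # find a nonzero entry of least absolute value in the trailing block
        entries = [(abs(A[i][j]), i, j) for i in range(t, m)
                   for j in range(t, n) if A[i][j]]
        if not entries:
            break                                    # trailing block is zero
        _, i, j = min(entries)
        A[t], A[i] = A[i], A[t]                      # move pivot to (t, t): (UR2)
        for r in range(m):
            A[r][t], A[r][j] = A[r][j], A[r][t]      # (UC2)
        if A[t][t] < 0:
            A[t] = [-x for x in A[t]]                # make pivot positive: (UR3)
        d = A[t][t]
        dirty = False
        for i in range(t + 1, m):                    # clear column t with (UR1)
            q = A[i][t] // d
            A[i] = [a - q * b for a, b in zip(A[i], A[t])]
            dirty = dirty or A[i][t] != 0
        for j in range(t + 1, n):                    # clear row t with (UC1)
            q = A[t][j] // d
            for r in range(m):
                A[r][j] -= q * A[r][t]
            dirty = dirty or A[t][j] != 0
        if dirty:
            continue        # a remainder < d appeared: repeat with a smaller pivot
        # enforce d | (every trailing entry), as at the end of the proof
        bad = [i for i in range(t + 1, m)
               if any(A[i][j] % d for j in range(t + 1, n))]
        if bad:
            A[t] = [a + b for a, b in zip(A[t], A[bad[0]])]   # add row to row t
            continue
        t += 1
    return A

print(smith_normal_form([[42, 21], [-35, -14]]))
# [[7, 0], [0, 21]], matching example 174
```

By the uniqueness part of theorem 173, running this on the matrix of example 175 likewise yields the diagonal $(3, 3, 18, 0)$.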

Note: There is also a generalisation to integer matrices of the row reduced normal form from Linear Algebra, where only row operations are allowed. This is known as the Hermite Normal Form and is more complicated. It will appear on an exercise sheet.

4.7 Subgroups of free abelian groups

Proposition 176. Let $G$ be an abelian group generated by $n$ elements. Then any subgroup of $G$ is also generated by at most $n$ elements.

Proof. Induction on $n$. Let $G$ be generated by $x_1, \ldots, x_n$ and let $K$ be a subgroup of $G$. For $n = 0$ there is nothing to prove.

Suppose $n > 0$, and let $H$ be the subgroup of $G$ generated by $x_1, \ldots, x_{n-1}$. By induction, $K \cap H$ is generated by $y_1, \ldots, y_{n-1}$, say.

If $K \subseteq H$, then $K = K \cap H$ and we are done, so suppose not. Then there exist elements of the form $h + t x_n \in K$ with $h \in H$ and $t \neq 0$. Since $-(h + t x_n) \in K$, we can assume that $t > 0$. Choose such an element $y_n = h + t x_n \in K$ with $t$ minimal subject to $t > 0$. We claim that $K$ is generated by $y_1, \ldots, y_n$, which will complete the proof.

Let $k \in K$. Then $k = h' + u x_n$ for some $h' \in H$ and $u \in \mathbb{Z}$. If $t$ does not divide $u$ then we can write $u = tq + r$ with $q, r \in \mathbb{Z}$ and $0 < r < t$, and then $k - q y_n = (h' - qh) + r x_n \in K$, contrary to the choice of $t$. So $t \mid u$ and hence $u = tq$ and $k - q y_n \in K \cap H$. But $K \cap H$ is generated by $y_1, \ldots, y_{n-1}$, so we are done. $\Box$

Definition 177. Let $T : G \to H$ be a homomorphism. Let $E$ be a basis of $G$ and $F$ of $H$. Then $[FTE]$ is not only said to represent $T$ but also to represent $\mathrm{im}(T)$.

In particular, a matrix in $\mathbb{Z}^{m,n}$ represents the subgroup of $\mathbb{Z}^m$ generated by its columns.

Example 178. If $n = 3$ and $H$ is generated by $v_1 = (1\;3\;{-1})$ and $v_2 = (2\;0\;1)$, then
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 0 \\ -1 & 1 \end{pmatrix}.$$

Proposition 179. Let $H \subset \mathbb{Z}^m$ be represented by $A \in \mathbb{Z}^{m,n}$ and let $P \in \mathbb{Z}^{n,n}$ and $Q \in \mathbb{Z}^{m,m}$ be unimodular. Then $H$ is also represented by $QAP$.

Proof. Let $T : G \to H := \mathbb{Z}^m$ be a homomorphism and $E, F$ bases of (respectively) $G$, $H$ such that $A = [FTE]$. By theorem 171 there are bases $E'$ of $G$ and $F'$ of $H$ such that $P = [E1E']$ and $Q = [F'1F]$. So $QAP = [F'TE']$. $\Box$

If $H$ is a subgroup of $\mathbb{Z}^m$ represented by $A$, and $B$ is obtained from $A$ by removing a zero column, then $B$ also represents $H$.


Theorem 180. Let $H$ be a subgroup of $\mathbb{Z}^m$. Then there exist a basis $y_1, \ldots, y_m$ of $\mathbb{Z}^m$ and integers $d_1, \ldots, d_m \geq 0$ such that $H = \langle d_1 y_1, \ldots, d_m y_m \rangle$ and $d_i \mid d_{i+1}$ for all $i$.

Proof. By proposition 176 there are $m$ generators $x_1, \ldots, x_m$ of $H$. By the universal property of free abelian groups (proposition 170) there exists a homomorphism $T : \mathbb{Z}^m \to \mathbb{Z}^m$ such that $T(e_i) = x_i$ for all $i$, where $E = (e_1, \ldots, e_m)$ denotes the standard basis of $\mathbb{Z}^m$. So $\mathrm{im}(T) = H$. Put $A = [ETE]$. By theorem 173 there are unimodular $P, Q$ such that $B := QAP$ is in Smith normal form. There are bases $E' = (z_1, \ldots, z_m)$ and $F' = (y_1, \ldots, y_m)$ of $\mathbb{Z}^m$ such that $P = [E1E']$ and $Q = [F'1E]$. So $B = QAP = [F'TE']$. Put $d_i = \beta_{i,i}$, the diagonal entries of $B$; then $d_i \mid d_{i+1}$ for all $i$. Then $T(z_i) = d_i y_i$ for all $i$, so $\mathrm{im}(T) = \langle d_1 y_1, \ldots, d_m y_m \rangle$. $\Box$

Example 181. In Example 178, it is straightforward to calculate the Smith normal form of $A$, which is $\begin{pmatrix} 1 & 0 \\ 0 & 3 \\ 0 & 0 \end{pmatrix}$, so $H = \langle y_1, 3y_2 \rangle$.

By keeping track of the unimodular row operations carried out, we can, if we need to, find the basis $y_1, \ldots, y_n$ of $\mathbb{Z}^n$. Doing this in Example 178, we get:

$$\begin{pmatrix} 1 & 2 \\ 3 & 0 \\ -1 & 1 \end{pmatrix}
\xrightarrow{r_2 \to r_2 - 3r_1,\; r_3 \to r_3 + r_1}
\begin{pmatrix} 1 & 2 \\ 0 & -6 \\ 0 & 3 \end{pmatrix},
\qquad y_1 = (1\;3\;{-1}),\; y_2 = (0\;1\;0),\; y_3 = (0\;0\;1);$$

$$\xrightarrow{c_2 \to c_2 - 2c_1}
\begin{pmatrix} 1 & 0 \\ 0 & -6 \\ 0 & 3 \end{pmatrix},
\qquad y_1 = (1\;3\;{-1}),\; y_2 = (0\;1\;0),\; y_3 = (0\;0\;1);$$

$$\xrightarrow{r_2 \leftrightarrow r_3}
\begin{pmatrix} 1 & 0 \\ 0 & 3 \\ 0 & -6 \end{pmatrix},
\qquad y_1 = (1\;3\;{-1}),\; y_2 = (0\;0\;1),\; y_3 = (0\;1\;0);$$

$$\xrightarrow{r_3 \to r_3 + 2r_2}
\begin{pmatrix} 1 & 0 \\ 0 & 3 \\ 0 & 0 \end{pmatrix},
\qquad y_1 = (1\;3\;{-1}),\; y_2 = (0\;{-2}\;1),\; y_3 = (0\;1\;0).$$

4.8 General finitely generated abelian groups

Definition 182. Let $X$ be a basis of an (abelian) group $F$. Let $K \subset F$ be the subgroup generated by a subset $Y \subset F$. We write
$$\langle X \mid Y \rangle := F/K.$$
This is called a presentation of $F/K$ (or of any isomorphic group). If $X$ and $Y$ are finite then it is called a finite presentation. In this setting $X$ is the set of generators and $Y$ the set of relations.

Example 183. $\mathbb{Z}_n \cong \langle x \mid nx \rangle$.

Proposition 184. Any finitely generated abelian group admits a finite presentation of the form
$$\langle\, y_1, \ldots, y_n \mid d_1 y_1, \ldots, d_n y_n \,\rangle$$
where $d_i \in \mathbb{Z}_{\geq 0}$ and $d_i \mid d_{i+1}$ for all $i$.

Proof. Let $G$ be an abelian group generated by a finite set $\{x_1, \ldots, x_n\}$. Let $E = (e_1, \ldots, e_n)$ be the standard basis of $\mathbb{Z}^n$. Define $h : \mathbb{Z}^n \to G$ by $h(e_i) = x_i$ for all $i$ and put $K := \ker(h)$. By theorem 180 there exist a basis $(y_1, \ldots, y_n)$ of $\mathbb{Z}^n$ and integers $d_1, \ldots, d_n \geq 0$ such that $K = \langle d_1 y_1, \ldots, d_n y_n \rangle$. By the first isomorphism theorem (theorem 163)
$$G = \mathrm{im}(h) \cong \mathbb{Z}^n / K = \langle\, y_1, \ldots, y_n \mid d_1 y_1, \ldots, d_n y_n \,\rangle. \qquad \Box$$

We shall use the notation $\mathbb{Z}_0 := \mathbb{Z}/0\mathbb{Z} \cong \mathbb{Z} \cong \langle x \mid 0x \rangle$.

Proposition 185. Let $d_i \in \mathbb{Z}_{\geq 0}$ for $1 \leq i \leq n$. Then
$$\langle\, y_1, \ldots, y_n \mid d_1 y_1, \ldots, d_n y_n \,\rangle \cong \mathbb{Z}_{d_1} \oplus \cdots \oplus \mathbb{Z}_{d_n}.$$

Proof. This is another application of the first isomorphism theorem. Let $H = \mathbb{Z}_{d_1} \oplus \cdots \oplus \mathbb{Z}_{d_n}$, so $H$ is generated by $x_1, \ldots, x_n$ with $x_1 = (1, 0, \ldots, 0)$, ..., $x_n = (0, \ldots, 0, 1)$. Let $e_1, \ldots, e_n$ be the standard basis of $\mathbb{Z}^n$. By the universal property (proposition 170) there is a surjective homomorphism $\phi : \mathbb{Z}^n \to H$ for which
$$\phi(\alpha_1 e_1 + \cdots + \alpha_n e_n) = \alpha_1 x_1 + \cdots + \alpha_n x_n$$
for all $\alpha_1, \ldots, \alpha_n \in \mathbb{Z}$. By theorem 163, we have $H \cong \mathbb{Z}^n / K$ where $K$ is the kernel of $\phi$. We have
$$K = \{ (\alpha_1, \ldots, \alpha_n) \in \mathbb{Z}^n \mid \alpha_1 x_1 + \cdots + \alpha_n x_n = 0_H \} = \{ (\alpha_1, \ldots, \alpha_n) \in \mathbb{Z}^n \mid d_i \text{ divides } \alpha_i \text{ for all } i \} = \langle d_1 e_1, \ldots, d_n e_n \rangle.$$
Thus $H \cong \mathbb{Z}^n / K = \langle\, e_1, \ldots, e_n \mid d_1 e_1, \ldots, d_n e_n \,\rangle$. $\Box$

Note $\mathbb{Z}_1 = 0$ and recall $\mathbb{Z}_0 \cong \mathbb{Z}$. Putting all results together, we get the main theorem of this chapter.

Theorem 186 (Fundamental theorem of finitely generated abelian groups). Let $G$ be a finitely generated abelian group. Then there exist integers $r, k$ and $d_1, \ldots, d_r \geq 2$ with $d_i \mid d_{i+1}$ for all $i$ such that
$$G \cong \big( \mathbb{Z}_{d_1} \oplus \cdots \oplus \mathbb{Z}_{d_r} \big) \oplus \mathbb{Z}^k.$$

(For $k = r = 0$ this just means that $G = 0$.) $\Box$

It can be proved that $r$, $k$ and the $d_i$ are uniquely determined by $G$. The integer $k$ may be 0, which is the case if and only if $G$ is finite. At the other extreme, if $r = 0$ then $G$ is free abelian.

Examples 187. 1. The group $G$ corresponding to example 174 is
$$\langle\, x_1, x_2 \mid 42x_1 - 35x_2,\; 21x_1 - 14x_2 \,\rangle$$
and we have $G \cong \mathbb{Z}_7 \oplus \mathbb{Z}_{21}$, a group of order $7 \times 21 = 147$.

2. The group defined by example 175 is
$$\Big\langle\, x_1, x_2, x_3, x_4 \;\Big|\; \begin{array}{l} -18x_1 + 54x_2 + 9x_3 + 18x_4, \quad -18x_1 + 12x_2 - 6x_3 + 6x_4, \\ -18x_1 + 45x_2 + 6x_3 + 15x_4, \quad 90x_1 + 48x_2 + 63x_3 + 12x_4 \end{array} \Big\rangle,$$
which is isomorphic to $\mathbb{Z}_3 \oplus \mathbb{Z}_3 \oplus \mathbb{Z}_{18} \oplus \mathbb{Z}$, and is an infinite group with a (maximal) finite subgroup of order $3 \times 3 \times 18 = 162$.

3. The group defined by example 178 is
$$\langle\, x_1, x_2, x_3 \mid x_1 + 3x_2 - x_3,\; 2x_1 + x_3 \,\rangle,$$
and is isomorphic to $\mathbb{Z}_1 \oplus \mathbb{Z}_3 \oplus \mathbb{Z} \cong \mathbb{Z}_3 \oplus \mathbb{Z}$, so it is infinite, with a finite subgroup of order 3.


4.9 Finite abelian groups

In particular, any finite abelian group $G$ is of the form $G \cong \mathbb{Z}_{d_1} \oplus \cdots \oplus \mathbb{Z}_{d_r}$, where $d_i > 1$ and $d_i \mid d_{i+1}$ for all $i$, and $|G| = d_1 \cdots d_r$. (For $r = 0$ this just means $G = 0$.)

It can be shown that $r$ and the $d_i$ are uniquely determined by $G$. This enables us to classify isomorphism classes of finite abelian groups of a given order $n$.

Examples 188. 1. $n = 4$. The decompositions are 4 and $2 \times 2$, so $G \cong \mathbb{Z}_4$ or $\mathbb{Z}_2 \oplus \mathbb{Z}_2$.

2. $n = 15$. The only decomposition is 15, so $G \cong \mathbb{Z}_{15}$ is necessarily cyclic.

3. $n = 36$. Decompositions are 36, $2 \times 18$, $3 \times 12$ and $6 \times 6$, so $G \cong \mathbb{Z}_{36}$, $\mathbb{Z}_2 \oplus \mathbb{Z}_{18}$, $\mathbb{Z}_3 \oplus \mathbb{Z}_{12}$ or $\mathbb{Z}_6 \oplus \mathbb{Z}_6$.
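The bookkeeping behind such classifications is a search for chains $d_1 \mid d_2 \mid \cdots$ with product $n$; a brute-force sketch (the helper name is illustrative):

```python
def invariant_factors(n):
    """All chains d_1 | d_2 | ... | d_r with each d_i >= 2 and product n."""
    chains = []
    def extend(chain, remaining):
        if remaining == 1:
            chains.append(chain)
            return
        start = chain[-1] if chain else 2
        for d in range(start, remaining + 1):
            if remaining % d == 0 and (not chain or d % chain[-1] == 0):
                extend(chain + [d], remaining // d)
    extend([], n)
    return chains

print(invariant_factors(36))   # [[2, 18], [3, 12], [6, 6], [36]]
```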

4.10 Tensor products

Given two abelian groups $A$ and $B$, one can form a new abelian group $A \otimes B$, their tensor product (do not confuse it with the direct product!). We denote by $F$ the free abelian group with the elements of the direct product $A \times B$ as a basis:
$$F = \langle\, A \times B \mid \varnothing \,\rangle.$$

Elements of $F$ are formal finite $\mathbb{Z}$-linear combinations $\sum_i n_i (a_i, b_i)$, $n_i \in \mathbb{Z}$, $a_i \in A$, $b_i \in B$. Let $F_0$ be the subgroup of $F$ generated by the elements
$$(a + a', b) - (a, b) - (a', b), \qquad n(a, b) - (na, b),$$
$$(a, b + b') - (a, b) - (a, b'), \qquad n(a, b) - (a, nb) \tag{189}$$
for all possible $n \in \mathbb{Z}$, $a, a' \in A$, $b, b' \in B$. The tensor product is the quotient group
$$A \otimes B = F/F_0 = \langle\, A \times B \mid \text{relations in (189)} \,\rangle.$$

We have to get used to this definition, which may seem strange at first glance. First, it is easy to materialise certain elements of $A \otimes B$. An elementary tensor is
$$a \otimes b := (a, b) + F_0$$
for $a \in A$, $b \in B$. The generators (189) of $F_0$ give rise to properties of elementary tensors:
$$(a + a') \otimes b = a \otimes b + a' \otimes b, \qquad n(a \otimes b) = (na) \otimes b,$$
$$a \otimes (b + b') = a \otimes b + a \otimes b', \qquad n(a \otimes b) = a \otimes (nb). \tag{190}$$

Exercise 191. Show that $a \otimes 0 = 0 \otimes b = 0$ for all $a \in A$, $b \in B$.

It is important to realise that not all elements of $A \otimes B$ are elementary. They are $\mathbb{Z}$-linear combinations of elementary tensors.

Proposition 192. Suppose that $A, B$ are abelian groups. Let $\{a_i \mid i \in I\}$ span $A$ and $\{b_j \mid j \in J\}$ span $B$. Then the tensor product $A \otimes B$ is spanned by $\{a_i \otimes b_j \mid (i, j) \in I \times J\}$.

Proof. Given $\sum_k c_k \otimes d_k \in A \otimes B$, we can express $c_k = \sum_i n_{ki} a_i$ and $d_k = \sum_j m_{kj} b_j$. Then
$$\sum_k c_k \otimes d_k = \sum_k \Big( \sum_i n_{ki} a_i \Big) \otimes \Big( \sum_j m_{kj} b_j \Big) = \sum_{k,i,j} n_{ki}\, m_{kj}\; a_i \otimes b_j. \qquad \Box$$

In fact, an even more subtle statement holds.

Exercise 193. Suppose that $A, B$ are free abelian groups. Let $\{a_i \mid i \in I\}$ be a basis of $A$ and $\{b_j \mid j \in J\}$ of $B$. Show that the tensor product $A \otimes B$ is a free abelian group with basis $\{a_i \otimes b_j \mid (i, j) \in I \times J\}$.


However, for general groups tensor products can behave in quite an unpredictable way. For instance, $\mathbb{Z}_2 \otimes \mathbb{Z}_3 = 0$. Indeed,
$$1_{\mathbb{Z}_2} \otimes 1_{\mathbb{Z}_3} = 3 \cdot 1_{\mathbb{Z}_2} \otimes 1_{\mathbb{Z}_3} = 1_{\mathbb{Z}_2} \otimes 3 \cdot 1_{\mathbb{Z}_3} = 0.$$

To help sort out zero from nonzero elements in tensor products we need to understand a connection between tensor products and bilinear maps.

Definition 194. Let $A$, $B$, and $C$ be abelian groups. A function $\omega : A \times B \to C$ is a bilinear map if
$$\omega(a + a', b) = \omega(a, b) + \omega(a', b), \qquad n\,\omega(a, b) = \omega(na, b),$$
$$\omega(a, b + b') = \omega(a, b) + \omega(a, b'), \qquad n\,\omega(a, b) = \omega(a, nb) \tag{195}$$
for all $n \in \mathbb{Z}$, $a, a' \in A$, $b, b' \in B$.

Let $\mathrm{Bil}(A \times B, C)$ be the set of all bilinear maps from $A \times B$ to $C$.

Lemma 196 (Universal property of the tensor product). The function
$$\theta : A \times B \to A \otimes B, \qquad \theta(a, b) = a \otimes b$$
is a bilinear map. This bilinear map is universal, that is, for all $C$ composition with $\theta$ defines a bijection
$$\hom(A \otimes B, C) \to \mathrm{Bil}(A \times B, C), \qquad \phi \mapsto \phi \circ \theta.$$

Proof. The function $\theta$ is a bilinear map: the properties (195) of a bilinear map easily follow from the corresponding properties (190) of elementary tensors.

Let $\mathrm{Fun}(\cdot, \cdot)$ denote the set of functions between two sets. Recall that $F$ denotes the free abelian group with basis indexed by $A \times B$. By the universal property of free abelian groups (proposition 170), we have a bijection
$$\hom(F, C) \to \mathrm{Fun}(A \times B, C).$$
Bilinear maps correspond to functions vanishing on $F_0$, or equivalently to linear maps from $F/F_0$ (lemma 162). $\Box$

In the following section we will need a criterion for elements of $\mathbb{R} \otimes S^1$ to be nonzero. The circle group $S^1$ is a group under multiplication, creating certain confusion for tensor products. To avoid this confusion we identify the multiplicative group $S^1$ with the additive group $\mathbb{R}/2\pi\mathbb{Z}$ via the natural isomorphism $e^{xi} \mapsto x + 2\pi\mathbb{Z}$.

Proposition 197. Let $a \otimes (x + 2\pi\mathbb{Z}) \in \mathbb{R} \otimes \mathbb{R}/2\pi\mathbb{Z}$ where $a, x \in \mathbb{R}$. Then $a \otimes (x + 2\pi\mathbb{Z}) = 0$ if and only if $a = 0$ or $x \in \pi\mathbb{Q}$.

Proof. If $a = 0$, then $a \otimes (x + 2\pi\mathbb{Z}) = 0$. If $x = \frac{n}{m}\pi$ with $m, n \in \mathbb{Z}$, then
$$a \otimes (x + 2\pi\mathbb{Z}) = 2m\, \frac{a}{2m} \otimes (x + 2\pi\mathbb{Z}) = \frac{a}{2m} \otimes 2m(x + 2\pi\mathbb{Z}) = \frac{a}{2m} \otimes (2n\pi + 2\pi\mathbb{Z}) = \frac{a}{2m} \otimes 0 = 0.$$

In the opposite direction, let us consider $a \otimes (x + 2\pi\mathbb{Z})$ with $a \neq 0$ and $x/\pi \notin \mathbb{Q}$. It suffices to construct a bilinear map $\beta : \mathbb{R} \times \mathbb{R}/2\pi\mathbb{Z} \to A$ to some group $A$ such that $\beta(a, x + 2\pi\mathbb{Z}) \neq 0$. By lemma 196 this gives a homomorphism $\widetilde\beta : \mathbb{R} \otimes \mathbb{R}/2\pi\mathbb{Z} \to A$ with $\widetilde\beta(a \otimes (x + 2\pi\mathbb{Z})) = \beta(a, x + 2\pi\mathbb{Z}) \neq 0$. Hence $a \otimes (x + 2\pi\mathbb{Z}) \neq 0$.

Let us consider $\mathbb{R}$ as a vector space over $\mathbb{Q}$. The subgroup $\pi\mathbb{Q}$ of $\mathbb{R}$ is a vector subspace, hence the quotient group $A = \mathbb{R}/\pi\mathbb{Q}$ is also a vector space over $\mathbb{Q}$. Since $2\pi\mathbb{Z} \subset \pi\mathbb{Q}$, we have a homomorphism
$$\gamma : \mathbb{R}/2\pi\mathbb{Z} \to \mathbb{R}/\pi\mathbb{Q}, \qquad \gamma(z + 2\pi\mathbb{Z}) = z + \pi\mathbb{Q}.$$
Since $x/\pi \notin \mathbb{Q}$, it follows that $\gamma(x + 2\pi\mathbb{Z}) \neq 0$.

Choose a basis $e_i$ of $\mathbb{R}$ over $\mathbb{Q}$ such that $e_1 = a$ (every independent set of vectors in an infinite-dimensional vector space can be extended to a basis). Let $e^i : \mathbb{R} \to \mathbb{Q}$ be the linear function (commonly known as a covector) computing the $i$-th coordinate in this basis:
$$e^i\Big( \sum_j x_j e_j \Big) = x_i.$$
The required bilinear map is defined by $\beta(b, z + 2\pi\mathbb{Z}) = e^1(b)\, \gamma(z + 2\pi\mathbb{Z})$. Clearly, $\beta(a, x + 2\pi\mathbb{Z}) = e^1(e_1)\, \gamma(x + 2\pi\mathbb{Z}) = 1 \cdot \gamma(x + 2\pi\mathbb{Z}) = x + \pi\mathbb{Q} \neq 0$. $\Box$

Exercise 198. $\mathbb{Z}_n \otimes \mathbb{Z}_m \cong \mathbb{Z}_{\gcd(n,m)}$.
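A sketch of one half of this exercise, generalising the $\mathbb{Z}_2 \otimes \mathbb{Z}_3$ computation above (using Bézout's identity): by proposition 192 the group $\mathbb{Z}_n \otimes \mathbb{Z}_m$ is spanned by $1 \otimes 1$, so it is cyclic, and writing $d = \gcd(n, m) = un + vm$ with $u, v \in \mathbb{Z}$ gives
$$d(1 \otimes 1) = (un + vm)(1 \otimes 1) = u\,(n \cdot 1) \otimes 1 + v\, 1 \otimes (m \cdot 1) = u\,(0 \otimes 1) + v\,(1 \otimes 0) = 0,$$
so $\mathbb{Z}_n \otimes \mathbb{Z}_m$ is cyclic of order dividing $d$. For the other half, one can construct the bilinear map $\mathbb{Z}_n \times \mathbb{Z}_m \to \mathbb{Z}_d$, $(a, b) \mapsto ab \bmod d$, and apply lemma 196 to see that $1 \otimes 1$ has order exactly $d$.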

4.11 Hilbert’s Third problem

All the hard work we have done is going to pay off now: we will understand a solution of the third Hilbert problem. In 1900 Hilbert formulated 23 problems that, in his view, would influence the mathematics of the 20th century. The third problem was solved first, in the same year 1900, by Dehn, which is quite remarkable as the problem was missing from Hilbert's lecture and appeared in print only in 1902, two years after its solution.

We need some terminology. A subset $C \subset \mathbb{R}^n$ is said to be convex if $tx + (1-t)y \in C$ whenever $x, y \in C$ and $t \in \mathbb{R}$ and $0 \leq t \leq 1$. The convex hull of a subset $V \subset \mathbb{R}^n$ is the intersection of all convex subsets of $\mathbb{R}^n$ containing $V$. A polytope in $\mathbb{R}^n$ is the convex hull of a finite subset of $\mathbb{R}^n$.

In his third problem Hilbert asks whether two 3D polytopes of the same volume are scissor congruent, defined as follows.

Definition 199. The scissor group $\mathcal{P}_n$ is by definition presented by $n$-dimensional polytopes $M \subset \mathbb{R}^n$ as generators and relations

• $M = N$ whenever $M$ can be moved to $N$ by an isometry of $\mathbb{R}^n$ (we say that $M$ and $N$ are congruent);

• $A = B + C$ whenever there exists a hyperplane cutting $A$ into two pieces $B$ and $C$.

For a polytope $M$, let $[M] \in \mathcal{P}_n$ denote its class in the scissor group. Two polytopes $M, N$ are said to be scissor congruent if $[M] = [N]$.

By lemma 162, $n$-dimensional volume gives rise to a homomorphism
$$\nu_n : \mathcal{P}_n \to \mathbb{R}, \qquad \nu_n([M]) = \mathrm{volume}(M).$$
The 3rd Hilbert problem asks whether $\nu_3$ is injective.

Theorem 200. $\nu_2$ is injective.

Proof. The following picture shows that a triangle with base $b$ and height $h$ is equivalent to the rectangle with sides $b$ and $h/2$.

In particular, any triangle is equivalent to a right-angled triangle.



Next we shall show that two right-angled triangles of the same area are scissors congruent.

[Figure: triangles CAB and CPQ sharing the vertex C, with the points A, P on one ray from C and B, Q on the other.]

The equal area triangles are $CAB$ and $CPQ$. This means that $|CA|\,|CB| = |CP|\,|CQ|$. Hence $|CA|/|CQ| = |CP|/|CB|$ and the triangles $CPB$ and $CAQ$ are similar. In particular, the edges $AQ$ and $PB$ are parallel, thus the triangles $AQP$ and $AQB$ share the same base $AQ$ and height and, consequently, are scissors congruent. So

$$[ACB] = [ACQ] - [ABQ] = [ACQ] - [APQ] = [PCQ].$$

Let $x \in \ker(\nu_2)$. We can write $x = \sum_i n_i [M_i]$ for integers $n_i$ and polytopes $M_i$. It is easy to prove that each $[M_i]$ is a sum of triangles: $[M_i] = \sum_j [T_{ij}]$. So there are triangles $A_i$ and $B_j$ such that
$$x = \Big( \sum_i [A_i] \Big) - \Big( \sum_j [B_j] \Big).$$

Replacing the $A_i$ by triangles of the same volumes if necessary, we may assume that the $A_i$ have the same height and that they combine to form a single triangle $A$, as in the following picture.

Likewise for the $B_j$. So there are triangles $A, B$ such that $x = [A] - [B]$. But $\nu_2(x) = 0$, so $A$ and $B$ have the same volume. Therefore $[A] = [B]$ and $x = 0$. $\Box$

Observe that $\nu_n$ is surjective. So $\nu_2$ is an isomorphism and $\mathcal{P}_2 \cong \mathbb{R}$. However, $\nu_3$ is not injective, as we shall now show.

Theorem 201. Let $T$ be a regular tetrahedron and $C$ a cube, both of unit volume. Then $[T] \neq [C]$ in $\mathcal{P}_3$.

Proof. Let $M$ be a polytope with set of edges $I$. For each edge $i$, let $h_i$ be its length and let $\alpha_i \in \mathbb{R}/2\pi\mathbb{Z}$ be the angle along this edge. The Dehn invariant of $M$ is
$$\delta(M) = \sum_i h_i \otimes \alpha_i \in \mathbb{R} \otimes \mathbb{R}/2\pi\mathbb{Z}.$$

Using lemma 162 we shall prove that $\delta$ is a well-defined homomorphism $\delta : \mathcal{P}_3 \to \mathbb{R} \otimes \mathbb{R}/2\pi\mathbb{Z}$. By definition $\mathcal{P}_3 = F/F_0$ where $F$ is the free group generated by all polytopes, and $\delta$ clearly defines a homomorphism from $F$. Keeping in mind $\mathcal{P}_3 = F/F_0$, we need to check that $\delta$ vanishes on the generators of $F_0$. Clearly, $\delta(M - N) = 0$ if $M$ and $N$ are congruent. It is slightly more subtle to see that $\delta(A - B - C) = 0$ if $A = B \cup C$ is a cut. One collects the terms in 4 groups, with the terms in each group summing to zero.

Survivor: an edge of length $h$ and angle $\alpha$ survives completely in $B$ or $C$ but not both. This contributes $h \otimes \alpha - h \otimes \alpha = 0$.

Edge cut: an edge of length $h$ and angle $\alpha$ is cut into edges of lengths $h_B$ in $B$ and $h_C$ in $C$. This contributes $h \otimes \alpha - h_B \otimes \alpha - h_C \otimes \alpha = (h - h_B - h_C) \otimes \alpha = 0 \otimes \alpha = 0$.


Angle cut: an edge of length $h$ and angle $\alpha$ has its angle cut into angles $\alpha_B$ in $B$ and $\alpha_C$ in $C$. This contributes $h \otimes \alpha - h \otimes \alpha_B - h \otimes \alpha_C = h \otimes (\alpha - \alpha_B - \alpha_C) = h \otimes 0 = 0$.

New edge: a new edge of length $h$ is created. If its angle in $B$ is $\alpha$, then its angle in $C$ is $\pi - \alpha$. This contributes $-h \otimes \alpha - h \otimes (\pi - \alpha) = -h \otimes \pi = 0$, by proposition 197.

Finally
$$\delta([C]) = 12\Big( 1 \otimes \frac{\pi}{2} \Big) = 12 \otimes \frac{\pi}{2} = 0, \qquad \text{while} \qquad \delta([T]) = 6\sqrt{2}\,\sqrt[3]{3} \otimes \arccos\tfrac{1}{3} \neq 0,$$
by proposition 197 and lemma 202. Hence $[C] \neq [T]$. $\Box$

Lemma 202. $\arccos(1/3) \notin \pi\mathbb{Q}$.

Proof. Let $\arccos(1/3) = q\pi$. We consider the sequence $x_n = \cos(2^n q\pi)$. If $q$ is rational, then this sequence admits only finitely many values. On the other hand, $x_0 = 1/3$ and
$$x_{n+1} = \cos(2 \cdot 2^n q\pi) = 2\cos^2(2^n q\pi) - 1 = 2 x_n^2 - 1.$$
For example
$$x_1 = \frac{-7}{9}, \qquad x_2 = \frac{17}{81}, \qquad x_3 = \frac{-5983}{3^8}, \qquad x_4 = \frac{28545857}{3^{16}}, \qquad \ldots$$
It is easy to show that the denominators grow indefinitely. Contradiction. $\Box$
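The growth of the denominators is easy to watch with exact rational arithmetic; a small sketch:

```python
from fractions import Fraction

x = Fraction(1, 3)          # x_0 = cos(arccos(1/3)) = 1/3
for n in range(5):
    print(f"x_{n} = {x}")
    x = 2 * x * x - 1       # double-angle: cos(2t) = 2 cos(t)^2 - 1
# prints 1/3, -7/9, 17/81, -5983/6561, 28545857/43046721:
# the denominator of x_n is 3^(2^n), so the sequence cannot repeat
```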

Now it is natural to ask what exactly the group $\mathcal{P}_3$ is. It was proved later (in 1965) that the joint homomorphism $(\nu_3, \delta) : \mathcal{P}_3 \to \mathbb{R} \oplus (\mathbb{R} \otimes \mathbb{R}/2\pi\mathbb{Z})$ is injective. It is not surjective, and the image can be explicitly described, but we won't do it here.

4.12 Possible topics for the second year essays

If you would like to write an essay taking something further from this course, here are some suggestions. Ask me if you want more information.

(a) Bilinear and quadratic forms over fields of characteristic 2 (i.e. where $1 + 1 = 0$). You can do hermitian forms too.

(b) Grassmann algebras, determinants and tensors.

(c) Matrix exponentials and the Baker-Campbell-Hausdorff formula.

(d) Abelian groups and public key cryptography (be careful not to repeat whatever is covered in Algebra 2).

(e) Lattices (abelian groups with bilinear forms), the $E_8$ lattice and the Leech lattice.

(f) Abelian group law on an elliptic curve.

(g) The groups $\mathcal{P}_n$ for other $n$, including a precise description of $\mathcal{P}_3$.

The End