
Springer Undergraduate Mathematics Series

Advisory Board
M.A.J. Chaplain, University of Dundee, Dundee, Scotland, UK
K. Erdmann, University of Oxford, Oxford, England, UK
A. MacIntyre, Queen Mary, University of London, London, England, UK
E. Süli, University of Oxford, Oxford, England, UK
M.R. Tehranchi, University of Cambridge, Cambridge, England, UK
J.F. Toland, University of Bath, Bath, England, UK

For further volumes: http://www.springer.com/series/3423

Christopher Norman

Finitely Generated Abelian Groups and Similarity of Matrices over a Field

Christopher Norman
formerly Senior Lecturer in Mathematics
Royal Holloway, University of London
London, UK

ISSN 1615-2085 Springer Undergraduate Mathematics Series
ISBN 978-1-4471-2729-1 e-ISBN 978-1-4471-2730-7
DOI 10.1007/978-1-4471-2730-7
Springer London Dordrecht Heidelberg New York

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Control Number: 2012930645

Mathematics Subject Classification: 15-01, 15A21, 20-01, 20K30

© Springer-Verlag London Limited 2012

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


To Lucy, Tim and Susie

Preface

Who is this book for? The target reader will already have experienced a first course in linear algebra covering matrix manipulation, determinants, linear mappings, eigenvectors and diagonalisation of matrices. Ideally the reader will have met bases of finite-dimensional vector spaces, the axioms for groups, rings and fields as well as some set theory including equivalence relations. Some familiarity with elementary number theory is also assumed, such as the Euclidean algorithm for the greatest common divisor of two integers, the Chinese remainder theorem and the fundamental theorem of arithmetic. In the proof of Lemma 6.35 it is assumed that the reader knows how to resolve a permutation into cycles. With these provisos the subject matter is virtually self-contained. Indeed many of the standard facts of linear algebra, such as the multiplicative property of determinants and the dimension theorem (any two bases of the same finite-dimensional vector space have the same number of vectors), are proved in a more general context. Nevertheless from a didactic point of view it is highly desirable, if not essential, for the reader to be already familiar with these facts.

What does the book do? The book is in two analogous parts and is designed to be a second course in linear algebra suitable for second/third year mathematics undergraduates, or postgraduates. The first part deals with the theory of finitely generated (f.g.) abelian groups: the emerging homology theory of topological spaces was built on such groups during the 1870s and more recently the classification of elliptic curves has made use of them. The starting point of the abstract theory couldn’t be more concrete if it tried! Row and column operations are applied to an arbitrary matrix having integer entries with the aim of obtaining a diagonal matrix with non-negative entries such that the (1,1)-entry is a divisor of the (2,2)-entry, the (2,2)-entry is a divisor of the (3,3)-entry, and so on; a diagonal matrix of this type is said to be in Smith normal form (Snf) after the 19th century mathematician H.J.S. Smith. Using an extension of the Euclidean algorithm it is shown in Chapter 1 that the Snf can be obtained without resort to prime factorisation. In fact the existence of the Snf is the cornerstone of the decomposition theory.

Free abelian groups of finite rank have Z-bases and behave in many ways like finite-dimensional vector spaces. Each f.g. abelian group is best described as a quotient group of such a free abelian group by a subgroup which is necessarily also free. In Chapter 2 some time is spent on the concept of quotient groups, which no student initially finds easy but which luckily, in this context, turns out to be little more than working modulo a given integer. The quotient groups arising in this way are specified by matrices over Z and the theory of the Snf is exactly what is needed to analyse their structure. Putting the pieces together in Chapter 3, each f.g. abelian group is seen to correspond to a sequence of non-negative integers (its invariant factors) in which each integer is a divisor of the next. The sequence of invariant factors of an f.g. abelian group encapsulates its properties: two f.g. abelian groups are isomorphic (abstractly identical) if and only if their sequences of invariant factors are equal. So broadly, apart from important side-issues such as specifying the automorphisms of a given group G, this is the end of the story as far as f.g. abelian groups are concerned! Nevertheless these side-issues are thoroughly discussed in the text and through numerous exercises; complete solutions to all exercises are on the associated website.

In the second part of the book the ring Z of integers is replaced by the ring F[x] of polynomials over a field F. Such polynomials behave in the same way as integers and in particular the Euclidean algorithm can be used to find the gcd of each pair of them. In Chapter 4 the theory of the Smith normal form is shown to extend, almost effortlessly, to matrices over F[x], the non-zero entries in the Snf here being monic (leading coefficient 1) polynomials. To what end? A question which occupies centre stage in linear algebra concerns t × t matrices A and B over a field F: is there a systematic method of finding, where it exists, an invertible t × t matrix X over F with XA = BX? Should X exist then A and B = XAX−1 are called similar. The answer to the question posed above is a resounding YES! The systematic method amounts to reducing the matrices xI − A and xI − B, which are t × t matrices over F[x], to their Smith normal forms; if these forms are equal then A and B are similar and X can be found by referring back to the elementary operations used in the reduction processes; if these forms are different then A and B are not similar and X doesn’t exist. The matrix xI − A should be familiar to the reader as det(xI − A) is the characteristic polynomial of A. The non-constant diagonal entries in the Snf of xI − A are called the invariant factors of A. It is proved in Chapter 6 that A and B are similar if and only if their sequences of invariant factors are equal. The theory culminates in the rational canonical form (rcf) of A, which is the simplest matrix having the same invariant factors as A. It’s significant that the rcf is obtained in a constructive way; in particular there is no reliance on factorisation into irreducible polynomials.

The analogy between the two parts is established using R-modules where R is a commutative ring. Abelian groups are renamed Z-modules and structure-preserving mappings (homomorphisms) of abelian groups are Z-linear mappings. The terminology helps the theory along: for instance the reader comfortable with 1-dimensional subspaces should have little difficulty with cyclic submodules. Each t × t matrix A over a field F gives rise to an associated F[x]-module M(A). The relationship between A and M(A) is explained in Chapter 5 where companion matrices are introduced. Just as each finite abelian group G is a direct sum of cyclic groups, so each matrix A, as above, is similar to a direct sum of companion matrices; the polynomial analogue of the order |G| of G is the characteristic polynomial det(xI − A) of A.

The theory of the two parts can be conflated using the overarching concept of a finitely generated module over a principal ideal domain, which is the stance taken by several textbooks. An exception is Rings, Modules and Linear Algebra by B. Hartley and T.O. Hawkes, Chapman and Hall (1970), which opened my eyes to the beauty of the analogy explained above. I willingly acknowledge the debt I owe to this classic exposition. The two strands are sufficiently important to merit individual attention; nevertheless I have adopted proofs which generalise without material change, that of the invariance theorem 3.7 being a case in point.

Mathematically there is nothing new here: it is a rehash of 19th and early 20th century matrix theory from Smith to Frobenius, ending with the work of Shoda on automorphisms. However I have not seen elsewhere the step-by-step method of calculating the matrix Q described in Chapter 1, though it is easy enough once one has stumbled on the basic idea. The book is an expansion of material from a lecture course I gave in the University of London, off and on, over a 30 year period to undergraduates, first at Westfield College and latterly at Royal Holloway. Lively students forced me to rethink both theory and presentation and I am grateful, in retrospect, to them. Dr. W.A. Sutherland read and commented on the text and Dr. E.J. Scourfield helped with the number theory in Chapter 3; I thank both. Any errors which remain are my own.

Finally I hope the book will attract mathematics students to what is undoubtedly an important and beautiful theory.

Christopher Norman
London, UK

Contents

Part I Finitely Generated Abelian Groups

1 Matrices with Integer Entries: The Smith Normal Form
1.1 Reduction by Elementary Operations
1.2 Existence of the Smith Normal Form
1.3 Uniqueness of the Smith Normal Form

2 Basic Theory of Additive Abelian Groups
2.1 Cyclic Z-Modules
2.2 Quotient Groups and the Direct Sum Construction
2.3 The First Isomorphism Theorem and Free Modules

3 Decomposition of Finitely Generated Z-Modules
3.1 The Invariant Factor Decomposition
3.2 Primary Decomposition of Finite Abelian Groups
3.3 Endomorphism Rings and Isomorphism Classes of Subgroups and Quotient Groups

Part II Similarity of Square Matrices over a Field

4 The Polynomial Ring F[x] and Matrices over F[x]
4.1 The Polynomial Ring F[x] where F is a Field
4.2 Equivalence of Matrices over F[x]

5 F[x]-Modules: Similarity of t × t Matrices over a Field F
5.1 The F[x]-Module M(A)
5.2 Cyclic Modules and Companion Matrices

6 Canonical Forms and Similarity Classes of Square Matrices over a Field
6.1 The Rational Canonical Form of Square Matrices over a Field
6.2 Primary Decomposition of M(A) and Jordan Form
6.3 Endomorphisms and Automorphisms of M(A)

Solutions to Selected Exercises

Index

Part I Finitely Generated Abelian Groups

A Bird’s-Eye View of Finitely Generated Abelian Groups

To get the general idea of the theory let’s start with the smallest non-trivial abelian group, that is, a cyclic group of order 2, which we denote by G. We will use additive notation for all abelian groups, meaning that the group operation is denoted by +. So G consists of just two elements 0 and g ≠ 0 satisfying

0 + 0 = 0, 0 + g = g, g + 0 = g, g + g = 0.

Here 0 is the zero element of G. From the first and last of these equations we obtain −0 = 0 and −g = g, and so G, in common with all additive groups, is closed under addition and negation. The reader should feel comfortable with the group G as it’s small enough to present no threat!

It is reasonable to express g + g = 0 as 2g = 0, and in the same way 3g = (g + g) + g = 0 + g = g and 4g = 3g + g = g + g = 0. By the associative law of addition, which holds in every additive group, there is no need for brackets in any sum of group elements. So for every positive integer n the group element ng is obtained by adding together n elements equal to g, that is,

ng = g + g + · · · + g (n terms).

So 1g = g, and it is also reasonable to define (−n)g to be the group element −(ng) and to define 0g to be the zero element 0 of G. Therefore we have given meaning to mg for all integers m (positive, negative and zero) and

mg ∈ G for all m ∈ Z, g ∈ G

that is, mg is a certain element of G; we define m0 to be the zero element of G, and so G is closed under integer multiplication. This procedure can be carried out for any additive group and gives it the structure of a Z-module. For our particular G we have mg = 0 for even m, and mg = g for odd m. We use |G| for the number of elements (the order of G) in the finite group G. So |G| = 2 in our case. Also G is cyclic with generator g and we write G = 〈g〉 since the elements of G are precisely the integer multiples of the single element g. The reader will doubtless be familiar with the field Z2 having just the elements 0 and 1, working modulo 2. Ignoring multiplication, we see that the additive group of Z2 is isomorphic (abstractly identical) to G as

0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0

showing that 0 and 1 in Z2 behave in the same way as 0 and g in G. In fact the additive group of Z2 is the standard example of a cyclic group of order 2 and any such group is said to be of isomorphism type C2. What is behind the close connection between Z2 and G? It’s worth finding out because a vista of the whole theory of finitely generated (f.g.) abelian groups can be glimpsed from this standpoint.

Let θ : Z → G be the mapping defined by (m)θ = mg for all integers m. Then θ is a homomorphism from the additive group of Z to G, meaning

(m1 + m2)θ = (m1)θ + (m2)θ for all m1,m2 ∈ Z.

As (0)θ = 0 and (1)θ = g we see that θ is surjective (onto), that is, the image im θ = {(m)θ : m ∈ Z} of θ equals G. So G = im θ. Which elements of Z does θ map to the zero element of G? These are the elements of the kernel of θ. We write K = ker θ for this important subgroup of Z. In our case

K = {m ∈ Z : (m)θ = 0} = {m ∈ Z : mg = 0} = {m ∈ Z : m is even}.

So ker θ = 〈2〉 is the subgroup of all even integers. The coset K + 1 = {m + 1 : m ∈ K} is the set of all odd integers. There are just two cosets of K in Z, namely K and K + 1, corresponding to the two elements 0 and g of G. These two cosets are the elements of the quotient group Z/K = Z/〈2〉 = {K, K + 1} with addition

K + K = K, K + (K + 1) = K + 1,

(K + 1) + K = K + 1, (K + 1) + (K + 1) = K.

But hang on a moment: we’ve seen these equations twice before! The reader will know that K = 0 and K + 1 = 1 are the two elements of Z2 and so Z/〈2〉 is the additive group of Z2. Also θ leads to the isomorphism (bijective homomorphism)

θ̄ : Z/K ≅ im θ where (K + m)θ̄ = (m)θ for all m ∈ Z.

So θ̄ is an additive bijection between the groups Z/〈2〉 and G, that is,

(x + y)θ̄ = (x)θ̄ + (y)θ̄ for all x, y ∈ Z/〈2〉.

In our case (0)θ̄ = 0, (1)θ̄ = g and we write θ̄ : Z/〈2〉 ≅ G. Luckily, as we’ll see, the inverse of every isomorphism is again an isomorphism.

We’ve deliberately made a fuss over the way θ gives rise to θ̄ since this is a particular case of the first isomorphism theorem 2.16. As another (not entirely facetious) application of this important theorem, consider the identity isomorphism ι : Z → Z of the additive group Z, so that (m)ι = m for all m in Z. As im ι = Z and ker ι = 〈0〉, by the first isomorphism theorem ῑ : Z/〈0〉 ≅ Z, where ῑ maps the singleton coset ker ι + m = 〈0〉 + m = {m} to the integer m, that is, ({m})ῑ = m for all m in Z. Now Z = 〈1〉 is an infinite cyclic group, being generated by the integer 1; so it’s convenient to say that such groups are of isomorphism type C0 as they are isomorphic to Z/〈0〉.

Let G′ denote the abelian group having the 8 pairs (x, y) as its elements, where x ∈ Z2 and y ∈ Z4, the group operation being componentwise addition, that is,

(x, y) + (x′, y′) = (x + x′, y + y′) for all x, x′ ∈ Z2 and y, y′ ∈ Z4.

The elements of G′ are

(0,0), (1,0), (0,1), (1,1), (0,2), (1,2), (0,3), (1,3).

The zero element of G′ is (0,0) and −(1,1) = (−1,−1) = (1,3) since −1 = 1 in Z2 and −1 = 3 in Z4. The reader should realise that the group laws hold and so G′ is an abelian group. In fact G′ is the (external) direct sum of the additive group Z/〈2〉 of the field Z2 and the additive group Z/〈4〉 of the ring Z4, and we write G′ = Z/〈2〉 ⊕ Z/〈4〉. The element g1 = (1,0) has order 2 (the order of g1 is the smallest positive integer m such that mg1 is the zero element) and g2 = (0,1) has order 4. As 4g = (0,0) for all g in G′, we see that G′ is not cyclic since G′ has no element of order |G′| = 8. However every element of G′ is uniquely expressible as a sum of two elements, one from the cyclic subgroup H1 = 〈g1〉 of order 2 and one from the cyclic subgroup H2 = 〈g2〉 of order 4, that is, G′ is the (internal) direct sum of its cyclic subgroups H1 and H2 and we write G′ = H1 ⊕ H2. As H1 has isomorphism type C2 and H2 has isomorphism type C4, we say that G′ has isomorphism type C2 ⊕ C4. It is time to define exactly what we are talking about.

An additive abelian group G is called finitely generated (abbreviated to f.g.) if it contains a finite number t of elements g1, g2, . . . , gt such that every element of G is expressible as an integer linear combination m1g1 + m2g2 + · · · + mtgt (mi ∈ Z), in which case we write G = 〈g1, g2, . . . , gt〉 and say that g1, g2, . . . , gt generate G.

A group is cyclic if it is of the form 〈g1〉, that is, its elements consist purely and simply of all integer multiples of just one element g1.
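As a quick check of the claim that G′ = Z/〈2〉 ⊕ Z/〈4〉 is not cyclic, here is a minimal Python sketch (not from the text). It assumes G′ is modelled as pairs of integers reduced modulo 2 and modulo 4; the helper names `add` and `order` are hypothetical. The order of each element is found by adding the element to itself until the zero element (0,0) appears.

```python
from itertools import product

# Hypothetical model of G' = Z/<2> (+) Z/<4>: pairs (x, y) with x mod 2, y mod 4.
MODULI = (2, 4)
ZERO = (0, 0)

def add(g, h):
    """Componentwise addition, reducing each coordinate modulo its modulus."""
    return tuple((a + b) % n for a, b, n in zip(g, h, MODULI))

def order(g):
    """Smallest positive integer k such that k*g is the zero element (0, 0)."""
    k, multiple = 1, g
    while multiple != ZERO:
        multiple = add(multiple, g)
        k += 1
    return k

elements = list(product(range(2), range(4)))   # the 8 elements of G'
orders = {g: order(g) for g in elements}

for g in elements:
    print(g, orders[g])
# A cyclic group of order 8 would need an element of order 8.
print("cyclic:", 8 in orders.values())   # cyclic: False
```

The largest order found is 4, in line with the observation that 4g = (0,0) for every g in G′.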


The fundamental theorem 3.4 concerning f.g. abelian groups states

Every f.g. abelian group G has cyclic subgroups H1, H2, . . . , Ht such that G = H1 ⊕ H2 ⊕ · · · ⊕ Ht

meaning that every element g of G is expressible as g = h1 + h2 + · · · + ht where hi ∈ Hi for 1 ≤ i ≤ t and (most crucially) the hi are unique. So every element g of G is expressible in one and only one way as a sum of elements hi from the cyclic subgroups Hi, that is, G is the (internal) direct sum of its cyclic subgroups Hi.

Let Cdi be the isomorphism type of Hi for 1 ≤ i ≤ t, and so Hi is cyclic of finite order di for di > 0 and Hi is infinite cyclic for di = 0. This theorem goes on to say that the subgroups Hi can be chosen so that the sequence of non-negative integers

d1, d2, . . . , dt satisfies d1 ≠ 1 and di | di+1 (di is a divisor of di+1) for 1 ≤ i < t.

Finally Corollary 3.5 and Theorem 3.7 deliver the ‘clincher’: d1, d2, . . . , dt are uniquely determined by G, that is, for each f.g. abelian group G there is one and only one sequence d1, d2, . . . , dt as above. The integers di are called the invariant factors of G and Cd1 ⊕ Cd2 ⊕ · · · ⊕ Cdt is called the isomorphism type of G.

The preceding example G′ = Z/〈2〉 ⊕ Z/〈4〉 has invariant factors 2, 4. A non-trivial cyclic group has a single invariant factor. Trivial groups have no invariant factors (take t = 0) and, having one element only, belong to the isomorphism class C1. A word of caution: although the invariant factors di of G are unique (they depend on G only), the subgroups Hi are not in general unique. For instance G′ = Z/〈2〉 ⊕ Z/〈4〉 has the ‘obvious’ decomposition G′ = 〈g1〉 ⊕ 〈g2〉 where g1 = (1,0), g2 = (0,1), but also G′ = 〈g1 + 2g2〉 ⊕ 〈g1 + g2〉 is a less obvious but equally valid decomposition. So G′ can be decomposed in two different ways as in the fundamental theorem and each way shows that C2 ⊕ C4 is the isomorphism type of G′.
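Both decompositions of G′ can be checked by brute force: G′ is the internal direct sum of H1 and H2 exactly when every element of G′ arises as h1 + h2 for one and only one pair. A minimal Python sketch, again modelling G′ as pairs reduced modulo 2 and 4 (the helper names are hypothetical):

```python
from itertools import product

# Hypothetical model of G' = Z/<2> (+) Z/<4> as pairs reduced mod 2 and mod 4.
MODULI = (2, 4)
G = set(product(range(2), range(4)))

def add(g, h):
    return tuple((a + b) % n for a, b, n in zip(g, h, MODULI))

def times(m, g):
    """The integer multiple m*g (Python's % keeps residues non-negative)."""
    return tuple((m * a) % n for a, n in zip(g, MODULI))

def cyclic_subgroup(g):
    """<g> = all integer multiples of g; m = 0..3 suffices since 4g = 0 in G'."""
    return {times(m, g) for m in range(4)}

def is_internal_direct_sum(H1, H2):
    """True when every element of G' is h1 + h2 for exactly one pair (h1, h2)."""
    counts = {g: 0 for g in G}
    for h1 in H1:
        for h2 in H2:
            counts[add(h1, h2)] += 1
    return all(c == 1 for c in counts.values())

g1, g2 = (1, 0), (0, 1)
obvious = is_internal_direct_sum(cyclic_subgroup(g1), cyclic_subgroup(g2))
less_obvious = is_internal_direct_sum(
    cyclic_subgroup(add(g1, times(2, g2))),   # g1 + 2g2 = (1, 2), order 2
    cyclic_subgroup(add(g1, g2)),             # g1 + g2 = (1, 1), order 4
)
print(obvious, less_obvious)   # True True
```

By contrast, 〈(0,2)〉 and 〈g2〉 fail the test, since (0,2) lies in both subgroups.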

Now consider the homomorphism θ : Z² → G′ where (m1,m2)θ = m1g1 + m2g2 for all integers m1, m2. Then (1,0)θ = 1g1 + 0g2 = g1 = (1,0) and (0,1)θ = 0g1 + 1g2 = g2 = (0,1). As G′ = 〈g1, g2〉 we see that θ is surjective, that is, im θ = G′, and so G′ is a homomorphic image of Z². In fact every abelian group which can be generated by two of its elements is a homomorphic image of Z², and the additive group Z² of all pairs (m1,m2) of integers is the ‘Big Daddy’ of such groups. Notice Z² = 〈e1, e2〉 = 〈e1〉 ⊕ 〈e2〉 where e1 = (1,0) and e2 = (0,1), showing that Z² is the internal direct sum of its infinite cyclic subgroups 〈e1〉 and 〈e2〉. So Z² has isomorphism type C0 ⊕ C0. Although Z² is not a vector space (we need hardly remind the reader, but we will to be on the safe side, that Z is not a field as 1/2, the inverse of the integer 2, is not an integer), nevertheless Z² has some of the properties of a 2-dimensional vector space, especially where bases are concerned.

The ordered pair of elements ρ1, ρ2 of Z² is called a Z-basis of Z² if each element of Z² can be expressed in the form m1ρ1 + m2ρ2 for unique integers m1 and m2. It turns out that every Z-basis of Z² consists of the rows of an invertible 2 × 2 matrix

$$Q = \begin{pmatrix} \rho_1 \\ \rho_2 \end{pmatrix}$$

over Z, that is, Q has an inverse Q−1 which also has integer entries. The condition for Q to have this property is that the determinant of Q is an invertible integer, in other words det Q = ±1, as ±1 are the only integers having integer inverses. Of course e1, e2 is a Z-basis (the standard basis) of Z² as e1, e2 are the rows of the identity matrix I over Z and det I = 1. But ρ1 = (3,5), ρ2 = (2,3), for example, is also a Z-basis of Z² since

$$\begin{vmatrix} 3 & 5 \\ 2 & 3 \end{vmatrix} = 3 \times 3 - 5 \times 2 = -1.$$

How can the unique integers m1 and m2 be found in the case of the element (7,8) of Z²? The answer is found using Q−1:

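$$(7,8) = (7,8)I = (7,8)Q^{-1}Q = (7,8)\begin{pmatrix} -3 & 5 \\ 2 & -3 \end{pmatrix}\begin{pmatrix} \rho_1 \\ \rho_2 \end{pmatrix} = (-5,11)\begin{pmatrix} \rho_1 \\ \rho_2 \end{pmatrix} = -5\rho_1 + 11\rho_2.$$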
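The computation above can be replayed in a few lines of Python. This is a sketch for 2 × 2 matrices only, and the function names are hypothetical. It uses the criterion stated in the text (det Q = ±1) and forms Q−1 as the adjugate of Q divided by det Q, which keeps integer entries precisely because det Q is ±1.

```python
def det2(Q):
    """Determinant of a 2 x 2 integer matrix given as a list of rows."""
    (a, b), (c, d) = Q
    return a * d - b * c

def inverse_over_Z(Q):
    """Inverse of a 2 x 2 integer matrix whose determinant is +1 or -1.

    The inverse is the adjugate divided by det; when det = +1 or -1,
    dividing by det is the same as multiplying by det, so entries stay integers.
    """
    (a, b), (c, d) = Q
    det = det2(Q)
    if det not in (1, -1):
        raise ValueError("det is not +1 or -1, so the rows are not a Z-basis")
    return [[d * det, -b * det], [-c * det, a * det]]

def row_times(v, M):
    """Row vector v (length 2) times the 2 x 2 matrix M."""
    return (v[0] * M[0][0] + v[1] * M[1][0], v[0] * M[0][1] + v[1] * M[1][1])

Q = [[3, 5], [2, 3]]            # rows rho1 = (3, 5) and rho2 = (2, 3)
Q_inv = inverse_over_Z(Q)       # [[-3, 5], [2, -3]]
m1, m2 = row_times((7, 8), Q_inv)
print(det2(Q), Q_inv, m1, m2)   # -1 [[-3, 5], [2, -3]] -5 11

# Check: m1*rho1 + m2*rho2 really is (7, 8).
rho1, rho2 = Q
print(tuple(m1 * x + m2 * y for x, y in zip(rho1, rho2)))   # (7, 8)
```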

So m1 = −5, m2 = 11 in this case.

Returning to θ : Z² → G′ we turn our attention to the subgroup K = ker θ of Z².

As K consists of the elements (m1,m2) of Z² with (m1,m2)θ = (0,0) ∈ G′ and

(m1,m2)θ = m1g1 + m2g2 = m1(1,0) + m2(0,1) = (m1,0) + (0,m2) = (m1,m2),

we see that (m1,m2) belongs to K ⇔ (m1,m2) = (0,0) in G′, that is, m1 = 0 in Z2 and m2 = 0 in Z4. So

K = {(m1,m2) ∈ Z² : m1 ≡ 0 (mod 2), m2 ≡ 0 (mod 4)}.

In other words K consists of the pairs (m1,m2) in Z² with m1 even and m2 divisible by 4, that is, K = 〈2e1, 4e2〉. In fact Theorem 3.1 says:

Every subgroup K of Zᵗ has a Z-basis consisting of at most t elements of K

This apparently modest theorem allows the theory of f.g. abelian groups to be expressed in terms of matrices over Z. In our case 2e1, 4e2 is a Z-basis of K. We construct the 2 × 2 matrix

$$D = \begin{pmatrix} 2e_1 \\ 4e_2 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}$$

over Z. The significance of D will become clear later. For the moment notice that the rows of D generate the kernel K of θ. On dividing m1 by 2 and m2 by 4, a typical element (m1,m2) of Z² is

(m1,m2) = (2q1 + r1, 4q2 + r2) = (q1(2e1) + q2(4e2)) + (r1, r2) ∈ K + (r1, r2)

where q1, q2, r1, r2 ∈ Z and 0 ≤ r1 < 2, 0 ≤ r2 < 4. So (m1,m2) belongs to the coset K + (r1, r2). There are 8 cosets of K in Z², namely

K + (0,0), K + (1,0), K + (0,1), K + (1,1),

K + (0,2), K + (1,2), K + (0,3), K + (1,3)

and these are the elements of the quotient group Z²/K. Applying the first isomorphism theorem to θ : Z² → G′ gives the isomorphism

θ̄ : Z²/K ≅ Z/〈2〉 ⊕ Z/〈4〉 where (K + (r1, r2))θ̄ = (r1, r2),

that is, θ̄ amounts to no more than a small change in notation. It’s something of an anti-climax to realise that θ̄ is such a bland isomorphism, but the reason is not hard to find: we started with a group G′ which was already decomposed (expressed as a direct sum) as in the fundamental theorem and so there’s nothing for θ̄ to do!
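The division step used above, sending (m1,m2) to the remainders (r1, r2), is all that is needed to decide which coset of K = 〈2e1, 4e2〉 an element lies in, and hence to compute the isomorphism from Z²/K to Z/〈2〉 ⊕ Z/〈4〉. A minimal Python sketch (the helper name `coset_rep` is hypothetical):

```python
from itertools import product

def coset_rep(m1, m2):
    """Return (r1, r2) with (m1, m2) in the coset K + (r1, r2), where K = <2e1, 4e2>.

    Python's divmod rounds the quotient down, so 0 <= r1 < 2 and 0 <= r2 < 4
    also hold for negative m1 and m2.
    """
    q1, r1 = divmod(m1, 2)
    q2, r2 = divmod(m2, 4)
    # (m1, m2) = q1*(2e1) + q2*(4e2) + (r1, r2), and q1*(2e1) + q2*(4e2) lies in K.
    assert (m1, m2) == (2 * q1 + r1, 4 * q2 + r2)
    return (r1, r2)

# Sample a block of Z^2 and collect the distinct cosets that occur.
reps = {coset_rep(m1, m2) for m1, m2 in product(range(-10, 11), repeat=2)}
print(len(reps))            # 8
print(coset_rep(7, -3))     # (1, 1)
```

Two elements of Z² lie in the same coset of K exactly when `coset_rep` returns the same pair for both.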

Our final example is of a ‘mixed-up’ group G′′ which, in common with all f.g. abelian groups, can nevertheless be decomposed as above. Let G′′ = Z²/K′ where K′ = 〈(4,6), (8,10)〉. We construct

$$A = \begin{pmatrix} 4 & 6 \\ 8 & 10 \end{pmatrix}$$

and so K′ is the subgroup of Z² generated by the rows of A. Then G′′ = 〈K′ + e1, K′ + e2〉 is an f.g. abelian group. There are invertible matrices

$$P = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \quad\text{and}\quad Q = \begin{pmatrix} 2 & 3 \\ 1 & 1 \end{pmatrix}$$

over Z such that

$$PAQ^{-1} = D = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}.$$

We will explain how P and Q are found from A in Chapter 1; for the moment the reader should check that PA = DQ. The diagonal matrix D is called the Smith normal form of A and reveals the structure of G′′. In fact we’ll see in a moment that d1 = 2, d2 = 4 are the invariant factors of G′′. Denote the rows of Q by ρ1 and ρ2. So Z² has Z-basis ρ1 = (2,3), ρ2 = (1,1) as Q is invertible over Z. The rows of A generate K′ and, as P is invertible over Z, it follows that the rows of PA also generate K′. The equation

$$PA = DQ = \begin{pmatrix} 2\rho_1 \\ 4\rho_2 \end{pmatrix}$$

shows that 2ρ1, 4ρ2 generate K′. In fact 2ρ1, 4ρ2 is a Z-basis of K′ and so

Z² = 〈ρ1, ρ2〉, K′ = 〈2ρ1, 4ρ2〉.
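The reader is asked above to check that PA = DQ. A short Python sketch (matrices stored as lists of rows, hypothetical helper names) does the arithmetic and also confirms that det P = 1 and det Q = −1, so both matrices are invertible over Z.

```python
def matmul(X, Y):
    """Product of integer matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c

A = [[4, 6], [8, 10]]   # rows generate K'
P = [[1, 0], [-1, 1]]
Q = [[2, 3], [1, 1]]    # rows rho1 = (2, 3), rho2 = (1, 1)
D = [[2, 0], [0, 4]]    # Smith normal form of A

print(matmul(P, A))     # [[4, 6], [4, 4]]
print(matmul(D, Q))     # [[4, 6], [4, 4]]
print(det2(P), det2(Q)) # 1 -1
```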

We’ve seen this type of thing before, the only difference being that e1, e2 have been replaced by ρ1, ρ2. Consider the natural homomorphism

η : Z² → Z²/K′ = G′′

defined by (m1,m2)η = K′ + (m1,m2) for all m1, m2 ∈ Z. Then im η = G′′ and ker η = K′. As in the case of G′ we obtain

G′′ = H′1 ⊕ H′2

where H′1 = 〈(ρ1)η〉 is cyclic of order 2 and H′2 = 〈(ρ2)η〉 is cyclic of order 4. The crucial fact is that K′ has a Z-basis consisting of integer multiples (the integers being the invariant factors of G′′) of the elements of a Z-basis of Z². The standard Z-basis e1, e2 is good enough to reveal the structure of the preceding example G′ = Z/〈2〉 ⊕ Z/〈4〉. The Z-basis ρ1, ρ2 does the analogous job for G′′ = Z²/〈(4,6), (8,10)〉. As G′ and G′′ have the same sequence of invariant factors, namely 2, 4, the groups G′ and G′′ are isomorphic.
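One way to make the isomorphism between G′′ and G′ explicit, sketched here in Python under the set-up above, is to write (m1,m2) = aρ1 + bρ2 using (a, b) = (m1,m2)Q−1 and then reduce a modulo 2 and b modulo 4. Since K′ = 〈2ρ1, 4ρ2〉, every element of K′ goes to (0,0). The inverse Q−1 used below was worked out by hand from Q = ((2,3), (1,1)), and the function name `phi` is hypothetical.

```python
from itertools import product

Q = [[2, 3], [1, 1]]          # rows rho1 = (2, 3), rho2 = (1, 1)
Q_INV = [[-1, 3], [1, -2]]    # inverse of Q over Z (det Q = -1)

def phi(m1, m2):
    """Image of the coset K' + (m1, m2) in Z/<2> (+) Z/<4>.

    (a, b) = (m1, m2) Q^{-1} gives (m1, m2) = a*rho1 + b*rho2; because
    K' = <2 rho1, 4 rho2>, only a mod 2 and b mod 4 are well defined on cosets.
    """
    a = m1 * Q_INV[0][0] + m2 * Q_INV[1][0]
    b = m1 * Q_INV[0][1] + m2 * Q_INV[1][1]
    return (a % 2, b % 4)

# The generators (4, 6) and (8, 10) of K' (the rows of A) map to zero.
print(phi(4, 6), phi(8, 10))     # (0, 0) (0, 0)
# (rho1)eta and (rho2)eta correspond to (1, 0) and (0, 1).
print(phi(2, 3), phi(1, 1))      # (1, 0) (0, 1)
# Every element of Z/<2> (+) Z/<4> is hit.
images = {phi(m1, m2) for m1, m2 in product(range(-6, 7), repeat=2)}
print(len(images))               # 8
```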

EXERCISES

1. Show that ρ1 = (3,4), ρ2 = (4,5) is a Z-basis of the additive group Z². Find integers m1 and m2 such that (10,7) = m1ρ1 + m2ρ2. Do (3,5), (5,6) form a Z-basis of Z²?

2. List the six cosets of K = 〈2e1, 3e2〉 in Z². Show that Z²/K is cyclic with generator g0 = K + e1 + e2. What is the invariant factor of Z²/K?

3. Show that G′ = Z/〈2〉 ⊕ Z/〈4〉 has three elements of order 2 and four elements of order 4. What is the isomorphism type of the subgroup H = 〈(1,0), (0,2)〉? State the isomorphism types of the other seven subgroups of G′.
Hint: Except for G′ itself they are all cyclic.
Specify the four pairs of subgroups H1, H2 with G′ = H1 ⊕ H2 where H1 and H2 have isomorphism types C2 and C4 respectively.

1 Matrices with Integer Entries: The Smith Normal Form

We plunge in at the deep end by discussing the equivalence of rectangular matrices with whole number entries. The elegant and concrete conclusion of this theory was first published in 1861 by Henry J.S. Smith, and it is exactly what is needed to analyse the abstract concept of a finitely generated abelian group, which is carried out in Chapter 3.

1.1 Reduction by Elementary Operations

Let A denote an s × t matrix over the ring Z of integers, that is, all the entries in A are whole numbers. All matrices in this chapter have integer entries. We consider the effect of applying elementary row operations (eros) over Z and elementary column operations (ecos) over Z to the matrix A, that is, operations of the following types:

(i) interchange of two rows or two columns
(ii) changing the sign of a row or a column
(iii) addition of an integer multiple of a row/column to a different row/column.

We use r1 ↔ r2 to denote the interchange of rows 1 and 2. We use −c3 to mean: change the sign of column 3. Also r3 + 5r1 means: to row 3 add five times row 1, and so on. Notice that all these operations are invertible and the inverse operations are again of the same type: operations (i) and (ii) are self-inverse, and the inverse of r3 + 5r1, for example, is r3 − 5r1.
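The three types of operation translate directly into code. A minimal Python sketch, assuming matrices are stored as lists of rows and that rows and columns are numbered from 1 as in the text; every function name here is hypothetical. Each eco is carried out as the corresponding ero on the transpose.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def ero_swap(M, i, j):
    """r_i <-> r_j (returns a new matrix)."""
    R = [row[:] for row in M]
    R[i - 1], R[j - 1] = R[j - 1], R[i - 1]
    return R

def ero_negate(M, i):
    """-r_i."""
    R = [row[:] for row in M]
    R[i - 1] = [-x for x in R[i - 1]]
    return R

def ero_add(M, i, j, m):
    """r_i + m r_j for an integer m and i != j."""
    if i == j:
        raise ValueError("type (iii) needs two different rows")
    R = [row[:] for row in M]
    R[i - 1] = [x + m * y for x, y in zip(R[i - 1], R[j - 1])]
    return R

# Each eco is the matching ero applied to the transpose, transposed back.
def eco_swap(M, i, j):
    return transpose(ero_swap(transpose(M), i, j))

def eco_negate(M, i):
    return transpose(ero_negate(transpose(M), i))

def eco_add(M, i, j, m):
    return transpose(ero_add(transpose(M), i, j, m))

A = [[1, 2], [3, 4], [5, 6]]
B = ero_add(A, 3, 1, 5)             # r3 + 5r1
print(B)                            # [[1, 2], [3, 4], [10, 16]]
print(ero_add(B, 3, 1, -5) == A)    # True: r3 - 5r1 undoes r3 + 5r1
print(ero_swap(ero_swap(A, 1, 2), 1, 2) == A)   # True: type (i) is self-inverse
print(eco_negate(A, 2))             # [[1, -2], [3, -4], [5, -6]]
```

Because m is an integer, every operation keeps all entries in Z, which is the point of working "over Z".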

C. Norman, Finitely Generated Abelian Groups and Similarity of Matrices over a Field, Springer Undergraduate Mathematics Series, DOI 10.1007/978-1-4471-2730-7_1, © Springer-Verlag London Limited 2012


On applying a single ero over Z to the identity matrix I we obtain the elementary matrix over Z corresponding to the ero. For instance

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 5 \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ -3 & 1 \end{pmatrix}$$

are the elementary 2 × 2 matrices which result on applying r1 ↔ r2, −r2, r1 + 5r2, r2 − 3r1 respectively to the 2 × 2 identity matrix I. Every elementary matrix can be obtained equally well by applying a single eco to I; the above four matrices arise from I by c1 ↔ c2, −c2, c2 + 5c1, c1 − 3c2. Elementary matrices themselves are unbiased: they do not prefer rows to columns or vice versa. An ero and an eco are paired if they produce the same elementary matrix. So r1 + 5r2 and c2 + 5c1 are paired elementary operations. Every elementary matrix over Z is invertible and its inverse is again an elementary matrix over Z; for instance, applying the inverse pair of eros r2 + 3r1 and r2 − 3r1 to the 2 × 2 identity matrix I produces the inverse pair

$$\begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ -3 & 1 \end{pmatrix}$$

of elementary matrices.

Let

$$P_1 = \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix} \quad\text{and}\quad A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

Then

$$P_1A = \begin{pmatrix} a & b \\ 3a + c & 3b + d \end{pmatrix}.$$

This tells us that premultiplying A (multiplying A on the left) by the elementary matrix P1 has the effect of applying the corresponding ero r2 + 3r1 to A. We say that A and P1A are equivalent and use the notation

$$A \equiv_{r_2 + 3r_1} P_1A.$$

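Let

$$Q_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \quad\text{Then}\quad AQ_1 = \begin{pmatrix} b & a \\ d & c \end{pmatrix}.$$

Therefore postmultiplication (multiplication on the right) by the elementary matrix Q1 carries out the corresponding eco c1 ↔ c2 on A. As above, we call A and AQ1 equivalent matrices and write

$$A \equiv_{c_1 \leftrightarrow c_2} AQ_1.$$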

The general principle

pre/postmultiplication by an elementary matrix carries out the corresponding ero/eco

will be established in Lemma 1.4.
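Lemma 1.4 proves the general principle; the Python sketch below merely checks it on small instances and is not a proof. It assumes matrices are lists of rows with rows and columns numbered from 1, and the helper names are hypothetical. It builds the elementary matrix of r1 + 5r2 by applying the ero to I, confirms that the paired eco c2 + 5c1 gives the same matrix, and compares pre- and postmultiplication with the operations themselves.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def ero_add(M, i, j, m):
    """r_i + m r_j (rows numbered from 1, i != j)."""
    R = [row[:] for row in M]
    R[i - 1] = [x + m * y for x, y in zip(R[i - 1], R[j - 1])]
    return R

def eco_add(M, i, j, m):
    """c_i + m c_j, carried out as an ero on the transpose."""
    return transpose(ero_add(transpose(M), i, j, m))

E = ero_add(identity(2), 1, 2, 5)            # r1 + 5r2 applied to I
print(E, E == eco_add(identity(2), 2, 1, 5)) # [[1, 5], [0, 1]] True (paired)

A = [[1, 2, 3], [4, 5, 6]]                   # an arbitrary 2 x 3 integer matrix
# Premultiplying by E carries out r1 + 5r2 ...
print(matmul(E, A) == ero_add(A, 1, 2, 5))   # True
# ... and postmultiplying by a 3 x 3 elementary matrix carries out its eco.
F = eco_add(identity(3), 3, 1, -2)           # c3 - 2c1 applied to I
print(matmul(A, F) == eco_add(A, 3, 1, -2))  # True
```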


Now we wish to apply not just one elementary operation but a sequence of eros and ecos to an s × t matrix A. These operations are to be carried out in a particular order, and so let P1, P2, . . . be the elementary s × s matrices corresponding to the first, second, . . . of the eros we wish to apply, and let Q1, Q2, . . . be the elementary t × t matrices corresponding to the first, second, . . . of the ecos we wish to apply. For simplicity, let’s suppose there are just two eros and three ecos. Then A can be changed to P2P1AQ1Q2Q3 by following one of the ten routes through the diagram

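$$\begin{array}{ccccccc}
A & \rightarrow & AQ_1 & \rightarrow & AQ_1Q_2 & \rightarrow & AQ_1Q_2Q_3 \\
\downarrow & & \downarrow & & \downarrow & & \downarrow \\
P_1A & \rightarrow & P_1AQ_1 & \rightarrow & P_1AQ_1Q_2 & \rightarrow & P_1AQ_1Q_2Q_3 \\
\downarrow & & \downarrow & & \downarrow & & \downarrow \\
P_2P_1A & \rightarrow & P_2P_1AQ_1 & \rightarrow & P_2P_1AQ_1Q_2 & \rightarrow & P_2P_1AQ_1Q_2Q_3
\end{array}$$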

from top left to bottom right. The associative law of matrix multiplication ensures that we arrive at the same destination provided that the row operations are performed in the correct order amongst themselves, and similarly for the column operations.
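The claim that all ten routes lead to P2P1AQ1Q2Q3 can be checked numerically. In this Python sketch the two eros (r2 − 2r1, then r1 ↔ r2) and the three ecos (c2 − c1, −c2, c1 ↔ c2) are hypothetical choices made only for illustration; they are not a reduction taken from the text. A route is determined by which two of the five steps are downward (row) steps, which is why there are C(5,2) = 10 of them.

```python
from itertools import combinations

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[4, 6], [8, 10]]
ROW_OPS = [[[1, 0], [-2, 1]],   # P1: elementary matrix of r2 - 2r1
           [[0, 1], [1, 0]]]    # P2: elementary matrix of r1 <-> r2
COL_OPS = [[[1, -1], [0, 1]],   # Q1: elementary matrix of c2 - c1
           [[1, 0], [0, -1]],   # Q2: elementary matrix of -c2
           [[0, 1], [1, 0]]]    # Q3: elementary matrix of c1 <-> c2

results = []
steps = len(ROW_OPS) + len(COL_OPS)
for row_steps in combinations(range(steps), len(ROW_OPS)):  # C(5, 2) = 10 routes
    M, r, c = A, 0, 0
    for k in range(steps):
        if k in row_steps:              # move down: premultiply by the next P
            M = matmul(ROW_OPS[r], M)
            r += 1
        else:                           # move right: postmultiply by the next Q
            M = matmul(M, COL_OPS[c])
            c += 1
    results.append(M)

print(len(results))                             # 10
print(all(M == results[0] for M in results))    # True
print(results[0])                               # [[2, 0], [-2, 4]]
```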

We are now ready to start the main task of this chapter: to what extent can a matrix A over Z be simplified by applying elementary operations? You should know from your experience of eigenvectors and eigenvalues that diagonal matrices are often what one is aiming for. In this context, as we shall see, not only can each s × t matrix A be changed into a diagonal s × t matrix D say (all (i, j)-entries in D are zero for i ≠ j), but also the diagonal entries d1, d2, d3, . . . in D can be arranged to be non-negative and such that d1 is a divisor of d2, d2 is a divisor of d3, and so on. The matrix D is then unique and is known as the Smith normal form of A. The non-negative integers d1, d2, d3, . . . are called the invariant factors of A.

Let us assume that we have reduced $A$ to $D$ by five elementary operations as above, that is, $P_2P_1AQ_1Q_2Q_3 = D$. Write $P = P_2P_1$ and $Q = (Q_1Q_2Q_3)^{-1}$. We'll see later that $Q$ is a particularly important matrix. For the moment notice that $Q = (Q_1Q_2Q_3)^{-1} = Q_3^{-1}Q_2^{-1}Q_1^{-1}$ expresses $Q$ as a product of elementary matrices (the inverse of an elementary matrix being itself elementary). So, on postmultiplying $PAQ_1Q_2Q_3 = D$ by $Q$,

$P$ and $Q$ are invertible over $\mathbb{Z}$ and satisfy

$$PA = DQ.$$

The matrices $A$ and $D$ are said to be equivalent over $\mathbb{Z}$ and we write $A \equiv D$. The general definition of equivalence over $\mathbb{Z}$ is stated in Definition 1.5.

An ero and an eco are called conjugate if their corresponding elementary matrices are an inverse pair. Therefore $r_i \leftrightarrow r_j$ is conjugate to $c_i \leftrightarrow c_j$, and $-r_i$ is conjugate to $-c_i$. More importantly, as we explain in a moment,

$$r_i - mr_j \text{ is conjugate to } c_j + mc_i \quad \text{for } m \in \mathbb{Z},\ i \neq j.$$


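For instance, in the $2 \times 2$ case, $r_2 - 3r_1$ is conjugate to $c_1 + 3c_2$ since

$$\begin{pmatrix} 1 & 0\\ -3 & 1\end{pmatrix}\begin{pmatrix} 1 & 0\\ 3 & 1\end{pmatrix} = \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}.$$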

More generally, let $E_{ij}$ denote the $t \times t$ matrix with $(i,j)$-entry 1 and zeros elsewhere. For $i \neq j$ we see $E_{ij}^2 = 0$, as row $i$ of $E_{ij}$ fails to 'make contact' with column $j$ of $E_{ij}$ in the matrix product. Hence

$$(I - mE_{ij})(I + mE_{ij}) = I - mE_{ij} + mE_{ij} - m^2E_{ij}^2 = I,$$

showing that the elementary matrix $I - mE_{ij}$ corresponding to $r_i - mr_j$ is the inverse of the elementary matrix $I + mE_{ij}$ corresponding to $c_j + mc_i$, that is, $r_i - mr_j$ is conjugate to $c_j + mc_i$ as stated above. Notice that the conjugate of an eco is an ero and vice versa.

The invertible matrices $P$ and $Q$ satisfying $PA = DQ$ can be calculated stage by stage as the reduction of $A$ to $D$ progresses. The equation $P = P_2P_1I$ tells us that the matrix $P$ is the combined effect of applying the eros used in the reduction process to $I$. In the same way the equation $Q = Q_3^{-1}Q_2^{-1}Q_1^{-1}I$ tells us that the matrix $Q$ is the cumulative effect of applying the conjugates of the ecos used in the reduction process to $I$.
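This bookkeeping can be sketched in a few lines of Python. The encoding of operations as tuples, the helper names and the 0-based indices are assumptions made for the illustration: each ero is applied both to the working matrix and to a copy of $I$ that becomes $P$, while each eco is applied to the working matrix and its conjugate ero is applied to a copy of $I$ that becomes $Q$. The data are the operations of Example 1.1 below.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def apply_ero(M, op):
    """Apply a row operation to M in place (0-based indices).

    ('add', i, j, m) is r_i + m r_j, ('swap', i, j) is r_i <-> r_j, ('neg', i) is -r_i."""
    if op[0] == 'add':
        _, i, j, m = op
        M[i] = [x + m * y for x, y in zip(M[i], M[j])]
    elif op[0] == 'swap':
        _, i, j = op
        M[i], M[j] = M[j], M[i]
    elif op[0] == 'neg':
        M[op[1]] = [-x for x in M[op[1]]]

def apply_eco(M, op):
    """Apply a column operation in place; ('add', j, i, m) is c_j + m c_i."""
    T = [list(col) for col in zip(*M)]          # columns of M become rows of T
    apply_ero(T, op)
    M[:] = [list(row) for row in zip(*T)]

def conjugate(op):
    """Conjugate ero of an eco: c_j + m c_i gives r_i - m r_j; swaps and sign changes are unchanged."""
    if op[0] == 'add':
        _, j, i, m = op
        return ('add', i, j, -m)
    return op

def reduce_with_record(A, steps):
    """Apply steps = [('ero' or 'eco', op), ...] to A; return (D, P, Q) with PA = DQ."""
    D = [row[:] for row in A]
    P, Q = identity(len(A)), identity(len(A[0]))
    for side, op in steps:
        if side == 'ero':
            apply_ero(D, op)
            apply_ero(P, op)               # P accumulates the eros applied to I
        else:
            apply_eco(D, op)
            apply_ero(Q, conjugate(op))    # Q accumulates the conjugate eros applied to I
    return D, P, Q

# Example 1.1: c2 - c1, c1 - 2c2, c1 <-> c2, then r2 - r1 (indices shifted down by one).
A = [[4, 6], [8, 10]]
steps = [('eco', ('add', 1, 0, -1)), ('eco', ('add', 0, 1, -2)),
         ('eco', ('swap', 0, 1)), ('ero', ('add', 1, 0, -1))]
D, P, Q = reduce_with_record(A, steps)
print(D, P, Q, matmul(P, A) == matmul(D, Q))
# [[2, 0], [0, 4]] [[1, 0], [-1, 1]] [[2, 3], [1, 1]] True
```

Recording the operations rather than multiplying elementary matrices keeps every step an exact integer row manipulation.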

Next, three reductions are worked through in detail.

Example 1.1

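Let $A = \begin{pmatrix} 4 & 6\\ 8 & 10\end{pmatrix}$. We concentrate first on row 1, applying the following sequence of elementary operations to $A$:

$$A = \begin{pmatrix} 4 & 6\\ 8 & 10\end{pmatrix} \underset{c_2 - c_1}{\equiv} \begin{pmatrix} 4 & 2\\ 8 & 2\end{pmatrix} \underset{c_1 - 2c_2}{\equiv} \begin{pmatrix} 0 & 2\\ 4 & 2\end{pmatrix} \underset{c_1 \leftrightarrow c_2}{\equiv} \begin{pmatrix} 2 & 0\\ 2 & 4\end{pmatrix} \underset{r_2 - r_1}{\equiv} \begin{pmatrix} 2 & 0\\ 0 & 4\end{pmatrix} = D.$$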

It is worth noticing that

the $(1,1)$-entry in $D$ is the greatest common divisor (gcd) of all the entries in $A$,

that is, $2 = \gcd\{4, 6, 8, 10\}$ in this case. This fact will be used to establish the uniqueness of the Smith normal form in Section 1.3.

The matrix $P$ resulting from the above sequence of operations is found by applying the above ero (there is only one) to $I$:

$$\begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix} \underset{r_2 - r_1}{\equiv} \begin{pmatrix} 1 & 0\\ -1 & 1\end{pmatrix} = P.$$


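The matrix $Q$ resulting from the above reduction is found by applying, in order, the conjugates of the above ecos to $I$:

$$\begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix} \underset{r_1 + r_2}{\equiv} \begin{pmatrix} 1 & 1\\ 0 & 1\end{pmatrix} \underset{r_2 + 2r_1}{\equiv} \begin{pmatrix} 1 & 1\\ 2 & 3\end{pmatrix} \underset{r_1 \leftrightarrow r_2}{\equiv} \begin{pmatrix} 2 & 3\\ 1 & 1\end{pmatrix} = Q.$$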

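You should now check that $P$ and $Q$ are invertible over $\mathbb{Z}$: in fact

$$P^{-1} = \begin{pmatrix} 1 & 0\\ 1 & 1\end{pmatrix} \quad\text{and}\quad Q^{-1} = \begin{pmatrix} -1 & 3\\ 1 & -2\end{pmatrix}.$$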

The matrix equation

$$PA = \begin{pmatrix} 4 & 6\\ 4 & 4\end{pmatrix} = DQ$$

shows that $P$ and $Q$ do what is required of them. So $PAQ^{-1} = D = \operatorname{diag}(2,4)$ is the Smith normal form of $A$, the invariant factors of $A$ being 2 and 4.

The reader should realise that although every matrix $A$ over $\mathbb{Z}$ has a unique Smith normal form $D$, the matrices $P$ and $Q$ (both invertible over $\mathbb{Z}$) satisfying $PA = DQ$ are by no means unique. You can check that the sequence $r_2 - 2r_1$, $r_1 + 3r_2$, $-r_2$, $c_1 \leftrightarrow c_2$, $r_1 \leftrightarrow r_2$ also reduces the above matrix $A$ to $D$ and leads to

$$P' = \begin{pmatrix} 2 & -1\\ -5 & 3\end{pmatrix} \quad\text{and}\quad Q' = \begin{pmatrix} 0 & 1\\ 1 & 0\end{pmatrix},$$

which are invertible over $\mathbb{Z}$ and satisfy $P'A = DQ'$.

Example 1.2

Consider the $3 \times 3$ matrix

$$A = \begin{pmatrix} 2 & 4 & 6\\ 8 & 10 & 12\\ 14 & 16 & 18\end{pmatrix}.$$

In this case the gcd of the entries in $A$ is already the $(1,1)$-entry, and so we begin by clearing (making zero) the other entries in row 1 and col 1. The reduction is then finished by applying the same method to the $2 \times 2$ submatrix obtained by deleting (in one's mind) row 1 and col 1.

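$$A = \begin{pmatrix} 2 & 4 & 6\\ 8 & 10 & 12\\ 14 & 16 & 18\end{pmatrix} \underset{\substack{c_2 - 2c_1\\ c_3 - 3c_1}}{\equiv} \begin{pmatrix} 2 & 0 & 0\\ 8 & -6 & -12\\ 14 & -12 & -24\end{pmatrix} \underset{\substack{r_2 - 4r_1\\ r_3 - 7r_1}}{\equiv} \begin{pmatrix} 2 & 0 & 0\\ 0 & -6 & -12\\ 0 & -12 & -24\end{pmatrix}$$

$$\underset{\substack{-r_2\\ -r_3}}{\equiv} \begin{pmatrix} 2 & 0 & 0\\ 0 & 6 & 12\\ 0 & 12 & 24\end{pmatrix} \underset{c_3 - 2c_2}{\equiv} \begin{pmatrix} 2 & 0 & 0\\ 0 & 6 & 0\\ 0 & 12 & 0\end{pmatrix} \underset{r_3 - 2r_2}{\equiv} \begin{pmatrix} 2 & 0 & 0\\ 0 & 6 & 0\\ 0 & 0 & 0\end{pmatrix} = D.$$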

Note that two ecos have been applied apparently simultaneously at the first stage; this is unambiguous as these ecos commute: the order in which they are applied makes no difference. This is always the case when the gcd entry is used to clear the remaining entries in row 1. In the same way the eros used at stage 2 commute, as do the eros used at stage 3. However, the reader is warned against carrying out too many elementary operations simultaneously, as generally the order in which they are applied is important.

The properties required of the Smith normal form $D = \operatorname{diag}(2,6,0)$ of $A$ are more clearly in evidence: the invariant factors 2, 6, 0 of $A$ are non-negative, 2 is a divisor of 6, and 6 is a divisor of 0. In fact every integer $m$ is a divisor of 0 since $m \times 0 = 0$. As far as divisors are concerned, 0 is the largest integer!

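We calculate $P$ and $Q$ as before. Applying the eros in the above sequence to $I$:

$$\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix} \underset{\substack{r_2 - 4r_1\\ r_3 - 7r_1}}{\equiv} \begin{pmatrix} 1 & 0 & 0\\ -4 & 1 & 0\\ -7 & 0 & 1\end{pmatrix} \underset{\substack{-r_2\\ -r_3}}{\equiv} \begin{pmatrix} 1 & 0 & 0\\ 4 & -1 & 0\\ 7 & 0 & -1\end{pmatrix} \underset{r_3 - 2r_2}{\equiv} \begin{pmatrix} 1 & 0 & 0\\ 4 & -1 & 0\\ -1 & 2 & -1\end{pmatrix} = P.$$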

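Applying the conjugates of the ecos in the above sequence to $I$:

$$\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix} \underset{\substack{r_1 + 2r_2\\ r_1 + 3r_3}}{\equiv} \begin{pmatrix} 1 & 2 & 3\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix} \underset{r_2 + 2r_3}{\equiv} \begin{pmatrix} 1 & 2 & 3\\ 0 & 1 & 2\\ 0 & 0 & 1\end{pmatrix} = Q.$$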

The reader can now verify

$$PA = \begin{pmatrix} 2 & 4 & 6\\ 0 & 6 & 12\\ 0 & 0 & 0\end{pmatrix} = DQ.$$

Incidentally, the above equation can be used to find all integer solutions $x$ of the system of equations $xA = 0$, that is, all $1 \times 3$ matrices $x = (x_1, x_2, x_3)$ over $\mathbb{Z}$ satisfying

$$(x_1, x_2, x_3)\begin{pmatrix} 2 & 4 & 6\\ 8 & 10 & 12\\ 14 & 16 & 18\end{pmatrix} = (0, 0, 0).$$

The above system can be transformed into a simpler system which can be solved on sight. To do this, put $y = xP^{-1}$. Then $x = yP$ and so $xA = 0$ becomes $yPA = 0$, which is the same as $yDQ = 0$. Postmultiplying by $Q^{-1}$ produces $yD = 0$, which is the simpler system referred to. Putting $y = (y_1, y_2, y_3)$ we obtain

$$(y_1, y_2, y_3)\begin{pmatrix} 2 & 0 & 0\\ 0 & 6 & 0\\ 0 & 0 & 0\end{pmatrix} = (0, 0, 0),$$

which means $2y_1 = 0$, $6y_2 = 0$, $0y_3 = 0$. The solution is $y_1 = 0$, $y_2 = 0$, $y_3$ arbitrary (any integer). So the general solution of $yD = 0$ is $y = (0, 0, y_3)$. Hence

$$x = yP = (0, 0, y_3)P = y_3(-1, 2, -1)$$

is the general solution in integers of the system $xA = 0$, where $y_3 \in \mathbb{Z}$. We have shown that the set of these solutions has $\mathbb{Z}$-basis $(-1, 2, -1)$, which is row 3 of $P$. This point will be taken up in Section 2.3. In general the matrix equation $PA = DQ$ shows that the last few rows of $P$, namely those corresponding to the zero rows of $D$, form a $\mathbb{Z}$-basis of the integer solutions $x$ of $xA = 0$.

Example 1.3

The reduction method applies to $s \times t$ matrices $A$ over $\mathbb{Z}$ which might not be square, and the case $s = 1$, $t = 2$ is particularly significant.

Let $A = (204, 63)$. As $A$ has only one row there is no need for eros, and the reduction can be done using ecos alone. The following sequence of ecos:

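$$A = (204, 63) \underset{c_1 - 3c_2}{\equiv} (15, 63) \underset{c_2 - 4c_1}{\equiv} (15, 3) \underset{c_1 - 5c_2}{\equiv} (0, 3) \underset{c_1 \leftrightarrow c_2}{\equiv} (3, 0) = D$$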

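shows that $D = (3, 0)$ is the Smith normal form of $A$ and 3 (the only diagonal entry in $D$) is the only invariant factor of $A$. The reader will recognise this reduction as little more than the Euclidean algorithm, the sequence of divisions of one positive integer by another, showing $\gcd\{204, 63\} = 3$. As there are no row operations involved we take $P = I$, the $1 \times 1$ identity matrix. As before, $Q$ can be calculated by applying the conjugates of the above ecos to the $2 \times 2$ identity matrix $I$:

$$\begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix} \underset{r_2 + 3r_1}{\equiv} \begin{pmatrix} 1 & 0\\ 3 & 1\end{pmatrix} \underset{r_1 + 4r_2}{\equiv} \begin{pmatrix} 13 & 4\\ 3 & 1\end{pmatrix} \underset{r_2 + 5r_1}{\equiv} \begin{pmatrix} 13 & 4\\ 68 & 21\end{pmatrix} \underset{r_1 \leftrightarrow r_2}{\equiv} \begin{pmatrix} 68 & 21\\ 13 & 4\end{pmatrix} = Q.$$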

The matrix equation $PA = DQ$ becomes simply $A = DQ$ which, on comparing entries, gives the factorisations $204 = 3 \times 68$ and $63 = 3 \times 21$. Comparing first entries in $AQ^{-1} = D$, that is,

$$(204, 63)\begin{pmatrix} -4 & 21\\ 13 & -68\end{pmatrix} = (3, 0),$$

gives

$$-4 \times 204 + 13 \times 63 = \gcd\{204, 63\}.$$

It is an important property of gcds that for each pair of integers $l, m$ there are integers $a, b$ satisfying

$$al + bm = \gcd\{l, m\}$$

(see Corollary 1.16). In general $a = q_{22}\det Q$ and $b = -q_{21}\det Q$, where $Q = (q_{ij})$ is an invertible $2 \times 2$ matrix over $\mathbb{Z}$ with $A = (l, m) = DQ$ and $D$ in Smith normal form; indeed, as $\det Q = \pm 1$, the first column of $Q^{-1}$ is $(q_{22}\det Q, -q_{21}\det Q)^T$.

In the following parts of this chapter we show first that every $s \times t$ matrix $A$ over $\mathbb{Z}$ can be reduced to a matrix $D$ in Smith normal form using eros and ecos, and secondly that, no matter how the reduction is done, a given initial matrix $A$ always leads to the same terminal matrix $D$.

EXERCISES 1.1

1. Write down the elementary $2 \times 2$ matrices corresponding to the following elementary operations: $r_1 + 2r_2$, $r_2 - 3r_1$, $c_2 - 2c_1$, $c_1 - 3c_2$.
Are any two of the above operations (a) paired (b) conjugate?
Which eros leave rows $2, 3, 4, \ldots$ unchanged?

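2. Use eros and ecos to reduce the following matrices $A$ to Smith normal form $D$. In each case determine matrices $P$ and $Q$, which are products of elementary matrices over $\mathbb{Z}$, such that $PA = DQ$.

(i) $\begin{pmatrix} 4 & 8\\ 6 & 10\end{pmatrix}$; (ii) $\begin{pmatrix} 7 & 12\\ 12 & 21\end{pmatrix}$; (iii) $\begin{pmatrix} 6 & 10\\ 15 & 0\end{pmatrix}$.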

3. Answer Question 2 above in the case of the following matrices $A$:

(i) $\begin{pmatrix} 1 & 2 & 3\\ 3 & 1 & 2\\ 2 & 3 & 1\end{pmatrix}$; (ii) $\begin{pmatrix} 1 & 2 & 3\\ 2 & 3 & 4\\ 3 & 4 & 5\end{pmatrix}$; (iii) $\begin{pmatrix} 3 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 4\end{pmatrix}$.

Hint: For (iii) begin with $r_1 + r_2$, $c_1 - c_2$.
In each case find all integer solutions $x = (x_1, x_2, x_3)$ of the system of equations $xA = 0$.

4. For each of the following $1 \times 2$ matrices $A = (l, m)$ over $\mathbb{Z}$, use the Euclidean algorithm and ecos to find an invertible $2 \times 2$ matrix $Q$ over $\mathbb{Z}$ satisfying $A = DQ$, where $D = (d, 0)$, $d = \gcd\{l, m\}$. Hence write down $l/d$ and $m/d$ as integers. Evaluate $\det Q$ and find integers $a, b$ with $al + bm = d$ in each case:

(i) $(72, 42)$; (ii) $(34, 55)$; (iii) $(7497, 5474)$.


5. (a) Change the $1 \times 2$ matrix $(0, d)$ into $(d, 0)$ using ecos of type (iii).

(b) By applying the sequence $c_1 + c_2$, $c_2 - c_1$, $c_1 + c_2$ once, twice, or three times as required, show that every $1 \times 2$ matrix $A$ over $\mathbb{Z}$ can be changed into a matrix with non-negative entries using ecos of type (iii).

(c) Using the Euclidean algorithm, show that every $1 \times 2$ matrix $A$ over $\mathbb{Z}$ can be reduced to Smith normal form $D$ using only ecos of type (iii). Deduce that $AQ = D$ where $\det Q = 1$.
Hint: Type (iii) ecos leave determinants unchanged.

(d) Let $P$ be a $2 \times 2$ matrix over $\mathbb{Z}$ with $\det P = 1$. Show that $P$ can be reduced to Smith normal form $I$ using only ecos of type (iii).
Hint: Apply part (c) to row 1 of $P$.

(e) Reduce $\begin{pmatrix} 3 & 4\\ 5 & 7\end{pmatrix}$ to $I$ using only ecos of type (iii).

6. (a) Show that the matrix transpose $P^T$ of an elementary matrix $P$ is itself elementary. How are the corresponding eros related?

(b) Let $A$ be a square matrix over $\mathbb{Z}$. Suppose $PA = DQ$ where $P$ and $Q$ are products of elementary matrices and $D$ is diagonal. By transposing $PA = DQ$, show that $Q^TPA$ is symmetric, and interpret this result in terms of eros.

(c) Change $A = \begin{pmatrix} 2 & 7\\ 8 & 5\end{pmatrix}$ into a symmetric matrix using eros only.

1.2 Existence of the Smith Normal Form

The examples of the previous section suggest that every matrix $A$ over $\mathbb{Z}$ is reducible to its Smith normal form $D$ using eros and ecos. Although this is true, the reader should not be lulled into a false sense of security simply because it 'works out' in a few special cases. A mathematical proof is required to clinch the hunch you should now have, and the serious business of laying this proof out is now our concern. The reader should take heart as, in this instance, the transition from practice to theory is relatively smooth: most of the steps in the reduction of a general matrix have already been encountered in the numerical examples of the previous section.

In Chapters 4 and 5 we shall need to replace the ring $\mathbb{Z}$ of integers by the ring $F[x]$ of polynomials with coefficients in the field $F$. So it is important to be aware of the particular property of $\mathbb{Z}$ which allows the reduction process to work; in fact it's nothing more than the familiar integer division property:

for each pair of integers $m, n$ with $n > 0$ there are unique integers $q, r$ with $m = nq + r$ and $0 \leq r < n$.


In other words, the positive integer $n$ divides $q$ (the quotient) times into the arbitrary integer $m$ with remainder $r$. A suitably modified division property involving the degree of polynomials holds in $F[x]$, and this analogy between $\mathbb{Z}$ and $F[x]$ will allow us, in the second half of this book, to deal with matrices having polynomial entries in much the same way as matrices over $\mathbb{Z}$.

We begin the details with a closer look at the principle used repeatedly in Section 1.1, namely that pre/postmultiplication by an elementary matrix carries out the corresponding ero/eco. The reader is likely to have met this idea in the context of matrices with entries from a field $F$, where the operation of multiplying a row or column by any non-zero scalar (the invertible elements of $F$) is allowed; here, by contrast, rows and columns may be multiplied by $\pm 1$ (the only invertible elements of $\mathbb{Z}$), that is, their signs may be changed, but that's all! In both contexts elementary operations are invertible and their inverses are also elementary.

Throughout we use $e_i$ to denote row $i$ of the identity matrix $I$. Using matrix transposition, column $j$ of $I$ is denoted by $e_j^T$. The number of entries in $e_i$ and $e_j^T$ should be clear from the context. For any matrix $A$

$e_iA$ is row $i$ of $A$, $Ae_j^T$ is column $j$ of $A$, and $e_iAe_j^T$ is the $(i,j)$-entry in $A$.

These useful facts are direct consequences of the matrix multiplication rule.

Lemma 1.4

Let $A$ be an $s \times t$ matrix over $\mathbb{Z}$. Let $P_1$ be an elementary $s \times s$ matrix over $\mathbb{Z}$. Let $Q_1$ be an elementary $t \times t$ matrix over $\mathbb{Z}$. The matrix $P_1A$ is the result of applying to $A$ the ero corresponding to $P_1$. The matrix $AQ_1$ is the result of applying to $A$ the eco corresponding to $Q_1$.

Proof

Let $r_j \leftrightarrow r_k$ be the ero corresponding to $P_1$, where $1 \leq j, k \leq s$, $j \neq k$. So $P_1$ is obtained from $I$ by interchanging row $j$ and row $k$. The equations

$$e_jP_1 = e_k, \quad e_kP_1 = e_j, \quad e_iP_1 = e_i \ \text{ for } i \neq j, k$$

describe all the rows of $P_1$ and hence $P_1$ itself. Postmultiplying by $A$ gives

$$e_jP_1A = e_kA, \quad e_kP_1A = e_jA, \quad e_iP_1A = e_iA \ \text{ for } i \neq j, k,$$

which describe the $s$ rows of $P_1A$ in terms of the rows of $A$: row $j$ of $P_1A$ is row $k$ of $A$, row $k$ of $P_1A$ is row $j$ of $A$, and row $i$ of $P_1A$ is row $i$ of $A$ for $i \neq j, k$. In other words, $P_1A$ is obtained from $A$ by applying $r_j \leftrightarrow r_k$.


We leave the reader to deal with eros of types (ii) and (iii) in the same way.

Now let $c_j + lc_k$ be the eco corresponding to $Q_1$, where $1 \leq j, k \leq t$, $j \neq k$, and $l$ is any integer. So $Q_1$ is the result of applying $c_j + lc_k$ to $I$. The equations

$$Q_1e_j^T = e_j^T + le_k^T, \quad Q_1e_i^T = e_i^T \ \text{ for } i \neq j$$

describe all the columns of $Q_1$. Premultiplying by $A$ gives

$$AQ_1e_j^T = Ae_j^T + lAe_k^T, \quad AQ_1e_i^T = Ae_i^T \ \text{ for } i \neq j,$$

which describe the $t$ columns of $AQ_1$ in terms of the columns of $A$: col $j$ of $AQ_1$ is col $j$ of $A$ plus $l$ times col $k$ of $A$, and col $i$ of $AQ_1$ is col $i$ of $A$ for $i \neq j$. So $AQ_1$ results on applying $c_j + lc_k$ to $A$.

As before we leave the reader to deal with ecos of types (i) and (ii). □
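Lemma 1.4 can be checked on a concrete matrix. The Python sketch below is an illustration on one arbitrary $3 \times 4$ matrix of our own choosing, not a proof, and its helper names are hypothetical (indices are 0-based). It covers the two cases treated in the proof: premultiplying by the matrix of $r_j \leftrightarrow r_k$ swaps rows, and postmultiplying by the matrix of $c_j + lc_k$ performs that column operation.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def swap_rows(M, j, k):
    """Copy of M with rows j and k interchanged (ero r_j <-> r_k, 0-based)."""
    M = [row[:] for row in M]
    M[j], M[k] = M[k], M[j]
    return M

def add_col_multiple(M, j, k, l):
    """Copy of M with column j replaced by col j + l * col k (eco c_j + l c_k, 0-based)."""
    return [[row[c] + l * row[k] if c == j else row[c] for c in range(len(row))]
            for row in M]

A = [[3, 1, 4, 1], [5, 9, 2, 6], [5, 3, 5, 8]]   # an arbitrary 3 x 4 test matrix
s, t = len(A), len(A[0])

for j in range(s):
    for k in range(s):
        if j != k:
            P1 = swap_rows(identity(s), j, k)            # elementary matrix of r_j <-> r_k
            assert matmul(P1, A) == swap_rows(A, j, k)

for j in range(t):
    for k in range(t):
        if j != k:
            for l in (-2, 3):
                Q1 = add_col_multiple(identity(t), j, k, l)   # elementary matrix of c_j + l c_k
                assert matmul(A, Q1) == add_col_multiple(A, j, k, l)

print("Lemma 1.4 verified on this example")
```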

Suppose now that the $s \times t$ matrix $A$ over $\mathbb{Z}$ is subjected to a finite number of eros and ecos carried out in succession. Let $P_1, P_2, \ldots, P_u$ be the elementary matrices corresponding to these eros and let $Q_1, Q_2, \ldots, Q_v$ be the elementary matrices corresponding to these ecos. Matrix multiplication is designed for this very situation! Applying the eros first we see that $A$ changes to $P_1A$, then $P_1A$ changes to $P_2P_1A$, and so on until the matrix $P_u \cdots P_2P_1A$ is obtained; strictly speaking Lemma 1.4 has been used $u$ times here. Let $P = P_u \cdots P_2P_1$. Then $PA$ is the result of applying these $u$ eros to $A$. Secondly, applying the ecos, we see that $PA$ changes to $PAQ_1$, then $PAQ_1$ changes to $PAQ_1Q_2$, until ultimately $PAQ_1Q_2 \cdots Q_v$ is obtained, using Lemma 1.4 a further $v$ times. Let $Q = Q_v^{-1} \cdots Q_2^{-1}Q_1^{-1}$. Then $Q^{-1} = Q_1Q_2 \cdots Q_v$ and so

$$PAQ^{-1} = P_u \cdots P_2P_1AQ_1Q_2 \cdots Q_v$$

is the result of applying the given sequence of $u$ eros and $v$ ecos to $A$. The reader will have to wait until Chapter 3 to fully appreciate the significance of the invertible matrix $Q$, although this was touched on in the overview before Chapter 1. In fact the rows of $Q$ tell us how to decompose f.g. abelian groups, the matrix $P$ being less important due to the formulation adopted.

Notice that the number of ways of interlacing the $u$ eros in order with the $v$ ecos in order is the binomial coefficient $\binom{u+v}{u}$. This is because each interlacing corresponds to a subset $S$ of size $u$ of $\{1, 2, \ldots, u+v\}$, the $i$th elementary operation applied to $A$ being either an ero or an eco according as $i \in S$ or $i \notin S$. By the associative law of matrix multiplication, all these $\binom{u+v}{u}$ sequences lead to the same matrix, namely $PAQ^{-1}$. The diagram in Section 1.1 illustrates the case $u = 2$, $v = 3$, where $\binom{5}{2} = 10$.

As $P = PI = P_u \cdots P_2P_1I$, by Lemma 1.4 we see that $P$ is obtained by applying the corresponding eros, in order, to the $s \times s$ identity matrix $I$. In almost the same way, $Q = QI = Q_v^{-1} \cdots Q_2^{-1}Q_1^{-1}I$, and so by Lemma 1.4 we see that $Q$ is obtained by applying, in order, the conjugates of the corresponding ecos (these conjugates are in fact eros) to the $t \times t$ identity matrix $I$. In the following theory we show that $A$ can be 'reduced' to its Smith normal form $D$ by a sequence of eros and ecos. So both $P$ and $Q$ can be built up step by step as the reduction of $A$ proceeds, as illustrated in the examples of Section 1.1.

Definition 1.5

The $s \times t$ matrices $A$ and $B$ over $\mathbb{Z}$ are called equivalent, and we write $A \equiv B$, if there is an $s \times s$ matrix $P$ and a $t \times t$ matrix $Q$, both invertible over $\mathbb{Z}$, such that

$$PAQ^{-1} = B.$$

We leave the reader to verify that the symbol $\equiv$ as defined in Definition 1.5 satisfies the three laws (reflexive, symmetric, transitive) of an equivalence relation (see Exercises 1.2, Question 1(d)). As elementary matrices and products of elementary matrices are invertible over $\mathbb{Z}$, the above discussion shows that $A$ changes into an equivalent matrix $B$ when eros and ecos are applied to $A$. In Theorem 1.18 the converse is shown to be true, that is, if $A \equiv B$ then $A$ can be changed into $B$ by applying elementary row and column operations to $A$.

How can we decide whether the $s \times t$ matrices $A$ and $B$ over $\mathbb{Z}$ are equivalent or not? The reader should remember that rank is the all-important number when $\mathbb{Z}$ is replaced by a field $F$, that is, the $s \times t$ matrices $A$ and $B$ over $F$ are equivalent if and only if $\operatorname{rank} A = \operatorname{rank} B$. What is more, an efficient method of determining $\operatorname{rank} A$ consists of applying elementary operations to $A$ in order to find the simplest matrix over $F$ which is equivalent to $A$. This simplest $s \times t$ matrix over $F$ is $\operatorname{diag}(1, 1, \ldots, 1, 0, 0, \ldots, 0)$, where the number of ones is $\operatorname{rank} A$. We now introduce the corresponding concept for matrices over $\mathbb{Z}$.

Definition 1.6

Let $D$ be an $s \times t$ matrix over $\mathbb{Z}$ such that
(i) the $(i,j)$-entries in $D$ are zero for $i \neq j$, that is, $D$ is a diagonal matrix,
(ii) each $(i,i)$-entry $d_i$ in $D$ is non-negative,
(iii) for each $i$ with $1 \leq i < \min\{s,t\}$ there is an integer $q_i$ with $d_{i+1} = q_id_i$, that is, $d_i \mid d_{i+1}$ ($d_i$ is a divisor of $d_{i+1}$).
Then $D$ is said to be in Smith normal form and we write $D = \operatorname{diag}(d_1, d_2, \ldots, d_{\min\{s,t\}})$.
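Definition 1.6 translates directly into a predicate. The Python sketch below (the function name is our own) checks conditions (i), (ii) and (iii) in turn; the only subtle point is (iii) when $d_i = 0$, since $d_{i+1} = q_i \cdot 0$ forces $d_{i+1} = 0$, whereas every integer divides 0.

```python
def is_smith_normal_form(D):
    """Check conditions (i)-(iii) of Definition 1.6 for an integer matrix D (list of rows)."""
    s, t = len(D), len(D[0])
    # (i) D is diagonal.
    if any(D[i][j] != 0 for i in range(s) for j in range(t) if i != j):
        return False
    d = [D[i][i] for i in range(min(s, t))]
    # (ii) the diagonal entries are non-negative.
    if any(x < 0 for x in d):
        return False
    # (iii) d_i divides d_{i+1}; 0 divides only 0, while every integer divides 0.
    return all((d[i + 1] == 0) if d[i] == 0 else (d[i + 1] % d[i] == 0)
               for i in range(len(d) - 1))

print(is_smith_normal_form([[2, 0], [0, 4]]))                    # True
print(is_smith_normal_form([[2, 0, 0], [0, 6, 0], [0, 0, 0]]))   # True
print(is_smith_normal_form([[2, 0], [0, 65]]))                   # False
```

The last call shows that a diagonal matrix with non-negative entries need not be in Smith normal form, since 2 does not divide 65.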

Notice that $d_1$ is the gcd of the $st$ entries in $D$. Also $d_1d_2$ is the gcd of the 2-minors of $D$ (the determinants of $2 \times 2$ submatrices of $D$). This theme is developed in Section 1.3, where it is proved (Corollary 1.20) that each $s \times t$ matrix $A$ over $\mathbb{Z}$ is equivalent to a unique matrix $D$ in Smith normal form. It's therefore reasonable to refer to the Smith normal form $D$ of $A$ and adopt the notation $D = S(A)$. The integers $d_i$ for $1 \leq i \leq \min\{s,t\}$ are called the invariant factors of $A$.

For $A = \begin{pmatrix} 4 & 6\\ 8 & 10\end{pmatrix}$ we see from Example 1.1 that $S(A) = \operatorname{diag}(2,4)$ and the invariant factors of $A$ are 2, 4.

We shall prove $A \equiv B \Leftrightarrow S(A) = S(B)$, that is, the $s \times t$ matrices $A$ and $B$ over $\mathbb{Z}$ are equivalent if and only if their Smith normal forms are identical. There's a fair amount of work to be done to prove this basic fact, so let's get going!

First we describe a reduction process which changes every $s \times t$ matrix $A$ over $\mathbb{Z}$ into a matrix $D$, as in Definition 1.6, using a finite number of eros and ecos. It turns out that the reduction of every such matrix, no matter what its size, boils down to the reduction of $1 \times 2$ matrices and diagonal $2 \times 2$ matrices over $\mathbb{Z}$.

Lemma 1.7

Let $A = (l, m)$ be a $1 \times 2$ matrix over $\mathbb{Z}$. There is a sequence of ecos over $\mathbb{Z}$ which reduces $A$ to $D = (d, 0)$ where $d = \gcd\{l, m\}$.

Proof

Applying $-c_1$, $-c_2$ if necessary, we can assume $l \geq 0$, $m \geq 0$. Suppose first $l = d$ and so $l \mid m$. If $m = 0$ then $A = (d, 0) = D$, that is, $A$ is already in Smith normal form and no ecos are needed. If $m > 0$, then $l > 0$ also and the eco $c_2 - (m/l)c_1$ reduces $A$ to $D$.

Suppose now $l \neq d$. If $l = 0$ then $m = d > 0$ and the eco $c_1 \leftrightarrow c_2$ reduces $A$ to $D$. If $l > 0$ then $m > 0$ also (as $m = 0$ would give $d = l$). In this case we let $r_1 = l$, $r_2 = m$ and carry out the Euclidean algorithm as follows to obtain $d = \gcd\{r_1, r_2\}$: let $r_{i+2}$ be the remainder on dividing $r_i$ by $r_{i+1}$ for $i \geq 1$; these remainders form a decreasing sequence

$$r_2 > r_3 > \cdots > r_{i+2} > r_{i+3} > \cdots$$

of non-negative integers. So eventually a zero remainder $r_{k+1}$ is obtained ($k \geq 2$) and the algorithm terminates. As $r_i = q_{i+1}r_{i+1} + r_{i+2}$ for some non-negative integer $q_{i+1}$, we deduce $\gcd\{r_i, r_{i+1}\} = \gcd\{r_{i+1}, r_{i+2}\}$ for $1 \leq i < k$, and hence $d = \gcd\{r_1, r_2\} = \gcd\{r_k, r_{k+1}\} = \gcd\{r_k, 0\} = r_k$. So $d$ turns up as the last non-zero remainder $r_k$ in the Euclidean algorithm. Applying $c_1 - q_{i+1}c_2$ to the $1 \times 2$ matrix $(r_i, r_{i+1})$ produces $(r_{i+2}, r_{i+1})$; applying $c_2 - q_{i+1}c_1$ to the $1 \times 2$ matrix $(r_{i+1}, r_i)$ produces $(r_{i+1}, r_{i+2})$. So each of the $k - 1$ divisions in the algorithm gives rise to an eco, and applying these ecos to $(r_1, r_2)$ eventually produces either $(r_k, r_{k+1})$ or $(r_{k+1}, r_k)$ according as $k$ is odd or even. As $c_1 \leftrightarrow c_2$ changes $(r_{k+1}, r_k)$ into $D = (r_k, r_{k+1}) = (d, 0)$, we see $A$ can be reduced to $D$ using at most $k$ ecos. □


In fact the above reduction can be carried out using only ecos of type (iii) (see Exercises 1.1, Question 5). Notice that elementary matrices corresponding to ecos of type (iii) have determinant 1. The reader will be aware that the determinant function is multiplicative, that is,

$$\det Q_1Q_2 = \det Q_1 \det Q_2$$

for all $t \times t$ matrices $Q_1$, $Q_2$ over $\mathbb{Z}$, and we review this important property in Theorem 1.18. It follows that every $1 \times 2$ matrix $A$ over $\mathbb{Z}$ can be expressed $A = DQ$ where the $1 \times 2$ matrix $D$ is in Smith normal form and the $2 \times 2$ matrix $Q$ over $\mathbb{Z}$ satisfies $\det Q = 1$. We now give a direct proof of this fact.

Lemma 1.8

Every $1 \times 2$ matrix $A$ over $\mathbb{Z}$ can be expressed $A = DQ$ where the $1 \times 2$ matrix $D$ is in Smith normal form and $Q$ is a $2 \times 2$ matrix over $\mathbb{Z}$ with $\det Q = 1$.

Proof

Let $A = (l, m)$ and take $D = (d, 0)$ where $d = \gcd\{l, m\}$. Then $D$ is in Smith normal form as $d \geq 0$. If $d = 0$ then $l = m = 0$ and we take $Q = I$. So suppose $d > 0$. As $d$ is a common divisor of $l$ and $m$, there are integers $l'$ and $m'$ such that $l = dl'$, $m = dm'$. Also there are integers $a, b$ with $al + bm = d$. Dividing this equation through by $d$ gives $al' + bm' = 1$. Let $Q = \begin{pmatrix} l' & m'\\ -b & a\end{pmatrix}$. Then $\det Q = al' + bm' = 1$ and

$$DQ = (d, 0)\begin{pmatrix} l' & m'\\ -b & a\end{pmatrix} = (dl', dm') = (l, m) = A. \qquad \square$$

The reader will know that integers $a$ and $b$ as above can be calculated by reversing the steps in the Euclidean algorithm. We review their role from a more general perspective in Corollary 1.16.

The transposed version of Lemma 1.7 says: there is a sequence of eros which reduces

$$A = \begin{pmatrix} l\\ m\end{pmatrix} \quad\text{to}\quad D = \begin{pmatrix} d\\ 0\end{pmatrix}$$

where $d = \gcd\{l, m\}$. We now use Lemma 1.7 and its transpose to establish the key step in the reduction process of a general matrix over $\mathbb{Z}$: all off-diagonal entries in the first row and first column can be cleared (replaced by zeros) using elementary operations. In other words, every $s \times t$ matrix $A$ over $\mathbb{Z}$ can be changed into a matrix of the type

$$B = \begin{pmatrix} b_{11} & 0\\ 0 & B'\end{pmatrix}$$


using eros and ecos, where $B'$ is an $(s-1) \times (t-1)$ matrix over $\mathbb{Z}$. The reduction process can then be completed by induction since $B'$ is smaller than $A$.

Lemma 1.9

Using elementary operations every $s \times t$ matrix $A = (a_{ij})$ over $\mathbb{Z}$ can be changed into an $s \times t$ matrix $B = (b_{ij})$ over $\mathbb{Z}$ with $b_{11} \geq 0$ and $b_{1j} = b_{i1} = 0$ for $2 \leq i \leq s$, $2 \leq j \leq t$.

Proof

Suppose $A$ is not already of the required form $B$, as otherwise there is nothing to do. We describe an algorithm based on Lemma 1.7 which changes $A$ into $B$ using elementary operations. Let $h_j = \gcd\{a_{11}, a_{12}, \ldots, a_{1j}\}$, that is, $h_j$ is the gcd of the first $j$ entries in row 1 of $A$. Then $h_1 = \pm a_{11}$ and $h_{j+1} = \gcd\{h_j, a_{1,j+1}\}$ for $1 \leq j < t$. (There is a discussion of gcds of finite sets of integers at the beginning of Section 1.3.) By changing the sign of col 1 of $A$ if necessary we obtain an $s \times t$ matrix $A_1$ with first row $(h_1, a_{12}, \ldots, a_{1t})$. Next apply Lemma 1.7 to the $1 \times 2$ submatrix $(h_1, a_{12})$ of $A_1$: there is a sequence of ecos changing $(h_1, a_{12})$ into $(h_2, 0)$; applying these ecos to $A_1$ produces an $s \times t$ matrix $A_2$ with first row $(h_2, 0, a_{13}, \ldots, a_{1t})$. Next we apply Lemma 1.7 to the $1 \times 2$ submatrix $(h_2, a_{13})$ of $A_2$: there is a sequence of ecos changing $(h_2, a_{13})$ into $(h_3, 0)$; applying these ecos to $A_2$ (they affect columns 1 and 3 only) produces an $s \times t$ matrix $A_3$ with first row $(h_3, 0, 0, a_{14}, \ldots, a_{1t})$. Suppose inductively that $A_j$ with first row $(h_j, 0, \ldots, 0, a_{1,j+1}, \ldots, a_{1t})$ has been obtained from $A$ by ecos, where $1 \leq j < t$. Applying Lemma 1.7 to the $1 \times 2$ submatrix $(h_j, a_{1,j+1})$ of $A_j$ produces a sequence of ecos changing $(h_j, a_{1,j+1})$ into $(h_{j+1}, 0)$. We obtain $A_{j+1}$ with first row $(h_{j+1}, 0, \ldots, 0, a_{1,j+2}, \ldots, a_{1t})$ on applying these ecos to cols 1 and $j+1$ of $A_j$. By induction on $j$, after $t - 1$ applications of Lemma 1.7, an $s \times t$ matrix $A_t$ with first row $(h_t, 0, \ldots, 0)$ is obtained which has come from $A$ by applying ecos. We have successfully completed the first step in the reduction of $A$ to $B$. Let's write $B_1 = A_t$ and $b_1 = h_t$. So by applying a finite number of ecos to $A$ we have obtained $B_1$ with first row $(b_1, 0, \ldots, 0)$, where $b_1$ is the gcd of the entries in row 1 of $A$. It's important to notice that none of the ecos used affect column 1 in the case $h_1 = h_t$. Should all $(i,1)$-entries in $B_1$ be zero for $2 \leq i \leq s$, then the algorithm terminates with $B_1 = B$. Otherwise let $b_2$ denote the gcd of the entries in col 1 of $B_1$. Since $b_1$ is the $(1,1)$-entry in $B_1$ we see

$b_2$ is a divisor of $b_1$,

and this fact is the key to the algorithmic proof. Notice $b_2 = 0$ would imply $b_1 = 0$, in which case all entries in row 1 and col 1 of $A$ are zero, that is, there's 'nothing to do', as we said at the beginning. So $b_2 > 0$ ($b_1 = 0$ is possible, that is, $e_1A = 0$, but this doesn't seem to shorten the proof).


Next the spotlight is turned on column 1 of $B_1$. Using the technique of the preceding paragraph, but transposed, there is a sequence of eros changing $B_1$ into an $s \times t$ matrix, $B_2$ say, having $(1,1)$-entry $b_2$ and zero $(i,1)$-entries for $2 \leq i \leq s$. Should $b_1 = b_2$, that is, $b_1$ is the gcd of the entries in column 1 of $B_1$, then none of these eros change row 1 and the algorithm ends with $B_2 = B$. We therefore assume $b_1 \neq b_2$.

So far so good. However, there is a snag. Clearing off-diagonal entries in column 1 may reintroduce non-zero off-diagonal entries in row 1 (if not, then the algorithm ends as above with $B_2 = B$). When this happens, that is, when $B_2$ has non-zero off-diagonal entries in its first row, the whole process is started again with $B_2$ in place of the original matrix $A$. (The reader should now work through the numerical example following this proof.)

The reduction of $A$ is completed by iterating the above procedure. Clearing off-diagonal entries in row 1 and column 1 alternately, we obtain matrices $B_2, B_3, \ldots, B_k, \ldots$, their $(1,1)$-entries forming a decreasing sequence

$$b_2, b_3, \ldots, b_k, \ldots$$

of positive integers such that $b_k \mid b_{k-1}$ for $k = 3, 4, \ldots$. For odd $k$ each $B_k$ is obtained by applying ecos to $B_{k-1}$, and $b_k$ is the gcd of the entries in row 1 of $B_{k-1}$ and is the only non-zero entry in row 1 of $B_k$. For even $k$ each $B_k$ is obtained by applying eros to $B_{k-1}$, and $b_k$ is the gcd of the entries in column 1 of $B_{k-1}$ and is the only non-zero entry in column 1 of $B_k$. As each $b_k$ is a divisor of $b_{k-1}$ different from $b_{k-1}$, multiplying together the $k - 1$ inequalities $b_2/b_3 \geq 2$, $b_3/b_4 \geq 2$, ..., $b_{k-1}/b_k \geq 2$, $b_k \geq 1$ gives $b_2 \geq 2^{k-2}$, and so $k \leq \log_2 b_2 + 2$. Therefore $k \leq \lfloor \log_2 b_2 \rfloor + 2$, where $\lfloor \log_2 b_2 \rfloor$ denotes the integer part of the real number $\log_2 b_2$.

The process continues provided $B_k \neq B$, that is, the matrix $B_k$ is not of the type we are looking for. On the other hand we have just seen that such $k$ are bounded above by the integer $\lfloor \log_2 b_2 \rfloor + 2$. So the process must terminate with $B_l = B$ where $l \leq \lfloor \log_2 b_2 \rfloor + 3$. □

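To illustrate the above proof, consider

$$A = \begin{pmatrix} 130 & 260\\ 110 & 221\end{pmatrix}.$$

Using the method of Lemma 1.9 gives

$$A \underset{c_2 - 2c_1}{\equiv} \begin{pmatrix} 130 & 0\\ 110 & 1\end{pmatrix} = B_1 \underset{r_1 - r_2}{\equiv} \begin{pmatrix} 20 & -1\\ 110 & 1\end{pmatrix} \underset{r_2 - 5r_1}{\equiv} \begin{pmatrix} 20 & -1\\ 10 & 6\end{pmatrix} \underset{r_1 - 2r_2}{\equiv} \begin{pmatrix} 0 & -13\\ 10 & 6\end{pmatrix}$$

$$\underset{r_1 \leftrightarrow r_2}{\equiv} \begin{pmatrix} 10 & 6\\ 0 & -13\end{pmatrix} = B_2 \underset{c_1 - c_2}{\equiv} \begin{pmatrix} 4 & 6\\ 13 & -13\end{pmatrix} \underset{c_2 - c_1}{\equiv} \begin{pmatrix} 4 & 2\\ 13 & -26\end{pmatrix} \underset{c_1 - 2c_2}{\equiv} \begin{pmatrix} 0 & 2\\ 65 & -26\end{pmatrix}$$

$$\underset{c_1 \leftrightarrow c_2}{\equiv} \begin{pmatrix} 2 & 0\\ -26 & 65\end{pmatrix} = B_3 \underset{r_2 + 13r_1}{\equiv} \begin{pmatrix} 2 & 0\\ 0 & 65\end{pmatrix} = B_4.$$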
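The alternating strategy of Lemma 1.9 can be sketched in Python as follows. This is an illustration rather than a transcription of the proof: instead of invoking Lemma 1.7 on each pair exactly as stated, it uses a common Euclidean variant (our own choice) that divides the entry being cleared by the current pivot and swaps only when a non-zero remainder is left, so the pivot strictly shrinks whenever row 1 can be refilled. On the matrix above it happens to pass through $B_1, B_2, B_3, B_4$ exactly.

```python
def clear_first_row_and_column(A):
    """Alternately clear row 1 (by ecos) and column 1 (by eros), as in Lemma 1.9.

    Returns the final matrix and a snapshot of the matrix after each phase."""
    A = [list(row) for row in A]
    s, t = len(A), len(A[0])
    snapshots = []
    while True:
        # Row phase: ecos on columns 1 and j until the (1, j)-entry is zero.
        for j in range(1, t):
            while A[0][j] != 0:
                if A[0][0] != 0:
                    q = A[0][j] // A[0][0]
                    for r in range(s):
                        A[r][j] -= q * A[r][0]                   # c_j - q c_1
                if A[0][j] != 0:
                    for r in range(s):
                        A[r][0], A[r][j] = A[r][j], A[r][0]       # c_1 <-> c_j
        snapshots.append([row[:] for row in A])
        if all(A[i][0] == 0 for i in range(1, s)):
            break
        # Column phase: eros on rows 1 and i until the (i, 1)-entry is zero.
        for i in range(1, s):
            while A[i][0] != 0:
                if A[0][0] != 0:
                    q = A[i][0] // A[0][0]
                    A[i] = [x - q * y for x, y in zip(A[i], A[0])]   # r_i - q r_1
                if A[i][0] != 0:
                    A[0], A[i] = A[i], A[0]                           # r_1 <-> r_i
        snapshots.append([row[:] for row in A])
        if all(A[0][j] == 0 for j in range(1, t)):
            break
    if A[0][0] < 0:
        A[0] = [-x for x in A[0]]                                     # -r_1
    return A, snapshots

B, snaps = clear_first_row_and_column([[130, 260], [110, 221]])
for S in snaps:
    print(S)
# [[130, 0], [110, 1]]
# [[10, 6], [0, -13]]
# [[2, 0], [-26, 65]]
# [[2, 0], [0, 65]]
```

Note that $B_4$ is diagonal but not yet in Smith normal form, as 2 does not divide 65; that is the job of Lemma 1.10.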

We next show how the Smith normal form of diagonal matrices, such as $B_4$ above, can be obtained. The Chinese remainder theorem is discussed in Theorem 2.11; what follows is a matrix version of this theorem.

Lemma 1.10

Let $A$ be a $2 \times 2$ diagonal matrix over $\mathbb{Z}$ with non-negative entries. Then $A$ can be changed into Smith normal form $D$ using at most five elementary operations of type (iii).

Proof

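Write $A = \operatorname{diag}(l, m)$. In the case $l \mid m$ there is nothing to do as $A = D$. Otherwise let $d = \gcd\{l, m\}$. Then $d > 0$ and there are integers $a, b$ with $al + bm = d$ (see Corollary 1.16). The following sequence of elementary operations of type (iii) changes $A$ into $D$:

$$A = \begin{pmatrix} l & 0\\ 0 & m\end{pmatrix} \underset{c_2 + ac_1}{\equiv} \begin{pmatrix} l & al\\ 0 & m\end{pmatrix} \underset{r_1 + br_2}{\equiv} \begin{pmatrix} l & d\\ 0 & m\end{pmatrix} \underset{c_1 - (l/d - 1)c_2}{\equiv} \begin{pmatrix} d & d\\ -lm/d + m & m\end{pmatrix}$$

$$\underset{c_2 - c_1}{\equiv} \begin{pmatrix} d & 0\\ -lm/d + m & lm/d\end{pmatrix} \underset{r_2 + (m/d)(l/d - 1)r_1}{\equiv} \begin{pmatrix} d & 0\\ 0 & lm/d\end{pmatrix} = D. \qquad \square$$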

The integer $lm/d$ is the least common multiple (lcm) of the positive integers $l$ and $m$, that is, $lm/d$ is the smallest positive integer which is divisible by both $l$ and $m$. We write $lm/d = \operatorname{lcm}\{l, m\}$, and $\operatorname{lcm}\{0, m\} = \operatorname{lcm}\{0, 0\} = 0$.

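As an example of Lemma 1.10 take $l = 21$, $m = 35$. Then $2 \times 21 + (-1) \times 35 = 7 = d$ and so $a = 2$, $b = -1$. So $\gcd\{21, 35\} = 7$ and $\operatorname{lcm}\{21, 35\} = (21 \times 35)/7 = 105$. Also

$$A = \begin{pmatrix} 21 & 0\\ 0 & 35\end{pmatrix} \underset{c_2 + 2c_1}{\equiv} \begin{pmatrix} 21 & 42\\ 0 & 35\end{pmatrix} \underset{r_1 - r_2}{\equiv} \begin{pmatrix} 21 & 7\\ 0 & 35\end{pmatrix} \underset{c_1 - 2c_2}{\equiv} \begin{pmatrix} 7 & 7\\ -70 & 35\end{pmatrix}$$

$$\underset{c_2 - c_1}{\equiv} \begin{pmatrix} 7 & 0\\ -70 & 105\end{pmatrix} \underset{r_2 + 10r_1}{\equiv} \begin{pmatrix} 7 & 0\\ 0 & 105\end{pmatrix} = D.$$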
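The five operations of Lemma 1.10 can be replayed mechanically. In the Python sketch below (function names are hypothetical, indices 0-based) the caller supplies integers $a, b$ with $al + bm = \gcd\{l, m\}$; the function applies $c_2 + ac_1$, $r_1 + br_2$, $c_1 - (l/d - 1)c_2$, $c_2 - c_1$ and $r_2 + (m/d)(l/d - 1)r_1$ in turn and returns every intermediate matrix, reproducing the example just given.

```python
from math import gcd

def col_add(M, j, i, k):
    """Eco c_j + k c_i on a 2 x 2 matrix (0-based); returns a new matrix."""
    return [[row[c] + k * row[i] if c == j else row[c] for c in range(2)] for row in M]

def row_add(M, i, j, k):
    """Ero r_i + k r_j on a 2 x 2 matrix (0-based); returns a new matrix."""
    return [[M[r][c] + k * M[j][c] if r == i else M[r][c] for c in range(2)] for r in range(2)]

def lemma_1_10_steps(l, m, a, b):
    """Apply the five type (iii) operations of Lemma 1.10 to diag(l, m).

    Assumes l, m >= 0, l does not divide m, and a*l + b*m = gcd(l, m).
    Returns the matrices from diag(l, m) through diag(d, lm/d)."""
    d = a * l + b * m
    assert d == gcd(l, m) > 0, "a, b must satisfy a*l + b*m = gcd(l, m)"
    M = [[l, 0], [0, m]]
    steps = [M]
    M = col_add(M, 1, 0, a)                          # c2 + a c1
    steps.append(M)
    M = row_add(M, 0, 1, b)                          # r1 + b r2
    steps.append(M)
    M = col_add(M, 0, 1, -(l // d - 1))              # c1 - (l/d - 1) c2
    steps.append(M)
    M = col_add(M, 1, 0, -1)                         # c2 - c1
    steps.append(M)
    M = row_add(M, 1, 0, (m // d) * (l // d - 1))    # r2 + (m/d)(l/d - 1) r1
    steps.append(M)
    return steps

for M in lemma_1_10_steps(21, 35, 2, -1):
    print(M)
```

The same function with $(l, m) = (4, 1)$, $(a, b) = (0, 1)$ or $(l, m) = (4, 10)$, $(a, b) = (-2, 1)$ carries out the steps used later in Example 1.12.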

We are now ready for the main theorem of Section 1.2.


Theorem 1.11 (The existence of the Smith normal form over Z)

Every $s \times t$ matrix $A$ over $\mathbb{Z}$ can be reduced to an $s \times t$ matrix $D$ in Smith normal form using elementary operations over $\mathbb{Z}$.

Proof

We use induction on the positive integer $\min\{s, t\}$. If $\min\{s, t\} = 1$, then the matrix $B$ in Lemma 1.9 is already in Smith normal form. Now suppose $\min\{s, t\} > 1$. By Lemma 1.9 there are elementary operations changing $A$ into

$$B = \begin{pmatrix} b_1 & 0\\ 0 & B'\end{pmatrix}$$

where $b_1 \geq 0$ and $B'$ is an $(s-1) \times (t-1)$ matrix. By the inductive hypothesis $B'$ can be reduced to a matrix $D' = \operatorname{diag}(d'_2, d'_3, \ldots)$ in Smith normal form using elementary operations, as $\min\{s-1, t-1\} = \min\{s, t\} - 1$. Hence $A$ can be changed using elementary operations into a diagonal matrix $D_1 = \operatorname{diag}(b_1, d'_2, d'_3, \ldots)$ having non-negative entries such that $d'_i \mid d'_{i+1}$ for $1 < i < \min\{s, t\}$. By Lemma 1.10 there is a sequence of at most five elementary operations which changes the $2 \times 2$ matrix $\operatorname{diag}(b_1, d'_2)$ into $\operatorname{diag}(d_1, b_2)$, where $d_1 = \gcd\{b_1, d'_2\}$ and $b_2 = \operatorname{lcm}\{b_1, d'_2\}$. Applying this sequence of operations to $D_1$ produces $D_2 = \operatorname{diag}(d_1, b_2, d'_3, d'_4, \ldots)$. By the inductive hypothesis there is a sequence of elementary operations changing the $(s-1) \times (t-1)$ matrix $D'_2 = \operatorname{diag}(b_2, d'_3, d'_4, \ldots)$ into a matrix $\operatorname{diag}(d_2, d_3, d_4, \ldots)$ in Smith normal form. Hence $D_2$ can be changed into $D = \operatorname{diag}(d_1, d_2, d_3, \ldots)$ using elementary operations. Is $D$ in Smith normal form? As $d_1 = \gcd\{b_1, d'_2\}$ and $d'_2 \mid d'_i$, we see $d_1 \mid d'_i$ for $2 \leq i \leq \min\{s, t\}$. As $b_2 = \operatorname{lcm}\{b_1, d'_2\}$ we see $d_1 \mid b_1$ and $b_1 \mid b_2$, which give $d_1 \mid b_2$. So $d_1$ is a divisor of all the entries in $D'_2$, and hence $d_1$ is a divisor of all the entries in $\operatorname{diag}(d_2, d_3, d_4, \ldots)$, since elementary operations replace entries by integer linear combinations of entries. In particular $d_1 \mid d_2$ and so $D$ is in Smith normal form. As $A$ can be changed into $D_1$, $D_1$ into $D_2$ and $D_2$ into $D$ using elementary operations, we see that $A$ can be changed into $D$ using elementary operations. The induction is now complete. □
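The proof suggests a complete algorithm, sketched below in Python. It is a simplified variant under stated assumptions rather than a literal transcription: stage 1 diagonalises pivot by pivot with the Euclidean clearing loop used in Lemma 1.9 (dividing the entry being cleared by the pivot, swapping only on a non-zero remainder), and stage 2 tracks only the diagonal entries, replacing pairs $(l, m)$ by $(\gcd\{l, m\}, \operatorname{lcm}\{l, m\})$, which is exactly what Lemma 1.10 achieves with elementary operations. It does not record $P$ and $Q$, and it sweeps pairs from the top rather than unravelling the induction from the bottom; by the uniqueness result of Section 1.3 the answer is the same.

```python
from math import gcd

def invariant_factors(A):
    """Invariant factors d_1 | d_2 | ... of an integer matrix A (list of rows)."""
    A = [list(row) for row in A]
    s, t = len(A), len(A[0])
    n = min(s, t)
    for p in range(n):                                     # stage 1: diagonalise
        while True:
            for j in range(p + 1, t):                      # clear row p by ecos
                while A[p][j] != 0:
                    if A[p][p] != 0:
                        q = A[p][j] // A[p][p]
                        for r in range(s):
                            A[r][j] -= q * A[r][p]
                    if A[p][j] != 0:
                        for r in range(s):
                            A[r][p], A[r][j] = A[r][j], A[r][p]
            for i in range(p + 1, s):                      # clear column p by eros
                while A[i][p] != 0:
                    if A[p][p] != 0:
                        q = A[i][p] // A[p][p]
                        A[i] = [x - q * y for x, y in zip(A[i], A[p])]
                    if A[i][p] != 0:
                        A[p], A[i] = A[i], A[p]
            if all(A[p][j] == 0 for j in range(p + 1, t)):  # row p may have refilled
                break
        if A[p][p] < 0:
            A[p] = [-x for x in A[p]]
    d = [A[i][i] for i in range(n)]
    for i in range(n):                                     # stage 2: enforce d_i | d_{i+1}
        for j in range(i + 1, n):
            g = gcd(d[i], d[j])
            d[i], d[j] = g, (d[i] * d[j] // g if g else 0)
    return d

print(invariant_factors([[4, 16, 4], [8, 42, 28], [12, 54, 25]]))   # [1, 2, 20]
```

The call reproduces the answer of Example 1.12 below; stage 2 works because gcd and lcm of non-negative integers satisfy $\gcd \cdot \operatorname{lcm} = lm$, and each sweep leaves $d_i$ dividing every later entry.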

We have achieved our aim! So before going on it is perhaps worthwhile pausing to look back at the reduction process. Starting with any $s \times t$ matrix $A$ over $\mathbb{Z}$, this process first uses Lemma 1.9 to systematically clear all off-diagonal entries, producing a diagonal matrix $D''$, say, having non-negative entries. Secondly, Lemma 1.10 is used starting with the last pair of diagonal entries in $D''$ (on unravelling the induction) and continuing 'up' the diagonal to produce ultimately the Smith normal form $D = S(A)$ of the original matrix $A$. Here is a numerical example.


Example 1.12

Let
$$A = \begin{pmatrix} 4 & 16 & 4 \\ 8 & 42 & 28 \\ 12 & 54 & 25 \end{pmatrix}.$$

First use Lemma 1.9 to clear all off-diagonal entries in row 1 and col 1. Then apply Lemma 1.9 to the $2 \times 2$ submatrix which remains on deleting row 1 and col 1 of the resulting matrix to obtain $D''$:

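$$\begin{pmatrix} 4 & 16 & 4 \\ 8 & 42 & 28 \\ 12 & 54 & 25 \end{pmatrix}
\underset{c_3 - c_1}{\overset{c_2 - 4c_1}{\equiv}}
\begin{pmatrix} 4 & 0 & 0 \\ 8 & 10 & 20 \\ 12 & 6 & 13 \end{pmatrix}
\underset{r_3 - 3r_1}{\overset{r_2 - 2r_1}{\equiv}}
\begin{pmatrix} 4 & 0 & 0 \\ 0 & 10 & 20 \\ 0 & 6 & 13 \end{pmatrix}$$
$$\overset{c_3 - 2c_2}{\equiv}
\begin{pmatrix} 4 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 6 & 1 \end{pmatrix}
\underset{r_3 - r_2}{\overset{r_2 - r_3}{\equiv}}
\begin{pmatrix} 4 & 0 & 0 \\ 0 & 4 & -1 \\ 0 & 2 & 2 \end{pmatrix}$$
$$\underset{r_2 \leftrightarrow r_3}{\overset{r_2 - 2r_3}{\equiv}}
\begin{pmatrix} 4 & 0 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & -5 \end{pmatrix}
\underset{-r_3}{\overset{c_3 - c_2}{\equiv}}
\begin{pmatrix} 4 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 5 \end{pmatrix} = D''.$$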

Next apply Lemma 1.10 to the submatrix $\begin{pmatrix} 2 & 0 \\ 0 & 5 \end{pmatrix}$ of $D''$, obtaining $D_1 = \operatorname{diag}(4,1,10)$, $D_2 = \operatorname{diag}(1,4,10)$ and $S(A) = D = \operatorname{diag}(1,2,20)$ as in the proof of Theorem 1.11. The reader can check that this reduction of $A$ to $D$ uses 26 elementary operations; in fact the reduction can be done in fewer than half this number. Inevitably the general technique of Theorem 1.11 rarely yields the shortest reduction in a particular case (see Exercises 1.2, Question 6(b) for a particularly awkward matrix). The merit of the reduction method is not that the least number of eros and ecos is used, but rather that every matrix $A$ over $\mathbb{Z}$, no matter what its entries are, can be reduced in this way to $D$ as in Definition 1.6.

From Theorem 1.11 we see that $d_1$, the gcd of the entries in $A$, can always be created by applying elementary operations to $A$; in particular cases this may be easy to see, and indeed $d_1$ may be present as an entry in $A$. In the latter case move $d_1$ into the $(1,1)$-position and clear all other entries in row 1 and col 1, obtaining
$$\begin{pmatrix} d_1 & 0 \\ 0 & A' \end{pmatrix}.$$

As $d_1$ is a divisor of all the entries in $A'$, we see
$$S(A) = \begin{pmatrix} d_1 & 0 \\ 0 & S(A') \end{pmatrix}$$
showing that the reduction of $A$ is completed by reducing the smaller matrix $A'$.

We close this section by making two deductions from Theorem 1.11 which will be useful later. The first deduction should come as no surprise to the reader.

Corollary 1.13

Let $A$ be an $s \times t$ matrix over $\mathbb{Z}$. There are invertible matrices $P$ and $Q$ over $\mathbb{Z}$ such that $PAQ^{-1} = D$ where $D$ is in Smith normal form.

Proof

By Theorem 1.11 there is a sequence of elementary operations reducing $A$ to $D$ in Smith normal form. Let $P = P_u \cdots P_2P_1$ where $P_i$ is the elementary matrix corresponding to the $i$th ero used in the reduction. Then $P$ is invertible over $\mathbb{Z}$, being a product of invertible matrices over $\mathbb{Z}$. Similarly let $Q = Q_v^{-1} \cdots Q_2^{-1}Q_1^{-1}$ where $Q_j$ is the elementary matrix corresponding to the $j$th eco used in the reduction. As each $Q_j$ is invertible over $\mathbb{Z}$, so also is $Q$, and $Q^{-1} = Q_1Q_2 \cdots Q_v$. By Lemma 1.4 and the discussion following it, we deduce that $PAQ^{-1} = P_u \cdots P_2P_1AQ_1Q_2 \cdots Q_v = D$. □

Can every invertible matrix $P$ over $\mathbb{Z}$ be built up as a product of elementary matrices? We show next that the answer is: Yes! Further $P$ can be reduced to the identity matrix $I$, which is its Smith normal form, using elementary operations over $\mathbb{Z}$ of just one kind, that is, using eros only or ecos only.

Corollary 1.14

Let $P$ be an invertible $s \times s$ matrix over $\mathbb{Z}$. Then $P$ is expressible as a product of elementary matrices over $\mathbb{Z}$. Also $P$ can be reduced to the identity matrix $I$ both using eros only and using ecos only.

Proof

By Corollary 1.13 there are invertible matrices $P'$ and $Q'$ over $\mathbb{Z}$ with $P'P(Q')^{-1} = D$, where $D = \operatorname{diag}(d_1, d_2, \ldots, d_s)$, $d_i \ge 0$. As $D$ is a product of invertible matrices over $\mathbb{Z}$, we see that $D$ itself is invertible over $\mathbb{Z}$. So each $d_i$ is an invertible integer, that is, $d_i = \pm 1$. Hence $d_i = 1$ for $1 \le i \le s$ and $D = I$ as each $d_i$ is non-negative. So $P'P(Q')^{-1} = I$ and hence
$$P = (P')^{-1}Q' = P_1^{-1}P_2^{-1} \cdots P_u^{-1}Q_v^{-1} \cdots Q_2^{-1}Q_1^{-1}$$
as in Corollary 1.13, which expresses the invertible matrix $P$ as a product of elementary matrices over $\mathbb{Z}$.

The equation $P = (P')^{-1}Q'$ gives $P^{-1} = (Q')^{-1}P'$ and so $(Q')^{-1}P'P = I$, that is, $Q_1Q_2 \cdots Q_vP_u \cdots P_2P_1P = I$, which shows that $P$ can be reduced to $I$ using eros only: after using the eros in the reduction unchanged to obtain $P'P$, each eco used in the reduction is replaced by the ero paired to it (the ero whose elementary matrix is the same $Q_j$). The reduction to $I$ is completed by applying these eros, in the opposite order, to $P'P$.

In the same way, the equation $P(Q')^{-1}P' = I$, that is,
$$PQ_1Q_2 \cdots Q_vP_u \cdots P_2P_1 = I,$$
shows that $P$ can be reduced to $I$ using ecos only. □

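As an illustration of Corollary 1.14, the matrix $P = \begin{pmatrix} 13 & 9 \\ 36 & 25 \end{pmatrix}$ is invertible over $\mathbb{Z}$ since $\det P = 1$ (the role of determinants is discussed in the next chapter). Using the method of Lemma 1.9, the sequence $c_1 - c_2$, $c_2 - 2c_1$, $c_1 - 4c_2$, $c_1 \leftrightarrow c_2$, $r_2 - 3r_1$, $-r_2$ reduces $P$ to $I$, and so $P_2P_1PQ_1Q_2Q_3Q_4 = I$ in terms of the corresponding elementary matrices. Hence $P = P_1^{-1}P_2^{-1}Q_4^{-1}Q_3^{-1}Q_2^{-1}Q_1^{-1}$, that is,
$$\begin{pmatrix} 13 & 9 \\ 36 & 25 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 4 & 1 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$$
showing explicitly that $P$ is a product of elementary matrices. As $P^{-1} = Q_1Q_2Q_3Q_4P_2P_1$, the equation $P^{-1}P = I$ shows by Lemma 1.4 that the sequence $r_2 - 3r_1$, $-r_2$, $r_1 \leftrightarrow r_2$, $r_2 - 4r_1$, $r_1 - 2r_2$, $r_2 - r_1$ of eros reduces $P$ to $I$. Similarly the equation $PP^{-1} = I$ produces the sequence $c_1 - c_2$, $c_2 - 2c_1$, $c_1 - 4c_2$, $c_1 \leftrightarrow c_2$, $-c_2$, $c_1 - 3c_2$ of ecos which also reduces $P$ to $I$.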
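The arithmetic in this illustration is easy to check mechanically. The Python sketch below multiplies out the six displayed elementary matrices and applies the two sequences of operations to $P$. The helper names `ero_add` and `eco_add` are our own, and rows and columns are indexed from 0 rather than 1.

```python
def matmul(X, Y):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def ero_add(M, i, k, q):
    """Return the result of the ero r_i + q r_k (0-based row indices)."""
    M = [row[:] for row in M]
    M[i] = [a + q * b for a, b in zip(M[i], M[k])]
    return M


def eco_add(M, j, k, q):
    """Return the result of the eco c_j + q c_k (0-based column indices)."""
    return [[x + q * row[k] if c == j else x for c, x in enumerate(row)]
            for row in M]


P = [[13, 9], [36, 25]]
I = [[1, 0], [0, 1]]

# The six elementary factors displayed above, multiplied left to right.
factors = [[[1, 0], [3, 1]], [[1, 0], [0, -1]], [[0, 1], [1, 0]],
           [[1, 0], [4, 1]], [[1, 2], [0, 1]], [[1, 0], [1, 1]]]
product = I
for E in factors:
    product = matmul(product, E)

# Eros only: r2 - 3r1, -r2, r1 <-> r2, r2 - 4r1, r1 - 2r2, r2 - r1.
R = ero_add(P, 1, 0, -3)
R = [R[0], [-a for a in R[1]]]
R = [R[1], R[0]]
R = ero_add(R, 1, 0, -4)
R = ero_add(R, 0, 1, -2)
R = ero_add(R, 1, 0, -1)

# Ecos only: c1 - c2, c2 - 2c1, c1 - 4c2, c1 <-> c2, -c2, c1 - 3c2.
C = eco_add(P, 0, 1, -1)
C = eco_add(C, 1, 0, -2)
C = eco_add(C, 0, 1, -4)
C = [[row[1], row[0]] for row in C]
C = [[row[0], -row[1]] for row in C]
C = eco_add(C, 0, 1, -3)
```

After running it, `product` equals $P$, and both `R` and `C` equal $I$, as claimed in the text.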

Let $P$ be an invertible $s \times s$ matrix over $\mathbb{Z}$ and let $Q$ be an invertible $t \times t$ matrix over $\mathbb{Z}$. Applying Corollary 1.14 to $P$ and $Q^{-1}$ we see that these matrices are expressible as products of elementary matrices. By Lemma 1.4 the matrix $B = PAQ^{-1}$ can be obtained from the arbitrary $s \times t$ matrix $A$ over $\mathbb{Z}$ by a sequence of elementary operations; in other words, using Definition 1.5 we see
$$A \equiv B \text{ if and only if } A \text{ is obtainable from } B \text{ by elementary operations.}$$

In Section 1.3 we continue our study of $PAQ^{-1}$, concentrating on those properties which remain unchanged when elementary operations are carried out.

One final word. The reader will be aware of the fundamental theorem of arithmetic: every positive integer can be uniquely expressed in the form
$$p_1^{n_1}p_2^{n_2} \cdots p_k^{n_k}$$
where each $p_i$ is prime ($p_i > 1$ and the only integer divisors of $p_i$ are $\pm 1$, $\pm p_i$) for $1 \le i \le k$ with $p_1 < p_2 < \cdots < p_k$, each $n_i$ being a positive integer and $k \ge 0$. Because of the difficulty of factorising integers in the above form, it is important that the reduction of every matrix $A$ over $\mathbb{Z}$ to its Smith normal form $D$ can be carried out in a systematic way without using this theorem.

EXERCISES 1.2

1. (a) Determine all $2 \times 2$ matrices $A$ over $\mathbb{Z}$ such that applying $r_1 \leftrightarrow r_2$ to $A$ has the same effect as applying $c_1 \leftrightarrow c_2$ to $A$.
Hint: Use Lemma 1.4 and $T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.

(b) Use the rows $e_i$ of the identity matrix $I$ to describe the elementary matrix $P_1$ corresponding to the ero $r_j + lr_k$ ($j \ne k$) and hence write down a proof of Lemma 1.4 in this case.

(c) Use the columns $e_i^T$ of $I$ to describe the elementary matrix $Q_1$ corresponding to the eco $c_j \leftrightarrow c_k$ ($j \ne k$), and hence write down a proof of Lemma 1.4 in this case.

(d) Show that $\equiv$ as defined in Definition 1.5 is an equivalence relation on the set of all $s \times t$ matrices over $\mathbb{Z}$, that is,

(i) $A \equiv A$ for all such matrices $A$,
(ii) $A \equiv B \Rightarrow B \equiv A$,
(iii) $A \equiv B$ and $B \equiv C \Rightarrow A \equiv C$,

where $A$, $B$, $C$ are $s \times t$ matrices over $\mathbb{Z}$.

2. List the eight $3 \times 3$ matrices $D$ in Smith normal form such that $\det D$ is a divisor of 12. How many $s \times s$ matrices $D$ in Smith normal form are there with

(i) $\det D = 105$, (ii) $\det D = 100$?

3. The $s \times t$ matrix $A = (a_{ij})$ over $\mathbb{Z}$ is such that $a_{11} > 0$, $a_{i1} = 0$ for $2 \le i \le s$, and $a_{11} = \gcd\{a_{11}, a_{12}, \ldots, a_{1t}\}$. List the $t - 1$ ecos needed to reduce $A$ to $B$ as in Lemma 1.9.

4. (a) Reduce the $4 \times 4$ matrix $A = \operatorname{diag}(10, 1, 5, 25)$ to Smith normal form $D$ using 7 elementary operations.

(b) Reduce the $4 \times 4$ matrix $A = \operatorname{diag}(12, 5, 50, 200)$ to Smith normal form $D$ using 15 elementary operations.

(c) Let $D' = \operatorname{diag}(d_1, d_2, \ldots, d_s)$ be an $s \times s$ matrix in Smith normal form and let $d$ be a positive integer. Show that the $(s+1) \times (s+1)$ matrix $D_1 = \operatorname{diag}(d, d_1, d_2, \ldots, d_s)$ can be reduced to its Smith normal form $S(D_1)$ using at most $5s$ elementary operations.


(d) Show that every $2 \times 2$ diagonal matrix over $\mathbb{Z}$ can be reduced to Smith normal form using at most 5 elementary operations. Reduce $\operatorname{diag}(18, -24)$ to Smith normal form using 5 elementary operations. For $s \ge 2$ show that every $s \times s$ diagonal matrix over $\mathbb{Z}$ can be reduced to Smith normal form using at most $(5/2)s(s-1)$ elementary operations.
Hint: Use Lemma 1.10 throughout.

5. (a) Show that an invertible $s \times s$ matrix $P$ over $\mathbb{Z}$ can be reduced to $I = S(P)$ by $s(s-1)/2$ applications of Lemma 1.7 together with at most $1 + s(s-1)/2$ eros.
Hint: The entries in each row of $P$ have gcd 1.

(b) Use the method of Corollary 1.14 to reduce the following invertible matrices over $\mathbb{Z}$ to $I$:
$$P = \begin{pmatrix} 3 & 1 \\ 17 & 6 \end{pmatrix}, \qquad Q = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 7 & 7 \\ 3 & 23 & 15 \end{pmatrix}$$

(c) Express $P$ above as a product of elementary matrices over $\mathbb{Z}$.
(d) Specify a sequence of eros reducing $Q$ to $I$.
(e) Specify a sequence of ecos reducing $Q$ to $I$.

6. (a) Reduce $A = \begin{pmatrix} 390 & 780 \\ 330 & 667 \end{pmatrix}$ to Smith normal form $D$ using the technique of Lemma 1.9.

(b) The sequence $a_0, a_1, a_2, \ldots$ of positive integers is defined by $a_0 = 1$, $a_1 = 2$, $a_{n+1} = a_n(4a_{n-1} + 1)$ for $n \ge 1$. Calculate $a_n$ for $2 \le n \le 4$.
Show that $\gcd\{a_{n+1}, 2a_n\} = a_n$ for $n \ge 1$.
Show that $a_{n+1} = 2(4a_{n-1} + 1)(4a_{n-2} + 1) \cdots (4a_0 + 1)$ for $n \ge 1$ and deduce $a_n/a_{n-r+1} \equiv 1 \pmod{4a_{n-r}}$ for $1 \le r \le n$.
Using the method of Lemma 1.9, reduce $\begin{pmatrix} a_4 & 2a_3 \\ 0 & -1 \end{pmatrix}$ to a diagonal matrix.

Let $A_n = \begin{pmatrix} a_n & 2a_{n-1} \\ 0 & -1 \end{pmatrix}$ where $n \ge 3$. Using the notation of Lemma 1.9 show that $B_r$ or $B_r^T$ is equal to
$$\begin{pmatrix} a_{n-r} & 2a_{n-r-1}(a_n/a_{n-r+1}) \\ 0 & -(a_n/a_{n-r}) \end{pmatrix}$$
according as $r$ is even or odd for $1 \le r < n$.
Hint: Use induction on $r$, applying the eco $c_2 - 2a_{n-r}q_{n-r+1}c_1$ to $B_{r-1}$ ($r$ odd, $r > 1$) and the ero $r_2 - 2a_{n-r}q_{n-r+1}r_1$ to $B_{r-1}$ ($r$ even, $r > 2$), where $a_n/a_{n-r+2} = 1 + q_{n-r+1}a_{n-r+1}$.
Conclude $B_n = \operatorname{diag}(2, -a_n/2)$. Hence show that the algorithm of Theorem 1.11 requires $3n + 1$ elementary operations to reduce $A_n$ to $S(A_n) = \operatorname{diag}(1, a_n)$ where $n \ge 3$.
Specify a sequence of four elementary operations which reduces $A_n$ to its Smith normal form $S(A_n)$.


7. Let $G$ denote the group of all pairs $(P, Q)$ where $P$ and $Q$ are invertible $s \times s$ and $t \times t$ matrices respectively over $\mathbb{Z}$, the group operation being componentwise multiplication.

(a) Let $D$ be an $s \times t$ matrix over $\mathbb{Z}$. Verify that the 'centraliser' $Z(D) = \{(P, Q) \in G : PD = DQ\}$ is a subgroup of $G$.
Hint: Show that $Z(D)$ is closed under multiplication, closed under inversion, and contains the identity $(I, I)$ of $G$.

(b) Suppose $s = t$ and $D = \operatorname{diag}(d_1, d_2, \ldots, d_t)$ is in Smith normal form with $d_t > 0$. Write $(P, Q) = ((p_{ij}), (q_{ij})) \in G$. Show that $(P, Q) \in Z(D) \Leftrightarrow p_{ji} = (d_j/d_i)q_{ji}$, $q_{ij} = (d_j/d_i)p_{ij}$ for all $i \le j$.

(c) You and your classmate reduce the $t \times t$ matrix $A$ to Smith normal form $D$ in different ways. You get $P'A = DQ'$ and your classmate gets $P''A = DQ''$. Show that $(P', Q')(P'', Q'')^{-1} \in Z(D)$.

(d) Use the two reductions of $A$ in Example 1.1 to derive an element in $Z(\operatorname{diag}(2,4))$ of the type $(P', Q')(P'', Q'')^{-1}$.

(e) Modify part (b) to cover the case $d_t = 0$.

1.3 Uniqueness of the Smith Normal Form

Let $A$ be an $s \times t$ matrix over $\mathbb{Z}$. Our task in this section is to show that $A$ can be reduced to only one matrix $D$ in Smith normal form. A closely related question is: what do the matrices $A$ and $PAQ^{-1}$ have in common, where $P$ and $Q$ are invertible over $\mathbb{Z}$? In other words, which properties of $A$ are preserved when $A$ undergoes elementary operations? The reader will know that for matrices over a field $F$ the rank of a matrix is all that matters: two $s \times t$ matrices over $F$ are equivalent if and only if their ranks are equal. The matrix $A$ over $\mathbb{Z}$ can be regarded as a matrix over the rational field $\mathbb{Q}$; the rank of $A$ is then the number of non-zero invariant factors $d_j$ of $A$, but this single number is not enough to determine equivalence over $\mathbb{Z}$: equivalent matrices over $\mathbb{Z}$ have the same rank, but matrices over $\mathbb{Z}$ of equal rank can be inequivalent over $\mathbb{Z}$. For instance

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$$
both have rank 2 but are not equivalent over $\mathbb{Z}$, as $J$ is not invertible over $\mathbb{Z}$ whereas all matrices equivalent to $I$ are invertible over $\mathbb{Z}$. However, as we'll see shortly, the property which 'does the trick', that is, determines equivalence over $\mathbb{Z}$, is a combination of two concepts already known to the reader, namely gcds and determinants.

We begin the details by introducing gcds of any finite collection of integers. Let $X = \{l_1, l_2, \ldots, l_n\}$ be a non-empty set of integers. An integer $d$ is called a gcd of $X$ if


(i) $d \mid l_i$ for $1 \le i \le n$,
(ii) whenever an integer $d'$ satisfies $d' \mid l_i$ for $1 \le i \le n$, then $d' \mid d$.

Condition (i) says that $d$ is a common divisor of the integers in $X$. Condition (ii) says that every common divisor $d'$ of the integers in $X$ is a divisor of $d$, and so $d$ is 'greatest' in this sense. Could $X$ have two gcds, $d_1$ and $d_2$ say? If so, then $d_1$ being a common divisor and $d_2$ being a greatest common divisor gives $d_1 \mid d_2$. Reversing the roles of $d_1$ and $d_2$ gives $d_2 \mid d_1$. Therefore $d_1 = \pm d_2$, showing that $X$ has at most one non-negative gcd.

The reader will know, in the case $X = \{l_1, l_2\}$, that the non-negative integer $d = \gcd\{l_1, l_2\}$ satisfies conditions (i) and (ii) above. Further, apart from the special case $l_1 = l_2 = 0$, $d = 0$, the integer $d$ can be singled out (characterised is the technical term) as the smallest positive integer expressible in the form $a_1l_1 + a_2l_2$ where $a_1$, $a_2$ are integers. We now use this property of $d$ to show that every finite set $X$ of $n$ integers has a gcd.

Consider the set $K$ of all integer linear combinations $k = b_1l_1 + b_2l_2 + \cdots + b_nl_n$ of the integers in $X = \{l_1, l_2, \ldots, l_n\}$; here the $l_i$ are arbitrary but fixed integers and the $b_i$ range over all integers. It is straightforward to verify that $K$ is an ideal of $\mathbb{Z}$, that is, $K$ is a subset of $\mathbb{Z}$ satisfying:

(i) $k_1 + k_2 \in K$ for all $k_1, k_2 \in K$ ($K$ is closed under addition),
(ii) $0 \in K$ ($K$ contains the zero integer),
(iii) $-k \in K$ for all $k \in K$ ($K$ is closed under negation),
(iv) $bk \in K$ for all $b \in \mathbb{Z}$, $k \in K$ ($K$ is closed under integer multiplication).

Conditions (i), (ii), (iii) together tell us that $K$ is an additive subgroup of $\mathbb{Z}$, a concept which we'll study in Chapter 2. Condition (iv) says that all integer multiples of integers in $K$ are again in $K$; this condition becomes important when $\mathbb{Z}$ is replaced by a more general ring, as we will see later. We write
$$K = \langle l_1, l_2, \ldots, l_n \rangle$$
and describe $K$ as the ideal generated by $l_1, l_2, \ldots, l_n$.

The set $\langle 2 \rangle$ of even integers is an ideal of $\mathbb{Z}$. More generally
$$\langle d \rangle = \{md : m \in \mathbb{Z}\} \quad\text{for } d \in \mathbb{Z}$$
is an ideal of $\mathbb{Z}$, that is, the subset $\langle d \rangle$, consisting of all integer multiples of the given integer $d$, satisfies (i) to (iv) above. For instance
$$\langle 6 \rangle = \{\ldots, -18, -12, -6, 0, 6, 12, 18, \ldots\} = \langle -6 \rangle.$$
Notice that the smallest positive integer in $\langle 6 \rangle$ is, you've guessed it, none other than 6; this 'obvious' observation will help our understanding of the next theorem. Notice $\langle 1 \rangle = \mathbb{Z}$ as every integer is an integer multiple of 1, and $\langle 0 \rangle = \{0\}$ as 0 is the only integer which is a multiple of 0. An ideal of the type $\langle d \rangle$ is called a principal ideal of $\mathbb{Z}$. The integer $d$ is called a generator of the ideal $\langle d \rangle$. As $\langle d \rangle = \langle -d \rangle$ we see that $-d$ is also a generator of $\langle d \rangle$. Does $\mathbb{Z}$ have any other ideals? We show next that the answer is: No!

Theorem 1.15

Each ideal $K$ of $\mathbb{Z}$ is principal and has a unique non-negative generator $d$.

Proof

Notice first that the zero ideal $K = \{0\}$ is principal with generator 0, since $\{0\} = \langle 0 \rangle$. Now suppose $K \ne \{0\}$ and so $k_0 \in K$ where $k_0$ is some non-zero integer. Then $-k_0 \in K$ by condition (iii). As one of $\pm k_0$ is positive, we see $K$ contains at least one positive integer. How can we find a generator of $K$?

Let $d$ denote the smallest positive integer in $K$ and hope for the best! Then $md \in K$ for all $m \in \mathbb{Z}$ by condition (iv), that is, $\langle d \rangle$ is a subset of $K$. The proof is completed by showing that every integer $k \in K$ belongs to $\langle d \rangle$. Divide $k$ by $d$ obtaining $q, r \in \mathbb{Z}$ with $k = qd + r$, $0 \le r < d$. As $k, -qd \in K$ we deduce $r = k - qd \in K$ by condition (i). But $r \in K$, $r < d$ shows that $r$ cannot be positive, $d$ being the smallest positive integer in $K$. As $0 \le r$, we conclude $r = 0$. Hence $k = qd$, that is, $k \in \langle d \rangle$ and so $K = \langle d \rangle$. So $d$ is a positive generator of $K$. Therefore every ideal $K$ of $\mathbb{Z}$ is principal with non-negative generator $d$.

Suppose $K = \langle d \rangle = \langle d' \rangle$ where $d$ and $d'$ are non-negative. Then $K = \{0\} \Rightarrow d = d' = 0$. Also $K \ne \{0\} \Rightarrow d = d'$ as the positive integers $d$ and $d'$ satisfy $d \mid d'$ and $d' \mid d$. □

An integral domain such that all its ideals are principal is called a Principal Ideal Domain (PID). So Theorem 1.15 tells us that $\mathbb{Z}$ is a PID. We now use Theorem 1.15 to show the existence of gcds.

Corollary 1.16

Let $X = \{l_1, l_2, \ldots, l_n\}$ be a set of $n$ integers, $n \ge 1$. Then $X$ has a unique non-negative gcd $d$. Further there are $n$ integers $a_1, a_2, \ldots, a_n$ such that $d = a_1l_1 + a_2l_2 + \cdots + a_nl_n$ and $\langle d \rangle = \langle l_1, l_2, \ldots, l_n \rangle$.

Proof

We already know that $X$ cannot have two non-negative gcds. To show that $X$ does have a non-negative gcd, consider the ideal $\langle l_1, l_2, \ldots, l_n \rangle$ of $\mathbb{Z}$ consisting of all linear combinations $b_1l_1 + b_2l_2 + \cdots + b_nl_n$ of the integers $l_i$ in $X$ where the $b_i$ are arbitrary integers. It's convenient to write $\langle l_1, l_2, \ldots, l_n \rangle = \langle X \rangle$. From Theorem 1.15 we know that $\langle X \rangle = \langle d \rangle$ for some non-negative integer $d$, which is a concise description of $\langle X \rangle$. As $d \in \langle X \rangle$ we see that $d$ is some linear combination of the generators $l_i$, that is, for suitable integers $b_i = a_i$ we obtain $d = a_1l_1 + a_2l_2 + \cdots + a_nl_n$. So $d$ is expressible as stated. The proof is completed by showing that $d$ is indeed a gcd of $X$, that is, conditions (i) and (ii) for gcds are satisfied by $d$.

Take $b_i = 1$ and $b_j = 0$ for $j \ne i$ to obtain $l_i \in \langle X \rangle$ for $1 \le i \le n$. So each integer $l_i$ in $X$ belongs to $\langle X \rangle = \langle d \rangle$, and this means $d \mid l_i$ for $1 \le i \le n$. Therefore $d$ satisfies condition (i). To verify condition (ii) consider $d' \in \mathbb{Z}$ satisfying $d' \mid l_i$ for $1 \le i \le n$. Then $l_i = q_id'$ for some $q_i \in \mathbb{Z}$. Substituting for $l_i$ in the above expression for $d$ produces $d = (a_1q_1 + a_2q_2 + \cdots + a_nq_n)d'$, that is, $d' \mid d$ as $a_1q_1 + a_2q_2 + \cdots + a_nq_n$ is an integer. So $d$ satisfies condition (ii). The conclusion is: $d$ is a non-negative gcd of $X$. □

We write $\gcd X$ for the unique non-negative gcd of the set $X$.

The integer $d = \gcd\{l_1, l_2, \ldots, l_n\}$ can be found by applying the Euclidean algorithm $n - 1$ times. For example let $d = \gcd X$ where $X = \{231, 385, 495\}$. Then $d = \gcd\{\gcd\{231, 385\}, 495\} = \gcd\{77, 495\} = 11$. The ideal $K = \{b_1 \times 231 + b_2 \times 385 + b_3 \times 495 : b_1, b_2, b_3 \in \mathbb{Z}\}$ generated by 231, 385 and 495 is the principal ideal generated by 11, that is, $K = \langle 11 \rangle$. Also
$$11 = 13 \times 77 - 2 \times 495 = 13(2 \times 231 - 385) - 2 \times 495 = 26 \times 231 - 13 \times 385 - 2 \times 495$$
shows explicitly that 11 belongs to $K$. The job of verifying the details is left to the reader: apply the Euclidean algorithm to the pair 231, 385 to get 77 as their gcd, and then apply the Euclidean algorithm to the pair 77, 495 obtaining 11 as their gcd. Finally express 11 as an integer linear combination of the integers in $X$ by tracing backwards the steps in these algorithms.
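The procedure just described can be sketched in Python as follows; the function names are our own. Instead of tracing the steps backwards at the end, the two-integer step carries the coefficients along as it goes, which produces the same kind of integer linear combination.

```python
def ext_gcd(a, b):
    """Return (g, x, y) with g = gcd{a, b} >= 0 and g == x*a + y*b."""
    # Invariant: a == x0*a_in + y0*b_in and b == x1*a_in + y1*b_in.
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q, r = divmod(a, b)          # a = q*b + r
        a, b = b, r
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    if a < 0:                        # choose the non-negative gcd
        a, x0, y0 = -a, -x0, -y0
    return a, x0, y0


def gcd_combination(X):
    """Return (d, coeffs) with d = gcd X >= 0 and d equal to the sum of c*l.

    Folds the two-integer step over X: gcd{0, l_1} = |l_1| is trivial, so
    the Euclidean algorithm proper is used n - 1 times.
    """
    d, coeffs = 0, []
    for l in X:
        d, u, v = ext_gcd(d, l)      # new d == u*(old d) + v*l
        coeffs = [u * c for c in coeffs] + [v]
    return d, coeffs
```

For $X = \{231, 385, 495\}$ the sketch returns $d = 11$ with coefficients $26, -13, -2$, the same combination as found by hand above; on the way, `ext_gcd(231, 385)` gives $77 = 2 \times 231 - 385$.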

Let $X' = \{l'_1, l'_2, \ldots, l'_{n'}\}$ be a set of $n'$ integers. In the application of this theory which we have in mind (Corollary 1.17), each $l'_j$ is an integer linear combination of the integers $l_i$; in other words, $X'$ is a subset of $\langle X \rangle$, that is, $X' \subseteq \langle X \rangle$.

For example let $X = \{231, 385, 495\}$ with gcd $d = 11$ as before, and let $X' = \{231 + 385, 231 + 495\} \subseteq \langle X \rangle$. So $X' = \{616, 726\}$, which has gcd $d' = 22$. Clearly $d$ is a divisor of $d'$. We now deal with the general case.

Corollary 1.17

Let $d$ and $d'$ be the non-negative gcds of $X$ and $X'$ respectively. Using the above notation
$$X' \subseteq \langle X \rangle \iff \langle X' \rangle \subseteq \langle X \rangle \iff d \mid d'.$$


Proof

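Suppose $X' \subseteq \langle X \rangle$. A typical element of $\langle X' \rangle$ is $k' = b'_1l'_1 + b'_2l'_2 + \cdots + b'_{n'}l'_{n'}$ where the $b'_j$ are integers. As each $l'_j$ belongs to the ideal $\langle X \rangle$, by property (iv) of ideals we see $b'_jl'_j \in \langle X \rangle$. By property (i) of ideals and induction on $n'$ we deduce $k' \in \langle X \rangle$ and so $\langle X' \rangle \subseteq \langle X \rangle$.

Suppose $\langle X' \rangle \subseteq \langle X \rangle$, that is, $\langle d' \rangle \subseteq \langle d \rangle$ by Corollary 1.16. As $d' \in \langle d' \rangle$ we see $d' \in \langle d \rangle$, which means $d' = qd$ for some integer $q$, that is, $d \mid d'$.

Suppose $d \mid d'$. Then $d' = qd$ with $q \in \mathbb{Z}$. For each $l'_j$ in $X'$ there is $q'_j$ in $\mathbb{Z}$ with $l'_j = q'_jd'$. Hence $l'_j = q'_jqd$, showing $l'_j \in \langle d \rangle = \langle X \rangle$ for $1 \le j \le n'$ since $q'_jq \in \mathbb{Z}$. Therefore $X' \subseteq \langle X \rangle$, which gets us back to the original assumption. □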

All we need to know about gcds is contained in Corollary 1.17, which can be expressed
$$\langle X' \rangle \subseteq \langle X \rangle \iff \gcd X \mid \gcd X'.$$

We now turn our attention to determinants. Let $A$ be an $s \times t$ matrix over $\mathbb{Z}$ and let $l$ be an integer in the range $1 \le l \le \min\{s,t\}$. Suppose $l$ rows and $l$ columns of $A$ are selected. The determinant of the $l \times l$ matrix which remains on deleting the unselected $s - l$ rows and $t - l$ columns of $A$ is called an $l$-minor of $A$.

So the $l$-minors of $A$ are integers. The number of $l$-minors of $A$ is the product $\binom{s}{l}\binom{t}{l}$ of binomial coefficients, and we'll be interested in the non-negative gcd of the $l$-minors of $A$.

For example, the gcd of the 1-minors of $A = \begin{pmatrix} 6 & 4 & 0 \\ 8 & 8 & 4 \end{pmatrix}$ is 2, as the 1-minors are simply the entries and $\gcd\{6, 4, 0, 8, 8, 4\} = 2$. The 2-minors of $A$ are
$$\begin{vmatrix} 6 & 4 \\ 8 & 8 \end{vmatrix} = 16, \qquad \begin{vmatrix} 6 & 0 \\ 8 & 4 \end{vmatrix} = 24, \qquad \begin{vmatrix} 4 & 0 \\ 8 & 4 \end{vmatrix} = 16$$
and $\gcd\{16, 24, 16\} = 8$. The importance of these gcds comes from the fact, proved in Corollary 1.20, that they remain unchanged when elementary operations over $\mathbb{Z}$ are applied to $A$. From Theorem 1.11 we know that $A$ can be reduced to $D = \begin{pmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \end{pmatrix}$ in Smith normal form using elementary operations over $\mathbb{Z}$. The set $\{d_1, 0, 0, 0, d_2, 0\}$ of 1-minors of $D$ has gcd $d_1$, and the set
$$\left\{ \begin{vmatrix} d_1 & 0 \\ 0 & d_2 \end{vmatrix}, \begin{vmatrix} d_1 & 0 \\ 0 & 0 \end{vmatrix}, \begin{vmatrix} 0 & 0 \\ d_2 & 0 \end{vmatrix} \right\} = \{d_1d_2, 0, 0\}$$
of 2-minors of $D$ has gcd $d_1d_2$. Assuming Corollary 1.20 for the moment, we deduce $d_1 = 2$, $d_1d_2 = 8$ on comparing these gcds, giving $D = \operatorname{diag}(2, 4)$. The conclusion is that $A$ can be reduced to one and only one matrix in Smith normal form.
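The $l$-minors and their gcd are straightforward to compute for small matrices. The following Python sketch, with helper names `det`, `minors` and `g` of our own choosing, uses cofactor expansion for determinants; this is exact for integers but only practical for small $l$.

```python
from itertools import combinations
from math import gcd


def det(M):
    """Determinant of a square integer matrix by cofactor expansion along row 1.

    Exact for Python ints, but exponential in the size: small matrices only.
    """
    if not M:
        return 1                     # the empty (0 x 0) determinant
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def minors(A, l):
    """List the l-minors of A, one per choice of l rows and l columns."""
    s, t = len(A), len(A[0])
    return [det([[A[i][j] for j in cols] for i in rows])
            for rows in combinations(range(s), l)
            for cols in combinations(range(t), l)]


def g(A, l):
    """Non-negative gcd of the l-minors of A (math.gcd(0, m) == abs(m))."""
    d = 0
    for m in minors(A, l):
        d = gcd(d, m)
    return d
```

For $A = \begin{pmatrix} 6 & 4 & 0 \\ 8 & 8 & 4 \end{pmatrix}$ it reproduces the 2-minors 16, 24, 16 and the gcds 2 and 8 found above.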


The reader will be familiar with the $(t-1)$-minors of a $t \times t$ matrix $A = (a_{ij})$ over a field $F$ because they are, apart from sign, the entries in the adjugate matrix $\operatorname{adj} A$. In fact, $\operatorname{adj} A$ has $(j,i)$-entry $A_{ij}$ where $(-1)^{i+j}A_{ij}$ is the $(t-1)$-minor of $A$ obtained by deleting row $i$ and col $j$. Then $\operatorname{adj} A$ satisfies the matrix equation
$$A(\operatorname{adj} A) = (\det A)I = (\operatorname{adj} A)A$$
which includes, on comparing diagonal entries, the rules for expanding $\det A$ along any row or column, and explains why $A_{ij}$ is known as the cofactor of $a_{ij}$ in $\det A$. In fact the above equation holds for square matrices $A$ over any commutative ring $R$ (with identity element) and shows, together with the multiplicative property of determinants (reviewed in Theorem 1.18 below),
$$A \text{ is invertible over } R \iff \det A \text{ is an invertible element of } R.$$
Should $\det A$ be invertible over $R$, the above matrix equation can be divided throughout by $\det A$ to produce the familiar equations
$$AA^{-1} = I = A^{-1}A \quad\text{where}\quad A^{-1} = (1/\det A)\operatorname{adj} A.$$
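Over $\mathbb{Z}$ the adjugate gives an explicit inverse whenever $\det A = \pm 1$. The Python sketch below (function names are ours) builds $\operatorname{adj} A$ from cofactors exactly as described, with $(j,i)$-entry equal to the cofactor of $a_{ij}$, so that the equation $A(\operatorname{adj} A) = (\det A)I$ can be checked directly.

```python
def det(M):
    """Determinant by cofactor expansion along row 1 (small integer matrices)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def adjugate(A):
    """adj A, whose (j, i)-entry is the cofactor (-1)^(i+j) * minor(i, j) of a_ij."""
    n = len(A)

    def minor(i, j):
        # delete row i and column j of A, then take the determinant
        return det([row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i])

    # outer index j gives the row of adj A, inner index i its column
    return [[(-1) ** (i + j) * minor(i, j) for i in range(n)] for j in range(n)]
```

For the matrix $P = \begin{pmatrix} 13 & 9 \\ 36 & 25 \end{pmatrix}$ of the illustration following Corollary 1.14 we have $\det P = 1$, and the sketch gives $\operatorname{adj} P = \begin{pmatrix} 25 & -9 \\ -36 & 13 \end{pmatrix} = P^{-1}$.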

Since elementary operations can be carried out by matrix multiplication (Lemma 1.4), we now address the question: how are the $l$-minors of a matrix product $BC$ related to the $l$-minors of $B$ and the $l$-minors of $C$? The reader will know that a typical $l \times l$ submatrix of $BC$ is formed by multiplying certain $l$ rows of $B$ by certain $l$ columns of $C$. So we lose nothing by assuming that $B$ is an $l \times r$ matrix and $C$ is an $r \times l$ matrix, where $r$ is a positive integer, in which case $\det BC$ is the unique $l$-minor of $BC$. If $r < l$ then $\det BC = 0$ (see Exercises 1.3, Question 4(d)). We now assume $r \ge l$ and let $Y$ denote a subset of $\{1, 2, \ldots, r\}$ having $l$ elements; there are $\binom{r}{l}$ such subsets $Y$. Let $B_Y$ denote the $l \times l$ submatrix of $B$ obtained by deleting column $j$ for all $j \notin Y$. Similarly let ${}_YC$ denote the $l \times l$ submatrix of $C$ obtained by deleting row $j$ for all $j \notin Y$.

To help the reader through the next proof we look first at the case $l = 2$, $r = 3$. So
$$B = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{pmatrix} \quad\text{and}\quad C = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ c_{31} & c_{32} \end{pmatrix}.$$

The proof uses the columns of $B$, which we denote by $B_1, B_2, B_3$. Then $(B_1, B_3) = B_Y$ and $\det {}_YC = \begin{vmatrix} c_{11} & c_{12} \\ c_{31} & c_{32} \end{vmatrix} = c_{11}c_{32} - c_{12}c_{31}$ where $Y = \{1, 3\}$, and so on. The determinant of the $2 \times 2$ matrix $BC$ can be expressed as the sum of three terms $\det B_Y \det {}_YC$ where $Y$ runs through the subsets $\{1,2\}$, $\{1,3\}$, $\{2,3\}$ of $\{1,2,3\}$ as follows:


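$$\begin{aligned}
\det BC &= \begin{vmatrix} b_{11}c_{11} + b_{12}c_{21} + b_{13}c_{31} & b_{11}c_{12} + b_{12}c_{22} + b_{13}c_{32} \\ b_{21}c_{11} + b_{22}c_{21} + b_{23}c_{31} & b_{21}c_{12} + b_{22}c_{22} + b_{23}c_{32} \end{vmatrix} \\
&= |B_1c_{11} + B_2c_{21} + B_3c_{31},\; B_1c_{12} + B_2c_{22} + B_3c_{32}| \\
&= \sum_{i,j=1}^{3} |B_i, B_j|\,c_{i1}c_{j2} = \sum_{i \ne j} |B_i, B_j|\,c_{i1}c_{j2} \\
&= \sum_{i<j} |B_i, B_j|\,(c_{i1}c_{j2} - c_{i2}c_{j1}) = \sum_{Y} \det B_Y \det {}_YC
\end{aligned}$$
where $Y = \{i, j\}$, $i < j$.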

In the steps above we have used standard properties of the determinant function: it is multilinear and so $\det BC$ is a sum of 9 terms $|B_i, B_j|c_{i1}c_{j2}$. Three of these terms, those with $i = j$, are zero as $|B_i, B_i| = 0$. The 6 remaining terms occur in 3 pairs corresponding to the 3 subsets $Y = \{i, j\}$ as $|B_j, B_i| = -|B_i, B_j|$ for $i < j$. We have shown that the 2-minors of $BC$ (there is only one, namely $\det BC$) are expressible as integer linear combinations of the 2-minors of $B$ as well as integer linear combinations of the 2-minors of $C$.

We are now ready to state and prove a general theorem which was discovered independently by the French mathematicians Binet and Cauchy in 1812.

Theorem 1.18 (The Cauchy–Binet theorem over Z)

Let $B$ be an $l \times r$ matrix over $\mathbb{Z}$ and let $C$ be an $r \times l$ matrix over $\mathbb{Z}$ where $r \ge l$. For each subset $Y$ of $\{1, 2, \ldots, r\}$ having $l$ elements, let $B_Y$ be the $l \times l$ submatrix of $B$ formed by deleting column $j$ for all $j \notin Y$. Let ${}_YC$ be the $l \times l$ submatrix of $C$ formed by deleting row $j$ for all $j \notin Y$. Then
$$\det BC = \sum_Y \det B_Y \det {}_YC.$$

Proof

We first note, by way of encouragement, that two special cases of this theorem are already familiar to us: when $l = 1$ it is nothing more than the formula for an entry in a matrix product. When $l = r$ it is the multiplicative property of determinants $\det BC = \det B \det C$.

Write $B_i$ for column $i$ of $B$ and let $C = (c_{ij})$ where $1 \le i \le r$, $1 \le j \le l$. Then
$$(\text{column } j \text{ of } BC) = \sum_i B_ic_{ij} = \sum_{i_j} B_{i_j}c_{i_jj}$$
on replacing $i$ by $i_j$ where $1 \le i_j \le r$, as we need a different summation index for each of the $l$ columns of $BC$. Using the multilinear property of the determinant function we obtain
$$\det BC = \left| \sum_{i_1} B_{i_1}c_{i_11}, \sum_{i_2} B_{i_2}c_{i_22}, \ldots, \sum_{i_l} B_{i_l}c_{i_ll} \right| = \sum_{i_1, i_2, \ldots, i_l} \det(B_{i_1}, B_{i_2}, \ldots, B_{i_l})\,c_{i_11}c_{i_22} \cdots c_{i_ll}.$$

There are $r^l$ terms in the above summation but, as a determinant with two equal columns is zero, we need consider only the $r(r-1) \cdots (r-l+1) = r!/(r-l)!$ terms with $i_1, i_2, \ldots, i_l$ distinct, that is, no two of $i_1, i_2, \ldots, i_l$ are equal. Each such term gives rise to a subset $Y = \{i_1, i_2, \ldots, i_l\}$ of $\{1, 2, \ldots, r\}$ and so these $r!/(r-l)!$ terms partition into $r!/((r-l)!\,l!) = \binom{r}{l}$ equivalence classes of size $l!$, two terms being equivalent if they give rise to the same subset $Y$. We calculate $\det BC$ by summing up the terms in each equivalence class and finally adding these sums together.

Let $Y = \{j_1, j_2, \ldots, j_l\}$ where $1 \le j_1 < j_2 < \cdots < j_l \le r$ and suppose also that $Y = \{i_1, i_2, \ldots, i_l\}$. Then $i_1, i_2, \ldots, i_l$ is a permutation of $j_1, j_2, \ldots, j_l$ and so
$$|B_{i_1}, B_{i_2}, \ldots, B_{i_l}| = \pm|B_{j_1}, B_{j_2}, \ldots, B_{j_l}| = \pm\det B_Y$$
according as the above permutation is even (plus sign) or odd (minus sign); remember that a permutation is even/odd according as it is the product of an even/odd number of interchanges, and each interchange (of columns of $B$) produces a change of sign in the determinant. Adding the $l!$ terms with $\{i_1, i_2, \ldots, i_l\} = Y$ gives

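$$\sum_{\{i_1, \ldots, i_l\} = Y} |B_{i_1}, B_{i_2}, \ldots, B_{i_l}|\,c_{i_11}c_{i_22} \cdots c_{i_ll} = \det B_Y \left( \sum_{\{i_1, \ldots, i_l\} = Y} \pm c_{i_11}c_{i_22} \cdots c_{i_ll} \right) = \det B_Y \det {}_YC,$$
the last equality being the expansion of $\det {}_YC$ as a signed sum over permutations: ${}_YC$ consists of rows $j_1, j_2, \ldots, j_l$ of $C$, and the signs $\pm$ are exactly those of the permutations above. Therefore adding these sums, one for each subset $Y$ of $l$ integers from among $\{1, 2, \ldots, r\}$, gives the formula of Theorem 1.18. □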
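Theorem 1.18 can be tested numerically. The Python sketch below (names are ours) evaluates the right-hand side by looping over the $l$-element subsets $Y$, keeping the columns of $B$ and the rows of $C$ indexed by $Y$ in increasing order; it returns 0 when $r < l$, matching the remark that $\det BC = 0$ in that case.

```python
from itertools import combinations


def det(M):
    """Determinant by cofactor expansion along row 1 (small integer matrices)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def cauchy_binet_sum(B, C):
    """Sum over l-subsets Y of {0, ..., r-1} of det(B_Y) * det(_Y C).

    B is l x r and C is r x l.  If r < l there are no such subsets and the
    sum is 0.
    """
    l, r = len(B), len(C)
    total = 0
    for Y in combinations(range(r), l):           # Y listed in increasing order
        B_Y = [[row[j] for j in Y] for row in B]  # keep the columns of B in Y
        Y_C = [C[j] for j in Y]                   # keep the rows of C in Y
        total += det(B_Y) * det(Y_C)
    return total
```

With $B = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$ and $C = \begin{pmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{pmatrix}$ both sides equal 36, the three subsets $Y$ contributing $6$, $24$ and $6$.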

The theory of determinants is valid for matrices with entries from any commutative ring $R$, and the Cauchy–Binet theorem is also valid for such matrices as the above proof goes through unchanged. The formula of Theorem 1.18 is then an equality between elements of $R$ and we refer to this equation as the Cauchy–Binet theorem over $R$.


We are ready to put gcds and determinants together.

Let $g_l(A)$ denote the non-negative gcd of the $l$-minors of the $s \times t$ matrix $A$ over $\mathbb{Z}$ for $1 \le l \le \min\{s,t\}$. In the case of $A = \begin{pmatrix} 6 & 4 & 0 \\ 8 & 8 & 4 \end{pmatrix}$, as we saw earlier, $2 = g_1(A)$ and $8 = g_2(A)$.

Corollary 1.19

Let $B$ be an $s \times r$ matrix over $\mathbb{Z}$ and let $C$ be an $r \times t$ matrix over $\mathbb{Z}$. Then $g_l(B)$ and $g_l(C)$ are divisors of $g_l(BC)$ for $1 \le l \le \min\{r, s, t\}$.

Proof

Every $l \times l$ submatrix of $BC$ is of the type $B'C'$, where $B'$ is an $l \times r$ submatrix of $B$ and $C'$ is an $r \times l$ submatrix of $C$. By Theorem 1.18
$$\det B'C' = \sum_Y \det B'_Y \det {}_YC'.$$
Let $X$ denote the set of $l$-minors of $C$. Then $\det {}_YC'$ belongs to $X$. Let $X'$ denote the set of $l$-minors of $BC$. The above equation tells us that each integer in $X'$ is an integer linear combination of the integers in $X$. Therefore $X' \subseteq \langle X \rangle$ in the notation of Corollary 1.17. Now $\langle X \rangle = \langle g_l(C) \rangle$ and $\langle X' \rangle = \langle g_l(BC) \rangle$. So $g_l(C)$ is a divisor of $g_l(BC)$ by Corollary 1.17. Since $\det B'_Y$ is an $l$-minor of $B$, we deduce in a similar way that $g_l(B)$ is also a divisor of $g_l(BC)$. □

Therefore, just as $B$ and $C$ are factors of $BC$, so $g_l(B)$ and $g_l(C)$ are factors (divisors) of $g_l(BC)$.

We come next to the climax of the theory!

Corollary 1.20

Let $A$ and $B$ be $s \times t$ matrices over $\mathbb{Z}$. Then
$$A \equiv B \iff g_l(A) = g_l(B) \text{ for } 1 \le l \le \min\{s,t\}.$$
Further $A$ is equivalent to a unique matrix $D$ in Smith normal form (Definition 1.6).

Proof

Suppose $A \equiv B$. By Definition 1.5 there are invertible matrices $P$ and $Q$ over $\mathbb{Z}$ with $PAQ^{-1} = B$. From Corollary 1.19 we deduce that $g_l(A)$ is a divisor of $g_l(PA)$ and $g_l(PA)$ is a divisor of $g_l(PAQ^{-1}) = g_l(B)$. So $g_l(A)$ is a divisor of $g_l(B)$. But pre- and postmultiplying $PAQ^{-1} = B$ by $P^{-1}$ and $Q$ respectively gives $P^{-1}BQ = A$, showing $B \equiv A$. Hence the roles of $A$ and $B$ can be reversed in the first part of the proof to show that $g_l(B)$ is a divisor of $g_l(A)$. So $g_l(A) = g_l(B)$, as these integers are non-negative and each is a divisor of the other.

From Theorem 1.11 there is an $s \times t$ matrix $D = \operatorname{diag}(d_1, d_2, \ldots, d_{\min\{s,t\}})$ in Smith normal form with $A \equiv D$. Selecting the first $l$ rows and the first $l$ columns of $D$ we obtain an $l$-minor of $D$ equal to $d_1d_2 \cdots d_l$. In fact $g_l(D) = d_1d_2 \cdots d_l$ since every $l$-minor of $D$ is zero or a product $d_{j_1}d_{j_2} \cdots d_{j_l}$ where $1 \le j_1 < j_2 < \cdots < j_l \le \min\{s,t\}$, and $d_i \mid d_{j_i}$ for $1 \le i \le l$ as $i \le j_i$. By the first paragraph of the proof we deduce
$$g_l(A) = d_1d_2 \cdots d_l \quad\text{for } 1 \le l \le \min\{s,t\}.$$
These equations determine the invariant factors $d_1, d_2, \ldots, d_{\min\{s,t\}}$ of $A$ in terms of the integers $g_l(A)$ using induction on $l$: $d_1 = g_1(A)$ and for $l > 1$ either $d_l = g_l(A)/g_{l-1}(A)$ if $g_l(A) \ne 0$, or $d_l = 0$ if $g_l(A) = 0$. The conclusion is that $A$ is equivalent over $\mathbb{Z}$ to a unique matrix $D$ in Smith normal form.

Suppose $g_l(A) = g_l(B)$ for $1 \le l \le \min\{s,t\}$. By Theorem 1.11 and the above paragraph, $A$ and $B$ are equivalent to the same matrix $D$ in Smith normal form, that is, $A \equiv D$ and $B \equiv D$. There are invertible matrices $P$, $P'$, $Q$, $Q'$ over $\mathbb{Z}$ with $PAQ^{-1} = D$ and $P'B(Q')^{-1} = D$. Therefore $PAQ^{-1} = P'B(Q')^{-1}$, which rearranges to give $P''A(Q'')^{-1} = B$ where $P'' = (P')^{-1}P$ and $Q'' = (Q')^{-1}Q$. As $P''$ and $Q''$ are invertible over $\mathbb{Z}$, we conclude $A \equiv B$. □
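The proof shows how to read off the invariant factors from the gcds $g_l(A)$, which gives a second way to find $S(A)$ for small matrices, independent of the reduction of Theorem 1.11. A minimal Python sketch follows; the function names are our own, it uses the convention $g_0 = 1$, and it is only practical for small matrices since it enumerates all $l$-minors.

```python
from itertools import combinations
from math import gcd


def det(M):
    """Determinant by cofactor expansion along row 1 (small integer matrices)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def g(A, l):
    """g_l(A): the non-negative gcd of all l-minors of A."""
    d = 0
    for rows in combinations(range(len(A)), l):
        for cols in combinations(range(len(A[0])), l):
            d = gcd(d, det([[A[i][j] for j in cols] for i in rows]))
    return d


def invariant_factors(A):
    """d_1, ..., d_min(s,t) recovered from g_l(A) = d_1 d_2 ... d_l."""
    ds, previous = [], 1             # convention: g_0 = 1
    for l in range(1, min(len(A), len(A[0])) + 1):
        gl = g(A, l)
        # if g_l != 0 then g_(l-1) != 0 as well, so the division is safe
        ds.append(gl // previous if gl != 0 else 0)
        previous = gl
    return ds


def equivalent(A, B):
    """For s x t matrices A, B: equivalent over Z iff g_l(A) == g_l(B) for all l."""
    return all(g(A, l) == g(B, l) for l in range(1, min(len(A), len(A[0])) + 1))
```

For the matrix of Example 1.12 it gives $1, 2, 20$, and it confirms that the matrices $I$ and $J$ at the start of this section are inequivalent over $\mathbb{Z}$, since $g_2(I) = 1$ while $g_2(J) = 2$.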

From Theorem 1.11 and Corollary 1.20 we see that it is legitimate to refer to the Smith normal form $D = S(A)$ of a matrix $A$ over $\mathbb{Z}$ and the invariant factors $d_i = d_i(A)$ of $A$.

Our next and last theorem is relevant to the theory of subgroups and quotient groups of finitely generated abelian groups, which is covered in Section 3.3. How does the Smith normal form behave under matrix multiplication? Let $A$ be an $r \times s$ matrix over $\mathbb{Z}$ where $r \le s$. We write
$$S(A) = \operatorname{diag}(d_1(A), d_2(A), \ldots, d_r(A)) \quad\text{for the Smith normal form of } A$$
and so $d_k(A)$ is the $k$th invariant factor of $A$ for $1 \le k \le r$.

Theorem 1.21

Let $A$ be an $r \times s$ matrix over $\mathbb{Z}$ and $B$ an $s \times t$ matrix over $\mathbb{Z}$ where $r \le s \le t$. Suppose all the invariant factors of $A$, $B$ and $AB$ are positive. Then $d_k(A)$ and $d_k(B)$ are divisors of $d_k(AB)$ for $1 \le k \le r$.


Proof

There are invertible matrices $P_1, P_2, Q_1, Q_2$ over $\mathbb{Z}$ such that $P_1AQ_1^{-1} = S(A)$ and $P_2ABQ_2^{-1} = S(AB)$. On substituting $A = P_1^{-1}S(A)Q_1$ in the second equation and rearranging, we obtain $PS(AB) = S(A)C$ where $P = P_1P_2^{-1} = (p_{ij})$ is an invertible $r \times r$ matrix over $\mathbb{Z}$ and $C = Q_1BQ_2^{-1} = (c_{ij})$. Comparing $(i,j)$-entries in $PS(AB) = S(A)C$ gives

$$p_{ij}\,d_j(AB) = d_i(A)\,c_{ij} \qquad (♣)$$

for all $i$ and $j$ with $1 \le i, j \le r$. For $k$ with $1 \le k \le r$ we restrict our attention to the $p_{ij}$ with $j \le k \le i \le r$. We now perform some number-theoretic juggling to show that all these $p_{ij}$ are divisible by a certain integer. First multiply (♣) by the positive integer $d_k(AB)/d_j(AB)$ to get

$$p_{ij}\,d_k(AB) = d_k(A)\,c'_{ij} \qquad (♦)$$

where $c'_{ij} = (d_i(A)/d_k(A))\,c_{ij}\,(d_k(AB)/d_j(AB))$ is an integer, because $d_k(A) \mid d_i(A)$ as $k \le i$ and $d_j(AB) \mid d_k(AB)$ as $j \le k$. Next divide (♦) through by $d_k = \gcd\{d_k(A), d_k(AB)\}$, obtaining

$$p_{ij}\,(d_k(AB)/d_k) = (d_k(A)/d_k)\,c'_{ij}. \qquad (❤)$$

The integers $d_k(AB)/d_k$ and $d_k(A)/d_k$ are coprime (their gcd is 1) and so (❤) shows that

$$d_k(A)/d_k \text{ is a divisor of } p_{ij} \text{ for } j \le k \le i \le r. \qquad (♠)$$

Next we partition the $r \times r$ matrix

$$P = \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix}$$

where $P_{11}$ is the leading $(k-1) \times (k-1)$ submatrix. By (♠) not only are all the entries in $P_{21}$ divisible by $d_k(A)/d_k$ but so also are all the entries in the first column of $P_{22}$. We are nearly there! Working modulo $d_k(A)/d_k$, the determinant of the invertible matrix $P$ is

$$|P| = \begin{vmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{vmatrix} \equiv \begin{vmatrix} P_{11} & P_{12} \\ 0 & P_{22} \end{vmatrix} = |P_{11}| \times |P_{22}| \equiv |P_{11}| \times 0 = 0$$

showing that $|P| = \pm 1$ is divisible by $d_k(A)/d_k$. There is only one way out of this apparent impasse: $d_k(A)/d_k = 1$, the only positive integer divisor of $\pm 1$. We conclude that $d_k(A) = d_k = \gcd\{d_k(A), d_k(AB)\}$ is a divisor of $d_k(AB)$ for $1 \le k \le r$.

Finally we show in a similar way that $d_k(B)$ is a divisor of $d_k(AB)$ for $1 \le k \le r$. There are invertible matrices $P_2, P_3, Q_2, Q_3$ over $\mathbb{Z}$ with $P_2ABQ_2^{-1} = S(AB)$ and $P_3BQ_3^{-1} = S(B)$. So $S(AB)Q = ES(B)$ where $Q = Q_2Q_3^{-1}$ is invertible over $\mathbb{Z}$ and $E = P_2AP_3^{-1}$. Write $Q = (q_{ij})$ and $E = (e_{ij})$. Comparing $(i,j)$-entries in $S(AB)Q = ES(B)$ for $1 \le i \le r$, $1 \le j \le s$ gives

$$d_i(AB)\,q_{ij} = e_{ij}\,d_j(B) \qquad (♣')$$

and for $1 \le i \le r$, $s < j \le t$ gives $d_i(AB)\,q_{ij} = 0$. So $q_{ij} = 0$ for $1 \le i \le r$, $s < j \le t$ as $d_i(AB) > 0$. Consider $q_{ij}$ with $i \le k \le j \le s$. As in the first part of the proof, (♣′) implies that

$$q_{ij} \text{ is divisible by } d_k(B)/\bar{d}_k \text{ for } i \le k \le j \le s \qquad (♠')$$

where $\bar{d}_k = \gcd\{d_k(B), d_k(AB)\}$. We partition the invertible $t \times t$ matrix

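$$Q = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix}$$

where $Q_{11}$ is the leading $(k-1) \times (k-1)$ submatrix. Using (♠′), together with $q_{ij} = 0$ for $i \le r$, $j > s$, all entries in $Q_{12}$ are divisible by $d_k(B)/\bar{d}_k$, as are all entries in the first row of $Q_{22}$. Hence, as in the first part of the proof, $\det Q = \pm 1$ is divisible by $d_k(B)/\bar{d}_k$. So $d_k(B)/\bar{d}_k = 1$, giving $d_k(B) = \bar{d}_k$, which is a divisor of $d_k(AB)$. □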
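Theorem 1.21 can also be spot-checked by computer. The Python sketch below is only such a check, not a proof, and for brevity it assumes the square case $r = s = t = 2$: there, for a matrix $M$ with $\det M \ne 0$, the invariant factors are $d_1 = g_1(M)$ and $d_2 = |\det M|/g_1(M)$. It runs over all pairs of $2 \times 2$ matrices with entries in a small, arbitrarily chosen range and non-zero determinant, and collects any pair for which $d_k(A)$ or $d_k(B)$ fails to divide $d_k(AB)$. The helper names `snf_2x2`, `matmul` and `counterexamples` are hypothetical.

```python
from itertools import product
from math import gcd


def snf_2x2(M):
    """Invariant factors (d_1, d_2) of a 2 x 2 integer matrix with non-zero determinant."""
    (a, b), (c, d) = M
    d1 = gcd(gcd(a, b), gcd(c, d))       # g_1(M): gcd of the entries (the 1-minors)
    return d1, abs(a * d - b * c) // d1  # d_1 d_2 = g_2(M) = |det M|


def matmul(X, Y):
    """Product of two 2 x 2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def counterexamples(entries=range(0, 4)):
    """Pairs (A, B) with d_k(A) or d_k(B) not dividing d_k(AB); Theorem 1.21 predicts none."""
    mats = [[[a, b], [c, d]] for a, b, c, d in product(entries, repeat=4) if a * d != b * c]
    invs = [(M, snf_2x2(M)) for M in mats]
    bad = []
    for A, sA in invs:
        for B, sB in invs:
            sAB = snf_2x2(matmul(A, B))
            if any(sAB[k] % sA[k] or sAB[k] % sB[k] for k in range(2)):
                bad.append((A, B))
    return bad


print(len(counterexamples()))  # expected: 0
```

For example, with $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ and $B = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$ the sketch gives $S(A) = \operatorname{diag}(1, 2)$, $S(B) = \operatorname{diag}(1, 5)$ and $S(AB) = \operatorname{diag}(1, 10)$.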

EXERCISES 1.3

1. (a) Use the Euclidean algorithm to find the non-negative gcd $d$ of each of the following sets $X$ of integers. In each case express $d$ as an integer linear combination of the integers in $X$.

(i) $\{21, 75, 175\}$; (ii) $\{42, 66, 154, 231\}$.
Hint: Use the Euclidean algorithm twice for (i) and three times for (ii).

(b) Let $X_1$ and $X_2$ be finite sets of integers. Show that

$$\gcd\{\gcd X_1, \gcd X_2\} = \gcd(X_1 \cup X_2).$$

(c) Let $n_1, n_2, \ldots, n_k$ be positive integers. Write $l_i = n_1n_2\cdots n_k/n_i$ for $1 \le i \le k$. Show that $m_0 = n_1n_2\cdots n_k/\gcd\{l_1, l_2, \ldots, l_k\}$ is the lcm (least common multiple) of $n_1, n_2, \ldots, n_k$, that is, show
(i) $n_i \mid m_0$ for $1 \le i \le k$ and
(ii) $m_0 \mid m$ for all integers $m$ satisfying $n_i \mid m$ for all $1 \le i \le k$.
Hint: Use Corollary 1.16 for (ii).

2. (a) Find the positive generator of the ideal $K$ of $\mathbb{Z}$ consisting of all integers of the type $k = 63b_1 + 231b_2 + 429b_3$ where $b_1, b_2, b_3 \in \mathbb{Z}$. List the integers $k \in K$ satisfying $-10 < k < 10$.


(b) Let $d$ and $d'$ be integers. Show that $\langle d\rangle \subseteq \langle d'\rangle \Leftrightarrow d' \mid d$. List the 8 ideals of $\mathbb{Z}$ which contain the ideal $\langle 30\rangle$. Which ideals of $\mathbb{Z}$ contain the ideal $\langle p\rangle$ where $p$ is prime? List the ideals of $\mathbb{Z}$ containing $\langle 64\rangle$, beginning with the smallest and ending with the largest.

(c) Let $K$ be an additive subgroup of $\mathbb{Z}$ and let $k \in K$. Prove by induction that $nk \in K$ for all positive integers $n$. Hence show that $K$ is an ideal of $\mathbb{Z}$.

(d) Let $K_1$ and $K_2$ be ideals of $\mathbb{Z}$. Show that $K_1 \cap K_2$ and $K_1 + K_2 = \{k_1 + k_2 : k_1 \in K_1, k_2 \in K_2\}$ are ideals of $\mathbb{Z}$. By Theorem 1.15 there are non-negative integers $d_1, d_2$ with $K_1 = \langle d_1\rangle$, $K_2 = \langle d_2\rangle$. Show $K_1 + K_2 = \langle \gcd\{d_1, d_2\}\rangle$ and $K_1 \cap K_2 = \langle \operatorname{lcm}\{d_1, d_2\}\rangle$.
Hint: Use the first part of (b) above.

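3. (a) Find the numbers of $l$-minors of a $5 \times 6$ matrix over $\mathbb{Z}$ for $1 \le l \le 5$.

(b) For each of the following $s \times t$ matrices $A$ over $\mathbb{Z}$, calculate the gcd $g_l(A)$ of $l$-minors, $1 \le l \le \min\{s,t\}$, and find $S(A)$ without further calculation:

(i) $\begin{pmatrix} 2 & 1 & 3 \\ 4 & 6 & 8 \end{pmatrix}$; (ii) $\begin{pmatrix} 6 & 18 & 0 \\ 24 & 9 & 27 \end{pmatrix}$; (iii) $\begin{pmatrix} 6 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 15 \end{pmatrix}$.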

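(c) Calculate $\operatorname{adj} A$ and verify $A(\operatorname{adj} A) = (\det A)I = (\operatorname{adj} A)A$ for

$$A = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 5 & 2 \\ 4 & 4 & 7 \end{pmatrix}.$$

Find invertible $3 \times 3$ matrices $P$ and $Q$ over $\mathbb{Z}$ with $PAQ^{-1} = S(A)$ and verify that $Q(\operatorname{adj} A)P^{-1} = \operatorname{adj} S(A)$. What is the Smith normal form of $\operatorname{adj} A$?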

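(d) Let $A$ be a $t \times t$ matrix over $\mathbb{Z}$ with $\det A > 0$ and let $P$ and $Q$ be invertible $t \times t$ matrices over $\mathbb{Z}$ with $PAQ^{-1} = S(A) = \operatorname{diag}(d_1, d_2, \ldots, d_t)$ where $t \ge 2$. Show that $Q(\operatorname{adj} A)P^{-1} = \operatorname{adj} S(A)$ and hence express the invariant factors of $\operatorname{adj} A$ in terms of the invariant factors $d_i$ of $A$. How do these formulae change in the case $\det A < 0$?
Hint: Start by multiplying $PAQ^{-1}$ and $Q(\operatorname{adj} A)P^{-1}$ together.
Suppose now that $\operatorname{rank} A = t - 1$. Show that $Q(\operatorname{adj} A)P^{-1} = \pm\operatorname{diag}(0, \ldots, 0, d_1d_2\cdots d_{t-1})$ and hence find $S(\operatorname{adj} A)$. What is $S(\operatorname{adj} A)$ in the case $\operatorname{rank} A \le t - 2$?

(e) Specify the $t \times t$ matrices $A$ over $\mathbb{Z}$ with $t \ge 2$ such that $A \equiv \operatorname{adj} A$.
Hint: Consider first the case $t = 2$. Then try $t = 3$.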

4. (a) Verify the Cauchy–Binet theorem for

$$B = \begin{pmatrix} 2 & 1 & 4 \\ 3 & 5 & 7 \end{pmatrix}, \quad C = B^T.$$

(b) Find $x$ and $y$ satisfying

$$\begin{vmatrix} 14 & 2 + 2x + 3y \\ 2 + 2x + 3y & 4 + x^2 + y^2 \end{vmatrix} = 0$$

by applying the Cauchy–Binet theorem to

$$B = \begin{pmatrix} 1 & 2 & 3 \\ 2 & x & y \end{pmatrix}, \quad C = B^T.$$

(c) Let $B$ be an $s \times t$ matrix over $\mathbb{Z}$ where $s < t$. Show $\det BB^T = 0$ if and only if $g_s(B) = 0$.

(d) Let $B$ be an $l \times r$ matrix over $\mathbb{Z}$ and let $C$ be an $r \times l$ matrix over $\mathbb{Z}$ where $r < l$. Show $\det BC = 0$.
Hint: Construct $l \times l$ matrices

$$B' = (B \mid 0) \quad\text{and}\quad C' = \begin{pmatrix} C \\ 0 \end{pmatrix}$$

and use Theorem 1.18.

5. (a) Let $A$ be an $s \times t$ matrix over $\mathbb{Z}$ with $s \ge t$ and let $Y$ be a subset of $\{1, 2, \ldots, t\}$. Denote by $A_Y$ the $s \times l$ submatrix consisting of columns $j$ of $A$ for $j \in Y$, where $l = |Y|$. Show that $g_l((PA)_Y) = g_l(A_Y)$ for all invertible $s \times s$ matrices $P$ over $\mathbb{Z}$ where $1 \le l \le t$.
Let $S(A) = \operatorname{diag}(d_1, d_2, \ldots, d_t)$ be the Smith normal form of $A$. Show that $A$ can be changed into $S(A)$ using only eros if and only if $g_1(A_Y) = d_j$ for $Y = \{j\}$, $1 \le j \le t$, and $g_l(A_Y) = d_1d_2\cdots d_l$ for $Y = \{1, 2, \ldots, l\}$, $2 \le l \le t$.
Hint: Adapt the method of Lemma 1.9 by applying Lemma 1.7 transposed to the columns of $A$.

(b) Which of the following matrices $A$ can be changed into Smith normal form using eros only?

(i) $\begin{pmatrix} 1 & 2 & 8 \\ 1 & 4 & 4 \\ 3 & 6 & 0 \end{pmatrix}$; (ii) $\begin{pmatrix} 1 & 4 & 8 \\ 2 & 10 & 16 \\ 1 & 4 & 12 \end{pmatrix}$.


(c) Let $A$ be an $s \times t$ matrix over $\mathbb{Z}$ with $s < t$ and $g_s(A) = 1$. Show that $A$ has Smith normal form $S(A) = (I_s \mid 0)$ where $I_s$ is the $s \times s$ identity matrix and that $A$ can be changed into $S(A)$ using ecos only. Deduce that there is an invertible $t \times t$ matrix $Q$ over $\mathbb{Z}$ having the rows of $A$ as its first $s$ rows.
Hint: Consider $Q$ where $AQ^{-1} = S(A)$.
For each of the following matrices $A$, find a suitable $Q$:

(i) $(6, 10, 15)$; (ii) $\begin{pmatrix} 10 & 9 & 10 \\ 15 & 15 & 16 \end{pmatrix}$.

6. (a) Let $A$ and $B$ be $t \times t$ matrices over $\mathbb{Z}$ having non-zero coprime determinants. Use Theorem 1.21 to show $S(AB) = S(A)S(B)$.

(b) Suppose $S(A) = \operatorname{diag}(2, 4)$ and $S(B) = \operatorname{diag}(1, 3)$; what is $S(AB)$? For $S(A) = \operatorname{diag}(2, 6)$ and $S(B) = \operatorname{diag}(1, 4)$, find the two possible matrices $S(AB)$.

(c) Let $A$ be a $t \times t$ matrix over $\mathbb{Z}$ with $d_t(A) \ne 0$. Let $p_1, \ldots, p_l$ be the distinct prime divisors of $d_t(A)$. Show that there are $t \times t$ matrices $A_j$ over $\mathbb{Z}$, unique up to equivalence, such that $A = A_1A_2\cdots A_l$ and $d_k(A_j)$ is a power of $p_j$ for $1 \le j \le l$.
Hint: Use $A = P^{-1}S(A)Q$.

(d) Express $A = A_1A_2$ as in part (c) where $A = \begin{pmatrix} 4 & 8 \\ 12 & 10 \end{pmatrix}$. Are $A_1A_2$ and $A_2A_1$ necessarily equivalent?

2 Basic Theory of Additive Abelian Groups

In this chapter we discuss cyclic groups, the quotient group construction, the direct sum construction and the first isomorphism theorem, in the context of additive abelian groups; we also discuss free modules. These concepts are necessary, as well as the matrix theory of Chapter 1, for the study of finitely generated abelian groups in Chapter 3. At the same time the material provides the reader with a taster for general group theory.

2.1 Cyclic Z-Modules

We begin with a brief review of abelian groups. They arise in additive and multiplicative notation. Additive notation is more suited to our purpose and we’ll adopt it wherever possible. The term $\mathbb{Z}$-module is simply another name for an additive abelian group. However it signals an approach which emphasises the analogy between vector spaces and abelian groups. The structure-preserving mappings of abelian groups, their homomorphisms in other words, are analogous to linear mappings of vector spaces. So in a sense the reader will have seen it all before. But be careful as the analogy is by no means perfect! For instance from $\lambda v = 0$ in a vector space one can safely deduce that either $\lambda = 0$ or $v = 0$ (or both). By contrast, in a $\mathbb{Z}$-module the equation $mg = 0$ may hold although $m \ne 0$ and $g \ne 0$, as we will see.

Next we study cyclic groups, that is, groups generated by a single element. Their theory is not too abstract and should help the reader’s appreciation of the more general theorems on $\mathbb{Z}$-modules later in Chapter 2.



http://dx.doi.org/10.1007/978-1-4471-2730-7_2


Let $(G, +)$ denote a set $G$ with a binary operation denoted by addition. So $G$ is closed under addition, that is,

$$g_1 + g_2 \in G \quad\text{for all } g_1, g_2 \in G.$$

Then $(G, +)$ is an (additive) abelian group if the following laws hold:
1. The associative law of addition: $(g_1 + g_2) + g_3 = g_1 + (g_2 + g_3)$ for all $g_1, g_2, g_3 \in G$.
2. The existence of a zero element: there is an element, denoted by $0$, in $G$ satisfying $0 + g = g$ for all $g \in G$.
3. The existence of negatives: for each $g \in G$ there is an element, denoted by $-g$, in $G$ satisfying $-g + g = 0$.
4. The commutative law of addition: $g_1 + g_2 = g_2 + g_1$ for all $g_1, g_2 \in G$.

We now drop the notation $(G, +)$ and in its place refer simply to the abelian group $G$, the binary operation of addition being taken for granted. Laws 1, 2, 3 are the group axioms in additive notation. The reader should know that the zero element $0$ of a group $G$ is unique, that is, given laws 1, 2 and 3 there is only one element $0$ as in law 2. Similarly each element $g$ of a group $G$ has a unique negative $-g$. Laws 3 and 4 give $g + (-g) = 0$, which tells us that $g$ is the negative of $-g$, that is, all elements $g$ of an additive abelian group $G$ satisfy the unsurprising equation $-(-g) = g$.

The reader might like to simplify the expression $((-g_1) + (-g_2)) + (g_1 + g_2)$, where $g_1$ and $g_2$ are elements of an additive abelian group $G$, using at each step one of the above four laws (start by applying law 4 followed by law 1 (twice) and then law 3, etc.). After six steps you should get the zero element $0$ of $G$. The conclusion is: $(-g_1) + (-g_2)$ is the negative of $g_1 + g_2$ by law 3, as the sum of these two elements of $G$ is zero. In other words $(-g_1) + (-g_2) = -(g_1 + g_2)$ for all $g_1$ and $g_2$ in $G$. Luckily the manipulation of elements of an additive group need not involve the laborious application of its laws, as we now explain.

Let $g_1, g_2, \ldots, g_n$ be elements of an additive group $G$ where $n \ge 3$. These elements can be summed (added up) in order in various ways, but all ways produce the same element of $G$. In the case $n = 4$

$$\begin{aligned} ((g_1 + g_2) + g_3) + g_4 &= (g_1 + (g_2 + g_3)) + g_4 = g_1 + ((g_2 + g_3) + g_4) \\ &= g_1 + (g_2 + (g_3 + g_4)) = (g_1 + g_2) + (g_3 + g_4) \end{aligned}$$

using only law 1. Omitting the brackets we see that $g_1 + g_2 + g_3 + g_4$ has an unambiguous meaning, namely any one of the above (equal) elements. Using law 1 and induction on $n$, it can be shown (Exercises 2.1, Question 8(a)) that brackets may be left out when adding up, in order, any finite number $n$ of elements $g_1, g_2, \ldots, g_n$ of an additive group $G$; this is the generalised associative law of addition. So the sum $g_1 + g_2 + \cdots + g_n$ of $n$ elements of an additive abelian group $G$ is unambiguously defined and, what is more, this sum is unchanged when the suffixes are permuted (the generalised commutative law of addition). In the case $n = 3$


$$\begin{aligned} g_1 + g_2 + g_3 &= g_1 + g_3 + g_2 = g_3 + g_1 + g_2 \\ &= g_3 + g_2 + g_1 = g_2 + g_3 + g_1 = g_2 + g_1 + g_3. \end{aligned}$$

Let $g$ be an element of an additive abelian group $G$. The elements $2g = g + g$ and $3g = g + g + g$ belong to $G$. More generally for every positive integer $n$ the group element $ng$ is obtained by adding together $n$ elements equal to $g$, that is,

$$ng = g + g + \cdots + g \quad (n \text{ terms}).$$

So $1g = g$, and as $n(-g) + ng = 0$ we deduce that $n(-g) = -(ng)$. Therefore it makes sense to define $(-n)g$ to be the group element $n(-g)$ and to define $0g$ to be the zero element $0$ of $G$. It follows that $(-n)(-g) = n(-(-g)) = ng$. But more importantly we have given meaning to $mg$ for all integers $m$ (positive, negative and zero) and all elements $g$ in $G$, and

$$mg \in G \quad\text{for all } m \in \mathbb{Z},\ g \in G,$$

showing that $G$ is closed under integer multiplication.

Integer multiplication on $G$ and the group operation of addition on $G$ are connected by the following laws:

5. The distributive laws:

$$m(g_1 + g_2) = mg_1 + mg_2 \quad\text{for all } m \in \mathbb{Z} \text{ and all } g_1, g_2 \in G,$$
$$(m_1 + m_2)g = m_1g + m_2g \quad\text{for all } m_1, m_2 \in \mathbb{Z} \text{ and all } g \in G.$$

6. The associative law of multiplication:

$$(m_1m_2)g = m_1(m_2g) \quad\text{for all } m_1, m_2 \in \mathbb{Z} \text{ and all } g \in G.$$

7. The identity law: $1g = g$ for all $g \in G$.

Laws 5 and 6 are the familiar laws of indices expressed in additive notation, rather than the more usual multiplicative notation; they allow elements of additive abelian groups to be manipulated with minimum fuss (see Exercises 2.1, Question 8(c)). Law 7 is frankly something of an anti-climax, but its presence will help us generalise these ideas later in a coherent way. The structure of $G$ is expressed concisely by saying

$G$ is a $\mathbb{Z}$-module

meaning that laws 1–7 above hold. Notice the close connection between the laws of a $\mathbb{Z}$-module and the laws of a vector space: they are almost identical! Think of the elements of $G$ as being ‘vectors’ and the elements of $\mathbb{Z}$ as being ‘scalars’. The only thing which prevents a $\mathbb{Z}$-module from being a vector space is the fact that $\mathbb{Z}$ is not a field.
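The definition of $mg$ uses only the addition, zero and negation of $G$, and laws 5–7 can then be checked mechanically in a small example. The Python sketch below follows the definition literally: the function `multiple` (a hypothetical name) takes the group operations as arguments. The test case $G = \mathbb{Z}_6$, with each class represented by its least non-negative residue, is an assumed example. Checking over a finite range of integers illustrates the laws; it does not prove them.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def multiple(m: int, g: T, add: Callable[[T, T], T], zero: T, neg: Callable[[T], T]) -> T:
    """mg in an additive abelian group, built only from +, 0 and negation."""
    if m < 0:
        return multiple(-m, neg(g), add, zero, neg)  # (-n)g is defined as n(-g)
    result = zero                                    # 0g is defined as 0
    for _ in range(m):                               # ng = g + g + ... + g (n terms)
        result = add(result, g)
    return result


# Assumed example: Z_6, with each class represented by its residue in {0, ..., 5}.
N = 6


def add6(x: int, y: int) -> int:
    return (x + y) % N


def neg6(x: int) -> int:
    return (-x) % N


def mul6(m: int, g: int) -> int:
    return multiple(m, g, add6, 0, neg6)


def check_laws(ms=range(-7, 8)) -> bool:
    """Check laws 5, 6 and 7 in Z_6 for all integers m, m1, m2 taken from ms."""
    G = range(N)
    law5a = all(mul6(m, add6(x, y)) == add6(mul6(m, x), mul6(m, y)) for m in ms for x in G for y in G)
    law5b = all(mul6(a + b, x) == add6(mul6(a, x), mul6(b, x)) for a in ms for b in ms for x in G)
    law6 = all(mul6(a * b, x) == mul6(a, mul6(b, x)) for a in ms for b in ms for x in G)
    law7 = all(mul6(1, x) == x for x in G)
    return law5a and law5b and law6 and law7


print(mul6(-5, 2), check_laws())  # 2 True
```

Passing the operations in as arguments keeps `multiple` independent of any particular group, which mirrors the point that $mg$ is determined by the $\mathbb{Z}$-module structure alone.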

The reader will already have met many examples of additive abelian groups: for example the additive group $(\mathbb{Q}, +)$ of all rational numbers $m/n$ ($m, n \in \mathbb{Z}$, $n > 0$); this group is obtained from the rational field $\mathbb{Q}$ by ignoring products of non-integer rational numbers, which simply don’t arise in $(\mathbb{Q}, +)$. In the same way, ignoring the multiplication on any ring $R$, we obtain its additive group $(R, +)$. Of particular importance are the additive groups of the residue class rings $\mathbb{Z}_n$ (we’ll shortly review their properties) as well as the additive group of the ring $\mathbb{Z}$ itself.

Let $H$ be a subgroup of the additive abelian group $G$. So $H$ is a subset of $G$ satisfying
(a) $h_1 + h_2 \in H$ for all $h_1, h_2 \in H$ ($H$ is closed under addition),
(b) $0 \in H$ ($H$ contains the zero element of $G$),
(c) $-h \in H$ for all $h \in H$ ($H$ is closed under negation).
By (a) we see that $H$ is a set with a binary operation of addition and so it makes sense to ask: is $(H, +)$ an abelian group? As $H \subseteq G$ and law 1 holds in $G$, we see that $(h_1 + h_2) + h_3 = h_1 + (h_2 + h_3)$ for all $h_1, h_2, h_3$ in $H$, that is, law 1 holds in $H$. In the same way law 4 holds in $H$. Also (b) and (c) ensure that law 2 and law 3 hold in $H$. So $(H, +)$ is an abelian group as laws 1–4 hold with $G$ replaced by $H$. Hence laws 5, 6 and 7 also hold with $G$ replaced by $H$, that is, $H$ is a $\mathbb{Z}$-module. The relationship between $H$ and $G$ is described by saying

$H$ is a submodule of the $\mathbb{Z}$-module $G$.

The set $\langle 2\rangle$ of even integers is a subgroup of the additive group $\mathbb{Z}$ and so $\langle 2\rangle$ is a submodule of $\mathbb{Z}$. The discussion preceding Theorem 1.15 shows that $\langle 2\rangle$ is an ideal of the ring $\mathbb{Z}$. More generally $H$ is a submodule of the $\mathbb{Z}$-module $\mathbb{Z}$ if and only if $H$ is an ideal of the ring $\mathbb{Z}$, and when this is the case Theorem 1.15 tells us $H = \langle d\rangle$ where $d$ is a non-negative integer.

We now discuss in detail cyclic $\mathbb{Z}$-modules. They are crucial: on the one hand they and their submodules are easily described, as we will soon see, and on the other hand every finitely generated $\mathbb{Z}$-module can be constructed using them as building blocks, as we show in Section 3.1.

Definition 2.1

Let $G$ be a $\mathbb{Z}$-module containing an element $g$ such that every element of $G$ is of the type $mg$ for some $m \in \mathbb{Z}$. Then $G$ is said to be cyclic with generator $g$ and we write $G = \langle g\rangle$.

The additive group $\mathbb{Z}$ of all integers is a cyclic $\mathbb{Z}$-module with generator $1$ and so $\mathbb{Z} = \langle 1\rangle$. As $\mathbb{Z}$ contains an infinite number of elements we say that $\mathbb{Z}$ is an infinite cyclic group. Let $K$ be a subgroup of $\mathbb{Z}$ (we know that subgroups of $\mathbb{Z}$ are ideals, to which Theorem 1.15 applies, and so we denote them by $K$ rather than $H$). As $K = \langle d\rangle$ where $d$ is non-negative, every subgroup $K$ of the infinite cyclic group $\mathbb{Z}$ is itself cyclic because $K$ has generator $d$. Note that $-d$ is also a generator of $K$ as $\langle -d\rangle = \langle d\rangle$. The subgroup $\langle 2\rangle$ of all even integers is infinite cyclic just like its ‘parent’ $\mathbb{Z}$, and the same is true of $\langle d\rangle$ with $d \ne 0$. Notice that $\langle 6\rangle \subseteq \langle 2\rangle$ as 2 is a divisor of 6 and so all integer multiples of 6 are even integers. More generally $\langle d_1\rangle \subseteq \langle d_2\rangle$ if and only if $d_2 \mid d_1$, where $d_1, d_2 \in \mathbb{Z}$.

Let $n$ be a positive integer. The reader is assumed to have met the ring $\mathbb{Z}_n$ of integers modulo $n$; however, we now briefly review its construction and properties. A typical element of $\mathbb{Z}_n$ is the congruence class $\overline{r}$ of an integer $r$, that is, $\overline{r} = \{nq + r : q \in \mathbb{Z}\}$. So $\overline{r}$ is the subset of integers $m = nq + r$ which differ from $r$ by an integer multiple $q$ of $n$. Therefore $m - r = nq$, that is, the difference between $m$ and $r$ is divisible by $n$; this is expressed by saying that $m$ is congruent to $r$ modulo $n$ and writing $m \equiv r \pmod{n}$. So $\mathbb{Z}_n$ has $n$ elements and

$$\mathbb{Z}_n = \{\overline{0}, \overline{1}, \overline{2}, \ldots, \overline{n-1}\}$$

since the $n$ congruence classes $\overline{r}$ correspond to the $n$ possible remainders $r$ on dividing an arbitrary integer $m$ by $n$. You should know that $\mathbb{Z}_n$ is a commutative ring, the rules of addition and multiplication being unambiguously defined by $\overline{m} + \overline{m'} = \overline{m + m'}$, $(\overline{m})(\overline{m'}) = \overline{mm'}$ for all $m, m' \in \mathbb{Z}$. The 0-element and 1-element of $\mathbb{Z}_n$ are $\overline{0}$ and $\overline{1}$ respectively. You should also know that $\mathbb{Z}_n$ is a field if and only if $n$ is prime. The smallest field is $\mathbb{Z}_2 = \{\overline{0}, \overline{1}\}$ having two elements, namely the set $\overline{0}$ of all even integers and the set $\overline{1}$ of all odd integers.

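The additive group of $\mathbb{Z}_n$ is cyclic, being generated by $\overline{1}$ as $\overline{m} = m\overline{1}$. Having $n$ elements, $\mathbb{Z}_n$ is a cyclic group of order $n$.

For example, taking $n = 4$ we obtain the cyclic group $\mathbb{Z}_4 = \{\overline{0}, \overline{1}, \overline{2}, \overline{3}\}$ of order 4 with addition table:

$$\begin{array}{c|cccc} + & \overline{0} & \overline{1} & \overline{2} & \overline{3} \\ \hline \overline{0} & \overline{0} & \overline{1} & \overline{2} & \overline{3} \\ \overline{1} & \overline{1} & \overline{2} & \overline{3} & \overline{0} \\ \overline{2} & \overline{2} & \overline{3} & \overline{0} & \overline{1} \\ \overline{3} & \overline{3} & \overline{0} & \overline{1} & \overline{2} \end{array}$$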

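The element $x + y$ appears in the table where the row headed by $x$ meets the column headed by $y$, for $x, y \in \mathbb{Z}_4$. By inspection $\mathbb{Z}_4$ has three subgroups $\{\overline{0}\}$, $\{\overline{0}, \overline{2}\}$ and $\mathbb{Z}_4$ itself. The union of the congruence classes belonging to any one of these subgroups of $\mathbb{Z}_4$ is a subgroup of $\mathbb{Z}$. Thus $\overline{0} = \langle 4\rangle$ since $\overline{0}$ consists of integers which are multiples of 4. Similarly $\overline{0} \cup \overline{2} = \langle 2\rangle$ since $\overline{2}$ consists of integers which are multiples of 2 but not multiples of 4. Also $\overline{0} \cup \overline{1} \cup \overline{2} \cup \overline{3} = \mathbb{Z} = \langle 1\rangle$. The three subgroups of $\mathbb{Z}_4$ correspond in this way to the three subgroups $\langle 4\rangle, \langle 2\rangle, \langle 1\rangle$ of $\mathbb{Z}$ which contain $\langle 4\rangle$. What is more, the subgroups of $\mathbb{Z}_4$ are cyclic with generators $\overline{4}, \overline{2}, \overline{1}$. We now return to $\mathbb{Z}_n$ and show that these ideas can be generalised.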
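The addition table of $\mathbb{Z}_4$ and its subgroups can also be produced mechanically. The Python sketch below assumes each class $\overline{r}$ is represented by its residue $r$ with $0 \le r < n$; it builds the table row by row and then finds the subgroups by testing every subset against conditions (a), (b), (c) for a subgroup. The names `addition_table`, `is_subgroup` and `subgroups` are hypothetical, and the brute-force search over all $2^n$ subsets is only sensible for very small $n$.

```python
from itertools import combinations


def addition_table(n):
    """Row x, column y holds the residue of x + y modulo n."""
    return [[(x + y) % n for y in range(n)] for x in range(n)]


def is_subgroup(H, n):
    """Conditions (a), (b), (c): closed under addition, contains 0, closed under negation."""
    return (0 in H
            and all((a + b) % n in H for a in H for b in H)
            and all((-a) % n in H for a in H))


def subgroups(n):
    """All subgroups of Z_n, found by testing every non-empty subset (tiny n only)."""
    found = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            H = set(subset)
            if is_subgroup(H, n):
                found.append(H)
    return found


for row in addition_table(4):
    print(*row)
print(subgroups(4))  # [{0}, {0, 2}, {0, 1, 2, 3}]
```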


Lemma 2.2

Let $n$ be a positive integer. Each subgroup $H$ of the additive group $\mathbb{Z}_n$ is cyclic with generator $\overline{d}$ where $d|H| = n$ and $|H|$ is the number of elements in $H$.

Proof

For each subgroup $H$ of $\mathbb{Z}_n$ let $K = \{m \in \mathbb{Z} : \overline{m} \in H\}$. So $K$ consists of those integers $m$ which belong to a congruence class in $H$. We show that $K$ is a subgroup of $\mathbb{Z}$ containing $\langle n\rangle$. Let $m_1, m_2 \in K$. Then $\overline{m_1}, \overline{m_2} \in H$. As $H$ is closed under addition, $\overline{m_1 + m_2} = \overline{m_1} + \overline{m_2} \in H$. So $m_1 + m_2 \in K$, showing that $K$ is closed under addition. Now $\overline{0} \in H$ as $H$ contains the zero element of $\mathbb{Z}_n$. But $\overline{0}$ consists of all integer multiples of $n$. So $\overline{0} = \langle n\rangle \subseteq K$ and in particular $0 \in K$. Let $m \in K$. Then $\overline{m} \in H$ and so $\overline{-m} = -\overline{m} \in H$ as $H$ is closed under negation. Therefore $-m \in K$, showing that $K$ is closed under negation. We have shown that $K$ is a subgroup of $\mathbb{Z}$ containing $\langle n\rangle$. So $K = \langle d\rangle$ for some non-negative integer $d$ by Theorem 1.15. As $\langle n\rangle \subseteq \langle d\rangle$ we conclude that $d \mid n$. As $n$ is positive, $d$ cannot be zero and so $d$ is positive also. As $d \in K$ we see that $\overline{d} \in H$. Finally consider $\overline{m} \in H$. Then $m \in K$ and so $m = qd$ for some $q \in \mathbb{Z}$. So $\overline{m} = q\overline{d}$, showing that $H$ is cyclic with generator $\overline{d}$.

Now $|H|$ is the order of $H$ (the number of elements in $H$) and so $K$ is the union of $|H|$ congruence classes (mod $n$). Let $m \in K$. As $K = \langle d\rangle$ there is $q \in \mathbb{Z}$ with $m = qd$. Divide $q$ by $n/d$ to obtain integers $q', r$ with $q = q'(n/d) + r$ where $0 \le r < n/d$. Then $m = (q'(n/d) + r)d = q'n + rd$, showing that $K$ consists of the $n/d$ congruence classes $\overline{rd}$, $0 \le r < n/d$, which are distinct as $0 \le rd < n$. Hence $|H| = n/d$ and so $d|H| = n$. □

For example, by Lemma 2.2 the additive group $\mathbb{Z}_{18}$ has 6 subgroups corresponding to the 6 positive divisors 1, 2, 3, 6, 9, 18 of 18.

These 6 subgroups can be arranged in their lattice diagram, as shown, in which subgroup $H_1$ is contained in subgroup $H_2$ if and only if there is a sequence of upwardly sloping lines joining $H_1$ to $H_2$. For instance $\langle\overline{6}\rangle \subseteq \langle\overline{1}\rangle$ but $\langle\overline{9}\rangle \not\subseteq \langle\overline{2}\rangle$.
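Lemma 2.2 turns the search for subgroups of $\mathbb{Z}_n$ into a search for positive divisors $d$ of $n$: the subgroup $\langle\overline{d}\rangle$ consists of the $n/d$ classes $\overline{rd}$ with $0 \le r < n/d$. The Python sketch below (helper names `subgroup` and `lattice` are hypothetical; residues stand in for classes) lists the subgroups of $\mathbb{Z}_{18}$ this way and records every proper containment, checking each one as an inclusion of sets against the criterion that $\langle\overline{d_1}\rangle \subseteq \langle\overline{d_2}\rangle$ exactly when $d_2 \mid d_1$.

```python
def subgroup(d, n):
    """Residues in the subgroup of Z_n generated by the class of d (d a positive divisor of n)."""
    return {r * d for r in range(n // d)}


def lattice(n):
    """Subgroups of Z_n keyed by divisor d, plus all proper containments (d1, d2)."""
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    subs = {d: subgroup(d, n) for d in divisors}
    for d, H in subs.items():
        assert d * len(H) == n                  # Lemma 2.2: d|H| = n
    pairs = []
    for d1 in divisors:
        for d2 in divisors:
            contained = subs[d1] <= subs[d2]    # inclusion of sets
            assert contained == (d1 % d2 == 0)  # agrees with d2 | d1
            if contained and d1 != d2:
                pairs.append((d1, d2))
    return subs, pairs


subs, pairs = lattice(18)
print(sorted(subs))                      # [1, 2, 3, 6, 9, 18]
print((6, 1) in pairs, (9, 2) in pairs)  # True False
```

Note that `pairs` lists every proper containment, not only those drawn as single lines in the lattice diagram; the lines correspond to containments with no subgroup strictly in between.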


The proof of Lemma 2.2 shows that each subgroup $H$ of $\mathbb{Z}_n$ corresponds to a subgroup $K$ of $\mathbb{Z}$ which contains $\langle n\rangle$. From the last paragraph of that proof we see that each such subgroup $K$ has a positive generator $d$ and is made up of $n/d$ congruence classes (mod $n$). These $n/d$ elements form a subgroup $H$ of $\mathbb{Z}_n$. So for each $K$ there is one $H$ and vice-versa. We show in Theorem 2.17 that bijective correspondences of this type arise in a general context. To get the idea, consider

the natural mapping $\eta : \mathbb{Z} \to \mathbb{Z}_n$

which maps each integer $m$ to its congruence class $\overline{m}$ modulo $n$. We use $(m)\eta$ to denote the image of $m$ by $\eta$ and so $(m)\eta = \overline{m}$ for all integers $m$. Now $\eta$ is additive, that is,

$$(m + m')\eta = (m)\eta + (m')\eta \quad\text{for all } m, m' \in \mathbb{Z}$$

as $\overline{m + m'} = \overline{m} + \overline{m'}$. Such additive mappings provide meaningful comparisons between additive abelian groups and surprisingly each one gives rise to a bijective correspondence as above. The image of $\eta$ is the set of all elements $(m)\eta$ and is denoted by $\operatorname{im}\eta$. As $\eta$ is surjective (onto) we see $\operatorname{im}\eta = \mathbb{Z}_n$. The kernel of $\eta$ is the set of elements $m$ such that $(m)\eta = \overline{0}$, the zero element of $\mathbb{Z}_n$, and is denoted by $\ker\eta$. So $\ker\eta = \langle n\rangle$. With this terminology the correspondence between $H$ and $K$, where $K = \{m \in \mathbb{Z} : (m)\eta \in H\}$, is bijective from the set of subgroups $H$ of $\operatorname{im}\eta$ to the set of subgroups $K$ of $\mathbb{Z}$ which contain $\ker\eta$. We take up this theme in Theorem 2.17.

Definition 2.3

Let $G$ and $G'$ be additive abelian groups. A mapping $\theta : G \to G'$ such that $(g_1 + g_2)\theta = (g_1)\theta + (g_2)\theta$ for all $g_1, g_2 \in G$ is called additive or a homomorphism.

Such mappings $\theta$ respect the group operation and satisfy $(0)\theta = 0'$, $(-g)\theta = -(g)\theta$ for $g \in G$, that is, $\theta$ maps the zero of $G$ to the zero $0'$ of $G'$ and $\theta$ respects negation (see Exercises 2.1, Question 4(c)). With $g_1 = g_2 = g$ in Definition 2.3 we obtain $(2g)\theta = 2((g)\theta)$. Using induction, the additive mapping $\theta$ satisfies

$$(mg)\theta = m((g)\theta)$$

for all $m \in \mathbb{Z}$, $g \in G$. We describe $\theta$ as being $\mathbb{Z}$-linear, meaning that $\theta$ is additive and satisfies the above equation, that is, $\theta$ maps each integer multiple $mg$ of each element $g$ of $G$ to $m$ times the element $(g)\theta$ of $G'$.

The natural mapping $\eta : \mathbb{Z} \to \mathbb{Z}_n$ is $\mathbb{Z}$-linear. The ‘doubling’ mapping $\theta : \mathbb{Z} \to \mathbb{Z}$, where $(m)\theta = 2m$ for all $m \in \mathbb{Z}$, is also $\mathbb{Z}$-linear.

How can we tell whether two $\mathbb{Z}$-modules are really different or essentially the same? The next definition provides the terminology to tackle this problem.


Definition 2.4

A bijective (one-to-one and onto) $\mathbb{Z}$-linear mapping is called an isomorphism. The $\mathbb{Z}$-modules $G$ and $G'$ are called isomorphic if there is an isomorphism $\theta : G \to G'$, in which case we write $\theta : G \cong G'$ or simply $G \cong G'$. An isomorphism $\theta : G \cong G$ of $G$ to itself is called an automorphism of $G$.

For example $\theta : \mathbb{Z} \cong \langle 2\rangle$, where $(m)\theta = 2m$ for all $m \in \mathbb{Z}$, is an isomorphism showing that the $\mathbb{Z}$-module of all integers is isomorphic to the $\mathbb{Z}$-module of all even integers. In the same way $\mathbb{Z} \cong \langle d\rangle$ for every non-zero integer $d$. Isomorphic $\mathbb{Z}$-modules are abstractly identical and differ at most in notation.

The inverse of an isomorphism $\theta : G \cong G'$ is an isomorphism $\theta^{-1} : G' \cong G$ and the composition of compatible isomorphisms $\theta : G \cong G'$ and $\theta' : G' \cong G''$ is an isomorphism $\theta\theta' : G \cong G''$ (Exercises 2.1, Question 4(d)).

The additive cyclic group $\mathbb{Z}$ is generated by $1$ and also by $-1$. As $\mathbb{Z}$ has no other generators there is just one non-identity automorphism of $\mathbb{Z}$, namely $\tau : \mathbb{Z} \to \mathbb{Z}$ defined by $(m)\tau = -m$ for all $m \in \mathbb{Z}$. Notice that $(1)\tau = -1$ and $(-1)\tau = 1$. More generally, every automorphism of a cyclic group permutes the generators amongst themselves.

Let $g$ be an element of the additive group $G$. The smallest positive integer $n$ such that $ng = 0$ is called the order of $g$; if there is no such integer $n$ then $g$ is said to have infinite order. We now reformulate this concept in a more convenient manner, which will enable finite and infinite cyclic groups to be dealt with in a unified way.

Let $K = \{m \in \mathbb{Z} : mg = 0\}$, that is, $K$ consists of those integers $m$ such that $mg$ is the zero element of $G$. It’s routine to show that $K$ is an ideal of $\mathbb{Z}$. So $K$ is a principal ideal of $\mathbb{Z}$ with non-negative generator $n$, that is, $K = \langle n\rangle$ by Theorem 1.15. Then $n = 0$ means that $g$ has infinite order whereas $n > 0$ means that $g$ has finite order $n$. The ideal $K = \langle n\rangle$ is called the order ideal of $g$. Notice

$$mg = 0 \Leftrightarrow m \in K \Leftrightarrow n \mid m$$

which is a useful criterion for finding the order of a group element in particular cases. For instance suppose $36g = 0$, $18g \ne 0$, $12g \ne 0$. Then $g$ has order $n$ such that $n$ is a divisor of 36 but not a divisor of either $18 = 36/2$ or $12 = 36/3$. There is only one such positive integer $n$, namely 36. So $g$ has order 36. More generally

$$g \text{ has finite order } n \Leftrightarrow ng = 0 \text{ and } (n/p)g \ne 0 \text{ for all prime divisors } p \text{ of } n.$$

The $\Leftarrow$ implication is valid because, as $ng = 0$, the order $d$ of $g$ is a positive divisor of $n$; if $d < n$ then $d \mid (n/p)$ for some prime divisor $p$ of $n$, giving $(n/p)g = 0$, which is not the case. Hence $n$ is the order of $g$.

Be careful! From $24g = 0$ and $12g \ne 0$ one cannot deduce that $g$ has order 24, as $g$ could have order 8.
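The criterion above needs only the prime divisors of $n$. The Python sketch below applies it to the class of $g$ in $\mathbb{Z}_N$ (an assumed setting, with residues standing in for classes), using naive trial division for the prime divisors; `prime_divisors`, `has_order` and `order` are hypothetical helper names, and `order` is a brute-force cross-check.

```python
def prime_divisors(n):
    """Distinct prime divisors of n > 0, by trial division."""
    ps, p = [], 2
    while p * p <= n:
        if n % p == 0:
            ps.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        ps.append(n)
    return ps


def has_order(n, g, N):
    """Does the class of g in Z_N have order n?  Test ng = 0 and (n/p)g != 0 for primes p | n."""
    return (n * g) % N == 0 and all(((n // p) * g) % N != 0 for p in prime_divisors(n))


def order(g, N):
    """Brute force: smallest positive n with ng = 0 in Z_N."""
    n = 1
    while (n * g) % N != 0:
        n += 1
    return n


print(has_order(36, 2, 72), order(2, 72))  # True 36
print(has_order(24, 3, 24), order(3, 24))  # False 8
```

The second line is an instance of the warning just given: the class of 3 in $\mathbb{Z}_{24}$ satisfies $24g = 0$ and $12g \ne 0$, yet $(24/3)g = 8g = 0$, and its order is 8.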


Once again let $g$ be an element of the additive group $G$. Then $\langle g\rangle = \{mg : m \in \mathbb{Z}\}$ is a subgroup of $G$. As $\langle g\rangle$ is cyclic with generator $g$ it is reasonable to call $\langle g\rangle$ the cyclic subgroup of $G$ generated by $g$.

We now explain how the order ideal $K$ of a group element $g$ determines the isomorphism type (Definition 2.6) of the cyclic group $\langle g\rangle$.

Theorem 2.5

Every cyclic group $G$ is isomorphic either to the additive group $\mathbb{Z}$ or to the additive group $\mathbb{Z}_n$ for some positive integer $n$.

Proof

Let $g$ generate $G$ and so $G = \langle g\rangle$. Consider $\theta : \mathbb{Z} \to G$ defined by $(m)\theta = mg$ for all integers $m$. Then $\theta$ is $\mathbb{Z}$-linear by laws 5 and 6 of a $\mathbb{Z}$-module. Now $\theta$ is surjective (onto) since every element of $G$ is of the form $(m)\theta$ for some $m \in \mathbb{Z}$, that is, $\operatorname{im}\theta = G$, meaning that $G$ is the image of $\theta$ (we’ll state the general definition of image and kernel in Section 2.3). The kernel of $\theta$ is $\ker\theta = \{m \in \mathbb{Z} : (m)\theta = 0\} = \{m \in \mathbb{Z} : mg = 0\} = K$, which is the order ideal of $g$. By Theorem 1.15 there is a non-negative integer $n$ with $\ker\theta = \langle n\rangle$.

Suppose $n = 0$. Then $\theta$ is injective (one-to-one): for suppose $(m)\theta = (m')\theta$. Then $(m - m')\theta = (m)\theta - (m')\theta = 0$, showing that $m - m'$ belongs to $\ker\theta = \langle 0\rangle = \{0\}$. So $m - m' = 0$, that is, $m = m'$. Therefore $\theta$ is bijective and so $\theta : \mathbb{Z} \cong G$, that is,

all infinite cyclic groups are isomorphic to the additive group $\mathbb{Z}$ of integers.

Suppose $n > 0$. As above we suppose $(m)\theta = (m')\theta$. This means $m - m' \in \ker\theta = K = \langle n\rangle$ and so $m - m'$ is an integer multiple of $n$, that is, $m \equiv m' \pmod{n}$. The steps can be reversed to show that $m \equiv m' \pmod{n}$ implies $(m)\theta = (m')\theta$. So $\theta$ has the same effect on integers $m$ and $m'$ which are congruent (mod $n$). In other words $\theta$ has the same effect on all the integers of each congruence class $\overline{m}$, and it makes sense to introduce the mapping $\overline{\theta} : \mathbb{Z}_n \to G$ defined by $(\overline{m})\overline{\theta} = (m)\theta$ for all $m \in \mathbb{Z}$. As $\theta$ is additive and surjective, the same is true of $\overline{\theta}$. As $\overline{\theta}$ has different effects on different congruence classes (mod $n$), we see that $\overline{\theta}$ is injective. Therefore $\overline{\theta} : \mathbb{Z}_n \cong G$, which shows:

every cyclic group of finite order $n$ is isomorphic to the additive group $\mathbb{Z}_n$. □

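We will see in Theorem 2.16 that every $\mathbb{Z}$-linear mapping $\theta$ gives rise to an isomorphism $\overline{\theta}$ as in the above proof. To illustrate Theorem 2.5 let $g = \overline{18} \in \mathbb{Z}_{60}$. The order of $g$ is the smallest positive integer $n$ satisfying $18n \equiv 0 \pmod{60}$. Dividing through by $\gcd\{18, 60\} = 6$ we obtain $3n \equiv 0 \pmod{10}$ and so $n = 10$. Therefore $g$ has order 10 and hence generates a subgroup $\langle g\rangle$ of $\mathbb{Z}_{60}$ which is isomorphic to the additive group $\mathbb{Z}_{10}$ by Theorem 2.5. The reader can check

$$\langle g\rangle = \{\overline{0}, \overline{18}, \overline{36}, \overline{54}, \overline{12}, \overline{30}, \overline{48}, \overline{6}, \overline{24}, \overline{42}\} \subseteq \mathbb{Z}_{60}$$

and $\overline{\theta} : \mathbb{Z}_{10} \cong \langle g\rangle$ where $(\overline{m})\overline{\theta} = \overline{18m}$ for $\overline{m} \in \mathbb{Z}_{10}$. From the proof of Theorem 2.5 we see

$$g \text{ has finite order } n \Leftrightarrow |\langle g\rangle| = n.$$

In other words, each element $g$ of finite order $n$ generates a cyclic subgroup $\langle g\rangle$ of order $n$.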
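The subgroup $\langle g\rangle$ for $g = \overline{18} \in \mathbb{Z}_{60}$ can be listed by adding $g$ repeatedly until $\overline{0}$ recurs; this reproduces the list above in the same order, and its length is the order of $g$. The Python sketch below (the helper name `cyclic_subgroup` is hypothetical; residues stand in for classes) also checks that $\overline{m} \mapsto \overline{18m}$ is a bijection $\mathbb{Z}_{10} \to \langle g\rangle$ which respects addition, in line with Theorem 2.5.

```python
def cyclic_subgroup(g, N):
    """Elements 0, g, 2g, ... of Z_N in the order generated, stopping when 0 recurs."""
    elems, x = [0], g % N
    while x != 0:
        elems.append(x)
        x = (x + g) % N
    return elems


H = cyclic_subgroup(18, 60)
n = len(H)                                      # the order of 18 in Z_60
theta = {m: (18 * m) % 60 for m in range(n)}    # class of m in Z_n  ->  class of 18m in Z_60
is_bijective = sorted(theta.values()) == sorted(H)
is_additive = all(theta[(a + b) % n] == (theta[a] + theta[b]) % 60
                  for a in range(n) for b in range(n))
print(H)
print(n, is_bijective, is_additive)             # 10 True True
```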

Definition 2.6

Let $n$ be a positive integer. A cyclic $\mathbb{Z}$-module $G$ is said to be of isomorphism type $C_n$ or $C_0$ according as $G$ is isomorphic to the additive group $\mathbb{Z}_n$ or the additive group $\mathbb{Z}$.

So for $n > 0$, groups of isomorphism type $C_n$ are cyclic groups of order $n$. Groups of type $C_0$ are infinite cyclic groups. Groups of type $C_1$ are trivial because they contain only one element, namely their zero element. Notice that for all non-negative integers $n$

$G = \langle g\rangle$ has isomorphism type $C_n$ where $\langle n\rangle$ is the order ideal of $g$.

How is the order of $mg$ related to the order of $g$? Should $g$ have infinite order then $mg$ also has infinite order for $m \ne 0$ and order 1 for $m = 0$.

Lemma 2.7

Let the $\mathbb{Z}$-module element $g$ have finite order $n$. Then $mg$ has finite order $n/\gcd\{m, n\}$ for $m \in \mathbb{Z}$.

Proof

The order ideal of $g$ is $\langle n\rangle$. Let $\langle n'\rangle$ be the order ideal of $mg$ where $n' \ge 0$. Write $d = \gcd\{m, n\}$. Then $(n/d)mg = (m/d)ng = (m/d)0 = 0$, showing that $n/d$ annihilates $mg$, that is, $n/d$ belongs to the order ideal $\langle n'\rangle$ of $mg$. Hence $n'$ is a divisor of $n/d$ and also $n' > 0$. On the other hand $n'mg = 0$ shows that $n'm$ belongs to the order ideal $\langle n\rangle$ of $g$. So $n'm = qn$ for some integer $q$. Dividing by $d$ gives $n'(m/d) = q(n/d)$, and hence $n/d$ is a divisor of $n'(m/d)$. But $m/d$ and $n/d$ are coprime integers, meaning $\gcd\{m/d, n/d\} = 1$. Hence $n/d$ is a divisor of $n'$. As $n/d$ and $n'$ are both positive and each is a divisor of the other, we conclude that $n' = n/d$. So $mg$ has order $n' = n/d = n/\gcd\{m, n\}$. □

For example the element $\overline{1}$ of the additive group $\mathbb{Z}_{60}$ has order 60. So $\overline{18} = 18(\overline{1})$ in $\mathbb{Z}_{60}$ has order $60/\gcd\{18, 60\} = 60/6 = 10$. As we saw earlier, $\overline{18}$ generates a cyclic subgroup of $\mathbb{Z}_{60}$ with $|\langle\overline{18}\rangle| = 10$. We will see in Section 3.1 that Lemma 2.7 plays a crucial part in the theory of finite abelian groups.
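Lemma 2.7 is easy to test by brute force in $\mathbb{Z}_n$, where $\overline{1}$ has order $n$ and so $\overline{m} = m(\overline{1})$ should have order $n/\gcd\{m, n\}$. The Python sketch below compares the formula with a direct count for a range of integers $m$ (negative ones included) and small $n$; the helper names are hypothetical, and a finite check like this illustrates the lemma rather than proving it.

```python
from math import gcd


def order_in_Zn(m, n):
    """Smallest k >= 1 with k*m ≡ 0 (mod n): the order of m(1-bar) in Z_n."""
    k = 1
    while (k * m) % n != 0:
        k += 1
    return k


def lemma_2_7_holds(max_n=60):
    """Compare the brute-force order with n/gcd{m, n} for 1 <= n <= max_n and -n <= m < 2n."""
    return all(order_in_Zn(m, n) == n // gcd(m, n)
               for n in range(1, max_n + 1) for m in range(-n, 2 * n))


print(order_in_Zn(18, 60), 60 // gcd(18, 60))  # 10 10
```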

Finally we work through an application of these ideas which involves an abelian group in multiplicative notation. This abelian group is familiar to the reader yet has a hint of mystery!

Example 2.8

Let $G$ denote the multiplicative group $\mathbb{Z}_{43}^*$ of non-zero elements of the field $\mathbb{Z}_{43}$. In Corollary 3.17 it is shown that the multiplicative group $F^*$ of every finite field $F$ is cyclic. So $\mathbb{Z}_p^*$ is cyclic for all primes $p$. In particular $\mathbb{Z}_{43}^*$ is cyclic, and we set out to find a generator by ‘trial and error’. As $|G| = 42 = 2 \times 3 \times 7$, each generator $g$ has order 42, that is, $g^{42} = \overline{1}$, the identity element of $\mathbb{Z}_{43}^*$, and $g^{42/7} = g^6 \ne \overline{1}$, $g^{42/3} = g^{14} \ne \overline{1}$, $g^{42/2} = g^{21} \ne \overline{1}$.

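We first try $g = \overline{2}$. Then $g^6 = \overline{64} = \overline{21} \ne \overline{1}$, and so $g^7 = gg^6 = \overline{2} \times \overline{21} = \overline{42} = -\overline{1}$. Squaring the last equation gives $g^{14} = g^7g^7 = (-\overline{1})^2 = \overline{1}$, showing that $\overline{2}$ is not a generator of $\mathbb{Z}_{43}^*$. In fact $\overline{2}$ has order 14.

Now try $g = \overline{3}$. Then $g^4 = \overline{81} = -\overline{5}$ and so $g^6 = g^4g^2 = (-\overline{5}) \times \overline{9} = -\overline{45} = -\overline{2} \ne \overline{1}$. Hence $g^7 = g^6g = (-\overline{2}) \times \overline{3} = -\overline{6}$. Squaring gives $g^{14} = g^7g^7 = (-\overline{6})^2 = \overline{36} = -\overline{7} \ne \overline{1}$. Then $g^{21} = g^7g^{14} = (-\overline{6}) \times (-\overline{7}) = \overline{42} = -\overline{1} \ne \overline{1}$. Squaring now gives $g^{42} = g^{21}g^{21} = (-\overline{1})^2 = \overline{1}$. So $g^{42} = \overline{1}$ and $g^6 \ne \overline{1}$, $g^{14} \ne \overline{1}$, $g^{21} \ne \overline{1}$, which show that $g$ has order 42. Therefore the integer powers of $g = \overline{3}$ are the elements of a cyclic subgroup $H$ of order 42. As $H \subseteq \mathbb{Z}_{43}^*$ and both $H$ and $\mathbb{Z}_{43}^*$ have exactly 42 elements, we conclude $H = \mathbb{Z}_{43}^*$. So $\mathbb{Z}_{43}^*$ is cyclic with generator $\overline{3}$.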

Having found one generator of $\mathbb{Z}_{43}^*$ we use Lemma 2.7 to find all $g$ with $\langle g\rangle = \mathbb{Z}_{43}^*$. There is an integer $m$ with $g = (\bar 3)^m$ and $1 \le m \le 42$. Comparing orders gives $\langle g\rangle = \mathbb{Z}_{43}^* \Leftrightarrow 42 = 42/\gcd\{m,42\} \Leftrightarrow \gcd\{m,42\} = 1$ by Lemma 2.7. So $\mathbb{Z}_{43}^*$ has 12 generators $g = (\bar 3)^m$ where $m \in \{1,5,11,13,17,19,23,25,29,31,37,41\}$. For instance $(\bar 3)^5 = \overline{81}\times\bar 3 = -\bar 5\times\bar 3 = -\overline{15} = \overline{28}$ generates $\mathbb{Z}_{43}^*$. In Section 2.2 we will meet the Euler φ-function and see that $12 = \varphi(42)$.

By Fermat’s little theorem, which is proved at the beginning of Section 2.2, every element $g = \bar r$ of $G = \mathbb{Z}_p^*$, $p$ prime, satisfies $g^{p-1} = \bar 1$. So

$$g \text{ is a generator of } \mathbb{Z}_p^* \iff g^{(p-1)/p'} \neq \bar 1 \text{ for all prime divisors } p' \text{ of } p-1.$$
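The displayed criterion turns the ‘trial and error’ search into a short loop. The sketch below (the helper names `prime_divisors`, `is_generator` and `all_generators` are ours, and it assumes $p$ is an odd prime) tests the candidates $r = 2, 3, \ldots$ with the criterion. It then uses Lemma 2.7 to list every generator as a power $g^m$ with $\gcd\{m, p-1\} = 1$.

```python
from math import gcd


def prime_divisors(n: int) -> list[int]:
    """Distinct prime divisors of n, by trial division."""
    result, d = [], 2
    while d * d <= n:
        if n % d == 0:
            result.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        result.append(n)
    return result


def is_generator(g: int, p: int) -> bool:
    """g generates Z_p^* iff g^((p-1)/q) != 1 for every prime q dividing p - 1."""
    return all(pow(g, (p - 1) // q, p) != 1 for q in prime_divisors(p - 1))


def all_generators(p: int) -> list[int]:
    """Least generator g by trial, then every g^m with gcd(m, p - 1) = 1 (Lemma 2.7)."""
    g = next(r for r in range(2, p) if is_generator(r, p))
    return sorted(pow(g, m, p) for m in range(1, p) if gcd(m, p - 1) == 1)


gens = all_generators(43)
print(len(gens))   # 12
print(28 in gens)  # True: (3)^5 = 28 in Z_43
```

For $p = 43$ the first candidate passing the test is 3, in agreement with the hand computation, and the list has $12 = \varphi(42)$ entries.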


EXERCISES 2.1

1. (a) Write out the addition table of the additive group $\mathbb{Z}_5$. Does $\bar 4 \in \mathbb{Z}_5$ generate $\mathbb{Z}_5$? Which elements of $\mathbb{Z}_5$ are generators? Specify the (two) subgroups of $\mathbb{Z}_5$.

(b) Write out the addition table of the $\mathbb{Z}$-module $\mathbb{Z}_6$. Express the elements $27(\bar 2)$, $-17(\bar 4)$, $15(\bar 3) + 13(\bar 4)$ in the form $\bar r$, $0 \le r < 6$. List the elements in each of the four submodules $H$ of $\mathbb{Z}_6$ and express the corresponding submodules $K$ of $\mathbb{Z}$ in the form $\langle d\rangle$, $d \ge 0$. Specify a generator of each $H$. Which elements generate $\mathbb{Z}_6$?
Hint: Use Lemma 2.2.

(c) List the elements in the submodule of the $\mathbb{Z}$-module $\mathbb{Z}_{21}$ generated by

(i) $\overline{14}$; (ii) $\overline{15}$.

What are the orders of $\overline{14}$ and $\overline{15}$ in $\mathbb{Z}_{21}$? What common property do the 12 elements of $\mathbb{Z}_{21}$ not in either of these submodules have?
Hint: Use Definition 2.1.

2. (a) Calculate $\gcd\{91,289\}$. Does $\overline{91}$ generate the $\mathbb{Z}$-module $\mathbb{Z}_{289}$? Does $\overline{51}$ generate this $\mathbb{Z}$-module?
Hint: Use Lemma 2.7.

(b) Use Lemma 2.7 to show that $\bar m \in \mathbb{Z}_n$ is a generator of the $\mathbb{Z}$-module $\mathbb{Z}_n$ if and only if $\gcd\{m,n\} = 1$.

(c) List the elements of $\mathbb{Z}_{25}$ which are not generators of the $\mathbb{Z}$-module $\mathbb{Z}_{25}$. Do these elements form a submodule of $\mathbb{Z}_{25}$? How many generators does the $\mathbb{Z}$-module $\mathbb{Z}_{125}$ have?

(d) Let $p$ be prime. Find the number of generators in each of the following $\mathbb{Z}$-modules:

(i) $\mathbb{Z}_p$; (ii) $\mathbb{Z}_{p^2}$; (iii) $\mathbb{Z}_{p^3}$; (iv) $\mathbb{Z}_{p^l}$.

3. (a) Show that $\bar 2 \in \mathbb{Z}_{13}$ satisfies $(\bar 2)^4 = \bar 3$, $(\bar 2)^6 = -\bar 1$. Deduce that $\bar 2$ has order 12. Express each power $(\bar 2)^l$ for $1 \le l \le 12$ in the form $\bar r$ where $1 \le r \le 12$. Does $\bar 2$ generate the multiplicative group $\mathbb{Z}_{13}^*$ of non-zero elements of $\mathbb{Z}_{13}$? Use Lemma 2.7 to find the elements $\bar r$ which generate $\mathbb{Z}_{13}^*$.

(b) Find a generator $g$ of the multiplicative group $\mathbb{Z}_{17}^*$ by ‘trial and error’. Specify the 5 subgroups of $\mathbb{Z}_{17}^*$ (each is cyclic with generator a power of $g$). How many generators does $\mathbb{Z}_{17}^*$ have?

(c) Verify that $2^8 \equiv -3 \pmod{37}$. Hence show that $\bar 2$ generates $\mathbb{Z}_{37}^*$ (it’s not enough to show $(\bar 2)^{36} = \bar 1$). Arrange the 9 subgroups of $\mathbb{Z}_{37}^*$ in their lattice diagram. How many generators does $\mathbb{Z}_{37}^*$ have?


(d) Find the orders of each of the elements $\bar 2$, $\bar 3$, $\bar 4$, $\bar 5$ of $\mathbb{Z}_{41}^*$. Find a generator of $\mathbb{Z}_{41}^*$.

4. (a) Let $G$ be a $\mathbb{Z}$-module and $c$ an integer. Show that $\theta : G \to G$, given by $(g)\theta = cg$ for all $g \in G$, is $\mathbb{Z}$-linear.
Let $\theta : G \to G$ be $\mathbb{Z}$-linear and $G$ cyclic with generator $g_0$. Show that there is an integer $c$ as above. Let $\langle n\rangle$ be the order ideal of $g_0$. Show that $c$ is unique modulo $n$. Show further that $\theta$ is an automorphism of $G$ if and only if $\gcd\{c,n\} = 1$.
Deduce that the additive group $\mathbb{Z}$ has exactly 2 automorphisms.
Show that every $\mathbb{Z}$-linear mapping $\theta : \mathbb{Z}_n \to \mathbb{Z}_n$ for $n > 0$ is of the form $(\bar m)\theta = \bar c\,\bar m$ for all $\bar m \in \mathbb{Z}_n$ and some $\bar c \in \mathbb{Z}_n$. Show also that $\theta : \mathbb{Z}_n \cong \mathbb{Z}_n$ if and only if $\gcd\{c,n\} = 1$. How many automorphisms does the additive group $\mathbb{Z}_9$ have? Are all of these automorphisms powers of $\theta_2 : \mathbb{Z}_9 \to \mathbb{Z}_9$ defined by $(\bar m)\theta_2 = \bar 2\,\bar m$ for all $\bar m \in \mathbb{Z}_9$?

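(b) Let $G$ and $G'$ be $\mathbb{Z}$-modules and let $\varphi : G \to G'$ be a $\mathbb{Z}$-linear mapping. For $g_0$ in $G$ let $\langle n\rangle$ and $\langle n'\rangle$ be the order ideals of $g_0$ and $(g_0)\varphi$ respectively. Show that $n' \mid n$.
Suppose now that $G = \langle g_0\rangle$ and let $g_0'$ in $G'$ have order ideal $\langle d\rangle$ where $d \mid n$. Show that there is a unique $\mathbb{Z}$-linear mapping $\theta : G \to G'$ with $(g_0)\theta = g_0'$.
How many $\mathbb{Z}$-linear mappings $\theta : \mathbb{Z} \to \mathbb{Z}_{12}$ are there? How many of these mappings are surjective?
Show that $\bar r \in \mathbb{Z}_n$ has order ideal $\langle n/\gcd\{r,n\}\rangle$ for $n > 0$. Show that the number of $\mathbb{Z}$-linear mappings $\theta : \mathbb{Z}_m \to \mathbb{Z}_n$ is $\gcd\{m,n\}$. Specify explicitly the five $\mathbb{Z}$-linear mappings $\mathbb{Z}_{10} \to \mathbb{Z}_{15}$ and the five $\mathbb{Z}$-linear mappings $\mathbb{Z}_{15} \to \mathbb{Z}_{10}$.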

(c) Let $\theta : G \to G'$ be a homomorphism (Definition 2.3) where $G$ and $G'$ are abelian groups with zero elements $0$ and $0'$ respectively. Show that $(0)\theta = 0'$ and $(-g)\theta = -(g)\theta$ for all $g \in G$.

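(d) Let $G$, $G'$, $G''$ be $\mathbb{Z}$-modules and let $\theta : G \to G'$, $\theta' : G' \to G''$ be $\mathbb{Z}$-linear mappings. Show that $\theta\theta' : G \to G''$ is $\mathbb{Z}$-linear where $(g)\theta\theta' = ((g)\theta)\theta'$ for all $g \in G$. For bijective $\theta$ show that $\theta^{-1} : G' \to G$ is $\mathbb{Z}$-linear. Deduce that the automorphisms of $G$ are the elements of a multiplicative group $\operatorname{Aut} G$, the group operation being composition of mappings. Is $\operatorname{Aut}\mathbb{Z}_9$ cyclic? Is $\operatorname{Aut}\mathbb{Z}_8$ cyclic?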

5. (a) Let $G$ be a $\mathbb{Z}$-module. Show that $H = \{2g : g \in G\}$ and $K = \{g \in G : 2g = 0\}$ are submodules of $G$. Find examples of $G$ with

(i) $H \subset K$; (ii) $H = K$; (iii) $K \subset H$; (iv) $H \not\subseteq K$, $K \not\subseteq H$.

Hint: Consider $G = \mathbb{Z}_n$.


(b) Show that the submodule $K$ in (a) above has the structure of a vector space over $\mathbb{Z}_2$. Find $\dim K$ where $G = \mathbb{Z}_n$.

6. (a) Let $q_1$ and $q_2$ be rational numbers. Show that the set $\langle q_1, q_2\rangle$ of all rationals of the form $m_1q_1 + m_2q_2$ ($m_1, m_2 \in \mathbb{Z}$) is a submodule of the $\mathbb{Z}$-module $(\mathbb{Q},+)$ of all rational numbers.

(b) Using the above notation show 1/6 ∈ 〈3/2,2/3〉. Show that 〈1/6〉 =〈3/2,2/3〉.

(c) Write $q_i = a_i/b_i \neq 0$ where $a_i, b_i \in \mathbb{Z}$, $\gcd\{a_i,b_i\} = 1$, $b_i > 0$ for $i = 1,2$. Let $a_i' = a_i/\gcd\{a_1,a_2\}$, $b_i' = b_i/\gcd\{b_1,b_2\}$. Show that $\gcd\{a_1'b_2', a_2'b_1'\} = 1$ and deduce $q_0 = \gcd\{a_1,a_2\}/\operatorname{lcm}\{b_1,b_2\} \in \langle q_1,q_2\rangle$. Conclude that $q_0$ generates $\langle q_1,q_2\rangle$.
Hint: Remember $\operatorname{lcm}\{b_1,b_2\} = b_1b_2/\gcd\{b_1,b_2\}$ and $\gcd\{a,b\} = \gcd\{a,c\} = 1 \Rightarrow \gcd\{a,bc\} = 1$ for $a,b,c \in \mathbb{Z}$.
Is $\langle q_1,q_2,q_3\rangle = \{m_1q_1 + m_2q_2 + m_3q_3 : m_1,m_2,m_3 \in \mathbb{Z}\}$, where $q_1,q_2,q_3 \in \mathbb{Q}$, necessarily a cyclic submodule of $\mathbb{Q}$?

(d) Find a generator of $\langle 6/35, 75/56\rangle$ and a generator of $\langle 6/35, 75/56, 8/15\rangle$. Do either of these submodules contain $\mathbb{Z}$?

7. (a) Let $H_1$ and $H_2$ be subgroups of the additive abelian group $G$. Show

(i) the intersection $H_1 \cap H_2$ is a subgroup of $G$,
(ii) the sum $H_1 + H_2 = \{h_1 + h_2 : h_1 \in H_1, h_2 \in H_2\}$ is a subgroup of $G$,
(iii) the union $H_1 \cup H_2$ is a subgroup of $G$ $\Leftrightarrow$ either $H_1 \subseteq H_2$ or $H_2 \subseteq H_1$.

Hint: Show $\Rightarrow$ by contradiction.

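(b) Find generators of $H_1 \cap H_2$ and $H_1 + H_2$ in the case of $G = \mathbb{Z}$, $H_1 = \langle 30\rangle$, $H_2 = \langle 100\rangle$. Generalise your answer to cover the case $G = \mathbb{Z}$, $H_1 = \langle m_1\rangle$, $H_2 = \langle m_2\rangle$.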

8. (a) Let $g_1, g_2, \ldots, g_n$ ($n \ge 3$) be elements of an additive group $G$. The elements $s_i$ of $G$ ($1 \le i \le n$) are defined inductively by $s_1 = g_1$, $s_2 = s_1 + g_2$, $s_3 = s_2 + g_3, \ldots, s_n = s_{n-1} + g_n$. Use the associative law of addition and induction to show that all ways of summing $g_1, g_2, \ldots, g_n$ in order give $s_n$.
Hint: Show first that each summation of $g_1, g_2, \ldots, g_n$ decomposes as $s_i + s'_{n-i}$ where $s'_{n-i}$ is a summation of $g_{i+1}, g_{i+2}, \ldots, g_n$ for some $i$ with $1 \le i < n$.
Deduce the generalised associative law of addition: brackets may be omitted in any sum of $n$ elements of $G$.

(b) Let $g_1, g_2, \ldots, g_n$ ($n \ge 2$) be elements of an additive abelian group $G$. Use the associative and commutative laws of addition, and induction, to show that all ways of summing $g_1, g_2, \ldots, g_n$ in any order give $s_n$ as defined in (a) above.


(c) Let $g, g_1, g_2$ be elements of an additive abelian group $G$. Use (b) above to verify laws 5 and 6 of a $\mathbb{Z}$-module, namely: $m(g_1 + g_2) = mg_1 + mg_2$, $(m_1 + m_2)g = m_1g + m_2g$, $(m_1m_2)g = m_1(m_2g)$ for all integers $m, m_1, m_2$.
Hint: Suppose first that $m, m_1, m_2$ are positive.

2.2 Quotient Groups and the Direct Sum Construction

Two ways of obtaining new abelian groups from old are discussed: the quotient group construction and the direct sum construction. Particular cases of both constructions are already known to the reader, the most significant being $\mathbb{Z}_n$, which is the quotient of $\mathbb{Z}$ by its subgroup $\langle n\rangle$, that is, $\mathbb{Z}_n = \mathbb{Z}/\langle n\rangle$ for all positive integers $n$. Keep this familiar example in mind as the theory unfolds.

Let $G$ be an additive abelian group having a subgroup $K$. We construct the quotient group $G/K$, which can be thought of informally as $G$ modulo $K$. Formally the elements of $G/K$ are subsets of $G$ of the type

$$K + g_0 = \{k + g_0 : k \in K\} \quad\text{for } g_0 \in G.$$

Subsets of this kind are called cosets of $K$ in $G$. The elements of $K + g_0$ are sums $k + g_0$ where $k$ runs through $K$ and $g_0$ is a given element of $G$. We write $\overline{g_0} = K + g_0$ to emphasise the close analogy between cosets and congruence classes of integers. Notice that $g \in \overline{g_0}$ means $g = k + g_0$ for some $k \in K$, that is, $g - g_0 \in K$, showing that $g$ differs from $g_0$ by an element of $K$. The condition $g - g_0 \in K$ is expressed by writing $g \equiv g_0 \pmod K$ and saying that $g$ is congruent to $g_0$ modulo $K$. Our next lemma deals with the set-theoretic properties of cosets. Notice that each coset has as many aliases (alternative names) as it has elements!

Lemma 2.9

Let $K$ be a subgroup of the additive abelian group $G$. Using the above notation, $\bar g = \overline{g_0}$ if and only if $g \equiv g_0 \pmod K$. Congruence modulo $K$ is an equivalence relation on $G$. Each element of $G$ belongs to exactly one coset of $K$ in $G$.

Proof

The subgroup $K$ contains the zero element $0$ of $G$. Hence $g = 0 + g \in \bar g$, showing that each $g$ in $G$ belongs to the coset $\bar g$. Suppose $\bar g = \overline{g_0}$, which means that the sets $\bar g$ and $\overline{g_0}$ consist of exactly the same elements. As $g \in \bar g$ we see $g \in \overline{g_0}$ and so, as above, we conclude that $g \equiv g_0 \pmod K$.


Now suppose $g \equiv g_0 \pmod K$. Then $g - g_0 = k_0 \in K$. We first show $\bar g \subseteq \overline{g_0}$. Consider $x \in \bar g$. Then $x = k + g$ for $k \in K$. Hence $x = k + k_0 + g_0$, which belongs to $\overline{g_0}$ since $k + k_0 \in K$ as $K$ is closed under addition. So we have shown $\bar g \subseteq \overline{g_0}$. Now $K$ is closed under negation and so $g_0 - g = -k_0 \in K$, showing $g_0 \equiv g \pmod K$. Interchanging the roles of $g$ and $g_0$ in the argument, we see that $\overline{g_0} \subseteq \bar g$. The sets $\bar g$ and $\overline{g_0}$ are such that each is a subset of the other, that is, $\bar g = \overline{g_0}$.

We now use the ‘if and only if’ condition $\bar g = \overline{g_0} \Leftrightarrow g \equiv g_0 \pmod K$ to prove that congruence modulo $K$ satisfies the three laws of an equivalence relation. As $\bar g = \bar g$ we see that $g \equiv g \pmod K$ for all $g \in G$, that is, congruence modulo $K$ is reflexive. Suppose $g_1 \equiv g_2 \pmod K$ for some $g_1, g_2 \in G$; then $\overline{g_1} = \overline{g_2}$ and so $\overline{g_2} = \overline{g_1}$, which means $g_2 \equiv g_1 \pmod K$, that is, congruence modulo $K$ is symmetric. Suppose $g_1 \equiv g_2 \pmod K$ and $g_2 \equiv g_3 \pmod K$ where $g_1, g_2, g_3 \in G$; then $\overline{g_1} = \overline{g_2}$ and $\overline{g_2} = \overline{g_3}$ and so $\overline{g_1} = \overline{g_3}$ (it really is that easy!), which gives $g_1 \equiv g_3 \pmod K$, that is, congruence modulo $K$ is transitive. Congruence modulo $K$ satisfies the reflexive, symmetric and transitive laws and so is an equivalence relation on $G$.

The proof is finished by showing that no element $g$ in $G$ can belong to two different cosets of $K$ in $G$. We know $g \in \bar g$ as $0 \in K$. Suppose $g \in \overline{g_0}$ for some $g_0 \in G$. The preliminary discussion shows $g \equiv g_0 \pmod K$ and hence $\bar g = \overline{g_0}$. So $g$ belongs to $\bar g$ and to no other coset of $K$ in $G$. □

The cosets of $K$ in $G$ partition the set $G$, that is, these cosets are non-empty, non-overlapping subsets having $G$ as their union. In other words, each element of $G$ belongs to a unique coset of $K$ in $G$.

For example, let $G = \mathbb{Z}$ and $K = \langle 3\rangle$. Then $m \equiv m' \pmod K$ means $m \equiv m' \pmod 3$ for $m, m' \in \mathbb{Z}$. There are three cosets of $K$ in $G$, namely $\bar 0 = K + 0 = K = \{\ldots,-9,-6,-3,0,3,6,9,\ldots\}$, $\bar 1 = K + 1 = \{\ldots,-8,-5,-2,1,4,7,10,\ldots\}$, $\bar 2 = K + 2 = \{\ldots,-7,-4,-1,2,5,8,11,\ldots\}$, that is, the congruence classes of integers modulo 3, and these cosets partition $\mathbb{Z}$. So $\mathbb{Z}_3 = \{\bar 0, \bar 1, \bar 2\} = G/K$ in this case.

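Now suppose $G = \mathbb{Z}_{12}$ and $K = \langle \bar 4\rangle$. There are four cosets of $K$ in $G$ and these are $K + \bar 0 = K = \{\bar 0,\bar 4,\bar 8\}$, $K + \bar 1 = \{\bar 1,\bar 5,\bar 9\}$, $K + \bar 2 = \{\bar 2,\bar 6,\overline{10}\}$, $K + \bar 3 = \{\bar 3,\bar 7,\overline{11}\}$.

These cosets partition $\mathbb{Z}_{12}$ and we will see shortly that they are the elements of a cyclic group of order 4.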
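The cosets in this example can also be generated mechanically. The following Python sketch (the function names are ours) builds $K = \langle \bar 4\rangle$ in $\mathbb{Z}_{12}$, forms every coset $K + \bar g$, and confirms that the distinct cosets partition $\mathbb{Z}_{12}$. Representing cosets as `frozenset`s makes the repeated aliases collapse automatically.

```python
def cyclic_subgroup(a: int, n: int) -> frozenset[int]:
    """The cyclic subgroup <a> of Z_n: all multiples of a, reduced mod n."""
    return frozenset((k * a) % n for k in range(n))


def cosets(K: frozenset[int], n: int) -> set[frozenset[int]]:
    """All cosets K + g of K in Z_n; aliases collapse since equal frozensets are one set element."""
    return {frozenset((k + g) % n for k in K) for g in range(n)}


K = cyclic_subgroup(4, 12)
C = cosets(K, 12)
print(sorted(sorted(c) for c in C))  # [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]

# The cosets cover Z_12, and their sizes add up to exactly 12, so they do not overlap.
assert set().union(*C) == set(range(12))
assert sum(len(c) for c in C) == 12
print(len(C), len(K))  # 4 3
```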

The number $|G/K|$ of cosets of $K$ in $G$ is called the index of the subgroup $K$ in its parent group $G$. The index is either a positive integer or infinite. Let $G$ be a finite abelian group, that is, the number $|G|$ of elements in $G$ is a positive integer. In this case $|G|$ is called the order of $G$. The index of the subgroup $K$ in $G$ is the positive integer $|G/K|$. Every coset $K + g_0$ has exactly $|K|$ elements, as $k \mapsto k + g_0$ is a bijection from $K$ to $K + g_0$. As these $|G/K|$ cosets partition $G$, we obtain the equation $|G| = |G/K||K|$ and so

$$|G/K| = |G|/|K|$$


on counting the elements in $G$ coset by coset. So $|K|$ is a divisor of $|G|$, that is,

the order $|K|$ of every subgroup $K$ of a finite abelian group $G$ is a divisor of the order $|G|$ of $G$,

which is known as Lagrange’s theorem for finite abelian groups.

Each element $g$ of $G$ generates a cyclic subgroup $\langle g\rangle$. Suppose again that $G$ is finite. Writing $K = \langle g\rangle$, from Theorem 2.5 we see that $g$ has finite order $|K|$, and so $|K|g = 0$. Hence $|G|g = |G/K||K|g = |G/K| \times 0 = 0$. We have proved:

|G|g = 0 for all elements g of the finite abelian group G.

We call this useful fact the $|G|$-lemma. For instance every element $g$ of an additive abelian group of order 27 satisfies $27g = 0$. For each prime $p$ the multiplicative group $\mathbb{Z}_p^*$ is abelian of order $p - 1$, and so, using multiplicative notation for a moment, we obtain $(\bar r)^{p-1} = \bar 1$ for all $\bar r \in \mathbb{Z}_p^*$, that is, $r^{p-1} \equiv 1 \pmod p$ for all integers $r$ and primes $p$ with $\gcd\{r,p\} = 1$. On multiplying through by $r$, and noting that both sides are congruent to 0 when $p \mid r$, we get

$$r^p \equiv r \pmod p \quad\text{for all integers } r \text{ and primes } p$$

which is known as Fermat’s ‘little’ theorem.

Returning to the general case of an additive abelian group $G$ with subgroup $K$, let $\eta : G \to G/K$ be the natural mapping defined by $(g)\eta = \bar g$ for all $g \in G$. So $\eta$ maps each element $g$ to the coset $\bar g$. Each coset of $K$ in $G$ is of the form $\bar g$ for some $g \in G$ and so $\eta$ is surjective. Can addition of cosets be introduced in such a way that $G/K$ is an abelian group and $\eta$ is an additive mapping as in Definition 2.3? There is only one possible way in which this can be done, because $(g_1 + g_2)\eta = (g_1)\eta + (g_2)\eta$, that is,

$$\overline{g_1} + \overline{g_2} = \overline{g_1 + g_2} \qquad (\clubsuit)$$

tells us that the sum $\overline{g_1} + \overline{g_2}$ of cosets must be the coset containing $g_1 + g_2$. The following lemma assures us that this rule of coset addition is unambiguous: it does not depend on the particular aliases used for $\overline{g_1}$ and $\overline{g_2}$, and it does the job of turning the set $G/K$ into an abelian group.

Lemma 2.10

Let $G$ be an additive abelian group with subgroup $K$. Let $g_1, g_1', g_2, g_2'$ be elements of $G$ such that $g_1 \equiv g_1' \pmod K$, $g_2 \equiv g_2' \pmod K$. Then $g_1 + g_2 \equiv g_1' + g_2' \pmod K$.

The above rule (♣) of coset addition is unambiguous and $G/K$, with this addition, is an abelian group.


Proof

By hypothesis there are $k_1, k_2$ in $K$ with $g_1 = k_1 + g_1'$, $g_2 = k_2 + g_2'$. Adding these equations and rearranging the terms, which is allowed as $G$ is abelian, we obtain $g_1 + g_2 = (k_1 + g_1') + (k_2 + g_2') = (k_1 + k_2) + (g_1' + g_2')$, showing that $g_1 + g_2 \equiv g_1' + g_2' \pmod K$ as $k_1 + k_2 \in K$. So it is legitimate to add congruences modulo $K$.

In terms of cosets, starting with $\overline{g_1} = \overline{g_1'}$, $\overline{g_2} = \overline{g_2'}$, we have shown $\overline{g_1 + g_2} = \overline{g_1' + g_2'}$. Therefore coset addition is indeed unambiguously defined by

$$\overline{g_1} + \overline{g_2} = \overline{g_1 + g_2} \quad\text{for all } g_1, g_2 \in G$$

as the right-hand side is unchanged when the representatives of the cosets on the left-hand side are changed from $g_i$ to $g_i'$, $i = 1,2$.

We now verify that coset addition satisfies the laws of an abelian group. The associative law is satisfied as

$$\begin{aligned}(\overline{g_1} + \overline{g_2}) + \overline{g_3} &= \overline{g_1 + g_2} + \overline{g_3} = \overline{(g_1 + g_2) + g_3}\\ &= \overline{g_1 + (g_2 + g_3)} = \overline{g_1} + \overline{g_2 + g_3} = \overline{g_1} + (\overline{g_2} + \overline{g_3})\end{aligned}$$

for all $g_1, g_2, g_3 \in G$. The coset $\bar 0 = \{k + 0 : k \in K\} = \{k : k \in K\} = K$ is the zero element of $G/K$ since $\bar 0 + \bar g = \overline{0 + g} = \bar g$ for all $g \in G$. The coset $\bar g$ has negative $\overline{-g}$ since $\overline{-g} + \bar g = \overline{-g + g} = \bar 0$, and so $-\bar g = \overline{-g}$ for all $g \in G$. Finally, the commutative law holds in $G/K$ since $\overline{g_1} + \overline{g_2} = \overline{g_1 + g_2} = \overline{g_2 + g_1} = \overline{g_2} + \overline{g_1}$ for all $g_1, g_2 \in G$. □

The group $G/K$ is called the quotient (or factor) group of $G$ by $K$ and $\eta : G \to G/K$ is called the natural homomorphism.
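Lemma 2.10 can be tested exhaustively in a small case. The sketch below (the function names are ours) takes $G = \mathbb{Z}_{12}$ and $K = \langle \bar 4\rangle$. It checks that, for every pair of cosets, the rule (♣) gives the same answer whichever representatives are used. It also computes the order of $\bar 1$ in $G/K$, which confirms that $\mathbb{Z}_{12}/\langle \bar 4\rangle$ is cyclic of order 4. Replacing $K$ by, for example, the subset $\{\bar 0, \bar 1\}$, which is not a subgroup, makes the check fail.

```python
from itertools import product


def coset_of(g: int, K: frozenset[int], n: int) -> frozenset[int]:
    """The coset K + g of K in Z_n."""
    return frozenset((k + g) % n for k in K)


def coset_addition_well_defined(K: frozenset[int], n: int) -> bool:
    """Rule (♣) in Z_n: for all g1, g2 and all h1 in K + g1, h2 in K + g2,
    the coset of h1 + h2 must equal the coset of g1 + g2."""
    for g1, g2 in product(range(n), repeat=2):
        target = coset_of(g1 + g2, K, n)
        for h1, h2 in product(coset_of(g1, K, n), coset_of(g2, K, n)):
            if coset_of(h1 + h2, K, n) != target:
                return False
    return True


def coset_order(g: int, K: frozenset[int], n: int) -> int:
    """Order of K + g in Z_n / K: the least k >= 1 with k*g in K."""
    k = 1
    while (k * g) % n not in K:
        k += 1
    return k


K = frozenset({0, 4, 8})                                   # the subgroup <4> of Z_12
print(coset_addition_well_defined(K, 12))                  # True
print(coset_order(1, K, 12))                               # 4: Z_12/<4> is cyclic of order 4
print(coset_addition_well_defined(frozenset({0, 1}), 12))  # False: {0, 1} is not a subgroup
```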

The additive group $\mathbb{R}$ of real numbers contains the subgroup $\mathbb{Z}$ of integers. A typical element of $\mathbb{R}/\mathbb{Z}$ is the coset $\bar x = \{\ldots, x-2, x-1, x, x+1, x+2, \ldots\}$ consisting of all those numbers which differ from the real number $x$ by an integer. For instance $\overline{1/3} = \{\ldots, -5/3, -2/3, 1/3, 4/3, 7/3, \ldots\}$. Every $x$ is uniquely expressible as $x = \lfloor x\rfloor + r$ where $\lfloor x\rfloor$ is an integer called the integer part of $x$, and $r$ is a real number called the fractional part of $x$ with $0 \le r < 1$. For instance $\lfloor \pi\rfloor = 3$ and $\pi - \lfloor \pi\rfloor = 0.14159\ldots$. In the group $\mathbb{R}/\mathbb{Z}$ the integer part plays no role, but the fractional part is all-important as $\bar x = \bar y$ if and only if $x$ and $y$ have the same fractional part. So every element of $\mathbb{R}/\mathbb{Z}$ can be expressed uniquely as $\bar r$ where $0 \le r < 1$. Suppose $0 \le r_1, r_2 < 1$. Then coset addition in terms of fractional parts is given by

$$\overline{r_1} + \overline{r_2} = \begin{cases} \overline{r_1 + r_2} & \text{for } r_1 + r_2 < 1,\\ \overline{r_1 + r_2 - 1} & \text{for } r_1 + r_2 \ge 1.\end{cases}$$

For instance $\overline{1/2} + \overline{2/3} = \overline{1/6}$. The group $\mathbb{R}/\mathbb{Z}$ is known as the reals mod one and we’ll see in Section 2.3 that it’s isomorphic to the multiplicative group of complex numbers of modulus 1.
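Working with rational representatives only (an assumption that keeps the arithmetic exact), the two-case rule can be sketched in Python with the standard `fractions.Fraction` type; `math.floor` supplies the integer part $\lfloor x\rfloor$. The function names `frac_part` and `add_mod_one` are ours.

```python
from fractions import Fraction
from math import floor


def frac_part(x: Fraction) -> Fraction:
    """Fractional part r = x - floor(x), with 0 <= r < 1.
    x and y lie in the same coset of Z exactly when frac_part(x) == frac_part(y)."""
    return x - floor(x)


def add_mod_one(r1: Fraction, r2: Fraction) -> Fraction:
    """Coset addition in R/Z on representatives 0 <= r1, r2 < 1, by the two-case rule."""
    s = r1 + r2
    return s if s < 1 else s - 1


print(add_mod_one(Fraction(1, 2), Fraction(2, 3)))  # 1/6
print(frac_part(Fraction(-5, 3)))                   # 1/3: -5/3 lies in the coset of 1/3
```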

The reader knows already that the additive group $\mathbb{Z}_n$ is the particular case of the above construction with $G = \mathbb{Z}$, $K = \langle n\rangle$, that is, $\mathbb{Z}_n = \mathbb{Z}/\langle n\rangle$ is the standard example of a cyclic group of type $C_n$ as defined in Definition 2.6 for $n \ge 0$. Note that $\mathbb{Z}_0 \cong \mathbb{Z}$ as the elements of $\mathbb{Z}_0$ are singletons (sets with exactly one element) $\{m\}$ for $m \in \mathbb{Z}$ and $\{m\} \mapsto m$ is an isomorphism. At the other end of the scale the singleton $\mathbb{Z}_1 = \{\mathbb{Z}\}$ is the standard example of a trivial abelian group. Both $\mathbb{Z}$ and its subgroups $\langle n\rangle$ are examples of free $\mathbb{Z}$-modules, that is, $\mathbb{Z}$-modules having $\mathbb{Z}$-bases. This concept will be discussed in Section 2.3. For the moment let’s note that $\mathbb{Z}$ has $\mathbb{Z}$-basis $1$ (or $-1$) and $\langle n\rangle$ has $\mathbb{Z}$-basis $n$ (or $-n$) for $n > 0$; the trivial $\mathbb{Z}$-module $\langle 0\rangle$ has $\mathbb{Z}$-basis the empty set $\emptyset$. The equation $\mathbb{Z}_n = \mathbb{Z}/\langle n\rangle$ expresses the additive group of $\mathbb{Z}_n$ as a quotient of $\mathbb{Z}$ (which is free) by its subgroup $\langle n\rangle$ (which is also free). It turns out that all f.g. abelian groups are best thought of as quotients of free $\mathbb{Z}$-modules by free subgroups, as we’ll see in Section 3.1.

We now discuss the direct sum construction. You will be used to the formula $(x_1, y_1) + (x_2, y_2) = (x_1 + x_2, y_1 + y_2)$ for the sum of vectors. Carrying out addition in this componentwise way is the distinguishing feature of a direct sum.

Let $G_1$ and $G_2$ be additive abelian groups. The elements of $G_1 \oplus G_2$ are ordered pairs $(g_1, g_2)$ where $g_1 \in G_1$, $g_2 \in G_2$. The rule of addition in $G_1 \oplus G_2$ is

$$(g_1, g_2) + (g_1', g_2') = (g_1 + g_1', g_2 + g_2') \quad\text{for } g_1, g_1' \in G_1 \text{ and } g_2, g_2' \in G_2.$$

It is straightforward to show that $G_1 \oplus G_2$ is itself an additive abelian group, and we confidently leave this job to the reader (Exercises 2.2, Question 4(f)). Suffice it to say that $(0_1, 0_2)$ is the zero element of $G_1 \oplus G_2$, where $0_i$ is the zero of $G_i$ for $i = 1,2$, and $-(g_1, g_2) = (-g_1, -g_2)$, showing that negation, like addition, is carried out component by component. The abelian group $G_1 \oplus G_2$ is called the external direct sum of $G_1$ and $G_2$.

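This construction, which is easier to grasp than the quotient group construction, produces ‘at a stroke’ a vast number of abelian groups. For instance

$$\mathbb{Z}_2 \oplus \mathbb{Z}_2,\quad \mathbb{Z}_3 \oplus \mathbb{Z},\quad (\mathbb{Z}_1 \oplus \mathbb{Z}_5) \oplus \mathbb{Z},\quad (\mathbb{Z}_2 \oplus \mathbb{Z}_4) \oplus (\mathbb{Z}_4 \oplus \mathbb{Z}_7)$$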

these groups being built up using cyclic groups and the direct sum construction. It turns out that all groups constructed in this way are abelian and finitely generated. Can every finitely generated abelian group be built up in this way? We will see in the next chapter that the answer is: Yes! What is more, the Smith normal form will help us decide which pairs of these groups are isomorphic.

We now look in detail at the group $G = \mathbb{Z}_2 \oplus \mathbb{Z}_2$. Write $0 = (\bar 0, \bar 0)$, $u = (\bar 1, \bar 0)$, $v = (\bar 0, \bar 1)$, $w = (\bar 1, \bar 1)$. Then $G = \{0, u, v, w\}$ has addition table

+ | 0 u v w
0 | 0 u v w
u | u 0 w v
v | v w 0 u
w | w v u 0

Notice that the sum of any two of $u, v, w$ is the other one, $v + w = u$ etc., and each element is equal to its negative, as $v + v = 0$ means $v = -v$ for instance. This group has five subgroups, namely $\langle 0\rangle$, $\langle u\rangle$, $\langle v\rangle$, $\langle w\rangle$ and $G$ itself. Now $G$ is not cyclic and we write $G = \langle u, v\rangle$, meaning that each element of $G$ is of the form $lu + mv$ for some integers $l$ and $m$. In fact $G$ is the smallest non-cyclic group. Any group isomorphic to $G$ is called a Klein 4-group after the German mathematician Felix Klein. Being the direct sum of two cyclic groups of order 2, $G$ is said to be of isomorphism type $C_2 \oplus C_2$ (see Definition 2.13). You may have already met this group in the context of vector spaces because $\mathbb{Z}_2 \oplus \mathbb{Z}_2$ is the standard example of a 2-dimensional vector space over the field $\mathbb{Z}_2$. The elements $0, u, v, w$ of $G$ are the vectors and the elements $\bar 0, \bar 1$ of $\mathbb{Z}_2$ are the scalars. The ordered pair $u, v$ is a basis of this vector space. The subgroups of $G$ are precisely the subspaces and the automorphisms of $G$ are precisely the invertible linear mappings $\theta$ of this vector space. For example $\theta : G \cong G$ such that $(0)\theta = 0$, $(u)\theta = v$, $(v)\theta = w$, $(w)\theta = u$ is an automorphism of $G$.
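The addition table and the automorphism $\theta$ can be checked in a few lines. In this Python sketch (the names are ours) the elements of $\mathbb{Z}_2 \oplus \mathbb{Z}_2$ are pairs of residues 0, 1 added componentwise mod 2. The script prints the table in the layout used above and confirms that $\theta$ is additive and bijective.

```python
from itertools import product

# Elements of Z_2 ⊕ Z_2 as pairs of residues, added componentwise mod 2.
zero, u, v, w = (0, 0), (1, 0), (0, 1), (1, 1)
G = [zero, u, v, w]
names = {zero: "0", u: "u", v: "v", w: "w"}


def add(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)


# Print the addition table row by row.
print("+ |", *(names[y] for y in G))
for x in G:
    print(names[x], "|", *(names[add(x, y)] for y in G))

# Each element is its own negative, so no element has order 4 and G is not cyclic.
assert all(add(x, x) == zero for x in G)

# The automorphism θ from the text: 0 -> 0, u -> v, v -> w, w -> u.
theta = {zero: zero, u: v, v: w, w: u}
is_additive = all(theta[add(x, y)] == add(theta[x], theta[y]) for x, y in product(G, repeat=2))
is_bijective = sorted(theta.values()) == sorted(G)
print(is_additive and is_bijective)  # True
```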

Using the approach outlined in the introduction, we now give the reader a glimpse ahead to Chapter 3. Just as the natural mapping $\eta : \mathbb{Z} \to \mathbb{Z}_2$ encapsulates the relationship between $\mathbb{Z}$ and $\mathbb{Z}_2$, so the $\mathbb{Z}$-linear mapping $\theta : \mathbb{Z} \oplus \mathbb{Z} \to G$, defined by $(l,m)\theta = lu + mv$ for all $l, m \in \mathbb{Z}$, tells us all there is to know about $G = \mathbb{Z}_2 \oplus \mathbb{Z}_2$ in terms of the more tractable module $\mathbb{Z} \oplus \mathbb{Z}$; in fact $\mathbb{Z} \oplus \mathbb{Z}$ is a free $\mathbb{Z}$-module of rank 2 because $e_1 = (1,0)$, $e_2 = (0,1)$ is a $\mathbb{Z}$-basis having two ‘vectors’ (the term rank rather than dimension is used in this context). Note that $(e_1)\theta = 1u + 0v = u$ and similarly $(e_2)\theta = v$, $(e_1 + e_2)\theta = w$. So $\theta$ is surjective, that is, $\operatorname{im}\theta = G$. Which pairs $(l,m)$ of integers belong to the kernel of $\theta$? In other words, which pairs $(l,m)$ of integers are mapped by $\theta$ to the zero element of $G$? The answer is: $l$ and $m$ are both even, because this is the condition for the equation $lu + mv = 0$ to be true. So $\ker\theta = \langle 2e_1, 2e_2\rangle$, that is, $\ker\theta$ consists of all integer linear combinations of $2e_1 = (2,0)$ and $2e_2 = (0,2)$. In fact $2e_1, 2e_2$ is a $\mathbb{Z}$-basis of $\ker\theta$, which is therefore a free subgroup of $\mathbb{Z} \oplus \mathbb{Z}$. Notice $2 = \operatorname{rank}\ker\theta$. The $\mathbb{Z}$-bases of $\mathbb{Z} \oplus \mathbb{Z}$ and $\ker\theta$ are related by

$$D = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}.$$

We will see in Section 3.1 that f.g. abelian groups are not usually as prettily presented as this one; here the matrix $D$ is already in Smith normal form. There are four cosets of $K = \ker\theta$ in $\mathbb{Z} \oplus \mathbb{Z}$ depending on the parity of the integers $l$ and $m$, that is,

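$$(\mathbb{Z} \oplus \mathbb{Z})/K = \{K,\ K + e_1,\ K + e_2,\ K + e_1 + e_2\}.$$

For instance, the elements of $K + e_1$ are the pairs $(l,m)$ of integers with $l$ odd, $m$ even. These cosets correspond, using $\theta$, to the elements $0, u, v, w$ respectively of $\operatorname{im}\theta = G$, that is, $\theta : (\mathbb{Z} \oplus \mathbb{Z})/\ker\theta \cong G$ where

$$(K)\theta = (0)\theta = 0,\quad (K + e_1)\theta = (e_1)\theta = u,\quad (K + e_2)\theta = (e_2)\theta = v,\quad (K + e_1 + e_2)\theta = (e_1 + e_2)\theta = w.$$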

Kernels and images are defined at the start of Section 2.3. The isomorphism $\theta$, which is a particular case of Theorem 2.16, shows that the Klein 4-group is isomorphic to $(\mathbb{Z} \oplus \mathbb{Z})/\ker\theta$, a quotient of two free $\mathbb{Z}$-modules. The point is: every f.g. abelian group can be analysed in this way, as we’ll see in Theorem 3.4.
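The description of $\ker\theta$ and of the four cosets can be spot-checked on a finite box of integer pairs (a check, not a proof). In this sketch $\theta$ is modelled by reducing each coordinate mod 2, an encoding choice of ours that matches $lu + mv = (\bar l, \bar m)$ in $\mathbb{Z}_2 \oplus \mathbb{Z}_2$. The names `box`, `kernel` and `classes` are illustrative only.

```python
def theta(l: int, m: int) -> tuple[int, int]:
    """(l, m)θ = l u + m v, which is (l mod 2, m mod 2) in Z_2 ⊕ Z_2."""
    return (l % 2, m % 2)


box = range(-4, 5)
pairs = [(l, m) for l in box for m in box]

# Kernel: exactly the pairs with l and m both even, i.e. (l/2)(2e1) + (m/2)(2e2).
kernel = [(l, m) for l, m in pairs if theta(l, m) == (0, 0)]
assert kernel == [(l, m) for l, m in pairs if l % 2 == 0 and m % 2 == 0]

# Group pairs by their image: one class per coset K, K + e1, K + e2, K + e1 + e2.
classes: dict[tuple[int, int], list[tuple[int, int]]] = {}
for l, m in pairs:
    classes.setdefault(theta(l, m), []).append((l, m))
print(sorted(classes))      # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(classes[(1, 0)][:3])  # [(-3, -4), (-3, -2), (-3, 0)]: l odd, m even
```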

Frequently occurring examples of the direct sum construction are provided by the Chinese remainder theorem. This theorem, which we now discuss, plays an important role in the decomposition of rings and abelian groups. Let $r_n$ denote the congruence class of the integer $r$ in $\mathbb{Z}_n$ for all positive integers $n$. Let $m$ and $n$ be given positive integers and consider the mapping

$$\alpha : \mathbb{Z}_{mn} \to \mathbb{Z}_m \oplus \mathbb{Z}_n \quad\text{defined by}\quad (r_{mn})\alpha = (r_m, r_n) \text{ for all } r_{mn} \in \mathbb{Z}_{mn}.$$

For instance with $m = 5$, $n = 7$ and $r = 24$ we have $(24_{35})\alpha = (4_5, 3_7)$ since $24 \equiv 4 \pmod 5$ and $24 \equiv 3 \pmod 7$. Then $\alpha$ is unambiguously defined (if $r \equiv r' \pmod{mn}$ then $r \equiv r' \pmod m$ and $r \equiv r' \pmod n$) and respects addition since

$$\begin{aligned}(s_{mn} + t_{mn})\alpha &= ((s+t)_{mn})\alpha = ((s+t)_m, (s+t)_n) = (s_m + t_m, s_n + t_n)\\ &= (s_m, s_n) + (t_m, t_n) = (s_{mn})\alpha + (t_{mn})\alpha \quad\text{for all integers } s, t.\end{aligned}$$

The group $\mathbb{Z}_m \oplus \mathbb{Z}_n$ becomes a commutative ring (the direct sum of the rings $\mathbb{Z}_m$ and $\mathbb{Z}_n$) provided multiplication is carried out, like addition, component by component, that is, $(x, y)(x', y') = (xx', yy')$ for all $x, x' \in \mathbb{Z}_m$ and $y, y' \in \mathbb{Z}_n$. Replacing each ‘+’ in the above equations by the product symbol ‘·’ produces

$$(s_{mn} \cdot t_{mn})\alpha = (s_{mn})\alpha \cdot (t_{mn})\alpha \quad\text{for all integers } s, t,$$

showing that $\alpha$ respects multiplication. Also $1_{mn}$ is the 1-element of $\mathbb{Z}_{mn}$ and $(1_{mn})\alpha = (1_m, 1_n)$ is the 1-element of $\mathbb{Z}_m \oplus \mathbb{Z}_n$. Therefore

$\alpha$ is a ring homomorphism,

meaning that $\alpha$ is a mapping of rings which respects addition, multiplication and 1-elements.


Theorem 2.11 (The Chinese remainder theorem)

Let $m$ and $n$ be positive integers with $\gcd\{m,n\} = 1$. Then $\alpha : \mathbb{Z}_{mn} \cong \mathbb{Z}_m \oplus \mathbb{Z}_n$ is a ring isomorphism.

Proof

Using the above theory, $\alpha$ is a ring homomorphism and so it is enough to show that $\alpha$ is bijective. As $\mathbb{Z}_{mn}$ and $\mathbb{Z}_m \oplus \mathbb{Z}_n$ both contain exactly $mn$ elements, it is enough to show that $\alpha$ is surjective. Consider a typical element $(s_m, t_n)$ of $\mathbb{Z}_m \oplus \mathbb{Z}_n$. We may assume $0 \le s < m$ and $0 \le t < n$. Can an integer $r$ be found which leaves remainder $s$ on division by $m$ and remainder $t$ on division by $n$? (Special cases of this problem were solved in ancient China – hence the name of the theorem.) The answer is: Yes! As $\gcd\{m,n\} = 1$ there are integers $a, b$ with $am + bn = 1$. Let $r = atm + bsn$. Then $r \equiv bsn \pmod m$ and $bsn = s - sam \equiv s \pmod m$. So $r \equiv s \pmod m$. Similarly $r \equiv atm \pmod n$ and $atm = t - tbn \equiv t \pmod n$, so $r \equiv t \pmod n$. Hence $r$ leaves remainders $s, t$ on division by $m, n$ respectively. Therefore $(r_{mn})\alpha = (r_m, r_n) = (s_m, t_n)$, showing that $\alpha$ is indeed surjective. □
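The proof is constructive, and the construction is easy to sketch in code. The extended Euclidean algorithm below (a standard way of finding integers $a, b$ with $am + bn = 1$; the function names are ours) supplies $a$ and $b$. The function `crt` then forms $r = atm + bsn$ exactly as in the proof, reduced into the range $0 \le r < mn$.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, x, y) with d = gcd(a, b) = a*x + b*y (extended Euclidean algorithm)."""
    if b == 0:
        return a, 1, 0
    d, x, y = extended_gcd(b, a % b)
    return d, y, x - (a // b) * y


def crt(s: int, t: int, m: int, n: int) -> int:
    """The r with 0 <= r < mn, r ≡ s (mod m), r ≡ t (mod n), built as r = a*t*m + b*s*n."""
    d, a, b = extended_gcd(m, n)
    if d != 1:
        raise ValueError("m and n must be coprime")
    return (a * t * m + b * s * n) % (m * n)


print(extended_gcd(5, 7))  # (1, 3, -2): 3*5 + (-2)*7 = 1
print(crt(4, 3, 5, 7))     # 24, the example (24)α = (4, 3) above

# Surjectivity (hence bijectivity) of α for m = 5, n = 7.
assert {(r % 5, r % 7) for r in range(35)} == {(s, t) for s in range(5) for t in range(7)}
```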

Let $R$ be a ring with 1-element $e$. An element $u$ of $R$ is a unit (invertible element) of $R$ if there is an element $v$ of $R$ with $uv = e = vu$. It is straightforward to verify that the product $uu'$ of units of $R$ is itself a unit of $R$, and together with this product the set of units of $R$ is a multiplicative group $U(R)$. Note that $U(F) = F^*$ for every field $F$, as every non-zero element of $F$ is a unit of $F$. The groups $U(\mathbb{Z}_n)$ are studied in Section 3.3. We now use Theorem 2.11 to determine the order $|U(\mathbb{Z}_n)|$ of $U(\mathbb{Z}_n)$ in terms of the prime factorisation of the positive integer $n$. The reader will know that $\bar r$ is a unit of $\mathbb{Z}_n$ if and only if $\gcd\{r,n\} = 1$. It is convenient to assume (as we may) that $1 \le r \le n$. The reader may also have met the Euler φ-function defined by

$$\varphi(n) = |\{r : 1 \le r \le n,\ \gcd\{r,n\} = 1\}|,$$

that is, $\varphi(n)$ is the number of integers $r$ between 1 and $n$ which are coprime to the positive integer $n$, and so $\varphi(n) = |U(\mathbb{Z}_n)|$. One sees directly that $\varphi(1) = 1$ and $\varphi(p) = p - 1$ for all primes $p$. The closed interval $[1, p^l]$ contains $p^l$ integers and the $p^{l-1}$ multiples of $p$ in this interval are exactly those which are not coprime to $p^l$, since $\gcd\{r, p^l\} \neq 1 \Leftrightarrow p \mid r$; hence $\varphi(p^l) = p^l - p^{l-1}$. In particular $\varphi(7) = 7 - 1 = 6$, $\varphi(8) = 8 - 4 = 4$, $\varphi(9) = 9 - 3 = 6$.

Corollary 2.12

The Euler φ-function is multiplicative, that is, $\varphi(mn) = \varphi(m)\varphi(n)$ where $m, n$ are coprime positive integers. Let $n = p_1^{l_1} p_2^{l_2} \cdots p_k^{l_k}$ where $p_1, p_2, \ldots, p_k$ are different primes. Then

$$\varphi(n) = (p_1^{l_1} - p_1^{l_1-1})(p_2^{l_2} - p_2^{l_2-1}) \cdots (p_k^{l_k} - p_k^{l_k-1}).$$


Proof

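As multiplication in $\mathbb{Z}_m \oplus \mathbb{Z}_n$ is carried out componentwise, $(s_m, t_n)$ is a unit of the ring $\mathbb{Z}_m \oplus \mathbb{Z}_n$ if and only if $s_m$ is a unit of the ring $\mathbb{Z}_m$ and $t_n$ is a unit of the ring $\mathbb{Z}_n$. Therefore $U(\mathbb{Z}_m \oplus \mathbb{Z}_n) = U(\mathbb{Z}_m) \times U(\mathbb{Z}_n)$ where $\times$ denotes the Cartesian product (see Exercises 2.3, Question 4(d)). Comparing sizes of these sets gives $|U(\mathbb{Z}_m \oplus \mathbb{Z}_n)| = |U(\mathbb{Z}_m)||U(\mathbb{Z}_n)| = \varphi(m)\varphi(n)$. Suppose $\gcd\{m,n\} = 1$. As isomorphic rings have isomorphic groups of units, or specifically in our case, $r_{mn}$ is a unit of $\mathbb{Z}_{mn}$ if and only if $(r_{mn})\alpha$ is a unit of $\mathbb{Z}_m \oplus \mathbb{Z}_n$ by Theorem 2.11, we deduce $\varphi(mn) = |U(\mathbb{Z}_{mn})| = |U(\mathbb{Z}_m \oplus \mathbb{Z}_n)|$. So $\varphi(mn) = \varphi(m)\varphi(n)$ where $\gcd\{m,n\} = 1$.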

We use induction on the number $k$ of distinct prime divisors of $n$. As $\varphi(1) = 1$ we take $k > 0$ and assume

$$\varphi(p_2^{l_2} \cdots p_k^{l_k}) = (p_2^{l_2} - p_2^{l_2-1}) \cdots (p_k^{l_k} - p_k^{l_k-1}).$$

As $\gcd\{p_1^{l_1}, p_2^{l_2} \cdots p_k^{l_k}\} = 1$, the multiplicative property of $\varphi$ gives

$$\begin{aligned}\varphi(n) &= \varphi\big(p_1^{l_1}(p_2^{l_2} \cdots p_k^{l_k})\big) = \varphi(p_1^{l_1})\,\varphi(p_2^{l_2} \cdots p_k^{l_k})\\ &= (p_1^{l_1} - p_1^{l_1-1})(p_2^{l_2} - p_2^{l_2-1}) \cdots (p_k^{l_k} - p_k^{l_k-1})\end{aligned}$$

as in Corollary 2.12. By induction, the formula for $\varphi(n)$ is as stated. □

For example $\varphi(500) = \varphi(2^2 5^3) = \varphi(2^2)\varphi(5^3) = (2^2 - 2)(5^3 - 5^2) = 200$.

Let $n$ be a positive integer. Which elements $r_n \in \mathbb{Z}_n$ satisfy $\langle r_n\rangle = \mathbb{Z}_n$? In other words, which elements of the additive abelian group $\mathbb{Z}_n$ have order $n$? We may assume $1 \le r \le n$. As $1_n$ has order $n$ and $r_n = r1_n$, by Lemma 2.7 we see that $r_n$ has order $n/\gcd\{r,n\}$. So $r_n$ has order $n$ if and only if $\gcd\{r,n\} = 1$. Hence

each finite cyclic group of order $n$ has $\varphi(n)$ generators.

For instance $\mathbb{Z}_{10}$ has $\varphi(10) = (2-1)(5-1) = 4$ generators and $\mathbb{Z}_{10} = \langle 1_{10}\rangle = \langle 3_{10}\rangle = \langle 7_{10}\rangle = \langle 9_{10}\rangle$.
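Corollary 2.12 and the count of generators can be compared against the definition of $\varphi$ by brute force. In the sketch below (the function names are ours, and factorisation is by naive trial division, adequate only for small $n$) the three functions should agree for every $n$.

```python
from math import gcd


def phi_by_counting(n: int) -> int:
    """φ(n) = |{r : 1 <= r <= n, gcd(r, n) = 1}|, straight from the definition."""
    return sum(1 for r in range(1, n + 1) if gcd(r, n) == 1)


def phi_by_formula(n: int) -> int:
    """Corollary 2.12: the product of p^l - p^(l-1) over the prime powers p^l exactly dividing n."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            l = 0
            while n % p == 0:
                n //= p
                l += 1
            result *= p**l - p**(l - 1)
        p += 1
    if n > 1:  # one prime factor left over, with exponent 1
        result *= n - 1
    return result


def generator_count(n: int) -> int:
    """Number of r_n in Z_n with order n / gcd(r, n) equal to n (Lemma 2.7)."""
    return sum(1 for r in range(1, n + 1) if n // gcd(r, n) == n)


print(phi_by_formula(500))  # 200
print(generator_count(10))  # 4
assert all(phi_by_counting(k) == phi_by_formula(k) == generator_count(k) for k in range(1, 200))
```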

The direct sum construction can be extended to any finite number $t$ of $\mathbb{Z}$-modules. Let $G_1, G_2, \ldots, G_t$ be $\mathbb{Z}$-modules. Their external direct sum $G_1 \oplus G_2 \oplus \cdots \oplus G_t$ is the $\mathbb{Z}$-module having all ordered $t$-tuples $(g_1, g_2, \ldots, g_t)$, where $g_i \in G_i$ ($1 \le i \le t$), as its elements, addition and integer multiplication being carried out componentwise. So

$$(g_1, g_2, \ldots, g_t) + (g_1', g_2', \ldots, g_t') = (g_1 + g_1', g_2 + g_2', \ldots, g_t + g_t') \quad\text{for } g_i, g_i' \in G_i$$

and $m(g_1, g_2, \ldots, g_t) = (mg_1, mg_2, \ldots, mg_t)$ for $m \in \mathbb{Z}$, $g_i \in G_i$ where $1 \le i \le t$. We now generalise Definition 2.6.

Definition 2.13

Suppose the $\mathbb{Z}$-module $G_i$ is cyclic of isomorphism type $C_{d_i}$ for $1 \le i \le t$. Any $\mathbb{Z}$-module $G$ isomorphic to $G_1 \oplus G_2 \oplus \cdots \oplus G_t$ is said to be of isomorphism type $C_{d_1} \oplus C_{d_2} \oplus \cdots \oplus C_{d_t}$.


For instance the additive group $G$ of the ring $\mathbb{Z}_2 \oplus \mathbb{Z}_3$ has isomorphism type $C_2 \oplus C_3$. By Theorem 2.11 we know that $G$ is cyclic of isomorphism type $C_6$ and so we write $C_2 \oplus C_3 = C_6$, since the isomorphism class of $\mathbb{Z}$-modules of type $C_2 \oplus C_3$ coincides with the isomorphism class of $\mathbb{Z}$-modules of type $C_6$. Also $C_2 \oplus C_3 = C_3 \oplus C_2$ as $G_1 \oplus G_2 \cong G_2 \oplus G_1$ for all $\mathbb{Z}$-modules $G_1$ and $G_2$. More generally for positive integers $m$ and $n$ we have

$$C_m \oplus C_n = C_n \oplus C_m \quad\text{and}\quad C_m \oplus C_n = C_{mn} \text{ in case } \gcd\{m,n\} = 1$$

by Theorem 2.11. We will use these rules in Chapter 3 to manipulate the isomorphism type symbols and show (Theorem 3.4) that every finitely generated $\mathbb{Z}$-module $G$ is of isomorphism type $C_{d_1} \oplus C_{d_2} \oplus \cdots \oplus C_{d_t}$ where the non-negative integers $d_i$ are successive divisors, that is, $d_i \mid d_{i+1}$ for $1 \le i < t$.
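The rule $C_m \oplus C_n = C_{mn}$ for coprime $m, n$ can be explored by brute force: $\mathbb{Z}_m \oplus \mathbb{Z}_n$ has $mn$ elements, so it is cyclic exactly when some element has order $mn$. The sketch below (the names are ours) uses the standard fact that the order of $(x, y)$ is the least common multiple of the orders of its components. Small cases such as $m = n = 2$, the Klein 4-group, show that the coprimality condition cannot simply be dropped.

```python
from math import gcd


def max_order_in_sum(m: int, n: int) -> int:
    """Largest order of an element (x, y) of Z_m ⊕ Z_n, found by brute force.
    The order of (x, y) is the lcm of m/gcd(x, m) and n/gcd(y, n)."""
    best = 1
    for x in range(m):
        for y in range(n):
            a, b = m // gcd(x, m), n // gcd(y, n)
            best = max(best, a * b // gcd(a, b))
    return best


def is_cyclic_sum(m: int, n: int) -> bool:
    """Z_m ⊕ Z_n has mn elements, so it is cyclic iff some element has order mn."""
    return max_order_in_sum(m, n) == m * n


print(is_cyclic_sum(2, 3))  # True:  C2 ⊕ C3 = C6
print(is_cyclic_sum(2, 2))  # False: the Klein 4-group
print(is_cyclic_sum(2, 4))  # False: no element of order 8
```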

Next we generalise the Chinese remainder theorem. Using Theorem 2.11 and induction on $k$ we obtain the ring isomorphism

$$\alpha : \mathbb{Z}_n \cong \mathbb{Z}_{q_1} \oplus \mathbb{Z}_{q_2} \oplus \cdots \oplus \mathbb{Z}_{q_k} \quad\text{given by}\quad (r_n)\alpha = (r_{q_1}, r_{q_2}, \ldots, r_{q_k})$$

for all $r \in \mathbb{Z}$, where $n = q_1q_2\cdots q_k$ and $q_1, q_2, \ldots, q_k$ are powers of distinct primes $p_1, p_2, \ldots, p_k$. For example consider $\alpha : \mathbb{Z}_{60} \cong \mathbb{Z}_4 \oplus \mathbb{Z}_3 \oplus \mathbb{Z}_5$. As 11 leaves remainders 3, 2, 1 on division by 4, 3, 5 respectively we see (suppressing the subscripts) that $(11)\alpha = (3,2,1)$. Doubling gives $(22)\alpha = (6,4,2) = (2,1,2)$ and negating gives $(49)\alpha = (-11)\alpha = (-3,-2,-1) = (1,1,4)$. Squaring gives $((22)^2)\alpha = (2,1,2)^2 = (2^2, 1^2, 2^2) = (0,1,4) = (4)\alpha$ and so $(22)^2 = 4$ in $\mathbb{Z}_{60}$. Similarly $((49)^2)\alpha = (1,1,1) = (1)\alpha$, showing that 49 is a self-inverse element of $\mathbb{Z}_{60}$ as $(49)^2 = 1$, that is, $(49)^{-1} = 49$ in $\mathbb{Z}_{60}$. It is an amazing fact that the 60 triples in $\mathbb{Z}_4 \oplus \mathbb{Z}_3 \oplus \mathbb{Z}_5$ add and multiply in exactly the same way as the 60 elements of $\mathbb{Z}_{60}$.
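A short script can confirm the ‘amazing fact’ for $\mathbb{Z}_{60}$ exhaustively. In this sketch (the names `MODULI` and `alpha` are ours) `alpha` sends $r$ to its triple of remainders, and the loops check that $\alpha$ turns addition and multiplication mod 60 into componentwise operations mod 4, 3 and 5.

```python
MODULI = (4, 3, 5)  # pairwise coprime prime powers with product 60


def alpha(r: int) -> tuple[int, ...]:
    """(r)α = (r mod 4, r mod 3, r mod 5) for the map Z_60 -> Z_4 ⊕ Z_3 ⊕ Z_5."""
    return tuple(r % q for q in MODULI)


print(alpha(11), alpha(22), alpha(49))  # (3, 2, 1) (2, 1, 2) (1, 1, 4)

# α respects addition and multiplication, computed componentwise on the right.
for s in range(60):
    for t in range(60):
        a, b = alpha(s), alpha(t)
        assert alpha((s + t) % 60) == tuple((x + y) % q for x, y, q in zip(a, b, MODULI))
        assert alpha((s * t) % 60) == tuple((x * y) % q for x, y, q in zip(a, b, MODULI))

assert len({alpha(r) for r in range(60)}) == 60  # α is a bijection
print(49 * 49 % 60)  # 1: 49 is its own inverse in Z_60
```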

We now look at the direct sum construction from the opposite point of view. Under which circumstances is the $\mathbb{Z}$-module $G$ isomorphic to a direct sum $G_1 \oplus G_2$ of $\mathbb{Z}$-modules? We will see shortly that the submodules of $G$ hold the answer to this question. Let $0_i$ denote the zero element of the $\mathbb{Z}$-module $G_i$ for $i = 1,2$. Then $G_1 \oplus G_2$ has submodules $G_1' = \{(g_1, 0_2) : g_1 \in G_1\}$ and $G_2' = \{(0_1, g_2) : g_2 \in G_2\}$, which are isomorphic to $G_1$ and $G_2$ respectively. Also each element $(g_1, g_2)$ of $G_1 \oplus G_2$ is uniquely expressible as a sum $g_1' + g_2'$, where $g_1' \in G_1'$, $g_2' \in G_2'$, since $(g_1, g_2) = g_1' + g_2'$ if and only if $g_1' = (g_1, 0_2)$, $g_2' = (0_1, g_2)$. Consider an isomorphism $\alpha : G \cong G_1 \oplus G_2$ and let $H_i = \{h_i \in G : (h_i)\alpha \in G_i'\}$ for $i = 1,2$. So $H_1$ and $H_2$ are the submodules of $G$ which correspond under $\alpha$ to $G_1'$ and $G_2'$. We write

$G = H_1 \oplus H_2$ and call $G$ the internal direct sum of its submodules $H_1$ and $H_2$

as each element $g$ of $G$ is uniquely expressible as $g = h_1 + h_2$ where $h_1 \in H_1$ and $h_2 \in H_2$, since $\alpha$ is an isomorphism.


For example $\alpha : \mathbb{Z}_6 \to \mathbb{Z}_2 \oplus \mathbb{Z}_3$ as above leads to the submodules $H_1 = \{0, 3\}$ and $H_2 = \{0, 2, 4\}$. The six elements of $\mathbb{Z}_6$ are

$$0 = 0 + 0,\quad 1 = 3 + 4,\quad 2 = 0 + 2,\quad 3 = 3 + 0,\quad 4 = 0 + 4,\quad 5 = 3 + 2$$

and they coincide as shown with the six elements $h_1 + h_2$ where $h_1 \in H_1$, $h_2 \in H_2$. So $\mathbb{Z}_6 = H_1 \oplus H_2$ is the internal direct sum of its submodules $H_1$ and $H_2$. Of course it is equally true that $\mathbb{Z}_6 = H_2 \oplus H_1$. More generally the order in which the $H_i$ (the summands) appear in any internal direct sum is not important. The Klein 4-group $G = \{0, u, v, w\}$ can be decomposed (expressed as an internal direct sum) as

$$G = \langle u\rangle \oplus \langle v\rangle = \langle v\rangle \oplus \langle w\rangle = \langle w\rangle \oplus \langle u\rangle,$$

which tells us (three times!) that $G$, being the internal direct sum of two cyclic subgroups of order 2, has isomorphism type $C_2 \oplus C_2$.

The argument of the paragraph above can be extended. Let $G$ be a $\mathbb{Z}$-module with submodules $H_1, H_2, \ldots, H_t$ such that for each element $g$ in $G$ there are unique elements $h_i$ in $H_i$ ($1 \le i \le t$) with $g = h_1 + h_2 + \cdots + h_t$. It is straightforward to check that $\alpha : G \cong H_1 \oplus H_2 \oplus \cdots \oplus H_t$, defined by $(g)\alpha = (h_1, h_2, \ldots, h_t)$, is an isomorphism; so $G$ is isomorphic to the external direct sum of the $\mathbb{Z}$-modules $H_1, H_2, \ldots, H_t$. Generalising the above paragraph it is usual to write $G = H_1 \oplus H_2 \oplus \cdots \oplus H_t$ and call $G$ the internal direct sum of its submodules $H_1, H_2, \ldots, H_t$.

Confused? We’ve shown that the internal direct sum of the Hi, when it exists, is isomorphic to the external direct sum of the Hi. Nevertheless we’ll usually tell the reader, when it occurs in the theory ahead, which version of direct sum we have in mind.

As we have already seen, Z6 = 〈3〉 ⊕ 〈4〉 as Z6 is the internal direct sum of H1 = 〈3〉 and H2 = 〈4〉; note that (3)α = (1, 0) and (4)α = (0, 1). In the same way let us look at α : Z60 ≅ Z4 ⊕ Z3 ⊕ Z5. How can we quickly find r in Z60 with (r)α = (1, 0, 0) and 1 ≤ r ≤ 60? As r is divisible by 3 and 5 there are only four possibilities: 15, 30, 45 and 60. So r = 45 as r ≡ 1 (mod 4). The reader can check that (40)α = (0, 1, 0) and (36)α = (0, 0, 1). So Z60 = 〈45〉 ⊕ 〈40〉 ⊕ 〈36〉 shows how Z60 decomposes as an internal direct sum.
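The search for r above can be mechanised. This sketch (the function name `crt_unit_vectors` is hypothetical) assumes the moduli are pairwise coprime and uses the observation that r must be a multiple of N/ni: multiplying N/ni by its inverse modulo ni gives the element that reduces to 1 modulo ni and to 0 modulo every other modulus. It returns representatives in the range 0 ≤ r < N rather than 1 ≤ r ≤ N; for the examples here the two conventions agree.

```python
from math import prod

def crt_unit_vectors(moduli):
    """For pairwise coprime moduli n_1, ..., n_t with N = n_1 * ... * n_t,
    return [r_1, ..., r_t] where r_i in Z_N satisfies r_i ≡ 1 (mod n_i)
    and r_i ≡ 0 (mod n_j) for every j != i."""
    N = prod(moduli)
    result = []
    for n_i in moduli:
        m = N // n_i                 # r_i must be a multiple of m = N / n_i
        # pow(m, -1, n_i) is the inverse of m modulo n_i (Python 3.8+)
        result.append((m * pow(m, -1, n_i)) % N)
    return result

print(crt_unit_vectors([4, 3, 5]))   # [45, 40, 36]
print(crt_unit_vectors([2, 3]))      # [3, 4]
```

The second call recovers the generators 3 and 4 of the decomposition Z6 = 〈3〉 ⊕ 〈4〉.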

Let H1, H2, . . . , Ht be submodules of the Z-module G. It is straightforward to verify that their sum

H1 + H2 + · · · + Ht = {h1 + h2 + · · · + ht : hi ∈ Hi for all 1 ≤ i ≤ t}

is a submodule of G. For example take G = Z60, H1 = 〈6〉, H2 = 〈10〉, H3 = 〈15〉. As 1 = 6 + 10 − 15, and so r = 6r + 10r + (−15r) ∈ H1 + H2 + H3 for r ∈ Z60, we see G = H1 + H2 + H3. However 1 = 6 − 2 × 10 + 15, showing that 1 can be expressed in at least two ways as a sum h1 + h2 + h3 with hi ∈ Hi. Conclusion: G is not the internal direct sum of H1, H2 and H3, as there’s no such thing! The uniqueness condition of the internal direct sum is violated. The next lemma tells us the best way of checking whether or not a sum of submodules is direct.

Definition 2.14

The submodules Hi (1 ≤ i ≤ t) of the Z-module G are called independent if the equation h1 + h2 + · · · + ht = 0, where hi ∈ Hi for all i with 1 ≤ i ≤ t, holds only in the case h1 = h2 = · · · = ht = 0.

The reader should note the similarity between Definition 2.14 and linear independence of vectors. We show next that the internal direct sum of independent submodules always exists.

Lemma 2.15

Let H1, H2, . . . , Ht be independent submodules of the Z-module G such that G = H1 + H2 + · · · + Ht. Then G = H1 ⊕ H2 ⊕ · · · ⊕ Ht.

Proof

Consider g ∈ G. As G = H1 + H2 + · · · + Ht there are hi ∈ Hi (1 ≤ i ≤ t) with g = h1 + h2 + · · · + ht. Suppose g = h′1 + h′2 + · · · + h′t where h′i ∈ Hi (1 ≤ i ≤ t). Subtracting produces

$$0 = g - g = (h_1 - h_1') + (h_2 - h_2') + \cdots + (h_t - h_t').$$

As hi − h′i ∈ Hi we deduce hi − h′i = 0 (1 ≤ i ≤ t) using the independence of H1, H2, . . . , Ht. Hence hi = h′i for 1 ≤ i ≤ t, showing that g is uniquely expressible as a sum of elements, one from each Hi. Therefore G = H1 ⊕ H2 ⊕ · · · ⊕ Ht. □

Remember that the external direct sum G1 ⊕ G2 makes sense for all Z-modules G1 and G2. But the internal direct sum of submodules exists only in the special case of independent submodules detailed above. Nevertheless we shall see in Chapter 3 that this special case frequently occurs.
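For finite submodules of Zn, Definition 2.14 and the hypothesis G = H1 + H2 + · · · + Ht of Lemma 2.15 can both be tested by exhaustive search. The sketch below is an illustration only (the helper names are ours, not the book’s). It confirms that 〈45〉, 〈40〉, 〈36〉 are independent submodules of Z60 with sum Z60, whereas 〈6〉, 〈10〉, 〈15〉 have sum Z60 but are not independent.

```python
from itertools import product

def cyclic_submodule(a, n):
    """The submodule <a> = {0, a, 2a, ...} of Z_n, as a sorted list."""
    return sorted({(k * a) % n for k in range(n)})

def independent(n, subgroups):
    """Definition 2.14: h1 + ... + ht = 0 forces h1 = ... = ht = 0."""
    for hs in product(*subgroups):
        if sum(hs) % n == 0 and any(h != 0 for h in hs):
            return False
    return True

def is_sum_everything(n, subgroups):
    """Is H1 + ... + Ht the whole of Z_n?"""
    sums = {sum(hs) % n for hs in product(*subgroups)}
    return len(sums) == n

n = 60
good = [cyclic_submodule(a, n) for a in (45, 40, 36)]
bad = [cyclic_submodule(a, n) for a in (6, 10, 15)]
print(independent(n, good), is_sum_everything(n, good))  # True True
print(independent(n, bad), is_sum_everything(n, bad))    # False True
```

For the second triple the search finds, for instance, 0 + 30 + 30 = 0 in Z60 with 30 ∈ 〈10〉 and 30 ∈ 〈15〉, which is the difference of the two expressions for 1 given earlier.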

EXERCISES 2.2

1. (a) Let G = Z8 and K = {0, 4}. List the 4 cosets of K in G. Show that G/K is cyclic and state its isomorphism type.

(b) Let G = Z12 and K = {0, 3, 6, 9} = 〈3〉. List the 3 cosets of K in G. Show that K + 1 generates G/K. State the isomorphism type of G/K.


(c) Let G = Z24 and K = 〈18〉. Show |K| = 4. What is the order of G/K? Show that G/K is cyclic and state its isomorphism type.

(d) Write G = Zn, K = 〈m〉 where m ∈ Zn. Use Lemma 2.7 to determine |K| and |G/K|. Show that G/K is cyclic and state its isomorphism type.

2. (a) Let d be a positive divisor of the positive integer n. Use Lemma 2.2 to show that Zn has a unique subgroup K of index d.

(b) Let d be a positive integer. Use Theorem 1.15 to show that Z has a unique subgroup K of index d. Does every subgroup of Z have finite index?

(c) Let G be a cyclic group with subgroup K. Show that G/K is cyclic.
Hint: Use a generator of G to find a generator of G/K.

3. (a) Let Q denote the additive group of rational numbers. In the group Q/Z of rationals mod one find the orders of Z + 1/3 and Z + 5/8. Show that every element of Q/Z has finite order.

(b) Let K be a subgroup of Q/Z and suppose Z + m/n, Z + m′/n′ belong to K where gcd{m, n} = 1, gcd{m′, n′} = 1. Use integers a, b with am + bn = 1 to show that Z + 1/n ∈ K. Show also Z + d/nn′ ∈ K where d = gcd{n, n′}. Suppose K is finite. Show that K is cyclic.
Hint: Consider Z + 1/n ∈ K with n maximum.

(c) Let n be a positive integer. Show that Q/Z has a unique subgroup of order n.
Hint: Use (b) above.

(d) Let K = {Z + l/2^s : l, s ∈ Z, s ≥ 0}. Show that K is a subgroup of Q/Z having infinite order. List the finite subgroups of K. Show that K has a unique infinite subgroup. (The group K is denoted by Z(2^∞).)

4. (a) Verify that Z3 ⊕ Z4 has generator g = (1₃, 1₄) by listing the elements g, 2g, 3g, 4g, . . . in the form (s, t) where 0 ≤ s < 3, 0 ≤ t < 4. State its isomorphism type.

(b) Find the orders of the 8 non-zero elements of G = Z3 ⊕ Z3. Specify generators of the 4 subgroups of order 3. Express G in six ways as the internal direct sum of subgroups of order 3. State the isomorphism type of G.
Hint: G is a vector space over Z3 and the subgroups are subspaces.

(c) Let the elements gi of the Z-module Gi have finite order ni (i = 1, 2). Show that the element (g1, g2) of the external direct sum G1 ⊕ G2 has order l = lcm{n1, n2} = n1n2/gcd{n1, n2}.
Hint: Start by showing that (g1, g2) has finite order n say, where n | l. Then show ni | n.

(d) Let m and n be coprime positive integers. Use Lemma 2.7 and part (c) above to show that (s, t) in Zm ⊕ Zn has order mn if and only if gcd{s, m} = 1, gcd{t, n} = 1. How many generators does the cyclic group Z7 ⊕ Z8 have?

(e) Let g and h be elements of an additive abelian group having orders m and n where gcd{m, n} = 1. Show that g + h has order mn. The additive abelian group G has a cyclic subgroup K of order m such that G/K is cyclic of order n where gcd{m, n} = 1. Show that G is cyclic of order mn.
Hint: Let K + h0 generate G/K. Deduce from Exercises 2.1, Question 4(b) that n is a divisor of the order s of h0. Now use g and h where 〈g〉 = K and h = (s/n)h0.

(f) Let G1 and G2 be additive abelian groups. Show that their external direct sum G1 ⊕ G2 is an additive abelian group. Show that G1 ⊕ G2 and G2 ⊕ G1 are isomorphic.

5. (a) Find r in Z143 such that r leaves remainders 7 and 6 on division by 11 and 13 respectively.

(b) Find the 4 elements x ∈ Z143 satisfying x² = x.
Hint: First solve x² = x for x ∈ Z11 and secondly for x ∈ Z13. Then use the Chinese remainder theorem.

(c) Find the 4 elements x ∈ Z143 satisfying x² = 1, and the 9 elements x ∈ Z143 satisfying x³ = x.
Hint: Use the method of (b) above.

(d) How many Z-linear mappings θ : Z3 ⊕ Z5 → Z15 are there? How many of these mappings are (i) group isomorphisms, (ii) ring isomorphisms?
Hint: Consider r ∈ Z15 where (1₃, 1₅)θ = r.

6. (a) Let m1, m2, . . . , mt be integers. Show

Z = 〈m1〉 + 〈m2〉 + · · · + 〈mt〉 ⇔ gcd{m1, m2, . . . , mt} = 1.

Is Z = 〈15〉 + 〈36〉 + 〈243〉? Is Z = 〈15〉 + 〈36〉 + 〈80〉?

(b) Suppose Z = H1 ⊕ H2 (internal direct sum of subgroups). Use Theorem 1.15 to show that Z is indecomposable, that is, either H1 or H2 is trivial.

(c) Let H1 and H2 be submodules of the Z-module G such that H1 ∩ H2 = {0}. Show that H1, H2 are independent. More generally, let H1, H2, . . . , Ht−1, Ht be submodules of G such that H1, H2, . . . , Ht−1 are independent and (H1 + H2 + · · · + Ht−1) ∩ Ht = {0}. Show that H1, H2, . . . , Ht−1, Ht are independent. What is the order of H1 ⊕ H2 ⊕ · · · ⊕ Ht given that each Hi is finite?

(d) Write G = Z3 ⊕ Z9 (external direct sum of abelian groups). Use Question 4(d) above to show that G has 18 elements of order 9 and 8 elements of order 3. Deduce that G has 3 cyclic subgroups of order 9 and 4 cyclic subgroups of order 3. (Remember that a cyclic group of order n has φ(n) generators.) Specify generators of these 7 cyclic subgroups of G. Find the number of pairs of cyclic subgroups H1, H2 of G with |H1| = 3, |H2| = 9 such that G = H1 ⊕ H2.
Hint: Choose H2 first and then H1 with H1 ∩ H2 = {0}.

(e) Let H1, H2, . . . , Ht be independent submodules of a Z-module G and let Ki be a submodule of Hi for 1 ≤ i ≤ t. Show that K1, K2, . . . , Kt are independent. Write

H = H1 ⊕ H2 ⊕ · · · ⊕ Ht and K = K1 ⊕ K2 ⊕ · · · ⊕ Kt

(internal direct sums). Show

H/K ≅ (H1/K1) ⊕ (H2/K2) ⊕ · · · ⊕ (Ht/Kt)

(external direct sum).
Hint: Consider α defined by

(K + h)α = (K1 + h1, K2 + h2, . . . , Kt + ht)

where h = h1 + h2 + · · · + ht, hi ∈ Hi for 1 ≤ i ≤ t. Show first that α is unambiguously defined.

2.3 The First Isomorphism Theorem and Free Modules

In this section we introduce the last two topics required for our onslaught on f.g. abelian groups. First we explain how each homomorphism of abelian groups gives rise to an isomorphism; this is the first isomorphism theorem and it plays a vital role in expressing every f.g. abelian group as a quotient group Z^t/K, both Z^t (the external direct sum of t copies of Z) and its subgroup K being free Z-modules. Secondly we discuss bases of free modules. Some of the theorems are analogous to those familiar to the reader in the context of finite-dimensional vector spaces: it’s nice to know that two bases of the same free module are guaranteed to have the same number of elements (this number is called the rank of the free module and is analogous to the dimension of a vector space). Also the rows of invertible t × t matrices over Z are, as one might expect, precisely the Z-bases of Z^t. So far so good, but the analogy has its shortcomings. For example only certain Z-independent subsets of Z^t can be extended to Z-bases of Z^t (see Exercises 1.3, Question 5(c)). Dually, there are subsets of Z^t which generate Z^t but which don’t contain a Z-basis of Z^t; in fact {2, 3} is such a subset of Z = Z^1 as 〈2, 3〉 = Z, 〈2〉 ≠ Z, 〈3〉 ≠ Z. The message is: take nothing for granted!


Let G and G′ be Z-modules and let θ : G → G′ be a Z-linear mapping. As we’ve seen in previous discussions, there are two important submodules associated with θ. The first is the kernel of θ and consists of those elements of G which θ maps to the zero element 0′ of G′. Therefore

ker θ = {g ∈ G : (g)θ = 0′}.

It is routine to show that ker θ is a submodule of G.

The second is the image of θ and consists of those elements of G′ which are images by θ of elements in G. Therefore

im θ = {(g)θ : g ∈ G}.

Again it is routine to show that im θ is a submodule of G′ (see Exercises 2.3, Question 1(a)).

Next we show how θ gives rise to an isomorphism θ̄.

Theorem 2.16 (The first isomorphism theorem for Z-modules)

Let G and G′ be Z-modules and let θ : G → G′ be a Z-linear mapping. Write K = ker θ. Then θ̄, defined by (K + g)θ̄ = (g)θ for all g ∈ G, is an isomorphism

θ̄ : G/K ≅ im θ.

Proof

All the elements in the coset K + g are mapped by θ to (g)θ because (k + g)θ = (k)θ + (g)θ = 0′ + (g)θ = (g)θ for all k ∈ K. So the above definition of θ̄ makes sense and produces the mapping θ̄ : G/K → im θ. Suppose (g1)θ = (g2)θ for g1, g2 ∈ G. Then (g1 − g2)θ = (g1)θ − (g2)θ = 0′, showing that g1 − g2 = k ∈ K, that is, g1 = k + g2, and so g1 and g2 belong to the same coset of ker θ in G. Therefore θ̄ has a different effect on the elements of different cosets; in other words, θ̄ is injective. As θ is additive, so also is θ̄ because

$$\big((K + g_1) + (K + g_2)\big)\bar\theta = \big(K + (g_1 + g_2)\big)\bar\theta = (g_1 + g_2)\theta = (g_1)\theta + (g_2)\theta = (K + g_1)\bar\theta + (K + g_2)\bar\theta$$

for all g1, g2 ∈ G. Finally im θ̄ = im θ and so θ̄ is surjective. Therefore θ̄ is an isomorphism, being bijective and additive. □

The isomorphism θ̄ is said to be induced by the homomorphism θ. So every homomorphism θ : G → G′ induces (gives rise to) an isomorphism θ̄ as in Theorem 2.16 between the quotient group G/ker θ and the subgroup im θ of G′.
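Theorem 2.16 can be watched in action on a small example. The map θ : Z12 → Z8, (x)θ = 2x, is our own choice, not one from the text; it is well defined and Z-linear because 2 × 12 ≡ 0 (mod 8). The sketch computes K = ker θ and im θ, checks that θ is constant on each coset K + g (so θ̄ makes sense), and checks that θ̄ is a bijection from the cosets onto im θ.

```python
def theta(x):
    """The Z-linear map Z_12 -> Z_8 given by x -> 2x (an illustrative choice)."""
    return (2 * x) % 8

G = range(12)
K = [g for g in G if theta(g) == 0]          # ker θ
image = sorted({theta(g) for g in G})        # im θ

# The cosets K + g of K in G, each stored as a frozenset.
cosets = {frozenset((k + g) % 12 for k in K) for g in G}

# θ̄(K + g) = (g)θ is well defined: θ is constant on each coset.
theta_bar = {}
for c in cosets:
    values = {theta(x) for x in c}
    assert len(values) == 1
    theta_bar[c] = values.pop()

# Distinct cosets have distinct images and every element of im θ is hit,
# so θ̄ : G/K -> im θ is a bijection.
assert sorted(theta_bar.values()) == image
print(K, image, len(cosets))                 # [0, 4, 8] [0, 2, 4, 6] 4
```

In particular |G/K| = 12/3 = 4 = |im θ|, as the theorem predicts.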


We’ve met particular cases of Theorem 2.16 already in our discussion of cyclic groups. Let’s briefly recapitulate Theorem 2.5. Suppose that G is a cyclic group with generator g and let θ : Z → G be the Z-linear mapping defined by (m)θ = mg for all m ∈ Z. Then G = 〈g〉 = im θ and K = ker θ = 〈n〉 is the order ideal of g, where the non-negative integer n is unique. Applying Theorem 2.16 we obtain the isomorphism θ̄ : Z/〈n〉 ≅ G where (m̄)θ̄ = (K + m)θ̄ = (m)θ = mg for all m ∈ Z. Finally Cn is the isomorphism type of G. So the isomorphism types Cn of cyclic groups G correspond bijectively to the non-negative integers n. From the classification point of view this is all there is to know about cyclic groups!

Applying Theorem 2.16 to the natural homomorphism η : Z → Zn produces the isomorphism η̄ : Z/〈n〉 ≅ Zn given by (m̄)η̄ = m̄ for all m ∈ Z, as ker η = 〈n〉. So in fact Z/〈n〉 = Zn and η̄ is the identity mapping. More generally let K be a submodule of the Z-module G. Remember that the natural homomorphism η : G → G/K is defined by (g)η = ḡ = K + g for all g ∈ G. Also remember that the 0-element of G/K is the coset K + 0 = K. Therefore

ker η = {g ∈ G : (g)η = K} = {g ∈ G : K + g = K} = {g ∈ G : g ∈ K} = K.

We’ve shown that the natural homomorphism η : G → G/K has K as its kernel. A typical element of G/K is K + g = (g)η and so η is surjective, that is, im η = G/K. We’re ready to apply Theorem 2.16 to η and the outcome is something of an anticlimax, because η̄ : G/K ≅ G/K is nothing more than the identity mapping as (ḡ)η̄ = ḡ for all g ∈ G.

The mapping θ : Z ⊕ Z → Z, defined by (l, m)θ = l − m for all l, m ∈ Z, is additive and surjective. In this case im θ = Z and ker θ = {(l, m) : (l, m)θ = l − m = 0} = {(l, l) : l ∈ Z} = 〈(1, 1)〉. From Theorem 2.16 we conclude that (Z ⊕ Z)/〈(1, 1)〉 is an infinite cyclic group as θ̄ : (Z ⊕ Z)/〈(1, 1)〉 ≅ Z. Each coset of 〈(1, 1)〉 in Z ⊕ Z can be expressed as 〈(1, 1)〉 + (l, 0) for a unique integer l, and (〈(1, 1)〉 + (l, 0))θ̄ = l − 0 = l.

Isomorphisms occur between abelian groups in additive notation and abelian groups in multiplicative notation. Let C* denote the multiplicative group of non-zero complex numbers and let θ : R → C* be the mapping defined by (x)θ = cos 2πx + i sin 2πx for all x ∈ R. Therefore (x)θ is the complex number of modulus 1 and argument 2πx. The reader will certainly know that multiplication of complex numbers is carried out by multiplying moduli and adding arguments, and so

(x + x′)θ = (x)θ · (x′)θ for all x, x′ ∈ R.

Therefore θ is a homomorphism from the additive group R to the multiplicative group C*. In this context ker θ consists of the real numbers x which θ maps to the identity element 1 of C*. Now (x)θ = 1 if and only if x is a whole number, that is, ker θ = Z. From Theorem 2.16 we deduce that θ̄ : R/Z ≅ im θ, and so R/Z (the reals modulo 1) is isomorphic to the group im θ of complex numbers of modulus 1.
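A floating-point sketch of the homomorphism θ : R → C* above, using Python’s `cmath.exp` to compute e^{2πix} = cos 2πx + i sin 2πx. Numerical checks like these can only illustrate, not prove, that θ is a homomorphism with Z ⊆ ker θ; the comparisons use a small tolerance (an arbitrary choice of ours) to absorb rounding error.

```python
import cmath
import math

def theta(x):
    """(x)θ = cos 2πx + i sin 2πx, computed as e^{2πix}."""
    return cmath.exp(2j * math.pi * x)

# Homomorphism property (x + x')θ = (x)θ · (x')θ, up to rounding error.
for x, y in [(0.25, 0.5), (1.3, -2.7), (0.1, 0.9)]:
    assert cmath.isclose(theta(x + y), theta(x) * theta(y), abs_tol=1e-12)

# Integers lie in the kernel: (n)θ = 1.
assert all(cmath.isclose(theta(n), 1, abs_tol=1e-12) for n in range(-5, 6))

# x and x + 5 lie in the same coset of Z and have the same image,
# which is why θ̄(Z + x) = (x)θ is well defined on R/Z.
print(abs(theta(0.3) - theta(5.3)) < 1e-12)   # True
```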


Let G be an abelian group. Any group isomorphic to a quotient group G/K, where K is a subgroup of G, is called a homomorphic image of G. The reason for this terminology is as follows. Let θ : G → G′ be a homomorphism from G to an abelian group G′. Then (G)θ = im θ ≅ G/K where K = ker θ by Theorem 2.16. So every homomorphic image (G)θ of G is isomorphic to a quotient group G/K. On the other hand, every quotient group G/K is a homomorphic image of G since G/K = (G)η = im η where η : G → G/K is the natural homomorphism. The preceding paragraph shows that the multiplicative group of complex numbers of modulus 1 is a homomorphic image of the additive group of real numbers.

We next generalise the discussion following Lemma 2.2 by showing that every Z-linear mapping gives rise to a bijective correspondence between two sets of submodules.

Theorem 2.17

Let G and G′ be Z-modules and let θ : G → G′ be a Z-linear mapping. Let L be the set of submodules H of G with ker θ ⊆ H. Let L′ be the set of submodules H′ of G′ with H′ ⊆ im θ. Then (H)θ = {(h)θ : h ∈ H} is in L′ for all H in L. The mapping H → (H)θ is a bijection from L to L′ and satisfies

H1 ⊆ H2 ⇔ (H1)θ ⊆ (H2)θ for all H1, H2 ∈ L.

Proof

For each submodule H of G it is routine to verify that (H)θ is a submodule of im θ, and so (H)θ belongs to L′ for H in L. For each submodule H′ of im θ let (H′)ϕ = {h ∈ G : (h)θ ∈ H′}. Again it is routine to verify that (H′)ϕ is a submodule of G. As the zero 0′ of G′ belongs to H′ we see that ker θ ⊆ (H′)ϕ, that is, (H′)ϕ belongs to L for all H′ in L′. The proof is completed by showing that the mapping L → L′ given by H → (H)θ for all H ∈ L and the mapping L′ → L given by H′ → (H′)ϕ for all H′ ∈ L′ are inverses of each other. (The reader is reminded that only bijective mappings have inverses, and often the best way (as here) of showing that a mapping is a bijection amounts to ‘conjuring up’ another mapping which turns out to be its inverse.) Notice H ⊆ (H)θϕ as (h)θ ∈ (H)θ for all h ∈ H. Now consider g ∈ (H)θϕ = ((H)θ)ϕ. Then (g)θ ∈ (H)θ. So (g)θ = (h)θ for some h ∈ H. However g − h = k ∈ ker θ since (g − h)θ = (g)θ − (h)θ = 0′. So g = h + k ∈ H since ker θ ⊆ H and H is closed under addition. So (H)θϕ ⊆ H. Therefore H = (H)θϕ for all H in L.

In a similar way (H′)ϕθ = ((H′)ϕ)θ ⊆ H′ since (g)θ ∈ H′ for all g ∈ (H′)ϕ. Let h′ ∈ H′. Then h′ = (g)θ for some g ∈ G since H′ ⊆ im θ. But (g)θ ∈ H′ means g ∈ (H′)ϕ and so h′ = (g)θ ∈ ((H′)ϕ)θ = (H′)ϕθ. We’ve shown H′ ⊆ (H′)ϕθ and so H′ = (H′)ϕθ for all H′ in L′. The mapping L → L′, in which H → (H)θ, has an inverse and so this mapping is bijective. Finally it’s straightforward to show that H1 ⊆ H2 ⇒ (H1)θ ⊆ (H2)θ for H1, H2 ∈ L. Now suppose (H1)θ ⊆ (H2)θ for H1, H2 ∈ L. Then H1 = (H1)θϕ ⊆ (H2)θϕ = H2 and therefore

H1 ⊆ H2 ⇔ (H1)θ ⊆ (H2)θ. □

Each pair of submodules H1 and H2 in L gives rise to submodules H1 ∩ H2 and H1 + H2 in L. The set L, partially ordered by inclusion, is therefore a lattice, as is L′. We shall not have much to say about lattices per se, but it is often illuminating to draw their diagrams as below.

We return to the Z-linear mapping θ : Z ⊕ Z → Z2 ⊕ Z2 mentioned before Theorem 2.11. So (l, m)θ = lu + mv = (l̄, m̄) ∈ Z2 ⊕ Z2 for all l, m ∈ Z, where u = (1, 0), v = (0, 1) ∈ Z2 ⊕ Z2. The Klein 4-group Z2 ⊕ Z2 = 〈u, v〉 = im θ has five subgroups {0}, 〈u〉, 〈v〉, 〈u + v〉, 〈u, v〉. These are the subgroups H′ of Theorem 2.17 shown in their lattice diagram:

[Lattice diagram: {0} at the bottom; 〈u〉, 〈v〉, 〈u + v〉 in the middle, each containing {0}; 〈u, v〉 at the top, containing all three.]

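The five corresponding subgroups H = (H′)ϕ = {(l, m) ∈ Z ⊕ Z : (l̄, m̄) ∈ H′} of Z ⊕ Z are 〈2e1, 2e2〉, 〈e1, 2e2〉, 〈2e1, e2〉, 〈e1 + e2, 2e2〉, 〈e1, e2〉 respectively and by Theorem 2.17 they fit together in the same way:

[Lattice diagram of the same shape: 〈2e1, 2e2〉 at the bottom; 〈e1, 2e2〉, 〈2e1, e2〉, 〈e1 + e2, 2e2〉 in the middle; 〈e1, e2〉 at the top.]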

Notice that 〈e1 + e2, 2e2〉 = 〈e1 + e2, 2e1〉 = {(l, m) : parity l = parity m}. Each of these subgroups H has a Z-basis as shown, that is, each element of H is uniquely expressible as an integer linear combination of the Z-basis elements, which themselves belong to H. We show in Theorem 3.1 that all subgroups H of Z ⊕ Z have Z-bases. This fact allows the abstract theory to be expressed using matrices over Z: for each subgroup H we construct a matrix A over Z having as its rows a set of generators of H. So the above five subgroups H of Z ⊕ Z give rise to the following five 2 × 2 matrices A over Z:

$$\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}, \quad \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

We develop this idea in Chapter 3. For the moment notice that the invariant factors of these matrices present the isomorphism types of the corresponding quotient groups (Z ⊕ Z)/H ≅ (Z2 ⊕ Z2)/H′ on a plate! Thus (Z ⊕ Z)/〈2e1, 2e2〉 ≅ Z/〈2〉 ⊕ Z/〈2〉 has isomorphism type C2 ⊕ C2, being the direct sum of two cyclic groups of type C2. Similarly (Z ⊕ Z)/〈e1, 2e2〉 ≅ Z/〈1〉 ⊕ Z/〈2〉 has isomorphism type C1 ⊕ C2 = C2, as the trivial C1 term can be omitted. The second, third and fourth of the above matrices are equivalent over Z, the corresponding quotient groups being isomorphic. Also (Z ⊕ Z)/〈e1, e2〉 ≅ Z/〈1〉 ⊕ Z/〈1〉 has isomorphism type C1 ⊕ C1 = C1.
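The invariant factors mentioned above are developed properly in Chapter 3. For a 2 × 2 matrix A over Z with det A ≠ 0 they can be read off from two standard facts that this sketch takes as given: the first invariant factor d1 is the gcd of the four entries of A, and d1d2 = |det A|. The function name is hypothetical. Applied to the five matrices above it reproduces the isomorphism types C2 ⊕ C2, C2, C2, C2 and C1.

```python
from math import gcd

def invariant_factors_2x2(a, b, c, d):
    """Invariant factors (d1, d2) of the integer matrix [[a, b], [c, d]],
    assuming a*d - b*c != 0.  Uses d1 = gcd of all entries, d1 * d2 = |det|."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("this sketch only handles non-singular matrices")
    d1 = gcd(gcd(a, b), gcd(c, d))
    return d1, abs(det) // d1

matrices = {
    "<2e1, 2e2>":   (2, 0, 0, 2),
    "<e1, 2e2>":    (1, 0, 0, 2),
    "<2e1, e2>":    (2, 0, 0, 1),
    "<e1+e2, 2e2>": (1, 1, 0, 2),
    "<e1, e2>":     (1, 0, 0, 1),
}
for name, entries in matrices.items():
    d1, d2 = invariant_factors_2x2(*entries)
    # (Z ⊕ Z)/H ≅ Z/<d1> ⊕ Z/<d2>, of isomorphism type C_d1 ⊕ C_d2
    print(name, (d1, d2))
```

The fourth matrix, for example, has entries with gcd 1 and determinant 2, giving (1, 2) and hence type C1 ⊕ C2 = C2, in agreement with its equivalence to the second and third matrices.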

Much of the abstract theory of subgroups and quotient groups developed here applies, with some minor changes, to non-abelian groups G, which are usually expressed in multiplicative notation. For such groups the quotient group G/K makes sense only when K is a normal subgroup of G, that is,

Kg = gK for all g in G.

The above equation means that each element kg for k ∈ K, g ∈ G can be expressed as gk′ for some k′ ∈ K, and conversely each element gk′ for g ∈ G, k′ ∈ K can be expressed as kg for some k ∈ K. The kernel of every homomorphism θ : G → G′ between groups is a normal subgroup of G. Lagrange’s theorem and the conclusions of Theorems 2.16 and 2.17 are valid for groups in general (see Exercises 2.3, Question 4).

We now discuss (finitely generated) free Z-modules. In fact the following theory ‘works’ when Z is replaced by any non-trivial commutative ring R with 1-element. The theory of determinants extends to square matrices over R, as was pointed out in Section 1.3. So a t × t matrix P over R is invertible over R if and only if det P ∈ U(R), that is, the determinant of P is an invertible element of R.

Lemma 2.18

Let P and Q be t × t matrices over a non-trivial commutative ring R such that PQ = I, where I is the t × t identity matrix over R. Then QP = I.


Proof

Comparing determinants in the matrix equation PQ = I gives det PQ = det I = 1, the 1-element of R. Using the multiplicative property of determinants (Theorem 1.18) we obtain (det P)(det Q) = 1, and so det P is an invertible element of R. The matrix P⁻¹ = (1/det P) adj P over R satisfies P⁻¹P = I = PP⁻¹. Hence P⁻¹ = P⁻¹I = P⁻¹PQ = IQ = Q and so QP = P⁻¹P = I. □

Interchanging the roles of P and Q we see that QP = I ⇒ PQ = I. So from the single equation PQ = I we can deduce that P and Q are both invertible over R and each is the inverse of the other: Q = P⁻¹ and P = Q⁻¹. We will need this fact in the proof of the next theorem.

The set of t × t matrices over the ring R is closed under matrix addition and matrix multiplication and is itself a ring Mt(R). The invertible elements of Mt(R) form the general linear group GLt(R) of degree t over R. We will study certain aspects of Mt(F), where F is a field, in the second half of the book. Let us suppose that F is a finite field with q elements. You should be aware that q must be a power of a prime (Exercises 2.3, Question 5(a)). Then |Mt(F)| = q^(t²), there being q choices for each of the t² entries in a t × t matrix over F. How can we find the number |GLt(F)| of invertible t × t matrices P over F? The reader will know that P is invertible over F if and only if the rows of P form a basis of the vector space F^t of all t-tuples over F. What is more, each basis v1, v2, . . . , vt of F^t can be built up, vector by vector, ensuring linear independence at each stage, as we now explain. There are q^t − 1 choices for v1 (any of the |F^t| = q^t vectors in F^t except the zero vector). Suppose i linearly independent vectors v1, v2, . . . , vi have been chosen where 1 ≤ i < t. Then v1, v2, . . . , vi, vi+1 are linearly independent ⇔ vi+1 ∉ 〈v1, v2, . . . , vi〉. So there are q^t − q^i choices for vi+1 (any of the |F^t| = q^t vectors in F^t except for the q^i = |〈v1, v2, . . . , vi〉| vectors in 〈v1, v2, . . . , vi〉). Hence

$$|GL_t(F)| = (q^t - 1)(q^t - q)(q^t - q^2)\cdots(q^t - q^{t-1}),$$

there being q^t − q^i remaining choices for row i + 1 of a matrix P in GLt(F), the previous i rows of P having already been chosen. In particular the number of 3 × 3 matrices over Z2 is 2^9 = 512 and (2^3 − 1)(2^3 − 2)(2^3 − 4) = 7 × 6 × 4 = 168 of these are invertible. So |M3(Z2)| = 512 and |GL3(Z2)| = 168.
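The count |GL3(Z2)| = 168 can be checked by brute force and compared with the product formula above. The sketch (function names are ours) tests invertibility by a non-zero determinant, which is valid because Zp is a field for p prime; it also runs for p = 3, where the formula gives 26 × 24 × 18 = 11232.

```python
from itertools import product

def det3_mod(m, p):
    """Determinant of a 3x3 matrix m (a tuple of three rows), reduced mod p."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return (a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) % p

def count_gl3(p):
    """Count invertible 3x3 matrices over Z_p (p prime) by brute force."""
    rows = list(product(range(p), repeat=3))
    return sum(1 for m in product(rows, repeat=3) if det3_mod(m, p) != 0)

def gl_order(t, q):
    """(q^t - 1)(q^t - q)...(q^t - q^(t-1)), the formula in the text."""
    result = 1
    for i in range(t):
        result *= q**t - q**i
    return result

print(count_gl3(2), gl_order(3, 2), 2**9)   # 168 168 512
```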

The Chinese remainder theorem generalises to decompose Mt(Zmn), where gcd{m, n} = 1, using the ring isomorphism α : Zmn ≅ Zm ⊕ Zn of Theorem 2.11 as follows: let rₘₙ denote the (i, j)-entry in the t × t matrix A over Zmn for 1 ≤ i, j ≤ t. Write (A)α = (B, C) where B is the t × t matrix over Zm with (i, j)-entry rₘ and C is the t × t matrix over Zn with (i, j)-entry rₙ, that is, B and C are obtained by reducing each entry in A modulo m and modulo n respectively. Then

α : Mt(Zmn) ≅ Mt(Zm) ⊕ Mt(Zn) for gcd{m, n} = 1

as α is a ring isomorphism. So α maps invertible elements to invertible elements and hence we obtain the group isomorphism

α| : GLt(Zmn) ≅ GLt(Zm) × GLt(Zn) for gcd{m, n} = 1

where the right-hand side denotes the external direct product of the indicated groups (see Exercises 2.3, Question 4(d)) and α| denotes the restriction of α to GLt(Zmn).

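For example take t = 2, m = 7, n = 8 and

$$A = \begin{pmatrix} 27 & 17 \\ 44 & 51 \end{pmatrix} \in M_2(\mathbb{Z}_{56}).$$

Then

$$(A)\alpha = (B, C) = \left(\begin{pmatrix} 6 & 3 \\ 2 & 2 \end{pmatrix}, \begin{pmatrix} 3 & 1 \\ 4 & 3 \end{pmatrix}\right) \in M_2(\mathbb{Z}_7) \oplus M_2(\mathbb{Z}_8).$$

In fact A, B and C are invertible and

$$(A^{-1})\alpha = (B^{-1}, C^{-1}) = \left(\begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix}, \begin{pmatrix} 7 & 3 \\ 4 & 7 \end{pmatrix}\right).$$

Hence

$$A^{-1} = \begin{pmatrix} 47 & 3 \\ 44 & 15 \end{pmatrix}$$

on applying Theorem 2.11 to each entry. Note |M2(Z56)| = 56^4 and

$$|GL_2(\mathbb{Z}_{56})| = |GL_2(\mathbb{Z}_7)| \times |GL_2(\mathbb{Z}_8)| = (7^2 - 1)(7^2 - 7) \times 4^4 \times 6 = 3096576$$

as |GL2(Z8)| = 4^4 × |GL2(Z2)|. More generally let det A = m̄ where A ∈ Mt(Zn). As m̄ is an invertible element of Zn ⇔ gcd{m, n} = 1, we see that A ∈ GLt(Zn) ⇔ gcd{m, n} = 1. Also |GLt(Zq)| = p^(t²(s−1)) |GLt(Zp)| where q = p^s, p prime (see Exercises 2.3, Question 5(b)).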
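The worked example above can be reproduced in a few lines. This sketch (function names are hypothetical) reduces A modulo 7 and modulo 8 to obtain B and C, inverts 2 × 2 matrices over Zn by the formula A⁻¹ = (1/det A) adj A used in the proof of Lemma 2.18, and confirms |GL2(Z56)| by counting matrices whose determinant is coprime to n. It relies on `pow(x, -1, n)`, which computes inverses modulo n in Python 3.8 and later.

```python
from itertools import product
from math import gcd

def reduce_mod(A, m):
    """Reduce every entry of the matrix A modulo m."""
    return [[x % m for x in row] for row in A]

def inverse_2x2_mod(A, n):
    """Inverse of a 2x2 matrix over Z_n via (1/det A) adj A.
    pow(det, -1, n) raises ValueError unless gcd(det, n) = 1."""
    (a, b), (c, d) = A
    det_inv = pow((a * d - b * c) % n, -1, n)
    return [[(det_inv * d) % n, (-det_inv * b) % n],
            [(-det_inv * c) % n, (det_inv * a) % n]]

def count_gl2(n):
    """|GL_2(Z_n)| by brute force: matrices whose determinant is coprime to n."""
    return sum(1 for a, b, c, d in product(range(n), repeat=4)
               if gcd(a * d - b * c, n) == 1)

A = [[27, 17], [44, 51]]
B, C = reduce_mod(A, 7), reduce_mod(A, 8)      # (A)α = (B, C)
print(B, C)                                    # [[6, 3], [2, 2]] [[3, 1], [4, 3]]

A_inv = inverse_2x2_mod(A, 56)
print(A_inv)                                   # [[47, 3], [44, 15]]
# (A^{-1})α = (B^{-1}, C^{-1}): reducing A^{-1} agrees with inverting B and C.
print(reduce_mod(A_inv, 7) == inverse_2x2_mod(B, 7),
      reduce_mod(A_inv, 8) == inverse_2x2_mod(C, 8))   # True True

print(count_gl2(7), count_gl2(8), count_gl2(7) * count_gl2(8))
# 2016 1536 3096576
```

Here det A = 13 in Z56 and 13 × 13 = 169 ≡ 1 (mod 56), so 13 is its own inverse; this is the only modular inverse the computation of A⁻¹ needs.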

In Chapter 5 we use the concept of an F[x]-module M in order to discuss the theory of similarity of square matrices over the field F. Here F[x] is the ring of all polynomials a0 + a1x + a2x^2 + · · · + anx^n over F. There is a close analogy between the theory of similarity and the theory of finite abelian groups, as we’ll come to realise. Both theories involve R-modules where R is a principal ideal domain. However, the aspect of the theory which we deal with next ‘works’ in the general context of R being merely a non-trivial commutative ring (with 1-element). So we assume in the following theory that R is such a ring.


Let M be a set closed under a binary operation called ‘addition’ and denoted in the familiar way by ‘+’. Suppose that (M, +) satisfies laws 1, 2, 3 and 4 of an abelian group as introduced at the beginning of Section 2.1. Suppose also that it makes sense to multiply elements r of a commutative ring R and elements v of M together, the result always being an element of M, that is,

rv ∈ M for all r ∈ R, v ∈ M.

Then M is called an R-module if the above product rv satisfies:

5. r(v1 + v2) = rv1 + rv2 for all r ∈ R and all v1, v2 ∈ M,
(r1 + r2)v = r1v + r2v for all r1, r2 ∈ R and all v ∈ M,
6. (r1r2)v = r1(r2v) for all r1, r2 ∈ R and all v ∈ M,
7. 1v = v for all v ∈ M, where 1 denotes the 1-element of R.

There are no surprises here! We have simply mimicked the Z-module definition at the start of Section 2.1. Should the ring R happen to be a field F then laws 1–7 above are the laws of a vector space, that is,

the concepts F-module and vector space over F are the same.

Definition 2.19

Let M be an R-module containing v1, v2, . . . , vt.
(i) The elements v1, v2, . . . , vt generate M if each element of M can be expressed as r1v1 + r2v2 + · · · + rtvt for some r1, r2, . . . , rt ∈ R, in which case we write

M = 〈v1, v2, . . . , vt〉.

(ii) The elements v1, v2, . . . , vt are R-independent if the equation

r1v1 + r2v2 + · · · + rtvt = 0

holds only in the case r1 = r2 = · · · = rt = 0.
(iii) The ordered set v1, v2, . . . , vt is an R-basis of M if v1, v2, . . . , vt generate M and are R-independent.

The above definitions are modelled on the corresponding vector space concepts, which will be well known to the reader. You are used to regarding the bases v1, v2 and v2, v1 of a 2-dimensional vector space V as being different: the order in which the vectors appear is important, and the same goes for R-bases.

Let the R-module M have R-basis v1, v2, . . . , vt and let v ∈ M. As v1, v2, . . . , vt generate M there are ring elements r1, r2, . . . , rt with v = r1v1 + r2v2 + · · · + rtvt. In fact r1, r2, . . . , rt are unique, because suppose v = r′1v1 + r′2v2 + · · · + r′tvt where r′1, r′2, . . . , r′t ∈ R. Subtracting we obtain

$$0 = v - v = (r_1 - r_1')v_1 + (r_2 - r_2')v_2 + \cdots + (r_t - r_t')v_t$$

and so from the R-independence of v1, v2, . . . , vt we deduce r1 − r′1 = 0, r2 − r′2 = 0, . . . , rt − r′t = 0; therefore ri = r′i for 1 ≤ i ≤ t, showing that each v in M can be expressed in one and only one way as an R-linear combination of v1, v2, . . . , vt. In particular (as in the next proof) from vi = r1v1 + r2v2 + · · · + rtvt we deduce ri = 1 and rk = 0 for k ≠ i.

It is not encouraging that some generating sets of a Z-module G do not contain any Z-basis of G, that some Z-independent subsets of G are not contained in any Z-basis of G, and that quite possibly G has no Z-basis at all. However, as a consequence of the next theorem, should an R-module M have an R-basis consisting of exactly t (a non-negative integer) elements, then every R-basis of M has t elements also, in which case M is said to be a free R-module of rank t.

Theorem 2.20

Let R be a non-trivial commutative ring and suppose that M is an R-module with R-basis v1, v2, . . . , vt. Suppose also that M contains elements u1, u2, . . . , us which generate M. Then s ≥ t.

Proof

Each vi is a linear combination of u1, u2, . . . , us and so there are ring elements pij ∈ R with vi = ∑ pij uj (sum over 1 ≤ j ≤ s) for 1 ≤ i ≤ t. Let P = (pij) denote the t × s matrix over R with (i, j)-entry pij. In the same way each module element uj is expressible as a linear combination of v1, v2, . . . , vt, and so there are ring elements qjk ∈ R with uj = ∑ qjk vk (sum over 1 ≤ k ≤ t) for 1 ≤ j ≤ s. Let Q = (qjk) be the s × t matrix over R with (j, k)-entry qjk. We’ve chosen the symbols i, j, k so that the (i, k)-entry ∑ pij qjk (sum over 1 ≤ j ≤ s) in the t × t matrix PQ appears in the familiar notation. Substituting for uj we obtain

$$v_i = \sum_{j=1}^{s} p_{ij}u_j = \sum_{j=1}^{s} p_{ij}\left(\sum_{k=1}^{t} q_{jk}v_k\right) = \sum_{k=1}^{t}\left(\sum_{j=1}^{s} p_{ij}q_{jk}\right)v_k \quad \text{for } 1 \le i \le t$$

which must in fact be no more than the unsurprising equation vi = vi, as v1, v2, . . . , vt are R-independent. Looking at the last term above we see that the (i, k)-entry in PQ is 1 or 0 according as i = k or i ≠ k, that is,

PQ = It, the t × t identity matrix over R. (♦)


Suppose s < t . We’ll shortly discover a contradiction to this supposition and that willcomplete the proof. We can’t use Lemma 2.18 and leap to the conclusion that P andQ are inverses of each other as neither P nor Q is a square matrix. But the readershould have the feeling that something is wrong: the condition s < t means that P

is ‘long and thin’ and Q is ‘short and fat’, but nevertheless their product PQ is thelarge ‘virile’ identity matrix It . We clinch the matter by partitioning P = (

P1P2

)and

Q = (Q1 Q2 ) where P1 and Q1 are s × s matrices, and so P2 is (t − s) × s and Q2

is s × (t − s). Then (♦) gives

PQ =(

P1

P2

)(Q1 Q2

) =(

P1Q1 P1Q2

P2Q1 P2Q2

)

= It =(

Is 0

0 It−s

)

and so P1Q1 = Is on comparing leading entries. Now Lemma 2.18 can be used togive Q1P1 = Is as the s × s matrices P1 and Q1 are inverses of each other. Compar-ing (1,2)-entries in the above partitioned matrices gives P1Q2 = 0 and hence Q2 =IsQ2 = (Q1P1)Q2 = Q1(P1Q2) = Q10 = 0. The 1-element of R cannot be zero(for if so then R = {0}). Comparing (2,2)-entries above now gives P2Q2 = It−s �= 0whereas P2Q2 = P20 = 0. We have found the contradiction to s < t we are lookingfor as P2Q2 cannot be both non-zero and zero! Therefore s ≥ t . �

Corollary 2.21

Let M be an R-module with R-basis v1, v2, . . . , vt. Then every R-basis of M has exactly t elements. Let u1, u2, . . . , ut be elements of M and let Q = (qjk) be the t × t matrix over R such that uj = ∑ qjk vk (sum over 1 ≤ k ≤ t) for 1 ≤ j ≤ t. Then u1, u2, . . . , ut is an R-basis of M if and only if Q is invertible over R.

Proof

Let u1, u2, . . . , us be an R-basis of M. As u1, u2, . . . , us generate M we deduce t ≤ s from Theorem 2.20. As u1, u2, . . . , us is an R-basis of M and v1, v2, . . . , vt generate M, interchanging the roles of the u’s and v’s we obtain s ≤ t from Theorem 2.20. So s = t. Using (♦) above and Lemma 2.18 we see that the t × t matrix Q is invertible over R.

Now suppose that u1, u2, . . . , ut are elements of M such that Q is invertible over R. Write Q⁻¹ = (pij). Multiplying uj = ∑ qjk vk by pij and summing over j gives

$$\sum_{j=1}^{t} p_{ij}u_j = \sum_{j=1}^{t} p_{ij}\left(\sum_{k=1}^{t} q_{jk}v_k\right) = \sum_{k=1}^{t}\left(\sum_{j=1}^{t} p_{ij}q_{jk}\right)v_k = v_i \quad \text{for } 1 \le i \le t \qquad \text{(❤)}$$

(the inner sum is the (i, k)-entry of Q⁻¹Q = It), which shows that each vi is an R-linear combination of u1, u2, . . . , ut. Consider v ∈ M. As M = 〈v1, v2, . . . , vt〉 there are elements r1, r2, . . . , rt ∈ R with v = ∑ rivi (sum over 1 ≤ i ≤ t). Using (❤) we see

$$v = \sum_{i=1}^{t} r_i\left(\sum_{j=1}^{t} p_{ij}u_j\right) = \sum_{j=1}^{t}\left(\sum_{i=1}^{t} r_i p_{ij}\right)u_j = \sum_{j=1}^{t} r_j' u_j$$

where r′j = ∑ ri pij (sum over 1 ≤ i ≤ t) for 1 ≤ j ≤ t, that is, (r′1, r′2, . . . , r′t) = (r1, r2, . . . , rt)Q⁻¹. So u1, u2, . . . , ut generate M.

Finally we show that u1, u2, . . . , ut are R-independent. Suppose $\sum_{j=1}^{t} r'_j u_j = 0$ where r′1, r′2, . . . , r′t ∈ R. On multiplying $u_j = \sum_{k=1}^{t} q_{jk}v_k$ by $r'_j$ and summing over j we obtain

$$0 = \sum_{j=1}^{t} r'_j u_j = \sum_{j=1}^{t} r'_j\Big(\sum_{k=1}^{t} q_{jk}v_k\Big) = \sum_{k=1}^{t}\Big(\sum_{j=1}^{t} r'_j q_{jk}\Big)v_k = \sum_{k=1}^{t} r_k v_k$$

where $r_k = \sum_{j=1}^{t} r'_j q_{jk}$ for 1 ≤ k ≤ t, that is, $(r_1, r_2, \ldots, r_t) = (r'_1, r'_2, \ldots, r'_t)Q$. As v1, v2, . . . , vt are R-independent we see r1 = r2 = · · · = rt = 0. Hence $(r'_1, r'_2, \ldots, r'_t) = (r_1, r_2, \ldots, r_t)Q^{-1} = 0 \times Q^{-1} = 0$, showing r′1 = r′2 = · · · = r′t = 0.

So u1, u2, . . . , ut are R-independent and hence they form an R-basis of M. □

Definition 2.22

Let R be a commutative ring. An R-module M having an R-basis is called free. The number t of elements in any R-basis of a free R-module M is called the rank of M.

So the concept ‘rank of a module’ applies only to free modules. By Corollary 2.21 all R-bases of a free R-module have the same number of elements, so the rank is well defined; it generalises the familiar idea of the dimension of a finite-dimensional vector space. We’ll use rank as defined in Definition 2.22 to establish the important Invariance Theorem 3.7 concerning f.g. Z-modules in Section 3.1.

The set $R^t$ of t-tuples (r1, r2, . . . , rt), where each ri belongs to the non-trivial commutative ring R, is itself an R-module, the module operations being carried out componentwise. It should come as no surprise to the reader that $R^t$ has an R-basis, namely

e1 = (1,0,0, . . . ,0), e2 = (0,1,0, . . . ,0), . . . , et = (0,0,0, . . . ,1)

which is known as the standard basis of $R^t$. So $R^t$ is free of rank t.

Our next corollary tells us how to recognise R-bases of $R^t$: they are nothing more than the rows of invertible t × t matrices over R.


Corollary 2.23

Let ρ1, ρ2, . . . , ρt denote the rows of the t × t matrix Q over a non-trivial commutative ring R. Then ρ1, ρ2, . . . , ρt is an R-basis of $R^t$ if and only if Q is invertible over R.

Proof

Write Q = (qjk). Then $\rho_j = \sum_{k=1}^{t} q_{jk}e_k$ for 1 ≤ j ≤ t. On applying Corollary 2.21 with $M = R^t$, $u_j = \rho_j$ and $v_k = e_k$ we see that ρ1, ρ2, . . . , ρt is an R-basis of $R^t$ if and only if Q is invertible over R. □

We’ll use the case R = Z of Corollary 2.23 in Section 3.1. As an illustration consider ρ1 = (4,5), ρ2 = (5,6). Then ρ1, ρ2 is a Z-basis of $\mathbb{Z}^2$ as

$$P = \begin{pmatrix}\rho_1\\ \rho_2\end{pmatrix} = \begin{pmatrix}4 & 5\\ 5 & 6\end{pmatrix}$$

is invertible over Z since det P = −1 is an invertible element of Z. The rows of $P^{-1} = \begin{pmatrix}-6 & 5\\ 5 & -4\end{pmatrix}$ tell us how the elements e1, e2 of the standard Z-basis of $\mathbb{Z}^2$ are expressible as Z-linear combinations of ρ1, ρ2 because

$$\begin{pmatrix}e_1\\ e_2\end{pmatrix} = I = P^{-1}P = \begin{pmatrix}-6 & 5\\ 5 & -4\end{pmatrix}\begin{pmatrix}\rho_1\\ \rho_2\end{pmatrix} = \begin{pmatrix}-6\rho_1 + 5\rho_2\\ 5\rho_1 - 4\rho_2\end{pmatrix},$$

that is, e1 = −6ρ1 + 5ρ2, e2 = 5ρ1 − 4ρ2 on equating rows. Hence (m1, m2) = (−6m1 + 5m2)ρ1 + (5m1 − 4m2)ρ2 for all (m1, m2) ∈ $\mathbb{Z}^2$, showing explicitly that ρ1, ρ2 generate $\mathbb{Z}^2$, that is, 〈ρ1, ρ2〉 = $\mathbb{Z}^2$.
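The computation in this example can be checked mechanically. The Python sketch below is our own illustration and is not part of the text. The helper names `det2`, `inverse_over_Z` and `coordinates` are hypothetical, and the code handles only 2 × 2 matrices. It tests invertibility over Z by checking that det P = ±1, forms $P^{-1}$ from the adjugate, and recovers the coefficients of (m1, m2) with respect to ρ1, ρ2 as the row vector $(m_1, m_2)P^{-1}$.

```python
def det2(m):
    """Determinant of a 2 x 2 integer matrix given as a list of rows."""
    (a, b), (c, d) = m
    return a * d - b * c


def inverse_over_Z(m):
    """Inverse of a 2 x 2 integer matrix, provided det m is a unit of Z (+1 or -1)."""
    det = det2(m)
    if det not in (1, -1):
        raise ValueError("matrix is not invertible over Z")
    (a, b), (c, d) = m
    # adjugate / det; since det = +-1, dividing by det is the same as multiplying by det
    return [[det * d, -det * b], [-det * c, det * a]]


def coordinates(v, m):
    """Integers (c1, c2) with v = c1*rho1 + c2*rho2, where rho1, rho2 are the rows of m.

    Row-vector convention as in the text: (c1, c2) = v * m^(-1).
    """
    inv = inverse_over_Z(m)
    return tuple(sum(v[i] * inv[i][j] for i in range(2)) for j in range(2))


P = [[4, 5], [5, 6]]             # rows rho1 = (4,5), rho2 = (5,6)
print(det2(P))                   # -1
print(inverse_over_Z(P))         # [[-6, 5], [5, -4]]
print(coordinates((1, 0), P))    # (-6, 5), i.e. e1 = -6 rho1 + 5 rho2
```

Calling `coordinates((m1, m2), P)` reproduces the general formula (−6m1 + 5m2, 5m1 − 4m2) given above.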

Definition 2.24

Let M and M′ be R-modules. A mapping θ : M → M′ is called R-linear if (v1 + v2)θ = (v1)θ + (v2)θ for all v1, v2 ∈ M and (rv)θ = r((v)θ) for all r ∈ R, v ∈ M. A bijective R-linear mapping θ is called an isomorphism. If there is an isomorphism θ : M → M′, then the R-modules M and M′ are called isomorphic and we write θ : M ≅ M′.

The above definition mimics Definitions 2.3 and 2.4, replacing Z by the commutative ring R. The following lemma will be used in Section 3.1.

Lemma 2.25

Let M be a free R-module of rank t and let M′ be an R-module which is isomorphic to M. Then M′ is also free of rank t.


Proof

There is an isomorphism θ : M ≅ M′. The free R-module M has an R-basis v1, v2, . . . , vt. Write v′i = (vi)θ for 1 ≤ i ≤ t. It is straightforward to show that v′1, v′2, . . . , v′t is an R-basis of M′ (see Exercises 2.3, Question 7(a)). Hence M′ is free of rank t by Definition 2.22. □

One advantage of expressing abelian groups in the language of Z-modules is that some theorems we have already met painlessly generalise to R-modules. In particular this is true of Lemma 2.10 and Theorems 2.16 and 2.17, as we now outline.

Definition 2.26

Let N be a subset of an R-module M. Suppose that N is a subgroup of the additive group of M and ru ∈ N for all r ∈ R and all u ∈ N. Then N is called a submodule of the R-module M.

So submodules of the R-module M are subgroups N of the abelian group (M,+) which are closed under multiplication by elements of the ring R. It is important to realise that submodules of R-modules are themselves R-modules: laws 1–7 of an R-module hold with M replaced by N throughout. The reader will be familiar with this type of thing, as subspaces of vector spaces are vector spaces ‘in their own right’. Indeed, should the ring R be a field F, then Definition 2.26 tells us that submodules of the F-module M are exactly subspaces of the vector space M.

Let N be a submodule of the R-module M. As N is a subgroup of the additive group M, the quotient group M/N can be constructed as in Lemma 2.10 (here M and N replace G and K respectively). The elements of M/N are cosets N + v for v ∈ M, where (unsurprisingly) N + v = {u + v : u ∈ N}. Can the abelian group M/N be given the extra structure of an R-module? The answer is: Yes!

Lemma 2.27

Let N be a submodule of the R-module M. Write r(N + v) = N + rv for r ∈ R and v ∈ M. This product is unambiguously defined and using it M/N is an R-module. The natural mapping η : M → M/N, where (v)η = N + v for all v ∈ M, is R-linear.

Proof

Suppose N + v = N + v′ where v, v′ ∈ M. Then v − v′ ∈ N as in Lemma 2.9. So rv − rv′ = r(v − v′) ∈ N as N is a submodule of the R-module M. But rv − rv′ ∈ N gives N + rv = N + rv′ by Lemma 2.9 and shows that the given definition of r(N + v) is unambiguous.

By Lemma 2.10 coset addition in M/N obeys laws 1, 2, 3, 4 of a Z-module. We should check that laws 5, 6 and 7 are obeyed by the product r(N + v) defined above. Consider r ∈ R and v, v1, v2 ∈ M. Then

$$\begin{aligned}
r\big((N + v_1) + (N + v_2)\big) &= r\big(N + (v_1 + v_2)\big) = N + r(v_1 + v_2)\\
&= N + (rv_1 + rv_2) = (N + rv_1) + (N + rv_2)\\
&= r(N + v_1) + r(N + v_2)
\end{aligned}$$

which shows that the first part of law 5 is obeyed. The remaining parts can be checked in a similar way (see Exercises 2.3, Question 7(d)), showing that M/N is an R-module.

As (rv)η = N + rv = r(N + v) = r((v)η) for all r ∈ R, v ∈ M, we see that η is R-linear. □

The reader should verify that kernels and images of R-linear mappings are submodules (see Exercises 2.3, Question 7(c)).

We are now ready to generalise Theorems 2.16 and 2.17.

Corollary 2.28 (The first isomorphism theorem for R-modules)

Let M and M′ be R-modules and let θ : M → M′ be an R-linear mapping. Write K = ker θ. Then $\bar{\theta} : M/K \cong \operatorname{im}\theta$ is an isomorphism of R-modules, where $\bar{\theta}$ is defined by $(K + v)\bar{\theta} = (v)\theta$ for all v ∈ M.

Proof

By Theorem 2.16 we know that $\bar{\theta}$ is an isomorphism of Z-modules. So it is enough to check that $\bar{\theta}$ is R-linear: $(r(K + v))\bar{\theta} = (K + rv)\bar{\theta} = (rv)\theta = r((v)\theta) = r((K + v)\bar{\theta})$ for r ∈ R, v ∈ M, using the R-linearity of θ and the definition of $\bar{\theta}$. □

Corollary 2.29

Let M and M′ be R-modules and let θ : M → M′ be an R-linear mapping. Let L be the set of submodules N of M with ker θ ⊆ N. Let L′ be the set of submodules N′ of M′ with N′ ⊆ im θ. Then (N)θ = {(u)θ : u ∈ N} is in L′ for all N in L. The mapping N → (N)θ is a bijection from L to L′ and satisfies N1 ⊆ N2 ⇔ (N1)θ ⊆ (N2)θ for all N1, N2 ∈ L.


Proof

In view of Theorem 2.17 there is not a great deal left to prove, and what’s left is routine. We know that (N)θ is a subgroup of (M′,+). Consider r ∈ R and u ∈ N. Then ru ∈ N as N is a submodule of M. So r((u)θ) = (ru)θ ∈ (N)θ, showing that (N)θ is a submodule of M′. Therefore N → (N)θ is a mapping from L to L′. Following the proof of Theorem 2.17, for each submodule N′ of M′ write (N′)ϕ = {v ∈ M : (v)θ ∈ N′}. The diligent reader will have checked that (N′)ϕ is a subgroup of (M,+) containing ker θ. Consider r ∈ R and v ∈ (N′)ϕ. Is rv ∈ (N′)ϕ? Yes it is, as v′ = (v)θ ∈ N′ and so (rv)θ = r((v)θ) = rv′ ∈ N′ as N′ is a submodule of M′. The conclusion is: (N′)ϕ is a submodule of M and N′ → (N′)ϕ is a mapping from L′ to L. As before, these mappings are inverses of each other and are inclusion-preserving. Therefore N → (N)θ is a bijection from L to L′ satisfying N1 ⊆ N2 ⇔ (N1)θ ⊆ (N2)θ for all N1, N2 ∈ L. □

EXERCISES 2.3

1. (a) Let G and G′ be Z-modules and let θ : G → G′ be a Z-linear mapping.
(i) Show that ker θ is a submodule of G. Show that ker θ = {0} ⇔ θ is injective.
(ii) Show that im θ is a submodule of G′. Is it true that im θ = G′ ⇔ θ is surjective?

(b) The Z-linear mapping θ : Z ⊕ Z → Z is given by (l,m)θ = 4l − 2m for all l,m ∈ Z. Verify that (1,2) ∈ ker θ. Do (−1,−2) and (2,4) belong to ker θ? Show that ker θ = 〈(1,2)〉. Which integers belong to im θ? Is im θ infinite cyclic? Using the notation of Theorem 2.16, are the integers $(\ker\theta + (12,20))\bar{\theta}$ and $(\ker\theta + (17,30))\bar{\theta}$ equal? Show that (Z ⊕ Z)/ker θ is infinite cyclic and specify a generator.

(c) Specify the subgroups of each of the following groups and hence determine the isomorphism types of their homomorphic images.

(i) $\mathbb{Z}_8$; (ii) $\mathbb{Z}_{12}$; (iii) $\mathbb{Z}_n$ (n > 0); (iv) Z; (v) $\mathbb{Z}_2 \oplus \mathbb{Z}_2$.

(d) Let G1 and G2 be additive abelian groups. By applying Theorem 2.16 to suitable homomorphisms, establish the isomorphisms G1/{0} ≅ G1 and G1/G1 ≅ {0}. Show also that G1 and G2 are homomorphic images of G1 ⊕ G2. Are the groups G1 and (G1 ⊕ G1)/K isomorphic, where K = {(g1, g1) : g1 ∈ G1}?

(e) Let G be a Z-module and θ : G → G a Z-linear mapping which is idempotent (i.e. θ² = θ). Use the equation g = (g − (g)θ) + (g)θ for g ∈ G to show G = ker θ ⊕ im θ. Let G = $\mathbb{Z}_2 \oplus \mathbb{Z}_4$ and let (l,m)θ = (m, 2l − m) for all l,m ∈ Z. Show that θ : G → G is idempotent and find generators of ker θ and im θ.

2. (a) Let G and G′ be Z-modules and θ : G → G′ a Z-linear mapping.
(i) For each subgroup H of G show that H′ = {(h)θ : h ∈ H} is a subgroup of G′ contained in im θ. Show H/(ker θ ∩ H) ≅ H′ by applying Theorem 2.16 to the restriction of θ to H (i.e. the Z-linear mapping θ|H : H → G′ defined by (h)θ|H = (h)θ for all h ∈ H).
(ii) For each subgroup H′ of G′, show that H = {h ∈ G : (h)θ ∈ H′} is a subgroup of G containing ker θ. Show H/ker θ ≅ H′ ∩ im θ by applying Theorem 2.16 to θ|H.

(b) The Z-linear mapping θ : Z ⊕ Z → $\mathbb{Z}_2$ is defined by (l,m)θ = l − m for all l,m ∈ Z. Verify that (1,1) and (2,0) belong to ker θ. Show that (1,1), (2,0) is a Z-basis of ker θ. Show im θ = $\mathbb{Z}_2$. Using Theorem 2.16 determine the isomorphism type of (Z ⊕ Z)/ker θ. Use Theorem 2.17 to show that ker θ is a maximal subgroup of Z ⊕ Z (i.e. ker θ ≠ Z ⊕ Z and there are no subgroups H of Z ⊕ Z with ker θ ⊂ H ⊂ Z ⊕ Z).

(c) The Z-linear mapping θ : Z ⊕ Z → $\mathbb{Z}_4$ is defined by (l,m)θ = l + m for all l,m ∈ Z. Show that ker θ = 〈(1,−1), (4,0)〉. Verify that im θ = $\mathbb{Z}_4$ and hence find the isomorphism type of (Z ⊕ Z)/ker θ. List the subgroups H′ of $\mathbb{Z}_4$ and the corresponding subgroups H of Z ⊕ Z with ker θ ⊆ H as in Theorem 2.17. Taking H′ = 〈2〉, specify a Z-basis of the corresponding H.

(d) The Z-linear mapping θ : Z ⊕ Z → $\mathbb{Z}_2 \oplus \mathbb{Z}_4$ = G′ is defined by (l,m)θ = (l,m) for all l,m ∈ Z. Verify that 2e1, 4e2 is a Z-basis of ker θ. For each of the following subgroups H′ of G′ specify a Z-basis ρ1, ρ2 of H = (H′)ϕ = {h ∈ Z ⊕ Z : (h)θ ∈ H′}:

〈(1,0)〉, 〈(0,2)〉, 〈(1,1)〉.

In each case find the Smith normal form diag(d1, d2) of the 2 × 2 matrix $A = \begin{pmatrix}\rho_1\\ \rho_2\end{pmatrix}$ and check that G′/H′ has isomorphism type $C_{d_1} \oplus C_{d_2}$.

(e) Let G and G′ be Z-modules and θ : G → G′ a surjective Z-linear mapping. Let H′ be a subgroup of G′ and H = {h ∈ G : (h)θ ∈ H′}. By applying Theorem 2.16 to θη, where η : G′ → G′/H′ is the natural homomorphism, show G/H ≅ G′/H′.

3. (a) Let R be a ring with 1-element e. A subgroup K of the additive group of R is called an ideal of R if rk and kr belong to K for all r ∈ R, k ∈ K. Show that multiplication of cosets is unambiguously defined by (K + r1)(K + r2) = K + r1r2 for all r1, r2 ∈ R (i.e. show that if K + r1 = K + r′1 and K + r2 = K + r′2 then K + r1r2 = K + r′1r′2).
Using the notation $K + r = \bar{r}$ and Lemma 2.10, show that the set $R/K = \{\bar{r} : r \in R\}$ of cosets is a ring (the quotient ring of R by K). Show further that η : R → R/K, given by $(r)\eta = \bar{r}$ for all r ∈ R, is a ring homomorphism (the natural homomorphism from R to R/K). What are im η and ker η?

(b) Let R and R′ be rings and let θ : R → R′ be a ring homomorphism. Show that im θ is a subring of R′ (i.e. im θ is a subgroup of the additive group of R′, im θ is closed under multiplication and im θ contains the 1-element e′ of R′). Show that K = ker θ = {k ∈ R : (k)θ = 0′} is an ideal of R, where 0′ is the 0-element of R′. Prove the first isomorphism theorem for rings, namely $\bar{\theta} : R/K \cong \operatorname{im}\theta$ (i.e. $\bar{\theta}$ defined by $(K + r)\bar{\theta} = (r)\theta$ for all r ∈ R is a ring isomorphism).

(c) Let θ : Z → R be a ring homomorphism from the ring Z of integers to a ring R. Use part (b) above and Theorem 1.15 to show that there is a non-negative integer d with Z/〈d〉 ≅ im θ, that is, the rings $\mathbb{Z}_d$ are, up to isomorphism, the (ring) homomorphic images of Z.

(d) Let R, R′, R′′ be rings and let θ : R → R′, θ′ : R′ → R′′ be ring homomorphisms. Show that θθ′ : R → R′′ is a ring homomorphism. Suppose θ is a ring isomorphism. Show that $\theta^{-1}$ is also a ring isomorphism. Deduce that the automorphisms θ of R (the ring isomorphisms θ : R → R) form a group Aut R, the group operation being mapping composition.

(e) Let R1 and R2 be rings. Show that R1 ⊕ R2 = {(r1, r2) : r1 ∈ R1, r2 ∈ R2}, with addition and multiplication of ordered pairs defined by (r1, r2) + (r′1, r′2) = (r1 + r′1, r2 + r′2) and (r1, r2)(r′1, r′2) = (r1r′1, r2r′2) for all r1, r′1 ∈ R1 and all r2, r′2 ∈ R2, is itself a ring (the direct sum of R1 and R2).

(f) Let K and L be ideals of a ring R (see Question 3(a) above). Show that K ∩ L and K + L = {k + l : k ∈ K, l ∈ L} are ideals of R. Establish the generalised Chinese remainder theorem, which states: suppose K + L = R; then $\bar{\alpha} : R/(K \cap L) \cong R/K \oplus R/L$ is a ring isomorphism, where $(r + K \cap L)\bar{\alpha} = (r + K, r + L)$ for all r ∈ R.
Hint: Consider α : R → R/K ⊕ R/L defined by (r)α = (r + K, r + L) for all r ∈ R.

4. (a) Let G be a multiplicative group with subgroup K. Then K is called normal in G if $g^{-1}kg \in K$ for all k ∈ K, g ∈ G. Write Kg = {kg : k ∈ K} and gK = {gk : k ∈ K}. Show that K is normal in G if and only if Kg = gK for all g ∈ G.
The group S3 consisting of the 6 bijections (permutations) of {1,2,3} to {1,2,3} contains the elements σ and τ where (1)σ = 2, (2)σ = 3, (3)σ = 1 and (1)τ = 2, (2)τ = 1, (3)τ = 3. Show that 〈σ〉 = {σ, σ², σ³} is normal in S3 but 〈τ〉 = {τ, τ²} is not normal in S3.
Hint: 〈σ〉 is the subgroup of even permutations in S3.
Suppose that K is normal in G. Show that the product of cosets is unambiguously defined by (Kg1)(Kg2) = K(g1g2) for g1, g2 ∈ G. Hence show that the set G/K of all cosets Kg (g ∈ G) is a group, the quotient group of G by K; it’s the multiplicative version of Lemma 2.10.

(b) Let G = U($\mathbb{Z}_{15}$), the multiplicative group of invertible elements in the ring $\mathbb{Z}_{15}$. List the 8 elements in G. For each of the following subgroups K, partition G into cosets of K, construct the multiplication table of G/K and state the isomorphism type of G/K:

K = {1,4}, K = {1,14}, K = {1,4,11,14}, K = {1,2,4,8}.

From your results decide whether or not (i) K1 ≅ K2 implies G/K1 ≅ G/K2, (ii) G/K1 ≅ G/K2 implies K1 ≅ K2, where K1 and K2 are normal subgroups of G.

(c) Let G and G′ be multiplicative groups. Suppose the mapping θ : G → G′ satisfies (g1g2)θ = (g1)θ(g2)θ for all g1, g2 ∈ G. Then θ is called a group homomorphism. Show that (e)θ = e′, where e, e′ are the identity elements of G, G′, by considering (e²)θ. Deduce that $(g^{-1})\theta = ((g)\theta)^{-1}$ for all g ∈ G.
Show that K = ker θ = {k ∈ G : (k)θ = e′} is a normal subgroup of G. Show that im θ = {(g)θ : g ∈ G} is a subgroup of G′. Prove the first isomorphism theorem for groups, namely $\bar{\theta} : G/K \cong \operatorname{im}\theta$, i.e. $\bar{\theta}$ defined by $(Kg)\bar{\theta} = (g)\theta$ for all g ∈ G is a group isomorphism (a bijective group homomorphism); it’s the multiplicative version of Theorem 2.16.
Let R be a non-trivial commutative ring and let t be a positive integer. Use Theorem 1.18 to show that $\theta : GL_t(R) \to U(R)$ given by (A)θ = det A for all $A \in GL_t(R)$ is a group homomorphism. Show im θ = U(R) and write ker θ = $SL_t(R)$, the special linear group of degree t over R. Find a formula for $|SL_t(\mathbb{Z}_p)|$ where p is prime.

(d) Let G1 and G2 be multiplicative groups. Show that the Cartesian product G1 × G2 (the set of all ordered pairs (g1, g2) where g1 ∈ G1, g2 ∈ G2) with componentwise multiplication is a group, the external direct product of G1 and G2; it’s the multiplicative version of the external direct sum (Exercises 2.2, Question 4(e)).
The projection homomorphisms πi : G1 × G2 → Gi are defined by (g1, g2)πi = gi for all (g1, g2) ∈ G1 × G2, where i = 1, 2. Show that G1 × G2 has normal subgroups K1 and K2, isomorphic to G1 and G2, such that K1 ∩ K2 is trivial and K1K2 = {k1k2 : ki ∈ Ki} = G1 × G2.
Hint: Use ker πi.

5. (a) Let F be a field with 1-element e. Show that the mapping χ : Z → F, defined by (m)χ = me for all m ∈ Z, is a ring homomorphism. The non-negative integer d with ker χ = 〈d〉 is called the characteristic of F (d exists by Theorem 1.15). Show that $\bar{\chi} : \mathbb{Z}_d \cong \operatorname{im}\chi$ (i.e. im χ is a subring of F which is isomorphic to $\mathbb{Z}_d$). Using the fact that F has no divisors of zero, deduce that either d = 0 or d = p (prime). It is customary to write d = χ(F).
Let F be a finite field. Explain why $\mathbb{Z}_p \cong \operatorname{im}\chi$ in this case; im χ = $F_0$ is called the prime subfield of F. Explain how F has the structure of a vector space over $F_0$. Why is this vector space finitely generated? Show $|F| = p^s$, where s is the dimension of F over $F_0$. Use the binomial theorem to show $(a + b)^p = a^p + b^p$ for a, b ∈ F. Hence show that θ : F → F defined by $(a)\theta = a^p$ for all a ∈ F is an automorphism of F (the Frobenius automorphism).

(b) Let d and n be positive integers with d | n. Using the notation of Theorem 2.11, let $\delta_1 : \mathbb{Z}_n \to \mathbb{Z}_d$ be the surjective ring homomorphism given by $(m_n)\delta_1 = m_d$ for all m ∈ Z. Let A = (aij) be the t × t matrix over $\mathbb{Z}_n$ with (i, j)-entry aij. Write $(A)\delta_t = ((a_{ij})\delta_1)$, i.e. $(A)\delta_t$ is the t × t matrix over $\mathbb{Z}_d$ with (i, j)-entry $(a_{ij})\delta_1$. Show that $\delta_t : M_t(\mathbb{Z}_n) \to M_t(\mathbb{Z}_d)$ is a surjective ring homomorphism. Describe the elements of ker δt and show $|\ker\delta_t| = (n/d)^{t^2}$.

Take d = p (prime), n = $p^s$ where s > 0, and write det A = $m_n$. Show that

$$A \in GL_t(\mathbb{Z}_n) \iff \gcd\{m, p^s\} = 1 \iff \gcd\{m, p\} = 1 \iff (A)\delta_t \in GL_t(\mathbb{Z}_d).$$

Deduce that the restriction $\delta_t| : GL_t(\mathbb{Z}_n) \to GL_t(\mathbb{Z}_p)$ is a surjective homomorphism of multiplicative groups having kernel the coset ker δt + I, where I is the t × t identity matrix over $\mathbb{Z}_n$. Hence show

$$|GL_t(\mathbb{Z}_{p^s})| = p^{(s-1)t^2}(p^t - 1)(p^t - p)\cdots(p^t - p^{t-1}).$$

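Calculate $|GL_3(\mathbb{Z}_4)|$ and $|GL_2(\mathbb{Z}_{17})|$. Does $GL_2(\mathbb{Z}_{125})$ have fewer elements than $GL_2(\mathbb{Z}_{128})$? Taking p = s = t = 2, list the 16 matrices in the normal subgroup

$$\ker\delta_2| = \ker\delta_2 + \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}$$

of $GL_2(\mathbb{Z}_4)$.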


(c) Let $\alpha : M_2(\mathbb{Z}_{72}) \cong M_2(\mathbb{Z}_8) \oplus M_2(\mathbb{Z}_9)$ be the generalised Chinese remainder theorem isomorphism. Find A1 and A2 in $M_2(\mathbb{Z}_{72})$ with

$$(A_1)\alpha = \left(\begin{pmatrix}1 & 2\\ 3 & 4\end{pmatrix}, \begin{pmatrix}5 & 6\\ 7 & 8\end{pmatrix}\right), \qquad (A_2)\alpha = \left(\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}, \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\right).$$

Calculate A1 + A2, A1A2 and check

$$(A_1 + A_2)\alpha = (A_1)\alpha + (A_2)\alpha, \qquad (A_1A_2)\alpha = (A_1)\alpha(A_2)\alpha.$$

Let $n = p_1^{s_1}p_2^{s_2}\cdots p_k^{s_k}$ where p1, p2, . . . , pk are distinct primes. Write down a formula for $|GL_t(\mathbb{Z}_n)|$ in terms of t, pi and $q_i = p_i^{s_i}$ for 1 ≤ i ≤ k.

6. (a) Let ρ1, ρ2, . . . , ρs denote the rows of the s × t matrix A over Z where s ≤ t. Show that ρ1, ρ2, . . . , ρs are Z-independent elements of $\mathbb{Z}^t$ if and only if all the invariant factors d1, d2, . . . , ds of A are positive. Use Exercises 1.3, Question 5(c) to show that there is a Z-basis of $\mathbb{Z}^t$ beginning with ρ1, ρ2, . . . , ρs if and only if d1 = d2 = · · · = ds = 1.

(b) Let ρ1, ρ2, . . . , ρs denote the rows of the s × t matrix A over Z where s ≥ t. Let d1, d2, . . . , dt denote the invariant factors of A. Show that ρ1, ρ2, . . . , ρs generate $\mathbb{Z}^t$ if and only if d1 = d2 = · · · = dt = 1.
Hint: Use Corollaries 1.13 and 1.19.
Suppose 〈ρ1, ρ2, . . . , ρs〉 = $\mathbb{Z}^t$ and let ρ′1, ρ′2, . . . , ρ′t denote the rows of $A^T$. Show that a Z-basis of $\mathbb{Z}^t$ can be selected from ρ1, ρ2, . . . , ρs if and only if there are s − t rows of the s × s identity matrix which together with ρ′1, ρ′2, . . . , ρ′t form a Z-basis of $\mathbb{Z}^s$.

Hint: It is possible to select a Z-basis of $\mathbb{Z}^t$ from ρ1, ρ2, . . . , ρs if and only if A has a t-minor equal to ±1.

(c) Test each of the following sets for Z-independence. Which of them is contained in a Z-basis of $\mathbb{Z}^3$?

(i) (1,3,2), (4,6,5), (7,9,8); (ii) (1,3,7), (3,5,9); (iii) (1,2,3), (4,3,5).

Which of the following sets generate $\mathbb{Z}^3$? Which of them contains a Z-basis of $\mathbb{Z}^3$?

(iv) (1,2,3), (3,1,2), (1,1,4), (4,1,1); (v) (1,1,1), (1,2,3), (1,3,4), (1,5,6); (vi) (1,1,1), (1,1,2), (1,3,1), (4,1,1).


7. (a) Let M and M′ be R-modules and let θ : M → M′ be an R-linear mapping. Suppose that M is free with R-basis v1, v2, . . . , vt. Let v′i = (vi)θ for 1 ≤ i ≤ t. Show that v′1, v′2, . . . , v′t generate M′ ⇔ θ is surjective. Show that v′1, v′2, . . . , v′t are R-independent ⇔ θ is injective.
Let M and M′ be free R-modules of rank t and t′ respectively. Show that M and M′ are isomorphic if and only if t = t′.

(b) Let M be a free R-module of rank t. Suppose u1, u2, . . . , ut generate M. Use Lemma 2.18 and Corollary 2.21 to show that u1, u2, . . . , ut form an R-basis of M.

(c) Let M and M′ be R-modules and let θ : M → M′ be an R-linear mapping. Show that ker θ = {v ∈ M : (v)θ = 0} is a submodule of M. Show that im θ = {(v)θ : v ∈ M} is a submodule of M′. Suppose θ is bijective; show that $\theta^{-1}$ is R-linear. Is the inverse of an isomorphism of R-modules itself such an isomorphism?

(d) Let N be a submodule of an R-module M. Complete the proof of Lemma 2.27 that M/N is an R-module.

(e) A non-empty subset N of an R-module M is closed under addition and ru ∈ N for all r ∈ R, u ∈ N. Is N a submodule of M? Justify your answer.
Let N1 and N2 be submodules of an R-module M. Show that N1 + N2 = {u1 + u2 : u1 ∈ N1, u2 ∈ N2} is a submodule of M. Show that N1 ∩ N2 is a submodule of M.

(f) Let M1 and M2 be R-modules. Using Exercises 2.2, Question 4(f), show that the Cartesian product

M1 × M2 = {(v1, v2) : v1 ∈ M1, v2 ∈ M2}

is an R-module (the external direct sum M1 ⊕ M2 of M1 and M2) on defining (v1, v2) + (v′1, v′2) = (v1 + v′1, v2 + v′2) and r(v1, v2) = (rv1, rv2) for all v1, v′1 ∈ M1, v2, v′2 ∈ M2, r ∈ R.
Use (e) above to show that N1 = {(v1, 0) ∈ M1 ⊕ M2} and N2 = {(0, v2) ∈ M1 ⊕ M2} are submodules of M1 ⊕ M2 satisfying N1 + N2 = M1 ⊕ M2, N1 ∩ N2 = {(0,0)}. Show that there are R-linear isomorphisms α1 : N1 ≅ M1 and α2 : N2 ≅ M2.
What is the connection between the external direct sum M1 ⊕ M2 and the internal direct sum N1 ⊕ N2? (The answer is very short!)

3 Decomposition of Finitely Generated Z-Modules

In the first part of this chapter the theory of f.g. abelian groups G is completed: each G gives rise to a unique sequence (d1, d2, . . . , dt′) of non-negative integers, the invariant factor sequence of G, such that $d_i \mid d_{i+1}$ for $1 \le i < t'$, where d1 ≠ 1 and t′ ≥ 0. Further, two f.g. abelian groups are isomorphic if and only if their invariant factor sequences are identical. One could not wish for a more concise conclusion! Notice that 1 cannot be an invariant factor of G and any zero invariant factors occur at the end of the sequence. For example (2,6,6,24,96,96,0,0,0) and (6,6,18,18,18,36,72) are invariant factor sequences.

Nevertheless some questions remain unanswered, for instance: what is the number of isomorphism classes of abelian groups of order n? The answer depends on the prime factorisation of n, as we show in Section 3.2. Indeed, just as n can be resolved into a product of prime powers, so every finite abelian group can be decomposed into a direct sum of groups of prime power order, its primary decomposition.
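The conditions on an invariant factor sequence stated at the start of this chapter can be tested mechanically. The Python sketch below is only an illustration: the names `divides` and `is_invariant_factor_sequence` are hypothetical. It uses the convention that 0 | b holds only when b = 0. Under that convention, divisibility alone forces any zeros to the end of the sequence, and once d1 ≠ 1 no later term can equal 1.

```python
def divides(a, b):
    """Return True if a | b in Z, using the convention that 0 | b only when b == 0."""
    if a == 0:
        return b == 0
    return b % a == 0


def is_invariant_factor_sequence(ds):
    """Check that ds = (d1, ..., dt') consists of non-negative integers with
    d1 != 1 and d_i | d_(i+1) for 1 <= i < t' (the empty sequence is allowed)."""
    if any(d < 0 for d in ds):
        return False
    if len(ds) > 0 and ds[0] == 1:
        return False
    return all(divides(ds[i], ds[i + 1]) for i in range(len(ds) - 1))


print(is_invariant_factor_sequence((2, 6, 6, 24, 96, 96, 0, 0, 0)))  # True
print(is_invariant_factor_sequence((6, 6, 18, 18, 18, 36, 72)))      # True
print(is_invariant_factor_sequence((1, 2)))                          # False: d1 = 1
print(is_invariant_factor_sequence((0, 2)))                          # False: 0 does not divide 2
```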

Two applications of the theory are discussed in Section 3.3. First, the abelian multiplicative groups of units (invertible elements) of finite fields and of the residue class rings $\mathbb{Z}_n$ are analysed. Secondly, the isomorphism types of subgroups and quotient groups of a given f.g. abelian group are determined.



http://dx.doi.org/10.1007/978-1-4471-2730-7_3


3.1 The Invariant Factor Decomposition

Here the theory of matrices over Z is combined with the theory of abelian groups (aka Z-modules). The main ingredients are Theorem 1.11 on Smith normal form and the first isomorphism theorem 2.16. At the centre of attention is the free Z-module of rank t

$$\mathbb{Z}^t = \mathbb{Z} \oplus \mathbb{Z} \oplus \cdots \oplus \mathbb{Z}$$

made up of t copies of the infinite cyclic group Z. The elements of $\mathbb{Z}^t$ are t-tuples (m1, m2, . . . , mt) of integers, that is, row vectors with t integer entries. We saw in Section 2.3 that in some ways the theory of $\mathbb{Z}^t$ is analogous to the theory of t-dimensional vector spaces. Our first job here is to show that the rank of submodules of $\mathbb{Z}^t$ behaves as one might expect. Further, every finitely generated Z-module G is a homomorphic image of $\mathbb{Z}^t$ for some positive integer t, and so G is isomorphic, by Theorem 2.16, to a quotient group $\mathbb{Z}^t/K$ where K is a submodule of $\mathbb{Z}^t$. The matrix theory of Chapter 1 is then applied to lick the concrete group $\mathbb{Z}^t/K$ into a recognisable shape. In particular the minimum number t′ of elements needed to generate G can be read off. The conclusion is: G contains t′ non-trivial cyclic submodules Hi such that G is the internal direct sum

$$G = H_1 \oplus H_2 \oplus \cdots \oplus H_{t'}$$

where Hi is of type $C_{d_i}$ and $d_i \mid d_{i+1}$ for $1 \le i < t'$, $0 \le t' \le t$. Although there are in general many ways of decomposing a given f.g. abelian group G as above, Corollary 3.5, Lemma 3.6 and Theorem 3.7 together prove that the sequence d1, d2, . . . , dt′ is unique.

We first study submodules K of the Z-module $\mathbb{Z}^t$. You should glance back at Theorem 1.15, which shows that every ideal of Z is principal, because this fact is crucial for our next proof.

Theorem 3.1

Let K be a submodule of $\mathbb{Z}^t$. Then K has a Z-basis z1, z2, . . . , zs where 0 ≤ s ≤ t, that is, K is a free Z-module of rank s.

Proof

The proof is by induction on t. First consider t = 1. Submodules K of $\mathbb{Z}^1 = \mathbb{Z}$ are precisely ideals K of Z. By Theorem 1.15 there is d in Z with K = 〈d〉. By convention the empty set ∅ is regarded as being a Z-basis of K = 〈0〉, that is, K = {0} is free of rank s = 0. For K ≠ {0} the single non-zero integer d = z1 is a Z-basis of K and so s = 1.


Now suppose t > 1. To help the induction we regard $\mathbb{Z}^{t-1}$ as being the submodule of $\mathbb{Z}^t$ consisting of t-tuples having last entry zero, that is, $\mathbb{Z}^{t-1} = \{(l_1, l_2, \ldots, l_t) \in \mathbb{Z}^t : l_t = 0\}$. Let K be a submodule of $\mathbb{Z}^t$ and consider $K' = \{l_t : (l_1, l_2, \ldots, l_t) \in K\}$, that is, K′ consists of those integers lt which occur in the last place of t-tuples in K. Then K′ is an ideal of Z (we leave the reader to check this fact). By Theorem 1.15 there is d in Z with K′ = 〈d〉. The intersection $K \cap \mathbb{Z}^{t-1}$ is a submodule of $\mathbb{Z}^{t-1}$ and so by inductive hypothesis $K \cap \mathbb{Z}^{t-1}$ has a Z-basis z1, z2, . . . , zs−1 where s ≤ t. If d = 0 then $K \subseteq \mathbb{Z}^{t-1}$ and $K = K \cap \mathbb{Z}^{t-1}$ has Z-basis as above; so K is free of rank s − 1 < t. If d ≠ 0 then d ∈ K′, that is, there is a t-tuple zs in K having last entry d. We finish the proof by showing that z1, z2, . . . , zs−1, zs is a Z-basis of K.

Let k ∈ K. The last entry in the t-tuple k belongs to K′ and so is qd for some q in Z. Hence k − qzs has last entry zero. As zs belongs to K, so also does k − qzs. Therefore $k - qz_s \in K \cap \mathbb{Z}^{t-1} = \langle z_1, z_2, \ldots, z_{s-1}\rangle$. Hence there are integers l1, l2, . . . , ls−1 such that k = l1z1 + l2z2 + · · · + ls−1zs−1 + qzs. So K = 〈z1, z2, . . . , zs−1, zs〉, that is, z1, z2, . . . , zs−1, zs generate K according to Definition 2.19(i).

We show that z1, z2, . . . , zs−1, zs are Z-independent. Suppose there are integers l1, l2, . . . , ls−1, ls with l1z1 + l2z2 + · · · + ls−1zs−1 + lszs = 0, which is an equality between t-tuples of integers. Comparing last entries gives lsd = 0, as the last entry in each of z1, z2, . . . , zs−1 is zero and zs has last entry d. Since d ≠ 0 we deduce ls = 0. This leaves l1z1 + l2z2 + · · · + ls−1zs−1 = 0. As z1, z2, . . . , zs−1 form a Z-basis of $K \cap \mathbb{Z}^{t-1}$, they are Z-independent by Definition 2.19 and so l1 = l2 = · · · = ls−1 = 0. Hence z1, z2, . . . , zs−1, zs are indeed Z-independent and so they form a Z-basis of K. Therefore rank K = s ≤ t and the induction is complete. □
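The induction in this proof can be turned into a procedure when K is given by a finite list of generators, an assumption the theorem itself does not need. The Python sketch below is our own illustration, and the name `submodule_basis` is hypothetical. It works on the last coordinate first and runs Euclid's algorithm on the generators. Each step replaces one generator by itself minus an integer multiple of another, which leaves the generated submodule unchanged, and the steps continue until exactly one generator has a non-zero last entry. That entry is the d with K′ = 〈d〉, and that generator plays the role of zs. By the same comparison of last entries used in the proof, the remaining generators (all with last entry zero) generate $K \cap \mathbb{Z}^{t-1}$, so the procedure repeats on the previous coordinate.

```python
def submodule_basis(gens, t):
    """Z-basis of the submodule K of Z^t generated by `gens` (tuples of t integers).

    Follows the induction in the proof of Theorem 3.1: handle the last
    coordinate, split off one generator z_s with last entry d (K' = <d>),
    then continue inside Z^(t-1) with the generators whose last entry is 0.
    """
    rows = [list(g) for g in gens if any(g)]       # zero generators contribute nothing
    basis = []
    for col in range(t - 1, -1, -1):               # last coordinate first
        # Euclid's algorithm on column `col`, using only the operations
        # "row r minus q times row pivot", which preserve the generated submodule.
        while True:
            nonzero = [r for r in rows if r[col] != 0]
            if len(nonzero) <= 1:
                break
            pivot = min(nonzero, key=lambda r: abs(r[col]))
            for r in nonzero:
                if r is not pivot:
                    q = r[col] // pivot[col]
                    for j in range(t):
                        r[j] -= q * pivot[j]
        pivots = [r for r in rows if r[col] != 0]
        if pivots:                                 # d != 0: this row plays the role of z_s
            basis.append(tuple(pivots[0]))
            rows = [r for r in rows if r[col] == 0]
        # Every remaining row is now zero in positions col, ..., t-1.
    return basis[::-1]                             # order z_1, ..., z_s as in the proof


print(submodule_basis([(2, 0), (0, 3), (4, 6)], 2))   # [(2, 0), (0, 3)]
```

The basis returned is in echelon form: the kth vector has its last non-zero entry strictly to the left of that of the (k+1)th vector. This is one convenient choice among the many Z-bases of K.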

By Theorem 3.1 every submodule K of the free Z-module $\mathbb{Z}^t$ is itself free, and $K \cong \mathbb{Z}^s$ for 1 ≤ s ≤ t or K = {0} by Exercises 2.3, Question 7(a). So when K is taken out of context its structure is seen to be almost dull. However the interest lies in the relationship between K and its parent $\mathbb{Z}^t$. Suppose K ≠ {0}. Then K has a Z-basis z1, z2, . . . , zs.

Let A denote the s × t matrix over Z having row i equal to zi for 1 ≤ i ≤ s. Regarding A as a matrix over the rational field Q we see that s = rank A, as A has s linearly independent rows (the reader should think through the implication: z1, z2, . . . , zs linearly dependent over Q implies z1, z2, . . . , zs linearly dependent over Z). Now A = (aij) relates the Z-basis z1, z2, . . . , zs of K to the standard Z-basis e1, e2, . . . , et of $\mathbb{Z}^t$ by the s row equations

$$z_i = \sum_{j=1}^{t} a_{ij}e_j \quad \text{for } 1 \le i \le s.$$

How does A change when the Z-bases of K and $\mathbb{Z}^t$ are changed? The answer should come as a pleasant surprise! Let z′1, z′2, . . . , z′s be a Z-basis of K. Form the s × t matrix A′ = (a′ij) having z′i as row i. By Corollary 2.21 there is an invertible s × s matrix P = (pij) over Z such that $z'_i = \sum_{j=1}^{s} p_{ij}z_j$ for 1 ≤ i ≤ s, that is, A′ = PA. Let ρ1, ρ2, . . . , ρt be a Z-basis of $\mathbb{Z}^t$. By Corollary 2.23 the t × t matrix Q with ρj as row j (1 ≤ j ≤ t) is invertible over Z. Let B = (bij) be the s × t matrix relating the Z-basis z′1, z′2, . . . , z′s of K to the Z-basis ρ1, ρ2, . . . , ρt of $\mathbb{Z}^t$, that is,

$$z'_i = \sum_{j=1}^{t} b_{ij}\rho_j \quad \text{for } 1 \le i \le s.$$

The above s row equations amount to the single matrix equation A′ = BQ. Therefore PA = BQ, showing that

A changes to the equivalent matrix $B = PAQ^{-1}$

on changing the Z-bases of K and $\mathbb{Z}^t$. We have already seen the above matrix equation in Definition 1.5! The matrices P and Q, both invertible over Z, are at our disposal: we can choose them to suit our purpose. Using Theorem 1.11 we choose P and Q such that $PAQ^{-1} = D = \operatorname{diag}(d_1, d_2, \ldots, d_s)$ is in Smith normal form, because this choice will reveal the structure of the quotient group $\mathbb{Z}^t/K$. Notice that the di are non-zero as rank A = s. In fact B = D gives z′i = diρi for 1 ≤ i ≤ s, showing that K has Z-basis d1ρ1, d2ρ2, . . . , dsρs made up of integer multiples of the first s elements of the Z-basis ρ1, ρ2, . . . , ρt of $\mathbb{Z}^t$.
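In the text, P and Q are found by elementary operations as in Theorem 1.11. When only the diagonal entries d1, . . . , ds are wanted, a different standard technique gives an independent check. For each k, the product d1d2 · · · dk equals the gcd Δk of all k × k minors of A (the determinantal divisors), so $d_k = \Delta_k/\Delta_{k-1}$ with Δ0 = 1. The Python sketch below relies on this fact and not on the book's reduction. The names `det` and `invariant_factors` are our own, and because the method examines every minor it suits only small matrices such as those in this chapter.

```python
from itertools import combinations
from math import gcd


def det(m):
    """Determinant of a small square integer matrix (Laplace expansion along row 0)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def invariant_factors(a):
    """Non-zero diagonal entries d1, ..., dr of the Smith normal form of the
    integer matrix `a`, via d_k = Delta_k / Delta_(k-1), where Delta_k is the
    gcd of all k x k minors (Delta_0 = 1)."""
    s, t = len(a), len(a[0])
    ds, prev = [], 1
    for k in range(1, min(s, t) + 1):
        delta = 0
        for rows in combinations(range(s), k):
            for cols in combinations(range(t), k):
                delta = gcd(delta, det([[a[i][j] for j in cols] for i in rows]))
        if delta == 0:            # every k x k minor vanishes, so rank(a) = k - 1
            break
        ds.append(delta // prev)
        prev = delta
    return ds


A = [[2, 4, 4], [-6, 6, 12]]      # a hypothetical 2 x 3 example
print(invariant_factors(A))       # [2, 6]
```

For the 3 × 3 matrix A of Example 3.2 below, the function returns [1, 2, 26], which agrees with the matrix D = diag(1,2,26) found there by elementary operations.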

Let G be a finitely generated Z-module. Then G contains a finite number t of elements g1, g2, . . . , gt such that G = 〈g1, g2, . . . , gt〉, that is, every element g of G is expressible as g = m1g1 + m2g2 + · · · + mtgt where m1, m2, . . . , mt are integers. The next step should come almost as second nature to the reader: consider the Z-linear mapping

$$\theta : \mathbb{Z}^t \to G \quad \text{defined by} \quad (m_1, m_2, \ldots, m_t)\theta = m_1g_1 + m_2g_2 + \cdots + m_tg_t$$

for all integers m1, m2, . . . , mt. As we have seen already in special cases, the homomorphism θ is the key which unlocks the structure of G. Notice that (ej)θ = gj, showing that θ maps the jth element ej of the standard Z-basis of $\mathbb{Z}^t$ to the jth element gj of the given generators of G for 1 ≤ j ≤ t. Also θ is surjective as g = m1g1 + m2g2 + · · · + mtgt can be expressed g = (m1, m2, . . . , mt)θ, that is, G = im θ. Let K = ker θ and then

$$\bar{\theta} : \mathbb{Z}^t/K \cong G \quad \text{where} \quad (K + z)\bar{\theta} = (z)\theta \ \text{ for all } z = (m_1, m_2, \ldots, m_t) \in \mathbb{Z}^t$$

by Theorem 2.16. The concrete quotient group $\mathbb{Z}^t/K$ is therefore an isomorphic replica of the abstract f.g. abelian group G.

Before tackling the decomposition in general, we study two particular examples.


Example 3.2

Let G be a Z-module generated by g1, g2, g3 subject to the relations

3g1 + 5g2 + 3g3 = 0, 3g1 + 3g2 + 5g3 = 0, 7g1 + 3g2 + 7g3 = 0.

This description is called a presentation of G, meaning that the abstract group G is the homomorphic image of a concrete group, the kernel of the homomorphism being specified by a set of generators. In our case G = 〈g1, g2, g3〉 = im θ where θ : Z³ → G is given by (m1, m2, m3)θ = m1g1 + m2g2 + m3g3. Write z1 = (3,5,3), z2 = (3,3,5), z3 = (7,3,7). Then the above equations can be expressed as (z1)θ = 0, (z2)θ = 0, (z3)θ = 0, and so z1, z2, z3 belong to ker θ. The above phrase ‘subject to the relations . . . ’ means, by convention, that all relations between g1, g2, g3 are Z-linear combinations of the three given relations. In other words K = ker θ is generated by z1, z2, z3, that is, K = 〈z1, z2, z3〉, and so the rows of

$$A = \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix} = \begin{pmatrix} 3 & 5 & 3 \\ 3 & 3 & 5 \\ 7 & 3 & 7 \end{pmatrix}$$

generate K. In this case z1, z2, z3 is a Z-basis of K, as will become clear shortly. Using the method of Chapter 1, the sequence of elementary operations

c2 − c1, c1 − c2, c2 − 2c1, c3 − 3c1, r2 − 3r1, r3 − 11r1, −r2, −r3, c2 − c3, c3 − 2c2

reduces A to D = diag(1,2,26). So there are invertible 3 × 3 matrices P and Q over Z satisfying PA = DQ. The rows of the diagonal matrix D are non-zero and so these rows are Z-independent. Hence the rows of A = P⁻¹DQ are Z-independent also, that is, z1, z2, z3 is a Z-basis of K. Now D = PAQ⁻¹, and P and Q can be found as in Chapter 1. Applying, in order, the eros in the above sequence, that is, r2 − 3r1, r3 − 11r1, −r2, −r3, to the 3 × 3 identity matrix I produces

$$P = \begin{pmatrix} 1 & 0 & 0 \\ 3 & -1 & 0 \\ 11 & 0 & -1 \end{pmatrix}.$$

Applying, in order, the eros r1 + r2, r2 + r1, r1 + 2r2, r1 + 3r3, r3 + r2, r2 + 2r3, that is, the conjugates of the ecos in the above sequence, to the 3 × 3 identity matrix I produces the important matrix

$$Q = \begin{pmatrix} \rho_1 \\ \rho_2 \\ \rho_3 \end{pmatrix} = \begin{pmatrix} 3 & 5 & 3 \\ 3 & 6 & 2 \\ 1 & 2 & 1 \end{pmatrix}.$$
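The arithmetic of Example 3.2 can be checked by replaying the operations listed above mechanically. The Python sketch below is only a check, not the text's method. The helpers add_row, add_col, neg_row, matmul and identity are hypothetical names, and rows and columns are numbered from 1 to match the notation ri, cj. The sketch does three things:

- applies the ecos and eros to A to obtain D;
- applies the eros alone to I to obtain P;
- applies the conjugate eros to I to obtain Q.

It then compares PA with DQ.

```python
def add_row(M, i, j, a):
    """ero r_i + a*r_j (rows numbered from 1, as in the text)."""
    M[i - 1] = [x + a * y for x, y in zip(M[i - 1], M[j - 1])]

def add_col(M, i, j, a):
    """eco c_i + a*c_j (columns numbered from 1)."""
    for row in M:
        row[i - 1] += a * row[j - 1]

def neg_row(M, i):
    """ero -r_i."""
    M[i - 1] = [-x for x in M[i - 1]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

A = [[3, 5, 3], [3, 3, 5], [7, 3, 7]]
ECOS_BEFORE = [(2, 1, -1), (1, 2, -1), (2, 1, -2), (3, 1, -3)]  # c2-c1, c1-c2, c2-2c1, c3-3c1
ECOS_AFTER = [(2, 3, -1), (3, 2, -2)]                           # c2-c3, c3-2c2

def apply_eros(M):
    """The eros of the sequence: r2-3r1, r3-11r1, -r2, -r3."""
    add_row(M, 2, 1, -3)
    add_row(M, 3, 1, -11)
    neg_row(M, 2)
    neg_row(M, 3)

D = [row[:] for row in A]
for i, j, a in ECOS_BEFORE:
    add_col(D, i, j, a)
apply_eros(D)
for i, j, a in ECOS_AFTER:
    add_col(D, i, j, a)

P = identity(3)
apply_eros(P)

# Conjugates of the ecos, applied in the same order as eros to I:
# r1+r2, r2+r1, r1+2r2, r1+3r3, r3+r2, r2+2r3
Q = identity(3)
for i, j, a in [(1, 2, 1), (2, 1, 1), (1, 2, 2), (1, 3, 3), (3, 2, 1), (2, 3, 2)]:
    add_row(Q, i, j, a)

print(D)                              # [[1, 0, 0], [0, 2, 0], [0, 0, 26]]
print(P)                              # [[1, 0, 0], [3, -1, 0], [11, 0, -1]]
print(Q)                              # [[3, 5, 3], [3, 6, 2], [1, 2, 1]]
print(matmul(P, A) == matmul(D, Q))   # True
```

Row and column operations commute, so the order in which the two kinds are interleaved does not affect D.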


So the rows ρ1 = (3,5,3), ρ2 = (3,6,2), ρ3 = (1,2,1) of Q form a Z-basis of Z³. As the rows of A form a Z-basis of K, the rows of PA also form a Z-basis of K, that is, the rows ρ1, 2ρ2, 26ρ3 of DQ form a Z-basis of K. Hence K is free of rank 3. So we've arrived at the desirable situation where Z³ = 〈ρ1, ρ2, ρ3〉 and K = 〈ρ1, 2ρ2, 26ρ3〉, the Z-basis of K consisting of integer multiples of the Z-basis of Z³. At this point we abandon the standard Z-basis e1, e2, e3 of Z³ in favour of ρ1, ρ2, ρ3, which is tailor-made for the analysis of G.

Consider the cyclic subgroups 〈(ρ1)θ〉, 〈(ρ2)θ〉, 〈(ρ3)θ〉 of G. As ρ1 ∈ K = ker θ we see 〈(ρ1)θ〉 = 〈0〉, and this trivial group is promptly discarded. Write H1 = 〈(ρ2)θ〉 and H2 = 〈(ρ3)θ〉. We show next that G = H1 ⊕ H2 (internal direct sum) and that H1 and H2 are cyclic of orders 2 and 26 respectively.

As θ is surjective, for each g ∈ G there is z ∈ Z³ with g = (z)θ. As ρ1, ρ2, ρ3 form a Z-basis of Z³, there are integers l1, l2, l3 with z = l1ρ1 + l2ρ2 + l3ρ3. As θ is Z-linear, g = (l1ρ1 + l2ρ2 + l3ρ3)θ = l1(ρ1)θ + l2(ρ2)θ + l3(ρ3)θ ∈ H1 + H2 since (ρ1)θ = 0. Therefore G = H1 + H2.

We now show that H1 and H2 are independent subgroups of G. Suppose h1 + h2 = 0 where h1 ∈ H1, h2 ∈ H2. There are l2, l3 ∈ Z with h1 = l2(ρ2)θ and h2 = l3(ρ3)θ. So l2(ρ2)θ + l3(ρ3)θ = 0. As θ is Z-linear we deduce (l2ρ2 + l3ρ3)θ = 0, which gives l2ρ2 + l3ρ3 ∈ ker θ = K. As K = 〈ρ1, 2ρ2, 26ρ3〉 there are integers l′1, l′2, l′3 with l2ρ2 + l3ρ3 = l′1ρ1 + 2l′2ρ2 + 26l′3ρ3. Now ρ1, ρ2, ρ3 are Z-independent and so ‘comparing coefficients’ of ρ1, ρ2, ρ3, as explained after Definition 2.19, gives

0 = l′1, l2 = 2l′2, l3 = 26l′3. (♣)

Two consequences follow from (♣). First

h1 = l2(ρ2)θ = 2l′2(ρ2)θ = l′2(2ρ2)θ = 0,

h2 = l3(ρ3)θ = 26l′3(ρ3)θ = l′3(26ρ3)θ = 0

since 2ρ2, 26ρ3 ∈ K = ker θ. Therefore h1 = h2 = 0 and so

G = 〈(ρ2)θ〉 ⊕ 〈(ρ3)θ〉 = H1 ⊕ H2

by Lemma 2.15, as we set out to prove. Secondly, the orders of the generators (ρi)θ for i = 2, 3 can be deduced directly from (♣). Suppose l2(ρ2)θ = 0 for l2 ∈ Z. Taking l3 = 0 we obtain the equation l2(ρ2)θ + l3(ρ3)θ = 0 and so conclude l2 = 2l′2 from (♣). Hence (ρ2)θ has order 2, since (ρ2)θ ≠ 0 but 2(ρ2)θ = (2ρ2)θ = 0. In the same way suppose l3(ρ3)θ = 0 for l3 ∈ Z. Setting l2 = 0 we again obtain l2(ρ2)θ + l3(ρ3)θ = 0. From (♣) we obtain l3 = 26l′3. So r(ρ3)θ ≠ 0 for 1 ≤ r < 26, and as 26(ρ3)θ = (26ρ3)θ = 0 the order of (ρ3)θ is 26. We've shown that H1 = 〈(ρ2)θ〉 and H2 = 〈(ρ3)θ〉 are cyclic of isomorphism types C2 and C26 respectively, and so G has isomorphism type C2 ⊕ C26. The invariant factor sequence of G is (2,26) by Definition 3.8. Finally rows 2 and 3 of Q express the generators of H1 and H2 as integer linear combinations of the original generators g1, g2, g3 of G, that is,

(ρ2)θ = (3,6,2)θ = 3g1 + 6g2 + 2g3,

(ρ3)θ = (1,2,1)θ = g1 + 2g2 + g3

and so

G = H1 ⊕ H2 = 〈3g1 + 6g2 + 2g3〉 ⊕ 〈g1 + 2g2 + g3〉.
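The orders 2 and 26 can also be cross-checked numerically. The rows of A form a Z-basis of K, and A is invertible over Q. So an integer vector v lies in K exactly when vA⁻¹ has integer entries. It follows that the order of (v)θ in G ≅ Z³/K is the least common multiple of the denominators of vA⁻¹.

The sketch below assumes A is square and invertible, so that G is finite. solve_left and order_in_quotient are hypothetical helper names, and exact rational arithmetic comes from Python's fractions module.

```python
from fractions import Fraction
from math import lcm

A = [[3, 5, 3], [3, 3, 5], [7, 3, 7]]   # relation matrix of Example 3.2

def solve_left(A, v):
    """Return the rational vector x with x·A = v (A square and invertible over Q)."""
    n = len(A)
    # Row j of the augmented matrix encodes sum_i x_i * A[i][j] = v[j].
    M = [[Fraction(A[i][j]) for i in range(n)] + [Fraction(v[j])] for j in range(n)]
    for c in range(n):                      # Gauss-Jordan elimination, exact arithmetic
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def order_in_quotient(v, A):
    """Order of K + v in Z^n/K, where K is spanned by the rows of A; 1 means v is in K."""
    return lcm(*(x.denominator for x in solve_left(A, v)))

for name, rho in [("rho1", (3, 5, 3)), ("rho2", (3, 6, 2)), ("rho3", (1, 2, 1))]:
    print(name, order_in_quotient(rho, A))   # rho1 1, rho2 2, rho3 26
```

For example ρ2 = (3/2)z1 − (1/2)z2, so the denominators of ρ2A⁻¹ are 2, 2, 1 and the order of (ρ2)θ is 2.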

Example 3.3

Let G = 〈g1, g2, g3〉 where

2g1 + 4g2 + 6g3 = 0,

8g1 + 10g2 + 12g3 = 0,

14g1 + 16g2 + 18g3 = 0.

This presentation means, first, that G = im θ where θ : Z³ → G is the Z-linear mapping with (ei)θ = gi for i = 1, 2, 3. Secondly K = ker θ is the subgroup of Z³ generated by the rows of

$$A = \begin{pmatrix} 2 & 4 & 6 \\ 8 & 10 & 12 \\ 14 & 16 & 18 \end{pmatrix}$$

and so G is isomorphic to Z³/K; in fact θ̃ : Z³/K ≅ G by Theorem 2.16. We know from Example 1.2 that A has Smith normal form D = diag(2,6,0) and that there are invertible 3 × 3 matrices P and Q over Z with PA = DQ. The zero entry in the diagonal of D shows that the rows of A = P⁻¹DQ are Z-linearly dependent, and so in this case the rows of A do not form a Z-basis of K. The rows of

$$Q = \begin{pmatrix} \rho_1 \\ \rho_2 \\ \rho_3 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}$$

form a Z-basis ρ1, ρ2, ρ3 of Z³, and the non-zero rows 2ρ1, 6ρ2 of PA = DQ generate K and are Z-independent, that is, K has Z-basis 2ρ1, 6ρ2. So in this case K is free of rank 2. As in Example 3.2, and in all cases as we'll see in Theorem 3.4, the rows of Q provide us with a suitable Z-basis of Zᵗ for decomposing Zᵗ/K ≅ G. This is because the non-zero rows of PA = DQ form a Z-basis of K consisting of integer multiples of the rows of Q. What is more, these integer multiples are the non-zero diagonal entries in D and so divide each other successively. Let Hi denote the cyclic submodule of G generated by (ρi)θ, that is, Hi = 〈(ρi)θ〉 for i = 1, 2, 3. As in Example 3.2 we obtain the internal direct sum decomposition:

G = H1 ⊕ H2 ⊕ H3 = 〈(ρ1)θ〉 ⊕ 〈(ρ2)θ〉 ⊕ 〈(ρ3)θ〉 = 〈g1 + 2g2 + 3g3〉 ⊕ 〈g2 + 2g3〉 ⊕ 〈g3〉.

Comparing coefficients of ρ1, ρ2, ρ3 as in Example 3.2, we see that the order ideals of (ρ1)θ, (ρ2)θ, (ρ3)θ are 〈2〉, 〈6〉, 〈0〉 respectively. So H1, H2, H3 are of isomorphism types C2, C6, C0, being isomorphic to Z/〈2〉, Z/〈6〉, Z/〈0〉 respectively. Finally G has isomorphism type C2 ⊕ C6 ⊕ C0 and so has (as we will see in Definition 3.8) invariant factor sequence (2,6,0).
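For Example 3.3 the claim that K has Z-basis 2ρ1, 6ρ2 can be checked row by row. Since this particular Q is upper unitriangular, each row of A can be written in the Z-basis ρ1, ρ2, ρ3 by successive subtraction. The resulting coefficients should then have the form (multiple of 2, multiple of 6, 0). The helper coords_unitriangular is a hypothetical name, and the shortcut relies on Q being unitriangular; a general Q would need a genuine linear solve.

```python
A = [[2, 4, 6], [8, 10, 12], [14, 16, 18]]   # relation matrix of Example 3.3
Q = [[1, 2, 3], [0, 1, 2], [0, 0, 1]]        # rows rho1, rho2, rho3

def coords_unitriangular(row, Q):
    """Integer coefficients c with c[0]*Q[0] + c[1]*Q[1] + ... == row,
    valid when Q is upper unitriangular (1s on the diagonal, 0s below)."""
    rest = list(row)
    coeffs = []
    for i, q in enumerate(Q):
        c = rest[i]              # earlier rows already subtracted; later rows have 0 here
        coeffs.append(c)
        rest = [a - c * b for a, b in zip(rest, q)]
    assert all(a == 0 for a in rest)   # the row really lies in the span
    return coeffs

for row in A:
    c = coords_unitriangular(row, Q)
    print(row, c, c[0] % 2 == 0, c[1] % 6 == 0, c[2] == 0)
```

The coefficient rows come out as (2, 0, 0), (8, −6, 0) and (14, −12, 0). So every relation is an integer combination of 2ρ1 and 6ρ2, and ρ3 never occurs.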

We are ready for the first important theorem of this section.

Theorem 3.4 (The invariant factor decomposition of f.g. Z-modules)

Let G be a finitely generated Z-module. Then G contains t′ ≥ 0 non-trivial cyclic submodules Hi (1 ≤ i ≤ t′) such that G = H1 ⊕ H2 ⊕ · · · ⊕ Ht′ is their internal direct sum, where Hi is of isomorphism type Cdi with di | di+1 for 1 ≤ i < t′. Therefore G is of isomorphism type Cd1 ⊕ Cd2 ⊕ · · · ⊕ Cdt′.

Proof

Let g1, g2, . . . , gt generate G and let θ : Zᵗ → G be the surjective Z-linear mapping defined, as usual, by (m1, m2, . . . , mt)θ = m1g1 + m2g2 + · · · + mtgt for all integers m1, m2, . . . , mt. So G = 〈g1, g2, . . . , gt〉 = im θ. What about K = ker θ? Suppose first K = {0}. Then θ : Zᵗ ≅ G, and so the decomposition Zᵗ = 〈e1〉 ⊕ 〈e2〉 ⊕ · · · ⊕ 〈et〉 gives G = H1 ⊕ H2 ⊕ · · · ⊕ Ht where Hi = 〈gi〉 = 〈(ei)θ〉 is of isomorphism type C0 for 1 ≤ i ≤ t. In this case G is free of rank t, di = 0 for 1 ≤ i ≤ t and t′ = t.

Suppose now K ≠ {0}. From Theorem 3.1 we know K has a Z-basis z1, z2, . . . , zs where 1 ≤ s ≤ t. Let A denote the s × t matrix over Z having zi as row i, and so eiA = zi (1 ≤ i ≤ s). The invariant factors of A consist of 1 (r times) followed by di for 1 ≤ i ≤ s − r. Let s′ = s − r (we allow r = 0 but insist that d1 ≠ 1). By Corollary 1.13 there are invertible matrices P and Q over Z such that PA = DQ where D = diag(1, 1, . . . , 1, d1, d2, . . . , ds′) is in Smith normal form. As rank D = rank A = s, the integers d1, d2, . . . , ds′ are non-zero. By Corollary 2.23 the rows ρ1, ρ2, . . . , ρt of Q form a Z-basis of Zᵗ. As the rows of A form a Z-basis of K, we see from Corollary 2.21 that the rows of PA also form a Z-basis of K. Therefore the rows of DQ, that is,

ρ1, . . . , ρr, d1ρr+1, . . . , ds′ρs, form a Z-basis of K.


As ρ1, ρ2, . . . , ρt generate Zᵗ and θ is surjective, it follows that (ρ1)θ, (ρ2)θ, . . . , (ρt)θ generate im θ = G. But (ρ1)θ = (ρ2)θ = · · · = (ρr)θ = 0 since ρj ∈ ker θ for 1 ≤ j ≤ r. Discarding these r zero generators we are left with (ρr+1)θ, (ρr+2)θ, . . . , (ρt)θ, which generate G. Let t′ = t − r and let H1 = 〈(ρr+1)θ〉, H2 = 〈(ρr+2)θ〉, . . . , Ht′ = 〈(ρt)θ〉. So Hi is a cyclic submodule of G for 1 ≤ i ≤ t′ and G = H1 + H2 + · · · + Ht′. We now use Lemma 2.15 to show that G is the internal direct sum of H1, H2, . . . , Ht′. At the same time we will discover the isomorphism type of each Hi.

Suppose h1 + h2 + · · · + ht′ = 0 where hi ∈ Hi for 1 ≤ i ≤ t′. For each such i there is an integer lr+i with hi = lr+i(ρr+i)θ, as Hi is cyclic with generator (ρr+i)θ. Hence lr+1(ρr+1)θ + lr+2(ρr+2)θ + · · · + lt(ρt)θ = 0 on substituting for each hi. Since θ is Z-linear we deduce (lr+1ρr+1 + lr+2ρr+2 + · · · + ltρt)θ = 0, that is, lr+1ρr+1 + lr+2ρr+2 + · · · + ltρt ∈ ker θ = K. As ρ1, . . . , ρr, d1ρr+1, . . . , ds′ρs is a Z-basis of K, there are unique integers l′1, l′2, . . . , l′s satisfying

lr+1ρr+1 + lr+2ρr+2 + · · · + ltρt = l′1ρ1 + · · · + l′rρr + l′r+1d1ρr+1 + · · · + l′sds′ρs. (♦)

As ρ1, ρ2, . . . , ρt are Z-independent, we can legitimately ‘compare coefficients’ in equation (♦).

- The first r elements ρ1, ρ2, . . . , ρr do not appear on the left-hand side of (♦), and so l′1 = l′2 = · · · = l′r = 0.
- The next s′ elements ρr+1, ρr+2, . . . , ρs occur on both sides of (♦), and so lr+i = l′r+idi for 1 ≤ i ≤ s′. Hence hi = lr+i(ρr+i)θ = l′r+idi(ρr+i)θ = l′r+i(diρr+i)θ = l′r+i × 0 = 0, as diρr+i ∈ ker θ for 1 ≤ i ≤ s′.
- The last t − s elements ρs+1, ρs+2, . . . , ρt do not appear on the right-hand side of (♦), and so ls+1 = ls+2 = · · · = lt = 0. Hence hi = lr+i(ρr+i)θ = 0(ρr+i)θ = 0 for s′ < i ≤ t′.

Therefore hi = 0 for 1 ≤ i ≤ t′, showing that H1, H2, . . . , Ht′ are independent submodules of G. By Lemma 2.15

G = H1 ⊕ H2 ⊕ · · · ⊕ Ht ′ .

Every f.g. abelian group is an internal direct sum of cyclic subgroups.

This is an important part of Theorem 3.4, but there is still more to prove! What are the isomorphism types (Definition 2.6) of the above subgroups Hi? Suppose first that 1 ≤ i ≤ s′ and let lr+i belong to the order ideal of the generator (ρr+i)θ of Hi. Then lr+i(ρr+i)θ = 0 and hence (lr+iρr+i)θ = 0, that is, lr+iρr+i ∈ ker θ. Equation (♦) holds with lr+j = 0 for 1 ≤ j ≤ t′, j ≠ i. From (♦) we deduce di | lr+i, that is, lr+i is an integer multiple of di. On the other hand di(ρr+i)θ = (diρr+i)θ = 0, as diρr+i belongs to K (it is present in a Z-basis of K). We conclude that 〈di〉 is the order ideal of (ρr+i)θ, and so Hi = 〈(ρr+i)θ〉 has isomorphism type Cdi for 1 ≤ i ≤ s′.

Secondly suppose s′ < i ≤ t′ and let lr+i belong to the order ideal of the generator (ρr+i)θ of Hi. Then lr+i(ρr+i)θ = 0 and hence (lr+iρr+i)θ = 0, that is, lr+iρr+i ∈ ker θ as before. Also equation (♦) holds with lr+j = 0 for 1 ≤ j ≤ t′, j ≠ i. The conclusion is lr+i = 0 in this case, and 〈0〉 is the order ideal of (ρr+i)θ. So Hi = 〈(ρr+i)θ〉 has isomorphism type C0 for s′ < i ≤ t′ and

the number of infinite cyclic Hi is t ′ − s′ = t − s = rankZt − rankK.

Let us define di = 0 for s′ < i ≤ t′. Then the two preceding paragraphs show that Hi is of isomorphism type Cdi for 1 ≤ i ≤ t′. As D = diag(1, 1, . . . , 1, d1, d2, . . . , ds′) is in Smith normal form, we know di | di+1 for 1 ≤ i < s′. Therefore di | di+1 for 1 ≤ i < t′, as ds′ | 0 and 0 | 0. So G is of isomorphism type Cd1 ⊕ Cd2 ⊕ · · · ⊕ Cdt′ by Definition 2.13, and the proof is complete. □
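The proof of Theorem 3.4 is constructive, so it translates into a short procedure:

1. Reduce the relation matrix to Smith normal form.
2. Discard the diagonal entries equal to 1 and keep the remaining non-zero entries.
3. Append one zero for each of the t − rank A infinite cyclic summands.

The Python sketch below assumes the relations are given as rows of t integer coefficients. It uses a common pivoting strategy: take a pivot of least absolute value, then repair divisibility so that dk | dk+1. This need not reproduce the particular sequence of operations chosen in Chapter 1. The names smith_diagonal and invariant_factors are hypothetical.

```python
def smith_diagonal(A):
    """Diagonal entries of the Smith normal form of the integer matrix A
    (a non-empty list of equal-length rows), via elementary row/column operations."""
    M = [list(row) for row in A]
    m, n = len(M), len(M[0])
    for k in range(min(m, n)):
        while True:
            # Non-zero entries in rows k.. and columns k.. (the trailing block).
            nonzero = [(abs(M[i][j]), i, j) for i in range(k, m)
                       for j in range(k, n) if M[i][j] != 0]
            if not nonzero:                      # the rest of the diagonal is zero
                return [abs(M[i][i]) for i in range(min(m, n))]
            _, pi, pj = min(nonzero)             # pivot of least absolute value
            M[k], M[pi] = M[pi], M[k]            # move it to position (k, k)
            for row in M:
                row[k], row[pj] = row[pj], row[k]
            p = M[k][k]
            clean = True
            for i in range(k + 1, m):            # r_i - q r_k leaves a remainder mod p
                q = M[i][k] // p
                M[i] = [a - q * b for a, b in zip(M[i], M[k])]
                clean = clean and M[i][k] == 0
            for j in range(k + 1, n):            # c_j - q c_k, likewise
                q = M[k][j] // p
                for row in M:
                    row[j] -= q * row[k]
                clean = clean and M[k][j] == 0
            if not clean:
                continue                         # a smaller non-zero remainder exists
            # Force p to divide the trailing block, so that d_k | d_(k+1).
            bad = [i for i in range(k + 1, m) for j in range(k + 1, n) if M[i][j] % p]
            if not bad:
                break
            M[k] = [a + b for a, b in zip(M[k], M[bad[0]])]   # r_k + r_i, then retry
    return [abs(M[i][i]) for i in range(min(m, n))]


def invariant_factors(relations, t):
    """Invariant factor sequence of the Z-module with generators g_1..g_t subject to
    the given relations (each a list of t integer coefficients)."""
    diag = smith_diagonal(relations) if relations else []
    rank = sum(1 for d in diag if d != 0)
    # Drop the 1s, keep the non-zero d_i, then one 0 per infinite cyclic summand.
    return [d for d in diag if d > 1] + [0] * (t - rank)


print(invariant_factors([[3, 5, 3], [3, 3, 5], [7, 3, 7]], 3))        # Example 3.2
print(invariant_factors([[2, 4, 6], [8, 10, 12], [14, 16, 18]], 3))   # Example 3.3
```

Applied to the relation matrices of Examples 3.2 and 3.3, it returns [2, 26] and [2, 6, 0], matching the invariant factor sequences found there.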

The decomposition Theorem 3.4 is a milestone, and you know that it has taken some effort to get this far. However it is an existence theorem (every finitely generated Z-module G can be decomposed in a certain way), and as such it raises a number of related questions. We mentioned earlier that the cyclic submodules Hi appearing in the decomposition Theorem 3.4 are not unique. Are the non-negative integers di, which specify the isomorphism types of the Hi, unique? Our immediate aim is to convince you that the answer here is: Yes! As a consequence the additive abelian groups

Z2 ⊕Z6 ⊕Z24 ⊕Z48 and Z2 ⊕Z12 ⊕Z12 ⊕Z48

are not isomorphic, although they have many properties in common: both have order 2⁹ × 3³, both have exponent 48 (both contain an element of order 48 but no element of higher order), both are generated by 4 of their elements, and both have 15 elements of order 2 and 26 elements of order 3.
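The shared properties listed above are easy to confirm by brute force, and the same count exposes a difference: the number of elements of order 4. An isomorphism preserves the orders of elements, so this already shows that the two groups are not isomorphic. The text's own route is via invariant factors. The sketch enumerates all 13824 elements of each group; order_counts is a hypothetical helper name.

```python
from collections import Counter
from itertools import product
from math import gcd, lcm

def order_counts(ds):
    """Map k -> number of elements of order k in Z_{d_1} + ... + Z_{d_n} (all d_i > 0),
    by brute-force enumeration of all tuples."""
    counts = Counter()
    for x in product(*(range(d) for d in ds)):
        # x_i in Z_d has order d / gcd(x_i, d); the order of a tuple is the lcm of these
        counts[lcm(*(d // gcd(xi, d) for xi, d in zip(x, ds)))] += 1
    return counts

G1 = order_counts([2, 6, 24, 48])
G2 = order_counts([2, 12, 12, 48])
for k in (1, 2, 3, 4):
    print(k, G1[k], G2[k])
print(sum(G1.values()), max(G1), max(G2))
```

Both groups have 15 elements of order 2 and 26 of order 3. However, the first has 48 elements of order 4 and the second has 112.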

The proof of the uniqueness of the integers di lies in the study of invariants of Z-modules, that is, those properties of Z-modules which are preserved by isomorphisms. Anticipating Definition 3.8, the sequence (d1, d2, . . . , dt′) as in Theorem 3.4 is uniquely determined by the f.g. Z-module G and is called the sequence of invariant factors of G. From the abstract point of view this tells us all there is to know about finitely generated Z-modules: two such modules are isomorphic if and only if their invariant factor sequences are identical. Note that the integer 1 never occurs as an invariant factor. Since d1 ≠ 1, either d1 = 0 or d1 > 1; as d1 | di we see di ≠ 1 for 1 ≤ i ≤ t′. The empty set ∅ is the invariant factor sequence of every trivial Z-module, as t′ = 0 in this case. We now present the details leading up to Definition 3.8.

Let G be a Z-module. An element g of G has finite order if there is a non-zero integer l with lg = 0. From l1g1 = 0, l2g2 = 0, l1 ≠ 0, l2 ≠ 0 we deduce l1l2(g1 + g2) = l2l1g1 + l1l2g2 = 0 + 0 = 0. This shows that the sum g1 + g2 of elements g1, g2 ∈ G of finite order is itself an element of finite order, since l1l2 ≠ 0. Let T denote the set of all elements g in G having finite order. It is routine to check that


T is a submodule of G. It is customary to call T the torsion subgroup or torsion submodule of G. For example the torsion subgroup T of G = Z2 ⊕ Z6 ⊕ Z consists of the 12 triples (r, s, 0) where r ∈ Z2, s ∈ Z6, and so T ≅ Z2 ⊕ Z6.

When the f.g. Z-module G is decomposed as in Theorem 3.4, it is a relatively simple matter to locate its torsion submodule T, as we show next. What is more, we'll see that the quotient G/T is free and its rank is a useful invariant.

Corollary 3.5

Let G = H1 ⊕ H2 ⊕ · · · ⊕ Ht′ be an internal direct sum decomposition as in Theorem 3.4, and so Hi is of isomorphism type Cdi for 1 ≤ i ≤ t′ and di | di+1 for 1 ≤ i < t′. Let s′ be the integer in the range 0 ≤ s′ ≤ t′ such that di > 1 for 1 ≤ i ≤ s′ and di = 0 for s′ < i ≤ t′. Then G has torsion subgroup T = H1 ⊕ H2 ⊕ · · · ⊕ Hs′ and |T| = d1d2 · · · ds′. The quotient module G/T is free of rank t′ − s′.

Proof

Each element g in G can be expressed as g = h1 + h2 + · · · + ht′ where hi ∈ Hi. By the uniqueness property of the internal direct sum, we see that lg = 0 if and only if lhi = 0 for each i with 1 ≤ i ≤ t′. So g has finite order if and only if each hi has finite order. For 1 ≤ i ≤ s′ each hi has finite order, since it belongs to the finite cyclic group Hi. For s′ < i ≤ t′ the cyclic group Hi is of type C0 as di = 0; in other words Hi is infinite cyclic, and the only element hi in such a group having finite order is hi = 0. So T consists of the elements g = h1 + h2 + · · · + hs′, and hence T = H1 ⊕ H2 ⊕ · · · ⊕ Hs′. As |Hi| = di for 1 ≤ i ≤ s′, we see that T is a finite abelian group of order d1d2 · · · ds′.

The mapping π : G → G defined by (g)π = hs′+1 + hs′+2 + · · · + ht′, where g = h1 + h2 + · · · + ht′, is Z-linear. In fact π is the projection of G onto its direct summand im π = Hs′+1 ⊕ Hs′+2 ⊕ · · · ⊕ Ht′. Now (g)π = 0 if and only if hi = 0 for s′ < i ≤ t′, that is, ker π = H1 ⊕ H2 ⊕ · · · ⊕ Hs′ = T. By Theorem 2.16 we know G/ker π ≅ im π, and so G/T ≅ im π. Moreover im π has Z-basis (ρs+1)θ, (ρs+2)θ, . . . , (ρt)θ, using the notation of Theorem 3.4, as Hi has isomorphism type C0 if and only if s′ < i ≤ t′. So im π is free of rank t − s = t′ − s′, and the same is true of G/T by Lemma 2.25. □

From Theorem 3.4 and Corollary 3.5 it follows that every f.g. Z-module G has an internal direct sum decomposition

G = T ⊕ M

where T is the torsion submodule of G and M is a free submodule of G, because M = im π satisfies these conditions. However M is not usually unique. In the case G = Z2 ⊕ Z we have T = {(0,0), (1,0)}, and both M0 = 〈(0,1)〉 and M1 = 〈(1,1)〉 are free submodules of rank 1 satisfying G = T ⊕ M0 = T ⊕ M1.

We show next that isomorphic modules have isomorphic torsion submodules andalso that the corresponding quotient modules are isomorphic.

Lemma 3.6

Let α : G ≅ G′ be an isomorphism between the Z-modules G and G′. Let T and T′ be the torsion subgroups of G and G′ respectively. Then (T)α = T′, and so T ≅ T′. Also α̃ : G/T ≅ G′/T′ where (T + g)α̃ = T′ + (g)α for all g ∈ G.

Proof

Consider g0 ∈ T. Then lg0 = 0 for some non-zero integer l. Hence l(g0)α = 0 by the Z-linearity of α. So (g0)α ∈ T′, which means that (T)α ⊆ T′. Now let g1, g2 ∈ G be such that g1 ≡ g2 (mod T). Then g1 − g2 ∈ T, and applying α gives (g1 − g2)α ∈ T′, that is, (g1)α − (g2)α ∈ T′, and so (g1)α ≡ (g2)α (mod T′). We have shown that T + g1 = T + g2 implies T′ + (g1)α = T′ + (g2)α. So it makes sense to define α̃ as above. Since α is Z-linear, so also is α̃ : G/T → G′/T′.

The bijective property of α now comes into play. The mapping α⁻¹ : G′ → G is Z-linear by Exercises 2.1, Question 4(d), and so α⁻¹ is a Z-module isomorphism. Replacing α by β = α⁻¹ in the above paragraph we obtain (T′)β ⊆ T. Applying α to this set containment gives T′ = (T′)βα ⊆ (T)α. Therefore (T)α = T′, and so T ≅ T′, as α restricted to T is an isomorphism between T and T′. Also β̃ : G′/T′ → G/T, defined by (T′ + g′)β̃ = T + (g′)β for all g′ ∈ G′, is Z-linear. As α, β are an inverse pair of isomorphisms, the same is true of α̃, β̃. Therefore α̃ : G/T ≅ G′/T′. □

We apply Lemma 3.6 to two Z-modules. The first is the finitely generated Z-module G with decomposition $G = H_1 \oplus H_2 \oplus \cdots \oplus H_{t'}$ as in Theorem 3.4. The second is the isomorphic Z-module G′ with an analogous decomposition $G' = H'_1 \oplus H'_2 \oplus \cdots \oplus H'_{t''}$, where $H'_j$ is cyclic of type $C_{d'_j}$ with $d'_1 \neq 1$ and $d'_j \mid d'_{j+1}$ for $1 \le j < t''$. Notice that the case G = G′ is included here; that is, amongst other things, we are considering two decompositions of the same Z-module. What must these decompositions have in common?

Let us suppose that the integer $s''$ with $0 \le s'' \le t''$ is such that $d'_j > 1$ for $1 \le j \le s''$ and $d'_j = 0$ for $s'' < j \le t''$. On applying Corollary 3.5 to G′, $T' = H'_1 \oplus H'_2 \oplus \cdots \oplus H'_{s''}$ is the torsion submodule of G′, and G′/T′ is free of rank $t'' - s''$. By Lemma 3.6 the free Z-modules G/T and G′/T′ are isomorphic, and so they have equal rank by Lemma 2.25, that is,

$$t' - s' = t'' - s''$$

showing that the numbers of infinite cyclic summands in the two decompositions are equal. Therefore this number, which is called the torsion-free rank of G, is unambiguously defined and is an invariant of f.g. Z-modules.

We now compare the torsion submodules T and T′ of G and G′. From Lemma 3.6 we know that T and T′ are isomorphic. Our aim is to show $s' = s''$ and $d_i = d'_i$ for $1 \le i \le s'$. This is achieved by studying certain corresponding submodules of T and T′, as we now explain. For each integer n and Z-module G, let nG denote the submodule of G consisting of all elements ng for g ∈ G. In other words $nG = \operatorname{im}\mu_n$, where the Z-linear mapping $\mu_n : G \to G$ is given by $(g)\mu_n = ng$ for all g ∈ G. Write $G_{(n)} = \{g \in G : ng = 0\}$, and so $G_{(n)} = \ker\mu_n$ is a submodule of G. For Z-modules $G_1, G_2, \ldots, G_s$ it is straightforward to verify

$$n(G_1 \oplus G_2 \oplus \cdots \oplus G_s) = nG_1 \oplus nG_2 \oplus \cdots \oplus nG_s \quad\text{and}\quad (G_1 \oplus G_2 \oplus \cdots \oplus G_s)_{(n)} = (G_1)_{(n)} \oplus (G_2)_{(n)} \oplus \cdots \oplus (G_s)_{(n)}$$

(see Exercises 3.1, Question 5(c)). Suppose the Z-module element g has positive order d. Then ng has order $d/\gcd\{n,d\}$ by Lemma 2.7. Therefore

$$n\mathbb{Z}_d \cong \mathbb{Z}_{d/\gcd\{n,d\}}$$

on taking $g = 1 \in \mathbb{Z}_d$. In particular $n\mathbb{Z}_d$ is trivial if and only if $d/\gcd\{n,d\} = 1$, that is, if and only if $d \mid n$. The Z-module element $g' = (d/\gcd\{n,d\})g$ has order $d/\gcd\{d/\gcd\{n,d\}, d\} = d/(d/\gcd\{n,d\}) = \gcd\{n,d\}$ by Lemma 2.7. With $g = 1 \in \mathbb{Z}_d$ as above we obtain

$$(\mathbb{Z}_d)_{(n)} \cong \mathbb{Z}_{\gcd\{n,d\}}$$

as $g' = (d/\gcd\{n,d\})1$ generates $(\mathbb{Z}_d)_{(n)}$ (see Exercises 3.1, Question 5(d)). There is just one more thing to point out. For n ≥ 2 the order of each element of $G_{(n)}$ is a divisor of n, and so the Z-module status of $G_{(n)}$ can be upgraded to that of a $\mathbb{Z}_n$-module at no extra cost! Specifically, the product $\bar{i}g = (\langle n\rangle + i)g = ig$ for $\bar{i} \in \mathbb{Z}_n$, $g \in G_{(n)}$ is unambiguously defined and gives $G_{(n)}$ the structure of a $\mathbb{Z}_n$-module.

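As an illustration consider $T = \mathbb{Z}_6 \oplus \mathbb{Z}_6 \oplus \mathbb{Z}_{12} \oplus \mathbb{Z}_{36}$. Then

$$
\begin{aligned}
2T &\cong \mathbb{Z}_3 \oplus \mathbb{Z}_3 \oplus \mathbb{Z}_6 \oplus \mathbb{Z}_{18}, & 3T &\cong \mathbb{Z}_2 \oplus \mathbb{Z}_2 \oplus \mathbb{Z}_4 \oplus \mathbb{Z}_{12},\\
4T &\cong \mathbb{Z}_3 \oplus \mathbb{Z}_3 \oplus \mathbb{Z}_3 \oplus \mathbb{Z}_9, & 5T &\cong \mathbb{Z}_6 \oplus \mathbb{Z}_6 \oplus \mathbb{Z}_{12} \oplus \mathbb{Z}_{36},\\
6T &\cong \mathbb{Z}_1 \oplus \mathbb{Z}_1 \oplus \mathbb{Z}_2 \oplus \mathbb{Z}_6, & 8T &\cong \mathbb{Z}_3 \oplus \mathbb{Z}_3 \oplus \mathbb{Z}_3 \oplus \mathbb{Z}_9,\\
T_{(2)} &\cong \mathbb{Z}_2 \oplus \mathbb{Z}_2 \oplus \mathbb{Z}_2 \oplus \mathbb{Z}_2, & T_{(3)} &\cong \mathbb{Z}_3 \oplus \mathbb{Z}_3 \oplus \mathbb{Z}_3 \oplus \mathbb{Z}_3,\\
T_{(4)} &\cong \mathbb{Z}_2 \oplus \mathbb{Z}_2 \oplus \mathbb{Z}_4 \oplus \mathbb{Z}_4, & T_{(5)} &\cong \mathbb{Z}_1 \oplus \mathbb{Z}_1 \oplus \mathbb{Z}_1 \oplus \mathbb{Z}_1,\\
T_{(6)} &\cong \mathbb{Z}_6 \oplus \mathbb{Z}_6 \oplus \mathbb{Z}_6 \oplus \mathbb{Z}_6, & T_{(8)} &\cong \mathbb{Z}_2 \oplus \mathbb{Z}_2 \oplus \mathbb{Z}_4 \oplus \mathbb{Z}_4.
\end{aligned}
$$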
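The table above follows mechanically from the two formulas $n\mathbb{Z}_d \cong \mathbb{Z}_{d/\gcd\{n,d\}}$ and $(\mathbb{Z}_d)_{(n)} \cong \mathbb{Z}_{\gcd\{n,d\}}$, applied summand by summand using the direct-sum identities. The following minimal Python sketch records a module by the list of orders of its cyclic summands, where a summand of order 1 is trivial. The function names multiples and n_torsion are hypothetical.

```python
from math import gcd

def multiples(ds, n):
    """Orders of the cyclic summands of nT, where T = Z_{d_1} + ... + Z_{d_k}:
    n Z_d is cyclic of order d / gcd(n, d)."""
    return [d // gcd(n, d) for d in ds]

def n_torsion(ds, n):
    """Orders of the cyclic summands of T_(n) = {g in T : ng = 0}:
    (Z_d)_(n) is cyclic of order gcd(n, d)."""
    return [gcd(n, d) for d in ds]

T = [6, 6, 12, 36]
for n in (2, 3, 4, 5, 6, 8):
    print(f"{n}T ~ {multiples(T, n)}   T_({n}) ~ {n_torsion(T, n)}")
```

Each printed line reproduces the corresponding pair of entries in the table.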


The decompositions 6T and $T_{(6)}$ are the most useful. The following proof is by induction on the number of non-isomorphic non-trivial summands (terms) present, which is 3 for T but only 2 for 6T. Also $T_{(6)} \cong \mathbb{Z}_6^4$ is a free $\mathbb{Z}_6$-module of rank 4, and this shows, as we will see, that every decomposition of T as in Theorem 3.4 has exactly 4 non-trivial summands. We remind the reader that every finite cyclic group of order d is isomorphic to (the additive group of the ring) $\mathbb{Z}_d$ (see Theorem 2.5), and this fact is used in the next proof.

We are now ready for the final theorem of Section 3.1.

Theorem 3.7 (The invariance theorem for finite Z-modules)

Let T and T′ be isomorphic finite Z-modules. By Corollary 3.5 there are positive integers $d_1, d_2, \ldots, d_{s'}$ such that $T \cong \mathbb{Z}_{d_1} \oplus \mathbb{Z}_{d_2} \oplus \cdots \oplus \mathbb{Z}_{d_{s'}}$ where $d_1 > 1$ and $d_i \mid d_{i+1}$ for $1 \le i < s'$. In the same way there are positive integers $d'_1, d'_2, \ldots, d'_{s''}$ such that $T' \cong \mathbb{Z}_{d'_1} \oplus \mathbb{Z}_{d'_2} \oplus \cdots \oplus \mathbb{Z}_{d'_{s''}}$ where $d'_1 > 1$ and $d'_i \mid d'_{i+1}$ for $1 \le i < s''$. Then $s' = s''$ and $d_i = d'_i$ for $1 \le i \le s'$.

Proof

By hypothesis there is an isomorphism $\alpha : T \cong T'$. Let n be an integer. The Z-linearity of α gives $\mu_n\alpha = \alpha\mu_n$, as $(ng)\alpha = n(g)\alpha$ for all $g \in T$. Therefore $(nT)\alpha = \{(ng)\alpha : g \in T\} = \{n(g)\alpha : g \in T\} \subseteq nT'$ as $(g)\alpha \in T'$ for $g \in T$. Replacing α by $\alpha^{-1}$ we see $(nT')\alpha^{-1} \subseteq nT$. Applying α to this inclusion gives $nT' \subseteq (nT)\alpha$. Therefore $(nT)\alpha = nT'$, showing that the submodules nT and nT′ correspond under α and so are isomorphic. We write $\alpha| : nT \cong nT'$, as the restriction $\alpha|$ of α to nT is an isomorphism between nT and nT′.

In the same way, for n ∈ Z and $g \in T_{(n)}$ we have $(g)\mu_n = 0$, and so $0 = (g)\mu_n\alpha = (g)\alpha\mu_n$, showing $(g)\alpha \in T'_{(n)}$. We've shown $(T_{(n)})\alpha \subseteq T'_{(n)}$. Replacing α by $\alpha^{-1}$ gives $(T'_{(n)})\alpha^{-1} \subseteq T_{(n)}$, and so, on applying α, we obtain $T'_{(n)} \subseteq (T_{(n)})\alpha$. Therefore $(T_{(n)})\alpha = T'_{(n)}$, showing that the submodules $T_{(n)}$ and $T'_{(n)}$ correspond under α and so are isomorphic. As above, $\alpha| : T_{(n)} \cong T'_{(n)}$.

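Take $n = d_1$ and focus on the isomorphism $\alpha| : T_{(d_1)} \cong T'_{(d_1)}$. By the preliminary theory, $(\mathbb{Z}_{d_i})_{(d_1)} \cong \mathbb{Z}_{d_1}$, since $\gcd\{d_1, d_i\} = d_1$ for $1 \le i \le s'$. From $T \cong \mathbb{Z}_{d_1} \oplus \mathbb{Z}_{d_2} \oplus \cdots \oplus \mathbb{Z}_{d_{s'}}$ we deduce

$$T_{(d_1)} \cong (\mathbb{Z}_{d_1})_{(d_1)} \oplus (\mathbb{Z}_{d_2})_{(d_1)} \oplus \cdots \oplus (\mathbb{Z}_{d_{s'}})_{(d_1)} \cong \mathbb{Z}_{d_1} \oplus \mathbb{Z}_{d_1} \oplus \cdots \oplus \mathbb{Z}_{d_1} = (\mathbb{Z}_{d_1})^{s'}.$$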

The $\mathbb{Z}_{d_1}$-module $T_{(d_1)}$ is therefore isomorphic to the free $\mathbb{Z}_{d_1}$-module $(\mathbb{Z}_{d_1})^{s'}$ of rank $s'$. By Lemma 2.25 both $T_{(d_1)}$ and $T'_{(d_1)}$ are free $\mathbb{Z}_{d_1}$-modules of rank $s'$. Combining $(\mathbb{Z}_{d'_i})_{(d_1)} \cong \mathbb{Z}_{\gcd\{d_1, d'_i\}}$ for $1 \le i \le s''$ and $T' \cong \mathbb{Z}_{d'_1} \oplus \mathbb{Z}_{d'_2} \oplus \cdots \oplus \mathbb{Z}_{d'_{s''}}$ gives

$$T'_{(d_1)} \cong \mathbb{Z}_{\gcd\{d_1, d'_1\}} \oplus \mathbb{Z}_{\gcd\{d_1, d'_2\}} \oplus \cdots \oplus \mathbb{Z}_{\gcd\{d_1, d'_{s''}\}}$$

showing that $T'_{(d_1)}$ is the direct sum of $s''$ cyclic submodules and so is generated by $s''$ of its elements. From Theorem 2.20 we deduce $s'' \ge s'$. We have reached the tipping point in the proof! Using the isomorphism $\alpha^{-1} : T' \cong T$, the preceding theory ‘works’ on interchanging T and T′. Using the isomorphism $\alpha^{-1}| : T'_{(d'_1)} \cong T_{(d'_1)}$, we see that the $\mathbb{Z}_{d'_1}$-module $T'_{(d'_1)}$ is isomorphic to the free $\mathbb{Z}_{d'_1}$-module $(\mathbb{Z}_{d'_1})^{s''}$ of rank $s''$. Hence

$$T_{(d'_1)} \cong \mathbb{Z}_{\gcd\{d'_1, d_1\}} \oplus \mathbb{Z}_{\gcd\{d'_1, d_2\}} \oplus \cdots \oplus \mathbb{Z}_{\gcd\{d'_1, d_{s'}\}}$$

is a free $\mathbb{Z}_{d'_1}$-module of rank $s''$ by Lemma 2.25 and is generated by $s'$ of its elements. Therefore $s' \ge s''$ by Theorem 2.20, and so $s' = s''$. From Lemma 2.18 and Corollary 2.21 the $s'$ generators of $T'_{(d_1)}$ form a $\mathbb{Z}_{d_1}$-basis of the $\mathbb{Z}_{d_1}$-module $T'_{(d_1)}$ (see Exercises 2.3, Question 7(b)). Therefore each of these $s'$ generators has order ideal $\{0\} = \{\langle d_1\rangle\}$ in the $\mathbb{Z}_{d_1}$-module $T'_{(d_1)}$, and so has order $d_1$ in the Z-module $T'_{(d_1)}$. From the first of these generators we deduce $\gcd\{d_1, d'_1\} = d_1$, showing $d_1 \mid d'_1$. Interchanging the roles of T and T′ gives $d'_1 \mid d_1$, and so $d_1 = d'_1$.

Let $m_1$ denote the number of i with $d_i = d_1$, and let $m'_1$ denote the number of i with $d'_i = d_1$. As $d_1\mathbb{Z}_{d_i} \cong \mathbb{Z}_{d_i/\gcd\{d_1, d_i\}} = \mathbb{Z}_{d_i/d_1}$ from the preliminary discussion, we obtain $d_1T \cong \bigoplus_{m_1 < i \le s'} \mathbb{Z}_{d_i/d_1}$. So $d_1T$ is the direct sum of $s' - m_1$ non-trivial cyclic submodules. Since $d_i/d_1 \mid d_j/d_1$ for $m_1 < i \le j \le s'$, this decomposition is again as in Corollary 3.5. In the same way $d_1T' \cong \bigoplus_{m'_1 < i \le s'} \mathbb{Z}_{d'_i/d_1}$, which is a decomposition of $d_1T'$ into $s' - m'_1$ non-trivial cyclic submodules as in Corollary 3.5.

As $\alpha| : d_1T \cong d_1T'$, the proof can be completed by induction on the number, r say, of different integers among $d_1, d_2, \ldots, d_{s'}$. Take r = 1. Then $m_1 = s'$ and $d_1T$ is trivial. So $d_1T'$ is also trivial and $m'_1 = s'$. Therefore $d_i = d_1 = d'_i$ for $1 \le i \le s'$. Now take r > 1. There are r − 1 different integers among $d_{m_1+1}/d_1, d_{m_1+2}/d_1, \ldots, d_{s'}/d_1$. So the conclusion of Theorem 3.7 holds on replacing $\alpha : T \cong T'$ by $\alpha| : d_1T \cong d_1T'$, that is, $s' - m_1 = s' - m'_1$ (showing $m_1 = m'_1$), and also $d_i/d_1 = d'_i/d_1$ for $m_1 < i \le s'$. Multiplying by $d_1$, we now have $s' = s''$, $d_i = d_1 = d'_i$ for $1 \le i \le m_1 = m'_1$, and $d_i = d'_i$ for $m_1 < i \le s'$. The induction is therefore complete. □

We have finally arrived and all we need to do is pull ourselves together!

Definition 3.8

Let G be a finitely generated Z-module. By Theorem 3.4, G decomposes as G = H1 ⊕ H2 ⊕ · · · ⊕ Ht′, the internal direct sum of t′ non-trivial cyclic subgroups Hi (1 ≤ i ≤ t′), where Hi is of isomorphism type Cdi and di | di+1 (1 ≤ i < t′). By Corollary 3.5, Lemma 3.6 and Theorem 3.7 the integers di are unique, and so it is legitimate to say:

(d1, d2, . . . , dt ′)

is the invariant factor sequence of G.

Let s′ be as in Corollary 3.5. Then (d1, d2, . . . , dt′) = (d1, d2, . . . , ds′, 0, 0, . . . , 0), showing that the invariant factor sequence of G terminates in t′ − s′ zeros.

For example suppose (2,6,42,84,0,0,0) is the invariant factor sequence of G. Then G has torsion-free rank 3 (the number of zero invariant factors), and the free Z-module G/T has invariant factor sequence (0,0,0). Also (2,6,42,84) is the invariant factor sequence of the torsion subgroup T of G, and so |T| = 2 × 6 × 42 × 84 = 2⁵ × 3³ × 7².
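Corollary 3.5 reduces the example above to arithmetic on the invariant factor sequence. The torsion-free rank is the number of zero entries, and |T| is the product of the non-zero ones. The Python sketch below also checks the invariant factor condition: d1 ≠ 1 and di | di+1, with the convention that 0 divides only 0. The names torsion_data and divides are hypothetical.

```python
from math import prod

def divides(a, b):
    """a | b for non-negative integers; 0 divides only 0."""
    return b == 0 if a == 0 else b % a == 0

def torsion_data(seq):
    """For an invariant factor sequence (d_1, ..., d_t'), return
    (torsion-free rank, |T|) as in Corollary 3.5."""
    if seq and seq[0] == 1:
        raise ValueError("d_1 must not be 1")
    if not all(divides(a, b) for a, b in zip(seq, seq[1:])):
        raise ValueError("need d_i | d_(i+1)")
    nonzero = [d for d in seq if d != 0]
    return len(seq) - len(nonzero), prod(nonzero)

rank, order_T = torsion_data((2, 6, 42, 84, 0, 0, 0))
print(rank, order_T, order_T == 2**5 * 3**3 * 7**2)   # 3 42336 True
```

For the empty sequence (a trivial module) it returns rank 0 and |T| = 1, the empty product.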

Our next corollary ‘sews up’ the theory of isomorphism classes of f.g. Z-modules: there is just one class for each invariant factor sequence.

Corollary 3.9 (Classification of finitely generated Z-modules)

Let G and G′ be finitely generated Z-modules. Then G and G′ are isomorphic if and only if their invariant factor sequences $(d_1, d_2, \ldots, d_{t'})$ and $(d'_1, d'_2, \ldots, d'_{t''})$ are equal, that is, $t' = t''$ and $d_i = d'_i$ for $1 \le i \le t'$.

Proof

Suppose first that G and G′ are isomorphic Z-modules. So there is an isomorphism $\alpha : G \cong G'$. By Lemma 3.6 the torsion subgroups T and T′ of G and G′ are isomorphic. We deduce $d_i = d'_i$ for $1 \le i \le s'$ by Corollary 3.5 and Theorem 3.7, accounting for all non-zero invariant factors of G and G′. So $(d_1, d_2, \ldots, d_{t'})$ terminates with $t' - s'$ zeros, and $(d'_1, d'_2, \ldots, d'_{t''})$ terminates with $t'' - s'$ zeros. By Corollary 3.5 and Lemma 3.6 the quotient Z-modules G/T and G′/T′ are isomorphic and free of ranks $t' - s'$ and $t'' - s'$ respectively. Hence $t' - s' = t'' - s'$ by Lemma 2.25, showing that G and G′ have the same torsion-free rank. Therefore $t' = t''$ and $d_i = d'_i$ for $1 \le i \le t'$.

Conversely suppose $(d_1, d_2, \ldots, d_{t'}) = (d'_1, d'_2, \ldots, d'_{t''})$, that is, $t' = t''$ and $d_i = d'_i$ for $1 \le i \le t'$. Then G and G′ have internal direct sum decompositions $G = H_1 \oplus H_2 \oplus \cdots \oplus H_{t'}$ and $G' = H'_1 \oplus H'_2 \oplus \cdots \oplus H'_{t'}$, where $H_i$ is a cyclic subgroup of G and $H'_i$ is a cyclic subgroup of G′; also $H_i$ and $H'_i$ are of the same isomorphism type $C_{d_i}$ for $1 \le i \le t'$. By Theorem 2.5 there are isomorphisms $\alpha_i : H_i \cong H'_i$ for $1 \le i \le t'$. For each g in G there are unique elements $h_i \in H_i$ with $g = h_1 + h_2 + \cdots + h_{t'}$. We define $\alpha : G \to G'$ by $(g)\alpha = (h_1)\alpha_1 + (h_2)\alpha_2 + \cdots + (h_{t'})\alpha_{t'} \in G'$. As each $\alpha_i$ is Z-linear, so also is α. As each $\alpha_i$ is bijective, so also is α. We conclude $\alpha : G \cong G'$, and so G and G′ are isomorphic Z-modules. □


The sequence (d1, d2, . . . , dt′) of t′ non-negative integers di is said to satisfy the invariant factor condition if d1 ≠ 1 and di | di+1 for 1 ≤ i < t′, where t′ ≥ 0. Do all such sequences arise from f.g. Z-modules as in Definition 3.8? The answer is: Yes! The additive group Z/〈n〉 has invariant factor sequence (n) for all non-negative integers n (n ≠ 1). More generally suppose (d1, d2, . . . , dt′) satisfies the invariant factor condition. Then the external direct sum Z/〈d1〉 ⊕ Z/〈d2〉 ⊕ · · · ⊕ Z/〈dt′〉 is a Z-module with invariant factor sequence (d1, d2, . . . , dt′).

There are six isomorphism classes of abelian groups G of order 72, that is, with |G| = 72. This is because there are six sequences (d1, d2, . . . , dt′) of non-negative integers with d1d2 · · · dt′ = 72 satisfying the invariant factor condition, namely (2,2,18), (2,6,6), (2,36), (3,24), (6,12), (72). We will discover in Section 3.2 that these sequences are best found using the prime factorisation 72 = 2³ × 3².
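The six sequences can also be generated by a short search. First choose d1 > 1 dividing n. Then choose each later term as a multiple of its predecessor that still divides what remains of n. This brute-force sketch is adequate for small n; the method of Section 3.2, based on the prime factorisation, is the more illuminating route. The name invariant_factor_sequences is hypothetical.

```python
def invariant_factor_sequences(n):
    """All (d_1, ..., d_k) with d_1 > 1, d_i | d_(i+1) and d_1 * ... * d_k == n,
    i.e. the invariant factor sequences of abelian groups of order n (n >= 1)."""
    def extend(rest, prev):
        # rest: part of n not yet used; prev: last factor chosen (1 at the start)
        if rest == 1:
            yield ()
            return
        for d in range(max(prev, 2), rest + 1):
            if d % prev == 0 and rest % d == 0:
                for tail in extend(rest // d, d):
                    yield (d,) + tail
    return sorted(extend(n, 1))

print(invariant_factor_sequences(72))
# [(2, 2, 18), (2, 6, 6), (2, 36), (3, 24), (6, 12), (72,)]
```

For n = 1 the only sequence is the empty one, matching the convention that ∅ is the invariant factor sequence of a trivial module.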

It follows from Theorem 3.4 and Definition 3.8 that the finitely generated Z-module G cannot be generated by fewer than t′ of its elements. So the invariant factor decomposition G = H1 ⊕ H2 ⊕ · · · ⊕ Ht′ is the best one can achieve, in the sense that the number t′ of cyclic direct summands Hi is a minimum. In Section 3.2 we discuss a way of decomposing a finite abelian group into a direct sum of non-trivial cyclic subgroups where the number of summands is a maximum. This new method of decomposition is in some ways more revealing than the invariant factor decomposition.

EXERCISES 3.1

1. Throughout this question the Z-module G is generated by its elements g1, g2, and θ : Z² → G is the Z-linear mapping defined by (m1, m2)θ = m1g1 + m2g2 for all m1, m2 ∈ Z.

(a) The generators g1, g2 of G are subject to the relations 4g1 + 2g2 = 0, 10g1 + 2g2 = 0. Calculate the Smith normal form D of

$$A = \begin{pmatrix} 4 & 2 \\ 10 & 2 \end{pmatrix}.$$

Find invertible matrices P and Q over Z with PA = DQ. Do the rows z1, z2 of A form a Z-basis of K = ker θ? Select a Z-basis of ker θ from the rows of DQ. What is the value of rank K? Express (ρ1)θ and (ρ2)θ as integer linear combinations of g1, g2, where ρ1, ρ2 denote the rows of Q. Is G = 〈(ρ1)θ〉 ⊕ 〈(ρ2)θ〉? (Yes/No). State the orders of (ρ1)θ and (ρ2)θ. State the invariant factor sequence and isomorphism type of G.

(b) The generators g1, g2 of G are subject to the relations 3g1 + 5g2 = 0, 7g1 + 9g2 = 0. Answer Question 1(a) above in the case of this Z-module G, that is, calculate the Smith normal form D of

$$A = \begin{pmatrix} 3 & 5 \\ 7 & 9 \end{pmatrix}$$

etc. Is G cyclic?


(c) The generators g1, g2 of G are subject to the relations 8g1 + 9g2 = 0, 7g1 + 8g2 = 0. Answer Question 1(a) above in the case of this Z-module G. Is G trivial?

(d) The generators g1, g2 of G are subject to the relations 2g1 + 4g2 = 0, 4g1 + 8g2 = 0. Answer Question 1(a) above in the case of this Z-module G. What is the torsion-free rank of G and the order of its torsion submodule?

(e) The generators g1, g2 of G are subject to the relations 3g1 + 4g2 = 0, 6g1 + 8g2 = 0. Answer Question 1(a) above in the case of this Z-module G. Is G infinite cyclic?

(f) The generators $g_1, g_2$ of $G$ are subject to the relations $14g_1 + 36g_2 = 0$, $12g_1 + 28g_2 = 0$, $28g_1 + 12g_2 = 0$. Find the Smith normal form of
$$A = \begin{pmatrix} 14 & 36 \\ 12 & 28 \\ 28 & 12 \end{pmatrix}$$
and hence answer Question 1(a) above in the case of this $\mathbb{Z}$-module $G$.

(g) The generators $g_1, g_2$ of $G$ are subject to the relations $n_1g_1 = 0$, $n_2g_2 = 0$ where $n_1, n_2$ are positive integers. Use Lemma 1.10 to find the invariant factors of $G$. Under what condition on $n_1, n_2$ is $G$ cyclic?
Hint: The number of invariant factors of $G$ is 0, 1 or 2.

2. Throughout this question the $\mathbb{Z}$-module $G$ is generated by its elements $g_1, g_2, g_3$ and $\theta : \mathbb{Z}^3 \to G$ is the $\mathbb{Z}$-linear mapping defined by $(m_1, m_2, m_3)\theta = m_1g_1 + m_2g_2 + m_3g_3$ for all $m_1, m_2, m_3 \in \mathbb{Z}$.

(a) The generators $g_1, g_2, g_3$ of $G$ are subject to the relations
$$2g_1 + 4g_2 + 4g_3 = 0, \quad 4g_1 + 8g_2 + 6g_3 = 0, \quad 2g_1 + 6g_2 + 6g_3 = 0.$$
Find the Smith normal form $D$ of the coefficient matrix
$$A = \begin{pmatrix} 2 & 4 & 4 \\ 4 & 8 & 6 \\ 2 & 6 & 6 \end{pmatrix}$$
of these relations. Find also invertible matrices $P$ and $Q$ over $\mathbb{Z}$ satisfying $PA = DQ$. Use the rows of $Q$ and the non-zero rows of $DQ$ to specify $\mathbb{Z}$-bases of $\mathbb{Z}^3$ and $K = \ker\theta$. Hence decompose $G$ into an internal direct sum of cyclic submodules, expressing their generators as integer linear combinations of $g_1, g_2, g_3$. State the isomorphism type and the sequence of invariant factors of $G$.


(b) The generators $g_1, g_2, g_3$ of $G$ are subject to the relations
$$3g_1 + 2g_2 + 5g_3 = 0, \quad 2g_1 + 4g_2 + 8g_3 = 0, \quad 3g_1 + 4g_2 + 7g_3 = 0.$$
Answer Question 2(a) above in the case of this $\mathbb{Z}$-module, i.e. find the Smith normal form $D$ of the coefficient matrix
$$A = \begin{pmatrix} 3 & 2 & 5 \\ 2 & 4 & 8 \\ 3 & 4 & 7 \end{pmatrix}$$
of these relations, etc. Is this $\mathbb{Z}$-module isomorphic to the $\mathbb{Z}$-module $G$ of Question 1(a) above?

(c) The generators $g_1, g_2, g_3$ of $G$ are subject to the relations
$$2g_1 + 4g_2 + 2g_3 = 0, \quad 4g_1 + 6g_2 + 2g_3 = 0, \quad 2g_1 + 6g_2 + 4g_3 = 0.$$
Answer Question 2(a) above in the case of this $\mathbb{Z}$-module. State the invariant factors of the torsion submodule $T$ of $G$. State the torsion-free rank of $G$. Is $G/T$ cyclic?

(d) The generators $g_1, g_2, g_3$ of $G$ are subject to the single relation $8g_1 + 12g_2 + 18g_3 = 0$. Answer Question 2(a) above in the case of this $\mathbb{Z}$-module. State the invariant factor sequence of the torsion submodule $T$ of $G$. State the torsion-free rank of $G$ and the invariant factor sequence of $G/T$.

3. (a) Let $K = \{(m_1, m_2) \in \mathbb{Z}^2 : \operatorname{parity} m_1 = \operatorname{parity} m_2\}$. Show that $K$ is a submodule of $\mathbb{Z}^2$. Verify that $K$ has $\mathbb{Z}$-basis $z_1 = (1,1)$, $z_2 = (0,2)$. Find the Smith normal form of the $2 \times 2$ matrix $A$ having $z_i$ as row $i$ for $i = 1, 2$. Hence find the isomorphism type of $\mathbb{Z}^2/K$.

(b) Let $K = \{(m_1, m_2, m_3) \in \mathbb{Z}^3 : m_1 \equiv m_2 \equiv m_3 \pmod 2\}$. Find a $\mathbb{Z}$-basis $z_1, z_2, z_3$ of the submodule $K$ of $\mathbb{Z}^3$. Find the Smith normal form of the $3 \times 3$ matrix $A$ having $z_i$ as row $i$ for $i = 1, 2, 3$. State the isomorphism type of $\mathbb{Z}^3/K$.

(c) Let $t$ and $n$ be given positive integers. Let $K = \{(m_1, m_2, \ldots, m_t) \in \mathbb{Z}^t : m_1 \equiv m_j \pmod n \text{ for all } j \text{ with } 2 \le j \le t\}$. Find a $\mathbb{Z}$-basis $z_1, z_2, \ldots, z_t$ of the submodule $K$ of $\mathbb{Z}^t$. Find the Smith normal form of the $t \times t$ matrix $A$ having $z_i$ as row $i$ for $1 \le i \le t$. Find the isomorphism type of $\mathbb{Z}^t/K$.

4. (a) Let $G$ denote the additive group $\mathbb{Z}_8 \oplus \mathbb{Z}_{12}$. Show that $g = (1,1) \in G$ has order $24 = \operatorname{lcm}\{8, 12\}$.
Hint: Start by showing $24g = 0$.
Find the orders of the elements $(2,1)$ and $(1,2)$ of $G$.

(b) Let $n_1, n_2$ be positive integers. Show that the element $g = (1,1)$ of the additive group $\mathbb{Z}_{n_1} \oplus \mathbb{Z}_{n_2}$ has order $\operatorname{lcm}\{n_1, n_2\}$.
The $\mathbb{Z}$-module $G$ has submodules $H_1, H_2, \ldots, H_t$ such that $G = H_1 \oplus H_2 \oplus \cdots \oplus H_t$. Let $h_i \in H_i$ be an element of finite order $n_i$ ($1 \le i \le t$). Show that $g = h_1 + h_2 + \cdots + h_t$ has finite order $m = \operatorname{lcm}\{n_1, n_2, \ldots, n_t\}$.
Hint: Show first that $g$ has finite order $m'$ where $m' \mid m$, and then show $n_i \mid m'$ ($1 \le i \le t$). Finally use Exercises 1.3, Question 1(c).

(c) Suppose that the $\mathbb{Z}$-module $G$ is the internal direct sum of cyclic submodules $H_i = \langle h_i \rangle$ ($1 \le i \le t$), and so each $g$ in $G$ is expressible as $g = m_1h_1 + m_2h_2 + \cdots + m_th_t$ where $m_1, m_2, \ldots, m_t \in \mathbb{Z}$. Let $h_i$ have order ideal $\langle n_i \rangle$ for $1 \le i \le t$. Use Lemma 2.7 and (b) above to show that the order of $g$ is
$$\operatorname{lcm}\{n_1/\gcd\{m_1, n_1\},\ n_2/\gcd\{m_2, n_2\},\ \ldots,\ n_t/\gcd\{m_t, n_t\}\}$$
in case each $n_i > 0$. What is the order of $g$ if there is $i$ with $m_i \ne 0$, $n_i = 0$?

(d) Use the formula of (c) above to calculate the orders of $g_1, g_2$ in the module $G$ of Question 1(f) above.
Hint: The relevant integers $m_j$ appear in the rows of $Q^{-1}$, and this matrix can be calculated directly by applying to $I$, in order, the ecos appearing in the reduction of $A$ to $D$.

(e) Use the formula of (c) above to calculate the orders of $g_1, g_2, g_3$ in the module $G$ of Question 2(b) above.

5. (a) Let $G$ be a $\mathbb{Z}$-module and let $\theta : \mathbb{Z}^t \to G$ be a surjective $\mathbb{Z}$-linear mapping. Let $z_1, z_2, \ldots, z_s$ generate $\ker\theta$ and let $A$ be the $s \times t$ matrix over $\mathbb{Z}$ having $z_i$ as row $i$ for $1 \le i \le s$. Let $D$ be the Smith normal form of $A$ and let $P$ and $Q$ be invertible matrices over $\mathbb{Z}$ satisfying $PA = DQ$. Show that the non-zero rows of $DQ$ form a $\mathbb{Z}$-basis of $\ker\theta$.

(b) Let $H$ be a subgroup of the f.g. $\mathbb{Z}$-module $G$. Use Theorem 3.1 and a surjective $\mathbb{Z}$-linear mapping $\theta : \mathbb{Z}^t \to G$ to show that $H$ is also finitely generated.
Hint: Consider $K' = \{k \in \mathbb{Z}^t : (k)\theta \in H\}$.

(c) Let $G_1, G_2, \ldots, G_s$ be $\mathbb{Z}$-modules. Using the notation introduced before Theorem 3.7, verify
$$n(G_1 \oplus G_2 \oplus \cdots \oplus G_s) = nG_1 \oplus nG_2 \oplus \cdots \oplus nG_s \quad\text{and}$$
$$(G_1 \oplus G_2 \oplus \cdots \oplus G_s)(n) = (G_1)(n) \oplus (G_2)(n) \oplus \cdots \oplus (G_s)(n)$$
for $n \in \mathbb{Z}$.


(d) Let $n$ and $d$ be integers with $d \ge 1$. Show that $g' = (d/\gcd\{n, d\})1$ generates $(\mathbb{Z}_d)(n)$, where $1$ is the 1-element of $\mathbb{Z}_d$.

(e) Let $d_1, d_2, \ldots, d_r$ be $r$ distinct positive integers with $d_1 \ge 2$ satisfying $d_i \mid d_j$ for $1 \le i \le j \le r$, and let $m_1, m_2, \ldots, m_r$ be positive integers. Suppose $G \cong (\mathbb{Z}_{d_1})^{m_1} \oplus (\mathbb{Z}_{d_2})^{m_2} \oplus \cdots \oplus (\mathbb{Z}_{d_r})^{m_r}$, and so $d_i$ occurs $m_i$ times in the invariant factor sequence (Definition 3.8) of $G$ for $1 \le i \le r$. Express the submodules $d_iG$ and $G(d_i)$ in the same way (see the proof of Theorem 3.7) for $1 \le i \le r$. Is $(d_iG)(d_{i+1}/d_i)$ a free $R$-module for $1 \le i < r$? If so find the ring $R$ and state the rank of this $R$-module.

6. (a) Let $G$ be an additive abelian group. A homomorphism $\chi : G \to \mathbb{Q}/\mathbb{Z}$ is called a character of $G$. (Here $\mathbb{Q}/\mathbb{Z}$ denotes the additive group of rationals modulo 1 consisting of cosets $\mathbb{Z} + q$ for $q \in \mathbb{Q}$; see Exercises 2.2, Question 3.) We use, as is customary, the functional notation $\chi(g)$ for the image of $g \in G$ by $\chi$. Let $\chi_1$ and $\chi_2$ be characters of $G$. Show that their sum $\chi_1 + \chi_2$, defined by $(\chi_1 + \chi_2)(g) = \chi_1(g) + \chi_2(g)$ for all $g \in G$, is a character of $G$. Show also that the set of all characters of $G$, together with addition as defined above, is an abelian group $G^*$. The group $G^*$ is called the character group of $G$.
Let $H$ be a subgroup of $G$. Show that $H^o = \{\chi \in G^* : \chi(h) = \mathbb{Z} \text{ for all } h \in H\}$ (here $\mathbb{Z}$ is the zero element of $\mathbb{Q}/\mathbb{Z}$) is a subgroup of $G^*$. For $\chi \in (G/H)^*$ let $(\chi)\alpha : G \to \mathbb{Q}/\mathbb{Z}$ be defined by $((\chi)\alpha)(g) = \chi(H + g)$ for all $g \in G$. Show $(\chi)\alpha \in H^o$ and $\alpha : (G/H)^* \cong H^o$.

(b) Let $G$ be a finite abelian group. By Theorem 3.4 there are elements $h_i$ of order $d_i$ in $G$ ($1 \le i \le t$) such that $G = \langle h_1 \rangle \oplus \langle h_2 \rangle \oplus \cdots \oplus \langle h_t \rangle$, where $t$ is a positive integer (there’s no need here to insist that $d_1 \ne 1$ or $d_i \mid d_{i+1}$). For each $i$ with $1 \le i \le t$ show that $G$ has a unique character $\chi_i$ such that $\chi_i(h_i) = \mathbb{Z} + 1/d_i$ and $\chi_i(h_j) = \mathbb{Z} + 0$ for all $j \ne i$.
Hint: $\mathbb{Z} + 1/d_i$ has order $d_i$ in $\mathbb{Q}/\mathbb{Z}$.
Show that $\chi_i$ has order $d_i$ in $G^*$ and $G^* = \langle \chi_1 \rangle \oplus \langle \chi_2 \rangle \oplus \cdots \oplus \langle \chi_t \rangle$. Deduce that there is a unique isomorphism $\beta : G^* \cong G$ with $(\chi_i)\beta = h_i$ for $1 \le i \le t$.
For each subgroup $H$ of $G$ write $(H)\pi = (H^o)\beta$. Let $L(G)$ denote the set of all subgroups $H$ of $G$. Show that $\pi : L(G) \to L(G)$ is a polarity, that is,
(i) $(H)\pi^2 = H$ for all $H \in L(G)$,
(ii) $H_1 \subseteq H_2 \Leftrightarrow (H_1)\pi \supseteq (H_2)\pi$ where $H_1, H_2 \in L(G)$ ($\pi$ is inclusion-reversing).

Hint: For (i) show first $(H)\beta^{-1} \subseteq ((H^o)\beta)^o$.


7. (a) (i) Let $R$ be a non-trivial commutative ring and let $U(R)$ denote its multiplicative group of invertible elements. For $a, b \in R$ write $a \equiv b$ if there is $u \in U(R)$ with $a = bu$. Show that $\equiv$ is an equivalence relation on $R$ (the equivalence classes are called associate classes). Are $\{0\}$ and $U(R)$ associate classes? Which commutative rings partition into two associate classes? Partition $\mathbb{Z}_{12}$ into associate classes.
(ii) Let $R$ be an integral domain. For $a, b \in R$ write $b \mid a$ if there is $c \in R$ with $a = bc$. Show $b \mid a \Leftrightarrow \langle a \rangle \subseteq \langle b \rangle$. Show also $a \equiv b$ if and only if $a \mid b$ and $b \mid a$. Deduce $\langle a \rangle = \langle b \rangle \Leftrightarrow a \equiv b$.

(b) Let $R$ be a principal ideal domain (PID), that is, $R$ is a non-trivial commutative ring with no zero-divisors such that each ideal $K$ of $R$ is expressible as $K = \langle d \rangle = \{rd : r \in R\}$ for some $d \in R$.
For $a, b \in R$ write $\langle a \rangle + \langle b \rangle = \langle d \rangle$ and $\langle a \rangle \cap \langle b \rangle = \langle m \rangle$. Show that $d$ and $m$ have the divisor properties of $\gcd\{a, b\}$ and $\operatorname{lcm}\{a, b\}$ respectively, that is, $d \mid a$, $d \mid b$ and ($d' \mid a$, $d' \mid b \Rightarrow d' \mid d$) for $d' \in R$, and also $a \mid m$, $b \mid m$ and ($a \mid m'$, $b \mid m' \Rightarrow m \mid m'$) for $m' \in R$. Show $d = a'a + b'b$ for some $a', b' \in R$. Also show $ab \equiv dm$.
Hint: Show $ab/d$ has the properties of $\operatorname{lcm}\{a, b\}$ in the case $d \ne 0$.
Let $K_i$ be an ideal of $R$ for each positive integer $i$ with $K_i \subseteq K_{i+1}$. Show that $K = \bigcup_{i \ge 1} K_i$ is an ideal of $R$. Deduce the existence of a positive integer $l$ with $K_i = K_l$ for $i \ge l$. Let $b_1, b_2, \ldots, b_i, \ldots$ be a sequence of elements of $R$ such that $b_{i+1} \mid b_i$ for $i \ge 1$. Show that there is an integer $l$ with $b_i \equiv b_l$ for $i \ge l$.

(c) Let $R$ be a PID. A diagonal $s \times t$ matrix $D$ over $R$ with $(i,i)$-entry $d_i$ ($1 \le i \le \min\{s,t\}$) is said to be in Smith normal form (Snf) if $d_i \mid d_j$ for $1 \le i \le j \le \min\{s,t\}$. Is every $1 \times 2$ diagonal matrix over $R$ in Snf?
(i) Let $A$ be a $1 \times 2$ matrix over $R$. Adapt the proof of Lemma 1.8 to show that $A = DQ$ where the $1 \times 2$ matrix $D$ over $R$ is in Snf and the $2 \times 2$ matrix $Q$ over $R$ satisfies $\det Q = e$, the 1-element of $R$.
(ii) Let $A$ be a diagonal $2 \times 2$ matrix over $R$. Generalise Lemma 1.10 to show that $A$ can be reduced to $D$ in Snf using at most five elementary operations of type (iii).

(d) Let $A$ be an $s \times t$ matrix over $R$ where $R$ is a PID and $s \ge 2$. Let
$$P' = \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix}$$
be a non-elementary matrix over $R$ with $\det P' = e$ and let $(i, j)$ be an ordered pair of distinct integers with $1 \le i, j \le s$. The operation of replacing $e_iA$ and $e_jA$ (rows $i$ and $j$ of $A$) by $p_{11}e_iA + p_{12}e_jA$ and $p_{21}e_iA + p_{22}e_jA$ respectively is called a non-elementary row operation (nero) over $R$. For each nero show that there is an $s \times s$ matrix $P$ over $R$ with $\det P = e$ such that $PA$ is the result of applying this nero to $A$.

Let $t \ge 2$. A non-elementary column operation (neco) consists of replacing $Ae_i^T$ (column $i$ of $A$) by $p_{11}Ae_i^T + p_{21}Ae_j^T$ and $Ae_j^T$ (column $j$ of $A$) by $p_{12}Ae_i^T + p_{22}Ae_j^T$, all other columns of $A$ remaining unchanged. For each neco show that there is a $t \times t$ matrix $Q$ over $R$ with $\det Q = e$ such that $AQ$ is the matrix which results on applying this neco to $A$.
Does Lemma 1.4 apply without change on replacing $\mathbb{Z}$ by a commutative ring $R$? (Yes/No). (Note that multiplication of a row (or column) by any element of $U(R)$ is allowed as an ero (or eco) over $R$.)

(e) Let $A$ be an $s \times t$ matrix over $R$ where $R$ is a PID. Outline a method of obtaining an invertible $s \times s$ matrix $P$ over $R$ and an invertible $t \times t$ matrix $Q$ over $R$ such that $PAQ^{-1} = D$ is in Snf.
Hint: Reduce $A$ to $D$ as in Lemma 1.9 and Theorem 1.11 but using neros and necos in place of the Euclidean algorithm (Lemma 1.7).

(f) Let $D = \operatorname{diag}(d_1, d_2, \ldots, d_{\min\{s,t\}})$ and $D' = \operatorname{diag}(d'_1, d'_2, \ldots, d'_{\min\{s,t\}})$ be $s \times t$ matrices over a PID $R$ which are both in Snf. Suppose $D \equiv D'$, that is, there is an invertible $s \times s$ matrix $P$ over $R$ and an invertible $t \times t$ matrix $Q$ over $R$ such that $PDQ^{-1} = D'$. Show $d_i \equiv d'_i$ for $1 \le i \le \min\{s,t\}$.
Hint: The theory of Corollaries 1.19 and 1.20 remains essentially unchanged on replacing $\mathbb{Z}$ by $R$.

8. (a) Let $M$ be an $R$-module where $R$ is an integral domain ($R$ is a non-trivial commutative ring having no divisors of zero). Write $T(M) = \{v \in M : \text{there is } r \in R \text{ with } rv = 0,\ r \ne 0\}$. Show that $T(M)$ is a submodule (Definition 2.26) of $M$. $T(M)$ is called the torsion submodule of $M$. Let $M'$ be an $R$-module and suppose $M \cong M'$. Generalise Lemma 3.6 to show $T(M) \cong T(M')$ and $M/T(M) \cong M'/T(M')$. Describe the torsion submodules of $T(M)$ and $M/T(M)$.

(b) Let $M$ be a free $R$-module of (finite) rank $t$ where $R$ is a PID (see Question 7(b) above). Let $N$ be a submodule of $M$. Adapt the proof of Theorem 3.1 to show that $N$ is free of rank $s$ where $s \le t$.

(c) Let $v$ be an element of an $R$-module $M$ where $R$ is a PID. Show that $K = \{r \in R : rv = 0\}$ is an ideal of $R$. Let $K = \langle d \rangle$; then $v$ is said to have order $d$ in $M$ (also the associate class (see Question 7(a)(i)) of $d$ is called the order of $v$ in $M$). Let $M$ be finitely generated. Adapt the proof of Theorem 3.4 using Question 7(e) above to show $M = N_1 \oplus N_2 \oplus \cdots \oplus N_{t'}$ where $N_j$ is a non-zero cyclic submodule of $M$ having generator $v_j$ of order $d_j$ in $M$ such that $d_i \mid d_j$ for $1 \le i \le j \le t'$. Deduce $M = T(M) \oplus N_0$ where $N_0$ is a free submodule.

(d) Let $R$ be a PID. Let $M$ and $M'$ be cyclic $R$-modules with generators $v$ and $v'$ of orders $d$ and $d'$ respectively. Show $M \cong M'$ if and only if $d \equiv d'$.


(e) Let $M$ and $M'$ be finitely generated $R$-modules where $R$ is a PID. Suppose $M = N_1 \oplus N_2 \oplus \cdots \oplus N_{t'}$ where $N_j$ is a non-trivial cyclic submodule of $M$ having generator $v_j$ of order $d_j$ in $M$ such that $d_i \mid d_j$ for $1 \le i \le j \le t'$. Suppose $M' = N'_1 \oplus N'_2 \oplus \cdots \oplus N'_{t''}$ where $N'_j$ is a non-trivial cyclic submodule of $M'$ having generator $v'_j$ of order $d'_j$ in $M'$ such that $d'_i \mid d'_j$ for $1 \le i \le j \le t''$. Show $M \cong M'$ if and only if $t' = t''$ and $d_j \equiv d'_j$ for $1 \le j \le t'$.
Hint: Use (a), (b) and (c) above and generalise Theorem 3.7.
The sequence $(\langle d_1 \rangle, \langle d_2 \rangle, \ldots, \langle d_{t'} \rangle)$ of ideals of the PID $R$ is called the invariant factor sequence of the f.g. $R$-module $M$.
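For checking answers to Questions 1 and 2 numerically, the diagonal of the Smith normal form can also be computed from determinantal divisors. Over $\mathbb{Z}$, $d_1 d_2 \cdots d_k$ equals the gcd of all $k \times k$ minors of $A$. This standard fact is not proved in this chapter. This route is far less efficient than the reduction by elementary operations used in Chapter 1, and it does not produce $P$ and $Q$. The Python sketch below uses the hypothetical function names `smith_diagonal` and `group_type`. It assumes the rows of $A$ are a complete set of defining relations, so that $G \cong \mathbb{Z}^t/K$ with $K$ the row space of $A$.

```python
from itertools import combinations
from math import gcd


def det(m):
    """Exact integer determinant by cofactor expansion (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))


def smith_diagonal(A):
    """Diagonal entries d_1, ..., d_min(s,t) of the Smith normal form of A."""
    s, t = len(A), len(A[0])
    r = min(s, t)
    diag, previous = [], 1
    for k in range(1, r + 1):
        # k-th determinantal divisor: gcd of all k x k minors of A
        g = 0
        for rows in combinations(range(s), k):
            for cols in combinations(range(t), k):
                g = gcd(g, det([[A[i][j] for j in cols] for i in rows]))
        if g == 0:
            # every k x k minor vanishes, so d_k = ... = d_r = 0
            return diag + [0] * (r - k + 1)
        diag.append(g // previous)
        previous = g
    return diag


def group_type(A):
    """(invariant factors, torsion-free rank) of Z^t modulo the row space of A."""
    t = len(A[0])
    diag = smith_diagonal(A)
    nonzero = [d for d in diag if d != 0]
    return [d for d in nonzero if d != 1], t - len(nonzero)


print(group_type([[4, 2], [10, 2]]))  # ([2, 6], 0)
```

Entries equal to 1 are dropped because they contribute trivial summands, and each zero on the diagonal contributes one infinite cyclic summand.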

3.2 Primary Decomposition of Finite Abelian Groups

Let $G$ be a finite abelian group. From Theorem 3.7 we know that there is a unique sequence $(d_1, d_2, \ldots, d_{s'})$ of $s'$ positive integers with $d_1 \ne 1$, $d_i \mid d_{i+1}$ ($1 \le i < s'$) such that $G = H_1 \oplus H_2 \oplus \cdots \oplus H_{s'}$ where $H_i$ is a cyclic subgroup of $G$ having order $d_i$ ($1 \le i \le s'$). Although invariant factor decompositions, as above, score top marks for elegance, they do have some disadvantages: for one thing, the subgroups $H_i$ are not usually unique. Here we discuss the primary decomposition of $G$, which is analogous to resolving the positive integer $|G|$ into prime powers (the fundamental theorem of arithmetic is stated at the end of Section 1.2), and does have an important uniqueness property. However there is a ‘down-side’ to this approach: there is no practical algorithm known for obtaining the prime factorisation of a general positive integer $n$. Here we dodge the issue by assuming that the factorisation of $|G|$ has already been done!

Consider first an additive abelian group $G$ of order $144 = 16 \times 9$. Then $G$ has subgroups $G_2 = \{g \in G : 16g = 0\}$ and $G_3 = \{g \in G : 9g = 0\}$. We’ll see shortly that $G_2$ and $G_3$ are the unique subgroups of $G$ having orders 16 and 9 respectively. Further $G$ has primary decomposition $G = G_2 \oplus G_3$, showing that $G$ is completely specified by its primary components $G_2$ and $G_3$. Isomorphisms respect decompositions of this type and so the analysis of $G$ is reduced to that of $G_2$ and $G_3$. It turns out that there are five isomorphism classes of abelian groups $G_2$ with $|G_2| = 16$ and two isomorphism classes of abelian groups $G_3$ with $|G_3| = 9$. Hence there are $5 \times 2 = 10$ isomorphism classes of abelian groups $G$ of order 144.

The primary components of $G$ are not cyclic in general. Our aim here is to obtain an invariant factor decomposition of each primary component. We will do this by using the primary decomposition of each cyclic subgroup $H_i$ in an invariant factor decomposition, as above, of $G$. The ultimate outcome is a decomposition of $G$ into the internal direct sum of a number of cyclic subgroups of prime power order, and all subgroups of this type are indecomposable: they cannot themselves be expressed as a direct sum in a non-trivial way.


Let $|G| = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$ be the factorisation of the order $|G|$ of $G$ into positive powers of distinct primes $p_1, p_2, \ldots, p_k$. The $p_j$-component of $G$ is
$$G_{p_j} = \{g \in G : p_j^{n_j} g = 0\} \quad\text{for } 1 \le j \le k.$$

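We know that $|G|g = 0$ for all $g$ in $G$ by the $|G|$-lemma of Section 2.2, so the order of every element divides $|G|$. Hence $G_{p_j}$ consists of those elements of $G$ whose orders are powers of the prime $p_j$: an element of order $p_j^a$ satisfies $p_j^a \mid |G|$, so $a \le n_j$. It is straightforward to verify that $G_{p_j}$ is a subgroup of $G$. Collectively $G_{p_1}, G_{p_2}, \ldots, G_{p_k}$ are called the primary components of $G$.

For example $G = \mathbb{Z}_6 \oplus \mathbb{Z}_{20}$ has order $|G| = 6 \times 20 = 2^3 \times 3 \times 5$. The primary components of $G$ are
$$G_2 = \langle (3,0), (0,5) \rangle \cong \mathbb{Z}_2 \oplus \mathbb{Z}_4, \quad G_3 = \langle (2,0) \rangle \cong \mathbb{Z}_3, \quad G_5 = \langle (0,4) \rangle \cong \mathbb{Z}_5.$$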
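The primary components in this example are small enough to list by brute force. The Python sketch below is our own illustration, and its function names are hypothetical. It represents $\mathbb{Z}_6 \oplus \mathbb{Z}_{20}$ as pairs of residues and applies the definition of $G_{p_j}$ directly. The order of an element is the lcm of the orders of its coordinates. This is the formula of Exercises 3.1, Question 4(c), with the $h_i$ taken as the standard generators.

```python
from itertools import product
from math import gcd


def element_order(g, moduli):
    """Order of g in Z_m1 + ... + Z_mt: lcm of m_i / gcd(g_i, m_i)."""
    order = 1
    for gi, mi in zip(g, moduli):
        oi = mi // gcd(gi, mi)
        order = order * oi // gcd(order, oi)
    return order


def primary_component(moduli, p, n):
    """All g with p**n * g == 0, straight from the definition of G_p."""
    q = p ** n
    return [g for g in product(*(range(m) for m in moduli))
            if all(q * gi % mi == 0 for gi, mi in zip(g, moduli))]


moduli = (6, 20)                      # G = Z_6 + Z_20, |G| = 2^3 * 3 * 5
G2 = primary_component(moduli, 2, 3)
G3 = primary_component(moduli, 3, 1)
G5 = primary_component(moduli, 5, 1)
print(len(G2), len(G3), len(G5))                    # 8 3 5
print(max(element_order(g, moduli) for g in G2))    # 4
```

Since $|G_2| = 8$ but no element of $G_2$ has order 8, $G_2$ is not cyclic. This is consistent with $G_2 \cong \mathbb{Z}_2 \oplus \mathbb{Z}_4$.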

Consider an isomorphism $\alpha : G \cong G'$ between the finite abelian groups $G$ and $G'$. Then $|G| = |G'|$ and $G'_{p_j} = \{g' \in G' : p_j^{n_j} g' = 0\}$ is the $p_j$-component of $G'$. For $g \in G_{p_j}$ we see $p_j^{n_j}(g)\alpha = (p_j^{n_j} g)\alpha = (0)\alpha = 0$ using the $\mathbb{Z}$-linearity of $\alpha$. So $(G_{p_j})\alpha \subseteq G'_{p_j}$, showing that $\alpha$ maps $G_{p_j}$ to $G'_{p_j}$. In the same way $(G'_{p_j})\alpha^{-1} \subseteq G_{p_j}$. So $\alpha$, restricted to $G_{p_j}$, is an isomorphism $\alpha| : G_{p_j} \cong G'_{p_j}$. We have shown:

Isomorphic finite abelian groups have isomorphic primary components.

Taking $G = G'$ gives $(G_{p_j})\alpha = G_{p_j}$ for all automorphisms $\alpha$ of $G$. Therefore isomorphisms and automorphisms respect primary components.

We now show that every finite abelian group is the internal direct sum of its primary components.

Theorem 3.10 (The primary decomposition of finite abelian groups)

Let $G$ be a finite abelian group and suppose $|G| = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$ where $p_1, p_2, \ldots, p_k$ are distinct primes. Then
$$G = G_{p_1} \oplus G_{p_2} \oplus \cdots \oplus G_{p_k}$$
where $G_{p_1}, G_{p_2}, \ldots, G_{p_k}$ are the primary components of $G$.

Proof

Write $m_j = |G|/p_j^{n_j}$. So $m_j$ is the product of the $k - 1$ prime powers $p_i^{n_i}$ where $i \ne j$. The $k$ positive integers $m_1, m_2, \ldots, m_k$ are coprime, meaning $\gcd\{m_1, m_2, \ldots, m_k\} = 1$, as there is no common prime divisor of $m_1, m_2, \ldots, m_k$. By Corollary 1.16 there are integers $a_1, a_2, \ldots, a_k$ such that $a_1m_1 + a_2m_2 + \cdots + a_km_k = 1$. For each $g$ in $G$ we have
$$\begin{aligned} g = 1g &= (a_1m_1 + a_2m_2 + \cdots + a_km_k)g \\ &= a_1m_1g + a_2m_2g + \cdots + a_km_kg \\ &= g_1 + g_2 + \cdots + g_k \end{aligned}$$
where $g_j = a_jm_jg$ ($1 \le j \le k$). Now $p_j^{n_j} m_j = |G|$ and hence $p_j^{n_j} g_j = a_j|G|g = 0$ as $|G|g = 0$ by the $|G|$-lemma. So $g_j \in G_{p_j}$ for $1 \le j \le k$. We have shown $G = G_{p_1} + G_{p_2} + \cdots + G_{p_k}$, as each element of $G$ is a sum of $k$ elements, one from each of the $k$ primary components $G_{p_j}$.

To show that the sum of the primary components is direct, suppose
$$g_1 + g_2 + \cdots + g_k = 0 \quad\text{where } g_j \in G_{p_j}.$$

We fix our attention on one particular term $g_j$. For $i \ne j$ the positive integer $m_j$ has factor $p_i^{n_i}$, and so $m_jg_i = 0$ as $p_i^{n_i} g_i = 0$. Inserting $k - 1$ zero terms $m_jg_i$ gives
$$m_jg_j = m_jg_j + \sum_{i=1,\, i \ne j}^{k} m_jg_i = \sum_{i=1}^{k} m_jg_i = m_j\Big(\sum_{i=1}^{k} g_i\Big) = m_j \times 0 = 0.$$

We now know $m_jg_j = 0$ and $p_j^{n_j} g_j = 0$ with $\gcd\{m_j, p_j^{n_j}\} = 1$, and so $g_j$ doesn’t stand a chance! The positive integer $m_i$ has factor $p_j^{n_j}$ for $i \ne j$. So the integer $1 - a_jm_j = \sum_{i=1,\, i \ne j}^{k} a_im_i$ has factor $p_j^{n_j}$, and hence $a'_j = (1 - a_jm_j)/p_j^{n_j}$ is an integer. Therefore
$$\begin{aligned} g_j = 1g_j &= \big(a_jm_j + ((1 - a_jm_j)/p_j^{n_j})\,p_j^{n_j}\big)g_j \\ &= a_jm_jg_j + a'_jp_j^{n_j}g_j = a_j \times 0 + a'_j \times 0 = 0 \end{aligned}$$
for $1 \le j \le k$. The equation $g_1 + g_2 + \cdots + g_k = 0$ where $g_j \in G_{p_j}$ for $1 \le j \le k$ holds only in the case $g_1 = g_2 = \cdots = g_k = 0$. By Definition 2.14 the primary components of $G$ are independent. By Lemma 2.15 the sum of the primary components of $G$ is direct, that is, $G = G_{p_1} \oplus G_{p_2} \oplus \cdots \oplus G_{p_k}$. ∎

As an illustration we look at the cyclic group $G = \mathbb{Z}_{360}$. Then $|G| = 2^3 \times 3^2 \times 5$ and with $p_1 = 2$, $p_2 = 3$, $p_3 = 5$ and $n_1 = 3$, $n_2 = 2$, $n_3 = 1$ we obtain $m_1 = 45$, $m_2 = 40$, $m_3 = 72$. The element 1 of $G$ has order 360. By Lemma 2.7 the element $45 = 45(1)$ has order $360/\gcd\{45, 360\} = 360/45 = 8$, and so $\langle 45 \rangle = G_2$ is the 2-component of $G$. Similarly 40 has order 9 and 72 has order 5. So $\langle 40 \rangle = G_3$ is the 3-component of $G$ and $\langle 72 \rangle = G_5$ is the 5-component of $G$. The primary components are cyclic, as they must be by Lemma 2.2, and we’ve expressed their generators as multiples of the generator 1 of $G$. Therefore $G = \langle 45 \rangle \oplus \langle 40 \rangle \oplus \langle 72 \rangle$ is the primary decomposition (Theorem 3.10) of $G$.
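The proof of Theorem 3.10 becomes constructive once integers $a_j$ with $\sum_j a_j m_j = 1$ are known. The Python sketch below finds one such choice by repeated use of the extended Euclidean algorithm. That algorithm is our own choice; the proof only needs the existence of the $a_j$, via Corollary 1.16. The sketch then splits an element $g$ of $\mathbb{Z}_{360}$ into its components $g_j = a_j m_j g$. The function names are hypothetical, and the $a_j$ produced are one valid choice among many.

```python
from math import gcd


def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b), for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y


def bezout(ms):
    """Integers a_j with sum(a_j * m_j) == gcd(m_1, ..., m_k), built pairwise."""
    g, coeffs = ms[0], [1]
    for m in ms[1:]:
        g, x, y = extended_gcd(g, m)
        coeffs = [c * x for c in coeffs] + [y]
    return g, coeffs


def primary_parts(g, order, prime_powers):
    """Components g_j = a_j * m_j * g of g in Z_order, where m_j = order / p_j^n_j."""
    ms = [order // q for q in prime_powers]
    one, a = bezout(ms)
    assert one == 1                   # the m_j are coprime
    return [aj * mj * g % order for aj, mj in zip(a, ms)]


parts = primary_parts(1, 360, [8, 9, 5])      # m_1, m_2, m_3 = 45, 40, 72
print(parts)                                  # [225, 280, 216]
print([360 // gcd(x, 360) for x in parts])    # [8, 9, 5]
```

Here $225 \in \langle 45 \rangle$, $280 \in \langle 40 \rangle$ and $216 \in \langle 72 \rangle$, and $225 + 280 + 216 = 721 \equiv 1 \pmod{360}$, so the generator 1 splits as the proof predicts.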


From Theorem 3.10 we deduce, as in Exercises 3.2, Question 5(b):

Any two finite abelian groups of equal order having isomorphic primary components are isomorphic.

Definition 3.11

The exponent of a finite abelian group $G$ is the smallest positive integer $n$ with $nG = \{0\}$, that is, $ng = 0$ for all $g \in G$.

The exponent of $G = \mathbb{Z}_6 \oplus \mathbb{Z}_{20}$ is 60: on the one hand $60G$ is trivial, and on the other hand the element $(1,1)$ of $G$ has order 60, showing $n'G \ne \{0\}$ for all integers $n'$ with $1 \le n' < 60$. The invariant factor sequence of $G$ is $(2, 60)$ and, as we prove next, the largest invariant factor of every non-trivial finite abelian group is its exponent.
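A brute-force check of this example follows as a Python sketch. The helper name `exponent` is hypothetical. It searches for the least $n \ge 1$ that kills every element. It then compares $\mathbb{Z}_6 \oplus \mathbb{Z}_{20}$ with $\mathbb{Z}_2 \oplus \mathbb{Z}_{60}$, the group built from its invariant factor sequence.

```python
from itertools import product


def exponent(moduli):
    """Least n >= 1 with n*g == 0 for all g in Z_m1 + ... + Z_mt (brute force)."""
    elements = list(product(*(range(m) for m in moduli)))
    n = 1
    # a non-zero remainder means some coordinate of some n*g is non-zero
    while any(n * gi % mi for g in elements for gi, mi in zip(g, moduli)):
        n += 1
    return n


print(exponent((6, 20)), exponent((2, 60)))   # 60 60
```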

Corollary 3.12

Let $(d_1, d_2, \ldots, d_{s'})$ be the invariant factor sequence of the non-trivial finite abelian group $G$. Then $d_{s'}$ is the exponent of $G$ and $G$ has an element of order $d_{s'}$. Further $d_{s'}$ is a divisor of $|G|$ and $|G|$ is a divisor of $(d_{s'})^{s'}$. The order and the exponent of $G$ have the same prime divisors.

Proof

Let $K = \{m \in \mathbb{Z} : mG = \{0\}\}$ where $mG = \{mg : g \in G\}$. We know from Section 3.1 that $mG$ is a subgroup of $G$ for each integer $m$. It is routine to check that $K$ is an ideal of $\mathbb{Z}$. By the $|G|$-lemma of Section 2.2 we see $|G| \in K$. So $K$ is non-zero and hence $K = \langle n \rangle$ by Theorem 1.15, where $n$ is a positive integer. As $n \in K$ but $n' \notin K$ for all integers $n'$ with $1 \le n' < n$, the exponent of $G$ is $n$. As $|G| \in \langle n \rangle$ we deduce that $n$ is a divisor of $|G|$.

Now $G$ has subgroups $H_1, H_2, \ldots, H_{s'}$ such that $G = H_1 \oplus H_2 \oplus \cdots \oplus H_{s'}$ and $H_i$ is of isomorphism type $C_{d_i}$ for $1 \le i \le s'$ by Theorem 3.4. Also $d_i \mid d_{i+1}$ and so there is a positive integer $q_i$ with $d_iq_i = d_{i+1}$ for $1 \le i < s'$. Hence $d_i \mid d_{s'}$ for all $i$ with $1 \le i \le s'$ since $d_i(q_iq_{i+1} \cdots q_{s'-1}) = d_{s'}$. As $d_iH_i = \{0\}$ (in fact $H_i$ has exponent $d_i$) we see that $d_{s'}H_i = \{0\}$. Hence $d_{s'}G = d_{s'}H_1 \oplus d_{s'}H_2 \oplus \cdots \oplus d_{s'}H_{s'} = \{0\}$ and so $d_{s'} \in K$. Therefore $n \mid d_{s'}$. Let $h_{s'}$ generate the cyclic subgroup $H_{s'}$. Then $H_{s'} = \langle h_{s'} \rangle$ and $h_{s'}$ has order $d_{s'}$. As $nh_{s'} = 0$ we see $d_{s'} \mid n$ and so $n = d_{s'}$. Therefore $d_{s'}$ is the exponent of $G$.


From Corollary 3.5 we know $|G| = d_1d_2 \cdots d_{s'}$. Multiplying together the $s'$ equations $d_i(q_iq_{i+1} \cdots q_{s'-1}) = d_{s'}$ for $1 \le i \le s'$ (the product of the $q$'s being empty when $i = s'$) gives
$$|G|\, q_1 q_2^2 q_3^3 \cdots q_{s'-1}^{s'-1} = (d_{s'})^{s'}$$
from which we see that $|G|$ is a divisor of $(d_{s'})^{s'}$. Let $p$ be a prime divisor of $|G|$. Then $p \mid (d_{s'})^{s'}$ and so $p \mid d_{s'}$, as a prime divisor of a product must be a divisor of at least one of its factors. Conversely, as $d_{s'} \mid |G|$, all prime divisors of $d_{s'}$ are divisors of $|G|$. ∎

Let $G$ be a finite abelian group and suppose $|G| = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$ as in Theorem 3.10. We apply Corollary 3.12 to the primary component $G_{p_j}$ of $G$: as $p_j^{n_j} G_{p_j} = \{0\}$, the exponent of $G_{p_j}$ is a divisor of $p_j^{n_j}$ and so is a power of $p_j$. Therefore $|G_{p_j}|$ is also a power of $p_j$ by Corollary 3.12. From Theorem 3.10 we deduce
$$|G| = |G_{p_1}| \times |G_{p_2}| \times \cdots \times |G_{p_k}| \quad\text{and so}\quad |G_{p_j}| = p_j^{n_j} \text{ for } 1 \le j \le k$$
on comparing powers of $p_j$. Therefore the decomposition (Theorem 3.10) of $G$ into primary components corresponds to the factorisation of $|G|$ into prime powers.

Let $G$ be a finite abelian group of prime exponent $p$. Such a group is called an elementary abelian $p$-group. Groups of this kind and their automorphisms will be analysed in Section 3.3. By Corollary 3.12 each invariant factor of $G$ divides the exponent $p$, and none equals 1, so the invariant factor sequence of $G$ is $(p, p, \ldots, p)$, that is, the $s'$ invariant factors of $G$ are all equal to $p$, and $|G| = p^{s'}$. The one essential fact (as we will see) is that $G$ is the additive group of an $s'$-dimensional vector space over the field $\mathbb{Z}_p$. In particular the Klein 4-group has exponent 2 and its elements are the vectors of a 2-dimensional vector space over $\mathbb{Z}_2$.

Now $G = \mathbb{Z}_6 \oplus \mathbb{Z}_{20}$ has primary components $G_2 \cong \mathbb{Z}_2 \oplus \mathbb{Z}_4$, $G_3 \cong \mathbb{Z}_3$, $G_5 \cong \mathbb{Z}_5$. By Theorem 3.10 we obtain $G = G_2 \oplus G_3 \oplus G_5$, that is, $\mathbb{Z}_6 \oplus \mathbb{Z}_{20} \cong \mathbb{Z}_2 \oplus \mathbb{Z}_4 \oplus \mathbb{Z}_3 \oplus \mathbb{Z}_5$, which amounts to two applications of the Chinese remainder theorem, namely $\mathbb{Z}_6 \cong \mathbb{Z}_2 \oplus \mathbb{Z}_3$ and $\mathbb{Z}_{20} \cong \mathbb{Z}_4 \oplus \mathbb{Z}_5$, followed by a rearrangement of the summands (the terms in the direct sum). In fact the derivation of a decomposition of $G$ into the internal direct sum of cyclic subgroups of prime power order, starting from an invariant factor decomposition of $G$, which we carry out shortly, is nothing more than a systematic application of the Chinese remainder theorem 2.11.

Let $G$ be a cyclic group of prime power order $p^n$. The positive divisors of $p^n$ are $1, p, p^2, \ldots, p^n$, and they form a chain, meaning that every two integers $a, b$ in this list are such that either $a \mid b$ or $b \mid a$. By Lemma 2.2 every pair $H, H'$ of subgroups of $G$ satisfies either $H \subseteq H'$ or $H' \subseteq H$. Let’s suppose $H \subseteq H'$ and ask: is it possible for $G = H \oplus H'$? If so, each element $g$ in $G$ is uniquely expressible as $g = h + h'$ where $h \in H$, $h' \in H'$. Suppose, just for a moment, that there is a non-zero element $h$ in $H$.


Then $-h \in H$ and so $-h \in H'$. Hence the zero element 0 of $G$ is expressible as a sum of two elements, one from $H$ and one from $H'$, in two different ways: $0 = h + (-h)$ and $0 = 0 + 0$. By the uniqueness property of $H \oplus H'$ we deduce $H = \{0\}$. The conclusion is:

Cyclic groups of prime power order are indecomposable

that is, $G$ cannot be expressed in the form $G = H \oplus H'$ where both $H$ and $H'$ are non-trivial.

Now consider an arbitrary finite abelian group $G$ with invariant factor sequence $(d_1, d_2, \ldots, d_{s'})$ and let $G = H_1 \oplus H_2 \oplus \cdots \oplus H_{s'}$ where $H_i = \langle h_i \rangle$ is of isomorphism type $C_{d_i}$ for $1 \le i \le s'$. As before let $|G| = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$ where $p_1, p_2, \ldots, p_k$ are different primes and each $n_j \ge 1$ for $1 \le j \le k$. Each invariant factor $d_i$ has factorisation $d_i = p_1^{t_{i1}} p_2^{t_{i2}} \cdots p_k^{t_{ik}}$ where $t_{ij} \ge 0$. As $|G| = d_1d_2 \cdots d_{s'}$ we obtain $n_j = t_{1j} + t_{2j} + \cdots + t_{s'j}$ on comparing powers of $p_j$. For each $i$ let $k_i$ denote the number of positive exponents $t_{ij}$. As $d_i \mid d_{i+1}$ for $1 \le i < s'$ we see $k_1 \le k_2 \le \cdots \le k_{s'}$.

Write $m_{ij} = d_i/p_j^{t_{ij}}$. The generator $h_i$ of $H_i$ has order $d_i$ and so $m_{ij}h_i$ has order $d_i/\gcd\{m_{ij}, d_i\} = p_j^{t_{ij}}$ by Lemma 2.7. Let
$$H_{i,p_j} = \langle m_{ij}h_i \rangle \quad\text{in the case } t_{ij} > 0,$$
that is, $H_{i,p_j} = (H_i)_{p_j}$ is the $p_j$-component of $H_i$. By Theorem 3.10 the primary decomposition of $H_i$ is
$$H_i = \bigoplus_j H_{i,p_j}$$
as $H_i$ is the internal direct sum of its $k_i$ non-trivial primary components $H_{i,p_j}$.

The prime powers $p_j^{t_{ij}} = |H_{i,p_j}| > 1$ are called the elementary divisors of $G$.

The elementary divisors of $G$ are the prime powers $p_j^{t_{ij}} > 1$ occurring in the prime factorisations of the invariant factors of $G$, and so they too are invariants of $G$.

For example
$$\mathbb{Z}_{10} \oplus \mathbb{Z}_{100} \cong (\mathbb{Z}_2 \oplus \mathbb{Z}_5) \oplus (\mathbb{Z}_4 \oplus \mathbb{Z}_{25}) \cong (\mathbb{Z}_2 \oplus \mathbb{Z}_4) \oplus (\mathbb{Z}_5 \oplus \mathbb{Z}_{25})$$
has elementary divisors 2, 4; 5, 25, whereas
$$\mathbb{Z}_4 \oplus \mathbb{Z}_{100} \cong \mathbb{Z}_4 \oplus (\mathbb{Z}_4 \oplus \mathbb{Z}_{25}) \cong (\mathbb{Z}_4 \oplus \mathbb{Z}_4) \oplus \mathbb{Z}_{25}$$
has elementary divisors 4, 4; 25.

In the general case, replacing each $H_i$ by its primary decomposition gives
$$G = \bigoplus_{i,j} H_{i,p_j}$$
which expresses $G$ as the internal direct sum of $k_1 + k_2 + \cdots + k_{s'}$ non-trivial cyclic subgroups $H_{i,p_j}$ of prime power order. This decomposition of $G$ is ‘best’ in the sense that the number of non-trivial cyclic summands is as large as possible (Exercises 3.2, Question 6(b)). From it we now directly deduce the structure of the primary components of $G$. For each $j$ the direct sum of the subgroups $H_{i,p_j}$ is a subgroup of order
$$p_j^{t_{1j}} p_j^{t_{2j}} \cdots p_j^{t_{s'j}} = p_j^{n_j}.$$
It is contained in $G_{p_j}$ and has the same order, and so this direct sum is the $p_j$-component of $G$, that is,
$$G_{p_j} = \bigoplus_i H_{i,p_j} \qquad \text{(♣♣)}$$
Now $t_{1j} \le t_{2j} \le \cdots \le t_{s'j}$ since $d_i \mid d_{i+1}$ for $1 \le i < s'$. Hence, on omitting the initial trivial summands $H_{i,p_j}$ with $t_{ij} = 0$, (♣♣) above becomes an invariant factor decomposition, as in Theorem 3.7, of $G_{p_j}$ for $1 \le j \le k$. In other words:

The elementary divisors of each finite abelian group are the invariant factorsof its primary components.
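The passage from invariant factors to elementary divisors needs only the prime factorisation of each $d_i$. The Python sketch below uses hypothetical function names. Factorisation is by plain trial division, which, as remarked at the start of this section, is practical only for small numbers. Each $d_i$ contributes the prime powers $p_j^{t_{ij}}$ with $t_{ij} > 0$, and these are grouped by prime.

```python
def factorise(n):
    """Prime factorisation {p: exponent} of n >= 1 by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def elementary_divisors(invariant_factors):
    """Map each prime p to the sorted list of prime powers p**t_ij > 1 in the d_i."""
    by_prime = {}
    for d in invariant_factors:
        for p, t in factorise(d).items():
            by_prime.setdefault(p, []).append(p ** t)
    return {p: sorted(powers) for p, powers in sorted(by_prime.items())}


print(elementary_divisors([10, 100]))  # {2: [2, 4], 5: [5, 25]}
print(elementary_divisors([4, 100]))   # {2: [4, 4], 5: [25]}
```

By the statement above, the list for each prime is the invariant factor sequence of the corresponding primary component.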

We have now completed the analysis of an individual finite abelian group $G$. Our final task is to determine the number of isomorphism classes of abelian groups $G$ having a specified order $|G| = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$. We will see that this number depends on the powers $n_1, n_2, \ldots, n_k$ of the distinct prime divisors of $|G|$ and not on the prime divisors themselves. You already know one special case of this phenomenon: any two groups of prime order $p$ are isomorphic, that is, for each prime $p$ there is just one isomorphism class of groups of order $p$, each of these groups being cyclic. This example can be generalised: consider an abelian group $G$ such that $|G| = p_1p_2 \cdots p_k$ (a product of distinct primes). What could the sequence $(d_1, d_2, \ldots, d_{s'})$ of invariant factors of $G$ be? By Corollary 3.12 we have $d_{s'} \mid p_1p_2 \cdots p_k$ and also $p_j \mid d_{s'}$ for $1 \le j \le k$. Therefore $p_1p_2 \cdots p_k \mid d_{s'}$ and so $d_{s'} = p_1p_2 \cdots p_k$. As $|G|$ is the product of all the invariant factors of $G$ we see that $d_{s'}$ is the only invariant factor of $G$, that is, $s' = 1$ and $G$ is cyclic. In particular, all abelian groups of order $105 = 3 \times 5 \times 7$ are cyclic and hence any two are isomorphic.

The following table illustrates the relationship between the invariant factors of afinite abelian group G and its elementary divisors.

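$$\begin{array}{c|cccc}
\cong G & p_1 & p_2 & \cdots & p_k \\ \hline
d_1 & t_{11} & t_{12} & \cdots & t_{1k} \\
d_2 & t_{21} & t_{22} & \cdots & t_{2k} \\
\vdots & \vdots & \vdots & & \vdots \\
d_{s'} & t_{s'1} & t_{s'2} & \cdots & t_{s'k}
\end{array}$$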

The rows correspond to the invariant factors $d_i$ of $G$, and row $i$ contains $k_i$ non-zero entries. The columns correspond to the prime divisors $p_j$ of $|G|$, and the exponent $t_{ij}$ appears in row $i$ and column $j$. We suppose $p_j < p_{j+1}$ for $1 \le j < k$. Then isomorphic groups have identical tables and non-isomorphic groups have different tables, and so it is reasonable to refer to the table of the isomorphism class of the finite abelian group $G$. We use $\cong G$ to denote the isomorphism class of $G$.

The table of the isomorphism class of abelian groups G with invariant factor se-quence (2,6,60,600) is shown below.

$$\begin{array}{c|ccc}
\cong G & 2 & 3 & 5 \\ \hline
2 & 1 & 0 & 0 \\
6 & 1 & 1 & 0 \\
60 & 2 & 1 & 1 \\
600 & 3 & 1 & 2
\end{array}$$

The elementary divisors of $G$ are 2, 2, 4, 8; 3, 3, 3; 5, 25. The connection between elementary divisors and invariant factors is evident from the rows in the table. Starting at the bottom row: $8 \times 3 \times 25 = 600$, that is, the product of the largest elementary divisors for each prime divisor of $|G|$ is the largest invariant factor of $G$. From the next-to-last row we read off the product of the next-to-largest elementary divisors, which is the next-to-largest invariant factor of $G$, that is, $4 \times 3 \times 5 = 60$, and so on. Now $|G| = 2^7 \times 3^3 \times 5^3 = 432000$. How many isomorphism classes of abelian groups of order 432000 are there? Equivalently, how many tables are there such that the non-zero entries in the 2-column are non-decreasing and have sum 7, the non-zero entries in the 3-column are non-decreasing and have sum 3, and the non-zero entries in the 5-column are non-decreasing and have sum 3? The answer is 135; read on to find out why!
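The bottom-up reading of the table just described can be sketched in Python. The function name is hypothetical. For each prime, the elementary divisors are sorted in decreasing order and multiplied into the invariant factors, starting from the last one.

```python
def invariant_factors(elementary):
    """{p: [elementary divisors for p]} -> invariant factor sequence (d_1, ..., d_s')."""
    s = max((len(powers) for powers in elementary.values()), default=0)
    factors = [1] * s
    for powers in elementary.values():
        # largest power goes into d_s', next largest into d_(s'-1), and so on
        for offset, q in enumerate(sorted(powers, reverse=True)):
            factors[s - 1 - offset] *= q
    return tuple(factors)


print(invariant_factors({2: [2, 2, 4, 8], 3: [3, 3, 3], 5: [5, 25]}))  # (2, 6, 60, 600)
```

Aligning every column at the bottom row mirrors the placement of the zero entries at the top of each column of the table.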

We return to the general table and concentrate on the pj -column: any zero entriesoccur at the top and the remaining positive entries form a non-decreasing sequence(reading downwards) with sum nj .

Definition 3.13

Let $n$ be a non-negative integer. A partition of $n$ is a non-decreasing sequence $(t_1, t_2, \ldots, t_s)$ of positive integers with $t_1 + t_2 + \cdots + t_s = n$. The integers $t_i$ ($1 \le i \le s$) are called the parts of the partition.

The number of partitions of n is denoted by p(n).

The partitions of 4 are (1,1,1,1), (1,1,2), (1,3), (2,2), (4), and so $p(4) = 5$. You can check $p(1) = 1$, $p(2) = 2$, $p(3) = 3$. It is convenient to allow $s = 0$ in Definition 3.13, that is, the empty sequence $\emptyset$ is a partition of 0 and so $p(0) = 1$.
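Partitions in the sense of Definition 3.13, as non-decreasing sequences, can be generated recursively. Choose the first part $t_1$, then a partition of $n - t_1$ with no part less than $t_1$. The Python generator below is a sketch of this idea; its name and the keyword `smallest` are our own.

```python
def partitions(n, smallest=1):
    """Yield the partitions of n as non-decreasing tuples with every part >= smallest."""
    if n == 0:
        yield ()                      # the empty partition of 0
        return
    for first in range(smallest, n + 1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


print(list(partitions(4)))  # [(1, 1, 1, 1), (1, 1, 2), (1, 3), (2, 2), (4,)]
```

Counting the output for $n = 0, 1, \ldots, 10$ gives 1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42.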

The partition function $p(n)$ has been extensively studied, notably by the eighteenth-century Swiss mathematician Euler.


We now describe a way of calculating p(n) akin to the Pascal triangle method of computing binomial coefficients. Let

p(n, j) denote the number of partitions of n having no part less than j   (n ≥ 0, j ≥ 1).

Directly from Definition 3.13 we see that p(n,1) = p(n). Note p(0, j) = 1 for all j ≥ 1 as the partition ∅ of 0 has no parts, but p(n, j) = 0 for 1 ≤ n < j. Each partition (t1, t2, . . . , ts) of n with s ≥ 1 consists of a first part t1 and a partition (t2, . . . , ts) of n − t1 having no part less than t1. So there are p(n − t1, t1) partitions of n having first part t1. Therefore

p(n) = p(n − 1,1) + p(n − 2,2) + · · · + p(0, n)     (♦)

on counting up the partitions of n according as their first part is 1, 2, . . . , n. In the same way p(n, j) = p(n − j, j) + p(n − j − 1, j + 1) + · · · + p(0, n) on counting up the partitions of n having first part j, j + 1, . . . , n (1 ≤ j ≤ n). Therefore

p(n, j + 1) = p(n, j) − p(n − j, j)   for 1 ≤ j < n     (❤)

as p(n, j + 1) = p(n − j − 1, j + 1) + · · · + p(0, n). Using (♦) and (❤) the array having p(n, j) in row n and column j can be constructed row by row. The first few rows are shown in the following table. Suppose rows 0, 1, 2, . . . , 9 have been completed. From (♦) with n = 10 we obtain

p(10,1) = p(10) = p(9,1) + p(8,2) + · · · + p(0,10)
        = 30 + 7 + 2 + 1 + 1 + 0 + 0 + 0 + 0 + 1,

that is, p(10) = 42. The remaining non-zero entries in row 10 can be found using (❤) with n = 10, putting j = 1, 2, . . . , 9 successively. So

p(10,2) = p(10,1) − p(9,1) = 42 − 30 = 12,
p(10,3) = p(10,2) − p(8,2) = 12 − 7 = 5

and so on to complete row 10.


 n   p(n)  p(n,2)  p(n,3)  p(n,4)  p(n,5)  p(n,6)  p(n,7)  p(n,8)  p(n,9)  p(n,10)  . . .
 0     1      1       1       1       1       1       1       1       1       1     . . .
 1     1      0       0       0       0       0       0       0       0       0     . . .
 2     2      1       0       0       0       0       0       0       0       0     . . .
 3     3      1       1       0       0       0       0       0       0       0     . . .
 4     5      2       1       1       0       0       0       0       0       0     . . .
 5     7      2       1       1       1       0       0       0       0       0     . . .
 6    11      4       2       1       1       1       0       0       0       0     . . .
 7    15      4       2       1       1       1       1       0       0       0     . . .
 8    22      7       3       2       1       1       1       1       0       0     . . .
 9    30      8       4       2       1       1       1       1       1       0     . . .
10    42     12       5       3       2       1       1       1       1       1     . . .
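The row-by-row construction can be sketched in Python. This is an illustrative sketch only, and the name `partition_table` is ours. It fills column 1 of each row from (♦) and the remaining columns from (❤), exactly as in the computation of row 10 described above.

```python
def partition_table(N):
    """P[n][j] = p(n, j) for 0 <= n <= N and 1 <= j <= N + 1 (column 0 unused)."""
    P = [[0] * (N + 2) for _ in range(N + 1)]
    for j in range(1, N + 2):
        P[0][j] = 1  # the empty partition of 0 has no part less than j
    for n in range(1, N + 1):
        # (♦): p(n) = p(n-1, 1) + p(n-2, 2) + ... + p(0, n)
        P[n][1] = sum(P[n - t][t] for t in range(1, n + 1))
        # (❤): p(n, j+1) = p(n, j) - p(n-j, j) for 1 <= j < n
        for j in range(1, n):
            P[n][j + 1] = P[n][j] - P[n - j][j]
        # entries with j > n stay 0, since p(n, j) = 0 for 1 <= n < j
    return P


P = partition_table(14)
print(P[10][1:11])  # [42, 12, 5, 3, 2, 1, 1, 1, 1, 1]
```

Only subtractions and one sum per row are needed, which is what makes the table quick to extend by hand.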

A readable account of a more efficient method of calculating p(n) is in Chapter 19 of Norman Biggs, Discrete Mathematics, OUP, 1985.

Let G be an abelian group of order $p^n$, where p is prime, having invariant factor sequence $(d_1, d_2, \ldots, d_{s'})$. We know from Corollary 3.5 that $d_1 d_2 \cdots d_{s'} = p^n$ and so $d_i = p^{t_i}$ for $1 \le i \le s'$. Further, as $d_i \mid d_{i+1}$, on comparing powers of p it follows that $(t_1, t_2, \ldots, t_{s'})$ is a partition (Definition 3.13) of n. Conversely every partition of n arises in this way from the invariant factor sequence of some abelian group of order $p^n$, and so:

There are p(n) isomorphism classes of abelian groups of order $p^n$.

In particular there are five isomorphism classes of abelian groups of order 16, their isomorphism types being C2 ⊕ C2 ⊕ C2 ⊕ C2, C2 ⊕ C2 ⊕ C4, C2 ⊕ C8, C4 ⊕ C4, C16, corresponding to the five partitions of 4.

Now consider the general case of an abelian group G of order $|G| = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$. By the above reasoning the primary component $G_{p_j}$ of G belongs to one of the $p(n_j)$ isomorphism classes of abelian groups of order $p_j^{n_j}$. So the non-zero entries in the $p_j$-column of the table of the isomorphism class of G are precisely the parts of a partition of $n_j$ ($1 \le j \le k$). Hence using Theorem 3.10 we deduce:

There are $p(n_1) p(n_2) \cdots p(n_k)$ isomorphism classes of abelian groups of order $p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$

as the primary components are independent of each other. For example consider isomorphism classes of abelian groups of order $144 = 2^4 \times 3^2$. Such a class corresponds to a row in the following table:


Primary decomposition                      Invariant factor decomposition
(2-component ⊕ 3-component)                (isomorphism type)

Z2 ⊕ Z2 ⊕ Z2 ⊕ Z2 ⊕ Z3 ⊕ Z3                C2 ⊕ C2 ⊕ C6 ⊕ C6
Z2 ⊕ Z2 ⊕ Z2 ⊕ Z2 ⊕ Z9                     C2 ⊕ C2 ⊕ C2 ⊕ C18
Z2 ⊕ Z2 ⊕ Z4 ⊕ Z3 ⊕ Z3                     C2 ⊕ C6 ⊕ C12
Z2 ⊕ Z2 ⊕ Z4 ⊕ Z9                          C2 ⊕ C2 ⊕ C36
Z2 ⊕ Z8 ⊕ Z3 ⊕ Z3                          C6 ⊕ C24
Z2 ⊕ Z8 ⊕ Z9                               C2 ⊕ C72
Z4 ⊕ Z4 ⊕ Z3 ⊕ Z3                          C12 ⊕ C12
Z4 ⊕ Z4 ⊕ Z9                               C4 ⊕ C36
Z16 ⊕ Z3 ⊕ Z3                              C3 ⊕ C48
Z16 ⊕ Z9                                   C144

As $144 = 2^4 \times 3^2$, the number of isomorphism classes of abelian groups of order 144 is p(4) × p(2) = 5 × 2 = 10. Each abelian group of order 144 is isomorphic to a group in one of the rows of the above table.
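Here is a minimal Python sketch, not from the text, that lists the isomorphism types of abelian groups of a given order N. It chooses one partition for each prime-power factor of N and multiplies along the rows of the resulting table. The helper names (`factorize`, `partitions`, `isomorphism_types`) are hypothetical, and the trial division assumes N is small. For N = 144 it should reproduce the ten invariant factor sequences in the right-hand column of the table, though not necessarily in the same order.

```python
from itertools import product


def factorize(N):
    """Return {p: n} with N equal to the product of the p**n (trial division)."""
    f, d = {}, 2
    while d * d <= N:
        while N % d == 0:
            f[d] = f.get(d, 0) + 1
            N //= d
        d += 1
    if N > 1:
        f[N] = f.get(N, 0) + 1
    return f


def partitions(n, smallest=1):
    """Yield the partitions of n as non-decreasing tuples (Definition 3.13)."""
    if n == 0:
        yield ()
        return
    for first in range(smallest, n + 1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def isomorphism_types(N):
    """Invariant factor sequences of the abelian groups of order N, one per class."""
    primes = sorted(factorize(N).items())
    types = []
    for choice in product(*(list(partitions(n)) for _, n in primes)):
        rows = max((len(t) for t in choice), default=0)
        seq = []
        for r in range(rows):
            d = 1
            for (p, _), t in zip(primes, choice):
                i = r - (rows - len(t))  # short columns get exponent 0 at the top
                if i >= 0:
                    d *= p ** t[i]
            seq.append(d)
        types.append(tuple(seq))
    return types


for seq in isomorphism_types(144):
    print(seq)
```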

There are p(7)p(3)p(3) = 15 × 3 × 3 = 135 isomorphism classes of abelian groups G of order $2^7 \times 3^3 \times 5^3$. Each such class corresponds to a triple of partitions (of 7, 3 and 3) arising from the three columns of its table, and the number s′ of rows in this table is the largest number of parts in any one of these partitions.
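The count $p(n_1) p(n_2) \cdots p(n_k)$ is easy to mechanise. The sketch below is an illustration only: it assumes N is small enough for trial division, and the names are ours. It computes p(n, j) from the same first-part decomposition used to derive (♦), memoised with `functools.lru_cache`.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def p_at_least(n, j):
    """p(n, j): the number of partitions of n having no part less than j."""
    if n == 0:
        return 1
    # classify by the first (smallest) part t = j, j+1, ..., n
    return sum(p_at_least(n - t, t) for t in range(j, n + 1))


def count_abelian_groups(N):
    """Number of isomorphism classes of abelian groups of order N >= 1."""
    count, d = 1, 2
    while d * d <= N:
        e = 0
        while N % d == 0:
            N //= d
            e += 1
        count *= p_at_least(e, 1)  # p(e) choices for the d-primary component
        d += 1
    # any leftover N > 1 is a prime to the first power, contributing p(1) = 1
    return count


print(count_abelian_groups(432000))  # 135
```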

The reader should have gained the impression that finitely generated abelian groups are ‘manageable’. This fact is exploited in other branches of mathematics: the homology of finite simplicial complexes in algebraic topology is a case in point. In Chapters 5 and 6 the analogous theory of F[x]-modules is developed, and this culminates in the canonical forms of square matrices over a field F.

EXERCISES 3.2

1. (a) Let G denote the additive group of Z10. List the elements in the primary components G2 and G5. Verify G = G2 + G5, G2 ∩ G5 = {0} and deduce G = G2 ⊕ G5. Let μ3 be the automorphism of G defined by (g)μ3 = 3g for all g ∈ G. Verify (G2)μ3 = G2 and (G5)μ3 = G5. Is AutG cyclic?
Hint: Use Exercises 2.1, Question 4(a).

(b) Let G denote the additive group of Z12. List the elements in the primary components G2 and G3. Verify G = G2 + G3, G2 ∩ G3 = {0} and deduce G = G2 ⊕ G3. Does G have elementary divisors 2, 2, 3 or 4, 3? Verify (G2)μ5 = G2 and (G3)μ5 = G3 where (g)μ5 = 5g for all g ∈ G. Is AutG cyclic?


(c) The cyclic group G has order $|G| = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$ where $p_1, p_2, \ldots, p_k$ are distinct primes. List the elementary divisors of G.

(d) The order of the finite abelian group G is $|G| = p^n m$ where gcd{p, m} = 1. Show that the p-component $G_p = \{g \in G : p^n g = 0\}$ is a subgroup of G. Show $mG = G_p$. Is $mG_p = G_p$?

2. (a) Let G denote the additive group of Z10 ⊕ Z12. State the invariant factors of G and those of its 2-component G2. List the elementary divisors of G. List the elementary divisors and invariant factors of all abelian groups of order |G| = 120.

(b) Show that there are six isomorphism classes of abelian groups of order 200. List the elementary divisors of these classes and their invariant factor sequences. Do any two have the same exponent?

(c) List the isomorphism types of the p(4) = 5 isomorphism classes of abelian groups of order 16. Do any two have the same exponent? List the elementary divisors and invariant factor sequences of the eight isomorphism classes of abelian groups of order $900^2$ having exponent 900.

(d) List the elementary divisors and the invariant factor sequences of the ten isomorphism classes of abelian groups of order 400. How many isomorphism classes of abelian groups of order 144 are there?

(e) Find three abelian groups having the same order (less than 100) and the same exponent but no two of which are isomorphic.

3. (a) List the p(5) = 7 partitions of 5. List the p(9,2) = 8 partitions of 9 having all parts ≥ 2.

(b) Calculate the entries for the row n = 11 in the table in the text following Definition 3.13. Hence show p(12) = 77. Find p(13) and p(14).

(c) Show from the definition that p(2n, n) = 2 and find a formula for p(3n, n).

(d) Find, directly from Definition 3.13, a formula for the number of partitions of n having all parts ≤ 2. Hence find the number of isomorphism classes of abelian groups of order $p^n$ having exponent $p^2$.

(e) Let j, k, n be integers with j ≥ 1, n ≥ 1 and 0 ≤ k ≤ ⌊n/j⌋. Explain why the number of partitions of n having all parts ≥ j and exactly k parts equal to j is p(n − jk, j + 1). Hence show that

$p(n, j) = \sum_{k=0}^{\lfloor n/j \rfloor} p(n - jk, j + 1)$.

Use the table following Definition 3.13 to check the above equation in the cases n = 10, j = 1, 2, 3, 4.

4. (a) Let G be the additive group of Z5 ⊕ Z5. How many elements of order 5 does G have? Is G an elementary abelian 5-group? (Yes/No). Show that G has 6 subgroups of order 5 and specify a generator of each. How many (ordered) pairs of subgroups H1, H2 of G are there with |H1| = |H2| = 5 and G = H1 ⊕ H2?


(b) Let G be the additive group of Zp ⊕ Zp where p is prime. Is G an elementary abelian p-group? Show that G has $p^2 - 1$ elements of order p and p + 1 subgroups H of order p. Show that H can be expressed either as H = 〈(1, t)〉 where t ∈ Zp or as H = 〈(0, 1)〉. How many (ordered) pairs of subgroups H1, H2 of G are there with |H1| = |H2| = p and G = H1 ⊕ H2?

(c) Let G be the additive group of Z9 ⊕ Z27 and let H = {h ∈ G : 3h = 0}. List the elements of H. Is H a subgroup of G? (Yes/No). If so, is H an elementary abelian 3-group? State the invariant factor sequences of G, H and G/H.

(d) Let G be an additive cyclic group of order $p^t$ where p is prime and t ≥ 1. Let H = {h ∈ G : ph = 0}. Show $H = p^{t-1}G$ using Lemma 2.7. State the orders of H and G/H.

(e) Let $(t_1, t_2, \ldots, t_s)$ be a partition of n (> 0) and suppose $G = H_1 \oplus H_2 \oplus \cdots \oplus H_s$ (internal direct sum) where $H_i$ is cyclic of order $p^{t_i}$ for 1 ≤ i ≤ s and p is prime. Write H = {h ∈ G : ph = 0}. Use (d) above to show $H = p^{t_1-1}H_1 \oplus p^{t_2-1}H_2 \oplus \cdots \oplus p^{t_s-1}H_s$. Deduce the order of H. State the invariant factor sequences of H and G/H. Under what condition on $t_s$ is G/H an elementary abelian p-group?

(f) The abelian group G has order $|G| = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$ where $p_1, p_2, \ldots, p_k$ are distinct primes and $n_j > 0$ for 1 ≤ j ≤ k. Use Theorem 3.10 and Corollary 3.12 to show that there are $n_1 n_2 \cdots n_k$ possibilities for the exponent of G.
Hint: Treat the case k = 1 first.
Let $H = \{h \in G : p_1 p_2 \cdots p_k h = 0\}$. Is H necessarily cyclic? Does H cyclic ⇒ G cyclic? Justify your answer.

5. (a) The additive abelian group G is the direct sum of its subgroups H1 and H2. Let α1 ∈ AutH1 and α2 ∈ AutH2. Show that α : G → G, defined by (h1 + h2)α = (h1)α1 + (h2)α2 for all h1 ∈ H1, h2 ∈ H2, is an automorphism of G. Show also (H1)α = H1 and (H2)α = H2. Write α = α1 ⊕ α2.
Let β ∈ AutG satisfy (H1)β = H1 and (H2)β = H2. Show that there are β1 ∈ AutH1 and β2 ∈ AutH2 with β = β1 ⊕ β2. Is L = {β ∈ AutG : (H1)β = H1, (H2)β = H2} a subgroup of AutG? (Yes/No) Is L ≅ AutH1 × AutH2? (Yes/No)

(b) Let G be a finite additive abelian group with $|G| = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$ where $n_1, n_2, \ldots, n_k$ are positive integers and $p_1, p_2, \ldots, p_k$ are distinct primes. Let $\alpha_j \in \mathrm{Aut}\,G_{p_j}$ for 1 ≤ j ≤ k. Using Theorem 3.10 and the notation of Question 5(a) above, show that $\alpha = \alpha_1 \oplus \alpha_2 \oplus \cdots \oplus \alpha_k$ is an automorphism of G; here $(g)\alpha = \sum_{j=1}^{k} (g_j)\alpha_j$ with $g = \sum_{j=1}^{k} g_j$ and $g_j \in G_{p_j}$ for 1 ≤ j ≤ k. Show also that each automorphism β ∈ AutG can be expressed as $\beta = \beta_1 \oplus \beta_2 \oplus \cdots \oplus \beta_k$ for unique $\beta_j \in \mathrm{Aut}\,G_{p_j}$. Deduce that

$\mathrm{Aut}\,G \cong \mathrm{Aut}\,G_{p_1} \times \mathrm{Aut}\,G_{p_2} \times \cdots \times \mathrm{Aut}\,G_{p_k}$,

that is, AutG is isomorphic to the external direct product (Exercises 2.3, Question 4(d)) of the automorphism groups of the primary components of G.
Hint: Consider β ↔ (β1, β2, . . . , βk) where β = β1 ⊕ β2 ⊕ · · · ⊕ βk.

6. (a) Show that a non-trivial finitely generated Z-module H is indecomposable if and only if H has isomorphism type either C0 or $C_{p^n}$ where p is prime.
Hint: Use Exercises 2.2, Question 6(b).

(b) Let G be a finitely generated Z-module with torsion-free rank r and torsion subgroup T. Let l denote the number of elementary divisors of T. Suppose G = H1 ⊕ H2 ⊕ · · · ⊕ Hm where Hi is a non-trivial submodule of G for 1 ≤ i ≤ m. Show m ≤ l + r.
Hint: Decompose each Hi into a direct sum of non-trivial indecomposable submodules using Exercises 3.1, Question 5(b), Theorem 3.4, Corollary 3.5 and Theorem 3.10.
Deduce m = l + r if and only if Hi is indecomposable for 1 ≤ i ≤ m.

3.3 Endomorphism Rings and Isomorphism Classes of Subgroups and Quotient Groups

Let G be an abelian group. An additive mapping α : G → G is called an endomorphism of G (see Lemma 2.9; we have used α in place of θ here, reserving θ for an important job in Theorem 3.15). So an endomorphism of the Z-module G is a Z-linear mapping of G to itself. Let α and α′ be endomorphisms of G. It is straightforward to verify that

their sum α + α′, defined by (g)(α + α′) = (g)α + (g)α′ for all g ∈ G,

is an endomorphism of G. The composition of α followed by α′, that is,

their product αα′, defined by (g)αα′ = ((g)α)α′ for all g ∈ G,

is also an endomorphism of G (Exercises 2.1, Question 4(d)). The set of all endomorphisms of G, together with the above binary operations of sum and product, is denoted by EndG.


Lemma 3.14

Let G be an abelian group. Then EndG is a ring and AutG = U(EndG).

Proof

Consider α and α′ in EndG. Then α + α′ and αα′ also belong to EndG. As (g)(α + α′) = (g)α + (g)α′ = (g)α′ + (g)α = (g)(α′ + α) for all g ∈ G, we see that the endomorphisms α + α′ and α′ + α are equal, that is, α + α′ = α′ + α. In fact (EndG, +), the set of all endomorphisms of G together with the binary operation of addition, is an abelian group, the additive group of EndG (Exercises 3.3, Question 1(a)). Let α′′ belong to EndG. The distributive law (α + α′)α′′ = αα′′ + α′α′′ holds as

(g)((α + α′)α′′) = ((g)(α + α′))α′′ = ((g)α + (g)α′)α′′

= ((g)α)α′′ + ((g)α′)α′′ = (g)αα′′ + (g)α′α′′ = (g)(αα′′ + α′α′′)

for all g ∈ G. The remaining laws of a ring may be verified in the same way and we leave this to the reader. Note that the zero endomorphism 0, defined by (g)0 = 0 for all g ∈ G, is the 0-element of EndG. The identity mapping ι : G → G, defined by (g)ι = g for all g ∈ G, is the 1-element of EndG.

The elements of the group U(EndG) are the invertible endomorphisms α of G, that is, those α ∈ EndG such that there is β ∈ EndG satisfying αβ = ι = βα. So each α ∈ U(EndG) is invertible and its inverse $\alpha^{-1} = \beta$ is also an endomorphism of G. From Definition 2.4 we see U(EndG) ⊆ AutG as each α ∈ U(EndG) is bijective. Conversely consider α ∈ AutG. From Exercises 2.1, Question 4(d) we deduce $\alpha^{-1} \in$ AutG. As AutG ⊆ EndG we conclude $\alpha^{-1} \in$ EndG. So α ∈ U(EndG), showing AutG ⊆ U(EndG). Therefore U(EndG) = AutG. □

Just as AutG is usually a non-abelian group, so EndG is in general a non-commutative ring. We now look at three particular types of f.g. abelian groups G and show that in each case EndG is a ring already familiar to the reader.

First suppose G is free of rank t. Let v1, v2, . . . , vt be a Z-basis of G. The reader is reminded that Mt(Z) denotes the ring of all t × t matrices over Z. Each endomorphism α of G gives rise to a matrix A = (aij) in Mt(Z) where

$(v_i)\alpha = \sum_{j=1}^{t} a_{ij} v_j$ for $1 \le i \le t$.     (♣♣♣)

It is usual to call A the matrix of α relative to v1, v2, . . . , vt. You are certain to have met this concept in the context of linear mappings of finite-dimensional vector spaces (this topic is revised in Definition 5.1). The mapping θ : EndG → Mt(Z) is defined by (α)θ = A for all α ∈ EndG, that is, each endomorphism α of G is mapped by θ to its matrix A relative to the Z-basis v1, v2, . . . , vt of G. It is reasonable to expect (even take for granted) that if the endomorphisms α and α′ of G have matrices A and A′ respectively relative to v1, v2, . . . , vt, then αα′ has matrix AA′ relative to v1, v2, . . . , vt. Indeed this property is included in our next theorem. However the reader should be aware that the innocent-looking equations (♣♣♣) only ‘work’ because we have adopted the notation (g)α (rather than the more usual α(g)) for the image of g under the mapping α; so throughout αα′ means: first apply α and secondly apply α′.
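To see why the order of composition matters with the (g)α convention, here is a small Python sketch using two hypothetical endomorphisms of a free abelian group of rank 2. Coordinates are row vectors, and (♣♣♣) puts the coordinates of $(v_i)\alpha$ in row i of A. So the image of $g = \lambda_1 v_1 + \lambda_2 v_2$ has coordinates $(\lambda_1, \lambda_2)A$. Applying α and then α′ agrees with multiplying by AA′, not by A′A.

```python
def image_coords(row, A):
    """Coordinates of (g)alpha, given the coordinate row vector of g and the
    matrix A of alpha from (♣♣♣), whose row i holds the coordinates of (v_i)alpha."""
    t = len(A)
    return [sum(row[i] * A[i][j] for i in range(t)) for j in range(t)]


def matmul(A, B):
    """Ordinary product AB of two t x t integer matrices."""
    t = len(A)
    return [[sum(A[i][j] * B[j][k] for j in range(t)) for k in range(t)]
            for i in range(t)]


A = [[1, 2], [0, 1]]    # hypothetical alpha:  (v1)alpha = v1 + 2v2, (v2)alpha = v2
A2 = [[0, 1], [1, 0]]   # hypothetical alpha': swaps v1 and v2

g = [3, 5]              # g = 3v1 + 5v2
first_alpha_then_alpha2 = image_coords(image_coords(g, A), A2)
print(first_alpha_then_alpha2)             # [11, 3]
print(image_coords(g, matmul(A, A2)))      # [11, 3], agreeing with AA'
print(image_coords(g, matmul(A2, A)))      # [5, 13], so A'A would be wrong
```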

Theorem 3.15

Let G be a free abelian group of rank t and let v1, v2, . . . , vt be a Z-basis of G. Let θ : EndG → Mt(Z) be defined by (α)θ = A for all α ∈ EndG, where A = (aij) in Mt(Z) is given by (♣♣♣) above. Then θ : EndG ≅ Mt(Z) is a ring isomorphism.

Proof

Consider α and α′ in EndG having matrices A = (aij) and A′ = (a′ij) respectively relative to the Z-basis v1, v2, . . . , vt of G. So (α)θ = A and (α′)θ = A′. Then $(v_i)\alpha = \sum_{j=1}^{t} a_{ij} v_j$ and $(v_i)\alpha' = \sum_{j=1}^{t} a'_{ij} v_j$ for 1 ≤ i ≤ t by (♣♣♣). Adding these equations together gives

$$(v_i)(\alpha + \alpha') = (v_i)\alpha + (v_i)\alpha' = \sum_{j=1}^{t} a_{ij} v_j + \sum_{j=1}^{t} a'_{ij} v_j = \sum_{j=1}^{t} (a_{ij} + a'_{ij}) v_j$$

for 1 ≤ i ≤ t, showing that α + α′ has matrix (aij + a′ij) = A + A′ relative to v1, v2, . . . , vt. So (α + α′)θ = A + A′ = (α)θ + (α′)θ, that is, θ is additive.

In order to find (αα′)θ we first replace the dummy suffixes i, j in $(v_i)\alpha' = \sum_{j=1}^{t} a'_{ij} v_j$ by j, k, giving $(v_j)\alpha' = \sum_{k=1}^{t} a'_{jk} v_k$ for 1 ≤ j ≤ t. Then

$$(v_i)\alpha\alpha' = ((v_i)\alpha)\alpha' = \Big(\sum_{j=1}^{t} a_{ij} v_j\Big)\alpha' = \sum_{j=1}^{t} a_{ij} (v_j)\alpha' = \sum_{j=1}^{t} a_{ij} \Big(\sum_{k=1}^{t} a'_{jk} v_k\Big) = \sum_{k=1}^{t} \Big(\sum_{j=1}^{t} a_{ij} a'_{jk}\Big) v_k$$

for 1 ≤ i ≤ t. On replacing α and the dummy suffix j in (♣♣♣) by αα′ and k, this shows that the matrix of αα′ relative to v1, v2, . . . , vt is AA′, as its (i, k)-entry is $\sum_{j=1}^{t} a_{ij} a'_{jk}$. We've shown (αα′)θ = AA′ = (α)θ(α′)θ for all α and α′ in EndG.


As (ι)θ = I, that is, θ maps the 1-element of EndG to the 1-element of Mt(Z), we conclude that θ is a ring homomorphism.

Finally we show that θ is bijective. Consider A = (aij) ∈ Mt(Z). Suppose there is an α in EndG having matrix A relative to v1, v2, . . . , vt, that is, (α)θ = A. Let g ∈ G. There are unique integers λ1, λ2, . . . , λt such that g = λ1v1 + λ2v2 + · · · + λtvt. Write (λ1, λ2, . . . , λt)A = (μ1, μ2, . . . , μt), that is, $\sum_{i=1}^{t} \lambda_i a_{ij} = \mu_j$ for 1 ≤ j ≤ t. From (♣♣♣) we know $(v_i)\alpha = \sum_{j=1}^{t} a_{ij} v_j$ and so

$$(g)\alpha = \Big(\sum_{i=1}^{t} \lambda_i v_i\Big)\alpha = \sum_{i=1}^{t} \lambda_i (v_i)\alpha = \sum_{i=1}^{t} \lambda_i \Big(\sum_{j=1}^{t} a_{ij} v_j\Big) = \sum_{j=1}^{t} \Big(\sum_{i=1}^{t} \lambda_i a_{ij}\Big) v_j = \sum_{j=1}^{t} \mu_j v_j.$$

This tells us that α maps $g = \sum_{i=1}^{t} \lambda_i v_i$ to the element $\sum_{j=1}^{t} \mu_j v_j$ of G where (λ1, λ2, . . . , λt)A = (μ1, μ2, . . . , μt). So there is at most one endomorphism α of G with (α)θ = A, showing θ to be injective.

We have still to prove that there is at least one endomorphism α of G with (α)θ = A. To do this let β : G → G be defined by $(g)\beta = \sum_{j=1}^{t} \mu_j v_j$ where $g = \sum_{i=1}^{t} \lambda_i v_i$ and $\sum_{i=1}^{t} \lambda_i a_{ij} = \mu_j$ for 1 ≤ j ≤ t. Is β an endomorphism of G?

Consider g′ ∈ G. There are integers λ′i (1 ≤ i ≤ t) with $g' = \sum_{i=1}^{t} \lambda'_i v_i$. Then $(g')\beta = \sum_{j=1}^{t} \mu'_j v_j$ where $\sum_{i=1}^{t} \lambda'_i a_{ij} = \mu'_j$ for 1 ≤ j ≤ t. Then $g + g' = \sum_{i=1}^{t} (\lambda_i + \lambda'_i) v_i$ and $\sum_{i=1}^{t} (\lambda_i + \lambda'_i) a_{ij} = \mu_j + \mu'_j$ for 1 ≤ j ≤ t. So

$$(g + g')\beta = \sum_{j=1}^{t} (\mu_j + \mu'_j) v_j = \sum_{j=1}^{t} \mu_j v_j + \sum_{j=1}^{t} \mu'_j v_j = (g)\beta + (g')\beta,$$

showing that β is indeed an endomorphism of G. Is β the endomorphism we are looking for? Taking g = vi gives (λ1, λ2, . . . , λt) = ei, which is row i of the t × t identity matrix I over Z, for 1 ≤ i ≤ t. Hence

(λ1, λ2, . . . , λt)A = eiA = (ai1, ai2, . . . , ait) = (μ1, μ2, . . . , μt)

and so $(v_i)\beta = \sum_{j=1}^{t} a_{ij} v_j$ for 1 ≤ i ≤ t. We have obtained (♣♣♣) with β in place of α. So β is the endomorphism of G we want, that is, (β)θ = A, showing θ to be surjective. Therefore θ is a ring isomorphism. □

Let G be a free abelian group with Z-basis v1, v2, . . . , vt as in Theorem 3.15. Restricting θ to the group AutG = U(EndG) gives the isomorphism θ| : AutG ≅ GLt(Z), in which each automorphism α of G corresponds to its invertible t × t matrix A = (α)θ relative to v1, v2, . . . , vt. So


The automorphism group of every free abelian group of rank t is isomorphic to the multiplicative group of t × t invertible matrices over Z.

Taking t = 1 we obtain End〈v1〉 ≅ Z and Aut〈v1〉 ≅ {1, −1} for the infinite cyclic group 〈v1〉. Endomorphism rings of finite cyclic groups will be dealt with shortly. But first we discuss the structure of finite elementary abelian p-groups (their definition follows Corollary 3.12), which is, in effect, no more than the modulo p version of Theorem 3.15.

Theorem 3.16

Let p be prime and let G be an elementary abelian group of order $p^t$. Then G is the additive group of a t-dimensional vector space over Zp. Let v1, v2, . . . , vt be a basis of this vector space. The subgroups of G are precisely the subspaces of 〈v1, v2, . . . , vt〉. Also θ : EndG ≅ Mt(Zp) is a ring isomorphism, where (α)θ = A is the matrix of α ∈ EndG relative to v1, v2, . . . , vt. Restricting θ to AutG gives a group isomorphism θ| : AutG ≅ GLt(Zp).

Proof

As pg = 0 for all g ∈ G we obtain mg = m′g for all integers m, m′ with m ≡ m′ (mod p). So it makes sense (and is very pertinent) to introduce the product of the scalar $\bar{m} \in \mathbb{Z}_p$ and the vector g ∈ G by

$\bar{m}\,g = mg$.     (♦♦)

The breath-taking simplicity of (♦♦) is matched by its significance: the seven laws of a Z-module, listed at the start of Section 2.1, immediately become the laws of a vector space over Zp, showing that G, together with scalar multiplication as in (♦♦), is a vector space over Zp. As $|G| = p^t$ we see that this vector space, which we continue to denote by G, has dimension t and so has a basis v1, v2, . . . , vt (the reader will know that every finite-dimensional vector space has a basis). Also the reader should have no qualms about the equation G = 〈v1, v2, . . . , vt〉 because v1, v2, . . . , vt generate the Z-module G and v1, v2, . . . , vt span the Zp-module G. It is easy to check, using (♦♦), that subgroups (= submodules) of the Z-module G coincide with subspaces of the Zp-module G.

Consider α ∈ EndG. As α is Z-linear, we deduce from (♦♦) that $(\bar{m}\,g)\alpha = (mg)\alpha = m((g)\alpha) = \bar{m}((g)\alpha)$ for all m ∈ Z, g ∈ G, showing that α is a linear mapping of the vector space G. Conversely every linear mapping of the vector space G belongs to EndG. The last part of Theorem 3.16 concerning θ is standard linear algebra and is left as an exercise for the reader. □


From the theory at the start of Section 3.2, we deduce that the endomorphism ring of each finite abelian group is isomorphic to the external direct sum of the endomorphism rings of its primary components (Exercises 3.3, Question 1(f)).

We next determine the structure of the multiplicative group F* of non-zero elements of a finite field F. It turns out that F* is cyclic, as mentioned in Example 2.8. However the proof is non-constructive: it shows that F* must be generated by a single element without explicitly specifying a generator.

Corollary 3.17

Let F be a field. Every finite subgroup G of F* is cyclic. In particular the multiplicative group of every finite field is cyclic.

Proof

We need one fact, proved independently in Corollary 4.2(ii), which should be familiar to the reader, concerning polynomials f(x) with coefficients in F. Suppose $f(x) = a_0 + a_1 x + \cdots + a_n x^n$ where each $a_i$ belongs to F for 0 ≤ i ≤ n and $a_n \neq 0$, that is, f(x) is a polynomial in x of degree n over F. Then f(x) has at most n zeros in F. Equivalently, the equation f(c) = 0 has at most n roots: there are at most n elements c in F satisfying f(c) = 0, that is, $a_0 + a_1 c + \cdots + a_n c^n = 0$. A quadratic equation with real coefficients has at most two real roots, a cubic equation with real coefficients has at most three real roots, and so on. But note that, working in the ring Z6, the quadratic equation $x^2 = x$ has four roots, namely 0, 1, 3, 4. Of course Z6 is not a field.
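The failure of the bound on the number of roots outside a field can be checked by brute force. The sketch below uses a hypothetical helper that lists the roots in Zn of a polynomial given by its coefficient list $[a_0, a_1, \ldots, a_k]$.

```python
def roots_mod(coeffs, n):
    """Roots in Z_n of a0 + a1*x + ... + ak*x^k, where coeffs = [a0, a1, ..., ak]."""
    return [c for c in range(n)
            if sum(a * pow(c, i, n) for i, a in enumerate(coeffs)) % n == 0]


# x^2 - x = 0, that is x^2 = x
print(roots_mod([0, -1, 1], 6))     # [0, 1, 3, 4]: four roots, but Z_6 is not a field
print(roots_mod([0, -1, 1], 7))     # [0, 1]
print(roots_mod([-1, 0, 0, 1], 7))  # [1, 2, 4]: x^3 - 1 has exactly 3 roots in Z_7
```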

We apply Theorem 3.4 to the finite abelian group G, expressing the result in multiplicative notation. There are t non-trivial cyclic subgroups $H_i$ of G such that $H_i$ has order $d_i$ (1 ≤ i ≤ t), where $d_i \mid d_{i+1}$ (1 ≤ i < t) and each element of G is uniquely expressible as a product $h_1 h_2 \cdots h_t$ ($h_i \in H_i$). So $G = H_1 \times H_2 \times \cdots \times H_t$, meaning that G is the internal direct product of its subgroups $H_1, H_2, \ldots, H_t$ (this is the multiplicative version of Lemma 2.15). We ask the question: is it possible for t to be larger than 1? If so, by the multiplicative version of the |G|-lemma of Section 2.2, we see that the $d_1$ elements $h_1$ of $H_1$ satisfy $h_1^{d_1} = 1$ and the $d_2$ elements $h_2$ of $H_2$ satisfy $h_2^{d_2} = 1$. Hence the $d_1 d_2$ elements $h_1 h_2$ of G satisfy $(h_1 h_2)^{d_2} = 1$ as $d_1 \mid d_2$. So the polynomial $x^{d_2} - 1$ with coefficients in F has $d_1 d_2$ zeros $h_1 h_2$ in F. Now $d_1 > 1$ as $H_1$ is non-trivial and so $d_1 d_2 > d_2$, showing that the polynomial $x^{d_2} - 1$ over F has more zeros in F than its degree! This contradiction shows t ≤ 1. Therefore G is either trivial (t = 0) or $G = H_1$ is cyclic and non-trivial (t = 1). In any case G is cyclic. □

You are certain to have met the group G of fourth roots of 1, that is, $G = \{z \in \mathbb{C} : z^4 = 1\}$. As G = {1, −1, i, −i} where $i^2 = -1$, we see that G is cyclic with generator i. Similarly $(1 + i)/\sqrt{2}$ generates the group of eighth roots of 1. It follows from Corollary 3.17 that the only finite subgroups of the group C*, of non-zero complex numbers, are of isomorphism type Cn where n is a positive integer. In fact the subgroup generated by cos(2π/n) + i sin(2π/n) is the only subgroup of C* having order n.

Let p be prime. As $\mathbb{Z}_p$ is a field, Corollary 3.17 tells us that $\mathbb{Z}_p^*$ is cyclic. However there is no clue as to how a generator of $\mathbb{Z}_p^*$ might be found, and one may have to resort to ‘trial and error’ in the hunt for a generator of this group, as in Example 2.8. For a group G of prime order p we obtain EndG ≅ $\mathbb{Z}_p$ and AutG ≅ $\mathbb{Z}_p^*$ on taking t = 1 in Theorem 3.16. Hence from Corollary 3.17 we deduce

AutG has isomorphism type $C_{p-1}$ where G is a group of prime order p.
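A trial-and-error hunt for a generator of $\mathbb{Z}_p^*$ can be sketched as follows. This is an illustration, not a method from the text, and the helper names are ours. It uses the standard test that avoids computing every power of g. Since the order of g divides p − 1, that order equals p − 1 exactly when $g^{(p-1)/q} \neq 1$ for every prime q dividing p − 1.

```python
def prime_factors(m):
    """Set of primes dividing m (m >= 1), by trial division."""
    fs, d = set(), 2
    while d * d <= m:
        while m % d == 0:
            fs.add(d)
            m //= d
        d += 1
    if m > 1:
        fs.add(m)
    return fs


def find_generator(p):
    """Smallest g in 1, ..., p-1 whose class generates Z_p^* (p prime), by trial."""
    if p == 2:
        return 1
    qs = prime_factors(p - 1)
    for g in range(2, p):
        # the order of g divides p - 1; it is p - 1 unless it divides (p - 1)/q for some prime q
        if all(pow(g, (p - 1) // q, p) != 1 for q in qs):
            return g


g = find_generator(7)
print(g, [pow(g, k, 7) for k in range(1, 7)])   # 3 [3, 2, 6, 4, 5, 1]
```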

For example let G be the additive group of Z7. Then G = 〈1〉 is a 1-dimensional vector space over Z7. The element a ∈ Z7 corresponds to the endomorphism μa of G defined by (g)μa = ag for all g ∈ G. In fact the ring isomorphism θ : EndG ≅ Z7 of Theorem 3.16 is given by (μa)θ = a for all a ∈ Z7, the 1 × 1 matrix of μa relative to the basis 1 of G being simply a (matrix brackets being omitted). We leave the reader to verify that 3 generates $\mathbb{Z}_7^*$ and so α = μ3 generates AutG = {μ1, μ2, μ3, μ4, μ5, μ6} = $\{\alpha^6, \alpha^2, \alpha, \alpha^4, \alpha^5, \alpha^3\}$.

Next we discuss EndG and AutG in the case of G being finite cyclic. We assume $|G| = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$ and show how AutG decomposes into the direct product of at most k + 1 cyclic subgroups.

Lemma 3.18

Let G be an additive abelian group and let m be an integer. The mapping μm : G → G, given by (g)μm = mg for all g ∈ G, is an endomorphism of G. The mapping χ : Z → EndG, where (m)χ = μm for all m ∈ Z, is a ring homomorphism.

Suppose G is finite and cyclic. Then im χ = EndG and ker χ = 〈|G|〉. Hence χ : Z|G| ≅ EndG is a ring isomorphism, where (m)χ = μm for all m ∈ Z.

Proof

As $(g_1 + g_2)\mu_m = m(g_1 + g_2) = mg_1 + mg_2 = (g_1)\mu_m + (g_2)\mu_m$ for all $g_1, g_2 \in G$, we see $\mu_m \in$ EndG. From $(g)\mu_{m+m'} = (m + m')g = mg + m'g = (g)\mu_m + (g)\mu_{m'} = (g)(\mu_m + \mu_{m'})$ for m, m′ ∈ Z, we deduce that χ is additive, as $(m + m')\chi = \mu_{m+m'} = \mu_m + \mu_{m'} = (m)\chi + (m')\chi$. Also $(g)\mu_{mm'} = mm'g = m'mg = (g)\mu_m\mu_{m'}$ for m, m′ ∈ Z, and so χ respects multiplication as $(mm')\chi = \mu_{mm'} = \mu_m\mu_{m'} = (m)\chi(m')\chi$. As $(1)\chi = \mu_1 = \iota$ we conclude that χ is a ring homomorphism.

Suppose G is cyclic with generator g0 and consider α ∈ EndG. As (g0)α ∈ G = 〈g0〉 there is an integer m with (g0)α = mg0. Each g ∈ G can be expressed as g = m′g0 where m′ ∈ Z, and so (g)α = (m′g0)α = m′(g0)α = m′mg0 = mm′g0 = m(m′g0) = mg = (g)μm, showing that α = μm. We've shown that each endomorphism α of the cyclic group G is of the form μm and so belongs to the image of χ, that is, im χ = EndG.

Now suppose that G is finite and cyclic. By Exercises 2.3, Question 3(b) the kernel of χ is an ideal of Z. By Theorem 1.15 there is a non-negative integer d with ker χ = 〈d〉. The |G|-lemma of Section 2.2 gives $(|G|)\chi = \mu_{|G|} = 0$ and so |G| ∈ ker χ. Hence d divides |G|. On the other hand $\mu_d = 0$ and so dg0 = 0. As g0 has order |G|, from Theorem 2.5 we deduce that |G| divides d, and so |G| = d. We have shown ker χ = 〈|G|〉, that is, the kernel of χ is the principal ideal of Z generated by |G|. Now the quotient ring Z/〈|G|〉 is the same as the ring Z|G| of integers modulo |G|. By the first isomorphism theorem for rings (Exercises 2.3, Question 3(b) again), we see

χ : Z|G| ∼= EndG where (m)χ = μm for all m ∈ Z,

is a ring isomorphism. �

Restricting χ to the group of φ(|G|) units of Z|G| gives a group isomorphism U(Z|G|) ≅ AutG. Restricting the ring isomorphism α of the generalised Chinese remainder theorem to U(Z|G|) gives a group isomorphism

$U(\mathbb{Z}_{|G|}) \cong U(\mathbb{Z}_{q_1}) \times U(\mathbb{Z}_{q_2}) \times \cdots \times U(\mathbb{Z}_{q_k})$

where every two of the prime powers $q_j = p_j^{n_j}$ have gcd 1 (1 ≤ j ≤ k). Our next theorem reveals the structure of the abelian groups $U(\mathbb{Z}_{q_j})$.

Theorem 3.19

Let n ≥ 3. Then $U(\mathbb{Z}_{2^n}) = \langle -1 \rangle \times \langle 3 \rangle$, that is, $U(\mathbb{Z}_{2^n})$ is the internal direct product of its cyclic subgroups 〈−1〉 of order 2 and 〈3〉 of order $2^{n-2}$.

Let p be an odd prime and n ≥ 2. Then $U(\mathbb{Z}_{p^n})$ is cyclic of order $p^n - p^{n-1}$.

Proof

Our first job is to determine the order of 3 in the multiplicative group $U(\mathbb{Z}_{2^n})$. To do this we show by induction on n, where n ≥ 3, that

$3^{2^{n-2}} - 1 = 2^n \times m$ for odd m.     (❤❤)

As $3^2 - 1 = 2^3 \times 1$ we see that (❤❤) holds for n = 3. Suppose n > 3 and (❤❤) holds with n replaced by n − 1, that is, $3^{2^{n-3}} - 1 = 2^{n-1} \times m$ where m is odd. Now 3 ≡ −1 (mod 4) and so $3^{2^{n-3}} + 1 \equiv (-1)^{2^{n-3}} + 1 \pmod 4$, that is, $3^{2^{n-3}} + 1 \equiv 2 \pmod 4$, which means $3^{2^{n-3}} + 1 = 2 \times m'$ where m′ is odd. Factorising a difference of two squares gives $3^{2^{n-2}} - 1 = (3^{2^{n-3}} - 1)(3^{2^{n-3}} + 1) = 2^{n-1} \times m \times 2 \times m' = 2^n \times mm'$, which shows that (❤❤) holds with m replaced by the odd integer mm′. The induction is now complete. Therefore $3^{2^{n-2}} \equiv 1 \pmod{2^n}$ and so the order of 3 in $U(\mathbb{Z}_{2^n})$ is a divisor of $2^{n-2}$. However for n ≥ 4 the above induction shows $3^{2^{n-3}} - 1 = 2^{n-1} m = 2^{n-1}(2l + 1)$ for some integer l, and so $3^{2^{n-3}} \equiv 2^{n-1} + 1 \pmod{2^n}$. Hence $3^{2^{n-3}} \not\equiv 1 \pmod{2^n}$ for n ≥ 4. As $3 \not\equiv 1 \pmod 8$, we conclude that the order of 3 in $U(\mathbb{Z}_{2^n})$ is $2^{n-2}$, as this order is not a divisor of $2^{n-3}$ for n ≥ 3.

The cyclic subgroup 〈3〉 of $U(\mathbb{Z}_{2^n})$ contains a unique element of order 2, namely $3^{2^{n-3}}$. Also −1 has order 2 in $U(\mathbb{Z}_{2^n})$. Is it possible for these two elements of order 2 to coincide? We suppose for a moment that $3^{2^{n-3}} = -1$ in $\mathbb{Z}_{2^n}$. As 3 ≠ −1 in $\mathbb{Z}_8$ we see n ≥ 4. Hence $2^{n-1} + 1 \equiv -1 \pmod{2^n}$ from the above paragraph; so $2^n$ is a divisor of $2^{n-1} + 2$, which happens only for n = 2. The conclusion is: 〈−1〉 ∩ 〈3〉 = 〈1〉, that is, the cyclic subgroups 〈−1〉 and 〈3〉, of orders 2 and $2^{n-2}$, have trivial intersection. Hence the two cosets 〈3〉 and −1〈3〉 are different and so disjoint. Together these cosets account for $2^{n-2} + 2^{n-2} = 2^{n-1}$ elements of $U(\mathbb{Z}_{2^n})$. But $|U(\mathbb{Z}_{2^n})| = \varphi(2^n) = 2^n - 2^{n-1} = 2^{n-1}$ and so $\langle -1 \rangle \langle 3 \rangle = U(\mathbb{Z}_{2^n})$, that is, the product of the subgroups 〈−1〉 and 〈3〉 is the whole group $U(\mathbb{Z}_{2^n})$. Therefore $U(\mathbb{Z}_{2^n}) = \langle -1 \rangle \times \langle 3 \rangle$, that is, $U(\mathbb{Z}_{2^n})$ is the direct product of 〈−1〉 and 〈3〉.
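A quick numerical check of the first half of Theorem 3.19, for small n, can be written in Python. This is a sketch, and the helper names are ours. For each n it confirms three things: 3 has order $2^{n-2}$ modulo $2^n$, −1 is not a power of 3, and the cosets 〈3〉 and −1〈3〉 together exhaust the odd residues, which are the elements of $U(\mathbb{Z}_{2^n})$.

```python
def mult_order(a, m):
    """Order of the class of a in U(Z_m); assumes m > 1 and gcd(a, m) = 1."""
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k


def check_two_power(n):
    """Check U(Z_{2^n}) = <-1> x <3> numerically for a given n >= 3."""
    m = 2 ** n
    powers_of_3 = {pow(3, k, m) for k in range(2 ** (n - 2))}
    assert mult_order(3, m) == 2 ** (n - 2)      # order of 3 is 2^(n-2)
    assert len(powers_of_3) == 2 ** (n - 2)
    assert (m - 1) not in powers_of_3            # -1 is not in <3>
    coset = {(-x) % m for x in powers_of_3}      # the coset -1<3>
    # the two disjoint cosets exhaust the odd residues, i.e. all of U(Z_{2^n})
    assert powers_of_3 | coset == set(range(1, m, 2))
    return True


print(all(check_two_power(n) for n in range(3, 13)))   # True
```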

Let p be an odd prime. By Corollary 3.17 there is a generator $\bar{a}$ of $\mathbb{Z}_p^* = U(\mathbb{Z}_p)$. As $|\mathbb{Z}_p^*| = p - 1$ we obtain $\bar{a}^{\,p-1} = \bar{1}$ and so $a^{p-1} \equiv 1 \pmod p$. It turns out, as we prove in a moment, that the integer a can always be chosen with $a^{p-1} \not\equiv 1 \pmod{p^2}$. In fact suppose $a^{p-1} \equiv 1 \pmod{p^2}$. Consider b = a + p. Then $\bar{a} = \bar{b}$ in $\mathbb{Z}_p$ and so $\bar{b}$ generates $\mathbb{Z}_p^*$. Using the binomial theorem,

$$b^{p-1} = (a + p)^{p-1} = \sum_{i=0}^{p-1} \frac{(p-1)!}{i!\,(p-1-i)!}\, a^{p-1-i} p^i.$$

All terms in the summation with i ≥ 2 have factor $p^2$ and so

$b^{p-1} \equiv 1 + (p-1)a^{p-2}p \pmod{p^2}$

on using $a^{p-1} \equiv 1 \pmod{p^2}$. Hence $b^{p-1} \not\equiv 1 \pmod{p^2}$ as $(p-1)a^{p-2}p \not\equiv 0 \pmod{p^2}$, since $(p-1)a^{p-2}$ is not divisible by p. We have shown that if the original integer a fails to satisfy $a^{p-1} \not\equiv 1 \pmod{p^2}$, then b = a + p satisfies $b^{p-1} \not\equiv 1 \pmod{p^2}$.

So there is an integer a such that $\bar{a}$ generates $\mathbb{Z}_p^*$ and $a^{p-1} \not\equiv 1 \pmod{p^2}$. Write $x_n$ for the congruence class of a modulo $p^n$ and write $e_n$ for the congruence class of 1 modulo $p^n$. So $x_n$ and $e_n$ are elements of $\mathbb{Z}_{p^n}$. As gcd{a, p} = 1 we see that $x_n$ belongs to the multiplicative group $U(\mathbb{Z}_{p^n})$, which has identity element $e_n$. The proof is finished by showing that $x_n$ generates $U(\mathbb{Z}_{p^n})$ for all n ≥ 2. To do this we show by induction on n that

$a^{(p-1)p^{n-2}} \not\equiv 1 \pmod{p^n}$ for all n ≥ 2.     (♠)

Now (♠) holds for n = 2 by our choice of the integer a. So suppose n ≥ 3 and $a^{(p-1)p^{n-3}} \not\equiv 1 \pmod{p^{n-1}}$. Now $x_{n-2}$ belongs to $U(\mathbb{Z}_{p^{n-2}})$, which is a multiplicative group of order $\varphi(p^{n-2}) = p^{n-2} - p^{n-3} = (p-1)p^{n-3}$ with identity element $e_{n-2}$. Using the multiplicative version of the |G|-lemma we obtain $(x_{n-2})^{(p-1)p^{n-3}} = e_{n-2}$, which gives $a^{(p-1)p^{n-3}} \equiv 1 \pmod{p^{n-2}}$. So $a^{(p-1)p^{n-3}} = 1 + rp^{n-2}$ where, by the inductive hypothesis, r is not divisible by p. We raise this last equation to the power p and use the binomial expansion

$$a^{(p-1)p^{n-2}} = \big(a^{(p-1)p^{n-3}}\big)^p = (1 + rp^{n-2})^p = \sum_{i=0}^{p} \frac{p!}{i!\,(p-i)!}\, r^i p^{i(n-2)}.$$

The term given by i = 2 in the sum is divisible by $p^{1+2(n-2)}$, and 1 + 2(n − 2) ≥ n as n ≥ 3. Each of the following terms (with 3 ≤ i ≤ p) is divisible by $p^{i(n-2)}$, and i(n − 2) ≥ 3(n − 2) ≥ n as n ≥ 3. So all terms in the above sum, except the first two, are divisible by $p^n$, and hence $a^{(p-1)p^{n-2}} \equiv 1 + rp^{n-1} \pmod{p^n}$. As $rp^{n-1} \not\equiv 0 \pmod{p^n}$ we conclude that $a^{(p-1)p^{n-2}} \not\equiv 1 \pmod{p^n}$, completing the inductive step. So (♠) is established.

The end of the proof is now in sight! Let $r$ be the order of $x_n$. Then $x_n$ generates a cyclic subgroup of order $r$. By Lagrange's theorem $r$ is a divisor of $\varphi(p^n) = (p-1)p^{n-1} = |U(\mathbb{Z}_{p^n})|$ and so $r = sp^t$ where $s \mid p - 1$ and $0 \le t < n$. From $(\spadesuit)$ we deduce $x_n^{(p-1)p^{n-2}} \ne e_n$. Hence $r$ is not a divisor of $(p-1)p^{n-2}$ and so $t \le n - 2$ is impossible, that is, $t = n - 1$. Now $x_n^r = e_n$, which means $a^r \equiv 1 \pmod{p^n}$, and so $a^r \equiv 1 \pmod{p}$, that is, $(\bar{a})^r = \bar{1}$ in $\mathbb{Z}_p^*$. Therefore $(p-1) \mid r$, as $p - 1$ is the order of $\bar{a}$ in $\mathbb{Z}_p^*$. As $\gcd\{p-1, p^{n-1}\} = 1$ we deduce $(p-1) \mid s$ and so $s = p - 1$. We have shown that $x_n$ has order $r = (p-1)p^{n-1} = |U(\mathbb{Z}_{p^n})|$ and so $x_n$ generates $U(\mathbb{Z}_{p^n})$. □
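As a numerical sanity check on Theorem 3.19, here is a small Python sketch. The helper names (`multiplicative_order`, `generates_units_mod_prime_power`, `criterion`) are hypothetical, chosen for illustration; orders are found by repeated multiplication, so it is only suitable for small moduli. It tests the two hypotheses used in the proof ($\bar{a}$ has order $p - 1$ in $\mathbb{Z}_p^*$ and $a^{p-1} \not\equiv 1 \pmod{p^2}$) and compares them with a brute-force check that $x_n$ generates $U(\mathbb{Z}_{p^n})$.

```python
from math import gcd


def multiplicative_order(a: int, m: int) -> int:
    """Order of the class of a in U(Z_m); requires gcd(a, m) == 1 and m >= 2."""
    if gcd(a, m) != 1:
        raise ValueError("a must be coprime to m")
    k, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        k += 1
    return k


def generates_units_mod_prime_power(a: int, p: int, n: int) -> bool:
    """Brute force: does the class of a generate U(Z_{p^n})?"""
    return multiplicative_order(a, p ** n) == (p - 1) * p ** (n - 1)


def criterion(a: int, p: int) -> bool:
    """Hypotheses of the proof: a has order p - 1 mod p and a^(p-1) != 1 mod p^2."""
    return multiplicative_order(a, p) == p - 1 and pow(a, p - 1, p * p) != 1


print(criterion(2, 3))  # True
print([generates_units_mod_prime_power(2, 3, n) for n in range(1, 6)])
```

For instance `criterion(2, 3)` holds and `generates_units_mod_prime_power(2, 3, n)` returns `True` for each small $n$ tried, in line with the example of $\bar{2} \in \mathbb{Z}_{3^n}$ given in the text.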

The reader should check that $U(\mathbb{Z}_4)$ has order 2 and so is cyclic; in fact $U(\mathbb{Z}_4) = \langle \bar{3} \rangle$. Let $k$ be the number of different prime divisors of $n$. Combining Lemma 3.14, Corollary 3.17, Lemma 3.18, Theorem 3.19 and the Chinese remainder theorem, we see that $\operatorname{Aut} \mathbb{Z}_n$ is the direct product of $k$ non-trivial cyclic subgroups if either $n$ is odd or $n \equiv 4 \pmod{8}$. Further, $\operatorname{Aut} \mathbb{Z}_n$ is the direct product of $k - 1$ or $k + 1$ non-trivial cyclic subgroups according as $n \equiv 2 \pmod{4}$ or $n \equiv 0 \pmod{8}$.

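We look at a few particular cases. The automorphisms $\eta$ and $\mu$ of $G = \mathbb{Z}_{16}$ defined by $(g)\eta = -g$, $(g)\mu = 3g$ for all $g \in G$ correspond to the elements $\overline{-1}$ and $\bar{3}$ of $U(\mathbb{Z}_{16})$ under the isomorphism $\chi$ of Lemma 3.18, that is, $(\overline{-1})\chi = \eta$, $(\bar{3})\chi = \mu$. The orders of $\eta$ and $\mu$ are 2 and 4 respectively, and $\operatorname{Aut} G = \langle \eta \rangle \times \langle \mu \rangle$ consists of the eight automorphisms

$$\iota,\ \eta,\ \mu,\ \eta\mu,\ \mu^2,\ \eta\mu^2,\ \mu^3,\ \eta\mu^3$$

(note that $\mu^4 = \iota$), by Theorem 3.19. The isomorphism class of the abelian group $\operatorname{Aut} \mathbb{Z}_{16}$ is $C_2 \oplus C_4$.

The element $\bar{2}$ of $\mathbb{Z}_3$ generates $\mathbb{Z}_3^*$. Here $a = 2$, $p = 3$ as in Theorem 3.19. As $a^{p-1} = 2^2 = 4$ and $4 \not\equiv 1 \pmod{9}$, we see that $U(\mathbb{Z}_{3^n})$ is cyclic with generator $\bar{2} \in \mathbb{Z}_{3^n}$ for all positive integers $n$. In particular

$$U(\mathbb{Z}_9) = \{\bar{2}, \bar{2}^2, \bar{2}^3, \bar{2}^4, \bar{2}^5, \bar{2}^6\} = \{\bar{2}, \bar{4}, \bar{8}, \bar{7}, \bar{5}, \bar{1}\}.$$

The reader may verify that $U(\mathbb{Z}_{27})$ consists of the 18 powers of $\bar{2} \in \mathbb{Z}_{27}$.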
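Both examples above can be confirmed by listing powers. The sketch below uses hypothetical helper names `units` and `powers`; it assumes $\gcd(g, m) = 1$, since otherwise `powers` would never reach 1.

```python
from math import gcd


def units(m: int) -> set[int]:
    """Representatives 0 <= u < m of the elements of U(Z_m)."""
    return {u for u in range(m) if gcd(u, m) == 1}


def powers(g: int, m: int) -> list[int]:
    """Successive powers g, g^2, ... modulo m, stopping at 1 (needs gcd(g, m) == 1)."""
    out, x = [], g % m
    while True:
        out.append(x)
        if x == 1:
            return out
        x = (x * g) % m


# U(Z_16) consists of the classes (-1)^r 3^s with 0 <= r < 2, 0 <= s < 4.
print({((-1) ** r * 3 ** s) % 16 for r in range(2) for s in range(4)} == units(16))  # True
print(powers(2, 9))         # [2, 4, 8, 7, 5, 1]
print(len(powers(2, 27)))   # 18
```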

As $800 = 2^5 \times 5^2$ we see from Theorem 3.19 and the discussion preceding it that $\operatorname{Aut} \mathbb{Z}_{800}$ is the direct product of cyclic groups of order 2, 8 and 20. By Theorem 2.11

$$C_2 \oplus C_8 \oplus C_{20} = C_2 \oplus C_8 \oplus C_4 \oplus C_5 = C_2 \oplus C_4 \oplus (C_8 \oplus C_5) = C_2 \oplus C_4 \oplus C_{40},$$

showing that $\operatorname{Aut} \mathbb{Z}_{800}$ has invariant factor sequence $(2, 4, 40)$.
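The recipe used for $\mathbb{Z}_{800}$ can be mechanised. This sketch assumes the structure of $U(\mathbb{Z}_{p^k})$ described above: cyclic of order $(p-1)p^{k-1}$ for odd $p$ (Theorem 3.19), trivial for $2^1$, $C_2$ for $2^2$, and $C_2 \oplus C_{2^{k-2}}$ for $2^k$ with $k \ge 3$ (consistent with the $\mathbb{Z}_{16}$ example). It then splits each cyclic factor into prime-power parts and regroups them as in Theorem 2.11. All function names are hypothetical.

```python
from collections import defaultdict


def factorize(n: int) -> dict[int, int]:
    """Prime factorisation {p: exponent} of n >= 1 by trial division."""
    f, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            f[p] = f.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f


def unit_group_cyclic_orders(n: int) -> list[int]:
    """Orders of cyclic factors of U(Z_n), one prime power of n at a time."""
    orders = []
    for p, k in factorize(n).items():
        if p == 2:
            if k == 2:
                orders.append(2)
            elif k >= 3:
                orders += [2, 2 ** (k - 2)]
        else:
            orders.append((p - 1) * p ** (k - 1))
    return orders


def invariant_factors(cyclic_orders: list[int]) -> tuple[int, ...]:
    """Invariant factors of a direct sum of finite cyclic groups (trivial ones dropped)."""
    by_prime = defaultdict(list)
    for m in cyclic_orders:
        for p, e in factorize(m).items():
            by_prime[p].append(p ** e)  # primary parts, grouped by prime
    length = max((len(v) for v in by_prime.values()), default=0)
    factors = [1] * length
    for parts in by_prime.values():
        # The largest prime powers go into the last (largest) invariant factors.
        for i, q in enumerate(sorted(parts, reverse=True)):
            factors[length - 1 - i] *= q
    return tuple(factors)


print(invariant_factors(unit_group_cyclic_orders(800)))  # (2, 4, 40)
print(invariant_factors(unit_group_cyclic_orders(16)))   # (2, 4)
```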

Let $G$ be an f.g. abelian group and let $H$ be a subgroup of $G$. What is the connection between the invariant factors of $G$, the invariant factors of $H$ and those of $G/H$? We begin by looking at the torsion-free rank of these groups. Remember that the torsion-free rank of $G$, which we denote by $\operatorname{tf\,rank} G$, is the number of zero invariant factors of $G$.

Lemma 3.20

Let $H$ be a subgroup of a finitely generated abelian group $G$. Then $H$ and $G/H$ are finitely generated and $\operatorname{tf\,rank} G/H + \operatorname{tf\,rank} H = \operatorname{tf\,rank} G$.

Proof

Let $g_1, g_2, \ldots, g_t$ generate $G$ and let $\theta : \mathbb{Z}^t \to G$ be the surjective $\mathbb{Z}$-linear mapping defined, as usual, by $(m_1, m_2, \ldots, m_t)\theta = m_1g_1 + m_2g_2 + \cdots + m_tg_t$ for all integers $m_1, m_2, \ldots, m_t$. Write $z = (m_1, m_2, \ldots, m_t)$ and let $K' = \{z \in \mathbb{Z}^t : (z)\theta \in H\}$. Then $K'$ is a submodule of $\mathbb{Z}^t$ and so $K'$ is free of rank $s$ say, where $s \le t$ by Theorem 3.1. Also $K \subseteq K'$ where $K = \ker \theta$, since $H$ contains the zero element of $G$. So $\operatorname{rank} K \le s$ by Theorem 3.1. From the proof of Theorem 3.4 we see $\operatorname{tf\,rank} G = t - \operatorname{rank} K$. Let $z_1, z_2, \ldots, z_s$ be a $\mathbb{Z}$-basis of $K'$ and let $\theta'$ be the restriction of $\theta$ to $K'$.

Then $\theta' : K' \to H$ is the $\mathbb{Z}$-linear mapping defined by $(z)\theta' = (z)\theta$ for all $z \in K'$, and $\langle (z_1)\theta', (z_2)\theta', \ldots, (z_s)\theta' \rangle = \operatorname{im} \theta' = H$ is finitely generated. As $\ker \theta' = K$ we see, as above, $\operatorname{tf\,rank} H = s - \operatorname{rank} K$.


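Consider the composite $\mathbb{Z}$-linear mapping $\theta\eta : \mathbb{Z}^t \to G/H$ where $\eta : G \to G/H$ is the natural mapping. Now $\langle (e_1)\theta\eta, (e_2)\theta\eta, \ldots, (e_t)\theta\eta \rangle = \operatorname{im} \theta\eta = G/H$ is finitely generated. As $\ker \theta\eta = K'$ we see, once again as above, $\operatorname{tf\,rank} G/H = t - s$. So $\operatorname{tf\,rank} G = t - \operatorname{rank} K = (t - s) + (s - \operatorname{rank} K) = \operatorname{tf\,rank} G/H + \operatorname{tf\,rank} H$. □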

From Lemma 3.20 we deduce

$$\operatorname{tf\,rank} H \le \operatorname{tf\,rank} G \quad \text{and} \quad \operatorname{tf\,rank} G/H \le \operatorname{tf\,rank} G.$$

Definition 3.21

A sequence $(d_1, d_2, \ldots, d_s)$ of non-negative integers with $d_i \mid d_{i+1}$ for $1 \le i < s$ is called a divisor sequence of length $s$. The divisor sequence $(d'_1, d'_2, \ldots, d'_s)$ is called a subsequence of the divisor sequence $(d_1, d_2, \ldots, d_s)$ if $d'_i \mid d_i$ for $1 \le i \le s$.

For instance the divisor sequence $(2, 2, 6)$ of length 3 has subsequences $(1,1,1)$, $(1,1,2)$, $(1,1,3)$, $(1,1,6)$, $(1,2,2)$, $(1,2,6)$, $(2,2,2)$, $(2,2,6)$. It follows from our next theorem that each abelian group of isomorphism type $C_2 \oplus C_2 \oplus C_6$ has subgroups of the eight isomorphism types $C_1$, $C_2$, $C_3$, $C_6$, $C_2 \oplus C_2$, $C_2 \oplus C_6$, $C_2 \oplus C_2 \oplus C_2$, $C_2 \oplus C_2 \oplus C_6$ – no more and no less – corresponding to the eight subsequences of $(2, 2, 6)$.
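Subsequences of a divisor sequence can be listed by brute force. This sketch (the name `subsequences` is hypothetical) handles only positive entries; a zero entry would need the divisors of 0 treated separately. For $(2, 2, 6)$ it reproduces the eight subsequences listed above, in the same order.

```python
from itertools import product


def divisors(n: int) -> list[int]:
    """Positive divisors of the positive integer n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def subsequences(seq: tuple[int, ...]) -> list[tuple[int, ...]]:
    """All divisor sequences (d'_1, ..., d'_s) with d'_i | d_i, for positive d_i."""
    result = []
    for cand in product(*(divisors(d) for d in seq)):
        # Keep only candidates that are themselves divisor sequences.
        if all(cand[i + 1] % cand[i] == 0 for i in range(len(cand) - 1)):
            result.append(cand)
    return result


print(subsequences((2, 2, 6)))
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 6), (1, 2, 2), (1, 2, 6), (2, 2, 2), (2, 2, 6)]
```

Dropping the entries equal to 1 from each tuple gives the corresponding isomorphism type of subgroup.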

In effect we have already met divisor sequences in Chapter 1. Let $A$ be an $s \times t$ matrix over $\mathbb{Z}$ where $s \le t$. Then the invariant factors $d_i$ of $A$ form a divisor sequence $(d_1, d_2, \ldots, d_s)$ of length $s$. By Corollary 1.20 the equivalence classes of such matrices $A$ correspond bijectively in this way to divisor sequences of length $s$.
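For small examples the invariant factors of an integer matrix can also be computed from determinantal divisors: if $g_k$ is the gcd of all $k \times k$ minors (with $g_0 = 1$), then $d_k = g_k / g_{k-1}$. This is a standard alternative to reducing the matrix to diagonal form, swapped in here only because it is short to code; it is exponential in the matrix size. The names `det` and `matrix_invariant_factors` are hypothetical.

```python
from itertools import combinations
from math import gcd


def det(m: list[list[int]]) -> int:
    """Exact integer determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * entry * det(minor)
    return total


def matrix_invariant_factors(a: list[list[int]]) -> list[int]:
    """Invariant factors d_1 | d_2 | ... | d_s of an s x t integer matrix, s <= t."""
    s, t = len(a), len(a[0])
    g_prev, factors = 1, []
    for k in range(1, s + 1):
        g = 0  # gcd of all k x k minors
        for rows in combinations(range(s), k):
            for cols in combinations(range(t), k):
                g = gcd(g, det([[a[i][j] for j in cols] for i in rows]))
        factors.append(g // g_prev if g_prev else 0)
        g_prev = g
    return factors


print(matrix_invariant_factors([[2, 4, 4], [-6, 6, 12]]))  # [2, 6]
```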

Theorem 3.22

Let $H$ be a subgroup of the finite abelian group $G$. Let $(d_1, d_2, \ldots, d_t)$ be the invariant factor sequence of $G$. Then $H$ has invariant factor sequence $(d'_1, d'_2, \ldots, d'_s)$ where $s \le t$ and $d'_k \mid d_{t-s+k}$ for $1 \le k \le s$.

Proof

By Theorem 3.4 there are cyclic subgroups $H_j$ of $G$ such that $H_j$ has isomorphism type $C_{d_j}$ for $1 \le j \le t$ and $G = H_1 \oplus H_2 \oplus \cdots \oplus H_t$. Let $\theta : \mathbb{Z}^t \to G$ be defined by $(m_1, m_2, \ldots, m_t)\theta = m_1h_1 + m_2h_2 + \cdots + m_th_t$ for all $z = (m_1, m_2, \ldots, m_t) \in \mathbb{Z}^t$, where $h_j$ generates $H_j$ for $1 \le j \le t$. Then $K = \ker \theta$ has $\mathbb{Z}$-basis consisting of the rows of the $t \times t$ matrix $D = \operatorname{diag}(d_1, d_2, \ldots, d_t)$. As in the proof of Lemma 3.20 write $K' = \{z \in \mathbb{Z}^t : (z)\theta \in H\}$. Then $K'$ is a free submodule of $\mathbb{Z}^t$ of rank $t$ by Theorems 3.1 and 3.4. Let the rows of the $t \times t$ matrix $B$ form a $\mathbb{Z}$-basis of $K'$. As $0 \in H$ we see $K \subseteq K'$ and so each row of $D$ is an integer linear combination of the rows of $B$. In other words there is a $t \times t$ matrix $A$ over $\mathbb{Z}$ with $D = AB$. As $\det D = d_1d_2\cdots d_t > 0$ we see $\det A \ne 0$ and so the hypothesis of Theorem 1.21 is satisfied: the matrices $A$, $B$, $AB$ have positive invariant factors. By Theorem 1.21 the invariant factors of $A$ form a subsequence $(1, 1, \ldots, 1, d'_1, d'_2, \ldots, d'_s)$ of the divisor sequence $(d_1, d_2, \ldots, d_t)$ of invariant factors of $AB$, where $d'_1 > 1$; so $s$ is the number of invariant factors of $A$ which are different from 1, and $d'_k \mid d_{t-s+k}$ for $1 \le k \le s$.

Consider the restriction of $\theta$ to $K'$, that is, the $\mathbb{Z}$-linear mapping $\theta' : K' \to H$ defined by $(k')\theta' = (k')\theta$ for all $k' \in K'$. From the proof of Lemma 3.20, we have $\operatorname{im} \theta' = H$ and $\ker \theta' = K$. Write $z_i = e_iB$ for row $i$ of $B$ and $h'_i = (z_i)\theta'$ $(1 \le i \le t)$. Now $\varphi : \mathbb{Z}^t \to K'$, defined by $(z)\varphi = zB$ for all $z = (m_1, m_2, \ldots, m_t) \in \mathbb{Z}^t$, is an isomorphism since $z_1, z_2, \ldots, z_t$ is a $\mathbb{Z}$-basis of $K'$. As $h'_1, h'_2, \ldots, h'_t$ generate $H$ and $(e_i)\varphi\theta' = (z_i)\theta' = h'_i$ for $1 \le i \le t$, the composite $\mathbb{Z}$-linear mapping

$$\varphi\theta' : \mathbb{Z}^t \to H \text{ is given by } (m_1, m_2, \ldots, m_t)\varphi\theta' = m_1h'_1 + m_2h'_2 + \cdots + m_th'_t$$

and so is the analogue of $\theta : \mathbb{Z}^t \to G$. Therefore the invariant factors of $H$ can be found using Theorem 3.4 with $\varphi\theta'$ in place of $\theta$. In fact there is just one more piece of the jigsaw to be put in place! As $\varphi$ is an isomorphism we see $\operatorname{im} \varphi\theta' = H$ and $(\ker \varphi\theta')\varphi = \ker \theta' = K$. The equation $AB = D$ gives the $t$ row equations $(e_iA)\varphi = e_iAB = e_iD = d_ie_i$ for $1 \le i \le t$. From $K = \langle d_1e_1, d_2e_2, \ldots, d_te_t \rangle$ we deduce that the rows $e_1A, e_2A, \ldots, e_tA$ of $A$ form a $\mathbb{Z}$-basis of $\ker \varphi\theta'$. By Theorem 3.4 the invariant factors of $H$ are the invariant factors $\ne 1$ of $A$, that is, $d'_1, d'_2, \ldots, d'_s$. □
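The claim made after Definition 3.21, that a group of type $C_2 \oplus C_2 \oplus C_6$ has subgroups of exactly eight isomorphism types, can be checked by brute force. The sketch enumerates every subgroup generated by three elements and compares subgroups using a standard fact not proved in this section: the isomorphism type of a finite abelian group is determined by how many elements it has of each order. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

MODULI = (2, 2, 6)  # G = Z_2 + Z_2 + Z_6, invariant factor sequence (2, 2, 6)
ZERO = (0,) * len(MODULI)
ELEMENTS = list(product(*(range(m) for m in MODULI)))


def add(g, h):
    """Componentwise addition in G."""
    return tuple((a + b) % m for a, b, m in zip(g, h, MODULI))


def order(g):
    """Additive order of g in G."""
    k, x = 1, g
    while x != ZERO:
        x = add(x, g)
        k += 1
    return k


def span(gens):
    """Subgroup generated by gens: close {0} under adding generators (G is finite)."""
    h, frontier = {ZERO}, [ZERO]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = add(x, g)
            if y not in h:
                h.add(y)
                frontier.append(y)
    return frozenset(h)


# Every subgroup of G is generated by at most t = 3 elements (s <= t in Theorem 3.22).
subgroups = {span(gens) for gens in combinations_with_replacement(ELEMENTS, 3)}

# Distinct multisets of element orders = distinct isomorphism types.
profiles = {tuple(sorted(Counter(order(g) for g in h).items())) for h in subgroups}

print(len(subgroups), len(profiles))  # 32 8
```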

Let $H$ be a subgroup of the f.g. abelian group $G$. From Lemma 3.20 and Theorem 3.22 it is possible to give a complete description of the invariant factors of $H$ in terms of those of $G$. Suppose that $G$ has torsion-free rank $r$ and the torsion subgroup $T(G)$ of $G$ has invariant factor sequence $(d_1, d_2, \ldots, d_t)$. Then $H$ has torsion-free rank at most $r$ by Lemma 3.20, and its torsion subgroup $T(H)$ has invariant factor sequence $(d'_1, d'_2, \ldots, d'_s)$ where $s \le t$ and $d'_i \mid d_{t-s+i}$ for $1 \le i \le s$ by Theorem 3.22, as $T(H)$ is a subgroup of $T(G)$.

For example suppose $G$ has invariant factor sequence $(2, 2, 6, 0, 0)$ and let $H$ be a subgroup of $G$. Comparing torsion subgroups, there are 8 possible isomorphism types for $T(H)$, namely those listed after Definition 3.21, as $T(H)$ is a subgroup of $T(G)$ and $T(G)$ has isomorphism type $C_2 \oplus C_2 \oplus C_6$. There are 3 possible values for $\operatorname{tf\,rank} H$, namely 0, 1, 2, as $0 \le \operatorname{tf\,rank} H \le \operatorname{tf\,rank} G = 2$ by Lemma 3.20. So $G$ has exactly $8 \times 3 = 24$ isomorphism types of subgroups $H$.

Finally we discuss the connection between the invariant factors of an f.g. abelian group $G$ and those of its homomorphic images $G/H$. As one might expect, $G/H$ cannot have more invariant factors than $G$; moreover, if $G/H$ has $t'$ invariant factors, they divide, in order, the last $t'$ invariant factors of $G$.


Theorem 3.23

Let $(d_1, d_2, \ldots, d_t)$ be the invariant factor sequence of the finitely generated abelian group $G$. Let $H$ be a subgroup of $G$. Then $G/H$ has invariant factor sequence $(d'_1, d'_2, \ldots, d'_{t'})$ where $t' \le t$ and $d'_k \mid d_{t-t'+k}$ for $1 \le k \le t'$.

Proof

By Theorem 3.4 there are cyclic subgroups $H_j$ of $G$ such that $H_j$ has isomorphism type $C_{d_j}$ for $1 \le j \le t$ and $G = H_1 \oplus H_2 \oplus \cdots \oplus H_t$. Let $\theta : \mathbb{Z}^t \to G$ be defined by $(m_1, m_2, \ldots, m_t)\theta = m_1h_1 + m_2h_2 + \cdots + m_th_t$ for all $z = (m_1, m_2, \ldots, m_t) \in \mathbb{Z}^t$, where $h_j$ generates $H_j$ for $1 \le j \le t$. Then $K = \ker \theta$ has $\mathbb{Z}$-basis consisting of the rows of the $r \times t$ matrix $D = \operatorname{diag}(d_1, d_2, \ldots, d_r)$, where $d_1, d_2, \ldots, d_r$ are the non-zero invariant factors of $G$. As before write $K' = \{z \in \mathbb{Z}^t : (z)\theta \in H\}$ and consider the composite $\mathbb{Z}$-linear mapping $\theta\eta : \mathbb{Z}^t \to G/H$ where $\eta : G \to G/H$ is the natural mapping. Then $\operatorname{im} \theta\eta = G/H$ and $\ker \theta\eta = K'$. Write $s = \operatorname{rank} K'$. Then $r \le s \le t$ and $d'_k = 0$ for $s - t + t' < k \le t'$, as the torsion-free rank of $G/H$ is $t - s$. There is an $s \times t$ matrix $B$ over $\mathbb{Z}$ whose rows form a $\mathbb{Z}$-basis of $K'$. Using the method of Theorem 3.4 with $G$, $\theta$, $A$ replaced by $G/H$, $\theta\eta$, $B$ respectively, we see that the first $t - t'$ invariant factors of $B$ are equal to 1 and the remainder are $d'_1, d'_2, \ldots, d'_{s-t+t'}$.

As $K \subseteq K'$ each row of $D$ is an integer linear combination of the rows of $B$, that is, there is an $r \times s$ matrix $A$ over $\mathbb{Z}$ such that $AB = D$. As the rows of $D$ are $\mathbb{Z}$-independent, so also are the rows of $A$. Hence the conditions of Theorem 1.21 are fulfilled: the invariant factors of $A$, $B$ and $D$ are positive. By Theorem 1.21 we see $d'_k \mid d_{t-t'+k}$ for $1 \le k \le r + t' - t$. But also $d'_k \mid d_{t-t'+k}$ for $r + t' - t < k \le t'$, as $d_{t-t'+k} = 0$ for these $k$, completing the proof. □

As an illustration suppose $G$ has invariant factor sequence $(2, 6, 0, 0)$. Then $G$ has homomorphic images $G/H$ with invariant factor sequences $(3, 6, 0)$ and $(2, 2, 2, 2)$, for instance. But neither $(4, 4, 8)$ nor $(3, 3, 6, 0)$ can arise as the invariant factor sequence of a homomorphic image of $G$.
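The divisibility condition of Theorem 3.23 is easy to test mechanically. The sketch below (hypothetical names) uses the conventions $a \mid 0$ for every integer $a$ and $0 \mid b$ only when $b = 0$, and checks only the necessary condition stated in the theorem.

```python
def divides(a: int, b: int) -> bool:
    """a | b in Z, with a | 0 for every a and 0 | b only when b = 0."""
    return b == 0 if a == 0 else b % a == 0


def is_divisor_sequence(d: tuple[int, ...]) -> bool:
    return all(divides(d[i], d[i + 1]) for i in range(len(d) - 1))


def passes_theorem_3_23(g: tuple[int, ...], q: tuple[int, ...]) -> bool:
    """Is t' <= t and d'_k | d_{t-t'+k} for 1 <= k <= t'? (g = (d_i), q = (d'_k))"""
    t, tq = len(g), len(q)
    if not is_divisor_sequence(q) or tq > t:
        return False
    # 0-based indexing: q[k] must divide g[t - tq + k].
    return all(divides(q[k], g[t - tq + k]) for k in range(tq))


G = (2, 6, 0, 0)
for q in [(3, 6, 0), (2, 2, 2, 2), (4, 4, 8), (3, 3, 6, 0)]:
    print(q, passes_theorem_3_23(G, q))  # True, True, False, False
```

Since the condition of Theorem 3.22 has the same shape, the same function also tests a proposed invariant factor sequence of a subgroup of a finite abelian group.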

EXERCISES 3.3

1. (a) Let $\alpha$ and $\alpha'$ be endomorphisms of the abelian group $G$. Show that $\alpha + \alpha'$ is an endomorphism of $G$, where $(g)(\alpha + \alpha') = (g)\alpha + (g)\alpha'$ for all $g \in G$. Show that the set $\operatorname{End} G$ of all endomorphisms of the abelian group $G$, with the binary operation of addition as above, is itself an abelian group.

Verify the distributive law $\alpha(\alpha' + \alpha'') = \alpha\alpha' + \alpha\alpha''$ where $\alpha, \alpha', \alpha'' \in \operatorname{End} G$.

(b) Find the smallest abelian group $G$ such that $\operatorname{End} G$ is a non-commutative ring. How many automorphisms does $G$ have?

(c) Let $G$ be the additive group $\mathbb{Z}^2$ and let $\alpha_0 \in \operatorname{End} G$ be given by $(m_1, m_2)\alpha_0 = (0, m_1)$ for all $m_1, m_2 \in \mathbb{Z}$. Write down the matrix of $\alpha_0$ relative to the standard $\mathbb{Z}$-basis $e_1 = (1, 0)$, $e_2 = (0, 1)$ of $G$. Write $Z(\alpha_0) = \{\alpha \in \operatorname{End} G : \alpha\alpha_0 = \alpha_0\alpha\}$. Determine the matrices relative to $e_1, e_2$ of the endomorphisms $\alpha$ in $Z(\alpha_0)$. Use Theorem 3.15 to decide whether or not $Z(\alpha_0)$ is a commutative subring of $\operatorname{End} G$. If so, find the isomorphism types of the additive groups of $\operatorname{End} G$ and $Z(\alpha_0)$, and that of the multiplicative group $U(Z(\alpha_0))$.

(d) Let $G$ be the additive group of the ring $\mathbb{Z}_3 \oplus \mathbb{Z}_3$ and let $\alpha_0 \in \operatorname{End} G$ be given by $(x_1, x_2)\alpha_0 = (-x_2, x_1)$ for all $x_1, x_2 \in \mathbb{Z}_3$. Determine the matrices relative to $e_1, e_2$ of the endomorphisms $\alpha$ in $Z(\alpha_0) = \{\alpha \in \operatorname{End} G : \alpha\alpha_0 = \alpha_0\alpha\}$. Use Theorem 3.16 to find the isomorphism types of the additive groups of $\operatorname{End} G$ and $Z(\alpha_0)$. Is $Z(\alpha_0)$ a field? What is the isomorphism type of the multiplicative group $U(Z(\alpha_0))$?
Hint: Show that $a^2 + b^2 = 0$, where $a, b \in \mathbb{Z}_3$, has only the one solution $a = b = 0$.

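(e) Let $G$ be the additive group of the ring $\mathbb{Z}_5 \oplus \mathbb{Z}_5$ and let $\alpha_0 \in \operatorname{End} G$ be given by $(x_1, x_2)\alpha_0 = (-x_2, x_1)$ for all $x_1, x_2 \in \mathbb{Z}_5$. Find the isomorphism types of the additive groups of $\operatorname{End} G$ and $Z(\alpha_0)$. Show that there are 9 ordered pairs $(a, b) \in \mathbb{Z}_5 \times \mathbb{Z}_5$ satisfying $a^2 + b^2 = 0$. Hence show $|U(Z(\alpha_0))| = 16$. By considering the multiplicative cyclic subgroups generated by

$$\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \quad \text{and by} \quad \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$$

determine the isomorphism type of $U(Z(\alpha_0))$.

(f) Let $G$ be a finite additive abelian group with $|G| = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$, where $n_1, n_2, \ldots, n_k$ are positive integers and $p_1, p_2, \ldots, p_k$ are distinct primes. Let $\alpha_j \in \operatorname{End} G_{p_j}$ for $1 \le j \le k$, where $G_{p_j}$ is the $p_j$-component of $G$. Use Theorem 3.10 to show that $\alpha = \alpha_1 \oplus \alpha_2 \oplus \cdots \oplus \alpha_k$ is an endomorphism of $G$, where $(g)\alpha = \sum_{j=1}^{k} (g_j)\alpha_j$ with $g = \sum_{j=1}^{k} g_j$ and $g_j \in G_{p_j}$ for $1 \le j \le k$. Show also that each endomorphism $\beta \in \operatorname{End} G$ can be expressed as $\beta = \beta_1 \oplus \beta_2 \oplus \cdots \oplus \beta_k$ for unique $\beta_j \in \operatorname{End} G_{p_j}$. Deduce that $\operatorname{End} G \cong \operatorname{End} G_{p_1} \oplus \operatorname{End} G_{p_2} \oplus \cdots \oplus \operatorname{End} G_{p_k}$, that is, the ring $\operatorname{End} G$ is isomorphic to the external direct sum of the endomorphism rings of the primary components of $G$.
Hint: Mimic the method of Exercises 3.2, Question 5(b), substituting endomorphisms for automorphisms.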

2. (a) Show that the 16 elements of the form $(\overline{-1})^r \times (\bar{3})^s$, where $0 \le r < 2$, $0 \le s < 8$ and $\overline{-1}, \bar{3} \in \mathbb{Z}_{32}$, are distinct. Is each element of $U(\mathbb{Z}_{32})$ of this type? (Yes/No). State the isomorphism type of the multiplicative group $U(\mathbb{Z}_{32})$.

(b) Verify that $\bar{2} \in \mathbb{Z}_3$ has multiplicative order 2 and $2^2 \not\equiv 1 \pmod{9}$. Verify that $\bar{2} \in U(\mathbb{Z}_{81})$ has order 54.
Hint: It is enough to show $2^{18} \not\equiv 1 \pmod{81}$ and $2^{27} \equiv -1 \pmod{81}$. Use Theorem 3.19 to find the order of $\bar{2}$ in $U(\mathbb{Z}_{243})$.

(c) Find an integer $a$ with $1 \le a < 7$ such that $\bar{a}$ in $\mathbb{Z}_7$ has multiplicative order 6 and $a^6 \not\equiv 1 \pmod{49}$. Deduce from Theorem 3.19 that $\bar{a}$ in $U(\mathbb{Z}_{49})$ has order 42. What is the order of $\bar{a}$ in $U(\mathbb{Z}_{49^2})$?

(d) Let $p$ be an odd prime and $a$ an integer with $1 \le a < p$ such that $\bar{a}$ in $\mathbb{Z}_p$ has multiplicative order $p - 1$. Is it necessarily the case that $a^{p-1} \not\equiv 1 \pmod{p^2}$?
Hint: Consider $p = 29$, $a = 14$.

(e) Let $p$ be an odd prime and let $a, n$ be integers with $n \ge 2$. Denote the congruence class of $a$ modulo $p^n$ by $x_n$. Use the proof of Theorem 3.19 and the ring homomorphism $\theta_2 : \mathbb{Z}_{p^n} \to \mathbb{Z}_{p^2}$ given by $(x_n)\theta_2 = x_2$ to show that $x_n$ generates $U(\mathbb{Z}_{p^n})$ if and only if $x_2$ generates $U(\mathbb{Z}_{p^2})$.

(f) Let $p$ be an odd prime. Show by induction on $i \ge 0$ that there is an integer $k_i$ with $\gcd\{k_i, p\} = 1$ satisfying $(1 + p)^{p^i} = 1 + k_ip^{i+1}$. Let $g$ denote the congruence class of $1 + p$ modulo $p^n$, $n \ge 2$. Taking $i = n - 2$ and $i = n - 1$ in turn, deduce that $g$ has order $p^{n-1}$ in $U(\mathbb{Z}_{p^n})$. Show that $U(\mathbb{Z}_{p^n})$ contains an element $h$ of order $p - 1$ and $\langle gh \rangle = U(\mathbb{Z}_{p^n})$.
Hint: Use Corollary 3.17 and the first parts of Exercises 2.1, Question 4(b) and Exercises 2.2, Question 4(e).

3. (a) In each of the following cases express $U(\mathbb{Z}_{|G|})$ as a direct product of cyclic groups and hence find the invariant factors of $\operatorname{Aut} G$, where $G$ is a finite cyclic group.

(i) $|G| = 105$; (ii) $|G| = 100$; (iii) $|G| = 98$; (iv) $|G| = 96$.

(b) Let $G$ be cyclic of order $n$ and let $H = \{\theta \in \operatorname{Aut} G : \theta = \theta^{-1}\}$. Is $H$ an elementary abelian 2-group? (Yes/No). Show that the $l$ invariant factors of $\operatorname{Aut} G$ are even and $|H| = 2^l$.

(c) Let $G$ be a cyclic group of order $p^n$ where $p$ is an odd prime and $n \ge 1$. Show that $\operatorname{Aut} G$ has a unique element of order 2. Are there any other finite cyclic groups $G$ with this property?

(d) Specify, in terms of their prime power factorisation, the orders of all finite cyclic groups $G$ such that $\operatorname{Aut} G$ has exactly two invariant factors.

(e) Find the six integers $|G|$ such that $G$ is cyclic and $\operatorname{Aut} G$ is a non-trivial elementary abelian 2-group.

(f) Verify that the cyclic groups of orders 21, 28, 36, 42 have isomorphic automorphism groups. Find the eight integers $|G|$ such that $\operatorname{Aut} \mathbb{Z}_{|G|}$ has isomorphism type $C_2 \oplus C_{12}$.

4. (a) List the subsequences of the divisor sequence $(2, 4, 4)$ and those of the divisor sequence $(1, 3, 9)$. How many subsequences does the divisor sequence $(2, 12, 36)$ have? (You needn't list them all! Use the conclusion of the following paragraph.) How many isomorphism types of abelian groups occur among the subgroups of an abelian group of isomorphism type $C_2 \oplus C_{12} \oplus C_{36}$?
Let $(d_1, d_2, \ldots, d_s)$ and $(\delta_1, \delta_2, \ldots, \delta_s)$ be divisor sequences of length $s$, where $d_s$ and $\delta_s$ are positive integers with $\gcd\{d_s, \delta_s\} = 1$. Explain why every subsequence of the divisor sequence $(d_1\delta_1, d_2\delta_2, \ldots, d_s\delta_s)$ is uniquely expressible in the form $(d'_1\delta'_1, d'_2\delta'_2, \ldots, d'_s\delta'_s)$, where $(d'_1, d'_2, \ldots, d'_s)$ is a subsequence of $(d_1, d_2, \ldots, d_s)$ and $(\delta'_1, \delta'_2, \ldots, \delta'_s)$ is a subsequence of $(\delta_1, \delta_2, \ldots, \delta_s)$. What is the connection between the numbers of these subsequences?
Find the number of subsequences of $(6, 60, 600)$.

(b) Can an abelian group $G$ of isomorphism type $C_2 \oplus C_{12} \oplus C_{84}$ have a subgroup $H$ of isomorphism type

(i) $C_{14}$; (ii) $C_3 \oplus C_3 \oplus C_{14}$; (iii) $C_8$?

(c) Let $G$ denote the additive group of $\mathbb{Z}_2 \oplus \mathbb{Z}_8$. Find a subgroup $H$ of $G$ such that $H$ and $G/H$ are isomorphic.

(d) Let $G$ be the additive abelian group $\mathbb{Z}_2 \oplus \mathbb{Z}_8 \oplus \mathbb{Z}$. List the 14 isomorphism types of subgroups $H$ of $G$. Find three subgroups $H$ of $G$, no two being isomorphic, such that $G/H$ is of isomorphism type $C_2 \oplus C_{12}$. Specify the isomorphism types of the non-cyclic quotient groups $G/H$ where $|G/H| \le 100$ and state their total number.

5. (Counting endomorphisms and automorphisms of a finite abelian group.)
(a) Let $G = H_1 \oplus H_2 \oplus \cdots \oplus H_s$ be a finite abelian group where $H_i = \langle h_i \rangle$, $d_i$ is the order of $h_i$ for $1 \le i \le s$, $d_1 > 1$ and $d_i \mid d_j$ for $1 \le i \le j \le s$. So $G$ has invariant factor sequence $(d_1, d_2, \ldots, d_s)$. Let $\alpha \in \operatorname{End} G$. There are integers $a_{ij}$ with $(h_i)\alpha = \sum_{j=1}^{s} a_{ij}h_j$ for $1 \le i \le s$. Show that the $s \times s$ matrix $A = (a_{ij})$ over $\mathbb{Z}$ satisfies the endomorphism condition

$$d_ia_{ij} \equiv 0 \pmod{d_j} \quad \text{for } 1 \le i, j \le s.$$

For $i \ge j$ show $d_ia_{ij} \equiv 0 \pmod{d_j}$ holds for arbitrary integers $a_{ij}$. For $i < j$ show $d_ia_{ij} \equiv 0 \pmod{d_j} \Leftrightarrow a_{ij} \equiv 0 \pmod{d_j/d_i}$. Show that the matrices in the ring $M_s(\mathbb{Z})$ which satisfy the endomorphism condition form a subring $R_G$.
Conversely, for each $A = (a_{ij}) \in R_G$ let $\alpha : G \to G$ be defined by $(g)\alpha = \sum_{j=1}^{s} y_jh_j$, where $g = x_1h_1 + x_2h_2 + \cdots + x_sh_s$ and $y_j = \sum_{i=1}^{s} x_ia_{ij}$. Show that $\alpha$ is unambiguously defined and $\alpha \in \operatorname{End} G$.

Write $(A)\varphi = \alpha$. Show that $\varphi : R_G \to \operatorname{End} G$ is a ring homomorphism with $\operatorname{im} \varphi = \operatorname{End} G$ and $\ker \varphi = \{C = (c_{ij}) : c_{ij} \equiv 0 \pmod{d_j} \text{ for } 1 \le i, j \le s\}$. Deduce $R_G/\ker \varphi \cong \operatorname{End} G$. Show that each element of $R_G/\ker \varphi$ contains a unique matrix $A = (a_{ij})$ with $0 \le a_{ij} < d_j$. Hence prove the analogue of Frobenius' theorem, Corollary 6.34, namely

$$|\operatorname{End} G| = \prod_{i=1}^{s} d_i^{\,2s-2i+1}.$$

Hint: Each $\alpha$ corresponds to its 'reduced' matrix $A = (a_{ij})$ as above, there being $d_i$ or $d_j$ choices for each $a_{ij}$ according as $i < j$ or $i \ge j$.

(b) Let $\pi$ denote a permutation of the set $S = \{1, 2, \ldots, s\}$ and suppose $i_0, j_0 \in S$ satisfy $(j_0)\pi = i_0$. Show that there is a positive integer $k$ with $(i_0)\pi^k = j_0$. Show that the least such integer $k$ is such that $i_0, (i_0)\pi, (i_0)\pi^2, \ldots, (i_0)\pi^{k-1}$ are distinct.
Hint: $K = \{l \in \mathbb{Z} : (i_0)\pi^l = i_0\}$ is an ideal of $\mathbb{Z}$.
Let $A = (a_{ij})$ belong to the ring $R_G$ of (a) above. The $(i_0, j_0)$-entry in the adjugate matrix $\operatorname{adj} A$ is the cofactor $A_{j_0i_0}$, and a typical term (apart from sign) in $A_{j_0i_0}$ is $t_\pi = \prod_{j \ne j_0} a_{j(j)\pi}$. Show $d_{i_0}t_\pi \equiv 0 \pmod{d_{j_0}}$ and deduce $d_{i_0}A_{j_0i_0} \equiv 0 \pmod{d_{j_0}}$, that is, $\operatorname{adj} A \in R_G$.
Hint: $t_\pi$ has factor $a_{i_0(i_0)\pi}\, a_{(i_0)\pi(i_0)\pi^2} \cdots a_{(i_0)\pi^{k-1}j_0}$.

(c) Let $G$ be a finite abelian $p$-group, $p$ prime. Suppose $G$ has $m_1$ invariant factors $p^{t_1}$, $m_2$ invariant factors $p^{t_2}$, $\ldots$, $m_l$ invariant factors $p^{t_l}$, where $t_1 < t_2 < \cdots < t_l$. Let the $s \times s$ matrix $A = (a_{ij})$ over $\mathbb{Z}$ be reduced and belong to the ring $R_G$ of (a) above, where $s = m_1 + m_2 + \cdots + m_l$. Show

$$\alpha = (A)\varphi \in \operatorname{Aut} G \Leftrightarrow |A| \not\equiv 0 \pmod{p}.$$

Hint: For $\Leftarrow$ consider $b \in \mathbb{Z}$ with $b|A| \equiv 1 \pmod{p^{t_l}}$. Show that $B = b \operatorname{adj} A$ over $\mathbb{Z}$ satisfies $AB \equiv I \pmod{p^{t_l}}$, i.e. corresponding entries in $AB$ and $I$ are congruent modulo $p^{t_l}$. Deduce $AB \equiv I \pmod{\ker \varphi}$ and, using (b) above, conclude $\alpha^{-1} = (B)\varphi$.

Partition

$$A = \begin{pmatrix} M_{11} & M_{12} & \cdots & M_{1l} \\ M_{21} & M_{22} & \cdots & M_{2l} \\ \vdots & \vdots & \ddots & \vdots \\ M_{l1} & M_{l2} & \cdots & M_{ll} \end{pmatrix}$$

where $M_{ij}$ is the indicated $m_i \times m_j$ submatrix of $A$. Show that all entries in $M_{ij}$ $(i < j)$ are divisible by $p$ and deduce $|A| \equiv |M_{11}| \times |M_{22}| \times \cdots \times |M_{ll}| \pmod{p}$. Write $r_m = |GL_m(\mathbb{Z}_p)|/p^{m^2}$. Show $|\operatorname{Aut} G| = r_{m_1}r_{m_2} \cdots r_{m_l}\,|\operatorname{End} G|$ by counting the number of choices for each $M_{ij}$.

(d) Find a formula involving $p$, $t_i$, $m_i$ $(1 \le i \le l)$ for the number $|\operatorname{End} G|$ of endomorphisms of the finite abelian $p$-group $G$ of (c) above.

(e) Let $G$ have isomorphism type $C_3 \oplus C_9 \oplus C_9$. Using the terminology of (a) above, verify that

$$A = \begin{pmatrix} 2 & 3 & 6 \\ 1 & 7 & 4 \\ 0 & 1 & 5 \end{pmatrix}$$

satisfies the endomorphism condition and is reduced. Verify $|A| \not\equiv 0 \pmod{3}$ and find the $3 \times 3$ reduced matrix $B$ satisfying $AB \equiv I \pmod{\ker \varphi}$. Find the multiplicative order of the automorphism $\alpha = (A)\varphi$ of $G$. Is $\alpha^{-1} = (B)\varphi$? (Yes/No).

(f) Calculate, in factorised form, $|\operatorname{End} G|$ and $|\operatorname{Aut} G|$, the isomorphism class of $G$ being:

(i) $C_4 \oplus C_8 \oplus C_8$; (ii) $C_3 \oplus C_3 \oplus C_9$; (iii) $C_{12} \oplus C_{24} \oplus C_{72}$.

(g) The abelian group $G$ has invariant factor sequence $(d_1, d_2)$.
Suppose $G$ finite. By considering

$$A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & d_2/d_1 \\ 0 & 1 \end{pmatrix}$$

show that $\operatorname{Aut} G$ is non-abelian.
Suppose $G$ infinite and so $d_2 = 0$. Find $d_1$ with $\operatorname{Aut} G$ abelian. Show that $d_1$ is unique.

6. (a) Let $G$ and $G'$ be additive abelian groups and let $\alpha : G \to G'$ and $\beta : G \to G'$ be group homomorphisms. Show that $\alpha + \beta : G \to G'$, defined by $(g)(\alpha + \beta) = (g)\alpha + (g)\beta$ for all $g \in G$, is also a group homomorphism. Hence show that the set $\operatorname{Hom}(G, G')$ of all group homomorphisms $\alpha : G \to G'$, with the above binary operation of addition, is an additive abelian group.

(b) Let $G, G_1, G_2, G', G'_1, G'_2$ be additive abelian groups. Establish group isomorphisms:

$$\operatorname{Hom}(G_1 \oplus G_2, G') \cong \operatorname{Hom}(G_1, G') \oplus \operatorname{Hom}(G_2, G'),$$

$$\operatorname{Hom}(G, G'_1 \oplus G'_2) \cong \operatorname{Hom}(G, G'_1) \oplus \operatorname{Hom}(G, G'_2).$$

(c) Let $m$ and $n$ be positive integers. Use Exercises 2.1, Question 4(b) to show $\operatorname{Hom}(\mathbb{Z}_m, \mathbb{Z}_n)$ is cyclic of order $\gcd\{m, n\}$. Show also that $\operatorname{Hom}(\mathbb{Z}, \mathbb{Z}_n)$, $\operatorname{Hom}(\mathbb{Z}_m, \mathbb{Z})$ and $\operatorname{Hom}(\mathbb{Z}, \mathbb{Z})$ are cyclic and state their isomorphism types. Suppose the abelian groups $G$ and $G'$ are finitely generated. Using Theorem 3.4 deduce that $\operatorname{Hom}(G, G')$ is finitely generated.

(d) The finite abelian groups $G$ and $G'$ have invariant factor sequences $(d_1, d_2, \ldots, d_s)$ and $(d'_1, d'_2, \ldots, d'_{s'})$ respectively. Express $\operatorname{Hom}(G, G')$ as a direct sum of cyclic groups. Are the groups $\operatorname{Hom}(G, G')$ and $\operatorname{Hom}(G', G)$ isomorphic? Specify the invariant factors of $\operatorname{End} G = \operatorname{Hom}(G, G)$.
List the invariant factors of $\operatorname{Hom}(G, G')$, $\operatorname{Hom}(G, G \oplus G')$, $\operatorname{Hom}(G \oplus G', G')$ and $\operatorname{End} G$ in the case $(d_1, d_2, \ldots, d_s) = (2, 6, 12)$, $(d'_1, d'_2, \ldots, d'_{s'}) = (3, 3, 6, 24)$.

(e) The f.g. abelian groups $G$ and $G'$ decompose as $G = T(G) \oplus M(G)$ and $G' = T(G') \oplus M(G')$, where $T(G)$, $T(G')$ are the torsion subgroups of $G$, $G'$ and $M(G)$, $M(G')$ are free of rank $r$, $r'$ respectively, as in Corollary 3.5. Express the torsion subgroup and torsion-free rank of $\operatorname{Hom}(G, G')$ and $\operatorname{End} G$ using $T(G)$, $T(G')$, $r$, $r'$ and $\operatorname{Hom}$.
List the invariant factors of $\operatorname{Hom}(G, G')$, $\operatorname{Hom}(G', G)$, $\operatorname{End} G$ and $\operatorname{End} G'$ in the case of $T(G)$, $T(G')$ having invariant factor sequences $(2, 4)$, $(2, 2, 4, 4)$ respectively and $r = 1$, $r' = 2$.

Part II

Similarity of Square Matrices over a Field

A Bird's-Eye View of Similarity of t × t Matrices over a Field

The term 'similarity' in this context has a specific meaning: the $t \times t$ matrices $A$ and $B$ over a field $F$ are called similar if there is an invertible $t \times t$ matrix $X$ over $F$ such that $XA = BX$, and so $XAX^{-1} = B$. Similarity is an equivalence relation on the set $M_t(F)$ of all $t \times t$ matrices $A$ over $F$. The multiplicative group of all invertible $t \times t$ matrices $X$ over $F$ is denoted by $GL_t(F)$. So the similarity class of the $t \times t$ matrix $A$ over $F$ is the set $\{XAX^{-1} : X \in GL_t(F)\}$ of all $t \times t$ matrices over $F$ which are similar to $A$. How does similarity in this sense arise? Let $V$ be a $t$-dimensional vector space over the field $F$ and let $\alpha : V \to V$ be a linear mapping of $V$. Also let $v_1, v_2, \ldots, v_t$ be a basis $\mathcal{B}$ of $V$. The reader will know that many problems in linear algebra are solved by the choice of a suitable basis. The $t \times t$ matrix $A = (a_{ij})$ over $F$ defined by

$$(v_i)\alpha = a_{i1}v_1 + a_{i2}v_2 + \cdots + a_{it}v_t \quad \text{for } 1 \le i \le t$$

is called the matrix of $\alpha$ relative to $\mathcal{B}$, as in Theorem 3.15. The question is: how does the matrix of $\alpha$ change when the basis $\mathcal{B}$ of $V$ is changed? The answer is: $A$ changes to a similar matrix $XAX^{-1}$. In fact the matrices $XAX^{-1}$ of a given linear mapping $\alpha$ of $V$ relative to bases of $V$ are precisely the elements of the similarity class of $A$. The reason will be known to the reader: each ordered pair of bases of $V$ is connected by an invertible matrix $X$. The details are revised in Chapter 5.

How can one tell whether two $t \times t$ matrices over $F$ are similar or not? It turns out that this question is analogous to a problem we have already solved, namely: how to determine whether two finite abelian groups are isomorphic or not. As we saw in Section 3.1, this problem is solved by expressing each finite $\mathbb{Z}$-module $G$ as a direct sum of cyclic submodules, from which the invariant factors and the isomorphism type of $G$ can be read off. What is the analogous procedure starting with a given $t \times t$ matrix $A$ over a field $F$? The first step is to construct

the $F[x]$-module $M(A)$ determined by $A$.

The elements of $M(A)$ are $t$-tuples $v = (a_1, a_2, \ldots, a_t)$ where $a_i \in F$ for $1 \le i \le t$. So $M(A) = F^t$ (set equality) and we'll refer to the elements of $M(A)$ as vectors; indeed $M(A)$ is a $t$-dimensional vector space over $F$ when the extra structure we are about to introduce is ignored! Here $F[x]$ denotes the ring of polynomials in $x$ over the field $F$. We review the properties of $F[x]$ in Chapter 4, the point being that $F[x]$ behaves in the same way as the ring $\mathbb{Z}$ of integers: there is a division law for polynomials over $F$, the Euclidean algorithm can be used to calculate gcds of pairs of polynomials over $F$, and $F[x]$ is a principal ideal domain. So it is reasonable to expect $F[x]$-modules to behave in the same way as $\mathbb{Z}$-modules, and this is indeed the case. The extra structure referred to above is the module product $f(x)v$, which belongs to $M(A)$ for all $f(x) \in F[x]$ and all $v \in M(A)$. We write

$$xv = vA \quad \text{for all } v \in M(A)$$

and so multiplication of $v$ by $x$ (on the left) is defined to be the matrix product $vA$; as $A$ is a $t \times t$ matrix over $F$ and $v \in F^t$, we see $vA \in F^t$, that is, $vA \in M(A)$. In the same way

$$x^2v = vA^2 \quad \text{for all } v \in M(A),$$

the general rule being

$$(a_nx^n + \cdots + a_1x + a_0)v = v(a_nA^n + \cdots + a_1A + a_0I) \quad \text{for all } v \in M(A) \text{ and } a_n, \ldots, a_1, a_0 \in F,$$

where $I$ denotes the $t \times t$ identity matrix over $F$. The above equation can be expressed

$$f(x)v = vf(A)$$

where $f(x) = a_nx^n + \cdots + a_1x + a_0$ is a typical element of $F[x]$ and $f(A) = a_nA^n + \cdots + a_1A + a_0I$ is the corresponding $t \times t$ matrix over $F$ obtained from $f(x)$ by substituting $A$ for $x$. Let us leave to one side the routine proof that $M(A)$ is an $F[x]$-module. Notice that $M(A) = \langle e_1, e_2, \ldots, e_t \rangle$, showing that $M(A)$ is finitely generated (Definition 2.19). Suppose there is $v_0 \in M(A)$ such that every element of the module $M(A)$ is of the form $f(x)v_0$ for some $f(x) \in F[x]$. Then $M(A)$ is called cyclic with generator $v_0$ and, using bold pointed brackets, we write $M(A) = \boldsymbol{\langle} v_0 \boldsymbol{\rangle}$. We prove in Chapter 5 that the elements

$$v_0, xv_0, x^2v_0, \ldots, x^{t-1}v_0 \text{ form a basis of } F^t.$$
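The rule $f(x)v = vf(A)$ can be evaluated without forming $f(A)$: by linearity $vf(A) = a_0v + a_1vA + \cdots + a_nvA^n$. A minimal sketch with exact rational arithmetic follows; the names `act` and `vec_mat` are hypothetical, and the test matrix is the companion matrix $C(x^3 + 4x^2 + 7x + 6)$ used in the worked example later in this section.

```python
from fractions import Fraction


def vec_mat(v, a):
    """Row vector times matrix: the product vA, i.e. x*v in M(A)."""
    return [sum(v[i] * a[i][j] for i in range(len(v))) for j in range(len(a[0]))]


def act(f, v, a):
    """Module product f(x)v = v f(A) in M(A).

    f lists the coefficients [a_0, a_1, ..., a_n] of f(x), lowest degree first.
    The result is a_0 v + a_1 vA + ... + a_n vA^n, so f(A) is never formed.
    """
    result = [Fraction(0)] * len(v)
    power = [Fraction(c) for c in v]  # holds x^i v = vA^i
    for coeff in f:
        result = [r + coeff * p for r, p in zip(result, power)]
        power = vec_mat(power, a)
    return result


C = [[0, 1, 0], [0, 0, 1], [-6, -7, -4]]  # C(x^3 + 4x^2 + 7x + 6)
e1 = [1, 0, 0]
print(*act([3, 2, 1], e1, C))     # 3 2 1   ((x^2 + 2x + 3)e1)
print(*act([6, 7, 4, 1], e1, C))  # 0 0 0   (d0(x)e1 = 0)
```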


Now $-x^tv_0$ belongs to $M(A) = F^t$ and so there are unique scalars (elements of $F$) $b_0, b_1, \ldots, b_{t-1}$ such that $-x^tv_0 = b_0v_0 + b_1xv_0 + \cdots + b_{t-1}x^{t-1}v_0$, which rearranges to give

$$d_0(x)v_0 = 0 \quad \text{where } d_0(x) = x^t + b_{t-1}x^{t-1} + \cdots + b_1x + b_0.$$

The monic (leading coefficient 1) polynomial $d_0(x)$ is called the order of $v_0$ in the $F[x]$-module $M(A)$. In fact $d_0(x)$ is the monic polynomial of least degree over $F$ satisfying $d_0(x)v_0 = 0$. Whether or not $M(A)$ is cyclic, each of its elements $v$ has an order $d(x)$, namely the monic generator of the

order ideal $\{f(x) \in F[x] : f(x)v = 0\}$ of $v$.

Using the notation introduced above we write

$\boldsymbol{\langle} v \boldsymbol{\rangle} = \{f(x)v : f(x) \in F[x]\}$ for the cyclic submodule of $M(A)$ generated by $v$.

The reader will be familiar with the analogous concept in group theory: each element of a group generates a cyclic subgroup. Here the vectors $x^iv$ for $0 \le i < \deg d(x)$ form an $F$-basis of $\boldsymbol{\langle} v \boldsymbol{\rangle}$. So $\dim \boldsymbol{\langle} v \boldsymbol{\rangle} = \deg d(x)$, where $v$ has order ideal $\langle d(x) \rangle$.
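The order $d(x)$ of a vector $v$ can be found exactly as $d_0(x)$ was constructed: compute $v, xv = vA, x^2v = vA^2, \ldots$ until the first vector that is a linear combination of the earlier ones, then read off the coefficients. The sketch below does this over $\mathbb{Q}$ with exact fractions; the helper names are hypothetical and the linear solver is plain Gauss-Jordan elimination. The test matrix is again $C(x^3 + 4x^2 + 7x + 6)$ from the worked example in this section.

```python
from fractions import Fraction


def vec_mat(v, a):
    """Row vector times matrix: the product vA, i.e. x*v in M(A)."""
    return [sum(v[i] * a[i][j] for i in range(len(v))) for j in range(len(a[0]))]


def solve(columns, target):
    """Coefficients c with sum(c[j] * columns[j]) == target, or None if none exist."""
    k, t = len(columns), len(target)
    rows = [[Fraction(columns[j][i]) for j in range(k)] + [Fraction(target[i])]
            for i in range(t)]
    pivots, r = [], 0
    for c in range(k):
        p = next((i for i in range(r, t) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(t):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    # A zero row with non-zero right-hand side means target is not in the span.
    if any(all(x == 0 for x in row[:k]) and row[k] != 0 for row in rows):
        return None
    coeffs = [Fraction(0)] * k
    for i, c in enumerate(pivots):
        coeffs[c] = rows[i][k]
    return coeffs


def order_polynomial(v, a):
    """Monic d(x) of least degree with d(x)v = 0 in M(A), lowest coefficient first."""
    krylov, current = [], list(v)
    while True:
        c = solve(krylov, current)
        if c is not None:
            # x^k v = sum_j c_j x^j v, so d(x) = x^k - sum_j c_j x^j.
            return [-cj for cj in c] + [Fraction(1)]
        krylov.append(current)
        current = vec_mat(current, a)


C = [[0, 1, 0], [0, 0, 1], [-6, -7, -4]]  # C(x^3 + 4x^2 + 7x + 6)
print(*order_polynomial([1, 0, 0], C))  # 6 7 4 1  (x^3 + 4x^2 + 7x + 6)
print(*order_polynomial([3, 2, 1], C))  # 2 1      (x + 2)
```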

It is high time to look at some examples of cyclic modules $M(A)$. Suppose that the above basis $v_0, xv_0, x^2v_0, \ldots, x^{t-1}v_0$ of $F^t$ is the standard basis $\mathcal{B}_0$ of $F^t$ consisting of the rows $e_1, e_2, \ldots, e_t$ of the $t \times t$ identity matrix $I$ over $F$. Then $v_0 = e_1$, $xv_0 = e_2$, $x^2v_0 = e_3$, $\ldots$, $x^{t-1}v_0 = e_t$. Using consecutive pairs of these equations, on substituting $x^{i-1}v_0 = e_i$ in $x(x^{i-1}v_0) = x^iv_0 = e_{i+1}$, we obtain $xe_i = e_{i+1}$ for $1 \le i < t$. So $e_iA = e_{i+1}$, which shows that row $i$ of $A$ is $e_{i+1}$ for $1 \le i < t$. The above equations combine with $d_0(x)v_0 = 0$ to give

$$\begin{aligned} e_tA = xe_t = x(x^{t-1}v_0) = x^tv_0 &= -(b_0v_0 + b_1xv_0 + b_2x^2v_0 + \cdots + b_{t-1}x^{t-1}v_0) \\ &= -b_0e_1 - b_1e_2 - b_2e_3 - \cdots - b_{t-1}e_t, \end{aligned}$$

showing that the last row of $A$ is $e_tA = -(b_0, b_1, b_2, \ldots, b_{t-1})$. In this case

$A$ is the companion matrix of the monic polynomial $d_0(x) = x^t + b_{t-1}x^{t-1} + \cdots + b_1x + b_0$

and is denoted by $C(d_0(x))$. So

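$$C(d_0(x)) = \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \\ -b_0 & -b_1 & -b_2 & -b_3 & \cdots & -b_{t-1} \end{pmatrix}.$$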


For example

$$C(x^3 + 4x^2 + 7x + 6) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -6 & -7 & -4 \end{pmatrix}, \quad C(x^2 - 1) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad C(x - 6) = (6).$$
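A companion matrix is easy to build from the coefficients $b_0, \ldots, b_{t-1}$. This sketch (the name `companion` is hypothetical) follows the row convention of the text, so that $e_iC = e_{i+1}$ for $i < t$, and reproduces the three examples above.

```python
def companion(b: list[int]) -> list[list[int]]:
    """C(d(x)) for the monic d(x) = x^t + b[t-1] x^(t-1) + ... + b[1] x + b[0].

    Rows 1..t-1 are e_2, ..., e_t; the last row is -(b_0, ..., b_{t-1}).
    """
    t = len(b)
    rows = [[1 if j == i + 1 else 0 for j in range(t)] for i in range(t - 1)]
    rows.append([-c for c in b])
    return rows


print(companion([6, 7, 4]))  # [[0, 1, 0], [0, 0, 1], [-6, -7, -4]]
print(companion([-1, 0]))    # [[0, 1], [1, 0]]   i.e. C(x^2 - 1)
print(companion([-6]))       # [[6]]              i.e. C(x - 6)
```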

We will see that companion matrices have many remarkable properties, and we now mention one. The reader will know that the characteristic polynomial $|xI - A|$ of the $t \times t$ matrix $A$ is a monic polynomial of degree $t$, and typically some calculation is required to determine its coefficients. However, in the case of companion matrices things could not be easier! The characteristic polynomial of $C(d_0(x))$ is simply $d_0(x)$ (see Theorem 5.26). Retracing the steps in the above discussion we see

$M(C(d_0(x)))$ is a cyclic $F[x]$-module with generator $e_1$ of order $d_0(x)$.

Indeed $F[x]/\langle d_0(x) \rangle \cong M(C(d_0(x))) = \boldsymbol{\langle} e_1 \boldsymbol{\rangle}$ is the standard example of such a cyclic module, just as the additive group $\mathbb{Z}_n = \langle 1 \rangle$ is the standard example of a finite cyclic group of order $n$.
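One way to see numerically that $C(d_0(x))$ has characteristic polynomial $d_0(x)$ is to compute $|xI - A|$ directly. The sketch below uses the Faddeev-LeVerrier recurrence, a standard technique that is not used in the book (Theorem 5.26 gives the proof), with exact fractions; the function names are hypothetical.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two n x n matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def char_poly(a):
    """Coefficients [c_0, ..., c_n] (lowest first, c_n = 1) of det(xI - A).

    Faddeev-LeVerrier: M_0 = 0, M_k = A M_{k-1} + c_{n-k+1} I,
    c_{n-k} = -trace(A M_k) / k.
    """
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    m = [[Fraction(0)] * n for _ in range(n)]  # M_0 = 0
    for k in range(1, n + 1):
        am = mat_mul(a, m)
        m = [[am[i][j] + (coeffs[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        am = mat_mul(a, m)
        coeffs[n - k] = -sum(am[i][i] for i in range(n)) / k
    return coeffs


C = [[0, 1, 0], [0, 0, 1], [-6, -7, -4]]  # C(x^3 + 4x^2 + 7x + 6)
print(*char_poly(C))  # 6 7 4 1
```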

We now take a closer look at the module $M = M(C(d_0(x)))$ where $d_0(x) = x^3 + 4x^2 + 7x + 6$ and $F = \mathbb{Q}$ is the rational field. As $e_2 = xe_1$ and $e_3 = x^2e_1$, we see that $v = (a_0, a_1, a_2) \in \mathbb{Q}^3$ can be expressed as

$$v = a_0e_1 + a_1e_2 + a_2e_3 = a_0e_1 + a_1xe_1 + a_2x^2e_1 = (a_2x^2 + a_1x + a_0)e_1,$$

showing that $M$ is cyclic with generator $e_1$. As $d_0(x) = (x + 2)(x^2 + 2x + 3)$ is the characteristic polynomial of $C(d_0(x))$, we see $d_0(-2) = 0$, showing that $-2$ is a zero of this polynomial, that is, $-2$ is an eigenvalue of $C(d_0(x))$. Let $w_1 = (x^2 + 2x + 3)e_1 = e_3 + 2e_2 + 3e_1 = (3, 2, 1)$. As $e_1$ has order $d_0(x)$ in $M$, we see that $w_1$ is non-zero and satisfies $(x + 2)w_1 = 0$, that is, $w_1C(d_0(x)) = -2w_1$, showing that $w_1$ is a row eigenvector of $C(d_0(x))$ with associated eigenvalue $-2$. More generally, the row eigenvectors of any $t \times t$ matrix $A$ are precisely the elements $w$ having order $x - \lambda$ in $M(A)$, the associated eigenvalue being $\lambda$. Let $w_2 = (x + 2)e_1$, so $w_2 = e_2 + 2e_1 = (2, 1, 0)$. Then $w_2$ has order $x^2 + 2x + 3$ in $M$ (the reasoning is analogous to: from an element $g$ of order 6 in an additive abelian group we obtain elements $2g$ of order 3 and $3g$ of order 2), and working in $M$ we find $xw_2 = x(x + 2)e_1 = x^2e_1 + 2xe_1 = e_3 + 2e_2 = (0, 2, 1)$. We write $N_1 = \{f(x)w_1 : f(x) \in \mathbb{Q}[x]\} = \boldsymbol{\langle} w_1 \boldsymbol{\rangle}$ and $N_2 = \{f(x)w_2 : f(x) \in \mathbb{Q}[x]\} = \boldsymbol{\langle} w_2 \boldsymbol{\rangle}$ for the cyclic submodules of $M$ generated by $w_1$ and $w_2$ respectively. As $x^2 + 2x + 3$ is irreducible over $\mathbb{Q}$ (it does not factorise into a product of polynomials of smaller degree over $\mathbb{Q}$), we see that $d_0(x)$ has exactly four monic divisors over $\mathbb{Q}$, namely $1$, $x + 2$, $x^2 + 2x + 3$, $x^3 + 4x^2 + 7x + 6$. The analogue of Lemma 2.2 now tells us that $M$ has exactly four submodules – no more and no less! These submodules are $\{(0, 0, 0)\}$, $N_1$, $N_2$, $M$, each being cyclic and the order of a generator being one of the above monic divisors of $d_0(x)$. Further

$$M = N_1 \oplus N_2,$$

showing that $M$ decomposes as the internal direct sum of its cyclic submodules $N_1$ and $N_2$. What is the significance of this decomposition in terms of matrices?

Let $A$ be a $t \times t$ matrix over $F$. We call $\alpha : F^t \to F^t$, defined by $(v)\alpha = vA$ for all $v \in F^t$, the linear mapping determined by $A$. It turns out that the submodules $N$ of $M(A)$ are precisely those subspaces $N$ of $F^t$ which are $\alpha$-invariant, that is,

$$(N)\alpha \subseteq N.$$

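Returning to our example, $N_1 = \langle w_1\rangle$ is a 1-dimensional $\alpha$-invariant subspace of $\mathbb{Q}^3$. Also $N_2 = \langle w_2, xw_2\rangle = \langle w_2\rangle$ is a 2-dimensional $\alpha$-invariant subspace of $\mathbb{Q}^3$, where $\alpha$ is the linear mapping of $\mathbb{Q}^3$ determined by $C(d_0(x))$. Further, $N_1$ and $N_2$ are complementary subspaces of $\mathbb{Q}^3$, and so $w_1, w_2, xw_2$ is a basis $\mathcal{B}$ of $\mathbb{Q}^3$. Let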

$$X = \begin{pmatrix} w_1 \\ w_2 \\ xw_2 \end{pmatrix} = \begin{pmatrix} 3 & 2 & 1 \\ 2 & 1 & 0 \\ 0 & 2 & 1 \end{pmatrix}$$

be the invertible matrix over $\mathbb{Q}$ having the vectors in $\mathcal{B}$ as its rows. Let $B$ denote the matrix of $\alpha$ relative to $\mathcal{B}$. As $(w_1)\alpha = -2w_1$, $(w_2)\alpha = xw_2$ and $(xw_2)\alpha = x^2w_2 = -3w_2 - 2xw_2$, we obtain

$$B = \begin{pmatrix} -2 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -3 & -2 \end{pmatrix} = \begin{pmatrix} C(x + 2) & 0 \\ 0 & C(x^2 + 2x + 3) \end{pmatrix}.$$

The right-hand matrix above is the direct sum of $C(x + 2)$ and $C(x^2 + 2x + 3)$, and we write $B = C(x + 2) \oplus C(x^2 + 2x + 3)$; we explain this concept below. As we mentioned earlier, $C(d_0(x))$ is the matrix of $\alpha$ relative to the standard basis $\mathcal{B}_0$ of $\mathbb{Q}^3$. So the matrices $C(d_0(x))$ and $B$ are similar. In fact $XC(d_0(x))X^{-1} = B$. The reader should check this matrix equality by verifying $\det X \ne 0$ and $XC(d_0(x)) = BX$; there is no need to find the entries in $X^{-1}$. The matrix $C(x + 2) \oplus C(x^2 + 2x + 3)$ is the direct sum of companion matrices of powers of irreducible polynomials over $\mathbb{Q}$, and so it is in primary canonical form (pcf).
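The identities the reader is asked to check can also be confirmed mechanically. The Python sketch below is ours, not the book's. It assumes the row-vector convention used throughout this section: $x$ acts on $M(A)$ by $v \mapsto vA$. So the companion matrix $C(d(x))$ satisfies $e_iC = e_{i+1}$, and its last row holds the negated lower coefficients of $d(x)$, which agrees with the blocks displayed in $B$. The helper names `companion`, `matmul` and `det3` are hypothetical.

```python
def companion(coeffs):
    """Companion matrix C(d(x)) of the monic d(x) = x^n + c_{n-1}x^{n-1} + ... + c_0,
    given coeffs = [c_0, c_1, ..., c_{n-1}]. Row convention: e_i C = e_{i+1}
    for i < n, and the last row is (-c_0, -c_1, ..., -c_{n-1})."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        C[i][i + 1] = 1
    C[n - 1] = [-c for c in coeffs]
    return C

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

C = companion([6, 7, 4])                  # C(d0(x)), d0(x) = x^3 + 4x^2 + 7x + 6
w1, w2 = [3, 2, 1], [2, 1, 0]
xw2 = matmul([w2], C)[0]                  # x w2 = w2 C in M(C(d0(x)))
X = [w1, w2, xw2]
B = [[-2, 0, 0], [0, 0, 1], [0, -3, -2]]  # C(x + 2) (+) C(x^2 + 2x + 3)

print(matmul([w1], C)[0])                 # [-6, -4, -2], i.e. w1 C = -2 w1
print(xw2)                                # [0, 2, 1]
print(det3(X))                            # 3, so X is invertible
print(matmul(X, C) == matmul(B, X))       # True
```

Checking $XC(d_0(x)) = BX$ rather than computing $X^{-1}$ keeps the arithmetic in integers, exactly as the text recommends.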

To find a matrix in pcf similar to a given square matrix $A$ over $F$, we must factorise the characteristic polynomial of $A$ into irreducible polynomials over $F$. There is no known algorithm for obtaining this factorisation except in a handful of special cases, and so matrices in pcf are of more theoretical than practical use. Luckily this difficulty can be avoided when tackling similarity problems. We use the method of Section 3.1, where isomorphisms between finite abelian groups were thoroughly analysed without using prime factorisation.


Let $A_1$ be a $t_1 \times t_1$ matrix over $F$ and let $A_2$ be a $t_2 \times t_2$ matrix over $F$. We construct the $(t_1 + t_2) \times (t_1 + t_2)$ partitioned matrix over $F$

$$A_1 \oplus A_2 = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix}$$

which is called the direct sum of $A_1$ and $A_2$. So $A_1 \oplus A_2$ is a block diagonal matrix with $A_1$ and $A_2$ in the diagonal positions and with rectangular blocks of zeros elsewhere. More generally, given $s$ square matrices $A_1, A_2, \ldots, A_s$ over $F$, their direct sum is

$$A_1 \oplus A_2 \oplus \cdots \oplus A_s = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & A_s \end{pmatrix},$$

that is, a partitioned matrix with $A_1, A_2, \ldots, A_s$ on the diagonal and zeros elsewhere. We are now ready to meet the similarity theorem.
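The direct sum is simple to build mechanically. Here is a minimal Python sketch; the helper `direct_sum` is our own, and matrices are lists of rows. It reproduces the matrix $B = C(x + 2) \oplus C(x^2 + 2x + 3)$ from the example above.

```python
def direct_sum(*blocks):
    """A1 (+) A2 (+) ... (+) As: square blocks down the diagonal, zeros elsewhere."""
    t = sum(len(A) for A in blocks)
    M = [[0] * t for _ in range(t)]
    offset = 0
    for A in blocks:
        for i, row in enumerate(A):
            for j, a in enumerate(row):
                M[offset + i][offset + j] = a
        offset += len(A)          # the next block starts just below and to the right
    return M

# C(x + 2) (+) C(x^2 + 2x + 3), the matrix B of the earlier example
print(direct_sum([[-2]], [[0, 1], [-3, -2]]))
# [[-2, 0, 0], [0, 0, 1], [0, -3, -2]]
```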

A matrix of the type $C(d_1(x)) \oplus C(d_2(x)) \oplus \cdots \oplus C(d_s(x))$ is said to be in rational canonical form (rcf) if $d_1(x), d_2(x), \ldots, d_s(x)$ are monic polynomials of positive degree over $F$ with $d_j(x) \mid d_{j+1}(x)$ for $1 \le j < s$. So a matrix in rcf is a direct sum of companion matrices of monic polynomials, each of which divides the next. The similarity theorem is refreshingly brief:

Let $A$ be a $t \times t$ matrix over a field $F$. Then $A$ is similar to a unique matrix in rational canonical form.

So, starting from a given $t \times t$ matrix $A$ over $F$, there is a $t \times t$ invertible matrix $X$ over $F$ such that $XAX^{-1} = C(d_1(x)) \oplus C(d_2(x)) \oplus \cdots \oplus C(d_s(x))$ is in rcf. The polynomials $d_1(x), d_2(x), \ldots, d_s(x)$ are called the invariant factors of $A$. The $t \times t$ matrix $C(d_1(x)) \oplus C(d_2(x)) \oplus \cdots \oplus C(d_s(x))$ is called the rational canonical form of $A$.

How can a suitable matrix $X$, and hence the invariant factors of $A$, be found? It turns out, as we now outline, that the analogue of Theorem 3.4 will do the job! Instead of decomposing the f.g. $\mathbb{Z}$-module $G$ into a direct sum of cyclic submodules, we decompose the $F[x]$-module $M(A)$ in the same way. First we need the $F[x]$-module $F[x]^t$, whose elements are $t$-tuples $(f_1(x), f_2(x), \ldots, f_t(x))$ of polynomials over $F$. Denote row $i$ of the $t \times t$ identity matrix over $F[x]$ by $e_i(x)$ ($1 \le i \le t$). Then $e_1(x), e_2(x), \ldots, e_t(x)$ form the standard $F[x]$-basis of $F[x]^t$, and so $F[x]^t$ is a free $F[x]$-module of rank $t$. Notice that

$$(f_1(x), f_2(x), \ldots, f_t(x)) = \sum_{i=1}^{t} f_i(x)e_i(x).$$


The evaluation homomorphism $\theta_A : F[x]^t \to M(A)$ defined by

$$\Big(\sum_{i=1}^{t} f_i(x)e_i(x)\Big)\theta_A = \sum_{i=1}^{t} e_if_i(A)$$

provides the connection between $F[x]^t$ and $M(A)$. So $\theta_A$ maps each $t$-tuple $(f_1(x), f_2(x), \ldots, f_t(x))$ of polynomials over $F$ to a $t$-tuple of scalars in $M(A) = F^t$. This $t$-tuple is obtained by adding together row 1 of $f_1(A)$, row 2 of $f_2(A)$, …, row $t$ of $f_t(A)$. We study $\theta_A$ thoroughly in Chapter 6. For the moment notice that $(e_i(x))\theta_A = e_i$ ($1 \le i \le t$). So $\theta_A$ maps the standard $F[x]$-basis of $F[x]^t$ to the standard $F$-basis $\mathcal{B}_0$ of $F^t$. Hence $\theta_A$ is surjective, that is, $\operatorname{im}\theta_A = M(A)$. The reader's antennae will know what to expect next, namely a scrutiny of the kernel of $\theta_A$. Now $\ker\theta_A$ is a submodule of the free $F[x]$-module $F[x]^t$, and so it is free by the polynomial analogue of Theorem 3.1. However, it should come as something of a surprise that

the $t$ rows of the characteristic matrix $xI - A$ form an $F[x]$-basis of $\ker\theta_A$.
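The evaluation homomorphism is easy to compute. This Python sketch is ours, and the names `poly_at_matrix` and `theta` are hypothetical. A polynomial is stored as a list of coefficients with the constant term first. For concreteness we take the $3 \times 3$ matrix $A$ over $\mathbb{Q}$ that is worked through later in this section. The check only shows that each row of $xI - A$ lies in $\ker\theta_A$. The stronger claim, that these rows form a basis of the kernel, is the content of the result quoted above.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def poly_at_matrix(f, A):
    """f(A) by Horner's rule, where f = [a0, a1, ..., am] is a0 + a1 x + ... + am x^m."""
    t = len(A)
    I = [[int(i == j) for j in range(t)] for i in range(t)]
    R = [[0] * t for _ in range(t)]
    for a in reversed(f):
        R = matmul(R, A)                                   # R := R A + a I
        R = [[R[i][j] + a * I[i][j] for j in range(t)] for i in range(t)]
    return R

def theta(fs, A):
    """(f_1(x), ..., f_t(x)) theta_A: add up row i of f_i(A) for i = 1, ..., t."""
    t = len(A)
    v = [0] * t
    for i, f in enumerate(fs):
        row = poly_at_matrix(f, A)[i]
        v = [v[j] + row[j] for j in range(t)]
    return v

A = [[3, -1, 1], [4, -2, 4], [3, -3, 5]]
# Row i of xI - A as a t-tuple of polynomials: entry (i, j) is [i == j] x - a_ij.
char_rows = [[[-A[i][j], int(i == j)] for j in range(3)] for i in range(3)]

print([theta(row, A) for row in char_rows])   # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(theta([[1], [0], [0]], A))              # [1, 0, 0]: e_1(x) maps to e_1
```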

We are given an $F[x]$-basis of $\ker\theta_A$ on a plate! The entries in $xI - A$ are polynomials over $F$ (admittedly of degree at most 1), and so $xI - A$ is a matrix over $F[x]$. Each matrix over $\mathbb{Z}$ can be reduced to its Smith normal form, and by analogy the same is true of each matrix over $F[x]$. In particular there are invertible $t \times t$ matrices $P(x)$ and $Q(x)$ over $F[x]$ such that

$$P(x)(xI - A)Q(x)^{-1} = \operatorname{diag}(1, 1, \ldots, 1, d_1(x), d_2(x), \ldots, d_s(x)) = S(xI - A)$$

Here $S(xI - A)$ is the Smith normal form of $xI - A$. On its diagonal, $t - s$ entries 1 are followed by the $s$ (monic and non-constant) invariant factors $d_j(x)$ of $A$. The determinants $|P(x)|$ and $|Q(x)|$ are non-zero scalars (polynomials of zero degree over $F$), and so taking determinants of the above matrix equation gives

$$|xI - A| = d_1(x)d_2(x)\cdots d_s(x).$$

Here $|P(x)||Q(x)|^{-1} = 1$: both sides of the above equation are monic polynomials of degree $t$ over $F$, so their coefficients of $x^t$ agree. We have shown that the characteristic polynomial of $A$ is the product of the invariant factors of $A$. This interesting connection is, however, eclipsed by a more important fact. The invertible $t \times t$ matrices $P(x)$ and $Q(x)$ over $F[x]$ can be found algorithmically by modifying the method of Chapter 1. In other words, $P(x)$ and $Q(x)$ arise from the row and column operations used in reducing $xI - A$ to $S(xI - A)$. The equation

$$P(x)(xI - A) = S(xI - A)Q(x)$$

‘says it all’, although it will take us some time to appreciate what it is saying. The rows of the lhs form an $F[x]$-basis of $\ker\theta_A$. Being equal to the rows of the rhs, they are monic polynomial multiples (by the diagonal entries of $S(xI - A)$) of the elements of an $F[x]$-basis of $F[x]^t$, namely the rows of $Q(x)$. We have met the analogous situation in


Theorem 3.4. There, $\mathbb{Z}$-bases of $\ker\theta$ and $\mathbb{Z}^t$, related in the same way, led directly to the invariant factor decomposition of $\mathbb{Z}^t/\ker\theta$.

It is a relatively small step to derive an invertible matrix $X$ over $F$ with $XAX^{-1}$ in rational canonical form. Let $\rho_i(x)$ denote row $i$ of $Q(x)$ for $1 \le i \le t$. Write $v_j = (\rho_{t-s+j}(x))\theta_A$ and let $N_j = \langle v_j\rangle$ be the cyclic submodule of $M(A)$ generated by $v_j$, for $1 \le j \le s$. Then

$$M(A) = N_1 \oplus N_2 \oplus \cdots \oplus N_s,$$

which expresses $M(A)$ as a direct sum of non-zero cyclic submodules $N_j$; we will see the full proof in Chapter 6. Also $(\rho_i(x))\theta_A = 0$ for $1 \le i \le t - s$, and $v_j$ has order $d_j(x)$ in $M(A)$ for $1 \le j \le s$. For each $j$ with $1 \le j \le s$, the $\deg d_j(x)$ vectors $x^iv_j$ for $0 \le i < \deg d_j(x)$ form a basis $\mathcal{B}_{v_j}$ of the $\alpha$-invariant subspace $N_j$ of $F^t$. Now $M(A)$ is the direct sum of the $N_j$ (and, consistently, $t = \deg d_1(x) + \deg d_2(x) + \cdots + \deg d_s(x)$). So the vectors in the ordered set $\mathcal{B} = \mathcal{B}_{v_1} \cup \mathcal{B}_{v_2} \cup \cdots \cup \mathcal{B}_{v_s}$ form a basis of $F^t$. The matrix $X$, having the vectors of $\mathcal{B}$ as its rows, is invertible over $F$ and satisfies

$$XAX^{-1} = C(d_1(x)) \oplus C(d_2(x)) \oplus \cdots \oplus C(d_s(x)),$$

which is the rational canonical form of $A$.

We work through the particular case

$$A = \begin{pmatrix} 3 & -1 & 1 \\ 4 & -2 & 4 \\ 3 & -3 & 5 \end{pmatrix}$$

over $\mathbb{Q}$. The first job is to reduce $xI - A$ to its Smith normal form $S(xI - A)$, noting the ecos and eros used in the reduction. We mimic the reduction method in Theorem 1.11:

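$$\begin{aligned}
xI - A &= \begin{pmatrix} x-3 & 1 & -1 \\ -4 & x+2 & -4 \\ -3 & 3 & x-5 \end{pmatrix}
\underset{c_1 \leftrightarrow c_2}{\overset{c_1 - (x-3)c_2}{\equiv}}
\begin{pmatrix} 1 & 0 & -1 \\ x+2 & -x^2+x+2 & -4 \\ 3 & -3x+6 & x-5 \end{pmatrix}
\overset{c_3 + c_1}{\equiv}
\begin{pmatrix} 1 & 0 & 0 \\ x+2 & -x^2+x+2 & x-2 \\ 3 & -3x+6 & x-2 \end{pmatrix}\\
&\underset{r_3 - 3r_1}{\overset{r_2 - (x+2)r_1}{\equiv}}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & -x^2+x+2 & x-2 \\ 0 & -3x+6 & x-2 \end{pmatrix}
\underset{c_2 \leftrightarrow c_3}{\overset{c_2 + (x+1)c_3}{\equiv}}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & x-2 & 0 \\ 0 & x-2 & (x-2)^2 \end{pmatrix}\\
&\overset{r_3 - r_2}{\equiv}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & x-2 & 0 \\ 0 & 0 & (x-2)^2 \end{pmatrix} = S(xI - A).
\end{aligned}$$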

From $S(xI - A)$ we immediately see that $d_1(x) = x - 2$ and $d_2(x) = (x - 2)^2 = x^2 - 4x + 4$ are the invariant factors of $A$, and

$$C(x - 2) \oplus C((x - 2)^2) = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -4 & 4 \end{pmatrix}$$

is the rcf of $A$. The sequence of ecos used in the above reduction is: $c_1 - (x - 3)c_2$, $c_1 \leftrightarrow c_2$, $c_3 + c_1$, $c_2 + (x + 1)c_3$, $c_2 \leftrightarrow c_3$. Mimicking the theory of Chapter 1, we find the invertible matrix $Q(x)$ over $\mathbb{Q}[x]$ by applying the conjugate sequence to the $3 \times 3$ identity matrix $I$ over $\mathbb{Q}[x]$. This conjugate sequence is $r_2 + (x - 3)r_1$, $r_1 \leftrightarrow r_2$, $r_1 - r_3$, $r_3 - (x + 1)r_2$, $r_2 \leftrightarrow r_3$:

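$$\begin{aligned}
I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
&\underset{r_1 \leftrightarrow r_2}{\overset{r_2 + (x-3)r_1}{\equiv}}
\begin{pmatrix} x-3 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\underset{r_3 - (x+1)r_2}{\overset{r_1 - r_3}{\equiv}}
\begin{pmatrix} x-3 & 1 & -1 \\ 1 & 0 & 0 \\ -(x+1) & 0 & 1 \end{pmatrix}\\
&\overset{r_2 \leftrightarrow r_3}{\equiv}
\begin{pmatrix} x-3 & 1 & -1 \\ -(x+1) & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} = Q(x).
\end{aligned}$$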

We find the invertible matrix $P(x)$ over $\mathbb{Q}[x]$ (although we don't really need it) by applying the sequence of eros used in the above reduction to $I$ as above. These eros are $r_2 - (x + 2)r_1$, $r_3 - 3r_1$, $r_3 - r_2$:

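$$I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\underset{r_3 - 3r_1}{\overset{r_2 - (x+2)r_1}{\equiv}}
\begin{pmatrix} 1 & 0 & 0 \\ -(x+2) & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix}
\overset{r_3 - r_2}{\equiv}
\begin{pmatrix} 1 & 0 & 0 \\ -(x+2) & 1 & 0 \\ x-1 & -1 & 1 \end{pmatrix} = P(x).$$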

Note that a square matrix over $F[x]$ is invertible over $F[x]$ if and only if its determinant is a non-zero constant polynomial, that is, an invertible element of $F[x]$. The reader can now verify $P(x)(xI - A)Q(x)^{-1} = S(xI - A)$ by checking $\det Q(x) = 1$ (so $Q(x)$ is invertible over $\mathbb{Q}[x]$) and $P(x)(xI - A) = S(xI - A)Q(x)$.
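This verification can be automated with a little arithmetic on matrices whose entries are polynomials. The Python sketch below is ours, and the helper names `padd`, `pmul`, `pmatmul` and `pdet3` are hypothetical. Polynomials are integer coefficient lists with the constant term first, and `[]` is the zero polynomial. The matrices $P(x)$, $Q(x)$ and $S(xI - A)$ are transcribed from the reductions above. The sketch also confirms $|xI - A| = (x - 2)^3 = d_1(x)d_2(x)$.

```python
def trim(p):
    """Drop trailing zero coefficients; [] is the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def padd(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])

def pneg(p):
    return [-a for a in p]

def pmul(p, q):
    if not p or not q:
        return []
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

def pmatmul(M, N):
    """Product of two matrices whose entries are polynomials."""
    out = [[[] for _ in N[0]] for _ in M]
    for i in range(len(M)):
        for j in range(len(N[0])):
            for k in range(len(N)):
                out[i][j] = padd(out[i][j], pmul(M[i][k], N[k][j]))
    return out

def pdet3(M):
    """Determinant of a 3 x 3 polynomial matrix, expanded along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    m1 = padd(pmul(e, i), pneg(pmul(f, h)))
    m2 = padd(pmul(d, i), pneg(pmul(f, g)))
    m3 = padd(pmul(d, h), pneg(pmul(e, g)))
    return padd(padd(pmul(a, m1), pneg(pmul(b, m2))), pmul(c, m3))

# Entries are coefficient lists: [-3, 1] means x - 3.
xI_minus_A = [[[-3, 1], [1], [-1]],
              [[-4], [2, 1], [-4]],
              [[-3], [3], [-5, 1]]]
S = [[[1], [], []], [[], [-2, 1], []], [[], [], [4, -4, 1]]]
Q = [[[-3, 1], [1], [-1]], [[-1, -1], [], [1]], [[1], [], []]]
P = [[[1], [], []], [[-2, -1], [1], []], [[-1, 1], [-1], [1]]]

print(pmatmul(P, xI_minus_A) == pmatmul(S, Q))  # True
print(pdet3(Q))                                  # [1]
print(pdet3(xI_minus_A))                         # [-8, 12, -6, 1], i.e. (x - 2)^3
```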

Finally we use the rows of $Q(x)$ and $\theta_A : \mathbb{Q}[x]^3 \to \mathbb{Q}^3$ to construct a matrix $X$ with the property we are looking for. Let $\rho_i(x)$ be row $i$ of $Q(x)$. The theory says that, for $1 \le i \le 3$, the $(i, i)$-entry in $S(xI - A)$ is the order of $(\rho_i(x))\theta_A$ in $M(A)$. We'll reassure ourselves of this fact as we go along. First

$$\begin{aligned}
(\rho_1(x))\theta_A &= (x - 3, 1, -1)\theta_A = ((x - 3)e_1(x) + e_2(x) - e_3(x))\theta_A\\
&= e_1(A - 3I) + e_2 - e_3 = e_1A - 3e_1 + e_2 - e_3\\
&= (3, -1, 1) - (3, 0, 0) + (0, 1, 0) - (0, 0, 1)\\
&= (0, 0, 0)
\end{aligned}$$

showing $(\rho_1(x))\theta_A = 0$. So the $(1, 1)$-entry in $S(xI - A)$ is the order of the zero vector $(\rho_1(x))\theta_A$ in $M(A)$, since both are 1. Secondly

$$\begin{aligned}
(\rho_2(x))\theta_A &= (-(x + 1), 0, 1)\theta_A = (-(x + 1)e_1(x) + e_3(x))\theta_A\\
&= -e_1(A + I) + e_3 = -e_1A - e_1 + e_3\\
&= -(3, -1, 1) - (1, 0, 0) + (0, 0, 1) = (-4, 1, 0) = v_1 \text{ say.}
\end{aligned}$$

As $v_1 \ne 0$ and

$$d_1(x)v_1 = (x - 2)v_1 = v_1(A - 2I) = (-4, 1, 0)\begin{pmatrix} 1 & -1 & 1 \\ 4 & -4 & 4 \\ 3 & -3 & 3 \end{pmatrix} = (0, 0, 0),$$

we see that $v_1$ has order $d_1(x) = x - 2$ in $M(A)$. So $v_1$ is a row eigenvector of $A$ with associated eigenvalue 2, and $N_1 = \langle v_1\rangle$ is the 1-dimensional subspace of $\mathbb{Q}^3$ spanned by $v_1$.

Lastly $(\rho_3(x))\theta_A = (e_1(x))\theta_A = e_1 = (1, 0, 0) = v_2$ say. Then $xv_2 = v_2A = (3, -1, 1)$, and so $v_2, xv_2$ are linearly independent vectors of $\mathbb{Q}^3$. However

$$\begin{aligned}
x^2v_2 = x(xv_2) &= x(3, -1, 1) = (3, -1, 1)A = (8, -4, 4)\\
&= 4(3, -1, 1) - 4(1, 0, 0) = 4xv_2 - 4v_2
\end{aligned}$$

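and so $(x^2 - 4x + 4)v_2 = 0$. As $v_2$ and $xv_2$ are linearly independent, no non-zero polynomial of degree at most 1 annihilates $v_2$. So $d_2(x) = (x - 2)^2$ is the order of $v_2$ in $M(A)$, and $N_2 = \langle v_2\rangle = \langle v_2, xv_2\rangle$. In this case $\mathcal{B} = \mathcal{B}_{v_1} \cup \mathcal{B}_{v_2}$ is the ordered set $v_1; v_2, xv_2$ of vectors. As $M(A) = N_1 \oplus N_2$, it follows that $\mathcal{B}$ is a basis of $\mathbb{Q}^3$ and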

$$X = \begin{pmatrix} v_1 \\ v_2 \\ xv_2 \end{pmatrix} = \begin{pmatrix} -4 & 1 & 0 \\ 1 & 0 & 0 \\ 3 & -1 & 1 \end{pmatrix}$$

is invertible over $\mathbb{Q}$; indeed $\det X = -1 \ne 0$. Also $XA = (C(x - 2) \oplus C((x - 2)^2))X$. Postmultiplying by $X^{-1}$ gives $XAX^{-1} = C(x - 2) \oplus C((x - 2)^2)$, the rcf of $A$. We have achieved our aim! Starting from $A$ we have found an invertible matrix $X$ over $\mathbb{Q}$ such that $XAX^{-1}$ is in rational canonical form.
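The order computations and the final matrix identity can be checked in the same spirit. This sketch is ours, and `act` is a hypothetical helper. A polynomial $p(x) = c_0 + c_1x + c_2x^2 + \cdots$ acts on a row vector $v$ of $M(A)$ by $p(x)v = vp(A)$, computed as $c_0v + c_1vA + c_2vA^2 + \cdots$.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[3, -1, 1], [4, -2, 4], [3, -3, 5]]

def act(p, v):
    """p(x)v in M(A) for p = [c0, c1, ...]: c0 v + c1 vA + c2 vA^2 + ..."""
    result, power = [0] * len(v), v
    for c in p:
        result = [r + c * w for r, w in zip(result, power)]
        power = matmul([power], A)[0]
    return result

v1, v2 = [-4, 1, 0], [1, 0, 0]
X = [v1, v2, act([0, 1], v2)]            # rows v1, v2, x v2
R = [[2, 0, 0], [0, 0, 1], [0, -4, 4]]   # C(x - 2) (+) C((x - 2)^2)

print(act([-2, 1], v1))                  # [0, 0, 0]: (x - 2) v1 = 0
print(act([-2, 1], v2))                  # [1, -1, 1]: (x - 2) v2 is non-zero
print(act([4, -4, 1], v2))               # [0, 0, 0]: (x - 2)^2 v2 = 0
print(det3(X), matmul(X, A) == matmul(R, X))   # -1 True
```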

The matrix $X$ can be modified to give other canonical forms of $A$. In this example

$$X_1 = \begin{pmatrix} v_1 \\ v_2 \\ (x - 2)v_2 \end{pmatrix} = \begin{pmatrix} -4 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & -1 & 1 \end{pmatrix}$$

is invertible over $\mathbb{Q}$ and

$$X_1AX_1^{-1} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix},$$

which is the Jordan normal form (Jnf) of $A$, first described in 1870 by the French mathematician Camille Jordan. Notice that $v_1$ and $(x - 2)v_2$, the first and third rows of $X_1$, are eigenvectors of $A$, and that the Jnf is nearly diagonal. Incidentally, it is worth noting that $M(A)$ has a large number of submodules. Among these, $N' = \langle (x - 2)v_2\rangle$ and $N'' = \langle v_1, (x - 2)v_2\rangle$ are significant. $N''$ is the eigenspace of $A$ with associated eigenvalue 2, and all subspaces of $N''$ are submodules of $M(A)$. It is also true that all subspaces of $\mathbb{Q}^3$ which contain $N'$ are submodules of $M(A)$. The particular reduction process of $xI - A$ to its Smith normal form $S(xI - A)$ selects a suitable pair of submodules $N_1$, $N_2$ from these. The conclusion is that there are, in this case, infinitely many matrices $X$ and $X_1$ with the above properties.
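Here is a quick check of the Jordan form claim, again as our own sketch. It confirms $X_1A = JX_1$ with $\det X_1 \ne 0$, and tests which rows of $X_1$ are eigenvectors of $A$ with eigenvalue 2. The helper `is_eigenvector` is hypothetical.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def is_eigenvector(v, A, lam):
    """True if the row vector v is non-zero and vA = lam * v."""
    return any(v) and matmul([v], A)[0] == [lam * a for a in v]

A = [[3, -1, 1], [4, -2, 4], [3, -3, 5]]
X1 = [[-4, 1, 0],    # v1
      [1, 0, 0],     # v2
      [1, -1, 1]]    # (x - 2)v2 = v2 A - 2 v2
J = [[2, 0, 0], [0, 2, 1], [0, 0, 2]]

print(det3(X1))                                # -1, so X1 is invertible
print(matmul(X1, A) == matmul(J, X1))          # True, i.e. X1 A X1^{-1} = J
print([is_eigenvector(r, A, 2) for r in X1])   # [True, False, True]
```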

The last invariant factor $d_s(x)$ of a $t \times t$ matrix $A$ over $F$ has special significance, being the analogue of the exponent (Definition 3.11) of a finite abelian group. In fact $d_s(x)$ is the minimum polynomial of $A$, that is, the monic polynomial of least degree over $F$ satisfying $d_s(A) = 0$, the zero $t \times t$ matrix over $F$. The reader should verify that $(A - 2I)^2 = 0$ and $A - 2I \ne 0$ in the above example, showing that $A$ has minimum polynomial $(x - 2)^2$.
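For this example, the verification requested for the minimum polynomial amounts to two matrix computations. They are sketched below in Python; the code is ours and checks this one example only.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[3, -1, 1], [4, -2, 4], [3, -3, 5]]
N = [[A[i][j] - 2 * (i == j) for j in range(3)] for i in range(3)]   # A - 2I
zero = [[0] * 3 for _ in range(3)]

print(N != zero)               # True: x - 2 does not annihilate A
print(matmul(N, N) == zero)    # True: (x - 2)^2 does
```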

We will see that, of all the canonical forms under similarity, the rational canonical form is in many ways the best:
- it exists without any conditions on either $A$ or $F$;
- it can be found by a systematic application of the polynomial division algorithm;
- the structure of the linear mapping $\alpha$ determined by $A$ is laid bare;
- two $t \times t$ matrices over $F$ are similar if and only if their rcfs are identical.

EXERCISES

1. Use the sequence

$$c_1 + (x - 3)c_2,\ c_1 \leftrightarrow c_2,\ -c_1,\ c_3 + c_1,\ r_2 + (x + 3)r_1,\ r_3 - 2r_1,\ c_2 + (x + 1)c_3,\ c_2 \leftrightarrow c_3,\ -c_2,\ r_3 + r_2$$

of row and column operations over $\mathbb{Q}[x]$ to reduce $xI - A$ to its Smith normal form $D(x)$, where

$$A = \begin{pmatrix} 3 & 1 & 1 \\ -8 & -3 & -4 \\ 4 & 2 & 3 \end{pmatrix}$$

over $\mathbb{Q}$. Write down the rational canonical form $C$ of $A$. Find invertible $3 \times 3$ matrices $P(x)$ and $Q(x)$ over $\mathbb{Q}[x]$ satisfying $P(x)(xI - A) = D(x)Q(x)$. Verify that the $(i, i)$-entry in $D(x)$ is the order of $(\rho_i(x))\theta_A$ in $M(A)$, where $\rho_i(x)$ is row $i$ of $Q(x)$ for $i = 1, 2, 3$. Hence construct an invertible $3 \times 3$ matrix $X$ over $\mathbb{Q}$ satisfying $XAX^{-1} = C$. By modifying row 3 of $X$, find an invertible $3 \times 3$ matrix $X_1$ over $\mathbb{Q}$ satisfying $X_1AX_1^{-1} = J$ in Jordan normal form. Which rows of $X_1$ are eigenvectors of $A$?

2. Use row and column operations over $\mathbb{Q}[x]$ to reduce $xI - A$ to its Smith normal form $D(x)$, where

$$A = \begin{pmatrix} 2 & -2 & 1 \\ 2 & -3 & 2 \\ 1 & -2 & 2 \end{pmatrix}$$

over $\mathbb{Q}$.
Hint: Start: $r_1 \leftrightarrow r_3$, $-c_1$, $c_2 - 2c_1$, $c_3 - (x - 2)c_1$.
Write down the rational canonical form $C$ of $A$. Find invertible $3 \times 3$ matrices $P(x)$ and $Q(x)$ over $\mathbb{Q}[x]$ satisfying $P(x)(xI - A) = D(x)Q(x)$. Verify that the $(i, i)$-entry in $D(x)$ is the order of $(\rho_i(x))\theta_A$ in $M(A)$, where $\rho_i(x)$ is row $i$ of $Q(x)$ for $i = 1, 2, 3$. Hence construct an invertible $3 \times 3$ matrix $X$ over $\mathbb{Q}$ satisfying $XAX^{-1} = C$. Modify rows 2 and 3 of $X$ to obtain an invertible $3 \times 3$ matrix $X_1$ over $\mathbb{Q}$ such that $X_1AX_1^{-1}$ is diagonal.

3. Use row and column operations over $\mathbb{Q}[x]$ to reduce $xI - A$ to its Smith normal form $D(x) = \operatorname{diag}(1, 1, (x - 1)^3)$, where

$$A = \begin{pmatrix} 3 & 2 & -2 \\ -2 & 0 & 3 \\ 1 & 1 & 0 \end{pmatrix}$$

over $\mathbb{Q}$. Write down the rational canonical form $C$ of $A$. Find invertible $3 \times 3$ matrices $P(x)$ and $Q(x)$ over $\mathbb{Q}[x]$ satisfying $P(x)(xI - A) = D(x)Q(x)$. Verify that the $(i, i)$-entry in $D(x)$ is the order of $(\rho_i(x))\theta_A$ in $M(A)$, where $\rho_i(x)$ is row $i$ of $Q(x)$ for $i = 1, 2, 3$. Hence construct an invertible $3 \times 3$ matrix $X$ over $\mathbb{Q}$ satisfying $XAX^{-1} = C$. Is the module $M(A)$ cyclic? If so, specify a generator of each of the submodules of $M(A)$. Which (if any) of these generators are eigenvectors of $A$?

4 The Polynomial Ring F[x] and Matrices over F[x]

The $s \times t$ matrices $A(x)$ and $B(x)$ over the polynomial ring $F[x]$ are called equivalent (Definition 4.11) if there is an invertible $s \times s$ matrix $P(x)$ over $F[x]$ and an invertible $t \times t$ matrix $Q(x)$ over $F[x]$ satisfying

$$P(x)A(x)Q(x)^{-1} = B(x).$$

In Chapter 1 the analogous concept of equivalence of matrices over $\mathbb{Z}$ was discussed. It led to the classification (by isomorphism) of f.g. abelian groups in Section 3.1. We will see that the theory of equivalence of matrices over $F[x]$, developed in Section 4.2, leads to the classification (by similarity) of $t \times t$ matrices over the field $F$. This classification is carried to its conclusion in Section 6.1. The reader is assured that there is a pay-off to come! In Section 4.1 the ring $F[x]$ of polynomials in $x$ with coefficients in $F$ is studied in detail, this being a necessary preliminary.

The properties of the ring $\mathbb{Z}$ of integers used in the theory of the Smith normal form are shared by many other rings, and in particular by $F[x]$. This boils down to one fact, which the reader already knows. It is always possible to divide one polynomial over $F$ by a non-zero polynomial over $F$, obtaining quotient and remainder polynomials over $F$. The polynomial division property, Theorem 4.1, leads to a constructive way of determining the Smith normal form of every matrix over $F[x]$. This amounts to a generalisation of the Euclidean algorithm. As a consequence, it is possible to determine whether or not any two matrices over $F[x]$ are equivalent in a finite number of steps, each involving the division of one polynomial by another: either their Smith normal forms are equal or they are different!



http://dx.doi.org/10.1007/978-1-4471-2730-7_4


4.1 The Polynomial Ring F[x] where F is a Field

We begin with a résumé of polynomial rings. Let $F$ be a given field. For instance $F$ might be $\mathbb{Z}_2, \mathbb{Z}_3, \mathbb{Z}_5, \ldots$ or one of $\mathbb{Q}, \mathbb{R}, \mathbb{C}$ (the rational, real and complex fields), and Theorem 4.9 will give us more examples of fields. A polynomial over $F$ in the indeterminate $x$ is a sum $f(x) = \sum_{i \ge 0} a_ix^i$ where all the coefficients $a_i$ belong to $F$ and only a finite number of the $a_i$ are non-zero. Two such polynomials $f(x) = \sum_{i \ge 0} a_ix^i$ and $g(x) = \sum_{i \ge 0} b_ix^i$ are decreed to be equal if and only if corresponding coefficients are equal, that is, $\sum_{i \ge 0} a_ix^i = \sum_{i \ge 0} b_ix^i \Leftrightarrow a_i = b_i$ for all $i \ge 0$. The set of all such polynomials is denoted by $F[x]$. The symbol $x$ satisfies $ax = xa$ for all $a \in F$. So elements of $F[x]$ can be added and multiplied in the usual way, producing further elements of $F[x]$, that is,

$$f(x) + g(x) = \sum_{i \ge 0}(a_i + b_i)x^i \quad\text{and}\quad f(x)g(x) = \sum_{i \ge 0}(a_0b_i + a_1b_{i-1} + \cdots + a_ib_0)x^i.$$

So for $f(x) = a_0 + a_1x$ and $g(x) = b_0 + b_1x + b_2x^2$ we obtain

$$\begin{aligned}
f(x) + g(x) &= a_0 + b_0 + (a_1 + b_1)x + b_2x^2,\\
f(x)g(x) &= a_0b_0 + (a_0b_1 + a_1b_0)x + (a_0b_2 + a_1b_1)x^2 + a_1b_2x^3.
\end{aligned}$$

In fact $F[x]$ is an integral domain, that is, a non-trivial commutative ring with an identity element and without divisors of zero (see Exercises 4.1, Question 2(c)). The 0-element of $F[x]$ is the zero polynomial $0(x)$ over $F$, all of whose coefficients are zero. Therefore a non-zero polynomial has at least one non-zero coefficient.

The degree $\deg f(x)$ of the non-zero polynomial $f(x) = \sum_{i \ge 0} a_ix^i$ is the largest non-negative integer $i$ with $a_i \ne 0$. For example $f(x) = 2x^4 + 3x - 1/5$ over $\mathbb{Q}$ has degree 4. So $\deg f(x)$ is the highest power of $x$ appearing in the non-zero polynomial $f(x)$. Let $f(x)$ and $g(x)$ be non-zero polynomials over $F$ and write $m = \deg f(x)$, $n = \deg g(x)$. Then

$$\begin{aligned}
f(x)g(x) &= (a_mx^m + a_{m-1}x^{m-1} + \cdots + a_0)(b_nx^n + b_{n-1}x^{n-1} + \cdots + b_0)\\
&= a_mb_nx^{m+n} + (a_{m-1}b_n + a_mb_{n-1})x^{m+n-1} + \cdots + a_0b_0
\end{aligned}$$

As $a_m \ne 0$, $b_n \ne 0$ and $F$ has no divisors of zero, we have $a_mb_n \ne 0$. This shows $\deg f(x)g(x) = m + n$, establishing the degree formula

$$\deg f(x)g(x) = \deg f(x) + \deg g(x).$$

Can the zero polynomial $0(x)$ be assigned a degree so that the above formula holds for all polynomials (non-zero and zero) over $F$? The degrees of the polynomials $x^2 + x + 1$, $x + 1$, $1$ are 2, 1, 0 respectively, and so it is reasonable to expect $\deg 0(x)$ to be negative. As $f(x)0(x) = 0(x)$, the above degree formula with $g(x) = 0(x)$ requires $\deg 0(x) = \deg f(x) + \deg 0(x)$. It is customary to


define $\deg 0(x) = -\infty$. Then every polynomial has a degree, and the above degree formula holds for all polynomials over $F$, since $m + (-\infty) = -\infty$ for $m \in \mathbb{Z}$ and $(-\infty) + (-\infty) = -\infty$. The reader shouldn't be put off by this strange convention: it is adopted solely to make the theory a bit easier to express.
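The multiplication rule and the degree convention translate directly into code. In this sketch (ours, not the book's), polynomials are coefficient lists with the constant term first. The function `deg` returns Python's `-math.inf` for the zero polynomial, mirroring $\deg 0(x) = -\infty$. Float infinity obeys $m + (-\infty) = -\infty$ and $(-\infty) + (-\infty) = -\infty$, so the degree formula can be tested on zero and non-zero inputs alike.

```python
import math

def deg(p):
    """Degree of p = [a0, a1, ...]; the zero polynomial gets -inf by convention."""
    nonzero = [i for i, a in enumerate(p) if a != 0]
    return nonzero[-1] if nonzero else -math.inf

def pmul(p, q):
    """Coefficient i of the product is a0*b_i + a1*b_(i-1) + ... + a_i*b0."""
    if not p or not q:
        return []
    return [sum(p[j] * q[i - j] for j in range(i + 1)
                if j < len(p) and i - j < len(q))
            for i in range(len(p) + len(q) - 1)]

f = [1, 2]        # 1 + 2x
g = [3, 0, 5]     # 3 + 5x^2
zero = []         # the zero polynomial 0(x)

print(pmul(f, g))                               # [3, 6, 5, 10]
print(deg(pmul(f, g)) == deg(f) + deg(g))       # True  (3 == 1 + 2)
print(deg(pmul(f, zero)), deg(f) + deg(zero))   # -inf -inf
```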

Let $f(x) = \sum_{i \ge 0} a_ix^i$ be a polynomial over the field $F$ and let $a$ be an element of $F$. The scalar $f(a) = \sum_{i \ge 0} a_ia^i$ is called the evaluation of $f(x)$ at $a$. Should $f(a) = 0$ we say that $f(x)$ has zero $a$. Notice that $f(0) = a_0$ is the constant term in $f(x)$. The reader can check that the evaluation mapping

$$\varepsilon_a : F[x] \to F, \quad\text{given by } (f(x))\varepsilon_a = f(a) \text{ for all } f(x) \in F[x],$$

is a surjective ring homomorphism (see Exercises 4.1, Question 2(a)).

The polynomial $f(x) = \sum_{i \ge 0} a_ix^i$ is called constant if $a_i = 0$ for all $i > 0$. So $f(x)$ is a constant polynomial if and only if $\deg f(x) \le 0$. We identify each constant polynomial $f(x)$ with its constant term $f(0)$, which equals $f(a)$ for every $a \in F$. This gives $F \subseteq F[x]$, that is, the integral domain $F[x]$ contains $F$ as the subring of constant polynomials. The constant polynomial $1(x) = 1$, the 1-element of $F$, is also the 1-element of $F[x]$. Using the degree formula we see that the invertible elements of $F[x]$ are precisely the polynomials of degree 0, that is, $U(F[x]) = F^*$.

The polynomials $f(x)$ and $g(x)$ over $F$ are called associate if there is $a \in F^*$ with $f(x) = ag(x)$, in which case we write $f(x) \equiv g(x)$. It is straightforward to verify that $\equiv$ is an equivalence relation on $F[x]$ (Exercises 3.1, Question 7(a)(i)).

The non-zero polynomial $f(x) = \sum_{i=0}^{m} a_ix^i$ of degree $m$ has leading term $a_mx^m$ and leading coefficient $a_m$. Should $a_m = 1$, then $f(x)$ is said to be monic and we write $f(x) = x^m + a_{m-1}x^{m-1} + \cdots + a_1x + a_0$. The product of monic polynomials over $F$ is itself monic. The reader has already been alerted to the analogy between $F[x]$ and $\mathbb{Z}$; monic polynomials are analogous to positive integers. Each non-zero integer $m$ is associate to a unique positive integer, namely $|m| = \pm m$. In the same way, each non-zero polynomial $f(x)$ over $F$ with leading coefficient $a_m$ has a unique monic associate, namely $\overline{f}(x) = a_m^{-1}f(x)$.

Next we detail the familiar process of ‘long’ division of one polynomial $f(x)$ by another $g(x)$, obtaining quotient $q(x)$ and remainder $r(x)$.

Theorem 4.1 (The division property for polynomials over a field)

Let $f(x)$ and $g(x)$ be polynomials over a field $F$ with $g(x) \ne 0(x)$. There are unique polynomials $q(x)$ and $r(x)$ over $F$ satisfying

$$f(x) = q(x)g(x) + r(x) \quad\text{where}\quad \deg r(x) < \deg g(x).$$


Proof

Write $m = \deg f(x)$, $n = \deg g(x)$ and suppose $f(x) = \sum_{i \ge 0} a_ix^i$, $g(x) = \sum_{i \ge 0} b_ix^i$. So $b_n \ne 0$. We first show that there are polynomials $q(x)$ and $r(x)$ as stated. For $m < n$ the polynomials $q(x) = 0(x)$ and $r(x) = f(x)$ satisfy the above conditions. We use induction on $m - n$. The initial case $m - n < 0$ is covered above, and so we assume $m \ge n$. The polynomial $f_1(x) = f(x) - (a_m/b_n)x^{m-n}g(x)$ has no $x^m$ term, and so $\deg f_1(x) < m$. By the inductive hypothesis, applied to $f_1(x)$ and $g(x)$, there are $q_1(x), r_1(x) \in F[x]$ with $f_1(x) = q_1(x)g(x) + r_1(x)$ where $\deg r_1(x) < n$. Substituting for $f_1(x)$ and rearranging gives $f(x) = ((a_m/b_n)x^{m-n} + q_1(x))g(x) + r_1(x)$. So $q(x) = (a_m/b_n)x^{m-n} + q_1(x)$ and $r(x) = r_1(x)$ satisfy Theorem 4.1, and the induction is complete.

To show that $q(x)$ and $r(x)$ are unique, suppose that $q'(x), r'(x) \in F[x]$ also satisfy $f(x) = q'(x)g(x) + r'(x)$ where $\deg r'(x) < n$. Subtracting gives $(q(x) - q'(x))g(x) = r'(x) - r(x)$. As $\deg(r(x) - r'(x)) < n$, the degree formula gives $\deg(q(x) - q'(x)) + n < n$, and so $\deg(q(x) - q'(x)) < 0$. The zero polynomial is the only polynomial of negative degree, and so $q(x) - q'(x) = 0(x)$. Therefore $q(x) = q'(x)$, and hence $r(x) = r'(x)$ also. □

As an illustration of Theorem 4.1 take $F = \mathbb{Q}$, $f(x) = 3x^4 - 2x^3 + 5$ and $g(x) = x^3 + 2x^2$. As in the proof of Theorem 4.1, we first compare leading terms in $f(x)$ and $g(x)$. This gives $f(x) - 3xg(x) = -8x^3 + 5$, of degree $3 < \deg f(x) = 4$. Next we compare leading terms in $f(x) - 3xg(x)$ and $g(x)$. This gives $f(x) - 3xg(x) + 8g(x) = 16x^2 + 5$, of degree $2 < \deg(f(x) - 3xg(x)) = 3$. As $\deg(f(x) - 3xg(x) + 8g(x)) = 2 < 3 = \deg g(x)$, the process terminates with $f(x) = (3x - 8)g(x) + 16x^2 + 5$, that is, $q(x) = 3x - 8$ and $r(x) = 16x^2 + 5$.
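The existence part of the proof of Theorem 4.1 is really an algorithm: keep subtracting $(a_m/b_n)x^{m-n}g(x)$ until the degree drops below $\deg g(x)$. Here is a minimal Python sketch over $\mathbb{Q}$, using the standard `fractions.Fraction` type for exact arithmetic. Polynomials are coefficient lists with the constant term first, and the name `divide` is ours. It reproduces the example above.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; [] is the zero polynomial 0(x)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def divide(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g, over Q.
    Polynomials are coefficient lists, constant term first."""
    f = trim([Fraction(a) for a in f])
    g = trim([Fraction(b) for b in g])
    if not g:
        raise ZeroDivisionError("g(x) must be non-zero")
    n = len(g) - 1                      # n = deg g(x)
    q = [Fraction(0)] * max(len(f) - n, 1)
    r = f
    while len(r) - 1 >= n:              # while deg r(x) >= deg g(x)
        m = len(r) - 1                  # m = deg r(x)
        c = r[m] / g[n]                 # a_m / b_n, as in the proof
        q[m - n] = c
        for j, b in enumerate(g):       # r(x) := r(x) - c x^(m-n) g(x)
            r[m - n + j] -= c * b
        r = trim(r)                     # the x^m term has been cancelled
    return trim(q), r

# (3x^4 - 2x^3 + 5) divided by (x^3 + 2x^2)
q, r = divide([5, 0, 0, -2, 3], [0, 0, 2, 1])
# q == [-8, 3] (that is, 3x - 8) and r == [5, 0, 16] (that is, 16x^2 + 5)
```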

For $r(x) = 0(x)$ in Theorem 4.1 we obtain $f(x) = q(x)g(x)$, that is, $g(x)$ is a factor or divisor of $f(x)$, and we write $g(x) \mid f(x)$.

Corollary 4.2

Let $f(x)$ be a polynomial over the field $F$ and let $c \in F$.
(i) The remainder on dividing $f(x)$ by $x - c$ is $f(c)$. Also $x - c$ is a divisor of $f(x)$ if and only if $f(c) = 0$.
(ii) Suppose $f(x) \ne 0(x)$. There are at most $\deg f(x)$ elements $c$ in $F$ with $f(c) = 0$, that is, the non-zero polynomial $f(x)$ has at most $\deg f(x)$ zeros in $F$.

Proof

(i) Taking $g(x) = x - c$ in Theorem 4.1, there are polynomials $q(x), r(x)$ over $F$ satisfying $f(x) = q(x)(x - c) + r(x)$ where $\deg r(x) < \deg(x - c) = 1$. So $r(x)$ is a constant polynomial, that is, $r(x) = r(c)$. Using the ring homomorphism $\varepsilon_c$ we obtain $f(c) = (f(x))\varepsilon_c = (q(x)(x - c) + r(x))\varepsilon_c = q(c)(c - c) + r(c) = q(c) \times 0 + r(c) = r(c)$, and so $f(x) = q(x)(x - c) + f(c)$.

Suppose that $x - c$ is a divisor of $f(x)$. Then there is $q'(x) \in F[x]$ with $f(x) = q'(x)(x - c)$. Compare this equation with $f(x) = q(x)(x - c) + f(c)$. The uniqueness in Theorem 4.1 gives $q(x) = q'(x)$ and $f(c) = 0$. Conversely, suppose $f(c) = 0$. Then $f(x) = q(x)(x - c) + f(c) = q(x)(x - c)$, using the first part of the proof. So $x - c$ is a divisor of $f(x)$.

(ii) A non-zero constant polynomial has no zeros, so we may assume $\deg f(x) = n \ge 1$ and use induction on $n$. Suppose $n = 1$. Then $f(x) = a_1x + a_0$ where $a_1 \ne 0$, and $f(c_1) = 0 \Leftrightarrow c_1 = -a_0/a_1$. So $f(x)$ has a unique zero $c_1$ in $F$. Now suppose $n > 1$ and every polynomial of degree $n - 1$ over $F$ has at most $n - 1$ zeros in $F$. Consider $f(x)$ of degree $n$ over $F$. Could there be $n + 1$ distinct elements $c_i$ in $F$, where $1 \le i \le n + 1$, satisfying $f(c_i) = 0$? By Corollary 4.2(i) there is $q(x) \in F[x]$ with $f(x) = q(x)(x - c_1)$, and the degree formula gives $\deg q(x) = n - 1$. For $2 \le i \le n + 1$, the evaluation homomorphisms $\varepsilon_{c_i}$ give $0 = f(c_i) = (f(x))\varepsilon_{c_i} = (q(x)(x - c_1))\varepsilon_{c_i} = q(c_i)(c_i - c_1)$. So $q(c_i) = 0$, as $c_i - c_1 \ne 0$ and $F$ has no divisors of zero. We have shown that $q(x)$, of degree $n - 1$ over $F$, has $n$ zeros in $F$, namely $c_2, c_3, \ldots, c_{n+1}$. This contradicts the inductive hypothesis. The conclusion is that $f(x)$ has at most $n$ zeros in $F$, completing the induction. □
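Corollary 4.2(i) underlies the familiar shortcut for dividing by $x - c$, namely synthetic division (Horner's scheme). This sketch uses that computational technique rather than the proof above, but it yields the same $q(x)$ and $r = f(c)$ as Theorem 4.1. The function name is ours. The test polynomial is $d_0(x) = x^3 + 4x^2 + 7x + 6$ from earlier in the text, which has the zero $-2$.

```python
def synthetic_division(f, c):
    """Divide f(x) (coefficients, constant term first) by x - c using Horner's
    scheme. Returns (q, r) with f(x) = q(x)(x - c) + r; by Corollary 4.2(i), r = f(c)."""
    acc, partial = 0, []
    for a in reversed(f):               # work down from the leading coefficient
        acc = acc * c + a
        partial.append(acc)
    if not partial:                     # f is the zero polynomial
        return [], 0
    r = partial.pop()                   # the final Horner value is f(c)
    return partial[::-1], r

d0 = [6, 7, 4, 1]                       # x^3 + 4x^2 + 7x + 6
print(synthetic_division(d0, -2))       # ([3, 2, 1], 0): d0(x) = (x^2 + 2x + 3)(x + 2)
print(synthetic_division(d0, 1))        # ([12, 5, 1], 18): remainder d0(1) = 18
```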

The reader will remember that Corollary 4.2(ii) was used in the proof of Corollary 3.17. It is worth convincing oneself that the proof of Corollary 4.2(ii) does not rely on Corollary 3.17.

Definition 4.3

Let $K$ be a subgroup of the additive group of $F[x]$, that is, $K$ is closed under addition, $K$ contains the zero polynomial $0(x)$ and $K$ is closed under negation. Suppose also

$$q(x)k(x) \in K \quad\text{for all } q(x) \in F[x],\ k(x) \in K,$$

that is, $K$ is closed under multiplication by arbitrary polynomials over $F$. Then $K$ is called an ideal of the ring $F[x]$.

The reader should compare Definition 4.3 with the concept of an ideal of $\mathbb{Z}$ introduced at the beginning of Section 1.3. Ideals occur as kernels of ring homomorphisms (Exercises 2.3, Question 3(b)). In particular, consider the evaluation homomorphism $\varepsilon_a : F[x] \to F$ ($a \in F$). By Corollary 4.2(i), its kernel is the ideal $\langle x - a\rangle$ consisting of all polynomials over $F$ having divisor $x - a$. More generally, an ideal $K$ of $F[x]$ is called principal if there is a polynomial $d(x) \in K$ such that $K = \{q(x)d(x) : q(x) \in F[x]\}$, that is, the elements of $K$ are precisely the polynomials over $F$ having divisor $d(x)$.


In this case we write $K = \langle d(x)\rangle$ and call $d(x)$ a generator of $K$. Conversely, let $d(x)$ be any polynomial over $F$. It is straightforward to verify that the subset $K = \{q(x)d(x) : q(x) \in F[x]\}$ of $F[x]$ satisfies Definition 4.3, and so is an ideal of $F[x]$. That is, $d(x)$ generates the principal ideal $K = \langle d(x)\rangle$ of $F[x]$. Notice that

$$\langle d_1(x)\rangle \subseteq \langle d_2(x)\rangle \Leftrightarrow d_2(x) \mid d_1(x) \quad\text{for } d_1(x), d_2(x) \in F[x].$$

For example $\langle x^2\rangle \subseteq \langle x\rangle \subseteq \langle 1\rangle = F[x]$, as $1 \mid x$ and $x \mid x^2$.

We next establish the polynomial analogue of Theorem 1.15.

Theorem 4.4

Let $F$ be a field. Every ideal $K$ of $F[x]$ is principal. Every non-zero ideal of $F[x]$ has a unique monic generator.

Proof

Let $K$ be an ideal of $F[x]$. The zero ideal $\{0(x)\} = \langle 0(x)\rangle$ of $F[x]$ is principal, with generator the zero polynomial $0(x)$. Suppose $K$ to be non-zero, that is, $K$ contains a non-zero polynomial. Consider a non-zero polynomial $d(x)$ in $K$ with $m = \deg d(x)$ as small as possible. So $0(x)$ is the only polynomial in $K$ of degree less than $m$. Let $a$ be the leading coefficient of $d(x)$. Then $\overline{d}(x) = (1/a)d(x)$ is in $K$, by Definition 4.3 with $q(x) = 1/a$ and $k(x) = d(x)$. We show that $\overline{d}(x)$ generates $K$. As $\overline{d}(x) \in K$, Definition 4.3 gives $q(x)\overline{d}(x) \in K$ for all $q(x) \in F[x]$, that is, $\langle \overline{d}(x)\rangle \subseteq K$. To show $K \subseteq \langle \overline{d}(x)\rangle$, start with $f(x) \in K$ and use Theorem 4.1 with $g(x) = \overline{d}(x)$. There are $q(x), r(x) \in F[x]$ with $f(x) = q(x)\overline{d}(x) + r(x)$ where $\deg r(x) < \deg \overline{d}(x) = m$. As $f(x)$ and $\overline{d}(x)$ both belong to $K$, Definition 4.3 shows that $f(x) - q(x)\overline{d}(x) = r(x)$ also belongs to $K$. So $r(x) = 0(x)$, as $r(x) \in K$ and $\deg r(x) < m$. Therefore $f(x) = q(x)\overline{d}(x) \in \langle \overline{d}(x)\rangle$, showing $K \subseteq \langle \overline{d}(x)\rangle$. So in fact $\langle \overline{d}(x)\rangle = K$, that is, $K$ is the principal ideal with monic generator $\overline{d}(x)$.

Let $K$ be a non-zero ideal of $F[x]$. Suppose $K = \langle d_1(x)\rangle = \langle d_2(x)\rangle$ where $d_1(x)$ and $d_2(x)$ are monic polynomials over $F$. Then $d_2(x) \in \langle d_1(x)\rangle$, and so there is $q_1(x) \in F[x]$ with $d_2(x) = q_1(x)d_1(x)$. In the same way there is $q_2(x) \in F[x]$ with $d_1(x) = q_2(x)d_2(x)$. Hence $d_2(x) = q_1(x)q_2(x)d_2(x)$. The reader will know that cancellation is legitimate in an integral domain. Cancelling $d_2(x)$ (which is non-zero, being monic) gives $1(x) = q_1(x)q_2(x)$, and so $q_1(x)$ and $q_2(x)$ are non-zero constant polynomials. Let $q_1(x) = a_1 \in F^*$. Comparing leading coefficients in $d_2(x) = q_1(x)d_1(x)$ gives $1 = a_1 \times 1$. We conclude that $a_1 = 1$ and $q_1(x) = 1(x)$, which gives $d_2(x) = 1(x)d_1(x) = d_1(x)$. So $K$ has a unique monic generator. □

We now know from Theorem 4.4 that, for all fields $F$, the polynomial ring $F[x]$ is a principal ideal domain (PID). Our next task is to introduce the concept of the greatest common divisor (gcd) of a finite set of polynomials over $F$. This should not give the reader a headache: the theory is almost the same for polynomials as for integers. Also, polynomial gcds can be found using the Euclidean algorithm.

Definition 4.5

Let $X = \{f_1(x), f_2(x), \ldots, f_t(x)\}$ be a set of polynomials over the field $F$, where $t$ is a positive integer. A polynomial $d(x)$ over $F$ is called a greatest common divisor (gcd) of $X$ if
(i) $d(x) \mid f_i(x)$ for $1 \le i \le t$,
(ii) each $d'(x) \in F[x]$ with $d'(x) \mid f_i(x)$ for $1 \le i \le t$ satisfies $d'(x) \mid d(x)$.

Could a given set $X$ as in Definition 4.5 have two monic gcds $d_1(x)$ and $d_2(x)$? If so, then $d_1(x) \mid d_2(x)$ and $d_2(x) \mid d_1(x)$ by Definition 4.5. Arguing as in the final paragraph of the proof of Theorem 4.4, we see $d_2(x) = d_1(x)$. The conclusion is that the set $X$ has at most one monic gcd. The only gcd of $X = \{0(x)\}$ is $0(x)$. But should $X$ contain a non-zero polynomial, we show next, by mimicking Corollary 1.16, that $X$ does have a monic gcd.

Corollary 4.6

Let $F$ be a field and let $X = \{f_1(x), f_2(x), \ldots, f_t(x)\} \subseteq F[x]$, where $f_i(x) \ne 0(x)$ for at least one $i$ with $1 \le i \le t$. Then $X$ has a unique monic gcd $d(x)$. Also there are polynomials $a_1(x), a_2(x), \ldots, a_t(x)$ over $F$ satisfying

$$d(x) = a_1(x)f_1(x) + a_2(x)f_2(x) + \cdots + a_t(x)f_t(x).$$

Proof

We outline the main steps and leave the reader to fill in the gaps. Note first that

$$K(X) = \{b_1(x)f_1(x) + b_2(x)f_2(x) + \cdots + b_t(x)f_t(x) : b_i(x) \in F[x],\ 1 \le i \le t\} \qquad (\clubsuit)$$

is an ideal of $F[x]$. It is usual to write $K(X) = \langle f_1(x), f_2(x), \ldots, f_t(x)\rangle$ and call $K(X)$ the ideal generated by $f_1(x), f_2(x), \ldots, f_t(x)$. Secondly, for each $i$ with $1 \le i \le t$, take $b_i(x) = 1(x)$ and $b_j(x) = 0(x)$ for $j \ne i$, $1 \le j \le t$; this shows $f_i(x) \in K(X)$. By hypothesis $K(X)$ is a non-zero ideal of $F[x]$, and so $K(X)$ has a unique monic generator $d(x)$ by Theorem 4.4. As $K(X) = \langle d(x)\rangle$, we see $d(x) \mid f_i(x)$ for $1 \le i \le t$. As $d(x) \in K(X)$, there are $a_i(x) \in F[x]$ for $1 \le i \le t$ with $d(x) = a_1(x)f_1(x) + a_2(x)f_2(x) + \cdots + a_t(x)f_t(x)$. Lastly, suppose $d'(x) \in F[x]$ satisfies $d'(x) \mid f_i(x)$ for $1 \le i \le t$. There are $q_i(x) \in F[x]$ with $q_i(x)d'(x) = f_i(x)$ for $1 \le i \le t$. Hence $d(x) = q(x)d'(x)$ where $q(x) = a_1(x)q_1(x) + a_2(x)q_2(x) + \cdots + a_t(x)q_t(x) \in F[x]$, showing $d'(x) \mid d(x)$. Therefore $d(x)$ satisfies Definition 4.5, and so it is the monic gcd of $X$. □

We use the notation $d(x) = \gcd X = \gcd\{f_1(x), f_2(x), \ldots, f_t(x)\}$ for $X$ as in Definition 4.5. Also write $\gcd\{0(x)\} = 0(x)$, which is consistent with Definition 4.5. Using (♣) we obtain

$$K(X) \subseteq K(Y) \iff \gcd Y \mid \gcd X$$

where $X$ and $Y$ are finite subsets of $F[x]$ and $K(\emptyset) = \langle 0(x)\rangle$ is the zero ideal of $F[x]$. As $\gcd\{f_1(x), f_2(x), \ldots, f_t(x)\} = \gcd\{f_1(x), \gcd\{f_2(x), \ldots, f_t(x)\}\}$ for $t \ge 2$ (Exercises 4.1, Question 2(b)), the calculation of $\gcd X$ reduces to the case $t = 2$. Suppose therefore $X = \{f_1(x), f_2(x)\}$. The Euclidean algorithm is an efficient method of finding a gcd of $X$ in the case where $f_1(x)$ is not a gcd of $X$ and $f_1(x) \neq 0(x)$. We remind the reader of the technique: write $r_1(x) = f_1(x)$, $r_2(x) = f_2(x)$. Then $r_2(x) \neq 0(x)$, since otherwise $f_1(x)$ would be a gcd of $X$. Start by dividing $r_1(x)$ by $r_2(x)$, obtaining $r_1(x) = q_2(x)r_2(x) + r_3(x)$ with $\deg r_2(x) > \deg r_3(x)$ as in Theorem 4.1. Should $r_3(x) = 0(x)$, then $r_2(x) \mid r_1(x)$ and $r_2(x)$ is a gcd of $X = \{r_1(x), r_2(x)\}$; as above we see that $r_2(x)$ and $\gcd X$ are associate, that is, $\bar r_2(x) = \gcd X$ where $\bar r_2(x)$ is $r_2(x)$ made monic (by dividing $r_2(x)$ by its leading coefficient). For $r_3(x) \neq 0(x)$ we see $\gcd\{r_2(x), r_3(x)\} = \gcd X$ by Exercises 4.1, Question 2(b), and so it makes sense to repeat the process with $r_2(x), r_3(x)$ in place of $r_1(x), r_2(x)$. Suppose that $i$ non-zero polynomials $r_1(x), r_2(x), \ldots, r_i(x)$ have been found with

$$\gcd\{r_1(x), r_2(x)\} = \gcd\{r_2(x), r_3(x)\} = \cdots = \gcd\{r_{i-1}(x), r_i(x)\}$$

where $\deg r_2(x) > \deg r_3(x) > \cdots > \deg r_i(x)$ and $i \ge 3$. These degrees decrease by at least 1 at each step, that is, $0 \le \deg r_i(x) \le \deg r_2(x) - (i - 2)$. So $r_i(x) \neq 0(x)$ gives $2 \le i \le \deg r_2(x) + 2$. Dividing $r_{i-1}(x)$ by $r_i(x)$ as in Theorem 4.1 produces polynomials $q_i(x)$ and $r_{i+1}(x)$ over $F$ with $r_{i-1}(x) = q_i(x)r_i(x) + r_{i+1}(x)$ where $\deg r_i(x) > \deg r_{i+1}(x)$. Using Exercises 4.1, Question 2(b) again we obtain $\gcd\{r_{i-1}(x), r_i(x)\} = \gcd\{r_i(x), r_{i+1}(x)\}$. As $i$ is bounded above, the sequence of divisions must terminate in a zero remainder: there is an integer $k$ with $2 \le k \le \deg r_2(x) + 2$ such that $r_{k+1}(x) = 0(x)$, $r_k(x) \neq 0(x)$. We conclude that $r_k(x)$ is a gcd of $\{r_{k-1}(x), r_k(x)\}$ and

$$\bar r_k(x) = \gcd\{r_1(x), r_2(x)\}$$

where $\bar r_k(x)$ is the monic associate of $r_k(x)$.

The algorithm can be used to find a particular pair of polynomials $a_1(x), a_2(x)$
over $F$ satisfying $d(x) = a_1(x)f_1(x) + a_2(x)f_2(x)$ as in Corollary 4.6 with $t = 2$. The connection between consecutive pairs of remainder polynomials is expressed by the matrix equation

$$\begin{pmatrix} r_{i-1}(x) \\ r_i(x) \end{pmatrix} = T_i \begin{pmatrix} r_i(x) \\ r_{i+1}(x) \end{pmatrix} \quad \text{for } 2 \le i \le k,$$

where $T_i = \begin{pmatrix} q_i(x) & 1 \\ 1 & 0 \end{pmatrix}$ is invertible over $F[x]$ as $\det T_i = -1$. The above $k - 1$ matrix equations combine to give

$$\begin{pmatrix} r_1(x) \\ r_2(x) \end{pmatrix} = T \begin{pmatrix} r_k(x) \\ 0 \end{pmatrix}$$

where $T = T_2T_3\cdots T_k$. Writing $T = \begin{pmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{pmatrix}$ we obtain $r_1(x) = t_{11}r_k(x)$, $r_2(x) = t_{21}r_k(x)$, showing that the entries in column 1 of $T$ are the quotients $t_{11} = r_1(x)/r_k(x)$ and $t_{21} = r_2(x)/r_k(x)$ obtained on dividing the original polynomials $r_1(x)$ and $r_2(x)$ by their (possibly non-monic) gcd $r_k(x)$. We now demonstrate that the entries in column 2 of $T$ are associates (non-zero scalar multiples) of the polynomials $a_1(x), a_2(x)$ we are looking for. By the multiplicative property of determinants, $\det T = (-1)^{k-1}$, as $T$ is the product of $k - 1$ matrices $T_i$ ($2 \le i \le k$), each of which has determinant $-1$. Therefore $t_{11}t_{22} - t_{12}t_{21} = (-1)^{k-1}$. On multiplying this equation by $(-1)^{k-1}(1/c)r_k(x)$, where $c$ is the leading coefficient of $r_k(x)$, we get $(-1)^{k-1}(1/c)t_{22}r_1(x) + (-1)^k(1/c)t_{12}r_2(x) = (1/c)r_k(x) = \bar r_k(x)$. So the polynomials

$$a_1(x) = (-1)^{k-1}(1/c)\,t_{22} \quad \text{and} \quad a_2(x) = (-1)^k(1/c)\,t_{12}$$

satisfy $a_1(x)r_1(x) + a_2(x)r_2(x) = \bar r_k(x)$.

As an example consider $r_1(x) = x^4 + 3x^2 - 2x + 2$, $r_2(x) = x^3 + x^2 + 3x$ over the rational field $\mathbb{Q}$. Using Theorem 4.1 we carry out the following sequence of divisions.

Divide $r_1(x)$ by $r_2(x)$: $x^4 + 3x^2 - 2x + 2 = (x - 1)(x^3 + x^2 + 3x) + x^2 + x + 2$, giving $q_2(x) = x - 1$, $r_3(x) = x^2 + x + 2$.

Divide $r_2(x)$ by $r_3(x)$: $x^3 + x^2 + 3x = x(x^2 + x + 2) + x$, giving $q_3(x) = x$, $r_4(x) = x$.

Divide $r_3(x)$ by $r_4(x)$: $x^2 + x + 2 = (x + 1)x + 2$, giving $q_4(x) = x + 1$, $r_5(x) = 2$.

Divide $r_4(x)$ by $r_5(x)$: $x = (x/2) \times 2 + 0$, giving $q_5(x) = (1/2)x$, $r_6(x) = 0(x)$.

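In this case $k = 5$ and $r_5(x) = 2$ is a gcd of $\{r_1(x), r_2(x)\}$ by Definition 4.3. So $c = 2$ and $\gcd\{r_1(x), r_2(x)\} = (1/2)r_5(x) = 1$. The reader can check

$$T = T_2T_3T_4T_5 = \begin{pmatrix} x-1 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x+1 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x/2 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} r_1(x)/2 & x^3 + x \\ r_2(x)/2 & x^2 + x + 1 \end{pmatrix}$$

and hence $a_1(x) = (x^2 + x + 1)/2$, $a_2(x) = -(x^3 + x)/2$ satisfy

$$a_1(x)r_1(x) + a_2(x)r_2(x) = 1.$$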
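The reader who wishes to experiment may find the following Python sketch useful. It is an illustration only, not part of the text: it runs the Euclidean algorithm over $\mathbb{Q}$, accumulates $T = T_2T_3\cdots T_k$, and reads off $a_1(x)$, $a_2(x)$ from column 2 of $T$ using the formulas derived above. It assumes that a polynomial is stored as a list of `Fraction` coefficients, lowest degree first, with `[]` standing for $0(x)$; the function names are hypothetical.

```python
from fractions import Fraction

# Polynomials over Q: lists of coefficients, lowest degree first, no trailing
# zeros. [2, -2, 3, 0, 1] is x^4 + 3x^2 - 2x + 2 and [] is the zero polynomial.

def trim(f):
    """Drop trailing zero coefficients so that f[-1] is the leading coefficient."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f

def add(f, g):
    n = max(len(f), len(g))
    return trim([(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
                 for i in range(n)])

def mul(f, g):
    if not f or not g:
        return []
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return trim(h)

def divide(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g (division as in Theorem 4.1)."""
    g = trim(g)
    r = [Fraction(a) for a in trim(f)]
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coeff = r[-1] / g[-1]
        q[shift] = coeff
        for i, b in enumerate(g):
            r[i + shift] -= coeff * b
        r = trim(r)                      # the leading term has cancelled
    return trim(q), r

def mat_mul(S, T):
    """Product of two 2x2 matrices whose entries are polynomials."""
    return [[add(mul(S[i][0], T[0][j]), mul(S[i][1], T[1][j])) for j in range(2)]
            for i in range(2)]

def euclid_with_T(r1, r2):
    """Return (T, a1, a2, d): T = T_2...T_k, d the monic gcd, a1*r1 + a2*r2 = d."""
    one = [Fraction(1)]
    prev, cur = trim(r1), trim(r2)       # r_{i-1}, r_i; r2 must be non-zero
    T = [[one, []], [[], one]]           # identity matrix
    k = 2
    while True:
        q, rem = divide(prev, cur)       # r_{i-1} = q_i r_i + r_{i+1}
        T = mat_mul(T, [[q, one], [one, []]])   # multiply by T_i on the right
        if not rem:                      # r_{k+1} = 0(x), so cur is r_k
            break
        prev, cur, k = cur, rem, k + 1
    c = cur[-1]                          # leading coefficient of r_k
    sign = (-1) ** (k - 1)
    a1 = [sign * t / c for t in T[1][1]]     # (-1)^(k-1) (1/c) t_22
    a2 = [-sign * t / c for t in T[0][1]]    # (-1)^k (1/c) t_12
    d = [t / c for t in cur]                 # monic associate of r_k
    return T, a1, a2, d
```

Called as `euclid_with_T([2, -2, 3, 0, 1], [0, 3, 1, 1])`, the sketch should return the matrix $T$ of the example together with $a_1(x) = (x^2+x+1)/2$, $a_2(x) = -(x^3+x)/2$ and the gcd $1$. Exact `Fraction` arithmetic is used so that no rounding can disturb the zero-remainder test.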


As a second example let $r_1(x) = x^3 + 3x^2 + 4x + 2$, $r_2(x) = x^3 + 2x^2 + 3x + 3$ over the field $\mathbb{Z}_5$. So the coefficients of the powers of $x$ in $r_1(x)$ and $r_2(x)$ are elements of $\mathbb{Z}_5$, the powers themselves being non-negative integers.

Divide $r_1(x)$ by $r_2(x)$: $r_1(x) = r_2(x) + x^2 + x + 4$, giving $q_2(x) = 1$, $r_3(x) = x^2 + x + 4$.

Divide $r_2(x)$ by $r_3(x)$: $r_2(x) = (x + 1)r_3(x) + 3x + 4$, giving $q_3(x) = x + 1$, $r_4(x) = 3x + 4$.

Divide $r_3(x)$ by $r_4(x)$: $r_3(x) = (2x + 1)r_4(x)$, giving $q_4(x) = 2x + 1$, $r_5(x) = 0(x)$.

In this case $k = 4$ and $r_4(x) = 3x + 4$ is a gcd of $\{r_1(x), r_2(x)\}$ by Definition 4.3. So $c = 3$ and, as $1/c = 2$, we see that $\gcd\{r_1(x), r_2(x)\} = 2r_4(x) = 2(3x + 4) = x + 3$ is the monic gcd of $r_1(x)$ and $r_2(x)$. Also

$$T = T_2T_3T_4 = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x+1 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 2x+1 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 2x^2 + 3 & x + 2 \\ 2x^2 + 3x + 2 & x + 1 \end{pmatrix}.$$

The reader can check $2x^2 + 3 = r_1(x)/r_4(x)$, $2x^2 + 3x + 2 = r_2(x)/r_4(x)$ and $(x + 1)r_1(x) - (x + 2)r_2(x) = -r_4(x)$. Multiplying the last equation through by 3 produces the monic polynomial $3(-r_4(x)) = x + 3$, and so $a_1(x) = 3(x + 1) = 3x + 3$ and $a_2(x) = -3(x + 2) = 2x + 4$ satisfy $a_1(x)r_1(x) + a_2(x)r_2(x) = x + 3 = \gcd\{r_1(x), r_2(x)\}$.
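The same computation can be sketched over $\mathbb{Z}_p$ for a prime $p$. In the Python illustration below (not part of the text), coefficients are residues modulo `P = 5` stored lowest degree first, division by a leading coefficient uses the inverse $a^{P-2}$ from Fermat's little theorem, and the sign $(-1)^{k-1}$ is represented as a residue. The helper names are hypothetical.

```python
P = 5  # coefficients are residues modulo the prime P (Z_5, as in the example)

def trim(f):
    """Drop trailing zero coefficients; [] represents 0(x)."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f

def add(f, g):
    n = max(len(f), len(g))
    return trim([((f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)) % P
                 for i in range(n)])

def mul(f, g):
    if not f or not g:
        return []
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % P
    return trim(h)

def inv(a):
    """Inverse of a non-zero residue: a^(P-2) = a^(-1) in Z_P since P is prime."""
    return pow(a, P - 2, P)

def divide(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g in Z_P[x]."""
    g = trim(g)
    r = trim([a % P for a in f])
    q = [0] * max(len(r) - len(g) + 1, 0)
    lead_inv = inv(g[-1])
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coeff = r[-1] * lead_inv % P
        q[shift] = coeff
        for i, b in enumerate(g):
            r[i + shift] = (r[i + shift] - coeff * b) % P
        r = trim(r)
    return trim(q), r

def mat_mul(S, T):
    return [[add(mul(S[i][0], T[0][j]), mul(S[i][1], T[1][j])) for j in range(2)]
            for i in range(2)]

def euclid_with_T(r1, r2):
    """Return (T, a1, a2, d) with T = T_2...T_k and a1*r1 + a2*r2 = d monic."""
    one = [1]
    prev, cur = trim([a % P for a in r1]), trim([a % P for a in r2])
    T = [[one, []], [[], one]]
    k = 2
    while True:
        q, rem = divide(prev, cur)
        T = mat_mul(T, [[q, one], [one, []]])
        if not rem:
            break
        prev, cur, k = cur, rem, k + 1
    c_inv = inv(cur[-1])
    sign = 1 if (k - 1) % 2 == 0 else P - 1                     # (-1)^(k-1) mod P
    a1 = trim([sign * c_inv * t % P for t in T[1][1]])
    a2 = trim([(P - sign) * c_inv * t % P for t in T[0][1]])    # (-1)^k mod P
    d = [c_inv * t % P for t in cur]
    return T, a1, a2, d
```

With `r1 = [2, 4, 3, 1]` and `r2 = [3, 3, 2, 1]` (the polynomials of the second example) the sketch should reproduce $T$ above, the monic gcd $x + 3$, and $a_1(x) = 3x + 3$, $a_2(x) = 2x + 4$.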

Definition 4.7

Let $p(x)$ be a polynomial of positive degree $n$ over the field $F$. Suppose there do not exist $f(x), g(x) \in F[x]$ with $\deg f(x) < n$, $\deg g(x) < n$ satisfying $p(x) = f(x)g(x)$. Then $p(x)$ is said to be irreducible over $F$.

So a polynomial of positive degree over the field $F$, which is not a product of two polynomials of lower degree over $F$, is irreducible over $F$. Every polynomial $p(x)$ of degree 1 over any field $F$ is irreducible over $F$, since $p(x)$ is not a product of constant polynomials.

The fundamental theorem of algebra states that each non-constant polynomial $f(x)$ over the complex field $\mathbb{C}$ has a zero $z_0$ in $\mathbb{C}$, that is, $f(z_0) = 0$. The reader may have met this deep theorem, first proved by the German mathematician Gauss in 1799, as an application of complex analysis. Suppose $p(x)$ is irreducible over $\mathbb{C}$. Then $p(x)$ has a complex zero $z_0$ and so $x - z_0$ is a divisor of $p(x)$ by Corollary 4.2(i). From Definition 4.7 we deduce $p(x) = a(x - z_0)$ where $a \in \mathbb{C}$. Therefore the only irreducible polynomials over $\mathbb{C}$ are those of degree 1.

Our next lemma deals with the connection between irreducibility and the non-existence of zeros.


Lemma 4.8

Let $p(x)$ be a polynomial of degree $n$ over a field $F$.

(i) Suppose $p(x)$ is irreducible over $F$ and $n \ge 2$. Then $p(x)$ has no zeros in $F$.

(ii) Suppose $p(x)$ has no zeros in $F$ and $n = 2$ or $n = 3$. Then $p(x)$ is irreducible over $F$.

Proof

(i) Suppose to the contrary that $p(x)$ has a zero $c$ in $F$. So $p(c) = 0$ and $x - c$ is a factor of $p(x)$ by Corollary 4.2(i), that is, $p(x) = (x - c)q(x)$ where $q(x) \in F[x]$. Comparing degrees gives $\deg q(x) = n - 1$ and so $p(x)$ is a product of polynomials of degree at most $n - 1$, showing that $p(x)$ is not irreducible. This contradiction shows that $p(x)$ has no zeros in $F$.

(ii) Suppose $p(x)$ is not irreducible, that is, $p(x) = f(x)g(x)$ where $f(x), g(x) \in F[x]$ and $\deg f(x) = l < n$, $\deg g(x) = m < n$. We may assume $l \le m$. Comparing degrees gives $l + m = n$. For $n = 2$ the only possibility is $l = m = 1$; for $n = 3$ the only possibility is $l = 1$, $m = 2$. So in any case $f(x) = ax + b$ with $a \neq 0$, as $\deg f(x) = 1$. Hence $f(c) = 0$ where $c = -b/a \in F$. Therefore $p(c) = f(c)g(c) = 0 \times g(c) = 0$, showing that $p(x)$ has a zero in $F$, contrary to hypothesis. The conclusion is: $p(x)$ is irreducible over $F$. □

Which polynomials $p(x)$ are irreducible over the real field $\mathbb{R}$? We know that all polynomials $ax + b$ of degree 1 are irreducible over $\mathbb{R}$. Consider $p(x) = ax^2 + bx + c$ of degree 2 over $\mathbb{R}$ with $b^2 < 4ac$. On completing the square we see

$$ap(r) = a^2r^2 + abr + ac = (ar + b/2)^2 + (4ac - b^2)/4 > 0 \quad \text{for } r \in \mathbb{R},$$

as $(ar + b/2)^2 \ge 0$ and $(4ac - b^2)/4 > 0$. So $p(r) \neq 0$, showing that $p(x)$ has no real zeros. Therefore $p(x)$ is irreducible over $\mathbb{R}$ by Lemma 4.8(ii). We now use the fundamental theorem of algebra to show that there are no further polynomials $p(x)$ which are irreducible over $\mathbb{R}$. Such a polynomial can be regarded as being over $\mathbb{C}$ simply because $\mathbb{R}$ is a subfield of $\mathbb{C}$. So there is $z_0 \in \mathbb{C}$ with $p(z_0) = 0$. We may assume $\deg p(x) \ge 2$ and so $z_0 \notin \mathbb{R}$ by Lemma 4.8(i). The complex conjugation mapping $z \mapsto \bar z$ is an automorphism of $\mathbb{C}$ having $\mathbb{R}$ as its fixed field, that is, $\bar z = z \iff z \in \mathbb{R}$. Applying this automorphism to $p(z_0) = 0$ produces $p(\bar z_0) = 0$, as the coefficients of $p(x)$ are real. As $z_0 \neq \bar z_0$, the polynomials $x - z_0$ and $x - \bar z_0$ are coprime (their gcd is 1) and both are divisors of $p(x)$ by Corollary 4.2(i). So $(x - z_0)(x - \bar z_0)$ is a divisor of $p(x)$. What is more, $(x - z_0)(x - \bar z_0) = x^2 - (z_0 + \bar z_0)x + z_0\bar z_0$ is a polynomial with real coefficients: writing $z_0 = x_0 + iy_0$ ($x_0, y_0 \in \mathbb{R}$) we have $\bar z_0 = x_0 - iy_0$ and so $-(z_0 + \bar z_0) = -2x_0 \in \mathbb{R}$, $z_0\bar z_0 = |z_0|^2 = x_0^2 + y_0^2 \in \mathbb{R}$. From Definition 4.7 we deduce $p(x) = a(x - z_0)(x - \bar z_0)$ for some $a \in \mathbb{R}$. Let $b = -a(z_0 + \bar z_0) = -2ax_0$ and $c = az_0\bar z_0 = a(x_0^2 + y_0^2)$. Then $p(x) = ax^2 + bx + c$ and

$$b^2 - 4ac = 4a^2x_0^2 - 4a^2(x_0^2 + y_0^2) = -4a^2y_0^2 < 0$$

as $a \neq 0$, $y_0 \neq 0$. Therefore $b^2 < 4ac$, showing that the irreducible polynomials over $\mathbb{R}$ are $ax + b$ of degree 1 and $ax^2 + bx + c$ of degree 2 with $b^2 < 4ac$.

A polynomial of degree at least 2 over $F$ is called reducible over $F$ if it is not irreducible over $F$. So the reducible polynomials over $F$ are of the type $f(x)g(x)$ where $f(x)$ and $g(x)$ are non-constant polynomials over $F$.

By Lemma 4.8(ii), $x^2 - 2$ is irreducible over $\mathbb{Q}$ as $\sqrt{2} \notin \mathbb{Q}$, but $x^2 - 2 = (x - \sqrt{2})(x + \sqrt{2})$ is reducible over $\mathbb{R}$. Similarly $x^2 + 1$ is irreducible over $\mathbb{Q}$ and over $\mathbb{R}$, but $x^2 + 1 = (x - i)(x + i)$ is reducible over $\mathbb{C}$. There are four quadratic (degree 2) polynomials over $\mathbb{Z}_2$, namely $x^2$, $x^2 + 1$, $x^2 + x$, $x^2 + x + 1$; as $x^2 + 1 = (x + 1)^2$ we see that the first three are reducible over $\mathbb{Z}_2$, but $x^2 + x + 1$ is irreducible over $\mathbb{Z}_2$ by Lemma 4.8(ii). There are eight cubic (degree 3) polynomials over $\mathbb{Z}_2$ and, as the reader can check, just two of these are irreducible over $\mathbb{Z}_2$, namely $x^3 + x + 1$ and $x^3 + x^2 + 1$. Notice that $x^4 + x^2 + 1 = (x^2 + x + 1)^2$ is the only reducible quartic (degree 4) polynomial over $\mathbb{Z}_2$ having no zeros in $\mathbb{Z}_2$; hence

$$x^4 + x + 1, \quad x^4 + x^3 + 1, \quad x^4 + x^3 + x^2 + x + 1 \quad \text{are irreducible over } \mathbb{Z}_2.$$
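These small cases can also be checked by machine. The Python sketch below is an illustration, not part of the text: it encodes a polynomial over $\mathbb{Z}_2$ as an integer whose bit $i$ is the coefficient of $x^i$ (an assumed encoding), and tests irreducibility by trial division by every polynomial of degree between 1 and half the degree. This brute-force test is a substitute for Lemma 4.8, which only covers degrees 2 and 3; the function names are hypothetical.

```python
def deg(f):
    """Degree of the polynomial encoded by the integer f (deg of 0 is -1 here)."""
    return f.bit_length() - 1

def pmod(f, g):
    """Remainder of f on division by g != 0 over Z_2; subtraction is XOR of bits."""
    dg = deg(g)
    while f and deg(f) >= dg:
        f ^= g << (deg(f) - dg)
    return f

def is_irreducible(f):
    """f of degree d >= 1 is irreducible iff no polynomial of degree 1..d//2 divides it."""
    d = deg(f)
    if d < 1:
        return False
    return all(pmod(f, g) != 0 for g in range(2, 1 << (d // 2 + 1)))

def irreducibles(n):
    """All irreducible polynomials of degree n over Z_2 (automatically monic)."""
    return [f for f in range(1 << n, 1 << (n + 1)) if is_irreducible(f)]

def to_str(f):
    """Write an encoded polynomial in the usual form, highest power first."""
    terms = []
    for i in range(deg(f), -1, -1):
        if (f >> i) & 1:
            terms.append("1" if i == 0 else "x" if i == 1 else f"x^{i}")
    return " + ".join(terms) or "0"

if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, [to_str(f) for f in irreducibles(n)])
```

For degrees 2, 3 and 4 the sketch should list exactly the irreducible polynomials named above; for instance `pmod(0b10101, 0b111)` is 0, reflecting $x^4 + x^2 + 1 = (x^2 + x + 1)^2$.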

The monic irreducible polynomials $p(x)$ over a given field $F$ are analogous to the (positive) prime integers $p$. The polynomial analogue of the fundamental theorem of arithmetic (see the end of Section 1.2) states that each monic polynomial $f(x)$ over $F$ can be expressed

$$f(x) = p_1(x)^{n_1}p_2(x)^{n_2}\cdots p_k(x)^{n_k}$$

where $p_1(x), p_2(x), \ldots, p_k(x)$ are distinct (no two are equal) monic irreducible polynomials over $F$, $n_1, n_2, \ldots, n_k$ are positive integers, $k \ge 0$, and also the above factorisation of $f(x)$ is unique apart from the order in which the factors $p_i(x)$ occur. This theorem is of more theoretical than practical use, as for most $f(x)$ it is difficult if not impossible to find its irreducible factors $p_i(x)$. An exception is $x^p - x$ over $\mathbb{Z}_p$: by the $|G|$-lemma of Section 2.2 applied to the multiplicative abelian group $\mathbb{Z}_p^*$ we obtain $c^{p-1} = 1$ for all $c \in \mathbb{Z}_p^*$, that is, each element $c$ of $\mathbb{Z}_p^*$ is a zero of $x^{p-1} - 1$ over $\mathbb{Z}_p$.

So $x - c$ is an irreducible factor of $x^{p-1} - 1$ over $\mathbb{Z}_p$ by Corollary 4.2(i). As these $p - 1$ monic factors of degree 1 are distinct, and so coprime in pairs, their product divides the monic polynomial $x^{p-1} - 1$ of degree $p - 1$. Hence

$$x^p - x = x(x^{p-1} - 1) = x(x - 1)(x - 2)\cdots(x - (p - 1))$$

is the factorisation of $x^p - x$ into irreducible polynomials over $\mathbb{Z}_p$. In the same way

$$x^{|F|} - x = \prod_{c \in F}(x - c) \quad \text{over each finite field } F.$$
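The factorisation of $x^p - x$ is easily confirmed for small primes. The following Python sketch (an illustration; the helper names are hypothetical) expands $x(x - 1)\cdots(x - (p - 1))$ with coefficients reduced modulo $p$, stored lowest degree first, and compares the result with the coefficient list of $x^p - x$.

```python
def mul_mod(f, g, p):
    """Product of two coefficient lists (lowest degree first) over Z_p."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % p
    return h

def product_of_linear_factors(p):
    """Expand x(x - 1)(x - 2)...(x - (p - 1)) with coefficients in Z_p."""
    prod = [1]
    for c in range(p):
        prod = mul_mod(prod, [(-c) % p, 1], p)   # multiply by x - c
    return prod

def x_p_minus_x(p):
    """Coefficient list of x^p - x over Z_p."""
    coeffs = [0] * (p + 1)
    coeffs[1] = (-1) % p
    coeffs[p] = 1
    return coeffs

if __name__ == "__main__":
    for p in (2, 3, 5, 7, 11, 13):
        print(p, product_of_linear_factors(p) == x_p_minus_x(p))
```

For every prime tried the two lists should agree; all the middle coefficients of the product cancel modulo $p$.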

Lastly we discuss the rings which are homomorphic images of the ring $F[x]$. The rings $\mathbb{Z}_n$ for $n \ge 0$ are, up to isomorphism, the homomorphic images of the ring $\mathbb{Z}$ (Exercises 2.3, Question 3(c)). We now discuss the analogous theory, replacing $\mathbb{Z}$ by $F[x]$.

Theorem 4.9

Let $R$ be a ring and let $\theta : F[x] \to R$ be a ring homomorphism, where $F$ is a field. Then there is $p(x) \in F[x]$ such that

$$\tilde\theta : F[x]/\langle p(x)\rangle \cong \operatorname{im}\theta$$

is a ring isomorphism, where $(\langle p(x)\rangle + f(x))\tilde\theta = (f(x))\theta$ for all $f(x) \in F[x]$. Further, $\operatorname{im}\theta$ is a field if and only if $p(x)$ is irreducible over $F$.

Proof

Write $K = \ker\theta = \{k(x) \in F[x] : (k(x))\theta = 0\}$. Then $K$ is an ideal of $F[x]$ (Exercises 2.3, Question 3(b)) and so by Theorem 4.4 there is a polynomial $p(x)$ over $F$ with $K = \langle p(x)\rangle$. As in Theorem 2.16, the ring homomorphism $\theta$ gives rise to the ring isomorphism $\tilde\theta$ as above (Exercises 2.3, Question 3(b) again). Therefore every homomorphic image $\operatorname{im}\theta$ of $F[x]$ is isomorphic to a quotient ring of the type $F[x]/\langle p(x)\rangle$ for some $p(x) \in F[x]$.

Suppose $p(x) = 0(x)$. Then $F[x]/\langle p(x)\rangle \cong F[x]$, which is not a field, and so $\operatorname{im}\theta$ is not a field in this case.

Suppose that $p(x)$ is a non-zero constant polynomial. Then $p(x)$ is an invertible element of $F[x]$, giving $K = \langle p(x)\rangle = F[x]$. Hence $F[x]/\langle p(x)\rangle = F[x]/F[x]$ is trivial, which means that $\operatorname{im}\theta$ is also trivial and so not a field.

Suppose $p(x)$ is reducible of degree $n$ over $F$. Then $p(x) = g(x)h(x)$ where $g(x), h(x) \in F[x]$ and $\deg g(x) < n$, $\deg h(x) < n$. The cosets $K + g(x)$ and $K + h(x)$ are non-zero elements of $F[x]/K = F[x]/\langle p(x)\rangle$ (the non-zero polynomials $g(x)$ and $h(x)$ have degree less than $n$, so neither is a multiple of $p(x)$), but $(K + h(x))(K + g(x)) = K + g(x)h(x) = K + p(x) = K$, showing that their product is the zero element of $F[x]/\langle p(x)\rangle$, that is, $F[x]/\langle p(x)\rangle$ has divisors of zero. Hence the ring $\operatorname{im}\theta$ also has zero divisors and so is not a field.

Suppose now that $\operatorname{im}\theta$ is a field. By the preceding three paragraphs, $p(x)$ has positive degree $n$ and cannot be a product of two polynomials of degree less than $n$ over $F$, that is, $p(x)$ is irreducible over $F$.

Conversely suppose that $p(x)$ is irreducible over $F$. By Theorem 4.4 we may assume that $p(x)$ is monic. Then $F[x]/K$ is a non-trivial commutative ring with 1-element $K + 1$, where $K = \langle p(x)\rangle$. Consider a typical non-zero element $K + f(x)$ of $F[x]/K$. Then $f(x) \notin K$, which means that $p(x)$ is not a divisor of $f(x)$. What could $\gcd\{p(x), f(x)\}$ be? As this gcd is a monic divisor of $p(x)$, the only possibilities are 1 and $p(x)$. As $\gcd\{p(x), f(x)\}$ is a divisor of $f(x)$, we see $\gcd\{p(x), f(x)\} \neq p(x)$. So $\gcd\{p(x), f(x)\} = 1$. By Corollary 4.6 there are $a_1(x), a_2(x) \in F[x]$ satisfying $a_1(x)p(x) + a_2(x)f(x) = 1$. Hence $K + 1 = K + a_2(x)f(x)$, as $1 - a_2(x)f(x) = a_1(x)p(x) \in K$. Therefore $K + 1 = (K + a_2(x))(K + f(x))$, showing that $K + f(x)$ has inverse $K + a_2(x)$ in $F[x]/K$. So $F[x]/K$ is a non-trivial commutative ring in which each non-zero element has an inverse, that is, $F[x]/K$ is a field. □
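The second half of the proof is constructive: the inverse of $K + f(x)$ is $K + a_2(x)$, where $a_1(x)p(x) + a_2(x)f(x) = 1$. The Python sketch below (an illustration over $\mathbb{Z}_p$ only, $p$ prime) computes $a_2(x)$ by the Euclidean algorithm, tracking at each step only the multiplier of $f(x)$; this is the usual extended Euclidean algorithm rather than the matrix formulation used earlier. To avoid a clash of notation the code calls the prime `p` and the polynomial `pol`; the function names are hypothetical.

```python
def trim(f):
    """Drop trailing zero coefficients (lowest degree first); [] is 0(x)."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f

def sub(f, g, p):
    n = max(len(f), len(g))
    return trim([((f[i] if i < len(f) else 0) - (g[i] if i < len(g) else 0)) % p
                 for i in range(n)])

def mul(f, g, p):
    if not f or not g:
        return []
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % p
    return trim(h)

def divide(f, g, p):
    """Return (q, r) with f = q*g + r and deg r < deg g over Z_p."""
    g = trim([b % p for b in g])
    r = trim([a % p for a in f])
    q = [0] * max(len(r) - len(g) + 1, 0)
    lead_inv = pow(g[-1], p - 2, p)          # inverse of the leading coefficient
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coeff = r[-1] * lead_inv % p
        q[shift] = coeff
        for i, b in enumerate(g):
            r[i + shift] = (r[i + shift] - coeff * b) % p
        r = trim(r)
    return trim(q), r

def inverse_mod(f, pol, p):
    """Return a2 with a2*f = 1 modulo pol, i.e. the inverse of K + f(x), K = <pol>.

    Invariant: r0 = t0*f and r1 = t1*f modulo pol at every step.
    """
    r0, r1 = trim([a % p for a in pol]), divide(f, pol, p)[1]
    t0, t1 = [], [1]
    while r1:
        q, r = divide(r0, r1, p)
        r0, r1 = r1, r
        t0, t1 = t1, sub(t0, mul(q, t1, p), p)
    if len(r0) != 1:                         # gcd of positive degree
        raise ValueError("K + f(x) is not invertible")
    scale = pow(r0[0], p - 2, p)             # make the gcd equal to 1
    return divide(mul([scale], t0, p), pol, p)[1]
```

For example, with `pol = [1, 1, 1]` ($x^2 + x + 1$) over $\mathbb{Z}_2$, `inverse_mod([0, 1], pol, 2)` should give `[1, 1]`, that is, $(K + x)^{-1} = K + x + 1$. When `pol` is reducible the sketch may meet a gcd of positive degree, in which case it raises `ValueError`, in line with the zero-divisor argument above.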

Let $p(x)$ be an irreducible polynomial of degree $n \ge 2$ over $F$ and, as above, let $K = \langle p(x)\rangle$. The field $F[x]/K$ is an important concept (it is analogous to the field $\mathbb{Z}_p$) and it is customary to simplify the notation for $F[x]/K$ and its elements, as we now explain.

First $F' = \{K + a : a \in F\}$, that is, the set of cosets with constant polynomial representatives, is closed under addition and multiplication and contains the 0-element $K + 0$ and the 1-element $K + 1$ of $F[x]/K$. In fact the correspondence $F \to F'$ in which $a \mapsto K + a$ (for all $a \in F$) is a ring isomorphism. So $F'$ is a subfield of $F[x]/K$ with $F \cong F'$. It is usual to replace each coset $K + a$ by its representative $a$ for $a \in F$, and so $F'$ is replaced by $F$. Therefore $F$ is a subfield of $F[x]/K$, or in other words, $F[x]/K$ is an extension field of $F$.

Next write $K + x = c$. As $\deg p(x) \ge 2$, a typical element $g(x)p(x) + x$ of $K + x$ has degree at least 1 and so $K + x = c \notin F$. Suppose

$$p(x) = b_nx^n + b_{n-1}x^{n-1} + \cdots + b_1x + b_0.$$

Then using sums and products of cosets, that is, working in $F[x]/K$, we obtain

$$\begin{aligned} p(c) &= (K + b_n)(K + x)^n + (K + b_{n-1})(K + x)^{n-1} + \cdots + (K + b_1)(K + x) + (K + b_0) \\ &= (K + b_nx^n) + (K + b_{n-1}x^{n-1}) + \cdots + (K + b_1x) + (K + b_0) \\ &= K + (b_nx^n + b_{n-1}x^{n-1} + \cdots + b_1x + b_0) = K + p(x) = K = K + 0, \end{aligned}$$

showing that $c$ is a zero of $p(x)$. This technique (of extending $F$ to a field which contains a zero of an irreducible polynomial over $F$) is important in Galois Theory.

Consider a typical element $K + f(x)$ of $F[x]/K$. By Theorem 4.1 there are unique $q(x), r(x) \in F[x]$ with $f(x) = q(x)p(x) + r(x)$ and $\deg r(x) < n$. Then $f(x) \equiv r(x) \pmod{K}$ as in Section 2.2 and so $K + f(x) = K + r(x)$. Write $r(x) = a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \cdots + a_1x + a_0$. Manipulating cosets we see

$$\begin{aligned} K + f(x) &= K + (a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \cdots + a_1x + a_0) \\ &= (K + a_{n-1}x^{n-1}) + (K + a_{n-2}x^{n-2}) + \cdots + (K + a_1x) + (K + a_0) \\ &= (K + a_{n-1})(K + x^{n-1}) + (K + a_{n-2})(K + x^{n-2}) + \cdots + (K + a_1)(K + x) + (K + a_0) \\ &= (K + a_{n-1})(K + x)^{n-1} + (K + a_{n-2})(K + x)^{n-2} + \cdots + (K + a_1)(K + x) + (K + a_0) \\ &= a_{n-1}c^{n-1} + a_{n-2}c^{n-2} + \cdots + a_1c + a_0 = r(c). \end{aligned}$$

So each element of $F[x]/K$ is uniquely expressible in the form

$$r(c) = a_{n-1}c^{n-1} + a_{n-2}c^{n-2} + \cdots + a_1c + a_0$$

where $p(c) = 0$, $\deg p(x) = n$ and $a_i \in F$ ($0 \le i < n$). Notice that $F[x]/K$ is an $n$-dimensional vector space over $F$ with basis $1, c, c^2, \ldots, c^{n-1}$. Finally write $F[x]/K = F(c)$ and so

$F(c)$ is the extension field obtained by adjoining the zero $c$ of $p(x)$ to $F$.
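Computing in $F(c)$ therefore amounts to working with coordinate vectors relative to the basis $1, c, \ldots, c^{n-1}$ and reducing products modulo $p(x)$. As an illustration, here is a Python sketch for $F = \mathbb{Z}_p$ only. The class `QuotientField` and its methods are hypothetical names; the polynomial `pol` is assumed monic, and its irreducibility is assumed rather than checked.

```python
class QuotientField:
    """Hypothetical helper modelling F(c) = Z_p[x]/<pol(x)>.

    pol is monic of degree n, lowest degree first; the element
    a_0 + a_1 c + ... + a_{n-1} c^{n-1} is stored as (a_0, ..., a_{n-1}).
    """

    def __init__(self, p, pol):
        assert pol[-1] == 1, "pol must be monic"
        self.p, self.pol, self.n = p, [a % p for a in pol], len(pol) - 1

    def reduce(self, f):
        """Replace f(x) by its remainder r(x) on division by pol(x)."""
        r = [a % self.p for a in f]
        for top in range(len(r) - 1, self.n - 1, -1):
            coeff = r[top]
            if coeff:
                # subtract coeff * x^(top - n) * pol(x); clears r[top] as pol is monic
                for i, b in enumerate(self.pol):
                    r[top - self.n + i] = (r[top - self.n + i] - coeff * b) % self.p
        r = r[:self.n] + [0] * (self.n - len(r))
        return tuple(r)

    def const(self, a):
        return self.reduce([a])

    def gen(self):
        """The element c = K + x."""
        return self.reduce([0, 1])

    def add(self, u, v):
        return tuple((a + b) % self.p for a, b in zip(u, v))

    def mul(self, u, v):
        h = [0] * (2 * self.n - 1)
        for i, a in enumerate(u):
            for j, b in enumerate(v):
                h[i + j] += a * b
        return self.reduce(h)

    def evaluate(self, f, u):
        """Value at u of the polynomial f (coefficients in Z_p), by Horner's rule."""
        result = self.const(0)
        for a in reversed(f):
            result = self.add(self.mul(result, u), self.const(a))
        return result
```

With `F4 = QuotientField(2, [1, 1, 1])` and `c = F4.gen()`, one should find `F4.mul(c, c) == (1, 1)`, i.e. $c^2 = c + 1$, and `F4.evaluate([1, 1, 1], c) == (0, 0)`, i.e. $p(c) = 0$, in agreement with the general calculation above.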

The polynomial $p(x) = x^2 + 1$ is irreducible over the real field $\mathbb{R} = F$; with $c = i$, and so $i^2 = -1$, we obtain $\mathbb{C} = \mathbb{R}(i) = \{a + ib : a, b \in \mathbb{R}\}$, the familiar construction of the complex field. Similarly $x^2 - 2$ is irreducible over the rational field $\mathbb{Q}$ and so $\mathbb{Q}(\sqrt{2}) = \{a + b\sqrt{2} : a, b \in \mathbb{Q}\}$ is the field obtained by adjoining $\sqrt{2}$ to $\mathbb{Q}$.

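As we saw earlier, $p(x) = x^2 + x + 1$ is irreducible over $\mathbb{Z}_2$. The element $c = \langle p(x)\rangle + x$ of the field $\mathbb{Z}_2(c) = \mathbb{Z}_2[x]/\langle p(x)\rangle$ satisfies $p(c) = 0$, that is, $c^2 = c + 1$. This equation (the analogue of $i^2 = -1$ above) is all one needs to manipulate the four elements of $\mathbb{Z}_2(c) = \{0, 1, c, c + 1\}$. The reader can check that the addition and multiplication tables of $\mathbb{Z}_2(c)$ are:

$$\begin{array}{c|cccc} + & 0 & 1 & c & c+1 \\ \hline 0 & 0 & 1 & c & c+1 \\ 1 & 1 & 0 & c+1 & c \\ c & c & c+1 & 0 & 1 \\ c+1 & c+1 & c & 1 & 0 \end{array} \qquad \text{and} \qquad \begin{array}{c|cccc} \times & 0 & 1 & c & c+1 \\ \hline 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & c & c+1 \\ c & 0 & c & c+1 & 1 \\ c+1 & 0 & c+1 & 1 & c \end{array}$$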
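The tables can be generated from the single rule $c^2 = c + 1$. In the Python sketch below (an illustration, with the assumed encoding of $a_0 + a_1c$ as the pair `(a0, a1)`), multiplication expands the product and replaces $a_1b_1c^2$ by $a_1b_1(c + 1)$.

```python
# Elements a0 + a1*c of Z_2(c) stored as pairs (a0, a1), in the order 0, 1, c, c + 1.
ELEMENTS = [(0, 0), (1, 0), (0, 1), (1, 1)]
NAMES = {(0, 0): "0", (1, 0): "1", (0, 1): "c", (1, 1): "c+1"}

def add(u, v):
    """Coefficientwise addition modulo 2."""
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

def mul(u, v):
    """(a0 + a1 c)(b0 + b1 c) = a0 b0 + (a0 b1 + a1 b0) c + a1 b1 c^2, with c^2 = c + 1."""
    a0, a1 = u
    b0, b1 = v
    return ((a0 * b0 + a1 * b1) % 2, (a0 * b1 + a1 * b0 + a1 * b1) % 2)

def table(op):
    """Rows of the operation table, entries written as names."""
    return [[NAMES[op(u, v)] for v in ELEMENTS] for u in ELEMENTS]

def show(symbol, op):
    header = [symbol] + [NAMES[v] for v in ELEMENTS]
    print(" ".join(f"{s:>4}" for s in header))
    for u, row in zip(ELEMENTS, table(op)):
        print(" ".join(f"{s:>4}" for s in [NAMES[u]] + row))

if __name__ == "__main__":
    show("+", add)
    show("x", mul)
```

The rows produced should match the two tables above; the same functions also confirm that $c^3 = 1$ and $p(c + 1) = 0$.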


It is usual to denote this field by $\mathbb{F}_4$, as it can be shown that every field with exactly four elements is isomorphic to $\mathbb{Z}_2(c)$. Another common notation is $GF(4)$, the Galois field of order 4. The additive group of $\mathbb{F}_4$ is the Klein 4-group of isomorphism type $C_2 \oplus C_2$ and the multiplicative group $\mathbb{F}_4^*$ is cyclic of order 3. From the tables, $(c + 1)^2 = c = (c + 1) + 1 = -(c + 1) - 1$, showing $p(c + 1) = 0$, that is, the irreducible polynomial $p(x)$ over $\mathbb{Z}_2$ has zeros $c$, $c + 1$ in $\mathbb{F}_4$ and so factorises $p(x) = (x - c)(x - c - 1)$ over $\mathbb{F}_4$.

We know (Exercises 2.3, Question 4(a)) that the number of elements in every finite field $F$ is a prime power $q = p^n$. Conversely, for every prime power $q = p^n$ it can be shown that there is, up to isomorphism, a unique field $\mathbb{F}_q$ (also denoted $GF(q)$, the Galois field of order $q$) having exactly $q$ elements. By the $|G|$-lemma, the $q$ elements of $\mathbb{F}_q$ are the zeros of $x^q - x$ over $\mathbb{Z}_p$. What is more, all the monic irreducible polynomials of degree $n$ over $\mathbb{Z}_p$ occur once and once only in the factorisation of $x^{p^n} - x$ over $\mathbb{Z}_p$ (Exercises 4.1, Question 3(c)).

In Chapter 6 the field $F[x]/\langle p(x)\rangle$ is used to describe the $F[x]$-module $M(A)$ where $A$ is a $t \times t$ matrix over $F$ satisfying $p(A) = 0$.
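The statement about $x^{p^n} - x$ can be checked for $p = 2$ and small $n$. The Python sketch below is an illustration restricted to $\mathbb{Z}_2$, using the assumed encoding of a polynomial as an integer whose bit $i$ is the coefficient of $x^i$. It multiplies together all monic irreducible polynomials whose degree divides $n$ and compares the product with $x^{2^n} - x = x^{2^n} + x$; the irreducible factors of every degree $d \mid n$ occur in that factorisation, which is why they are all included. The function names are hypothetical.

```python
def deg(f):
    return f.bit_length() - 1

def pmod(f, g):
    """Remainder of f on division by g != 0 over Z_2 (subtraction is XOR)."""
    dg = deg(g)
    while f and deg(f) >= dg:
        f ^= g << (deg(f) - dg)
    return f

def clmul(f, g):
    """Product over Z_2: shift-and-XOR (carry-less) multiplication."""
    h = 0
    while g:
        if g & 1:
            h ^= f
        f <<= 1
        g >>= 1
    return h

def is_irreducible(f):
    d = deg(f)
    return d >= 1 and all(pmod(f, g) for g in range(2, 1 << (d // 2 + 1)))

def product_of_irreducibles(n):
    """Multiply every monic irreducible polynomial over Z_2 whose degree divides n."""
    prod = 1
    for d in range(1, n + 1):
        if n % d == 0:
            for f in range(1 << d, 1 << (d + 1)):
                if is_irreducible(f):
                    prod = clmul(prod, f)
    return prod

def x_2n_minus_x(n):
    """x^(2^n) - x over Z_2, which equals x^(2^n) + x."""
    return (1 << (2 ** n)) | 0b10

if __name__ == "__main__":
    for n in range(1, 7):
        print(n, product_of_irreducibles(n) == x_2n_minus_x(n))
```

For $n = 3$, say, the factors are $x$, $x + 1$, $x^3 + x + 1$ and $x^3 + x^2 + 1$, each occurring once, and their product should be $x^8 + x$.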

EXERCISES 4.1

1. (a) For each of the following pairs $r_1(x), r_2(x)$ of polynomials over $\mathbb{Q}$, use the Euclidean algorithm to determine $d(x) = \gcd\{r_1(x), r_2(x)\}$ and $r_1'(x) = r_1(x)/d(x)$, $r_2'(x) = r_2(x)/d(x)$. Also find $a_1(x), a_2(x) \in \mathbb{Q}[x]$ satisfying $a_1(x)r_1(x) + a_2(x)r_2(x) = d(x)$.
(i) $r_1(x) = x^4 + x^2 + 1$, $r_2(x) = x^3 + 1$;
(ii) $r_1(x) = x^4 + x^3 - 2x^2 - x + 1$, $r_2(x) = x^3 - 1$;
(iii) $r_1(x) = x^3 + x^2 + 1$, $r_2(x) = x^2 - 1$;
(iv) $r_1(x) = x^{46} - 1$, $r_2(x) = x^{32} - 1$; first use the Euclidean algorithm to find $\gcd\{46, 32\}$.

(b) In each of the following cases, working over the indicated field, calculate $d(x) = \gcd\{r_1(x), r_2(x)\}$ and the polynomials $r_1(x)/d(x)$ and $r_2(x)/d(x)$.
(i) $r_1(x) = x^5 + x^4 + x^3 + 1$, $r_2(x) = x^5 + x^2 + x + 1$ over $\mathbb{Z}_2$;
(ii) $r_1(x) = x^4 + x^3 + x^2 - 1$, $r_2(x) = x^4 + 1$ over $\mathbb{Z}_3$;
(iii) $r_1(x) = 2x^3 + x^2 + 3x + 4$, $r_2(x) = 3x^3 + 4x^2 + 2x + 1$ over $\mathbb{Z}_5$.

(c) Let $m$, $n$, $q$ and $r$ be integers with $m = qn + r$ where $m \ge n > r \ge 0$. Let $F$ be a field with 1-element $e$. Find the quotient $q(x)$ and remainder $r(x)$ on dividing $x^m - e$ by $x^n - e$ over $F$. Hence show $(x^n - e) \mid (x^m - e)$ if and only if $n \mid m$. Use the Euclidean algorithm to conclude $\gcd\{x^m - e, x^n - e\} = x^d - e$ where $d = \gcd\{m, n\}$.

(d) Let $f_1(x)$ and $f_2(x)$ be non-zero polynomials over a field $F$ and let $l(x)$ be the monic generator of the ideal $\langle f_1(x)\rangle \cap \langle f_2(x)\rangle$ of $F[x]$. Show that $l(x)$ is the least common multiple (lcm) of $f_1(x)$ and $f_2(x)$, that is, $l(x)$ is the unique monic polynomial over $F$ satisfying
(i) $f_1(x) \mid l(x)$, $f_2(x) \mid l(x)$ and
(ii) $f_1(x) \mid l'(x)$, $f_2(x) \mid l'(x) \Rightarrow l(x) \mid l'(x)$ for $l'(x) \in F[x]$.
Use Corollary 4.6 to show $l(x) = f_1(x)f_2(x)/\gcd\{f_1(x), f_2(x)\}$ in the case $f_1(x)$, $f_2(x)$ both monic.

(e) Let $f(x)$ be a monic polynomial of degree $t$ over a field $F$. Use the polynomial analogue of the fundamental theorem of arithmetic to show that $f(x)$ has at most $2^t$ monic divisors over $F$.
Hint: Consider first the case $f(x) = p(x)^n$ where $p(x)$ is monic and irreducible over $F$.

2. (a) Let $F$ be a field with an element $a$. Show that $\varepsilon_a : F[x] \to F$, defined by $(f(x))\varepsilon_a = f(a)$ for all $f(x) \in F[x]$, is a surjective ring homomorphism.

(b) Let $f(x), g(x), q(x), r(x)$ be polynomials over a field $F$ satisfying $f(x) = q(x)g(x) + r(x)$. Show that the ideals $\langle f(x), g(x)\rangle$ and $\langle g(x), r(x)\rangle$ are equal (as sets). Deduce $\gcd\{f(x), g(x)\} = \gcd\{g(x), r(x)\}$.
Let $f_1(x), f_2(x), \ldots, f_t(x)$ be polynomials over a field $F$ where $t \ge 3$. Show

$$\gcd\{f_1(x), f_2(x), \ldots, f_t(x)\} = \gcd\{f_1(x), \gcd\{f_2(x), \ldots, f_t(x)\}\}.$$

(c) (Polynomials in one indeterminate over a ring.) Let $R$ be a ring and let $P(R)$ denote the set of infinite sequences $(a_i) = (a_0, a_1, \ldots, a_i, \ldots)$ where the entries $a_j$ all belong to $R$ and only a finite number of the $a_j$ are non-zero. The sum and product of elements $(a_i)$ and $(b_i)$ of $P(R)$ are defined by

$$\begin{aligned} (a_0, a_1, \ldots, a_i, \ldots) + (b_0, b_1, \ldots, b_i, \ldots) &= (a_0 + b_0, a_1 + b_1, \ldots, a_i + b_i, \ldots), \\ (a_0, a_1, \ldots, a_i, \ldots)(b_0, b_1, \ldots, b_i, \ldots) &= (a_0b_0, a_0b_1 + a_1b_0, \ldots, a_0b_i + a_1b_{i-1} + \cdots + a_ib_0, \ldots). \end{aligned}$$

Show that $P(R)$ is closed under sum and product. Show that $P(R)$ is a ring. Write $(a_0)\iota' = (a_0, 0, 0, \ldots, 0, \ldots)$ for $a_0 \in R$; show that $\iota' : R \to P(R)$ is an injective ring homomorphism and deduce that $R' = \operatorname{im}\iota'$ is a subring of $P(R)$ with $R \cong R'$. Show $R$ is an integral domain $\iff$ $P(R)$ is an integral domain.

For $i \ge 0$ let $e_i = (0, 0, \ldots, 0, 1, 0, \ldots) \in P(R)$, that is, $e_i = (a_0, a_1, \ldots, a_i, \ldots)$ where $a_j = 0$ for $j \neq i$ and $a_i = 1$. Write $x = e_1$. Verify $(a_0)\iota'x = x(a_0)\iota'$ for all $a_0 \in R$. Show $x^i = e_i$ by induction on $i$. Deduce

$$(a_0, a_1, \ldots, a_i, \ldots) = (a_0)\iota'x^0 + (a_1)\iota'x + (a_2)\iota'x^2 + \cdots + (a_i)\iota'x^i + \cdots,$$

that is, elements of $P(R)$ are polynomials in $x$ over $R'$. (It is customary to identify $(a_0)\iota'$ with $a_0$ and write $P(R) = R[x]$.)

3. (a) Let $F$ be a finite field.

(i) For $|F|$ odd, use Lagrange's theorem, Corollary 3.17 and Lemma 4.8 to show that the polynomial $x^2 + 1$ over $F$ is reducible if and only if $|F| \equiv 1 \pmod 4$.
(ii) Over which of the fields $\mathbb{Z}_2$, $\mathbb{Z}_3$, $\mathbb{Z}_5$, $\mathbb{Z}_7$ is $x^2 + x + 1$ irreducible? Determine the integers $|F|$ such that $x^2 + x + 1$ is irreducible over $F$.
Hint: Consider the three congruence classes of $|F| \pmod 3$ separately.

(b) Show that $x^3 - 2$ over $\mathbb{Z}_7$ is irreducible. List the irreducible cubics over $\mathbb{Z}_7$ of the type $x^3 - a$.
Hint: First find the elements of $\mathbb{Z}_7^*$ of the form $b^3$ where $b \in \mathbb{Z}_7^*$.
Let $F$ be a finite field. By considering the endomorphism $\theta$ of $F^*$ defined by $(b)\theta = b^3$ for all $b \in F^*$, show that all cubics over $F$ of the form $x^3 - a$ are reducible provided $|F| \not\equiv 1 \pmod 3$. Determine, in terms of $|F|$, the number of reducible cubics of the form $x^3 - a$ over $F$ in the case $|F| \equiv 1 \pmod 3$.

(c) Let $p(x)$ be a monic irreducible polynomial of degree $n$ over a finite field $F$. Write $|F| = q$. Use the $|G|$-lemma of Section 2.2 to show that each element $b$ of the field $E = F[x]/\langle p(x)\rangle$ is a zero of $x^{q^n} - x$. Hence show that $x^{q^n} - x$ splits (is the product of factors of degree 1) over $E$. By considering $\gcd\{p(x), x^{q^n} - x\}$, show that $p(x)$ is a divisor of $x^{q^n} - x$ and deduce that $p(x)$ splits over $E$. Let $p'(x)$ be also a monic irreducible polynomial of degree $n$ over $F$. Show that $E \cong F[x]/\langle p'(x)\rangle$. Is it possible for an irreducible polynomial over a finite field $F$ to have a repeated zero (a factor $(x - b)^2$) in an extension field $E$?

(d) The polynomial $p(x)$ of degree 5 over $\mathbb{Z}_2$ satisfies $p(0) = 1$ and $\gcd\{p(x), x^3 + 1\} = 1$. Show that $p(x)$ is irreducible over $\mathbb{Z}_2$. Show that exactly one of $x^5 + x^3 + x + 1$, $x^5 + x^4 + 1$, $x^5 + x^2 + 1$, $x^5 + x + 1$ is an irreducible polynomial $p(x)$ over $\mathbb{Z}_2$. What is $\gcd\{p(x), x^{31} - 1\}$ over $\mathbb{Z}_2$?
Hint: Either use the Euclidean algorithm or the theory in (c) above.


4. (a) Let $F$ be a field of characteristic 0 (see Exercises 2.3, Question 5(a)) with multiplicative identity $e$. Let $F_0 = \{me/ne : m, n \in \mathbb{Z}, n \neq 0\}$ denote its prime subfield. Let $\theta$ be an automorphism of $F$ (so $\theta$ is a bijective ring homomorphism of $F$ to itself). Show that $(a_0)\theta = a_0$ for all $a_0 \in F_0$. Show that $L = \{a \in F : (a)\theta = a\}$ is a subfield of $F$ ($L$ is called the fixed field of $\theta$).
Let $f(x)$ be a polynomial over $F_0$. Show that $(f(c))\theta = f((c)\theta)$ for all $c \in F$. Deduce that $c$ is a zero of $f(x)$ if and only if $(c)\theta$ is a zero of $f(x)$.

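(b) The mapping $\theta : \mathbb{Q}(\sqrt{2}) \to \mathbb{Q}(\sqrt{2})$ is defined by $(a + b\sqrt{2})\theta = a - b\sqrt{2}$ for all $a, b \in \mathbb{Q}$. Show that $\theta$ is an automorphism of $\mathbb{Q}(\sqrt{2})$. Use (a) above with $f(x) = x^2 - 2$ to show that $\theta$ is the only non-identity automorphism of $\mathbb{Q}(\sqrt{2})$.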

(c) Let $p(x) = x^2 + a_1x + a_0$ be irreducible over an arbitrary field $F$. Let $F(c)$ be the extension field obtained by adjoining a zero $c$ of $p(x)$ to $F$. Show $-a_1 - c$ is also a zero of $p(x)$. Generalise (b) by showing that $\theta : F(c) \to F(c)$, given by $(a + bc)\theta = a - a_1b - bc$ for all $a, b \in F$, is a self-inverse ($\theta = \theta^{-1}$) automorphism of $F(c)$.
Hint: Show $\theta$ respects multiplication; this surprising fact is important in Galois Theory.
Let $L$ denote the fixed field of $\theta$. Show $-a_1 - c \neq c$ if and only if $L = F$.

5. (a) Let $F$ be a finite field of characteristic $p$, let $F_0$ be its prime subfield, and let $\theta$ be an automorphism of $F$. Show that $(a_0)\theta = a_0$ for all $a_0 \in F_0$. Show that $L = \{a \in F : (a)\theta = a\}$ is a subfield of $F$ ($L$ is the fixed field of $\theta$) and $F_0 \subseteq L \subseteq F$.
Suppose $|F| = p^n$. Use Corollary 4.2(ii) to show that the Frobenius automorphism $\theta$ of $F$, defined by $(a)\theta = a^p$ for all $a \in F$ (see Exercises 2.3, Question 5(a)), has multiplicative order $n$.

(b) Let $p(x) = x^3 + x + 1$ over $\mathbb{Z}_2$. Verify that $p(x)$ is irreducible over $\mathbb{Z}_2$. Write out the addition and multiplication tables of the field $\mathbb{Z}_2(c) = \{a_0 + a_1c + a_2c^2 : a_0, a_1, a_2 \in \mathbb{Z}_2\}$ where $p(c) = 0$. Express each element of $\mathbb{Z}_2(c)^*$ as a power of $c$. Factorise $p(x)$ into a product of polynomials of degree 1 over $\mathbb{Z}_2(c)$. Show that $p'(x) = x^3 + x^2 + 1$ is irreducible over $\mathbb{Z}_2$ and splits over $\mathbb{Z}_2(c)$. Hence factorise $x^8 - x$ into irreducible polynomials over $\mathbb{Z}_2(c)$ and into irreducible polynomials over $\mathbb{Z}_2$. Are $p(x)$ and $p'(x)$ the only irreducible polynomials of degree 3 over $\mathbb{Z}_2$?

(c) Use Question 3(d) above to show that $p(x) = x^5 + x^3 + 1$ is irreducible over $\mathbb{Z}_2$. Use the Frobenius automorphism $\theta$ of the field $\mathbb{Z}_2(c)$ to express the zeros of $p(x)$ as linear combinations over $\mathbb{Z}_2$ of $1, c, c^2, c^3, c^4$ where $p(c) = 0$ (remember $(a)\theta = a^2$ for all $a \in \mathbb{Z}_2(c)$). Find the irreducible polynomial $p'(x)$ over $\mathbb{Z}_2$ such that $p'(c + 1) = 0$ in $\mathbb{Z}_2(c)$ and factorise $p'(x)$ into irreducible polynomials over $\mathbb{Z}_2(c)$.
Hint: What is the connection between the zeros of $f(x)$ and those of $f(x - a)$?

6. (a) Verify that $x^2 + 1$, $x^2 + x - 1$, $x^2 - x - 1$ are irreducible over $\mathbb{Z}_3$. Calculate their product and hence resolve $x^9 - x$ into monic irreducible polynomials over $\mathbb{Z}_3$. Are there any further monic irreducible polynomials of degree 2 over $\mathbb{Z}_3$? Factorise each of the above quadratic polynomials into monic irreducible polynomials over the field $\mathbb{Z}_3(i)$ where $i^2 = -1$. Simplify $(a + bi)^3$ where $a, b \in \mathbb{Z}_3$ and hence find the connection between the Frobenius automorphism $\theta$ of $\mathbb{Z}_3(i)$ and ‘conjugation’ $a + bi \mapsto a - bi$. Find a generator of the cyclic group $\mathbb{Z}_3(i)^*$. Write out the addition and multiplication tables of $\mathbb{Z}_3(i)$.

(b) Verify that $p(x) = x^3 - x - 1$ is irreducible over $\mathbb{Z}_3$. Let $c$ be a zero of $p(x)$ and so $c^3 = c + 1$. Does $c$ generate the multiplicative group of the field $\mathbb{Z}_3(c)$? Is $p(x)$ a divisor of $x^{13} - 1$ over $\mathbb{Z}_3$? Does $-c$ generate the cyclic group $\mathbb{Z}_3(c)^*$? Is $-p(-x) = x^3 - x + 1$ a divisor of $x^{13} + 1$ over $\mathbb{Z}_3$?
Use $x^3 - x$ over $\mathbb{Z}_3$ to show that $(a)\theta = a \iff a \in \mathbb{Z}_3$, where $\theta$ is the Frobenius automorphism of $\mathbb{Z}_3(c)$. Hence show that $p_a(x) = (x - a)(x - a^3)(x - a^9)$ is a polynomial over $\mathbb{Z}_3$ for $a \in \mathbb{Z}_3(c)$. For $a \notin \mathbb{Z}_3$, is $p_a(x)$ irreducible over $\mathbb{Z}_3$? How many monic irreducible polynomials of degree 3 over $\mathbb{Z}_3$ are there? Factorise $x^{13} - 1$ and $x^{13} + 1$ into monic irreducible polynomials over $\mathbb{Z}_3$.

(c) Verify that $p(x) = x^4 + x^2 - 1$ over $\mathbb{Z}_3$ is not divisible by any monic irreducible quadratic over $\mathbb{Z}_3$ (see (a) above). Hence show that $p(x)$ is irreducible over $\mathbb{Z}_3$. Let $c$ satisfy $p(c) = 0$ and write $i = c^2 - 1$. Verify $i^2 = -1$ and deduce that $\mathbb{Z}_3(i)$ is a subfield of $\mathbb{Z}_3(c)$. Find the factorisations of $p(x)$ into monic irreducible polynomials over $\mathbb{Z}_3(c)$ and into monic irreducible quadratics over $\mathbb{Z}_3(i)$. How many monic irreducible polynomials of degree 4 over $\mathbb{Z}_3$ are there?

7. (a) Let $E$ be a field with subfield $F$. Suppose that $E$, which is a vector space over $F$, has finite dimension $[E : F]$. Then $E$ is called a finite extension of $F$. Let $c$ be an element of $E$. By considering the ‘vectors’ $c^i$ in $E$ for $0 \le i \le [E : F]$, show that the evaluation homomorphism $\varepsilon_c : F[x] \to E$ has a non-zero kernel $K$. The monic generator $m_c(x)$ of $K$ (see Theorem 4.4) is called the minimum polynomial of $c$. Show that $m_c(x)$ is irreducible over $F$ and deduce that $F(c) = \operatorname{im}\varepsilon_c$ is a subfield of $E$ with $F(c) \cong F[x]/K$ and $[F(c) : F] = \deg m_c(x)$.


(b) Let $E$ be a field with subfield $F$ and let $L$ be an intermediate subfield, that is, $L$ is a subfield of $E$ and $F \subseteq L \subseteq E$. Suppose that $L$ is a finite extension of $F$ with basis $u_1, u_2, \ldots, u_m$ where $m = [L : F]$, and suppose also that $E$ is a finite extension of $L$ with basis $v_1, v_2, \ldots, v_n$ where $n = [E : L]$. Prove that the $mn$ elements $u_iv_j$ of $E$ ($1 \le i \le m$, $1 \le j \le n$) form a basis of $E$ over $F$. Deduce that $E$ is a finite extension of $F$ and $[E : F] = [E : L][L : F]$.

(c) Let $E$ be a finite field with subfield $F$ of order $q$ and let $[E : F] = n$.
(i) Show that every subfield $L$ of $E$ with $F \subseteq L$ satisfies $|L| = q^d$ where $d \mid n$. Conversely show $(q^d - 1) \mid (q^n - 1)$ and $(x^{q^d} - x) \mid (x^{q^n} - x)$ for each positive divisor $d$ of $n$, and hence show that $L = \{c \in E : c^{q^d} = c\}$ is the unique subfield of $E$ with $|L| = q^d$.
Hint: Consider the fixed field of $\theta^d$ where $\theta$ is the Frobenius automorphism of $E$.
(ii) Let $L$ and $M$ be subfields of $E$ with $F \subseteq L \cap M$, and so $|L| = q^d$ and $|M| = q^e$. Deduce from (i) above that $|L \cap M| = q^{\gcd\{d, e\}}$.
(iii) Let $n = p_1^{n_1}p_2^{n_2}\cdots p_k^{n_k}$ be the factorisation of $n > 1$ into positive powers of distinct primes $p_1, p_2, \ldots, p_k$. For each subset $X$ of $\{1, 2, \ldots, k\}$ write $\pi_X = \prod_{j \in X} p_j$, i.e. $\pi_X$ is the product of the primes $p_j$ for $j \in X$, and $\pi_\emptyset = 1$. Let $L_j$ be the subfield of $E$ with $|L_j| = q^{n/p_j}$ ($1 \le j \le k$). Use induction on $s = |X|$ and (ii) above to show $|\bigcap_{j \in X} L_j| = q^{n/\pi_X}$. The sieve formula now asserts that the number of elements of $E$ which are not in any subfield $L$ of $E$ with $F \subseteq L \neq E$ is $r = \sum_X (-1)^{|X|}q^{n/\pi_X}$, the summation being over the $2^k$ subsets $X$ of $\{1, 2, \ldots, k\}$. Using Questions 3(c) and 7(a) above, explain why $r/n$ (Dedekind's formula) is the number of monic irreducible polynomials of degree $n$ over $F \cong \mathbb{F}_q$.
Verify that there are 335 monic irreducible polynomials of degree 12 over $\mathbb{Z}_2$. Verify that there are 670 monic irreducible polynomials of degree 6 over $\mathbb{F}_4$. Calculate the numbers of monic irreducible polynomials of degree 12 over $\mathbb{Z}_3$ and of degree 6 over $\mathbb{F}_9$.

8. (a) Let $p(x)$ be a monic irreducible polynomial of degree $n$ over a field $F$ and let $m$ be a positive integer. Write $K = \langle p(x)^m\rangle$ for the principal ideal of $F[x]$ generated by $p(x)^m$ and let $R_m = F[x]/K$. Also let $G_m = U(R_m)$ denote the multiplicative group of invertible elements of the ring $R_m$. Show $K + f(x) \in G_m \iff \gcd\{f(x), p(x)\} = 1$. Show also that $H_m = \{K + f(x) : f(x) \equiv 1 \pmod{p(x)}\}$ is a subgroup of $G_m$.
Suppose $F$ is a finite field of order $q$. Show $|R_m| = q^{mn}$, $|G_m| = q^{(m-1)n}(q^n - 1)$ and $|H_m| = q^{(m-1)n}$. By applying Corollary 3.17 to $R_m/\langle K + p(x)\rangle$ show that $G_m \cong H_m \times H_0$ (external direct product, Exercises 2.3, Question 4(d)) where $H_0$ is cyclic of order $q^n - 1$. Express the invariant factors of $G_m$ in terms of the invariant factors of $H_m$.

(b) Let $R_m = \mathbb{Z}_2[x]/\langle x^m \rangle$ and let $G_m = U(R_m)$ where $m$ is a positive integer. Show $|G_m| = 2^{m-1}$.
Find the invariant factors of $G_2$, $G_3$, $G_4$, $G_5$ and $G_6$. Adapting the notation used in the proof of Theorem 3.7 to multiplicative abelian groups, write $nG_m = \{g \in G_m : g^n = \langle x^m \rangle + 1\}$. Find the invariant factors of the subgroups $2G_6$, $4G_6$, $8G_6$ of $G_6$.

(c) Let $j$ and $m$ be positive integers and let $\lfloor y \rfloor$ denote the integer part of the real number $y$. Show that there are $\lfloor (j-1)m/j \rfloor$ integers $i$ in the range $m/j \le i < m$.
Suppose $2^{r-1} < m \le 2^r$. Show
$$|2^j G_m| = 2^{t_j} \quad\text{where}\quad t_j = \lfloor (2^j - 1)m/2^j \rfloor$$
and $G_m$ is defined in (b) above. Show also $t_j = m - 1$ for $j \ge r$. Find a formula for the number $s_j$ of invariant factors $2^j$ of $G_m$ for $j \ge 1$.
Hint: You should find $s_j = 0$ for $j > r$.
List the invariant factors of $G_{25}$ and $G_{32}$.
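For small $m$, part (c) can be tested by brute force. In this Python sketch, which is our own and uses hypothetical helper names, an element of $\mathbb{Z}_2[x]/\langle x^m\rangle$ is stored as an integer whose bit $i$ is the coefficient of $x^i$. The code counts the $g \in G_m$ with $g^{2^j} = \langle x^m\rangle + 1$ and compares the count with $2^{t_j}$. It checks the formula in a few cases but does not prove it.

```python
def mul_mod_xm(a, b, m):
    """Multiply in Z_2[x]/<x^m>; bit i of an int stores the coefficient of x^i."""
    out, i = 0, 0
    while b >> i:
        if (b >> i) & 1:
            out ^= a << i          # adding polynomials over Z_2 is XOR
        i += 1
    return out & ((1 << m) - 1)    # discard the terms x^i with i >= m


def units(m):
    """G_m: residues with constant term 1, i.e. those coprime to x."""
    return [g for g in range(1 << m) if g & 1]


def power_2j(g, j, m):
    """g^(2^j), computed by squaring j times."""
    for _ in range(j):
        g = mul_mod_xm(g, g, m)
    return g


def kernel_size(m, j):
    """|2^j G_m|: the number of g in G_m with g^(2^j) equal to the identity <x^m> + 1."""
    return sum(1 for g in units(m) if power_2j(g, j, m) == 1)


def t(m, j):
    """t_j = floor((2^j - 1) m / 2^j) as in part (c)."""
    return (2 ** j - 1) * m // 2 ** j


for m in range(1, 9):
    for j in range(5):
        assert kernel_size(m, j) == 2 ** t(m, j)
print("|2^j G_m| = 2^(t_j) checked for 1 <= m <= 8, 0 <= j <= 4")
```

The exhaustive loop visits $2^{m-1}$ units, so it is only a sanity check for small $m$. It is no substitute for the counting argument the exercise asks for.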

(d) Let $g(x)$ and $h(x)$ be polynomials over a field $F$. Show that the mapping $\alpha : F[x]/\langle g(x)h(x)\rangle \to (F[x]/\langle g(x)\rangle) \oplus (F[x]/\langle h(x)\rangle)$, given by
$$(\langle g(x)h(x)\rangle + f(x))\alpha = (\langle g(x)\rangle + f(x),\ \langle h(x)\rangle + f(x)) \quad\text{for all } f(x) \in F[x],$$
is unambiguously defined and is a ring homomorphism. Establish the polynomial version of the Chinese remainder theorem 2.11, namely $\gcd\{g(x), h(x)\} = 1$ implies that $\alpha$ is a ring isomorphism.
Show that the rings $R_m = \mathbb{Z}_2[x]/\langle x^m\rangle$ and $R'_m = \mathbb{Z}_2[x]/\langle (x-1)^m\rangle$ are isomorphic and deduce $G_m = U(R_m) \cong U(R'_m)$. Let $l$ and $m$ be positive integers. Show that the rings $\mathbb{Z}_2[x]/\langle x^l(x-1)^m\rangle$ and $R_l \oplus R_m$ are isomorphic. Show that the multiplicative abelian groups $U(\mathbb{Z}_2[x]/\langle x^l(x-1)^m\rangle)$ and $G_l \times G_m$ are isomorphic.
Hint: Use the automorphism $f(x) \mapsto f(x-1)$, for all $f(x) \in \mathbb{Z}_2[x]$, of the ring $\mathbb{Z}_2[x]$.
List the invariant factors of $U(\mathbb{Z}_2[x]/\langle x^{25}(x-1)^{32}\rangle)$.

(e) Let $p(x)$ be a polynomial of positive degree $n$ over a field $F$ and let $m$ be a positive integer. Let $f(x)$ be a polynomial of degree less than $mn$ over $F$. Show that there are unique polynomials $r_i(x)$ over $F$ with


$\deg r_i(x) < n$ for $0 \le i < m$ such that
$$f(x) = \sum_{i=0}^{m-1} r_i(x)p(x)^i.$$
Let $p(x)$ be a monic irreducible polynomial of degree $n$ over the finite field $\mathbb{F}_q$ where $q = p_0^l$ ($p_0$ prime) and let $m$ be a positive integer. Using the notation of (a) above write $R_m = \mathbb{F}_q[x]/\langle p(x)^m\rangle$ and $G_m = U(R_m) \cong H_m \times H_0$. Using the method of (c) above show $|p_0^j H_m| = q^{n t_j}$ where $t_j = \lfloor (p_0^j - 1)m/p_0^j \rfloor$ for $j \ge 0$.
Find a formula for the number $s_j$ of invariant factors $p_0^j$ of $H_m$. Specify the invariant factors of $H_{11}$ and $G_{11}$ in the case $n = 2$, $q = 9$.

4.2 Equivalence of Matrices over F[x]

The equivalence of matrices over Z is covered in Chapter 1. By Theorem 1.11 and Corollary 1.20, every s × t matrix A over Z is equivalent to a unique s × t matrix S(A) in Smith normal form (Definition 1.6). Also the reduction of A to S(A) can be carried out by a finite sequence of elementary operations over Z, this process being a generalisation of the Euclidean algorithm. Here we show that this method applies, almost unchanged, to matrices with entries from the ring F[x] of polynomials in x over a given field F. The analogy rests on the fact that polynomials over F have the division property (Theorem 4.1) just as integers do.

Let $F$ be a field and let $M_{s\times t}(F[x])$ denote the set of all $s \times t$ matrices $A(x) = (a_{ij}(x))$ over $F[x]$. The $(i,j)$-entry in $A(x)$ is the polynomial $a_{ij}(x)$ over $F$ for $1 \le i \le s$, $1 \le j \le t$. We consider the effect of applying elementary row operations (eros) over $F[x]$ and elementary column operations (ecos) over $F[x]$ to the matrix $A(x)$, that is, operations of the following types:

(i) interchange of two rows or two columns;
(ii) multiplication of a row or column by a non-zero element of $F$;
(iii) addition of a multiple $f(x)$ of a row/column to a different row/column where $f(x) \in F[x]$.

The reader should compare the above with the familiar eros and ecos over Z of Section 1.1. Notice that (ii) says multiplication of a row or column by any non-zero scalar (invertible element of F[x]) is a permitted elementary operation over F[x].

As before $r_i \leftrightarrow r_{i'}$ denotes the interchange of row $i$ and row $i'$ ($i \ne i'$) and $c_j \leftrightarrow c_{j'}$ denotes the interchange of col $j$ and col $j'$ ($j \ne j'$). For $a \in F^*$ denote by $ar_i$ and $ac_j$ respectively the elementary operations over $F[x]$ of multiplying row $i$ and col $j$ by $a$. Also $r_i + f(x)r_{i'}$ denotes the addition to row $i$ of $f(x)$ times row $i'$ ($i \ne i'$), and $c_j + f(x)c_{j'}$ denotes adding $f(x)$ times col $j'$ to col $j$ ($j \ne j'$).


It should come as no surprise that all elementary operations over $F[x]$, regarded as mappings of the set $M_{s\times t}(F[x])$ of $s \times t$ matrices over $F[x]$, are invertible and their inverses are elementary operations over $F[x]$ of the same type. For instance the inverse of $ar_i$ is $(1/a)r_i$, the inverse of $c_j + f(x)c_{j'}$ is $c_j - f(x)c_{j'}$, and the inverse of $r_i \leftrightarrow r_{i'}$ is $r_i \leftrightarrow r_{i'}$ again.

A $t \times t$ matrix which results on applying a single ero over $F[x]$ to the $t \times t$ identity matrix $I$ is called an elementary matrix over $F[x]$. For example
$$\begin{pmatrix}0&1\\1&0\end{pmatrix},\quad \begin{pmatrix}1&0\\0&2\end{pmatrix},\quad \begin{pmatrix}1&0\\0&1/2\end{pmatrix},\quad \begin{pmatrix}1&-x\\0&1\end{pmatrix},\quad \begin{pmatrix}1&0\\2x+1/3&1\end{pmatrix}$$
are the elementary matrices over $\mathbb{Q}[x]$ which result on applying respectively the eros $r_1 \leftrightarrow r_2$, $2r_2$, $(1/2)r_2$, $r_1 - xr_2$, $r_2 + (2x+1/3)r_1$ to the $2 \times 2$ identity matrix $I$. Each elementary matrix over $F[x]$ can be obtained equally well by applying a single eco over $F[x]$ to $I$. The preceding five elementary matrices over $\mathbb{Q}[x]$ arise from $I$ by applying the ecos $c_1 \leftrightarrow c_2$, $2c_2$, $(1/2)c_2$, $c_2 - xc_1$, $c_1 + (2x+1/3)c_2$ respectively. Elementary matrices over $F[x]$ are invertible over $F[x]$ and their inverses are again elementary matrices over $F[x]$.

An ero and an eco over $F[x]$ which produce equal matrices on being applied to $I$ are said to be paired. Therefore $r_i \leftrightarrow r_j$, $c_i \leftrightarrow c_j$ are paired, $ar_i$, $ac_i$ are paired, and (watch those suffices!) $r_i + f(x)r_j$, $c_j + f(x)c_i$ are paired where $1 \le i \le t$, $1 \le j \le t$, $i \ne j$, and $a \in F^*$. Conversely each elementary matrix over $F[x]$ corresponds to (arises from) a paired ero and eco over $F[x]$.

As in Chapter 1 an ero and an eco over $F[x]$ which produce inverse matrices on being applied to $I$ are said to be conjugate. Thus $r_i \leftrightarrow r_j$, $c_i \leftrightarrow c_j$ are conjugate, $ar_i$, $a^{-1}c_i$ are conjugate, and (watch the suffices and signs) $r_i + f(x)r_j$, $c_j - f(x)c_i$ are conjugate where $1 \le i \le t$, $1 \le j \le t$, $i \ne j$, and $a \in F^*$.
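The pairing and conjugacy rules can be checked mechanically. The Python sketch below is an illustration under our own conventions and is not taken from the text. A polynomial over $\mathbb{Q}$ is a list of coefficients, lowest degree first, with `[]` as the zero polynomial. Rows and columns are numbered from 0, so `ero_add(I, 1, 0, f)` means $r_2 + f(x)r_1$. All helper names are hypothetical.

```python
from fractions import Fraction as Fr
from functools import reduce


def trim(p):
    """Drop zero leading coefficients; polynomials are lists, lowest degree first."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def padd(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])


def pmul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def neg(p):
    return [-c for c in p]


def identity(t):
    return [[[1] if i == j else [] for j in range(t)] for i in range(t)]


def matmul(A, B):
    return [[reduce(padd, (pmul(A[i][k], B[k][j]) for k in range(len(B))), [])
             for j in range(len(B[0]))] for i in range(len(A))]


def ero_add(M, i, j, f):
    """r_i + f(x) r_j, returning a new matrix."""
    M = [row[:] for row in M]
    M[i] = [padd(a, pmul(f, b)) for a, b in zip(M[i], M[j])]
    return M


def eco_add(M, j, i, f):
    """c_j + f(x) c_i, returning a new matrix."""
    M = [row[:] for row in M]
    for row in M:
        row[j] = padd(row[j], pmul(f, row[i]))
    return M


I = identity(2)
f = [Fr(1, 3), Fr(2)]                      # f(x) = 2x + 1/3
E = ero_add(I, 1, 0, f)                    # r_2 + f r_1 applied to I
paired = eco_add(I, 0, 1, f)               # c_1 + f c_2 applied to I
conjugate = eco_add(I, 0, 1, neg(f))       # c_1 - f c_2 applied to I
print(E == paired, matmul(E, conjugate) == I)   # True True

# Lemma 4.10 in miniature: premultiplying by E performs r_2 + f r_1
A = [[[0, 1], [1]], [[2], [0, 0, 1]]]
print(matmul(E, A) == ero_add(A, 1, 0, f))      # True
```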

As the analogous details were spelt out in Chapter 1, the reader will soon become aware of a marked 'do-it-yourself' attitude to the proofs here. The analogy between the rings Z and F[x] is close: they are examples of Euclidean rings, that is, PIDs in which gcds can be calculated by a generalised Euclidean algorithm. Further, the theory of equivalence of matrices over F[x] is almost identical to the theory covered in Chapter 1. Thus the following principle holds:

pre/postmultiplication of a matrix A(x) over F[x] by an elementary matrix over F[x] carries out the corresponding ero/eco over F[x].

Our next lemma is the polynomial analogue of Lemma 1.4.

Lemma 4.10

Let $A(x)$ be an $s \times t$ matrix over $F[x]$ where $F$ is a field. Let $P_1(x)$ be an elementary $s \times s$ matrix over $F[x]$ and let $Q_1(x)$ be an elementary $t \times t$ matrix over $F[x]$. The result of applying to $A(x)$ the ero corresponding to $P_1(x)$ is $P_1(x)A(x)$. Applying to $A(x)$ the eco corresponding to $Q_1(x)$ produces $A(x)Q_1(x)$.

We leave the reader to construct a proof of Lemma 4.10 by referring back to Lemma 1.4 and Exercises 1.2, Questions 1(c) and (d) (see Exercises 4.2, Question 6(a)). The determinants of elementary matrices corresponding to type (i) and type (iii) elementary operations over $F[x]$ are $-1$ and $+1$ respectively. Also elementary matrices over $F[x]$ corresponding to the type (ii) operations $ar_i$ and $ac_j$ have determinant $a \in F^*$.

The polynomial analogue of Definition 1.5 is:

Definition 4.11

Let $F$ be a field. The $s \times t$ matrices $A(x)$ and $B(x)$ over $F[x]$ are called equivalent, and we write $A(x) \equiv B(x)$, if there is an $s \times s$ matrix $P(x)$ and a $t \times t$ matrix $Q(x)$, both invertible over $F[x]$, satisfying
$$P(x)A(x)Q(x)^{-1} = B(x).$$

As the name and notation suggest, $\equiv$ is an equivalence relation on the set $M_{s\times t}(F[x])$ of all $s \times t$ matrices over $F[x]$. Notice that two $1 \times 1$ matrices over $F[x]$ are equivalent if and only if their entries are associate (Exercises 3.1, Question 7a(i)). Is there a method of determining whether two given $s \times t$ matrices $A(x)$ and $B(x)$ over $F[x]$ are equivalent or not? The answer is: Yes! We discuss the details next. It boils down to finding the 'simplest' matrix $S(A(x))$ in the equivalence class of $A(x)$ and the simplest matrix $S(B(x))$ in the equivalence class of $B(x)$. Then
$$A(x) \equiv B(x) \Leftrightarrow S(A(x)) = S(B(x)).$$

The reader should not be surprised to learn that

S(A(x)) is called the Smith normal form of A(x).

The polynomial analogue of Definition 1.6 is:

Definition 4.12

Let $D(x)$ be an $s \times t$ matrix over $F[x]$ such that
(i) all $(i,j)$-entries in $D(x)$ are zero for $i \ne j$, that is, $D(x)$ is a diagonal matrix,
(ii) each $(i,i)$-entry $d_i(x)$ in $D(x)$ is either monic or zero,
(iii) $d_i(x) \mid d_{i+1}(x)$ for $1 \le i < \min\{s,t\}$.
Then $D(x) = \operatorname{diag}(d_1(x), d_2(x), \ldots, d_{\min\{s,t\}}(x))$ is said to be in Smith normal form.


It turns out (see Theorem 4.16 and Corollary 4.19) that the equivalence class of $A(x)$ contains a unique matrix $D(x)$ as in Definition 4.12. So it makes sense to write $D(x) = S(A(x))$ and to refer to $d_1(x), d_2(x), \ldots, d_{\min\{s,t\}}(x)$ as the invariant factors of $A(x)$.

It follows from Theorem 4.16 that $A(x)$ can be changed into (reduced to) $S(A(x))$ by means of a finite sequence of elementary operations over $F[x]$. Write
$$P(x) = P_u(x)P_{u-1}(x)\cdots P_2(x)P_1(x)$$
where $P_k(x)$ is the elementary matrix over $F[x]$ corresponding to the $k$th ero used in the reduction. Write
$$Q(x) = Q_v(x)^{-1}Q_{v-1}(x)^{-1}\cdots Q_2(x)^{-1}Q_1(x)^{-1}$$
where $Q_l(x)$ is the elementary matrix corresponding to the $l$th eco used in the reduction, and so $Q(x)^{-1} = Q_1(x)Q_2(x)\cdots Q_{v-1}(x)Q_v(x)$. Both $P(x)$ and $Q(x)$ are invertible over $F[x]$, being products of elementary matrices over $F[x]$. Applying Lemma 4.10 $u+v$ times we obtain the polynomial analogue of Corollary 1.13, namely
$$P_u(x)\cdots P_2(x)P_1(x)A(x)Q_1(x)Q_2(x)\cdots Q_v(x) = S(A(x)),$$
that is,
$$P(x)A(x)Q(x)^{-1} = S(A(x))$$
as in Definition 4.11. So the above equation shows $A(x) \equiv S(A(x))$. The equation $P(x) = P_u(x)P_{u-1}(x)\cdots P_2(x)P_1(x)I$, together with $u$ applications of Lemma 4.10, tells us that $P(x)$ can be calculated by applying in sequence the $u$ eros used in the reduction to the $s \times s$ identity matrix $I$ over $F[x]$. Also the equation $Q(x) = Q_v(x)^{-1}Q_{v-1}(x)^{-1}\cdots Q_2(x)^{-1}Q_1(x)^{-1}I$ and $v$ applications of Lemma 4.10 tell us that $Q(x)$ can be calculated by applying in sequence the conjugates of the $v$ ecos used in the reduction to the $t \times t$ identity matrix $I$ over $F[x]$. So it is 'business as usual', with polynomials over $F$ as the matrix entries in place of integers.

Following closely the theory in Section 1.2, we detail the reduction of 1 × 2 matrices over F[x].

Lemma 4.13

Let $A(x) = (r_1(x), r_2(x))$ be a $1 \times 2$ matrix over $F[x]$ where $F$ is a field.
(i) There is a sequence of ecos over $F[x]$ which reduces $A(x)$ to $S(A(x)) = (d(x), 0)$ where $d(x) = \gcd\{r_1(x), r_2(x)\}$.
(ii) There is a corresponding invertible matrix
$$Q(x) = \begin{pmatrix} q_{11}(x) & q_{12}(x)\\ q_{21}(x) & q_{22}(x)\end{pmatrix}$$
over $F[x]$ satisfying $A(x) = S(A(x))Q(x)$. Also $q_{22}(x)r_1(x) - q_{21}(x)r_2(x) = |Q(x)|d(x)$.

Proof

(i) The $1 \times 2$ matrix $(d(x), 0)$ is in Smith normal form (Definition 4.12) as $d(x)$ is either monic or zero. Suppose $r_1(x) = r_2(x) = 0$; in this case no ecos over $F[x]$ are needed as $A(x) = (0,0) = S(A(x))$. Otherwise we may assume $r_1(x) \ne 0$, applying $c_1 \leftrightarrow c_2$ if necessary. Suppose $r_1(x)$ and $d(x)$ are associate; in this case the ecos $c_2 - (r_2(x)/r_1(x))c_1$, $(d(x)/r_1(x))c_1$ change $A(x) = (r_1(x), r_2(x))$ into $(d(x), 0)$.

Suppose now that $r_1(x) \ne 0$ and that $r_1(x)$ and $d(x)$ are not associate. Then $r_2(x) \ne 0$ also. We apply the Euclidean algorithm to $(r_1(x), r_2(x))$ as set out following Corollary 4.6. There are polynomials $r_3(x), r_4(x), \ldots, r_k(x)$, $r_{k+1}(x) = 0$ over $F$ such that $r_{i-1}(x) = q_i(x)r_i(x) + r_{i+1}(x)$ where $\deg r_{i+1}(x) < \deg r_i(x)$ and $q_i(x) \in F[x]$ for $2 \le i \le k$. Then $r_k(x)$ and $d(x)$ are associate, that is, $d(x)$ is the monic polynomial associate to $r_k(x)$. Write $a = r_k(x)/d(x)$ for the leading coefficient of $r_k(x)$.

Applying to $A(x)$ the sequence of $k-1$ ecos
$$c_1 - q_2(x)c_2,\ c_2 - q_3(x)c_1,\ c_1 - q_4(x)c_2,\ c_2 - q_5(x)c_1,\ \ldots$$
terminating in either $c_2 - q_k(x)c_1$ ($k$ odd) or $c_1 - q_k(x)c_2$ ($k$ even) produces
$$A(x) = (r_1(x), r_2(x)) \equiv (r_3(x), r_2(x)) \equiv (r_3(x), r_4(x)) \equiv (r_5(x), r_4(x)) \equiv (r_5(x), r_6(x)) \equiv \cdots$$
terminating in either $(r_k(x), 0)$ or $(0, r_k(x))$. In the first case ($k$ odd) the eco $a^{-1}c_1$ finishes the reduction by changing $(r_k(x), 0)$ into $(d(x), 0)$. In the second case ($k$ even) the ecos $c_1 \leftrightarrow c_2$ and $a^{-1}c_1$ change $(0, r_k(x))$ into $(d(x), 0)$.

So $A(x)$ can be reduced to $S(A(x))$ by a finite sequence of $v$ ecos over $F[x]$.
(ii) In the case $A(x) = S(A(x))$ we take $Q(x) = I$, the $2 \times 2$ identity matrix over $F[x]$. Otherwise let $Q_l(x)$ be the elementary matrix corresponding to the $l$th eco used in the reduction of $A(x)$ to $S(A(x))$ for $1 \le l \le v$. By the foregoing theory $Q(x) = Q_v(x)^{-1}Q_{v-1}(x)^{-1}\cdots Q_2(x)^{-1}Q_1(x)^{-1}$, and also $\det Q(x) = \pm a$. Comparing entries in $A(x) = S(A(x))Q(x)$ gives $r_1(x) = d(x)q_{11}(x)$ and $r_2(x) = d(x)q_{12}(x)$. Multiplying $q_{11}(x)q_{22}(x) - q_{12}(x)q_{21}(x) = |Q(x)| = \pm a$ by $d(x)$ gives $q_{22}(x)r_1(x) - q_{21}(x)r_2(x) = |Q(x)|d(x)$. □

Note that the matrices $T_i = \begin{pmatrix} q_i(x) & 1\\ 1 & 0\end{pmatrix}$ used in the description of the Euclidean algorithm following Corollary 4.6 are not elementary. In fact
$$T_i = \begin{pmatrix}0&1\\1&0\end{pmatrix}\begin{pmatrix}1&0\\q_i(x)&1\end{pmatrix}$$
shows that $T_i$ is the product of two elementary matrices over $F[x]$. The reader can check that at most $k+1$ ecos are needed in the reduction of the non-zero matrix $A(x) = (r_1(x), r_2(x))$ to $S(A(x))$ where $k \le \max\{\deg r_1(x), \deg r_2(x)\} + 2$. Lemmas 4.10 and 4.13 yield a method of constructing $Q(x)$ step by step, one step for each eco used in the reduction. We will see in Section 6.1 that $Q(x)$ is key to the theory of similarity.

As an example of Lemma 4.13 consider $A(x) = (r_1(x), r_2(x))$ where $r_1(x) = x^4 + 3x^2 - 2x + 2$ and $r_2(x) = x^3 + x^2 + 3x$ over $\mathbb{Q}[x]$. We have already applied the Euclidean algorithm to these polynomials (before Definition 4.7): in this case $k = 5$ and $q_2(x) = x - 1$, $q_3(x) = x$, $q_4(x) = x + 1$, $q_5(x) = (1/2)x$ and $r_5(x) = 2$. By Lemma 4.13 the sequence of ecos $c_1 - (x-1)c_2$, $c_2 - xc_1$, $c_1 - (x+1)c_2$, $c_2 - (x/2)c_1$, $(1/2)c_1$ reduces $A(x)$ to $S(A(x)) = (1, 0)$. Applying in sequence the conjugates of these ecos to $I$ produces

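$$\begin{pmatrix}1&0\\0&1\end{pmatrix}
\underset{r_2+(x-1)r_1}{\equiv}
\begin{pmatrix}1&0\\x-1&1\end{pmatrix}
\underset{r_1+xr_2}{\equiv}
\begin{pmatrix}x^2-x+1&x\\x-1&1\end{pmatrix}
\underset{r_2+(x+1)r_1}{\equiv}
\begin{pmatrix}x^2-x+1&x\\x^3+x&x^2+x+1\end{pmatrix}$$
$$\underset{r_1+(x/2)r_2}{\equiv}
\begin{pmatrix}(x^4+3x^2-2x+2)/2&(x^3+x^2+3x)/2\\x^3+x&x^2+x+1\end{pmatrix}
\underset{2r_1}{\equiv}
\begin{pmatrix}x^4+3x^2-2x+2&x^3+x^2+3x\\x^3+x&x^2+x+1\end{pmatrix} = Q(x)$$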

which is invertible over $\mathbb{Q}[x]$ and satisfies $A(x)Q(x)^{-1} = S(A(x))$. The reader should check $\det Q(x) = 2$, which gives $((x^2+x+1)/2)r_1(x) - ((x^3+x)/2)r_2(x) = 1 = d(x)$. That is, the reduction produces polynomials $a_1(x) = (x^2+x+1)/2$ and $a_2(x) = -(x^3+x)/2$ satisfying $a_1(x)r_1(x) + a_2(x)r_2(x) = \gcd\{r_1(x), r_2(x)\}$ as in Corollary 4.6.
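The reduction in Lemma 4.13 can be sketched in Python as follows. This is our own illustration, not the author's code. Polynomials over $\mathbb{Q}$ are lists of `Fraction` coefficients, lowest degree first. At each step the eco subtracts a multiple of the column of smaller degree from the other column. When $\deg r_1(x) \ge \deg r_2(x)$ this reproduces the alternating sequence used in the proof. Each eco applied to the row is mirrored by its conjugate ero applied to $Q(x)$. As a result, the current row times the current $Q(x)$ always equals the original $A(x)$. The function name `reduce_1x2` is hypothetical.

```python
from fractions import Fraction as Fr


def trim(p):
    """Drop zero leading coefficients; polynomials are lists, lowest degree first."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def padd(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])


def pscale(c, p):
    return trim([c * a for a in p])


def pmul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def pdivmod(a, b):
    """Division property: a = quot*b + rem with deg rem < deg b (b non-zero)."""
    quot = [Fr(0)] * max(len(a) - len(b) + 1, 1)
    rem = trim([Fr(c) for c in a])
    while len(rem) >= len(b):
        k = len(rem) - len(b)
        c = rem[-1] / b[-1]
        quot[k] = c
        rem = trim([rem[i] - c * b[i - k] if i >= k else rem[i] for i in range(len(rem))])
    return trim(quot), rem


def reduce_1x2(r1, r2):
    """Reduce (r1, r2) to S = (d, 0) by ecos; return S and Q with (r1, r2) = S Q."""
    A = [trim([Fr(c) for c in r1]), trim([Fr(c) for c in r2])]
    Q = [[[Fr(1)], []], [[], [Fr(1)]]]           # 2 x 2 identity over Q[x]

    def eco_add(j, i, f):
        # eco c_j + f c_i on A; its conjugate ero r_i - f r_j on Q keeps A*Q unchanged
        A[j] = padd(A[j], pmul(f, A[i]))
        Q[i] = [padd(a, pscale(-1, pmul(f, b))) for a, b in zip(Q[i], Q[j])]

    while A[0] and A[1]:                         # one Euclidean step per eco
        big, small = (0, 1) if len(A[0]) >= len(A[1]) else (1, 0)
        quot, _ = pdivmod(A[big], A[small])
        eco_add(big, small, pscale(-1, quot))    # c_big - quot * c_small
    if not A[0] and A[1]:                        # eco c_1 <-> c_2, conjugate r_1 <-> r_2
        A.reverse()
        Q.reverse()
    if A[0]:                                     # eco a^{-1} c_1, conjugate a r_1
        a = A[0][-1]
        A[0] = pscale(1 / a, A[0])
        Q[0] = [pscale(a, e) for e in Q[0]]
    return A, Q


r1 = [2, -2, 3, 0, 1]     # x^4 + 3x^2 - 2x + 2
r2 = [0, 3, 1, 1]         # x^3 + x^2 + 3x
S, Q = reduce_1x2(r1, r2)
det_Q = padd(pmul(Q[0][0], Q[1][1]), pscale(-1, pmul(Q[0][1], Q[1][0])))
print(S == [[1], []], det_Q)    # True [Fraction(2, 1)]
```

For this example the sketch applies exactly the five ecos listed above. It therefore returns the same $Q(x)$, with $\det Q(x) = 2$.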

In fact the reduction of every $1 \times 2$ matrix $A(x)$ over $F[x]$ to its Smith normal form $S(A(x))$ can be carried out using only ecos of type (iii) over $F[x]$. So the matrix $Q(x)$ of Lemma 4.13(ii) can be chosen to have determinant 1, establishing the polynomial analogue of Lemma 1.8 (Exercises 4.2, Question 3(e)).

Suppose now that $A(x)$ is a $2 \times 1$ matrix over $F[x]$. The matrix transpose of Lemma 4.13 tells us that there is a finite sequence of eros over $F[x]$ which reduces $A(x)$ to its Smith normal form $S(A(x))$. There is also a corresponding matrix $P(x)$, invertible over $F[x]$, satisfying $P(x)A(x) = S(A(x))$. Comparing entries gives $a_1(x)r_1(x) + a_2(x)r_2(x) = \gcd\{r_1(x), r_2(x)\}$ where $(a_1(x), a_2(x))$ is row 1 of $P(x)$.

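As an example consider $A(x) = \begin{pmatrix} r_1(x)\\ r_2(x)\end{pmatrix}$ where $r_1(x) = x^3 + 3x^2 + 4x + 2$ and $r_2(x) = x^3 + 2x^2 + 3x + 3$ over the field $\mathbb{Z}_5$. Applying the Euclidean algorithm to $r_1(x)$ and $r_2(x)$ (see before Definition 4.7) gives $k = 4$, $q_2(x) = 1$, $q_3(x) = x + 1$, $q_4(x) = 2x + 1$ and $r_4(x) = 3x + 4$. So $a = 3$ and $a^{-1} = 2$. The sequence of eros
$$r_1 - r_2,\ r_2 - (x+1)r_1,\ r_1 - (2x+1)r_2,\ r_1 \leftrightarrow r_2,\ 2r_1$$
changes $A(x)$ into $S(A(x)) = \begin{pmatrix} x+3\\ 0\end{pmatrix}$. Applying this (unchanged) sequence of eros to the $2 \times 2$ identity matrix $I$ over $\mathbb{Z}_5[x]$ produces
$$\begin{pmatrix}1&0\\0&1\end{pmatrix}
\underset{r_1-r_2}{\equiv}
\begin{pmatrix}1&4\\0&1\end{pmatrix}
\underset{r_2-(x+1)r_1}{\equiv}
\begin{pmatrix}1&4\\4x+4&x+2\end{pmatrix}
\underset{r_1-(2x+1)r_2}{\equiv}
\begin{pmatrix}2x^2+3x+2&3x^2+2\\4x+4&x+2\end{pmatrix}$$
$$\underset{r_1\leftrightarrow r_2}{\equiv}
\begin{pmatrix}4x+4&x+2\\2x^2+3x+2&3x^2+2\end{pmatrix}
\underset{2r_1}{\equiv}
\begin{pmatrix}3x+3&2x+4\\2x^2+3x+2&3x^2+2\end{pmatrix} = P(x)$$
satisfying $P(x)A(x) = \begin{pmatrix} x+3\\ 0\end{pmatrix} = S(A(x))$.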

We now tackle the reduction to Smith normal form of a general matrix over F [x].

Lemma 4.14

Let $A(x)$ be an $s \times t$ matrix over $F[x]$ where $F$ is a field. Then $A(x)$ can be changed into $B(x) = (b_{ij}(x))$ using elementary operations over $F[x]$, where $b_{1j}(x) = b_{i1}(x) = 0$ for $1 < i \le s$, $1 < j \le t$ and $b_{11}(x)$ is either monic or zero.

As with Lemma 4.10 we leave the diligent reader to construct a proof of Lemma 4.14 based on its integer analogue Lemma 1.9, using Lemma 4.13(i) in place of Lemma 1.7 (see Exercises 4.2, Question 6(b)). Using induction on min{s, t} and Lemma 4.14, a general s × t matrix A(x) over F[x] can be reduced to a diagonal matrix over F[x]. How then can a diagonal matrix over F[x] be reduced to Smith normal form? The answer is provided by the polynomial analogue of Lemma 1.10. For once we give the proof in detail, if only to show how close it is to the integer version.

Lemma 4.15

Let $F$ be a field. The diagonal matrix
$$A(x) = \begin{pmatrix} f_1(x) & 0\\ 0 & f_2(x)\end{pmatrix}$$
over $F[x]$, where $f_i(x)$ is either monic or zero ($i = 1, 2$), can be reduced to Smith normal form by at most five elementary operations of type (iii) over $F[x]$.


Proof

We mimic the proof of Lemma 1.10 step by step. There is nothing to do in the case $f_1(x) \mid f_2(x)$ as $A(x)$ is already in Smith normal form. Otherwise let $d(x) = \gcd\{f_1(x), f_2(x)\}$. By Corollary 4.6 there are polynomials $a_1(x)$ and $a_2(x)$ over $F$ with $a_1(x)f_1(x) + a_2(x)f_2(x) = d(x)$. Then the sequence of eros and ecos over $F[x]$
$$c_2 + a_1(x)c_1,\ r_1 + a_2(x)r_2,\ c_1 - (f_1(x)/d(x) - 1)c_2,\ c_2 - c_1,\ r_2 + (f_2(x)/d(x))(f_1(x)/d(x) - 1)r_1$$
changes $A(x)$ into
$$\begin{pmatrix} d(x) & 0\\ 0 & f_1(x)f_2(x)/d(x)\end{pmatrix}$$
which is in Smith normal form. □
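The five operations in this proof translate directly into code. This Python sketch is our own, and the names `ext_gcd`, `divides` and `reduce_diag` are hypothetical. It computes $a_1(x)$ and $a_2(x)$ with the extended Euclidean algorithm over $\mathbb{Q}$ and then applies the five operations in the order given. Polynomials are lists of coefficients, lowest degree first, and rows and columns are numbered from 0.

```python
from fractions import Fraction as Fr


def trim(p):
    """Drop zero leading coefficients; polynomials are lists, lowest degree first."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def padd(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])


def pscale(c, p):
    return trim([c * a for a in p])


def psub(p, q):
    return padd(p, pscale(-1, q))


def pmul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def pdivmod(a, b):
    """Division property: a = quot*b + rem with deg rem < deg b (b non-zero)."""
    quot = [Fr(0)] * max(len(a) - len(b) + 1, 1)
    rem = trim([Fr(c) for c in a])
    while len(rem) >= len(b):
        k = len(rem) - len(b)
        c = rem[-1] / b[-1]
        quot[k] = c
        rem = trim([rem[i] - c * b[i - k] if i >= k else rem[i] for i in range(len(rem))])
    return trim(quot), rem


def ext_gcd(a, b):
    """Return (d, s, t) with s*a + t*b = d, where d is the monic gcd (d = [] if a = b = 0)."""
    r0, r1 = trim([Fr(c) for c in a]), trim([Fr(c) for c in b])
    s0, s1, t0, t1 = [Fr(1)], [], [], [Fr(1)]
    while r1:
        quot, rem = pdivmod(r0, r1)
        r0, r1 = r1, rem
        s0, s1 = s1, psub(s0, pmul(quot, s1))
        t0, t1 = t1, psub(t0, pmul(quot, t1))
    if not r0:
        return [], s0, t0
    lc = r0[-1]
    return pscale(1 / lc, r0), pscale(1 / lc, s0), pscale(1 / lc, t0)


def divides(f, g):
    """True if f | g in F[x]; the zero polynomial divides only 0."""
    return not g if not f else not pdivmod(g, f)[1]


def reduce_diag(f1, f2):
    """Lemma 4.15: reduce diag(f1, f2), each monic or zero, to Smith normal form."""
    f1, f2 = trim([Fr(c) for c in f1]), trim([Fr(c) for c in f2])
    M = [[f1, []], [[], f2]]
    if divides(f1, f2):
        return M                                    # already in Smith normal form

    def row_add(i, j, f):                           # r_i + f r_j
        M[i] = [padd(a, pmul(f, b)) for a, b in zip(M[i], M[j])]

    def col_add(j, i, f):                           # c_j + f c_i
        for row in M:
            row[j] = padd(row[j], pmul(f, row[i]))

    d, a1, a2 = ext_gcd(f1, f2)                     # a1 f1 + a2 f2 = d
    e = psub(pdivmod(f1, d)[0], [1])                # f1/d - 1
    col_add(1, 0, a1)                               # c_2 + a1 c_1
    row_add(0, 1, a2)                               # r_1 + a2 r_2
    col_add(0, 1, pscale(-1, e))                    # c_1 - (f1/d - 1) c_2
    col_add(1, 0, [-1])                             # c_2 - c_1
    row_add(1, 0, pmul(pdivmod(f2, d)[0], e))       # r_2 + (f2/d)(f1/d - 1) r_1
    return M


print(reduce_diag([-1, 0, 1], [1, 2, 1]) == [[[1, 1], []], [[], [-1, -1, 1, 1]]])  # True
```

For $f_1(x) = x^2 - 1$ and $f_2(x) = (x+1)^2$ the result is $\operatorname{diag}(x+1, x^3 + x^2 - x - 1)$. Its diagonal entries are the gcd and the lcm of $f_1(x)$ and $f_2(x)$. The sketch also handles $f_1(x) = 0$, returning $\operatorname{diag}(f_2(x), 0)$.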

The polynomial $f_1(x)f_2(x)/d(x)$ is the least common multiple of the monic polynomials $f_1(x)$ and $f_2(x)$ over $F$, where $d(x) = \gcd\{f_1(x), f_2(x)\}$, and we write
$$\operatorname{lcm}\{f_1(x), f_2(x)\} = f_1(x)f_2(x)/d(x)$$
(see Exercises 4.1, Question 1(d)). Note that $\operatorname{lcm}\{f_1(x), 0\} = 0$ for all $f_1(x) \in F[x]$. It follows from Lemma 4.15 that
$$S(\operatorname{diag}(f_1(x), f_2(x))) = \operatorname{diag}(\gcd\{f_1(x), f_2(x)\}, \operatorname{lcm}\{f_1(x), f_2(x)\})$$
for all polynomials $f_1(x)$ and $f_2(x)$ over $F$.

From Theorem 4.4 all ideals of $F[x]$ are principal. The set equalities
$$\langle f_1(x)\rangle + \langle f_2(x)\rangle = \langle \gcd\{f_1(x), f_2(x)\}\rangle, \qquad \langle f_1(x)\rangle \cap \langle f_2(x)\rangle = \langle \operatorname{lcm}\{f_1(x), f_2(x)\}\rangle$$
encapsulate the divisor properties of gcd and lcm.
Lemmas 4.10, 4.14 and 4.15 can be used to establish the polynomial analogues of Theorem 1.11 and Corollary 1.13 (see Exercises 4.2, Question 6(c)):

Theorem 4.16 (The existence of the Smith normal form over F[x])

Let $A(x)$ be an $s \times t$ matrix over $F[x]$ where $F$ is a field. There is a sequence of elementary operations over $F[x]$ which changes $A(x)$ into $S(A(x))$ in Smith normal form. There are invertible matrices $P(x)$ and $Q(x)$ over $F[x]$ satisfying
$$P(x)A(x)Q(x)^{-1} = S(A(x)).$$


As before we leave the conscientious reader to construct a proof of Theorem 4.16 based on that of Theorem 1.11 (see Exercises 4.2, Question 6(c)). As explained following Definition 4.12, suitable matrices P(x) and Q(x) can be calculated from the eros and ecos used in the reduction of A(x) to Smith normal form.

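We illustrate the process by reducing the following $3 \times 3$ matrix $A(x)$ over $\mathbb{Q}[x]$:
$$A(x) = \begin{pmatrix} x^2 & x & x\\ x^3+2x^2 & x^2+x & x^2+2x\\ 2x^3+x+1 & x^2 & 2x^2\end{pmatrix}
\underset{\substack{c_1-xc_2\\ c_1\leftrightarrow c_2}}{\equiv}
\begin{pmatrix} x & 0 & x\\ x^2+x & x^2 & x^2+2x\\ x^2 & x^3+x+1 & 2x^2\end{pmatrix}$$
$$\underset{c_3-c_1}{\equiv}
\begin{pmatrix} x & 0 & 0\\ x^2+x & x^2 & x\\ x^2 & x^3+x+1 & x^2\end{pmatrix}
\underset{\substack{r_2-(x+1)r_1\\ r_3-xr_1}}{\equiv}
\begin{pmatrix} x & 0 & 0\\ 0 & x^2 & x\\ 0 & x^3+x+1 & x^2\end{pmatrix}$$
$$\underset{\substack{c_2-xc_3\\ c_2\leftrightarrow c_3}}{\equiv}
\begin{pmatrix} x & 0 & 0\\ 0 & x & 0\\ 0 & x^2 & x+1\end{pmatrix}
\underset{r_3-xr_2}{\equiv}
\begin{pmatrix} x & 0 & 0\\ 0 & x & 0\\ 0 & 0 & x+1\end{pmatrix} = \operatorname{diag}(x, x, x+1).$$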

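Applying the sequence $r_2 + r_3$, $c_3 - c_2$, $c_2 \leftrightarrow c_3$, $c_3 - xc_2$, $r_3 - (x+1)r_2$, $-r_3$ to $\operatorname{diag}(x, x, x+1)$ produces $\operatorname{diag}(x, 1, x(x+1))$. Finally applying $r_1 \leftrightarrow r_2$, $c_1 \leftrightarrow c_2$ to $\operatorname{diag}(x, 1, x(x+1))$ produces $S(A(x)) = \operatorname{diag}(1, x, x(x+1))$ in Smith normal form.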

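A matrix $P(x)$ as in Theorem 4.16 is now found by applying, in sequence, the eros used above to the $3 \times 3$ identity matrix $I$ over $\mathbb{Q}[x]$:
$$\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}
\underset{\substack{r_2-(x+1)r_1\\ r_3-xr_1}}{\equiv}
\begin{pmatrix}1&0&0\\-x-1&1&0\\-x&0&1\end{pmatrix}
\underset{\substack{r_3-xr_2\\ r_2+r_3}}{\equiv}
\begin{pmatrix}1&0&0\\x^2-x-1&-x+1&1\\x^2&-x&1\end{pmatrix}$$
$$\underset{r_3-(x+1)r_2}{\equiv}
\begin{pmatrix}1&0&0\\x^2-x-1&-x+1&1\\-x^3+x^2+2x+1&x^2-x-1&-x\end{pmatrix}
\underset{\substack{-r_3\\ r_1\leftrightarrow r_2}}{\equiv}
\begin{pmatrix}x^2-x-1&-x+1&1\\1&0&0\\x^3-x^2-2x-1&-x^2+x+1&x\end{pmatrix} = P(x).$$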

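A matrix $Q(x)$ as in Theorem 4.16 is now found by applying, in sequence, the conjugates of the ecos used above to the $3 \times 3$ identity matrix $I$ over $\mathbb{Q}[x]$:
$$I = \begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}
\underset{\substack{r_2+xr_1\\ r_1\leftrightarrow r_2}}{\equiv}
\begin{pmatrix}x&1&0\\1&0&0\\0&0&1\end{pmatrix}
\underset{\substack{r_1+r_3\\ r_3+xr_2}}{\equiv}
\begin{pmatrix}x&1&1\\1&0&0\\x&0&1\end{pmatrix}
\underset{\substack{r_2\leftrightarrow r_3\\ r_2+r_3}}{\equiv}
\begin{pmatrix}x&1&1\\x+1&0&1\\1&0&0\end{pmatrix}$$
$$\underset{\substack{r_2\leftrightarrow r_3\\ r_2+xr_3}}{\equiv}
\begin{pmatrix}x&1&1\\x^2+x+1&0&x\\x+1&0&1\end{pmatrix}
\underset{r_1\leftrightarrow r_2}{\equiv}
\begin{pmatrix}x^2+x+1&0&x\\x&1&1\\x+1&0&1\end{pmatrix} = Q(x).$$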

It is routine to check $P(x)A(x) = S(A(x))Q(x)$. In this case $\det P(x) = \det Q(x) = 1$. So $P(x)$ and $Q(x)$ are invertible over $\mathbb{Q}[x]$ and satisfy $P(x)A(x)Q(x)^{-1} = S(A(x))$ as in Theorem 4.16.
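The routine check can be handed to a few lines of Python. In this sketch, which is ours, each entry is an integer coefficient list, lowest degree first. Integer coefficients suffice because every entry of $A(x)$, $P(x)$, $Q(x)$ and $S(A(x))$ here has integer coefficients. The helper `det` uses Laplace expansion, which is adequate for $3 \times 3$ matrices.

```python
from functools import reduce


def trim(p):
    """Drop zero leading coefficients; polynomials are lists, lowest degree first."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def padd(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])


def pmul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def matmul(A, B):
    return [[reduce(padd, (pmul(A[i][k], B[k][j]) for k in range(len(B))), [])
             for j in range(len(B[0]))] for i in range(len(A))]


def det(M):
    """Laplace expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = []
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        term = pmul(M[0][j], det(minor))
        total = padd(total, term if j % 2 == 0 else [-c for c in term])
    return total


A = [[[0, 0, 1], [0, 1], [0, 1]],
     [[0, 0, 2, 1], [0, 1, 1], [0, 2, 1]],
     [[1, 1, 0, 2], [0, 0, 1], [0, 0, 2]]]
P = [[[-1, -1, 1], [1, -1], [1]],
     [[1], [], []],
     [[-1, -2, -1, 1], [1, 1, -1], [0, 1]]]
Q = [[[1, 1, 1], [], [0, 1]],
     [[0, 1], [1], [1]],
     [[1, 1], [], [1]]]
S = [[[1], [], []],
     [[], [0, 1], []],
     [[], [], [0, 1, 1]]]          # diag(1, x, x(x + 1))

print(matmul(P, A) == matmul(S, Q), det(P), det(Q))   # True [1] [1]
```

Because $\det P(x) = \det Q(x) = 1$, it also follows that $\det A(x) = \det S(A(x)) = x^3 + x^2$. The code can confirm this too.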

The polynomial analogue of Corollary 1.14 is:

Lemma 4.17

Let $P(x)$ be an invertible $s \times s$ matrix over $F[x]$ where $F$ is a field. Then $P(x)$ is expressible as a product of elementary matrices over $F[x]$. The invertible matrix $P(x)$ can be reduced to the $s \times s$ identity matrix $I = S(P(x))$ using only eros over $F[x]$. Also $P(x)$ reduces to the $s \times s$ identity matrix $I$ using only ecos over $F[x]$.

The proof of Lemma 4.17 is left as an exercise for the reader (Exercises 4.2, Question 6(d)). As an example consider the following 3 × 3 matrix P(x) over Q[x]. It is not 'obvious' at the outset that P(x) is invertible over Q[x], but this will become clear on reducing P(x) to its Smith normal form (which turns out to be I).

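$$P(x) = \begin{pmatrix}1&1&2x\\0&1&x\\x+1&-x-1&1\end{pmatrix}
\underset{\substack{c_2-c_1\\ c_3-2xc_1}}{\equiv}
\begin{pmatrix}1&0&0\\0&1&x\\x+1&-2x-2&-2x^2-2x+1\end{pmatrix}$$
$$\underset{r_3-(x+1)r_1}{\equiv}
\begin{pmatrix}1&0&0\\0&1&x\\0&-2x-2&-2x^2-2x+1\end{pmatrix}
\underset{\substack{c_3-xc_2\\ r_3+(2x+2)r_2}}{\equiv}
\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix} = I.$$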

Then $P_2(x)P_1(x)P(x)Q_1(x)Q_2(x)Q_3(x) = I$ where $P_i(x)$ and $Q_j(x)$ are the elementary matrices corresponding to the $i$th ero and $j$th eco used in the above reduction. So
$$P(x) = (P_2(x)P_1(x))^{-1}I(Q_1(x)Q_2(x)Q_3(x))^{-1} = P_1(x)^{-1}P_2(x)^{-1}Q_3(x)^{-1}Q_2(x)^{-1}Q_1(x)^{-1}$$


showing that $P(x)$ is the product of five elementary matrices over $\mathbb{Q}[x]$, since $P_i(x)^{-1}$ and $Q_j(x)^{-1}$ are elementary matrices over $\mathbb{Q}[x]$. The equation $(P_2(x)P_1(x))(P(x)Q_1(x)Q_2(x)Q_3(x)) = I$, bracketed as indicated, shows that $P_2(x)P_1(x)$ and $P(x)Q_1(x)Q_2(x)Q_3(x)$ are inverse matrices. By Lemma 2.18 we obtain $P(x)Q_1(x)Q_2(x)Q_3(x)P_2(x)P_1(x) = I$. Hence, by Lemma 4.10, the sequence $c_2 - c_1$, $c_3 - 2xc_1$, $c_3 - xc_2$, $c_2 + (2x+2)c_3$, $c_1 - (x+1)c_3$, consisting of ecos only, reduces $P(x)$ to $I$. Here the two eros in the above reduction are replaced by their paired ecos. In the same way $P_2(x)P_1(x)P(x)$ and $Q_1(x)Q_2(x)Q_3(x)$ are inverses of each other. So $Q_1(x)Q_2(x)Q_3(x)P_2(x)P_1(x)P(x) = I$, showing that the sequence of eros $r_3 - (x+1)r_1$, $r_3 + (2x+2)r_2$, $r_2 - xr_3$, $r_1 - 2xr_3$, $r_1 - r_2$ reduces $P(x)$ to $I$. Here the three ecos in the original reduction are replaced by their paired eros.
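All three reductions of this $P(x)$ to $I$ (mixed, ecos only and eros only) can be confirmed mechanically. In this Python sketch, which is ours, entries are integer coefficient lists, lowest degree first. Rows and columns are numbered from 0, so `col_add(M, 1, 0, [-1])` is $c_2 - c_1$ and `row_add(M, 2, 0, [-1, -1])` is $r_3 - (x+1)r_1$. The helper names are hypothetical.

```python
def trim(p):
    """Drop zero leading coefficients; polynomials are lists, lowest degree first."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def padd(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])


def pmul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def row_add(M, i, j, f):
    """r_i + f(x) r_j, returning a new matrix."""
    M = [row[:] for row in M]
    M[i] = [padd(a, pmul(f, b)) for a, b in zip(M[i], M[j])]
    return M


def col_add(M, j, i, f):
    """c_j + f(x) c_i, returning a new matrix."""
    M = [row[:] for row in M]
    for row in M:
        row[j] = padd(row[j], pmul(f, row[i]))
    return M


P = [[[1], [1], [0, 2]],
     [[], [1], [0, 1]],
     [[1, 1], [-1, -1], [1]]]
I3 = [[[1] if i == j else [] for j in range(3)] for i in range(3)]

# Mixed reduction: c2 - c1, c3 - 2x c1, r3 - (x+1) r1, c3 - x c2, r3 + (2x+2) r2
M = col_add(P, 1, 0, [-1])
M = col_add(M, 2, 0, [0, -2])
M = row_add(M, 2, 0, [-1, -1])
M = col_add(M, 2, 1, [0, -1])
M = row_add(M, 2, 1, [2, 2])
mixed_ok = M == I3

# Ecos only: c2 - c1, c3 - 2x c1, c3 - x c2, c2 + (2x+2) c3, c1 - (x+1) c3
M = P
for j, i, f in [(1, 0, [-1]), (2, 0, [0, -2]), (2, 1, [0, -1]), (1, 2, [2, 2]), (0, 2, [-1, -1])]:
    M = col_add(M, j, i, f)
ecos_ok = M == I3

# Eros only: r3 - (x+1) r1, r3 + (2x+2) r2, r2 - x r3, r1 - 2x r3, r1 - r2
M = P
for i, j, f in [(2, 0, [-1, -1]), (2, 1, [2, 2]), (1, 2, [0, -1]), (0, 2, [0, -2]), (0, 1, [-1])]:
    M = row_add(M, i, j, f)
eros_ok = M == I3

print(mixed_ok, ecos_ok, eros_ok)   # True True True
```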

Let $A(x)$ and $B(x)$ be matrices over $F[x]$ where $F$ is a field. Combining Lemma 4.10, Definition 4.11 and Lemma 4.17 we see
$$A(x) \equiv B(x) \Leftrightarrow A(x) \text{ can be obtained from } B(x) \text{ by a sequence of elementary operations over } F[x].$$

We next deal with the uniqueness of the Smith normal form in this context. The hard work has been done in Section 1.3, and so it is a matter of convincing the reader that no extra problems arise on replacing matrices over Z by matrices over F[x].

Let $A(x)$ be an $s \times t$ matrix over $F[x]$ where $F$ is a field and let $l$ be an integer with $1 \le l \le \min\{s,t\}$. Suppose $l$ rows and $l$ columns of $A(x)$ are selected. Recall that the determinant of the $l \times l$ matrix which remains on deleting the unselected $s-l$ rows and $t-l$ columns is called an $l$-minor of $A(x)$. The 1-minors of $A(x)$ are simply the $st$ entries in $A(x)$. The number of $l$-minors of $A(x)$ is $\binom{s}{l}\binom{t}{l}$ and each is a polynomial over $F$. As in Section 1.3 we introduce
$$g_l(A(x)) = \gcd\{\text{all } l\text{-minors of } A(x)\},$$
that is, $g_l(A(x))$ is the greatest common divisor of the set of $l$-minors of $A(x)$. Notice that, being a gcd, the polynomial $g_l(A(x))$ is either zero or monic ($g_l(A(x)) = 0 \Leftrightarrow$ all the $l$-minors of $A(x)$ are zero). It follows from Corollary 4.19 that $g_l(A(x))$ remains unchanged on applying eros and ecos to $A(x)$; so these polynomials are of great importance.

As an illustration consider
$$A(x) = \begin{pmatrix}1&1&1\\1&x&x\\1&x&x^2\end{pmatrix}$$
over $\mathbb{Q}[x]$. Here $g_1(A(x)) = 1$ and the 2-minors are: $x-1$, $x-1$, $0$, $x-1$, $x^2-1$, $x^2-x$, $0$, $x^2-x$, $x^3-x^2$. Hence $g_2(A(x)) = x - 1$. The reader can verify $g_3(A(x)) = \det A(x) = (x-1)^2x$. It is straightforward to find the matrix $D(x) = \operatorname{diag}(d_1(x), d_2(x), d_3(x))$ over $\mathbb{Q}[x]$ which is in Smith normal form and satisfies $g_l(D(x)) = g_l(A(x))$ for $l = 1, 2, 3$. In fact from the $l$-minors of $D(x)$ we deduce $g_1(D(x)) = d_1(x)$, $g_2(D(x)) = d_1(x)d_2(x)$ and $g_3(D(x)) = d_1(x)d_2(x)d_3(x)$, and so $D(x) = \operatorname{diag}(1, x-1, x(x-1))$. Anticipating Corollary 4.19 we conclude $D(x) = S(A(x))$, as $D(x)$ is the only matrix which is in Smith normal form and equivalent to $A(x)$.
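Computing $g_l(A(x))$ directly from the definition gives a slow but simple way to find invariant factors. Corollary 4.19 gives $g_l(A(x)) = g_l(S(A(x))) = d_1(x)\cdots d_l(x)$. So $d_1(x) = g_1(A(x))$, and $d_l(x) = g_l(A(x))/g_{l-1}(A(x))$ whenever $g_{l-1}(A(x)) \ne 0$, as in the example. The Python sketch below is our own illustration, not a method from the text. It enumerates all $\binom{s}{l}\binom{t}{l}$ minors, so it is only practical for small matrices. Polynomials are lists of coefficients, lowest degree first, and the helper names are hypothetical.

```python
from fractions import Fraction as Fr
from functools import reduce
from itertools import combinations


def trim(p):
    """Drop zero leading coefficients; polynomials are lists, lowest degree first."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def padd(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])


def pmul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def pdivmod(a, b):
    """Division property: a = quot*b + rem with deg rem < deg b (b non-zero)."""
    quot = [Fr(0)] * max(len(a) - len(b) + 1, 1)
    rem = trim([Fr(c) for c in a])
    while len(rem) >= len(b):
        k = len(rem) - len(b)
        c = rem[-1] / b[-1]
        quot[k] = c
        rem = trim([rem[i] - c * b[i - k] if i >= k else rem[i] for i in range(len(rem))])
    return trim(quot), rem


def det(M):
    """Laplace expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = []
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        term = pmul(M[0][j], det(minor))
        total = padd(total, term if j % 2 == 0 else [-c for c in term])
    return total


def pgcd(a, b):
    """Monic gcd by the Euclidean algorithm, with gcd{0, 0} = 0."""
    a, b = trim([Fr(c) for c in a]), trim([Fr(c) for c in b])
    while b:
        a, b = b, pdivmod(a, b)[1]
    return [c / a[-1] for c in a] if a else []


def minors(M, l):
    """Yield every l-minor of M (one per choice of l rows and l columns)."""
    for rows in combinations(range(len(M)), l):
        for cols in combinations(range(len(M[0])), l):
            yield det([[M[i][j] for j in cols] for i in rows])


def determinantal_divisors(M):
    """[g_1(M), ..., g_k(M)] with k = min(s, t)."""
    return [reduce(pgcd, minors(M, l), []) for l in range(1, min(len(M), len(M[0])) + 1)]


def invariant_factors(M):
    """d_1 = g_1 and d_l = g_l / g_{l-1}; once some g_l is zero, d_l is zero."""
    out, prev = [], [Fr(1)]
    for g in determinantal_divisors(M):
        out.append(pdivmod(g, prev)[0] if g else [])
        prev = g
    return out


A = [[[1], [1], [1]],
     [[1], [0, 1], [0, 1]],
     [[1], [0, 1], [0, 0, 1]]]
print(determinantal_divisors(A) == [[1], [-1, 1], [0, 1, -2, 1]])   # True
print(invariant_factors(A) == [[1], [-1, 1], [0, -1, 1]])           # True
```

By Corollary 4.19, two $s \times t$ matrices could be tested for equivalence by comparing their `determinantal_divisors` lists. The number of minors grows quickly, however, so reduction by elementary operations is the practical route.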

The polynomial analogue of Corollary 1.19 is:

Corollary 4.18

Let $B(x)$ be an $s \times r$ matrix over $F[x]$ and let $C(x)$ be an $r \times t$ matrix over $F[x]$ where $F$ is a field. Then $g_l(B(x))$ and $g_l(C(x))$ are divisors of $g_l(B(x)C(x))$ for $1 \le l \le \min\{r,s,t\}$.

Notice that Corollary 4.18 is a direct consequence of the Cauchy–Binet theorem over F[x] (see the discussion after Theorem 1.18). The next corollary is the polynomial analogue of Corollary 1.20, and it is just what we're looking for!

Corollary 4.19

Let $A(x)$ and $B(x)$ be $s \times t$ matrices over $F[x]$ where $F$ is a field. Then
$$A(x) \equiv B(x) \Leftrightarrow g_l(A(x)) = g_l(B(x)) \text{ for } 1 \le l \le \min\{s,t\}.$$
Further $A(x)$ is equivalent to a unique matrix $S(A(x))$ over $F[x]$ in Smith normal form.

The proof of Corollary 4.19 follows closely that of its integer analogue Corollary 1.20 (see Exercises 4.2, Question 6(e)).

Definition 4.20

Let $A(x)$ be an $s \times t$ matrix over $F[x]$ where $F$ is a field. The matrix $S(A(x)) = \operatorname{diag}(d_1(x), d_2(x), \ldots, d_{\min\{s,t\}}(x))$ is called the Smith normal form of $A(x)$. The polynomials $d_l(x)$ for $1 \le l \le \min\{s,t\}$ are called the invariant factors of $A(x)$.

We will see in Chapter 6 that all the invariant factors of matrices of the type $xI - A$ over $F[x]$, where $A$ is a $t \times t$ matrix over the field $F$, are non-zero. Our last theorem in this section is the polynomial analogue of Theorem 1.21.


Theorem 4.21

Let $A(x)$ be an $r \times s$ matrix over $F[x]$ and let $B(x)$ be an $s \times t$ matrix over $F[x]$ where $F$ is a field and $r \le s \le t$. Suppose all the invariant factors of $A(x)$, $B(x)$ and $A(x)B(x)$ are non-zero. Then the $k$th invariant factors of $A(x)$ and $B(x)$ are divisors of the $k$th invariant factor of $A(x)B(x)$ for $1 \le k \le r$.

As usual the reader should verify that 'nothing goes wrong' with the proof of Theorem 1.21 on replacing Z by F[x] (see Exercises 4.2, Question 6(f)).

The theory of similarity of $t \times t$ matrices $A$ over a field $F$, which is the main topic of Chapters 5 and 6, depends on the reduction of the $t \times t$ matrix $xI - A$ over $F[x]$ to its Smith normal form $S(xI - A)$. For scalar matrices, that is, matrices $A = \lambda I$ where $\lambda \in F$, no reduction is needed as
$$xI - A = (x - \lambda)I = \operatorname{diag}(x - \lambda, x - \lambda, \ldots, x - \lambda) = S(xI - A).$$
For non-scalar matrices $A$, the entries in $xI - A$ have gcd 1, which is the $(1,1)$-entry in $S(xI - A)$. The worked example in 'A bird's-eye view of similarity' illustrates this point.

EXERCISES 4.2

1. (a) List the eight elementary $2 \times 2$ matrices over $\mathbb{Z}_2[x]$ having entries of degree at most 1 (this list should include the identity matrix) and the corresponding eros. Find a formula in terms of the prime $p$ for the number of elementary $2 \times 2$ matrices over $\mathbb{Z}_p[x]$ having entries of degree at most 1.

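(b) Write down the eros over $\mathbb{Q}[x]$ which produce the elementary matrices
$$\begin{pmatrix}0&1\\1&0\end{pmatrix},\quad \begin{pmatrix}1&0\\1&1\end{pmatrix},\quad \begin{pmatrix}1&x^2\\0&1\end{pmatrix},\quad \begin{pmatrix}3&0\\0&1\end{pmatrix},\quad \begin{pmatrix}1&0\\0&1/3\end{pmatrix},\quad \begin{pmatrix}1&-x^2\\0&1\end{pmatrix}$$
over $\mathbb{Q}[x]$. Write down the six ecos over $\mathbb{Q}[x]$ which are paired to these eros. Write down the six ecos over $\mathbb{Q}[x]$ which are conjugate to these eros.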

(c) Let $F$ be a field. The ero $r_i \leftrightarrow r_j$ and the eco $c_i \leftrightarrow c_j$ over $F[x]$ are paired and conjugate. Which other paired (non-identity) eros and ecos over $F[x]$ are also conjugate?
Hint: Treat the cases $\chi(F) = 2$ and $\chi(F) \ne 2$ separately (see Exercises 2.3, Question 5(a)).

2. (a) Let $F$ be a field. Use Definition 4.11 to show that $\equiv$ is an equivalence relation on the set $M_{s\times t}(F[x])$ of all $s \times t$ matrices over $F[x]$. Which elements of $M_{s\times t}(F[x])$ belong to the equivalence class of the $s \times t$ zero matrix $0$ over $F[x]$? Which elements of $M_t(F[x])$ belong to the equivalence class of the $t \times t$ identity matrix $I$ over $F[x]$?

(b) Let $F$ be a field and let $t \ge 3$. List the sequences of invariant factors $(d_1(x), d_2(x), d_3(x))$ of $3 \times t$ matrices over $F[x]$ where $d_3(x) \mid x^3$. How many equivalence classes of such matrices are there?

(c) Let $A(x)$ be an $s \times t$ matrix over $F[x]$ where $F$ is a field with $\chi(F) \ne 2$. Let $A(-x)$ be the matrix obtained from $A(x)$ by replacing $x$ by $-x$ throughout. Write $S(A(x)) = \operatorname{diag}(d_1(x), d_2(x), \ldots, d_{\min\{s,t\}}(x))$. Show that $P(x)$ is an elementary matrix over $F[x]$ if and only if $P(-x)$ is an elementary matrix over $F[x]$, and deduce from Lemma 4.17 that $A(-x) \equiv \operatorname{diag}(d_1(-x), d_2(-x), \ldots, d_{\min\{s,t\}}(-x))$. Hence find the Smith normal form $S(A(-x))$ of $A(-x)$. State a necessary and sufficient condition on each $d_i(x)$ for $A(x)$ and $A(-x)$ to be equivalent.
Use Corollary 4.19 to find the Smith normal form of
$$A(x) = \begin{pmatrix} x^2+1 & x^2+x+1\\ x^2 & x^2+x\end{pmatrix}.$$
Is $A(x) \equiv A(-x)$? Find $S(B(x))$ where
$$B(x) = \begin{pmatrix} x^2-x-1 & x^2\\ x^2-x & x^2+1\end{pmatrix}.$$

Is $B(x) \equiv B(-x)$?
3. (a) Using Lemma 4.15 reduce
$$A(x) = \begin{pmatrix} x^2(x+1) & 0\\ 0 & (x+1)^2x\end{pmatrix}$$
over $F[x]$ to Smith normal form where $F$ is an arbitrary field. Find $2 \times 2$ matrices $P(x)$ and $Q(x)$, invertible over $F[x]$, satisfying $P(x)A(x) = S(A(x))Q(x)$. What are the invariant factors of $A(x)^2$?

(b) Reduce
$$\begin{pmatrix}0&x^2&0\\0&0&1\\x&0&0\end{pmatrix}$$
to its Smith normal form using four elementary operations.


(c) Use Lemmas 4.14 and 4.15 to find invertible $3 \times 3$ matrices $P(x)$ and $Q(x)$ over $\mathbb{Q}[x]$ satisfying $P(x)A(x)Q(x)^{-1} = S(A(x))$ where
$$A(x) = \begin{pmatrix} x^2 & x & x\\ x^3-2x^2 & x^2-x & x^2-2x\\ 2x^3+x-1 & x^2 & 2x^2\end{pmatrix}.$$

(d) Let $A = (a_{ij})$ be an invertible $t \times t$ matrix over the field $F$ and let $A(x) = (a_{ij}x^j)$. Find $S(A(x))$.
Hint: Express $A(x)$ as a product.

(e) Let $F$ be a field and let $a \in F^*$. Reduce $\begin{pmatrix}a&0\\0&a^{-1}\end{pmatrix}$ to $\begin{pmatrix}1&0\\0&1\end{pmatrix}$ using ecos over $F$ of type (iii).
Hint: Start with $c_2 + (a^{-1} - 1)c_1$, $c_1 + c_2$.
Hence show that every $1 \times 2$ matrix $A(x)$ over $F[x]$ can be reduced to its Smith normal form $S(A(x))$ by a sequence of ecos over $F[x]$ of type (iii). Deduce $A(x) = S(A(x))Q(x)$ where $Q(x)$ is over $F[x]$ and $\det Q(x) = 1$.
Hint: Use Exercises 1.1, Question 5.

4. (Polynomial version of Exercises 1.2, Question 6(b).) Let $F$ be a field. A sequence $a_0(x), a_1(x), a_2(x), \ldots$ of monic polynomials over $F$ is defined by:
$$a_0(x) = 1,\quad a_1(x) = x \quad\text{and}\quad a_n(x) = a_{n-1}(x)(x^2a_{n-2}(x) + 1) \text{ for } n \ge 2.$$
Write $K_n = \langle x^2a_n(x)\rangle$. Show $\gcd\{a_n(x), xa_{n-1}(x)\} = a_{n-1}(x)$ for $n \ge 2$. Show $a_n(x)/a_{n-r+1}(x) \equiv 1 \pmod{K_{n-r}}$ where $1 \le r \le n$.
Using the polynomial analogue of the proof of Lemma 1.9, that is, applying Lemma 4.13 alternately to row 1 and the transpose of col 1, find a sequence of eight elementary operations over $F[x]$ which reduces
$$\begin{pmatrix} a_4(x) & xa_3(x)\\ 0 & -1\end{pmatrix} \quad\text{to}\quad \operatorname{diag}(x, -a_4(x)/x).$$
Hint: Start with $c_1 - xa_2(x)c_2$.
Let
$$A_n(x) = \begin{pmatrix} a_n(x) & xa_{n-1}(x)\\ 0 & -1\end{pmatrix}$$
where $n \ge 3$. Using the polynomial analogue of the proof of Lemma 1.9, show that this method requires $2 + 2 + 3(n-3) + 1$ elementary operations over $F[x]$ to reduce $A_n(x)$ to $\operatorname{diag}(x, -a_n(x)/x)$.
Hint: Use the polynomial analogue of the matrices $B_r$ in Exercises 1.2, Question 6(b). You'll need the eco $c_2 - xa_{n-r}(x)q_{n-r+1}(x)c_1$ for odd integers $r \ge 3$ and the ero $r_2 - xa_{n-r}(x)q_{n-r+1}(x)r_1$ for even integers $r \ge 4$, where $a_n(x)/a_{n-r+2}(x) = 1 + q_{n-r+1}(x)a_{n-r+1}(x)$, $3 \le r < n$.


List a sequence of five elementary operations over $F[x]$ which reduce $\operatorname{diag}(x, -a_n(x)/x)$ to $S(A_n(x)) = \operatorname{diag}(1, a_n(x))$. Find a sequence of four elementary operations over $F[x]$ which reduces $A_n(x)$ to $S(A_n(x))$.
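Before attempting Question 4 it may help to see the polynomials $a_n(x)$ explicitly. The Python sketch below takes $F = \mathbb{Q}$, which is an assumption since the question allows any field. It represents a polynomial as a list of `Fraction` coefficients, lowest degree first, and all helper names are our own. It builds $a_0(x), \ldots, a_6(x)$ from the recurrence, then checks the gcd identity and the congruence $a_n(x)/a_{n-r+1}(x) \equiv 1 \pmod{K_{n-r}}$ for small $n$. Such checks are evidence, not a proof.

```python
from fractions import Fraction


def trim(p):
    """Drop zero leading coefficients (the zero polynomial is [])."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])


def mul(p, q):
    if not p or not q:
        return []
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


def divmod_poly(p, q):
    """Long division over Q: return (s, r) with p = s*q + r and deg r < deg q."""
    p, q = trim(p), trim(q)
    s = [Fraction(0)] * max(len(p) - len(q) + 1, 0)
    while p and len(p) >= len(q):
        shift = len(p) - len(q)
        c = Fraction(p[-1]) / q[-1]
        s[shift] = c
        p = trim([p[i] - (c * q[i - shift] if i >= shift else 0)
                  for i in range(len(p))])
    return trim(s), p


def monic_gcd(p, q):
    """Monic gcd by the Euclidean algorithm (p and q not both zero)."""
    p, q = trim(p), trim(q)
    while q:
        p, q = q, divmod_poly(p, q)[1]
    return [Fraction(c) / p[-1] for c in p]


X = [Fraction(0), Fraction(1)]   # the polynomial x
X2 = mul(X, X)                   # x^2


def a_sequence(n_max):
    """[a_0, ..., a_{n_max}] with a_0 = 1, a_1 = x, a_n = a_{n-1}(x^2 a_{n-2} + 1)."""
    a = [[Fraction(1)], X]
    for n in range(2, n_max + 1):
        a.append(mul(a[n - 1], add(mul(X2, a[n - 2]), [Fraction(1)])))
    return a


def congruence_holds(a, n, r):
    """Is a_n / a_{n-r+1} exact and congruent to 1 modulo x^2 a_{n-r}?"""
    quotient, rem = divmod_poly(a[n], a[n - r + 1])
    modulus = mul(X2, a[n - r])  # generator of K_{n-r}
    return not rem and not divmod_poly(add(quotient, [Fraction(-1)]), modulus)[1]


a = a_sequence(6)
print([len(p) - 1 for p in a])   # degrees: [0, 1, 3, 6, 11, 19, 32]
print(all(monic_gcd(a[n], mul(X, a[n - 1])) == a[n - 1] for n in range(2, 7)),
      all(congruence_holds(a, n, r) for n in range(2, 7) for r in range(1, n + 1)))
```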

5. (Polynomial version of Exercises 1.2, Question 7.) Let $F$ be a field and let $s, t$ be positive integers. Write $G = GL_s(F[x]) \times GL_t(F[x])$, the external direct product group (Exercises 2.3, Question 4(d)).
(a) Let $D(x)$ be an $s \times t$ matrix over $F[x]$. Verify that
$$Z(D(x)) = \{(P(x), Q(x)) \in G : P(x)D(x) = D(x)Q(x)\}$$
is a subgroup of $G$.
(b) Suppose that the $s \times t$ matrix $D(x) = \operatorname{diag}(d_1(x), d_2(x), \ldots, d_s(x))$ over $F[x]$ is in Smith normal form where $s \le t$ and $d_s(x) \ne 0$. Let $(P(x), Q(x)) \in G$ and write $P(x) = (p_{ij}(x))$, $Q(x) = (q_{kl}(x))$ for $1 \le i, j \le s$, $1 \le k, l \le t$. Show
$$(P(x), Q(x)) \in Z(D(x)) \iff p_{ij}(x)d_j(x) = d_i(x)q_{ij}(x)$$
for $1 \le i, j \le s$, where $q_{il}(x) = 0$ for $1 \le i \le s < l \le t$.

(c) You and your classmate independently reduce the $s \times t$ matrix $A(x)$ to $D(x)$. You get $P'(x)A(x) = D(x)Q'(x)$ and your classmate gets $P''(x)A(x) = D(x)Q''(x)$. Verify the coset equality $Z(D(x))(P'(x), Q'(x)) = Z(D(x))(P''(x), Q''(x))$.

(d) Use the two reductions of
$$A_2(x) = \begin{pmatrix} x(x^2 + 1) & x^2 \\ 0 & -1 \end{pmatrix}$$
suggested in Question 4 above to find an element of $Z(\operatorname{diag}(1, a_2(x)))$ of the form $(P'(x), Q'(x))(P''(x), Q''(x))^{-1}$.
(e) Modify part (b) to cover the case $d_s(x) = 0$.
Hint: Consider the largest non-negative integer $r$ with $d_r(x) \ne 0$.

6. (a) Considering each type of ero and eco over $F[x]$ in turn, write out a proof of Lemma 4.10 based on Theorem 1.11 and Exercises 1.2, Questions 1(c) and 1(d).

(b) Write out a proof of Lemma 4.14 analogous to that of Lemma 1.9 using Lemma 4.13(i) in place of Lemma 1.7.

(c) Write out a proof of Theorem 4.16 based on the proof of Theorem 1.11 using Lemmas 4.10, 4.14 and 4.15 in place of Lemmas 1.4, 1.9 and 1.10.

(d) Write out a proof of Lemma 4.17 analogous to that of Corollary 1.14.
(e) Write out a proof of Corollary 4.19 analogous to the proof of Corollary 1.20 using Corollary 4.18 in place of Corollary 1.19.
(f) Write out a proof of Theorem 4.21 based on the proof of Theorem 1.21.

5 F[x]-Modules: Similarity of t × t Matrices over a Field F

We begin with a review of the concept of similarity in the context of matrix theory. For the reader familiar with diagonalisation there is little new here. However, as we’ll see in Chapter 6, diagonalisation is something of a ‘red herring’ as far as the general theory of similarity is concerned. The matrix $xI - A$ plays a major role, but there’s no need to find the zeros of the polynomial $|xI - A|$, that is, the eigenvalues of the $t \times t$ matrix $A$, to get started. Instead the calculation of the Smith normal form of $xI - A$ is the initial objective, but this is deferred to the next chapter as there is preliminary work to be done. This preliminary work is analogous to the discussion of $\mathbb{Z}$-modules carried out in Chapter 2. Each $t \times t$ matrix $A$ over a field $F$ gives rise to an $F[x]$-module $M(A)$, and such modules behave in an analogous way to finite abelian groups; in particular they decompose into a direct sum of cyclic submodules. This theory comes to a climax in Section 6.1 where problems concerning similarity are miraculously solved using $M(A)$. Meanwhile, here the necessary building blocks are introduced: the order of a module element, the direct sum of matrices and the important concept of the companion matrix of a monic polynomial with positive degree.

5.1 The F [x]-Module M(A)

Let $V$ denote a finite-dimensional vector space over a field $F$ and suppose given a linear mapping $\alpha : V \to V$. Write $t = \dim V$ and let $\mathcal{B}$ denote a basis $v_1, v_2, \ldots, v_t$ of $V$. The reader will know that the action of $\alpha$ on $V$ can be expressed by a matrix.



http://dx.doi.org/10.1007/978-1-4471-2730-7_5


Definition 5.1

The $t \times t$ matrix $A = (a_{ij})$ over $F$, where $(v_i)\alpha = a_{i1}v_1 + a_{i2}v_2 + \cdots + a_{it}v_t$ for $1 \le i \le t$, is called the matrix of $\alpha$ relative to the basis $\mathcal{B}$.

Our first task is to discover how the matrix of α changes on changing the basis B.

Lemma 5.2

Let $\alpha : V \to V$ be a linear mapping of the $t$-dimensional vector space $V$ over the field $F$. Let $\mathcal{B}$ and $\mathcal{B}'$ be bases of $V$. Denote by $A$ and $A'$ the $t \times t$ matrices of $\alpha$ relative to $\mathcal{B}$ and $\mathcal{B}'$ respectively. There is then an invertible $t \times t$ matrix $X$ over $F$ such that $XAX^{-1} = A'$. Suppose $V = F^t$ and $\mathcal{B} = \mathcal{B}_0$, the standard basis of $F^t$. In this case the rows of $X$ are, in order, the vectors of $\mathcal{B}'$.

Proof

Let $v_1, v_2, \ldots, v_t$ denote the vectors in $\mathcal{B}$ and let $v'_1, v'_2, \ldots, v'_t$ denote the vectors in $\mathcal{B}'$. Each $v'_i$ is expressible as a linear combination of the vectors in $\mathcal{B}$ and so there is a $t \times t$ matrix $X = (x_{ij})$ over $F$ such that $v'_i = x_{i1}v_1 + x_{i2}v_2 + \cdots + x_{it}v_t$ for $1 \le i \le t$. In the same way each $v_j$ is expressible as a linear combination of the vectors in $\mathcal{B}'$, so there is a $t \times t$ matrix $Y = (y_{jk})$ over $F$ such that $v_j = y_{j1}v'_1 + y_{j2}v'_2 + \cdots + y_{jt}v'_t$ for $1 \le j \le t$. As in the proof of Theorem 2.20 the matrices $X$ and $Y$ are related by the equations $XY = I$ and $YX = I$, showing that $X$ is invertible over $F$ and $X^{-1} = Y$.

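It is possible to express $(v'_i)\alpha$ in terms of $\mathcal{B}$ in two ways. First, using the linearity of $\alpha$ followed by the matrix $A = (a_{jk})$ gives
$$(v'_i)\alpha = \Bigl(\sum_{j=1}^{t} x_{ij}v_j\Bigr)\alpha = \sum_{j=1}^{t} x_{ij}(v_j)\alpha = \sum_{j=1}^{t} x_{ij}\Bigl(\sum_{k=1}^{t} a_{jk}v_k\Bigr) = \sum_{k=1}^{t}\Bigl(\sum_{j=1}^{t} x_{ij}a_{jk}\Bigr)v_k.$$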

Secondly, using the matrices $A' = (a'_{ij})$ and $X$ (in that order) we obtain
$$(v'_i)\alpha = \sum_{j=1}^{t} a'_{ij}v'_j = \sum_{j=1}^{t} a'_{ij}\Bigl(\sum_{k=1}^{t} x_{jk}v_k\Bigr) = \sum_{k=1}^{t}\Bigl(\sum_{j=1}^{t} a'_{ij}x_{jk}\Bigr)v_k.$$

Equating the coefficients of $v_k$ in the above equations gives
$$\sum_{j=1}^{t} x_{ij}a_{jk} = \sum_{j=1}^{t} a'_{ij}x_{jk}$$
and so $XA = A'X$, as the $(i,k)$-entries agree for $1 \le i, k \le t$. Postmultiplying by $X^{-1}$ produces $XAX^{-1} = A'$.


Suppose $V = F^t$ and $v_j = e_j$ for $1 \le j \le t$, that is, the basis $\mathcal{B}$ is the standard basis $\mathcal{B}_0$ of $F^t$. The above equation expressing the vector $v'_i$ of $\mathcal{B}'$ as a linear combination of the vectors in $\mathcal{B} = \mathcal{B}_0$ is $v'_i = x_{i1}e_1 + x_{i2}e_2 + \cdots + x_{it}e_t = e_iX$, which is row $i$ of $X$ for $1 \le i \le t$. So the basis $\mathcal{B}'$ consists of the rows of $X$ in their given order. □
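The proof of Lemma 5.2 can be mirrored numerically. The Python sketch below uses exact rational arithmetic, assumes $F = \mathbb{Q}$, and uses helper names of our own. The rows of `X` are the basis $\mathcal{B}'$. The function `matrix_relative_to` finds row $i$ of $A'$ by solving for the coordinates of $(v'_i)\alpha = v'_iA$ relative to $\mathcal{B}'$, as in Definition 5.1. The result can then be compared with the equation $XA = A'X$ obtained in the proof.

```python
from fractions import Fraction


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def coordinates(X, w):
    """Solve cX = w for the row vector c (X invertible): the coordinates of w
    relative to the basis formed by the rows of X, found by Gauss-Jordan."""
    t = len(X)
    # Equation j reads sum_i c_i X[i][j] = w[j]; the last column holds w.
    M = [[Fraction(X[i][j]) for i in range(t)] + [Fraction(w[j])] for j in range(t)]
    for col in range(t):
        piv = next(r for r in range(col, t) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [e / p for e in M[col]]
        for r in range(t):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[r][t] for r in range(t)]


def matrix_relative_to(A, X):
    """Matrix A' of v -> vA relative to the basis v'_1, ..., v'_t formed by the rows
    of X: row i of A' lists the coordinates of (v'_i)alpha = v'_i A (Definition 5.1)."""
    return [coordinates(X, row) for row in matmul(X, A)]


A = [[0, 1], [1, 0]]
X = [[1, 1], [-1, 1]]
A_prime = matrix_relative_to(A, X)
print(A_prime == [[1, 0], [0, -1]])          # True
print(matmul(X, A) == matmul(A_prime, X))    # True: XA = A'X
```

The matrices `A` and `X` here are those of Example 5.9a, where the same conclusion $XAX^{-1} = \operatorname{diag}(1, -1)$ is reached by hand.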

Definition 5.3

The $t \times t$ matrices $A$ and $A'$ over the field $F$ are called similar if there is an invertible $t \times t$ matrix $X$ over $F$ such that $XAX^{-1} = A'$, in which case we write $A \sim A'$.

From Lemma 5.2 we see that similar matrices arise by relating a given linear mapping $\alpha : V \to V$ to different bases of $V$. We will see in Theorem 5.13 that the concept of similarity (Definition 5.3) is analogous to that of isomorphism (in the context of finite abelian groups). For the moment we note that similarity $\sim$ is an equivalence relation on the set $M_t(F)$ of all $t \times t$ matrices over $F$ (Exercises 5.1, Question 1(d)).

Remember (see the discussion after Lemma 2.18) that the multiplicative group of all invertible $t \times t$ matrices $X$ over $F$ is denoted by $GL_t(F)$.

Definition 5.4

Let $A \in M_t(F)$. The subset $\{XAX^{-1} : X \in GL_t(F)\}$ of $M_t(F)$ is called the similarity class of $A$.

So $\{A' \in M_t(F) : A' \sim A\}$ is the similarity class of $A$ and consists of all $t \times t$ matrices $A'$ over $F$ which are similar to the given $t \times t$ matrix $A$ over $F$. As $X0X^{-1} = 0$ for all $X \in GL_t(F)$, we see that the similarity class of the zero $t \times t$ matrix $0$ over $F$ consists of $0$ alone. In the same way, as $XIX^{-1} = I$ for all $X \in GL_t(F)$, we have $\{XIX^{-1} : X \in GL_t(F)\} = \{I\}$, that is, the similarity class of the $t \times t$ identity matrix $I$ over $F$ has only one element, namely $I$ itself. In Chapter 6 we develop a method of determining whether or not two matrices are similar.

The reader will know that multiplication by invertible matrices leaves the rank of a matrix unchanged and so

similar matrices have equal rank.

For instance
$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$
are not similar, as their ranks are 1 and 2 respectively. Also
$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I$$
are not similar although they both have rank 2, since the first matrix does not belong to the similarity class of $I$.
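Rank is easy to compute by Gaussian elimination, which gives a quick test for non-similarity. The Python sketch below assumes $F = \mathbb{Q}$. The helper names, the invertible matrix `X` and its hand-computed inverse `X_inv` are illustrative choices of ours.

```python
from fractions import Fraction


def rank(M):
    """Rank over Q by Gaussian elimination with exact fractions."""
    M = [[Fraction(e) for e in row] for row in M]
    r = 0
    for c in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


print(rank([[1, 0], [0, 0]]), rank([[1, 1], [0, 1]]))   # 1 2
X = [[2, 1], [1, 1]]          # determinant 1
X_inv = [[1, -1], [-1, 2]]    # inverse of X, computed by hand
print(rank(matmul(matmul(X, [[1, 0], [0, 0]]), X_inv)))  # 1: rank is preserved
```

Equal rank is only a necessary condition: as the text notes, $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and $I$ both have rank 2 but are not similar.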

The reader will be familiar with the characteristic polynomial

$\chi_A(x) = \det(xI - A)$ of the $t \times t$ matrix $A$ over $F$.

The zeros of $\chi_A(x)$, which is a monic polynomial of degree $t$ over $F$, are the eigenvalues of $A$. The coefficient of $x^{t-1}$ in $\chi_A(x)$ is $-\operatorname{trace} A$, where $\operatorname{trace} A = a_{11} + a_{22} + \cdots + a_{tt}$ is the sum of the diagonal entries in $A$. Also the constant term in $\chi_A(x)$ is $(-1)^t|A|$ (Exercises 5.1, Question 2(a)).

Lemma 5.5

Similar matrices have equal characteristic polynomials.

Proof

Let $A$ and $A'$ be similar $t \times t$ matrices over $F$. By Definition 5.3 there is an invertible $t \times t$ matrix $X$ with $XAX^{-1} = A'$. This factorisation of $A'$ (as the product $XAX^{-1}$) gives rise to the factorisation of the characteristic matrix $xI - A'$ of $A'$:
$$xI - A' = xXIX^{-1} - XAX^{-1} = X(xI)X^{-1} - XAX^{-1} = X(xI - A)X^{-1}.$$
Using the multiplicative property of determinants (Theorem 1.18) we obtain
$$\chi_{A'}(x) = |xI - A'| = |X(xI - A)X^{-1}| = |X||xI - A||X^{-1}| = |xI - A||X||X^{-1}| = |xI - A| = \chi_A(x)$$
as $|X||X^{-1}| = |XX^{-1}| = |I| = 1$ and the scalar $|X|$ commutes with the polynomial $|xI - A|$. So $\chi_{A'}(x) = \chi_A(x)$. □
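The facts about the coefficients of $\chi_A(x)$ and Lemma 5.5 can both be tested on examples. The Python sketch below computes $\chi_A(x)$ by the Faddeev-LeVerrier recurrence. This method is our own choice and is not used in the text. Because it divides by $1, 2, \ldots, t$, it is only suitable over fields such as $\mathbb{Q}$. The helper names and test matrices are ours.

```python
from fractions import Fraction


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def char_poly(A):
    """Coefficients [c_0, c_1, ..., c_t] of det(xI - A), lowest degree first,
    computed by the Faddeev-LeVerrier recurrence (divides by 1, ..., t)."""
    t = len(A)
    A = [[Fraction(e) for e in row] for row in A]
    c = [Fraction(0)] * t + [Fraction(1)]
    M = [[Fraction(0)] * t for _ in range(t)]           # M_0 = 0
    for k in range(1, t + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (c[t - k + 1] if i == j else 0) for j in range(t)]
             for i in range(t)]                         # M_k = A M_{k-1} + c_{t-k+1} I
        AM = matmul(A, M)
        c[t - k] = -sum(AM[i][i] for i in range(t)) / k
    return c


def inverse(X):
    """Inverse over Q by Gauss-Jordan elimination on [X | I]."""
    t = len(X)
    M = [[Fraction(e) for e in row] + [Fraction(int(i == j)) for j in range(t)]
         for i, row in enumerate(X)]
    for col in range(t):
        piv = next(r for r in range(col, t) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [e / p for e in M[col]]
        for r in range(t):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[t:] for row in M]


A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]     # trace 9, determinant 25
X = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]     # determinant 3
chi = char_poly(A)
print(chi[2] == -9, chi[0] == -25)        # True True: -trace A and (-1)^3 |A|
print(char_poly(matmul(matmul(X, A), inverse(X))) == chi)   # True (Lemma 5.5)
```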

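It follows from Lemma 5.5 that similar matrices have equal traces and determinants. Note that
$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
are not similar, but both have rank 2 and characteristic polynomial
$$(x - 1)^2 = \begin{vmatrix} x - 1 & -1 \\ 0 & x - 1 \end{vmatrix} = \begin{vmatrix} x - 1 & 0 \\ 0 & x - 1 \end{vmatrix}.$$
On the other hand
$$\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix} \sim \begin{pmatrix} b & 0 \\ 0 & a \end{pmatrix} \quad\text{using}\quad X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \text{ for all } a, b \in F.$$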

So what exactly must two matrices have in common in order to be similar? The answer will have to wait until Definition 6.8. Suffice it to say that we already know the answer to the analogous question: what exactly must two finite abelian groups $G$ and $G'$ have in common in order to be isomorphic? By Theorem 3.7 we know $G \cong G'$ if and only if their invariant factor sequences are equal.

It is time to introduce the main concept in this section. Let $V$ be a vector space over the field $F$ and let $\alpha : V \to V$ be a linear mapping. So $V$ is an $F$-module and $\alpha$ is an $F$-linear mapping in the terminology of Section 2.3. We now prepare to show (Lemma 5.7) that $V$ can be given the extra structure of an $F[x]$-module by using $\alpha$.

Definition 5.6

Let $V$ be a vector space over $F$ and let $\alpha : V \to V$ be a linear mapping. For each polynomial $f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$ over $F$ the linear mapping $f(\alpha) : V \to V$ given by
$$(v)f(\alpha) = a_n((v)\alpha^n) + a_{n-1}((v)\alpha^{n-1}) + \cdots + a_1((v)\alpha) + a_0v \quad\text{for all } v \in V$$
is called the evaluation of $f(x)$ at $\alpha$.

The linear mapping $\alpha$ can be composed with itself a finite number of times to give $\alpha\alpha = \alpha^2$, $\alpha\alpha\alpha = \alpha^3, \ldots$, and these positive integer powers of $\alpha$ are also linear mappings $\alpha^n : V \to V$. The reader can verify that $f(\alpha)$ is linear. Also

the evaluation at $\alpha$ mapping $\varepsilon_\alpha : F[x] \to \operatorname{End} V$,

where $(f(x))\varepsilon_\alpha = f(\alpha)$ for all $f(x) \in F[x]$, is a ring homomorphism (Exercises 5.1, Question 2(e)).

Lemma 5.7

Let $V$ be a vector space over $F$ and let $\alpha : V \to V$ be a linear mapping. Write $f(x)v = (v)f(\alpha)$ for all $f(x) \in F[x]$ and $v \in V$. Then $V$, with vector addition and the above product (of polynomial and vector), is an $F[x]$-module $M(\alpha)$ called the $F[x]$-module determined by $\alpha$.

Proof

We verify the seven laws of an $R$-module $M$ listed before Definition 2.19 with $R = F[x]$ and $M = M(\alpha)$. The elements of $M(\alpha)$ are no more than the vectors of $V$. Also addition in $M(\alpha)$ is no more than addition of vectors in $V$, and so laws 1, 2, 3 and 4 of an abelian group are obeyed. Consider polynomials $f(x), f_1(x), f_2(x)$ over $F$ and vectors $v, v_1, v_2$ in $V$. Notice $f(x)v \in V$, that is, each polynomial multiple of a vector in $V$ is again a vector in $V$. As $f(\alpha)$ is additive, $f(x)(v_1 + v_2) = (v_1 + v_2)f(\alpha) = (v_1)f(\alpha) + (v_2)f(\alpha) = f(x)v_1 + f(x)v_2$. Now the linear mapping $(f_1(x) + f_2(x))\varepsilon_\alpha$ is the evaluation of $f_1(x) + f_2(x)$ at $\alpha$. As $\varepsilon_\alpha$ is additive we obtain
$$\begin{aligned} (f_1(x) + f_2(x))v &= (v)\bigl((f_1(x) + f_2(x))\varepsilon_\alpha\bigr) = (v)\bigl((f_1(x))\varepsilon_\alpha + (f_2(x))\varepsilon_\alpha\bigr) \\ &= (v)(f_1(\alpha) + f_2(\alpha)) = (v)f_1(\alpha) + (v)f_2(\alpha) = f_1(x)v + f_2(x)v. \end{aligned}$$

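So module law 5 (the distributive law) is obeyed in $M(\alpha)$. As $\varepsilon_\alpha$ is multiplicative and $F[x]$ is a commutative ring we see
$$\begin{aligned} (f_1(x)f_2(x))v &= (f_2(x)f_1(x))v = (v)(f_2(x)f_1(x))\varepsilon_\alpha = (v)\bigl((f_2(x))\varepsilon_\alpha(f_1(x))\varepsilon_\alpha\bigr) \\ &= (v)(f_2(\alpha)f_1(\alpha)) = ((v)f_2(\alpha))f_1(\alpha) \\ &= (f_2(x)v)f_1(\alpha) = f_1(x)(f_2(x)v) \end{aligned}$$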

showing that module law 6 is obeyed in $M(\alpha)$. The 1-element of $F[x]$ is the constant polynomial $1(x)$, and $(1(x))\varepsilon_\alpha = 1(\alpha)$ is the identity mapping of $V$ as $(v)1(\alpha) = v$ for all $v \in V$ by Definition 5.6. Therefore $1(x)v = (v)1(\alpha) = v$ for all $v \in V$, showing that module law 7 is obeyed in $M(\alpha)$. Therefore $M(\alpha)$ is an $F[x]$-module. □

The module $M(\alpha)$ is an abstract version of the module $M(A)$, which we define next.

Definition 5.8

Let $A$ be a $t \times t$ matrix over the field $F$ and let $\alpha : F^t \to F^t$ denote the linear mapping determined by $A$, that is, $(v)\alpha = vA$ for all $v \in F^t$. We write $M(\alpha) = M(A)$, which is called the $F[x]$-module determined by $A$. Also write
$$f(A) = a_nA^n + a_{n-1}A^{n-1} + \cdots + a_1A + a_0I \in M_t(F)$$
where $f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 \in F[x]$. Denote by $\varepsilon_A : F[x] \to M_t(F)$ the ring homomorphism given by $(f(x))\varepsilon_A = f(A)$ for all $f(x) \in F[x]$. We call $\varepsilon_A$ the evaluation at $A$ homomorphism.

So $M(A)$ is a concrete version of $M(\alpha)$. Notice that $(v)f(\alpha) = vf(A)$ for all $v \in F^t$, where $\alpha$ is the linear mapping determined by $A$, and the matrix of $\alpha$ relative to the standard basis $\mathcal{B}_0$ of $F^t$ is again $A$. In $M(A)$ the product of the polynomial $f(x)$ and the vector (module element) $v$ is
$$f(x)v = vf(A).$$
In effect the left of the above equation is defined to be the $t$-tuple (element of $F^t$) on the right. In particular $xv = vA$, showing that in $M(A)$ multiplication by $x$ on the left means multiplication by $A$ on the right. It is high time for some examples!
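The rule $f(x)v = vf(A)$ is straightforward to implement. The Python sketch below assumes $F = \mathbb{Q}$ and uses hypothetical helper names. The function `poly_at_matrix` computes $f(A) = (f(x))\varepsilon_A$ by Horner's rule, and `module_mul` forms the product $f(x)v$ in $M(A)$. The usage lines anticipate the matrix $A$ of Example 5.9a.

```python
from fractions import Fraction


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def poly_at_matrix(f, A):
    """f(A) for f = [a_0, a_1, ..., a_n] (lowest degree first), by Horner's rule."""
    t = len(A)
    result = [[Fraction(0)] * t for _ in range(t)]
    for coeff in reversed(f):
        result = matmul(result, A)
        for i in range(t):
            result[i][i] += Fraction(coeff)
    return result


def module_mul(f, v, A):
    """The product f(x)v in M(A), namely the row vector v times f(A)."""
    fA = poly_at_matrix(f, A)
    return [sum(Fraction(v[k]) * fA[k][j] for k in range(len(v)))
            for j in range(len(v))]


A = [[0, 1], [1, 0]]
print(module_mul([0, 1], [5, Fraction(1, 2)], A) == [Fraction(1, 2), 5])  # True
print(module_mul([-1, 0, 1], [3, 7], A) == [0, 0])    # True: (x^2 - 1)v = 0
```

Horner's rule is used because it needs only $n$ matrix multiplications for a polynomial of degree $n$. Because $\varepsilon_A$ is a ring homomorphism, `poly_at_matrix` of a product of polynomials equals the product of the two matrices it returns for the factors.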

Example 5.9a

Suppose $F = \mathbb{Q}$, $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and so $t = 2$. The elements of $M(A)$ are ordered pairs $(q_1, q_2)$ of rational numbers, that is, $M(A) = \mathbb{Q}^2$ as sets, and we shall continue to refer to these elements as vectors. There’s nothing mysterious about the sum of vectors nor about the scalar multiple of a vector: both are exactly what you expect. The novelty is $x(q_1, q_2)$, which is also a vector. Which vector is it? Using the above rule and the given $A$ we obtain

$$x(q_1, q_2) = (q_1, q_2)A = (q_1, q_2)\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = (q_2, q_1)$$

showing that multiplication by $x$ in this $\mathbb{Q}[x]$-module $M(A)$ amounts to interchanging $q_1$ and $q_2$. Thus $xe_1 = x(1,0) = (0,1) = e_2$, $xe_2 = x(0,1) = (1,0) = e_1$ and $x(5, 1/2) = (1/2, 5)$ etc. Also $(x + 1)e_1 = xe_1 + e_1 = (0,1) + (1,0) = (1,1)$ and $(x - 1)e_1 = xe_1 - e_1 = (0,1) - (1,0) = (-1,1)$. Notice that every element $(q_1, q_2)$ of $M(A)$ can be expressed as a polynomial multiple of $e_1$: specifically $(q_1, q_2) = q_1e_1 + q_2e_2 = q_1e_1 + q_2xe_1 = (q_1 + q_2x)e_1$. So $M(A)$ is described as being cyclic with generator $e_1$ (see Definition 5.21 for the general case). In other words, $e_1$ generates $M(A) = \{f(x)e_1 : f(x) \in \mathbb{Q}[x]\}$. It is significant that there is no need to use quadratic polynomials or polynomials of higher degree: in fact

$$x^2(q_1, q_2) = x(x(q_1, q_2)) = x(q_2, q_1) = (q_1, q_2)$$

showing that multiplying elements of $M(A)$ by $x^2$ has no effect. Put in a different way, which is more dramatic and significant, multiplication by $x^2 - 1$ annihilates all elements of $M(A)$, that is,
$$(x^2 - 1)(q_1, q_2) = x^2(q_1, q_2) - (q_1, q_2) = (q_1, q_2) - (q_1, q_2) = 0.$$

We will see that $A$ is the companion matrix (Definition 5.25) of $x^2 - 1$. Although $\mathbb{Q}^2$ has an infinite number of subspaces, by Theorem 5.28 just four of them are submodules (Definition 5.14) of $M(A)$, that is, subspaces which are closed under polynomial multiplication. From Lemma 2.2 and Theorem 2.5 all subgroups of a finite cyclic group $G$ are themselves cyclic, and each such subgroup corresponds to a positive divisor of the order $|G|$ of a generator of $G$. Now the order of $e_1$ in $M(A)$ is $x^2 - 1$, that is, the monic polynomial $d(x)$ of smallest degree over $\mathbb{Q}$ satisfying $d(x)e_1 = 0$ is $d(x) = x^2 - 1$ (see Definition 5.11 for the details). As $e_1$ generates $M(A)$ we see that $x^2 - 1 = \chi_A(x)$ is the module analogue of $|G|$. So it is reasonable to expect the four submodules of $M(A)$ to be themselves cyclic and to correspond to the four monic divisors $1$, $x + 1$, $x - 1$, $x^2 - 1$ of $x^2 - 1$. In fact this is true (see Theorem 5.28 for the general case and its proof) and so

$$M(A) = \langle e_1, e_2 \rangle, \quad N_1 = \langle e_1 + e_2 \rangle, \quad N_2 = \langle e_2 - e_1 \rangle, \quad \langle 0 \rangle$$
are the four submodules of $M(A)$, having generators $e_1$, $(x + 1)e_1$, $(x - 1)e_1$, $(x^2 - 1)e_1$; these submodules are subspaces of $\mathbb{Q}^2$ and have dimensions 2, 1, 1, 0 respectively. The submodules $N_1$ and $N_2$ are the row eigenspaces of $A$, as $e_1 + e_2$ and $e_2 - e_1$ are row eigenvectors of $A$ with eigenvalues $1$ and $-1$. Further $M(A) = N_1 \oplus N_2$ (internal direct sum) and

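$$XAX^{-1} = \operatorname{diag}(1, -1) \quad\text{where}\quad X = \begin{pmatrix} e_1 + e_2 \\ e_2 - e_1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$$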

showing that $A$ is similar to a diagonal matrix, the rows of the invertible matrix $X$ forming a basis of $\mathbb{Q}^2$ consisting of eigenvectors of $A$. Finally note that $e_1 + e_2$ and $e_2 - e_1$ have orders $x - 1$ and $x + 1$ respectively.

More generally $v$ is a row eigenvector of the $t \times t$ matrix $A$ over $F$ associated with the eigenvalue $\lambda \in F$ if and only if the order (Definition 5.11) of $v$ in $M(A)$ is $x - \lambda$.

Example 5.9b

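Take $F = \mathbb{Q}$, $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ and so $t = 2$. In the $\mathbb{Q}[x]$-module $M(A)$ multiplication by $x$ is
$$x(q_1, q_2) = (q_1, q_2)A = (q_1, q_2)\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = (-q_2, q_1) \quad\text{for all } (q_1, q_2) \in \mathbb{Q}^2.$$
In particular $xe_1 = e_2$ and $xe_2 = -e_1$. As in Example 5.9a above, $M(A)$ is cyclic with generator $e_1$. Also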

$$x^2(q_1, q_2) = x(x(q_1, q_2)) = x(-q_2, q_1) = (-q_2, q_1)A = (-q_2, q_1)\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = (-q_1, -q_2)$$

that is, $x^2v = -v$ for all $v = (q_1, q_2) \in \mathbb{Q}^2$. Therefore $(x^2 + 1)v = 0$ in this module, showing that $x^2 + 1 = \chi_A(x)$ annihilates every vector $v$ of $M(A)$. Now $x^2 + 1$ is irreducible over $\mathbb{Q}$ and so its only monic factors over $\mathbb{Q}$ are $1$ and $x^2 + 1$. Anticipating Theorem 5.28, we see that $M(A)$ has only two submodules, namely $M(A)$ and $\{(0,0)\}$, corresponding to these two divisors. In fact every non-zero vector in $M(A)$ is a generator, and $x^2 + 1$ is its order (Definition 5.11). Also $A$ is the companion matrix (Definition 5.25) of $x^2 + 1$ over $\mathbb{Q}$. Write $\mathbb{Q}' = \mathbb{Q}[x]/\langle x^2 + 1 \rangle$, which is an extension field of $\mathbb{Q}$ (see Theorem 4.9); the elements of $\mathbb{Q}'$ are $q + q'\iota$, where the coset $\iota = \langle x^2 + 1 \rangle + x$ satisfies $\iota^2 = -1$ and $q, q' \in \mathbb{Q}$. So $\mathbb{Q}'$ is the field of complex numbers having rational real parts and rational imaginary parts. As $(x^2 + 1)v = 0$ for all $v \in M(A)$, it is possible to turn $\mathbb{Q}^2$ into a $\mathbb{Q}'$-module $M'$ by writing $(q + q'\iota)v = (q + q'x)v$ (Exercises 5.1, Question 3(e)). But a $\mathbb{Q}'$-module is no more than a vector space over $\mathbb{Q}'$. What is more, $M'$ is a 1-dimensional vector space over $\mathbb{Q}'$ with basis $e_1$, as
$$(q_1, q_2) = q_1e_1 + q_2e_2 = q_1e_1 + q_2xe_1 = (q_1 + xq_2)e_1 = (q_1 + \iota q_2)e_1.$$

Extending the ground field $F$ (from $\mathbb{Q}$ to $\mathbb{Q}'$ in this case) is, where appropriate, very helpful in the analysis of $A$. Notice that passing to a field of degree 2 over $\mathbb{Q}$ (as $\mathbb{Q}'$ is) causes the dimension to halve: $M(A)$ has dimension 2 as a vector space over $\mathbb{Q}$, and $M'$ has dimension 1 as a vector space over $\mathbb{Q}'$.
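The identification of $M'$ with a 1-dimensional space over $\mathbb{Q}'$ can be checked directly. In the Python sketch below, which uses hypothetical helper names, an element $q + q'\iota$ of $\mathbb{Q}'$ and a vector $(q_1, q_2)$ are both stored as pairs. Since $(q_1, q_2) = (q_1 + \iota q_2)e_1$, the coordinate of a vector relative to the basis $e_1$ is the pair itself. The check confirms that letting $q + q'\iota$ act on a vector agrees with multiplying its coordinate in $\mathbb{Q}'$.

```python
from fractions import Fraction

A = [[0, 1], [-1, 0]]   # the matrix of Example 5.9b


def times_x(v):
    """x v = v A in M(A)."""
    return (v[0] * A[0][0] + v[1] * A[1][0], v[0] * A[0][1] + v[1] * A[1][1])


def act(z, v):
    """(q + q' iota) v, defined as (q + q' x) v = q v + q' (x v)."""
    q, q_prime = z
    xv = times_x(v)
    return (q * v[0] + q_prime * xv[0], q * v[1] + q_prime * xv[1])


def mul_q_prime(z, w):
    """Product in Q' = Q[x]/<x^2 + 1>, with q + q' iota stored as (q, q')
    and iota^2 = -1."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


samples = [(Fraction(1, 2), 3), (0, 1), (-2, Fraction(5, 7))]
# The coordinate of v relative to e1 is v itself, so these should agree:
print(all(act(z, v) == mul_q_prime(z, v) for z in samples for v in samples))  # True
```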

Example 5.9c

Suppose $F = \mathbb{Q}$, $A = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}$ and so $t = 2$. Here $xv = vA = 3v$ for all $v \in M(A)$, that is, multiplication by $x$ in the $\mathbb{Q}[x]$-module $M(A)$ is multiplication by the scalar 3. More generally $f(x)v = f(3)v \in \langle v \rangle$ for all $v \in M(A)$, and so $M(A)$ is not cyclic: there is no vector $v_0 \in M(A)$ with $M(A) = \{f(x)v_0 : f(x) \in \mathbb{Q}[x]\}$, as this set is the subspace $\langle v_0 \rangle$ of dimension at most 1 and $M(A) \ne \langle v_0 \rangle$. Each subspace $N$ of $\mathbb{Q}^2$ is closed under polynomial multiplication, that is, $f(x)v = f(3)v \in N$ for all $f(x) \in \mathbb{Q}[x]$ and all $v \in N$. Therefore $N$ is a submodule of $M(A)$ by Definition 5.14, showing that, in this case, subspaces and submodules coincide.

More generally, in the case of $A$ being a $t \times t$ scalar matrix, that is, $A = cI$ for $c \in F$, subspaces of $F^t$ and submodules of $M(A)$ coincide. In this case there is ‘nothing to do’, since the matrix $A$ is already in canonical form, and so it is not surprising that the $F[x]$-module $M(A)$ is virtually identical to the $F$-module $F^t$. Notice that $\chi_A(x) = (x - c)^t$ and every non-zero vector in $M(A)$ has order $x - c$.

We now describe in detail the new concepts arising in the above examples.

Lemma 5.10

Let $A$ be a $t \times t$ matrix over the field $F$ and let $v \in M(A)$. There is a unique monic polynomial $d(x)$ of degree at most $t$ over $F$ such that $d(x)v = 0$ and $d(x) \mid f(x)$ for all $f(x) \in F[x]$ with $f(x)v = 0$ in $M(A)$.

Proof

The $t + 1$ vectors $v, vA, vA^2, \ldots, vA^t$ belong to the $t$-dimensional vector space $F^t$ over $F$ and so are linearly dependent: there are scalars $b_0, b_1, b_2, \ldots, b_t \in F$, not all zero, satisfying $b_0v + b_1vA + b_2vA^2 + \cdots + b_tvA^t = 0$. In the module $M(A)$ this equation becomes $b_0v + b_1xv + b_2x^2v + \cdots + b_tx^tv = 0$, that is, $g(x)v = 0$ where $g(x) = b_0 + b_1x + b_2x^2 + \cdots + b_tx^t \in F[x]$. Also $g(x)$ is non-zero with $\deg g(x) \le t$.


Write $K = \{f(x) \in F[x] : f(x)v = 0\}$. The notation suggests that $K$ is an ideal of $F[x]$, and this is true (Exercises 5.1, Question 3(d)). The above paragraph shows that $K$ is non-zero. So $K$ has a monic generator $d(x)$ by Theorem 4.4. Therefore $d(x)v = 0$ and $d(x) \mid f(x)$ for all $f(x) \in K$. In particular $d(x) \mid g(x)$ and so $\deg d(x) \le t$. So $d(x)$ satisfies the conditions of Lemma 5.10. Conversely suppose $d(x)$ satisfies the conditions of Lemma 5.10. Then $d(x)$ is a monic generator of the ideal $K$ of $F[x]$. Therefore $d(x)$ is unique by Theorem 4.4. □
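The proof of Lemma 5.10 suggests an algorithm for the polynomial $d(x)$, which Definition 5.11 calls the order of $v$. Compute $v, vA, vA^2, \ldots$ and stop at the first vector $vA^k$ that lies in the span of its predecessors. Those predecessors are linearly independent, so the expression $vA^k = \sum_{i<k} c_ivA^i$ is unique. Then $d(x) = x^k - \sum_{i<k} c_ix^i$ is the monic polynomial of least degree with $d(x)v = 0$. Below is a Python sketch over $\mathbb{Q}$; the helper names are our own.

```python
from fractions import Fraction


def vec_mat(v, A):
    """Row vector times matrix: v A."""
    return [sum(v[k] * A[k][j] for k in range(len(v))) for j in range(len(A[0]))]


def express(vectors, w):
    """Coefficients c with sum_i c_i vectors[i] == w, or None if w is not in their
    span. Assumes `vectors` is linearly independent (true for the calls below)."""
    k, t = len(vectors), len(w)
    # Row r of the augmented matrix: r-th coordinates of the vectors, then w[r].
    M = [[vectors[i][r] for i in range(k)] + [w[r]] for r in range(t)]
    row = 0
    for col in range(k):
        piv = next(r for r in range(row, t) if M[r][col] != 0)
        M[row], M[piv] = M[piv], M[row]
        p = M[row][col]
        M[row] = [e / p for e in M[row]]
        for r in range(t):
            if r != row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[row])]
        row += 1
    if any(M[r][k] != 0 for r in range(row, t)):
        return None
    return [M[r][k] for r in range(k)]


def order(A, v):
    """Monic order d(x) of v in M(A), as coefficients lowest degree first."""
    krylov = []                      # v, vA, vA^2, ... (linearly independent)
    w = [Fraction(e) for e in v]
    while True:
        c = express(krylov, w)
        if c is not None:            # w = vA^k = sum_i c_i vA^i
            return [-ci for ci in c] + [Fraction(1)]
        krylov.append(w)
        w = vec_mat(w, A)


print(order([[0, 1], [1, 0]], [1, 0]) == [-1, 0, 1])   # True: x^2 - 1 (Example 5.9a)
```

The loop always stops after at most $t + 1$ steps, matching the bound $\deg d(x) \le t$ in Lemma 5.10. For $v = 0$ it returns the constant polynomial $1$.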

Definition 5.11

Let $v$ be an element of an $F[x]$-module $M$. The ideal
$$K = \{f(x) \in F[x] : f(x)v = 0\} \text{ of } F[x]$$
is called the order ideal of $v$ in $M$. So $K = \langle d(x) \rangle$ where $d(x)$ is monic or zero by Theorem 4.4. The polynomial $d(x)$ is called the order of $v$ in $M$.

From Lemma 5.10 we see that $K$ is non-zero for all $v \in M(A)$, and so each element of $M(A)$ has a monic order. In other words, the order of $v \in M(A)$ is the unique monic polynomial $d(x)$ of least degree over $F$ satisfying $d(x)v = 0$. This concept is analogous to the order $n$ of $g \in G$, where $G$ is an additive finite abelian group, since $n$ is the smallest positive integer satisfying $ng = 0$.

Notice that $K = F[x]$ if and only if $v = 0$, that is, the zero vector is the only vector of $M$ having the constant polynomial $1(x)$ as its order, since $F[x] = \langle 1(x) \rangle$. On the other hand, taking $M = F[x]$, all non-zero elements of the free $F[x]$-module $F[x]$ have order $0(x)$, since $F[x]$ is an integral domain. More generally each non-zero element of a free $F[x]$-module has order $0(x)$.

From the $|G|$-lemma we know $|G|g = 0$ for all elements $g$ of the finite additive abelian group $G$. It’ll take us some time to prove the analogous result for $t \times t$ matrices $A$ over a field $F$, namely the Cayley–Hamilton theorem (Corollary 6.11)
$$\chi_A(A) = 0,$$
that is, $A$ satisfies its own characteristic polynomial. This beautiful equation was in fact first established for general $A$ by the German mathematician Frobenius. As a consequence $\chi_A(x)v = 0$ for all $v \in M(A)$, which tells us

the order of each $v$ in $M(A)$ is a monic divisor of $\chi_A(x)$.

As $\chi_A(x)$ has at most $2^t$ monic divisors (Exercises 4.1, Question 1(e)), we see that the number of possible orders of elements in $M(A)$ cannot exceed $2^t$.
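Although the Cayley–Hamilton theorem is only proved in Corollary 6.11, it is easy to test on examples. The Python sketch below is a numerical check over $\mathbb{Q}$, not a proof. It computes $\chi_A(x)$ by the Faddeev-LeVerrier recurrence, which is our choice of method rather than one used in the text, and evaluates the result at $A$ by Horner's rule. The helper names and the $4 \times 4$ test matrix are ours.

```python
from fractions import Fraction


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def char_poly(A):
    """Coefficients [c_0, ..., c_t] of det(xI - A) via Faddeev-LeVerrier."""
    t = len(A)
    A = [[Fraction(e) for e in row] for row in A]
    c = [Fraction(0)] * t + [Fraction(1)]
    M = [[Fraction(0)] * t for _ in range(t)]
    for k in range(1, t + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (c[t - k + 1] if i == j else 0) for j in range(t)]
             for i in range(t)]
        AM = matmul(A, M)
        c[t - k] = -sum(AM[i][i] for i in range(t)) / k
    return c


def poly_at_matrix(f, A):
    """f(A) by Horner's rule, f given lowest degree first."""
    t = len(A)
    result = [[Fraction(0)] * t for _ in range(t)]
    for coeff in reversed(f):
        result = matmul(result, A)
        for i in range(t):
            result[i][i] += Fraction(coeff)
    return result


examples = [
    [[0, 1], [1, 0]],                  # Example 5.9a
    [[0, 1], [-1, 0]],                 # Example 5.9b
    [[3, 0], [0, 3]],                  # Example 5.9c
    [[1, 2, 3, 4], [0, 1, 0, 2], [5, 0, 1, 1], [2, 2, 0, 3]],
]
for A in examples:
    chi_at_A = poly_at_matrix(char_poly(A), A)
    print(all(entry == 0 for row in chi_at_A for entry in row))   # True
```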

As preparation for our next theorem consider two $t \times t$ matrices $A$ and $B$ over $F$ with $A \sim B$. Is it true that $A^2 \sim B^2$? Is it true that $A^3 \sim B^3$ etc.? What about $A^2 + I \sim B^2 + I$ and $A^{23} - A + I \sim B^{23} - B + I$ etc.? We show next that all these similarities are true.

Lemma 5.12

Let $A$ and $B$ be $t \times t$ matrices over $F$ and suppose that $X$ is an invertible $t \times t$ matrix over $F$ with $XAX^{-1} = B$. Then $Xf(A)X^{-1} = f(B)$ for all $f(x) \in F[x]$.

Proof

Let’s start with the case $f(x) = x^2$. Then
$$Xf(A)X^{-1} = XA^2X^{-1} = XAAX^{-1} = XAIAX^{-1} = XAX^{-1}XAX^{-1} = BB = B^2 = f(B)$$
where the factor $I = X^{-1}X$ has been inserted at a strategic point. On the one hand this factor makes no difference, and on the other hand it ‘does the trick’. More generally suppose $f(x) = x^i$ where $i$ is a positive integer. Inserting $i - 1$ factors $I = X^{-1}X$ gives
$$Xf(A)X^{-1} = XA^iX^{-1} = XAA\cdots AX^{-1} = XAIAI\cdots IAX^{-1} = XAX^{-1}XAX^{-1}\cdots XAX^{-1} = (XAX^{-1})^i = B^i = f(B).$$

From the above equations we see $XA^i = B^iX$, which will help us in the general case. Suppose now $f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$. Then
$$f(A) = a_nA^n + a_{n-1}A^{n-1} + \cdots + a_1A + a_0I \quad\text{and}\quad f(B) = a_nB^n + a_{n-1}B^{n-1} + \cdots + a_1B + a_0I$$
by Definition 5.8. Now $X(a_iA^i) = X(a_iI)A^i = a_iI(XA^i) = a_iI(B^iX) = (a_iB^i)X$ for $1 \le i \le n$, and $X(a_0I) = (a_0I)X$, as each scalar matrix $a_iI$ commutes with all matrices in $M_t(F)$. Adding up these $n + 1$ equations and using the distributive laws in the matrix ring $M_t(F)$ leads to
$$\begin{aligned} Xf(A) &= X(a_nA^n) + X(a_{n-1}A^{n-1}) + \cdots + X(a_1A) + X(a_0I) \\ &= (a_nB^n)X + (a_{n-1}B^{n-1})X + \cdots + (a_1B)X + (a_0I)X = f(B)X \end{aligned}$$
and so $Xf(A)X^{-1} = f(B)$. □
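Lemma 5.12 can be checked on the polynomial $f(x) = x^{23} - x + 1$ mentioned above. The Python sketch below assumes $F = \mathbb{Q}$. It computes both sides of $Xf(A)X^{-1} = f(B)$ exactly. The helper names and the matrices `A` and `X` are chosen by us for illustration.

```python
from fractions import Fraction


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def inverse(X):
    """Inverse over Q by Gauss-Jordan elimination on [X | I]."""
    t = len(X)
    M = [[Fraction(e) for e in row] + [Fraction(int(i == j)) for j in range(t)]
         for i, row in enumerate(X)]
    for col in range(t):
        piv = next(r for r in range(col, t) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [e / p for e in M[col]]
        for r in range(t):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[t:] for row in M]


def poly_at_matrix(f, A):
    """f(A) by Horner's rule, f given lowest degree first."""
    t = len(A)
    result = [[Fraction(0)] * t for _ in range(t)]
    for coeff in reversed(f):
        result = matmul(result, A)
        for i in range(t):
            result[i][i] += Fraction(coeff)
    return result


f = [1, -1] + [0] * 21 + [1]       # f(x) = x^23 - x + 1, lowest degree first
A = [[0, 1], [1, 1]]
X = [[2, 1], [1, 1]]               # det X = 1, so X is invertible
X_inv = inverse(X)
B = matmul(matmul(X, A), X_inv)    # B = X A X^{-1}, so A ~ B
lhs = matmul(matmul(X, poly_at_matrix(f, A)), X_inv)
print(lhs == poly_at_matrix(f, B))   # True, in line with Lemma 5.12
```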

From Lemma 5.12 we see that $A \sim B$ implies $f(A) \sim f(B)$ for all $f(x) \in F[x]$.

Our next theorem establishes the close connection between similarity of matrices and isomorphism of the corresponding modules.


Theorem 5.13

Let $A$ and $B$ be $t \times t$ matrices over the field $F$. Then $A$ and $B$ are similar if and only if $M(A)$ and $M(B)$ are isomorphic $F[x]$-modules.

Proof

Suppose $A \sim B$. There is an invertible $t \times t$ matrix $X$ with $X^{-1}AX = B$; to facilitate the notation we have replaced $A, A', X$ in Definition 5.3 by $A, B, X^{-1}$ respectively. Let $\theta : F^t \to F^t$ be the $F$-linear mapping determined by $X$, that is, $(v)\theta = vX$ for all $v \in F^t$. We show $\theta : M(A) \to M(B)$ to be $F[x]$-linear. As $v$ is an element of $M(A)$ and $(v)\theta$ is an element of $M(B)$, we have $f(x)v = vf(A)$ and $f(x)((v)\theta) = ((v)\theta)f(B)$. Using Lemma 5.12 and inserting a factor $I = XX^{-1}$ in the ‘right’ place gives
$$\begin{aligned} (f(x)v)\theta &= (vf(A))\theta = vf(A)X = vIf(A)X = vXX^{-1}f(A)X \\ &= vX(X^{-1}f(A)X) = vXf(X^{-1}AX) = vXf(B) \\ &= ((v)\theta)f(B) = f(x)((v)\theta). \end{aligned}$$

So $\theta : M(A) \to M(B)$ is $F[x]$-linear. In the same way $\varphi : M(B) \to M(A)$ is $F[x]$-linear, where $\varphi : F^t \to F^t$ is the $F$-linear mapping (Definition 5.8) determined by $X^{-1}$. As the matrices $X$ and $X^{-1}$ are inverses of each other, the same is true of the $F[x]$-linear mappings $\theta$ and $\varphi$, that is, $\varphi = \theta^{-1}$. Therefore $\theta : M(A) \cong M(B)$, that is, the $F[x]$-modules $M(A)$ and $M(B)$ are isomorphic.

Conversely suppose that the $F[x]$-modules $M(A)$ and $M(B)$ are isomorphic. Let $\theta : M(A) \cong M(B)$ be an isomorphism. As $\theta$ is $F[x]$-linear, $\theta : F^t \to F^t$ is certainly $F$-linear. Let $X$ be the matrix of $\theta$ relative to the standard basis $\mathcal{B}_0$ of $F^t$. Then $\theta$ is the $F$-linear mapping determined by $X$. As $\theta$ is invertible, so also is $X$. Consider $e_i \in M(A)$, so that $xe_i = e_iA$ for $1 \le i \le t$. Then $(e_i)\theta \in M(B)$ and so $x((e_i)\theta) = x(e_iX) = e_iXB$. As $\theta$ is $F[x]$-linear we have $(xe_i)\theta = x((e_i)\theta)$, that is, $e_iAX = e_iXB$ for $1 \le i \le t$, showing that corresponding rows of $AX$ and $XB$ are equal. Therefore $AX = XB$, that is, $X^{-1}AX = B$, and so $A$ and $B$ are similar. □

The three matrices $A$ in Examples 5.9a–5.9c have different determinants, namely $-1$, $1$ and $9$. No two of them are similar by Lemma 5.5, and so no two of the $\mathbb{Q}[x]$-modules $M(A)$ in Examples 5.9a–5.9c are isomorphic by Theorem 5.13.

The concepts of submodule (Definition 2.26) and quotient module (Lemma 2.27) were introduced in the context of $R$-modules where $R$ is a non-trivial commutative ring. We now discuss submodules $N$ of the ‘parent’ $F[x]$-modules $M(\alpha)$ and $M(A)$. Submodules play a crucial role in understanding how their parent modules are built up. Just as finite abelian groups are direct sums of cyclic subgroups (Theorem 3.4), we show in Theorem 6.5 that $M(A)$ decomposes analogously into a direct sum of cyclic submodules.

Definition 5.14

Let $N$ be a subset of an $F[x]$-module $M$ such that $N$ is a subgroup of the additive group of $M$ and $N$ is closed under polynomial multiplication: $f(x)v \in N$ for all $v \in N$ and all $f(x) \in F[x]$. Then $N$ is called a submodule of $M$.

The reader should check that Definition 2.26 becomes Definition 5.14 in the case $R = F[x]$. We have already met examples of submodules of $\mathbb{Q}[x]$-modules $M(A)$ in Examples 5.9a–5.9c. The next lemma tells us that submodules in this context are a special type of subspace.

Lemma 5.15

Let $F$ be a field. Let $N$ be a subset of an $F[x]$-module $M$. Then $N$ is a submodule of $M$ if and only if $N$ is a subspace of $M$ (regarded as a vector space over $F$) such that $xv \in N$ for all $v \in N$.

Let $M$ and $M'$ be $F[x]$-modules and let $\theta : M \to M'$ be an $F$-linear mapping satisfying $(xv)\theta = x((v)\theta)$ for all $v \in M$. Then $\theta$ is $F[x]$-linear.

Proof

Note that $M$ has the structure of a vector space over $F$ on ignoring the products $f(x)v$ for all polynomials $f(x)$ of degree at least 1 over $F$, but retaining the products $av$ for all constant polynomials $a$ over $F$ and all $v \in M$. In other words, on suppressing all occurrences of the indeterminate $x$, the $F[x]$-module $M$ drops its status to that of a mere $F$-module.

Let $N$ be a submodule of the $F[x]$-module $M$. Then $N$ is a subgroup of the additive group of $M$ by Definition 5.14. Taking $f(x) = a$ in Definition 5.14 gives $av \in N$ for all $a \in F$ and all $v \in N$, showing that $N$ is a subspace of the vector space $M$. Taking $f(x) = x$ in Definition 5.14 gives $xv \in N$ for all $v \in N$.

Conversely suppose that $N$ is a subspace of the vector space $M$ satisfying $xv \in N$ for all $v \in N$. Then, being a subspace, $N$ is a subgroup of the additive group of $M$, and $av \in N$ for all $a \in F$ and all $v \in N$. By induction $x^nv = x(x^{n-1}v) \in N$ for all positive integers $n$ and all $v \in N$. Consider a general polynomial $f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$ over $F$. As $v \in N$ implies that $x^nv, x^{n-1}v, \ldots, xv, v$ all belong to $N$, it follows that all linear combinations of these $n + 1$ elements of $N$ also belong to $N$, that is, $f(x)v = a_nx^nv + a_{n-1}x^{n-1}v + \cdots + a_1xv + a_0v \in N$. We conclude that $N$ is a submodule of the $F[x]$-module $M$ by Definition 5.14.


Both $M$ and $M'$ are vector spaces over $F$, as explained above. As $\theta$ satisfies $(xv)\theta = x((v)\theta)$ for all $v \in M$, we deduce $(x^2v)\theta = (x(xv))\theta = x((xv)\theta) = x(x((v)\theta)) = x^2((v)\theta)$. More generally $(x^iv)\theta = x^i((v)\theta)$ by induction on $i$. With $f(x)$ as above we obtain
$$\begin{aligned} (f(x)v)\theta &= ((a_nx^n + \cdots + a_1x + a_0)v)\theta = (a_nx^nv + \cdots + a_1xv + a_0v)\theta \\ &= a_n((x^nv)\theta) + \cdots + a_1((xv)\theta) + a_0((v)\theta) \\ &= a_nx^n((v)\theta) + \cdots + a_1x((v)\theta) + a_0((v)\theta) \\ &= (a_nx^n + \cdots + a_1x + a_0)((v)\theta) = f(x)((v)\theta) \end{aligned}$$
using the $F$-linearity of $\theta$. So $\theta$ is $F[x]$-linear. □

Let $N$ be a subset of a vector space $V$ over the field $F$ and let $\alpha : V \to V$ be $F$-linear. By Lemma 5.15, $N$ is a submodule of the $F[x]$-module $M(\alpha)$ if and only if $N$ is a subspace of $V$ with $xv \in N$ for all $v \in N$. As $xv = (v)\alpha$ in $M(\alpha)$, the condition that $N$ be closed under multiplication by $x$ is exactly the set inclusion $(N)\alpha \subseteq N$. A subspace $N$ of $V$ is called $\alpha$-invariant if $(N)\alpha \subseteq N$. Therefore

the submodules of $M(\alpha)$ are precisely the $\alpha$-invariant subspaces of $V$.

Let $A$ be a $t \times t$ matrix over the field $F$. A subspace $N$ of $F^t$ is called $A$-invariant if $vA \in N$ for all $v \in N$. As above we obtain:

the submodules of $M(A)$ are precisely the $A$-invariant subspaces of $F^t$.

The 1-dimensional $\alpha$-invariant subspaces of $V$ are $\langle v \rangle$ where $v$ is an eigenvector of the linear mapping $\alpha : V \to V$, and the 1-dimensional $A$-invariant subspaces of $F^t$ are $\langle v \rangle$ where $v$ is a row eigenvector of the $t \times t$ matrix $A$ over $F$.
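By the description above, testing whether a subspace is a submodule of $M(A)$ amounts to testing $A$-invariance. By linearity it is enough to check that $bA \in N$ for each vector $b$ in a spanning set of $N$. Below is a Python sketch over $\mathbb{Q}$ with hypothetical helper names. It decides membership of the span by comparing ranks.

```python
from fractions import Fraction


def rank(M):
    """Rank over Q by Gaussian elimination with exact fractions."""
    M = [[Fraction(e) for e in row] for row in M]
    r = 0
    for c in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def vec_mat(v, A):
    """Row vector times matrix: v A."""
    return [sum(v[k] * A[k][j] for k in range(len(v))) for j in range(len(A[0]))]


def is_invariant(basis, A):
    """True if the subspace N spanned by the rows of `basis` satisfies vA in N
    for all v in N; by linearity it suffices to test the spanning vectors."""
    r = rank(basis)
    return all(rank(basis + [vec_mat(b, A)]) == r for b in basis)


A = [[0, 1], [1, 0]]                                          # Example 5.9a
print(is_invariant([[1, 1]], A), is_invariant([[1, 0]], A))   # True False
```

In Example 5.9a the subspace $\langle e_1 + e_2 \rangle = N_1$ passes the test while $\langle e_1 \rangle$ does not, since $e_1A = e_2$.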

Definition 5.16

Let $\alpha : V \to V$ be a linear mapping of a vector space $V$ over a field $F$. Let $N$ be a submodule of the $F[x]$-module $M(\alpha)$. The mapping $\alpha|_N : N \to N$, defined by $(w)\alpha|_N = (w)\alpha$ for all $w \in N$, is called the restriction of $\alpha$ to $N$.

Note that $\alpha|_N$, as defined above, is a linear mapping of $N$ regarded as a vector space over $F$; in fact $\alpha|_N$ is an $F[x]$-linear mapping of the module $N$. Also $(w)\alpha|_N = xw$ for all $w \in N$, and $N = M(\alpha|_N)$ by Lemma 5.7. We need this set-theoretic concept and Definition 5.17 below to describe decompositions of $M(\alpha)$ into a direct sum of submodules.


Definition 5.17

Let $A_1$ be a $t_1 \times t_1$ matrix over $R$ and let $A_2$ be a $t_2 \times t_2$ matrix over $R$, where $R$ is a commutative ring. The partitioned $(t_1 + t_2) \times (t_1 + t_2)$ matrix
$$A_1 \oplus A_2 = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix}$$
over $R$ is called the direct sum of $A_1$ and $A_2$. That is, the $(i,j)$-entries in $A_1 \oplus A_2$ and $A_1$ are equal for $1 \le i, j \le t_1$, the $(t_1 + i, t_1 + j)$-entry in $A_1 \oplus A_2$ equals the $(i,j)$-entry in $A_2$ for $1 \le i, j \le t_2$, and all other entries in $A_1 \oplus A_2$ are zero.

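For example, with $R = \mathbb{Z}$,
$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \oplus \begin{pmatrix} 5 & 6 & 7 \\ 8 & 9 & 10 \\ 11 & 12 & 13 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 0 & 0 & 0 \\ 3 & 4 & 0 & 0 & 0 \\ 0 & 0 & 5 & 6 & 7 \\ 0 & 0 & 8 & 9 & 10 \\ 0 & 0 & 11 & 12 & 13 \end{pmatrix}.$$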
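Definition 5.17 translates directly into code. The Python sketch below builds $A_1 \oplus A_2 \oplus \cdots \oplus A_s$ for any number of square blocks, each given as a list of rows. The function name `direct_sum` is our own.

```python
def direct_sum(*blocks):
    """Block-diagonal matrix A_1 + ... + A_s (direct sum) of square blocks."""
    t = sum(len(b) for b in blocks)
    result = [[0] * t for _ in range(t)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                result[offset + i][offset + j] = entry
        offset += len(b)
    return result


A1 = [[1, 2], [3, 4]]
A2 = [[5, 6, 7], [8, 9, 10], [11, 12, 13]]
for row in direct_sum(A1, A2):
    print(row)
```

Accepting any number of blocks reflects the associativity of $\oplus$ noted below: `direct_sum(direct_sum(A1, A2), A3)`, `direct_sum(A1, direct_sum(A2, A3))` and `direct_sum(A1, A2, A3)` all produce the same matrix.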

Suppose that $A_j$ is a $t_j \times t_j$ matrix over a commutative ring $R$ for $1 \le j \le s$, where $s \ge 3$. The direct sum of matrices is associative, that is, $(A_1 \oplus A_2) \oplus A_3 = A_1 \oplus (A_2 \oplus A_3)$, and so this matrix is unambiguously denoted by $A_1 \oplus A_2 \oplus A_3$. More generally
$$A_1 \oplus A_2 \oplus \cdots \oplus A_s = \begin{pmatrix} A_1 & 0 & \ldots & 0 \\ 0 & A_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & A_s \end{pmatrix}$$
is the partitioned $t \times t$ matrix over $R$ having the given $s$ matrices $A_j$ on the diagonal and rectangular zero matrices elsewhere, where $t = t_1 + t_2 + \cdots + t_s$. Notice that $A_1 \oplus A_2 \sim A_2 \oplus A_1$ (see Exercises 5.1, Question 2(c)). The reader should know that $\det(A_1 \oplus A_2) = (\det A_1)(\det A_2)$ (see Exercises 5.1, Question 2(b)). As

$$xI - (A_1 \oplus A_2 \oplus \cdots \oplus A_s) = (xI - A_1) \oplus (xI - A_2) \oplus \cdots \oplus (xI - A_s),$$

on taking determinants we obtain

$$|xI - (A_1 \oplus A_2 \oplus \cdots \oplus A_s)| = |xI - A_1|\,|xI - A_2| \cdots |xI - A_s|,$$

showing that the characteristic polynomial of a direct sum of matrices is the product of the characteristic polynomials of the individual matrices.
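The product rule for characteristic polynomials can be checked numerically on the 5 × 5 example above. This is a hedged sketch: to keep it self-contained it computes det(xI − A) with the Faddeev–LeVerrier recurrence, a technique not used in the text, which divides by 1, 2, . . . , n and is therefore only valid in characteristic 0 (exact rationals come from Python's `fractions` module). Polynomials are stored as coefficient lists, lowest degree first, and all helper names are hypothetical.

```python
from fractions import Fraction


def direct_sum(*blocks):
    """A1 ⊕ ... ⊕ As for square matrices stored as lists of rows."""
    t = sum(len(b) for b in blocks)
    result = [[0] * t for _ in range(t)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                result[offset + i][offset + j] = entry
        offset += len(b)
    return result


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def charpoly(A):
    """Coefficients [c0, c1, ..., cn] of det(xI - A), lowest degree first.

    Faddeev-LeVerrier: M_k = A M_(k-1) + c_(n-k+1) I and c_(n-k) = -trace(A M_k)/k.
    """
    n = len(A)
    A = [[Fraction(a) for a in row] for row in A]
    c = [Fraction(0)] * n + [Fraction(1)]        # c_n = 1 (monic)
    M = [[Fraction(0)] * n for _ in range(n)]    # M_0 = 0
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = mat_mul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists, lowest degree first."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


A1 = [[1, 2], [3, 4]]
A2 = [[5, 6, 7], [8, 9, 10], [11, 12, 13]]
lhs = charpoly(direct_sum(A1, A2))
rhs = poly_mul(charpoly(A1), charpoly(A2))
print(lhs == rhs)
```

Here χ_{A1}(x) = x^2 − 5x − 2 and χ_{A2}(x) = x^3 − 27x^2 − 18x, and the script prints True.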


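The direct sum of matrices has a further useful property: it respects matrix addition and matrix multiplication. Specifically, let Aj and Bj be tj × tj matrices over a commutative ring R for 1 ≤ j ≤ s. Then

$$(A_1 \oplus A_2 \oplus \cdots \oplus A_s) + (B_1 \oplus B_2 \oplus \cdots \oplus B_s) = \begin{pmatrix} A_1 + B_1 & 0 & \cdots & 0 \\ 0 & A_2 + B_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_s + B_s \end{pmatrix},$$

$$(A_1 \oplus A_2 \oplus \cdots \oplus A_s)(B_1 \oplus B_2 \oplus \cdots \oplus B_s) = \begin{pmatrix} A_1B_1 & 0 & \cdots & 0 \\ 0 & A_2B_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_sB_s \end{pmatrix},$$

that is,

$$(A_1 \oplus A_2 \oplus \cdots \oplus A_s) + (B_1 \oplus B_2 \oplus \cdots \oplus B_s) = (A_1 + B_1) \oplus (A_2 + B_2) \oplus \cdots \oplus (A_s + B_s),$$

$$(A_1 \oplus A_2 \oplus \cdots \oplus A_s)(B_1 \oplus B_2 \oplus \cdots \oplus B_s) = (A_1B_1) \oplus (A_2B_2) \oplus \cdots \oplus (A_sB_s).$$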

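Taking Aj = Bj, using induction, and noting that scalar multiples and the identity matrix I = I ⊕ I ⊕ · · · ⊕ I also decompose block by block, the preceding equations combine to give

$$f(A_1 \oplus A_2 \oplus \cdots \oplus A_s) = f(A_1) \oplus f(A_2) \oplus \cdots \oplus f(A_s) \quad\text{for all } f(x) \in R[x]$$

(see Exercises 5.1, Question 2(d)).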

For instance, taking f(x) = x^3 + 1 and s = 2 we obtain (A1 ⊕ A2)^3 + I = (A1^3 + I) ⊕ (A2^3 + I), and so if f(A1) = 0 and f(A2) = 0, then also f(A1 ⊕ A2) = 0. The reader can verify that

$$A_1 = \begin{pmatrix} 0 & 1 \\ -1 & 1 \end{pmatrix} \quad\text{and}\quad A_2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 0 & 0 \end{pmatrix}$$

are two such matrices. Taking R = ℝ (the real field) we obtain (x^3 + 1)v = 0 for all v in M(A1 ⊕ A2), and so the only possibilities for the order of v in the ℝ[x]-module M(A1 ⊕ A2) are the monic divisors of x^3 + 1, that is, 1, x + 1, x^2 − x + 1, x^3 + 1.
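The blockwise evaluation f(A1 ⊕ A2) = f(A1) ⊕ f(A2) can be watched directly for f(x) = x^3 + 1 and the two matrices above. The sketch below evaluates a polynomial at a matrix by Horner's rule (our choice of method; any way of forming A^3 + I would do), assumes integer matrices stored as lists of rows, and uses hypothetical helper names.

```python
def direct_sum(*blocks):
    """A1 ⊕ ... ⊕ As for square matrices stored as lists of rows."""
    t = sum(len(b) for b in blocks)
    result = [[0] * t for _ in range(t)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                result[offset + i][offset + j] = entry
        offset += len(b)
    return result


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def poly_eval(coeffs, A):
    """f(A) = c0 I + c1 A + ... + cn A^n by Horner's rule; coeffs lowest degree first."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)   # multiply the partial result by A ...
        for i in range(n):
            result[i][i] += c         # ... then add c I
    return result


f = [1, 0, 0, 1]                      # f(x) = x^3 + 1
A1 = [[0, 1], [-1, 1]]
A2 = [[0, 1, 0], [0, 0, 1], [-1, 0, 0]]
print(poly_eval(f, A1), poly_eval(f, A2))
print(poly_eval(f, direct_sum(A1, A2)) == direct_sum(poly_eval(f, A1), poly_eval(f, A2)))
```

Both f(A1) and f(A2) come out as zero matrices, and so does f(A1 ⊕ A2).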


How do direct sums of matrices arise? The answer is: from decompositions of M(A) into a direct sum of submodules, as we explain in Lemma 5.19. However we must first deal with an important method of constructing bases. Let V be a vector space and suppose that B1 and B2 are bases of the finite-dimensional subspaces U1 and U2 of V. The ordered set B1 ∪ B2 consists of the vectors in B1 followed by the vectors in B2. The reader will be aware that, although the vectors in B1 ∪ B2 span U1 + U2, the totality of these vectors may not be linearly independent. In fact B1 ∪ B2 is a basis of U1 + U2 if and only if U1 ∩ U2 = 0, that is, if and only if U1 and U2 are independent (Definition 2.14) as additive subgroups of the additive group of V. Our next lemma deals with the general case.

Lemma 5.18

Let V be a vector space and let Bj be a basis of the finite-dimensional subspace Uj of V for 1 ≤ j ≤ s. Then B = B1 ∪ B2 ∪ · · · ∪ Bs is a basis of U = U1 + U2 + · · · + Us if and only if U1, U2, . . . , Us are independent.

Proof

Suppose that B is a basis of U. Let v1, v2, . . . , vt be the vectors of B. Write t′_0 = 0 and t′_j = t1 + t2 + · · · + tj where tj = dim Uj for 1 ≤ j ≤ s. The basis B of U is built up as follows: first come the t1 vectors of B1 in order, next come the t2 vectors of B2 in order, . . . , and lastly come the ts vectors of Bs in order. So t′_s = t1 + t2 + · · · + ts = dim U and Bj consists of the vectors vi for t′_{j−1} < i ≤ t′_j where 1 ≤ j ≤ s. To test U1, U2, . . . , Us for independence suppose u1 + u2 + · · · + us = 0 where uj ∈ Uj for 1 ≤ j ≤ s. As uj is a linear combination of the vectors in Bj there are scalars ai with

$$u_j = \sum_{v_i \in B_j} a_i v_i \quad\text{for } 1 \le j \le s.$$

Adding these s equations gives

$$\sum_{v_i \in B} a_i v_i = \sum_{j=1}^{s} \Big( \sum_{v_i \in B_j} a_i v_i \Big) = \sum_{j=1}^{s} u_j = 0.$$

By the linear independence of v1, v2, . . . , vt we deduce a1 = a2 = · · · = at = 0. So uj = 0 for 1 ≤ j ≤ s, proving the independence (Definition 2.14) of U1, U2, . . . , Us.

Conversely suppose the subspaces U1, U2, . . . , Us of V to be independent. Consider u ∈ U. There are uj ∈ Uj with u = u1 + u2 + · · · + us as U = U1 + U2 + · · · + Us. Now Bj is a basis of Uj and so there are scalars bi with

$$u_j = \sum_{v_i \in B_j} b_i v_i \quad\text{for } 1 \le j \le s.$$

Adding these s equations gives

$$u = \sum_{j=1}^{s} u_j = \sum_{j=1}^{s} \Big( \sum_{v_i \in B_j} b_i v_i \Big) = \sum_{i=1}^{t} b_i v_i,$$

showing that v1, v2, . . . , vt span U. To show that v1, v2, . . . , vt are linearly independent, suppose there are scalars a1, a2, . . . , at with a1v1 + a2v2 + · · · + atvt = 0. Write

$$u_j = \sum_{v_i \in B_j} a_i v_i \quad\text{for } 1 \le j \le s,$$

and then the preceding equation becomes u1 + u2 + · · · + us = 0. As uj ∈ Uj we deduce u1 = u2 = · · · = us = 0 from the independence of U1, U2, . . . , Us. The vectors in Bj are linearly independent and so uj = 0 implies ai = 0 for t′_{j−1} < i ≤ t′_j and 1 ≤ j ≤ s. Therefore a1 = a2 = · · · = at = 0, proving that v1, v2, . . . , vt are linearly independent. So B is a basis of U. □
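Lemma 5.18 reduces the independence of U1, U2, . . . , Us to a rank computation: the ordered union B1 ∪ · · · ∪ Bs always spans U1 + · · · + Us, so the subspaces are independent exactly when the stacked basis vectors are linearly independent. Below is a minimal sketch over Q, assuming vectors are given as tuples of integers or rationals (the helper names are hypothetical). The example shows why independence is a condition on all s subspaces together: the three lines below meet pairwise only in 0, yet (1, 1, 0) = (1, 0, 0) + (0, 1, 0).

```python
from fractions import Fraction


def rank(vectors):
    """Rank over Q of a list of row vectors (exact Gaussian elimination)."""
    m = [[Fraction(a) for a in v] for v in vectors]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def independent(*bases):
    """Given bases B1, ..., Bs of U1, ..., Us, the Uj are independent exactly when
    the ordered union B1 ∪ ... ∪ Bs is linearly independent (Lemma 5.18)."""
    union = [v for basis in bases for v in basis]
    return rank(union) == len(union)


U1, U2, U3 = [(1, 0, 0)], [(0, 1, 0)], [(1, 1, 0)]
print(independent(U1, U2), independent(U1, U3), independent(U2, U3))  # each pair
print(independent(U1, U2, U3))                                        # all three
```

The first line prints True three times and the second prints False.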

Let M be an F[x]-module which, as a vector space over the field F, has finite dimension t. Suppose M to be the internal direct sum of its submodules N1, N2, . . . , Ns, that is, M = N1 ⊕ N2 ⊕ · · · ⊕ Ns, meaning that each v ∈ M can be uniquely expressed as v = w1 + w2 + · · · + ws where wj ∈ Nj for 1 ≤ j ≤ s. The submodule Nj, being a subspace of the vector space M, is also finite-dimensional. Write tj = dim Nj and let Bj denote a basis of Nj for 1 ≤ j ≤ s. Applying Lemma 5.18 with U = V = M and Uj = Nj for 1 ≤ j ≤ s we see that B = B1 ∪ B2 ∪ · · · ∪ Bs is a basis of M. What is the matrix of α : M → M, given by (v)α = xv for all v ∈ M, relative to B? The answer, which involves the direct sum (Definition 5.17) of matrices over F, is in our next lemma.

Lemma 5.19

Using the above notation,

$$A_1 \oplus A_2 \oplus \cdots \oplus A_s$$

is the matrix of α : M → M relative to the basis B = B1 ∪ B2 ∪ · · · ∪ Bs of M = N1 ⊕ N2 ⊕ · · · ⊕ Ns, where Aj is the matrix of α|Nj : Nj → Nj relative to Bj for 1 ≤ j ≤ s.

Proof

Let v1, v2, . . . , vt be the vectors of B and denote the matrix of α relative to B by (bik), where 1 ≤ i, k ≤ t. As in the proof of Lemma 5.18 write t′_0 = 0 and t′_j = t1 + · · · + tj for 1 ≤ j ≤ s. Then Bj consists of the vectors vi for t′_{j−1} < i ≤ t′_j. For t′_{j−1} < i ≤ t′_j we know

$$(v_i)\alpha = b_{i1}v_1 + b_{i2}v_2 + \cdots + b_{it}v_t \in N_j$$

as Nj is α-invariant. Since (vi)α ∈ Nj is a linear combination of the vectors in Bj alone, and coordinates relative to the basis B are unique, the second suffix k of any non-zero bik satisfies t′_{j−1} < k ≤ t′_j. The above equation can therefore be expressed as

$$(v_i)\alpha|_{N_j} = \sum_{t'_{j-1} < k \le t'_j} b_{ik} v_k,$$

showing that bik is the (i − t′_{j−1}, k − t′_{j−1})-entry in Aj for such i and k by Definitions 5.1 and 5.16. We have now identified bik for all 1 ≤ i, k ≤ t and so (bik) = A1 ⊕ A2 ⊕ · · · ⊕ As by Definition 5.17. □

Finally we apply the preceding theory to the case M = M(A).


Corollary 5.20

Let A be a t × t matrix over a field F. Let α : F^t → F^t be the linear mapping determined by A. Suppose the F[x]-module M(A) has submodules N1, N2, . . . , Ns such that M(A) = N1 ⊕ N2 ⊕ · · · ⊕ Ns. Let Aj be the tj × tj matrix of α|Nj : Nj → Nj relative to a basis Bj of Nj, where tj = dim Nj for 1 ≤ j ≤ s. Let X be the invertible matrix over F having the vectors of B1 ∪ B2 ∪ · · · ∪ Bs as its rows. Then XAX^{−1} = A1 ⊕ A2 ⊕ · · · ⊕ As.

Proof

First note that α has matrix A relative to the standard basis B0 of F^t. By Lemma 5.18 we see B′ = B1 ∪ B2 ∪ · · · ∪ Bs is a basis of F^t. By Lemma 5.19 the matrix of α relative to B′ is A′ = A1 ⊕ A2 ⊕ · · · ⊕ As. From Lemma 5.2 we conclude that X satisfies XAX^{−1} = A′. □

With the above notation, Nj = M(α|Nj) and M(α|Nj) ≅ M(Aj), that is, the F[x]-modules Nj and M(Aj) are isomorphic for 1 ≤ j ≤ s (Exercises 5.1, Question 5). The decomposition of M(A) in Corollary 5.20 can therefore be expressed as the external direct sum

$$M(A) \cong M(A_1) \oplus M(A_2) \oplus \cdots \oplus M(A_s).$$

As an illustration consider

$$A = \begin{pmatrix} 0 & -1 & 1 & 0 \\ 1 & -1 & 1 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 \end{pmatrix}$$

over Q. We will see in Corollary 5.29 that each vector v0 of Q^4 generates a submodule N of M(A), and N is an A-invariant subspace of dimension equal to the degree of the order of v0 in M(A). Let N1 = {f(x)e1 : f(x) ∈ Q[x]}, that is, the elements of N1 are polynomial multiples of e1 = (1, 0, 0, 0). Now xe1 = e1A = (0, −1, 1, 0) and x^2e1 = x(xe1) = (0, −1, 1, 0)A = (0, 1, −1, 0), and so (x^2 + x)e1 = 0. As e1, xe1 are linearly independent, no non-zero polynomial of degree at most 1 annihilates e1, and so e1 has order x^2 + x in M(A). On dividing f(x) by x^2 + x we obtain f(x) = q(x)(x^2 + x) + r(x) where q(x), r(x) ∈ Q[x] and r(x) = a0 + a1x. So

$$\begin{aligned} f(x)e_1 &= (q(x)(x^2 + x) + r(x))e_1 = q(x)(x^2 + x)e_1 + r(x)e_1 \\ &= q(x)0 + r(x)e_1 = 0 + r(x)e_1 = r(x)e_1 = (a_0 + a_1x)e_1 = a_0e_1 + a_1xe_1 \end{aligned}$$

showing that e1, xe1 span N1. So the ordered set e1, xe1 is a basis B1 of the A-invariant subspace N1 of Q^4. In the same way let N2 = {f(x)e3 : f(x) ∈ Q[x]}, that


is, the elements of N2 are polynomial multiples of e3 = (0, 0, 1, 0). Now xe3 = e3A = (1, 0, 0, 1) and x^2e3 = x(xe3) = (1, 0, 0, 1)A = (0, 0, 0, 0) = 0. As e3, xe3 are linearly independent we conclude that e3 has order x^2 in M(A). As before e3, xe3 span N2 and so e3, xe3 is a basis B2 of N2. The reader may wonder why we have chosen e1 and e3 as generators (why not e1 and e2, for example?). The answer is: e1 and e3 are the simplest vectors such that M(A) = N1 ⊕ N2, as we now demonstrate. Construct the matrix X having the vectors of B1 ∪ B2 as its rows, that is

$$X = \begin{pmatrix} e_1 \\ xe_1 \\ e_3 \\ xe_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}.$$

As det X = −1 ≠ 0, the rows of X are the vectors of a basis B = B1 ∪ B2 of Q^4. So the union of B1 and B2 is a basis of Q^4, and this fact guarantees (Lemma 5.18) that N1 and N2 are independent and M(A) = N1 + N2, that is, M(A) = N1 ⊕ N2. Then

$$XAX^{-1} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} = C(x^2 + x) \oplus C(x^2)$$

where

$$C(x^2 + x) = \begin{pmatrix} 0 & 1 \\ 0 & -1 \end{pmatrix} \quad\text{and}\quad C(x^2) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$

are companion matrices (Definition 5.25). In fact N1 ≅ M(C(x^2 + x)) and N2 ≅ M(C(x^2)) using Exercises 5.1, Question 5. Finally note that e2 could not have been used instead of e3: here xe2 = (1, −1, 1, 1) and x^2e2 = (0, 1, −1, 0) = −xe1, so the submodule generated by e2 meets N1 non-trivially and its sum with N1 is not direct.

Here is a tip: similarity problems can generally be solved without explicitly finding the entries in X^{−1}; specifically it is good enough to verify det X ≠ 0 and XA = BX in order to show XAX^{−1} = B, where A, B and X are t × t matrices over a field.

In Section 6.1 a systematic method of determining generators of cyclic submodules Nj as in Corollary 5.20 is developed.
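The tip above can be followed mechanically for the worked example with the 4 × 4 matrix A. The sketch below (Python with exact rationals; the helper names are our own) forms X from e1, xe1, e3, xe3, computes det X and checks XA = BX with B = C(x^2 + x) ⊕ C(x^2), never forming X^{−1}.

```python
from fractions import Fraction


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def det(M):
    """Determinant over Q by exact Gaussian elimination."""
    m = [[Fraction(a) for a in row] for row in M]
    n, d = len(m), Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            d = -d                        # a row swap changes the sign
        d *= m[c][c]
        for i in range(c + 1, n):
            factor = m[i][c] / m[c][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[c])]
    return d


def times_x(v, A):
    """Multiplication by x in M(A): the row vector v goes to vA."""
    return [sum(v[k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]


A = [[0, -1, 1, 0], [1, -1, 1, 1], [1, 0, 0, 1], [0, 1, -1, 0]]
e1, e3 = [1, 0, 0, 0], [0, 0, 1, 0]
X = [e1, times_x(e1, A), e3, times_x(e3, A)]                   # rows e1, xe1, e3, xe3
B = [[0, 1, 0, 0], [0, -1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]  # C(x^2 + x) ⊕ C(x^2)
print(det(X))                          # -1, non-zero, so X is invertible
print(mat_mul(X, A) == mat_mul(B, X))  # XA = BX, hence XAX^(-1) = B
```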

EXERCISES 5.1

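1. (a) Let

$$A = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix} \quad\text{and}\quad X = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}$$

over Q. Calculate the entries in B = XAX^{−1} and verify that det A = det B and trace A = trace B. What is the characteristic polynomial of X^2AX^{−2}?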


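(b) The matrices

$$A = \begin{pmatrix} 3 - a & 7 \\ -1 & a \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} b & 1 \\ -1 & 2 \end{pmatrix}$$

over Q are similar. Find the possible values of a and b.

(c) Verify that

$$XAX^{-1} = \begin{pmatrix} 1 & a \\ 0 & -1 \end{pmatrix} \quad\text{where}\quad A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad\text{and}\quad X = \begin{pmatrix} 1 & -a/2 \\ 0 & 1 \end{pmatrix}$$

over the field ℝ of real numbers. Does the similarity class of A contain an infinite number of elements? (Yes/No). Show A ∼ B where

$$B = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$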

(d) Show that similarity ∼ is an equivalence relation on the set Mt(F) of all t × t matrices over the field F.

(e) The similarity class of A ∈ M2(F) consists of A alone, where F is a field. Use the invertible matrices

$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$$

to show

$$A = \begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix}$$

for some a ∈ F.

2. (a) Let A = (aij) be a t × t matrix over a non-trivial commutative ring R. For each permutation π of {1, 2, . . . , t} write sign π = ±1 according as π is even/odd. Then the determinant of A is defined by

$$\det A = \sum_{\pi} (\operatorname{sign}\pi)\, a_{1(1)\pi}\, a_{2(2)\pi} \cdots a_{t(t)\pi}$$

where the summation is over all t! permutations π of {1, 2, . . . , t}. Show that the coefficient of x^{t−1} in the monic polynomial χA(x) = det(xI − A) of degree t over R is −trace A, where trace A = a11 + a22 + · · · + att. Show that the constant term in χA(x) is χA(0) = (−1)^t det A.

(b) Let

$$A = (a_{ij}) = \begin{pmatrix} A_1 & B \\ 0 & A_2 \end{pmatrix}$$

be a t × t matrix over the commutative ring R, partitioned as indicated, where A1 and A2 are respectively t1 × t1 and t2 × t2 matrices, 0 is the t2 × t1 zero submatrix and t = t1 + t2. Use the above definition of determinant to show

$$\det A = (\det A_1)(\det A_2).$$

Deduce det(A1 ⊕ A2) = (det A1)(det A2).
Hint: Suppose first (i)π ≤ t1 < i for some i ∈ {1, 2, . . . , t}. What is the value of a_{i(i)π}? Next consider π such that i > t1 ⇒ (i)π > t1 and hence i ≤ t1 ⇒ (i)π ≤ t1.

(c) Let A1 and A2 be respectively t1 × t1 and t2 × t2 matrices over a field F. Specify an invertible t × t matrix X over F such that X(A1 ⊕ A2) = (A2 ⊕ A1)X, where t = t1 + t2. Deduce A1 ⊕ A2 ∼ A2 ⊕ A1.
Let A and B be respectively 2 × 2 and 3 × 3 matrices over F satisfying A ⊕ B = B ⊕ A. Show that A ⊕ B is a scalar matrix. More generally suppose A1 ⊕ A2 = A2 ⊕ A1 and let t = gcd{t1, t2}. Show that there is a t × t matrix A over F with A1 = A ⊕ A ⊕ · · · ⊕ A (t1/t terms) and A2 = A ⊕ A ⊕ · · · ⊕ A (t2/t terms).
Hint: Suppose not and consider the least integer t1 + t2 for which A does not exist.

(d) For 1 ≤ j ≤ s let Aj denote a tj × tj matrix over a non-trivial commutative ring R. Show

$$f(A_1 \oplus A_2 \oplus \cdots \oplus A_s) = f(A_1) \oplus f(A_2) \oplus \cdots \oplus f(A_s)$$

for all polynomials f(x) ∈ R[x].
Hint: Consider first the case s = 2 and use induction on deg f(x).

(e) Let M be an R-module where R is a non-trivial commutative ring. Let α : M → M and β : M → M be R-linear mappings and let a ∈ R. Show that α + β : M → M and αβ : M → M are R-linear. Show that aα : M → M, defined by (v)(aα) = a((v)α) for all v ∈ M, is R-linear. Does the set End M of all R-linear mappings α : M → M have the structure of (i) a ring (Yes/No), (ii) an R-module (Yes/No)?
Following Definition 5.6, for each polynomial

$$f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$$

over R, write

$$f(\alpha) = a_n\alpha^n + a_{n-1}\alpha^{n-1} + \cdots + a_1\alpha + a_0\iota$$

where ι : M → M is the identity mapping. Show by induction that (i) α^n is R-linear, (ii) f(α) is R-linear.
Let α ∈ End M. Show that εα : R[x] → End M is a ring homomorphism, where (f(x))εα = f(α) for all f(x) ∈ R[x], that is, εα is the evaluation at α homomorphism.

(f) Let V be a finite-dimensional vector space over the field F. Suppose dim V = t > 0 and let B be a basis of V. Write End_F V for the ring of all F-linear mappings α : V → V. Show that θ : End_F V ≅ Mt(F) is an F-linear ring isomorphism, where θ(α) is the matrix of α relative to B.
Hint: Model your proof on Theorem 3.15.

3. (a) Working in the Q[x]-module M(A), where

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},$$

express each of xe1, x^2e1, (x + 1)e1, (x − 1)e1 as a pair (a1, a2) ∈ Q^2, where e1 = (1, 0). Express in the same way xe2, x(x + 1)e2, x(e1 + e2), (x − 1)(e1 + e2), x(x − 1)(e1 + e2), where e2 = (0, 1). Write down the orders of e1, e2 and e1 + e2 in M(A). Is e1 + e2, x(e1 + e2) a basis of Q^2? Is e1 + e2 a generator of M(A)?
Which of ⟨e1⟩, ⟨e2⟩ and ⟨e1 + e2⟩ are submodules of M(A)? List the (four) submodules of M(A). Which rational numbers a1, a2 are such that v = a1e1 + a2e2 generates M(A)?

(b) Working in the Q[x]-module M(A), where

$$A = \begin{pmatrix} 4 & -2 \\ 5 & -3 \end{pmatrix},$$

find the orders of v1 = (1, −1) and v2 = (5, −2). Are these vectors row eigenvectors of A? (Yes/No). Write down the characteristic polynomial χA(x) of A. Verify that χA(A) is the zero matrix. Find the order of e1 in M(A). Does e1 generate M(A)? Which (four) monic polynomials over Q arise as orders of elements in M(A)? Let

$$X = \begin{pmatrix} e_1 \\ xe_1 \end{pmatrix} \quad\text{and}\quad Y = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}.$$

Verify that X and Y are invertible over Q and calculate the matrices C and D satisfying XA = CX and YA = DY. Are C and D both similar to A? (Yes/No).

(c) Let A be a t × t matrix over a field F. Suppose that the orders of e1, e2, . . . , et and e1 + e2 + · · · + et in M(A) have degree 1. Show that A is a scalar matrix, that is, A = λI for some λ ∈ F. Let B be a 2 × 2 matrix over F which is not a scalar matrix. Deduce that M(B) is cyclic, being generated by one of e1, e2, e1 + e2.

(d) Let V be a vector space over a field F and let α : V → V be a linear mapping. Suppose v ∈ V. Show K = {f(x) ∈ F[x] : (v)f(α) = 0} to be an ideal of F[x]. K is the order ideal (Definition 5.11) of v in M(α). Show that V finite-dimensional implies K non-zero.

(e) Let A be a t × t matrix over a field F and suppose m(A) = 0 where m(x) is a monic polynomial over F. Show that F^t has the structure of an F[x]/⟨m(x)⟩-module M′ on defining (f(x) + ⟨m(x)⟩)v = vf(A) for all f(x) ∈ F[x], v ∈ F^t.
Hint: Start by showing that f̄(x)v is unambiguously defined by the r.h.s. of the above equation, where f̄(x) = f(x) + ⟨m(x)⟩.
Suppose m(x) is irreducible over F. Deduce that M′ is a vector space over the field F′ = F[x]/⟨m(x)⟩. Let v1, v2, . . . , v_{t′} form a basis of M′ over F′. Show that the (deg m(x)) · t′ vectors x^i vj for 0 ≤ i < deg m(x), 1 ≤ j ≤ t′, form a basis of F^t. Deduce t/t′ = deg m(x).


4. (a) Find xe1 and x^2e1 in the Q[x]-module M(A) where

$$A = \begin{pmatrix} 0 & 2 & -1 \\ 2 & 3 & -2 \\ 1 & 2 & -2 \end{pmatrix}.$$

Are e1, xe1 linearly dependent? Are e1, xe1, x^2e1 linearly dependent? Show that e1 has order (x + 1)(x − 3) in M(A). Are (x + 1)e1 and (x − 3)e1 row eigenvectors of A? Which elements v of M(A) satisfy (x + 1)(x − 3)v = 0? Is M(A) a cyclic Q[x]-module? Determine the order of (1, 0, −1) in M(A) and hence construct an invertible matrix X over Q such that XAX^{−1} is diagonal.
Hint: The rows of X are linearly independent (row) eigenvectors of A.

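(b) Find xe1 and x^2e1 in the Q[x]-module M(A) where

$$A = \begin{pmatrix} 2 & 2 & 1 \\ -1 & -1 & -1 \\ 1 & 2 & 2 \end{pmatrix}.$$

Show that e1 has order (x − 1)^2 in M(A). Find the order of e1 + e2 in M(A). Show that

$$X = \begin{pmatrix} e_1 + e_2 \\ e_1 \\ xe_1 \end{pmatrix}$$

is invertible over Q and satisfies

$$XAX^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 2 \end{pmatrix}.$$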

(c) Let α : V → V be a linear mapping of the vector space V over the field F. Use Lemma 5.15 to show that all subspaces N of V with N ⊆ ker α are α-invariant. The subspace N′ of V satisfies im α ⊆ N′. Show that N′ is α-invariant.
Suppose rank α = 1. Show that there are no further α-invariant subspaces.
Let λ ∈ F and let N be a subspace of V. Show that N is α-invariant if and only if N is (α − λι)-invariant, where ι is the identity mapping of V.


(d) Let α : F^3 → F^3 be the linear mapping determined by

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

over an arbitrary field F. Show that every α-invariant subspace N of F^3 satisfies either ⟨e1⟩ ⊆ N or N ⊆ ⟨e2, e3⟩. List the ten α-invariant subspaces of F^3 in the case F = Z2.

(e) Let α : F^3 → F^3 be the linear mapping determined by

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

over the arbitrary field F. Determine the α-invariant subspaces N of F^3 and list these subspaces in the case F = Z2.

(f) Specify the α-invariant subspaces of Q^3 where α is the linear mapping determined by the matrix A of Question 4(a) above.
Hint: Consider α + ι.
Specify the α-invariant subspaces of Q^3 where α is the linear mapping determined by the matrix A of Question 4(b) above.

5. Let A be a t × t matrix over a field F and let α : F^t → F^t be given by (v)α = vA for all v ∈ F^t. Let N be a submodule of the F[x]-module M(A). Let u1, u2, . . . , us be a basis of N and let B be the s × s matrix of α|N relative to this basis. For u ∈ N write (u)β = (a1, a2, . . . , as) ∈ F^s, where u = a1u1 + a2u2 + · · · + asus. Using Lemma 5.15 show β : N ≅ M(B), that is, show β to be an isomorphism between the F[x]-modules N and M(B).

5.2 Cyclic Modules and Companion Matrices

Just as every finite abelian group decomposes into a direct sum of cyclic subgroups (Theorem 3.4), so every F[x]-module M(A) as in Definition 5.8 decomposes into a direct sum of cyclic submodules. As preparation for the fundamental theorem (Theorem 6.5) we analyse cyclic F[x]-modules and the corresponding matrices, companion matrices, which explicitly determine cyclic modules. Companion matrices have many agreeable properties: their characteristic polynomials can be read off from the entries in the last row and they provide standard examples of cyclic modules. Submodules of cyclic F[x]-modules are also cyclic and easy to ‘pin down’, compared to those of a general F[x]-module M(A), and knowledge of them will help our study of variants of the rational canonical form in Section 6.2.

Definition 5.21

Let M be an R-module where R is a non-trivial commutative ring. Suppose M contains an element v0 such that each element of M is expressible as rv0 for some r ∈ R. Then M is said to be cyclic with generator v0.

Taking R = Z we obtain the concept (Definition 2.1) of a cyclic abelian group. Taking R = F[x] we obtain the important concept of a cyclic F[x]-module, examples of which we have already met in Examples 5.9a and 5.9b.

Let R be a non-trivial commutative ring. Then R is a free R-module of rank 1, that is, R is a cyclic R-module being generated by its 1-element 1 of order ideal {0}. The submodules of R are precisely the ideals K of R; such an ideal need not be a cyclic R-module, although it always is when R is a PID. However R/K is a cyclic R-module being generated by its 1-element K + 1, where K is an ideal of R.

By Theorem 2.5 every cyclic Z-module is isomorphic to the additive group of the ring Z/⟨n⟩ where n ≥ 0. Our first lemma generalises Theorem 2.5 to include cyclic R-modules and introduces the standard examples of cyclic F[x]-modules.

Lemma 5.22

Let M be a cyclic R-module with generator v0 where R is a non-trivial commutative ring. The mapping θ : R → M, where (r)θ = rv0 for all r ∈ R, is R-linear and surjective. Write K = ker θ. The mapping θ̃ : R/K ≅ M, where (K + r)θ̃ = (r)θ for all r ∈ R, is an isomorphism of R-modules.

Suppose R = F[x] where F is a field. Let v0 have order d0(x) in M. Then θ̃ : F[x]/⟨d0(x)⟩ ≅ M.

Proof

As module laws 5 and 6 are obeyed in M we see (r1 + r2)θ = (r1 + r2)v0 = r1v0 + r2v0 = (r1)θ + (r2)θ and (rr1)θ = (rr1)v0 = r(r1v0) = r((r1)θ) for all r, r1, r2 ∈ R. So θ is R-linear. As v0 is a generator (Definition 5.21) of M, every element of M is rv0 = (r)θ for some r ∈ R, showing im θ = M, that is, θ is surjective. From the first isomorphism theorem for R-modules (Corollary 2.28) we deduce θ̃ : R/K ≅ M where K = ker θ.

Now take R = F[x]. Then K = {f(x) ∈ F[x] : f(x)v0 = 0} is the order ideal (Definition 5.11) of v0 in M. By hypothesis K = ⟨d0(x)⟩ where d0(x) is either monic or zero. In any case θ̃ : F[x]/⟨d0(x)⟩ ≅ M. □


Let us look at a few examples. For instance M = Q[x]/⟨x^3 + 1⟩ is a cyclic Q[x]-module having generator K + 1 of order x^3 + 1 by Lemma 5.22, where K = ⟨x^3 + 1⟩. By Theorem 4.1 with g(x) = x^3 + 1 we see that each element of M is of the form K + r(x) where r(x) ∈ Q[x], deg r(x) < 3. In fact M is a 3-dimensional vector space over Q with basis K + 1, K + x, K + x^2. In the same way M′ = Z3[x]/⟨x^4⟩ is a cyclic Z3[x]-module being generated by v0 = ⟨x^4⟩ + 1 of order x^4; also M′ is a 4-dimensional vector space over Z3 with basis v0, xv0, x^2v0, x^3v0 and |M′| = 3^4 = 81. The general case is discussed in Theorem 5.24. However we note in passing that F[x]/⟨0(x)⟩ is exceptional: it is the polynomial analogue of an infinite cyclic group, being a free F[x]-module of rank 1, and also it is a countably infinite-dimensional vector space over F: the elements of F[x]/⟨0(x)⟩ are singletons (subsets ⟨0(x)⟩ + f(x) = {f(x)} of F[x] having exactly one element) and {1}, {x}, {x^2}, . . . , {x^n}, . . . is a countable F-basis of F[x]/⟨0(x)⟩.

The theory of finite abelian groups depends crucially on Lemma 2.7. The analogous lemma for F[x]-modules is stated next; it is equally important and can be proved in the same way (Exercises 5.2, Question 3(a)).

Lemma 5.23

Let v be an element of an F[x]-module M having monic order d(x). Then f(x)v has order d(x)/gcd{f(x), d(x)} in M, where f(x) ∈ F[x].
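Lemma 5.23 turns order calculations into polynomial arithmetic. As a sketch under stated assumptions, the code below works in Z_p[x] for a prime p (here p = 3, matching the module M′ = Z3[x]/⟨x^4⟩ used in the example that follows), stores polynomials as coefficient lists with the lowest degree first, and inverts leading coefficients by Fermat's little theorem via Python's three-argument `pow`; all function names are hypothetical.

```python
P = 3  # the prime p; we work in Z_p[x], here Z_3[x]


def trim(f):
    """Reduce coefficients mod P and drop zero leading coefficients ([] is the zero polynomial)."""
    f = [c % P for c in f]
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_divmod(f, g):
    """Quotient and remainder of f by a non-zero g in Z_P[x] (coefficients lowest degree first)."""
    f, g = trim(f), trim(g)
    q = [0] * max(len(f) - len(g) + 1, 1)
    inv_lead = pow(g[-1], P - 2, P)          # inverse of g's leading coefficient (P prime)
    while len(f) >= len(g):
        shift = len(f) - len(g)
        c = f[-1] * inv_lead % P
        q[shift] = c
        # subtract c x^shift g(x), which kills the leading term of f
        f = trim([a - c * (g[i - shift] if i >= shift else 0) for i, a in enumerate(f)])
    return trim(q), f


def poly_gcd(f, g):
    """Monic gcd of f and g (not both zero) by the Euclidean algorithm."""
    f, g = trim(f), trim(g)
    while g:
        f, g = g, poly_divmod(f, g)[1]
    inv_lead = pow(f[-1], P - 2, P)
    return trim([c * inv_lead for c in f])


def order_of_multiple(f, d):
    """Lemma 5.23: if v has monic order d(x) then f(x)v has order d(x)/gcd{f(x), d(x)}."""
    return poly_divmod(d, poly_gcd(f, d))[0]


x4 = [0, 0, 0, 0, 1]                      # d0(x) = x^4
print(order_of_multiple([0, 1], x4))      # [0, 0, 0, 1], that is, x^3 is the order of x v0
print(order_of_multiple([1, 0, 1], x4))   # [0, 0, 0, 0, 1], that is, x^4 is the order of (x^2 + 1) v0
```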

For example we know v0 = ⟨x^4⟩ + 1 has order x^4 in the Z3[x]-module M′ above. So xv0 has order x^4/gcd{x, x^4} = x^4/x = x^3 in M′, and (x^2 + 1)v0 has order x^4/gcd{x^2 + 1, x^4} = x^4/1 = x^4 in M′, using Lemma 5.23. This means that (x^2 + 1)v0 is also a generator of M′ simply because it has the ‘right’ order, as we show next.

Theorem 5.24

Let F be a field and let M be a cyclic F[x]-module with generator v0 of monic order d0(x). Then each v in M has order d(x) where d(x) | d0(x). Further, v is a generator of M if and only if v has order d0(x). Also M is a t-dimensional vector space over F with basis B_{v0} consisting of v0, xv0, x^2v0, . . . , x^{t−1}v0, where t = deg d0(x).

Proof

By Definition 5.21 there is f(x) ∈ F[x] with v = f(x)v0. So d0(x)v = d0(x)f(x)v0 = f(x)d0(x)v0 = f(x)0 = 0, showing that d0(x) is in the order ideal K of v in M. So K = ⟨d(x)⟩ for some monic polynomial d(x) over F by Theorem 4.4, that is, v has order d(x) by Definition 5.11, and d(x) | d0(x).


Suppose v is a generator of M. Interchanging the roles of v and v0 in the preceding paragraph gives d0(x) | d(x). So d(x) = d0(x), as the monic polynomials d(x) and d0(x) are such that each is a divisor of the other.

Suppose d(x) = d0(x), that is, v has the same order as the generator v0 of M. As v = f(x)v0, by Lemma 5.23 we see d0(x) = d0(x)/gcd{f(x), d0(x)}. Therefore gcd{f(x), d0(x)} = 1. By Corollary 4.6 there are polynomials a1(x) and a2(x) over F with a1(x)f(x) + a2(x)d0(x) = 1. So

$$\begin{aligned} v_0 = 1v_0 &= (a_1(x)f(x) + a_2(x)d_0(x))v_0 \\ &= a_1(x)f(x)v_0 + a_2(x)d_0(x)v_0 = a_1(x)v \end{aligned}$$

as d0(x)v0 = 0. Therefore v0 is a polynomial multiple of v. This is good news as it is now only a small step to show that v generates M. Consider v′ ∈ M. As v0 generates M, by Definition 5.21 there is f′(x) ∈ F[x] with v′ = f′(x)v0. So v′ = f′(x)a1(x)v, showing that v is a generator of M as f′(x)a1(x) ∈ F[x].

The vectors v0, xv0, x^2v0, . . . , x^{t−1}v0 belong to the vector space M over F. Could these vectors be linearly dependent? If so there are a0, a1, . . . , a_{t−1} ∈ F, not all zero, with a0v0 + a1xv0 + a2x^2v0 + · · · + a_{t−1}x^{t−1}v0 = 0. Write a(x) = a0 + a1x + a2x^2 + · · · + a_{t−1}x^{t−1}. Then a(x)v0 = 0 and so a(x) ∈ ⟨d0(x)⟩, that is, a(x) belongs to the order ideal of v0. Therefore d0(x) | a(x), which gives t = deg d0(x) ≤ deg a(x) as a(x) ≠ 0(x). But deg a(x) < t. This contradiction shows that v0, xv0, x^2v0, . . . , x^{t−1}v0 are linearly independent, and we denote this ordered set of t vectors by B_{v0}. Does B_{v0} span M? Each v in M is expressible as v = f(x)v0. Dividing f(x) by d0(x), there are q(x) and r(x) in F[x] with f(x) = q(x)d0(x) + r(x) where deg r(x) < t. Write r(x) = r0 + r1x + r2x^2 + · · · + r_{t−1}x^{t−1}. Using d0(x)v0 = 0 we obtain

$$\begin{aligned} v = f(x)v_0 &= (q(x)d_0(x) + r(x))v_0 = q(x)d_0(x)v_0 + r(x)v_0 \\ &= r(x)v_0 = r_0v_0 + r_1xv_0 + r_2x^2v_0 + \cdots + r_{t-1}x^{t-1}v_0 \end{aligned}$$

showing that each vector v in M is expressible as a linear combination of the vectors in B_{v0}, that is, B_{v0} spans M. Therefore B_{v0} is an F-basis of M and dim M = t. □
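The spanning half of the proof is effectively an algorithm: to find the coordinates of f(x)v0 relative to B_{v0}, divide f(x) by d0(x) and read off the remainder. A minimal Python sketch over Q, assuming polynomials are lists of coefficients with the lowest degree first (the name `coordinates` is hypothetical):

```python
from fractions import Fraction


def coordinates(f, d0):
    """Coordinates (r0, ..., r_(t-1)) of f(x)v0 relative to B_v0 = v0, xv0, ..., x^(t-1)v0,
    where v0 has monic order d0(x) of degree t.  As in the proof, f(x)v0 = r(x)v0
    where r(x) is the remainder on dividing f(x) by d0(x)."""
    t = len(d0) - 1
    if t < 1 or d0[-1] != 1:
        raise ValueError("d0(x) must be monic of positive degree")
    r = [Fraction(c) for c in f]
    # long division: cancel the coefficient of x^(k+t) by subtracting a multiple of x^k d0(x)
    for k in range(len(r) - 1 - t, -1, -1):
        c = r[k + t]
        if c != 0:
            for i, b in enumerate(d0):
                r[k + i] -= c * b
    return (r + [Fraction(0)] * t)[:t]


d0 = [1, 0, 0, 1]                          # d0(x) = x^3 + 1, as in Q[x]/<x^3 + 1>
print(coordinates([2, 0, 0, 0, 1], d0))    # f(x) = x^4 + 2
```

For M = Q[x]/⟨x^3 + 1⟩ and f(x) = x^4 + 2 the remainder is 2 − x, because x^4 = x · x^3 leaves remainder −x on division by x^3 + 1; so f(x)v0 = 2v0 − xv0 and the coordinates are (2, −1, 0).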

We have used the basis B_{e1} of M(A) with t = 2 in Examples 5.9a and 5.9b. In Section 6.1 bases of the type B_{v0} are exactly what is needed to construct an invertible matrix X with XAX^{−1} in rational canonical form (Definition 6.4).

Suppose that the cyclic F[x]-module M is also a t-dimensional vector space over F. Let v0 of order d0(x) be a generator of M. Then d0(x) is monic of degree t by Theorem 5.24, that is, d0(x) = x^t + b_{t−1}x^{t−1} + · · · + b1x + b0. Let α : M → M be the F-linear mapping defined by (v)α = xv for all v ∈ M. What is the matrix of α relative to B_{v0}? Write vi = x^i v0 for 0 ≤ i < t. Then B_{v0} is the F-basis v0, v1, v2, . . . , v_{t−1} of the vector space M. As

$$(v_0)\alpha = xv_0, \quad (xv_0)\alpha = x^2v_0, \quad (x^2v_0)\alpha = x^3v_0, \quad \ldots, \quad (x^{t-2}v_0)\alpha = x^{t-1}v_0$$

we see (vi)α = v_{i+1} for 0 ≤ i < t − 1, that is, α maps each vector in B_{v0} into the next vector in B_{v0} (when there is a next one). The only thing outstanding is: what does α ‘do’ to v_{t−1}? To answer this question we must express (v_{t−1})α = (x^{t−1}v0)α = x(x^{t−1}v0) = x^t v0 as a linear combination of v0, v1, v2, . . . , v_{t−1}, that is, of v0, xv0, x^2v0, . . . , x^{t−1}v0. As v0 has order d0(x) we know d0(x)v0 = 0, that is,

$$x^tv_0 + b_{t-1}x^{t-1}v_0 + \cdots + b_1xv_0 + b_0v_0 = 0,$$

which on rearranging gives

$$(v_{t-1})\alpha = -b_0v_0 - b_1v_1 - \cdots - b_{t-1}v_{t-1}$$

and this is the equation we are looking for.

Definition 5.25

The t × t matrix over the field F

$$C(d_0(x)) = \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \\ -b_0 & -b_1 & -b_2 & -b_3 & \cdots & -b_{t-1} \end{pmatrix}$$

is called the companion matrix of d0(x) = x^t + b_{t−1}x^{t−1} + · · · + b1x + b0 over F.
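Definition 5.25 translates directly into code. In this sketch (Python with exact rationals; the function name and the coefficient-list convention, lowest degree first, are our own assumptions) a non-monic input is rejected, matching the remark below that such polynomials have no companion matrix.

```python
from fractions import Fraction


def companion(d0):
    """Companion matrix C(d0(x)) of Definition 5.25.

    d0(x) = x^t + b_(t-1) x^(t-1) + ... + b1 x + b0 is passed as the list
    [b0, b1, ..., b_(t-1), 1] (lowest degree first); it must be monic.
    """
    if len(d0) < 2 or d0[-1] != 1:
        raise ValueError("only monic polynomials of positive degree have a companion matrix")
    t = len(d0) - 1
    C = [[Fraction(0)] * t for _ in range(t)]
    for i in range(t - 1):
        C[i][i + 1] = Fraction(1)                # a 1 just to the right of the diagonal
    C[t - 1] = [-Fraction(b) for b in d0[:t]]    # last row: -b0, -b1, ..., -b_(t-1)
    return C


for row in companion([1, 0, 0, 1]):             # d0(x) = x^3 + 1
    print([str(entry) for entry in row])
```

Applied to x^3 + 1 it returns the 3 × 3 matrix A2 used earlier in the example with f(x) = x^3 + 1.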

Companion matrices are the building blocks for the rational canonical form of a general square matrix A over F. Notice that a polynomial which isn't monic doesn't have a companion matrix. It is worth pointing out that, in spite of the notation, C(d0(x)) is a matrix over F. Specifically note

C(d0(x)) is the matrix of α relative to the basis B_{v0} of the cyclic F[x]-module M with generator v0 of order d0(x)

on referring back to the discussion preceding Definition 5.25.

We have already seen in Examples 5.9a and 5.9b that companion matrices provide a ready source of cyclic modules. The main properties of companion matrices are established in the following theorem, which involves the F[x]-module M(C) (see Definition 5.8) determined by the companion matrix C.


Theorem 5.26

Write C = C(d0(x)) where d0(x) is a monic polynomial of positive degree t over F. Then M(C) is a cyclic F[x]-module with generator e1 of order d0(x). Further, d0(x) is the characteristic polynomial of C and d0(C) = 0.

Proof

Let d0(x) = x^t + b_{t−1}x^{t−1} + · · · + b1x + b0 where bi ∈ F for 0 ≤ i < t. As eiC = e_{i+1} for 1 ≤ i < t we see xei = e_{i+1} in M(C). By induction e_{i+1} = x^ie1 for 0 ≤ i < t. So B_{e1}, consisting of e1, xe1, x^2e1, . . . , x^{t−1}e1, is the standard basis of F^t. As B_{e1} spans F^t we see that M(C) is cyclic with generator e1. From Definition 5.25 the last row of the t × t matrix C is etC = −(b0, b1, . . . , b_{t−1}), that is,

$$x^te_1 = x(x^{t-1}e_1) = xe_t = e_tC = -(b_0 + b_1x + \cdots + b_{t-1}x^{t-1})e_1,$$

which rearranges to produce d0(x)e1 = 0. Let K be the order ideal of e1 in M(C). Then d0(x) ∈ K and e1 has order d(x) in M(C), where d(x) | d0(x) by Definition 5.11. As M(C) is a t-dimensional vector space over F we see deg d(x) = t by Theorem 5.24. Therefore d(x) = d0(x) and so K = ⟨d0(x)⟩, showing that e1 has order d0(x) in M(C). The characteristic polynomial of C is

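$$\chi_C(x) = \det(xI - C) = \begin{vmatrix} x & -1 & 0 & \cdots & 0 & 0 \\ 0 & x & -1 & \cdots & 0 & 0 \\ 0 & 0 & x & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & x & -1 \\ b_0 & b_1 & b_2 & \cdots & b_{t-2} & x + b_{t-1} \end{vmatrix}.$$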

We perform the column operation c1 + (xc2 + x^2c3 + · · · + x^{t−1}ct), that is, to col 1 add x^{i−1} col i for 1 < i ≤ t, on the matrix xI − C. This column operation, being the composition of t − 1 ecos of type (iii), leaves the determinant unchanged and produces a new col 1 with only one non-zero entry, namely d0(x) in row t. So

$$\chi_C(x) = \begin{vmatrix} 0 & -1 & 0 & \cdots & 0 & 0 \\ 0 & x & -1 & \cdots & 0 & 0 \\ 0 & 0 & x & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & x & -1 \\ d_0(x) & b_1 & b_2 & \cdots & b_{t-2} & x + b_{t-1} \end{vmatrix} = (-1)^{t+1} d_0(x) \begin{vmatrix} -1 & 0 & \cdots & 0 & 0 \\ x & -1 & \cdots & 0 & 0 \\ 0 & x & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & x & -1 \end{vmatrix}$$

on expanding along col 1. The above (t − 1) × (t − 1) determinant is lower triangular with every diagonal entry equal to −1, so it equals (−1)^{t−1}, and so

$$\chi_C(x) = (-1)^{t+1} d_0(x) (-1)^{t-1} = (-1)^{2t} d_0(x) = d_0(x),$$

showing that C(d0(x)) has characteristic polynomial d0(x).

In the F[x]-module M(C) we know d0(x)ei = eid0(C), which is row i of the t × t matrix d0(C) for 1 ≤ i ≤ t. But ei = x^{i−1}e1 and d0(x)e1 = 0 in M(C). Therefore d0(x)ei = d0(x)x^{i−1}e1 = x^{i−1}d0(x)e1 = x^{i−1}0 = 0 and so eid0(C) = 0 for 1 ≤ i ≤ t. We have shown d0(C) = 0, that is, d0(C) is the t × t zero matrix, as each of its rows is the zero vector of F^t. □
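Both conclusions of Theorem 5.26, χ_C(x) = d0(x) and d0(C) = 0, can be spot-checked by machine. The sketch below is self-contained but does not follow the proof: it computes the characteristic polynomial with the Faddeev–LeVerrier recurrence (valid in characteristic 0, so it works over Q with Python's `fractions`) and evaluates d0(C) by Horner's rule. The test polynomial d0(x) = x^4 − 3x^2 + x/2 + 7 is an arbitrary choice, and the helper names are hypothetical.

```python
from fractions import Fraction


def companion(d0):
    """C(d0(x)) for monic d0 given as [b0, ..., b_(t-1), 1], lowest degree first."""
    t = len(d0) - 1
    C = [[Fraction(int(j == i + 1)) for j in range(t)] for i in range(t - 1)]
    C.append([-Fraction(b) for b in d0[:t]])
    return C


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def charpoly(A):
    """det(xI - A) as [c0, ..., cn] via the Faddeev-LeVerrier recurrence (characteristic 0)."""
    n = len(A)
    c = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)] for i in range(n)]
        AM = mat_mul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c


def poly_eval(coeffs, A):
    """f(A) by Horner's rule, coefficients lowest degree first."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for a in reversed(coeffs):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += a
    return result


d0 = [7, Fraction(1, 2), -3, 0, 1]        # d0(x) = x^4 - 3x^2 + x/2 + 7
C = companion(d0)
print(charpoly(C) == d0)                  # chi_C(x) = d0(x)
print(all(entry == 0 for row in poly_eval(d0, C) for entry in row))   # d0(C) = 0
```

Both lines print True, and the theorem asserts the same for every monic d0(x) of positive degree.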

Theorem 5.26 is brimming with useful information. First, the F[x]-module M(C(d0(x))) is cyclic with generator e1, and Corollary 5.27 below shows that every cyclic F[x]-module M(A) is isomorphic to M(C(χA(x))).

Second, should we wish to construct a matrix with a given characteristic polynomial, then the companion matrix gives an immediate answer. For example, find a matrix C over Q with characteristic polynomial x^2 + (2/3)x − (4/5). An answer is

$$C = \begin{pmatrix} 0 & 1 \\ 4/5 & -2/3 \end{pmatrix}.$$

Thirdly, every companion matrix ‘satisfies’ its own characteristic polynomial. The reader can verify that C above satisfies

$$C^2 + (2/3)C - (4/5)I = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$
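The verification asked of the reader can be done in exact arithmetic. A short Python sketch, assuming nothing beyond the standard `fractions` module:

```python
from fractions import Fraction as Fr

C = [[Fr(0), Fr(1)], [Fr(4, 5), Fr(-2, 3)]]
C2 = [[sum(C[i][k] * C[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
# C^2 + (2/3)C - (4/5)I, entry by entry
result = [[C2[i][j] + Fr(2, 3) * C[i][j] - (Fr(4, 5) if i == j else 0) for j in range(2)]
          for i in range(2)]
print(result)
# for a 2 x 2 matrix, det(xI - C) = x^2 - (trace C)x + det C
trace = C[0][0] + C[1][1]
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
print(-trace, det)     # coefficients of x and of 1 in chi_C(x)
```

The first print shows the 2 × 2 zero matrix (with Fraction entries), and the second recovers the coefficients 2/3 and −4/5 of the prescribed characteristic polynomial, using the 2 × 2 case of Exercises 5.1, Question 2(a).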

By Theorem 5.26 the Cayley–Hamilton Theorem (Corollary 6.11) is valid for companion matrices, that is,

$$\chi_C(C) = 0 \quad\text{for } C = C(d_0(x)).$$

Conversely, as we will see, the general case of Corollary 6.11 can be reduced to that of a single companion matrix, and so Theorem 5.26 ‘does the trick’.

Our next corollary lays bare the structure of every cyclic $F[x]$-module $M(A)$, provided we know a generator $v_0$ of $M(A)$ and the characteristic polynomial $\chi_A(x)$ of $A$.


Corollary 5.27

Let the $F[x]$-module $M(A)$ be cyclic with generator $v_0$, where $A$ is a $t \times t$ matrix over a field $F$. The matrix $X$, having the vectors of the basis $B_{v_0}$ of $F^t$ as its rows, satisfies $XAX^{-1} = C(\chi_A(x))$. The order of $v_0$ in $M(A)$ is $\chi_A(x)$. The $F[x]$-modules $M(A)$ and $M(C(\chi_A(x)))$ are isomorphic.

Proof

Let $d_0(x)$ denote the order of $v_0$ in $M(A)$. Then $\deg d_0(x) = t$ by Theorem 5.24, as $M(A)$ has dimension $t$ over $F$. Let $\alpha : F^t \to F^t$ be the linear mapping determined by $A$. Then $\alpha$ has matrix $A$ relative to the standard basis $B_0$ of $F^t$ and, by the discussion preceding Definition 5.25, $\alpha$ has matrix $C(d_0(x))$ relative to $B_{v_0}$. By Lemma 5.2 the matrix $X$ which relates $B_{v_0}$ to $B_0$, that is, $e_iX = x^{i-1}v_0$ for $1 \le i \le t$, satisfies $XAX^{-1} = C(d_0(x))$. By Theorem 5.26 the characteristic polynomial of $C(d_0(x))$ is $d_0(x)$, and so $d_0(x) = \chi_A(x)$ using Lemma 5.5. The order of $v_0$ in $M(A)$ is therefore $\chi_A(x)$ and $XAX^{-1} = C(\chi_A(x))$, that is, $A$ and $C(\chi_A(x))$ are similar. By Theorem 5.13 the $F[x]$-modules $M(A)$ and $M(C(\chi_A(x)))$ are isomorphic. □

So $M(A)$ is cyclic if and only if $A$ is similar to the companion matrix of its characteristic polynomial.

As an example let

$$A = \begin{pmatrix} 1 & 1 & -1\\ 1 & -1 & 2\\ 0 & 2 & -1 \end{pmatrix}$$

over $\mathbb{Q}$. In Section 6.1 we describe a method leading to a generator of $M(A)$, should it have one. Also we will see shortly that cyclic modules have many generators. For the moment we adopt a 'trial and error' approach. Does $e_1$ generate $M(A)$? As $xe_1 = (1, 1, -1)$, $x^2e_1 = (2, -2, 2)$ and

$$\begin{vmatrix} e_1\\ xe_1\\ x^2e_1 \end{vmatrix} = \begin{vmatrix} 1 & 0 & 0\\ 1 & 1 & -1\\ 2 & -2 & 2 \end{vmatrix} = 0,$$

we see that $e_1, xe_1, x^2e_1$ are linearly dependent. So $e_1$ does not generate $M(A)$. Does $e_2$ generate $M(A)$? As

$$\begin{vmatrix} e_2\\ xe_2\\ x^2e_2 \end{vmatrix} = \begin{vmatrix} 0 & 1 & 0\\ 1 & -1 & 2\\ 0 & 6 & -5 \end{vmatrix} = 5 \neq 0,$$

we see that $B_{e_2}$, consisting of $e_2, xe_2, x^2e_2$, is a basis of $\mathbb{Q}^3$. So $M(A)$ is cyclic with generator $e_2$. We leave the reader to check that the characteristic polynomial of $A$ is

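$$\chi_A(x) = |xI - A| = \begin{vmatrix} x-1 & -1 & 1\\ -1 & x+1 & -2\\ 0 & -2 & x+1 \end{vmatrix} = x^3 + x^2 - 6x + 4 = (x-1)(x^2 + 2x - 4)$$

and that

$$X = \begin{pmatrix} e_2\\ xe_2\\ x^2e_2 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0\\ 1 & -1 & 2\\ 0 & 6 & -5 \end{pmatrix}$$

satisfies

$$XAX^{-1} = C(x^3 + x^2 - 6x + 4) = \begin{pmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ -4 & 6 & -1 \end{pmatrix}.$$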
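The trial-and-error test above is easy to mechanise. In this sketch (Python with exact `Fraction` arithmetic; helper names are hypothetical) `krylov(v, A)` forms the matrix with rows $v, xv, \ldots, x^{t-1}v$, where $xv = vA$, and $v$ generates $M(A)$ exactly when its determinant is non-zero. Rather than inverting $X$, the last line checks the equivalent identity $XA = CX$.

```python
from fractions import Fraction

def vec_mat(v, A):
    """Row vector times matrix: the module action v -> xv = vA in M(A)."""
    return [sum(v[k] * A[k][j] for k in range(len(v))) for j in range(len(A[0]))]

def matmul(X, Y):
    return [vec_mat(row, Y) for row in X]

def det(M):
    """Determinant over Q by Gaussian elimination with exact Fractions."""
    M = [[Fraction(a) for a in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [M[r][j] - f * M[c][j] for j in range(n)]
    return d

def krylov(v, A):
    """Matrix with rows v, xv, ..., x^{t-1}v; v generates M(A) iff it is invertible."""
    rows = [[Fraction(a) for a in v]]
    for _ in range(len(A) - 1):
        rows.append(vec_mat(rows[-1], A))
    return rows

A = [[1, 1, -1], [1, -1, 2], [0, 2, -1]]
C = [[0, 1, 0], [0, 0, 1], [-4, 6, -1]]     # C(x^3 + x^2 - 6x + 4)
print(det(krylov([1, 0, 0], A)))            # 0: e1 is not a generator
X = krylov([0, 1, 0], A)
print(det(X))                               # 5: e2 is a generator
print(matmul(X, A) == matmul(C, X))         # True, i.e. X A X^{-1} = C
```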

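As the polynomial $x^2 + 2x - 4$ is irreducible over $\mathbb{Q}$, the monic divisors of $\chi_A(x) = (x-1)(x^2+2x-4)$ are $1$, $x - 1$, $x^2 + 2x - 4$ and $\chi_A(x)$ itself, so by Theorem 5.28 below the $\mathbb{Q}[x]$-module $M(A)$ has only four submodules: $0$, $N_1$, $N_2$, $M(A)$, where $N_1 = \langle (2, 0, -1)\rangle$ and $N_2 = \langle e_1, xe_1\rangle$. Also $M(A) = N_1 \oplus N_2$, that is, $M(A)$ is the internal direct sum of its submodules $N_1$ and $N_2$, as shown below:

[Figure: lattice diagram of the submodules $0$, $N_1$, $N_2$ and $M(A)$]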

Any vector $v$ of $\mathbb{Q}^3$ not in either $N_1$ or $N_2$ is a generator of $M(A)$, because $\langle v, xv, x^2v\rangle$ is a submodule $N$ of $M(A)$ containing $v$ and there is only one such submodule, namely $M(A)$ itself. So $\langle v, xv, x^2v\rangle = M(A)$.

From Lemma 2.2, all subgroups of $(\mathbb{Z}_n, +)$ are cyclic and correspond to the positive divisors of $n$. Our next theorem, which has already been used in Examples 5.9a–5.9c, contains the polynomial analogue.


Theorem 5.28

Let $M$ be a cyclic $F[x]$-module with generator $v_0$ having non-zero order $d_0(x)$ in $M$. Let $N$ be a submodule of $M$. Then $N$ is cyclic with generator $d(x)v_0$, where $d(x)$ is a monic divisor of $d_0(x)$. The order of $d(x)v_0$ in $M$ is $d_0(x)/d(x)$. The quotient module $M/N$ is cyclic with generator $N + v_0$ of order $d(x)$ in $M/N$.

Proof

Consider $K_N = \{f(x) \in F[x] : f(x)v_0 \in N\}$. It is straightforward to verify that $K_N$ is an ideal of $F[x]$ (see Exercises 5.2, Question 4(a)). As $d_0(x)v_0 = 0 \in N$ we deduce $d_0(x) \in K_N$, and so $K_N$ is non-zero. By Theorem 4.4 there is a unique monic polynomial $d(x)$ over $F$ with $\langle d(x)\rangle = K_N$. As $d_0(x) \in K_N$ we see $d(x) \mid d_0(x)$. As $d(x) \in K_N$ we see $d(x)v_0 \in N$. We show that $w_0 = d(x)v_0$ is a generator of $N$. Consider $w \in N$. Since $v_0$ is a generator of $M$ and $w \in M$, there is $g(x) \in F[x]$ with $w = g(x)v_0$ by Definition 5.21. So $g(x)v_0 = w \in N$, which gives $g(x) \in K_N$. Therefore $g(x) = h(x)d(x)$ for some $h(x) \in F[x]$, and so $w = h(x)d(x)v_0 = h(x)w_0$, showing that $N$ is indeed cyclic with generator $w_0$.

Let $K'$ denote the order ideal (Definition 5.11) of $w_0$ in $M$. Then $d_0(x)/d(x) \in K'$ as

$$(d_0(x)/d(x))w_0 = (d_0(x)/d(x))d(x)v_0 = d_0(x)v_0 = 0,$$

showing $K'$ to be non-zero. By Theorem 4.4 there is a monic polynomial $d'(x)$ over $F$ with $K' = \langle d'(x)\rangle$ and $d'(x) \mid d_0(x)/d(x)$. Then $d'(x)d(x)v_0 = d'(x)w_0 = 0$, showing $d'(x)d(x) \in \langle d_0(x)\rangle$, the order ideal of $v_0$ in $M$. Therefore $d_0(x) \mid d'(x)d(x)$, and so $d_0(x)/d(x) \mid d'(x)$. Each of the monic polynomials $d_0(x)/d(x)$ and $d'(x)$ is a divisor of the other, and so they are equal: $d_0(x)/d(x) = d'(x)$. So $K' = \langle d_0(x)/d(x)\rangle$, showing that $w_0$ has order $d_0(x)/d(x)$ in $M$.

Using Lemma 2.27, each element $N + v$ of $M/N$ is expressible as $N + f(x)v_0 = f(x)(N + v_0)$ where $f(x) \in F[x]$, showing that $N + v_0$ is a generator of $M/N$. Let $K''$ denote the order ideal of $N + v_0$ in $M/N$. Then $d(x) \in K''$ as

$$d(x)(N + v_0) = N + d(x)v_0 = N + w_0 = N,$$

the zero element of $M/N$. So $K'' = \langle d''(x)\rangle$, where $d''(x)$ is a monic polynomial over $F$ and $d''(x) \mid d(x)$ by Theorem 4.4. But $d''(x)(N + v_0) = N$ leads to $d''(x)v_0 \in N$, and so $d''(x) \in K_N = \langle d(x)\rangle$. Therefore $d(x) \mid d''(x)$, which gives $d''(x) = d(x)$. We have shown $K'' = K_N$, and so $N + v_0$ has order $d(x)$ in $M/N$. □

Let $M$ be as in Theorem 5.28. Then $N \mapsto K_N$ is a bijection from the set of submodules $N$ of $M$ to the set of ideals $K$ of $F[x]$ with $\langle d_0(x)\rangle \subseteq K$. The inverse bijection is $K \mapsto N_K$, where $N_K = \{q(x)v_0 : q(x) \in K\}$. Further

$$N_1 \subseteq N_2 \iff K_{N_1} \subseteq K_{N_2},$$

that is, these bijections are inclusion-preserving (see Exercises 5.2, Question 4(a)).

Let $v_0$ be an element of the $F[x]$-module $M$. Then

$$N = \{f(x)v_0 : f(x) \in F[x]\}$$

is the cyclic submodule of $M$ with generator $v_0$.

It is straightforward to verify that $N$ is a submodule (Definition 2.26) of $M$, and it is then immediate that $v_0$ is a generator (Definition 5.21) of $N$. In the context of $F[x]$-modules we use bold brackets, writing $N = \boldsymbol{\langle} v_0\boldsymbol{\rangle}$, to indicate that $v_0$ is a generator of the submodule $N$, whereas $\langle v_0\rangle$ denotes the subspace spanned by $v_0$. Therefore

$$\boldsymbol{\langle} v_0\boldsymbol{\rangle} = \{f(x)v_0 : f(x) \in F[x]\} \quad\text{and}\quad \langle v_0\rangle = \{av_0 : a \in F\}.$$

Corollary 5.29

Let the element $v_0$ of the $F[x]$-module $M$ have order $d_0(x) \in F[x]$. Then the cyclic submodule $\boldsymbol{\langle} v_0\boldsymbol{\rangle}$ is a $t$-dimensional subspace of $M$ with basis $B_{v_0}$ consisting of $v_0, xv_0, x^2v_0, \ldots, x^{t-1}v_0$, where $t = \deg d_0(x)$.

Proof

This is short and sweet, as Corollary 5.29 is simply Theorem 5.24 applied to $\boldsymbol{\langle} v_0\boldsymbol{\rangle}$. □

With $v_0$ as in Corollary 5.29 we see $\boldsymbol{\langle} v_0\boldsymbol{\rangle} = \langle v_0, xv_0, x^2v_0, \ldots, x^{t-1}v_0\rangle$.
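The order of an element can be computed from the first linear dependence among $v_0, xv_0, x^2v_0, \ldots$: by Corollary 5.29 the vectors $v_0, \ldots, x^{t-1}v_0$ are independent, where $t = \deg d_0(x)$, and $x^tv_0 \in \boldsymbol{\langle} v_0\boldsymbol{\rangle}$ is then a combination of them, giving a monic polynomial of degree $t$ in the order ideal, which must be $d_0(x)$. The sketch below (our own helper `order_of`, a hypothetical name) reduces each $x^kv_0$ by Gaussian elimination while tracking the reduced vector as $p(x)v_0$.

```python
from fractions import Fraction

def vec_mat(v, A):
    """Row vector times matrix: the module action v -> xv = vA in M(A)."""
    return [sum(v[k] * A[k][j] for k in range(len(v))) for j in range(len(A[0]))]

def order_of(v, A):
    """Monic order of v in M(A), as coefficients [d_0, d_1, ..., 1] (constant term first).

    x^k v is reduced against the earlier reduced vectors by Gaussian elimination, and
    the reduced vector is tracked as p(x)v. The first k at which the reduction gives 0
    yields a monic p(x) of degree k with p(x)v = 0.
    """
    t = len(A)
    basis = []                      # triples (pivot column, reduced vector, polynomial p)
    w = [Fraction(a) for a in v]    # w = x^k v, unreduced
    for k in range(t + 1):
        vec = list(w)
        poly = [Fraction(0)] * (t + 1)
        poly[k] = Fraction(1)       # invariant: vec = poly(x) v
        for piv, bvec, bpoly in basis:
            if vec[piv] != 0:
                f = vec[piv] / bvec[piv]
                vec = [a - f * b for a, b in zip(vec, bvec)]
                poly = [a - f * b for a, b in zip(poly, bpoly)]
        if all(a == 0 for a in vec):
            return poly[:k + 1]
        piv = next(j for j, a in enumerate(vec) if a != 0)
        basis.append((piv, vec, poly))
        w = vec_mat(w, A)
    raise AssertionError("unreachable: t + 1 vectors in F^t are linearly dependent")

D = [[-1, 0, 0], [0, 0, 0], [0, 0, 1]]       # diag(-1, 0, 1)
A = [[1, 1, -1], [1, -1, 2], [0, 2, -1]]     # the 3 x 3 example preceding Theorem 5.28
print(order_of([1, 1, 1], D))   # x^3 - x
print(order_of([1, 0, 0], A))   # x^2 + 2x - 4
print(order_of([0, 1, 0], A))   # x^3 + x^2 - 6x + 4
```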

Example 5.30a

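Let $A = \operatorname{diag}(-1, 0, 1)$ over the real field $\mathbb{R}$. We show that $v_0 = (1, 1, 1)$ generates the $\mathbb{R}[x]$-module $M(A)$. Since $xv_0 = (-1, 0, 1)$, $x^2v_0 = (1, 0, 1)$ and

$$\begin{vmatrix} v_0\\ xv_0\\ x^2v_0 \end{vmatrix} = \begin{vmatrix} 1 & 1 & 1\\ -1 & 0 & 1\\ 1 & 0 & 1 \end{vmatrix} = 2 \neq 0,$$

the vectors $v_0, xv_0, x^2v_0$ of $B_{v_0}$ are linearly independent. Hence $B_{v_0}$ is a basis of $\mathbb{R}^3$, and so every $v \in \mathbb{R}^3$ can be written $v = a_0v_0 + a_1xv_0 + a_2x^2v_0 = (a_0 + a_1x + a_2x^2)v_0$ where $a_0, a_1, a_2 \in \mathbb{R}$. Therefore $M(A) = \boldsymbol{\langle} v_0\boldsymbol{\rangle}$ is cyclic with generator $v_0$. As $x^3v_0 = (-1, 0, 1) = xv_0$ we see $(x^3 - x)v_0 = 0$; no non-zero polynomial of degree less than 3 annihilates $v_0$, as $B_{v_0}$ is linearly independent, and so $v_0$ has order $d_0(x) = x^3 - x = (x+1)x(x-1)$ in $M(A)$. By Corollary 5.27 the matrix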

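$$X = \begin{pmatrix} v_0\\ xv_0\\ x^2v_0 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1\\ -1 & 0 & 1\\ 1 & 0 & 1 \end{pmatrix},$$

having the vectors of $B_{v_0}$ as its rows, is invertible over $\mathbb{R}$ and satisfies

$$XAX^{-1} = C(x^3 - x) = \begin{pmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{pmatrix}.$$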

The reader may think that we've taken a step backwards by undiagonalising a perfectly good diagonal matrix! But it is instructive to know that diagonal matrices with distinct diagonal entries determine cyclic modules (Exercises 5.2, Question 1(d)) and computations in these modules are easily carried out. Here the $2^3$ monic divisors $d(x)$ of $d_0(x)$ can be arranged in a 'cubical' lattice:

[Figure: the 'cubical' lattice of the eight monic divisors of $(x+1)x(x-1)$]

By Theorem 5.28 the $2^3$ submodules of $M(A)$ are each of the form $\boldsymbol{\langle} d(x)v_0\boldsymbol{\rangle}$, where $d(x)$ is a monic divisor of $(x+1)x(x-1)$, and these submodules fit together in the same way as shown in the above diagram: notice that $(x+1)x(x-1)/d(x)$ is the order of $d(x)v_0$ in $M(A)$, and the two lattices are related by the correspondence $(x+1)x(x-1)/d(x) \leftrightarrow \boldsymbol{\langle} d(x)v_0\boldsymbol{\rangle}$ for all monic divisors $d(x)$ of $(x+1)x(x-1)$.

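For instance

$$(x+1)x(x-1) \leftrightarrow \boldsymbol{\langle} v_0\boldsymbol{\rangle} = M(A),$$

$$1 \leftrightarrow \boldsymbol{\langle}(x+1)x(x-1)v_0\boldsymbol{\rangle} = \{(0,0,0)\} = 0,$$

$$x(x-1) \leftrightarrow \boldsymbol{\langle}(x+1)v_0\boldsymbol{\rangle} = \boldsymbol{\langle}(0,1,2)\boldsymbol{\rangle} = \langle (x+1)v_0, x(x+1)v_0\rangle = \langle (0,1,2), (0,0,2)\rangle,$$

which is the $x_2x_3$-plane (with equation $x_1 = 0$) in $\mathbb{R}^3$, and

$$x \leftrightarrow \boldsymbol{\langle}(x+1)(x-1)v_0\boldsymbol{\rangle} = \boldsymbol{\langle}(0,-1,0)\boldsymbol{\rangle} = \boldsymbol{\langle} -e_2\boldsymbol{\rangle} = \langle e_2\rangle,$$

which is the $x_2$-axis in $\mathbb{R}^3$. In fact the 1- and 2-dimensional submodules of $M(A)$ are respectively the coordinate axes and coordinate planes in $\mathbb{R}^3$.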
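For this diagonal $A$ the action is transparent: with $v_0 = (1, 1, 1)$ and $A = \operatorname{diag}(-1, 0, 1)$ one has $d(x)v_0 = (d(-1), d(0), d(1))$. The short Python sketch below (helper names are ours) lists $d(x)v_0$ for all eight monic divisors $d(x)$ of $(x+1)x(x-1)$, so the coordinate axes and planes can be read off from where the zeros fall.

```python
from itertools import combinations

A = [[-1, 0, 0], [0, 0, 0], [0, 0, 1]]     # diag(-1, 0, 1)
v0 = [1, 1, 1]
FACTORS = {"x+1": [1, 1], "x": [0, 1], "x-1": [-1, 1]}   # coefficient lists, constant first

def vec_mat(v, M):
    """Row vector times 3 x 3 matrix: the module action v -> xv = vM."""
    return [sum(v[k] * M[k][j] for k in range(3)) for j in range(3)]

def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def act(poly, v):
    """p(x)v = v p(A) in M(A), by Horner's rule: acc <- x*acc + c*v."""
    acc = [0, 0, 0]
    for c in reversed(poly):
        acc = [a + c * b for a, b in zip(vec_mat(acc, A), v)]
    return acc

# All 2^3 monic divisors d(x) of (x+1)x(x-1), with the vector d(x)v0 for each.
for size in range(4):
    for names in combinations(FACTORS, size):
        d = [1]
        for name in names:
            d = poly_mul(d, FACTORS[name])
        label = "".join(f"({n})" for n in names) or "1"
        print(f"{label:16} {act(d, v0)}")
```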

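Consider $v = (a_1, a_2, a_3) \in \mathbb{R}^3$ with 3 non-zero entries. Which of the 8 submodules of $M(A)$ could $\boldsymbol{\langle} v\boldsymbol{\rangle}$ be? Every proper submodule lies in one of the coordinate planes $\boldsymbol{\langle}(x+1)v_0\boldsymbol{\rangle}$, $\boldsymbol{\langle} xv_0\boldsymbol{\rangle}$, $\boldsymbol{\langle}(x-1)v_0\boldsymbol{\rangle}$, and $v$ is in none of them, so $\boldsymbol{\langle} v\boldsymbol{\rangle} = M(A)$ is the only possibility. So $v$ generates $M(A)$.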

The reader can check that the theory of this example remains unchanged on replacing $\mathbb{R}$ by any field $F$ of characteristic $\neq 2$, but retaining $A = \operatorname{diag}(-1, 0, 1)$ as before, where $1$ is the 1-element of $F$. In particular, for $F$ a finite field with $|F| = q$ odd, there are $(q-1)^3$ vectors $v = (a_1, a_2, a_3) \in F^3$ with $M(A) = \boldsymbol{\langle} v\boldsymbol{\rangle}$, there being $q - 1$ choices for $a_i \in F^*$, $i = 1, 2, 3$.
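The count $(q-1)^3$ can be confirmed by brute force for small primes $q$, taking $F = \mathbb{Z}_q$ (an assumption: this sketch does not cover fields of prime-power order). Following the text, $v$ is counted as a generator when $v, xv, x^2v$ is a basis of $F^3$, that is, when the $3 \times 3$ determinant with these rows is non-zero mod $q$. The function names are hypothetical.

```python
from itertools import product

def krylov_det_mod(v, A, q):
    """det of the matrix with rows v, vA, vA^2, reduced mod q (3 x 3 case only)."""
    def vec_mat(u):
        return [sum(u[k] * A[k][j] for k in range(3)) % q for j in range(3)]
    r0 = list(v)
    r1 = vec_mat(r0)
    r2 = vec_mat(r1)
    (a, b, c), (d, e, f), (g, h, i) = r0, r1, r2
    return (a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) % q

def count_generators(A, q):
    """Number of v in (Z_q)^3 with <<v>> = M(A); q is assumed prime, so Z_q is a field."""
    return sum(1 for v in product(range(q), repeat=3) if krylov_det_mod(v, A, q) != 0)

for q in (3, 5, 7):
    A = [[q - 1, 0, 0], [0, 0, 0], [0, 0, 1]]   # diag(-1, 0, 1) over Z_q
    print(q, count_generators(A, q), (q - 1) ** 3)
```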

Example 5.30b

Let F be any field and let

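$$A = C(x^2(x-1)) = \begin{pmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ 0 & 0 & 1 \end{pmatrix}$$

over $F$. By Theorem 5.26 the $F[x]$-module $M(A)$ is cyclic with generator $e_1$ of order $d_0(x) = x^2(x-1)$.

The six monic divisors of $d_0(x)$ can be arranged as shown.

[Figure: lattice diagram of the six monic divisors of $x^2(x-1)$]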


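The submodules of $M(A)$, which are six in number by Theorem 5.28, fit together in the same way:

[Figure: lattice diagram of the six submodules of $M(A)$]

Here $\boldsymbol{\langle} xe_1\boldsymbol{\rangle} = \boldsymbol{\langle} e_2\boldsymbol{\rangle} = \langle e_2, xe_2\rangle = \langle e_2, e_3\rangle$ and $\boldsymbol{\langle}(x-1)e_1\boldsymbol{\rangle} = \langle e_2 - e_1, e_3 - e_2\rangle$ are subspaces of dimension 2, whereas $\boldsymbol{\langle} x^2e_1\boldsymbol{\rangle} = \boldsymbol{\langle} e_3\boldsymbol{\rangle} = \langle e_3\rangle$ and $\boldsymbol{\langle} x(x-1)e_1\boldsymbol{\rangle} = \boldsymbol{\langle} e_3 - e_2\boldsymbol{\rangle} = \langle e_3 - e_2\rangle$ are the 1-dimensional eigenspaces of $A$ corresponding to the eigenvalues 1 and 0 respectively, as $e_3A = e_3$ and $(e_3 - e_2)A = 0$. This module decomposes as the internal direct sum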

$$M(A) = \boldsymbol{\langle}(x-1)e_1\boldsymbol{\rangle} \oplus \boldsymbol{\langle} x^2e_1\boldsymbol{\rangle}$$

using the submodules on the extreme left and right in the above diagram. We construct a basis of $F^3$ using bases of the submodules in this decomposition: specifically let $B = B_{(x-1)e_1} \cup B_{x^2e_1}$, that is,

$$B \text{ consists of } (x-1)e_1,\ x(x-1)e_1,\ x^2e_1.$$

This method (of obtaining a basis from bases of components in a direct sum) is universally applicable and we will use it time after time in Chapter 6. Let $X$ denote the invertible matrix over $F$ having the vectors of $B$ as its rows. So

$$X = \begin{pmatrix} (x-1)e_1\\ x(x-1)e_1\\ x^2e_1 \end{pmatrix} = \begin{pmatrix} -1 & 1 & 0\\ 0 & -1 & 1\\ 0 & 0 & 1 \end{pmatrix}$$

satisfies

$$XAX^{-1} = \left(\begin{array}{cc|c} 0 & 1 & 0\\ 0 & 0 & 0\\ \hline 0 & 0 & 1 \end{array}\right) = \begin{pmatrix} C(x^2) & 0\\ 0 & C(x-1) \end{pmatrix}.$$

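As $A = C(x^2(x-1))$ we obtain

$$C(x^2(x-1)) \sim C(x^2) \oplus C(x-1),$$

which is a particular case of Theorem 5.31 below.

Finally note that any $v$ in $M(A)$ which does not belong to either $N_1 = \boldsymbol{\langle}(x-1)e_1\boldsymbol{\rangle}$ or $N_2 = \boldsymbol{\langle} xe_1\boldsymbol{\rangle}$, the largest submodules $\neq M(A)$, is a generator of $M(A)$ by Theorem 5.28. In the case of a finite field $F$ of order $q$, we have $|M(A)| = q^3$, $|N_1| = |N_2| = q^2$ and $|N_1 \cap N_2| = |\boldsymbol{\langle}(x-1)e_1\boldsymbol{\rangle} \cap \boldsymbol{\langle} xe_1\boldsymbol{\rangle}| = |\boldsymbol{\langle} x(x-1)e_1\boldsymbol{\rangle}| = q$. Therefore $|N_1 \cup N_2| = |N_1| + |N_2| - |N_1 \cap N_2| = q^2 + q^2 - q$. So the number of vectors $v$ with $M(A) = \boldsymbol{\langle} v\boldsymbol{\rangle}$ is $|M(A)| - |N_1 \cup N_2| = q^3 - q^2 - q^2 + q = q(q-1)^2$.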
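The counting argument can be checked by enumerating $N_1$ and $N_2$ directly for small primes $q$ (taking $F = \mathbb{Z}_q$, an assumption of this sketch). The helpers below are hypothetical: `span_mod` lists all $\mathbb{Z}_q$-combinations of given vectors, $N_1$ is built from the basis $(x-1)e_1 = e_2 - e_1$, $x(x-1)e_1 = e_3 - e_2$, and $N_2$ from $e_2, e_3$.

```python
from itertools import product

def span_mod(vectors, q):
    """All Z_q-linear combinations of the given vectors of (Z_q)^3, as a set (q prime)."""
    return {tuple(sum(c * v[j] for c, v in zip(cs, vectors)) % q for j in range(3))
            for cs in product(range(q), repeat=len(vectors))}

def generator_count(q):
    """Return (q^3 - |N1 u N2|, |N1|, |N2|, |N1 n N2|) for Example 5.30b over Z_q."""
    N1 = span_mod([(-1, 1, 0), (0, -1, 1)], q)   # <<(x-1)e1>>: basis (x-1)e1, x(x-1)e1
    N2 = span_mod([(0, 1, 0), (0, 0, 1)], q)     # <<x e1>> = <e2, e3>
    return q ** 3 - len(N1 | N2), len(N1), len(N2), len(N1 & N2)

for q in (2, 3, 5):
    print(q, generator_count(q), q * (q - 1) ** 2)
```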

The next theorem will help our manipulation of direct sums of companion matrices and establish the connection between the rational and primary canonical forms.

Theorem 5.31 (The Chinese remainder theorem for companion matrices)

Let $f(x)$ and $g(x)$ be monic polynomials of positive degrees over a field $F$ with $\gcd\{f(x), g(x)\} = 1$. Then $C(f(x)g(x)) \sim C(f(x)) \oplus C(g(x))$.

Proof

Write

$$f(x) = x^s + a_{s-1}x^{s-1} + \cdots + a_1x + a_0,\qquad g(x) = x^t + b_{t-1}x^{t-1} + \cdots + b_1x + b_0,$$

and so $\deg f(x) = s$, $\deg g(x) = t$. Working in the $F[x]$-module $M = M(C(f(x)g(x)))$, we describe a particular invertible $(s+t) \times (s+t)$ matrix $Y$ over $F$ satisfying $YC(f(x)g(x))Y^{-1} = C(f(x)) \oplus C(g(x))$. Now $M$ is cyclic with generator $e_1 \in F^{s+t}$ of order $f(x)g(x)$ by Theorem 5.26. Then $v_1 = g(x)e_1$ has order $f(x)$ in $M$ by Lemma 5.23. By Corollary 5.29 the submodule $N_1 = \boldsymbol{\langle} v_1\boldsymbol{\rangle}$ of $M$ has $F$-basis $B_{v_1}$ consisting of $v_1, xv_1, \ldots, x^{s-1}v_1$. As $x^{k-1}e_1 = e_k$ in $M$ for $1 \le k \le s+t$, we see

$$v_1 = g(x)e_1 = (b_0, b_1, \ldots, b_{t-1}, 1, 0, 0, \ldots, 0) \in F^{s+t},$$

$$xv_1 = xg(x)e_1 = (0, b_0, b_1, \ldots, b_{t-1}, 1, 0, \ldots, 0) \in F^{s+t}.$$

This pattern (moving the coefficients of $g(x)$ one place to the right) continues until the last vector of $B_{v_1}$ is obtained, namely $x^{s-1}v_1 = x^{s-1}g(x)e_1 = (0, \ldots, 0, b_0, b_1, \ldots, b_{t-1}, 1) \in F^{s+t}$. In the same way, by Lemma 5.23 and Corollary 5.29, the order of $v_2 = f(x)e_1$ in $M$ is $g(x)$, and the submodule $N_2 = \boldsymbol{\langle} v_2\boldsymbol{\rangle}$ of $M$ has $F$-basis $B_{v_2}$ consisting of $v_2, xv_2, \ldots, x^{t-1}v_2$. As before, these $t$ elements of $F^{s+t}$ are

$$v_2 = f(x)e_1 = (a_0, a_1, \ldots, a_{s-1}, 1, 0, \ldots, 0),$$

$$xv_2 = xf(x)e_1 = (0, a_0, a_1, \ldots, a_{s-1}, 1, 0, \ldots, 0),\ \ldots,$$

until finally

$$x^{t-1}v_2 = x^{t-1}f(x)e_1 = (0, \ldots, 0, a_0, a_1, \ldots, a_{s-1}, 1).$$

The condition $\gcd\{f(x), g(x)\} = 1$ has not been used yet, but it comes into play now to show that the vectors in $B = B_{v_1} \cup B_{v_2}$ are linearly independent. Suppose there are scalars $c_i, d_j \in F$ for $0 \le i < s$, $0 \le j < t$ satisfying

$$c_0v_1 + c_1xv_1 + \cdots + c_{s-1}x^{s-1}v_1 + d_0v_2 + d_1xv_2 + \cdots + d_{t-1}x^{t-1}v_2 = 0.$$

Write $c(x) = c_0 + c_1x + \cdots + c_{s-1}x^{s-1}$ and $d(x) = d_0 + d_1x + \cdots + d_{t-1}x^{t-1}$. On substituting $v_1 = g(x)e_1$ and $v_2 = f(x)e_1$ in the above equation, the polynomials $c(x)$ and $d(x)$ over $F$ are seen to satisfy

$$(c(x)g(x) + d(x)f(x))e_1 = 0.$$

Therefore $c(x)g(x) + d(x)f(x)$ belongs to the order ideal $\langle f(x)g(x)\rangle$ of $e_1$ in $M$, that is, $f(x)g(x) \mid (c(x)g(x) + d(x)f(x))$. So $f(x) \mid c(x)g(x)$, which gives $f(x) \mid c(x)$ as $\gcd\{f(x), g(x)\} = 1$. As $\deg c(x) < s = \deg f(x)$, we deduce $c(x) = 0(x)$, that is, $c_i = 0$ for $0 \le i < s$. In the same way $g(x) \mid d(x)f(x)$, which leads to $g(x) \mid d(x)$, and so $d(x) = 0(x)$, that is, $d_j = 0$ for $0 \le j < t$, as $\deg d(x) < t = \deg g(x)$. The conclusion is: the $s+t$ vectors in $B$ are linearly independent, and so $B$ is a basis of $F^{s+t}$, since $s+t$ linearly independent vectors necessarily span the $(s+t)$-dimensional space $F^{s+t}$. From Lemmas 5.18 and 5.19 we deduce $M = N_1 \oplus N_2$. The linear mapping $\alpha : F^{s+t} \to F^{s+t}$ determined by $C(f(x)g(x))$ has matrix $C(f(x)g(x))$ relative to the standard basis $B_0$ of $F^{s+t}$. By Corollary 5.20 the matrix of $\alpha$ relative to $B$ is $A_1 \oplus A_2$, where $A_k$ is the matrix of the restriction $\alpha|_{N_k}$ of $\alpha$ to $N_k$ relative to $B_{v_k}$ for $k = 1, 2$. Now $(v)\alpha = xv$ for all $v \in M$, and so, using the theory preceding Definition 5.25, we see $A_1 = C(f(x))$ and $A_2 = C(g(x))$. Let $Y$ denote the invertible $(s+t) \times (s+t)$ matrix over $F$ having the vectors of $B$ as its rows. Therefore

$$Y = \left(\begin{array}{ccccccccc} b_0 & b_1 & \cdots & b_{t-1} & 1 & 0 & 0 & \cdots & 0\\ 0 & b_0 & b_1 & \cdots & b_{t-1} & 1 & 0 & \cdots & 0\\ \vdots & & \ddots & & \ddots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 0 & b_0 & b_1 & \cdots & b_{t-1} & 1\\ \hline a_0 & a_1 & \cdots & a_{s-1} & 1 & 0 & 0 & \cdots & 0\\ 0 & a_0 & a_1 & \cdots & a_{s-1} & 1 & 0 & \cdots & 0\\ \vdots & & \ddots & & \ddots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 0 & a_0 & a_1 & \cdots & a_{s-1} & 1 \end{array}\right)$$

being partitioned by its first $s$ rows and last $t$ rows as indicated. By Lemma 5.2 we see that $Y$ satisfies

$$YC(f(x)g(x))Y^{-1} = C(f(x)) \oplus C(g(x)),$$

and so $C(f(x)g(x)) \sim C(f(x)) \oplus C(g(x))$. □
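The matrix $Y$ of the proof is easy to build by machine. In the sketch below (Python; all names are ours) polynomials are coefficient lists with the constant term first, row $i$ of the top block is $g$ shifted $i$ places to the right and row $j$ of the bottom block is $f$ shifted $j$ places to the right. We check $YC(f(x)g(x)) = (C(f(x)) \oplus C(g(x)))Y$ instead of forming $Y^{-1}$; this identity holds whether or not $\gcd\{f(x), g(x)\} = 1$, the gcd condition being needed only for $Y$ to be invertible.

```python
from fractions import Fraction

def companion(p):
    """C(p) for monic p given as [p0, p1, ..., 1] (constant term first)."""
    t = len(p) - 1
    C = [[Fraction(int(j == i + 1)) for j in range(t)] for i in range(t)]
    C[t - 1] = [-Fraction(c) for c in p[:-1]]
    return C

def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def direct_sum(P, Q):
    n, m = len(P), len(Q)
    return [row + [0] * m for row in P] + [[0] * n + row for row in Q]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def crt_matrix(f, g):
    """Y of Theorem 5.31: rows x^i g(x)e_1 (0 <= i < s), then x^j f(x)e_1 (0 <= j < t).

    In M(C(fg)) we have x^(k-1) e_1 = e_k, so x^i g(x)e_1 is g's coefficient list
    shifted i places to the right.
    """
    s, t = len(f) - 1, len(g) - 1
    top = [[0] * i + list(g) + [0] * (s - 1 - i) for i in range(s)]
    bottom = [[0] * j + list(f) + [0] * (t - 1 - j) for j in range(t)]
    return top + bottom

f, g = [0, 0, 1], [-1, 1]          # f = x^2, g = x - 1, as in Example 5.30b
Y = crt_matrix(f, g)
print(Y)                            # [[-1, 1, 0], [0, -1, 1], [0, 0, 1]]
lhs = matmul(Y, companion(poly_mul(f, g)))
rhs = matmul(direct_sum(companion(f), companion(g)), Y)
print(lhs == rhs)                   # True
```

For $f(x) = x^2$, $g(x) = x - 1$ the matrix $Y$ is exactly the matrix $X$ of Example 5.30b.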

We continue to use the notation

$$f(x) = x^s + a_{s-1}x^{s-1} + \cdots + a_1x + a_0,\qquad g(x) = x^t + b_{t-1}x^{t-1} + \cdots + b_1x + b_0$$

for monic polynomials over a field $F$, although we do not assume $\gcd\{f(x), g(x)\} = 1$. Working in $M = M(C(f(x)g(x)))$ as in Theorem 5.31, let $Y'$ denote the $(s+t) \times (s+t)$ matrix over $F$ having the vectors of the ordered set $B_{v_1} \cup B_{v_2}$ as its rows. Let $T$ denote the $(s+t) \times (s+t)$ matrix over $F$ with $e_kT = e_{s+t+1-k}$ for $1 \le k \le s+t$. So the rows of $T$ are the rows of the identity matrix $I$ but appearing in the opposite order: the first row of $T$ is the last row of $I$, the second row of $T$ is the last-but-one row of $I$, and so on. The reader can check $T^2 = I$, and so $T^{-1} = T$. Also $\det T = \pm 1$ (see Exercises 5.2, Question 6(d)). The $(s+t) \times (s+t)$ matrix $TY'T$ is obtained from $Y'$ by reversing the order of its rows and at the same time reversing the order of its columns. Thus

$$TY'T = \left(\begin{array}{ccccccccc} 1 & a_{s-1} & \cdots & a_1 & a_0 & 0 & 0 & \cdots & 0\\ 0 & 1 & a_{s-1} & \cdots & a_1 & a_0 & 0 & \cdots & 0\\ \vdots & & \ddots & & \ddots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 0 & 1 & a_{s-1} & \cdots & a_1 & a_0\\ \hline 1 & b_{t-1} & \cdots & b_1 & b_0 & 0 & 0 & \cdots & 0\\ 0 & 1 & b_{t-1} & \cdots & b_1 & b_0 & 0 & \cdots & 0\\ \vdots & & \ddots & & \ddots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 0 & 1 & b_{t-1} & \cdots & b_1 & b_0 \end{array}\right)$$

is partitioned into its first $t$ rows and its last $s$ rows. It is customary to write

$$R(f(x), g(x)) = \det TY'T,$$

which is called the resultant of $f(x)$ and $g(x)$.

For instance

$$R(x - r, x^2 + ax + b) = \begin{vmatrix} 1 & -r & 0\\ 0 & 1 & -r\\ 1 & a & b \end{vmatrix} = r^2 + ar + b.$$

We apply the ecos $c_3 + xc_2$, $c_3 + x^2c_1$ to the underlying matrix and expand along col 3 to get

$$\begin{vmatrix} 1 & -r & x(x-r)\\ 0 & 1 & x-r\\ 1 & a & x^2 + ax + b \end{vmatrix} = -(x + a + r)(x - r) + x^2 + ax + b = r^2 + ar + b.$$

We have expressed the resultant as a polynomial linear combination of $x - r$ and $x^2 + ax + b$. This method is used in the proof below of Corollary 5.32, the main property of resultants.
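A minimal sketch of $R(f(x), g(x)) = \det TY'T$ in Python, under our own naming. Unlike the other sketches, the coefficient lists here start with the leading 1, matching the rows of $TY'T$ displayed above; the determinant is computed by exact Gaussian elimination over $\mathbb{Q}$.

```python
from fractions import Fraction

def det(M):
    """Determinant over Q by Gaussian elimination with exact Fractions."""
    M = [[Fraction(a) for a in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [M[r][j] - f * M[c][j] for j in range(n)]
    return d

def resultant(f, g):
    """R(f, g) = det(T Y' T) in the layout of the text.

    f = x^s + a_{s-1}x^{s-1} + ... + a_0 and g = x^t + ... + b_0 are passed with the
    leading 1 first, e.g. [1, a_{s-1}, ..., a_0]. The first t rows are shifts of f and
    the last s rows are shifts of g.
    """
    s, t = len(f) - 1, len(g) - 1
    n = s + t
    rows = [[0] * i + list(f) + [0] * (n - s - 1 - i) for i in range(t)]
    rows += [[0] * j + list(g) + [0] * (n - t - 1 - j) for j in range(s)]
    return det(rows)

r, a, b = 2, 3, 5
print(resultant([1, -r], [1, a, b]), r**2 + a*r + b)   # 15 15
print(resultant([1, 0, 0], [1, -1, 0]))                # 0
```

The last line illustrates Corollary 5.32: $x^2$ and $x^2 - x$ have the common factor $x$, and their resultant vanishes.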

Corollary 5.32

Let $f(x)$, $g(x)$ be monic polynomials of positive degree over a field $F$. Then $\gcd\{f(x), g(x)\} = 1$ if and only if their resultant $R(f(x), g(x))$ is non-zero.


Proof

Suppose $\gcd\{f(x), g(x)\} = 1$. The matrix $Y$ of Theorem 5.31 is invertible over $F$, and as $Y = Y'$ and $\det T = \pm 1$ we obtain $R(f(x), g(x)) = \det TYT = (\det T)^2\det Y \neq 0$. Conversely suppose $R(f(x), g(x)) \neq 0$. We apply the $s + t - 1$ ecos $c_{s+t} + x^{s+t-k}c_k$ over $F[x]$ to $TY'T$ for $1 \le k < s+t$, where $\deg f(x) = s$, $\deg g(x) = t$. These ecos leave $\det TY'T$ unchanged and all columns of $TY'T$ except the last unchanged. However the new column $s+t$ is

$$(x^{t-1}f(x), \ldots, xf(x), f(x), x^{s-1}g(x), \ldots, xg(x), g(x))^T.$$

Expanding the resulting determinant along this column produces polynomials $a(x)$ and $b(x)$ over $F$, of degrees at most $t - 1$ and $s - 1$ respectively, satisfying $a(x)f(x) + b(x)g(x) = R(f(x), g(x))$. Therefore $\gcd\{f(x), g(x)\}$ is a divisor of the non-zero scalar $R(f(x), g(x))$. So $\gcd\{f(x), g(x)\} = 1$. □

The matrix $Y$ satisfying $YC(f(x)g(x))Y^{-1} = C(f(x)) \oplus C(g(x))$ as in Theorem 5.31 is such that $\det Y = \det TYT = R(f(x), g(x))$. Because of the close connection between $Y$ and $R(f(x), g(x))$ in the case $\gcd\{f(x), g(x)\} = 1$, we have allowed ourselves the above small digression on resultants in general.

EXERCISES 5.2

1. (a) Write down the companion matrix $C = C(x^3 - x)$ of $x^3 - x$ over $\mathbb{Q}$. List the eigenvalues of $C$ and find a row eigenvector corresponding to each of them. Specify an invertible $3 \times 3$ matrix $X$ over $\mathbb{Q}$ such that $XCX^{-1}$ is diagonal.

(b) Show $|C(f(x))| = (-1)^ta_0$, where $f(x) = x^t + a_{t-1}x^{t-1} + \cdots + a_1x + a_0$ over a field $F$. Deduce $\operatorname{rank} C(f(x)) = t$ or $t - 1$ according as $a_0 \neq 0$ or $a_0 = 0$.

(c) Let $C = C(x^2 + ax + b)$ over a field $F$. Verify $C^2 + aC + bI = 0$ and $|xI - C| = x^2 + ax + b$. Show that $D = C^2$ satisfies $D^2 + (2b - a^2)D + b^2I = 0$.

(d) Let $f(x)$ be a monic polynomial of degree $t$ over a field $F$. Show that $C(f(x))$ is similar to a diagonal matrix over $F$ if and only if $f(x)$ has $t$ distinct zeros in $F$. Working over

$$F_4 = \{0, 1, c, c+1 : 1 + 1 = 0,\ c^2 + c + 1 = 0\},$$

find an invertible $4 \times 4$ matrix $X$ with $XC(x^4 + x)X^{-1}$ diagonal.


2. (a) Verify that $e_1$ generates the $\mathbb{Q}[x]$-module $M(A)$, where

$$A = \begin{pmatrix} 1 & 0 & 1\\ 1 & 1 & 0\\ 1 & 1 & 0 \end{pmatrix}$$

over $\mathbb{Q}$. Calculate the order of $e_1$ in $M(A)$. Without further calculation write down the characteristic polynomial $\chi_A(x)$ of $A$ and verify that

$$X = \begin{pmatrix} e_1\\ xe_1\\ x^2e_1 \end{pmatrix}$$

is invertible over $\mathbb{Q}$ and satisfies $XAX^{-1} = C(\chi_A(x))$. Specify two linearly independent row eigenvectors of $A$.

(b) Let

$$A = \begin{pmatrix} 1 & 0 & 1\\ 1 & 1 & 1\\ 1 & 0 & 1 \end{pmatrix}$$

over $\mathbb{Q}$. Find the orders of $e_1$, $e_2$, $e_3$ and $e_1 + e_2 + e_3$ in the $\mathbb{Q}[x]$-module $M(A)$. Do any of these vectors generate $M(A)$? (Yes/No.) Verify that $e_1 + e_2$ generates $M(A)$ and find an invertible matrix $X$ over $\mathbb{Q}$ satisfying $XAX^{-1} = C(\chi_A(x))$. Specify generators of the eight submodules of $M(A)$ and draw their lattice diagram.

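(c) Let

$$A = \begin{pmatrix} 2 & 2 & 2\\ -1 & -1 & 0\\ -1 & -1 & -1 \end{pmatrix}$$

over $\mathbb{Q}$. Show that the $\mathbb{Q}[x]$-module $M(A)$ is cyclic and specify an invertible matrix $X$ over $\mathbb{Q}$ satisfying $XAX^{-1} = C(\chi_A(x))$. Specify non-zero submodules $N_1$ and $N_2$ with $M(A) = N_1 \oplus N_2$.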

(d) Write $C = C(f(x))$, where $f(x)$ is a monic polynomial of degree 3 over a field $F$. Use the factorisation of $f(x)$ into irreducible polynomials over $F$ and Theorem 5.28 to show that the number of submodules of $M(C)$ is 2, 4, 6 or 8. Sketch the (five) possible lattice diagrams and state which of them cannot occur in the cases

(i) $F = \mathbb{R}$; (ii) $F = \mathbb{C}$; (iii) $F = \mathbb{Z}_2$.

(e) Let $A = (a_{ij})$ be a $t \times t$ matrix over a field $F$ such that $a_{i,i+1} \neq 0$ for $1 \le i < t$ and $a_{ij} = 0$ whenever $j > i + 1$. Show that the $F[x]$-module $M(A)$ is cyclic with generator $e_1$.

Hint: Establish $\langle e_1, e_2, \ldots, e_s\rangle \subseteq \langle e_1, xe_1, \ldots, x^{s-1}e_1\rangle$ for $1 \le s \le t$ by induction on $s$.

Find the order of $e_1$ in the $\mathbb{Q}[x]$-module $M(A)$, where

$$A = \begin{pmatrix} 0 & 1 & 0 & 0\\ 2 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 2 & 0 \end{pmatrix}.$$

Specify a generator of the submodule $N$ of $M(A)$ satisfying $0 \neq N \neq M(A)$. Is every non-zero element of $N$ a generator of $N$? Locate the elements $v$ with $\boldsymbol{\langle} v\boldsymbol{\rangle} = M(A)$.

(f) Let

$$A = \begin{pmatrix} 0 & a & b\\ 0 & c & d\\ 0 & 0 & 0 \end{pmatrix}$$

be a matrix over a field $F$. Find a necessary and sufficient condition on $a, b, c, d$ for the $F[x]$-module $M(A)$ to be cyclic with generator $e_1$.

3. (a) Write down a proof of Lemma 5.23 using Lemma 2.7 as a guide.

(b) The element $v_0$ of the $\mathbb{Q}[x]$-module $M$ has order $x^3 + x^2 + x + 1$. Find the orders in $M$ of:

$$(x^2 + x)v_0,\quad (x^4 + 2x^2 + 1)v_0,\quad (2x^2 + x - 1)v_0,\quad (x^9 - x)v_0,\quad (x^7 - x^6 + x^5 - x^4)v_0.$$

(c) Let $M = \mathbb{Z}_2[x]/K$, where $K = \langle x^4 + x\rangle$. Find the orders in $M$ of $K + x$, $K + x^2$, $K + 1 + x + x^2$. Find those elements $v \in M$ such that $M = \boldsymbol{\langle} v\boldsymbol{\rangle}$.

(d) The cyclic $F[x]$-module $M$ is generated by $v_0$ having monic order $d_0(x)$. Let $v$ in $M$ have order $d(x)$. Show

$$M = \boldsymbol{\langle} v\boldsymbol{\rangle} \iff d(x) = d_0(x).$$

The cyclic $F[x]$-module $M'$ contains two elements $v_0$ and $v_1$ having equal orders such that $M' = \boldsymbol{\langle} v_0\boldsymbol{\rangle}$, $M' \neq \boldsymbol{\langle} v_1\boldsymbol{\rangle}$. What type of module must $M'$ be? Specify an example of $M'$, $v_0$, $v_1$ with this property.

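(e) Let $F$ be a field and let $M$ be a cyclic $F[x]$-module with generator $v_0$ of monic order $f_1(x)f_2(x)$ in $M$, where $\gcd\{f_1(x), f_2(x)\} = 1$. Show that $M = N_1 \oplus N_2$, where $N_1 = \boldsymbol{\langle} f_1(x)v_0\boldsymbol{\rangle}$ and $N_2 = \boldsymbol{\langle} f_2(x)v_0\boldsymbol{\rangle}$. Suppose $v = v_1 + v_2$, where $v_1 \in N_1$, $v_2 \in N_2$. Show that the order of $v$ in $M$ is $g_1(x)g_2(x)$, where $g_i(x)$ is the order of $v_i$ in $N_i$ for $i = 1, 2$. Deduce $M = \boldsymbol{\langle} v\boldsymbol{\rangle} \iff N_1 = \boldsymbol{\langle} v_1\boldsymbol{\rangle}$ and $N_2 = \boldsymbol{\langle} v_2\boldsymbol{\rangle}$.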


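(f) Let $M = \mathbb{Z}_3[x]/K$, where $K = \langle x^4 + x^2\rangle$, and write $v_0 = K + 1$. Write $N_1 = \boldsymbol{\langle} x^2v_0\boldsymbol{\rangle}$ and $N_2 = \boldsymbol{\langle}(x^2 + 1)v_0\boldsymbol{\rangle}$. What are the orders in $M$ of

(i) $x^2v_0$ and (ii) $(x^2 + 1)v_0$?

Is $M = N_1 \oplus N_2$? (Yes/No.) Show that $N_1$ contains 8 vectors $v_1$ with $N_1 = \boldsymbol{\langle} v_1\boldsymbol{\rangle}$. How many vectors $v_2 \in N_2$ satisfy $N_2 = \boldsymbol{\langle} v_2\boldsymbol{\rangle}$? How many vectors $v \in M$ satisfy $M = \boldsymbol{\langle} v\boldsymbol{\rangle}$?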

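4. The $F[x]$-module $M$ contains an element $v_0$ of order $d_0(x)$, where $F$ is a field.

(a) Let $N$ be a submodule of $M$. Show that

$$K_N = \{f(x) \in F[x] : f(x)v_0 \in N\}$$

is an ideal of $F[x]$ with $\langle d_0(x)\rangle \subseteq K_N$. Show $K_N = \langle d_0(x)\rangle \iff \boldsymbol{\langle} v_0\boldsymbol{\rangle} \cap N = 0$. Let $N_1$ and $N_2$ be submodules of $M$. Show $N_1 \subseteq N_2 \Rightarrow K_{N_1} \subseteq K_{N_2}$. Show that the reverse implication is true in the case $M = \boldsymbol{\langle} v_0\boldsymbol{\rangle}$.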

(b) Let $K$ be an ideal of $F[x]$. Show that $N_K = \{f(x)v_0 : f(x) \in K\}$ is a submodule of $M$. Specify a generator of the cyclic $F[x]$-module $N_K$. Let $K_1$ and $K_2$ be ideals of $F[x]$ containing $\langle d_0(x)\rangle$. Show $K_1 \subseteq K_2 \iff N_{K_1} \subseteq N_{K_2}$.

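(c) Suppose $M = \boldsymbol{\langle} v_0\boldsymbol{\rangle}$. Let $\mathcal{L}$ denote the set of submodules $N$ of $M$ and let $\mathcal{L}'$ denote the set of ideals $K$ of $F[x]$ with $\langle d_0(x)\rangle \subseteq K$. Show $N = N_{K_N}$ for all $N \in \mathcal{L}$. Show also $K = K_{N_K}$ for all $K \in \mathcal{L}'$. Deduce that $N \mapsto K_N$ is an inclusion-preserving bijection $\mathcal{L} \to \mathcal{L}'$ with an inclusion-preserving inverse.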

(d) Let $M = \mathbb{Z}_3[x]/\langle x^3 - x\rangle$ and $v_0 = \langle x^3 - x\rangle + 1$ (so $F = \mathbb{Z}_3$, $d_0(x) = x^3 - x$). Specify generators $d(x)v_0$ of each of the eight submodules $N$ of $M$, where $d(x) \mid d_0(x)$. For each $N$ specify a generator of $K_N$. Arrange the submodules of $M$ in their lattice diagram. Use the sieve formula (Exercises 4.1, Question 7(c)) to find the number of $v \in M$ with $M = \boldsymbol{\langle} v\boldsymbol{\rangle}$.

(e) Answer (d) above in the case of

$$M = \mathbb{Z}_3[x]/\langle x^5 + x\rangle \quad\text{and}\quad v_0 = \langle x^5 + x\rangle + 1.$$

Hint: The irreducible factors of $x^5 + x$ over $\mathbb{Z}_3$ have degrees 1 and 2.

5. (a) Let $d_0(x)$ be a monic polynomial of positive degree $t$ over a field $F$. Write $M = F[x]/K$, where $K = \langle d_0(x)\rangle$, and so $M$ is a cyclic $F[x]$-module with generator $K + 1$. Let $f(x)$ be a polynomial over $F$ and let $N = \{K + g(x) : f(x)g(x) \in K\}$. Show that $N$ is a submodule of $M$. Let $d(x) = \gcd\{f(x), d_0(x)\}$ and write $q(x) = d_0(x)/d(x)$. Show that $K + q(x)$ generates $N$.


(b) Let $A$ be a $t \times t$ matrix over a field $F$. Suppose that the $F[x]$-module $M(A)$ is cyclic with generator $v_0$ of order $\chi_A(x)$. Let $f(x)$ be a polynomial over $F$. Show that $N = \{v \in F^t : vf(A) = 0\}$ is a submodule of $M(A)$. (You should know, from the theory of linear homogeneous equations, that $N$ is a subspace of $F^t$ and $\dim N = t - \operatorname{rank} f(A)$.) As above let $d(x) = \gcd\{f(x), \chi_A(x)\}$ and $q(x) = \chi_A(x)/d(x)$. Show that $N$ is cyclic with generator $v_0q(A)$ of order $d(x)$ in $M(A)$. Deduce $\operatorname{rank} f(A) = t - \deg d(x)$.

(c) Let $C = C((x^2 + 1)(x - 1))$ over $\mathbb{Q}$. Verify $\operatorname{rank}(C - I) = 2$ and $\operatorname{rank}(C^2 + I) = 1$. Determine the ranks of $C^{100}$, $C^{100} - C^{50}$, $C^{100} + C^{50}$ and $C^{100} - C^{52}$.

6. (a) Working over an arbitrary field $F$, use Theorem 5.31 to find invertible matrices $Y_1$ and $Y_2$ over $F$ satisfying

$$Y_1C(x(x-1)^2)Y_1^{-1} = C(x) \oplus C((x-1)^2),\qquad Y_2C(x^2(x+1)^2)Y_2^{-1} = C(x^2) \oplus C((x+1)^2).$$

(b) Verify $R(x - r, x^3 + ax^2 + bx + c) = r^3 + ar^2 + br + c$ and evaluate $R(x^2 + ax + b, x^2 + cx + d)$ in the same way.

(c) Let $f(x)$ and $g(x)$ be monic polynomials of positive degrees $s$ and $t$ over a field $F$. Show $R(f(x), g(x)) = (-1)^{st}R(g(x), f(x))$.

(d) Let $T$ denote the $t \times t$ matrix over a field $F$ with $e_iT = e_{t+1-i}$ for $1 \le i \le t$. Show $\det T = (-1)^{t(t-1)/2}$.

6 Canonical Forms and Similarity Classes of Square Matrices over a Field

We are now ready to combine the theory developed in Chapters 4 and 5. Suppose given a $t \times t$ matrix $A$ over a field $F$. We show how to find an invertible $t \times t$ matrix $X$ over $F$ such that the structure of $XAX^{-1}$ is revealed for all to see. It is too much to expect $XAX^{-1}$ to be diagonal, although this can almost be achieved (any non-zero $(i, j)$-entries in the Jordan normal form of $A$ have $i = j$ or $i + 1 = j$) should all eigenvalues of $A$ belong to $F$. Rather we adopt an approach which works unconditionally for all square matrices: just as each finite abelian group $G$ can be expressed as a direct sum of cyclic subgroups, so $X$ can always be found such that $XAX^{-1}$ is a direct sum of companion matrices. We will see that the analogy between $\mathbb{Z}$ and $F[x]$ extends to one between the non-zero finite $\mathbb{Z}$-module $G$ and the $F[x]$-module $M(A)$ determined by $A$. This analogy is in no way hindered by the serendipitous fact that Cyclic and Companion begin with the same letter! Thus two finite non-trivial $\mathbb{Z}$-modules are isomorphic if and only if they have the same isomorphism type

$$C_{d_1} \oplus C_{d_2} \oplus \cdots \oplus C_{d_s}$$

where $d_j \mid d_{j+1}$ for $1 \le j < s$, $d_1 > 1$, $s \ge 1$, whereas two $t \times t$ matrices over $F$ are similar if and only if they have the same rational canonical form (rcf)

$$C(d_1(x)) \oplus C(d_2(x)) \oplus \cdots \oplus C(d_s(x))$$

where $d_j(x) \mid d_{j+1}(x)$ for $1 \le j < s$, $\deg d_1(x) > 0$, $s \ge 1$. The reduction of the characteristic matrix $xI - A$ over $F[x]$ to its Smith normal form is the 'lion's share' of the calculation of $X$ such that $XAX^{-1}$ is the rcf of $A$. Consequently it is possible to resolve the question of whether or not two square matrices over $F$ are similar by an algorithmic method in which factorisation problems do not arise. Further, it is essentially the Euclidean algorithm which does the trick. The monic polynomials $d_j(x)$ appearing in the rcf of $A$ are unique and are called the invariant factors of $A$. This is the content of Section 6.1.

http://dx.doi.org/10.1007/978-1-4471-2730-7_6

Should the factorisation of the characteristic polynomial $\chi_A(x) = |xI - A|$ into irreducible polynomials over $F$ be known, it is relatively easy to derive the primary canonical form (pcf) of $A$ and, where appropriate, the Jordan normal form (Jnf) of $A$. This is carried out in Section 6.2.

In Section 6.3 we study the endomorphisms and automorphisms of the $F[x]$-module $M(A)$, that is, the algebra of matrices in $M_t(F)$ which commute with $A$. Knowledge of the pcf of $A$ will enable us to estimate the number of matrices similar to $A$ and, in certain cases, find the partition of $M_t(F)$ into similarity classes.

6.1 The Rational Canonical Form of Square Matrices over a Field

Let $F$ denote a field and let $A$ be a $t \times t$ matrix over $F$. In order to analyse the $F[x]$-module $M(A)$ determined by $A$ (see Definition 5.8), we begin with the $F[x]$-module

$$F[x]^t = F[x] \oplus F[x] \oplus \cdots \oplus F[x],$$

which is the direct sum of $t$ copies of $F[x]$. The elements of $F[x]^t$ are $t$-tuples $(f_1(x), f_2(x), \ldots, f_t(x))$ of polynomials over $F$. Addition in $F[x]^t$ is carried out component-wise, and the product of $f(x)$ in $F[x]$ by an element of $F[x]^t$ is given by

$$f(x)(f_1(x), f_2(x), \ldots, f_t(x)) = (f(x)f_1(x), f(x)f_2(x), \ldots, f(x)f_t(x)).$$

We will not make use of the fact that $F[x]^t$ is a ring, as there is no need in this context to multiply two $t$-tuples together. However matrix multiplication will play a role: the elements of $F[x]^t$ are $1 \times t$ matrices over $F[x]$, and postmultiplication by matrices in $M_t(F[x])$ produces endomorphisms of $F[x]^t$.

Let $e_i(x)$ denote the element of $F[x]^t$ with $1(x)$ in position $i$ and zeros elsewhere ($1 \le i \le t$). Then $e_1(x), e_2(x), \ldots, e_t(x)$ is an $F[x]$-basis of $F[x]^t$, called the standard basis of $F[x]^t$. So $F[x]^t$ is a free $F[x]$-module of rank $t$ by Definition 2.22. The connection between $F[x]^t$ and the $F[x]$-module $M(A)$ is provided by the evaluation homomorphism

$$\theta_A : F[x]^t \to M(A) \quad\text{defined by}\quad (f_1(x), f_2(x), \ldots, f_t(x))\theta_A = \sum_{i=1}^{t} f_i(x)e_i.$$


The verification that $\theta_A$ is an $F[x]$-linear mapping is left to the reader (Exercises 6.1, Question 8(b)). The expression $\sum_{i=1}^{t} f_i(x)e_i$ on the right-hand side of the above equation is an element of the $F[x]$-module $M(A)$; so $f_i(x)e_i = e_if_i(A)$ is row $i$ of the $t \times t$ matrix $f_i(A)$ over $F$ by Definition 5.8. Therefore

$$\sum_{i=1}^{t} f_i(x)e_i = \sum_{i=1}^{t} e_if_i(A) \in F^t,$$

showing that $\theta_A$ maps $t$-tuples of polynomials to $t$-tuples of scalars. In particular $(e_i(x))\theta_A = (0, \ldots, 0, 1(x), 0, \ldots, 0)\theta_A = e_iI = e_i$ for $1 \le i \le t$, as $1(A) = I$, showing that $\theta_A$ maps the standard basis of $F[x]^t$ to the standard basis of $F^t$. More generally

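$$\Big(\sum_{i=1}^{t} a_ie_i(x)\Big)\theta_A = \sum_{i=1}^{t} a_ie_i,$$

that is, $\theta_A$ maps each $t$-tuple of constant polynomials over $F$ to the corresponding vector in $F^t$, for all $A \in M_t(F)$. As $F^t = \langle e_1, e_2, \ldots, e_t\rangle \subseteq \operatorname{im}\theta_A$, we see $\operatorname{im}\theta_A = F^t = M(A)$, that is, $\theta_A$ is surjective.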

The next lemma helps us 'nail down' the kernel of $\theta_A$.

Lemma 6.1

Let $K$ denote the submodule of $F[x]^t$ generated by the rows of $xI - A$. Then $f(x)e_i(x) \equiv e_if(A) \pmod K$ for $f(x) \in F[x]$ and $1 \le i \le t$.

Proof

Notice first that each element of $K$ is expressible as the matrix product of an element $(f_1(x), f_2(x), \ldots, f_t(x))$ of $F[x]^t$ and $xI - A$, since

$$\sum_{i=1}^{t} f_i(x)\big(e_i(x)(xI - A)\big) = \Big(\sum_{i=1}^{t} f_i(x)e_i(x)\Big)(xI - A).$$

So $K$ is a submodule of $F[x]^t$, being the image of the 'postmultiplication by $xI - A$' endomorphism of $F[x]^t$. Write $f(x) = \sum_{j=0}^{n} a_jx^j$ and let $i$ satisfy $1 \le i \le t$. Then $a_0e_i(x) = e_i(a_0I)$, and so $a_0e_i(x) \equiv e_i(a_0I) \pmod K$. Also

$$e_i(x)(x^jI - A^j) = e_i(x)(x^{j-1}I + x^{j-2}A + \cdots + xA^{j-2} + A^{j-1})(xI - A) \in K$$

for $1 \le j \le n$, which shows $x^je_i(x) = e_i(x)x^jI \equiv e_i(x)A^j \pmod K$. Hence $a_jx^je_i(x) \equiv e_i(a_jA^j) \pmod K$ for $0 \le j \le n$. Adding up these $n + 1$ congruences gives $f(x)e_i(x) \equiv e_if(A) \pmod K$ for $1 \le i \le t$. □


We show next that the kernel of $\theta_A$ is a free submodule of $F[x]^t$. In fact, by the polynomial analogue of Theorem 3.1, all submodules of $F[x]^t$ are free and of rank at most $t$ (Exercises 6.1, Question 8(a)). Here $\operatorname{rank}\ker\theta_A = t$ and, what is more, we are given an $F[x]$-basis of $\ker\theta_A$ 'on a plate', as we show next.

Theorem 6.2

Let $A$ be a $t \times t$ matrix over a field $F$. The rows of the characteristic matrix $xI - A$ are the elements of an $F[x]$-basis of $\ker\theta_A$, where $\theta_A : F[x]^t \to M(A)$ is the evaluation homomorphism.

Proof

Let aij denote the (i, j)-entry in A for 1 ≤ i, j ≤ t . We establish three properties ofthe rows of xI − A: first they belong to ker θA, secondly they generate ker θA, andthirdly they are F [x]-linearly independent.

Row $i$ of $xI - A$ is

$$e_i(x)(xI - A) = e_i(x)xI - e_iA = xe_i(x) - (a_{i1}, a_{i2}, \ldots, a_{it})$$

and so its image under $\theta_A$ is

$$\begin{aligned}(e_i(x)(xI - A))\theta_A &= (xe_i(x))\theta_A - (a_{i1}, a_{i2}, \ldots, a_{it})\theta_A \\ &= xe_i - (a_{i1}, a_{i2}, \ldots, a_{it}) = e_iA - e_iA = 0\end{aligned}$$

for $1 \le i \le t$. Therefore the $t$ rows of $xI - A$ belong to $\ker\theta_A$.

Using the notation of Lemma 6.1, let $K$ denote the submodule of $F[x]^t$ generated by the rows of $xI - A$. By the preceding paragraph we obtain $K \subseteq \ker\theta_A$. Consider an element $\sum_{i=1}^{t} f_i(x)e_i(x)$ of $F[x]^t$. Taking $f(x) = f_i(x)$ in Lemma 6.1 gives $f_i(x)e_i(x) \equiv e_if_i(A) \pmod K$ for $1 \le i \le t$. Adding these $t$ congruences together produces

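$$\sum_{i=1}^{t} f_i(x)e_i(x) \equiv \sum_{i=1}^{t} e_if_i(A) \pmod K.$$

Now suppose $\sum_{i=1}^{t} f_i(x)e_i(x)$ belongs to $\ker\theta_A$; as $\big(\sum_{i=1}^{t} f_i(x)e_i(x)\big)\theta_A = \sum_{i=1}^{t} e_if_i(A)$, this means $\sum_{i=1}^{t} e_if_i(A) = 0$. Therefore

$$\sum_{i=1}^{t} f_i(x)e_i(x) \equiv 0 \pmod K,$$

which means $\sum_{i=1}^{t} f_i(x)e_i(x) \in K$. Therefore $\ker\theta_A \subseteq K$, which gives $\ker\theta_A = K$. So the rows of $xI - A$ generate $\ker\theta_A$.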

Finally, suppose (looking for a contradiction) that the rows of $xI - A$ are $F[x]$-linearly dependent. Then there is a non-zero element $\sum_{i=1}^{t} g_i(x)e_i(x)$ of $F[x]^t$ with $(g_1(x), g_2(x), \ldots, g_t(x))(xI - A) = 0$, which is an equation between $t$-tuples of polynomials over $F$. Let $g_{j_0}(x)$ have maximum degree among the $g_i(x)$ for $1 \le i \le t$; as the element is non-zero, $g_{j_0}(x) \ne 0$. Comparing $j_0$th entries gives $xg_{j_0}(x) = \sum_{i=1}^{t} a_{ij_0}g_i(x)$. But

$$\deg xg_{j_0}(x) = 1 + \deg g_{j_0}(x) > \deg \sum_{i=1}^{t} a_{ij_0}g_i(x)$$

and so the above polynomial equality is false, as the degrees of its sides are unequal. Therefore the rows of $xI - A$ are $F[x]$-linearly independent. So we have shown that the $t$ rows of $xI - A$ form an $F[x]$-basis of $\ker\theta_A$. □

Example 6.3

Take

$$F = \mathbb{Q} \quad\text{and}\quad A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 1 \end{pmatrix}.$$

We find the images of a few elements of $\mathbb{Q}[x]^3$ under $\theta_A : \mathbb{Q}[x]^3 \to M(A)$. For instance

$$\begin{aligned}(2, x, x^2)\theta_A &= (2e_1(x) + xe_2(x) + x^2e_3(x))\theta_A = 2e_1 + xe_2 + x^2e_3 \\ &= 2e_1 + e_2A + e_3A^2 = (2,0,0) + (0,2,1) + (0,0,1) = (2,2,2).\end{aligned}$$

In the same way

$$\begin{aligned}(x^2, 1 - x, x^3)\theta_A &= e_1A^2 + e_2 - e_2A + e_3A^3 \\ &= (1,3,3) + (0,1,0) - (0,2,1) + (0,0,1) = (1,2,3).\end{aligned}$$

Denote row $i$ of

$$xI - A = \begin{pmatrix} x - 1 & -1 & -1 \\ 0 & x - 2 & -1 \\ 0 & 0 & x - 1 \end{pmatrix}$$

by $z_i(x)$ for $i = 1, 2, 3$. Be careful to change the signs of all entries in $A$ when constructing $xI - A$. In this case $z_1(x) = (x - 1, -1, -1)$, $z_2(x) = (0, x - 2, -1)$, $z_3(x) = (0, 0, x - 1)$. By Theorem 6.2 we know that $z_1(x), z_2(x), z_3(x)$ is a $\mathbb{Q}[x]$-basis of $\ker\theta_A$. Does $(x^2 - x, -2, -2)$ belong to $\ker\theta_A$? To answer this question apply $\theta_A$ and see whether or not the resulting vector is zero. As

$$(x^2 - x, -2, -2)\theta_A = e_1A^2 - e_1A - 2e_2 - 2e_3 = (1,3,3) - (1,1,1) - (0,2,2) = (0,0,0)$$


we see $(x^2 - x, -2, -2) \in \ker\theta_A$. As $A$ is upper triangular (its $(i,j)$-entries are zero for $i > j$), it is easy to express elements of $\ker\theta_A$ as polynomial linear combinations of the rows of $xI - A$; here $(x^2 - x, -2, -2) = xz_1(x) + z_2(x) + z_3(x)$.
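The images computed above can be reproduced mechanically. The Python sketch below is an illustration only (the names `theta` and `vec_times_mat` are ours, not the book's). It assumes a polynomial is stored as a list of rational coefficients with the constant term first and, following the convention $(v)\alpha = xv = vA$ of this chapter, it sends $(f_1(x), \ldots, f_t(x))$ to $\sum_i e_if_i(A)$.

```python
from fractions import Fraction


def vec_times_mat(v, A):
    """Return the row vector vA (vectors are rows, as in the text)."""
    return [sum(v[k] * A[k][j] for k in range(len(v))) for j in range(len(A[0]))]


def theta(polys, A):
    """Evaluation map theta_A : F[x]^t -> F^t, (f_1, ..., f_t) -> sum_i e_i f_i(A).

    Each f_i is a coefficient list [a_0, a_1, ...] (constant term first);
    the term a_j x^j e_i(x) is sent to a_j e_i A^j.
    """
    t = len(A)
    result = [Fraction(0)] * t
    for i, f in enumerate(polys):
        power = [Fraction(int(k == i)) for k in range(t)]  # e_i A^0 = e_i
        for a_j in f:
            result = [r + a_j * p for r, p in zip(result, power)]
            power = vec_times_mat(power, A)  # advance to e_i A^(j+1)
    return result


# The matrix of Example 6.3.
A = [[1, 1, 1], [0, 2, 1], [0, 0, 1]]
```

For instance, `theta([[2], [0, 1], [0, 0, 1]], A)` evaluates $(2, x, x^2)\theta_A$ and returns the vector $(2, 2, 2)$ with `Fraction` entries, while each row $z_i(x)$ of $xI - A$ is sent to the zero vector, as Theorem 6.2 predicts.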

The equations $z_i(x) = e_i(x)(xI - A)$ for $i = 1, 2, 3$ show how the matrix $xI - A$ relates the $\mathbb{Q}[x]$-basis $z_1(x), z_2(x), z_3(x)$ of $\ker\theta_A$ to the $\mathbb{Q}[x]$-basis $e_1(x), e_2(x), e_3(x)$ of $\mathbb{Q}[x]^3$. Can we find $\mathbb{Q}[x]$-bases of $\ker\theta_A$ and $\mathbb{Q}[x]^3$ related in this way by a diagonal matrix? If so, then it is a small step, as we set out below, to express $M(A)$ as a direct sum of cyclic submodules and simultaneously find an invertible $3 \times 3$ matrix $X$ over $\mathbb{Q}$ such that $XAX^{-1}$ is a direct sum of companion matrices. Further, should the diagonal matrix referred to be in Smith normal form $\operatorname{diag}(1(x), d_1(x), d_2(x))$ where $d_1(x) \ne 1(x)$, then $XAX^{-1} = C(d_1(x)) \oplus C(d_2(x))$ where $d_1(x) \mid d_2(x)$, that is, $XAX^{-1}$ is in rational canonical form. The reader should not be surprised that the answer to the above question is: Yes! A suitable $\mathbb{Q}[x]$-basis of $\mathbb{Q}[x]^3$ is provided by the rows of $Q(x)$ satisfying

$$P(x)(xI - A)Q(x)^{-1} = S(xI - A)$$

where $P(x)$, $Q(x)$ are invertible over $\mathbb{Q}[x]$ and $S(xI - A) = \operatorname{diag}(1, d_1(x), d_2(x))$ is the Smith normal form of $xI - A$. Using the method of Section 4.2, the sequence of elementary operations

$$c_1 \leftrightarrow c_2,\ -c_1,\ c_2 - (x - 1)c_1,\ c_3 + c_1,\ r_2 + (x - 2)r_1,\ r_2 \leftrightarrow r_3,\ c_2 \leftrightarrow c_3,\ r_3 + r_2$$

reduces $xI - A$ to its Smith normal form

$$S(xI - A) = \operatorname{diag}(1, x - 1, (x - 1)(x - 2)),$$

showing $d_1(x) = x - 1$, $d_2(x) = (x - 1)(x - 2)$. Applying the eros in the above sequence to $I$ gives

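$$P(x) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ x - 2 & 1 & 1 \end{pmatrix} \quad\text{and}\quad Q(x) = \begin{pmatrix} x - 1 & -1 & -1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix},$$

the latter on applying the conjugates of the ecos in the above sequence to $I$. Then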

$$P(x)(xI - A) = S(xI - A)Q(x)$$

and so $P(x)(xI - A)Q(x)^{-1} = \operatorname{diag}(1, x - 1, (x - 1)(x - 2))$. Write $\rho_i(x) = e_i(x)Q(x)$, so that $\rho_i(x)$ is row $i$ of $Q(x)$ for $i = 1, 2, 3$. By Corollary 2.23 we see that $\rho_1(x), \rho_2(x), \rho_3(x)$ form a $\mathbb{Q}[x]$-basis of $\mathbb{Q}[x]^3$, as $\det Q(x) = -1$. By Theorem 6.2 the rows of $xI - A$ form a $\mathbb{Q}[x]$-basis of $\ker\theta_A$, and as $\det P(x) = -1$ the same is true of the rows of $P(x)(xI - A)$. So the rows of $S(xI - A)Q(x)$, that is, $\rho_1(x)$, $(x - 1)\rho_2(x)$, $(x - 1)(x - 2)\rho_3(x)$, form a $\mathbb{Q}[x]$-basis of $\ker\theta_A$. We have achieved the polynomial analogue of the $\mathbb{Z}$-bases $\rho_1, \rho_2, \rho_3$ of $\mathbb{Z}^3$ and $\rho_1, 2\rho_2, 26\rho_3$ of $\ker\theta$ used in Example 3.2. Here

$$(\rho_1(x))\theta_A = (x - 1, -1, -1)\theta_A = e_1A - e_1 - e_2 - e_3 = (0,0,0)$$

has order 1 in $M(A)$ and, being zero, it is a redundant generator. Write $v_1 = (\rho_2(x))\theta_A = (e_3(x))\theta_A = e_3$; then $v_1$ has order $x - 1$ in $M(A)$, as $v_1 \ne 0$ but $(x - 1)v_1 = e_3A - e_3 = e_3 - e_3 = (0,0,0)$. Write $v_2 = (\rho_3(x))\theta_A = (e_1(x))\theta_A = e_1$; then $v_2$ has order $(x - 1)(x - 2)$ in $M(A)$, as $(x - 1)v_2 = (x - 1)e_1 = (0,1,1) \ne 0$, $(x - 2)v_2 = (x - 2)e_1 = (-1,1,1) \ne 0$ and $(x - 1)(x - 2)v_2 = (x - 1)(-1,1,1) = (-1,1,1)(A - I) = 0$. Anticipating the theory in Theorem 6.5 below, we conclude

$$M(A) = \langle v_1\rangle \oplus \langle v_2\rangle,$$

that is, $M(A)$ is the direct sum of the cyclic submodule generated by $v_1$ and the cyclic submodule generated by $v_2$. Finally, combining Lemma 5.19, Corollary 5.20, Theorem 5.24 and Corollary 5.27, the vectors $v_1, v_2, xv_2$ make up the basis $\mathcal{B} = \mathcal{B}_{v_1} \cup \mathcal{B}_{v_2}$ of $\mathbb{Q}^3$ and

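$$X = \begin{pmatrix} v_1 \\ v_2 \\ xv_2 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}$$

satisfies

$$XAX^{-1} = C(x - 1) \oplus C((x - 1)(x - 2)) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -2 & 3 \end{pmatrix},$$

which is the rational canonical form of $A$.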
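As a check on the arithmetic of Example 6.3, the following Python sketch (our own illustration; helper names such as `pmatmul` and `pdet3` are hypothetical, and polynomials are integer coefficient lists with the constant term first) confirms that $P(x)(xI - A) = S(xI - A)Q(x)$, that $\det P(x) = \det Q(x) = -1$, and that $XA = CX$. The last identity is equivalent to $XAX^{-1} = C$ because $X$ is invertible.

```python
def trim(p):
    """Drop trailing zero coefficients; [] is the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def padd(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])


def pmul(p, q):
    out = [0] * max(len(p) + len(q) - 1, 0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def pmatmul(P, Q):
    """Product of matrices whose entries are polynomials (coefficient lists)."""
    out = []
    for i in range(len(P)):
        row = []
        for j in range(len(Q[0])):
            entry = []
            for k in range(len(Q)):
                entry = padd(entry, pmul(P[i][k], Q[k][j]))
            row.append(entry)
        out.append(row)
    return out


def pdet3(M):
    """Determinant of a 3 x 3 polynomial matrix, expanding along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M

    def two_by_two(p, q, r, s):  # p*s - q*r
        return padd(pmul(p, s), pmul([-1], pmul(q, r)))

    first = pmul(a, two_by_two(e, f, h, i))
    second = pmul([-1], pmul(b, two_by_two(d, f, g, i)))
    third = pmul(c, two_by_two(d, e, g, h))
    return padd(padd(first, second), third)


def constant(M):
    """View an integer matrix as a matrix of constant polynomials."""
    return [[trim([v]) for v in row] for row in M]


# Data of Example 6.3: entries are coefficient lists, constant term first.
xI_minus_A = [[[-1, 1], [-1], [-1]], [[], [-2, 1], [-1]], [[], [], [-1, 1]]]
P = [[[1], [], []], [[], [], [1]], [[-2, 1], [1], [1]]]
Q = [[[-1, 1], [-1], [-1]], [[], [], [1]], [[1], [], []]]
S = [[[1], [], []], [[], [-1, 1], []], [[], [], [2, -3, 1]]]

A = [[1, 1, 1], [0, 2, 1], [0, 0, 1]]
X = [[0, 0, 1], [1, 0, 0], [1, 1, 1]]    # rows v1 = e3, v2 = e1, x v2 = e1 A
C = [[1, 0, 0], [0, 0, 1], [0, -2, 3]]   # C(x - 1) (+) C((x - 1)(x - 2))

checks = {
    "P(x)(xI - A) == S(xI - A)Q(x)": pmatmul(P, xI_minus_A) == pmatmul(S, Q),
    "det P(x) == -1": pdet3(P) == [-1],
    "det Q(x) == -1": pdet3(Q) == [-1],
    "det X != 0": pdet3(constant(X)) != [],
    "XA == CX": pmatmul(constant(X), constant(A)) == pmatmul(constant(C), constant(X)),
}
for name, ok in checks.items():
    print(name, ok)
```

Storing every entry, including constants, as a polynomial lets one multiplication routine serve both the $\mathbb{Q}[x]$ identities and the final check over $\mathbb{Q}$.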

Definition 6.4

Let $d_1(x), d_2(x), \ldots, d_s(x)$ be non-constant monic polynomials over a field $F$ with $d_j(x) \mid d_{j+1}(x)$ for $1 \le j < s$. Let $t = \deg d_1(x) + \deg d_2(x) + \cdots + \deg d_s(x)$. Then the $t \times t$ matrix

$$C(d_1(x)) \oplus C(d_2(x)) \oplus \cdots \oplus C(d_s(x))$$

over $F$ is said to be in rational canonical form (rcf).

So to be in rcf a matrix must be the direct sum of companion matrices of polynomials, each being a divisor of the next.
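Definition 6.4 is easy to make concrete. The Python sketch below is ours (all names are hypothetical). It assumes the row convention visible in Example 6.3, where $C((x - 1)(x - 2))$ has rows $(0, 1)$ and $(-2, 3)$: for $d(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1} + x^n$, the companion matrix $C(d(x))$ has 1s on the superdiagonal and last row $(-a_0, -a_1, \ldots, -a_{n-1})$. It checks the divisibility chain before assembling the direct sum.

```python
from fractions import Fraction


def companion(d):
    """C(d(x)) for monic d(x) = a_0 + a_1 x + ... + a_{n-1} x^(n-1) + x^n.

    Row convention: 1s on the superdiagonal, last row (-a_0, ..., -a_{n-1}).
    """
    *low, lead = d
    if lead != 1 or not low:
        raise ValueError("d(x) must be monic of positive degree")
    n = len(low)
    C = [[0] * n for _ in range(n)]
    for r in range(n - 1):
        C[r][r + 1] = 1
    C[n - 1] = [-a for a in low]
    return C


def remainder(q, p):
    """Remainder of q(x) on division by non-zero p(x) (with p[-1] != 0), over Q."""
    r = [Fraction(c) for c in q]
    while len(r) >= len(p):
        factor = r[-1] / p[-1]
        shift = len(r) - len(p)
        for k, c in enumerate(p):
            r[shift + k] -= factor * c
        r.pop()  # the leading coefficient is now zero
    return r


def rational_canonical_form(factors):
    """C(d_1) (+) ... (+) C(d_s), after checking d_j | d_{j+1} as in Definition 6.4."""
    for d, e in zip(factors, factors[1:]):
        if any(remainder(e, d)):
            raise ValueError("each invariant factor must divide the next")
    blocks = [companion(d) for d in factors]
    t = sum(len(B) for B in blocks)
    M = [[0] * t for _ in range(t)]
    offset = 0
    for B in blocks:
        for r, row in enumerate(B):
            M[offset + r][offset:offset + len(B)] = row
        offset += len(B)
    return M
```

With the invariant factors $x - 1$ and $(x - 1)(x - 2)$ of Example 6.3, `rational_canonical_form([[-1, 1], [2, -3, 1]])` rebuilds the $3 \times 3$ matrix displayed there.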

Our next theorem is the culmination of the theory in Chapters 4 and 5. It is the analogue of Theorem 3.4 in the case of a finite abelian group $G$ with $|G| > 1$.


Theorem 6.5 (The existence of the rational canonical form)

Let $A$ be a $t \times t$ matrix over a field $F$. There are monic polynomials $d_1(x), d_2(x), \ldots, d_s(x)$ of positive degree over $F$ satisfying $d_i(x) \mid d_{i+1}(x)$ for $1 \le i < s$, where $s \le t$, and vectors $v_i$ of order $d_i(x)$ in $M(A)$ for $1 \le i \le s$ such that

$$M(A) = \langle v_1\rangle \oplus \langle v_2\rangle \oplus \cdots \oplus \langle v_s\rangle.$$

Let $X$ denote the invertible $t \times t$ matrix over $F$ having as its rows the vectors of the basis $\mathcal{B} = \mathcal{B}_{v_1} \cup \mathcal{B}_{v_2} \cup \cdots \cup \mathcal{B}_{v_s}$ of $F^t$. Then

$$XAX^{-1} = C(d_1(x)) \oplus C(d_2(x)) \oplus \cdots \oplus C(d_s(x))$$

is in rational canonical form and $\chi_A(x) = d_1(x)d_2(x)\cdots d_s(x)$.

Proof

We use the evaluation homomorphism $\theta_A : F[x]^t \to M(A)$, which is surjective. By Theorem 6.2 the rows of $xI - A$ are the elements of an $F[x]$-basis of $\ker\theta_A$. Applying Theorem 4.16 with $A(x) = xI - A$, there are invertible $t \times t$ matrices $P(x)$ and $Q(x)$ over $F[x]$ satisfying

$$P(x)(xI - A) = S(xI - A)Q(x)$$

where $S(xI - A) = \operatorname{diag}(1, \ldots, 1, d_1(x), d_2(x), \ldots, d_s(x))$ is the Smith normal form of $xI - A$ and $1 \le s \le t$. The polynomials $d_1(x), d_2(x), \ldots, d_s(x)$ are the non-constant diagonal entries in $S(xI - A)$ and so are monic, of positive degree over $F$, and satisfy $d_i(x) \mid d_{i+1}(x)$ for $1 \le i < s$ by Definition 4.12. Write $\rho_i(x) = e_i(x)Q(x)$ for $1 \le i \le t$. As $P(x)$ is invertible over $F[x]$, the rows of $P(x)(xI - A)$ form an $F[x]$-basis of $\ker\theta_A$ by Corollary 2.21. Using the displayed equation above, the rows of $S(xI - A)Q(x)$, namely

$$\rho_1(x), \ldots, \rho_{t-s}(x),\ d_1(x)\rho_{t-s+1}(x), \ldots, d_s(x)\rho_t(x),$$

also form an (actually the same!) $F[x]$-basis of $\ker\theta_A$. As $Q(x)$ is invertible over $F[x]$, its rows

$$\rho_1(x), \rho_2(x), \ldots, \rho_t(x)$$

form an $F[x]$-basis of $F[x]^t$ by Corollary 2.23. We now mimic the proof of Theorem 3.4 by using these closely related $F[x]$-bases to decompose $M(A) \cong F[x]^t/\ker\theta_A$. As $\rho_1(x), \rho_2(x), \ldots, \rho_t(x)$ generate the $F[x]$-module $F[x]^t$ and $\theta_A$ is surjective, we see that $(\rho_1(x))\theta_A, (\rho_2(x))\theta_A, \ldots, (\rho_t(x))\theta_A$ generate the $F[x]$-module $M(A) = \operatorname{im}\theta_A$. But $(\rho_i(x))\theta_A = 0$ for $1 \le i \le t - s$, as $\rho_1(x), \ldots, \rho_{t-s}(x)$ belong to $\ker\theta_A$, being the first $t - s$ elements of the above $F[x]$-basis of $\ker\theta_A$. Discarding these $t - s$ redundant generators, all of which are zero, we see that the remaining $s$ vectors $(\rho_{t-s+1}(x))\theta_A, \ldots, (\rho_t(x))\theta_A$ generate $M(A)$. Write $v_i = (\rho_{t-s+i}(x))\theta_A$ for $1 \le i \le s$. Then $M(A) = \langle v_1\rangle + \langle v_2\rangle + \cdots + \langle v_s\rangle$, showing $M(A)$ to be the sum of $s$ non-trivial cyclic submodules. Next Lemma 2.15 is used to show that $M(A)$ is the internal direct sum of $\langle v_1\rangle, \langle v_2\rangle, \ldots, \langle v_s\rangle$ and to determine the order of each $v_i$ in $M(A)$. Suppose

$$u_1 + u_2 + \cdots + u_s = 0 \qquad (\clubsuit)$$

where $u_i \in \langle v_i\rangle$ for $1 \le i \le s$. So $u_i = f_i(x)v_i$ where $f_i(x) \in F[x]$ for $1 \le i \le s$. Substituting for $u_i$ and $v_i$ in equation ($\clubsuit$) gives

$$\begin{aligned} &(f_1(x)\rho_{t-s+1}(x) + f_2(x)\rho_{t-s+2}(x) + \cdots + f_s(x)\rho_t(x))\theta_A \\ &\quad= f_1(x)((\rho_{t-s+1}(x))\theta_A) + f_2(x)((\rho_{t-s+2}(x))\theta_A) + \cdots + f_s(x)((\rho_t(x))\theta_A) \\ &\quad= f_1(x)v_1 + f_2(x)v_2 + \cdots + f_s(x)v_s = u_1 + u_2 + \cdots + u_s = 0 \end{aligned}$$

which shows that $f_1(x)\rho_{t-s+1}(x) + f_2(x)\rho_{t-s+2}(x) + \cdots + f_s(x)\rho_t(x)$ belongs to $\ker\theta_A$. Using the $F[x]$-basis of $\ker\theta_A$ displayed above, there are polynomials $q_i(x)$ over $F$ ($1 \le i \le t$) with

$$\begin{aligned} &f_1(x)\rho_{t-s+1}(x) + f_2(x)\rho_{t-s+2}(x) + \cdots + f_s(x)\rho_t(x) \\ &\quad= q_1(x)\rho_1(x) + \cdots + q_{t-s}(x)\rho_{t-s}(x) + q_{t-s+1}(x)d_1(x)\rho_{t-s+1}(x) + \cdots + q_t(x)d_s(x)\rho_t(x). \end{aligned}$$

As $\rho_1(x), \rho_2(x), \ldots, \rho_t(x)$ are $F[x]$-linearly independent, the coefficients of $\rho_i(x)$ on opposite sides of the above equation are equal: so $q_i(x) = 0$ for $1 \le i \le t - s$ (since these $\rho_i(x)$ appear on one side only), and the $\rho_{t-s+i}(x)$ give

$$f_i(x) = q_{t-s+i}(x)d_i(x) \quad\text{for } 1 \le i \le s.$$

On substituting for $f_i(x)$ and $v_i$ we obtain

$$u_i = f_i(x)v_i = q_{t-s+i}(x)d_i(x)((\rho_{t-s+i}(x))\theta_A) = q_{t-s+i}(x)((d_i(x)\rho_{t-s+i}(x))\theta_A) = 0$$

since $d_i(x)\rho_{t-s+i}(x) \in \ker\theta_A$ for $1 \le i \le s$. By Lemma 2.15 we conclude

$$M(A) = \langle v_1\rangle \oplus \langle v_2\rangle \oplus \cdots \oplus \langle v_s\rangle,$$

showing that $M(A)$ is the internal direct sum of $s$ non-zero cyclic submodules.


Let $K_i$ be the order ideal (Definition 5.11) of $v_i$ in $M(A)$ for $1 \le i \le s$. As $d_i(x)\rho_{t-s+i}(x) \in \ker\theta_A$, we see

$$d_i(x)v_i = d_i(x)((\rho_{t-s+i}(x))\theta_A) = (d_i(x)\rho_{t-s+i}(x))\theta_A = 0,$$

showing $d_i(x) \in K_i$. Conversely suppose $g_i(x) \in K_i$. Then $g_i(x)v_i = 0$, which leads to $g_i(x)\rho_{t-s+i}(x) \in \ker\theta_A$. Using the above $F[x]$-basis of $\ker\theta_A$ consisting of the rows of $S(xI - A)Q(x)$, we deduce $d_i(x) \mid g_i(x)$. Therefore $K_i = \langle d_i(x)\rangle$, showing that $v_i$ has order $d_i(x)$ in $M(A)$ for $1 \le i \le s$.

Finally we construct a matrix $X$ such that $XAX^{-1}$ is in rcf (Definition 6.4). Let $\alpha : F^t \to F^t$ be the linear mapping determined by the given $t \times t$ matrix $A$ over $F$, so that $(v)\alpha = xv = vA$ for all $v \in F^t$. Write $N_i = \langle v_i\rangle$ for $1 \le i \le s$. By Theorem 5.24 the vectors $x^jv_i$ for $j = 0, 1, 2, \ldots, \deg d_i(x) - 1$, in that order, make up the basis $\mathcal{B}_{v_i}$ of $N_i$. By the discussion following Theorem 5.24, the matrix of $\alpha|_{N_i} : N_i \to N_i$, the restriction of $\alpha$ to the cyclic submodule $N_i$, relative to $\mathcal{B}_{v_i}$ is the companion matrix $C(d_i(x))$ for $1 \le i \le s$. By the first part of the proof $M(A) = N_1 \oplus N_2 \oplus \cdots \oplus N_s$. Therefore $\mathcal{B} = \mathcal{B}_{v_1} \cup \mathcal{B}_{v_2} \cup \cdots \cup \mathcal{B}_{v_s}$ is a basis of $F^t$ by Lemma 5.18. Let $X$ be the $t \times t$ matrix having the vectors of $\mathcal{B}$ as its rows. Using Lemma 5.2 and Corollary 5.20 we obtain

$$XAX^{-1} = C(d_1(x)) \oplus C(d_2(x)) \oplus \cdots \oplus C(d_s(x)).$$

The polynomials $d_i(x)$ for $1 \le i \le s$ satisfy Definition 6.4 and so $XAX^{-1}$ is in rational canonical form. The above matrix equation leads to

$$X(xI - A)X^{-1} = (xI - C(d_1(x))) \oplus (xI - C(d_2(x))) \oplus \cdots \oplus (xI - C(d_s(x))).$$

Taking determinants:

$$\begin{aligned} \chi_A(x) &= |xI - A| = |X|\,|xI - A|\,|X|^{-1} = |X(xI - A)X^{-1}| \\ &= |(xI - C(d_1(x))) \oplus (xI - C(d_2(x))) \oplus \cdots \oplus (xI - C(d_s(x)))| \\ &= |xI - C(d_1(x))|\,|xI - C(d_2(x))| \cdots |xI - C(d_s(x))| = d_1(x)d_2(x)\cdots d_s(x) \end{aligned}$$

on using the theory following Definition 5.17 and Theorem 5.26. □

The ‘end of the road’ is in sight thanks to Theorem 6.5, and some comments are in order. First, the $F[x]$-module $M(A)$ decomposes into a direct sum of cyclic modules for any $t \times t$ matrix $A$ over any field $F$. The set-up is uncannily analogous to that of finite abelian groups (Theorem 3.7): although the decomposition itself is not in general unique, the divisor sequence $(d_1(x), d_2(x), \ldots, d_s(x))$ as in Theorem 6.5 is uniquely determined by $A$ (see Theorem 6.6 below) and characterises the similarity class of $A$.

Secondly, the reader is now able to transform any $t \times t$ matrix $A$ over any field $F$ into its rcf $XAX^{-1}$. The ‘stumbling-block’ is the elementary but laborious process of calculating $S(xI - A)$ and finding a suitable matrix $Q(x)$ as in Theorem 4.16; the reader will be aware that all the foregoing examples of $A$ are either ‘nice’ from a theoretical point of view or have $t \le 3$. Nevertheless the rcf is as good as one can hope for: each similarity class contains a unique matrix in rcf and, as this matrix is a direct sum of companion matrices, the Cayley–Hamilton theorem (Corollary 6.11) is a direct consequence of its existence.

We now prepare to prove the polynomial analogue of Theorem 3.7: for each $t \times t$ matrix $A$ over a field $F$ the polynomials $d_1(x), d_2(x), \ldots, d_s(x)$ in Theorem 6.5 are unique. Let $d(x)$ be a polynomial over the field $F$ and let $M$ be an $F[x]$-module. Then $\mu_{d(x)} : M \to M$, defined by $(v)\mu_{d(x)} = d(x)v$ for all $v \in M$, is $F[x]$-linear. In other words the mapping $\mu_{d(x)}$, which multiplies each element $v$ of $M$ by $d(x)$, is an endomorphism of $M$. Write

$$\operatorname{im}\mu_{d(x)} = d(x)M \quad\text{and}\quad \ker\mu_{d(x)} = M_{(d(x))},$$

and so $d(x)M$ and $M_{(d(x))}$ are submodules of $M$. For $F[x]$-modules $M$ and $M'$ we have

$$d(x)(M \oplus M') = (d(x)M) \oplus (d(x)M') \quad\text{and}\quad (M \oplus M')_{(d(x))} = M_{(d(x))} \oplus M'_{(d(x))},$$

showing that the external direct sum (Exercises 2.3, Question 7(f)) is respected in this context. We are duty bound to consider decompositions of modules into direct sums of cyclic modules as in Theorem 6.5, and it is convenient to use the most economical notation (Lemma 5.22) for cyclic torsion $F[x]$-modules, namely

$$F[x]/\langle d_0(x)\rangle \quad\text{with } d_0(x) \text{ monic},$$

which is cyclic with generator $\langle d_0(x)\rangle + 1(x)$ of order $d_0(x)$.

From Lemmas 5.22 and 5.23 we deduce

$$d(x)\big(F[x]/\langle d_0(x)\rangle\big) \cong F[x]/\langle d_0(x)/\gcd\{d(x), d_0(x)\}\rangle,$$

as the above module on the left is cyclic with generator $\langle d_0(x)\rangle + d(x)$ of order $d_0(x)/\gcd\{d(x), d_0(x)\}$. Using Lemma 5.23 again gives

$$\big(F[x]/\langle d_0(x)\rangle\big)_{(d(x))} \cong F[x]/\langle \gcd\{d(x), d_0(x)\}\rangle,$$

as the coset $\langle d_0(x)\rangle + d_0(x)/\gcd\{d(x), d_0(x)\}$, of order $\gcd\{d(x), d_0(x)\}$, generates the above module on the left.
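These two isomorphisms can be checked numerically on small cases. Since $M(C(d_0(x)))$ is cyclic with generator $e_1$ of order $d_0(x)$ (Theorem 5.26), we may realise $F[x]/\langle d_0(x)\rangle$ as $M(C(d_0(x)))$. Multiplication by $d(x)$ then becomes $v \mapsto v\,d(C)$, so the dimension of the image is the rank of $d(C)$ and the dimension of the kernel is its nullity. The isomorphisms predict $\deg d_0(x) - \deg\gcd\{d(x), d_0(x)\}$ and $\deg\gcd\{d(x), d_0(x)\}$ respectively. The Python sketch below (our own; function names are hypothetical) computes both sides over $\mathbb{Q}$ with exact `Fraction` arithmetic.

```python
from fractions import Fraction


def trim(p):
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p


def remainder(q, p):
    """Remainder of q(x) on division by a non-zero trimmed p(x), over Q."""
    r = trim(q)
    while len(r) >= len(p):
        factor, shift = r[-1] / p[-1], len(r) - len(p)
        for k, c in enumerate(p):
            r[shift + k] -= factor * c
        r = trim(r)
    return r


def monic_gcd(p, q):
    """Monic gcd by Euclid's algorithm (p, q not both zero)."""
    p, q = trim(p), trim(q)
    while q:
        p, q = q, remainder(p, q)
    return [c / p[-1] for c in p]


def companion(d):
    """C(d(x)) for monic d: 1s on the superdiagonal, last row -a_0, ..., -a_{n-1}."""
    n = len(d) - 1
    C = [[Fraction(int(c == r + 1)) for c in range(n)] for r in range(n)]
    C[n - 1] = [-Fraction(a) for a in d[:-1]]
    return C


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def poly_at_matrix(f, A):
    """f(A) by Horner's rule."""
    n = len(A)
    R = [[Fraction(0)] * n for _ in range(n)]
    for a in reversed(f):
        R = matmul(R, A)
        for i in range(n):
            R[i][i] += a
    return R


def rank(M):
    """Rank over Q by Gaussian elimination."""
    M = [list(row) for row in M]
    rk = 0
    for c in range(len(M[0])):
        pivot = next((r for r in range(rk, len(M)) if M[r][c] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for r in range(len(M)):
            if r != rk and M[r][c] != 0:
                f = M[r][c] / M[rk][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[rk])]
        rk += 1
    return rk


def dimensions(d, d0):
    """(dim d(x)M, dim of kernel of d(x), deg gcd{d, d0}) for M = M(C(d0))."""
    n = len(d0) - 1
    image_dim = rank(poly_at_matrix(d, companion(d0)))
    return image_dim, n - image_dim, len(monic_gcd(d, d0)) - 1
```

For $d_0(x) = (x - 1)^2(x - 2)$ and $d(x) = (x - 1)(x - 3)$, the gcd is $x - 1$, and `dimensions([3, -4, 1], [-2, 5, -4, 1])` gives `(2, 1, 1)`: an image of dimension $3 - 1$ and a kernel of dimension $1$, as predicted.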

The analogy between Theorems 3.7 and 6.6 is so close that the reader is merely given a start on the proof of Theorem 6.6 and then encouraged to complete it by referring back to Theorem 3.7.


Theorem 6.6 (The invariance theorem for F [x]-modules M(A))

Let $M$ and $M'$ be isomorphic $F[x]$-modules where $F$ is a field. Suppose

$$M \cong F[x]/\langle d_1(x)\rangle \oplus F[x]/\langle d_2(x)\rangle \oplus \cdots \oplus F[x]/\langle d_s(x)\rangle$$

where $d_1(x), d_2(x), \ldots, d_s(x)$ are monic polynomials of positive degree satisfying $d_i(x) \mid d_{i+1}(x)$ for $1 \le i < s$. Also suppose

$$M' \cong F[x]/\langle d'_1(x)\rangle \oplus F[x]/\langle d'_2(x)\rangle \oplus \cdots \oplus F[x]/\langle d'_{s'}(x)\rangle$$

where $d'_1(x), d'_2(x), \ldots, d'_{s'}(x)$ are monic polynomials of positive degree satisfying $d'_i(x) \mid d'_{i+1}(x)$ for $1 \le i < s'$. Then $s = s'$ and $d_i(x) = d'_i(x)$ for $1 \le i \le s$.

Proof

By hypothesis there is an isomorphism $\alpha : M \cong M'$. Let $d(x) \in F[x]$. The $F[x]$-linearity of $\alpha$ gives $\mu_{d(x)}\alpha = \alpha\mu_{d(x)}$, as $(d(x)v)\alpha = d(x)((v)\alpha)$ for all $v \in M$. Therefore $(d(x)M)\alpha = \{(d(x)v)\alpha : v \in M\} = \{d(x)((v)\alpha) : v \in M\} \subseteq d(x)M'$, as $(v)\alpha \in M'$ for $v \in M$. Replacing $\alpha$ by $\alpha^{-1}$ gives $(d(x)M')\alpha^{-1} \subseteq d(x)M$. Applying $\alpha$ to this inclusion gives $d(x)M' \subseteq (d(x)M)\alpha$. Therefore $(d(x)M)\alpha = d(x)M'$, showing that the submodules $d(x)M$ and $d(x)M'$ correspond under $\alpha$ and so are isomorphic. We write $\alpha| : d(x)M \cong d(x)M'$, as the restriction $\alpha|$ is an isomorphism between $d(x)M$ and $d(x)M'$.

In the same way, for $v \in M_{(d(x))}$ we have $(v)\mu_{d(x)} = 0$ and so $0 = (v)\mu_{d(x)}\alpha = (v)\alpha\mu_{d(x)}$, showing $(v)\alpha \in M'_{(d(x))}$. We have shown $(M_{(d(x))})\alpha \subseteq M'_{(d(x))}$. Replacing $\alpha$ by $\alpha^{-1}$ gives $(M'_{(d(x))})\alpha^{-1} \subseteq M_{(d(x))}$, and so on applying $\alpha$ we obtain $M'_{(d(x))} \subseteq (M_{(d(x))})\alpha$. Therefore $(M_{(d(x))})\alpha = M'_{(d(x))}$, showing that the submodules $M_{(d(x))}$ and $M'_{(d(x))}$ correspond under $\alpha$ and so are isomorphic. As before we write $\alpha| : M_{(d(x))} \cong M'_{(d(x))}$. Now take $d(x) = d_1(x)$. Mimicking the proof of Theorem 3.7 step by step (Exercises 6.1, Question 8(c)) leads to the conclusion $s = s'$ and $d_i(x) = d'_i(x)$ for $1 \le i \le s$. □

The following corollary is a direct consequence of Theorem 6.6.

Corollary 6.7 (The uniqueness of the rational canonical form)

Let $A$ be a $t \times t$ matrix over a field $F$. The monic polynomials $d_1(x), d_2(x), \ldots, d_s(x)$ of positive degree over $F$ as in Theorem 6.5 are unique. Also $A$ is similar to a unique matrix in rational canonical form (Definition 6.4).


Proof

Suppose that, as well as the decomposition of $M(A)$ described in Theorem 6.5, we have $M(A) = \langle v'_1\rangle \oplus \langle v'_2\rangle \oplus \cdots \oplus \langle v'_{s'}\rangle$, where $v'_i \ne 0$ has order $d'_i(x)$ in $M(A)$ for $1 \le i \le s'$ and $d'_i(x) \mid d'_{i+1}(x)$ for $1 \le i < s'$. Now $\langle v'_i\rangle \cong F[x]/\langle d'_i(x)\rangle$ by Lemma 5.22 for $1 \le i \le s'$. Applying Theorem 6.6 with $M = M' = M(A)$ and $\alpha = \iota$, the identity mapping of $M$, gives $s = s'$ and $d_i(x) = d'_i(x)$ for $1 \le i \le s$. So the polynomials $d_1(x), d_2(x), \ldots, d_s(x)$ are unique.

Suppose $A \sim C = C(d_1(x)) \oplus C(d_2(x)) \oplus \cdots \oplus C(d_s(x))$ and $A \sim C' = C(d'_1(x)) \oplus C(d'_2(x)) \oplus \cdots \oplus C(d'_{s'}(x))$, where $C$ and $C'$ are in rcf (Definition 6.4). Then $C \sim C'$ and so $M(C) \cong M(C')$ by Theorem 5.13. Write $r(i) = 1 + \sum_{j<i}\deg d_j(x)$ for $1 \le i \le s$. Then $v_i = e_{r(i)}$ has order $d_i(x)$ in $M(C)$ for $1 \le i \le s$ by Theorem 5.26, and $M(C) = \langle v_1\rangle \oplus \langle v_2\rangle \oplus \cdots \oplus \langle v_s\rangle$. In the same way $M(C') = \langle v'_1\rangle \oplus \langle v'_2\rangle \oplus \cdots \oplus \langle v'_{s'}\rangle$, where $v'_i$ has order $d'_i(x)$ in $M(C')$ for $1 \le i \le s'$. Applying Theorem 6.6 with $M = M(C)$ and $M' = M(C')$ we conclude $s = s'$ and $d_i(x) = d'_i(x)$ for $1 \le i \le s$.

So $A$ is similar to a unique matrix in rcf. □

It is now legitimate to make the next definition.

Definition 6.8

Let $A$ be a $t \times t$ matrix over a field $F$. The sequence $(d_1(x), d_2(x), \ldots, d_s(x))$ of polynomials as in Theorem 6.5 is called the invariant factor sequence of $A$. The unique matrix $C(d_1(x)) \oplus C(d_2(x)) \oplus \cdots \oplus C(d_s(x))$ as in Definition 6.4 which is similar to $A$ is called the rational canonical form of $A$.

A number of important things are immediately apparent. Suppose we are presented with two $t \times t$ matrices $A$ and $A'$ over a field $F$. How can we determine whether or not $A$ and $A'$ are similar? The method should be almost second nature to the reader: reduce $xI - A$ and $xI - A'$ to their Smith normal forms $S(xI - A)$ and $S(xI - A')$ using elementary operations over $F[x]$. Then

$$A \sim A' \iff S(xI - A) = S(xI - A'),$$

that is, $A$ and $A'$ are similar if and only if the Smith normal forms of their characteristic matrices are equal. Also

$$S(xI - A) = \operatorname{diag}(1, 1, \ldots, 1, d_1(x), d_2(x), \ldots, d_s(x)),$$

that is, the invariant factors $d_i(x)$ ($1 \le i \le s$) of $A$ are the last $s$ diagonal entries in $S(xI - A)$, preceded by $t - s$ constant polynomials $1 = 1(x)$, where $s \le t$. So

$$A \text{ and } A' \text{ are similar} \iff \text{their invariant factor sequences are equal.}$$

Using Theorem 5.13 we see

$$M(A) \cong M(A') \iff A \text{ and } A' \text{ have equal invariant factor sequences.}$$
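Computing Smith normal forms by hand is laborious, so a machine cross-check can help. The sketch below takes a different, standard route to the same diagonal entries. If $D_k(x)$ denotes the monic gcd of all $k \times k$ minors of $xI - A$ (with $D_0(x) = 1$), then the $k$th diagonal entry of $S(xI - A)$ is $D_k(x)/D_{k-1}(x)$. This determinantal-divisor characterisation is not proved in this section, so treat it here as an assumption. The Python code (exact rational arithmetic; all names are ours) enumerates every minor and is practical only for small $t$.

```python
from fractions import Fraction
from itertools import combinations


def trim(p):
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p


def padd(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])


def pmul(p, q):
    out = [0] * max(len(p) + len(q) - 1, 0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def pdivmod(p, q):
    """Quotient and remainder of p(x) by a non-zero trimmed q(x), over Q."""
    r, quo = trim(p), [Fraction(0)] * max(len(p) - len(q) + 1, 1)
    while len(r) >= len(q):
        f, shift = r[-1] / q[-1], len(r) - len(q)
        quo[shift] = f
        for k, c in enumerate(q):
            r[shift + k] -= f * c
        r = trim(r)
    return trim(quo), r


def monic_gcd(p, q):
    p, q = trim(p), trim(q)
    while q:
        p, q = q, pdivmod(p, q)[1]
    return [c / p[-1] for c in p] if p else []


def pdet(M):
    """Determinant of a square polynomial matrix by Laplace expansion (small sizes only)."""
    if not M:
        return [Fraction(1)]
    total = []
    for j, entry in enumerate(M[0]):
        term = pmul(entry, pdet([row[:j] + row[j + 1:] for row in M[1:]]))
        total = padd(total, term if j % 2 == 0 else pmul([-1], term))
    return total


def invariant_factors(A):
    """Non-constant invariant factors of A, computed as d_k = D_k / D_(k-1)."""
    t = len(A)
    xIA = [[trim([-A[i][j], int(i == j)]) for j in range(t)] for i in range(t)]
    D_prev, factors = [Fraction(1)], []
    for k in range(1, t + 1):
        D_k = []
        for rows in combinations(range(t), k):
            for cols in combinations(range(t), k):
                D_k = monic_gcd(D_k, pdet([[xIA[r][c] for c in cols] for r in rows]))
        d_k, rem = pdivmod(D_k, D_prev)
        assert not rem, "D_(k-1) should divide D_k"
        if len(d_k) > 1:  # skip the constant diagonal entries 1
            factors.append(d_k)
        D_prev = D_k
    return factors


def similar(A, B):
    """Similarity test via equality of invariant factor sequences."""
    return invariant_factors(A) == invariant_factors(B)
```

For the matrix of Example 6.3, `invariant_factors` returns the coefficient lists of $x - 1$ and $(x - 1)(x - 2)$, and `similar` confirms that this matrix is similar to its rcf.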

As an illustration consider the ring $M_3(\mathbb{Z}_2)$ of $3 \times 3$ matrices $A$ over $\mathbb{Z}_2$. There are $2^9 = 512$ such matrices, there being 2 choices for each of the 9 entries in $A$. There are $2^3 = 8$ possibilities for the characteristic polynomial of $A$, as $\chi_A(x) = |xI - A| = x^3 + a_2x^2 + a_1x + a_0$, there being 2 choices for each of $a_0, a_1, a_2$. So there are 8 similarity classes of matrices $A$ with $M(A)$ cyclic by Corollary 5.27, such $A$ having a single invariant factor, namely $\chi_A(x)$, and $S(xI - A) = \operatorname{diag}(1, 1, \chi_A(x))$. We know $d_1(x)d_2(x)\cdots d_s(x) = \chi_A(x)$ by Theorem 6.5, that is, the invariant factors of $A$ are certain divisors of the characteristic polynomial of $A$. For $M(A)$ non-cyclic we have $s \ge 2$, and so $d_1(x)^2 \mid \chi_A(x)$ as $d_1(x) \mid d_2(x)$. As $2\deg d_1(x) \le \deg\chi_A(x) = 3$ we see $\deg d_1(x) = 1$, so $d_1(x)$ is $x$ or $x + 1$. For $s = 3$ the possible invariant factor sequences are $(x, x, x)$ and $(x + 1, x + 1, x + 1)$, arising from $A = 0$ and $A = I$ respectively. For $s = 2$ the polynomial $d_2(x)$ has degree 2 and is divisible by $d_1(x)$, so the possible invariant factor sequences are

$$(x, x^2), \quad (x + 1, (x + 1)^2), \quad (x, x(x + 1)), \quad (x + 1, x(x + 1)).$$

So in all there are 14 similarity classes of $3 \times 3$ matrices over $\mathbb{Z}_2$, of which 6 arise from invertible matrices $A$, that is, those satisfying $\chi_A(0) \ne 0$. So the group $GL_3(\mathbb{Z}_2)$ of order 168 partitions into 6 conjugacy classes (in $GL_t(F)$ the terms ‘similarity’ and ‘conjugacy’ are interchangeable). By Theorem 5.13 there are 14 isomorphism classes of $\mathbb{Z}_2[x]$-modules $M(A)$ for $A \in M_3(\mathbb{Z}_2)$.
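The counts 14 and 6 can also be confirmed by brute force, independently of the invariant-factor argument. The Python sketch below is our own and enumerates rather than using the theory of this chapter. It lists all 512 matrices over $\mathbb{Z}_2$, picks out the 168 invertible ones, and computes each similarity class $\{XAX^{-1} : X \in GL_3(\mathbb{Z}_2)\}$ directly.

```python
from itertools import product


def matmul2(P, Q):
    """Product of 3 x 3 matrices over Z_2 (entries 0/1), as nested tuples."""
    return tuple(tuple(sum(P[i][k] * Q[k][j] for k in range(3)) % 2 for j in range(3))
                 for i in range(3))


def det2(M):
    """Determinant over Z_2 (Python's % always returns 0 or 1 here)."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return (a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) % 2


ALL = [tuple(tuple(bits[3 * r:3 * r + 3]) for r in range(3))
       for bits in product((0, 1), repeat=9)]
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
GL = [M for M in ALL if det2(M) == 1]
INVERSE = {X: next(Y for Y in GL if matmul2(X, Y) == IDENTITY) for X in GL}

classes, seen = [], set()
for A in ALL:
    if A not in seen:
        orbit = {matmul2(matmul2(X, A), INVERSE[X]) for X in GL}  # similarity class of A
        seen |= orbit
        classes.append(orbit)

# Similar matrices share their determinant, so one representative decides invertibility.
invertible_classes = [c for c in classes if det2(next(iter(c))) == 1]
print(len(ALL), len(GL), len(classes), len(invertible_classes))
```

Running it should print `512 168 14 6`, in agreement with the count of invariant factor sequences above.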

We now introduce the polynomial analogue of the exponent (Definition 3.11) of a finite abelian group.

Definition 6.9

Let $A$ be a $t \times t$ matrix over a field $F$. The monic polynomial $\mu_A(x)$ of least degree over $F$ satisfying $\mu_A(A) = 0$ is called the minimum polynomial of $A$.

It is not clear at the outset that each $t \times t$ matrix $A$ over $F$ has a polynomial $\mu_A(x)$ as in Definition 6.9. However it is clear that the constant polynomial $1(x)$ cannot be the minimum polynomial of any $A$, since $1(A) = I$ by Definition 5.8 and $I \ne 0$ (the identity matrix cannot equal the zero matrix). So if $\mu_A(x)$ exists then $\deg\mu_A(x) \ge 1$. So the minimum polynomial of the zero $t \times t$ matrix $0$ over $F$ is $\mu_0(x) = x$, since $x$ evaluated at $0$ is the zero matrix and no other monic polynomial of degree 1 has this property. In the same way the minimum polynomial of the identity $t \times t$ matrix $I$ over $F$ is $\mu_I(x) = x - 1$, since $x - 1$ has degree 1 and evaluates at $I$ to $I - I = 0$. More generally $\mu_A(x) = x - a \iff A = aI$ for $a \in F$, that is, scalar matrices and only scalar matrices have minimum polynomials of degree 1. Does the $3 \times 3$ matrix

$$A = C(x) \oplus C(x^2) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$$

over $\mathbb{Q}$ have a minimum polynomial? As $A$ is not a scalar matrix, no monic polynomial of degree 1 annihilates $A$. A likely candidate is $x^2$, as $A^2 = 0$, but having Definition 6.9 in mind we must ask: could there be another monic polynomial $f(x)$ of degree 2 over $\mathbb{Q}$ with $f(A) = 0$? If so, then $r(x) = f(x) - x^2 = ax + b$ satisfies $r(A) = 0$ where $a, b \in \mathbb{Q}$, that is, $aA + bI = 0$. But

$$aA + bI = \begin{pmatrix} b & 0 & 0 \\ 0 & b & a \\ 0 & 0 & b \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

gives $a = b = 0$. Therefore $f(x) = x^2$ and we conclude $\mu_A(x) = x^2$ from Definition 6.9. Now $(x, x^2)$ is the invariant factor sequence of the above $3 \times 3$ matrix $A$, and so, in this case, the minimum polynomial and the largest (and last) invariant factor coincide. Our next corollary (analogous to part one of Corollary 3.12) shows that this is always true. It follows that each square matrix over a field does have a unique minimum polynomial as described in Definition 6.9.
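The argument above finds $\mu_A(x)$ by ruling out every monic polynomial of smaller degree. The same idea can be mechanised: $\mu_A(x)$ has degree $k$ exactly when $A^k$ is the first power that is a linear combination of $I, A, \ldots, A^{k-1}$. The Python sketch below is our illustration (the function name is hypothetical). It works over $\mathbb{Q}$ with exact `Fraction` arithmetic and detects that first dependence by Gaussian elimination on the flattened powers, keeping track of how each reduced vector is expressed in terms of $I, A, A^2, \ldots$.

```python
from fractions import Fraction


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def minimum_polynomial(A):
    """Coefficients [c_0, ..., c_k] (constant first, c_k = 1) of mu_A(x) over Q."""
    t = len(A)
    power = [[Fraction(int(i == j)) for j in range(t)] for i in range(t)]  # A^0 = I
    basis = []  # triples (pivot index, reduced vector, its expression in powers of A)
    k = 0
    while True:
        vec = [v for row in power for v in row]
        combo = [Fraction(0)] * k + [Fraction(1)]  # vec currently equals A^k
        for pivot, bvec, bcombo in basis:
            f = vec[pivot]
            if f != 0:
                vec = [a - f * b for a, b in zip(vec, bvec)]
                combo = [c - f * (bcombo[i] if i < len(bcombo) else 0)
                         for i, c in enumerate(combo)]
        if not any(vec):
            return combo  # sum_j combo[j] A^j = 0 and combo[k] = 1
        pivot = next(i for i, v in enumerate(vec) if v != 0)
        s = vec[pivot]
        basis.append((pivot, [v / s for v in vec], [c / s for c in combo]))
        power = matmul(power, A)
        k += 1
```

For $A = C(x) \oplus C(x^2)$ above, `minimum_polynomial` returns the coefficients of $x^2$; for the matrix of Example 6.3 it returns those of $(x - 1)(x - 2)$, the last invariant factor found there.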

Corollary 6.10

Let $A$ be a $t \times t$ matrix over a field $F$. Let $(d_1(x), d_2(x), \ldots, d_s(x))$ denote the invariant factor sequence of $A$. Let $K_A = \{f(x) \in F[x] : f(A) = 0\}$. Then $K_A = \langle d_s(x)\rangle$, that is, $K_A$ is the principal ideal of $F[x]$ with generator $d_s(x)$. Also $\mu_A(x) = d_s(x)$, and similar matrices have equal minimum polynomials.

Proof

Notice first that $K_A = \ker\varepsilon_A$, where $\varepsilon_A : F[x] \to M_t(F)$ is the evaluation at $A$ ring homomorphism (Definition 5.8) given by $(f(x))\varepsilon_A = f(A)$ for all $f(x) \in F[x]$. Therefore $K_A$ is an ideal of $F[x]$ by Exercises 2.3, Question 3(b), and is called the annihilator ideal of $A$. Our first task is to show $d_s(x) \in K_A$, which we do using the theory of Chapter 5. Write $C = C(d_1(x)) \oplus C(d_2(x)) \oplus \cdots \oplus C(d_s(x))$ for the rcf (Definition 6.8) of $A$. By Theorem 6.5 there is an invertible $t \times t$ matrix $X$ over $F$ with $XAX^{-1} = C$. By Lemma 5.12, with $C$ in place of $B$, we have $Xf(A)X^{-1} = f(C)$ for all $f(x) \in F[x]$. Therefore $f(A) = 0 \iff f(C) = 0$, which shows $K_A = K_C$; in fact similar matrices have equal annihilator ideals. From the discussion of matrix direct sum after Definition 5.17 and Exercises 5.1, Question 2(d) we obtain

$$f(C) = f(C(d_1(x))) \oplus f(C(d_2(x))) \oplus \cdots \oplus f(C(d_s(x))) \quad\text{for all } f(x) \in F[x] \qquad (\spadesuit)$$

which expresses $f(C)$ as a direct sum of matrices $f(C(d_i(x)))$ for $1 \le i \le s$. By Theorem 5.26 we know $d_i(C(d_i(x))) = 0$ for $1 \le i \le s$. However $d_i(x) \mid d_s(x)$, and so $d_i(C(d_i(x)))$ is a factor of $d_s(C(d_i(x)))$ for $1 \le i \le s$, using the evaluation homomorphism $\varepsilon_{C(d_i(x))}$ of Definition 5.8. So $d_s(C(d_i(x))) = 0$ for $1 \le i \le s$, and hence $d_s(C) = 0$, as each of the $s$ matrices in ($\spadesuit$), with $f(x) = d_s(x)$, is zero. So $d_s(x) \in K_A$.

We now show that $d_s(x)$ generates the ideal $K_A$. Consider $f(x) \in K_A$. Then $f(A) = 0$ and so $f(C) = Xf(A)X^{-1} = X0X^{-1} = 0$. From ($\spadesuit$) we deduce $f(C(d_s(x))) = 0$, as a direct sum of matrices is zero if and only if each summand (each individual matrix in the direct sum) is zero. The element $e_1$ has order $d_s(x)$ in the $F[x]$-module $M = M(C(d_s(x)))$ by Theorem 5.26. As $f(x)e_1 = e_1f(C(d_s(x))) = e_10 = 0$ in $M$, we deduce $d_s(x) \mid f(x)$ from Definition 5.11. From the discussion following Definition 4.3 we conclude $K_A = \langle d_s(x)\rangle$.

Now $d_s(x)$ is the unique monic generator of $K_A$ by Theorem 4.4. Let $f(x)$ in $K_A$ be monic. Then $d_s(x) \mid f(x)$ implies $\deg d_s(x) \le \deg f(x)$, and $\deg d_s(x) = \deg f(x)$ implies $d_s(x) = f(x)$. Therefore $d_s(x)$ is the monic polynomial of least degree in $K_A$, that is, $d_s(x) = \mu_A(x)$ by Definition 6.9. By Corollary 6.7 similar matrices have equal sequences of invariant factors and so, on comparing the last polynomials in these sequences, their minimum polynomials are also equal. □

The minimum polynomial $\mu_A(x)$ is a useful similarity invariant. In the $2 \times 2$ case it alone ‘does the job’: the $2 \times 2$ matrices $A$ and $B$ over a field $F$ are similar if and only if $\mu_A(x) = \mu_B(x)$ (see Exercises 6.1, Question 6(a)).

The relationship between $\mu_A(x)$ and the characteristic polynomial $\chi_A(x)$ is the subject of the next corollary, which is analogous to the last part of Corollary 3.12.

Corollary 6.11

Let $\mu_A(x)$ and $\chi_A(x)$ be the minimum and characteristic polynomials of the $t \times t$ matrix $A$ over the field $F$. Then $\mu_A(x) \mid \chi_A(x)$ and

$$\chi_A(A) = 0 \quad\text{(the Cayley–Hamilton theorem).}$$

Also $\chi_A(x) \mid (\mu_A(x))^t$. The polynomials $\chi_A(x)$ and $\mu_A(x)$ have the same irreducible factors over $F$.


Proof

As usual let $(d_1(x), d_2(x), \ldots, d_s(x))$ denote the invariant factor sequence of $A$. Then $\chi_A(x) = \det(xI - A) = d_1(x)d_2(x)\cdots d_s(x)$ by Theorem 6.5. As $\mu_A(x) = d_s(x)$ by Corollary 6.10, we see $\chi_A(x) = d_1(x)d_2(x)\cdots d_{s-1}(x)\mu_A(x)$, showing $\mu_A(x) \mid \chi_A(x)$. Evaluation (Definition 5.8) at $A$ gives $\chi_A(A) = d_1(A)d_2(A)\cdots d_{s-1}(A)\mu_A(A) = 0$, since $\mu_A(A) = 0$, which establishes the Cayley–Hamilton theorem for square matrices over a field.

As $d_i(x) \mid d_{i+1}(x)$, there is a monic polynomial $q_i(x)$ over $F$ with $d_i(x)q_i(x) = d_{i+1}(x)$ for $1 \le i < s$. Hence $d_i(x) \mid d_s(x)$ for $1 \le i \le s$, as

$$d_i(x)q_i(x)q_{i+1}(x)\cdots q_{s-1}(x) = d_s(x)$$

(for $i = s$ the product of the $q_j(x)$ is empty). Multiplying these $s$ equations together and substituting $\chi_A(x) = d_1(x)d_2(x)\cdots d_s(x)$, $\mu_A(x) = d_s(x)$ gives $\chi_A(x)q_1(x)q_2(x)^2\cdots q_{s-1}(x)^{s-1} = (\mu_A(x))^s$, which shows $\chi_A(x) \mid (\mu_A(x))^s$. As $s \le t$ we also obtain $\chi_A(x) \mid (\mu_A(x))^t$.

Let $p(x)$ be irreducible (Definition 4.7) over $F$. Suppose $p(x) \mid \chi_A(x)$. From $\chi_A(x) \mid (\mu_A(x))^t$ we deduce $p(x) \mid (\mu_A(x))^t$. Hence $p(x) \mid \mu_A(x)$, as an irreducible divisor of a product of polynomials must be a divisor of at least one of the polynomials. Conversely suppose $p(x) \mid \mu_A(x)$. From $\mu_A(x) \mid \chi_A(x)$ we deduce directly $p(x) \mid \chi_A(x)$. Therefore $\chi_A(x)$ and $\mu_A(x)$ have the same irreducible factors over $F$. □
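Corollary 6.11 can be spot-checked numerically. The sketch below (Python, exact rationals; an illustration with names of our choosing) computes $\chi_A(x)$ by the Faddeev–LeVerrier recursion. This is a standard technique, but it is not used in this book. The code then verifies $\chi_A(A) = 0$ and $\mu_A(x) \mid \chi_A(x)$ for the matrix of Example 6.3, taking $\mu_A(x) = (x - 1)(x - 2)$ from that example.

```python
from fractions import Fraction


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def char_poly(A):
    """[c_0, ..., c_t] with chi_A(x) = det(xI - A), by the Faddeev-LeVerrier recursion."""
    t = len(A)
    c = [Fraction(0)] * t + [Fraction(1)]
    M = [[Fraction(0)] * t for _ in range(t)]
    for k in range(1, t + 1):
        M = matmul(A, M)
        for i in range(t):
            M[i][i] += c[t - k + 1]          # M_k = A M_(k-1) + c_(t-k+1) I
        AM = matmul(A, M)
        c[t - k] = -sum(AM[i][i] for i in range(t)) / k
    return c


def poly_at_matrix(f, A):
    """f(A) by Horner's rule."""
    t = len(A)
    R = [[Fraction(0)] * t for _ in range(t)]
    for a in reversed(f):
        R = matmul(R, A)
        for i in range(t):
            R[i][i] += a
    return R


def remainder(q, p):
    """Remainder of q(x) on division by p(x), assuming p[-1] != 0."""
    r = [Fraction(v) for v in q]
    while len(r) >= len(p):
        factor, shift = r[-1] / p[-1], len(r) - len(p)
        for k, v in enumerate(p):
            r[shift + k] -= factor * v
        r.pop()  # the leading coefficient is now zero
    return r


A = [[1, 1, 1], [0, 2, 1], [0, 0, 1]]
chi = char_poly(A)                                    # (x - 1)^2 (x - 2)
is_zero = all(v == 0 for row in poly_at_matrix(chi, A) for v in row)
mu_divides_chi = not any(remainder(chi, [2, -3, 1]))  # mu_A = (x - 1)(x - 2)
```

Here `chi` holds the coefficients of $x^3 - 4x^2 + 5x - 2$, and both checks come out `True`. Because the arithmetic is exact, the same code can be applied to any integer matrix to watch the Cayley–Hamilton theorem hold.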

In Section 6.2 we assume that the factorisation

$$\chi_A(x) = p_1(x)^{n_1}p_2(x)^{n_2}\cdots p_k(x)^{n_k}$$

of $\chi_A(x)$ into positive powers $n_j$ of distinct monic irreducible polynomials $p_j(x)$ over $F$ is known. By Corollary 6.11 the factorisation of $\mu_A(x)$ involves positive (but no larger) powers of the same monic irreducible polynomials, that is,

$$\mu_A(x) = p_1(x)^{n'_1}p_2(x)^{n'_2}\cdots p_k(x)^{n'_k} \quad\text{where } 1 \le n'_j \le n_j \text{ for } 1 \le j \le k.$$

One further comment: the Cayley–Hamilton theorem holds for square matrices over commutative rings (see P.M. Cohn: Algebra, Volume 1, Wiley (1974)). However the above proof does not ‘work’ in this general setting, as it depends on the existence of the rcf (Definition 6.4).

EXERCISES 6.1

1. (a) For each of the following $3 \times 3$ matrices $A$ over the rational field $\mathbb{Q}$, reduce its characteristic matrix $xI - A$ to Smith normal form $S(xI - A)$, noting the eros and ecos used in the reduction. State the invariant factor sequence of $A$ and find invertible $3 \times 3$ matrices $P(x)$ and $Q(x)$ over $\mathbb{Q}[x]$ satisfying $P(x)(xI - A) = S(xI - A)Q(x)$. Specify an invertible $3 \times 3$ matrix $X$ over $\mathbb{Q}$ such that $XAX^{-1}$ is in rational canonical form.

$$\text{(i)}\ \begin{pmatrix} 1 & -1 & 1 \\ -1 & 1 & -1 \\ -2 & 2 & -2 \end{pmatrix}; \qquad \text{(ii)}\ \begin{pmatrix} 1 & 1 & -1 \\ -1 & -1 & 1 \\ 2 & 2 & -2 \end{pmatrix}; \qquad \text{(iii)}\ \begin{pmatrix} 1 & 1 & -1 \\ 1 & 0 & -1 \\ 5 & 3 & -4 \end{pmatrix}.$$

(b) The $t \times t$ matrix $A$ over a field $F$ has sequence

$$(d_1(x), d_2(x), \ldots, d_s(x))$$

of invariant factors ($t \ge 2$). Use the rcf of $A$ to show $\operatorname{rank} A \ge t - s$. Show that $\operatorname{rank} A = 1$ implies $s = t - 1$, $d_j(x) = x$ for $1 \le j \le t - 2$ and $d_{t-1}(x) = x(x - \operatorname{trace} A)$.

(c) Let $A$ be a $t \times t$ matrix over a field $F$ and let $P(x)$, $Q(x)$ be invertible $t \times t$ matrices over $F[x]$ satisfying $P(x)(xI - A) = S(xI - A)Q(x)$. Show $\det P(x) = \det Q(x)$.

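2. (a) The $t \times t$ matrix $A$ over a field $F$ has sequence

$$(d_1(x), d_2(x), \ldots, d_s(x))$$

of invariant factors ($t \ge 2$). For $\lambda \in F$ show that $A - \lambda I$ has sequence of invariant factors $(d_1(x + \lambda), d_2(x + \lambda), \ldots, d_s(x + \lambda))$.
Hint: Replace $x$ by $x + \lambda$ in

$$P(x)(xI - A) = \operatorname{diag}(1, 1, \ldots, 1, d_1(x), d_2(x), \ldots, d_s(x))Q(x).$$

(b) Find the invariant factor sequences of the following matrices over $\mathbb{Q}$:

$$\text{(i)}\ \begin{pmatrix} 2 & -1 & 1 \\ -1 & 2 & -1 \\ -2 & 2 & -1 \end{pmatrix}; \qquad \text{(ii)}\ \begin{pmatrix} 3 & 1 & -1 \\ -1 & 1 & 1 \\ 2 & 2 & 0 \end{pmatrix}; \qquad \text{(iii)}\ \begin{pmatrix} 6 & 1 & -1 \\ 1 & 5 & -1 \\ 5 & 3 & 1 \end{pmatrix}.$$

Hint: Use the answer to Question 1(a) above.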


(c) Let $d(x)$ be a monic polynomial of positive degree over a field $F$. Show $\operatorname{nullity} C(d(x)) \le 1$. Show also that $\operatorname{nullity} C(d(x)) = 1$ if and only if $d(0) = 0$. (Remember $\operatorname{nullity} A = \dim\{v \in F^t : vA = 0\}$ where $A$ is a $t \times t$ matrix over $F$.) Let the $t \times t$ matrix $A$ have invariant factors $d_1(x), d_2(x), \ldots, d_s(x)$. Write $\operatorname{nullity} A = n$. Show $n \le s$. Show also that $\operatorname{nullity} A = n$ if and only if $x \mid d_i(x)$ for $s - n < i \le s$.

(d) Write $C_0 = C(x^4)$ over an arbitrary field $F$. Determine the invariant factors of $C_0^2$, $C_0^3$ and $C_0^4$. Write $C_1 = C((x^2 + 1)^2)$ over $F$. Determine the invariant factors of $C_1^2 + I$. Are $C_0^2$ and $C_1^2 + I$ similar? Specify invertible $4 \times 4$ matrices $X_0$ and $X_1$ over $F$ such that $X_0C_0^2X_0^{-1}$ is in rcf and $X_1(C_1^2 + I)X_1^{-1}$ is in rcf.
Write $C = C(d(x)^n)$ where $d(x)$ is a monic polynomial of positive degree $m$ over $F$ and $n$ is a positive integer. Use a basis of $F^{mn}$ consisting of vectors, suitably ordered, of the type $x^id(x)^je_1$ in the $F[x]$-module $M(C)$ for $0 \le i < m$, $0 \le j < n$, to construct an invertible matrix $X$ over $F$ such that $Xd(C)X^{-1}$ is in rcf. Are all the invariant factors of $d(C)$ equal to each other? Is $d(C)$ similar to $C(x^{mn})^m$?

(e) Write $C = C(x^n)$ over an arbitrary field $F$, where $n$ is a positive integer. Let $m$ be an integer with $1 \le m \le n$ and suppose $n = qm + r$ with $0 \le r < m$. Show that the invariant factors of $C^m$ are $x^q$ ($m - r$ times) and $x^{q+1}$ ($r$ times).
Hint: Transform $C^m$ into rcf by using, suitably ordered, the basis vectors $x^{jm+i}e_1$ in the module $M(C)$ for $0 \le jm + i < n$, where $0 \le i < m$, $0 \le j \le q$.

(f) Write $C = C(x^4 + x^2 + 1)$ over an arbitrary field $F$. Find an invertible $4 \times 4$ matrix $X$ over $F$ such that

$$XC^2X^{-1} = C(x^2 + x + 1) \oplus C(x^2 + x + 1).$$

More generally let $d(x)$ be a monic polynomial of positive degree $t$ over a field $F$. Write $C = C(d(x^2))$. Find an invertible $2t \times 2t$ matrix $X$ over $F$ satisfying

$$XC^2X^{-1} = C(d(x)) \oplus C(d(x)).$$

What is the rcf of $C(d(x^4))^4$?
Hint: Use the module $M(C^2)$ to find $X$.

3. (a) Let $A$ be a $t \times t$ matrix over a field $F$ where $\chi(F) \ne 2$ ($-1 \ne 1$ in $F$). Suppose $v$ has order $f(x)$ in the $F[x]$-module $M(A)$. Show that $(-1)^{\deg f(x)}f(-x)$ is the order of $v$ in the $F[x]$-module $M(-A)$.
Suppose that $(d_1(x), d_2(x), \ldots, d_s(x))$ is the invariant factor sequence of $A$. Describe the invariant factor sequence of $-A$ in terms of the polynomials $d_j(x)$ ($1 \le j \le s$). State a necessary and sufficient condition on the $d_j(x)$ ($1 \le j \le s$) for $-A \sim A$.

(b) Decide whether or not $-A \sim A$ in the case of

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 1 \\ 1 & -1 & -2 \end{pmatrix}$$

over $\mathbb{Q}$.

(c) Let $A$ be a $3 \times 3$ matrix over a field $F$ of characteristic not 2. Is the following statement true? $-A \sim A \iff |A| = \operatorname{trace} A = 0$. (Either find a proof or produce a counter-example.)

4. (a) Let A be a t × t matrix over a field F . By transposing the equationP(x)(xI − A) = S(xI − A)Q(x) show that A and AT have the samesequence of invariant factors and deduce A ∼ AT .

(b) Let $f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + x^4$ be a monic polynomial over a field $F$ and write

$$R_f = \begin{pmatrix} a_1 & a_2 & a_3 & 1 \\ a_2 & a_3 & 1 & 0 \\ a_3 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}.$$

Show that $R_f C(f(x))$ is symmetric and deduce $R_f C(f(x)) R_f^{-1} = C(f(x))^T$. More generally let $f(x)$ be a monic polynomial of positive degree $t$ over $F$. Specify a symmetric invertible $t \times t$ matrix $R_f$ satisfying $R_f C(f(x)) R_f^{-1} = C(f(x))^T$.

(c) Write $C = C(d_1(x)) \oplus C(d_2(x))$ where $d_1(x) = x^2 + a_1 x + a_0$ and $d_2(x) = x^2 + b_1 x + b_0$ are polynomials over $F$. Using the notation of (b) above, show that $R = R_{d_1} \oplus R_{d_2}$ is a symmetric invertible $4 \times 4$ matrix over $F$ satisfying $RCR^{-1} = C^T$. More generally suppose the $t \times t$ matrix $C$ over $F$ to be in rcf. Specify a symmetric invertible $t \times t$ matrix $R$ with $RCR^{-1} = C^T$.

(d) Let $A$ be an arbitrary $t \times t$ matrix over $F$. Show that there is a symmetric invertible $t \times t$ matrix $Y$ such that $YAY^{-1} = A^T$.
Hint: Consider $Y = X^T R X$ where $XAX^{-1} = C$ is in rcf.

(e) Find $Y$ as in (d) above in the case of the $3 \times 3$ matrix $A$ of Question 1(i) above.

(f) Let $F$ be a field and let $U$ be a subspace of $F^t$. Write $s = \dim U$ and

$$U^o = \{v \in F^t : u v^T = 0 \text{ for all } u \in U\}.$$

Let $A$ be a $t \times t$ matrix over $F$ and let $Y$, as in (d) above, be a symmetric and invertible $t \times t$ matrix over $F$ with $YAY^{-1} = A^T$. Let $\gamma : F^t \cong F^t$ be the isomorphism determined (Definition 5.8) by $Y$. Let $N$ be a submodule of $M(A)$. Show that $N^o$ is a submodule of $M(A^T)$ and $(N^o)\gamma$ is a submodule of $M(A)$. Write $(N)\pi = (N^o)\gamma$ and denote by $L(M(A))$ the set of all submodules of $M(A)$. Show that $\pi : L(M(A)) \to L(M(A))$ is a polarity, that is,
(i) $(N)\pi^2 = N$ for all $N \in L(M(A))$,
(ii) $N_1 \subseteq N_2 \iff (N_2)\pi \subseteq (N_1)\pi$ where $N_1, N_2 \in L(M(A))$
($\pi$ is inclusion-reversing).
Hint: For (i) show $(N)\gamma^{-1} \subseteq ((N^o)\gamma)^o$ (see Exercises 3.1, Question 6(b)).

5. (a) Let $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{t-1} x^{t-1} + x^t$ be a monic polynomial of positive degree $t$ over a field $F$ with $a_0 \ne 0$. Write

$$f(x)^* = (x^t/a_0) f(1/x) = 1/a_0 + (a_{t-1}/a_0) x + (a_{t-2}/a_0) x^2 + \cdots + (a_1/a_0) x^{t-1} + x^t.$$

Let $g(x)$ be a monic polynomial of positive degree $s$ over $F$ with $g(0) \ne 0$. Show $(f(x)g(x))^* = f(x)^* g(x)^*$ and $f(x)^{**} = f(x)$. The polynomial $f(x)$ as above is called palindromic if $f(x) = f(x)^*$. Show that the product of palindromic polynomials is itself palindromic and $g(x)g(x)^*$ is palindromic. Let $f(x)$ as above be palindromic. Show that $f(x)$ satisfies $f(0) = a_0 = \pm 1$. Hence show either $a_i = a_{t-i}$ for $0 \le i \le t$ or $a_i = -a_{t-i}$ for $0 \le i \le t$ (where $a_t = 1$). List the palindromic polynomials of degrees 1, 2 and 3 over $\mathbb{Z}_3$.

(b) Let $f(x) = a_0 + a_1 x + a_2 x^2 + x^3$ over a field $F$ where $a_0 \ne 0$. Calculate (the entries in) $C(f(x))^{-1}$ and hence find an invertible $3 \times 3$ matrix $X$ over $F$ with $X C(f(x))^{-1} = C(f(x)^*) X$.
Hint: Show $e_3$ generates the $F[x]$-module $M(C(f(x))^{-1})$ and has order $f(x)^*$ in this module.

(c) Let $f(x)$ be a monic polynomial of positive degree $t$ over the field $F$ with $f(0) \ne 0$. Generalise (b) above to show that the invertible $t \times t$ matrix $X$ with $e_i X = e_{t+1-i}$ for $1 \le i \le t$ satisfies $X C(f(x))^{-1} = C(f(x)^*) X$.

(d) Let $A$ be an invertible $t \times t$ matrix over $F$ with rcf

$$C(d_1(x)) \oplus C(d_2(x)) \oplus \cdots \oplus C(d_s(x)).$$

Show $\chi_A(0) \ne 0$ and deduce $d_j(0) \ne 0$ for $1 \le j \le s$. Show also

$$A^{-1} \sim C(d_1(x))^{-1} \oplus C(d_2(x))^{-1} \oplus \cdots \oplus C(d_s(x))^{-1}.$$

Deduce from (c) above that $A^{-1}$ has invariant factor sequence $(d_1(x)^*, d_2(x)^*, \ldots, d_s(x)^*)$. Hence show $A \sim A^{-1}$ if and only if $d_j(x)$ is palindromic for $1 \le j \le s$.

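(e) List the 12 invariant factor sequences of $3 \times 3$ matrices $A$ over $\mathbb{Z}_3$ with $A \sim A^{-1}$.

(f) Working over $\mathbb{Q}$ calculate $\chi_A(x)$ and decide whether or not $A \sim A^{-1}$ in the cases

(i) $A = \begin{pmatrix} 1 & 2 & 5 \\ 2 & 2 & 7 \\ -1 & -1 & -3 \end{pmatrix}$; (ii) $A = \begin{pmatrix} 2 & 1 & -1 \\ -1 & 0 & 1 \\ 2 & 2 & -1 \end{pmatrix}$.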

6. (a) Let $A$ and $B$ be $2 \times 2$ matrices over a field $F$. Show that $\mu_A(x) = \mu_B(x)$ implies $A \sim B$.
Hint: Consider the cases $\deg \mu_A(x) = 1, 2$ separately.
Construct an example of $3 \times 3$ matrices $A$ and $B$ over $\mathbb{Q}$ with $\mu_A(x) = \mu_B(x)$ but $A$ not similar to $B$. Find a formula (in terms of $q$) for the number of similarity classes of $2 \times 2$ matrices over the finite field $\mathbb{F}_q$ having $q$ elements.

(b) True or false? The $3 \times 3$ matrices $A$ and $B$ over the field $F$ are similar if and only if $\chi_A(x) = \chi_B(x)$ and $\mu_A(x) = \mu_B(x)$.

(c) Let $A$ be a $t \times t$ matrix over a field $F$. Let $\varepsilon_A : F[x] \to M_t(F)$ denote the evaluation at $A$ ring homomorphism (Definition 5.8). Show $\varepsilon_A : F[x]/\langle \mu_A(x) \rangle \cong \operatorname{im} \varepsilon_A$.
Hint: Use Exercises 2.3, Question 3(b).

(d) The $t \times t$ matrix $A$ over the field $F$ is such that $I, A, A^2, \ldots, A^{n-1}$ are linearly independent but $I, A, A^2, \ldots, A^{n-1}, A^n$ are linearly dependent. Show $A^n = a_0 I + a_1 A + a_2 A^2 + \cdots + a_{n-1} A^{n-1}$ where $\mu_A(x) = x^n - a_{n-1} x^{n-1} - \cdots - a_1 x - a_0$ is the minimum polynomial of $A$. Calculate $A^2$ in the case of

$$A = \begin{pmatrix} 2 & 1 & 1 \\ -2 & -1 & -2 \\ 1 & 1 & 2 \end{pmatrix}$$

over $\mathbb{Q}$. Hence find $\mu_A(x)$ and the invariant factors of $A$.

(e) Let $A_i$ be a $t_i \times t_i$ matrix over a field $F$ with minimum polynomial $\mu_{A_i}(x)$ for $i = 1, 2$. Show directly from Definition 6.9 that the minimum polynomial of $A_1 \oplus A_2$ is $\operatorname{lcm}\{\mu_{A_1}(x), \mu_{A_2}(x)\}$.

7. (a) By describing the possible invariant factor sequences show that there are $q^3 + q^2 + q$ similarity classes of $3 \times 3$ matrices over the finite field $\mathbb{F}_q$. Determine the number of similarity classes of $4 \times 4$ matrices over $\mathbb{F}_q$ (it isn't quite $q^4 + q^3 + q^2 + q$).

Show that the number of similarity classes of $n \times n$ matrices $A$ over $\mathbb{F}_q$ with minimum polynomial of degree $m$ is $P(n,m)\,q^m$ where $P(n,m)$ is the number of partitions (Definition 3.13) of $n$ having largest part $m$. Denote by $P(n,m,l)$ the number of partitions of $n$ with last part $m$ having $l$ distinct parts. Find a formula involving $P(n,m,l)$ for the number of conjugacy classes in $GL_n(\mathbb{F}_q)$. Determine $P(8,3)$, $P(8,3,1)$, $P(8,3,2)$, $P(8,3,3)$ and hence find the number of conjugacy classes in $GL_8(\mathbb{Z}_5)$ having minimum polynomial of degree 3.
Hint: Consider the partition $(t_1, t_2, \ldots, t_s)$ where $(d_1(x), d_2(x), \ldots, d_s(x))$ is the sequence of invariant factors of $A$ and $t_j = \deg d_j(x)$ for $1 \le j \le s$.

(b) Determine the number of similarity classes of $t \times t$ matrices $A$ over a given field $F$ satisfying $A^2 = 0$. How many similarity classes of $t \times t$ matrices $A$ over $\mathbb{Z}_2$ satisfying $A^2 = I$, $A \ne I$ are there?
Hint: $\mu_A(x)$ is a divisor of $(x - 1)^2$. Does the answer change if $\mathbb{Z}_2$ is replaced by an arbitrary field $F$ of characteristic 2? How many conjugacy classes of involutions (elements of multiplicative order 2) are there in the group $GL_t(F)$ where $\chi(F) = 2$?

(c) Let $F$ be a field and let $N(t)$ denote the number of similarity classes of $t \times t$ matrices $A$ over $F$ satisfying $A^3 = 0$. Show $N(t) = \lfloor (t+2)/2 \rfloor + N(t-3)$ for $t > 3$.
Hint: There are $N(t-3)$ similarity classes of $t \times t$ matrices $A$ over $F$ with $\mu_A(x) = x^3$. Calculate $N(t)$ for $1 \le t \le 10$.
Write

$$N'(t) = \tfrac{1}{2}\left(\tfrac{t}{2} + 1\right)\left(\tfrac{t}{2} + 2\right) - \left(\lfloor t/6 \rfloor + 1\right)\left(\tfrac{t}{2} - \tfrac{3}{2}\lfloor t/6 \rfloor\right)$$

for even $t$ and

$$N'(t) = \tfrac{1}{2}\left(\lfloor t/2 \rfloor + 1\right)\left(\lfloor t/2 \rfloor + 2\right) - \left(\lfloor (t-3)/6 \rfloor + 1\right)\left(\tfrac{t-3}{2} - \tfrac{3}{2}\lfloor (t-3)/6 \rfloor\right)$$

for $t$ odd. Verify the formula $N'(t) = \lfloor (t+2)/2 \rfloor + N'(t-3)$ for $t > 3$.
Hint: Treat the cases $t$ odd and $t$ even separately.
Using induction on $t$ deduce $N'(t) = N(t)$ for $t \ge 1$. Find the number of conjugacy classes of elements of order 3 in the group $GL_{100}(\mathbb{Z}_3)$.

8. (a) Let $F$ be a field and $t$ a positive integer. Show that each submodule $K$ of $F[x]^t$ is free with $\operatorname{rank} K \le t$.
Hint: Generalise Theorem 3.1.

(b) The evaluation mapping $\theta_A : F[x]^t \to M(A)$ is defined by $(f_1(x), f_2(x), \ldots, f_t(x))\theta_A = \sum_{i=1}^{t} f_i(x) e_i$ for all $t$-tuples of polynomials $(f_1(x), f_2(x), \ldots, f_t(x)) \in F[x]^t$. Show that $\theta_A$ is $F[x]$-linear.

(c) Complete the proof of Theorem 6.6 by referring back to Theorem 3.7.
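The statement of Question 5(c) can be sanity-checked numerically before a proof is attempted. The Python sketch below works over $\mathbb{Q}$ with exact `fractions.Fraction` arithmetic and stores matrices as lists of rows, matching the row-vector convention $v \mapsto vA$ used throughout. It assumes the companion matrix convention visible in the displays of $C(x^2+1)$ and $J(f(x),2)$ in Section 6.2 (superdiagonal entries 1, last row $(-a_0, \ldots, -a_{t-1})$). All function names are illustrative, and agreement on a few polynomials is evidence, not a proof.

```python
from fractions import Fraction

def companion(lower):
    """C(f) for monic f = a0 + a1 x + ... + a_{t-1} x^{t-1} + x^t, with lower = [a0, ..., a_{t-1}].
    Rows follow the text's convention: e_i C = e_{i+1} for i < t, last row (-a0, ..., -a_{t-1})."""
    t = len(lower)
    C = [[Fraction(0)] * t for _ in range(t)]
    for i in range(t - 1):
        C[i][i + 1] = Fraction(1)
    C[t - 1] = [-Fraction(a) for a in lower]
    return C

def matmul(P, Q):
    """Product of matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def inverse(M):
    """Gauss-Jordan inverse over Q; raises StopIteration if M is singular."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                m = aug[r][col]
                aug[r] = [x - m * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def reciprocal(lower):
    """Lower coefficients of f(x)* = (x^t / a0) f(1/x); requires a0 != 0."""
    a0 = Fraction(lower[0])
    rev = [Fraction(1)] + [Fraction(a) for a in reversed(lower)]  # 1, a_{t-1}, ..., a1, a0
    return [c / a0 for c in rev[:-1]]                            # the x^t coefficient is 1

def reversal(t):
    """The matrix X with e_i X = e_{t+1-i} for 1 <= i <= t."""
    return [[Fraction(int(j == t - 1 - i)) for j in range(t)] for i in range(t)]

def question_5c_holds(lower):
    """True when X C(f(x))^{-1} == C(f(x)*) X for the reversal matrix X."""
    X = reversal(len(lower))
    return matmul(X, inverse(companion(lower))) == matmul(companion(reciprocal(lower)), X)

print(question_5c_holds([2, -3, 5]))   # f(x) = 2 - 3x + 5x^2 + x^3; expected: True
```

The inverse is computed by generic Gauss-Jordan elimination rather than by the explicit formula asked for in Question 5(b), so the check does not presuppose the answer to that part.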


6.2 Primary Decomposition of M(A) and Jordan Form

As usual let $A$ denote a $t \times t$ matrix over a field $F$. Here we assume that the factorisation of the characteristic polynomial $\chi_A(x)$ of $A$ into irreducible polynomials over $F$ is known. The primary decomposition of the $F[x]$-module $M(A)$ is established in Theorem 6.12, the theory being analogous to that of Section 3.2. We obtain the primary canonical form (pcf) of $A$ and, using the partition function $p(n)$, find the number of similarity classes of matrices having a given characteristic polynomial. The pcf is then modified to give two versions of the Jordan normal form (Jnf): first the ‘non-split’ case, which becomes the ‘usual’ Jnf should $\chi_A(x)$ factorise into polynomials of degree 1 over $F$, and secondly the separable Jordan form (sJf) in case $\chi_A(x)$ factorises into irreducible polynomials over $F$ none of which have repeated zeros in any extension field of $F$. The sJf is used in representation theory. Finally we discuss the real Jordan form, which is important in the theory of dynamical systems.

Let $A$ be a $t \times t$ matrix over a field $F$ with $\chi_A(x) = p_1(x)^{n_1} p_2(x)^{n_2} \cdots p_k(x)^{n_k}$ being the factorisation of $\chi_A(x)$ into positive powers $n_j$ of distinct monic irreducible polynomials $p_j(x)$ over $F$ for $1 \le j \le k$.

The $p_j(x)$-component of $M(A)$ is

$$M(A)_{p_j(x)} = \{v \in M(A) : p_j(x)^{n_j} v = 0\} \quad \text{for } 1 \le j \le k.$$

Directly from the above definition we see that the order of each element $v$ of $M(A)_{p_j(x)}$ is a divisor of $p_j(x)^{n_j}$. Conversely let $u$ have order $p_j(x)^{l_j}$ in $M(A)$ where $l_j \ge 0$. So $p_j(x)^{l_j} u = 0$ and also $\chi_A(x) u = 0$ by Corollary 6.11. By Corollary 4.6 we see that $p_j(x)^{\min\{l_j, n_j\}} = \gcd\{p_j(x)^{l_j}, \chi_A(x)\}$ satisfies $p_j(x)^{\min\{l_j, n_j\}} u = 0$ and so $u \in M(A)_{p_j(x)}$. Therefore

$M(A)_{p_j(x)}$ consists exactly of those elements $u$ having order a power of $p_j(x)$.

It is straightforward to show that $M(A)_{p_j(x)}$ is a submodule of $M(A)$. Also $M(A)_{p_j(x)}$ is non-zero (Exercises 6.2, Question 3(a)). The $k$ submodules $M(A)_{p_j(x)}$ for $1 \le j \le k$ are collectively referred to as the primary components of $M(A)$.

For example let

$$A = \begin{pmatrix} 1 & 1 & 1 \\ -1 & -1 & -1 \\ 2 & 2 & 1 \end{pmatrix}$$

over $\mathbb{Q}$. The reader can verify $\chi_A(x) = x^2(x - 1)$ and so $M(A)$ has primary components $M(A)_x$ and $M(A)_{x-1}$. We now find $\mathbb{Q}$-bases of these $A$-invariant subspaces of


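$\mathbb{Q}^3$ in the same way as (row) eigenvectors are found:

$$A^2 = \begin{pmatrix} 1 & 1 & 1 \\ -1 & -1 & -1 \\ 2 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 \\ -1 & -1 & -1 \\ 2 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 2 & 1 \\ -2 & -2 & -1 \\ 2 & 2 & 1 \end{pmatrix}$$

and therefore

$$\begin{aligned} M(A)_x &= \{v \in \mathbb{Q}^3 : x^2 v = vA^2 = 0\} \\ &= \{(a, b, c) \in \mathbb{Q}^3 : a - b + c = 0\} = \langle (0,1,1), (1,1,0) \rangle \end{aligned}$$

where $v = (a, b, c)$, since $vA^2 = (a - b + c)(2, 2, 1)$. So $M(A)_x$ has $\mathbb{Q}$-basis $B_1$ consisting of $u_1, xu_1$ where $u_1 = (0,1,1)$, $xu_1 = u_1 A = (1,1,0)$ and $x^2 u_1 = 0$. Therefore $u_1$ has order $x^2$ in $M(A)$ and $xu_1$ is a row eigenvector of $A$ associated with the eigenvalue 0. The reader can check that

$$M(A)_{x-1} = \{v \in \mathbb{Q}^3 : (x - 1)v = v(A - I) = 0\} = \langle (2,2,1) \rangle$$

is the row eigenspace of $A$ associated with the eigenvalue 1. So $M(A)_{x-1}$ has $\mathbb{Q}$-basis $B_2$ consisting of the single vector $u_2 = (2,2,1)$ of order $x - 1$ in $M(A)$. In fact $B_1 \cup B_2$ is a basis of $\mathbb{Q}^3$ (see the proof of Corollary 6.13) and

$$Y = \begin{pmatrix} u_1 \\ xu_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 1 & 0 \\ 2 & 2 & 1 \end{pmatrix} \quad \text{satisfies} \quad YAY^{-1} = C(x^2) \oplus C(x - 1).$$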
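Every claim in this example reduces to a few matrix products, so it can be confirmed mechanically. In the Python sketch below (exact rational arithmetic; the helper names are illustrative, not the text's notation) a vector is a row and multiplication by $x$ in $M(A)$ is right multiplication by $A$. The last line checks $YA = DY$ with $D = C(x^2) \oplus C(x-1)$, which together with $|Y| \ne 0$ is the statement $YAY^{-1} = C(x^2) \oplus C(x - 1)$.

```python
from fractions import Fraction

def matmul(P, Q):
    """Exact product of matrices stored as lists of rows."""
    return [[sum(Fraction(P[i][k]) * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def det3(M):
    (a, b, c), (d, e, f), (g, h, k) = M
    return a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)

A = [[1, 1, 1], [-1, -1, -1], [2, 2, 1]]
A2 = matmul(A, A)
A_minus_I = [[A[i][j] - (i == j) for j in range(3)] for i in range(3)]

u1, u2 = [0, 1, 1], [2, 2, 1]
xu1 = matmul([u1], A)[0]            # x u1 = u1 A
x2u1 = matmul([u1], A2)[0]          # x^2 u1 = u1 A^2
xm1u2 = matmul([u2], A_minus_I)[0]  # (x - 1) u2 = u2 (A - I)

# u1 has order x^2 (x u1 != 0 but x^2 u1 = 0); u2 is a row eigenvector for eigenvalue 1
assert any(xu1) and not any(x2u1) and not any(xm1u2)

Y = [u1, xu1, u2]
D = [[0, 1, 0], [0, 0, 0], [0, 0, 1]]   # C(x^2) ⊕ C(x - 1)
# Y is invertible and Y A = D Y, i.e. Y A Y^{-1} = C(x^2) ⊕ C(x - 1)
assert det3(Y) != 0 and matmul(Y, A) == matmul(D, Y)
```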

As a second example let $A = C(x^2(x + 1)) \oplus C(x(x + 1)^2(x - 1))$ over $\mathbb{Q}$. So $A$ is a $7 \times 7$ partitioned matrix (Definition 5.17), the leading entries in the two diagonal blocks (submatrices) being in the $(1,1)$- and $(4,4)$-positions. Then $\chi_A(x) = x^3(x + 1)^3(x - 1)$ and

$$M(A) = \langle e_1 \rangle \oplus \langle e_4 \rangle$$

by Theorem 5.26 where $e_1, e_4 \in \mathbb{Q}^7$. Now $e_1$ has order $x^2(x + 1)$ and $e_4$ has order $x(x + 1)^2(x - 1)$ in $M(A)$. From these generators with hybrid orders (orders divisible by two or more irreducible polynomials) we construct vectors having ‘pure’ orders (orders which are powers of a single irreducible polynomial) which generate the primary components. In this case the primary components of $M(A)$ are:

$$\begin{aligned} M(A)_x &= \langle (x + 1)e_1 \rangle \oplus \langle (x + 1)^2(x - 1)e_4 \rangle && \text{of dimension } 2 + 1 = 3, \\ M(A)_{x+1} &= \langle x^2 e_1 \rangle \oplus \langle x(x - 1)e_4 \rangle && \text{of dimension } 1 + 2 = 3, \\ M(A)_{x-1} &= \langle x(x + 1)^2 e_4 \rangle && \text{of dimension } 1. \end{aligned}$$

In fact $M(A) = M(A)_x \oplus M(A)_{x+1} \oplus M(A)_{x-1}$, that is, $M(A)$ is the internal direct sum of its primary components (see Theorem 6.12).
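The primary components in this second example can also be checked by machine. The sketch below (Python, exact arithmetic; function names are illustrative) builds $A$ as a direct sum of companion matrices, evaluates $g(x)v$ in $M(A)$ by Horner's rule using $xv = vA$, confirms that each listed generator is annihilated by a power of the relevant irreducible polynomial, and then confirms that the seven vectors $g_1, xg_1, g_2 \mid h_1, h_2, xh_2 \mid k_1$ are linearly independent, which matches the dimensions $3 + 3 + 1 = 7$.

```python
from fractions import Fraction

def companion(lower):
    """C(f) for monic f with lower coefficients [a0, ..., a_{n-1}], in the text's row convention."""
    n = len(lower)
    C = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n - 1):
        C[i][i + 1] = Fraction(1)
    C[n - 1] = [-Fraction(a) for a in lower]
    return C

def direct_sum(P, Q):
    return ([row + [Fraction(0)] * len(Q) for row in P] +
            [[Fraction(0)] * len(P) + row for row in Q])

def apply_poly(g, v, A):
    """g(x) v in M(A), g = [c0, c1, ...] meaning c0 + c1 x + ...; Horner's rule with x v = v A."""
    w = [Fraction(0)] * len(v)
    for c in reversed(g):
        xw = [sum(w[k] * A[k][j] for k in range(len(w))) for j in range(len(w))]
        w = [a + c * b for a, b in zip(xw, v)]
    return w

def is_zero(v):
    return all(c == 0 for c in v)

def rank(rows):
    """Rank over Q by Gaussian elimination."""
    rows = [list(r) for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                m = rows[i][col] / rows[rk][col]
                rows[i] = [a - m * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

# A = C(x^2(x+1)) ⊕ C(x(x+1)^2(x-1)) = C(x^3 + x^2) ⊕ C(x^4 + x^3 - x^2 - x), a 7 x 7 matrix
A = direct_sum(companion([0, 0, 1]), companion([0, -1, -1, 1]))
e1 = [Fraction(int(j == 0)) for j in range(7)]
e4 = [Fraction(int(j == 3)) for j in range(7)]
x = [0, 1]

g1 = apply_poly([1, 1], e1, A)          # (x+1) e1
g2 = apply_poly([-1, -1, 1, 1], e4, A)  # (x+1)^2 (x-1) e4
h1 = apply_poly([0, 0, 1], e1, A)       # x^2 e1
h2 = apply_poly([0, -1, 1], e4, A)      # x(x-1) e4
k1 = apply_poly([0, 1, 2, 1], e4, A)    # x(x+1)^2 e4

# each generator is killed by a power of the corresponding irreducible polynomial
assert is_zero(apply_poly([0, 0, 1], g1, A)) and is_zero(apply_poly(x, g2, A))
assert is_zero(apply_poly([1, 1], h1, A)) and is_zero(apply_poly([1, 2, 1], h2, A))
assert is_zero(apply_poly([-1, 1], k1, A))
# g1, x g1, g2 | h1, h2, x h2 | k1 are linearly independent: dimensions 3 + 3 + 1 = 7
spanning = [g1, apply_poly(x, g1, A), g2, h1, h2, apply_poly(x, h2, A), k1]
assert rank(spanning) == 7
```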


Returning to the general case let $A$ and $B$ be $t \times t$ matrices over a field $F$ and let $\alpha : M(A) \cong M(B)$ be an isomorphism of $F[x]$-modules. Then $A$ and $B$ are similar by Theorem 5.13 and so $\chi_A(x) = \chi_B(x)$ from Lemma 5.5. We proceed to show that restrictions of $\alpha$ give rise to isomorphisms between the primary components of $M(A)$ and those of $M(B)$. Suppose $v \in M(A)_{p_j(x)}$. Applying $\alpha$ to the equation $p_j(x)^{n_j} v = 0$ gives $p_j(x)^{n_j} (v)\alpha = 0$, since $\alpha$ is $F[x]$-linear, which shows $(v)\alpha \in M(B)_{p_j(x)}$. In the same way $w \in M(B)_{p_j(x)}$ implies $(w)\alpha^{-1} \in M(A)_{p_j(x)}$. So the restriction (Definition 5.16) of $\alpha$ to $M(A)_{p_j(x)}$ is an isomorphism $\alpha| : M(A)_{p_j(x)} \cong M(B)_{p_j(x)}$ for $1 \le j \le k$.

Our next theorem is the polynomial analogue of Theorem 3.10.

Theorem 6.12 (The primary decomposition of the $F[x]$-module $M(A)$)

Let $A$ be a $t \times t$ matrix over a field $F$ with $\chi_A(x) = p_1(x)^{n_1} p_2(x)^{n_2} \cdots p_k(x)^{n_k}$ where $p_1(x), p_2(x), \ldots, p_k(x)$ are $k$ different monic irreducible polynomials over $F$ and $n_1, n_2, \ldots, n_k$ are positive integers. Then

$$M(A) = M(A)_{p_1(x)} \oplus M(A)_{p_2(x)} \oplus \cdots \oplus M(A)_{p_k(x)}.$$

We omit the proof of Theorem 6.12 as it is analogous to the proof of Theorem 3.10 (Exercises 6.2, Question 3(d)). Notice that the notation $p_j(x)$ for $1 \le j \le k$ imposes an arbitrary ordering on the $k$ monic irreducible factors of $\chi_A(x)$ and determines the order in which the primary components appear in the above decomposition.

Let $A$ and $B$ be $t \times t$ matrices over a field $F$. Then

$$\begin{aligned} A \sim B &\iff M(A) \cong M(B) \\ &\iff M(A)_{p_j(x)} \cong M(B)_{p_j(x)} \text{ for all } j \text{ with } 1 \le j \le k \end{aligned}$$

by Theorems 5.13, 6.12 and the preceding discussion. So the similarity class of $A$ depends only on the isomorphism classes of the primary components of $M(A)$. This fact will help in the enumeration of similarity classes discussed after Definition 6.14.

Corollary 6.13

Let $A$ and $\chi_A(x)$ be as in Theorem 6.12 and let $\alpha$ be the linear mapping determined by $A$. Let $B_j$ be a basis of $M(A)_{p_j(x)}$ and let $A_j$ be the matrix of $\alpha| : M(A)_{p_j(x)} \cong M(A)_{p_j(x)}$ relative to $B_j$ for $1 \le j \le k$. Then

$$\chi_{A_j}(x) = p_j(x)^{n_j} \quad \text{and} \quad \dim M(A)_{p_j(x)} = \deg p_j(x)^{n_j} \quad \text{for } 1 \le j \le k.$$

Further $M(A)_{p_j(x)} \cong M(A_j)$ for $1 \le j \le k$.


Proof

Note that $B = B_1 \cup B_2 \cup \cdots \cup B_k$ is a basis of $F^t$ by Lemma 5.18 and Theorem 6.12. As usual $\chi_{A_j}(x) = \det(xI - A_j)$ and $A_j$ has minimum polynomial $\mu_{A_j}(x)$. From the definition of primary component we deduce $\mu_{A_j}(x) \mid p_j(x)^{n_j}$ and so $\chi_{A_j}(x) = p_j(x)^{m_j}$ where $m_j$ is a positive integer for $1 \le j \le k$ by Corollary 6.11. By Corollary 5.20 we know $XAX^{-1} = A_1 \oplus A_2 \oplus \cdots \oplus A_k$ where the vectors in $B$ are the rows of $X$. By Lemma 5.5 and the discussion of direct sums following Definition 5.17 we obtain $\chi_A(x) = \chi_{A_1}(x)\chi_{A_2}(x) \cdots \chi_{A_k}(x)$, that is,

$$p_1(x)^{n_1} p_2(x)^{n_2} \cdots p_k(x)^{n_k} = p_1(x)^{m_1} p_2(x)^{m_2} \cdots p_k(x)^{m_k}.$$

As $p_1(x), p_2(x), \ldots, p_k(x)$ are distinct monic irreducible polynomials we deduce $n_j = m_j$ for $1 \le j \le k$ from the polynomial analogue of the fundamental theorem of arithmetic. Therefore $\dim M(A)_{p_j(x)} = |B_j| = \deg \chi_{A_j}(x) = \deg p_j(x)^{n_j}$ for $1 \le j \le k$. Taking $s = k$ and $N_j = M(A)_{p_j(x)}$ in Corollary 5.20, and $N = N_j$, $B = B_j$ in Exercises 5.1, Question 5, gives the stated module isomorphism. □

The primary decomposition (Theorem 3.10) of a finite abelian group $G$ corresponds to the factorisation of $|G|$ into powers of distinct primes. From Theorem 6.12 and Corollary 6.13 we obtain the module analogue: the primary decomposition of $M(A)$ corresponds to the factorisation of $\chi_A(x)$ into powers of distinct monic irreducible polynomials.

For example let $A$ be an $8 \times 8$ matrix over $\mathbb{Q}$ with $\chi_A(x) = p_1(x)^2 p_2(x)^3$ where $p_1(x) = x + 1$ and $p_2(x) = x^2 + 1$. We will see shortly that there are 6 similarity classes of such matrices $A$. The $\mathbb{Q}[x]$-modules $M(A)$ have a common feature: in each case

$$M(A) = M(A)_{p_1(x)} \oplus M(A)_{p_2(x)} \cong M(A_1) \oplus M(A_2)$$

where $\chi_{A_1}(x) = p_1(x)^2$, $\chi_{A_2}(x) = p_2(x)^3$. So $\chi_A(x) = \chi_{A_1}(x)\chi_{A_2}(x)$ is the factorisation of $\chi_A(x)$ into powers of irreducible polynomials over $\mathbb{Q}$.

We now return to the general case of a $t \times t$ matrix $A$ over a field $F$. Suppose $M(A) = N_1 \oplus N_2 \oplus \cdots \oplus N_s$ where $N_i = \langle v_i \rangle$, $v_i$ has order $d_i(x)$ in $M(A)$ for $1 \le i \le s$, and $d_i(x) \mid d_{i+1}(x)$ for $1 \le i < s$ as in Theorem 6.5. As before let $\chi_A(x) = p_1(x)^{n_1} p_2(x)^{n_2} \cdots p_k(x)^{n_k}$ be the factorisation of $\chi_A(x)$ into positive powers $n_j$ of distinct monic irreducible polynomials $p_j(x)$ over $F$ for $1 \le j \le k$. As $\chi_A(x) = d_1(x) d_2(x) \cdots d_s(x)$ by Theorem 6.5, each invariant factor $d_i(x)$ has factorisation $d_i(x) = p_1(x)^{t_{i1}} p_2(x)^{t_{i2}} \cdots p_k(x)^{t_{ik}}$ where $0 \le t_{ij} \le n_j$. On comparing powers of $p_j(x)$ we obtain $n_j = t_{1j} + t_{2j} + \cdots + t_{sj}$ where $0 \le t_{1j} \le t_{2j} \le \cdots \le t_{sj}$ for $1 \le j \le k$. By Corollary 6.11 each $t_{sj} > 0$. Let $l_j$ denote the smallest $i$ with $t_{ij} > 0$ for $1 \le j \le k$. Therefore

$$(t_{l_j j}, t_{l_j+1\, j}, \ldots, t_{sj}) \text{ is a partition of } n_j \text{ with at most } s \text{ parts for } 1 \le j \le k.$$

As $l_j = 1 \iff p_j(x) \mid d_1(x)$, at least one of these partitions has $s$ parts.

Write $m_{ij}(x) = d_i(x)/p_j(x)^{t_{ij}}$ and $v_{ij} = m_{ij}(x) v_i$ for $1 \le i \le s$, $1 \le j \le k$.

The vector $v_i$ generates $N_i$ and has order $d_i(x)$ in $M(A)$. So $v_{ij}$ has order $d_i(x)/\gcd\{m_{ij}(x), d_i(x)\} = d_i(x)/m_{ij}(x) = p_j(x)^{t_{ij}}$ by Lemma 5.23. Let $k_i$ denote the number of positive exponents $t_{ij}$ for $1 \le i \le s$. As $d_i(x) \mid d_{i+1}(x)$ for $1 \le i < s$ we see $k_1 \le k_2 \le \cdots \le k_s$, and $k_s = k$ by Corollary 6.11. On omitting trivial terms the primary decomposition of $N_i$ is

$$N_i = \bigoplus_j \langle v_{ij} \rangle \quad \text{for } 1 \le i \le s$$

as $d_i(x) = p_1(x)^{t_{i1}} p_2(x)^{t_{i2}} \cdots p_k(x)^{t_{ik}}$, that is, $N_i$ is the internal direct sum of its $k_i$ non-trivial primary components $\langle v_{ij} \rangle$ for $1 \le i \le s$. So $N_i \cap M(A)_{p_j(x)} = \langle v_{ij} \rangle$ for $1 \le i \le s$, $1 \le j \le k$. Omitting zero terms in

$$M(A)_{p_j(x)} = \bigoplus_i \langle v_{ij} \rangle$$

gives the invariant factor decomposition of $M(A)_{p_j(x)}$. The non-constant monic polynomials $p_j(x)^{t_{ij}}$ are called the elementary divisors of $A$. In other words the elementary divisors of $A$ are the invariant factors of the primary components of $M(A)$. From the discussion following Corollary 6.7 and Theorem 6.12 we deduce:

two $t \times t$ matrices over a field are similar if and only if their elementary divisors (taking repetitions into account) are equal.

For example consider $A = C((x + 1)(x^2 + 1)) \oplus C((x + 1)(x^2 + 1)^2)$ over $\mathbb{Q}$. So $A$ is in rcf. Write $p_1(x) = x + 1$ and $p_2(x) = x^2 + 1$. Then $\chi_A(x) = p_1(x)^2 p_2(x)^3$ giving $n_1 = 2$, $n_2 = 3$. As $A$ is a direct sum we obtain $M(A) = N_1 \oplus N_2$ where $N_1 = \langle e_1 \rangle$ and $N_2 = \langle e_4 \rangle$ are cyclic submodules with generators $v_1 = e_1$ and $v_2 = e_4$ of orders $(x + 1)(x^2 + 1)$ and $(x + 1)(x^2 + 1)^2$ respectively from Theorem 5.26. Using Lemma 5.23 we see $v_{11} = (x^2 + 1)e_1 = e_1 + e_3$ has order $x + 1$ and $v_{21} = (x^2 + 1)^2 e_4 = (x^4 + 2x^2 + 1)e_4 = e_4 + 2e_6 + e_8$ also has order $x + 1$ in $M(A)$. So these vectors are row eigenvectors of $A$ corresponding to the eigenvalue $-1$ and the primary component $M(A)_{x+1} = \langle v_{11}, v_{21} \rangle$ is in this case the eigenspace of $A$ corresponding to $-1$. By Lemma 5.23 the vectors $v_{12} = (x + 1)e_1 = e_1 + e_2$ and $v_{22} = (x + 1)e_4 = e_4 + e_5$ have orders $x^2 + 1$ and $(x^2 + 1)^2$ respectively in $M(A)$. The primary component $M(A)_{x^2+1} = \langle v_{12} \rangle \oplus \langle v_{22} \rangle$ is a subspace of $\mathbb{Q}^8$ having dimension $2 + 4 = 6$.


In this case $t_{11} = 1$, $t_{21} = 1$, $t_{12} = 1$, $t_{22} = 2$ and the elementary divisors of $M(A)$ are $x + 1, x + 1; x^2 + 1, (x^2 + 1)^2$. Looking ahead, the matrix

$$Y = \begin{pmatrix} v_{11} \\ v_{21} \\ v_{12} \\ xv_{12} \\ v_{22} \\ xv_{22} \\ x^2 v_{22} \\ x^3 v_{22} \end{pmatrix} = \begin{pmatrix} 1&0&1&0&0&0&0&0 \\ 0&0&0&1&0&2&0&1 \\ 1&1&0&0&0&0&0&0 \\ 0&1&1&0&0&0&0&0 \\ 0&0&0&1&1&0&0&0 \\ 0&0&0&0&1&1&0&0 \\ 0&0&0&0&0&1&1&0 \\ 0&0&0&0&0&0&1&1 \end{pmatrix}$$

is invertible over $\mathbb{Q}$ and satisfies

$$YAY^{-1} = \begin{pmatrix} -1&0&0&0&0&0&0&0 \\ 0&-1&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&-1&0&0&0&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&1&0 \\ 0&0&0&0&0&0&0&1 \\ 0&0&0&0&-1&0&-2&0 \end{pmatrix} = C(x + 1) \oplus C(x + 1) \oplus C(x^2 + 1) \oplus C((x^2 + 1)^2)$$

which is in primary canonical form according to the following definition.

Definition 6.14

Let $A$ be a $t \times t$ matrix over a field $F$ as in Theorem 6.12. Let $P_j$ denote the rcf of $A_j$ where $M(A)_{p_j(x)} \cong M(A_j)$ as in the proof of Corollary 6.13 for $1 \le j \le k$. The matrix $P_1 \oplus P_2 \oplus \cdots \oplus P_k$ is said to be in primary canonical form (pcf).

As $A \sim A_1 \oplus A_2 \oplus \cdots \oplus A_k$ and $A_j \sim P_j$ for $1 \le j \le k$ we see

$$A \sim P_1 \oplus P_2 \oplus \cdots \oplus P_k$$

where $\sim$ denotes similarity (Definition 5.3). Apart from the ordering of the primary components each $A$ is similar to a unique matrix in pcf.

We proceed to construct an invertible matrix $Y$ over $F$ such that $YAY^{-1}$ is in pcf. For each $i$ and $j$ with $1 \le i \le s$, $1 \le j \le k$ and $t_{ij} > 0$, the vector $v_{ij}$ has order $p_j(x)^{t_{ij}}$ and so the basis $B_{v_{ij}}$ of $N_i \cap M(A)_{p_j(x)}$ has $\deg p_j(x)^{t_{ij}}$ elements by Corollary 5.29. As $N_1, N_2, \ldots, N_s$ are independent we see that the vectors in $\bigcup_i B_{v_{ij}}$ are linearly independent and are $\sum_i \deg p_j(x)^{t_{ij}} = \deg p_j(x)^{n_j}$ in number. Therefore $B_j = \bigcup_i B_{v_{ij}}$ is a basis of the $p_j(x)$-component of $M(A)$ for $1 \le j \le k$.

Also $P_j$ is the matrix relative to $B_j$ of the restriction to $M(A)_{p_j(x)}$ of the linear mapping determined by $A$ for $1 \le j \le k$ by Corollaries 5.20 and 5.27. Finally $B = \bigcup_{i,j} B_{v_{ij}} = B_1 \cup B_2 \cup \cdots \cup B_k$ is a basis of $F^t$ and the vectors in $B$ are the rows of an invertible matrix $Y$ satisfying $YAY^{-1} = P_1 \oplus P_2 \oplus \cdots \oplus P_k$ in pcf by Corollary 5.20. Now $P_j = \bigoplus_{i=l_j}^{s} C(p_j(x)^{t_{ij}})$ for $1 \le j \le k$ and so $YAY^{-1}$ is the direct sum of $k_1 + k_2 + \cdots + k_s$ companion matrices $C(p_j(x)^{t_{ij}})$, one for each elementary divisor $p_j(x)^{t_{ij}}$ of $A$. The non-zero $F[x]$-modules $M(C(p_j(x)^{t_{ij}}))$ are indecomposable, that is, none can be expressed as the direct sum of two non-zero submodules (Exercises 6.2, Question 3(b)). So the above decomposition of $M(A)$ into $k_1 + k_2 + \cdots + k_s$ non-trivial submodules cannot be ‘improved’, that is, $M(A)$ cannot be expressed as the direct sum of more than $k_1 + k_2 + \cdots + k_s$ non-zero submodules (Exercises 6.2, Question 3(c)).

We return briefly to Example 6.3. With

$$A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$

over $\mathbb{Q}$ we found $M(A) = \langle v_1 \rangle \oplus \langle v_2 \rangle$ where $v_1 = (0,0,1)$ and $v_2 = (1,0,0)$ have orders $d_1(x) = x - 1$ and $d_2(x) = (x - 1)(x - 2)$ respectively in $M(A)$. Here $s = k = 2$. Write $p_1(x) = x - 1$ and $p_2(x) = x - 2$. Then $\chi_A(x) = (x - 1)^2(x - 2) = p_1(x)^2 p_2(x)$ and so $n_1 = 2$, $n_2 = 1$. Also $t_{11} = 1$, $t_{12} = 0$, $t_{21} = 1$, $t_{22} = 1$ and so $m_{11}(x) = 1$, $m_{21}(x) = x - 2$, $m_{22}(x) = x - 1$, giving $v_{11} = v_1$, $v_{21} = (x - 2)v_2 = (-1,1,1)$, $v_{22} = (x - 1)v_2 = (0,1,1)$ of orders $x - 1$, $x - 1$, $x - 2$ respectively in $M(A)$. In this case the rows of $Y$ are eigenvectors of $A$ and

$$Y = \begin{pmatrix} v_{11} \\ v_{21} \\ v_{22} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ -1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix} \quad \text{satisfies} \quad YAY^{-1} = \operatorname{diag}(1,1,2) = P_1 \oplus P_2 \text{ in pcf}$$

where $P_1 = C(x - 1) \oplus C(x - 1)$ and $P_2 = C(x - 2)$. We've modified $X$ with $XAX^{-1}$ in rcf to get $Y$ with $YAY^{-1}$ in pcf. It is always possible to carry out this procedure provided the irreducible factorisation of $\chi_A(x)$ is known.
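The passage from invariant factors to the pcf basis in this example is short enough to replay in code. The Python sketch below (exact arithmetic; helper names are illustrative) checks the orders of $v_1$ and $v_2$, forms $v_{ij} = m_{ij}(x) v_i$ by right multiplication with $A - cI$ (which is how $(x - c)$ acts on row vectors in $M(A)$), and confirms $YA = \operatorname{diag}(1,1,2)\,Y$ with $|Y| \ne 0$.

```python
from fractions import Fraction

def matmul(P, Q):
    return [[sum(Fraction(P[i][k]) * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def det3(M):
    (a, b, c), (d, e, f), (g, h, k) = M
    return a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)

A = [[1, 1, 1], [0, 2, 1], [0, 0, 1]]

def minus(c):
    """A - cI, so that v (A - cI) is (x - c) v in M(A)."""
    return [[A[i][j] - c * (i == j) for j in range(3)] for i in range(3)]

def act(v, M):
    return matmul([v], M)[0]

v1, v2 = [0, 0, 1], [1, 0, 0]
# v1 has order x - 1; v2 has order (x - 1)(x - 2) but is killed by neither factor alone
assert not any(act(v1, minus(1)))
assert not any(act(act(v2, minus(1)), minus(2)))
assert any(act(v2, minus(1))) and any(act(v2, minus(2)))

v11 = v1                   # m11(x) = 1
v21 = act(v2, minus(2))    # m21(x) = x - 2
v22 = act(v2, minus(1))    # m22(x) = x - 1
Y = [v11, v21, v22]
assert det3(Y) != 0
assert matmul(Y, A) == matmul([[1, 0, 0], [0, 1, 0], [0, 0, 2]], Y)   # Y A Y^{-1} = diag(1, 1, 2)
```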

We now ask: what is the number of similarity classes of $t \times t$ matrices $A$ over $F$ all having the same $\chi_A(x)$? In other words, by Theorem 5.13, how many isomorphism classes of $F[x]$-modules $M(A)$ are there where $A$ has a given characteristic polynomial? As before it is necessary to know the factorisation

$$\chi_A(x) = p_1(x)^{n_1} p_2(x)^{n_2} \cdots p_k(x)^{n_k}.$$

The analogous problem for finite abelian groups was solved at the end of Section 3.2 using the partition function $p(n)$. We use the same approach here.

The relationship between the invariant factors $d_i(x)$ and the elementary divisors $p_j(x)^{t_{ij}}$, $t_{ij} > 0$, of $A$ is expressed by the entries in the following bordered $s \times k$ table:

$$\begin{array}{c|cccc} \cong M(A) & p_1(x) & p_2(x) & \cdots & p_k(x) \\ \hline d_1(x) & t_{11} & t_{12} & \cdots & t_{1k} \\ d_2(x) & t_{21} & t_{22} & \cdots & t_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ d_s(x) & t_{s1} & t_{s2} & \cdots & t_{sk} \end{array}$$

where $\cong M(A)$ denotes the isomorphism class of the $F[x]$-module $M(A)$. The order in which the $k$ columns appear in the table is arbitrary, but otherwise different tables correspond to different isomorphism classes of $F[x]$-modules $M(A)$. As noted above the non-zero entries in column $j$ are the parts of a partition $(t_{l_j j}, t_{l_j+1\, j}, \ldots, t_{sj})$ of $n_j$ for $1 \le j \le k$. Notice $t_{1j} > 0$ for some $j$ as $\deg d_1(x) \ge 1$, and so $s$ is the maximum number of parts in these $k$ partitions.

Conversely suppose given a partition of $n_j$ for each $j$ with $1 \le j \le k$. Denote by $s$ the largest number of parts in any one of these partitions. Then the partition of $n_j$ has $s - l_j + 1$ parts where $l_j$ is a positive integer and so can be written $(t_{l_j j}, t_{l_j+1\, j}, \ldots, t_{sj})$ as above. Write $t_{ij} = 0$ for $i < l_j$. Then $t_{ij}$ is the $(i, j)$-entry in the table of the isomorphism class of $M(A)$ where $A$ is a $t \times t$ matrix with $\chi_A(x) = p_1(x)^{n_1} p_2(x)^{n_2} \cdots p_k(x)^{n_k}$ and invariant factor sequence $(d_1(x), d_2(x), \ldots, d_s(x))$ where $d_i(x) = p_1(x)^{t_{i1}} p_2(x)^{t_{i2}} \cdots p_k(x)^{t_{ik}}$ for $1 \le i \le s$. Using the partition function $p(n)$ of Definition 3.13 the conclusion is:

there are $p(n_1)p(n_2) \cdots p(n_k)$ isomorphism classes of $F[x]$-modules $M(A)$ where $A$ is a $t \times t$ matrix over $F$ with $\chi_A(x) = p_1(x)^{n_1} p_2(x)^{n_2} \cdots p_k(x)^{n_k}$

since there are $p(n_j)$ choices for each partition of $n_j$ for $1 \le j \le k$.

For example consider $25 \times 25$ matrices $A$ over $\mathbb{Q}$ with $\chi_A(x) = x^5(x + 1)^6(x^2 + 1)^7$. From the discussion following Definition 3.13 we know $p(5) = 7$, $p(6) = 11$, $p(7) = 15$. The number of isomorphism classes of $\mathbb{Q}[x]$-modules $M(A)$ is $7 \times 11 \times 15 = 1155$. One such class is specified by the partitions $(2,3)$, $(2,2,2)$, $(3,4)$ of 5, 6, 7 respectively. This class has elementary divisors $x^2, x^3; (x + 1)^2, (x + 1)^2, (x + 1)^2; (x^2 + 1)^3, (x^2 + 1)^4$ and representative matrix $C$ in pcf:

$$C(x^2) \oplus C(x^3) \oplus C((x + 1)^2) \oplus C((x + 1)^2) \oplus C((x + 1)^2) \oplus C((x^2 + 1)^3) \oplus C((x^2 + 1)^4).$$
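The count $p(n_1)p(n_2)\cdots p(n_k)$ depends only on the exponents $n_j$, not on the irreducible polynomials themselves. A minimal Python sketch, using the standard dynamic programme that admits parts of size $1, 2, \ldots$ one size at a time (a common way to tabulate $p(n)$, not a method taken from the text), reproduces the numbers quoted here and the 6 classes in the earlier $8 \times 8$ example, where $\chi_A(x) = (x+1)^2(x^2+1)^3$ gives $p(2)p(3) = 2 \cdot 3$.

```python
from math import prod

def partition_numbers(n_max):
    """[p(0), p(1), ..., p(n_max)]: allow parts of size 1, 2, ..., n_max one size at a time."""
    p = [1] + [0] * n_max
    for part in range(1, n_max + 1):
        for n in range(part, n_max + 1):
            p[n] += p[n - part]
    return p

def count_similarity_classes(exponents):
    """p(n_1) p(n_2) ... p(n_k) for chi_A(x) = p_1(x)^{n_1} ... p_k(x)^{n_k}."""
    p = partition_numbers(max(exponents))
    return prod(p[n] for n in exponents)

print(partition_numbers(7)[5:])             # [7, 11, 15]
print(count_similarity_classes([5, 6, 7]))  # 1155
print(count_similarity_classes([2, 3]))     # 6
```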


The table of $\cong M(C)$ is

$$\begin{array}{c|ccc} \cong M(C) & x & x + 1 & x^2 + 1 \\ \hline (x + 1)^2 & 0 & 2 & 0 \\ x^2(x + 1)^2(x^2 + 1)^3 & 2 & 2 & 3 \\ x^3(x + 1)^2(x^2 + 1)^4 & 3 & 2 & 4 \end{array}$$

The invariant factors of $C$ appear in the rows of the table. So

$$C \sim C((x + 1)^2) \oplus C(x^2(x + 1)^2(x^2 + 1)^3) \oplus C(x^3(x + 1)^2(x^2 + 1)^4),$$

the matrix on the right being the rcf of $C$ and $\mu_C(x) = x^3(x + 1)^2(x^2 + 1)^4$ being the minimum polynomial of $C$. The invariant factor decompositions of the primary components of $M(C)$ appear in the columns of the table. So

$$\begin{aligned} M(C)_x &\cong M(C(x^2)) \oplus M(C(x^3)), \\ M(C)_{x+1} &\cong M(C((x + 1)^2)) \oplus M(C((x + 1)^2)) \oplus M(C((x + 1)^2)) \quad \text{and} \\ M(C)_{x^2+1} &\cong M(C((x^2 + 1)^3)) \oplus M(C((x^2 + 1)^4)). \end{aligned}$$
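Building the table from the partitions is mechanical: sort each partition, pad it with zeros at the top until every column has $s$ entries (this is the convention $t_{ij} = 0$ for $i < l_j$), and read off the rows. The Python sketch below does exactly that for exponents only; the labels such as `"x+1"` are plain strings chosen for display, and the rule that single-character labels are printed without brackets is just a formatting assumption.

```python
def invariant_factor_table(partitions):
    """partitions: dict {label of p_j(x): exponents of its elementary divisors}.
    Returns (labels, rows) where row i lists t_i1, ..., t_ik, so d_i(x) = prod p_j(x)^{t_ij}."""
    labels = list(partitions)
    s = max(len(parts) for parts in partitions.values())   # number of invariant factors
    columns = []
    for label in labels:
        parts = sorted(partitions[label])
        columns.append([0] * (s - len(parts)) + parts)     # t_ij = 0 for i < l_j
    rows = [tuple(col[i] for col in columns) for i in range(s)]
    return labels, rows

def as_polynomial(labels, row):
    """Render p_1(x)^{t_i1} ... p_k(x)^{t_ik} as text, omitting factors with exponent 0."""
    factors = []
    for label, e in zip(labels, row):
        if e == 0:
            continue
        base = label if len(label) == 1 else f"({label})"  # bracket multi-character labels
        factors.append(base if e == 1 else f"{base}^{e}")
    return "".join(factors) or "1"

labels, rows = invariant_factor_table({"x": [2, 3], "x+1": [2, 2, 2], "x^2+1": [3, 4]})
for row in rows:
    print(row, as_polynomial(labels, row))
```

Run on the partitions $(2,3)$, $(2,2,2)$, $(3,4)$ this prints the three rows of the table above together with the invariant factors $(x+1)^2$, $x^2(x+1)^2(x^2+1)^3$ and $x^3(x+1)^2(x^2+1)^4$.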

We have now completed our counting of the number of similarity classes of $t \times t$ matrices $A$ over a field $F$ having a given characteristic polynomial $\chi_A(x)$. The pcf of $A$ plays a crucial role in this theory. We next obtain the Jordan normal form of $A$ by modifying the companion matrices appearing in the pcf of $A$.

Definition 6.15

Let $f(x)$ be a monic polynomial of positive degree $n$ over a field $F$ and let $l$ be a positive integer.

The Jordan block matrix

$$J(f(x), l) = \begin{pmatrix} C & L & 0 & 0 & \cdots & 0 \\ 0 & C & L & 0 & \cdots & 0 \\ 0 & 0 & C & L & \ddots & \vdots \\ \vdots & \vdots & & \ddots & \ddots & 0 \\ 0 & 0 & \cdots & 0 & C & L \\ 0 & 0 & \cdots & 0 & 0 & C \end{pmatrix}$$

is the $ln \times ln$ matrix over $F$ partitioned into $n \times n$ submatrices where $C = C(f(x))$, $L$ denotes the $n \times n$ matrix with $(n,1)$-entry 1, all other entries being zero, and $0$ denotes the $n \times n$ zero matrix.

A direct sum of matrices of the type $J(p(x), l)$, where $p(x)$ is monic and irreducible over $F$, is said to be in Jordan normal form (Jnf) over $F$.
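Definition 6.15 translates directly into code. The Python sketch below (illustrative names; the companion matrix follows the row convention of the text, with last row $(-a_0, \ldots, -a_{n-1})$) places $l$ copies of $C(f(x))$ down the diagonal and puts the single nonzero entry of each block $L$ in the last row of a diagonal block and the first column of the next one. The two assertions reproduce the examples $J(x-2,3)$ and $J(x^2+1,2)$ that follow the definition.

```python
from fractions import Fraction

def companion(lower):
    """C(f) for monic f = a0 + ... + a_{n-1} x^{n-1} + x^n given lower = [a0, ..., a_{n-1}]."""
    n = len(lower)
    C = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n - 1):
        C[i][i + 1] = Fraction(1)
    C[n - 1] = [-Fraction(a) for a in lower]
    return C

def jordan_block(lower, l):
    """J(f(x), l): l diagonal copies of C = C(f(x)), with a block L (only its (n, 1)-entry is 1)
    immediately to the right of each diagonal block except the last."""
    n = len(lower)
    C = companion(lower)
    J = [[Fraction(0)] * (l * n) for _ in range(l * n)]
    for b in range(l):
        for i in range(n):
            J[b * n + i][b * n:(b + 1) * n] = C[i]
        if b + 1 < l:
            J[b * n + n - 1][(b + 1) * n] = Fraction(1)
    return J

assert jordan_block([-2], 3) == [[2, 1, 0], [0, 2, 1], [0, 0, 2]]          # J(x - 2, 3)
assert jordan_block([1, 0], 2) == [[0, 1, 0, 0], [-1, 0, 1, 0],
                                   [0, 0, 0, 1], [0, 0, -1, 0]]            # J(x^2 + 1, 2)
```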


For example

$$J(x - 2, 3) = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix} \quad \text{and} \quad J(x^2 + 1, 2) = \begin{pmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{pmatrix}.$$

Also $J(x - 2, 3) \oplus J(x^2 + 1, 2)$ is in Jnf over $\mathbb{Q}$.

The Jordan block matrix $J(f(x), l)$ and the companion matrix $C(f(x)^l)$ are similar, as we'll see shortly. These matrices have a number of entries in common: in both cases the $(i, i+1)$-entry is 1 for $1 \le i < ln$ and all $(i, j)$-entries are zero for $i + 1 < j$. The presence of the somewhat strange matrices $L$ achieves this as far as $J(f(x), l)$ is concerned. The structure of $J(f(x), l)$ exploits the fact that the module $M(C(f(x)^l))$ has a chain (Theorem 5.28) of submodules:

$$\langle e_1 \rangle \supset \langle f(x)e_1 \rangle \supset \langle f(x)^2 e_1 \rangle \supset \cdots \supset \langle f(x)^{l-1} e_1 \rangle \supset 0$$

whereas the structure of $C(f(x)^l)$ ignores its existence! The precise connection between $C(f(x)^l)$ and $J(f(x), l)$ is established in Theorem 6.16. We first look at the case $l = 2$, $n = 2$. Write $C' = C(f(x)^2)$ where $f(x) = a_0 + a_1 x + x^2$. Then $e_1$ has order $f(x)^2$ in $M(C')$. We are looking for an invertible $4 \times 4$ matrix $Z$ with

$$ZC'Z^{-1} = J(f(x), 2) = \begin{pmatrix} 0 & 1 & 0 & 0 \\ -a_0 & -a_1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -a_0 & -a_1 \end{pmatrix}.$$

On equating rows in $ZC' = J(f(x), 2)Z$ we see that the rows $z_1, z_2, z_3, z_4$ of $Z$ are linearly independent and satisfy

$$xz_1 = z_2, \quad xz_2 = -a_0 z_1 - a_1 z_2 + z_3, \quad xz_3 = z_4, \quad xz_4 = -a_0 z_3 - a_1 z_4$$

working in the module $M(C')$. The first two equations rearrange to give $z_3 = (a_0 + a_1 x + x^2) z_1 = f(x) z_1$ and the last two equations give $f(x) z_3 = 0$. So $f(x)^2 z_1 = f(x) z_3 = 0$. The above four equations and the linear independence of $z_1, z_2, z_3, z_4$ are expressed by the single statement: $z_1$ has order $f(x)^2$ in $M(C')$. We take $z_1 = e_1$ (it would be perverse to do otherwise as $e_1$ has the property required of $z_1$). Knowing $f(x)^2 = a_0^2 + 2a_0 a_1 x + (2a_0 + a_1^2)x^2 + 2a_1 x^3 + x^4$, the reader can now check that

$$Z = \begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{pmatrix} = \begin{pmatrix} z_1 \\ xz_1 \\ f(x)z_1 \\ xf(x)z_1 \end{pmatrix} = \begin{pmatrix} e_1 \\ xe_1 \\ (a_0 + a_1 x + x^2)e_1 \\ (a_0 x + a_1 x^2 + x^3)e_1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ a_0 & a_1 & 1 & 0 \\ 0 & a_0 & a_1 & 1 \end{pmatrix}$$

is invertible over $F$ and does satisfy $ZC' = J(f(x), 2)Z$. We are now ready for the general case.

Theorem 6.16

Let $f(x)$ be a monic polynomial of positive degree $n$ over a field $F$ and let $l$ be a positive integer. Let $Z$ be the invertible $ln \times ln$ matrix over $F$ with $e_i Z = x^r f(x)^q e_1$ evaluated in $M(C(f(x)^l))$ for $1 \le i \le ln$, where $i - 1 = qn + r$, $0 \le r < n$. Then

$$ZC(f(x)^l)Z^{-1} = J(f(x), l) \quad \text{and} \quad M(C(f(x)^l)) \cong M(J(f(x), l)).$$

Proof

Notice first that $q$ and $r$ are respectively the quotient and remainder on dividing $i - 1$ by $n$. As $x^{j-1} e_1 = e_j$ in $M(C(f(x)^l))$ for $1 \le j \le ln$, the $j$th entry in $e_i Z = x^r f(x)^q e_1$ is the coefficient of $x^{j-1}$ in $x^r f(x)^q$, which is monic of degree $qn + r = i - 1$. So the $(i,i)$-entry in $Z$ is 1 and the $(i,j)$-entry in $Z$ is 0 for $i < j$, that is, $Z$ is a lower triangular matrix (all its entries above the diagonal are zero). Hence $|Z| = 1$ and so $Z$ is invertible over $F$. It is enough to verify $ZC(f(x)^l) = J(f(x), l)Z$, which we now carry out row by row.

Suppose first $i \not\equiv 0 \pmod n$, that is, $r \ne n - 1$ where $i - 1 = qn + r$ and $1 \le i < ln$. Then $e_i J(f(x), l) = e_{i+1}$ and so

$$e_i J(f(x), l) Z = e_{i+1} Z = x^{r+1} f(x)^q e_1 = x(x^r f(x)^q e_1) = x(e_i Z) = e_i Z C(f(x)^l)$$

as multiplication by $x$ on the left means multiplication by $C(f(x)^l)$ on the right by Lemma 5.7 and Definition 5.8. The above equation says that $J(f(x), l)Z$ and $ZC(f(x)^l)$ have the same row $i$.

Suppose now $i \equiv 0 \pmod n$ and $i < ln$. So $i - 1 = qn + (n - 1)$, that is, $r = n - 1$ and $i = (q + 1)n$ where $0 \le q$ and $q + 1 < l$. The last row of $C = C(f(x))$ is $(x^n - f(x))e_1$ and so row $i$ of $J(f(x), l)$ is $e_i J(f(x), l) = (x^n - f(x))e_{i-n+1} + e_{i+1}$. As $i - n = qn$ we see $e_{i-n+1} Z = f(x)^q e_1$ and $e_{i+1} Z = f(x)^{q+1} e_1$. Therefore

$$\begin{aligned} e_i J(f(x), l) Z &= (x^n - f(x)) f(x)^q e_1 + f(x)^{q+1} e_1 \\ &= x^n f(x)^q e_1 = x(x^{n-1} f(x)^q e_1) = x(e_i Z) = e_i Z C(f(x)^l) \end{aligned}$$

showing that $J(f(x), l)Z$ and $ZC(f(x)^l)$ have the same row $i$.

Lastly suppose $i = ln$. Then $e_i J(f(x), l) = (x^n - f(x))e_{i-n+1}$. As $i - n = (l - 1)n$ we have $e_{i-n+1} Z = f(x)^{l-1} e_1$. Therefore

$$e_i J(f(x), l) Z = (x^n - f(x)) f(x)^{l-1} e_1 = x^n f(x)^{l-1} e_1$$

since $f(x)^l e_1 = 0$ in the $F[x]$-module $M(C(f(x)^l))$. As $i - 1 = ln - 1 = (l - 1)n + n - 1$ the last row of $Z$ is $e_i Z = x^{n-1} f(x)^{l-1} e_1$. Hence $e_i J(f(x), l) Z = x(x^{n-1} f(x)^{l-1} e_1) = x(e_i Z) = e_i Z C(f(x)^l)$. The conclusion is: $J(f(x), l)Z = ZC(f(x)^l)$, as row by row these matrices are equal. The matrices $C(f(x)^l)$ and $J(f(x), l)$ are similar and so the $F[x]$-modules $M(C(f(x)^l))$ and $M(J(f(x), l))$ are isomorphic by Theorem 5.13. □
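The key observation in the proof, that $\deg(x^r f(x)^q) = i - 1 < ln$, means row $i$ of $Z$ is simply the coefficient vector of $x^r f(x)^q$; no reduction modulo $f(x)^l$ is ever needed. The Python sketch below (illustrative names; polynomials are coefficient lists with the constant term first, and $f$ is given with its leading 1) builds $Z$ this way and tests $ZC(f(x)^l) = J(f(x), l)Z$ for a few monic polynomials, including the $l = n = 2$ case worked out before the theorem with $a_0 = 3$, $a_1 = 5$.

```python
from fractions import Fraction

def polymul(p, q):
    """Product of polynomials given as coefficient lists, constant term first."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def companion(lower):
    n = len(lower)
    C = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n - 1):
        C[i][i + 1] = Fraction(1)
    C[n - 1] = [-Fraction(a) for a in lower]
    return C

def jordan_block(lower, l):
    n = len(lower)
    C = companion(lower)
    J = [[Fraction(0)] * (l * n) for _ in range(l * n)]
    for b in range(l):
        for i in range(n):
            J[b * n + i][b * n:(b + 1) * n] = C[i]           # diagonal block C(f(x))
        if b + 1 < l:
            J[b * n + n - 1][(b + 1) * n] = Fraction(1)      # the (n, 1)-entry of L
    return J

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def theorem_6_16_Z(f, l):
    """f = [a0, ..., a_{n-1}, 1].  Row i of Z is x^r f(x)^q e1 with i - 1 = qn + r; since
    deg(x^r f(x)^q) = i - 1 < ln, that row is just the coefficient list of x^r f(x)^q."""
    n, Z, fq = len(f) - 1, [], [Fraction(1)]    # fq holds f(x)^q
    for q in range(l):
        for r in range(n):
            row = [Fraction(0)] * r + fq           # multiplying by x^r shifts coefficients
            Z.append(row + [Fraction(0)] * (l * n - len(row)))
        fq = polymul(fq, f)
    return Z, fq                                   # on exit fq = f(x)^l

def check(f, l):
    """True when Z C(f(x)^l) == J(f(x), l) Z for the Z of Theorem 6.16."""
    Z, fl = theorem_6_16_Z(f, l)
    return matmul(Z, companion(fl[:-1])) == matmul(jordan_block(f[:-1], l), Z)

print(check([-2, 0, 1], 2))   # f(x) = x^2 - 2, l = 2; expected: True
```

A passing check on a handful of polynomials is consistent with the theorem but is, of course, no substitute for the row-by-row argument above.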

Notice that $l = 1$ in Theorem 6.16 gives $q = 0$, $Z = I$ and $J(f(x), 1) = C(f(x))$. Also $n = 1$ in Theorem 6.16 gives $r = 0$, $f(x) = x - \lambda$,

$$Z = \begin{pmatrix} e_1 \\ (x - \lambda)e_1 \\ (x - \lambda)^2 e_1 \\ \vdots \\ (x - \lambda)^{l-1} e_1 \end{pmatrix} \quad \text{and} \quad J(x - \lambda, l) = \lambda I + C(x^l) = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \ddots & \vdots \\ 0 & 0 & \lambda & \ddots & 0 \\ \vdots & \vdots & & \ddots & 1 \\ 0 & 0 & \cdots & 0 & \lambda \end{pmatrix},$$

the ‘usual’ $l \times l$ Jordan block matrix which the reader may already have met.

Let $A$ be a $t \times t$ matrix over a field $F$ having $m$ elementary divisors. There is an invertible matrix $Y$ over $F$ with $YAY^{-1} = C_1 \oplus C_2 \oplus \cdots \oplus C_m$ where each $C_i$ is the companion matrix of a power of an irreducible polynomial over $F$. By Theorem 6.16 there is an invertible matrix $Z_i$ over $F$ such that $Z_i C_i Z_i^{-1} = J_i$ is a Jordan block matrix for $1 \le i \le m$. Then $Z = Z_1 \oplus Z_2 \oplus \cdots \oplus Z_m$ is invertible over $F$ and

$$\begin{aligned} (ZY)A(ZY)^{-1} &= Z(YAY^{-1})Z^{-1} = Z(C_1 \oplus C_2 \oplus \cdots \oplus C_m)Z^{-1} \\ &= J_1 \oplus J_2 \oplus \cdots \oplus J_m \end{aligned}$$

is in Jnf (Definition 6.15).

Suppose $\chi_A(x)$ factorises as a product of polynomials $x - \lambda_j$ over $F$, that is, all the eigenvalues of $A$ belong to $F$. Then $A$ is similar to the ‘almost diagonal’ matrix $J_1 \oplus J_2 \oplus \cdots \oplus J_m$ in Jnf where $J_i = J(x - \lambda_j, l_i)$ for $1 \le i \le m$.

As an illustration consider $A = C((x+1)^3(x^2-2)^2)$ over $\mathbb{Q}$, which is in rcf. Then $M(A) = \langle v_{12}\rangle \oplus \langle v_{11}\rangle$ is the primary decomposition of the cyclic $\mathbb{Q}[x]$-module $M(A)$, where $v_{12} = (x^2-2)^2e_1 = (4,0,-4,0,1,0,0)$ and $v_{11} = (x+1)^3e_1 = (1,3,3,1,0,0,0)$. The matrix $Y$ of Theorem 5.31 satisfies $YAY^{-1} = C((x+1)^3) \oplus C((x^2-2)^2)$ in pcf. We ‘short-circuit’ the above theory by directly constructing

$$ZY = \begin{pmatrix} v_{12}\\ (x+1)v_{12}\\ (x+1)^2v_{12}\\ v_{11}\\ xv_{11}\\ (x^2-2)v_{11}\\ x(x^2-2)v_{11} \end{pmatrix} = \begin{pmatrix} 4 & 0 & -4 & 0 & 1 & 0 & 0\\ 4 & 4 & -4 & -4 & 1 & 1 & 0\\ 4 & 8 & 0 & -8 & -3 & 2 & 1\\ 1 & 3 & 3 & 1 & 0 & 0 & 0\\ 0 & 1 & 3 & 3 & 1 & 0 & 0\\ -2 & -6 & -5 & 1 & 3 & 1 & 0\\ 0 & -2 & -6 & -5 & 1 & 3 & 1 \end{pmatrix}.$$

Then $ZY$ is invertible over $\mathbb{Q}$ and satisfies

$$(ZY)A(ZY)^{-1} = J(x+1,3) \oplus J(x^2-2,2) = \begin{pmatrix} -1 & 1 & 0\\ 0 & -1 & 1\\ 0 & 0 & -1 \end{pmatrix} \oplus \begin{pmatrix} 0 & 1 & 0 & 0\\ 2 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 2 & 0 \end{pmatrix},$$

which is in Jnf.
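The claims about $ZY$ can be checked by machine. The Python sketch below uses our own helper names, integers for the matrices and `fractions.Fraction` for the determinant. It computes each row $g(x)e_1$ of $ZY$ as the coefficient vector of $g(x)$ reduced modulo $(x+1)^3(x^2-2)^2$. It then tests $(ZY)A = J(ZY)$, which amounts to $(ZY)A(ZY)^{-1} = J(x+1,3) \oplus J(x^2-2,2)$ once $ZY$ is known to be invertible. Computed this way, the last two rows are $(x^2-2)v_{11} = (-2,-6,-5,1,3,1,0)$ and $x(x^2-2)v_{11} = (0,-2,-6,-5,1,3,1)$.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of coefficient lists (lowest degree first)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def vec(g, f):
    """Row vector of g(x)e_1 in M(C(f)): remainder of g mod the monic f, padded to length deg f."""
    g, n = list(g), len(f) - 1
    while len(g) > n:
        lead = g.pop()                     # coefficient of x^len(g) after the pop
        shift = len(g) - n
        for k in range(n):
            g[shift + k] -= lead * f[k]    # subtract lead * x^shift * f(x)
    return g + [0] * (n - len(g))

def companion(f):
    """C(f): 1s on the superdiagonal, last row (-a_0, ..., -a_{n-1})."""
    n = len(f) - 1
    C = [[int(j == i + 1) for j in range(n)] for i in range(n)]
    C[n - 1] = [-a for a in f[:n]]
    return C

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def det(M):
    """Determinant over Q by Gaussian elimination with exact fractions."""
    M = [[Fraction(a) for a in row] for row in M]
    n, d = len(M), Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            d = -d
        d *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return d

x, xp1, q = [0, 1], [1, 1], [-2, 0, 1]        # x, x + 1, x^2 - 2
cube = poly_mul(poly_mul(xp1, xp1), xp1)        # (x + 1)^3
f = poly_mul(cube, poly_mul(q, q))              # (x + 1)^3 (x^2 - 2)^2
A = companion(f)

v12, v11 = poly_mul(q, q), cube                 # v12 = (x^2-2)^2 e_1, v11 = (x+1)^3 e_1
rows = [v12, poly_mul(xp1, v12), poly_mul(poly_mul(xp1, xp1), v12),
        v11, poly_mul(x, v11), poly_mul(q, v11), poly_mul(x, poly_mul(q, v11))]
ZY = [vec(g, f) for g in rows]

J = [[-1, 1, 0, 0, 0, 0, 0],                    # J(x + 1, 3) ...
     [0, -1, 1, 0, 0, 0, 0],
     [0, 0, -1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0],                     # ... direct sum J(x^2 - 2, 2)
     [0, 0, 0, 2, 0, 1, 0],
     [0, 0, 0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0, 2, 0]]

print(matmul(ZY, A) == matmul(J, ZY))   # True
print(det(ZY) != 0)                      # True
```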

We now prepare to meet separable polynomials. This important concept leads to a useful modification of the Jordan normal form.

Let $f(x)$ be a polynomial of positive degree $n$ over a field $F$ with leading coefficient $a_n$. If we are ‘lucky’ then $f(x)$ splits over $F$, that is, there are elements $c_1, c_2, \ldots, c_n \in F$ with $f(x) = a_n(x - c_1)(x - c_2)\cdots(x - c_n)$; as $F$ has no divisors of zero we see that $c_1, c_2, \ldots, c_n$ are the zeros of $f(x)$. If we are ‘unlucky’ then $f(x)$ has an irreducible factor $p_1(x)$ over $F$ with $2 \le \deg p_1(x) \le n$. By Theorem 4.9 there is an extension field $E_1$ of $F$ which contains a zero $c_1$ of $p_1(x)$. Either $f(x)$ splits over $E_1$ or the process is repeated. In the latter case $f(x)$ has an irreducible factor $p_2(x)$ over $E_1$ with $2 \le \deg p_2(x) \le n - 1$; also there is an extension field $E_2$ of $E_1$ containing a zero $c_2$ of $p_2(x)$. After at most $n - 1$ steps this process terminates in an extension field $E$ of $F$ such that $f(x)$ splits over $E$. If the construction of $E$ has been carried out as economically as possible (there is no subfield $E'$ of $E$ which is an extension of $F$ such that $E' \ne E$ and $f(x)$ splits over $E'$), then $E$ is called a splitting field of $f(x)$ over $F$.

The theory of splitting fields is fundamental in Galois Theory but is not carried further here. Notice however that $E = \mathbb{F}_{32}$ is a splitting field of $f(x) = x^{32} - x$ over $F = \mathbb{Z}_2$ and provides an unbiased view of $\mathbb{F}_{32}$, that is, none of the six irreducible quintics over $\mathbb{Z}_2$ is given preferential treatment (Exercises 4.1, Question 3(c)). No two of the 32 zeros of $x^{32} - x$ in $\mathbb{F}_{32}$ are equal. We now investigate this important property in general.


Definition 6.17

An irreducible polynomial $p_0(x)$ of degree $n$ over a field $F$ is called separable over $F$ if there is a field $E$ with $F \subseteq E$ containing $n$ distinct zeros $c_1, c_2, \ldots, c_n$ of $p_0(x)$. A polynomial $f(x)$ over $F$ is called separable over $F$ if all its irreducible factors over $F$ are separable over $F$.

The reader might expect all polynomials over certain fields to be separable. This is true for $F$ a finite field (Exercises 4.1, Question 3(c)) and in the case $\chi(F) = 0$, that is, when $F$ has characteristic zero, as we show after Lemma 6.19. In particular all polynomials over the real field $\mathbb{R}$ are separable. However, as we now demonstrate, $x^2 - y$ is an example of an inseparable (not separable) polynomial over $F$, where

$$F = \mathbb{Z}_2(y) = \{f(y)/g(y) : f(y), g(y) \in \mathbb{Z}_2[y],\ g(y) \ne 0(y)\}$$

is the field of fractions of $\mathbb{Z}_2[y]$. Note first that $x^2 - y$ has no zeros in $F$: the equation $y = (f(y)/g(y))^2$ leads to $yg(y)^2 = f(y)^2$, which is impossible as $\deg yg(y)^2$ is odd and $\deg f(y)^2$ is even. So $x^2 - y$ is irreducible over $F$ by Lemma 4.8(ii). Also any extension field $E$ of $F$ containing a zero $c_1$ of $x^2 - y$ does not contain another zero $c_2$, since $\chi(E) = 2$ gives

$$c_1^2 = c_2^2 = y \;\Rightarrow\; (c_1 - c_2)^2 = 0 \;\Rightarrow\; c_1 = c_2 \quad\text{and}\quad x^2 - y = (x - c_1)^2.$$

So $x^2 - y$ is irreducible and inseparable over $F$. The reducibility of a polynomial depends on its ground field and the same goes for separability: in this case $x^2 - y$ is reducible and separable over $E$.

We next introduce the derivative of a polynomial without appeal to any limiting process. The derivative in this context is closely related to separability.

Definition 6.18

Let $f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$ be a polynomial over a field $F$. The polynomial $f'(x) = na_nx^{n-1} + (n-1)a_{n-1}x^{n-2} + \cdots + a_1$ over $F$ is called the (formal) derivative of $f(x)$.

The coefficient $na_n = a_n + a_n + \cdots + a_n$ ($n$ terms) of $x^{n-1}$ in $f'(x)$ is an element of $F$, as are all the coefficients in $f'(x)$. For example $f(x) = x^3 + x^2 + 1$ over $\mathbb{Z}_2$ has $f'(x) = x^2$ since $3x^2 = x^2$ and $2x = 0$; in this case $\gcd\{f(x), f'(x)\} = 1$.
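Definition 6.18 and the gcd test in Lemma 6.19 below are purely mechanical over $\mathbb{Z}_p$. Here is a minimal Python sketch. It assumes polynomials are stored as coefficient lists with the lowest degree first, a representation of our choosing. The functions `trim`, `derivative`, `poly_rem` and `poly_gcd` are hypothetical helpers, not from the text.

```python
def trim(f):
    """Remove zero leading coefficients; lists are lowest degree first, [] is 0(x)."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f

def derivative(f, p):
    """Formal derivative over Z_p: k*a_k (reduced mod p) becomes the coefficient of x^(k-1)."""
    return trim([(k * a) % p for k, a in enumerate(f)][1:])

def poly_rem(f, g, p):
    """Remainder of f on division by a non-zero g over Z_p, p prime."""
    f, g = trim([a % p for a in f]), trim([a % p for a in g])
    inv = pow(g[-1], p - 2, p)                 # inverse of the leading coefficient (Fermat)
    while len(f) >= len(g):
        coef, shift = f[-1] * inv % p, len(f) - len(g)
        for k, b in enumerate(g):
            f[shift + k] = (f[shift + k] - coef * b) % p
        f = trim(f)                            # the leading term has now cancelled
    return f

def poly_gcd(f, g, p):
    """Monic gcd over Z_p by the Euclidean algorithm (f must be non-zero)."""
    f, g = trim([a % p for a in f]), trim([a % p for a in g])
    while g:
        f, g = g, poly_rem(f, g, p)
    inv = pow(f[-1], p - 2, p)
    return [a * inv % p for a in f]

f = [1, 0, 1, 1]                               # x^3 + x^2 + 1 over Z_2
print(derivative(f, 2))                        # [0, 0, 1], i.e. f'(x) = x^2
print(poly_gcd(f, derivative(f, 2), 2))        # [1]
```

Three further checks illustrate the text. For $x^{32} - x$ over $\mathbb{Z}_2$ the derivative is $32x^{31} - 1 = 1$, so `poly_gcd` returns `[1]`, consistent with the 32 distinct zeros in $\mathbb{F}_{32}$. For $(x-1)^2(x+1) = x^3 + 2x^2 + 2x + 1$ over $\mathbb{Z}_3$ it returns `[2, 1]`, that is $x - 1$, exposing the repeated zero. Finally `derivative([1, 0, 1, 0, 1], 2)` is `[]`: the polynomial $x^4 + x^2 + 1 \in \mathbb{Z}_2[x^2]$ has zero derivative.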


Lemma 6.19

Let $f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$ be a polynomial of positive degree $n$ over a field $F$. Suppose $F$ has an extension field $E$ containing $c_1, c_2, \ldots, c_n$ such that $f(x) = a_n(x - c_1)(x - c_2)\cdots(x - c_n)$. Then

$$\gcd\{f(x), f'(x)\} = 1 \iff c_1, c_2, \ldots, c_n \text{ are distinct.}$$

Proof

Suppose $c_1, c_2, \ldots, c_n$ are not distinct. We arrange the notation so that $c_1 = c_2$. Therefore $f(x) = (x - c_1)^2g(x)$ where $g(x) = a_n(x - c_3)(x - c_4)\cdots(x - c_n)$. The usual rules of differentiation are valid in this context (see Exercises 6.2, Question 8(a)) and so $f'(x) = 2(x - c_1)g(x) + (x - c_1)^2g'(x)$, showing that $x - c_1$ is a divisor of both $f(x)$ and $f'(x)$. So $\gcd\{f(x), f'(x)\} = 1 \Rightarrow c_1, c_2, \ldots, c_n$ distinct.

Conversely suppose $c_1, c_2, \ldots, c_n$ to be distinct. Then $f(x)$ has $n$ distinct monic irreducible factors over $E$, namely $x - c_i$ for $1 \le i \le n$. Is it possible for $x - c_i$ to be a divisor of $f'(x)$ for some $i$? If this is the case then we choose the notation so that $i = 1$. Write $f(x) = (x - c_1)h(x)$ where $h(x) = a_n(x - c_2)(x - c_3)\cdots(x - c_n)$. Then $f'(x) = 1 \times h(x) + (x - c_1)h'(x)$ using the familiar rule for differentiating a product. As $(x - c_1) \mid f'(x)$ we deduce $(x - c_1) \mid h(x)$, showing $h(x) = (x - c_1)q(x)$ for some $q(x) \in E[x]$. Evaluating $h(x) = (x - c_1)q(x)$ at $c_1$ gives $h(c_1) = (c_1 - c_1)q(c_1) = 0$. But $h(c_1) = a_n(c_1 - c_2)(c_1 - c_3)\cdots(c_1 - c_n) \ne 0$ as $a_n \ne 0$ and $c_1 - c_i \ne 0$ for $1 < i \le n$. We conclude that none of the $n$ factors $x - c_i$ of $f(x)$ is a divisor of $f'(x)$. So $c_1, c_2, \ldots, c_n$ distinct implies $\gcd\{f(x), f'(x)\} = 1$. □
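The second half of the proof turns on the value $f'(c_1) = h(c_1) = a_n(c_1 - c_2)\cdots(c_1 - c_n)$, obtained by evaluating $f'(x) = h(x) + (x - c_1)h'(x)$ at $c_1$. The Python sketch below uses exact rational arithmetic, and its function names are our own. It checks the identity $f'(c_i) = a_n\prod_{j \ne i}(c_i - c_j)$ for a few choices of zeros in $\mathbb{Q}$. The identity shows that $f'(c_i) \ne 0$ exactly when $c_i$ is not repeated.

```python
from fractions import Fraction
from math import prod

def from_roots(an, roots):
    """Coefficients (lowest degree first) of an*(x - c_1)...(x - c_n)."""
    f = [Fraction(an)]
    for c in roots:
        # multiply by (x - c): the new coefficient of x^k is f[k-1] - c*f[k]
        f = [(f[k - 1] if k > 0 else 0) - c * (f[k] if k < len(f) else 0)
             for k in range(len(f) + 1)]
    return f

def derivative(f):
    """Formal derivative over Q (Definition 6.18)."""
    return [k * f[k] for k in range(1, len(f))]

def evaluate(f, c):
    """Horner's rule."""
    value = Fraction(0)
    for a in reversed(f):
        value = value * c + a
    return value

def check(an, roots):
    """Confirm f'(c_i) = an * prod_{j != i}(c_i - c_j) for each i; return the values f'(c_i)."""
    df = derivative(from_roots(an, roots))
    values = []
    for i, ci in enumerate(roots):
        expected = an * prod(ci - cj for j, cj in enumerate(roots) if j != i)
        assert evaluate(df, ci) == expected
        values.append(evaluate(df, ci))
    return values

print(check(3, [1, -2, Fraction(1, 2)]))   # three non-zero values: distinct zeros
print(check(1, [1, 1, -2]))                # f'(1) = 0 at the repeated zero 1
```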

There is a surprising consequence of Lemma 6.19: let $p_0(x)$ be a monic irreducible polynomial over a field $F$ which is inseparable over $F$. Then $\gcd\{p_0(x), p_0'(x)\}$ is a non-constant monic divisor over $F$ of $p_0(x)$. Therefore $\gcd\{p_0(x), p_0'(x)\} = p_0(x)$ by Definition 4.7, and so $p_0(x) \mid p_0'(x)$. As $\deg p_0(x) > \deg p_0'(x)$ there is only one way out of this apparent impasse, namely $p_0'(x) = 0(x)$, that is, the derivative of $p_0(x)$ is the zero polynomial. As $\deg p_0(x) \ge 1$ we cannot conclude that $p_0(x)$ is a constant polynomial! However it is correct to infer $\chi(F) \ne 0$, as all non-constant polynomials over fields of characteristic zero have non-zero derivatives. So $\chi(F) = p$ where $p$ is prime, and the only powers of $x$ appearing in $p_0(x)$ with non-zero coefficients are powers of $x^p$, that is, $p_0(x) \in F[x^p]$. For example, as we saw following Definition 6.17, $p_0(x) = x^2 - y$ over $F = \mathbb{Z}_2(y)$ is inseparable; in this case $p_0(x)$ belongs to $F[x^2]$ and has zero derivative.


We now discuss a modification of the Jordan normal form involving separability.

Definition 6.20

Let $B$ be an $n \times n$ matrix over a field $F$ and let $l$ be a positive integer. The $ln \times ln$ matrix

$$JS(B,l) = \begin{pmatrix} B & I & 0 & \cdots & 0 & 0\\ 0 & B & I & \ddots & 0 & 0\\ \vdots & & \ddots & \ddots & \ddots & \vdots\\ \vdots & & & \ddots & I & 0\\ 0 & 0 & \cdots & \cdots & B & I\\ 0 & 0 & \cdots & \cdots & 0 & B \end{pmatrix},$$

where $I$ and $0$ denote respectively the $n \times n$ identity and zero matrices over $F$, is called a separable Jordan block over $F$.

A direct sum of matrices of the type $JS(C(p(x)), l)$, where each $p(x)$ is monic, separable over $F$ and irreducible over $F$, is said to be in separable Jordan form (sJf) over $F$.

The next theorem studies the connection between $JS(B,l)$ and $J(\chi_B(x),l)$ in the case $\chi_B(x)$ irreducible. Computations involving $JS(B,l)$, rather than $J(\chi_B(x),l)$, are easier to carry out.

Theorem 6.21

Let $B$ be an $n \times n$ matrix over a field $F$ with $\chi_B(x)$ irreducible over $F$ and let $l$ be an integer with $l \ge 2$. Then $J(\chi_B(x),l) \sim JS(B,l)$ if and only if $\chi_B(x)$ is separable over $F$. In the separable case the $F[x]$-module $M(JS(B,l))$ is cyclic with generator $e_1$.

Proof

For convenience we write $\chi_B(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$ where $a_n = 1$. Our first task is to determine the structure of the $ln \times ln$ matrix $\chi_B(JS(B,l))$. Now $JS(B,l) = \bar{B} + N$ where

$$\bar{B} = \begin{pmatrix} B & 0 & 0 & \cdots & 0 & 0\\ 0 & B & 0 & \cdots & 0 & 0\\ \vdots & & \ddots & & & \vdots\\ \vdots & & & \ddots & & \vdots\\ 0 & 0 & 0 & \cdots & B & 0\\ 0 & 0 & 0 & \cdots & 0 & B \end{pmatrix}, \qquad N = \begin{pmatrix} 0 & I & 0 & \cdots & 0 & 0\\ 0 & 0 & I & \ddots & 0 & 0\\ \vdots & & \ddots & \ddots & \ddots & \vdots\\ \vdots & & & \ddots & I & 0\\ 0 & 0 & 0 & \cdots & 0 & I\\ 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix}$$

and

$$\bar{B}N = \begin{pmatrix} 0 & B & 0 & \cdots & 0 & 0\\ 0 & 0 & B & \ddots & 0 & 0\\ \vdots & & \ddots & \ddots & \ddots & \vdots\\ \vdots & & & \ddots & B & 0\\ 0 & 0 & 0 & \cdots & 0 & B\\ 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix} = N\bar{B},$$

that is, $\bar{B}$ and $N$ commute, a big advantage of $JS(B,l)$ over $J(\chi_B(x),l)$. In particular powers of $JS(B,l)$ can be calculated using the binomial expansion

$$JS(B,l)^j = (\bar{B} + N)^j = \bar{B}^j + j\bar{B}^{j-1}N + \binom{j}{2}\bar{B}^{j-2}N^2 + \cdots + \binom{j}{j}N^j.$$

Notice that all matrices in this expansion are partitioned: they are $l \times l$ matrices having entries which are $n \times n$ matrices over $F$, and so belong to the subring $M_l(M_n(F))$ of $M_{ln}(F)$. Further the $n \times n$ submatrices in $\bar{B}^{j-i}N^i$ on the $i$th diagonal above (and parallel to) the main diagonal are all equal to $B^{j-i}$ for $1 \le i < l$, all other $n \times n$ submatrices in the partition being zero. For instance with $l \ge 3$,

$$JS(B,l)^2 = \begin{pmatrix} B^2 & 2B & I & 0 & \cdots & 0\\ 0 & B^2 & 2B & I & \ddots & \vdots\\ \vdots & & \ddots & \ddots & \ddots & 0\\ \vdots & & & \ddots & 2B & I\\ 0 & 0 & \cdots & \cdots & B^2 & 2B\\ 0 & 0 & \cdots & \cdots & 0 & B^2 \end{pmatrix},$$

which displays the above binomial expansion in diagonal ‘stripes’. Now

$$N^{l-1} = \begin{pmatrix} 0 & 0 & \cdots & 0 & I\\ 0 & 0 & \cdots & 0 & 0\\ \vdots & \vdots & & \vdots & \vdots\\ 0 & 0 & \cdots & 0 & 0 \end{pmatrix}$$

and $N^l = 0$, showing that $N$ is a nilpotent matrix (some power of $N$ is the zero matrix). Therefore the above binomial expansion can be expressed

$$JS(B,l)^j = \sum_{i=0}^{\min(j,\,l-1)} \binom{j}{i}\bar{B}^{j-i}N^i.$$

Multiplying this equation by the coefficient $a_j$ of $x^j$ in $\chi_B(x)$ and summing over $j$ gives

$$\chi_B(JS(B,l)) = \sum_{j=0}^{n} a_jJS(B,l)^j = \begin{pmatrix} \chi_B(B) & \chi'_B(B) & ? & \cdots & ?\\ 0 & \chi_B(B) & \chi'_B(B) & \ddots & \vdots\\ \vdots & & \ddots & \ddots & ?\\ \vdots & & & \chi_B(B) & \chi'_B(B)\\ 0 & \cdots & \cdots & 0 & \chi_B(B) \end{pmatrix}$$

where the question marks ? denote $n \times n$ matrices closely related to the higher derivatives $\chi_B^{(i)}(x)$ for $i \ge 2$ of $\chi_B(x)$, but which are not relevant here (Exercises 6.2, Question 7(c)). By Corollary 6.11 we know $\chi_B(B) = 0$, and so $\chi_B(JS(B,l))$ is a partitioned upper triangular matrix with zero $n \times n$ matrices on the main diagonal and the same $n \times n$ matrix $\chi'_B(B)$ along the next diagonal. We have found the structure of $\chi_B(JS(B,l))$ we were looking for and leave the reader to verify

$$(\chi_B(JS(B,l)))^{l-1} = \begin{pmatrix} 0 & 0 & \cdots & 0 & (\chi'_B(B))^{l-1}\\ 0 & 0 & \cdots & 0 & 0\\ \vdots & \vdots & & \vdots & \vdots\\ 0 & 0 & \cdots & 0 & 0 \end{pmatrix} \qquad (\clubsuit\clubsuit)$$

showing that this partitioned matrix has at most one non-zero $n \times n$ submatrix entry, namely $(\chi'_B(B))^{l-1}$ in the top right-hand corner. So far we have not used the irreducibility of $\chi_B(x)$ nor the separability (or otherwise) of $\chi_B(x)$. But things are about to change!

Suppose $\chi_B(x)$ is irreducible and separable over $F$. By Corollary 4.6 and Lemma 6.19 there are $a(x), b(x) \in F[x]$ with $a(x)\chi_B(x) + b(x)\chi'_B(x) = 1$. Evaluating this polynomial equation at $B$ gives $b(B)\chi'_B(B) = I$, as $\chi_B(B) = 0$ by Corollary 6.11. Therefore $\chi'_B(B)$ is invertible over $F$. Using Exercises 5.1, Question 2(b) we see that the characteristic polynomial of $JS(B,l)$ is $|xI - JS(B,l)| = |xI - B|^l = \chi_B(x)^l$. What is the minimum polynomial $\mu(x)$ of $JS(B,l)$? By Corollary 6.11 we know $\mu(x) = \chi_B(x)^{l'}$ where $1 \le l' \le l$, as $\chi_B(x)$ is irreducible over $F$. From (♣♣) we see $\chi_B(JS(B,l))^{l-1} \ne 0$. So $\mu(x)$ is not a divisor of $\chi_B(x)^{l-1}$. Therefore $l' = l$ and $\mu(x) = \chi_B(x)^l$. So $JS(B,l)$ has only one invariant factor, namely $\chi_B(x)^l$ by Corollary 6.10, which means $JS(B,l) \sim C(\chi_B(x)^l)$ by Corollary 6.7. But $C(\chi_B(x)^l) \sim J(\chi_B(x),l)$ by Theorem 6.16, and so $J(\chi_B(x),l) \sim JS(B,l)$. Premultiplying (♣♣) by $e_1$ gives $\chi_B(x)^{l-1}e_1 = e_1(\chi_B(JS(B,l)))^{l-1} \ne 0$ in the $F[x]$-module $M(JS(B,l))$. So $e_1$ has order $\chi_B(x)^l$ in $M(JS(B,l))$, that is, $M(JS(B,l)) = \langle e_1\rangle$.

Suppose now that $\chi_B(x)$ is irreducible over $F$ but not separable over $F$. In this case $\chi'_B(x) = 0$, and so $(\chi_B(JS(B,l)))^{l-1} = 0$ from (♣♣). Therefore $\mu(x) \mid \chi_B(x)^{l-1}$, giving $\mu(x) \ne \chi_B(x)^l$, that is, the minimum and characteristic polynomials of $JS(B,l)$ are different. Combining Theorems 5.13, 5.26 and 6.16 we see that the $F[x]$-module $M(J(\chi_B(x),l))$ is cyclic but the $F[x]$-module $M(JS(B,l))$ is not cyclic. So $M(J(\chi_B(x),l))$ and $M(JS(B,l))$ are not isomorphic. From Theorem 5.13 we conclude that $J(\chi_B(x),l)$ and $JS(B,l)$ are not similar. □
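The structural facts used in this proof can be confirmed on a small example. The Python sketch below assumes, purely for illustration, $F = \mathbb{Q}$, $B = C(x^2 + x + 1) = \begin{pmatrix} 0 & 1\\ -1 & -1 \end{pmatrix}$ and $l = 3$; the helper names are ours. It splits $JS(B,l)$ as $\bar{B} + N$ (block-diagonal part plus block shift) and runs five checks:

- $\bar{B}$ and $N$ commute;
- $N^l = 0$;
- the binomial formula for $JS(B,l)^j$ holds;
- $\chi_B(JS(B,l))$ has the stripe pattern described above;
- only one block of $(\chi_B(JS(B,l)))^{l-1}$ survives, as in (♣♣).

```python
from math import comb

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matadd(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def scale(t, X):
    return [[t * a for a in row] for row in X]

def identity(m):
    return [[int(i == j) for j in range(m)] for i in range(m)]

def zero(m):
    return [[0] * m for _ in range(m)]

def matpow(X, j):
    R = identity(len(X))
    for _ in range(j):
        R = matmul(R, X)
    return R

def assemble(blocks):
    """Glue an l x l array of n x n blocks into a single ln x ln matrix."""
    n = len(blocks[0][0])
    return [[x for blk in block_row for x in blk[r]] for block_row in blocks for r in range(n)]

def block(M, n, r, s):
    """The (r, s) block, of size n x n, of the partitioned matrix M."""
    return [row[s * n:(s + 1) * n] for row in M[r * n:(r + 1) * n]]

def poly_at(coeffs, X):
    """coeffs[0] I + coeffs[1] X + coeffs[2] X^2 + ... (lowest degree first)."""
    result, power = zero(len(X)), identity(len(X))
    for a in coeffs:
        result = matadd(result, scale(a, power))
        power = matmul(power, X)
    return result

B = [[0, 1], [-1, -1]]           # C(x^2 + x + 1): chi_B(x) = x^2 + x + 1, chi_B'(x) = 2x + 1
n, l = 2, 3
Bbar = assemble([[B if s == r else zero(n) for s in range(l)] for r in range(l)])
N = assemble([[identity(n) if s == r + 1 else zero(n) for s in range(l)] for r in range(l)])
JS = matadd(Bbar, N)             # JS(B, l)

assert matmul(Bbar, N) == matmul(N, Bbar)       # Bbar and N commute
assert matpow(N, l) == zero(n * l)              # N is nilpotent
for j in range(7):                              # binomial expansion of (Bbar + N)^j
    total = zero(n * l)
    for i in range(min(j, l - 1) + 1):
        total = matadd(total, scale(comb(j, i), matmul(matpow(Bbar, j - i), matpow(N, i))))
    assert matpow(JS, j) == total

P = poly_at([1, 1, 1], JS)       # chi_B(JS(B, l))
D = poly_at([1, 2], B)           # chi_B'(B) = I + 2B
for r in range(l):
    assert block(P, n, r, r) == zero(n)         # chi_B(B) = 0 on the diagonal
    if r + 1 < l:
        assert block(P, n, r, r + 1) == D       # chi_B'(B) on the next diagonal

Q = matpow(P, l - 1)             # (chi_B(JS(B, l)))^(l-1), the matrix in (♣♣)
print(block(Q, n, 0, l - 1))     # [[-3, 0], [0, -3]]
```

The surviving block is $(2B + I)^2 = -3I$, which is invertible, as the separable case requires. More generally $(2B + a_1I)^2 = (a_1^2 - 4a_0)I$ for $B = C(x^2 + a_1x + a_0)$, because $B^2 + a_1B + a_0I = 0$.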

Corollary 6.22

Let $A$ be a $t \times t$ matrix over a field $F$. Then $A$ is similar to a matrix in separable Jordan form over $F$ if and only if $\chi_A(x)$ is separable over $F$.

Proof

Suppose $A \sim JS(C(p_1(x)), l_1) \oplus JS(C(p_2(x)), l_2) \oplus \cdots \oplus JS(C(p_m(x)), l_m)$ where $p_i(x)$ is monic, separable and irreducible over $F$ for $1 \le i \le m$. As $p_i(x)^{l_i}$ is the characteristic polynomial of $JS(C(p_i(x)), l_i)$ for $1 \le i \le m$, we deduce $\chi_A(x) = \prod_{i=1}^{m} p_i(x)^{l_i}$ from Lemma 5.5 and Exercises 5.1, Question 2(b). There may be repetitions among the polynomials $p_i(x)$ but all are separable over $F$. So $\chi_A(x)$ is separable over $F$ by Definition 6.17.

Suppose $\chi_A(x)$ is separable over $F$. By Theorem 6.16 and the discussion after it we know $A \sim J_1 \oplus J_2 \oplus \cdots \oplus J_m$, where $J_i = J(p_i(x), l_i)$ and $p_i(x)$ is irreducible over $F$ for $1 \le i \le m$. As $p_i(x)^{l_i}$ is the characteristic polynomial of $J(p_i(x), l_i)$ we obtain $\chi_A(x) = \prod_{i=1}^{m} p_i(x)^{l_i}$ as above. Therefore each $p_i(x)$ is separable over $F$ by Definition 6.17. As $J(p_i(x),1) = C(p_i(x)) = JS(C(p_i(x)),1)$ when $l_i = 1$, and by Theorem 6.21 when $l_i \ge 2$, we see $J_i = J(p_i(x), l_i) \sim JS(C(p_i(x)), l_i)$ for $1 \le i \le m$. Therefore

$$A \sim JS(C(p_1(x)), l_1) \oplus JS(C(p_2(x)), l_2) \oplus \cdots \oplus JS(C(p_m(x)), l_m),$$

showing that $A$ is similar to a matrix in sJf over $F$. □

Let $p_0(x) = x^2 + a_1x + a_0$ be separable and irreducible over a field $F$. Our next theorem transforms the companion matrix $C(p_0(x)^l)$ into the separable Jordan matrix $JS(C(p_0(x))^T, l)$, that is, a specific invertible matrix $Z_S$ over $F$ is constructed satisfying $Z_SC(p_0(x)^l)Z_S^{-1} = JS(C(p_0(x))^T, l)$. From Theorem 6.21 we know that $Z_S$ exists, but Theorem 6.21 does not help us find it. Keep the case $F = \mathbb{R}$ and the discussion after Lemma 4.8 in mind, as this is the motivation for our next theorem. Every square matrix $A$ over $\mathbb{R}$ in pcf (Definition 6.14) can be transformed into sJf (Definition 6.20), and a final ‘tweak’ (Corollary 6.25) gives a matrix in real Jordan form (Definition 6.24).

Theorem 6.23

Let $p_0(x) = x^2 + a_1x + a_0$ be separable and irreducible over a field $F$ and let $l$ be a positive integer. Let $E = F(c)$ be an extension field of $F$ where $p_0(x) = (x - c)(x - c')$, and so $c' \in E$. Regarding $p_0(x)^l$ as a polynomial over $E$, denote by $M$ the $E[x]$-module (Definition 5.8) determined by $C(p_0(x)^l)$. Write $w = (x - c')^le_1$. Then $w$ has order $(x - c)^l$ in $M$. Also $(x - c)^{j-1}w = v_{j0} + cv_{j1}$ where $v_{j0}, v_{j1} \in F^{2l}$ for $1 \le j \le l$. Let $Z_S$ be the $2l \times 2l$ matrix over $F$ with $e_iZ_S = v_{jr}$, where $i - 1 = 2(j - 1) + r$ and $0 \le r < 2$, for $1 \le i \le 2l$. Then $Z_S$ is invertible over $F$ and satisfies

$$Z_SC(p_0(x)^l)Z_S^{-1} = JS(C(p_0(x))^T, l).$$


Proof

The vector $e_1 \in E^{2l}$ generates $M$ by Theorem 5.26 and so has order $p_0(x)^l = (x - c)^l(x - c')^l$ in $M$. Therefore

$$w = (x - c')^le_1$$

has order $p_0(x)^l/\gcd\{(x - c')^l, p_0(x)^l\} = p_0(x)^l/(x - c')^l = (x - c)^l$ in $M$ by Lemma 5.23. Now $E = F \oplus cF$, that is, the additive group of $E$ is the direct sum of its subgroups $F$ and $cF = \{cv : v \in F\}$. In the same way $E^{2l} = F^{2l} \oplus cF^{2l}$, that is, the vector space $E^{2l}$, which is $4l$-dimensional over $F$, is the direct sum of its subspaces $F^{2l}$ and $cF^{2l} = \{cv : v \in F^{2l}\}$, both of which have dimension $2l$ over $F$. Note that $E^{2l}$ is the set of elements of the $E[x]$-module $M$. As $(x - c)^{j-1}w \in M$ there are unique vectors $v_{j0}, v_{j1} \in F^{2l}$ with $(x - c)^{j-1}w = v_{j0} + cv_{j1}$ for $1 \le j \le l$.

As $p_0(x)$ is separable over $F$ we know $c \ne c'$ by Definition 6.17. Let $w' = (x - c)^le_1$. Then $w'$ has order $(x - c')^l$ in $M$. Also $M = \langle w\rangle \oplus \langle w'\rangle$ by the primary decomposition theorem 6.12. By Exercises 4.1, Question 4(c) there is a self-inverse automorphism $\theta$ of the field $E$ with $(c)\theta = c'$, $(c')\theta = c$ and $(v)\theta = v$ for all $v \in F$; notice that $\theta$ is complex conjugation in the case $F = \mathbb{R}$, $E = \mathbb{C}$, as $c$ and $c'$ are complex conjugates. The automorphism $\theta$ of $E$ can be extended to a ring automorphism $\tilde\theta$ of $E[x]$ by defining

$$(a_0 + a_1x + \cdots + a_sx^s)\tilde\theta = (a_0)\theta + (a_1)\theta x + \cdots + (a_s)\theta x^s$$

for all $a_0 + a_1x + \cdots + a_sx^s \in E[x]$. So

$$((x - c)^j)\tilde\theta = ((x - c)\tilde\theta)^j = (x - (c)\theta)^j = (x - c')^j \quad\text{for } 1 \le j \le l.$$

We introduce the invertible $F$-linear mapping $\bar\theta : M \to M$ defined by

$$(x_1, x_2, \ldots, x_{2l})\bar\theta = ((x_1)\theta, (x_2)\theta, \ldots, (x_{2l})\theta) \quad\text{for all } x_i \in E,\ 1 \le i \le 2l.$$

Then $\bar\theta$ fixes all vectors in $F^{2l}$ since $\theta$ fixes all elements of $F$. The mapping $\bar\theta$ is not $E[x]$-linear, but it does have the semi-linearity property

$$(f(x)v)\bar\theta = ((f(x))\tilde\theta)((v)\bar\theta) \quad\text{for all } f(x) \in E[x],\ v \in M \qquad (\diamondsuit)$$

(Exercises 6.2, Question 9(a)). As $\theta$ is self-inverse so also are $\tilde\theta$ and $\bar\theta$. Now $\theta$ interchanges $c$ and $c'$. Using (♦) we see that $\bar\theta$ interchanges $w$ and $w'$ because

$$(w)\bar\theta = ((x - c')^le_1)\bar\theta = ((x - c')^l)\tilde\theta\,(e_1)\bar\theta = (x - c)^le_1 = w'$$

and $(w')\bar\theta = w$ as $\bar\theta^{-1} = \bar\theta$. Now

$$w' = (w)\bar\theta = (v_{10} + cv_{11})\bar\theta = (v_{10})\bar\theta + (c)\theta(v_{11})\bar\theta = v_{10} + c'v_{11}$$

using (♦) again, and more generally for $1 \le j \le l$

$$\begin{aligned}
(x - c')^{j-1}w' &= ((x - c)^{j-1})\tilde\theta\,(w)\bar\theta = ((x - c)^{j-1}w)\bar\theta\\
&= (v_{j0} + cv_{j1})\bar\theta = (v_{j0})\bar\theta + (c)\theta(v_{j1})\bar\theta = v_{j0} + c'v_{j1}.
\end{aligned}$$

We can now construct the $2l \times 2l$ matrices

$$Y = \begin{pmatrix} w\\ (x-c)w\\ \vdots\\ (x-c)^{l-1}w\\ w'\\ (x-c')w'\\ \vdots\\ (x-c')^{l-1}w' \end{pmatrix} = \begin{pmatrix} v_{10} + cv_{11}\\ v_{20} + cv_{21}\\ \vdots\\ v_{l0} + cv_{l1}\\ v_{10} + c'v_{11}\\ v_{20} + c'v_{21}\\ \vdots\\ v_{l0} + c'v_{l1} \end{pmatrix} \quad\text{and}\quad Z_S = \begin{pmatrix} v_{10}\\ v_{11}\\ v_{20}\\ v_{21}\\ \vdots\\ v_{l0}\\ v_{l1} \end{pmatrix}.$$

By Theorem 6.16 and the discussion following it, $Y$ is invertible over $E$ and satisfies $YC(p_0(x)^l)Y^{-1} = J(x - c, l) \oplus J(x - c', l)$. Now $\det Y$ and $\det Z_S$ are closely related: applying eros to $Y$ we obtain $\det Y = (-1)^{l(l+1)/2}(c - c')^l\det Z_S$ (Exercises 6.2, Question 9(b)). So $Z_S$ is invertible over $F$ and its rows form a basis $\mathcal{B}$ of $F^{2l}$ by Corollary 2.23.

Regarding $p_0(x)^l$ as a polynomial over $F$, let $M_0$ denote the $F[x]$-module determined by $C(p_0(x)^l)$ and let $\alpha : M_0 \to M_0$ denote the $F$-linear mapping determined by $C(p_0(x)^l)$ as in Definition 5.8. Then $(v)\alpha = xv = vC(p_0(x)^l)$ for all $v \in F^{2l}$. We now use the decomposition $E^{2l} = F^{2l} \oplus cF^{2l}$ above to find the matrix of $\alpha$ relative to $\mathcal{B}$.

For $1 \le j \le l$ the equation $x(x - c)^{j-1}w = c(x - c)^{j-1}w + (x - c)^jw$ in $M$ gives

$$\begin{aligned}
x(v_{j0} + cv_{j1}) &= c(v_{j0} + cv_{j1}) + v_{j+1,0} + cv_{j+1,1}\\
&= -a_0v_{j1} + v_{j+1,0} + c(v_{j0} - a_1v_{j1} + v_{j+1,1})
\end{aligned}$$

as $c^2 = -a_1c - a_0$, and where $v_{l+1,0} = v_{l+1,1} = 0$ since $(x - c)^lw = 0$. Comparing ‘real’ and ‘imaginary’ parts we obtain, for $1 \le j < l$,

$$\begin{aligned}
xv_{j0} &= 0 \times v_{j0} - a_0v_{j1} + 1 \times v_{j+1,0} + 0 \times v_{j+1,1},\\
xv_{j1} &= 1 \times v_{j0} - a_1v_{j1} + 0 \times v_{j+1,0} + 1 \times v_{j+1,1},
\end{aligned}$$

and

$$\begin{aligned}
xv_{l0} &= 0 \times v_{l0} - a_0v_{l1},\\
xv_{l1} &= 1 \times v_{l0} - a_1v_{l1}.
\end{aligned}$$

These equations in $M_0$ tell us by Definition 6.20 that $JS(C(p_0(x))^T, l)$ is the matrix of $\alpha$ relative to $\mathcal{B}$. As $C(p_0(x)^l)$ is the matrix of $\alpha$ relative to the standard basis $\mathcal{B}_0$ of $F^{2l}$, we conclude $Z_SC(p_0(x)^l)Z_S^{-1} = JS(C(p_0(x))^T, l)$ by Lemma 5.2. □


Although the details of Theorem 6.23 are lengthy, the construction of the invertible matrices $Y$ and $Z_S$ is relatively straightforward. Remember that $x^{i-1}e_1 = e_i$ in $M$ for $1 \le i \le 2l$, as in Theorem 6.23. So the $i$th entry in $w = (x - c')^le_1$ is the coefficient of $x^{i-1}$ in $(x - c')^l = (x + a_1 + c)^l$, namely $\binom{l}{i-1}(a_1 + c)^{l-i+1}$, and the $i$th entry in $w' = (x - c)^le_1$ is the coefficient of $x^{i-1}$ in $(x - c)^l$, namely $\binom{l}{i-1}(-c)^{l-i+1}$, for $1 \le i \le 2l$.

We work through the case $p_0(x) = x^2 + x + 1$, $F = \mathbb{Q}$, $l = 3$. The zeros of $p_0(x)$ are $c = -1/2 + (\sqrt{3}/2)i$ and $c' = -1/2 - (\sqrt{3}/2)i$, and so $E = \mathbb{Q}(c) = \mathbb{Q}(\sqrt{3}i)$. In this case $w = (x - c')^3e_1 = (x + 1 + c)^3e_1 = ((1 + c)^3, 3(1 + c)^2, 3(1 + c), 1, 0, 0)$ and $w' = (x - c)^3e_1 = (-c^3, 3c^2, -3c, 1, 0, 0)$ have orders $(x - c)^3$ and $(x + 1 + c)^3$ respectively in the $E[x]$-module $M$ determined by the companion matrix $C((x^2 + x + 1)^3)$. Then

$$X = \begin{pmatrix} w\\ xw\\ x^2w\\ w'\\ xw'\\ x^2w' \end{pmatrix} = \begin{pmatrix} (1+c)^3 & 3(1+c)^2 & 3(1+c) & 1 & 0 & 0\\ 0 & (1+c)^3 & 3(1+c)^2 & 3(1+c) & 1 & 0\\ 0 & 0 & (1+c)^3 & 3(1+c)^2 & 3(1+c) & 1\\ -c^3 & 3c^2 & -3c & 1 & 0 & 0\\ 0 & -c^3 & 3c^2 & -3c & 1 & 0\\ 0 & 0 & -c^3 & 3c^2 & -3c & 1 \end{pmatrix}$$

satisfies $XC((x^2 + x + 1)^3)X^{-1} = C((x - c)^3) \oplus C((x + c + 1)^3)$ by Theorem 5.31. Eliminating $c^2$ and higher powers of $c$ using $c^2 = -c - 1$ produces

$$X = \begin{pmatrix} -1 & 3c & 3+3c & 1 & 0 & 0\\ 0 & -1 & 3c & 3+3c & 1 & 0\\ 0 & 0 & -1 & 3c & 3+3c & 1\\ -1 & -3-3c & -3c & 1 & 0 & 0\\ 0 & -1 & -3-3c & -3c & 1 & 0\\ 0 & 0 & -1 & -3-3c & -3c & 1 \end{pmatrix}.$$

Applying eros of type (iii) over $\mathbb{Q}(c)$ to $X$ and eliminating $c^2$ produces

$$Y = \begin{pmatrix} w\\ (x-c)w\\ (x-c)^2w\\ w'\\ (x+c+1)w'\\ (x+c+1)^2w' \end{pmatrix} = \begin{pmatrix} -1 & 3c & 3+3c & 1 & 0 & 0\\ c & 2+3c & 3+3c & 3+2c & 1 & 0\\ 1+c & 3+2c & 5+3c & 5+2c & 3+c & 1\\ -1 & -3-3c & -3c & 1 & 0 & 0\\ -1-c & -1-3c & -3c & 1-2c & 1 & 0\\ -c & 1-2c & 2-3c & 3-2c & 2-c & 1 \end{pmatrix} = \begin{pmatrix} v_{10} + cv_{11}\\ v_{20} + cv_{21}\\ v_{30} + cv_{31}\\ v_{10} - (c+1)v_{11}\\ v_{20} - (c+1)v_{21}\\ v_{30} - (c+1)v_{31} \end{pmatrix}.$$

As in the proof of Theorem 6.23, the matrix $Y$ transforms $C((x^2 + x + 1)^3)$ into $J(x - c, 3) \oplus J(x - c', 3)$ by Theorem 6.16 and the discussion following it. Notice $\det Y = \det X = R((x - c)^3, (x - c')^3) \ne 0$ by Corollary 5.32. Finally

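$$Z_S = \begin{pmatrix} v_{10}\\ v_{11}\\ v_{20}\\ v_{21}\\ v_{30}\\ v_{31} \end{pmatrix} = \begin{pmatrix} -1 & 0 & 3 & 1 & 0 & 0\\ 0 & 3 & 3 & 0 & 0 & 0\\ 0 & 2 & 3 & 3 & 1 & 0\\ 1 & 3 & 3 & 2 & 0 & 0\\ 1 & 3 & 5 & 5 & 3 & 1\\ 1 & 2 & 3 & 2 & 1 & 0 \end{pmatrix}$$

is invertible over $\mathbb{Q}$ as $\det Z_S = -27$ by direct computation. Also $Z_S$ satisfies

$$Z_S\begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 1\\ -1 & -3 & -6 & -7 & -6 & -3 \end{pmatrix} = \begin{pmatrix} 0 & -1 & 1 & 0 & 0 & 0\\ 1 & -1 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & -1 & 1 & 0\\ 0 & 0 & 1 & -1 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 & -1\\ 0 & 0 & 0 & 0 & 1 & -1 \end{pmatrix}Z_S$$

as the reader may verify. So $Z_SC((x^2 + x + 1)^3)Z_S^{-1} = JS(C(x^2 + x + 1)^T, 3)$.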
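The recipe of Theorem 6.23 can be run directly on this example. The Python sketch below uses exact integer and rational arithmetic, and all names in it are our own. An element $u + vc$ of $E = \mathbb{Q}(c)$ is stored as the pair $(u, v)$, with $c^2 = -a_1c - a_0$. The sketch then proceeds in four steps:

1. Compute $w = (x - c')^le_1$.
2. Compute the rows $(x - c)^{j-1}w$ in the $E[x]$-module $M$.
3. Split each row into its parts $v_{j0}$ and $v_{j1}$ to form $Z_S$.
4. Check $Z_SC(p_0(x)^l) = JS(C(p_0(x))^T, l)Z_S$ and $\det Z_S$ for $p_0(x) = x^2 + x + 1$, $l = 3$.

```python
from fractions import Fraction

A1, A0, L = 1, 1, 3              # p0(x) = x^2 + a1 x + a0 = x^2 + x + 1 over Q, and l = 3
n = 2 * L

# E = Q(c) with c^2 = -a1 c - a0; the element u + v c is stored as the pair (u, v).
def e_add(s, t):
    return (s[0] + t[0], s[1] + t[1])

def e_mul(s, t):
    (u1, v1), (u2, v2) = s, t
    # (u1 + v1 c)(u2 + v2 c) = u1 u2 + (u1 v2 + v1 u2) c + v1 v2 c^2, with c^2 = -a1 c - a0
    return (u1 * u2 - A0 * v1 * v2, u1 * v2 + v1 * u2 - A1 * v1 * v2)

def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

P = [1]
for _ in range(L):
    P = poly_mul(P, [A0, A1, 1])   # p0(x)^l, lowest degree first

def times_x(vec):
    """x.v = v C(p0^l) for v in E^(2l): shift right, subtract last entry times p0^l's coefficients."""
    last = vec[-1]
    out = [e_mul((-P[0], 0), last)]
    for k in range(1, n):
        out.append(e_add(vec[k - 1], e_mul((-P[k], 0), last)))
    return out

def times_x_minus(vec, root):
    """(x - root).v for root in E."""
    neg = (-root[0], -root[1])
    return [e_add(a, e_mul(neg, b)) for a, b in zip(times_x(vec), vec)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def det(M):
    """Determinant over Q by Gaussian elimination."""
    M = [[Fraction(a) for a in row] for row in M]
    d = Fraction(1)
    for col in range(len(M)):
        piv = next((r for r in range(col, len(M)) if M[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            M[col], M[piv], d = M[piv], M[col], -d
        d *= M[col][col]
        for r in range(col + 1, len(M)):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return d

c, c_dash = (0, 1), (-A1, -1)      # c' = -a1 - c
w = [(1, 0)] + [(0, 0)] * (n - 1)  # e_1
for _ in range(L):
    w = times_x_minus(w, c_dash)   # w = (x - c')^l e_1

ZS, vec = [], w
for _ in range(L):                  # (x - c)^(j-1) w = v_j0 + c v_j1
    ZS.append([pair[0] for pair in vec])   # v_j0, the 'real' part
    ZS.append([pair[1] for pair in vec])   # v_j1, the 'imaginary' part
    vec = times_x_minus(vec, c)

C = [[int(j == i + 1) for j in range(n)] for i in range(n)]
C[n - 1] = [-a for a in P[:n]]     # C(p0^l)
Bt = [[0, -A0], [1, -A1]]          # C(p0)^T
JS = [[0] * n for _ in range(n)]
for r in range(L):
    for i in range(2):
        for j in range(2):
            JS[2 * r + i][2 * r + j] = Bt[i][j]
        if r + 1 < L:
            JS[2 * r + i][2 * r + 2 + i] = 1

print(ZS)
print(matmul(ZS, C) == matmul(JS, ZS), det(ZS))   # True -27
```

Changing `A1`, `A0` and `L` re-runs the construction for another irreducible quadratic over $\mathbb{Q}$. Only the field arithmetic in `e_mul` and the coefficients `P` depend on $p_0(x)$.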

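As an example of our next corollary, notice that $C(x^2 + x + 1)^T = \begin{pmatrix} 0 & -1\\ 1 & -1 \end{pmatrix}$ is similar over $\mathbb{R}$ to the rotation matrix

$$R_{2\pi/3} = \begin{pmatrix} -1/2 & \sqrt{3}/2\\ -\sqrt{3}/2 & -1/2 \end{pmatrix}$$

as $Z_0C(x^2 + x + 1)^TZ_0^{-1} = R_{2\pi/3}$ where

$$Z_0 = \begin{pmatrix} 1 & -(1 - \sqrt{3})/2\\ 1 & -(1 + \sqrt{3})/2 \end{pmatrix}.$$

The $6 \times 6$ matrix $Z_1 = Z_0 \oplus Z_0 \oplus Z_0$ is invertible over $\mathbb{R}$ and transforms $JS(C(x^2 + x + 1)^T, 3)$ into

$$JS(R_{2\pi/3}, 3) = \begin{pmatrix} -1/2 & \sqrt{3}/2 & 1 & 0 & 0 & 0\\ -\sqrt{3}/2 & -1/2 & 0 & 1 & 0 & 0\\ 0 & 0 & -1/2 & \sqrt{3}/2 & 1 & 0\\ 0 & 0 & -\sqrt{3}/2 & -1/2 & 0 & 1\\ 0 & 0 & 0 & 0 & -1/2 & \sqrt{3}/2\\ 0 & 0 & 0 & 0 & -\sqrt{3}/2 & -1/2 \end{pmatrix},$$

which is in real Jordan form according to the next definition.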

Definition 6.24

A direct sum of matrices, each being either of the type $J(x - \lambda, k)$ or $JS(B, l)$ (see Definitions 6.15 and 6.20) where $B = \begin{pmatrix} a & b\\ -b & a \end{pmatrix}$ for $\lambda, a, b \in \mathbb{R}$, $b > 0$, is said to be in real Jordan form.

So a matrix in real Jordan form is the direct sum of certain separable Jordan blocks over $\mathbb{R}$. For example

$$J(x - 3, 1) \oplus J(x - 4, 2) \oplus JS(2R_{2\pi/3}, 2) = \begin{pmatrix} 3 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 4 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 4 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & -1 & \sqrt{3} & 1 & 0\\ 0 & 0 & 0 & -\sqrt{3} & -1 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 & -1 & \sqrt{3}\\ 0 & 0 & 0 & 0 & 0 & -\sqrt{3} & -1 \end{pmatrix}$$

is in real Jordan form. Notice that $(x - 3)(x - 4)^2(x^2 + 2x + 4)^2$ is the factorisation into irreducible polynomials over $\mathbb{R}$ of the characteristic polynomial of the above $7 \times 7$ matrix.

Note also that $\begin{pmatrix} a & b\\ -b & a \end{pmatrix}$ is the matrix of $\alpha : \mathbb{C} \to \mathbb{C}$ relative to the basis $1, i$ of the real vector space $\mathbb{C}$, where $\alpha$ is the $\mathbb{R}$-linear mapping ‘multiplication by $z_0 = a + ib$’, that is, $(z)\alpha = z_0z$ for all $z \in \mathbb{C}$, and so $(1)\alpha = a + ib$, $(i)\alpha = -b + ia$. In geometric terms $\alpha$ is the composition of the commuting mappings ‘radial expansion by $|z_0| = \sqrt{a^2 + b^2}$’ and ‘(anticlockwise) rotation through $\arg z_0$’, both fixing the origin.

Corollary 6.25

Let $A$ be a $t \times t$ matrix over the real field $\mathbb{R}$. Then $A$ is similar to a matrix in real Jordan form.

Proof

We construct a matrix which transforms the pcf (Definition 6.14) of $A$ into a matrix in real Jordan form. By the discussion after Lemma 4.8 the monic factors of $\chi_A(x)$ which are irreducible over $\mathbb{R}$ are of at most two types: $x - \lambda$ where $\lambda \in \mathbb{R}$, and $p_0(x) = x^2 + a_1x + a_0$ where $a_1^2 < 4a_0$. Let $m$ be the number of elementary divisors of $A$. Then the pcf of $A$ is the direct sum of $m$ matrices, each being either $C((x - \lambda)^l)$ or $C(p_0(x)^l)$ as above. There is an invertible $l \times l$ matrix $Z$ over $\mathbb{R}$ with $ZC((x - \lambda)^l)Z^{-1} = J(x - \lambda, l)$ by Theorem 6.16.

By Theorem 6.23 there is an invertible $2l \times 2l$ matrix $Z_S$ over $\mathbb{R}$ with $Z_SC(p_0(x)^l)Z_S^{-1} = JS(C(p_0(x))^T, l)$. On completing the square we obtain $p_0(x) = x^2 + a_1x + a_0 = (x - a)^2 + b^2$ where $a = -a_1/2$ and $b = \sqrt{a_0 - a_1^2/4}$. The reader can check that $Z_0 = \begin{pmatrix} 1 & a + b\\ 1 & a - b \end{pmatrix}$ is invertible over $\mathbb{R}$ and satisfies $Z_0C(p_0(x))^TZ_0^{-1} = B$, where $B = \begin{pmatrix} a & b\\ -b & a \end{pmatrix}$ as in Definition 6.24. Let $Z_1 = Z_0 \oplus Z_0 \oplus \cdots \oplus Z_0$ with $l$ terms. Then the $2l \times 2l$ matrix $Z = Z_1Z_S$ is invertible over $\mathbb{R}$ and satisfies $ZC(p_0(x)^l)Z^{-1} = JS(B, l)$.

Finally the direct sum of the $m$ matrices $Z$, as described above, one for each direct summand $C((x - \lambda)^l)$ or $C(p_0(x)^l)$ in the pcf of $A$, is an invertible $t \times t$ matrix over $\mathbb{R}$ which transforms the pcf of $A$ into a matrix in real Jordan form. As $A$ is similar to its pcf (Definition 6.14) we conclude that $A$ is similar to a matrix in real Jordan form. □
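The ‘tweak’ $Z_0C(p_0(x))^TZ_0^{-1} = B$ that the proof leaves to the reader can be spot-checked in floating point. The Python sketch below forms $a$, $b$, $Z_0$ and $B$ exactly as in the proof and compares $Z_0C(p_0(x))^T$ with $BZ_0$. The function names are ours, and the tolerance `1e-9` is an arbitrary choice.

```python
import math

def matmul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def real_block(a1, a0):
    """For p0(x) = x^2 + a1 x + a0 irreducible over R, return (Z0, B, C(p0)^T) as in the proof."""
    if a1 * a1 >= 4 * a0:
        raise ValueError("p0 has real zeros; need a1^2 < 4 a0")
    a = -a1 / 2
    b = math.sqrt(a0 - a1 * a1 / 4)
    Ct = [[0.0, -a0], [1.0, -a1]]          # C(p0)^T
    Z0 = [[1.0, a + b], [1.0, a - b]]      # det Z0 = -2b, non-zero as b > 0
    B = [[a, b], [-b, a]]
    return Z0, B, Ct

def close(X, Y, tol=1e-9):
    return all(math.isclose(X[i][j], Y[i][j], abs_tol=tol) for i in range(2) for j in range(2))

Z0, B, Ct = real_block(1, 1)                     # x^2 + x + 1
print(close(matmul2(Z0, Ct), matmul2(B, Z0)))    # True, and B is R_{2pi/3}
```

`real_block(1, 1)` reproduces $R_{2\pi/3}$ from the example before Definition 6.24. `real_block(2, 4)` gives the block $2R_{2\pi/3}$ of the $7 \times 7$ example.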

EXERCISES 6.2

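1. (a) For each of the matrices $A$ below over the rational field $\mathbb{Q}$ find invertible matrices $P(x)$ and $Q(x)$ over $\mathbb{Q}[x]$ such that $P(x)(xI - A)Q(x)^{-1}$ is in Smith normal form. Hence find a $3 \times 3$ invertible matrix $X$ over $\mathbb{Q}$ with $XAX^{-1}$ in rational canonical form. Determine $\mathbb{Q}$-bases of the primary components $M(A)_x$ and $M(A)_{x+2}$ of the $\mathbb{Q}[x]$-module $M(A)$. Specify a $3 \times 3$ invertible matrix $Y$ over $\mathbb{Q}$ with $YAY^{-1}$ in primary canonical form:

(i) $\begin{pmatrix} 1 & -1 & 2\\ 1 & -1 & 2\\ -1 & 1 & -2 \end{pmatrix}$; (ii) $\begin{pmatrix} -1 & 1 & 1\\ 1 & -2 & -1\\ -1 & 1 & 1 \end{pmatrix}$;

(iii) $\begin{pmatrix} -3 & 1 & 2\\ -1 & -1 & 2\\ -1 & 1 & 0 \end{pmatrix}$; (iv) $\begin{pmatrix} -1 & 1 & 1\\ 1 & 0 & 1\\ -1 & -1 & -3 \end{pmatrix}$.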

Are any two of these matrices similar?

(b) Let $A$ be a $t \times t$ matrix over a field $F$. Let $p(x)$ be an irreducible polynomial over $F$. Show $p(x) \mid \chi_A(x) \iff \det p(A) = 0$.
Hint: Argue by contradiction using Corollaries 4.6 and 6.11.

(c) Let

$$A = \begin{pmatrix} 2 & 1 & -1\\ 1 & 0 & -1\\ 2 & 2 & -1 \end{pmatrix}$$

over $\mathbb{Q}$. Calculate $A^2 + I$. Hence find the factorisation of $\chi_A(x)$ using $\operatorname{trace} A$ and (b) above. Determine an invertible $3 \times 3$ matrix $Y$ over $\mathbb{Q}$ with $YAY^{-1}$ in primary canonical form.

2. (a) Let $A$ be a $t \times t$ matrix over a field $F$ having $s$ invariant factors. Suppose $\chi_A(x) = p_1(x)^{n_1}p_2(x)^{n_2}\cdots p_k(x)^{n_k}$ where $p_1(x), p_2(x), \ldots, p_k(x)$ are distinct and irreducible over $F$. Using the notation introduced after Corollary 6.13 write $\mu_A(x) = p_1(x)^{t_{s1}}p_2(x)^{t_{s2}}\cdots p_k(x)^{t_{sk}}$. Show $M(A)_{p_j(x)} = \{v \in M(A) : p_j(x)^{t_{sj}}v = 0\}$ for $1 \le j \le k$.

(b) Let $A$ be a $t \times t$ matrix over a field $F$. Suppose $A$ has $t$ distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_t$ in $F$. Show that $A$ is similar to $\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_t)$. Is $M(A)$ cyclic?
Hint: Use Theorem 6.12 and bases (row eigenvectors of $A$) of the primary components $M(A)_{x-\lambda_j}$ for $1 \le j \le t$.
More generally show that $A$ is similar to a diagonal matrix over $F$ if and only if $\mu_A(x)$ is a product of distinct factors of degree 1 over $F$.
True or false? The $t \times t$ matrix $A$ over the field $F$ is similar to a diagonal matrix over $F$ if and only if
(i) all its elementary divisors are of degree 1 over $F$,
(ii) each invariant factor of $A$ splits into distinct factors of degree 1 over $F$.

(c) Let $s$ and $t$ be positive integers. Let $L(s,t)$ denote the set of sequences $(l_1, l_2, \ldots, l_s)$ of $s$ non-negative integers $l_j$ with $l_1 + l_2 + \cdots + l_s = t$. Let $M(s,t)$ denote the set of sequences $(m_1, m_2, \ldots, m_{t+s-1})$ where either $m_j = 0$ or $m_j = 1$ for $1 \le j < s + t$ and $m_1 + m_2 + \cdots + m_{s+t-1} = t$. Write $(l_1, l_2, \ldots, l_s)\alpha = (m_1, m_2, \ldots, m_{s+t-1})$ where $m_j = 0$ for $j = l_1 + l_2 + \cdots + l_i + i$ $(1 \le i < s)$ and $m_j = 1$ otherwise $(1 \le j < s + t)$. Show that $\alpha : L(s,t) \to M(s,t)$ is a bijection by constructing $\alpha^{-1}$. Deduce $|L(s,t)| = \binom{s+t-1}{t}$.
Hint: To get the idea, list the images under $\alpha$ of the 10 elements of $L(3,3)$.

(d) Let $c_1, c_2, \ldots, c_s$ be distinct elements of a field $F$ and write $f(x) = (x - c_1)(x - c_2)\cdots(x - c_s)$. Use Theorem 6.12 to show that $\binom{s+t-1}{t}$ is the number of similarity classes of $t \times t$ matrices $A$ over $F$ satisfying $f(A) = 0$. Are all such matrices diagonalisable (similar to a diagonal matrix)?
Hint: Let $l_j = \dim U_j$ where $U_j = \{v \in F^t : vA = c_jv\}$ and use (c) above.

(e) Let $F$ be a field with $\chi(F) \ne 2$, that is, $1 \ne -1$ in $F$. Determine the number of similarity classes of $t \times t$ matrices $A$ over $F$ with (i) $A^2 = I$; (ii) $A^3 = A$.
Hint: Use (d) above.

(f) Use Corollary 3.17 to show $x^2 + x + 1$ is irreducible over the field $\mathbb{Z}_p$ ($p$ prime) if and only if $p \equiv -1 \pmod 3$. Determine the number of similarity classes of $t \times t$ matrices $A$ over $\mathbb{Z}_p$ with $p \ne 3$ satisfying $A^3 = I$.
Hint: Treat the cases $p \equiv 1 \pmod 3$ and $p \equiv -1 \pmod 3$ separately.

(g) Let $A$ be a $t \times t$ matrix over the finite field $\mathbb{F}_q$. Show that $A$ is similar to a diagonal matrix over $\mathbb{F}_q$ if and only if $A^q = A$. Find the number of similarity classes of $t \times t$ matrices $A$ over $\mathbb{F}_q$ satisfying $A^q = A$. List diagonal representatives of these similarity classes in the cases (i) $t = 4$, $q = 2$; (ii) $t = 2$, $q = 4$.
Hint: The polynomial $x^q - x$ splits into $q$ distinct factors over $\mathbb{F}_q$.

3. (a) Let $A$ be a $t \times t$ matrix over a field $F$. Suppose $\chi_A(x) = p(x)^nq(x)$ where $p(x)$ is a monic irreducible polynomial over $F$, $n$ is a positive integer and $\gcd\{p(x), q(x)\} = 1$. Show that the primary component $M(A)_{p(x)} = \{v \in M(A) : p(x)^nv = 0\}$ is a submodule of the $F[x]$-module $M(A)$. Using Corollary 6.11 show $M(A)_{p(x)}$ to be non-trivial.

(b) Let $p(x)$ be a monic irreducible polynomial over a field $F$ and let $n$ be a positive integer. Use Theorem 5.28 to show that the $F[x]$-module $M = M(C(p(x)^n))$ is indecomposable, that is, $M$ is non-zero and does not have non-zero submodules $N_1$ and $N_2$ with $M = N_1 \oplus N_2$.
Conversely let $N$ be an indecomposable submodule of the $F[x]$-module $M(A)$, where $A$ is a $t \times t$ matrix over a field $F$. Show $N \cong M(C(p(x)^n))$ where $p(x)^n \mid \chi_A(x)$, $p(x)$ is irreducible over $F$ and $n$ is a positive integer.
Hint: Use Exercises 5.1, Question 5, Theorems 6.5, 6.12 and (a) above.

(c) Let $A$ be a $t \times t$ matrix over a field $F$. Suppose

$$M(A) = N_1 \oplus N_2 \oplus \cdots \oplus N_r$$

where $N_j$ is a non-zero submodule of $M(A)$ for $1 \le j \le r$. Show $r \le t$.
Suppose $r$ is as large as possible. Show that $N_j$ is indecomposable for $1 \le j \le r$. Deduce from (b) above that $r$ is the number of elementary divisors of $A$.

(d) Construct a proof of Theorem 6.12 using Theorem 3.10 as a guide.


4. (a) Let $p$ be prime and let $A$ be a $t \times t$ matrix over $\mathbb{Z}_p$. The image of the evaluation homomorphism $\varepsilon_A : \mathbb{Z}_p[x] \to M_t(\mathbb{Z}_p)$ is the commutative subring $\operatorname{im}\varepsilon_A = \{f(A) \in M_t(\mathbb{Z}_p) : f(x) \in \mathbb{Z}_p[x]\}$ of $M_t(\mathbb{Z}_p)$. Show that $\varphi : \operatorname{im}\varepsilon_A \to \operatorname{im}\varepsilon_A$, given by $(X)\varphi = X^p$ for all $X \in \operatorname{im}\varepsilon_A$, is a ring homomorphism. Show also that $\varphi$ is $\mathbb{Z}_p$-linear.
Hint: Use the method of Exercises 2.3, Question 5(a).
Show $\mu_{A^p}(x) \mid \mu_A(x)$. Suppose $\mu_A(x)$ is irreducible over $\mathbb{Z}_p$. Deduce $A^p \sim A$ and find the connection between $\varphi$, $\varepsilon_A$ and the Frobenius automorphism $\theta$ of the field $F[x]/\langle\mu_A(x)\rangle$.
Let $A = C(x^2 + x + 1) \oplus C(x^2 + x + 1)$ and $B = C((x^2 + x + 1)^2)$ over $\mathbb{Z}_2$. Decide whether or not $A^2 \sim A$ and $B^2 \sim B$.

(b) Let $p$ be prime and let $A$ be a $t \times t$ matrix over $\mathbb{Z}_p$. Suppose $\mu_A(x) = p_0(x)^m$ where $p_0(x)$ is monic and irreducible of degree $n$ over $\mathbb{Z}_p$ and $m \ge 2$. Use $\varphi$ above to find a non-zero polynomial $g(x)$ over $\mathbb{Z}_p$ with $g(A^p) = 0$ and $\deg g(x) < mn$. Deduce $A^p \not\sim A$.
Hint: Consider first the case $1 < m \le p$.

(c) Let $A$ be a $t \times t$ matrix over $\mathbb{Z}_p$. Use Corollary 6.13 to show $\chi_A(x) = \chi_{A^p}(x)$. Show also $A \sim A^p$ if and only if the minimum polynomial $\mu_A(x)$ is a product of distinct irreducible factors over $\mathbb{Z}_p$.
Hint: Use (a) and (b) above.

5. (a) Determine the number of similarity classes of 12 × 12 matrices A

over the rational field Q having characteristic polynomial χA(x) =(x−1)3x4(x+1)5. How many different minimum polynomials μA(x)

are there among these classes? Specify the partitions of 3, 4 and 5such that (i) A is cyclic, (ii) A is diagonalisable (similar to a diagonalmatrix over Q).Construct the table of ∼=M(A) specified by the partitions (1,2), (1,3),(1,1,1,2) of 3, 4, 5. List the invariant factors of A. Find μA(x) andexpress the rcf and pcf of A as direct sums of companion matrices.

(b) Determine the number of similarity classes of 18 × 18 matrices A over the rational field Q having χA(x) = (x − 1)⁵(x² + 1)⁴(x + 1)⁵. Construct the table of ≅ M(A0) specified by the partitions (1, 2, 2), (1, 1, 2), (1, 1, 3) of the exponents of x − 1, x² + 1, x + 1 respectively. List the invariant factors of A0. Using Exercises 6.1, Question 2(a), list the invariant factors of −A0 and construct the table of ≅ M(−A0). Is −A0 ∼ A0? How many of these classes are such that −A ∼ A? How many of these classes are such that A ∼ A⁻¹?

(c) Determine the number of similarity classes of 22 × 22 matrices A over the rational field Q having

$$\chi_A(x) = (x - 1)^5 (x + 1)^5 (x - 2)^6 (x - 1/2)^6.$$

How many of these classes are such that −A ∼ A? How many of these classes are such that A ∼ A⁻¹?

6. (a) Let A = (aij) be a t × t matrix over a field F such that aij = 0 for i + 1 < j where t ≥ 2. Show that the F[x]-module M(A) is generated by e1 if and only if ai,i+1 ≠ 0 for 1 ≤ i < t.

(b) Working over the rational field Q, use Theorem 6.16 to find invertible matrices Z1 and Z2 satisfying

$$Z_1 C((x + 1)^3) Z_1^{-1} = J(x + 1, 3) \quad\text{and}\quad Z_2 C((x^2 + 1)^2) Z_2^{-1} = J(x^2 + 1, 2).$$

Hence find Z with Z(C((x + 1)³) ⊕ C((x² + 1)²))Z⁻¹ in Jnf (Definition 6.15).

(c) Which (if any) pairs of the following 6 × 6 matrices over a field F are similar? C((x + 1)⁶), J((x + 1)², 3), J((x + 1)³, 2), J(x + 1, 6), J((x + 1)⁶, 1). Which of these matrices is in Jnf?

(d) Write A = C(x³(x + 1)²) over an arbitrary field F. Specify invertible matrices Y and Z over F such that YAY⁻¹ is in pcf and ZAZ⁻¹ is in Jnf. Find the Jnf of A² and the Jnf of A + A².

(e) Let λ be an element of a field F. Show that there are p(t) similarity classes of t × t matrices A over F with χA(x) = (x − λ)ᵗ, where p(t) is the number of partitions (Definition 3.13) of the positive integer t. Taking t = 5, list representatives of these classes in (i) pcf, (ii) Jnf. Express the rank of the t × t matrix J − λI in terms of s and t where

$$J = \sum_{j=1}^{s} \oplus\, J(x - \lambda, t_j).$$

Write B = J(x − 1, t) over a field F where t ≥ 2. Show B ∼ B² if and only if χ(F) ≠ 2. Find the Jnf of B² in the case χ(F) = 2.
Hint: B = I + C(xᵗ).

7. (a) The polynomials

$$f(x) = \sum_{i=0}^{m} a_i x^i \quad\text{and}\quad g(x) = \sum_{j=0}^{n} b_j x^j$$

are over an arbitrary field F. Establish the rules of formal differentiation (Definition 6.18):
(i) (f(x) + g(x))′ = f′(x) + g′(x), (cf(x))′ = cf′(x) for c ∈ F,
(ii) (f(x)g(x))′ = f′(x)g(x) + f(x)g′(x).
Using induction on the positive integer i together with (i) and (ii) above, show (g(x)ⁱ)′ = ig(x)ⁱ⁻¹g′(x). Deduce the 'function of a function' rule: (f(g(x)))′ = f′(g(x))g′(x).


(b) Let p be a prime and write F = Zp(y) for the field of fractions of Zp[y] (see the discussion after Definition 6.17). Let E be an extension field of F containing a zero c of the polynomial xᵖ − y over F. Show xᵖ − y = (x − c)ᵖ over E and deduce that the monic irreducible factors of xᵖ − y over F are equal. Show c ∉ F and hence show that xᵖ − y is irreducible over F. Is xᵖ − y inseparable over F? Are your conclusions unchanged on replacing Zp by an arbitrary field F0 of characteristic p?

(c) Let i be a non-negative integer and let

$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$

be a polynomial over the field F. The polynomial

$$H_i(f(x)) = \sum_{j=i}^{n} a_j \binom{j}{i} x^{j-i}$$

over F is called the ith Hasse derivative of f(x). Show f(x) → Hi(f(x)) is a linear mapping of the vector space F[x] over F. Which f(x) ∈ Z2[x] satisfy f″(x) = 0 and which satisfy H2(f(x)) = 0?
Hint: Consider f(x) = xʲ.
Describe H2(H2(f(x))) for f(x) ∈ Z2[x]. Show i!Hi(f(x)) = f⁽ⁱ⁾(x) and H′i(f(x)) = (i + 1)Hi+1(f(x)) for all f(x) ∈ F[x]. Suppose the characteristic χ(F) of F is not a divisor of i!. Show Hi(f(x)) = f⁽ⁱ⁾(x)/i!. Suppose χ(F) is not a divisor of i + 1. Show Hi+1(f(x)) = H′i(f(x))/(i + 1).

(d) Let B be an n × n matrix over a field F. Calculate the 4 × 4 matrices JS(B, 4)², JS(B, 4)³, JS(B, 4)⁴ having entries in Mn(F) where JS(B, 4) is the separable Jordan block matrix (Definition 6.20). Let l be a positive integer and let B and N be the ln × ln matrices defined in the proof of Theorem 6.21. Show

$$f(J_S(B, l)) = \sum_{i=0}^{l-1} H_i(f(B))\, N^i$$

for all f(x) ∈ F[x] where f(x) and

$$H_i(f(B)) = \sum_{j=i}^{n} a_j \binom{j}{i} B^{j-i}$$

are as in (c) above. Hence find a formula for the n × n matrix entries ? in the partitioned ln × ln matrix χB(J(B, l)) constructed in the proof of Theorem 6.21.


8. (a) Find the minimum polynomial μA(x) where A = JS(C(x²), l) is the separable Jordan block matrix (Definition 6.20) over a field F and l ≥ 2. (The answer depends on whether or not χ(F) is a divisor of l.)
Hint: Write A = C + N as in Theorem 6.21 where C is the direct sum of l matrices C(x²).
What is the value of nullity A? Find the rcf of A.

(b) Let A and A′ be respectively an n × n matrix and an n′ × n′ matrix over a field F and let l be a positive integer. Find an invertible l(n + n′) × l(n + n′) matrix X over F with

$$X(J_S(A, l) \oplus J_S(A', l)) = J_S(A \oplus A', l)\,X.$$

Hint: To get started take n = n′ = 1 and l = 2. Then try n, n′ arbitrary, l = 3, before tackling the general case.
Let A, B, X be n × n matrices over F with AX = XB, X invertible over F. Use X ⊕ X ⊕ · · · ⊕ X (the direct sum of l matrices X) to show JS(A, l) ∼ JS(B, l).
Let f(x) and g(x) be monic polynomials of positive degree over F with gcd{f(x), g(x)} = 1. Using Theorem 5.31 deduce JS(C(f(x)g(x)), l) ∼ JS(C(f(x)), l) ⊕ JS(C(g(x)), l).

9. (a) Let M be an R-module where R is a non-trivial commutative ring. An additive mapping α : M → M is called semi-linear if there is an automorphism θ of R such that (av)α = (a)θ(v)α for all a ∈ R, v ∈ M. Let α and β be semi-linear mappings of M. Show that αβ is semi-linear. Show that α bijective implies α⁻¹ semi-linear.
Let θ ∈ Aut R and let t be a positive integer. Show that θ : Rᵗ → Rᵗ, defined by (a1, a2, . . . , at)θ = ((a1)θ, (a2)θ, . . . , (at)θ) for all (a1, a2, . . . , at) ∈ Rᵗ, is semi-linear.

(b) Let E be a field containing c, c′ where c ≠ c′ and let l be a positive integer. Let vj0 and vj1 belong to E²ˡ for 1 ≤ j ≤ l. Let Y denote the 2l × 2l matrix over E with eiY = vi0 + cvi1 for 1 ≤ i ≤ l and eiY = v(i−l)0 + c′v(i−l)1 for l < i ≤ 2l. Let ZS be the 2l × 2l matrix over E with e2jZS = vj0, e2j−1ZS = vj1 for 1 ≤ j ≤ l (see the proof of Theorem 6.23). Show

$$\det Y = (-1)^{l(l+1)/2}\,(c' - c)^{l} \det Z_S.$$

(c) Write p0(x) = x² − 2x + 2 over the real field R. Is p0(x) irreducible and separable over R? Find the integer entries in the 6 × 6 matrix ZS of Theorem 6.23 satisfying ZS C(p0(x)³) = JS(C(p0(x))ᵀ, 3) ZS. Find the value of det ZS. Is ZS invertible over Q? Find an invertible 6 × 6 matrix Z over R with ZC(p0(x)³)Z⁻¹ in real Jordan form.


6.3 Endomorphisms and Automorphisms of M(A)

Let M(A) denote, as usual, the F[x]-module determined (Definition 5.8) by the t × t matrix A over the field F. We study here the endomorphism ring End M(A), that is, the ring of F[x]-linear mappings β : M(A) → M(A), the ring multiplication being composition of mappings. Each β in End M(A) is an additive mapping (Definition 2.3) of the abelian group (Fᵗ, +) and so End M(A) is a subring of the ring End(Fᵗ, +) discussed in Lemma 3.14. Now End(Fᵗ, +) is a t²-dimensional vector space over F having End M(A) as a subspace. The dimension of End M(A) is determined by Frobenius' theorem (Corollary 6.34) in terms of the invariant factors (Definition 6.8) of A. We'll see that End M(A) is isomorphic (both as a ring and a vector space) to the centraliser Z(A) where

$$Z(A) = \{B \in M_t(F) : AB = BA\}$$

consists of all matrices B which commute with the given matrix A. The index of the group GLt(F) ∩ Z(A) = U(Z(A)) in GLt(F) is the size of the similarity class of A. Finally, knowing the irreducible factorisation of χA(x) over F, we analyse the group Aut M(A) ≅ U(Z(A)) of invertible endomorphisms of M(A).

We start the details by reminding the reader that μA(x) denotes the minimum polynomial (Corollary 6.11) of A. Notice μA(x) = x − c if and only if A = cI, in which case F(c) = F as c ∈ F, and End M(A) ≅ Mt(F) as discussed in Example 5.9c. Our first theorem, which is the polynomial analogue of Theorem 3.16, shows that M(A) and its endomorphism ring are easily described should μA(x) be irreducible over F.

Theorem 6.26

Let A denote a t × t matrix over a field F and suppose μA(x) is irreducible over F. Then χA(x) = μA(x)ˢ where s = t/m and m = deg μA(x). Let F(c) be the extension field of F obtained by adjoining a zero c of μA(x) to F. Writing

f(c)v = f(x)v for all v ∈ Fᵗ and all f(x) ∈ F[x]

gives the F[x]-module M(A) the structure of a vector space of dimension s over F(c). Also N is a submodule of the F[x]-module M(A) if and only if N is a subspace of the F(c)-module M(A). Further End M(A) ≅ Ms(F(c)) and Aut M(A) ≅ GLs(F(c)).

Let the F(c)-module M(A) have basis v1, v2, . . . , vs. Denote by X the t × t matrix over F with eiX = xʳvj where i − 1 = (j − 1)m + r, 0 ≤ r < m, 1 ≤ i ≤ t. Then X is invertible over F and

$$XAX^{-1} = C(\mu_A(x)) \oplus C(\mu_A(x)) \oplus \cdots \oplus C(\mu_A(x)) \quad (s \text{ terms}).$$


Proof

The proof of Theorem 6.26, except the last paragraph, is omitted as it is closely analogous to that of Theorem 3.16 (Exercises 6.3, Question 1(e)). As for the last paragraph, let j − 1 and r be the quotient and remainder on dividing i − 1 by m, and so 1 ≤ j ≤ s as 0 ≤ i − 1 < ms. Write p(x) = μA(x) to remind ourselves that μA(x) is irreducible over F. All non-zero vectors in Fᵗ have order p(x) in the F[x]-module M(A), as p(x)v = vp(A) = vμA(A) = v × 0 = 0 and the only monic divisors of the irreducible polynomial p(x) are 1 and p(x). So the cyclic submodule Nj = 〈vj〉 has F-basis Bvj consisting of vj, xvj, . . . , xᵐ⁻¹vj for 1 ≤ j ≤ s by Corollary 5.29. The restriction of the linear mapping determined by A to Nj has matrix C(p(x)) relative to Bvj by Corollary 5.27 for 1 ≤ j ≤ s. As v1, v2, . . . , vs is a basis of the F(c)-module M(A), we see that the F[x]-module M(A) decomposes as M(A) = N1 ⊕ N2 ⊕ · · · ⊕ Ns. Therefore B = Bv1 ∪ Bv2 ∪ · · · ∪ Bvs is a basis of Fᵗ by Lemma 5.18. The matrix X, as specified above, has the vectors of B as its rows and so is invertible over F by Corollary 2.23. Finally X satisfies

$$XAX^{-1} = C(p(x)) \oplus C(p(x)) \oplus \cdots \oplus C(p(x)),$$

the direct sum of s companion matrices C(p(x)), by Corollary 5.20. □

To illustrate Theorem 6.26 consider

$$A = \begin{pmatrix} 1 & 1 & 0 & 1 \\ 1 & 1 & -1 & 2 \\ -4 & -1 & 1 & -3 \\ -3 & -2 & 1 & -3 \end{pmatrix}$$

over Q. By direct calculation the reader can check A² = −I. So μA(x) is a monic and non-constant divisor over Q of x² + 1 by Corollary 6.10. As x² + 1 is irreducible over Q we deduce μA(x) = x² + 1. Without further calculation the invariant factors of A are x² + 1, x² + 1 (what else could they be?) and hence χA(x) = (x² + 1)². In this case reducing xI − A to Smith normal form is not the easiest way to find X with XAX⁻¹ in rcf. Rather, first pick any non-zero vector v1 ∈ Q⁴ in Theorem 6.26, say v1 = e1, and secondly pick any vector v2 not in 〈v1, xv1〉 = 〈(1, 0, 0, 0), (1, 1, 0, 1)〉, for instance v2 = e2. Then v1, xv1, v2, xv2 is a basis B of Q⁴ and

$$X = \begin{pmatrix} v_1 \\ xv_1 \\ v_2 \\ xv_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & -1 & 2 \end{pmatrix}$$

satisfies

$$XAX^{-1} = C(x^2 + 1) \oplus C(x^2 + 1).$$


Incidentally this matrix XAX⁻¹ is in pcf and Jnf as well as rcf. Writing iv = xv for all v ∈ M(A) turns M(A) into a 2-dimensional vector space over Q(i) = {a + ib : a, b ∈ Q} where, as usual, i² = −1. The submodule 〈v1, xv1〉 of the Q[x]-module M(A) is a 2-dimensional subspace over Q and also a 1-dimensional subspace over Q(i). Finally, there is one matrix X with XAX⁻¹ in rcf for each basis v1, v2 of M(A) over Q(i).
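The claims in this example are easy to confirm by machine. Below is a minimal Python sketch (plain integer arithmetic; the helper names are illustrative, not taken from the text) that follows the row-vector convention xv = vA and takes C(x² + 1) to have rows e2 and −e1. It checks A² = −I, builds X from the choices v1 = e1 and v2 = e2, and verifies XAX⁻¹ = C(x² + 1) ⊕ C(x² + 1) in the equivalent form XA = (C(x² + 1) ⊕ C(x² + 1))X, which avoids computing X⁻¹; det X = −1 confirms that X is invertible.

```python
# Row-vector convention of the text: xv = vA, so X has rows v1, xv1, v2, xv2.

def matmul(P, Q):
    """Product PQ of matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def det(M):
    """Determinant by cofactor expansion along the first row (adequate for 4 x 4)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def direct_sum(P, Q):
    """Block-diagonal matrix P ⊕ Q."""
    n, m = len(P), len(Q)
    return [row + [0] * m for row in P] + [[0] * n + row for row in Q]

A = [[1, 1, 0, 1],
     [1, 1, -1, 2],
     [-4, -1, 1, -3],
     [-3, -2, 1, -3]]
I4 = [[int(i == j) for j in range(4)] for i in range(4)]
minus_I4 = [[-x for x in row] for row in I4]

v1, v2 = I4[0], I4[1]                 # the choices v1 = e1, v2 = e2 made in the text
X = [v1, matmul([v1], A)[0], v2, matmul([v2], A)[0]]

C = [[0, 1], [-1, 0]]                 # C(x^2 + 1): e1 C = e2, e2 C = -e1
D = direct_sum(C, C)

print(matmul(A, A) == minus_I4)       # A^2 = -I
print(X)
print(det(X), matmul(X, A) == matmul(D, X))   # X invertible and XAX^{-1} = D
```

Checking XA = DX rather than XAX⁻¹ = D keeps everything in integer arithmetic; the two statements are equivalent because X is invertible.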

Now suppose A to be an arbitrary t × t matrix over a field F. There is a close connection between endomorphisms of M(A) and matrices B which commute with A, as we now demonstrate.

Theorem 6.27

Let A be a t × t matrix over a field F and let the F-linear mapping β : Fᵗ → Fᵗ have matrix B relative to the standard basis B0 of Fᵗ. Then β is an endomorphism of M(A) if and only if AB = BA.

Suppose M(A) to be a cyclic F[x]-module. Then each matrix which commutes with A is of the type B = f(A) where f(x) ∈ F[x], deg f(x) < t. Also β is an automorphism of M(A) if and only if gcd{f(x), χA(x)} = 1.

Proof

Let β be an endomorphism of M(A). Then (xv)β = x((v)β) for all v ∈ Fᵗ by Definition 2.24. As xv = vA and (v)β = vB for all v ∈ Fᵗ, on taking v = ei we obtain eiAB = (xei)β = x((ei)β) = eiBA for 1 ≤ i ≤ t. So AB and BA have equal rows, showing AB = BA.

Conversely suppose AB = BA. Then (xv)β = vAB = vBA = x((v)β) for all v ∈ Fᵗ. From Lemma 5.15 with β in place of θ, we deduce that β is an endomorphism of M(A).

Suppose M(A) is cyclic with generator v0 and let B be a t × t matrix over F satisfying AB = BA. Let β be the endomorphism of M(A) defined by (v)β = vB for all v ∈ Fᵗ. We use the basis Bv0 of Fᵗ consisting of v0, xv0, x²v0, . . . , xᵗ⁻¹v0 introduced in (5.24). There are unique scalars ai−1, where 1 ≤ i ≤ t, such that (v0)β = a0v0 + a1xv0 + a2x²v0 + · · · + at−1xᵗ⁻¹v0. Write f(x) = a0 + a1x + a2x² + · · · + at−1xᵗ⁻¹ and let γ : Fᵗ → Fᵗ be the F-linear mapping defined by (v)γ = vf(A) for all v ∈ Fᵗ. As A and f(A) commute, γ is an endomorphism of M(A) by the first part of the proof. As (v0)β = (v0)γ and v0 generates M(A), we deduce β = γ. As the matrices of β and γ relative to B0 are respectively B and f(A), we conclude B = f(A) where deg f(x) < t.

Suppose that β is an automorphism of M(A). Then β⁻¹ is also an automorphism of M(A) and B⁻¹ is its matrix relative to B0. So there is a(x) ∈ F[x] with B⁻¹ = a(A) by the above paragraph. On dividing a(x)f(x) by χA(x) we obtain a(x)f(x) = q(x)χA(x) + r(x) where q(x), r(x) ∈ F[x] and deg r(x) < t = deg χA(x). Therefore I = a(A)f(A) = q(A)χA(A) + r(A) = r(A) by Corollary 6.11. So r′(x) = r(x) − 1 satisfies r′(A) = 0 with deg r′(x) < t = deg μA(x), as μA(x) = χA(x) by Corollary 6.10. The conclusion is: r′(x) is the zero polynomial, that is, r(x) = 1. From a(x)f(x) = q(x)χA(x) + 1 we deduce gcd{f(x), χA(x)} = 1.

Conversely suppose gcd{f(x), χA(x)} = 1. By Corollary 4.6 there are a(x), b(x) ∈ F[x] with a(x)f(x) + b(x)χA(x) = 1. Evaluating this polynomial equality at A gives a(A)f(A) = I as χA(A) = 0 by Corollary 6.11. So B = f(A) is invertible over F with inverse B⁻¹ = a(A). Therefore β is an automorphism of M(A). □

There are three important facts in Theorem 6.27. First, endomorphisms β of M(A) correspond to matrices B which commute with A.

Definition 6.28

Let A be a t × t matrix over a field F. Then

$$Z(A) = \{B \in M_t(F) : AB = BA\}$$

is called the centraliser of A.

The centraliser Z(A) is a subring of the ring Mt(F) of t × t matrices over F and Z(A) is also a subspace of the t²-dimensional vector space Mt(F) over F. For this reason Z(A) is called an algebra over F. Denote by (β)θ the matrix of β relative to the standard basis B0 of Fᵗ for all β ∈ End M(A). Then θ : End M(A) ≅ Z(A) is an algebra isomorphism (an isomorphism of rings and vector spaces) (Exercises 6.3, Question 2(b)).
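Since Z(A) is the kernel of the F-linear mapping B → AB − BA of Mt(F), its dimension can be found by linear algebra alone. Here is a sketch in Python over F = Q, assuming exact arithmetic via fractions.Fraction; the function names are hypothetical. For the 4 × 4 matrix A of the example after Theorem 6.26 one expects dim Z(A) = 8, the dimension over Q of M2(Q(i)) ≅ End M(A), and for the cyclic case C(x³) one expects dim Z(A) = 3.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product PQ of matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def rank(rows):
    """Rank over Q of the matrix with the given rows (Gauss-Jordan elimination)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def centraliser_dim(A):
    """dim Z(A) = t^2 minus the rank of the linear map B -> AB - BA on M_t(Q)."""
    t = len(A)
    images = []
    for k in range(t):
        for m in range(t):
            E = [[int((i, j) == (k, m)) for j in range(t)] for i in range(t)]  # matrix unit E_km
            AE, EA = matmul(A, E), matmul(E, A)
            images.append([AE[i][j] - EA[i][j] for i in range(t) for j in range(t)])
    return t * t - rank(images)

A = [[1, 1, 0, 1], [1, 1, -1, 2], [-4, -1, 1, -3], [-3, -2, 1, -3]]  # example after Theorem 6.26
C_x3 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]                              # C(x^3), cyclic with t = 3
print(centraliser_dim(A), centraliser_dim(C_x3))
```

The same function applied to the t × t identity matrix returns t², since every matrix commutes with I.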

In the case of a cyclic M(A) the matrices I, A, A², . . . , Aᵗ⁻¹ are F-independent and from Theorem 6.27 these matrices span Z(A). So

$$t = \dim Z(A) = \dim \operatorname{End} M(A) \quad \text{for cyclic } F[x]\text{-modules } M(A),$$

which is the second important fact contained in Theorem 6.27.

The order |Aut M(A)| of the automorphism group of a cyclic module M(A) can be found, in the case of a finite field F of order q, using a polynomial version of the Euler φ-function, as we now explain. So suppose |F| = q and for each non-zero polynomial g(x) over F let

Φq(g(x)) denote the number of polynomials f(x) over F with gcd{f(x), g(x)} = 1, deg f(x) < deg g(x).

Then, thirdly, from Theorem 6.27 we obtain:


Let A be a t × t matrix over a finite field F of order q with M(A) cyclic. Then |End M(A)| = qᵗ and |Aut M(A)| = Φq(χA(x)).

The value of Φq(g(x)) can be calculated in the same way as φ(n). Specifically

$$\Phi_q(g(x)h(x)) = \Phi_q(g(x))\,\Phi_q(h(x)) \quad \text{for } \gcd\{g(x), h(x)\} = 1,$$

that is, Φq is multiplicative. Also

$$\Phi_q(p(x)^n) = q^{mn} - q^{m(n-1)}$$

where p(x) is irreducible of degree m over the finite field F of order q (Exercises 6.3, Question 2(c)).
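For small fields these formulas can be compared with a direct count. The following Python sketch handles only q = p prime (an assumption: for a prime-power q genuine finite-field arithmetic would be needed), stores polynomials as coefficient lists with the constant term first, and counts the f(x) with deg f(x) < deg g(x) and gcd{f(x), g(x)} = 1 using Euclid's algorithm. The helper names are illustrative, and pow(a, -1, p) needs Python 3.8 or later.

```python
from itertools import product

def trim(f):
    """Drop zero leading coefficients (coefficient lists are lowest degree first)."""
    while f and f[-1] == 0:
        f.pop()
    return f

def poly_mod(f, g, p):
    """Remainder of f on division by a non-zero g over Z_p."""
    f = trim([c % p for c in f])
    g = trim([c % p for c in g])
    inv_lead = pow(g[-1], -1, p)
    while len(f) >= len(g):
        coef = f[-1] * inv_lead % p
        shift = len(f) - len(g)
        for i, c in enumerate(g):
            f[shift + i] = (f[shift + i] - coef * c) % p
        trim(f)                      # the leading coefficient is now 0
    return f

def coprime(f, g, p):
    """True when gcd{f, g} = 1 over Z_p (Euclid's algorithm)."""
    a, b = trim([c % p for c in f]), trim([c % p for c in g])
    while b:
        a, b = b, poly_mod(a, b, p)
    return len(a) == 1               # the last non-zero remainder is a non-zero constant

def Phi(g, p):
    """Number of f over Z_p with deg f < deg g and gcd{f, g} = 1."""
    n = len(trim([c % p for c in g])) - 1
    return sum(1 for coeffs in product(range(p), repeat=n) if coprime(list(coeffs), g, p))
```

For instance Phi([1, 0, 1], 3) counts the polynomials of degree at most 1 coprime to x² + 1 over Z3, and Phi([1, 0, 1, 0, 1], 2) treats (x² + x + 1)² = x⁴ + x² + 1 over Z2, where the prime-power formula predicts 2⁴ − 2² = 12.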

For example let A = C(x³(x² + x + 1)) over Z2. Then M(A) is cyclic with generator e1 and χA(x) = x³(x² + x + 1) by Theorem 5.26. As Φ2(x³) = 2³ − 2² (the four relevant polynomials are 1, x + 1, x² + 1, x² + x + 1) and Φ2(x² + x + 1) = 2² − 1 (the three relevant polynomials are 1, x, x + 1) we see Φ2(χA(x)) = 4 × 3 = 12. So |End M(A)| = 2⁵ = 32 and |Aut M(A)| = 12 by Theorem 6.27. Knowing the order of Aut M(A), the number of 5 × 5 matrices similar to A can be found using our next theorem. In this case the size of the similarity class of A is

$$|GL_5(\mathbb{Z}_2)|/|\operatorname{Aut} M(A)| = (2^5 - 1)(2^5 - 2)(2^5 - 2^2)(2^5 - 2^3)(2^5 - 2^4)/12 = 833280.$$
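Theorem 6.27 also lets the count |Aut M(A)| = 12 be checked without Φ2: the automorphisms correspond to the invertible matrices f(A) with deg f(x) < 5. Below is a brute-force Python sketch (helper names hypothetical), assuming the companion-matrix convention used in the text, that is, rows e2, . . . , et followed by minus the coefficients of the polynomial, and Python 3.8 or later for pow(a, -1, p). It tests all 2⁵ = 32 polynomials and then divides |GL5(Z2)| by the count.

```python
from itertools import product

P = 2   # the field Z_2 of the example; any prime works

def companion(coeffs):
    """C(f) for monic f(x) = x^n + a_{n-1}x^{n-1} + ... + a_0, coeffs = [a_0, ..., a_{n-1}]:
    rows e_2, ..., e_n followed by (-a_0, ..., -a_{n-1}), entries reduced mod P."""
    n = len(coeffs)
    rows = [[int(j == i + 1) for j in range(n)] for i in range(n - 1)]
    return rows + [[(-a) % P for a in coeffs]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % P for j in range(len(Y[0]))]
            for i in range(len(X))]

def rank_mod_p(M):
    """Rank over Z_P (P prime) by Gaussian elimination."""
    m = [[x % P for x in row] for row in M]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c]), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        inv = pow(m[r][c], -1, P)
        m[r] = [x * inv % P for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c]:
                factor = m[i][c]
                m[i] = [(x - factor * y) % P for x, y in zip(m[i], m[r])]
        r += 1
    return r

def poly_at(coeffs, A):
    """f(A) = a_0 I + a_1 A + a_2 A^2 + ... over Z_P, for coeffs = [a_0, a_1, ...]."""
    t = len(A)
    result = [[0] * t for _ in range(t)]
    power = [[int(i == j) for j in range(t)] for i in range(t)]   # A^0 = I
    for a in coeffs:
        result = [[(x + a * y) % P for x, y in zip(rrow, prow)]
                  for rrow, prow in zip(result, power)]
        power = matmul(power, A)
    return result

A = companion([0, 0, 0, 1, 1])        # C(x^5 + x^4 + x^3) = C(x^3(x^2 + x + 1))
t = len(A)
# By Theorem 6.27, Aut M(A) corresponds to the invertible f(A) with deg f(x) < t.
aut = sum(1 for f in product(range(P), repeat=t) if rank_mod_p(poly_at(f, A)) == t)

gl_order = 1
for i in range(t):
    gl_order *= P ** t - P ** i       # |GL_t(Z_P)| = (P^t - 1)(P^t - P)...(P^t - P^{t-1})
print(aut, gl_order, gl_order // aut)
```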

Theorem 6.29

Let A be a t × t matrix over a field F. Write G = GLt(F) and HA = G ∩ Z(A). Then HA = U(Z(A)) is a subgroup of the multiplicative group G and Aut M(A) ≅ HA. The correspondence

$$H_A X \mapsto X^{-1}AX \quad \text{for } X \in G$$

between the set of left cosets HAX of HA in G and the similarity class of A is unambiguous and bijective. For a finite field F there are |GLt(F)|/|HA| matrices similar to A.

Proof

We leave the routine verification that HA = U(Z(A)) is a subgroup of G to the reader, and also that θ : Aut M(A) ≅ HA is a group isomorphism, where (β)θ is the matrix of β relative to the standard basis B0 of Fᵗ for all β ∈ Aut M(A) (Exercises 6.3, Question 2(b)).


For X, Y ∈ G write X ≡ Y if there is B ∈ HA with BX = Y. Then ≡ is an equivalence relation on G (it is the analogue of Lemma 2.9 for multiplicative groups). The equivalence class of X is HAX = {BX : B ∈ HA}, which we call the (left) coset of HA in G with representative X. The set of all such cosets of HA in G constitutes a partition of G, as each is non-empty and each matrix in G belongs to exactly one of them.

Suppose HAX = HAY. Then Y ∈ HAX and there is B ∈ HA with BX = Y. As B ∈ Z(A) we know AB = BA and so B⁻¹AB = A. Therefore

$$Y^{-1}AY = (BX)^{-1}ABX = X^{-1}B^{-1}ABX = X^{-1}AX,$$

showing that the correspondence HAX → X⁻¹AX is unambiguously defined, as it does not depend on the choice of coset representative X or Y.

Conversely suppose X⁻¹AX = Y⁻¹AY where X, Y ∈ G. Reversing the above steps we obtain B ∈ Z(A) where B = YX⁻¹, and so B ∈ G ∩ Z(A) = HA. So X ≡ Y, that is, HAX = HAY. We have now shown that HAX → X⁻¹AX is injective. From Definition 5.4 we conclude that this correspondence is surjective.

Let F be a finite field. Then G and HA are finite groups. Each coset of HA in G consists of |HA| elements. As these cosets partition G, we see that their number is |G|/|HA|, the index of HA in G. Using the above correspondence, |GLt(F)|/|Aut M(A)| is the size of the similarity class of A. □

The correspondence of Theorem 6.29 is a special case of a general fact, the orbit-stabiliser theorem, in the theory of permutation representations. Here each X ∈ GLt(F) = G gives rise to a permutation (bijection) π(X) of the set Mt(F) defined by (A)π(X) = X⁻¹AX for all A ∈ Mt(F), that is, the image of each A by π(X) is X⁻¹AX. Notice

$$\pi(XX') = \pi(X)\pi(X') \quad \text{for all } X, X' \in G,$$

showing that π is a permutation representation of G, that is, π is a homomorphism from G to a group of permutations. Notice also that A and A′ are similar (Definition 5.3) if and only if (A′)π(X) = A for some X ∈ G.

The orbit of A is

$$\{(A)\pi(X) : X \in G\} = \{X^{-1}AX : X \in G\},$$

which is the similarity class of A. The stabiliser of A is

$$\{X \in G : (A)\pi(X) = A\} = \{X \in G : X^{-1}AX = A\} = \{X \in G : AX = XA\} = G \cap Z(A) = H_A.$$


The orbit-stabiliser theorem asserts that, in a general context, the length of the orbit of A equals the index of the stabiliser of A in G (Exercises 6.3, Question 3(a)).

Consider the particular case M2(Z3). By Exercises 6.1, Question 6(a) the similarity classes of 2 × 2 matrices A over Z3 are 12 = 3 + 9 in number, since they correspond to the 12 possible minimum polynomials μA(x). The 3 similarity classes of scalar matrices 0, I, −I over Z3 are singletons (they each consist of one matrix only) and have minimum polynomials x, x − 1, x + 1 respectively.

The 9 similarity classes with deg μA(x) = 2 are such that M(A) is cyclic, and so the second part of Theorem 6.27 can be used. Consider first μA(x) = x² + 1, which is irreducible over Z3. There are 9 − 1 = 8 polynomials f(x) of degree at most 1 over Z3 with gcd{f(x), x² + 1} = 1, namely all f(x) = a1x + a0 except the zero polynomial, and so Φ3(x² + 1) = 8. As |GL2(Z3)| = (3² − 1)(3² − 3) = 48 we see that the similarity class of C(x² + 1) over Z3 consists of exactly 48/8 = 6 matrices by Theorem 6.29. The reader can check that

$$\pm\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad \pm\begin{pmatrix} 1 & -1 \\ -1 & -1 \end{pmatrix}, \quad \pm\begin{pmatrix} -1 & -1 \\ -1 & 1 \end{pmatrix}$$

all satisfy det = 1, trace = 0 (so each has characteristic polynomial x² + 1) and so are the 6 matrices in the similarity class of C(x² + 1) over Z3. As x² + x − 1 and x² − x − 1 are irreducible over Z3 (they have no zeros in Z3; now use Lemma 4.8(ii)), we see that the similarity classes of C(x² + x − 1) and C(x² − x − 1) over Z3 also each contain exactly 6 matrices.

Now consider μA(x) = x². There are 3² − 3 = 6 polynomials f(x) of degree at most 1 over Z3 with gcd{f(x), x²} = 1, namely all f(x) = a1x + a0 except those with a0 = 0. So Φ3(x²) = 6 and by Theorem 6.29 there are precisely 48/6 = 8 matrices similar to C(x²) over Z3 (they all have rank 1, trace 0). In the same way the similarity class of C((x + 1)²) over Z3 contains exactly 8 matrices and the similarity class of C((x − 1)²) over Z3 also contains exactly 8 matrices.

Finally consider μA(x) = x(x + 1). The polynomials ±1, ±(x − 1) and only these contribute to Φ3(x(x + 1)) = 4. By Theorem 6.29 there are precisely 48/4 = 12 matrices similar to C(x(x + 1)) over Z3 (they all have det = 0, trace = −1). In the same way the similarity class of C(x(x − 1)) over Z3 contains exactly 12 matrices and the similarity class of C((x + 1)(x − 1)) over Z3 also contains exactly 12 matrices.

We have now accounted for the similarity classes of all 3⁴ = 81 matrices in M2(Z3). It is reassuring that 81 = 1 + 1 + 1 + 6 + 6 + 6 + 8 + 8 + 8 + 12 + 12 + 12, as the similarity classes partition M2(Z3). Now A ∈ GLt(F) ⇔ μA(0) ≠ 0. So the 48 matrices of GL2(Z3) partition into 8 similarity (conjugacy) classes and 48 = 1 + 1 + 6 + 6 + 6 + 8 + 8 + 12. In the same way the number and sizes of the similarity classes of matrices in M2(Fq) can be found (Exercises 6.3, Question 2(e)).
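This whole count is small enough to confirm by exhaustive search. The Python sketch below (standard library only; names illustrative) computes the orbit {X⁻¹AX : X ∈ GL2(Z3)} of each A ∈ M2(Z3), checks the orbit-stabiliser relation |orbit of A| × |G ∩ Z(A)| = |G| for every class, and records the class sizes in M2(Z3) and in GL2(Z3).

```python
from itertools import product

P = 3

def mul(X, Y):
    """Product of 2 x 2 matrices over Z_P stored as tuples of rows (hashable)."""
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2)) % P for j in range(2))
                 for i in range(2))

def det(X):
    return (X[0][0] * X[1][1] - X[0][1] * X[1][0]) % P

def inverse(X):
    """Inverse over Z_P from the adjugate; X must be invertible."""
    d = pow(det(X), -1, P)
    return ((X[1][1] * d % P, -X[0][1] * d % P),
            (-X[1][0] * d % P, X[0][0] * d % P))

M2 = [((a, b), (c, e)) for a, b, c, e in product(range(P), repeat=4)]
G = [X for X in M2 if det(X) != 0]                       # GL_2(Z_3)

classes = []                                             # (representative, class size)
seen = set()
for A in M2:
    if A in seen:
        continue
    orbit = {mul(mul(inverse(X), A), X) for X in G}             # similarity class of A
    stabiliser = [X for X in G if mul(A, X) == mul(X, A)]       # G ∩ Z(A)
    assert len(orbit) * len(stabiliser) == len(G)               # orbit-stabiliser
    seen |= orbit
    classes.append((A, len(orbit)))

all_sizes = sorted(n for _, n in classes)
gl_sizes = sorted(n for A, n in classes if det(A) != 0)         # det is a similarity invariant
print(len(G), all_sizes, gl_sizes)
```

Restricting to det(A) ≠ 0 picks out the classes inside GL2(Z3), which is the criterion μA(0) ≠ 0 used above.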

Our next task is to study the algebra End M(A) in the case of an arbitrary t × t matrix A over a field. For cyclic M(A) this has already been done in Theorem 6.27. The general case can be 'cracked' by decomposing M(A) as in Theorem 6.5 and using certain s × s matrices over F[x] which represent endomorphisms of M(A). We are used to specifying linear mappings of Fᵗ via their matrices relative to a convenient basis. Square matrices over Z are used to describe endomorphisms of finite abelian groups (Exercises 3.3, Question 5(a)). The same method, with a few minor adjustments, does the trick yet again!

Definition 6.30

Let A be a t × t matrix over a field F having s invariant factors di(x) for 1 ≤ i ≤ s. By Theorem 6.5 there are vectors vi of order di(x) in M(A) such that

$$M(A) = \langle v_1 \rangle \oplus \langle v_2 \rangle \oplus \cdots \oplus \langle v_s \rangle.$$

Let β ∈ End M(A). Then

$$(v_i)\beta = b_{i1}(x)v_1 + b_{i2}(x)v_2 + \cdots + b_{is}(x)v_s = \sum_{j=1}^{s} b_{ij}(x)v_j \quad \text{for } 1 \le i \le s,$$

where bij(x) ∈ F[x]. The s × s matrix B(x) = (bij(x)) is said to represent β relative to the ordered set of generators v1, v2, . . . , vs of M(A). If deg bij(x) < deg dj(x) for 1 ≤ i, j ≤ s then B(x) is said to be reduced.

It is not correct to refer to B(x) as the matrix of β relative to v1, v2, . . . , vs, as there are in general many matrices B(x) representing β as in Definition 6.30. For example the zero endomorphism of M(A) is represented by the zero s × s matrix over F[x] as well as (for instance) by B(x) = (bij(x)) where bi1(x) = d1(x), bij(x) = 0 for 1 ≤ i ≤ s, 1 < j ≤ s, since d1(x)v1 = 0. However we will see in Theorem 6.32 that each endomorphism β of M(A) is represented by a unique reduced matrix B(x). Notice that B(x) in Ms(F[x]) is reduced if and only if each entry in col j is of smaller degree than dj(x) for 1 ≤ j ≤ s.

Except in the special case d1(x) = ds(x), that is, all the invariant factors of A are equal, there are some s × s matrices over F[x] which do not represent any endomorphism of M(A), as we now explain.

Definition 6.31

Let d1(x), d2(x), . . . , ds(x) denote the invariant factors of the t × t matrix A over the field F. The s × s matrix B(x) = (bij(x)) over F[x] is said to satisfy the endomorphism condition relative to M(A) (e.c. rel. M(A)) if

$$(d_j(x)/d_i(x)) \mid b_{ij}(x) \quad \text{for } 1 \le i < j \le s.$$


For example suppose A = C(x) ⊕ C(x²) and F = Zp. So s = 2, t = 3 and d1(x) = x, d2(x) = x². In this case the 2 × 2 matrix B(x) = (bij(x)) satisfies e.c. rel. M(A) if and only if x | b12(x). There are p⁵ reduced matrices satisfying e.c. rel. M(A), namely those of the type

$$B(x) = \begin{pmatrix} c_{11} & c_{12}x \\ c_{21} & c_{22}x + c'_{22} \end{pmatrix} \quad \text{for } c_{11}, c_{12}, c_{21}, c_{22}, c'_{22} \in \mathbb{Z}_p,$$

as the entries in col 1 are of degree less than 1 = deg x and the entries in col 2 are of degree less than 2 = deg x². By Corollary 6.34 we will see that M(A) has p⁵ endomorphisms corresponding to the above p⁵ matrices. Further M(A) has p³(p − 1)² automorphisms corresponding to reduced matrices B(x) with c11 ≠ 0, c′22 ≠ 0, that is, the matrix

$$\begin{pmatrix} c_{11} & 0 \\ c_{21} & c'_{22} \end{pmatrix}$$

of remainders on division by x is invertible over Zp (see Lemma 6.35).
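Since End M(A) ≅ Z(A) (Theorem 6.27) and Aut M(A) ≅ GLt(F) ∩ Z(A) (Theorem 6.29), the counts p⁵ and p³(p − 1)² in this example can be checked for small primes without using the matrices B(x) at all. Here is a Python sketch (the function name count_end_aut is hypothetical) that runs through all p⁹ matrices over Zp, with A = C(x) ⊕ C(x²) written in the companion-matrix convention of the text.

```python
from itertools import product

def matmul(X, Y, p):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) % p for j in range(n)] for i in range(n)]

def det3(M, p):
    """Determinant of a 3 x 3 matrix, reduced mod p."""
    (a, b, c), (d, e, f), (g, h, k) = M
    return (a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)) % p

def count_end_aut(p):
    """(|Z(A)|, |GL_3(Z_p) ∩ Z(A)|) for A = C(x) ⊕ C(x^2) over Z_p, by brute force."""
    A = [[0, 0, 0],
         [0, 0, 1],
         [0, 0, 0]]          # C(x) = (0) followed by C(x^2), whose rows are e2 and 0
    end = aut = 0
    for entries in product(range(p), repeat=9):
        B = [list(entries[0:3]), list(entries[3:6]), list(entries[6:9])]
        if matmul(A, B, p) == matmul(B, A, p):
            end += 1
            if det3(B, p) != 0:
                aut += 1
    return end, aut

for p in (2, 3):
    print(p, count_end_aut(p), (p ** 5, p ** 3 * (p - 1) ** 2))
```

The search is exponential in t², so this is only a sanity check for tiny cases; the reduced matrices B(x) give the same counts without any search.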

Theorem 6.32

Let A be a t × t matrix over a field F and let β ∈ End M(A). Using the notation of Definition 6.30 let B(x) = (bij(x)) represent β relative to v1, v2, . . . , vs. Then B(x) satisfies e.c. rel. M(A). There is a unique reduced matrix B(x) which represents β relative to v1, v2, . . . , vs. The set RA of all s × s matrices B(x) over F[x] satisfying e.c. rel. M(A) is a subring of Ms(F[x]).

Proof

We begin by reformulating the endomorphism condition in a more usable way. Suppose B(x) = (bij(x)) satisfies Definition 6.31, that is, (dj(x)/di(x)) | bij(x) for 1 ≤ i < j ≤ s. On multiplying through by di(x) we obtain dj(x) | di(x)bij(x), which can be written as di(x)bij(x) ≡ 0 (mod dj(x)). These latter conditions hold for all i and j in the range 1 ≤ i, j ≤ s (they hold for i ≥ j as then dj(x) | di(x)). Therefore

$$B(x) = (b_{ij}(x)) \text{ satisfies e.c. rel. } M(A) \iff d_i(x)b_{ij}(x) \equiv 0 \pmod{d_j(x)} \text{ for } 1 \le i, j \le s.$$

Suppose B(x) = (bij(x)) represents β relative to v1, v2, . . . , vs. Then (vi)β = bi1(x)v1 + bi2(x)v2 + · · · + bis(x)vs for 1 ≤ i ≤ s. Multiplying this equation by the order di(x) of vi in M(A) gives

$$0 = (d_i(x)v_i)\beta = d_i(x)((v_i)\beta) = d_i(x)b_{i1}(x)v_1 + d_i(x)b_{i2}(x)v_2 + \cdots + d_i(x)b_{is}(x)v_s$$

for 1 ≤ i ≤ s. Using the independence (Definition 2.14) of the submodules 〈vi〉 in the direct sum M(A) = 〈v1〉 ⊕ 〈v2〉 ⊕ · · · ⊕ 〈vs〉, we deduce di(x)bij(x)vj = 0 for 1 ≤ i, j ≤ s from the above equation. As dj(x) is the order of vj in M(A), we conclude di(x)bij(x) ≡ 0 (mod dj(x)) for 1 ≤ i, j ≤ s, that is, the s × s matrix B(x) = (bij(x)) over F[x] satisfies e.c. rel. M(A).

For 1 ≤ i, j ≤ s let rij(x) be the remainder on dividing bij(x) by dj(x). Then bij(x)vj = rij(x)vj as dj(x)vj = 0 for 1 ≤ i, j ≤ s. From Definition 6.30 we see that the s × s matrix (rij(x)) is reduced and represents β relative to v1, v2, . . . , vs. Conversely suppose the s × s matrix (r′ij(x)) to be reduced and to represent β relative to v1, v2, . . . , vs. Subtracting (vi)β = r′i1(x)v1 + · · · + r′is(x)vs from (vi)β = ri1(x)v1 + · · · + ris(x)vs gives 0 = (ri1(x) − r′i1(x))v1 + · · · + (ris(x) − r′is(x))vs for 1 ≤ i ≤ s. The independence property of the internal direct sum M(A) = 〈v1〉 ⊕ 〈v2〉 ⊕ · · · ⊕ 〈vs〉 now gives (rij(x) − r′ij(x))vj = 0 for 1 ≤ i, j ≤ s. From Definition 5.11 we deduce dj(x) | (rij(x) − r′ij(x)), and so rij(x) = r′ij(x) using Theorem 4.1 for 1 ≤ i, j ≤ s, as both rij(x) and r′ij(x) have lower degree than dj(x). The conclusion is: each endomorphism of M(A) is represented as in Definition 6.30 by a unique reduced matrix.

We denote by RA the subset of Ms(F[x]) consisting of all matrices B(x) satisfying e.c. rel. M(A). Consider B(x) = (bij(x)) and B′(x) = (b′ij(x)) in RA. There are qij(x), q′ij(x) ∈ F[x] with di(x)bij(x) = qij(x)dj(x) and di(x)b′ij(x) = q′ij(x)dj(x) for 1 ≤ i, j ≤ s. Adding these equations gives

$$d_i(x)(b_{ij}(x) + b'_{ij}(x)) = (q_{ij}(x) + q'_{ij}(x))d_j(x) \equiv 0 \pmod{d_j(x)}$$

for 1 ≤ i, j ≤ s, which shows B(x) + B′(x) ∈ RA. For 1 ≤ i, j, k ≤ s we have

$$d_i(x)b_{ij}(x)b'_{jk}(x) = q_{ij}(x)d_j(x)b'_{jk}(x) = q_{ij}(x)q'_{jk}(x)d_k(x) \equiv 0 \pmod{d_k(x)}.$$

Summing over j produces

$$d_i(x)\Big(\sum_{j=1}^{s} b_{ij}(x)b'_{jk}(x)\Big) \equiv 0 \pmod{d_k(x)} \quad \text{for } 1 \le i, k \le s,$$

showing B(x)B′(x) ∈ RA. So RA is closed under addition and multiplication. We leave the reader to check −B(x) ∈ RA and also 0, I ∈ RA, that is, the s × s zero and identity matrices over F[x] satisfy e.c. rel. M(A) as in Definition 6.31 and so belong to RA (Exercises 6.3, Question 6(b)). Therefore RA is a subring of Ms(F[x]). □
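The reformulated condition di(x)bij(x) ≡ 0 (mod dj(x)) is convenient to test by machine. Below is a small Python sketch for the example A = C(x) ⊕ C(x²) over Z2 discussed after Definition 6.31 (so d1(x) = x, d2(x) = x²); polynomials are coefficient lists, lowest degree first, and all helper names are hypothetical. It lists the reduced matrices satisfying e.c. rel. M(A), of which there should be 2⁵ = 32, and checks that every product of two of them again satisfies the condition, as the proof of Theorem 6.32 guarantees. Products of reduced matrices need not be reduced, which is why the test uses the congruence form rather than the degree bound.

```python
from itertools import product

P = 2
D = [[0, 1], [0, 0, 1]]     # d1(x) = x, d2(x) = x^2 as coefficient lists, lowest degree first

def trim(f):
    """Reduce coefficients mod P and drop zero leading coefficients."""
    f = [c % P for c in f]
    while f and f[-1] == 0:
        f.pop()
    return f

def padd(f, g):
    n = max(len(f), len(g))
    return trim([(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0) for i in range(n)])

def pmul(f, g):
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)

def pmod(f, g):
    """Remainder of f on division by the non-zero polynomial g over Z_P."""
    f, g = trim(f), trim(g)
    inv = pow(g[-1], -1, P)
    while len(f) >= len(g):
        coef = f[-1] * inv
        shift = len(f) - len(g)
        f = trim([c - coef * g[i - shift] if i >= shift else c for i, c in enumerate(f)])
    return f

def satisfies_ec(B):
    """Reformulated e.c. rel. M(A): d_i(x) b_ij(x) ≡ 0 (mod d_j(x)) for all i, j."""
    s = len(D)
    return all(pmod(pmul(D[i], B[i][j]), D[j]) == [] for i in range(s) for j in range(s))

def pmatmul(B, C):
    """Product of s x s matrices over Z_P[x]."""
    s = len(B)
    out = [[[] for _ in range(s)] for _ in range(s)]
    for i in range(s):
        for k in range(s):
            for j in range(s):
                out[i][k] = padd(out[i][k], pmul(B[i][j], C[j][k]))
    return out

def polys_below(n):
    """All polynomials over Z_P of degree < n."""
    return [trim(list(c)) for c in product(range(P), repeat=n)]

# Reduced 2 x 2 matrices: entries of column j have degree < deg d_j(x).
col1, col2 = polys_below(len(D[0]) - 1), polys_below(len(D[1]) - 1)
reduced = [[[b11, b12], [b21, b22]]
           for b11, b21 in product(col1, repeat=2)
           for b12, b22 in product(col2, repeat=2)
           if satisfies_ec([[b11, b12], [b21, b22]])]

closed = all(satisfies_ec(pmatmul(B, C)) for B in reduced for C in reduced)
print(len(reduced), closed)
```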

By Theorem 6.32 each endomorphism β of M(A) gives rise to a unique reduced matrix B(x) satisfying e.c. rel. M(A). We now address the question: does each matrix in RA arise from some endomorphism of M(A) as in Definition 6.30? The next theorem tells us, amongst other things, that the answer is: Yes!


Theorem 6.33

Let A be a t × t matrix over a field F with invariant factors d1(x), d2(x), . . . , ds(x). Let M(A) = 〈v1〉 ⊕ 〈v2〉 ⊕ · · · ⊕ 〈vs〉 where vi has order di(x) in M(A) for 1 ≤ i ≤ s. Let B(x) belong to the ring RA of Theorem 6.32. There is a unique endomorphism β of M(A) such that B(x) represents β relative to v1, v2, . . . , vs. Write β = (B(x))ϕ for all B(x) ∈ RA. Then ϕ : RA → End M(A) is a surjective ring homomorphism.

Let K = {(bij(x)) ∈ Ms(F[x]) : dj(x) | bij(x) for 1 ≤ i, j ≤ s}. Then K = ker ϕ and ϕ̄ : RA/K ≅ End M(A) where (K + B(x))ϕ̄ = (B(x))ϕ for all B(x) ∈ RA.

Proof

Consider the s × s matrix B(x) = (bij(x)) in the ring RA. Then di(x)bij(x) ≡ 0 (mod dj(x)) for 1 ≤ i, j ≤ s. As v1, v2, . . . , vs generate M(A), for each v ∈ M(A) there are polynomials fi(x) over F for 1 ≤ i ≤ s with v = f1(x)v1 + f2(x)v2 + · · · + fs(x)vs. We want to construct β ∈ End M(A) so that Definition 6.30 holds, that is, (vi)β = bi1(x)v1 + bi2(x)v2 + · · · + bis(x)vs for 1 ≤ i ≤ s. There is no choice for the image of v by β as

$$(v)\beta = f_1(x)(v_1)\beta + f_2(x)(v_2)\beta + \cdots + f_s(x)(v_s)\beta = \sum_{i,j=1}^{s} f_i(x)b_{ij}(x)v_j \qquad (\blacklozenge)$$

and so there cannot be two different β as in Definition 6.30.

To show that we are not chasing a will-o'-the-wisp we start again and define β : M(A) → M(A) by (♦). We must first check that this definition of (v)β is unambiguous; in other words, does (v)β remain unchanged on changing the above polynomials fi(x) for 1 ≤ i ≤ s? So suppose also v = g1(x)v1 + g2(x)v2 + · · · + gs(x)vs where gi(x) ∈ F[x] for 1 ≤ i ≤ s. Then 0 = v − v = (f1(x) − g1(x))v1 + · · · + (fs(x) − gs(x))vs, and so (fi(x) − gi(x))vi = 0 using the independence (Definition 2.14) of the submodules 〈vi〉 of M(A) for 1 ≤ i ≤ s. As vi has order di(x) in M(A), we see fi(x) − gi(x) = qi(x)di(x) where qi(x) ∈ F[x] for 1 ≤ i ≤ s. On multiplying this equation by bij(x) we obtain

$$(f_i(x) - g_i(x))b_{ij}(x) = q_i(x)d_i(x)b_{ij}(x) \equiv 0 \pmod{d_j(x)} \quad \text{for } 1 \le i, j \le s$$

as B(x) = (bij(x)) belongs to RA and so satisfies e.c. rel. M(A). Therefore (fi(x) − gi(x))bij(x)vj = 0 as dj(x) is the order of vj in M(A), that is, fi(x)bij(x)vj = gi(x)bij(x)vj for 1 ≤ i, j ≤ s. So the r.h.s. of (♦) remains unchanged on replacing fi(x) by gi(x), and (v)β is unambiguously defined by (♦). It is now 'plain sailing' to verify that β is an F[x]-linear mapping of M(A), that is, β ∈ End M(A), and that B(x) represents β relative to v1, v2, . . . , vs (Exercises 6.3, Question 6(c)).

As β is specified uniquely by B(x) it is legitimate to introduce

ϕ : RA → EndM(A) by β = (B(x))ϕ for all B(x) ∈ RA.


(The reader may find it helpful to compare this proof with that of Theorem 3.15: the ring isomorphism θ⁻¹ of Theorem 3.15 is analogous to ϕ, although ϕ is a long way from being injective.) Each β in End M(A) arises from some B(x) in RA by Definition 6.30 and Theorem 6.32, showing ϕ to be surjective.

To show that ϕ is a ring homomorphism consider B(x) = (bij(x)) and B′(x) = (b′ij(x)) in RA. Write β = (B(x))ϕ and β′ = (B′(x))ϕ. Using (♦)

$$\begin{aligned}
(v)((B(x))\varphi + (B'(x))\varphi) &= (v)(\beta + \beta') = (v)\beta + (v)\beta' \\
&= \sum_{i,j=1}^{s} f_i(x)b_{ij}(x)v_j + \sum_{i,j=1}^{s} f_i(x)b'_{ij}(x)v_j \\
&= \sum_{i,j=1}^{s} f_i(x)(b_{ij}(x) + b'_{ij}(x))v_j \\
&= (v)((B(x) + B'(x))\varphi) \quad \text{for all } v \in M(A)
\end{aligned}$$

and so ϕ respects addition: (B(x))ϕ + (B′(x))ϕ = (B(x) + B′(x))ϕ. Using (♦) again

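$$\begin{aligned}
(v)\big(((B(x))\varphi)((B'(x))\varphi)\big) &= (v)(\beta\beta') = ((v)\beta)\beta' \\
&= \Big(\sum_{i,j=1}^{s} f_i(x)b_{ij}(x)v_j\Big)\beta' \\
&= \sum_{k=1}^{s}\Big(\sum_{i,j=1}^{s} f_i(x)b_{ij}(x)b'_{jk}(x)\Big)v_k \\
&= \sum_{i,k=1}^{s} f_i(x)\Big(\sum_{j=1}^{s} b_{ij}(x)b'_{jk}(x)\Big)v_k \\
&= (v)\big((B(x)B'(x))\varphi\big) \quad \text{for all } v \in M(A)
\end{aligned}$$

showing that $\varphi$ respects multiplication, that is, $(B(x))\varphi(B'(x))\varphi = (B(x)B'(x))\varphi$. The 1-element of $R_A$ is the $s \times s$ identity matrix $I(x) = (\delta_{ij}(x))$ over $F[x]$ and so $\delta_{ij}(x) = 1(x)$ or $0(x)$ according as $i = j$ or $i \ne j$ $(1 \le i, j \le s)$. Using (♦) gives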

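$$(v)\big((I(x))\varphi\big) = \sum_{i,j=1}^{s} f_i(x)\delta_{ij}(x)v_j = \sum_{i=1}^{s} f_i(x)\delta_{ii}(x)v_i = \sum_{i=1}^{s} f_i(x)v_i = v$$

for all $v \in M(A)$, showing that $(I(x))\varphi = \iota$, the identity mapping of $M(A)$. As $\iota$ is the 1-element of $\operatorname{End} M(A)$ we conclude that $\varphi : R_A \to \operatorname{End} M(A)$ is a surjective ring homomorphism.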

Suppose $B(x) = (b_{ij}(x))$ in $M_s(F[x])$ satisfies $d_j(x) \mid b_{ij}(x)$ for $1 \le i, j \le s$. Then $b_{ij}(x) \equiv 0 \pmod{d_j(x)}$ and so $d_i(x)b_{ij}(x) \equiv 0 \pmod{d_j(x)}$ for $1 \le i, j \le s$, showing $B(x) \in R_A$ and $K \subseteq R_A$. As $d_j(x)$ is the order of $v_j$ in $M(A)$ we see $b_{ij}(x)v_j = 0$ for $1 \le i, j \le s$. Hence

$$(v)\big((B(x))\varphi\big) = \sum_{i,j=1}^{s} f_i(x)b_{ij}(x)v_j = \sum_{i,j=1}^{s} f_i(x) \times 0 = 0 \quad \text{for all } v \in M(A)$$

by (♦), showing $(B(x))\varphi = 0$, that is, $B(x) \in \ker\varphi$ and $K \subseteq \ker\varphi$.

Conversely suppose $B(x) = (b_{ij}(x))$ belongs to $\ker\varphi$. As the endomorphism $(B(x))\varphi$ of $M(A)$ is represented (Definition 6.30) by $B(x)$ we have $0 = (v_i)((B(x))\varphi) = \sum_{j=1}^{s} b_{ij}(x)v_j$ for $1 \le i \le s$. By the independence Definition 2.14 of the submodules $\langle v_j\rangle$ for $1 \le j \le s$ we obtain $b_{ij}(x)v_j = 0$ for $1 \le i, j \le s$. As $d_j(x)$ is the order of $v_j$ in $M(A)$ we conclude $d_j(x) \mid b_{ij}(x)$ for $1 \le i, j \le s$ from Definition 5.11. Therefore $\ker\varphi \subseteq K$ and so $K = \ker\varphi$.

From the first isomorphism theorem for rings (Exercises 2.3, Question 3(b)) we deduce $\tilde{\varphi} : R_A/K \cong \operatorname{End} M(A)$ where $(K + B(x))\tilde{\varphi} = (B(x))\varphi$ for all $B(x) \in R_A$. □

The integer $\dim Z(A)$ is a measure of the number of matrices which commute (Definition 6.28) with a given $t \times t$ matrix $A$ over a field $F$. From Theorem 6.27 we know $\dim Z(A) = t$ in the case of $M(A)$ cyclic, that is, of $A$ having only one invariant factor $d_1(x) = \mu_A(x) = \chi_A(x)$ of degree $t$. We now combine Theorems 6.32 and 6.33 to deal with the general case.

Corollary 6.34 (Frobenius’ theorem)

Let $A$ be a $t \times t$ matrix over a field $F$ with invariant factors $d_1(x), d_2(x), \ldots, d_s(x)$. Let $Z(A) = \{B \in M_t(F) : AB = BA\}$ denote the centraliser of $A$. Then

$$\dim \operatorname{End} M(A) = \dim Z(A) = \sum_{i=1}^{s}(2s - 2i + 1)\deg d_i(x).$$

Proof

We use the notation of Theorem 6.33. The algebra isomorphism $\theta$ of Definition 6.28 shows that $\operatorname{End} M(A)$ and $Z(A)$ are vector spaces of equal dimension over $F$. Write

$$V_A = \{B(x) \in R_A : B(x) \text{ is reduced}\}.$$

Then $V_A$ is a subspace of the vector space $R_A$ over $F$, and we leave the reader to verify that the mapping $\varphi : R_A \to \operatorname{End} M(A)$ of Theorem 6.33 is $F$-linear. The restriction $\varphi' = \varphi|_{V_A}$ of $\varphi$ to $V_A$ is an $F$-linear mapping $\varphi' : V_A \to \operatorname{End} M(A)$. In fact $\varphi'$ is a vector space isomorphism, $(\beta)(\varphi')^{-1}$ being the unique reduced matrix representing $\beta$ relative to $v_1, v_2, \ldots, v_s$ by Theorem 6.32. So $\varphi' : V_A \cong \operatorname{End} M(A)$ and $\dim V_A = \dim \operatorname{End} M(A)$.

We now determine $\dim V_A$. For $1 \le i, j \le s$ let $V_A(i, j)$ denote the subspace of all matrices in $V_A$ having zero entries except possibly for their $(i, j)$-entry. Then $V_A = \bigoplus_{i,j=1}^{s} V_A(i, j)$, that is, $V_A$ is the direct sum of its $s^2$ subspaces $V_A(i, j)$. Consider $B(x) = (b_{ij}(x))$ in $V_A(i, j)$. Then $\deg b_{ij}(x) < \deg d_j(x)$ for $1 \le i, j \le s$. For $1 \le i < j \le s$ we have $b_{ij}(x) = q_{ij}(x)(d_j(x)/d_i(x))$ by Definition 6.31, where $q_{ij}(x)$ is an arbitrary polynomial over $F$ with $\deg q_{ij}(x) < \deg d_i(x)$. So $\dim V_A(i, j) = \deg d_i(x)$ for $i < j$. For $1 \le j \le i \le s$ we see that $b_{ij}(x)$ is an arbitrary polynomial over $F$ with $\deg b_{ij}(x) < \deg d_j(x)$, and so $\dim V_A(i, j) = \deg d_j(x)$ for $j \le i$. The general rule is therefore

$$\dim V_A(i, j) = \deg d_{\min\{i,j\}}(x) \quad \text{for } 1 \le i, j \le s$$

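and the $s \times s$ matrix with $(i, j)$-entry $\dim V_A(i, j)$ is

$$\begin{pmatrix}
\deg d_1(x) & \deg d_1(x) & \deg d_1(x) & \cdots & \deg d_1(x) \\
\deg d_1(x) & \deg d_2(x) & \deg d_2(x) & \cdots & \deg d_2(x) \\
\deg d_1(x) & \deg d_2(x) & \deg d_3(x) & \cdots & \deg d_3(x) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\deg d_1(x) & \deg d_2(x) & \deg d_3(x) & \cdots & \deg d_s(x)
\end{pmatrix}.$$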

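There are $2s - 1$ entries $\deg d_1(x)$ in the above matrix, $2s - 3$ entries $\deg d_2(x)$ and more generally there are $2s - 2i + 1$ entries $\deg d_i(x)$ for $1 \le i \le s$, namely the $s - i + 1$ entries of row $i$ on or to the right of the diagonal together with the $s - i$ entries of column $i$ below the diagonal. As $\dim V_A$ is the sum of the entries in the above matrix we obtain

$$\dim V_A = \sum_{i=1}^{s}(2s - 2i + 1)\deg d_i(x)$$

which is also the dimension of $Z(A)$. □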
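Corollary 6.34 can be tested numerically on small examples. The Python sketch below is an illustration under stated assumptions rather than part of the theory: it builds $A$ as a direct sum of companion matrices, computes $\dim Z(A)$ over $\mathbb{Q}$ as $t^2$ minus the rank of the linear map $B \mapsto AB - BA$, and compares the answer with $\sum_{i}(2s - 2i + 1)\deg d_i(x)$. The companion-matrix convention used (ones on the subdiagonal, negated coefficients in the last column) is an assumption of the sketch; any two of the usual conventions give similar matrices, so $\dim Z(A)$ is unaffected. A monic polynomial $x^n + a_{n-1}x^{n-1} + \cdots + a_0$ is passed as the list $[a_0, \ldots, a_{n-1}]$, and the function names are hypothetical.

```python
from fractions import Fraction


def companion(coeffs):
    """Companion matrix of x^n + coeffs[n-1] x^(n-1) + ... + coeffs[0].

    Assumed convention: ones on the subdiagonal, -coeffs in the last column.
    """
    n = len(coeffs)
    C = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = Fraction(1)
    for i in range(n):
        C[i][n - 1] = Fraction(-coeffs[i])
    return C


def direct_sum(blocks):
    """Block-diagonal matrix with the given square blocks down the diagonal."""
    t = sum(len(b) for b in blocks)
    M = [[Fraction(0)] * t for _ in range(t)]
    k = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                M[k + i][k + j] = entry
        k += len(b)
    return M


def rank(vectors):
    """Rank over Q of a list of equal-length vectors, by Gauss-Jordan elimination."""
    rows = [list(v) for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def centraliser_dim(A):
    """dim Z(A) = t^2 - rank of B -> AB - BA, evaluated on the matrix units E_pq."""
    t = len(A)
    images = []
    for p in range(t):
        for q in range(t):
            img = [[Fraction(0)] * t for _ in range(t)]
            for i in range(t):
                img[i][q] += A[i][p]  # column q of A E_pq is column p of A
                img[p][i] -= A[q][i]  # row p of E_pq A is row q of A
            images.append([entry for row in img for entry in row])
    return t * t - rank(images)


def frobenius_dim(degrees):
    """Corollary 6.34: sum of (2s - 2i + 1) deg d_i(x), where d_1 | d_2 | ... | d_s."""
    s = len(degrees)
    return sum((2 * s - 2 * i + 1) * d for i, d in enumerate(degrees, start=1))


# Invariant factors x, x^2, that is, A = C(x) ⊕ C(x^2)
A = direct_sum([companion([0]), companion([0, 0])])
print(centraliser_dim(A), frobenius_dim([1, 2]))  # 5 5
```

Exact rational arithmetic is used so that the rank is not spoiled by rounding. Since the sketch row-reduces a $t^2 \times t^2$ matrix it is only intended for small $t$; for instance $A = C(x^2 + 1) \oplus C(x^2 + 1)$ over $\mathbb{Q}$ should give $3 \times 2 + 1 \times 2 = 8$.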

For example suppose a $16 \times 16$ matrix $A$ over a field $F$ has invariant factors $x^2$, $x^2(x + 1)$, $x^3(x + 1)^2$, $x^4(x + 1)^2$. So $s = 4$ and

$$\dim Z(A) = 7 \times 2 + 5 \times 3 + 3 \times 5 + 1 \times 6 = 50$$

by Corollary 6.34, showing that the matrices which commute with $A$ are the elements of the 50-dimensional algebra $Z(A)$. The structure of a typical matrix in $Z(A)$ for $A$ in rcf is the subject of Exercises 6.3, Question 5(d). However, as we’ll see, it’s necessary to use the primary decomposition Theorem 6.12 of $M(A)$ to fully describe the similarity class of $A$. In this case $A \sim A_1 \oplus A_2$ where $A_1$ is $11 \times 11$ with invariant factors $x^2, x^2, x^3, x^4$ and $A_2$ is $5 \times 5$ with invariant factors $x + 1, (x + 1)^2, (x + 1)^2$ by Definition 6.14. Then $Z(A) \cong Z(A_1) \oplus Z(A_2)$ and using Corollary 6.34 we see $\dim Z(A_1) = 37$, $\dim Z(A_2) = 13$. Using Theorem 6.37 we’ll be able to specify the invertible matrices in $Z(A_1)$ and $Z(A_2)$ and hence construct the subgroup

$$H_A = G \cap Z(A) = U(Z(A)) \cong U(Z(A_1)) \times U(Z(A_2))$$

of Theorem 6.29.
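The arithmetic of this example, including the split into primary parts, can be reproduced in a few lines. In the Python sketch below (an illustration only; the dictionary encoding and the function names are assumptions of the sketch, not notation from the text) each invariant factor is a dictionary sending a label for an irreducible factor to its exponent, and Corollary 6.34 is evaluated for $A$, $A_1$ and $A_2$.

```python
def frobenius_dim(degrees):
    """Corollary 6.34 for invariant factors of the given degrees (d_1 | ... | d_s)."""
    s = len(degrees)
    return sum((2 * s - 2 * i + 1) * d for i, d in enumerate(degrees, start=1))


def degree(factor, prime_degrees):
    """Degree of a product of prime powers, e.g. {"x": 2, "x+1": 1} has degree 3."""
    return sum(prime_degrees[p] * e for p, e in factor.items())


def primary_part(invariant_factors, p):
    """Invariant factors of the p-primary part: the non-trivial powers of p."""
    return [{p: f[p]} for f in invariant_factors if f.get(p, 0) > 0]


prime_degrees = {"x": 1, "x+1": 1}
# Invariant factors x^2, x^2(x+1), x^3(x+1)^2, x^4(x+1)^2 of the 16 x 16 matrix A
inv = [{"x": 2}, {"x": 2, "x+1": 1}, {"x": 3, "x+1": 2}, {"x": 4, "x+1": 2}]

dim_A = frobenius_dim([degree(f, prime_degrees) for f in inv])
dims = {p: frobenius_dim([degree(f, prime_degrees) for f in primary_part(inv, p)])
        for p in prime_degrees}
print(dim_A, dims)  # 50 {'x': 37, 'x+1': 13}
```

The agreement $50 = 37 + 13$ reflects the decomposition $Z(A) \cong Z(A_1) \oplus Z(A_2)$.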

We now discuss automorphisms of $M(A)$ in detail. Our next lemma involves the adjugate matrix $\operatorname{adj} B(x)$ introduced in Section 1.3.

Lemma 6.35

Let $A$ be a $t \times t$ matrix over a field $F$ with invariant factors $d_1(x), d_2(x), \ldots, d_s(x)$. Let $B(x) = (b_{ij}(x))$ represent the endomorphism $\beta$ of $M(A)$ as in Definition 6.30. Then $\operatorname{adj} B(x)$ belongs to the ring $R_A$.

Suppose $\gcd\{\det B(x), \chi_A(x)\} = 1$. Then $\beta$ is an automorphism of $M(A)$.

Conversely suppose $\beta$ is an automorphism of $M(A)$ and $\chi_A(x) = p(x)^n$ where $p(x)$ is irreducible over $F$. Then $\gcd\{\det B(x), \chi_A(x)\} = 1$.

Proof

Write $S = \{1, 2, \ldots, s\}$ and let $i_0, j_0 \in S$. The $(i_0, j_0)$-entry in $\operatorname{adj} B(x)$ is the cofactor $B(x)_{j_0i_0}$, which is (apart from sign) the determinant of the $(s - 1) \times (s - 1)$ matrix which remains on deleting row $j_0$ and column $i_0$ from $B(x)$. We show $d_{i_0}(x)B(x)_{j_0i_0} \equiv 0 \pmod{d_{j_0}(x)}$, which is the endomorphism condition, as in Theorem 6.32, for $\operatorname{adj} B(x)$. Now $B(x)_{j_0i_0}$ is the sum of $(s - 1)!$ terms $\pm t_\pi(x)$, one for each permutation $\pi : S \to S$ with $(j_0)\pi = i_0$, where $t_\pi(x) = \prod_{j \ne j_0} b_{j\,(j)\pi}(x)$. As $d_{i_0}(x)B(x)_{j_0i_0} \equiv 0 \pmod{d_{j_0}(x)}$ holds for $i_0 \ge j_0$, since $d_{i_0}(x) \equiv 0 \pmod{d_{j_0}(x)}$, we assume $i_0 < j_0$. As $(j_0)\pi = i_0$ the integers $i_0$ and $j_0$ belong to the same cycle in the cycle decomposition of the permutation $\pi$ (we trust that the reader is familiar with the resolution of a permutation of $S$ into disjoint cycles). Let $l$ be the smallest positive integer with $(i_0)\pi^l = j_0$. Then $i_0, i_1, \ldots, i_l$ are $l + 1$ distinct integers in $S$ where $i_k = (i_0)\pi^k$ for $1 \le k \le l$. So $(i_{k-1})\pi = i_k$ for $1 \le k \le l$ and $i_l = j_0$, that is, $(i_l)\pi = i_0$. Then $t_\pi(x)$ has factor $f_\pi(x) = b_{i_0i_1}(x)b_{i_1i_2}(x)\cdots b_{i_{l-1}i_l}(x)$. The matrix $B(x)$ belongs to the ring $R_A$ and so $d_{i_{k-1}}(x)b_{i_{k-1}i_k}(x) \equiv 0 \pmod{d_{i_k}(x)}$ for $1 \le k \le l$ by Theorem 6.32. Using induction on $k$ (for the inductive step write $d_{i_0}(x)b_{i_0i_1}(x)\cdots b_{i_{k-2}i_{k-1}}(x) = g(x)d_{i_{k-1}}(x)$ and multiply by $b_{i_{k-1}i_k}(x)$) we obtain $d_{i_0}(x)b_{i_0i_1}(x)b_{i_1i_2}(x)\cdots b_{i_{k-1}i_k}(x) \equiv 0 \pmod{d_{i_k}(x)}$, which is a local climax! It is now downhill: taking $k = l$ gives $d_{i_0}(x)f_\pi(x) \equiv 0 \pmod{d_{j_0}(x)}$. Therefore $d_{i_0}(x)t_\pi(x) \equiv 0 \pmod{d_{j_0}(x)}$ for all $(s - 1)!$ permutations $\pi$ of $S$ with $(j_0)\pi = i_0$, since $f_\pi(x) \mid t_\pi(x)$. As $B(x)_{j_0i_0}$ is a sum of terms $\pm t_\pi(x)$ we conclude $d_{i_0}(x)B(x)_{j_0i_0} \equiv 0 \pmod{d_{j_0}(x)}$, as we set out to show. So $\operatorname{adj} B(x) \in R_A$ by Theorem 6.32.


Suppose $\gcd\{\det B(x), \chi_A(x)\} = 1$. By Corollary 4.6 there are $a(x), b(x) \in F[x]$ with $a(x)\det B(x) + b(x)\chi_A(x) = 1$. By the first part of the proof $a(x)\operatorname{adj} B(x)$ belongs to $R_A$. Write $(a(x)\operatorname{adj} B(x))\varphi = \gamma$ where $\varphi : R_A \to \operatorname{End} M(A)$ is the ring homomorphism of Theorem 6.33. So the endomorphism $\gamma$ of $M(A)$ satisfies $\beta\gamma = (B(x))\varphi(a(x)\operatorname{adj} B(x))\varphi = (a(x)B(x)\operatorname{adj} B(x))\varphi = (a(x)\det B(x)\,I)\varphi$ using the familiar property of the adjugate matrix (see before Theorem 1.18). Now $\chi_A(x)I = d_1(x)d_2(x)\cdots d_s(x)I \in \ker\varphi$ by Corollary 6.7 and Theorem 6.33. Therefore

$$(a(x)\det B(x)\,I)\varphi = \big((1 - b(x)\chi_A(x))I\big)\varphi = (I)\varphi - (b(x)\chi_A(x)I)\varphi = (I)\varphi = \iota$$

showing $\beta\gamma = \iota$, the identity automorphism of $M(A)$. In the same way $\gamma\beta = \iota$ and so $\beta$ has inverse $\gamma$, that is, $\beta \in \operatorname{Aut} M(A)$.

Let the automorphism $\beta = (B(x))\varphi$ have inverse $\gamma = (C(x))\varphi$. Then $B(x)C(x)$ belongs to the coset $I + \ker\varphi$. So $B(x)C(x) = I + K(x)$ where $K(x) \in \ker\varphi$, and so each entry in $K(x)$ is divisible by $d_1(x)$ using Theorem 6.33 (the entries of column $j$ are divisible by $d_j(x)$, and $d_1(x) \mid d_j(x)$). By hypothesis $\chi_A(x) = p(x)^n$. We see $\det B(x)\det C(x) = \det(I + K(x)) \equiv 1 \pmod{p(x)}$ as $p(x) \mid d_1(x)$, and so $p(x)$ cannot be a divisor of $\det B(x)$. As $p(x)$ is irreducible over $F$ all monic divisors of $p(x)^n$ except 1 are divisible by $p(x)$. So $\gcd\{\det B(x), \chi_A(x)\} = \gcd\{\det B(x), p(x)^n\} = 1$. □

We now suppose that the $s$ invariant factors of $A$ are all equal to $p(x)^l$ where $p(x)$ is irreducible over $F$. A consequence of Theorem 6.33 is that $\operatorname{End} M(A)$ is isomorphic to the ring $M_s(F[x]/\langle p(x)^l\rangle)$. Here we ask: how can the automorphisms $\beta$ of $M(A)$ be found? The reader should take heart from the fact that the case $l = 1$ has already been dealt with completely in Theorem 6.26 and involved the extension of $F$ to $E = F[x]/\langle p(x)\rangle$. As explained after Theorem 4.9 it is convenient to write $c = x + \langle p(x)\rangle$ and so $E = F(c)$ where $p(c) = 0$. A typical element of $E$ becomes $f(c) = f(x) + \langle p(x)\rangle$ where $f(x) \in F[x]$, and so at the wave of a magic wand the coset notation miraculously disappears! Also the natural ring homomorphism $\eta : F[x] \to E$ and the ‘evaluation at $c$’ homomorphism $\varepsilon_c : F[x] \to E$ coincide as

$$(f(x))\eta = f(x) + \langle p(x)\rangle = f(c) = (f(x))\varepsilon_c \quad \text{for all } f(x) \in F[x],$$

that is, $\eta = \varepsilon_c$. We extend $\varepsilon_c$ to $\varepsilon_c : M_s(F[x]) \to M_s(E)$, a surjective homomorphism of matrix rings, by writing $(B(x))\varepsilon_c = (b_{jk}(c))$ for all $s \times s$ matrices $B(x) = (b_{jk}(x))$ over $F[x]$.

To get used to these ideas and as preparation for the next theorem we work through two examples. First suppose $s = 2$, $d_1(x) = d_2(x) = x^3$ and so $p(x) = x$, the field $F$ being arbitrary (any field whatsoever), and $A = C(x^3) \oplus C(x^3)$. Notice $f(x) \equiv f(0) \pmod{x}$ as $x \mid (f(x) - f(0))$. In this case $E = F[x]/\langle x\rangle = F$ on identifying the coset $f(x) + \langle x\rangle$ with the scalar $f(0)$. Then $\eta : F[x] \to F$ is given by $(f(x))\eta = f(0) = (f(x))\varepsilon_0$ for all $f(x) \in F[x]$ and so $\eta = \varepsilon_0$. Every $2 \times 2$ matrix over $F[x]$ satisfies the endomorphism condition Definition 6.31 as the invariant factors of the $6 \times 6$ matrix $A$ are equal. A typical reduced matrix in $R_A = M_2(F[x])$ is

$$B(x) = \begin{pmatrix} a_{11} + b_{11}x + c_{11}x^2 & a_{12} + b_{12}x + c_{12}x^2 \\ a_{21} + b_{21}x + c_{21}x^2 & a_{22} + b_{22}x + c_{22}x^2 \end{pmatrix}$$

and

$$(B(x))\varepsilon_0 = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \in M_2(F).$$

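As $\chi_A(x) = x^6$ it follows from Lemma 6.35 that $(B(x))\varphi$ is an automorphism of $M(A)$ if and only if $\gcd\{\det B(x), x\} = 1$. But $\det B(x) \equiv \det((B(x))\varepsilon_0) \pmod{x}$ and so

$$(B(x))\varphi \in \operatorname{Aut} M(A) \iff \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \ne 0.$$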

Notice that the scalars $b_{ij}$ and $c_{ij}$ do not feature in the above condition and so are arbitrary. In the case of a finite field $F$ of order $q$ we see

$$|\operatorname{Aut} M(A)| = q^8|GL_2(F)| = q^8(q^2 - 1)(q^2 - q)$$

as there are $q$ choices for each of the 8 scalars $b_{ij}, c_{ij}$ and $|GL_2(F)|$ choices for the invertible matrix $(B(x))\varepsilon_0$ over $F$. By Theorem 6.29 the size of the similarity class of $A$ is

$$\begin{aligned}
|GL_6(F)|/|\operatorname{Aut} M(A)| &= (q^6 - 1)(q^6 - q)(q^6 - q^2)(q^6 - q^3)(q^6 - q^4)(q^6 - q^5)\big/\big(q^8(q^2 - 1)(q^2 - q)\big) \\
&= (q^6 - 1)(q^6 - q)(q^6 - q^2)(q^6 - q^3)
\end{aligned}$$

since $(q^6 - q^4)(q^6 - q^5) = q^9(q^2 - 1)(q - 1) = q^8(q^2 - 1)(q^2 - q)$.
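For $F = \mathbb{Z}_2$ the count $|\operatorname{Aut} M(A)| = 2^8|GL_2(\mathbb{Z}_2)| = 1536$ can be cross-checked by brute force, using the group isomorphism $\operatorname{Aut} M(A) \cong U(Z(A))$ of Exercises 6.3, Question 2(b): solve $AB = BA$ over $\mathbb{Z}_2$, list every element of the centraliser and count the invertible ones. The Python sketch below does this with matrices stored as bitmasks. The companion-matrix convention, the bitmask encoding and the function names are assumptions of the sketch, and listing all $2^{\dim Z(A)}$ elements is only sensible for small examples.

```python
from itertools import product


def companion_f2(coeffs):
    """Companion matrix over Z2 of x^n + coeffs[n-1] x^(n-1) + ... + coeffs[0].

    Assumed convention: ones on the subdiagonal, coefficients in the last column
    (over Z2 the minus sign is irrelevant).
    """
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1
    for i in range(n):
        C[i][n - 1] = coeffs[i] % 2
    return C


def direct_sum(blocks):
    t = sum(len(b) for b in blocks)
    M = [[0] * t for _ in range(t)]
    k = 0
    for b in blocks:
        for i, row in enumerate(b):
            M[k + i][k:k + len(b)] = row
        k += len(b)
    return M


def centraliser_basis_f2(A):
    """Basis of Z(A) over Z2; a t x t matrix is a bitmask, entry (p, q) = bit p*t + q."""
    t = len(A)
    equations = []  # one linear equation (AB - BA)_ij = 0 per entry (i, j)
    for i in range(t):
        for j in range(t):
            mask = 0
            for k in range(t):
                if A[i][k]:
                    mask ^= 1 << (k * t + j)  # term A_ik B_kj of (AB)_ij
                if A[k][j]:
                    mask ^= 1 << (i * t + k)  # term B_ik A_kj of (BA)_ij
            equations.append(mask)
    pivots = {}  # reduced row echelon form: pivot bit -> row
    for eq in equations:
        for col, row in pivots.items():
            if eq >> col & 1:
                eq ^= row
        if eq:
            col = eq.bit_length() - 1
            for c in pivots:
                if pivots[c] >> col & 1:
                    pivots[c] ^= eq
            pivots[col] = eq
    basis = []
    for f in range(t * t):  # one basis vector per free variable
        if f in pivots:
            continue
        v = 1 << f
        for col, row in pivots.items():
            if row >> f & 1:
                v |= 1 << col
        basis.append(v)
    return basis


def is_invertible_f2(mask, t):
    """Gaussian elimination over Z2 on the rows of the t x t matrix encoded by mask."""
    rows = [(mask >> (p * t)) & ((1 << t) - 1) for p in range(t)]
    for c in range(t):
        pivot = next((r for r in range(c, t) if rows[r] >> c & 1), None)
        if pivot is None:
            return False
        rows[c], rows[pivot] = rows[pivot], rows[c]
        for r in range(t):
            if r != c and rows[r] >> c & 1:
                rows[r] ^= rows[c]
    return True


def count_units_in_centraliser(A):
    """|U(Z(A))| by listing all 2^dim elements of Z(A); only feasible for small dim."""
    t = len(A)
    basis = centraliser_basis_f2(A)
    count = 0
    for coeffs in product((0, 1), repeat=len(basis)):
        mask = 0
        for a, b in zip(coeffs, basis):
            if a:
                mask ^= b
        count += is_invertible_f2(mask, t)
    return count


# A = C(x^3) ⊕ C(x^3) over Z2: dim Z(A) = 12 and |Aut M(A)| = 2^8 |GL_2(Z_2)| = 1536
A = direct_sum([companion_f2([0, 0, 0]), companion_f2([0, 0, 0])])
print(len(centraliser_basis_f2(A)), count_units_in_centraliser(A))  # 12 1536
```

Applied to $A = C(x) \oplus C(x^2)$, that is, to `direct_sum([companion_f2([0]), companion_f2([0, 0])])`, the same function counts 8 invertible elements of the 5-dimensional centraliser.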

As a second example let $A = C(p(x)^2) \oplus C(p(x)^2)$ where $p(x) = x^2 + x + 1$ over $F = \mathbb{Z}_2$. So $t = 8$, $s = 2$, $d_1(x) = d_2(x) = x^4 + x^2 + 1$ and $\chi_A(x) = p(x)^4 = x^8 + x^4 + 1$. In this case $E = \mathbb{Z}_2[x]/\langle x^2 + x + 1\rangle$ is a field of order 4, the elements of $E$ corresponding to the 4 remainders on division of $f(x) \in \mathbb{Z}_2[x]$ by $p(x)$. Here $E = F(c) = \{0, 1, c, 1 + c\}$ where $p(c) = c^2 + c + 1 = 0$. As above, every matrix $B(x) \in M_2(\mathbb{Z}_2[x])$ satisfies the endomorphism condition. A typical reduced matrix in $M_2(\mathbb{Z}_2[x]) = R_A$ is

$$B(x) = \begin{pmatrix} q_{11}(x)p(x) + r_{11}(x) & q_{12}(x)p(x) + r_{12}(x) \\ q_{21}(x)p(x) + r_{21}(x) & q_{22}(x)p(x) + r_{22}(x) \end{pmatrix}$$

where $\deg q_{ij}(x) < 2$ and $\deg r_{ij}(x) < 2$ for $1 \le i, j \le 2$, and

$$(B(x))\varepsilon_c = \begin{pmatrix} r_{11}(c) & r_{12}(c) \\ r_{21}(c) & r_{22}(c) \end{pmatrix} \in M_2(E).$$

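As $\chi_A(x) = p(x)^4$ we see from Lemma 6.35 that $(B(x))\varphi$ is an automorphism of $M(A)$ if and only if $\gcd\{\det B(x), p(x)\} = 1$, that is, $\det B(x) \not\equiv 0 \pmod{p(x)}$. But

$$\det B(x) \equiv \begin{vmatrix} r_{11}(x) & r_{12}(x) \\ r_{21}(x) & r_{22}(x) \end{vmatrix} \pmod{p(x)}$$

and

$$\begin{vmatrix} r_{11}(x) & r_{12}(x) \\ r_{21}(x) & r_{22}(x) \end{vmatrix} \not\equiv 0 \pmod{p(x)} \iff \det((B(x))\varepsilon_c) \ne 0.$$

Therefore

$$(B(x))\varphi \in \operatorname{Aut} M(A) \iff \det((B(x))\varepsilon_c) \ne 0.$$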

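For example $\begin{pmatrix} x & x \\ x & x^2 \end{pmatrix}\varphi$ is an automorphism of $M(A)$ as

$$\begin{vmatrix} c & c \\ c & c^2 \end{vmatrix} = c^3 + c^2 = 1 + (1 + c) = c \ne 0$$

(using $c^2 = 1 + c$ and $c^3 = c + c^2 = 1$ in $E$), but $\begin{pmatrix} x & 1 + x \\ 1 & x \end{pmatrix}\varphi$ is not an automorphism of $M(A)$ as

$$\begin{vmatrix} c & 1 + c \\ 1 & c \end{vmatrix} = 1 + c + c^2 = 0.$$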
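The two determinants above can be checked by machine. In the Python sketch below (an illustration; the representation is an assumption of the sketch) an element $a + bc$ of $E = \mathbb{Z}_2(c)$ is stored as the pair $(a, b)$ with $a, b \in \{0, 1\}$, and products are reduced using $c^2 = c + 1$, which is $p(c) = 0$ rewritten over $\mathbb{Z}_2$.

```python
def add(u, v):
    """Sum in E = Z2(c): coefficients add mod 2."""
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)


def mul(u, v):
    """(a + bc)(a' + b'c) = aa' + (ab' + a'b)c + bb'c^2, with c^2 = 1 + c."""
    a, b = u
    a2, b2 = v
    const = a * a2 + b * b2        # bb'c^2 contributes bb' to the constant term
    lin = a * b2 + a2 * b + b * b2  # ... and bb' to the coefficient of c
    return (const % 2, lin % 2)


def det2(m):
    """2 x 2 determinant; subtraction equals addition in characteristic 2."""
    return add(mul(m[0][0], m[1][1]), mul(m[0][1], m[1][0]))


ZERO, ONE, C = (0, 0), (1, 0), (0, 1)
C2 = mul(C, C)  # c^2 = 1 + c
ONE_PLUS_C = add(ONE, C)

print(det2([[C, C], [C, C2]]))            # (0, 1), that is, c
print(det2([[C, ONE_PLUS_C], [ONE, C]]))  # (0, 0), that is, 0
```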

The polynomials $q_{ij}(x)$ are not mentioned in the above condition and so are arbitrary. In the case of an automorphism $(B(x))\varphi$ of $M(A)$ there are 4 possibilities for each of the 4 polynomials $q_{ij}(x)$ and $|GL_2(E)| = (4^2 - 1)(4^2 - 4)$ possibilities for $(B(x))\varepsilon_c \in GL_2(E)$. Therefore

$$|\operatorname{Aut} M(A)| = 4^4(4^2 - 1)(4^2 - 4) = 2^{10} \times 3^2 \times 5.$$

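By Theorem 6.29 the number of matrices in the similarity class of $A$ is

$$\begin{aligned}
|GL_8(\mathbb{Z}_2)|/|\operatorname{Aut} M(A)| &= (2^8 - 1)(2^8 - 2)(2^8 - 2^2)(2^8 - 2^3)(2^8 - 2^4) \\
&\quad \times (2^8 - 2^5)(2^8 - 2^6)(2^8 - 2^7)/|\operatorname{Aut} M(A)| \\
&= 2^{18} \times 3^3 \times 5 \times 7^2 \times 17 \times 31 \times 127.
\end{aligned}$$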


Theorem 6.36

Let $A$ be a $t \times t$ matrix over a field $F$ having $s$ equal invariant factors $p(x)^l$ where $p(x)$ is irreducible of degree $m$ over $F$. Then $m = t/(ls)$ and $R_A = M_s(F[x])$. Write $E = F(c)$ where $p(c) = 0$. Let $B(x) = (b_{ij}(x))$ represent the endomorphism $\beta$ of $M(A)$ as in Definition 6.30 and write $B(c) = (b_{ij}(c))$. Then

$$\beta \in \operatorname{Aut} M(A) \iff \det B(c) \ne 0 \iff B(c) \in GL_s(E).$$

Let $F$ be a finite field of order $q$. Then $|\operatorname{End} M(A)| = q^{ms^2l}$ and

$$|\operatorname{Aut} M(A)| = q^{ms^2(l-1)}|GL_s(E)| = q^{ms^2(l-1)}(q^{ms} - 1)(q^{ms} - q^m)\cdots(q^{ms} - q^{m(s-1)}).$$

There are exactly $|GL_t(F)|/|\operatorname{Aut} M(A)|$ matrices similar to $A$.

Proof

As the invariant factors of $A$ are equal, the endomorphism condition Definition 6.31 is satisfied by all $s \times s$ matrices $B(x)$ over $F[x]$, that is, $R_A = M_s(F[x])$. From Theorem 6.5 the characteristic polynomial is the product of the invariant factors. Here $\chi_A(x) = (p(x)^l)^s = p(x)^{ls}$. Equating degrees gives $t = mls$ and so $m = t/(ls)$. We know $\beta \in \operatorname{Aut} M(A) \iff \gcd\{\det B(x), p(x)\} = 1$ by Lemma 6.35, and $\gcd\{\det B(x), p(x)\} = 1 \iff \det B(x) \not\equiv 0 \pmod{p(x)}$ as $p(x)$ is irreducible over $F$. The field $E = F(c)$ where $p(c) = 0$ is tailor-made for the job of determining whether or not a polynomial $f(x)$ over $F$ is divisible by $p(x)$: in fact

$$f(x) \not\equiv 0 \pmod{p(x)} \iff f(c) \ne 0.$$

The ‘evaluation at $c$’ ring homomorphism $\varepsilon_c : F[x] \to F(c) = E$ has $\operatorname{im}\varepsilon_c = E$ and $\ker\varepsilon_c = \langle p(x)\rangle$. Taking $f(x) = \det B(x)$ and using $\varepsilon_c$ gives $(\det B(x))\varepsilon_c = (\det(b_{ij}(x)))\varepsilon_c = \det((b_{ij}(x))\varepsilon_c) = \det B(c)$. Putting the pieces together shows $\beta \in \operatorname{Aut} M(A) \iff \det B(x) \not\equiv 0 \pmod{p(x)} \iff \det B(c) \ne 0$. As $B(c)$ is an $s \times s$ matrix over the field $E$ we conclude $\det B(c) \ne 0 \iff B(c) \in GL_s(E)$.

Assume now that $B(x)$ is reduced, that is, $\deg b_{ij}(x) < \deg p(x)^l = ml$ for $1 \le i, j \le s$. On dividing $b_{ij}(x)$ by $p(x)$ using Theorem 4.1, there are polynomials $q_{ij}(x)$ and $r_{ij}(x)$ over $F$ with $b_{ij}(x) = q_{ij}(x)p(x) + r_{ij}(x)$ where $\deg q_{ij}(x) < m(l - 1)$ and $\deg r_{ij}(x) < m$ for $1 \le i, j \le s$. As $p(c) = 0$ we see $b_{ij}(c) = r_{ij}(c) \in E$ for $1 \le i, j \le s$. So whether or not $\beta$ is an automorphism of $M(A)$ depends on the remainders $r_{ij}(x)$ but not on the quotients $q_{ij}(x)$.

Let $S(m) = \{(r_{ij}(x)) \in M_s(F[x]) : \deg r_{ij}(x) < m\}$. So the elements of $S(m)$ are all $s \times s$ matrices over $F[x]$ having entries of degree less than $m = \deg p(x)$. It is straightforward to show that $S(m)$ is a subgroup of the additive group of $M_s(F[x])$. The ring homomorphism $\varepsilon_c : M_s(F[x]) \to M_s(E)$ has kernel consisting of the $s \times s$ matrices over $F[x]$ having all entries divisible by $p(x)$. Therefore $S(m) \cap \ker\varepsilon_c = \{0\}$, that is, the only matrix belonging to both $S(m)$ and $\ker\varepsilon_c$ is the zero matrix. Also every element of $E$ has the form $r(c)$ with $\deg r(x) < m$, so $\varepsilon_c$ maps $S(m)$ onto $M_s(E)$. As a consequence the restriction of $\varepsilon_c$ to $S(m)$ is an isomorphism $\varepsilon_c|_{S(m)} : S(m) \cong M_s(E)$ of additive abelian groups.

Suppose $|F| = q$. By Corollary 6.34 we know

$$\dim \operatorname{End} M(A) = \sum_{j=1}^{s}(2s - 2j + 1)ml = s^2ml$$

as the terms are in arithmetic progression. So $|\operatorname{End} M(A)| = q^{s^2ml}$.

Let $\beta \in \operatorname{Aut} M(A)$. There are $q^{m(l-1)}$ possibilities for each of the $s^2$ polynomials $q_{ij}(x)$, namely any polynomial of degree less than $m(l - 1)$ over $F$. By the above paragraph and the first part of the proof there are $|GL_s(E)|$ possibilities for the $s \times s$ matrix $(r_{ij}(x))$ of remainders, namely those with $(r_{ij}(x))\varepsilon_c = (r_{ij}(c)) \in GL_s(E)$. These choices are independent of each other and so $|\operatorname{Aut} M(A)| = q^{ms^2(l-1)}|GL_s(E)|$, since $\beta$ is represented by a unique reduced matrix using Theorem 6.32. As $|E| = q^m$ the formula for $|GL_s(E)|$ after Lemma 2.18 and Theorem 6.29 combine to complete the proof. □
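The formulae of Theorem 6.36 are easily evaluated by machine. The Python sketch below (function names hypothetical) computes $|GL_n|$ over a field of order $Q$, then $|\operatorname{Aut} M(A)|$ and the size of the similarity class for $s$ equal invariant factors $p(x)^l$ with $\deg p(x) = m$ over a field of order $q$. It reproduces the numbers found in the two examples before the theorem.

```python
def gl_order(n, Q):
    """|GL_n(K)| for a field K of order Q: (Q^n - 1)(Q^n - Q)...(Q^n - Q^(n-1))."""
    result = 1
    for i in range(n):
        result *= Q**n - Q**i
    return result


def aut_equal_factors(q, m, l, s):
    """Theorem 6.36: |Aut M(A)| = q^(m s^2 (l-1)) |GL_s(E)|, where |E| = q^m."""
    return q ** (m * s * s * (l - 1)) * gl_order(s, q**m)


def class_size_equal_factors(q, m, l, s):
    """|GL_t(F)| / |Aut M(A)| with t = mls: the number of matrices similar to A."""
    size, remainder = divmod(gl_order(m * l * s, q), aut_equal_factors(q, m, l, s))
    assert remainder == 0  # U(Z(A)), isomorphic to Aut M(A), is a subgroup of GL_t(F)
    return size


# A = C(x^3) ⊕ C(x^3) over Z2: p(x) = x, m = 1, l = 3, s = 2
print(aut_equal_factors(2, 1, 3, 2))  # 1536
# A = C(p(x)^2) ⊕ C(p(x)^2), p(x) = x^2 + x + 1 over Z2: m = 2, l = 2, s = 2
print(aut_equal_factors(2, 2, 2, 2))  # 46080
```

For example `aut_equal_factors(2, 1, 2, 2)` gives $2^4 \times 6 = 96$, which is the order used for the class of $GL_4(\mathbb{Z}_2)$ with invariant factors $(x + 1)^2, (x + 1)^2$.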

The proof of our next theorem is due to the Japanese mathematician K. Shoda who in 1928 analysed the automorphisms of an arbitrary finite abelian $p$-group (Exercises 3.3, Question 5(c)). We now study the polynomial analogue, namely $\operatorname{Aut} M(A)$ where $\chi_A(x)$ is a power of an irreducible polynomial $p(x)$ over $F$. The case $A = C(x) \oplus C(x^2)$ over an arbitrary field $F$, discussed after Definition 6.31, is the smallest example covered by Theorem 6.37 but not by Theorem 6.36.

Theorem 6.37

For $1 \le i \le r$ write $A_i = C(p(x)^{l_i}) \oplus C(p(x)^{l_i}) \oplus \cdots \oplus C(p(x)^{l_i})$ ($s_i$ terms) where $p(x)$ is irreducible of degree $m$ over a field $F$ and $l_1 < l_2 < \cdots < l_r$. Write $A = A_1 \oplus A_2 \oplus \cdots \oplus A_r$ and $s = s_1 + s_2 + \cdots + s_r$. Each $s \times s$ matrix $B(x)$ in the ring $R_A$ (Theorem 6.32) partitions into $s_i \times s_j$ submatrices $B_{ij}(x)$ $(1 \le i, j \le r)$ as shown:

$$B(x) = \begin{pmatrix}
B_{11}(x) & B_{12}(x) & \cdots & B_{1r}(x) \\
B_{21}(x) & B_{22}(x) & \cdots & B_{2r}(x) \\
\vdots & \vdots & \ddots & \vdots \\
B_{r1}(x) & B_{r2}(x) & \cdots & B_{rr}(x)
\end{pmatrix}$$

where each entry in $B_{ij}(x)$ is divisible by $p(x)$ for $i < j$. Also

$$(B(x))\varphi \in \operatorname{Aut} M(A) \iff (B_{ii}(x))\varphi \in \operatorname{Aut} M(A_i) \quad \text{for } 1 \le i \le r.$$

The dimension of $\operatorname{End} M(A)$ over $F$ is $me$ where

$$e = \sum_{j=1}^{r} l_js_j\big(s_j + 2(s_{j+1} + s_{j+2} + \cdots + s_r)\big).$$

Let $F$ be a finite field of order $q$ and let $E = F(c)$ where $p(c) = 0$. Then

$$|\operatorname{End} M(A)| = |E|^e.$$

Write $k_i = |GL_{s_i}(E)|/q^{s_i^2m}$ for $1 \le i \le r$. Then

$$|\operatorname{Aut} M(A)| = k_1k_2\cdots k_r|\operatorname{End} M(A)|.$$

Proof

Write $n = l_1s_1 + l_2s_2 + \cdots + l_rs_r$. Then $A$ is the $mn \times mn$ matrix over $F$ in rcf with $s_1$ invariant factors $p(x)^{l_1}$, $s_2$ invariant factors $p(x)^{l_2}$, $\ldots$, $s_r$ invariant factors $p(x)^{l_r}$, and $\chi_A(x) = p(x)^n$. Consider the $(i', j')$-entry $b_{i'j'}(x)$ in $B(x)$. Where is this entry located in the above partition of $B(x)$? As $1 \le i', j' \le s$ there are $i, j$ with $1 \le i, j \le r$ such that $i' = s_1 + s_2 + \cdots + s_{i-1} + i''$, $1 \le i'' \le s_i$ and $j' = s_1 + s_2 + \cdots + s_{j-1} + j''$, $1 \le j'' \le s_j$. Therefore the $(i', j')$-entry in $B(x)$ is the $(i'', j'')$-entry of $B_{ij}(x)$. Then $d_{i'}(x) = p(x)^{l_i}$, that is, the $i'$th invariant factor of $A$ is $p(x)^{l_i}$. Also $d_{j'}(x) = p(x)^{l_j}$. The endomorphism condition Definition 6.31

$$d_{i'}(x)b_{i'j'}(x) \equiv 0 \pmod{d_{j'}(x)}$$

is satisfied for $i \ge j$ as then $l_i \ge l_j$ and so $d_{j'}(x) \mid d_{i'}(x)$. For $i < j$ the above endomorphism condition gives $p(x)^{l_j - l_i} \mid b_{i'j'}(x)$, and so from $l_i < l_j$ we deduce that all entries $b_{i'j'}(x)$ in $B_{ij}(x)$ are divisible by $p(x)$. Therefore $B(x)$ partitions as described above.

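The next step in the proof is crucial as it produces an important factorisation of $\det B(x) \pmod{p(x)}$. In fact, since $\det B(x)$ is a polynomial in the entries of $B(x)$ and every entry of $B_{ij}(x)$ with $i < j$ is congruent to 0 modulo $p(x)$,

$$\det B(x) \equiv \begin{vmatrix}
B_{11}(x) & 0 & \cdots & 0 \\
B_{21}(x) & B_{22}(x) & \ddots & \vdots \\
\vdots & \vdots & \ddots & 0 \\
B_{r1}(x) & B_{r2}(x) & \cdots & B_{rr}(x)
\end{vmatrix} \pmod{p(x)}$$

where all submatrices $B_{ij}(x)$ for $i < j$ have been replaced by zero matrices. By Exercises 5.1, Question 2(b)

$$\det B(x) \equiv \det B_{11}(x)\det B_{22}(x)\cdots\det B_{rr}(x) \pmod{p(x)}. \qquad \text{(❤)}$$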

Each $F[x]$-module $M(A_i)$ is of the type covered by Theorem 6.36 and so

$$(B_{ii}(x))\varphi \in \operatorname{Aut} M(A_i) \iff \det B_{ii}(x) \not\equiv 0 \pmod{p(x)} \quad \text{for } 1 \le i \le r.$$

Suppose $(B(x))\varphi \in \operatorname{Aut} M(A)$. As $\chi_A(x) = p(x)^n$ we see

$$\det B(x) \not\equiv 0 \pmod{p(x)}$$

by Lemma 6.35. Therefore $\det B_{ii}(x) \not\equiv 0 \pmod{p(x)}$ for all $i$ with $1 \le i \le r$ by (❤). So $(B_{ii}(x))\varphi \in \operatorname{Aut} M(A_i)$ for all $1 \le i \le r$. The converse is proved by reversing the steps.

The algebra $\operatorname{End} M(A)$ is a vector space over $F$ and has dimension

$$\dim \operatorname{End} M(A) = \sum_{j'=1}^{s}(2s - 2j' + 1)\deg d_{j'}(x)$$

by Corollary 6.34. The $s_j$ terms in this sum for $j' = s_1 + s_2 + \cdots + s_{j-1} + j''$, $1 \le j'' \le s_j$, are in arithmetic progression as $\deg d_{j'}(x) = \deg p(x)^{l_j} = ml_j$, and the average value of the coefficient $2s - 2j' + 1$ over these terms is $s_j + 2(s_{j+1} + s_{j+2} + \cdots + s_r)$. So the sum of these $s_j$ terms is $ml_js_j(s_j + 2(s_{j+1} + s_{j+2} + \cdots + s_r))$ and hence $\dim \operatorname{End} M(A) = me$ where $e = \sum_{j=1}^{r} l_js_j(s_j + 2(s_{j+1} + s_{j+2} + \cdots + s_r))$.

Suppose now that $F$ is a finite field with $|F| = q$. Then $|E| = |F(c)| = q^m$ by the discussion following Theorem 4.9. From the preceding paragraph

$$|\operatorname{End} M(A)| = q^{me} = |E|^e.$$

Suppose $B(x)$ to be reduced and $(B(x))\varphi \in \operatorname{Aut} M(A)$. By the earlier part of the proof $(B_{ii}(x))\varphi \in \operatorname{Aut} M(A_i)$ for $1 \le i \le r$. From Theorem 6.36

$$|\operatorname{Aut} M(A_i)|/|\operatorname{End} M(A_i)| = q^{ms_i^2(l_i-1)}|GL_{s_i}(E)|/q^{ms_i^2l_i} = |GL_{s_i}(E)|/q^{s_i^2m} = k_i$$

and so there are $k_iq^{ms_i^2l_i}$ choices for each matrix $B_{ii}(x)$. The number of choices for each $B_{ij}(x)$ with $i \ne j$ is the same whether or not $(B(x))\varphi \in \operatorname{Aut} M(A)$, namely $q^{ms_is_jl_j}$ for $i > j$ and $q^{ms_is_jl_i}$ for $i < j$. As these $r^2$ choices are independent, the number of reduced $B(x)$ with $(B(x))\varphi \in \operatorname{Aut} M(A)$ is their product, that is,

$$|\operatorname{Aut} M(A)| = k_1k_2\cdots k_r|\operatorname{End} M(A)|. \qquad \square$$
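Theorem 6.37 likewise turns $|\operatorname{End} M(A)|$ and $|\operatorname{Aut} M(A)|$ into arithmetic on the pairs $(l_i, s_i)$. The Python sketch below (function names hypothetical) computes $e$, $|\operatorname{End} M(A)| = q^{me}$ and $|\operatorname{Aut} M(A)| = k_1k_2\cdots k_r|\operatorname{End} M(A)|$, keeping the $k_i$ as exact fractions so that no rounding can creep in.

```python
from fractions import Fraction


def gl_order(n, Q):
    """|GL_n(K)| for a field K of order Q."""
    result = 1
    for i in range(n):
        result *= Q**n - Q**i
    return result


def shoda_counts(q, m, blocks):
    """Theorem 6.37 for p(x) of degree m over a field of order q.

    blocks: pairs (l_i, s_i), meaning s_i invariant factors equal to p(x)^(l_i),
    with distinct l_i.  Returns (e, |End M(A)|, |Aut M(A)|).
    """
    blocks = sorted(blocks)  # arrange l_1 < l_2 < ... < l_r
    e = 0
    for j, (l_j, s_j) in enumerate(blocks):
        later = sum(s for _, s in blocks[j + 1:])  # s_{j+1} + ... + s_r
        e += l_j * s_j * (s_j + 2 * later)
    end_order = q ** (m * e)  # |E|^e with |E| = q^m
    ratio = Fraction(1)
    for _, s_i in blocks:
        ratio *= Fraction(gl_order(s_i, q**m), q ** (s_i * s_i * m))  # k_i
    aut_order = ratio * end_order
    assert aut_order.denominator == 1
    return e, end_order, int(aut_order)


# A = C(x^2 + 1) ⊕ C((x^2 + 1)^2) over Z3: m = 2, (l_1, s_1) = (1, 1), (l_2, s_2) = (2, 1)
print(shoda_counts(3, 2, [(1, 1), (2, 1)]))  # (5, 59049, 46656)
```

For instance `shoda_counts(2, 1, [(1, 1), (2, 1)])` returns `(5, 32, 8)`, the order 8 used for the class of $GL_3(\mathbb{Z}_2)$ with invariant factors $x + 1, (x + 1)^2$; with a single pair $(l, s)$ the function reduces to the count of Theorem 6.36.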


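As an illustration we calculate the number of $6 \times 6$ matrices over $\mathbb{Z}_3$ which are similar to $A = C(x^2 + 1) \oplus C((x^2 + 1)^2)$. So $F = \mathbb{Z}_3$, $q = 3$, $p(x) = x^2 + 1$, $m = 2$, $c^2 = -1$ and $E = \mathbb{Z}_3(c)$ is a field of order 9. Also $r = 2$, $s = 2$, $s_1 = s_2 = 1$, $l_1 = 1$, $l_2 = 2$ and so $e = 3 + 2 = 5$. On constructing a $2 \times 2$ reduced matrix

$$B(x) = \begin{pmatrix} B_{11}(x) & B_{12}(x) \\ B_{21}(x) & B_{22}(x) \end{pmatrix}$$

the numbers of choices for the $1 \times 1$ matrices $B_{ij}(x)$ are the entries in

$$\begin{pmatrix} 3^2 & 3^2 \\ 3^2 & 3^4 \end{pmatrix}$$

and $|\operatorname{End} M(A)| = 3^{10}$.

As $|GL_1(E)| = 9 - 1 = 8$ we see $k_1 = k_2 = 8/9$, giving $|\operatorname{Aut} M(A)| = (8/9)(8/9) \times 3^{10} = 2^6 \times 3^6$. By Theorem 6.29 the size of the similarity class of $A$ is the index of $\operatorname{Aut} M(A)$ in $GL_6(\mathbb{Z}_3)$, which is

$$\begin{aligned}
|GL_6(\mathbb{Z}_3)|/|\operatorname{Aut} M(A)| &= (3^6 - 1)(3^6 - 3)(3^6 - 3^2)(3^6 - 3^3)(3^6 - 3^4)(3^6 - 3^5)/(2^6 \times 3^6) \\
&= 13^2 \times 11^2 \times 7 \times 5 \times 3^9 \times 2^7.
\end{aligned}$$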

Notice that the numbers are unchanged on replacing $p(x)$ by any other monic irreducible quadratic over $\mathbb{Z}_3$, that is, by $x^2 + x - 1$ or $x^2 - x - 1$. More generally any two monic irreducible polynomials of the same degree over $F$ are interchangeable in this sense.
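The class size in the example above can be recomputed and factorised mechanically. The short Python sketch below (the helper names are hypothetical, not the book's) divides $|GL_6(\mathbb{Z}_3)|$ by $|\operatorname{Aut} M(A)| = 2^6 \times 3^6$ and factorises the quotient by trial division.

```python
def gl_order(n, q):
    """|GL_n(F_q)| = (q^n - 1)(q^n - q)...(q^n - q^(n-1))."""
    result = 1
    for i in range(n):
        result *= q**n - q**i
    return result


def factorise(n):
    """Prime factorisation by trial division, as a dict {prime: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


aut = 2**6 * 3**6  # |Aut M(A)| from Theorem 6.37
size = gl_order(6, 3) // aut
print(factorise(size))  # {2: 7, 3: 9, 5: 1, 7: 1, 11: 2, 13: 2}
```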

As a second example we calculate the sizes of the six conjugacy (similarity) classes which make up the group $GL_3(\mathbb{Z}_2)$. There are four cyclic classes having as $\chi_A(x)$ the four polynomials of degree 3 over $\mathbb{Z}_2$ with non-zero constant terms, namely

$$x^3 + 1 = (x^2 + x + 1)(x + 1), \quad x^3 + x + 1, \quad x^3 + x^2 + 1, \quad x^3 + x^2 + x + 1 = (x + 1)^3$$

on factoring into irreducible polynomials over $\mathbb{Z}_2$. In these cases $|\operatorname{Aut} M(A)| = \Phi_2(\chi_A(x))$ using the theory following Definition 6.28. You can check:

$$\Phi_2((x^2 + x + 1)(x + 1)) = 3, \quad \Phi_2(x^3 + x + 1) = 7, \quad \Phi_2(x^3 + x^2 + 1) = 7, \quad \Phi_2((x + 1)^3) = 4.$$

Using the formula $|GL_3(\mathbb{Z}_2)|/|\operatorname{Aut} M(A)|$ of Theorem 6.29 these classes have sizes

$$168/3 = 56, \quad 168/7 = 24, \quad 168/7 = 24, \quad 168/4 = 42$$

as $|GL_3(\mathbb{Z}_2)| = (8 - 1)(8 - 2)(8 - 4) = 168$. There remain just two non-cyclic classes in $GL_3(\mathbb{Z}_2)$, namely the class with invariant factors $x + 1, (x + 1)^2$ and the class with invariant factors $x + 1, x + 1, x + 1$. By Theorem 6.37 the corresponding groups $\operatorname{Aut} M(A)$ have orders 8 and 168, and so the class sizes are $168/8 = 21$ and $168/168 = 1$ respectively. The conjugacy classes of every group partition the group: in this case it is comforting to check $56 + 24 + 24 + 42 + 21 + 1 = 168$, showing that all the similarity classes in $GL_3(\mathbb{Z}_2)$ have been accounted for. This type of analysis can be carried out on all groups $GL_3(\mathbb{F}_q)$ (Exercises 6.3, Question 7(c)).
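The bookkeeping for $GL_3(\mathbb{Z}_2)$ can be checked in a few lines. In the Python sketch below (an illustration; the helper names are hypothetical) $\Phi_q$ is evaluated from a factorisation into powers of distinct irreducibles, given as pairs (degree, exponent), using $\Phi_q(p(x)^n) = q^{mn} - q^{m(n-1)}$ and the multiplicative property of Exercises 6.3, Question 2(c); the orders 8 and 168 for the two non-cyclic classes are entered as obtained from Theorem 6.37.

```python
def gl_order(n, q):
    """|GL_n(F_q)|."""
    result = 1
    for i in range(n):
        result *= q**n - q**i
    return result


def phi(q, factorisation):
    """Φ_q of a product of powers of distinct irreducibles [(m, n), ...],
    where m = deg p(x) and n is the exponent: multiply q^(mn) - q^(m(n-1))."""
    result = 1
    for m, n in factorisation:
        result *= q ** (m * n) - q ** (m * (n - 1))
    return result


q, G = 2, gl_order(3, 2)  # |GL_3(Z_2)| = 168
aut_orders = [
    phi(q, [(2, 1), (1, 1)]),  # (x^2 + x + 1)(x + 1)
    phi(q, [(3, 1)]),          # x^3 + x + 1
    phi(q, [(3, 1)]),          # x^3 + x^2 + 1
    phi(q, [(1, 3)]),          # (x + 1)^3
    8,                         # invariant factors x + 1, (x + 1)^2
    G,                         # invariant factors x + 1, x + 1, x + 1
]
sizes = [G // a for a in aut_orders]
print(sizes, sum(sizes))  # [56, 24, 24, 42, 21, 1] 168
```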

Our next (and last) theorem completes the theory. It tells us that the primary decomposition Theorem 6.12 of $M(A)$ leads to decompositions of both $\operatorname{End} M(A)$ and $\operatorname{Aut} M(A)$. It should present no problem to the diligent reader who has completed the analogous exercises, namely Exercises 3.2, Question 5(b) and Exercises 3.3, Question 1(f), for finite abelian groups. Taken together with Theorem 6.37 the structure of every group $\operatorname{Aut} M(A)$ can be analysed provided the resolution of $\chi_A(x)$ into irreducible polynomials is known.

We remind the reader of the terminology introduced in Section 6.2: let $A$ be a $t \times t$ matrix over a field $F$ with $\chi_A(x) = p_1(x)^{n_1}p_2(x)^{n_2}\cdots p_k(x)^{n_k}$ where $p_1(x), p_2(x), \ldots, p_k(x)$ are $k$ distinct monic irreducible polynomials over $F$ and $n_1, n_2, \ldots, n_k$ are positive integers. For $1 \le j \le k$ the submodule $M(A)_{p_j(x)} = \{v \in M(A) : p_j(x)^{n_j}v = 0\}$ is the $p_j(x)$-component of $M(A)$. Then $M(A) = M(A)_{p_1(x)} \oplus M(A)_{p_2(x)} \oplus \cdots \oplus M(A)_{p_k(x)}$ by Theorem 6.12, that is, $M(A)$ is the internal direct sum of its primary components.

Theorem 6.38

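Using the above notation let $\beta_j \in \operatorname{End} M(A)_{p_j(x)}$ for $1 \le j \le k$. Then $\beta = \beta_1 \oplus \beta_2 \oplus \cdots \oplus \beta_k$ is an endomorphism of $M(A)$ where $(u)\beta = \sum_{j=1}^{k}(u_j)\beta_j$ and $u = \sum_{j=1}^{k} u_j$, $u_j \in M(A)_{p_j(x)}$ for $1 \le j \le k$. Each $\beta \in \operatorname{End} M(A)$ satisfies $(u_j)\beta \in M(A)_{p_j(x)}$ for all $u_j \in M(A)_{p_j(x)}$ and is uniquely expressible as $\beta = \beta_1 \oplus \beta_2 \oplus \cdots \oplus \beta_k$ where $\beta_j \in \operatorname{End} M(A)_{p_j(x)}$ for $1 \le j \le k$.

Write $(\beta)\sigma = (\beta_1, \beta_2, \ldots, \beta_k)$. Then

$$\sigma : \operatorname{End} M(A) \cong \operatorname{End} M(A)_{p_1(x)} \oplus \operatorname{End} M(A)_{p_2(x)} \oplus \cdots \oplus \operatorname{End} M(A)_{p_k(x)}$$

is an algebra isomorphism. Also

$$\sigma|_{\operatorname{Aut} M(A)} : \operatorname{Aut} M(A) \cong \operatorname{Aut} M(A)_{p_1(x)} \times \operatorname{Aut} M(A)_{p_2(x)} \times \cdots \times \operatorname{Aut} M(A)_{p_k(x)}$$

is a group isomorphism.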


Proof

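Consider $u, u' \in M(A)$. By Theorem 6.12 there are $u_j, u'_j \in M(A)_{p_j(x)}$ for $1 \le j \le k$ with

$$u = \sum_{j=1}^{k} u_j \quad \text{and} \quad u' = \sum_{j=1}^{k} u'_j.$$

So $u_j + u'_j \in M(A)_{p_j(x)}$ for $1 \le j \le k$ and $u + u' = \sum_{j=1}^{k}(u_j + u'_j)$. Therefore

$$\begin{aligned}
(u + u')\beta &= \sum_{j=1}^{k}(u_j + u'_j)\beta_j = \sum_{j=1}^{k}\big((u_j)\beta_j + (u'_j)\beta_j\big) \\
&= \sum_{j=1}^{k}(u_j)\beta_j + \sum_{j=1}^{k}(u'_j)\beta_j = (u)\beta + (u')\beta
\end{aligned}$$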

showing $\beta : M(A) \to M(A)$ to be an additive mapping of the additive abelian group $M(A)$. In the same way for $f(x) \in F[x]$ we have $f(x)u = \sum_{j=1}^{k} f(x)u_j$ and $f(x)u_j \in M(A)_{p_j(x)}$ for $1 \le j \le k$. Therefore

$$(f(x)u)\beta = \sum_{j=1}^{k}(f(x)u_j)\beta_j = \sum_{j=1}^{k} f(x)\big((u_j)\beta_j\big) = f(x)\Big(\sum_{j=1}^{k}(u_j)\beta_j\Big) = f(x)\big((u)\beta\big)$$

showing $\beta$ to be $F[x]$-linear. So $\beta \in \operatorname{End} M(A)$. Notice $(u_j)\beta = (u_j)\beta_j$ for all $u_j \in M(A)_{p_j(x)}$, that is, $\beta_j$ is the restriction of $\beta$ to $M(A)_{p_j(x)}$ for $1 \le j \le k$.

Conversely let $\beta \in \operatorname{End} M(A)$. For $u_j \in M(A)_{p_j(x)}$ we have $p_j(x)^{n_j}u_j = 0$ and so $p_j(x)^{n_j}((u_j)\beta) = (p_j(x)^{n_j}u_j)\beta = (0)\beta = 0$, showing $(u_j)\beta \in M(A)_{p_j(x)}$ for $1 \le j \le k$. We have established, in one line, an important fact:

The endomorphisms of $M(A)$ respect the primary decomposition of $M(A)$.

Therefore the restriction of $\beta$ to $M(A)_{p_j(x)}$ is an endomorphism $\beta_j$ of $M(A)_{p_j(x)}$ for $1 \le j \le k$. We’ve now come full circle as $\beta = \beta_1 \oplus \beta_2 \oplus \cdots \oplus \beta_k$ by the first part of the proof. As each $\beta_j$ is uniquely determined by $\beta$ it’s legitimate to define

$$\sigma : \operatorname{End} M(A) \to \operatorname{End} M(A)_{p_1(x)} \oplus \operatorname{End} M(A)_{p_2(x)} \oplus \cdots \oplus \operatorname{End} M(A)_{p_k(x)}$$


by $(\beta)\sigma = (\beta_1, \beta_2, \ldots, \beta_k)$ for all $\beta \in \operatorname{End} M(A)$. By the first part of the proof $\sigma$ is a bijection from $\operatorname{End} M(A)$ to the external direct sum of the endomorphism rings of the primary components of $M(A)$. It is straightforward to verify that $\sigma$ is $F$-linear and also that $\sigma$ respects addition and multiplication (composition) of endomorphisms as in Exercises 3.3, Question 1(f). Therefore $\sigma$ is an algebra isomorphism. Further $\beta = \beta_1 \oplus \beta_2 \oplus \cdots \oplus \beta_k$ is an invertible element of $\operatorname{End} M(A)$ if and only if $\beta_j$ is an invertible element of $\operatorname{End} M(A)_{p_j(x)}$ for $1 \le j \le k$, as in Exercises 3.2, Question 5(b). In other words

$$\beta \in \operatorname{Aut} M(A) \iff \beta_j \in \operatorname{Aut} M(A)_{p_j(x)} \quad \text{for } 1 \le j \le k,$$

and so

$$\sigma|_{\operatorname{Aut} M(A)} : \operatorname{Aut} M(A) \cong \operatorname{Aut} M(A)_{p_1(x)} \times \operatorname{Aut} M(A)_{p_2(x)} \times \cdots \times \operatorname{Aut} M(A)_{p_k(x)}$$

is a group isomorphism between $\operatorname{Aut} M(A)$ and the direct product of the automorphism groups of the primary components of $M(A)$. □

As an example let $A$ be an $8 \times 8$ matrix over the finite field $\mathbb{F}_q$ having invariant factors $x$, $x(x + 1)$, $x^2(x + 1)^3$. Then $A \sim P_1 \oplus P_2$ as in Definition 6.14, where $P_1$ has invariant factors $x, x, x^2$ and $P_2$ has invariant factors $x + 1, (x + 1)^3$. So $M(P_1) \cong M(A)_x$ and $M(P_2) \cong M(A)_{x+1}$. From the algebra isomorphism $\sigma$ of Theorem 6.38 we deduce $\dim \operatorname{End} M(A) = \dim \operatorname{End} M(P_1) + \dim \operatorname{End} M(P_2)$. But by Corollary 6.34 this equation is $16 = 10 + 6$ and so we are none the wiser. However the group isomorphism $\sigma|_{\operatorname{Aut} M(A)}$ of Theorem 6.38 gives

$$|\operatorname{Aut} M(A)| = |\operatorname{Aut} M(P_1)| \times |\operatorname{Aut} M(P_2)|$$

and the factors on the right of this equation can be found using Theorem 6.37. So

$$|\operatorname{Aut} M(P_1)| = k_1k_2|\operatorname{End} M(P_1)| = (|GL_2(\mathbb{F}_q)|/q^4)(|GL_1(\mathbb{F}_q)|/q)q^{10} = (q + 1)(q - 1)^3q^6$$

and

$$|\operatorname{Aut} M(P_2)| = ((q - 1)/q)^2q^6 = (q - 1)^2q^4,$$

giving $|\operatorname{Aut} M(A)| = (q + 1)(q - 1)^5q^{10}$. By Theorem 6.29 the number of matrices similar to $A$ is

$$\begin{aligned}
|GL_8(\mathbb{F}_q)|/|\operatorname{Aut} M(A)| &= (q^8 - 1)(q^8 - q)(q^8 - q^2)(q^8 - q^3)(q^8 - q^4)(q^8 - q^5) \\
&\quad \times (q^8 - q^6)(q^8 - q^7)/\big((q + 1)(q - 1)^5q^{10}\big) \\
&= (q^8 - 1)(q^7 - 1)(q^6 - 1)(q^4 + q^3 + q^2 + q + 1) \\
&\quad \times (q^3 + q^2 + q + 1)(q^2 + q + 1)q^{18}.
\end{aligned}$$
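The final simplification can be spot-checked numerically, since both sides are polynomials in $q$ with integer values. The Python sketch below (helper names hypothetical) evaluates $|GL_8(\mathbb{F}_q)|/((q + 1)(q - 1)^5q^{10})$ and the closed form for several prime powers $q$.

```python
def gl_order(n, q):
    """|GL_n(F_q)|."""
    result = 1
    for i in range(n):
        result *= q**n - q**i
    return result


def class_size(q):
    """|GL_8(F_q)| / |Aut M(A)| for invariant factors x, x(x + 1), x^2 (x + 1)^3."""
    aut = (q + 1) * (q - 1) ** 5 * q**10
    size, remainder = divmod(gl_order(8, q), aut)
    assert remainder == 0
    return size


def closed_form(q):
    return ((q**8 - 1) * (q**7 - 1) * (q**6 - 1) * (q**4 + q**3 + q**2 + q + 1)
            * (q**3 + q**2 + q + 1) * (q**2 + q + 1) * q**18)


for q in (2, 3, 4, 5, 7, 8, 9):
    assert class_size(q) == closed_form(q)
print("closed form agrees for q = 2, 3, 4, 5, 7, 8, 9")
```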

As a finale we partition the group $GL_4(\mathbb{Z}_2)$ into conjugacy classes. The list of irreducible polynomials $p(x)$ of degree at most 4 over $\mathbb{Z}_2$ with $p(x) \ne x$ is: $x + 1$, $x^2 + x + 1$, $x^3 + x + 1$, $x^3 + x^2 + 1$, $x^4 + x + 1$, $x^4 + x^3 + 1$, $x^4 + x^3 + x^2 + x + 1$. The invariant factors of matrices $A$ in $GL_4(\mathbb{Z}_2)$ have these polynomials as their factors. There are 8 cyclic classes (classes with a single invariant factor $\chi_A(x)$), listed next together with the number $|\operatorname{Aut} M(A)| = \Phi_2(\chi_A(x))$ as in Theorem 6.27:

$(x + 1)^4$, 8; $(x + 1)^2(x^2 + x + 1)$, 6; $(x + 1)(x^3 + x + 1)$, 7; $(x + 1)(x^3 + x^2 + 1)$, 7; $(x^2 + x + 1)^2$, 12; $x^4 + x + 1$, 15; $x^4 + x^3 + 1$, 15; $x^4 + x^3 + x^2 + x + 1$, 15.

There are 6 non-cyclic classes, listed next by their invariant factor sequence together with $|\operatorname{Aut} M(A)|$ calculated using Theorem 6.37 and Definition 6.28 (and, for the class with invariant factors $x + 1, (x + 1)(x^2 + x + 1)$, Theorem 6.38):

$(x + 1, x + 1, x + 1, x + 1)$, $|GL_4(\mathbb{Z}_2)| = 20160$; $(x + 1, x + 1, (x + 1)^2)$, 192; $(x + 1, (x + 1)^3)$, 16; $((x + 1)^2, (x + 1)^2)$, 96; $(x + 1, (x + 1)(x^2 + x + 1))$, 18; $(x^2 + x + 1, x^2 + x + 1)$, $|GL_2(\mathbb{F}_4)| = 180$.

By Theorem 6.29 the 14 conjugacy classes of elements of $GL_4(\mathbb{Z}_2)$ have sizes 2520, 3360, 2880, 2880, 1680, 1344, 1344, 1344, 1, 105, 1260, 210, 1120, 112, and their sum is 20160, as the reader can check as a final act.
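As with $GL_3(\mathbb{Z}_2)$, the final act can be delegated to a machine. The Python sketch below (helper names hypothetical) recomputes $\Phi_2$ for the eight cyclic classes from their factorisations into powers of irreducibles, given as pairs (degree, exponent); for the non-cyclic classes the orders 192, 16 and 96 are entered as quoted above, while the remaining three are recomputed as $|GL_4(\mathbb{Z}_2)|$, $|GL_2(\mathbb{Z}_2)| \times \Phi_2(x^2 + x + 1)$ and $|GL_2(\mathbb{F}_4)|$. It then divides $|GL_4(\mathbb{Z}_2)|$ by each order.

```python
def gl_order(n, q):
    """|GL_n(F_q)|."""
    result = 1
    for i in range(n):
        result *= q**n - q**i
    return result


def phi(q, factorisation):
    """Φ_q for a product of powers of distinct irreducibles [(degree, exponent), ...]."""
    result = 1
    for m, n in factorisation:
        result *= q ** (m * n) - q ** (m * (n - 1))
    return result


G = gl_order(4, 2)  # 20160
cyclic = [
    [(1, 4)],                      # (x + 1)^4
    [(1, 2), (2, 1)],              # (x + 1)^2 (x^2 + x + 1)
    [(1, 1), (3, 1)],              # (x + 1)(x^3 + x + 1)
    [(1, 1), (3, 1)],              # (x + 1)(x^3 + x^2 + 1)
    [(2, 2)],                      # (x^2 + x + 1)^2
    [(4, 1)], [(4, 1)], [(4, 1)],  # the three irreducible quartics
]
non_cyclic = [
    G,                                  # x+1, x+1, x+1, x+1
    192,                                # x+1, x+1, (x+1)^2
    16,                                 # x+1, (x+1)^3
    96,                                 # (x+1)^2, (x+1)^2
    gl_order(2, 2) * phi(2, [(2, 1)]),  # x+1, (x+1)(x^2+x+1): primary parts multiply
    gl_order(2, 4),                     # x^2+x+1, x^2+x+1: |GL_2(F_4)|
]
aut_orders = [phi(2, f) for f in cyclic] + non_cyclic
sizes = [G // a for a in aut_orders]
print(sizes)  # [2520, 3360, 2880, 2880, 1680, 1344, 1344, 1344, 1, 105, 1260, 210, 1120, 112]
print(sum(sizes) == G)  # True
```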

EXERCISES 6.3

1. (a) Let

$$A = \begin{pmatrix} 1 & 1 & -2 & 2 \\ -5 & -2 & 6 & -4 \\ 2 & 1 & -4 & 3 \\ 3 & 1 & -5 & 3 \end{pmatrix}$$

over $\mathbb{Q}$. Verify $A^2 = -A - I$ and hence find, without further calculation, $\mu_A(x)$ and $\chi_A(x)$. State the invariant factors of $A$ and specify an invertible matrix $X$ over $\mathbb{Q}$ with $XAX^{-1}$ in rcf.


(b) Show that $\varphi : M_2(\mathbb{Q}(i)) \to M_4(\mathbb{Q})$, given by

$$\begin{pmatrix} a_0 + ia_1 & b_0 + ib_1 \\ c_0 + ic_1 & d_0 + id_1 \end{pmatrix}\varphi = \begin{pmatrix} a_0 & a_1 & b_0 & b_1 \\ -a_1 & a_0 & -b_1 & b_0 \\ c_0 & c_1 & d_0 & d_1 \\ -c_1 & c_0 & -d_1 & d_0 \end{pmatrix} \quad \text{for all } a_0, a_1, b_0, b_1, c_0, c_1, d_0, d_1 \in \mathbb{Q},$$

is a ring homomorphism where $i^2 = -1$. Show $\ker\varphi = 0$ and $\operatorname{im}\varphi = Z(A)$, the centraliser (Definition 6.28) of $A = C(x^2 + 1) \oplus C(x^2 + 1)$ over $\mathbb{Q}$. Show $\det (B')\varphi = |\det B'|^2$ for $B' \in M_2(\mathbb{Q}(i))$.

(c) Let $A$ be a $t \times t$ matrix over the real field $\mathbb{R}$ with minimum polynomial $\mu_A(x) = x^2 + 1$ and let $r(x) = ax + b$ where $a, b \in \mathbb{R}$. Show that $r(A)$ has $t/2$ invariant factors $x^2 - 2bx + a^2 + b^2$ in the case $a \ne 0$. What are the invariant factors of $r(A)$ in the case $a = 0$? Describe the invariant factors of $f(A)$ for an arbitrary polynomial $f(x)$ over $\mathbb{R}$. Are $A^{10} - A^9$ and $A^{11} + A^{10}$ similar?

(d) Let $A$ be a $t \times t$ matrix over a finite field $F$ with $|F| = q$. Suppose the minimum polynomial $\mu_A(x)$ is irreducible of degree $n$. Use Theorem 6.26 to find a formula for the number of invertible matrices $X$ over $F$ with $XAX^{-1}$ in rcf. Hence find a formula for the size of the similarity class of $A$ using Theorem 6.29. Taking $F = \mathbb{Z}_3$ calculate the number of matrices similar to each of the following matrices $A$ over $F$: $C(x^2 + 1)$, $C(x^3 - x + 1)$, $C(x^2 + 1) \oplus C(x^2 + 1)$.

(e) Complete the proof of Theorem 6.26 using Theorem 3.16 as a guide.

2. (a) Let $A = C(x^3)$ over an arbitrary field $F$. Describe the matrices $B$ belonging to the centraliser $Z(A)$. Is $Z(A)$ a ring? (Yes/No) Is $Z(A)$ a vector space over $F$? If so what is $\dim Z(A)$? Which matrices $B$ belong to the group $U(Z(A))$ of invertible elements of $Z(A)$? Are $\operatorname{End} M(A)$ and $Z(A)$ isomorphic rings? (Yes/No) Are $\operatorname{Aut} M(A)$ and $U(Z(A))$ isomorphic groups? (Yes/No) Taking $F = \mathbb{Z}_2$ find the integers $|\operatorname{End} M(A)|$ and $|\operatorname{Aut} M(A)|$. Is $\operatorname{Aut} M(A)$ a cyclic group? In the case $F = \mathbb{F}_q$ state formulae for $|\operatorname{End} M(A)|$ and $|\operatorname{Aut} M(A)|$.

(b) Let $A$ be a $t \times t$ matrix over a field $F$. For each $\beta \in \operatorname{End} M(A)$ write $(\beta)\theta$ for the matrix of $\beta$ relative to the standard basis $\mathcal{B}_0$ of $F^t$. Using Theorem 6.27 show that $\theta : \operatorname{End} M(A) \cong Z(A)$ is an algebra isomorphism, that is, $\theta$ is both a ring isomorphism and a vector space isomorphism. Deduce that the restriction $\theta|$ of $\theta$ to $\operatorname{Aut} M(A)$ is a group isomorphism $\theta| : \operatorname{Aut} M(A) \cong U(Z(A))$.


(c) Let g(x) be a non-zero polynomial over a finite field F of order q. Show Φq(g(x)) = |U(F[x]/〈g(x)〉)| where Φq : F[x] → Z is the function introduced after Definition 6.28. Using (c) above, deduce the multiplicative property of Φq, that is, Φq(g(x)h(x)) = Φq(g(x))Φq(h(x)) where gcd{g(x), h(x)} = 1. Show also that Φq(p(x)ⁿ) = q^(mn) − q^(m(n−1)) where p(x) is irreducible of degree m over F.

(d) Determine |End M(A)| and |Aut M(A)| in the following cases, where A = C(f(x)), F = Z2 and f(x) is

(i) x⁶; (ii) (x² + x + 1)³; (iii) x²(x² + x + 1)²; (iv) x⁶ + x².

In each case use Theorem 6.29 to find the number of matrices similar to A.
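Parts (c) and (d) can be checked by brute force over Z2 for small degrees. The Python sketch below encodes a polynomial over Z2 as an int whose bit i is the coefficient of xⁱ. This encoding is a choice made for the sketch. It then counts the residues f modulo g with gcd{f, g} = 1, that is, |U(Z2[x]/〈g(x)〉)|, which part (c) identifies with Φ2(g(x)). This gives a way to test Φq(p(x)ⁿ) = q^(mn) − q^(m(n−1)) and the multiplicative property on small examples. The function names are hypothetical.

```python
# Polynomials over Z_2 are Python ints: bit i is the coefficient of x^i,
# so 0b111 is x^2 + x + 1 and 1 << 6 is x^6.

def pmod(a, b):
    """Remainder of a modulo b (b != 0) over Z_2."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a

def pgcd(a, b):
    """gcd over Z_2 by the Euclidean algorithm (result is automatically monic)."""
    while b:
        a, b = b, pmod(a, b)
    return a

def pmul(a, b):
    """Product over Z_2 (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def phi2(g):
    """|U(Z_2[x]/<g>)| for g of positive degree: count the non-zero residues
    f of degree < deg g with gcd(f, g) = 1."""
    n = g.bit_length() - 1
    return sum(1 for f in range(1, 2 ** n) if pgcd(g, f) == 1)

p = 0b111                                  # x^2 + x + 1, irreducible, m = 2
print(phi2(1 << 6), phi2(pmul(pmul(p, p), p)))
```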

(e) Let Fq denote a finite field having q elements, where q is a prime power. Determine the sizes of the q + q² similarity classes of 2 × 2 matrices A over Fq and verify that their sum is |M2(Fq)|. Specify those classes which belong to GL2(Fq) and verify that the sum of their sizes is |GL2(Fq)|.
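Part (e) can be sanity-checked for q = 2 and q = 3 by brute force. The Python sketch below partitions M2(Zp) into similarity classes by conjugating each matrix by every element of GL2(Zp). It then reports the number of classes, the total number of matrices they cover, and the total size of the classes of invertible matrices. It only covers prime q and does not give the class sizes in closed form for a general Fq.

```python
from itertools import product

def similarity_classes(p):
    """Partition M_2(Z_p) into similarity classes by brute-force conjugation."""
    def mul(X, Y):
        return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2)) % p
                           for j in range(2)) for i in range(2))

    def det(X):
        return (X[0][0] * X[1][1] - X[0][1] * X[1][0]) % p

    mats = [((a, b), (c, d)) for a, b, c, d in product(range(p), repeat=4)]
    gl = [X for X in mats if det(X) != 0]
    ident = ((1, 0), (0, 1))
    inv = {X: next(Y for Y in gl if mul(X, Y) == ident) for X in gl}

    unseen, classes = set(mats), []
    while unseen:
        A = unseen.pop()
        cls = {mul(mul(inv[X], A), X) for X in gl}   # contains A itself
        unseen -= cls
        classes.append(cls)
    return classes, gl, det

for p in (2, 3):
    classes, gl, det = similarity_classes(p)
    invertible_total = sum(len(c) for c in classes if det(next(iter(c))) != 0)
    print(p, len(classes), sum(len(c) for c in classes), invertible_total, len(gl))
```

For each p the printed totals should be q + q², q⁴ and |GL2(Fq)|, in that order.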

3. (a) Let Ω be a non-empty set and let G be a multiplicative group. A permutation representation θ of G on Ω is a group homomorphism θ : G → S(Ω), where S(Ω) is the symmetric group on Ω, that is, S(Ω) is the group of all bijections β : Ω → Ω, the group operation being composition of bijections. Let xβ denote the image of x ∈ Ω by β ∈ S(Ω). Write x ∼ y if there is g ∈ G with x(g)θ = y. Show that ∼ is an equivalence relation on Ω. The equivalence class of x is called the orbit of x and denoted by Ox, where x ∈ Ω. Show Ox = {x(g)θ : g ∈ G}. For x ∈ Ω, show that Gx = {g ∈ G : x(g)θ = x} is a subgroup of G. Gx is called the stabiliser of x. Prove the orbit-stabiliser theorem, namely that for each x ∈ Ω the correspondence Gxg → x(g)θ is a bijection from the set of all (left) cosets Gxg of Gx in G to the orbit Ox. In the case G finite, deduce |G|/|Gx| = |Ox| for all x ∈ Ω.

Describe Ox and Gx in the two cases:

(i) θ trivial, that is, (g)θ is the identity mapping of Ω for all g ∈ G,

(ii) G = S(Ω) and θ : S(Ω) → S(Ω) the identity automorphism, that is, (β)θ = β for all β ∈ S(Ω).

(b) Let F be a field and t a positive integer. Write Ω = Mt(F) and G = GLt(F). For each X ∈ G, show that the mapping (X)θ : Ω → Ω, given by A(X)θ = X⁻¹AX for all A ∈ Ω, is a bijection, that is, (X)θ ∈ S(Ω). Show that θ : G → S(Ω) is a permutation representation of G on Ω. What is the connection between the orbit OA of A and the similarity class of the t × t matrix A over F? Is GA = U(Z(A))?
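The permutation representation of (b) and the orbit-stabiliser theorem of (a) can be checked numerically on a small case. The Python sketch below takes F = Z3 and t = 2, choices made only to keep the brute force small, and implements A(X)θ = X⁻¹AX. It checks on one sample matrix that applying (X)θ and then (Y)θ agrees with applying (XY)θ. It also checks |G| = |GA| × |OA| for every A ∈ M2(Z3).

```python
from itertools import product

p = 3  # small prime chosen for the sketch

def mul(X, Y):
    """Product of 2 x 2 matrices over Z_p stored as tuples of row tuples."""
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2)) % p
                       for j in range(2)) for i in range(2))

omega = [((a, b), (c, d)) for a, b, c, d in product(range(p), repeat=4)]
G = [X for X in omega if (X[0][0] * X[1][1] - X[0][1] * X[1][0]) % p]
IDENTITY = ((1, 0), (0, 1))
inv = {X: next(Y for Y in G if mul(X, Y) == IDENTITY) for X in G}

def act(A, X):
    """A(X)theta = X^-1 A X."""
    return mul(mul(inv[X], A), X)

def orbit(A):
    return {act(A, X) for X in G}

def stabiliser(A):
    return [X for X in G if act(A, X) == A]

# With maps written on the right, (X)theta followed by (Y)theta is (XY)theta.
A0 = ((1, 2), (0, 1))
hom_ok = all(act(act(A0, X), Y) == act(A0, mul(X, Y)) for X in G for Y in G)

# Orbit-stabiliser: |G| = |G_A| * |O_A| for every A in Omega.
orbit_stab_ok = all(len(G) == len(stabiliser(A)) * len(orbit(A)) for A in omega)

print(hom_ok, orbit_stab_ok)
```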

4. (a) Let M and M′ be R-modules where R is a commutative ring. Let α : M → M′ and β : M → M′ be R-linear mappings (module homomorphisms). Show that α + β : M → M′ and rα : M → M′ are R-linear mappings for r ∈ R, where (v)(α + β) = (v)α + (v)β and (v)(rα) = r((v)α) for all v ∈ M. Hence show that the set Hom(M, M′) of all R-linear mappings α : M → M′ can be given the structure of an R-module.

Hint: Adapt the first part of Exercises 5.1, Question 2(e) and use Exercises 3.3, Question 6(a).

(b) Let M, M1, M2, M′, M′1, M′2 be R-modules where R is a commutative ring. Establish the module isomorphisms

$$\operatorname{Hom}(M_1 \oplus M_2, M') \cong \operatorname{Hom}(M_1, M') \oplus \operatorname{Hom}(M_2, M')$$

and

$$\operatorname{Hom}(M, M'_1 \oplus M'_2) \cong \operatorname{Hom}(M, M'_1) \oplus \operatorname{Hom}(M, M'_2).$$

Hint: Adapt the answer to Exercises 3.3, Question 6(b).

(c) Let M and M′ be cyclic F[x]-modules where F is a field. Suppose M = 〈v0〉 where v0 has order d0(x) in M. Suppose also M′ = 〈v′0〉 where v′0 has order d′0(x) in M′. Show that, except for the case d0(x) = 0(x), d′0(x) ≠ 0(x), the F[x]-module Hom(M, M′) is cyclic with generator β0 of order gcd{d0(x), d′0(x)}. Describe Hom(M, M′) in the exceptional case.

(d) Let (d1(x), d2(x), . . . , ds(x)) be the invariant factor sequence of the F[x]-module M and let (d′1(x), d′2(x), . . . , d′t(x)) be the invariant factor sequence of the F[x]-module M′, where ds(x) ≠ 0(x) and d′t(x) ≠ 0(x). Generalise Frobenius' theorem (Corollary 6.34) by showing that Hom(M, M′) is a vector space over F of dimension

$$\sum_{i=1}^{s}\sum_{j=1}^{t}\deg\bigl(\gcd\{d_i(x), d'_j(x)\}\bigr).$$

Find a necessary and sufficient condition on the invariant factors of M and M′ so that Hom(M, M′) is cyclic.

5. (a) Let A be an m × m matrix over a field F and let A′ be an n × n matrix over F. Let B be an m × n matrix over F and let β : Fᵐ → Fⁿ be the F-linear mapping determined by B. Generalise the first part of Theorem 6.27 by showing β ∈ Hom(M(A), M(A′)) if and only if B intertwines A and A′, that is, AB = BA′.


(b) Working over an arbitrary field F, determine the matrices B intertwining C(x²) and C(x³). Find a matrix B0 such that the linear mapping β0 determined by B0 generates the F[x]-module Hom(M(C(x²)), M(C(x³))), and find the order of β0 in this module. Are the F[x]-modules Hom(M(C(x²)), M(C(x³))) and M(C(x²)) isomorphic? Answer the same question with the roles of C(x²) and C(x³) interchanged.
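The intertwining condition AB = BA′ of (a) is a system of linear equations in the entries of B. The dimension of its solution space can therefore be computed mechanically and compared with deg(gcd{d(x), d′(x)}) from Question 4(d). The Python sketch below does this over the prime field Z5, an illustrative choice, using Gaussian elimination mod p. It assumes the row convention for companion matrices: for d(x) = c0 + c1x + · · · + x^n, row i of C has a 1 in column i + 1 for i < n, and the last row is −(c0, c1, . . . , c_{n−1}). The function names are hypothetical.

```python
def companion(coeffs, p):
    """Companion matrix over Z_p of c0 + c1 x + ... + x^n, coeffs = [c0, ..., c_{n-1}],
    in the assumed row convention: row i shifts e_i to e_{i+1} (i < n) and the
    last row is -(c0, ..., c_{n-1})."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        C[i][i + 1] = 1
    C[n - 1] = [(-c) % p for c in coeffs]
    return C


def rank_mod_p(rows, p):
    """Rank over Z_p (p prime) of a list of integer row vectors."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(rows[0])
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)          # inverse mod p (Fermat)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def intertwining_dim(A, A2, p):
    """Dimension over Z_p of {B (m x n) : A B = B A2}, A m x m, A2 n x n."""
    m, n = len(A), len(A2)
    images = []
    for k in range(m):
        for l in range(n):
            # image of the matrix unit E_kl under B -> A B - B A2, flattened:
            # (A E_kl)[i][j] = A[i][k] if j == l; (E_kl A2)[i][j] = A2[l][j] if i == k
            images.append([(A[i][k] if j == l else 0) - (A2[l][j] if i == k else 0)
                           for i in range(m) for j in range(n)])
    return m * n - rank_mod_p(images, p)


p = 5
cx2, cx3 = companion([0, 0], p), companion([0, 0, 0], p)   # C(x^2), C(x^3)
print(intertwining_dim(cx2, cx3, p), intertwining_dim(cx3, cx2, p))
```

Both printed dimensions should equal deg(gcd{x², x³}) = 2. Changing the coefficient lists tests other pairs d(x), d′(x).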

(c) Let d(x) and d′(x) be monic polynomials of positive degrees m and n respectively over a field F and write A = C(d(x)), A′ = C(d′(x)). Let

$$\gcd\{d(x), d'(x)\} = a_0 + a_1x + \cdots + a_{r-1}x^{r-1} + x^r$$

and

$$d'(x)/\gcd\{d(x), d'(x)\} = b_0 + b_1x + \cdots + b_{n-r-1}x^{n-r-1} + x^{n-r}.$$

Write u0 = (d′(x)/gcd{d(x), d′(x)})e′1, working in the F[x]-module M(A′), and construct the m × n matrix B0 over F with eiB0 = x^(i−1)u0 for 1 ≤ i ≤ m, where e1, e2, . . . , em and e′1, e′2, . . . , e′n denote the standard bases of Fᵐ and Fⁿ respectively. Show that u0 has order gcd{d(x), d′(x)} in M(A′) and verify AB0 = B0A′. Show

$$e_iB_0 = b_0e'_i + b_1e'_{i+1} + \cdots + b_{n-r-1}e'_{i+n-r-1} + e'_{i+n-r} \quad\text{for } 1 \le i \le r,$$

$$e_iB_0 = -(a_0e_{i-r}B_0 + a_1e_{i-r+1}B_0 + \cdots + a_{r-1}e_{i-1}B_0) \quad\text{for } r < i \le m.$$

What is rank B0? Write (v)β0 = vB0 for all v ∈ Fᵐ. Use Question 4(c) above to show that β0 generates Hom(M(A), M(A′)). Discuss the simplifications to B0 in the cases

(i) d′(x)|d(x); (ii) d(x)|d′(x); (iii) gcd{d(x), d′(x)} = 1.

(d) Let C = C(d1(x)) ⊕ C(d2(x)) ⊕ · · · ⊕ C(ds(x)) be a t × t matrix over a field F in rcf (Definition 6.4), where t = deg d1(x) + deg d2(x) + · · · + deg ds(x). The t × t matrix

$$B = \begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1s}\\ B_{21} & B_{22} & \cdots & B_{2s}\\ \vdots & \vdots & \ddots & \vdots\\ B_{s1} & B_{s2} & \cdots & B_{ss} \end{pmatrix}$$

over F is 'sympathetically' partitioned, that is, the Bij are deg di(x) × deg dj(x) submatrices for 1 ≤ i, j ≤ s. Show B ∈ Z(C) if and only if C(di(x))Bij = BijC(dj(x)) for 1 ≤ i, j ≤ s. For B ∈ Z(C), use (c) above to describe the matrices Bij for 1 ≤ i, j ≤ s.

(e) Let C and C′ be square matrices over F which are both in rcf. Explain how the theory in (d) above can be modified to determine all matrices B which intertwine C and C′. Suppose A = X⁻¹CX and A′ = (X′)⁻¹C′X′, where X and X′ are invertible over F. How are the matrices which intertwine A and A′ related to the matrices B which intertwine C and C′? Under what condition does there exist an invertible matrix B over F which intertwines C and C′?

(f) Working over the rational field Q, find a Q-basis for the matrices B which intertwine A and A′ in the following cases:

(i) A = C((x + 1)(x² + 1)), A′ = C((x + 1)³);
(ii) A = C((x + 1)(x² + 1)), A′ = C((x² + 1)²);
(iii) A = C(x + 1) ⊕ C((x + 1)²), A′ = C(x + 1) ⊕ C(x³ + x² + x + 1).

How would your answers change on replacing Q by Z2?

6. (a) Let A = C(x) ⊕ C(x²) ⊕ C(x²) over a field F. Verify that

$$B(x) = \begin{pmatrix} b_{11}(x) & b_{12}(x) & b_{13}(x)\\ b_{21}(x) & b_{22}(x) & b_{23}(x)\\ b_{31}(x) & b_{32}(x) & b_{33}(x) \end{pmatrix} \in M_3(F[x])$$

satisfies the endomorphism condition (Definition 6.31) relative to M(A) if and only if x|b12(x) and x|b13(x). Verify directly that the sum and product of two matrices B(x) and B′(x) satisfying e.c. rel. M(A) also satisfy e.c. rel. M(A). Show further that the set RA of all B(x) satisfying e.c. rel. M(A) is a subring (Exercises 2.3, Question 3(b)) of M3(F[x]). Let K = {B(x) = (bij(x)) ∈ M3(F[x]) : x|bi1(x), x²|bi2(x), x²|bi3(x), i = 1, 2, 3}. Is K ⊆ RA? (Yes/No). Show that K is an ideal (Exercises 2.3, Question 3(a)) of RA. Is K an ideal of M3(F[x])? Taking F = Z2, use reduced matrices to find |RA/K| and use Lemma 6.35 to find |U(RA/K)|. Find the size of the similarity class of A.

(b) Let A be a t × t matrix over a field F with invariant factor sequence (d1(x), d2(x), . . . , ds(x)). Let RA denote the set of matrices B(x) = (bij(x)) ∈ Ms(F[x]) satisfying e.c. rel. M(A), that is, di(x)bij(x) ≡ 0 (mod dj(x)) for 1 ≤ i, j ≤ s. Complete the proof of Theorem 6.32 that RA is a subring of Ms(F[x]) by showing

(i) RA is closed under negation,
(ii) RA contains the zero and identity matrices of Ms(F[x]).

(c) Let B(x) = (bij(x)) belong to the ring RA. Complete the proof of Theorem 6.33 by showing that there is an endomorphism β of M(A) represented by B(x) as in Definition 6.30.

7. (a) Show that the set of 344 non-invertible 3 × 3 matrices over Z2 partitions into 8 similarity classes. Find the number of matrices in each similarity class.

Hint: Start by listing the invariant factor sequences.
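The counts in (a) can be confirmed by machine before the class sizes are worked out by hand. The Python sketch below enumerates all 2⁹ = 512 matrices in M3(Z2), extracts GL3(Z2), and splits the singular matrices into orbits under conjugation X⁻¹AX. It prints |GL3(Z2)|, the number of classes and their total size. Listing the individual class sizes is left to the exercise.

```python
from itertools import product

def mul(X, Y):
    """Product of 3 x 3 matrices over Z_2 stored as tuples of row tuples."""
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(3)) % 2
                       for j in range(3)) for i in range(3))

def det(X):
    """Determinant over Z_2 by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = X
    return (a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) % 2

mats = [tuple(tuple(bits[3 * r:3 * r + 3]) for r in range(3))
        for bits in product((0, 1), repeat=9)]
GL = [X for X in mats if det(X)]
I3 = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
inv = {X: next(Y for Y in GL if mul(X, Y) == I3) for X in GL}

singular = {X for X in mats if not det(X)}
classes = []
while singular:
    A = singular.pop()
    cls = {mul(mul(inv[X], A), X) for X in GL}   # similarity class of A
    singular -= cls
    classes.append(cls)

print(len(GL), len(classes), sum(len(c) for c in classes))
```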

(b) List the 8 monic irreducible cubic polynomials over Z3. Determine the number of 3 × 3 matrices over Z3 in each similarity class and verify that the sum of these numbers is 3⁹. Find the number of matrices in each of the 24 conjugacy classes in GL3(Z3). Show that GL3(Z3) contains an element of multiplicative order 26. Find the multiplicative order of each element of SL3(Z3) = {A ∈ GL3(Z3) : det A = 1} and deduce that SL3(Z3) does not contain an element of multiplicative order 26.

Hint: Use Exercises 4.1, Question 3(c).
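For the order-26 part of (b), one candidate is the companion matrix of x³ − x + 1. This cubic appears in Question 1(d) and has no root in Z3, so it is irreducible; the "no root" test is enough for cubics. The Python sketch below multiplies out the powers of its companion matrix C and reports the first k with Cᵏ = I, together with det C. Writing C with rows (0, 1, 0), (0, 0, 1), (−1, 1, 0) assumes the row convention for companion matrices.

```python
P = 3

def mul(X, Y):
    """Product of 3 x 3 matrices over Z_P (lists of rows)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) % P for j in range(3)]
            for i in range(3)]

def det3(X):
    """Determinant over Z_P by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = X
    return (a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) % P

def order(M):
    """Multiplicative order of an invertible matrix M over Z_P."""
    identity = [[int(i == j) for j in range(3)] for i in range(3)]
    k, power = 1, M
    while power != identity:
        power = mul(power, M)
        k += 1
    return k

# x^3 - x + 1 = c0 + c1 x + c2 x^2 + x^3 with (c0, c1, c2) = (1, -1, 0);
# assumed row convention: e1 -> e2 -> e3, e3 -> -(c0 e1 + c1 e2 + c2 e3).
c0, c1, c2 = 1, -1, 0
C = [[0, 1, 0], [0, 0, 1], [-c0 % P, -c1 % P, -c2 % P]]

print(order(C), det3(C))
```

Since det C = 2 ≠ 1, this witness lies outside SL3(Z3). That is consistent with the final claim of (b).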

(c) Determine the number of 3 × 3 matrices over the finite field Fq in each of the q³ + q² + q similarity classes partitioning M3(Fq) (Exercises 6.1, Question 7(a)). Verify that their sum is q⁹. Specify the q² + 2q similarity classes not in GL3(Fq) and verify that the size of their union is q⁹ − |GL3(Fq)| = q⁸ + q⁷ − q⁵ − q⁴ + q³.

Hint: There are q(q − 1)/2 monic irreducible quadratic polynomials over Fq and (q + 1)q(q − 1)/3 monic irreducible cubic polynomials over Fq.
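The polynomial identity in (c) and the two counts quoted in the hint can be spot-checked numerically. The Python sketch below uses two standard facts that are brought in here rather than quoted from this book: |GLn(Fq)| = (qⁿ − 1)(qⁿ − q) · · · (qⁿ − q^(n−1)), and Gauss's formula (1/n) Σ_{d|n} μ(d) q^(n/d) for the number of monic irreducible polynomials of degree n over Fq.

```python
def gl_order(n, q):
    """|GL_n(F_q)| = (q^n - 1)(q^n - q) ... (q^n - q^(n-1))."""
    result = 1
    for i in range(n):
        result *= q ** n - q ** i
    return result


def mobius(n):
    """Moebius function mu(n), by trial division."""
    result, m, d = 1, n, 2
    while d * d <= m:
        if m % d == 0:
            m //= d
            if m % d == 0:        # square factor
                return 0
            result = -result
        d += 1
    if m > 1:                     # one remaining prime factor
        result = -result
    return result


def irreducible_count(n, q):
    """Gauss's formula: number of monic irreducibles of degree n over F_q."""
    return sum(mobius(d) * q ** (n // d) for d in range(1, n + 1) if n % d == 0) // n


for q in (2, 3, 4, 5, 7, 8, 9):   # prime powers
    assert q ** 9 - gl_order(3, q) == q**8 + q**7 - q**5 - q**4 + q**3
    assert irreducible_count(2, q) == q * (q - 1) // 2
    assert irreducible_count(3, q) == (q + 1) * q * (q - 1) // 3
print("checks passed")
```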

(d) Find the number of monic irreducible quartic (degree 4) polynomials over Z3 by factorising x⁸¹ − x over Z3. List the numbers of invariant factor sequences of 4 × 4 matrices A over Z3 according to their irreducible factorisations over Z3 (there are 10 cyclic types and 11 non-cyclic types). Find the number of matrices in each of the 129 similarity classes in M4(Z3) (there are 20 different numbers). Which similarity classes belong to GL4(Z3)? Check that the sums of these numbers are 3¹⁶ and |GL4(Z3)|.
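For (d), the number of irreducible quartics can also be found by brute force instead of by factorising x⁸¹ − x. The Python sketch below forms every product of two monic polynomials of positive degree over Z3 and counts, for each degree, the monic polynomials that are not such a product. It then checks that the degrees of the irreducibles of degree 1, 2 and 4 add up to 81 = deg(x⁸¹ − x). This must hold because x⁸¹ − x is the product of the monic irreducibles whose degree divides 4.

```python
from itertools import product

P = 3

def monic(deg):
    """All monic polynomials of the given degree over Z_P, as coefficient
    tuples (c0, c1, ..., c_{deg-1}, 1)."""
    return [coeffs + (1,) for coeffs in product(range(P), repeat=deg)]

def poly_mul(f, g):
    """Product of coefficient tuples over Z_P."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % P
    return tuple(h)

def irreducible_count(n):
    """Monic irreducibles of degree n over Z_P: all P**n monic polynomials
    minus those that factor into two monic factors of positive degree."""
    reducible = {poly_mul(f, g)
                 for k in range(1, n // 2 + 1)
                 for f in monic(k) for g in monic(n - k)}
    return P ** n - len(reducible)

counts = {n: irreducible_count(n) for n in (1, 2, 3, 4)}
print(counts)
print(sum(n * counts[n] for n in (1, 2, 4)))   # should be deg(x^81 - x) = 81
```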

Solutions to Selected Exercises

EXERCISES (page 7)

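Solution 1:

$$\begin{pmatrix}\rho_1\\ \rho_2\end{pmatrix}^{-1} = \begin{pmatrix}-5 & 4\\ 4 & -3\end{pmatrix} \quad\text{as}\quad \begin{vmatrix}3 & 4\\ 4 & 5\end{vmatrix} = -1.$$

So ρ1, ρ2 is a Z-basis of Z² and

$$(m_1, m_2) = (10, 7)\begin{pmatrix}-5 & 4\\ 4 & -3\end{pmatrix} = (-22, 19).$$

No, as

$$\begin{vmatrix}3 & 5\\ 5 & 6\end{vmatrix} = -7 \neq \pm 1.$$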
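The arithmetic in Solution 1 is easy to confirm mechanically. The Python sketch below assumes ρ1 = (3, 4) and ρ2 = (4, 5), which is what the determinant in the solution suggests. It inverts the integer matrix with rows ρ1, ρ2, forms (10, 7) times the inverse, and checks that the result (m1, m2) satisfies m1ρ1 + m2ρ2 = (10, 7). The helper names are hypothetical.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inverse2(M):
    """Inverse of a 2 x 2 integer matrix with determinant +1 or -1.
    Since 1/d = d for d = +1 or -1, the inverse is again integral."""
    d = det2(M)
    assert d in (1, -1), "not invertible over Z"
    return [[M[1][1] * d, -M[0][1] * d], [-M[1][0] * d, M[0][0] * d]]

rho = [[3, 4], [4, 5]]                 # rows rho1, rho2 (assumed)
rho_inv = inverse2(rho)
m = [sum(v * rho_inv[i][j] for i, v in enumerate((10, 7))) for j in range(2)]
recombined = [m[0] * rho[0][j] + m[1] * rho[1][j] for j in range(2)]

print(det2(rho), rho_inv, m, recombined)
print(det2([[3, 5], [5, 6]]))          # not +1 or -1, so not a Z-basis
```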

Solution 2: The 6 elements of Z²/K are: 1g0 = K + e1 + e2, 2g0 = K + 2e2, 3g0 = K + e1, 4g0 = K + e2, 5g0 = K + e1 + 2e2, 6g0 = K. So Z²/K = 〈g0〉 is cyclic with generator g0 and invariant factor 6.

Solution 3: Each of (1,0), (1,2), (0,2) has order 2 and generates a C2 type subgroup. Each of (0,1), (0,3), (1,1), (1,3) has order 4. 〈(0,1)〉 = 〈(0,3)〉 and 〈(1,1)〉 = 〈(1,3)〉 are C4 type subgroups. H = 〈(1,0)〉 ⊕ 〈(0,2)〉 has isomorphism type C2 ⊕ C2. 〈(0,0)〉 and G′ are subgroups of type C1 and C2 ⊕ C4 respectively. H1 is either 〈(1,2)〉 or 〈(1,0)〉, and H2 is either 〈(0,1)〉 or 〈(1,1)〉.
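The orders listed in Solution 3 can be recomputed directly. The Python sketch below assumes, as the listed elements suggest, that G′ = Z2 ⊕ Z4 with elements written as pairs (a, b), a ∈ Z2, b ∈ Z4. It computes the additive order of every element and the cyclic subgroups generated by the elements of order 4.

```python
from itertools import product

MODULI = (2, 4)   # G' = Z_2 + Z_4 (assumed from the listed elements)

def order(g):
    """Additive order of g in Z_2 + Z_4."""
    k, x = 1, g
    while any(x):
        x = tuple((a + b) % n for a, b, n in zip(x, g, MODULI))
        k += 1
    return k

def generated(g):
    """The cyclic subgroup <g>, as a frozenset of pairs."""
    return frozenset(tuple((k * a) % n for a, n in zip(g, MODULI))
                     for k in range(order(g)))

G = list(product(range(2), range(4)))
orders = {g: order(g) for g in G}
print(orders)
```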

C. Norman, Finitely Generated Abelian Groups and Similarity of Matrices over a Field,Springer Undergraduate Mathematics Series,DOI 10.1007/978-1-4471-2730-7, © Springer-Verlag London Limited 2012


EXERCISES 1.1

Solution 5(a): Apply c1 + c2, c2 − c1.

Solution 5(b): Applying the sequence to A = (a, b) gives (b, −a), to (b, −a) gives (−a, −b) and to (−a, −b) gives (−b, a). One of these has non-negative entries.

EXERCISES 1.2

Solution 1(b): ejP1 = ej + lek, eiP1 = ei (i ≠ j). Postmultiply by A: ejP1A = ejA + lekA, i.e. row j of P1A is row j of A + l (row k of A). Also eiP1A = eiA, that is, row i of P1A is row i of A for i ≠ j. So P1A is the result of applying rj + lrk to A.

Solution 1(c): Q1ejᵀ = ekᵀ, Q1ekᵀ = ejᵀ, Q1eiᵀ = eiᵀ (i ≠ j, k). Premultiply by A: AQ1ejᵀ = Aekᵀ, AQ1ekᵀ = Aejᵀ, AQ1eiᵀ = Aeiᵀ, i.e. cols j and k of AQ1 are cols k and j respectively of A, and col i of AQ1 is col i of A (i ≠ j, k). So AQ1 is the result of applying cj ↔ ck to A.

Solution 1(d): Let Is and It denote the s × s and t × t identity matrices over Z. Then IsAIt⁻¹ = A, showing (i) A ≡ A for all s × t matrices A over Z. Suppose A ≡ B. There are invertible P and Q over Z with PAQ⁻¹ = B. Then P⁻¹B(Q⁻¹)⁻¹ = A, showing (ii) A ≡ B ⇒ B ≡ A as P⁻¹ and Q⁻¹ are invertible over Z. Suppose A ≡ B and B ≡ C. There are invertible P1, P2, Q1, Q2 over Z with P1AQ1⁻¹ = B and P2BQ2⁻¹ = C. Then (P2P1)A(Q2Q1)⁻¹ = C, showing (iii) A ≡ B and B ≡ C ⇒ A ≡ C as P2P1 and Q2Q1 are invertible over Z. So ≡ is an equivalence relation.

EXERCISES 1.3

Solution 4(d): BC = B′C′ and so det BC = det B′C′ = (det B′)(det C′) = 0 × 0 = 0 by Theorem 1.18, as col l of B′ is zero and row l of C′ is zero.

Solution 5(c): By Corollary 1.20, d1d2 · · · ds = gs(A) = 1. So each di = 1 and S(A) = (Is | 0), where Is is the s × s identity matrix. There are invertible matrices P1 and Q1 over Z with P1AQ1⁻¹ = S(A) = (Is | 0). So

$$AQ_1^{-1} = P_1^{-1}(I_s \mid 0) = (P_1^{-1} \mid 0) = (I_s \mid 0)\begin{pmatrix}P_1^{-1} & 0\\ 0 & I_{t-s}\end{pmatrix}$$

where I_{t−s} is the (t − s) × (t − s) identity matrix. So

$$Q = \begin{pmatrix}P_1^{-1} & 0\\ 0 & I_{t-s}\end{pmatrix}Q_1$$

is invertible over Z and satisfies A = (Is | 0)Q. So A can be reduced to S(A) = (Is | 0) using ecos only, and A is the submatrix of Q consisting of the first s rows.


EXERCISES 2.1

Solution 4(a): Let g1, g2 ∈ G. Then (g1 + g2)θ = c(g1 + g2) = cg1 + cg2 = (g1)θ + (g2)θ. For g ∈ G, m ∈ Z, (mg)θ = c(mg) = (cm)g = (mc)g = m(cg) = m((g)θ). So θ is Z-linear. As (g0)θ ∈ G = 〈g0〉, there is an integer c with (g0)θ = cg0. For g ∈ G there is m ∈ Z with g = mg0. Hence (g)θ = (mg0)θ = m((g0)θ) = m(cg0) = c(mg0) = cg. So there is an integer c as stated. Let c′ ∈ Z satisfy (g)θ = c′g for all g ∈ G. Then (c − c′)g0 = cg0 − c′g0 = (g0)θ − (g0)θ = 0, showing that c − c′ ∈ 〈n〉. Hence n|(c − c′), i.e. c ≡ c′ (mod n), i.e. c is unique modulo n. In particular c is unique for n = 0 and c is arbitrary for n = 1. Suppose that θ is an automorphism of G. As θ is surjective there is a ∈ Z with (ag0)θ = g0, i.e. cag0 = g0, i.e. (ca − 1)g0 = 0, i.e. ca − 1 ∈ 〈n〉, i.e. ca − 1 = bn for some b ∈ Z. Hence ca − bn = 1, showing that gcd{c, n} = 1. Conversely, suppose that gcd{c, n} = 1. There are integers a, b with ca − bn = 1. Reversing the above steps gives (ag0)θ = g0 and hence (mag0)θ = mg0 for all m ∈ Z, showing θ to be surjective. Suppose (mg0)θ = (m′g0)θ for some m, m′ ∈ Z. Then cmg0 = cm′g0 and so cm − cm′ ∈ 〈n〉, i.e. n|c(m − m′). Hence n|(m − m′) as gcd{c, n} = 1. So m − m′ ∈ 〈n〉. As 〈n〉 is the order ideal of g0, we conclude (m − m′)g0 = 0, i.e. mg0 = m′g0, showing that θ is injective. So θ is an automorphism, being bijective.

The additive group Z is generated by the integer 1 with order ideal 〈0〉; so n = 0, and gcd{c, 0} = 1 ⇔ c = ±1. So Z has exactly two automorphisms, namely m → m and m → −m for all m ∈ Z.

For n > 0 the Z-module Zn is cyclic, being generated by 1 with order ideal 〈n〉. By the first part every Z-linear mapping θ : Zn → Zn is of the form (m)θ = cm for some integer c and all m ∈ Zn. As c is unique modulo n, cm may be read unambiguously as the product in the ring Zn of the residue class of c with m. It follows directly from the first part, with G = Zn and g0 = 1, that θ is an automorphism of Zn ⇔ gcd{c, n} = 1. So the additive group Z9 has 6 automorphisms, corresponding to the 6 invertible elements c of Z9, namely 1, 2, 4, 5, 7, 8, i.e. the elements c with gcd{c, 9} = 1. Yes, all these automorphisms are powers of θ2, since in Z9 we have 2¹ = 2, 2² = 4, 2³ = 8, 2⁴ = 16 = 7, 2⁵ = 32 = 5, 2⁶ = 64 = 1, i.e. 2 generates the multiplicative group of invertible elements of Z9. So (m)(θ2)³ = 8m, (m)(θ2)⁴ = 7m, etc.
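A quick check of the last part of Solution 4(a) follows. The Python sketch below lists the c in Z9 for which m → cm is a bijection of Z9, compares them with the c satisfying gcd{c, 9} = 1, and computes the powers 2¹, . . . , 2⁶ modulo 9.

```python
from math import gcd

n = 9

def is_automorphism(c, n):
    """True if m -> c*m (mod n) is a bijection of Z_n."""
    return len({(c * m) % n for m in range(n)}) == n

units = [c for c in range(1, n) if gcd(c, n) == 1]
automorphisms = [c for c in range(n) if is_automorphism(c, n)]
powers_of_2 = [pow(2, k, n) for k in range(1, 7)]   # 2^1, ..., 2^6 mod 9

print(units, automorphisms, powers_of_2)
```

Every unit appears among the powers of 2, which is the statement that 2 generates the multiplicative group of invertible elements of Z9.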

Solution 4(b): As ng0 = 0, applying ϕ gives n((g0)ϕ) = (ng0)ϕ = (0)ϕ = 0′, showing n ∈ 〈n′〉, i.e. n′|n. Suppose first that θ : G → G′ is Z-linear and (g0)θ = g′0. Then (mg0)θ = m((g0)θ) = mg′0 for all m ∈ Z, and so there is at most one such θ. Consider θ : G → G′ given by (mg0)θ = mg′0 for all m ∈ Z. Let m1g0 = m2g0. Then n|(m1 − m2), as m1 − m2 ∈ 〈n〉 since (m1 − m2)g0 = 0. As d|n we deduce d|(m1 − m2). So (m1 − m2)g′0 = 0, as 〈d〉 is the order ideal of g′0. So m1g′0 = m2g′0, showing that θ is unambiguously defined. Also θ is additive, as (mg0 + m′g0)θ = ((m + m′)g0)θ = (m + m′)g′0 = mg′0 + m′g′0 = (mg0)θ + (m′g0)θ for m, m′ ∈ Z. As (m(m′g0))θ = ((mm′)g0)θ = (mm′)g′0 = m(m′g′0) = m((m′g0)θ), we see θ is Z-linear.

Solution 4(c): With g1 = g2 = 0 in Definition 2.3 we obtain (0 + 0)θ = (0)θ + (0)θ, i.e. (0)θ = (0)θ + (0)θ as 0 + 0 = 0. Add −(0)θ, the negative in G′ of (0)θ, to both sides, obtaining 0′ = −(0)θ + (0)θ = −(0)θ + (0)θ + (0)θ = 0′ + (0)θ = (0)θ. Apply θ to −g + g = 0 and use Definition 2.3 to obtain (−g)θ + (g)θ = (−g + g)θ = (0)θ = 0′, which means −(g)θ = (−g)θ for all g ∈ G.

The integer m is in the order ideal of r̄ ⇔ mr̄ = 0 in Zn ⇔ mr = qn for some q ∈ Z ⇔ m(r/gcd{r, n}) = q(n/gcd{r, n}) ⇔ (n/gcd{r, n})|m. Therefore 〈n/gcd{r, n}〉 is the order ideal of r̄ in Zn. For r̄ ∈ Zn there is a unique Z-linear mapping θ : Zm → Zn with (1̄)θ = r̄ ⇔ (n/gcd{r, n})|m. As (n/gcd{r, n})|n also, this gives (n/gcd{r, n})|gcd{m, n}, hence (n/gcd{m, n})|gcd{r, n}, and so (n/gcd{m, n})|r as gcd{r, n}|r. Conversely, (n/gcd{m, n})|r ⇒ (n/gcd{r, n})|m in the same way. So there are gcd{m, n} choices for r̄ ∈ Zn, namely r̄ = l(n/gcd{m, n}) for 1 ≤ l ≤ gcd{m, n}.
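The count at the end of Solution 4(c) can be tested exhaustively for small m and n. The Python sketch below applies the solution's criterion, (n/gcd{r, n}) | m, to every r ∈ Zn. It cross-checks this against the direct condition mr ≡ 0 (mod n) and confirms that the admissible r are exactly the gcd{m, n} multiples of n/gcd{m, n}.

```python
from math import gcd

def homs(m, n):
    """Residues r in Z_n such that (1)theta = r defines a Z-linear map
    Z_m -> Z_n, i.e. the order n/gcd(r, n) of r divides m."""
    return [r for r in range(n) if m % (n // gcd(r, n)) == 0]

for m in range(1, 13):
    for n in range(1, 13):
        rs = homs(m, n)
        assert rs == [r for r in range(n) if (m * r) % n == 0]
        assert len(rs) == gcd(m, n)
        assert rs == list(range(0, n, n // gcd(m, n)))

print(homs(4, 6))   # the two choices r = 0 and r = 3 in Z_6
```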

Solution 4(d): (g1 + g2)θθ′ = ((g1)θ + (g2)θ)θ′ = (g1)θθ′ + (g2)θθ′ ∀g1, g2 ∈ G and (mg)θθ′ = ((mg)θ)θ′ = (m((g)θ))θ′ = m(((g)θ)θ′) = m((g)θθ′) ∀m ∈ Z, g ∈ G. So θθ′ is Z-linear. Suppose θ is bijective. Then (g′1 + g′2)θ⁻¹θ = g′1 + g′2 = (g′1)θ⁻¹θ + (g′2)θ⁻¹θ = ((g′1)θ⁻¹ + (g′2)θ⁻¹)θ as θ is Z-linear. As θ is injective, (g′1 + g′2)θ⁻¹ = (g′1)θ⁻¹ + (g′2)θ⁻¹ ∀g′1, g′2 ∈ G′. Also ((mg′)θ⁻¹)θ = (mg′)θ⁻¹θ = mg′ = m((g′)θ⁻¹θ) = (m((g′)θ⁻¹))θ, and as θ is injective, (mg′)θ⁻¹ = m((g′)θ⁻¹) ∀m ∈ Z, g′ ∈ G′. So θ⁻¹ is Z-linear. Let θ, ϕ, ψ be automorphisms of G. Then θϕ ∈ Aut G by the above theory with G′ = G′′ = G and θ′ = ϕ. Also (θϕ)ψ = θ(ϕψ), as composition of mappings is associative. The identity ι : G → G is in Aut G and ιθ = θ = θι for all θ in Aut G. For each θ ∈ Aut G we see θ⁻¹ ∈ Aut G and θ⁻¹θ = ι = θθ⁻¹. Hence Aut G is a group.

Solution 7(a): (i) Let h, h′ ∈ H1 ∩ H2. Then h, h′ ∈ Hi (i = 1, 2) and so h + h′ ∈ Hi, as Hi is closed under addition. So h + h′ ∈ H1 ∩ H2, showing that H1 ∩ H2 is closed under addition. 0 ∈ Hi (i = 1, 2) and so 0 ∈ H1 ∩ H2. −h ∈ Hi (i = 1, 2), as Hi is closed under negation, and so −h ∈ H1 ∩ H2. Therefore H1 ∩ H2 is a subgroup of G.

(ii) (h1 + h2) + (h′1 + h′2) = (h1 + h′1) + (h2 + h′2) ∈ H1 + H2 for all hi, h′i ∈ Hi (i = 1, 2). 0 = 0 + 0 ∈ H1 + H2. −(h1 + h2) = (−h1) + (−h2) ∈ H1 + H2. So H1 + H2 is a subgroup of G.

Solution 8(a): For n = 3 we have s3 = (g1 + g2) + g3 = g1 + (g2 + g3) by the associative law. Take n > 3 and suppose inductively the result to be true for all ordered sets of less than n elements of G. Each summation of g1, g2, . . . , gn in order decomposes as hi + h′_{n−i} for some i with 1 ≤ i < n, where hi is a summation of g1, g2, . . . , gi in order and h′_{n−i} is a summation of g_{i+1}, g_{i+2}, . . . , gn in order. By induction hi = si and h′_{n−i} = s′_{n−i}, where s′_{n−i} = (· · · ((g_{i+1} + g_{i+2}) + g_{i+3}) · · · ) + gn = s′_{n−i−1} + gn, say. Hence hi + h′_{n−i} = si + (s′_{n−i−1} + gn) = (si + s′_{n−i−1}) + gn. As si + s′_{n−i−1} is a summation of g1, g2, . . . , g_{n−1} in order, we deduce si + s′_{n−i−1} = s_{n−1} by induction. Therefore hi + h′_{n−i} = s_{n−1} + gn = sn, which completes the induction. Each summation of g1, g2, . . . , gn in order is equal to sn. So the generalised associative law of addition holds.

Solution 8(b): By the commutative law g1 + g2 = g2 + g1. Take n > 2 and suppose the result is true for all sets of less than n elements of G. Each summation of g1, g2, . . . , gn decomposes as hi + h′_{n−i} for some i with 1 ≤ i < n, where hi is a summation of gj, j ∈ X, |X| = i, and h′_{n−i} is a summation of gj, j ∈ Y, |Y| = n − i, X ∩ Y = ∅. Interchanging hi and h′_{n−i} if necessary, we may assume n ∈ Y. By induction h′_{n−i} = h′_{n−i−1} + gn, where h′_{n−i−1} is a summation of gj for j ∈ Y ∖ {n}, and so hi + h′_{n−i−1} = s_{n−1} by induction. The induction is completed by hi + h′_{n−i} = hi + (h′_{n−i−1} + gn) = (hi + h′_{n−i−1}) + gn = s_{n−1} + gn = sn.

Solution 8(c): For m ≥ 0, by (b) above, m(g1 + g2) = mg1 + mg2 on adding up the 2m elements gi, gi, . . . , gi (i = 1, 2) in two ways. For m < 0 write m = −n. Then m(g1 + g2) = −n(g1 + g2) = −ng1 + (−ng2) = mg1 + mg2. If m1m2 = 0 then (m1 + m2)g = m1g + m2g. By symmetry we may assume m1 ≥ m2. For m1 > 0, m2 > 0, using (a) above with gi = g, (m1 + m2)g = s_{m1+m2} = s_{m1} + s_{m2} = m1g + m2g. For m1 = −n1 < 0, m2 = −n2 < 0 we have (m1 + m2)g = −(n1 + n2)g = −n1g + (−n2g) = m1g + m2g. For m1 > 0, m2 = −n2 < 0, m1 + m2 > 0, (m1 + m2)g = s_{m1+m2} = s_{m1} − s_{n2} = m1g − n2g = m1g + m2g. For m1 > 0, m2 = −n2 < 0, m1 + m2 = −n < 0, (m1 + m2)g = −ng = −s_n = −s_{n2−m1} = −(s_{n2} − s_{m1}) = s_{m1} − s_{n2} = m1g − n2g = m1g + m2g. Now (m1m2)g = 0 = m1(m2g) for m1m2 = 0. For m1 > 0, m2 > 0, by (a) above, (m1m2)g = s_{m1m2} = m1(m2g). Hence for m1 = −n1 < 0, m2 = −n2 < 0, (m1m2)g = ((−n1)(−n2))g = (n1n2)g = n1(n2g) = (−n1)(−n2g) = m1(m2g). For m1 > 0, m2 = −n2 < 0, (m1m2)g = (−m1n2)g = −((m1n2)g) = −(m1(n2g)) = m1(−n2g) = m1(m2g). For m1 = −n1 < 0, m2 > 0, (m1m2)g = (−n1m2)g = −((n1m2)g) = −(n1(m2g)) = (−n1)(m2g) = m1(m2g).

EXERCISES 2.2

Solution 4(e): As mn(g + h) = nmg + mnh = n0 + m0 = 0, we see that g + h has finite order l where l|mn. Now l(g + h) = 0 and so lg = −lh. Hence nlg = n(−lh) = −lnh = −l0 = 0, showing that the order m of g is a divisor of nl, i.e. m|nl. As gcd{m, n} = 1 we deduce m|l. In the same way we obtain n|l, and so mn|l using gcd{m, n} = 1 again. Therefore mn = l. Note that |G| = |K| × |G/K| = mn. Replacing ϕ in Exercises 2.1, Question 4(b) by the natural homomorphism η : G → G/K, we see that the order n of (h0)η = K + h0 is a divisor of the order s of h0. So h = (s/n)h0 has order n. By the above, g + h has order mn, as g has order m where K = 〈g〉. Therefore g + h generates G, i.e. G = 〈g + h〉 is cyclic.

Solution 4(f): Let g1, g′1, g′′1 ∈ G1 and g2, g′2, g′′2 ∈ G2. Addition in G1 ⊕ G2 is associative as

$$\begin{aligned}((g_1, g_2) + (g'_1, g'_2)) + (g''_1, g''_2) &= (g_1 + g'_1, g_2 + g'_2) + (g''_1, g''_2)\\ &= ((g_1 + g'_1) + g''_1, (g_2 + g'_2) + g''_2)\\ &= (g_1 + (g'_1 + g''_1), g_2 + (g'_2 + g''_2))\\ &= (g_1, g_2) + (g'_1 + g''_1, g'_2 + g''_2)\\ &= (g_1, g_2) + ((g'_1, g'_2) + (g''_1, g''_2)).\end{aligned}$$

The zero element of G1 ⊕ G2 is (01, 02) since (01, 02) + (g1, g2) = (01 + g1, 02 + g2) = (g1, g2). The negative of (g1, g2) is (−g1, −g2) as (−g1, −g2) + (g1, g2) = (−g1 + g1, −g2 + g2) = (01, 02). Addition in G1 ⊕ G2 is commutative as

$$(g_1, g_2) + (g'_1, g'_2) = (g_1 + g'_1, g_2 + g'_2) = (g'_1 + g_1, g'_2 + g_2) = (g'_1, g'_2) + (g_1, g_2).$$

So G1 ⊕ G2 is an additive abelian group. Consider α : G1 ⊕ G2 → G2 ⊕ G1 defined by (g1, g2)α = (g2, g1) for all g1 ∈ G1, g2 ∈ G2. Then α : G1 ⊕ G2 ≅ G2 ⊕ G1.

Solution 6(b): Suppose, to the contrary, that the additive group Z has non-trivial subgroups H1 and H2 such that Z = H1 ⊕ H2. As H1 and H2 are ideals of the ring Z, by Theorem 1.15 there are positive integers n1 and n2 with H1 = 〈n1〉 and H2 = 〈n2〉. Then 0 = 0 × n1 + 0 × n2 = n2 × n1 + (−n1) × n2, i.e. the integer zero is expressible in two different ways as a sum of integers from H1 and H2, contradicting Z = H1 ⊕ H2. So Z is indecomposable.

EXERCISES 2.3

Solution 1(a): (i) Write K = ker θ and let k, k′ ∈ K. Then (k + k′)θ = (k)θ + (k′)θ = 0′ + 0′ = 0′, showing k + k′ ∈ K. Also (−k)θ = −(k)θ = −0′ = 0′ and (0)θ = 0′, showing −k ∈ K and 0 ∈ K. As (mk)θ = m((k)θ) = m0′ = 0′ for m ∈ Z, we conclude mk ∈ K, and so K is a submodule of the Z-module G. Suppose K = {0} and let g1, g2 ∈ G satisfy (g1)θ = (g2)θ. Then (g1 − g2)θ = (g1)θ − (g2)θ = (g1)θ − (g1)θ = 0′, showing g1 − g2 ∈ K. So g1 − g2 = 0, i.e. g1 = g2, and θ is injective. Conversely, suppose that θ is injective and let k ∈ K. Then (k)θ = 0′ = (0)θ. So k = 0 as θ is injective, giving K = {0}.

(ii) Let g′1, g′2 ∈ im θ. Then g′1 = (g1)θ and g′2 = (g2)θ for some g1, g2 ∈ G. Then g′1 + g′2 = (g1)θ + (g2)θ = (g1 + g2)θ ∈ im θ as g1 + g2 ∈ G. Also −g′1 = −(g1)θ = (−g1)θ ∈ im θ since −g1 ∈ G. As 0′ = (0)θ ∈ im θ and mg′1 = m((g1)θ) = (mg1)θ ∈ im θ for all m ∈ Z, we see im θ is a submodule of the Z-module G′. Yes, im θ = G′ is the same as θ being surjective.

Solution 3(a): There are k1, k2 ∈ K such that r1 = r′1 + k1, r2 = r′2 + k2. Therefore r1r2 = (r′1 + k1)(r′2 + k2) = r′1r′2 + k3 where k3 = r′1k2 + k1r′2 + k1k2. As K is an ideal of R we see that r′1k2, k1r′2, k1k2 ∈ K, and so k3 ∈ K as K is closed under addition. Hence r1r2 ≡ r′1r′2 (mod K) and so K + r1r2 = K + r′1r′2 by Lemma 2.9, showing that coset multiplication is unambiguously defined. Write r̄ = K + r for r ∈ R; then R/K has binary operations r̄1 + r̄2 = K + (r1 + r2) and r̄1r̄2 = K + r1r2, where r1, r2 ∈ R. Now (R/K, +) is an abelian group by Lemma 2.10. Let r1, r2, r3 ∈ R. Then (r̄1r̄2)r̄3 = (K + r1r2)r̄3 = K + (r1r2)r3 = K + r1(r2r3) = r̄1(K + r2r3) = r̄1(r̄2r̄3), showing that coset multiplication is associative. Coset multiplication is distributive because (r̄1 + r̄2)r̄3 = (K + (r1 + r2))r̄3 = K + (r1 + r2)r3 = K + (r1r3 + r2r3) = r̄1r̄3 + r̄2r̄3, and similarly r̄1(r̄2 + r̄3) = r̄1r̄2 + r̄1r̄3. Also ēr̄ = K + er = r̄ = K + re = r̄ē for all r ∈ R, and so R/K is a ring with 1-element ē = K + e. By Lemma 2.10 η is additive. As (r1r2)η = K + r1r2 = r̄1r̄2 = (r1)η(r2)η for all r1, r2 ∈ R and (e)η = ē, we see η is a ring homomorphism. Also im η = R/K and ker η = K.

Solution 3(b): By Exercises 2.3, Question 1(a)(ii), im θ is a subgroup of (R′, +). As (r1)θ(r2)θ = (r1r2)θ for all r1, r2 ∈ R, we see im θ is closed under multiplication. The 1-element e′ of R′ belongs to im θ as e′ = (e)θ. So im θ is a subring of R′, and hence im θ is itself a ring. By Exercises 2.3, Question 1(a)(i), ker θ is a subgroup of (R, +). Consider r ∈ R, k ∈ K; then (rk)θ = (r)θ(k)θ = (r)θ × 0 = 0 and so rk ∈ K = ker θ. Similarly kr ∈ K, and so K is an ideal of R. Kernels of ring homomorphisms are ideals. By Theorem 2.16, θ̄ : R/K ≅ im θ, given by (K + r)θ̄ = (r)θ, is an isomorphism of additive abelian groups. Also ((K + r1)(K + r2))θ̄ = (K + r1r2)θ̄ = (r1r2)θ = (r1)θ(r2)θ = (K + r1)θ̄(K + r2)θ̄ for all r1, r2 ∈ R. So θ̄ is a ring isomorphism, as (K + e)θ̄ = e′. Therefore θ̄ : R/K ≅ im θ.

Solution 3(c): By (b) above, ker θ is an ideal of the ring Z. By Theorem 1.15 there is a non-negative integer d with ker θ = 〈d〉. By (b) above, θ̄ : Z/〈d〉 ≅ im θ, showing that the rings Zd = Z/〈d〉 are, up to isomorphism, the (ring) homomorphic images of Z.

Solution 3(d): For r1, r2 ∈ R we see (r1 + r2)θθ′ = ((r1)θ + (r2)θ)θ′ = (r1)θθ′ + (r2)θθ′ and (r1r2)θθ′ = ((r1)θ(r2)θ)θ′ = (r1)θθ′(r2)θθ′, showing that θθ′ is additive and multiplicative. As (e)θ = e′ and (e′)θ′ = e′′, we see (e)θθ′ = e′′, where e, e′, e′′ are the 1-elements of R, R′, R′′. So θθ′ : R → R′′ is a ring homomorphism. Suppose that θ is a ring isomorphism. Consider r′1, r′2 ∈ R′ and write r1 = (r′1)θ⁻¹, r2 = (r′2)θ⁻¹. Then ((r′1)θ⁻¹ + (r′2)θ⁻¹)θ = (r1 + r2)θ = (r1)θ + (r2)θ = r′1 + r′2 = (r′1 + r′2)θ⁻¹θ, and so (r′1)θ⁻¹ + (r′2)θ⁻¹ = (r′1 + r′2)θ⁻¹ as θ is injective; so θ⁻¹ is additive. Similarly ((r′1)θ⁻¹(r′2)θ⁻¹)θ = (r1r2)θ = (r1)θ(r2)θ = r′1r′2 = (r′1r′2)θ⁻¹θ, and so (r′1)θ⁻¹(r′2)θ⁻¹ = (r′1r′2)θ⁻¹ as θ is injective; so θ⁻¹ is multiplicative. As (e′)θ⁻¹ = e, we conclude θ⁻¹ : R′ → R is a ring isomorphism. Take R = R′ = R′′ and suppose θ, θ′ are bijective, i.e. suppose θ, θ′ ∈ Aut R. By the above theory θθ′, θ⁻¹ ∈ Aut R. As the identity mapping ιR of R belongs to Aut R, we see that Aut R is a group (it is a subgroup of the group of all bijections R → R).

Solution 3(f): By Exercises 2.1, Question 7(a)(i) and (ii), both K ∩ L and K + L are additive abelian groups. Consider r ∈ R and m ∈ K ∩ L. Then m ∈ K and m ∈ L. As K is an ideal of R, we see rm, mr ∈ K. As L is an ideal of R, we see rm, mr ∈ L. So rm, mr ∈ K ∩ L, and K ∩ L is an ideal of R. Consider r ∈ R, m ∈ K + L. Then m = k + l where k ∈ K and l ∈ L. So rm = r(k + l) = rk + rl ∈ K + L, since rk ∈ K and rl ∈ L as before. Also mr = (k + l)r = kr + lr ∈ K + L, since kr ∈ K and lr ∈ L. So K + L is an ideal of R. For r1, r2 ∈ R, using addition and multiplication in the rings R/K, R/L and R/K ⊕ R/L,

$$\begin{aligned}(r_1 + r_2)\alpha &= (r_1 + r_2 + K, r_1 + r_2 + L)\\ &= ((r_1 + K) + (r_2 + K), (r_1 + L) + (r_2 + L))\\ &= (r_1 + K, r_1 + L) + (r_2 + K, r_2 + L) = (r_1)\alpha + (r_2)\alpha\end{aligned}$$

and

$$\begin{aligned}(r_1r_2)\alpha &= (r_1r_2 + K, r_1r_2 + L)\\ &= ((r_1 + K)(r_2 + K), (r_1 + L)(r_2 + L))\\ &= (r_1 + K, r_1 + L)(r_2 + K, r_2 + L) = (r_1)\alpha(r_2)\alpha.\end{aligned}$$

Let e be the 1-element of R. As (e)α = (e + K, e + L) is the 1-element of R/K ⊕ R/L, we see that α is a ring homomorphism. The 0-element of R/K ⊕ R/L is (K, L). As

(r)α = (K, L) ⇔ (r + K, r + L) = (K, L) ⇔ r ∈ K, r ∈ L ⇔ r ∈ K ∩ L,

we see ker α = K ∩ L. Now use K + L = R to find im α: there are elements k0 ∈ K and l0 ∈ L with k0 + l0 = e. Consider an arbitrary element (s + K, t + L) of R/K ⊕ R/L, so s, t ∈ R. Write r = sl0 + tk0. Then r − s = r − se = s(l0 − e) + tk0 = s(−k0) + tk0 = (t − s)k0 ∈ K, and so r + K = s + K. Also r − t = r − te = sl0 + t(k0 − e) = sl0 + t(−l0) = (s − t)l0 ∈ L, and so r + L = t + L. Therefore (r)α = (r + K, r + L) = (s + K, t + L) and im α = R/K ⊕ R/L. By (b) above, ᾱ : R/(K ∩ L) ≅ R/K ⊕ R/L is a ring isomorphism, where (r + K ∩ L)ᾱ = (r + K, r + L) for all r ∈ R.
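The construction r = sl0 + tk0 in Solution 3(f) is the Chinese remainder theorem in ring form. As a concrete illustration (an example of our own choosing, not one from the text), take R = Z, K = 〈4〉 and L = 〈9〉. Then K + L = Z, with k0 = −8 ∈ K and l0 = 9 ∈ L. The Python sketch below checks that r = 9s − 8t has (r)α = (s + K, t + L). It also checks that α induces a bijection from Z/〈36〉 = Z/(K ∩ L) onto Z4 ⊕ Z9.

```python
K_MOD, L_MOD = 4, 9      # K = <4>, L = <9> in R = Z (illustrative choice)
k0, l0 = -8, 9           # k0 in K, l0 in L, k0 + l0 = 1 = e
assert k0 % K_MOD == 0 and l0 % L_MOD == 0 and k0 + l0 == 1

def preimage(s, t):
    """r = s*l0 + t*k0, as in the solution."""
    return s * l0 + t * k0

def alpha(r):
    """(r)alpha = (r + K, r + L), each coset named by its least residue."""
    return (r % K_MOD, r % L_MOD)

surjective = all(alpha(preimage(s, t)) == (s, t)
                 for s in range(K_MOD) for t in range(L_MOD))
kernel_reps = [r for r in range(K_MOD * L_MOD) if alpha(r) == (0, 0)]
print(surjective, kernel_reps)
```

Only r = 0 in the range 0, . . . , 35 maps to the zero element (K, L), which reflects ker α = K ∩ L = 〈36〉.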

Solution 4(a): Suppose that K is normal in G. Let g ∈ G and consider kg ∈ Kg where k ∈ K. Then kg = g(g⁻¹kg) ∈ gK as g⁻¹kg ∈ K. So Kg ⊆ gK. Replacing g by g⁻¹ in the normality condition gives gkg⁻¹ ∈ K for all k ∈ K. Consider gk ∈ gK. Then gk = (gkg⁻¹)g ∈ Kg. So gK ⊆ Kg, and hence Kg = gK. Conversely, suppose Kg = gK for all g ∈ G. For k ∈ K, g ∈ G we have kg ∈ Kg, and so kg ∈ gK. There is k′ ∈ K with kg = gk′. Hence g⁻¹kg = k′ ∈ K, i.e. g⁻¹kg ∈ K for all k ∈ K, g ∈ G. Suppose Kg1 = Kg′1 and Kg2 = Kg′2. Using the above theory we obtain Kg1g2 = Kg′1g2 = g′1g2K = g′1g′2K = Kg′1g′2, showing that coset multiplication is unambiguously defined. So G/K is closed under coset multiplication. Let e denote the identity element of G and let g, g1, g2, g3 ∈ G. Then (Kg1Kg2)Kg3 = K(g1g2)g3 = Kg1(g2g3) = Kg1(Kg2Kg3), showing that coset multiplication is associative. As KeKg = Keg = Kg = Kge = KgKe, we see that K = Ke is the identity element of G/K. As Kg⁻¹Kg = Kg⁻¹g = Ke = Kgg⁻¹ = KgKg⁻¹, we see that Kg⁻¹ is the inverse of Kg. So G/K is a group.

Solution 4(c): Write x = (e)θ. Then x² = (e)θ(e)θ = (e²)θ = (e)θ = x. As x ∈ G′ there is x⁻¹ ∈ G′ with xx⁻¹ = e′. Hence e′ = xx⁻¹ = x²x⁻¹ = x, i.e. (e)θ = e′. Applying θ to g⁻¹g = e = gg⁻¹ gives (g⁻¹)θ(g)θ = e′ = (g)θ(g⁻¹)θ, showing that (g⁻¹)θ is the inverse of (g)θ, i.e. (g⁻¹)θ = ((g)θ)⁻¹ for all g ∈ G. As (e)θ = e′, we see e ∈ K. Let k1, k2 ∈ K. Then (k1k2)θ = (k1)θ(k2)θ = e′e′ = e′, showing k1k2 ∈ K. As (k1⁻¹)θ = ((k1)θ)⁻¹ = e′⁻¹ = e′, we see k1⁻¹ ∈ K. Therefore K = ker θ is a subgroup of G. As (e)θ = e′, we see e′ ∈ im θ. As (g1)θ(g2)θ = (g1g2)θ ∈ im θ for g1, g2 ∈ G, im θ is closed under multiplication. As ((g1)θ)⁻¹ = (g1⁻¹)θ ∈ im θ for all g1 ∈ G, we see im θ is a subgroup of G′. Let k ∈ K, g ∈ G. Then (g⁻¹kg)θ = (g⁻¹)θ(k)θ(g)θ = ((g)θ)⁻¹e′(g)θ = ((g)θ)⁻¹(g)θ = e′, showing that g⁻¹kg ∈ K. So K = ker θ is normal in G. Kernels of group homomorphisms are normal subgroups. As (kg)θ = (k)θ(g)θ = e′(g)θ = (g)θ, all elements of the coset Kg are mapped by θ to the same element (g)θ. So θ̄ : G/K → im θ defined by (Kg)θ̄ = (g)θ is unambiguous and surjective. Suppose (Kg1)θ̄ = (Kg2)θ̄. Then (g1)θ = (g2)θ and so (g1g2⁻¹)θ = (g1)θ((g2)θ)⁻¹ = (g1)θ((g1)θ)⁻¹ = e′, showing g1g2⁻¹ = k ∈ K. So g1 = kg2 and hence Kg1 = Kg2, that is, θ̄ is injective. As ((Kg1)(Kg2))θ̄ = (Kg1g2)θ̄ = (g1g2)θ = (g1)θ(g2)θ = (Kg1)θ̄(Kg2)θ̄, we see that θ̄ is a group isomorphism, and so θ̄ : G/K ≅ im θ, the first isomorphism theorem for groups.

Solution 4(d): Let g1, g′1, g′′1 ∈ G1 and g2, g′2, g′′2 ∈ G2. Then

$$\begin{aligned}((g_1, g_2)(g'_1, g'_2))(g''_1, g''_2) &= (g_1g'_1, g_2g'_2)(g''_1, g''_2)\\ &= ((g_1g'_1)g''_1, (g_2g'_2)g''_2)\\ &= (g_1(g'_1g''_1), g_2(g'_2g''_2))\\ &= (g_1, g_2)(g'_1g''_1, g'_2g''_2)\\ &= (g_1, g_2)((g'_1, g'_2)(g''_1, g''_2))\end{aligned}$$

showing that componentwise multiplication on G1 × G2 is associative. The pair (e1, e2), consisting of the identity elements e1 of G1 and e2 of G2, is the identity element of G1 × G2 because (g1, g2)(e1, e2) = (g1e1, g2e2) = (g1, g2) = (e1g1, e2g2) = (e1, e2)(g1, g2) for all (g1, g2) ∈ G1 × G2. The inverse of (g1, g2) is (g1⁻¹, g2⁻¹) as (g1, g2)(g1⁻¹, g2⁻¹) = (g1g1⁻¹, g2g2⁻¹) = (e1, e2) = (g1⁻¹g1, g2⁻¹g2) = (g1⁻¹, g2⁻¹)(g1, g2). So G1 × G2 is a group, the external direct

product of G1 and G2.Solution 5(a): Let m1,m2 ∈ Z. Then (m1 + m2)χ = (m1 + m2)e = m1e + m2e =

(m1)χ + (m2)χ applying the result of Exercises 2.1, Questions 8(c) to the ad-ditive group of F . Also (m1m2)χ = (m1m2)e = (m1m2)e

2 = (m1e)(m2e) =(m1)χ(m2)χ . So χ is a ring homomorphism as (1)χ = e. By Theorem 1.15 thereis a unique non-negative integer d with kerχ = 〈d〉. By the first isomorphismtheorem for rings χ : Z/kerχ ∼= imχ , i.e. χ : Zd

∼= imχ defined by (i)χ = ie

for all i ∈ Zd = Z/〈d〉 is a ring isomorphism. Suppose d > 0. As e �= 0 we seed �= 1. Suppose that d is not prime. Then d = d1d2 for positive integers d1, d2.Hence (d1)χ(d2)χ = (d1d2)χ = (d)χ = 0. As F has no divisors of zero, either(d1)χ = 0 or (d2)χ = 0, i.e. either d1 ∈ 〈d〉 or d2 ∈ 〈d〉 both of which are impos-sible as d is not a divisor of either d1 or d2. So either d = 0 or d = p a prime.For d = 0 we have χ : Z0 ∼= imχ and so imχ has an infinite number of ele-ments. So for each finite field F there is a prime p (the characteristic of F ) suchthat χ : Zp

∼= imχ . As Zp is a field we see that imχ = F0 is a subfield of F .Regard the elements v, v′ of F as vectors and the elements a of F0 as scalars;then v + v′ ∈ F and av ∈ F satisfy the vector space laws as these laws followdirectly from the laws of a field. In short F is a vector space over F0. As thereare only a finite number of vectors, this vector space is finitely generated and sohas a basis v1, v2, . . . , vs . Each element of F can be uniquely expressed in theform a1v1 + a2v2 + · · · + asvs where a1, a2, . . . , as ∈ F0. As Zp

∼= F0 there arep independent choices for each of the s scalars a1, a2, . . . , as . Hence |F | = ps .For 0 < r < p the binomial coefficient

(pr

)is divisible by the prime p. As pe = 0

by the first paragraph, we see(

pr

)ap−rbr = ((

pr

)/p

)(pe)ap−rbr = 0. By the

binomial theorem

(a + b)p =p∑

r=0

(p

r

)ap−rbr = ap + bp

as only the first and last terms contribute to the sum. Therefore (a + b)θ =(a + b)p = ap + bp = (a)θ + (b)θ showing that θ is additive. As ker θ ={a ∈ F : ap = 0} = {0} we see that θ is injective by Exercises 2.3, Ques-tion 1(a)(i). As θ : F → F and F has only a finite number of elements we deduce


that θ is also surjective. Finally (ab)θ = (ab)p = apbp = (a)θ(b)θ showing thatθ is multiplicative. As (e)θ = ep = e we conclude that θ is an automorphism ofthe finite field F .
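The argument of Solution 5(a) can be checked by brute force on a small case. The sketch below models the field of order $9 = 3^2$ as $\mathbb{Z}_3[x]/\langle x^2+1\rangle$. This model is our own illustrative choice; it is a field because $x^2 + 1$ has no zero in $\mathbb{Z}_3$. The code verifies that $\binom{p}{r}$ is divisible by $p$ for $0 < r < p$. It also checks that $\theta : a \mapsto a^p$ is additive, multiplicative and bijective.

```python
from math import comb

p = 3
# Illustrative model (our choice) of the field of order 9: Z_3[x]/<x^2 + 1>.
# The element a + b*i (with i^2 = -1) is stored as the pair (a, b).
F = [(a, b) for a in range(p) for b in range(p)]


def add(u, v):
    return ((u[0] + v[0]) % p, (u[1] + v[1]) % p)


def mul(u, v):
    a, b = u
    c, d = v
    return ((a * c - b * d) % p, (a * d + b * c) % p)


def frob(u):
    """theta : a -> a^p, computed by repeated multiplication."""
    result = (1, 0)
    for _ in range(p):
        result = mul(result, u)
    return result


# The binomial coefficients C(p, r), 0 < r < p, are divisible by p.
assert all(comb(p, r) % p == 0 for r in range(1, p))
```

For example, `frob((0, 1))` is $i^3 = -i$, stored as `(0, 2)`. So $\theta$ is not the identity on this field, even though it fixes the prime subfield $\mathbb{Z}_3$.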

Solution 5(b): Let $A = (a_{ij})$ and $B = (b_{ij})$ be $t \times t$ matrices over $\mathbb{Z}_n$. Then
$$(A+B)\delta_t = ((a_{ij}+b_{ij})\delta_1) = ((a_{ij})\delta_1 + (b_{ij})\delta_1) = ((a_{ij})\delta_1) + ((b_{ij})\delta_1) = (A)\delta_t + (B)\delta_t$$
and
$$(AB)\delta_t = \left(\left(\sum_{j=1}^{t} a_{ij}b_{jk}\right)\delta_1\right) = \left(\sum_{j=1}^{t} (a_{ij})\delta_1(b_{jk})\delta_1\right) = ((a_{ij})\delta_1)((b_{jk})\delta_1) = (A)\delta_t(B)\delta_t.$$
As $(1_n)\delta_1 = 1_d$ and $(0_n)\delta_1 = 0_d$, $\delta_t$ maps the 1-element of the ring $M_t(\mathbb{Z}_n)$ to the 1-element of $M_t(\mathbb{Z}_d)$. So $\delta_t$ is a ring homomorphism. As $\delta_1$ is surjective so also is $\delta_t$, i.e. $\operatorname{im}\delta_t = M_t(\mathbb{Z}_d)$. Let $A = (a_{ij}) \in \ker\delta_t$. Then $(a_{ij})\delta_1 = 0_d$ and so $a_{ij} = m_n$ where $d \mid m$. There are therefore $n/d$ independent choices for each of the $t^2$ entries in $A$. Hence $|\ker\delta_t| = (n/d)^{t^2}$.

Now let $n = p^s$ and $d = p$ where $p$ is prime, and let $\det A = m_{p^s}$. Using the multiplicative property of determinants:
$$
\begin{aligned}
A \in GL_t(\mathbb{Z}_{p^s}) &\Leftrightarrow \det A = m_{p^s} \in U(\mathbb{Z}_{p^s}) \Leftrightarrow \gcd\{m, p^s\} = 1\\
&\Leftrightarrow \gcd\{m, p\} = 1\\
&\Leftrightarrow \det((A)\delta_t) = m_p \in U(\mathbb{Z}_p) = \mathbb{Z}_p^*\\
&\Leftrightarrow (A)\delta_t \in GL_t(\mathbb{Z}_p).
\end{aligned}
$$
Hence the restriction $\delta_t| : GL_t(\mathbb{Z}_{p^s}) \to GL_t(\mathbb{Z}_p)$ makes sense and is surjective. As $\delta_t$ respects products so also does $\delta_t|$, i.e. it is a homomorphism of multiplicative groups. Now $A \in \ker\delta_t| \Leftrightarrow (A)\delta_t = (I)\delta_t \Leftrightarrow A \in \ker\delta_t + I$. So $\ker\delta_t| = \ker\delta_t + I$ is a normal subgroup of $GL_t(\mathbb{Z}_{p^s})$ having $|\ker\delta_t| = p^{(s-1)t^2}$ elements. By the first isomorphism theorem for groups, $GL_t(\mathbb{Z}_{p^s})/\ker\delta_t| \cong GL_t(\mathbb{Z}_p)$. So $GL_t(\mathbb{Z}_{p^s})$ partitions into $|GL_t(\mathbb{Z}_p)|$ cosets of $\ker\delta_t|$. Hence
$$|GL_t(\mathbb{Z}_{p^s})| = p^{(s-1)t^2}(p^t-1)(p^t-p)\cdots(p^t-p^{t-1})$$
using the formula following Lemma 2.18.

Solution 7(a): Suppose $v_1', v_2', \dots, v_t'$ generate $M'$. Let $v' \in M'$. By Definition 2.19(i) there are $r_1, r_2, \dots, r_t \in R$ with $r_1v_1' + r_2v_2' + \dots + r_tv_t' = v'$. Write $v = r_1v_1 + r_2v_2 + \dots + r_tv_t \in M$. Then
$$
\begin{aligned}
(v)\theta &= (r_1v_1 + r_2v_2 + \dots + r_tv_t)\theta\\
&= r_1(v_1)\theta + r_2(v_2)\theta + \dots + r_t(v_t)\theta\\
&= r_1v_1' + r_2v_2' + \dots + r_tv_t' = v'
\end{aligned}
$$
showing $\theta$ to be surjective. Conversely suppose that $\theta$ is surjective. Let $v' \in M'$. There is $v \in M$ with $(v)\theta = v'$. As $v_1, v_2, \dots, v_t$ generate $M$, there are $r_1, r_2, \dots, r_t \in R$ with $v = r_1v_1 + r_2v_2 + \dots + r_tv_t$. As $\theta$ is $R$-linear we have
$$
\begin{aligned}
v' = (v)\theta &= (r_1v_1 + r_2v_2 + \dots + r_tv_t)\theta\\
&= r_1(v_1)\theta + r_2(v_2)\theta + \dots + r_t(v_t)\theta\\
&= r_1v_1' + r_2v_2' + \dots + r_tv_t'
\end{aligned}
$$
showing that $v_1', v_2', \dots, v_t'$ generate $M'$.

Suppose $v_1', v_2', \dots, v_t'$ are $R$-independent elements of $M'$. Consider $u \in \ker\theta$. As $v_1, v_2, \dots, v_t$ generate $M$, there are $r_1, r_2, \dots, r_t \in R$ with $u = r_1v_1 + r_2v_2 + \dots + r_tv_t$. As $\theta$ is $R$-linear,
$$
\begin{aligned}
0 = (u)\theta &= (r_1v_1 + r_2v_2 + \dots + r_tv_t)\theta\\
&= r_1(v_1)\theta + r_2(v_2)\theta + \dots + r_t(v_t)\theta\\
&= r_1v_1' + r_2v_2' + \dots + r_tv_t'
\end{aligned}
$$
and so $r_1 = r_2 = \dots = r_t = 0$. Hence $u = 0v_1 + 0v_2 + \dots + 0v_t = 0$, showing $\ker\theta = \{0\}$. So $\theta$ is injective by Exercises 2.3, Question 1(a)(i). Conversely suppose $\theta$ is injective. Then $\ker\theta = \{0\}$ by Exercises 2.3, Question 1(a)(i). Consider $r_1v_1' + r_2v_2' + \dots + r_tv_t' = 0$ where $r_1, r_2, \dots, r_t \in R$. Then
$$(r_1v_1 + r_2v_2 + \dots + r_tv_t)\theta = 0,$$
i.e. $r_1v_1 + r_2v_2 + \dots + r_tv_t \in \ker\theta$. So $r_1v_1 + r_2v_2 + \dots + r_tv_t = 0$. As $v_1, v_2, \dots, v_t$ are $R$-independent, $r_1 = r_2 = \dots = r_t = 0$. So $v_1', v_2', \dots, v_t'$ are $R$-independent elements of $M'$.

Now suppose $\theta : M \to M'$ is an isomorphism and $M$ is free of rank $t$. Then $M$ has an $R$-basis $v_1, v_2, \dots, v_t$. Let $v_i' = (v_i)\theta$ for $1 \le i \le t$. As $\theta$ is surjective and injective, $v_1', v_2', \dots, v_t'$ generate $M'$ and are $R$-independent, using the above theory. So $M'$ has $R$-basis $v_1', v_2', \dots, v_t'$ and so is free of rank $t = t'$.

Conversely suppose $M$ and $M'$ to be free $R$-modules of the same rank $t$. Let $M$ have $R$-basis $v_1, v_2, \dots, v_t$ and let $M'$ have $R$-basis $v_1', v_2', \dots, v_t'$. Consider $\theta : M \to M'$ defined by $(r_1v_1 + r_2v_2 + \dots + r_tv_t)\theta = r_1v_1' + r_2v_2' + \dots + r_tv_t'$ for all $r_1, r_2, \dots, r_t \in R$. This map is unambiguous because each element of $M$ is uniquely expressible in terms of the basis $v_1, v_2, \dots, v_t$. Let $u = r_1v_1 + r_2v_2 + \dots + r_tv_t$ and $v = s_1v_1 + s_2v_2 + \dots + s_tv_t$ where $r_i, s_i \in R$ for $1 \le i \le t$. Then
$$
\begin{aligned}
(u+v)\theta &= ((r_1+s_1)v_1 + (r_2+s_2)v_2 + \dots + (r_t+s_t)v_t)\theta\\
&= (r_1+s_1)v_1' + (r_2+s_2)v_2' + \dots + (r_t+s_t)v_t'\\
&= (r_1v_1' + r_2v_2' + \dots + r_tv_t') + (s_1v_1' + s_2v_2' + \dots + s_tv_t')\\
&= (r_1v_1 + r_2v_2 + \dots + r_tv_t)\theta + (s_1v_1 + s_2v_2 + \dots + s_tv_t)\theta\\
&= (u)\theta + (v)\theta
\end{aligned}
$$
and so $\theta$ is additive. Also for $r \in R$ we see
$$
\begin{aligned}
(ru)\theta &= (r(r_1v_1 + r_2v_2 + \dots + r_tv_t))\theta\\
&= (rr_1v_1 + rr_2v_2 + \dots + rr_tv_t)\theta\\
&= rr_1v_1' + rr_2v_2' + \dots + rr_tv_t'\\
&= r(r_1v_1' + r_2v_2' + \dots + r_tv_t') = r((u)\theta).
\end{aligned}
$$
So $\theta$ is $R$-linear, and as $(v_i)\theta = v_i'$ we can apply the above theory. Since $v_1', v_2', \dots, v_t'$ generate $M'$ and are $R$-independent, $\theta$ is surjective and injective, i.e. $\theta : M \cong M'$.

Solution 7(b): By hypothesis $M$ has $R$-basis $v_1, v_2, \dots, v_t$. There are $t \times t$ matrices $P = (p_{ij})$ and $Q = (q_{jk})$ over $R$ such that $v_i = \sum_{j=1}^{t} p_{ij}u_j$ $(1 \le i \le t)$ and $u_j = \sum_{k=1}^{t} q_{jk}v_k$ $(1 \le j \le t)$, as in the proof of Theorem 2.20. Then $PQ = I$ on comparing coefficients in the equation
$$v_i = \sum_{j=1}^{t} p_{ij}\left(\sum_{k=1}^{t} q_{jk}v_k\right) = \sum_{k=1}^{t}\left(\sum_{j=1}^{t} p_{ij}q_{jk}\right)v_k.$$
So $Q$ is invertible over $R$ by Lemma 2.18. Hence $u_1, u_2, \dots, u_t$ is an $R$-basis of $M$ by Corollary 2.21.
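The order formula for $GL_t(\mathbb{Z}_{p^s})$ obtained in Solution 5(b) is easy to test for small cases by brute force. The sketch below enumerates all $t \times t$ matrices over $\mathbb{Z}_n$. It keeps those whose determinant is a unit of $\mathbb{Z}_n$, with the determinant computed from the permutation-sum definition. The count is then compared with $p^{(s-1)t^2}(p^t-1)(p^t-p)\cdots(p^t-p^{t-1})$. The helper names are our own.

```python
from itertools import permutations, product
from math import gcd


def det_mod(A, n):
    """Determinant of a square matrix over Z_n via the permutation-sum formula."""
    t, total = len(A), 0
    for perm in permutations(range(t)):
        inversions = sum(1 for i in range(t) for j in range(i + 1, t) if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(t):
            term *= A[i][perm[i]]
        total += term
    return total % n


def count_gl(t, n):
    """Brute-force count of t x t matrices over Z_n whose determinant is a unit."""
    count = 0
    for entries in product(range(n), repeat=t * t):
        A = [list(entries[i * t:(i + 1) * t]) for i in range(t)]
        if gcd(det_mod(A, n), n) == 1:
            count += 1
    return count


def formula(t, p, s):
    """p^{(s-1)t^2} (p^t - 1)(p^t - p) ... (p^t - p^{t-1})."""
    value = p ** ((s - 1) * t * t)
    for i in range(t):
        value *= p ** t - p ** i
    return value
```

For instance `count_gl(2, 4)` and `formula(2, 2, 2)` should both give $2^4 \cdot 3 \cdot 2 = 96$. Brute force is only practical for very small $t$ and $n$, since there are $n^{t^2}$ matrices.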

Solution 7(c): Regarding $M$ and $M'$ as $\mathbb{Z}$-modules, Exercises 2.3, Question 1(a)(i) and (ii) above show that $\ker\theta$ and $\operatorname{im}\theta$ are additive subgroups of $M$ and $M'$ respectively. For $u \in \ker\theta$ and $r \in R$, we have $(ru)\theta = r((u)\theta) = r \times 0 = 0$, showing that $ru \in \ker\theta$. So $\ker\theta$ is a submodule of the $R$-module $M$ by Definition 2.26. For $u' \in \operatorname{im}\theta$ and $r \in R$, there is $u \in M$ with $(u)\theta = u'$. So $(ru)\theta = r((u)\theta) = ru'$, showing $ru' \in \operatorname{im}\theta$ as $ru \in M$. So $\operatorname{im}\theta$ is a submodule of the $R$-module $M'$ by Definition 2.26. Let $\theta$ be bijective. Then $\theta^{-1} : M' \to M$ is additive by Exercises 2.1, Question 4(d). Let $r \in R$ and $v' \in M'$, and write $v = (v')\theta^{-1}$. Then $(rv)\theta = r((v)\theta) = rv'$. Applying $\theta^{-1}$ gives $(rv')\theta^{-1} = rv = r((v')\theta^{-1})$, showing that $\theta^{-1}$ is $R$-linear. Yes, the inverse of an isomorphism of $R$-modules is bijective and $R$-linear, and so is itself an isomorphism of $R$-modules by Definition 2.24.

Solution 7(d): Consider $r_1, r_2 \in R$ and $v \in M$. We use coset addition (before Lemma 2.10) and law 5 (part 2) (before Definition 2.19). These give $(r_1+r_2)(N+v) = N + (r_1+r_2)v = N + r_1v + r_2v = (N + r_1v) + (N + r_2v) = r_1(N+v) + r_2(N+v)$, which shows that law 5 (part 2) holds in $M/N$. As law 6 holds in $M$ we see $(r_1r_2)(N+v) = N + (r_1r_2)v = N + r_1(r_2v) = r_1(N + r_2v) = r_1(r_2(N+v))$, showing that law 6 holds in $M/N$. The 1-element $1$ of $R$ satisfies $1(N+v) = N + 1v = N + v$, and so law 7 holds in $M/N$. Therefore $M/N$ is an $R$-module.

Solution 7(e): There is an element $u_0 \in N$. Then $z_0 = 0u_0 \in N$. As $0 + 1 = 1$ in $R$, the distributive law in $M$ gives $z_0 + u_0 = 0u_0 + 1u_0 = (0+1)u_0 = 1u_0 = u_0$. Hence $-u_0 + (z_0 + u_0) = -u_0 + u_0 = 0$. Using the associative and commutative laws of addition in $M$, this gives $z_0 = 0$ (the 0-element of $M$). So $N$ contains the 0-element of $M$. For $u \in N$ we have $(-1)u \in N$. Then $(-1)u + u = (-1)u + 1u = (-1+1)u = 0u = 0$, on replacing $u_0$ by $u$ in the above argument. So $-u = (-1)u \in N$, and so $N$ is closed under negation. Hence $N$ is a subgroup of the additive group of $M$. So $N$ is a submodule of $M$.

Consider $v, v' \in N_1 + N_2$. There are $u_1, u_1' \in N_1$ and $u_2, u_2' \in N_2$ with
$$v = u_1 + u_2, \qquad v' = u_1' + u_2'.$$
So $v + v' = u_1 + u_2 + u_1' + u_2' = (u_1 + u_1') + (u_2 + u_2') \in N_1 + N_2$, since $u_1 + u_1' \in N_1$ and $u_2 + u_2' \in N_2$. So $N_1 + N_2$ is closed under addition. Also $rv = r(u_1+u_2) = ru_1 + ru_2 \in N_1 + N_2$, as $ru_1 \in N_1$ and $ru_2 \in N_2$ for all $r \in R$. As $N_1 + N_2$ is non-empty (it contains $0 = 0 + 0$), the above theory shows that $N_1 + N_2$ is a submodule of $M$.

Consider $u, u' \in N_1 \cap N_2$. Then $u, u' \in N_1$ and so $u + u' \in N_1$. Also $u, u' \in N_2$ and so $u + u' \in N_2$. Therefore $u + u' \in N_1 \cap N_2$, and so $N_1 \cap N_2$ is closed under addition. For $r \in R$ we have $ru \in N_1$ as $u \in N_1$. Also $ru \in N_2$ as $u \in N_2$. So $ru \in N_1 \cap N_2$. As $0 \in N_1 \cap N_2$, we conclude that $N_1 \cap N_2$ is a submodule of $M$.

Solution 7(f): By Exercises 2.2, Question 4(f) we know $M_1 \oplus M_2$ is an additive abelian group. We next check that the $R$-module laws 5, 6 and 7 (before Definition 2.19) hold in $M_1 \oplus M_2$, given that these laws hold in $M_1$ and $M_2$. Consider $v = (v_1, v_2) \in M_1 \oplus M_2$, $v' = (v_1', v_2') \in M_1 \oplus M_2$ and $r, r' \in R$. Then
$$
\begin{aligned}
r(v+v') &= r((v_1,v_2) + (v_1',v_2')) = r(v_1+v_1', v_2+v_2')\\
&= (r(v_1+v_1'), r(v_2+v_2'))\\
&= (rv_1 + rv_1', rv_2 + rv_2') = (rv_1, rv_2) + (rv_1', rv_2')\\
&= r(v_1,v_2) + r(v_1',v_2') = rv + rv'
\end{aligned}
$$
and
$$
\begin{aligned}
(r+r')v &= (r+r')(v_1,v_2) = ((r+r')v_1, (r+r')v_2)\\
&= (rv_1 + r'v_1, rv_2 + r'v_2)\\
&= r(v_1,v_2) + r'(v_1,v_2) = rv + r'v
\end{aligned}
$$
which shows that law 5 holds. Also
$$
\begin{aligned}
(rr')v &= (rr')(v_1,v_2) = ((rr')v_1, (rr')v_2) = (r(r'v_1), r(r'v_2))\\
&= r(r'v_1, r'v_2) = r(r'(v_1,v_2)) = r(r'v)
\end{aligned}
$$
showing that law 6 holds. As $1v = 1(v_1,v_2) = (1v_1, 1v_2) = (v_1,v_2) = v$, law 7 holds, and so $M_1 \oplus M_2$ is an $R$-module.


EXERCISES 3.1

Solution 5(b): As $G$ is generated by $t$ (say) of its elements, there is a surjective $\mathbb{Z}$-linear mapping $\theta : \mathbb{Z}^t \to G$. Consider $k_1, k_2 \in K'$, so that $(k_1)\theta = h_1 \in H$ and $(k_2)\theta = h_2 \in H$. Then $(k_1+k_2)\theta = (k_1)\theta + (k_2)\theta = h_1 + h_2 \in H$, showing $k_1 + k_2 \in K'$. Also $(-k_1)\theta = -(k_1)\theta = -h_1 \in H$, showing $-k_1 \in K'$. As $(0)\theta = 0 \in H$ we see $0 \in K'$. So $K'$ is an additive subgroup of $\mathbb{Z}^t$, i.e. $K'$ is a submodule of $\mathbb{Z}^t$. So $K'$ is free with $\mathbb{Z}$-basis $z_1, z_2, \dots, z_s$, $s \le t$, by Theorem 3.1. Let $h \in H$. There is $k \in K'$ with $(k)\theta = h$. There are integers $m_1, m_2, \dots, m_s$ with $k = m_1z_1 + m_2z_2 + \dots + m_sz_s$. As $\theta$ is $\mathbb{Z}$-linear we obtain
$$
\begin{aligned}
h = (k)\theta &= (m_1z_1 + m_2z_2 + \dots + m_sz_s)\theta\\
&= m_1(z_1)\theta + m_2(z_2)\theta + \dots + m_s(z_s)\theta
\end{aligned}
$$
which shows that the $s$ elements $(z_1)\theta, (z_2)\theta, \dots, (z_s)\theta$ generate $H$. So $H$ is finitely generated.

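Solution 5(c): Consider $g = (g_1, g_2, \dots, g_s) \in G_1 \oplus G_2 \oplus \dots \oplus G_s$. Then
$$ng = (ng_1, ng_2, \dots, ng_s).$$
As $g$ runs through $G_1 \oplus G_2 \oplus \dots \oplus G_s$, the left-hand side runs through $n(G_1 \oplus G_2 \oplus \dots \oplus G_s)$. As each $g_i$ runs independently through $G_i$, the right-hand side runs through $nG_1 \oplus nG_2 \oplus \dots \oplus nG_s$. So the $\mathbb{Z}$-modules $n(G_1 \oplus G_2 \oplus \dots \oplus G_s)$ and $nG_1 \oplus nG_2 \oplus \dots \oplus nG_s$ are identical.

Consider $g = (g_1, g_2, \dots, g_s) \in G_1 \oplus G_2 \oplus \dots \oplus G_s$. Then
$$
\begin{aligned}
g \in (G_1 \oplus G_2 \oplus \dots \oplus G_s)_{(n)} &\Leftrightarrow ng = 0 \Leftrightarrow ng_i = 0 \text{ for } 1 \le i \le s\\
&\Leftrightarrow g_i \in (G_i)_{(n)} \text{ for } 1 \le i \le s.
\end{aligned}
$$
So the $\mathbb{Z}$-modules $(G_1 \oplus G_2 \oplus \dots \oplus G_s)_{(n)}$ and $(G_1)_{(n)} \oplus (G_2)_{(n)} \oplus \dots \oplus (G_s)_{(n)}$ are equal.

Solution 5(d): Consider $m1$ in $(\mathbb{Z}_d)_{(n)}$, where $m \in \mathbb{Z}$ and $1$ is the 1-element of $\mathbb{Z}_d$. Then $(mn)1 = 0$, the 0-element of $\mathbb{Z}_d$. As $1$ has order $d$ in the additive group $(\mathbb{Z}_d, +)$, the discussion preceding Theorem 2.5 gives $d \mid mn$. Therefore $d/\gcd\{n,d\} \mid m(n/\gcd\{n,d\})$. As $d/\gcd\{n,d\}$ and $n/\gcd\{n,d\}$ are coprime integers, $d/\gcd\{n,d\} \mid m$. So $m1 = q(d/\gcd\{n,d\})1$ where $q \in \mathbb{Z}$. Also $(d/\gcd\{n,d\})1 \in (\mathbb{Z}_d)_{(n)}$ itself, as $n(d/\gcd\{n,d\})1 = (n/\gcd\{n,d\})d1 = 0$. This shows that $(d/\gcd\{n,d\})1$ generates the $\mathbb{Z}$-module $(\mathbb{Z}_d)_{(n)}$.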
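Solution 5(d) can be confirmed numerically for small moduli $d \ge 1$. The sketch below computes $(\mathbb{Z}_d)_{(n)} = \{x \in \mathbb{Z}_d : nx = 0\}$ directly. It then compares this set with the cyclic subgroup generated by $d/\gcd\{n,d\}$. As a by-product, the subgroup has $\gcd\{n,d\}$ elements. The function names are our own.

```python
from math import gcd


def torsion(d, n):
    """Elements x of Z_d with n*x = 0, i.e. the subgroup (Z_d)_(n)."""
    return {x for x in range(d) if (n * x) % d == 0}


def cyclic_subgroup(g, d):
    """Subgroup of (Z_d, +) generated by g."""
    return {(k * g) % d for k in range(d)}
```

For example `torsion(12, 8)` is $\{0, 3, 6, 9\}$, which is generated by $12/\gcd\{8,12\} = 3$.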

Solution 7(a): (i) Let $e$ denote the 1-element of $R$. Then $a \equiv a$ for all $a \in R$, as $a = ae$ and $e \in U(R)$. Suppose $a \equiv b$. Then $a = bu$ for some $u \in U(R)$. Hence $b \equiv a$, as $u^{-1} \in U(R)$ and $b = au^{-1}$. Suppose $a \equiv b$ and $b \equiv c$ where $a, b, c \in R$. Then there are $u, v \in U(R)$ with $a = bu$ and $b = cv$. Hence $a \equiv c$, as $a = (cv)u = c(vu)$ and $vu \in U(R)$. So $\equiv$ is an equivalence relation on $R$.


EXERCISES 3.2

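Solution 5(b): Let $g, g' \in G$. By Theorem 3.10 there are unique elements $g_j, g_j' \in G_{p_j}$ $(1 \le j \le k)$ with $g = g_1 + g_2 + \dots + g_k$ and $g' = g_1' + g_2' + \dots + g_k'$. Then
$$
\begin{aligned}
(g+g')\alpha &= \left(\sum_{j=1}^{k}(g_j + g_j')\right)\alpha = \sum_{j=1}^{k}(g_j + g_j')\alpha_j = \sum_{j=1}^{k}\big((g_j)\alpha_j + (g_j')\alpha_j\big)\\
&= \sum_{j=1}^{k}(g_j)\alpha_j + \sum_{j=1}^{k}(g_j')\alpha_j = (g)\alpha + (g')\alpha
\end{aligned}
$$
showing that $\alpha : G \to G$ is a homomorphism. As $\alpha^{-1} = \alpha_1^{-1} \oplus \alpha_2^{-1} \oplus \dots \oplus \alpha_k^{-1}$, $\alpha$ is bijective, and so $\alpha \in \operatorname{Aut}G$. Consider $\beta \in \operatorname{Aut}G$. As $(G_{p_j})\beta = G_{p_j}$, the mapping $\beta_j : G_{p_j} \to G_{p_j}$ defined by $(g_j)\beta_j = (g_j)\beta$ for all $g_j \in G_{p_j}$ is an automorphism of $G_{p_j}$ for $1 \le j \le k$. Then
$$(g)\beta = \left(\sum_{j=1}^{k} g_j\right)\beta = \sum_{j=1}^{k}(g_j)\beta = \sum_{j=1}^{k}(g_j)\beta_j = (g)(\beta_1 \oplus \beta_2 \oplus \dots \oplus \beta_k)$$
for all $g \in G$, showing $\beta = \beta_1 \oplus \beta_2 \oplus \dots \oplus \beta_k$. As $\beta_j$ is the restriction of $\beta$ to $G_{p_j}$ for $1 \le j \le k$, the $\beta_j$ are uniquely determined by $\beta$. Consider the mapping $\operatorname{Aut}G \to \operatorname{Aut}G_{p_1} \times \operatorname{Aut}G_{p_2} \times \dots \times \operatorname{Aut}G_{p_k}$ defined by $\beta \mapsto (\beta_1, \beta_2, \dots, \beta_k)$ for all $\beta \in \operatorname{Aut}G$. Hence this mapping is bijective. Let $\beta, \beta' \in \operatorname{Aut}G$. There are $\beta_j' \in \operatorname{Aut}G_{p_j}$ $(1 \le j \le k)$ with $\beta' = \beta_1' \oplus \beta_2' \oplus \dots \oplus \beta_k'$. Then
$$
\begin{aligned}
(g)\beta\beta' = ((g)\beta)\beta' &= \left(\left(\sum_{j=1}^{k} g_j\right)\beta\right)\beta' = \left(\sum_{j=1}^{k}(g_j)\beta_j\right)\beta'\\
&= \sum_{j=1}^{k}((g_j)\beta_j)\beta_j' = \sum_{j=1}^{k}(g_j)\beta_j\beta_j'
\end{aligned}
$$
showing that the correspondence $\beta \leftrightarrow (\beta_1, \beta_2, \dots, \beta_k)$ is a group homomorphism. So $\operatorname{Aut}G \cong \operatorname{Aut}G_{p_1} \times \operatorname{Aut}G_{p_2} \times \dots \times \operatorname{Aut}G_{p_k}$ is a group isomorphism.

Solution 6(a): Suppose $H$ is indecomposable. Let $H$ have $t'$ invariant factors. Then $t' = 1$ by Definition 3.8. Otherwise $H = H_1 \oplus (H_2 \oplus \dots \oplus H_{t'})$ with both $H_1$ and $H_2 \oplus \dots \oplus H_{t'}$ non-trivial. So $H$ is cyclic of isomorphism type $C_d$ where $d \neq 1$. Either $d = 0$ or $d \ge 2$. In the latter case $|H| = d$ is divisible by just one prime. Otherwise the primary decomposition (Theorem 3.10) of $H$ would be a non-trivial decomposition, contradicting the fact that $H$ is indecomposable. So $d = p^n$ where $p$ is prime. Conversely suppose $H$ has isomorphism type $C_d$ where either $d = 0$ or $d = p^n$ with $p$ prime. In the case $d = 0$ we see $H \cong \mathbb{Z}$, that is, $H$ is isomorphic to the additive group $\mathbb{Z}$ of integers. But $\mathbb{Z}$ (and hence $H$) is indecomposable by Exercises 2.2, Question 6(b). By the discussion following Corollary 3.12, cyclic groups of prime power order $d = p^n$ are indecomposable.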
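The isomorphism $\operatorname{Aut}G \cong \operatorname{Aut}G_{p_1} \times \dots \times \operatorname{Aut}G_{p_k}$ of Solution 5(b) can be illustrated on cyclic groups. This relies on a standard fact not proved here: for $G = \mathbb{Z}_n$, the automorphisms are exactly the maps $x \mapsto ux$ with $u \in U(\mathbb{Z}_n)$. The sketch identifies each primary component $G_{p_j}$ with $\mathbb{Z}_{q_j}$, where $q_j = p_j^{k_j}$. The restriction $\beta_j$ of $x \mapsto ux$ then becomes multiplication by $u \bmod q_j$. The code checks that $\beta \mapsto (\beta_1, \dots, \beta_k)$ is bijective and respects composition for $n = 360$. The choice $n = 360$ and the helper names are our own.

```python
from math import gcd


def prime_power_factors(n):
    """Return [p1**k1, p2**k2, ...] for n > 1, by trial division."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            q = 1
            while n % p == 0:
                n //= p
                q *= p
            factors.append(q)
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def units(n):
    """U(Z_n); u represents the automorphism x -> u*x of Z_n."""
    return [u for u in range(n) if gcd(u, n) == 1]


def restrict(u, qs):
    """(beta_1, ..., beta_k): multiplication by u on each component Z_q."""
    return tuple(u % q for q in qs)
```

With $n = 360 = 8 \cdot 9 \cdot 5$ both sides have $|U(\mathbb{Z}_{360})| = 96 = 4 \cdot 6 \cdot 4$ elements.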

Solution 6(b): For $1 \le i \le m$, the submodule $H_i$ of the f.g. $\mathbb{Z}$-module $G$ is itself f.g. by Exercises 3.1, Question 5(b). Let $r_i$ be the torsion-free rank of $H_i$, and let the torsion subgroup $T_i$ of $H_i$ have $l_i$ elementary divisors. Then $l_i + r_i \ge 1$. Using Theorem 3.4, Corollary 3.5 and Theorem 3.10, we see that $H_i$ is the direct sum of $l_i + r_i$ non-trivial indecomposable submodules for $1 \le i \le m$. Substituting for each $H_i$ we obtain a decomposition $G = H_1' \oplus H_2' \oplus \dots \oplus H_{m'}'$ where $m \le m'$ and each $H_{i'}'$ is indecomposable for $1 \le i' \le m'$. By (a) above each $H_{i'}'$ has isomorphism type $C_0$ or $C_{p^n}$. The number of $H_{i'}'$ having isomorphism type $C_0$ is $r$, the torsion-free rank of $G$. Now $H_{i'}'$ has isomorphism type $C_{p^n}$ if and only if $p^n$ is an elementary divisor of the torsion subgroup $T$ of $G$. So the number of such $H_{i'}'$ is $l$. Therefore $m' = l + r$ and so $m \le l + r$. As $T_1 \oplus T_2 \oplus \dots \oplus T_m = T$, we see from above that $l_1 + l_2 + \dots + l_m = l$. Comparing torsion-free ranks in $H_1 \oplus H_2 \oplus \dots \oplus H_m = G$ also gives $r_1 + r_2 + \dots + r_m = r$.

Suppose $m = l + r$. Then $m = m'$, which means $l_i + r_i = 1$ for $1 \le i \le m$. So either $r_i = 1$ or $l_i = 1$. In the first case $H_i$ is indecomposable, being of isomorphism type $C_0$. In the second case $H_i$ is indecomposable, being of isomorphism type $C_{p^n}$. So $m = l + r$ implies that $H_i$ is indecomposable for $1 \le i \le m$. Conversely suppose $H_i$ is indecomposable for $1 \le i \le m$. Then $H_i' = H_i$ for $1 \le i \le m$, and so $m = m' = l + r$.

EXERCISES 3.3

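Solution 1(a): For $g_1, g_2$ in $G$, Definition 2.3 gives
$$
\begin{aligned}
(g_1+g_2)(\alpha+\alpha') &= (g_1+g_2)\alpha + (g_1+g_2)\alpha'\\
&= (g_1)\alpha + (g_2)\alpha + (g_1)\alpha' + (g_2)\alpha'\\
&= (g_1)\alpha + (g_1)\alpha' + (g_2)\alpha + (g_2)\alpha'\\
&= (g_1)(\alpha+\alpha') + (g_2)(\alpha+\alpha')
\end{aligned}
$$
showing that $\alpha + \alpha'$ is an endomorphism of $G$. We verify the axioms of an additive abelian group (Section 2.1). Consider $\alpha, \alpha', \alpha''$ in $\operatorname{End}G$. Then $(\alpha+\alpha')+\alpha'' = \alpha+(\alpha'+\alpha'')$ because, for all $g \in G$,
$$
\begin{aligned}
(g)((\alpha+\alpha')+\alpha'') &= ((g)\alpha + (g)\alpha') + (g)\alpha''\\
&= (g)\alpha + ((g)\alpha' + (g)\alpha'') = (g)(\alpha+(\alpha'+\alpha''))
\end{aligned}
$$
The zero endomorphism $0$ satisfies $0 + \alpha = \alpha$, as $(g)(0+\alpha) = (g)0 + (g)\alpha = 0 + (g)\alpha = (g)\alpha$ for all $g \in G$. Write $-\alpha : G \to G$ where $(g)(-\alpha) = -(g)\alpha$ for all $g \in G$. Then $-\alpha \in \operatorname{End}G$ as
$$
\begin{aligned}
(g_1+g_2)(-\alpha) &= -(g_1+g_2)\alpha = -((g_1)\alpha + (g_2)\alpha)\\
&= -(g_1)\alpha - (g_2)\alpha = (g_1)(-\alpha) + (g_2)(-\alpha)
\end{aligned}
$$
for all $g_1, g_2 \in G$. Also $-\alpha + \alpha = 0$, since $(g)(-\alpha+\alpha) = (g)(-\alpha) + (g)\alpha = -(g)\alpha + (g)\alpha = 0 = (g)0$ for all $g \in G$. Finally $\alpha + \alpha' = \alpha' + \alpha$, as $(g)(\alpha+\alpha') = (g)\alpha + (g)\alpha' = (g)\alpha' + (g)\alpha = (g)(\alpha'+\alpha)$ for all $g \in G$. So $(\operatorname{End}G, +)$ is an abelian group. For all $g \in G$,
$$
\begin{aligned}
(g)(\alpha(\alpha'+\alpha'')) &= ((g)\alpha)(\alpha'+\alpha'') = ((g)\alpha)\alpha' + ((g)\alpha)\alpha''\\
&= (g)\alpha\alpha' + (g)\alpha\alpha'' = (g)(\alpha\alpha' + \alpha\alpha'').
\end{aligned}
$$
So the distributive law $\alpha(\alpha'+\alpha'') = \alpha\alpha' + \alpha\alpha''$ holds.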

EXERCISES 4.1

Solution 2(a): Consider $f(x) = \sum a_ix^i$ and $g(x) = \sum b_ix^i$ in $F[x]$. Then
$$(f(x)+g(x))\varepsilon_a = \left(\sum (a_i+b_i)x^i\right)\varepsilon_a = \sum (a_i+b_i)a^i.$$
As $a_i$, $b_i$, $a^i$ belong to the field $F$, the commutative and distributive laws give
$$\sum (a_i+b_i)a^i = \sum a_ia^i + \sum b_ia^i.$$
As
$$\sum_{i\ge0} a_ia^i + \sum_{i\ge0} b_ia^i = f(a) + g(a) = (f(x))\varepsilon_a + (g(x))\varepsilon_a$$
we obtain
$$(f(x)+g(x))\varepsilon_a = (f(x))\varepsilon_a + (g(x))\varepsilon_a,$$
showing that $\varepsilon_a$ is additive. Similarly
$$(f(x)g(x))\varepsilon_a = \left(\sum (a_0b_i + a_1b_{i-1} + \dots + a_ib_0)x^i\right)\varepsilon_a = \sum (a_0b_i + a_1b_{i-1} + \dots + a_ib_0)a^i.$$
This is the result of collecting together the terms $(a_ja^j)(b_ka^k)$ with $j + k = i$ in the product
$$\left(\sum a_ja^j\right)\left(\sum b_ka^k\right) = f(a)g(a) = (f(x))\varepsilon_a(g(x))\varepsilon_a.$$
So $(f(x)g(x))\varepsilon_a = (f(x))\varepsilon_a(g(x))\varepsilon_a$, showing that $\varepsilon_a$ is multiplicative. For all $c \in F$ we have $(c)\varepsilon_a = c$. This shows that $\varepsilon_a : F[x] \to F$ is surjective and maps the 1-element of $F[x]$ to the 1-element of $F$. Therefore $\varepsilon_a$ is a surjective ring homomorphism.
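The two identities proved in Solution 2(a) can be spot-checked with exact rational arithmetic. The sketch below takes $F = \mathbb{Q}$ (our choice) and represents a polynomial by its coefficient list $[a_0, a_1, \dots]$. The product uses exactly the coefficient rule $a_0b_i + a_1b_{i-1} + \dots + a_ib_0$ from the solution. The evaluation $\varepsilon_a$ is computed by Horner's rule, which is a convenience and not part of the book's argument. The sample polynomials and points are arbitrary.

```python
from fractions import Fraction

# Polynomials over the field Q are non-empty coefficient lists [a0, a1, a2, ...].


def padd(f, g):
    """Coefficient-wise sum: coefficient i is a_i + b_i."""
    n = max(len(f), len(g))
    return [(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0) for i in range(n)]


def pmul(f, g):
    """Coefficient i of f*g is a_0 b_i + a_1 b_{i-1} + ... + a_i b_0."""
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for j, a in enumerate(f):
        for k, b in enumerate(g):
            h[j + k] += a * b
    return h


def evaluate(f, a):
    """The evaluation map eps_a : f(x) -> f(a), by Horner's rule."""
    result = Fraction(0)
    for coeff in reversed(f):
        result = result * a + coeff
    return result
```

For example, with $f(x) = 1 - 3x + 2x^3$ and $g(x) = 5 + x/2$, `evaluate(pmul(f, g), a)` agrees with `evaluate(f, a) * evaluate(g, a)` at every rational point tried.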


Solution 2(b): Suppose $h(x) \in \langle f(x), g(x)\rangle$. There are $a(x), b(x) \in F[x]$ with
$$h(x) = a(x)f(x) + b(x)g(x).$$
As $f(x) = q(x)g(x) + r(x)$, we obtain
$$h(x) = (a(x)q(x) + b(x))g(x) + a(x)r(x) \in \langle g(x), r(x)\rangle$$
showing $\langle f(x), g(x)\rangle \subseteq \langle g(x), r(x)\rangle$. As $r(x) = f(x) - q(x)g(x)$ we obtain $\langle g(x), r(x)\rangle \subseteq \langle f(x), g(x)\rangle$, and so $\langle f(x), g(x)\rangle = \langle g(x), r(x)\rangle$. Comparing monic generators of these ideals of $F[x]$ gives $\gcd\{f(x), g(x)\} = \gcd\{g(x), r(x)\}$ by Theorem 4.4.

Write $d(x) = \gcd\{d_1(x), d_2(x), \dots, d_t(x)\}$ and $d'(x) = \gcd\{d_1(x), \gcd\{d_2(x), \dots, d_t(x)\}\}$. Then $d(x) \mid d_1(x)$ and $d(x) \mid d_i(x)$ for $2 \le i \le t$. So $d(x) \mid d_1(x)$ and $d(x) \mid \gcd\{d_2(x), \dots, d_t(x)\}$, which combine to give $d(x) \mid d'(x)$. Conversely $d'(x) \mid d_1(x)$ and $d'(x) \mid \gcd\{d_2(x), \dots, d_t(x)\}$. These combine to give $d'(x) \mid d_1(x)$ and $d'(x) \mid d_i(x)$ for $2 \le i \le t$. So $d'(x) \mid d(x)$. As $d(x)$ and $d'(x)$ are both monic, we conclude $d(x) = d'(x)$.
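The identity $\gcd\{f(x), g(x)\} = \gcd\{g(x), r(x)\}$ from Solution 2(b) is exactly the step that drives the Euclidean algorithm for polynomials. The sketch below is a minimal version over $\mathbb{Q}$, with coefficient lists $[c_0, c_1, \dots]$ and $[\,]$ for the zero polynomial. It repeatedly replaces $(f, g)$ by $(g, r)$ and finally scales the last non-zero remainder to be monic, matching the book's convention for gcds. The names `pdivmod` and `pgcd` are our own.

```python
from fractions import Fraction


def trim(f):
    """Drop zero leading coefficients; [] is the zero polynomial."""
    while f and f[-1] == 0:
        f = f[:-1]
    return f


def pdivmod(f, g):
    """Divide f by non-zero g over Q: return (q, r) with f = q*g + r, deg r < deg g."""
    f, g = trim(list(f)), trim(list(g))
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = f
    while r and len(r) >= len(g):
        shift = len(r) - len(g)
        c = Fraction(r[-1]) / g[-1]
        q[shift] += c
        # subtract c * x^shift * g(x), which cancels the leading term of r
        r = trim([ri - c * (g[i - shift] if 0 <= i - shift < len(g) else 0)
                  for i, ri in enumerate(r)])
    return q, r


def pgcd(f, g):
    """Monic gcd, using gcd{f, g} = gcd{g, r} until the remainder vanishes."""
    f, g = trim(list(f)), trim(list(g))
    while g:
        _, r = pdivmod(f, g)
        f, g = g, r
    return [Fraction(c) / f[-1] for c in f] if f else []
```

For example, `pgcd([-1, 0, 1], [1, 2, 1])` returns the coefficients of $x + 1$, the monic gcd of $x^2 - 1$ and $(x+1)^2$.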

Solution 2(c): Consider $(a_i)$ and $(b_i)$ in $P(R)$. There are non-negative integers $m$ and $n$ with $a_i = 0$ for all $i > m$ and $b_i = 0$ for all $i > n$. For non-zero sequences $(a_i)$ and $(b_i)$, the least such integers are denoted $\deg(a_i)$ and $\deg(b_i)$, the degrees of $(a_i)$ and $(b_i)$ respectively. Then $a_i + b_i = 0$ for $i > \max\{m,n\}$, and
$$\sum_{j+k=i} a_jb_k = a_0b_i + a_1b_{i-1} + \dots + a_ib_0 = 0$$
for $i > m + n$, as each term in the sum is zero. So $(a_i) + (b_i), (a_i)(b_i) \in P(R)$. The elements of $R$ form an additive abelian group, and addition of sequences is carried out entry-wise. So the elements of $P(R)$ also form an additive abelian group. For example, $(a_i) + (b_i) = (a_i + b_i) = (b_i + a_i) = (b_i) + (a_i)$ shows addition of sequences to be commutative. The zero sequence $(0)$, having all entries zero, is the 0-element of $P(R)$. Consider $(a_i), (b_i), (c_i) \in P(R)$. As the elements of $R$ obey the ring laws we obtain
$$\sum_{l+k=t}\left(\sum_{i+j=l} a_ib_j\right)c_k = \sum_{i+j+k=t}(a_ib_j)c_k = \sum_{i+j+k=t} a_i(b_jc_k) = \sum_{i+s=t} a_i\left(\sum_{j+k=s} b_jc_k\right).$$
So the sequences $((a_i)(b_i))(c_i)$ and $(a_i)((b_i)(c_i))$ have the same entry $t$ for every $t \ge 0$, showing that they are equal. That is, multiplication in $P(R)$ is associative. Similarly
$$\sum_{j+k=i}(a_j + b_j)c_k = \sum_{j+k=i} a_jc_k + \sum_{j+k=i} b_jc_k$$
shows that entry $i$ in the sequence $((a_i) + (b_i))(c_i)$ is equal to entry $i$ in the sequence $(a_i)(c_i) + (b_i)(c_i)$ for all $i \ge 0$. So $((a_i) + (b_i))(c_i) = (a_i)(c_i) + (b_i)(c_i)$, showing that the right distributive law holds in $P(R)$. In the same way the left distributive law $(a_i)((b_i) + (c_i)) = (a_i)(b_i) + (a_i)(c_i)$ holds in $P(R)$. The sequence $e_0 = (1, 0, 0, \dots, 0, \dots)$ is the 1-element of $P(R)$, as $e_0(a_i) = (a_i) = (a_i)e_0$. So $P(R)$ is a ring.

Let $a_0, b_0 \in R$. Then $(a_0+b_0)\iota' = (a_0+b_0, 0, 0, \dots) = (a_0, 0, 0, \dots) + (b_0, 0, 0, \dots) = (a_0)\iota' + (b_0)\iota'$. Also $(a_0b_0)\iota' = (a_0b_0, 0, 0, \dots) = (a_0, 0, 0, \dots)(b_0, 0, 0, \dots) = (a_0)\iota'(b_0)\iota'$ and $(1)\iota' = e_0$. This shows that $\iota' : R \to P(R)$ is a ring homomorphism. Now $(a_0)\iota' = (b_0)\iota'$, i.e. $(a_0, 0, 0, \dots) = (b_0, 0, 0, \dots)$, implies $a_0 = b_0$, so $\iota'$ is injective. Hence $a_0 \mapsto (a_0)\iota'$ is a ring isomorphism between $R$ and $\operatorname{im}\iota' = R'$. This shows that $R'$ is a subring of $P(R)$ (Exercises 2.3, Question 3(b)) and $R' \cong R$.

Let $R$ be an integral domain, i.e. $R$ is commutative, non-trivial and has no zero-divisors. Consider $(a_i), (b_i) \in P(R)$. Then
$$\sum_{j+k=i} a_jb_k = \sum_{k+j=i} b_ka_j$$
showing $(a_i)(b_i) = (b_i)(a_i)$, i.e. the ring $P(R)$ is commutative. As $R'$ is non-trivial, being isomorphic to $R$, $P(R)$ is also non-trivial, as it contains the subring $R'$. Suppose $(a_i) \neq (0)$ and $(b_i) \neq (0)$. Let $m = \deg(a_i)$ and $n = \deg(b_i)$. Then $\sum_{j+k=i} a_jb_k = 0$ for $i > m + n$, as each term in the sum is zero. But $\sum_{j+k=m+n} a_jb_k = a_mb_n$, since $a_jb_k = 0$ unless $j = m$ and $k = n$. Also $a_mb_n \neq 0$ as $R$ has no zero-divisors. So $(a_i)(b_i) \neq (0)$, showing that $P(R)$ has no zero-divisors. In fact $\deg((a_i)(b_i)) = m + n = \deg(a_i) + \deg(b_i)$. So $P(R)$ is an integral domain. Conversely, if $P(R)$ is an integral domain then so is its subring $R'$, and hence $R$ is an integral domain as $R \cong R'$.

Using the multiplication rule in $P(R)$ we obtain $(a_0)\iota'x = (a_0, 0, 0, \dots)(0, 1, 0, 0, \dots) = (0, a_0, 0, 0, \dots) = (0, 1, 0, 0, \dots)(a_0, 0, 0, \dots) = x(a_0)\iota'$ for all $a_0 \in R$. By convention $x^0 = e_0$, the 1-element of $P(R)$, and $x^1 = x = e_1$. Suppose $x^{i-1} = e_{i-1}$ for some $i > 1$. Then $x^i = xx^{i-1} = e_1e_{i-1}$. Using the multiplication rule in $P(R)$ we see $e_1e_{i-1} = e_i$, showing $x^i = e_i$ and completing the induction. Hence $(0, 0, \dots, 0, a_i, 0, 0, \dots) = (a_i, 0, 0, \dots)e_i = (a_i)\iota'x^i$, and so $(a_0, a_1, \dots, a_i, \dots) = \sum_{i\ge0}(a_i)\iota'x^i$. This is a polynomial in the indeterminate $x$ over $R'$.

Solution 3(c): First note that $b = 0$ is a zero of $x^{q^n} - x$. For $b \neq 0$, $b$ belongs to the multiplicative group $E^*$ of non-zero elements of the field $E$. As $E$ is an $n$-dimensional vector space over $F$, we see that $|E| = |F|^n = q^n$ and so $|E^*| = q^n - 1$. By the $|G|$-lemma, in multiplicative notation, $b^{q^n-1} = 1$. So $b^{q^n} - b = 0$, showing that $b$ is a zero of $x^{q^n} - x$. The $q^n$ elements of $E$ are therefore the $q^n$ zeros of $x^{q^n} - x$. So for each $b \in E$, $x - b$ is a monic factor of $x^{q^n} - x$ which is irreducible over $E$. Hence $x^{q^n} - x = \prod_{b\in E}(x - b)$ is the factorisation of $x^{q^n} - x$ into monic irreducible polynomials over $E$. So $x^{q^n} - x$ splits over $E$ into distinct monic factors.

Write $d(x) = \gcd\{p(x), x^{q^n} - x\}$. As $p(x)$ and $x^{q^n} - x$ are polynomials over $F$, $d(x)$ is also a polynomial over $F$. As $p(x)$ is monic and irreducible over $F$ and $d(x) \mid p(x)$, either $d(x) = p(x)$ or $d(x) = 1$. Let $c = \langle p(x)\rangle + x \in E$. The discussion following Theorem 4.9 shows $p(c) = 0$, and $c^{q^n} - c = 0$ from above. Suppose $d(x) = 1$. By Corollary 4.6 there are $a_1(x), a_2(x) \in F[x]$ with $a_1(x)p(x) + a_2(x)(x^{q^n} - x) = 1$. Applying $\varepsilon_c$ gives $0 = a_1(c)0 + a_2(c)0 = a_1(c)p(c) + a_2(c)(c^{q^n} - c) = 1$. This contradiction shows $d(x) = p(x)$, and so $p(x) \mid (x^{q^n} - x)$. As $x^{q^n} - x$ splits over $E$ into a product of distinct factors, so also does its divisor $p(x)$. Substituting $p'(x)$ for $p(x)$ in the foregoing part of the question, we see that $p'(x) \mid (x^{q^n} - x)$. So there is $c' \in E$ with $p'(c') = 0$. By Theorem 4.4, $\varepsilon_{c'} : F[x] \to E$ has kernel $\langle p'(x)\rangle$. As $1, c', (c')^2, \dots, (c')^{n-1}$ is a basis of the $n$-dimensional vector space $E$ over $F$, $\varepsilon_{c'}$ is surjective. Hence $\bar\varepsilon_{c'} : F[x]/\langle p'(x)\rangle \cong E$, where $(\langle p'(x)\rangle + f(x))\bar\varepsilon_{c'} = f(c')$ for all $f(x) \in F[x]$.

Let $p(x)$ be irreducible over $F$ and have a zero $c$ in an extension field $E$ of $F$. As above (with $n = \deg p(x)$), $p(x) \mid (x^{q^n} - x)$. So $p(x)$ has no squared factor of positive degree. That is, it is impossible for an irreducible polynomial over a finite field to have a repeated zero in an extension field.
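Solution 3(c) shows that every monic irreducible $p(x)$ of degree $n$ over a field $F$ with $q$ elements divides $x^{q^n} - x$. It also shows that $x^{q^n} - x$ has no repeated factors. A standard sharpening, consistent with this, says that $x^{q^n} - x$ is exactly the product of the distinct monic irreducibles over $F$ whose degree divides $n$. The sketch below checks this over $F = \mathbb{Z}_2$ for $n \le 5$. Polynomials are encoded as Python integers whose bit $i$ is the coefficient of $x^i$. Irreducibility is tested by trial division. This encoding and the helper names are our own.

```python
# Polynomials over Z_2 encoded as Python ints: bit i is the coefficient of x^i.


def gf2_mul(a, b):
    """Product over Z_2 (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_mod(a, m):
    """Remainder of a on division by non-zero m over Z_2."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a


def is_irreducible(f):
    """Trial division by every polynomial of degree 1 .. deg(f) // 2."""
    deg = f.bit_length() - 1
    return deg >= 1 and all(gf2_mod(f, g) != 0
                            for g in range(2, 1 << (deg // 2 + 1)))


def product_of_irreducibles(n):
    """Product of all irreducibles over Z_2 (all are monic) of degree dividing n."""
    prod = 1
    for f in range(2, 1 << (n + 1)):
        deg = f.bit_length() - 1
        if n % deg == 0 and is_irreducible(f):
            prod = gf2_mul(prod, f)
    return prod
```

For $n = 3$ the irreducibles of degree 1 or 3 are $x$, $x + 1$, $x^3 + x + 1$ and $x^3 + x^2 + 1$. Their product is $x^8 + x = x^{2^3} - x$, which is encoded as `(1 << 8) | 0b10`.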

Solution 4(c): Consider $e = a + bc$ and $e' = a' + b'c \in F(c)$, where $a, b, a', b' \in F$. Then
$$
\begin{aligned}
(e+e')\theta &= ((a+a') + (b+b')c)\theta\\
&= a + a' - (b+b')a_1 - (b+b')c\\
&= (a - ba_1 - bc) + (a' - b'a_1 - b'c)\\
&= (a+bc)\theta + (a'+b'c)\theta = (e)\theta + (e')\theta
\end{aligned}
$$
showing that $\theta$ respects addition. Now $p(c) = 0$ gives $c^2 = -a_0 - a_1c$. Therefore
$$
\begin{aligned}
(ee')\theta &= ((a+bc)(a'+b'c))\theta\\
&= (aa' - bb'a_0 + (ab' + ba' - bb'a_1)c)\theta\\
&= aa' - bb'a_0 - (ab' + ba' - bb'a_1)a_1 - (ab' + ba' - bb'a_1)c.
\end{aligned}
$$
But
$$
\begin{aligned}
(e)\theta(e')\theta &= (a - ba_1 - bc)(a' - b'a_1 - b'c)\\
&= (a - ba_1)(a' - b'a_1) - bb'a_0 - ((a - ba_1)b' + b(a' - b'a_1) + bb'a_1)c.
\end{aligned}
$$
Expanding, both expressions have constant term $aa' - ab'a_1 - a'ba_1 + bb'a_1^2 - bb'a_0$ and coefficient of $c$ equal to $-(ab' + ba' - bb'a_1)$. So $(ee')\theta = (e)\theta(e')\theta$, showing that $\theta$ respects multiplication. As $(a)\theta = a$ for all $a \in F$, we see $(1)\theta = 1$. So $\theta$ respects the 1-element of $F$, which is also the 1-element of $F(c)$. Finally
$$(e)\theta^2 = ((a+bc)\theta)\theta = (a - ba_1 - bc)\theta = a - ba_1 - (-b)a_1 - (-b)c = a + bc = e$$
for all $e \in F(c)$, and so $\theta$ is self-inverse. So $\theta$ is an automorphism of $F(c)$.
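The computation in Solution 4(c) can be replayed with exact arithmetic for a concrete quadratic. The sketch takes $F = \mathbb{Q}$ and $p(x) = x^2 + x + 1$, so $a_0 = a_1 = 1$. This is a hypothetical choice of ours; $p(x)$ is irreducible over $\mathbb{Q}$ as it has no rational zero. An element $a + bc$ of $F(c)$ is stored as the pair $(a, b)$. Products are reduced with $c^2 = -a_0 - a_1c$, exactly as in the solution. The code then checks that $\theta : a + bc \mapsto a - ba_1 - bc$ respects sums and products and satisfies $\theta^2 = \text{identity}$ on a grid of sample elements.

```python
from fractions import Fraction

# Hypothetical example: F = Q and p(x) = x^2 + a1*x + a0 = x^2 + x + 1.
a0, a1 = Fraction(1), Fraction(1)

# The element a + b*c of F(c) is stored as the pair (a, b).


def add(e, f):
    return (e[0] + f[0], e[1] + f[1])


def mul(e, f):
    """(a + bc)(a' + b'c) reduced using c^2 = -a0 - a1*c."""
    a, b = e
    a2, b2 = f
    return (a * a2 - b * b2 * a0, a * b2 + b * a2 - b * b2 * a1)


def theta(e):
    """theta(a + bc) = a - b*a1 - b*c."""
    a, b = e
    return (a - b * a1, -b)


sample = [(Fraction(x), Fraction(y, 2)) for x in range(-2, 3) for y in range(-2, 3)]
```

Here $\theta$ sends $c$ to $-a_1 - c$, the other zero of $p(x)$, which is why it preserves the relation $c^2 = -a_0 - a_1c$.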


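Solution 8(d): To show that $\alpha$ is unambiguously defined (u.d.), consider $f_1(x), f_2(x) \in F[x]$ such that $\langle g(x)h(x)\rangle + f_1(x) = \langle g(x)h(x)\rangle + f_2(x)$. Then $f_1(x) - f_2(x) \in \langle g(x)h(x)\rangle$ and so $g(x)h(x) \mid (f_1(x) - f_2(x))$. Hence $g(x) \mid (f_1(x) - f_2(x))$ and $h(x) \mid (f_1(x) - f_2(x))$, i.e. $f_1(x) - f_2(x) \in \langle g(x)\rangle$ and $f_1(x) - f_2(x) \in \langle h(x)\rangle$. Therefore $\langle g(x)\rangle + f_1(x) = \langle g(x)\rangle + f_2(x)$ and $\langle h(x)\rangle + f_1(x) = \langle h(x)\rangle + f_2(x)$, showing that $\alpha$ is u.d., as we set out to prove. Consider elements $a, a' \in F[x]/\langle g(x)h(x)\rangle$. There are $f(x), f'(x) \in F[x]$ with $a = \langle g(x)h(x)\rangle + f(x)$ and $a' = \langle g(x)h(x)\rangle + f'(x)$. Then
$$
\begin{aligned}
(a+a')\alpha &= (\langle g(x)h(x)\rangle + f(x) + f'(x))\alpha\\
&= (\langle g(x)\rangle + f(x) + f'(x), \langle h(x)\rangle + f(x) + f'(x))\\
&= (\langle g(x)\rangle + f(x), \langle h(x)\rangle + f(x)) + (\langle g(x)\rangle + f'(x), \langle h(x)\rangle + f'(x))\\
&= (a)\alpha + (a')\alpha
\end{aligned}
$$
and
$$
\begin{aligned}
(aa')\alpha &= (\langle g(x)h(x)\rangle + f(x)f'(x))\alpha\\
&= (\langle g(x)\rangle + f(x)f'(x), \langle h(x)\rangle + f(x)f'(x))\\
&= (\langle g(x)\rangle + f(x), \langle h(x)\rangle + f(x))\cdot(\langle g(x)\rangle + f'(x), \langle h(x)\rangle + f'(x))\\
&= (a)\alpha(a')\alpha
\end{aligned}
$$
showing that $\alpha$ respects addition and multiplication. As $(\langle g(x)h(x)\rangle + 1)\alpha = (\langle g(x)\rangle + 1, \langle h(x)\rangle + 1)$, $\alpha$ respects 1-elements. So $\alpha$ is a ring homomorphism.

Suppose $\gcd\{g(x), h(x)\} = 1$. To show that $\alpha$ is injective, suppose $(a)\alpha = (a')\alpha$. Using the above notation we obtain
$$(\langle g(x)\rangle + f(x), \langle h(x)\rangle + f(x)) = (\langle g(x)\rangle + f'(x), \langle h(x)\rangle + f'(x))$$
which gives $\langle g(x)\rangle + f(x) = \langle g(x)\rangle + f'(x)$ and $\langle h(x)\rangle + f(x) = \langle h(x)\rangle + f'(x)$. Therefore $f(x) - f'(x)$ is divisible by $g(x)$ and by $h(x)$. As $\gcd\{g(x), h(x)\} = 1$, $f(x) - f'(x)$ is divisible by $g(x)h(x)$, giving $a = \langle g(x)h(x)\rangle + f(x) = \langle g(x)h(x)\rangle + f'(x) = a'$. So $\alpha$ is injective. Consider a typical element $(\langle g(x)\rangle + s(x), \langle h(x)\rangle + t(x))$ of $(F[x]/\langle g(x)\rangle) \oplus (F[x]/\langle h(x)\rangle)$. By Corollary 4.6 there are $a(x), b(x) \in F[x]$ with $a(x)g(x) + b(x)h(x) = 1$. Let $r(x) = t(x)a(x)g(x) + s(x)b(x)h(x)$. As $b(x)h(x) = 1 - a(x)g(x)$ and $a(x)g(x) = 1 - b(x)h(x)$, we have
$$r(x) \equiv s(x) \pmod{g(x)} \quad\text{and}\quad r(x) \equiv t(x) \pmod{h(x)}$$
and so $(\langle g(x)\rangle + s(x), \langle h(x)\rangle + t(x)) = (\langle g(x)h(x)\rangle + r(x))\alpha$, showing $\alpha$ surjective. The conclusion is:
$$\alpha : F[x]/\langle g(x)h(x)\rangle \cong (F[x]/\langle g(x)\rangle) \oplus (F[x]/\langle h(x)\rangle).$$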
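The surjectivity argument of Solution 8(d) is constructive, so it can be run. The sketch below works over $\mathbb{Q}$ with coefficient lists $[c_0, c_1, \dots]$. It finds $a(x), b(x)$ with $a(x)g(x) + b(x)h(x) = 1$ by the extended Euclidean algorithm; we use this as one way to realise Corollary 4.6. It then forms $r(x) = t(x)a(x)g(x) + s(x)b(x)h(x)$ as in the solution and reduces it modulo $g(x)h(x)$. The example $g(x) = x - 1$, $h(x) = x^2 + 1$, $s(x) = 3$, $t(x) = 2x$ is hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Polynomials over Q are coefficient lists [c0, c1, ...]; [] is the zero polynomial.


def trim(f):
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def padd(f, g):
    n = max(len(f), len(g))
    return trim([(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
                 for i in range(n)])


def pmul(f, g):
    if not f or not g:
        return []
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return trim(h)


def pdivmod(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g (g non-zero)."""
    f, g = trim(f), trim(g)
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = f
    while len(r) >= len(g):
        shift, c = len(r) - len(g), Fraction(r[-1]) / g[-1]
        q[shift] += c
        r = padd(r, pmul([0] * shift + [-c], g))   # r - c x^shift g
    return trim(q), r


def ext_gcd(g, h):
    """Return (d, a, b) with a*g + b*h = d, where d is a gcd of g and h."""
    r0, r1 = trim(g), trim(h)
    a0, a1, b0, b1 = [Fraction(1)], [], [], [Fraction(1)]
    while r1:
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        a0, a1 = a1, padd(a0, pmul([-1], pmul(q, a1)))
        b0, b1 = b1, padd(b0, pmul([-1], pmul(q, b1)))
    return r0, a0, b0


def crt(s, t, g, h):
    """For coprime g, h return r with r = s (mod g), r = t (mod h), deg r < deg gh."""
    d, a, b = ext_gcd(g, h)            # d is a non-zero constant as gcd{g, h} = 1
    a = [c / d[0] for c in a]          # rescale so that a*g + b*h = 1
    b = [c / d[0] for c in b]
    r = padd(pmul(t, pmul(a, g)), pmul(s, pmul(b, h)))   # t*a*g + s*b*h
    return pdivmod(r, pmul(g, h))[1]
```

In the example, `crt` returns the coefficients of $\tfrac12x^2 + 2x + \tfrac12$. This polynomial takes the value $3$ at $x = 1$ and leaves remainder $2x$ on division by $x^2 + 1$.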

EXERCISES 5.1

Solution 2(a): Write $xI - A = (b_{ij})$. The diagonal entries $b_{ii} = x - a_{ii}$ of $xI - A$ have degree 1 over $R$. For $i \neq j$, the entry $b_{ij} = -a_{ij}$ is a constant polynomial over $R$. So $(\operatorname{sign}\pi)b_{1(1)\pi}b_{2(2)\pi}\cdots b_{t(t)\pi}$ has degree $< t - 1$ over $R$ unless $(i)\pi = i$ for at least $t - 1$ integers $i \in \{1, 2, \dots, t\}$. In this case $(i)\pi = i$ for all $i \in \{1, 2, \dots, t\}$, as $\pi$ is a permutation of $\{1, 2, \dots, t\}$, i.e. $\pi = \iota$, the identity. Therefore the coefficient of $x^{t-1}$ in $\chi_A(x)$ is the coefficient of $x^{t-1}$ in $(\operatorname{sign}\iota)b_{11}b_{22}\cdots b_{tt} = (x - a_{11})(x - a_{22})\cdots(x - a_{tt})$. This coefficient is $-a_{11} - a_{22} - \dots - a_{tt} = -\operatorname{trace}A$. The constant term in $\chi_A(x)$ is $\chi_A(0) = (\chi_A(x))\varepsilon_0$, where $\varepsilon_0 : R[x] \to R$ is 'evaluation at 0'. As $\det(xI - A) \in R[x]$, we see $(\chi_A(x))\varepsilon_0 = (\det(xI - A))\varepsilon_0 = \det(0I - A) = \det(-A) = (-1)^t\det A$, since $-A$ is the result of changing the sign of each of the $t$ rows of $A$.
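The two coefficient formulas of Solution 2(a) can be checked on a sample matrix. The sketch below computes the coefficients of $\chi_A(x) = \det(xI - A)$ by the Faddeev–LeVerrier recursion. That method is our own swapped-in technique and does not appear in the text. The result is compared against $\det$ computed from the permutation-sum definition. The sample matrix $A$ is arbitrary.

```python
from fractions import Fraction
from itertools import permutations


def det(A):
    """Determinant by the permutation-sum definition."""
    t, total = len(A), 0
    for perm in permutations(range(t)):
        inversions = sum(1 for i in range(t) for j in range(i + 1, t) if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(t):
            term *= A[i][perm[i]]
        total += term
    return total


def charpoly(A):
    """Coefficients [c0, c1, ..., ct] of det(xI - A) by the Faddeev-LeVerrier recursion."""
    t = len(A)
    c = [Fraction(0)] * t + [Fraction(1)]
    M = [[Fraction(0)] * t for _ in range(t)]
    for k in range(1, t + 1):
        # M_k = A M_{k-1} + c_{t-k+1} I
        M = [[sum(A[i][l] * M[l][j] for l in range(t)) + (c[t - k + 1] if i == j else 0)
              for j in range(t)] for i in range(t)]
        AM = [[sum(A[i][l] * M[l][j] for l in range(t)) for j in range(t)] for i in range(t)]
        # c_{t-k} = -(1/k) trace(A M_k)
        c[t - k] = -Fraction(sum(AM[i][i] for i in range(t)), k)
    return c
```

For the sample $A$ in the check below, the coefficient of $x^2$ comes out as $-\operatorname{trace}A$ and the constant term as $(-1)^3\det A$, as the solution predicts.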

Solution 2(b): Write $t = t_1 + t_2$ and consider a permutation $\pi$ of $\{1, \dots, t_1, t_1+1, \dots, t_1+t_2\}$. Suppose first there is $i \in \{t_1+1, \dots, t_1+t_2\}$ with $(i)\pi \notin \{t_1+1, \dots, t_1+t_2\}$. The $(i, (i)\pi)$-entry in $A$ is zero, and so $(\operatorname{sign}\pi)a_{1(1)\pi}a_{2(2)\pi}\cdots a_{t(t)\pi} = 0$. Now suppose $\pi$ satisfies $i \in \{t_1+1, \dots, t_1+t_2\} \Rightarrow (i)\pi \in \{t_1+1, \dots, t_1+t_2\}$. As $\pi$ is a permutation, it follows that $i \in \{1, \dots, t_1\} \Rightarrow (i)\pi \in \{1, \dots, t_1\}$. Let $\pi_1$ be the restriction of $\pi$ to $\{1, \dots, t_1\}$; then $\pi_1$ is a permutation of $\{1, \dots, t_1\}$. Let $\pi_2$ be the restriction of $\pi$ to $\{t_1+1, \dots, t_1+t_2\}$; then $\pi_2$ is a permutation of $\{t_1+1, \dots, t_1+t_2\}$. Also $\operatorname{sign}\pi = (\operatorname{sign}\pi_1)(\operatorname{sign}\pi_2)$. Conversely each pair of permutations $\pi_1, \pi_2$ as above arises from a unique permutation $\pi$. So $\det A = \sum_\pi(\operatorname{sign}\pi)a_{1(1)\pi}a_{2(2)\pi}\cdots a_{t(t)\pi}$, where the summation is restricted to $\pi$ with $i \in \{t_1+1, \dots, t_1+t_2\} \Rightarrow (i)\pi \in \{t_1+1, \dots, t_1+t_2\}$. All other terms are zero. Hence
$$
\begin{aligned}
\det A &= \sum_{\pi_1,\pi_2}(\operatorname{sign}\pi_1)(\operatorname{sign}\pi_2)\,a_{1(1)\pi_1}\cdots a_{t_1(t_1)\pi_1}\,a_{t_1+1\,(t_1+1)\pi_2}\cdots a_{t(t)\pi_2}\\
&= \left(\sum_{\pi_1}(\operatorname{sign}\pi_1)a_{1(1)\pi_1}\cdots a_{t_1(t_1)\pi_1}\right)\cdot\left(\sum_{\pi_2}(\operatorname{sign}\pi_2)a_{t_1+1\,(t_1+1)\pi_2}\cdots a_{t(t)\pi_2}\right)\\
&= (\det A_1)(\det A_2).
\end{aligned}
$$
Taking $B = 0$ we obtain $A = A_1 \oplus A_2$. So $\det(A_1 \oplus A_2) = (\det A_1)(\det A_2)$.


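Solution 2(c): Let
$$X = \begin{pmatrix} 0 & I_{t_2}\\ I_{t_1} & 0 \end{pmatrix},$$
that is, $X$ is a partitioned $(t_2 + t_1) \times (t_1 + t_2)$ matrix. Its blocks are the $t_1 \times t_1$ identity matrix $I_{t_1}$ and the $t_2 \times t_2$ identity matrix $I_{t_2}$, as indicated, together with rectangular zero matrices. Then $\det X = (-1)^{t_1t_2}$, as the operation of $t_1t_2$ interchanges of consecutive rows changes $X$ into $I_t$, where $t = t_1 + t_2$. So $X$ is invertible over $F$. Also
$$X(A_1 \oplus A_2) = \begin{pmatrix} 0 & I_{t_2}\\ I_{t_1} & 0 \end{pmatrix}\begin{pmatrix} A_1 & 0\\ 0 & A_2 \end{pmatrix} = \begin{pmatrix} 0 & A_2\\ A_1 & 0 \end{pmatrix} = \begin{pmatrix} A_2 & 0\\ 0 & A_1 \end{pmatrix}\begin{pmatrix} 0 & I_{t_2}\\ I_{t_1} & 0 \end{pmatrix} = (A_2 \oplus A_1)X.$$
Therefore $X(A_1 \oplus A_2)X^{-1} = A_2 \oplus A_1$, showing $A_1 \oplus A_2 \sim A_2 \oplus A_1$.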
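Solutions 2(b) and 2(c) can be spot-checked with integer matrices. The sketch below builds $A_1 \oplus A_2$, the block matrix $X$ of Solution 2(c), and a block upper-triangular matrix with a non-zero block $B$. It then checks three things: $X(A_1 \oplus A_2) = (A_2 \oplus A_1)X$, $\det X = (-1)^{t_1t_2}$, and the determinant is the product of the diagonal blocks' determinants. The particular matrices used in the check are arbitrary choices of ours.

```python
from itertools import permutations


def det(A):
    """Determinant by the permutation-sum definition used in Solution 2(b)."""
    t, total = len(A), 0
    for perm in permutations(range(t)):
        inversions = sum(1 for i in range(t) for j in range(i + 1, t) if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(t):
            term *= A[i][perm[i]]
        total += term
    return total


def direct_sum(A1, A2):
    """A1 (+) A2 as a block-diagonal matrix."""
    t1, t2 = len(A1), len(A2)
    return [row + [0] * t2 for row in A1] + [[0] * t1 + row for row in A2]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def swap_matrix(t1, t2):
    """X = [[0, I_t2], [I_t1, 0]]: rows split t2 + t1, columns split t1 + t2."""
    n = t1 + t2
    X = [[0] * n for _ in range(n)]
    for i in range(t2):
        X[i][t1 + i] = 1
    for i in range(t1):
        X[t2 + i][i] = 1
    return X
```

Filling in some entries of the upper-right block $B$ of `direct_sum(A1, A2)` leaves the determinant unchanged, in line with Solution 2(b).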

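Solution 2(d): Suppose $f(x) = a$, i.e. $f(x)$ is a constant polynomial over $R$. Then $f(A_1 \oplus A_2) = aI = (aI) \oplus (aI) = f(A_1) \oplus f(A_2)$, where each $I$ is an identity matrix of the appropriate size. Let $\deg f(x) = t > 0$. Suppose the equation $g(A_1 \oplus A_2) = g(A_1) \oplus g(A_2)$ holds for all $g(x) \in R[x]$ with $\deg g(x) < t$. Then $f(x) = ax^t + g(x)$ with $\deg g(x) < t$, and so
$$
\begin{aligned}
f(A_1 \oplus A_2) &= a(A_1 \oplus A_2)^t + g(A_1 \oplus A_2)\\
&= a(A_1^t \oplus A_2^t) + g(A_1) \oplus g(A_2)
\end{aligned}
$$
as $\oplus$ respects matrix multiplication. As $\oplus$ respects matrix addition we obtain
$$
\begin{aligned}
f(A_1 \oplus A_2) &= (aA_1^t) \oplus (aA_2^t) + g(A_1) \oplus g(A_2)\\
&= (aA_1^t + g(A_1)) \oplus (aA_2^t + g(A_2)) = f(A_1) \oplus f(A_2)
\end{aligned}
$$
which completes the induction and the proof in the case $s = 2$. Suppose $s > 2$, and inductively that $f(A_1 \oplus A_2 \oplus \dots \oplus A_{s-1}) = f(A_1) \oplus f(A_2) \oplus \dots \oplus f(A_{s-1})$. We use the case $s = 2$ with $A_1$ and $A_2$ replaced by $A_1 \oplus A_2 \oplus \dots \oplus A_{s-1}$ and $A_s$ respectively. The proof is then finished by
$$
\begin{aligned}
f(A_1 \oplus A_2 \oplus \dots \oplus A_{s-1} \oplus A_s) &= f(A_1 \oplus A_2 \oplus \dots \oplus A_{s-1}) \oplus f(A_s)\\
&= f(A_1) \oplus f(A_2) \oplus \dots \oplus f(A_{s-1}) \oplus f(A_s).
\end{aligned}
$$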

Solution 2(e): Modify the solution of Exercises 4.1, Question 2(a).

Solution 3(d): Let $f_1(x), f_2(x) \in K$. Then $(v)f_1(\alpha) = 0$ and $(v)f_2(\alpha) = 0$. Adding gives $(v)(f_1(\alpha) + f_2(\alpha)) = (v)f_1(\alpha) + (v)f_2(\alpha) = 0 + 0 = 0$, which shows $f_1(x) + f_2(x) \in K$. Also $-f_1(x) \in K$, as $(v)(-f_1(\alpha)) = -(v)f_1(\alpha) = -0 = 0$. As $(v)0(\alpha) = 0$, the zero polynomial $0(x)$ belongs to $K$. Therefore $K$ is a subgroup of the additive group of $F[x]$. For $f(x) \in F[x]$ we have $f(x)f_1(x) \in K$, as $(v)f(\alpha)f_1(\alpha) = (v)f_1(\alpha)f(\alpha) = (0)f(\alpha) = 0$. So $K$ is an ideal of $F[x]$ by Definition 4.3. Suppose $V$ is $t$-dimensional. Then $v, (v)\alpha, (v)\alpha^2, \dots, (v)\alpha^t$ are $t + 1$ vectors of $V$. These vectors are linearly dependent, and so there are $a_0, a_1, \dots, a_t \in F$, not all zero, with $a_0v + a_1(v)\alpha + \dots + a_t(v)\alpha^t = 0$. That is, $(v)f_0(\alpha) = 0$ where $f_0(x) = a_0 + a_1x + \dots + a_tx^t \neq 0(x)$, and so $f_0(x) \in K$. So $K$ is a non-zero ideal of $F[x]$.

Solution 3(e): Suppose $f(x) \equiv f'(x) \pmod{m(x)}$. Then $f(x) - f'(x) = q(x)m(x)$ for some $q(x) \in F[x]$. So $f(A) - f'(A) = q(A)m(A) = q(A) \times 0 = 0$, i.e. $f(A) = f'(A)$. Therefore $f(x)v = vf(A) = vf'(A) = f'(x)v$, showing that the product $f(x)v$ is unambiguously defined. As $M' = F^t$ as sets and addition in $M'$ is the usual vector addition, module laws 1, 2, 3 and 4 (stated before Definition 2.19) are obeyed by $M'$. For $f(x), f_1(x), f_2(x) \in F[x]$ and $v, v_1, v_2 \in M'$ we have $f(x)(v_1 + v_2) = (v_1 + v_2)f(A) = v_1f(A) + v_2f(A) = f(x)v_1 + f(x)v_2$ and
$$(f_1(x) + f_2(x))v = v(f_1(A) + f_2(A)) = vf_1(A) + vf_2(A) = f_1(x)v + f_2(x)v,$$
showing that $M'$ obeys module law 5. Also
$$(f_1(x)f_2(x))v = vf_1(A)f_2(A) = vf_2(A)f_1(A) = (f_2(x)v)f_1(A) = f_1(x)(f_2(x)v),$$
showing that $M'$ obeys module law 6. The 1-element of $F[x]/\langle m(x)\rangle$ is $1(x)$, and $1(x)v = v1(A) = vI = v$ for all $v \in M'$, showing that $M'$ obeys module law 7. So $M'$ is an $F[x]/\langle m(x)\rangle$-module.

Suppose $m(x)$ is irreducible over $F$ and let $r = \deg m(x)$. Then $F' = F[x]/\langle m(x)\rangle$ is a field, and so $M'$ is a vector space over $F'$. Let $v_1, v_2, \ldots, v_{t'}$ be a basis of $M'$ over $F'$. Consider $v \in M'$. There are $f_j(x) \in F[x]$ for $1 \le j \le t'$ with $v = \sum_{j=1}^{t'} f_j(x)v_j$, as $v_1, v_2, \ldots, v_{t'}$ span $M'$ over $F'$. Further we may assume $\deg f_j(x) < r$. Write $f_j(x) = \sum_{i=0}^{r-1} a_{ji}x^i$. Substituting for each $f_j(x)$ gives $v = \sum_{i,j} a_{ji}x^iv_j$, where the summation is over $0 \le i < r$, $1 \le j \le t'$. The $rt'$ vectors $x^iv_j$ span $M'$ over $F$ as $a_{ji} \in F$. Suppose $\sum_{i,j} a_{ji}x^iv_j = 0$. As $v_1, v_2, \ldots, v_{t'}$ are linearly independent over $F'$, we deduce that each $f_j(x)$ represents the zero element of $F'$, i.e. $m(x) \mid f_j(x)$. As $\deg f_j(x) < r$ this gives $f_j(x) = 0(x)$, i.e. $a_{ji} = 0$ for all $i, j$. So the $rt'$ vectors $x^iv_j$ are linearly independent and hence form an $F$-basis of $F^t$. Hence $t = rt'$, i.e. $t/t' = r = \deg m(x)$.

Solution 5: Consider $(u)\beta = (a_1, a_2, \ldots, a_s) \in F^s$ and $(u')\beta = (a'_1, a'_2, \ldots, a'_s) \in F^s$ for $u, u' \in N$, which gives
$$u = a_1u_1 + a_2u_2 + \cdots + a_su_s \quad\text{and}\quad u' = a'_1u_1 + a'_2u_2 + \cdots + a'_su_s.$$
So $u + u' = (a_1 + a'_1)u_1 + (a_2 + a'_2)u_2 + \cdots + (a_s + a'_s)u_s$, which shows
$$(u + u')\beta = (a_1 + a'_1, a_2 + a'_2, \ldots, a_s + a'_s) = (u)\beta + (u')\beta,$$
i.e. $\beta$ is additive. For $a \in F$ we have $au = aa_1u_1 + aa_2u_2 + \cdots + aa_su_s$, showing $(au)\beta = a((u)\beta)$, and so $\beta$ is $F$-linear. Also $\beta$ is bijective as $u_1, u_2, \ldots, u_s$ is an $F$-basis of $N$. So $\beta : N \cong F^s$ is a vector space isomorphism with $(u_i)\beta = e_i$ for $1 \le i \le s$. Write $B = (b_{ij})$ for $1 \le i, j \le s$. Then
$$(xu_i)\beta = ((u_i)\alpha)\beta = (b_{i1}u_1 + b_{i2}u_2 + \cdots + b_{is}u_s)\beta = (b_{i1}, b_{i2}, \ldots, b_{is}) = e_iB = ((u_i)\beta)B = x((u_i)\beta)$$
for $1 \le i \le s$. Using the $F$-linearity of $\beta$ we see $(xu)\beta = x((u)\beta)$ for all $u \in N$. By Lemma 5.15 with $\theta = \beta$ we conclude that $\beta : N \cong M(B)$ is an isomorphism of $F[x]$-modules.

EXERCISES 5.2

Solution 1(d): Suppose $C(f(x)) \sim D$ where $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_t)$. Comparing characteristic polynomials using Lemma 5.5 and Theorem 5.26 gives $f(x) = (x - \lambda_1)(x - \lambda_2)\cdots(x - \lambda_t)$. Suppose $\lambda_i = \lambda_j$ for some $i, j$ with $1 \le i < j \le t$. Then $N = \langle e_i, e_j\rangle$ is a non-cyclic submodule of $M(D)$: for all $v = a_ie_i + a_je_j \in N$ we have
$$xv = (a_ie_i + a_je_j)D = a_i\lambda_ie_i + a_j\lambda_je_j = \lambda_i(a_ie_i + a_je_j) = \lambda_iv,$$
showing that $v$ does not generate the 2-dimensional subspace $N$. But $M(D)$ is cyclic, being isomorphic to the cyclic $F[x]$-module $M(C(f(x)))$ by Theorem 5.13. So $N$ is cyclic by Theorem 5.28. This contradiction shows $\lambda_i \neq \lambda_j$. Therefore $f(x)$ has $t$ distinct zeros in $F$. Conversely, suppose $f(x)$ has $t$ distinct zeros $\lambda_1, \lambda_2, \ldots, \lambda_t$ in $F$. So $\lambda_1, \lambda_2, \ldots, \lambda_t$ are $t$ distinct eigenvalues of $C(f(x))$ by Theorem 5.26. Denote by $v_i$ a row eigenvector of $C(f(x))$ corresponding to $\lambda_i$ for $1 \le i \le t$. Let $X$ be the $t \times t$ matrix with $e_iX = v_i$ for $1 \le i \le t$. Then $X$ is invertible over $F$ ($\lambda_1, \ldots, \lambda_t$ distinct implies $v_1, \ldots, v_t$ linearly independent) and $XC(f(x))X^{-1} = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_t)$.
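One concrete choice of the row eigenvectors $v_i$ is available, and it is a standard observation rather than something stated in the text. Take $v_i = q_i(x)e_1$, where $q_i(x) = f(x)/(x - \lambda_i)$. Then $(x - \lambda_i)v_i = f(x)e_1 = 0$, so $v_iC(f(x)) = \lambda_iv_i$. The sketch below builds $X$ this way by synthetic division and checks $XC(f(x)) = DX$ for a hand-picked $f(x)$ with integer zeros. It assumes the row convention $e_iC(f(x)) = e_{i+1}$ for $i < t$ with last row $(-a_0, \ldots, -a_{t-1})$. The helper names are hypothetical.

```python
from itertools import permutations


def companion(coeffs):
    """C(f) with rows e_2, ..., e_t followed by (-a0, ..., -a_{t-1});
    coeffs = [a0, a1, ..., a_{t-1}, 1] (monic, low-to-high)."""
    t = len(coeffs) - 1
    rows = [[1 if j == i + 1 else 0 for j in range(t)] for i in range(t - 1)]
    rows.append([-c for c in coeffs[:-1]])
    return rows


def poly_from_roots(roots):
    """Coefficients (low-to-high) of (x - r1)(x - r2)...(x - rt)."""
    coeffs = [1]
    for r in roots:
        shifted = [0] + coeffs                 # x * current
        scaled = [r * c for c in coeffs] + [0]  # r * current
        coeffs = [s - u for s, u in zip(shifted, scaled)]
    return coeffs


def row_eigenvector(coeffs, lam):
    """Coefficients of q(x) = f(x)/(x - lam) by synthetic division (low-to-high)."""
    high = coeffs[::-1]
    quotient = [high[0]]
    for c in high[1:-1]:
        quotient.append(c + lam * quotient[-1])
    assert high[-1] + lam * quotient[-1] == 0, "lam must be a zero of f"
    return quotient[::-1]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def det(m):
    """Leibniz formula; adequate for the small matrices used here."""
    n, total = len(m), 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total


roots = [1, 2, 3]                                 # distinct zeros (arbitrary example)
f = poly_from_roots(roots)                        # x^3 - 6x^2 + 11x - 6
C = companion(f)
X = [row_eigenvector(f, lam) for lam in roots]    # e_i X = v_i
D = [[lam if i == j else 0 for j in range(len(roots))] for i, lam in enumerate(roots)]
assert mat_mul(X, C) == mat_mul(D, X)             # equivalent to X C X^{-1} = D
assert det(X) != 0
```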

Solution 3(a): Write $d_0(x) = \gcd\{f(x), d(x)\}$. Then $(d(x)/d_0(x))f(x)v = (f(x)/d_0(x))d(x)v = (f(x)/d_0(x))0 = 0$, which shows that the monic polynomial $d(x)/d_0(x)$ belongs to the order ideal $K$ of $f(x)v$ in $M$. So $K$ is non-zero and so has a unique monic generator $g(x)$ by Theorem 4.4. By Definition 5.11, $g(x)$ is the order of $f(x)v$ in $M$ and $g(x) \mid d(x)/d_0(x)$. But $g(x)f(x)v = 0$ shows that $g(x)f(x)$ belongs to the order ideal $\langle d(x)\rangle$ of $v$ in $M$. So $g(x)f(x) = h(x)d(x)$ where $h(x) \in F[x]$. Hence $g(x)(f(x)/d_0(x)) = h(x)(d(x)/d_0(x))$. As $\gcd\{f(x)/d_0(x), d(x)/d_0(x)\} = 1$ we conclude $d(x)/d_0(x) \mid g(x)$. Therefore
$$g(x) = d(x)/d_0(x),$$
i.e. $f(x)v$ has order $d(x)/\gcd\{f(x), d(x)\}$ in $M$.


Solution 4(a): Consider $f_1(x), f_2(x) \in K_N$. Then $f_1(x)v_0 \in N$ and $f_2(x)v_0 \in N$. As $N$ is closed under addition we see $(f_1(x) + f_2(x))v_0 = f_1(x)v_0 + f_2(x)v_0 \in N$, i.e. $f_1(x) + f_2(x) \in K_N$, showing $K_N$ to be closed under addition. As $0(x)v_0 = 0 \in N$ we see $0(x) \in K_N$, i.e. $K_N$ contains the zero polynomial. As $N$ is closed under negation we have $(-f_1(x))v_0 = -f_1(x)v_0 \in N$, showing that $-f_1(x) \in K_N$, i.e. $K_N$ is closed under negation. As $N$ is closed under multiplication by polynomials we obtain $(g(x)f_1(x))v_0 = g(x)(f_1(x)v_0) \in N$ for all $g(x) \in F[x]$, i.e. $g(x)f_1(x) \in K_N$, showing $K_N$ to be closed under polynomial multiplication. So $K_N$ is an ideal (Definition 4.3) of $F[x]$.

As $d_0(x)v_0 = 0 \in N$ we see $d_0(x) \in K_N$, and so $\langle d_0(x)\rangle \subseteq K_N$. Suppose $K_N = \langle d_0(x)\rangle$ and $v \in \langle v_0\rangle \cap N$. Then $v = f(x)v_0 \in N$ for some $f(x) \in F[x]$. Therefore $f(x) \in K_N$, and so $d_0(x) \mid f(x)$. Hence $v = f(x)v_0 = 0$, i.e. $\langle v_0\rangle \cap N = 0$. Conversely, suppose $\langle v_0\rangle \cap N = 0$ and consider $f(x) \in K_N$. So $f(x)v_0 \in N$. As $f(x)v_0 \in \langle v_0\rangle$ we see $f(x)v_0 = 0$, i.e. $f(x) \in \langle d_0(x)\rangle$, and so $K_N = \langle d_0(x)\rangle$.

Suppose $N_1 \subseteq N_2$ and consider $f(x) \in K_{N_1}$. Then $f(x)v_0 \in N_1$ and so $f(x)v_0 \in N_2$. Therefore $f(x) \in K_{N_2}$, showing $K_{N_1} \subseteq K_{N_2}$. Suppose $M = \langle v_0\rangle$ and $K_{N_1} \subseteq K_{N_2}$. Let $v \in N_1$. There is $f(x) \in F[x]$ with $v = f(x)v_0$. Therefore $f(x) \in K_{N_1}$, and so $f(x) \in K_{N_2}$ also. This means $f(x)v_0 \in N_2$, i.e. $v \in N_2$. So $N_1 \subseteq N_2$.

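Solution 6(d): The formula $\det T = (-1)^{t(t-1)/2}$ holds for $t = 1$. Take $t \ge 2$ and suppose inductively that the $t \times t$ matrix
$$T = \begin{pmatrix} 0 & 1 \\ T' & 0 \end{pmatrix}$$
is such that $\det T' = (-1)^{(t-1)(t-2)/2}$. Expanding $\det T$ along the first row, whose only non-zero entry is the $1$ in column $t$, the inductive step is completed by
$$\det T = (-1)^{t-1}\det T' = (-1)^{t-1}(-1)^{(t-1)(t-2)/2} = (-1)^{t-1+(t-1)(t-2)/2} = (-1)^{t(t-1)/2}.$$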
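The partition $T = \begin{pmatrix} 0 & 1 \\ T' & 0 \end{pmatrix}$ describes, recursively, the $t \times t$ matrix with $1$s on the antidiagonal and $0$s elsewhere. The sketch below assumes that reading. It checks both the closed formula and the recursion $\det T = (-1)^{t-1}\det T'$ for small $t$, using the Leibniz formula. The helper names are illustrative only.

```python
from itertools import permutations


def det(m):
    """Leibniz formula (sum over permutations); fine for t <= 6."""
    n, total = len(m), 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total


def antidiagonal(t):
    """t x t matrix with 1s on the antidiagonal."""
    return [[1 if i + j == t - 1 else 0 for j in range(t)] for i in range(t)]


for t in range(1, 7):
    T = antidiagonal(t)
    T_prime = [row[:-1] for row in T[1:]]   # delete the first row and the last column
    assert det(T) == (-1) ** (t * (t - 1) // 2)
    if t >= 2:
        assert det(T) == (-1) ** (t - 1) * det(T_prime)
```

Deleting the first row and the last column of the antidiagonal matrix leaves the antidiagonal matrix of size $t - 1$. This is why the recursion closes.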

EXERCISES 6.1

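Solution 4(b):
$$R_fC(f(x)) = \begin{pmatrix} a_1 & a_2 & a_3 & 1 \\ a_2 & a_3 & 1 & 0 \\ a_3 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -a_0 & -a_1 & -a_2 & -a_3 \end{pmatrix} = \begin{pmatrix} -a_0 & 0 & 0 & 0 \\ 0 & a_2 & a_3 & 1 \\ 0 & a_3 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$
which is symmetric. As $R_f$ is invertible and symmetric we obtain
$$R_fC(f(x)) = (R_fC(f(x)))^T = C(f(x))^TR_f^T = C(f(x))^TR_f$$
and so $R_fC(f(x))R_f^{-1} = C(f(x))^T$.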
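The symmetry of $R_fC(f(x))$ can also be checked numerically in any degree. The sketch below assumes the definition of $R_f$ given in the general argument that follows: the $(i,j)$-entry is $a_{i+j-1}$ for $i + j - 1 \le t$ and $0$ otherwise. It also assumes the companion matrix layout displayed above. The coefficients are arbitrary test data and the helper names are hypothetical.

```python
def hankel_R(coeffs):
    """R_f: (i, j)-entry a_{i+j-1} if i + j - 1 <= t, else 0 (1-indexed);
    coeffs = [a0, ..., at] with at = 1. With 0-indexed i, j the index is i + j + 1."""
    t = len(coeffs) - 1
    return [[coeffs[i + j + 1] if i + j + 1 <= t else 0 for j in range(t)] for i in range(t)]


def companion(coeffs):
    """C(f): rows e_2, ..., e_t, then (-a0, ..., -a_{t-1})."""
    t = len(coeffs) - 1
    rows = [[1 if j == i + 1 else 0 for j in range(t)] for i in range(t - 1)]
    rows.append([-c for c in coeffs[:-1]])
    return rows


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


f = [5, -2, 7, 3, 1]                       # a0, ..., a4 of an arbitrary monic quartic
R, C = hankel_R(f), companion(f)
RC = mat_mul(R, C)
assert RC == transpose(RC)                  # R_f C(f) is symmetric
assert RC == mat_mul(transpose(C), R)       # hence R_f C(f) = C(f)^T R_f
assert RC[0] == [-f[0], 0, 0, 0]            # first row (-a0, 0, ..., 0)
assert [row[1:] for row in RC[1:]] == hankel_R(f[1:])   # lower-right block is R_g
```

The last assertion reflects the block form $\begin{pmatrix} -a_0 & 0 \\ 0^T & R_g \end{pmatrix}$ with $g(x) = (f(x) - a_0)/x$.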


Let $f(x) = a_0 + a_1x + a_2x^2 + \cdots + a_{t-1}x^{t-1} + a_tx^t$ where $a_t = 1$. Let $R_f$ denote the $t \times t$ matrix over $F$ with $(i,j)$-entry $a_{i+j-1}$ for $1 \le i + j - 1 \le t$ and $(i,j)$-entry $0$ for $i + j - 1 > t$. Write $g(x) = (f(x) - a_0)/x = a_1 + a_2x + \cdots + a_{t-1}x^{t-2} + a_tx^{t-1}$. Then
$$R_fC(f(x)) = \begin{pmatrix} a & 1 \\ R_g & 0^T \end{pmatrix}\begin{pmatrix} 0^T & I \\ -a_0 & -a \end{pmatrix}$$
where $a = (a_1, a_2, \ldots, a_{t-1})$, $0$ is the $1 \times (t-1)$ zero matrix and $I$ is the $(t-1) \times (t-1)$ identity matrix. Therefore
$$R_fC(f(x)) = \begin{pmatrix} -a_0 & 0 \\ 0^T & R_g \end{pmatrix}$$
on multiplying out the indicated partitioned matrices. So $R_fC(f(x))$ is symmetric, as $R_g$ is symmetric. As $\det R_f = (-1)^{t(t-1)/2}$ (use induction on $t$ here), we see that $R_f$ is invertible, and so $R_fC(f(x))R_f^{-1} = C(f(x))^T$ as before.

Solution 6(a): Suppose $\mu_A(x) = \mu_B(x) = x - c$ for some $c \in F$. Then $\mu_A(A) = 0$ gives $A - cI = 0$, i.e. $A = cI$. In the same way $\mu_B(B) = 0$ gives $B = cI$, and so $A \sim B$ as $A = B$ in this case. Suppose $\mu_A(x) = \mu_B(x) \neq x - c$ for every $c \in F$. Then $\deg\mu_A(x) = 2$, as $\mu_A(x) \mid \chi_A(x)$ and $\deg\chi_A(x) = 2$, and so $\mu_A(x) = \chi_A(x)$. By Corollary 6.10 there is $v_0$ having order $\mu_A(x)$ in $M(A)$. So $M(A) = \langle v_0\rangle$ and $A \sim C(\mu_A(x))$ by Corollary 5.27. For the same reason $B \sim C(\mu_B(x))$, and so $A \sim B$ as $C(\mu_A(x)) = C(\mu_B(x))$.

Solution 8(b): Consider the $t$-tuples $v(x) = (f_1(x), f_2(x), \ldots, f_t(x)) \in F[x]^t$ and $v'(x) = (f'_1(x), f'_2(x), \ldots, f'_t(x)) \in F[x]^t$, and let $f(x) \in F[x]$. Then $\theta_A$ is additive as
$$(v(x) + v'(x))\theta_A = \sum_{i=1}^t (f_i(x) + f'_i(x))e_i = \sum_{i=1}^t f_i(x)e_i + \sum_{i=1}^t f'_i(x)e_i = (v(x))\theta_A + (v'(x))\theta_A$$
and
$$(f(x)v(x))\theta_A = \sum_{i=1}^t (f(x)f_i(x))e_i = f(x)\left(\sum_{i=1}^t f_i(x)e_i\right) = f(x)((v(x))\theta_A),$$
using the module laws which hold in $M(A)$ by Lemma 5.7 and Definition 5.8. So $\theta_A$ is $F[x]$-linear.


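Solution 8(c): Take $d(x) = d_1(x)$ and consider the isomorphism $\alpha| : M_{(d_1(x))} \cong M'_{(d_1(x))}$. Then $(F[x]/\langle d_i(x)\rangle)_{(d_1(x))} \cong F[x]/\langle d_1(x)\rangle$ since $\gcd\{d_1(x), d_i(x)\} = d_1(x)$ for $1 \le i \le s$. From $M \cong F[x]/\langle d_1(x)\rangle \oplus F[x]/\langle d_2(x)\rangle \oplus \cdots \oplus F[x]/\langle d_s(x)\rangle$ we deduce
$$\begin{aligned} M_{(d_1(x))} &\cong (F[x]/\langle d_1(x)\rangle)_{(d_1(x))} \oplus (F[x]/\langle d_2(x)\rangle)_{(d_1(x))} \oplus \cdots \oplus (F[x]/\langle d_s(x)\rangle)_{(d_1(x))} \\ &\cong F[x]/\langle d_1(x)\rangle \oplus F[x]/\langle d_1(x)\rangle \oplus \cdots \oplus F[x]/\langle d_1(x)\rangle = (F[x]/\langle d_1(x)\rangle)^s. \end{aligned}$$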

The $F[x]$-module $M_{(d_1(x))}$ is therefore isomorphic to the free $F[x]/\langle d_1(x)\rangle$-module $(F[x]/\langle d_1(x)\rangle)^s$ of rank $s$. By Lemma 2.25 both $M_{(d_1(x))}$ and $M'_{(d_1(x))}$ are free $F[x]/\langle d_1(x)\rangle$-modules of rank $s$. Combining
$$(F[x]/\langle d'_i(x)\rangle)_{(d_1(x))} \cong F[x]/\langle\gcd\{d_1(x), d'_i(x)\}\rangle \quad\text{for } 1 \le i \le s'$$
and
$$M' \cong F[x]/\langle d'_1(x)\rangle \oplus F[x]/\langle d'_2(x)\rangle \oplus \cdots \oplus F[x]/\langle d'_{s'}(x)\rangle$$
gives $M'_{(d_1(x))} \cong \bigoplus_{i=1}^{s'} F[x]/\langle\gcd\{d_1(x), d'_i(x)\}\rangle$. This shows that $M'_{(d_1(x))}$ is the direct sum of $s'$ cyclic submodules and so is generated by $s'$ of its elements. From Theorem 2.20 we deduce $s' \ge s$. As $\alpha^{-1} : M' \cong M$, the preceding theory 'works' with $M$ and $M'$ interchanged. Using $\alpha^{-1}| : M'_{(d'_1(x))} \cong M_{(d'_1(x))}$, the $F[x]/\langle d'_1(x)\rangle$-module $M'_{(d'_1(x))}$ is isomorphic to the free $F[x]/\langle d'_1(x)\rangle$-module $(F[x]/\langle d'_1(x)\rangle)^{s'}$ of rank $s'$. Hence $M_{(d'_1(x))} \cong \bigoplus_{i=1}^{s} F[x]/\langle\gcd\{d'_1(x), d_i(x)\}\rangle$ is a free $F[x]/\langle d'_1(x)\rangle$-module of rank $s'$ and is generated by $s$ of its elements. Therefore $s \ge s'$ by Theorem 2.20, and so $s = s'$.

From Lemma 2.18 and Corollary 2.21, the $s$ generators of $M'_{(d_1(x))}$ form an $F[x]/\langle d_1(x)\rangle$-basis of the $F[x]/\langle d_1(x)\rangle$-module $M'_{(d_1(x))}$ (Exercises 2.3, Question 7(b)). Therefore each of these $s$ generators has order $0$ in the $F[x]/\langle d_1(x)\rangle$-module $M'_{(d_1(x))}$ and order $d_1(x)$ in the $F[x]$-module $M'_{(d_1(x))}$. From the first of these generators we deduce $\gcd\{d_1(x), d'_1(x)\} = d_1(x)$, showing $d_1(x) \mid d'_1(x)$. Interchanging the roles of $M$ and $M'$ gives $d'_1(x) \mid d_1(x)$, and so $d_1(x) = d'_1(x)$.

Let $m_1$ denote the number of $i$ with $d_i(x) = d_1(x)$, and let $m'_1$ denote the number of $i$ with $d'_i(x) = d_1(x)$. As
$$d_1(x)(F[x]/\langle d_i(x)\rangle) \cong F[x]/\langle d_i(x)/\gcd\{d_1(x), d_i(x)\}\rangle = F[x]/\langle d_i(x)/d_1(x)\rangle$$
we obtain $d_1(x)M \cong \bigoplus_{m_1 < i \le s} F[x]/\langle d_i(x)/d_1(x)\rangle$. So $d_1(x)M$ is the direct sum of $s - m_1$ non-trivial cyclic submodules. As $d_i(x)/d_1(x) \mid d_j(x)/d_1(x)$ for $m_1 < i \le j \le s$, this decomposition of $d_1(x)M$ is again as in Theorem 6.6. In the same way $d_1(x)M' \cong \bigoplus_{m'_1 < i \le s} F[x]/\langle d'_i(x)/d_1(x)\rangle$, which is a decomposition of $d_1(x)M'$ into $s' - m'_1$ non-trivial cyclic submodules as in Theorem 6.6.

As $\alpha| : d_1(x)M \cong d_1(x)M'$, the proof can be completed by induction on the number, $r$ say, of different polynomials among $d_1(x), d_2(x), \ldots, d_s(x)$. Take $r = 1$. Then $m_1 = s$ and $d_1(x)M$ is trivial. So $d_1(x)M'$ is also trivial and $m'_1 = s$. Therefore $d_i(x) = d_1(x) = d'_i(x)$ for $1 \le i \le s$. Now take $r > 1$. There are $r - 1$ different polynomials among $d_{m_1+1}(x)/d_1(x), d_{m_1+2}(x)/d_1(x), \ldots, d_s(x)/d_1(x)$. So the conclusion of Theorem 6.6 holds on replacing $\alpha : M \cong M'$ by $\alpha| : d_1(x)M \cong d_1(x)M'$, i.e. $s - m_1 = s - m'_1$ (showing $m_1 = m'_1$) and also $d_i(x)/d_1(x) = d'_i(x)/d_1(x)$ for $m_1 < i \le s$. Therefore $s = s'$, $d_i(x) = d_1(x) = d'_i(x)$ for $1 \le i \le m_1 = m'_1$, and $d_i(x) = d'_i(x)$ for $m_1 < i \le s'$ on multiplying by $d_1(x)$. The induction is now complete.

EXERCISES 6.2

Solution 3(a): Write $N = M(A)_{p(x)}$. Consider $u, v \in N$. Then $p(x)^nu = 0$ and $p(x)^nv = 0$. Using the distributive law in $M(A)$ we have
$$p(x)^n(u + v) = p(x)^nu + p(x)^nv = 0 + 0 = 0,$$
showing $u + v \in N$. Also $-v \in N$ as
$$p(x)^n(-v) = -p(x)^nv = -0 = 0.$$
The zero row vector $0$ in $F^t$ belongs to $N$ as $p(x)^n0 = 0$, and so $N$ is a subgroup of the additive group $(F^t, +)$. For $f(x) \in F[x]$,
$$p(x)^n(f(x)v) = (p(x)^nf(x))v = (f(x)p(x)^n)v = f(x)(p(x)^nv) = f(x)0 = 0,$$
showing that $f(x)v \in N$. By Definition 2.26 we conclude that $N$ is a submodule of $M(A)$.

Suppose $N = \{0\}$ and look for a contradiction. By the Cayley–Hamilton theorem, $p(x)^n(q(x)v) = (p(x)^nq(x))v = \chi_A(x)v = v\chi_A(A) = v0 = 0$ for all $v \in M(A)$. So $q(x)v \in N$, i.e. $q(x)v = 0$, for all $v \in M(A)$. Therefore $e_iq(A) = q(x)e_i = 0$ for $1 \le i \le t$, showing that all the rows of the $t \times t$ matrix $q(A)$ are zero. So $q(A) = 0$, which means $\mu_A(x) \mid q(x)$ by Corollary 6.10. But $p(x) \mid \mu_A(x)$, as $\chi_A(x)$ and $\mu_A(x)$ have equal irreducible factors by Corollary 6.11. So $p(x) \mid q(x)$, contrary to $\gcd\{p(x), q(x)\} = 1$. So $N \neq \{0\}$, i.e. the primary components of $M(A)$ are non-trivial.

Solution 3(b): Suppose $M = M(C(p(x)^n))$ has submodules $N_1$ and $N_2$ such that $M = N_1 \oplus N_2$. Then $M$ is cyclic with generator $e_1$, the first element of the standard basis of $F^t$ where $t = \deg p(x)^n$, by Theorem 5.26. The monic divisors of $p(x)^n$ are the $n + 1$ polynomials $p(x)^m$ for $0 \le m \le n$. By Theorem 5.28 the submodules $N_1$ and $N_2$ are also cyclic with generators $p(x)^{m_1}e_1$ and $p(x)^{m_2}e_1$ respectively, and we may assume $0 \le m_1 \le m_2 \le n$. So $N_2 \subseteq N_1$, as the generator $p(x)^{m_2}e_1$ of $N_2$ is a polynomial multiple $p(x)^{m_2-m_1}$ of the generator $p(x)^{m_1}e_1$ of $N_1$. As $M = N_1 \oplus N_2$, we know $N_1$ and $N_2$ are independent, i.e. $(N_1,+)$ and $(N_2,+)$ are independent subgroups (Definition 2.14) of $(M,+)$. So $N_1 \cap N_2 = \{0\}$. But $N_1 \cap N_2 = N_2$ as $N_2 \subseteq N_1$, and so $N_2 = \{0\}$. As $M$ is non-zero, we see that $M$ is indecomposable.

Let $N$ be an indecomposable submodule of $M(A)$. As $N \neq \{0\}$, by Exercises 5.1, Question 5 there is an $s \times s$ matrix $B$ over $F$ with $N \cong M(B)$. So $M(B)$ is indecomposable. Should $\chi_B(x)$ be divisible by two or more irreducible polynomials over $F$, the primary decomposition Theorem 6.12 of $M(B)$ would contradict its indecomposability by (a) above. So $\chi_B(x) = p(x)^n$, where $p(x)$ is a monic irreducible polynomial over $F$ and $n$ is a positive integer. Also $M(B)$ has only one invariant factor, since otherwise the invariant factor decomposition Theorem 6.5 of $M(B)$ would contradict its indecomposability. So $M(B)$ is cyclic. By Corollary 5.27 we conclude $N \cong M(C(p(x)^n))$.

Solution 3(c): As $\dim N_j \ge 1$ for $1 \le j \le r$, on comparing dimensions of subspaces of $F^t$,
$$t = \dim F^t = \dim M(A) = \dim(N_1 \oplus N_2 \oplus \cdots \oplus N_r) = \dim N_1 + \dim N_2 + \cdots + \dim N_r \ge r,$$
showing $r \le t$. Consider a decomposition $M(A) = N_1 \oplus N_2 \oplus \cdots \oplus N_r$ into a direct sum of $r$ non-zero submodules $N_j$ with $r$ as large as possible (as $r$ is bounded above by $t$, there is such an $r$). Suppose $N_1$ is decomposable, and so $N_1 = N'_1 \oplus N''_1$ where $N'_1$ and $N''_1$ are non-zero. Then $M(A) = N'_1 \oplus N''_1 \oplus N_2 \oplus \cdots \oplus N_r$ is a decomposition with $r + 1$ summands (terms), contrary to the choice of $r$. So $N_1$ is indecomposable, and in the same way each $N_j$ is indecomposable for $1 \le j \le r$. Therefore $N_j \cong M(C(p_j(x)^{n_j}))$, where $p_j(x)$ is monic and irreducible over $F$ and $n_j$ is a positive integer, by (b) above. From Corollary 5.20 we conclude $A \sim C(p_1(x)^{n_1}) \oplus C(p_2(x)^{n_2}) \oplus \cdots \oplus C(p_r(x)^{n_r})$, showing that the polynomials $p_j(x)^{n_j}$ for $1 \le j \le r$ are the elementary divisors of $A$. So $r$ is the number of elementary divisors of $A$.

Solution 3(d): Write $m_j(x) = \chi_A(x)/p_j(x)^{n_j} = \prod_{i=1, i\neq j}^k p_i(x)^{n_i}$, which is a monic polynomial of degree $t - n_j\deg p_j(x)$ over $F$. Then
$$\operatorname{lcm}\{m_1(x), m_2(x), \ldots, m_k(x)\} = \chi_A(x)$$
and (more to the point) $\gcd\{m_1(x), m_2(x), \ldots, m_k(x)\} = 1$, as $p_j(x)$ is not a divisor of $m_j(x)$ for $1 \le j \le k$, i.e. the polynomials $m_1(x), m_2(x), \ldots, m_k(x)$ have no common irreducible divisor. By Corollary 4.6 there are $a_j(x) \in F[x]$ for $1 \le j \le k$ such that $a_1(x)m_1(x) + a_2(x)m_2(x) + \cdots + a_k(x)m_k(x) = 1$. We are now ready to prove Theorem 6.12.


Consider $v \in M(A)$. Then
$$v = 1v = \left(\sum_{j=1}^k a_j(x)m_j(x)\right)v = \sum_{j=1}^k a_j(x)m_j(x)v = \sum_{j=1}^k v_j$$
where $v_j = a_j(x)m_j(x)v$ for $1 \le j \le k$. Now $v_j \in M(A)_{p_j(x)}$ as
$$p_j(x)^{n_j}v_j = p_j(x)^{n_j}a_j(x)m_j(x)v = a_j(x)\chi_A(x)v = a_j(x)(v\chi_A(A)) = a_j(x)0 = 0$$
for $1 \le j \le k$. Therefore
$$M(A) = M(A)_{p_1(x)} + M(A)_{p_2(x)} + \cdots + M(A)_{p_k(x)}.$$
Suppose $v_1 + v_2 + \cdots + v_k = 0$ where $v_j \in M(A)_{p_j(x)}$ for $1 \le j \le k$. We concentrate on one particular term $v_j$. For $i \neq j$, $1 \le i \le k$, we have $m_j(x)v_i = 0$, as $p_i(x)^{n_i} \mid m_j(x)$ and $p_i(x)^{n_i}v_i = 0$. Inserting the $k - 1$ zero terms $m_j(x)v_i = 0$ produces
$$m_j(x)v_j = m_j(x)v_j + \sum_{i=1, i\neq j}^k m_j(x)v_i = \sum_{i=1}^k m_j(x)v_i = m_j(x)\left(\sum_{i=1}^k v_i\right) = m_j(x)0 = 0.$$
The polynomial $1 - a_j(x)m_j(x) = \sum_{i=1, i\neq j}^k a_i(x)m_i(x)$ is divisible by $p_j(x)^{n_j}$, since $p_j(x)^{n_j} \mid m_i(x)$ for $i \neq j$, $1 \le i \le k$. So $a'_j(x) = (1 - a_j(x)m_j(x))/p_j(x)^{n_j}$ is a polynomial over $F$. Then
$$\begin{aligned} v_j = 1v_j &= (a_j(x)m_j(x) + (1 - a_j(x)m_j(x)))v_j \\ &= a_j(x)m_j(x)v_j + (1 - a_j(x)m_j(x))v_j \\ &= a_j(x)m_j(x)v_j + a'_j(x)p_j(x)^{n_j}v_j \\ &= a_j(x)0 + a'_j(x)0 = 0 \quad\text{for } 1 \le j \le k. \end{aligned}$$
So $v_1 + v_2 + \cdots + v_k = 0$ implies $v_1 = v_2 = \cdots = v_k = 0$. Therefore the primary components of $M(A)$ are independent (Definition 2.14). By Lemma 2.15
$$M(A) = M(A)_{p_1(x)} \oplus M(A)_{p_2(x)} \oplus \cdots \oplus M(A)_{p_k(x)}.$$
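The proof above is constructive. The matrices $P_j = a_j(A)m_j(A)$ send $v$ to its primary component $v_j = vP_j$. The sketch below illustrates this on a hypothetical $3 \times 3$ integer matrix with $\chi_A(x) = (x - 1)^2(x - 2)$. Here $m_1(x) = x - 2$, $m_2(x) = (x - 1)^2$, and the Bézout identity $(-x)(x - 2) + 1\cdot(x - 1)^2 = 1$ was found by hand rather than computed. Vectors are rows and act on the right, as in $M(A)$.

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def poly_at(coeffs, a):
    """f(A) by Horner's rule, coeffs low-to-high."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, a)
        for i in range(n):
            result[i][i] += c
    return result


# Hypothetical example: chi_A(x) = (x - 1)^2 (x - 2), so k = 2 with
# p1 = x - 1, n1 = 2, m1 = x - 2 and p2 = x - 2, n2 = 1, m2 = (x - 1)^2.
A = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 2]]
a1m1 = [0, 2, -1]    # a1(x) m1(x) = -x(x - 2) = 2x - x^2
a2m2 = [1, -2, 1]    # a2(x) m2(x) = (x - 1)^2
P1, P2 = poly_at(a1m1, A), poly_at(a2m2, A)

# Decompose a sample vector: v = v1 + v2 with v_j = v P_j.
v = [3, -1, 4]
v1, v2 = mat_mul([v], P1)[0], mat_mul([v], P2)[0]
identity = poly_at([1], A)
assert [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(P1, P2)] == identity
assert [x + y for x, y in zip(v1, v2)] == v
assert mat_mul([v1], poly_at([1, -2, 1], A))[0] == [0, 0, 0]   # (x - 1)^2 v1 = 0
assert mat_mul([v2], poly_at([-2, 1], A))[0] == [0, 0, 0]      # (x - 2) v2 = 0
```

The $P_j$ are commuting idempotents with $P_1P_2 = 0$. This is the matrix form of the independence argument in the proof.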

Solution 7(c): Let $g(x) = b_nx^n + b_{n-1}x^{n-1} + \cdots + b_1x + b_0$ be a polynomial over $F$ and let $c \in F$. For $i \le j \le n$ the coefficients of $x^{j-i}$ in $H_i(f(x) + g(x))$ and in $H_i(f(x)) + H_i(g(x))$ are equal as
$$(a_j + b_j)\binom{j}{i} = a_j\binom{j}{i} + b_j\binom{j}{i}.$$
Therefore $H_i(f(x) + g(x)) = H_i(f(x)) + H_i(g(x))$. Also $cH_i(f(x)) = H_i(cf(x))$, as the coefficient of $x^{j-i}$ in both these polynomials is $ca_j\binom{j}{i}$ for $i \le j \le n$. So $f(x) \mapsto H_i(f(x))$ is a linear mapping of the (infinite-dimensional) vector space $F[x]$.

With $f(x) = x^j$ we have $f''(x) = 0$ for $0 \le j < 2$ and $f''(x) = j(j-1)x^{j-2}$ for $j \ge 2$. As $j(j-1)$ is even, we see $f''(x) = 0$ for all $j \ge 0$ in the case $F = \mathbb{Z}_2$. Applying part (a)(i) above twice in succession, we conclude that $f''(x) = 0$ for all $f(x) \in \mathbb{Z}_2[x]$. Now $H_2(x^j) = 0$ for $0 \le j < 2$ and $H_2(x^j) = \binom{j}{2}x^{j-2}$ for $j \ge 2$. Also $\binom{j}{2} = j(j-1)/2$ is even if and only if either $j \equiv 0 \pmod 4$ or $j \equiv 1 \pmod 4$. So $f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 \in \mathbb{Z}_2[x]$ satisfies $H_2(f(x)) = 0$ if and only if $a_j = 0$ for $j \equiv 2 \pmod 4$ and $a_j = 0$ for $j \equiv 3 \pmod 4$. Further, $H_2(x^j) = x^{j-2}$ for $j \equiv 2, 3 \pmod 4$, and these images are exactly $1, x, x^4, x^5, x^8, x^9, \ldots$. So
$$\ker H_2 = \langle 1, x, x^4, x^5, x^8, x^9, \ldots\rangle, \quad \operatorname{im} H_2 = \langle 1, x, x^4, x^5, x^8, x^9, \ldots\rangle \quad\text{and}\quad \ker H_2 = \operatorname{im} H_2.$$
Hence $H_2^2 = 0$, i.e. $H_2(H_2(f(x)))$ is the zero polynomial for all $f(x) \in \mathbb{Z}_2[x]$.

As $i!\binom{j}{i} = j!/(j-i)!$ we see $i!H_i(x^j) = j(j-1)(j-2)\cdots(j-i+1)x^{j-i}$, which is the result of formally differentiating the polynomial $x^j$ $i$ times, for $j \ge i$. As both $i!H_i$ and formally differentiating $i$ times are linear mappings of $F[x]$ with $x^j$ in their kernels for $0 \le j < i$, we conclude $i!H_i(f(x)) = f^{(i)}(x)$ for all $f(x) \in F[x]$. As
$$(j - i)\binom{j}{i} = (i + 1)\binom{j}{i+1}$$
we deduce
$$H'_i(x^j) = (j - i)\binom{j}{i}x^{j-i-1} = (i + 1)\binom{j}{i+1}x^{j-i-1} = (i + 1)H_{i+1}(x^j)$$
for all $j$. As above we conclude $H'_i(f(x)) = (i + 1)H_{i+1}(f(x))$ for all $f(x) \in F[x]$.

Suppose $\chi(F)$ is not a divisor of $i!$, i.e. $i! \neq 0$ in $F$. Dividing the equation $i!H_i(f(x)) = f^{(i)}(x)$ through by $i!$ gives $H_i(f(x)) = f^{(i)}(x)/i!$. Suppose $\chi(F)$ is not a divisor of $i + 1$. Then $i + 1 \neq 0$ in $F$. Dividing the equation $H'_i(f(x)) = (i + 1)H_{i+1}(f(x))$ through by $i + 1$ gives $H_{i+1}(f(x)) = H'_i(f(x))/(i + 1)$.
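The statements about $H_i$ can be checked by brute force on coefficient lists. The sketch below takes $H_i(f(x)) = \sum_{j \ge i} a_j\binom{j}{i}x^{j-i}$, as in the coefficient comparison above, and stores polynomials as low-to-high coefficient lists, optionally reduced mod $p$. It checks three things. Over $\mathbb{Z}_2$ it confirms the kernel pattern $j \equiv 0, 1 \pmod 4$ and that $H_2^2 = 0$. Over the integers it confirms $i!H_i(f) = f^{(i)}$ and $H'_i(f) = (i + 1)H_{i+1}(f)$. Function names are illustrative only.

```python
from itertools import product
from math import comb, factorial


def hasse(i, coeffs, p=None):
    """H_i(f) = sum_{j >= i} a_j C(j, i) x^(j-i); coeffs low-to-high, reduced mod p if given."""
    out = [comb(j, i) * coeffs[j] for j in range(i, len(coeffs))]
    return [c % p for c in out] if p is not None else out


def derivative(coeffs):
    """Formal derivative of a coefficient list."""
    return [j * coeffs[j] for j in range(1, len(coeffs))]


# Over Z_2: x^j lies in ker H_2 exactly when j ≡ 0 or 1 (mod 4).
kernel = [j for j in range(12) if not any(hasse(2, [0] * j + [1], 2))]
assert kernel == [0, 1, 4, 5, 8, 9]

# Over Z_2: H_2(H_2(f)) = 0 for every f of degree < 8.
assert all(not any(hasse(2, hasse(2, list(f), 2), 2)) for f in product(range(2), repeat=8))

# Over the integers: i! H_i(f) = f^(i) and (H_i f)' = (i + 1) H_{i+1}(f).
f = [3, 1, 4, 1, 5, 9, 2, 6]
for i in range(5):
    d = f
    for _ in range(i):
        d = derivative(d)
    assert [factorial(i) * c for c in hasse(i, f)] == d
    assert derivative(hasse(i, f)) == [(i + 1) * c for c in hasse(i + 1, f)]
```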


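Solution 9(a): The composition $\alpha\beta$ of the additive mappings $\alpha$ and $\beta$ is itself additive by Exercises 2.1, Question 4(d). As $\alpha$ and $\beta$ are semi-linear, there are $\theta, \varphi \in \operatorname{Aut} R$ with $(av)\alpha = (a)\theta(v)\alpha$ and $(bw)\beta = (b)\varphi(w)\beta$ for all $a, b \in R$ and $v, w \in M$. Setting $b = (a)\theta$, $w = (v)\alpha$ gives
$$(av)\alpha\beta = ((av)\alpha)\beta = ((a)\theta(v)\alpha)\beta = (bw)\beta = (b)\varphi(w)\beta = (a)\theta\varphi(v)\alpha\beta$$
for all $a \in R$, $v \in M$. So $\alpha\beta$ is semi-linear, as $\theta\varphi \in \operatorname{Aut} R$ by Exercises 2.3, Question 3(d).

Let $\alpha$ be bijective. Then $\alpha^{-1}$ is additive by Exercises 2.1, Question 4(d). For $a \in R$, $v \in M$ we see $av \in M$, and so there is $w \in M$ with $(w)\alpha = av$. By Exercises 2.3, Question 3(d) we know $\theta^{-1} \in \operatorname{Aut} R$. Also $((a)\theta^{-1}(v)\alpha^{-1})\alpha = ((a)\theta^{-1}\theta)((v)\alpha^{-1}\alpha) = av = (w)\alpha$. Therefore $(av)\alpha^{-1} = w = (a)\theta^{-1}(v)\alpha^{-1}$, showing that $\alpha^{-1}$ is semi-linear.

Denote two elements of $R^t$ by
$$v = (a_1, a_2, \ldots, a_t) \quad\text{and}\quad w = (b_1, b_2, \ldots, b_t).$$
As $\theta$ is additive, we have $(a_i + b_i)\theta = (a_i)\theta + (b_i)\theta$ for $1 \le i \le t$, showing that the $i$th entries in $(v + w)\theta$ and $(v)\theta + (w)\theta$ are equal. So $(v + w)\theta = (v)\theta + (w)\theta$ for all $v, w \in R^t$, showing $\theta$ to be additive. For $a \in R$ we have $(aa_i)\theta = (a)\theta(a_i)\theta$, showing that the $i$th entries in $(av)\theta$ and $(a)\theta(v)\theta$ are equal for $1 \le i \le t$. Therefore $(av)\theta = (a)\theta(v)\theta$ for all $a \in R$, $v \in R^t$, showing that $\theta$ is semi-linear.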

Solution 9(b): Apply the eros $r_i - r_{l+i}$ for $1 \le i \le l$ over $E$ to $Y$, followed by $(c - c')^{-1}r_i$ for $1 \le i \le l$ and finally $r_{l+i} - c'r_i$ for $1 \le i \le l$. This produces the matrix $Z'_S$ with rows $e_iZ'_S = v_{1i}$, $e_{l+i}Z'_S = v_{0i}$ for $1 \le i \le l$. Comparing determinants gives $\det Y = (c - c')^l\det Z'_S$. For $i = 1, 2, \ldots, l$ in turn, we apply the $l - i + 1$ eros $r_j \leftrightarrow r_{j-1}$ to $Z'_S$ for $j = l + i, l + i - 1, \ldots, 2i$, which produces $Z_S$. Therefore $\det Z'_S = (-1)^{l(l+1)/2}\det Z_S$, as each of these
$$l + (l - 1) + (l - 2) + \cdots + 2 + 1 = l(l + 1)/2$$
eros gives a sign change in the determinant. So $\det Y = (-1)^{l(l+1)/2}(c - c')^l\det Z_S$.
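The count of row interchanges can be confirmed by simulating them on row labels only. The sketch below applies $r_j \leftrightarrow r_{j-1}$ for $j = l + i, l + i - 1, \ldots, 2i$ (1-indexed) for $i = 1, \ldots, l$, starting from the row order of $Z'_S$. It checks two things: the number of swaps is $l(l+1)/2$, and the swaps produce the interleaved order $v_{01}, v_{11}, v_{02}, v_{12}, \ldots$. That order is simply what the stated swaps do. The book's definition of $Z_S$ is not restated here, and the labels are hypothetical.

```python
def interleave_by_adjacent_swaps(l):
    """Track row labels of Z'_S under the swaps r_j <-> r_{j-1}, j = l+i, ..., 2i."""
    rows = [("v1", i) for i in range(1, l + 1)] + [("v0", i) for i in range(1, l + 1)]
    swaps = 0
    for i in range(1, l + 1):
        for j in range(l + i, 2 * i - 1, -1):      # j = l+i, l+i-1, ..., 2i (1-indexed)
            rows[j - 1], rows[j - 2] = rows[j - 2], rows[j - 1]
            swaps += 1
    return rows, swaps


for l in range(1, 8):
    rows, swaps = interleave_by_adjacent_swaps(l)
    assert swaps == l * (l + 1) // 2                 # one sign change per swap
    assert rows == [r for i in range(1, l + 1) for r in (("v0", i), ("v1", i))]
```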

EXERCISES 6.3

Solution 1(e): As $\mu_A(x)$ is irreducible over $F$, we see $\chi_A(x) = \mu_A(x)^s$ by Corollary 6.13. Comparing degrees gives $s = t/m$, where $m = \deg\mu_A(x)$. A typical element of the field $F(c)$, where $\mu_A(c) = 0$, is $f(c)$ where $f(x) \in F[x]$. Also $f(c) = g(c) \Leftrightarrow f(x) \equiv g(x) \pmod{\mu_A(x)}$ where $f(x), g(x) \in F[x]$ (see the discussion after Theorem 4.9). So the product of $f(c)$ and the element $v$ of the $F[x]$-module $M(A)$ is unambiguously defined by $f(c)v = f(x)v = vf(A)$. The seven module laws in $M(A)$ immediately give rise to the seven laws of a vector space over $F(c)$, i.e. $F^t$ has the structure of a vector space over $F(c)$, which we denote by $M(A)'$. Then $\dim M(A)' = s'$ where $s' \le t$, as $F(c)$ is an extension field of $F$. Let $v_1, v_2, \ldots, v_{s'}$ be a basis of $M(A)'$. The $ms'$ vectors $x^{i-1}v_j$ for $1 \le i \le m$, $1 \le j \le s'$ form a basis of $F^t$, and so $\dim M(A)' = s' = s = t/m$.

Let $N$ be a submodule of $M(A)$. Then $f(x)v \in N$ for all $f(x) \in F[x]$, $v \in N$. So $f(c)v \in N$ for all $f(c) \in F(c)$, $v \in N$. So $N$ is a subspace of $M(A)'$. Finally, $N$ a subspace of $M(A)'$ implies $N$ a submodule of $M(A)$. Let $\beta \in \operatorname{End} M(A)$. Then $(f(c)v)\beta = (f(x)v)\beta = f(x)((v)\beta) = f(c)((v)\beta)$, showing that $\beta$ is a linear mapping of $M(A)'$. Conversely, each linear mapping of $M(A)'$ belongs to $\operatorname{End} M(A)$. Let $\mathcal{B}$ denote a basis of $M(A)'$. For each $\beta \in \operatorname{End} M(A)$ write $(\beta)\theta$ for the matrix of the linear mapping $\beta$ of $M(A)'$ relative to $\mathcal{B}$. Then $(\beta)\theta \in M_s(F(c))$. Mimicking the proof of Theorem 3.15 with $\mathbb{Z}$, $t$ replaced by $F(c)$, $s$, we see that $\theta : \operatorname{End} M(A) \cong M_s(F(c))$ is a ring isomorphism. Restricting $\theta$ to $\operatorname{Aut} M(A)$, the group of invertible elements of $\operatorname{End} M(A)$, produces the group isomorphism $\theta| : \operatorname{Aut} M(A) \cong GL_s(F(c))$.

Solution 2(b): Consider $\beta, \beta' \in \operatorname{End} M(A)$ with matrices $B = (b_{ij})$, $B' = (b'_{ij})$ respectively relative to the standard basis $\mathcal{B}_0$ of $F^t$. So $(e_i)\beta = \sum_{j=1}^t b_{ij}e_j$ and $(e_i)\beta' = \sum_{j=1}^t b'_{ij}e_j$ for $1 \le i \le t$. Adding these equations gives
$$(e_i)(\beta + \beta') = (e_i)\beta + (e_i)\beta' = \sum_{j=1}^t b_{ij}e_j + \sum_{j=1}^t b'_{ij}e_j = \sum_{j=1}^t (b_{ij} + b'_{ij})e_j \quad\text{for } 1 \le i \le t,$$
which shows that $\beta + \beta'$ has matrix $B + B' = (b_{ij} + b'_{ij})$ relative to $\mathcal{B}_0$, i.e. $(\beta + \beta')\theta = B + B' = (\beta)\theta + (\beta')\theta$. So $\theta$ respects addition. Now $(e_j)\beta' = \sum_{k=1}^t b'_{jk}e_k$ for $1 \le j \le t$. So applying $\beta'$ to $(e_i)\beta = \sum_{j=1}^t b_{ij}e_j$ gives
$$(e_i)(\beta\beta') = ((e_i)\beta)\beta' = \left(\sum_{j=1}^t b_{ij}e_j\right)\beta' = \sum_{j=1}^t b_{ij}(e_j)\beta' = \sum_{j=1}^t b_{ij}\left(\sum_{k=1}^t b'_{jk}e_k\right) = \sum_{k=1}^t\left(\sum_{j=1}^t b_{ij}b'_{jk}\right)e_k$$
for $1 \le i \le t$. This shows that $\beta\beta'$ has matrix $BB' = (\sum_{j=1}^t b_{ij}b'_{jk})$ relative to $\mathcal{B}_0$, i.e. $(\beta\beta')\theta = BB' = (\beta)\theta(\beta')\theta$. So $\theta$ respects multiplication. The identity mapping $\iota$ of $F^t$ is the 1-element of the ring $\operatorname{End} M(A)$, and $(\iota)\theta = I$, i.e. the matrix of $\iota$ relative to $\mathcal{B}_0$ is the $t \times t$ identity matrix $I$ over $F$. As $I$ is the 1-element of $M_t(F)$, we conclude that $\theta' : \operatorname{End} M(A) \to M_t(F)$, defined by $(\beta)\theta' = (\beta)\theta$ for all $\beta \in \operatorname{End} M(A)$, is a ring homomorphism. Let $\beta \in \ker\theta'$. Then $(\beta)\theta = B = 0$, the zero $t \times t$ matrix over $F$, i.e. $B = (b_{ij})$ where $b_{ij} = 0$ for $1 \le i, j \le t$. This gives $(e_i)\beta = \sum_{j=1}^t b_{ij}e_j = \sum_{j=1}^t 0e_j = 0$ for $1 \le i \le t$. For $v \in F^t$ we have $v = \sum_{i=1}^t a_ie_i$, and so $(v)\beta = \sum_{i=1}^t a_i(e_i)\beta = \sum_{i=1}^t a_i0 = 0$, showing that $\beta = 0$. Therefore $\ker\theta' = 0$, showing that $\theta'$ is injective by Exercises 2.3, Question 1(a)(i). By Theorem 6.27 and Definition 6.28 we see $\operatorname{im}\theta' = Z(A)$. So $Z(A)$ is a subring of $M_t(F)$ by Exercises 2.3, Question 3(b). Write $\theta' = \theta\iota'$, where $\iota' : Z(A) \to M_t(F)$ is the inclusion. As $\iota'$ is an injective ring homomorphism, we see that $\theta : \operatorname{End} M(A) \to Z(A)$ is a ring isomorphism, i.e. $\theta : \operatorname{End} M(A) \cong Z(A)$. Now $\operatorname{Aut} M(A) = U(\operatorname{End} M(A))$, i.e. the automorphisms of the $F[x]$-module $M(A)$ are exactly the invertible elements of the ring $\operatorname{End} M(A)$. Therefore $\theta| : \operatorname{Aut} M(A) \cong U(Z(A))$, i.e. $\theta|$ is a group isomorphism (Exercises 2.3, Question 4(c)) between the corresponding groups of invertible elements of these rings.

Solution 2(c): Each element of the ring $F[x]/\langle g(x)\rangle$ is uniquely expressible as $\langle g(x)\rangle + f(x)$, where $\deg f(x) < \deg g(x)$. Also $\langle g(x)\rangle + f(x)$ is an invertible element of $F[x]/\langle g(x)\rangle$ if and only if $\gcd\{f(x), g(x)\} = 1$. Therefore, in the case of a finite field $F$ of order $q$, we have $\Phi_q(g(x)) = |U(F[x]/\langle g(x)\rangle)|$. That is, the number of polynomials $f(x)$ over $F$ with $\gcd\{f(x), g(x)\} = 1$ and $\deg f(x) < \deg g(x)$ is the order of the multiplicative group $U(F[x]/\langle g(x)\rangle)$. For non-zero $g(x), h(x) \in F[x]$ with $\gcd\{g(x), h(x)\} = 1$, the Chinese remainder theorem gives
$$\begin{aligned} \Phi_q(g(x)h(x)) &= |U(F[x]/\langle g(x)h(x)\rangle)| = |U(F[x]/\langle g(x)\rangle) \times U(F[x]/\langle h(x)\rangle)| \\ &= |U(F[x]/\langle g(x)\rangle)| \times |U(F[x]/\langle h(x)\rangle)| = \Phi_q(g(x))\Phi_q(h(x)), \end{aligned}$$
showing that $\Phi_q$ has the multiplicative property. As $\deg p(x)^n = mn$, there are $q^{mn}$ polynomials $f(x)$ over $F$ with $\deg f(x) < \deg p(x)^n$, since there are $q$ choices for each of the $mn$ coefficients of $x^i$ in $f(x)$ for $0 \le i < mn$. Suppose $\gcd\{f(x), p(x)^n\} \neq 1$. Then $p(x) \mid \gcd\{f(x), p(x)^n\}$, as $p(x)$ is irreducible over $F$. So $p(x) \mid f(x)$, and so $f(x)/p(x)$ is a polynomial of degree less than $mn - m = m(n-1)$ over $F$. There are $q^{m(n-1)}$ such polynomials over $F$, and hence there are exactly $q^{m(n-1)}$ polynomials $f(x)$ as above with $\gcd\{f(x), p(x)^n\} \neq 1$. Therefore
$$\Phi_q(p(x)^n) = q^{mn} - q^{m(n-1)} = q^{m(n-1)}(q^m - 1).$$
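For a prime $q = p$, the formula $\Phi_q(p(x)^n) = q^{m(n-1)}(q^m - 1)$ and the multiplicative property can be checked by brute force. The check lists every $f(x)$ with $\deg f(x) < \deg g(x)$ over $\mathbb{Z}_p$ and runs the Euclidean algorithm. This sketch handles prime fields only, not general $\mathbb{F}_q$. Polynomials are low-to-high coefficient lists, and the helper names are hypothetical.

```python
from itertools import product


def trim(a):
    """Drop trailing zero coefficients (lists are low-to-high)."""
    a = list(a)
    while a and a[-1] == 0:
        a.pop()
    return a


def poly_mod(a, b, p):
    """Remainder of a modulo b over Z_p (p prime, b non-zero)."""
    a, b = trim(c % p for c in a), trim(c % p for c in b)
    inv = pow(b[-1], p - 2, p)                 # inverse of the leading coefficient
    while len(a) >= len(b):
        coef, shift = a[-1] * inv % p, len(a) - len(b)
        for k, c in enumerate(b):
            a[shift + k] = (a[shift + k] - coef * c) % p
        a = trim(a)
    return a


def poly_mul(a, b, p):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out


def coprime(a, b, p):
    """True when gcd{a, b} = 1 over Z_p (Euclidean algorithm)."""
    a, b = trim(c % p for c in a), trim(c % p for c in b)
    while b:
        a, b = b, poly_mod(a, b, p)
    return len(a) == 1                         # gcd is a non-zero constant


def phi(g, p):
    """Phi_p(g): number of f over Z_p with deg f < deg g and gcd{f, g} = 1."""
    n = len(trim(g)) - 1
    return sum(coprime(f, g, p) for f in product(range(p), repeat=n))


q = 2
irreducible = [1, 1, 1]                        # x^2 + x + 1 over Z_2, so m = 2
g = poly_mul(irreducible, irreducible, q)      # p(x)^2, so n = 2
assert phi(g, q) == q ** (2 * (2 - 1)) * (q ** 2 - 1)   # 12
```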

Solution 2(e): Each of the $q$ scalar matrices $aI$ for $a \in \mathbb{F}_q$ belongs to a singleton similarity class $\{aI\}$. Let $A$ be a non-scalar $2 \times 2$ matrix over $\mathbb{F}_q$. Then $M(A)$ is cyclic with quadratic minimum polynomial $\mu_A(x)$ by Exercises 6.1, Question 6(a).

- There are $q$ polynomials $\mu_A(x) = (x - a)^2$, giving $\Phi_q((x - a)^2) = q^2 - q$. So the similarity class of $A$ has size $(q^2 - 1)(q^2 - q)/(q^2 - q) = q^2 - 1$ by Theorem 6.29.
- There are $q(q - 1)/2$ polynomials $\mu_A(x) = (x - a)(x - b)$ with $a \neq b$, giving $\Phi_q((x - a)(x - b)) = (q - 1)^2$. So the size of the similarity class of $A$ is $(q^2 - 1)(q^2 - q)/(q - 1)^2 = (q + 1)q$.
- There remain $q^2 - q - q(q - 1)/2 = q(q - 1)/2$ monic quadratics, which are the irreducible $\mu_A(x)$. So $\Phi_q(\mu_A(x)) = q^2 - 1$, and $(q^2 - 1)(q^2 - q)/(q^2 - 1) = q(q - 1)$ is the size of the similarity class of $A$.

Adding the sizes of similarity classes in $M_2(\mathbb{F}_q)$ gives
$$q + q(q^2 - 1) + (q(q - 1)/2)(q + 1)q + (q(q - 1)/2)q(q - 1) = q + q^3 - q + q^4 - q^3 = q^4,$$
as expected since $|M_2(\mathbb{F}_q)| = q^4$. There are $q - 1$ scalar matrices $aI$ in $GL_2(\mathbb{F}_q)$, namely those with $a \neq 0$. The $q - 1$ similarity classes of matrices $A$ with $\mu_A(x) = (x - a)^2$, $a \neq 0$, are contained in $GL_2(\mathbb{F}_q)$. The $(q - 1)(q - 2)/2$ similarity classes of matrices $A$ with $\mu_A(x) = (x - a)(x - b)$, $a \neq 0$, $b \neq 0$, $a \neq b$, are contained in $GL_2(\mathbb{F}_q)$. All the $q(q - 1)/2$ similarity classes of matrices $A$ with $\mu_A(x)$ irreducible are contained in $GL_2(\mathbb{F}_q)$, as $\mu_A(0) \neq 0$. Adding up the numbers of matrices in $GL_2(\mathbb{F}_q)$ according to their $q^2 - 1$ conjugacy classes gives
$$q - 1 + (q - 1)(q^2 - 1) + ((q - 1)(q - 2)/2)(q + 1)q + (q(q - 1)/2)q(q - 1) = (q^2 - 1)(q^2 - q) = |GL_2(\mathbb{F}_q)|.$$

Solution 3(a): Let $e$ denote the identity element of $G$. Then $(e)\theta$ is the identity mapping $\iota$ of $\Omega$. So $x(e)\theta = x\iota = x$, showing $x \sim x$ for all $x \in \Omega$. Suppose $x \sim y$ for $x, y \in \Omega$. There is $g \in G$ with $x(g)\theta = y$. As $\theta : G \to S(\Omega)$ is a group homomorphism, we see $((g)\theta)^{-1} = (g^{-1})\theta$, and so $y(g^{-1})\theta = x$, showing $y \sim x$ as $g^{-1} \in G$. Suppose $x \sim y$ and $y \sim z$ where $x, y, z \in \Omega$. There are $g, h \in G$ with $x(g)\theta = y$ and $y(h)\theta = z$. As $(g)\theta(h)\theta = (gh)\theta$, we see $x(gh)\theta = x(g)\theta(h)\theta = (x(g)\theta)(h)\theta = y(h)\theta = z$, showing $x \sim z$ as $gh \in G$. We conclude that $\sim$ is an equivalence relation on $\Omega$. The equivalence class of $x$ is
$$O_x = \{y \in \Omega : y \sim x\} = \{y \in \Omega : y = x(g)\theta \text{ for some } g \in G\} = \{x(g)\theta : g \in G\}.$$
We verify that $G_x$ is a subgroup of $G$ by showing that $G_x$ contains the identity $e$ of $G$ and that $G_x$ is closed under multiplication and inversion. As $(e)\theta = \iota$ we see $x(e)\theta = x\iota = x$, showing $e \in G_x$. Consider $g, h \in G_x$. Then $x(g)\theta = x$ and $x(h)\theta = x$. As above, we see $x(gh)\theta = x(g)\theta(h)\theta = (x(g)\theta)(h)\theta = x(h)\theta = x$, showing $gh \in G_x$. Also $x(g)\theta = x$ gives $x((g)\theta)^{-1} = x$, and so $x(g^{-1})\theta = x$, showing $g^{-1} \in G_x$. Therefore $G_x$ is a subgroup of $G$.

To show that the correspondence $G_xg \mapsto x(g)\theta$ is unambiguously defined, suppose $G_xg = G_xh$ for $g, h \in G$. Then $gh^{-1} \in G_x$, which means $x(gh^{-1})\theta = x$. As $x(gh^{-1})\theta = x(g)\theta((h)\theta)^{-1}$, on applying $(h)\theta$ we obtain $x(g)\theta = x(h)\theta$. This shows that the above correspondence from the set of left cosets of $G_x$ to $O_x$ is indeed unambiguously defined. This correspondence is surjective directly from the definition of $O_x$. It is injective because $x(g)\theta = x(h)\theta$ implies $x(gh^{-1})\theta = x$, which implies $gh^{-1} \in G_x$ and hence $G_xg = G_xh$, on reversing the above steps. So this correspondence is bijective, completing the proof of the orbit-stabiliser theorem. In the case of $G_x$ having finite index $n$ in $G$, we see $|O_x| = n$, as there are exactly $n$ distinct left cosets of $G_x$ in $G$. If $G$ is finite then $n = |G|/|G_x|$, as the $n$ left cosets each consist of $|G_x|$ elements and partition $G$. So $|G|/|G_x| = |O_x|$.
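The similarity classes of Solution 2(e) are the orbits of $GL_2(\mathbb{F}_q)$ acting by conjugation. So the class sizes found there, and the orbit-stabiliser count of Solution 3(a), can be confirmed by brute force for small primes $q = p$. The sketch below enumerates $M_2(\mathbb{Z}_p)$ and uses the centraliser of $A$ as its stabiliser. It asserts $|O_A| \cdot |G_A| = |GL_2|$ for each orbit and compares the multiset of class sizes with the formulas above. This is restricted to prime fields, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import product


def mul(a, b, p):
    """Product of 2 x 2 matrices over Z_p, stored as tuples of row tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) % p for j in range(2))
                 for i in range(2))


def inv(g, p):
    """Inverse of an invertible 2 x 2 matrix over Z_p (p prime)."""
    (a, b), (c, d) = g
    det_inv = pow((a * d - b * c) % p, p - 2, p)
    return ((d * det_inv % p, -b * det_inv % p), (-c * det_inv % p, a * det_inv % p))


def similarity_class_sizes(p, invertible_only=False):
    """Orbit sizes of GL_2(Z_p) acting on M_2(Z_p) (or on GL_2) by conjugation."""
    mats = [((a, b), (c, d)) for a, b, c, d in product(range(p), repeat=4)]
    gl = [g for g in mats if (g[0][0] * g[1][1] - g[0][1] * g[1][0]) % p]
    seen, sizes = set(), Counter()
    for a in (gl if invertible_only else mats):
        if a in seen:
            continue
        orbit = {mul(mul(g, a, p), inv(g, p), p) for g in gl}
        stabiliser = [g for g in gl if mul(g, a, p) == mul(a, g, p)]
        assert len(orbit) * len(stabiliser) == len(gl)   # orbit-stabiliser theorem
        seen |= orbit
        sizes[len(orbit)] += 1
    return sizes


def predicted_sizes(q, invertible_only=False):
    """Class sizes from Solution 2(e): scalar, (x-a)^2, (x-a)(x-b) with a != b, irreducible."""
    sizes = Counter()
    sizes[1] += q - 1 if invertible_only else q
    sizes[q * q - 1] += q - 1 if invertible_only else q
    sizes[q * (q + 1)] += (q - 1) * (q - 2) // 2 if invertible_only else q * (q - 1) // 2
    sizes[q * (q - 1)] += q * (q - 1) // 2
    return +sizes                                    # unary + drops zero counts


assert similarity_class_sizes(3) == predicted_sizes(3)
assert similarity_class_sizes(3, invertible_only=True) == predicted_sizes(3, invertible_only=True)
```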

Solution 5(c): The $1 \times n$ vector $e'_1$ has order $d'(x)$ in $M(A')$ by Theorem 5.26, as $A' = C(d'(x))$. Write $f(x) = d'(x)/\gcd\{d(x), d'(x)\}$. Then $\gcd\{f(x), d'(x)\} = f(x)$ as $f(x) \mid d'(x)$, and so $u_0 = f(x)e'_1$ has order $d'(x)/\gcd\{f(x), d'(x)\} = d'(x)/f(x) = \gcd\{d(x), d'(x)\}$ by Lemma 5.23. For $1 \le i < m$ we have
$$e_iAB_0 = e_{i+1}B_0 = x^iu_0 = x^{i-1}u_0A' = e_iB_0A'.$$
Let $d(x) = c_0 + c_1x + \cdots + c_{m-1}x^{m-1} + x^m$. By Theorem 5.26 we know $e_1$ has order $d(x)$ in $M(A)$, as $A = C(d(x))$. So
$$e_mAB_0 = -(c_0e_1 + c_1e_2 + \cdots + c_{m-1}e_m)B_0 = -(c_0 + c_1x + \cdots + c_{m-1}x^{m-1})u_0 = x^mu_0 = x^{m-1}u_0A' = e_mB_0A'$$
since $d(x)u_0 = 0$, as $\gcd\{d(x), d'(x)\} \mid d(x)$. Therefore $AB_0 = B_0A'$, as these $m \times n$ matrices have equal rows, i.e. $B_0$ intertwines $A$ and $A'$. For $1 \le i \le r$ we have
$$e_iB_0 = x^{i-1}(b_0 + b_1x + \cdots + b_{n-r-1}x^{n-r-1} + x^{n-r})e'_1 = b_0e'_i + b_1e'_{i+1} + \cdots + b_{n-r-1}e'_{i+n-r-1} + b_{n-r}e'_{i+n-r}$$
(where $b_{n-r} = 1$), as $x^{j-1}e'_1 = e'_j$ for $1 \le j \le n$ since $A'$ is a companion matrix. For $r < i \le m$ we show that row $i$ of $B_0$ is a certain linear combination of the preceding $r$ rows of $B_0$. In fact $(a_0 + a_1x + \cdots + a_{r-1}x^{r-1} + x^r)u_0 = \gcd\{d(x), d'(x)\}u_0 = 0$. Multiplying this equation by $x^{i-r-1}$ and rearranging gives
$$e_iB_0 = x^{i-1}u_0 = -(a_0x^{i-r-1} + a_1x^{i-r} + \cdots + a_{r-1}x^{i-2})u_0 = -(a_0e_{i-r}B_0 + a_1e_{i-r+1}B_0 + \cdots + a_{r-1}e_{i-1}B_0).$$
The first $r$ rows of $B_0$ are the vectors of the basis $\mathcal{B}_{u_0}$ as in Theorem 5.24, and so are linearly independent over $F$. By the above equation, the remaining rows of $B_0$ are linear combinations of the first $r$ rows of $B_0$. Therefore $\operatorname{rank} B_0 = r$.

As $B_0$ intertwines $A$ and $A'$, by (a) above $\beta_0 \in \operatorname{Hom}(M(A), M(A'))$. As $(e_1)\beta_0 = e_1B_0 = u_0 = (d'(x)/\gcd\{d(x), d'(x)\})e'_1$, taking $v_0 = e_1$, $v'_0 = e'_1$ in Question 4(c) above, we see that $\beta_0$ generates $\operatorname{Hom}(M(A), M(A'))$.

(i) In the case $d'(x) \mid d(x)$ we see $\gcd\{d(x), d'(x)\} = d'(x)$, and so $d'(x)/\gcd\{d(x), d'(x)\} = 1$. So $u_0 = 1 \times e'_1 = e'_1$ and $r = n = \deg d'(x) \le m$. Therefore $e_iB_0 = e'_i$ for $1 \le i \le n$ and $e_iB_0 = e'_n(A')^{i-n}$ for $n < i \le m$.

(ii) In the case $d(x) \mid d'(x)$ we see $\gcd\{d(x), d'(x)\} = d(x)$, and so $r = m = \deg d(x) \le n$. Therefore all the rows of $B_0$ are given by the above formula $e_iB_0 = b_0e'_i + b_1e'_{i+1} + \cdots + b_{n-r-1}e'_{i+n-r-1} + b_{n-r}e'_{i+n-r}$ for $1 \le i \le m$.

(iii) In the case $\gcd\{d(x), d'(x)\} = 1$ we have $r = 0$ and $u_0 = d'(x)e'_1 = 0$. So $B_0 = 0$, showing that the only matrix which intertwines $C(d(x))$ and $C(d'(x))$ is the zero $m \times n$ matrix.
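The construction of $B_0$ can be tried on small examples over $\mathbb{Q}$. In the sketch below, the rows are $e_iB_0 = x^{i-1}u_0$ with $u_0 = f(x)e'_1$, and the sketch checks $AB_0 = B_0A'$ and $\operatorname{rank}B_0 = \deg\gcd\{d(x), d'(x)\}$. Several simplifications are assumptions of the sketch. The polynomial $f(x) = d'(x)/\gcd\{d(x), d'(x)\}$ is supplied by hand rather than computed. Companion matrices follow the row convention $e_iC = e_{i+1}$. The helper names are hypothetical. The three example calls correspond to a common factor of degree 1, case (ii), and case (iii).

```python
from fractions import Fraction


def companion(coeffs):
    """C(d): rows e_2, ..., e_n, then (-c0, ..., -c_{n-1}); coeffs = [c0, ..., c_{n-1}, 1]."""
    n = len(coeffs) - 1
    rows = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
    rows.append([-c for c in coeffs[:-1]])
    return rows


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def vec_mat(v, m):
    return [sum(v[k] * m[k][j] for k in range(len(m))) for j in range(len(m[0]))]


def act(coeffs, v, m):
    """f(x)v = v f(M) = sum_k c_k v M^k, coeffs low-to-high."""
    result, power = [0] * len(v), list(v)
    for c in coeffs:
        result = [r + c * w for r, w in zip(result, power)]
        power = vec_mat(power, m)
    return result


def rank(rows):
    """Rank over Q by Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def intertwiner(d, d_prime, f):
    """B_0 with rows e_i B_0 = x^(i-1) u_0, u_0 = f(x) e'_1; f = d'/gcd{d, d'} given by hand."""
    A, A_prime = companion(d), companion(d_prime)
    m, n = len(d) - 1, len(d_prime) - 1
    u = act(f, [1] + [0] * (n - 1), A_prime)
    B0 = []
    for _ in range(m):
        B0.append(u)
        u = vec_mat(u, A_prime)
    return A, A_prime, B0


# d = (x-1)(x-2), d' = (x-1)(x-3): gcd = x - 1, f = x - 3, so r = 1.
A, Ap, B0 = intertwiner([2, -3, 1], [3, -4, 1], [-3, 1])
assert mat_mul(A, B0) == mat_mul(B0, Ap) and rank(B0) == 1
# Case (ii): d = x - 1 divides d' = (x-1)(x-2): f = x - 2, r = m = 1.
A, Ap, B0 = intertwiner([-1, 1], [2, -3, 1], [-2, 1])
assert mat_mul(A, B0) == mat_mul(B0, Ap) and rank(B0) == 1
# Case (iii): d = (x-1)(x-2), d' = (x-3)(x-4) coprime: f = d', so B_0 = 0.
A, Ap, B0 = intertwiner([2, -3, 1], [12, -7, 1], [12, -7, 1])
assert B0 == [[0, 0], [0, 0]]
```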

Solution 5(d): The $t \times t$ matrix $CB$ is partitioned, in the same way as $B$, into $\deg d_i(x) \times \deg d_j(x)$ submatrices $C(d_i(x))B_{ij}$ for $1 \le i, j \le s$. Also the $t \times t$ matrix $BC$ is partitioned, in the same way as $B$, into $\deg d_i(x) \times \deg d_j(x)$ submatrices $B_{ij}C(d_j(x))$ for $1 \le i, j \le s$. Therefore

$$B \in Z(C) \iff CB = BC \iff C(d_i(x))B_{ij} = B_{ij}C(d_j(x)) \quad\text{for all } 1 \le i, j \le s.$$

So $B \in Z(C)$ if and only if $B_{ij}$ intertwines $C(d_i(x))$ and $C(d_j(x))$ for all $1 \le i, j \le s$. By (c) above there is a $\deg d_i(x) \times \deg d_j(x)$ matrix $(B_{ij})_0$ such that $B_{ij} = f_{ij}(x)(B_{ij})_0$, where $f_{ij}(x)$ is a polynomial over $F$ of degree less than $\deg\gcd\{d_i(x), d_j(x)\}$, for all $1 \le i, j \le s$. Therefore the $F[x]$-module $\operatorname{End}M(C)$ is the direct sum of $s^2$ cyclic submodules $M_{ij}$ generated by $(\beta_{ij})_0$, determined by $(B_{ij})_0$, for $1 \le i, j \le s$.

Solution 6(b): (i) Let $B(x) = (b_{ij}(x))$ belong to the ring $R_A$. Multiplying

$$d_i(x)b_{ij}(x) \equiv 0 \pmod{d_j(x)}$$

by $-1$ gives

$$d_i(x)(-b_{ij}(x)) \equiv 0 \pmod{d_j(x)} \quad\text{for } 1 \le i, j \le s.$$

Therefore $-B(x) \in R_A$, i.e. $R_A$ is closed under negation.

(ii) As

$$d_i(x)0(x) \equiv 0 \pmod{d_j(x)} \quad\text{for } 1 \le i, j \le s,$$

the zero matrix of $M_s(F[x])$ is in $R_A$. As

$$d_i(x)0(x) \equiv 0 \pmod{d_j(x)} \quad\text{for } 1 \le i, j \le s,\ i \ne j$$

and

$$d_i(x)1(x) \equiv 0 \pmod{d_i(x)} \quad\text{for } 1 \le i \le s,$$

we see that the identity matrix of $M_s(F[x])$ is in $R_A$. Therefore $R_A$ is a subring of $M_s(F[x])$, completing the proof of Theorem 6.32.
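Solution 5(d) writes each block as $B_{ij} = f_{ij}(x)(B_{ij})_0$ with $\deg f_{ij}(x) < \deg\gcd\{d_i(x), d_j(x)\}$. Counting dimensions block by block therefore suggests

$$\dim_F Z(C) = \sum_{i,j=1}^{s} \deg\gcd\{d_i(x), d_j(x)\}.$$

The Python sketch below checks this formula on examples. It computes the dimension of the kernel of $B \mapsto CB - BC$ directly and compares it with the sum of gcd degrees. It assumes $F = \mathbb{Q}$, polynomials stored as coefficient lists (lowest degree first) and the companion-matrix convention used in the solutions above. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def trim(p):
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_mod(a, b):
    """Remainder of a on division by the nonzero polynomial b."""
    a, b = trim(a), trim(b)
    while len(a) >= len(b):
        shift, coeff = len(a) - len(b), a[-1] / b[-1]
        for k, bk in enumerate(b):
            a[shift + k] -= coeff * bk
        a = trim(a)
    return a

def deg_gcd(a, b):
    """Degree of gcd{a, b}, via the Euclidean algorithm."""
    a, b = trim(a), trim(b)
    while b:
        a, b = b, poly_mod(a, b)
    return len(a) - 1

def companion(d):
    """C(d) for monic d: e_i C = e_{i+1} (i < m), last row -(c_0, ..., c_{m-1})."""
    d = trim(d)
    m = len(d) - 1
    C = [[Fraction(0)] * m for _ in range(m)]
    for i in range(m - 1):
        C[i][i + 1] = Fraction(1)
    C[m - 1] = [-c for c in d[:m]]
    return C

def direct_sum(blocks):
    """Block-diagonal matrix C(d_1) + ... + C(d_s)."""
    t = sum(len(b) for b in blocks)
    M = [[Fraction(0)] * t for _ in range(t)]
    off = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, x in enumerate(row):
                M[off + i][off + j] = x
        off += len(b)
    return M

def rank(M):
    """Rank over Q by Gaussian elimination."""
    M = [list(row) for row in M]
    rk = 0
    for c in range(len(M[0]) if M else 0):
        piv = next((i for i in range(rk, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[rk], M[piv] = M[piv], M[rk]
        for i in range(len(M)):
            if i != rk and M[i][c] != 0:
                f = M[i][c] / M[rk][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[rk])]
        rk += 1
    return rk

def centraliser_dim(C):
    """dim of {B : CB = BC}, as t^2 minus the rank of the linear system."""
    t = len(C)
    rows = []
    for i, j in product(range(t), repeat=2):  # one equation per entry (i, j)
        eq = [Fraction(0)] * (t * t)          # unknown B[k][l] sits at index k*t + l
        for k in range(t):
            eq[k * t + j] += C[i][k]          # (CB)[i][j] = sum_k C[i][k] B[k][j]
            eq[i * t + k] -= C[k][j]          # (BC)[i][j] = sum_k B[i][k] C[k][j]
        rows.append(eq)
    return t * t - rank(rows)
```

Example: take $d_1(x) = x - 1$ and $d_2(x) = x^2 - 1$, entered as `[[-1, 1], [-1, 0, 1]]`. Both sides then equal $1 + 1 + 1 + 2 = 5$.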

Solution 6(c): Let $B(x) = (b_{ij}(x))$ belong to the ring $R_A$ and suppose $v, w \in M(A)$. For $1 \le i \le s$ there are $f_i(x), g_i(x) \in F[x]$ with $v = \sum_{i=1}^{s} f_i(x)v_i$, $w = \sum_{i=1}^{s} g_i(x)v_i$, and so $v + w = \sum_{i=1}^{s}(f_i(x) + g_i(x))v_i$ as module law 5 holds in $M(A)$. As

$$(v)\beta = \sum_{i,j=1}^{s} f_i(x)b_{ij}(x)v_j \quad\text{and}\quad (w)\beta = \sum_{i,j=1}^{s} g_i(x)b_{ij}(x)v_j,$$

on adding and using the module laws

$$(v)\beta + (w)\beta = \sum_{i,j=1}^{s} (f_i(x) + g_i(x))b_{ij}(x)v_j = (v + w)\beta$$

which shows $\beta$ to be additive. For $f(x) \in F[x]$ we have

$$f(x)v = \sum_{i=1}^{s} f(x)f_i(x)v_i$$

and so

$$(f(x)v)\beta = \sum_{i,j=1}^{s} f(x)f_i(x)b_{ij}(x)v_j = f(x)((v)\beta)$$

as the module laws are obeyed by $M(A)$. So $\beta$ is $F[x]$-linear, i.e. $\beta \in \operatorname{End}M(A)$. Now fix $i$ with $1 \le i \le s$. On taking $f_i(x) = 1(x)$ and $f_k(x) = 0(x)$ for $k \ne i$ we obtain

$$v_i = \sum_{k=1}^{s} f_k(x)v_k \quad\text{and}\quad (v_i)\beta = \sum_{k,j=1}^{s} f_k(x)b_{kj}(x)v_j = \sum_{j=1}^{s} b_{ij}(x)v_j,$$

showing that $B(x)$ represents $\beta$ (Definition 6.30).
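The map $\beta$ of Solution 6(c) can be modelled concretely. The sketch below is an illustration under stated assumptions, not the book's construction. First, each $v_i$ is taken to have order $d_i(x)$ and $M(A)$ is taken to be the direct sum of the cyclic submodules they generate. Second, $v = \sum f_i(x)v_i$ is stored as the tuple of remainders of $f_i(x)$ modulo $d_i(x)$, over $F = \mathbb{Q}$. Under these assumptions, coordinate $j$ of $(v)\beta$ is $\sum_i f_i(x)b_{ij}(x)$ reduced modulo $d_j(x)$. The $R_A$ condition $d_i(x)b_{ij}(x) \equiv 0 \pmod{d_j(x)}$ is what makes this independent of the chosen $f_i(x)$. All function names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[k] if k < len(a) else 0) + (b[k] if k < len(b) else 0)
                 for k in range(n)])

def mul(a, b):
    a, b = trim(a), trim(b)
    out = [Fraction(0)] * (len(a) + len(b) - 1) if a and b else []
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return trim(out)

def poly_mod(a, b):
    """Remainder of a on division by the nonzero polynomial b."""
    a, b = trim(a), trim(b)
    while len(a) >= len(b):
        shift, coeff = len(a) - len(b), a[-1] / b[-1]
        for k, bk in enumerate(b):
            a[shift + k] -= coeff * bk
        a = trim(a)
    return a

def in_RA(B, ds):
    """Membership test for R_A: d_i(x) b_ij(x) = 0 mod d_j(x) for all i, j."""
    s = len(ds)
    return all(not poly_mod(mul(ds[i], B[i][j]), ds[j])
               for i in range(s) for j in range(s))

def normalise(v, ds):
    """Reduce coordinate i modulo d_i(x)."""
    return [poly_mod(f, d) for f, d in zip(v, ds)]

def beta(v, B, ds):
    """Coordinate j of (v)beta: sum_i f_i(x) b_ij(x), reduced mod d_j(x)."""
    s = len(ds)
    out = []
    for j in range(s):
        acc = []
        for i in range(s):
            acc = add(acc, mul(v[i], B[i][j]))
        out.append(poly_mod(acc, ds[j]))
    return out
```

Example: with $d_1(x) = x - 1$ and $d_2(x) = (x-1)^2$, the entry $b_{12}(x)$ must be divisible by $x - 1$ for $B(x)$ to lie in $R_A$. If instead $b_{12}(x) = 1$, then replacing $f_1(x)$ by $f_1(x) + d_1(x)g(x)$ can change $(v)\beta$, so $\beta$ would not be well defined.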

Index

A
Adjugate matrix, 37
Associate classes of ring elements, 118
Associate elements, 167
Automorphism
  fixed field, 183
  Frobenius, 94, 183
  of M(A), 306
  Z-module, 54

B
Basis
  R-basis, 83
  standard, 86
  standard basis of F[x]^t, 252

C
Canonical form
  Jordan normal form (Jnf), 282
  primary (pcf), 279
  rational (rcf), 257
  real Jordan form, 298
  separable Jordan form (sJf), 289
Cauchy–Binet theorem over R, 39
Cayley–Hamilton theorem, 266
Centraliser of A, 309
Characteristic of field, 94
Chinese remainder theorem, 68
  for companion matrices, 241
  for polynomials, 186
  generalised, 92
Companion matrix, 231
Conjugate elementary operations
  over F[x], 188
  over Z, 11
Coprime, 42
Coset, 61

D
Dedekind’s formula, 185
Direct sum
  external, 65, 69
  external of modules, 96
  internal, 70
  of matrices, 217
  of rings, 92
Division property
  in F[x], 167
  in Z, 17
Divisor sequence, 144

E
Elementary divisors
  of finite abelian group, 125
  of matrix, 278
Elementary matrix
  over F[x], 188
  over Z, 10

C. Norman, Finitely Generated Abelian Groups and Similarity of Matrices over a Field,Springer Undergraduate Mathematics Series,DOI 10.1007/978-1-4471-2730-7, © Springer-Verlag London Limited 2012


Elementary operation
  over F[x], 187
  over Z, 9
Endomorphism
  of abelian group, 134
  of M(A), 306
  ring, 134
Endomorphism condition, 313
Equivalence relation, 30
Equivalent matrices
  over F[x], 189
  over Z, 20
Euclidean algorithm
  over F[x], 172
  over Z, 21
Euclidean ring, 188
Euler φ-function, 68
  for polynomials, 309
Existence of rcf, 258
Exponent of finite Z-module, 123
Extension field, 178
Extension field F(c), 179

F
Field of fractions, 287
Finite extension field, 184
First isomorphism theorem
  for groups, 93
  for R-modules, 89
  for rings, 92
  for Z-modules, 76
Frobenius’ theorem, 318
Fundamental theorem of algebra, 174
Fundamental theorem of arithmetic, 29

G
Galois field F_q, 180
Greatest common divisor (gcd), 32
Greatest common divisor of polynomials, 171
Group
  additive abelian group, 48
  character, 117
  cyclic group of order n, 51
  cyclic subgroup 〈g〉, 55
  elementary abelian, 124
  general linear, 81
  indecomposable, 125
  infinite cyclic group, 50
  order of, 52, 62
  presentation, 101
  quotient, 64
  special linear, 93
  subgroup, 50

H
Homomorphic image, 78
Homomorphism
  evaluation, 167, 252
  natural, 63, 92
  ring, 67
  Z-module, 53

I
Ideal, 91
  annihilator ideal K_A, 265
  generator, 33
  ideal of F[x], 169
  ideal of Z, 33
  order ideal, 54, 212
  principal ideal of Z, 33
Indecomposable module, 280
Independent submodules, 72
Index of subgroup, 62
Integer division property, 17
Integral domain, 166
Intermediate subfield, 185
Invariance theorem
  for finite Z-modules, 110
  for F[x]-modules, 262
Invariant factor condition, 113
Invariant factor decomposition
  of f.g. Z-module, 104
  of M(A), 258
Invariant factors
  of f.g. Z-module, 112
  of matrix over F[x], 263
Invariant subspace, 216
Isomorphism
  algebra, 309
  module, 87
  Z-module, 54
Isomorphism type, 56

L
Lagrange’s theorem, 63
Lattice, 79
|G|-lemma, 63
Linear mapping determined by A, 208

M
Mapping
  additive, 53
  evaluation at α, 207
  evaluation at a, 167
  idempotent, 90
  image, 76
  kernel, 76
  natural, 53, 64
  R-linear, 87
  restriction, 91, 216
  semi-linear, 294, 305
  Z-linear, 53
Matrix
  characteristic, 206, 254
  characteristic polynomial, 205
  intertwining, 335
  Jordan block, 282
  nilpotent, 291
  of linear mapping relative to a basis, 204
  reduced, 313
  separable Jordan block matrix, 289
  similar over a field, 205
  similarity class, 205
l-minor, 36, 197
Module
  cyclic, 50
  cyclic with generator v_0, 228
  determined by a linear mapping, 207
  determined by a matrix, 208
  free, 84
  indecomposable, 125, 280, 301
  R-module, 83
  submodule, 50, 88, 215
  Z-module, 49

N
Non-elementary operation, 118
Normal subgroup, 80, 92

O
Orbit-stabiliser theorem, 311, 334
Order of subgroup element, 54
Order of module element, 212

P
Paired elementary operations, 10, 188
Partition function, 127
Partition of positive integer, 127
Partition of set, 62
Permutation representation of group, 311
Polarity
  lattice, 271
  of subgroup lattice, 117
Polynomial
  degree, 166
  divisor, 168
  formal derivative, 287
  Hasse derivative, 304
  irreducible, 174
  minimum polynomial of matrix, 264
  monic, 167
  over field F, 166
  over ring R[x], 166
  palindromic, 271
  reducible, 176
  ring R[x], 182
  separable, 287
  splitting, 182, 286
  zero of, 167
Primary components
  of finite Z-module, 120
  of M(A), 274
Primary decomposition
  of finite Z-module, 121
  of module M(A), 276
Prime subfield, 94
Principal Ideal Domain (PID), 34, 118, 170

Q
Quotient group, 64, 93
Quotient ring, 92

R
Rank of free module, 86
Resultant, 244

S
Smith normal form
  over F[x], 189
  over PID, 118
Smith normal form over Z, 20
  existence, 26
  uniqueness, 40
Splitting field of polynomial, 286
Subfield, 178
Submodule, 215
  cyclic generated by v_0, 237
Subring, 92

T
Torsion subgroup, 107
Torsion submodule, 119
Torsion-free rank, 109

U
Uniqueness of rcf, 262
Unit of ring, 68
