

Linear Algebra and Geometry

KAM-TIM LEUNG

HONG KONG UNIVERSITY PRESS


The Author

Dr K.T. Leung took his doctorate in Mathematics in 1957 at the University of Zurich. From 1958 to 1960 he taught at Miami University and the University of Cincinnati in the U.S.A. Since 1960 he has been with the University of Hong Kong, where he is now Senior Lecturer in Mathematics and Dean of the Faculty of Science.

He is the author (with Dr Doris L.C. Chen) of Elementary set theory, Parts I and II, also published by the Hong Kong University Press.


Linear Algebra and Geometry

KAM-TIM LEUNG

HONG KONG UNIVERSITY PRESS

1974


© Copyright 1974 Hong Kong University Press
ISBN 0-85656-111-8

Library of Congress Catalog Card Number 73-89852

Printed in Hong Kong by

EVERBEST PRINTING CO., LTD
12-14 Elm Street, Kowloon, Hong Kong


PREFACE

Linear algebra is now included in the undergraduate curriculum of most universities. It is generally recognized that this branch of algebra, being less abstract and directly motivated by geometry, is easier to understand than some other branches and that because of the wide applications it should be taught as early as possible. The present book is an extension of the lecture notes for a course in algebra and geometry given each year to the first-year undergraduates of mathematics and physical sciences in the University of Hong Kong since 1961. Except for some rudimentary knowledge in the language of set theory the prerequisites for using the main part of this book do not go beyond Form VI level. Since it is intended for use by beginners, much care is taken to explain new theories by building up from intuitive ideas and by many illustrative examples, though the general level of presentation is thoroughly axiomatic.

The book begins with a chapter on linear spaces over the real and the complex field at a leisurely pace. The more general theory of linear spaces over an arbitrary field is not touched upon since no substantial gain can be achieved by its inclusion at this level of instruction. In §3 a more extensive knowledge in set theory is needed for formulating and proving results on infinite-dimensional linear spaces. Readers who are not accustomed to these set-theoretical ideas may omit the entire section.

Trying to keep the treatment coordinate-free, the book does not follow the custom of replacing any space by a set of coordinates, and then forgetting about the space as soon as possible. In this spirit linear transformations come (Chapter II) before matrices (Chapter V). While using coordinates students are reminded of the fact that a particular isomorphism is given preference. Another feature of the book is the introduction of the language and ideas of category theory (§8) through which a deeper understanding of linear algebra can be achieved. This section is written with the more capable students in mind and can be left out by students who are hard pressed for time or averse to a further level of abstraction. Except for a few incidental remarks, the material of this section is not used explicitly in the later chapters.


Geometry is a less popular subject than it once was and its omission in the undergraduate curriculum is lamented by many mathematicians. Unlike most books on linear algebra, the present book contains two substantial geometrical chapters (Chapters III and IV) in which affine and projective geometry are developed algebraically and in a coordinate-free manner in terms of the previously developed algebra. I hope this approach to geometry will bring out clearly the interplay of algebraic and geometric ideas.

The next two chapters cover more or less the standard material on matrices and determinants. Chapter VII handles eigenvalues up to the Jordan forms. The last chapter concerns itself with the metric properties of euclidean spaces and unitary spaces together with their linear transformations.

The author acknowledges with great pleasure his gratitude to Dr D.L.C. Chen who used the earlier lecture notes in her classes and made several useful suggestions. I am especially grateful to Dr C.B. Spencer who read the entire manuscript and made valuable suggestions for its improvement both mathematically and stylistically. Finally I thank Miss Kit-Yee So and Mr K.W. Ho for typing the manuscript.

K. T. Leung

University of Hong Kong
January 1972


CONTENTS

PREFACE

Chapter I LINEAR SPACE
§1 General Properties of Linear Space
   A. Abelian groups  B. Linear spaces  C. Examples  D. Exercises
§2 Finite-Dimensional Linear Space
   A. Linear combinations  B. Base  C. Linear independence  D. Dimension  E. Coordinates  F. Exercises
§3 Infinite-Dimensional Linear Space
   A. Existence of Base  B. Dimension  C. Exercises
§4 Subspace
   A. General properties  B. Operations on subspaces  C. Direct sum  D. Quotient space  E. Exercises

Chapter II LINEAR TRANSFORMATIONS
§5 General Properties of Linear Transformation
   A. Linear transformation and examples  B. Composition  C. Isomorphism  D. Kernel and image  E. Factorization  F. Exercises
§6 The Linear Space Hom(X, Y)
   A. The algebraic structure of Hom(X, Y)  B. The associative algebra End(X)  C. Direct sum and direct product  D. Exercises
§7 Dual Space
   A. General properties of dual space  B. Dual transformations  C. Natural transformations  D. A duality between … E. Exercises
§8 The Category of Linear Spaces
   A. Category  B. Functor  C. Natural transformation  D. Exercises

Chapter III AFFINE GEOMETRY
§9 Affine Space
   A. Points and vectors  B. Barycentre  C. Linear varieties  D. Lines  E. Base  F. Exercises
§10 Affine Transformations
   A. General properties  B. The category of affine spaces

Chapter IV PROJECTIVE GEOMETRY
§11 Projective Space
   A. Points at infinity  B. Definition of projective space  C. Homogeneous coordinates  D. Linear variety  E. The theorems of Pappus and Desargues  F. Cross ratio  G. Linear construction  H. The principle of duality  I. Exercises
§12 Mappings of Projective Spaces
   A. Projective isomorphism  B. Projectivities  C. Semilinear transformations  D. The projective group  E. Exercises

Chapter V MATRICES
§13 General Properties of Matrices
   A. Notations  B. Addition and scalar multiplication of matrices  C. Product of matrices  D. Exercises
§14 Matrices and Linear Transformations
   A. Matrix of a linear transformation  B. Square matrices  C. Change of bases  D. Exercises
§15 Systems of Linear Equations
   A. The rank of a matrix  B. The solutions of a system of linear equations  C. Elementary transformations on matrices  D. Parametric representation of solutions  E. Two interpretations of elementary transformations on matrices  F. Exercises

Chapter VI MULTILINEAR FORMS
§16 General Properties of Multilinear Mappings
   A. Bilinear mappings  B. Quadratic forms  C. Multilinear forms  D. Exercises
§17 Determinants
   A. Determinants of order 3  B. Permutations  C. Determinant functions  D. Determinants  E. Some useful rules  F. Cofactors and minors  G. Exercises

Chapter VII EIGENVALUES
§18 Polynomials
   A. Definitions  B. Euclidean algorithm  C. Greatest common divisor  D. Substitutions  E. Exercises
§19 Eigenvalues
   A. Invariant subspaces  B. Eigenvectors and eigenvalues  C. Characteristic polynomials  D. Diagonalizable endomorphisms  E. Exercises
§20 Jordan Form
   A. Triangular form  B. Hamilton-Cayley theorem  C. Canonical decomposition  D. Nilpotent endomorphisms  E. Jordan theorem  F. Exercises

Chapter VIII INNER PRODUCT SPACES
§21 Euclidean Spaces
   A. Inner product and norm  B. Orthogonality  C. SCHWARZ's inequality  D. Normed linear space  E. Exercises
§22 Linear Transformations of Euclidean Spaces
   A. The conjugate isomorphism  B. The adjoint transformation  C. Self-adjoint linear transformations  D. Eigenvalues of self-adjoint transformations  E. Bilinear forms on a euclidean space  F. Isometry  G. Exercises
§23 Unitary Spaces
   A. Orthogonality  B. The conjugate isomorphism  C. The adjoint  D. Self-adjoint transformations  E. Isometry  F. Normal transformation  G. Exercises

Index


Leitfaden - summary of the interdependence of the chapters.

[Diagram showing the chapters Linear space, Linear transformations, Affine geometry, Projective geometry, Matrices, Eigenvalues, Multilinear forms and Inner product spaces, with arrows indicating their interdependence.]


CHAPTER I LINEAR SPACE

In the euclidean plane E, we choose a fixed point O as the origin, and consider the set X of arrows or vectors in E with the common initial point O. A vector a in E with initial point O and endpoint A is by definition the ordered pair (O, A) of points. The vector a = (O, A) can be regarded as a graphical representation of a force acting at the origin O, in the direction of the ray (half-line) from O through A with the magnitude given by the length of the segment OA.

Let a = (O, A) and b = (O, B) be vectors in E. Then a point C on E is uniquely determined by the requirement that the midpoint of the segment OC is identical with the midpoint of the segment AB. The different cases which can occur are illustrated in fig. 1.

The sum a + b of the vectors a and b is defined as the vector c = (O, C). In the case where the points O, A and B are not collinear, OC is the diagonal of the parallelogram OACB. Clearly our construction of the sum of two vectors follows the parallelogram method of constructing the resultant of two forces in elementary mechanics.

Addition, i.e. the way of forming sums of vectors, satisfies the familiar associative and commutative laws. This means that, for any three vectors a, b and c in the plane E, the following identities hold:

(A1) (a + b) + c = a + (b + c);
(A2) a + b = b + a.

Moreover, the null vector 0 = (O, O), that represents the force whose magnitude is zero, has the property

(A3) 0 + a = a for all vectors a in E.


Furthermore, to every vector a = (O, A) there is associated a unique vector a′ = (O, A′) where A′ is the point such that O is the midpoint of the segment AA′.

The vectors a and a′ represent forces with equal magnitude but in opposite directions. The resultant of such a pair of forces is zero; for the vectors a and a′ we get

(A4) a′ + a = 0.

In an equally natural way, scalar multiplication of a vector by a real number is defined. Let λ be a real number and a = (O, A) a vector in the plane E. On the line that passes through the points O and A there is a unique point B such that (i) the length of the segment OB is |λ| times the length of the segment OA and (ii) B lies on the same (opposite) side of O as A if λ is positive (negative).

The product λa of λ and a is the vector b = (O, B). If a represents a force F, then λa represents the force in the same or opposite direction of F according to whether λ is positive or negative, with the magnitude equal to |λ| times the magnitude of F.

The following three rules of calculation can be easily seen to hold good. For any real numbers λ and μ and any vectors a and b,

(M1) λ(μa) = (λμ)a;
(M2) λ(a + b) = λa + λb and (λ + μ)a = λa + μa;
(M3) 1a = a.


To summarize, (1) there is defined in the set of all vectors in the plane E an addition that satisfies (A1) to (A4) and (2) for each vector a and each real number λ, a vector λa, called the product, is defined such that (M1) to (M3) are satisfied. At the same time, similar operations are defined for the forces acting at the origin O such that the same sets of requirements are satisfied. Moreover, results of a general nature on vectors (with respect to these operations) can be related, in a most natural way, to those on forces. Similar operations are also defined for objects from quite different branches of mathematics. Consider, for example, a pair of simultaneous linear equations:

(*) 4X1 + 5X2 - 6X3 = 0
    X1 - X2 + 3X3 = 0.

Solutions of this pair of equations are ordered triples (a1, a2, a3) of real numbers such that when ai is substituted for Xi (i = 1, 2, 3) in the equations we get

4a1 + 5a2 - 6a3 = 0 and a1 - a2 + 3a3 = 0.

Thus (1, -2, -1), (-1, 2, 1) are two of the many solutions of the pair of equations (*). Let us denote by S the set of all solutions of the equations (*) and define for any two solutions a = (a1, a2, a3) and b = (b1, b2, b3) and for any real number λ the sum as

a + b = (a1 + b1, a2 + b2, a3 + b3)

and the product as

λa = (λa1, λa2, λa3).

Then we can easily verify that both a + b and λa are again solutions of the equations (*); therefore addition and scalar multiplication are defined in the set S. It is straightforward to verify further that (A1) to (A4) and (M1) to (M3) are satisfied.
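For instance, closure under addition can be written out directly: if a and b both satisfy (*), then

4(a1 + b1) + 5(a2 + b2) - 6(a3 + b3) = (4a1 + 5a2 - 6a3) + (4b1 + 5b2 - 6b3) = 0 + 0 = 0

and similarly (a1 + b1) - (a2 + b2) + 3(a3 + b3) = 0, so a + b again satisfies (*); the computation for λa is analogous.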

This suggests that it is desirable to have a unified mathematical theory, called linear algebra by mathematicians, which can be applied suitably to the study of vectors in the plane, forces acting at a point, solutions of simultaneous linear equations and other quite dissimilar objects. The mathematician's approach to this problem is to lay down a definition of linear space that is adaptable to the situations discussed above as well as many others. Therefore a linear space should be a set of objects, referred to as vectors, together with an addition and a scalar multiplication that satisfy a number of axioms similar to (A1), (A2), (A3), (A4) and (M1), (M2), (M3). Once such a definition is laid down, the main interest will lie in what there is that we can do with vectors and linear spaces. It is to be emphasized here, that the physical character of the vectors is of no importance to the theory of linear algebra - in the same way that results of arithmetical calculations are independent of any physical meaning the numbers may have for the calculator.

§ 1. General Properties of Linear Space

A. Abelian groups

From the preliminary discussion, we see that it is necessary for us to know more about the nature of the operations of addition and scalar multiplication. These are essentially definite rules that assign sums and products respectively to certain pairs of objects and satisfy a number of requirements called axioms. To formulate these more precisely, we employ the language of set theory.

Let A be a set. Then an internal composition law in A is a mapping τ: A × A → A. For each element (a, b) of A × A, the image τ(a, b) under τ is called the composite of the elements a and b of A and is usually denoted by a τ b.

Thus addition of vectors, addition of forces and addition of solutions discussed earlier are all examples of internal composition laws. Analogously, an external composition law in A between elements of A and elements of another set B is a mapping σ: B × A → A. Similarly, scalar multiplications of vectors, forces and solutions discussed earlier are all examples of external composition laws. Now we say that an algebraic structure is defined on a set A if in A we have one or more (internal or external) composition laws that satisfy some specific axioms. These axioms are not just chosen arbitrarily; on the contrary, they are well-known properties shared by composition laws that we encounter in the applications, such as commutativity and associativity. Abstract algebra is then the mathematical theory of these algebraic structures.

With this in mind, we introduce the algebraic structure of the abelian group and study the properties thereof.


DEFINITION 1.1. Let A be a set. An internal composition law τ: A × A → A is said to define an algebraic structure of abelian group on A if and only if the following axioms are satisfied.

[G1] For any elements a, b and c of A, (a τ b) τ c = a τ (b τ c).
[G2] For any elements a and b of A, a τ b = b τ a.
[G3] There is an element 0 in A such that 0 τ a = a for every element a of A.
[G4] Let 0 be a fixed element of A satisfying [G3]. Then for every element a of A there is an element -a of A such that (-a) τ a = 0.

In this case, the ordered pair (A, τ) is called an abelian group.

It follows from axiom [G3] that if (A, τ) is an abelian group, then A is a non-empty set. We note that the non-empty set A is just a part of an abelian group (A, τ) and it is feasible that the same set A may be part of different abelian groups; more precisely, there might be two different internal composition laws τ1 and τ2 in the non-empty set A, such that (A, τ1) and (A, τ2) are abelian groups. Therefore we should never use a statement such as "an abelian group is a non-empty set on which an internal composition law exists satisfying the axioms [G1] to [G4]" as a definition of an abelian group. In fact, it can be proved that on every non-empty set such an internal composition law always exists, and furthermore if the set in question is not a singleton, then more than one such internal composition law is possible. For this reason, care should be taken to distinguish the underlying set A from the abelian group (A, τ). However, when there is no danger of confusion, then we shall denote, for convenience, the abelian group (A, τ) simply by A and say that A constitutes an abelian group (with respect to the internal composition law τ). In this case, the set A is the abelian group A stripped of its algebraic structure, and by a subset (an element) of the abelian group A, we mean a subset (an element) of the set A.

The most elementary example of an abelian group is the set Z of all integers together with the usual addition of integers. In this case, axioms [G1] to [G4] are well-known properties of ordinary arithmetic. In fact, many other well-known properties of ordinary arithmetic of integers have their counterparts in the abstract theory of abelian groups. In the remainder of this section, §1A, we shall use the abelian group Z as a prototype of an abelian group to study the general properties of abelian groups.


For convenience of formulation, we shall use the following notations and abbreviations.

(i) The internal composition law τ of (A, τ) is referred to as the addition of the abelian group A.

(ii) The composite a τ b is called the sum of the elements a and b of the abelian group A and denoted by a + b; a, b are called summands of the sum a + b. The particular notations that we use are not essential to our theory, but if they are well chosen, they will not only simplify the otherwise clumsy formulations but will also help us to handle the calculations efficiently.

(iii) A neutral element of the abelian group A is an element 0 of A satisfying [G3] above, i.e., 0 + a = a for every a∈A.

(iv) For any element a of the abelian group A, an additive inverse of a is an element -a of A satisfying [G4] above, i.e., a + (-a) = 0.

(v) As a consequence of the notations chosen above, the abelian group A is called an additive abelian group or simply an additive group.

Now we shall turn our attention to the general properties of additive groups. To emphasize their importance in our theory, we formulate some of them as theorems. In deriving these properties, we shall only use the axioms of the definition and properties already established. Therefore, all properties shared by additive groups are possessed by virtue of the definition only.

THEOREM 1.2. For any two elements a and b of an additive group A, there is one and only one element x of A such that a + x = b.

PROOF. It is required to show that there exists (i) at most one and (ii) at least one such element x of A; in other words, we have to show the uniqueness and the existence of x. For the former, let us assume that we have elements x and x′ of A such that a + x = b and a + x′ = b. From the first equation, we get -a + b = -a + (a + x) = (-a + a) + x = 0 + x = x. Similarly, -a + b = x′. Therefore x = x′, and this proves the uniqueness. For the existence, we need only verify that the element -a + b of A satisfies the condition that x has to satisfy. Indeed, we have a + (-a + b) = (a + (-a)) + b = 0 + b = b. Our proof of the theorem is now complete.
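In the prototype Z, for instance, the equation 5 + x = 2 has the unique solution x = (-5) + 2 = -3, exactly as the proof constructs it.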


Another version of the above theorem is this: given any elements a and b of an additive group A, the equation a + x = b admits a unique solution x in A. This solution will be denoted by b - a, and is called the difference of b and a. Using this notation, we get a - a = 0 and 0 - a = -a for all elements a of A. Consequently, we have

COROLLARY 1.3. In an additive group, there is exactly one neutral element and for each element a there is exactly one additive inverse -a of a.

Here is another interesting consequence of 1.2 (or 1.3). In an additive group A, for each element x of A, x = 0 if and only if a + x = a for some a of A. In particular, we get -0 = 0.

Let us now study more closely the axioms [G1] and [G2]; these are called the associative law and the commutative law of addition respectively. The equation (a + b) + c = a + (b + c) means that the element of A obtained by repeated addition is independent of the position of the brackets. Therefore it is unambiguous to write this sum of three elements as a + b + c. Analogously, we can write a + b + c + d = (a + b + c) + d for the sum of any four elements of an additive group A. In general, if a1, ..., aN are elements of an additive group A then, for any positive integer n such that 0 < n < N, we have the recursive definition:

a1 + a2 + ... + an + an+1 = (a1 + a2 + ... + an) + an+1.

The associative law [G1] can be generalized into

[G1'] (a1 + a2 + ... + am) + (am+1 + am+2 + ... + am+n) = a1 + a2 + ... + am+n.

The proof of [G1'] is carried out by induction on the number n. For n = 1, [G1'] follows from the recursive definition. Under the induction assumption that [G1'] holds for an n ≥ 1, we get

(a1 + ... + am) + (am+1 + ... + am+n+1)
= (a1 + ... + am) + [(am+1 + ... + am+n) + am+n+1]
= [(a1 + ... + am) + (am+1 + ... + am+n)] + am+n+1
= (a1 + ... + am+n) + am+n+1
= a1 + ... + am+n+1.


This establishes the generalized associative law [G1'] of addition.

A simple consequence of the generalized associative law is that we can now write a multiple sum a + ... + a (n times) of an element a of an additive group A as na.

The commutative law [G2] of addition means that the sum a + b is independent of the order in which the summands a and b appear. In other words, we can permute the summands in a sum without changing it. Generalizing this, we get

[G2'] For any permutation (φ(1), φ(2), ..., φ(n)) of the n-tuple (1, 2, ..., n),

aφ(1) + aφ(2) + ... + aφ(n) = a1 + a2 + ... + an,

where a permutation is a bijective mapping of the set {1, 2, ..., n} onto itself. The statement [G2'] is trivially true for n = 1. We assume that it is true for n-1 ≥ 1, and let k be the number such that φ(k) = n. Then (φ(1), ..., φ̂(k), ..., φ(n)), where the number φ(k) under the symbol ^ is deleted, is a permutation of the (n-1)-tuple (1, ..., n-1). From the induction assumption, we get aφ(1) + ... + âφ(k) + ... + aφ(n) = a1 + ... + an-1, where aφ(k) under ^ is deleted. Now

aφ(1) + ... + aφ(n) = (aφ(1) + ... + âφ(k) + ... + aφ(n)) + aφ(k)
= (a1 + ... + an-1) + an = a1 + ... + an.

The generalized commutative law [G2'] of addition therefore holds. Taking [G1'] and [G2'] together we can now conclude that the sum of a finite number of elements of an additive group is independent of (i) the way in which the brackets are inserted and (ii) the order in which the group elements appear.

It is convenient to use the familiar summation sign Σ to write the sum a1 + a2 + ... + an as Σ_{i=1}^{n} ai or Σ{ai : i = 1, ..., n}. In particular, whenever the range of summation is clear from the context, we also write a1 + a2 + ... + an as Σi ai or simply Σai. The elements ai (i = 1, ..., n) are called the summands of the sum Σai.

Using this notation, we can handle the double summations with more ease. Let aij be an element of an additive group A for each i = 1, ..., m and j = 1, ..., n. We can then arrange these mn group elements in a rectangular array:


a11 a12 ... a1j ... a1n
a21 a22 ... a2j ... a2n
. . . . . . . . . . . .
ai1 ai2 ... aij ... ain
. . . . . . . . . . . .
am1 am2 ... amj ... amn.

A natural way of summing is to get the partial sums of the rows first and then the final sum of these partial sums. Making use of the summation sign Σ, we write:

(a11 + a12 + ... + a1n) + (a21 + a22 + ... + a2n) + ... + (ai1 + ai2 + ... + ain) + ... + (am1 + am2 + ... + amn) = Σ_{i=1}^{m} (Σ_{j=1}^{n} aij).

On the other hand, we can also get the partial sums of the columns first and then the final sum of these partial sums. Thus we write

(a11 + a21 + ... + am1) + (a12 + a22 + ... + am2) + ... + (a1j + a2j + ... + amj) + ... + (a1n + a2n + ... + amn) = Σ_{j=1}^{n} (Σ_{i=1}^{m} aij).

Applying [G1'] and [G2'], we get

Σ_{i=1}^{m} (Σ_{j=1}^{n} aij) = Σ_{j=1}^{n} (Σ_{i=1}^{m} aij).

Therefore this sum can be written unambiguously as

Σ_{i,j} aij or Σ{aij : i = 1, ..., m; j = 1, ..., n}

or simply Σaij when no danger of confusion about the ranges of i and j is possible. Triple and multiple summations can be handled similarly. Finally we remark that there are many other possible ways of getting Σ_{i,j} aij besides the two ways given above.
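As a numerical check with m = 2 and n = 3, take the array with rows (1, 2, 3) and (4, 5, 6): summing the rows first gives (1 + 2 + 3) + (4 + 5 + 6) = 6 + 15 = 21, while summing the columns first gives (1 + 4) + (2 + 5) + (3 + 6) = 5 + 7 + 9 = 21, as [G1'] and [G2'] guarantee.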

Another important consequence of the laws [G1'] and [G2'] is that we can define summations over certain families of elements of an additive group. More precisely, we say that a family (ai)i∈I of elements of an additive group A is of finite support if all but a finite number of terms ai of the family are equal to the neutral element 0 of A. The sum Σ_{i∈I} ai of the family (ai)i∈I is defined as the sum of the neutral element 0 and all the terms ai of the family that are distinct from 0. It follows from [G1'] and [G2'] that the sum Σ_{i∈I} ai is well-defined for any family (ai)i∈I of finite support of elements of an additive group. In particular, when I = ∅, i.e., for the empty family, we get Σ_{i∈∅} ai = 0. Moreover, if I = {1, ..., n}, then

Σ_{i∈I} ai = a1 + a2 + ... + an = Σ_{i=1}^{n} ai.

B. Linear spaces

With the experience gained in dealing with additive groups, we now have no difficulty in laying down a definition of linear space.

DEFINITION 1.4. Let X be an additive group and R the set of all real numbers. An external composition law σ: R × X → X in X is said to define an algebraic structure of real linear space on X if and only if the following axioms are satisfied:

[M1] for any elements λ, μ of R and any element x of X, λσ(μσx) = (λμ)σx;
[M2] for any elements λ, μ of R and any elements x, y of X, (λ + μ)σx = λσx + μσx and λσ(x + y) = λσx + λσy;
[M3] for all elements x of X, 1σx = x.

In this case, the ordered pair (X, σ) is called a real linear space, a linear space over R, a real vector space or a vector space over R.

Here again, the additive group X and hence also the set X are only parts of a real linear space (X, σ). However, when the addition and the external composition law σ are clear from the context, we denote by X the real linear space (X, σ). In this way, the letter X represents the set X, the additive group X, or the real linear space X, as the case may be. The external composition law σ is called the scalar multiplication of the real linear space X and the addition of the additive group X is called the addition of the linear space X. The axioms [M1] and [M2] are called the associative law of scalar multiplication and the distributive laws of scalar multiplication over addition respectively. The composite λσx is called the product of λ and x or a multiple of x and is denoted by λx. Elements of R are usually referred to as scalars and elements of X as vectors; in particular the neutral element 0 of X is called the nullvector or the zero vector of the linear space X.

Algebraic structure of complex linear space and complex linear space are similarly defined. In fact, we need only replace in 1.4 the set R of all real numbers by the set C of all complex numbers. In the general theory of linear spaces, we only make use of the ordinary properties of the arithmetic of the real numbers or those of the complex numbers; therefore our results hold good in both the real and the complex cases. For simplicity, we shall use the terms "X is a linear space over Λ" and "X is a Λ-linear space" to mean that X is a real linear space or X is a complex linear space according to whether Λ = R or Λ = C.

Now that we have laid down the definition of linear space, the main interest will lie in what we can do with vectors and linear spaces. We emphasize again, that the physical character of vectors is of no importance to the theory of linear algebra, and that the results of the theory are all consequences of the definition.

Here are some immediate consequences of the axioms of linear spaces.

THEOREM 1.5. Let X be a linear space over Λ. Then, for any λ∈Λ and x∈X, λx = 0 if and only if λ = 0 or x = 0.

PROOF. If λ = 0 or x = 0, then λx = (λ + λ)x = λx + λx, or λx = λ(x + x) = λx + λx respectively. Therefore in both cases, we get λx = 0. Conversely, let us assume that λx = 0 and λ ≠ 0. Then we get x = 1x = (1/λ)(λx) = (1/λ)0 = 0.

From the distributive laws, we get (-λ)x = λ(-x) = -(λx) for each vector x and each scalar λ. In particular, we have (-1)x = -x. We can use arguments, similar to those we had in §1A, to prove the generalized distributive laws:

[M2'] (λ1 + ... + λn)a = λ1a + ... + λna and λ(a1 + ... + an) = λa1 + ... + λan.

C. Examples

Before we study linear space in detail, let us consider some examples of linear spaces.


EXAMPLE 1.6. A most trivial example of a linear space over Λ is the zero linear space 0 consisting of a single vector which is necessarily the zero vector and is hence denoted by 0. Addition and scalar multiplication are given as follows: 0 + 0 = 0 and λ0 = 0 for all scalars λ of Λ.

EXAMPLE 1.7. The set V2 of all vectors in the euclidean plane E with common initial point at the origin O constitutes a real linear space with respect to the addition and the scalar multiplication defined at the beginning of this chapter. The linear space V3 of all vectors in the euclidean space or (ordinary) 3-dimensional space is defined in an analogous way. Similarly, the set of forces acting at the origin O of E also constitutes a real linear space with respect to the addition and the scalar multiplication defined earlier.

EXAMPLE 1.8. For the set Rn of all ordered n-tuples of real numbers, we define the sum of two arbitrary elements

x = (ξ1, ..., ξn) and y = (η1, ..., ηn)

of Rn by

x + y = (ξ1 + η1, ..., ξn + ηn)

and the scalar product of a real number λ and x by

λx = (λξ1, ..., λξn).

It is easily verified that the axioms of linear space are satisfied. In particular 0 = (0, ..., 0) is the zero vector of Rn and -x = (-ξ1, ..., -ξn) is the additive inverse of x. With respect to the addition and the scalar multiplication above, Rn is called the n-dimensional arithmetical real linear space.

The n-dimensional arithmetical complex linear space Cn is similarly defined.
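For instance, in R3, if x = (1, -2, 0) and y = (0, 1, 5), then x + y = (1, -1, 5), 3x = (3, -6, 0) and -x = (-1, 2, 0).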

EXAMPLE 1.9. The set of all ordered n-tuples of complex numbers also constitutes a real linear space with respect to the addition and the scalar multiplication defined by

(ξ1, ..., ξn) + (η1, ..., ηn) = (ξ1 + η1, ..., ξn + ηn)
λ(ξ1, ..., ξn) = (λξ1, ..., λξn)

where λ is a real number and ξi, ηi are complex numbers for i = 1, ..., n. This real linear space shall be denoted by RC2n. Note that the set RC2n and the set Cn are equal, but the real linear space RC2n and the complex linear space Cn are distinct linear spaces. At this juncture, the reader may ask why the superscript 2n is used here instead of n. Until we have a precise definition of dimension (see §2D) we have to ask the reader for indulgence to accept that the linear space RC2n is a 2n-dimensional real linear space while the linear space Cn, with the same underlying set, the same addition but a different scalar multiplication, is an n-dimensional complex linear space.

EXAMPLE 1.10. Let R[T] be the set of all polynomials with real coefficients in the indeterminate T. R[T] is then a real linear space with respect to the usual addition of polynomials and the usual multiplication of a polynomial by a real number.

EXAMPLE 1.11. The set F of all real valued functions f defined on the closed interval [a, b] = {t∈R: a ≤ t ≤ b} of the real axis is a real linear space with respect to the following addition and scalar multiplication:

(f + g)(t) = f(t) + g(t) and (λf)(t) = λ(f(t)) for all t∈[a, b].

EXAMPLE 1.12. Consider the set of all differentiable functions f defined on the closed interval [a, b], which satisfy a SCHRÖDINGER's differential equation

d²f/dt² = λf

for a fixed real number λ. This set constitutes a real linear space with respect to addition and scalar multiplication of functions as defined in 1.11.

EXAMPLES 1.13. (a) Let S = {s1, ..., sn} be a finite non-empty set. Consider the set FS of all functions f: S → R. With respect to addition and scalar multiplication of functions as defined in 1.11, FS constitutes a real linear space called the free real linear space generated by S or the free linear space generated by S over R. If for every element si∈S we denote by fi: S → R the function defined by

fi(sj) = 1 if i = j, 0 if i ≠ j,

then every vector f of FS can be written uniquely as f = f(s1)f1 + ... + f(sn)fn. It is convenient to identify each si∈S with the corresponding fi∈FS and consequently for every f∈FS we get f = f(s1)s1 + ... + f(sn)sn.
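For instance, if S = {s1, s2} and f∈FS has f(s1) = 2 and f(s2) = -1, then f = 2f1 - f2, which the identification above lets us write as f = 2s1 - s2.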


(b) If S is an infinite set, some slight modification of the method given above is necessary for the construction of a free linear space. At the end of §1A we saw that the natural generalization of a finite sum of vectors is a sum of a family of vectors of finite support. Therefore in order to arrive at a representation similar to f = f(s1)f1 + ... + f(sn)fn above, we only consider functions f: S → R for which the subset {t∈S: f(t) ≠ 0} of S is finite. A function f: S → R with this property is called a function of finite support. Let FS be the set of all functions f: S → R of finite support. Then with respect to addition and scalar multiplication as defined in 1.11, FS constitutes a real linear space called the free linear space generated by S. For every t∈S we again denote by ft: S → R the function defined by

ft(x) = 1 if t = x, 0 if t ≠ x.

Then ft∈FS. If f∈FS, then the family (f(t))t∈S of scalars is of finite support since f is of finite support. Therefore the family (f(t)ft)t∈S of vectors of FS is also of finite support. Hence Σ_{t∈S} f(t)ft is a vector of FS and f = Σ_{t∈S} f(t)ft. Again, for convenience, we identify each t∈S with the corresponding ft∈FS and consequently for every vector f∈FS we get f = Σ_{t∈S} f(t)t.

Free complex linear space generated by S is similarly defined.

EXAMPLE 1.14. The restriction to functions of finite support imposed on FS in 1.13(b) is necessary for the representation of vectors of FS by a sum f = Σ_{t∈S} f(t)t. To provide another type of linear space we drop this restriction. Let S be a (finite or infinite) non-empty set. We consider the set RS = Map(S, R) of all mappings S → R and define sum and product by

(f + g)(t) = f(t) + g(t)
(λf)(t) = λ(f(t)).

Then RS is a real linear space with respect to the addition and scalar multiplication above. Every vector f∈RS is uniquely represented by the family (f(t))t∈S of scalars. Moreover the representation is compatible with the algebraic structure of RS in the following sense: if f and g are represented by (f(t))t∈S and (g(t))t∈S respectively, then f + g and λf are represented by the families (f(t) + g(t))t∈S and (λf(t))t∈S respectively. If S is finite, then RS = FS. If S is infinite, then RS and FS are essentially different linear spaces. If S = [a, b], then RS is identical with the linear space F of 1.11.

These few examples show that the theory of linear space can be applied to various kinds of dissimilar objects. On the other hand, these examples can always be used to illustrate definitions and theorems throughout the course. The reader will find it most helpful to resort to these and other examples when he has difficulty in understanding an abstract definition or a complicated theorem.

D. Exercises

1. Let A be an additive group. For any elements a and b of A, we denote by a - b the unique element x of A such that x + b = a. Show that

(i) -(a + b) = (-a) - b;
(ii) a - (b - c) = (a - b) + c.

2. Let A be an arbitrary non-empty set. Prove that there exists an internal composition law τ: A × A → A so that (A, τ) is an additive group. Prove also that more than one such internal composition law exists if A is not a singleton.

3. In the set T = {0, 1} we define an internal composition τ by:

0 τ 0 = 0,
1 τ 0 = 0 τ 1 = 1,
1 τ 1 = 0.

Show that (T, τ) is an additive group. Find another internal composition σ in T so that (T, σ) is an additive group.

4. If a1, ..., an are fixed complex numbers, show that the set of all ordered n-tuples x = (x1, ..., xn) of complex numbers such that a1x1 + ... + anxn = 0 is a linear space under the usual addition and scalar multiplication.


5. Find out whether the following addition and scalar multiplication in R × R define an algebraic structure of linear space on R × R:

(ξ1, ξ2) + (η1, η2) = (ξ1 + η1, ξ2 + η2 + ξ1η1)
λ(ξ1, ξ2) = (λξ1, λξ2 + …).

6. Let R+n be the set of all ordered n-tuples of positive real numbers. Show that R+n constitutes a real linear space with respect to addition and scalar multiplication defined by

(α1, ..., αn) + (β1, ..., βn) = (α1β1, ..., αnβn)
λ(α1, ..., αn) = (α1^λ, ..., αn^λ).

7. Show that the set X = R × R becomes a complex linear space under the composition laws:

(x1, x2) + (y1, y2) = (x1 + y1, x2 + y2)
and
(α + iβ)(x, y) = (αx - βy, αy + βx).

8. Let a = (1, 0, 2, 3), b = (2, -1, 4, 7) and c = (3, 1, 5, -3) be vectors of R4. Evaluate

2a - 3b + 5c,
-a - b + c,
a + 3b + 8c,

and find the scalars λ, μ, ν so that λa + μb + νc = (1, -1, 1, -7).

9. Let A and B be two linear spaces over the same Λ. Prove that A × B is a linear space over Λ with respect to the addition and scalar multiplication defined by

(a, b) + (a′, b′) = (a + a′, b + b′)
and λ(a, b) = (λa, λb).

The linear space A × B is called the cartesian product of the linear spaces A and B.


10. Let (Ai)i∈I be a non-empty family of linear spaces all over the same Λ and let A = Π_{i∈I} Ai be the cartesian product of the sets Ai (i∈I). For any elements x = (xi)i∈I and y = (yi)i∈I (where xi∈Ai, yi∈Ai for every i∈I) of A and any scalar λ∈Λ, we define

x + y = (xi + yi)i∈I
λx = (λxi)i∈I.

(a) Show that A is a linear space over Λ with the addition and the scalar multiplication defined above. A is called the cartesian product of the linear spaces Ai (i∈I).

(b) Show that the subset B of A consisting of all elements x = (xi)i∈I with all but a finite number of xi equal to 0 is a subspace of A. B is called the direct sum of the linear spaces Ai (i∈I).

§2. Finite-Dimensional Linear Space

It follows from 1.5 that the underlying set of a linear space X is always an infinite set unless X = 0, which is an entirely uninteresting case. But, as we have emphasized before, the linear space X is not just the underlying set X alone. To study the linear space X we must make full use of its algebraic structure. In the present §2 we shall go into the question whether a number of "key vectors" of X can be selected in such a way that all vectors of X can be "reached" from these "key vectors" by an "algebraic mechanism".

A. Linear combinations

Consider the linear space V2 of all vectors in the euclidean plane E with common initial point at the origin O. To start with, let us pick an arbitrary non-zero vector a of V2 and see what other vectors of V2 can be obtained by repeated applications of addition and scalar multiplication to it. Clearly vectors obtained this way are all of the form λa where λ is a real number; in other words, the endpoints of these vectors all lie on one and the same straight line on E passing through O. This shows that (i) a large number of vectors of V2 can be "reached" by this "algebraic mechanism" from a single vector, and (ii) there are vectors of V2 which cannot be "reached" this way. Now a simple geometric consideration shows that if a and b are two non-zero and non-collinear vectors, i.e., their endpoints and initial points are not collinear (fig. 4), then repeated applications of addition and scalar multiplication on them will yield vectors of the form λa + μb. But every vector of V2 can be put into the form λa + μb; thus a and b can be regarded as a pair of "key vectors" of V2.

Let us now clarify our position by giving precise definitions to those terms between inverted commas. Let X be a linear space over Λ, x and y two vectors of X, and λ and μ two scalars of Λ. Then the scalar multiplication of X allows us to form in X the multiples λx and μy of x and y respectively; and the addition of X allows us to form in X the sum λx + μy. We call the vector λx + μy of X a linear combination of the vectors x and y.

The concept of linear combination is a very important one in the theory of linear space, for the concepts of sum and product, which are fundamental in the algebraic structure of linear space, are special cases of it. Consider a linear combination λx + μy of vectors x and y. Substituting special values for λ and μ, we see that the vectors 0, x and y, the sum x + y and the products λx and μy are all linear combinations of the vectors x and y. On iteration, we have, for the scalars λ1, ..., λn and the vectors x1, ..., xn, a linear combination λ1x1 + ... + λnxn = Σλixi. More generally, let (λi)i∈I be a family of scalars and (xi)i∈I be a family of vectors. If (xi)i∈I is of finite support or (λi)i∈I is of finite support (i.e., all but a finite number of terms are zero), then the family (λixi)i∈I of vectors is, by 1.5, of finite support. Therefore Σ_{i∈I} λixi is a vector of the linear space, called a linear combination of the vectors of the family (xi)i∈I.


Clearly, if x is a linear combination of vectors of a family (xi)i∈I such that for each i∈I, xi is a linear combination of vectors of a family (yj)j∈J, then x is also a linear combination of the vectors of the family (yj)j∈J.
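For a concrete instance in R2 (notation of 1.8): the vector (7, 4) is a linear combination of x = (1, 2) and y = (3, 0), since (7, 4) = 2(1, 2) + (5/3)(3, 0) = 2x + (5/3)y.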

B. Base

In the last section, we have given a precise definition to the "algebraic mechanism" mentioned in the introductory remarks. In this section, we try to do the same to the "key vectors" through the concept of base.

It follows from the definition of linear combination that each vector of a linear space X is a linear combination of vectors of X. This result, trivial though it is, leads to some important definitions and questions in our theory.

DEFINITION 2.1. Let X be a linear space and (xi)i∈I a family of vectors of X. The family (xi)i∈I is said to generate the linear space X if and only if each vector x of X is a linear combination of the vectors of the family. In this case (xi)i∈I is called a family of generators of the linear space X.

Since Σ_{i∈∅} xi = 0, we see that the empty family generates the zero linear space 0. In general, the family of all non-zero vectors of a linear space X generates the linear space X. Clearly, we can remove from this family a great deal of vectors so that the reduced family still generates X. For example, we can take away, for a fixed non-zero vector x, all scalar products λx with λ ≠ 1, and then for a fixed non-zero vector y ≠ x of this reduced family all λy with λ ≠ 1, and so forth. Our aim is then to remove all the redundant vectors and get a minimal family of generators of X. The advantage of dealing with such a minimal family is obvious.

DEFINITION 2.2. A base of a linear space X is a family (xi)i∈I of generators of X such that no proper subfamily of (xi)i∈I generates X.

Now we can put forward a most important question. Does there always exist a base for each linear space? We shall show in §3A that a base always exists in a linear space, thus answering the above question in the affirmative. For the present, let us consider some special cases.

In the real linear space V2 of 1.7, any two non-zero vectors (O, A) and (O, B), where O, A and B are not collinear, form a base of V2.


In the real (complex) linear space Rn (Cn) the family (ei)i=1,...,n of vectors of Rn (Cn), where

e1 = (1, 0, ..., 0), e2 = (0, 1, 0, ..., 0), ..., en = (0, ..., 0, 1),

form a base of Rn (Cn), called the canonical base of Rn (Cn).

For the real linear space RC2n we find that the family

a1 = (1, 0, ..., 0), a2 = (0, 1, 0, ..., 0), ..., an = (0, ..., 0, 1),
b1 = (i, 0, ..., 0), b2 = (0, i, 0, ..., 0), ..., bn = (0, ..., 0, i)

of vectors of RC2n form a base for RC2n.

The family (pk)k=0,1,... of polynomials, where pk = Tk, form an infinite but countable base of the linear space R[T].

The free linear space generated by a set S over Λ admits the family (ft)t∈S as a base.
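The case n = 1 already shows what is involved: in RC2 every 1-tuple (ξ) with ξ = α + βi (α, β real) can be written uniquely as (ξ) = α(1) + β(i) = αa1 + βb1, so the pair (a1, b1) is a base of RC2 over R.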

C. Linear independence

In this and the following sections, we shall study properties of bases of a linear space so as to prepare ourselves for the proof of the theorem on the existence of base.

Let (x1, ..., xn) be any finite subfamily of a base B of a linear space X. Then none of these vectors is a linear combination of the other n-1 vectors; for otherwise, we would obtain a proper subfamily of B that generates X by removing one of these n vectors from the given base B. This is an important property of vectors of a base and an equivalent formulation of it is given in the following theorem.

THEOREM 2.3. Let (x1, ..., xn) be a family of n (n > 0) vectors of a linear space X over Λ. Then the following statements are equivalent:

(i) none of the vectors x1, ..., xn is a linear combination of the others;
(ii) if, for any scalars λ1, ..., λn of Λ, λ1x1 + ... + λnxn = 0, then λ1 = λ2 = ... = λn = 0.

PROOF. (ii) follows from (i). Assume that λi ≠ 0 and λ1x1 + ... + λnxn = 0. Then we would get

xi = (-λ1/λi)x1 + ... + (-λi/λi)x̂i + ... + (-λn/λi)xn

where the summand x̂i on the right-hand side under the symbol ^ is deleted. This would mean that the vector xi is a linear combination of the others, contradicting (i).

(i) follows from (ii). Assume that xi is a linear combination of the other vectors. Then we would get xi = λ1x1 + ... + λix̂i + ... + λnxn, where the summand with x̂i under ^ is deleted. Therefore

λ1x1 + ... + λixi + ... + λnxn = 0

where λi = -1, contradicting (ii).

DEFINITION 2.4a. A finite family (x1, ..., xn) of vectors of a linear space X over Λ is said to be linearly independent if and only if it satisfies the conditions of 2.3; otherwise it is said to be linearly dependent.

It follows that the empty family, every finite subfamily of a base of X and every subfamily of a linearly independent family are linearly independent. Furthermore, by 1.5, a family (x) consisting of a single vector x is linearly independent if and only if the vector x is non-zero. For a family (x, y) consisting of a pair of vectors to be linearly independent, it is necessary and sufficient that both x and y are non-zero and x is not a multiple of y. Moreover, no two terms of a linearly independent family (x1, ..., xn) are equal, i.e., xi ≠ xj for i ≠ j. Therefore if the family (x1, ..., xn) is linearly independent, then we can say that the set {x1, ..., xn} of vectors is linearly independent or that the vectors xi (i = 1, ..., n) are linearly independent. In other words, a finite family of linearly independent vectors is essentially the same as a finite set of linearly independent vectors.

A necessary and sufficient condition for a family (y1, ..., ym) of vectors being linearly dependent is that there is a family (λ1, ..., λm) of scalars such that λ1y1 + ... + λmym = 0 and λi ≠ 0 for some i = 1, ..., m. Clearly, a family (y1, ..., ym) of vectors is linearly dependent if one of its terms is equal to 0, or if yi = yj for some i ≠ j. Finally, for a linearly dependent family (y1, ..., ym) of vectors, the set {y1, ..., ym} may or may not be linearly dependent. For example, if b ≠ 0, then the family (b, b) is linearly dependent and the set {b, b} = {b} is linearly independent. On the other hand, both the family (0, b) and the set {0, b} are linearly dependent. For this reason, care should be taken to distinguish a linearly dependent family (y1, ..., ym) from the set {y1, ..., ym}.
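As a numerical check in R2: the family ((1, 2), (2, 4)) is linearly dependent, since 2(1, 2) + (-1)(2, 4) = (0, 0); the family ((1, 0), (1, 1)) is linearly independent, since λ1(1, 0) + λ2(1, 1) = (λ1 + λ2, λ2) = (0, 0) forces λ2 = 0 and then λ1 = 0.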



THEOREM 2.5a. Let (x1, ..., xn) be a linearly independent family of vectors of a linear space X. Then

λ1x1 + ... + λnxn = μ1x1 + ... + μnxn

if and only if λi = μi for all i = 1, ..., n.

PROOF. If λ1x1 + ... + λnxn = μ1x1 + ... + μnxn, then we get 0 = (λ1x1 + ... + λnxn) - (μ1x1 + ... + μnxn) = (λ1 - μ1)x1 + ... + (λn - μn)xn. Since the family is linearly independent, we get (λi - μi) = 0 for all i = 1, ..., n. Therefore λi = μi for all i = 1, ..., n. The converse is trivial.

From Theorem 2.5a it follows that there is a one-to-one correspondence between the set of all linear combinations of vectors x1, ..., xn and the set of all ordered n-tuples (λ1, ..., λn) of scalars. By means of this correspondence, a process of assigning coordinates (similar to those used in analytic geometry) to linear combinations is possible.

We now generalize 2.4a. A generalized definition of linear independence is needed for the characterization of a base given in 2.6 below and it is also needed for the proof of existence of bases in §3.

DEFINITION 2.4. A family (xi)i∈I of vectors of a linear space X is said to be linearly independent if and only if all its finite subfamilies are linearly independent; otherwise it is said to be linearly dependent.

2.4 agrees with 2.4a when the family (xi)i∈I in question is finite. Similarly a necessary and sufficient condition for a family (xi)i∈I of vectors to be linearly independent is that if Σ_{i∈I} λixi = 0 for a family (λi)i∈I of finite support of scalars, then λi = 0 for all i∈I. Again because no two vectors of a linearly independent family are equal, we need not distinguish a family of linearly independent vectors from a set of linearly independent vectors. However a family of linearly dependent vectors is essentially different from a set of linearly dependent vectors for reasons similar to those given above.

Theorem 2.5a can be generalized as:

THEOREM 2.5. Let (xi)i∈I be a linearly independent family of vectors of a linear space X. Then Σ_{i∈I} λixi = Σ_{i∈I} μixi if and only if λi = μi for all i∈I.


The proof of 2.5, which is a straightforward rewording of that of 2.5a, is left to the reader.

Our next theorem gives two necessary and sufficient conditions for a family of vectors to be a base of a linear space.

THEOREM 2.6. Let B = (xi)i∈I be a family of vectors of a linear space X. Then the following statements on the family B are equivalent.

(i) B is a base of X.
(ii) B is a maximal family of linearly independent vectors of X, i.e., B is not a proper subfamily of any family of linearly independent vectors of X.
(iii) B is a linearly independent family of generators of X.

PROOF. (ii) follows from (i). Let B be a base. Then B is clearly a linearly independent family of vectors. It remains to prove that any family B′ of vectors which has B as a proper subfamily cannot be linearly independent. Indeed such a family must contain xi for each i∈I and, besides these, at least one vector y of X distinct from each xi (i∈I). Since B is a base, y is a linear combination of the vectors xi. Therefore at least one of the vectors of B′ is a linear combination of other vectors of B′ and hence B′ is linearly dependent.

(iii) follows from (ii). Let B be a maximal family of linearly independent vectors. For each vector x of X, we have scalars λ and λi such that λx + Σ_{i∈I} λixi = 0 where λ ≠ 0 or λi ≠ 0 for some i∈I. Assume that λ = 0. Then we get Σ_{i∈I} λixi = 0 where λi ≠ 0 for some i∈I, contradicting the linear independence of B. Therefore λ ≠ 0 and x = Σ_{i∈I} (-λi/λ)xi is a linear combination of vectors of B. Hence B is a linearly independent family of generators of X.

(i) follows from (iii). Let B be a linearly independent family of generators of X. By the linear independence of B, no vector of B is a linear combination of the other vectors of B. Therefore no proper subfamily of B can generate X. Hence B is a base of X.
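Theorem 2.6(iii) gives a convenient test. For example, the linearly independent family ((1, 0), (1, 1)) of R2 considered above generates R2, since (α, β) = (α - β)(1, 0) + β(1, 1) for any scalars α and β; by 2.6 it is therefore a base of R2.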

D. Dimension

In the examples of §2B, we have seen that some linear spaces have finite bases whereas others do not. We say that a linear space is finite-dimensional if it possesses a finite base. We propose to study here the bases of a finite-dimensional linear space X.

Earlier in §2B, we have put forward the question of whether a base always exists for each linear space. With the definition just given above it seems that we have partly "defined away" the problem. This is, to a certain extent, quite true; but even in the case of a finite-dimensional linear space, where a finite base exists by definition, there are some interesting and important problems to be solved. First, we shall show that every base of X is finite; this means that we can conclude from the finiteness of one base of X the finiteness of every other base of X. Finally, we shall show that all bases of X have the same number of vectors. This number will be called the dimension of X. After we have successfully completed this work and become accustomed to dealing with finite bases, we shall tackle the general case in §3.

LEMMA 2.7. Let x1, ..., xp be linearly independent vectors of a linear space X. If a vector xp+1 of X is not a linear combination of the vectors x1, ..., xp, then the vectors x1, ..., xp, xp+1 are linearly independent.

PROOF. Let λ1, ..., λp, λp+1 be scalars such that λ1x1 + ... + λpxp + λp+1xp+1 = 0. Then λp+1 = 0, for otherwise xp+1 is a linear combination of x1, ..., xp, contradicting the assumption. Therefore we get λ1x1 + ... + λpxp = 0. By the linear independence of these vectors, we get λ1 = ... = λp = 0. Therefore λ1 = λ2 = ... = λp = λp+1 = 0, proving the linear independence of the vectors x1, ..., xp, xp+1.

Lemma 2.7 gives a sufficient condition for enlarging a linearly independent family (x_1, ..., x_p), by the adjunction of a vector x_{p+1}, to a linearly independent family (x_1, ..., x_p, x_{p+1}). Our next step is to study the replacement of a vector in a linearly independent family (x_1, ..., x_p) by another vector y without changing the set of vectors which the family generates.
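Readers with access to a computer may like to test Lemma 2.7 numerically. The sketch below is only an editorial illustration (it assumes Python with the numpy library, which is not part of this book): a finite family is linearly independent precisely when the matrix having the vectors as rows has rank equal to the number of vectors.

    import numpy as np

    def is_independent(vectors):
        # A family is linearly independent iff the matrix formed by
        # the vectors has rank equal to the number of vectors.
        m = np.array(vectors, dtype=float)
        return np.linalg.matrix_rank(m) == len(vectors)

    x1, x2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
    x3 = [0.0, 0.0, 1.0]                  # not a combination of x1, x2
    assert is_independent([x1, x2])
    assert is_independent([x1, x2, x3])   # Lemma 2.7: still independent
    assert not is_independent([x1, x2, [1.0, 1.0, 0.0]])  # a combination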

In the statement of the next lemma, we again use the symbol ^ to indicate that the expression under the ^ is to be deleted. From now on the symbol ^ will be used exclusively for this purpose.

LEMMA 2.8. Let x_1, ..., x_p be linearly independent vectors of a linear space X and y = λ_1 x_1 + ... + λ_p x_p a linear combination. Suppose i ∈ {1, ..., p} is such that λ_i ≠ 0; then


(a) the vectors y, x_1, ..., x̂_i, ..., x_p are linearly independent, and

(b) the vector x_i is a linear combination of the vectors y, x_1, ..., x̂_i, ..., x_p.

PROOF. Without loss of generality, we may assume that i = 1. Let μ_1, ..., μ_p be scalars such that μ_1 y + μ_2 x_2 + ... + μ_p x_p = 0. After substitution, we get

μ_1 λ_1 x_1 + (μ_1 λ_2 + μ_2) x_2 + ... + (μ_1 λ_p + μ_p) x_p = 0.

By the linear independence of the vectors x_1, ..., x_p and the inequality λ_1 ≠ 0, we get first μ_1 λ_1 = 0, hence μ_1 = 0, and then μ_2 = ... = μ_p = 0. Therefore the vectors y, x_2, ..., x_p are linearly independent. Moreover,

x_1 = (1/λ_1) y − (λ_2/λ_1) x_2 − ... − (λ_p/λ_1) x_p.

Therefore (a) and (b) hold.

We are now in a position to prove the following supplementation or replacement theorem which has many important applications in the theory of linear spaces. This theorem is essentially a generalization of 2.8 which allows us to replace a number of vectors of a linearly independent family (x_1, ..., x_p) by certain given linearly independent vectors y_1, ..., y_q without changing the set of vectors which the family generates.

THEOREM 2.9. Let x_1, ..., x_p be p linearly independent vectors and let y_1, ..., y_q be q linearly independent vectors of a linear space X. If each y_j (j = 1, ..., q) is a linear combination of the vectors x_1, ..., x_p, then q ≤ p and there exist q vectors among the vectors x_1, ..., x_p, the numbering being chosen so that these vectors are x_1, ..., x_q, such that

(a) the p vectors y_1, ..., y_q, x_{q+1}, ..., x_p are linearly independent, and

(b) for each j = 1, ..., q, the vector x_j is a linear combination of the p vectors y_1, ..., y_q, x_{q+1}, ..., x_p.

PROOF. The theorem is trivial if q = 0. For q ≠ 0, we proceed to replace one vector at a time. Since the vectors y_1, ..., y_q are linearly independent, we have y_1 ≠ 0. Therefore in the unique representation y_1 = λ_1 x_1 + ... + λ_p x_p, at least one of the p scalars λ_1, ..., λ_p is non-zero. Renumbering x_1, ..., x_p and the corresponding scalars if necessary, we can ensure that λ_1 ≠ 0. Then it follows from 2.8 that

(a_1) y_1, x_2, ..., x_p are linearly independent, and

(b_1) x_1 is a linear combination of y_1, x_2, ..., x_p.

Similarly y_2 ≠ 0 and, by (b_1) above, y_2 is a linear combination of y_1, x_2, ..., x_p. In the unique representation y_2 = μ_1 y_1 + μ_2 x_2 + ... + μ_p x_p, at least one among the p−1 scalars μ_2, ..., μ_p is non-zero, for otherwise y_1 and y_2 would be linearly dependent, contradicting the hypothesis of the theorem. Again renumbering if necessary, we can ensure that μ_2 ≠ 0. Then it follows from 2.8 that

(a_2) y_1, y_2, x_3, ..., x_p are linearly independent, and

(b_2) x_2 and hence also x_1 are linear combinations of the vectors y_1, y_2, x_3, ..., x_p.

The theorem is proved if this procedure can be carried out up to the q-th step, at which we obtain:

(a_q) y_1, ..., y_q, x_{q+1}, ..., x_p are linearly independent, and

(b_q) x_1, ..., x_q are linear combinations of y_1, ..., y_q, x_{q+1}, ..., x_p.

In other words, it is sufficient to prove that q ≤ p. This means that the vectors x_i are not used up before the vectors y_j are. Let us assume to the contrary that q > p. Then we can carry out our procedure of replacement up to the p-th step and get

(a_p) y_1, ..., y_p are linearly independent, and

(b_p) x_1, ..., x_p are linear combinations of y_1, ..., y_p.

But the vector y_{p+1} is a linear combination of x_1, ..., x_p and therefore, by (b_p), it is a linear combination of y_1, ..., y_p, contradicting the linear independence of y_1, ..., y_q. The proof is now complete.

COROLLARY 2.10. Let x_1, ..., x_p be p linearly independent vectors and F = (y_i)_{i∈I} a family of linearly independent vectors of a linear space X. If, for each i ∈ I, y_i is a linear combination of x_1, ..., x_p, then F is finite and the number of vectors of F is not greater than p.
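The single replacement step of 2.8, on which the proof of 2.9 rests, can be traced in the same experimental spirit. In the hedged sketch below (again assuming numpy; the particular vectors are ours, chosen for illustration), y = x_1 + 2x_2 has λ_1 = 1 ≠ 0, so y may replace x_1; the new family is independent and x_1 is recovered from it.

    import numpy as np

    x1 = np.array([1.0, 0.0, 0.0])
    x2 = np.array([0.0, 1.0, 0.0])
    x3 = np.array([0.0, 0.0, 1.0])
    y = 1.0 * x1 + 2.0 * x2          # the coefficient of x1 is non-zero

    # (a) y, x2, x3 are linearly independent:
    assert np.linalg.matrix_rank(np.stack([y, x2, x3])) == 3

    # (b) x1 is a linear combination of y, x2, x3: solve for the
    # coefficients c in x1 = c1*y + c2*x2 + c3*x3.
    coeffs = np.linalg.solve(np.stack([y, x2, x3]).T, x1)
    assert np.allclose(coeffs, [1.0, -2.0, 0.0])  # x1 = y - 2*x2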

We can now apply the supplementation theorem and its corollary


to prove some basic results on bases of a finite-dimensional linear space.

It follows from 2.10 that every infinite family of vectors of a finite-dimensional linear space is linearly dependent. Therefore every base of a finite-dimensional linear space is finite. If x_1, ..., x_p and y_1, ..., y_q form two bases of a linear space X, then the vectors x_1, ..., x_p are linear combinations of the vectors y_1, ..., y_q and conversely the vectors y_1, ..., y_q are linear combinations of the vectors x_1, ..., x_p. Therefore by 2.9 or 2.10, p = q. Hence the following theorem on the invariance of dimensionality of finite-dimensional linear spaces is proved.

THEOREM 2.11. Let X be a finite-dimensional linear space which has a base consisting of n vectors. Then every base of X consists of n vectors and, conversely, every family of n linearly independent vectors of X is a base of X.

Therefore the following definition is justified.

DEFINITION 2.12. The dimension of a finite-dimensional linear space X over A is the number n (n ≥ 0) of vectors of any base of X and is denoted by dim_A X, or simply dim X.

For a linear space X whose dimension is n, we also say that X is an n-dimensional linear space. The linear spaces of the examples 1.6 to 1.9 have dimensions as follows:

dim 0 = 0; dim V² = 2; dim V³ = 3; dim Rⁿ = n; dim Cⁿ = n; and dim RC²ⁿ = 2n.

The linear spaces of 1.10 to 1.12 are not finite-dimensional. These are called infinite-dimensional linear spaces.

E. Coordinates

In the discussion following 2.5a, we have indicated the possibility of assigning coordinates to vectors of a linear space. This section will concern itself with this problem.

Let X be a linear space over A (here A = R or A = C as the case may be) of dimension n and (x_1, ..., x_n) a base of X. Then, by theorem 2.5a, every vector x of X is a unique linear combination x = λ_1 x_1 + ... + λ_n x_n. The scalars λ_1, ..., λ_n, which are uniquely determined by the vector x, are called the coordinates of the vector x relative to the base (x_1, ..., x_n).

Addition and scalar multiplication of vectors can be expressed in terms of coordinates. If x and y have coordinates λ_1, ..., λ_n and μ_1, ..., μ_n, respectively, relative to the base (x_1, ..., x_n), then, from

x = λ_1 x_1 + ... + λ_n x_n

and y = μ_1 x_1 + ... + μ_n x_n

we get x + y = (λ_1 + μ_1) x_1 + ... + (λ_n + μ_n) x_n

and ηx = (ηλ_1) x_1 + ... + (ηλ_n) x_n for any scalar η.

Thus the coordinates of the sum of two vectors are the sums of the corresponding coordinates of the summands, and the coordinates of the product of a vector by a scalar are the products of the coordinates of the vector by the scalar in question.
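Concretely, when the base vectors are arranged as the columns of an invertible matrix, the coordinates of a vector are obtained by solving one system of linear equations, and the two rules just stated can be checked directly. A minimal sketch (assuming numpy; the base of R^3 is chosen only for illustration):

    import numpy as np

    # The columns of B are the base vectors x1, x2, x3 of R^3.
    B = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])

    def coords(v):
        # Coordinates of v relative to the base: solve B c = v.
        return np.linalg.solve(B, v)

    x = np.array([2.0, 3.0, 1.0])
    y = np.array([0.0, 1.0, 4.0])

    # Coordinates of a sum are the sums of the coordinates, and the
    # coordinates of a scalar multiple are the scalar multiples.
    assert np.allclose(coords(x + y), coords(x) + coords(y))
    assert np.allclose(coords(2.5 * x), 2.5 * coords(x))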

Using these results, we get a bijective mapping Φ: X → Aⁿ (here Aⁿ = Rⁿ or Aⁿ = Cⁿ as the case may be) by putting for every x ∈ X

Φ(x) = (λ_1, ..., λ_n) where x = λ_1 x_1 + ... + λ_n x_n.

This mapping Φ has furthermore the property that

Φ(x + y) = Φ(x) + Φ(y)

and Φ(λx) = λΦ(x)

for all x, y ∈ X and λ ∈ A. Mappings with this property will be the subject of our investigation in the next chapter.

In the linear space V³, a base consists of three vectors x = (O, A), y = (O, B) and z = (O, C), where O, A, B and C are distinct non-coplanar points. We call the straight line passing through O and A the x-axis; the y-axis and the z-axis are similarly defined. The coordinates (λ, μ, ν) of a vector p = (O, P) relative to the base (x, y, z) are obtained as follows. Construct a plane F through P and parallel to the plane spanned by the y-axis and the z-axis. Then F intersects the x-axis at a point A′ and λ is the ratio of the directed segment OA′ to the directed segment OA, i.e., (O, A′) = λ(O, A). The coordinates μ and ν are similarly obtained. In this way our definition of the coordinates of the vector p = (O, P) coincides with the definition of the coordinates of the point P in a parallel (but not necessarily rectangular cartesian) coordinate system.


F. Exercises

1. Consider four vectors x = (1, 0, 0), y = (0, 1, 0), z = (0, 0, 1) and u = (1, 1, 1) of R³. Show that the four vectors are linearly dependent, whereas any three of them are linearly independent.

2. (a) Find a necessary and sufficient condition for which the vectors (1, ξ) and (1, η) of R² are linearly dependent.

(b) Find a necessary and sufficient condition for which the vectors (1, ξ, ξ²), (1, η, η²) and (1, ζ, ζ²) of R³ are linearly dependent.

3. Is it true that if x, y and z are linearly independent vectors then so also are the vectors x + y, y + z and z + x?

4. Test whether the following families of vectors of R⁴ are linearly independent.

(i) a_1 = (1, 0, 2, 3), a_2 = (1, 6, −16, −21), a_3 = (4, 3, −1, 0).

(ii) a_1 = (1, 2, −1, 0), a_2 = (0, 1, 3, 1), a_3 = (0, 0, 4, 3), a_4 = (0, 0, 0, 6).

(iii) a_1 = (1, 1, 0, 0), a_2 = (1, 0, −1, 0), a_3 = (0, 1, 0, 1), a_4 = (1, 0, 0, 1), a_5 = (3, 2, −1, 2).

5. Find maximal subfamilies of linearly independent vectors of the following families:

(i) a_1 = (1, 0, 1), a_2 = (3, 0, 0), a_3 = (2, 0, 1) of R³.

(ii) b_1 = 1, b_2 = √2, b_3 = √2 − √3, b_4 = √6, b_5 = 2 + √3 of R¹.

(iii) c_1 = 1, c_2 = X, c_3 = X², c_4 = X² − X, c_5 = X + 1 of R[X].

6. Consider the canonical base (e_1, e_2, ..., e_n) of Rⁿ. Show that the vectors

f_1 = e_1
f_2 = e_1 + e_2
. . . . . . . . . . .
f_n = e_1 + ... + e_n

form a base of Rⁿ and so also do the vectors

g_1 = f_1
g_2 = f_1 + f_2
. . . . . . . . . . .
g_n = f_1 + f_2 + ... + f_n.

Express e_n as a linear combination of f_1, ..., f_n and as a linear combination of g_1, ..., g_n.

7. Show that the vector x = (6, −7, 3) is a linear combination of the vectors

a = (2, −3, 1), b = (3, −5, 2), c = (1, 1, 0).

Do the vectors a, b, c form a base of R³?

8. Let a = (1, 1, 1), b = (1, 1, 2), c = (1, 2, 3), d = (2, 1, −3), e = (3, 2, −5), f = (1, −1, −1).

Show that B = (a, b, c) and C = (d, e, f) are bases of R³. Find the coordinates of the vectors

x = (6, 9, 14), y = (6, 2, −7), z = (0, 0, 0)

relative to B and to C.

9. Let F be the real linear space of all real-valued functions defined on the closed interval [a, b] (a < b). Are the families (1, t, t², t³, ...) and (1, sin² t, cos² t) linearly independent?

10. Let a and b be linearly independent vectors of a real linear space X and let

u = c_11 a + c_12 b,
v = c_21 a + c_22 b.

Show that u, v are linearly independent if and only if c_11 c_22 − c_12 c_21 ≠ 0.

11. Let W be the set of all (x_1, ..., x_5) in R⁵ which satisfy

2x_1 − x_2 + 4x_3 − x_4 = 0
x_1 + 3x_3 − x_5 = 0
9x_1 − 3x_2 + 6x_3 − 3x_4 − 3x_5 = 0.

Find a finite set of vectors which generates W.


12. Let x = (α_1, α_2) and y = (β_1, β_2) be vectors of R² such that α_1 β_1 + α_2 β_2 = 0 and α_1² + α_2² = β_1² + β_2² = 1. Prove that B = (x, y) is a base for R².

13. Find two bases of C⁴ that have no vector in common so that one of them contains the vectors (1, 0, 0, 0) and (1, 1, 0, 0) and the other contains the vectors (1, 1, 1, 0) and (1, 1, 1, 1).

14. Determine the dimension of the linear space R⁺ of Exercise 6 of §1.

15. Determine the dimension of the complex linear space of Exercise 7 of §1.

16. Let X be the linear space of all real-valued functions of a real variable t. Prove that the functions e^t and e^{2t} are linearly independent.

17. Vectors x_1, ..., x_p of Rⁿ are said to be in echelon if the number of consecutive zero components in x_i, starting from the first component, increases strictly with i. Prove that non-zero vectors x_1, ..., x_p are linearly independent when they are in echelon.

18. Let a_1, ..., a_m be distinct real numbers. Show that the vectors

x_1 = (1, a_1, a_1², ..., a_1ⁿ)
x_2 = (1, a_2, a_2², ..., a_2ⁿ)
. . . . . . . . . . . . . . . . . . .
x_m = (1, a_m, a_m², ..., a_mⁿ)

are linearly independent vectors of Rⁿ⁺¹ if n ≥ m − 1.

19. Let R[T]_n be the linear space of all polynomials in the indeterminate T with real coefficients and of degree ≤ n − 1.

(a) Prove that g_1 = 1, g_2 = T, ..., g_n = Tⁿ⁻¹ form a base of R[T]_n.

(b) Prove that if a_1, ..., a_n are n distinct real numbers, then the polynomials

f_i = (T − a_1) ... (T − a_{i−1})(T − a_{i+1}) ... (T − a_n)

(i = 1, 2, ..., n) form a base of R[T]_n.


(c) Express each f_i as a linear combination of g_1, ..., g_n.

(d) Express each g_i as a linear combination of f_1, ..., f_n.

20. Let X be a real linear space. Consider the set X_C = {(x, y): x, y ∈ X}. Show that X_C is a complex linear space with respect to the addition and scalar multiplication below:

(x, y) + (x′, y′) = (x + x′, y + y′)
(λ + iμ)(x, y) = (λx − μy, λy + μx).

Show also that dim_R X = dim_C X_C. X_C is called the complexification of X.

§ 3. Infinite-Dimensional Linear Spaces

The study of infinite-dimensional linear spaces occupies nowadays an important place in mathematics. Its development has been largely influenced by demands of functional analysis and theoretical physics, where applications make use of the notion of topological linear space. Here we shall not touch upon notions of topology and restrict ourselves to purely algebraic results.

In formulating and proving the theorems below, extra set-theoretical knowledge will be needed. Readers who are not accustomed to these set-theoretical ideas and study linear algebra for the first time are advised to omit the present §3 entirely. Only the finite-dimensional counterparts of the theorems 3.1, 3.2 and 3.3 will be used in the main development of the theory in the later chapters.

A. Existence of base

We shall prove in this section the basic result that every linear space has a base. For the proof of this, we shall make use of ZORN's lemma. We remark that ZORN's lemma, which is essentially an axiom of set theory, is a very powerful and convenient tool for handling infinite sets.

Let A be a set and C a non-empty set of subsets of A. A subset K of C is called a chain in C if C ⊂ D or D ⊂ C for any C, D ∈ K. An upper bound of K in C is any U ∈ C such that C ⊂ U for all C ∈ K. Any M ∈ C that is not a proper subset of any element of C is called a maximal element of C. In this setting, ZORN's lemma is formulated as follows.


If every chain K of C has an upper bound in C, then C has a maximal element.

We have seen in the discussion following definition 2.4 that a family of linearly independent vectors is essentially the same as a set of linearly independent vectors. Therefore a base of a linear space X is a maximal set of linearly independent vectors of X.

THEOREM 3.1. Every linear space has a base.

PROOF. Let X be a linear space. If X is finite-dimensional, then by definition X has a finite base and there is nothing more to be said. Assume now X is infinite-dimensional and let C be the set of all subsets of linearly independent vectors of X. By our assumption on X, we get X ≠ 0. Therefore C is non-empty; for instance, {x} ∈ C for every non-zero vector x of X. We shall show that every chain in C has an upper bound in C. Let K be a chain in C. Then we consider the union U of all C ∈ K. Since K is a chain, for any n (n ≠ 0) vectors x_1, ..., x_n of U there is a member C of K such that x_i ∈ C for i = 1, ..., n. Since C belongs to C, the vectors x_1, ..., x_n are linearly independent. This means that U is a set of linearly independent vectors of X; therefore U ∈ C and U is an upper bound of the chain K. By ZORN's lemma, C has a maximal element M. M is then a maximal set of linearly independent vectors of X, and hence a base of X.

We can now generalize 2.10 to the supplementation theorem below.

THEOREM 3.2. Let B be a base of a linear space X and S a set of linearly independent vectors of X. Then there exists a subset B′ of B such that S ∩ B′ = ∅ and S ∪ B′ is a base of X.

PROOF. Let C be the set of those subsets C of S ∪ B such that S ⊂ C and C is linearly independent. Following an argument similar to that used in the proof of 3.1, we have no difficulty in proving that C has a maximal element M. M is clearly a linearly independent set of generators of X and hence a base of X. Therefore B′ = M \ S satisfies the requirements of the theorem.

B. Dimension

The general theorem on the invariance of dimensionality of linear space does not follow immediately from the supplementation theorem as in the finite-dimensional case. For the formulation and


the proof of this theorem, certain results of set theory are necessary.

To each set S there is associated a unique set Card(S), called the cardinal number of S, in such a way that for any two sets S and T, Card(S) = Card(T) if and only if S and T are equipotent, i.e., there is a bijective mapping between them. When S is equipotent to a subset of T, or equivalently when there is a surjective mapping of T onto S, then we write Card(S) ≤ Card(T). The well-known SCHRÖDER-BERNSTEIN theorem states that if Card(S) ≤ Card(T) and Card(T) ≤ Card(S), then Card(S) = Card(T). In what follows, we shall also make use of the following theorem: If A is an infinite set and Ψ a set of finite subsets of A such that every element x of A belongs to some element S of Ψ, then Card(A) = Card(Ψ).

The general theorem on the invariance of dimensionality is given as follows.

THEOREM 3.3. Let X be a linear space. Then for any two bases B and C of X, Card(B) = Card(C).

PROOF. For a finite-dimensional linear space X, this theorem clearly holds. Therefore we may assume X to be infinite-dimensional. In this case, B and C are both infinite sets. For the sets F(B) and F(C) of all finite subsets of B and C respectively, we get Card(F(B)) = Card(B) and Card(F(C)) = Card(C). For every S ∈ F(B), we construct T_S = {c ∈ C: c is a linear combination of vectors of S}. By 2.10, we see that T_S is finite, hence T_S ∈ F(C). Therefore a mapping Φ: F(B) → F(C) is defined by the relation that Φ(S) = T_S for all S ∈ F(B). Denoting the direct image of Φ by T, we obtain Card(T) ≤ Card(F(B)). Since B is a base of X, every vector c of C is a linear combination of a finite number of vectors of B. This means that every vector c of C belongs to some T_S of T; therefore Card(C) = Card(T) ≤ Card(F(B)) = Card(B). By symmetry, we obtain also Card(B) ≤ Card(C). Therefore Card(B) = Card(C) by the SCHRÖDER-BERNSTEIN theorem.

DEFINITION 3.4. The dimension of a linear space X over A, denoted by dim_A X or simply by dim X, is equal to Card(B) where B is any base of X.

C. Exercises

1. Show that in the linear space R[T] of polynomials the families

P = (p_0, ..., p_k, ...) and Q = (q_0, ..., q_k, ...),

where p_k = Tᵏ and q_k = (T − λ)ᵏ, λ ≠ 0 being a constant, are two bases. Express the vectors q_k explicitly in terms of the p_k.


2. Prove that the linear space F of all real-valued functions defined on the closed interval [a, b], where a ≠ b, is an infinite-dimensional linear space.

3. Let a be any cardinal number. Prove that there is a linear space X over A such that dim X = a.

§ 4. Subspace

A. General properties

Many linear spaces are contained in larger linear spaces. For instance the real linear space Rⁿ is a part of the real linear space RC²ⁿ. Again, the linear space V² of vectors in a plane is a part of the linear space V³ of vectors of a euclidean space. In the linear space R[T] of all polynomials with real coefficients in an indeterminate T, the set of all polynomials of degree less than a fixed positive integer n constitutes an n-dimensional linear space.

These examples suggest the concept of a subspace. A subset Y of a linear space X over A is called a subspace of X if Y is itself a linear space over the same A with respect to the addition and the scalar multiplication of X. More precisely, Y is a subspace of X if x + y and λx belong to Y for every x, y ∈ Y and every scalar λ ∈ A, and the axioms [G1] to [G4] and [M1] to [M3] are satisfied.

It follows immediately from this definition that if Y is a subspace of a linear space X, then the subset Y of the set X is non-empty. For any linear space X, the subspace 0 = {0} consisting of the zero vector alone is the smallest (in the sense of inclusion) subspace of X and X itself is the largest (in the sense of inclusion) subspace of X. We shall see presently that if dim X > 1, then X has subspaces other than 0 and X.

THEOREM 4.1. Let X be a linear space over A. Then Y is a subspace of X if and only if Y is a non-empty subset of X and λx + μy belongs to Y for any vectors x, y of Y and any scalars λ, μ of A.

PROOF. If Y is a subspace, then clearly Y is non-empty and contains all linear combinations of any two vectors of Y. Conversely, since x + y and λx are linear combinations of the vectors x and y of Y, they belong to Y. The axioms [G1], [G2], [M1], [M2] and [M3] are obviously satisfied by Y since they are satisfied by X. Moreover, 0 and −x for every x of Y are linear combinations of vectors of Y. Therefore [G3] and [G4] are also satisfied, and hence Y is a subspace of X.

EXAMPLE 4.2. Let X be a linear space and S any subset (or any family of vectors) of X. Then the set Y of all linear combinations of vectors of S is obviously a subspace of X. Y is called the subspace of X generated or spanned by S. In particular 0 is the subspace generated by ∅.

The following theorems give information on the dimension of a subspace.

THEOREM 4.3. If Y is a subspace of a linear space X, then dim Y ≤ dim X.

PROOF. By 3.1, X and Y have bases, say B and C respectively. By 3.3 and 3.4, we have dim X = Card(B) and dim Y = Card(C). Since every vector of C is a linear combination of vectors of B, we have Card(C) ≤ Card(B) from the proof of 3.3. Therefore dim Y ≤ dim X.

For finite-dimensional linear spaces we have a more precise result.

THEOREM 4.4. Let Y be a subspace of a finite-dimensional linear space X. Then dim Y ≤ dim X. Furthermore, dim Y = dim X if and only if Y = X.

PROOF. The first part of the theorem is a special case of 4.3. However, we shall give a new proof for this special case without resorting to the results of §3. For this, it is sufficient to show that Y has a finite base. If Y = 0, then the empty set ∅ is a base of Y. For Y ≠ 0, we can find a non-zero vector x_1 ∈ Y. The set S_1 = {x_1} is then linearly independent. If S_1 generates Y, then S_1 is a base of Y; otherwise, we can find a vector x_2 of Y that is not a linear combination of vectors of S_1. By 2.7, the set S_2 = {x_1, x_2} is linearly independent. If S_2 generates Y, then S_2 is a base of Y; otherwise we can proceed to find x_3, and so forth. Now this procedure has to terminate after no more than n = dim X steps, since every set of n+1 vectors of X is linearly dependent. Therefore Y has a base of no more than n vectors. The first part of the theorem is established. The second part of the theorem follows immediately from 2.11.
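The proof of 4.4 is in effect an algorithm: keep adjoining vectors of Y that are not yet linear combinations of those already chosen. The following sketch of this procedure for a subspace of R^n given by a spanning family is illustrative only (it assumes numpy, and the helper name greedy_base is ours):

    import numpy as np

    def greedy_base(spanning):
        # Adjoin a vector whenever it enlarges the rank, exactly as in
        # the proof of 4.4; the result is a base of the span.
        base = []
        for v in spanning:
            candidate = base + [v]
            if np.linalg.matrix_rank(np.array(candidate)) == len(candidate):
                base = candidate
        return base

    S = [np.array([1.0, 0.0, 1.0]),
         np.array([2.0, 0.0, 2.0]),   # a multiple of the first vector
         np.array([0.0, 1.0, 0.0])]
    assert len(greedy_base(S)) == 2   # the generated subspace has dimension 2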

B. Operations on subspaces

The set ℒ(X) of all subspaces of a linear space X over A is (partially) ordered by inclusion. Thus for any three subspaces Y′, Y″ and Y‴ of X, we get

if Y′ ⊃ Y″ and Y″ ⊃ Y‴, then Y′ ⊃ Y‴;

Y′ = Y″ if and only if Y′ ⊃ Y″ and Y″ ⊃ Y′.

We shall introduce two operations in ℒ(X). For any two subspaces Y′ and Y″ of X, the intersection

Y′ ∩ Y″ = {x ∈ X: x ∈ Y′ and x ∈ Y″}

is, by 4.1, also a subspace of X. In the sense of inclusion, Y′ ∩ Y″ is the largest subspace of X that is included in both Y′ and Y″. By this, we mean that (i) Y′ ⊃ Y′ ∩ Y″ and Y″ ⊃ Y′ ∩ Y″, and (ii) if Y′ ⊃ Z and Y″ ⊃ Z for some subspace Z of X, then Y′ ∩ Y″ ⊃ Z.

Dual to the intersection of subspaces, we have the sum of subspaces. In general, the set-theoretical union Y′ ∪ Y″ of two subspaces Y′ and Y″ of X fails to be a subspace of X. Consider now the subspace of X generated by the subset Y′ ∪ Y″ of X. This subspace, denoted by Y′ + Y″ and called the sum or the join of Y′ and Y″, is clearly the smallest subspace of X that includes both Y′ and Y″. By this, we mean that (iii) Y′ + Y″ ⊃ Y′ and Y′ + Y″ ⊃ Y″, and (iv) if Z ⊃ Y′ and Z ⊃ Y″ for some subspace Z of X, then Z ⊃ Y′ + Y″. It is not difficult to see that

Y′ + Y″ = {z ∈ X: z = x + y for some x ∈ Y′ and y ∈ Y″}.

The following rules hold in ℒ(X). For any three elements Y′, Y″ and Y‴ of ℒ(X),

(a) (Y′ ∩ Y″) ∩ Y‴ = Y′ ∩ (Y″ ∩ Y‴) and (Y′ + Y″) + Y‴ = Y′ + (Y″ + Y‴);

(b) Y′ ∩ Y″ = Y″ ∩ Y′ and Y′ + Y″ = Y″ + Y′;

(c) Y′ ∩ 0 = 0, Y′ ∩ X = Y′ and Y′ + 0 = Y′, Y′ + X = X;

(d) if Y′ ⊃ Y‴, then Y′ ∩ (Y″ + Y‴) = (Y′ ∩ Y″) + Y‴.

On the dimensions of the intersection and the sum of subspaces we have the following useful theorem.

THEOREM 4.5. Let Y′ and Y″ be subspaces of a finite-dimensional linear space X. Then dim Y′ + dim Y″ = dim(Y′ + Y″) + dim(Y′ ∩ Y″).

PROOF. If x_1, ..., x_r form a base of Y′ ∩ Y″, then these vectors can be supplemented by vectors y_1, ..., y_s of Y′ to form a base of Y′, and by vectors z_1, ..., z_t of Y″ to form a base of Y″. The theorem follows if we prove that the vectors x_1, ..., x_r, y_1, ..., y_s, z_1, ..., z_t form a base of the sum Y′ + Y″. Clearly these vectors generate the subspace Y′ + Y″, and therefore it remains to be proved that they are linearly independent. Assume that

(1) λ_1 x_1 + ... + λ_r x_r + μ_1 y_1 + ... + μ_s y_s + ν_1 z_1 + ... + ν_t z_t = 0;

then the vector

(2) x = λ_1 x_1 + ... + λ_r x_r + μ_1 y_1 + ... + μ_s y_s ∈ Y′,

which by (1) also equals

(3) x = −ν_1 z_1 − ... − ν_t z_t ∈ Y″,

is a vector of Y′ ∩ Y″. Since the vectors x_1, ..., x_r form a base of Y′ ∩ Y″, we get from (2) μ_1 = μ_2 = ... = μ_s = 0. Hence (1) becomes

λ_1 x_1 + ... + λ_r x_r + ν_1 z_1 + ... + ν_t z_t = 0.

But the vectors x_1, ..., x_r, z_1, ..., z_t form a base of Y″; therefore λ_1 = ... = λ_r = ν_1 = ... = ν_t = 0. Hence the r+s+t vectors in question are linearly independent.
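Theorem 4.5 lends itself to a numerical check: if the rows of two matrices span Y′ and Y″, then dim(Y′ + Y″) is the rank of the stacked matrix, and dim(Y′ ∩ Y″) then follows from the formula. A sketch (assuming numpy; the two coordinate planes of R^3 serve as an example):

    import numpy as np

    Y1 = np.array([[1.0, 0.0, 0.0],   # rows span Y', the x-y plane
                   [0.0, 1.0, 0.0]])
    Y2 = np.array([[0.0, 1.0, 0.0],   # rows span Y'', the y-z plane
                   [0.0, 0.0, 1.0]])

    dim1 = np.linalg.matrix_rank(Y1)                      # dim Y'  = 2
    dim2 = np.linalg.matrix_rank(Y2)                      # dim Y'' = 2
    dim_sum = np.linalg.matrix_rank(np.vstack([Y1, Y2]))  # dim(Y'+Y'') = 3
    dim_int = dim1 + dim2 - dim_sum                       # by 4.5
    assert dim_int == 1   # the intersection is the y-axis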

C. Direct sum

We have seen that for any two subspaces Y′ and Y″ of a linear space X we have

Y′ + Y″ = {z ∈ X: z = x + y for some x ∈ Y′ and some y ∈ Y″}.

The representation z = x + y of a vector of the sum by a sum of vectors of the summands is not unique in general. Take, for instance, the case where Y′ = Y″ = X.

Let us assume that for a vector z ∈ Y′ + Y″, we have

z = x + y = x′ + y′

where x, x′ are vectors of Y′ and y, y′ are distinct vectors of Y″. Then the non-zero vector

t = x − x′ ∈ Y′
t = y′ − y ∈ Y″

belongs to the intersection Y′ ∩ Y″. Conversely, if t is a non-zero vector of Y′ ∩ Y″, then for each z = x + y of Y′ + Y″, where x ∈ Y′


and y ∈ Y″, we get z = (x + t) + (y − t) = x + y, a second representation of z as such a sum. Thus Y′ ∩ Y″ = 0 is a necessary and sufficient condition for the unique representation of each vector of Y′ + Y″ as a sum of a vector of Y′ and a vector of Y″. This state of affairs suggests the following definition.

DEFINITION 4.6. Let Y′ and Y″ be subspaces of a linear space X. Then the sum Y′ + Y″ is called a direct sum and is denoted by Y′ ⊕ Y″ if for every vector x of Y′ + Y″ there is one and only one vector y′ of Y′ and there is one and only one vector y″ of Y″ such that x = y′ + y″.

It follows from the discussion above that Y′ + Y″ is a direct sum if and only if Y′ ∩ Y″ = 0. Furthermore for a direct sum Y′ ⊕ Y″, the union B′ ∪ B″ of any base B′ of Y′ and any base B″ of Y″ is a base of Y′ ⊕ Y″. Therefore dim(Y′ ⊕ Y″) = dim Y′ + dim Y″.

A straightforward extension of the idea of direct sum to that of more than two subspaces is as follows:

DEFINITION 4.7. Let Y_i (i = 1, ..., p) be subspaces of a linear space X. Then the sum Y_1 + ... + Y_p is called a direct sum and is denoted by Y_1 ⊕ ... ⊕ Y_p if for every vector x of Y_1 + ... + Y_p there is one and only one vector y_i of Y_i (i = 1, ..., p) such that x = y_1 + ... + y_p.

Similarly for Y_1 ⊕ ... ⊕ Y_p, the union B_1 ∪ ... ∪ B_p of bases B_i of Y_i is a base of Y_1 ⊕ ... ⊕ Y_p. We remark here that the direct sum Y_1 ⊕ ... ⊕ Y_p requires Y_j ∩ (Y_1 + ... + Y_{j−1} + Y_{j+1} + ... + Y_p) = 0 for j = 1, ..., p, and not just Y_i ∩ Y_j = 0 for i ≠ j. In V², let L_1, L_2 and L_3 be subspaces generated by the non-zero vectors (O, A_1), (O, A_2) and (O, A_3), where the points O, A_i, A_j are not collinear for i ≠ j. Then the sum L_1 + L_2 + L_3 is clearly not a direct sum but L_i ∩ L_j = 0 for i ≠ j.
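The remark is easy to check in coordinates: three distinct lines through the origin of R^2 intersect pairwise in 0, yet their sum has dimension 2 < 1 + 1 + 1, so the sum cannot be direct. A sketch under the same assumptions as before:

    import numpy as np

    l1 = np.array([1.0, 0.0])
    l2 = np.array([0.0, 1.0])
    l3 = np.array([1.0, 1.0])

    # Pairwise intersections are 0: any two of the lines are independent.
    for a, b in [(l1, l2), (l1, l3), (l2, l3)]:
        assert np.linalg.matrix_rank(np.stack([a, b])) == 2

    # Yet L1 + L2 + L3 is only 2-dimensional, so the sum is not direct.
    assert np.linalg.matrix_rank(np.stack([l1, l2, l3])) == 2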

The importance of direct sum is much enhanced by the possibility that, for any subspace Y′ of a linear space X, we can always find a subspace Y″ of X, called a complementary subspace of Y′ in X, such that X = Y′ ⊕ Y″. Indeed, if B′ is a base of Y′, then, by the supplementation theorem 3.2 (or 2.9 for the finite-dimensional case), we have a set B″ of linearly independent vectors of X such that B′ ∩ B″ = ∅ and B′ ∪ B″ is a base of X. The subspace Y″ generated by B″ clearly satisfies the required condition that X = Y′ ⊕ Y″. We have therefore proved the following theorem.


THEOREM 4.8. For any subspace Y′ of a linear space X, there is a subspace Y″ of X, called a complementary subspace of Y′ in X, such that X = Y′ ⊕ Y″.

It follows that if Y″ is a complementary subspace of Y′ in X then Y′ is a complementary subspace of Y″ in X. Furthermore

dim X = dim Y′ + dim Y″;

in particular, when X is finite-dimensional,

dim Y″ = dim X − dim Y′.

We remark that a complementary subspace is not unique (except in trivial cases). Indeed from the example above, we see that L_1 ⊕ L_2 = V², L_1 ⊕ L_3 = V² and L_2 ⊕ L_3 = V². Finally if X = Y_1 ⊕ ... ⊕ Y_p, then we can study the subspaces Y_i individually to obtain information on X. Therefore the subspaces Y_i can be regarded, in a certain sense, as linearly independent subspaces of X, and the formation of direct sum as putting these independent components together to form X.

D. Quotient space

Similar to the calculation of integers modulo a fixed integer, we have the arithmetic of vectors of a linear space modulo a fixed subspace. Let Y be a subspace of a linear space X over A. Then we say that two vectors x and y of X are congruent modulo Y if and only if x − y ∈ Y. In this case, we write

x ≡ y (mod Y).

The congruence modulo Y is easily seen to be reflexive, symmetric and transitive: for any three vectors x, y and z of X,

(a) x ≡ x (mod Y);
(b) if x ≡ y (mod Y), then y ≡ x (mod Y);
(c) if x ≡ y (mod Y) and y ≡ z (mod Y), then x ≡ z (mod Y).

Furthermore, the congruence modulo Y is compatible with addition and scalar multiplication in X. By this, we mean that if x ≡ y (mod Y) and s ≡ t (mod Y), then

(d) x + s ≡ y + t (mod Y),
(e) λx ≡ λy (mod Y) for all scalars λ ∈ A.


Therefore, by (a), (b) and (c), the congruence modulo Y defined in X is an equivalence relation. With respect to this equivalence relation, we have a partition of X into equivalence classes

[x] = {a ∈ X: x ≡ a (mod Y)}

where x ∈ X. In the set X/Y of all equivalence classes [x] (x ∈ X), an addition and a scalar multiplication are defined by

[x] + [s] = [x + s]
and λ[x] = [λx].

By (d) and (e), these two composition laws are well-defined. Moreover, it is readily verified that the set X/Y constitutes a linear space over A, called the quotient space of X by the subspace Y. For any y ∈ Y, and in particular for y = 0, the equivalence class [y] is the zero vector of the quotient space X/Y; similarly the equivalence class [−x] is the additive inverse of the equivalence class [x] for any x ∈ X.

Consider the linear space V² of all vectors on the ordinary plane with common initial point at the origin O. If we represent graphically every vector of V² by its endpoint, then a 1-dimensional subspace is represented by a straight line passing through the origin, and conversely every straight line passing through the origin represents in this way a 1-dimensional subspace of V². Let L be a 1-dimensional subspace represented by the straight line l. Then it is easily seen that elements of the quotient space V²/L are represented by straight lines parallel to l, and that the algebraic structure on V²/L corresponds to that of the 1-dimensional subspace G represented by the straight line g passing through the origin and perpendicular to l.

It follows from the definition that the quotient space X/0 is essentially the same as X itself, whereas X/X = 0. In a finite-dimensional linear space X, we get dim X/Y = dim X − dim Y. To see this we use a base (x_1, ..., x_p, x_{p+1}, ..., x_n) of X such that (x_1, ..., x_p) is a base of Y. Then ([x_{p+1}], ..., [x_n]) is easily seen to form a base of X/Y.

E. Exercises

1. In R⁴, L_1 is the subspace generated by (1, 2, 0, 1), (−1, 2, 0, −3) and (1, 0, 0, 2), and L_2 is the subspace generated


by (0, 2, 0, −1), (0, 1, 0, 1) and (2, 5, 0, 3). Find the dimensions of L_1, L_2, L_1 ∩ L_2 and L_1 + L_2.

2. In C³, let H_1 be the subspace generated by (1, 0, 0) and (1, 1, 0) and let H_2 be the subspace generated by (1, 0, −1) and (1, 1, 1). Find the intersection H_1 ∩ H_2 and determine its dimension.

3. Let a = (−1, 1, 0), b = (1, 0, 1), c = (0, 1, 1) be vectors of R³. Show that the subspace X generated by a, b, c has dimension 2. Find a general expression for the vectors of X using 2 independent parameters s and t.

4. Let X be a linear space and S a subset of X. Show that the following statements are equivalent:

(i) Y is the subspace of X generated by S.

(ii) Y is the intersection of all subspaces of X that include S as a subset.

(iii) Y is the smallest (in the sense of inclusion) subspace of X that includes S as a subset.

5. Let A′ and A″ be subspaces of a linear space A. Prove that if A = A′ ∪ A″ then A = A′ or A = A″.

6. Let A′ and A″ be subspaces of a linear space A. Prove that A′ ∪ A″ is a subspace of A if and only if A′ ⊃ A″ or A′ ⊂ A″.

7. Let X_1, ..., X_n be proper subspaces of a linear space X. Show that there is at least one vector of X which belongs to each X_i and there is at least one vector of X which does not belong to any X_i.

8. Let X = X_1 ⊕ X_2 and Y = Y_1 ⊕ Y_2. Show that

X × Y = (X_1 × Y_1) ⊕ (X_2 × Y_2).

9. Let A and B be subspaces of the n-dimensional complex arithmetical linear space Cⁿ. Show that

(a) A ∩ Rⁿ is a subspace of the n-dimensional real arithmetical linear space Rⁿ.

(b) 2 dim_C A − n ≤ dim_R (A ∩ Rⁿ) ≤ dim_C A.

(c) If dim_C A = dim_R (A ∩ Rⁿ), dim_C B = dim_R (B ∩ Rⁿ) and A ∩ Rⁿ = B ∩ Rⁿ, then A = B.


10. Let A be a linear space and (X_i)_{i∈I} a family of subspaces of A.

(a) Prove that the intersection X = ∩_{i∈I} X_i = {x ∈ A: x ∈ X_i for all i ∈ I} is a subspace of A.

(b) What is X if I = ∅?

11. Let (x_i)_{i∈I} be a family of vectors of a linear space A and X_i the 1-dimensional subspace of A generated by x_i for each i ∈ I. Prove that the family (x_i)_{i∈I} is linearly independent if and only if (i) x_i ∉ X_j for i ≠ j and (ii) for each j ∈ I, X_j ∩ Y_j = 0, where Y_j is the subspace of A generated by all x_i with i ∈ I, i ≠ j.

12. Let A be a linear space. Show that for any subspaces A_1, A_2 and A_3 of A the following equations hold:

(i) (A_1 + A_2) + A_3 = A_1 + (A_2 + A_3);
(ii) A_1 + A_2 = A_2 + A_1;
(iii) A_1 ∩ (A_2 + A_3) = (A_1 ∩ A_2) + (A_1 ∩ A_3) if A_1 ⊃ A_2.

13. Let A′ be a subspace of a linear space A. Prove that

(a) dim A′ + dim A/A′ = dim A.

(b) if B″ is any base of any complementary subspace A″ of A′ in A, then the set {[x″]: x″ ∈ B″} is a base of A/A′.

(c) if A′ ≠ 0 and A′ ≠ A, then A′ has more than one complementary subspace in A.

(d) all complementary subspaces A″ of A′ in A have the same dimension.

14. Let X_1 and X_2 be subspaces of a linear space A such that X_1 ∩ X_2 = 0. Prove that B_1 ∪ B_2 is a base of X_1 ⊕ X_2 for any base B_1 of X_1 and any base B_2 of X_2.

15. Let X_1, ..., X_n be subspaces of a finite-dimensional linear space A and X = X_1 + ... + X_n their sum. Show that the following statements are equivalent.

(i) every x ∈ X has a unique representation as x = x_1 + ... + x_n where x_i ∈ X_i for i = 1, 2, ..., n.

(ii) X_j ∩ (X_1 + ... + X_{j−1} + X_{j+1} + ... + X_n) = 0 for j = 1, ..., n.

(iii) dim X = dim X_1 + ... + dim X_n.


(iv) X_1 ∩ X_2 = 0, (X_1 + X_2) ∩ X_3 = 0, (X_1 + X_2 + X_3) ∩ X_4 = 0, ..., (X_1 + X_2 + ... + X_{n−1}) ∩ X_n = 0.

16. Let A be a linear space and (X_i)_{i∈I} a family of subspaces of A. We define X = Σ_{i∈I} X_i as the subspace of A generated by the subset ∪_{i∈I} X_i of A.

(a) Show that each vector x of X can be written as x = Σ_{i∈I} x_i, where (x_i)_{i∈I} is a family of finite support of vectors of A such that x_i ∈ X_i for every i ∈ I.

(b) Show that the representation x = Σ_{i∈I} x_i in (a) is unique for every x ∈ X if and only if X_j ∩ (Σ_{i∈I, i≠j} X_i) = 0 for each j ∈ I.

17. Let X_1, ..., X_r be subspaces of an n-dimensional linear space X. Suppose dim X_i ≤ k for i = 1, ..., r, where k < n. Show that there exists an (n−k)-dimensional subspace Y of X such that X_i ∩ Y = 0 for all i = 1, ..., r.

18. Let X_1, ..., X_r be subspaces of a linear space X. Suppose Y is a subspace of X which is not contained in any of the subspaces X_i. Show that there is a vector y in Y which is not contained in any of the subspaces X_i.


CHAPTER II LINEAR TRANSFORMATIONS

At the beginning of the last chapter, we gave a brief description of abstract algebra as the mathematical theory of algebraic systems and, in particular, linear algebra as the mathematical theory of linear spaces. These descriptions are incomplete, for we naturally want to find relations among the algebraic systems in question. In other words, we also have to study mappings between algebraic systems compatible with the algebraic structure in question.

A linear space consists, by definition, of two parts: namely a non-empty set and an algebraic structure on this set. It is therefore natural to compare one linear space with another by means of a mapping of the underlying sets on the one hand. On the other hand, such a mapping should also take into account the algebraic structures. In other words we shall study mappings between the underlying sets of linear spaces that preserve linearity. These mappings are called linear transformations or homomorphisms and they will be the chief concern of the present chapter.

§ 5. General Properties of Linear Transformation

A. Linear transformation and examples

In assigning coordinates to vectors of an n-dimensional linear space X over A relative to a base (x_1, ..., x_n), we obtained in §2E a mapping Φ: X → Aⁿ such that, for every vector x of X,

Φ(x) = (λ_1, ..., λ_n) where x = λ_1 x_1 + ... + λ_n x_n.

As a mapping of the set X into the set Aⁿ, Φ is bijective. Relative to the algebraic structure of the linear space X and of the linear space Aⁿ, Φ has the following properties:

(i) Φ(x + y) = Φ(x) + Φ(y),
(ii) Φ(λx) = λΦ(x),

for any two vectors x and y of X and any scalar λ of A. Note that x + y


and λx are formed according to the addition and the scalar multiplication of the linear space X, while Φ(x) + Φ(y) and λΦ(x) according to those of the linear space Aⁿ. Since the algebraic structure of a linear space is defined by its addition and scalar multiplication, the relations (i) and (ii) express that the mapping Φ is compatible with the algebraic structure of linear space. Therefore Φ is an example of the type of mapping that we are looking for.

DEFINITION 5.1. Let X and Y be linear spaces over the same A. A mapping φ of the set X into the set Y is called a linear transformation of the linear space X into the linear space Y if and only if for any vectors x and y of X and any scalar λ of A, the equations

(i) φ(x + y) = φ(x) + φ(y),
(ii) φ(λx) = λφ(x)

hold.

Note that the domain and the range of a linear transformation must be linear spaces over the same A. In other words, we do not consider as a linear transformation any mapping of a real linear space into a complex linear space even if (i) holds and (ii) also holds for all λ ∈ R. Therefore, whenever we say 'φ: X → Y is a linear transformation' we mean that X and Y are linear spaces over the same A and φ is a linear transformation of the linear space X into the linear space Y.

Since sums and scalar products are expressible as linear combinations, we can replace conditions (i) and (ii) by a single equivalent condition

(iii) φ(λx + μy) = λφ(x) + μφ(y) for any x, y ∈ X and λ, μ ∈ A.

Property (iii) is called the linearity of φ. For linear spaces over A, linear transformation, linear mapping, A-homomorphism, and homomorphism are synonymous.
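As a quick illustration (a sketch only, assuming numpy), condition (iii) can be tested on sample vectors: a mapping given by a matrix passes, while a translation fails.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))

    phi = lambda v: A @ v           # a linear transformation of R^3
    psi = lambda v: A @ v + 1.0     # a translation of the image: not linear

    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lam, mu = 2.0, -0.5
    assert np.allclose(phi(lam * x + mu * y), lam * phi(x) + mu * phi(y))
    assert not np.allclose(psi(lam * x + mu * y), lam * psi(x) + mu * psi(y))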

EXAMPLE 5.2. Let X be a linear space over A and Y a subspace of X. Then the inclusion mapping ι: Y → X is clearly an injective linear transformation. In particular, the identity mapping i_X: X → X is a bijective linear transformation.

EXAMPLE 5.3. Let X and Y be two linear spaces over the same A. Then the constant mapping 0: X → Y, such that 0(x) = 0 for every x ∈ X, is a linear transformation called the zero linear transformation. It is easily seen that the zero linear mapping is the only constant linear transformation of the linear space X into the linear space Y.


EXAMPLE 5.4. Let Y be a subspace of a linear space X and X/Y the quotient space of X by Y. Then the natural surjection η: X → X/Y defined by η(x) = [x] for every x ∈ X is a surjective linear transformation.

EXAMPLE 5.5. If φ: X → Y is a linear transformation and X_1 is a subspace of X, then the restriction φ_1 = φ|X_1 of the mapping φ to the subset X_1 is clearly a linear transformation φ_1: X_1 → Y of linear spaces.

EXAMPLE 5.6. In §4C we have seen that if X = X_1 ⊕ X_2, then the direct summands X_i can be regarded as independent components of X which are put together to form X by a direct sum formation. Following this idea, we consider a direct sum decomposition X = X_1 ⊕ X_2 of the domain of a linear transformation φ: X → Y. By 5.5 above, we get restrictions φ_1: X_1 → Y and φ_2: X_2 → Y of φ. On the other hand for each vector x of X we can write x = x_1 + x_2 with unique x_i ∈ X_i. Now

φx = φ(x_1 + x_2) = φx_1 + φx_2 = φ_1 x_1 + φ_2 x_2.

In other words φ can be recovered from the restrictions φ_i. Conversely if ψ_i: X_i → Y are linear transformations, then we define a mapping ψ: X → Y by

ψx = ψ(x_1 + x_2) = ψ_1 x_1 + ψ_2 x_2.

If y = y_1 + y_2 with y_i ∈ X_i, then for arbitrary scalars λ and μ, λx + μy = (λx_1 + μy_1) + (λx_2 + μy_2) is the unique representation of λx + μy as a sum of vectors of the direct summands. Consequently

ψ(λx + μy) = ψ_1(λx_1 + μy_1) + ψ_2(λx_2 + μy_2)
= λψ_1 x_1 + μψ_1 y_1 + λψ_2 x_2 + μψ_2 y_2
= λ(ψ_1 x_1 + ψ_2 x_2) + μ(ψ_1 y_1 + ψ_2 y_2)
= λψx + μψy.

Therefore ψ is a linear transformation. We have: If X = X_1 ⊕ X_2, then there is a one-to-one correspondence between linear transformations φ: X → Y and pairs of linear transformations φ_i: X_i → Y (i = 1, 2).

EXAMPLE 5.7. Following through the idea of Example 5.6, we can further write X as a direct sum of 1-dimensional subspaces. Analogously we see that φ can be recovered from its restrictions to these 1-dimensional direct summands of X. This prompts us to consider linear transformations of 1-dimensional linear spaces. Let L be a


1-dimensional linear space and π: L → Y a linear transformation. For a fixed non-zero vector z of L, let y = πz be its image under π. Since L has dimension 1, (z) is a base of L and every t ∈ L can be written as t = λz with a unique scalar λ. Now

πt = π(λz) = λ(πz) = λy.

Thus π is completely determined by linearity and its value at a non-zero vector.

Conversely let u be any non-zero vector of L and v any vector of Y. Then we obtain a linear transformation τ: L → Y through an extension by linearity of associating v with u: for t = μu of L, we define τt = μv. Linearity of τ is easily verified.

EXAMPLES 5.8. (a) Let X be an m-dimensional linear space over A with base (x_1, ..., x_m) and Y any linear space over the same A. Then each family (y_1, ..., y_m) of m vectors of Y determines uniquely a linear transformation φ: X → Y such that φ(x_i) = y_i for i = 1, ..., m. Indeed, for each vector x ∈ X, we have a unique representation x = λ_1 x_1 + ... + λ_m x_m. In putting φ(x) = λ_1 y_1 + ... + λ_m y_m, we obtain a linear transformation φ: X → Y with the required property that φ(x_i) = y_i for every i = 1, ..., m. If φ′: X → Y is another such linear transformation, then

φ′(x) = λ_1 φ′(x_1) + ... + λ_m φ′(x_m) = λ_1 y_1 + ... + λ_m y_m = φ(x).

Therefore φ′ = φ and φ is unique.

(b) In particular, if both X and Y are finite-dimensional linear spaces, with bases (x_1, ..., x_m) and (y_1, ..., y_n) respectively, then each family (α_{ij}), i = 1, ..., m; j = 1, ..., n, of m × n scalars of A determines uniquely a linear transformation φ: X → Y such that

φ(x_i) = α_{i1} y_1 + ... + α_{in} y_n for i = 1, ..., m.

Furthermore, every linear transformation of the linear space X into the linear space Y can be determined in this way.

The family (α_{ij}), i = 1, ..., m; j = 1, ..., n, of scalars is usually written as


( α_{11} ......... α_{1n} )
( α_{21} ......... α_{2n} )
( ................... )
( α_{m1} ......... α_{mn} )

and referred to as the matrix of φ relative to the bases (x_1, ..., x_m) and (y_1, ..., y_n). (See §14A.)
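In coordinates, 5.8(b) says that φ is completely captured by this matrix: row i holds the coordinates of φ(x_i), so the image of x = λ_1 x_1 + ... + λ_m x_m has coordinate vector λA. A hedged sketch (assuming numpy; the entries are illustrative):

    import numpy as np

    # Matrix of phi relative to bases (x1, ..., xm) and (y1, ..., yn):
    # row i holds the coordinates of phi(x_i), as in 5.8(b).
    A = np.array([[1.0, 2.0],
                  [0.0, 1.0],
                  [3.0, 0.0]])        # here m = 3 and n = 2

    def phi(lam):
        # Coordinates of phi(x) for x = lam_1 x_1 + ... + lam_m x_m.
        return lam @ A

    lam = np.array([1.0, 1.0, 1.0])   # x = x1 + x2 + x3
    assert np.allclose(phi(lam), [4.0, 3.0])  # sum of the three rows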

(c) Using the method outlined in Example 1.13, we can construct for any two sets S and T the free linear spaces F_S and F_T generated over A by the sets S and T respectively. If φ: S → T is a mapping of sets, then we can extend φ by linearity to a linear transformation Φ: F_S → F_T by defining

Φ(s) = φs for every s ∈ S, and

Φ(f) = Σ_{s∈S} f(s)·φs for every f = Σ_{s∈S} f(s)·s of F_S.

EXAMPLE 5.9. Let R[T] be the linear space of all polynomials in an indeterminate T with real coefficients. For each polynomial f of R[T], we define φ(f) = df/dT; then φ: R[T] → R[T] is a linear transformation.

transformation.EXAMPLE 5.10. Let F be the linear space of all continuous functionsdefined on the closed interval [a, b I, For each f of F, we define

f (x) = fXf(t)dt for all xE[a, b];

then 0: F--' F such that 0(f) = T is a linear transformation.
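Both 5.9 and 5.10 act linearly; for polynomials this is easy to exhibit on coefficient vectors. A sketch (assuming numpy's polynomial utilities) checking that differentiation respects an arbitrary linear combination:

    import numpy as np
    from numpy.polynomial import polynomial as P

    f = np.array([1.0, 2.0, 3.0])   # 1 + 2T + 3T^2 (coefficients, lowest degree first)
    g = np.array([0.0, 0.0, 1.0])   # T^2

    # d/dT is linear: (2f - 5g)' = 2 f' - 5 g'.
    lhs = P.polyder(2.0 * f - 5.0 * g)
    rhs = 2.0 * P.polyder(f) - 5.0 * P.polyder(g)
    assert np.allclose(lhs, rhs)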

These few examples provide some of the most frequently used methods of constructing linear transformations; they also illustrate the usefulness of the concept of linear transformation.

It follows immediately from the linearity that, for every linear transformation φ: X → Y, (a) φ(0) = 0 and therefore (b) if x_1, ..., x_p are linearly dependent vectors of X then their images φ(x_1), ..., φ(x_p) are linearly dependent vectors of Y.

We observe that the converse of (b) does not hold generally. In the following theorem, we shall see that the validity of the converse of (b) characterizes injective linear transformations.


THEOREM 5.11. Let φ: X → Y be a linear transformation. Then the following statements are equivalent.

(i) φ is injective.

(ii) If x ∈ X is such that φ(x) = 0, then x = 0.

(iii) If x_1, ..., x_p are linearly independent vectors of X, then φ(x_1), ..., φ(x_p) are linearly independent vectors of Y.

PROOF. Clearly (ii) follows from (i).

(i) follows from (ii). If φ(x) = φ(x′), then φ(x − x′) = 0, by the linearity of φ. Therefore, by (ii), x − x′ = 0, proving (i).

(iii) follows from (ii). Let x_1, ..., x_p be linearly independent vectors of X and λ_1 φ(x_1) + ... + λ_p φ(x_p) = 0 for some scalars λ_1, ..., λ_p. Then φ(λ_1 x_1 + ... + λ_p x_p) = 0 and therefore λ_1 x_1 + ... + λ_p x_p = 0, by (ii). Since the vectors x_1, ..., x_p are linearly independent, we get λ_1 = ... = λ_p = 0. Therefore the vectors φ(x_1), ..., φ(x_p) are linearly independent.

(ii) follows from (iii). Let x ∈ X be such that φ(x) = 0. This means that the vector φ(x) of Y is linearly dependent; therefore by (iii) the vector x of X is also linearly dependent. Hence x = 0.

B. Composition

We recall that for any two sets X and Y, the mappings of the set X into the set Y form a set Map(X, Y) and if Z is an arbitrary set, then for any φ ∈ Map(X, Y) and any ψ ∈ Map(Y, Z) a unique composite ψ∘φ ∈ Map(X, Z) is defined by

ψ∘φ(x) = ψ(φ(x)) for every x ∈ X.

It follows immediately from the definition that all linear transformations of a linear space X over A into a linear space Y over the same A constitute a non-empty set. Denote this set by Hom_A(X, Y) or simply Hom(X, Y) if no danger of confusion about A is possible. Moreover, if φ ∈ Hom(X, Y) and ψ ∈ Hom(Y, Z), then the composite ψ∘φ ∈ Hom(X, Z). Indeed, for any x, y ∈ X and scalars λ, μ of A, we get

ψ∘φ(λx + μy) = ψ(φ(λx + μy)) = ψ(λφ(x) + μφ(y))
= λψ(φ(x)) + μψ(φ(y)) = λ ψ∘φ(x) + μ ψ∘φ(y).


The composition of linear transformations is associative, that is, if φ: X → Y, ψ: Y → Z and ξ: Z → T are linear transformations, then

ξ∘(ψ∘φ) = (ξ∘ψ)∘φ.

On the other hand, the composition is not commutative. By this, we mean that in general ψ∘φ ≠ φ∘ψ even if the composites ψ∘φ and φ∘ψ are both defined. Take, for instance, the linear transformations φ and ψ of a 2-dimensional linear space X into X defined by

φ(x_1) = 0, φ(x_2) = x_2;
ψ(x_1) = x_2, ψ(x_2) = x_1;

where x_1, x_2 form a base of X. Then ψ∘φ(x_1) = 0, whereas φ∘ψ(x_1) = x_2, and therefore ψ∘φ ≠ φ∘ψ.

The identity mapping i_X: X → X is a neutral element with respect to the composition, in the sense that for any pair of linear transformations φ: X → Y and ψ: Y → X,

φ∘i_X = φ and i_X∘ψ = ψ.

Relative to composition, injective and surjective linear transformations can be characterized by the properties formulated in the following theorems.

THEOREM 5.12. Let φ: Y → Z be a linear transformation. Then the following statements are equivalent.

(a) φ is injective.

(b) For any linear space X and any linear transformations α, β of X into Y, if φ∘α = φ∘β, then α = β.

PROOF. (b) follows from (a). Let α, β: X → Y be linear transformations such that φ∘α = φ∘β. Then for every x ∈ X, we get φ(α(x)) = φ(β(x)). Since φ is injective, α(x) = β(x) for every x ∈ X. Therefore α = β.

(a) follows from (b). Let y be a vector of Y such that φ(y) = 0. If y ≠ 0, then, by the supplementation theorem 3.2 (or 2.9 in the case where Y is finite-dimensional), we have a base {y} ∪ S of Y. By 5.8(a), a linear transformation α: Y → Y is uniquely defined by

α(y) = y and α(s) = 0 for all s ∈ S.


Then φ∘α = φ∘0, where 0: Y → Y is the zero linear transformation. By (b), we get α = 0, contradicting the definition of α. Therefore y = 0 and φ is injective, by 5.11.

THEOREM 5.13. Let φ: X → Y be a linear transformation. Then the following statements are equivalent.

(a) φ is surjective.

(b) For any linear space Z and any pair of linear transformations α, β of Y into Z, if α∘φ = β∘φ, then α = β.

PROOF. (b) follows from (a). Let α, β: Y → Z be linear transformations such that α∘φ = β∘φ. Since φ is surjective, for every vector y of Y, we get a vector x of X such that φ(x) = y. Therefore α(y) = α(φ(x)) = α∘φ(x) = β∘φ(x) = β(φ(x)) = β(y), proving α = β.

(a) follows from (b). Since φ is a linear transformation, we see that Im φ is a subspace of Y. Suppose Im φ ≠ Y; then by 4.8 there is a subspace Y′ of Y such that Y′ ≠ 0 and Y = Y′ ⊕ Im φ. Now a linear transformation α: Y → Y is uniquely defined by

α(y′ + y″) = y′ for all y′ ∈ Y′ and y″ ∈ Im φ.

Then α∘φ = 0∘φ, where 0: Y → Y is the zero linear transformation. By (b), we get α = 0, which is impossible, and therefore Im φ = Y. Hence φ is surjective.

Theorems 5.12 and 5.13 state that injective (surjective) linear transformations are left (right) cancellable.

In fact the conditions (b) of these theorems can be taken as a definition of injective (surjective) linear transformations in terms of composition of linear transformations alone.

C. Isomorphism

Bijective linear transformations are called isomorphisms. If φ: X → Y is an isomorphism, then the inverse mapping φ⁻¹ of the set Y into the set X defined by

φ⁻¹(y) = x, where x ∈ X is such that φ(x) = y,

is also an isomorphism, called the inverse isomorphism of φ. Clearly φ⁻¹ is bijective; therefore it remains to be shown that φ⁻¹ is a linear


transformation. For y and y′ of Y, let x and x′ be the unique vectors of X such that φ(x) = y and φ(x′) = y′. Then for any scalars λ and λ′ we get φ(λx + λ′x′) = λy + λ′y′. Therefore φ⁻¹(λy + λ′y′) = λx + λ′x′ = λφ⁻¹(y) + λ′φ⁻¹(y′), and hence φ⁻¹ is linear.

The formation of inverse isomorphisms has the following properties.

(a) φ⁻¹∘φ = i_X and φ∘φ⁻¹ = i_Y.

(b) If φ: X → Y and ψ: Y → Z are isomorphisms, then ψ∘φ: X → Z is an isomorphism and (ψ∘φ)⁻¹ = φ⁻¹∘ψ⁻¹.

Two linear spaces X and Y are said to be isomorphic linear spaces if and only if there exists an isomorphism between them. In this case, we write X ≅ Y. From the abstract point of view, two isomorphic linear spaces X and Y are essentially indistinguishable, because we are not interested in the nature of the vectors of these spaces but only in their composition laws, and these are essentially the same.

Through the notion of isomorphism, the notion of dimension of linear spaces gains its prominence. This state of affairs is formulated in the following important theorem.

THEOREM 5.14. Two linear spaces X and Y are isomorphic if and only if dim X = dim Y.

PROOF. Let us consider the case where X and Y are finite-dimensional. Assume that dim X = dim Y = n. For a fixed base x_1, ..., x_n of X and a fixed base y_1, ..., y_n of Y, we define a linear transformation φ: X → Y such that φ(x_i) = y_i for all i = 1, ..., n. Then φ is evidently an isomorphism.

Conversely, let φ: X → Y be an isomorphism and (x_1, ..., x_n) a base of X. Then, by 5.11, φ(x_1), ..., φ(x_n) are linearly independent vectors of Y. Therefore, by the supplementation theorem 2.9, we can find vectors y_1, ..., y_m (m ≥ 0) such that

(φ(x_1), ..., φ(x_n), y_1, ..., y_m)

is a base of Y. Applying the inverse isomorphism φ⁻¹ of φ to this base, we get n + m linearly independent vectors x_1, ..., x_n, φ⁻¹(y_1), ..., φ⁻¹(y_m) of X. Therefore m = 0, and hence dim X = dim Y.


Thus, the above theorem holds in the finite-dimensional case. The proof for the general case, being a straightforward generalization of the above one, is left to the reader.

Consequently, every n-dimensional linear space over A is isomorphic to the n-dimensional arithmetical linear space Aⁿ. In this sense, Aⁿ can be regarded as the prototype of n-dimensional linear spaces over A. However this does not imply that from now on, when dealing with finite-dimensional linear spaces, we shall restrict ourselves to the study of arithmetical linear spaces. The main objection to doing this is that each isomorphism Φ of an n-dimensional linear space X onto Aⁿ defines a unique coordinate system in X. Therefore if we study Aⁿ instead of X, we are committing ourselves to a particular coordinate system of X. This would mean that with each definition and each theorem it is necessary to show the independence of the choice of the coordinate system in which the definition or the theorem is formulated. Such a proof is usually very tedious and uninteresting.

We observe that X ≅ Y means by definition that there exists one isomorphism between the linear spaces in question. In some specific cases, we are concerned with the existence of isomorphisms that express certain relations between the linear spaces X and Y (see §8C and §22A).

D. Kernel and image

Let φ: X → Y be a linear transformation. Then, in a most natural way, an equivalence relation ∼ is defined in the non-empty set X by

x_1 ∼ x_2 if and only if φ(x_1) = φ(x_2),

for every pair of vectors x_1 and x_2 of X. Consequently, the set X is partitioned into mutually disjoint equivalence classes. The class that contains the zero vector of X deserves our special attention. This is called the kernel of the linear transformation φ and is defined by

Ker φ = {z ∈ X: φ(z) = 0}.

Firstly, we shall see that the kernel of φ is a subspace of the linear space X. Since 0 ∈ Ker φ and, for any z_1 and z_2 of Ker φ and any scalars λ_1 and λ_2, we have φ(λ_1 z_1 + λ_2 z_2) = λ_1 φ(z_1) + λ_2 φ(z_2) = 0, Ker φ is a subspace of X.


Moreover, due to the algebraic structure of X and the linearity of φ, the behaviour of φ in relation to vectors of X is completely determined by Ker φ. More precisely: for any vectors x_1 and x_2 of X, φ(x_1) = φ(x_2) if and only if x_1 − x_2 belongs to Ker φ. In particular, a linear transformation φ is injective if and only if Ker φ = 0, and φ = 0 if and only if Ker φ = X.

Let us now consider some subspaces of the range Y of a linear transformation φ: X → Y. For every subspace X' of X, the direct image φ[X'] = {y∈Y: φ(x) = y for some x∈X'} of X' under φ is a subspace of the linear space Y. Indeed, for any two vectors y1 and y2 of φ[X'], we have vectors x1 and x2 of X' such that φ(x1) = y1 and φ(x2) = y2; therefore for any scalars λ1 and λ2, we get φ(λ1x1 + λ2x2) = λ1φ(x1) + λ2φ(x2) = λ1y1 + λ2y2. Since φ[X'] is not empty, φ[X'] is a subspace of Y. In particular, the image Im φ = φ[X] of φ is a subspace of Y.

From the discussions above, we see that the subspaces Ker φ and Im φ show to what extent the linear transformation φ in question differs from an isomorphism. NOETHER's isomorphism theorem 5.17 in §5E will give full information on this matter.

A useful relation between the dimensions of Ker φ and Im φ is given in the theorem below.

THEOREM 5.15. Let X be a finite-dimensional linear space and φ: X → Y a linear transformation. Then

dim X = dim(Ker φ) + dim(Im φ).

PROOF. Let (x1, ..., xp) be a base of the subspace Ker φ. By the supplementation theorem, we can extend this base of Ker φ to a base (x1, ..., xp, xp+1, ..., xn) of the entire linear space X. The theorem is proved if we can show that (φxp+1, ..., φxn) forms a base of Im φ. Every vector of Im φ is of the form φx with x∈X. Therefore if x = λ1x1 + ... + λpxp + λp+1xp+1 + ... + λnxn, then φx = λp+1φxp+1 + ... + λnφxn; hence (φxp+1, ..., φxn) generates Im φ. If μp+1φxp+1 + ... + μnφxn = 0 for some scalars μi (i = p+1, ..., n), then by linearity φ(μp+1xp+1 + ... + μnxn) = 0. Therefore the vector μp+1xp+1 + ... + μnxn belongs to Ker φ. Our choice of the base (x1, ..., xp, xp+1, ..., xn) shows that μp+1 = ... = μn = 0. Therefore (φxp+1, ..., φxn) is linearly independent and the theorem is proved.
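Concretely, identifying X = R³ and Y = R² by fixed bases, Ker φ and Im φ can be read off from the matrix of φ. The following minimal sketch (the matrix is an ad hoc example, assuming NumPy) verifies the dimension formula:

```python
import numpy as np

# phi: R^3 -> R^2 in the standard bases; the matrix is an ad hoc example.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

# Im(phi) is spanned by the columns of A, so dim(Im phi) is the rank.
rank = np.linalg.matrix_rank(A)

# Ker(phi): the rows of Vt beyond the numerical rank span the null space.
_, s, Vt = np.linalg.svd(A)
r = int((s > 1e-12).sum())
kernel = Vt[r:]

print(rank)                     # 2
print(kernel)                   # one base vector, proportional to (1, -1, 1)
print(kernel.shape[0] + rank)   # 3 = dim X, as Theorem 5.15 asserts
```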

When the domain X of a linear transformation φ: X → Y is a finite-dimensional linear space, a numerical rank of φ, denoted by r(φ), is defined as

r(φ) = dim(Im φ).

Since Im φ is a subspace of Y, we get

r(φ) ≤ dim Y.

On the other hand, we get from the linearity of φ

r(φ) ≤ dim X.

E. Factorization

Let X, Y and Z be three linear spaces over the same A. Then for any linear transformations φ: X → Y and ψ: Y → Z, we get ξ = ψ∘φ: X → Z. In other words, we can fill into the following diagram a unique linear transformation ξ so as to make it commutative:

X --φ--> Y --ψ--> Z,   ξ = ψ∘φ: X → Z.

Now we shall investigate a problem of the following nature. Given linear transformations ξ: X → Z and φ: X → Y, does there exist a linear transformation ψ: Y → Z such that ξ = ψ∘φ? In other words, can we fill into the following diagram some linear transformation ψ so as to make it commutative?

X ---φ---> Y
 \        /
  ξ      ψ
   \    /
    v  v
     Z

In general, there exists no such linear transformation ψ. Take, for instance, ξ ≠ 0 and φ = 0. Therefore we shall only consider a special case of this problem in which a restrictive condition is imposed on φ.

Let us assume that φ: X → Y is surjective. In this case, if a linear transformation ψ: Y → Z exists so that ξ = ψ∘φ, then ψ is unique, by Theorem 5.13. Moreover, we have Ker ξ ⊇ Ker φ. Therefore Ker ξ ⊇ Ker φ is a necessary condition for the solvability of the special problem. Now we shall show that this is sufficient. Indeed, under the assumption that Ker ξ ⊇ Ker φ, we get

if φ(x) = φ(x') for some x, x' of X, then ξ(x) = ξ(x').

For the equation φ(x) = φ(x') implies that x − x' belongs to Ker φ, and hence to Ker ξ; therefore ξ(x) = ξ(x'). On the other hand, since φ is surjective, we get for each y∈Y some x∈X such that φ(x) = y. Therefore a mapping ψ of the set Y into the set Z is uniquely determined by the condition

for every y∈Y, ψ(y) = ξ(x) where x∈X and φ(x) = y.

Obviously ξ = ψ∘φ. Therefore it remains to be shown that ψ: Y → Z is a linear transformation. For any two vectors y1 and y2 of Y, let x1 and x2 be vectors of X such that φ(x1) = y1 and φ(x2) = y2. Then, by the linearity of φ, we get φ(λ1x1 + λ2x2) = λ1y1 + λ2y2 and therefore ψ(λ1y1 + λ2y2) = ξ(λ1x1 + λ2x2) = λ1ξ(x1) + λ2ξ(x2) = λ1ψ(y1) + λ2ψ(y2); hence ψ is a linear transformation. We state our result as part of the following theorem.

THEOREM 5.16. Let φ: X → Y be a surjective linear transformation. Then for any linear transformation ξ: X → Z, a linear transformation ψ: Y → Z exists such that ξ = ψ∘φ if and only if Ker ξ ⊇ Ker φ. In this case, the linear transformation ψ is unique. Furthermore, if ξ is surjective then ψ is surjective; and if Ker ξ = Ker φ, then ψ is injective.

PROOF. The first two statements are proved above. If ξ is surjective, then for every z∈Z, we have some x∈X such that ξ(x) = z. By definition of ψ, ψ(y) = z where y = φ(x). Therefore ψ is surjective. If Ker ξ = Ker φ, then for any x, x' of X, φ(x) = φ(x') is equivalent to ξ(x) = ξ(x'). Therefore ψ(y) = ψ(y') implies y = y', and hence ψ is injective.
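In coordinate form the construction of ψ can be sketched numerically: when φ is represented by a matrix of full row rank (surjectivity), its pseudo-inverse is a right inverse, and ψ = ξ∘φ⁺ is a candidate for the factorization whenever Ker ξ ⊇ Ker φ. The matrices below are our own ad hoc illustrations (assuming NumPy), not taken from the text.

```python
import numpy as np

# phi: R^3 -> R^2, surjective (full row rank); xi: R^3 -> R^2 with
# Ker(xi) containing Ker(phi) = span{e3}.  Both matrices are ad hoc.
Phi = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])
Xi  = np.array([[2.0, 1.0, 0.0],
                [0.0, 3.0, 0.0]])

# pinv(Phi) is a right inverse of Phi because Phi is surjective, so
# Psi = Xi o pinv(Phi) makes the triangle commute.
Psi = Xi @ np.linalg.pinv(Phi)

print(np.allclose(Psi @ Phi, Xi))   # True: xi = psi o phi
```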

As a corollary of Theorem 5.16, we have NOETHER'S isomorphismtheorem that expresses an important relation between the image andthe kernel of a linear transformation.

COROLLARY 5.17. Let φ: X → Y be a linear transformation. Then X/Ker φ ≅ Im φ.

It follows from 5.17 that for a linear transformation φ: X → Y whose domain X and range Y are finite-dimensional linear spaces, we get

r(φ) = dim X − dim(Ker φ).

F. Exercises

1. Determine which of the mappings below are linear transformations.

(a) φ: R → R² such that φ(t) = (eᵗ, t).

(b) ψ: R → R² such that ψ(t) = (t, 2t).
(c) ξ: R² → R² such that ξ(x, y) = (2x, 3y).
(d) ζ: R² → R² such that ζ(x, y) = (xy, y).
(e) η: R² → R² such that η(x, y) = (eˣ cos y, eˣ sin y).

2. Let ψ: Y → Z and φ: X → Y be linear transformations. Prove that

(a) if ψ∘φ is surjective, then ψ is surjective,

(b) if ψ∘φ is injective, then φ is injective,

(c) if ψ∘φ is surjective and ψ is injective, then φ is surjective,
(d) if ψ∘φ is injective and φ is surjective, then ψ is injective.

3. Let ξ: X → Z be a linear transformation and ψ: Y → Z an injective linear transformation. Prove that a linear transformation φ: X → Y exists so that ξ = ψ∘φ if and only if Im ψ ⊇ Im ξ.

4. Let α, β, γ, δ be fixed complex numbers.

(a) Prove that φ: C² → C² such that φ(x, y) = (αx + βy, γx + δy) is a linear transformation.

(b) Prove that φ is bijective if and only if αδ − βγ ≠ 0.

5. Let X = C[a, b] be the space of continuous real-valued functions defined on [a, b], and let φ: X → X be defined by

(φf)(t) = ∫ₐᵗ f(x) dx.

(i) Show that φ is a linear transformation.
(ii) Is φ injective?

6. Let X1 be a proper subspace of a linear space X. Let Y be a linear space and y∈Y. Suppose x0∈X∖X1. Prove that there exists a linear transformation φ: X → Y such that φ(x0) = y and φ(x) = 0 for all x∈X1.

7. Let φ be an endomorphism of a linear space X, i.e. a linear transformation of X into itself. Suppose that for every endomorphism ψ of X, if ψ∘φ = 0 then ψ = 0. Prove that φ is surjective.

8. Let X be an n-dimensional linear space over A. Prove that if f: X → A is a linear transformation different from the constant mapping 0, then dim(Ker f) = n − 1.

9. Let X and Y be two n-dimensional linear spaces and φ: X → Y a linear transformation. Show that if φ is injective or surjective, then φ is an isomorphism.

10. Show that if X = X1 ⊕ X2, then X2 ≅ X/X1.

11. Let X1 and X2 be subspaces of a linear space X.

(a) Show that (X1 + X2)/X2 ≅ X1/(X1 ∩ X2).

(b) Show that if X1 ⊇ X2, then X/X1 ≅ (X/X2)/(X1/X2).

12. Let X be a linear space and X1, X2 and X3 subspaces of X such that X3 ⊆ X1. Show that

(X1 + X2)/(X3 + X2) ≅ X1/(X3 + (X1 ∩ X2)).

13. Let ψ: Y → Z and φ: X → Y be linear transformations where X, Y and Z are all finite-dimensional linear spaces. Prove that

(a) r(ψ∘φ) ≤ r(φ) and r(ψ∘φ) ≤ r(ψ),
(b) if ψ is injective, then r(ψ∘φ) = r(φ),

(c) if φ is surjective, then r(ψ∘φ) = r(ψ).

14. Let φ: A → B be a linear transformation and X, X' two arbitrary subspaces of A. Prove that

(a) φ[X] ⊆ φ[X'] if X ⊆ X',
(b) φ[X + X'] = φ[X] + φ[X'],
(c) φ[X ∩ X'] ⊆ φ[X] ∩ φ[X'].

Find subspaces X and X' of A so that φ[X ∩ X'] ≠ φ[X] ∩ φ[X'].

15. Let φ: A → B be a linear transformation. For any subset Y of B, define

φ⁻¹[Y] = {x∈A: φ(x)∈Y}.

Prove that for all subspaces Y, Y' of B and every subspace X of A:

φ⁻¹[Y] is a subspace of A,
dim(φ⁻¹[Y]) = dim(Y ∩ Im φ) + dim(Ker φ),
φ⁻¹[Y] ⊆ φ⁻¹[Y'] if Y ⊆ Y',
φ⁻¹[Y + Y'] ⊇ φ⁻¹[Y] + φ⁻¹[Y'],
φ⁻¹[Y ∩ Y'] = φ⁻¹[Y] ∩ φ⁻¹[Y'],
Y ⊇ φ[φ⁻¹[Y]],
X ⊆ φ⁻¹[φ[X]].

16. A sequence of linear spaces and linear transformations

... → Xn+1 --φn+1--> Xn --φn--> Xn−1 → ...

is said to be exact if and only if for each integer n

Im φn+1 = Ker φn.

(a) Show that a linear transformation φ: X → Y is injective if and only if

0 → X --φ--> Y

is exact.

(b) Show that a linear transformation φ: X → Y is surjective if and only if

X --φ--> Y → 0

is exact.

(c) Show that for each subspace X' of a linear space X, the sequence

0 → X' --ι--> X --π--> X/X' → 0,

where ι is the inclusion mapping and π the natural surjection, is exact.

(d) Show that if a sequence

0 → X' → X → X″ → 0

is exact, then X' can be identified with a subspace of X and X″ with X/X'.

17. The cokernel of a linear transformation φ: X → Y is defined as

Coker φ = Y/Im φ.

(a) Show that if the diagram of linear spaces and linear transformations

X' ------> X
 |φ'       |φ
 v         v
Y' ------> Y

is commutative, then the horizontal linear transformations define linear transformations Ker φ' → Ker φ, Im φ' → Im φ and Coker φ' → Coker φ.

(b) Let

X' ------> X ------> X″
 |φ'       |φ        |φ″
 v         v         v
Y' ------> Y ------> Y″

be a commutative diagram of linear spaces and linear transformations with exact rows.

(i) Show that if the linear transformation Y' → Y in question is injective, then the sequence

Ker φ' → Ker φ → Ker φ″

is exact.

(ii) Show that if the linear transformation X → X″ is surjective, then the sequence

Coker φ' → Coker φ → Coker φ″

is exact.

18. Let A, B be real linear spaces and ℒ(A), ℒ(B) the sets of all subspaces of A and of B respectively. A mapping Φ: ℒ(A) → ℒ(B) is called a projectivity of A onto B if and only if (i) Φ is bijective and (ii) for any X, X'∈ℒ(A), X ⊇ X' if and only if Φ(X) ⊇ Φ(X'). Show that for any projectivity Φ of A onto B and any subspaces X, X' of A

(a) Φ(X + X') = Φ(X) + Φ(X'),
(b) Φ(X ∩ X') = Φ(X) ∩ Φ(X'),
(c) Φ(0) = 0, Φ(A) = B,
(d) dim X = dim(Φ(X)).

§6. The Linear Space Hom(X, Y)

A. The algebraic structure of Hom(X, Y)

In §5B, we saw that for any two linear spaces X and Y over the same A the set Hom(X, Y) of all linear transformations of X into Y is non-empty. We shall introduce appropriate composition laws into this set so as to make it a linear space over the same A in question.

Let φ, ψ ∈ Hom(X, Y). We define a mapping φ + ψ of the set X into the set Y by the equation

(1) (φ + ψ)(x) = φ(x) + ψ(x) for every x∈X.

We take note that the + sign on the left refers to the addition to be defined and the + sign on the right comes from the given addition of the linear space Y. Thus addition of mappings X → Y is defined in terms of addition of vectors of the range space Y alone. Since

(φ + ψ)(λx + μy) = φ(λx + μy) + ψ(λx + μy)
                 = λφ(x) + μφ(y) + λψ(x) + μψ(y)
                 = λ(φ(x) + ψ(x)) + μ(φ(y) + ψ(y))
                 = λ(φ + ψ)(x) + μ(φ + ψ)(y),

the mapping φ + ψ is a linear transformation of X into Y.

We shall show that the ordered pair (Hom(X, Y), +) is an additive group. For the associative law and the commutative law, we have to verify that for any three linear transformations φ, ψ and ξ of Hom(X, Y), the equations

(φ + ψ) + ξ = φ + (ψ + ξ)   and   φ + ψ = ψ + φ

hold. On both sides of these equations, we have mappings with identical domain and identical range, and moreover for any x∈X, we get

((φ + ψ) + ξ)(x) = (φ + ψ)(x) + ξ(x) = φ(x) + ψ(x) + ξ(x)
                 = φ(x) + (ψ + ξ)(x) = (φ + (ψ + ξ))(x)

and (φ + ψ)(x) = φ(x) + ψ(x) = ψ(x) + φ(x) = (ψ + φ)(x);

therefore the above equations of mappings are proved.

The zero mapping defined in 5.3 is easily seen to be the neutral element of the present addition. For φ ∈ Hom(X, Y), the mapping −φ of X into Y defined by

(−φ)(x) = −φ(x) for every x∈X,

is clearly a linear transformation and as such it is the additive inverse of φ with respect to the present addition. Therefore (Hom(X, Y), +) is an additive group.

For every φ ∈ Hom(X, Y) and every scalar λ of A, we define a mapping λφ of the set X into the set Y by the equation

(2) (λφ)(x) = λ(φ(x)) for every x∈X.

Clearly λφ ∈ Hom(X, Y), for (λφ)(μx + νy) = (λμ)φ(x) + (λν)φ(y) = μ((λφ)(x)) + ν((λφ)(y)). Similarly, we verify that the additive group Hom(X, Y) together with the scalar multiplication above forms a linear space over A.

We remark here that the algebraic structure on Hom(X, Y) is given by that on Y alone.

THEOREM 6.1. Let X and Y be linear spaces over A. Then the setHom(X, Y) of all linear transformations of X into Y is a linear spaceover A with respect to the addition and the scalar multiplicationdefined in (1) and (2).
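If X and Y are identified with arithmetical spaces, linear transformations become matrices, and the composition laws (1) and (2) become entrywise operations on matrices. A minimal numerical check (random matrices, assuming NumPy; the sizes are our own choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# phi, psi in Hom(R^3, R^2), represented by 2x3 matrices.
phi, psi = rng.normal(size=(2, 3)), rng.normal(size=(2, 3))
x, lam = rng.normal(size=3), 2.5

# (1): (phi + psi)(x) = phi(x) + psi(x) -- addition is taken in Y alone.
print(np.allclose((phi + psi) @ x, phi @ x + psi @ x))   # True

# (2): (lam * phi)(x) = lam * phi(x).
print(np.allclose((lam * phi) @ x, lam * (phi @ x)))     # True
```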

Relative to addition and scalar multiplication of the linear space Hom(X, Y), composition of linear transformations has the following property. For all φ, ψ ∈ Hom(X, Y):

(a) (φ + ψ)∘α = φ∘α + ψ∘α and (λφ)∘α = φ∘(λα) = λ(φ∘α) for any linear transformation α: X' → X.

(b) β∘(φ + ψ) = β∘φ + β∘ψ and β∘(λφ) = (λβ)∘φ = λ(β∘φ) for any linear transformation β: Y → Y'.

This property is referred to as bilinearity of the composition of linear transformations. Therefore for any linear transformations α: X' → X and β: Y → Y' the mapping Hom(α, β): Hom(X, Y) → Hom(X', Y') defined by

Hom(α, β)(φ) = β∘φ∘α for every φ ∈ Hom(X, Y)

is a linear transformation. Thus

Hom(α, β)(λφ + μψ) = λ Hom(α, β)(φ) + μ Hom(α, β)(ψ).

Finally let us consider the dimension of the linear space Hom(X, Y) for the case where X and Y are finite-dimensional. We shall use the method of 5.8 to construct a base of Hom(X, Y) and show that dim Hom(X, Y) = dim X · dim Y. Let (x1, ..., xm) be a base of X and (y1, ..., yn) be a base of Y. By 5.8, we need only specify the images of the xi and extend by linearity to obtain a linear transformation. Using y1 and the zero vector 0 of Y as images we get a linear transformation φ11: X → Y which sends x1 to y1 and all other xi to zero. Similarly a linear transformation φ21: X → Y is determined by specifying φ21(x2) = y1 and φ21(xi) = 0 for i ≠ 2. In this way m linear transformations φj1: X → Y (j = 1, ..., m) are obtained such that

φj1(xi) = y1 if i = j, and φj1(xi) = 0 if i ≠ j (i = 1, ..., m).

Using y2 instead of y1, we get another m linear transformations φ12, ..., φm2 such that

φj2(xi) = y2 if i = j, and φj2(xi) = 0 if i ≠ j (i = 1, ..., m).

Carrying out similar constructions for all yk (k = 1, ..., n) we obtain mn linear transformations φjk (j = 1, ..., m; k = 1, ..., n) such that

φjk(xi) = yk if i = j, and φjk(xi) = 0 if i ≠ j (i = 1, ..., m).

At this juncture, we introduce a notation that will help us simplify the complicated formulations above before we proceed to prove that the linear transformations φjk form a base of Hom(X, Y). For any pair of positive integers (or more generally, of elements of an index set I) i and j, the Kronecker symbol δij is defined by

δij = 1 if i = j,  δij = 0 if i ≠ j.

Using these symbols we can write

φjk(xi) = δij yk (i, j = 1, ..., m; k = 1, ..., n).

If φ: X → Y is a linear transformation, then

φ(xi) = ai1 y1 + ... + ain yn (i = 1, ..., m)

for some scalars aik (i = 1, ..., m; k = 1, ..., n). Consider the linear combination ψ = Σj,k ajk φjk. Then

ψ(xi) = (Σj,k ajk φjk)(xi) = Σj,k ajk φjk(xi) = Σj,k ajk δij yk.

For the last summation, if we sum over the index j first, then for every k = 1, ..., n, we have a partial sum

Σj ajk δij yk.

But of the m summands only the one corresponding to j = i can be different from zero since δij = 0 for i ≠ j; therefore

Σj ajk δij yk = aik yk.

Hence

ψ(xi) = ai1 y1 + ... + ain yn = φ(xi)

and ψ = φ by 5.8(a). Therefore the linear transformations φjk generate Hom(X, Y). Moreover if

Σj,k λjk φjk = 0

for some scalars λjk (j = 1, ..., m; k = 1, ..., n), then for each i = 1, ..., m, we get

0 = Σj,k λjk φjk(xi) = λi1 y1 + ... + λin yn.

Since the vectors y1, ..., yn are linearly independent, we get λi1 = ... = λin = 0 for i = 1, ..., m. Therefore the linear transformations φjk (j = 1, ..., m; k = 1, ..., n) form a base of Hom(X, Y), proving dim Hom(X, Y) = dim X · dim Y.

THEOREM 6.2. If X and Y are finite-dimensional linear spaces over A, then dim Hom(X, Y) = dim X · dim Y.
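Relative to the chosen bases, the φjk correspond to the matrices having a single entry 1 and all other entries 0. A sketch for dim X = 3 and dim Y = 2 (the dimensions and the sample matrix are our own choices, assuming NumPy):

```python
import numpy as np

m, n = 3, 2          # dim X = 3, dim Y = 2; an arbitrary small example

# phi_jk sends x_j to y_k and every other base vector to 0; relative to
# the two bases it is the n x m matrix with a single 1 in row k, column j.
basis = {}
for j in range(m):
    for k in range(n):
        E = np.zeros((n, m))
        E[k, j] = 1.0
        basis[(j, k)] = E

# Any phi in Hom(X, Y) is the combination sum_{j,k} a_jk * phi_jk,
# where the a_jk are exactly the entries of its matrix.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
recombined = sum(A[k, j] * basis[(j, k)] for j in range(m) for k in range(n))

print(np.allclose(recombined, A))   # True
print(len(basis))                   # 6 = dim X * dim Y, as in Theorem 6.2
```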

B. The associative algebra End(X)

Linear transformations of a linear space X into itself are also called endomorphisms of X. Consequently the linear space Hom(X, X) is also denoted by End(X). In this linear space we can define one more internal composition law, called multiplication, as follows:

ψφ = ψ∘φ for every φ, ψ of End(X).

It follows from associativity and linearity of composition that for any ξ, ψ, φ of End(X) and λ, μ of A,

(ξψ)φ = ξ(ψφ),
ξ(ψ + φ) = ξψ + ξφ,
(ξ + ψ)φ = ξφ + ψφ,
(λξ)(μφ) = (λμ)(ξφ).

The linear space End(X) together with multiplication as defined above constitutes an example of what is known as an associative A-algebra.

When the multiplication of endomorphisms is our chief concern, then instead of the associative A-algebra, we consider the algebraic system which consists of the set Aut(X) and its multiplication. In this case, the subset Aut(X) of all automorphisms of X, i.e., isomorphisms of X onto itself, is closed under the multiplication. By this we mean that for any φ, ψ ∈ Aut(X) the product ψ∘φ belongs to Aut(X). Furthermore, we have

[G1] ξ(ψφ) = (ξψ)φ for any three elements ξ, ψ, φ of Aut(X),

[G2] there is an element iX of Aut(X) such that iX∘φ = φ for every φ of Aut(X),

[G3] to each φ of Aut(X) there is an element φ⁻¹ of Aut(X) such that φ⁻¹∘φ = iX.

Using the terminology of abstract algebra, we say that the algebraicsystem consisting of the set Aut(X) and its multiplication is a group.This group is called the group of automorphisms of the linear spaceX over A, and has played an important role in the development of thetheory of groups.

C. Direct sum and direct product

The idea of direct sum formation which was described earlier asfitting together independent components of a linear space has beenrepeatedly used to give useful results. On the other hand linear trans-formations have been shown to be indispensable tools in the study oflinear spaces. Here we shall try to formulate direct sum decom-position of a linear space in terms of linear transformations so as tolink up these two important ideas.

Let X = X1 ⊕ X2 be a direct sum decomposition of a linear space X. Then we have a pair of inclusion mappings ι1: X1 → X and ι2: X2 → X which are linear transformations. On the other hand every vector x of X is uniquely expressible as a sum x = x1 + x2 with x1∈X1 and x2∈X2. Therefore a pair of mappings π1: X → X1 and π2: X → X2 are defined by π1(x) = x1 and π2(x) = x2. It is easily seen that π1 and π2 are linear transformations; they will be called the projections of the direct sum X = X1 ⊕ X2.

From these two pairs of linear transformations, we get composites:

Xj --ιj--> X1 ⊕ X2 --πi--> Xi (i, j = 1, 2)

and

X1 ⊕ X2 --πj--> Xj --ιj--> X1 ⊕ X2 (j = 1, 2).

It is easy to see that

(a) πj∘ιk = δjk iXk, and

(b) ι1∘π1 + ι2∘π2 = iX1⊕X2.

Using these notations, we can reformulate the result of 5.6 as follows: If X = X1 ⊕ X2, then a one-to-one correspondence between Hom(X1 ⊕ X2, Y) and Hom(X1, Y) × Hom(X2, Y) is given by φ ↦ (φ∘ι1, φ∘ι2), and its inverse is given by (ψ1, ψ2) ↦ ψ1∘π1 + ψ2∘π2. Conversely we have the following theorem, whose proof is left to the reader as an exercise.

THEOREM 6.3. Let X1 and X2 be subspaces of a linear space X and let ιj: Xj → X be the inclusion mappings. If there exist linear transformations ρj: X → Xj (j = 1, 2) such that

(a) ρj∘ιk = δjk iXk, and

(b) ι1∘ρ1 + ι2∘ρ2 = iX,

then X = X1 ⊕ X2 and the linear transformations ρ1 and ρ2 are the projections of the direct sum.
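For example, with X = R⁴, X1 the subspace of the first two coordinates and X2 that of the last two, the equations (a) and (b) can be checked directly; the matrices below are our own concrete choices (assuming NumPy):

```python
import numpy as np

# X = R^4 with X1 = span{e1, e2} and X2 = span{e3, e4}; the inclusions
# iota_j and projections pi_j written as matrices.
i1 = np.vstack([np.eye(2), np.zeros((2, 2))])   # X1 -> X, 4x2
i2 = np.vstack([np.zeros((2, 2)), np.eye(2)])   # X2 -> X, 4x2
p1 = i1.T                                        # X -> X1, 2x4
p2 = i2.T                                        # X -> X2, 2x4

# (a) pi_j o iota_k = delta_jk * identity of Xk
print(np.allclose(p1 @ i1, np.eye(2)), np.allclose(p1 @ i2, 0))  # True True

# (b) iota_1 o pi_1 + iota_2 o pi_2 = identity of X
print(np.allclose(i1 @ p1 + i2 @ p2, np.eye(4)))                 # True
```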

Consider two linear spaces X1 and X2 over the same A and the cartesian product X1 × X2 of the sets X1 and X2. In this set we define addition and scalar multiplication by

(x1, x2) + (y1, y2) = (x1 + y1, x2 + y2),   λ(x1, x2) = (λx1, λx2)

for all x1, y1∈X1, x2, y2∈X2 and λ∈A. Clearly X1 × X2 constitutes a linear space over A with respect to these composition laws. We call this linear space the cartesian product of the linear spaces X1 and X2.

The cartesian product X1 × X2 × ... × Xr of a finite number of linear spaces X1, X2, ..., Xr over the same A is similarly defined. In particular, we have Aⁿ = A × ... × A (n times).

More interesting and more useful in the applications is to find a necessary and sufficient condition for an arbitrary linear space X to be isomorphic to the cartesian product X1 × X2 of two linear spaces X1 and X2.

Consider the pair of mappings called the canonical projections of the cartesian product

X1 <--p1-- X1 × X2 --p2--> X2

and the pair of mappings called the canonical injections of the cartesian product

X1 --i1--> X1 × X2 <--i2-- X2

defined by

p1(x1, x2) = x1,  p2(x1, x2) = x2  and  i1(x1) = (x1, 0),  i2(x2) = (0, x2).

Clearly all four mappings are linear transformations and between them the following equations hold:

p1∘i1 = iX1,  p2∘i2 = iX2,  p1∘i2 = 0,  p2∘i1 = 0,

i1∘p1 + i2∘p2 = iX1×X2.

On the other hand, if a linear space X is isomorphic to X1 × X2, then from an isomorphism Φ: X → X1 × X2 we get the following linear transformations:

σj = Φ⁻¹∘ij,  ρj = pj∘Φ for j = 1, 2.

These linear transformations clearly satisfy the following equations:

(c) ρj∘σk = δjk iXk,

(d) σ1∘ρ1 + σ2∘ρ2 = iX.

Conversely if there exist linear transformations

X1 <--ρ1-- X --ρ2--> X2,   X1 --σ1--> X <--σ2-- X2

such that the equations (c) and (d) are satisfied, then the mapping

Φ: X → X1 × X2

defined by

Φ(x) = (ρ1(x), ρ2(x)) for every x∈X

is an isomorphism.

THEOREM 6.4. Let X, X1, X2 be linear spaces. Then X is isomorphic to X1 × X2 if and only if there are linear transformations

Xj --σj--> X --ρj--> Xj (j = 1, 2)

for which the equations

ρj∘σk = δjk iXk,

σ1∘ρ1 + σ2∘ρ2 = iX

hold.

We say that the linear transformations σj, ρj yield a representation of X as a direct product of the linear spaces X1, X2.

From 6.4, it follows that a direct sum is a special case of direct product where the linear transformations σj: Xj → X are the inclusion mappings.

D. Exercises

1. Let φ be an endomorphism of a finite-dimensional linear space X and let Y be a subspace of X. Prove that

dim φ[Y] + dim(Y ∩ Ker φ) = dim Y.

2. Let X = X1 + X2. Define φ: X1 × X2 → X by

φ(x1, x2) = x1 + x2 for all x1∈X1, x2∈X2.

Show that φ is a linear transformation. Find Ker φ. Show also that φ is an isomorphism if and only if X = X1 ⊕ X2.

3. Let α and β be endomorphisms of a finite-dimensional linear space X. Suppose α + β and β − α are automorphisms of X. Prove that for every pair of endomorphisms γ and δ there exist endomorphisms φ and ψ such that

αφ + βψ = γ,  βφ + αψ = δ.

4. Let φ and ψ be endomorphisms of an n-dimensional linear space. Prove that

r(φ∘ψ) ≥ r(φ) + r(ψ) − n.

5. Let φ and ψ be endomorphisms of a finite-dimensional linear space X. Prove that

r(φ + ψ) ≤ r(φ) + r(ψ).

6. Let X be an n-dimensional linear space over A and let φ∈End(X). For every polynomial f(T) = am T^m + ... + a1 T + a0 in the indeterminate T with coefficients ai in A we denote by f(φ) the endomorphism

f(φ) = am φ^m + ... + a1 φ + a0 iX.

Prove that

(i) There exists a polynomial f(T) of degree ≤ n² such that f(φ) = 0.

(ii) If φ is an automorphism, then there exists a polynomial f(T) with nonzero constant term such that f(φ) = 0.

7. Prove that if for an endomorphism φ of a linear space X the equation

φ∘ψ = ψ∘φ

holds for every ψ ∈ End(X), then φ = λiX for some scalar λ.

8. An endomorphism σ of a linear space X is called a projection of X iff σ² = σ. Two projections σ and τ of X are said to be orthogonal to each other iff σ∘τ = τ∘σ = 0.

(a) Show that if σ is a projection of X, then σ and iX − σ are orthogonal projections.

(b) Show that if σ is a projection of X, then Ker σ = Im(iX − σ), Im σ = Ker(iX − σ) and X = Ker σ ⊕ Im σ.

(c) Show that if σ1, ..., σn are mutually orthogonal projections of X and iX = σ1 + ... + σn, then X = Im σ1 ⊕ ... ⊕ Im σn.

9. ψ ∈ End(X) is called an involution if ψ² = iX.

(a) Prove that if φ ∈ End(X) is a projection then ψ = 2φ − iX is an involution.

(b) Prove that if X is a real linear space and ψ ∈ End(X) is an involution, then φ = ½(ψ + iX) is a projection.

10. Let ψ be an endomorphism of a 2-dimensional space X. Prove that if ψ ≠ iX is an involution, then X = X1 ⊕ X2 where X1 = {x∈X: ψx = x} and X2 = {x∈X: ψx = −x}.

11. Let φ and ψ be endomorphisms of a linear space X.

(a) Prove that if φ∘ψ − ψ∘φ = iX, then φ^m∘ψ − ψ∘φ^m = mφ^(m−1) for all m ≥ 1.

(b) Prove that if X is finite-dimensional then φ∘ψ − ψ∘φ ≠ iX.

(c) Find endomorphisms φ and ψ of R[T] that satisfy the equation

φ∘ψ − ψ∘φ = iR[T].

12. Let φ be an endomorphism of a linear space X. Denote for each i = 1, 2, ...

Ki = Ker(φ^i) and Ii = Im(φ^i).

Prove that

(a) K = ∪i Ki and I = ∩i Ii are subspaces of X,
(b) if there exists an n so that Kn = Km for all m ≥ n, then K ∩ I = 0,
(c) if there exists an n so that In = Im for all m ≥ n, then K + I = X,
(d) if X has finite dimension, then X = K ⊕ I.

13. Let Xj --ιj--> X --πj--> Xj (j = 1, ..., n) be a representation of X as direct product of X1, ..., Xn.

(a) Show that the linear transformations ιj: Xj → X (j = 1, ..., n) satisfy the following condition:

[I] for any linear space Y and any system of linear transformations ψj: Xj → Y (j = 1, ..., n), a unique linear transformation φ: X → Y exists so that ψj = φ∘ιj (j = 1, ..., n).

(b) Show that the linear transformations πj: X → Xj satisfy the following condition:

[T] for any linear space Z and any system of linear transformations ρj: Z → Xj (j = 1, ..., n), a unique linear transformation ψ: Z → X exists so that ρj = πj∘ψ (j = 1, ..., n).

14. Show that if a system of linear transformations ιj: Xj → X (j = 1, ..., n) satisfies the condition [I] of Exercise 13, then a unique system of linear transformations πj: X → Xj (j = 1, ..., n) exists so that the linear transformations Xj --ιj--> X --πj--> Xj (j = 1, ..., n) form a representation of X as direct product of X1, ..., Xn.

15. Show that if a system of linear transformations πj: X → Xj (j = 1, ..., n) satisfies the condition [T] of Exercise 13, then a unique system of linear transformations ιj: Xj → X (j = 1, ..., n) exists so that the linear transformations Xj --ιj--> X --πj--> Xj (j = 1, ..., n) form a representation of X as direct product of X1, ..., Xn.

16. Let X be an n-dimensional real linear space.

(a) If x0 is a non-zero vector of X, prove that {φ∈End(X): φx0 = 0} is an (n² − n)-dimensional subspace of End(X).

(b) Let Y be an m-dimensional subspace of X. Prove that {φ∈End(X): φ[Y] = 0} is an n(n − m)-dimensional subspace of End(X).

17. Let φ be an endomorphism of an n-dimensional linear space X.

(a) Prove that the set F(φ) = {ψ∈End(X): ψ∘φ = 0} is a subspace of End(X).

(b) Find φ1, φ2 and φ3 such that dim F(φ1) = 0, dim F(φ2) = n and dim F(φ3) = n². What other possible values can dim F(φ) attain?

18. Let φ1, ..., φs be s distinct endomorphisms of a linear space X. Show that there exists a vector x∈X such that the s vectors φ1x, ..., φsx are distinct.

19. Let φ and ψ be projections of a linear space X. Show that (i) Im φ = Im ψ if and only if φ∘ψ = ψ and ψ∘φ = φ, and (ii) Ker φ = Ker ψ if and only if φ∘ψ = φ and ψ∘φ = ψ.

§7. Dual Space

In Examples 5.5 to 5.8, the idea of direct sum decomposition of the domain X of a linear transformation φ: X → Y has led us to study linear transformations with 1-dimensional domain. As a result, different ways of constructing linear transformations were found. We now want to know if a similar operation on the range Y would serve useful purposes, and in particular if there is a case for studying linear transformations with 1-dimensional range.

Let Y = Y1 ⊕ Y2 be a direct sum and let ιj and πj be the canonical injections and projections of the direct sum. Then by an argument similar to that used in §6C we see that a one-to-one correspondence between the set Hom(X, Y1 ⊕ Y2) and the set Hom(X, Y1) × Hom(X, Y2) is given by φ ↦ (π1∘φ, π2∘φ) and its inverse is given by (ψ1, ψ2) ↦ ι1∘ψ1 + ι2∘ψ2. Following through this idea, we can further decompose Y = Z1 ⊕ ... ⊕ Zr as a direct sum of 1-dimensional subspaces. Then φ can be recovered from the linear transformations πi∘φ. This, in a way, motivates the study of linear transformations with 1-dimensional range. As a prototype of such linear transformations, we take a linear transformation whose range is the 1-dimensional arithmetical linear space A¹ over A.

A. General properties of dual space

Let X be a linear space over A and denote by A the 1-dimensional arithmetical linear space A¹ over A. Then by the results of §6A, X* = Hom(X, A) is a linear space over A. We call X* the dual space or the conjugate space of the linear space X. Elements of X* are called linear forms, linear functions, linear functionals or covectors of X. It follows from 6.2 that dim X = dim X* for a finite-dimensional linear space X.

EXAMPLE 7.1. Let X be a finite-dimensional linear space and B = (x1, ..., xn) a fixed base of X. Then the coordinates λi of a vector x∈X relative to the base B are determined by the equation x = λ1x1 + ... + λnxn. By mapping x to its first coordinate λ1, we obtain a linear function f1∈X*. f2, ..., fn are similarly defined. The linear function fi is called the i-th coordinate function of X relative to B.
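If the base vectors of Rⁿ are taken as the columns of an invertible matrix B, the coordinate functions are realized by the rows of B⁻¹. A minimal sketch (the base is an ad hoc example, assuming NumPy):

```python
import numpy as np

# A base of R^3, written as the columns of B (an arbitrary example).
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# The i-th coordinate function f_i is the i-th row of inv(B):
F = np.linalg.inv(B)

# f_i(x_j) = delta_ij, i.e. F @ B is the identity matrix.
print(np.allclose(F @ B, np.eye(3)))    # True

# For any x, f_i(x) is the i-th coordinate of x relative to the base.
x = B @ np.array([2.0, -1.0, 5.0])      # the vector with coordinates (2,-1,5)
print(F @ x)                            # [ 2. -1.  5.]
```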

Since covectors are linear transformations, the methods ofconstructing linear transformations outlined in Examples 5.8 areapplicable here. Likewise the kernel and the image of a covector f ofX are defined. Since the range of a covector f is the 1-dimensionallinear space A, and Im f is a subspace of A, we have either Im f = 0 orIm f = A. It is easily seen that Im f = 0 if and only if f = 0. For thekernel of a covector, we have the following theorem.

THEOREM 7.2. Let X be a linear space and f a covector of X. If f is not the zero covector, then dim X = 1 + dim(Ker f).

PROOF. The theorem follows from 5.17, or more explicitly it can be proved as follows. Since f is non-zero, there exists a non-zero vector y of X such that y ∉ Ker f. If Y is the 1-dimensional subspace of X generated by y, then Y and Ker f are complementary subspaces of X. Indeed Y ∩ Ker f = 0; on the other hand, for every x∈X, we can write x = λy + z where λ = f(x)/f(y) and z∈Ker f. Therefore dim X = 1 + dim(Ker f).

From the definition of the zero-vector 0∈X*, we see that a covector f of X is zero if and only if f(x) = 0 for every x∈X. Dual to this, we have the following theorem.

THEOREM 7.3. Let X be a linear space. Then a vector x of X is the zero-vector of X if and only if f(x) = 0 for every covector f of X.

PROOF. If x = 0, then obviously f(x) = 0 for all f∈X*. Conversely, let f(x) = 0 for all f∈X*. If x ≠ 0, then x generates a 1-dimensional subspace X' of X. If X″ is a complementary subspace of X' in X, then every vector z of X can be written uniquely as z = λx + y where λ∈A and y∈X″. A covector f of X is defined by

f(z) = λ for all z = λx + y where y∈X″.

But f(x) = f(x + 0) = 1 ≠ 0. Therefore the assumption that x ≠ 0 is necessarily false.

For finite-dimensional linear spaces, we can extend the duality to a relation between the bases of X and the bases of X*. We say that a base x1, ..., xn of X and a base f1, ..., fn of X* are dual to each other if

(1) fj(xi) = δij for i, j = 1, ..., n,

where δij are the Kronecker symbols.

Given any base x1, ..., xn of X, the equations (1) uniquely define n vectors f1, ..., fn of X* that form a base of X*, as has been shown in the proof of Theorem 6.2. Thus for every base of X there is a unique base of X* dual to it. From 7.1, we see that the base (f1, ..., fn) of X* dual to the base (x1, ..., xn) of X consists of the coordinate functions of X relative to the base (x1, ..., xn) of X. It is also true that for every base of X* there is a unique base of X dual to it; this will be shown in §7C.

B. Dual transformations

Let us now consider linear transformations. For any linear transformation φ: X → Y, we get a linear transformation Hom(φ, iA) which shall be denoted by φ*. Therefore φ*: Y* → X* is defined by

φ*(g) = g∘φ for every g∈Y*,

or diagrammatically

X ---φ---> Y
 \        /
φ*(g)    g
   \    /
    v  v
     A

The linear transformation φ* is called the dual transformation of φ.

EXAMPLE 7.4. It follows from 5.8(b) that relative to a base B = (x1, ..., xm) of X and a base C = (y1, ..., yn) of Y, a linear transformation φ: X → Y is completely determined by the scalars aij defined as follows:

φ(xi) = ai1 y1 + ... + ain yn (i = 1, ..., m).

Denote by (f1, ..., fm) the dual base of B and by (g1, ..., gn) the dual base of C. Then it is easily seen that

φ*(gj) = a1j f1 + ... + amj fm (j = 1, ..., n),

or φ*(gj)(xi) = aij (i = 1, ..., m; j = 1, ..., n).

The formation of dual transformations has the following properties:

(a) for any linear space X, (iX)* = iX*;

(b) for any pair of linear transformations ψ: Y → Z and φ: X → Y, (ψ∘φ)* = φ*∘ψ*.

Since, for every f∈X*, (iX)*(f) = f∘iX = f = iX*(f), (a) holds. We observe that (ψ∘φ)*: Z* → X* and for every h∈Z*

(ψ∘φ)*(h) = h∘(ψ∘φ) = (h∘ψ)∘φ = (ψ*(h))∘φ = φ*(ψ*(h)) = (φ*∘ψ*)(h);

therefore (b) holds.
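Relative to dual bases, Example 7.4 says that the matrix of φ* is obtained from that of φ by interchanging rows and columns; property (b) then becomes the familiar rule for transposing a product. A numerical sketch (random matrices of our own choosing, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)

# phi: X -> Y and psi: Y -> Z as matrices in standard bases (ad hoc sizes);
# relative to the dual bases, the dual transformation acts by the transpose.
phi = rng.normal(size=(3, 4))    # dim X = 4, dim Y = 3
psi = rng.normal(size=(2, 3))    # dim Z = 2

# (psi o phi)* = phi* o psi*  becomes  (psi @ phi).T == phi.T @ psi.T
print(np.allclose((psi @ phi).T, phi.T @ psi.T))    # True
```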

C. Natural transformations

The operation of forming the dual space X* of a linear space X is not just an operation on a single linear space X but an entire collection of such operations, one for each linear space; in other words it is an operation D on the whole class of linear spaces. Similarly, the operation of forming the dual transformation φ*: Y* → X* of a linear transformation φ: X → Y can be regarded as an operation Δ on the whole class of linear transformations. This and other pairs of similar operations will be the subject matter of the present §7C.

Repeated applications of the operation D to a given linear space X give rise to a sequence of linear spaces:

X, X*, X**, X***, ...

Suppose X is a finite-dimensional linear space; then each linear space of the sequence is finite-dimensional and has the same dimension as X. Thus X ≅ X*, X ≅ X**, .... There does not exist, however, any "particular" isomorphism from X to X*. We may be tempted to think that the isomorphism Φ which takes the vectors xi of a base (x1, ..., xn) to the corresponding covectors fi of the dual base (f1, ..., fn) is a "particular" one. But Φ behaves quite differently with respect to other pairs of dual bases; for example, it does not take vectors of the base (x1 + x2, x2, x3, ..., xn) to the corresponding vectors of the dual base (f1, f2 − f1, f3, ..., fn). In other words, Φ depends on the choice of a pair of dual bases. But there is a particular isomorphism tX, one which distinguishes itself from all the others, of X onto its second dual X**. It turns out that if (x1, ..., xn) is any base of X, (f1, ..., fn) the base of X* dual to (x1, ..., xn) and (F1, ..., Fn) the base of X** dual to (f1, ..., fn), then this particular isomorphism tX takes xi into Fi. Moreover tX is seen to be just one of a collection t of such isomorphisms, one for each finite-dimensional linear space and its second dual.

Similarly repeated applications of the operation Δ to a linear transformation φ: X → Y give rise to a sequence of linear transformations

φ: X → Y,  φ*: Y* → X*,  φ**: X** → Y**,  φ***: Y*** → X***, ...

It is natural to ask if a similar comparison between φ and φ** can be made, and it turns out that the pair of isomorphisms tX and tY can be used for this purpose as well.

Let X be an arbitrary linear space over A. For every element x of X, consider the mapping Fx: X* → A defined by

Fx(f) = f(x) for every f∈X*.

For any f, g∈X* and any λ, μ∈A, we get Fx(λf + μg) = (λf + μg)(x) = λFx(f) + μFx(g); therefore Fx is a linear transformation. Hence Fx∈X**, for every x∈X.

THEOREM 7.5. For every linear space X, the mapping tX: X → X** defined by

tX(x) = Fx, where Fx(f) = f(x) for every f∈X*,

has the following properties:

(i) tX is an injective linear transformation;

(ii) for every linear transformation φ: X → Y, φ**∘tX = tY∘φ, i.e. the diagram

X ----φ----> Y
|tX          |tY
v            v
X** --φ**--> Y**

is commutative;

(iii) tX is an isomorphism if X is a finite-dimensional linear space. In this case tX is called the natural isomorphism between X and X**.

PROOF. That tX is injective follows from Theorem 7.3. Therefore for (i) it remains to be proved that tX is a linear transformation. Let x, x'∈X and λ, μ∈A. Then for every f∈X*, we get

(λFx + μFx')(f) = λFx(f) + μFx'(f) = λf(x) + μf(x') = f(λx + μx') = Fλx+μx'(f);

therefore tX is a linear transformation, proving (i). For (ii) we observe that both φ**∘tX and tY∘φ are linear transformations of X into Y**. For every x∈X we get

(φ**∘tX)(x) = φ**(Fx) = Fx∘φ*

and (tY∘φ)(x) = tY(φ(x)) = Fφ(x).

Therefore it remains to be proved that Fx∘φ* and Fφ(x) are identical elements of Y**. But elements of Y** are linear transformations of Y* into A, so we have to show (Fx∘φ*)(g) = Fφ(x)(g) for every g∈Y*. Now

(Fx∘φ*)(g) = Fx(φ*(g)) = Fx(g∘φ) = (g∘φ)(x) = g(φ(x)) and Fφ(x)(g) = g(φ(x)).

Therefore (ii) is proved. Since tX is a linear transformation, Im tX is a subspace of X**. By (i) tX is injective; therefore dim(Im tX) = dim X. On the other hand dim X = dim X** for any finite-dimensional linear space X. Therefore dim(Im tX) = dim X** and hence by 4.4 Im tX = X**, proving (iii).

REMARKS. In §8 suitable terminology and notation for handling such "large" mathematical objects as the operations D, Δ and t above are introduced. In the categorical language there, t is a natural transformation, the pair D and Δ constitutes a functor, and the domain on which this functor operates, i.e., the class of all linear spaces and the class of all linear transformations, is a category.

By the first two parts of Theorem 7.5, we can always identify every linear space X with a subspace of its second dual X**, and as a result of this identification each linear transformation φ is the restriction of its second dual φ**, i.e., φ = φ**|X. Analogous relations between X*, φ* and X***, φ*** are similarly obtained. By 7.5(iii), each finite-dimensional linear space X can be identified with its second dual X** and consequently φ = φ** for each linear transformation φ: X → Y of finite-dimensional linear spaces. Therefore in the sequences

X, X*, X**, X***, ...

φ, φ*, φ**, φ***, ...

we need only consider the first pairs of terms X, X* and φ, φ*; the remaining ones, being naturally isomorphic copies, can be identified with them.

The natural isomorphism tX: X → X** of a finite-dimensional linear space X onto its second dual X** will have many applications later on. We observe that from the equality dim X = dim X** alone it follows that X and X** are isomorphic linear spaces, i.e., there exists an isomorphism φ: X → X**. However this unspecified isomorphism φ may not satisfy the condition:

(φ(x))(f) = f(x) for all x∈X and f∈X*.

At the end of §7A, we have seen that every base (x1, ..., xn) of a finite-dimensional linear space X determines a unique dual base (f1, ..., fn) of X*. Conversely let (g1, ..., gn) be an arbitrary base of X*. Then this base determines a dual base (G1, ..., Gn) of X**. By means of the isomorphism tX: X → X**, we get a base (y1, ..., yn) of X where tX(yi) = Gi for i = 1, ..., n. Now gj(yi) = (tX(yi))(gj) = Gi(gj) = δij for i, j = 1, ..., n. Therefore the base (g1, ..., gn) of X* and the base (y1, ..., yn) of X are dual to each other. It follows from 7.3 that the base (y1, ..., yn) is also unique. We have thus proved the following corollary.

COROLLARY 7.6. Let X be an n-dimensional linear space. Thenevery base of X determines a unique dual base of X* and converselyevery base of X* determines a unique dual base of X.

D. A duality between ℒ(X) and ℒ(X*)

Between the set ℒ(X) of all subspaces of a finite-dimensional linear space X and the set ℒ(X*) of all subspaces of the dual space X* of X there is a duality from which the well-known duality principle of projective geometry can be derived (see §11H). We express this duality in terms of two mappings AN: ℒ(X) → ℒ(X*) and an: ℒ(X*) → ℒ(X).

For every subspace Y of X and every subspace Z of X*, we define the annihilator of Y and the annihilator of Z respectively as

AN(Y) = {f∈X*: f(x) = 0 for all x∈Y} and an(Z) = {x∈X: f(x) = 0 for all f∈Z}.

It is easily verified that the annihilator AN(Y) of Y is a subspace of X* and the annihilator an(Z) of Z is a subspace of X. In terms of the annihilators, the mappings AN: ℒ(X) → ℒ(X*) and an: ℒ(X*) → ℒ(X) are defined and their properties are formulated in the following theorem.

THEOREM 7.7. Let X be a finite-dimensional linear space, Y, Y1 and Y2 subspaces of X and Z, Z1 and Z2 subspaces of the dual space X* of X. Then the following statements hold.

(i) dim Y + dim AN(Y) = dim X.
(ii) an(AN(Y)) = Y.
(iii) AN(Y1) ⊆ AN(Y2) iff Y1 ⊇ Y2.
(iv) AN(Y1 + Y2) = AN(Y1) ∩ AN(Y2).
(v) AN(Y1 ∩ Y2) = AN(Y1) + AN(Y2).

(i)* dim Z + dim an(Z) = dim X*.
(ii)* AN(an(Z)) = Z.
(iii)* an(Z1) ⊆ an(Z2) iff Z1 ⊇ Z2.
(iv)* an(Z1 + Z2) = an(Z1) ∩ an(Z2).
(v)* an(Z1 ∩ Z2) = an(Z1) + an(Z2).

PROOF. (i). By the supplementation theorem, we can find a base (x1, ..., xp, xp+1, ..., xn) of X such that (x1, ..., xp) is a base of Y. If (f1, ..., fp, fp+1, ..., fn) is the dual base of the base of X in question, then from the equations

fj(xi) = δij for i, j = 1, ..., n

it follows that the covectors fp+1, ..., fn all belong to the annihilator AN(Y). On the other hand, if a covector f = λ1f1 + ... + λnfn belongs to the annihilator AN(Y), then for each i = 1, ..., p we get

0 = f(xi) = λ1f1(xi) + ... + λnfn(xi) = λ1δi1 + ... + λnδin = λi.

Therefore the covectors fp+1, ..., fn form a base of AN(Y) and hence (i) is established. The proof of (i)* is similar.

(ii). It follows from the definition that Y is a subspace of an(AN(Y)). On the other hand, we get dim Y = dim X − dim AN(Y) and dim AN(Y) = dim X* − dim an(AN(Y)) from (i) and (i)*. Therefore dim Y = dim an(AN(Y)) and hence Y = an(AN(Y)). The proof of (ii)* is similar.

(iii) and (iii)* follow immediately from the definition of annihi-lators and (ii), (ii)* above.

(iv). From (iii) it follows that AN(Y1 + Y2) ⊆ AN(Y1) ∩ AN(Y2). Conversely if f∈AN(Y1) ∩ AN(Y2), then f(x1 + x2) = f(x1) + f(x2) = 0 for all x1 + x2 ∈ Y1 + Y2. Therefore (iv) is proved, and (iv)* is proved similarly.

(v). It follows from (ii) and (ii)* that the mappings AN and an are inverse to each other; therefore we can find subspaces Z1 and Z2 of X* such that AN(Yi) = Zi and an(Zi) = Yi for i = 1, 2. Using (iv) and (iv)* above, we get

AN(Y1 ∩ Y2) = AN(an(Z1) ∩ an(Z2)) = AN(an(Z1 + Z2)) = Z1 + Z2 = AN(Y1) + AN(Y2).

This proves (v) and (v)* is similarly proved.
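Identifying X with Aⁿ and covectors with row vectors, AN(Y) becomes the null space of the transpose of any matrix whose columns span Y, and statement (i) can be checked numerically. The subspace below is our own ad hoc example (assuming NumPy):

```python
import numpy as np

# X = R^4; Y is spanned by the columns of M (an arbitrary example).
M = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# A covector f (a row vector) annihilates Y iff f @ M = 0, i.e. iff
# f lies in the null space of M.T; compute a base of it via the SVD.
_, s, Vt = np.linalg.svd(M.T)
r = int((s > 1e-12).sum())
AN_Y = Vt[r:]              # rows form a base of AN(Y)

dim_Y = np.linalg.matrix_rank(M)
print(dim_Y, AN_Y.shape[0])             # 2 2
print(dim_Y + AN_Y.shape[0] == 4)       # True: statement (i) of 7.7
```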

As an important application of 7.7 we shall show that

AN(Im φ) = Ker φ* and an(Im φ*) = Ker φ

for any linear transformation φ: X → Y of finite-dimensional linear spaces and its dual transformation φ*: Y* → X*. Indeed, if g∈Ker φ*, then g∘φ = 0 and hence g(φ(x)) = 0 for all x∈X. Therefore g∈AN(Im φ). Conversely, if g∈AN(Im φ), then g(φ(x)) = 0 for all x∈X. Therefore (φ*(g))(x) = (g∘φ)(x) = g(φ(x)) = 0 for all x∈X, and hence g∈Ker φ*. This proves the first equation, and the second equation is proved similarly. From these relations between kernels and images we get the following theorem on the rank of a linear transformation.

THEOREM 7.8. For any linear transformation φ: X → Y of finite-dimensional linear spaces the equality r(φ) = r(φ*) holds.

PROOF. The rank r(φ) of φ is defined as dim(Im φ); therefore r(φ) = dim Y − dim(AN(Im φ)) = dim Y − dim(Ker φ*). But for the dual transformation φ*: Y* → X* we get dim Y* = dim(Ker φ*) + r(φ*). Since Y is finite-dimensional, dim Y* = dim Y, and hence r(φ) = r(φ*).
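In matrix terms, 7.8 is the statement that a matrix and its transpose have the same rank ("row rank equals column rank"). A one-line check on an ad hoc matrix (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 5)) @ rng.normal(size=(5, 4))   # an arbitrary 3x4 matrix

# r(phi) = r(phi*): the rank of a matrix equals that of its transpose.
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T))   # True
```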

E. Exercises

1. Let X be a linear space and let f1, ..., fp be p covectors of X. Prove that a covector f of X is a linear combination of f1, ..., fp if and only if Ker f ⊇ Ker f1 ∩ ... ∩ Ker fp.

2. Let X be an n-dimensional linear space over A, (x1, ..., xn) a base of X and (f1, ..., fn) a base of X* dual to (x1, ..., xn).

(a) Show that (x1 + x2, x2, ..., xn) forms a base of X. Find the base of X* dual to (x1 + x2, x2, ..., xn).

(b) Show that (λx1, x2, ..., xn) forms a base of X for any nonzero scalar λ of A. Find the base of X* dual to (λx1, x2, ..., xn).

(c) Show that the base (x1, x2 − λ2x1, ..., xn − λnx1) of X and the base (f1 + λ2f2 + ... + λnfn, f2, ..., fn) of X* are dual to each other.

3. (a) Show that every linear homogeneous polynomial αX + βY + γZ defines a linear function f on R³ such that

f(x, y, z) = αx + βy + γz for every (x, y, z)∈R³.

Show also that every linear function on R³ can be defined in this way.

(b) Determine the base of (R³)* dual to the base ((2, 3, 4), (1, 0, 2), (4, −1, 0)).

(c) Determine the base of R³ dual to the base (X + 2Y + Z, 2X + 3Y + 3Z, 3X + 7Y + Z).

4. Let X be an n-dimensional linear space over A with base (x1, ..., xn). Define ιi: A → X by

ιi(λ) = λxi for every λ∈A.

Prove that if (f1, ..., fn) is the base of X* dual to (x1, ..., xn), then A --ιi--> X --fi--> A (i = 1, ..., n) is a representation of X as a direct sum of n copies of A.

5. Let X be a finite-dimensional linear space with base (x1, ..., xn). Let (f1, ..., fn) be a base of X* dual to (x1, ..., xn). If φ∈End(X) is such that

φxi = Σⱼ₌₁ⁿ aij xj, i = 1, ..., n,

express φ*fi (i = 1, ..., n) as a linear combination of f1, ..., fn.

6. Let X be an n-dimensional linear space and f1, ..., fn∈X*. Prove that (f1, ..., fn) is a base of X* if and only if Ker f1 ∩ ... ∩ Ker fn = 0.

7. Let X be an n-dimensional linear space and f1, ..., fn∈X*. Show that if there are n vectors x1, ..., xn of X such that fi(xj) = δij for all i, j = 1, 2, ..., n, then (f1, ..., fn) is a base of X* and (x1, ..., xn) is a base of X.

8. Let X, Y be n-dimensional linear spaces and let φ: X → Y, ψ: X* → Y* be linear transformations. Show that if ψ(f)(φ(x)) = f(x) for all f∈X* and all x∈X, then φ is an isomorphism and ψ = (φ*)⁻¹.

9. Let X be an infinite-dimensional linear space. Show thatdim(X*) > dim X.

10. Let X be a linear space. Show that X and X* have the same dimension if and only if X is finite-dimensional.

11. Let φ: X → Y be a linear transformation and φ*: Y* → X* the dual transformation of φ. Prove that if φ is an isomorphism then φ* is an isomorphism and (φ*)⁻¹ = (φ⁻¹)*.

12. Let X, Y be linear spaces and X*, Y* their dual spaces. Prove that (X × Y)* ≅ X* × Y*.

13. Let φ: X → Y be a linear transformation whose domain X and range Y are both finite-dimensional linear spaces and φ*: Y* → X* the dual transformation of φ. Prove that

(a) Ker φ = an(Im φ*) and Ker φ* = AN(Im φ),
(b) Im φ = an(Ker φ*) and Im φ* = AN(Ker φ),
(c) r(φ) = r(φ*).

14. Let X be a finite-dimensional linear space and Y a subspace of X. Prove that (a) AN(Y) ≅ (X/Y)* and (b) X*/AN(Y) ≅ Y*. Show that (a) and (b) hold also for an arbitrary linear space if AN(Y) is defined as the subspace of all covectors f of X such that f(x) = 0 for all x∈Y.

§8. The Category of Linear Spaces

The `modern algebra' of the 1930s concerned itself mainly with the structures of linear spaces, groups, rings, fields and other algebraic systems. These algebraic systems have been studied axiomatically and structurally; furthermore, homomorphisms have been defined for each type of algebraic system. In the last three decades, in which algebraic topology and algebraic geometry have undergone a rapid development, it has become clear that the formal properties of the class of all homomorphisms repay independent study. This leads to the notions of category, functor and natural transformation first introduced by S. EILENBERG and S. MACLANE in 1945, and to a new branch of mathematics called category theory. Recent years have seen more and more mathematics being formulated in the language of category theory - an event somewhat similar to what took place earlier in this century when an effort was made to formulate mathematics in the language of set theory.

In this section we shall acquaint ourselves with some basic notions of category theory. It is not just for the sake of following a fashionable trend that we introduce such sophisticated notions here. The reason for our effort is that some methods of linear algebra cannot be made clear just by the study of a single linear space or a single linear transformation in isolation; for a deeper understanding of the subject we have to study sets of linear transformations between linear spaces and the effect of linear transformations upon other constructions on linear spaces. To do this effectively, we need the language of category theory.

If set theory and linear algebra as given in the previous sections are thought to be at the first level of abstraction from mathematics taught at the schools, then the material of the present section is at a second level of abstraction. Readers who have not quite got used to the first level of abstraction are advised to omit this section. This will in no way impede their progress in this course of study, since the language of category theory will only be used in occasional summary remarks.

A. Category

The immediate aim of §8 is to give a definition of natural trans-formation so that the results of §7C can be organized in a moresystematic way and viewed in a better perspective. This, however,makes a definition of functor necessary, since functors are the thingsthat natural transformations operate on. Again in order to definefunctor, we must first define category since functors are the tools forcomparing categories.

We shall first try to arrive by way of a leisurely discussion at a definition of the category of sets as a working model of the definition of category. To study sets in a unified fashion, we first accommodate all the sets in a collection. This collection 𝒮 of all sets, as we know, is no longer a set. The mathematical term for such a collection is class. Thus we begin our study with a class 𝒮. Members of this class are called objects, and in the present case they are sets. While membership of sets is a chief concern of classical set theory, this has to be of only secondary interest to the category of sets, which concerns itself mainly with relations among sets and other constructions on sets in general. Thus together with the class 𝒮 of all sets, we study the class of sets of mappings Map(X, Y), one for each pair of sets. Something similar to a structure on these two classes is given by composition of mappings, i.e. a mapping Map(X, Y) × Map(Y, Z) → Map(X, Z) for every triple of sets X, Y and Z which takes every pair (φ, ψ) to the composite ψ∘φ. Thus we find ourselves in a situation rather similar to that when we first considered abelian groups earlier in §1A, when we were dealing with a non-empty set together with an internal composition law satisfying certain well-known conditions called axioms. Similarly we single out certain well-known and useful properties as axioms. These are as follows:

(1) Map(X, Y) and Map(Z, T) are disjoint for X ≠ Z or Y ≠ T.

(2) If φ ∈ Map(X, Y), ψ ∈ Map(Y, Z) and ξ ∈ Map(Z, T), then

ξ∘(ψ∘φ) = (ξ∘ψ)∘φ.

(3) For each set X, there exists iX ∈ Map(X, X) such that

φ∘iX = φ for each φ ∈ Map(X, Y)
and iX∘ψ = ψ for each ψ ∈ Map(Z, X).

Before we proceed to give a definition of category, we remark that a number of important concepts of set theory can actually be recovered from (1), (2) and (3) alone. For example the empty set ∅ is characterized by the condition that Map(∅, X) is a set with exactly one element for each X. Similarly φ is injective if and only if it is left cancellable, i.e. ξ = ψ whenever φ∘ξ = φ∘ψ; and φ is surjective if and only if it is right cancellable.

Clearly the basic properties (1), (2) and (3) are common to theclass of all linear spaces over A and the class of all linear trans-formations of these linear spaces together with composition of theselinear transformations. Furthermore these properties are also com-mon to different classes of algebraic systems. This therefore mo-tivates the following definition.

DEFINITION 8.1. A category 𝒞 consists of

(i) a class, the elements of which are called objects of 𝒞,

(ii) a function which to each pair (X, Y) of objects of 𝒞 assigns a set Mor𝒞(X, Y),

(iii) a function Mor𝒞(X, Y) × Mor𝒞(Y, Z) → Mor𝒞(X, Z) for every triple of objects X, Y and Z of 𝒞.

The function given in (iii) is called composition; its effect on a pair (φ, ψ) is denoted by ψ∘φ. The above data are subject to the following axioms:

[C1] Mor𝒞(X, Y) and Mor𝒞(Z, T) are disjoint if X ≠ Z or Y ≠ T.

[C2] If φ∈Mor𝒞(X, Y), ψ∈Mor𝒞(Y, Z) and ξ∈Mor𝒞(Z, T), then ξ∘(ψ∘φ) = (ξ∘ψ)∘φ.

[C3] For each object X there exists iX∈Mor𝒞(X, X) such that

φ∘iX = φ for each φ∈Mor𝒞(X, Y)

and iX∘ψ = ψ for each ψ∈Mor𝒞(Z, X).

Some terminology and notation is convenient for our formulation. The elements of Mor𝒞(X, Y) are called morphisms. Often we write Mor(X, Y) for Mor𝒞(X, Y). Instead of φ∈Mor𝒞(X, Y) we frequently write φ: X → Y or X --φ--> Y. Moreover, we denote

X = D(φ) = domain of φ,  Y = R(φ) = range of φ.

It follows from the definition that the composite ψ∘φ is defined if and only if D(ψ) = R(φ). In this case D(ψ∘φ) = D(φ) and R(ψ∘φ) = R(ψ).

The element iX postulated in [C3] is easily seen to be unique; it is called the identity of X. A morphism φ: X → Y is called an isomorphism if there exists ψ: Y → X such that ψ∘φ = iX and φ∘ψ = iY. In this case ψ is unique, and is denoted by φ⁻¹; this is also an isomorphism and (φ⁻¹)⁻¹ = φ. In this case we also say that the objects X and Y are equivalent. If φ: X → Y and ψ: Y → Z are isomorphisms, then so are ψ∘φ and (ψ∘φ)⁻¹ = φ⁻¹∘ψ⁻¹. We leave to the interested reader the proof of these statements.

In this way, the category 𝒮 of sets is the category where

(i) objects are all the sets,
(ii) Mor(A, B) = Map(A, B),
(iii) composite of morphisms has the usual meaning,
(iv) iA: A → A is the identity mapping.

The isomorphisms of this category are the bijective mappings of sets.

The category 𝒢 of additive groups is the category where

(i) objects are all the additive groups,
(ii) a morphism φ: A → B is a mapping such that φ(x + y) = φ(x) + φ(y) for all x, y∈A,
(iii) composite of morphisms has the usual meaning,
(iv) the identity morphism iA: A → A is the identity mapping.

Finally, the category ℒ(A) of linear spaces over A is the category where

(i) objects are all the linear spaces over A,
(ii) a morphism φ: X → Y is a linear transformation,
(iii) composite of morphisms has the usual meaning,
(iv) the identity morphism iX: X → X is the identity linear transformation of X.

Here isomorphisms of ℒ(A) are the isomorphisms (i.e., bijective linear transformations) of linear spaces over A.

The category ℱ(A) of finite-dimensional linear spaces over A is similarly defined.

The guiding principle is therefore this: whenever a new type of algebraic system is introduced, we should also introduce the type of morphism appropriate to that system as well as the composition of morphisms.

The categories 𝒢, ℒ(Λ) and ℒf(Λ) carry some extra structure inherited from the algebraic structures of their objects and properties of their morphisms; they are special cases of additive categories, defined as follows.

DEFINITION 8.2. A category 𝒞 is called an additive category if

[AC1] for any two objects A, B of 𝒞, Mor(A, B) is an additive group;

[AC2] for any three objects A, B and C, the mapping Mor(A, B) × Mor(B, C) → Mor(A, C) defined by (φ, ψ) → ψ∘φ is biadditive, i.e., for φ, φ′ ∈ Mor(A, B) and ψ, ψ′ ∈ Mor(B, C)

ψ∘(φ + φ′) = ψ∘φ + ψ∘φ′ and (ψ + ψ′)∘φ = ψ∘φ + ψ′∘φ.

Additive categories are of special interest to us, since most categories that appear in this course are additive categories.

Finally we remark that a category as defined in 8.1 is not necessarily a very "large" mathematical object. For example, a very "small" category 𝒥 is defined by specifying that

(i) 𝒥 has two distinct objects 1 and 2;

(ii) Mor(1, 1), Mor(2, 2) and Mor(1, 2) are singleton sets and Mor(2, 1) = ∅;

(iii) composition in 𝒥 has the obvious meaning.

Page 100: Kam-Tim Leung Linear Algebra and Geometry  1974

§8 THE CATEGORY OF LINEAR SPACES

The category 𝒥 can be presented graphically as an arrow:

    1 ----> 2

Therefore 𝒥 is called the arrow category.
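The axioms can be checked mechanically for such a small category. The following Python sketch (our own illustration in modern notation, not part of the text) tabulates the objects and morphisms of 𝒥 and verifies instances of [C2] and [C3] by direct computation.

    # A sketch of the arrow category J: objects 1 and 2, and a single
    # non-identity morphism a: 1 -> 2.  Morphisms are named by strings;
    # the table records domain and range.
    objects = [1, 2]
    morphisms = {"i1": (1, 1), "i2": (2, 2), "a": (1, 2)}

    def compose(g, f):
        """Composite g o f, defined only when range(f) == domain(g)."""
        df, rf = morphisms[f]
        dg, rg = morphisms[g]
        assert rf == dg, "composite undefined"
        # In J each pair (domain, range) carries at most one morphism,
        # so the composite is forced.
        return next(m for m, (d, r) in morphisms.items() if (d, r) == (df, rg))

    # [C3]: the identities i1 and i2 are neutral for composition.
    assert compose("a", "i1") == "a" and compose("i2", "a") == "a"
    # [C2]: associativity, checked on the only interesting triple.
    assert compose("i2", compose("a", "i1")) == compose(compose("i2", "a"), "i1")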


B. Functor

We have remarked earlier that one of the guiding principles in category theory is that whenever a new type of mathematical object is introduced, we should also introduce the appropriate morphisms between these objects. Thus we ought to specify morphisms of categories, so that relations between categories can be formulated; these are called functors. It is natural that given two categories 𝒜 and ℬ we should compare objects of 𝒜 with objects of ℬ on the one hand and morphisms of 𝒜 with morphisms of ℬ on the other hand, in such a way that the structures given by compositions should be preserved. This state of affairs is expressed by the definition below.

DEFINITION 8.3. Let 𝒜 and ℬ be categories. A covariant functor T: 𝒜 → ℬ consists of a pair of functions (both denoted by T).

(i) The first assigns to each object X in 𝒜 an object T(X) in ℬ.

(ii) The second assigns to each morphism φ: X → Y in 𝒜 a morphism T(φ): T(X) → T(Y) in ℬ.

The following conditions must be satisfied.

[CO 1] T(ψ∘φ) = T(ψ)∘T(φ) if D(ψ) = R(φ).
[CO 2] T(i_X) = i_T(X).

Given two covariant functors T: 𝒜 → ℬ and S: ℬ → 𝒞, the composite S∘T: 𝒜 → 𝒞 is defined by S∘T(X) = S(T(X)) and S∘T(φ) = S(T(φ)); S∘T is also a covariant functor.

For every category 𝒞, the identity functor I: 𝒞 → 𝒞 is the covariant functor defined by I(X) = X and I(φ) = φ.

The forgetful functor F: ℒ(Λ) → 𝒮 is an example of a covariant functor. This assigns (i) to each linear space its underlying set and (ii) to each linear transformation its underlying mapping. The effect of the forgetful functor on ℒ(Λ) is, of course, to lose entirely the algebraic structure on the linear spaces and the algebraic properties of the linear transformations.


Another example of a covariant functor is the functor G of 𝒮 into ℒ(Λ) that assigns (i) to each set S the linear space F_S generated by the set S, and (ii) to each mapping φ: S → S′ the linear transformation φ̄: F_S → F_S′ defined in 5.8(c). In general FG(S) contains S as a proper subset and GF(X) is also different from X; therefore they are not the inverse of each other. The pair of functors F and G are related to each other in a way that is most interesting from the categorical point of view; they are said to form a pair of adjoint functors (see exercise 7).

For any set X, we can define a covariant functor T_X: 𝒮 → 𝒮 by assigning (i) to each set Y the set Map(X, Y) and (ii) to each mapping φ: Y → Y′ the mapping Φ: Map(X, Y) → Map(X, Y′) such that Φ(ξ) = φ∘ξ for every ξ: X → Y. The functor T_X is usually denoted by Map(X, ).
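In modern programming terms T_X is simply post-composition. A minimal Python sketch (the names are ours, purely illustrative) of Map(X, ) acting on morphisms, with a spot check of condition [CO 1]:

    # The covariant functor T_X = Map(X, ): a mapping phi: Y -> Y' is sent
    # to the mapping Map(X, Y) -> Map(X, Y') given by xi |-> phi o xi.
    def map_functor(phi):
        return lambda xi: (lambda x: phi(xi(x)))

    phi = lambda y: y + 1        # a mapping Y  -> Y'
    psi = lambda y: 2 * y        # a mapping Y' -> Y''
    xi  = lambda x: x * x        # an element of Map(X, Y), with X = integers

    # [CO 1]: T_X(psi o phi) = T_X(psi) o T_X(phi).
    lhs = map_functor(lambda y: psi(phi(y)))(xi)
    rhs = map_functor(psi)(map_functor(phi)(xi))
    assert all(lhs(x) == rhs(x) for x in range(10))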

Let T be a covariant functor of the arrow category 𝒥 into the category 𝒮. Then T is completely determined by T(1) = X₁, T(2) = X₂ and the image φ: X₁ → X₂ of the only non-identity morphism 1 → 2. In other words the effect of T amounts to selecting a mapping φ: X₁ → X₂ of sets, or simply that T is a mapping.

The operation D which assigns (i) to each linear space X its dual X*, and (ii) to each linear transformation φ: X → Y its dual φ*: Y* → X*, is not a covariant functor because, in reversing the order of composition, it takes the domain and the range of φ to the range and the domain of φ* respectively. D is, however, a functor of another type, called a contravariant functor.

DEFINITION 8.4. Let 𝒜 and ℬ be categories. A contravariant functor U: 𝒜 → ℬ consists of a pair of functions (both denoted by U).

(i) The first assigns to each object X of 𝒜 an object U(X) of ℬ.

(ii) The second assigns to each morphism φ: X → Y in 𝒜 a morphism U(φ): U(Y) → U(X) in ℬ.

The following conditions must be satisfied.

[CON 1] U(ψ∘φ) = U(φ)∘U(ψ) if D(ψ) = R(φ).

[CON 2] U(i_X) = i_U(X).

D above is easily seen to be a contravariant functor of ℒ(Λ) into ℒ(Λ); more on this functor will be said in §8C. Another example of a contravariant functor is the functor U_Y: 𝒮 → 𝒮 for a fixed Y which assigns (i) to each set X the set Map(X, Y), and (ii) to each mapping φ: X → X′ the mapping Φ: Map(X′, Y) → Map(X, Y) such that Φ(ξ) = ξ∘φ for every ξ: X′ → Y. The functor U_Y is usually denoted by Map( , Y).

The covariant functor Map(X, ) and the contravariant functor Map( , Y) suggest that Map( , ) can be regarded as a functor in two arguments, both in 𝒮, with values in 𝒮. Functors of this type are called bifunctors; we leave the exact formulation of a definition of bifunctor to the interested reader.
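A companion Python sketch (again ours) for Map( , Y), showing how condition [CON 1] reverses the order of composition:

    # The contravariant functor U_Y = Map( , Y): a mapping phi: X -> X' is
    # sent to Map(X', Y) -> Map(X, Y) given by xi |-> xi o phi.
    def comap_functor(phi):
        return lambda xi: (lambda x: xi(phi(x)))

    phi = lambda x: x + 1        # a mapping X  -> X'
    psi = lambda x: 3 * x        # a mapping X' -> X''
    xi  = lambda x: x % 7        # an element of Map(X'', Y)

    # [CON 1]: U_Y(psi o phi) = U_Y(phi) o U_Y(psi) -- note the reversal.
    lhs = comap_functor(lambda x: psi(phi(x)))(xi)
    rhs = comap_functor(phi)(comap_functor(psi)(xi))
    assert all(lhs(x) == rhs(x) for x in range(10))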

Finally for additive categories, we also consider a special type of (covariant or contravariant) functor.

DEFINITION 8.5. A functor T: 𝒜 → ℬ of additive categories is called an additive functor if

T(φ + ψ) = T(φ) + T(ψ)

for any morphisms φ and ψ of 𝒜 such that D(φ) = D(ψ) and R(φ) = R(ψ).

Following the method of constructing Map(X, ) and Map( , Y), we can easily set up functors Hom(X, ): ℒ(Λ) → ℒ(Λ) and Hom( , Y): ℒ(Λ) → ℒ(Λ). By bilinearity of composition of linear transformations, it follows that Hom(X, ) is a covariant additive functor and Hom( , Y) a contravariant additive functor.

C. Natural transformation

We have seen that D: ℒ(Λ) → ℒ(Λ), which assigns to each linear space X its dual space X* and to each linear transformation φ: X → Y its dual transformation φ*: Y* → X*, is a contravariant functor called the dual functor. The composite D² = D∘D: ℒ(Λ) → ℒ(Λ), called the second dual functor, is then a covariant functor such that D²(X) = X** and D²(φ) = φ**: X** → Y**.
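In coordinates, relative to chosen bases of finite-dimensional spaces, the dual transformation is represented by the transposed matrix, so the contravariance of D amounts to the familiar rule (BA)ᵀ = AᵀBᵀ. A numerical sketch (ours, with matrices standing in for linear transformations):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, size=(3, 2))   # matrix of phi: X -> Y
    B = rng.integers(-3, 4, size=(4, 3))   # matrix of psi: Y -> Z

    # D is contravariant: D(psi o phi) = D(phi) o D(psi).
    assert np.array_equal((B @ A).T, A.T @ B.T)

    # D^2 is covariant: transposing twice restores the original order.
    assert np.array_equal((B @ A).T.T, B @ A)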

In §7C we have constructed for each linear space an injective linear transformation t_X: X → X**, and we have seen that this construction, unlike the methods of construction in 5.8, is independent of the choice of a base of X. In fact t_X is defined by t_X(x) = F_x for every x ∈ X, where F_x(f) = f(x) for every f ∈ X*. Moreover for any linear transformation φ: X → Y, the corresponding transformations t_X and t_Y on the domain and range of φ are such that t_Y∘φ =


φ**∘t_X, i.e. the diagram below

    X ---t_X---> X**
    |             |
    φ            φ**
    |             |
    v             v
    Y ---t_Y---> Y**

is commutative. Denoting by I: ℒ(Λ) → ℒ(Λ) the identity (covariant) functor of ℒ(Λ), we can rewrite the diagram above as

    I(X) ---t_X---> D²(X)
      |               |
    I(φ)            D²(φ)
      |               |
      v               v
    I(Y) ---t_Y---> D²(Y)

Thus we see that the left column of the diagram expresses the effect of the functor I on the morphism φ: X → Y in ℒ(Λ) and the right column that of the functor D² on the same morphism φ; therefore the collection t of morphisms t_X (one for each object X of ℒ(Λ)) in ℒ(Λ) provides a comparison between the functors I and D². In other words, t can be regarded as a morphism t: I → D² of functors. A morphism of functors is called a natural transformation; the general definition is as follows.

DEFINITION 8.6. Given covariant functors T, S: 𝒜 → ℬ, a natural transformation Φ: T → S is a function which to each object X in 𝒜 assigns a morphism Φ(X): T(X) → S(X) in ℬ in such a way that for each morphism φ: X → Y in 𝒜 the diagram

    T(X) ---Φ(X)---> S(X)
      |                |
    T(φ)             S(φ)
      |                |
      v                v
    T(Y) ---Φ(Y)---> S(Y)

is commutative, i.e. S(φ)∘Φ(X) = Φ(Y)∘T(φ).


If Φ: T → S and Ψ: S → R are natural transformations of functors, then (Ψ∘Φ)(X) = Ψ(X)∘Φ(X) defines a natural transformation Ψ∘Φ: T → R. With the categories 𝒜 and ℬ kept fixed, we may regard functors T: 𝒜 → ℬ as "objects" of a new category and natural transformations Φ: T → S as its morphisms, since composition of natural transformations defined above satisfies the axioms [C1]-[C3] of a category. Consequently a natural transformation Φ: T → S is an isomorphism in this category, or a natural isomorphism, if and only if each Φ(X) is an isomorphism in ℬ, in which case Φ⁻¹(X) = (Φ(X))⁻¹.

Restricting consideration to the category ℒf(Λ) of finite-dimensional linear spaces and denoting by I, D the identity and the dual functor of ℒf(Λ), we see that t: I → D² is a natural isomorphism of functors of ℒf(Λ).

Finally we leave the formulation of natural transformation between contravariant functors to the interested reader, and conclude this section with a remark that in general we do not compare a covariant functor with a contravariant functor.

D. Exercises

1. Let 𝒞 be a category. An object T of 𝒞 is called a terminal object if for every object X of 𝒞 the set Mor(X, T) is a singleton. Prove that any two terminal objects of 𝒞 are equivalent. What are the terminal objects of the categories 𝒮, 𝒢 and ℒ(Λ)?

2. Let 𝒞 be a category. An object I of 𝒞 is called an initial object if for every object Y of 𝒞 the set Mor(I, Y) is a singleton. Prove that any two initial objects of 𝒞 are equivalent. What are the initial objects of the categories 𝒮, 𝒢 and ℒ(Λ)?

3. Let 𝒞 be a category and A, B two fixed objects of 𝒞.

(a) Prove that a category 𝒟 is defined by specifying that

(i) objects of 𝒟 are ordered pairs (X → A, X → B) of morphisms of 𝒞, and

(ii) morphisms (X → A, X → B) → (Y → A, Y → B) of 𝒟 are morphisms X → Y of 𝒞 such that the two triangles in the diagram

        A
       ^ ^
      /   \
     X --> Y
      \   /
       v v
        B

are commutative, and

(iii) composition in 𝒟 has the same meaning as in 𝒞.


(b) A terminal object of 𝒟 is called a product of A and B in 𝒞. Prove that products of any two objects exist in the categories 𝒮, 𝒢 and ℒ(Λ). Compare the category-theoretical definition of product with the set-theoretical definition of product in each of these three cases.

4. Let 𝒞 be a category and A, B two fixed objects of 𝒞.

(a) Prove that a category 𝒟 is defined by specifying that

(i) objects of 𝒟 are ordered pairs (A → X, B → X) of morphisms of 𝒞, and

(ii) morphisms (A → X, B → X) → (A → Y, B → Y) of 𝒟 are morphisms X → Y of 𝒞 such that the two triangles in the diagram

        A
       / \
      v   v
     X --> Y
      ^   ^
       \ /
        B

are commutative, and

(iii) composition in 𝒟 has the same meaning as in 𝒞.

(b) An initial object of 𝒟 is called a coproduct of A and B in 𝒞. Prove that coproducts of any two objects exist in the categories 𝒮, 𝒢 and ℒ(Λ). What is the corresponding set-theoretical concept in each of these three cases?

5. Let 𝒞 and 𝒟 be two categories. Prove that a category 𝒞 × 𝒟 is defined by specifying that

(i) objects of 𝒞 × 𝒟 are ordered pairs (A, B) where A ∈ 𝒞 and B ∈ 𝒟,

(ii) morphisms (A, B) → (A′, B′) of 𝒞 × 𝒟 are ordered pairs (A → A′, B → B′) of morphisms,

(iii) composition in 𝒞 × 𝒟 is obtained by forming compositions of components.

𝒞 × 𝒟 is called the product of 𝒞 and 𝒟.

6. Let S: 𝒞 → 𝒞′ and T: 𝒟 → 𝒟′ be covariant functors. Prove that S × T: 𝒞 × 𝒟 → 𝒞′ × 𝒟′ defined by

(S × T)(A, B) = (S(A), T(B)) for every object (A, B),

(S × T)(f, g) = (S(f), T(g)) for every morphism (f, g),

is a covariant functor.


7. Let F: ℒ(Λ) → 𝒮 be the forgetful functor and let G: 𝒮 → ℒ(Λ) be the functor defined in §8B which takes every set S to the free linear space F_S generated by S over Λ. Prove that for every X ∈ ℒ(Λ) and every S ∈ 𝒮,

t_{X,S}: Map(S, F(X)) → Hom(G(S), X),

which takes every mapping φ to its linear extension φ̄: F_S → X, is a bijective mapping. Prove also that the following diagrams (where the vertical mappings are those induced by a mapping S′ → S and by a linear transformation X → X′ respectively) are commutative:

    Map(S, F(X)) ----> Hom(G(S), X)        Map(S, F(X)) ----> Hom(G(S), X)
         |                  |                    |                  |
         v                  v                    v                  v
    Map(S′, F(X)) ---> Hom(G(S′), X)       Map(S, F(X′)) ---> Hom(G(S), X′)


CHAPTER III AFFINE GEOMETRY

To define the basic notions of geometry, we can follow the so-called synthetic approach by postulating geometric objects (e.g. points, lines and planes) and geometric relations (e.g. incidence and betweenness) as primitive undefined concepts and proceed to build up the geometry from a number of axioms which are postulated to govern and regulate these primitive concepts. No doubt this approach is most satisfactory from the aesthetic as well as from the logical point of view. However it will take quite a few axioms to develop the subject beyond a few trivial theorems, and the choice of a system of axioms which are both pedagogically suitable and mathematically interesting does present some difficulty at this level.

One alternative to the synthetic approach is the so-called analytic (or algebraic) approach which puts the geometry on an algebraic basis by defining geometric objects in terms of purely algebraic concepts. This approach has the advantage of having enough properties implicitly built into the definitions, so that the axioms of the synthetic geometry become theorems in the analytic geometry. Thus the initial difficulties encountered in the synthetic approach are circumvented. This approach is also more appropriate to the present mainly algebraic course since it lends itself to bringing out clearly the interplay of algebraic and geometric ideas.

§9. Affine Space

As a prototype affine space, we shall take the ordinary plane E. Here we shall concern ourselves chiefly with incidence and parallelism in E and not too much with its metric properties, i.e. properties that involve length, angle measurement and congruence.

To study the geometry of E algebraically, we can first choose a pair of rectangular coordinate axes and then proceed to identify the points of E with elements of the set R × R. This is the familiar method followed by secondary school coordinate geometry and we have seen that geometrical theories have been greatly assisted by the use of coordinates. Unfortunately the success of these tools has often over-shadowed the main interest of the geometry itself, which lies


only in results which are invariant under changes of the coordinate systems.

It is therefore desirable to begin our study of affine geometry with a coordinate-free definition of affine space. We have seen at the beginning of Chapter I that by choosing an arbitrary point O as the origin, we obtain a one-to-one correspondence between the set E and the linear space V of all vectors of E with common initial point O. To identify points of E with vectors of V would be just as unsatisfactory as with vectors of R², since this would give undue prominence to the point O, which is to be identified with the zero vector of V. Consider now the linear space V′ of all vectors of E with common initial point P, where P is an arbitrary point of E, and proceed to compare the linear spaces V and V′ geometrically. In the affine geometry of E, not only are points regarded as equivalent among themselves but also parallel vectors are regarded as equivalent, since a vector can be taken to a parallel vector by an allowable transformation. Now this relation between parallel vectors can be formulated by means of an isomorphism φ_P: V′ → V defined as follows. For any vector x = (P, Q) of V′, we define φ_P(x) = (O, R) as the vector of V such that the equation x + (P, O) = (P, R) holds in V′; in other words, PQRO is a parallelogram in the ordinary plane E.


The relation between ordered pairs of points of E and vectors of V can now be formulated by means of a mapping τ: E × E → V defined by

τ(P, Q) = φ_P((P, Q)) for all P, Q ∈ E.

We note that the dependence of V on the choice of the origin O can be regarded as merely a matter of notation so far as the mapping τ is concerned.

The mapping τ is easily seen to satisfy the following properties:

(1) τ(P, Q) + τ(Q, R) = τ(P, R) for all P, Q, R ∈ E;

(2) for each P ∈ E, the mapping τ_P: E → V, defined by τ_P(Q) = τ(P, Q) for all Q ∈ E, is bijective.


A. Points and vectors

Following the intuitive discussion above, we shall begin our study of affine geometry with a formal definition of affine space.

DEFINITION 9.1. Let X be a linear space over Λ. An affine space attached to X is an ordered pair (A, τ) where A is a non-empty set and τ: A² → X is a mapping such that the following conditions are satisfied:

[Aff 1] τ(P, Q) + τ(Q, R) = τ(P, R) for any three elements P, Q and R of A;

[Aff 2] for each element P of A the mapping τ_P: A → X, such that τ_P(Q) = τ(P, Q) for all Q ∈ A, is bijective.

It follows from the definition that an affine space (A, τ) attached to a linear space X over Λ consists of a non-empty set A, a linear space X over Λ and a mapping τ that satisfies the axioms [Aff 1] and [Aff 2]; however, if there is no danger of misunderstanding, we shall simply use A as an abbreviation of (A, τ) and say that A is an affine space. The dimension dim A of an affine space A attached to the linear space X is defined as dim X. It is convenient to refer to elements of the set A as points of the affine space A and elements of the linear space X as vectors of the affine space A, while elements of Λ are called scalars as usual. For any two points P and Q of the affine space A we write PQ = τ(P, Q) and call P and Q the initial point and the endpoint of the vector PQ respectively.

Under these notations, we can give the axioms [Aff 1] and [Aff 2] a more geometric formulation:

[Aff 1] PQ + QR = PR for any three points P, Q and R of the affine space A.

[Aff 2] For any point P and any vector x of the affine space A there is one and only one point Q of A such that PQ = x.

By the second axiom above, we can denote by P + x the unique point Q of A such that PQ = x.

Let us now verify some basic properties of points and vectors of an affine space.


THEOREM 9.2. Let A be an affine space and P, Q any two points of A. Then

(i) PP = 0,
(ii) PQ = -QP, and
(iii) P = Q if PQ = 0.

PROOF. (i) and (ii) follow immediately from a substitution of Q = P and a substitution of R = P in [Aff 1] respectively. To prove (iii) we observe that PQ = PP = 0. Therefore P = Q follows from [Aff 2].

Before we study some examples of affine spaces, let us observe that the mapping τ: A² → X which defines the structure of an affine space on the set A is not a composition law in the sense of §1A, and hence the affine space (A, τ) is not an algebraic system in the sense of §1A. However, for each affine space A we can define an external composition law σ: A × X → A in the following way:

σ(P, x) = P + x for all P ∈ A and all x ∈ X.

This composition law σ obviously satisfies the following requirements:

(a) for any P and Q in A there exists x ∈ X such that σ(P, x) = Q;

(b) σ(P, x + y) = σ(σ(P, x), y) for all P ∈ A and x, y ∈ X;

(c) for any x ∈ X, x = 0 if and only if σ(P, x) = P for all P ∈ A.

In this way we get an algebraic system (A, σ) associated to the affine space A.

Conversely if (A, σ) is an algebraic system where A is a non-empty set, X is a linear space over Λ and σ: A × X → A satisfies the requirements (a), (b) and (c), then we verify by straightforward calculation that for all P ∈ A the mapping x → σ(P, x) is a bijective mapping of X onto A. In other words, for any two elements P and Q of A there is a unique vector x of X such that Q = σ(P, x). Therefore we get a mapping τ: A² → X such that

τ(P, Q) = x where Q = σ(P, x).

It is easy to verify that (A, τ) is an affine space attached to X and furthermore σ(P, x) = P + x for all P ∈ A and x ∈ X.


EXAMPLE 9.3. For each linear space X over Λ we define a mapping τ: X² → X by setting

τ(x, y) = y - x for all x, y ∈ X.

The axioms [Aff 1] and [Aff 2] are easily verified, and therefore (X, τ) is an affine space attached to X. We call this the canonical affine space attached to X and denote it by Aff(X). In particular, the affine spaces Aff(Rⁿ) and Aff(Cⁿ) are the affine spaces we study in ordinary coordinate geometry.

EXAMPLE 9.4. Consider a subspace Y of a linear space X over Λ. If z is a fixed vector of X, then the subset A = {x ∈ X: x - z ∈ Y} of X is an element of the quotient space X/Y. Moreover A does not, in general, constitute a subspace unless z = 0, and in this case A = Y. If we define τ: A² → Y by τ(x, y) = y - x for all x, y ∈ A, then (A, τ) is an affine space attached to Y. We can illustrate this state of affairs by taking X = R², Y = {(α₁, α₂): α₁ = α₂} and A = {(α₁, α₂): α₂ = α₁ - 1}.

Fig. 5

EXAMPLE 9.5. Let X be a linear space over Λ and A an affine space attached to X. Unlike the zero vector 0, which plays a distinguished role in the algebraic structure of X, no particular point of the affine space A distinguishes itself from the other points of A. However if we choose a fixed point P of A as a point of reference, then by means of the bijective mapping τ_P: A → X (τ_P(Q) = PQ) we can identify the set A with the set X. As a result of this identification, A


becomes a linear space over Λ with respect to the following addition and multiplication:

Q + R = S where PS = PQ + PR,

λQ = T where PT = λPQ.

We call this linear space A the linear space obtained by choosing P as the origin. The origin P now plays the role of the zero vector and the mapping τ_P is now an isomorphism.

B. Barycentre

Corresponding to the concept of linear combination of vectors of a linear space, we have here in affine geometry the concept of barycentre of points of an affine space. With the aid of this concept we shall be able to formulate most of our results in affine geometry in terms of points of the affine space alone.

Let us consider k (k > 0) points P₁, P₂, ..., Pₖ of an affine space A and k scalars λ₁, λ₂, ..., λₖ of Λ such that λ₁ + λ₂ + ... + λₖ = 1. Choose an arbitrary point P of A and consider the linear combination λ₁PP₁ + λ₂PP₂ + ... + λₖPPₖ. By axiom [Aff 2], there exists a unique point R of the affine space A such that

PR = λ₁PP₁ + λ₂PP₂ + ... + λₖPPₖ.

We want to show that the point R is uniquely determined by the points Pᵢ and the scalars λᵢ, and is actually independent of the choice of the point P. Indeed, for any point Q of A, we get

QR = QP + PR
   = (λ₁ + λ₂ + ... + λₖ)QP + (λ₁PP₁ + λ₂PP₂ + ... + λₖPPₖ)
   = λ₁(QP + PP₁) + λ₂(QP + PP₂) + ... + λₖ(QP + PPₖ)
   = λ₁QP₁ + λ₂QP₂ + ... + λₖQPₖ.

Therefore the point R does not depend on the choice of the point P; in particular, by choosing P = R, we see that the point R is uniquely determined by the equation

λ₁RP₁ + λ₂RP₂ + ... + λₖRPₖ = 0.

Hence the following definition is justified.


DEFINITION 9.6. Let X be a linear space over Λ and A an affine space attached to X. For any k (k > 0) points P₁, P₂, ..., Pₖ of A and any k scalars λ₁, λ₂, ..., λₖ such that λ₁ + λ₂ + ... + λₖ = 1, the unique point R of A such that λ₁RP₁ + λ₂RP₂ + ... + λₖRPₖ = 0 is called the barycentre of the points P₁, P₂, ..., Pₖ corresponding to the weights λ₁, λ₂, ..., λₖ. In this case we write

R = λ₁P₁ + λ₂P₂ + ... + λₖPₖ.

We observe that in writing R = λ₁P₁ + λ₂P₂ + ... + λₖPₖ, i.e., R is the barycentre of the Pᵢ corresponding to the λᵢ, the condition that λ₁ + λ₂ + ... + λₖ = 1 must be fulfilled as it is required by the definition. The following example will illustrate this state of affairs. In the ordinary plane, let P₁ and P₂ be two distinct points. If λ₁ = λ₂ = 1 (so that λ₁ + λ₂ ≠ 1) and if we take two distinct points P and P′ as points of reference, then the points R and R′ determined respectively by

PR = PP₁ + PP₂ and P′R′ = P′P₁ + P′P₂

are also distinct.

Fig. 6

Hence the process does not give a unique point corresponding to P₁ and P₂, so that we cannot use it to attach a meaning to P₁ + P₂.

On the other hand if λ₁ = λ₂ = ½ (so that λ₁ + λ₂ = 1), the point

M = ½P₁ + ½P₂


has a definite meaning, i.e., the barycentre of P₁ and P₂ corresponding to the weights ½ and ½. In fact M is the midpoint of the points P₁ and P₂ in the ordinary sense.

In general, for any k points P₁, P₂, ..., Pₖ of an affine space, the barycentre

C = (1/k)P₁ + (1/k)P₂ + ... + (1/k)Pₖ

is called the centroid of the points P₁, P₂, ..., Pₖ.

For k = 3, we get

C = (1/3)P₁ + (1/3)P₂ + (1/3)P₃

and therefore

P₃C = (1/3)(P₃P₁ + P₃P₂).

Hence, if P₁, P₂, P₃ are three points on the ordinary plane then the centroid C is the centroid of the triangle P₁P₂P₃ in the ordinary sense.

Fig. 7
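The computation above can be replayed numerically: the construction of R from an auxiliary point P gives a result independent of P exactly when the weights sum to 1. A Python sketch (ours):

    import numpy as np

    def barycentre(points, weights, ref):
        """Point R with ref->R = sum of w_i * (ref->P_i); independent of
        the reference point ref only when the weights sum to 1."""
        v = sum(w * (p - ref) for w, p in zip(weights, points))
        return ref + v

    P1, P2 = np.array([0.0, 0.0]), np.array([2.0, 4.0])

    # Weights 1/2, 1/2 (sum 1): both reference points give the midpoint.
    m1 = barycentre([P1, P2], [0.5, 0.5], ref=np.array([7.0, -3.0]))
    m2 = barycentre([P1, P2], [0.5, 0.5], ref=np.array([-1.0, 9.0]))
    assert np.allclose(m1, m2)          # the midpoint (1, 2)

    # Weights 1, 1 (sum 2): the "result" depends on the reference point.
    r1 = barycentre([P1, P2], [1.0, 1.0], ref=np.array([7.0, -3.0]))
    r2 = barycentre([P1, P2], [1.0, 1.0], ref=np.array([-1.0, 9.0]))
    assert not np.allclose(r1, r2)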

For formal calculations with barycentres, a distributive law holds:

(*)  Σᵢ μᵢ(Σⱼ λᵢⱼPⱼ) = Σⱼ(Σᵢ μᵢλᵢⱼ)Pⱼ

where i runs from 1 to p and j from 1 to q, and where, for all i = 1, ..., p and j = 1, ..., q, Pⱼ is a point of an affine space A and μᵢ, λᵢⱼ are scalars such that

Σᵢ μᵢ = 1 and Σⱼ λᵢⱼ = 1 for all i = 1, ..., p.


We observe that Qᵢ = Σⱼ λᵢⱼPⱼ is a barycentre for all i = 1, ..., p, and hence on the left-hand side of (*) we get the barycentre Q = Σᵢ μᵢQᵢ which is characterized by

Σᵢ μᵢQQᵢ = 0.

The assumption on the scalars μᵢ and λᵢⱼ implies that Σᵢ,ⱼ μᵢλᵢⱼ = 1, and hence on the right-hand side of (*) we get the barycentre P = Σⱼ(Σᵢ μᵢλᵢⱼ)Pⱼ which is characterized by

Σⱼ(Σᵢ μᵢλᵢⱼ)PPⱼ = 0.

Therefore, by the distributive law of linear combinations of vectors, we get

PQ = Σᵢ μᵢPQᵢ = Σᵢ μᵢ(Σⱼ λᵢⱼPPⱼ) = Σⱼ(Σᵢ μᵢλᵢⱼ)PPⱼ = 0

and hence P = Q, proving the distributive law (*) of barycentres.

It is now straightforward to extend Definition 9.6 of barycentre to

a definition of barycentre of an arbitrary non-empty family of points of an affine space. Let (Pᵢ)_{i∈I} be a non-empty family of points of an affine space A and (λᵢ)_{i∈I} a family of scalars of finite support such that Σ_{i∈I} λᵢ = 1. Then there exists a unique point R of the affine space A such that

PR = Σ_{i∈I} λᵢPPᵢ for all P ∈ A,

or equivalently

Σ_{i∈I} λᵢRPᵢ = 0.

R is then called the barycentre of the points Pᵢ corresponding to the weights λᵢ (i ∈ I), and we write

R = Σ_{i∈I} λᵢPᵢ.

C. Linear varieties

With the aid of the concept of barycentre, which plays a role in affine geometry similar to that played by the concept of linear combinations in linear algebra, we are in a position to generalize the ordinary notions of "straight line" and "plane" in elementary geometry.

DEFINITION 9.7. Let X be a linear space over Λ and A an affine space attached to X. A subset V of A is a linear variety of the affine space A if for any k (k > 0) points P₁, ..., Pₖ of V and any k scalars λ₁, ..., λₖ of Λ such that λ₁ + ... + λₖ = 1, the barycentre λ₁P₁ + ... + λₖPₖ belongs to V.

It follows from the definition that the empty set ∅ and the set A itself are linear varieties of the affine space A. The intersection of any family of linear varieties of A is clearly also a linear variety of A. However, the union of linear varieties is, in general, not a linear variety.

EXAMPLE 9.8. Let A be an affine space and S a subset of A. It follows immediately from the distributive law of barycentres that the subset V of A consisting of all barycentres P of points of S (i.e., P = Σ_{i∈I} λᵢPᵢ where Pᵢ ∈ S and (λᵢ)_{i∈I} is a family of finite support such that Σ_{i∈I} λᵢ = 1) constitutes a linear variety of the affine space A. We call V the linear variety of A generated or spanned by points of S. Clearly V is the smallest linear variety (in the sense of inclusion) that includes the given set S. In particular the empty linear variety ∅ is generated by the empty set ∅, and the linear variety generated by a singleton subset {P} is {P} itself. For practical purposes, we usually make no distinction between the linear variety {P} and the point P.

Let us now examine the formal relations between the linear varieties of an affine space A and the subspaces of the linear space X to which A is attached.

THEOREM 9.9. Let X be a linear space over Λ and A an affine space attached to X. For any non-empty subset V of A the following properties are equivalent:

(i) V is a linear variety of A;

(ii) for any fixed point P of V the set of all vectors PQ, where Q ∈ V, is a subspace of X;

(iii) there exists a point P of V such that the set of all vectors PQ, where Q ∈ V, is a subspace of X.


PROOF. (i) ⇒ (ii). Since PP = 0, the set of vectors in question is clearly non-empty. It remains to be shown that for any two points Q, R of V and any two scalars λ, μ of Λ there exists a point S in V such that PS = λPQ + μPR. Since P, Q and R belong to V and V is a linear variety, the barycentre S = (1 - λ - μ)P + λQ + μR belongs to V and satisfies the equation PS = λPQ + μPR.

The implication (ii) ⇒ (iii) is trivial.

(iii) ⇒ (i). Let P₁, ..., Pₖ be k points of V and λ₁, ..., λₖ k scalars of Λ such that λ₁ + ... + λₖ = 1. We have to show that the barycentre R = λ₁P₁ + ... + λₖPₖ belongs to V. By (iii) there exists a point Q in V such that PQ = λ₁PP₁ + ... + λₖPPₖ. On the other hand, PR = λ₁PP₁ + ... + λₖPPₖ; therefore PR = PQ. Hence R = Q belongs to V.

If P and P′ are any two points of a non-empty linear variety V of A, then by 9.9 we get two subspaces Y = {PQ: Q ∈ V} and Y′ = {P′Q: Q ∈ V} of X. Now it follows from the equation

PQ = PP′ + P′Q

that Y ⊆ Y′. Similarly Y′ ⊆ Y, and hence Y′ = Y. Moreover it follows that Y = {PQ: P ∈ V and Q ∈ V}; in other words Y is the subspace of X consisting of all vectors whose initial points and endpoints both belong to V. Therefore the non-empty linear variety V can be regarded as an affine space attached to the linear space Y. Consequently the non-empty linear varieties of an affine space A can be regarded as the "subspaces" of the affine space A.

We define the direction of a non-empty linear variety V of an affine space A as the subspace Y = {PQ: P ∈ V and Q ∈ V}. Thus the direction of a point is the zero linear space 0, while the direction of the entire affine space A is the linear space X.

It is easily verified that given a subspace Y of X and a point P of A, there exists a unique linear variety V with direction Y and passing through (i.e. containing) P. In particular if A = Aff(X), then the linear varieties of A with a fixed direction Y are precisely the elements of the quotient space X/Y.

With the aid of directions, we can define parallel linear varieties and the dimension of a linear variety. Two non-empty linear varieties V₁ and V₂ with directions Y₁ and Y₂, respectively, are said to be parallel if Y₁ ⊇ Y₂ or Y₁ ⊆ Y₂. The dimension of a non-empty variety V (notation: dim V) is defined to be the dimension of its direction. For the empty variety ∅ we put dim ∅ = -1. By Theorem 4.4, we get the following theorem.

THEOREM 9.10. If a linear variety V is included in a linear variety W, then dim V ≤ dim W. Furthermore dim V = dim W if and only if V = W.

In accordance with the usual notation in elementary geometry, we call 1-dimensional linear varieties lines and 2-dimensional linear varieties planes. An (n-1)-dimensional linear variety of an n-dimensional affine space A is called a hyperplane of A. In general if A is an affine space attached to X and V a linear variety of A with direction Y and if dim(X/Y) = 1, then we say that V is a hyperplane of A.

D. Lines

We now wish to see whether the points and lines of an affine space have the usual properties of the "points" and "lines" in elementary geometry. As in elementary geometry, for any point P and any line L of an affine space A we say that P lies on L or L passes through P if P ∈ L.

By definition the direction of a line contains non-zero vectors;therefore on a line of an affine space there lie at least two distinctpoints.

If P and Q are two distinct points of an affine space A, then thelinear variety L generated by P and Q consists of all barycentresR = ( 1 - X) P + XQ. Thus P i t = XP ; but on the other hand P? * 0 andtherefore the direction of L has the dimension 1. Hence L is a linepassing through P and Q. By 9.10, L is the only line passing throughP and Q. Therefore through any two distinct points of an affinespace there passes one and only one line. Consequently it followsthat two lines of an affine space are equal if and only if they havetwo distinct points in common.

Similarly, we can prove that given a line L and a point P of anaffine space there passes through P one and only one line parallel toL.

In Theorem 9.9 we have an algebraic characterization of linearvarieties in terms of vectors; the next theorem gives a geometriccharacterization of linear varieties in terms of lines.


THEOREM 9.11. A subset V of an affine space A is a linear variety of A if and only if for any two different points P and Q of V the line passing through them is included in V.

PROOF. The condition is clearly necessary for V to be a linear variety. To prove the sufficiency, we may assume V to be non-empty, for otherwise the theorem is true trivially. Let P be a point of V. Then by 9.9 we need only show that the vectors PQ, where Q ∈ V, form a subspace of the linear space X to which A is attached. In other words, we prove that for any two points Q and R of V and for any scalar λ

(i) PS = λPQ for some S ∈ V,

(ii) PT = PQ + PR for some T ∈ V.

Now, if Q = P, then (i) is trivial. Suppose Q ≠ P; then S = (1 - λ)P + λQ lies on the line passing through P and Q. Therefore S belongs to V and satisfies λPQ = PS. To prove (ii), let T = 2M - P where M = ½Q + ½R is the midpoint of Q and R. T belongs to V; furthermore PT = 2PM = 2(½PQ + ½PR) = PQ + PR. This completes the proof of the theorem.

E. Base

Consider k + 1 points P₀, P₁, ..., Pₖ of an affine space where k > 0. If V is the linear variety generated by the points Pᵢ (i = 0, 1, ..., k) and Y is the direction of V, then the linear space Y is generated by the k vectors P₀P₁, P₀P₂, ..., P₀Pₖ. By definition dim V = dim Y; therefore dim V ≤ k; furthermore dim V = k if and only if the k vectors P₀P₁, P₀P₂, ..., P₀Pₖ are linearly independent. These results lead us naturally to a definition of linear independence of points of an affine space.

DEFINITION 9.12. A family (P₀, P₁, ..., Pₖ) of k + 1 points of an affine space, where k ≥ 0, is linearly independent if the linear variety generated by the points Pᵢ (i = 0, 1, ..., k) has dimension k; otherwise, it is linearly dependent.

In the usual way, an infinite linearly independent family of points can be defined. Similar to the situation in §2B, we can also say that the points P₀, P₁, ..., Pₖ are linearly independent if the family (P₀, P₁, ..., Pₖ) of points is linearly independent. Thus a single point P₀ is always linearly independent, and so are also two distinct points P₀ and P₁. Three points P₀, P₁ and P₂ are linearly dependent if and only if they are collinear, i.e., they lie on one and the same line. Finally it is easy to verify that subfamilies of a linearly independent family of points are linearly independent.

With the aid of the notion of linear independence, the concept of a base of a linear variety is defined, modelling on Theorem 2.6.

DEFINITION 9.13. A family (P₀, P₁, ..., Pₖ) of points of a linear variety V is a base of V if it is linearly independent and the points Pᵢ (i = 0, 1, ..., k) generate V.

Bases of infinite-dimensional linear varieties are defined similarly.

With respect to a base, coordinates can be assigned uniquely to points of a linear variety. This state of affairs is dealt with in the following theorem.

THEOREM 9.14. Let (P₀, P₁, ..., Pₖ) be a family of points of a linear variety V. The following statements are equivalent:

(i) the family (P₀, P₁, ..., Pₖ) is a base of V;

(ii) every point Q of V has a unique representation Q = λ₀P₀ + λ₁P₁ + ... + λₖPₖ, where λ₀ + λ₁ + ... + λₖ = 1, as a barycentre of the points P₀, P₁, ..., Pₖ.

PROOF. (i) ⇒ (ii). Since the points Pᵢ generate V, every point Q of V is a barycentre Q = λ₀P₀ + λ₁P₁ + ... + λₖPₖ. The points P₀, P₁, ..., Pₖ are linearly independent; therefore the vectors P₀P₁, ..., P₀Pₖ are linearly independent. Hence it follows from the equation P₀Q = λ₁P₀P₁ + ... + λₖP₀Pₖ that the k scalars λ₁, ..., λₖ are uniquely determined by Q. But so is λ₀ since λ₀ + λ₁ + ... + λₖ = 1.

(ii) ⇒ (i). Since every point Q of V is a barycentre of the points P₀, P₁, ..., Pₖ, the k + 1 points Pᵢ generate V. Hence the k vectors P₀P₁, ..., P₀Pₖ generate the direction Y of V; and it remains to be proved that these vectors are linearly independent. Suppose μ₁P₀P₁ + ... + μₖP₀Pₖ = 0. Then putting μ₀ = 1 - (μ₁ + ... + μₖ), we get μ₀P₀P₀ + μ₁P₀P₁ + ... + μₖP₀Pₖ = 0, and hence P₀ = μ₀P₀ + μ₁P₁ + ... + μₖPₖ. By (ii), μ₀ = 1 and μ₁ = μ₂ = ... = μₖ = 0. Therefore the vectors P₀P₁, ..., P₀Pₖ are linearly independent.


Consequently, with respect to a base (P₀, P₁, ..., Pₖ) of the linear variety V every point Q determines a unique family of scalars (λ₀, λ₁, ..., λₖ) such that

Q = λ₀P₀ + λ₁P₁ + ... + λₖPₖ and λ₀ + λ₁ + ... + λₖ = 1;

and vice versa. We can therefore call the family (λ₀, λ₁, ..., λₖ) of k + 1 scalars the barycentric coordinates of the point Q relative to the base (P₀, P₁, ..., Pₖ). We note here that the number k + 1 of barycentric coordinates of a point of V is by 1 greater than the dimension dim V = k of V and that the barycentric coordinates are subject to the condition that their sum is equal to 1.

In the ordinary plane E, any three non-collinear points P₀, P₁ and P₂ form a base (P₀, P₁, P₂) of E. Relative to this base the points P₀, P₁, P₂ have barycentric coordinates (1, 0, 0), (0, 1, 0), (0, 0, 1) respectively. The midpoint M of P₁ and P₂ has barycentric coordinates (0, ½, ½) while the centroid C of the triangle P₀P₁P₂ has barycentric coordinates (1/3, 1/3, 1/3).


Let A be again an affine space and (P₀, ..., Pₖ) a base of A. If Q is an arbitrary point of A, then the parallel coordinates of the point Q of the affine space A relative to the base (P₀, ..., Pₖ) are defined to be the family (μ₁, ..., μₖ) of k scalars such that

P₀Q = μ₁P₀P₁ + ... + μₖP₀Pₖ.


Here the point P₀ plays the role of the origin of the coordinate system. Clearly this method is essentially the same as that of secondary school coordinate geometry. For example for the points P₀, P₁, P₂ and M of E above, we get (½, ½) as the parallel coordinates of M relative to (P₀, P₁, P₂).
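Both kinds of coordinates are easy to compute in Aff(R²); the barycentric coordinates arise from the parallel coordinates by adjoining λ₀ = 1 - μ₁ - ... - μₖ. A Python sketch (ours):

    import numpy as np

    # Base (P0, P1, P2) of the plane and an arbitrary point Q.
    P0, P1, P2 = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])
    Q = np.array([0.5, 0.5])            # the midpoint M of P1 and P2

    # Parallel coordinates (mu1, mu2): solve P0Q = mu1*P0P1 + mu2*P0P2.
    basis = np.column_stack([P1 - P0, P2 - P0])
    mu = np.linalg.solve(basis, Q - P0)
    print(mu)                           # [0.5 0.5]

    # Barycentric coordinates (lam0, lam1, lam2): lam_i = mu_i for i >= 1
    # and lam0 = 1 - mu1 - mu2, so that the weights sum to 1.
    lam = np.concatenate([[1.0 - mu.sum()], mu])
    print(lam)                          # [0.  0.5 0.5]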

F. Exercises

1. Four points P, Q, R, S in an affine space are said to form a parallelogram PQRS if PQ = SR. Show that if PQRS is a parallelogram, then the midpoint of P and R and the midpoint of Q and S coincide.

2. Let T, U, V, W be four points in an affine space. Let the points P, Q, R, S be the midpoints of T and U, U and V, V and W, W and T respectively. Prove that PQRS is a parallelogram.

3. Let (P₀, P₁, ..., Pₙ) be a base of an n-dimensional affine space. Let (x₁, ..., xₙ) denote the parallel coordinates of an arbitrary point P of the affine space relative to this base. Show that the set of points whose parallel coordinates satisfy the linear equations

(*)  α₁₁x₁ + ... + α₁ₙxₙ = β₁
     . . . . . . . . . . . .
     αₘ₁x₁ + ... + αₘₙxₙ = βₘ

is a linear variety. Conversely, prove that if L is a linear variety, then scalars αᵢⱼ and βᵢ can be found such that the points of L are precisely those points whose parallel coordinates satisfy the linear equations (*).


4. Let L be a linear variety of an affine space and P a point not on L.

(a) Consider the set of all lines passing through P and a point of L. Let M be the set of all points on these lines. Is M a linear variety?

(b) Let λ be a fixed scalar. Let N be the set of all points S such that PS = λPQ for some point Q of L. Is N a linear variety?

5. Determine all pairs (p, q) of natural numbers for which a pair L, M of linear varieties of an n-dimensional affine space exists such that dim L = p, dim M = q and L ∩ M = ∅.

6. Let L and M be linear varieties of an affine space and let N be the set of barycentres λP + μQ where P ∈ L and Q ∈ M. Show that N contains every line joining a point of L to a point of M. Show that N is not necessarily a linear variety.

7. Let S and T be skew lines in a 3-dimensional affine space. Show that there exists a unique plane P_S containing S and parallel to T, and a unique plane P_T containing T and parallel to S. Show also that P_S and P_T are parallel.

8. Let X be a linear space and xᵢ ∈ X for i = 1, ..., r. Denote by Y the subspace of X generated by the r-1 vectors xᵢ - x₁ (i = 2, ..., r). Denote by L the linear variety of Aff(X) spanned by the points x₁, ..., xᵣ of Aff(X). Show that the points of L are precisely those vectors of X of the form x₁ + y where y ∈ Y. Deduce that Y is the direction of L.

9. Let P₁, ..., Pᵣ, Pᵣ₊₁, ..., Pₛ be points of an affine space. Show that the line through the centroid of P₁, ..., Pᵣ and the centroid of Pᵣ₊₁, ..., Pₛ passes through the centroid of P₁, ..., Pₛ. Consider the different cases for s = 4.

10. Let P and Q be two points of an affine space. By the segment PQ we understand the set of points of the form λP + μQ where λ, μ ≥ 0 and λ + μ = 1. A subset S of an affine space is said to be convex if for any two points P, Q of S all the points of the segment PQ belong to S. Give examples of convex subsets and examples of subsets which are not convex.

11. Show that if S is a subset of an affine space, then there is a smallest convex subset S̄ containing S. Show that S̄ is unique. (S̄ is called the convex closure of S.)


12. Let P₁, ..., Pᵣ be r points of an affine space. Prove that the convex closure of the subset {P₁, ..., Pᵣ} consists of all points of the form λ₁P₁ + ... + λᵣPᵣ where λᵢ ≥ 0 and λ₁ + ... + λᵣ = 1.

13. Let L be a 2-dimensional linear variety in a 3-dimensional affine space A. Show that the points of A\L are partitioned into two disjoint subsets, called half spaces, such that every pair of points of one and the same half space can be joined by a segment without passing through L.

14. Let L be a 2-dimensional linear variety in a 4-dimensional affine space A. Show that if P and Q are any two points of A\L then they can be joined by a series of consecutive segments PT₁, T₁T₂, ..., Tₙ₋₁Tₙ, TₙQ without passing through L.

§ 10. Affine Transformations

A. General properties

Let X and X′ be linear spaces over Λ, and let (A, τ) and (A′, τ′) be affine spaces attached to X and X′ respectively. We are now interested in mappings Φ of the set A into the set A′ which are compatible with the affine structures defined on the sets A and A′; these mappings are called affine transformations of the affine space A into the affine space A′. The requirements that an affine transformation Φ has to satisfy can be expressed in terms of vectors of X and X′ as follows. Every mapping Φ: A → A′ of sets gives rise to a mapping Φ × Φ: A × A → A′ × A′ defined by

(Φ × Φ)(P, Q) = (Φ(P), Φ(Q)) for all P, Q ∈ A.

We verify without much difficulty that a unique mapping φ: X → X′ exists such that τ′∘(Φ × Φ) = φ∘τ, i.e., the diagram

              Φ × Φ
    A × A ----------> A′ × A′
      |                  |
      τ                  τ′
      |                  |
      v         φ        v
      X  ----------->    X′

is commutative, if and only if the following condition is satisfied:

(i) for any four points P, Q, R, S of A, Φ(P)Φ(Q) = Φ(R)Φ(S) if PQ = RS.

In this case φ: X → X′ is given by

φ(PQ) = Φ(P)Φ(Q) for all P, Q ∈ A.

Now we say that Φ: A → A′ is an affine transformation if the condition (i) is satisfied and

(ii) the unique mapping φ: X → X′ such that φ(PQ) = Φ(P)Φ(Q) is a linear transformation.

This is a good but rather complicated definition of an affine transformation; we shall now formulate an equivalent geometric definition, i.e., in terms of points and barycentres.

We observe first that the equations of vectors involved in the conditions (i) and (ii) can be expressed as equations of points and barycentres. In fact for any six points P, Q, R, S, T, U of an affine space A the following statements hold:

(a) PQ = RS if and only if S = Q + R - P;
(b) PQ + RS = PT if and only if T = Q + S - R;
(c) λPQ = PU if and only if U = (1 - λ)P + λQ.

Therefore we can write the condition (i) as

(iii) Φ(S) = Φ(Q) + Φ(R) - Φ(P) if S = Q + R - P;

and, similarly, the condition (ii) as

(iv) Φ(T) = Φ(Q) + Φ(S) - Φ(R) if T = Q + S - R;
     Φ(U) = (1 - λ)Φ(P) + λΦ(Q) if U = (1 - λ)P + λQ.

Thus we see that (iii), (iv) and hence also (i), (ii) are satisfied if Φ takes barycentres into barycentres with the corresponding weights unaltered. This leads to the following geometric definition of an affine transformation:

DEFINITION 10.1. Let A and A′ be affine spaces attached to the linear spaces X and X′ over Λ, respectively. A mapping Φ: A → A′ of the set A into the set A′ is an affine transformation of the affine space A into the affine space A′ if

Φ(λ₁P₁ + ... + λₖPₖ) = λ₁Φ(P₁) + ... + λₖΦ(Pₖ)


for any k (k > 0) points P₁, ..., Pₖ of A and any k scalars λ₁, ..., λₖ of Λ such that λ₁ + ... + λₖ = 1.

Clearly any affine transformation in the sense of Definition 10.1 satisfies (i) and (ii); therefore we get a linear transformation φ: X → X′ defined by

φ(PQ) = Φ(P)Φ(Q) for all P, Q ∈ A.

φ is called the associated linear transformation of the affine transformation Φ. To show that any mapping fulfilling (i) and (ii) is an affine transformation, we first prove the following theorem.

THEOREM 10.2. Let A and A′ be affine spaces attached to the linear spaces X and X′ over Λ respectively. If φ: X → X′ is a linear transformation and if P and P′ are points of A and A′ respectively, then there exists an affine transformation Φ such that Φ(P) = P′ and φ is the associated linear transformation of Φ.

PROOF. We define a mapping Φ: A → A′ by

Φ(Q) = P′ + φ(PQ) for all Q ∈ A,

i.e., φ(PQ) = P′Φ(Q). Clearly Φ(P) = P′, and if Φ is an affine transformation, then φ is its associated linear transformation. Therefore it remains to be shown that Φ is an affine transformation. Let Q = λ₁P₁ + ... + λₖPₖ, where λ₁ + ... + λₖ = 1. Then by definition of Φ, we get

P′Φ(Q) = φ(PQ) = φ(λ₁PP₁ + ... + λₖPPₖ) = λ₁P′Φ(P₁) + ... + λₖP′Φ(Pₖ).

Therefore Φ(Q) = λ₁Φ(P₁) + ... + λₖΦ(Pₖ) and hence Φ is an affine transformation.

Finally let Φ: A → A′ be a mapping fulfilling conditions (i) and (ii). Thus φ: X → X′ is the linear transformation defined by

φ(PQ) = Φ(P)Φ(Q) for all P, Q ∈ A.

Comparing Φ with the affine transformation of Theorem 10.2 determined by φ and the points P and Φ(P), we see that they are identical. Therefore Φ is an affine transformation.


B. The category of affine spaces

Let Φ: A → A′ and Φ′: A′ → A″ be affine transformations. Then it is easy to verify that the usual composite Φ′∘Φ: A → A″ is an affine transformation and, furthermore, its associated linear transformation is φ′∘φ, where φ and φ′ are, respectively, the associated linear transformations of Φ and Φ′.

The category 𝒜(Λ) of all affine spaces over Λ is then the category where

(i) objects are affine spaces attached to linear spaces over Λ;

(ii) morphisms are affine transformations;

(iii) composite of morphisms has the usual meaning; and

(iv) i_A: A → A is the identity mapping.

A covariant functor F: 𝒜(Λ) → ℒ(Λ) of the category of affine spaces over Λ into the category of linear spaces over Λ is given by defining F(A) to be the linear space X to which the affine space A is attached and F(Φ) to be the associated linear transformation φ of the affine transformation Φ.

Another covariant functor Aff: ℒ(Λ) → 𝒜(Λ) is defined by putting Aff(X) to be the canonical affine space attached to X (see Example 9.3) and Aff(φ) = φ. Then the composite F∘Aff of these two functors is the identity functor of the category ℒ(Λ), whereas the composite Aff∘F is different from the identity functor of the category 𝒜(Λ).

The isomorphisms of the category 𝒜(Λ) are the bijective affine transformations. To see this, we prove the following theorem.

THEOREM 10.3. Let Φ: A → A′ be a bijective affine transformation of the affine space A onto the affine space A′ and let φ be the associated linear transformation of Φ. Then the following statements hold:

(a) φ is an isomorphism;

(b) Φ⁻¹ is an affine transformation; and

(c) φ⁻¹ is the associated linear transformation of Φ⁻¹.

PROOF. (a) Let A and A′ be attached to X and X′ respectively. For each x′ ∈ X′ we can find P′, Q′ ∈ A′ such that P′Q′ = x′. If P, Q ∈ A are such that Φ(P) = P′ and Φ(Q) = Q′, then φ(PQ) = Φ(P)Φ(Q) = x′.


Therefore φ is surjective. Let R, S ∈ A be such that φ(RS) = 0. Then Φ(R)Φ(S) = 0 and hence Φ(R) = Φ(S). This means that R = S and hence RS = 0. Therefore φ is injective. This proves statement (a). Statements (b) and (c) are immediate consequences of (a) and the definitions.

A bijective affine transformation is also called an affinity. It follows from Theorem 10.3 that the set of all affinities of an affine space A onto itself constitutes a group with respect to the usual composition law of mappings. This group is called the affine group of the affine space A.

Finally we remark that the affinities treated above are special cases of a more general type of mappings of affine spaces. The interested reader is referred to §12A, B and C.


CHAPTER IV PROJECTIVE GEOMETRY

§ 11. Projective Space

A. Points at infinity

Let A and A′ be two distinct planes in the ordinary space, and let O be a point which is neither on A nor on A′. The central projection p of A into A′ with respect to the centre of projection O is defined as follows: for any Q on A we set p(Q) = Q′ if the points Q, Q′ and O are collinear (i.e. on a straight line). If A and A′ are parallel planes, then p is an affinity of the 2-dimensional affine space A onto the 2-dimensional affine space A′. In particular, p is a bijective mapping taking lines into lines, intersecting lines into intersecting lines and parallel lines into parallel lines.

Fig. 8


Consider now the case where A and A′ are not parallel. Here two lines, one on each plane, deserve our special attention. The plane which passes through O and is parallel to A′ intersects A in a straight line L, and the plane which passes through O and is parallel to A intersects A′ in a straight line L′. It is clear that the points on L have no image points on A′ under p, and the points on L′ are not image points of points on A under p. Therefore we have to exclude these exceptional points in order to obtain a well-defined bijective mapping p: A\L → A′\L′. The situation in relation to lines is equally unsatisfactory. Take a line G on A and suppose G ≠ L. Then the image points of G will lie on a line G′ of A′ different from L′, since G and G′ are on one and the same plane passing through O. Here the set of exceptional points is G ∩ L on L and G′ ∩ L′ on L′. For G = L, we do not have a corresponding line G′ on A′.

Fig. 9

It is now no longer true that intersecting lines correspond to intersecting lines and parallel lines correspond to parallel lines. To see


this, consider two lines G₁ and G₂ of A, neither of which is parallel to the line L. If G₁ and G₂ intersect at a point of L, then the corresponding G′₁ and G′₂ will be parallel; if G₁ and G₂ are parallel, then G′₁ and G′₂ will intersect at a point of L′.

In order to have a concise theory without all these awkward exceptions, we can - and this is a crucial step towards projective geometry - extend the plane A (and similarly the plane A′) by the adjunction of a set of new points called points at infinity. More precisely, we understand by a point at infinity of A the direction of a straight line of A, and by the projective extension P of A the set of all points of A together with all points at infinity of A. For convenience, we refer to elements of P as POINTS. Furthermore we define a LINE as either a subset of P consisting of all points of a straight line of A together with its direction, or the subset of all points at infinity of A. The LINE consisting of points at infinity is called the line at infinity of A. Thus the projective extension of a plane is obtained by adjoining to the plane the line at infinity of the plane.

We have no difficulty in proving that in the projective extension P of the plane A the following rules are true without exception:

(α) Through any two distinct POINTS there is exactly one LINE.

(β) Any two distinct LINES have exactly one POINT in common.

We observe that (α) holds also if one or both POINTS are at infinity, and (β) holds also if one of the LINES is at infinity.

These two incidence properties of the projective extension P stand out in conciseness and simplicity when they are compared with their counterparts (A1), (A2) and (B) of the plane A:

(A1) Through any two distinct points there is exactly one line.

(A2) Through any point there is just one line of any complete set of mutually parallel lines.

(B) Two distinct lines either have exactly one point in common or they are parallel, in which case they have no point in common.

We can now also extend the mapping p to a bijective mapping π: P → P′ by requiring (i) π(L) = I′, (ii) π(I) = L′ and (iii) that π preserves intersection of LINES, where I and I′ denote the LINES at infinity of A and A′ respectively. Instead of going into a detailed proof of the existence and uniqueness of π, we illustrate this state of affairs by considering the following particular case.


Let ξ₀, ξ₁, ξ₂ be cartesian coordinates in the ordinary 3-dimensional space. Let A and A′ be the planes ξ₀ = 1 and ξ₁ = 1 respectively, and let the centre of projection O be the point with coordinates (0, 0, 0). Then the exceptional lines L and L′ of the projection p are given by the equations

L: ξ₀ = 1, ξ₁ = 0    and    L′: ξ₀ = 0, ξ₁ = 1

respectively. Under the projection p, a point Q with coordinates (1, ξ₁, ξ₂) outside of L (i.e. ξ₁ ≠ 0) is taken to the point p(Q) with coordinates (1/ξ₁, 1, ξ₂/ξ₁). Suppose the extensions of A to P, of A′ to P′ and of p to π are carried out according to the description given above. For the point Q above, we get π(Q) = p(Q). A point S on L with coordinates (1, 0, α) will be taken to the direction of the line passing through (0, 1, 0) and (1, 1, α). If T is a point at infinity of A (say T is the direction of the line passing through (1, 0, 0) and (1, 1, α)), then π(T) is the point on L′ with coordinates (0, 1, α). It is easy to verify that π is a bijection of P onto P′ taking LINES into LINES and is compatible with incidence.


The illustrative example above also suggests that POINTS of the projective extension have ordered triples (subject to a certain restriction) rather than ordered pairs of real numbers as coordinates. In fact in the projective extension P of the plane A (ξ₀ = 1) the points of A have coordinates of the form (1, ξ₁, ξ₂) and the points at infinity can be given coordinates of the form (0, 1, ξ₂), i.e. the coordinates of their images under π. Removing the condition that one of the coordinates is 1, we may allow each POINT of the projective extension P to have more than one set of coordinates by requiring that (ξ₀, ξ₁, ξ₂) and (η₀, η₁, η₂) are coordinates of one and the same POINT of P if they have the same ratio, i.e. ξᵢ = ληᵢ (i = 0, 1, 2) for some λ ≠ 0. This now suggests a preliminary algebraic definition of the projective plane as follows.

Every ordered triple (ξ₀, ξ₁, ξ₂) of real numbers, not all 0, represents a POINT, and every POINT may be represented in this fashion. The ordered triples (ξ₀, ξ₁, ξ₂) and (η₀, η₁, η₂) represent the same POINT if and only if there exists a number λ ≠ 0 such that ξᵢ = λη ᵢ for i = 0, 1, 2.

This definition is short and in accordance with customary terminology, but has the grave disadvantage of giving preference to a particular system of coordinates, a defect that will be removed in due course.
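For readers who wish to experiment numerically, the preliminary definition translates directly into a proportionality test on coordinate triples. The following Python fragment is a sketch of ours, not part of the text; the helper name same_point is our own.

    def same_point(u, v, eps=1e-12):
        """Decide whether the triples u and v represent the same POINT,
        i.e. whether u = lambda * v for some scalar lambda != 0."""
        if all(abs(c) < eps for c in u) or all(abs(c) < eps for c in v):
            raise ValueError("(0, 0, 0) represents no POINT")
        # u and v are proportional exactly when every 2x2 minor of the
        # 2x3 array [u; v] vanishes
        return all(abs(u[i] * v[j] - u[j] * v[i]) < eps
                   for i in range(3) for j in range(i + 1, 3))

    assert same_point((1.0, 2.0, 3.0), (-2.0, -4.0, -6.0))
    assert not same_point((1.0, 2.0, 3.0), (1.0, 2.0, 4.0))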

B. Definition of projective space

We now give a definition of a projective space in the language of linear algebra. In this definition we shall not make use of affine spaces, nor shall we distinguish between ordinary points and points at infinity.

DEFINITION 11.1. Let X be a linear space over Λ. An equivalence relation Δ in the complement X\{0} of {0} in X is defined by the following requirement:

x Δ y if and only if there exists a scalar λ ≠ 0 such that y = λx.

The set of equivalence classes, i.e., the quotient set P(X) = (X\{0})/Δ, is called the projective space derived from X.

A point (i.e., an element) of the projective space P(X) is, by definition, the set of all non-zero vectors of a 1-dimensional subspace of the linear space X. Consequently we can identify the projective space P(X) with the set of all 1-dimensional subspaces of the linear space X. If we denote by π: X\{0} → P(X) the canonical surjection (that maps every vector x of the domain into the equivalence class π(x) determined by x), then every point of the projective space P(X) can be represented by π(x) for some non-zero vector of X. Clearly π(x) = π(y) if and only if y = λx for some non-zero scalar λ of Λ. If S is a subset of X, we shall write S° for S\{0}; thus π: X° → P(X) under this notation.

EXAMPLE 11.2. When X = Λⁿ⁺¹, we write Pₙ(Λ) instead of P(Λⁿ⁺¹) and call Pₙ(Λ) the arithmetical projective space of dimension n. The points of this projective space can be represented by (n+1)-tuples (α₀, α₁, ..., αₙ) of scalars of Λ such that

(i) not all αᵢ = 0, and
(ii) (α₀, α₁, ..., αₙ) and (β₀, β₁, ..., βₙ) represent the same point of Pₙ(Λ) if and only if αᵢ = λβᵢ (i = 0, 1, ..., n) for some non-zero scalar λ of Λ.

We observe that the projective extension of an ordinary plane considered in §11A is a projective space according to Definition 11.1.

The dimension of the projective space P(X), to be denoted by dim P(X), is defined as dim X - 1 if dim X is finite, and it is the same as dim X if dim X is infinite. Thus a projective space of dimension -1 is empty and a projective space of dimension 0 consists of a single point. As usual we call a projective space of dimension 1 (respectively 2) a projective line (respectively projective plane).

EXAMPLE 11.3. Let A be an affine space attached to a linear space X over Λ. Following the idea outlined in §11A we can define points at infinity of A to be the 1-dimensional subspaces of X and the projective extension P of A to be the set of all points and points at infinity of A. However a more useful and precise definition is desirable. Consider the linear space Y = Λ × X whose vectors are ordered pairs (λ, x) where λ ∈ Λ and x ∈ X. Then the projective space P(Y) derived from Y has the same dimension as the affine space A. We wish to show that P(Y) and P are essentially equivalent and we may then adopt P(Y) rather than P as the projective extension of A. For this purpose we construct a bijective mapping Φ: P(Y) → P. Choose an arbitrary point Q of A as a point of reference. For any vector y = (λ, x) of Y, where λ ≠ 0, we put φ(y) to be the point R of A such that the vector →QR = (1/λ)x. For any vector y' = (0, x') where x' ≠ 0 we put φ(y') to be the 1-dimensional subspace X' of X generated by x'. Thus φ(y) is a point of A and φ(y') is a point at infinity of A; in either case we get an element (or a POINT) of P. Moreover the mapping φ: Y\{0} → P satisfies the following condition: φ(y₁) = φ(y₂) if y₁ = μy₂ for some μ ≠ 0. Therefore φ induces a mapping Φ: P(Y) → P such that Φ(π(y)) = φ(y) for all y ≠ 0. It is easy to verify that Φ is bijective. Hence the projective space P(Y) can be taken as the projective extension of the affine space A. Figure 11 below illustrates the mapping φ, where A is identified with the subset {1} × X of Y as follows: choose an arbitrary point Q of A as a point of reference and identify every point R of A with the vector (1, →QR) of {1} × X.

Fig. 11: y = (λ, x), R = φ(y); y' = (0, x'), X' = φ(y').
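In coordinates the mapping φ of Example 11.3 can be made quite concrete. The sketch below is our own illustration with Λ = R and X = R², Q being taken as the point of reference; the tags "point" and "infinity" merely keep the two kinds of POINT of P apart.

    def phi(y):
        """Map a non-zero vector y = (lam, x1, x2) of Y = R x R^2 to a POINT
        of P: an affine point R with QR-vector (1/lam)(x1, x2) when lam != 0,
        and the direction spanned by (x1, x2) when lam = 0."""
        lam, x1, x2 = y
        if lam != 0:
            return ("point", (x1 / lam, x2 / lam))
        if x1 == 0 and x2 == 0:
            raise ValueError("the zero vector represents nothing")
        s = x1 if x1 != 0 else x2              # normalise the direction so that
        return ("infinity", (x1 / s, x2 / s))  # phi(y1) == phi(y2) when y1 = mu*y2

    assert phi((2.0, 4.0, 6.0)) == ("point", (2.0, 3.0))
    assert phi((0.0, 1.0, 3.0)) == phi((0.0, -2.0, -6.0))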

C. Homogeneous coordinates

We saw in the preliminary definition of the projective plane in §11A, and also in Example 11.2, that points of a projective space admit representation by coordinates subject to certain conditions.

In general, let X be a linear space of dimension n+1 over Λ and P(X) the projective space of dimension n derived from X. We denote, as before, by π: X\{0} → P(X) the canonical surjection. If

Page 136: Kam-Tim Leung Linear Algebra and Geometry  1974

§11 PROJECTIVE SPACE 125

(x₀, x₁, ..., xₙ) is a base of the linear space X and x = α₀x₀ + α₁x₁ + ... + αₙxₙ is a non-zero vector of X, then we say that relative to the base (x₀, x₁, ..., xₙ) the point π(x) of P(X) admits a representation by homogeneous coordinates (α₀, α₁, ..., αₙ). We verify immediately the following properties of homogeneous coordinates:

(i) relative to the base (x₀, x₁, ..., xₙ), every point π(x) of P(X) has a representation by homogeneous coordinates (α₀, α₁, ..., αₙ), where not every αᵢ is zero;
(ii) every ordered (n+1)-tuple (α₀, α₁, ..., αₙ) of scalars of Λ such that not every αᵢ is zero is homogeneous coordinates relative to the base (x₀, x₁, ..., xₙ) of a point π(x) of the projective space P(X); and
(iii) two such (n+1)-tuples (α₀, α₁, ..., αₙ) and (β₀, β₁, ..., βₙ) are homogeneous coordinates relative to the base (x₀, x₁, ..., xₙ) of one and the same point of P(X) if and only if there is a non-zero scalar λ of Λ such that αᵢ = λβᵢ for i = 0, 1, ..., n.

D. Linear variety

Let X be a linear space over Λ, P(X) the projective space derived from X, and π: X° → P(X) the canonical surjection. For any subspace Y of the linear space X, we call the direct image π[Y°] = L a linear variety of the projective space P(X). Now the subspace Y is itself a linear space over Λ, therefore we can construct the projective space P(Y) = Y°/Δ' according to Definition 11.1. Since the equivalence relation Δ' is induced by the equivalence relation Δ, we can identify the linear variety L with the projective space P(Y). By means of this identification, we define the dimension of the linear variety L as the dimension of the projective space P(Y). Thus if L is a linear variety and L = P(Y), then dim L = dim Y - 1. Lines, planes and hyperplanes will have the usual obvious meaning. It is customary to assign the dimension -1 to the empty linear variety.

The one-to-one correspondence between the linear varieties of a projective space P(X) and the subspaces of X enables us to translate results on subspaces into results on linear varieties. For example the intersection of any family of linear varieties of P(X) is a linear variety of P(X). Consequently for any subset S of P(X) there is a smallest linear variety (in the sense of the inclusion) that includes S as a subset; we call this linear variety the join of S or the linear variety generated by S. It is easy to see that the correspondence between linear varieties and subspaces is compatible with the formations of intersection and join. In particular if L₁ and L₂ are linear varieties of a projective space P(X) and Lᵢ = P(Yᵢ) (i = 1, 2), then

L₁ ∩ L₂ = P(Y₁ ∩ Y₂) and L₁ + L₂ = P(Y₁ + Y₂),

where L₁ + L₂ denotes the join of L₁ ∪ L₂. Therefore it follows from Theorem 4.5 that for any two finite-dimensional linear varieties L₁ and L₂ of P(X)

dim L₁ + dim L₂ = dim(L₁ ∩ L₂) + dim(L₁ + L₂).
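Because dim L = dim Y - 1, the formula can be checked numerically by working with spanning vectors of the carrying subspaces: the join corresponds to Y₁ + Y₂ and the intersection to the kernel of (a, b) ↦ aY₁ - bY₂. A numpy sketch of ours (it assumes, as holds for generic random data, that each spanning set is linearly independent):

    import numpy as np

    rng = np.random.default_rng(0)
    Y1 = rng.standard_normal((3, 5))    # rows span Y1: L1 = P(Y1) is a plane
    Y2 = rng.standard_normal((3, 5))    # rows span Y2: L2 = P(Y2)

    def pdim(rows):
        """Projective dimension of the linear variety P(span of the rows)."""
        return np.linalg.matrix_rank(np.atleast_2d(rows)) - 1

    dim_join = pdim(np.vstack([Y1, Y2]))    # L1 + L2 is carried by Y1 + Y2

    # Y1 n Y2 corresponds to the kernel of (a, b) |-> a.Y1 - b.Y2, provided
    # the rows of Y1 and the rows of Y2 are each linearly independent
    ker = 6 - np.linalg.matrix_rank(np.vstack([Y1, -Y2]))
    dim_meet = ker - 1

    assert pdim(Y1) + pdim(Y2) == dim_meet + dim_join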

Following a similar line of argument, we can define linear independence of points of the projective space P(X). Let Q₀, Q₁, ..., Qᵣ be r + 1 points of P(X). Then we say that the points Q₀, Q₁, ..., Qᵣ are linearly independent if their join Q₀ + Q₁ + ... + Qᵣ is an r-dimensional linear variety. It follows from the correspondence between linear varieties and subspaces that if Qᵢ = π(xᵢ) with xᵢ ∈ X° for i = 0, 1, ..., r, then the linear independence of the points Q₀, Q₁, ..., Qᵣ and the linear independence of the vectors x₀, ..., xᵣ are equivalent. The following theorem also is a direct consequence of the definition and the correspondence mentioned above.

THEOREM 11.4. r + 1 distinct points Q₀, Q₁, ..., Qᵣ of a projective space P(X) are linearly independent if and only if for each i = 0, 1, ..., r the point Qᵢ does not belong to the join of the other r points.

Similarly a point R of P(X) is called a linear combination of the points Q₀, Q₁, ..., Qᵣ of P(X) if R belongs to the join of Q₀, Q₁, ..., Qᵣ. It is easy to see that if R = π(y) and Qᵢ = π(xᵢ), then the point R is a linear combination of the points Q₀, Q₁, ..., Qᵣ if and only if the vector y is a linear combination of the vectors x₀, x₁, ..., xᵣ. Finally, similar to Theorem 9.11, we have the following characterization of linear varieties in terms of joins.

THEOREM 11.5. Let P(X) be a projective space. Then a subset L of P(X) is a linear variety of P(X) if and only if for any two distinct points Q and R of L the line joining Q and R lies in L.

E. The theorems of Pappus and Desargues

Let us now prove two well-known classical theorems of geometry.

Page 138: Kam-Tim Leung Linear Algebra and Geometry  1974

- §11 PROJECTIVESPACE 127

In the sequel, we use the expression LAB to denote the line generated by two distinct points A and B.

THEOREM 11.6. (PAPPUS). Let L and L' be two distinct lines of a projective plane P(X). If Q, R, S are three distinct points on L and Q', R', S' three distinct points on L', then the points of intersection A of LRS' and LR'S, B of LQS' and LQ'S, and C of LQR' and LQ'R are collinear (i.e., they are on one and the same line).

Fig. 12

PROOF. We may assume that the point D of intersection of the lines L and L' is different from any of the six points Q, R, S, Q', R', S', for otherwise the theorem is trivial.

Let x' and z be vectors of X such that π(x') = Q and π(z) = D. Since R is on L = LQD there are scalars α and β such that π(αx' + βz) = R. Now α ≠ 0 and β ≠ 0, for otherwise R = D or R = Q. Therefore π((α/β)x' + z) = R. Setting x = (α/β)x' we get

Q = π(x), D = π(z) and R = π(x + z).

Using a similar argument we get a scalar λ ≠ 0, 1 such that

S = π(λx + z).

Analogously we get a vector y of X and a scalar μ ≠ 0, 1 of Λ such that

S' = π(y), R' = π(y + z) and Q' = π(μy + z).

Since A is a linear combination of S = π(λx + z) and R' = π(y + z),


and at the same time a linear combination of S' = π(y) and R = π(x + z), from the equation

(λx + z) + (λ - 1)(y + z) = (λ - 1)y + λ(x + z)

we get

A = π(a) where a = λx + (λ - 1)y + λz.

Analogously we get

B = π(b) where b = λx - μy, and C = π(c) where c = (μ - 1)x + μy + μz.

Between the vectors a, b and c we have λc + b - μa = 0; therefore the points A, B and C are linearly dependent and hence collinear.
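The proof can be replayed numerically. In P(R³) the join of two points π(u), π(v) is the line with coordinate vector u × v, and the intersection of two lines with coordinate vectors l, m is the point π(l × m); this standard device is not used in the text, so the following numpy sketch is ours:

    import numpy as np

    rng = np.random.default_rng(1)
    x, y, z = rng.standard_normal((3, 3))    # independent for generic data
    lam, mu = 2.0, 3.0                       # scalars distinct from 0 and 1

    Q, R, S = x, x + z, lam * x + z          # points on L, with D = pi(z)
    Sp, Rp, Qp = y, y + z, mu * y + z        # S', R', Q' on L'

    join = lambda p, q: np.cross(p, q)       # line through two points
    meet = lambda l, m: np.cross(l, m)       # intersection of two lines

    A = meet(join(R, Sp), join(Rp, S))
    B = meet(join(Q, Sp), join(Qp, S))
    C = meet(join(Q, Rp), join(Qp, R))

    # A, B, C are collinear iff the determinant of their coordinates vanishes
    M = np.vstack([A / np.linalg.norm(A),
                   B / np.linalg.norm(B),
                   C / np.linalg.norm(C)])
    assert abs(np.linalg.det(M)) < 1e-10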

Three points Q₁, Q₂, Q₃ of a projective space are said to form a triangle Q₁Q₂Q₃ if they are linearly independent. In this case the lines l₁ = LQ₂Q₃, l₂ = LQ₃Q₁ and l₃ = LQ₁Q₂ are called the sides of the triangle Q₁Q₂Q₃. We say two triangles Q₁Q₂Q₃ and R₁R₂R₃ are centrally perspective (or in perspective) if the lines LQᵢRᵢ (i = 1, 2, 3) are concurrent, i.e. if they intersect at a point (Fig. 13).

Fig. 13

We say two triangles with sides l₁, l₂, l₃ and g₁, g₂, g₃ are axially perspective if each pair of corresponding sides lᵢ and gᵢ intersect at a point and these points of intersection are collinear (Fig. 14).

Page 140: Kam-Tim Leung Linear Algebra and Geometry  1974

§11 PROJECTIVE SPACE 129

Fig. 14

There are degenerate configurations that deserve some attention, i.e. configurations in which the triangles assume very special positions; for example R₁ = Q₁ or R₂, R₃, Q₂, Q₃ are collinear. We will exclude all these cases and suppose that the configurations in question are not too special.

THEOREM 11.7. (DESARGUES). If two triangles are centrally perspective, then they are axially perspective.


PROOF. Let triangles Q₁Q₂Q₃ and R₁R₂R₃ be centrally perspective and denote by T the point of intersection of the lines LQᵢRᵢ (i = 1, 2, 3). Let t, xᵢ, yᵢ (i = 1, 2, 3) be vectors of X such that

T = π(t), Qᵢ = π(xᵢ) and Rᵢ = π(yᵢ).

The assumption on the triangles implies that

t = λ₁x₁ + μ₁y₁ = λ₂x₂ + μ₂y₂ = λ₃x₃ + μ₃y₃

for some scalars λᵢ and μᵢ. Hence

λ₂x₂ - λ₃x₃ = μ₃y₃ - μ₂y₂
λ₃x₃ - λ₁x₁ = μ₁y₁ - μ₃y₃
λ₁x₁ - λ₂x₂ = μ₂y₂ - μ₁y₁

From the first equation above we get a point

C₁ = π(λ₂x₂ - λ₃x₃) = π(μ₃y₃ - μ₂y₂)

which is on both l₁ and g₁; therefore C₁ is the point of intersection of this pair of corresponding sides. Similarly

C₂ = π(λ₃x₃ - λ₁x₁), C₃ = π(λ₁x₁ - λ₂x₂)

are the points of intersection of the other pairs of corresponding sides. Since

(λ₂x₂ - λ₃x₃) + (λ₃x₃ - λ₁x₁) + (λ₁x₁ - λ₂x₂) = 0,

the points C₁, C₂ and C₃ are linearly dependent and hence they are collinear.

REMARK 11.8. Both the theorem of PAPPUS and the theorem of DESARGUES play an important role in the synthetic treatment of projective geometry. With certain obvious modifications, they also hold in an affine space.

F. Cross ratio

In classical projective geometry, the concept of cross ratio occupies a central position in the theory of geometrical invariants. Let Q, R, S, T be four distinct points on a line L of a projective space P(X). We saw in the proof of the theorem of PAPPUS 11.6 that there are vectors x, y of X such that

Q = π(x), R = π(y) and S = π(x + y).

Page 142: Kam-Tim Leung Linear Algebra and Geometry  1974

§11 PROJECTIVESPACE 131

For the fourth point T we obtain a unique (relative to the vectors x and y of X) scalar λ such that

T = π(x + λy).

Next we want to show that the scalar λ is actually independent of the choice of the vectors x and y. Let x', y' be vectors of X and λ' a scalar of Λ such that

Q = π(x'), R = π(y'), S = π(x' + y') and T = π(x' + λ'y').

Then x' = αx and y' = βy for some α ≠ 0 and β ≠ 0 of Λ. The equality π(x + y) = π(x' + y') implies that α = β. Consequently it follows from π(x + λy) = π(x' + λ'y') that λ = λ'. Therefore we can introduce the following definition.

DEFINITION 11.9. Let Q, R, S, T be four distinct points of a line in a projective space P(X) derived from the linear space X over Λ. Then the cross ratio [Q R; S T] is the scalar λ such that

Q = π(x), R = π(y), S = π(x + y) and T = π(x + λy)

for some vectors x and y of X.

We observe that the cross ratio [Q R; S T] is different from 0 and 1, for otherwise we would have T = Q or T = S. Conversely for any three distinct points Q, R, S on a line L and any scalar λ ≠ 0, 1, there is a unique point T on L such that [Q R; S T] = λ.

The cross ratio has certain symmetries:

[R Q; S T] = [Q R; S T]⁻¹ and 1 - [Q R; S T] = [T Q; R S].

In classical projective geometry, we say that the quadruple (Q, R, S, T) of points constitutes a harmonic quadruple if

[Q R; S T] = -1.

Page 143: Kam-Tim Leung Linear Algebra and Geometry  1974

132 IV PROJECTIVE GEOMETRY

REMARK 11.10. In Definition 11.9, the four points Q, R, S, T are supposed to be distinct. In classical projective geometry, we allow a certain freedom to the fourth point T. For example, if we allow T = Q or T = S, then the cross ratio [Q R; S T] receives the value 0 or 1 respectively.

In this way, a one-to-one correspondence between the punctured line L\{R} and the set Λ of all scalars is obtained by associating the scalar [Q R; S T] to the point T ∈ L\{R}. If furthermore we are agreeable to the introduction of the "value" ∞, then we can also allow T = R, in which case the cross ratio [Q R; S T] is ∞. We observe that in all cases the points Q, R, S should be distinct. By and large we find it more convenient to insist that all four points Q, R, S, T are distinct.

The cross ratio defined in 11.9 is invariant under perspectivity: more precisely, if L and L' are two lines on a projective plane and the point quadruples Q, R, S, T on L and Q', R', S', T' on L' are in perspective (see Figure 16),

Page 144: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 1 1 PROJECTIVE SPACE 133

then [Q R; S T] = [Q' R'; S' T']. To see this let Q = π(x), R = π(y), S = π(x + y), T = π(x + αy). Then for the centre of perspectivity Z and the point Q' on LQZ we can choose a vector z such that Z = π(z) and Q' = π(z + x). Moreover we can choose a non-zero scalar λ such that R' = π(λz + y). Then for the intersections S' = L' ∩ LSZ and T' = L' ∩ LTZ we obtain S' = π[(z + x) + (λz + y)] and T' = π[(z + x) + α(λz + y)]. Therefore [Q' R'; S' T'] = α.
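For computations it is convenient to note that if q, r, s, t are any coordinate vectors of the four points, and s = aq + br, t = cq + dr, then the scalar of Definition 11.9 is ad/(bc); the common denominators of Cramer's rule cancel. A Python sketch of ours for points of P(R²):

    def det2(u, v):
        return u[0] * v[1] - u[1] * v[0]

    def cross_ratio(q, r, s, t):
        """[Q R; S T] for four distinct points pi(q), pi(r), pi(s), pi(t)
        of a projective line, each given by a non-zero coordinate pair."""
        a, b = det2(s, r), det2(q, s)    # s = a*q + b*r (up to the common
        c, d = det2(t, r), det2(q, t)    # factor det2(q, r), which cancels)
        return (a * d) / (b * c)

    # Q = pi(x), R = pi(y), S = pi(x + y), T = pi(x + 5y): cross ratio 5
    assert cross_ratio((1, 0), (0, 1), (1, 1), (1, 5)) == 5.0
    # the harmonic position T = pi(x - y) gives -1
    assert cross_ratio((1, 0), (0, 1), (1, 1), (1, -1)) == -1.0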

G. Linear construction

Let L be a line on a projective plane. With three distinct points Q, R, S on L we can set up a one-to-one correspondence T: Λ → L\{R}, where T(α) = Tα denotes the point Tα on L such that [Q R; S Tα] = α. We now wish to ask whether the algebraic structure of Λ makes itself manifest in any fairly direct manner in the geometry of the plane. More precisely we wish to know whether, given two points Tα and Tβ on L, the points Tα+β and Tαβ can be obtained by geometric constructions in the projective plane that involve only the following two basic operations:

1. Passing a line through two distinct points.
2. Forming the intersection of two distinct lines.

We call such constructions linear constructions. We observe that our linear constructions here correspond to the geometrical constructions in an affine plane that involve three basic operations:

(i) Passing a line through two distinct points.
(ii) Forming the intersection of two intersecting lines.
(iii) Passing a line through a point parallel to a given line.

As a lead-up to the linear construction of Tα+β in the projective plane P(X), we consider the following situation in an affine plane,

Page 145: Kam-Tim Leung Linear Algebra and Geometry  1974

134 IV PROJECTIVE GEOMETRY

where Q, Tα, Tβ are points of a line l such that →QTα = αx and →QTβ = βx. Then the point T on line l such that →QT = (α + β)x is found as follows.

Fig. 17

We choose an arbitrary point A such that A ∉ l. Through this point A we pass lines g = LAQ, h = LATβ and finally a line l' parallel to l. Next we pass through Tα a line g' parallel to g and get a point B = g' ∩ l'. Finally we pass through B a line h' parallel to h and get the required point T = h' ∩ l.

Imagine now that the affine plane is extended to its projective extension; then the pair of extended LINES L, L' of l, l' intersects at a point R at infinity. Similarly the extended LINES G, G' of g, g' intersect at C and the extended LINES H and H' of h, h' intersect at D. Furthermore these POINTS R, C and D are collinear since they lie on the line at infinity.

Using this as a model we make a linear construction in the projective plane as follows. First, we choose two auxiliary points A and C in such a way that A does not lie on L and C lies on LAQ but is different from A and Q. Then we obtain, by appropriate linear constructions, the points of intersection D = LCR ∩ LATα, B = LAR ∩ LCTβ and T = L ∩ LDB.


Fig. 18: Q = π(x), Tα = π(x + αy), Tβ = π(x + βy), T = π(x + (α + β)y), D = π(x + αy + z).

We assert that T = Tα+β. Let x, y and z be vectors of X such that Q = π(x), R = π(y), S = π(x + y), Tα = π(x + αy), Tβ = π(x + βy), A = π(z) and C = π(x + z). Since D lies on the lines LATα and LCR, we find

D = π(x + αy + z).

Similarly B lies on the lines LAR and LCTβ; we find

B = π(z - βy).

Finally T lies on the lines LBD and LQR; we find

T = π(x + (α + β)y).

Therefore T = Tα+β and we have thus proved the following theorem.

THEOREM 11.11. Let Q, R and S be three distinct points on a line L of a projective plane P(X). Furthermore let Tα and Tβ be points on L such that [Q R; S Tα] = α and [Q R; S Tβ] = β. Then the point Tα+β such that [Q R; S Tα+β] = α + β can be found by a linear construction.


Using similar arguments and the following figures we have no difficulty in proving the following theorem.

THEOREM 11.12. Notations as in 11.11. The point Tαβ on L such that [Q R; S Tαβ] = αβ can be found by a linear construction.

Fig. 20: Q = π(x), R = π(y), S = π(x + y), Tαβ = π(x + αβy).

REMARKS 11.13. The basic operations 1 and 2 above correspond to two familiar incidence axioms in the synthetic approach to projective geometry. The results of this section are therefore useful in establishing coordinates in a projective plane defined by incidence axioms, since they will allow us to regard the punctured line L\{R} as a field with Q and S playing the roles of zero and identity of the field.
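The whole construction of Theorem 11.11 can be checked in coordinates with the cross-product realisation of joins and meets used earlier; the sketch below is ours and follows the proof, with C = π(x + z):

    import numpy as np

    x = np.array([1.0, 0.0, 0.0])            # Q = pi(x)
    y = np.array([0.0, 1.0, 0.0])            # R = pi(y); L = L_QR
    z = np.array([0.0, 0.0, 1.0])            # A = pi(z), chosen off L

    alpha, beta = 2.0, 5.0
    T_a, T_b = x + alpha * y, x + beta * y   # T_alpha and T_beta on L
    C = x + z                                # C on L_AQ, distinct from A and Q

    join = lambda p, q: np.cross(p, q)
    meet = lambda l, m: np.cross(l, m)

    D = meet(join(C, y), join(z, T_a))       # D = L_CR n L_A,T_alpha
    B = meet(join(z, y), join(C, T_b))       # B = L_AR n L_C,T_beta
    T = meet(join(B, D), join(x, y))         # T = L_BD n L

    expected = x + (alpha + beta) * y        # T_{alpha+beta}
    assert np.allclose(np.cross(T, expected), 0.0)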

H. The principle of duality

In §11D we have made extensive use of the one-to-one correspondence between the subspaces of a linear space X and the linear varieties of the projective space P(X) derived from X to translate algebraic theorems on X into geometric theorems on P(X). Assume X to be of finite dimension n + 1. Then the annihilator mapping AN defined in §7D gives us a one-to-one correspondence between the subspaces of X and the subspaces of X*. We recall that for any subspace Y of X, the annihilator AN(Y) of Y is defined as the subspace AN(Y) = {f ∈ X*: f(x) = 0 for all x ∈ Y} of X* and that the mapping AN satisfies the following conditions:

(i) dim Y + dim AN(Y) = n + 1,
(ii) AN(Y₁) ⊂ AN(Y₂) if and only if Y₁ ⊃ Y₂,
(iii) AN(Y₁ + Y₂) = AN(Y₁) ∩ AN(Y₂),
(iv) AN(Y₁ ∩ Y₂) = AN(Y₁) + AN(Y₂).

Therefore the mapping AN provides us with a link between the geometry of P(X) and the geometry of P(X*) which associates to each r-dimensional linear variety of P(X) an (n-1-r)-dimensional linear variety of P(X*) and reverses the inclusion relation. In particular to a point of P(X) is assigned a hyperplane of P(X*) and to a hyperplane of P(X) a point of P(X*). Therefore given a theorem couched in terms of linear varieties and inclusion relations, we can obtain another theorem, called its dual, by suitably changing dimensions and reversing inclusions (e.g. interchange the words point and hyperplane, intersection and join). Since the truth of a theorem implies the truth of its dual, this so-called principle of duality essentially "doubles" the theorems at our disposal without our having to do extra work. We now enter into a detailed study of the principle of duality.
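Concretely, if the rows of a matrix span the subspace Y of Λⁿ⁺¹ and functionals are identified with their coefficient vectors, then AN(Y) is the null space of that matrix, and property (i) becomes the rank-nullity theorem. A small numpy sketch of ours, with the null space read off from the singular value decomposition:

    import numpy as np

    def annihilator(rows, tol=1e-10):
        """An orthonormal basis (as rows) of AN(span of the given rows):
        the coefficient vectors of all functionals vanishing on the span."""
        rows = np.atleast_2d(rows)
        _, s, vt = np.linalg.svd(rows)
        rank = int((s > tol).sum())
        return vt[rank:]                 # right singular vectors of the kernel

    Y = np.array([[1.0, 2.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0, 0.0]])     # dim Y = 2, here n + 1 = 4

    AN_Y = annihilator(Y)
    assert len(AN_Y) + np.linalg.matrix_rank(Y) == 4   # dim Y + dim AN(Y) = n + 1
    assert np.allclose(Y @ AN_Y.T, 0.0)                # each f in AN(Y) vanishes on Y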

By a theorem of n-dimensional projective geometry over Λ we mean a statement which is meaningful and holds in every n-dimensional projective space P(X) derived from an (n+1)-dimensional linear space X over Λ. Suppose that Th is a theorem in n-dimensional projective geometry that involves only linear varieties and inclusion relations between them. Th is then usually formulated in terms of intersections, joins and dimensions. We define the dual theorem Th* to be the statement obtained from Th by interchanging ⊂ and ⊃ throughout and hence replacing intersection, join and dimension r by join, intersection and dimension n-1-r respectively.

Similarly if F is a configuration of n-dimensional geometry over Λ, then the dual configuration F* is obtained from F by reversing all inclusion signs and replacing dimension r by dimension n-1-r. For example, if F is a complete quadrangle in a projective plane, i.e. a plane configuration consisting of four points, no three of which are collinear, and the six lines joining them in pairs, then the dual configuration is a complete quadrilateral and consists of four lines, no three of which are concurrent, and their six points of intersection.

Fig. 21 Complete quadrangle Fig. 22 Complete quadrilateral

THEOREM 11.14. (The principle of duality). If Th is a theorem of n-dimensional projective geometry over Λ involving only linear varieties and inclusion relations among them, then Th* is a theorem of n-dimensional projective geometry.

PROOF. Let P(X) be an n-dimensional projective space derived from an (n+1)-dimensional linear space X over Λ. Suppose that the premise of Th* is satisfied in P(X). Then making use of the mapping AN and the correspondence between subspaces and linear varieties we see that the premise of Th** = Th is satisfied in P(X*). Since Th holds true in every n-dimensional projective space by hypothesis, the conclusion of Th is true in P(X*). Applying AN and the correspondence to the conclusion of Th, we see that the conclusion of Th* holds true in P(X**) = P(X). The proof is complete.

Applying the principle of duality we see that the converse of DESARGUES' theorem in the plane is true. The dual of PAPPUS' theorem is as follows.

Let Q and Q' be two distinct points of a projective plane. If g, h, l are three distinct lines concurrent at Q and g', h', l' are three distinct lines concurrent at Q', then the lines LRR', LSS' and LTT' are concurrent, where

R = g ∩ h'     S = g ∩ l'     T = h ∩ l'
R' = g' ∩ h    S' = g' ∩ l    T' = h' ∩ l.


I. Exercises

1. Prove Theorem 11.5.

2. Let Q₀, Q₁, ..., Qᵣ be r + 1 linearly independent points of an n-dimensional projective space. Denote by Lⱼ the linear variety spanned by the points Qᵢ with i ≠ j. Show that L₀ ∩ L₁ ∩ ... ∩ Lᵣ = ∅. Does this hold for the set of k-dimensional linear varieties (k being fixed and 0 ≤ k ≤ r-2) each of which is generated by k + 1 points among the points Qᵢ?


3. Let Q₀, Q₁, ..., Qᵣ be r + 1 linearly independent points of an r-dimensional linear variety L. Show that for every k-dimensional linear subvariety L' of L (i.e. L' is a linear variety of dimension k and L' ⊂ L) there are r-k points among the Qᵢ which span a linear variety L″ skew to L'.

4. If R is a point of a 3-dimensional projective space and L, M are skew lines not containing R, prove that there exists a unique line through R intersecting L and M.

5. Let L, M, N be mutually skew lines of a 4-dimensional projective space not lying in a hyperplane. Prove that L intersects M + N in a point. Show also that there is a unique line intersecting L, M and N.

6. Draw a complete quadrangle and join the pairs of diagonal points. Which are the harmonic point quadruples in your diagram?

7. Under the hypothesis of Theorem 11.6 (Pappus), show that the line through A, B and C passes through the point of intersection of L and L' if and only if QRS and Q'R'S' are in perspective.

8. Let Q₁, Q₂, Q₃, Q₄ be four distinct collinear points of a projective space. Express [Qσ(1) Qσ(2); Qσ(3) Qσ(4)] in terms of [Q₁ Q₂; Q₃ Q₄] for all permutations σ.

9. Let Q, R, S be three distinct collinear points on a projective plane. Find the point T such that [Q R; S T] = -1 by a linear construction.

10. Let Q, R, S and Tα be collinear points on a projective plane such that [Q R; S Tα] = α ≠ 0. Find T₋α and T₁/α by linear constructions such that [Q R; S T₋α] = -α and [Q R; S T₁/α] = 1/α.

11. Let Q₁Q₂Q₃ be a triangle on a projective plane. Let R₁, R₂, R₃ be points on the sides LQ₂Q₃, LQ₁Q₃, LQ₁Q₂ respectively such that LR₁Q₁, LR₂Q₂ and LR₃Q₃ intersect at a point. Show that the fourth harmonic points S₁, S₂, S₃ for which

[Q₁ Q₂; R₃ S₃] = [Q₂ Q₃; R₁ S₁] = [Q₃ Q₁; R₂ S₂] = -1

are collinear. Dualize this result.

12. The configuration of all points on a line of a projective space is called a range of points. Describe the 2-dimensional dual and the 3-dimensional dual of a range of points (called a pencil of lines and a pencil of planes respectively). Find the 3-dimensional dual of the configuration consisting of all points on a plane.

13. Let Q and R be two points in the n-dimensional real arithmetical projective space Pₙ(R). We say that n + 1 real-valued continuous functions f₀(t), ..., fₙ(t) on the unit interval [0, 1] define a path f in Pₙ(R) from Q to R if (i) (f₀(0), f₁(0), ..., fₙ(0)) and (f₀(1), f₁(1), ..., fₙ(1)) are homogeneous coordinates of Q and R respectively and (ii) (f₀(t), f₁(t), ..., fₙ(t)) is different from (0, ..., 0) for all values of t. In this case points of Pₙ(R) with homogeneous coordinates (f₀(t), f₁(t), ..., fₙ(t)) are called points of the path f. Show that if H is a hyperplane in Pₙ(R), then for any two points S and T not on H, there exists a path f from S to T which does not pass through H. (In other words Pₙ(R) is not separated by any hyperplane.)

§ 12 Mappings of Projective Spaces

Throughout the present §12 we assume all projective spaces that enter into our discussions to be of finite dimension not less than two. Let P(X) and P(Y) be two projective spaces over Λ. Here we are interested in a class of bijective mappings of P(X) onto P(Y) which fulfil certain geometric invariance conditions.

A. Projective isomorphism

DEFINITION 12.1. A mapping Φ of a projective space P(X) into a projective space P(Y) is called a projective isomorphism if for every linear variety L of P(X), (i) Φ[L] is a linear variety of P(Y) and (ii) dim L = dim Φ[L].

It is clear that a projective isomorphism of projective spaces is a bijective mapping which preserves joins, intersections and linear independence. In particular both it and its inverse take any three distinct collinear points into three distinct collinear points. The following theorem shows that this property alone characterises projective isomorphism.

THEOREM 12.2. A bijective mapping Φ: P(X) → P(Y) is a projective isomorphism if and only if both it and its inverse take any three distinct collinear points into three distinct collinear points.

In the sequel, we shall denote by Q' the image Φ(Q) of any point Q of P(X) and similarly by M' the direct image Φ[M] of any subset M of P(X). We first prove

LEMMA 12.3. If L is a line, then L' is a line.

PROOF. Let Q and R be two distinct points on the line L. Then the points Q and R generate L, i.e., L = LQR. Since Φ is bijective, the points Q' and R' are distinct. We show that L' = LQ'R'. It follows from the definition that L' ⊂ LQ'R'. Since dim P(Y) ≥ 2, for each point T of LQ'R' we can find two points A' and B' of P(Y) such that T = LQ'R' ∩ LA'B'. Then for the point S = LQR ∩ LAB we get Φ(S) = T, proving L' = LQ'R'.

Therefore it follows from 11.4 and 11.5 that Φ takes linear varieties of P(X) into linear varieties of P(Y) and that Φ preserves linear independence. Hence Φ preserves dimensions. This completes the proof of 12.2.

We recall that a linear construction in a projective plane involves only two types of basic constructions, namely (i) forming the join of two points and (ii) forming the intersection of two lines. Let us now think of a linear construction as being carried out within a 2-dimensional linear variety L of P(X). We then apply a projective isomorphism Φ of P(X) to all the points and lines of the construction. It


follows from the results above that the image of this figure as a whole will lie in the plane L'. Furthermore, the image of a line joining two points will be the line joining the images of the two points in question and the image of the point of intersection of two lines will be the point of intersection of the images of the two lines in question. Therefore the figure of the linear construction maps under Φ into an exactly analogous figure. In this sense, we say that a projective isomorphism carries a linear construction into an exactly analogous linear construction.

B. Projectivities

As a special type of projective isomorphism, we consider mappings of a projective space P(X) into a projective space P(Y) that arise from isomorphisms of the linear space X onto the linear space Y. Let φ: X → Y be an isomorphism and let π denote the canonical projections X° → P(X) and Y° → P(Y). Then φ has the property that for any two non-zero vectors x, y of X, π(x) = π(y) if and only if π(φx) = π(φy). Therefore φ induces a bijective mapping Φ: P(X) → P(Y) such that

Φ(πx) = π(φx) for all x ∈ X°.

In general we call a mapping Φ: P(X) → P(Y) a projectivity if there exists an isomorphism φ: X → Y such that Φ(πx) = π(φx) for all x ∈ X°. It is clear that a projectivity is a bijective mapping that preserves linear dependence and independence; therefore every projectivity is a projective isomorphism. We shall see later that projectivity and projective isomorphism are not equivalent concepts, i.e. there are projective isomorphisms which are not projectivities. For the moment we show that different isomorphisms of linear spaces can induce one and the same projectivity.

THEOREM 12.4. Two isomorphisms φ, ψ: X → Y of linear spaces induce the same projectivity Φ = Ψ: P(X) → P(Y) if and only if there exists a non-zero scalar λ such that ψ = λφ.

PROOF. If ψ = λφ, then π(ψx) = π(λφx) = π(φx) for every x ∈ X°; therefore Φ = Ψ.

Conversely, suppose Φ = Ψ; then for every x ∈ X° we get a non-zero scalar λ_x, possibly depending on x, such that λ_x φx = ψx (since Φ(πx) = Ψ(πx)). We now show that λ_x = λ_y for all x, y ∈ X°, whence the theorem follows.


Assume first that x and y are linearly independent. Then all three vectors x, y and x + y are different from the zero vector. It follows from ψx + ψy = ψ(x + y) that λ_x φx + λ_y φy = λ_{x+y} φ(x + y). But on the other hand φx + φy = φ(x + y) and the vectors φx and φy are linearly independent; therefore λ_x = λ_y = λ_{x+y}. Assume next that the non-zero vectors x and y are linearly dependent. The assumption that the linear space X has a dimension not less than two implies that there exists a non-zero vector z such that the two pairs of vectors x and z, y and z are both linearly independent. It follows, from what has already been shown in the linearly independent case, that λ_x = λ_z = λ_y. The proof is now complete.

We now turn to the problem of constructing projectivities under certain geometric specifications.

Let P(X) and P(Y) be both n-dimensional projective spaces. It follows from the definition that given n + 1 linearly independent points C₀, C₁, ..., Cₙ of P(X) and n + 1 linearly independent points D₀, D₁, ..., Dₙ of P(Y), there exists a projectivity Φ such that Φ(Cᵢ) = Dᵢ for i = 0, ..., n. In fact if xᵢ and yᵢ are vectors of X and Y respectively such that πxᵢ = Cᵢ and πyᵢ = Dᵢ for i = 0, 1, ..., n, then (x₀, x₁, ..., xₙ) and (y₀, y₁, ..., yₙ) are bases of X and Y respectively and the unique isomorphism φ: X → Y such that φxᵢ = yᵢ (i = 0, ..., n) will induce a projectivity Φ such that Φ(Cᵢ) = Dᵢ (i = 0, ..., n). But the isomorphism φ and hence also the projectivity Φ depends on the choice of the bases (x₀, ..., xₙ) and (y₀, ..., yₙ). In other words it is possible that with another choice of bases we may obtain another projectivity Ψ satisfying the same requirement that Ψ(Cᵢ) = Dᵢ (i = 0, ..., n). Take, for instance, the base (2y₀, y₁, ..., yₙ) of Y. We have π(2y₀) = D₀ and π(yᵢ) = Dᵢ (i = 1, ..., n) and consequently for the isomorphism ψ such that ψx₀ = 2y₀ and ψxᵢ = yᵢ (i = 1, ..., n), we obtain a projectivity Ψ: P(X) → P(Y) such that Ψ(Cᵢ) = Dᵢ (i = 0, 1, ..., n). However for the point U = π(x₀ + ... + xₙ) we have Φ(U) ≠ Ψ(U) since π(y₀ + ... + yₙ) ≠ π(2y₀ + y₁ + ... + yₙ). This discussion leads us to the following definitions.

Let P(X) be an n-dimensional projective space. A simplex of P(X) is a family (C₀, C₁, ..., Cₙ) of n + 1 linearly independent points of P(X). If (C₀, C₁, ..., Cₙ) is a simplex of P(X) and U is a point of P(X) which does not belong to the join of any n points among the points Cᵢ, then we say that the family (C₀, C₁, ..., Cₙ | U) forms a frame of reference for P(X) with unit point U and simplex (C₀, C₁, ..., Cₙ).

Page 156: Kam-Tim Leung Linear Algebra and Geometry  1974

§12 MAPPINGS OF PROJECTIVE SPACES 145

THEOREM 12.5. Let P(X) and P(Y) be projective spaces. If (C₀, C₁, ..., Cₙ | U) and (D₀, D₁, ..., Dₙ | V) are frames of reference for P(X) and P(Y) respectively, then there exists a unique projectivity Φ of P(X) onto P(Y) such that Φ(Cᵢ) = Dᵢ for i = 0, 1, ..., n and Φ(U) = V.

We first prove

LEMMA 12.6. If (C₀, C₁, ..., Cₙ | U) is a frame of reference for a projective space P(X), then, up to a common scalar factor, there is a unique choice of vectors x₀, x₁, ..., xₙ of X such that πxᵢ = Cᵢ for i = 0, 1, ..., n and π(x₀ + x₁ + ... + xₙ) = U.

PROOF. Let a₀, a₁, ..., aₙ and u be vectors of X such that πaᵢ = Cᵢ for i = 0, 1, ..., n and πu = U. Then u = λ₀a₀ + λ₁a₁ + ... + λₙaₙ. Since U is not a linear combination of any n vectors among the aᵢ, all λᵢ (i = 0, 1, ..., n) must be different from zero. Therefore xᵢ = λᵢaᵢ (i = 0, 1, ..., n) satisfy the requirement of the lemma. Suppose b₀, b₁, ..., bₙ are vectors of X such that πbᵢ = Cᵢ for i = 0, 1, ..., n and π(b₀ + b₁ + ... + bₙ) = U. Then there are non-zero scalars λ and μᵢ such that λ(b₀ + b₁ + ... + bₙ) = x₀ + ... + xₙ and μᵢbᵢ = xᵢ for i = 0, 1, ..., n. The linear independence of the vectors x₀, x₁, ..., xₙ implies that μ₀ = ... = μₙ; therefore x₀, x₁, ..., xₙ are uniquely determined up to a common scalar factor.

PROOF OF THEOREM 12.5. Let x₀, x₁, ..., xₙ and y₀, y₁, ..., yₙ be vectors of X and Y respectively such that πxᵢ = Cᵢ, πyᵢ = Dᵢ and π(x₀ + x₁ + ... + xₙ) = U, π(y₀ + y₁ + ... + yₙ) = V. Then the unique isomorphism φ: X → Y such that φxᵢ = yᵢ induces a projectivity Φ such that Φ(Cᵢ) = Dᵢ and Φ(U) = V. Suppose ψ: X → Y is an isomorphism which induces a projectivity Ψ satisfying the requirement of the theorem. Then π(ψxᵢ) = Dᵢ and π(ψx₀ + ψx₁ + ... + ψxₙ) = V. By Lemma 12.6 there exists a non-zero scalar λ such that ψxᵢ = λyᵢ for i = 0, ..., n; therefore ψ = λφ. Hence Φ = Ψ by 12.4.

REMARKS. The frames of reference for a projective space are similar to the bases of a linear space in more than one way. For instance, homogeneous coordinates enjoying properties similar to (i), (ii) and (iii) of §11C can be assigned to points of P(X) relative to any frame of reference (C₀, C₁, ..., Cₙ | U) as follows. We


choose vectors x₀, x₁, ..., xₙ of X such that πxᵢ = Cᵢ for i = 0, 1, ..., n and π(x₀ + x₁ + ... + xₙ) = U. If x = α₀x₀ + α₁x₁ + ... + αₙxₙ ∈ X°, then we say that relative to (C₀, C₁, ..., Cₙ | U) the point πx admits a representation by homogeneous coordinates (α₀, α₁, ..., αₙ). In particular (1, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, ..., 0, 1) and (1, 1, ..., 1) are homogeneous coordinates of the points C₀, C₁, ..., Cₙ and U respectively. The basic properties similar to (i), (ii) and (iii) of §11C are now easily formulated and verified. Moreover it follows from Lemma 12.6 that the homogeneous coordinates do not depend on the choice of the base (x₀, x₁, ..., xₙ) of X.

Theorem 12.5 plays an important role in the synthetic approach to the classical projective geometry and is usually called the fundamental theorem of projective geometry. (See also Remarks 12.12.)
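Lemma 12.6 is in effect an algorithm: solve u = λ₀a₀ + ... + λₙaₙ for the coefficients and rescale each aᵢ by λᵢ. Once both frames are normalised in this way, the projectivity of Theorem 12.5 is the matrix carrying one normalised base to the other. A numpy sketch of ours:

    import numpy as np

    def frame_base(A, u):
        """Columns of A represent the simplex C_0, ..., C_n and u the unit
        point.  Returns columns x_i = lam_i * a_i with pi(x_i) = C_i and
        pi(x_0 + ... + x_n) = U, as in Lemma 12.6."""
        lam = np.linalg.solve(A, u)        # u = lam_0 a_0 + ... + lam_n a_n
        if np.any(np.isclose(lam, 0.0)):
            raise ValueError("U lies in the join of n of the points C_i")
        return A * lam                     # scale the i-th column by lam_i

    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])
    u = np.array([2.0, 3.0, 1.0])          # a frame of P(R^3)
    B, v = np.eye(3), np.array([1.0, 1.0, 1.0])    # the standard frame

    X, Y = frame_base(A, u), frame_base(B, v)
    M = Y @ np.linalg.inv(X)               # the projectivity, up to a scalar

    w = M @ u                              # the unit point maps to pi(v)
    assert np.allclose(np.cross(w, v), 0.0)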

C. Semi-linear transformations

We shall turn to the problem of determining if there exist projective isomorphisms which are not projectivities. For this purpose we recall that an isomorphism φ: X → Y is a bijective mapping that fulfils the following conditions:

(i) φ(λx) = λφ(x) for all x ∈ X and λ ∈ Λ; and
(ii) φ(x + y) = φ(x) + φ(y) for all x, y ∈ X.

In the discussion above, we derived from property (i) that φ induces a well-defined mapping Φ: P(X) → P(Y), which furthermore is bijective since φ is bijective, whereas property (ii) further ensures that the induced mapping Φ is a projective isomorphism. However if the bijective mapping φ: X → Y satisfies instead of (i) the following weaker condition:

(iii) to each non-zero scalar λ there exists a non-zero scalar σ(λ) such that φ(λx) = σ(λ)φ(x) for all x ∈ X,

then φ will also induce a well-defined mapping Φ: P(X) → P(Y). This leads us to introduce the following definitions.

DEFINITION 12.7. A bijective mapping σ: Λ → Λ is called an automorphism of Λ (Λ = R or Λ = C) if σ(λ + μ) = σ(λ) + σ(μ) and σ(λμ) = σ(λ)σ(μ) for all λ, μ of Λ.


We have no difficulty in verifying the following properties of an automorphism σ of Λ:

(a) σ(0) = 0;
(b) σ(1) = 1;
(c) σ(λ - μ) = σ(λ) - σ(μ); and
(d) σ(λ/μ) = σ(λ)/σ(μ) where μ ≠ 0.

DEFINITION 12.8. Let X and Y be linear spaces over Λ and σ an automorphism of Λ. A mapping φ: X → Y is a semi-linear transformation relative to σ if the following conditions are satisfied:

(i) φ(λx) = σ(λ)φ(x) for all λ of Λ and x of X;
(ii) φ(x + y) = φ(x) + φ(y) for all x and y of X.

Since the identity mapping of Λ is an automorphism of Λ, a linear transformation is always a semi-linear transformation relative to this automorphism. The next theorem shows that every semi-linear transformation of a real linear space is a linear transformation.

THEOREM 12.9. For the field R of all real numbers, the identity mapping of R is the only automorphism of R.

PROOF. Let σ be an automorphism of R. Then it follows from property (b) above that σ(n) = n for all non-negative integers n. From (c) and (d) it follows that σ(r) = r for all rational numbers r. Let λ be an arbitrary positive real number. Then there exists a real number μ (e.g., μ = √λ) such that λ = μ². Therefore σ(λ) = σ(μ²) = σ(μ)² and hence σ(λ) > 0. Similarly if λ < 0, then σ(λ) < 0. Assume that there exists a real number α such that α ≠ σ(α). Without loss of generality of the following argument, we may assume that σ(α) < α. Let r be a rational number such that σ(α) < r < α. Then it follows from σ(r) = r that σ(α - r) = σ(α) - r < 0. But this is impossible since α - r > 0. Therefore the assumption that σ(α) ≠ α leads to a contradiction and the theorem is proved.

We observe that there exist automorphisms of the field C of all complex numbers distinct from the identity mapping of C; for example the automorphism σ: C → C such that σ(α) = ᾱ, where ᾱ is the complex conjugate of α. In fact there are an infinite number of automorphisms of C.

Let φ: X → Y be a semi-linear isomorphism of linear spaces relative to an automorphism σ of Λ. Putting Φ(πx) = π(φx) for every x ∈ X°, we


obtain a well-defined bijective mapping Φ: P(X) → P(Y) of projective spaces. Suppose Q, R and S are three distinct collinear points in P(X) with Q = π(x), R = π(y) and S = π(z). Then z = λx + μy and hence φz = σ(λ)φx + σ(μ)φy, i.e. Φ(Q), Φ(R) and Φ(S) are collinear points of P(Y). Therefore every semi-linear isomorphism X → Y induces a projective isomorphism P(X) → P(Y). We now prove the converse of this statement. Thus we shall have the following algebraic characterization of the geometric concept of projective isomorphism: the projective isomorphisms are precisely those mappings which are induced by semi-linear isomorphisms.

THEOREM 12.10. Let X and Y be linear spaces over Λ both of dimension not less than 3. If Φ is a projective isomorphism of P(X) onto P(Y), then there exists an automorphism σ of Λ and a semi-linear isomorphism φ: X → Y relative to σ which induces Φ.

PROOF. Let (x₀, x₁, ..., xₙ) be a base of the linear space X. The theorem is proved if we can find an automorphism σ of Λ and a base (y₀, y₁, ..., yₙ) of Y such that if x = λ₀x₀ + λ₁x₁ + ... + λₙxₙ is a non-zero vector of X, then Φ(π(x)) = π(y) where y = σ(λ₀)y₀ + σ(λ₁)y₁ + ... + σ(λₙ)yₙ. In the sequel we shall denote for each point Q of P(X) the image Φ(Q) of Q under Φ by Q'. Thus for the base points Qᵢ = π(xᵢ), i = 0, 1, ..., n, the images are denoted by Q'ᵢ, i = 0, 1, ..., n.

First we choose an arbitrary non-zero vector y₀ of Y such that Q'₀ = π(y₀). Our next step is to find a vector y₁ of Y such that

Q'₁ = π(y₁) and U'₀₁ = π(y₀ + y₁) where U₀₁ = π(x₀ + x₁).

The vector y₁ obviously exists and it is uniquely determined by these two properties. Moreover the vectors y₀ and y₁ are linearly independent since the points Q'₀ and Q'₁ are linearly independent.

Consider now the two ordered triples (Q₀, Q₁, U₀₁) and (Q'₀, Q'₁, U'₀₁) of collinear points. For each scalar α of Λ, we denote by Tα the point on the line through Q₀, Q₁ such that

α = [Q₀ Q₁; U₀₁ Tα].

A mapping σ: Λ → Λ is then defined by putting

σ(α) = [Q'₀ Q'₁; U'₀₁ T'α]


for all α ∈ Λ. By the one-to-one correspondence between cross ratios and points on a line, the mapping σ is bijective. On the other hand, for any scalars α and β of Λ, the points Tα+β and Tαβ can be obtained by linear constructions which map under the projective isomorphism Φ into exactly analogous linear constructions. Therefore it follows from

α + β = [Q₀ Q₁; U₀₁ Tα] + [Q₀ Q₁; U₀₁ Tβ] = [Q₀ Q₁; U₀₁ Tα+β]

and αβ = [Q₀ Q₁; U₀₁ Tα] · [Q₀ Q₁; U₀₁ Tβ] = [Q₀ Q₁; U₀₁ Tαβ]

that

σ(α + β) = [Q'₀ Q'₁; U'₀₁ T'α+β] = [Q'₀ Q'₁; U'₀₁ T'α] + [Q'₀ Q'₁; U'₀₁ T'β] = σ(α) + σ(β)

and σ(αβ) = [Q'₀ Q'₁; U'₀₁ T'αβ] = [Q'₀ Q'₁; U'₀₁ T'α] · [Q'₀ Q'₁; U'₀₁ T'β] = σ(α)σ(β).

Hence we have proved that σ is an automorphism of Λ. Up to this point of the proof, we have found an automorphism σ of Λ and two linearly independent vectors y₀ and y₁ such that

Q'₀ = π(y₀), Q'₁ = π(y₁)

and Φ(π(λ₀x₀ + λ₁x₁)) = π(σ(λ₀)y₀ + σ(λ₁)y₁).

Let us now consider the next base point Q₂ = π(x₂) and its image Q'₂. A unique vector y₂ of Y can be found such that

Q'₂ = π(y₂) and U'₀₁₂ = π(y₀ + y₁ + y₂) where U₀₁₂ = π(x₀ + x₁ + x₂).

The vectors y₀, y₁, y₂ are then linearly independent. Furthermore, the point U₁₂ = π(x₁ + x₂) can be obtained from the points Q₀, Q₁, Q₂ and U₀₁₂ by a linear construction as indicated by the following figure.


Fig. 24: Q₀ = π(x₀), Q₁ = π(x₁), U₀₁₂ = π(x₀ + x₁ + x₂).

Therefore U'₁₂ = π(y₁ + y₂). Similarly, for each α we get the point π(x₀ + αx₂) by a linear construction from the points Q₀, Q₁, Q₂, U₁₂ and π(x₀ - αx₁) as follows:

Fig. 25: the construction of π(x₀ + αx₂) from π(x₀ - αx₁) and U₁₂.

From this it follows that Φ(π(x₀ + αx₂)) = π(y₀ + σ(α)y₂) and hence Φ(π(λ₀x₀ + λ₂x₂)) = π(σ(λ₀)y₀ + σ(λ₂)y₂). Similarly we prove that Φ(π(λ₁x₁ + λ₂x₂)) = π(σ(λ₁)y₁ + σ(λ₂)y₂). Finally the point π(x₀ + αx₁ + βx₂) is obtained by the linear construction:

Fig. 26: the construction of π(x₀ + αx₁ + βx₂) from π(x₀ + αx₁) and π(x₀ + βx₂).


Therefore Φ(π(x₀ + αx₁ + βx₂)) = π(y₀ + σ(α)y₁ + σ(β)y₂) and hence Φ(π(λ₀x₀ + λ₁x₁ + λ₂x₂)) = π(σ(λ₀)y₀ + σ(λ₁)y₁ + σ(λ₂)y₂) for all λ₀, λ₁, λ₂ of Λ.

Finally, by an induction on the dimension n, we obtain the required vectors y₀, y₁, ..., yₙ together with the automorphism σ of Λ.

REMARKS 12.11. The following discussion shows the necessity of the condition dim X ≥ 3 in Theorem 12.10. The case where dim X is 0 or 1 is, of course, uninteresting. If dim X = 2, then P(X) is a projective line. Consequently every bijective mapping of P(X) onto P(Y) is a projective isomorphism, by Theorem 12.2. On the other hand, if φ: X → Y is a semi-linear isomorphism, then the induced projective isomorphism Φ has the property that it takes harmonic quadruples of points into harmonic quadruples of points, i.e.,

[Φ(Q) Φ(R); Φ(S) Φ(T)] = -1 if [Q R; S T] = -1.

This is clearly not a property of every bijective mapping of P(X) onto itself. Therefore not every projective isomorphism of a projective line is induced by a semi-linear isomorphism.

REMARKS 12.12. We note that Definition 12.1 is given entirely in geometric terms; therefore we can speak of projective isomorphism between projective spaces defined over different fields and we can use the expression "projectively equivalent projective spaces" to mean that there exists a projective isomorphism between them. Then Theorem 12.10 gives rise to a Projective Structure Theorem: a projective space P(X₁) derived from a linear space X₁ over Λ₁ and a projective space P(X₂) derived from a linear space X₂ over Λ₂ are projectively equivalent if and only if (i) dim X₁ = dim X₂ and (ii) dim X₁ = dim X₂ = 1; or dim X₁ = dim X₂ = 2 and Λ₁ and Λ₂ have the same number of elements; or dim X₁ = dim X₂ ≥ 3 and Λ₁ and Λ₂ are isomorphic fields. We leave the proof of this theorem to the interested reader.

Theorem 12.10 stipulates that the geometric structure of a projective space is completely determined by the algebraic structure of its underlying linear space. Thus it is possible to construct from an abstractly given projective geometry (for example by a system of


suitable axioms) a linear space which gives rise to a projective geometry equivalent to the abstractly given one.

It follows from 12.10 that all projective isomorphisms of real projective spaces are projectivities, since there is no automorphism of the real field other than the identity mapping. We now show that for complex projective spaces the concept of projective isomorphism does not coincide with the concept of projectivity. For this we need the following lemma, which is a straightforward generalization of 12.4.

LEMMA 12.13. Let X and Y be linear spaces over Λ and let φ and ψ be semi-linear isomorphisms of X onto Y relative to automorphisms σ and τ of Λ respectively. Then φ and ψ induce the same projective isomorphism if and only if there exists a non-zero scalar λ such that ψ = λφ. In this case σ = τ.

PROOF. The proof of the first part of the lemma is entirely similar to the proof of 12.4. For the second part, suppose ψ = λφ for a non-zero scalar λ. For any α ∈ Λ and x ∈ X°, we then have ψx = λφx and τ(α)ψx = ψ(αx) = λφ(αx) = λσ(α)φx. Therefore τ(α)λφx = λσ(α)φx. Since both λ and φx are non-zero, we must have τ(α) = σ(α). Therefore σ = τ.

The existence of projective isomorphisms of complex projective spaces other than projectivities now follows from 12.13, since a semi-linear isomorphism which induces a projectivity must necessarily be an isomorphism. In particular if (x₀, ..., xₙ) is a base of X, then the semi-linear automorphism φ defined by φ(α₀x₀ + ... + αₙxₙ) = ᾱ₀x₀ + ... + ᾱₙxₙ cannot induce a projectivity of P(X) onto P(X).
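Numerically the failure is visible through cross ratios: a projectivity leaves every cross ratio unchanged (Exercise 1 below), while coordinate-wise conjugation conjugates them, so any non-real cross ratio is altered. A sketch of ours on a line of P(C²), reusing the determinant formula of §11F:

    import numpy as np

    def cross_ratio(q, r, s, t):
        d = lambda u, v: u[0] * v[1] - u[1] * v[0]
        return (d(s, r) * d(q, t)) / (d(q, s) * d(t, r))

    q, r, s = np.array([1, 0]), np.array([0, 1]), np.array([1, 1])
    t = np.array([1, 2 + 3j])                  # T = pi(x + (2 + 3i) y)

    before = cross_ratio(q, r, s, t)
    after = cross_ratio(*(np.conj(p) for p in (q, r, s, t)))

    assert np.isclose(before, 2 + 3j)
    assert np.isclose(after, np.conj(before))  # conjugated, hence not preserved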

D. The projective group

Given a projective space P(X), the projective automorphisms of P(X) constitute a group Π with respect to composition. This group Π can be represented as follows. Let G be the set of all semi-linear automorphisms of the linear space X over Λ. Suppose φ and ψ are semi-linear automorphisms of X relative to the automorphisms σ and τ of Λ respectively. Then it is easy to verify that φ∘ψ is a semi-linear automorphism of X relative to the automorphism σ∘τ of Λ and that (G, ∘) is a group. We define an equivalence relation R in the set G by putting φ R ψ if and only if φ = λψ for a non-zero scalar λ. Then by 12.13 two elements of G induce one and the same element of Π if and only if they are R-related, and by 12.10 we can identify the set Π with the quotient set G/R. Moreover the group structure in Π is then the same as the quotient group structure of G/R defined by [φ] ∘ [ψ] = [φ∘ψ], where the bracket [ ] denotes the equivalence class formation.

A projectivity of P(X) onto itself is called a collineation of P(X). The subset Π₀ of all collineations of P(X) is easily seen to be a subgroup of Π and is called the projective group of P(X). A representation of Π₀ is obtained in a similar way.

E. Exercises

1. Let Φ: P(X) → P(Y) be a projectivity and let Q, R, S, T be four collinear points such that [Q R; S T] is defined. Prove that [Φ(Q) Φ(R); Φ(S) Φ(T)] is defined and

[Q R; S T] = [Φ(Q) Φ(R); Φ(S) Φ(T)].

2. A collineation of a projective plane is called a perspectivity if it has a line of fixed points. Let Φ be a perspectivity of a projective plane P different from the identity map. Prove that there exists a unique point E such that for every point Q of P the three points E, Q and Φ(Q) are collinear. E is called the centre of perspectivity and the line L of fixed points is called the axis of perspectivity.

3. Let L be a line on a projective plane P and let Q, Q' be two distinct points not on L. If E is a point on LQQ' distinct from Q and Q', prove that there exists a unique perspectivity Φ with centre E and axis L such that Φ(Q) = Q'.

4. Use the result of the previous exercise to deduce DESARGUES' theorem.

5. Let Φ: P(X) → P(X) be a projective automorphism. Prove that Φ is a collineation if it has a line of fixed points, i.e. there exists a line L in P(X) such that Φ(R) = R for all R ∈ L. Use an example to show that this is not a necessary condition for Φ to be a collineation.


6. A collineation is called a central perspectivity if it has a hyperplane of fixed points. Show that a projective automorphism is a collineation if and only if it is a product of perspectivities.

7. Let X and Y be linear spaces over Λ and let Φ: P(X) → P(Y) be a projective isomorphism. If Z is a subset or a point of P(X), denote by Z' the direct image of Z under Φ.

(a) Let L be a line on P(X) and Q, R, S three distinct points on L. Show that a mapping τL(Q, R, S): Λ → Λ is defined by

τL(Q, R, S)([Q R; S T]) = [Q' R'; S' T'].

Show also that τL(Q, R, S) is an automorphism of Λ.

(b) Show that if X, Y, Z are any three distinct points on L, then τL(X, Y, Z) = τL(Q, R, S). Therefore this automorphism can be denoted by τL.

(c) Show that if L₁ and L₂ are two lines in P(X), then τL₁ = τL₂.

(d) Hence show that for every projective isomorphism Φ: P(X) → P(Y) there is a unique automorphism τ of Λ such that

τ([Q R; S T]) = [Q' R'; S' T'].

(e) Show that if φ is a semi-linear isomorphism relative to an automorphism σ and if φ induces Φ, then σ = τ.

8. Let σ be an automorphism of Λ. Let (C₀, ..., Cₙ | U) and (D₀, ..., Dₙ | V) be frames of reference of the projective spaces P(X) and P(Y) respectively. Show that there is a unique projective isomorphism Φ: P(X) → P(Y) such that

(i) Φ(Cᵢ) = Dᵢ for i = 0, ..., n,
(ii) Φ(U) = V,
(iii) σ([Q R; S T]) = [Φ(Q) Φ(R); Φ(S) Φ(T)].


CHAPTER V MATRICES

In Examples 5.8, we gave some effective methods of constructing linear transformations. Among others, we saw that for any finite-dimensional linear spaces X and Y over Λ with bases (x₁, ..., xₘ) and (y₁, ..., yₙ) respectively a unique linear transformation φ: X → Y is determined by a family (αᵢⱼ) (i = 1, ..., m; j = 1, ..., n) of scalars in such a way that

φ(xᵢ) = αᵢ₁y₁ + ... + αᵢₙyₙ for i = 1, ..., m.

Conversely, let ψ: X → Y be an arbitrary linear transformation. In writing each ψ(xᵢ) as a linear combination of the base vectors yⱼ of Y, i.e.,

ψ(xᵢ) = βᵢ₁y₁ + ... + βᵢₙyₙ for i = 1, ..., m,

we obtain a family (βᵢⱼ) (i = 1, ..., m; j = 1, ..., n) of scalars. Thus relative to the bases (x₁, ..., xₘ) and (y₁, ..., yₙ) of X and Y respectively, each linear transformation φ: X → Y is uniquely characterized by a family (αᵢⱼ) (i = 1, ..., m; j = 1, ..., n) of mn scalars of Λ.

This therefore suggests the notion of a matrix as a doubly indexed family of scalars. Matrices are one of the most important tools in the study of linear transformations on finite-dimensional linear spaces. However, we should not overestimate their importance in the theory of linear algebra, since matrices play for linear transformations only a role that is analogous to that played by coordinates for vectors.

§ 13. General Properties of Matrices

A. Notations

DEFINITION 13.1. Let p and q be two arbitrary positive integers. A real (p, q)-matrix is a doubly indexed family M = (μᵢⱼ) (i = 1, ..., p; j = 1, ..., q) of real numbers.

A complex (p, q)-matrix is similarly defined. Here again we use


the term "M is a matrix over A" to mean that M is a real matrix or acomplex matrix according to whether A = R or A = C.

It is customary to write a (p,q) -matrix M = (pil)i° 1 , . ,p;I 1 , . . . , q in the form of a rectangular array thus:

    | μ11  μ12  ...  μ1q |
M = | μ21  μ22  ...  μ2q |
    | .................. |
    | μp1  μp2  ...  μpq |

From the definition, it follows that two matrices

M=

and N =

11111 1112 .....111q

1121 1122 .....112q

11p1 11p2 ..... 11pq I

vI1 v12 ..... vlsI v2 1 v2 2 ..... v2 s................

I yr 1 yr 2 ..... vrs I

are equal if and only if (i) p = r and q = s and (ii)11il = v;l for everypair of indices i and j.

We introduce now the following notations and abbreviations tofacilitate references and formulations.

(a) The ordered pair (p, q) is called the size of the matric M. Ifthis is clear from the context, then we shall also writeM = p**. For practical purposes, we do not distinguish the(1, 1)-matrix (X) from the scalar X.

(b) The scalar 11ii is called a term of the matrix M and theindices i and j are respectively called the row index andcolumn index of the term pig.

(c) For every row index i = 1, . . . , q, the family pi*

Page 168: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 13 GENERAL PROPERTIES OF MATRICES 157

= (µ;j) j=1, ... , q is called the i-th row of M. Clearlypt*is a (1, q)-matrix over A. On the other hand pt* is alsoa vector of the arithmetical linear space Aq ; therefore wemay also refer to pr* as the i-th row vector of M.

(d) For every column index j = 1, . . . , q, the familyp*j = (pjj)1=1, ... , p is called the j-th column of M Clearlyp*j is a (p, 1)-matrix over A.On the other hand p*; is alsoa vector of the arithmetical linear space AP ; therefore wemay also refer to p*j as the j-th column vector of M.

(e) The term pi/ is therefore said to belong to the i-th row andthe j-th colomn of M.

(f) The diagonal ofM is the ordered p-tuple (i 1,1222 , , ppP)if p < q and it is the ordered q-tuple (121 I , µ 2 2 , ... 4Lgq )ifp>q.

EXAMPLE 13.2. Consider a rotation in the ordinary plane about theorigin 0 by an angle 0. The point P with cartesian coordinates (x, y)is taken into the point P' with cartesian coordinates (x', y'), where

x'=xcos0 -ysin0Y' = x sine +Y cos 0.

We call

cos B - sin 0

sin 0 - cos B

the (2,2)-matrix of rotation

TYP,=(x"Y')

= (X, Y)

0 _x

Page 169: Kam-Tim Leung Linear Algebra and Geometry  1974

158 V MATRICES

EXAMPLE 13.3 Let M = p** be a (p, q)-matrix. Consider the(q, p)-matrix Mt = M'** where µi j = ,2 t for all i = 1, ... , p andj = 1, ... , q. The matrix Mt is called the transpose of the matrix Mand is obtained by turning M around about the diagonal.

Al\ 1112 ........ µ1q `µI I µ21 lap1

#21 \#22 ........ #2q A12 #22 Ap2

M = \\ \\.................... . . . .

µP1 1P2 ...#PP\.. Apq Mt = App

1111q 12q . Apq I

Between the row vectors and the column vectors we have yj* =At *jand A*j = ptj *. Moreover, for any matrix M, (Mt )t = M.

EXAMPLE 13.4. Let X be an n-dimensional linear space over Awith a base (x1 i ... , x ). For every (n, n)-matrix a** over A, wecan define a function F: X2 -* A by

F(x, Y) =Ea`jt'?j

wherex=tlx1 + ... +tnxn andy=rilxl+... +rinx, Thenthe function F is called a bilinear form of the linear space X i.e.,a function of X2 into A that satisfies the following conditions:

(i) F(x + x', y) = F(x, y) + F(x', y);F(x, y + y') = F(x, y) + F(x, y') and

(ii) F(Xx, y) = F(x, Xy) _ W(x,Y)

for any vectors x, x', y, y' of X and any scalar A of A. In otherwords, F is linear in each of its two arguments. Conversely, ifG: X2 -- A is any bilinear form of X, then an (n, n)-matrix P** overA is defined by

Q,j = G(xi,Yj)

for every i, j = 1, ... , n. In terms of this matrix R**, we can

Page 170: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 13 GENERAL PROPERTIES OF MATRICES 159

calculate the value of G at x = S 1 x 1 + ... + t,, xn and y = i 1 x 1 + . +

77nxn by

G(x, y) = EoilEi11,.

Thus there is a one-to-one correspondence between the bilinearforms of X and the (n, n)-matrices. For each fixed base of X we candetermine such a correspondence.

B. Addition and scalar multiplication of matricesFor any pair of positive integers p and q, the set A(v, v) of all

(p, q)-matrices over A is clearly non-empty. We shall introduce nowappropriate composition laws into this non-empty set so as to makeit a linear space over the same A in question. Taking into accountthat there is a one-to-one correspondence between linear transforma-tions and matrices, we shall therefore define addition and scalarmultiplication of matrices in such a way that these composition lawsof matrices should correspond to those of linear transformations (see§ 14A).

For any two (p, q) -matrices M' = p' * * and M" =;L"** over A, wedefine their sum M' + M" as the (p, q)-matrix M = p** whose termsare

Ail= p'ii +

p "ii

for all i=1,...,pandj=1, ...,q.Note that the sum M' + M" is defined if and only if the matrices

M' and M" have the same size. If the i-th row p'i* of M' and thei-th row p"i* of M" are both regarded as (1, q)-matrices, then theirsum p'i* + p",* is also a (1, q) -matrix and is equal to the i-th rowpi* of the sum M' + M". Similarly p'*i + p "*/ = p *i for the columns.

For any (p, q)-matrix M' = p'** and any scalar A, we define theirproduct AM' as the (p, q)-matrix M= p** whose terms are

pii = Ap'il

for all i = 1, ... , p and j = 1, ... , q. Clearly the i-th row p i* ofthe product AM' is equal to the product Xp i*. Similarly A*, = Au'*1for the columns.

It is easy to see that the set A(M) of all (p, q)-matrices over Aform a linear space over A with respect to the addition and the scalarmultiplication defined above. In particular, the zero (p, q)-matrix 0,

Page 171: Kam-Tim Leung Linear Algebra and Geometry  1974

160 V MATRICES

whose terms are all zero, is the zero vector of the linear spaceA(p, q).

Relative to the formation of transpose (see Example 13.3), ad-dition and scalar multiplication have the following properties:

(i) (M' + M")' = M"', and(ii) (X M'), = X M't .

Finally let us consider the dimension of the linear space A(p q).We use the Kronecker 8-symbols to define a (p, q)-matrix E,

p;1=1, . , q for each r = 1, ... , p and eachs = 1, ... , q. All terms of this matrix E,, are zero except the onethat belongs to the r-th row and the s-th column which is equal to 1.This family of pq matrices is obviously a base of A(p, q) called thecanonical base of A(P.q). Therefore dim A(p,q) = qp. For example,the canonical base of A(2' 3) consists of the matrices:

1 0 0 0 1 0 0 0 1

E11E12 =

_E

0 0 0 0 0 013 0

0 0

0 0 6 0 0 0 0 0E21 E22 =

E23 =

1 0 0 0 1 0 0 0 1

C. Product of matrices

We define now a multiplication of matrices that will correspond ina way to the composition of linear transformations. Let M'= µ'** bea (p, q)-matrix and M" = A"** a (q, r)-matrix both over A. Wedefine their product M' M" as the (p, r)-matrix M = µ** whoseterms are

Ail = 141A"11 + /l12;1"2/ + ... + µjgµ'gj

foralli=l, .. ,pandallj=l, ...,r.Note that the product M'M" is defined if and only if the number

of columns of M' is equal to the number of rows of M" and that theproduct M%" has the same number of rows as M' and the samenumber of columns as M". This state of affairs is therefore analogousto the fact that the composite Ooo of two linear transformations isdefined if and only if the range of 0 is identical with the domain of

Page 172: Kam-Tim Leung Linear Algebra and Geometry  1974

§13 GENERAL PROPERTIES OF MATRICES 161

and that the domain and the range of the composite ioo are identi-cal respectively with the domain of 0 and the range of 4,. If the i-throw p';* of M' is regarded as a (1, q)-matrix and the j-th columnµ"*j of M" is regarded as a (q, l)-matrix, then their product

is a (1, 1)-matrix whose only term is identical with theterm µ;j of the product M' M ". Therefore

µ;j = µ *µ,.* j or l.Lij = (41µ'72 ... µ';q )

rµ 1lj

µ2j

µ"qj t

for all i = 1, ...,pandj= 1, r, and hence

M'M"= (1L';*µ"*j);=..., p;i=1, ...,r.EXAMPLE 13.5. Take the matrices

A =a b e fc d and B=

gh

Then we obtain the products AB and BA as follows:

AB =ae + bg of + bh

and BA =ea + fc eb + fd

ce + dg cf + dh ga + he gb + hd .

The usual associative law and distributive laws can be verified bystraightforward calculation. However, this will involve a heavy use ofindices. For this reason, we shall use the correspondence betweenmatrices and linear transformations to prove these laws in the follow-ing § 14.

Meanwhile we note that, in general, multiplication of matrices isnot commutative. Take for instance the matrices

Page 173: Kam-Tim Leung Linear Algebra and Geometry  1974

162 V MATRICES

We obtain

0 1 0 0 0 0

M'M" = 1 0 0 and M"M' = 0 0 1

0 0 0 0 1 0

and therefore M'M" : M "M'. Note also that the product M'M" oftwo matrices can be a zero matrix while the matrices M' and M"themselves are both distinct from the zero matrices. For instance,

if M'= 0 1 and M" =[I 0 , then M'M"=0 0 0 0

0 0

0 0

Relative to the formation of transpose, multiplication of matriceshas the property that (M'M")t = (M"t) (M"), i.e., the transpose of aproduct is the product of the transposes but in the reverse order.Although this can be proved by straightforward calculation, we shallprove it using a different method (see § 14A).

In the multiplicative theory of matrices, the identity matrices playan important role. These matrices are defined as follows. For anypositive integer p, the identity matrix of order p is the (p,p)-matrixIp = (Si/)i,/ = 1, ... , p. Thus all terms of Ip are zero except those onthe diagonal which are all equal to 1; thus

1100. . 0)0 1 0 . . 0

0 . . 0 1 0

l0 . . . 0 1

For any (p,q)-matrix M = µ**, we get

µi/ = Silµlj + ... + Siiµt/ + ... + 6ipµpl

µi/ = µt161/ + . . . + µtj6j/ + ... + µig1gl

foralli= 1, ...,p andj= 1, ...,q. Therefore

IpM = MIq = M.

Page 174: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 13 GENERAL PROPERTIES OF MATRICES 163

D. Exercises

1 5 7 3 101. Let A1= 1 ,A1= 1 0 2 -2 0

L32 2 1 5 1 6

I1 0 0 0 1

1 2 1 3 2 1

and A3 = -1 2 5 0 010 2 1 1 1

3 5 4 6 0

Evaluate Al (A2A3) and A3A3.

2. Show that the matrices which commute with

0 1 0 0

10 0 1 0are of the form10 0 0 1

0 0 0 0

a 0 7 S

0 a 0 7

0 0 a P

0 0 0 a

3. An (n,n) real matrix (a,) is a Markov matrix if

0 < at/ 4 1 and E 1 a i l = 1 1 1 , 2, . . . , n .

Prove that the product of Markov matrices is a Markov matrix.

a 0 b4. Let M = I c 0 and N 7 a . Show that

d e 0

M3 - (a + c)M2 + (ac - bd)M - (be - bcd)I3 = 0and

N2 - (a + S)N + (aS - f7)I2 = 0.

Page 175: Kam-Tim Leung Linear Algebra and Geometry  1974

164 V MATRICES

5. Let E;j be the canonical base of the linear space A(n,n)

of all square matrices of order n and A = (a**) E A(n,n)

If AE, j = A, then ak; = 0 for k = i and a jk = 0 for k j.Hence show that if A commutes with all matrices of A(n,n )then A = aId .

6. Find all square matrices A of order 2 such that A 2 = 0.

7. Find all square matrices B of order 2 such that B2 =12 .8. Show that with respect to multiplication

11 0 -1

011,

1 0 -1 0

0 1 0 0 -1 0 -1

form a group.

9 A is a (n,n)-matrix whose entries are all 0, 1 or -1 and in eachrow or column there is exactly one entry which is not zero.Prove that A, A2, A3 ... are of the same type and henceshow that Ah = In for some positive integer h.

10. Let H be the set of all complex square matrices of order two ofthe form

I x+yi z+tt) =M(x,y,z,

(a) Show that the product and the sum of any two matrices ofH are again matrices of H.

(b) Show H is an associative algebra.(c) For any matrix M(x, y, z, t) distinct from the zero matrix

find the matrix M(x', y', z', t') such that

M(x, y, z, t)M (x', y', z', t') =x2+y2+z2+t2 0

0 x2+y2+z2+t2

(d) Find all the matrices M of H such that M2 I.

11. If A and B are square matrices of the same order we denote by[A, B] the matrix AB - BA. Let

Page 176: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 13 GENERAL PROPERTIES OF MATRICES 165

0 0 0 1 0 0_Ml

_M2

_Ms

1 00

0 i 0

M

0 i _ 1

M?

i

MC

4 [ ]6 0 -i

be complex matrices where i is the imaginary unit. Evaluate all[M;, Mil and determine all matrices A which commute witheach M; (i.e. AM; = M;A).

12. Let M = I be a (2,2)-matrix where Q* 0.

(a) Find a necessary and sufficient condition for

B=

to commute with M.

(b) Find the dimension of the subspace of A(2,2) consisting ofall matrices commuting with M.

13. Prove that the set of (n,n)-matrices which commute with afixed (n,n)-matrix form a subspace of the linear space A(p.n).

14. Let A = (ai)r,i =1, 2, n be a square matrix. The trace Tr(A)of A is defined by

Tr(A) = all + a22 + + app .

Show that Tr(A + B) = Tr(A) + Tr(B) and Tr(AB) = Tr(BA).Give an example to show that Tr(AB) * Tr(A)Tr(B). Showalso that for any matrices A and B, AB - BA 0 I.

15. Let X be the linear space of all square matrices of order n. i.e.matrices of size (n,n). Show that for every feX* there is aunique A feX such that for every MeX

f(M) = Tr(AfM).

Show also that f(MN) = f(NM) holds for all M NEX if and only ifthe matrix A f is a scalar multiple of the identity matrix.

Page 177: Kam-Tim Leung Linear Algebra and Geometry  1974

166 V MATRICES

16. A real square matrix A of order n is said to be positive definiteif XAXt is a positive real number for every nonzero (l,n)-matrix X. If A and B are symmetric square matrices of order nwe define A < B to mean that B-A is positive definite. Showthat if A <B and B <C, then A <C.

§14. Matrices and Linear Transformations

Let us now pick up the loose threads; we must study firstly thecorrespondence between matrices and linear transformations in moredetail, i.e., the formal relationship between the linear spacesHom (X,Y) and A(P.q), and secondly the effect on the matricesarising from a change of bases in X or in Y.

A. Matrix of a linear transformation

Let X be a p-dimensional linear space and Y a q-dimensional linearspace both over A. Relative to a fixed base B = (x1, . . . , xp) of Xand a fixed base C = (y 1 , ... yq) of Y, every linear transformation0: X -> Y determines a unique (p,q)-matrix MBC(cb) = a** over A insuch a way that

O(xi) = ai1y1 + ... + aigyq

for each i = 1, ...,p. We call MBC(¢) the matrix of the linear trans-formation ¢: X -* Y relative to the bases B = (x1, ... , x,)andC = (y1, ... , yq ). If the bases in question are clear from the context,we use the abbreviationM(O) for MBC(¢).

Assigning to each ¢e Hom (X, Y) its matrix M(O) relative to thefixed bases B and C in question, we obtain a surjective mappingMBC : Hom(X, Y) - A(P.) of sets. This mapping M is obviously alinear transformation of linear spaces; it is also injective, for ifMBC(¢) is the zero matrix, then 0 must be the zero linear trans-formation. Thus, for each pair of fixed bases B and C of X and Yrespectively, we obtain an isomorphism MBC : Hom (X, Y) -+ A( q) .

We see now that the algebraic structure of the linear space A(P, q) de-fined in § 13B corresponds to the algebraic structure of the linearspace Hom (X, Y) defined in § 7A in a most natural way. For thisreason, the linear space A(P,q) can be regarded as an arithmetical

Page 178: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 14 MATRICES AND LINEAR TRANSFORMATIONS 167

realization of the linear spaces Hom (X, Y) in the same way as APis an arithmetical realization of any p-dimensional linear space X.

We now justify the statement at the beginning of § 13C that themultiplication of matrices corresponds in a way to the compositionof linear transformations.

Let Y, Y, Z be three linear spaces all over A and B = (x1, ... xp ),C= (y, - Yq ), D = (z1, ... , zr) bases of X, Y, Z respectively.If 0: X -* Y and 0: Y - Z are linear transformations, then we con-sider the matrices MBC(cb) =a**, MCB(0) = R** and MBD (l / -0) = 7* * .

These matrices have size (p, q), (q, r) and (p, r) respectively; theirterms are respectively defined by

4(xi) = a11Y1 + ... + a;gyq for i = 1, ... , p,

'P(Yi) = (3irzr for j = 1, ... , q,

oO(xi) =7i1z1+... + 7irZr for i = 1, ..., p.Now 7* * and the product (a**) (/3*,,) have the same size (p, r). Since( o0) (xi) = >fi (O(xi)) for each i = 1, ... , p, we get

7j1 Z1 + ... + 7ikZk + ... + 7irZr= ko(xi))= ip (a11 Y1 + ... + aiiyj + ... + aiq yq)= (Eaii(3ik)Zk + ... + (EaiiIir)zr

=(ai:l+.1)z1 + ... + (ai.R.k)zk + ... + (ai-(3.r)zr-Therefore 7ik = ai*(3*k for all i= 1, . . . , p and all k= 1..., r, i.e.,

MBD MBC(cb)MCD or simply M(i,oO) = M(O)M(i)

So the matrix of a composition of two linear transformations is equalto the product (in the reverse order) of the matrices of the lineartransformations. Consequently, the associative law and the distribu-tive laws of multiplication of matrices follow immediately fromthose of the composition of linear transformations.

Earlier in Example 13.3, we introduced the transpose Mt of amatrix M. We study now the relationship between the transposeof a matrix and the dual of a linear transformation. The baseB = (x1, .. , x ) of X and the base C- (yl , ... ,yq) of Y determineuniquely dual uses B* = (fl, . . . , fp) of the dual space X* of X andC* _ (gI , ... , gq) of the dual space Y* of Y.

We want to know the relationship between the matrices MBC ()

Page 179: Kam-Tim Leung Linear Algebra and Geometry  1974

168 V MATRICES

= a** and Mc*B * The matrix P** is a (q,p)-matrix whoseterms Pji are determined by the equations

0*(gi)=P11f, +...+(31Pff forallj=l, ...,q.By the definition of dual transformation, we get O*(gi) = gjoo for allj = 1, ... , q. Therefore from the equations

(0*(g1))(xi) = P/ If, (xi) + ... + Pjifi(xi) + ... +Plpfp(xi)=Pj1s1i+ ..+Piisii+ ... +P/P8Pi=iji

and

(81'0(x;) = g1(O(x1)) = gl(ai1y1 + ... + ailyj + ... + aiq yq)

=ai1g,(y1)+... +ai1gi(yi) +... +ajggj(yq)= ai 1 Sit + ... + aijSii + ... + aiP SjP =al,

we get the equations iji = ail for all i = 1, ..., p and all j = 1, ... , qIn other words, the matrices a** and P** are transposes of eachother; therefore

MBc (0)` = Mc*B*(0*) or simply M(g)t = M(O*),

i.e., relative to two fixed pairs of dual bases, the transpose of thematrix of a linear transformation is the matrix of the dual trans-formation.

Finally it follows from (M(4)M(iy))t = M(,Jlo4)t = M((Joo)*)= M(O*o0*) =M(iJi*)M(c*) =^,)tM(Of that (MN)t = NtMt for anymatrices M and N if their product MN is defined. Thus the transposeof the product of two matrices is the product in the reverse order oftheir transposes.

B. Square matrices

Let us now consider matrices of endomorphisms of a finite-dimensional linear space X. If X has dimension p , then relative toany base of X the matrices of endomorphisms of X all have the size(p,p). These matrices are called square matrices of order p. Sincethe product of two square matrices of order p is again a squarematrix of order p , the linear space A(P,P) of all square matrices oforder p together with multiplication is easily seen to be an

Page 180: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 14 MATRICES AND LINEAR TRANSFORMATIONS 169

associative algebra. Relative to any base B of X, the isomorphismMBB : End (X) -> A(P,P) has the following property:

MBB(0o0) = MBB ()MBB (0)

In other word under this one-to-one correspondence, the composite oftwo endomorphisms corresponds to the product of their matrices inthe reverse order. In this sense, the associative algebra A(P' P) can beregarded as in an arithmetical realization of the associative algebraEnd (X).

Furthermore under this correspondence, the identity linear trans-formation ix corresponds to the identity matrix Ip, i.e., MBB (ix)= Ip, since ix(xi) = xi for all vectors x; of the base B = (x1, ... ,If 0 is an automorphism of X (i.e., a bijective endomorphism of X)and 0-' the inverse of ¢, then it follows from 0 o 0-' _ 0-' o 0 = ixthat -')M 1) ThisMBB (0 BB (0) = MBB (0) MBB (0_= MBB (ix) = Ip .suggests the notion of invertible matrix. A square matrix of order p issaid to be invertible or non-singular and only if there exists a matrixM' such that MM = MM' = Ip. In this case, M' is necessarily a squarematrix of order p; furthermore M' is uniquely determined by therequirement M'M = MM' = 1p. Indeed if M" is such that M"M = MM"= Ip, then it follows from M'(MM") = (M'M)M" that M' M". Thematrix M' is called the inverse of the invertible matrix M and isusually denoted by M-' .

Conversely if the matrix MBB ((¢) of an endomorphism 0 isan invertible matrix M, then M determines uniquely an endo-morphism / such that MBB (0) = M-' . It follows fromMBB (0)MBB (0) = MBB (VG)MBB (0) = MBB (ix) that o O _ o

= ix. Therefore 0 is an automorphism and 0 = 0-1.Summarizing, we obtain the following result: relative to any base

B of X, the automorphisms of X and the invertible matricescorrespond to one another under the one-to-one correspondenceMBB : End (X) -+ A(P,P). Furthermore under this correspondence,for every 0 e Aut (X)

MBB (O)-' = MBB (¢'') and M(0 *)-' = M(O-' )t .

Consequently for every invertible matrix M

(M'' )-' = M and (Ml )-l _ (M-' )t

Finally the set of all invertible matrices over A of order p togetherwith multiplication constitutes a group, called the general linear

Page 181: Kam-Tim Leung Linear Algebra and Geometry  1974

170 V MATRICES

group of degree p over A and denoted by GL(p). Just as A(p.p) is anarithmetical realization of End(X), so also is GL(p) an arithmeticalrealization of the group Aut(X) of automorphism of X (see §6B).

C. Change of bases

So far, in the discussion on matrices of linear transformations, wehave had to choose a fixed frame of reference determined by a baseof the domain and a base of the range linear space. We must nowstudy the effect on such matrices arising from a change of base in thedomain and a change of base in the range linear space. This state ofaffairs is similar to the transformation of coordinate-systems inanalytic geometry.

Let B = (x 1i ... , xp) and B' = (x'1, ... , xp) be any two basesof a linear space X over A. In writing each vector of B' as a linearcombination of vectors of B:

x'1 _'y,1x1 +...+yipxp for i = 1, ...,p,

we obtain a square matrix G = y** of order p over A, called thematrix of change of the base B to the base B'. Relative to the base B(used both in the domain and the range linear spaces), G is thematrix of the automorphism r: X -+ X defined by T(x1) = xi fori = 1, . . . , p, i.e., G = MBB(T). Therefore the matrix G of the change ofB to B' is an invertible square matrix of order p; its inverse matrix G'1is the matrix of the change of the base B' to the base B. On the otherhand, if B' is used as the base of the domain and B is used as the baseof the range linear spaces, then the same matrix G is the matrix ofthe identity linear transformation ix: X -+ X relative to these bases,i.e., MB.B(ix) = G.

We shall now establish the relationship between the matrices ofone and the same linear transformation 0: X - Y relative to twopairs of bases in terms of the matrices of changes of bases. LetB = (x1, ... , xp) and B' _ (x'1, ... , x'p) be two bases of the linearspace X, and G be the matrix of the change of the base B to the baseB'. Similarly, let C = (y r , ... , Yq) and C = (y 1 , . . . , y'q) be twobases of the linear space Y, and P be the matrix of the change of thebase C to the base C. Any linear transformation ¢: X -> Y, can befactorized into 0 = iyoooif. We calculate their matrices relative tothe bases as indicated in the following diagram:

Page 182: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 14 MATRICES AND LINEAR TRANSFORMATIONS 171

X (with base B') Ix) X (with base B)

Y(with base C) - Y(with base C').

Since G and P-1 are the matrices of the linear transformations atthe ends we get

MB.C.(q)=MB.C.(iyo oiX)=MB.B(iX)MBC(4)Mcc.(iy) = GMBC(O)P-1.

Therefore the matrix of a linear transformation changes according tothe rule:

MB-C.(cb) = GMBC(O)P_1.

As a result of the change of the base B to B', the dual B* of Bchanges to the dual base B'* of B'. From the result we have justobtained, the matrix of the change from B* to B'* is given byMB.,B,(iX,). Since

MB.,B,(iX,) = MBB.(iX)r = (MB.B(iX)r)-1,

the matrix G =-,y** of the change from B* to B'* is the inverse of thetranspose of the matrix G = 7** of the change from B to B' i.e.,(7= (G')'1; for the terms of these matrices we get

?k 17ik'1')k = St/ for i, j = 1, ... , p.

D. Exercises

1. Write down the matrices of the following linear transformationrelative to the canonical bases.

(i) : C3 - C 2 such that

p1(a1,a2, a3) = (0, a1+a2+a3)(ii) p2 : C4 - C4 such that

'P2(0(11 a2,a3,a4)=013,a4,a1,a3)-(iii) 'p3 : C2 -* C4 such that

p3(a1, a2) _ (a1, a1+a2, a1"a2, a2)

Page 183: Kam-Tim Leung Linear Algebra and Geometry  1974

172 V MATRICES

2. The matrix of a linear transformation gyp: C4 -> C3 relative tothe canonical bases is

1 4 -1

2 -2

2 2 3jFind the images of the following vector under gyp:

(2 tom, -3 F-1, 4, 0); (1, 1, 1, 1); (0, 0, 0, ) .

3. Let p be an endomorphism of R4 whose matrix relative to thecanonical base is as follows

11 -1 1 -11 1 -1 -11 -1 -1 1

-1 -1 -1 3

Show that x + y + z + t = 0 is a necessary and sufficient conditionfor a vector (x, y, z, t) of R4 to belong to Imp. Find dim

4. For each real number 0, let Pe denote the endomorphism of R2whose matrix relative to the canonical base is

cosh --sing

sine COs,

Show that (a)`pe+©,

and (b) cpe 1 = p--6

5. Find the matrix of change of the base B to the base B' in eachof the following cases.

(a) B = ((1, 1, 0), (0, 1, 2), (-1, 1, 1)'

B'= ((0,0, 1), (-1, 1, 1), (2, 1, 1))(b) B= ((10,-2,5), (1,1,2), (3,2,1))

B'= ((0, 1 , 1 ), (4, - 1, 2), (1, 2, -1))

Page 184: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 14 MATRICES AND LINEAR TRANSFORMATIONS 173

6. Let p be the endomorphism of R[T] which takes eachpolynomial f(T) into the polynomial f(T+ 1) - f(T) find thematrix of p relative to the base

T(T-1) T(T-1) (T-2) T(T-1) ... (T-n+2))2! 3! (n-1)!

7. Let X, Y be linear spaces with bases B and C respectively. Letgyp: X- Y be a linear transformation. If,xEX, denote by MB (x) therow vector of coordinates of x relative to the base B. Show that

MM 4x) = MB(x)MBC4)

8. Let X be an n-dimensional linear space over A with base B.Let p be an endomorphism of X. Show there exists a baseC = (yl, . . . , y,)such thatfori= 1, ...,n

tipy, = X;y;for a scalar X1EA

if and only if there exists an invertible square matrix P of ordern over A such that the matrix

P-' MBB('P)P

has zero entries off the diagonal.

9. Let X be a linear space with base B = (x1, X2, x3, x4) . Suppose ppis an endomorphism of X such that

1 0 2 I

-1 2 1 3MBB(O)_

1 2 5 5

2 2 1 -2

(a) Find Mcc(tp) whereC= (X1 -2X2 +X4, 3X2 -X3 -X4, X3 +X4, 2X4),

(b) Find Keep and Imp.(c) Select a base in extend it to a base of X and find the

matrix of p relative to this base.(d) Select a base in Imp extend it to a base of X and find the

matrix of p relative to this base.

10. Let X and Y be linear space over A of dimensions p and qrespectively. Show that if gyp: X -+ Y is a linear transformation of

Page 185: Kam-Tim Leung Linear Algebra and Geometry  1974

174 V MATRICES

rank r, then there exist a base B of X and a base C of Y suchthat

0

Hence show that for any (p, q)-matrix M of rank r there existUE GL(q,A) and VE GL(p,A) such that

UMV = Cf' O I

-0 0J.11. Let X be a linear space, p an endomorphism of X and x a vector

of X. Suppose pk -'x 0 and pkx = 0 for k > 0.

(a) Show that ipx, pzx, . . ., pk-'x are linearly independent.

(b) Show that if k = dim X, then there exists a base B of Xsuch that

MBB (sP) =

1OJ

where all the unspecified entries are zero.

12. Let X be a linear space and let B, C be two bases of X. Show thatif o E End (X), then

(See Exercise 14 of paragraph § 13 for the definition of Tr).Define Trip = Tr(M B(op)) for all pE End (X) and show that forgyp, 0 E End (X) and XeA

Tr ('P. VI) = Tr(41 o,P),

Tr(?up) = X Trip andTr(ix) = dim X.

Page 186: Kam-Tim Leung Linear Algebra and Geometry  1974

§15 SYSTEMS OF LINEAR EQUATIONS 175

§ 15. Systems of Linear Equations

A system of equations

(E)

a1 1X1 + ... + a1 nXn = a1a2 1X1 + . .. + a2 n Xn = a2

n+1

n+1

am1X1 + ... +amnXn =am n+1

with known scalars at1 of A and indeterminates X1 , ... , X, iscalled a system of m linear equations in n indeterminates with coeffi-cients in A. The problem is to solve these equations, i.e., to find allpossible n-tuples (X1 i ... , An) of scalars of A so that after substi-tuting each Xi for each Xi in (E), we get

allX1 + ... + aInXn = a1 n+1a2 111 + + a2 n Xn = a2 n + 1...........................am 1X1 +. .. + amnXn = am n + 1 .

Not every system of linear equations can be solved; take forinstance the system

X1+X2=0X1+X2=1.

We now wish to find a necessary and sufficient condition for thesolvability of the system (E) and we propose to study this problem interms of matrices and linear transformations. Associated with thesystem (E) of linear equations are two matrices

Ia11 . . . . alnA0=

a21 . . . . a2nand A =

I all . . . a1n a1 n+1a21 . . . a2n a2 n+I

f m l . . . . amn I `I aml . . . amn am n+ lJ .

The system (E) admits a solution if and only if there is a linearrelation X 1 a* 1 + ... + Xn a*n = a* n +I among the column vectorsof the matrix A. In other words, the system (E) can be solved if and

Page 187: Kam-Tim Leung Linear Algebra and Geometry  1974

176 V MATRICES

only if the subspace of A"1 generated by the column vectors of A isthe same as that generated by the column vectors of A.. We enter nowinto detail discussion of the problem.

A. The rank of a matrix

Using the same notation as in the previous paragraph, we see thatthe subspace generated by the column vectors of A0 is a subspace ofthe subspace generated by the column vectors of A. Thus, to com-pare these two subspaces, it is sufficient to consider the maximalnumbers of linearly independent column vectors of Ao and of A. Ittherefore suggests the notion of column rank of a matrix.

Let M = µ** be any (p, q)-matrix over A. The column rank c(M) ofthe matrix M is the maximal number of linearly independent columnvectors of M, i.e., the dimension of the subspace of AP generated bythe column vectors of M. Similarly, the row rank r(M) of the matrixM is the maximal number of linearly independent row vectors of Mi.e., the dimension of the subspace of A4 generated by the rowvectors of M.

We prove now that r(M) = c(M). Let 0: AP --> A4 be the lineartransformation whose matrix relative to the canonical bases of APand A4 is equal to M. Then the image Im 0 of 0 is generated by therow vectors µ1*, . . . , µp* of M; hence for the rank r (0) of the lineartransformation 0 (see §5D) we get r(0) = r(M). Since the transposeMt of M is the matrix of the dual transformation O* of 0 we get r(O*)= r(MI) = c(M). From 7.8, the required equation r(M) = c(M) follows.Thus the distinction between the row rank and the column rank of amatrix disappears.

DEFINITION 15.1. The rank of a matrix M, to be denoted by r(M), isthe maximal number of linearly independent row (or column)vectors of M.

Under this notation, we obtain the following criterion for thesolvability of a system of linear equations.

THEOREM 15.2. A system

a11X1 + . .. + a1,Xn = a1 n+ 1

amtXi + ... +amnXn =am n+1

Page 188: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 15 SYSTEMS OF LINEAR EQUATIONS 177

of linear equations is solvable if and only if the matrices

all ...a1nA. = .........

amt .amnhave the same rank.

all ... aln a1 n+1A = ...............

am1... amn am n+1

B. The solutions of a system of linear equations

It follows immediately from 15.2 that a system of homogeneouslinear equations

a .11X1 +...+a1nXn = 0(E0)

am 1 X, + ... + am,Xn = 0,

that is a system of linear equations in which the constant terms areall zero, is always solvable. This also is evident from the fact that then-tuple (0, .. , 0) of zeros is a solution of (E0). We analyse now thesolutions of (E0). Each equation of the system (E0) gives rise to acovector or linear function f, of the linear space A" defined by

f(x) aj1X1 + ... + atnXn

for each vector x = (X1 , . , X,,) of A. Thus the set of all solutionsof the system (E0) is identical with the annihilator of the subspacegenerated by covectors f ... , fm of AP, i.e., the set of all xeA"such that Ji(x) = 0 for i = 1 , ... , m. It follows from 7.7 or by directverification that the set of all solutions of (E0) is a subspace of An ofdimension n-r(A0), since the rank r(A0) of the coefficient matrix A0of the system (E0) is the dimension of the subspace generated by thecovectors We have therefore proved the followingtheorem.

THEOREM 15.3. The set of all solutions of a system of homogeneouslinear equations (E0) in the indeterminates X1, . . . X,, is asubspace So of the arithmetical linear space A". Furthermore dim So= n - r(A0) where r(A0) is the rank of the coefficient matrix A0 ofthe system (E0).

Page 189: Kam-Tim Leung Linear Algebra and Geometry  1974

178 V MATRICES

(E)

For a general system of linear equations

allXL + . .. + amnXn = a1 n+1

am, X1 +- +amnXn = am n+ i

we call the system of homogeneous linear equations

a11X1 + + amnXn = 0

(E0)

am1X1 +...+amnXn = 0

the homogeneous part of the system (E). We have seen that thesystem (E) is solvable if and only if r(A) = r(A0). Suppose this is thecase and let S denote the set of all solution of (E). Using notations inthe proof of 15.3 above, we see that xEA" belongs to S if and only iff.(x) = ain+ 1 for i = 1 , ... , m. It follows that if x and y belong to S,then f (x - y) = f(x) - f(y) = 0, i.e., x - y belongs to the solutionset So of the homogeneous part (Eo) of (E). Conversely if xES andz c= So, then z) = f(x) + f(z) = ain+ 1; therefore x + zES. Inother words the solution set S of (E) is an element of the quotientspace An/S(,; or in geometric terms, S is a linear variety of then-dimensional affine space Aff(A").

C. Elementary transformations on matrices

We study now an effective method of determining the rank of amatrix. The idea of this method is to find in the subspace generatedby the row vectors 111*, ... , pp* of a (p,q)-matrix

I's1111 1112 .... µ1q

M = 1121 1122.... 1124

Q11p1 1p2 .... lpgJ

a family (p' 1*' ... p'p*) of generators so that the maximalnumber of linearly independent vectors among them can be read offeasily. We shall see that this can be achieved by a series of elementaryrow transformations. These are operations on a matrix that changethe rows but do not change the rank of a matrix. There are three kindsof elementary row transformations namely:

Page 190: Kam-Tim Leung Linear Algebra and Geometry  1974

§15 SYSTEMS OF LINEAR EQUATIONS 179

the first kind that consists in a transposition of the i-th and j-throws, (notation: R(i:j)),the second kind that consists in multiplying the i-th row by anon-zero scalar X, (notation: R(Ai)),the third kind that consists in adding a scalar multiple of A andthe j-th row to the i-th row, where i 0 j,(notation: R(i + Aj)).

For example, by an elementary row transformation of the first kindthat interchanges the positions of the first and the second rows, i.e.,R(1:2), the matrix M is taken into the matrix

11211222....µ2q1211 1112 ....111q

l µp1 µp2 .... 1.Lpq J

Similiarly,by the elementary row transformation R(A1) of the secondkind and the elementary row transformation R(1 + X2) 6f the thirdkind, the matrix M can be taken into the matrix

A1211 NA12 .... AµIq

1221 µ22 .... 122q

I µp 1 µp 2 .... µp

and

11+Aµ21 µI2+Aµ22 ... µ1q+

1221 1222 ........1hq.......................

I µp1 µpa ....... 12pq jrespectively.

Clearly the rank of a matrix remains unchanged after the matrix inquestion has undergone an elementary row transformation since thesubspace generated by the row vectors remains unchanged.

The elementary column transformations C(i j), C(Ai) and C(i + Xj)are similarly defined and they can be also used for the same purposeof finding effectively the rank of a matrix.

Page 191: Kam-Tim Leung Linear Algebra and Geometry  1974

180 V MATRICES

We show now that any (p,q)-matrix M = p** can be transformedinto an echelon matrix

CO ... 0 1 v1 l .................... v1g )

0 ............. 0 1 v2 /2+ 1 ......... .. . v2q

N=....................................0 ..................... 0 1 v, /,+ I ... vq0 ........................... .....0

O .................................0 J

by a finite number of elementary row transformations. Moreprecisely, the matrix M can be so transformed by a finite number ofelementary row transformations into a matrix N = v** for whichnon-negative integers r, j I, j2, ... , j, exist so that

(i) 0<r<p,0< jl< j2 <...< j,<q(ii) vii = 0 if i < r and j < j, ;

(iii) v./ = 0 if i > r ;(iv) v;/;=I ifi<r.

The rank of the echelon matrix N is clearly equal to r and thereforethe rank of the matrix M is also equal to r.

We shall now find the integers jl , ... , j, successively.FIRST STEP: If the matrix M is the zero matrix, then we get r = 0and need not proceed further. Assume now that M is distinct fromthe zero matrix and proceed as follows.

(a) Let j, be the column index of M such that the jl -th columnµ,,,/l is the first non-zero column of M, thus:

. 0 µ1/1

M=

l 0 0 AP/1 pp )

Page 192: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 15 SYSTEMS OF LINEAR EQUATIONS 181

(b) If necessary, apply an appropriate elementary row trans-formation of the first kind and an appropriate one of the secondkind to take M into a matrix M' = µ'** of the following form

M' =11211 ... µ2q

0 11'p1 1... µ' n J

(c) If necessary, add an appropriate scalar multiple of the first rowof M' to each of the other rows of M' (i.e., an elementary rowtransformation of the third kind) to take M' into a matrix M" = µ"**of the following form

0 0 1 I'1/i+1 ...0 0 0 11'2a

M" =

0 0 0 1"'pjl+l ... 1i'p J

SECOND STEP: Consider the (p-1, q-jl)-matrix

M1 =`A'p,1+.1. µ,P11+2...... µ'p11

obtained from M" by deleting its first row and its first jl columns. Ifthis matrix M1 is the zero matrix, then we get r = 1 and need notproceed any further. Otherwise, we can apply similar operations as(a), (b) and (c) above to the matrix M1 without affecting the resulton the first row and the first jl columns achieved by the first step ofthe procedure.

But this procedure must terminate after no more than p stepssince the matrix has only p rows and with each step we transform (atleast) one row into the required form. Therefore we have succeeded

Page 193: Kam-Tim Leung Linear Algebra and Geometry  1974

182 V MATRICES

in bringing the matrix M into an echelon matrix N = v** by a seriesof elementary row transformations. Similarly, we can use theelementary column transformations alone to take the matrix Minto an echelon matrix.EXAMPLE 15.4. Let us apply the method we have just learnt to findthe rank of the following real (3,4)-matrix A = a**

1 -2 1 Al

A = 2 1 1 A2

0 5 -1 A3

According to the procedure, we apply an elementary row trans-formation R(2-2 1) and get

11 -2 1 A,

0 5 -1 -2A,+ A20 5 -1 A3

Multiplying the second row of this matrix by the non-zero scalar 5 ,

i.e., R(s 2)

1 -2 1 A,

0 1 - 5 5 (-2A, + A2 )0 5 -1 A3

Now apply an elementary row transformation by subtracting fromthe third row five times the second row i.e., by R(3-s2) and get anechelon matrix

11 -2 1 A,

0 1 - 5 5(-2A, + A2)

0 0 0 2A, - A2 + A3 .

From this echelon matrix, we obtain for the rank r(A) of the matrixA

r(A) - 2 if 2A, - A2 + A3 = 03 if 2A, - X2 + A3 * 0.

Page 194: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 15 SYSTEMS OF LINEAR EQUATIONS

D. Parametric representation of solutions

183

When the coefficient matrix A of a system of linear equations

a11X1 + ... + alfXf = al n+I

am l X l + ... + amnin

= am n + 1

is taken into a matrix A' = a'** by an elementary row trans-formation, then the system of linear equations

a',,Xl +...+ a'1nXn - a1 n+1.........................

a11X1 + ... +amn4n - am n+1

whose coefficient matrix is A', has clearly the same solutions as thesystem (E). Therefore the elementary row transformations can alsobe used to find the solutions of a system of linear equations. It isobvious that elementary column transformations can not be used forthis purpose.

Assume now that the system (E) is solvable, then the coefficientmatrix M of (E) can be brought by a series of elementary rowtransformations into an echelon matrix N:

I- 10...01 P11 1 ........................ v1n+10.........'.+...

0 1 P212+1 .............. P2 n+1

.........................................

0 ....................... O l vri'+1 ... Prn+10 .................................... 0

......................................

0.....i.........i......... I..........0

jl-th j2-th jr-thcolumn column column

J

- r-throw

where jr * n+l; for otherwise r(A0) < r(A) and the system (E) wouldnot be solvable, contradicting the assumption. We can furthersimplify the matrix N by elementary row transformations so that onthe j;-th-column (i = I ... , r) all terms are zero except the onebelonging to the i-th row. Clearly this can be done by usingappropriate elementary row transformations of the third kind alone.

Page 195: Kam-Tim Leung Linear Algebra and Geometry  1974

184 V MATRICES

This means that the coefficient matrix M = a** can be brought by aseries of elementary row transformations into an echelon matrix B:

10. .. 0 1014' 1 ... Rlj2-1 0

01j2+1.. 0 .......Q1 1+1

0 .............. 0 1132j2+1 ... 0 .......132 n+1

0 ............................ 0 1 13rjr+1... Or n+ 10 ......................................0

0 ......................................0

jl-th /2-th jr -thcolumn column column

J

Therefore the system of linear equations

XjI+Q111+1X11+1+.........................+131,X1 =01 n+1

X12+13212+1 X12+1+ ............... +Q21X1 =92 n+1

(E') XY1r+Rrlr+IXlr+1 + ... +jnXn

whose coefficient matrix is B, has the same solutions as the system(E). From the matrix B, we see that among the n coordinates of thesolution (X1 i . . . , Xn) of the system (E'),we can choose n-r arbitraryparameters and express the rest of the r coordinates in terms of thecoefficients Qij and these parameters. Therefore the solutions of thesystem of linear equations (E) can be given in a parametric repre-sentation:

)j is arbitrary for j Oil, ... , jr;X111 = Pi n+1 - (Qi ji+1 Aji+1 + ... + Pin),,) for i = 1, ..., r.

Finally it is emphasized that only elementary row (and nevercolumn) transformations are used in the above method of solvinglinear equations.

Page 196: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 15 SYSTEMS OF LINEAR EQUATIONS 185

EXAMPLE 15.5. Let us find the solution of the system of linearequations

(E)X, - 2X2 + X3 = 12X1 + X2 + X3 = 1

5X2-X3 =-I-The coefficient matrix of the system (E) is

011 -2 1 1

A= 2 1 1 1

0 5 -1 -1which is equal to the matrix A of Example 15.4 with A, = 1, A2 = 1and A3 = -1. Therefore A can be brought into the echelon matrix.

1 -2 1 1

N= 0 1 --i-0 0 0 0

from which we see that the system (E) is solvable. According to theproceedure, our next step is to bring the second column into he form

101

0

by some appropriate elementary row transformations of the thirdkind. In this case, we have only to add to the first row two times thesecond row and get a matrix.

3 31 0

5 5

A' 0 1-1 _1

5 5

0 0 0 0

as well as a system of the linear equations

X1 +33

5X3=5(E')

1 1X2

_5X35

The solutions (A1i A2 , A3) of (E') and of (E) have therefore theparametric representation

Page 197: Kam-Tim Leung Linear Algebra and Geometry  1974

186 V MATRICES

Al 3 _ 3 A35 5

(S) X2 =

5+

5A3

A3 is arbitrary.

To make sure that we have made no calculation mistake, we checkour result by substituting (S) into (E):

3 3A3

5 5

2 3 3 A35 5 )

-2(--L5 + 5 A3/

+ C 5 + 5A3/

5

E. Two interpretations of elementary transformations on matricesThere are two ways of interpreting an elementary row or column

transformation on matrix A. One way of doing this is to regard it as amultiplication of the matrix A by certain types of special squarematrices, the other way is to regard it as a transformation of thematrix of a linear transformation as a result of a change of base inthe domain or range linear space.

We call a square matrix of order n an elementary matrix if it is ofone of the following forms

11

E _

1

0 1

1 01

1)i-th j-th

column column

- i-th row

- j-th row

Page 198: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 15 SYSTEMS OF LINEAR EQUATIONS 187

Cl

A

E' =

L 1Ji-thcolumn

A

E" =

1

j-thcolumn

- i-th row

- i-th row

where the unspecified terms are the same as those of the identitymatrix I of order n.

The first of these matrices can be obtained from the identitymatrix I. by an elementary row transformation R(i: j), the second byR(Xi) and the third by R(i + Xj). Left multiplication of a(p, q) -matrix A by elementary matrices of order p results in:

EA = R(i:j)A,E'A = R (Ai)A,E"A =R(i+Aj)A.

Page 199: Kam-Tim Leung Linear Algebra and Geometry  1974

188 V MATRICES

Similarly the elementary matrix E can be obtained from theidentity matrix I by an elementary column transformation C(i: j),E' by C(Ai) and E" by CO + Xi). Right multiplication of a (p, q)-matrixA by elementary matrices of order q results in:

AE = C(i:j)A,AE' = C(Xi)A,AE" = CO + Xi)A.

Thus we get a one-to-one correspondence between the left (right)multiplication by elementary matrices and the elementary row(column) transformations on matrices.

Let us give the second interpretation. If 0: X -+ Y is a lineartransformation such that its matrix MBC(O) relative to a pair of fixedbases B=(x1, ...,xi, ...)x1, ..., xP)ofXandC=(yl,...,Yt,

y , .... yq) of Y is identical with the given (p, q)-matrix A,then relative to the base B' = (x 1, . . . , x1, ... , x,, ... , x p), B"

(x1, ...) Axi, ..., xi, ..., xp)andB"'_(x1, ..., xr+Axe,x1) ..., xp) of X we get

MB,C(cb) = R(i: j) A,

MB..C(¢) = R(Xi)A,

MB...c(O) = R(i + Xj)A.

Similarly, relative to the bases C' _ (yl , - ... , y1, . . . , yr, . . . , yq ),C" = (YI, ... , Ayr, ... , y1, ... , yq) and C,., = (YI, ... , yr, ... ,yi+Ayr, ..., yq)ofYweget

MBC-(O) = C(i:j)A,

MBC-.(O) = C(1i)A,MBc...(0) = C(i - Aj)A .

Thus an elementary row (column) transformation on A cor-responds to a change of base in the domain (range) linear space X(Y). The possibility of bringing a given matrix A into echelon fromcan be therefore interpreted as follows. Let 0: X - Y be a lineartransformation. Then a base B = (, ... , xp) of X and a base C of(yl , ... , yq) of Y can be found so that if r is the rank of 0, then

05Fr = yr for i = 1, ... , r

0xr = 0 for i = r + 1, ... , p.

Page 200: Kam-Tim Leung Linear Algebra and Geometry  1974

§15 SYSTEMS OF LINEAR EQUATIONS 189

Finally we study an effective method of fording the inverse matrixof an invertible matrix. Let A = a* * be an invertible matrix of orderp. Then the rank of A is p; therefore by a finite number ofelementary row transformations R 1, ... , Rm we can bring A intoechelon form which, in the present case, is the form of the identitymatrix Ip. Thus

Ip = Rm (Rm -1 ... (R2 (RI (A))) ...).By way of the first interpretation of elementary transformation, eachR; corresponds to the left multiplication of an elementary matrix.Therefore there exist elementary matrices E1i . . . , Em such that

Ip = EmEm -1 ... E2 EI A.

Therefore multiplying both sides of the last matrix equation on theright by A-1 , we obtain

A-1 = Em Em-I .. El Ip.

Reinterpreting left multiplication by an elementary matrix as anelementary row transformation, we get

A-' = Rm (Rm -I ... (R I (Ip )) ...) .

Therefore to obtain the inverse matrix A-' of an invertible matrixA, we apply consecutively and in the same order the row trans-formations on Ip which are performed on A to bring A into echelonform.

Let its illustrate this method by the following examples.

1 1

EXAMPLE 15.6. To take the matrix A = into the

1 02 1

0 1we apply to it consecutively the elementary trans-

formations R(2-21), R(- 12) and R(1-2). Thus

1 1 1 1 1 1 _ 1 0A= JR(2-zl

0-1R(-12

0 1

R(1-2)L 0 1

I2.2 1

Page 201: Kam-Tim Leung Linear Algebra and Geometry  1974

190 V MATRICES

Therefore we apply to 12 consecutively the elementary trans-formations R(2-21), R(- i2) and R(1-2) to obtain A-. Thus:- 1 - -.- -1 112 = R(2- 21) R(-i2) R(1 2) =A-'.

0 1 -2

01 12-01

2 -1

EXAMPLE 15.7.-Is

19 7 6A = 11 1 2

11 1 1

1 1 1

R(3-91) 0 0 1

0-2-3

C1 0 2>

R(1-2) 0 1 i0 0 1

Therefore

1 0 0

13 0 1 00 0 1

0 0 1

R(3-91) 0 1-1

1 0-9-10

_ 1 1 1

R(1:3) 1 1 2

9 7 6

111R(2:3) 0-2-3

0 0 1

1 0 0RR(1 0 1a

001

0 0 1R(1:3) 0 1 0

100

> 0 0 1R(2:3) 1 0-9

0 1-1

1 1 1

R(-? 2) 0 1 2

001

R(2--13)12

0 0 1R(2-1) 0 1-1

L100

r- 1*1

0 0 1X-1 -2 0 2

0 1-1.10

Page 202: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 15 SYSTEMS OF LINEAR EQUATIONS 191

R(1-2)

=A-1 .

2

2

202

Lo 1-i j

1 1-42 2

2020 1-1

1 1-42 2

R(2-23) ? 26

0 1-1

F.

1.

Exercises

Find the rank of the following matrices.

10 1 1 -1 2) rl -1 2 1 0(a)

0 2 -2 -2 0(b)

2-2 4 -2 0

0 -1 -1 1 1 3 0 6 -1 1

Li 1 0 1 -1 0 3 0 0 1

1

14 12 6 8 2 I1 0 1 0 0

6 104 21 9 17 1 1 0 0 0

(c) 7 .6 3 4 1(d) 0 1 1 0 0

35 30 15 20 5 J 0 0 1 1 0

Lo1 0 1 1

2. Let M and N be square matrices of order n. Show the inequalitiesof Sylvester:

r (M) + r(N) - n < r (MN) < min [ r (M), r(N) l .

3. If I = 10, O, find all (2,2)-matrices X such that

X2 -4X + 31 = 0.

4. Solve2X1-X2 +3X3 = 43X1 - 5X2 + X3 = 36X1 - 8X2 + 4X3 = 2

Page 203: Kam-Tim Leung Linear Algebra and Geometry  1974

192

5. Solve

V MATRICES

X1 + 2X2 + 2X3 = 32X1 + 3X2 + 5X3 = 74X1 + 9X2 + 6X3 =10

6. Solve

X1+2X2+ +X4= 02X1 + 3X2 - 7X3 = 0

-X1 + 4X3 - 2X4 = -27. Solve

4f3yX 1 + ayX2 - 3af3X3 = 0

2$3yX1 + 2ayX2 + aPX3 = 4y$3yX1 - 8ayX2 - af3X3 = 8a$3y

where a(3y * 0.

8. Solve

aX2 + X3 = 22X1 + 5X2 = 1

-2X1 + X2 + j3X3 = 3

and discuss the solutions in relation to the values of a and 3.

9. Solve

(a+ 1)X1 + X2 + X3 = a2 + 3aX1 + (a+1)X2 + X3 = a3 + 3a2X1 + X2 + (a+l)X3 = a4 + 3a3

and discuss the solutions in relation to the value of a.

10. The solutions of

Ot1 I X1 + ... + a,, Xn = a1 n+ I(i)

amlXl + ... + amnXn = am n+1

Page 204: Kam-Tim Leung Linear Algebra and Geometry  1974

§15 SYSTEMS OF LINEAR EQUATIONS 193

form a linear variety L in the n-dimensional affine spaceAff(A"), and the solutions of

a11X1 + .. . + GYInXn - a1 n+1Xn+1 = 0(ii} .....................................

am1X1 + ... + amnXn - am n+1Xn+1 = 0

form a linear variety M in the n-dimensional projective spacePn (A). Discuss the relation between L and M.

11. Find a necessary and sufficient condition for the points(«I , R1, 71 ), (a2 , R2 , 72 ), , (an , an , 'rn) of Aff(R3) to lieon a line and find a necessary and sufficient condition for themto lie on a plane.

12. Let M be a real square matrix of order n. Suppose r(M) = I.Prove that

(a) I a1 1

M= (61, . . . , /3n) forsome a;,(31eR

a")

(b) M2 =AM foraXeR.

13. Find the inverse of the following matrices.

( a ) ys

where a S- yQ 0.

(c)1 2 0 0

3 7 2 3

2 5 1 2

Page 205: Kam-Tim Leung Linear Algebra and Geometry  1974

194 V MATRICES

(d)

(e)

114

2 1 0 0 0

0 2 1 0 0

0 0 2 1 0

0 0 0 2 1

0 0 0 0 2

IN

0 al

0 as

Lan 01

where ar. 0 0 for i = 1, ... , n and all unspecified entries are zero.

14. Find the inverse of

1 1

0 1 1

0 0 1 1

0 0 0 1

15. Let M =0 A

. Find M-' in terms of A-1 and B-'B 0

.

Page 206: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 15 SYSTEMS OF LINEAR EQUATIONS 195

16. A square matrix M is said to be nilpotent if there exists a positiveinteger r such that Mr = 0. Show that if M is nilpotent, then1-M is invertible and

(1-M)-' = 1 +M+M2 +M3 +...Apply this result to find the inverse of the matrix

rl 2 4 6 810 1 2 4 6

0 0 1 2 4

0 0 0 1 2

0 0 0 0 1

Page 207: Kam-Tim Leung Linear Algebra and Geometry  1974

CHAPTER VI MULTILINEAR FORMS

Linear transformations studied in Chapter II are, by definition,vector-valued functions of one vector variable satisfying a certainalgebraic requirement called linearity. When we try to impose similarconditions on vector-valued functions of two (or more) vectorvariables, two different points of view are open to us. To bemore precise, let us consider a mapping 0: X x Y - Z where X, Y andZ are all linear spaces over the same A. Now the domain X x Y can beeither (i) regarded as the cartesian product of linear spaces and thusas a linear space in its own right or (ii) taken just as the cartesianproduct of the underlying sets of the linear spaces. If we take thefirst point of view then we are treating a pair of vectors xEX and yEYas one vector (x, y) of the linear space X x Y; therefore we aretreating, 0: X x Y -> Z essentially as a mapping of one linear spaceinto another and in this case linearity is the natural requirement. As alinear transformation ¢ can then be studied with the aid of thecanonical injections and projections of the product space X x Y aswell as other results of Chapter II. If we take the second point ofview and if at the same time we take into consideration the algebraicstructures of X, Y and Z separately, then it is reasonable to imposebilinearity on 0, i.e. to require 0 to be linear in each of its twoarguments. Such a. mapping is called a bilinear mapping. We have, infact, encountered one such mapping before in §6A and §8Awhen we studied composition of linear transformationsHom (A, B) x Hom (B, C) - Hom (A, Q. The most interesting andimportant examples of these mappings are the bilinear forms on alinear space X, i.e. bilinear mappings X x X -> A (where A is the1-dimensional arithmetical linear space fl. The natural generalizationof bilinear mapping and of bilinear form are n-linear mapping andn-linear form which are also called multilinear mapping and multilinearform respectively. The study of multilinear mappings constitutes anextensive and important branch of mathematics called multilinearalgebra. In this course we shall only touch upon general properties ofmultilinear mappings on a linear space (§ 16) and go into considerabledetail with determinant functions (§ 17) and much later in § 21 andin § 24 we shall study inner product in real linear spaces and hermitian

196

Page 208: Kam-Tim Leung Linear Algebra and Geometry  1974

§16 GENERAL PROPERTIES OF MULTILINEAR MAPPINGS 197

product in complex linear spaces which are important types ofbilinear forms.

§ 16 General Properties of Multilinear Mappings

A. Bilinear mappings

We begin our discussion by laying down a formal definition ofbilinear mapping on a linear space.

DEFINITION 16.1. Let X and Z be linear spaces over the same A. Abilinear mapping on X with values in Z is a mapping 0: X x X -* Zsuch that

O(A1x1 +X2x2 ,x)=X10(X1,X)+X20(X2,X)O(x,A1x1 + A2x2) = Al (x,x1)+A20(x,x2)

for all x, x1i x2 eX and X,, A2 EA. We call 0 a bilinear form on X ifits range Z is identical with the arithmetical linear space A.

A bilinear mapping 0 on X is said to be symmetric if O(x1,x2) _¢(x2 , x1) for all x1 , x2EX and it is said to be skew-symmetric (orantisymmetric) if l (X 1 , x2) = -O(x2, x 1) for all x1 , x2FX.

EXAMPLE 16.2. For each pair of vectors x = (A1, A2 , A3) andy = (µ1,µ2, µ3) of the real 3-dimensional arithmetical linear space R3 ,we define their exterior product (or vector product) as the vector

xAY=(A2p3 -A3µ2,A3µ1 -A1p3,A1µ2 -A2µ1)

of R3. The mapping 0: R3 x R3 -> R3 defined by ¢(x,y) = x A y isthen a skew-symmetric bilinear mapping on X. Moreover the exteriorproduct satisfies JACOBI'S identity:

(xAy)Az+(yAz)Ax+(zAx)Ay=0.

EXAMPLES 16.3. For each pair of vectors x = (A1i A2 , ... , An) andY = (µ1 , 1-12, ... , µ,) of the real arithmetical linear space R", wedefine their inner product (or scalar product) as the scalar

WY) = Al µI + A2µ2 + + An An

of A. The mapping 0: R" x R" - R defined by 0(xy) = (xly) is asymmetric bilinear form on R. For n = 2, this is

(xly) =AIM + A2µ2

Page 209: Kam-Tim Leung Linear Algebra and Geometry  1974

198 VI MULTILINEAR FORMS

which is a familiar formula in coordinate geometry.On A2, a bilinear form is defined by

'P(x,Y) = A1µ2 - A2µ1 =I

A1

Azlµ1 µ2

for x = (A 1 i A2) and y = (p 1 , µ2 ). It is easily seen that 0 is a skew-symmetric bilinear form.

EXAMPLE 16.4. Consider the real linear space V2 of all vectors onthe ordinary plane with common initial point at the origin 0. Theinner product of any two vectors a = (0, P) and b = (0, Q) of V2 isdefined as the real number

(alb) = pq cos 0

where p is the length of the segment OF, q is the length of thesegment OQ and 0 = 4POQ. The value (alb) is also the product of qand the length of the perpendicular projection of the segment OP onthe line passing through 0 and Q. Therefore the mapping 0: V2 X V2 -> Rdefined by O(a, b) = (a I b) is linear in the first argument; bysymmetry, it is also linear in the second argument. Hence 0 is asymmetric bilinear form. If we choose a suitable cartesian co-ordinate system in the plane with the origin at 0, and if P and Qhave coordinates (xl , y1) and (x2, y2) respectively, then we obtainfrom the well-known cosine law the equation

(alb) = XI X2 + Y1Y2

EXAMPLE 16.5. The inner product of vectors of the complex n-dimensional arithmetical linear space $C^n$ is defined as follows. We denote, as usual, by $\bar\lambda$ the complex conjugate of a complex number $\lambda$; we then define the complex number

$$\phi(x, y) = \lambda_1\bar\mu_1 + \lambda_2\bar\mu_2 + \cdots + \lambda_n\bar\mu_n$$

as the inner product of the vectors $x = (\lambda_1, \lambda_2, \ldots, \lambda_n)$ and $y = (\mu_1, \mu_2, \ldots, \mu_n)$ of $C^n$. The mapping $\phi: C^n \times C^n \to C$ fails to be a bilinear form since it is not linear in the second argument. However the equations

$$(\lambda_1 x_1 + \lambda_2 x_2 \,|\, y) = \lambda_1(x_1|y) + \lambda_2(x_2|y),$$
$$(x \,|\, \lambda_1 y_1 + \lambda_2 y_2) = \bar\lambda_1(x|y_1) + \bar\lambda_2(x|y_2)$$

hold. For this reason, we call $\phi$ a sesquilinear (i.e. 1½-linear) form.


On the non-empty set B(X, Z) of all bilinear mappings $\phi: X \times X \to Z$, addition and scalar multiplication are defined in the natural way as follows:

$$(\phi_1 + \phi_2)(x, y) = \phi_1(x, y) + \phi_2(x, y), \qquad (\lambda\phi)(x, y) = \lambda\phi(x, y).$$

With respect to these composition laws, B(X, Z) is a linear space over $\Lambda$.

For a finite dimensional linear space X there is a one-to-one correspondence between bilinear forms on X and matrices of a certain size. Let $(x_1, \ldots, x_n)$ be a base of X. Then every bilinear form $\phi$ on X determines a unique (n, n)-matrix $\alpha_{**}$ over $\Lambda$, whose terms are

$$\alpha_{ij} = \phi(x_i, x_j) \qquad (i, j = 1, \ldots, n)$$

such that if $x = \lambda_1 x_1 + \cdots + \lambda_n x_n$ and $y = \mu_1 x_1 + \cdots + \mu_n x_n$, then

$$\phi(x, y) = \sum_i \sum_j \lambda_i\mu_j\,\phi(x_i, x_j) = \sum_i \sum_j \lambda_i\mu_j\,\alpha_{ij}.$$

This equation can be written conveniently as a matrix equation:

$$\phi(x, y) = (\lambda_1, \ldots, \lambda_n)\begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{pmatrix}\begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix}$$

Conversely, every (n, n)-matrix $\alpha_{**}$ over $\Lambda$ defines uniquely a bilinear form $\phi$ that satisfies the equation above. We have therefore proved the following theorem.

THEOREM 16.6. If X is an n-dimensional linear space over $\Lambda$ with base $(x_1, \ldots, x_n)$, and $\alpha_{**}$ is any (n, n)-matrix over $\Lambda$, then there is one and only one bilinear form $\phi$ on X such that $\phi(x_i, x_j) = \alpha_{ij}$.

We can therefore call $\alpha_{**}$ the matrix of the bilinear form $\phi$ relative to the base $(x_1, \ldots, x_n)$ of X. It is clear that addition and scalar multiplication of bilinear forms correspond respectively to addition and scalar multiplication of matrices. From this it follows that $\dim B(X, \Lambda) = (\dim X)^2$.
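As a small numerical illustration of the matrix equation above, the sketch below (Python with numpy; the matrix $\alpha_{**}$ and the coordinate rows are arbitrary illustrative choices, not taken from the text) evaluates $\phi(x, y)$ directly from the matrix of the form:

```python
import numpy as np

# alpha[i, j] = phi(x_i, x_j): the matrix of the form relative to the chosen base.
# The values are an arbitrary illustrative choice.
alpha = np.array([[2.0, 1.0],
                  [1.0, 3.0]])

def phi(lam, mu):
    """phi(x, y) = (lambda_1, ..., lambda_n) alpha (mu_1, ..., mu_n)^t,
    where lam and mu are the coordinate rows of x and y."""
    return lam @ alpha @ mu

lam = np.array([1.0, -2.0])    # coordinates of x
mu  = np.array([0.5,  4.0])    # coordinates of y
print(phi(lam, mu))                    # value of the bilinear form
print(phi(lam, mu) == phi(mu, lam))    # True here, since alpha is symmetric
```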

It is also clear that if $\phi$ is a symmetric bilinear form on X, then the matrix of $\phi$ relative to any base B of X is a symmetric matrix.


Conversely if $A = \alpha_{**}$ is a symmetric matrix and $B = (x_1, \ldots, x_n)$ is a base of X, then the bilinear form defined by

$$\phi(x, y) = (\lambda_1, \ldots, \lambda_n)\begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{pmatrix}\begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix}$$

for $x = \lambda_1 x_1 + \cdots + \lambda_n x_n$ and $y = \mu_1 x_1 + \cdots + \mu_n x_n$ is a symmetric bilinear form. Indeed if we denote by L and M the (1, n)-matrices $(\lambda_1, \ldots, \lambda_n)$ and $(\mu_1, \ldots, \mu_n)$ respectively, then

$$\phi(x, y) = L A M^t.$$

Then it follows from the definition that

$$\phi(y, x) = M A L^t.$$

Since $L A M^t$ is a (1, 1)-matrix it equals its own transpose, and since $A^t = A$ we get $L A M^t = (L A M^t)^t = (M^t)^t A^t L^t = M A L^t$; hence $\phi(x, y) = \phi(y, x)$ and $\phi$ is symmetric.

Analogous to the situation in §14C, a change of base in the linear space X will give rise to a change of the matrix of a bilinear form. Let $B = (x_1, \ldots, x_n)$ and $B' = (x'_1, \ldots, x'_n)$ be bases of the linear space X and $G = \gamma_{**}$ be the matrix of the change from B to B', thus:

$$x'_i = \gamma_{i1} x_1 + \cdots + \gamma_{in} x_n, \qquad i = 1, \ldots, n.$$

If $\alpha_{ij} = \phi(x_i, x_j)$ and $\alpha'_{ij} = \phi(x'_i, x'_j)$ for $i = 1, \ldots, n$ and $j = 1, \ldots, n$, then it follows from bilinearity that

$$\alpha'_{ij} = \sum_{k,l} \gamma_{ik}\gamma_{jl}\,\alpha_{kl}.$$

Therefore, for the matrix $A = \alpha_{**}$ of the bilinear form $\phi$ relative to B and the matrix $A' = \alpha'_{**}$ of the same bilinear form $\phi$ relative to B', we have the following rule of transformation:

$$A' = G A G^t.$$

We note that the transformation of the matrix of a bilinear form arising from a change of bases is quite different from the way the matrix of a linear transformation changes in a corresponding situation. The interesting case where they do coincide is treated in §22E.
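The rule of transformation is easy to confirm numerically. A minimal sketch, assuming nothing beyond the two formulas above (the matrices are random stand-ins, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))    # matrix of phi relative to B
G = rng.standard_normal((n, n))    # matrix of the change from B to B'

A_prime = G @ A @ G.T              # the rule A' = G A G^t

# the componentwise formula alpha'_ij = sum_{k,l} gamma_ik gamma_jl alpha_kl
A_check = np.einsum('ik,jl,kl->ij', G, G, A)
assert np.allclose(A_prime, A_check)
```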


B. Quadratic forms

Let $\phi$ be a bilinear form on X. Then the restriction of $\phi$ to the diagonal $\{(x, x) : x \in X\}$ of the product $X \times X$ defines in a natural way a function F of X into $\Lambda$. If $\phi$ is further assumed to be symmetric, then the bilinear form $\phi$ itself can be recovered from the function F by the properties of F inherited from the bilinearity of $\phi$. Take, for example, $\phi$ to be the bilinear form defined by the inner product of vectors of $\Lambda^n$ as given in Example 16.3. In this case

$$\phi(x, y) = (x|y) = \lambda_1\mu_1 + \cdots + \lambda_n\mu_n$$

for the vectors $x = (\lambda_1, \ldots, \lambda_n)$ and $y = (\mu_1, \ldots, \mu_n)$ of $\Lambda^n$. Then $F: \Lambda^n \to \Lambda$ is given by

$$F(x) = (x|x) = \lambda_1^2 + \cdots + \lambda_n^2$$

and we easily verify that

$$\phi(x, y) = \tfrac{1}{2}\{F(x + y) - F(x) - F(y)\}.$$

DEFINITION 16.7. Let X be a finite-dimensional linear space over $\Lambda$ and let $\phi$ be a symmetric bilinear form on X. By the quadratic form determined by $\phi$, we mean the function $F: X \to \Lambda$ such that $F(x) = \phi(x, x)$.

It follows from the definition that for a quadratic form F on X

$$F(x) = F(-x) \quad \text{for every } x \in X.$$

A simple calculation using bilinearity shows that $\phi$ can be recovered from F through the following formula:

$$\phi(x, y) = \tfrac{1}{2}\{F(x + y) - F(x) - F(y)\}.$$

These two properties of quadratic forms turn out to be properties that can be used to define quadratic forms intrinsically. More precisely we have the following theorem.

THEOREM 16.8. Let X be a finite dimensional linear space over $\Lambda$. Let $G: X \to \Lambda$ be a function such that

(a) $G(x) = G(-x)$ for every $x \in X$, and
(b) the function $\psi: X \times X \to \Lambda$ defined by

$$\psi(x, y) = \tfrac{1}{2}\{G(x + y) - G(x) - G(y)\}$$

is a bilinear form on X.

Then $\psi$ is symmetric and G is the quadratic form determined by $\psi$.


PROOF. It is immediate from the definition that $\psi$ is symmetric. To show that $G(x) = \psi(x, x)$ it is enough to verify that $G(2x) = 4G(x)$, since $\psi(x, x) = \tfrac{1}{2}\{G(2x) - 2G(x)\}$. By substituting $x = y = 0$ in (b), we get $G(0) = 0$. For arbitrary t, u and v of X, it follows from

$$\psi(t, u + v) = \psi(t, u) + \psi(t, v)$$

and the property (b) of G that

$$G(t + u + v) - G(t) - G(u + v) = G(t + u) - G(t) - G(u) + G(t + v) - G(t) - G(v),$$

or

$$G(t + u + v) - G(t + u) - G(u + v) - G(t + v) + G(t) + G(u) + G(v) = 0.$$

Substituting $t = u = x$, $v = -x$ in the last equation and taking the condition (a) into consideration, we obtain $4G(x) - G(2x) = 0$. The proof is now complete.

If $A = \alpha_{**}$ is the (symmetric) matrix of the symmetric bilinear form $\phi$ relative to the base $B = (x_1, \ldots, x_n)$ of X and if F is the quadratic form determined by $\phi$, then by a straightforward calculation we obtain

$$F(x) = \sum_{i,j} \alpha_{ij}\lambda_i\lambda_j = (\lambda_1, \ldots, \lambda_n)\begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{pmatrix}\begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix}$$

for every vector $x = \lambda_1 x_1 + \cdots + \lambda_n x_n$. Therefore we also say: A is the matrix of the quadratic form F. Thus the value of a quadratic form F at a vector x of X is given by a homogeneous quadratic (i.e. of degree 2) expression in the coordinates $\lambda_1, \ldots, \lambda_n$ of x.

C. Multilinear forms

The definition 16.1 of bilinear form is easily generalized into the following definition.

DEFINITION 16.9. Let X be a linear space over $\Lambda$. A mapping $\phi$ of the n-fold cartesian product $X \times \cdots \times X$ (n times) into $\Lambda$ is called an n-linear (or simply multilinear) form on X if for all $i = 1, \ldots, n$ and all vectors $a_j \in X$ ($j \ne i$) the mapping of X into $\Lambda$ defined by $x \mapsto \phi(a_1, \ldots, a_{i-1}, x, a_{i+1}, \ldots, a_n)$ is a linear form on X.

In other words, $\phi$ is a multilinear form iff it satisfies the following conditions: for $i = 1, \ldots, n$

$$\phi(x_1, \ldots, x_{i-1}, x_i + x'_i, x_{i+1}, \ldots, x_n) = \phi(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) + \phi(x_1, \ldots, x_{i-1}, x'_i, x_{i+1}, \ldots, x_n)$$

and

$$\phi(x_1, \ldots, x_{i-1}, \lambda x_i, x_{i+1}, \ldots, x_n) = \lambda\phi(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n).$$

EXAMPLE 16.10. For any three vectors $x = (\lambda_1, \lambda_2, \lambda_3)$, $y = (\mu_1, \mu_2, \mu_3)$ and $z = (\nu_1, \nu_2, \nu_3)$ of the real arithmetical linear space $R^3$, we define their mixed product as the real number

$$(x|y|z) = (x \,|\, y \wedge z),$$

where $y \wedge z$ and $(x|t)$ are formed according to Examples 16.2 and 16.3. A trilinear (i.e. 3-linear) form is then defined on $R^3$ in terms of the mixed product. It follows from the symmetry of the inner product and the skew-symmetry of the exterior product that

$$(x|y|z) = (y|z|x) = (z|x|y) = -(x|z|y) = -(y|x|z) = -(z|y|x).$$

Explicit calculation shows that $(x|y|z)$ has the same value as the 3 × 3 determinant

$$\begin{vmatrix} \lambda_1 & \lambda_2 & \lambda_3 \\ \mu_1 & \mu_2 & \mu_3 \\ \nu_1 & \nu_2 & \nu_3 \end{vmatrix}$$

EXAMPLE 16.11. Let X be a linear space over $\Lambda$. If for each $i = 1, \ldots, n$, $u_i$ is a linear form on X, then we define their tensor product as the n-linear form $u_1 \otimes \cdots \otimes u_n : X \times \cdots \times X \to \Lambda$ such that

$$(u_1 \otimes \cdots \otimes u_n)(x_1, \ldots, x_n) = u_1(x_1) \cdots u_n(x_n)$$

for all $x_i \in X$ ($i = 1, \ldots, n$). More generally if $\phi$ is a p-linear form and $\psi$ is a q-linear form on X, then we define their tensor product $\phi \otimes \psi$ as the (p + q)-linear form such that

$$(\phi \otimes \psi)(x_1, \ldots, x_p, x_{p+1}, \ldots, x_{p+q}) = \phi(x_1, \ldots, x_p)\,\psi(x_{p+1}, \ldots, x_{p+q}).$$


The tensor product of multilinear forms is associative, thus:

$$(\phi \otimes \psi) \otimes \chi = \phi \otimes (\psi \otimes \chi),$$

but in general it is not commutative. Take for instance two linear forms f and g of a linear space X; then

$$(f \otimes g)(x, y) = f(x)g(y), \qquad (g \otimes f)(x, y) = g(x)f(y).$$

These two expressions need not be identical.
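A direct way to see the definitions at work is to code the tensor product of linear forms; in the sketch below the forms f and g are arbitrary illustrative choices on $R^2$, not examples from the text:

```python
def tensor(*forms):
    """Tensor product of linear forms:
    (u1 (x) ... (x) un)(x1, ..., xn) = u1(x1) * ... * un(xn)."""
    def product_form(*vectors):
        assert len(vectors) == len(forms)
        value = 1.0
        for u, x in zip(forms, vectors):
            value *= u(x)
        return value
    return product_form

# two linear forms on R^2 (arbitrary illustrative choices)
f = lambda v: v[0] + 2 * v[1]
g = lambda v: 3 * v[0] - v[1]

x, y = (1.0, 1.0), (2.0, -1.0)
print(tensor(f, g)(x, y))   # f(x) g(y) = 21.0
print(tensor(g, f)(x, y))   # g(x) f(y) = 0.0, generally a different value
```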

D. Exercises

1. Show that if B is a bilinear form and $\rho$ is an endomorphism of a linear space X, then C defined by

$$C(x, y) = B(\rho(x), \rho(y)) \quad \text{for all } x, y \in X$$

is a bilinear form of X.

2. Let X be a linear space and B a symmetric bilinear form of X. Denote by N the set of all $z \in X$ such that $B(z, y) = 0$ for every $y \in X$. N is called the radical of B.

(a) Show that N is a subspace of X.

(b) For every pair of elements [x], [y] of X/N define $C([x], [y]) = B(x, y)$. Show that C is a bilinear form of X/N.

(c) Show that the radical of C is 0.

3. Denote vectors of $R^2$ by (x, y). Show that the mapping F defined by

$$F(x, y) = 2x^2 + 3xy + y^2 \quad \text{for all } (x, y) \in R^2$$

is a quadratic form on $R^2$. Find the bilinear form $\varphi$ which determines F. Find also the matrix of $\varphi$ relative to the canonical base.

4. Let X be a finite-dimensional linear space over $\Lambda$ and let $\varphi$ be a bilinear form on X.

(a) If $x \in X$, denote by $\varphi_x : X \to \Lambda$ the mapping defined by

$$\varphi_x(y) = \varphi(x, y) \quad \text{for all } y \in X.$$

Show that $\varphi_x \in X^*$.

(b) Prove that the mapping $F: X \to X^*$ defined by $F(x) = \varphi_x$ for all $x \in X$ is a linear transformation.

(c) Show that F is an isomorphism if and only if the matrix of $\varphi$ relative to a base of X is invertible. In this case both $\varphi$ and F are said to be non-degenerate.

5. Let X be a three-dimensional linear space. Let $\varphi$ be a symmetric bilinear form on X which determines a non-degenerate quadratic form ($\varphi$ itself is said to be non-degenerate in this case). Show that if $\psi$ is an endomorphism of X such that

$$\varphi(\psi x, \psi y) = \varphi(x, y) \quad \text{for all } x, y \in X,$$

then the matrix A of $\psi$ relative to any base of X has the property that its determinant $\det A = \pm 1$.

6. Let X be a finite-dimensional linear space. Show that every bilinear form is the sum of a symmetric and a skew-symmetric bilinear form, and that these latter are uniquely determined.

7. Let X be a finite-dimensional linear space. Show that if the dimension of X is odd, then there does not exist a non-degenerate skew-symmetric bilinear form on X.

8. A quadratic form F on a real linear space X is said to be positive definite if $F(x) > 0$ for all $x \ne 0$ of X. If F and G are quadratic forms on X, then we define F < G to mean that G − F is positive definite. Show that if F < G and G < H, then F < H.

9. Let X be a finite-dimensional linear space and let $\varphi$ be a symmetric bilinear form on X. A base $B = (x_1, \ldots, x_n)$ of X is said to be orthonormal (relative to $\varphi$) if for each $i, j = 1, 2, \ldots, n$ we have

(i) $\varphi(x_i, x_j) = 0$ for $i \ne j$, and
(ii) $\varphi(x_i, x_i) = 1$ or $\varphi(x_i, x_i) = -1$ or $\varphi(x_i, x_i) = 0$.

(a) Show that X has an orthonormal base.

(b) If $X_0$ is the subspace of X consisting of all vectors $x \in X$ such that $\varphi(x, y) = 0$ for all $y \in X$, then $\dim X_0$ is precisely the number of vectors $x_i$ in any orthonormal base of X such that $\varphi(x_i, x_i) = 0$ ($\dim X_0$ is called the index of nullity of $\varphi$).

(c) There exists an integer $r \ge 0$ such that for every orthonormal base $(x_1, \ldots, x_n)$ of X, there are precisely r vectors among the $x_i$ such that $\varphi(x_i, x_i) = 1$ (r is called the index of positivity of $\varphi$).

10. Let X be a linear space and let f, g and h be linear forms on X.

(a) Define $f \otimes g \otimes h$ by

$$(f \otimes g \otimes h)(x, y, z) = f(x)g(y)h(z) \quad \text{for all } x, y, z \in X.$$

Show that $f \otimes g \otimes h$ is a trilinear form of X.

(b) Define $f \wedge g \wedge h$ by

$$f \wedge g \wedge h = f \otimes g \otimes h + g \otimes h \otimes f + h \otimes f \otimes g - f \otimes h \otimes g - g \otimes f \otimes h - h \otimes g \otimes f.$$

Show that $f \wedge g \wedge h$ is a trilinear form on X.

11. Let f, g, h be linear forms on a linear space X.

(a) Show that for $x, y, z \in X$,

$$(f \wedge g \wedge h)(x, y, z) = \begin{vmatrix} f(x) & f(y) & f(z) \\ g(x) & g(y) & g(z) \\ h(x) & h(y) & h(z) \end{vmatrix}$$

(b) Let $(x_1, \ldots, x_n)$ be a base of X and denote by $\varphi_{ijk}$ the value $(f \wedge g \wedge h)(x_i, x_j, x_k)$ for $i, j, k = 1, \ldots, n$. Show that if $x = \sum \alpha_i x_i$, $y = \sum \beta_i x_i$ and $z = \sum \gamma_i x_i$, then

$$(f \wedge g \wedge h)(x, y, z) = \sum_{i<j<k} \varphi_{ijk}\begin{vmatrix} \alpha_i & \alpha_j & \alpha_k \\ \beta_i & \beta_j & \beta_k \\ \gamma_i & \gamma_j & \gamma_k \end{vmatrix}$$

§ 17 Determinants

Determinants provide an effective (though not always very efficient) computational tool for various purposes; in particular they are useful for determining when vectors are linearly independent or when an endomorphism is an automorphism. They only play an auxiliary role in the subsequent chapters. Thus the reader who does not wish to see the theory, and knows how to compute determinants, can omit this section or read only the statements concerning properties of determinants.

A. Determinants of order 3

We have seen in Example 16.10 that the determinant

$$\begin{vmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} \\ \alpha_{31} & \alpha_{32} & \alpha_{33} \end{vmatrix} = \alpha_{11}\alpha_{22}\alpha_{33} + \alpha_{12}\alpha_{23}\alpha_{31} + \alpha_{13}\alpha_{21}\alpha_{32} - \alpha_{11}\alpha_{23}\alpha_{32} - \alpha_{12}\alpha_{21}\alpha_{33} - \alpha_{13}\alpha_{22}\alpha_{31}$$

of order 3 is the value $(a_1|a_2|a_3)$ of a trilinear form on $\Lambda^3$ at the vectors $a_1 = (\alpha_{11}, \alpha_{12}, \alpha_{13})$, $a_2 = (\alpha_{21}, \alpha_{22}, \alpha_{23})$ and $a_3 = (\alpha_{31}, \alpha_{32}, \alpha_{33})$.

If $A = \alpha_{**}$ is a square matrix of order n, it would be possible to define its determinant |A| or det A by a sum of products of the terms $\alpha_{ij}$. However, this is obviously a rather tedious task; moreover, starting from such a complicated definition it would be difficult to study properties of determinants. Instead we shall try to find out some characteristic properties of the trilinear form $(a_1|a_2|a_3)$ and define the determinant of a square matrix by corresponding properties. Besides being

(i) a non-zero 3-linear form on a 3-dimensional linear space,

the form $(a_1|a_2|a_3)$ further enjoys the following properties:

(ii) $(a_1|a_2|a_3) = 0$ if $a_i = a_j$ for $i \ne j$, and
(iii) $(e_1|e_2|e_3) = 1$ for the canonical base $(e_1, e_2, e_3)$ of $\Lambda^3$.

We now claim that the properties (i), (ii) and (iii) determine $(a_1|a_2|a_3)$ completely. To see this, we first show that from (i) and (ii) it follows that $(a_1|a_2|a_3) = -(a_2|a_1|a_3)$. Indeed, from (ii) we have $(a_1 + a_2\,|\,a_1 + a_2\,|\,a_3) = 0$ and from (i) we have

$$(a_1|a_1|a_3) + (a_1|a_2|a_3) + (a_2|a_1|a_3) + (a_2|a_2|a_3) = 0.$$

Since $(a_1|a_1|a_3) = (a_2|a_2|a_3) = 0$ we get $(a_1|a_2|a_3) = -(a_2|a_1|a_3)$. A similar argument shows that

(iv) $(a_1|a_2|a_3) = -(a_2|a_1|a_3)$; $(a_1|a_2|a_3) = -(a_1|a_3|a_2)$; $(a_1|a_2|a_3) = -(a_3|a_2|a_1)$; i.e. the value $(a_1|a_2|a_3)$ is multiplied by −1 if two arguments $a_i$ and $a_j$ interchange their positions.


We can now show that (i)–(iii) imply that

$$(a_1|a_2|a_3) = \begin{vmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} \\ \alpha_{31} & \alpha_{32} & \alpha_{33} \end{vmatrix}$$

for $a_1 = (\alpha_{11}, \alpha_{12}, \alpha_{13})$, $a_2 = (\alpha_{21}, \alpha_{22}, \alpha_{23})$ and $a_3 = (\alpha_{31}, \alpha_{32}, \alpha_{33})$. Making use of property (i) we get

$$(a_1|a_2|a_3) = \Big(\sum_i \alpha_{1i} e_i \,\Big|\, \sum_j \alpha_{2j} e_j \,\Big|\, \sum_k \alpha_{3k} e_k\Big) = \sum \alpha_{1i}\alpha_{2j}\alpha_{3k}\,(e_i|e_j|e_k)$$

where the last summation is taken over all $i, j, k = 1, 2, 3$. Using property (ii), we see that of the 27 (i.e. $3^3$) summands in the last sum only 6 (i.e. 3!) of them, in which the indices i, j, k are distinct, can be different from zero. Therefore

$$(a_1|a_2|a_3) = \alpha_{11}\alpha_{22}\alpha_{33}(e_1|e_2|e_3) + \alpha_{12}\alpha_{23}\alpha_{31}(e_2|e_3|e_1) + \alpha_{13}\alpha_{21}\alpha_{32}(e_3|e_1|e_2) + \alpha_{11}\alpha_{23}\alpha_{32}(e_1|e_3|e_2) + \alpha_{12}\alpha_{21}\alpha_{33}(e_2|e_1|e_3) + \alpha_{13}\alpha_{22}\alpha_{31}(e_3|e_2|e_1).$$

Rewriting the factors $(e_i|e_j|e_k)$ as $\pm(e_1|e_2|e_3)$ according to the rule given in (iv) and applying (iii), we finally get

$$(a_1|a_2|a_3) = \begin{vmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} \\ \alpha_{31} & \alpha_{32} & \alpha_{33} \end{vmatrix}$$

From the proof above we also see that the properties (i) and (ii) essentially determine the determinant while the property (iii) can be regarded as a "normalization" requirement. While (iii) can only be formulated through the arithmetical linear space, (i) and (ii) can be adapted to a more general setting. Following the principle of this course to keep the theory coordinate-free as long as possible, at this stage we shall only make use of (i) and (ii) and introduce the following definition.

DEFINITION 17.1. Let X be a linear space over $\Lambda$ of dimension n (n ≥ 1). A determinant function on X is a non-zero n-linear form $\Delta$ such that $\Delta(x_1, \ldots, x_n) = 0$ whenever $x_i = x_j$ for some $i \ne j$.


The method used in the proof above will help us prove the main result on determinant functions: determinant functions exist, and two determinant functions on one and the same linear space differ only by a non-zero scalar (see 17.5). By virtue of this result and with the aid of certain "normalization" requirements we arrive at a viable definition of the determinant of endomorphisms and of square matrices. As we have seen in the example of $(a_1|a_2|a_3)$, some working knowledge of permutations (see §17B) will be necessary for formulating properties similar to (iv) above. To conclude the section we prove a property of determinant functions which is independent of permutations.

THEOREM 17.2. If $\Delta$ is a determinant function on a linear space X, then $\Delta(x_1, \ldots, x_n) = 0$ for every linearly dependent family $(x_1, \ldots, x_n)$ of vectors of X.

PROOF. It follows immediately from the linear dependence of $(x_1, \ldots, x_n)$ that an index i exists such that

$$x_i = \lambda_{i+1} x_{i+1} + \cdots + \lambda_n x_n.$$

(Notice that if i = n, then the right hand side of the equation is the zero vector of X.) By n-linearity, we obtain

$$\Delta(x_1, \ldots, x_n) = \lambda_{i+1}\Delta(x_1, \ldots, x_{i-1}, x_{i+1}, x_{i+1}, \ldots, x_n) + \lambda_{i+2}\Delta(x_1, \ldots, x_{i-1}, x_{i+2}, x_{i+1}, \ldots, x_n) + \cdots + \lambda_n\Delta(x_1, \ldots, x_{i-1}, x_n, x_{i+1}, \ldots, x_n).$$

Therefore $\Delta(x_1, \ldots, x_n) = 0$, since each summand on the right hand side has a repeated argument and is therefore zero.

B. Permutations

Let $Z_n = \{1, 2, \ldots, n\}$. A permutation of the set $Z_n$ is a bijective mapping of $Z_n$ onto itself. The composite $\tau \circ \sigma$ of any two permutations $\sigma$ and $\tau$ of $Z_n$ under the usual composition law of mappings is again a permutation of $Z_n$. The algebraic system $(S_n, \circ)$, where $S_n$ denotes the set of all permutations of $Z_n$, satisfies the axioms of a group; this group is called the symmetric group of degree n and is denoted by $S_n$. It follows from the theory of elementary combinatorics that $S_n$ has exactly n! elements. We remark here that in the definition of the symmetric group $S_n$, only the number n of elements of $Z_n$ is essential, whereas the nature of the elements of $Z_n$ themselves is inessential.


Any permutation $\sigma$ of $Z_n$ can be represented conveniently by the following notation:

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & \cdots & n-1 & n \\ \sigma(1) & \sigma(2) & \sigma(3) & \cdots & \sigma(n-1) & \sigma(n) \end{pmatrix}$$

For example the cyclic permutation $\gamma$, defined by $\gamma(i) = i+1$ for $i = 1, \ldots, n-1$ and $\gamma(n) = 1$, is written as

$$\gamma = \begin{pmatrix} 1 & 2 & 3 & \cdots & n-1 & n \\ 2 & 3 & 4 & \cdots & n & 1 \end{pmatrix}$$

For the permutations $\sigma$ and $\tau$ of $Z_4$

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 1 & 4 \end{pmatrix} \quad\text{and}\quad \tau = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 2 & 1 \end{pmatrix},$$

for example, their two composites can be calculated by following each element first through one permutation and then through the other, and therefore

$$\tau \circ \sigma = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 2 & 3 & 1 \end{pmatrix} \quad\text{and}\quad \sigma \circ \tau = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 4 & 3 & 2 \end{pmatrix}$$

We are now going to classify the permutations of $Z_n$ into even and odd permutations. Consider the polynomial

$$P = \prod_{1 \le i < k \le n} (X_i - X_k)$$

in the indeterminates $X_1, \ldots, X_n$ with integral coefficients. For every $\sigma \in S_n$ we define the polynomial

$$\sigma P = \prod_{1 \le i < k \le n} (X_{\sigma(i)} - X_{\sigma(k)}).$$

Clearly each linear factor $(X_i - X_k)$ of the polynomial P appears exactly once with the same or the opposite sign as a linear factor of the polynomial $\sigma P$, and vice versa. Therefore for every $\sigma \in S_n$,

$$\sigma P = \pm P.$$


We now define the sign of a permutation $\sigma$ of $Z_n$, to be denoted by $\operatorname{sgn}(\sigma)$, by the equation

$$\sigma P = \operatorname{sgn}(\sigma)P;$$

and according as $\operatorname{sgn}(\sigma) = 1$ or $\operatorname{sgn}(\sigma) = -1$, we say that the permutation $\sigma$ is an even or an odd permutation.

For any pair of permutations $\sigma$ and $\tau$ of $Z_n$, we have

$$(\tau \circ \sigma)P = \tau(\sigma P)$$

and therefore

$$\operatorname{sgn}(\tau \circ \sigma) = \operatorname{sgn}(\tau)\operatorname{sgn}(\sigma).$$

In particular, $\tau \circ \sigma$ is an even permutation if and only if $\sigma$ and $\tau$ have the same sign, i.e., they are both even permutations or both odd permutations.
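For computation one usually represents a permutation by its bottom row. The sketch below computes the sign by counting inversions (which agrees with the sign defined through the polynomial P) and checks the multiplication rule on the two permutations of $Z_4$ above:

```python
def sgn(sigma):
    """Sign of a permutation given as the tuple (sigma(1), ..., sigma(n)):
    sgn = (-1)^(number of inversions)."""
    n = len(sigma)
    inversions = sum(1 for i in range(n) for k in range(i + 1, n)
                     if sigma[i] > sigma[k])
    return -1 if inversions % 2 else 1

def compose(tau, sigma):
    """(tau o sigma)(i) = tau(sigma(i)); entries are in 1..n."""
    return tuple(tau[sigma[i] - 1] for i in range(len(sigma)))

sigma, tau = (2, 3, 1, 4), (3, 4, 2, 1)      # the permutations of Z_4 above
assert compose(tau, sigma) == (4, 2, 3, 1)
assert compose(sigma, tau) == (1, 4, 3, 2)
assert sgn(compose(tau, sigma)) == sgn(tau) * sgn(sigma)
```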

Any permutation $\tau$ of $Z_n$ that leaves invariant exactly n−2 elements is called a transposition of $Z_n$. For a transposition $\tau$, there are exactly two elements r and s of $Z_n$ such that

(i) $\tau(i) = i$ for all $i \ne r$ and $i \ne s$;
(ii) $\tau(r) = s$ and $\tau(s) = r$.

Conversely given any pair of distinct elements r and s of $Z_n$, we have exactly one transposition $\tau$ satisfying (i) and (ii) above. Therefore we can write $\tau = (r{:}s)$. Clearly for a transposition $\tau$, $\tau = \tau^{-1}$. The main properties of transpositions which will be useful later are formulated in the following theorem.

THEOREM 17.3. (a) Every transposition is an odd permutation and (b) every even (odd) permutation is a composition of an even (odd) number of transpositions.

PROOF. (a) Let $\tau = (r{:}s)$. We may assume that $1 \le r < s \le n$ and partition the set $Z_n$ into mutually disjoint subsets:

$$Z_n = L \cup \{r\} \cup M \cup \{s\} \cup N$$

where $L = \{i : 1 \le i < r\}$, $M = \{i : r < i < s\}$ and $N = \{i : s < i \le n\}$. As a result of this partition, we can factorize the polynomial

$$P = \prod_{1 \le i < k \le n} (X_i - X_k)$$

as the product of all factors of the form

(i) $(X_i - X_k)$ where $i < k$ and $i, k \in L \cup M \cup N$;
(ii) $(X_i - X_r)(X_i - X_s)$ where $i \in L$;
(iii) $(X_r - X_k)(X_s - X_k)$ where $k \in N$;
(iv) $(X_r - X_j)(X_j - X_s)$ where $j \in M$;
(v) $(X_r - X_s)$.

Now factors of the first four types are left invariant by $\tau$, while $(X_{\tau(r)} - X_{\tau(s)}) = -(X_r - X_s)$. Therefore $\tau P = -P$ and $\tau$ is an odd permutation.

(b) Since $\operatorname{sgn}(\tau \circ \sigma) = \operatorname{sgn}(\tau)\operatorname{sgn}(\sigma)$, we need only prove that every permutation of $Z_n$ is a composite of transpositions of $Z_n$. If $\sigma$ is a permutation that leaves more than n−2 elements of $Z_n$ invariant, then $\sigma$ is the identity mapping, which is the composite of any two identical transpositions. If $\sigma$ is a permutation that leaves p elements of $Z_n$ invariant and displaces the element r, i.e., $\sigma(r) \ne r$, then $\sigma$ also displaces $\sigma(r)$, and $\sigma' = (\sigma(r){:}r) \circ \sigma$ is a permutation that leaves r invariant and does not displace any element of $Z_n$ that is left invariant by $\sigma$. Therefore $\sigma'$ leaves invariant at least p+1 elements and $\sigma = (\sigma(r){:}r) \circ \sigma'$. Further factorization of $\sigma'$ leads to the desired result.

C. Determinant functions

We are now in a position to study determinant functions in greater detail. The property of determinant functions similar to (iv) at the beginning of §17A can be formulated as follows.

THEOREM 17.4. Every determinant function $\Delta$ is antisymmetric, i.e.

$$\Delta(x_1, \ldots, x_n) = \operatorname{sgn}(\sigma)\,\Delta(x_{\sigma(1)}, \ldots, x_{\sigma(n)})$$

for every $\sigma \in S_n$ and every $x_i \in X$ ($i = 1, \ldots, n$).

PROOF. It follows from 17.3 that it is sufficient to show that

$$\Delta(x_1, \ldots, x_n) + \Delta(x_{\tau(1)}, \ldots, x_{\tau(n)}) = 0$$

for every transposition $\tau \in S_n$. Let $\tau = (i{:}j)$ where $1 \le i < j \le n$. Then, adding two terms that are zero because they have a repeated argument,

$$\Delta(x_1, \ldots, x_n) + \Delta(x_{\tau(1)}, \ldots, x_{\tau(n)}) = \Delta(x_1, \ldots, x_i, \ldots, x_j, \ldots, x_n) + \Delta(x_1, \ldots, x_j, \ldots, x_i, \ldots, x_n) + \Delta(x_1, \ldots, x_i, \ldots, x_i, \ldots, x_n) + \Delta(x_1, \ldots, x_j, \ldots, x_j, \ldots, x_n)$$
$$= \Delta(x_1, \ldots, x_i + x_j, \ldots, x_i + x_j, \ldots, x_n) = 0.$$


We have seen in 17.2 that $\Delta(x_1, \ldots, x_n) = 0$ if $x_1, \ldots, x_n$ are linearly dependent vectors of X. In the following we show that $\Delta$ is completely determined by its value at a base $(y_1, \ldots, y_n)$ of X. Indeed for any n vectors $x_j = \alpha_{j1}y_1 + \cdots + \alpha_{jn}y_n$ ($j = 1, \ldots, n$) of X, we get, by the n-linearity of $\Delta$,

$$\Delta(x_1, \ldots, x_n) = \sum\{\alpha_{1 i_1}\Delta(y_{i_1}, x_2, \ldots, x_n) : i_1 \in Z_n\}$$
$$= \sum\{\alpha_{1 i_1}\alpha_{2 i_2}\Delta(y_{i_1}, y_{i_2}, x_3, \ldots, x_n) : i_1, i_2 \in Z_n\}$$
$$= \cdots = \sum\{\alpha_{1 i_1}\cdots\alpha_{n i_n}\Delta(y_{i_1}, \ldots, y_{i_n}) : i_1, i_2, \ldots, i_n \in Z_n\}.$$

Since $\Delta$ is a determinant function, in the last multiple summation above we need only consider those summands $\alpha_{1 i_1}\cdots\alpha_{n i_n}\Delta(y_{i_1}, \ldots, y_{i_n})$ where $(i_1, \ldots, i_n)$ is a permutation of $(1, 2, \ldots, n)$. Therefore we obtain

$$\Delta(x_1, \ldots, x_n) = \sum_{\sigma \in S_n} \alpha_{1\sigma(1)}\cdots\alpha_{n\sigma(n)}\,\Delta(y_{\sigma(1)}, \ldots, y_{\sigma(n)}).$$

Now $\Delta$ is antisymmetric; therefore we can further simplify the last equation and get

$$\Delta(x_1, \ldots, x_n) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\,\alpha_{1\sigma(1)}\cdots\alpha_{n\sigma(n)}\,\Delta(y_1, \ldots, y_n),$$

proving our contention.

Since a determinant function is, by definition, a non-zero function, it follows from the last result that if $\Delta$ is a determinant function on X and if $(y_1, \ldots, y_n)$ is a base, then $\Delta(y_1, \ldots, y_n) \ne 0$. Consequently if $\Delta_1$ and $\Delta_2$ are two determinant functions on X, then $\Delta_1 = \lambda\Delta_2$ where $\lambda = \Delta_1(y_1, \ldots, y_n)/\Delta_2(y_1, \ldots, y_n)$ for any base $(y_1, \ldots, y_n)$ of X.

It now remains to show that for every n-dimensional linear space X determinant functions exist.

Let $\phi$ be an n-linear form on X and let us denote by $\sigma(x)$ the family $(x_{\sigma(1)}, \ldots, x_{\sigma(n)})$ for any family $x = (x_1, \ldots, x_n)$ of vectors of X and any $\sigma \in S_n$. The mapping $\hat\phi: X^n \to \Lambda$, called the antisymmetrization of $\phi$ and defined by

$$\hat\phi(x) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\,\phi(\sigma(x)),$$

is clearly an n-linear form of X since each summand on the right hand side is such. If $x_i = x_j$ for $1 \le i < j \le n$, then the transposition $\tau = (i{:}j)$ has the property that

$$\tau(x) = \tau^{-1}(x) = x$$


and $(\sigma \circ \tau)(x) = \sigma(x)$ for all $\sigma \in S_n$. Pair off each $\sigma \in S_n$ with $\sigma \circ \tau \in S_n$ and write consequently $\hat\phi(x)$ as a summation of matching summands of the form

$$\operatorname{sgn}(\sigma)\phi(\sigma(x)) + \operatorname{sgn}(\sigma \circ \tau)\phi((\sigma \circ \tau)(x)) = (\operatorname{sgn}(\sigma) + \operatorname{sgn}(\sigma \circ \tau))\,\phi(\sigma(x)).$$

Therefore $\hat\phi(x) = 0$ since $\operatorname{sgn}(\sigma) = -\operatorname{sgn}(\sigma \circ \tau)$. The existence of determinant functions on X will follow if we can exhibit an n-linear form such that its antisymmetrization is non-zero. Let $(y_1, \ldots, y_n)$ be a base of X and $(g_1, \ldots, g_n)$ the base of $X^*$ dual to $(y_1, \ldots, y_n)$, i.e.

$$g_i(y_j) = \delta_{ij} \qquad i, j = 1, \ldots, n.$$

Define $\phi$ to be the tensor product $g_1 \otimes \cdots \otimes g_n$, i.e.

$$\phi(x_1, \ldots, x_n) = g_1(x_1)\cdots g_n(x_n) \quad \text{for all } x_i \in X.$$

Then the antisymmetrization $\hat\phi$ of $\phi$ is a determinant function on X since

$$\hat\phi(y_1, \ldots, y_n) = \sum_\sigma \operatorname{sgn}(\sigma)\,g_1(y_{\sigma(1)})\cdots g_n(y_{\sigma(n)}) = \sum_\sigma \operatorname{sgn}(\sigma)\,\delta_{1\sigma(1)}\cdots\delta_{n\sigma(n)} = 1.$$

We summarize our results in the following theorem.

THEOREM 17.5. For any n-dimensional linear space X over $\Lambda$ (n ≥ 1) there exist non-zero determinant functions. If $\Delta_0$ is a non-zero determinant function of X then every determinant function $\Delta$ of X is of the form $\Delta = \lambda\Delta_0$ ($\lambda \ne 0$). Furthermore $\Delta_0(x_1, \ldots, x_n) \ne 0$ if and only if $x_1, \ldots, x_n$ are linearly independent.

D. Determinants

Let X be an n-dimensional linear space and $\phi: X \to X$ an endomorphism. If $\Delta$ is a determinant function of X, then $\Delta': X^n \to \Lambda$, defined by

$$\Delta'(x_1, \ldots, x_n) = \Delta(\phi(x_1), \ldots, \phi(x_n))$$

for all $x_i \in X$, is clearly again an n-linear form of X that vanishes whenever two of its arguments are equal. In view of 17.5 and its proof, there exists a unique scalar $\lambda$ that depends on $\phi$ and $\Delta$ such that $\Delta' = \lambda\Delta$, i.e.,

$$\Delta(\phi(x_1), \ldots, \phi(x_n)) = \lambda\Delta(x_1, \ldots, x_n)$$


for all $x_i \in X$. We claim that the dependence of $\lambda$ on $\Delta$ is only apparent. Let H be any determinant function of X. Then we get a non-zero scalar $\rho$ such that $H = \rho\Delta$ and therefore

$$H(\phi(x_1), \ldots, \phi(x_n)) = \rho\Delta(\phi(x_1), \ldots, \phi(x_n)) = \rho\lambda\Delta(x_1, \ldots, x_n) = \lambda H(x_1, \ldots, x_n).$$

To summarize, we have

THEOREM and DEFINITION 17.6. Let X be an n-dimensional linear space over $\Lambda$ (n ≥ 1). Then for any endomorphism $\phi$ of X, there exists a unique scalar $\det(\phi)$ of $\Lambda$ such that

$$\Delta(\phi(x_1), \ldots, \phi(x_n)) = \det(\phi)\,\Delta(x_1, \ldots, x_n)$$

for any $x_i \in X$ and for any determinant function $\Delta$ of X. This scalar $\det(\phi)$ is called the determinant of $\phi$.

Thus we have defined the determinant as a scalar-valued function of endomorphisms; classically it is defined as a scalar-valued function of matrices. Let $(x_1, \ldots, x_n)$ be a base of X, $A = \alpha_{**}$ a square matrix of order n. If $\phi$ is the endomorphism of X such that

$$\phi(x_i) = \alpha_{i1}x_1 + \cdots + \alpha_{in}x_n \qquad i = 1, 2, \ldots, n,$$

then for any non-zero determinant function $\Delta$ of X

$$\Delta(\phi(x_1), \ldots, \phi(x_n)) = \Delta\Big(\sum_j \alpha_{1j}x_j, \ldots, \sum_j \alpha_{nj}x_j\Big) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\,\alpha_{1\sigma(1)}\cdots\alpha_{n\sigma(n)}\,\Delta(x_1, \ldots, x_n).$$

Comparing this equation with 17.6, we obtain

$$\det(\phi) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\,\alpha_{1\sigma(1)}\cdots\alpha_{n\sigma(n)}.$$

We now define the determinant of a square matrix $A = \alpha_{**}$ of order n by

$$\det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\,\alpha_{1\sigma(1)}\cdots\alpha_{n\sigma(n)}.$$

Therefore the connection between the two approaches to the concept of determinant is given by the following equation:

$$\det(\phi) = \det(M(\phi)) \quad \text{for any } \phi \in \operatorname{End}(X)$$

where $M(\phi)$ is the matrix of the endomorphism $\phi$ relative to any base of X (see §14A).
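The classical formula can be transcribed directly into code. The following sketch sums over all n! permutations, so it is faithful to the definition but practical only for small n; the sample matrix anticipates the numerical example of §17E below:

```python
from itertools import permutations
from math import prod

def sgn(sigma):
    """Sign of a permutation tuple of 0..n-1, by counting inversions."""
    n = len(sigma)
    inv = sum(1 for i in range(n) for k in range(i + 1, n) if sigma[i] > sigma[k])
    return -1 if inv % 2 else 1

def det(A):
    """det(A) = sum over sigma of sgn(sigma) * a[0][sigma(0)] ... a[n-1][sigma(n-1)].
    Faithful to the definition, but it sums n! terms."""
    n = len(A)
    return sum(sgn(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

print(det([[1, -2, 4],
           [2,  1, 1],
           [0,  5, 2]]))    # 45, the value computed by triangularization in §17E
```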


Using the customary rectangular arrangement of matrices, we also write

$$\det(A) = \begin{vmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\ \vdots & & & \vdots \\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn} \end{vmatrix}$$

For matrices of small sizes, we have explicitly

$$|\alpha_{11}| = \alpha_{11};$$

$$\begin{vmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{vmatrix} = \alpha_{11}\alpha_{22} - \alpha_{12}\alpha_{21};$$

$$\begin{vmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} \\ \alpha_{31} & \alpha_{32} & \alpha_{33} \end{vmatrix} = \alpha_{11}\alpha_{22}\alpha_{33} + \alpha_{12}\alpha_{23}\alpha_{31} + \alpha_{13}\alpha_{21}\alpha_{32} - \alpha_{12}\alpha_{21}\alpha_{33} - \alpha_{11}\alpha_{23}\alpha_{32} - \alpha_{13}\alpha_{22}\alpha_{31}.$$

For the last equation, there is a convenient way of determining the signs of the six summands: the three products formed along the diagonals running parallel to the principal diagonal receive the sign +, and the three products formed along the diagonals running parallel to the secondary diagonal receive the sign −.


The rule of signs above applies only to determinants of order 3; for determinants of higher order we do not have such simple rules.

Finally let us derive some special properties of determinants (of endomorphisms and of matrices).

THEOREM 17.7. Let X be an n-dimensional linear space over $\Lambda$ and $\Lambda^{(n,n)}$ the matrix algebra of all square matrices of order n over $\Lambda$. Then for any endomorphisms $\phi$ and $\psi$ of X and any matrices M and N of $\Lambda^{(n,n)}$ the following statements hold.

(a) $\det(\phi) = 1$ if $\phi = i_X$, the identity endomorphism of X.
(a') $\det(M) = 1$ if $M = I_n$, the identity matrix of order n.
(b) $\det(\lambda\phi) = \lambda^n\det(\phi)$ for any $\lambda \in \Lambda$.
(b') $\det(\lambda M) = \lambda^n\det(M)$ for any $\lambda \in \Lambda$.
(c) $\det(\psi \circ \phi) = \det(\psi)\det(\phi)$.
(c') $\det(MN) = \det(M)\det(N)$.
(d) $\det(\phi) \ne 0$ if and only if $\phi$ is an automorphism, and in this case $\det(\phi^{-1}) = 1/\det(\phi)$.
(d') $\det(M) \ne 0$ if and only if M is an invertible matrix, and in this case $\det(M^{-1}) = 1/\det(M)$.
(e) $\det(\phi) = \det(\phi^*)$ where $\phi^*$ is the dual endomorphism of $\phi$.
(e') $\det(M) = \det(M^t)$ where $M^t$ is the transpose of M.

PROOF. (a), (b) and (c) are direct consequences of the definition. The corresponding primed statements follow from the relation $\det(\phi) = \det(M(\phi))$.

(d) If $\phi$ is an automorphism, then it follows from (a) and (c) that $1 = \det(\phi)\det(\phi^{-1})$. Therefore $\det(\phi) \ne 0$ and $\det(\phi^{-1}) = 1/\det(\phi)$. Conversely if $\det(\phi) \ne 0$, then for a determinant function $\Delta$ and a base $(x_1, \ldots, x_n)$ of X, the equation

$$\Delta(\phi(x_1), \ldots, \phi(x_n)) = \det(\phi)\,\Delta(x_1, \ldots, x_n)$$

holds. Since both factors on the right-hand side are non-zero, we get $\Delta(\phi(x_1), \ldots, \phi(x_n)) \ne 0$, which means that the vectors $\phi(x_1), \ldots, \phi(x_n)$ are linearly independent. Hence $\phi$ is an automorphism. Clearly (d') follows from (d).

(e') Let $M = (\mu_{ij})_{i,j=1,\ldots,n}$ and $M^t = (\mu'_{ij})_{i,j=1,\ldots,n}$. Then $\mu'_{ij} = \mu_{ji}$ for all indices i and j. Now

$$\det(M) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\,\mu_{1\sigma(1)}\cdots\mu_{n\sigma(n)}$$


and

$$\det(M^t) = \sum_{\tau \in S_n} \operatorname{sgn}(\tau)\,\mu'_{1\tau(1)}\cdots\mu'_{n\tau(n)} = \sum_{\tau \in S_n} \operatorname{sgn}(\tau)\,\mu_{\tau(1)1}\cdots\mu_{\tau(n)n}.$$

Now for each summand $\operatorname{sgn}(\sigma)\mu_{1\sigma(1)}\cdots\mu_{n\sigma(n)}$ of $\det(M)$ there corresponds a unique summand $\operatorname{sgn}(\tau)\mu_{\tau(1)1}\cdots\mu_{\tau(n)n}$ of $\det(M^t)$, where $\tau = \sigma^{-1}$; and vice versa. For these corresponding summands, we have $\operatorname{sgn}(\sigma) = \operatorname{sgn}(\tau)$ and $\mu_{\tau(1)1}\cdots\mu_{\tau(n)n} = \mu_{1\sigma(1)}\cdots\mu_{n\sigma(n)}$; therefore $\det(M) = \det(M^t)$.

(e) Since $M(\phi^*) = (M(\phi))^t$, we get

$$\det(\phi) = \det(M(\phi)) = \det(M(\phi)^t) = \det(M(\phi^*)) = \det(\phi^*).$$

E. Some useful rules

The determinant of a square matrix

$$A = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\ \vdots & & & \vdots \\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn} \end{pmatrix}$$

was defined in the last subsection as

$$\det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\,\alpha_{1\sigma(1)}\cdots\alpha_{n\sigma(n)}.$$

This is however capable of another interpretation. Consider the n-dimensional arithmetical linear space $\Lambda^n$ together with its canonical base $(e_1, \ldots, e_n)$. By 17.5, there is a unique determinant function $\Delta$ of $\Lambda^n$ such that

$$\Delta(e_1, \ldots, e_n) = 1;$$

therefore

$$\det(A) = \Delta(a_{1*}, \ldots, a_{n*})$$

where $a_{i*} = (\alpha_{i1}, \alpha_{i2}, \ldots, \alpha_{in})$ is the i-th row vector of A for each $i = 1, 2, \ldots, n$. Hence the determinant $\det(A)$ of a matrix A is a determinant function of the row vectors of A. From this certain rules of practical computation of determinants follow:


(a) $\det(A) = 0$ if A has two equal rows.
(b) $\det(A) = -\det(A')$ if $A'$ is the matrix obtained from A by a transposition of two rows.
(c) $\det(A) = \det(A'')$ if $A''$ is the matrix obtained from A by adding to one of its rows a linear combination of the other rows.
(d) $\lambda\det(A) = \det(A''')$ if $A'''$ is the matrix obtained from A by the multiplication of one of its rows by a scalar $\lambda$.

Similarly, det(A) is a determinant function of the column vectors of A, and from this similar rules are derived.

Finally we see that by employing appropriate elementary transformations (see §15C) we can bring the matrix A into a triangular form

$$E = \begin{pmatrix} \epsilon_{11} & \epsilon_{12} & \cdots & \epsilon_{1n} \\ 0 & \epsilon_{22} & \cdots & \epsilon_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \epsilon_{nn} \end{pmatrix}$$

where all terms below the diagonal are zero, such that

$$\det(A) = \det(E) = \epsilon_{11}\epsilon_{22}\cdots\epsilon_{nn}.$$

For example, we have

$$\begin{vmatrix} 1 & -2 & 4 \\ 2 & 1 & 1 \\ 0 & 5 & 2 \end{vmatrix} = \begin{vmatrix} 1 & -2 & 4 \\ 0 & 5 & -7 \\ 0 & 5 & 2 \end{vmatrix} = \begin{vmatrix} 1 & -2 & 4 \\ 0 & 5 & -7 \\ 0 & 0 & 9 \end{vmatrix} = 45.$$

Using a similar argument, we see that if a square matrix A of order n is in the form

$$A = \begin{pmatrix} B & C \\ O & D \end{pmatrix}$$

where B is a square matrix of order p, D is a square matrix of order q, C is a (p, q)-matrix and O is the zero (q, p)-matrix, then

$$\det(A) = \det(B)\det(D).$$
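These rules give the standard practical algorithm: reduce to triangular form while tracking row interchanges, then multiply the diagonal terms. A minimal sketch (the pivoting strategy is the simplest possible, not tuned for numerical stability):

```python
def det_by_elimination(A):
    """Determinant by reduction to triangular form: a row interchange flips
    the sign (rule (b)), adding a multiple of another row changes nothing
    (rule (c)), and det of a triangular matrix is the product of its diagonal."""
    A = [row[:] for row in A]              # work on a copy
    n, sign, value = len(A), 1, 1.0
    for j in range(n):
        pivot = next((i for i in range(j, n) if A[i][j] != 0), None)
        if pivot is None:
            return 0.0                     # dependent rows: determinant is zero
        if pivot != j:
            A[j], A[pivot] = A[pivot], A[j]
            sign = -sign
        for i in range(j + 1, n):
            factor = A[i][j] / A[j][j]
            for k in range(j, n):
                A[i][k] -= factor * A[j][k]
        value *= A[j][j]
    return sign * value

print(det_by_elimination([[1, -2, 4], [2, 1, 1], [0, 5, 2]]))   # 45.0
```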


F. Cofactors and minors

Another way of evaluating the determinant of a matrix $A = \alpha_{**}$ is provided by expansion by a row. This method is effective in the sense that it works; it is not efficient for determinants of order 5 or higher, because of the amount of arithmetic involved. The same is true for CRAMER's rule described below. Let i be a fixed row index and $\Delta$ the determinant function of $\Lambda^n$ such that $\Delta(e_1, \ldots, e_n) = 1$ at the canonical base $(e_1, \ldots, e_n)$ of $\Lambda^n$. Then for each $j = 1, \ldots, n$, we have

$$A_{ij} = \Delta(a_{1*}, \ldots, a_{i-1\,*}, e_j, a_{i+1\,*}, \ldots, a_{n*}) = \begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1\,j-1} & \alpha_{1j} & \alpha_{1\,j+1} & \cdots & \alpha_{1n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ \alpha_{i-1\,1} & \cdots & \alpha_{i-1\,j-1} & \alpha_{i-1\,j} & \alpha_{i-1\,j+1} & \cdots & \alpha_{i-1\,n} \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ \alpha_{i+1\,1} & \cdots & \alpha_{i+1\,j-1} & \alpha_{i+1\,j} & \alpha_{i+1\,j+1} & \cdots & \alpha_{i+1\,n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{n\,j-1} & \alpha_{nj} & \alpha_{n\,j+1} & \cdots & \alpha_{nn} \end{vmatrix}$$

Clearing the j-th column by rule (c), and then restoring the entries of the i-th row by the corresponding column operations, we obtain

$$A_{ij} = \begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1\,j-1} & 0 & \alpha_{1\,j+1} & \cdots & \alpha_{1n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ \alpha_{i1} & \cdots & \alpha_{i\,j-1} & 1 & \alpha_{i\,j+1} & \cdots & \alpha_{in} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{n\,j-1} & 0 & \alpha_{n\,j+1} & \cdots & \alpha_{nn} \end{vmatrix} = \Delta(a_{*1}, \ldots, a_{*\,j-1}, e_i, a_{*\,j+1}, \ldots, a_{*n}).$$

We call $A_{ij}$ the cofactor of the term $\alpha_{ij}$. $A_{ij}$ is the determinant of the matrix obtained from A when we replace the i-th row $a_{i*}$ of A by $e_j$ and the j-th column $a_{*j}$ of A by $e_i$, i.e., all terms of the i-th row and of the j-th column are changed to 0 except the term $\alpha_{ij}$, which is changed to 1.

In terms of cofactors, we get for each row index i

$$\det(A) = \sum_{j=1}^n \alpha_{ij} A_{ij},$$

for

$$\det(A) = \Delta(a_{1*}, \ldots, a_{n*}) = \Delta\Big(a_{1*}, \ldots, a_{i-1\,*}, \sum_{j=1}^n \alpha_{ij} e_j, a_{i+1\,*}, \ldots, a_{n*}\Big) = \sum_{j=1}^n \alpha_{ij}\,\Delta(a_{1*}, \ldots, a_{i-1\,*}, e_j, a_{i+1\,*}, \ldots, a_{n*}) = \sum_{j=1}^n \alpha_{ij} A_{ij}.$$

This is called the expansion of det(A) by the i-th row.

The determinant $A_{ij}$ of order n is essentially a determinant of order n−1. By shifting the rows and columns in $A_{ij}$ (i−1 interchanges of adjacent rows and j−1 interchanges of adjacent columns bring the entry 1 to the upper left corner, each interchange contributing a factor −1) we get

$$A_{ij} = (-1)^{i+j}\begin{vmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & M_{ij} & \\ 0 & & & \end{vmatrix} = (-1)^{i+j} M_{ij}$$

where $M_{ij}$ is called the minor of the term $\alpha_{ij}$, which is the determinant of the square matrix of order n−1 obtained from A when we delete the i-th row and the j-th column of A. Consequently we have an expansion of det(A) by its i-th row:

$$\det(A) = \sum_{j=1}^n (-1)^{i+j}\alpha_{ij} M_{ij}.$$

For example

$$\begin{vmatrix} \alpha & \beta & \gamma & \delta \\ \alpha' & \beta' & \gamma' & \delta' \\ \alpha'' & \beta'' & \gamma'' & \delta'' \\ \alpha''' & \beta''' & \gamma''' & \delta''' \end{vmatrix} = \alpha\begin{vmatrix} \beta' & \gamma' & \delta' \\ \beta'' & \gamma'' & \delta'' \\ \beta''' & \gamma''' & \delta''' \end{vmatrix} - \beta\begin{vmatrix} \alpha' & \gamma' & \delta' \\ \alpha'' & \gamma'' & \delta'' \\ \alpha''' & \gamma''' & \delta''' \end{vmatrix} + \gamma\begin{vmatrix} \alpha' & \beta' & \delta' \\ \alpha'' & \beta'' & \delta'' \\ \alpha''' & \beta''' & \delta''' \end{vmatrix} - \delta\begin{vmatrix} \alpha' & \beta' & \gamma' \\ \alpha'' & \beta'' & \gamma'' \\ \alpha''' & \beta''' & \gamma''' \end{vmatrix}$$


Similarly the expansion of det(A) by the i-th column is:

$$\det(A) = \sum_{j=1}^n (-1)^{i+j}\alpha_{ji} M_{ji}.$$

The minors and cofactors are also useful in the evaluation of an inverse matrix. For each square matrix of order n, $A = \alpha_{**}$, we define the adjoint matrix $\operatorname{ad}(A) = \beta_{**}$ as the square matrix of order n whose terms are:

$$\beta_{ij} = (-1)^{i+j} M_{ji} = A_{ji} \qquad (i, j = 1, \ldots, n).$$

In other words, ad(A) is the transpose of the matrix $(A_{ij})_{i,j=1,\ldots,n}$ of the cofactors of A. It follows from the expansion of det(A) that for each pair of indices i and k, we have

$$\sum_{j=1}^n \alpha_{ij}\beta_{jk} = \sum_{j=1}^n \alpha_{ij} A_{kj} = \sum_{j=1}^n \alpha_{ij}\,\Delta(a_{1*}, \ldots, a_{k-1\,*}, e_j, a_{k+1\,*}, \ldots, a_{n*})$$
$$= \Delta\Big(a_{1*}, \ldots, a_{k-1\,*}, \sum_{j=1}^n \alpha_{ij} e_j, a_{k+1\,*}, \ldots, a_{n*}\Big) = \Delta(a_{1*}, \ldots, a_{k-1\,*}, a_{i*}, a_{k+1\,*}, \ldots, a_{n*}) = \det(A)\,\delta_{ik},$$

and hence

$$A \cdot \operatorname{ad}(A) = \det(A)\,I_n.$$

Similarly, using the expansion of det(A) by its columns, we get

$$\operatorname{ad}(A) \cdot A = \det(A)\,I_n.$$

From this we derive that if the matrix A is invertible, then $\det(A) \ne 0$ and

$$A^{-1} = \det(A)^{-1}\operatorname{ad}(A).$$
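Minors, cofactors and the adjoint matrix translate directly into code; the sketch below checks the identity $A \cdot \operatorname{ad}(A) = \det(A)I_n$ on the matrix from §17E:

```python
def minor_matrix(A, i, j):
    """A with row i and column j deleted (0-based indices)."""
    return [[A[r][c] for c in range(len(A)) if c != j]
            for r in range(len(A)) if r != i]

def det(A):
    """Recursive expansion by the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor_matrix(A, 0, j))
               for j in range(len(A)))

def adjoint(A):
    """ad(A): the transpose of the matrix of cofactors, ad(A)[i][j] = A_ji."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor_matrix(A, j, i)) for j in range(n)]
            for i in range(n)]

A = [[1, -2, 4], [2, 1, 1], [0, 5, 2]]
d, ad = det(A), adjoint(A)
product = [[sum(A[i][k] * ad[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
assert product == [[d if i == j else 0 for j in range(3)] for i in range(3)]
```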

Finally we shall derive the well-known CRAMER's rule for the solution of systems of linear equations. Any system of linear equations

$$\begin{aligned} \alpha_{11}X_1 + \cdots + \alpha_{1n}X_n &= \beta_1 \\ \alpha_{21}X_1 + \cdots + \alpha_{2n}X_n &= \beta_2 \\ &\ \,\vdots \\ \alpha_{n1}X_1 + \cdots + \alpha_{nn}X_n &= \beta_n \end{aligned} \tag{E}$$


can be written as a matrix equation

$$A X_* = \beta_*$$

where

$$A = \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \alpha_{21} & \cdots & \alpha_{2n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{pmatrix}, \qquad X_* = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}, \qquad \beta_* = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{pmatrix}.$$

If (E) has a unique solution, then A has rank n and hence it is invertible. Therefore

$$X_* = \det(A)^{-1}\operatorname{ad}(A)\,\beta_*, \quad\text{or}\quad \det(A)\,X_i = \sum_{j=1}^n A_{ji}\beta_j.$$

On the other hand $A_{ji} = \Delta(a_{*1}, \ldots, a_{*\,i-1}, e_j, a_{*\,i+1}, \ldots, a_{*n})$ where $(e_1, \ldots, e_n)$ is the canonical base of $\Lambda^n$ and $\Delta$ is the determinant function of $\Lambda^n$ such that $\Delta(e_1, \ldots, e_n) = 1$. By the n-linearity of $\Delta$, we get

$$\sum_{j=1}^n \beta_j A_{ji} = \Delta(a_{*1}, \ldots, a_{*\,i-1}, \beta_*, a_{*\,i+1}, \ldots, a_{*n}),$$

and hence the solution of (E) is given by

$$X_i = \frac{\begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1\,i-1} & \beta_1 & \alpha_{1\,i+1} & \cdots & \alpha_{1n} \\ \alpha_{21} & \cdots & \alpha_{2\,i-1} & \beta_2 & \alpha_{2\,i+1} & \cdots & \alpha_{2n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{n\,i-1} & \beta_n & \alpha_{n\,i+1} & \cdots & \alpha_{nn} \end{vmatrix}}{\begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{vmatrix}}, \qquad i = 1, \ldots, n,$$

where the denominator is the determinant of the coefficient matrix A while the numerator is the determinant of the matrix obtained from A on replacing the i-th column $a_{*i}$ by the column matrix $\beta_*$. These equations are known as CRAMER's rule.
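CRAMER's rule is equally direct to implement; the recursive determinant below is included only to keep the sketch self-contained, and the 2 × 2 system is an arbitrary illustration:

```python
def det(A):
    """Recursive cofactor expansion by the first row (fine for small n)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] *
               det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def cramer_solve(A, b):
    """CRAMER's rule: X_i = det(A_i) / det(A), where A_i is A with its
    i-th column replaced by the column of constants b."""
    d = det(A)
    if d == 0:
        raise ValueError("coefficient matrix is singular")
    n = len(A)
    return [det([[b[r] if c == i else A[r][c] for c in range(n)]
                 for r in range(n)]) / d
            for i in range(n)]

# illustration: X1 + 2 X2 = 5, 3 X1 - X2 = 1, with solution X1 = 1, X2 = 2
print(cramer_solve([[1, 2], [3, -1]], [5, 1]))   # [1.0, 2.0]
```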

G. Exercises

1. Determine the parity of the following permutations:

(a) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 1 & 3 & 4 & 7 & 8 & 2 & 6 & 9 & 5 \end{pmatrix}$  (b) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 2 & 1 & 7 & 9 & 8 & 6 & 3 & 5 & 4 \end{pmatrix}$

(c) $\begin{pmatrix} 1 & 2 & \cdots & n-1 & n \\ n & n-1 & \cdots & 2 & 1 \end{pmatrix}$  (d) $\begin{pmatrix} 1 & 2 & \cdots & n-1 & n \\ 2 & 3 & \cdots & n & 1 \end{pmatrix}$

2. Find i, j, k, m such that

(a) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 1 & 2 & 7 & 4 & i & 5 & 6 & j & 9 \end{pmatrix}$ is even,

(b) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 1 & k & 2 & 5 & m & 4 & 8 & 9 & 7 \end{pmatrix}$ is odd.

3. Let $\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 2 & 3 & 1 & 4 & 6 & 5 \end{pmatrix}$ be an element of $S_6$.

(a) Find the least positive integer n such that $\sigma^n$ is the identity (n is called the order of $\sigma$).

(b) Let $G = \{\sigma, \sigma^2, \sigma^3, \ldots, \sigma^n\}$. Show that G is a group with respect to composition.

(c) Find mutually disjoint subsets $J_1, J_2, J_3$ of $Z_6 = \{1, 2, \ldots, 6\}$ such that (i) $J_1 \cup J_2 \cup J_3 = Z_6$, and (ii) $\tau(J_i) = J_i$ for all $\tau \in G$ and $i = 1, 2, 3$.


4. Evaluate

(a) $\begin{vmatrix} 1 & 2 & 6 & -2 \\ 7 & 4 & -1 & 19 \\ 3 & 5 & -3 & 14 \\ 1 & 1 & 2 & 1 \end{vmatrix}$  (b) $\begin{vmatrix} 1+x & 1 & 1 & 1 \\ 1 & 1-x & 1 & 1 \\ 1 & 1 & 1+y & 1 \\ 1 & 1 & 1 & 1-y \end{vmatrix}$

(c) $\begin{vmatrix} 1 & 6 & -5 & 3 & 5 \\ 7 & 3 & -8 & 2 & 4 \\ -1 & 2 & 3 & -1 & 3 \\ 0 & 5 & 2 & 1 & 8 \\ 2 & -3 & 6 & -5 & 0 \end{vmatrix}$  (d) $\begin{vmatrix} 4 & 2 & 0 & -2 & 4 \\ 2 & 0 & -2 & -4 & 4 \\ 0 & -2 & -4 & 4 & 2 \\ -2 & -4 & 4 & 2 & 0 \\ -4 & 4 & 2 & 0 & -2 \end{vmatrix}$

5. Prove that

$$\begin{vmatrix} \alpha+\beta & \alpha\beta & & & \\ 1 & \alpha+\beta & \alpha\beta & & \\ & 1 & \alpha+\beta & \ddots & \\ & & \ddots & \ddots & \alpha\beta \\ & & & 1 & \alpha+\beta \end{vmatrix} = \frac{\alpha^{n+1} - \beta^{n+1}}{\alpha - \beta}$$

where the determinant is of order n and all the unspecified entries are zero.

6. Evaluate

(a) $\begin{vmatrix} 246 & 427 & 327 \\ 1014 & 543 & 443 \\ -324 & 721 & 621 \end{vmatrix}$  (b) $\begin{vmatrix} \alpha+\delta & 3\alpha & \beta+2\alpha & \beta+\delta \\ 2\delta & \delta+\beta & \gamma-\beta & \gamma-\delta \\ \alpha+\gamma & \gamma-2\delta & \delta & \alpha+3\delta \\ \beta-\delta & \gamma-\delta & \alpha+\gamma & \alpha+\delta \end{vmatrix}$


7. Prove that

$$\begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1j} & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ \alpha_{j1} & \cdots & \alpha_{jj} & 0 & \cdots & 0 \\ \gamma_{11} & \cdots & \gamma_{1j} & \beta_{11} & \cdots & \beta_{1k} \\ \vdots & & \vdots & \vdots & & \vdots \\ \gamma_{k1} & \cdots & \gamma_{kj} & \beta_{k1} & \cdots & \beta_{kk} \end{vmatrix} = \begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1j} \\ \vdots & & \vdots \\ \alpha_{j1} & \cdots & \alpha_{jj} \end{vmatrix} \cdot \begin{vmatrix} \beta_{11} & \cdots & \beta_{1k} \\ \vdots & & \vdots \\ \beta_{k1} & \cdots & \beta_{kk} \end{vmatrix}$$

8. Evaluate $\det(\alpha_{ij})$ in each of the following cases.

(a) $\alpha_{ij} = 1$ for $i = j \pm 1$ and $\alpha_{ij} = 0$ otherwise.
(b) $\alpha_{ij} = 1$ for $i < j$ and for $i = j+1$; $\alpha_{ij} = 0$ for $i = j$ and for $i > j+1$.
(c) $\alpha_{ij} = 0$ for $i = j$ and $\alpha_{ij} = 1$ for $i \ne j$.

9. Let $\alpha_1 = \alpha$, $\alpha_2 = \alpha + \delta$, $\alpha_3 = \alpha + 2\delta$, \ldots, $\alpha_n = \alpha + (n-1)\delta$. Show that the value of the cyclic determinant

$$\begin{vmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_{n-1} & \alpha_n \\ \alpha_2 & \alpha_3 & \cdots & \alpha_n & \alpha_1 \\ \alpha_3 & \alpha_4 & \cdots & \alpha_1 & \alpha_2 \\ \vdots & & & & \vdots \\ \alpha_n & \alpha_1 & \cdots & \alpha_{n-2} & \alpha_{n-1} \end{vmatrix}$$

is $(-1)^{n(n-1)/2}(n\delta)^{n-1}\big(\alpha + \tfrac{n-1}{2}\delta\big)$.

10. Show that the value of the Vandermonde determinant

$$\begin{vmatrix} 1 & \alpha_1 & \alpha_1^2 & \cdots & \alpha_1^{n-1} \\ 1 & \alpha_2 & \alpha_2^2 & \cdots & \alpha_2^{n-1} \\ \vdots & & & & \vdots \\ 1 & \alpha_n & \alpha_n^2 & \cdots & \alpha_n^{n-1} \end{vmatrix}$$

is $\prod_{i>j} (\alpha_i - \alpha_j)$.


11. Let $M_1, \ldots, M_m$ be square matrices. Show that the determinant of the matrix

$$M = \begin{pmatrix} M_1 & & & \\ & M_2 & & \\ & & \ddots & \\ & & & M_m \end{pmatrix}$$

consisting of blocks $M_1, \ldots, M_m$ on the diagonal and blocks of zero matrices off the diagonal is equal to the product

$$\det M = \det M_1 \cdots \det M_m.$$

12. Let X be the linear space of all square matrices of order n over $\Lambda$ and let $A \in X$. Define $L_A : X \to X$ to be the mapping such that $L_A(M) = AM$. Show that $L_A$ is a linear transformation and that

$$\det L_A = (\det A)^n.$$

13. If a(t), b(t), c(t), d(t) are functions of R into R, we can form the determinant

$$\begin{vmatrix} a(t) & b(t) \\ c(t) & d(t) \end{vmatrix}$$

just as with numbers.

(a) Let f(t), g(t) be functions having derivatives of all orders. Let $\varphi(t)$ be the function obtained by taking the determinant

$$\varphi(t) = \begin{vmatrix} f(t) & g(t) \\ f'(t) & g'(t) \end{vmatrix}.$$

Show that

$$\varphi'(t) = \begin{vmatrix} f(t) & g(t) \\ f''(t) & g''(t) \end{vmatrix}.$$

(b) Generalize (a) to the 3 × 3 case, and then to the n × n case.


14. Let $\alpha_1, \ldots, \alpha_n$ be distinct non-zero real numbers. Show that the functions

$$e^{\alpha_1 t}, e^{\alpha_2 t}, \ldots, e^{\alpha_n t}$$

are linearly independent over R. [Hint: make use of the method in Exercise 13.]

15. Solve the following systems of linear equations by CRAMER's rule.

(a) $2X_1 - X_2 + 3X_3 + 2X_4 = 4$
    $3X_1 - 3X_2 + 3X_3 + 2X_4 = 6$
    $3X_1 - X_2 - X_3 + 2X_4 = 6$
    $3X_1 - X_2 + 3X_3 - X_4 = 6$.

(b) $5X_1 + 6X_2 = 0$
    $X_1 + 5X_2 + 6X_3 = 0$
    $X_2 + 5X_3 + 6X_4 = 0$
    $X_3 + 5X_4 + 6X_5 = 0$
    $X_4 + 5X_5 = 1$.

16. Show that the solutions of a system of r homogeneous linear equations

$$\alpha_{11}X_1 + \cdots + \alpha_{1n}X_n = 0$$
$$\cdots$$
$$\alpha_{r1}X_1 + \cdots + \alpha_{rn}X_n = 0$$

of rank r can be obtained by the following method due to FROBENIUS. Augment the coefficient matrix to a square matrix $A = (\alpha_{ij})_{i,j=1,\ldots,n}$ such that $\det A \ne 0$. Then the vectors $(A_{r+i,1}, A_{r+i,2}, \ldots, A_{r+i,n})$ ($i = 1, \ldots, n-r$), where $A_{ij}$ denotes the cofactor of $\alpha_{ij}$, are n−r linearly independent solutions of the system.


CHAPTER VII EIGENVALUES

Given a single endomorphism $\sigma$ of a finite-dimensional linear space X, it is desirable to have a base of X relative to which the matrix of $\sigma$ takes up a form as simple as possible. We shall see in this chapter that some endomorphisms can be represented (relative to certain bases) by matrices of diagonal form, while for every endomorphism of a complex linear space we can find a base relative to which the matrix of the endomorphism is of JORDAN form. In §18 we give a rudimentary study of the polynomials needed in the sequel. Characteristic polynomials will be studied in §19. Finally we prove the JORDAN Theorem in §20.

§ 18 Polynomials

A. Definitions

Let $\Lambda$ denote again the set R of all real numbers or the set C of all complex numbers. As before, elements of $\Lambda$ are referred to as scalars. Consider the set S of all infinite sequences

$$f = (\alpha_0, \alpha_1, \alpha_2, \ldots) = (\alpha_i)_{i=0,1,2,\ldots}$$

of scalars of finite support, i.e. sequences for each of which an index n exists such that $\alpha_i = 0$ for all $i > n$. For any two elements $f = (\alpha_0, \alpha_1, \alpha_2, \ldots)$ and $g = (\beta_0, \beta_1, \beta_2, \ldots)$ of S and any scalar $\lambda$ of $\Lambda$, we define the sum f + g as the sequence

$$f + g = (\alpha_0 + \beta_0, \alpha_1 + \beta_1, \alpha_2 + \beta_2, \ldots)$$

and the scalar multiple $\lambda f$ as the sequence

$$\lambda f = (\lambda\alpha_0, \lambda\alpha_1, \lambda\alpha_2, \ldots).$$

Then it is easy to see that both f + g and $\lambda f$ are again elements of S and that S is an infinite-dimensional linear space over $\Lambda$ with respect to the composition laws defined above. The zero vector of the linear space S is then the sequence $0 = (0, 0, 0, \ldots)$ and the additive inverse −f of f is the sequence $-f = (-\alpha_0, -\alpha_1, -\alpha_2, \ldots)$.


Let us now introduce another internal composition law in S, called multiplication. For f and g of S, we define their product fg as the sequence $fg = (\gamma_0, \gamma_1, \gamma_2, \ldots)$ where

$$\gamma_k = \sum_{i+j=k} \alpha_i\beta_j \quad \text{for all } k = 0, 1, 2, \ldots$$

Thus the terms $\gamma_k$ of the product fg are defined as follows:

$$\gamma_0 = \alpha_0\beta_0$$
$$\gamma_1 = \alpha_0\beta_1 + \alpha_1\beta_0$$
$$\gamma_2 = \alpha_0\beta_2 + \alpha_1\beta_1 + \alpha_2\beta_0$$
$$\gamma_3 = \alpha_0\beta_3 + \alpha_1\beta_2 + \alpha_2\beta_1 + \alpha_3\beta_0$$
$$\cdots$$

Obviously fg is an element of S. It is readily verified that S satisfies, besides the usual axioms of a linear space, the following properties:

(i) $f(gh) = (fg)h$;
(ii) $(\lambda f)(\mu g) = (\lambda\mu)(fg)$;
(iii) $fg = gf$;
(iv) $f(g + h) = fg + fh$.

Following the customary terminology in algebra, we say that S together with the three composition laws constitutes a commutative $\Lambda$-algebra. The sequence

$$1 = (1, 0, 0, \ldots)$$

is the unit element of S, by which we mean to say that $1f = f1 = f$ for all $f \in S$. Moreover, for any two sequences f and g, $fg = 0$ if and only if $f = 0$ or $g = 0$.
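The product of two finite-support sequences is a convolution of their coefficient lists, which takes only a few lines to implement; in the sketch below coefficients are listed from degree 0 upwards, and the example polynomials are arbitrary choices:

```python
def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists (degree 0 first):
    gamma_k = sum of alpha_i * beta_j over all i + j = k."""
    gamma = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            gamma[i + j] += a * b
    return gamma

# (1 + 2T)(3 + T + T^2) = 3 + 7T + 3T^2 + 2T^3
print(poly_mul([1, 2], [3, 1, 1]))   # [3, 7, 3, 2]
```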

Consider the sequence $T = (0, 1, 0, 0, \ldots)$ of S and its powers:

$$T^0 = (1, 0, 0, 0, \ldots)$$
$$T^1 = (0, 1, 0, 0, \ldots)$$
$$T^2 = (0, 0, 1, 0, \ldots)$$
$$\cdots$$

Let $f = (\alpha_0, \alpha_1, \alpha_2, \ldots)$. If $\alpha_i = 0$ for all $i > n$, then we can write f in the more familiar form:

$$f = \alpha_n T^n + \alpha_{n-1}T^{n-1} + \cdots + \alpha_2 T^2 + \alpha_1 T^1 + \alpha_0 T^0.$$


In this notation, we write $S = \Lambda[T]$ and call it the commutative $\Lambda$-algebra of all polynomials in the indeterminate T with coefficients in $\Lambda$. Addition and multiplication can be expressed as follows:

$$\sum \alpha_i T^i + \sum \beta_i T^i = \sum (\alpha_i + \beta_i)T^i;$$
$$\lambda\sum \alpha_i T^i = \sum \lambda\alpha_i T^i;$$
$$\Big(\sum \alpha_i T^i\Big)\Big(\sum \beta_j T^j\Big) = \sum \gamma_k T^k \quad\text{where}\quad \gamma_k = \sum_{i+j=k} \alpha_i\beta_j.$$

It is the custom to write simply T for $T^1$; they are just different notations for one and the same polynomial $(0, 1, 0, \ldots)$. Consider now the subspace $\Lambda[T]_0$ of all polynomials of the form $\alpha_0 T^0$. Then it is easily seen that $\Lambda[T]_0$ is isomorphic to $\Lambda$ both as linear spaces and as $\Lambda$-algebras under the correspondence $\alpha_0 T^0 \mapsto \alpha_0$. Moreover

$$(\alpha_0 T^0)(\beta_n T^n + \cdots + \beta_1 T + \beta_0 T^0) = \alpha_0\beta_n T^n + \cdots + \alpha_0\beta_1 T + \alpha_0\beta_0 T^0.$$

Therefore we can further write $\alpha_0$ in place of $\alpha_0 T^0$; hence every polynomial in $\Lambda[T]$ has the form

$$\alpha_n T^n + \alpha_{n-1}T^{n-1} + \cdots + \alpha_1 T + \alpha_0.$$

B. Euclidean algorithm

Let $f = (\alpha_0, \alpha_1, \alpha_2, \ldots)$ be a polynomial different from zero. Then by definition there exists a unique non-negative integer n such that $\alpha_i = 0$ for all $i > n$ and $\alpha_n \ne 0$. This integer n is called the degree of the polynomial f and it is denoted by deg f. A polynomial f of degree n can be written in the simplest form as

$$f = \alpha_n T^n + \alpha_{n-1}T^{n-1} + \cdots + \alpha_1 T + \alpha_0$$

where the non-zero coefficient $\alpha_n$ is called the leading coefficient of f and the last term $\alpha_0$ is called the constant term of f.

The degree of a product and of a sum satisfy the following equality and inequality:

(i) $\deg(fg) = \deg f + \deg g$;
(ii) $\deg(f + g) \le \max(\deg f, \deg g)$.

We observe that the degree of the zero polynomial is not defined. We may find it convenient to set the degree of 0 to be the symbol −∞, for which we make the convention that $-\infty + n = -\infty$ and $-\infty < n$ for all integers n, and $-\infty + (-\infty) = -\infty$, $-\infty \le -\infty$. Under this convention, statements (i) and (ii) hold for all polynomials of $\Lambda[T]$.


For any two non-zero polynomials f and g of $\Lambda[T]$ we say that f is divisible by g, g divides f, f is a multiple of g, g is a factor of f or g is a divisor of f if $f = gh$ for some polynomial h. If f is divisible by g then $\deg g \le \deg f$. The converse of this is not true.

THEOREM 18.1. (Euclidean algorithm) Let f and g be two non-zero polynomials of $\Lambda[T]$. Then there exist polynomials s and r of $\Lambda[T]$ such that $f = sg + r$ and $\deg r < \deg g$.

PROOF. If $\deg f < \deg g$, then $f = 0\cdot g + f$ satisfies the requirement of the theorem. Therefore we assume that $\deg f \ge \deg g$.

Let $f = \alpha_n T^n + \cdots + \alpha_1 T + \alpha_0$ and $g = \beta_m T^m + \cdots + \beta_1 T + \beta_0$ where $n \ge m \ge 0$ and $\alpha_n \ne 0$, $\beta_m \ne 0$. If $s_1 = \frac{\alpha_n}{\beta_m}T^{n-m}$, then the degree of the polynomial

$$r_1 = f - s_1 g$$

clearly satisfies $\deg r_1 < \deg f$. If $\deg r_1 < m$, then the desired result holds since

$$f = s_1 g + r_1.$$

Otherwise we can apply the above operation to the polynomials $r_1$ and g and get a polynomial $s_2$ of $\Lambda[T]$ such that the degree of the polynomial

$$r_2 = r_1 - s_2 g$$

satisfies the inequality $\deg r_2 < \deg r_1$. Then

$$f = (s_1 + s_2)g + r_2,$$

and so if $\deg r_2 < m$ the desired result holds. Otherwise we can repeat the process and arrive after no more than n−m steps at

$$r_3 = r_2 - s_3 g, \quad \ldots, \quad r_k = r_{k-1} - s_k g$$

where $\deg r_k < m$. Putting $s = s_1 + \cdots + s_k$ and $r = r_k$ we get

$$f = sg + r$$

and so the theorem is proved.
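The proof is an algorithm, and it transcribes almost line by line into code. A minimal sketch over the reals (coefficients listed from degree 0 up; the example polynomials are arbitrary choices):

```python
def poly_divmod(f, g):
    """Return (s, r) with f = s g + r and deg r < deg g, as in 18.1.
    Polynomials are coefficient lists, degree 0 first; [] is the zero polynomial.
    Each pass cancels the current leading term, exactly as in the proof."""
    f, s = list(f), [0] * max(len(f) - len(g) + 1, 1)
    while len(f) >= len(g) and any(f):
        shift = len(f) - len(g)          # n - m
        coeff = f[-1] / g[-1]            # alpha_n / beta_m
        s[shift] = coeff
        for j, b in enumerate(g):        # subtract coeff * T^shift * g
            f[shift + j] -= coeff * b
        while f and f[-1] == 0:          # drop the cancelled leading terms
            f.pop()
    return s, f

# T^3 - 1 = (T^2 + T + 1)(T - 1) + 0
s, r = poly_divmod([-1, 0, 0, 1], [-1, 1])
print(s, r)    # [1.0, 1.0, 1.0] []
```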


Let us now study an application of the euclidean algorithm. For n arbitrary polynomials $f_1, f_2, \ldots, f_n$ of $\Lambda[T]$, we consider the subset F consisting of all polynomials of $\Lambda[T]$ of the form

$$r_1 f_1 + r_2 f_2 + \cdots + r_n f_n$$

where the $r_i$ are arbitrary polynomials of $\Lambda[T]$. F is called the ideal of $\Lambda[T]$ generated by the polynomials $f_1, f_2, \ldots, f_n$ and it satisfies the following properties:

(i) if f and g belong to F, then f − g belongs to F;
(ii) if f belongs to F, then rf belongs to F for all polynomials r of $\Lambda[T]$.

In general any non-empty subset F of $\Lambda[T]$ that satisfies (i) and (ii) above is called an ideal of $\Lambda[T]$.

Clearly the subset 0 consisting of the zero polynomial alone is an ideal of $\Lambda[T]$, and $\Lambda[T]$ itself is an ideal of $\Lambda[T]$. The subset of all multiples of a fixed polynomial f of $\Lambda[T]$ is also an ideal of $\Lambda[T]$. The next theorem shows that every ideal of $\Lambda[T]$ can be so obtained.

THEOREM 18.2. Every ideal F of $\Lambda[T]$ can be generated by a single polynomial g of F, i.e. $F = \{sg : s \in \Lambda[T]\}$.

PROOF. If F = 0, then the theorem is trivial. Suppose now that F contains non-zero polynomials. In F there exists a polynomial g such that

$$0 \le \deg g \le \deg f$$

for all non-zero polynomials f of F. The theorem is proved if we show $f = sg$ for all non-zero $f \in F$. By 18.1, there exist polynomials r and s of $\Lambda[T]$ such that $f = sg + r$ and $\deg r < \deg g$. Since F is an ideal, it follows that $r = f - sg$ belongs to F. Hence $\deg r = -\infty$, i.e., r = 0, proving the theorem.

C. Greatest common divisor

Let $f_1, f_2, \ldots, f_n$ be polynomials in $\Lambda[T]$. A polynomial h of $\Lambda[T]$ is called a greatest common divisor of the polynomials $f_1, f_2, \ldots, f_n$ if (i) h divides each of $f_1, f_2, \ldots, f_n$ and (ii) any polynomial that divides each of $f_1, f_2, \ldots, f_n$ divides h. If h and h' are two greatest common divisors of the same polynomials $f_1, f_2, \ldots, f_n$, then it follows from the definition that $h = ph'$ and $h' = qh$ for some non-zero polynomials p and q of $\Lambda[T]$. Since $h = pqh$, it follows that both p and q are non-zero constants. Therefore the greatest common divisor of the non-zero polynomials $f_1, f_2, \ldots, f_n$ is uniquely determined up to a non-zero constant factor. We say that the non-zero polynomials $f_1, f_2, \ldots, f_n$ are relatively prime if they have 1 as a greatest common divisor. We shall now use Theorem 18.2 to prove that a greatest common divisor h of the non-zero polynomials $f_1, f_2, \ldots, f_n$ can be written as $h = r_1 f_1 + r_2 f_2 + \cdots + r_n f_n$ for some polynomials $r_i$ of $\Lambda[T]$. Indeed, let F be the ideal of $\Lambda[T]$ generated by $f_1, f_2, \ldots, f_n$ and h a polynomial of F that generates F. Then $h = r_1 f_1 + \cdots + r_n f_n$ for some $r_i$, and h is obviously a greatest common divisor of $f_1, f_2, \ldots, f_n$. In particular if $f_1, \ldots, f_n$ are relatively prime, then $1 = r_1 f_1 + \cdots + r_n f_n$ for some $r_i$.
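For two polynomials a greatest common divisor can be computed by iterating the euclidean algorithm, exactly as for integers. The sketch below reuses poly_divmod from the sketch in §18B and normalizes the leading coefficient (recall that a gcd is unique only up to a non-zero constant factor):

```python
def poly_gcd(f, g):
    """Greatest common divisor of two non-zero polynomials by repeated
    euclidean division (18.1): gcd(f, g) = gcd(g, r) whenever f = s g + r.
    Requires poly_divmod from the previous sketch; the result is
    normalized to leading coefficient 1."""
    while any(g):
        _, r = poly_divmod(f, g)
        f, g = g, (r if r else [0])
    return [c / f[-1] for c in f]

# gcd(T^2 - 1, T^2 + 2T + 1) = T + 1
print(poly_gcd([-1, 0, 1], [1, 2, 1]))   # [1.0, 1.0]
```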

D. Substitutions

In the linear space $\Lambda[T]$, the family $(1, T, T^2, \ldots)$ constitutes a base. Therefore for each scalar $\lambda$ of $\Lambda$ a linear transformation $\phi_\lambda: \Lambda[T] \to \Lambda$ is uniquely defined by

$$\phi_\lambda(T^i) = \lambda^i \qquad i = 0, 1, 2, \ldots$$

where $T^0 = 1$ and $\lambda^0 = 1$. For a polynomial $f = \alpha_n T^n + \cdots + \alpha_1 T + \alpha_0$, we shall write $\phi_\lambda(f) = f(\lambda)$, i.e.

$$f(\lambda) = \alpha_n\lambda^n + \cdots + \alpha_1\lambda + \alpha_0.$$

Since $\phi_\lambda$ is a linear transformation, we get for $f, g \in \Lambda[T]$ and $\lambda, \mu \in \Lambda$

$$(f + g)(\lambda) = f(\lambda) + g(\lambda), \qquad (\mu f)(\lambda) = \mu(f(\lambda)).$$

By straightforward calculation, we can further verify that

$$(fg)(\lambda) = f(\lambda)g(\lambda).$$

Each polynomial in $\operatorname{Ker}\phi_\lambda$ is said to have $\lambda$ as a root or zero; in other words $\lambda$ is a root of $f \in \Lambda[T]$ if $f(\lambda) = 0$. The following theorem characterizes roots of a polynomial in terms of divisibility.

THEOREM 18.3. Let f be a non-zero polynomial of $\Lambda[T]$. A scalar $\lambda$ of $\Lambda$ is a root of f if and only if f is divisible by the polynomial $T - \lambda$.


PROOF. If f is divisible by $T - \lambda$, then $f = (T - \lambda)g$ for some polynomial g in $\Lambda[T]$. Since $f(\lambda) = (\lambda - \lambda)g(\lambda) = 0$, $\lambda$ is by definition a root of f. Conversely if $\lambda$ is a root of f, then $\deg f \ge 1$, since a non-zero constant polynomial has no root. Applying the euclidean algorithm to f and $T - \lambda$, we get polynomials s and r of $\Lambda[T]$ such that $f = s(T - \lambda) + r$ and $\deg r < 1$. Since $\lambda$ is a root of f, we get $0 = f(\lambda) = s(\lambda)(\lambda - \lambda) + r(\lambda)$. That means $\lambda$ is a root of the constant polynomial r; therefore r = 0. Hence f is divisible by $T - \lambda$.

Consequently a polynomial f in $\Lambda[T]$ of degree $n \ge 1$ can have at most n roots. A root $\lambda$ of f is said to have the multiplicity m (m ≥ 1) if m is the largest integer such that f is divisible by the polynomial $(T - \lambda)^m$. Accordingly a root of multiplicity 1 (respectively > 1) is called a simple (respectively multiple) root.

A polynomial with real coefficients may fail to have real roots; for instance, the polynomial $T^2 + 1$ has no real roots. On roots of polynomials with complex coefficients, we have the following information: every polynomial f in C[T] of degree $n \ge 1$ has exactly n roots in C when each root is counted by its multiplicity. This is the so-called fundamental theorem of algebra.

We shall make use of this theorem in a few places in this and the following chapter. The reader is asked to accept this theorem, whose proof requires the use of results in topology or analysis.

Recall that for every linear space X over $\Lambda$ we have an associative algebra End(X) of all endomorphisms of X, in which the product $\tau\sigma$ of any two endomorphisms $\tau$ and $\sigma$ is defined as the endomorphism $\tau \circ \sigma$.

For any endomorphism $\sigma$ of X, we define a linear transformation $\psi_\sigma: \Lambda[T] \to \operatorname{End}(X)$ such that

$$\psi_\sigma(1) = i_X \quad\text{and}\quad \psi_\sigma(T^i) = \sigma^i \quad \text{for } i = 1, 2, \ldots$$

Hence for each polynomial $f = \alpha_n T^n + \cdots + \alpha_1 T + \alpha_0$ we get an endomorphism $\psi_\sigma(f)$ of X which we write as

$$f(\sigma) = \alpha_n\sigma^n + \cdots + \alpha_1\sigma + \alpha_0 i_X.$$

If there is no danger of confusion we shall write the last term $\alpha_0 i_X$ in $f(\sigma)$ simply as $\alpha_0$; thus $f(\sigma) = \alpha_n\sigma^n + \cdots + \alpha_1\sigma + \alpha_0$.

Analogously for each square matrix M over $\Lambda$ of order r and each polynomial f of $\Lambda[T]$ we get a square matrix of order r:

$$f(M) = \alpha_n M^n + \cdots + \alpha_1 M + \alpha_0 I_r.$$

Here again, we may delete the identity matrix $I_r$ from the term $\alpha_0 I_r$ if it is clear from the context; thus $f(M) = \alpha_n M^n + \cdots + \alpha_1 M + \alpha_0$.

REMARK 18.4. In our definition of polynomial, we only make use of some very fundamental algebraic properties of $\Lambda$ ($\Lambda$ = R or $\Lambda$ = C). A definition of polynomial with coefficients in the algebra End(X) or in the matrix algebra $\Lambda^{(n,n)}$ can be given with a few obvious modifications. For instance, a polynomial with coefficients in End(X) is a family

$$f = (\phi_0, \phi_1, \phi_2, \ldots) = (\phi_i)_{i=0,1,2,\ldots}$$

of endomorphisms of X for which an index n exists such that $\phi_i = 0$ for all $i > n$. If we put

$$T = (0, i_X, 0, \ldots),$$

then we write f as

$$f = \phi_n T^n + \phi_{n-1}T^{n-1} + \cdots + \phi_1 T + \phi_0.$$

Addition and multiplication are defined similarly. We notice that since multiplication in End(X) is not commutative, for two polynomials f and g with endomorphisms as coefficients, fg may be different from gf. Finally if $f = \phi_n T^n + \cdots + \phi_1 T + \phi_0$ is a polynomial in the indeterminate T with coefficients in End(X) and $\tau$ is an endomorphism of X, then

$$f(\tau) = \phi_n\tau^n + \cdots + \phi_1\tau + \phi_0$$

is an endomorphism of X. In particular we say that $\tau$ annuls f if $f(\tau) = 0$. Polynomials with matrix coefficients are analogously defined.

E. Exercises

1. Find in each of the following cases q and r in R[T] such that $f = gq + r$ with $\deg r < \deg g$.

(a) $f = T^3 - 3T^2 - T - 1$, $g = 3T^2 - 2T + 1$;
(b) $f = T^4 - 2T + 5$, $g = T^2 + T + 2$;
(c) $f = 3T^4 - 2T^3 - T + 1$, $g = 3T^2 - T + 2$.

2. In each of the following cases, find necessary and sufficient conditions on $\alpha$, $\beta$ such that

(a) $T^3 + \beta T + 1$ is divisible by $T^2 + \alpha T - 1$,
(b) $T^4 + \beta T^2 + 1$ is divisible by $T^2 + \alpha T + 1$.

3. In each of the following cases find the greatest common divisor (f, g) in R[T]. Find also u and v in R[T] such that $uf + vg = (f, g)$.

(a) $f = T^4 + 2T^3 - T^2 - 4T - 2$, $g = T^4 + T^3 - T^2 - 2T - 2$;
(b) $f = 3T^3 - 2T^2 + T + 2$, $g = T^2 - T + 1$;
(c) $f = T^4$, $g = (1 - T)^4$.

4. $p \in R[T]$ is said to be irreducible if it cannot be written as a product $p = fg$ with $f, g \in R[T]$, $\deg f < \deg p$ and $\deg g < \deg p$.

(a) Let $p, f, g \in R[T]$ and let p be irreducible. Prove that if fg is divisible by p, then f is divisible by p or g is divisible by p.
(b) Prove that every $f \in R[T]$ can be written as a product of irreducible polynomials: $f = p_1 p_2 \cdots p_r$.
(c) Prove that if $f = p_1 p_2 \cdots p_r = q_1 q_2 \cdots q_s$, where all $p_i$ and $q_j$ are irreducible polynomials of degrees ≥ 1, then (i) r = s and (ii) after suitable renumbering $p_i = c_i q_i$ for $i = 1, \ldots, r$ where $c_i$ is a non-zero constant.

5. Let $f \in R[T]$. Suppose f has degree n. Prove that f has at most n roots.

6. Let $f, g \in R[T]$ be of degree ≤ n. Suppose for n+1 distinct real numbers $\alpha_1, \alpha_2, \ldots, \alpha_{n+1}$ we have $f(\alpha_i) = g(\alpha_i)$ for $i = 1, 2, \ldots, n+1$. Prove that f = g.

7. Let $P_{ij}$ ($i, j = 1, \ldots, n$) be polynomials in one indeterminate T. Let $c_{ij}$ ($\ne 0$) be the leading coefficient of $P_{ij}$. Assume that $\deg P_{ij} = d_i$ for all $i, j = 1, \ldots, n$. Show that

$$\begin{vmatrix} P_{11} & \cdots & P_{1n} \\ \vdots & & \vdots \\ P_{n1} & \cdots & P_{nn} \end{vmatrix} = c\,T^d + \text{terms of degree} < d$$

where $c = \det(c_{ij})$ and $d = d_1 + \cdots + d_n$.

Page 250: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 19 EIGENVALUES 239

§ 19. Eigenvalues

In the rest of this chapter, we shall concern ourselves with a moredetailed study of endomorphisms of a finite-dimensional linear space.Intuitively speaking, we shall take an endomorphism apart to seehow its components work. Technically, we decompose the linearspace X in question into a direct sum of invariant subspaces uponeach of which the endomorphism a operates in a way that isrelatively easy for us to handle.

A. Invariant subspacesLet a be an endomorphism of a linear space X and Y a subspace of X.

We say that Y is invariant under a if a[ Y] C Y. If Y is invariant undera, then a defines an endomorphism a': Y -+ Y where a' (y) = a(y)for all y E Y. Trivial invariant subspaces are the zero subspace0 and the entire space X. Moreover, both Ker a and Im a are invariantunder a.EXAMPLE 19.1. Let a be the endomorphism of R2 defined by

a(e1) = X1 e1 and a(e2) = X2e2

where (e1 , e2) is the canonical base. If X1 * X2, then the only1-dimensional invariant subspaces are those generated by e1 and bye2. In the case where X1 = X2, every subspace of R2 is invariantunder a.EXAMPLE 19.2. Let a be an endomorphism of a linear space X over A.The existence of a 1-dimensional subspace invariant under a isequivalent to the existence of (i) a non-zero vector x of X and (ii) ascalar X of A such that a(x) = Xx. That means the existence of anon-zero vector x in Ker(a - X) for some scalar X of A. Consider nowthe endomorphism a: R2 --+ R2 defined by

a(e1)= -e2 and a(e2)=e1.Then for any real number X the endomorphism a - X is an auto-morphism. Therefore R2 and 0 are the only subspaces of R2invariant under a.EXAMPLE 19.3. Let be the real linear space of all polynomialsof R [ T] of degree < n and D the differential operator, i.e.

D(a" T" + ... + a1 T + a0) = na"T"-1 + ... + 2a2 T+a1.Then for every k < n, the subspace Pk+ 1 of P. + 1 of all polynomials

Page 251: Kam-Tim Leung Linear Algebra and Geometry  1974

240 V11 EIGENVALUES

of R[T] of degree < k is invariant under D. Furthermore, these arethe only subspaces of Pn+

1invariant under D.

If X is a finite-dimensional linear space over A and Y is a subspaceinvariant under an endomorphism a of X, then relative to somebases of X, the matrix of a takes up a simple form. For instance, ifB = (x 1, . . ., x n) is a base of X such that B'= (x 1, ... , x,.) is a baseof Y, then

and

a(xi)=aiiXi +...+airxra(xj) = aj1 x1 + ... + ajrxr + aj r+i xr+ i + ... + ajnxn

for i = 1, 2, ... , r and j = r +1, ... , n. Therefore the matrix MBBof a relative to the base B is in block form:

all . . . a1r

MBB (a) =

art . . . arr

ar+ii . . . ar+1r

0 ................ 0

....................

....................

....................0 ................ 0

ar+i r+1 ........ ar+1 n

....................

ant ...... anr an r+1 ........... an n

We observe that the block at the lower left corner need not be zerogenerally. Suppose the block at the lower left corner is zero. Then

a(xj) = ajr+1,Xr+ 1 + ... + ainxn

for all j = r+l, ... , n. Consequently the subspace Y' generated bythe vectors xr+ 1, ... , xn is invariant under a, and X = Y ® Y' is adirect sum of two invariant subspace.

Page 252: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 19 EIGENVALUES 241

Conversely suppose X = Z1 ® Z2 where Z1 and Z2 are invariantunder a. If B 1 = (z 1 , ... , z,) and B2 =(Z,-I," .. , zn) are bases ofZ1 and Z2 respectively then

a(z1) = oil Z1 + . Ri,Z, 1 = 1, ... , ra(Z1) =gir+1Zr+1+ .. +ginZn 1 r+l,..., n.

Therefore the matrix MBB(a) of a relative to the base B = (z1, ... , z,,Zr+1,..., Zn)is

7a11............ Qir..................

0,1 ............Orr

0 .............. 0

..................

..................

..................0 .............. 0

0 ............... 0......................................................0 ............... 0

Or+ lr+1 ...... Q,+ 1 n

Onr+1.......... Qnn

where the upper right and lower left blocks are both zero. It is easilyrecognized that the upper left block is the matrix MB1 B1 (al) and thelower right block isMB2B2 (a2) where a1 : Z1 -+Z2 and a2: Z2 -> Z2are the endomorphisms defined by a on the invariant subspaces.Thus

r-

MBB (a) =MB1B1(a1) 0

0 MB2B2 (a2)

----------

It follows that if Z1 is further decomposed into a direct sum

Page 253: Kam-Tim Leung Linear Algebra and Geometry  1974

242 VII EIGENVALUES

Z, = UI ® U2 of subspaces invariant under a, (and hence also under a)and Z2 is decomposed into a direct sum Z2 = U3 ® U4 of subspacesinvariant under a2 (and hence also under a), then relative to asuitable base of X the matrix of a will have the form

where A; are the matrices of endomorphism on U; defined by a andall unspecified terms are zero.

The above discussion shows that decomposition of the linear spaceX into a direct sum of subspaces each of which is invariant under theendomorphism in question is a suitable method for tackling theproblem posed at the beginning of this chapter. Taking this asour point of departure, we shall branch out in different directions inthe remainder of this chapter.

B. Eigenvectors and eigenvalues

We shall now study in more detail 1-dimensional invariantsubspaces. This will lead to the central concepts of this chapter:namely, eigenvectors and eigenvalues.

Let Y be a 1-dimensional subspace of a linear space X over A andx be a non-zero vector of Y. Then Y is invariant under anendomorphism a of X if and only if a(x) belongs to Y, i.e. a(x)= Axfor some scalar X of A. In this case a(y)=Ay for all vectors y of Y.

DEFINITION 19.4. Let X be a linear space over A and a an endomor-phism of X. An eigen vector of a is a non-zero vector x of X such thato(x) = Ax for some scalar A of A. An eigen value of a is a scalar A of Athat satisfies an equation a(x) = Ax for some non-zero vector x of X.

Eigenvectors are also called proper vectors or characteristicvectors; eigenvalues are also similarly called proper values orcharacteristic values.

Page 254: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 19 EIGENVALUES 243

It follows from the above discussion that if x is an eigenvector ofan endomorphism a, then x generates a 1-dimensional subspaceinvariant under a. Conversely, every non-zero vector of a 1-dimensionalsubspace invariant under a is an eigenvector of a. From Examples19.1, 19.2 and 19.3 we see that some endomorphisms possesseigenvectors whereas others do not.

For any fixed scalar A of A, the existence of a non-zero vector x ofX such that a(x) = X x is equivalent to the existence of a non-zerovector x of X such that (a - A)(x) = 0. Therefore we obtain thefollowing useful theorem.

THEOREM 19.5. Let X be a finite-dimensional linear space over Aand a an endomorphism of X. Then for every scalar A of A thefollowing statemen's are equivalent:

(a) A is an eigenvalue of a.(b) Ker(a - A) 0(c) det(a - A) = 0.

C. Characteristic polynomials

Theorem 19.5 suggests that it is worthwhile to investigate theexpression det(a - A). We have seen in § 17D that we can evaluate thedeterminant det(a - A) of the endomorphism a -'X with the aid of anybase B = (x1, ... , xn) of X, i.e. det(a - A) is equal to the determinantof the matrix MBB (a - A) of a - A with respect to any baseB = ( X-.-1, , Xn) of X. Thus if MBB(a) = µ* *, then we get

µ1 1-A

1112 A13 . . . . . µ1n

1121 µ22-A µ23 . .. .

. 1A2n

det(a - A) =

llnl .............: /Ann-A

Page 255: Kam-Tim Leung Linear Algebra and Geometry  1974

244 VII EIGENVALUES

If we replace X in this expression by an indeterminate T, we obtain apolynomial

pa =

#11-T 1A12 A13 . . . . . 111n

A21 1122-T 112 3 . . . . . µ2n

Ant ............. '1Mn-T

in the indeterminate T with coefficients in A. This polynomial pathat depends only on the endomorphism a (and does not depend onthe base B of X) is called the characteristic polynomial of theendomorphism a.

It follows from the definition that the degree of the characteristicpolynomial pa is n, which is the dimension of the linear space X.The leading coefficient of pa is (-1)n and the constant term of pa isdet(a). Thus

pa = (-l)nTn + a.,t_1 Tn-1 + ... + al T + det(a).

The scalar (-1)n-lan_1 = (µl 1 + - -is called the trace of the

endomorphism and is denoted by tr(a). Except in the exercises, weshall not make use of the trace of an endomorphism in the sequel.

The most important property of the characteristic polynomial paof an endomorphism a is that any scalar X of A is an eigenvalueof a if and only if it is a root of the characteristic polynomial paof a, i.e. pa(X) = 0. Thus by means of determinants we havesuccessfully reduced the geometric problem of the existence of1-dimensional subspaces invariant under an endomorphism to analgebraic problem of the existence of roots of the characteristicpolynomial. This, to some extent, is the justification for introducingdeterminants in § 17.

Since polynomials with real coefficients do not always have realroots, endomorphisms of a real linear space need not have (real)eigenvalues. Consequently, we may not be able to decompose thereal linear space in question into a direct sum of 1-dimensionalinvariant subspaces. In the complex case, the situation is more pro-mising. By the so-called fundamental theorem of algebra, everypolynomial with complex coefficients has complex roots. But, as we

Page 256: Kam-Tim Leung Linear Algebra and Geometry  1974

§19 EIGENVALUES 245

shall see later, even for endomorphisms of a complex linear space, itis not always possible for us to obtain such a simple decomposition.

The characteristic polynomial pM of a square matrix M =p** overA of order n is similarly defined as the polynomial

1111-T 1112 1113 . . . . . Pin

PM =

1121 1122-T 1123 . . . . ' 92n

I pnI 11nn-T I

of A[T] . Eigenvalues of the matrix M are then the roots (in A) of pM.Therefore an endomorphism a of X and its matrix MBB(a) relative toany base B of X have the same eigenvalues.

D. Diagonalizable endomorphisms

Let X be an n-dimensional linear space over A. Then we call anendomorphism a of X a diagonalizable endomorphism or semi-simpleendomorphism if a has n linearly independent eigenvectors. Diag-onalizable endomorphisms are among the simplest of the endomor-phisms. Indeed if x1i ... , xn are n linearly independent eigen-vectors of a diagonalizable endomorphism a of X, then these eigen-vectors form a base of X such that a(x;) = );x; for i = 1, 2, ... , n.Therefore the matrix of a relative to this base is a diagonal matrix

where all terms outside of the diagonal are zero. Furthermore theendormorphism a is completely determined by its eigenvalues X1, X2 ,

Page 257: Kam-Tim Leung Linear Algebra and Geometry  1974

246 VII EIGENVALUES

, An ; by this we mean that a(x) = A 1 a 1 x 1 + + A,; a xn for+anxn.

Conversely if the matrix of an arbitrary endomorphism a of X is adiagonal matrix relative to some base (x 1, x2 , ... , xn) of X, thenthe vectors x1 are linearly independent eigenvectors of a. Hence a isdiagonalizable.

By means of the correspondence between endomorphisms andmatrices, we can define diagonalizable matrix in the following way.We say that a square matrix over A of order n is diagonalizable ifthere exists an invertible square matrix P over A of order n such thatthe matrix PMP-1 is a diagonal matrix. It follows from results of§14C that an endomorphism is diagonalizable if and only if itsmatrix relative to any base is diagonalizable.

Let us study one important case in which an endomorphism iscertain to be diagonalizable. We lead up to this case by proving thefollowing theorem.

THEOREM 19.6. I f x1 , x2i ... , XP are eigenvectors of an endomor-phism and the corresponding eigen values A1, A2 , ... , XP are distinct,then x1 , x2, ... , xp are linearly independent.

PROOF. For p = 1, the proposition is trivial since eigenvectors arenon-zero vectors. We assume the validity of the proposition for p - Ieigenvectors and proceed to prove for the case of p eigenvectors. Ifx1i X2, ... , xp are linearly dependent, then by induction assumpt-ion, we get

xp = a1 x1 + a2x2 + .. + ap-1 xp_1.

Applying a to both sides of this equation, we getApxp = a1 Al x1 + a2A2x2 + . + ap_1Ap_1xp_1.

Subtracting from this the first equation multiplied by Ap, we are ledto the relation:

0 = a1(A1- Ap)x1 + ... + ap-1 (Ap-1- Ap)xp-1Since all the eigenvalues are distinct, we have A; - XP * 0 fori = 1, ... , p-1. The linear independence of x1, ... , xp_1 thereforeimplies that a1 = a2 = = ap _1 = 0 and hence xp = 0. But thiscontradicts the fact that xp is an eigenvector of a and therefore theassumption that the eigenvectors x 1i X2, ... , XP are linearlydependent is false.

From the theorem above, we derive a sufficient condition for anendomorphism to be diagonalizable.

Page 258: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 19 EIGENVALUES 247

COROLLARY 19.7. Let X be an n-dimensional linear space over A anda an endomorphism of X. If the characteristic polynomial pa of ahas n distinct roots in A, then a is diagonalizable.

In the language of the theory of equations, we can rephraseCorollary 19.7 as follows. If the characteristic polynomial pa of anendomorphism a of a linear space X over A has n simple roots wheren is the dimension of X, then the endomorphism a is diagonalizable.

This, however, is not a necessary condition for a to be diagonal-izable. For instance, consider the identity endomorphism ix of X.The characteristic polynomial of ix is (1 - 7)"; thus the n roots ofthis polynomial are equal to 1. But ix is a diagonalizable endo-morphism.

Finally let us consider an example of a non-diagonalizable endo-morphism whose characteristic polynomial has a root of multi-plicity n.EXAMPLE 19.8. Let P be the n-dimensional real linear space of allpolynomials of R[T1 of degree <n-1 and let D be the differentialoperator. Thus

D(c,,_1T"-`+...+a,T+a0) = (n-1)ai_1Tn,+ +2a2T+a,.Using the base (1, T, ... , T'-') of P we find the characteristicpolynomial to be

PD = (-1)n Tn.

0 is a root of PD with multiplicity n, therefore all eigenvalues of theendomorphism D are equal to 0. Hence a polynomial f of P is aneigenvector of the endomorphism D if and only if f is a non-zeroconstant polynomial. D is therefore non-diagonalizable since themaximal number of linearly independent eigenvectors of D is 1.

However relative to the base 1, T, 21 T 2, T3, ... , (n11 T" 'the matrix of the endomorphism D takes up the following form

1o

l OJ

where all terms are zero except those immediately below thediagonal which are all equal to 1.

Page 259: Kam-Tim Leung Linear Algebra and Geometry  1974

248 VII EIGENVALUES

E. Exercises

1. Find the characteristic polynomial and the eigenvalues of thefollowing matrices.

(a) Ia1 0 0

0 a2 0

(b)

- 0 0 ... an

a21 a22

J1

0

ant 2 ann j2. The following matrices are matrices of endomorphisms of

complex linear spaces. Find their eigenvalues and their cones-ponding eigenvectors.

(i) 3 4 (ii) ro a

5 2 L_a 0

(iii) 11 1 1 1 (iv) 5 6 -3

I1 1 -1 -1 -1 0 1

1 -1 1 -1 1 1 2 -1J1 -1 -1 1 ,

(v) (-0 0 11 (vi) CO 2 1l

0 1 0 -2 0 3

1 0 O j - 1 -3 0)

Page 260: Kam-Tim Leung Linear Algebra and Geometry  1974

§ 19 EIGENVALUES 249

(vii) r 1 i (viii) 1 i

i -2 -i 1

(ix) 11 2i

L0 2

3. Show that the matrix

cos9 -sinsing cos9

has no eigenvector in R2 if 9 is not a multiple of 7r.

4. Find eigenvectors of

cos9 sing

sing -cos9

in RZ.

5. Which of the following matrices are diagonalizable over C?

-'s e- el0 0 1 0 0 1 0 0 1

0 0 0 1 0 0 10 0 0

1 0 0 0 1 0, -1 0 OJ-loo -.0 1

What about over R?

6. Prove that all eigenvalues of a projection are 0 or 1.

7. Prove that all eigenvalues of an involution are +1 or-1.

8. Let X be an n-dimensional linear space and p an endomorphismof X. Show that if tp has n distinct eigenvalues, then p has 2"distinct invariant subspaces.

9. Let p be an endomorphism of a complex linear space. Supposegypm is the identity mapping for a positive integer m. Prove that pis diagonalizable.

Page 261: Kam-Tim Leung Linear Algebra and Geometry  1974

250 VII EIGENVALUES

10. Let p be an endomorphism of an n-dimensional complex linearspace and let Al , ... , An be the eigenvalues of p. Find theeigenvalues of

(i) p2,

(ii) ,p'' (if p is further assumed to be an automorphism),(iii) f(op) where f(T) is a polynomial.

11. Let p be an endomorphism of a linear space X. Prove that ifevery non-zero vector of X is an eigenvector of gyp, then %p is ascalar multiple of the identity mapping.

12. Let p and >y be endomorphisms of a complex linear space X.Suppose,poi = do gyp. Prove that

(i) if A is an eigenvalue of 1 then the eigensubspace XAconsisting of all vectors x of X such that px = Ax is aninvariant subspace of ,,

(ii) p and >V have at least one eigenvector in common,

(iii) there exists a base of X relative to which the matrices of pand , are both triangular matrices.

(iv) Generalize (iii) to a statement in which more than twoendomorphism are involved.

13. Let p be an endomorphism of a linear space X over A and letf e A[T] . Show that if A is an eigenvalue of gyp, then f(A) is aneigenvalue of f (gyp).

14. Let p and > i be endomorphisms of a linear space X. Prove thattpo 1 and 4iotp have the same characteristic polynomial.

§20. Jordan Form

In this section, we shall concern ourselves with endomorphisms ofcomplex linear spaces exclusively. Our problem is still the same as inthe previous § 19; that is finding a base relative to which the matrixof the endomorphism in question has the simplest form possible. Weformulate our result here as Theorem 20.12. The reason for therestriction to complex linear spaces is that we can make use of thefundamental theorem of algebra to ensure that an endomorphism ofan n-dimensional linear space has n (not necessarily all distinct)eigenvalues.

Page 262: Kam-Tim Leung Linear Algebra and Geometry  1974

§20 JORDAN FORM 251

A. Triangular form

A triangular matrix (or a matrix in triangular form) is a squarematrix A = a** of the form

all a12. . . ain

a22 . . . a2n

ann)

where all terms below the diagonal are zero, or of the form

Call

a21

l_ an i an!)

where all terms above the diagonal are zero.In the following theorem we show that the matrix of any endo-

morphism of a complex linear space can be "reduced" to a triangularmatrix with the eigenvalues forming the diagonal.

THEOREM 20.1. To every endomorphism a of a complex linear spaceX, there is a base (x1 , ... , xn) of X such that

a(x1)=a11x1a(x2) = a21x1 + a22x2

COO - a31 x1 + a32x2 + a33x3

a(xn)=an1x1 +an2X2 +an3x3+... +annxn-

Page 263: Kam-Tim Leung Linear Algebra and Geometry  1974

252 V11 EIGENVALUES

In other words, there is a base (x1, ... , xn), relative to which thematrix of a is in triangular form. Moreover the characteristic poly-nomial pa is given by

pa = (a11- 7)(a22-T) ... (ann- T).

PROOF. We prove this theorem by induction on the dimension of X.For dim(X) = 1, the theorem is trivial. We assume the validity of thetheorem for any complex linear space of dimension < n-1 andproceed to prove the theorem in the case where dim(X) = n. Since weare dealing with complex linear spaces, we may apply the fundamen-tal theorem of algebra to the characteristic polynomial pa. Let allbe an eigenvalue of a and x1 a corresponding eigenvector, i.e.a(x 1) = a, 1 x 1. Denote by Y the 1-dimensional subspace generatedby x1 and by Z a complementary subspace of Y in X, i.e. X = Y ® Z.If we denote by 7r: X -> Z the projection of X onto its direct summandZ defined by

7r(x) = z for x = y + z where yEY, zEZ,

then an endomorphism p of Z is defined by

p(z) = Tr(a(z)) for all zEZ.

Since dim(Z) = n-1, the induction assumption is applicable to thecomplex linear space Z. Hence there exists a base (x2, x3, ... , xn )of Z such that

p(x2) = a22x2

p(x3) = a32x2 + a33x3

p(xn) =an2x2 + an3X3 + ... +annXn.

On the other hand it follows from the definition of the endomor-phism p of Z that for each vector z of Z

a(z) = ax, + p(z).

Therefore we get for the base (x1i x2, ... , xn) of X

a(x1)= a11x1o(x2) = a21 x1 + p(x2)

a(xn) = an 1 x 1 + p (xn ).

Page 264: Kam-Tim Leung Linear Algebra and Geometry  1974

§20 JORDAN FORM 253

This proves the first part of the theorem. The second part follows thenimmediately from the definition of the characteristic polynomial.

The matrix formulation of the above theorem is given in thefollowing corollary.

COROLLARY 20.2. For every complex square matrix A of order nthere is an invertible complex square matrix P of the same order suchthat the matrix PAP-' is in triangular form.

B. HAMILTON-CAYLEY theorem

As a first step towards the main result on JORDAN forms, we provethe famous HAMILTON-CAYLEY theorem on characteristic polynomials:

THEOREM 20.3. If PA is the characteristic polynomial of a squarematrix A over A (where A = R or A = C) of order n > 1, thenPA (A)=0.

PROOF. Let PA = 7nTn + + 7I T + yo. Then PA = det(B) whereB = A - TIn. In evaluating the determinant det(B), we can make useof the results of § 17F. Thus

det(B)II =where ad(B) is the adjoint matrix of B which is the transpose of thematrix of cofactors B11 of B. We recall that B,j is (-1)`+' times thedeterminant of the (n-1,n-l)-matrix obtained from B by deleting itsi-th row and j-th column. Therefore B.1 is a polynomial in T of degreen-l. Consequently ad(B) is an (n, n)-matrix whose entries are allpolynomials in T of degree n-1 and hence we can write ad(B) as apolynomial in T with matrix coefficients:

ad(B) = Bn-1 Tn-1 + ... + B,T + Bo

where the coefficients B; are (n, n)-matrices. It follows fromdet(B)II = B. ad(B) that

(7nI,)Tn + ... + (7iln)T + (7oln)_ (A-TIn)(Bn-,Tn-i + ... +B,T+B0)

= Bn-,Tn + (ABn-1- Bn-2) Tn-, + ... +(AB2 -B, )T2 + (AB, -B0)T + AB,.

Page 265: Kam-Tim Leung Linear Algebra and Geometry  1974

254 VII EIGENVALUES

Comparing coefficients, we obtain

7nln = - Bn-1

7n-IIn = AB,-, - Bn-2

711, = AB 1 - Bo7o I, = AB,

If we multiply the first equation by A', the second by An 1 , thethird by An-2, ... , the last by In and add up the resulting equationswe get

,ynAn + 7n-1An-1 + ... + 71A + 7oln = 0,

and hence pA(A) = 0. This completes the proof of the theorem.

The endomorphism formulation of 20.3 is given in the followingcorollary.

COROLLARY 20.4. If pa is the characteristic polynomial of anendomorphism a of a linear space X over A, then pa(a) = 0.

Besides the HAMILTON-CAYLEY theorem, we shall also make use ofthe following theorem on factorization of characteristic polynomials.

THEOREM 20.5. Let a be an endomorphism of a linear space X overA of dimension n > 1 and Ya subspace of X invariant under a. If a'is the endomorphism of Y defined by a, i.e. such that a'(y) = a(y)for all yeY, then the characteristic polynomial p., of a is divisible bythe characteristic polynomial pa, of a'.

a

-ROOF. Let B = (x1 , ... , x xr+1 , ... , x,) be a base of X suchthat B' = (x1i ... , Xr) is a base of Y. Then the matrix M of arelative to B has the block form

o1

EI

Page 266: Kam-Tim Leung Linear Algebra and Geometry  1974

§20 JORDAN FORM 255

where C is the matrix of a' relative to B', D is an (n-r, r)-matrix and E(n-r, n-r)-matrix. Consequently, if T is an indeterminate, then

M - TIn=C-TIr 0

D E - TIn- r

and

det(M - TIn) = det(C - TIr) det(E - TIn_r) .

Therefore pM = pCpE ; but pyr = pa and pc = po'. Hence pa isdivisible by pa-.

C. Canonical decompositionConsider an arbitrary endomorphism 0 of a complex linear space

X. By the fundamental theorem of algebra we can factorize thecharacteristic polynomial of 0 into linear factors; thus

P = (XI - T)ml (X2 - T)M2 ... (Xq - T)mq

where the eigenvalues X1, . . . , X. are all distinct. For eachj = 1, ... , q, we define Xi = Ker(( - Xjy" i. It follows from (¢ -), )mi_ (0 - Xi `/ 0 that Xi is a subspace invariant under 0.

Consider now each of these invariant subspaces Xi separately. OnXj 0 defines an endomorphism 0j (0j(x) = Ox for all x(=-Xj); in thisway the behaviour of 0 on vectors of Xi is manifested by that of 0j.We are now going to show that ci is the sum of a diagonalizable (orsemi-simple) endomorphism aj and a nilpotent endomorphism vi. Ingeneral an endomorphism v of a linear space is said to be nilpotent ifPs = 0 for some positive integers. We define aj and vi as follows

aj(x) = X jx and vi(x) = 0ix - Aix for all x E Xi .

Then aj = XiiX/ (or a, = X,) is clearly semi-simple and vi = Oi - Aiis nilpotent since Xi = Ker(0 - X1)mi and vim/ = 0. Thus 0i = ai + viis the sum of a semi-simple and a nilpotent endomorphism. We shall

Page 267: Kam-Tim Leung Linear Algebra and Geometry  1974

256 Vii EIGENVALUES

see in Theorem 20.11 that relative to a suitable base of Xi the matrixof the nilpotent endomorphism vi takes the simple form

(0.E2 ' .

EP

where the terms c, immediately below the diagonal are either 0 or Iand all other terms are 0. Consequently the matrix of Oj relative tosuch a base takes the form

rX,

Al =

E2

EP XJIn other words, if we "localize" 0 to the invariant subspace Xj, it canbe represented by the matrix Al which is in simple enough form.Clearly this result simplifies the study of 0 a great deal; however theimportance of this result is much enhanced if we can furthermoreshow that X = X 1 ® ® XQ . Becuase this will mean that a base ofX can be found relative to which the matrix of 0 takes the so-calledJORDAN form

J

Page 268: Kam-Tim Leung Linear Algebra and Geometry  1974

§20 JORDAN FORM 257

This leads us to prove the following theorem.

THEOREM 20.6. Let 0 be an endomorphism of a complex linearspace of dimension n > I and let

P0 = (A, - T)m' (A2 - T)m2 ... (Aq - T)mq

be the characteristic polynomial of 0, where the eigenvalues A.are all distinct from each other. Then for each i = 1, 2, . . . , q,X; = Ker(O-A,)mi is an m;-dimensional subspace of X invariant under0 and X is the direct sum of these invariant subspaces.

PROOF. For convenience of formulation, we introduce the followingabbreviations

f = (Xi - T)mi and gi = f1f2 ... ti ... fqfor i = 1, . , q, where fi under the symbol A is to be deleted. Ourproof will be given in three parts.Part I. X=X1 + .. +Xq.

Since the eigenvalues Ai are all distinct from each other, thegreatest common divisor of the polynomials g1 , ... , gq is the con-stant polynomial 1. Consequently, as we saw in § 18C, there arepolynomials h,E C[T] such that(1) g1h, + g2h2 + ... + gghq = 1.The 3q polynomials f , gi and h; determine 3q endomorphisms of X:

V i = fi(0), ti = g, (O) and 1h1(0) for i= 1, ... , q.From equation (1) we get

ti- J 1 + t2 ' J 2 + . . + tq o tq = iX ;

hence

(2) Im(

On the other hand we have

p0 = fig; f o r i = 1 , 2, ... , q;

and therefore, by the HAMILTON-CAYLEY theorem,

of-t; = f Mgi (0) = Po (0) = 0.

Consequently O,0t,0t, = 0 and C Ker 01 = X,. ThereforeX = X1 + X2 + + Xq follows from (2).

Page 269: Kam-Tim Leung Linear Algebra and Geometry  1974

258 V11 EIGENVALUES

Part II. dim(X,) < mi.Since both and ; are polynomials of the same endomorphism 0,

we get Oo Oi = >)/10O. Therefore X; = Ker f is a subspace of Xinvariant under 0. Let dim(X,) = r and consider the endomorphism Oiof X1 defined by 0, i.e. O;(x) = ¢(x) for all x(=-X1. It follows from20.1, that a base B of X; exists such that relative to B the matrix of /,is in triangular form

1

a21 a22

('arl. . . . . . . arr J

We contend that the terms akk on the diagonal are all equal to X,. Tosee this, we recall that X; = Ker(tp - A,)mi and hence O'; = (O; - X,)'"i= 0. Therefore the matrix of 4', relative to the base B is the zeromatrix. But the terms on the diagonal of this matrix are

(al 1 -Xr)ml, (a22 -Xj)"1t, ... , (arr - X1 )m1,therefore a1 I = a22 = ... arr = X,. Hence for the characteristicpolynomial of 0i we get

PP

On the other hand, we know from 20.6 that the characteristic poly-nomial pp of 0 is divisible by ppi . Therefore r < m; .

Part III. X=X1 e X2 a ... eXy anddim(X,) = m;.Since pp _ (X1 - T)mI (A2 - T)m2 ... (Xq - T)mq and deg(Po) = n,

we get

On the other hand the sum X = Xl + .. + Xq yields the inequalitydim(X1) + dim(X2) + . . + dim(Xq) > n.

Therefore it follows from dim(X,) < m; and dim(X1) > 0 thatdim(X1) = mi. Suppose x1, y, eX; are such that

x1 + ... + xq =Y1 + ... + yq.

Page 270: Kam-Tim Leung Linear Algebra and Geometry  1974

§20 JORDAN FORM 259

Then the vector x 1- y 1 = (Y2 _X2) + ... + (yq - xq) belongs toY= X1 n (x2 + ... + Xq). It follows from the dimension theorem thatdim X 1 + dim(X2 + ... + Xq) = dim X + dim Y. On the other handm2 + ... + mq > dim(X2 + ... + X ); therefore dim Y < 0 andhence x 1 = yl. Similarly we can prove that x2 = Y2, ... , xq = yq .Hence X = X1 ® 9 Xq. The proof is now complete.

The above theorem will be used in § 20D for the proof of theJORDAN theorem 20.12 which also needs other results on nilpotentendomorphisms. For the present we generalize the result discussedat the beginning of this section to the following theorem.

THEOREM 20.7. For each endomorphism 0 of a complex linearspace X there exist a semi-simple endomorphism a and a nilpotentendomorphism v of X such aov = voo and 0 = a + v; moreover aand v are uniquely determined by these conditions.

PROOF. Using the same notations as in the proof of 20.6, we definefor each x = x1 + ... + xq with xj eX

andax=X1x1+ ... + Xgxq

v x = (cbx 1 - X 1 x 1 ) + ... + (bxq - ?q xq ).

Then clearly a and v satisfy the conditions of the theorem.Suppose a' and v' are another pair that satisfies the conditions. Sincea' commutes with v', it commutes with 0; hence with (O, - X,)mi.Therefore a'Xj C X1. Since 0 - a' is nilpotent, its eigenvalues are allzero; therefore the eigenvalues of a' on Xi are the same as those of0, i.e. those of 0j. Since Oi has the unique eigenvalue Jy and a' is semi-simple, it follows that the restriction of a' to X. is given by scalarmultiplication by Aj. Therefore a' = a and hence V= - a' = - a = P.

The unique decomposition 0 = a + v satisfying the conditions of20.7 is called the canonical decomposition of 0. By theorem 20.7 thestudy of endomorphisms in general is now reduced to the study ofsemi-simple endomorphisms which we have done in § 19D and thestudy of nilpotent endomorphism which we shall carry out in § 20Dbelow.

D. Nilpotent endomorphismsWe recall that an endomorphism v of a linear space over A is said

to be nilpotent if vs = 0 for some positive integer s. The index ofnilpotence of a nilpotent endomorphism v is defined to the leastpositive integers such that Ps = 0.

Page 271: Kam-Tim Leung Linear Algebra and Geometry  1974

260 V11 EIGENVALUES

By Theorem 20.7, our problem is now reduced to finding a suit-able base of each X; so that relative to these bases the matrices of thenilpotent endomorphisms 0, - 'X, and hence also the matrices of theendomorphism O, themselves take up the simplest possible form. Thiswe shall do in theorem 20.11 for the proof of which the followingthree lemmas on nilpotent endomorphisms are needed.

LEMMA 20.8. Let v be a nilpotent endomorphism of an arbitrarylinear space X with index of nilpotence s and let K; = Ker vi fori=0,1,...,s,then

(i)(ii) v[K,] CK1 _1 fori=1,2, ...,s;and

(iii) Kt-1 is a proper subspace of K, for i = 1, ... , s.

PROOF. (i) follows immediately from the definition of K, where v° is,by the usual convention, the identity endomorphism ix of X.(ii) follows from the equation vi = pi-' o P.() Suppose Ki-I = Ki for a fixed index j = 1, ... , s. Then it

follows from (ii) that

v[X] CKs-1,v2[X] CKs-2 , ...,vsf[X] CKi,

hence vs i[X] C Ki-1 by our assumption. Applying vi-1 to both sidesof this inclusion, we get vs-' [X] = 0. Therefore Ps-1 = 0, con-tradicting the definition of index of nilpotence.

LEMMA 20.9. Using the same notations as in 20.8, let Y be asubspace of X such that y n Ki = O for some j = 1, 2, ... , s-1. Then(i) v[Y] n K,.1 = 0 and (ii) v induces an isomorphism of Y ontov[Y].

PROOF. (i) Let v(y) E v[Y] n Ki-1 for some y E Y. Then vi(y) _P/-' (v(y)) = 0 and hence y E Y n Ki. Since y n Ki = 0, we gety = 0. Therefore v(y) = 0.

(ii). It is sufficient to show that if v(y) = 0 for some yEY, theny = 0. Since y r= Ker P = K1, by 20.8(i) we get y e Ki. ThereforeyEYnKiandhencey=0.LEMMA 20.10. Using the same notations as in 20.8. there exist ssubspaces Y1, Y2, ... , Ys of X such that

(i) K;=Ki_1 9 Y, fori=l,2, ...,s;

Page 272: Kam-Tim Leung Linear Algebra and Geometry  1974

§20 JORDAN FORM 261

(ii) v induces an injective linear transformation of Y, intofor each i = 2, 3, ... , s; and

(iii)

Yi-i

PROOF. Let Ys be a subspace of Ks = X complementary to K,-,.Then Ks = Ks-1 ® Ys. Since Y n Ks_, = 0, by 20.9(i) we getv[YS] n Ks-2 = 0. Furthermore v[Ys] C Ks-1 by 20.8(ii). Hencethere exists a subspace Ys_, of Ks_, complementary to Ks_2 such thatv[Y5] C Ys-1. Thus Ks-1 = Ks-2 ® Ys-1 and v induces injective lineartransformation of Ys into Ys-1 by 20.9(u). Again we getYs_, n Ks_2 = 0 and v[Ys_,] C Ks_2, therefore we can proceed tofind subspaces Ys-2, Ys-a, , Y, = K, that satisfy condition (i)and (ii). Statement (iii) follows readily from (i) and (ii).

THEOREM 20.11. If v is a nilpotent endomorphism of an arbitrarylinear space of dimension n > 1, then there exists a base of X relativeto which the matrix of v is of the form

EZ

where the terms e, immediately below the diagonal are either 0 or 1and all other terms are 0.

PROOF. Let Y, , Y2, ... , Y. be s subspaces of X satisfying theconditions of Lemma 20.10. Let (y 1, ... , ya) be a base of Y..

By condition (ii) of 20.10, we can get a base of Ys_, that includesthe images of the base of Ys. Let

v(y1), . . .,V(Ya);za+1, ...,zb

be such a base of Ys_1. Similarly there is a base

v2(y1), ...,v2(ya);v(Za+1), ..., v(Zb);tb+1, ...,tc

Page 273: Kam-Tim Leung Linear Algebra and Geometry  1974

262 VII EIGENVALUES

of Ys-2 . Obviously we can continue this procedure until we find abase

VS-I(YI), ... , Ys-I(ya);vs-2(za+1), ... , vs-2(zb);vs-3 (tb+0, ... , VS-3 (to ); .... Ud

of YI = KI . (Observe. that the vectors of this base are taken into zeroby a since they belong to KI = Ker v.)By 20.20(iii), X is the directsum of Y1, Y2, .. , YS ; therefore the vectors

vs-1 (y1 ), ... , pS-1 (Ya); vs-2(za+ 1), . . . , vs-2(zb); VS-3 (tb+I ), ... ;

vS-2 (Yr), ... , vs-2(Ya); vs-3(za+1), ... , vS-3(zb); .......... ;...........................................

v2(Y1)....., v2(ya); v(Za+I)....... v(zb);tb+1v(Y1)....... v(Ya); za+1,......., zb;

Y1, ..... , ya.

.,tc;

form a base of X. Rearranging these vectors column by column fromthe left into P'-1(Y1), . . . , v(YI), YI; vs-1 (Y2),

, v(Y2), Y2;... ya; vs-2(za+1), ... , zb; ... Ud, we get a base (x1 , x2, ... , x,1)such that

either v(x1) = 0 or v(x1) = x;-1.

Therefore the matrix of v relative to this base has the required form.

E. JORDAN theorem

We have encountered diagonal matrices and triangular matrices inour previous endeavour; lying between them in the order of"simplicity" are the reduced matrices or the elementary JORDANmatrices. A square matrix of the form.

l AJ

Page 274: Kam-Tim Leung Linear Algebra and Geometry  1974

§20 JORDAN FORM 263

where all terms on the diagonal are equal to a fixed scalar X, all termsimmediately below the diagonal are equal to 1 and all other termsare 0 is called a reduced matrix or an elementary JORDAN matrix witheigenvalue X. For example the following three matrices

Ia 01 Ix 0 01

1 X 0t

xL0 1 XJ

are reduced matrices of orders 1, 2 and 3 respectively. In particular,the zero square matrix of order 1

(0)

is also a reduced matrix.We are now in the position to formulate the main result on

endomorphisms of finite-dimensional complex linear spaces.

THEOREM 20.12. Let 0 be an endomorphism of a complex linear spaceX of dimension n > 1. There exists a base of X relative to which thematrix of 0 is of the JORDAN form

(Al

AJ

where each A. is a reduced complex matrix and all other terms are 0.

PROOF. Since the characteristic polynomial po of 0 belongs to C[T],we may suppose

po = (XI - T)" ()2 - T )m2 ... (Xq - T)mq

where the eigenvalues Xi are all distinct from each other. According toTheorem 20.6, X is the direct sum of the subspaces Xi = Ker(¢ - I\i)'ifor j = 1, 2, . . . , q, each of which is invariant under 0. If foreach X. a base B. can be found relative to which the matrix of theendomorphism 0,- of Xi induced by 0 is of the JORDAN form, then thebases of the subspaces Xi constitutes a base of X relative to which

Page 275: Kam-Tim Leung Linear Algebra and Geometry  1974

264 VII EIGENVALUES

the matrix of 0 is of the JORDAN form. Therefore we need onlyexamine the endomorphisms 0i of Xi. The endomorphism

vi = 0i - Ai

of Xi is nilpotent, therefore by Theorem 20.11 a base Bi of Xi can befound relative to which the matrix of vi is of the form

(0

E2

Er Jwhere Ei is either 0 or 1. Therefore relative to the same Bi the matrixof ¢i is of the form

rXi

E2

which is of the required form. For example, if this matrix is

1 0 0

l oil

10L --1 ---

Page 276: Kam-Tim Leung Linear Algebra and Geometry  1974

§20 JORDAN FORM 265

then we can write this as

0 0 0

0 A2 0 0

0 0 A3 0

L 0 0 0 A4)

where

X/ 0 Aj 0 0

0A, = ; A2 = A3 = (X,); A4 = 1 01

0 1 Xj

This proves Theorem 20.12.

Finally we note that in Example 19.8, the matrix ofthe differential endomorphism D of P, relative to the base(1, T, 2 . T 2 , ... , (n 11) i T'-') is in JORDAN form.

F. Exercises

1. Find the Jordan form of the following matrics.

(i) 1 2 0 (ii ) 3 0 8

0 2 0 3 - 1 6

-2 - 2 -1 -2 0 -5

(iii) 3 7 -3 (iv ) 1 1 -1

1-2 - 5 2 -3 - 3 3

-4 - 1 0 3 -2 - 2 2

(v) 0 3 3 (vi ) 3 1 0 0

1 8 6 -4 - 1 0 0

2 - 1 4 -10 7 1 2 1

I-7 - 6 -1 0

Page 277: Kam-Tim Leung Linear Algebra and Geometry  1974

266 VII EIGENVALUES

(vii) 1 2 3 4 (viii) 1 -3 0 3

10 1 2 3

1-2 6 0 13

0 0 1 2 0 -3 1 3

0 0 0 1 -1 2 0 8

(ix) ( 0 i

0

where all the unspecifies terms are zero.

2. Prove that a nilpotent linear transformation of a finite dimensionalvector space has trace zero.

3. Let gyp, be endomorphisms of a linear space X. Prove thatix + po41 - > Io p is not nilpotent.

4. Let p be an endomorphism of a complex n-dimensional linearspace X. Suppose the matrix of p relative to a base B =(x I, x2 , ... , x,) is in elementary JORDAN form. Prove that(a) X is the only invariant subspace of 'p that contains x ;(b) all invariant subspaces of 'p contain x, ;(c) X is not a direct sum of two non-trivial invariant subspaces.

Page 278: Kam-Tim Leung Linear Algebra and Geometry  1974

CHAPTER VIII INNER PRODUCT SPACES

We began in Chapter I by considering certain properties of vectorsin the ordinary plane. Then we used the set V2 of all such vectorstogether with the usual addition and multiplication as a prototypelinear space to define general linear spaces. So far we have entirelyneglected the metric aspect of the linear space V2 ; this means thatwe have only studied the qualitative concept of linearity and haveomitted from consideration the quantitative concepts of length andangle of a linear space. Now we shall fill this gap in the presentchapter. We use the 2-dimensional arithmetical real linear space R2 asa realization of V2 and consider the bilinear form on R2 (in Example16.3) defined by the inner product.

(xly)=a1131 + a2132 wherex=(a1,a2) andy=(81,fs2)

Then according to the usual distance formula of plane analyticgeometry, the distance between x and y is given by the positive root

(a1 - R1 )2 + (a2 - R2 )2 . Therefore it can be expressed in terms ofthe inner product as

11X-Y11 = x-yx-y.It turns out that the cosine of the angle between x and y can also

be expressed in terms of the inner product as suggested in Example16.4.

A

Fri

11X11 a2

a1, a2)

Fig 27 Fig 28

267

Page 279: Kam-Tim Leung Linear Algebra and Geometry  1974

268 VIII INNER PRODUCT SPACES

Let 0 (respectively w) be the angle between x (respectively y) andthe first coordinate-axis. Then we get for the sine and cosine of theseangles the following expressions:

cos 0 =

cos w

sin 0 =

sin w=

Hence the cosine of the angle w - 0 between x and y is given by

cos(w - 0) = cos w cos 0 + sin w sin 0 = al R1+a,292

_ (x 1Y)

Ilxll IIYII Iixll Ilvll

We shall now use R2 with its inner product as a model and enter in-to detailed study of linear spaces with inner product. The real and thecomplex cases are now treated separately. In §21 and §22 we studyeuclidean space where the underlying linear space is a real linearspace. In §23 we study the unitary space whose underlying linearspace is a complex linear space.

§21. Euclidean Spaces

A. Inner product and normLet X be a real linear space and 4): X 2 -* R a bilinear form. We say

that (i) (P is symmetric if 4) (x, y) = 4) (y, x) for all vectors x andy of X and that (ii) 4) is positive definite if 4)(x, x) > 0 for allnon-zero vectors x of X. The bilinear forms of Examples 16.3 and16.4 are both symmetric and positive definite.

A euclidean space is an ordered pair (X, (D) consisting of a finite-dimensional real linear space X and a positive definite symmetricbilinear form 4) of X. When there is no danger of confusion about thebilinear form 4), we shall simply denote the euclidean space (X, (P)

by X. Vectors of the euclidean space X are then vectors of theunderlying linear space X. For 4)(x, y) we write (xly) and call it theinner product of the vectors x and y of the euclidean space X.

Page 280: Kam-Tim Leung Linear Algebra and Geometry  1974

§21 EUCLIDEAN SPACES 269

Under these abbreviated notations, a euclidean space is a finite-dimensional real linear space X together with a real valued functionthat assigns to each pair x and y of vectors of X a real number (x1 y)such that the following axioms are verified for all vectors x, y and zof X and all real numbers A:

[El] (xly) = (ylx);[E2] (x+y Iz) = (xIz) + (yIz);[E3] (Axly) = A(xly); and[E4] (xlx) > 0 ifx*0.

It follows that if X is a euclidean space and Y is a subspace of thelinear space X, then Y is also a euclidean space with the innerproduct defined in the obvious way.

For any vector x of a euclidean space X, the non-negative squareroot (x x) is called the norm or length of the vector x and is de-noted by llxll in the sequel. It follows from axiom [E4] that, forany vector x of a euclidean space X, IIxit = 0 if and only if x = 0.In other words 0 is the only vector with norm equal to 0. Moreoverwe can verify that for any two vectors x and y of a euclidean space Xand any real number A the following equations hold:

(a) IlXxll = IAI Ilxll(b) IIx + y112 = IIXII2 + I1y112 + 2(xly)(c) Ilx + yll2 + llx -y112 = 2(11x112 + llyll2)

In (a) the expression IA I is the absolute value of the real number A.Equation (b) is called the cosine law from which it follows that theinner product of two vectors can be expressed in terms of theirnorms. Equation (c) is called the parallelogram identity.

Before we study euclidean space in more detail let us pause here toconsider some examples.

ExAMPLE 21.1. We consider the real linear space V2 of all vectorson the ordinary plane with common initial point 0. We definedearlier in Example 16.4 for any two vectors x = (0, P) andy = (0, Q) of V2 their inner product as (x ly) = p q cos 0 wherep and q are the lengths of the fine segments OP and OQ and 0 = 4POQ.This inner product obviously verifies the axioms [El ] to [E4]. Thusthe real linear space V2 endowed with this inner product is aeuclidean space.

Page 281: Kam-Tim Leung Linear Algebra and Geometry  1974

270 VIII INNER PRODUCT SPACES

EXAMPLE 21.2. In the n-dimensional real arithmetical linear spaceR" , we define for vectors x = (a1, ... , an) and y On)their inner product as

(x 1y) = Q1 91 + ... + anon

which satisfies the axioms [E1 ] to [E4] . R" endowed with this innerproduct is a euclidean space.

EXAMPLE 21.3. If X is a euclidean space and (x1i ... , xn) is a baseof the linear space X, then we obtain a square matrix M = p** oforder n whose terms are

pij = (xi I xj) i, 1 = 1, ... , n.It follows from [E1 ] that the matrix is symmetric, i.e. M =MI. Forvectors x =a, x, + ... +anxn andy =131x1 + ... +Onxn ofXweget from [El ], [E2] and [E3] that

(xly) = 1pijaiIji,j

Furthermore it follows from [E4] that the matrix M is positive de-finite, i.e. E pi jaiaj > 0 for any non-zero n-tuple (a1, . . . , an ). There-fore relative to a base (x1, . . . , xn) the inner product can be ex-pressed by means of a symmetric, positive definite matrix.

Conversely if X is a real linear space, (x 1i . . . , x,) a base of Xand M = p** a symmetric positive definite matrix, then we candefine an inner product by

(x ly) =

where x = uix1 + + anxn and y = QIx1 + + Qnxn . In this wayX becomes a euclidean space.

In particular, if X1, 12 , ... , Xn are positive real numbers, then Xendowed with the inner product defined by

(x 1y) = X1a1(31 + ... + XnanQn

where x = aix1 + + anxn and y = 81x1 + + Rnxn is aeuclidean space.

EXAMPLE 21.4. Consider the four-dimensional real arithmeticallinear space R4 . A symmetric bilinear form, called LORENTZ'S form, isdefined by

(xly) = a1/31 + a2(32 + a3f33 -c2a4R4

Page 282: Kam-Tim Leung Linear Algebra and Geometry  1974

§21 EUCLIDEAN SPACES 271

where x = (a1, a2, a3, a4) and y = 01 , ( 3 2 , ( 3 3 , Q4) are vectors of R4and c is a constant (in applications to special relativity, c is thevelocity of light). Lorentz's form fails to be positive definite. We saythat the real linear space R4 together with LORENTZ'S form constitutea pseudo-euclidean space, which is very useful in the theory of re-lativity.

EXAMPLES 21.5. (a) In the infinite-dimensional linear space F of allreal-valued continuous functions defined on the closed interval[a, b I, where a < b, we define an inner product by:

b

(f 19) =J

f(t)g(t)dt.I,It is not difficult to verify that the axioms [E1 I - [E4] hold for theinner product (f Ig).

(b) Consider the set H of all sequences x = (xi)i= 1, 2, , of realnumbers such that the series.

Ix1I + Ix21 + Ix31 + ...

is convergent. For x = (x1)t = 1, 2, . . . and y = (yi)i=1,2, , , . of Hand for any real number A, we define

and

x + y = (xi + Y0i=1, 2, .. .

Ax = (Axi)i=1, 2, .. .

Then with respect to the addition and multiplication above, H is aninfinite-dimensional real linear space. Furthermore we define aninner product by

(xiY) _n im0(x1Y1 + ... +

This inner product satisfies the axioms [E 1 ] - [E4].However both F and H fail to be euclidean spaces since they are

not finite-dimensional. They are examples of Hilbert spaces, whichare studied in functional analysis and have many important applica-tions in the theory of quantum mechanics.

B. Orthogonality

The most important and interesting relation between vectors of aeuclidean space is orthogonality, by virtue of which we can express

Page 283: Kam-Tim Leung Linear Algebra and Geometry  1974

272 VIII INNER PRODUCT SPACES

the metric properties of the euclidean space in a most convenientform. Two vectors x and y of a euclidean space X are said to beorthogonal or perpendicular (notation: x 1 y) if (xly) = 0. It followsfrom the positive definiteness of the inner product that the zerovector 0 of X is the only vector of X that is orthogonal to itself.

Consider now a family (x1, . . . , xp) of vectors of a euclidean spaceX. If .(x, lxi) = 0 for all i 0 j, i.e. if the vectors of the family are pair-wise orthogonal, then we say that the family (x1 , ... , xp) is anorthogonal family. It follows from the cosine law that a generalizedPythagoras theorem:

IIx1 + ... + XP 112 = 11x1112 + ... + II xp 112

holds for an orthogonal family (x1 , ... , xp ).

If the vectors of an orthogonal family F = (x1 , ... ,xp) are allnon-zero vectors, then the family F is linearly independent. For ifX1xI + ... + Apxp = 0, then

0 = (X1xl + ... + Apxp lxi) = X1(xl 1x1) + ... +ap(xplxi)

= X.(xiIx,)

for all i = 1, 2, ... , p. Since x{ * 0, 0 0. Hence X. = 0 for alli = 1, 2, ... , p, proving the linear independence of the family F.Replacing each vector x; of the family by y, = xi/l1xill, we get anorthogonal family F' _ (y I, ... , yn) of vectors of unit length,i.e. 11y; ll = 1 for all i = 1, 2, ... , p. For this family the equation(yi l y1) = S,j holds for all i and j.

In general we say that a family (y1, yp) of vectors of aeuclidean space X is an orthonormal family if (y,Iy1) = S,, for alli, j = 1, . . . , p. It follows that orthonormal families are linearlyindependent. If an orthonormal family (y1, ... , yn) is a base of theeuclidean space X, then we call it an orthonormal base of X. Theadvantage of using an orthonormal base (y1, ... , yn) of X as a baseof reference is self-evident since for vectors x = aI y I + ' + an y,and y = Ply, + + Rnyn of X we get

and(xly) = a, A, + ... +anjn

IIxU= ale+...+an2Let us now show that in every euclidean space there are always

"enough" orthogonal vectors to operate within comfort.

Page 284: Kam-Tim Leung Linear Algebra and Geometry  1974

§21 EUCLIDEAN SPACES 273

THEOREM 21.6. In every euclidean space there exists an orthonormalbase.

PROOF. We prove this theorem by a constructive method known asthe GRAM-SCHMIDT orthogonalization method. Since the underlyingreal linear space of a euclidean space X is a finite-dimensional linearspace, we can assume the existence of a base (z1 i . . . ) z,) of X andproceed to find vectors y1 , ... , y of X successively such that theyform an orthogonal base of X. To begin the construction we put

y1 =z1.

Then y1 : 0. Next we want to find a real number A such that thevector

Y2 =22 +Xy1

is orthogonal to the vector yl. This amounts to solving for X in theequation

(z2IY1)+X(Y1IY1)=0.

Since y 1 = 0, we get X = -(z2 iy1)/(yl Iy1). Moreover y2 * 0, forotherwise z1 and z2 would be linearly dependent.

Fig 29

The third step of the construction is to find real numbers y and vsuch that the vector

Y3' Z3+µy1 +vy2

Page 285: Kam-Tim Leung Linear Algebra and Geometry  1974

274 VIII INNER PRODUCT SPACES

is orthogonal to both y, and Y2. This amounts to finding realnumbers p and v such that

(z3 [l'1) + µ(Y1 IY1) = 0and (z3IY2) + v(Y2IY2) = 0.

Since both y 1 and y2 are non-zero vectors, these equations can besolved and we get p = - (z31Y1)/(Y11Y1) and v = - (z3IY2)/(Y21Y2)Moreover the linear independence of vectors z1 , z2, and z3 impliesthat y3 * 0. Carrying out this process to the n-th step, we finally getan orthogonal family (y1, Y2, ... , of n non-zero vectors.Therefore the vectors

Yix, =

11y111i=1,2, ...,n

form an orthonormal base (x 1i ... , x,,) of X.

One of the nice features of the GRAM-SCHMIDT orthogonalizationmethod is that the vectors xi of the orthonormal base are con-structed successively from the first i vectors z1, z2, ... , zi of thegiven base. Consequently a strong version of the supplementationtheorem holds: if Y is a subspace of a euclidean space X, then everyorthonormal base (x1, ... , xp) of Y can be augmented to form anorthonormal base (x 1i ... , xp , xp+ 1 , ... , of X-

A closer inspection of the proof of the above theorem reveals that ifwe denote by Yi the subspace generated by the vectors z 1, z2, . . . , zi,then the construction of the i-th vector yi consists in finding anon-zero vector in Yi fl (Yi_ 1)1 where (Yi_ 1)1 is the set of all vectorsof X that are orthogonal to each vector of Y1_1.

In general we consider, for an arbitrary subset U of a euclideanspace X, the subset U1of all vectors of X that are orthogonal to eachvector of U, i.e. the set

U1= {xEX: (x(y) = 0 for all yEU} .

We verify without difficulty that U1 is a subspace of X, andmoreover if Y is the subspace of X generated by the subset U, thenY1 = U1. We call Y1= U1 the orthogonal complement in X of thesubspace Y or of the subset U. Corresponding to the resolutionof aforce into a sum of perpendicular components in elementarymachanics we have the following theorem which also justifies theterm orthogonal complement.

Page 286: Kam-Tim Leung Linear Algebra and Geometry  1974

§21 EUCLIDEAN SPACES 275

THEOREM 21.7. If Y is a subspace of a euclidean space X, thenX=Y®YI.

PROOF. Since for the case where Y = 0 the theorem is trivial, we cansuppose Y * 0. Any vector of the intersection y n Yl is, by definit-ion, orthogonal to itself; therefore it must be the zero vector of X.Hence Y fl Yl = 0. To show that X = Y + Yl, we first select anorthonormal base (yl, ... , yp) of Y. For an arbitrary vector x of X,let y = Xlyl + ... + Xpyp where A, = (xlyt) for i = 1, 2, ... ' p. Thenz = x - y is a vector of Yl since (z ly,) = 0 for all i = 1, 2, ... , p.Hence the vector x of X can be written as x = y + z where y and zare vectors of Y and Yl respectively.

For any vector x of X the unique vector y of the subspace Y ofX such that

x = y + z where yEY and

is called the orthogonal projection on Y of the vector x. ByPythagoras' theorem, we get

Ilxll = Ily112 + IIz112

and consequently we have proved the following corollary.

COROLLARY 21.8. Let X be a euclidean space and Y a subspace ofX. Then for each vector x of X and its orthogonal projection y onY BESSEL s inequality:

IIx11 3 Ilvll

holds. Moreover the equality sign holds if and only if x = y.

C. ScuwARz's inequality

In order to define the cosine of the angle between two vectors xand y of a euclidean space X by the expression

(xly)Ilxll hull

as we have done at the beginning of this chapter, we have first toverify that its absolute value is not greater than one. This we doin the next theorem

Page 287: Kam-Tim Leung Linear Algebra and Geometry  1974

276 Vill INNER PRODUCT SPACES

THEOREM 21.9. Let X be a euclidean space. Then for any vectors xand y of X, SCHWARZ'S inequality

I(xIv)I < IIxII IIYII

holds. Furthermore, the equality sign holds if and only if x and yare linearly dependent.

PROOF. If y = 0 then the theorem is trivially true. Suppose nowthat y is a non-zero vector. Then, by 21.7, we can write x = Xy + zwhere A = (xly)/Ilyll2 and z is orthogonal to y. By Bessel's inequality21.8, we get IIXYII < IIxII. Hence

I (xly)I/IIYII < 11X II

and SCHWARZ'S inequality follows. For the second statement of thetheorem, we observe that by 21.8 the equality I(xly)I = IIxII 11IIholds if and only if x = Xy i.e. if and only if x and y are linearlydependent vectors.

Let us now study some important consequences of SCHWARZ'Sinequality. For any non-zero vectors x and y of a euclidean spaceX the expression (x1Y)/IIxII 1111 is defined and satisfies the follow-ing inequalities:

(xly)-1 IIxII IIYII

< 1.

Therefore we can formulate the following definition.

DEFINITION 21.10 Let X be a euclidean space and x, y twonon-zero vectors df X. Then the cosine of the angle 0 (0 < 0 < 7r)between x and y is

cos 9 = (xly)IIxII IIYII

We notice that'xly if and only if the angle between x and y is it/2and that x and y are linearly dependent if and only if the anglebetween them is 0 or a. The cosine law of the inner product,

Ilx +y112 = IIxI12 + I1yll2 + 2(xIy),

can now be written as follows:

Ilx +y112 = 11x112 + Ilyll2 + 211x11 11Y 11 cos 0.

Page 288: Kam-Tim Leung Linear Algebra and Geometry  1974

§21 EUCLIDEAN SPACES 277

In the case where X is the euclidean space V2 or R2 , the abovedefinition of angle coincides with the usual one in elementarygeometry and the cosine law above coincides with the usual cosineformula of a triangle ABC in trigonometry:

c2 = a2 + b2 - 2ab cos C.

--------'.\

x+y

The difference in signs is due to using C = 7r - 0 instead of 0 inconsidering the angle between two vectors.

Another important geometric consequence of SCHWARZ'S inequalityis the well-known triangle inequality.

COROLLARY 21.11. Let X be a euclidean space. Then for anyvectors x and y of X, the triangle inequality

IIx +y11 < 11X 11 + Ily ll

holds. Furthermore, the equality sign holds if and only if x = 0 ory = 0 or x = Xy for some positive number X.

PROOF. It follows from the cosine law and Schwarz's inequality that

IIx +yl12 = IIxl12 + Ilyl12 + 2(xiy)

< 11x112 + I1y112 + 2lix11 Ilyii

= (11x11 + Ilyll)2.

Hence the triangle inequality follows. To discuss the equality sign,we may exclude the trivial case where x = 0 or y = 0. If IIx + y II =Ilxll + ilyll, then, by the cosine law, we get

11x112 + IIyI12 + 2(x1y) = IIXII2 + Ilyll2 + 21lxil Ilyll,

and hence(xly) = IIxllllyll.


By 21.9, the non-zero vectors x and y are linearly dependent, and hence x = λy for some real number λ. Therefore it yields

λ(y|y) = |λ| ‖y‖².

Hence λ = |λ|, i.e. λ is a positive real number. Conversely if x = λy for a positive real number λ, then

‖x + y‖ = ‖(λ + 1)y‖ = (λ + 1)‖y‖ = λ‖y‖ + ‖y‖ = ‖x‖ + ‖y‖.

The proof of the theorem is now complete.

D. Normed linear space

Lying between the concept of linear space and the concept of euclidean space in the order of generality, there is the concept of normed linear space. A normed linear space is an ordered pair (X, n) where X is a real (finite-dimensional) linear space and n: X → R is a mapping such that

(i) n(x) ≥ 0, and n(x) = 0 if and only if x = 0;
(ii) n(λx) = |λ| n(x);
(iii) n(x + y) ≤ n(x) + n(y).

The non-negative real number n(x) is called the norm of the vector x. If X is a euclidean space and if we define n(x) = ‖x‖, then (X, n) is a normed space. In other words, every euclidean space is a normed linear space. The converse of this is not necessarily true: given a normed linear space (X, n) there does not always exist a positive definite symmetric bilinear form (x|y) such that ‖x‖ = n(x). A necessary condition for (X, n) to be a euclidean space is that the norm function n satisfies the parallelogram identity:

n(x + y)² + n(x − y)² = 2(n(x)² + n(y)²).

It turns out that this condition is also sufficient. For the interested readers we refer to JOHN VON NEUMANN, Functional Operators, Chapter XII (Princeton University Press).
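The parallelogram identity gives a quick computational test of whether a given norm can come from an inner product; the sketch below verifies it for the euclidean norm on R³ and shows that it fails for the maximum norm (the vectors are arbitrary):

    import numpy as np

    x = np.array([1.0, 2.0, -1.0])
    y = np.array([3.0, 0.0, 1.0])

    def defect(n, x, y):
        # n(x+y)^2 + n(x-y)^2 - 2(n(x)^2 + n(y)^2); zero for a euclidean norm
        return n(x + y) ** 2 + n(x - y) ** 2 - 2 * (n(x) ** 2 + n(y) ** 2)

    assert np.isclose(defect(np.linalg.norm, x, y), 0)

    max_norm = lambda v: np.max(np.abs(v))    # a norm that is not euclidean
    assert not np.isclose(defect(max_norm, x, y), 0)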

E. Exercises

1. In R⁴ find a unit vector orthogonal to (1, 1, −1, 1), (1, 1), (2, 1, 1, 3).


2. Find an orthonormal base of the solution space of

3x₁ − x₂ − x₃ + x₄ − 2x₅ = 0
 x₁ + x₂ − x₃      + x₅ = 0

3. Let B = (x₁, …, xₙ) be an orthonormal base of a euclidean space X. Apply the GRAM-SCHMIDT orthogonalization process to the base (x₁, x₁ + x₂, x₁ + x₂ + x₃, …, x₁ + x₂ + ⋯ + xₙ) to get an orthonormal base C. Determine the matrix of the change from B to C.

4. Let M be a real (n, n)-matrix such that det M ≠ 0. Show that there exist unique real (n, n)-matrices Q and T such that

(i) M = QT;
(ii) the row vectors of Q form an orthonormal base of Rⁿ;
(iii) all terms of T below the diagonal are zero;
(iv) all diagonal terms of T are positive.
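Exercise 4 asks for what is nowadays called the QR-decomposition; a sketch of a numerical check using numpy (whose qr routine returns a factorization M = QT with Q orthogonal, up to the signs that are adjusted below to make the diagonal of T positive):

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.standard_normal((4, 4))          # almost surely det M != 0

    Q, T = np.linalg.qr(M)
    s = np.sign(np.diag(T))                  # flip signs so that diag(T) > 0,
    Q, T = Q * s, (T.T * s).T                # which makes the factors unique

    assert np.allclose(Q @ T, M)
    assert np.allclose(Q.T @ Q, np.eye(4))   # Q orthogonal
    assert np.allclose(T, np.triu(T))        # T has zeros below the diagonal
    assert np.all(np.diag(T) > 0)            # all diagonal terms of T positive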

5. Let Y be a subspace and x an arbitrary vector of a euclidean space X. Suppose (x₁, …, xₘ) is an orthonormal base of Y. Prove that the orthogonal projection of x is

(x|x₁)x₁ + ⋯ + (x|xₘ)xₘ.

6. Let e₁, …, eₙ be n vectors of a euclidean space X such that

(i) ‖eᵢ‖ = 1 for all i = 1, …, n, and
(ii) ‖eᵢ − eⱼ‖ = 1 for all i ≠ j.

Find the angle between eᵢ and eⱼ and the angle between s − eᵢ and s + eᵢ where

s = (e₁ + ⋯ + eₙ)/(n + 1).

Find a geometric interpretation of the vectors for n = 2 and n = 3.

7. In the linear space of all polynomials in T with real coefficients and of degree ≤ 4 an inner product is defined by

(f|g) = ∫ f(x)g(x) dx.

Find an orthonormal base.


8. Let x₁, …, xₘ be m vectors of an n-dimensional euclidean space and

Δ = | (x₁|x₁) (x₁|x₂) ⋯ (x₁|xₘ) |
    |   ⋮       ⋮          ⋮    |
    | (xₘ|x₁) (xₘ|x₂) ⋯ (xₘ|xₘ) |.

Prove that Δ ≠ 0 if and only if x₁, …, xₘ are linearly independent.

9. The distance d(x, U) from a vector x to a subset U of a euclidean space is given by d(x, U) = min {‖x − u‖ : u ∈ U}. Show that if U is the orthogonal complement of a unit vector e, then d(x, U) = |(x|e)|.

10. Let X be a euclidean space. For every φ ∈ End(X) define the norm ‖φ‖ of φ by max {‖φx‖ : ‖x‖ = 1}. Prove that ‖φ + ψ‖ ≤ ‖φ‖ + ‖ψ‖ and ‖φ∘ψ‖ ≤ ‖φ‖ ‖ψ‖.

11. Let X be a euclidean space and z a fixed non-zero vector of X. Determine the set of all vectors x of X such that x − z is orthogonal to x + z.

§ 22. Linear Transformations of Euclidean Spaces

A. The conjugate isomorphism

Let X be a euclidean space. For each vector x of X, we denote by aₓ the covector of the linear space X defined by

aₓ(y) = (y|x) for all y ∈ X.

A linear transformation a: X → X* of the linear space X into its dual space X* is then defined by

a(x) = aₓ for all x ∈ X.

By the positive definiteness of the inner product, the linear transformation a is injective. On the other hand, since X is a finite-dimensional real linear space, the dual space X* of X has the same dimension as X. Therefore a is also surjective and hence it is an isomorphism. Thus we have proved the following theorem.


THEOREM 22.1. Let X be a euclidean space and X* the dual space of the linear space X. Then the linear transformation a: X → X* such that

a(x) = aₓ and aₓ(y) = (y|x)

for all x, y of X is an isomorphism.

As a consequence of the theorem, an inner product in the dual space X* is defined by

(f|g) = (a⁻¹(g)|a⁻¹(f)) for all f, g ∈ X*,

and the dual space X* becomes a euclidean space itself. We call the euclidean space X* the dual space of the euclidean space X. The isomorphism a of Theorem 22.1 is called the conjugate isomorphism of the euclidean space X onto its dual space.

Another consequence of Theorem 22.1 is that we may regard, by means of the conjugate isomorphism a, the euclidean space X as 'self-dual'. It is easy to see that if (x₁, …, xₙ) is an orthonormal base of X and (f₁, …, fₙ) is the base of X* dual to (x₁, …, xₙ), then a(xᵢ) = fᵢ for i = 1, …, n. Hence any orthonormal base may be regarded as 'self-dual'. However we shall not press this point of view any further here. Finally we observe that the reversal of the order of f and g in the equation (f|g) = (a⁻¹(g)|a⁻¹(f)) is deliberate, so that the same formula can also be used for the complex unitary space (see §23B).
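In coordinates the conjugate isomorphism is entirely concrete: relative to the canonical inner product of Rⁿ, a(x) is the functional y ↦ (y|x). A minimal sketch (the helper name a below is chosen to match the text, not a library function):

    import numpy as np

    def a(x):
        # The covector a_x : y |-> (y|x) determined by the vector x.
        return lambda y: float(y @ x)

    x = np.array([1.0, 2.0, 3.0])
    f = a(x)

    # Relative to the canonical (orthonormal) base, the coordinates of the
    # covector a(x) coincide with those of x, illustrating 'self-duality'.
    e = np.eye(3)
    assert np.allclose([f(e[i]) for i in range(3)], x)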

REMARKS 22.2. In his famous treatise, The Principles of Quantum Mechanics, P.A.M. DIRAC uses the ket vector |y⟩ to denote a vector y of a euclidean space X and the bra vector ⟨x| to denote the covector aₓ of X. In this way, every ket vector determines uniquely a bra vector, and vice versa; a complete bracket ⟨x|y⟩ is then the inner product aₓ(y) = (x|y).

Analogously, we have a conjugate isomorphism a′: X* → X** of the euclidean space X* onto its dual. It is now interesting to compare the composite linear transformation a′∘a: X → X** of the conjugate isomorphisms with the functorial isomorphism tX: X → X** of Theorem 8.3 which is defined by

tX(x) = Fₓ and Fₓ(f) = f(x) for all x ∈ X and f ∈ X*.

We contend that a′∘a = tX. Before proving this let us observe that for any x ∈ X and f ∈ X*

f(x) = (x|a⁻¹(f)).


Now for any x ∈ X and any f ∈ X* we get

((a′∘a)(x))(f) = (a(x)|f) = (x|a⁻¹(f)) = f(x),

proving Fₓ = (a′∘a)(x) and hence tX = a′∘a.

To summarize: everything we say about a euclidean space X holds for the euclidean space X*; and X is also in a functorial conjugate isomorphic relation with its second dual space X**.

B. The adjoint transformation

The conjugate isomorphism a: X → X* of a euclidean space onto its dual space (or rather its inverse a⁻¹) enables us to interpret various properties of certain covectors of X as properties of certain vectors of X. For example the properties of the annihilator AN(Y) of a subspace Y of X given in Theorem 8.5 can be interpreted as properties of the orthogonal complement Y⊥ of Y.

Another application of the conjugate isomorphism leads us to the very important concept of the adjoint of a linear transformation. We recall that the dual transformation φ* of a linear transformation φ: X → Y is defined as a linear transformation φ*: Y* → X* such that

φ*(g) = g∘φ for all g ∈ Y*,

or diagrammatically

    X ----φ----> Y
      \         /
  φ*(g) \     / g
          v v
           R

Suppose now that both X and Y are euclidean spaces. If we denote by a: X → X* and β: Y → Y* the conjugate isomorphisms, then for each linear transformation φ: X → Y of linear spaces, we get a linear transformation φ̂ = a⁻¹∘φ*∘β, called the adjoint of the linear transformation φ.

    Y* ---φ*---> X*
    ^            ^
    |β           |a
    |            |
    Y ----φ̂----> X


Now for each pair of vectors x ∈ X and y ∈ Y, we have an inner product (x|φ̂(y)) of the euclidean space X and an inner product (φ(x)|y) of the euclidean space Y. We shall prove that the adjoint φ̂ of the linear transformation φ is characterized by the equality of these two inner products:

(φ(x)|y) = (x|φ̂(y)) for all x ∈ X and y ∈ Y.

Indeed

(x|a⁻¹∘φ*∘β(y)) = (x|a⁻¹(φ*(β(y)))) = φ*(β(y))(x) = (β(y)∘φ)(x) = β(y)(φ(x)) = (φ(x)|y).

If ψ₁ and ψ₂ are two linear transformations of Y into X such that (φ(x)|y) = (x|ψ₁(y)) and (φ(x)|y) = (x|ψ₂(y)), then (x|(ψ₁ − ψ₂)(y)) = 0 for all x ∈ X; therefore (ψ₁ − ψ₂)(y) = 0 for all y ∈ Y. Hence ψ₁ = ψ₂. We formulate our results in the following theorem:

THEOREM 22.3. Let X and Y be euclidean spaces and φ: X → Y a linear transformation of linear spaces. Then there exists a unique linear transformation φ̂: Y → X, called the adjoint of the linear transformation φ, such that

(φ(x)|y) = (x|φ̂(y))

for all vectors x of X and y of Y.
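Relative to orthonormal bases the adjoint is obtained by transposing the matrix of φ (Corollary 22.6 below makes this precise); the defining identity can be checked numerically with an arbitrary matrix standing in for φ:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 3))   # matrix of phi relative to an orthonormal base
    x = rng.standard_normal(3)
    y = rng.standard_normal(3)

    # (phi(x)|y) = (x|phi-hat(y)), with the adjoint represented by A transposed.
    assert np.isclose((A @ x) @ y, x @ (A.T @ y))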

The formation of the adjoint satisfies the following properties:

(a) îₓ = iₓ;
(b) (ψ∘φ)^ = φ̂∘ψ̂.

These follow from

(iₓ(x)|y) = (x|iₓ(y)) for x, y ∈ X,

and

(ψ∘φ(x)|z) = (ψ(φ(x))|z) = (φ(x)|ψ̂(z)) = (x|φ̂(ψ̂(z))) = (x|φ̂∘ψ̂(z)).

Let us now examine the formation of the adjoint in the language of categorical algebra. If we put (i) A(X) = X for every euclidean space X, and (ii) A(φ) = φ̂ for every linear transformation φ: X → Y, then we get a contravariant functor A of the category of all euclidean spaces (where morphisms are linear transformations and composition has the usual meaning) into itself. Moreover for all x ∈ X and y ∈ Y, we get

((φ̂)^(x)|y) = (y|(φ̂)^(x)) = (φ̂(y)|x) = (x|φ̂(y)) = (φ(x)|y).

Therefore

(c) (φ̂)^ = φ.

This means that the functor A is idempotent, i.e. A² is equal to the identity functor of the category of euclidean spaces.

Equally self-evident are the following equations:

(d) (λφ)^ = λφ̂; and
(e) (φ + ψ)^ = φ̂ + ψ̂.

Finally let us compare the functor A with the contravariant functor D of §8B defined by (i) D(X) = X* for every euclidean space X, and (ii) D(φ) = φ* for every linear transformation of euclidean spaces. A natural isomorphism u: A → D is defined by putting u(X): A(X) → D(X) to be the conjugate isomorphism a: X → X*. The requirement that u, as a natural transformation, has to satisfy is that the diagram

    A(Y) --u(Y)--> D(Y)
     |              |
    A(φ)          D(φ)
     v              v
    A(X) --u(X)--> D(X)

is commutative for every linear transformation φ: X → Y. But this follows from the definition of the adjoint A(φ) = φ̂ of φ:

    Y --β--> Y*
    |        |
    φ̂        φ*
    v        v
    X --a--> X*

No wonder that the theory of the adjoints runs so parallel to thatof the dual transformations!


C. Self-adjoint linear transformations

Since a euclidean space X is a finite-dimensional real linear space with an inner product, the endomorphisms of the linear space X can be classified according to certain properties relative to the inner product. A most important class of endomorphisms of X is that of the self-adjoint transformations; these are defined as follows:

DEFINITION 22.4. Let X be a euclidean space. Then an endomorphism φ of the linear space X is a self-adjoint (or symmetric) transformation of the euclidean space X if φ̂ = φ, or equivalently if (φ(x)|y) = (x|φ(y)) for all x, y ∈ X.

Let us now study the matrix of a self-adjoint transformation relative to an orthonormal base.

THEOREM 22.5. If φ is an endomorphism of a euclidean space X and (x₁, …, xₙ) is an orthonormal base of X, then the matrix M = (aᵢⱼ) of φ relative to this base is given by

aᵢⱼ = (φ(xᵢ)|xⱼ), i, j = 1, 2, …, n.

PROOF. It follows from the orthonormality of the base (x₁, …, xₙ) that if x = λ₁x₁ + ⋯ + λₙxₙ is a vector of X, then λᵢ = (x|xᵢ) for i = 1, …, n. Therefore we get

φ(xᵢ) = (φ(xᵢ)|x₁)x₁ + ⋯ + (φ(xᵢ)|xₙ)xₙ

for i = 1, 2, …, n and the theorem follows.

COROLLARY 22.6. Let X be a euclidean space, B = (x₁, …, xₙ) an orthonormal base of X. Then for any endomorphism φ of X and its adjoint φ̂, the following statements hold:

(a) MBB(φ̂) = MBB(φ)ᵗ;
(b) det φ̂ = det φ.

COROLLARY 22.7. Let X be a euclidean space, and φ an endomorphism of the linear space X. If φ is a self-adjoint transformation of the euclidean space X, then the matrix of φ relative to any orthonormal base of X is a symmetric matrix. Conversely, if the matrix of φ relative to some orthonormal base of X is a symmetric matrix, then φ is a self-adjoint transformation of the euclidean space X.


PROOF. The first statement of the corollary follows from Corollary 22.6. Let B = (x₁, …, xₙ) be an orthonormal base of X such that MBB(φ) = (aᵢⱼ) is a symmetric matrix. Then for any vectors x = λ₁x₁ + ⋯ + λₙxₙ and y = μ₁x₁ + ⋯ + μₙxₙ, we get

(φ(x)|y) = (Σᵢⱼ aᵢⱼλᵢxⱼ | Σₖ μₖxₖ) = Σᵢⱼ aᵢⱼλᵢμⱼ

and

(x|φ(y)) = (Σₖ λₖxₖ | Σᵢⱼ aᵢⱼμᵢxⱼ) = Σᵢⱼ aᵢⱼμᵢλⱼ.

Since aᵢⱼ = aⱼᵢ, (φ(x)|y) = (x|φ(y)). Hence φ is a self-adjoint transformation of the euclidean space X.

D. Eigenvalues of self-adjoint transformations

Our main result is that every self-adjoint transformation of a euclidean space is a diagonalizable endomorphism. Let us first prove the following important lemma on invariant subspaces of an endomorphism of a linear space.

LEMMA 22.8. Let X be a finite-dimensional real linear space and φ an endomorphism of X. Then there exists a 1-dimensional or a 2-dimensional subspace of X invariant under φ.

PROOF. Let (x₁, …, xₙ) be an arbitrary base of X and M = (aᵢⱼ) the matrix of φ relative to the base (x₁, …, xₙ). Then the existence of an eigenvector x of φ is equivalent to the existence of a non-zero n-tuple (μ₁, …, μₙ) of real numbers that is a solution of the following system of linear equations:

a₁₁μ₁ + ⋯ + aₙ₁μₙ = λμ₁
a₁₂μ₁ + ⋯ + aₙ₂μₙ = λμ₂
. . . . . . . . . . . . .
a₁ₙμ₁ + ⋯ + aₙₙμₙ = λμₙ

for some real number λ. But this is the case if and only if λ is a root of the characteristic polynomial pφ of φ. Now pφ is a polynomial with real coefficients; regarding it as a polynomial with complex coefficients and applying to it the fundamental theorem of algebra, we see that pφ always has a complex root λ. We may therefore consider two cases: (i) λ is real and (ii) λ is not real.


In the case (i), λ is an eigenvalue of φ. Therefore there exists a non-zero vector x of X such that φ(x) = λx and the 1-dimensional subspace generated by x is invariant under φ.

In the case (ii), we write λ = β + iγ where both β, γ are real and i² = −1, γ ≠ 0. Consider the linear transformation ψ: Cⁿ → Cⁿ such that relative to the canonical base of Cⁿ the matrix of ψ is equal to M. Then the linear transformations φ and ψ as well as the matrix M all have the same characteristic polynomial. Therefore, the above system of linear equations has a non-zero complex solution, say (μ₁ + iν₁, …, μₙ + iνₙ):

a₁₁(μ₁ + iν₁) + ⋯ + aₙ₁(μₙ + iνₙ) = (β + iγ)(μ₁ + iν₁)
a₁₂(μ₁ + iν₁) + ⋯ + aₙ₂(μₙ + iνₙ) = (β + iγ)(μ₂ + iν₂)
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
a₁ₙ(μ₁ + iν₁) + ⋯ + aₙₙ(μₙ + iνₙ) = (β + iγ)(μₙ + iνₙ).

Separating the real and imaginary parts, we get

a₁₁μ₁ + ⋯ + aₙ₁μₙ = βμ₁ − γν₁
a₁₂μ₁ + ⋯ + aₙ₂μₙ = βμ₂ − γν₂
. . . . . . . . . . . . . . . .
a₁ₙμ₁ + ⋯ + aₙₙμₙ = βμₙ − γνₙ

and

a₁₁ν₁ + ⋯ + aₙ₁νₙ = βν₁ + γμ₁
a₁₂ν₁ + ⋯ + aₙ₂νₙ = βν₂ + γμ₂
. . . . . . . . . . . . . . . .
a₁ₙν₁ + ⋯ + aₙₙνₙ = βνₙ + γμₙ

where all terms are real numbers. From these equations, it follows that the vectors

x = μ₁x₁ + ⋯ + μₙxₙ and y = ν₁x₁ + ⋯ + νₙxₙ

of the real linear space X satisfy the following equations

φ(x) = βx − γy and φ(y) = γx + βy.


Therefore the subspace Y generated by the vectors x and y is invariant under φ. Moreover since (μ₁ + iν₁, …, μₙ + iνₙ) is non-zero, the vectors x and y cannot be both zero. Therefore the invariant subspace Y has either the dimension 1 or 2.

For self-adjoint transformations of a euclidean space we are ableto obtain a better result.

COROLLARY 22.9. Let φ be a self-adjoint transformation of a euclidean space X. Then there always exists a 1-dimensional subspace of X invariant under φ.

PROOF. Obviously we need only show that the case (ii) of the proof of 22.8 is impossible under the additional assumption that φ is self-adjoint. If case (ii) presents itself then we get two vectors x and y of X which are not both zero together with two real numbers β and γ ≠ 0 such that

φ(x) = βx − γy and φ(y) = γx + βy.

From these equations it follows that

(φ(x)|y) = β(x|y) − γ(y|y) and (x|φ(y)) = γ(x|x) + β(x|y).

Since φ is self-adjoint, (φ(x)|y) = (x|φ(y)). Therefore

γ[(x|x) + (y|y)] = 0.

Hence x = 0 and y = 0, contradicting the assumption.

Starting out from the very promising corollary 22.9 we proceed toprove our main result:

THEOREM 22.10. For every self-adjoint transformation φ of a euclidean space X, there exists an orthonormal base of X consisting of eigenvectors of φ.

PROOF. We shall prove this theorem by induction on the dimension of X. For dim(X) = 1, the theorem is trivial. Assume now that the theorem holds for any n-dimensional euclidean space. Let dim(X) = n + 1 and φ: X → X be self-adjoint. Then by 22.9 there is an eigenvector x of φ with real eigenvalue λ. For any vector y of X orthogonal to x, the vector φ(y) is again orthogonal to x, for

(x|φ(y)) = (φ(x)|y) = (λx|y) = λ(x|y) = 0.


Consequently if Y is the orthogonal complement of the subspace generated by the eigenvector x, then the linear transformation φ′: Y → Y, such that φ′(y) = φ(y), is a self-adjoint transformation of the n-dimensional euclidean space Y. By the induction assumption, there is an orthonormal base (y₁, …, yₙ) of Y consisting of eigenvectors of φ′. Then (y₁, …, yₙ, yₙ₊₁), where yₙ₊₁ = x/‖x‖, is an orthonormal base of X consisting of eigenvectors of φ.

In terms of matrices, we can formulate Theorem 22.10 as: every real symmetric matrix is diagonalizable. A more precise result is given in Corollary 22.16.
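Numerically, the orthonormal eigenbase of a real symmetric matrix is delivered by standard routines; a minimal check (the symmetric matrix is an arbitrary example):

    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.standard_normal((4, 4))
    A = B + B.T                              # an arbitrary symmetric matrix

    lam, P = np.linalg.eigh(A)               # columns of P: orthonormal eigenvectors

    assert np.allclose(P.T @ P, np.eye(4))           # an orthonormal base
    assert np.allclose(A @ P, P * lam)               # consisting of eigenvectors
    assert np.allclose(P.T @ A @ P, np.diag(lam))    # diagonal form of A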

E. Bilinear forms on a euclidean space

The study of a bilinear form on a euclidean space is equivalent to the study of a pair of bilinear forms on a real linear space, one of which is further known to be symmetric and positive definite. Here the conjugate isomorphism a: X → X* again plays a crucial role as in the study of endomorphisms.

THEOREM 22.11. Let X be a euclidean space and let Ψ be a bilinear form on the linear space X. Then there exists a unique endomorphism ψ of X such that

Ψ(x, y) = (ψ(x)|y) for all x, y ∈ X.

PROOF. The uniqueness of ψ is an immediate consequence of the positive definiteness of the inner product. To prove the existence of ψ we consider for each x ∈ X the linear form Ψₓ: X → R defined by

Ψₓ(y) = Ψ(x, y) for all y ∈ X.

Denoting by a: X → X* the conjugate isomorphism of §22A (i.e. (a(x))(y) = (y|x) for all x, y ∈ X), we claim that the mapping ψ: X → X defined by

ψ(x) = a⁻¹(Ψₓ) for all x ∈ X

is an endomorphism of X which satisfies the requirement of the theorem. If x₁ and x₂ are two vectors of X, then it follows from

Ψ(x₁ + x₂, y) = Ψ(x₁, y) + Ψ(x₂, y)

that

Ψ_{x₁+x₂} = Ψ_{x₁} + Ψ_{x₂}.

Therefore ψ(x₁ + x₂) = a⁻¹(Ψ_{x₁+x₂}) = a⁻¹(Ψ_{x₁} + Ψ_{x₂}) = a⁻¹(Ψ_{x₁}) + a⁻¹(Ψ_{x₂}) = ψ(x₁) + ψ(x₂). Similarly ψ(λx) = λψ(x),


proving the linearity of ψ. For any two vectors x and y of X, we obtain

(ψx|y) = (a⁻¹(Ψₓ)|y) = Ψₓ(y) = Ψ(x, y).

Moreover (ψx|y) = (x|ψ̂y); therefore

Ψ(x, y) = (ψx|y) = (x|ψ̂y) for all x, y ∈ X.

This proves the theorem.

Conversely every endomorphism ψ of X determines a bilinear form Ψ on X by the relation

Ψ(x, y) = (ψx|y) for all x, y ∈ X.

Therefore there exists a one-to-one correspondence between the endomorphisms of X and the bilinear forms on X. If ψ corresponds to Ψ under this correspondence, then we say that ψ represents Ψ. By means of this correspondence, we can formulate properties of endomorphisms as properties of the bilinear forms that they represent. For example, a vector x of X is called an eigenvector of a bilinear form Ψ if it is an eigenvector of the endomorphism ψ which represents Ψ. More precisely, x is an eigenvector of Ψ with corresponding eigenvalue λ if and only if

Ψ(x, y) = λ(x|y) for all y ∈ X.

A further property of the correspondence is that every symmetric bilinear form is represented by a self-adjoint endomorphism and vice versa. Consequently the diagonalization theorem 22.10 gives rise to a bilinear form counterpart as follows:

THEOREM 22.12. For every symmetric bilinear form Ψ on a euclidean space X, there exists an orthonormal base B = (x₁, …, xₙ) of X and a family (λ₁, …, λₙ) of real numbers such that

Ψ(xᵢ, xⱼ) = λᵢ(xᵢ|xⱼ) = λᵢδᵢⱼ for all i, j = 1, …, n.

In other words, the matrix of Ψ relative to the base B is the diagonal matrix diag(λ₁, …, λₙ).


Hence if x = α₁x₁ + ⋯ + αₙxₙ and y = β₁x₁ + ⋯ + βₙxₙ are vectors of X, then

Ψ(x, y) = λ₁α₁β₁ + ⋯ + λₙαₙβₙ.

The diagonalization theorem 22.12 can also be paraphrased as follows. Let Φ and Ψ be two symmetric bilinear forms on a finite-dimensional real linear space X. If Φ is positive definite, then Φ and Ψ can be simultaneously diagonalized. More precisely, there exists a base B of X relative to which the matrix of Φ and the matrix of Ψ are respectively the diagonal matrices

diag(1, …, 1) and diag(λ₁, …, λₙ).

We leave the formulation of the corresponding theorem on quadratic forms to the reader.
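This simultaneous diagonalization is the 'generalized symmetric eigenproblem' of numerical linear algebra; a sketch assuming scipy is available (Φ is represented by an arbitrary positive definite matrix, Ψ by an arbitrary symmetric one):

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(4)
    C = rng.standard_normal((3, 3))
    Phi = C @ C.T + 3 * np.eye(3)            # symmetric positive definite
    S = rng.standard_normal((3, 3))
    Psi = S + S.T                            # symmetric

    lam, B = eigh(Psi, Phi)                  # columns of B: the simultaneous base

    assert np.allclose(B.T @ Phi @ B, np.eye(3))     # matrix of Phi: diag(1, ..., 1)
    assert np.allclose(B.T @ Psi @ B, np.diag(lam))  # matrix of Psi: diag(lambda_i)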

F. Isometry

There is another important class of endomorphisms of a euclidean space that consists of all endomorphisms which leave invariant the inner product of the euclidean space in question. These are characterized by the equivalent conditions in the following theorem.

THEOREM 22.13. Let X be a euclidean space and φ an endomorphism of X. Then the following statements are equivalent.

(i) ‖φ(x)‖ = ‖x‖ for all x ∈ X.
(ii) (φ(x)|φ(y)) = (x|y) for all x, y ∈ X.
(iii) φ is invertible and φ⁻¹ = φ̂.
(iv) For any orthonormal family (x₁, …, xₚ), where 1 ≤ p ≤ n, the family (φ(x₁), …, φ(xₚ)) is orthonormal.
(v) For some orthonormal base (x₁, …, xₙ) of X, the family (φ(x₁), …, φ(xₙ)) is an orthonormal base of X.

PROOF. (i) ⇒ (ii). This follows from the cosine law:

2(x|y) = ‖x + y‖² − ‖x‖² − ‖y‖².

(ii) ⇒ (iii). If φ(y) = 0, then (y|y) = (φ(y)|φ(y)) = 0 and therefore


y = 0. Thus φ is injective and hence invertible. For any x, y of X, the condition (ii) yields that

(x|y) = (φ(x)|φ(y)) = (x|φ̂∘φ(y)).

Therefore φ̂∘φ = iₓ and φ⁻¹ = φ̂.

(iii) ⇒ (iv). If (x₁, …, xₚ) is orthonormal, then

(φ(xᵢ)|φ(xⱼ)) = (xᵢ|φ̂∘φ(xⱼ)) = (xᵢ|xⱼ) = δᵢⱼ.

Therefore (φ(x₁), …, φ(xₚ)) is orthonormal.

(iv) ⇒ (v). This is obvious.

(v) ⇒ (i). Let (x₁, …, xₙ) be an orthonormal base of X for which (v) holds. Then for each x = λ₁x₁ + ⋯ + λₙxₙ, we get

‖x‖² = λ₁² + ⋯ + λₙ² = ‖φ(x)‖².

The proof of the theorem is complete.

We call an endomorphism φ of a euclidean space X an isometry or an orthogonal transformation of X if any one of the equivalent conditions of 22.13 is satisfied. It follows from (iii) of 22.13 that for any isometry φ of X

det(φ) = ±1.

An isometry φ is called a proper rotation if det φ = 1 and an improper rotation if det φ = −1.

EXAMPLE 22.14. Let X be a euclidean plane (i.e. a 2-dimensional euclidean space) and (x₁, x₂) an orthonormal base of X. Then the identity map iₓ of X is clearly a proper rotation of X and so also is the isometry −iₓ. The isometry φ defined by

φx₁ = x₂, φx₂ = x₁

has determinant −1 and is therefore an improper rotation. To perform the improper rotation φ "physically" one would have to lift up the plane, turn it over and then put it down again.

[Figure: the improper rotation φ carries the base (x₁, x₂) into (x₂, x₁), reversing orientation.]

Both iₓ and −iₓ, on the other hand, can be executed without lifting the plane.


The set of all orthogonal transformations of a euclidean space X together with composition forms a group O(X), called the orthogonal group of the euclidean space X, which is a subgroup of the general linear group GL(X) of all automorphisms of the linear space X. Similarly the proper rotations of X form a group SO(X), called the special orthogonal group of the euclidean space X, which is a subgroup of O(X). These groups play an important role in euclidean geometry.

A real square matrix A is called an orthogonal matrix if AᵗA = AAᵗ = I, i.e. if Aᵗ = A⁻¹. In other words the row vectors (column vectors) of A form an orthonormal base of the arithmetical euclidean space. Between orthogonal matrices and isometries we have the following relation.

THEOREM 22.15. Let φ be an endomorphism of a euclidean space X. If φ is an isometry, the matrix of φ relative to any orthonormal base of X is an orthogonal matrix. Conversely if relative to some orthonormal base of X the matrix of φ is an orthogonal matrix, then φ is an isometry of X.

Consequently the matrix of the change of one orthonormal base to another orthonormal base is an orthogonal matrix. Combining this with Theorem 22.10, we obtain:

COROLLARY 22.16. For any real symmetric square matrix A of order n, there is an orthogonal matrix P of the same order, such that the matrix PAP⁻¹ is a diagonal matrix.

EXAMPLE 22.17. The orthogonal matrices of order 2 have a rather simple form. Let A be an orthogonal matrix of order 2 with det A = 1. Then a simple calculation in trigonometry shows that A has the form

    [ cos θ   −sin θ ]
    [ sin θ    cos θ ]

for some real number θ. If φ is a proper rotation of a euclidean plane X and B is an orthonormal base of X, then the matrix of φ relative to B has the above form. We call θ the angle of rotation of φ, and it is easily seen that θ does not depend on the choice of the orthonormal base B. Clearly iₓ and −iₓ have as angles of rotation 0 and π respectively. It is easily verified that if a proper rotation φ leaves invariant a single non-zero vector of X, then φ = iₓ.
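A quick numerical companion to 22.15 and 22.17: the matrix below is orthogonal with determinant 1, and its angle of rotation can be read off again from its first column (θ = π/3 is an arbitrary choice):

    import numpy as np

    theta = np.pi / 3
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    assert np.allclose(A.T @ A, np.eye(2))                  # A is orthogonal
    assert np.isclose(np.linalg.det(A), 1.0)                # a proper rotation
    assert np.isclose(np.arctan2(A[1, 0], A[0, 0]), theta)  # the angle recovered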


EXAMPLE 22.18. Let Y be a 3-dimensional euclidean space and ρ a proper rotation of Y different from i_Y. Then the characteristic polynomial p_ρ is a polynomial of degree 3 with leading coefficient equal to −1 and constant term equal to 1. It follows from the elementary theory of equations that p_ρ has at least one positive real root λ; therefore ρ has eigenvectors. Let a be an eigenvector of ρ with eigenvalue λ. Then it follows from ‖a‖ = ‖ρa‖ that λ = ±1. Therefore λ = 1. If A is the 1-dimensional invariant subspace generated by a, then ρ leaves each vector of A fixed. We claim that ρx ≠ x for all x ∉ A. Suppose to the contrary that x ∉ A is such that ρx = x. Since ρ leaves fixed all vectors of the plane generated by a and x, we may assume without loss of generality that a ⊥ x. Applying the strong version of the supplementation theorem, we can find a vector y of Y such that (a, x, y) is an orthonormal base of Y. Since ρ is an orthogonal transformation, either ρy = y or ρy = −y. The first case is impossible since ρ ≠ i_Y; the second case is impossible since det ρ = 1. Therefore A consists of all vectors left fixed by ρ. A is called the axis of rotation of ρ. It follows from 22.13 (ii) that X = A⊥ is invariant under ρ. Denoting by φ the endomorphism of X defined by ρ, we see easily that φ is a proper rotation of X. Applying the result of 22.17, we see that the matrix of ρ relative to any orthonormal base (e₁, e₂, e₃) with e₃ ∈ A is of the form

    [ cos θ   −sin θ   0 ]
    [ sin θ    cos θ   0 ]
    [   0        0     1 ]

where θ is called the angle of rotation of ρ.

G. Exercises

1. Show that if φ is a self-adjoint linear transformation of a euclidean space X, then there exists a unique self-adjoint linear transformation ψ of X such that ψ³ = φ.

2. Let φ be a self-adjoint linear transformation of a euclidean space X. φ is said to be positive definite if (φx|x) > 0 for all non-zero x ∈ X. Prove that

(i) φ is positive definite if and only if all eigenvalues of φ are positive;
(ii) for every positive definite φ there is a unique positive definite ψ such that ψ² = φ.


3. Let φ be a mapping of a euclidean space X into itself. Prove that if (φx|φy) = (x|y) for all x, y ∈ X, then φ is an isometry.

4. Let a be a unit vector in a euclidean space X. Define φ: X → X by putting φ(x) = x − 2(a|x)a. Prove that

(i) φ is an isometry (such an isometry is called a reflexion);
(ii) det φ = −1.

5. Let φ be an isometry of an n-dimensional euclidean space X. Prove that if 1 is an eigenvalue of φ and if all eigenvectors with eigenvalue 1 form an (n−1)-dimensional subspace of X, then φ is either a reflexion or the identity.

6. Let a and b be two distinct unit vectors of a euclidean space X. Prove that there exists a reflexion φ such that φa = b.

7. Show that every isometry of a euclidean space is a product of reflexions.

8. A linear transformation φ of a euclidean space X is said to be skew self-adjoint if

(φx|y) = −(x|φy) for all x, y ∈ X.

Prove that

(i) φ is skew self-adjoint if and only if the matrix of φ relative to an orthonormal base of X is anti-symmetric;
(ii) if φ is skew self-adjoint and if Y is an invariant subspace of φ then Y⊥ is an invariant subspace of φ.

9. Prove that if λ is an eigenvalue of a skew self-adjoint linear transformation, then λ = 0.

10. Show that for every skew self-adjoint linear transformation φ of a euclidean space, there exists a base B such that MBB(φ) is block-diagonal, built up from 2×2 blocks of the form

    [ 0   −βᵢ ]
    [ βᵢ    0 ]

together with zeros on the remaining diagonal.


11. Give an example of two self-adjoint linear transformations whoseproduct is not self-adjoint.

12. Let φ and ψ be self-adjoint linear transformations. Prove that φ∘ψ + ψ∘φ is self-adjoint and φ∘ψ − ψ∘φ is skew. What happens if both φ and ψ are skew? What happens if one of φ and ψ is self-adjoint and the other one is skew?

13. For each pair of complex numbers α and β, define

(α|β) = Re(αβ̄).

(a) Show that RC is a 2-dimensional euclidean space with the inner product defined above.
(b) For each complex number γ, define φᵧ: RC → RC by φᵧ(α) = γα. Show that φᵧ is a linear transformation of the real linear space RC. Find the matrix of φᵧ relative to the base (1, i) of RC.
(c) For what values of γ is φᵧ an isometry?

14. Show that an isometry φ is self-adjoint if φ² is the identity mapping.

15. Let (e₁, …, eₙ) be an orthonormal base of eigenvectors of a self-adjoint linear transformation φ with corresponding eigenvalues λ₁, …, λₙ. Define for every real number a

φₐ = φ − a.

(a) Prove that if a is different from all eigenvalues λ₁, …, λₙ of φ, then

φₐ⁻¹(x) = Σᵢ (x|eᵢ)/(λᵢ − a) · eᵢ.

(b) Suppose a is an eigenvalue of φ. Prove that the equation

φₐx = y

is solvable if and only if y is orthogonal to the eigenspace Xₐ. Find a solution of the equation.


16. Bring the following quadratic forms by appropriate orthogonal transformations into diagonal form:

(a) x₁² + 2x₂² + 3x₃² − 4x₁x₂ − 4x₂x₃;
(b) x₁² − 2x₂² − 2x₃² − 4x₁x₂ + 4x₁x₃ + 8x₂x₃;
(c) 2x₁x₂ + 2x₃x₄;
(d) x₁² + x₂² + x₃² + x₄² − 2x₁x₂ + 6x₁x₃ − 4x₁x₄ − 4x₂x₃ + 6x₂x₄ − 2x₃x₄.

17. Find the eigenvalues of the quadratic form

φ(x) = 2 Σ_{i<j} ξᵢξⱼ.

§23. Unitary Spaces

In this present §23 we study unitary space, which is the complex counterpart of euclidean space. We shall use in the sequel the customary representation of complex numbers. The letter i is used exclusively to denote the imaginary unit; for any complex number λ we write λ = α + iβ where the real numbers α and β are the real and the imaginary parts of λ respectively. The complex conjugate of λ is the complex number λ̄ = α − iβ, and the modulus of λ is the non-negative real number |λ| = √(α² + β²).

A. Orthogonality

Let X be a complex linear space. A sesquilinear form on X is a mapping Φ: X² → C such that

Φ(x₁ + x₂, y) = Φ(x₁, y) + Φ(x₂, y)
Φ(λx, y) = λΦ(x, y)
Φ(x, y₁ + y₂) = Φ(x, y₁) + Φ(x, y₂)
Φ(x, λy) = λ̄Φ(x, y)

for any complex number λ and any vectors x, x₁, x₂, y, y₁, y₂ of X. In other words Φ is linear in its first argument and semi-linear (relative to the conjugate automorphism of C) in its second argument; one might say, Φ is one-and-a-half linear.


A sesquilinear form Φ on a complex linear space X is an hermitian form if

Φ(x, y) = \overline{Φ(y, x)} for all x, y ∈ X,

where the bar denotes complex conjugation. It follows that if Φ is an hermitian form, then Φ(x, x) is a real number for all vectors x of X. Hence we have the following definition: an hermitian form Φ on a complex linear space X is positive definite if

Φ(x, x) > 0 for all non-zero vectors x of X.

If Φ is a positive definite hermitian form on X, then the conjugate Φ̄ of Φ defined by

Φ̄(x, y) = \overline{Φ(x, y)} for all x, y ∈ X

is again a positive definite hermitian form on X.

EXAMPLE 23.1. Let X be a complex linear space of finite dimension n ≥ 1 and (x₁, …, xₙ) a base of X. If Φ is a sesquilinear form on X, then for any vectors x = λ₁x₁ + ⋯ + λₙxₙ and y = μ₁x₁ + ⋯ + μₙxₙ of X we get

Φ(x, y) = Σᵢⱼ λᵢμ̄ⱼΦ(xᵢ, xⱼ).

Therefore the sesquilinear form Φ is completely determined by the complex square matrix A = (aᵢⱼ) of order n whose terms are

aᵢⱼ = Φ(xᵢ, xⱼ), i, j = 1, 2, …, n.

Conversely every complex square matrix A = (aᵢⱼ) of order n determines uniquely a sesquilinear form Φ relative to the base (x₁, …, xₙ) by

Φ(x, y) = Σᵢⱼ λᵢaᵢⱼμ̄ⱼ where x = Σᵢ λᵢxᵢ and y = Σⱼ μⱼxⱼ.

Under this one-to-one correspondence between matrices and sesquilinear forms, the hermitian forms correspond to the hermitian matrices A = (aᵢⱼ), i.e. matrices whose terms satisfy the equations

aᵢⱼ = āⱼᵢ for i, j = 1, …, n.
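In matrix terms the correspondence of Example 23.1 reads Φ(x, y) = Σᵢⱼ λᵢaᵢⱼμ̄ⱼ; a short numerical check that a hermitian matrix indeed yields an hermitian form (the matrix and vectors are arbitrary illustrative data):

    import numpy as np

    A = np.array([[2.0, 1 - 1j],
                  [1 + 1j, 3.0]])       # hermitian: A equals its conjugate transpose
    assert np.allclose(A, A.conj().T)

    def Phi(x, y):
        # Phi(x, y) = sum_ij x_i a_ij conj(y_j): linear in x, semi-linear in y
        return x @ A @ y.conj()

    x = np.array([1 + 2j, -1j])
    y = np.array([3 + 0j, 1 - 1j])

    assert np.isclose(Phi(x, y), np.conj(Phi(y, x)))           # hermitian symmetry
    assert abs(Phi(x, x).imag) < 1e-12 and Phi(x, x).real > 0  # this A is positive definite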

A unitary space is an ordered pair (X, Φ) that consists of a finite-dimensional complex linear space X and a positive definite hermitian form Φ. When there is no danger of confusion about the hermitian form Φ in question, we shall simply denote the unitary space (X, Φ) by X. Furthermore for any vectors x and y of X we write (x|y) for Φ(x, y) and call it the inner product in the unitary space X of the vectors x and y.

Thus a unitary space is a finite-dimensional complex linear space X together with a complex-valued mapping that assigns to each pair x, y of vectors of X a complex number (x|y), called their inner product, such that for each complex number λ and all vectors x, y and z of X the following conditions are satisfied:

[U1] (x|y) = \overline{(y|x)};
[U2] (x + y|z) = (x|z) + (y|z);
[U3] (λx|y) = λ(x|y);
[U4] (x|x) > 0 if x ≠ 0.

It follows that if (X, Φ) is a unitary space, then (X, Φ̄), where Φ̄ is the conjugate of Φ, is a unitary space called the conjugate unitary space of the unitary space (X, Φ). The conjugate of the conjugate of a unitary space (X, Φ) is identical with (X, Φ) itself.

EXAMPLE 23.2. Let Cⁿ be the complex arithmetical linear space. An inner product in Cⁿ is defined by

(x|y) = α₁β̄₁ + ⋯ + αₙβ̄ₙ

where x = (α₁, …, αₙ) and y = (β₁, …, βₙ). Another inner product in Cⁿ is defined by

(x|y) = λ₁α₁β̄₁ + ⋯ + λₙαₙβ̄ₙ

where λᵢ are arbitrary positive real numbers. These inner products satisfy axioms [U1] – [U4] and turn Cⁿ into unitary spaces.

If X is a unitary space and x is a vector of X, then the norm ‖x‖ of the vector x is again defined as the non-negative square root

‖x‖ = √(x|x).

Vectors of X with norm equal to 1, i.e. ‖x‖ = 1, are called unit vectors of the unitary space X.

Orthogonality and its related concepts are formulated similarly to those given in §21B. In particular, using similar arguments, we can prove the following theorems.

THEOREM 23.3. In every unitary space there exists an orthonormalbase.


THEOREM 23.4. Let X be a unitary space and Y a subspace of the linear space X. Then the orthogonal complement

Y⊥ = {x ∈ X : (x|y) = 0 for all y ∈ Y}

is a subspace of the linear space X. Furthermore X = Y ⊕ Y⊥.

As a consequence of 23.4 we obtain the fundamental BESSEL's inequality: if x = y + z where y ∈ Y and z ∈ Y⊥, then

‖x‖ ≥ ‖y‖.

It follows from BESSEL's inequality that for any vectors x and y of a unitary space the following inequalities hold:

SCHWARZ's inequality: ‖x‖ ‖y‖ ≥ |(x|y)|;

the triangle inequality: ‖x‖ + ‖y‖ ≥ ‖x + y‖.

B. The conjugate isomorphism

The results obtained in §22A can also be adapted for unitary spaces. Thus if X is a unitary space, then the conjugate isomorphism a: X → X* of the linear space X onto its dual space X* is defined by

a(x) = aₓ and aₓ(y) = (y|x) for all x, y ∈ X.

By means of the conjugate isomorphism, a positive definite hermitian form in X* is defined by

(f|g) = (a⁻¹(g)|a⁻¹(f)) = \overline{(a⁻¹(f)|a⁻¹(g))}

for all f, g ∈ X*. Then the dual space X* of the linear space X endowed with the inner product defined above is a unitary space called the dual space of the unitary space X. Now the conjugate isomorphism a: X → X* of the linear spaces can be regarded as an "isomorphism" between the dual space X* of the unitary space X and the conjugate space of the unitary space X (see §23E).

In exactly the same way we prove that a′∘a is identical with the natural isomorphism tX: X → X** where a′: X* → X** is the conjugate isomorphism.

C. The adjoint

Let X and Y be unitary spaces. Then the adjoint of a linear transformation φ: X → Y of the linear space X into the linear space Y is defined as the linear transformation φ̂: Y → X such that

(φ(x)|y) = (x|φ̂(y)) for all x ∈ X and y ∈ Y.

The existence and uniqueness of φ̂ can be proved in exactly the same way as in §22B. Furthermore the results of §22B can be adapted for unitary spaces except the formula (d), which is modified here as

(λφ)^ = λ̄φ̂.

D. Self-adjoint transformations

Let X be a unitary space. An endomorphism φ of the linear space X is called a self-adjoint transformation of the unitary space X if φ̂ = φ, or equivalently

(φ(x)|y) = (x|φ(y)) for all x, y ∈ X.

Self-adjoint transformations of a unitary space X are related to (complex) hermitian matrices in the same way as self-adjoint transformations of a euclidean space are related to (real) symmetric matrices (see Corollary 22.6).

In the present case, self-adjoint transformations are characterized by a third condition.

THEOREM 23.4. Let X be a unitary space and φ an endomorphism of the linear space X. Then φ is a self-adjoint transformation of the unitary space X if and only if the inner product (x|φ(x)) is real for all x of X.

PROOF. If φ is self-adjoint, then (x|φ(x)) = (φ(x)|x) = \overline{(x|φ(x))} and therefore (x|φ(x)) is real. Conversely, assume that (x|φ(x)) is real for all x of X and consider

(x + y|φ(x + y)) = (x|φ(x)) + (y|φ(y)) + (x|φ(y)) + (y|φ(x)),
(x + iy|φ(x + iy)) = (x|φ(x)) + (y|φ(y)) − i(x|φ(y)) + i(y|φ(x)),

where the left-hand sides as well as the first two terms on the right-hand sides of these equations are all real, so that (x|φ(y)) + (y|φ(x)) and i[(y|φ(x)) − (x|φ(y))] are both real. Therefore

(y|φ(x)) = \overline{(x|φ(y))},

and hence

(φ(x)|y) = (x|φ(y)).


An important property of self-adjoint transformations of a unitaryspace is formulated below:

THEOREM 23.5. The eigenvalues of a self-adjoint transformation of a unitary space are real numbers.

PROOF. Let λ be an eigenvalue of a self-adjoint transformation φ of a unitary space X and x ≠ 0 a corresponding eigenvector. Then

φ(x) = λx, and (x|φ(x)) = (φ(x)|x) = λ(x|x).

It follows from 23.4 that (x|φ(x)) is real; on the other hand (x|x) is also real. Therefore λ is a real number.

Consequently all eigenvalues of a hermitian matrix are real numbers.
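Checking this numerically on an arbitrary hermitian matrix:

    import numpy as np

    rng = np.random.default_rng(5)
    B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    A = B + B.conj().T                   # an arbitrary hermitian matrix

    w = np.linalg.eigvals(A)             # a general complex eigenvalue routine
    assert np.allclose(w.imag, 0)        # yet the eigenvalues come out real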

E. Isometry

Let X be a unitary space. An endomorphism φ of the linear space X is called an isometry, an isometric transformation or a unitary transformation of the unitary space X if

(x|y) = (φ(x)|φ(y)) for all x, y ∈ X.

The results of §22F are adaptable to the present case without major modification. We observe here that wherever the cosine law of euclidean space is used in §22, we have to substitute for it its counterpart:

2(x|y) = ‖x + y‖² + i‖x + iy‖² − (1 + i)(‖x‖² + ‖y‖²).

F. Normal transformation

Let X be a unitary space. We call an endomorphism φ of the linear space X a normal transformation of the unitary space X if

φ̂∘φ = φ∘φ̂,

or equivalently if

(φ(x)|φ(y)) = (φ̂(x)|φ̂(y)) for all x, y ∈ X.

Thus unitary transformations and self-adjoint transformations of the unitary space X are normal transformations of the unitary space X. The following is an example of a normal transformation which is neither unitary nor self-adjoint.

EXAMPLE 23.6. Let (x₁, x₂) be an orthonormal base of a 2-dimensional unitary space X. Consider the linear transformation φ of X defined by

φ(x₁) = ix₁, φ(x₂) = 0.

Then the adjoint φ̂ of φ is such that

φ̂(x₁) = −ix₁, φ̂(x₂) = 0.

φ is therefore a normal transformation of X, but it is neither self-adjoint nor unitary.

In §19, we defined a diagonalizable endomorphism φ of a linear space X over A as an endomorphism of X for which a base of X exists that consists of eigenvectors of φ. Here we are interested in endomorphisms φ of a unitary space X for which an orthonormal base of X exists that consists of eigenvectors of φ. These are then the nicest endomorphisms of the unitary space X, and we are tempted to call them the orthonormally diagonalizable endomorphisms of the unitary space X. However, this is unnecessary, since it turns out that they are just the normal transformations defined above. Clearly orthonormally diagonalizable endomorphisms are normal transformations; the converse will be proved in 23.8.

LEMMA 23.7. Let φ be a normal transformation of a unitary space X. Then for any complex number λ and any non-zero vector x of X,

φ(x) = λx if and only if φ̂(x) = λ̄x.

PROOF. Since φ is a normal transformation, for any complex number λ the endomorphism φ − λ is also a normal transformation. On the other hand we have

‖(φ − λ)(x)‖² = (x|(φ̂ − λ̄)∘(φ − λ)(x)) = (x|(φ − λ)∘(φ̂ − λ̄)(x)) = ‖(φ̂ − λ̄)(x)‖².

Therefore for all non-zero vectors x of X

‖φ(x) − λx‖ = ‖φ̂(x) − λ̄x‖.

Hence the lemma follows.


Our main result on normal transformations is as follows:

THEOREM 23.8. Let X be a unitary space. If φ is a normal transformation of X, then there exists an orthonormal base of X consisting of eigenvectors of φ.

PROOF. We prove this theorem by induction on the dimension of X. For dim X = 1, the theorem is self-evident. Assume that the theorem holds for unitary spaces of dimension n. Let X be a unitary space of dimension n + 1 and φ a normal transformation of X. Since φ is an endomorphism of a complex linear space, there exist (complex) eigenvalues. Let λ be an eigenvalue of φ and x ≠ 0 a corresponding eigenvector. If Y is the orthogonal complement of the subspace generated by x, then Y is an n-dimensional unitary space. Now for all vectors y of Y, we have

(φ(y)|x) = (y|φ̂(x)) = (y|λ̄x) = λ(y|x) = 0.

Therefore φ(y) ∈ Y for all y ∈ Y. Hence φ defines an endomorphism φ′ of Y by

φ′(y) = φ(y) for all y ∈ Y.

Since φ is a normal transformation of the unitary space X, φ′ is a normal transformation of the unitary space Y. By the induction assumption there exists an orthonormal base (y₁, …, yₙ) of Y consisting of eigenvectors of φ′. Therefore (y₁, …, yₙ, yₙ₊₁), where yₙ₊₁ = x/‖x‖, is an orthonormal base of X consisting of eigenvectors of φ.
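Theorem 23.8 can be observed numerically through the complex Schur decomposition, which for a normal matrix is already diagonal; a sketch assuming scipy is available (the normal matrix is manufactured for the purpose):

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(6)
    # Build a normal matrix A = U D U* from a unitary U and a diagonal D.
    U, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
    D = np.diag(rng.standard_normal(3) + 1j * rng.standard_normal(3))
    A = U @ D @ U.conj().T

    assert np.allclose(A @ A.conj().T, A.conj().T @ A)   # A commutes with its adjoint

    # For a normal A the Schur form T is diagonal, and the unitary Z lists
    # an orthonormal base of eigenvectors in its columns.
    T, Z = schur(A, output='complex')
    assert np.allclose(T, np.diag(np.diag(T)))
    assert np.allclose(Z.conj().T @ Z, np.eye(3))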

G. Exercises

1. Let A = (aᵢⱼ) be a hermitian matrix. Show that Aᵗ and Ā = (āᵢⱼ) are hermitian. If A is invertible, show that A⁻¹ is hermitian.

2. Let A and B be hermitian matrices of the same size. Show that A + B is hermitian. If AB = BA, show that AB is hermitian.

3. Let X be a unitary space and φ ∈ End(X). Show that if (φ(x)|x) = 0 for all x ∈ X, then φ = 0. Show that the above statement is false for linear transformations of euclidean spaces.


4. Let X be a unitary space and φ ∈ End(X). Prove that the following conditions are equivalent.

(a) φ∘φ̂ = φ̂∘φ;
(b) ‖φx‖ = ‖φ̂x‖ for every x ∈ X;
(c) there exist self-adjoint linear transformations α and β such that φ = α + iβ and α∘β = β∘α.

5. Let X be a unitary space and φ ∈ End(X) a self-adjoint linear transformation. Show that iₓ + iφ and iₓ − iφ are automorphisms.

6. Show that the eigenvalues of a unitary transformation are all of the form e^{iθ} with real θ.

7. Show that if

aₙTⁿ + aₙ₋₁Tⁿ⁻¹ + ⋯ + a₁T + a₀

is the characteristic polynomial of an endomorphism φ of a unitary space X, then

āₙTⁿ + āₙ₋₁Tⁿ⁻¹ + ⋯ + ā₁T + ā₀

is the characteristic polynomial of the adjoint φ̂. From this, what can you say about the characteristic polynomial of a self-adjoint linear transformation?

8. Show that if φ is an automorphism of a unitary space X, then there exist a unitary transformation φ₁ and a self-adjoint transformation φ₂ such that φ = φ₁∘φ₂.


INDEX

abelian group 4
abstract algebra 4
addition 6
additive group 6
additive inverse 6
adjoint 300
adjoint functors 90
adjoint transformation 282
affine group 117
affine space 98
affine transformation 114
affinity 117
algebraic structure 4
algebraic structure of abelian group 5
angle of rotation 293, 294
annihilator 80
antisymmetric linear mapping 197
antisymmetrization 213
arithmetical linear space 12
arithmetical projective space 123
arrow category 89
associative A-algebra 66
associative law 7, 9
automorphism 66, 146
axially perspective 128
axioms 4
axis of perspectivity 153
axis of rotation 294
barycentre 102
barycentric coordinates 110
base 19
Bessel's inequality 275, 300
bifunctor 91
bilinear form 158, 197
bilinear mapping 197
bilinearity 64
canonical affine space 100
canonical decomposition 259
canonical injection 68
canonical projection 68
cardinal number 34
cartesian product 17, 68
category 79, 86
category of additive groups 87
category of linear spaces 88
central perspectivity 154
centrally perspective 128
centre of perspectivity 153
characteristic polynomial 244, 245
characteristic value 242
characteristic vector 242
class 85
cofactor 221
cokernel 61
collinear 109
collineation 153
column index 156
column vector 157
commutative A-algebra 231
commutative law 7
complementary subspace 39
complete quadrangle 138
complete quadrilateral 138
composite 4
composition 86
conjugate 298
conjugate isomorphism 281
conjugate space 74
contravariant functor 90
convex 112
convex closure 112
coordinates 28
coproduct 94
cosine law 269
covariant functor 89
covector 74
Cramer's rule 223
cross ratio 131
cyclic permutation 210
determinant 215
determinant function 208
diagonal matrix 245
diagonal of a matrix 157
diagonalizable endomorphism 245
difference 7
dimension 27, 34
direct product 69
direct sum 17, 39
direction 106
distributive law 9
double summation 8
dual configuration 138
dual functor 91
dual space 74
dual theorem 138
dual transformation 76
echelon matrix 180
eigenspace 250
eigenvalue 242, 245
eigenvector 242
elementary Jordan matrix 263
elementary matrix 186
elementary transformation 178
endomorphism 59, 66
equipotent 34
equivalent objects 87
Euclidean algorithm 233
Euclidean space 268
even permutation 210
exact sequence 60
expansion by a column 223
expansion by a row 221
extension by linearity 48
exterior product 197
external composition law 4
family of finite support 9
family of generators 19
finite-dimensional linear space 24
free linear space 13, 14
functor 79, 89
fundamental theorem of algebra 236
fundamental theorem of projective geometry 146
general linear group 169
generalized associative law 8
generalized commutative law 8
generalized distributive law 11
generate 19
Gram-Schmidt orthogonalization method 273
greatest common divisor 234
group of automorphisms 67
Hamilton-Cayley theorem 253
harmonic quadruple 131
hermitian form 298
hermitian matrix 298
Hilbert space 271
homogeneous coordinates 125
homogeneous part of a system of equations 178
homomorphism 46
ideal 234
identity 87
identity functor 89
identity matrix 162
image 55
improper rotation 292
index of nullity 205
index of positivity 206
infinite-dimensional linear space 27
initial object 93
inner product 197, 299
internal composition law 4
invariant subspace 239
inverse isomorphism 52
inverse matrix 169
invertible matrix 169
involution 71
irreducible polynomial 238
isometric transformation 302
isometry 292, 302
isomorphic linear spaces 53
isomorphism 52, 87
Jacobi's identity 197
join 37
Jordan form 263
kernel 54
Kronecker symbol 65
length 269
line 107
line at infinity 120
linear combination 18
linear construction 133
linear dependence 21
linear form 74
linear function 74
linear functional 74
linear independence 21
linear mapping 46
linear space 9, 11
linear transformation 46
linear variety 105
linearity 46
linearly dependent 108
linearly independent 108
Markov matrix 163
matrix 155
matrix of a bilinear form 199
matrix of a change of bases 170
matrix of a linear transformation 166
maximal element 32
midpoint 103
minor 222
mixed product 203
morphism 87
multilinear form 202
multiple 9
multiple root 236
multiplicity of a root 236
natural isomorphism 93
natural transformation 79, 92
neutral element 6
nilpotent 255
non-singular matrix 169
norm 269, 299
normal transformation 302
normed linear space 278
nullvector 11
object 85
odd permutation 210
orthogonal 272
orthogonal complement 274
orthogonal endomorphisms 71
orthogonal family 272
orthogonal group 293
orthogonal matrix 293
orthogonal projection 275
orthogonal transformation 292
orthonormal base 272
orthonormal family 272
orthonormally diagonalizable 303
parallel coordinates 110
parallelogram identity 269
pencil of lines 141
pencil of planes 141
permutation 209
perpendicular 272
perspectivity 153
point at infinity 120
polynomial 232
positive definite 298
positive definite matrix 166
positive definite quadratic form 205
principle of duality 138
product 9, 94
projection 67, 71
projective group 153
projective isomorphism 142
projective line 123
projective plane 123
projective space 122
projective structure theorem 151
projectivity 62
proper rotation 292
pseudo-euclidean space 271
Pythagoras theorem 272
quadratic form 201
quotient space 41
radical of a bilinear form 204
range of points 141
rank 56
rank of a matrix 176
reduced matrices 263
reflexion 295
relatively prime polynomials 235
replacement theorem 25
root of a polynomial 235
row index 156
scalar 10
scalar multiplication 9
scalar product 197
Schröder-Bernstein theorem 34
Schwarz's inequality 276, 300
segment 112
self-adjoint transformation 285, 301
semi-linear transformation 147
semi-simple endomorphism 245
sesquilinear form 198, 297
sign of a permutation 211
simple root 236
size of a matrix 156
skew self-adjoint 295
skew-symmetric bilinear mapping 197
special orthogonal group 293
subspace 35
substitution 235
sum 6, 37
summand 6, 8
summation sign 8
supplementation theorem 25, 33
symmetric bilinear mapping 197
symmetric transformation 285
system of homogeneous linear equations 177
system of linear equations 175
tensor product 203
term of a matrix 156
terminal object 93
theorem of Desargues 129
theorem of Pappus 127
trace 165
trace of an endomorphism 244
transpose of a matrix 158
transposition 211
triangle inequality 300
triangular matrix 251
trilinear form 203
unit vector 299
unitary transformation 302
upper bound 32
vector 110
vector product 197
vector space 9
weight 102
zero linear transformation 46
zero of a polynomial 235
zero vector 11
Zorn's lemma 32