
HONOURS MODERATION LINEAR ALGEBRA I, 2008/09

ANNE HENKE


Contents

Topics Covered
Some remarks
Solving mathematical problems
1. Fields
2. The Algebra of Matrices
3. Vector Spaces
4. First Properties of Vector Spaces
5. Subspaces of Vector Spaces
6. Linear Dependence, Linear Independence and Spanning
7. Bases of Vector Spaces
8. Steinitz Exchange Procedure
9. Dimension of Vector Spaces and an Application to Sums
10. Linear Transformations
11. The Rank-Nullity Theorem
12. The Matrix Representation of a Linear Transformation
13. Row Reduced Echelon Matrices
14. Systems of Linear Equations
15. Invertible Matrices and Systems of Linear Equations
16. Elementary Matrices
17. Row Rank and Column Rank


Topics Covered

Algebra of matrices.

Vector spaces over the real numbers; subspaces. Linear dependence and linear independence. The span of a (finite) set of vectors; spanning sets. Examples. Finite dimensionality.

Definition of bases; reduction of a spanning set and extension of a linearly independent set to a basis; proof that all bases have the same size. Dimension of a vector space. Co-ordinates with respect to a basis.

Sums and intersections of subspaces; formula for the dimension of the sum.

Linear transformations from one (real) vector space to another. The image and kernel of a linear transformation. The rank-nullity theorem. Applications.

The matrix representation of a linear transformation with respect to fixed bases; change of basis and co-ordinate systems. Composition of transformations and product of matrices.

Elementary row operations on matrices; echelon form and row-reduction. Matrix representation of a system of linear equations. Invariance of the row space under row operations; row rank.

Significance of image, kernel, rank and nullity for systems of linear equations. Solution by Gaussian elimination. Bases of solution space of homogeneous equations. Applications to finding bases of vector spaces.

Invertible matrices; use of row operations to decide invertibility and to calculate inverses.

Column space and column rank. Equality of row rank and column rank.

Some remarks

This set of notes is a collection of the material which will be covered in this course, in roughly the given order. It contains a few things which are not part of the syllabus, like the section on fields, examples of vector spaces over a field different from R, and some comments on infinite dimensional vector spaces. I will use this collection of material to prepare the lectures. This means that I may spontaneously decide to give different examples, or to elaborate on something that I did not write down in these notes. I would strongly advise you to take your own notes in the lectures, if only because doing so helps you concentrate on the lecture. Equally important is that you read linear algebra books; they are not only written with much more care than these notes, they also contain more examples, more details and more background material.

Please let me know of any necessary corrections of the mathematics, preferablyby email to: [email protected].


Solving mathematical problems

Mathematical problems. Problems play a central role in mathematics. To solve a problem, we need to analyse it, play with it, and spend time with it, in order to eventually solve it with imagination and with a sense for elegance and symmetry. Mathematical problems are the natural way to learn these abilities. When studying, your aim should not be to constantly think about how to prepare for the examinations. It is the other way around: the examination will test whether you have learned to solve problems. Don't forget that the problem sheets coming with your lecture courses are something entirely different from your school homework. Copying solutions means you miss the most important part of your education. Do not restrict yourself to just the easy problems which are there to train a new concept; these are just warm-up exercises. The real learning effect comes when you stretch your mind beyond the things you already know. This is like in sports: the more you exercise, the better you get. The wise student will solve many, many exercises, going beyond the problem sheets of the course.

Analysing. Typically you have one week to solve a problem sheet. Start immediately, and use the full time span to think about the problems. This means thinking, rethinking, trying repeatedly, eventually improving the solution, finding alternative solutions. Put the sheet away in between, do something else, and return repeatedly to the problems you have not yet solved. You cannot expect to solve a problem in a few minutes. Many ideas need to ripen in your subconscious before you see light. It is vital that you really know the problem. This does not mean that you learn the problem by heart; rather, you need to understand the problem. In order to understand it, spend enough time on a first reading that you can formulate the problem in your own words, and that you can explain it to a friend, at any time, without thinking and without using the problem sheet. Look up definitions, and understand the context in which these definitions appear. Ask yourself which theorems from the lectures could be helpful: is the problem a special case of a theorem, or does it generalise a theorem? If the problem deals with a general situation, form examples (special cases). Often when you know the right examples, you are able to understand why a general statement is correct, and hence you can write down a proof. Try to visualise the problems (for example, real functions you can draw). Check which methods of proof were used in the relevant lecture material. Trust your own intuition. Are there any situations the problem reminds you of?

Talking. Talk about the problem. In general, try to talk as much as possible about mathematics. Talking helps to structure your own thoughts. You can talk to your friends, tutorial partners, your tutor. Use that chance! But be aware that it only makes sense to talk about the problem if you have spent some time thinking about it. Working in groups can also be a good way of learning, but only if there is a healthy balance between giving and taking. In the end you are measured on your own abilities. To develop these it is not enough that someone just explains the solutions to you; you need to actively participate in the process of finding the solution. When you have found a solution, it can prove helpful to see how other people solve the problem, or to let other people criticise your own solution.

Writing. This is a critical moment. Here it shows whether the solution you have in mind can really be written down. Every correct solution can be written down in a sensible way. If you have trouble writing down your solution, then you have not yet ordered your thoughts enough; you have not yet fully understood the solution, the mathematical mechanism. Think again: you have not yet reached the final goal! There are two bad extremes of writing style: the first is to just write a calculation without an argument; the second is to write a whole novel without talking precisely about the problem. The correct way is somewhere in the middle. Give precise arguments. Moreover, a solution to a problem consists of properly readable English text. We do not write in maths-language; we speak and write English when we explain a solution. We write full sentences (not just a formula without any context). Can you read your text aloud and have it still make sense? Expect not to hand in the first written version of your solution. And of course, be kind and respectful to your tutor: write clearly and do not hand in pages where half of the text is crossed out or your coffee pot spilled over. Your tutors also care about you making progress. Do you still understand your own solution a couple of days later? If not, start again. Getting a correct, elegant and well written solution is often hard work! But you will feel good when you have achieved it.

Presenting. The communication of a solution is an important part of mathematical work. It is part of your education at university to give clear and understandable presentations of your work. You will likely need this skill whatever you do after university. Practise now; learning it later will be harder.

The above is a free translation of parts of the student advice given by Prof. M. Lehn (University of Mainz); for the original see http://www.mathematik.uni-mainz.de/Members/lehn/le/uebungsblatt.


1. Fields

This chapter is not part of the syllabus of this course. It is included in these notes to indicate the more general setting in which linear algebra is defined. Fields will be properly introduced and studied in some depth in the second year. The objects that we study in this course – vector spaces, subspaces, linear maps (but similarly groups, fields and many other mathematical objects which you meet later) – are typically defined by a list of axioms that need to be satisfied.

Notation 1.1.

C = {a + bi | a, b ∈ R} = set of all complex numbers,

R = set of all real numbers,

Q = set of all rational numbers = { m/n | m, n ∈ Z, n ≠ 0 },

Z = set of all integers,

N = set of all natural numbers.

Note: Z ⊆ Q ⊆ R ⊆ C.

Recall that the addition and multiplication of complex numbers are given by:

(a + bi) + (c + di) = (a + c) + (b + d)i,

(a + bi) · (c + di) = (ac − bd) + (ad + bc)i.

Note that this generalises the addition and multiplication of the subsets N, Z, Q and R.
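For instance, taking a = 2, b = 3, c = 1, d = −2 in these rules gives (2 + 3i) + (1 − 2i) = 3 + i and (2 + 3i) · (1 − 2i) = (2 + 6) + (−4 + 3)i = 8 − i.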

Definition 1.2. Let K be a subset of the complex numbers. Then K is called a field if it satisfies the following conditions:

(K1) If x, y ∈ K, then x + y ∈ K and x · y ∈ K.
(K2) If x ∈ K, then −x ∈ K. If furthermore x ≠ 0, then x⁻¹ ∈ K.
(K3) The elements 0 and 1 are elements of K.

Example 1.3. (1) Claim: Q is a field.
Proof. We need to check that the axioms (K1)-(K3) hold. Let x, y ∈ Q; then – by the definition of Q, see above – there exist a, b, c, d ∈ Z with b ≠ 0, d ≠ 0 such that x = a/b and y = c/d. To show that a number is in Q, we need to write it as a fraction.
(a) We have:

x + y = a/b + c/d = (ad + bc)/(bd),   x · y = (ac)/(bd).

Since a, b, c, d ∈ Z, it follows that ad + bc ∈ Z, bd ∈ Z and ac ∈ Z. Since b ≠ 0 and d ≠ 0, also b · d ≠ 0. Hence (ad + bc)/(bd) ∈ Q and (ac)/(bd) ∈ Q, that is, x + y, x · y ∈ Q. Hence (K1) holds.


(b) Since a ∈ Z, also −a ∈ Z. Hence −x = (−a)/b ∈ Q. Moreover, if x ≠ 0 (so that a ≠ 0), then x⁻¹ = (a/b)⁻¹ = b/a ∈ Q. Hence (K2) holds.
(c) 0 and 1 are elements of Q as 0 = 0/1 and 1 = 1/1. Hence (K3) holds.

(2) Similar to (1) we have: R and C are fields.

(3) Claim: Z is not a field.
Proof. It is enough to show that one of the three axioms in Definition 1.2 fails. We show that (K2) fails. Let x = 3; then x ∈ Z and x ≠ 0. However x⁻¹ = 1/3 ∉ Z. Hence (K2) does not hold.

(4) Define the set Q(√2) = {a + b√2 | a, b ∈ Q}.
Claim: Q(√2) is a field.
Proof. Clearly Q(√2) consists of real numbers, so Q(√2) is a subset of C. Let x = a + b√2 and y = c + d√2 where a, b, c, d ∈ Q. Then x + y = (a + c) + (b + d)√2. Since a, c ∈ Q it follows by Example (1) that a + c ∈ Q. Similarly b, d ∈ Q implies b + d ∈ Q. Hence x + y ∈ Q(√2). Next, x · y = (a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2. Since a, b, c, d ∈ Q and Q is a field, it follows that ac + 2bd and ad + bc are elements of Q. So x · y ∈ Q(√2). Hence (K1) holds. Checking the rest of the axioms is left as an exercise to the reader.

(5) Q(i) := {a + bi | a, b ∈ Q} is a field.

Can you find further examples of fields?
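The closure computations above can also be spot-checked mechanically. Here is a minimal Python sketch, not part of the course, that models an element a + b√2 of Q(√2) as a pair of exact fractions and implements the operations used in the proof; the class name QSqrt2 and its methods are our own choices for this illustration.

    from fractions import Fraction

    class QSqrt2:
        """An element a + b*sqrt(2) of Q(sqrt(2)), with a, b rational."""
        def __init__(self, a, b):
            self.a, self.b = Fraction(a), Fraction(b)

        def __add__(self, other):   # (a + c) + (b + d)*sqrt(2)
            return QSqrt2(self.a + other.a, self.b + other.b)

        def __mul__(self, other):   # (ac + 2bd) + (ad + bc)*sqrt(2)
            return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                          self.a * other.b + self.b * other.a)

        def inverse(self):
            # Multiply by the conjugate a - b*sqrt(2); the denominator
            # a^2 - 2b^2 is non-zero unless a = b = 0, since sqrt(2)
            # is irrational.
            n = self.a * self.a - 2 * self.b * self.b
            return QSqrt2(self.a / n, -self.b / n)

        def __repr__(self):
            return f"{self.a} + {self.b}*sqrt(2)"

    x = QSqrt2(Fraction(1, 2), 3)
    y = QSqrt2(2, Fraction(-1, 4))
    print(x + y, x * y, x.inverse())   # sums, products, inverses stay in Q(sqrt(2))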

Remark 1.4. There is a more general definition of a field, which does not assume that we work with a subset of the complex numbers. The interested reader is referred to the literature. Linear algebra deals with vector spaces. Vector spaces are defined over a field K. In this course we will always take K = R. The results presented in this course could easily be generalised to hold for any field K.


2. The Algebra of Matrices

We will meet matrices at various places in this course. They will provide an example of the most important object of linear algebra, the so-called vector spaces. They will also be of fundamental importance when we study maps between vector spaces. Finally, they will be important when we study systems of linear equations. In this section, we introduce matrices and the operations defined for matrices. We are interested in which algebraic relations matrices satisfy. Matrices can be defined over any field K. In this course we always take K = R. We assume in this section the usual rules of how to add and multiply elements in R. In the more general setting of a field K these rules are precisely part of the definition of what a field is.

Definition 2.1. Let m, n be natural numbers. An m × n matrix over R is an array

A =
[ a11  a12  · · ·  a1n ]
[ a21  a22  · · ·  a2n ]
[  ⋮    ⋮    ⋱    ⋮  ]
[ am1  am2  · · ·  amn ]

where aij ∈ R. We write A = (aij)1≤i≤m,1≤j≤n for short, or A = (aij) if the shape of the matrix is understood. A matrix of shape m × 1 is called a vector. We define Mm×n(R) as the set of all m × n matrices with real entries. In particular Mn(R) = Mn×n(R) is the set of real square matrices of size n.

Example 2.2. Matrix A given below is a 3 × 3 matrix. We have three rows and three columns. Matrix B below is a general 3 × 3 matrix with entries bij ∈ R where 1 ≤ i, j ≤ 3. We speak of bij as the (i, j)th entry of the matrix B. This entry lies in row i and column j. Matrices need not be square; note that the definition takes account of a general m × n matrix, a matrix that has m rows and n columns. Matrix C below is an example of a 2 × 4 matrix.

A =
[  2  1  1 ]
[  4  1  0 ]
[ −2  2  1 ]

B =
[ b11  b12  b13 ]
[ b21  b22  b23 ]
[ b31  b32  b33 ]

C =
[ −8  1  1  −1 ]
[  4  1  0   5 ]

Example 2.3. Define 0 = (aij) where aij = 0 for i = 1, . . . , m and j = 1, . . . , n. This is called the zero matrix of Mm×n(R). Moreover, define the matrix In = (aij) where

aij = 1 if i = j, and aij = 0 if i ≠ j,

for 1 ≤ i, j ≤ n. This is called the n × n identity matrix. Often the zero matrix and the identity matrix are just denoted by 0 and I respectively:

0 =
[ 0  0  · · ·  0 ]
[ 0  0  · · ·  0 ]
[ ⋮  ⋮   ⋱   ⋮ ]
[ 0  0  · · ·  0 ]

I =
[ 1  0  · · ·  0 ]
[ 0  1  · · ·  0 ]
[ ⋮  ⋮   ⋱   ⋮ ]
[ 0  0  · · ·  1 ]

We next define addition of matrices:


Definition 2.4. Let A, B be two m × n matrices, say A = (aij) and B = (bij). Then the sum of A and B, written as A + B, is the matrix A + B = (cij) where cij = aij + bij with 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Remark. Note that addition of matrices is only defined for matrices of the same shape. If A, B are two m × n matrices then A + B is also an m × n matrix. We say: we add two matrices by adding the entries coordinate-wise. Note that this also defines addition of vectors (which are special matrices by definition).

Example 2.5. For example, for 3 × 3 matrices we have:

[ a11  a12  a13 ]   [ b11  b12  b13 ]   [ a11+b11  a12+b12  a13+b13 ]
[ a21  a22  a23 ] + [ b21  b22  b23 ] = [ a21+b21  a22+b22  a23+b23 ]
[ a31  a32  a33 ]   [ b31  b32  b33 ]   [ a31+b31  a32+b32  a33+b33 ]

We next define the scalar multiplication of matrices. Note that this definition also includes the definition of the scalar multiplication of vectors.

Definition 2.6. The product of a matrix A ∈ Mm×n(R) by a scalar λ ∈ R, written λA, is the matrix C = (cij) obtained by multiplying each entry of A by λ, that is, cij = λaij for 1 ≤ i ≤ m and 1 ≤ j ≤ n:

λA =
[ λa11  · · ·  λa1n ]
[   ⋮     ⋱     ⋮  ]
[ λam1  · · ·  λamn ]

Example 2.7. If A = (aij) is a square matrix with aij = 0 for all i ≠ j, then A is called a diagonal matrix, and we write A = diag(a11, . . . , ann). A special type of diagonal matrix is the scalar matrix: a matrix B is a scalar matrix if B = kIn for some k ∈ R.

A =
[ a11   0   · · ·   0  ]
[  0   a22  · · ·   0  ]
[  ⋮    ⋮    ⋱    ⋮  ]
[  0    0   · · ·  ann ]

B = kIn =
[ k  0  · · ·  0 ]
[ 0  k  · · ·  0 ]
[ ⋮  ⋮   ⋱   ⋮ ]
[ 0  0  · · ·  k ]

Proposition 2.8. For all A, B, C ∈ Mm×n(R) and for all r, s ∈ R we have:

(1) A + B = B + A,
(2) A + (B + C) = (A + B) + C,
(3) A + 0 = A = 0 + A,
(4) s(rA) = (sr)A,
(5) (r + s)A = rA + sA,
(6) r(A + B) = rA + rB.

Note that 0 in statement (3) denotes the zero matrix of shape m × n. Note that statement (2) says that, when forming a sum of matrices, brackets can safely be omitted.

The proofs of the above statements are straightforward. To demonstrate how they should be written, we give an example by proving the first statement.


Proof. Let A = (aij), B = (bij) ∈ Mm×n(R). Define (cij) = A + B and (dij) = B + A. By the definition of addition of matrices (see Definition 2.4), we have cij = aij + bij and dij = bij + aij. Since real numbers are commutative with respect to addition, that is x + y = y + x for any x, y ∈ R, this implies that cij = dij. Hence A + B = B + A.

There is one more operation for matrices which we need, namely the product of matrices. We first recall the summation notation: we write ∑_{i=1}^{n} ai for short for the sum a1 + a2 + · · · + an of real numbers ai.

Definition 2.9. If A = (aij) is an m × n matrix over the real numbers and B = (bij) is an n × p matrix over the real numbers, then the product AB is the m × p matrix defined as follows: AB = C = (cij) where cij = ∑_{k=1}^{n} aik bkj for 1 ≤ i ≤ m and 1 ≤ j ≤ p.

Remark. Note that the multiplication of two matrices A and B is only defined if the number of elements in a row of A equals the number of elements in a column of B. If it is defined, then the (i, j)th entry of the matrix C = AB is obtained by multiplying the ith row of matrix A with the jth column of the matrix B:

cij = ∑_{k=1}^{n} aik bkj = ai1b1j + ai2b2j + · · · + ainbnj.

Example 2.10. The above definition includes the multiplication of a matrix with a vector. For example, let A be a 3 × 3 matrix and x a 3 × 1 vector; then Ax is defined and is another 3 × 1 vector. For example,

[  2  1  1 ]   [  2 ]   [  3 ]
[  4  1  0 ] · [ −1 ] = [  7 ]
[ −2  2  1 ]   [  0 ]   [ −6 ]
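The defining formula cij = ∑_{k=1}^{n} aik bkj translates directly into nested loops. The following Python sketch is our own illustration, not part of the notes; the helper name mat_mul is arbitrary. It computes products straight from Definition 2.9 and reproduces the example above.

    def mat_mul(A, B):
        """Product of an m x n and an n x p matrix, straight from the definition."""
        m, n, p = len(A), len(B), len(B[0])
        assert all(len(row) == n for row in A), "shapes must be compatible"
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
                for i in range(m)]

    A = [[2, 1, 1], [4, 1, 0], [-2, 2, 1]]
    x = [[2], [-1], [0]]          # a 3 x 1 vector is just a 3 x 1 matrix
    print(mat_mul(A, x))          # [[3], [7], [-6]], as in the example above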

Proposition 2.11. Suppose A, B, C are matrices.

(1) If AB is defined, then for all r ∈ R we have (rA)B = r(AB) = A(rB).
(2) If AB, AC and B + C are defined, then so is A(B + C), and AB + AC = A(B + C).
(3) If BA, CA and B + C are defined, then so is (B + C)A, and BA + CA = (B + C)A.
(4) If AB and BC are defined, then so are (AB)C and A(BC), and (AB)C = A(BC).

Remark. Note that statement (4) says that, when forming a product of matrices, brackets can safely be omitted. The proofs of all statements except (4) are straightforward; they are left as an exercise to the reader.

We conclude this section by defining some more terminology for matrices and by giving some more examples.

Example 2.12. (1) For every matrix A = (aij), there exists a matrix B such that A + B = 0 = B + A, namely B = (−aij). We call B the additive inverse of A and write −A for it. Note that −A = (−1)A; this last equation says that the additive inverse of A is given by multiplying the matrix A by the scalar (−1).

(2) If A ∈ Mn(R) (a square matrix) and there exists B ∈ Mn(R) such that AB = BA = In, then we call B the (multiplicative) inverse of A. We write A⁻¹ for the inverse matrix of A. For example, the matrices A, B, D below are invertible with A⁻¹ = A, B⁻¹ = D and D⁻¹ = B (for any a ∈ R). Matrix C is not invertible. So there are non-zero square matrices which are not invertible.

A =
[ 0  1 ]
[ 1  0 ]

B =
[ 1  a ]
[ 0  1 ]

C =
[ 0  1 ]
[ 0  0 ]

D =
[ 1  −a ]
[ 0   1 ]

Example 2.13. If A ∈ Mm×n(R) and A = (aij), then Aᵀ = (bij) where bij = aji. We call Aᵀ the transposed matrix of A. In particular Aᵀ ∈ Mn×m(R). Note that the rows of Aᵀ are the columns of A, and the columns of Aᵀ are the rows of A. For example, if

A =
[ 2  1  4 ]
[ 3  1  2 ]

then

Aᵀ =
[ 2  3 ]
[ 1  1 ]
[ 4  2 ]

We say A is a symmetric matrix if Aᵀ = A. We say A is skew symmetric if Aᵀ = −A. We say A is orthogonal if AAᵀ = AᵀA = I. Equivalently, A is orthogonal if A is invertible and A⁻¹ = Aᵀ.

Proposition 2.14. Let A, B ∈ Mm×n(R), C ∈ Mn×p(R). Then

(1) (Aᵀ)ᵀ = A,
(2) (A + B)ᵀ = Aᵀ + Bᵀ,
(3) (λA)ᵀ = λAᵀ,
(4) (BC)ᵀ = CᵀBᵀ.

Proof. We prove the first property and leave the others as straightforward exercises to the reader. Let A = (aij) be a matrix of shape m × n. Note that A and (Aᵀ)ᵀ have the same shape. By the definition of the transposed matrix, the (i, j)th entry of (Aᵀ)ᵀ equals the (j, i)th entry of Aᵀ, which in turn equals the (i, j)th entry of A. So the entries of A and (Aᵀ)ᵀ coincide. Hence indeed A = (Aᵀ)ᵀ.

Example 2.15. Let us see by example how to check that a matrix is symmetric. Let A, B be symmetric matrices of the same size. Is the matrix AB again symmetric? We claim that AB is symmetric if and only if AB = BA. To prove this, we need to show two directions:

(a) Assume that AB = BA. We want to show that AB is symmetric. By assumption we know that Aᵀ = A and Bᵀ = B. So by the assumption, and by using Proposition 2.14(4), we have: (AB)ᵀ = BᵀAᵀ = BA = AB.

(b) Conversely, assume now that AB is symmetric. We want to show that AB = BA. Indeed, AB = (AB)ᵀ = BᵀAᵀ = BA, where the first equality holds by the assumption that AB is symmetric, the second uses Proposition 2.14(4), and the third uses that A and B are symmetric by assumption.
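A quick numerical spot-check of this example; this is a sketch assuming the NumPy library, and the particular matrices are ad hoc choices.

    import numpy as np

    A = np.array([[1, 2], [2, 3]])   # symmetric
    B = np.array([[0, 1], [1, 0]])   # symmetric
    AB = A @ B
    # A and B do not commute here, and indeed AB fails to be symmetric:
    print(np.array_equal(AB, AB.T), np.array_equal(A @ B, B @ A))  # False False

    C, D = np.diag([1, 2]), np.diag([3, 4])    # diagonal matrices commute
    print(np.array_equal(C @ D, (C @ D).T))    # True: the product is symmetric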


Definition 2.16. If A is an n × n matrix, say A = (aij), then the trace of A is defined to be tr(A) = ∑_{i=1}^{n} aii, the sum of all the elements on the main diagonal of A.

Example 2.17. We have tr(In) = n and tr(A) = 2 + 1 + 1 = 4 where

A =
[  2  1  1 ]
[  4  1  0 ]
[ −2  2  1 ]

Proposition 2.18. Let A, B ∈ Mm(R), C ∈ Mm×n(R), D ∈ Mn×m(R). Then

(1) tr(Aᵀ) = tr(A),
(2) tr(A + B) = tr(A) + tr(B),
(3) tr(λA) = λ tr(A),
(4) tr(DC) = tr(CD).

Proof. We prove (4); the rest is left as an exercise to the reader. Let (xij) = DC. Then xij = ∑_{k=1}^{m} dik ckj. Similarly, let (yij) = CD. Then yij = ∑_{t=1}^{n} cit dtj. Since addition and multiplication in R are commutative, we have:

tr(DC) = ∑_{t=1}^{n} xtt = ∑_{t=1}^{n} ∑_{k=1}^{m} dtk ckt = ∑_{k=1}^{m} ∑_{t=1}^{n} ckt dtk = ∑_{k=1}^{m} ykk = tr(CD).
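The identity tr(DC) = tr(CD) holds even for rectangular C and D, as long as both products are square. A small NumPy check, as a sketch with randomly chosen matrices:

    import numpy as np

    rng = np.random.default_rng(0)
    C = rng.standard_normal((3, 5))    # C in M_{3x5}(R)
    D = rng.standard_normal((5, 3))    # D in M_{5x3}(R)
    # DC is 5 x 5 while CD is 3 x 3, yet the traces agree:
    print(np.allclose(np.trace(D @ C), np.trace(C @ D)))   # True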

Exercise 1. Let ω be a complex cube root of 1 (this means ω³ = 1) with ω ≠ 1. Prove that 1 + ω + ω² = 0. Letting A be the matrix

A =
[ 1  1   1  ]
[ 1  ω   ω² ]
[ 1  ω²  ω  ]

determine A² and A⁻¹.

Exercise 2. Let A and B be two matrices such that AB and BA are defined and of the same size. We say that A and B commute with respect to multiplication if AB = BA. Now let A be a 2 × 2 matrix with entries in R.

(a) Show that A commutes with

[ 1   0 ]
[ 0  −1 ]

if and only if A is diagonal.

(b) Show that A commutes with

[ 1  0 ]
[ 0  0 ]

if and only if A is diagonal.

(c) Which 2 × 2 matrices A commute with

[ 0  1 ]
[ 0  0 ] ?


(d) Deduce that A commutes with all 2 × 2 matrices if and only if

A =
[ λ  0 ]
[ 0  λ ]

for some λ ∈ R.

(For those of you who found this problem too easy: find all n × n matrices which commute with any matrix A ∈ Mn(R) for fixed n ∈ N. Justify your answer.)

Exercise 3. For each a ∈ R, define the matrix A(a) by

A(a) =
[ 1  a  a²/2 ]
[ 0  1   a   ]
[ 0  0   1   ]

Show that for all a, b ∈ R we have A(a + b) = A(a)A(b). Deduce that each matrix A(a) is invertible.

Exercise 4. Let A and B be two square matrices of the same size with A symmetric and B skew symmetric. Determine which of the following matrices are symmetric and which are skew symmetric, and justify your answer:

(a) AB + BA,
(b) AB − BA,
(c) A²,
(d) B²,
(e) Bᵀ(Aᵀ + A)B,
(f) Bᵀ(A − Aᵀ)B.

(Do you need the assumptions on A and B in all cases?)

Exercise 5. Let A, D be square matrices of the same size. Show that the following statements are true.

(a) If A is invertible then the inverse is unique.
(b) If A, D are invertible then AD is also invertible.
(c) Let B ∈ Mm×n(R) and C ∈ Mn×p(R). Then (BC)ᵀ = CᵀBᵀ.
(d) If A is invertible, then so is Aᵀ and (Aᵀ)⁻¹ = (A⁻¹)ᵀ.

Exercise 6. Show that a 2 × 2 matrix

A =
[ a  b ]
[ c  d ]

has an inverse if and only if ad − bc ≠ 0. Find A⁻¹.

Exercise 7. Let C and D be square matrices of the same size. Show that if both C, D are orthogonal, then so are C⁻¹, CD and C⁻¹D.


3. Vector Spaces

We next define the main objects of linear algebra, the vector spaces over a field K. Although we will define vector spaces over any field K, you may always take K = R. The concrete example of vectors will underpin the abstract definition.

Definition 3.1. A vector space V over K is a triple (V, +, ·) where

(a) V is a non-empty set,
(b) + is addition of vectors, that is, + : V × V → V with (u, v) ↦ u + v for any u, v ∈ V,
(c) · is scalar multiplication of vectors by elements of K, that is, · : K × V → V with (λ, v) ↦ λ · v for any λ ∈ K and v ∈ V,

such that:

(V1) u + v = v + u for all u, v ∈ V.
(V2) (u + v) + w = u + (v + w) for all u, v, w ∈ V.
(V3) There exists a special element in V, denoted by 0V, satisfying v + 0V = 0V + v = v for all v ∈ V. We call 0V the zero element of V.
(V4) For every v ∈ V there exists a special element in V, denoted by −v, satisfying v + (−v) = (−v) + v = 0V. We call −v the additive inverse of v.
(V5) λ(u + v) = λu + λv for all λ ∈ K and all u, v ∈ V.
(V6) (λ + µ)v = λv + µv for all λ, µ ∈ K and for all v ∈ V.
(V7) λ(µv) = (λµ)v for all λ, µ ∈ K and all v ∈ V.
(V8) 1 · v = v for all v ∈ V.

The elements of V are called vectors. If it is understood what + and · are, we will write for short that V is a vector space instead of writing that (V, +, ·) is a vector space. We will also typically write λv instead of λ · v, and 0 instead of 0V. Note also that the addition and scalar multiplication are examples of so-called binary operations.

Remark. Note that the statements in (b) and (c) mean that when checking that some set V is a vector space, you have to check that for all u, v ∈ V and λ ∈ K, the resulting elements u + v and λv are indeed elements belonging to V.

Example 3.2. The canonical example of a vector space is V = Rn where

Rn = {(x1, . . . , xn)ᵀ | xi ∈ R}.

Given two elements u, v ∈ V, there are xi, yi ∈ R for 1 ≤ i ≤ n such that

u = (x1, . . . , xn)ᵀ,   v = (y1, . . . , yn)ᵀ.


We define the addition of elements in Rn and the scalar multiplication as follows:

u + v := (x1 + y1, . . . , xn + yn)ᵀ,   λu := (λx1, . . . , λxn)ᵀ.

We now need to check several things to prove that (V, +, ·) is indeed a vector space.

(a) Clearly V is a non-empty set.
(b) We need to check that V is closed with respect to addition: note that as xi, yi ∈ R we have xi + yi ∈ R for i = 1, . . . , n. This implies that indeed u + v ∈ Rn = V. Hence V is closed with respect to addition.
(c) We need to check that V is closed with respect to scalar multiplication: note that if λ, xi ∈ R then λxi ∈ R. Hence λu ∈ V. So V is indeed closed with respect to scalar multiplication.

We next have to check that the axioms (V1)-(V8) hold in V = Rn. Given any elements u, v, w ∈ V, there are xi, yi, zi ∈ R for 1 ≤ i ≤ n such that

u = (x1, . . . , xn)ᵀ,   v = (y1, . . . , yn)ᵀ,   w = (z1, . . . , zn)ᵀ.

(V1) Since for real numbers we have xi + yi = yi + xi (for i = 1, . . . , n), it follows that

u + v = (x1 + y1, . . . , xn + yn)ᵀ = (y1 + x1, . . . , yn + xn)ᵀ = v + u.

(V2) Since for real numbers we have (xi + yi) + zi = xi + (yi + zi) (for i = 1, . . . , n), it follows that

(u + v) + w = ((x1 + y1) + z1, . . . , (xn + yn) + zn)ᵀ = (x1 + (y1 + z1), . . . , xn + (yn + zn))ᵀ = u + (v + w).

(V3) We take 0V = (0R, . . . , 0R)ᵀ. Note that 0R denotes here the zero of the real numbers. Clearly 0V ∈ Rn. Over the real numbers we have xi + 0 = 0 + xi = xi (for i = 1, . . . , n), which implies that

u + 0V = (x1 + 0R, . . . , xn + 0R)ᵀ = (x1, . . . , xn)ᵀ = u.

Similarly, 0V + u = u.


(V4) Given a vector p ∈ V with coordinate entries ai, we take q to have coordinate entries −ai:

p = (a1, . . . , an)ᵀ,   q = (−a1, . . . , −an)ᵀ.

Clearly then q ∈ V, and moreover

p + q = (a1 + (−a1), . . . , an + (−an))ᵀ = (0, . . . , 0)ᵀ.

So indeed p + q = q + p = 0V.

(V5) If λ ∈ R then

λ(u + v) = λ(x1 + y1, . . . , xn + yn)ᵀ   by the definition of addition for vectors,
= (λ(x1 + y1), . . . , λ(xn + yn))ᵀ   by the definition of scalar multiplication,
= (λx1 + λy1, . . . , λxn + λyn)ᵀ   using properties of real numbers,
= λ(x1, . . . , xn)ᵀ + λ(y1, . . . , yn)ᵀ = λu + λv,

using again the definition of addition of vectors and the definition of scalar multiplication.

(V6) If λ, µ ∈ R then

(λ + µ)u = ((λ + µ)x1, . . . , (λ + µ)xn)ᵀ   by the definition of scalar multiplication in Rn,
= (λx1 + µx1, . . . , λxn + µxn)ᵀ   by properties of real numbers,
= (λx1, . . . , λxn)ᵀ + (µx1, . . . , µxn)ᵀ   by the definition of addition in Rn,
= λ(x1, . . . , xn)ᵀ + µ(x1, . . . , xn)ᵀ = λu + µu   by the definition of scalar multiplication in Rn.

Hence indeed (λ + µ)u = λu + µu for all u ∈ V.


(V7) Let λ, µ ∈ R. For real numbers we have λ(µxi) = (λµ)xi in R, where i = 1, . . . , n. Then

λ(µu) = λ(µx1, . . . , µxn)ᵀ   by the definition of scalar multiplication in Rn,
= (λ(µx1), . . . , λ(µxn))ᵀ   by the definition of scalar multiplication in Rn,
= ((λµ)x1, . . . , (λµ)xn)ᵀ   by properties of real numbers,
= (λµ)(x1, . . . , xn)ᵀ = (λµ)u   by the definition of scalar multiplication in Rn.

(V8) The element 1 in R has the property that 1xi = xi for all xi ∈ R. So

1 · u = (1x1, . . . , 1xn)ᵀ = (x1, . . . , xn)ᵀ = u.

So 1v = v for all v ∈ Rn.

Hence we have verified that Rn with addition and scalar multiplication as above is indeed a vector space.

Example 3.3. Fix a natural number n. Define Rn[x] to be the set of all polynomials f(x) of degree less than or equal to n with coefficients in R. So

Rn[x] = {f | f(x) = a0 + a1x + · · · + anxⁿ with ai ∈ R for i = 0, . . . , n},

which clearly is a non-empty set. Addition is defined on Rn[x] as follows. Let

f(x) = a0 + a1x + · · · + anxⁿ,
g(x) = b0 + b1x + · · · + bnxⁿ.

Then (f + g)(x) := (a0 + b0) + · · · + (an + bn)xⁿ. Clearly f + g ∈ Rn[x] since ai + bi ∈ R for i = 0, . . . , n. Hence Rn[x] is closed with respect to addition. Scalar multiplication is defined by (λf)(x) = (λa0) + · · · + (λan)xⁿ. As λai ∈ R, it follows that Rn[x] is closed with respect to scalar multiplication. Checking axioms (V1)-(V8) is left as an exercise to the reader. Note that the set of polynomials of (some fixed) degree exactly n does not form a vector space: for example, xⁿ and 1 − xⁿ both have degree n, but their sum is the constant polynomial 1.
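Identifying a polynomial a0 + a1x + · · · + anxⁿ with its coefficient list (a0, . . . , an), the two operations become coordinate-wise addition and scaling. A minimal Python sketch (the helper names are our own):

    def poly_add(f, g):
        """Add two polynomials in R_n[x], stored as coefficient lists of length n+1."""
        assert len(f) == len(g)
        return [a + b for a, b in zip(f, g)]

    def poly_scale(lam, f):
        """Multiply a polynomial by the scalar lam."""
        return [lam * a for a in f]

    f = [1.0, 0.0, 2.0]         # 1 + 2x^2 in R_2[x]
    g = [0.0, 3.0, -2.0]        # 3x - 2x^2
    print(poly_add(f, g))       # [1.0, 3.0, 0.0], i.e. 1 + 3x: the degree can drop
    print(poly_scale(2.0, f))   # [2.0, 0.0, 4.0]

Note how the sum of two polynomials of degree 2 can have smaller degree; this is exactly why the set of polynomials of fixed degree n is not closed under addition.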

Example 3.4. Consider the m × n matrices with entries in R, that is, consider

V = Mm×n(R) = {A | A = (aij), aij ∈ R, i = 1, . . . , m, j = 1, . . . , n}.

Then Mm×n(R) forms a vector space with component-wise addition (see Definition 2.4) and scalar multiplication (see Definition 2.6). The proof is left as an exercise. Hint: the element 0V is given in Example 2.3. Note that Proposition 2.8 shows some of the vector space axioms for Mm×n(R).


Example 3.5. There are many more examples of vector spaces, turning up in different areas of mathematics. Vector spaces are defined over any field K. Similarly to the above we have that:

(1) (K, +, ·) is a K-vector space,
(2) (Kn[x], +, ·) is a K-vector space,
(3) (Mm×n(K), +, ·) is a K-vector space.

The precise definitions of the sets and the binary operations are left to the reader. When doing this exercise it becomes apparent which properties a field needs to have. We also have the following variations of the above examples:

(1) (Rn, +, ·) is an R-vector space (by Example 3.2).
(2) (Cn, +, ·) is a C-vector space (by generalising Example 3.2 to fields).
(3) (C, +, ·) is an R-vector space.
(4) (R, +, ·) is a Q-vector space.

And so on. In this course we only consider vector spaces over the real numbers. Vector spaces over fields other than the real numbers are not part of the syllabus. From now on we will work only with vector spaces over R. It should however be noted that we could develop our theory equally for vector spaces over fields. The interested reader may try this as a (not so difficult, and eventually boring) exercise. Examples (3) and (4) are examples of so-called field extensions, something studied in the second year in the course “Fields”.

Example 3.6. Let X be a non-empty set. Let V be the set of all functions f : X → R; we write x ↦ f(x) for x ∈ X. Then V is a vector space over R with the following addition and scalar multiplication:

(a) Addition is defined by: for f1, f2 ∈ V define (f1 + f2)(x) := f1(x) + f2(x). Note that f1(x) ∈ R and f2(x) ∈ R, and hence f1(x) + f2(x) ∈ R. So f1 + f2 ∈ V and hence V is closed with respect to addition.
(b) Scalar multiplication is defined by: for f ∈ V and λ ∈ R define (λf)(x) = λ · f(x). Since λ ∈ R and f(x) ∈ R, this implies λ · f(x) ∈ R. Hence λf ∈ V and V is closed with respect to scalar multiplication.

In the case X = R, we denote the vector space V by R^R.
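These two operations can be mimicked in Python with closures: the sum and the scalar multiple of functions are again functions, which is exactly the closure property checked above. A sketch, with names of our own choosing:

    import math

    def add(f1, f2):
        """Pointwise sum: (f1 + f2)(x) = f1(x) + f2(x)."""
        return lambda x: f1(x) + f2(x)

    def scale(lam, f):
        """Pointwise scaling: (lam * f)(x) = lam * f(x)."""
        return lambda x: lam * f(x)

    h = add(math.sin, scale(2.0, math.cos))   # the function x -> sin(x) + 2*cos(x)
    print(h(0.0))                              # 2.0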


4. First Properties of Vector Spaces

Using the axioms (V1)-(V8) of vector spaces, we can now derive some first properties of vector spaces. Throughout this section, let V be a vector space over R.

Lemma 4.1. The following statements hold:

(i) The zero vector 0V is unique.
(ii) The additive inverse of any element in V is unique.

Proof. (i) Let 0V, 0′V ∈ V both have the property described in Axiom (V3), that is:

v + 0V = 0V + v = v for all v ∈ V,   (a)
v + 0′V = 0′V + v = v for all v ∈ V.   (b)

Put v = 0′V in (a); then 0′V + 0V = 0V + 0′V = 0′V. Put v = 0V in (b); then 0V + 0′V = 0′V + 0V = 0V. Hence 0V = 0′V.

(ii) Suppose we have two elements −v and v′ both satisfying Axiom (V4), that is:

v + (−v) = (−v) + v = 0V,   (a)
v + v′ = v′ + v = 0V.   (b)

Then

v + (−v) + v′ = (v + (−v)) + v′   by Axiom (V2),
= 0V + v′   by (a),
= v′   by Axiom (V3).

Now

v + (−v) + v′ = (v + v′) + (−v)   by Axioms (V1), (V2),
= 0V + (−v)   by (b),
= −v   by Axiom (V3).

Hence v′ = −v.

Lemma 4.2. Let V be a vector space over R. Then for all v ∈ V and λ ∈ R we have:

(i) 0R · v = 0V ;

(ii) λ · 0V = 0V ;

(iii) if λ · v = 0V then λ = 0R or v = 0V ;

(iv) (−1) · v = −v;

(v) (−1) · (−v) = v.

(vi) −0V = 0V .


Proof. (i) We need to show that 0R · v satisfies Axiom (V3). If so, then Lemma 4.1(i) implies 0R · v = 0V. Now

v + 0R · v = 1 · v + 0R · v by (V8),

= (1 + 0R) · v by (V6),

= 1 · v since 1 + 0R = 1 in R,

= v by (V8).

Therefore 0R · v = 0V .

(ii) Exercise.

(iii) Let λ · v = 0V and λ ≠ 0. We prove that v = 0V. Now

v = 1 · v   by (V8),
= (λ⁻¹ · λ) · v   as λ⁻¹ · λ = 1 in R and λ ≠ 0,
= λ⁻¹ · (λ · v)   by (V7),
= λ⁻¹ · 0V   by assumption,
= 0V   by (ii).

(iv)-(vi) Exercise.

Exercise 8. In each of the following cases, either give a careful proof that V is a vector space over R, or give a reason why it is not:

(a) V is the set of all polynomials over R (in one variable, say x) which have a non-zero constant term, with the usual addition of polynomials and the usual scalar multiplication.

(b) V is the set of all functions f : X → R (for some fixed non-empty set X), and if f, g ∈ V, α ∈ R, then the functions f + g, αf are defined by setting

(f + g)(x) = f(x) + g(x), (αf)(x) = αf(x).

(c) V is the set of all symmetric n × n matrices over R.
(d) V is the set of all skew-symmetric n × n matrices over R.
(e) V is the set of all invertible n × n matrices over R.
(f) V = R2 with the usual scalar multiplication and the new addition ⊕ : V × V → V given by

(x1, x2)ᵀ ⊕ (y1, y2)ᵀ = (x1 + y2, x2 + y1)ᵀ.

Exercise 9. Let V be a vector space over R. Use the vector space axioms to show that for all v ∈ V and all λ ∈ R the following hold:

(a) λ · 0V = 0V , (b) (−1)v = −v.


5. Subspaces of Vector Spaces

Definition and Examples. Given an object in mathematics, one typically also defines 'sub-objects' of this object. If the object is defined as a set with certain properties, then the sub-objects are subsets of the original object with the same properties. In this section we define 'sub-objects' for vector spaces. Throughout this section V denotes a vector space over R.

Definition 5.1. Let (V, +, ·) be a vector space over R. A non-empty subset W of V is a (vector) subspace if and only if (W, +, ·) is a vector space over R. We write W ≤ V and read this as W is a subspace of V.

Remarks. (a) Every vector space V has at least two subspaces: {0V} and V itself. Any subspace W of V with W ≠ {0V} and W ≠ V is called a proper subspace.

(b) The zero element of a subspace W of V always coincides with the zero element of V. To see this, use Lemma 4.1 and Definition 3.1 for V.

(c) The two binary operations (addition and scalar multiplication) needed to define a vector space structure on W are precisely the two binary operations given with the vector space V, restricted to the subset W. The definition now says that, in order to check that a subset of a vector space is again a vector space, we need to check that W is closed with respect to addition and scalar multiplication, and that the eight axioms (V1)-(V8) of a vector space hold for W. Just as W inherits the binary operations from V, several of the axioms hold automatically for the elements of a subset of a vector space.

Lemma 5.2 (First subspace test). A non-empty subset W of a vector space V is a subspace of V if and only if it is closed under addition and scalar multiplication:

(i) If w1, w2 ∈ W then w1 + w2 ∈ W.
(ii) If w ∈ W and λ ∈ R then λw ∈ W.

Proof. The proof consists of showing two statements.

“⇒”: For this direction of the proof, there is nothing to show. We assume that W ⊆ V is a vector space. By Definition 3.1 (applied to W), the set W is closed under addition and scalar multiplication. Hence (i) and (ii) hold.

“⇐”: Suppose W ⊆ V, W ≠ ∅, and (i), (ii) hold. We claim that W is a vector space. By (i) and (ii) we know that W is closed with respect to addition and scalar multiplication. We need to check that (V1)-(V8) hold for W.

(1) Axioms (V1), (V2) and (V5)-(V8) are inherited from V as W ⊆ V.
(2) Since W ≠ ∅, there exists w ∈ W. By assumption (ii) we know that 0Rw ∈ W. By Lemma 4.2 (applied to the vector space V) we have: 0Rw = 0V ∈ W. Axiom (V3) holds in V by assumption. Hence 0V + w = w for all w ∈ W ⊆ V. Hence 0W = 0V, and (V3) holds in W.
(3) Let w ∈ W ⊆ V. Then −w = (−1)w by Lemma 4.2 (applied to V). Assumption (ii) implies that −w ∈ W. Axiom (V4) holds in V, hence w + (−w) = 0V for all w ∈ W ⊆ V. Since 0W = 0V it follows that (V4) holds in W.


We call Lemma 5.2 the first subspace test. It obviously speeds up the checking that we have to do in order to prove that a given non-empty subset of a vector space is a subspace. Often you will see conditions (i), (ii) in Lemma 5.2 simplified into one condition. This is called the second subspace test.

Lemma 5.3 (Second subspace test). A non-empty subset W of a vector space V is a subspace if and only if for any λ1, λ2 ∈ R and w1, w2 ∈ W we have λ1w1 + λ2w2 ∈ W.

Proof. The proof consists of showing two statements.

“⇒”: If W ≠ ∅ is a vector space, then from the closure laws we have λ1w1 + λ2w2 ∈ W for any λ1, λ2 ∈ R and w1, w2 ∈ W.

“⇐”: Let W ⊆ V with W ≠ ∅. Assume λ1w1 + λ2w2 ∈ W for all w1, w2 ∈ W and λ1, λ2 ∈ R. We need to show that W is a vector space over R. It is sufficient to show that (i), (ii) of Lemma 5.2 hold. Given w1, w2 ∈ W:

(1) Choose λ1 = λ2 = 1. We calculate in the vector space V. Then by assumption, λ1w1 + λ2w2 = 1 · w1 + 1 · w2 = w1 + w2 lies in W. Here we used (V8) for V. This shows that W is closed with respect to addition.

(2) Choose λ1 = 1 and λ2 = 0. Then by assumption

λ1w1 + λ2w2 = 1 · w1 + 0 · w2 = w1

is an element of W. Here we used Lemma 4.2 and the axioms (V3) and (V8) for V. Hence W is closed with respect to scalar multiplication.

Lemma 5.2 now implies that W is a subspace of V .

Lemma 5.4. Let U be a subspace of V. For any k ∈ N and u1, . . . , uk ∈ U and α1, . . . , αk ∈ R we have

α1u1 + · · · + αkuk = ∑_{i=1}^{k} αiui ∈ U.

Proof. The proof is by induction on k, using Lemma 5.3. If k = 1 then the claim follows from the definition of a vector space, see Definition 3.1. If k = 2 then the claim follows from Lemma 5.3. Assume the claim is true for some k ≥ 2. Consider

x = α1u1 + . . . + αkuk + αk+1uk+1.

Put u = α1u1 + . . . + αkuk. By the induction assumption, u ∈ U. By (V8) we have u = 1 · u. Hence x = 1 · u + αk+1uk+1. By Lemma 5.3 we have x ∈ U.

Example 5.5. We give various examples of subspaces:

(1) R is a subspace of the R-vector space C = {a + bi | a, b ∈ R}.
(2) A subset U of R2 with {0} ≠ U ⊊ R2 is a subspace if and only if U is a straight line through the origin.


(3) The subspaces of R3 are precisely the following subsets of R3:
(a) the origin,
(b) lines through the origin,
(c) planes through the origin,
(d) and R3.

(4) Consider the set

W = {A ∈ Mn(R) | A = (aij) with aij = 0 for i > j}.

So elements in W are matrices A of the form

A =
[ a11  a12  · · ·  a1n ]
[  0   a22  · · ·  a2n ]
[  ⋮    ⋱    ⋱    ⋮  ]
[  0   · · ·   0   ann ]

with aij ∈ R. A matrix A of this form is called an upper triangular matrix. We claim: the set W is a subspace of Mn(R). We use Lemma 5.2 to show this. Note W ≠ ∅ since the zero matrix lies in W. If A = (aij), B = (bij) ∈ W then A + B = (cij) where cij = aij + bij; if aij = 0 for i > j and bij = 0 for i > j, then cij = 0 for i > j. Hence A + B ∈ W. Similarly for λ ∈ R: if A = (aij) with aij = 0 for i > j, then λaij = 0 for i > j. So λA ∈ W. By Lemma 5.2 it follows that W is a subspace of Mn(R).

(5) Let A ∈ Mm×n(R). Then W = {x ∈ Mn×1(R) | Ax = 0} is a subspace ofMn×1(R).
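For example (5), the subspace property can be observed numerically: every linear combination of solutions of Ax = 0 is again a solution. A NumPy sketch with an ad hoc choice of A:

    import numpy as np

    A = np.array([[1, 1, 1]])     # W = {x in R^3 : Ax = 0}, a plane through 0
    x = np.array([1, -1, 0])      # A @ x = 0, so x lies in W
    y = np.array([0, 1, -1])      # A @ y = 0, so y lies in W
    z = 3.5 * x - 2.0 * y         # an arbitrary linear combination
    print(A @ z)                   # [0.]: z again lies in W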

Example 5.6. We next give examples of subsets of vector spaces which are not subspaces.

(1) Let V = R3 and

W = {(x, y, 0)ᵀ | x ≥ 0, y ≥ 0}.

So the elements of W consist of precisely all points in the first quadrant of the x-y-coordinate plane. Let v = (1, 1, 0)ᵀ. Then −v = (−1, −1, 0)ᵀ. Here v lies in W but the additive inverse −v ∉ W. Hence (V4) does not hold, and W is not a subspace of V.

(2) Let V be the set of square matrices of size n. Then the subset of invertible matrices of V is not a subspace (for example, it does not contain the zero matrix).
(3) Let V be the set of square matrices of size n. Then the subset of orthogonal matrices of V is not a subspace (again, it does not contain the zero matrix).

Intersection, union and sum of subspaces. Given two subspaces U, W of a vector space V, let U ∩ W be the set-theoretic intersection of the sets U and W, and let U ∪ W be the set-theoretic union of the sets U and W:

U ∩ W = {v ∈ V | v ∈ U and v ∈ W},
U ∪ W = {v ∈ V | v ∈ U or v ∈ W}.

Definition 5.7. Let U, W be subspaces of a vector space V. The sum of the subspaces U, W is the set

U + W = {u + w | u ∈ U,w ∈W}.

Clearly U + W is a subset of V .

Example 5.8. Let V = R2 and let U be the x-axis and W be the y-axis:

U = {(x, 0)ᵀ | x ∈ R},   W = {(0, y)ᵀ | y ∈ R}.

Then U and W are subspaces of V and

U ∩ W = {(0, 0)ᵀ} ≠ ∅,
U ∪ W = {(x, 0)ᵀ | x ∈ R} ∪ {(0, y)ᵀ | y ∈ R},
U + W = {(x, 0)ᵀ + (0, y)ᵀ | x, y ∈ R} = {(x, y)ᵀ | x, y ∈ R} = R2.

The sets U ∩ W and U + W are vector spaces, and hence subspaces of V. The set U ∪ W is not a vector space. It is not closed under addition: (1, 0)ᵀ ∈ U and (0, 1)ᵀ ∈ W, hence (1, 0)ᵀ, (0, 1)ᵀ ∈ U ∪ W. However

(1, 0)ᵀ + (0, 1)ᵀ = (1, 1)ᵀ ∉ U ∪ W.

Example 5.9. Let V = M2×2(R), the set of 2 × 2 matrices with entries in R. Let

U = { [ a b ; 0 0 ] | a, b ∈ R },   W = { [ x 0 ; y 0 ] | x, y ∈ R }

(rows of the 2 × 2 matrices separated by semicolons). Then U and W are subspaces of V, and

U + W = { [ z b ; y 0 ] | z, b, y ∈ R },
U ∩ W = { [ a 0 ; 0 0 ] | a ∈ R },

since a + x = z ranges over the whole of R as a and x do. The sets U ∩ W and U + W are vector spaces, and hence subspaces of V. The set U ∪ W is not a vector space: it is not closed under addition, similarly to the previous example.


The proofs of the following (important) proposition are left as an exercise to the reader.

Proposition 5.10. Let U,W be subspaces of a vector space V .

(a) Then U ∩ W is a subspace of V.
(b) Then U + W is a subspace of V.
(c) Then U ∪ W is a subspace of V if and only if W ⊆ U or U ⊆ W.

Remark. In general U ∪ W is not a subspace of V, but as a set of vectors in V it will generate a subspace. The space generated by U ∪ W is precisely the sum of U and W. The sum U + W is in fact the smallest subspace containing both U and W. What it means for a set of elements to generate a vector space is explained in the next section.

Exercise 10. Let U,W be subspaces of a vector space V . Prove that

(a) U ∩ W is a subspace of V;
(b) U + W = {u + w | u ∈ U, w ∈ W} is a subspace of V;
(c) U ∪ W is a subspace of V if and only if U ⊂ W or W ⊂ U.

Exercise 11. Let V = R[x], the vector space of all real polynomials in one variable x. Determine whether or not U is a subspace of V when:

(a) U consists of all polynomials with degree ≥ k for fixed k, together with the zero polynomial;
(b) U consists of all polynomials with only even powers of x;
(c) U consists of all polynomials with integral coefficients;
(d) U consists of all polynomials p(x) ∈ R[x] with p(1) = p(5).

Exercise 12. (a) Let α ∈ R. Prove that Uα = {(x1, x2, x3) ∈ R3 | x1 + x2 + x3 = α} is a subspace of R3 if and only if α = 0.

(b) Is the set U = {(x1, x2, x3, x4) ∈ R4 | x1² = 2x2 and x1 + x2 = x3 + x4} a subspace of R4? Justify your answer.

Exercise 13. (a) Let S be the subset {(x, 0) | x ∈ R and x > 0} of R2. Is S a subspace of the vector space R2 with respect to the usual scalar multiplication and the usual addition of R2?

(b) Let S be the subset {(x, 0) | x ∈ R and x > 0} of R2. Define the scalar multiplication ∗ and addition ⊕ on S by:

α ∗ (u, 0) = (u^α, 0),   (u, 0) ⊕ (v, 0) = (uv, 0)

for all α, u, v ∈ R with u, v > 0. Show that S is a vector space with respect to ∗ and ⊕. Is (S, ∗, ⊕) a subspace of R2?

Exercise 14. If A is a real n × n matrix, prove that {x ∈ Mn×1(R) | Ax = 0} is a subspace of Rn.

Exercise 15. For each of the following statements about subspaces X, Y, Z of a vector space V either give a proof of the statement, or find a counterexample. R2 and R3 will provide all the counterexamples required.

(a) V \X is never a subspace of V ;


(b) (X ∩ Y) + (X ∩ Z) = X ∩ (Y + Z);
(c) (X + Y) ∩ (X + Z) = X + (Y ∩ Z);
(d) if Y ⊆ X, then Y + (X ∩ Z) = X ∩ (Y + Z).


6. Linear Dependence, Linear Independence and Spanning

Throughout this section V denotes a vector space over R. Whenever we reformulate equations involving vectors in this (or a later) section, the reader is urged to determine which vector space axioms or earlier lemmas have been applied to obtain the reformulated equation.

Spanning.

Definition 6.1. Let S = {v1, . . . , vn} be a subset of a vector space V .

(1) We call any expression of the form λ1v1 + · · · + λnvn with λi ∈ R a linear combination of v1, . . . , vn.

(2) The span of S, denoted by Span(S) or Span{v1, . . . , vn} or 〈v1, . . . , vn〉, is the set of all linear combinations of v1, . . . , vn:

〈v1, . . . , vn〉 = {λ1v1 + · · · + λnvn | λi ∈ R}.

Example 6.2. We give various examples of vectors spanning a vector space. In particular the examples show that a vector space V has many different spanning sets (also called generating systems).

(1) Let V = Rn. Define the vector ei ∈ V by letting all coordinates be zero except the ith coordinate, which is one:

ei = (0, . . . , 0, 1, 0, . . . , 0)ᵀ   (with the 1 in the ith coordinate).

Then Span{ei | 1 ≤ i ≤ n} = Rn since for any vector (x1, . . . , xn)ᵀ ∈ V we have (x1, . . . , xn)ᵀ = ∑_{i=1}^{n} xi ei.

(2) Define the vectors vi = ∑_{j=1}^{i} ej for 1 ≤ i ≤ n. Then Span{vi | 1 ≤ i ≤ n} = Rn. The proof is left as an exercise to the reader.

(3) Let V = R3. Then

V = Span{(1, 0, 0)ᵀ, (0, 1, 0)ᵀ, (0, 0, 1)ᵀ}
  = Span{(1, 0, 0)ᵀ, (1, 1, 0)ᵀ, (1, 1, 1)ᵀ}
  = Span{(1, 0, 0)ᵀ, (0, 1, 0)ᵀ, (1, 1, 0)ᵀ, (0, 1, 1)ᵀ}.


So the spanning sets of a vector space V can have different cardinalities.

(4) Let V = Mn(R). Define Eij = (akl)1≤k≤n,1≤l≤n by akl = 1 if k = i and l = j, and akl = 0 otherwise. Then V = Span{Eij | 1 ≤ i ≤ n, 1 ≤ j ≤ n} since any matrix B = (bij) satisfies B = ∑_{i=1}^{n} ∑_{j=1}^{n} bij Eij.

Proposition 6.3. Let V be a vector space over R and S = {v1, . . . , vn} ⊆ V. Then Span(S) is a subspace of V. It is the smallest subspace of V containing S.

Proof. Write X = Span{v1, . . . , vn}. Note that X ≠ ∅ as S ⊆ X. Let u, w ∈ X. Then there exist αi, βi ∈ R with

u = ∑_{i=1}^{n} αi vi,   w = ∑_{i=1}^{n} βi vi,

see Definition 6.1. Let λ, µ ∈ R. Then

λu + µw = λ(∑_{i=1}^{n} αi vi) + µ(∑_{i=1}^{n} βi vi) = (λα1 + µβ1)v1 + · · · + (λαn + µβn)vn ∈ X,

since λαi + µβi ∈ R. By Lemma 5.3 it follows that X is a subspace. Now let Y be any subspace of V containing S. Then any linear combination of the vectors vi lies in Y, see Lemma 5.4. Hence X ⊆ Y. Since X is itself a subspace of V containing S, it follows that X is the smallest such subspace.

Remark. At this point we can revisit the last remark given in Section 5. It is now an easy exercise to show that the space generated by U ∪ W is precisely the sum of U and W. It then follows from the last proposition that it is the smallest subspace containing both U and W.

Linear (in)dependence.

Definition 6.4. Let V be a vector space over R and let {v1, . . . , vn} ⊂ V.

(1) We say {v1, . . . , vn} is linearly dependent if there exist scalars λ1, . . . , λn ∈ R, not all zero, such that λ1v1 + · · · + λnvn = 0.
(2) We say {v1, . . . , vn} is linearly independent if whenever α1v1 + · · · + αnvn = 0 for αi ∈ R, then αi = 0 for 1 ≤ i ≤ n.

Remark 6.5. (1) The phrase “not all zero” means that at least one of the λi is not zero.
(2) Note that a set of vectors is linearly independent if and only if it is not linearly dependent.
(3) By convention the empty set is linearly independent.
(4) If vi = 0 for some i then the set {v1, . . . , vn} is linearly dependent: take λi = 1 and λj = 0 for j ≠ i; then ∑_{k=1}^{n} λkvk = 0.


Example 6.6. Let V = R3. Let

v1 = (1, 0, 0)ᵀ,   v2 = (0, 1, 0)ᵀ   and   v3 = (−2, 1, 0)ᵀ.

(1) Then {v1, v2, v3} is linearly dependent as (−2)v1 + v2 + (−1)v3 = 0.
(2) Then {v1, v2} is linearly independent.

Proof. Assume α1v1 + α2v2 = 0 for α1, α2 ∈ R. Then

(0, 0, 0)ᵀ = α1(1, 0, 0)ᵀ + α2(0, 1, 0)ᵀ = (α1, α2, 0)ᵀ.

Hence α1 = 0 and α2 = 0.
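A convenient computational test (it relies on the notion of rank, which is developed later in the course) is that column vectors v1, . . . , vk in Rn are linearly independent exactly when the matrix with these columns has rank k. A NumPy sketch applied to this example:

    import numpy as np

    v1, v2, v3 = [1, 0, 0], [0, 1, 0], [-2, 1, 0]
    M = np.column_stack([v1, v2, v3])
    print(np.linalg.matrix_rank(M))                           # 2 < 3: dependent
    print(np.linalg.matrix_rank(np.column_stack([v1, v2])))   # 2: independent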

Lemma 6.7. Let V be a vector space over R, and let S = {v1, . . . , vn} ⊆ V .

(1) If S is linearly independent then any T ⊆ S is linearly independent.
(2) If S is linearly dependent then any finite T with S ⊆ T ⊆ V is linearly dependent.

Proof. We prove the first statement. Assume there is a subset T ⊆ S which is linearly dependent. Without loss of generality we may assume that T = {v1, . . . , vk} with k ≤ n. Then there exist ai ∈ R for 1 ≤ i ≤ k with a1v1 + · · · + akvk = 0, and not all ai zero. Extend this to a relation between v1, . . . , vk, . . . , vn:

0 = a1v1 + · · ·+ akvk + 0vk+1 + · · ·+ 0vn.

Here not all the ai are zero. This implies {v1, . . . , vn} is linearly dependent, a contradiction. Hence any subset T ⊆ S is linearly independent. The second statement is equivalent to the first one.

Proposition 6.8. Let n ≥ 2. The vectors v1, . . . , vn are linearly dependent if and only if one of them can be expressed as a linear combination of the others.

Proof. “⇒”: Suppose v1, . . . , vn are linearly dependent. Then there is a relation

λ1v1 + · · · + λnvn = 0,

and not all λi are zero. Suppose λj ≠ 0. Then

λjvj = − ∑_{l=1, l≠j}^{n} λl vl.

Since λj ∈ R with λj ≠ 0, the inverse 1/λj exists. So

vj = −(1/λj) ∑_{l=1, l≠j}^{n} λl vl = ∑_{l=1, l≠j}^{n} (−λl/λj) vl.

Hence vj is a linear combination of the other vectors.


“⇐”: Suppose the vector vk is a linear combination of v1, . . . , vk−1, vk+1, . . . , vn. Then there exist αi ∈ R for 1 ≤ i ≤ n, i ≠ k, with

vk = ∑_{i=1, i≠k}^{n} αi vi.

Hence

0 = ∑_{i=1, i≠k}^{n} αi vi + (−1)vk.

Let αk = −1. Then 0 = ∑_{i=1}^{n} αi vi and v1, . . . , vn are linearly dependent.

Example 6.9. For the first two examples let V = Rn.

(1) For 1 ≤ i ≤ n, define the vector ei ∈ V as above, by letting all coordinates be zero except the ith coordinate, which is one. Then {e1, . . . , en} is linearly independent. Any subset of {e1, . . . , en} is also linearly independent.
(2) The vectors v1 = (1, 0, . . . , 0)ᵀ, v2 = (1, 1, . . . , 0)ᵀ, . . . , vn = (1, 1, . . . , 1)ᵀ are linearly independent. The proof is left as an exercise to the reader.
(3) Let 1 ≤ i, j ≤ n and let V = Mn(R). Define Eij = (akl)1≤k≤n,1≤l≤n by akl = 1 if k = i and l = j, and akl = 0 otherwise. Then {Eij | 1 ≤ i ≤ n, 1 ≤ j ≤ n} is linearly independent.

Remark. We have worked with finite subsets S of a vector space, both for spanning and for linear independence. In fact, if we wanted to include infinite sets S in our study, we would need to be more careful with our definitions. If S is any (possibly infinite) subset of a vector space V, we define the span of S to be the set of all linear combinations of finite subsets of S. We say an infinite family of vectors S is linearly independent if each finite subset of S is linearly independent. In this course we work only with so-called finite dimensional vector spaces. It will therefore be enough to always assume that S is finite.

Exercise 16. (a) Which of the following sets of vectors in R3 are linearly independent?
(i) {(1, 3, 0), (2, −3, 4), (3, 0, 4)},
(ii) {(1, 2, 3), (2, 3, 1), (3, 1, 2)}.

(b) Which of the following sets of vectors in V = {f : R → R} are linearly independent?
(i) {f, g, h} with f(x) = 5x² + x + 1, g(x) = 2x + 3 and h(x) = x² − 1.
(ii) {p, q, r} with p(x) = cos²(x), q(x) = cos(2x) and r(x) = 1.


(c) Determine all α ∈ R for which the set {(1, α, α), (α, 1, α), (α, α, 1)} is linearly independent.

Exercise 17. Let V be an R-vector space, n ∈ N and v1, . . . , vn ∈ V. Define vectors wi for 1 ≤ i ≤ n by

wi = ∑_{j=1}^{i} vj.

(a) Show that Span{v1, . . . , vn} = Span{w1, . . . , wn}.
(b) Show that {w1, . . . , wn} is linearly independent if and only if {v1, . . . , vn} is linearly independent.

Exercise 18. Consider the vector space R3.

(a) Find four vectors a, a′, b, b′ ∈ R3 such that if A = Span{a, a′} and B = Span{b, b′} then A + B = R3 and A ∩ B = Span{(1, 1, 1)}.
(b) Are there vectors c, c′, d, d′ ∈ R3 such that if C = Span{c, c′} and D = Span{d, d′} then C + D = {(x, y, z) | x + 2y + 3z = 0} and C ∩ D = Span{(1, 1, −1), (5, −1, −1)}?


7. Bases of Vector Spaces

Throughout this section, let V be a vector space over R. In the previous two sections, we introduced the span of a finite set of vectors, and we studied what it means for a finite set of vectors to be linearly independent. These two concepts come together in the basis of a vector space.

Definition 7.1. Let V be a vector space over R and let v1, . . . , vn be elements in V such that:

(1) V = Span{v1, . . . , vn},
(2) {v1, . . . , vn} is linearly independent.

Then we say {v1, . . . , vn} is a basis of V.

Example 7.2. Let ei and vi be defined as in Example 6.2(1) and 6.2(2).

(1) Rn has basis {ei | 1 ≤ i ≤ n}. See Example 6.2(1) and Example 6.9(1).
(2) Rn has basis {vi | 1 ≤ i ≤ n}. See Example 6.2(2) and Example 6.9(2).
(3) Mm×n(R) has basis {Eij | 1 ≤ i ≤ m, 1 ≤ j ≤ n}. See Example 6.2(4) and Example 6.9(3).
(4) C as an R-vector space has basis {1, i}.
(5) Claim: Rn[x] has basis {1, x, . . . , xⁿ}.

Proof.

(a) Let f(x) ∈ Rn[x]. Then f(x) = a0 + a1x + · · · + anxⁿ for some ai ∈ R, and clearly f(x) ∈ Span{1, x, . . . , xⁿ}.
(b) The vectors {1, x, . . . , xⁿ} are linearly independent: assume λi ∈ R with λ0 + λ1x + · · · + λnxⁿ = 0. Then the polynomial f(x) := λ0 + λ1x + · · · + λnxⁿ is zero for every x ∈ R. The fundamental theorem of algebra implies that a non-zero polynomial of degree at most n has at most n roots over C (roots of f(x) are by definition those values x with f(x) = 0), and hence at most n roots over R. Since R has more than n elements, this implies that f(x) is the zero polynomial, that is

λ0 = λ1 = · · · = λn = 0.

Proposition 7.3. Let {v1, . . . , vn} be a basis of a vector space V. Then every element v ∈ V has a unique expression as a linear combination of v1, . . . , vn.

Proof. By Definition 7.1, {v1, . . . , vn} is a spanning set. So any v ∈ V is expressible as

(1)   v = a1v1 + · · · + anvn

for ai ∈ R. Assume v = b1v1 + · · · + bnvn with bi ∈ R is another expression of v. Then

0 = v − v = (a1 − b1)v1 + · · · + (an − bn)vn.


Since v1, . . . , vn are linearly independent, this implies ai − bi = 0 for all i. Hence ai = bi for all i, and the expression for every element v ∈ V is unique.

Definition 7.4. We call the column vector

(a1, . . . , an)^T

defined by Equation (1) the coordinate vector of v with respect to the basis {v1, . . . , vn}, and ai is called the ith coordinate of v.
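Remark. Coordinates can be found by solving a linear system: if P is the matrix whose columns are the basis vectors, the coordinate vector of v is the unique solution of Px = v. A minimal computational sketch, using Python with the sympy library (the library and the particular basis below are assumptions of this illustration, not part of the course):

    from sympy import Matrix

    # an example basis of R^3, written as the columns of P
    P = Matrix([[1, 1, 1],
                [1, -1, 1],
                [0, 1, 1]])
    v = Matrix([2, 0, 1])
    coords = P.solve(v)   # unique solution of P*coords = v, by Proposition 7.3
    print(coords)         # Matrix([[1], [1], [0]]): v = 1*v1 + 1*v2 + 0*v3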

Proposition 7.5. Let {v1, . . . , vk} be a finite subset of a vector space V. The following statements are equivalent:

(1) {v1, . . . , vk} is a maximal linearly independent set.
(2) {v1, . . . , vk} is a minimal spanning set.
(3) {v1, . . . , vk} is a linearly independent spanning set (= a basis).

Proof. We prove that (1) ⇔ (3) and (3) ⇔ (2).

• (3) ⇒ (1): Let {v1, . . . , vk} be a basis of V. By Definition 7.1, {v1, . . . , vk} is linearly independent. By Definition 7.1, any v ∈ V is expressible as v = a1v1 + · · · + akvk, for some ai ∈ R. Hence for any v ∈ V, the set {v1, . . . , vk, v} is linearly dependent by Proposition 6.8. Hence {v1, . . . , vk} is maximal linearly independent.

• (1) ⇒ (3): Assume {v1, . . . , vk} is maximal linearly independent. So in particular {v1, . . . , vk} is linearly independent. By assumption, for any v ∈ V which is adjoined to {v1, . . . , vk}, we get a linearly dependent set. Hence there exist ai ∈ R with a1v1 + · · · + akvk + ak+1v = 0 and not all ai are zero. Assume ak+1 = 0. Then a1v1 + · · · + akvk = 0 with not all ai zero. This contradicts {v1, . . . , vk} being linearly independent. Hence ak+1 ≠ 0. So v = −(1/ak+1)(a1v1 + · · · + akvk), and hence v ∈ Span{v1, . . . , vk}. So {v1, . . . , vk} spans V, and hence is a basis of V.

• (3) ⇒ (2): We assume {v1, . . . , vk} is a basis of V. Suppose {v1, . . . , vk} is not minimal as a spanning set. Then there exists a vector in {v1, . . . , vk} which is not necessary in order to span V. Say this is vk. Then {v1, . . . , vk−1} spans V, and hence in particular vk = c1v1 + · · · + ck−1vk−1 for some ci ∈ R. But by Proposition 6.8, this defines a linear dependence between the elements {v1, . . . , vk}, which contradicts the assumption. Hence {v1, . . . , vk} is a minimal spanning set.

• (2) ⇒ (3): We assume that {v1, . . . , vk} is a minimal spanning set. We need to show that {v1, . . . , vk} is linearly independent. Assume not. Then there exist elements ci ∈ R, not all zero, such that c1v1 + · · · + ckvk = 0. We may assume that c1 ≠ 0. Then v1 = −(1/c1)(c2v2 + · · · + ckvk). Hence v1 ∈ Span{v2, . . . , vk}, which contradicts the fact that {v1, . . . , vk} was a minimal spanning set. So our assumption was wrong and {v1, . . . , vk} is indeed linearly independent.


Corollary 7.6. Let V be a vector space over R. If S = {v1, . . . , vr} spans V, then a subset of S forms a basis of V; this means, there exist indices i1, . . . , in ∈ {1, . . . , r} such that {vi1, . . . , vin} ⊆ S is a basis of V.

Proof. This follows from Proposition 7.5(2). Roughly speaking, we need to delete vectors from the spanning set S until we reach a set T ⊆ S which is a minimal spanning set. To choose a linearly dependent vector which can be deleted from the spanning set without changing the span of the set, use Proposition 6.8. The details are left to the reader.

Example 7.7. We demonstrate the idea of the proof of the last corollary on an example. Let V = R3. Define

v1 = (1, 0, 0)^T, v2 = (0, 1, 0)^T, v3 = (1, 1, 0)^T, v4 = (0, 1, 1)^T, v5 = (1, 0, 1)^T.

Find a linear dependence involving some (or all) of the five vectors, for example v3 = v1 + v2. Hence Span{v1, v2, v3, v4, v5} = Span{v1, v2, v4, v5}. We next look for a linear dependence of the remaining four vectors, for example v1 − v2 + v4 = v5. Hence Span{v1, v2, v3, v4, v5} = Span{v1, v2, v4}, and the latter is a minimal spanning set.

There are various other ways to obtain a minimal spanning set. Here is one such alternative. Note that v1 = v3 − v2. Hence Span{v1, v2, v3, v4, v5} = Span{v2, v3, v4, v5}. Next we use that v2 = (1/2)(v3 + v4 − v5). Hence Span{v1, v2, v3, v4, v5} = Span{v3, v4, v5}, and the latter is a minimal spanning set.
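Remark. The deletion procedure of Corollary 7.6 can be mechanised: place the spanning vectors as the columns of a matrix and row reduce (row reduction is treated in Section 13); the pivot columns give a minimal spanning subset. A minimal sketch, assuming the Python library sympy is available:

    from sympy import Matrix

    # the five vectors of Example 7.7 as the columns of a matrix
    S = Matrix([[1, 0, 1, 0, 1],
                [0, 1, 1, 1, 0],
                [0, 0, 0, 1, 1]])
    _, pivots = S.rref()                 # row reduced echelon form, pivot columns
    print(pivots)                        # (0, 1, 3): columns v1, v2, v4
    basis = [S.col(j) for j in pivots]   # a minimal spanning subset {v1, v2, v4}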

Exercise 19. Which of the following systems of vectors of R3 are linearly independent, which form a generating system, and which form a basis of R3?

(i) {(1, −1, 0), (0, 1, −1)};
(ii) {(1, −1, 0), (0, 1, −1), (1, 0, −1)};
(iii) {(1, a, 1 − a) | a ∈ R};
(iv) {(1, a, a^2) | a ∈ R};
(v) the set of all (x, y, z) with x, y, z ∈ R and x + 2y + 3z = 1.

Exercise 20. Show that

(a) the matrices

(1 0)    (0 1)    (1 0)    (0 1)
(0 1),   (1 0),   (1 0),   (1 −1)

form a basis of M2(R);

(b) the polynomials 1, 1 + x, 1 + x + x^2, . . . , 1 + x + · · · + x^n form a basis of Rn[x], the polynomials of degree at most n in one variable x.

Exercise 21. (a) Consider the vectors v1 = (3, 5, 7), v2 = (2, 1, 2), v3 = (1, 4, 5), v4 = (5, 3, 2) and v5 = (−4, −2, −4) in R3. List all subsets S ⊆ {v1, v2, v3, v4, v5} which form a basis of R3.


(b) Extend the set {(8, 2, 5), (−3,−5, 9)} to a basis of R3.

Exercise 22. Let V, W be vector spaces over R. Consider the cartesian product V × W with componentwise addition and scalar multiplication:

(v1, w1) + (v2, w2) = (v1 + v2, w1 + w2), λ · (v1, w1) = (λv1, λw1)

for all v1, v2 ∈ V, w1, w2 ∈ W and λ ∈ R (this defines a vector space structure on V × W). Let S be a basis of V and T be a basis of W. Give a basis for the vector space V × W and justify your answer.

Exercise 23. (i) Let S and T be subsets of a vector space V. Which of the following statements are true? Give reasons.

(a) Span(S ∩ T) = Span(S) ∩ Span(T);
(b) Span(S ∪ T) = Span(S) ∪ Span(T);
(c) Span(S ∪ T) = Span(S) + Span(T).

(ii) Let U1, U2 be subspaces of a vector space V, and let X1, X2 be bases of U1 and U2. Which of the following statements are correct? Justify your answer.

(a) X1 ∪ X2 generates U1 + U2.
(b) X1 ∪ X2 is linearly independent in U1 + U2.
(c) X1 ∩ X2 generates U1 ∩ U2.
(d) X1 ∩ X2 is linearly independent in U1 ∩ U2.
(e) X1 ⊆ X2 if and only if U1 ⊆ U2.
(f) If U1 ∩ U2 = {0}, then X1 and X2 are disjoint.


8. Steinitz Exchange Procedure

Let V be a vector space over R. We would like to define the dimension of a vector space V to be the cardinality of any basis of V. To do this, we need to know that two bases of the same vector space have the same cardinality. To prove such a result, we use the exchange procedure going back to E. Steinitz. In this course we only deal with vector spaces that have a finite basis.

Lemma 8.1 (Steinitz Exchange Lemma). Let {v1, . . . , vn} be a basis of a vector space V. Let

(2) w = λ1v1 + · · · + λnvn

with λi ∈ R. If there exists an index k with 1 ≤ k ≤ n and λk ≠ 0, then the set {v1, . . . , vk−1, w, vk+1, . . . , vn} is a basis of V.

Proof. Without loss of generality, we assume k = 1, so λ1 ≠ 0. We want to show that {w, v2, . . . , vn} is a basis of V.

(1) Let v ∈ V. Since {v1, . . . , vn} is a basis of V, we have v = µ1v1 + · · · + µnvn for some µi ∈ R. Since λ1 ≠ 0, we get from Equation (2):

v1 = (1/λ1)w − (λ2/λ1)v2 − · · · − (λn/λ1)vn

and hence

v = µ1((1/λ1)w − (λ2/λ1)v2 − · · · − (λn/λ1)vn) + µ2v2 + · · · + µnvn
  = (µ1/λ1)w + (µ2 − µ1λ2/λ1)v2 + · · · + (µn − µ1λn/λ1)vn.

Hence v ∈ Span{w, v2, . . . , vn}, and so Span{w, v2, . . . , vn} = V.

(2) To show that {w, v2, . . . , vn} is linearly independent, assume

µw + µ2v2 + · · · + µnvn = 0

for µ, µ2, . . . , µn ∈ R. Then

0 = µ(λ1v1 + · · · + λnvn) + µ2v2 + · · · + µnvn
  = µλ1v1 + (µλ2 + µ2)v2 + · · · + (µλn + µn)vn.

Since {v1, . . . , vn} is linearly independent, this implies

µλ1 = µλ2 + µ2 = . . . = µλn + µn = 0.

As λ1 ≠ 0, it follows that µ = 0, and hence µ2 = . . . = µn = 0. So {w, v2, . . . , vn} is linearly independent.

We now apply the Steinitz exchange lemma repeatedly to a set of linearly independent vectors. We call this repeated use the Steinitz exchange procedure.


Proposition 8.2. Let S = {v1, . . . , vn} be a basis of a vector space V and let {w1, . . . , wr} be linearly independent vectors. Then r ≤ n and there are indices i1, . . . , ir ∈ {1, . . . , n} such that after exchanging in S

vi1 against w1, vi2 against w2, . . . , vir against wr,

we obtain a set T which is again a basis of V. If we rearrange vectors so that i1 = 1, i2 = 2, . . . , ir = r, then T = {w1, . . . , wr, vr+1, . . . , vn}.

Remark. Note that the inequality r ≤ n is part of the claim.

Proof. By induction on r.

(1) If r = 0, nothing has to be shown (induction beginning). Let r ≥ 1 and assume the statement is true for r − 1. In particular: if {w1, . . . , wr} is linearly independent, then by Lemma 6.7 also {w1, . . . , wr−1} is linearly independent. The induction assumption – after rearranging the vectors – then says that {w1, . . . , wr−1, vr, . . . , vn} is a basis of V and r − 1 ≤ n.

(2) We prove next that r ≤ n. By the induction assumption we know that r − 1 ≤ n. Assume r − 1 = n. In this case (1) says that {w1, . . . , wr−1} is a basis of V. By assumption {w1, . . . , wr−1, wr} is also linearly independent, which provides a contradiction to Proposition 7.5. Hence r − 1 ≠ n, and so indeed r ≤ n.

(3) Since {w1, . . . , wr−1, vr, . . . , vn} is a basis of V by the induction assumption, we can express wr as a linear combination, say

wr = λ1w1 + · · · + λr−1wr−1 + λrvr + · · · + λnvn,

with λi ∈ R. Assume λr = . . . = λn = 0. This would give

wr = λ1w1 + · · · + λr−1wr−1.

By Proposition 6.8, this contradicts the linear independence of {w1, . . . , wr}. So at least one element of {λr, λr+1, . . . , λn} is non-zero. Without loss of generality assume that λr ≠ 0. Using Lemma 8.1 we can exchange vr against wr, and obtain that {w1, . . . , wr−1, wr, vr+1, . . . , vn} is a basis of V.

We next collect some important consequences of the Steinitz exchange procedure.

Theorem 8.3. Any two finite bases of a vector space V have the same number of elements.

Proof. Assume B = {v1, . . . , vn} is a finite basis of V and B′ is any other basis of V with at least n elements. Pick n elements of B′, say {w1, . . . , wn}. Then {w1, . . . , wn} is linearly independent, see Lemma 6.7. Now apply Steinitz's exchange procedure: by Proposition 8.2 it then follows that {w1, . . . , wn} is a basis of V. Since {w1, . . . , wn} ⊆ B′, and B′ is also a basis of V, it follows that B′ = {w1, . . . , wn}, and hence B and B′ both have precisely n elements.

Remark. It should be noted that this last proof also indicates that it is not so difficult to prove that if a vector space V has a finite basis, then any basis of V is finite. For this purpose we would need to generalise some of the earlier definitions (see the Remark at the end of Section 6) and statements to infinite sets. We call a vector space V with a finite basis finitely generated. In this course we only deal with finitely generated vector spaces.

Theorem 8.4. Let V be a finitely generated vector space and let S = {w1, . . . , wr} ⊆ V be linearly independent. Then V has a basis B with S ⊆ B.

Proof. By assumption, the vector space V has a finite basis, say {v1, . . . , vn}. Apply Proposition 8.2. Then, after possibly a suitable rearrangement of the vectors, the set B = {w1, . . . , wr, vr+1, . . . , vn} forms a basis of V, and S ⊆ B.

Remark. To prove Theorem 8.4 for infinite families of vectors (for a vector space which is not finitely generated) is more complicated. The result of Theorem 8.4 would still be true in the infinite setting; however, it requires a not entirely unproblematic tool from set theory, called Zorn's lemma. Examples of vector spaces with no finite basis are:

(1) V = R[t] as a vector space over R.
(2) V = R as a vector space over Q.

Example 8.5. We conclude this section with an example on extending a finite linearly independent set to a basis. To do so we use Theorem 8.4. Let V = R3. Let

v1 = (1, 1, 0)^T, v2 = (0, 1, 1)^T.

Let B = {e1, e2, e3} be the canonical basis of V, see Example 7.2(1). Then using the Steinitz exchange procedure, we can construct a basis containing {v1, v2}. Write v1 = e1 + e2 + 0 · e3. Then we can exchange e1 in B with v1. Next write v2 = 0 · v1 + e2 + e3. Hence we can exchange e2 with v2. We obtain the basis B′ = {v1, v2, e3} of V. Note that a basis extending the set {v1, v2} is by no means unique.
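Remark. The extension can likewise be mechanised: list the linearly independent vectors first, followed by a known basis, and keep the pivot columns after row reduction. A minimal sketch, assuming the Python library sympy; since extensions are not unique, the basis found may differ from the one obtained by hand above:

    from sympy import Matrix, eye

    v1 = Matrix([1, 1, 0]); v2 = Matrix([0, 1, 1])
    # the independent vectors first, then the canonical basis e1, e2, e3
    M = Matrix.hstack(v1, v2, eye(3))
    _, pivots = M.rref()
    print([M.col(j).T for j in pivots])
    # pivots = (0, 1, 2): the basis {v1, v2, e1} -- another valid extension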

Exercise 24. Consider the vector space R4. Let

U1 := Span{(1, 1, 3, 2), (0, 1, 0, 2)},
U2 := Span{(2, 2, 4, 0), (2, −1, 3, 1), (2, 1, 1, 1)}.

(a) Find a basis B of U1 ∩ U2.
(b) Use the exchange procedure of Steinitz to get a basis Bi of Ui with B ⊆ Bi for i = 1, 2.
(c) Prove that U1 + U2 = R4.


Exercise 25. Consider the vector space R4. Let

E := {(1, −2, 6, 4), (2, −6, 15, 8), (0, 2, −9, −8), (3, −8, 21, 7)} and S := {(0, 0, 1, 0), (0, 0, 0, 1)}.

(i) Show that E is linearly independent. Why is E a generating set for R4?
(ii) Use the exchange procedure of Steinitz to get a basis B with S ⊆ B ⊆ S ∪ E.


9. Dimension of Vector Spaces and an Application to Sums

Dimension of vector spaces. Given any vector space V, we have not yet shown that there always exists a basis for V. Indeed, the existence of a basis is another consequence of the more general version of Theorem 8.4 (see the remark following it): any family S of linearly independent vectors of a vector space V (not necessarily finitely generated) can be extended to a basis B of V. In particular, if we take S to be the empty set, then S is linearly independent and by the generalised version of Theorem 8.4, S can be extended to a basis B of V.

Theorem 9.1. Every vector space has a basis.

In the case of a finitely generated vector space V, we have seen in Theorem 8.3 and the remark following it that any basis of V is finite and that any two bases have the same cardinality.

Definition 9.2. If a vector space V has a finite basis, then we define the dimension of V as the number of elements in a basis of V. We denote the dimension of V briefly by dim(V) or dim V and say that V is finite dimensional. If V has no finite basis, we call V infinite dimensional and write dim V = ∞.

Example 9.3. Compare the following with the Examples in 7.2.

(1) dim(Rn) = n.
(2) dim(Mm×n(R)) = m · n.
(3) Let V = C be a vector space over C. Then dim(V) = 1.
(4) Let V = C be a vector space over R. Then dim(V) = 2.
(5) dim(Rn[x]) = n + 1.
(6) dim(R[x]) = ∞.
(7) Let V = R be a vector space over Q. Then dim(V) = ∞.
(8) Define R∞ = {(ai) | (ai) = (a1, a2, . . . )}, the vector space of sequences of real numbers. This is a vector space over R, and dim(R∞) = ∞.

Remark 9.4. Let V be a finite dimensional vector space and let W be a subspace of V. As a consequence of Theorem 8.4 we have:

(1) dim W ≤ dim V,
(2) dim W = dim V if and only if V = W.

Remark 9.4(2) is not true for infinite dimensional vector spaces: the vector space W of polynomials is a subspace of the vector space V of continuous functions, and dim(W) = dim(V) = ∞.

Sums and intersections of subspaces. The second part of this section deals with an important dimension formula. Recall from Section 5:

Proposition 9.5. Let V be a vector space and let U, W be subspaces of V. Then U ∩ W is a subspace of V.

Proof. We show this by using Lemma 5.3, the second subspace test. Since U, W are subspaces, 0 ∈ W and 0 ∈ U, so 0 ∈ U ∩ W; hence U ∩ W ≠ ∅. If v1, v2 ∈ U ∩ W, then v1, v2 ∈ U and v1, v2 ∈ W. Let a1, a2 ∈ R. Since U is a subspace, we know that a1v1 + a2v2 ∈ U. Similarly, since W is a subspace, we know that a1v1 + a2v2 ∈ W. Therefore a1v1 + a2v2 ∈ U ∩ W. So by Lemma 5.3, U ∩ W is a subspace of V.

Proposition 9.6. Let V be a vector space and let U, W be subspaces of V. Then U + W is a subspace of V.

Proof. We will use Lemma 5.3, the second subspace test. Since U, W are both subspaces, 0 ∈ U and 0 ∈ W, so 0 + 0 = 0 ∈ U + W. Hence U + W ≠ ∅. Let v1, v2 ∈ U + W. Then v1 = u1 + w1 for some u1 ∈ U and w1 ∈ W. Similarly, v2 = u2 + w2 for some u2 ∈ U and w2 ∈ W. So for a1, a2 ∈ R,

a1v1 + a2v2 = a1(u1 + w1) + a2(u2 + w2) = (a1u1 + a2u2) + (a1w1 + a2w2) =: v.

Since u1, u2 ∈ U and U is a subspace, we have u := a1u1 + a2u2 ∈ U. Similarly w := a1w1 + a2w2 ∈ W. Hence v = u + w ∈ U + W, and so by Lemma 5.3 we see that U + W is a subspace of V.

Dimension formula for sums of subspaces. Let V be a finite dimensional vector space and let U, W be subspaces of V. Our aim is to determine the dimension of U + W. Recall from Example 5.9 that in general dim(U + W) ≠ dim U + dim W.

Theorem 9.7. Let V be a finite dimensional vector space over R. Let U, W be subspaces of V. Then dim(U + W) = dim(U) + dim(W) − dim(U ∩ W).

Proof. (a) Let {v1, . . . , vn} be a basis of U ∩ W.

• By Theorem 8.4, we can extend the basis of U ∩ W to a basis of U. This means there exist elements u1, . . . , us ∈ U such that {v1, . . . , vn, u1, . . . , us} is a basis of U.
• By Theorem 8.4, we can extend the basis of U ∩ W to a basis of W. This means there exist elements w1, . . . , wt ∈ W such that {v1, . . . , vn, w1, . . . , wt} is a basis of W.

We will prove in (b) and (c) below that B = {v1, . . . , vn, w1, . . . , wt, u1, . . . , us} is a basis of U + W. This indeed then implies that

dim(U + W) = n + s + t = (n + s) + (n + t) − n = dim U + dim W − dim(U ∩ W).

(b) Claim: Span(B) = U + W.

Let v ∈ U + W. Then v = u + w for some u ∈ U, w ∈ W. Since u ∈ U, u equals a linear combination of the above basis of U: there exist ai, bj ∈ R with

(3) u = a1v1 + · · · + anvn + b1u1 + · · · + bsus.

Similarly, since w ∈ W, w is expressible as a linear combination of the above basis of W. So there exist ci, dj ∈ R with

(4) w = c1v1 + · · · + cnvn + d1w1 + · · · + dtwt.

By Equations (3) and (4): v = u + w = (a1 + c1)v1 + · · · + (an + cn)vn + b1u1 + · · · + bsus + d1w1 + · · · + dtwt. Hence v is a linear combination of the vectors in B, and so U + W ⊆ Span(B). Conversely, any element of Span(B) lies in U + W, as every vector z ∈ B lies in U + W.

(c) Claim: B is linearly independent.

Assume B is linearly dependent. Then we have a relation between the vectors in B, say

(5) ∑_{i=1}^{n} aivi + ∑_{i=1}^{s} biui + ∑_{i=1}^{t} ciwi = 0,

and not all coefficients are zero. Then:

(6) −∑_{i=1}^{t} ciwi = ∑_{i=1}^{n} aivi + ∑_{i=1}^{s} biui =: z.

Since z is a linear combination of w1, . . . , wt (which is a subset of W), it follows that z ∈ W. Since z is a linear combination of v1, . . . , vn, u1, . . . , us (which is a basis of U), it follows that z ∈ U. So z ∈ U ∩ W. Hence z is expressible in terms of a basis of U ∩ W. So in particular, z is expressible with respect to v1, . . . , vn, say z = f1v1 + · · · + fnvn for some fi ∈ R. Hence:

z = ∑_{i=1}^{n} fivi = −∑_{i=1}^{t} ciwi by Equation (6).

It follows: ∑_{i=1}^{n} fivi + ∑_{i=1}^{t} ciwi = 0. But v1, . . . , vn, w1, . . . , wt form a basis of W. Hence these vectors are linearly independent, i.e.

fi = 0 for 1 ≤ i ≤ n, ci = 0 for 1 ≤ i ≤ t.

So z = 0 and hence Equation (6) reads: 0 = ∑_{i=1}^{n} aivi + ∑_{i=1}^{s} biui. Since the vectors {v1, . . . , vn, u1, . . . , us} form a basis of U, this implies

ai = 0 for 1 ≤ i ≤ n, bi = 0 for 1 ≤ i ≤ s.

As all coefficients in Equation (5) are zero, B is linearly independent.

Example 9.8. Let V = M3(R), and define

U = {A | A = (aij) with aij = 0 for i > j} ⊆ V,
W = {B | B = (bij) with bij = 0 for i < j} ⊆ V.

So a typical A ∈ U and a typical B ∈ W look like

A =
( a11 a12 a13 )
(  0  a22 a23 )
(  0   0  a33 ),

B =
( b11  0   0  )
( b21 b22  0  )
( b31 b32 b33 ).

Then

(1) U, W are subspaces of V. Note that dim V = 9, dim U = 6, dim W = 6.
(2) U ∩ W = {C | C = diag(c11, c22, c33)} ⊆ V. So dim(U ∩ W) = 3.
(3) U + W = {A + B | A ∈ U, B ∈ W} = V = M3(R).

We can verify the dimension formula from Theorem 9.7:

dim(U + W) = dim V = 9 = 6 + 6 − 3 = dim U + dim W − dim(U ∩ W).
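Remark. The dimension formula can be checked computationally by identifying M3(R) with R^9. If the columns of a matrix A span U and the columns of B span W, then each null vector (a; b) of the block matrix (A | −B) satisfies Aa = Bb, so the vectors Aa span U ∩ W. A minimal sketch, assuming the Python library sympy:

    from sympy import Matrix, zeros

    def unit(i, j):
        # E_ij, viewed as a vector in R^9 (row-major flattening of M3(R))
        M = zeros(3, 3); M[i, j] = 1
        return M.reshape(9, 1)

    # columns spanning U (upper triangular) and W (lower triangular)
    A = Matrix.hstack(*[unit(i, j) for i in range(3) for j in range(3) if i <= j])
    B = Matrix.hstack(*[unit(i, j) for i in range(3) for j in range(3) if i >= j])

    dim_U, dim_W = A.rank(), B.rank()        # 6 and 6
    dim_sum = Matrix.hstack(A, B).rank()     # dim(U + W) = 9

    # U meet W: every null vector (a; b) of (A | -B) satisfies A*a = B*b
    meet = [A * n[:A.cols, 0] for n in Matrix.hstack(A, -B).nullspace()]
    dim_meet = Matrix.hstack(*meet).rank()   # 3

    assert dim_sum == dim_U + dim_W - dim_meet   # 9 == 6 + 6 - 3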

Direct sums. Let V be a finite dimensional vector space with subspaces U and W. Sometimes the formula in Theorem 9.7 holds without the correction term dim(U ∩ W), see Example 5.8. This phenomenon gets its own name.

Definition 9.9. A vector space V is called a direct sum of the subspaces U and W, written V = U ⊕ W, if the following hold:

(D1) V = U + W,
(D2) U ∩ W = {0}.

Often some other characterisations of this phenomenon are useful:

Proposition 9.10. Let U, W be subspaces of a vector space V. Then the following are equivalent:

(1) V = U ⊕ W.
(2) For every v ∈ V there are a unique u ∈ U and a unique w ∈ W with v = u + w.

Proof. (1) ⇒ (2): We only need to show uniqueness. Let v = u + w and also v = u′ + w′ for u, u′ ∈ U, w, w′ ∈ W. Then u + w = u′ + w′, and so w − w′ = u′ − u. But w − w′ ∈ W and u′ − u ∈ U as U, W are subspaces. Hence w − w′ = u′ − u ∈ U ∩ W. As U ∩ W = {0}, this implies w − w′ = 0 = u′ − u. So w = w′ and u = u′.

(2) ⇒ (1): By assumption (D1) holds. We show (D2). Assume v ∈ U ∩ W. Then v ∈ U and v ∈ W. Since U, W are subspaces, 0 ∈ U and 0 ∈ W. Note that v = 0 + v with 0 ∈ U, v ∈ W; and v = v + 0 with v ∈ U, 0 ∈ W. Both these expressions for v ∈ U + W must be the same by assumption (2). Hence v = 0, and so U ∩ W = {0}.

Proposition 9.11. For subspaces U, W of a finite dimensional vector space V the following conditions are equivalent:

(1) V = U ⊕ W;
(2) V = U + W and dim V = dim U + dim W;
(3) U ∩ W = {0} and dim V = dim U + dim W.

Proof. (1) ⇒ (2): Follows from Theorem 9.7 and Definition 9.9.
(2) ⇒ (3): By Theorem 9.7 it follows that dim(U ∩ W) = 0, and so U ∩ W = {0}.
(3) ⇒ (1): Use Theorem 9.7 and the assumption in (3) to get:

dim(U + W) = dim U + dim W − dim(U ∩ W) = dim U + dim W = dim V.

Note U + W ≤ V. By Remark 9.4(2) it follows that V = U + W.


Exercise 26. A magical square is a 3 × 3 table of nine numbers with the following property: the sum of the numbers in each row, in each column, and in each diagonal is equal. This common sum is called the magical number. For example, in

4 3 8
9 5 1
2 7 6

the magical number is 15, and the number in the centre of the square is 5. Consider the set of all magical squares with entries from the set of real numbers R.

(a) Show that the magical squares form a vector space.
(b) Show that the magical number is always three times the number in the centre of the square.
(c) Find a basis of the vector space of magical squares and determine its dimension.

Exercise 27. Let Mn(R) be the set of all n× n matrices over R.

(a) Compute the dimension of Mn(R). Show that Mn(R) has a basis with the property that each matrix in the basis is either symmetric or skew-symmetric.
(b) Compute the dimension of the subspace of Mn(R) consisting of all diagonal matrices.
(c) Compute the dimension of the subspace of Mn(R) consisting of all matrices of zero trace (that is, where the sum of the diagonal entries is zero).

Exercise 28. (a) Let U and V be two subspaces of R2n−1 with dim(U) = dim(V) = n. Prove that U ∩ V ≠ {0}.

(b) Let X, Y, Z be subspaces of a vector space V. Is the following formula correct:

dim(X + Y + Z) = dim X + dim Y + dim Z − dim(X ∩ Y) − dim(Y ∩ Z) − dim(Z ∩ X) + dim(X ∩ Y ∩ Z)?

(c) Given are three two-dimensional subspaces U1, U2, U3 of a vector space V such that the intersection of any two of them is one-dimensional. Which dimensions can occur as dim(U1 + U2 + U3)?

Exercise 29. Let V be a vector space of dimension n over R.

(a) Prove that for each r with 0 ≤ r ≤ n, V contains a subspace of dimension r.
(b) Let U, W be subspaces of V with U ⊆ W. Show that there exists a subspace W′ of V such that W ∩ W′ = U and W + W′ = V.

Exercise 30. Let U := Span{(1, 2, 1, 0), (2, 3, 2, 2), (0, −1, 0, 2)}. Determine a vector space V ≤ R4 such that U ⊕ V = R4.


10. Linear Transformations

This course deals with finite dimensional vector spaces only. From now on we always assume the vector spaces under consideration to be finite dimensional, without explicitly saying so. Throughout this section, let V, W be (finite dimensional) vector spaces over R. Assume that T is a map from V to W, where we consider V and W as sets. If T respects the structure of the underlying vector spaces, then T is called a linear transformation. More precisely:

Definition 10.1. Let V, W be vector spaces over R. Then a map T : V → W is said to be a linear transformation (or a linear map) if and only if:

(L1) T(v1 + v2) = T(v1) + T(v2) for all v1, v2 ∈ V,
(L2) T(λv) = λT(v) for all λ ∈ R and all v ∈ V.

Note that (L1) and (L2) together are equivalent to requiring:

(L) T(λ1v1 + λ2v2) = λ1T(v1) + λ2T(v2) for all λ1, λ2 ∈ R and all v1, v2 ∈ V.

Remarks. (a) Instead of T(v) we also write Tv.
(b) Note that λ · v is scalar multiplication in V, while λ · T(v) is scalar multiplication in W. Note that v1 + v2 is addition in V and T(v1) + T(v2) is addition in W.

Example 10.2. Let T : R2 → R2 be given by T((x1, x2)^T) = (x2, x1)^T. We verify (L1) and (L2):

T((x1, x2)^T + (y1, y2)^T) = T((x1 + y1, x2 + y2)^T)   definition of + in R2,
= (x2 + y2, x1 + y1)^T   definition of T,
= (x2, x1)^T + (y2, y1)^T   definition of + in R2,
= T((x1, x2)^T) + T((y1, y2)^T)   definition of T.

T(λ(x, y)^T) = T((λx, λy)^T)   definition of scalar multiplication,
= (λy, λx)^T   definition of T,
= λ(y, x)^T   definition of scalar multiplication,
= λT((x, y)^T)   definition of T.

Hence (L1) and (L2) hold, and so T is a linear transformation.

Example 10.3. (1) Let T : R2 → R2 be defined by T((x, y)^T) = (x, 0)^T. This is a linear transformation, called the projection onto the first coordinate (or x-axis).

(2) Let f(x) = a0 + a1x + · · · + anx^n. We define T : Rn[x] → Rn−1[x] by differentiating: Tf(x) = (d/dx)f(x) = a1 + 2a2x + · · · + nanx^{n−1}. We can verify that T(λf) = λT(f) and T(f1 + f2) = Tf1 + Tf2 for all f, f1, f2 ∈ Rn[x] and λ ∈ R. Hence T is a linear transformation.

(3) Similarly, taking T to be the operation of integration, then T : Rn−1[x] → Rn[x] with T(a0 + a1x + · · · + an−1x^{n−1}) = a0x + a1x^2/2 + · · · + an−1x^n/n is a linear transformation.

(4) The zero map 0 : V → {0} with v ↦ 0 for all v ∈ V is linear. The identity map idV : V → V with v ↦ v for all v ∈ V is linear.

(5) Fix a vector v0 ∈ V with v0 ≠ 0. The constant map T : V → V given by v ↦ v0 for all v ∈ V is not linear.

Example 10.4. Fix a square matrix M ∈ Mn(R). Let T : Mn(R) → Mn(R) be given by multiplication with M from the right, that is, T(A) = AM. Then T is a linear transformation.

Proof. Use Proposition 2.11 and the definition of T . We have

T (A1 + A2) = (A1 + A2)M = A1M + A2M = T (A1) + T (A2).

T (λA) = (λA)M = λ(AM) = λT (A).

So (L1) and (L2) hold for T . Hence T is linear.

Let us collect some properties of linear maps.

Lemma 10.5. Let T : V → W be linear. Then

(1) T(0) = 0,
(2) T(v − w) = T(v) − T(w) for v, w ∈ V,
(3) T(∑_{i=1}^{r} λivi) = ∑_{i=1}^{r} λiT(vi) for vi ∈ V, λi ∈ R.

Proof. We have

T(0V) = T(0R · 0V)   by Lemma 4.2,
      = 0R · T(0V)   by (L2),
      = 0W           by Lemma 4.2.

Similarly, as v − w = v + (−w), we have

T(v − w) = T(v + (−1)w)       by Lemma 4.2,
         = T(v) + (−1)T(w)    by (L),
         = T(v) − T(w)        by Lemma 4.2.

For the last statement do induction on r using (L).

Lemma 10.6. Let T : V → W be linear.

(1) If {v1, . . . , vr} is linearly dependent in V, then {Tv1, . . . , Tvr} is linearly dependent in W.
(2) If {Tv1, . . . , Tvr} is linearly independent in W, then {v1, . . . , vr} is linearly independent in V.

Remark. The converse of this lemma is false in general. However, if T is injective, then the converse holds.

Proof. Assume λ1v1 + · · · + λrvr = 0 with not all λi zero. Then applying T and using Lemma 10.5 we get 0 = T(0) = T(λ1v1 + · · · + λrvr) = λ1T(v1) + · · · + λrT(vr) with not all λi zero. Hence {T(v1), . . . , T(vr)} is linearly dependent. The second statement is the contrapositive of the first.

Proposition 10.7. (1) Let S : U → V and T : U → V be linear, and let λ ∈ R. Then λT and S + T are linear.
(2) Let T : U → V and S : V → W be linear. Then the composition S ◦ T : U → W is linear.
(3) Let T : V → W be linear. If the inverse map T−1 of T exists, then T−1 : W → V is again linear.

Proof. Let λ, µ ∈ R and let u, v ∈ U. Since S is linear we have S(λu + µv) = λS(u) + µS(v). Since T is linear we have T(λu + µv) = λT(u) + µT(v). By definition of addition of maps we have (S + T)(x) = S(x) + T(x) for any x ∈ U. Hence

(S + T)(λu + µv) = S(λu + µv) + T(λu + µv)
                 = λS(u) + µS(v) + λT(u) + µT(v)
                 = λ · (S(u) + T(u)) + µ · (S(v) + T(v))
                 = λ · (S + T)(u) + µ · (S + T)(v).

Hence S + T is linear. The proofs of the other statements are left as an exercise to the reader.

Remark. The set of all R-linear maps from a vector space U to a vector space V is denoted by HomR(U, V). The set HomR(U, V) is in fact an R-vector space, where the first property of the last proposition gives the scalar multiplication and addition of this vector space. If one considers the set HomR(U, U) – also denoted by EndR(U) – then this is a so-called ring (a slightly more general object than a field). The second property of the last proposition defines the multiplication of this ring. Rings are studied in the second year algebra course.

Theorem 10.8. Let V, W be vector spaces over R. Let {v1, . . . , vn} be a basis of V and {w1, . . . , wn} a set of vectors in W. Then there is precisely one linear transformation T : V → W with T(vi) = wi for 1 ≤ i ≤ n. Moreover:

(1) im(T) := {Tv | v ∈ V} = Span{w1, . . . , wn},
(2) T is injective if and only if {w1, . . . , wn} is linearly independent.


Proof. (a) Given v ∈ V, write v = λ1v1 + · · · + λnvn for λi ∈ R. By Proposition 7.3 the scalars λi are uniquely determined. Define T(v) = ∑_{i=1}^{n} λiwi. Then T is linear with T(vi) = wi. Hence such a linear transformation exists. Moreover, for any linear transformation T : V → W with T(vi) = wi we have that T(v) = ∑_{i=1}^{n} λiwi. Hence the map T is uniquely determined.

(b) We next show part (1) of the claim. Let w ∈ Span{w1, . . . , wn}. Then w = ∑_{i=1}^{n} λiwi for some λi ∈ R. Since T(vi) = wi we have

w = ∑_{i=1}^{n} λiwi = ∑_{i=1}^{n} λiT(vi) = T(λ1v1 + · · · + λnvn),

and hence for v = λ1v1 + · · · + λnvn we have Tv = w. So w ∈ im(T). This shows Span{w1, . . . , wn} ⊆ im(T). Conversely, if v ∈ V, say v = ∑_{i=1}^{n} λivi, then Tv = ∑_{i=1}^{n} λiwi, and hence Tv ∈ Span{w1, . . . , wn}, so im(T) ⊆ Span{w1, . . . , wn}.

(c) Assume that T is injective. This means whenever Tu = Tv for some u, v ∈ V then u = v. We show that {w1, . . . , wn} is linearly independent. Consider

(7) 0 = λ1w1 + . . . + λnwn.

We need to show that λi = 0 for 1 ≤ i ≤ n. As T(vi) = wi, Equation (7) equals 0 = λ1T(v1) + . . . + λnT(vn). Since T is linear, this implies 0 = T(λ1v1 + . . . + λnvn). By Lemma 10.5 we have T(0) = 0, and hence T(0) = T(λ1v1 + . . . + λnvn). As T is injective, this implies 0 = λ1v1 + . . . + λnvn. As {v1, . . . , vn} is a basis, it is in particular linearly independent. Hence λi = 0 for 1 ≤ i ≤ n.

(d) Next, let {w1, . . . , wn} be linearly independent. Assume that Tu = Tv for some u, v ∈ V. Since {v1, . . . , vn} is a basis, there exist λi, µi ∈ R with

u = ∑_{i=1}^{n} λivi, v = ∑_{i=1}^{n} µivi.

Applying T to u and v and using that Tvi = wi we get

Tu = ∑_{i=1}^{n} λiwi, Tv = ∑_{i=1}^{n} µiwi.

Since Tu = Tv this implies 0 = Tu − Tv = ∑_{i=1}^{n} (λi − µi)wi. Since {w1, . . . , wn} is linearly independent, this implies that λi − µi = 0. Hence λi = µi for 1 ≤ i ≤ n. Hence u = v, and so T is injective.

It should be noted that in Theorem 10.8 the assumption that {v1, . . . , vn} is a basis is very important. To see this, the reader is advised to try out examples where {v1, . . . , vn} is either a linearly dependent set of vectors or does not span V.

Example 10.9. (a) Choose vectors

v1 = (1, 1)^T, v2 = (2, 2)^T, w1 = (1, 0)^T, w2 = (0, 1)^T.


Note that {v1, v2} is linearly dependent and does not span V. It is easily seen that for the vectors chosen above there is no linear map T : R2 → R2 with T(vi) = wi for i = 1, 2: since v2 = 2v1, any linear map T forces T(v2) = 2T(v1), but w2 ≠ 2w1.

(b) Choose vectors

v1 = (1, 1)^T, v2 = (2, 2)^T, w1 = (1, 0)^T, w2 = (2, 0)^T.

Define

T((x, y)^T) = (x, 0)^T, S((x, y)^T) = (x, x − y)^T.

Then both S and T are linear with T(vi) = wi and S(vi) = wi.
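Remark. Theorem 10.8 is constructive: if P has the basis vectors vi as its columns and Q has the prescribed images wi as its columns, then TP = Q, so the matrix of T is QP^{−1}. A minimal sketch, assuming the Python library sympy; the basis and images below are an illustrative example only:

    from sympy import Matrix

    P = Matrix([[1, 1],
                [1, -1]])   # columns: a basis v1 = (1,1), v2 = (1,-1) of R^2
    Q = Matrix([[1, 0],
                [0, 1]])    # columns: prescribed images w1, w2
    T = Q * P.inv()         # the unique matrix with T*vi = wi (Theorem 10.8)
    assert T * P.col(0) == Q.col(0) and T * P.col(1) == Q.col(1)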

Exercise 31. Which of the following mappings T : R3 → R3 are linear transformations?

(i) T(x, y, z) = (y, z, 0);
(ii) T(x, y, z) = (|x|, −z, 0);
(iii) T(x, y, z) = (x − 1, x, y);
(iv) T(x, y, z) = (2x, y − 2, 4y).

Exercise 32. Let T : U → V be a linear transformation between vector spaces U, V, and let λ ∈ R. Show that λT is a linear transformation.

Exercise 33. Let U, V, W be vector spaces over R, and let T : U → V and S : V → W be linear transformations.

(a) Show that the composition S ◦ T : U → W is linear.
(b) Show that if the inverse map S−1 of S exists, then S−1 : W → V is linear.


11. The Rank-Nullity Theorem

Let V and W be finite dimensional vector spaces over R. In this section we continue our study of linear maps between vector spaces. We derive in particular the important rank-nullity theorem.

Definition 11.1. Let T : V → W be a linear transformation.

(1) We define ker(T) = {v ∈ V | Tv = 0}, called the kernel of T.
(2) We define im(T) = {Tv | v ∈ V}, called the image of T.

Proposition 11.2. Let T : V → W be linear. Then ker(T) is a subspace of V, and im(T) is a subspace of W.

Proof.

(1) Note ker(T) ≠ ∅ since T(0) = 0. Let v1, v2 ∈ ker(T). Then Tv1 = 0 and Tv2 = 0. Let λ1, λ2 ∈ R. Then

T(λ1v1 + λ2v2) = λ1T(v1) + λ2T(v2)   by (L),
               = λ1 · 0 + λ2 · 0 = 0 + 0 = 0   by Lemma 4.2.

By Lemma 5.3, it follows that ker(T) is a subspace.

(2) Let w1, w2 ∈ im(T). Then there exist v1, v2 ∈ V with Tv1 = w1 and Tv2 = w2. Let λ1, λ2 ∈ R. Then

T(λ1v1 + λ2v2) = λ1T(v1) + λ2T(v2)   by (L),
               = λ1w1 + λ2w2.

Hence λ1w1 + λ2w2 ∈ im(T). By Lemma 5.3, it follows that im(T) is a subspace.

Proposition 11.3. For a linear map T : V → W we have: T is injective ⇐⇒ ker(T) = {0}.

Proof. "⇒": Assume T is injective. Let v ∈ ker(T). Then T(v) = 0, and hence by Lemma 10.5: T(v) = T(0). Since T is injective, this implies v = 0. Hence ker(T) = {0}.

"⇐": Let v1, v2 ∈ V with T(v1) = T(v2). By Lemma 10.5:

T(v1 − v2) = T(v1) − T(v2) = 0.

So v1 − v2 ∈ ker(T) = {0}. Hence v1 = v2, which proves that T is injective.

Remark. A linear map T : V → W is injective if and only if whenever {v1, . . . , vr} ⊆ V is linearly independent, then {Tv1, . . . , Tvr} is linearly independent. The proof is left as an exercise to the reader. Compare with Lemma 10.6.


Definition 11.4. Let T : V → W be linear. We define the nullity of T to be dim(ker T), written n(T). We define the rank of T to be dim(im T), written rk(T).

Example 11.5. Define T : R3 → R2 by

T((x, y, z)^T) = (x, 0)^T.

Note that T is linear and

im T = {(x, 0)^T | x ∈ R}, ker T = {(0, y, z)^T | y, z ∈ R}.

Hence rk(T) = dim(im T) = 1 and n(T) = dim(ker T) = 2. Note that

dim(R3) = 3 = 1 + 2 = rk(T) + n(T).

Theorem 11.6 (The Rank-Nullity Theorem). Let T : V → W be a linear transformation of finite dimensional vector spaces over R. Then

dim(V) = rk(T) + n(T).

Proof.

(1) By Proposition 11.2, we know that ker(T) is a subspace of V. Let {u1, . . . , uk} be a basis of ker(T), so n(T) = k. By Proposition 8.2, we can extend {u1, . . . , uk} to a basis of V, say {u1, . . . , uk, uk+1, . . . , un}, where n = dim V. We show that {Tuk+1, . . . , Tun} is a basis of im(T), that is, rk(T) = n − k. Then indeed rk(T) + n(T) = dim(V).

(2) Let w ∈ im(T). Then there exists v ∈ V with Tv = w. Write v = ∑_{i=1}^{n} aiui with ai ∈ R. Then

w = T(v) = T(∑_{i=1}^{n} aiui)
  = ∑_{i=1}^{n} aiT(ui)     since T is linear,
  = ∑_{i=k+1}^{n} aiT(ui)   since T(uj) = 0 for j ≤ k.

Hence w ∈ Span{Tuk+1, . . . , Tun}, and so im(T) ⊆ Span{Tuk+1, . . . , Tun}. Since Tui ∈ im(T) for k + 1 ≤ i ≤ n, this implies that

Span{Tuk+1, . . . , Tun} = im(T).

(3) Assume we have 0 = λk+1Tuk+1 + · · · + λnTun for some λi ∈ R with k + 1 ≤ i ≤ n. By linearity of T, this implies that 0 = T(λk+1uk+1 + · · · + λnun). Let z := λk+1uk+1 + · · · + λnun; then z ∈ ker(T). Hence we can express z as a linear combination of the basis elements of ker(T), say

λk+1uk+1 + · · · + λnun = z = λ1u1 + · · · + λkuk.

This implies: 0 = −λ1u1 − · · · − λkuk + λk+1uk+1 + · · · + λnun. Since {u1, . . . , un} is linearly independent (it is a basis of V), this implies that

λ1 = λ2 = . . . = λn = 0.

Hence {Tuk+1, . . . , Tun} is linearly independent.
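Remark. For the map of Example 11.5 the theorem is easy to verify mechanically: the rank of a representing matrix equals dim(im T), and the number of basis vectors of its null space equals dim(ker T). A minimal sketch, assuming the Python library sympy:

    from sympy import Matrix

    A = Matrix([[1, 0, 0],
                [0, 0, 0]])          # matrix of T from Example 11.5
    rank = A.rank()                  # dim(im T) = 1
    nullity = len(A.nullspace())     # dim(ker T) = 2
    assert rank + nullity == A.cols  # rank + nullity = dim(R^3) = 3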

Corollary 11.7. Between finite dimensional vector spaces V and W there exists a bijective linear map (called an isomorphism) T : V → W if and only if dim V = dim W.

Proof. Assume V and W are of dimension n. By Theorem 10.8, there exists a bijective linear map, mapping a basis element vi to a basis element wi for 1 ≤ i ≤ n.

Conversely, assume there exists a bijective linear map T : V → W. In particular, T is injective, and hence Proposition 11.3 implies ker(T) = {0}. Since T is surjective, im(T) = W. By Theorem 11.6, dim(V) = dim(im T) = dim(W).

Exercise 34. Describe the kernel and image of each of the following linear transformations, and in each case give the rank and nullity of the transformation:

(a) T : R4 → R3 is given by T(x) = Ax for x a column vector in R4, where A is the matrix

(1 −1  1 1)
(1  2 −1 1)
(0  3 −2 0).

(b) V is the vector space of all polynomials in x of degree ≤ n, and T : V → V is given by differentiation with respect to x.

(c) V = Mn(R), and T : V → R is given by T(A) = tr(A) = ∑_{i=1}^{n} aii for A = (aij) ∈ V.

(d) Determine the dimension of ker(f1) ∩ ker(f2) and of ker(f1) + ker(f2) for the following linear maps from R5 to R:

f1(x1, x2, x3, x4, x5) = x2 + 5x4, f2(x1, x2, x3, x4, x5) = x1 + 2x2 + x3 + 10x4 − x5.

Exercise 35. Let V be a vector space of dimension n ≥ 1. If T : V → V is a linear transformation, prove that the following statements are equivalent:

(a) im(T) = ker(T);
(b) T^2 = 0, n is even and rk(T) = n/2.

Exercise 36. Let T : R3 → R3 be a linear transformation. Show that im(T^2) ⊆ im(T) and that ker(T) ⊆ ker(T^2). Prove the equivalence of the following statements:

(a) R3 = ker(T) ⊕ im(T);
(b) ker(T) = ker(T^2);
(c) im(T) = im(T^2).


(We write R3 = ker(T) ⊕ im(T) if R3 = ker(T) + im(T) and ker(T) ∩ im(T) = {0}.)

Exercise 37. Define the vectors a1 = (1, 0, 0), a2 = (0, 1, 0), a3 = (0, 0, 1), a4 = (2, 1, 3), b1 = (1, 2, 4, 1), b2 = (1, 1, 0, 1), b3 = (−1, 0, 4, −1) and b4 = (0, 5, 20, 0).

(i) Show that there is precisely one linear map f : R3 → R4 with f(ai) = bi for i = 1, 2, 3, 4.
(ii) Describe the kernel and the image of f and give the rank and the nullity of f.

Exercise 38. Let V be an n-dimensional vector space and let S and T be linear transformations on V.

(a) Prove that nullity(ST) ≤ nullity(S) + nullity(T).
(b) If S^n = 0 but S^{n−1} ≠ 0, determine nullity(S).


12. The Matrix Representation of a Linear Transformation

Let T : V → W be a linear transformation between finite dimensional vector spaces V and W. Let B1 = {v1, . . . , vn} be a basis of V and B2 = {w1, . . . , wm} be a basis of W. Define (using the so-called column convention) a matrix

A = (aij) =
( a11 a12 . . . a1n )
( a21 a22 . . . a2n )
( ...           ... )
( am1 am2 . . . amn )
∈ Mm×n(R)

by expressing T(vi) as a linear combination of the basis elements of B2:

T(v1) = a11w1 + · · · + am1wm,
...
T(vi) = a1iw1 + · · · + amiwm,
...
T(vn) = a1nw1 + · · · + amnwm.

Note that by Proposition 7.3, the coefficients aij are uniquely determined. Hence the matrix A is well-defined.

Definition 12.1. The matrix A is called the matrix of T (or the matrix representing T, or corresponding to T) with respect to the bases B1 and B2. Write A = M^{B1}_{B2}(T).

Example 12.2. Let T : Rn[x] → Rn[x] be differentiation. Let B1 = B2 = {1, x, . . . , x^n}. Then

T(1) = 0 = 0 · 1 + 0 · x + · · · + 0 · x^n,
T(x) = 1 = 1 · 1 + 0 · x + · · · + 0 · x^n,
T(x^2) = 2x = 0 · 1 + 2 · x + · · · + 0 · x^n,
...
T(x^n) = nx^{n−1} = 0 · 1 + 0 · x + · · · + n · x^{n−1} + 0 · x^n.

Hence the matrix of T with respect to B1 and B2 is the (n + 1) × (n + 1) matrix

A =
( 0 1 0 . . . 0 )
( 0 0 2 . . . 0 )
( ...       ... )
( 0 0 0 . . . n )
( 0 0 0 . . . 0 )

with entries 1, 2, . . . , n on the superdiagonal and zeros elsewhere.
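Remark. The matrix can be generated directly from the rule T(x^j) = jx^{j−1}: column j carries the single entry j in row j − 1. A minimal sketch, assuming the Python library sympy; the degree n = 4 is an arbitrary illustrative choice:

    from sympy import zeros

    n = 4                       # example degree; basis {1, x, ..., x^n}
    A = zeros(n + 1, n + 1)
    for j in range(1, n + 1):
        A[j - 1, j] = j         # T(x^j) = j*x^(j-1)
    print(A)                    # superdiagonal 1, 2, ..., n; all else zero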

Recall the definition of a coordinate vector, given in Definition 7.4.


Proposition 12.3. Let T : V → W be linear. Let x be the coordinate vector of v ∈ V with respect to a basis B1. Then the coordinate vector of T(v) with respect to a basis B2 is M^{B1}_{B2}(T)x.

Proof. Let M^{B1}_{B2}(T) = A = (aij) with B1 = {v1, . . . , vn} and B2 = {w1, . . . , wm}. By assumption v = ∑_{i=1}^{n} xivi for xi ∈ R. Let x = (x1, . . . , xn)^T be the coordinate vector of v with respect to B1. Then

T(v) = ∑_{i=1}^{n} xiT(vi)
     = ∑_{i=1}^{n} xi ∑_{j=1}^{m} ajiwj
     = ∑_{j=1}^{m} (∑_{i=1}^{n} ajixi) wj
     = ∑_{j=1}^{m} (jth entry of Ax) · wj.

Hence the coordinate vector of Tv with respect to basis B2 is Ax.

Theorem 12.4. If S : U → V and T : U → V are linear, with BU a basis of U and BV a basis of V, then S + T and λT are linear with

M^{BU}_{BV}(S + T) = M^{BU}_{BV}(S) + M^{BU}_{BV}(T),
M^{BU}_{BV}(λT) = λ M^{BU}_{BV}(T).

Proof. Follows from Proposition 10.7 and Definition 12.1. The details are left as an exercise to the reader.

Theorem 12.5. Let U, V, W be vector spaces with bases B1, B2 and B3 respectively. Let T : U → V and S : V → W be linear. Then ST is linear with

M^{B1}_{B3}(ST) = M^{B2}_{B3}(S) · M^{B1}_{B2}(T).

Proof. To check that ST is linear is left as an exercise to the reader (see Proposition 10.7). Let

B1 = {u1, . . . , un} be a basis of U,
B2 = {v1, . . . , vm} be a basis of V,
B3 = {w1, . . . , wk} be a basis of W.

Let A = M^{B1}_{B2}(T) = (aij), so

(8) T(ui) = a1iv1 + · · · + amivm for 1 ≤ i ≤ n.

Let B = M^{B2}_{B3}(S) = (bij), with

(9) S(vj) = b1jw1 + · · · + bkjwk for 1 ≤ j ≤ m.

Let C = M^{B1}_{B3}(ST) = (cij), with

(10) (ST)(ui) = c1iw1 + · · · + ckiwk for 1 ≤ i ≤ n.

Then

(ST)(ui) = S(Tui)
         = S(a1iv1 + . . . + amivm)      by Equation (8),
         = a1iS(v1) + . . . + amiS(vm)   since S is linear,
         = a1i(b11w1 + . . . + bk1wk) + a2i(b12w1 + . . . + bk2wk) + . . . + ami(b1mw1 + . . . + bkmwk)   by Equation (9),
         = (a1ib11 + a2ib12 + . . . + amib1m)w1 + . . . + (a1ibk1 + a2ibk2 + . . . + amibkm)wk   by reordering,
         = ∑_{j=1}^{k} (∑_{l=1}^{m} bjlali) wj.

Hence cji = ∑_{l=1}^{m} bjlali, and so C = B · A.

Corollary 12.6. Let T : U → V be linear. If T is invertible then T−1 is linear with

M^{BV}_{BU}(T−1) = (M^{BU}_{BV}(T))^{−1}.

Proof. Use Theorem 12.5:

I = M^{BU}_{BU}(idU) = M^{BU}_{BU}(T−1 ◦ T) = M^{BV}_{BU}(T−1) · M^{BU}_{BV}(T).

In the rest of this section, we consider some special cases and applications of the results obtained so far.

Definition 12.7. Let id : V → V be the identity map, and let B1 and B2 be bases of V. We call M^{B1}_{B2}(id) the base change matrix associated with the change of basis from basis B1 to basis B2.

Proposition 12.8. Let V be a vector space. Let x be the coordinate vector of v with respect to basis B1. Let y be the coordinate vector of v with respect to basis B2. Then y = M^{B1}_{B2}(id)x.

Proof. Take T = id : V → V in Proposition 12.3.

Theorem 12.9. Let T : V → W be linear. Let BV1 and BV2 be bases of V, and let BW1 and BW2 be bases of W. Then

M^{BV2}_{BW2}(T) = M^{BW1}_{BW2}(id) · M^{BV1}_{BW1}(T) · (M^{BV1}_{BV2}(id))^{−1}.

Proof. Consider the composition of maps T = idW ◦ T ◦ idV with respect to the following bases:

V (basis BV2) −−id−→ V (basis BV1) −−T−→ W (basis BW1) −−id−→ W (basis BW2).

Then by Theorem 12.5 and Corollary 12.6 the claim follows:

M^{BV2}_{BW2}(T) = M^{BV2}_{BW2}(idW ◦ T ◦ idV)
                = M^{BW1}_{BW2}(idW) · M^{BV1}_{BW1}(T) · M^{BV2}_{BV1}(idV)        by 12.5,
                = M^{BW1}_{BW2}(idW) · M^{BV1}_{BW1}(T) · (M^{BV1}_{BV2}(idV))^{−1}  by 12.6.

Example 12.10. (i) Let T : R2 → R3 be given with respect to bases B1 = {u1, u2} and B2 = {v1, v2, v3} by

A = M^{B1}_{B2}(T) =
(  1  2 )
(  0  1 )
( −3 −1 ).

So by Definition 12.1 we have:

(11) Tu1 = 1 · v1 + 0 · v2 − 3v3,
     Tu2 = 2 · v1 + v2 − v3.

(ii) We take new bases for R2 and R3, say B3 = {w1, w2} and B4 = {z1, z2, z3} with

w1 = u1 − 2u2,   z1 = v1 + v2,
w2 = u1 + u2,    z2 = v2 + v3,
                 z3 = v1 + v3.

What is M^{B3}_{B4}(T)?

(iii) Note

(12) v1 = (1/2)(z1 − z2 + z3),
     v2 = (1/2)(z2 − z3 + z1),
     v3 = (1/2)(z3 + z2 − z1).

Then

Tw1 = Tu1 − 2Tu2
    = (v1 − 3v3) − 2(2v1 + v2 − v3)   by Eq. (11),
    = −3v1 − 2v2 − v3
    = −(3/2)(z1 − z2 + z3) − (z2 − z3 + z1) − (1/2)(z3 + z2 − z1)   by Eq. (12),
    = −2z1 + 0z2 − z3.

Similarly,

Tw2 = Tu1 + Tu2
    = (v1 − 3v3) + (2v1 + v2 − v3)
    = 3v1 + v2 − 4v3
    = (3/2)(z1 − z2 + z3) + (1/2)(z2 − z3 + z1) − 2(z3 + z2 − z1)   by Eq. (12),
    = 4z1 − 3z2 − z3.

Hence

B = M^{B3}_{B4}(T) =
( −2  4 )
(  0 −3 )
( −1 −1 ).

(iv) What is the relationship between A and B? Let Q = M^{B2}_{B4}(id) and let P = M^{B1}_{B3}(id). Then by Equation (12) we have:

Q = (1/2) ·
(  1  1 −1 )
( −1  1  1 )
(  1 −1  1 ).

Moreover, from (ii) we see that

P^{−1} = M^{B3}_{B1}(id) =
(  1 1 )
( −2 1 ).

It is now easily seen that indeed B = QAP^{−1}.
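Remark. The identity B = QAP^{−1} is easy to verify numerically. A minimal sketch with the matrices of this example, assuming the Python library sympy:

    from sympy import Matrix, Rational

    A = Matrix([[1, 2], [0, 1], [-3, -1]])       # A = M^{B1}_{B2}(T)
    Q = Rational(1, 2) * Matrix([[1, 1, -1],
                                 [-1, 1, 1],
                                 [1, -1, 1]])    # Q = M^{B2}_{B4}(id)
    P_inv = Matrix([[1, 1],
                    [-2, 1]])                    # P^{-1} = M^{B3}_{B1}(id)
    print(Q * A * P_inv)                         # Matrix([[-2, 4], [0, -3], [-1, -1]]) = B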

Exercise 39. Let E2 and E3 denote the canonical bases for R2 and R3 respectively, that is, E2 = {(1, 0), (0, 1)} and E3 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}. Let f : R2 → R3 and g : R3 → R2 be given by

f(x, y) = (x + 2y, x − y, 2x + y),
g(x, y, z) = (x − 2y + 3z, 2y − 3z).


(a) Determine the matrices M^{E2}_{E3}(f), M^{E3}_{E2}(g), M^{E2}_{E2}(g ◦ f) and M^{E3}_{E3}(f ◦ g) representing the linear maps f, g, g ◦ f and f ◦ g with respect to the bases E2 and E3.
(b) Show that g ◦ f is bijective and determine M^{E2}_{E2}((g ◦ f)^{−1}).

Exercise 40. Consider the vector spaces R3 and R2 with the bases B and C respectively, where

B = {(1, 1, 0), (1, −1, 1), (1, 1, 1)} and C = {(1, 1), (1, −1)}.

(a) Determine the matrices M^{E3}_{B}(id), M^{B}_{E3}(id) representing the identity map on R3, and determine the matrices M^{E2}_{C}(id), M^{C}_{E2}(id) representing the identity map on R2.
(b) For the linear maps f and g of the previous question, determine the matrices M^{C}_{B}(f), M^{B}_{C}(g), M^{C}_{C}(g ◦ f) and M^{B}_{B}(f ◦ g) representing the linear maps f, g, g ◦ f and f ◦ g with respect to the bases B and C.

Exercise 41. Let n ∈ N. Consider the vector space Rn[x] of polynomials of degree at most n. Let Bn = {1, x, . . . , x^n}. Define Dn : Rn[x] → Rn−1[x] by f ↦ f′, where f′ denotes the first derivative of f.

(a) Show that Dn is linear.
(b) Determine M^{Bn}_{Bn−1}(Dn).
(c) Show that there is a linear map In : Rn−1[x] → Rn[x] with Dn ◦ In = id, and determine M^{Bn−1}_{Bn}(In).

Exercise 42. Consider the vector space W of functions from R to R. Let

B = {sin(x), cos(x), sin(x) · cos(x), sin^2(x), cos^2(x)},

and define V = Span(B) ⊆ W. Consider the map F : V → V given by f(x) ↦ f′(x), where f′ denotes the first derivative of f.

(i) Show that B is a basis of V.
(ii) Determine the matrix M^{B}_{B}(F).
(iii) Give a basis of ker(F) and im(F).

Exercise 43. Let V = R3[x] be the vector space of all polynomials of degree at most three, with basis B = {1, x, x^2, x^3}. Consider the maps

F : V → R, f ↦ ∫_{−1}^{1} f(x) dx and G : V → R3, f ↦ (f(−1), f(0), f(1)).

(i) Show that F and G are linear.
(ii) Let E1 and E3 be the canonical bases of R and R3. Determine M^{B}_{E1}(F) and M^{B}_{E3}(G).
(iii) Show that ker(G) ⊆ ker(F).
(iv) Show that there is a linear map H : R3 → R with H ◦ G = F.

Exercise 44. Let V be a vector space of dimension n and F : V → V a linear map with F^2 = F.

(a) Show that there are subspaces U, W of V with V = U ⊕ W and F(W) = {0}, F(u) = u for all u ∈ U.
(b) Show that there exist a basis B of V and some r ≤ n such that

M^{B}_{B}(F) =
( Ir 0 )
( 0  0 ).

[Here Ir denotes the identity matrix of size r, and 0 denotes zero matrices (of possibly different sizes).]


13. Row Reduced Echelon Matrices

The last part of this lecture course deals with how to solve systems of linear equations. We will study in particular when a system of linear equations has precisely one solution. In this section we introduce the row reduced echelon form of a matrix.

Definition 13.1. An m × n matrix M is in row reduced echelon form if

(1) the zero rows of M (if any) all come below the non-zero rows;
(2) in each non-zero row the leading entry (= the left-most non-zero entry) is one;
(3) if row i and row i + 1 are non-zero, then the leading entry of row i + 1 is strictly to the right of the leading entry of row i;
(4) if a column contains a leading entry of a non-zero row, then all its other entries are zero.

Example 13.2. Matrices in row reduced echelon form include the 3 × 1 zero column, the 4 × 1 column (1, 0, 0, 0)^T, the 2 × 3 zero matrix, and

(0 1)     (1 0 0 0  3 1)
(0 0),    (0 1 4 0 −2 0)
          (0 0 0 1  0 1).

More generally, any matrix of the following shape is in row reduced echelon form, where each ∗ denotes an arbitrary entry, the leading ones move strictly to the right reading downwards, each column containing a leading one is zero elsewhere, and all zero rows come last:

(0 · · · 0 1 ∗ · · · ∗ 0 ∗ · · · ∗ 0 ∗ · · · ∗)
(0 · · · 0 0 0 · · · 0 1 ∗ · · · ∗ 0 ∗ · · · ∗)
(0 · · · 0 0 0 · · · 0 0 0 · · · 0 1 ∗ · · · ∗)
(0 · · · · · · · · · · · · · · · · · · · · · 0)

Remark 13.3. If A ∈ Mm×n(R) is in row reduced echelon form, then for each k with 1 ≤ k ≤ n, the matrix made from the first k columns is also in row reduced echelon form.

Matrices not in row reduced echelon form are:

(1 0)     (0 1 2 1)     (1 0 0 0 0)
(0 2),    (0 1 0 3)     (0 0 1 0 0)
          (0 0 1 0),    (0 0 0 0 0)
                        (0 0 0 0 1).

Definition 13.4. (1) The following operations are called elementary row operations (eros) on a matrix:

Type I: swap row i and row j (write Ri ←→ Rj).
Type II: multiply row i by a non-zero λ ∈ R (write Ri −→ λRi).
Type III: add to row i a multiple (c times) of row j, for i ≠ j (write Ri −→ Ri + cRj).


(2) If matrix B is obtained from A by applying eros, we call A and B row equivalent.

Theorem 13.5. Every m × n matrix may be brought to a row reduced echelon form by applying elementary row operations. The row reduced echelon form obtained is unique.

Proof. For existence, see Algorithm 13.9 below. We omit the proof of the uniqueness of the row reduced echelon form obtained.

Theorem 13.5 allows us to make the following definition:

Definition 13.6. The row rank of a matrix A is defined to be the number of non-zero rows in the row reduced echelon form of A.

Example 13.7. We give two examples of reducing a matrix to row reduced echelon form by applying eros. In the first case the matrices have row rank two, in the second case the matrices have row rank three. A further example is given in Example 13.10.

(1)

M1 =
(0 2 −1)
(2 4  8)

e1 = R2 −→ (1/2)R2

M2 =
(0 2 −1)
(1 2  4)

e2 = R1 ←→ R2

M3 =
(1 2  4)
(0 2 −1)

e3 = R2 −→ (1/2)R2

M4 =
(1 2  4)
(0 1 −1/2)

e4 = R1 −→ R1 − 2R2

M5 =
(1 0  5)
(0 1 −1/2)

Notation: Given a matrix M, suppose that applying the ero e to M gives the matrix N. Then we write N = e(M).
Remark: In the example above we have ei(Mi) = Mi+1 for 1 ≤ i ≤ 4.


(2)

M1 =
(0 1 0)
(0 0 1)
(2 2 0)

e1 = R3 −→ (1/2)R3

M2 =
(0 1 0)
(0 0 1)
(1 1 0)

e2 = R1 ←→ R3

M3 =
(1 1 0)
(0 0 1)
(0 1 0)

e3 = R2 ←→ R3

M4 =
(1 1 0)
(0 1 0)
(0 0 1)

e4 = R1 −→ R1 − R2

M5 =
(1 0 0)
(0 1 0)
(0 0 1)

Remark 13.8. Given a matrix M = (mij), we will apply eros to M using the following language:

(1) If mij ≠ 0, then we can normalise by Ri −→ (mij)^{−1}Ri, so that the (i, j)th entry becomes 1.
(2) We can move an entry mij up and down in its column by Ri ←→ Rv.
(3) If mij = 1, then we can purge all other entries in column j (i.e. make them zero) by applying Type III operations: Rs −→ Rs − msjRi, for s ≠ i. The element mij = 1 used to "clean out" the rest of the column is called the pivot of the purging operation.

Algorithm 13.9 (for reducing a matrix to row reduced echelon form by eros).

Input: Matrix M of size m × n. The algorithm is an n stage process. Starting with matrix Mk, we have at the end of stage k the matrix Mk+1, in which the first k columns make an m × k matrix in row reduced echelon form.

Stage 1: Inspect column 1 of M1 := M. If the column is zero, then stop; Stage 1 is complete. Otherwise find the first non-zero entry in the first column (reading downwards), normalise it and move it to the top. Use it as pivot to purge the rest of the column. We obtain a matrix M2 whose first column is either the zero column or the first standard basis column (1, 0, . . . , 0)^T, with arbitrary entries in the remaining columns.

Stage k: We start with matrix Mk of the form Mk = (Ak | Bk), where Ak is an m × (k − 1) matrix in row reduced echelon form and Bk is an m × (n − k + 1) matrix.

Case 1: If Ak = 0 (has no non-zero rows), then apply the Stage 1 process to the first column of Bk. We obtain matrix Mk+1.

Case 2: If Ak has m non-zero rows, then stop altogether. In this case Mk is already in row reduced echelon form.

If neither Case 1 nor Case 2 occurs, then:

Case 3: Write

Mk =
( Ek Fk )
( 0  Gk ),

where Ek consists of the non-zero rows of Ak and Fk is the continuation of these non-zero rows of Ek. As Case 1 did not occur, Ek has at least one row. As Case 2 did not occur, Gk has at least one row. Note Ek is in row reduced echelon form. Inspect the first column of Gk in Mk. If it is zero, then stop; Stage k is complete. Otherwise select the first non-zero element, normalise it and move it to the top left hand corner of Gk. Use it as a pivot to purge the kth column of Mk. Stop; Stage k is complete. We obtain matrix Mk+1.

In Case 1 and Case 3: Move on to the next stage if k ≤ n − 1; otherwise the algorithm stops.

Output: matrix in row reduced echelon form.

Remarks. (a) This algorithm proves the first part of Theorem 13.5. We have not shown uniqueness of the row reduced echelon form in Theorem 13.5.

(b) Note that the algorithm has been applied in Example 13.7. However, many more intermediate steps were given in those examples, and hence the labelling of the matrices Mi there does not agree with the labelling of the matrices used in Algorithm 13.9.

Example 13.10. We perform Algorithm 13.9 on the matrix M1 given below. There are three effective stages, and the matrices Mi from the algorithm are given by:

M1 =
(1 0 1 −2 2)
(2 1 1 −2 3)
(3 1 3 −6 4)
(1 0 2 −4 1)

M2 =
(1 0  1 −2  2)
(0 1 −1  2 −1)
(0 1  0  0 −2)
(0 0  1 −2 −1)

M3 =
(1 0  1 −2  2)
(0 1 −1  2 −1)
(0 0  1 −2 −1)
(0 0  1 −2 −1)

M4 =
(1 0 0  0  3)
(0 1 0  0 −2)
(0 0 1 −2 −1)
(0 0 0  0  0)

It follows that the row rank of each of the matrices Mi for 1 ≤ i ≤ 4 is three.
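Remark. Row reduction is implemented in most computer algebra systems. In the Python library sympy (an assumed tool for this illustration), Matrix.rref returns the row reduced echelon form together with the pivot columns, whose number is the row rank:

    from sympy import Matrix

    M1 = Matrix([[1, 0, 1, -2, 2],
                 [2, 1, 1, -2, 3],
                 [3, 1, 3, -6, 4],
                 [1, 0, 2, -4, 1]])
    R, pivots = M1.rref()
    print(R)             # the matrix M4 above
    print(len(pivots))   # 3: the row rank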

Exercise 45. Find the row reduced echelon form of the matrices A and B where

A =
( 2 −2  2 1)
(−3  6  0 −1)
( 1 −7 10  2),

B =
(2 2 −1  6  4)
(4 4  1 10 13)
(8 8 −1 26 23).


14. Systems of Linear Equations

Notation 14.1. Given is a system of linear equations, say

a11x1 + a12x2 + . . . + a1nxn = b1
a21x1 + a22x2 + . . . + a2nxn = b2
...
am1x1 + am2x2 + . . . + amnxn = bm.

We can write this briefly as A · x = b, where

A =
(a11 a12 · · · a1n)
(a21 a22 · · · a2n)
( ...          ...)
(am1 am2 · · · amn),

x = (x1, x2, . . . , xn)^T, b = (b1, b2, . . . , bm)^T.

We call A the coefficient matrix of the system of equations; the m × (n + 1) matrix (A | b) is called the augmented matrix of the equation system. A system of linear equations Ax = b is called homogeneous if b = 0.

Remark: Note that the set of solutions {x ∈ Rn | Ax = b} of a system of linear equations Ax = b forms a subspace of Rn if and only if b = 0.

Example 14.2. (1) The system of equations

x1 − 2x2 + 2x3 = 0
3x1 + x2 + 4x3 = −1
2x1 + x2 = 2

has augmented matrix

(1 −2 2 |  0)
(3  1 4 | −1)
(2  1 0 |  2).

(2) We give some examples of systems of linear equations where the corresponding augmented matrix is in row reduced echelon form:

(a) Consider

(0 0 | 0)
(0 0 | 0),

that is, 0 · x + 0 · y = 0 and 0 · x + 0 · y = 0. Solutions are all (x, y) = (α, β) with α ∈ R, β ∈ R.

(b) Consider

(1 0 0 | −2)
(0 1 0 |  3)
(0 0 1 |  4),

that is, x1 = −2, x2 = 3, x3 = 4. There is precisely one solution to the system of equations, namely (x1, x2, x3) = (−2, 3, 4).

(c) Consider the homogeneous system of equations

(1 0 0 0  3 | 0)
(0 1 4 0 −2 | 0)
(0 0 0 1  0 | 0),

that is, x1 + 3x5 = 0, x2 + 4x3 − 2x5 = 0, x4 = 0. The solutions of this system of equations are given by all

(x1, x2, x3, x4, x5) = (−3β, 2β − 4α, α, 0, β) = α(0, −4, 1, 0, 0) + β(−3, 2, 0, 0, 1)

with α ∈ R and β ∈ R. The set of all solutions forms a subspace of R5 of dimension two with basis vectors (0, −4, 1, 0, 0) and (−3, 2, 0, 0, 1).

Remark. Our aim is to solve systems of linear equations. If the augmented matrix of a system of equations is in row reduced echelon form, we can read off the solutions of the equation system. Now suppose we are given a system of equations with augmented matrix not in row reduced echelon form. We can then perform row operations on the augmented matrix. Doing this, we do not change the set of solutions of the system. The following algorithm describes how to solve a system of linear equations by so-called Gaussian elimination.

Algorithm 14.3 (Gauss-Jordan Solution of Linear Equations).

Input: Given Ax = b where A is an m × n matrix.

Step 1: Form the augmented matrix M = (A | b).
Step 2: Transform M to row reduced echelon form E.

Output:

Case (a): E = 0. Then for any αi ∈ R with 1 ≤ i ≤ n, (x1, . . . , xn) = (α1, . . . , αn) is a solution of the equation system Ax = b.

Case (b): E contains (0, 0, . . . , 0, 1) as a row. Declare the system inconsistent, that is, it has no solution.

Case (c): If (a) and (b) do not occur, then all leading entries of E occur in columns 1 to n. If column j (for 1 ≤ j ≤ n) does not contain a leading entry, then give a parameterised value αj to xj. Use the equations corresponding to the non-zero rows of E to solve for the remaining variables.
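The three output cases can also be detected by comparing ranks: the system is inconsistent precisely when rk(A | b) > rk A, and a consistent system has a unique solution precisely when rk A = n. A minimal sketch of this classification (our own, not from the notes; it relies on numpy.linalg.matrix_rank rather than an explicit row reduction, and the function name classify is hypothetical):

```python
import numpy as np

def classify(A, b):
    """Classify Ax = b as in the output cases of Algorithm 14.3."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    n = A.shape[1]
    rk_A = np.linalg.matrix_rank(A)
    rk_Ab = np.linalg.matrix_rank(np.hstack([A, b]))
    if rk_Ab > rk_A:
        return "inconsistent"               # case (b): a row (0, ..., 0, 1) appears
    if rk_A == n:
        return "unique solution"            # every column has a leading entry
    return f"{n - rk_A} free parameter(s)"  # cases (a) and (c)

# the system of Example 14.4 below
A = [[2, 1, 1, -2], [1, 0, 1, -2], [1, 0, 2, -4], [3, 1, 3, -6]]
b = [3, 2, 1, 4]
print(classify(A, b))  # 1 free parameter(s)
```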

Example 14.4. Consider the system of equations

2x1 + x2 + x3 − 2x4 = 3

x1 + x3 − 2x4 = 2

x1 + 2x3 − 4x4 = 1

3x1 + x2 + 3x3 − 6x4 = 4.

Using Gaussian elimination (see Algorithm 14.3), we obtain the row reduced echelon matrix

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 3 \\ 0 & 1 & 0 & 0 & -2 \\ 0 & 0 & 1 & -2 & -1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

Solutions: for any α ∈ R, (x1, x2, x3, x4) = (3, −2, −1 + 2α, α) is a solution.


Example 14.5. For which values of a, b, c is the system of equations

3x + y + 2z = −1

x + 2y − z = a

x + z = −1

2x + by − z = c

consistent? Which values of a, b, c give infinitely many solutions? Taking the third equation first, the augmented matrix is

$$\left(\begin{array}{ccc|c} 1 & 0 & 1 & -1 \\ 3 & 1 & 2 & -1 \\ 1 & 2 & -1 & a \\ 2 & b & -1 & c \end{array}\right).$$

Performing row operations on the augmented matrix in accordance with Algorithm 14.3, we get:

$$\left(\begin{array}{ccc|c} 1 & 0 & 1 & -1 \\ 0 & 1 & -1 & 2 \\ 0 & 2 & -2 & a+1 \\ 0 & b & -3 & c+2 \end{array}\right)$$

(we cleaned the first column by subtracting multiples of row one),

$$\left(\begin{array}{ccc|c} 1 & 0 & 1 & -1 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & a-3 \\ 0 & 0 & b-3 & c-2b+2 \end{array}\right).$$

(we cleaned the second column by subtracting multiples of row two). Instead of solving the system completely, note at this stage that the system of equations can be consistent only if a = 3. So put a = 3 and swap rows three and four to get

$$\left(\begin{array}{ccc|c} 1 & 0 & 1 & -1 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & b-3 & c-2b+2 \\ 0 & 0 & 0 & 0 \end{array}\right).$$

We hence obtain the following cases:

(1) If a = 3 and b ≠ 3, then the system has a unique solution.
(2) If a = 3, b = 3 and c = 4, then (x, y, z) = (−1 − α, 2 + α, α) is a solution for any α ∈ R. Hence we have infinitely many solutions.
(3) If a ≠ 3, or if a = 3, b = 3 and c ≠ 4, then the system is inconsistent.
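This case analysis can be spot-checked numerically (our own check, not in the notes): for sample values of a, b and c, compare the rank of the coefficient matrix with the rank of the augmented matrix.

```python
import numpy as np

def ranks(a, b, c):
    """Rank of the coefficient matrix and of the augmented matrix."""
    A = np.array([[3, 1, 2], [1, 2, -1], [1, 0, 1], [2, b, -1]], dtype=float)
    rhs = np.array([[-1], [a], [-1], [c]], dtype=float)
    return (np.linalg.matrix_rank(A),
            np.linalg.matrix_rank(np.hstack([A, rhs])))

print(ranks(3, 5, 0))  # (3, 3): consistent, unique solution   -> case (1)
print(ranks(3, 3, 4))  # (2, 2): consistent, one free parameter -> case (2)
print(ranks(2, 3, 4))  # (2, 3): inconsistent                   -> case (3)
```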

Exercise 46. Solve the following systems of equations using Gaussian elimination:

(a)
x + 2y − 4z = −4
2x + 5y − 9z = −10
3x − 2y + 3z = 11,

(b)
x + 2y − 3z = −1
−3x + y − 2z = −7
5x + 3y − 4z = 2,

(c)
x + 2y − 3z = 1
2x + 5y − 8z = 4
3x + 8y − 13z = 7.

Exercise 47. Use the Gaussian elimination algorithm to do the following:

(i) to determine a basis of the subspace of R4 spanned by the vectors (1,−2, 5,−3), (2, 3, 1,−4) and (3, 8,−3,−5);

(ii) to decide whether the vectors (2, 5,−3,−2), (−2,−3, 2,−5), (1, 3,−2, 2) and (−1,−6, 4, 3) form a basis of R4.

Exercise 48. Determine whether the following sums are direct sums or not:

(i) Span{(3,−2,−5, 4), (−5, 2, 8,−5)} + Span{(−2, 4, 7,−3), (2,−3,−5, 8)},
(ii) Span{(1,−2, 5,−3), (4,−4, 6,−3)} + Span{(3, 4, 0, 1), (−3, 8,−2, 1)}.


15. Invertible Matrices and Systems of Linear Equations.

Notation 15.1. Recall Definitions 11.1 and 11.4, where we defined the image, kernel, rank and nullity of a linear transformation. Let A be an m × n matrix. Consider the map fA : Rn → Rm given by fA(x) = Ax. Note that fA is a linear map. We define:

im A := im fA, the image of A;    ker A := ker fA, the null space of A;
rk A := dim(im A), the rank of A;    n(A) := dim(ker A), the nullity of A.

Proposition 15.2. Let A ∈ Mn(R) and x ∈ Mn×1(R). The following are equivalent:

(1) Ax = 0 has x = 0 as its unique solution.
(2) The row reduced echelon form of A is In.
(3) For any b ∈ Mn×1(R), the equation system Ax = b has a unique solution.

Proof. (1) ⇒ (2): Bring the augmented matrix (A | 0) into row reduced echelon form, say (E | 0). So Ax = 0 if and only if Ex = 0. As Ax = 0 has a unique solution, this implies that Ex = 0 has a unique solution. Hence there are no parameters in the general solution of Ex = 0. Hence each of the columns of E contains a leading entry. This implies E = In.

(2) ⇒ (3): Bring (A | b) into row reduced echelon form. Since A has row reduced echelon form In, it follows that (A | b) has row reduced echelon form (In | c). So Ax = b if and only if Inx = c. But Inx = c has unique solution x = c. Hence Ax = b has a unique solution.

(3) ⇒ (1): By assumption, Ax = b has a unique solution for every b ∈ Mn×1(R). Take b = 0; then Ax = 0 has a unique solution.

Remark: Note that the conditions in the last proposition are also equivalent to: the nullity of A is zero; the row rank of A is n; the rank of A is n (use the rank-nullity formula).

Next, recall that a square matrix A ∈ Mn(R) is invertible if and only if there exists B ∈ Mn(R) such that AB = BA = In. Also recall that if A is invertible, then B is uniquely determined. Write B = A−1.

Proposition 15.3. Suppose A ∈ Mn(R) satisfies one of the conditions in Proposition 15.2. Then there exists B ∈ Mn(R) such that AB = In. (We say that B is a right inverse of A.)

Proof. Let e1, e2, . . . , en be the columns of In. By Proposition 15.2, we have a unique solution bi for every equation system Ax = ei, for 1 ≤ i ≤ n. Define B = (b1 | b2 | . . . | bn), the matrix with columns b1, . . . , bn. Then AB = In.
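The proof is constructive and easy to mirror in code. A sketch (our own; it borrows the matrix of Example 15.7 below and uses numpy.linalg.solve for the individual systems Ax = ei):

```python
import numpy as np

def right_inverse(A):
    """Build B column by column by solving A x = e_i (Proposition 15.3)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    cols = [np.linalg.solve(A, np.eye(n)[:, i]) for i in range(n)]
    return np.column_stack(cols)

A = np.array([[1, 0, 2], [2, -1, 3], [4, 1, 8]], dtype=float)
print(np.allclose(A @ right_inverse(A), np.eye(3)))  # True
```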

Proposition 15.4. Suppose A ∈ Mn(R) has a right inverse. Then A is an invertible matrix.


Proof. By assumption, we have a matrix B with AB = In. We need to show that also BA = In.

(1) We show that B has a right inverse C: Assume we have a vector x with Bx = 0. Then 0 = A · 0 = ABx = Inx = x. Hence Bx = 0 has unique solution x = 0. By Proposition 15.3, there exists C ∈ Mn(R) with BC = In.

(2) Note that C = InC = (AB)C = A(BC) = AIn = A. Hence AB = In and BA = In, and so, by definition, A is invertible with A−1 = B.

Corollary 15.5. The following are equivalent:

(1) Ax = 0 has x = 0 as its unique solution.
(2) A has a right inverse.
(3) A is invertible.
(4) A has a left inverse.

Proof. This follows from Proposition 15.3 and Proposition 15.4, and from showing that (4) implies (1). Let B be the left inverse of A, so BA = In. Assume that x is a solution of Ax = 0. Then 0 = B · 0 = BAx = Inx = x. Hence Ax = 0 has x = 0 as its unique solution.

Algorithm 15.6 (for calculating the inverse of an n × n matrix A, or for declaring A to have no inverse).

Input: Given matrix A ∈ Mn(R).

Step 1: Form the augmented n × 2n matrix M = (A | In).
Step 2: Bring M into row reduced echelon form (E | F) with E, F ∈ Mn(R).

Output:
(a) If E ≠ In, declare A to be non-invertible.
(b) If E = In, declare A−1 = F.
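A Python sketch of Algorithm 15.6 (our own code, not from the notes; it repeats the rref routine sketched after Example 13.10, slightly simplified, so the snippet is self-contained):

```python
from fractions import Fraction

def rref(M):
    """Row reduce M (a list of lists) to row reduced echelon form."""
    A = [[Fraction(x) for x in row] for row in M]
    m, n, r = len(A), len(A[0]), 0
    for c in range(n):
        p = next((i for i in range(r, m) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]                 # Type I: swap rows
        A[r] = [x / A[r][c] for x in A[r]]      # Type II: normalise the pivot
        for i in range(m):                      # Type III: purge the pivot column
            if i != r and A[i][c] != 0:
                A[i] = [u - A[i][c] * v for u, v in zip(A[i], A[r])]
        r += 1
    return A

def inverse(A):
    """Algorithm 15.6: row reduce (A | I) to (E | F); if E = I then A^(-1) = F."""
    n = len(A)
    M = rref([list(A[i]) + [int(i == j) for j in range(n)] for i in range(n)])
    E, F = [row[:n] for row in M], [row[n:] for row in M]
    if E != [[int(i == j) for j in range(n)] for i in range(n)]:
        raise ValueError("A is not invertible")  # output case (a)
    return F                                     # output case (b)

A = [[1, 0, 2], [2, -1, 3], [4, 1, 8]]
print([[int(x) for x in row] for row in inverse(A)])
# [[-11, 2, 2], [-4, 0, 1], [6, -1, -1]], matching Example 15.7
```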


Example 15.7. Let

$$A = \begin{pmatrix} 1 & 0 & 2 \\ 2 & -1 & 3 \\ 4 & 1 & 8 \end{pmatrix}. \qquad \text{Then} \qquad A^{-1} = \begin{pmatrix} -11 & 2 & 2 \\ -4 & 0 & 1 \\ 6 & -1 & -1 \end{pmatrix},$$

as

$$\left(\begin{array}{ccc|ccc} 1 & 0 & 2 & 1 & 0 & 0 \\ 2 & -1 & 3 & 0 & 1 & 0 \\ 4 & 1 & 8 & 0 & 0 & 1 \end{array}\right)$$

R2 → R2 − 2R1, R3 → R3 − 4R1:

$$\left(\begin{array}{ccc|ccc} 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & -1 & -1 & -2 & 1 & 0 \\ 0 & 1 & 0 & -4 & 0 & 1 \end{array}\right)$$

R2 → −R2, R3 → R3 − R2, R3 → −R3:

$$\left(\begin{array}{ccc|ccc} 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & 1 & 1 & 2 & -1 & 0 \\ 0 & 0 & 1 & 6 & -1 & -1 \end{array}\right)$$

R1 → R1 − 2R3, R2 → R2 − R3:

$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -11 & 2 & 2 \\ 0 & 1 & 0 & -4 & 0 & 1 \\ 0 & 0 & 1 & 6 & -1 & -1 \end{array}\right).$$

Exercise 49. Find the inverses of the following matrices using elementary row operations:

$$A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & 1 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 1 & 2 & 2 \\ 2 & 1 & 1 & 2 \\ 2 & 2 & 1 & 2 \\ 3 & 3 & 1 & 3 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & -1 & 1 & 2 \\ 0 & 1 & 2 & -1 \\ 3 & 1 & 1 & 1 \\ 3 & 2 & 1 & 0 \end{pmatrix}.$$

Exercise 50. Let A be a matrix with entries in R. Prove the following statements:

(a) A system of linear equations Ax = 0 with fewer equations than variables always has a non-trivial solution.

(b) A system of linear equations Ax = b with fewer equations than variables either has no solution or has several different solutions.

(c) A system of linear equations Ax = b where the rank of A equals the number of equations in the system always has a solution.

(d) A system of linear equations Ax = b where the rank of A equals the number of variables in the system has at most one solution.

(e) A system of linear equations Ax = b where the rank of A equals the number of variables in the system and equals the number of equations in the system has precisely one solution.

[Denote by B the row reduced echelon form of a given matrix A. Then the rank of A is defined to be the number of non-zero rows of B.]

Exercise 51. The n × n Van der Monde matrix is the matrix A defined by

$$A = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{pmatrix}$$

where x1, x2, . . . , xn are distinct real numbers. Show that A is invertible. [Hint: let

$$x = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{pmatrix}$$

be a solution to the simultaneous equations Ax = 0. Show that x1, . . . , xn are all roots of the polynomial $a_0 + a_1x + a_2x^2 + \dots + a_{n-1}x^{n-1}$.]


16. Elementary Matrices

Recall Definition 13.4 of an elementary row operation (ero) on a matrix. If a matrix B is obtained from a matrix A by applying elementary row operations, we call A and B row equivalent. In this section, we study how row equivalent matrices are related.

Definition 16.1. Let e be an elementary row operation. An n × n matrix E = e(In) obtained by applying e to the identity matrix In is called an elementary matrix.

Example 16.2. The elementary matrices for Example 13.7(1) are:

$$I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \xrightarrow{\,e_1 = (R_2 \to \frac{1}{2}R_2)\,} E_1 := e_1(I_2) = \begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{2} \end{pmatrix}$$

$$I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \xrightarrow{\,e_2 = (R_1 \leftrightarrow R_2)\,} E_2 := e_2(I_2) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

$$I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \xrightarrow{\,e_3 = (R_2 \to \frac{1}{2}R_2)\,} E_3 := e_3(I_2) = \begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{2} \end{pmatrix}$$

$$I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \xrightarrow{\,e_4 = (R_1 \to R_1 - 2R_2)\,} E_4 := e_4(I_2) = \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix}.$$

Note that in Example 13.7 we have EiMi = Mi+1.

Lemma 16.3. Let A ∈ Mm×n(R) and let e be an elementary row operation. Then e(A) = e(Im) · A.

Proof. We sketch a proof for the Type II elementary row operation; writing down the matrix multiplications formally, and proving the claim for the other types of elementary row operations, is left as an exercise to the reader. Let A = (aij) and let e be scalar multiplication of row i by λ ∈ R (a Type II ero). Then e(Im) = diag(1, . . . , 1, λ, 1, . . . , 1) with λ in row and column i. Hence

$$e(I_m) \cdot A = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & \lambda & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix} \cdot \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \dots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \vdots & \vdots & & \vdots \\ \lambda a_{i1} & \lambda a_{i2} & \dots & \lambda a_{in} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} = e(A).$$
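The lemma is easy to check numerically for all three types of elementary row operations (our own verification, not in the notes; the helper names ero_swap, ero_scale and ero_add are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(3, 4)).astype(float)
m = A.shape[0]

def ero_swap(M, i, j):      # Type I: R_i <-> R_j
    M = M.copy(); M[[i, j]] = M[[j, i]]; return M

def ero_scale(M, i, lam):   # Type II: R_i -> lam * R_i
    M = M.copy(); M[i] *= lam; return M

def ero_add(M, i, j, lam):  # Type III: R_i -> R_i + lam * R_j
    M = M.copy(); M[i] += lam * M[j]; return M

I = np.eye(m)
for e in (lambda M: ero_swap(M, 0, 2),
          lambda M: ero_scale(M, 1, 5.0),
          lambda M: ero_add(M, 2, 0, -3.0)):
    assert np.allclose(e(A), e(I) @ A)  # e(A) = e(I_m) . A
print("Lemma 16.3 verified for all three types")
```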


Lemma 16.4. Any elementary matrix is invertible.

Proof.

(1) Note that each elementary row operation is invertible:
(i) Ri ↔ Rj is its own inverse.
(ii) The inverse of e = (Ri → cRi) with c ≠ 0 is f = (Ri → (1/c)Ri).
(iii) The inverse of e = (Ri → Ri + cRj) with i ≠ j is f = (Ri → Ri − cRj).

(2) Let e be an elementary row operation and let f be its inverse elementary row operation. Let E = e(I) and F = f(I) be the corresponding elementary matrices. By Lemma 16.3,

FE = f(e(I)) = I and EF = e(f(I)) = I,

since f undoes e and vice versa. Hence F is the inverse matrix of E.

Proposition 16.5. Matrices A and B are row equivalent if and only if there exists an invertible matrix P with B = PA.

Proof. “⇒”: Since A and B are row equivalent, there are elementary row operations ei (for 1 ≤ i ≤ t) such that

(13) B = e1e2 · · · et(A).

Let Ei = ei(I) for 1 ≤ i ≤ t, and let P = E1E2 · · ·Et. By Lemma 16.4, P is invertible, and by Equation (13) and Lemma 16.3 we have B = PA.

“⇐”: Suppose A and B are matrices with B = PA where P is invertible. Since P is invertible, it follows by Corollary 15.5 and Proposition 15.2 that the row reduced echelon form of P is the identity matrix I. This means P can be brought to row reduced echelon form I by applying elementary row operations; since inverses of elementary matrices are again elementary (Lemma 16.4), we get P = EsEs−1 · · ·E1 for some elementary matrices Ei with 1 ≤ i ≤ s. Then B = PA = EsEs−1 · · ·E1A, and so B can be obtained from A by applying the elementary row operations corresponding to E1, E2, . . . , Es.

Remark 16.6. Why does Algorithm 15.6 (for inverting a matrix or for declaring it to be non-invertible) work? If A ∈ Mn(R) is invertible, then by Corollary 15.5 and Proposition 15.2, matrix A is row equivalent to In. Hence there exist elementary matrices Ei for 1 ≤ i ≤ k such that

In = EkEk−1 · · ·E1A.

As A is invertible, we can multiply this equation by A−1 from the right to get:

A−1 = EkEk−1 · · ·E1In.

The meaning of this last equation is the following: the elementary row operations used to get the row reduced echelon form of A, if applied to In, give precisely the inverse of A.

Page 76: Linear Algebra Notes

74 ANNE HENKE

Exercise 52. (a) Write the following matrix C as a product of elementary matrices:

$$C = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{pmatrix}.$$

(b) Given are matrices A and B with

$$A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 2 \\ 2 & 1 & 2 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Find matrices P and Q such that PAQ = B.


17. Row Rank and Column Rank

Row and column operations. Similarly to the elementary row operations defined in Section 13, we can define elementary column operations:

Definition 17.1. (1) The following operations are called elementary column operations (ecos) on a matrix:

Type I: Swap column i and column j: Ci ↔ Cj.
Type II: Multiply column i by a non-zero λ ∈ R: Ci → λCi.
Type III: Add to column i a multiple (λ times) of column j, for i ≠ j: Ci → Ci + λCj.

(2) Two matrices A and B are column equivalent if we can pass from A to B by a sequence of elementary column operations.

Definition 17.2. Let A = (aij) be an m × n matrix with entries in R.

(1) We define the row space of A as the vector space spanned by the rows ai = (ai1, ai2, . . . , ain) of A, for 1 ≤ i ≤ m. We consider the row space as a subspace of Rn.

(2) We define the column space of A as the vector space spanned by the columns ai = (a1i, a2i, . . . , ami) of A, for 1 ≤ i ≤ n. We consider the column space as a subspace of Rm.

The proof of the following statement is left as an exercise to the reader.

Lemma 17.3. Let V = 〈v1, . . . , vi, . . . , vj , . . . , vn〉 be the vector space spanned by vectors v1, . . . , vn. Then:

(1) V = 〈v1, . . . , vj , . . . , vi, . . . , vn〉 for j > i (swap vectors vi and vj);
(2) V = 〈v1, . . . , λvi, . . . , vn〉 for λ ≠ 0 (multiply vector vi by the scalar λ);
(3) V = 〈v1, . . . , vi + λvj , . . . , vn〉 for any λ ∈ R and i ≠ j (add to vector vi a multiple of vector vj).

Proposition 17.4. Let A, B be matrices where B is obtained from A through elementary row (column) operations. Then the row (column) space of A equals the row (column) space of B.

Proof. The rows of B are obtained from the rows of A by
- reordering rows,
- scalar multiplication of rows,
- adding multiples of rows to other rows.
By Lemma 17.3, all these operations preserve the space spanned by the rows. The proof that the column space is preserved under elementary column operations is similar.

Remark. Note that the column space of a matrix B is in general not equal to the column space of a matrix A if B is obtained from A by elementary row operations. Likewise, the row space of B is in general not equal to the row space of A if B is obtained from A by elementary column operations.

Lemma 17.5. Let A ∈ Mm×n(R), and let c be an elementary column operation applicable to A. Let r be the corresponding elementary row operation (that is, of the same type, applied to the row with the same number). Then c(A) = (r(AT))T.

Proof. Elementary column operations applied to A have the same effect as the corresponding elementary row operations applied to AT (followed by transposing the result).

Remark 17.6. We leave it to the reader to find a definition, similar to Definition 13.1, for the column reduced echelon form of a matrix. The column reduced echelon form of a matrix A should be equal to the transpose of the row reduced echelon form of AT.

Recall that by Definition 13.6, the row rank of a matrix A equals the number of non-zero rows in the row reduced echelon form of A.

Definition 17.7. We define the column rank of a matrix A to be equal to the number of non-zero columns in the column reduced echelon form of A.

Corollary 17.8. Let A be an m × n matrix over R. Then the row rank of A equals the dimension of the row space of A. Moreover, the column rank of A equals the dimension of the column space of A, which in turn equals the rank of A.

Proof. (1) Let B be the row reduced echelon form of the given matrix A. Note that the non-zero row vectors in B are linearly independent. Hence the dimension of the row space of B equals the number of non-zero rows of B. By Proposition 17.4, it follows that the row rank of A equals the dimension of the row space of A.

(2) The proof that the column rank is equal to the dimension of the column space is similar to the argument given in (1).

(3) By definition, the rank of a matrix A equals the dimension of the image of A, see 15.1. Let {e1, . . . , en} be the canonical basis of Rn. Then the image of A is equal to Span{Ae1, . . . , Aen}. Note that Aei equals precisely the ith column of A. Hence the image of A is equal to the column space of A. This implies that the rank of A is equal to the column rank of A.

Row and column rank are equal. We define elementary matrices for elementary column operations, similarly to the elementary matrices defined for elementary row operations (see Section 16). Note that applying an elementary column operation to a matrix A amounts to multiplying A by the corresponding elementary matrix from the right (not from the left). We hence have, analogously to Proposition 16.5:

Proposition 17.9. Matrices A and B are column equivalent if and only if there exists an invertible matrix Q with B = A · Q.


Theorem 17.10. Let A ∈ Mm×n(R) with row rank r. Then there exists an invertible matrix P ∈ Mm(R) and an invertible matrix Q ∈ Mn(R) such that

$$PAQ = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}.$$

Proof. By Propositions 16.5 and 17.9, it is sufficient to show that A can be brought into this form by a combination of elementary row operations and elementary column operations.

Step 1: Bring A to row reduced echelon form, say E, by a sequence of elementary row operations. Then E has r non-zero rows.

Step 2: Take the r columns containing the leading entries of the non-zero rows and use them to make up the first r columns. This can be done by elementary column operations (of Type I). We obtain the matrix

$$\begin{pmatrix} I_r & * \\ 0 & 0 \end{pmatrix}.$$

Step 3: Use the leading entry of each non-zero row to purge the remaining entries of the first r rows. This means applying elementary column operations (of Type III). We obtain the matrix

$$\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}.$$

To calculate P and Q, keep track of the elementary row operations and elementary column operations applied (see the proof of Proposition 16.5).

Theorem 17.11. Let A ∈ Mm×n(R). Then the row rank of A equals the column rank of A (equals the rank of A).

Proof. Let r be the row rank of A. Then by Theorem 17.10, there exist invertible matrices P, Q with

$$PAQ = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}.$$

(i) Note that PAQ is row equivalent to AQ (apply a sequence of elementary row operations, see Proposition 16.5). These elementary row operations leave the last n − r columns zero. So at least n − r columns of AQ are zero. Hence the column reduced echelon form of AQ has at most r non-zero columns, and so the column rank of AQ is at most r. Since A and AQ are column equivalent, the column rank of A equals the column rank of AQ. Hence the column rank of A ≤ row rank of A.


(ii) Apply the result of (i) to AT. Then

row rank of A = column rank of AT ≤ row rank of AT = column rank of A.

By (i) and (ii), the row rank of A equals the column rank of A.
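Numerically (our own check, not part of the notes), one can watch the theorem at work on the matrix M1 of Example 13.10: numpy computes the same rank for M1 and its transpose.

```python
import numpy as np

M1 = np.array([[1, 0, 1, -2, 2],
               [2, 1, 1, -2, 3],
               [3, 1, 3, -6, 4],
               [1, 0, 2, -4, 1]], dtype=float)
print(np.linalg.matrix_rank(M1))    # 3 = row rank (Example 13.10)
print(np.linalg.matrix_rank(M1.T))  # 3 = column rank, as Theorem 17.11 predicts
```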

Example 17.12. Given

$$A = \begin{pmatrix} 1 & 2 & 1 \\ -2 & -4 & 1 \end{pmatrix},$$

find matrices P, Q such that

$$PAQ = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},$$

using the algorithm in the proof of Theorem 17.10.

Step 1: Bring A to row reduced echelon form E, and apply the same elementary row operations to I2:

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \begin{pmatrix} 1 & 2 & 1 \\ -2 & -4 & 1 \end{pmatrix}$$

R2 → R2 + 2R1:

$$\begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} \quad \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 3 \end{pmatrix}$$

R2 → (1/3)R2:

$$\begin{pmatrix} 1 & 0 \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix} \quad \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$

R1 → R1 − R2:

$$\begin{pmatrix} \frac{1}{3} & -\frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix} \quad \begin{pmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Hence for

$$P = \begin{pmatrix} \frac{1}{3} & -\frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix}$$

we have PA = E.

Step 2/3: Bring E to column reduced echelon form, and apply the same elementary column operations to I3:

$$\begin{pmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

C2 ↔ C3:

$$\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \end{pmatrix} \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$

C3 → C3 − 2C1:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \quad \begin{pmatrix} 1 & 0 & -2 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$$

Then taking

$$Q = \begin{pmatrix} 1 & 0 & -2 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},$$

we have

$$PAQ = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$
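The computed P and Q can be verified directly (our own check, not part of the notes):

```python
import numpy as np

A = np.array([[1, 2, 1], [-2, -4, 1]], dtype=float)
P = np.array([[1/3, -1/3], [2/3, 1/3]])
Q = np.array([[1, 0, -2], [0, 0, 1], [0, 1, 0]], dtype=float)
print(np.round(P @ A @ Q, 12))  # [[1. 0. 0.], [0. 1. 0.]]
```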

Exercise 53. Determine the row rank and the column rank of the following matrix:

$$X = \begin{pmatrix} 1 & 2 & 3 & \dots & n \\ 2 & 3 & 4 & \dots & n+1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ n & n+1 & n+2 & \dots & 2n-1 \end{pmatrix}.$$

Exercise 54. Matrix U comes from matrix A by subtracting row one from row three:

$$A = \begin{pmatrix} 1 & 3 & 2 \\ 0 & 1 & 1 \\ 1 & 3 & 2 \end{pmatrix}, \quad U = \begin{pmatrix} 1 & 3 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

(i) Find bases for the two column spaces.
(ii) Find bases for the two row spaces.
(iii) Find bases for the two null spaces.