
Linear Algebra and Geometry

David Meredith

Department of Mathematics
San Francisco State University
E-mail address: [email protected]
URL: http://online.sfsu.edu/~meredith


Contents

Preface
0.1. How to use this book
0.2. Acquiring Linear Algebra Software
0.3. A note to the teacher

Chapter 1. Vectors in Euclidean Space
1.1. Vectors and Coordinates
1.2. Vector Arithmetic
1.3. Euclidean spaces
1.4. Lengths, Angles, Vector Products and Orthogonality

Chapter 2. Matrices and Matrix Operations
2.1. Matrix Operations
2.2. Projections, Reflections and Rotations

Chapter 3. Systems of Linear Equations
3.1. Two Key Problems
3.2. Matrices with Pivots
3.3. Solving Homogeneous Systems
3.4. Row Equivalent Matrices and Reduced Row-Echelon Matrices
3.5. Linear Independence and Row Rank

Chapter 4. Subspaces and Dimension
4.1. Definitions and Examples
4.2. Constructing Subspaces
4.3. Spanning Sets and Dimension
4.4. Orthogonal Complements
4.5. Dimension of Row and Column Spaces
4.6. Invertible Matrices
4.7. Sums and Intersections of Subspaces

Chapter 5. Eigenvectors and Eigenvalues
5.1. Introduction and Definitions
5.2. Eigenvalues of 2 × 2 Matrices
5.3. Finding Eigenvalues and Eigenvectors
5.4. Symmetric Matrices and Eigenvalues

Chapter 6. Linear Equations and Linear Approximations
6.1. Singular Value Decomposition Defined
6.2. Consequences of the Singular Value Decomposition
6.3. Singular Value Decomposition Proved


Preface

In 1637 René Descartes published La Géométrie, introducing his method of describing geometric figures with equations and coordinates. Since then algebra and geometry have been inseparably linked. These notes will teach you to use the algebraic methods of vectors and matrices to solve problems in linear geometry. Three problems in particular will guide our development of the theories and tools of linear algebra:

Algebra                            Geometry
Solving linear equations           Finding linear spans and bases
Matrix multiplication              Describing arrangements and motions in space
Solving equations approximately    Finding nearest points

Don't worry if you don't understand all the words used to describe the three problems. The goal of this course is to teach you what they mean.

0.1. How to use this book

This is an interactive book, and it will work for you if you use it interactively. You will find problems placed every few paragraphs, and you should do them as you encounter them. You will know if you get them right. If you cannot understand a problem, that means that you did not fully grasp the material in the preceding half page. Go back and read it again, then do the problem. Do not skip any problems. If you do so, you are likely to fail to learn something important. Skip a few problems and you will find yourself awash in a sea of unfamiliar terminology. Do all the problems and you will learn the concepts one at a time. The problems are not hard if you understand the definitions and theorems in the text.

Every theorem (lemma, proposition) is followed by a proof. You don't have to read all the proofs the first time through. They are there because you should know that results can be proven. I've tried to follow most theorems with an example that explains, better than the proof, what the theorem means and how it can be used.

0.2. Acquiring Linear Algebra Software

One of the many goals of this course is to acclimate you to doing mathematics with an electronic assistant always at your side. Linear algebra calculations can be long and arduous when done by hand, so you will use a computer program for the routine (and not so routine) arithmetic. The principal linear algebra program used by professional mathematicians and scientists is Matlab (http://www.mathworks.com/). A student version of Matlab is available for around $100.


An adequate free alternative to Matlab is the open-source program Octave (http://www.gnu.org/software/octave/download.html). The command structure of Octave is very similar to Matlab's. This text includes many examples of Octave code for linear algebra computations. All examples have been tested in Octave on Windows.

There are alternatives to Matlab and Octave that you may wish to explore.

• X(PLORE), written by me, is available from my website http://online.sfsu.edu/~meredith. X(PLORE) includes sufficient matrix operations for all examples and problems in this text.

• Scilab is a free, extended clone of Matlab written by a European consortium. Scilab's command structure is not as close to Matlab's as Octave's is, but Scilab has more problem-solving tools than Octave.

• Mathematica and Maple are huge programs that include complete linear algebra packages for both exact and floating point arithmetic. They are not free.

• Sage is an open-source package currently under development at the University of Washington intended to compete with Mathematica and Maple. It has many active users and developers.

• Computer programmers who know or want to learn Python can use the Python extensions NumPy and SciPy to do all the calculations required for this text.

0.3. A note to the teacher

This text is intended for sophomore students of science, engineering, computer science and mathematics. I believe that all these students will benefit from a concrete introduction to linear algebra and linear geometry in Rn emphasizing the application of matrix algebra to geometric problems. I believe further that they will benefit from a rigorous course in mathematics. While there are no "abstract" vector spaces in this text, there are plenty of proofs. Everything stated in the text is proven or left to the student to prove (except for the theory of determinants). My goal is for students to think of mathematics not as a collection of useful algorithms but as a coherent body of useful theory.

The three problems I stated, understood sufficiently broadly, cover the usual topics in a first linear algebra course. The theory of linear equations and linear spaces will include subspaces, bases, dimension theory and orthogonal complements. Matrix multiplication includes the special cases of rotations and reflections as well as the general theory of eigenvalues. Eigenvalues are sufficient to describe the motions induced by symmetric matrices, so the spectral theorem is carefully proved. Finding nearest points on a subspace leads to further explorations of orthogonality and norms.

What's left out: abstract vector spaces, complex numbers or more general fields, quadratic forms, determinants, numerical methods with error estimates, . . . .


CHAPTER 1

Vectors in Euclidean Space

1.1. Vectors and Coordinates

Linear algebra studies the geometry of "flat" objects in space: points, lines, planes and so forth. Curved objects like circles are not considered. The first things you will study are points, which you will call "vectors." There is a surprising amount of theory about such simple objects.

Definition 1.1. A vector is a finite list of numbers. The number of entries in the vector is the size of the vector. The numbers in the vector are called the coordinates of the vector.[1]

The following are examples of vectors:

(1.1.1)

u = (1, 0)
v = (3, 4, 5, 6)
w = (−3, −2, −1)
x = (0.345, −3.214, 2.876, −1.357)

Their sizes are 2, 4, 3 and 4. Vectors will always be denoted by bold-faced, lower-case letters. Numbers will be denoted by plain lower-case characters.

More advanced textbooks on linear algebra use the term "vector" somewhat more generally than it is used here. These more general vectors behave exactly like the vectors of numbers that are studied in this book, and I think it is a good idea to master this version of vector theory before tackling the more general version.

[1] This book will frequently introduce new terminology. It will always do so through definitions. Although you may already know a meaning for the word, putting the word in a definition means that you have to ignore anything you already know about the word and just use it as it is defined. The entire meaning of the word, for purposes of this text, is given by the definition. A definition will almost always be followed by examples, but it is a mistake to try to understand the definition just by looking at the examples. You have to read the formal definitions and understand them.


To enter a vector into Octave, you put the entries between brackets and separate them with commas. Here is how you would enter the vectors above. Press <ENTER> at the end of each line, and you will see the vector you have created.

u = [1,0]
v = [3,4,5,6]
w = [-3,-2,-1]
x = [0.345, -3.214, 2.876, -1.357]
s = size(v)

The function size returns two numbers, 1 and 4. In a later chapter you will see that the vector v is really a matrix with one row and four columns.

Whenever you enter an object like a vector into Octave or evaluate a function like size(), you should assign the result to a named variable. Here the vectors are called u, v, w and x, and the size of v is called s. By assigning the results of your calculations to variables, you can use the results later in your work. Notice how the size of the second vector was calculated by using the name of the vector v in the size() function.

You need to be able to refer to the coordinates of a vector. If v = (3, 4, 5, 6), then the coordinates are denoted v(1) = 3, v(3) = 5 and so forth. If v is a vector of size n, then it has coordinates v(1), . . . , v(n). You can write:

v = (v(1),v(2), . . . ,v(n))

Octave understands coordinates. You can find the second coordinate of the vector v with the following command:

v2 = v(2)

If all the coordinates of a vector are 0, then it is the zero vector. The zero vector is denoted as 0 regardless of its size.

The zero vector can be constructed in Octave. The zero vector of size 5 is constructed as follows:

z5 = zeros(1,5)

The parameters 1,5 mean that there is one row with five entries. Try zeros(2,5) and see what you get.

1.2. Vector Arithmetic

It is traditional in linear algebra to use the term scalar for number. The two terms are interchangeable; both mean real or decimal numbers like 3, −2.57, 3/4 and π. (It is possible to extend the scalars to include complex numbers, but most of this text will be restricted to real numbers.)


A vector can be multiplied by a scalar. Just multiply the coordinates by the scalar. Here is an example. If v = (1, 2, 3), then:

2v = 2(1, 2, 3) = (2, 4, 6)

Vectors of the same size can also be added or subtracted by adding or subtracting coordinates in the same position. If v = (1, 2, 3) and w = (6, 5, 4), then:

v + w = (1, 2, 3) + (6, 5, 4) = (7, 7, 7)

v −w = (1, 2, 3)− (6, 5, 4) = (−5,−3,−1)

The general definitions of the operations are:

(av)(i) = a(v(i))
(v + w)(i) = v(i) + w(i)
(v − w)(i) = v(i) − w(i)

Definition 1.2. A linear combination of vectors (of the same size) is the result of adding, subtracting and scalar multiplying vectors.

If u = (1, 0, −1), v = (1, 2, 3) and w = (6, 5, 4), then we can construct the linear combination:

3u + 2(w − v) = 3(1, 0,−1) + 2((6, 5, 4)− (1, 2, 3)) = (13, 6,−1)

You can use Octave to form linear combinations of vectors. Let's do the example above in Octave.

u = [1,0,-1]
v = [1,2,3]
w = [6,5,4]
comb = 3*u+2*(w-v)

Note: the symbol “*” means “multiply” in Octave.

***Problem 1.3. Using u, v, w from the Octave instructions above:

(1) Find 3(2u + v) − w;
(2) Find non-zero scalars a, b, c such that au + bv + cw = 0.

1.3. Euclidean spaces

Everything we will do with vectors can be done algebraically as in the previous section. However it is much easier to think about vectors, and they are much more useful, if you also give them a geometric interpretation. Most applications of linear algebra come from the geometric side, but most solutions come from the algebraic side. In linear algebra, algebraic methods provide solutions to geometric problems.

All vectors of size 2 comprise the plane R2. That is, R2 consists of pairs (x, y). You are accustomed to representing R2 by drawing two coordinate lines perpendicular to each other. Putting coordinates on the lines gives a location or address to every point on the plane. The figure shows how the vector (2, 3) can be displayed in R2, either as a point or as an arrow from the origin.


Vector (2, 3) as point and arrow in R2

You can also display a vector as a displacement from one vector to another. See the figure below.

Vector (2, 3) as displacement from (1, −1) to (3, 2)

Adding vectors can be thought of as putting the tail of one vector at the head of the other. The figure below shows w as the sum of u and v. Of course, if w = u + v then v = w − u, so if the tails of the vectors u and w start at the same point, then the vector from the head of u to the head of w is w − u.

Vector v added to u. The result w satisfies w = u + v and v = w − u

Just like R2 is the collection of all vectors of size 2, Rn is the collection of all vectors of size n.[2] These spaces are called Euclidean spaces. They are examples of what mathematicians call vector spaces, although they do not exhaust all possible vector spaces. We say that Rn is an n-dimensional Euclidean space. The space R3 is a model for the three-dimensional space we live in. Go up just one dimension to R4 and you can pose problems that have puzzled mathematicians and physicists for a century. To learn about one of these problems, look up "Poincare Conjecture." Even higher dimensional spaces are useful, especially for statisticians and physicists.

Everything that was said about two-dimensional spaces is equally true for higher dimensional spaces. In Rn the vectors are represented by lists of size n. If u, v and w are vectors in Rn, then w = u + v if and only if v = w − u. Pictures are hard to draw in R3 and just about impossible to draw in higher dimensional spaces. Nevertheless the pictures you draw in R2 can inform your intuition about the geometry of higher dimensional spaces. One of the challenges of modern statistics is taking data from high dimensional spaces and turning it into useful information displayed in a low dimensional space.

[2] Understanding mathtalk: There is no space called Rn. There are spaces called R1, R2, R3, and so forth. When we talk about Rn, we are talking about one of these spaces, and we don't know or care which one. We are saying something which is true for all the spaces R1, R2, R3, . . . .

There is some convenient shorthand often used in mathematical writing. Instead of saying "u, v and w are vectors in Rn", you can say "u, v, w ∈ Rn". Instead of saying "v ∈ Rn", you can say "v is an n-vector."

1.4. Lengths, Angles, Vector Products and Orthogonality

Given two vectors of the same size, the inner product or dot product combines them into a number.

Definition 1.4. If v and w are n-vectors:

v = (v(1), . . . ,v(n)) and w = (w(1), . . . ,w(n))

then the inner product of v and w is:

v ·w = v(1)w(1) + · · ·+ v(n)w(n)

For example,

(1, 3, 2) · (2,−1, 1) = 1 · 2 + 3 · (−1) + 2 · 1 = 2− 3 + 2 = 1
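If you want to check a computation like this in Octave, here is a minimal sketch using elementwise multiplication (the built-in dot function appears later in this section; the vectors are the ones from the example above):

v = [1,3,2]
w = [2,-1,1]
ip = sum(v.*w)   % multiply matching coordinates and add; prints ip = 1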

***Problem 1.5. Find (2, 1,−1) · (2, 3, 4)

***Problem 1.6. Prove 0 · v = 0 for any vector v.

Proposition 1.7. If v, w ∈ Rn and a ∈ R, then
(1) v · w = w · v
(2) (av) · w = v · (aw) = a(v · w)

Proof. First prove (1):

v ·w = v(1)w(1) + · · ·+ v(n)w(n)

= w(1)v(1) + · · · + w(n)v(n)   because multiplication is commutative
= w · v

Now for (2). By (1) it suffices to show that (av) ·w = a(v ·w):

(av) ·w = (av(1))w(1) + · · ·+ (av(n))w(n)

= a(v(1)w(1)) + · · ·+ a(v(n)w(n))

= a(v(1)w(1) + · · ·+ v(n)w(n))

= a(v ·w)


***Problem 1.8. For n-vectors v and w, prove (−v)·w = v ·(−w) = −(v ·w)

***Problem 1.9. Suppose u,v,w ∈ Rn. Prove: u · (v + w) = u · v + u ·w

The inner product is useful for finding geometric lengths and angles. First we define the norm of a vector.

Definition 1.10. The norm of a vector v, denoted ‖v‖, is:

‖v‖ = √(v · v)

Thus ‖v‖ = √(v(1)² + · · · + v(n)²). By the Pythagorean Theorem, the norm of a vector is its geometric length (not to be confused with the size of the vector, which is the number of coordinates in the vector).
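As a quick Octave illustration (a minimal sketch; the vector is an arbitrary choice, and the built-in norm function is introduced a little later):

v = [1,2,2]
n1 = sqrt(sum(v.^2))   % the norm computed from the definition: sqrt(1+4+4) = 3
n2 = norm(v)           % Octave's built-in norm; should agree with n1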

***Problem 1.11. Find ‖(3, 2, 4,−1)‖

***Problem 1.12. Prove that v = 0 if and only if ‖v‖ = 0.

Next we calculate the angle between two vectors.

Proposition 1.13. Let v, w be n-vectors. If you put their tails together, they form two sides of an angle θ. Then ‖v‖‖w‖ cos θ = v · w. See the figure below.

Angle θ between v and w

Proof. By the law of cosines:

‖v‖² + ‖w‖² − 2‖v‖‖w‖ cos(θ) = ‖w − v‖²
= (w − v) · (w − v)
= w · w − 2w · v + v · v
= ‖w‖² − 2w · v + ‖v‖²

Thus:

‖v‖‖w‖ cos(θ) = w · v □

If both vectors are non-zero, we can divide by their norms and obtain a closed-form expression for the angle between them.


Corollary 1.14. If v and w are non-zero n-vectors, then the angle θ between v and w is

θ = arccos((v · w)/(‖v‖‖w‖))

You can use Octave to find the inner product of two vectors, their norms and the angle between them. The function for the inner product of u and v is dot(u,v), and the function for the norm of u is norm(u). We can calculate the angle between two vectors u and v as follows:

u = [3,4,5,6]
v = [1,-1,1,-1]
innerprod = dot(u,v)
norm_u = norm(u)
norm_v = norm(v)
angle = acos(innerprod / (norm_u*norm_v))

The function acos(...) is arccos(· · ·).

***Problem 1.15. Let v = (1, 3, 2, −4) and w = (2, 4, −2, 1).
(1) Use Octave to check ‖v‖ = √(v · v). The command for square root is sqrt(...).
(2) Use Octave to find the angle θ between v and w.

Next we will use the angle between vectors to determine when two vectors are perpendicular and when they are parallel.

Corollary 1.16. Suppose v and w are non-zero n-vectors. Then v · w = 0 if and only if the angle between v and w is π/2.

Proof. Let θ be the angle between v and w. By the first Corollary, θ = arccos((v · w)/(‖v‖‖w‖)). Thus θ = π/2 = arccos 0 if and only if (v · w)/(‖v‖‖w‖) = 0, which holds if and only if v · w = 0. □

Definition 1.17. Two n-vectors, v and w, are said to be orthogonal,[3] denoted v ⊥ w, if v · w = 0.

[3] Dean Sheldon Axler of SF State noted that the Supreme Court learned some geometry on January 11, 2010. Arguing before the U.S. Supreme Court on Monday in the case of Briscoe v. Virginia, the University of Michigan law professor Richard Friedman gave the justices an unintentional vocabulary lesson:

MR. FRIEDMAN: I think that issue is entirely orthogonal to the issue here because the Commonwealth is acknowledging -
CHIEF JUSTICE ROBERTS: I'm sorry. Entirely what?
MR. FRIEDMAN: Orthogonal. Right angle. Unrelated. Irrelevant.
CHIEF JUSTICE ROBERTS: Oh.
JUSTICE SCALIA: What was that adjective? I liked that.
MR. FRIEDMAN: Orthogonal.
CHIEF JUSTICE ROBERTS: Orthogonal.
MR. FRIEDMAN: Right, right.
JUSTICE SCALIA: Orthogonal, ooh.
(Laughter.)
JUSTICE KENNEDY: I knew this case presented us a problem.
(Laughter.)
MR. FRIEDMAN: I should have - I probably should have said -
JUSTICE SCALIA: I think we should use that in the opinion.
(Laughter.)
MR. FRIEDMAN: I thought - I thought I had seen it before.
JUSTICE SCALIA: Or the dissent.
(Laughter.)
MR. FRIEDMAN: That is a bit of professorship creeping in, I suppose.


Throughout these notes angles are measured in radians. By Corollary 1.16 two non-zero vectors are orthogonal if and only if the angle between them is π/2. By the definition, the zero vector is orthogonal to every vector.

***Problem 1.18. Find x such that (3, x,−1) is orthogonal to (1,−1, 2)

The final geometric concept that must be translated into vector language is parallelism. Two vectors point in the same direction if the angle between them is 0, and they point in opposite directions if the angle between them is π. Vectors that point in either the same or opposite directions are said to be parallel.

Proposition 1.19. Two non-zero n-vectors, v and w, point in the same direction if and only if

v ·w = ‖v‖‖w‖

Proof. Let θ be the angle between v and w. By Proposition 1.13:

v ·w = ‖v‖‖w‖ cos θ

The angle θ = 0 if and only if cos θ = 1, if and only if v · w = ‖v‖‖w‖. □

***Problem 1.20. Show that two non-zero n-vectors, v and w, point in opposite directions if and only if

v ·w = −‖v‖‖w‖

***Problem 1.21. Show that two non-zero n-vectors, v and w, are parallel if and only if

(v ·w)2 = ‖v‖2‖w‖2

Another way to think about parallelism is to say that any two non-zero vectors are parallel if one is a scalar multiple of the other. The scalar will be positive if the vectors point in the same direction and negative if they point in opposite directions.

Proposition 1.22. Two non-zero n-vectors, v and w, are parallel if and only if there exists a non-zero constant a such that av = w.

The following proof demonstrates that the angle between two vectors is 0 or π if and only if the vectors are scalar multiples of each other. Showing that scalar multiples differ by an angle of 0 or π is easy, but going the other way is surprisingly complicated.



Proof. Suppose v = aw. By Problem 1.21 we must prove that (v · w)² = ‖v‖²‖w‖² = (v · v)(w · w). Using our hypothesis:

(v · w)² = (aw · w)²
= (a(w · w))²
= a²(w · w)²
= (aw · aw)(w · w)
= (v · v)(w · w)

Conversely, by Problem 1.21, if v and w are parallel, then (v · w)² = (v · v)(w · w). We have to prove that there is a scalar a such that v = aw. We begin with some preliminary calculations:

‖(‖v‖w − ‖w‖v)‖² = (‖v‖w − ‖w‖v) · (‖v‖w − ‖w‖v)
= ‖v‖²(w · w) − 2‖v‖‖w‖(v · w) + ‖w‖²(v · v)
= 2‖v‖²‖w‖² − 2‖v‖‖w‖(v · w)

‖(‖v‖w + ‖w‖v)‖² = (‖v‖w + ‖w‖v) · (‖v‖w + ‖w‖v)
= ‖v‖²(w · w) + 2‖v‖‖w‖(v · w) + ‖w‖²(v · v)
= 2‖v‖²‖w‖² + 2‖v‖‖w‖(v · w)

Now we show that the product of the two quantities just calculated is zero.

‖(‖v‖w − ‖w‖v)‖² ‖(‖v‖w + ‖w‖v)‖²
= [2‖v‖²‖w‖² − 2‖v‖‖w‖(v · w)][2‖v‖²‖w‖² + 2‖v‖‖w‖(v · w)]
= 4[‖v‖⁴‖w‖⁴ − ‖v‖²‖w‖²(v · w)²]
= 4‖v‖²‖w‖²[‖v‖²‖w‖² − (v · w)²]
= 0

since ‖v‖²‖w‖² − (v · w)² = 0 when v and w are parallel. Therefore either ‖(‖v‖w − ‖w‖v)‖ = 0 or ‖(‖v‖w + ‖w‖v)‖ = 0, so either ‖v‖w − ‖w‖v = 0 or ‖v‖w + ‖w‖v = 0. That is, v = ±(‖v‖/‖w‖)w, so v is a scalar multiple of w. □

***Problem 1.23. Find x such that (1, x, 2) is parallel to (−1, 3,−2).

The next construction combines parallelism and orthogonality.

Proposition 1.24. Fix a non-zero vector w. Every vector v can be decomposed into a sum v = w1 + w2, where w1 is parallel to w and w2 is orthogonal to w.

v = w1 + w2 with w1 parallel to w and w2 orthogonal to w.


The vector w1 is the projection of v onto w.

Proof. Let

w1 = ((v · w)/(w · w)) w

Then w1 is a scalar multiple of w and so is parallel to w. Let w2 = v − w1, the only possibility. It remains to show that w2 is orthogonal to w, or w2 · w = 0:

w2 · w = (v − w1) · w
= (v − ((v · w)/(w · w))w) · w
= v · w − ((v · w)/(w · w))(w · w)
= 0 □
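Here is a minimal Octave sketch of this decomposition; the vectors v and w below are arbitrary choices for illustration:

w = [2,1,2]
v = [1,3,1]
w1 = (dot(v,w)/dot(w,w))*w   % the projection of v onto w; parallel to w
w2 = v - w1                  % the remaining part of v
check = dot(w2,w)            % should print 0: w2 is orthogonal to w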

***Problem 1.25. Write (3, 0, −1) as the sum of a vector parallel to (1, 1, 1) and a vector orthogonal to (1, 1, 1). What is the projection of (3, 0, −1) onto (1, 1, 1)?

In Euclidean geometry it is pictorially obvious but not particularly easy to prove that two sides of a triangle are together longer than the third side. The proof using vectors is not difficult, and it starts with a technical result so important that it has a name of its own.

Proposition 1.26 (Cauchy-Schwarz inequality). Let v, w be n-vectors. Then

|v ·w| ≤ ‖v‖‖w‖

Proof. Let θ be the angle between v and w. Recall |cos θ| ≤ 1. Thus by Proposition 1.13:

|v · w| = ‖v‖‖w‖ |cos θ| ≤ ‖v‖‖w‖ □

Theorem 1.27 (Triangle Inequality). Let v, w be n-vectors. Then

‖v + w‖ ≤ ‖v‖+ ‖w‖

Sum of v and w


Proof. It suffices to show

‖v + w‖2 ≤ (‖v‖+ ‖w‖)2

We calculate:

‖v + w‖² = (v + w) · (v + w)
= v · v + 2v · w + w · w
≤ v · v + 2|v · w| + w · w
≤ ‖v‖² + 2‖v‖‖w‖ + ‖w‖²
= (‖v‖ + ‖w‖)² □

***Problem 1.28. Just to check that the triangle inequality really works in dimensions higher than two, let v = (1, 3, 2, −1) and w = (3, −1, 2, 1). Show that ‖v‖ + ‖w‖ ≥ ‖v + w‖.

***Problem 1.29. Two further results can be derived from the triangle inequality. Let v, w be n-vectors. Prove:
(1) ‖v − w‖ ≤ ‖v‖ + ‖w‖ (Hint: apply the triangle inequality to the vectors v and −w.)
(2) |‖v‖ − ‖w‖| ≤ ‖v − w‖ (Hint: apply the triangle inequality to the vectors w and v − w.)

Before leaving the subject of vectors, let's use the vector concepts just developed to do some geometry problems. Here is an example.

Problem: Let PQR be a triangle, and let A be the midpoint of PQ and B the midpoint of PR. Then AB is parallel to QR.

Solution: Construct vectors u = PQ, v = QR and w = RP, oriented as in the diagram. Let x = AB, the vector from one midpoint to another.

We must show that v is parallel to x. Note that u + v + w = 0. Also (1/2)u + x + (1/2)w = 0. Thus v = −u − w and x = −(1/2)u − (1/2)w. Therefore x = (1/2)v, and by Proposition 1.22 x is parallel to v.

***Problem 1.30. Let PQR be a triangle in R2. The barycenter is (P + Q + R)/3. The expression makes sense because P, Q and R are points in a vector space, so they are vectors. Prove that a line from any vertex through the barycenter bisects the opposite side of the triangle. (Hint: place P at the origin, let PQ be the vector u, and let PR = v. Show that the vector from the origin to the barycenter is x = (u + v)/3, and that the vector from the origin to the midpoint of the opposite side is y = (u + v)/2. Then prove x is parallel to y.)

***Problem 1.31. Let v, w, a and b be n-vectors. Suppose ‖v‖ = ‖a‖, ‖w‖ = ‖b‖, and the angle from v to w is equal to the angle from a to b. Show that ‖w − v‖ = ‖b − a‖. This is the side-angle-side theorem from geometry.

***Problem 1.32. Let PQRS be a parallelogram in R2. Show that the midpoint of PR is the same as the midpoint of QS. (Hint: Put P at the origin and let v = PQ and w = PS. Write all the other sides and crossing lines in terms of v and w.)

***Problem 1.33. (HARD) Let PQRS be a quadrilateral with equal length sides. Show that the diagonals PR and QS are orthogonal to each other. This result is valid in all dimensions. The points do not have to be in the same plane.


CHAPTER 2

Matrices and Matrix Operations

2.1. Matrix Operations

In the first chapter a vector was defined as a list of numbers. We need to extend that definition to a rectangular array of numbers. These arrays, and their arithmetic, turn out to be one of the cleverest symbolic, manipulative devices introduced into mathematics in the last 200 years. In An Introduction to Mathematics (1911), Alfred North Whitehead wrote:

By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental power of the race.

Matrices are a wonderful example of “a good notation”.

2.1.1. Matrices and matrix multiplication.

Definition 2.1. A matrix is a rectangular array of numbers. If A is a matrix, then the element in row i and column j is denoted A(i, j). We say A is an r × c matrix if it has r rows and c columns. Sometimes, for emphasis, we may denote an r × c matrix A as A_{r×c}.

Here is an example of a 3 × 4 matrix:

A = [1 2 3 4; 5 6 7 8; 9 10 11 12]

Then A(2, 3) = 7.

***Problem 2.2. Construct a 2× 4 matrix A with A(2, 3) = 5.

A matrix with one row or one column can be considered as a vector with extra matrix structure.

v = [1 2 3 4]      v = [1; 2; 3; 4]

As vectors these two are the same, but as matrices they are different. A matrix with one row is called a row vector; a matrix with one column is called a column vector.

A matrix with the same number of rows and columns is called a square matrix.

***Problem 2.3. Construct a square matrix of size 3 with all 0 entries.


Matrix computations were among the first mathematical operations programmed on computers. Matlab, the model for Octave, was created by Cleve Moler in the 1970's as a language and software system to facilitate computing with matrices. Since then matrices have lost none of their importance to science and engineering. To define a matrix in Octave like the one above, enter:

A = [1,2,3,4;5,6,7,8;9,10,11,12]

To extract the element in row 2, column 1 of A, enter:

A21 = A(2,1)

Notice that entries are separated by commas, and rows are separated by semicolons. The matrix is entered by rows.

So now you have a matrix. What can you do with it? First, you can do anything you can do with a vector.

(1) You can multiply a matrix by a scalar:

3 [1 2; 2 −1] = [3 6; 6 −3]

The general rule is (aA)(i, j) = a(A(i, j)).

(2) You can add two matrices of the same size:

[1 2; 2 −1] + [3 −1; 2 2] = [4 1; 4 1]

The general rule is (A + B)(i, j) = A(i, j) + B(i, j).

You can use Octave to multiply a scalar times a matrix or add two matrices. Suppose the matrix is called A. To compute 3A, enter:

B = 3*A

To add matrices A and B of the same size (which must already be entered into Octave):

C = A+B

To see what happens if the matrices are not the same size, try:

A = [1,2]
B = [3;4]
C = A+B

You might think you could multiply two matrices of the same size like you add them, by multiplying their elements, and you could. But the result would not be interesting or important. Instead there is another way of multiplying matrices that gives them their computational power and utility, although this is far from obvious in the definition. One of the purposes of this course is to teach you how to use matrix multiplication to solve problems in geometry.

Before defining matrix multiplication, it will help to define some operators that select parts of a matrix. If A is an r × c matrix, we have already defined A(i, j), the element in row i, column j of the matrix. We also define:

A(i, :) = row i of A, a 1× c matrix

A(:, j) = column j of A, an r × 1 matrix


For example:

A = [1 2 3 4; 5 6 7 8; 9 10 11 12], a 3 × 4 matrix
A(2, :) = [5 6 7 8], a 1 × 4 matrix
A(:, 3) = [3; 7; 11], a 3 × 1 matrix

Octave will also extract rows and columns from a matrix using the same notation. Try this:

A = [1,2,3,4;5,6,7,8;9,10,11,12]
row2 = A(2,:)
col3 = A(:,3)

Let's begin defining matrix multiplication with the simplest case. Multiplying a row matrix by a column matrix (in that order) is the same as taking the inner product of the two vectors, except the result is a 1 × 1 matrix instead of a scalar:

[1 3 2][2; 1; −1] = [1 · 2 + 3 · 1 + 2 · (−1)] = [3]

You can only perform this operation if the two matrices have the same number of entries. Considering the matrices as vectors, the product is the inner product of the two vectors. (Technical point: the matrix product is the 1 × 1 matrix [3], while the inner product is the scalar 3. Usually no distinction is made between 1 × 1 matrices and scalars.)

Now suppose you have matrices A and B where the number of columns of A equals the number of rows of B. The number of rows of A and columns of B do not matter: A has to be r × n and B has to be n × c. When the matrices fit together like this, we say that they can be multiplied. But how to multiply them? The result will be an r × c matrix C = AB:

AB = [A(1,1) · · · A(1,n); . . . ; A(r,1) · · · A(r,n)] [B(1,1) · · · B(1,c); . . . ; B(n,1) · · · B(n,c)] = [C(1,1) · · · C(1,c); . . . ; C(r,1) · · · C(r,c)] = C

The question is: how do you define the entries C(i, j) of the product matrix C? Here's how:

C(i, j) = A(i, :)B(:, j)

That is,

[C(1,1) · · · C(1,c); . . . ; C(r,1) · · · C(r,c)] = [A(1,:)B(:,1) · · · A(1,:)B(:,c); . . . ; A(r,:)B(:,1) · · · A(r,:)B(:,c)]

All this is hard to follow in general, but an example will make it clear. Let:

A = [1 2 3; 4 5 6]      B = [3 2; 1 6; 5 4]


Let C = AB. Then C(1, 2) is calculated as follows:

C(1, 2) = A(1, :)B(:, 2) = [1 2 3][2; 6; 4] = 26

Calculating the entire product, we have:

C = AB = [1 2 3; 4 5 6][3 2; 1 6; 5 4]
= [ [1 2 3][3; 1; 5]  [1 2 3][2; 6; 4] ; [4 5 6][3; 1; 5]  [4 5 6][2; 6; 4] ]
= [20 26; 47 62]

***Problem 2.4. Check the matrix multiplications:

(1) [3 2 4 1][2; 4; 1; −1] = 17

(2) [1 2 3; 4 5 6][−1 1; 0 2; 2 −1] = [5 2; 8 8]

(3) [1 2 3 4; 5 6 7 8; 9 10 11 12][0; 1; 0; 0] = [2; 6; 10]

The result is the second column of the first matrix.

(4) [1 0 0; 0 1 0; 0 0 1][a b c; d e f; g h k] = [a b c; d e f; g h k]

The left factor is called an identity matrix.

***Problem 2.5. Suppose A and B are matrices that can be multiplied. Show that the number of rows of AB is the number of rows of A, and the number of columns of AB is the number of columns of B.

Octave will multiply matrices. If you have defined matrices A and B, you can multiply them with an asterisk (*):

C = A*B

***Problem 2.6. Multiply by hand: [1 3 −1; 2 1 3][3 1; 2 −2; 1 1], then check your answer with Octave.


Proposition 2.7. Let A and B be matrices that can be multiplied. Then:

(AB)(:, j) = A(B(:, j))

(AB)(i, :) = A(i, :)B

That is, column j of AB is A times column j of B, and row i of AB is row i of A times B.

Proof. Suppose A is r × n and B is n × c. The definition of matrix multiplication can be expressed:

AB = [A(1,:); . . . ; A(r,:)][B(:,1) · · · B(:,c)] = [A(1,:)B(:,1) · · · A(1,:)B(:,c); . . . ; A(r,:)B(:,1) · · · A(r,:)B(:,c)]

Now we can show (AB)(:, j) = A(B(:, j)):

(AB)(:, j) = [(AB)(1, j); . . . ; (AB)(r, j)]
= [A(1, :)B(:, j); . . . ; A(r, :)B(:, j)]
= [A(1, :); . . . ; A(r, :)]B(:, j)
= A(B(:, j))

The proof of the second formula is almost the same. □

***Problem 2.8. Let A = [1 2 3; 4 5 6] and B = [−1 1; 0 2; 2 −1]. Show that (AB)(:, 1) = A(B(:, 1)) and (AB)(2, :) = A(2, :)B.

The most important fact about matrix multiplication is that, unlike ordinary number multiplication, matrix multiplication is not commutative. If A and B are matrices, then AB can be defined when BA is not defined. Suppose A is 2 × 3 and B is 3 × 4. Then AB is defined but BA is not even defined.

Even if AB and BA are both defined, they can be different.

***Problem 2.9. Let A = [1 2; 3 4] and B = [3 −1; 2 1]. Calculate AB and BA. Are they the same?


***Problem 2.10. Let v = [1 2 3 4] and w = [1; 2; 3; 4]. Calculate vw, the inner product of v and w. Also calculate wv. Your result should be a 4 × 4 matrix.

Matrix addition and multiplication satisfy many reasonable properties.

Proposition 2.11. Let A, B and C be matrices and a and b scalars.
(1) (a + b)A = aA + bA
(2) If A and B are the same size, then a(A + B) = aA + aB.
(3) If A, B and C are the same size, then (A + B) + C = A + (B + C).
(4) If A and B can be multiplied, then (aA)B = a(AB) = A(aB).
(5) If B and C are the same size, and if A and B can be multiplied (so A and C can also be multiplied), then A(B + C) = AB + AC.
(6) If A and B can be multiplied, and if B and C can be multiplied, then (AB)C = A(BC).

***Problem 2.12. Use Octave to construct matrices with at least three rows and columns, and show by example that all the results of the Proposition are true.

This proposition says that the results of certain matrix calculations are equal. The results would all be true if the matrices were single numbers, but given the complexity of matrix multiplication they need to be proven. The results are not obviously true. To prove them you need to check that the two sides of the equalities are the same size and that the elements in the two matrices are the same. We do this in the following proofs.

Proof. (1) The scalar multiple of a matrix is the same size as the matrix, so all the terms aA, bA, (a + b)A and aA + bA are the same size. To see that (a + b)A = aA + bA we check that corresponding elements in the two matrices are equal:

((a + b)A)(i, j) = (a + b)(A(i, j))
= a(A(i, j)) + b(A(i, j))
= (aA)(i, j) + (bA)(i, j)
= (aA + bA)(i, j)

(2) The matrices A, B, aA, aB, aA + aB and a(A + B) are all the same size. To see that a(A + B) = aA + aB, we check element by element:

(a(A + B))(i, j) = a((A + B)(i, j))
= a(A(i, j) + B(i, j))
= a(A(i, j)) + a(B(i, j))
= (aA)(i, j) + (aB)(i, j)
= (aA + aB)(i, j)

(3) Left for reader.


(4) If A is r × n and B is n × c then aA is r × n, aB is n × c, and AB, (aA)B and a(AB) are all r × c. Moreover:

((aA)B)(i, j) = ∑_{k=1}^{n} (aA)(i, k)B(k, j)
= ∑_{k=1}^{n} a(A(i, k))B(k, j)
= a ∑_{k=1}^{n} A(i, k)B(k, j)
= a((AB)(i, j))
= (a(AB))(i, j)

The proof that a(AB) = A(aB) is similar.

(5) Left for reader.

(6) If A is r × m and B is m × n and C is n × c, then AB is r × n and (AB)C is r × c. Similarly BC is m × c and A(BC) is r × c. All the products are properly defined, and the final products are the same size. To show equality:

((AB)C)(i, j) = ∑_{k=1}^{n} (AB)(i, k)C(k, j)
= ∑_{k=1}^{n} (∑_{h=1}^{m} A(i, h)B(h, k))C(k, j)
= ∑_{h=1}^{m} ∑_{k=1}^{n} A(i, h)B(h, k)C(k, j)
= ∑_{h=1}^{m} A(i, h)(∑_{k=1}^{n} B(h, k)C(k, j))
= ∑_{h=1}^{m} A(i, h)((BC)(h, j))
= (A(BC))(i, j) □

***Problem 2.13. Suppose A, B and C are matrices.
(1) Prove: if A, B and C are the same size, then (A + B) + C = A + (B + C).
(2) Prove: if B and C are the same size, and if A and B can be multiplied (so A and C can also be multiplied), then A(B + C) = AB + AC.

2.1.2. Linear combinations.

***Problem 2.14. Show that

[1 3 −2][1 4 5; 2 −1 0; 3 −2 1] = [1 4 5] + 3[2 −1 0] − 2[3 −2 1]


Proposition 2.15. Let A be an r × c matrix, and let w be a row vector of size c. Then w is a linear combination of the rows of A if and only if there exists a row vector v of size r such that vA = w.

Proof. The row vector w is a linear combination of the rows of A if and only if there exist scalars ai such that:

w = a1 A(1, :) + · · · + ar A(r, :) = [a1 · · · ar]A

Take v = [a1 · · · ar]. □
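You can see the proposition in action in Octave; this minimal sketch reuses the matrix and coefficients from Problem 2.14:

A = [1,4,5;2,-1,0;3,-2,1]
v = [1,3,-2]
lhs = v*A                             % the matrix product vA
rhs = 1*A(1,:) + 3*A(2,:) - 2*A(3,:)  % the same linear combination of the rows
% lhs and rhs should be equal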

***Problem 2.16. Let A be an r × c matrix, and let w be a column vector of size r. Then w is a linear combination of the columns of A if and only if there exists a column vector v of size c such that Av = w.

2.1.3. Linear functions. This section will explain the reason for the strange definition of matrix multiplication. First of all, a matrix can be thought of as a function. Let A = [3 1; 2 −4; 1 0]. A is a 3 × 2 matrix, and if v is a column vector from R2, then Av is a column vector in R3. For example:

[3 1; 2 −4; 1 0][2; −1] = [5; 8; 2]

If you define a function:

f(v) = Av

then the domain of f is R2 and the range (codomain) of f is R3. You can diagram this situation as follows:

f : R2 → R3

Note that f has two key properties. If v and w are vectors in R2, and if a is a scalar, then:

f(v + w) = A(v + w) = Av + Aw = f(v) + f(w)
f(av) = A(av) = a(Av) = af(v)

Everything in the preceding paragraphs would work just as well for any r × c matrix A. We could define a function f(v) = Av with domain Rc and range Rr. Moreover, any such function would satisfy the following definition.

Definition 2.17. A function f : Rc → Rr is linear if for any vectors v and w in Rc and any scalar a:

f(v + w) = f(v) + f(w)

f(av) = af(v)

You have seen that multiplication by a matrix is a linear function. The converse is also true. If a function is linear, then it is multiplication by a matrix. Furthermore you can construct the matrix from the function.

Proposition 2.18. Let f : Rc → Rr be a linear function. Then you can construct a matrix A such that f(v) = Av for all vectors v in Rc. Let e1, . . . , ec be the elementary basis vectors in Rc. Then A will be the r × c matrix with column i equal to f(ei).


Proof. Note that A has c columns, and each column f(ei) is a vector in Rr. Thus A is an r × c matrix. Recall that Aei = A(:, i), column i of A. Therefore f(ei) = column i of A = Aei. We must show that f(v) = Av for all vectors v in Rc.

Since v is a vector in Rc, we can write v as a linear combination of the elementary basis vectors ei:

v = a1 e1 + · · · + ac ec

Then

Av = A(a1 e1 + · · · + ac ec)
= a1 Ae1 + · · · + ac Aec
= a1 f(e1) + · · · + ac f(ec)
= f(a1 e1 + · · · + ac ec)   because f is linear
= f(v)
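Here is a minimal Octave sketch of this construction. The linear function f below is an arbitrary example (it is not the f of the problem that follows), written as an anonymous function:

f = @(v) [v(1) + 2*v(2); 3*v(1); -v(2)];  % an arbitrary linear function from R2 to R3
c = 2; r = 3;
e = eye(c);              % the columns of eye(c) are the elementary basis vectors
A = zeros(r,c);
for i = 1:c
  A(:,i) = f(e(:,i));    % column i of A is f(e_i)
end
v = [2; -1];
[f(v), A*v]              % the two columns should agree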

***Problem 2.19. Suppose f(a, b) = (3a − 2b, 2a + b).
(1) Show that f is a linear function.
    (a) Show that f((a1, b1) + (a2, b2)) = f(a1, b1) + f(a2, b2)
    (b) Show that f(c(a, b)) = cf(a, b)
(2) Find the matrix A representing f.
(3) Show that A[a; b] = f(a, b)

Now we get to the explanation of matrix multiplication. Let A and B be matrices of sizes r × n and n × c. Then AB is an r × c matrix.

• If f(w) = Aw then f is a linear function with domain Rn and range Rr: f : Rn → Rr
• If g(v) = Bv then g is a linear function with domain Rc and range Rn: g : Rc → Rn
• If h(v) = (AB)v then h is a linear function with domain Rc and range Rr: h : Rc → Rr

Since matrix multiplication is associative,

h(v) = (AB)v = A(Bv) = f(g(v))

Therefore h = f ◦ g. The composition of the linear functions defined by A and B is the linear function defined by AB. Matrix multiplication corresponds to composition of linear functions.
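A quick numerical illustration in Octave (the matrices and vector are arbitrary choices):

A = [1,2,3;4,5,6];    % 2 x 3, so f maps R3 to R2
B = [3,2;1,6;5,4];    % 3 x 2, so g maps R2 to R3
v = [1;2];
h1 = (A*B)*v          % the function defined by the product AB
h2 = A*(B*v)          % apply g first, then f
% h1 and h2 should be equal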

***Problem 2.20. Continuing Problem 2.19, show that f(f(a, b)) = A²[a; b].

2.1.4. Special matrices.

2.1.4.1. Zero matrix. If all the entries in a matrix are 0, then the matrix is called a zero matrix and denoted 0. For any matrix A and a zero matrix 0 of the same size, A + 0 = A and A − A = 0. If 0 is a zero matrix, and if 0 can multiply A, then 0A is a zero matrix. Similarly if B can multiply 0 then B0 is a zero matrix.


2.1.4.2. Identity matrix. The diagonal of a matrix A is the elements A(i, i). In the matrix below, the diagonal consists of the entries 1, 2, 0, 3:

[1 2 −1 0; 3 2 1 −1; 2 1 0 −1; −4 2 1 3]

The identity matrix has 1's on the diagonal and 0's everywhere else:

I = [1 0 0 · · · 0; 0 1 0 · · · 0; 0 0 1 · · · 0; . . . ; 0 0 0 · · · 1]

If you need to emphasize the size of the identity matrix, you can write In for the n × n identity matrix. The identity matrix satisfies:

I(i, j) = 1 if i = j, and I(i, j) = 0 if i ≠ j

Proposition 2.21. If A is any matrix such that I and A can be multiplied, then IA = A. Similarly, if A and I can be multiplied, then AI = A.

Proof. We will show that IA = A by showing that (IA)(i, j) = A(i, j) for all i, j. If the elements in two matrices are all the same, then the matrices are the same. Let I be n × n and A be n × c. Then

(IA)(i, j) = ∑_{k=1}^{n} I(i, k)A(k, j)
= I(i, 1)A(1, j) + · · · + I(i, i)A(i, j) + · · · + I(i, n)A(n, j)
= 0 · A(1, j) + · · · + 1 · A(i, j) + · · · + 0 · A(n, j)
= A(i, j)

A similar proof shows that AI = A. □

Corollary 2.22. Consider the matrix:

aI = [a 0 · · · 0; 0 a · · · 0; . . . ; 0 0 · · · a]

For any matrix B of compatible size, (aI)B = a(IB) = aB and B(aI) = a(BI) = aB.

***Problem 2.23. Write down I4. If A = [1 0 −1 0 1; 2 1 3 −1 0; 3 −1 0 1 2; 4 0 2 1 −1], check that I4 A = A.


To enter a 5 × 5 identity matrix in Octave, use the formula eye(5). Isn't that a clever pun?

***Problem 2.24. Use Octave to solve Problem 2.23.

2.1.4.3. Diagonal matrix. A diagonal matrix is similar to an identity matrix in that all off-diagonal elements are 0, but the elements on the diagonal can be different, for example:

[2 0 0 0; 0 3 0 0; 0 0 −1 0; 0 0 0 2]

The easy way to enter a diagonal matrix like the one above into Octave is to enter diag([2,3,-1,2]). Notice that the argument of the command diag is not just a list of numbers; it is a row vector. The argument to diag must be a row or column vector.

***Problem 2.25. Using Octave, create a 4 × 4 diagonal matrix D and any 4 × 4 matrix A. How do the rows and columns of DA compare to the rows and columns of A? How do the rows and columns of AD compare to the rows and columns of A? Formulate a theorem about how multiplication by a diagonal matrix changes a matrix. EXTRA CREDIT: prove your theorem.

2.1.4.4. Triangular matrices. A matrix is upper triangular if all of its non-zero entries are on the diagonal or above. That is, T is upper triangular if i > j implies T(i, j) = 0. Consider the example:

T = [2 3 1 4; 0 1 −1 2; 0 0 3 1; 0 0 0 2]

Notice that T(3, 2) = T(4, 1) = 0. Whenever i > j we have T(i, j) = 0.

Upper triangular matrices do not have to be square. The following two examples are both upper triangular:

[1 2 3 4; 0 5 6 7; 0 0 8 9]      [1 2 3; 0 4 5; 0 0 6; 0 0 0]

Proposition 2.26. The product of two upper triangular matrices is upper triangular.

Proof. Let U1 and U2 be upper triangular; U1 is r × n and U2 is n × c. We must show (U1 U2)(i, j) = 0 if i > j. Suppose i > j. For any number k either i > k or k > j. The alternative, i ≤ k and k ≤ j, is impossible, because if these conditions held then we would have i ≤ j, and we are assuming i > j. Since:

(U1 U2)(i, j) = ∑_{k=1}^{n} U1(i, k)U2(k, j)

and since at least one of the two factors in every summand is 0 (either i > k and U1(i, k) = 0 or k > j and U2(k, j) = 0), every summand is 0 and the sum (U1 U2)(i, j) is zero. □
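A quick Octave spot-check of the proposition (the two matrices are arbitrary upper triangular choices):

U1 = [2,3,1,4; 0,1,-1,2; 0,0,3,1; 0,0,0,2];
U2 = [1,2,3,4; 0,5,6,7; 0,0,8,9; 0,0,0,1];
P = U1*U2   % every entry below the diagonal should be 0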

***Problem 2.27. Define "lower triangular" for matrices and demonstrate with an example that the product of lower triangular matrices is lower triangular. EXTRA CREDIT: prove that the product of two lower triangular matrices is lower triangular.

2.1.5. Transposition and symmetric matrices. If A is a matrix, then flipping A over its diagonal gives you the transpose of A. The rows of A become columns of A^T, and the columns of A become the rows of A^T. Here are some examples:

[1 2; 3 4]^T = [1 3; 2 4]      [1 2 3; 4 5 6]^T = [1 4; 2 5; 3 6]

And here is the formal definition:

Definition 2.28. Let A be an r × c matrix. Then A^T is a c × r matrix with A^T(i, j) = A(j, i).

***Problem 2.29. Find [1 3; 2 2; 3 1]^T.

Proposition 2.30.
(1) Let A be a matrix and a a scalar. Then (aA)^T = aA^T.
(2) Let A and B be matrices of the same size. Then (A + B)^T = A^T + B^T.
(3) Let A and B be matrices that can be multiplied. Then (AB)^T = B^T A^T.

Proof. To show that two matrices are equal, we will show that they are the same in their i, j position. We leave it to the reader to check that they are the same size.

(1) (aA)^T(i, j) = (aA)(j, i) = a(A(j, i)) = a(A^T(i, j)) = (aA^T)(i, j)
(2) (A + B)^T(i, j) = (A + B)(j, i) = A(j, i) + B(j, i) = A^T(i, j) + B^T(i, j) = (A^T + B^T)(i, j)
(3) (AB)^T(i, j) = (AB)(j, i) = A(j, :)B(:, i) = B^T(i, :)A^T(:, j) = (B^T A^T)(i, j)

***Problem 2.31. Suppose A is r × n and B is n × c. Find the sizes of A^T, B^T, AB and (AB)^T. Show that B^T and A^T can be multiplied, and find the size of B^T A^T. Show that (AB)^T and B^T A^T have the same size.

The transpose of a matrix A in Octave is denoted A’.


***Problem 2.32. Let A = [1 2 1; 2 3 2; 3 1 0] and B = [−1 0 2; 3 1 0; 1 2 −1]. Use Octave to show:

A^T + B^T = (A + B)^T
B^T A^T = (AB)^T

Definition 2.33. A matrix A is symmetric if A = A^T.

***Problem 2.34. Give an example of a 2× 2 symmetric matrix.

***Problem 2.35. Show that a symmetric matrix must be square.

The sum and scalar product of symmetric matrices must be symmetric, but the product need not be.

Proposition 2.36. Let A and B be symmetric matrices of the same size. Then A + B is symmetric.

Proof. Using the fact that A^T = A and B^T = B, Proposition 2.30 implies:

(A + B)^T = A^T + B^T = A + B

Therefore A + B is symmetric. □

***Problem 2.37.
(1) Prove if A is a symmetric matrix and a is a scalar then aA is symmetric.
(2) Find two 2 × 2 symmetric matrices A and B such that AB is not symmetric.
(3) If A and B are symmetric matrices of the same size, prove that (AB)^T = BA.

***Problem 2.38. A matrix B is anti-symmetric if B^T = −B. Let A be an n × n matrix. Prove:
(1) A + A^T is symmetric.
(2) A − A^T is anti-symmetric.
(3) A = (1/2)(A + A^T) + (1/2)(A − A^T), so every matrix is the sum of a symmetric and an anti-symmetric matrix.
If you cannot find a general proof, show the statements are true with a 3 × 3 example.

***Problem 2.39. Let v = [x; y; z]. Show that v^T v = ‖v‖², and show that vv^T is a symmetric 3 × 3 matrix.

Next we will use a vector v to construct a matrix P with three useful properties: P = P^T, Pv = 0 and P² = P. Suppose v is a column vector of size n. By the previous problem, v^T v is a scalar and vv^T is an n × n matrix. Let

P = In − (1/(v^T v))(vv^T)


Then P^T = P, Pv = 0 and P² = P. The previous problem shows that P is symmetric, and the proofs of the last two properties are straightforward calculations:

(2.1.1)

Pv = (In − (1/(v^T v))(vv^T))v
= In v − (1/(v^T v))(vv^T)v
= v − (1/(v^T v))v(v^T v)
= v − ((v^T v)/(v^T v))v   because v^T v is a scalar
= 0

P² = (In − (1/(v^T v))(vv^T))(In − (1/(v^T v))(vv^T))
= In² − (2/(v^T v))(vv^T) + (1/(v^T v))²(vv^T)(vv^T)
= In − (2/(v^T v))(vv^T) + (1/(v^T v))²v(v^T v)v^T
= In − (2/(v^T v))(vv^T) + (1/(v^T v))(vv^T)
= In − (1/(v^T v))(vv^T)
= P
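These properties are easy to confirm numerically; here is a minimal Octave sketch with an arbitrary non-zero v:

v = [1;2;2]                  % any non-zero column vector will do
P = eye(3) - (v*v')/(v'*v);  % the matrix P defined above
P - P'                       % should be the zero matrix: P is symmetric
P*v                          % should be the zero vector
P*P - P                      % should be the zero matrix: P^2 = P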

***Problem 2.40. Let v be a column vector of size n, and let R = In − (2/(v^T v))(vv^T).
(1) If v = [3; 1; −1], use Octave to calculate R and show that R is symmetric, Rv = −v and R² = I. The identity matrix has many square roots.
(2) EXTRA CREDIT: prove in general that R is symmetric, Rv = −v and R² = I.

2.2. Projections, Reflections and Rotations

2.2.1. Matrices as maps. This section will study the geometric effect of multiplying vectors in R3 by a 3 × 3 matrix. If A is a matrix and v is a vector, let w = Av. We say that A maps v to w.

***Problem 2.41. Let A = [1 3 2; 4 1 −1; 2 0 1].

(1) If v = [2; 1; −1], to what vector does A map v?
(2) What vector is mapped to w = [2; −1; 1]?
(3) Show that the vectors v1 = [1; 2; 1], v2 = [3; −2; 5] and v3 = [2; 0; 3] lie on a straight line perhaps not through the origin (show that v2 − v1 is parallel to v3 − v2), and show that they are mapped to vectors on a straight line.
(4) Show that v1, v2 and v3 lie on a plane through the origin. (The vectors lie on a line in space, and any line in space lies on a plane through the origin.) Find the equation of the plane. Show that the vectors are mapped to vectors lying on a plane through the origin. Find the equation of the plane containing the mapped vectors.


The previous problem was supposed to convince you that matrix multiplication maps linear structures to linear structures. You can (and should) think of matrix multiplication as a function. If A is a 3 × 3 matrix, then f(v) = Av is a function with domain R3 and range R3.

2.2.2. Projections. Let V be a plane through the origin in R3, and let f(x) be the function that maps x orthogonally to the plane. You can find a matrix that does this job, and there is even a simple formula for the matrix. You have already seen the formula in (2.1.1).

Proposition 2.42. Let w be a non-zero column vector in R3, and let V be the plane through the origin orthogonal to w. Let:

P = I − (1/(w^T w))ww^T

Then for any vector x in R3:
(1) Px is in V.
(2) x − Px is orthogonal to V.

Proof. We have already seen in (2.1.1) that PT = P , Pw = 0 and P 2 = P .We need to give a geometric interpretation to these algebraic facts.

A column vector v is in V if and only if w^T v = 0. To see that Px is in V for any vector x, calculate:

w^T Px = w^T P^T x = (Pw)^T x = 0^T x = 0

A vector is orthogonal to V if and only if it is parallel to w. We will show that x − Px is parallel to w for any vector x:

x − Px = x − (I − (1/(w^T w)) ww^T) x = x − x + (1/(w^T w)) ww^T x = ((w^T x)/(w^T w)) w

so x − Px is a scalar multiple of w. □

***Problem 2.43. Consider the plane V in R^3 defined by the equation 3x − 2y − z = 0.

(1) Find a vector w orthogonal to V.
(2) Find the matrix P which is the orthogonal projection onto V.
(3) Check that P = P^T, Pw = 0 and P^2 = P.
(4) Let x be a random vector in R^3 (just pick a vector at random). Show that Px lies in V and Px − x is parallel to w.

2.2.3. Reflections. A reflection R through a plane in R^3 maps every point in space to its mirror image on the other side of the plane. Let P be the orthogonal projection onto the plane constructed in the previous section. If w is a vector orthogonal to the plane, then the reflection R can be constructed as follows:

Rx = x + 2(Px− x) = 2Px− x = (2P − I)x

R = 2P − I = 2(I − (1/(w^T w)) ww^T) − I = I − (2/(w^T w)) ww^T

The matrix R is called a Householder matrix, named after the American mathematician Alston Householder (1904-1993) who discovered it in 1958 (information from Wikipedia). Householder was one of the pioneers in numerical analysis and computer implementations of linear algebra.
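As a quick numerical illustration, the reflection can be built directly from the formula; a sketch in Octave (the normal vector w is an arbitrary choice, not the one in the problems below):

w = [1; 2; 2];                     % normal vector to the mirror plane
R = eye(3) - (2/(w'*w))*(w*w');    % Householder reflection through the plane
disp(norm(R - R'))                 % 0: R is symmetric
disp(R*w + w)                      % zero vector: Rw = -w
disp(norm(R*R - eye(3)))           % 0: R^2 = I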


***Problem 2.44. Consider the plane V in R^3 defined by the equation 3x − 2y − z = 0.

(1) Find a vector w orthogonal to V.
(2) Find the matrix R which is the reflection through V.
(3) Check that R = R^T, Rw = −w and R^2 = I.
(4) Explain geometrically why a reflection should satisfy Rw = −w and R^2 = I.
(5) Let x be a random vector in R^3. Show that x + Rx lies in V.

2.2.4. Rotations.

2.2.4.1. Rotations in R^2. Suppose you want to rotate all the points in the plane R^2 counter-clockwise by the angle θ. The vector (1, 0)^T gets mapped to (cos θ, sin θ)^T and the vector (0, 1)^T gets mapped to (−sin θ, cos θ)^T.

The matrix that effects the rotation is:

R(θ) =
[ cos θ  −sin θ ]
[ sin θ   cos θ ]

You can see that R(θ) has the right action on the elementary basis vectors, so it has the right action on all vectors.
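In Octave the rotation matrix is a one-line anonymous function; a sketch (the angle 0.3 and the vector are arbitrary) checking that rotation preserves length:

rot = @(t) [cos(t), -sin(t); sin(t), cos(t)];   % R(theta) as defined above
v = [2; 1];
w = rot(0.3)*v;                    % rotate v counter-clockwise by 0.3 radians
disp(norm(w) - norm(v))            % 0: rotations preserve length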

***Problem 2.45. (1) Let R = R(0.75) be a rotation matrix, and let v be a random vector in R^2. Show that the angle between v and Rv is 0.75 and that ‖Rv‖ = ‖v‖. Sketch v and Rv.

(2) Let R = R(4.2) be a rotation matrix. Use R to rotate the points A = (1, 0)^T, B = (2, 1)^T and C = (1, 2)^T. Show that the triangle ABC is congruent to the triangle (RA)(RB)(RC). Sketch the two triangles.

2.2.4.2. Cross product. Next we define the cross product of two vectors in R^3. This operation is defined only in R^3 and in no other dimension.


Definition 2.46. If v and w are vectors in R^3, then their cross product is:

v × w =
[ v(2)w(3) − v(3)w(2) ]
[ v(3)w(1) − v(1)w(3) ]
[ v(1)w(2) − v(2)w(1) ]
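Octave has a built-in cross function, so the definition is easy to check numerically; a sketch with arbitrarily chosen vectors:

v = [1; 0; 2];
w = [0; 1; 1];
u = cross(v, w)                    % the cross product defined above
disp([v'*u, w'*u])                 % both 0: u is orthogonal to v and w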

***Problem 2.47. Let v = (3, 1, 2)^T and w = (2, −1, 1)^T. Find u = v × w.

The next problem interprets the cross product as matrix multiplication. It will be very useful in proofs.

***Problem 2.48. Let v and w be vectors in R^3. Define:

v× =
[  0     −v(3)   v(2) ]
[  v(3)    0    −v(1) ]
[ −v(2)   v(1)    0   ]

Numerical part. Let v = (1, 2, 1)^T and w = (3, 1, 2)^T.

(1) Find v×.
(2) Show that v×w = v × w.

Theoretical part: show

(1) v× is anti-symmetric: (v×)^T = −v×.
(2) v×w + w×v = 0.
(3) v×v = 0.
(4) v×w× = wv^T − (v^T w) I.
(5) (w × v)× = w×v× − v×w×.
(6) For any column vector w, v×w = v × w.

(Hint: you can simplify the theoretical calculations if you let v = (a, b, c)^T and w = (d, e, f)^T. Then

v× =
[  0  −c   b ]
[  c   0  −a ]
[ −b   a   0 ]

etc. You can simplify the calculations even more if you do them on a symbolic calculator like Mathematica or Maple or the open-source packages Maxima or Sage (best for Linux).)
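Here is a minimal Octave sketch of the v× construction (the helper name skew is my own, not from the text) together with a numerical check of part (6):

skew = @(v) [0, -v(3), v(2); v(3), 0, -v(1); -v(2), v(1), 0];
v = [1; 0; 2];  w = [0; 1; 1];     % arbitrary test vectors
disp(skew(v)*w - cross(v, w))      % zero vector: v-cross times w equals v x w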

Corollary 2.49. If v is a column vector in R^3 and v^T v = 1 then (v×)^3 = −v×.

Proof.

(v×)^3 = v× (vv^T − (v^T v) I)
       = (v× v) v^T − (1) v×
       = −v×   □

The next Proposition gives the important properties of the cross product.


Proposition 2.50. For vectors u, v and w in R^3:
(1) For any scalar a, (av) × w = a(v × w) = v × (aw).
(2) u × (v + w) = u × v + u × w.
(3) w × v = −(v × w).
(4) v × w is orthogonal to both v and w.
(5) u × (v × w) = (u^T w)v − (u^T v)w (Lagrange's formula).
(6) u × (v × w) + v × (w × u) + w × (u × v) = 0.
(7) ‖v × w‖^2 = ‖v‖^2 ‖w‖^2 − (v · w)^2.
(8) v × w = 0 if and only if v is parallel to w. In particular, v × v = 0.
(9) If θ is the positive angle from v to w, then

‖v × w‖ = ‖v‖ ‖w‖ sin(θ)

Proof. These proofs use the results of Problem 2.48.

(1) a(v × w) = a(v× w) = ((av)×) w = (av) × w, and likewise a(v× w) = v× (aw) = v × (aw).

(2) u × (v + w) = u× (v + w) = u× v + u× w = u × v + u × w

(3) w × v = w× v = −v× w = −(v × w)

(4) We will show that v^T (v × w) = w^T (v × w) = 0.

v^T (v × w) = v^T (v× w)
            = (v^T v×) w
            = ((v×)^T v)^T w
            = (−v× v)^T w
            = 0^T w
            = 0

Then w^T (v × w) = −w^T (w × v) = 0.

(5)
u × (v × w) = u× v× w
            = (v u^T − (u^T v) I) w
            = (u^T w)v − (u^T v)w

(6)
u × (v × w) + v × (w × u) + w × (u × v)
  = u × (v × w) − v × (u × w) − (u × v) × w
  = (u× v× − v× u× − (u× v)×) w
  = 0


(7)
‖v × w‖^2 = (v× w)^T (v× w)
          = w^T (v×)^T v× w
          = −w^T (v×)^2 w
          = −w^T (vv^T − (v^T v) I) w
          = −(v^T w)^2 + (v^T v)(w^T w)
          = ‖v‖^2 ‖w‖^2 − (v^T w)^2
          = ‖v‖^2 ‖w‖^2 − (v · w)^2

(8) v × w = 0 if and only if ‖v × w‖2 = 0 if and only if v is parallel to w(previous result and Problem 1.21).

(9)
‖v × w‖^2 = ‖v‖^2 ‖w‖^2 − (v · w)^2
          = ‖v‖^2 ‖w‖^2 (1 − (cos θ)^2)
          = ‖v‖^2 ‖w‖^2 (sin θ)^2   □

***Problem 2.51. Continue Problem 2.47. Show that u is orthogonal to both v and w. Find the angle θ between v and w. Show that ‖u‖ = ‖v‖ ‖w‖ sin(θ).

2.2.4.3. Rotations in R^3. Throughout this section, you will work with a fixed vector v of length one in R^3: ‖v‖ = 1, or equivalently ‖v‖^2 = v · v = v^T v = 1. Given any non-zero vector w, you can always find another vector v of length one pointing in the same direction as w: v = (1/‖w‖) w.

You will also fix a number θ, which you should think of as an angle. The goal of this section is to find a matrix R that rotates R^3 through the angle θ about the line through v. If you look along the line in the direction v, the rotation will be counter-clockwise.

Pick a vector w in R^3. What is Rw? Start with some constructions:

w2 = v × w = v× w
w1 = −v × w2 = −(v×)^2 w
w3 = w − w1 = (I + (v×)^2) w

By Proposition 2.50, v, w1 and w2 are mutually orthogonal: each vector is orthogonal to the other two. The vectors form a right-handed system; the vectors w1, w2 and v are arrayed in space the same way as the positive x-, y- and z-axes. If you moved w1 to point along the positive x-axis and moved w2 to point along the positive y-axis, then v would point along the positive z-axis. Moreover,

‖w1‖ = ‖v‖ ‖w2‖ sin(π/2) = ‖w2‖


The last preliminary fact needed is that w3 is parallel to v. It suffices to show that v × w3 = 0:

v × w3 = v× (I + (v×)^2) w
       = (v× + (v×)^3) w
       = 0   by Corollary 2.49

Now the two-dimensional rotation formula will complete the three-dimensional rotation formula:

w = w1 + 0 w2 + w3
Rw = (cos θ) w1 + (sin θ) w2 + w3
   = (cos θ)(−(v×)^2 w) + (sin θ) v× w + (I + (v×)^2) w
   = (I + (sin θ) v× + (1 − cos θ)(v×)^2) w

Thus the matrix that rotates space R^3 by an angle θ about the line through a vector v of length one is:

R = I + (sin θ) v× + (1 − cos θ)(v×)^2
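This is the classical Rodrigues rotation formula, and it is easy to test in Octave; a sketch (axis and angle chosen arbitrarily, with skew as in the earlier sketch):

skew = @(v) [0, -v(3), v(2); v(3), 0, -v(1); -v(2), v(1), 0];
v = [1; 0; 0];                     % unit vector along the x-axis
t = pi/2;                          % rotate 90 degrees
R = eye(3) + sin(t)*skew(v) + (1 - cos(t))*skew(v)^2;
disp(R*[0; 1; 0])                  % (0, 0, 1): the y-axis maps to the z-axis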

***Problem 2.52. Find the matrix R that rotates space R^3 through an angle of π/4 radians counter-clockwise about the vector (1, 1, 1)^T. Remember to replace w by a vector of length 1. Let n be the number of times R must be repeated to rotate space 2π radians; that is, R^n should be the identity. What is n? Check that R^n = I.

***Problem 2.53. Suppose v points along the z-axis: v = (0, 0, 1)^T. Find the rotation matrix R that rotates space 90° counter-clockwise about the z-axis. Multiply a couple of vectors by R to check that the rotation works as expected.


CHAPTER 3

Systems of Linear Equations

3.1. Two Key Problems

Much of the algebraic side of linear algebra is devoted to exploring two problems related to systems of linear equations. The next two chapters outline their solution. Solutions to the problems will be posed in this chapter but not completely proved until the next. It turns out that both algebraic and geometric reasoning are required to fully understand systems of linear equations: this chapter introduces the algebraic ideas, and the next chapter finishes with the geometric part.

3.1.1. Key problem I: solving homogeneous systems of equations. A system of linear equations with constant terms all equal to zero is called a homogeneous system of linear equations. Here is an example:

(3.1.1)
2u −  v + 3x − 2y + 4z = 0
 u + 2v − 2x +  y −  z = 0
3u − 2v +  x −  y + 3z = 0

Homogeneous systems always have a zero solution: u = v = x = y = z = 0. A harder question is: are there non-zero solutions, and if so what are they?

Solutions (u, v, x, y, z) to a linear system of equations can be thought of as vectors. The first problem of linear algebra is to find an algorithm that
(1) determines if non-zero solutions exist; and if they do,
(2) finds all the solutions.

***Problem 3.1. Show that v1 = (1, 0, 4, 7, 0) and v2 = (−5, 0, −6, 0, 7) are solutions to the homogeneous system above. Show that 3v1 − 2v2 is also a solution.

***Problem 3.2. Find a non-zero solution to the homogeneous linear system:

(3.1.2)
 x + y + z = 0
2x − y − z = 0

***Problem 3.3. Show that the homogeneous linear system:

(3.1.3)
3x + 2y −  z = 0
      y + 3z = 0
          2z = 0

has only the zero solution.

***Problem 3.4. Show that the homogeneous linear system:

(3.1.4)
3x + 2y = 0
2x −  y = 0


has only the zero solution.

A homogeneous system of linear equations can be written as a single matrix equation. For the system (3.1.1), let

A =
[ 2 −1  3 −2  4 ]
[ 1  2 −2  1 −1 ]
[ 3 −2  1 −1  3 ]

If v = (u, v, x, y, z)^T, then the system (3.1.1) is the same as the single matrix equation Av = 0. The matrix A is called the coefficient matrix for the system.
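In Octave you can enter the coefficient matrix and test a candidate solution directly; a sketch using the first vector from Problem 3.1:

A = [2, -1, 3, -2, 4; 1, 2, -2, 1, -1; 3, -2, 1, -1, 3];
v = [1; 0; 4; 7; 0];               % candidate solution
disp(A*v)                          % the zero vector confirms Av = 0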

***Problem 3.5. Express the system

(3.1.5)
 x + y + z = 0
2x − y − z = 0

as a matrix equation. What is the coefficient matrix?

3.1.2. Key problem II: finding linear combinations. Suppose you are given a collection of vectors v1, . . . , vt and another vector w. For example you might have:

v1 = (1, 3, 2, 4),  v2 = (3, −1, 1, −1),  v3 = (2, 4, −1, −1),  w = (−9, 5, 2, 12)

Can you write w as a linear combination of the vi? Can you find a, b, c such that

(3.1.6) av1 + bv2 + cv3 = w

***Problem 3.6. Show that a = 2, b = −3 and c = −1 solve the system.

This system can be written as a matrix equation in two ways. The vi can be the rows or the columns of the coefficient matrix.

(3.1.7)   [a b c] · [1 3 2 4; 3 −1 1 −1; 2 4 −1 −1] = [−9 5 2 12]

(3.1.8)   [1 3 2; 3 −1 4; 2 1 −1; 4 −1 −1] · [a; b; c] = [−9; 5; 2; 12]


***Problem 3.7. Show that the linear combination (3.1.6) and the two matrix equations (3.1.7) and (3.1.8) are all equivalent to the non-homogeneous system of linear equations:

(3.1.9)

 a + 3b + 2c = −9
3a −  b + 4c =  5
2a +  b −  c =  2
4a −  b −  c = 12

***Problem 3.8. If v1 = (1, 1, 1) and v2 = (1, −1, 1), show that w = (1, 0, 0) is not a linear combination of v1 and v2. First state the problem as a matrix equation, then as a non-homogeneous system of linear equations, and show that the system has no solution.

The two key problems we have introduced are these: (1) find all solutions to a homogeneous linear system of equations; and (2) express a vector as a linear combination of other vectors. By the end of this course you will have extremely powerful theoretical and computational tools for answering questions about systems of linear equations and linear combinations. Restating the questions as problems about matrices is the first step toward a solution.

3.2. Matrices with Pivots

Systems of linear equations, homogeneous or non-homogeneous, are hard to solve. A lot of computation is required to find the solution. However there are especially simple systems that can be solved immediately, for example:

(3.2.1)
a + 2b     −  d = 0
   − b + c + 2d = 0

You can solve immediately for a and c:

(3.2.2)
a = −2b +  d
c =   b − 2d

Solutions to these equations are given by assigning any values whatsoever to b and d, and then giving a and c the values required by the equations. The variables b and d are called free variables, because their values can be chosen freely. The variables a and c are bound variables, because their values are bound or determined by the values given to the free variables.

These equations are special because some of the variables appear with coefficient 1 in exactly one equation and coefficient 0 in the others. There are two equations, and the two variables a and c each appear with coefficient 1 in just one equation. That makes it easy to solve for a and c.

The system of equations can be rewritten as a matrix equation:

[1 2 0 −1; 0 −1 1 2] · [a; b; c; d] = [0; 0]


If we set:

A =
[ 1  2  0 −1 ]
[ 0 −1  1  2 ]

x = (a, b, c, d)^T

the equations become Ax = 0.

The coefficient matrix A has the property that the elementary basis vectors e1 = (1, 0)^T and e2 = (0, 1)^T from R^2 appear in the columns of A corresponding to the bound variables. These are the pivot columns. In the example the pivot columns are columns 1 and 3, corresponding to variables a and c.
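A quick Octave sketch of the free/bound split: choose b and d freely, fill in a and c from (3.2.2), and confirm that the result solves the system:

A = [1, 2, 0, -1; 0, -1, 1, 2];
b = 5;  d = -2;                    % free variables: any values work
a = -2*b + d;  c = b - 2*d;        % bound variables from (3.2.2)
disp(A*[a; b; c; d])               % the zero vector: a solution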

Definition 3.9. A matrix A with d rows has pivots in columns p1, . . . , pd if all the elementary basis vectors e1, . . . , ed from R^d appear in columns p1, . . . , pd. We say that A is a matrix with pivots.

Here is an example of a matrix with pivots:

[  2  0 −1  1  0  3 ]
[ −1  1  2  0  0 −2 ]
[  2  0  3  0  1  2 ]

The pivot columns are columns 4, 2 and 5.

Remark 3.10. Experts will note the resemblance of matrices with pivots to row-reduced echelon matrices, but the leading zeros are not required. By omitting the leading zeros, we will be able to create equations for a subspace from spanning sets and vice versa, at the cost of giving up uniqueness for the spanning set of a subspace. Later we need uniqueness, so we add the requirement for leading zeros when necessary.

***Problem 3.11. Which of the following matrices have pivots? If the matrix does not have pivots, say which elementary basis vectors are missing.

[ 1 2 0 1 ]    [ 1 0 0 1 ]    [ 0 1 0 0 ]
[ 0 1 0 2 ]    [ 0 1 0 2 ]    [ 0 0 2 0 ]
[ 0 0 1 3 ]    [ 0 0 1 3 ]    [ 0 0 0 1 ]

***Problem 3.12.
(1) Show that the identity matrix is a matrix with pivots.
(2) Show that the reversed identity matrix
[ 0 0 1 ]
[ 0 1 0 ]
[ 1 0 0 ]
is a matrix with pivots.
(3) Show that a matrix with pivots cannot have more rows than columns. You could start by showing that a 4 × 3 matrix cannot be a matrix with pivots.
(4) Suppose A is a matrix with pivots. Show, either by proof or by example, that you can exchange rows of A and the resulting matrix still has pivots. Show that the pivot columns have the same location but may appear in a different order.


3.3. Solving Homogeneous Systems

Suppose you have a system of homogeneous linear equations, and suppose further that the coefficient matrix has pivots. How convenient! In this section you will learn a procedure for finding all solutions to the system.

Let’s start with a homogeneous system of linear equations:

(3.3.1)
2u + v      + 3y     = 0
−u          + 2y + z = 0
3u      + x + 2y     = 0

The coefficient matrix is:

A =
[  2  1  0  3  0 ]
[ −1  0  0  2  1 ]
[  3  0  1  2  0 ]

A has 3 rows and 5 columns, and the pivots are in columns 2, 5 and 3. A solution to the system is a column vector x = (u, v, x, y, z)^T such that Ax = 0.

You can solve the example system just by looking at it. The bound variables are v, z and x; the free variables are u and y; and the solutions are:

v = −2u − 3y
z =   u − 2y
x = −3u − 2y

Another way to write the solutions is:

(u, v, x, y, z) = (u, −2u − 3y, −3u − 2y, y, u − 2y)

The free variables appear in the solution, and the bound variables have been replaced by the combinations of free variables that they equal.

If you give the value 1 to one of the free variables and 0 to all the other free variables, you get a basic solution to the system of equations. Here are all the basic solutions:

u = 1, y = 0:  v1 = (u, v, x, y, z) = (1, −2, −3, 0, 1)
u = 0, y = 1:  v2 = (u, v, x, y, z) = (0, −3, −2, 1, −2)

These solutions are called basic because every solution is a linear combination of them:

u v1 + y v2 = u(1, −2, −3, 0, 1) + y(0, −3, −2, 1, −2) = (u, −2u − 3y, −3u − 2y, y, u − 2y)

Combining the basic solutions gives you back the general solution.

If you combine the basic solutions into a matrix:

B =
[ 1 −2 −3  0  1 ]
[ 0 −3 −2  1 −2 ]

you get a matrix with pivots. Moreover:

(1) A and B have the same number of columns


(2) The number of rows of A plus the number of rows of B is the number of columns of A or B.
(3) A and B are matrices with pivots, and every column is a pivot column for either A or B but not both.
(4) AB^T = 0. The rows of A are orthogonal to the rows of B.

Any two matrices with pivots A and B with these four properties are complementary matrices. The rows of B will be the basic solutions to the homogeneous system of linear equations Ax = 0, and every solution to the system will be a linear combination of the basic solutions. (One slight complication: if you think of the solutions as column vectors x = (u, v, x, y, z)^T, then the solutions are linear combinations of the columns of B^T.)
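A minimal Octave check of the complementary pair constructed above:

A = [2, 1, 0, 3, 0; -1, 0, 0, 2, 1; 3, 0, 1, 2, 0];
B = [1, -2, -3, 0, 1; 0, -3, -2, 1, -2];   % rows are the basic solutions
disp(A*B')                                 % the 3 x 2 zero matrix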

***Problem 3.13. Let

A =
[ 1  2  0  3  0   4 ]
[ 0  5  1  6  0   7 ]
[ 0  8  0  9  1  10 ]

Let B be the matrix complementary to A.
(1) Find the size of B.
(2) Construct B.
(3) Check that B is a matrix with pivots and that AB^T = 0.

The discussion above shows how to create the complementary matrix B from A; the following proposition proves that the construction works.

Proposition 3.14. Let A be a d × n matrix with pivots, and suppose d < n. Let p1, . . . , pd be the pivot columns of A listed in the correct order so that A(i, pj) = δij. Let q1, . . . , qn−d be the remaining columns of A, listed in any order. Define the (n − d) × n matrix B by:

B(j, pi) = −A(i, qj)   and   B(j, qi) = δij

Then B has pivots in columns q1, . . . , qn−d, and AB^T = 0. B is the matrix complementary to A, and every solution to Ax = 0 is a linear combination of the columns of B^T.

Proof. The columns q1, . . . , qn−d of B are pivot columns because B(j, qi) = δij, so B satisfies conditions 1, 2 and 3 above. For condition 4, we will show that each entry in AB^T is zero:

(AB^T)(i, j) = A(i, :) B^T(:, j)
  = A(i, p1) B^T(p1, j) + · · · + A(i, pd) B^T(pd, j)
      + A(i, q1) B^T(q1, j) + · · · + A(i, qn−d) B^T(qn−d, j)
  = B^T(pi, j) + A(i, qj)        because A(i, pj) = B^T(qi, j) = δij
  = −A(i, qj) + A(i, qj)         because B(j, pi) = −A(i, qj)
  = 0

Finally, we must show that if Ax = 0, then x is a linear combination of the columns of B^T. That is, we must show that x = B^T v for some column vector v if and only if Ax = 0.


Start by assuming x = B^T v. Then Ax = AB^T v = 0v = 0. Conversely, assume Ax = 0. Write x_q = (x(q1), . . . , x(qn−d))^T for the vector of free coordinates of x. We will show that x = B^T x_q. It suffices to show for i = 1, . . . , n that

x(i) = B^T(i, :) x_q

There are two cases to consider: i = pj for j = 1, . . . , d and i = qj for j = 1, . . . , n − d. Suppose i = qj. Then:

B^T(qj, :) x_q = Σ_{k=1}^{n−d} B(k, qj) x(qk) = Σ_{k=1}^{n−d} δkj x(qk) = x(qj)

Now for the harder case. Since Ax = 0:

0 = (Ax)(j) = Σ_{k=1}^{d} A(j, pk) x(pk) + Σ_{k=1}^{n−d} A(j, qk) x(qk) = x(pj) + Σ_{k=1}^{n−d} A(j, qk) x(qk)

Thus for j = 1, . . . , d:

B^T(pj, :) x_q = Σ_{k=1}^{n−d} B(k, pj) x(qk)
             = Σ_{k=1}^{n−d} −A(j, qk) x(qk)
             = x(pj)   □

Corollary 3.15. Let A be a matrix with more columns than rows. Then there exists a non-zero column vector x such that Ax = 0.

Proof. Let x be any column of the transpose of the matrix complementary to A. □

***Problem 3.16. For the matrix A in Problem 3.13, find a non-zero column vector x such that Ax = 0. If you did the previous problem, you should be able to answer this one with no further work.

***Problem 3.17. Show that if a matrix A has more rows than columns, then there exists a non-zero row vector v such that vA = 0.

So far you have seen that the rows of the matrix complementary to A are solutions to the matrix equation Ax = 0.


3.4. Row Equivalent Matrices and Reduced Row-Echelon Matrices

The previous section pretty much solved Key Problem I: homogeneous systems of equations. Now turn your attention to Key Problem II: linear combinations of vectors.

Consider the matrices

A =
[  1  3  2 ]
[ −1  2  1 ]
[  3 −1  4 ]

P =
[ 2  1  3 ]
[ 1 −1  2 ]

Then

PA = [2 1 3; 1 −1 2] · [1 3 2; −1 2 1; 3 −1 4] = [10 5 17; 8 −1 9]

Observe that the rows of PA are linear combinations of the rows of A:

[10 5 17] = 2[1 3 2] + [−1 2 1] + 3[3 −1 4]
[8 −1 9]  =  [1 3 2] − [−1 2 1] + 2[3 −1 4]

Proposition 3.18. Let A and B be matrices with the same number of columns. The rows of B are linear combinations of the rows of A if and only if there exists a matrix P such that PA = B.

Proof. By Proposition 2.15, row i of B is a linear combination of the rows of A if and only if there exists a row vector vi such that B(i, :) = vi A. Thus all the rows of B are linear combinations of the rows of A if and only if

B =
[ B(1, :) ]   [ v1 A ]   [ v1 ]
[   ...   ] = [  ...  ] = [ ... ] A = PA   □
[ B(s, :) ]   [ vs A ]   [ vs ]

Definition 3.19. Let A and B be matrices with the same number of columns. Then A ≼ B if every row of A is a linear combination of the rows of B.

By the preceding proposition, B ≼ A if and only if there exists a matrix P such that B = PA. Taking P = I you can see that A ≼ A for every matrix A. The relation ≼ is also transitive.

Proposition 3.20. If C ≼ B and B ≼ A then C ≼ A.

Proof. Since C = PB and B = QA for some matrices P and Q, C = P(QA) = (PQ)A, so C ≼ A. □

Matrices are more complex than numbers. It is possible to have A ≼ B and B ≼ A with A ≠ B.

***Problem 3.21. Let A = [1 3 1; 2 −1 3] and B = [2 −1 3; 1 3 1]. Show that A ≼ B and B ≼ A but A ≠ B.

Definition 3.22. Let A and B be matrices with the same number of columns. Then A is row equivalent to B, written A ' B, if A ≼ B and B ≼ A.


Thus A ' B if and only if there exist matrices P and Q such that PA = B and QB = A.

Proposition 3.23. The relation ' is an equivalence relation:
(1) A ' A for any matrix A;
(2) if A ' B then B ' A; and
(3) if A ' B and B ' C then A ' C.

Proof.
(1) A ' A because A ≼ A.
(2) If A ' B then A ≼ B and B ≼ A, so B ' A.
(3) If A ' B and B ' C then A ≼ B and B ≼ C and C ≼ B and B ≼ A. Therefore A ≼ C and C ≼ A, so A ' C. □

***Problem 3.24. Let A = [1 0 −1; 0 1 2] and B = [1 1 1; 1 −1 −3].
(1) If B = PA, what is the size of P?
(2) If A = QB, what is the size of Q?
(3) Find P and Q.
(4) Conclude that A is row equivalent to B (A ' B).

The goal of this section is to prove that every matrix is row equivalent to a unique matrix with a special form. Next we define that special form.

Definition 3.25. A matrix A is a row-reduced echelon matrix if
(1) A has pivots;
(2) the elementary basis vectors e1, e2, . . . that form the pivot columns occur left to right in the matrix; and
(3) the elements to the left of every pivot are 0.

Here is an example of a row-reduced echelon matrix:

A =
[ 0  1  2  0   3  0   4 ]
[ 0  0  0  1  −1  0   2 ]
[ 0  0  0  0   0  1  −3 ]

The pivot columns 2, 4 and 6 occur left to right, and every element to the left of a pivot is 0.

***Problem 3.26. One of the following matrices is a row-reduced echelon matrix, one has pivots but is not echelon, and one does not have pivots. Say which is which, and why.

[ 2 1 1 0 0 ]    [ 1 0 1 0 1 ]    [ 0 1 0 1 0 ]
[ 3 0 3 1 0 ]    [ 0 1 3 0 2 ]    [ 0 0 2 3 0 ]
[ 1 0 2 0 1 ]    [ 0 0 0 1 3 ]    [ 0 0 1 0 1 ]

***Problem 3.27.
(1) Show that the identity matrix I is a row-reduced echelon matrix.
(2) Show that the only square row-reduced echelon matrix is I.

To show that every matrix is row equivalent to a unique row-reduced echelon matrix, you need to know how a matrix A can be changed to a different matrix B that is row equivalent to A. Four operations, called row operations, are permitted.


(1) The rows of A can be reordered. For example the third and fifth rows could be swapped in the matrix, or the fourth row could be moved to the top of the matrix.
(2) A row can be multiplied by a non-zero scalar. For example, the first row could be multiplied by 3.
(3) A row can be modified by adding a multiple of another row to it. For example the third row could be modified by adding twice the fourth row to the third row.
(4) A row of zeros can be added to A or deleted from A.

Here is an example of a matrix modified by a series of row operations. The end of the process is a row-reduced echelon matrix.


(3.4.1)

[2 3 2 −1; 2 4 0 −2; 3 0 1 4]
  swap r1, r2:          [2 4 0 −2; 2 3 2 −1; 3 0 1 4]
  mult r1 × 0.5:        [1 2 0 −1; 2 3 2 −1; 3 0 1 4]
  add −2 × r1 to r2:    [1 2 0 −1; 0 −1 2 1; 3 0 1 4]
  add −3 × r1 to r3:    [1 2 0 −1; 0 −1 2 1; 0 −6 1 7]
  mult r2 × (−1):       [1 2 0 −1; 0 1 −2 −1; 0 −6 1 7]
  add −2 × r2 to r1:    [1 0 4 1; 0 1 −2 −1; 0 −6 1 7]
  add 6 × r2 to r3:     [1 0 4 1; 0 1 −2 −1; 0 0 −11 1]
  mult r3 × (−1/11):    [1 0 4 1; 0 1 −2 −1; 0 0 1 −1/11]
  add −4 × r3 to r1:    [1 0 0 15/11; 0 1 −2 −1; 0 0 1 −1/11]
  add 2 × r3 to r2:     [1 0 0 15/11; 0 1 0 −13/11; 0 0 1 −1/11]

Notice how the reduction to row-reduced echelon form proceeded. You create a pivot in the first row and use it to zero out all the other elements in the first pivot column. Then you create a pivot in the second row and use it to zero out all the other elements in the second pivot column. Continue creating pivots and pivot columns until all required pivot columns are created.


***Problem 3.28. Use row operations to reduce the following matrices to row echelon form:

[ 1 2 3  4 ]    [ 1 2 3 ]    [ 1 1 1 ]
[ 1 2 4  5 ]    [ 4 5 6 ]    [ 1 1 1 ]
[ 2 4 7 10 ]    [ 7 8 9 ]    [ 1 1 1 ]

It has been a while since you learned a new command for Octave. Octave can row reduce a matrix. To duplicate the operations in the example above using Octave, just enter the commands:

A = [2,3,2,-1;2,4,0,-2;3,0,1,4]
B = rref(A)

B will be the row-reduced echelon form of A. You won't see the row operations performed by Octave, but internally Octave starts with A and does row operations to create a row-reduced echelon matrix B.

***Problem 3.29. Use Octave to check (3.4.1). Do you get the same result? Use Octave to check your answers to Problem 3.28.

How do you know that row operations change a matrix into a row equivalent matrix? Let A be the original matrix and B the modified matrix. We will show that there exist matrices P and Q such that PA = B and QB = A. This will prove that A ' B.

First we show that if you change a matrix A to a matrix B with one row operation, then there is a matrix P1 such that P1 A = B. To construct P1, just do the row operation to I. We show that this works for all four row operations:

(1) Reorder the rows. For example swap the first two rows.

A = [1 2 3; 4 5 6; 2 4 6]  --(swap r1 and r2)-->  [4 5 6; 1 2 3; 2 4 6] = B

I = [1 0 0; 0 1 0; 0 0 1]  --(swap r1 and r2)-->  [0 1 0; 1 0 0; 0 0 1] = P1

P1 A = [0 1 0; 1 0 0; 0 0 1] · [1 2 3; 4 5 6; 2 4 6] = [4 5 6; 1 2 3; 2 4 6] = B


(2) Multiply a row by a non-zero scalar. For example, multiply the first row by 2.

A = [1 2 3; 4 5 6; 2 4 6]  --(mult r1 × 2)-->  [2 4 6; 4 5 6; 2 4 6] = B

I = [1 0 0; 0 1 0; 0 0 1]  --(mult r1 × 2)-->  [2 0 0; 0 1 0; 0 0 1] = P1

P1 A = [2 0 0; 0 1 0; 0 0 1] · [1 2 3; 4 5 6; 2 4 6] = [2 4 6; 4 5 6; 2 4 6] = B

(3) Add a multiple of one row to another. For example add three times the first row to the second row.

A = [1 2 3; 4 5 6; 2 4 6]  --(add 3 × r1 to r2)-->  [1 2 3; 7 11 15; 2 4 6] = B

I = [1 0 0; 0 1 0; 0 0 1]  --(add 3 × r1 to r2)-->  [1 0 0; 3 1 0; 0 0 1] = P1

P1 A = [1 0 0; 3 1 0; 0 0 1] · [1 2 3; 4 5 6; 2 4 6] = [1 2 3; 7 11 15; 2 4 6] = B

(4) A row of zeros can be added to a matrix or deleted from the matrix. For example, discard the third row.

A = [1 2 3; 4 5 6; 0 0 0]  --(delete r3)-->  [1 2 3; 4 5 6] = B

I = [1 0 0; 0 1 0; 0 0 1]  --(delete r3)-->  [1 0 0; 0 1 0] = P1

P1 A = [1 0 0; 0 1 0] · [1 2 3; 4 5 6; 0 0 0] = [1 2 3; 4 5 6] = B

We are still demonstrating that row operations change a matrix into a row equivalent matrix. The computations above show that changing A to B with a single row operation can be accomplished by left multiplication by some matrix P1:


B = P1 A. If you change A to C by a sequence of row operations

A → B1 → · · · → Bt → C

you can find matrices P1, . . . , Pt+1 such that

B1 = P1 A,  B2 = P2 B1,  . . . ,  Bt = Pt Bt−1,  C = Pt+1 Bt

Thus C = (Pt+1 · · · P1) A. Changing A by a sequence of row operations yields a matrix C ≼ A.

***Problem 3.30. Let A be the second matrix in Problem 3.28. Find a matrix P such that C = PA is the row-reduced echelon form of A.

To show that A and C are row equivalent, you must also show that A ≼ C. You can use the first part of the argument to do this: show that C can be changed to A by row operations, and you are done.

To show that C can be changed back to A by row operations, it suffices to show that each row operation is reversible. If you do a row operation A → B, you can do a row operation B → A. Here is a list of the four row operations and the way that they can be undone.
(1) Reorder the rows. Then reorder them again into the original order.
(2) Multiply a row by a non-zero scalar. Then multiply it by the inverse of the scalar.
(3) Add a multiple of one row to another. Then add the negative multiple of the first row to the second.
(4) A row of zeros can be added to a matrix or deleted from the matrix. These operations reverse each other.

This completes the demonstration that row operations change matrices into row equivalent matrices.

***Problem 3.31. Continuing Problem 3.30, find the row operations that change the row-reduced echelon matrix C back to A, then find a matrix Q such that QC = A.

Continuing toward the goal of showing that every non-zero matrix is row equivalent to a unique row-reduced echelon matrix, the next step is to show that row operations can change a non-zero matrix into a row-reduced echelon matrix. You have seen several examples of this fact, and you know how to use Octave to reduce a matrix to row-reduced echelon form. All that is missing is the general proof.

Suppose you have a non-zero matrix A with d rows. Proceed by induction on d. If d = 1 then A is a non-zero row vector: A = [∗ · · · ∗]. Suppose A(1, i) = 0 for i < m and A(1, m) ≠ 0.

A = [0 · · · 0 A(1, m) ∗ · · · ∗]

Multiply row 1 of A by A(1, m)^{−1}:

C = A(1, m)^{−1} [0 · · · 0 A(1, m) ∗ · · · ∗] = [0 · · · 0 1 ∗ · · · ∗]

C is a one-row matrix in row-reduced echelon form.

Now for the induction step. At least one row of A is non-zero, so we may choose a row with the leftmost non-zero entry. Swap rows to put that row on top, and divide that row by the leftmost non-zero entry. Call the new matrix A1. Thus by applying row operations you have changed A to a matrix A1 with an index p1 ≥ 1 such that


(1) A1(i, j) = 0 for all i and j < p1

(2) A1(1, p1) = 1

A → A1 =
[ 0 · · · 0  1  ∗ · · · ∗ ]
[ 0 · · · 0  ∗  ∗ · · · ∗ ]
[     ...                 ]
[ 0 · · · 0  ∗  ∗ · · · ∗ ]

Adding a multiple of the first row to each of the other rows, we can create a zero in rows 2, . . . , d of column p1:

A1 → A2 =
[ 0 · · · 0  1  ∗ · · · ∗ ]
[ 0 · · · 0  0  ∗ · · · ∗ ]
[     ...                 ]
[ 0 · · · 0  0  ∗ · · · ∗ ]
=
[ 0 · · · 0  1  ∗ · · · ∗ ]
[ 0 · · · 0  0            ]
[     ...        D        ]
[ 0 · · · 0  0            ]

The submatrix D has d − 1 rows, so by induction row operations will reduce it to row-reduced echelon form. The row operations do not change the zeros on the left. Row operations take us to:

A2 → A3 =
[ 0 · · · 0  1  ∗ · · · ∗ ]
[ 0 · · · 0  0            ]
[     ...        E        ]
[ 0 · · · 0  0            ]
=
[ 0 · · · 0  1  ∗ · · · ∗ · · · ∗ · · · ]
[ 0 · · · 0  0  0 · · · 1 · · · 0 · · · ]
[     ...                               ]
[ 0 · · · 0  0  0 · · · 0 · · · 1 · · · ]

where E is in row-reduced echelon form. Let the pivot columns of E (counting from 1 at the left edge of A3) be p2, . . . , pd. If the entries in the first row, columns p2, . . . , pd, were 0, the matrix would be in row-reduced echelon form. You can change the entry in the first row, column pi, to 0 by subtracting a multiple of row i from the first row. No entry in any other pivot column is changed by this operation. This completes the reduction of the matrix to row-reduced echelon form.

Corollary 3.32. If a matrix A is reduced to a row echelon matrix C, then C has no more rows than A.

Proof. The construction above might delete a row of zeros, but it never adds any rows going from a matrix to its row-reduced echelon form. □

Finally, you come to the last step in showing that every matrix is row equivalent to a unique row-reduced echelon matrix. This result is encapsulated in a theorem so it can be used later.

Theorem 3.33. Every non-zero matrix is row equivalent to a unique row-reduced echelon matrix.

Proof. You have seen that every non-zero matrix A is row equivalent to a row-reduced echelon matrix C. Suppose two row-reduced echelon matrices, C1 and C2, are both row equivalent to A. Then by Proposition 3.23 they are row equivalent to each other. Suppose the pivot columns of C1 are p1, . . . , ps, and the pivot columns of C2 are q1, . . . , qt. We use induction on s to prove that C1 = C2.

First suppose s = 1. The rows of C2 are non-zero multiples of the single row of C1, so C2(i, j) = 0 for 1 ≤ i ≤ t and j < p1, and C2(i, p1) ≠ 0. Thus all rows of C2 are zero before column p1 and non-zero in column p1. Since C2 is a row-reduced


echelon matrix, C2 has only one row: t = 1. Since the first non-zero elements of the single row of C1 and C2 are both in position p1, both vectors have a 1 in that position. The single row of C2 is a scalar multiple of the single row of C1, so the multiplier must be 1. C1 = C2.

Now for the induction step, which proceeds in three stages:

(1) p1 = q1;
(2) s = t and C1 is the same as C2 in all rows but the first; and
(3) the first row of C1 equals the first row of C2.

Since the rows of C2 are linear combinations of the rows of C1, all the rows of C2 are 0 in columns 1, 2, . . . , p1 − 1. Since the first row of C1 is a linear combination of the rows of C2, at least one row of C2 has a non-zero entry in column p1. Therefore q1 = p1.

Rows 2 . . . t of C2 are linear combinations of the rows of C1. Since these rows of C2 are 0 in column p1, the first row of C1 cannot be part of the linear combinations. Rows 2 . . . t of C2 are linear combinations of rows 2 . . . s of C1. Similarly rows 2 . . . s of C1 are linear combinations of rows 2 . . . t of C2. Thus the matrices C1 and C2, with their first rows removed, are row equivalent row-reduced echelon matrices. Therefore, by induction, they are equal. In particular s = t and, for i = 2 . . . s, pi = qi.

To conclude, we must show that the first rows of C1 and C2 are equal. We know that the first row of C1 can be written as a linear combination of the rows of C2:

C1(1, :) = a1 C2(1, :) + · · · + as C2(s, :)

Considering the coordinate positions pi:

C1(1, pi) = a1 C2(1, pi) + · · · + as C2(s, pi)

But C2(j, pi) = 1 if j = i and is 0 otherwise. Therefore C1(1, pi) = ai. But C1(1, pi) = 1 for i = 1 and is 0 otherwise. Therefore a1 = 1 and ai = 0 for i > 1. Thus C1(1, :) = C2(1, :), or the first rows of C1 and C2 are equal. □

The uniqueness theorem allows another characterization of row equivalence.

Corollary 3.34. Two matrices are row equivalent if and only if they have the same row-reduced echelon form.

***Problem 3.35. Prove Corollary 3.34.

The uniqueness theorem will be our entry into dimension theory in the next chapter. However you can use the result right away to solve homogeneous systems of linear equations.

Proposition 3.36. Let A and B be row equivalent matrices. Then the equations Ax = 0 and Bx = 0 have the same solutions.

Proof. There exist matrices P and Q such that PA = B and QB = A. We will show first that Ax = 0 implies Bx = 0:

Ax = 0 ⇒ PAx = P0 = 0 ⇒ Bx = 0


Next we show that Bx = 0 implies Ax = 0:

Bx = 0 ⇒ QBx = Q0 = 0 ⇒ Ax = 0

Therefore Ax = 0 if and only if Bx = 0. The two equations have the same solutions. □

In Proposition 3.14 you learned to solve systems of homogeneous equations if the coefficient matrix had pivots. Given any homogeneous system Ax = 0, the previous proposition says that you can replace the coefficient matrix A by the equivalent row-reduced echelon matrix C, and the system Cx = 0 will have the same solutions as the original system Ax = 0.

***Problem 3.37. Consider the homogeneous system of linear equations:

3w − 2x + 4y −  z = 0
2w + 3x −  y + 2z = 0
−w −  x +  y −  z = 0

Find an equivalent system with a row-reduced echelon coefficient matrix, and use Proposition 3.14 to find the basic solution to the system. You can use Octave to row reduce the coefficient matrix.

The uniqueness of the row-reduced echelon form of a matrix allows us to define a key property of matrices. The consequences of this definition will be explored in the next section.

Definition 3.38. Let A be a matrix. If A = 0, then the row rank of A is 0. Otherwise the row rank of A is the number of rows in its unique row-reduced echelon form. The row rank of A is denoted rank(A).

Proposition 3.39. The row rank of a matrix is not more than the number of rows or the number of columns.

Proof. Corollary 3.32 shows that the row rank is not more than the number of rows. But a row-reduced echelon matrix cannot have more rows than columns, so the row rank of a matrix, the number of rows of its row-reduced echelon form, is not more than the number of columns. □

***Problem 3.40. Octave will construct random matrices of decimals of size r × c with the command A = rand(r,c). Construct a random 5 × 3 matrix A and a random 3 × 4 matrix B. What is the size of AB? Find the row ranks of A, B and AB. Do your results confirm the previous proposition?

You don’t really have to row reduce matrices to find their row rank. Octave hasa command rank(..) that will calculate the row rank of a matrix.

3.5. Linear Independence and Row Rank

The rows of a matrix with pivots have a special property called linear independence that, from a theoretical point of view, is their most important property. We start, however, with the opposite property, which is easier to understand.


Definition 3.41. A set of vectors v1, . . . , vt is linearly dependent if there exist scalars a1, . . . , at, not all 0, such that:

a1v1 + · · ·+ atvt = 0

***Problem 3.42. Show that the vectors (1, 0), (0, 1) and (1, 1) are linearly dependent.

By Proposition 2.15, the rows of a matrix A are linearly dependent if and only if there exists a non-zero row vector v such that vA = 0. Here are some elementary results about linear dependence.

Lemma 3.43. A set of vectors v1, . . . , vt is linearly dependent if and only if one of the vectors is a linear combination of the others.

Proof. If, for simplicity, the last vector is a linear combination of the others, then there exist scalars a1, . . . , at−1 such that

vt = a1 v1 + · · · + at−1 vt−1

Thus

a1 v1 + · · · + at−1 vt−1 − 1 vt = 0

which is a non-trivial linear combination of the vectors yielding 0, so the vectors are linearly dependent.

Conversely, if the vectors are linearly dependent, there exist scalars a1, . . . , at, not all zero, such that:

a1v1 + · · ·+ atvt = 0

For simplicity assume at ≠ 0. Then

vt = −(a1/at) v1 − · · · − (at−1/at) vt−1

so vt is a linear combination of the other vectors. □

***Problem 3.44. Use the Lemma to show that the vectors (1, 0), (0, 1) and (1, 1) are linearly dependent.

Proposition 3.45. A matrix A has linearly dependent rows if and only if you can remove a row from A to obtain a matrix C ' A.

Proof. Suppose the rows of A are linearly dependent. Then one row is a linear combination of the others. Let C be the matrix with this row omitted. Then every row of C is a linear combination of rows of A, and every row of A is a linear combination of rows of C. Therefore C ' A.

Conversely suppose you remove one of the rows of A, obtaining a matrix C ' A. Every row of A is a linear combination of the rows of C, so one of the rows of A is a linear combination of the other rows. Thus the rows of A are linearly dependent. □

Here’s the key result about linear dependence. It tells us when the rows of amatrix are linearly dependent.

Proposition 3.46. The rows of a matrix are linearly dependent if and only if the number of rows is greater than the row rank.


Proof. Let A be a matrix with r rows, and let C be the unique row-reduced echelon matrix row equivalent to A. The number of rows of C is rank(A). Let s = rank(A). Since C ' A, there exists an r × s matrix P such that PC = A.

If s < r then by Problem 3.17 there is a non-zero row vector v such that vP = 0. Thus vA = vPC = 0, so the rows of A are linearly dependent.

Conversely, suppose the rows are linearly dependent. Then A is row equivalent to a matrix C with one fewer row. Since A and C have the same row rank, and since the row rank of a matrix is not more than the number of rows, the row rank of A is not more than the number of rows of C, which is less than the number of rows of A. □

Corollary 3.47. A set of more than n vectors from Rn is linearly dependent.

Proof. Create a matrix with the vectors as rows. The rank of the matrix is not more than the number of columns n, so the rank is less than the number of rows. Thus the rows are linearly dependent. □

Definition 3.48. A set of vectors that is not linearly dependent is said to be linearly independent.

Theorems about linearly independent sets of vectors are just the converses of theorems about linearly dependent sets.

***Problem 3.49. Prove all of the following:
(1) A set of vectors is either linearly dependent or linearly independent and not both.
(2) A set of vectors v1, . . . , vt is linearly independent if

a1 v1 + · · · + at vt = 0

implies a1 = · · · = at = 0.
(3) The rows of a matrix A are linearly independent if vA ≠ 0 for all non-zero row vectors v.
(4) A set of vectors is linearly independent if and only if none of the vectors is a linear combination of the others.
(5) The rows of a matrix are linearly independent if and only if removing any of the rows reduces the rank of the matrix.
(6) The rows of a matrix are linearly independent if and only if the number of rows is the row rank of the matrix.

***Problem 3.50. Show that a single non-zero vector forms a linearly independent set.

***Problem 3.51. Show that the elementary basis vectors for R^3 are linearly independent. Show that the elementary basis vectors for R^n are linearly independent.

You can use Octave to determine if a set of vectors is linearly independent. Put the vectors into the rows of a matrix and row reduce the matrix. If the number of non-zero rows in the row-reduced form is the same as the original number of vectors, then your vectors are linearly independent. If there are fewer non-zero rows, the vectors are linearly dependent.
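A sketch of that recipe in Octave (the vectors here are an arbitrary example in which the third row is the sum of the first two):

V = [1, 0, 2; 0, 1, 1; 1, 1, 3];   % one vector per row
R = rref(V);
nonzero = sum(any(R, 2));          % number of non-zero rows after reduction
disp(nonzero == rows(V))           % 1 means independent, 0 means dependent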


***Problem 3.52. Show that the vectors (1, 3, 2, 4), (3, −1, 1, −1), (2, −1, 0, 1) and (6, 1, 3, 4) are linearly dependent. Then find a non-trivial linear combination summing to 0. Here's a plan for the second part. Put the vectors into the rows of a matrix A. You are looking for a non-zero row vector x such that xA = 0. Find a non-zero column vector x^T such that A^T x^T = 0. You will have to row reduce A^T and then find the complementary matrix.

Proposition 3.53. The rows of a matrix with pivots are linearly independent.

Proof. Let the matrix be A, and suppose A has d rows. Let the pivot columns be p1, . . . , pd. Then column pi of A, A(:, pi), is the ith elementary basis vector ei of size d.

Suppose we have a row vector x such that xA = 0. To prove that the rows of A are linearly independent, we must show that x = 0. But the row x times every column of A is zero, so for i = 1, . . . , d:

0 = xA(:, pi) = xei = x(i)

Thus every coordinate of x is zero. □

***Problem 3.54. The point of this problem is to confirm the previous proposition. Let

A =
[ 1  0   1  0  2 ]
[ 0  1   2  0  1 ]
[ 0  0  −1  1  3 ]

a matrix with pivots. Show by direct computation that

[a b c] · A = [0 0 0 0 0]

implies that a = b = c = 0. How does this show that the rows of A are linearly independent?

In the problem above, it is not enough to show that [0 0 0] A = 0. A linear combination with all zero coefficients always results in the zero vector. The question is: given some vectors, is it possible or impossible to form a linear combination with non-zero coefficients that results in the zero vector?

This section ends with three results that are essential for dimension theory.

Proposition 3.55. Let A be a non-zero matrix. If the rows of A are linearly dependent, you can remove some of the rows so that the resulting matrix B has linearly independent rows and B ' A.

Proof. Since the rows of A are linearly dependent, by Proposition 3.45 we can remove one row from A resulting in a matrix C ' A. Keep removing rows until the remaining rows are linearly independent. Since one non-zero row is linearly independent, the process will terminate when only one row is remaining, if not sooner. □

***Problem 3.56. Remove rows from

A =
[ 1  3  2 ]
[ 4 −1  0 ]
[ 3 −4 −2 ]
[ 5  2  2 ]

until you get a matrix C with linearly independent rows that is row equivalent to A.


Lemma 3.57. Let v1, . . . , vt be a linearly independent set of vectors, and suppose vt+1 is not a linear combination of v1, . . . , vt. Then v1, . . . , vt+1 is a linearly independent set of vectors.

Proof. Suppose

a1 v1 + · · · + at+1 vt+1 = 0

We must show a1 = · · · = at+1 = 0. If at+1 ≠ 0 then vt+1 is a linear combination of v1, . . . , vt, contradicting an assumption of the lemma. Thus at+1 = 0. But then a1 = · · · = at = 0 because v1, . . . , vt are linearly independent. □

Proposition 3.58. If B ≼ A and B has linearly independent rows, then you can add rows of A to B until you get a matrix C with linearly independent rows such that C ' A.

Proof. If B is row equivalent to A then take C = B. Otherwise choose a row of A that is not a linear combination of the rows of B and add it to B, resulting in a matrix D. By Lemma 3.57 the rows of D are linearly independent. Also the rows of D are linear combinations of the rows of A. Keep adding rows until you have a matrix C with linearly independent rows such that C is row equivalent to A. The process of adding rows must stop when all the rows of A have been used, if not before. □

***Problem 3.59. Let B = [1 3 2 4] and

A =
[ 1  0  2  0 ]
[ 0  3  0  4 ]
[ 3  2  5 −1 ]
[ 4 −1  1  0 ]

Show that
(1) B has linearly independent rows;
(2) B ≼ A.
(3) What are the ranks of B and A? (Use Octave.)
(4) If you add rows of A to B to get a matrix C with linearly independent rows that is row equivalent to A, how many rows will C have? How many rows will you use from A?
(5) Create a matrix C by adding some of the rows of A to B such that C has linearly independent rows and C ' A.

Corollary 3.60. If B ≼ A and rank(A) = rank(B) then A ' B.

Proof. It will be convenient to introduce some new notation. Let rows(A) be the number of rows in a matrix A.

Let R1 be the row-reduced echelon form of A, so there exists a matrix P1 such that P1 R1 = A. Let R2 be the row-reduced echelon form of B, so there exists a matrix Q2 such that Q2 B = R2. Since B ≼ A, there exists a matrix P such that PA = B. Putting this all together, if X = Q2 P P1, then X R1 = R2. Since

rows(R1) = rank(A) = rank(B) = rows(R2)

R1 and R2 are row-reduced echelon matrices with the same number of rows and R2 ≼ R1. We will show that R2 ' R1. Since A ' R1 and B ' R2, this will prove A ' B.

Since R2 is row-reduced, R2 has linearly independent rows. By the Proposition we can add rows of R1 to R2, resulting in a new matrix R3 with linearly independent rows that is row equivalent to R1. But how many rows must we add? Since

Page 60: Linear Algebra and Geometry David Meredithonline.sfsu.edu/meredith/Linear_Algebra/Spring_2010/Linear Algebra and Geometry.pdfSystems of Linear Equations 33 3.1. Two Key Problems 33

54 3. SYSTEMS OF LINEAR EQUATIONS

R3 ' R1 and R1 is row-reduced, R1 is the unique row-reduced echelon form of R3 and rank(R3) = rows(R1). Since R3 has linearly independent rows, rows(R3) = rank(R3) = rows(R1). Thus:

rows(R3)− rows(R2) = rows(R1)− rows(R2) = 0

That is, R3 = R2, so R2 ' R1. □

***Problem 3.61. The Corollary has two hypotheses: rank(A) = rank(B) and B ≼ A. The conclusion is A ' B. Show by an example that you can have two matrices of equal rank that are not row equivalent.


CHAPTER 4

Subspaces and Dimension

4.1. Definitions and Examples

So far we have dealt mostly with finite sets of vectors. In this chapter we discuss infinite sets of vectors called subspaces. The arguments are consequently a little more abstract. It turns out that this excursion into infinity is necessary to justify the idea of dimension and bring geometric ideas to linear algebra. After all, most geometric objects (lines, planes, etc.) contain an infinite number of points.

Intuitively, subspaces are flat sets through the origin of R^n. Any straight line through the origin in R^2 is a subspace of R^2. Any straight line or flat plane through the origin in R^3 is a subspace of R^3.

Definition 4.1. A subspace V of R^n is a subset V ⊂ R^n satisfying three conditions.
(1) 0 ∈ V. The zero vector is in the subspace.
(2) If v ∈ V and a is a scalar (real number) then av ∈ V. Any scalar multiple of a vector in the subspace is also in the subspace.
(3) If v ∈ V and w ∈ V then v + w ∈ V. The sum of any two vectors in the subspace is also in the subspace.

From this definition it follows that
(1) a subspace cannot be the empty subset, since it contains 0; and
(2) if v1, . . . , vt ∈ V and if a1, . . . , at are scalars, then a1 v1 + · · · + at vt ∈ V. Any linear combination of vectors in V is also a vector in V.

Here is a proof that the line x = y in R^2 is a subspace of R^2. Let V be the line. Clearly V ⊂ R^2 because the line lies in the plane. We will show that all three subspace conditions are satisfied.

(1) The point (0, 0) is on the line y = x, so 0 ∈ V.
(2) If v ∈ V then v = (b, b) for some real number b. If a is a scalar then av = a(b, b) = (ab, ab). Since the two coordinates are equal, av ∈ V.
(3) If v ∈ V and w ∈ V then v = (a, a) and w = (b, b) for some scalars a and b. Then v + w = (a, a) + (b, b) = (a + b, a + b). Since the two coordinates are equal, v + w ∈ V.

***Problem 4.2. Show that the plane consisting of points (x, y, z) such that x − y + z = 0 is a subspace of R^3. Hint: show:
(1) (0, 0, 0) satisfies the equation;
(2) if (x, y, z) satisfies the equation and a is a scalar, then a(x, y, z) satisfies the equation; and
(3) if (x1, y1, z1) and (x2, y2, z2) satisfy the equation, then so does (x1, y1, z1) + (x2, y2, z2).


Then explain why these three points are sufficient to prove the desired result.

There are two subspaces of R^n that mathematicians deprecate by calling them trivial, but they satisfy the definition and cannot be ignored. They are all of R^n and the set {0}. This last is called the zero subspace.

***Problem 4.3. Prove that Rn and {0} are subspaces of Rn.

***Problem 4.4.
(1) Prove that the closed first quadrant of R^2 (including the boundary) is not a subspace of R^2. Note that the zero vector is a member of this set. Which part of the subspace definition is not satisfied?
(2) Prove that the union of the closed first and third quadrants of R^2 (including the boundaries) is not a subspace of R^2. Which part of the subspace definition is not satisfied?

4.2. Constructing Subspaces

4.2.1. Subspaces defined by equations. One way to construct a subspace is to define it with homogeneous linear equations (see Key Problem I 3.1.1). Here is an example of a homogeneous system of linear equations:

3x− 2y − z = 02x+ y + z = 0

You can set up a homogeneous system of linear equations as a matrix equation:

$$\begin{bmatrix} 3 & -2 & -1 \\ 2 & 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

***Problem 4.5. Find a non-zero solution to these equations. Hint: row reduce the matrix, since the row-reduced equations have the same solutions as the original equations (Proposition 3.36).
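Once you have solved the problem by hand, Octave can confirm your answer; a minimal sketch using the system above:

    A = [3 -2 -1; 2 1 1];   % coefficient matrix of the homogeneous system
    rref(A)                 % row-reduced form: read off the solutions
    null(A)                 % its column is a non-zero solution, spanning the solution set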

Definition 4.6. The kernel of a matrix A, denoted ker(A), is the set of solutions to Ax = 0.

***Problem 4.7. Suppose A is an r × c matrix. Is ker(A) a subset of Rc or Rr?

Proposition 4.8. The kernel of a matrix is a subspace.

Proof. Let A be a matrix. We will show that ker(A) satisfies the three subspace properties.

(1) A0 = 0, so 0 ∈ ker(A).
(2) Suppose v ∈ ker(A) and a is a scalar. Then A(av) = a(Av) = a0 = 0, so av ∈ ker(A).
(3) Suppose v, w ∈ ker(A). Then A(v + w) = Av + Aw = 0 + 0 = 0, so v + w ∈ ker(A). �

***Problem 4.9.
(1) Explain why the kernel of $\begin{bmatrix} 3 & -2 & -1 \end{bmatrix}$ is a plane through the origin in R3.
(2) Explain why the kernel of $\begin{bmatrix} 3 & -2 & -1 \\ 2 & 1 & 1 \end{bmatrix}$ is a line through the origin in R3.


***Problem 4.10. Let A and B be matrices that can be multiplied. Show that ker(B) ⊂ ker(AB). Hint: Prove that if Bx = 0 then ABx = 0.

There is no obvious relation between ker(A) and ker(AB).

4.2.2. Subspaces defined by spanning sets. Another way to construct subspaces is with spanning sets. Here is an example. Let v1 = (1, 3, −2) and v2 = (2, −1, 1). Then consider all linear combinations of v1 and v2:

av1 + bv2 = a(1, 3, −2) + b(2, −1, 1) = (a + 2b, 3a − b, −2a + b)

These vectors form a plane in R3.

***Problem 4.11. (a) Show that all linear combinations av1 + bv2 satisfy the equation x − 5y − 7z = 0.

Since the linear combinations satisfy the equation, they form a plane.

Definition 4.12. Let v1, . . . , vt be vectors in Rn. The set V = span(v1, . . . , vt) is the set of all possible linear combinations using the vectors v1, . . . , vt and any scalars whatsoever. More formally:

V = span(v1, . . . , vt) = {a1v1 + · · · + atvt : a1, . . . , at ∈ R}

Proposition 4.13. Let v1, . . . , vt ∈ Rn. Then V = span(v1, . . . , vt) is a subspace of Rn.

Proof. We will show that S = span(v1, . . . , vt) satisfies the three subspace properties.

(1) 0 = 0v1 + · · · + 0vt, so 0 ∈ S.
(2) Suppose v = a1v1 + · · · + atvt ∈ S. Then av = (aa1)v1 + · · · + (aat)vt is a linear combination, so av ∈ S.
(3) Suppose v = a1v1 + · · · + atvt ∈ S and w = b1v1 + · · · + btvt ∈ S. Then v + w = (a1 + b1)v1 + · · · + (at + bt)vt is a linear combination, so v + w ∈ S. �

***Problem 4.14. Show that each of the spanning vectors vi is contained in the subspace span(v1, . . . , vt). EXTRA CREDIT: Show that span(v1, . . . , vt) is the smallest subspace containing the vi.

***Problem 4.15. Prove that the zero vector spans the zero subspace.

***Problem 4.16. Let v = (1, 1, 0) and w = (0, 1, 1). Show that both vectors are in the subspace x − y + z = 0 of R3 (see Problem 4.2). Show that v and w span the subspace. That is, show that every solution to the equation is a linear combination of v and w. (Hint: assume you are given a vector (x, y, z) such that x − y + z = 0. You must show that you can find scalars a and b such that

a(1, 1, 0) + b(0, 1, 1) = (x, y, z).)

As was shown in Key Problem II 3.1.2, linear combinations of vectors are matrix products. If you put v1 and v2 into the rows of a matrix:

$$A = \begin{bmatrix} 1 & 3 & -2 \\ 2 & -1 & 1 \end{bmatrix}$$


the linear combination becomes:

$$av_1 + bv_2 = \begin{bmatrix} a & b \end{bmatrix} \begin{bmatrix} 1 & 3 & -2 \\ 2 & -1 & 1 \end{bmatrix} = xA$$

for x = [a b]. The linear combinations of the rows of A are products xA where x is a row vector.

Similarly, if you put v1 and v2 into the columns of a matrix:

$$B = \begin{bmatrix} 1 & 2 \\ 3 & -1 \\ -2 & 1 \end{bmatrix}$$

the linear combination becomes:

$$av_1 + bv_2 = \begin{bmatrix} 1 & 2 \\ 3 & -1 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = By$$

for the column vector y with entries a and b. The linear combinations of the columns of B are products By where y is a column vector.

If the n-vectors v1, . . . , vt are put into the rows of a t × n matrix A, then span(v1, . . . , vt) is all the products xA for all 1 × t row vectors x. If the vectors are put into the columns of an n × t matrix B, then span(v1, . . . , vt) is all the products By for all t × 1 column vectors y.
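These two descriptions are easy to test in Octave; a minimal sketch (the scalars a = 2 and b = −1 are arbitrary choices):

    a = 2; b = -1;           % arbitrary scalars
    A = [1 3 -2; 2 -1 1];    % v1 and v2 as the rows of A
    B = A';                  % the same vectors as the columns of B
    [a b] * A                % the combination a*v1 + b*v2, written as xA
    B * [a; b]               % the same combination, written as By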

Definition 4.17. Let A be a matrix. The subspace spanned by the rows of A is the row space of A. The row space consists of all products xA for all row vectors x. The subspace spanned by the columns of A is the column space of A. The column space consists of all products Ay for all column vectors y.

Given a set of vectors in Rn spanning a subspace V, you can put the vectors into the rows of a matrix and make V into a row space, or you can put the vectors into the columns of a matrix and make V into a column space.

***Problem 4.18. Show that the row space and the column space of an r × c matrix A are subspaces of Rc and Rr respectively.

A subspace can have many spanning sets. A plane in R3 through the origin is spanned by any pair of non-parallel vectors in the plane. In the previous chapter you were introduced to the concept of row equivalent matrices. Now we can give geometric content to this concept. It turns out that matrices are row equivalent if and only if they have the same row space.

Proposition 4.19. Let A and B be matrices with the same number of columns. Then A and B are row equivalent if and only if they have the same row spaces.

Proof. A and B are row equivalent if and only if there exist matrices P and Q such that PA = B and QB = A (Corollary 3.22).

The row space of A is the set of products xA for all row vectors x, and the row space of B is the set of products yB for all row vectors y.

Suppose the matrices are row equivalent. We will show that the row spaces are equal, or that the set of products xA is equal to the set of products yB. We show


the two sets are equal by showing any vector in one is in the other. If v = xA for some x then

v = xA = x(QB) = (xQ)B

Thus v = yB for y = xQ. Similarly, if w = yB then

w = yB = y(PA) = (yP)A

Thus w = xA for x = yP. Therefore if the matrices are row equivalent the row spaces are equal.

Conversely, suppose the row spaces are equal. Then since each row of B is in the row space, each row of B is a linear combination of the rows of A (B ≼ A). Similarly each row of A is a linear combination of the rows of B (A ≼ B). Therefore A and B are row equivalent (A ∼ B). �

4.2.3. Subspaces of subspaces. One of the important relations for subspaces is inclusion, one subspace inside another. For example, in R3 there could be a line inside a plane. A subspace W is contained in a subspace V, written W ⊂ V, if every vector in W is also a vector in V: v ∈ W ⇒ v ∈ V.

The most important fact about inclusion is also the simplest:

Proposition 4.20. Let V be a subspace of Rn, and suppose v1, . . . , vt are vectors in V. Then W = span(v1, . . . , vt) ⊂ V.

Proof. Since V is a subspace, every linear combination of vectors in V is a vector in V. Since W consists of linear combinations of v1, . . . , vt, which are vectors in V, W ⊂ V. �

Corollary 4.21. Let A and B be matrices, and suppose that the rows of B are linear combinations of the rows of A (B ≼ A). Then the row space of B is a subspace of the row space of A.

Proof. The row space of A is the span of the rows of A, and the row space of B is the span of the rows of B. Since each row of B is a vector in the row space of A, the row space of B is a subspace of the row space of A. �

***Problem 4.22. Let V be the plane through the origin in R3 defined by the equation x + y − z = 0, and let v = (1, 1, 2). Show that v ∈ V and W = span(v) is a line through the origin in R3. Show (like the Proposition asserts) W ⊂ V. That is, show that every point on the line W lies in the plane V.

4.3. Spanning Sets and Dimension

You have now seen that subspaces can be defined by equations or defined by spanning sets. In matrix terms, subspaces can be described as kernels or row and column spaces. This raises the questions: is every subspace the kernel of some matrix? Does every subspace have a spanning set? The answers are yes. Next we will show that every subspace has a spanning set. Showing that every subspace is the kernel of some matrix will be delayed until you study orthogonal complements later in this chapter.

Proposition 4.23. Let V be a non-zero subspace of Rn. Then V has a linearly independent spanning set.¹

¹If V = {0} then mathematicians say that the empty set is a spanning set for V. This course will not deal with empty spanning sets.


Proof. Choose a non-zero vector v1 ∈ V. If {v1} is not a spanning set for V, there is a vector v2 ∈ V that is not in span(v1). The set {v1, v2} is linearly independent (Lemma 3.57), and span(v1, v2) ⊂ V by Proposition 4.20. Continue this process. If {v1, v2} is not a spanning set for V, there is a vector v3 ∈ V that is not in span(v1, v2). The set {v1, v2, v3} is linearly independent, and span(v1, v2, v3) ⊂ V. Since a set of linearly independent vectors in Rn cannot have more than n vectors (Corollary 3.47), this process must stop. For some d ≤ n the set {v1, . . . , vd} will be a linearly independent spanning set of V. �

All the work you did in the last chapter now begins to pay off with a string of key results about the structure of subspaces.

Corollary 4.24. Every subspace V of Rn is the row space of a matrix.

Proof. Choose a spanning set for V and put the vectors into the rows of a matrix A. Then the row space of A, the space spanned by the rows of A, will be V. �

***Problem 4.25. Suppose V is a subspace of Rn that is the row space of a matrix A. Do you know how many rows or columns are in A?

Corollary 4.26. Every non-zero subspace of Rn is the row space of a unique row-reduced echelon matrix.

Proof. Every subspace is the row space of a matrix A, and two matrices have the same row space if and only if they are row equivalent (Proposition 4.19). By Proposition 3.33 there is a unique row-reduced echelon matrix that is row equivalent to A. �

Corollary 4.27. Let V be a non-zero subspace, and suppose v1, . . . , vs and w1, . . . , wt are linearly independent spanning sets for V. Then s = t. All linearly independent spanning sets for V contain the same number of vectors.

Proof. Put the vectors v1, . . . , vs into the rows of a matrix A, and put the vectors w1, . . . , wt into the rows of a matrix B. By Proposition 4.19, A and B are row equivalent and therefore have the same row rank. Since A and B have linearly independent rows, by Problem 3.49 A and B have the same number of rows as their row rank. Therefore s = t. �

Proposition 4.23 tells you that every subspace V of Rn has a linearly independent spanning set. Corollary 4.27 says that all linearly independent spanning sets for V contain the same number of vectors. Linearly independent spanning sets are the most useful spanning sets, and their size is a key parameter for a subspace. The next definition gives names to these two concepts.

Definition 4.28. Let V be a subspace of Rn. A basis for V is a linearly independent spanning set for V. The dimension of V, denoted dim(V), is the unique size of a basis for V.

To reiterate: every subspace has a non-unique basis and a unique dimension.

***Problem 4.29. Let A be a matrix, and let R be its row-reduced echelon form.

(1) What result from Chapter 3 tells you that A and R are row equivalent?


(2) What result in this chapter tells you that A and R have the same row space?
(3) What result from Chapter 3 tells you that the dimension of the row space of A is the row rank of A?
(4) What results tell you that the dimension of the row space of A is the number of rows of R?

The problem shows you how to calculate the dimension of any subspace V. First find a matrix A whose row space is V. Then count the number of non-zero rows in the row-reduced echelon form of A.
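In Octave this recipe is quick; a minimal sketch with an illustrative choice of spanning vectors:

    A = [1 2 3; 2 4 6; 1 0 1];   % rows span a subspace V of R3
    R = rref(A)                  % row-reduced echelon form of A
    sum(any(R, 2))               % number of non-zero rows: dim(V) = 2 here

(The built-in rank(A) returns the same number.)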

***Problem 4.30. Find the dimension of the plane in R3 spanned by (1, 3, 2) and (−1, 1, 1).

***Problem 4.31. Find the dimension of the subspace of R5 spanned by:

(1, 3, 2,−1, 1) (2, 1,−1, 1, 2) (−1, 3, 1, 2, 1) (4, 1, 0,−2, 2)

***Problem 4.32. Show that dim(Rn) = n. Hint: show that the elementary basis vectors are a basis for Rn. Show that they span Rn and are linearly independent. See Problem 3.27.

***Problem 4.33. Use Corollary 3.32 to prove that if V = span(v1, . . . , vt) then dim(V) ≤ t.

Proposition 4.34. Let V be a subspace of Rn.

(1) If w1, . . . , wt is a spanning set for V, then t ≥ dim(V) and the set w1, . . . , wt contains a basis wi1, . . . , wid of V.
(2) If v1, . . . , vs is a linearly independent set in V, then s ≤ dim(V) and the set v1, . . . , vs can be extended to a basis v1, . . . , vs, vs+1, . . . , vd of V.

Proof. Let dim(V) = d.

(1) Let w1, . . . , wt be the rows of a matrix A. By Corollary 3.55 there exists a matrix C ∼ A with linearly independent rows that consists of some (possibly all) of the rows of A. The rows of C are the desired basis. Since the number of rows of C is dim(V), and A has at least as many rows as C, t ≥ dim(V).
(2) Let A be a matrix whose row space is V, and let v1, . . . , vs be the rows of a matrix B. Then A and B satisfy the hypotheses of Proposition 3.58, so you can add zero or more rows of A to B until you have a matrix C ∼ A with linearly independent rows. The rows of C are the desired extension to a basis of V. �

***Problem 4.35.
(1) Find a basis for R3 among the vectors:

(1, 3, 2) (1, 4, 3) (2, 7, 5) (1, 3, 6)

Show how you checked that the three vectors you selected in fact span R3.
(2) Extend the linearly independent set

(1, 3, 2) (4, 1, 1)

to a basis of R3 by adding a third vector to the set. Check that your three vectors are a basis for R3. (How will you check?)


Next we have a criterion for knowing when a set of vectors is a basis for a subspace.

Theorem 4.36. Let V be a subspace, and let v1, . . . , vd be vectors in V. Consider the three statements:

(1) v1, . . . , vd span V;
(2) v1, . . . , vd are linearly independent;
(3) d = dim(V).

If any two of these statements are true, then so is the third.

Proof.
(1) If (1) and (2) hold then v1, . . . , vd is a basis for V and d = dim(V).
(2) Suppose (1) and (3) hold. By Proposition 4.34 a subset of v1, . . . , vd is a basis of V. But the sublist must contain dim(V) = d vectors, so the sublist is the entire list v1, . . . , vd, which is therefore linearly independent.
(3) Suppose (2) and (3) hold. By Proposition 4.34 the list v1, . . . , vd can be extended to a basis of V. But the extended list must contain dim(V) = d vectors, so the extended list is just the original list v1, . . . , vd, which therefore spans V. �

Dimension measures the size of subspaces. If one subspace is a proper subspace of another (like a line lying in a plane), then the smaller space has smaller dimension than the larger space.

Proposition 4.37. Let W and V be subspaces of Rn, and suppose W ⊂ V and W ≠ V. Then dim(W) < dim(V).

Proof. Let v1, . . . , vd be a basis for W. Then v1, . . . , vd is a linearly independent set in V. By Proposition 4.34, the vectors can be extended to a basis v1, . . . , vt for V. We must have added vectors because the original vectors did not span V. Thus dim(V) = t > d = dim(W). �

***Problem 4.38. Use the Proposition to prove that if W ⊂ V and dim(W) = dim(V) then W = V.

4.4. Orthogonal Complements

In R2, for every line through the origin there is another line passing through the origin perpendicular to the first line.

For every line in R3 through the origin there is a perpendicular plane through the origin, and for every plane in R3 through the origin there is a perpendicular line through the origin.


You have learned that two vectors v and w are perpendicular or orthogonal when v · w = 0. If v and w are column vectors, we can write v^T w = 0. The idea of orthogonality for vectors will now be extended to subspaces. Just like a line and a plane in R3 can be orthogonal to each other, it is possible to construct a subspace orthogonal to any given subspace.

Definition 4.39. Let V be a subspace of Rn. Then the orthogonal complement of V is:

V⊥ = {w ∈ Rn : v · w = 0 for all v ∈ V}

That is, a vector is in V⊥ if it is orthogonal to every vector in V.

In the picture of the plane and the line, every vector in the plane is orthogonal to every vector in the line.

***Problem 4.40. Suppose a vector v in Rn is orthogonal to every vector in Rn. Show that v = 0. Hint: The vector v is orthogonal to itself, so v · v = 0. Show that v = 0.

Proposition 4.41. V ⊥ is a subspace of Rn.

Proof. Certainly V⊥ is a subset of Rn. We have to show that V⊥ satisfies the three subspace properties.

(1) 0 · v = 0 for every vector v in V. Thus 0 ∈ V⊥.
(2) If w ∈ V⊥ and a is a scalar, then for every vector v ∈ V we have:

(aw) · v = a(w · v) = a · 0 = 0

so aw ∈ V⊥.
(3) If w1 ∈ V⊥ and w2 ∈ V⊥, then for every vector v ∈ V we have:

(w1 + w2) · v = w1 · v + w2 · v = 0 + 0 = 0

so w1 + w2 ∈ V⊥. �

Let's do an example. In R2 consider the subspace V spanned by (2, 1). V is a line through the origin with slope 1/2.


Then V⊥ is the line with equation 2x + y = 0. To prove this fact, let v0 = (2, 1). Note first that every vector in V has the form av0 for some scalar a. Thus every vector in V has the form av0 = a(2, 1) = (2a, a). A vector w = (x, y) is orthogonal to all the vectors in V if and only if

(2a, a) · (x, y) = 2ax + ay = a(2x + y) = 0

for all scalars a. Thus (x, y) is a vector in V⊥ if and only if 2x + y = 0.

It is important to note that the equation for V⊥ has the same coefficients as the spanning vector for V.

***Problem 4.42. Let V be the line in R3 spanned by (1, −1, 1). Show that V⊥ is the plane x − y + z = 0. (Hint: Show that V is the set of vectors (a, −a, a), and that (x, y, z) satisfies the equation x − y + z = 0 if and only if (x, y, z) is orthogonal to (a, −a, a).)

The example and problem above can be generalized to a proposition:

Proposition 4.43. Suppose V is a subspace in Rn, and suppose V = span(v1, . . . , vt). Then w ∈ V⊥ if and only if w · v1 = 0, . . . , w · vt = 0.

You should think of this proposition as saying that if V is defined by a spanning set, then V⊥ consists of the vectors orthogonal to the spanning set. Since there are t spanning vectors v1, . . . , vt, the subspace V⊥ is the set of simultaneous solutions to a system of t homogeneous linear equations.

Proof. Throughout this proof vectors will be represented as column vectors. Let A be the matrix with columns vi. Then a vector is orthogonal to each vi if and only if the vector is in ker(A^T). We will prove that V⊥ = ker(A^T).

A vector w is in V⊥ if and only if v · w = v^T w = 0 for every vector v in the subspace V. Every vector in V is a linear combination of the vi, so every vector in V can be written v = Ax for some column vector x. Then

w ∈ V⊥ ⇔ v^T w = 0 for all v ∈ V
       ⇔ (Ax)^T w = 0 for all column vectors x
       ⇔ x^T (A^T w) = 0 for all column vectors x
       ⇔ A^T w = 0 (by Problem 4.40)
       ⇔ w ∈ ker(A^T)

�


***Problem 4.44. According to the proposition, the orthogonal complement in R3 to the plane spanned by v1 = (1, 2, 1) and v2 = (1, −1, 2) is the set of vectors mutually orthogonal to v1 and v2. Find the orthogonal complement of the plane by:

(1) finding equations describing the orthogonal complement; and
(2) using the equations to find all vectors in the orthogonal complement.

Corollary 4.45. Let A be a matrix. Then ker(A) is the orthogonal complement of the row space of A.

Proof. By the Proposition, the orthogonal complement of the row space of A is the set of vectors orthogonal to the rows of A, or the vectors x such that Ax = 0. This is ker(A). �
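You can see the Corollary numerically; a minimal sketch with the matrix from Section 4.2:

    A = [3 -2 -1; 2 1 1];   % rows span a plane V in R3
    N = null(A)             % the column of N spans ker(A) = V⊥
    A * N                   % (numerically) zero: each row of A is orthogonal to ker(A)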

***Problem 4.46. Show that ker(A^T) is the orthogonal complement of the column space of A.

One of my favorite authors on Linear Algebra, Gilbert Strang of MIT, calls the row space of A, the column space of A, ker(A) and ker(A^T) the four fundamental subspaces of A.²

The major result about orthogonal complements is a theorem about their dimension.

Theorem 4.47. Let V be a subspace of Rn. Then dim(V ) + dim(V ⊥) = n.

Proof. Let d = dim(V), and let A be the unique row-reduced echelon matrix whose row space is V (Corollary 4.26). By Corollary 4.45, V⊥ = ker(A). A is a d × n matrix with pivots. By Proposition 3.14 there exists an (n − d) × n complementary matrix B. By Proposition 3.14, B has pivots and ker(A) is the row space of B. Since B has pivots its rows are linearly independent, so dim(V⊥) = n − d. �

***Problem 4.48. This is a continuation of Problem 3.13. You can use Octave to find the kernel of a matrix. If C is a matrix, then null(C) is a matrix whose column space is ker(C). Use Octave to verify the theorem just proven by showing that the row space of the complementary matrix B you calculated in Problem 3.13 is the same as the kernel of A. Here's a plan. If you use it, explain why each step is valid.

(1) ker(A) is the orthogonal complement of the row space of A, and ker(A) is the column space of C = null(A).
(2) To show that the column space of C and the row space of B are the same, show that C^T and B have the same row spaces by showing that their row-reduced echelon forms are the same.

This section closes with a neat result combining orthogonality with subspace inclusion.

Proposition 4.49. Suppose W and V are subspaces of Rn, and W ⊂ V. Then V⊥ ⊂ W⊥.

Proof. Suppose u is a vector in V⊥. We must show u is a vector in W⊥. Since u is a vector in V⊥, u · v = 0 for every vector v in V. Since every vector in W is a vector in V, u · w = 0 for every vector w in W. Thus u is a vector in W⊥. �

²See Strang, Introduction to Linear Algebra, Brooks Cole, 3rd Edition, 1988.


Proposition 4.50. If V is a subspace of Rn then (V⊥)⊥ = V.

Proof. First we show that V ⊂ (V⊥)⊥. Let v ∈ V. For all w ∈ V⊥, w · v = 0. Therefore v ∈ (V⊥)⊥.

Now we show that the inclusion is an equality, using Theorem 4.47. Surprisingly, this result depends strongly on finite dimensionality. It can fail in infinite dimensional spaces.

Let dim(V) = d. Then dim(V⊥) = n − d and dim((V⊥)⊥) = n − (n − d) = d. Since V ⊂ (V⊥)⊥, by Problem 4.38 V = (V⊥)⊥. �

***Problem 4.51. In R3 let W be the x-axis and V the xy-plane. Show that W⊥ is the yz-plane and V⊥ is the z-axis. Check that W ⊂ V and V⊥ ⊂ W⊥. Check that (V⊥)⊥ = V and (W⊥)⊥ = W.

4.5. Dimension of Row and Column Spaces

A matrix can be thought of as a collection of rows and columns. Each row and each column is a vector, and these vectors span the row space and the column space of the matrix. This section will prove the remarkable theorem that the subspaces spanned by the rows and columns have the same dimension. This is not obvious. Consider a 3 × 5 matrix in Octave like the one in the sketch below. The rows of the matrix span a subspace of R5, while the columns span a subspace of R3. Just looking at the numbers, there is no obvious reason why these two subspaces should have the same dimension, but they do, and the dimension is 2.

To find the dimension of the subspace spanned by the rows, Problem 4.29 tells you that you can row-reduce the matrix and count the non-zero rows. The columns of A are the rows of A^T, so to find the dimension of the subspace spanned by the columns of A you can find the dimension of the subspace spanned by the rows of A^T: transpose A, row reduce A^T, and count the non-zero rows. In both cases the dimension is 2.
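A minimal sketch of such a session (this particular 3 × 5 matrix is an illustrative stand-in; its third row is the sum of the first two, so its rank is 2):

    A = [1 2 0 1 3; 2 4 1 3 7; 3 6 1 4 10]
    rref(A)     % two non-zero rows: the row space has dimension 2
    rref(A')    % two non-zero rows: the column space also has dimension 2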

Proposition 4.52. If A is a matrix then rank(A) = rank(AT ).


Proof. Let d = rank(A), and let R be the row-reduced echelon form of A. R has d rows. Since R ∼ A, there exists a matrix P2 such that P2R = A. Let S be the row-reduced echelon form of R^T. Since R^T has d columns, rows(S) = rank(R^T) ≤ d = rank(A). Moreover, since S ∼ R^T there exist matrices Q1 and Q2 such that Q1R^T = S and Q2S = R^T. Also, Q1 has the same number of rows as S. Next we show that A^T ∼ Q1A^T. Obviously Q1A^T ≼ A^T, and since

Q2Q1A^T = Q2Q1R^T P2^T = Q2S P2^T = R^T P2^T = A^T

we have shown A^T ≼ Q1A^T. Therefore A^T ∼ Q1A^T. Consequently:

rank(A) ≥ rows(S) = rows(Q1A^T) ≥ rank(Q1A^T) = rank(A^T)

We have shown that rank(A) ≥ rank(A^T) for any matrix A. Applying this result to A^T, we obtain:

rank(A^T) ≥ rank((A^T)^T) = rank(A)

Therefore rank(A) = rank(A^T). �
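The proposition is easy to test numerically; a minimal sketch:

    A = rand(3, 5);          % any matrix will do; here a random 3 x 5 example
    rank(A) == rank(A')      % returns 1 (true): the rank survives transposition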

Now we are ready for a very short proof of the main theorem of this section, which is one of the principal theorems of linear algebra.

Theorem 4.53. Let A be a non-zero r × c matrix. Then the row space of A and the column space of A have the same dimension.

Proof.

dim(row space of A) = rank(A) = rank(A^T) = dim(column space of A)

�

The row rank of A was defined to be the dimension of the row space of A. You may have wondered why the companion notion of column rank of A, the dimension of the column space of A, was never defined. The reason is that the column rank would have been the same as the row rank. From now on we drop "row" and just refer to the rank of A.

Corollary 4.54. Let A be an r × n matrix, and let V be the column space of A. Then dim(V) + dim(ker(A)) = n, the number of columns of A.

Proof. Let W be the row space of A. By Corollary 4.45, ker(A) = W⊥, and by Theorem 4.53, dim(V) = dim(W). Then by Theorem 4.47:

dim(V) + dim(ker(A)) = dim(W) + dim(W⊥) = n

�

***Problem 4.55. Show that rank(A) + dim(ker(A)) is the number of columns of A.

Proposition 4.56. Let A and B be matrices that can be multiplied. Then rank(AB) ≤ rank(B) and rank(AB) ≤ rank(A).


Proof. Since ker(B) ⊂ ker(AB), dim(ker(B)) ≤ dim(ker(AB)) (Problem 4.10 and Problem 4.38). Moreover, B and AB have the same number of columns (Problem 2.5). Call this number c. Then by Corollary 4.54:

rank(AB) = c − dim(ker(AB)) ≤ c − dim(ker(B)) = rank(B)

Now we can use transposition to show rank(AB) ≤ rank(A):

rank(AB) = rank((AB)^T) = rank(B^T A^T) ≤ rank(A^T) = rank(A)

�

You have seen that the rank of a matrix is no more than the number of rows or the number of columns in the matrix. When the rank is the number of rows or columns (whichever is less), we say that the matrix has maximal rank. Most matrices have maximal rank, and if you perturb a matrix of less than maximal rank you will probably get a matrix of maximal rank. We will use Octave to show how to construct matrices of any rank, and to show how perturbing a matrix increases its rank to maximal rank.

Octave can create random matrices of any size, matrices whose entries are random decimal numbers between 0 and 1. The command

A = rand(3,4)

will create a random matrix with three rows and four columns. Try it. Random matrices almost always have maximal rank. Try computing the rank of your 3 × 4 random matrix. You should get a rank of 3.

To create a 3 × 4 matrix of rank 2, we will create the matrix by multiplying two matrices of rank 2 (see the previous Proposition):

A = rand(3,2)*rand(2,4)
rank(A)

The rank of A will be 2. We will now perturb A just a little by adding a very small value to one element of A, and see if the rank jumps up to 3.

A(2,2) = A(2,2) + 0.0001
rank(A)

You might be asking yourself: "The equal dimension of row and column spaces might be surprising, but what can we do with this knowledge?" You will see in the very next section.

4.6. Invertible Matrices

4.6.1. Inverses of general matrices.

Definition 4.57. If A and B are matrices, and if AB = I, then we say that A is a left-inverse for B, and B is a right-inverse for A. If AB = I and BA = I, we say that A and B are inverses of each other. (We could say "two-sided inverses" for emphasis, but generally we do not. If AB = I we do not say that A and B are inverses of each other unless BA = I also.)


***Problem 4.58.
(1) Let:

$$A = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} \end{bmatrix} \qquad B = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \\ 0 & -\frac{2}{\sqrt{6}} \end{bmatrix}$$

Show AB = I. Which matrix is the left inverse, and which is the right inverse? Note that BA is defined. Calculate BA and show BA ≠ I.
(2) Let

$$A' = \begin{bmatrix} \sqrt{2} & 0 & \frac{1}{\sqrt{2}} \\ \frac{\sqrt{6}}{2} & \frac{\sqrt{6}}{2} & 0 \end{bmatrix}$$

Show A′B = I, thus proving that left inverses need not be unique.
(3) Let

$$A = \begin{bmatrix} 2 & 5 \\ 1 & 3 \end{bmatrix} \qquad B = \begin{bmatrix} 3 & -5 \\ -1 & 2 \end{bmatrix}$$

Show that A and B are inverses of each other.

Suppose AB = In. Then A is an n × t matrix and B is a t × n matrix. We can say more.

Proposition 4.59. Continuing the notation above, rank(A) = rank(B) = n and t ≥ n. The rank of the left inverse is the number of rows that it has, and the rank of the right inverse is the number of columns.

Proof.

n = rank(In) = rank(AB) ≤ rank(B) ≤ min(t, n)

Therefore rank(B) = n ≤ t. A similar proof shows rank(A) = n. �

The converse of this Proposition is also true, but much harder to prove.

Proposition 4.60. If rank(A) is the number of rows of A, then A has a right inverse. If rank(B) is the number of columns of B, then B has a left inverse.

Proof. Suppose A is an r × c matrix and rank(A) = r. We must find a c × r matrix B such that AB = Ir.

The column space of A is a subspace of Rr and has dimension r (Theorem 4.53), so the column space is all of Rr. The elementary basis vectors from Rr, e1, . . . , er, are all in the column space. But by Definition 4.17 every vector in the column space has the form Aw for some vector w. Therefore there exist column vectors w1, . . . , wr in Rc such that Awi = ei.

Let the vectors wi be the columns of a matrix B:

B = [w1 · · · wr]

Then we claim AB = I, so A has a right inverse. To show AB = I, we will prove that column i of AB is ei:

(AB)(:, i) = A(B(:, i)) = Awi = ei

If the rank of A is the number of columns of A, then the rank of A^T is the number of rows of A^T. By the first part, there exists a matrix B such that A^T B = I. But then (A^T B)^T = B^T A = I^T = I, so A has a left inverse B^T. �


Corollary 4.61. Putting the previous two propositions together:
(1) A has a right inverse if and only if rank(A) is the number of rows of A.
(2) A has a left inverse if and only if rank(A) is the number of columns of A.

***Problem 4.62. Suppose A is a 2 × 4 matrix of rank 2. Must A have a left or right inverse?

***Problem 4.63. Suppose A is a left inverse of B. Prove that A^T is a right inverse of B^T. (Hint: If AB = I prove B^T A^T = I.)

***Problem 4.64. Suppose A1 is a left inverse of B1 and A2 is a left inverse of B2. Suppose further that A1A2 is defined. Show that A1A2 is a left inverse for B2B1.

So far you have learned which matrices have inverses, but you have not learned how to find or calculate the inverses. You will soon learn to use Octave to calculate inverses for square matrices, and in the last chapter you will learn to calculate left and right inverses. First we need to collect everything we have learned about inverses and apply it to square matrices.

4.6.2. Inverses of square matrices.

Proposition 4.65. Let A be an n × n matrix. That is, A is square. Then A has a unique inverse B if and only if rank(A) = n.

Before proving this statement we need to clarify it. The Proposition says that if A is n × n and rank(A) = n then

(1) there exists a unique matrix B such that AB = BA = In;
(2) if CA = I then C = B; and
(3) if AC = I then C = B.

And the Proposition says that if A is n × n and has a left or a right inverse then rank(A) = n and all three statements are true.

Proof. If A has an inverse then it has a left inverse, so its rank is the number of columns, namely n. Conversely, if rank(A) = n then A has both a left and a right inverse because the rank is both the number of rows and the number of columns. Suppose CA = AB = I. We will show that C = B:

C = CI = C(AB) = (CA)B = IB = B

�

Definition 4.66. The unique inverse of a square matrix A is denoted A^{-1} when it exists.

Be careful. Just putting −1 above a matrix does not make the matrix invertible. You can only write A^{-1} after you have verified that A is invertible.

***Problem 4.67. Suppose A and B are invertible square matrices of the same size. Show that (AB)^{-1} = B^{-1}A^{-1}. Hint: the proof has two steps. Show that (B^{-1}A^{-1})(AB) = I, then explain why this equation shows that (AB)^{-1} = B^{-1}A^{-1}.

***Problem 4.68. Suppose A and B are matrices and x is a non-zero column vector. Show that AB = I and Bx = 0 cannot both be true.


***Problem 4.69. If A is an invertible matrix, prove that A^T is invertible and that (A^T)^{-1} = (A^{-1})^T. Hint: Problem 4.63.

Octave will calculate inverses of square matrices. If A is an invertible matrix, then the inverse is A^(-1).
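For example, with the 2 × 2 matrix from Problem 4.58(3):

    A = [2 5; 1 3];
    B = A^(-1)               % should be [3 -5; -1 2], up to rounding
    A * B                    % the 2 x 2 identity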

***Problem 4.70.
(1) Use Octave to find the inverse of

$$A = \begin{bmatrix} 3 & -2 & -1 \\ 2 & 1 & 3 \\ -1 & -1 & 1 \end{bmatrix}$$

(2) Let $x = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ and $b = \begin{bmatrix} 4 \\ 1 \\ 2 \end{bmatrix}$. Explain why the system of equations:

3x − 2y − z = 4
2x + y + 3z = 1
−x − y + z = 2

is the same as the matrix equation:

Ax = b

(3) Find the solution to the system by using A^{-1}:

x = A^{-1}Ax = A^{-1}b

Check your solution in the system of equations.

***Problem 4.71. Show that the matrix

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

is not invertible. (Hint: what is the dimension of the row space?) What happens if you try to invert A with Octave?

***Problem 4.72. Create a 4 × 4 upper-triangular matrix with no zeros on the diagonal in Octave. Find its inverse. Is the inverse also upper-triangular? What can you say about the diagonal elements of the matrix and its inverse? Does this rule hold for the other elements in the matrix? EXTRA CREDIT: Prove that an upper-triangular matrix is invertible if and only if there are no zeros on the diagonal. Prove further that the inverse of such an upper-triangular matrix is upper-triangular and that the relationship you found for the diagonal elements is true. (Hint: start with the 2 × 2 case, then the 3 × 3 case, then try to generalize.)

4.6.3. Inverses of 2 × 2 matrices. Many geometry problems just involve two dimensions, and the matrices you work with are 2 × 2 matrices. The next definition and problems will show you when a 2 × 2 matrix has an inverse and how to compute it quickly.

Definition 4.73. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. The adjoint of A is

$$\mathrm{adj}(A) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$


***Problem 4.74. Prove that

$$\mathrm{adj}(A)\,A = \begin{bmatrix} ad - bc & 0 \\ 0 & ad - bc \end{bmatrix}$$

***Problem 4.75. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, and suppose ad − bc ≠ 0.

(1) Show that

$$A^{-1} = \frac{1}{ad - bc}\,\mathrm{adj}(A)$$

(2) Find the inverse of $A = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}$ and check your answer by calculating A^{-1}A.
(3) EXTRA CREDIT: Prove that $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is not invertible if ad − bc = 0.

You have proven that $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is invertible if ad − bc ≠ 0, and you have found a formula for A^{-1}.
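A quick numerical check of the formula; a minimal sketch reusing the matrix from Problem 4.58(3), where ad − bc = 2 · 3 − 5 · 1 = 1:

    A = [2 5; 1 3];
    adjA = [3 -5; -1 2];           % swap the diagonal entries, negate the off-diagonal ones
    Ainv = adjA / (2*3 - 5*1)      % adj(A)/(ad - bc); here just adj(A)
    Ainv * A                       % the 2 x 2 identity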

4.6.4. An algorithm for A^{-1}. Here is one way to calculate A^{-1} that is practical if your matrix is small or if you program a computer to perform the many arithmetic steps. Some computer programs use this method to calculate the inverse.

(1) Suppose A is n × n. Make a new matrix by putting In next to A:

B = [A | In]

(2) Row reduce B. If the part that was A becomes In, then your original matrix A was invertible, and you can continue. The row-reduced matrix will be:

E = [In | C]

(3) The matrix C is A^{-1}.

***Problem 4.76. Use Octave to test the method just described.

A = rand(4,4)
B = [A, eye(4)]
E = rref(B)
C = E(:,5:8)

Check that AC = I.

To prove that the method above works as advertised, recall that there is a matrix P such that PB = E. In general it was hard to find P, but in this case it is easy. We will show that P = C = A^{-1}.

PB = P[A | In] = [PA | PIn] = [PA | P] = [In | C]

Therefore PA = In, so P = C = A^{-1}.

You might ask why the row-reduction method for finding matrix inverses has been presented if you are not expected to use it for finding inverses. Why go through a routine in Octave like the previous problem when all you have to type is C = A^(-1)? The row reduction method leads to a theoretically important result about the numbers that appear in the inverse of a matrix. The method uses only rational operations (addition, subtraction, multiplication and division) applied to the elements of A. Therefore you can conclude:


Proposition 4.77. If A is an invertible matrix, then the elements of A^{-1} are rational combinations of 0, 1 and the elements of A.

This proposition says that you can construct the inverse of an invertible matrix over any field containing the elements of the matrix.

4.7. Sums and Intersections of Subspaces

Besides equations and spanning sets, there are two more ways to construct subspaces: intersections and sums. The intersection of two sets is the collection of elements common to both sets. The intersection of two distinct planes through the origin in R3 is a line. If V and W are subspaces, their intersection is denoted V ∩ W. The example of two planes suggests the general result:

Proposition 4.78. The intersection of two subspaces of Rn is a subspace of Rn.

Proof. As always, showing that some construction is a subspace requires verifying the three subspace properties.

(1) The zero vector 0 is in both subspaces V and W, so 0 is in V ∩ W.
(2) If v ∈ V ∩ W, then v ∈ V and v ∈ W. Since V and W are subspaces, if a is a scalar then av ∈ V and av ∈ W. Thus av ∈ V ∩ W.
(3) If v ∈ V ∩ W and w ∈ V ∩ W, then both v and w are in V and both are in W. Thus v + w is in both V and W. Therefore v + w ∈ V ∩ W. �

***Problem 4.79. Find a vector spanning the line which is the intersection of the planes x − 2y − z = 0 and 2x + y + z = 0.

Another way to combine subspaces is by adding them.

Definition 4.80. Let V and W be subspaces of Rn. Then

V +W = {v + w : v ∈ V and w ∈W}

Let's consider an example. Let V be the subspace of R3 spanned by (1, 1, −1). Thus V consists of all vectors of the form v = (a, a, −a). V is a line. Similarly let W be the line spanned by (2, 1, 2). Then W is all vectors of the form w = (2b, b, 2b). (It is important to use two different letters a and b here. When we add v and w, they may have been created from the spanning vectors with different scalars.) Then a typical element of V + W is a vector of the form v + w = (a + 2b, a + b, −a + 2b). These vectors form a plane.

***Problem 4.81. Find an equation for the plane in the previous example. It suffices to find an equation dx + ey + fz = 0 satisfied by the spanning vectors for V and W.

There is an important relation for the dimension of intersections and sums of subspaces.

Proposition 4.82. Let V and W be subspaces of Rn. Then:

dim(V ) + dim(W ) = dim(V ∩W ) + dim(V +W )


Proof. Choose a basis u1, . . . , ud1 for V ∩ W. The vectors are linearly independent and are contained in both V and W. Using Proposition 4.34, we can find v1, . . . , vd2 and w1, . . . , wd3 such that

u1, . . . , ud1, v1, . . . , vd2

is a basis for V, and

u1, . . . , ud1, w1, . . . , wd3

is a basis for W. Therefore dim(V ∩ W) = d1, dim(V) = d1 + d2 and dim(W) = d1 + d3. If we show that

u1, . . . , ud1, v1, . . . , vd2, w1, . . . , wd3

is a basis for V + W, we will have shown that dim(V + W) = d1 + d2 + d3 = dim(V) + dim(W) − dim(V ∩ W) and thus completed the proof of the proposition.

First we show that our set of vectors is linearly independent. Suppose

a1u1 + · · · + ad1ud1 + b1v1 + · · · + bd2vd2 + c1w1 + · · · + cd3wd3 = 0

We must show that all the scalar coefficients are 0. From the equation above we derive:

a1u1 + · · · + ad1ud1 + b1v1 + · · · + bd2vd2 = −c1w1 − · · · − cd3wd3

The left-hand side is in V and the right-hand side is in W. Since the two sides are equal, both are in V ∩ W, and therefore the right-hand side is a linear combination of the ui:

−c1w1 − · · · − cd3wd3 = e1u1 + · · · + ed1ud1

Since the vectors ui and wi are a linearly independent basis of W, all the coefficients in the last equation must be 0. In particular, ci = 0 for all i. But then:

a1u1 + · · · + ad1ud1 + b1v1 + · · · + bd2vd2 = 0

Since the vectors ui and vi are a linearly independent basis of V, ai = bi = 0 and we have shown that the vectors ui, vi and wi are linearly independent.

To show that the vectors ui, vi and wi span V + W, suppose x is a vector in V + W. We must show that x is a linear combination of ui, vi and wi. Since x is a vector in V + W, there are vectors v in V and w in W such that x = v + w. Since v is a linear combination of ui and vi, and w is a linear combination of ui and wi, the vector x is a linear combination of ui, vi and wi. �

***Problem 4.83. Let P1 and P2 be two different planes through the origin in R3.

(1) What is dim(P1)? dim(P2)?
(2) Is P1 ∩ P2 a point, a line, a plane or R3? What is dim(P1 ∩ P2)?
(3) Is P1 + P2 a point, a line, a plane or R3? What is dim(P1 + P2)?
(4) Check that dim(P1) + dim(P2) = dim(P1 ∩ P2) + dim(P1 + P2).

Finally, we combine sums, intersections and orthogonal complements into several related results. You can see that a line and its orthogonal plane in R3 intersect in the zero subspace and together span R3. This is a general fact.

Proposition 4.84. Let V be a subspace in Rn.

(1) V ∩ V⊥ = {0}.
(2) V + V⊥ = Rn.


Proof. (1) Suppose v is a vector in V ∩ V⊥. Then v is orthogonal to itself, or v · v = 0. Therefore v = 0.

(2) It suffices to show that dim(V + V⊥) = n. But

dim(V + V⊥) = dim(V) + dim(V⊥) − dim(V ∩ V⊥)
            = dim(V) + (n − dim(V)) − 0
            = n

�

***Problem 4.85. The meaning of the last Proposition is that if V is a subspace of Rn then every vector v in Rn can be written as a unique sum: v = x + y where x is in V and y is orthogonal to V. Let V be the plane x − 2y − z = 0 in R3. Find a vector in V and another vector orthogonal to V adding up to (1, 1, 1). Hint: first show that V⊥ is the line spanned by (1, −2, −1). Then recall Proposition 1.24, which dealt with decomposing a vector into a sum of two orthogonal vectors. Let y be the projection of (1, 1, 1) onto (1, −2, −1) and let x = v − y. Show that x is in the plane.

Proposition 4.86. Let V and W be subspaces of Rn. Then:

(V + W)⊥ = V⊥ ∩ W⊥    (4.7.1)
V⊥ + W⊥ = (V ∩ W)⊥    (4.7.2)

Proof. Obviously

V ∩ W ⊂ V ⊂ V + W
V ∩ W ⊂ W ⊂ V + W

By Proposition 4.49:

(V + W)⊥ ⊂ V⊥ ⊂ (V ∩ W)⊥
(V + W)⊥ ⊂ W⊥ ⊂ (V ∩ W)⊥

Therefore:

(V + W)⊥ ⊂ V⊥ ∩ W⊥    (4.7.3)
V⊥ + W⊥ ⊂ (V ∩ W)⊥    (4.7.4)

To conclude the theorem we must show the inclusions are equalities. To prove 4.7.1 we will prove the inclusion opposite to 4.7.3. Suppose x is a vector in V⊥ ∩ W⊥. Thus x · v = 0 for all vectors v in V, and x · w = 0 for all vectors w in W. Therefore x · (v + w) = 0 for all vectors v + w in V + W, and x is a vector in (V + W)⊥.

Finally, we will prove:

dim((V ∩ W)⊥) = dim(V⊥ + W⊥)


By Problem 4.38, this equality plus 4.7.4 will prove 4.7.2.

dim((V ∩ W)⊥) − dim(V⊥ + W⊥)
    = n − dim(V ∩ W) − dim(V⊥) − dim(W⊥) + dim(V⊥ ∩ W⊥)
    = n − dim(V) − dim(W) + dim(V + W) − (n − dim(V)) − (n − dim(W)) + dim(V⊥ ∩ W⊥)
    = dim(V⊥ ∩ W⊥) − (n − dim(V + W))
    = dim(V⊥ ∩ W⊥) − dim((V + W)⊥)
    = 0    by 4.7.1

�

***Problem 4.87. To illustrate the propositions in this section, define two subspaces of R5: V is all vectors of the form (∗, ∗, ∗, 0, 0), and W is all vectors of the form (0, ∗, ∗, ∗, 0). (∗ means any number.) Show the following results:

(1) V + W is all vectors of the form (∗, ∗, ∗, ∗, 0).
(2) V ∩ W is all vectors of the form (0, ∗, ∗, 0, 0).
(3) Check that dim(V) + dim(W) = dim(V ∩ W) + dim(V + W).
(4) V⊥ is all vectors of the form (0, 0, 0, ∗, ∗).
(5) W⊥ is all vectors of the form (∗, 0, 0, 0, ∗).
(6) What is V⊥ + W⊥?
(7) What is V⊥ ∩ W⊥?
(8) What is (V + W)⊥?
(9) What is (V ∩ W)⊥?
(10) Check that dim(V⊥) + dim(W⊥) = dim(V⊥ ∩ W⊥) + dim(V⊥ + W⊥).
(11) Check that (V + W)⊥ = V⊥ ∩ W⊥ and V⊥ + W⊥ = (V ∩ W)⊥.


CHAPTER 5

Eigenvectors and Eigenvalues

5.1. Introduction and Definitions

This chapter is devoted to two of the key attributes of a square matrix: its eigenvalues and eigenvectors. These are used in many matrix calculations for statistics, engineering, fluid dynamics, linear differential equations and other domains. They arise naturally in a wide variety of problems.

As we explain below, eigenvalues and eigenvectors are not strictly part of linear algebra, because finding eigenvalues is a non-linear problem. Nevertheless, eigenvalues arise so often when matrices are used that you must learn about them.

Let A be a square matrix. An eigenvalue-eigenvector pair for A is a scalar λ and a non-zero column vector v such that:

Av = λv

Note that 0 must be excluded from possible eigenvectors because A0 = λ0 for all scalars λ. The vector 0 would be useless as an eigenvector because it would always be included for every matrix; the zero vector as an eigenvector would convey no information.
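Octave computes eigenvalue-eigenvector pairs with its eig function; a minimal sketch (the matrix is an illustrative choice):

    A = [4 1; 2 3];
    [V, D] = eig(A)    % columns of V are eigenvectors, diagonal entries of D are eigenvalues
    A*V - V*D          % (numerically) zero: Av = λv, column by column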

***Problem 5.1.
(1) Let $A = \begin{bmatrix} 2 & 2 \\ 1 & 3 \end{bmatrix}$. Show that $v = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is an eigenvector of A. What is the eigenvalue λ? Show that $v = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ is not an eigenvector.
(2) Let A = 0. Show that every non-zero column vector is an eigenvector with eigenvalue λ = 0.
(3) Let A = I. Show that every non-zero column vector is an eigenvector with eigenvalue λ = 1.
(4) Let λ be a scalar, and let A = λI. Show that every non-zero column vector is an eigenvector with eigenvalue λ.
(5) Let $A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 3 \end{bmatrix}$. Show that $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ is an eigenvector of A with eigenvalue λ = 2; $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ is an eigenvector with eigenvalue λ = −1; and $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$ is an eigenvector of A with eigenvalue λ = 3.


(6) Formulate a conjecture about the eigenvectors and eigenvalues of a diagonal matrix

$$A = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}$$

EXTRA CREDIT: prove your conjecture.

The major complexity in the theory of eigenvalues and eigenvectors is the possibility that they may be complex. Eigenvalues can be complex numbers a + bi where i² = −1, and the coordinates of eigenvectors can be complex numbers.¹ For example:

$$\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} i \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ i \end{bmatrix} = i \begin{bmatrix} i \\ 1 \end{bmatrix}$$

Thus λ = i is an eigenvalue of $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ with eigenvector $v = \begin{bmatrix} i \\ 1 \end{bmatrix}$. This course will study mostly real eigenvalues and eigenvectors, but complex ones are occasionally unavoidable.

In some ways eigenvalues and eigenvectors generalize the concept of kernel.

Proposition 5.2. Let A be an n × n matrix.

(1) λ is an eigenvalue of A if and only if rank(A − λI) < n.
(2) v is an eigenvector of A with eigenvalue λ if and only if v is a non-zero vector in ker(A − λI).

Proof. We prove the second statement first. The vector v is an eigenvector of A with eigenvalue λ if and only if v is non-zero and Av = λv. This is equivalent to

Av = λv
Av − λv = 0
(A − λI)v = 0

Thus v is a non-zero vector in ker(A − λI).

The first statement now follows almost immediately. λ is an eigenvalue of A if and only if A − λI has a non-zero kernel, or if and only if rank(A − λI) < n. �

***Problem 5.3.
(1) Show that λ = 0 is an eigenvalue of an n × n matrix A if and only if A is not invertible. Hint: two key observations: (1) A is not invertible if and only if rank(A) < n; and (2) A = A − 0I.
(2) Show that if λ = 0 is an eigenvalue of A, then the eigenvectors for λ are the non-zero vectors in ker(A).
(3) Show that A and A^T have the same eigenvalues. Hint: it suffices to show that λ is an eigenvalue for A if and only if λ is an eigenvalue for A^T. Prove this by showing rank(A − λI) = rank(A^T − λI).

¹Throughout this chapter we use the term "complex" in a non-standard way to mean "not real". While the set of complex numbers includes the real numbers as a subset, there is no simple term for a complex number with non-zero imaginary part, or for a complex number that is not a real number. Therefore we temporarily hijack the word "complex" for this purpose.


***Problem 5.4.
(1) Suppose v is an eigenvector of a square matrix A with eigenvalue λ. For any non-zero scalar a, show that av is also an eigenvector with eigenvalue λ.
(2) Suppose that v and w are eigenvectors of a square matrix A with the same eigenvalue λ. Show that any non-zero linear combination av + bw is also an eigenvector.

The point of the previous problem is that, once you have a single eigenvalue-eigenvector pair λ and v, you can find many other eigenvectors with the same eigenvalue. Later you will learn that, although a matrix may have an infinite number of eigenvectors, it can only have a finite number of distinct eigenvalues. Before proving any more results about general eigenvalues and eigenvectors, we will investigate the 2 × 2 case closely. Most of the interesting facts about eigenvalues and eigenvectors can be illustrated with 2 × 2 matrices.

5.2. Eigenvalues of 2× 2 Matrices

Although 2 × 2 matrices may seem small, they are often useful. Geometric problems in the plane can be represented by 2 × 2 matrices, and they are all you need for second-order linear differential equations. Many of the interesting behaviors associated with eigenvalues can be found in 2 × 2 matrices.

Let's begin with an example. Let

$$A = \begin{bmatrix} 2 & 2 \\ 1 & 3 \end{bmatrix}$$

By Proposition 5.2 a scalar λ is an eigenvalue of A if and only if

$$A - \lambda I = \begin{bmatrix} 2 - \lambda & 2 \\ 1 & 3 - \lambda \end{bmatrix}$$

has a non-zero kernel. This matrix has a non-zero kernel if and only if its rank is less than 2, which is the case if and only if it is not invertible. Thus, by Problem 4.75, λ is an eigenvalue of A if and only if

0 = (2 − λ)(3 − λ) − (2)(1) = λ² − 5λ + 4 = (λ − 4)(λ − 1)

The eigenvalues of A are the roots of this quadratic equation. Finding an eigenvalue for a 2 × 2 matrix usually comes down to solving a quadratic equation. This quadratic equation is called the characteristic polynomial of A.
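Numerically, the eigenvalues are just the roots of this polynomial; a minimal sketch:

    p = [1 -5 4];      % coefficients of λ² − 5λ + 4
    roots(p)           % 4 and 1: the eigenvalues of A, as found below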

The eigenvalues are λ = 4 and λ = 1. Let's focus on the eigenvalue λ = 4. According to Proposition 5.2, the eigenvectors for this eigenvalue are the non-zero vectors in the kernel of

$$A - 4I = \begin{bmatrix} -2 & 2 \\ 1 & -1 \end{bmatrix}$$

Thus $v = \begin{bmatrix} x \\ y \end{bmatrix}$ is an eigenvector if

$$0 = (A - 4I)v = \begin{bmatrix} -2x + 2y \\ x - y \end{bmatrix}$$

The equivalent system of equations is

−2x + 2y = 0
x − y = 0


The solutions are x = a and y = a for any scalar a. The eigenvectors are any non-zero vectors

$$v = \begin{bmatrix} a \\ a \end{bmatrix}$$

The eigenvectors for eigenvalue λ = 4 lie on a line. We can describe the set of eigenvectors completely by giving just one of them, for example, $v = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. All the other eigenvectors are scalar multiples of this eigenvector.

To check that $v = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is an eigenvector for eigenvalue λ = 4, check that Av = λv:

$$Av = \begin{bmatrix} 2 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix} = 4 \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \lambda v$$

***Problem 5.5.
(1) Continuing the example above, find an eigenvector for the eigenvalue λ = 1 and check that it is correct.
(2) Show that A satisfies its characteristic polynomial: show A² − 5A + 4I = 0.
(3) Show that your two eigenvectors form a basis for R2.
(4) Do it all over again with a different matrix $B = \begin{bmatrix} 5 & -3 \\ 6 & -4 \end{bmatrix}$. Find the eigenvalues and eigenvectors, show that B satisfies its characteristic polynomial and show that your eigenvectors form a basis for R2.

The example and problem illustrate the situation where a 2 × 2 matrix has two real eigenvalues and two lines of eigenvectors. If the matrix is a scalar multiple of the identity, A = aI, then every non-zero vector is an eigenvector of the single eigenvalue λ = a. There is another possibility: that there is a single eigenvalue with a one-dimensional line of eigenvectors.

Let $A = \begin{bmatrix} a & 1 \\ 0 & a \end{bmatrix}$. Then $A - \lambda I = \begin{bmatrix} a - \lambda & 1 \\ 0 & a - \lambda \end{bmatrix}$. The characteristic polynomial of A is (a − λ)². The characteristic polynomial has only one root λ = a, so A has the single eigenvalue λ = a. But the eigenvectors are the non-zero vectors in $\ker(A - aI) = \ker \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$. This kernel is spanned by the eigenvector $v = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

***Problem 5.6. Show directly that $v = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ is an eigenvector of A with eigenvalue λ = a by calculating Av.

A 2 × 2 matrix with a single one-dimensional line of eigenvectors is called a defective matrix.

***Problem 5.7. Show that $A = \begin{bmatrix} 1 & 1 \\ -1 & 3 \end{bmatrix}$ is a defective matrix.

Defective matrices are very delicate. If you change the entries by just a little bit, the resulting matrix is probably no longer defective. We do an example by modifying the matrix $A = \begin{bmatrix} a & 1 \\ 0 & a \end{bmatrix}$. Let $B = \begin{bmatrix} a & 1 \\ \varepsilon & a \end{bmatrix}$, where ε is some small positive number. The characteristic polynomial of B is (a − λ)² − ε, so the eigenvalues are λ1 = a + √ε and λ2 = a − √ε. The single eigenvalue λ = a for A has split into two nearby eigenvalues for B.

Page 87: Linear Algebra and Geometry David Meredithonline.sfsu.edu/meredith/Linear_Algebra/Spring_2010/Linear Algebra and Geometry.pdfSystems of Linear Equations 33 3.1. Two Key Problems 33

5.2. EIGENVALUES OF 2× 2 MATRICES 81

The eigenvector corresponding to eigenvalue λ1 = a + √ε is in the kernel of

$$B - \lambda_1 I = \begin{bmatrix} -\sqrt{\varepsilon} & 1 \\ \varepsilon & -\sqrt{\varepsilon} \end{bmatrix}$$

so the eigenvector is $v_1 = \begin{bmatrix} 1 \\ \sqrt{\varepsilon} \end{bmatrix}$.

***Problem 5.8. Show that the eigenvector for eigenvalue λ2 = a − √ε is $v_2 = \begin{bmatrix} 1 \\ -\sqrt{\varepsilon} \end{bmatrix}$.

The matrix $A = \begin{bmatrix} a & 1 \\ 0 & a \end{bmatrix}$ had a single eigenvalue λ = a and eigenvector $v = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Perturbing A slightly results in a matrix $B = \begin{bmatrix} a & 1 \\ \varepsilon & a \end{bmatrix}$ with two eigenvalues near λ = a: λ1 = a + √ε and λ2 = a − √ε. The eigenvectors are also near $v = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$: $v_1 = \begin{bmatrix} 1 \\ \sqrt{\varepsilon} \end{bmatrix}$ and $v_2 = \begin{bmatrix} 1 \\ -\sqrt{\varepsilon} \end{bmatrix}$.
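You can watch the splitting numerically; a minimal sketch with the illustrative choices a = 2 and ε = 10⁻⁴:

    a = 2; ep = 1e-4;      % ep stands in for ε (eps is a built-in constant in Octave)
    B = [a 1; ep a];
    eig(B)                 % approximately 2.01 and 1.99, since sqrt(1e-4) = 0.01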

Now we come to the last possibility for eigenvalues of 2 × 2 matrices, one that will involve complex numbers. Complex numbers are numbers of the form a + bi where a and b are real and i² = −1. You may have encountered complex numbers as the roots of quadratic equations. The roots of x² − 2x + 2 = 0 are complex, as we can see by applying the quadratic formula:
\[ x = \frac{2 \pm \sqrt{2^2 - 4 \cdot 1 \cdot 2}}{2} = 1 \pm \sqrt{-1} = 1 \pm i \]
If you start with a quadratic equation with real coefficients, the roots are either one or two real numbers or a pair of complex numbers of the form a ± bi (a conjugate pair).

Since the eigenvalues of a real 2 × 2 matrix are the roots of a quadratic characteristic polynomial with real coefficients, the eigenvalues of a real 2 × 2 matrix can be a pair of conjugate complex numbers. Let's look at an example.

Let A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}. To find the eigenvalues of A, we must first calculate the characteristic polynomial, which is:
\[ (1 - \lambda)(1 - \lambda) + 1 = \lambda^2 - 2\lambda + 2 \]
As we saw above, the roots (eigenvalues) are λ = 1 + i and λ = 1 − i.

The eigenvectors will be complex too. First we will find an eigenvector v = \begin{bmatrix} x \\ y \end{bmatrix} associated with the eigenvalue λ = 1 + i. The eigenvector v must satisfy the equation:
\[ 0 = (A - \lambda I)v = \begin{bmatrix} 1-(1+i) & -1 \\ 1 & 1-(1+i) \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -i & -1 \\ 1 & -i \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -ix - y \\ x - iy \end{bmatrix} \]


The equivalent system of equations is:
\[ -ix - y = 0 \qquad x - iy = 0 \]
A non-zero solution is x = 1 and y = −i. This solution satisfies both equations, so an eigenvector for eigenvalue λ = 1 + i is v = \begin{bmatrix} 1 \\ -i \end{bmatrix}. To check that v is an eigenvector for the eigenvalue λ = 1 + i:
\[ Av = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ -i \end{bmatrix} = \begin{bmatrix} 1+i \\ 1-i \end{bmatrix} = (1+i) \begin{bmatrix} 1 \\ -i \end{bmatrix} = \lambda v \]
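Octave handles complex eigenvalues and eigenvectors automatically (a sketch using the matrix above; Octave scales its eigenvectors to length 1, so the columns it returns are complex scalar multiples of the ones found by hand):

A = [1 -1; 1 1];
[P,D] = eig(A)    % D has 1+1i and 1-1i on its diagonal;
                  % the columns of P are complex eigenvectors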

***Problem 5.9. (1) Continuing the example above, find an eigenvector for the eigenvalue λ = 1 − i and check that it is correct.
(2) Show that A satisfies its characteristic polynomial: show A² − 2A + 2I = 0.
(3) Do it all over again with a different matrix B = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}. Find the eigenvalues and eigenvectors, and show that B satisfies its characteristic polynomial.

We have shown that the eigenstructure of a 2×2 matrix A is one of the following:
(1) A = aI is a scalar multiple of the identity, and every non-zero vector in R² is an eigenvector with eigenvalue a;
(2) A has one real eigenvalue and a single one-dimensional line of eigenvectors. In that case A is defective;
(3) A has two distinct real eigenvalues and two one-dimensional lines of eigenvectors. Two eigenvectors for the two eigenvalues form a basis for R²; or
(4) A has two complex eigenvalues that form a conjugate pair. The eigenvectors are complex too.
To give geometric meaning to complex eigenvalues and eigenvectors, we would have to redo our study of linear algebra from the start, emphasizing the complex vector spaces Cⁿ instead of Rⁿ. Studying complex spaces can be left to a more advanced course. This course will focus on real vector spaces. Next you will show that an important class of 2×2 matrices always has real eigenvalues and is never defective. This is the class of symmetric matrices.

If you are unable to work out the algebra for the general case in the next problem, then you should work out an example by substituting numbers for a, b and c.

***Problem 5.10. Consider the symmetric matrix:
\[ A = \begin{bmatrix} a & b \\ b & c \end{bmatrix} \]
You know what the eigenvalues and eigenvectors are when A is a scalar multiple of the identity I, so assume that A is not a scalar multiple of I. That is, assume either a ≠ c or b ≠ 0 (or both).
(1) Show that A has two distinct real eigenvalues. Here is an outline you can follow. Prove that the eigenvalues are:
\[ \lambda = \frac{a + c \pm \sqrt{(a-c)^2 + 4b^2}}{2} \]
Then argue that the quantity under the square root is a positive real number, so there must be two real eigenvalues.
(2) Show that the eigenvectors associated with the two eigenvalues are orthogonal to each other. The algebra is hideous, but it is just algebra.

We close with an examination of the eigenvalues for the reflection and rotation matrices developed at the end of Section 2.2.

***Problem 5.11. Let w = \begin{bmatrix} 1 \\ 2 \end{bmatrix} and define the reflection matrix:
\[ R = I - \frac{2}{w^T w}\, w w^T \]
Show that w is an eigenvector of R with eigenvalue λ = −1, and that there is another eigenvector orthogonal to w with eigenvalue λ = 1. EXTRA CREDIT: instead of doing the numerical problem, show that the result holds for any vector w.

***Problem 5.12. Let A be the 2 × 2 rotation matrix for angle θ = π/6 (see 2.2.4.1). Show that A has complex eigenvalues. EXTRA CREDIT: instead of doing the numerical problem, show that the result holds for the rotation through any angle θ that is not an integral multiple of π.

5.3. Finding Eigenvalues and Eigenvectors

Every square matrix has at least one real or complex eigenvalue. Unfortunately I must ask you to simply accept this critical result. Proving it would require too long an excursion into the theory of complex vector spaces and complex polynomials. You saw in the previous section that 2 × 2 matrices have at least one eigenvalue because you know that a quadratic equation has at least one root. Proving the same theorem for larger matrices would require us to define a characteristic polynomial for every matrix and prove that all polynomials have roots. You can leave all that for a later course.

Knowing that every square matrix has an eigenvalue does not help calculate the eigenvalues or eigenvectors. The advanced methods used to prove that every matrix has an eigenvalue are not efficient for finding the eigenvalue. The eigenvalues and eigenvectors of a square matrix of size larger than 2 × 2 are difficult to find in general. Complicated formulas exist for the eigenvalues of 3×3 and 4×4 matrices, but Galois theory shows that there are no exact formulas for the eigenvalues of larger matrices. Yet for some problems in fluid dynamics, for example, the eigenvalues of matrices with hundreds or even thousands of rows and columns are needed. How are they found? Fast and accurate numerical algorithms have been developed to approximate the eigenvalues and eigenvectors of any square matrix. While these algorithms are the subject of ongoing research, very good versions can be found in Octave.


If you have defined a square matrix A in Octave, you can find its eigenvalues and eigenvectors with the command:

e = eig(A)

Octave will return a column vector e whose elements are the eigenvalues of A. If you enter:

[P,D] = eig(A)

Octave returns a diagonal matrix D with the eigenvalues of A on the diagonal, and a matrix P whose columns are the eigenvectors of A. The eigenvalues appear in positions corresponding to their eigenvectors.

Here is a sample calculation. The eigenvalues of
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \]
are λ₁ = 16.1, λ₂ = −1.12 and λ₃ = 0. The eigenvectors are
\[ v_1 = \begin{bmatrix} -0.232 \\ -0.525 \\ -0.819 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -0.786 \\ -0.0868 \\ 0.612 \end{bmatrix} \quad \text{and} \quad v_3 = \begin{bmatrix} 0.408 \\ -0.816 \\ 0.408 \end{bmatrix}. \]
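A sketch of how to reproduce this in Octave (the ordering of the eigenvalues is not guaranteed, and the signs of the eigenvectors may be flipped, since any non-zero multiple of an eigenvector is again an eigenvector):

A = [1 2 3; 4 5 6; 7 8 9];
[P,D] = eig(A)    % diag(D) is approximately 16.1, -1.12 and 0;
                  % the columns of P are the corresponding unit eigenvectors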

***Problem 5.13. Check the eigenvalue-eigenvector algorithm in Octave. Let
\[ A = \begin{bmatrix} 1 & 3 & -1 & 2 \\ 3 & 2 & -3 & 1 \\ -1 & -3 & 1 & 3 \\ 2 & 1 & 3 & -2 \end{bmatrix} \]
Enter A into Octave, and then use Octave to construct a diagonal matrix D of eigenvalues λ₁, . . . , λ₄ and a matrix P of eigenvectors v₁, . . . , v₄. You do this with the command [P,D] = eig(A). The eigenvalues λᵢ are denoted in Octave as D(i,i), and the eigenvectors vᵢ are denoted in Octave as P(:,i). Use Octave to show that Avᵢ = λᵢvᵢ for i = 1, . . . , 4. The easiest way is to show that Avᵢ − λᵢvᵢ = 0.

The matrices P and D of eigenvectors and eigenvalues satisfy a matrix equation. Moreover this equation is satisfied only if P and D consist of eigenvectors and eigenvalues.


Proposition 5.14. Let A be an n×n matrix. Then AP = PD for some matrix P and diagonal matrix D if and only if every column of P is an eigenvector of A. In that case the eigenvector P(:, i) has eigenvalue D(i, i).

Although A must be a square matrix, P and D need not be square for this result to hold.

Proof. We will show that AP = PD if and only if, for every column P(:, i) of P and corresponding diagonal element D(i, i) of D, AP(:, i) = D(i, i)P(:, i). The calculations use Proposition 2.7.

If AP = PD then
\[ AP(:, i) = (AP)(:, i) = (PD)(:, i) = PD(:, i) = D(i, i)P(:, i) \]
Conversely, if AP(:, i) = D(i, i)P(:, i) for all i then:
\[ (AP)(:, i) = AP(:, i) = D(i, i)P(:, i) = PD(:, i) = (PD)(:, i) \]
Thus every column of AP equals the corresponding column of PD, so AP = PD. □

***Problem 5.15. Continue the previous problem. Use Octave to show that AP = PD.

We are most interested in what happens when the matrix P in the previous problem is invertible.

Definition 5.16. Let A be a square matrix. A is diagonalizable if there exists an invertible matrix P and a diagonal matrix D such that P⁻¹AP = D. Square matrices that are not diagonalizable are called defective.

Equivalent conditions are AP = PD or A = PDP⁻¹.

***Problem 5.17. Continue the previous problem. Show that A is diagonalizable by using Octave to show that P⁻¹AP is diagonal, and that A = PDP⁻¹.

Most square matrices are diagonalizable, in the same way that most matrices have maximal rank. If a matrix is defective (and you will see examples below), changing the entries by a small amount will yield a diagonalizable matrix. A randomly chosen matrix is, with probability 1, diagonalizable. Therefore we start with a study of diagonalizable matrices.

Proposition 5.18. An n × n matrix A is diagonalizable if and only if A has eigenvectors v₁, . . . , vₙ that form a basis for Rⁿ.

Proof. A is diagonalizable if and only if there exists an invertible matrix P and diagonal matrix D such that AP = PD. If A is diagonalizable, then by Proposition 5.14 the columns of P are eigenvectors of A. But the columns of an n × n invertible matrix are a basis of Rⁿ.

Conversely, suppose v₁, . . . , vₙ are eigenvectors of A with eigenvalues λᵢ, and suppose the vᵢ form a basis of Rⁿ. If P is a matrix with columns vᵢ and if D is a diagonal matrix with diagonal entries λᵢ, then by Proposition 5.14 AP = PD, and by Proposition 4.65 P is invertible because rank(P) = n. Therefore A is diagonalizable. □


***Problem 5.19. Continuing the previous problem, show that the columns of P form a basis of R⁴. Hint: use Octave to show rank(P) = 4 and argue (how?) that the columns of P therefore are a basis of R⁴.

Now we leave diagonalizable matrices to investigate defective matrices. Before giving an example of a defective matrix, we need one proposition, and the proposition needs a lemma.

Lemma 5.20. Let T be a triangular n × n matrix (either upper or lower triangular). Then T has a 0 on the diagonal if and only if rank(T) < n.

Proof. We will suppose T is upper-triangular. Suppose T does not have a 0 on the diagonal. Then T can be row reduced to I: divide each row by its diagonal element, so each row has a leading 1, then subtract multiples of each row from higher rows until the matrix is row reduced to I. Thus rank(T) = n.

Conversely, suppose T(i, i) = 0. Here is a picture of T; there is a zero on the diagonal in row i:
\[ T = \begin{bmatrix} * & * & \cdots & * & * & \cdots & * \\ 0 & * & \cdots & * & * & \cdots & * \\ \vdots & & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & * & \cdots & * \\ \vdots & & & & \ddots & & \vdots \\ 0 & \cdots & \cdots & \cdots & \cdots & 0 & * \end{bmatrix} \]
Let T₁ be the matrix consisting of the first i − 1 rows of T, and let T₂ be the matrix consisting of rows i through n of T. Because T is upper-triangular and T(i, i) = 0, columns 1, . . . , i of T₂ are all zeros. Since rank(T₂) is not more than the number of non-zero columns of T₂, rank(T₂) ≤ n − i. Thus the rows of T₂ span a subspace of dimension no more than n − i, and adding the i − 1 rows of T₁ shows that the rows of T span a subspace of dimension no more than n − 1. Thus rank(T) < n. □

Proposition 5.21. Let T be a triangular square matrix (either upper or lower triangular). Then λ is an eigenvalue of T if and only if λ appears on the diagonal of T.

Proof. T − λI is a triangular matrix. λ is an eigenvalue of T if and only if rank(T − λI) < n. Since T − λI is a triangular matrix, by the lemma λ is an eigenvalue of T if and only if T − λI has a 0 on the diagonal, which is the case if and only if λ is on the diagonal of T. □

The next problem shows how to construct a special class of defective matrices called Jordan blocks.

***Problem 5.22. Let
\[ T = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix} \]
Use the Proposition to show that 2 is the only eigenvalue of T, and show that rank(T − 2I) = 3. Find an eigenvector of T.

You can replace the diagonal elements 2 with any other value and get a Jordan block with the same properties. You can also make bigger or smaller Jordan blocks; for an n × n Jordan block T with eigenvalue λ, rank(T − λI) is n − 1, one less than the size, so there is only a single line of eigenvectors. A sketch of the construction in Octave follows.
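This is one way to build such blocks (a sketch; lambda = 2 and n = 4 are the values from the problem, and both are arbitrary):

lambda = 2; n = 4;
T = lambda*eye(n) + diag(ones(n-1,1), 1)   % lambda on the diagonal,
                                           % 1's just above it
rank(T - lambda*eye(n))                    % returns n-1 = 3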

***Problem 5.23. Continuing the previous problem, you will see how much trouble Octave has with defective matrices. Put T into Octave and execute [P,D] = eig(T). What does Octave say are the eigenvalues and eigenvectors? Is P*D - T*P equal to 0? Is P*D*P^(-1) - T equal to 0? Can you explain the apparent contradiction? Examining P^(-1) might help.

How can you tell if a matrix is diagonalizable? If an n × n matrix has n different eigenvalues, then it must be diagonalizable, as shown by the next result.

Proposition 5.24. Let λ₁, . . . , λₜ be distinct eigenvalues for a matrix A, and suppose each eigenvalue λᵢ has an eigenvector vᵢ. Then v₁, . . . , vₜ is a linearly independent set of vectors.

Proof. This is one of the more complicated proofs in the course, but the result is important so hang on.

We want to show that the eigenvectors v₁, . . . , vₜ are linearly independent. If t = 1 we are done. Eigenvectors are non-zero, and one non-zero vector forms a linearly independent set.

We will show that if v₁, . . . , vₜ is linearly dependent, then v₂, . . . , vₜ is linearly dependent. We can continue to drop the first vector without losing linear dependence until only one eigenvector is left, which cannot be a linearly dependent set. Thus the initial set of eigenvectors must have been linearly independent.

Suppose v₁, . . . , vₜ are linearly dependent. Then we have scalars a₁, . . . , aₜ, not all 0, such that
\[ a_1 v_1 + \cdots + a_t v_t = 0 \]
At least two of the coefficients must be non-zero, because all the vᵢ are non-zero. We have
\begin{align*} 0 &= A(a_1 v_1 + \cdots + a_t v_t) = a_1 A v_1 + \cdots + a_t A v_t = a_1 \lambda_1 v_1 + \cdots + a_t \lambda_t v_t \\ 0 &= \lambda_1 (a_1 v_1 + \cdots + a_t v_t) = a_1 \lambda_1 v_1 + \cdots + a_t \lambda_1 v_t \end{align*}


Subtracting one expression from the other we get:
\[ 0 = a_2(\lambda_2 - \lambda_1)v_2 + \cdots + a_t(\lambda_t - \lambda_1)v_t \]
Since at least one of the aᵢ, i ≥ 2, is non-zero and all the λᵢ − λ₁ are non-zero because the eigenvalues are distinct, this is a linear combination of v₂, . . . , vₜ with at least one non-zero coefficient that adds up to 0. Therefore v₂, . . . , vₜ are linearly dependent. □

Corollary 5.25. Suppose A is an n × n matrix. Then A has at most n distinct eigenvalues.

Proof. If A is n × n, then the eigenvectors are vectors in Rⁿ. The Proposition shows that if there are t distinct eigenvalues, then the t eigenvectors form a linearly independent set in Rⁿ. Thus t ≤ n because the largest size of a linearly independent set in Rⁿ is n. □

Corollary 5.26. If an n × n matrix A has n distinct eigenvalues, then it is diagonalizable.

Proof. In this case A has n linearly independent eigenvectors, so there is a basis of eigenvectors for Rⁿ. □

Corollary 5.27. If an n × n matrix A has n distinct eigenvalues, then the eigenvectors for each eigenvalue form a one-dimensional line. All the eigenvectors for an eigenvalue are scalar multiples of each other. If you know one eigenvector for each eigenvalue, you know all the eigenvectors.

Proof. For each eigenvalue λ, the eigenvectors are the non-zero vectors in ker(A − λI). Choose a basis for each of these kernels; the union of these bases is a linearly independent set in Rⁿ. Since there are n eigenvalues, the union contains at least n vectors, and a linearly independent set in Rⁿ contains at most n vectors, so there can be no more than one vector in the basis of each kernel. □

***Problem 5.28. Suppose T is a triangular n × n matrix with n distinct elements on the diagonal. Prove that T is diagonalizable.

Here’s a simple result that can sometimes be useful.

Proposition 5.29. Let A be an n × n matrix with eigenvalue λ and eigenvector v.
(1) For any positive integer m, λᵐ is an eigenvalue of Aᵐ with eigenvector v.
(2) If A is invertible, then λ ≠ 0 and for any positive or negative integer m, λᵐ is an eigenvalue of Aᵐ with eigenvector v.

Proof. For the first part we use induction on m. The statement is certainly true for m = 1. If it holds for m, that is, if Aᵐv = λᵐv, then to complete the induction we must show that Aᵐ⁺¹v = λᵐ⁺¹v:
\[ A^{m+1}v = A(A^m v) = A(\lambda^m v) = \lambda^m (Av) = \lambda^m (\lambda v) = \lambda^{m+1} v \]


For the second part, Problem 5.3 shows that λ ≠ 0 if A is invertible. To complete the proof, by the first part, it suffices to show that λ⁻¹ is an eigenvalue of A⁻¹ with eigenvector v. We leave this to the reader. □

***Problem 5.30. Let A be an invertible matrix with an eigenvalue λ ≠ 0 and eigenvector v. Show that λ⁻¹ is an eigenvalue for A⁻¹ with the same eigenvector v. Hint: use Av = λv to prove that A⁻¹v = λ⁻¹v.

Before leaving this section, you need to know something about the complex eigenvalues of a matrix. They appear in conjugate pairs. That is, if λ₁ = a + bi is an eigenvalue of A, then λ₂ = a − bi is also an eigenvalue of A. Because of this, a square matrix of odd size must have a real eigenvalue.
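A quick numerical illustration in Octave (a sketch; rand(3,3) is just an arbitrary real matrix of odd size):

A = rand(3,3);
e = eig(A)    % one eigenvalue is always real; the other two are either
              % both real or a complex conjugate pair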

***Problem 5.31. Let
\[ A = \begin{bmatrix} 1 & -2 & 4 & -8 & 16 \\ 1 & -1 & 1 & -1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 & 16 \\ 1 & 3 & 9 & 27 & 81 \end{bmatrix} \]
Find the eigenvalues of A and show that the complex eigenvalues appear in conjugate pairs. Is the size of the matrix odd or even? Are there any real eigenvalues?

The example is a Vandermonde matrix. Its rows are powers of different numbers.

Let's review what you have learned in this section about the eigenvalues and eigenvectors of a matrix A with real number entries.
(1) A has at least one real or complex eigenvalue.
(2) The complex eigenvalues appear in conjugate pairs. A matrix of odd size has at least one real eigenvalue and eigenvector.
(3) If A is an n × n matrix then either:
(a) A is diagonalizable. We can find eigenvalues λ₁, . . . , λₙ (not necessarily distinct) and eigenvectors v₁, . . . , vₙ such that the eigenvectors form a basis for Rⁿ. If the eigenvectors are the columns of a matrix P and the eigenvalues the diagonal elements of a diagonal matrix D then A = PDP⁻¹.
(b) A is defective. There is no basis of eigenvectors.

5.4. Symmetric Matrices and Eigenvalues

5.4.1. Orthonormal sets.

***Problem 5.32. Enter a symmetric matrix A into Octave by entering:
B = rand(4,4)
A = B + B'
Calculate an eigenvector v for A:
[P,Q] = eig(A)
v = P(:,1)
Show that v has length 1 by showing that ‖v‖² = vᵀv = 1. Calculate:
v'*v
The problem shows that Octave always returns eigenvectors scaled to length 1.


Definition 5.33. A vector of length 1 is a unit vector.

***Problem 5.34. Let v = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}. Show that there exists a vector w pointing in the same direction as v with length 1. Hint: multiply v by a positive scalar to make the length equal to 1. The scalar is 1/‖v‖.

Unit vectors are convenient because they simplify certain formulas. If you look at the formulas for projections, reflections and rotations in Section 2.2, you will see vᵀv in many denominators. If v is a unit vector, then all these denominators are 1 and can be omitted from the formulas. For example, the projection along v changes from
\[ P = I - \frac{1}{v^T v}\, v v^T \]
to
\[ P = I - v v^T \]

Even more convenient than individual unit vectors are sets of unit vectors that are mutually orthogonal.

Definition 5.35. Vectors v₁, . . . , vₜ satisfying ‖vᵢ‖ = 1 and vᵢ · vⱼ = 0 if i ≠ j form an orthonormal set.

***Problem 5.36. Show that a set of vectors v₁, . . . , vₜ is orthonormal if
\[ v_i \cdot v_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \]

Orthonormal sets of vectors are very convenient for calculating. The main reason is that coefficients in linear combinations can be recovered easily.

Proposition 5.37. Let v₁, . . . , vₜ be an orthonormal set, and suppose for some scalars aᵢ we have:
\[ w = a_1 v_1 + \cdots + a_t v_t \]
Then aᵢ = vᵢ · w.

Proof.
\begin{align*} v_i \cdot w &= v_i \cdot (a_1 v_1 + \cdots + a_t v_t) \\ &= a_1 (v_i \cdot v_1) + \cdots + a_t (v_i \cdot v_t) \\ &= a_i \qquad \text{by Problem 5.36} \end{align*}
□

Corollary 5.38. An orthonormal set is linearly independent.

Proof. Let v₁, . . . , vₜ be an orthonormal set, and suppose for some scalars aᵢ we have:
\[ a_1 v_1 + \cdots + a_t v_t = 0 \]
To show that the vectors are linearly independent we must show that aᵢ = 0 for all i. But aᵢ = vᵢ · (a₁v₁ + · · · + aₜvₜ) = vᵢ · 0 = 0. □

***Problem 5.39. Let v₁ = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix} and v₂ = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}.
(1) Show that v₁ and v₂ form an orthonormal set.
(2) How can you conclude that this set is a basis for R²?
(3) Use Proposition 5.37 to find a and b such that
\[ \begin{bmatrix} 1 \\ 0 \end{bmatrix} = a v_1 + b v_2 \]

If you have a set of vectors w₁, . . . , wₜ spanning a subspace V, you can construct an orthonormal set spanning the same subspace. Since the set is orthonormal, it will be linearly independent, so it will be an orthonormal basis for V. The process that produces the orthonormal basis is called the Gram-Schmidt process. The number of vectors produced is the dimension of V, which will be no more than t but which may be less.

The Gram-Schmidt process starts with the vectors w₁, . . . , wₜ and replaces the wᵢ one vector at a time with new vectors vᵢ that are orthonormal, so that the new vectors vᵢ and the remaining wⱼ (those that haven't been replaced yet) span the same space as the original w₁, . . . , wₜ. Suppose you have replaced the first r − 1 vectors. You would have a set of vectors like this:

(5.4.1) v₁, · · · , vᵣ₋₁, wᵣ, · · · , wₜ

where v₁, · · · , vᵣ₋₁ are an orthonormal set and the entire set spans the same space as the original w₁, . . . , wₜ. The process begins with r = 1. One step of the process consists of replacing wᵣ with a vector vᵣ so that the resulting set:

(5.4.2) v₁, · · · , vᵣ, wᵣ₊₁, · · · , wₜ

spans the same subspace as (5.4.1), but where the orthonormal part of the sequence contains one more vector. Should it ever happen during the process that vᵣ = 0, discard vᵣ and continue with the sequence:

v₁, · · · , vᵣ₋₁, wᵣ₊₁, · · · , wₜ

To construct vᵣ, use the formula:
\begin{align*} x &= w_r - (v_1 \cdot w_r)v_1 - \cdots - (v_{r-1} \cdot w_r)v_{r-1} \\ v_r &= \frac{1}{\|x\|}\, x \end{align*}

We will prove that (5.4.2) spans the same subspace as (5.4.1) and that v₁, . . . , vᵣ is orthonormal. Let A be the matrix with the vectors in (5.4.1) as rows, and let B be the matrix with the vectors in (5.4.2) as rows. We changed A to B by row operations, so A and B are row equivalent matrices. Their rows span the same subspace. It remains to show that v₁, . . . , vᵣ are orthonormal. We can assume that v₁, . . . , vᵣ₋₁ are orthonormal. The second step of the construction guarantees that ‖vᵣ‖ = 1. We need only check for i < r that vᵢ · vᵣ = 0, and it suffices to check that vᵢ · x = 0:
\begin{align*} v_i \cdot x &= v_i \cdot w_r - (v_1 \cdot w_r)(v_i \cdot v_1) - \cdots - (v_{r-1} \cdot w_r)(v_i \cdot v_{r-1}) \\ &= v_i \cdot w_r - (v_i \cdot w_r)(v_i \cdot v_i) \\ &= 0 \end{align*}

We illustrate the Gram-Schmidt process using Octave for the arithmetic.


w1=[1,2,3]; w2=[-1,2,1]; w3=[4,0,2];
v1=w1/norm(w1)
v1 = [0.26726, 0.53452, 0.80178]
x=w2-(v1*w2')*v1; v2=x/norm(x)
v2 = [-0.77152, 0.61721, -0.15430]
x=w3-(v1*w3')*v1-(v2*w3')*v2; v3=x/norm(x)
v3 = [0.57735, 0.57735, 0.57735]

***Problem 5.40. (1) Check that the vectors v₁, v₂ and v₃ form an orthonormal set.
(2) Starting with w₁ = (1, 4, 3, 1), w₂ = (2, 1, 3, 1) and w₃ = (−1, 0, 1, 2) find an orthonormal basis v₁, v₂ and v₃ for the subspace spanned by w₁, w₂ and w₃.
(3) Using Proposition 5.37, find the coefficients aᵢ in the linear combination:
\[ w_1 = a_1 v_1 + a_2 v_2 + a_3 v_3 \]

Orthogonalization is such an important operation that it could not have been left out of Octave. If A is a matrix then orth(A) returns a matrix whose columns form an orthonormal set and span the same column space as A. Octave uses an algorithm slightly different from the Gram-Schmidt algorithm that better combats round-off error, so the answers given by orth() may be slightly different (in the 15th or 16th digit) from the ones you calculate.
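For instance, applied to the vectors from the worked example above (a sketch; orth() expects the vectors as columns, and it may return a different orthonormal basis of the same space, so its columns need not match the Gram-Schmidt answer sign for sign):

W = [1 -1 4; 2 2 0; 3 1 2];   % columns are w1, w2, w3 from the example
Q = orth(W)                   % columns of Q are an orthonormal basis
Q'*Q                          % approximately the 3x3 identity matrix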

***Problem 5.41. Using the data from part 2 of the previous problem, put the vectors w₁, w₂ and w₃ into the columns of a matrix, then apply the command orth(). Do you get the same result as you got before?

5.4.2. Orthogonal matrices. Matrices whose columns are orthonormal are easy to work with, particularly when it comes to finding inverses.

Definition 5.42. A matrix A is orthogonal if it has orthonormal columns.

Linguistic note: it would make more sense to call such matrices orthonormal matrices, but sometimes tradition rules. For a set of vectors, “orthogonal” means mutually perpendicular but not necessarily of length 1, and “orthonormal” means mutually perpendicular and individually of length 1. For a matrix, “orthogonal” means that the columns are orthonormal: mutually perpendicular and individually of length 1.

Next we have a criterion for orthogonality of matrices which is easy to apply in Octave and use in proofs.

Proposition 5.43. A matrix A is orthogonal if and only if AᵀA = I.

Proof. The columns A(:, i) of A are the rows Aᵀ(i, :) of Aᵀ. The columns of A are orthonormal if and only if
\[ A(:, i) \cdot A(:, j) = A^T(i, :)A(:, j) = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} \]
AᵀA is a square matrix with elements:
\[ (A^T A)(i, j) = A^T(i, :)A(:, j) \]


Therefore AᵀA is the identity matrix if and only if the columns of A are orthonormal. □

***Problem 5.44. (1) Check that the matrix created by the orth() command in Problem 5.41 is an orthogonal matrix.
(2) If A is orthogonal, is Aᵀ a left or right inverse for A?
(3) If A and B are orthogonal matrices that can be multiplied, prove AB is orthogonal.
(4) If A is square and orthogonal, prove (a) A⁻¹ = Aᵀ and (b) Aᵀ is orthogonal. This is really a surprising result: if the columns of a square matrix are orthonormal, then so are the rows!

***Problem 5.45. (1) Show that a 2×2 rotation matrix Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} is orthogonal.
(2) Let v be a unit vector in Rⁿ. Show that the reflection matrix R = I − 2vvᵀ is orthogonal.

From a geometric point of view, square orthogonal matrices represent rigid motions of space. Every square matrix “moves” space in the following sense. If A is n × n, then A maps every point v in Rⁿ to another point Av in Rⁿ. (The point 0 is held fixed, because A0 = 0.) Orthogonal matrices move points in such a way that angles and distances are unchanged.

Proposition 5.46. Suppose A is an orthogonal matrix.
(1) For any vector v, ‖Av‖ = ‖v‖.
(2) For any two vectors v and w, the angle from Av to Aw is the same as the angle from v to w.

Note: this result holds even if A is not square, so the vectors Av and Aw are different sizes than v and w. All you need is AᵀA = I.

Proof. (1)
\[ \|Av\|^2 = (Av)^T(Av) = v^T A^T A v = v^T v = \|v\|^2 \]
(2) Let θ₁ be the angle from v to w, and let θ₂ be the angle from Av to Aw. Then:
\[ \cos(\theta_2) = \frac{(Av)^T(Aw)}{\|Av\|\,\|Aw\|} = \frac{v^T A^T A w}{\|v\|\,\|w\|} = \frac{v^T w}{\|v\|\,\|w\|} = \cos(\theta_1) \]
so θ₁ = θ₂. □

***Problem 5.47. Let U be the orthogonal matrix created in Problem 5.41. Let v = \begin{bmatrix} 1 \\ 3 \\ -2 \end{bmatrix} and w = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}.
(1) Check that ‖Uv‖ = ‖v‖.
(2) Check that the angle between v and w equals the angle between Uv and Uw.


5.4.3. Eigenvalues of symmetric matrices. Many applications of eigenvalues and eigenvectors occur when the matrix is a symmetric matrix (a matrix such that A = Aᵀ). In some sense eigenvalues and eigenvectors are best adapted to the analysis of symmetric matrices. There are even special algorithms for finding eigenvalues and eigenvectors of symmetric matrices that are faster than the algorithms applied to general square matrices.

One reason that eigenvalues and eigenvectors are so effective in analyzing symmetric matrices is that their eigenvalues and eigenvectors have a special structure.

Lemma 5.48. Let A be a symmetric matrix. Then the eigenvalues of A are all real numbers.

Proof. The proof takes us into the realm of complex numbers. Suppose λ is an eigenvalue of A. To show that λ is real, it suffices to show λ = λ̄, where λ̄ is the complex conjugate of λ. If v is a vector with complex entries, let v̄ be the vector whose entries are the complex conjugates of the entries of v. If v is a non-zero vector then v̄ᵀv is a positive real number.

Now we can show λ is real. Let v be an eigenvector for λ.
\begin{align*} \lambda \bar{v}^T v &= \bar{v}^T(\lambda v) \\ &= \bar{v}^T(Av) \\ &= (\bar{v}^T A^T)v && \text{because } A = A^T \\ &= (A\bar{v})^T v \\ &= (\overline{Av})^T v && \text{because } A \text{ is real} \\ &= (\overline{\lambda v})^T v \\ &= (\bar{\lambda}\bar{v})^T v \\ &= \bar{\lambda}\, \bar{v}^T v \end{align*}
Since v̄ᵀv is non-zero, λ = λ̄. □

The next lemma is taken from the theory of “normal matrices”. It is the key step in establishing the eigenvalue/eigenvector structure of a symmetric matrix. The proof is a bit complicated, because it is one of the central results in linear algebra.

Lemma 5.49. Let A be a symmetric matrix. Then there exists a square orthogonal matrix P such that PᵀAP is diagonal.

Proof. The proof proceeds by induction on the size of A. If A is a 1 × 1 matrix, the result holds trivially. Suppose A is n × n and the result holds for smaller symmetric matrices.

Since every square matrix has an eigenvalue, we can let λ be an eigenvalue of A. By Lemma 5.48, λ is real and has a real eigenvector v. We can assume v has length 1 and is the first column of an orthogonal matrix P₁. If e₁ = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, then P₁e₁ = v.


Let A₁ = P₁ᵀAP₁. Then the first column of A₁ is:
\[ A_1(:,1) = A_1 e_1 = P_1^T A P_1 e_1 = P_1^T A v = P_1^T \lambda v = \lambda P_1^T v = \lambda P_1^{-1} v = \lambda e_1 = \begin{bmatrix} \lambda \\ 0 \\ \vdots \\ 0 \end{bmatrix} \]
Thus
\[ A_1 = \begin{bmatrix} \lambda & * & \cdots & * \\ 0 & * & \cdots & * \\ \vdots & \vdots & \ddots & \vdots \\ 0 & * & \cdots & * \end{bmatrix} \]
Since A is symmetric, A₁ is symmetric and therefore:
\[ A_1 = \begin{bmatrix} \lambda & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & B & \\ 0 & & & \end{bmatrix} \]
where B is a symmetric (n − 1) × (n − 1) matrix. By induction there exists a square orthogonal matrix P₂ such that P₂ᵀBP₂ = D₂ is diagonal. If we define the square orthogonal matrix:
\[ P_3 = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & P_2 & \\ 0 & & & \end{bmatrix} \]
then
\[ D_4 = P_3^T A_1 P_3 = \begin{bmatrix} \lambda & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & D_2 & \\ 0 & & & \end{bmatrix} \]
is a diagonal matrix. If P₄ = P₁P₃, then P₄ is an orthogonal matrix (Problem 5.44) and
\[ P_4^T A P_4 = P_3^T P_1^T A P_1 P_3 = P_3^T A_1 P_3 = D_4 \]
is a diagonal matrix. □

Theorem 5.50. Let A be an n × n symmetric matrix. A has real eigenvalues λ₁, . . . , λₙ and orthonormal eigenvectors v₁, . . . , vₙ. If D is the diagonal matrix with the eigenvalues λᵢ on the diagonal, and P is the orthogonal matrix that has the eigenvectors vᵢ for columns, then:
\[ A = PDP^{-1} = PDP^T \]
Thus A is diagonalizable.

Proof. Using the lemma, choose a square orthogonal matrix P such that D = PᵀAP is diagonal. Since P is orthogonal and square, Pᵀ = P⁻¹. Thus A = PDP⁻¹, or AP = PD with P an invertible matrix. The result now follows from Proposition 5.18. □

***Problem 5.51. Use Octave to construct a 6 × 6 symmetric matrix, and check that it has real eigenvalues and orthonormal eigenvectors.


***Problem 5.52. Suppose A is a symmetric matrix with positive eigenvalues. Then A = PDPᵀ for some orthogonal matrix P and diagonal matrix D with positive diagonal entries. Symmetric matrices with positive eigenvalues are called positive matrices. If
\[ D = \begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{bmatrix}, \quad \text{let} \quad E = \begin{bmatrix} \sqrt{\lambda_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sqrt{\lambda_n} \end{bmatrix}. \]
Define B = PEPᵀ. Show that B is a positive matrix and that B² = A. Thus every positive matrix has a positive square root.
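A numerical sketch of this square-root construction in Octave (the matrix A here is an arbitrary example, made positive by construction):

M = rand(4,4); A = M'*M + eye(4);   % symmetric with positive eigenvalues
[P,D] = eig(A);
B = P*sqrt(D)*P';                   % sqrt(D) takes square roots of the
                                    % diagonal entries of D
norm(B*B - A)                       % approximately 0, so B^2 = A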


CHAPTER 6

Linear Equations and Linear Approximations

6.1. Singular Value Decomposition Defined

Suppose A is a non-zero matrix. You have seen that A can have a lot of pathologies.
(1) A might be square
(a) A might be invertible
(b) A might not be invertible
(2) A might have more columns than rows
(a) The rank of A might be the number of rows of A, in which case A has a right inverse
(b) The rank of A might be less than the number of rows of A, in which case A has no right or left inverse
(3) A might have more rows than columns
(a) The rank of A might be the number of columns of A, in which case A has a left inverse
(b) The rank of A might be less than the number of columns of A, in which case A has no left or right inverse
(4) The equation Ax = 0 might have only the zero solution or might have an infinite subspace of solutions. That is, ker(A) might be {0} or might be a non-zero subspace.
(5) The equation Ax = b might have no solutions, a unique solution or infinitely many solutions.

There is a factorization for any non-zero matrix that will tell you almost everything you could want to know about the matrix, its rank, and solutions to equations with the matrix. This factorization will even give you the best approximate solution to an equation Ax = b when no exact solution exists. The factorization is the singular value decomposition. We will introduce it in this section, demonstrate its power and versatility in the next section, and conclude with a proof of the decomposition.

Theorem 6.1. Let A be a non-zero r × c matrix of rank n. There exists an r × n orthogonal matrix U, a c × n orthogonal matrix V and an n × n diagonal matrix D with positive diagonal entries listed in decreasing order (i.e. D(1, 1) ≥ · · · ≥ D(n, n) > 0) such that:
\[ A = UDV^T \]
The matrix D is uniquely determined by A. The diagonal entries of D are the singular values of A.



For example:
\[ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = \begin{bmatrix} -0.215 & 0.887 \\ -0.521 & 0.250 \\ -0.826 & -0.388 \end{bmatrix} \begin{bmatrix} 16.8 & 0 \\ 0 & 1.07 \end{bmatrix} \begin{bmatrix} -0.480 & -0.572 & -0.665 \\ -0.777 & -0.075 & 0.625 \end{bmatrix} \]
The singular value decomposition is truly amazing. It encodes much of the information about bases, dimension and orthogonality developed in Chapters 3 and 4.

***Problem 6.2. (1) If D is an n × n diagonal matrix with positive diagonal entries listed in decreasing order, show that the singular value decomposition of D is D = IₙDIₙᵀ.
(2) If U is an r × n orthogonal matrix, show that the singular value decomposition of U is UIₙIₙᵀ.

Octave will compute singular value decompositions. If A is a matrix, its decomposition is computed as:

[U,D,V] = svd(A,1)
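For instance, applied to the 3 × 3 matrix from the example above (a sketch; since that matrix has rank 2, expect the third singular value to come back as roughly zero rather than exactly zero):

A = [1 2 3; 4 5 6; 7 8 9];
[U,D,V] = svd(A,1)
norm(U*D*V' - A)      % approximately 0: the factorization reconstructs A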

***Problem 6.3. Find the singular value decomposition of A = \begin{bmatrix} 1 & 3 & 4 & 2 \\ 2 & 1 & 5 & 3 \\ 3 & 4 & 2 & -1 \end{bmatrix}. What are U, V and D? What are the singular values? Check that UᵀU = I, VᵀV = I and A = UDVᵀ.

***Problem 6.4. Suppose you have a singular value decomposition for a matrix A: A = UDVᵀ.
(1) Show that UᵀU = I and VᵀV = I.
(2) Show that UUᵀ = I if and only if U is square if and only if A has linearly independent rows. Corollary 4.61 may be helpful.
(3) Show that VVᵀ = I if and only if V is square if and only if A has linearly independent columns.
(4) Show that D is invertible and
\[ D^{-1} = \begin{bmatrix} D(1,1)^{-1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & D(n,n)^{-1} \end{bmatrix} \]
(5) Prove: Aᵀ = VDUᵀ. This is the singular value decomposition of Aᵀ.

(5) Prove: AT = V DUT . This is the singular value decomposition of AT .

***Problem 6.5. Octave sometimes has a problem with matrices of less thanfull rank. Because numbers that should be zero instead come out as very smallvalues, Octave reports very small singular values that should be zero.

Let A =

−3 4 65 5 40 −10 −120 5 6

. Find rank(A). Then use the singular value de-

composition in Octave to find U , D and V . What are the singular values?You will have three singular values, but the last one is nearly 0. It should have

been exactly 0. When Octave calculated the rank of A, it looked at the singularvalues and found two of them significantly greater than zero. On the other hand,

Page 105: Linear Algebra and Geometry David Meredithonline.sfsu.edu/meredith/Linear_Algebra/Spring_2010/Linear Algebra and Geometry.pdfSystems of Linear Equations 33 3.1. Two Key Problems 33

6.2. CONSEQUENCES OF THE SINGULAR VALUE DECOMPOSITION 99

when it reported the singular values it reported three including one that is nearlyzero. Since the rank of A is 2, A should only have two positive singular values. Youshould change the very small singular value to 0 and throw it away. You should alsothrow away the corresponding columns of U and V , which are the third columns.You can create corrected values for U , D and V in Octave with the commands:

U1 = U(:,1:2)
D1 = D(1:2,1:2)
V1 = V(:,1:2)

What are the sizes of the corrected matrices U₁, D₁ and V₁? Are the sizes what the singular value decomposition theorem requires? Using the modified values U₁, D₁ and V₁, check that U₁ᵀU₁ = I, V₁ᵀV₁ = I and A = U₁D₁V₁ᵀ.

Proposition 6.6. Let A be a non-zero r × c matrix of rank n, and let the singular value decomposition of A be A = UDVᵀ.
(1) The row space of A is the column space of V.
(2) The column space of A is the column space of U.
(3) The columns of V are an orthonormal basis for the row space of A and the columns of U are an orthonormal basis for the column space of A.
(4) ker(A) = ker(Vᵀ) and ker(Aᵀ) = ker(Uᵀ).

Proof. (1) First we show that the column space of A is the column space of U. Begin by showing that every vector in the column space of A is also in the column space of U. A vector v in the column space of A is a product v = Ax for a column vector x of size c. But v = Ax = U(DVᵀx) = Uy for y = DVᵀx, a column vector of size n. Thus v is in the column space of U. Conversely, since DVᵀVD⁻¹ = I,
\[ Uy = UDV^T V D^{-1} y = A(VD^{-1}y) \]
so every vector in the column space of U is also in the column space of A. Therefore the column spaces of A and U are equal.
(2) Since Aᵀ = VDUᵀ, the row space of A, which is the column space of Aᵀ, is the column space of V.
(3) Since U and V are orthogonal matrices, their columns are orthonormal bases for their column spaces.
(4) Finally, since A and Vᵀ have the same row spaces, they have the same kernels (Corollary 4.45). Similarly, since Aᵀ and Uᵀ have the same row spaces they have the same kernels. □

***Problem 6.7. Continuing Problem 6.3, show that the row space of A is the column space of V by showing that A and Vᵀ row reduce to the same row-reduced echelon matrix. Explain why the equality of these two reductions establishes the equality of the row space of A and the column space of V. Similarly show that the column space of A and the column space of U are the same.

6.2. Consequences of the Singular Value Decomposition

6.2.1. Pseudoinverses.


***Problem 6.8. Suppose A is an invertible matrix with singular value decomposition A = UDVᵀ. Show that A⁻¹ = VD⁻¹Uᵀ. Hint: A is invertible if and only if A is square and rank(A) is equal to the number of rows and columns of A. Problem 6.4 will be helpful.

The formula for A⁻¹ given in the problem above makes sense whether or not A is invertible.

Definition 6.9. Let A be a non-zero matrix with singular value decomposition A = UDVᵀ. Then the pseudoinverse of A is:
\[ A^+ = VD^{-1}U^T \]

The formula for the pseudoinverse A⁺ is not the singular value decomposition of A⁺. The diagonal entries of D⁻¹ are in increasing instead of decreasing order. The singular value decomposition of A⁺ can be easily found. Let D̄ be a diagonal matrix with the diagonal entries of D⁻¹ listed in decreasing (reverse) order. Let Ū have the same columns as U but in reverse order, and let V̄ have the same columns as V but in reverse order. Then the singular value decomposition of A⁺ is:
\[ A^+ = \bar{V}\bar{D}\bar{U}^T \]
Note that U and Ū have the same column spaces, as do V and V̄. Thus the column space of A⁺ is the column space of V, and the row space of A⁺ is the column space of U.

Not every matrix has an inverse, not even a left or right inverse. However every non-zero matrix has a pseudoinverse derived from its singular value decomposition. As you will see in the subsections that follow, when a matrix has a left or right inverse or two-sided inverse, then the inverse is the pseudoinverse. When a matrix has no inverses, the pseudoinverse will still help solve equations using the matrix.

The pseudoinverse is also called the generalized inverse or the Moore-Penrose inverse.

For example, if
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = \begin{bmatrix} -0.215 & 0.887 \\ -0.521 & 0.250 \\ -0.826 & -0.388 \end{bmatrix} \begin{bmatrix} 16.8 & 0 \\ 0 & 1.07 \end{bmatrix} \begin{bmatrix} -0.480 & -0.572 & -0.665 \\ -0.777 & -0.075 & 0.625 \end{bmatrix} = UDV^T \]
then
\[ A^+ = VD^{-1}U^T = \begin{bmatrix} -0.480 & -0.777 \\ -0.572 & -0.075 \\ -0.665 & 0.625 \end{bmatrix} \begin{bmatrix} 0.059 & 0 \\ 0 & 0.936 \end{bmatrix} \begin{bmatrix} -0.215 & -0.521 & -0.826 \\ 0.887 & 0.250 & -0.388 \end{bmatrix} = \begin{bmatrix} -0.639 & -0.167 & 0.306 \\ -0.056 & 0 & 0.056 \\ 0.528 & 0.167 & 0.194 \end{bmatrix} \]


A is not invertible. It is a 3 × 3 matrix of rank 2, which you can see because there are only two singular values. If you compute AA⁺ or A⁺A you get:
\[ AA^+ = A^+A = \frac{1}{6}\begin{bmatrix} 5 & 2 & -1 \\ 2 & 2 & 2 \\ -1 & 2 & 5 \end{bmatrix} \]

***Problem 6.10. Continuing Problem 6.2:
(1) If D is an n × n diagonal matrix with positive diagonal entries listed in decreasing order, show that D⁺ = D⁻¹.
(2) If U is an r × n orthogonal matrix, show that U⁺ = Uᵀ.

Octave will compute the pseudoinverse of a matrix A directly. Enter pinv(A).
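For instance, continuing the 3 × 3 example above (a sketch; the entries printed in the text are rounded, so expect small numerical differences):

A = [1 2 3; 4 5 6; 7 8 9];
Aplus = pinv(A)
norm(A*Aplus*A - A)     % approximately 0, one of the defining properties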

***Problem 6.11. Find the pseudoinverse A⁺ of A = \begin{bmatrix} 1 & 3 & 4 & 2 \\ 2 & 1 & 5 & 3 \\ 3 & 4 & 2 & -1 \end{bmatrix}. Check the following properties:
(1) AA⁺A = A
(2) A⁺AA⁺ = A⁺
(3) AA⁺ and A⁺A are symmetric
(4) (Aᵀ)⁺ = (A⁺)ᵀ

Proposition 6.12. Aᵀ and A⁺ have the same row and column spaces.

Proof. Suppose the singular value decomposition of A is A = UDVᵀ. Then Aᵀ = VDUᵀ, while A⁺ = VD⁻¹Uᵀ. By Proposition 6.6, the row spaces of Aᵀ and A⁺ are both equal to the column space of U, while the column spaces of Aᵀ and A⁺ are both equal to the column space of V. □

***Problem 6.13. Show for a non-zero matrix with singular value decomposition A = UDVᵀ that the following properties of the pseudoinverse hold:
(1) AA⁺A = A
(2) A⁺AA⁺ = A⁺
(3) AA⁺ and A⁺A are symmetric
(4) (Aᵀ)⁺ = (A⁺)ᵀ

6.2.2. Rectangular matrices with linearly independent rows. Let A be an r × n matrix with r < n. Suppose the rows of A are linearly independent, so rank(A) = r. If the singular value decomposition of A is A = UDVᵀ, Problem 6.4 tells you that U and D are invertible r × r matrices, and V is n × r.

***Problem 6.14. Show that A⁺ is the right inverse of A by showing that AA⁺ = I. Hint: use the singular value decompositions and the fact that UUᵀ = I.


Proposition 6.15. Let A be an r × n matrix with r < n. Suppose rank(A) = r. Then for every column vector b ∈ Rʳ:
(1) the equation Ax = b has a solution x₁ = A⁺b;
(2) Ax = b if and only if x = x₁ + y, where y ∈ ker(A);
(3) x₁ is orthogonal to ker(A); and
(4) x₁ is the unique smallest solution to the equation Ax = b.

Proof. (1)
\[ Ax_1 = AA^+b = Ib = b \]
so x₁ is a solution to Ax = b.
(2)
\begin{align*} Ax = b &\Leftrightarrow Ax = Ax_1 \\ &\Leftrightarrow A(x - x_1) = 0 \\ &\Leftrightarrow x - x_1 = y \in \ker(A) \\ &\Leftrightarrow x = x_1 + y \text{ for some } y \in \ker(A) \end{align*}
(3) The vector x₁ = A⁺b is in the column space of A⁺, which is the column space of Aᵀ (Proposition 6.12), or the row space of A. By Corollary 4.45 x₁ is orthogonal to ker(A).
(4) Suppose Ax = b. We must show that ‖x‖ ≥ ‖x₁‖. Since x − x₁ ∈ ker(A), x − x₁ and x₁ are orthogonal. Thus (x − x₁) · x₁ = 0.
\begin{align*} \|x\|^2 &= x \cdot x \\ &= ((x - x_1) + x_1) \cdot ((x - x_1) + x_1) \\ &= (x - x_1) \cdot (x - x_1) + 2(x - x_1) \cdot x_1 + x_1 \cdot x_1 \\ &= \|x - x_1\|^2 + \|x_1\|^2 \end{align*}
Therefore ‖x‖² − ‖x₁‖² = ‖x − x₁‖² ≥ 0. The solution x₁ is the unique smallest solution: if x is another solution and ‖x‖ = ‖x₁‖, then by the last equation ‖x − x₁‖ = 0 and x = x₁. □

Thus if A is an r × n matrix of rank r with r < n, then every equation Ax = b has an infinite number of solutions. The smallest solution is x₁ = A⁺b and the set of solutions can be written as x₁ + ker(A). You can think of the solution set as a subspace ker(A) displaced by a vector x₁. Such sets are called affine sets.

***Problem 6.16. Let A = \begin{bmatrix} 1 & 2 & -3 \end{bmatrix}, and consider the equation Ax = 1.
(1) Find the smallest solution x₁ to the equation.
(2) Show that ker(A) is a plane in R³.
(3) Show that the set of solutions is a plane parallel to the kernel.

Finding the point on the plane x + 2y − 3z = 1 closest to the origin is sometimes given as a calculus problem: find the point (x, y, z) minimizing the function x² + y² + z² subject to the constraint x + 2y − 3z = 1. You just solved the problem without calculus.


6.2.3. Rectangular matrices with linearly independent columns. Let's reverse the situation in the previous subsection. Let A be an n × c matrix with c < n. Suppose the columns of A are linearly independent, so rank(A) = c. The singular value decomposition of A is A = UDVᵀ, and Problem 6.4 tells you that U is n × c, and D and V are invertible c × c matrices.

***Problem 6.17. Show that A⁺ is the left inverse of A by showing that A⁺A = I. Hint: use the singular value decompositions and the fact that VVᵀ = I.

Proposition 6.18. Let A be an n × c matrix with c < n and rank(A) = c. Let b be a column vector in Rⁿ, and let x₁ = A⁺b. Then
(1) If Ay₁ = Ay₂ then y₁ = y₂. In other words the equation Ax = b has at most one solution.
(2) For any column vector y ∈ Rᶜ, Ay is orthogonal to b − Ax₁; and
(3) x₁ is the unique vector in Rᶜ minimizing ‖Ax − b‖. That is, x₁ is the best approximate solution to the equation Ax = b, and if the equation has a solution then it is x₁.

Proof. (1) If Ay₁ = Ay₂ then y₁ = A⁺Ay₁ = A⁺Ay₂ = y₂.
(2) Ay is in the column space of A, and b − Ax₁ is in ker(A⁺) because:
\[ A^+(b - Ax_1) = A^+b - A^+AA^+b = 0 \]
(Problem 6.13 shows A⁺ = A⁺AA⁺.) But the column space of A is orthogonal to ker(Aᵀ) = ker(A⁺).
(3) We will show first that if x ≠ x₁ then ‖Ax − b‖ > ‖Ax₁ − b‖.
\begin{align*} \|Ax - b\|^2 &= \|Ax - Ax_1 + Ax_1 - b\|^2 \\ &= (Ax - Ax_1 + Ax_1 - b)^T(Ax - Ax_1 + Ax_1 - b) \\ &= (Ax - Ax_1)^T(Ax - Ax_1) + (Ax - Ax_1)^T(Ax_1 - b) \\ &\qquad + (Ax_1 - b)^T(Ax - Ax_1) + (Ax_1 - b)^T(Ax_1 - b) \\ &= \|Ax - Ax_1\|^2 + \|Ax_1 - b\|^2 \end{align*}
The key step in this calculation is using the previous statement to show that
\[ (Ax - Ax_1)^T(Ax_1 - b) = (Ax_1 - b)^T(Ax - Ax_1) = 0 \]
because Ax − Ax₁ = Ay for y = x − x₁. Since ‖Ax − Ax₁‖ > 0, ‖Ax − b‖ > ‖Ax₁ − b‖. □

If A is an n × c matrix with c < n and rank(A) = c, then an equation Ax = b has at most one solution. When the equation has a solution, it is x₁ = A⁺b; otherwise x₁ is the best approximate solution.

***Problem 6.19. Suppose A has linearly independent columns and the singular value decomposition for A is A = UDVᵀ. Prove
(1) V is square and therefore invertible (Problem 6.4).
(2) AᵀA = VD²Vᵀ
(3) AᵀA is invertible.
(4) A⁺ = (AᵀA)⁻¹Aᵀ

This formula is often used to find the best approximate solution to the equation Ax = b: x = (AᵀA)⁻¹Aᵀb. Notice that the formula is easy to calculate, since you do not need to calculate the singular values to use it (but you did use the theory of singular values to prove that it is correct!).
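A sketch of this computation in Octave (the matrix and right-hand side are arbitrary illustrations; the backslash operator solves the linear system without forming an explicit inverse):

A = [1 1; 2 1; 3 1];  b = [1; 2; 2];   % 3 equations, 2 unknowns
x = (A'*A) \ (A'*b)                    % best approximate solution
norm(x - pinv(A)*b)                    % approximately 0: agrees with A+ b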

***Problem 6.20. Suppose you have points in R²:
\[ (x_1, y_1), \ldots, (x_n, y_n) \]
and suppose not all the xᵢ are the same (the points don't all lie on a vertical line). A line y = ax + b is said to be the best fit line or regression line determined by the points if a and b have been chosen to minimize the sum:
\[ s^2 = (ax_1 + b - y_1)^2 + \cdots + (ax_n + b - y_n)^2 \]
In this problem you will develop formulas for a and b. These formulas are widely used in statistics to produce regression lines through data points.

Let x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} and 1 = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}. Define
\[ A = \begin{bmatrix} \mathbf{x} & \mathbf{1} \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \]
(1) Show that s² = ‖A \begin{bmatrix} a \\ b \end{bmatrix} − y‖².
(2) Show that the best approximate solution to the equation A \begin{bmatrix} a \\ b \end{bmatrix} = y defines the regression line. The best approximate solution is \begin{bmatrix} a \\ b \end{bmatrix} = A⁺y.
(3) If the points are (1, 3), (2, 4), (4, 7), (5, 8) and (7, 11), find the regression line. Plot the points and the regression line. What is s²?
(4) EXTRA CREDIT: Develop the formulas given in statistics books. Show that
\[ a = \frac{n\sum_{i=1}^n x_i y_i - \sum_{i=1}^n x_i \sum_{i=1}^n y_i}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2} \qquad b = \frac{\sum_{i=1}^n y_i - a\sum_{i=1}^n x_i}{n} \]

6.2.4. Invertible square matrices. If A is an invertible square matrix, then A is a matrix whose rank is equal to the number of rows and the number of columns. By the two previous subsections, the pseudoinverse is the inverse: A⁺ = A⁻¹. Every equation Ax = b has a unique solution x = A⁺b = A⁻¹b.

6.2.5. Matrices not of full rank. Suppose A is a non-zero r × c matrix of rank n, and suppose n < r and n < c. A still has a singular value decomposition A = UDVᵀ and a pseudoinverse A⁺ = VD⁻¹Uᵀ. What can you say about the equation Ax = b?

Proposition 6.21. Let x₁ = A⁺b.
(1) Ay₁ = Ay₂ if and only if y₂ = y₁ + z, where z ∈ ker(A); and
(2) x₁ is the unique smallest vector in Rᶜ minimizing ‖Ax − b‖. That is,
(a) There exists a unique vector b₁ in the column space of A that is closest to b; and
(b) x₁ is the smallest vector such that Ax = b₁.

Proof. (1) Ay₁ = Ay₂ if and only if A(y₁ − y₂) = 0 if and only if y₁ − y₂ ∈ ker(A).
(2) First we show that there exists a unique vector b₁ in the column space of A closest to b. The column space of A is the column space of U. We must find x minimizing ‖Ux − b‖. Since U is orthogonal, its columns are linearly independent. Therefore we may apply Proposition 6.18. The point in the column space of U closest to b is b₁ = UU⁺b = UUᵀb (see Problem 6.10).

Since b₁ is in the column space of A, the equation Ax = b₁ has solutions. To complete the proof, we must show that x₁ = A⁺b is the smallest solution. To show that x₁ is a solution:
\[ Ax_1 = AA^+b = UDV^T V D^{-1} U^T b = UU^T b = b_1 \]
It remains only to show that x₁ is the smallest solution. Note that Ax = b₁ if and only if Vᵀx = D⁻¹Uᵀb₁. It suffices to show that x₁ is the smallest solution to this equation. Since Vᵀ has linearly independent rows, Proposition 6.15 shows that the smallest solution to Vᵀx = D⁻¹Uᵀb₁ is
\begin{align*} (V^T)^+ D^{-1} U^T b_1 &= (V^+)^T D^{-1} U^T U U^T b \\ &= (V^T)^T D^{-1} U^T b \\ &= V D^{-1} U^T b \\ &= A^+ b \\ &= x_1 \end{align*}
□

6.3. Singular Value Decomposition Proved

This section contains a proof of Theorem 6.1. Let A be a non-zero r × c matrix of rank n. You have already seen the big idea in Problem 6.19, where you proved AᵀA = VD²Vᵀ. AᵀA is a symmetric matrix and VD²Vᵀ is its diagonalization. You will show that the singular values of A are the square roots of the positive eigenvalues of AᵀA.

Let's start from the beginning. Some results needed in the proof are left to you to show.

***Problem 6.22. Prove:
(1) AᵀA is a c × c symmetric matrix.
(2) Let P be a c × c orthogonal matrix and F = \begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_c \end{bmatrix}. Suppose \(\lambda_{n+1} = \cdots = \lambda_c = 0\). Let V = P(:, 1 : n), the first n columns of P. Let E = \begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{bmatrix}. Show that PFPᵀ = VEVᵀ.
(3) Suppose you have an orthogonal c × n matrix V and an invertible diagonal n × n matrix D such that VD²Vᵀ = AᵀA. Show that U = AVD⁻¹ is r × n and is orthogonal by showing that UᵀU = I. Hint: (D⁻¹)ᵀ = (Dᵀ)⁻¹ = D⁻¹ because D is symmetric. Show that DVᵀ = UᵀA.

Now for the proof.
(1) By Problem 6.22, AᵀA is a c × c symmetric matrix. Let n = rank(AᵀA). At the end, we will show that n = rank(A).
(2) AᵀA has non-negative eigenvalues
\[ \lambda_1 \geq \cdots \geq \lambda_n > \lambda_{n+1} = \cdots = \lambda_c = 0 \]
Proof. Since AᵀA is symmetric, it has real eigenvalues (Theorem 5.50). Since the rank of AᵀA is n, there are n non-zero eigenvalues. It remains only to show that all eigenvalues of AᵀA are non-negative. Suppose λ is an eigenvalue. Then there is an eigenvector x such that AᵀAx = λx. Therefore
\[ 0 \leq \|Ax\|^2 = x^T A^T A x = \lambda x^T x = \lambda \|x\|^2 \]
Since eigenvectors are non-zero, ‖x‖² ≠ 0 and λ = ‖Ax‖²/‖x‖² ≥ 0. □

(3) By Theorem 5.50, AᵀA = PFPᵀ, where P is a c × c orthogonal matrix and F is a diagonal matrix with the eigenvalues λᵢ on the diagonal. By Problem 6.22, we can replace P by V = P(:, 1 : n) and F by E = F(1 : n, 1 : n), and AᵀA = VEVᵀ. Since the diagonal entries of E are all positive, they all have square roots. Let
\[ D = \begin{bmatrix} \sqrt{\lambda_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sqrt{\lambda_n} \end{bmatrix} \]
Then AᵀA = VD²Vᵀ. V is an orthogonal matrix (it has orthonormal columns) and D is an invertible diagonal matrix with decreasing positive entries on the diagonal.

(4) Suppose n = r, so A has linearly independent rows. If U = AVD⁻¹, then by Problem 6.22, U is a square orthogonal matrix and DVᵀ = UᵀA. Since U is square and orthogonal, U is invertible and Uᵀ = U⁻¹. Thus A = UDVᵀ as required.
(5) Now suppose n = c, so A has linearly independent columns. Then Aᵀ has linearly independent rows, so by the previous point Aᵀ has a singular value decomposition: Aᵀ = UDVᵀ. Then A = VDUᵀ, so A has a singular value decomposition too.

(6) For the general case, let $C$ be the row-reduced echelon form of $A$ with its zero rows deleted. Then $C$ has $n$ linearly independent rows, and every row of $A$ is a linear combination of the rows of $C$, so there exists an $r \times n$ matrix $P$ such that $A = PC$. We can calculate the rank of $P$:
$$n \geq \mathrm{rank}(P) \geq \mathrm{rank}(PC) = n$$
because (a) the rank is not more than the number of columns, and (b) Proposition 4.56. Thus $\mathrm{rank}(P) = n$ and $P$ has linearly independent columns. Therefore both $P$ and $C$ have singular value decompositions: $P = U_1D_1V_1^T$ and $C = U_2D_2V_2^T$. Thus $A = U_1(D_1V_1^TU_2D_2)V_2^T$. Moreover $V_1$ and $U_2$ are square and invertible. The product matrix $D_1V_1^TU_2D_2$ is invertible because every factor is invertible, and therefore it has a singular value decomposition $D_1V_1^TU_2D_2 = U_3DV_3^T$. Putting it all together:
$$A = U_1U_3DV_3^TV_2^T$$
Letting $U = U_1U_3$ and $V = V_2V_3$, and using the fact that the product of orthogonal matrices is orthogonal (Problem 5.44), $A = UDV^T$ is a singular value decomposition of $A$.
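The factorization $A = PC$ is easy to compute. The sketch below uses Python with SymPy's rref routine; it relies on the standard fact (not proved here) that $P$ may be taken to be the pivot columns of $A$:

from sympy import Matrix

A = Matrix([[1, 2, 0, 3],
            [2, 4, 1, 7],
            [3, 6, 1, 10]])              # rank 2: row 3 = row 1 + row 2

R, pivots = A.rref()                     # reduced row-echelon form of A
C = R[:len(pivots), :]                   # its n = 2 nonzero rows
P = A[:, list(pivots)]                   # the pivot columns of A

print(A == P * C)                        # True: A = P C
print(P.rank(), C.rank())                # 2 2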

(7) Suppose $A$ has a singular value decomposition $A = UDV^T$ with orthogonal matrices $U$ and $V$ and an $n \times n$ diagonal matrix $D$ with positive diagonal elements listed in decreasing order. To close we will prove:

(a) $\mathrm{rank}(A) = n$;
(b) the diagonal elements of $D$ (the singular values of $A$) are the square roots of the positive eigenvalues of the matrix $A^TA$; and
(c) $D$ is unique.

This result was obvious for matrices of full rank, but the general computation of the singular value decomposition lost track of the original eigenvalues.

(a) Since $D$ is a diagonal matrix with non-zero diagonal elements, $\mathrm{rank}(D) = n$. Since $A = UDV^T$, $\mathrm{rank}(A) \leq \mathrm{rank}(D)$ by Proposition 4.56. But $U^TAV = D$, so $\mathrm{rank}(A) \geq \mathrm{rank}(D)$. Therefore $\mathrm{rank}(A) = \mathrm{rank}(D) = n$.

(b) $A^TA = VD^2V^T$, a diagonalization of $A^TA$ with the zero eigenvalues omitted. Thus the singular values of $A$, squared, are the positive eigenvalues of $A^TA$.

(c) $A$ determines the positive eigenvalues of $A^TA$, which in turn determine $D$. Therefore $D$ is uniquely determined by $A$.
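A quick Python/NumPy check of (b), assuming a random matrix of rank 2:

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))  # rank 2

sing = np.linalg.svd(A, compute_uv=False)    # singular values, decreasing
evals = np.linalg.eigvalsh(A.T @ A)[::-1]    # eigenvalues, now decreasing

print(np.allclose(sing[:2], np.sqrt(evals[:2])))   # True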

***Problem 6.23. Singular value decomposition is the key mathematical stepin one method of automated text searching and paper grading called latent semanticanalysis. Some college teachers use latent semantic analysis to grade essays, andthe College Board has experimented with this and other methods of automated papergrading. Do an online search for “latent semantic analysis” or “singular valuedecomposition and essay grading”. What did you learn about the use of singularvalue decomposition for the apparently non-mathematical task of grading essays?

6.3.1. Two notes on pseudoinverses and singular values.

6.3.1.1. A rational formula for the pseudoinverse. If $A$ is an invertible square matrix, then $A^+ = A^{-1}$. The matrix $A^{-1}$ can be calculated from $A$ using only rational operations (addition, subtraction, multiplication and division) applied to 0, 1 and the elements of $A$ (Proposition 4.77). If $\mathrm{rank}(A)$ is the number of columns of $A$, then $A^+ = (A^TA)^{-1}A^T$, also a rational combination of 0, 1 and the elements of $A$, and $A^+A = I$. Similarly, if $\mathrm{rank}(A)$ is the number of rows of $A$, then $A^+ = A^T(AA^T)^{-1}$, also a rational combination of 0, 1 and the elements of $A$, and $AA^+ = I$.
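Both full-rank formulas are easy to verify numerically. A Python/NumPy sketch comparing them with NumPy's built-in pseudoinverse:

import numpy as np

rng = np.random.default_rng(4)

A = rng.standard_normal((5, 3))              # linearly independent columns
left = np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(left, np.linalg.pinv(A)))  # True
print(np.allclose(left @ A, np.eye(3)))      # A+ A = I

B = rng.standard_normal((3, 5))              # linearly independent rows
right = B.T @ np.linalg.inv(B @ B.T)
print(np.allclose(right, np.linalg.pinv(B))) # True
print(np.allclose(B @ right, np.eye(3)))     # B B+ = I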

Suppose $r = \mathrm{rank}(A)$ is smaller than both the number of rows and the number of columns. We will derive a rational and easy-to-calculate formula for $A^+$. Let $E$ be the row-reduced echelon form of $A$ with its zero rows deleted. The elements of $E$ are rational combinations of the elements of $A$. Moreover $\mathrm{rank}(E) = r$ is the number of rows of $E$, so $E^+ = E^T(EE^T)^{-1}$. Also $(E^+)^T = (EE^T)^{-1}E$.

We know that there exists a matrix $F$ with $r$ columns such that $FE = A$. Since
$$r = \mathrm{rank}(FE) \leq \mathrm{rank}(F) \leq r$$
$\mathrm{rank}(F) = r$ is the number of columns of $F$. Thus $F^+ = (F^TF)^{-1}F^T$. But what is $F$? Since $EE^+ = EE^T(EE^T)^{-1} = I$,
$$F = F(EE^+) = (FE)E^+ = AE^+$$
Since $E^+$ consists of rational combinations of 0, 1 and the elements of $A$, $F$ is also a rational combination of the same elements.

It is not too hard to prove that if $\mathrm{rank}(F)$ is the number of columns of $F$ and $\mathrm{rank}(E)$ is the number of rows of $E$, then $(FE)^+ = E^+F^+$. Therefore
$$\begin{aligned}
A^+ = E^+F^+ &= E^T\left(EE^T\right)^{-1}\left(\left(AE^+\right)^T AE^+\right)^{-1}\left(AE^+\right)^T \\
&= E^T\left(EE^T\right)^{-1}\left(\left(EE^T\right)^{-1}EA^TAE^T\left(EE^T\right)^{-1}\right)^{-1}\left(EE^T\right)^{-1}EA^T \\
&= E^T\left(GG^T\right)^{-1}G \quad \text{where } G = EA^T,
\end{aligned}$$
a rational combination of 0, 1 and the elements of $A$. (In the last step the inner factors $\left(EE^T\right)^{-1}$ cancel against $EE^T$, and $GG^T = EA^TAE^T$.)
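Because the formula uses only rational operations, it can be checked exactly. The sketch below uses Python with SymPy's exact rational arithmetic on a small rank-2 matrix:

from sympy import Matrix

A = Matrix([[1, 2, 0, 3],
            [2, 4, 1, 7],
            [3, 6, 1, 10]])          # rank 2, less than both rows and columns

R, pivots = A.rref()
E = R[:len(pivots), :]               # row-reduced echelon form, zero rows deleted

G = E * A.T
A_plus = E.T * (G * G.T).inv() * G   # the rational formula above

print(A_plus == A.pinv())            # True: agrees with the pseudoinverse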

6.3.1.2. Who needs singular values? If there is an easy formula for the pseudoinverse of $A$, is there still a need for singular values? The answer is "yes", because singular values give a way to approximate large matrices with much smaller ones, a technique used in practical calculations. Suppose $A$ has thousands of rows and columns and a singular value decomposition $A = UDV^T$. It is possible to calculate the first few singular values of $A$ and the first few columns of $U$ and $V$ without calculating everything. Let $U'$ be the first 20 columns of $U$ and $V'$ be the first 20 columns of $V$. Let $D'$ be the $20 \times 20$ upper-left corner of $D$, the part containing the 20 largest singular values. Then $U'D'V'^T \approx A$. This approximation is especially useful in approximately solving equations $Ax = b$: take $x \approx V'D'^{-1}U'^Tb$. This method is used in the latent semantic analysis calculations described in the final problem.
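Here is the truncation idea in a Python/NumPy sketch, with $k = 5$ columns instead of 20 and the full svd routine standing in for the specialized partial algorithms one would use on a matrix with thousands of rows:

import numpy as np

rng = np.random.default_rng(5)
m, n, k = 200, 150, 5

# Build a test matrix with rapidly decaying singular values 1, 1/2, 1/4, ...
U0, _ = np.linalg.qr(rng.standard_normal((m, m)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (U0[:, :n] * 2.0 ** -np.arange(n)) @ V0.T

U, sing, Vt = np.linalg.svd(A, full_matrices=False)
Uk, Dk, Vk = U[:, :k], np.diag(sing[:k]), Vt[:k, :].T

print(np.linalg.norm(A - Uk @ Dk @ Vk.T))   # small: A is nearly rank k

# Approximate least-squares solution of A x = b from the top k pieces only.
b = rng.standard_normal(m)
x = Vk @ np.linalg.inv(Dk) @ Uk.T @ b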