
LINEAR ALGEBRA MATH 2700.006 SPRING 2013 (COHEN) LECTURE NOTES

1 Sets and Set Notation.

Definition 1 (Naive Definition of a Set). A set is any collection of objects, called the elements of that set. We will most often name sets using capital letters, like A, B, X, Y, etc., while the elements of a set will usually be given lower-case letters, like x, y, z, v, etc.

Two sets X and Y are called equal if X and Y consist of exactly the same elements. In this case we write X = Y.

Example 1 (Examples of Sets). (1) Let X be the collection of all integers greater than or equal to 5 and strictly less than 10. Then X is a set, and we may write:

X = {5, 6, 7, 8, 9}

The above notation is an example of a set being described explicitly, i.e. just by listing out all of its elements. The set brackets {...} indicate that we are talking about a set and not a number, sequence, or other mathematical object.

(2) Let E be the set of all even natural numbers. We may write:

E = {0, 2, 4, 6, 8, ...}

This is an example of an explicitly described set with infinitely many elements. The ellipsis (...) in the above notation is used somewhat informally, but in this case its meaning, that we should “continue counting forever,” is clear from the context.

(3) Let Y be the collection of all real numbers greater than or equal to 5 and strictly less than 10. Recalling notation from previous math courses, we may write:

Y = [5, 10)

This is an example of using interval notation to describe a set. Note that the set Y obviously consists of infinitely many elements, but that there is no obvious way to write down the elements of Y explicitly like we did the set E in Example (2). Even though [5, 10) is a set, we don’t need to use the set brackets in this case, as interval notation has a well-established meaning which we have used in many other math courses.

(4) Now and for the remainder of the course, let the symbol ∅ denote the empty set, that is, the unique set which consists of no elements. Written explicitly, ∅ = { }.

(5) Now and for the remainder of the course, let the symbol N denote the set of all natural numbers, i.e. N = {0, 1, 2, 3, ...}.

(6) Now and for the remainder of the course, let the symbol R denote the set of all real numbers. We may think of R geometrically as being the collection of all the points on the number line.



(7) Let R2 denote the set of all ordered pairs of real numbers. That is, let R2 be the set which consists of all pairs (x, y) where x and y are both real numbers. We may think of R2 geometrically as the set of all points on the Cartesian coordinate plane.

If (x, y) is an element of R2, it will often be convenient for us to write the pair as the column vector [x; y]. (Here and below we write a column vector inline by separating its entries with semicolons.) For our purposes the two notations will be interchangeable. It is important to note here that the order matters when we talk about pairs, so in general we have [x; y] ≠ [y; x].

(8) Let R3 be the set of all ordered triples of real numbers, i.e. R3 is the set of all triples (x, y, z) such that x, y, and z are all real numbers. R3 may be visualized geometrically as the set of all points in 3-dimensional Euclidean coordinate space. We will also write elements (x, y, z) of R3 using the column vector notation [x; y; z].

(9) Lastly and most generally, let n ≥ 1 be any natural number. We will let Rn be the set of all ordered n-tuples of real numbers, i.e. the set of all n-tuples (x1, x2, ..., xn) for which each coordinate xi, 1 ≤ i ≤ n, is a real number. We will also use the column vector notation [x1; x2; ...; xn] in this context.

Definition 2 (Set Notation). If A is a set and x is an element of A, then we write:

x ∈ A.

If B is a set such that every element of B is an element of A (i.e. if x ∈ B then x ∈ A), then we call B a subset of A and we write:

B ⊆ A.

In order to distinguish particular subsets we wish to talk about, we will frequently use set-builder notation, which for convenience we will describe informally using examples, rather than give a formal definition. For an example, suppose we wish to formally describe the set E of all even natural numbers (see Example 1 (2)). Then we may write

E = {x ∈ N : x is evenly divisible by 2}.

The above notation should be read as The set of all x in N such that x is evenly divisible by 2, which clearly and precisely defines our set E. For another example, we could write

Y = {x ∈ R : 5 ≤ x < 10},

which reads The set of all x in R such that 5 is less than or equal to x and x is strictly less than 10. The student should easily verify that Y = [5, 10) from Example 1 (3). In general, given a set A and a precise mathematical sentence P(x) about a variable x, the set-builder notation should be read as follows.

{x ∈ A : P(x)}

“The set of all elements x in A such that the sentence P(x) is true for the element x.”
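For readers who also program, set-builder notation has a close analogue in Python's set comprehensions. The sketch below is only an analogy over a finite slice of N (a computer cannot hold all of N, and R is not enumerable at all):

    E = {x for x in range(20) if x % 2 == 0}  # {x in N : x is evenly divisible by 2}, truncated at 20
    print(E)  # {0, 2, 4, 6, 8, 10, 12, 14, 16, 18}

    Y = {x for x in range(5, 10)}  # the integer points of {x in R : 5 <= x < 10}
    print(Y)  # {5, 6, 7, 8, 9}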


2 Vector Spaces and Subspaces.

Definition 3. A (real) vector space is a nonempty set V, whose elements are called vectors, together with an operation +, called addition, and an operation ·, called scalar multiplication, which satisfy the following ten axioms:

Addition Axioms.

(1) If ~u ∈ V and ~v ∈ V, then ~u + ~v ∈ V. (Closure under addition.)

(2) ~u + ~v = ~v + ~u for all ~u, ~v ∈ V. (Commutative property of addition.)

(3) (~u + ~v) + ~w = ~u + (~v + ~w) for all ~u, ~v, ~w ∈ V. (Associative property of addition.)

(4) There exists a vector ~0 ∈ V which satisfies ~u + ~0 = ~u for all ~u ∈ V. (Existence of an additive identity.)

(5) For every ~u ∈ V, there exists a vector −~u ∈ V such that ~u + (−~u) = ~0. (Existence of additive inverses.)

Scalar multiplication axioms.

(6) If ~u ∈ V and c ∈ R, then c · ~u ∈ V. (Closure under scalar multiplication.)

(7) c · (~u + ~v) = c · ~u + c · ~v for all c ∈ R, ~u, ~v ∈ V. (First distributive property of multiplication over addition.)

(8) (c + d) · ~u = c · ~u + d · ~u for all c, d ∈ R, ~u ∈ V. (Second distributive property of multiplication over addition.)

(9) c · (d · ~u) = (c · d) · ~u for all c, d ∈ R, ~u ∈ V. (Associative property of scalar multiplication.)

(10) 1 · ~u = ~u for every ~u ∈ V.

We use the “arrow above” notation to help differentiate vectors (~u, ~v, etc.), which may or may not be real numbers, from scalars, which are always real numbers. When no confusion will arise, we will often drop the · symbol in scalar multiplication and simply write c~u instead of c · ~u, c(~u + ~v) instead of c · (~u + ~v), etc.

Example 2. Let V be an arbitrary vector space.

(1) Prove that ~0 + ~u = ~u for every ~u ∈ V.

(2) Prove that the zero vector ~0 is unique. That is, prove that if ~w ∈ V has the property that ~u + ~w = ~u for every ~u ∈ V, then we must have ~w = ~0.

(3) Prove that for every ~u ∈ V, the additive inverse −~u is unique. That is, prove that if ~w ∈ V has the property that ~u + ~w = ~0, then we must have ~w = −~u.

Proof. (1) By Axiom (2), the commutativity of addition, we have ~0 + ~u = ~u + ~0. Hence by Axiom (4), we have ~0 + ~u = ~u + ~0 = ~u. □


(2) Suppose ~w ∈ V has the property that ~u + ~w = ~u for every ~u ∈ V. Then in particular, we have ~0 + ~w = ~0. But ~0 + ~w = ~w by part (1) above; so ~w = ~0 + ~w = ~0. □

(3) Let ~u ∈ V, and suppose ~w ∈ V has the property that ~u + ~w = ~0. Let −~u be the additive inverse of ~u guaranteed by Axiom (5). Adding −~u to both sides of the equality above, and applying Axioms (2) and (3) (commutativity and associativity), we get

−~u + (~u + ~w) = −~u + ~0
(−~u + ~u) + ~w = −~u
(~u + (−~u)) + ~w = −~u
~0 + ~w = −~u.

Now by part (1) above, we have ~w = ~0 + ~w = −~u. □

Example 3 (Examples of Vector Spaces). (1) The real number line R is a vector space, where both + and · are interpreted in the usual way. In this case the axioms are the familiar properties of real numbers which we learn in elementary school.

(2) Consider the plane R2 = {[x; y] : x, y ∈ R}. Define an operation + on R2 by the rule

[x; y] + [z; w] = [x + z; y + w] for all x, y, z, w ∈ R,

and a scalar multiplication · by the rule

c · [x; y] = [cx; cy] for every c ∈ R, x, y ∈ R.

Then R2 becomes a vector space. (Verify that each of the axioms holds.)

(3) Consider the following purely geometric description. Let V be the set of all arrows in two-dimensional space, with two arrows being regarded as equal if they have the same length and point in the same direction. Define an addition + on V as follows: if ~u and ~v are two arrows in V, then lay them end-to-end, so the base of ~v lies at the tip of ~u. Then define ~u + ~v to be the arrow which shares its base with ~u and its tip with ~v. (A picture helps here.) Define a scalar multiplication · by letting c · ~u be the arrow which points in the same direction as ~u, but whose length is c times the length of ~u. Is V a vector space? (What is the relationship of V with R2?)

(4) In general if n ∈ N, n ≥ 1, then Rn is a vector space, where the addition and scalar multiplication are coordinate-wise a la part (2) above.

(5) Let n ∈ N, and let Pn denote the set of all polynomials of degree at most n. That is, Pn consists of all polynomials of the form

p(x) = a0 + a1x + a2x^2 + ... + anx^n

where the coefficients a0, a1, ..., an are real numbers, and x is an abstract variable. Define + and · as follows: Suppose c ∈ R, and p, q ∈ Pn, so p(x) = a0 + a1x + ... + anx^n and q(x) = b0 + b1x + ... + bnx^n for some coefficients a0, ..., an, b0, ..., bn ∈ R. Then

(p + q)(x) = p(x) + q(x) = (a0 + b0) + (a1 + b1)x + ... + (an + bn)x^n


and

(cp)(x) = c · p(x) = ca0 + ca1x + ... + canx^n.

Then Pn is a vector space.

Definition 4. Let V be a vector space, and let W ⊆ V. If W is also a vector space using the same operations + and · inherited from V, then we call W a vector subspace, or just subspace, of V.

Example 4. For each of the following questions, prove that your answer is correct.

(1) Let V = {[x; y] ∈ R2 : x ≥ 0, y ≥ 0}, so V is the first quadrant of the Cartesian plane. Is V a vector subspace of R2?

(2) Let V be the set of all points on the graph of y = 5x. Is V a vector subspace of R2?

(3) Let V be the set of all points on the graph of y = 5x + 1. Is V a vector subspace of R2?

(4) Is R a vector subspace of R2?

(5) Is {~0} a vector subspace of R2? (This space is called the trivial space.)

Proof. (2) We claim that the set V of all points on the graph of y = 5x is indeed a subspace of R2. To prove this, observe that since V is a subset of R2 with the same addition and scalar multiplication operations, it is immediate that Axioms (2), (3), (7), (8), (9), and (10) hold. (Verify this in your brain!) So we need only check Axioms (1), (4), (5), and (6).

If [x1; y1], [x2; y2] ∈ V, then by the definition of V we have y1 = 5x1 and y2 = 5x2. It follows that y1 + y2 = 5x1 + 5x2 = 5(x1 + x2). Hence the sum [x1 + x2; y1 + y2] satisfies the defining condition of V, and so [x1; y1] + [x2; y2] = [x1 + x2; y1 + y2] ∈ V. So V is closed under addition and Axiom (1) is satisfied.

Check that the additive identity [0; 0] is in V, so Axiom (4) is satisfied.

If [x1; y1] ∈ V and c ∈ R, then c[x1; y1] = [cx1; cy1] ∈ V since cy1 = c(5x1) = 5(cx1), so V is closed under scalar multiplication. Hence Axiom (6) is satisfied. Moreover, if we take c = −1 in the previous equalities, we see that each vector [x1; y1] ∈ V has an additive inverse [−x1; −y1] ∈ V. So Axiom (5) is satisfied. Thus V meets all the criteria to be a vector space, as we claimed. □

(3) On the other hand, if we take V to be the set of all points on the graph of y = 5x + 1, then V is not a vector subspace of R2. To see this, it suffices to check, for instance, that V fails Axiom (1), i.e. V is not closed under addition.

To show that V is not closed under addition, it suffices to exhibit two vectors in V whose sum is not in V. So let ~u = [0; 1] and let ~v = [1; 6]. Both ~u and ~v are in V. (Why?) But their sum ~u + ~v = [1; 7] is not a solution of the equation y = 5x + 1, and hence not in V. So V fails to satisfy Axiom (1), and cannot be a vector space. □

We will not prove the following fact, but the reader should think about why it must be true.


Fact 1. Let V be a vector space, and let W ⊆ V. Then W, together with the addition and scalar multiplication inherited from V, satisfies Axioms (2), (3), (7), (8), (9), (10) in the definition of a vector space.

Theorem 1. Let V be a vector space and let W ⊆ V. Then W is a subspace of V if and only if the following three properties hold:

(1) ~0 ∈ W.

(2) W is closed under addition. That is, for each ~u, ~v ∈ W, we have ~u + ~v ∈ W.

(3) W is closed under scalar multiplication. That is, for each c ∈ R and ~u ∈ W, we have c~u ∈ W.

Proof. One direction of the proof is trivial: if W is a vector subspace of V, then W satisfies the three conditions above because it satisfies Axioms (4), (1), and (6) respectively in the definition of a vector space.

Conversely, suppose W ⊆ V, and W satisfies the three conditions above. The three conditions imply that W satisfies Axioms (4), (1), and (6), respectively, while our previous fact implies that W also satisfies Axioms (2), (3), (7), (8), (9), and (10). So the only axiom left to check is (5).

To see that (5) is satisfied, let ~u ∈ W. Since W ⊆ V, the vector ~u is in V, and hence has an additive inverse −~u ∈ V. We must show that in fact −~u ∈ W. Note that since W is closed under scalar multiplication, the vector −1 · ~u is in W. But on our homework problem #1(d), we show that in fact −1 · ~u = −~u. So −~u = −1 · ~u ∈ W as we hoped, and the proof is complete. □

Example 5. What do the vector subspaces of R2 look like? What about R3?

3 Linear Combinations and Spanning Sets.

Definition 5. Let V be a vector space. Let ~v1, ~v2, ..., ~vn ∈ V, and let c1, c2, ..., cn ∈ R. We say that a vector ~u defined by

~u = c1~v1 + c2~v2 + ... + cn~vn

is a linear combination of the vectors ~v1, ..., ~vn with weights c1, ..., cn. Notice that since addition is associative in any vector space V, we may omit parentheses from the sum above. Also notice that the zero vector ~0 is always a linear combination of any collection of vectors ~v1, ..., ~vn, since we may always take c1 = c2 = ... = cn = 0.

Example 6. Let ~v1 = [−1; 1] and let ~v2 = [2; 1].

(1) Which points in R2 are linear combinations of ~v1 and ~v2, using integer weights?

(2) Which points in R2 are linear combinations of ~v1 and ~v2, using any weights?

Example 7. Let ~a1 = [1; −2; −5] and ~a2 = [2; 5; 6] be vectors in R3.

(1) Let ~b = [7; 4; −3]. May ~b be written as a linear combination of ~a1 and ~a2?

(2) Let ~b = [6; 3; −5]. May ~b be written as a linear combination of ~a1 and ~a2?

Partial Solution. (1) We wish to answer the question: Do there exist real numbers c1 and c2 for which

c1~a1 + c2~a2 = ~b?


Written differently, are there c1, c2 ∈ R for which

c1 · [1; −2; −5] + c2 · [2; 5; 6] = [c1 + 2c2; −2c1 + 5c2; −5c1 + 6c2] = [7; 4; −3]?

Recall that vectors in R3 are equal if and only if their corresponding entries are equal. So to identify c1 and c2 which make the above equality true, we wish to solve the following system of equations in two variables:

c1 + 2c2 = 7

−2c1 + 5c2 = 4

−5c1 + 6c2 = −3

This can be done manually using elementary algebra techniques. We should get c1 = 3 and c2 = 2. Since a solution exists, ~b = 3~a1 + 2~a2 is indeed a linear combination of ~a1 and ~a2. □
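For readers who want to double-check such eliminations by machine, here is a minimal sketch using sympy (the hand computation above is the intended method; the library is just a convenience):

    import sympy as sp

    c1, c2 = sp.symbols('c1 c2')
    system = [sp.Eq(c1 + 2*c2, 7),
              sp.Eq(-2*c1 + 5*c2, 4),
              sp.Eq(-5*c1 + 6*c2, -3)]
    print(sp.solve(system, [c1, c2]))  # {c1: 3, c2: 2}, matching the hand solution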

Definition 6. Let V be a vector space and ~v1, ..., ~vn ∈ V. We denote the set of all possible linear combinations of ~v1, ..., ~vn by Span{~v1, ..., ~vn}, and we call this set the subset of V spanned by ~v1, ..., ~vn. We also call Span{~v1, ..., ~vn} the subset of V generated by ~v1, ..., ~vn.

Example 8. (1) Find Span{[3; −1]} in R2.

(2) Find Span{[1; 0; 0], [0; 0; 1]} in R3.

Theorem 2. Let V be a vector space and let W ⊆ V. Then W is a subspace of V if and only if W is closed under linear combinations, i.e., for every ~v1, ..., ~vn ∈ W and every c1, ..., cn ∈ R, we have c1~v1 + ... + cn~vn ∈ W.

Corollary 1. Let V be a vector space and ~v1, ..., ~vn ∈ V . Then Span{~v1, ..., ~vn} is a subspace of V .

4 Matrix Row Reductions and Echelon Forms.

As our previous few examples should indicate, our techniques for solving systems of linear equations are extremely relevant to our understanding of linear combinations and spanning sets. The student should recall the basics of solving systems from a previous math course, but in this section we will develop a new notationally convenient method for finding solutions to such systems.

Definition 7. Recall that a linear equation in n variables is an equation that can be written in the form

a1x1 + a2x2 + ...+ anxn = b

where a1, ..., an, b ∈ R and x1, ..., xn are variables. A system of linear equations is any finite collection of linear equations involving the same variables x1, ..., xn. A solution to the system is an n-tuple (s1, s2, ..., sn) ∈ Rn that makes each equation in the system true if we substitute s1, ..., sn for x1, ..., xn respectively.

The set of all possible solutions to a system is called the solution set. Two systems are called equivalent if they have the same solution set. It is only possible for a system of linear equations to have no solutions, exactly one solution, or infinitely many solutions. (Geometrically one may think of “parallel lines,” “intersecting lines,” and “coincident lines,” respectively.)


A system of linear equations is called consistent if it has at least one solution; if the system’s solution set is ∅, then the system is inconsistent.

Example 9. Solve the following system of three equations in three variables.

x1 − 2x2 + x3 = 0
2x2 − 8x3 = 8
−4x1 + 5x2 + 9x3 = −9

Solution. Our goal here will be to solve the system using the elimination technique, but with a stripped-down notation which preserves only the necessary information and makes computations by hand go quite a bit faster.

First let us introduce a little terminology. Given the system we are working with, the matrix of coefficients is the 3 × 3 matrix below.

[ 1 −2  1 ]
[ 0  2 −8 ]
[−4  5  9 ]

The augmented matrix of the system is the 3 × 4 matrix below.

[ 1 −2  1 |  0 ]
[ 0  2 −8 |  8 ]
[−4  5  9 | −9 ]

Note that the straight black line between the third and fourth columns in the matrix above is not necessary. Its sole purpose is to remind us where the “equals sign” was in our original system, and it may be omitted if the student prefers.

To solve our system of equations, we will perform operations on the augmented matrix which “encode” the process of elimination. For instance, if we were using elimination on the given system, we might add 4 times the first equation to the third equation, in order to eliminate the x1 variable from the third equation. Let’s do the analogous operation to our matrix instead.

Add 4 times the top row to the bottom row:

[ 1 −2  1 |  0 ]       [ 1 −2  1 |  0 ]
[ 0  2 −8 |  8 ]  ↦   [ 0  2 −8 |  8 ]
[−4  5  9 | −9 ]       [ 0 −3 13 | −9 ]

Notice that the new matrix we obtained above can also be interpreted as the augmented matrix of a system of linear equations, which is the new system we would have obtained by just using the elimination technique. In particular, the first augmented matrix and the new augmented matrix represent systems which are equivalent to one another. (However, the two matrices are obviously not equal to one another, which is why we will stick to the arrow ↦ notation instead of using equals signs!)

Now we will continue the process, replacing our augmented matrix with a new, simpler augmented matrix, in such a way that the linear systems the matrices represent are all equivalent to one another.

Multiply the second row by 1/2:

[ 1 −2  1 |  0 ]       [ 1 −2  1 |  0 ]
[ 0  2 −8 |  8 ]  ↦   [ 0  1 −4 |  4 ]
[ 0 −3 13 | −9 ]       [ 0 −3 13 | −9 ]


Add 3 times the second row to the third row:

[ 1 −2  1 |  0 ]       [ 1 −2  1 | 0 ]
[ 0  1 −4 |  4 ]  ↦   [ 0  1 −4 | 4 ]
[ 0 −3 13 | −9 ]       [ 0  0  1 | 3 ]

Add 4 times the third row to the second row:

[ 1 −2  1 | 0 ]       [ 1 −2  1 |  0 ]
[ 0  1 −4 | 4 ]  ↦   [ 0  1  0 | 16 ]
[ 0  0  1 | 3 ]       [ 0  0  1 |  3 ]

Add −1 times the third row to the first row:

[ 1 −2  1 |  0 ]       [ 1 −2  0 | −3 ]
[ 0  1  0 | 16 ]  ↦   [ 0  1  0 | 16 ]
[ 0  0  1 |  3 ]       [ 0  0  1 |  3 ]

Add 2 times the second row to the first row:

[ 1 −2  0 | −3 ]       [ 1  0  0 | 29 ]
[ 0  1  0 | 16 ]  ↦   [ 0  1  0 | 16 ]
[ 0  0  1 |  3 ]       [ 0  0  1 |  3 ]

Now let’s stop and examine what we’ve done. The last augmented matrix above represents the following system:

x1 = 29
x2 = 16
x3 = 3

So the system is solved, and has a unique solution (29, 16, 3). □
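The entire sequence of row operations can be reproduced in one step by a computer algebra system; a minimal sympy sketch (an optional check, not a replacement for learning the row operations):

    import sympy as sp

    M = sp.Matrix([[1, -2, 1, 0],
                   [0, 2, -8, 8],
                   [-4, 5, 9, -9]])  # the augmented matrix of the system
    R, pivots = M.rref()             # reduced row echelon form
    print(R)                         # last column reads 29, 16, 3: the unique solution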

Definition 8. Given an augmented matrix, the three elementary row operations are the following.

(1) (Replacement) Replace one row by the sum of itself and a multiple of another row.

(2) (Interchange) Interchange two rows.

(3) (Scaling) Multiply all entries in a row by a nonzero constant.

These are the “legal moves” when solving systems of equations using augmented matrices. Two matrices are called row equivalent if one can be transformed into the other using a finite sequence of elementary row operations.

Fact 2. If two augmented matrices are row equivalent, then the two linear systems they represent are equivalent, i.e. they have the same solution set.

Definition 9. A matrix is said to be in echelon form, or row echelon form, if it has the following three properties:

(1) All rows which have nonzero entries are above all rows which have only zero entries.

(2) Each leading nonzero entry of a row is in a column to the right of the leading nonzero entry of the row above it.

(3) All entries in a column below a leading nonzero entry are zeros.


A matrix is said to be in reduced echelon form or reduced row echelon form if it is in echelon form, and also satisfies the following two properties:

(4) Each nonzero leading entry is 1.

(5) Each leading 1 is the only nonzero entry in its column.

Every matrix may be transformed, via elementary row operations, into a matrix in reduced row echelon form. This process is called row reduction.

Example 10. Use augmented matrices and row reductions to find solution sets for the following systems of equations.

(1) 2x1 − 6x3 = −8
    x2 + 2x3 = 3
    3x1 + 6x2 − 2x3 = −4

(2) x1 − 5x2 + 4x3 = −3
    2x1 − 7x2 + 3x3 = −2
    −2x1 + x2 + 7x3 = −1

5 Functions and Function Notation

Definition 10. Let X and Y be sets. A function f from X to Y is just a rule by which we associate to each element x ∈ X, an element f(x) ∈ Y. The input set X is called the domain of f, while the set Y of possible outputs is called the codomain of f. To denote the domain and codomain when we define a function, we write

f : X → Y .

We regard the terms map, mapping, and transformation as all being synonymous with function.

Example 11. (1) For the familiar quadratic function f(x) = x^2, we would write f : R → R, since f takes real numbers for input and returns real numbers for output. Notice that the codomain R here is distinct from the range, i.e. the set of all actual outputs of the function, which the reader should know is just the set [0, ∞). This mismatch is an acceptable part of the notation: the codomain need not equal the range.

(2) On the other hand, for the familiar hyperbolic function f(x) = 1/x, we would NOT write f : R → R; this is because 0 ∈ R but 0 is not part of the domain of f.

6 Linear Transformations

Definition 11. Let V and W be vector spaces. A function T : V →W is a linear transformation if

(1) T (~v + ~w) = T (~v) + T (~w) for every ~v, ~w ∈ V .

(2) T (c~v) = cT (~v) for every ~v ∈ V and c ∈ R.

Note that the transformations which are linear are those which respect the addition and scalar multiplication operations of the vector spaces involved.

Example 12. Are the following maps linear transformations?

(1) T : R → R2, T(x) = [x; 3x].

(2) T : R → R2, T(x) = [x; x^2].

(3) T : R2 → R2, T(~v) = A~v, where A = [1 2; 0 −1].


Fact 3. Every matrix transformation T : Rn → Rm is a linear transformation.

Fact 4. If T : V → W is a linear transformation, then T preserves linear combinations. In other words, for every collection of vectors ~v1, ..., ~vn ∈ V and every choice of weights c1, ..., cn ∈ R, we have

T(c1~v1 + ... + cn~vn) = c1T(~v1) + ... + cnT(~vn).

Definition 12. Let V = Rn be n-dimensional Euclidean space. Define vectors ~e1, ..., ~en ∈ V by:

~e1 = [1; 0; 0; ...; 0]; ~e2 = [0; 1; 0; ...; 0]; ...; ~en = [0; 0; 0; ...; 1].

These vectors ~e1, ..., ~en are called the standard basis for Rn. Note that any vector ~v = [v1; v2; ...; vn] may be written as a linear combination of the standard basis vectors in an obvious way: ~v = v1~e1 + v2~e2 + ... + vn~en.

Example 13. Suppose T : R2 → R3 is a linear transformation such that T(~e1) = [5; −7; 2] and T(~e2) = [−3; 8; 0]. Find an explicit formula for T.

Theorem 3. Let T : Rn → Rm (n, m ∈ N) be a linear transformation. Then there exists a unique m × n matrix A such that T(~v) = A~v for every ~v ∈ Rn. In fact, we have

A = [ T(~e1) T(~e2) ... T(~en) ].

Proof. Notice that for any ~v = [v1; v2; ...; vn] ∈ Rn, we have ~v = v1~e1 + ... + vn~en. Since T is a linear transformation, it respects linear combinations, and hence

T(~v) = v1T(~e1) + ... + vnT(~en) = [ T(~e1) T(~e2) ... T(~en) ] [v1; v2; ...; vn] = A~v,

where A is as we defined in the statement of the theorem. □
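Theorem 3 is easy to test numerically: stack the images of the standard basis vectors as columns and multiply. The sketch below uses the data of Example 13 above (numpy is an arbitrary choice of tool):

    import numpy as np

    # The columns of A are the images of the standard basis vectors.
    A = np.column_stack([np.array([5, -7, 2]),    # T(e1)
                         np.array([-3, 8, 0])])   # T(e2)
    v = np.array([2, 1])  # an arbitrary input vector
    print(A @ v)          # 2*T(e1) + 1*T(e2) = [7 -6 4]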

Example 14. Let T : R2 → R2 be the linear transformation which scales up every vector by a factor of 3, i.e. T(~v) = 3~v for every ~v ∈ R2. Find a matrix representation for T.

Example 15. Let T : R2 → R2 be the linear transformation which rotates the plane about the origin at some fixed angle θ. Is T a linear transformation? If so, find its matrix representation.


7 The Null Space of a Linear Transformation

Definition 13. Let V and W be vector spaces, and let T : V → W be a linear transformation. Define the following set:

NulT = {~v ∈ V : T (~v) = ~0}.

Then this set NulT is called the null space or kernel of the map T .

Example 16. Let T : R3 → R3 be the matrix transformation T(~v) = A~v, where A = [1 −1 5; 2 0 7; −3 −5 −3].

(1) Let ~u = [−7; 3; 2]. Is ~u ∈ NulT?

(2) Let ~w = [−5; −5; 0]. Is ~w ∈ NulT?
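Checking membership in NulT is a single matrix-vector multiplication; a quick numpy sketch for this example (the hand computation is equally short):

    import numpy as np

    A = np.array([[1, -1, 5],
                  [2, 0, 7],
                  [-3, -5, -3]])
    u = np.array([-7, 3, 2])
    w = np.array([-5, -5, 0])
    print(A @ u)  # [0 0 0], so u is in NulT
    print(A @ w)  # [0 -10 40], nonzero, so w is not in NulT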

Theorem 4. Let V, W be vector spaces and T : V → W a linear transformation. Then NulT is a vector subspace of V.

Proof. We will show that NulT is closed under taking linear combinations, and hence the result will follow from Theorem 2.

Let ~v1, ..., ~vn ∈ NulT and c1, ..., cn ∈ R be arbitrary. We must show that c1~v1 + ... + cn~vn ∈ NulT, i.e. that T(c1~v1 + ... + cn~vn) = ~0. To see this, simply observe that since T is a linear transformation, we have

T(c1~v1 + ... + cn~vn) = c1T(~v1) + ... + cnT(~vn).

But since ~v1, ..., ~vn ∈ NulT , we have T (~v1) = ... = T ( ~vn) = ~0. So in fact we have

T (c1 ~v1 + ...+ cn ~vn) = c1~0 + ...+ cn~0 = ~0.

It follows that c1~v1 + ... + cn~vn ∈ NulT, and NulT is closed under linear combinations. This completes the proof. □

Example 17. For the following matrix transformations, give an explicit description of NulT by finding a spanning set.

(1) T : R4 → R2, T([x1; x2; x3; x4]) = [1 2 4 0; 0 1 3 −2] [x1; x2; x3; x4] for every x1, x2, x3, x4 ∈ R.

(2) T : R5 → R3, T([x1; x2; x3; x4; x5]) = [1 −4 0 2 0; 0 0 1 −5 0; 0 0 0 0 2] [x1; x2; x3; x4; x5] for every x1, x2, x3, x4, x5 ∈ R.
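For part (1), a spanning set can be read off from the row-reduced matrix by hand, or produced directly by a computer algebra system; a sympy sketch (one of several equivalent tools):

    import sympy as sp

    A = sp.Matrix([[1, 2, 4, 0],
                   [0, 1, 3, -2]])
    for v in A.nullspace():  # vectors spanning NulT
        print(v.T)           # prints [2, -3, 1, 0] and [-4, 2, 0, 1]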

Definition 14. Let V, W be sets. A mapping T : V → W is called one-to-one if the following holds: for every ~v, ~w ∈ V, if ~v ≠ ~w, then T(~v) ≠ T(~w).


Equivalently, T is one-to-one if the following holds: for every ~v, ~w ∈ V, if T(~v) = T(~w), then ~v = ~w.

In other words, a map is one-to-one precisely when it sends distinct elements in V to distinct elements in W, i.e. no two elements in V are mapped to the same place in W.

Example 18. Are the following linear transformations one-to-one?

(1) T : R → R2, T(x) = [x; 3x].

(2) T : R2 → R2, T([x1; x2]) = [1 2; 2 4] [x1; x2].

Theorem 5. A linear transformation T : V → W is one-to-one if and only if NulT = {~0}.

Proof. (⇒) First suppose T is one-to-one, and let ~v ∈ NulT. We will show ~v = ~0. To see this, note that T(~v) = ~0 since ~v ∈ NulT. But we also have ~0 ∈ NulT since NulT is a subspace of V, so T(~0) = ~0 = T(~v). Since T is one-to-one, we must have ~0 = ~v, and hence NulT is the trivial subspace NulT = {~0}.

(⇐) Conversely, suppose T is not one-to-one; we will show NulT is non-trivial. Since T is not one-to-one, there exist two distinct vectors ~v, ~w ∈ V, ~v ≠ ~w, such that T(~v) = T(~w). Set ~u = ~v − ~w. Since ~v ≠ ~w and additive inverses are unique, we have ~u ≠ ~0, and we also have

T(~u) = T(~v − ~w) = T(~v) − T(~w) = T(~v) − T(~v) = ~0.

So ~u ∈ NulT. Since ~u is not the zero vector, NulT ≠ {~0}. □

Definition 15. Let V, W be sets. A mapping T : V → W is called onto if the following holds: for every ~w ∈ W, there exists a vector ~v ∈ V such that T(~v) = ~w.

Example 19. Are the following linear transformations onto?

(1) T : R → R2, T(x) = [x; 3x].

(2) T : R2 → R, T([x1; x2]) = x2 − x1.

(3) T : R2 → R2, T([x1; x2]) = [x2; 2x1 + x2].

Definition 16. Let V, W be vector spaces and T : V → W a linear transformation. If T is both one-to-one and onto, then T is called an isomorphism. In this case the domain V and codomain W are called isomorphic as vector spaces, or just isomorphic. It means that V and W are indistinguishable from one another in terms of their vector space structure.

Example 20. Prove that the mapping T : R2 → R2, where T rotates the plane by a fixed angle θ, is an isomorphism.

Example 21. Let W be the graph of the line y = 3x, a vector subspace of R2. Prove that R is isomorphic to W.

8 The Range Space of a Linear Transformation and the Column Space of a Matrix

Definition 17. Let V, W be vector spaces and T : V → W a linear transformation. Define the following set:

RanT = {~w ∈W : there exists a vector ~v ∈ V such that T (~v) = ~w}.


Then RanT is called the range of T. (This definition should coincide with the student’s knowledge of the range of a function from previous courses.)

Theorem 6. Let V, W be vector spaces and T : V → W a linear transformation. Then RanT is a vector subspace of W.

Proof. Again we will show that RanT is closed under linear combinations, and appeal to Theorem 2.

To that end, let ~w1, ..., ~wn ∈ RanT and let c1, ..., cn ∈ R all be arbitrary. We must show c1~w1 + ... + cn~wn ∈ RanT. To see this, note that since ~w1, ..., ~wn ∈ RanT, there exist vectors ~v1, ..., ~vn ∈ V such that T(~v1) = ~w1, ..., T(~vn) = ~wn. Set

~u = c1 ~v1 + ...+ cn ~vn.

Since V is a vector space, V is closed under taking linear combinations and hence ~u ∈ V. Moreover, we have

T (~u) = T (c1 ~v1 + ...+ cn ~vn)

= c1T (~v1) + ...+ cnT ( ~vn)

= c1 ~w1 + ...+ cn ~wn.

So the vector c1~w1 + ... + cn~wn is the image of ~u under T, and hence c1~w1 + ... + cn~wn ∈ RanT. This shows RanT is closed under linear combinations and hence a vector subspace of W. □

Theorem 7. Let V, W be vector spaces and T : V → W a linear transformation. Then T is onto if and only if RanT = W.

Proof. Obvious if you think about it! □

Corollary 2. Let V, W be vector spaces and T : V → W a linear transformation. Then the following statements are all equivalent:

(1) T is an isomorphism.

(2) T is one-to-one and onto.

(3) NulT = {~0} and RanT = W .

Definition 18. Let m, n ∈ N, and let A be an m × n matrix (so A induces a linear transformation from Rn into Rm). Write

A = [ ~w1 ... ~wn ],

where each of ~w1, ..., ~wn is a column vector in Rm. Define the following set:

ColA = Span{ ~w1, ..., ~wn}.

Then ColA is called the column space of the matrix A. ColA is exactly the set of all vectors in Rm which may be written as a linear combination of the columns of the matrix A. Of course ColA is also a vector subspace of Rm, by Corollary 1.


Example 22. Let A = [0 1; 2 1; 0 0].

(1) Determine whether or not the vector [3; 5; 0] is in ColA.

(2) What does ColA look like in R3?
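Part (1) asks whether A~c = ~b has a solution; here is a sympy sketch (row reduction by hand settles it just as well):

    import sympy as sp

    A = sp.Matrix([[0, 1],
                   [2, 1],
                   [0, 0]])
    b = sp.Matrix([3, 5, 0])
    print(sp.linsolve((A, b)))  # {(1, 3)}: b = 1*(first column) + 3*(second column), so b is in ColA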

Example 23. Let W = { [6a − b; a + b; −7a] : a, b ∈ R }. Find a matrix A so that W = ColA.

Theorem 8. Let T : Rn → Rm (n, m ∈ N) be a linear transformation. Let A be the m × n matrix representation of T guaranteed by Theorem 3. Then RanT = ColA.

Proof. Recall from Theorem 3 that the matrix A is given by

A = [ T(~e1) T(~e2) ... T(~en) ],

where ~e1, ..., ~en are the standard basis vectors in Rn. So its columns are just T(~e1), ..., T(~en).

The first thing we will show is that RanT ⊆ ColA. So suppose ~w ∈ RanT. Then there exists some vector ~v ∈ Rn for which T(~v) = ~w. Write ~v = [v1; ...; vn] = v1~e1 + ... + vn~en for some real numbers v1, ..., vn ∈ R. Then, since T is a linear transformation, we have

~w = T(~v) = T(v1~e1 + ... + vn~en) = v1T(~e1) + ... + vnT(~en).

So the above equality displays ~w as a linear combination of the columns of A, with weights v1, ..., vn. This proves ~w ∈ ColA. Since ~w was taken arbitrarily out of RanT, we must have RanT ⊆ ColA.

The next thing we will show is that ColA ⊆ RanT. So let ~w ∈ ColA. Then ~w can be written as a linear combination of the columns of A, so there exist some weights c1, ..., cn ∈ R for which

~w = c1T(~e1) + ... + cnT(~en).

Set ~v = c1~e1 + ... + cn~en. So ~v ∈ Rn, and we claim that T(~v) = ~w. To see this, just compute (again using the fact that T is linear):

T(~v) = T(c1~e1 + ... + cn~en) = c1T(~e1) + ... + cnT(~en) = ~w.


This shows ~w ∈ RanT, and again since ~w was taken arbitrarily out of ColA, we have shown that ColA ⊆ RanT.

Since RanT ⊆ ColA and ColA ⊆ RanT, we must have RanT = ColA. This completes the proof. □

Example 24. Let T : R2 → R2 be the linear transformation defined by T(~e1) = −2~e1 + 4~e2 and T(~e2) = −~e1 + 2~e2. Let ~w = [2; 1]. Is ~w ∈ RanT?

9 Linear Independence and Bases.

Definition 19. Let V be a vector space and let ~v1, ..., ~vn ∈ V . Consider the equation:

c1 ~v1 + ...+ cn ~vn = ~0

where c1, ..., cn are interpreted as real variables. Notice that c1 = ... = cn = 0 is always a solution to the equation, which is called the trivial solution, but that there may be others depending on our choice of ~v1, ..., ~vn.

If there exists a nonzero solution (c1, ..., cn) to the equation, i.e. a solution where ck ≠ 0 for at least one ck (1 ≤ k ≤ n), then the vectors ~v1, ..., ~vn are called linearly dependent. Otherwise, if the trivial solution is the only solution, then the vectors ~v1, ..., ~vn are called linearly independent.

Example 25. Let ~v1 = [1; 2; 3], ~v2 = [4; 5; 6], and ~v3 = [2; 1; 0].

(1) Are the vectors ~v1, ~v2, ~v3 linearly independent?

(2) If possible, find a dependence relation, i.e. a non-trivial linear combination of ~v1, ~v2, ~v3 which sums to ~0.
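A machine check for this example (sympy assumed; the columns of the matrix below are ~v1, ~v2, ~v3):

    import sympy as sp

    A = sp.Matrix([[1, 4, 2],
                   [2, 5, 1],
                   [3, 6, 0]])   # columns are v1, v2, v3
    print(A.rank())              # 2 < 3, so the vectors are linearly dependent
    print(A.nullspace()[0].T)    # [2, -1, 1], giving the dependence relation 2*v1 - v2 + v3 = 0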

Fact 5. Let ~v1, ..., ~vn be column vectors in Rm. Define an m × n matrix A by:

A = [ ~v1 ... ~vn ],

and define a linear transformation T : Rn → Rm by the rule T(~v) = A~v for every ~v ∈ Rn. Then the following statements are all equivalent:

(1) The vectors ~v1, ..., ~vn are linearly independent.

(2) NulT = {~0}.

(3) T is one-to-one.

Example 26. Let ~v1 = [2; 1] and ~v2 = [4; −1]. Are ~v1 and ~v2 linearly independent?

Fact 6. Let V be a vector space and ~v1, ..., ~vn ∈ V. Then ~v1, ..., ~vn are linearly dependent if and only if there exists some k ∈ {1, ..., n} such that ~vk ∈ Span{~v1, ..., ~vk−1, ~vk+1, ..., ~vn}, i.e. ~vk can be written as a linear combination of the other vectors.

Example 27. Let ~v1 = [2; 1] ∈ R2.

(1) Describe all vectors ~v2 ∈ R2 for which ~v1, ~v2 are linearly independent.


(2) Describe all pairs of vectors ~v2, ~v3 ∈ R2 for which ~v1, ~v2, ~v3 are linearly independent.

Example 28. Let ~v1 = [0; 1; 3] and ~v2 = [0; 3; −1]. Describe all vectors ~v3 ∈ R3 for which ~v1, ~v2, ~v3 are linearly independent.

Definition 20. Let V be a vector space and let ~b1, ..., ~bn ∈ V. The set {~b1, ..., ~bn} ⊆ V is called a basis for V if

(1) ~b1, ..., ~bn are linearly independent, and

(2) V = Span{~b1, ..., ~bn}.

Example 29. Let ~v1 = [3; 0; −6], ~v2 = [−4; 1; 7], and ~v3 = [−2; 1; 5]. Is {~v1, ~v2, ~v3} a basis for R3?
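In R3 this reduces to a row reduction: three vectors form a basis exactly when the matrix having them as columns row reduces to the identity. A sympy sketch:

    import sympy as sp

    A = sp.Matrix([[3, -4, -2],
                   [0, 1, 1],
                   [-6, 7, 5]])  # columns are v1, v2, v3
    print(A.rref()[0])           # the 3x3 identity: three pivots, so {v1, v2, v3} is a basis for R3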

Example 30. Let ~v1 = [2; 3] and ~v2 = [5; 0].

(1) Is {~v1, ~v2} a basis for R2?

(2) What if ~v2 = [−10; −15]?

Fact 7. The standard basis {~e1, ..., ~en} is a basis for Rn.

Fact 8. The set {1, x, x2, ..., xn} is a basis for Pn.

Fact 9. Let V be any vector space and let B = {~b1, ..., ~bn} be a basis for V .

(1) If you remove any one element ~bk from B, then the resulting set no longer spans V. This is because ~bk is linearly independent from the other vectors in B, and hence cannot be written as a linear combination of them by Fact 6. In this sense a basis is a “minimal spanning set.”

(2) If you add any one element ~v, not already in B, to B, then the resulting set is no longer linearly independent. This is because B spans V, and hence there are no vectors in V which are independent from those in B by Fact 6. In this sense a basis is a “maximal linearly independent set.”

Theorem 9 (Unique Representation Theorem). Let V be a vector space and let B = {~b1, ..., ~bn} be a basis for V. Then every vector ~v ∈ V may be written as a linear combination ~v = c1~b1 + ... + cn~bn in one and only one way, i.e. ~v has a unique representation with respect to B.

Proof. Since {~b1, ..., ~bn} is a basis for V, in particular the set spans V, so we have ~v ∈ Span{~b1, ..., ~bn}. It follows that there exist some scalars c1, ..., cn ∈ R for which

~v = c1 ~b1 + ...+ cn ~bn.

So we need only check that the representation above is unique. To that end, suppose d1, ..., dn ∈ R is another set of scalars for which

~v = d1 ~b1 + ...+ dn ~bn.

We will show that in fact d1 = c1, ..., dn = cn, and hence there is really only one choice of scalars to begin with. To see this, observe the following equalities:


(d1 − c1)~b1 + ... + (dn − cn)~bn = [d1~b1 + ... + dn~bn] − [c1~b1 + ... + cn~bn]
= ~v − ~v
= ~0.

So we have written ~0 as a linear combination of ~b1, ..., ~bn. But the collection is a basis and hence linearly independent, so the coefficients must all be zero, i.e. d1 − c1 = ... = dn − cn = 0. The conclusion now follows. □

10 Dimension

Theorem 10 (Steinitz Exchange Lemma). Let V be a vector space and let B = {~b1, ..., ~bn} be a basis for V. Let ~v ∈ V. By Theorem 9 there are unique scalars c1, ..., cn for which

~v = c1 ~b1 + ...+ cn ~bn.

If there is k ∈ {1, ..., n} for which ck ≠ 0, then exchanging ~v for ~bk yields another basis for the space, i.e. the collection {~b1, ..., ~bk−1, ~v, ~bk+1, ..., ~bn} is a basis for V.

Proof. Call the new collection B̂ = {~b1, ..., ~bk−1, ~v, ~bk+1, ..., ~bn}. We must show that B̂ is both linearly independent and that it spans V.

First we check that B̂ is linearly independent. Consider the equation

d1 ~b1 + ...+ dk−1 ~bk−1 + dk~v + dk+1~bk+1 + ...+ dn ~bn = ~0.

Let’s substitute the representation for ~v in the above:

d1 ~b1 + ...+ dk−1 ~bk−1 + dk(c1 ~b1 + ...+ cn ~bn) + dk+1~bk+1 + ...+ dn ~bn = ~0.

Now collecting like terms we get:

(d1 + dkc1)~b1 + ... + (dk−1 + dkck−1)~bk−1 + dkck~bk + (dk+1 + dkck+1)~bk+1 + ... + (dn + dkcn)~bn = ~0.

Now since ~b1, ..., ~bn are linearly independent, all the coefficients in the above equation must be zero. In particular, we have dkck = 0. But ck ≠ 0 by our hypothesis, so we may divide through by ck and get dk = 0. Now substituting 0 back in for dk we have:

d1~b1 + ... + dk−1~bk−1 + dk+1~bk+1 + ... + dn~bn = ~0.

Now using the linear independence of ~b1, ..., ~bn one more time, we conclude that d1 = ... = dk−1 = dk = dk+1 = ... = dn = 0. So the only solution is the trivial solution, and hence B̂ is linearly independent.

It remains only to check that V = Span B̂. To see this, let ~w ∈ V be arbitrary. By Theorem 9, write ~w as a linear combination of the vectors in B:

~w = a1~b1 + ... + an~bn


for some scalars a1, ..., an ∈ R. Now notice that since ~v = c1~b1 + ... + cn~bn and since ck ≠ 0, we may solve for the vector ~bk as follows:

~bk = −(c1/ck)~b1 − ... − (ck−1/ck)~bk−1 + (1/ck)~v − (ck+1/ck)~bk+1 − ... − (cn/ck)~bn.

Note that for the above to make sense, it is crucial that ck ≠ 0. Now, if we substitute the above expression for ~bk in the equation ~w = a1~b1 + ... + an~bn and then collect like terms, we will see ~w written as a linear combination of the vectors ~b1, ..., ~bk−1, ~bk+1, ..., ~bn and the vector ~v. This implies that ~w ∈ Span B̂. Since ~w was arbitrary in V, we have V = Span B̂. So B̂ is indeed a basis for V and the theorem is proved. □

Corollary 3. Let V be a vector space which has a finite basis. Then every basis of V is the same size.

Proof. Let B = {~b1, ..., ~bn} be a basis of minimal size. Let D be any other basis for V, consisting of arbitrarily many elements of V. We will show that in fact D has exactly n elements.

Let ~d1 ∈ D be arbitrary. Since ~d1 ∈ V = Span{~b1, ..., ~bn}, it is possible to write ~d1 as a nontrivial linear combination ~d1 = c1~b1 + ... + cn~bn, i.e. ck ≠ 0 for at least one k ∈ {1, ..., n}. By reordering the terms of the basis B if necessary, we may assume without loss of generality that c1 ≠ 0. Define a new set B1 = {~d1, ~b2, ..., ~bn}; by the Steinitz Exchange Lemma, B1 is also a basis for V.

Now we go one step further. Since B was a basis of minimal size, we know D has at least n many elements. So let ~d2 ∈ D be distinct from ~d1. We know ~d2 ∈ V = Span B1, so there exist constants c1, ..., cn for which ~d2 = c1~d1 + c2~b2 + ... + cn~bn. Notice that we must have ck ≠ 0 for some k ∈ {2, ..., n}, since if c1 were the only nonzero coefficient, then ~d2 would be a scalar multiple of ~d1, which is not the case since they are linearly independent. By reordering the terms ~b2, ..., ~bn if necessary, we may assume without loss of generality that c2 ≠ 0. Define B2 = {~d1, ~d2, ~b3, ..., ~bn}. By the Steinitz Exchange Lemma, B2 is also a basis for V.

Continue this process. At each step k ∈ {1, ..., n}, we will have obtained a new basis Bk = {~d1, ..., ~dk, ~bk+1, ..., ~bn} for V. Since D has at least n many elements, there is a ~dk+1 ∈ D which is not equal to any of ~d1, ..., ~dk. Write ~dk+1 = c1~d1 + ... + ck~dk + ck+1~bk+1 + ... + cn~bn for some constants c1, ..., cn, and observe that one of the constants ck+1, ..., cn must be nonzero since ~dk+1 is independent from ~d1, ..., ~dk. Then reorder the terms ~bk+1, ..., ~bn and use the Steinitz Exchange Lemma to replace ~bk+1 with ~dk+1 to obtain a new basis.

After finitely many steps we end up with the basis Bn = {~d1, ..., ~dn} ⊆ D. Since Bn spans V, it is not possible for D to contain any more elements, since nothing in V is linearly independent from the vectors in Bn. So in fact D = Bn and the proof is complete. □

Definition 21. Let V be a vector space, and suppose V has a finite basis B = {~b1, ..., ~bn} consisting of n vectors. Then we say that the dimension of V is n, or that V is n-dimensional. We also write that dimV = n. Notice that every finite-dimensional space V has a unique dimension by Corollary 3.

If V has no finite basis, we say that V is infinite-dimensional.

Example 31. Determine the dimension of the following vector spaces.

(1) R.

(2) R2.

(3) R3.

(4) Rn.


(5) Pn.

(6) The trivial space {~0}.

(7) Let V be the set of all infinite sequences (a1, a2, a3, ...) of real numbers which eventually end in an infinite string of 0’s. (Verify that V is a vector space using coordinate-wise addition and scalar multiplication.)

(8) The space P of all polynomials. (Verify that P is a vector space using standard polynomial addition and scalar multiplication.)

(9) Let V be the set of all continuous functions f : R → R. (Verify that V is a vector space using function addition and scalar multiplication.)

(10) Any line through the origin in R3.

(11) Any line through the origin in R2.

(12) Any plane through the origin in R3.

Corollary 4. Suppose V is a vector space with dimV = n. Then no linearly independent set in V consists of more than n elements.

Proof. This follows from the proof of Corollary 3, since when considering D we only used the fact that D was linearly independent, not spanning. □

Corollary 5. Let V be a finite-dimensional vector space. Any linearly independent set in V may be expanded to make a basis for V.

Proof. If a linearly independent set {~d1, ..., ~dn} already spans V then it is already a basis. If it doesn’t span, then we can find a linearly independent vector ~dn+1 ∉ Span{~d1, ..., ~dn} by Fact 6 and consider {~d1, ..., ~dn, ~dn+1}. If it spans V, then we’re done. Otherwise, continue adding vectors. By the previous corollary, we know the process stops in finitely many steps. □

Corollary 6. Let V be a finite-dimensional vector space. Any spanning set in V may be shrunk to make a basis for V.

Proof. Let S be a spanning set for V, i.e. V = Span S. Without loss of generality we may assume that ~0 ∉ S, for if ~0 is in S, then we can toss it out and the set S will still span V.

If S is empty then V = SpanS = {~0} and S is already a basis, and we are done.

If S is not empty, then choose ~s1 ∈ S. If V = Span{~s1}, we are done and {~s1} is a basis. Otherwise, observe that if every vector in S were in Span{~s1}, then we would have Span S = Span{~s1}; since S spans V and {~s1} does not, this is impossible. So there is some ~s2 ∈ S with ~s2 ∉ Span{~s1}, i.e. ~s1 and ~s2 are linearly independent.

If {~s1, ~s2} spans V, we are done. Otherwise we can find a third linearly independent vector ~s3, and so on. The process stops in finitely many steps. □

Corollary 7. Let V be an n-dimensional vector space. Then a set of n vectors in V is linearly independent if and only if it spans V.

11 Coordinate Systems

Definition 22. Let V be a vector space and B = {~b1, ..., ~bn} be a basis for V. Let ~v ∈ V. By the Unique Representation Theorem, there are unique weights c1, ..., cn ∈ R for which ~v = c1~b1 + ... + cn~bn. Call


these constants c1, ..., cn the coordinates of ~v relative to B. Define the coordinate representation of ~v relative to B to be the vector

[~v]B = [c1; ...; cn] ∈ Rn.

Example 32. Consider a basis B = {~b1, ~b2} for R2, where ~b1 = [1; 0] and ~b2 = [1; 2].

(1) Suppose a vector ~v ∈ R2 has B-coordinate representation [~v]B = [−2; 3]. Find ~v.

(2) Let ~w = [−1; 8]. Find [~w]B.
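Both parts are one small linear system. In the sketch below, P is the matrix with columns ~b1 and ~b2 (this matrix is named PB in Fact 10 later in these notes):

    import sympy as sp

    P = sp.Matrix([[1, 1],
                   [0, 2]])             # columns are b1, b2
    print(P * sp.Matrix([-2, 3]))       # part (1): v = -2*b1 + 3*b2 = [1; 6]
    print(P.solve(sp.Matrix([-1, 8])))  # part (2): [w]_B = [-5; 4]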

Theorem 11. Let V be a vector space and B = {~b1, ..., ~bn} be a basis for V. Define a mapping T : V → Rn by T(~v) = [~v]B. Then T is an isomorphism.

Proof. (T is one-to-one.) Let ~v1, ~v2 ∈ V be such that [~v1]B = [~v2]B. Then ~v1 and ~v2 have the same coordinates relative to B. Now the Unique Representation Theorem implies that ~v1 = ~v2. So T is one-to-one.

(T is onto.) For every vector [c1; ...; cn] ∈ Rn, there corresponds a vector ~v = c1~b1 + ... + cn~bn ∈ V. Obviously T(~v) = [c1; ...; cn], so the map T is onto.

(T is linear.) Let ~v1, ~v2 ∈ V be arbitrary, and let k ∈ R be an arbitrary scalar. Use the Unique Representation Theorem to find their unique coordinates ~v1 = c1~b1 + ... + cn~bn and ~v2 = d1~b1 + ... + dn~bn. Now just check that the linearity conditions hold:

T(~v1 + ~v2) = T((c1~b1 + ... + cn~bn) + (d1~b1 + ... + dn~bn))
= T((c1 + d1)~b1 + ... + (cn + dn)~bn)
= [c1 + d1; ...; cn + dn]
= [c1; ...; cn] + [d1; ...; dn]
= T(~v1) + T(~v2);

and

T(k~v1) = T(kc1~b1 + ... + kcn~bn)
= [kc1; ...; kcn]
= k[c1; ...; cn]
= kT(~v1).

This completes the proof. □


Corollary 8. Any two n-dimensional vector spaces are isomorphic to one another (and to Rn).

Recall that isomorphisms are exactly the maps which perfectly preserve vector space structure. In other words, suppose V is a vector space and T : V → Rn is an isomorphism. Then ~v1, ..., ~vn are independent in V if and only if T(~v1), ..., T(~vn) are independent in Rn; the set {~v1, ..., ~vn} spans V if and only if the set {T(~v1), ..., T(~vn)} spans Rn; a linear combination c1~v1 + ... + cn~vn is equal to ~w in V if and only if c1T(~v1) + ... + cnT(~vn) is equal to T(~w) in Rn, etc., etc.

So the previous theorem and corollary imply that all problems in an n-dimensional vector space V may be effectively translated into problems in Rn, which we generally know how to handle via matrix computations.

Example 33. Determine whether or not the vectors 1 + 2x^2, 4 + x + 5x^2, and 3 + 2x are linearly independent in P2.

Example 34. Determine whether or not 1 + x + x^2 ∈ Span{1 + 5x, −4x + 2x^2} in P2.

Example 35. Define a map T : R3 → P2 by T([a; b; c]) = (a + b)x^2 + c for every a, b, c ∈ R.

(1) Is T a linear transformation?

(2) Is T one-to-one?

Example 36. Let ~v1 = [3; 6; 2] and ~v2 = [−1; 0; 1], and let B = {~v1, ~v2}. The vectors in B are linearly independent, so B is a basis for V = Span{~v1, ~v2}. Let ~w = [3; 12; 7]. Determine if ~w is in V, and if it is, find [~w]B.

Fact 10 (Change of Basis: Non-Standard into Standard). Let ~v ∈ Rn and let B = {~b1, ..., ~bn} be any basis for Rn. Then there exists a unique matrix PB with the property that

~v = PB[~v]B.

We call PB the change-of-coordinates matrix from B to the standard basis in Rn. Moreover, we have

PB = [ ~b1 ... ~bn ].

Example 37. Let ~b1 = [1; −1; 3], ~b2 = [2; 0; 8], and ~b3 = [2; −2; 4]. Then ~b1, ~b2, ~b3 are three linearly independent vectors and hence B = {~b1, ~b2, ~b3} forms a basis for the three-dimensional space R3. Suppose [~v]B = [1; −2; 3]. Find ~v.
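A quick machine check of Fact 10 on this example (sympy assumed):

    import sympy as sp

    PB = sp.Matrix([[1, 2, 2],
                    [-1, 0, -2],
                    [3, 8, 4]])  # columns are b1, b2, b3
    vB = sp.Matrix([1, -2, 3])   # the given [v]_B
    print(PB * vB)               # v = PB [v]_B = [3; -7; -1]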

The previous fact gives us an easy tool for converting a vector in Rn from its non-standard coordinate representation [~v]B into its standard representation ~v. What about going the opposite direction, i.e. converting from standard coordinates to non-standard? The idea will not be hard: since we can write ~v = PB[~v]B using a B-to-standard matrix PB, we would like to be able to write [~v]B = PB^{-1}~v and regard the inverse matrix PB^{-1} as a change-of-basis from standard coordinates to B-coordinates. In order to justify this idea carefully, we first need to recall a few facts about inverting matrices.


12 Matrix Inversion

Definition 23. For every n ∈ N, there is a unique n× n matrix I with the property that

AI = IA = A

for every n× n matrix A. This matrix is given by

I = [ ~e1 ... ~en ]

where ~e1, ..., ~en are the standard basis vectors for Rn.

Let A be an n × n matrix. If there exists a matrix C for which AC = CA = I, then we say A is invertible and we call C the inverse of A. We denote A^{-1} = C, so

AA^{-1} = A^{-1}A = I.

Fact 11. Let A = [a b; c d]. If ad − bc ≠ 0, then A is invertible and

A^{-1} = (1/(ad − bc)) [d −b; −c a].

If ad − bc = 0, then A is not invertible.

Example 38. Find the inverse of A = [3 4; 5 6] and verify that it is an inverse.
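A sketch of the verification with numpy, applying Fact 11's formula and then checking the defining property of the inverse:

    import numpy as np

    A = np.array([[3, 4],
                  [5, 6]])
    det = 3*6 - 4*5                        # ad - bc = -2, nonzero, so A is invertible
    A_inv = (1/det) * np.array([[6, -4],
                                [-5, 3]])  # Fact 11's formula
    print(A_inv)
    print(A @ A_inv)                       # the 2x2 identity (up to floating-point roundoff)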

Example 39. Give an example of a matrix which is not invertible.

Theorem 12. If A is an invertible n × n matrix, then for each ~w ∈ Rn, there is a unique ~v ∈ Rn for which A~v = ~w.

Proof. Set ~v = A^{-1}~w. Then A~v = AA^{-1}~w = I~w = ~w. If ~u were another vector with the property that A~u = ~w = A~v, then we would have A^{-1}A~u = A^{-1}A~v and hence I~u = I~v. So ~u = ~v and this vector is unique. □

Fact 12. Let A be an n × n matrix. If A is row equivalent to I, then [A | I] is row equivalent to [I | A^{-1}]. Otherwise A is not invertible.

Example 40. Find the inverse of A = [0 1 2; 1 0 3; 4 −3 8], if it exists.
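Fact 12's recipe can be carried out mechanically; a sympy sketch for this matrix, row reducing [A | I] and reading off the right-hand block:

    import sympy as sp

    A = sp.Matrix([[0, 1, 2],
                   [1, 0, 3],
                   [4, -3, 8]])
    aug = A.row_join(sp.eye(3))  # the augmented matrix [A | I]
    R = aug.rref()[0]
    print(R[:, 3:])              # the right-hand block, which is A^{-1}
    print(A.inv() == R[:, 3:])   # True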

Example 41. Find A^{-1} if possible.

(1) A = [0 3 −1; 1 0 1; 1 −1 0]

(2) A = [1 −2 −1; −1 5 6; 5 −4 5]

13 The Connection Between Isomorphisms and Invertible Matrices

Definition 24. Let V be a vector space. Define the identity map with respect to V, denoted idV : V → V, to be the function which leaves V unchanged, i.e. idV(~v) = ~v for every ~v ∈ V. It is easy to verify mentally that idV is a one-to-one and onto linear transformation, i.e. an isomorphism.


Now let W be another vector space and let T : V → W be an isomorphism. Then there is a unique inverse map T^{-1} : W → V, with the property that

T^{-1} ∘ T = idV and T ∘ T^{-1} = idW,

or

T^{-1}(T(~v)) = ~v for every ~v ∈ V and T(T^{-1}(~w)) = ~w for every ~w ∈ W.

Example 42. Let T : R3 → P2 be defined by T([a; b; c]) = ax^2 + bx + c. We proved T is an isomorphism on the homework. Find an expression for T^{-1}.

Theorem 13. Let T : Rn → Rn be an isomorphism and let A be its unique n × n matrix representation as guaranteed by Theorem 3. Then A is invertible and the matrix representation of T^{-1} is A^{-1}.

Proof. First notice that if T is an isomorphism, it is one-to-one and hence the columns of A are linearly independent by Fact 5. This means that A may be row reduced to the n × n identity matrix I, and hence A is invertible by Fact 12.

Notice that the unique n × n matrix representation of the identity map idRn must be the n × n identity matrix I. It is sufficient to notice that idRn(~ek) = ~ek = I~ek for every standard basis vector ~ek, 1 ≤ k ≤ n.

Now let ~v ∈ Rn be arbitrary. Set ~u = A^{-1}~v. Notice that

T(~u) = A~u = A(A^{-1}~v) = (AA^{-1})~v = I~v = ~v.

Taking T^{-1} of both sides above, we get T^{-1}(~v) = ~u = A^{-1}~v. So T^{-1} and A^{-1} send ~v to the same place for every ~v, i.e. A^{-1} is a matrix representation for T^{-1}. Since this matrix representation must be unique, we are done. □

Example 43. Let T : R2 → R2 be the linear transformation defined by the rules T(~e1) = [3; 5] and T(~e2) = [4; 6].

(1) Prove that T is an isomorphism.

(2) Find an explicit formula for T^{-1}.

Solution. (1) T has matrix representation A = [3 4; 5 6] by Theorem 3. A is invertible by Fact 11 since 3 · 6 − 4 · 5 ≠ 0. So T is invertible, i.e. T is an isomorphism.

(2) By Fact 11, we have A^{-1} = −(1/2) [6 −4; −5 3]. Then by Theorem 13, we have T^{-1}(~v) = −(1/2) [6 −4; −5 3] ~v for every ~v ∈ R2. □

14 More Change of Basis

Fact 13 (Change of Basis: Standard into Non-Standard). Let B = {~b1, ..., ~bn} be a basis for Rn. Let P_B be the unique n × n matrix for which

~v = P_B [~v]_B

for every ~v ∈ Rn, so P_B is the change-of-coordinates matrix from B to the standard basis. Then P_B is always invertible since its columns are linearly independent, and we call the inverse P_B^{-1} the change-of-coordinates matrix from the standard basis to B. This matrix has the property that

[~v]_B = P_B^{-1} ~v

for every ~v ∈ Rn.
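In practice one computes [~v]_B by solving P_B x = ~v rather than forming P_B^{-1} explicitly. A sketch, using the basis from Example 44 below and an arbitrary test vector:

    import numpy as np

    P_B = np.array([[2.0, -1.0],
                    [1.0, 1.0]])        # columns are the basis vectors of B
    v = np.array([1.0, 4.0])            # an arbitrary vector in standard coordinates
    v_B = np.linalg.solve(P_B, v)       # [v]_B = P_B^(-1) v, without inverting P_B
    assert np.allclose(P_B @ v_B, v)    # check: v = P_B [v]_B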

Example 44. Let B = {[2; 1], [−1; 1]}. Find a matrix A such that [~v]_B = A~v for every ~v ∈ R2.

Example 45. Let B = {[1; −3], [−2; 4]} and C = {[−7; 9], [−5; 7]}. Find a matrix A for which [~v]_C = A[~v]_B for every ~v ∈ R2.

Fact 14. Let B = {~b1, ..., ~bn} and C = {~c1, ..., ~cn} be two bases for Rn. Then there exists a unique n × n matrix, which we denote P_{C←B}, with the property that

[~v]_C = P_{C←B} [~v]_B

for every ~v ∈ Rn. We call P_{C←B} the change-of-coordinates matrix from B to C.

Notice that

P_{C←B} = P_C^{-1} · P_B.

We may also write

P_{C←B} = [~a1 ... ~an],

where ~a1 = [~b1]_C, ..., ~an = [~bn]_C.
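The formula P_{C←B} = P_C^{-1} · P_B is a one-liner numerically; a sketch using the bases of Example 45 above (solving column-wise instead of inverting):

    import numpy as np

    P_B = np.array([[1.0, -2.0], [-3.0, 4.0]])   # columns: the basis B
    P_C = np.array([[-7.0, -5.0], [9.0, 7.0]])   # columns: the basis C
    P_CB = np.linalg.solve(P_C, P_B)             # P_{C<-B} = P_C^(-1) P_B
    # Column j of P_CB is [b_j]_C, the C-coordinate vector of the j-th B-vector.
    assert np.allclose(P_C @ P_CB[:, 0], P_B[:, 0])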

Fact 15. Let B and C be any two bases for Rn. Then P_{C←B} is invertible, and

(P_{C←B})^{-1} = P_{B←C}.

Example 46. Let B and C be as in the previous example.

(1) Find P_{C←B}.

(2) Find P_{B←C}.

Example 47. Suppose C = {~c1, ~c2} is a basis for R2, and B = {~b1, ~b2} where ~b1 = 4~c1 + ~c2 and ~b2 = −6~c1 + ~c2. Suppose [~v]_B = [3; 1]. Find [~v]_C.
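For this example the columns of P_{C←B} can be read directly off the definitions of ~b1 and ~b2 (for instance [~b1]_C = [4; 1] since ~b1 = 4~c1 + ~c2). A numerical check of whatever answer you obtain (a sketch):

    import numpy as np

    P_CB = np.array([[4.0, -6.0],
                     [1.0, 1.0]])       # columns: [b1]_C = [4; 1], [b2]_C = [-6; 1]
    v_B = np.array([3.0, 1.0])
    v_C = P_CB @ v_B                    # [v]_C = P_{C<-B} [v]_B
    print(v_C)                          # [6. 4.]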

15 Rank and Nullity

Definition 25. Let V and W be vector spaces and T : V → W a linear transformation. Define the rank of T , denoted rankT , by

rankT = dim RanT .

We also define the nullity of T to be dim NulT , the dimension of the null space of T .


Intuitively we think of the nullity of a linear transformation T as being the amount of “information lost” by the map, and the rank is the amount of “information preserved,” as the next example may help illustrate.

Example 48. Compute rankT and dim NulT for the following maps.

(1) T : R3 → R2, T([x; y; z]) = [y; z].

(2) T : R3 → R3, T([x; y; z]) = [x + y; 2x + 2y; 0].

(3) T : R → R2, T(x) = [−2x; x].

(4) T : R5 → R3, T([x1; x2; x3; x4; x5]) = [0; 0; 0].

Theorem 14 (Rank-Nullity Theorem). Let V,W be finite-dimensional vector spaces and let T : V → W be a linear transformation. Then

rankT + dim NulT = dimV .

Proof. Suppose dim NulT = k and dimV = n. Obviously k ≤ n since NulT is a subspace of V . Let {~b1, ..., ~bk} be a basis for NulT . By Corollary 5 we may expand this set to form a basis {~b1, ..., ~bk, ~bk+1, ..., ~bn} for the larger space V . We claim that the set {T (~bk+1), ..., T (~bn)} comprises a basis for RanT , and hence rankT = dim RanT = n − k.

First we must show that T ( ~bk+1), ..., T ( ~bn) are linearly independent. To see this, consider the equation

ck+1T ( ~bk+1) + ...+ cnT ( ~bn) = ~0.

Since T is a linear transformation, we may rewrite the above as

T (ck+1~bk+1 + ...+ cn ~bn) = ~0.

It follows that the linear combination ck+1~bk+1 + ...+ cn ~bn is in the null space NulT . This means we

can write it as a linear combination of the elements of our basis {~b1, ..., ~bk} for NulT :

ck+1~bk+1 + ...+ cn ~bn = c1 ~b1 + ...+ ck ~bk

for some c1, ..., ck ∈ R. Moving all terms to one side of the equation, we get:

c1 ~b1 + ...+ ck ~bk − ck+1~bk+1 − ...− cn ~bn = ~0.

In other words we have written ~0 as a linear combination of the basis elements {~b1, ..., ~bn} for V . Since these are linearly independent, all the coefficients must be 0, i.e. c1 = ... = ck = ck+1 = ... = cn = 0.

This shows that T ( ~bk+1), ..., T ( ~bn) are linearly independent.


It remains only to show that {T (~bk+1), ..., T (~bn)} forms a spanning set for RanT . To that end, let ~w ∈ RanT , so there exists a ~v ∈ V with T (~v) = ~w. Write ~v as a linear combination of the basis elements for V :

~v = c1 ~b1 + ...+ ck ~bk + ck+1~bk+1 + ...+ cn ~bn.

Notice that T (~b1) = ... = T (~bk) = ~0 since ~b1, ..., ~bk ∈ NulT . So, applying the map T to both sides ofthe equation above, we get:

~w = T (~v)

= T (c1 ~b1 + ...+ ck ~bk + ck+1~bk+1 + ...+ cn ~bn)

= c1T (~b1) + ...+ ckT (~bk) + ck+1T ( ~bk+1) + ...+ cnT ( ~bn)

= c1~0 + ...+ ck~0 + ck+1T ( ~bk+1) + ...+ cnT ( ~bn)

= ck+1T ( ~bk+1) + ...+ cnT ( ~bn).

So we have written ~w as a linear combination of the elements of {T ( ~bk+1), ..., T ( ~bn)}, and since ~w

was arbitrary, we see that they span RanT . So {T (~bk+1), ..., T (~bn)} is a basis for RanT and hence rankT = n − k. The conclusion of the theorem now follows. □
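For matrix transformations the theorem is easy to spot-check numerically; a sketch using the map from Example 48(2) above (SciPy's null_space returns an orthonormal basis for the null space, so its number of columns is the nullity):

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 1.0, 0.0],      # matrix of T(x, y, z) = (x + y, 2x + 2y, 0)
                  [2.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0]])
    rank = np.linalg.matrix_rank(A)     # dim Ran T
    nullity = null_space(A).shape[1]    # dim Nul T
    assert rank + nullity == A.shape[1] # rank T + dim Nul T = dim V = 3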

Theorem 15. Let V and W both be n-dimensional vector spaces, and let T : V → W be a linear transformation. Then the following statements are all equivalent.

(1) T is an isomorphism.
(2) T is one-to-one and onto.
(3) T is invertible.
(4) T is one-to-one.
(5) T is onto.
(6) NulT = {~0}.
(7) dim NulT = 0.
(8) RanT = W .
(9) rankT = n.

Furthermore, if V = W = Rn and A is the unique n × n matrix representation of T , then the following statements are all equivalent, and equivalent to the statements above.

(10) A is invertible.
(11) A is row-equivalent to the n × n identity matrix I.
(12) The equation A~x = ~0 has only the trivial solution.
(13) The columns of A form a basis for Rn.
(14) The columns of A are linearly independent.
(15) The columns of A span Rn.
(16) ColA = Rn.
(17) dim ColA = n.

Proof. We first check that (5) ⇔ (8) ⇔ (9). We have (5) ⇔ (8) by Theorem 7. If (8) holds and RanT = W , then rankT = dim RanT = dimW = n, so (9) holds. Conversely if (9) holds, then RanT admits a basis B consisting of n vectors. The vectors in B are linearly independent in W and hence they span W by Corollary 7. So W = SpanB = RanT and hence (8) holds; so (5), (8), and (9) are equivalent.

We also check that (4) ⇔ (6) ⇔ (7). (4) ⇔ (6) follows from Theorem 5 and (6) ⇔ (7) is obvious from the definition of dimension.


By the Rank-Nullity Theorem 14, we have (7) ⇔ (9). So all of statements (4)−(9) are equivalent. Statements (1)−(3) are obviously equivalent, and obviously imply (4)−(9). If any of statements (4)−(9) hold, then all of them do; in particular, T will be one-to-one and onto, so (1)−(3) will hold. This concludes the proof of the first half of the theorem.

Now suppose we have V = W = Rn and A is the n × n matrix representation of T . We have (1) ⇒ (10) by Theorem 13. If (10) holds and A is invertible, then we may define a map T−1 : Rn → Rn by T−1(~v) = A−1~v for every ~v ∈ Rn; it is easy to check that this T−1 is truly the inverse of T , and hence (3) holds. So (10) holds if and only if (1)−(9) hold.

We have (10)⇔ (11) by Fact 12. (11) holds if and only if A is row-equivalent to I, if and only if the

augmented matrix [A | ~0] is row-equivalent to [I | ~0], if and only if the equation A~x = ~0 has only one solution, i.e. (12) holds. So (11) ⇔ (12). So (1)−(12) are all equivalent.

Since RanT = ColA by Theorem 8, we have (16) ⇔ (8) and (17) ⇔ (9). So (1)−(12) are each equivalent to (16) and (17).

We have (15) ⇔ (16) by the definition of ColA, and we have (13) ⇔ (14) ⇔ (15) by Corollary 7. So (1)−(17) are all equivalent as promised. □

16 Determinants and the Laplace Cofactor Expansion

Theorem 15 says that linear transformations are actually isomorphisms whenever their matrix representations are invertible. When is a given n × n matrix invertible? We have easy ways to check if n = 1 or n = 2, but how can one tell if, say, a 52 × 52 matrix is invertible? We hope to develop a computable formula for checking the invertibility of an arbitrary square matrix, which we will call the determinant of that matrix.

There are many different ways to introduce and define the notion of a determinant for a general n × n matrix A. In the interest of brevity, we will restrict our attention here to two major approaches. The first method of computing a determinant is given implicitly in the upcoming Definition 28. This definition best conveys the intuition that the determinant of a matrix characterizes its invertibility or non-invertibility, and is computationally easiest for large matrices. The second method for computing determinants is the Laplace expansion or cofactor expansion, which is probably computationally easier for small matrices, and which will prove very useful in the next section on eigenvalues and eigenvectors.

Definition 26. (1) Let A = [a] be a 1 × 1 matrix. The determinant of A, denoted detA, is the number detA = a.

(2) Let A = [a b; c d] be a 2 × 2 matrix. The determinant of A, also denoted detA, is the number

detA = ad − bc.

Notice that in both cases above, the matrix A is invertible if and only if detA ≠ 0. This statement is obvious for (1), and for (2) it follows from Fact 11, which the reader may be asked to prove on the homework.

We wish to generalize this idea for n × n matrices A of arbitrary size n, i.e. we hope to find some computable number detA with the property that A is invertible if and only if detA ≠ 0. Of course A is invertible if and only if it is row-equivalent to the n × n identity matrix I, so there must be a strong relationship between the determinant detA and the elementary row operations we use to reduce a matrix. Building on this vague observation, in the next example we investigate the effect of the row operations on detA for 2 × 2 matrices A.

Example 49. Let A = [a b; c d] be a 2 × 2 matrix.


(1) If c = 0, then what is detA?

(2) Row-reduce A to echelon form without interchanging any rows and without scaling any rows. Let E be the new matrix obtained after this reduction. What is detE?

(3) Let k ∈ R. What is det[ka kb; c d]? What about det[a b; kc kd]? What about det[ka kb; kc kd]?

(4) What is det[c d; a b]?

Definition 27. Let A be an n × n matrix. Write

A = [a11 a12 ... a1n; a21 a22 ... a2n; ... ; an1 an2 ... ann],

so aij denotes the entry of A in row i and column j. For shorthand we also denote this matrix by A = [aij ].

A matrix A = [aij ] is called upper-triangular if aij = 0 whenever j < i. Note that a square matrix in echelon form is always upper-triangular.

Definition 28. We postulate the existence of a number detA associated to any matrix A = [aij ], called the determinant of A, which satisfies the following properties.

(1) If A is upper-triangular then detA = a11 · a22 · ... · ann, i.e. the determinant is the product of the entries along the diagonal of A.

(2) (Replacement) If E is a matrix obtained by using a row replacement operation on A, then detA = detE.

(3) (Interchange) If E is a matrix obtained by using the row interchange operation on A, then detA = −detE.

(4) (Scaling) If E is a matrix obtained by scaling a row of A by some non-zero constant k ∈ R, then detA = (1/k) detE.

Notice that the definition above is consistent with our definitions of detA for 1 × 1 and 2 × 2 matrices A. The definition above gives a recursive, or algorithmic, method for computing determinants of larger matrices A, as we will see in the upcoming examples. The fact that the above definition is a good one, i.e. that such a number detA exists for each n × n matrix A and is also unique, for every n > 2, should not be at all obvious to the reader, but its proof would involve a long digression and so unfortunately we must omit it here.
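Properties (1)-(4) give the standard elimination-based determinant algorithm: use interchanges (flipping the sign each time) and replacements (which change nothing) to reach echelon form, then multiply the diagonal. A sketch in Python (the helper name is ours; the test matrix is from Example 50(2) below):

    import numpy as np

    def det_by_row_reduction(A):
        """Compute det A using only the interchange and replacement operations."""
        M = A.astype(float).copy()
        n = M.shape[0]
        sign = 1.0
        for col in range(n):
            pivot = col + np.argmax(np.abs(M[col:, col]))
            if np.isclose(M[pivot, col], 0.0):
                return 0.0                      # no pivot in this column: det A = 0
            if pivot != col:
                M[[col, pivot]] = M[[pivot, col]]
                sign = -sign                    # property (3)
            for row in range(col + 1, n):
                M[row] -= (M[row, col] / M[col, col]) * M[col]   # property (2)
        return sign * np.prod(np.diag(M))       # property (1)

    A = np.array([[1, -4, 2], [-2, 8, -9], [-1, 7, 0]])
    print(det_by_row_reduction(A))              # 15.0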

Example 50. (1) Compute det[1 2 3 4 5; 0 6 7 8 9; 0 0 −1 −2 −3; 0 0 0 −4 −5; 0 0 0 0 1/2].

(2) Compute det[1 −4 2; −2 8 −9; −1 7 0].

(3) Compute det[2 −8 6 8; 3 −9 5 10; −3 0 1 −2; 1 −4 0 6].


Theorem 16. A square matrix A is invertible if and only if detA ≠ 0.

Proof. (⇐) Suppose detA ≠ 0. Notice that by Definition 28, if E is the matrix obtained after performing any elementary row operation on A, then we either have detE = detA, or detE = −detA, or detE = k detA for some non-zero constant k ∈ R. In any of the three cases detE ≠ 0 since detA ≠ 0.

It follows that if B = [bij ] is the echelon form of A, then detB ≠ 0, since B is obtained from A by a finite sequence of elementary row operations. But detB = b11 · b22 · ... · bnn since B is upper-triangular. Since detB ≠ 0, we have b11 ≠ 0, b22 ≠ 0, ..., bnn ≠ 0. So all the entries along the diagonal are non-zero, and hence A is row-equivalent to B, which is row-equivalent to the identity matrix I. Hence A is invertible by Theorem 15.

(⇒) On the other hand suppose detA = 0, and let E be the matrix obtained after using any elementary row operation on A. Then either detE = detA or detE = −detA or detE = k detA for some non-zero constant k; in all three cases detE = 0 since detA = 0. Then if B is the echelon form of A, we have detB = 0 since B is obtained from A after applying finitely many elementary row operations. It follows that detB = b11 · b22 · ... · bnn = 0 and hence bii = 0 for some i ∈ {1, ..., n}. So B is not row-equivalent to I, and hence neither is A. Then A is not invertible by Theorem 15. □

Definition 29. Let A = [aij ] be an n × n matrix. For any fixed integers i and j, let Aij denote the (n−1) × (n−1) matrix obtained by deleting the i-th row and the j-th column from A. Written explicitly, we have:

Aij = [a11 ... a1,j−1 a1,j+1 ... a1n; ... ; ai−1,1 ... ai−1,j−1 ai−1,j+1 ... ai−1,n; ai+1,1 ... ai+1,j−1 ai+1,j+1 ... ai+1,n; ... ; an1 ... an,j−1 an,j+1 ... ann].

The (i, j)-cofactor of A is the number

Cij = (−1)^(i+j) detAij .

Fact 16 (Laplace Expansion for Determinants). Let A = [aij ] be an n × n matrix. Then for any i, j ∈ {1, ..., n},

detA = ai1Ci1 + ai2Ci2 + ...+ ainCin

= a1jC1j + a2jC2j + ...+ anjCnj .

The above is also called the method of cofactor expansion for computing determinants.
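A direct transcription of the expansion across the first row (a sketch; the helper name is ours, and this recursion does roughly n! work, which is why the row-reduction method above is preferred for large matrices):

    import numpy as np

    def det_by_cofactors(A):
        """Laplace expansion of det A across the first row (i = 1 in Fact 16)."""
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):            # 0-indexed column j <-> column j+1 in the text
            A_1j = np.delete(np.delete(A, 0, axis=0), j, axis=1)
            total += A[0, j] * (-1) ** j * det_by_cofactors(A_1j)   # a_{1j} C_{1j}
        return total

    A = np.array([[1.0, 5.0, 0.0], [2.0, 4.0, -1.0], [0.0, -2.0, 0.0]])
    print(det_by_cofactors(A))        # -2.0, matching Example 51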

Example 51. Use a cofactor expansion across the third row to compute detA, where A = [1 5 0; 2 4 −1; 0 −2 0].

Example 52. Compute detA, where

A = [3 −7 8 9 −6; 0 2 −5 7 3; 0 0 1 5 0; 0 0 2 4 −1; 0 0 0 −2 0].


17 Eigenvectors and Eigenvalues

Definition 30. Let V be a vector space and T : V → V a linear transformation. A number λ is called an eigenvalue of T if there exists a non-zero vector ~v ∈ V for which

T (~v) = λ~v.

If λ is an eigenvalue of T , then any vector ~v for which T (~v) = λ~v is called an eigenvector of T corresponding to the eigenvalue λ.

Example 53. Let T : R2 → R2 be the linear transformation which has matrix representation A = [2 1; 0 3].

(1) Plot the vectors T (~e1), T (~e2), and T (~e1 + ~e2) in the Cartesian coordinate plane.

(2) Which of ~e1, ~e2, and ~e1 + ~e2 are eigenvectors of T? For what eigenvalues λ?

Example 54. Let T : R2 → R2 be the linear transformation which has matrix representation A = [1 6; 5 2].

(1) Show that 7 is an eigenvalue of T .

(2) Find all eigenvectors which correspond to the eigenvalue 7.

Definition 31. Let V be a vector space and T : V → V a linear transformation. Let λ be an eigenvalueof T . The set of all eigenvectors corresponding to λ is called the eigenspace of T corresponding to λ.

Theorem 17. Let V be a vector space and T : V → V a linear transformation. Let λ be an eigenvalue of T . Let idV : V → V be the identity map as in Definition 24. Then the eigenspace of T corresponding to λ is exactly Nul(T − λ · idV ), where T − λ · idV : V → V is defined by (T − λ · idV )(~v) = T (~v) − λ · idV (~v) = T (~v) − λ~v for every ~v ∈ V .

Proof. Suppose ~v ∈ V is an eigenvector of T corresponding to λ, i.e.

T (~v) = λ~v.

Then moving all terms to the left side of the equality, we have

T (~v)− λ~v = ~0.

Rewriting the above, we have

(T − λ · idV )(~v) = T (~v)− λ · idV (~v) = T (~v)− λ~v = ~0.

So ~v ∈ Nul(T − λ · idV ). The same arguments above, applied in reverse, will show that if ~v ∈ Nul(T − λ · idV ), then T (~v) = λ~v and hence ~v is an eigenvector corresponding to λ. So the eigenspace corresponding to λ is exactly Nul(T − λ · idV ). □

Corollary 9. Let V be a vector space and T : V → V a linear transformation. If λ is any eigenvalue of T , then its corresponding eigenspace is a vector subspace of V .

Corollary 10. Let T : Rn → Rn be a linear transformation and let A be its n × n matrix representation. Let λ be an eigenvalue of T , and let I denote the n × n identity matrix. Then the eigenspace corresponding to λ is exactly Nul(A − λI).
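For a known eigenvalue, finding the eigenspace is therefore a null space computation. A sketch using the matrix of Example 55 below and SciPy's null_space:

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[4.0, -1.0, 6.0],
                  [2.0, 1.0, 6.0],
                  [2.0, -1.0, 8.0]])
    lam = 2.0
    basis = null_space(A - lam * np.eye(3))     # basis for Nul(A - 2I)
    print(basis.shape[1])                       # 2: the eigenspace is 2-dimensional
    assert np.allclose(A @ basis, lam * basis)  # each basis vector v satisfies A v = 2v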


Example 55. Let A = [4 −1 6; 2 1 6; 2 −1 8]. An eigenvalue of A is 2. Find a basis for the corresponding eigenspace.

Example 56. Find the eigenvalues of A = [2 3; 3 −6].

Example 57. Find the eigenvalues of A = [1 2 1; 2 0 −2; −1 2 3].

Definition 32. Let A be an n× n matrix. Then the characteristic polynomial of A is the function

det(A− λI),

where λ is treated as a variable.

Theorem 18. Let A be an n × n matrix. A number λ ∈ R is an eigenvalue of A if and only if det(A − λI) = 0.

Proof. λ is an eigenvalue if and only if there is a non-zero vector ~v with A~v = λ~v. This happens if and only if there is a non-zero vector ~v for which (A − λI)~v = ~0, i.e. if and only if Nul(A − λI) is non-trivial. Nul(A − λI) is non-trivial if and only if A − λI is not invertible, by Theorem 15. But A − λI is not invertible if and only if det(A − λI) = 0 by Theorem 16. □
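Numerically the characteristic polynomial and its roots are one line each; a sketch using the matrix of Example 56 (np.poly returns the coefficients of det(λI − A), which has the same roots as det(A − λI)):

    import numpy as np

    A = np.array([[2.0, 3.0],
                  [3.0, -6.0]])
    coeffs = np.poly(A)                # [1, 4, -21]: lambda^2 + 4*lambda - 21
    print(sorted(np.roots(coeffs)))    # approximately [-7.0, 3.0]: the eigenvalues
    # In practice np.linalg.eigvals(A) computes these directly and more stably.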

Example 58. Let A = [4 0 −1; 0 4 −1; 1 0 2].

(1) Find the characteristic polynomial of A.

(2) Find all eigenvalues of A. (Hint: 3 is one.)

(3) For each eigenvalue λ of A, find a basis for the eigenspace corresponding to λ.

18 Diagonalization

Definition 33. An n × n matrix A = [aij ] is called diagonal if aij = 0 whenever i ≠ j, i.e. the only non-zero entries in A are along the main diagonal.

We think of diagonal matrices as the “simplest” of all possible linear transformations. To illustrate this intuition, consider the following example.

Example 59. Let D = [5 0; 0 3].

(1) Find a formula for the matrix power D^k.

(2) Let A = [7 2; −4 1]. Observe that A = PDP−1, where P = [1 1; −1 −2] (so that P−1 = [2 1; −1 −1]). Find a formula for the matrix power A^k.
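The payoff of the factorization is that D^k is computed entrywise, so A^k = P D^k P−1 costs two matrix products regardless of k. A sketch with the matrices above:

    import numpy as np

    P = np.array([[1.0, 1.0], [-1.0, -2.0]])
    D = np.array([[5.0, 0.0], [0.0, 3.0]])
    A = P @ D @ np.linalg.inv(P)               # recovers A = [[7, 2], [-4, 1]]

    k = 8
    Dk = np.diag(np.diag(D) ** k)              # D^k: k-th powers on the diagonal
    Ak = P @ Dk @ np.linalg.inv(P)             # A^k = P D^k P^(-1)
    assert np.allclose(Ak, np.linalg.matrix_power(A, k))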

Definition 34. Let A and C be n × n matrices. We say that A is similar to C if there exists an invertible n × n matrix P such that A = PCP−1.

Here is some intuition for the above definition. Let E be the standard basis for Rn. Since P is invertible, its columns form a basis B for Rn and hence P = P_B = P_{E←B} is the change-of-basis matrix from B to the standard basis. Likewise P−1 = P_B^{-1} = P_{B←E} is the change-of-basis matrix from the standard basis to B. Then the following is a diagram of the transformations A, C, P , and P−1:


    Rn  −−A−→  Rn
  P−1 ↓          ↑ P
    Rn  −−C−→  Rn

The two “up-down” arrows above represent just a change of basis. To say that A is similar to C is to say that travelling along the top arrow is the same as travelling along the bottom three arrows. In other words, the matrix C acting on B-coordinates induces the same transformation of Rn as A does acting on standard coordinates.

Definition 35. An n× n matrix A is called diagonalizable if A is similar to a diagonal matrix D.

Theorem 19 (Diagonalization Theorem). An n × n matrix A is diagonalizable if and only if A has n linearly independent eigenvectors.

Moreover, if B = {~b1, ..., ~bn} is a set of n linearly independent eigenvectors of A corresponding to the eigenvalues λ1, ..., λn respectively, then

A = PDP−1

where P = P_B and D is a diagonal matrix with entries λ1, ..., λn along the diagonal.

Proof. (⇐) Let B = {~b1, ..., ~bn} be a collection of n linearly independent eigenvectors of A corresponding to the eigenvalues λ1, ..., λn respectively. Set P = P_B and let D be the diagonal matrix with entries λ1, ..., λn along the diagonal. We can use two different arguments to show that A = PDP−1; the first is purely computational while the second uses the fact that B is a basis for Rn.

(Argument 1) Recall that P_B = [~b1 ... ~bn]. In that case, we have

AP = A[~b1 ... ~bn] = [A~b1 ... A~bn]

and

PD = [~b1 ... ~bn][λ1 ... 0; ... ; 0 ... λn] = [λ1~b1 ... λn~bn].

Since A~b1 = λ1~b1, ..., A~bn = λn~bn, we have that AP = PD. Now multiplying on the right by P−1, we get A = PDP−1.

(Argument 2) We will show that A~v = PDP−1~v for every vector ~v ∈ Rn, and hence A = PDP−1 by the uniqueness clause in Theorem 3. So let ~v ∈ Rn be arbitrary. Since B is a basis for Rn, there exist

constants c1, ..., cn for which ~v = c1 ~b1 + ...+ cn ~bn. Now compute A~v:

A~v = A(c1 ~b1 + ...+ cn ~bn)

= c1A~b1 + ...+ cnA~bn

= c1λ1 ~b1 + ...+ cnλn ~bn.


To compute PDP−1~v, observe that P−1 is the change-of-basis matrix from the standard basis to B; so P−1~v = [c1; ...; cn]. Since D is a diagonal matrix with entries λ1, ..., λn, we have DP−1~v = D[c1; ...; cn] = [λ1c1; ...; λncn]. Lastly, since P is the change-of-basis matrix from B to the standard basis, we have PDP−1~v = P[λ1c1; ...; λncn] = λ1c1~b1 + ... + λncn~bn = A~v. Since ~v was arbitrary, this completes the proof.

(⇒) Conversely, suppose A is diagonalizable, i.e. A = PDP−1 for some diagonal matrix D and some invertible matrix P . Write P = [~b1 ... ~bn], and denote the entries along the diagonal of D by

λ1, ..., λn respectively. By multiplying on the right by P , we see that AP = PD and hence

[A~b1 ... A~bn] = [λ1~b1 ... λn~bn]

as we computed previously. So all columns are equal, i.e. A~b1 = λ1~b1, ..., A~bn = λn~bn. This implies that ~b1, ..., ~bn are eigenvectors of A corresponding to λ1, ..., λn respectively. In addition, since P is invertible, its columns ~b1, ..., ~bn are linearly independent. So A has n linearly independent eigenvectors. □
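In floating point, np.linalg.eig hands back exactly the P and D of the theorem whenever A is diagonalizable. A sketch using the matrix of Example 60 below (eig normalizes its eigenvector columns, which is harmless since any nonzero scaling of an eigenvector is still an eigenvector):

    import numpy as np

    A = np.array([[1.0, 3.0, 3.0],
                  [-3.0, -5.0, -3.0],
                  [3.0, 3.0, 1.0]])
    eigenvalues, P = np.linalg.eig(A)          # columns of P are eigenvectors
    D = np.diag(eigenvalues)
    if np.linalg.matrix_rank(P) == A.shape[0]: # n linearly independent eigenvectors?
        assert np.allclose(A, P @ D @ np.linalg.inv(P))   # A = P D P^(-1)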

Example 60. Diagonalize the following matrix, if possible.

A = [1 3 3; −3 −5 −3; 3 3 1].

Example 61. Compute A^8, where A = [4 −3; 2 −1].

Definition 36. Recall from a previous course that if p(x) is a polynomial and a is a root of p (i.e. p(a) = 0), then the multiplicity of a with respect to p is the number of times the term (x − a) appears in the factored form of p. (This factored form always exists by the Fundamental Theorem of Algebra.) For an example, if p(x) = 5(x − 2)^4 (x − 3)^5 (x − 4), then p has three roots 2, 3, and 4, and the roots have multiplicity 4, 5, and 1 respectively. The sum of the multiplicities is 4 + 5 + 1 = 10, which is exactly the degree of p.

Now let A be an n × n matrix and let λ1 be an eigenvalue of A. Then define the multiplicity of λ1 with respect to A to be just the multiplicity of λ1 with respect to the characteristic polynomial p(λ) = det(A − λI).

Fact 17. Let A be an n × n matrix. Let λ1, ..., λp be all the distinct eigenvalues of A, with multiplicities m1, ..., mp respectively. Then m1 + ... + mp ≤ n.

The reason we have a ≤ above instead of an = is that, for the moment, we are considering only real eigenvalues and not complex eigenvalues. For example, the characteristic polynomial of A = [−1 1; −1 −1] is p(λ) = λ^2 + 2λ + 2, which is an irreducible quadratic, i.e. it cannot be factored into a product of degree one binomials with real coefficients. So A has no real eigenvalues and hence the sum of the multiplicities of its eigenvalues is 0, which is strictly less than 2. (If we allowed complex eigenvalues, we would always have m1 + ... + mp = n in the above fact.)


Theorem 20. Let A be an n × n matrix. Let λ1, ..., λp be all the distinct eigenvalues of A, with multiplicities m1, ..., mp respectively.

(1) For 1 ≤ k ≤ p, the dimension of the eigenspace corresponding to λk is less than or equal to mk.

(2) A is diagonalizable if and only if the sum of the dimensions of the eigenspaces is equal to n, and this happens if and only if the dimension of the eigenspace corresponding to λk is mk for all 1 ≤ k ≤ p.

(3) If A is diagonalizable and Bk is a basis for the eigenspace corresponding to λk for each 1 ≤ k ≤ p, then the total collection of vectors in B1, ..., Bp comprises a basis of eigenvectors for Rn.

(4) If p = n then A is diagonalizable.
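Criterion (2) is checkable in code: compute each eigenspace dimension and compare the total against n. A sketch (the helper name is ours; it assumes all eigenvalues are real and groups numerically equal ones by rounding):

    import numpy as np
    from scipy.linalg import null_space

    def is_diagonalizable(A):
        """Theorem 20(2): diagonalizable iff the eigenspace dimensions sum to n."""
        n = A.shape[0]
        distinct = set(np.round(np.linalg.eigvals(A).real, decimals=6))
        total = sum(null_space(A - lam * np.eye(n)).shape[1] for lam in distinct)
        return total == n

    A = np.array([[5.0, 0, 0, 0], [0, 5.0, 0, 0],
                  [1, 4, -3.0, 0], [-1, -2, 0, -3.0]])   # the matrix of Example 62
    print(is_diagonalizable(A))    # True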

Example 62. Diagonalize the following matrix, if possible.

A = [5 0 0 0; 0 5 0 0; 1 4 −3 0; −1 −2 0 −3].