Page 1:

Chapter 4 & 5: Vector Spaces & Linear Transformations

Philip Gressman

University of Pennsylvania

Page 2:

Objective

The purpose of Chapter 4 is to think abstractly about vectors.

• We will identify certain fundamental laws which are true for the kind of vectors (i.e., column vectors) that we already know.

• We will consider the abstract notion of a vector space.

• The purpose of doing this is to apply the theory that we already know and love to situations which appear superficially different but are really very similar from the right perspective.

• The main application for us will be solutions of linear ordinary differential equations. Even though they are functions instead of column vectors, they behave in exactly the same way and can be understood completely.

Page 3:

Abstract Setup

We begin with a collection V of things we call vectors and a collection of scalars F (which will usually be the real numbers, the rational numbers, or the complex numbers).

A1 We are given a formula for vector addition u + v. The sum of two things in V will be another thing in V.

A2 We are given a formula for scalar multiplication cv for c ∈ F and v ∈ V. It should be another thing in V.

A3 (Commutativity) It should be true that u + v = v + u for any u, v ∈ V.

A4 (Associativity of +) It should be true that (u + v) + w = u + (v + w) for any u, v, w ∈ V.

A5 (Additive identity) There should be a vector called 0 such that v + 0 = v for any v ∈ V.

A6 (Additive inverses) For any v ∈ V, there should be a vector −v so that v + (−v) = 0.

A7 (Multiplicative identity) For any v ∈ V, we should have 1v = v.

Page 4:

Abstract Setup (continued)

A8 (Associativity of ·) For any v ∈ V and c1, c2 ∈ F, we should have (c1c2)v = c1(c2v).

A9 (Distributivity I) For any u, v ∈ V and c ∈ F, it should be true that

c(u + v) = cu + cv.

A10 (Distributivity II) For any c1, c2 ∈ F and u ∈ V, it should be true that

(c1 + c2)u = c1u + c2u.

The Point: Someone hands you a set of things V they call vectors and a set F of scalars. If they call V a vector space over F, they mean that they have checked that A1 through A10 are always true. If someone asks you if a given V is a vector space over F, you must check that each one of A1 through A10 is true under all circumstances.
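As a quick sanity check (not a proof), you can spot-check several of the axioms numerically before attempting a real verification. A minimal sketch in Python, assuming NumPy is available; the space V = R^3 and the random test vectors are illustrative choices:

import numpy as np

rng = np.random.default_rng(0)
u, v, w = rng.standard_normal((3, 3))   # three random vectors in V = R^3
c1, c2 = rng.standard_normal(2)         # two random scalars in F = R

assert np.allclose(u + v, v + u)                    # A3 commutativity
assert np.allclose((u + v) + w, u + (v + w))        # A4 associativity of +
assert np.allclose(v + np.zeros(3), v)              # A5 additive identity
assert np.allclose((c1 * c2) * v, c1 * (c2 * v))    # A8 associativity of scalars
assert np.allclose(c1 * (u + v), c1 * u + c1 * v)   # A9 distributivity I
assert np.allclose((c1 + c2) * u, c1 * u + c2 * u)  # A10 distributivity II

Random tests only gather evidence; checking A1 through A10 under all circumstances requires an argument from the definitions.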

Page 5:

Vector Spaces: Examples

• Real-valued convergent sequences {an} are a vector space over R when {an} + {bn} is the sequence cn := an + bn and scalar multiplication is done termwise. They are not a vector space over C. Complex-valued convergent sequences {bn} are a vector space over C. They are also a vector space over R.

• Real-valued functions f on an open interval I such that f, f′, f′′, . . . , f^(k) are all continuous functions on I form a vector space under pointwise addition and pointwise scalar multiplication. It’s called C^k(I).

• Polynomials with real coefficients are a vector space over R under the usual notions of addition and scalar multiplication.

Page 6:

Vector Spaces: Beyond the Definition

Any two things which are vector spaces over F will have many things in common, far beyond just the properties A1 through A10. For example:

• 0u = 0 for any u ∈ V. Why? Well, 0u + u = (0 + 1)u = 1u = u, but now add −u to both sides.

• There is only one vector 0 which always works for A5. That’s because if 0′ also works, then 0 = 0 + 0′ = 0′ + 0 = 0′, using A3 and then A5.

• c0 = 0 for any c ∈ F. Why? If c ≠ 1, notice that (1/(c−1))0 + 0 = (1/(c−1))0. Un-distribute the left-hand side and then multiply both sides by (c − 1). If c = 1 then axiom A7 already works.
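For c ≠ 1, the same argument written out with each axiom cited (an expanded version of the sketch above, using nothing beyond A1–A10):

(1/(c−1))0 + 0 = (1/(c−1))0             by A5
(1/(c−1))0 + 1·0 = (1/(c−1))0           by A7
((1/(c−1)) + 1)0 = (1/(c−1))0           by A10 (un-distributing)
(c/(c−1))0 = (1/(c−1))0                 scalar arithmetic
c0 = 1·0 = 0                            multiply by (c − 1), using A8 and A7.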

See Theorem 4.2.6 for several more examples.

Page 7:

Subspaces

A vector subspace S is a subset of V which is a vector space in its own right. The conditions to check are A1 and A2 (namely, that sums of things in S need to remain in S and scalar multiples of things in S still belong to S). All the other conditions will still be true because they were true in V.

Important Example

If you have any homogeneous system of equations in n unknowns, then the solution set of the system will be a subspace of F^n (that’s the notation for n-tuples of things in F).

Second Important Example

Solutions of any linear homogeneous ODE form a vector subspace of the vector space of all functions on the real line (or on a finite interval I if you wish).

Page 8:

Subspaces

For systems of equations with m unknowns, solutions of any homogeneous system of equations always form a vector subspace of R^m (or C^m if solutions are allowed to be complex).

Given an n × m matrix A, we define the null space or kernel of A to be the set of vectors x in R^m (or C^m) such that Ax = 0.
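A minimal sketch of computing a null space, assuming SymPy is available (the matrix is an illustrative choice): nullspace() returns a basis, and every solution of Ax = 0 is a linear combination of the basis vectors, exactly as the subspace statement promises.

from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6]])       # 2 x 3, so solutions live in R^3
basis = A.nullspace()         # basis for the null space {x : Ax = 0}
for x in basis:
    assert A * x == Matrix.zeros(2, 1)
print(len(basis))             # 2: the null space is a plane through 0 in R^3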

Page 9:

Linear Combinations

If you are given vectors v1, v2, . . . , vk, any vector w which can be written in the form

c1v1 + c2v2 + · · · + ckvk

for scalars c1, c2, . . . , ck is said to be a linear combination of v1, . . . , vk.

Definition

Given any list of vectors v1, . . . , vk ∈ V, the set of all linear combinations of v1, . . . , vk is a vector subspace of V. It is called the span of v1, . . . , vk.

Page 10:

Types of Problems

• Determine whether a given set of vectors spans a vector space (a sketch in Python follows this list).
  • A collection of fewer than n vectors in R^n never spans all of R^n.
  • A collection of exactly n vectors in R^n spans all of R^n if and only if the determinant of the matrix they make by adjoining columns is nonzero.
  • A collection of more than n row vectors spans R^n if and only if the rank of the matrix made by adjoining rows is n.

• Given a list of vectors, classify all vectors in the span.
  • Write a system of equations with unknowns equal to the coefficients cj in the linear combination. Try to solve the system for an arbitrary vector b solving b = c1v1 + · · · + ckvk.

• Determine whether the spans of two sets are the same or different.

• Identify whether a particular vector is in the span or not.
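A sketch of the determinant and rank tests from the first item, assuming NumPy; the vectors are illustrative choices:

import numpy as np

# Exactly n vectors in R^n: they span iff the matrix with those columns
# has nonzero determinant.
v1, v2, v3 = [1, 0, 1], [0, 1, 1], [1, 1, 0]
A = np.column_stack([v1, v2, v3])
print(np.linalg.det(A) != 0)           # True: v1, v2, v3 span R^3

# More than n vectors: they span iff the rank of the matrix is n.
B = np.column_stack([v1, v2, v3, [2, 3, 5]])
print(np.linalg.matrix_rank(B) == 3)   # True: still spans R^3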

Page 11:

Types of Problems

• Identify when a specific vector y is in the span of v1, . . . , vk? Determine whether the equations

[v1 · · · vk] [c1; . . . ; ck] = y

(where the matrix has columns v1, . . . , vk) can be solved.

• Classify the span of v1, . . . , vk? Treat the entries of y as variables and determine what constraints (if any) on the entries of y are needed to make the equations consistent.
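One way to carry out the second item, sketched in Python with SymPy (my example, not from the slides): y is in the span of the columns of A exactly when n · y = 0 for every n in the null space of the transpose of A, so those dot products are the constraints on the entries of y.

from sympy import Matrix, symbols

y1, y2, y3 = symbols('y1 y2 y3')
A = Matrix([[1, 0],
            [0, 1],
            [1, 1]])        # columns v1, v2 in R^3
y = Matrix([y1, y2, y3])

constraints = [(n.T * y)[0] for n in A.T.nullspace()]
print(constraints)          # [-y1 - y2 + y3]: the span is {y : y3 = y1 + y2}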

Page 12:

Two Ways to Describe Subspaces

There are two basic ways to give complete descriptions of a vector subspace:

• Give a list of vectors that span the vector subspace; in other words, find a collection of vectors which is (1) small enough to belong to the subspace, and (2) big enough so that they don’t belong to a smaller subspace.

• Give a complete list of linear constraints (i.e., equations) that all vectors in the subspace must satisfy.

In both cases, it is important to know what the minimal number of objects (either vectors or constraints) must be; there’s not much point in working harder than is necessary.

Page 13:

Linear Dependence

A collection of vectors v1, . . . , vk is said to be linearly dependent when it’s possible to write 0 as a linear combination of v1, . . . , vk which is nontrivial, i.e.,

c1v1 + · · · + ckvk = 0

and not every one of c1, . . . , ck is zero.

Linear dependence of a set of vectors means that one or more vectors may be removed without changing their span. Linear dependence also means that the system

[v1 · · · vk] [c1; . . . ; ck] = 0

(where the matrix has columns v1, . . . , vk) has nontrivial solutions.

Page 14:

Bases

Definition

A basis of a vector space V is a collection of vectors v1, . . . , vk which is linearly independent and spans V. In other words,

c1v1 + c2v2 + · · · + ckvk = y

has a unique solution for any choice of y ∈ V (i.e., always one solution, never more than one).

The column vector (c1, . . . , ck)^T is called the coordinates of y with respect to the ordered basis v1, . . . , vk.
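Finding coordinates is then a linear solve. A sketch assuming SymPy, with an illustrative basis of R^2:

from sympy import Matrix

v1, v2 = Matrix([1, 1]), Matrix([1, -1])   # an ordered basis of R^2
P = Matrix.hstack(v1, v2)                  # columns are the basis vectors
y = Matrix([3, 1])

c = P.solve(y)        # unique solution because v1, v2 form a basis
print(c)              # Matrix([[2], [1]]): y = 2*v1 + 1*v2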

Page 15:

Bases

• The standard basis of R^n is given by the column vectors e1 = (1, 0, . . . , 0)^T, e2 = (0, 1, . . . , 0)^T, . . . , en = (0, 0, . . . , 1)^T.

• Bases of the same vector space must always have the same number of vectors. This number is called the dimension of the vector space. ANY basis of R^n must have n vectors.

• If the dimension is n, then any list of more than n vectors must be linearly dependent.

• If the dimension is n, then any list of fewer than n vectors never spans.

• If you know you have the right number of vectors, then linear independence and spanning go hand-in-hand (i.e., if one is true then the other is true; if one is false then the other is false).

Page 16:

Changing Bases

• Coordinates change when the basis changes; the effect of changing bases is encoded in the change of basis matrix. The columns of the change of basis matrix are the coordinates in the new basis of the old basis vectors.

• Basic Skills:
  • How do you verify that a given set is a basis of a vector space?
  • How do you discover a basis for a given vector space?
  • How do you convert coordinates from one basis to another? (A sketch follows this list.)
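A sketch of the third skill, assuming SymPy. With the standard basis as the old basis and {(1, 1), (1, −1)} as the new basis (my illustrative choice), the change of basis matrix is the inverse of the matrix whose columns are the new basis vectors; its columns are the new-basis coordinates of e1 and e2.

from sympy import Matrix

new_basis = Matrix.hstack(Matrix([1, 1]), Matrix([1, -1]))
change = new_basis.inv()      # columns: coordinates of e1, e2 in the new basis

v_old = Matrix([3, 1])        # coordinates in the standard basis
print(change * v_old)         # Matrix([[2], [1]]): coordinates in the new basis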

Page 17:

Linear Independence Redux

The mechanical steps to check a list of vectors v1, . . . , vk for linear independence:

• If they all belong to some vector space V of dimension < k, then stop: they cannot be linearly independent.

• If they belong to R^n or C^n, treat the statement

c1v1 + · · · + ckvk = 0

as an equation to solve for unknowns c1, . . . , ck. Solve by row reduction like normal; if there is only one solution then they are linearly independent.

• If they belong to an exotic vector space BUT you happen to know a basis, write everything out in coordinates. This reduces the computational problem to the previous case.

• If you do not know a nice basis, e.g. C^∞(I), you have to be more clever. In this case, one option is to use what is known as the Wronskian (a sketch follows this list).
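A sketch of the Wronskian test from the last item, assuming SymPy; the functions are an illustrative choice. Build the k × k matrix whose rows are the functions and their first k − 1 derivatives; if its determinant is not identically zero, the functions are linearly independent on the interval.

from sympy import Matrix, symbols, exp, simplify

x = symbols('x')
fs = [exp(x), exp(2*x), exp(3*x)]
W = Matrix([[f.diff(x, i) for f in fs] for i in range(len(fs))])
print(simplify(W.det()))    # 2*exp(6*x): never zero, so the functions are independent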

Page 18:

Rank-Nullity Theorem

Theorem

For any m × n matrix A,

rank(A) + nullity(A) = n.

In other words, the rank of A plus the dimension of the null space of A equals the number of columns.

Remember:

• The rank of the matrix equals the number of nonzero rows after row reduction.

• The rank is also the dimension of the row space.

• The rank is also the dimension of the span of the columns.
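A quick check of the theorem, assuming SymPy; the matrix is an illustrative choice:

from sympy import Matrix

A = Matrix([[1, 2, 3, 4],
            [2, 4, 6, 8],
            [0, 1, 1, 0]])    # 3 x 4, so n = 4
print(A.rank())                # 2
print(len(A.nullspace()))      # 2 = nullity
assert A.rank() + len(A.nullspace()) == A.cols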

Page 19:

Applications of Rank-Nullity

Analysis of the equations Ax = 0 when A is an m × n matrix:

• If the rank of A is n, then the only solution is x = 0.

• Otherwise, if r is the rank, then the nullity is k = n − r, meaning that all solutions have the form

x = c1x1 + · · · + ckxk

for any linearly independent set of k solutions x1, . . . , xk.

Analysis of Ax = b (a sketch follows this list):

• If b is not in the column space, then there is no solution.

• If b is in the column space and rank(A) = n, then there is a unique solution.

• If b is in the column space and rank(A) = r < n, then

x = c1x1 + · · · + ckxk + xp

where x1, . . . , xk (with k = n − r) are linearly independent solutions of Ax = 0 and xp is any single solution of Ax = b.
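A sketch of the last case, assuming SymPy; the system is an illustrative choice. gauss_jordan_solve returns the solutions in exactly the form above: a particular solution plus free parameters multiplying null-space directions.

from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6]])        # rank 1, so nullity = 2
b = Matrix([6, 12])            # in the column space: b = 6 * (first column)

sol, params = A.gauss_jordan_solve(b)
print(sol)                     # parametric general solution (free parameters tau0, tau1)
print(A.nullspace())           # the directions x1, x2 appearing in the formula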

Page 20:

Linear Transformations

Definition

A linear transformation is any mapping T : V → W between vector spaces V and W over the same field which satisfies

T(u + v) = Tu + Tv and T(cv) = c(Tv)

for any vectors u, v ∈ V and any scalar c.

The canonical example is when T is multiplication by some given matrix A, but there are a variety of things which are linear transformations.

Page 21:

Connection to Matrices

Suppose T is a linear transformation of a vector space V with ordered basis B into a vector space W with ordered basis C.

Basic Question

If v ∈ V has coordinates (c1, . . . , cn)^T with respect to B, what will be the coordinates (d1, . . . , dm)^T of Tv with respect to C?
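The answer: the C-coordinates of Tv are M times the B-coordinates of v, where column j of the matrix M holds the C-coordinates of T applied to the j-th basis vector of B. A sketch (my example, assuming SymPy) with T = d/dx on polynomials of degree at most 3 and B = C = the monomial basis {1, x, x^2, x^3}:

from sympy import Matrix

# d/dx sends 1 -> 0, x -> 1, x^2 -> 2x, x^3 -> 3x^2, giving the columns:
M = Matrix([[0, 1, 0, 0],
            [0, 0, 2, 0],
            [0, 0, 0, 3],
            [0, 0, 0, 0]])

v = Matrix([5, 0, 1, 2])    # coordinates of 5 + x^2 + 2x^3
print(M * v)                # (0, 2, 6, 0): the coordinates of 2x + 6x^2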

Page 22:

Geometry of Matrix Multiplication

To understand the importance of eigenvectors and eigenvalues (§5.6–8), it is first necessary to understand the geometric meaning of matrix multiplication. If you visit the link http://www.math.hawaii.edu/phototranspage.html, you can explore the result of applying any 2 × 2 matrix you choose to any image of your choice. I choose the image http://freezertofield.com/wp-content/uploads/2011/09/baby-doll-sheep-300x275.jpg (thanks, Google!):

Page 23:

Geometry of Matrix Multiplication

[x′; y′] = [0.5 0; 0 1] [x; y] (matrix rows separated by semicolons)

[Figure: left, the sheep in the xy-plane; right, its image in the x′y′-plane]

Page 24:

Geometry of Matrix Multiplication

[x′; y′] = [1 0; 0 1.8] [x; y]

[Figure: left, the sheep in the xy-plane; right, its image in the x′y′-plane]

Page 25:

Geometry of Matrix Multiplication

[x′; y′] = [1 0.7; 0 1] [x; y]

[Figure: left, the sheep in the xy-plane; right, its image in the x′y′-plane]

Page 26:

Geometry of Matrix Multiplication

[x′; y′] = [1 0; −0.6 1] [x; y]

[Figure: left, the sheep in the xy-plane; right, its image in the x′y′-plane]

Page 27:

Geometry of Matrix Multiplication

[x′; y′] = [1.2 0.5; 0.5 1.2] [x; y]

[Figure: left, the sheep in the xy-plane; right, its image in the x′y′-plane]
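You can reproduce the experiment without an image by transforming a few points. A sketch assuming NumPy, using the shear from the earlier slide:

import numpy as np

A = np.array([[1.0, 0.7],
              [0.0, 1.0]])          # the shear matrix from above
square = np.array([[0, 1, 1, 0],    # columns are corners of the unit square
                   [0, 0, 1, 1]])
print(A @ square)   # the square's corners land on a parallelogram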

Page 28:

Eigenvectors

Eigenvectors and Eigenvalues

For a square matrix A, if there is a (nonzero) column vector E and a number λ so that

AE = λE

then we call E an eigenvector and λ its eigenvalue.

Geometrically, the existence of an eigenvector means that there is a direction which is preserved by the matrix A: in the same direction as E, the only effect of multiplication by A is to stretch (or shrink, or reflect) by a factor λ.

The nicest possible situation is when an n × n matrix has n linearly independent eigenvectors. This would mean that the matrix A can be understood solely in terms of stretchings in each of the various directions. In practice, this does not always happen.
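A numerical sketch, assuming NumPy, using the last matrix from the sheep pictures: its eigenvectors point along (1, 1) and (1, −1), with stretch factors 1.7 and 0.7.

import numpy as np

A = np.array([[1.2, 0.5],
              [0.5, 1.2]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                  # 1.7 and 0.7 (in some order)
E = eigenvectors[:, 0]              # columns of the second output are eigenvectors
assert np.allclose(A @ E, eigenvalues[0] * E)   # AE = lambda E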

Page 29:

Finding the Eigenvalues

Characteristic Equation

If A is an n × n matrix, then the determinant

det(A − λI)

is a polynomial of degree n in terms of λ (called the characteristic polynomial). The eigenvalues of A must be solutions of the equation

det(A − λI) = 0.

Why? Because det(A − λI) = 0 if and only if there is a nonzero column vector E for which (A − λI)E = 0. This means E is an eigenvector with eigenvalue λ.

Fact

For triangular matrices, the eigenvalues are just the diagonal entries.
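A sketch assuming SymPy; the triangular matrix is an illustrative choice. (SymPy's charpoly uses det(λI − A), which has the same roots as det(A − λI).)

from sympy import Matrix, symbols, factor

lam = symbols('lambda')
A = Matrix([[2, 7, 1],
            [0, 3, 5],
            [0, 0, 3]])
p = A.charpoly(lam)
print(factor(p.as_expr()))    # (lambda - 2)*(lambda - 3)**2: eigenvalues 2 and 3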

Page 30:

Finding the Eigenvectors

Solving for E

Once you know the roots of the characteristic polynomial, the way that you find eigenvectors is to solve the system of equations

(A − λjI)X = 0

where you fix λj to be any root. Any nonzero solution X is an eigenvector.

Fact

Assuming you’ve solved for the eigenvalues correctly, there will always be at least one nonzero solution of this system (i.e., at least one eigenvector). Sometimes there will be multiple eigenvectors. You should always choose them to be linearly independent (and remember that the number of identically zero rows you find when row reducing A − λI will tell you exactly how many eigenvectors to find).
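A sketch of this procedure, assuming SymPy and reusing the triangular matrix from the previous sketch: for each root λj, the eigenvectors are the nonzero vectors in the null space of A − λjI.

from sympy import Matrix, eye

A = Matrix([[2, 7, 1],
            [0, 3, 5],
            [0, 0, 3]])
for lam_j in A.eigenvals():                # {2: 1, 3: 2}
    E = (A - lam_j * eye(3)).nullspace()
    print(lam_j, E)    # eigenvalue 3 has multiplicity 2 but only one eigenvector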

Page 31:

Points to Remember

Not Enough Eigenvectors

Sometimes an n × n matrix may not have n linearly independent eigenvectors (eigenvectors with different eigenvalues are automatically linearly independent). This only happens when the characteristic polynomial has repeated roots, and even then it only happens some of the time.

Complex Eigenvalues

If there are complex roots to the characteristic equation, you will get complex eigenvalues. If your matrix had all real entries, then complex eigenvalues will always appear in conjugate pairs. Moreover, if you find an eigenvector for one of these eigenvalues, you can automatically construct an eigenvector for the conjugate eigenvalue by taking the complex conjugate of the eigenvector itself.

Page 32:

Diagonalization

Suppose V1, . . . , Vn are column vectors. Linear combinations of these vectors are vectors which can be written in terms of the formula

W = c1V1 + · · · + cnVn

for some scalars c1, . . . , cn. Writing the vectors as the columns of a matrix, the formula becomes

[V1 · · · Vn] [c1; . . . ; cn] = W.

If V1, . . . , Vn are linearly independent vectors in n-dimensional space, then the matrix formed by concatenating the column vectors (call it P) will be invertible.

Page 33:

Diagonalization

In short, if W is a column vector and P is a matrix whose columns are linearly independent vectors V1, . . . , Vn, then the column vector

P⁻¹W

tells you the unique choice of coefficients c1, . . . , cn necessary to make W = c1V1 + · · · + cnVn.

Next, suppose that V1, . . . , Vn happen to be eigenvectors of a matrix A (with eigenvalues λ1, . . . , λn). Then

W = c1V1 + · · · + cnVn ⇒ AW = c1λ1V1 + · · · + cnλnVn.

In other words,

P⁻¹AW = diag(λ1, λ2, . . . , λn) P⁻¹W,

where diag(λ1, . . . , λn) denotes the diagonal matrix with entries λ1, . . . , λn down the diagonal.

Page 34:

Diagonalization

We conclude: If A is an n × n matrix which happens to have n linearly independent eigenvectors, let P be the matrix whose columns are those eigenvectors and let D be the diagonal matrix made of the corresponding eigenvalues of A. Then P⁻¹AW = DP⁻¹W for any column vector W you like. Consequently

A = PDP⁻¹ and D = P⁻¹AP.

We call D the diagonalization of A. Having a diagonalization of A means that in some appropriate basis, the matrix A just acts like a coordinate-wise scaling. A sufficient condition to be diagonalizable is that A has n distinct eigenvalues.

High powers of A are easy to compute with diagonalizations (a sketch follows):

A^N = PD^N P⁻¹
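A sketch assuming SymPy; the matrix is an illustrative choice with eigenvalues 3 and 1:

from sympy import Matrix

A = Matrix([[2, 1],
            [1, 2]])
P, D = A.diagonalize()     # columns of P are eigenvectors; D is diagonal
assert A == P * D * P.inv()
assert A**10 == P * D**10 * P.inv()   # high powers via the diagonalization
print(D)                   # the diagonal matrix of eigenvalues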

Page 35:

Diagonalization

• If A is symmetric, then it may always be diagonalized, and it will always be true in this case that the eigenvectors can be chosen so that they’re mutually orthogonal.

• Diagonalizations are useful to reduce the complexity of problems (since operating with diagonal matrices is generally much easier than it would be with more general matrices).

• Applications include classification of conic sections and solutions of systems of linear ordinary differential equations.

Page 36:

Diagonalization: Further Examples

• Classification and Transformation of Conic Sections

• Logistics / Planning / Random Walks

• Computing Functions of a Matrix

• Stress Matrices: http://en.wikipedia.org/wiki/Stress_(mechanics)

• Data Analysis: http://en.wikipedia.org/wiki/Principal_component_analysis

• Moments of Inertia: http://en.wikipedia.org/wiki/Moment_of_inertia

• Quantum Mechanics...

Page 37:

Jordan Forms or When Diagonalization Doesn’t Work

Some matrices / linear transformations cannot be diagonalized. Examples:

[1 2 3; 0 1 2; 0 0 1]   or   differentiation on Pn, n ≥ 1.

In this case we can do the next best thing and find what is called the Jordan Canonical Form.

A Jordan Block corresponding to λ is a square matrix with λ in each diagonal entry, 1 in every entry immediately above the diagonal, and zero elsewhere.

Page 38:

Finding the Jordan Form

For any fixed eigenvalue λ0:

• The number of blocks equals the number of eigenvectors, which equals the nullity of A − λ0I.

• The sum of the sizes of all blocks associated to λ0 equals the power of (λ − λ0) found in the characteristic polynomial.

Example: If det(A − λI) = (λ − 1)(λ − 2)^3 then we can say that

• A is a 4 × 4 matrix.

• There is one eigenvector with eigenvalue 1.

• As for eigenvalue 2, there are either
  • three eigenvectors: three 1 × 1 Jordan blocks,
  • two eigenvectors: 1 × 1 and 2 × 2 Jordan blocks, or
  • one eigenvector: a single 3 × 3 Jordan block (a sketch of this case follows).
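A sketch assuming SymPy, with an illustrative 4 × 4 matrix realizing the last case: det(A − λI) = (λ − 1)(λ − 2)^3 with only one eigenvector for eigenvalue 2, so the Jordan form has a single 3 × 3 block for 2.

from sympy import Matrix

A = Matrix([[1, 0, 0, 0],
            [0, 2, 1, 0],
            [0, 0, 2, 1],
            [0, 0, 0, 2]])
print(len((A - 2 * Matrix.eye(4)).nullspace()))   # 1 eigenvector for eigenvalue 2
P, J = A.jordan_form()
print(J)    # a 1 x 1 block for eigenvalue 1 and one 3 x 3 block for eigenvalue 2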

Page 39:

Some Examples to Think About

Example 1: det(A − λI) = (λ − 1)(λ − 2)^2(λ − 3)^3, the nullity of A − 2I is 2, and the nullity of A − 3I is also 2.

Example 2: det(A − λI) = (λ − 1)^2(λ − 2)^2(λ − 3)^4, and the nullities of A − I, A − 2I, and A − 3I are all 2.

Page 40:

Generalized Eigenvectors

Definition

A vector v ≠ 0 is called a generalized eigenvector with eigenvalue λ when (A − λI)^p v = 0 for some p.

If (A − λI)^p v = 0, then automatically (A − λI)^(p+1) v = 0 and so on for all higher powers. Thus if v is specified, we generally need to know the smallest value of p that makes the formula true.

To build a basis in which the matrix takes its Jordan Form, we start with vectors v such that (A − λI)^p v = 0 and p is minimal; then we put the vectors

(A − λI)^(p−1) v, (A − λI)^(p−2) v, . . . , (A − λI)v, v

in that order into the basis.
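A sketch assuming SymPy, with the 3 × 3 Jordan block for λ = 2 as an illustrative matrix: v = e3 satisfies (A − 2I)^3 v = 0 with p = 3 minimal, and the chain below is exactly the ordered basis just described.

from sympy import Matrix, eye

A = Matrix([[2, 1, 0],
            [0, 2, 1],
            [0, 0, 2]])
N = A - 2 * eye(3)                 # N = A - lambda*I is nilpotent here
v = Matrix([0, 0, 1])
assert N**3 * v == Matrix.zeros(3, 1) and N**2 * v != Matrix.zeros(3, 1)
print([N**2 * v, N * v, v])        # the chain (A - 2I)^2 v, (A - 2I) v, v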
